reddit2text is a Python library designed to effortlessly transform any Reddit thread into clean, readable text data.
Perfect for prompting an LLM, performing NLP/data analysis, or simply archiving for offline use, reddit2text offers a straightforward interface to access and convert content from Reddit.
Easy install using pip:
pip3 install reddit2text
Developing from source: clone the repo and run make (see CONTRIBUTING.md).
First, you need to create a Reddit app to get your client_id and client_secret, in order to access the Reddit API.
Here's a visual step-by-step guide I created to do this! Alternatively, you can look at Reddit's API documentation.
Then, add your credentials. When using env vars: copy .env.example to .env and fill in your Reddit app values:
cp .env.example .env
The user agent can be anything you like, but we recommend following this convention according to Reddit's guidelines: '<app type>:<app name>:<version> (by <your username>)'
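As a rough illustration of the two steps above, here is a minimal sketch that builds a user agent in the recommended format and collects credentials from the environment. The helper names (build_user_agent, load_credentials) and the variable names REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET and REDDIT_USER_AGENT are assumptions made for this example; use whatever names appear in .env.example. It also assumes the values have already been exported into the process environment.

```python
import os


def build_user_agent(app_type: str, app_name: str, version: str, username: str) -> str:
    """Format a user agent as '<app type>:<app name>:<version> (by <your username>)'."""
    return f"{app_type}:{app_name}:{version} (by {username})"


def load_credentials(env=None) -> dict:
    """Collect keyword arguments for Reddit2Text from environment variables.

    The variable names are hypothetical; match them to your .env.example.
    """
    env = os.environ if env is None else env
    required = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        # Fall back to a user agent in the recommended format if none is set.
        "user_agent": env.get("REDDIT_USER_AGENT")
        or build_user_agent("script", "my_app", "v1.0", "u/reddit2text"),
    }
```

The resulting dictionary can then be unpacked into the constructor, for example Reddit2Text(**load_credentials()).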
This is enough to get started:
from reddit2text import Reddit2Text

r2t = Reddit2Text(
    # replace with your actual creds
    client_id='123abc',
    client_secret='123abc',
    user_agent='script:my_app:v1.0 (by u/reddit2text)'
)

URL = 'https://www.reddit.com/r/AskReddit/comments/1by3p2o/whats_the_stupidest_animal_and_how_has_it/'

output = r2t.textualize_post(URL)
print(output)

Here is an example (truncated) output from the above code: https://pastebin.com/niQTGbys
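If you want to archive a thread for offline use, the text can be written to disk by hand. This is a minimal sketch, assuming textualize_post returns a plain string (the example above prints it directly); save_thread_text is a hypothetical helper and not part of reddit2text.

```python
from pathlib import Path


def save_thread_text(text: str, path: str) -> Path:
    """Write the textualized thread to a UTF-8 file, creating parent folders as needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

For example, save_thread_text(output, 'threads/askreddit.txt') would store the printed text in a local file.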
- max_comment_depth, Optional[int]: Maximum depth of comments to output. Includes the top-most comment. Defaults to None or -1 to include all.
- comment_delim, Optional[str]: String/character used to indent comments according to their nesting level. Defaults to | to mimic Reddit.
r2t = Reddit2Text(
    # credentials ...
    max_comment_depth=3,  # limit every comment chain to 3 levels, counting the top-most comment
    comment_delim='#'  # each comment level will be preceded by multiples of this string
)
- Convert any Reddit thread (the post + all its comments) into structured text.
- Include all comments, with the ability to specify the maximum comment depth.
- Configure a custom comment delimiter, for visual separation of nested comments.
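To make the depth limit and delimiter concrete, here is a small sketch of how nested comments could be rendered. It is not reddit2text's implementation: render_comments, the dictionary layout of each comment, and the "author: body" line format are all invented for illustration. It only mirrors the documented behavior that the top-most comment counts as depth 1, that None or -1 means no limit, and that each level is preceded by multiples of the delimiter.

```python
def render_comments(comments, comment_delim="|", max_comment_depth=None, depth=1):
    """Render nested comments as lines prefixed with depth * comment_delim."""
    unlimited = max_comment_depth is None or max_comment_depth == -1
    if not unlimited and depth > max_comment_depth:
        return []  # this level is deeper than the limit, so drop it and its replies
    lines = []
    for comment in comments:
        lines.append(f"{comment_delim * depth} {comment['author']}: {comment['body']}")
        lines.extend(
            render_comments(comment.get("replies", []), comment_delim, max_comment_depth, depth + 1)
        )
    return lines


# Hypothetical three-level thread used only for this example.
thread = [
    {"author": "alice", "body": "Top", "replies": [
        {"author": "bob", "body": "Reply", "replies": [
            {"author": "carol", "body": "Deep", "replies": []},
        ]},
    ]},
]

print("\n".join(render_comments(thread, comment_delim="#", max_comment_depth=2)))
# # alice: Top
# ## bob: Reply
```

With max_comment_depth=2 the third-level comment is dropped; passing None or -1 keeps all three levels.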
Have a Feature Idea?
Simply open an issue on GitHub and tell me what should be added to the next release!
- Comprehensive Formatting/Saving
- Being able to save to a file location as .txt, .csv, .json, or to your clipboard!
- Filtering/Sorting
- Filter/sort comments based on upvotes, author name, body content, number of replies, etc. Also add in the ability to get the Top N comments.
- Extra data fields
- Access extra information for each post/comment, like whether it's NSFW or not and when it was created
- Image/video support
- Enable mining of not just text threads, but also image and video posts
- CLI output
- Add a progress bar to the terminal for threads with a large number of comments
- Anonymize usernames
- Give the ability to obfuscate usernames, while still preserving their uniqueness across all comments
- Iterate across many posts at once
- Given a subreddit as the input and the sorting method (hot, top, new, etc.), loop over multiple posts at once and textualize them
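The username anonymization item above can be sketched independently of the library. This is one possible approach, not a planned design: build_pseudonyms is a hypothetical helper that assigns each distinct username a numbered placeholder in order of first appearance, so the same author always maps to the same placeholder and different authors never share one.

```python
def build_pseudonyms(usernames):
    """Map each distinct username to a unique placeholder such as 'user_1'."""
    mapping = {}
    for name in usernames:
        if name not in mapping:
            # Number placeholders by order of first appearance.
            mapping[name] = f"user_{len(mapping) + 1}"
    return mapping


authors = ["alice", "bob", "alice", "carol"]
pseudonyms = build_pseudonyms(authors)
print([pseudonyms[name] for name in authors])
# ['user_1', 'user_2', 'user_1', 'user_3']
```

A counter is used instead of a truncated hash so that two different usernames can never collide on the same placeholder.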
Contributions are welcome. See CONTRIBUTING.md for development setup and how to run linting before submitting a PR.
reddit2text is released under the Apache License 2.0. See the LICENSE file for more details.