/r/pushshift


Subreddit for users of the pushshift.io API

Rules:

  • Read the FAQ before posting.
  • Please be kind to each other.
  • Please see this thread before posting "Is Pushshift Down?"
  • Camas, RedditSearch, and similar tools are unlikely to return due to recent API changes made by Reddit. Please do not post asking about those tools.

New to Pushshift? Read the FAQ

Want your data removed? Use this Removal Request Form

/r/pushshift

14,238 Subscribers

0

"User is not an authorized moderator."

I keep getting this message despite 1) being a moderator and 2) having received approval from pushshift.

Does anyone know how to resolve this?

4 Comments
2024/05/14
19:12 UTC

0

Emergency

Postgrad student whose (academic) life is hanging by a thread if she fails to use PRAW or Pushshift to scrape comments from the subreddit r/gameofthrones!!!

https://preview.redd.it/5px05uzt620d1.png?width=1654&format=png&auto=webp&s=3fe52df48862188c4bd37b3d7bba54985be63583

15 Comments
2024/05/12
20:54 UTC

5

Trouble with zst to csv

Been using u/watchful1's dumpfile scripts in Colab with success, but can't seem to get the zst to csv script to work. Been trying to figure it out on my own for days (no cs/dev/coding background), trying different things (listed below), but no luck. Hoping someone can help. Thanks in advance.

Getting the Error:

IndexError                                Traceback (most recent call last)

<ipython-input-22-f24a8b5ea920> in <cell line: 50>()
     52                 input_file_path = sys.argv[1]
     53                 output_file_path = sys.argv[2]
---> 54                 fields = sys.argv[3].split(",")
     55
     56         is_submission = "submission" in input_file_path

IndexError: list index out of range

From what I was able to find, this means I'm not providing enough arguments.

The arguments I provided were:

input_file_path = "/content/drive/MyDrive/output/atb_comments_agerelat_2123.zst"
output_file_path = "/content/drive/MyDrive/output/atb_comments_agerelat_2123"
fields = []

Got the error above, so I tried the following...

  1. Listed specific fields (got same error)

input_file_path = "/content/drive/MyDrive/output/atb_comments_agerelat_2123.zst"
output_file_path = "/content/drive/MyDrive/output/atb_comments_agerelat_2123"
fields = ["author", "title", "score", "created", "id", "permalink"]

  2. Retyped lines 50-54 to ensure correct spacing & indentation, then tried running it with and without specific fields listed (got same error)

  3. Reduced the number of arguments since it was telling me I didn't provide enough (got same error)

    if __name__ == "__main__":
        if len(sys.argv) >= 2:
            input_file_path = sys.argv[1]
            output_file_path = sys.argv[2]
            fields = sys.argv[3].split(",")

    No idea what the issue is. Appreciate any help you might have - thanks!

16 Comments
2024/05/11
22:59 UTC

0

Pushshift api access for research

Tried to sign up but received a message that I am not a mod. Is it possible to get access for academic research?

I’m specifically interested in moderation behavior and its impact on the evolution of conversations, so I am interested in identifying moderated messages and analyzing their content. Would such information be accessible through Pushshift? Are there other means to obtain it?

Thanks

4 Comments
2024/05/10
17:01 UTC

1

Why do I see such a strong surge in submissions and individual users making submissions on July 1st?

In this graph you can see (for all of Reddit between Jan-Nov 2023)

a) the daily number of submissions, stacked by number of comments per submission

b) the daily number of individual users that made at least one submission to all of Reddit in 2023 (excluding December).

I stacked the numbers for submissions with 0,1,2,3,4,5-10, etc comments in order to visually filter out spam/noise by irrelevant submissions (that result in no engagement).

On July 1st, the numbers for all submissions spike significantly. However, when looking at the composition, it becomes clear that the number of submissions with 2 or more comments barely budges. For the DAU numbers this is not true, and we can observe the spike much "deeper".

I would be grateful for any pointers toward why there is such a large spike on July 1st. I suspect it might be due to some moderator tools that stopped working when API monetization started on this date, but I don't know for sure. Why would I see so many more individual users making submissions beginning on July 1st?

6 Comments
2024/05/09
09:35 UTC

2

Scheduled maintenance/downtime - Improvements in Pushshift API (5/8 Midnight)

As part of our ongoing efforts to improve Pushshift and help moderators, we are bringing in updates that will make our data collection systems faster. Some of these updates are scheduled to be deployed tonight (8th May 12:00 am EST) and may lead to temporary downtime in Pushshift. We expect the system to be back to normal within 15 to 30 minutes.

Our apologies for any inconvenience caused. We will update this post with system updates as they come in.

4 Comments
2024/05/07
23:57 UTC

0

Deleted reddit history used against me.

Hello,

A post I made recently on a subreddit was removed due to my comment history from a different subreddit. The two subreddits have nothing to do with each other, so there is no overlap. I deleted those comments myself, and I haven't been able to find them on the popular archive websites. I have several questions:

  1. How was this mod able to see my deleted Comments?
  2. If I make a removal request, will my deleted reddit history still be easily accessible?

I'm aware nothing is ever truly gone, but the fact that this mod was able to use my deleted comment history against me is rather concerning.

7 Comments
2024/05/06
16:45 UTC

0

{"detail":"User is not an authorized moderator."}

Hello everyone,

I'm currently developing a sentiment analysis model and am trying to integrate the Pushshift API to access historical Reddit data. However, I'm encountering an issue with the authorization process. After granting access to my account, I received the following error message:

{"detail":"User is not an authorized moderator."}

It seems like the API is expecting moderator privileges, which I do not have. Has anyone else faced this issue? Any guidance on how to bypass this or any alternative methods to access the data would be greatly appreciated.

Thank you in advance for your help!

2 Comments
2024/05/05
15:05 UTC

19

Dump files for March 2024

Sorry this one is so delayed. I was on vacation the first two weeks of the month and then the compression script which takes like 4 days to run crashed three times part way through. Next month should be faster.

March dump files: https://academictorrents.com/details/deef710de36929e0aa77200fddda73c86142372c

Previous months: https://www.reddit.com/r/pushshift/comments/194k9y4/reddit_dump_files_through_the_end_of_2023/

Mirror of u/RaiderBDev's zst_blocks: https://academictorrents.com/details/ca989aa94cbd0ac5258553500d9b0f3584f6e4f7

3 Comments
2024/04/28
20:34 UTC

2

wallstreetbets_submissions/comments

Hello guys. I have downloaded the .zst files for wallstreetbets_submissions and comments from u/Watchful1's dump. I just want the names of the fields which contain the text and the time it was created. Any suggestions on how to modify the filter_file script? I used glogg as instructed with the .zst file to see the fields, but random symbols come up. Should I extract the .zst using the 7-Zip ZST extractor? submissions is 450 MB and comments is 6.6 GB as .zst files. Any ideas?

https://preview.redd.it/2krcfoi5opwc1.png?width=1778&format=png&auto=webp&s=d2453f057841e6fe4ee501796afb0b0739dd9989

3 Comments
2024/04/25
23:37 UTC

3

Any guides to pushshift use for modding?

The current pushshift.io allows me to search posts/users but I can't actually see the content of what was posted. In the sub I moderate we are having issues with users posting disallowed material and deleting it before mods have a chance to get to it, thus circumventing a ban. I have two questions:

  1. If a post on my sub is popping up as deleted, is there a way for me to see the content of that post and the username of the submitter?

  2. When I do find a suspicious user and search their name on pushshift.io, I can see the titles of posts they made but not the content of said posts. Is there any way to view content?

Past tools allowed me to do this. Is there any way I can use other tools (with an auth token) to use these functions?

1 Comment
2024/04/23
01:12 UTC

5

Confused on How to Use Pushshift

I'm new to pushshift and in general scraping posts with a Reddit API. I'm looking to scrape some Reddit posts for a personal research project and have heard secondhand that pushshift is an easy way to do this. However, I'm a little confused about exactly what pushshift is and how it is used. When I go to https://pushshift.io/ I am given the terms of service which explain that pushshift is only to be used by Reddit moderators for the sake of moderation (see attached screenshot). Furthermore, I cannot authorize my account without being a Reddit mod.

I am confused because I have seen other posts referencing pushshift as a large data storage of reddit posts or a third-party scraper perfect for scraping posts off of Reddit for research (like this one). Am I misunderstanding something, or is a different tool more suited for what I am looking for?

https://preview.redd.it/954101mct4uc1.png?width=1818&format=png&auto=webp&s=54db9a3ef18cb1678af8c36c6e5622ff71ba3a75

5 Comments
2024/04/12
23:18 UTC

5

Subreddit torrent size

I am trying to ingest the subreddit torrent as mentioned here:

Separate dump files for the top 20k subreddits :

The total collection is some 2.64 TB in size, but all files are obviously compressed. For anybody who has uncompressed the whole collection: any idea how much storage space the uncompressed collection will occupy?

11 Comments
2024/04/12
13:34 UTC

3

How do you resolve decoding issues in the dump files using Python?

I'm hopeful some folks in the community have figured out how to address escaped code points in ndjson fields (e.g. body, author_flair_text).

I've been treating the ndjson dumps as utf-8 encoded, and blithely regex'd the code points out to suit my then needs, but that's not really a solution.

One example is a flair_text comprised of the repeated sequence \ud83d\ude28. I assume this to be a string of the same emoji repeated, if I'm to believe a handful of online decoders ("utf-16" decoding), but Python doesn't agree at all.

>>> text = b'\ud83d\ude28'
>>> text.decode('utf-8')
'\\ud83d\\ude28'
>>> text.decode('utf-16')
'畜㡤搳畜敤㠲'
>>> text.decode('unicode-escape')
'\ud83d\ude28'

Pasting the emoji into Python interactively, the encoded results are entirely different.

>>> text = '😨'
>>> text.encode('utf-8')
b'\xf0\x9f\x98\xa8'
>>> text.encode('utf-16')
b'\xff\xfe=\xd8(\xde'
>>> text.encode('unicode-escape')
b'\\U0001f628'

Any nudges or 2x4s to push/shove me in a useful direction are greatly appreciated.
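What the online decoders are hinting at: \ud83d\ude28 is a UTF-16 surrogate pair written as JSON \uXXXX escapes. The bytes-literal experiments fail because b'\ud83d' stores a literal backslash-u sequence, not a character. Parsing the dump line with json.loads handles the escapes correctly, and lone surrogates that have already leaked into a str can be re-paired:

```python
import json

# The ndjson dumps store the emoji as the JSON escape pair \ud83d\ude28.
# json.loads() combines the surrogate pair into the real character, so
# decoding each dump line as UTF-8 and parsing it as JSON is sufficient:
s = json.loads('"\\ud83d\\ude28"')
assert s == '\U0001f628'  # the 😨 emoji

# If 'unicode-escape' decoding has already produced lone surrogates in a
# str, they can be recombined via the surrogatepass error handler:
broken = '\ud83d\ude28'
fixed = broken.encode('utf-16', 'surrogatepass').decode('utf-16')
assert fixed == '\U0001f628'
```

The regex stripping can then be dropped entirely: parse each line with json.loads and every \uXXXX escape, emoji included, comes out as proper text.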

3 Comments
2024/04/08
23:17 UTC

4

In the dump files, if a username is deleted, is there any way to identify their other posts/comments?

I actually know the username and two of their posts. I found the posts in the files, but they show the name as deleted, so I wanted to ask if there's any way to find more of their posts.

2 Comments
2024/04/06
05:56 UTC

2

Need help coding (please)

Hello everyone,

I'm doing my thesis in linguistics on the pragmatic use of emojis in politeness strategies.

I would like to extract as many submissions with emojis as possible, so that I can run statistical analyses on them.

Disclaimer: I'm a noob coder, and I'm working with Anaconda NoteBook.

I downloaded some metadumps, but I'm having a few problems extracting comments.

The main problem is that the zst files are WAY TOO BIG when I unpack them (some 300-500GB each). This makes my PC go crazy and causes failures in the code I'm trying to run.

Therefore, I humbly request the assistance of the kind souls in this subreddit.

How can I extract all comments containing emojis from a given zst file into a json file? I don't need all the attributes, just the comment, ID, and subreddit. This would greatly reduce the size of the file, but I'm honestly clueless as to how to do that.

Please help me.

Feel free to ask for further clarification.

Thank you all in advance, and I hope you're having a great day!

2 Comments
2024/04/02
21:53 UTC

3

Old dump files

Hello, I have a question. With the change of Pushshift servers in December 2022, many names were overwritten with u/[deleted]. Is there any way to look at an old dump like this one https://academictorrents.com/details/0e1813622b3f31570cfe9a6ad3ee8dabffdb8eb6 and see if the data is still there without being overwritten?

1 Comment
2024/04/02
19:50 UTC

3

Passing API key in PMAW?

Hey all - I've got a search that works on the search page, but I need a lot more results than I want to pull manually from that page.

How do I pass my PushShift API key through PMAW? Can't find anything from searching.

0 Comments
2024/03/31
21:20 UTC

1

Analysis project advice. I'm new to this, please respond at a 5th grade reading level lol

What is the best way to access Pushshift for an analysis-type project within a specific subreddit? I came across this subreddit while doing some research, and I think it's pretty cool that this type of resource exists. I'm trying to learn how best to utilize it for a project that aims to analyze sentiments and overall mood, and/or a temporal analysis of patterns of change.

Any and all information would be greatly appreciated.

1 Comment
2024/03/28
11:30 UTC

2

How to automate token retrieval?

I'm a python noob. How do I retrieve the token using a script? It's incredibly tedious having to go through a link, authenticate, then copy paste every day.

2 Comments
2024/03/27
23:12 UTC

0

How do I download the torrents of the Reddit submissions?

I tried using Academic Torrents and Transmission-Qt, but the resulting file didn't let me extract it, and it tried to download all 2 f**cking terabytes even though I specified a particular year. Does anyone have a tutorial, or a less risky way to access the submissions data for a particular year?

17 Comments
2024/03/26
10:24 UTC

3

Is there any way to increase the API limits? Or make Pushshift code from before the change work again?

I am running a very simple RStudio script to get the subreddit name from the ID number that all Reddit links have, but it limits me to 100 with long intervals. Does anyone know a solution, or any way to get data from Reddit links quickly and easily?

And for the second question: is it possible to get access from Reddit and make the Pushshift website work again?

I know this is unlikely after the stupid changes, but I am at my wits' end. I had a perfectly working Pushshift script, but the change made it useless and I am STILL not finding a solution.

4 Comments
2024/03/26
00:05 UTC

5

Exact match in dump files

Using the dumps and code provided by u/Watchful1, if I'm looking for the values 'alpha', 'bravo', 'charlie', and 'delta' with exact match set to 'False', will I get returns for 'Alpha', 'Bravo', 'Charlie', and 'Delta'? What about 'alphabet' or 'bravos'? And 'alpha-', 'bravo-'?
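I can't speak for the script authoritatively, but if exact_match set to False means a lowercased substring test (an assumption worth verifying against the actual filter_file code), the behavior would look like this:

```python
# Sketch of the matching logic ASSUMED above: exact_match=False does a
# case-insensitive substring test, exact_match=True requires equality.
# Verify against the real filter_file script before relying on this.
values = ["alpha", "bravo", "charlie", "delta"]


def matches(field_value, exact_match=False):
    text = field_value.lower()
    if exact_match:
        return text in values          # whole-value equality only
    return any(v in text for v in values)  # substring anywhere


assert matches("Alpha")                      # case-insensitive hit
assert matches("alphabet")                   # substring hit
assert matches("alpha-")                     # substring hit
assert not matches("alphabet", exact_match=True)
```

Under that assumption: 'Alpha'/'Bravo' etc. would match either way (lowercasing), while 'alphabet', 'bravos', 'alpha-', and 'bravo-' would match only with exact match off.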

Thanks in advance!

6 Comments
2024/03/24
02:55 UTC

15

Would you find the ability to download the Reddit data archives in a simple Python package that interfaces with a SQLite database useful?

I downloaded the pushshift archives a while back and have a full copy of the archives, and have used it for various personal research purposes. I've been converting the zst compressed ndjson files into a single SQLite database that uses SQLmodel as an interface, and integrating embedding search across all comments and self posts as I go. I'd probably upload the database to huggingface if I uploaded it somewhere.

My first question is: would people here on this subreddit find this useful? What specific features would you find most useful in a python package serving as an interface for the database?

My second question is: if I published this and made the dataset available, what do y'all think the legal/ethical implications would be?
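For anyone picturing the conversion step being described, here is a minimal stdlib-only sketch of loading ndjson records into SQLite (the column set is illustrative, not the package's actual schema, and the real package uses SQLModel rather than raw sqlite3):

```python
import json
import sqlite3


def load_comments(ndjson_lines, db_path=":memory:"):
    """Load ndjson comment records into a SQLite table.

    Illustrative schema: id, subreddit, created_utc, body. Returns the
    open connection so callers can query it.
    """
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS comments (
                       id TEXT PRIMARY KEY,
                       subreddit TEXT,
                       created_utc INTEGER,
                       body TEXT)""")
    rows = ((o["id"], o["subreddit"], o["created_utc"], o["body"])
            for o in map(json.loads, ndjson_lines))
    # INSERT OR IGNORE makes re-runs over overlapping dumps idempotent.
    con.executemany("INSERT OR IGNORE INTO comments VALUES (?, ?, ?, ?)", rows)
    con.commit()
    return con
```

A single-file SQLite database like this is easy to query and to redistribute, which is part of the appeal of the proposal above.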

7 Comments
2024/03/23
01:13 UTC

0

Do you have to be a moderator to access data via Pushshift?

Do you have to be a subreddit moderator to gain access to Pushshift? This page, where you go if you want to request access, seems to imply that you need to be a moderator to get access to Pushshift. I'm not a moderator; I simply want to search particular subreddit posts and their comments for particular phrases I'm interested in. Thank you.

7 Comments
2024/03/22
19:01 UTC

3

Reddit dumps documentation

Hello, keeper and administrator of the cultural heritage of the internet.

I would like to use Reddit dumps from various subreddits for a university assignment on memes. Is there any documentation explaining what the different properties contained in the dumps mean?

Additional question: is there an explanation of how the dumps are scraped?

I would be very grateful if someone could provide me with further resources :)

3 Comments
2024/03/21
16:59 UTC

1

How do I find old user deleted comments?

I used to use unddit, since reveddit is useless: it doesn't show user deleted comments. I am sick and fucking tired of being right where the solution to my problem is, only to be greeted by fucking [deleted].

I just want to know the solution to this. Reveddit is fucking useless and I know no other working site. Anyone know what I could use?

https://old.reddit.com/r/uBlockOrigin/comments/o53xdk/how_do_i_block_a_file_by_name_on_any_domain/

Edit: I was able to find another post and figure out the rule is browser-detect-$script but the point still stands. I could've found this earlier if there were a website that just lets me view user deleted comments.

5 Comments
2024/03/18
10:47 UTC

2

Getting your API token?

I got approved to use Pushshift, but when I accept the terms it just takes me to a search page and doesn't give me an API token?

2 Comments
2024/03/18
02:05 UTC

3

How can I get data related to depression?

Dear Reddit community,

I am a young researcher and a new user of Reddit. I intend to do research concerning depression using text posts on Reddit. I require data from subreddits such as r/depression, r/depressed and so on. How can I get this data? Thank you for your help.

3 Comments
2024/03/17
12:10 UTC
