/r/datasets


A place to share, find, and discuss Datasets.

Datasets for Data Mining, Analytics and Knowledge Discovery

Rules

  • Try to post the original source whenever you can.
  • Low effort posts will be removed.
  • Self-promotion (of a website/domain you work for or own) without disclosure will be removed.
  • Any Paid Dataset or Resource must be marked as such in the title with [PAID].
  • Any Synthetic/Mock data must be marked as such in the title with [Synthetic].
  • All Survey posts are subject to approval. Message the mods before posting.

Unsure about your post?

Feel free to message the mods and discuss it before posting.

Related Subreddits

/r/datasets

191,009 Subscribers

1

Help with finding relational database particularly Oil & Gas related

Does anybody know a good source for relational databases/datasets for practising SQL? In the past I used https://relational.fit.cvut.cz but it's not working anymore.

0 Comments
2024/05/08
09:28 UTC

1

English - Klingon / Klingon - English dataset

Hi, I am working on an English to Klingon translator for my summer project. I am considering using a transformer model, so I would need a dataset where English phrases are translated to Klingon phrases, or vice versa. Do y'all know where I can find one? Thanks in advance!

1 Comment
2024/05/08
07:07 UTC

1

Labeled voice and text Quran dataset

Hello, I am working on a project and am in need of a Quran dataset with voice recordings labeled with their text. I would appreciate any help <3

1 Comment
2024/05/07
16:36 UTC

1

Renters Attributes and Default Rates

Hi reddit,

I'm planning on doing some analysis on renter default rates for residential dwelling units (apartments or houses). I'm hoping to find a dataset that contains fields such as income, credit score, ethnicity (optional), zip code, etc. (the more details the better) and whether or not the renter (or buyer) of a property defaulted on the property. I'm planning on running some ML models on this, so really the more attributes the better. Any leads will be greatly appreciated!

Thanks!

2 Comments
2024/05/07
23:55 UTC

1

Please help in finding healthcare dataset.

Hello.

Is there any open-source PubMed-like or cardionet-like dataset available?

Thanks.

1 Comment
2024/05/07
20:48 UTC

1

Does anyone have experience with FEMA data?

I really need to be connected with someone who has experience working with FEMA data, especially the 2023 FEMA National Household Survey (https://www.fema.gov/about/openfema/data-sets/national-household-survey). I have no idea what I am doing wrong; it took months to turn it to binary.

I really just need to talk to someone who has experience with this dataset. I have cleaned national data before, but nothing like this set. If anyone can help or connect me with someone, please let me know.

Has anyone ever emailed an agency like FEMA to be connected to someone who has used the dataset?

2 Comments
2024/05/07
18:38 UTC

2

Financial dataset for personal project

Can anyone please provide some good financial datasets for personal projects?

3 Comments
2024/05/07
13:35 UTC

3

How does one create a dataset to fine-tune an LLM based on existing txt files?

Hello, I'm struggling to transform data (CSV, TXT, etc.) into structured data suitable for fine-tuning my LLM. Are there any methods or guides available to help me automate this process?
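One common way to automate this is to split each text file into chunks and write one JSON object per line (JSONL). The following is a minimal sketch, not a definitive pipeline: the record fields (`source`, `chunk_id`, `text`), the 1000-character limit, and the function names are all assumptions for illustration, and the schema your fine-tuning tool expects may differ.

```python
import json
from pathlib import Path


def chunk_text(text, max_chars=1000):
    """Group paragraphs (separated by blank lines) into chunks of about max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def build_records(text, source, max_chars=1000):
    """Turn one document into a list of JSON-serializable records."""
    return [
        {"source": source, "chunk_id": i, "text": chunk}
        for i, chunk in enumerate(chunk_text(text, max_chars))
    ]


def write_jsonl(input_dir, output_path, max_chars=1000):
    """Write one JSON object per line for every .txt file in input_dir."""
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(input_dir).glob("*.txt")):
            text = path.read_text(encoding="utf-8")
            for record in build_records(text, path.name, max_chars):
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                count += 1
    return count
```

Calling `write_jsonl("my_texts", "train.jsonl")` (both paths hypothetical) would produce a file that most training scripts can read line by line. Instruction-style tuning would need prompt/response pairs instead of raw text chunks.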

2 Comments
2024/05/07
09:08 UTC

1

Anyone have experience with working with the NIS/HCUP Datasets in R?

Hi all, I'm trying to load NIS data into R since I don't have access to SAS/STATA/SPSS. They provide load programs for those but nothing for R, obviously. However, no matter what I try, I can't seem to load it into the program; I constantly get column mismatches. The file is several GB, so I can't open it in a text editor to view it. Anyone have experience with this?

The link to their load programs https://hcup-us.ahrq.gov/db/nation/sasloadprog.jsp?year=2016&db=NIS
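Column mismatches like this often mean a fixed-width file is being read as if it were delimited. The sketch below is in Python rather than R and assumes the data file is fixed-width ASCII, with each variable at a fixed column range taken from the load program. The `LAYOUT` entries are hypothetical placeholders, not the real NIS positions.

```python
from itertools import islice

# Hypothetical layout: (name, start, end) with 1-based inclusive positions.
# Replace these with the positions listed in the load program for your year.
LAYOUT = [
    ("AGE", 1, 3),
    ("DIED", 4, 5),
    ("KEY", 6, 15),
]


def parse_line(line, layout):
    """Slice one fixed-width record into a dict of stripped strings."""
    return {name: line[start - 1:end].strip() for name, start, end in layout}


def read_fixed_width(path, layout, limit=None):
    """Yield records lazily so a multi-GB file never has to fit in memory.

    With limit=None every line is read; a small limit lets you peek at
    the first few records without opening the file in an editor.
    """
    with open(path, "r", encoding="ascii", errors="replace") as f:
        for line in islice(f, limit):
            yield parse_line(line.rstrip("\r\n"), layout)
```

Something like `list(read_fixed_width("nis_core.asc", LAYOUT, limit=5))` (file name hypothetical) shows the first five records, which is a quick way to check that the column positions line up before loading the whole file.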

3 Comments
2024/05/07
00:36 UTC

1

Resume / CV dataset needed for project

Does anyone know a good place where I can find a large number of resumes or CVs? How should I go about finding them? Any help is appreciated.

3 Comments
2024/05/06
18:33 UTC

1

I can't for the life of me find historical peak UV index data!

I am no longer associated with a university library otherwise I would enlist the help of a librarian. There doesn't seem to be an easy way to get this info. We have searched the web up and down. Can anybody help?

1 Comment
2024/05/06
14:49 UTC

1

Bourbon dataset - does it exist in full form? I see a few whiskey databases out there that have bits and pieces

Is there a dataset that's got most of the following attributes?

  • mash bill

  • average rating

  • flavors

  • avg cost

  • produced by

  • how long was it aged

1 Comment
2024/05/06
09:45 UTC

0

Sales Forecasting for prediction of a product

What is the best data source for historical, UK-related sales data for sales forecasting?

0 Comments
2024/05/06
09:06 UTC

4

What are some companies that deal with "data for good"? (in the US preferably)

1 Comment
2024/05/05
23:28 UTC

1

Request: News Personalized Recommendation

I’m searching for a news dataset which contains personalized recommended news to users. So far, I found only 1 dataset :(

1 Comment
2024/05/05
17:23 UTC

1

Looking for indoor house plant sales dataset preferably over a few years and after 2020?

Can anyone help me find a dataset for indoor house plant sales that has genus information? This is for a school project. Looking to find trends and the popularity of various plant types over time.

3 Comments
2024/05/05
16:42 UTC

1

Datasets on Age-Related Macular Degeneration (AMD) Eye Disease

Hello, I'm doing an ML project for my 3rd academic year at university. For this I need images of "Age-Related Macular Degeneration (AMD) Eye Disease" in 3 categories.

Normal
Wet
Dry

I have enough images for the Normal condition, but I can't find enough data for the Wet and Dry conditions. I need at least 1000 images per category. Does anyone know where to find datasets for this specific eye disease?

1 Comment
2024/05/05
16:06 UTC

2

request: dataset on El Salvador monthly gang-related homicides and El Salvador monthly gang incarcerations from 2019 onwards

I want two separate datasets. I'm analyzing the effectiveness of the gang crackdown for a data assignment. Thanks.

1 Comment
2024/05/04
20:31 UTC

3

request: dataset of 80s movies with information on smoking, drugs, etc. (like found on commonsensemedia)

Hello. I'm taking a data science course in Python. To practice classification, I wanted to take movies from the 80s from before and after the PG-13 rating came into effect. The idea is to use the movies released after the PG-13 rating was in effect to create a model, then reclassify the earlier movies and see which ones that were rated PG would have been PG-13. I tried https://www.commonsensemedia.org/ as it has 5-star ratings for things like drinking, swearing, drugs, nudity, etc. However, the number of 80s movies seems to be limited to the ones that are still popular/watched (not surprisingly). Are there any datasets out there that have a lot of 80s movies with this info?

4 Comments
2024/05/04
14:15 UTC

2

What is the best commercial health insurance dataset that contains remittances?

Pretty much what the title says. Any dataset that contains ERAs.

1 Comment
2024/05/04
07:17 UTC

2

A particular dataset I want, on drug policy, can only be accessed by those with a British University email address. I would be extremely grateful if someone could get it for me!

A quick request that I would be very grateful if someone could fulfill. A particular dataset I want, the one on drug policy voices, can only be accessed by those with a British University email address. I would be extremely grateful if someone could get it for me!

The dataset can be found here:

https://reshare.ukdataservice.ac.uk/856279/

It's concerned with the political beliefs of drug users in the UK.

If you manage to get it, let me know: DM me, or say so in the comments and I'll DM you.

Thank you!

2 Comments
2024/05/04
05:34 UTC

4

Recommendations for beginner friendly dataset for learning R

Hello! I am learning R and I need a dataset to practice doing regression. I wanted to use data from IPUMS but it is not loading properly and now I don’t want to lose any more time playing with it. Can anyone suggest any social science datasets in R that are easy to work with? I’m interested in inequality but any topic is probably okay. In class we used Boston Housing so probably not that exact one, but something similarly beginner friendly would be good. Thanks in advance for any suggestions!

2 Comments
2024/05/04
01:52 UTC

2

How important is demonstrating that you know JOINs in your data analyst portfolio for entry level roles? What is the best approach to showcase this knowledge?

Hi guys,

My typical approach when creating portfolio projects is finding a public dataset online (most of which are already cleaned and ready to go). I then come up with specific problems I would like to investigate. I write SQL queries to solve these problems. I then visualize the solutions on a Tableau dashboard to tell a story.

Every job is different, but I assume that most will require you to join multiple tables together prior to analysis. The issue I've come across during portfolio creation is that most datasets that are publicly available online are already put together.

I've come up with the idea of finding two completely unrelated datasets and trying to join them together on a common column, but I completely struggle with execution due to the complexity of the datasets and a common column not always being available. Ex: Amazon package delivery speeds vs. weather, joining on dates.

I know what joins are and can solve easy to maybe medium SQL LeetCode join questions without much difficulty, but I completely struggle with the hard problems as well as my scenario in the previous paragraph. So, a few questions:

  1. How important is demonstrating that you know joins in a data analyst portfolio for entry level roles? That is, showing the SQL code that joins 2+ tables and doing your analysis on that?

  2. If it is needed, how can I demonstrate this? I struggle with joining two completely unrelated datasets together. Is there a better way to do this while still showing that I know joins, or should I just keep on doing analysis on fully completed datasets that are already available online?

Thanks so much, I greatly appreciate any advice I can get in regards to this!! Located in a big city in the Midwest, USA, btw.
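The deliveries-versus-weather idea from the post can be sketched as a join on a shared date column. This is a minimal illustration using Python's built-in `sqlite3` module; the table names, column names, and rows are all made up for the example and do not come from any real dataset.

```python
import sqlite3

# In-memory database with two small, invented tables that share a date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deliveries (delivery_date TEXT, package_id INTEGER, hours_to_deliver REAL);
CREATE TABLE weather (obs_date TEXT, precipitation_mm REAL);
INSERT INTO deliveries VALUES
  ('2024-01-01', 1, 20.0),
  ('2024-01-01', 2, 28.0),
  ('2024-01-02', 3, 40.0),
  ('2024-01-03', 4, 22.0);
INSERT INTO weather VALUES
  ('2024-01-01', 0.0),
  ('2024-01-02', 12.5);
""")

# LEFT JOIN keeps every delivery date, even when no weather row matches,
# so missing weather shows up as NULL instead of silently dropping rows.
rows = conn.execute("""
SELECT d.delivery_date,
       AVG(d.hours_to_deliver) AS avg_hours,
       w.precipitation_mm
FROM deliveries AS d
LEFT JOIN weather AS w ON w.obs_date = d.delivery_date
GROUP BY d.delivery_date, w.precipitation_mm
ORDER BY d.delivery_date
""").fetchall()

for row in rows:
    print(row)
```

Storing both date columns in the same ISO `YYYY-MM-DD` text format is what makes the equality join work here. With real data, normalizing the two date columns to one format is usually the first step.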

0 Comments
2024/05/03
22:02 UTC

0

Women's Health Clinic or Center patient data?

Howdy folks,

I was wondering if someone might have an example patient dataset from a women's health clinic or center?

I'm interviewing for an org that specializes in customer acquisition for women's health clinics and trying to find any example datasets to build out a portfolio. I know customer acquisition is a bit different than the patient care here, but I'd still like to show I could transform this type of data for operations.

I looked on Kaggle and didn't see anything pertaining to this exactly. Maybe some type of clinic data, but not any focused on women in particular.

If you know of anything that might fit, please let me know.

Thank you.

2 Comments
2024/05/03
11:54 UTC

1

HELP!!! NEED DATASET FOR NETWORK ANALYSIS

My final paper is on binge drinking in college and I need data to perform a network analysis.

I need a dataset of the top 2,000 tweets and related network node and edge data points relating to #alcohol, and another one for #party (or any other # that could relate to this topic). Please, I am literally begging.

2 Comments
2024/05/03
02:05 UTC

1

Dataset on global plants and native area

I'm looking for a dataset connecting global native plants with their natural locations (countries, regions, cities, etc). I've found a few datasets that don't have locations, but cover tons of plants!

Any other datasets you all have used? Thanks!

2 Comments
2024/05/02
23:56 UTC

0

HELP FOR MY STATA PROJECT (FINDING DATASETS)

Hi guys, I would like to ask for some information about datasets for Stata. Does someone know where I can download a dta file or an Excel file in order to do a project? Official data would be better. I was searching in particular for health data, such as drug abuse and the use of drugs in medicine. Otherwise, I'm looking for anything that is interesting, as long as it makes the professor evaluate the project well! Thanks in advance.

4 Comments
2024/05/02
22:19 UTC

8

Complete Dataset of Bluesky posts and interactions

https://zenodo.org/doi/10.5281/zenodo.11082878

This dataset contains the full collection of posts from 80% of Bluesky accounts up to March 2024. Features 235M posts from 4M users spanning over a year. Also comes with interaction data (follows, replies, reposts, likes, etc.).

0 Comments
2024/05/02
18:51 UTC
