/r/datasets


A place to share, find, and discuss Datasets.

Datasets for Data Mining, Analytics and Knowledge Discovery

Rules

  • Try to post original source whenever you can.
  • Low effort posts will be removed.
  • Self-promotion (of a website/domain you work for or own) without disclosure will be removed.
  • Any Paid Dataset or Resource must be marked as such in the title with [PAID].
  • Any Synthetic/Mock data must be marked as such in the title with [Synthetic].
  • All Survey posts are subject to approval. Message the mods before posting.

Unsure about your post?

Feel free to message the mods and discuss it before posting.


197,731 Subscribers

1

🌟 Open Investment Datasets: Free and Growing on GitHub/Huggingface

Hey r/datasets community!

I’m thrilled to share an exciting new resource for all you data enthusiasts, researchers, and finance aficionados out there. https://github.com/sovai-research/open-investment-datasets

🔍 What’s New?

Sov.ai has just launched the Open Investment Data Initiative! We’re building the industry’s first open-source investment datasets tailored for rigorous research and innovative projects. Whether you're into AI, ML, quantitative finance, or just love diving deep into financial data, this is for you.

📅 Free Access with a 6-Month Lag

All our 20 datasets will be available for free with a 6-month lag for non-commercial research purposes. This means you can access high-quality, ticker-linked data without breaking the bank. For commercial use, we offer a subscription plan that makes premium data affordable (more on that below).

📈 What We Offer

By the end of 2026, Sov.ai aims to provide 100+ investment datasets, including but not limited to:

  • 📰 News Sentiment: Ticker-matched and theme-matched sentiment analysis from various news sources.
  • 📈 Price Breakout Predictions: Daily updates predicting upward price movements for US equities.
  • 🔍 Insider Flow Prediction: Over 60 insider trading features ideal for machine learning models.
  • 💼 Institutional Trading: In-depth analysis of institutional investment behaviors and strategies.
  • 📢 Lobbying Data: Detailed data on corporate lobbying activities, linked to specific tickers.
  • 💊 Pharma Clinical Trials: Unique dataset tagging clinical trials with predicted success outcomes.
  • ⚠️ Corporate Risks: Bankruptcy predictions (Chapter 7 & 11) for over 13,000 US publicly traded stocks.
  • ...and many more!

🤝 Get Involved!

We’re looking for firms and individuals to join us as co-architects or sponsors on this journey. Your support can help us expand our offerings and maintain the quality of our data. Interested? Reach out to us here or connect via our LinkedIn, GitHub, and Hugging Face profiles.

🧪 Example Use Cases

Here’s how easy it is to get started with our datasets using the Hugging Face datasets library:

from datasets import load_dataset

# Example: Load News Sentiment Dataset

df_news_sentiment = load_dataset("sovai/news_sentiment", split="train").to_pandas()

# Example: Load Price Breakout Dataset

df_price_breakout = load_dataset("sovai/price_breakout", split="train").to_pandas()

# Add more datasets as needed...

1 Comment
2024/11/09
16:02 UTC

3

Does anyone have a dataset of plots (bar, scatter, hist, etc.) paired with each plot's description?

I know this has a very low chance of existing, but I need it. Has anyone seen a dataset like this, with a plot column and a description (insights) of the plot? I have only found datasets with plots but no descriptions.

1 Comment
2024/11/09
11:32 UTC

1

Looking for Thyroid scan image dataset

Hi, I am a master's student, and for my final dissertation I am looking for a thyroid scan image dataset to detect types of hyperthyroidism. Kindly help if you have any clue as to where I can find one. It would be really great if you could share the dataset.

Thank you

0 Comments
2024/11/09
05:11 UTC

1

Need help extracting the NIHSS from the MIMIC-III Dataset

Hey guys, I am currently working on a project about the use of machine learning for stroke rehabilitation, and I want to extract information, like the NIHSS score, from medical datasets. I found an article where someone already did that and even provides the code on GitHub. But my problem is, I don't know where to insert the MIMIC-III dataset (I already have it), which consists of several .csv files, into the code so that it runs correctly. There is no README or any file that explains how to run the code or prepare the dataset. Maybe someone has done this or can help me with it.

Link to the Article: https://physionet.org/content/stroke-scale-mimic-iii/1.0.0/

Link to the Github repo: https://github.com/huangxiaoshuo/NIHSS_IE

(Sorry for the bad language; I am not a native English speaker.)
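I'm not familiar with that particular repo, but pipelines like the one in the article generally start by loading MIMIC-III's NOTEEVENTS.csv (the table holding free-text clinical notes) and searching the note text for NIHSS mentions. A rough, self-contained sketch of that first step, using a made-up two-row sample in place of the real file (the regex is a crude illustration, not the paper's method):

```python
import csv
import io
import re

# Made-up sample standing in for MIMIC-III's NOTEEVENTS.csv
# (the real file has more columns: ROW_ID, SUBJECT_ID, TEXT, ...).
sample = io.StringIO(
    "ROW_ID,TEXT\n"
    '1,"Patient admitted with stroke. NIHSS score of 14 on arrival."\n'
    '2,"Follow-up visit, no acute findings."\n'
)

# Crude pattern for NIHSS mentions; a real extractor needs many more variants.
pattern = re.compile(r"NIHSS\D{0,20}?(\d{1,2})", re.IGNORECASE)

scores = {}
for row in csv.DictReader(sample):
    match = pattern.search(row["TEXT"])
    if match:
        scores[row["ROW_ID"]] = int(match.group(1))
# scores maps note ROW_ID -> extracted NIHSS value
```

To run this against the real dataset, you would point `csv.DictReader` at the actual NOTEEVENTS.csv path instead of the in-memory sample.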

1 Comment
2024/11/08
15:11 UTC

10

Scraped Every Parcel in the United States

Hey everyone, my co-worker and I are software engineers, and we were working on a side project that required parcel data for all of the United States. We quickly saw that it was super expensive to get access to this data, so we naively thought we would scrape it ourselves over the next month. Well, anyways, here we are 10 months later. We created an API so other people could have access to it much cheaper. I would love for you all to check it out: https://www.realie.ai/data-api. There is a free tier, and you can pull 500 records per call, meaning you should still be able to get quite a bit of data to review. If you need a higher limit, message me for a promo code.

Would love any feedback, so we can make it better for people needing this property data. Also happy to transfer it to an S3 bucket for anyone working on projects that require access to the whole dataset.

Our next challenge is making these scripts run automatically every month without breaking the bank. We are thinking Azure Functions? Would love any input if people have other suggestions. Thanks!

6 Comments
2024/11/08
13:46 UTC

53

I scraped every band in metal archives

I've been scraping most of the data on the metal-archives website for the past week. I extracted 180k entries' worth of metal bands, their labels, and soon the discographies of each band. Let me know what you think and if there's anything I can improve.

https://www.kaggle.com/datasets/guimacrlh/every-metal-archives-band-october-2024/data?select=metal_bands_roster.csv

EDIT: updated with a new file including every band's discography

48 Comments
2024/11/08
13:28 UTC

1

California laws and statutes in a downloadable format?

Before I try to figure out how to write a scraper for https://leginfo.legislature.ca.gov/faces/codes.xhtml, I wanted to see if there is any downloadable dataset that includes California statutes (I really only need the Penal and Evidence Codes). I'd prefer PDF, but I'll take anything.

0 Comments
2024/11/08
11:18 UTC

2

Please help me find a lost dataset - DISCO-10M

DISCO-10M was removed by Hugging Face and wiped from the internet. I cannot find any site that still has it other than https://www.atyun.com/datasets/info/DISCOX/DISCO-10M.html?lang=en

which I can't sign up for; I've tried a US number, a UK number, and a Chinese number.

I'm desperate, y'all. Please help, or DM me if you have anything.

0 Comments
2024/11/08
05:11 UTC

2

autolabel tool for labelling your dataset!

Hi guys, I've made this cool thing! Go check it out!

https://github.com/leocalle-swag/autolabel-tool

0 Comments
2024/11/07
22:39 UTC

3

[self-promotion] Giving back to the community! Free web data!

Hey guys,

I've built an AI tool to help people extract data from the web. I need to test my tool and learn more about the different use cases that people have, so I'm willing to extract web data for free for anyone that wants it!

6 Comments
2024/11/07
16:25 UTC

1

Dataset with financial news (articles with headlines and full text incorporated)

As the title says.

0 Comments
2024/11/07
15:02 UTC

5

2024 county-level presidential election results

Anybody aware of public county-level 2024 presidential election results datasets, downloadable as CSV or accessible via free API? I'm specifically looking for total number of votes by county for each party.

2 Comments
2024/11/07
13:39 UTC

4

Looking for a dataset of hormonal imbalance in women

Hi everyone, I am searching for a dataset about hormonal imbalance in women for a project. The dataset may or should contain physical symptoms, age, height, weight, BMI, food habits, hormonal test results, and other clinical features. Thanks in advance.

1 Comment
2024/11/07
13:01 UTC

1

Returns to education across different countries

I am still trying to understand how to find proper datasets; every time I need to look for something, I am lost. Any help is highly appreciated! Thank you in advance.

1 Comment
2024/11/07
10:17 UTC

1

PD-Weighted Cardiac MR or Cardiac MR Phantom Images

I'm working on a small project to demonstrate the effects of T1 and T2 weighting on a PD-weighted image or a phantom image.

For example, I aim to recreate a T1 contrast between tissues on a PD image of the heart following the signal equation for MRI.

I've been searching for example pictures but haven't had much luck. I've tried resources like the Cardiac Atlas Project, open-access papers, raw K-space data, and phantom images.

Does anyone have suggestions on where I might find what I need?

0 Comments
2024/11/07
10:02 UTC

1

AI-Chat Dataset's (Previous Context)

I've been learning how to fine-tune locally and wanted to create a dataset from the conversations I had with LLMs like GPT and Claude. I know that datasets usually have an input-output format with some variation of metadata and instructions along with it, but how does one actually fine-tune on data that requires previous context?

Let's say my chat initially would go something like this:

Input: What is a bird?

Output: A bird is...

Input: Why do they fly?

Output: They fly because...

In this context the AI knows what I am referring to based on my previous input. But how would I represent the previous context in a dataset? The issue is that if I just include "Why do they fly?" as an isolated input, the model wouldn't have the context about birds from the previous exchange, and would have to assume the input "Why do they fly?" is generally about birds (possibly ignoring that the user could be referring to a plane, etc.).

I initially combined the previous output and the current input together, but I feel like that method would only train the model to expect the previous output to be included with the input in order to get the current output. Another method was to nest the conversation across multiple input-output pairs, but that wouldn't be scalable, since some of my conversations span 50 chats.

Is there a more efficient way for me to handle a dataset that uses previous context? The model I would be training for now is Llama 3.1 8B, as it is small enough to train fast and test whether this dataset approach is beneficial.
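One common approach (used by most chat fine-tuning pipelines, and a reasonable starting point here) is to store each conversation as a list of role-tagged messages, then unroll it into one training sample per assistant turn, where the input is the full message history up to that turn. A minimal sketch in Python; the plain `role: content` formatting and field names are placeholders, not the actual Llama 3.1 chat template:

```python
# Sketch: unroll a multi-turn conversation into per-turn training samples,
# each carrying the full prior context. Field names are illustrative.
conversation = [
    {"role": "user", "content": "What is a bird?"},
    {"role": "assistant", "content": "A bird is..."},
    {"role": "user", "content": "Why do they fly?"},
    {"role": "assistant", "content": "They fly because..."},
]

def unroll(messages):
    """Yield one {input, output} sample per assistant turn."""
    samples = []
    for i, msg in enumerate(messages):
        if msg["role"] == "assistant":
            # Everything before this turn becomes the model's input.
            context = "\n".join(
                f"{m['role']}: {m['content']}" for m in messages[:i]
            )
            samples.append({"input": context, "output": msg["content"]})
    return samples

samples = unroll(conversation)
```

With this unrolling, the second sample's input contains the whole bird exchange, so "Why do they fly?" is no longer ambiguous; a 50-turn conversation simply yields 50 progressively longer samples, and the trainer's loss masking (rather than the dataset layout) handles only scoring the final response.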

0 Comments
2024/11/06
22:09 UTC

3

[Self-Promotion] [Open Source] Luxxify: Ulta Makeup Reviews

Luxxify: Ulta Makeup Reviews

Hey everyone,

I recently released an open-source dataset containing Ulta makeup products and their corresponding reviews!

Custom Created Kaggle Dataset via Webscraping: Luxxify: Ulta Makeup Reviews

Feel free to use the dataset I created for your own projects!

Webscraping Process

  • Web Scraping: Product and review data are scraped from Ulta, a popular e-commerce site for cosmetics. This raw data serves as the foundation for a robust recommendation engine, with a custom scraper built using requests, Selenium, and BeautifulSoup4. Selenium was used to perform button-click and scroll interactions on the Ulta site to dynamically load data. I then used requests to access specific URLs found in XHR GET requests. Finally, I used BeautifulSoup4 for scraping static text data.
  • Leveraging PostgreSQL UDFs for Feature Extraction: For data management, I chose PostgreSQL so that I could clean the scraped data from Ulta. This data was originally stored in complex JSON, which needed to be unrolled in Postgres.
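As a rough illustration of the unrolling step described above (the field names here are hypothetical, not the actual Ulta schema), flattening nested review JSON into tabular rows in Python might look like:

```python
import json

# Hypothetical nested JSON resembling scraped product/review data;
# the real Ulta schema differs.
raw = json.loads("""
{
  "product": "Matte Lipstick",
  "reviews": [
    {"rating": 5, "text": "Love it"},
    {"rating": 3, "text": "Dries fast"}
  ]
}
""")

# Unroll: one flat row per review, each carrying the product name,
# mirroring what the PostgreSQL UDFs do at scale.
rows = [
    {"product": raw["product"], "rating": r["rating"], "text": r["text"]}
    for r in raw["reviews"]
]
```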

As an example, I made a recommender model using this dataset which benefited greatly from its richness and diversity.

To use the Luxxify Makeup Recommender click on this link: https://luxxify.streamlit.app/

I'd greatly appreciate any suggestions and feedback :)

Link to GitHub Repo

1 Comment
2024/11/06
17:27 UTC

3

Created 24 Interesting Dataset Challenges for December (SQL Advent Calendar) 🎁

Hey data folks! I've put together an advent calendar of SQL challenges that might interest anyone who enjoys exploring and manipulating datasets with SQL.

Each day features a different Christmas themed dataset with an interesting problem to solve (all the data is synthetic).

The challenges focus on different ways to analyze and transform these datasets using SQL. For example, finding unusual patterns, calculating rolling averages, or discovering hidden relationships in the data.

While the problems use synthetic data, I tried to create interesting scenarios that reflect real-world data analysis situations.
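As a taste of the rolling-average style of problem mentioned above (table and column names invented for illustration, not taken from the calendar), here is a SQL window function run through Python's built-in sqlite3:

```python
import sqlite3

# Hypothetical daily-sales table; SQLite 3.25+ supports window functions.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (day INTEGER, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)],
)

# 3-day rolling average of sales, ordered by day.
rows = con.execute("""
    SELECT day,
           AVG(amount) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rolling_avg
    FROM sales
    ORDER BY day
""").fetchall()
# rows -> [(1, 10.0), (2, 15.0), (3, 20.0), (4, 30.0)]
```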

Starting December 1st at adventofsql.com - (totally free) and you're welcome to use the included datasets for your own projects.

I'd love to hear what kinds of problems you find most interesting to work on, or if you have suggestions for interesting data scenarios!

1 Comment
2024/11/05
17:55 UTC

4

Looking for jokester Datasets to train my LLMs to be funny

As the title suggests, I'm looking for funny datasets, like one containing only puns.

I'm also interested in character-trait-specific humor, such as a dataset filled with funny and outrageous conspiracy theories or self-deprecating, dark humor.

Any humorous datasets that could turn an LLM into a joke machine are welcome!

0 Comments
2024/11/05
09:48 UTC

3

Looking for DISCO-10M: A Large-Scale Music Dataset

Hi everyone,

I'm looking for DISCO-10M: A Large-Scale Music Dataset. It was previously available through Hugging Face, but it is not there anymore. Does anyone have a copy they can share?

2 Comments
2024/11/05
09:34 UTC

1

[self-promotion] Introducing SymptomCheck Bench: An Open-Source Benchmark for Testing Diagnostic Accuracy of Medical LLM Agents

Hi everyone! I wanted to share a benchmark we developed for testing our LLM-based symptom checker app. We built this because existing static benchmarks (like MedQA, PubMedQA) didn’t fully capture the real-world utility of our app. With no suitable benchmark available, we created our own and are open-sourcing it in the spirit of transparency.

GitHub: https://github.com/medaks/symptomcheck-bench

Quick Summary: 

We call it SymptomCheck Bench because it tests the core functionality of symptom checker apps—extracting symptoms through text-based conversations and generating possible diagnoses. It's designed to evaluate how well an LLM-based agent can perform this task in a simulated setting.

The benchmark has three main components:

  1. Patient Simulator: Responds to agent questions based on clinical vignettes.
  2. Symptom Checker Agent: Gathers information (limited to 12 questions) to form a diagnosis.
  3. Evaluator Agent: Compares the symptom checker's diagnoses against the ground-truth diagnosis.

Key Features:

  • 400 clinical vignettes from a study comparing commercial symptom checkers.
  • Multiple LLM support (GPT series, Mistral, Claude, DeepSeek)
  • Auto-evaluation system validated against human medical experts

We know it's not perfect, but we believe it's a step in the right direction for more realistic medical AI evaluation. Would love to hear your thoughts and suggestions for improvement!

0 Comments
2024/11/05
09:25 UTC

1

[Request] Working on a project for Underwater Human body detection for rescue missions.

Hello everyone,

I’m working on an image segmentation project aimed at aiding rescue missions by detecting human bodies in underwater crash site images. Specifically, the goal is to identify and segment human figures from underwater images, which could be instrumental in emergency response and recovery operations.

I’m reaching out to see if anyone has, or knows of, a dataset that includes underwater human imagery, especially from crash sites or similar scenarios. Ideally, the dataset would contain varied conditions like different lighting, depths, and visibility to better simulate real-world underwater environments.

If such a dataset isn’t readily available, any resources, advice on data collection, or possible collaboration opportunities to create one would be greatly appreciated! I’m open to any suggestions, as I understand this is a unique and challenging request.

Thank you in advance for any help you can provide!

1 Comment
2024/11/04
18:06 UTC

1

Hi all, looking to find some data on not for profit vs for profit hospital performance Any help is greatly appreciated!

This is for a university project. Thus far I've tried GuideStar, the American Hospital Directory, CMS, and more, to no avail. I am really struggling to obtain any data but am passionate about this topic (and unfamiliar with datasets, lol). I'm looking for financials and/or patient outcomes. I would really appreciate anything!

1 Comment
2024/11/04
18:47 UTC

3

Looking for Datasets on Soil Characteristics for Farming and Water Consumption in Agriculture/Industry/Home Use

Hi everyone,

I’m working on a project that requires datasets related to two areas:

1.	Soil characteristics: I need data on soil and whether it is suitable for farming.
2.	Water consumption: Datasets that track water usage, ideally in agriculture, industrial settings, or residential homes. Information on seasonal or regional usage trends would be especially helpful.

If anyone knows where I could find reliable datasets for these, or if you’ve come across anything similar in your own work, I’d really appreciate your guidance. Thanks in advance for any recommendations or resources!

0 Comments
2024/11/04
20:40 UTC

3

[self-promotion] Open synthetic dataset and fine-tuned models from Gretel.ai for PII/PHI detection across diverse data types on Huggingface

Detect PII and PHI with Gretel's latest synthetic dataset and fine-tuned NER models 🚀:
- 50k train / 5k validation / 5k test examples
- 40 PII/PHI types
- Diverse real world industry contexts
- Apache 2.0

Dataset: https://huggingface.co/datasets/gretelai/gretel-pii-masking-en-v1
Fine-tuned GliNER PII/PHI models: https://huggingface.co/gretelai/gretel-gliner-bi-large-v1.0
Blog / docs: https://gretel.ai/blog/gliner-models-for-pii-detection

0 Comments
2024/11/04
18:19 UTC

1

[Dataset] Introducing K2Q: A Diverse Prompt-Response Dataset for Information Extraction from Documents

Hey r/Datasets! We’re excited to announce K2Q, a newly curated dataset collection for anyone working with visually rich documents and large language models (LLMs) in document understanding. If you want to push the boundaries on how models handle complex, natural prompt-response queries, K2Q could be the dataset you've been looking for! The paper can be found here and is accepted to the Empirical Methods in Natural Language Processing (EMNLP) Conference.

What’s K2Q All About?

As LLMs continue to expand into document understanding, the need for prompt-based datasets is growing fast. Most existing datasets rely on basic templates like "What is the value for {key}?", which don’t fully reflect the varied, nuanced questions encountered in real-world use. K2Q steps in to fill this gap by:

  • Converting five Key Information Extraction (KIE) datasets into a diverse, prompt-response format with multi-entity, extractive, and boolean questions.
  • Using bespoke templates that better capture the types of prompts LLMs face in real applications.
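To make the template idea above concrete (a hypothetical sketch, not K2Q's actual templates or schema), converting a KIE key-value annotation into varied prompt-response pairs might look like:

```python
import random

# Hypothetical KIE annotation: extracted key-value pairs from one document.
annotation = {"invoice_number": "INV-1042", "total": "$318.50"}

# A few diverse templates, instead of only "What is the value for {key}?"
TEMPLATES = [
    "What is the value for {key}?",
    "Find the {key} mentioned in this document.",
    "Does the document state a {key}? If so, what is it?",
]

def to_prompt_response(annotation, rng):
    """Emit one (prompt, response) pair per annotated key."""
    pairs = []
    for key, value in annotation.items():
        template = rng.choice(TEMPLATES)
        pairs.append({"prompt": template.format(key=key.replace("_", " ")),
                      "response": value})
    return pairs

pairs = to_prompt_response(annotation, random.Random(0))
```

The same annotation then trains the model on several phrasings of the same extraction task, which is the diversity effect the benchmark results above attribute to K2Q.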

Why Use K2Q?

Our empirical studies on generative models show that K2Q’s diversity significantly boosts model robustness and performance compared to simpler, template-based datasets.

Who Can Benefit from K2Q?

Researchers and practitioners can use K2Q to:

  • Test zero-shot or fine-tuned models with realistic, challenging questions.
  • Improve model performance on KIE tasks through diverse prompt-response training.
  • Contribute to future studies on data quality for generative model training.

📄 Dataset & Paper: K2Q will be presented at the Findings of EMNLP, so feel free to dive into our paper for in-depth analyses and results! We’d love to see K2Q inspire your own projects and findings in Document AI.

2 Comments
2024/11/04
17:29 UTC

2

Looking for a dataset: Time-series (monthly/weekly/daily) sales dataset of at least 3 years with a minimum of 10 different products.

Hi all,

As the title describes, I am looking for a time-series sales dataset covering at least 3 years with a minimum of 10 different products. The dataset should be monthly, weekly, or daily.

Can someone recommend one? I am really struggling to find one on Kaggle.

Hope you guys can help me out!!

1 Comment
2024/11/04
13:16 UTC

2

Gene Dependency scores for 17300 normal tissue samples

0 Comments
2024/11/03
21:51 UTC

2

[Research] Mushroom Observer Dataset

Hi,
Has anyone used the Mushroom Observer dataset for image classification? Unless I'm getting something badly wrong, the files all reference image IDs but do not supply the images.
I think the images can be gathered through the API using the image ID, but they do not want you to scrape them this way.
Does anyone have any experience working with it? It's for an image classification application.

2 Comments
2024/11/03
14:02 UTC

6

[Vanityfair] advertisements published in each issue from 1913 to 2024

Ads data from Vanity Fair magazine issues published from 1913 to November 2024.

Data Format:

    {
      [year]: {
        year: "1913",
        issues: [{
          id: "issue's month",
          ads: [{
            articleKey: "articleKey",
            issueKey: "issueKey",
            title: "Ad title",
            slug: "ad-slug",
            coverDate: "coverDate",
            pageRange: "page number on which the ad was published",
            wordCount: "word count"
          }]
        }]
      }
    }

Link: Google Drive

NOTE: VF was shut down in 1936 and relaunched in 1983, so data for the in-between years isn't available.
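A small sketch of walking this structure in Python (key names taken from the format above; the sample dict and its contents are illustrative, not real values from the file):

```python
# Illustrative sample following the year -> issues -> ads format above.
data = {
    "1913": {
        "year": "1913",
        "issues": [
            {"id": "January", "ads": [
                {"title": "Ad title", "slug": "ad-slug",
                 "pageRange": "12", "wordCount": "85"},
            ]},
        ],
    },
}

# Count ads per year across all issues of that year.
ads_per_year = {
    year: sum(len(issue["ads"]) for issue in entry["issues"])
    for year, entry in data.items()
}
```

For the real file, you would `json.load` the download from the Drive link and run the same comprehension over it.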

0 Comments
2024/11/02
19:51 UTC

Back To Top