/r/opendata

11,607 Subscribers

2

Any GovTech folks here? I was at the OpenData meetup in DC last week and I'm curious whether anyone from that world is active here.

Just looking to see if this is an active govdata community or just opendata.

1 Comment
2024/04/30
02:49 UTC

3

Looking for an open source platform to host and share datasets elegantly (and easier than CKAN!)

Hi guys!

I spent quite a few hours today trying to get CKAN set up (both via Kubernetes clustering and via a "simple" Docker deployment).

I eventually got the AWS Marketplace image working, but I found the installation process cumbersome (and the documentation suggests it's not much easier to run).

I'm sure it's a great and very powerful tool for governments wishing to share data, but it seems too hard and too "enterprise scale" for my objectives.

Here's what I'm doing:

I'm hoping to create an open access data portal specific to impact investment, a form of finance that tries to integrate sustainability objectives.

I'm thinking, in terms of functionalities:

- Aggregating various open access datasets into one place

- Sharing my edited versions of these source datasets (mostly CSV, JSON)

- It would also be nice to be able to embed and share live data (and perhaps even host a sandbox for connecting to a read-only PostgreSQL DB), but those are "nice to haves" rather than essential features

Right now I'm updating a GitHub repository, and I was sure that there was something like a CMS that could make the process of sharing datasets more attractive.

This is related to my job, but ultimately it's a not-for-profit venture that I'd be bootstrapping. So while I can spin up a VPS for hosting, I'm looking to keep costs reasonable.

TIA for any recommendations!

3 Comments
2024/04/16
02:09 UTC

1

Land concentration in Israel?

Does anyone have any sources about the concentration of land in Israel?

Interested in things like what percent of land value or land area is under control of the largest or wealthiest landholders, maybe split by things like "desert" vs "non-desert", use (like agricultural vs residential vs other) or institution (like individual/business/government).

(I say "concentration of land" rather than "concentration of land ownership" since I think most Israeli land is leased from the government.)

0 Comments
2024/03/26
21:48 UTC

0

Dateno - a new dataset search engine

0 Comments
2024/03/13
14:56 UTC

3

Looking for a database with logical word combinations

Hello,

I am looking for a free data set that represents logical word pairs in the following form. Examples:

  • tree + tree -> forest
  • water + fire -> vapour
  • vapour + steam -> cloud
  • water + water -> river
  • river + river -> ocean
  • king + queen -> Princes

Background: I want to develop a logical game for children in primary school so that they think about the words and then create new words with meaning. I think that there might be such a database with logical connections. Could anyone give me a tip or hint, please?

Thank you very much in advance!
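For what it's worth, the form described above could be stored as a lookup keyed by an unordered pair of words. This is only a minimal sketch: the names `RAW_PAIRS` and `combine` are made up for illustration, and the entries are copied from the examples in the post rather than from any existing dataset.

```python
# Hypothetical structure for "word + word -> word" pairs; entries come from the post.
RAW_PAIRS = [
    ("tree", "tree", "forest"),
    ("water", "fire", "vapour"),
    ("river", "river", "ocean"),
]

# Sort each pair so that "water + fire" and "fire + water" share one key.
COMBINATIONS = {tuple(sorted((a, b))): result for a, b, result in RAW_PAIRS}


def combine(a, b):
    """Return the combined word, or None if the pair is not in the table."""
    key = tuple(sorted((a.lower(), b.lower())))
    return COMBINATIONS.get(key)


print(combine("fire", "water"))
print(combine("tree", "tree"))
```

Sorting the pair makes the lookup independent of the order in which a child picks the two words.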

4 Comments
2024/03/09
16:40 UTC

1

Snowfall and snow depth information for a list of US coordinate pairs?

  • I'm working on a low-stakes party game-type thing for my friends, not trying to predict weather or guide big decisions or anything.
  • I'm looking for a good-ish way of showing historical snowfall and/or snow depth information for a list of locations in the US (latitude/longitude).
  • I assume that snow can vary a lot even within a small area, like near a big lake or something. But I'm okay with approximating using data from nearby without adjustments, even if that leads to a lot of inaccuracy. I also don't really care that much about the timeframe for the historical data; I just want information from a long-ish period ending recently-ish.
  • Do you have thoughts on what I should do?

My thoughts so far are either

  • use USA.com to get information by county, and use that;
  • use NCDC data by station, and use the snow data from one or more stations near each of my locations.

Do those make sense? Is there something better?
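The second option (nearest stations) could be sketched roughly like this. It assumes the station list has already been loaded into dictionaries with `lat`, `lon`, and a snowfall figure; the station IDs, coordinates, and snowfall numbers below are made-up placeholders, not real NCDC records, and the field names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def nearest_stations(lat, lon, stations, k=1):
    """Return the k stations closest to (lat, lon)."""
    return sorted(stations, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))[:k]


def approx_snowfall(lat, lon, stations, k=2):
    """Unadjusted average of the k nearest stations' snowfall values."""
    picked = nearest_stations(lat, lon, stations, k)
    return sum(s["snowfall_in"] for s in picked) / len(picked)


# Placeholder data only; real values would come from the station files.
stations = [
    {"id": "STATION_A", "lat": 42.9, "lon": -78.7, "snowfall_in": 90.0},
    {"id": "STATION_B", "lat": 43.2, "lon": -77.6, "snowfall_in": 100.0},
    {"id": "STATION_C", "lat": 40.7, "lon": -74.0, "snowfall_in": 30.0},
]

print(nearest_stations(43.0, -78.8, stations)[0]["id"])
print(approx_snowfall(43.0, -78.8, stations, k=2))
```

Averaging a couple of nearby stations is the "no adjustments" approximation from the post; it ignores elevation and lake effects entirely.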

0 Comments
2024/03/07
18:04 UTC

1

Collection of symbol sets from Unicode, for each language, separating punctuation/vowels/consonants/etc., as open data?

I know you can wade through the Unicode/Unihan database files and group the symbols by "unicode block", but are there any open collections of symbols/glyphs which group them by more fine-grained categories? Something like this, but way more.

For example, we might have these JSON files:

  • devanagari-vowels.json
  • devanagari-consonants.json
  • devanagari-letters.json (all letters)
  • devanagari-punctuation.json
  • hebrew-punctuation.json
  • hebrew-letters.json
  • latin-numbers.json
  • latin-lowercase.json
  • latin-uppercase.json
  • latin-other-symbols.json
  • finnish-alphabet.json
  • hungarian-alphabet.json
  • ... lots of ways to group the letters.

I searched around GitHub for a while but didn't find anything (surprisingly!). Have you seen anything like this? Doesn't need to be complete, but hoping not to have to roll my own solution. Thank you for your help.

Perhaps you know of some machine learning tool which has aggregated this stuff (I am imagining something like Tesseract). Or some sort of NLP dataset.

Not really sure what this is (https://github.com/unicode-org/cldr-json) but are you able to find it in there perhaps?
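As a partial fallback if no ready-made collection turns up, Python's standard `unicodedata` module can at least split a code point range by Unicode general category (letters, digits, punctuation, and so on). This is a rough sketch: `group_by_category` is a made-up helper name and the ranges are passed in by hand. It will not separate vowels from consonants, since general categories do not encode that.

```python
import json
import unicodedata
from collections import defaultdict


def group_by_category(start, end):
    """Group assigned code points in [start, end] by Unicode general category."""
    groups = defaultdict(list)
    for code_point in range(start, end + 1):
        char = chr(code_point)
        category = unicodedata.category(char)
        if category == "Cn":  # skip unassigned code points
            continue
        groups[category].append(char)
    return dict(groups)


# Basic Latin letters and the Devanagari block (U+0900 to U+097F).
latin = group_by_category(0x0041, 0x007A)
devanagari = group_by_category(0x0900, 0x097F)

# One JSON document per script, keyed by category code such as "Lu" or "Po".
print(json.dumps({"latin-uppercase": latin["Lu"]}, ensure_ascii=False))
print(sorted(devanagari.keys()))
```

Finer splits such as per-language alphabets would still need another source of data on top of this.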

0 Comments
2024/03/02
20:09 UTC

8

Find open data + analyze with AI, all in one platform!

Hi! Meet Wobby.

You can find all kinds of statistical, open data on Wobby and analyze it immediately in a sort of ChatGPT environment. You can also upload your own data ;)

https://youtu.be/YJUMYEuUPq4

3 Comments
2024/02/26
16:43 UTC

4

A growing database of AI/ML/DS salaries for 2024 (Open Data)

0 Comments
2024/02/26
10:04 UTC

1

Dataset Containing Federal Criminal Charge Labels and Reference Data

0 Comments
2024/02/01
23:44 UTC

2

How to 'know your customer' when you're in the Open data sector?

I'd like to improve our offer of open data sets and also try to better inform users in case of upcoming changes to datasets and platforms. How do you do this for users who are, by default, not required to sign in on anything?

2 Comments
2024/01/21
12:15 UTC

1

Training data sets or open classifier models for spam identification?

I am doing a project that will be scraping and analyzing large numbers of web pages (>10^7 pages at a time). One of the things I need to do is efficiently identify spam content, advertisements, banner ads, etc. to pre-filter it.

Are there any pre-existing libraries that accurately classify this sort of material? I'm looking for both text/HTML processing libraries and image classification for things like banner ads. If there are no pre-existing open-source libraries that do this, then I would be interested in training data sets that I could use to develop my own filters.

Thanks!

4 Comments
2024/01/21
01:17 UTC

1

Did the rate of workplace injuries drop in 2022 for local trucking? If so, was it just noise?

0 Comments
2023/11/28
02:19 UTC

5

Huge OpenData dataset with a lot of Attributes

Hello community, I'm seeking, for a personal project, a huge open data dataset with a large number of attributes.

This dataset (or these datasets) will be used to feed a star/snowflake schema, which will serve as the data source for an OLAP cube.

That's why I'm searching for a lot of attributes (which will become dimensions in the hypercube).

Ideally, a sales dataset with product, customer, country, date of sale, unit price, quantity, discount... would be more than welcome.

Thanks in advance for your help!
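To make the target shape concrete, here is a minimal star-schema sketch using Python's built-in `sqlite3`. The table names, column names, and the three sample rows are all hypothetical placeholders chosen to match the attributes listed above; they do not come from any real dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    sale_date   TEXT,
    unit_price  REAL,
    quantity    INTEGER,
    discount    REAL   -- fraction between 0 and 1
);
""")

# Placeholder rows, for illustration only.
conn.execute("INSERT INTO dim_product VALUES (1, 'widget')")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Alice", "FR"), (2, "Bob", "DE")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, "2023-10-01", 10.0, 3, 0.0),
    (1, 2, "2023-10-02", 10.0, 2, 0.5),
    (1, 1, "2023-10-03", 10.0, 1, 0.0),
])


def revenue_by_country(connection):
    """Aggregate the fact table along the customer-country dimension."""
    rows = connection.execute("""
        SELECT c.country, SUM(f.unit_price * f.quantity * (1 - f.discount))
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        GROUP BY c.country
        ORDER BY c.country
    """).fetchall()
    return dict(rows)


print(revenue_by_country(conn))
```

Each extra attribute in a source dataset would become either another dimension table or another column on an existing one.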

Bob

4 Comments
2023/10/31
13:34 UTC

1

Opendata for car parts

Hi, I'm looking for an online car parts database, preferably one that includes Russian. I'd rather not parse other websites.

0 Comments
2023/10/29
10:55 UTC

2

Seeking live AIS shipping / ADSB aircraft data

I am particularly interested in the River Thames or the English Channel for shipping.

For aircraft, British airspace would be nice.

But, in both cases, I will accept anything worldwide.

I would prefer a realtime feed, but could live with a delay of an hour, day or week. Maybe longer, just as long as it is continuous and I can regularly pull fresh data every minute or so.

It is important to note that I do not have a feed to offer them in exchange, which many such sites require.

2 Comments
2023/10/03
16:02 UTC

1

Looking for data related to linguistic discrimination

Hey! I am working on a project related to normative linguistic discrimination. Would appreciate any tips on where I might find relevant data, especially related to education and age. Thanks a lot! I know this is a little vague so please let me know if I can answer questions that might help with the search.

0 Comments
2023/09/24
18:32 UTC

1

PubMed Papers & annotated MESH Terms Dataset?

0 Comments
2023/09/23
10:16 UTC

2

MMXX - Crash Server - a video during the pandemic made in live coding (Python/FoxDot/SuperCollider) - with data from Strasbourg Open Data

1 Comment
2023/09/04
03:32 UTC

1

I created r/imagecaptions for people to share datasets and other resources about captioning images, or even their own captions, with a focus on getting high-quality captions for machine learning

0 Comments
2023/08/28
02:08 UTC

1

Electric vehicle charger recommender

Hello, I've recently released an app for electric vehicle drivers, which ensures they can easily find great charge points. ⚡ ⚡ 🚙

It has coverage in the UK and US and uses the Open Charge Map and OpenStreetMap databases. We will also send user-submitted data about chargers to the Open Charge Map database to share user contributions.

I am looking for feedback on the app and for testers to help find ways to improve it further, especially the opendata aspects. 🙂

What is unique (I think!) about the app is that its focus is on making it easy to compare charge points by providing users with a ranked list based on travel time, connector numbers, crime rate or the types of amenities or brands users want to have present nearby. For example, if you'd like a charger with a Starbucks nearby, it can find one quite easily. ☕

I am also using crowd-sourced data and machine learning techniques to further power the charger recommender with charge point use forecasts.

The goal is to make the charging experience and environment which users encounter at chargers more predictable for drivers!

Easiest way to download is via electro-app.com.

Please reply below or DM me if you would like to know more!

0 Comments
2023/08/10
09:26 UTC

4

What language is everyone using?

I work in the open data office of a smallish city; we use R for almost everything. For those of you working in open data, what language do you use most?

1 Comment
2023/07/05
15:13 UTC

3

Seeking geospatial data of classical Rome

My favourite era is Caesar's, but I will accept anything from the founding to the fall of the Western Empire.

Ideally, lat/long, plus a title. Any description would be a bonus, but a title is enough, and I will research the description myself.

Primary interest is Caesar's Rome, secondary is military bases, third is anything notable in the Roman empire, including battle sites, occupied territories, roads, palaces, etc, etc, etc.

As much data as possible please, and I will create an open source project with whatever I get.

I am a programmer who is fascinated by maps. And ancient Rome. And I need a new hobby.

0 Comments
2023/07/02
14:44 UTC

7

Introducing NBA Stats API: Access NBA Season and Playoff Totals, Advanced Statistics, and More!

Hello, fellow data enthusiasts and NBA fans!

I am excited to announce the release of my latest project, the NBA Stats API (version 0.1 Beta). This API provides access to NBA season and playoff player totals, advanced statistics, shot chart data, and more. As an NBA fan and data enthusiast myself, I've always had a passion for finding patterns and trends in sports statistics. This API is my contribution to the community, in hopes that it will fuel your own analysis, be it for fantasy leagues, sports journalism, predictive modeling, or simply out of curiosity.

I've put many hours of work into this project, ensuring that the data is not only accurate but also easy to access and understand. The API is currently in its Beta version (0.1), and I'm excited to see how it will evolve with your valuable feedback and suggestions. Currently, the advanced statistics are in testing and will be made available very soon.

The complete API documentation is available as a POSTMAN collection at the following link: API Documentation.

I've also hosted all the code behind this project on GitHub under MIT license: NBA Stats GitHub Repository

I am continuously working on improving and expanding the API, and your feedback and suggestions are more than welcome. Feel free to ask any questions, provide suggestions, or even share what you've managed to achieve using the API. I'm looking forward to your creations!

I've created a small website to start visualizing this data. Check out my favorite chart displaying Total Points vs. Win Shares. All data on this site is fetched from the API.

Thank you for your time and happy data diving!

1 Comment
2023/06/27
02:14 UTC

6

Groundwater datasets

Hey Redditors,

I'm Amiya Sur, currently pursuing my dissertation at the University of Leeds. My research is focused on water governance, specifically investigating the impacts of environmental factors on groundwater levels with a view to developing a predictive model.

I'm facing some challenges with data collection and hoping some of you can point me in the right direction. If anyone has information or resources on datasets related to groundwater levels and environmental characteristics, especially from the African region (but I'm open to others too), it would be a great help.

Any insights or pointers would be much appreciated. Thank you!

Cheers,
Amiya Sur

3 Comments
2023/06/20
15:12 UTC
