/r/bigdata

For all bigdata gurus everywhere from hedgefunds (quant finance) to biotech (drug discovery) to social media (twitter) to discuss the latest trends, topics, career opportunities and tricks of the trade!

Rules: No advertising, don't blatantly link to your own product(s). Posts must be relevant to big data technologies or discussions.

Related subreddits:

r/datascience

r/bigdatajobs

r/machinelearning

r/datagangsta

57,852 Subscribers

4

Looking for guidance on how I can get started in the field of big data, and where to begin?

Lemme know about any books which would be helpful for me to progress in understanding the field.

2 Comments
2024/11/01
16:34 UTC

2

Active Graphs: A New Approach to Contextual Data Management and Real-Time Insights

Hey r/bigdata,

I wanted to share something I’ve been working on that could shift how we think about data management and analysis. I call it Active Graphs—a framework designed to organize data not as static tables or isolated points, but as dynamic, context-aware relationships. I’m hoping to get some feedback from the community here and open a discussion on its potential.

What Are Active Graphs?

Active Graphs represent a shift in data structure: each data point becomes a “node” that inherently understands its context within a broader ecosystem, linking dynamically to other nodes based on predefined relationships. Imagine a data model that’s not just about storing information but actively interpreting its connections and evolving as new data comes in.

Key Features:

•	Dynamic, Real-Time Relationships: Relationships aren’t rigidly defined; they adapt as new data is added, allowing for a constantly evolving network of information.
•	Contextual Intelligence: Data isn’t just stored; it understands its relevance within the network, making complex queries simpler and more intuitive.
•	Built for Multi-Domain Data: Active Graphs allow cross-domain insights without re-indexing or reconfiguration, ideal for industries with highly interconnected data needs—think finance, healthcare, and legal.

How Active Graphs Could Be a Game-Changer

Let’s take healthcare as an example. With Active Graphs, patient data isn’t just recorded—it’s actively mapped against diagnoses, treatments, and outcomes. You could run a query like “Show all admitted patients with Pneumonia and their most recent treatments,” and Active Graphs would deliver real-time insights based on all relevant data points. No custom code, no complex reconfiguration—just actionable insights.
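
To make the healthcare query concrete, here is a minimal sketch in plain Python of what "admitted patients with Pneumonia and their most recent treatments" could look like over ordinary linked records. This is not Active Graphs itself, whose internals the post does not describe. The `Patient` and `Treatment` classes, the `recent_treatments` helper, and the sample data are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Treatment:
    name: str
    given_on: date


@dataclass
class Patient:
    name: str
    admitted: bool
    diagnoses: set[str] = field(default_factory=set)
    treatments: list[Treatment] = field(default_factory=list)


def recent_treatments(patients, diagnosis):
    """Map each admitted patient with `diagnosis` to their latest treatment name (or None)."""
    result = {}
    for patient in patients:
        if patient.admitted and diagnosis in patient.diagnoses:
            # Pick the treatment with the most recent date; None if there are no treatments.
            latest = max(patient.treatments, key=lambda t: t.given_on, default=None)
            result[patient.name] = latest.name if latest else None
    return result


# Made-up sample records, not real patient data.
patients = [
    Patient("Ana", True, {"Pneumonia"}, [
        Treatment("Amoxicillin", date(2024, 10, 1)),
        Treatment("Oxygen therapy", date(2024, 10, 3)),
    ]),
    Patient("Ben", False, {"Pneumonia"}, [Treatment("Amoxicillin", date(2024, 9, 20))]),
    Patient("Cal", True, {"Asthma"}, [Treatment("Inhaler", date(2024, 10, 2))]),
]

print(recent_treatments(patients, "Pneumonia"))  # {'Ana': 'Oxygen therapy'}
```

Only Ana is both admitted and diagnosed with Pneumonia, so she is the only patient returned. A context-aware graph would presumably keep these links up to date as new records arrive, rather than recomputing them per query as this sketch does.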

Or in finance, imagine a trading bot that can adapt its strategy based on real-time contextual updates. Each trade and indicator would be dynamically linked to broader contexts (like day, week, and market sentiment), helping it make informed, split-second decisions without needing to retrain on historical data.

Why This Matters

Traditional databases and even graph databases are powerful, but they’re often limited by static relationships and rigid schemas. Active Graphs breaks from that by making data flexible, relational, and inherently context-aware—and it’s ready for integration in real-world applications.

TL;DR: Active Graphs turns data into a self-organizing, interconnected network that adapts in real-time, offering new possibilities for industries that rely on complex, evolving datasets. I’d love to hear your thoughts on this approach and how you think it might apply in your field.

Disclaimer: Active Graphs and its associated concepts are part of an ongoing patent development process. All rights reserved.

0 Comments
2024/11/01
05:23 UTC

2

Calling Data Engineers and Architects with hands-on experience in Real-Time and Near Real-Time Streaming Data solutions!

Hi,

If you’re skilled in streaming data – from ingesting and routing to managing and setting real-time alerts – we want to hear from you! We’re seeking experienced professionals to provide feedback on a new product in development.

During the session, we’ll discuss your experience with streaming data and gather valuable insights on our latest design flow.

By participating, you’ll help shape the future of streaming data experiences!

Study Details:

  • Qualified participants will be paid
  • Time Commitment: Approximately 90 minutes.
  • Format: Remote online session.

If you’re interested, please complete this short screener to see if you qualify:

https://www.userinterviews.com/projects/O-tG9o1DSA/apply

Looking forward to hearing from you!

Best,
Yamit Provizor
UX Researcher, Microsoft – Fabric

0 Comments
2024/10/31
14:11 UTC

3

Beginner’s Guide to Spark UI: How to Monitor and Analyze Spark Jobs

I am sharing my article on Medium that introduces Spark UI for beginners.

It covers the essential features of Spark UI, showing how to track job progress, troubleshoot issues, and optimize performance.

From understanding job stages and tasks to exploring DAG visualizations and SQL query details, the article provides a walkthrough designed for beginners.

Please provide feedback and share with your network if you find it useful.

3 Comments
2024/10/29
08:02 UTC

1

Unfolding the Role of Black Box and Explainable AI in Data Science

USDSI® can be the key differentiator that makes you stand out from the herd and propels your career forward.

https://preview.redd.it/4nh8h8td4nxd1.jpg?width=1922&format=pjpg&auto=webp&s=6a77e81bbf25d35e8cd21972f6820113d3128455

0 Comments
2024/10/29
06:34 UTC

0

CAN DATA SCIENCE COMMAND THE FUTURE OF BUSINESSES IN 2025?

Foster huge growth with top skills in data visualization, data mining, and machine learning today. Look at the interesting trends and future that data science holds.

https://preview.redd.it/791mf2gl5ixd1.jpg?width=1921&format=pjpg&auto=webp&s=254ee52dcdfb761eda8bc3a6c3162249251977b3

0 Comments
2024/10/28
13:52 UTC

0

I need an alternative to my MySQL database

Excuse my English, as it isn't my first language.
So basically I've been programming for a year now, but this is my first time dealing with databases.
I'm using XAMPP as a local server on my PC.
I've created a desktop application using Python and PyQt5, and it works well.
The problem is that retrieving data from my MySQL database takes about 10 minutes.
I forgot to mention: the data is 45 million rows of product records, and my clients would be local clients who need this data for their businesses.
I search the database by two columns, and if there is a match I return the whole product info.
When I retrieve only one column, it takes around 2 minutes.
Is there a faster alternative for retrieving the data, or am I doing something wrong?
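
One common cause of lookups this slow, assuming the two search columns are not indexed, is that every query scans the whole table. A composite index over both columns usually lets the database jump straight to the matching rows. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere. The table `products` and the columns `brand` and `sku` are hypothetical names, since the post does not give the real schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, brand TEXT, sku TEXT, info TEXT)"
)
conn.executemany(
    "INSERT INTO products (brand, sku, info) VALUES (?, ?, ?)",
    [(f"brand{i % 100}", f"sku{i}", f"details for product {i}") for i in range(10_000)],
)

# Parameterized lookup on the two (hypothetical) search columns.
QUERY = "SELECT * FROM products WHERE brand = ? AND sku = ?"
PARAMS = ("brand42", "sku4242")


def query_plan(connection):
    """Return the planner's description of how QUERY would be executed."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text


plan_before = query_plan(conn)  # full table scan: no usable index yet
conn.execute("CREATE INDEX idx_products_brand_sku ON products (brand, sku)")
plan_after = query_plan(conn)   # search using the composite index

match = conn.execute(QUERY, PARAMS).fetchone()
print(plan_before)
print(plan_after)
print(match)
```

In MySQL the corresponding statement would be `CREATE INDEX idx_products_brand_sku ON products (brand, sku);`, and prefixing the query with `EXPLAIN` shows whether the index is actually used. Whether this fully explains the 10-minute lookups depends on details the post does not include, such as the exact query and column types.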

2 Comments
2024/10/28
09:25 UTC

0

HOW TO GAIN KNOWLEDGE IN DATA SCIENCE | INFOGRAPHIC

Data science is an interdisciplinary field, and to succeed in a data science career you need strong knowledge of its foundational subjects: mathematics and statistics, computer science, and domain or industry knowledge.

Knowledge of programming languages, mathematical concepts such as probability distributions and linear algebra, and business acumen will help you understand the business problem efficiently and develop accurate data science models.

Explore the core data science subjects that you must master before starting your career in data science and learn about specialized data science components like data analysis, data visualization, data engineering, and more in this detailed infographic.

https://preview.redd.it/f9e7l7u251xd1.jpg?width=1200&format=pjpg&auto=webp&s=1e81077f56cd6de3e14fe8d147c3d6ab7c78fd0d

0 Comments
2024/10/26
04:39 UTC

1

Looking for database + analytics solution to analyze 3D printed data

Hello, I am looking for software that can ingest data from a 3D printer and provide an analytics sandbox where that data can be analyzed and dashboards can be built. The data ranges from PLC data (exported as JSON) and log files (text) to CSV files and images. I am looking at solutions such as Cloudera (seems expensive) or Splunk. Does anybody have any other advice on a flexible software solution that is also affordable? Thanks!

1 Comment
2024/10/25
14:27 UTC

4

Folks who do data modeling: what is your biggest pain in the a**??

What is your most challenging and time consuming task?
Is it getting business requirements, aligning on naming convention, fixing broken pipelines?

We want to build internal tools that use AI to automate some of these tasks, and we'd like to understand what to focus on.

1 Comment
2024/10/25
08:22 UTC

0

Participate in a research study

I developed this questionnaire for my PhD. It analyses the influence of the human factor in Big Data Analytics. To answer, you need to work in the field of data analytics. We need to collect a large number of answers for the analysis; if you want to help us, it will take only 10 minutes of your time. At the end of the questionnaire (if you have entered your email) you will receive the average of the answers so far, so you can compare your own answers with those of the other respondents.

https://docs.google.com/forms/d/e/1FAIpQLSeIrT1_ERSIcBMYOt8GcDoAKG3cHJ5b3q9W-SBQDmTbzisXBA/viewform?usp=sf_link 

0 Comments
2024/10/24
13:12 UTC

1

Transform Your Accounts Payable & Receivable with Agentic AI

0 Comments
2024/10/24
12:41 UTC

0

A BEGINNER'S ROADMAP TO WEB SCRAPING IN PYTHON USING BEAUTIFULSOUP

Looking to explore the world of web scraping? Python's BeautifulSoup is your gateway! Learn how to transform unstructured web data into valuable insights in just a few steps.

https://preview.redd.it/gx7h45upsowd1.jpg?width=1080&format=pjpg&auto=webp&s=acabd6471b11dba64b2eb6ebde2be31328f39a77

1 Comment
2024/10/24
11:09 UTC

2

Data Science v/s Cloud Computing: An Overview

Want to know how data science and cloud computing are shaping the future of business? Our new guide breaks down the key differences and shows you how these technologies work together to drive innovation.

USDSI® presents this unique guide on data science vs. cloud computing, which discusses how each of these technologies contributes to organizations making data-driven decisions. The guide also covers several interesting stats and facts about data science and cloud computing; for example, AWS is the biggest player in cloud computing with a 31% market share. Did you know?

Download your copy now and explore more facts.

https://preview.redd.it/oyk0bbz209wd1.jpg?width=1921&format=pjpg&auto=webp&s=5ccede7f54b63a2863512fbb066275dceffab2c3

0 Comments
2024/10/22
06:01 UTC

1

Data Collection vs Data Extraction: Key Differences Explained by a Data Consultant

Hey

I’ve been digging deeper into the distinctions between data collection and data extraction, and I found a great blog that lays it out from a data consultant’s perspective. Here are some interesting insights I came across: 

  • Data Collection: The process of gathering raw data from various sources, either manually or through automated systems. It's all about building a strong foundation for analysis by ensuring you’re pulling in the right information from the right places. 

  • Data Extraction: This involves retrieving specific data from an existing data set (like scraping the web or extracting from documents) to make it usable for analysis. 

The post also goes into how different tools and techniques play a role in these processes and how both are crucial for decision-making, especially in data-driven industries. 

If you’re into the technical nuances of data management or just curious about how these processes differ and overlap, check out the full blog here: Data Collection vs Data Extraction: Insights from a Consultant 

I’d love to hear your thoughts—what’s been your experience dealing with data collection vs data extraction? 

1 Comment
2024/10/21
12:42 UTC

1

Need help! How to upload JSON files to Databricks

I've been given a project on detecting fake reviews on Yelp, and for this I need to use Databricks and Apache Spark. I have the dataset downloaded as a zip folder containing JSON files. As I'm completely new to Databricks, I don't know how to upload this zip file to it. Any help would be appreciated!
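
A rough sketch of one way to handle the zip, assuming it has already been uploaded somewhere the notebook can read: Spark does not read .zip archives directly, so the JSON files are extracted first and Spark is then pointed at the extracted files. The helper `extract_json_files` and all paths are hypothetical, and the demo builds a tiny made-up archive so the code also runs outside Databricks.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def extract_json_files(zip_path, dest_dir):
    """Extract only the .json members of zip_path into dest_dir and return their paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".json"):
                extracted.append(Path(archive.extract(member, dest)))
    return extracted


# Build a tiny stand-in archive with made-up reviews (one JSON object per line).
workdir = Path(tempfile.mkdtemp())
demo_zip = workdir / "reviews.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr(
        "review.json",
        '{"stars": 5, "text": "Great food"}\n{"stars": 1, "text": "Never again"}\n',
    )
    archive.writestr("README.txt", "not a JSON file")

json_paths = extract_json_files(demo_zip, workdir / "extracted")
records = [
    json.loads(line)
    for line in json_paths[0].read_text().splitlines()
    if line.strip()
]
print([p.name for p in json_paths], len(records))

# In a Databricks notebook, the extracted files would then be loaded with Spark, e.g.:
# df = spark.read.json("<path to the extracted .json files>")
```

`spark.read.json` expects one JSON object per line by default, which is the layout the demo file uses. Where the uploaded zip ends up, and which path Spark should read from, depends on how the workspace is set up.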

2 Comments
2024/10/20
02:11 UTC

2

Top 3 Tips Marketing Teams Need to Know About Data Science In

https://reddit.com/link/1g73bvi/video/0c153gz5wnvd1/player

Data science is changing the game for marketers everywhere. Get ready to supercharge your strategies with data science insights for 2024. In our latest video, you will discover the top three tips every marketing team needs to know about data science. Learn how AI is reshaping marketing tactics, why data democratization is on the rise, and the crucial role of data in delivering personalized customer experiences across channels. Ready to level up? Enroll in USDSI®'s data science certifications today and unlock endless possibilities!

0 Comments
2024/10/19
07:02 UTC
