So, I was watching a video where the lecturer mentioned a relation with three attributes: cookie price, number of cookies, and box price.
If we have the cookie price and number of cookies as a composite primary key, we can say that it functionally determines the box price, right? But functional dependency doesn't necessarily refer to this sort of computational relationship; I've seen other examples where it just says that if we have a relation and I know a value in one row, then I can determine the values of the other attributes in that row by looking up the row.
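As a minimal sketch of the distinction (table and column names are mine, not the lecturer's): the FD (cookie_price, num_cookies) → box_price only requires that rows agreeing on the key also agree on box_price; whether box_price is computed as price × count or simply stored is irrelevant to the FD.

```sql
-- The composite key makes the FD hold structurally: no two rows can
-- share (cookie_price, num_cookies) yet differ in box_price.
CREATE TABLE cookie_boxes (
    cookie_price NUMERIC(6,2) NOT NULL,
    num_cookies  INTEGER      NOT NULL,
    box_price    NUMERIC(8,2) NOT NULL,
    PRIMARY KEY (cookie_price, num_cookies)
);
```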
I can't even begin to figure out what the teacher wants from me. They ask for four pages of this, cited, but then the question is different. Any help would be appreciated.
Hey everyone,
I'm looking for advice on selecting a big data database for two main use cases:
High-Volume Data Storage and Processing: We need to handle tens of thousands of writes per second, storing raw data efficiently for later processing.
Log Storage and Fast Search: The database should manage high log volumes and enable fast searches across many columns, with quick query response times.
We're currently using HBase but are exploring alternatives like ScyllaDB, Cassandra, ClickHouse, MongoDB, and Loki (just for the logging purpose). Cost-effective deployment is a priority, and we prefer deploying on Kubernetes.
Key Requirements:
Support for tens of thousands of writes per second.
Efficient data storage for processing.
Fast search capabilities across numerous columns.
Cost-effective deployment, preferably on Kubernetes.
Questions:
What are your experiences with these databases for similar use cases?
Are there other databases we should consider?
Any specific tips for optimizing these databases for our needs?
Which options are the most cost-effective for Kubernetes deployment?
Thanks in advance for your insights!
I am a university student working on a project for one of my classes. This is my first time using MySQL, and I am trying to import a 1 GB .cxl file using the import wizard. My laptop (with 16 GB of RAM) has been running for 24+ hours and the import isn't done yet. Is this normal?
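If the file is actually delimited text, one thing that usually helps: the import wizard inserts rows one at a time, while a bulk load is dramatically faster. A sketch, assuming a CSV-like layout and an existing target table (path and table name are placeholders):

```sql
-- Bulk-load the file instead of issuing row-by-row INSERTs.
-- Requires local_infile to be enabled on both client and server.
LOAD DATA LOCAL INFILE '/path/to/data.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```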
Hey guys,
I’m designing a database, and am definitely in over my head but am hoping to do a decent job anyway.
This project tracks machines, products, and packages, and the database is (at least for now) exclusively used to save the user-defined data so it can be reloaded later.
All of these categories have different types under them, meaning there are multiple types of machines, types of products, and types of packages. An example could be two types of packages: a plastic tray and a vacuum-sealed pack. Of course these are both packages, but they also have many differences. They both have a length and a width, but only the tray has a height. The vacuum pack needs to know the consistency of what's inside, while the tray doesn't care.
So, what I’m asking is: does having repeated columns in multiple tables break the normal forms, or is it just the chance for repeated data that breaks it? A tray and a vacuum pack are two separate entities always. Both packages, but never the same package. Can I make two tables, one for each, and each table have a height and a width column? Or is the proper way to stick to the normal forms having a kind of “parent” package table that holds those shared fields, like length and width, and leave only the unique fields to the “child” tables? The amount of overlap varies a lot. There are machines that need 95% of the same information, and there are machines that need three of the same columns as the rest, along with 20 more.
I'm not sure if that's the right phrasing; I come from a purely software background, and the most I usually do is write a query. I'm sure there's going to be some "well, it's really up to you, it's totally based on the situation," but I'm just looking for best practices. Thanks!
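For what it's worth, here's a minimal sketch of the "parent" package table idea described above (often called class-table inheritance); all names and types are illustrative:

```sql
-- Shared fields live once in the parent table.
CREATE TABLE package (
    package_id INT PRIMARY KEY,
    length     DECIMAL(8,2) NOT NULL,
    width      DECIMAL(8,2) NOT NULL
);

-- Each child table holds only the fields unique to its type.
CREATE TABLE tray (
    package_id INT PRIMARY KEY REFERENCES package(package_id),
    height     DECIMAL(8,2) NOT NULL
);

CREATE TABLE vacuum_pack (
    package_id INT PRIMARY KEY REFERENCES package(package_id),
    consistency VARCHAR(50) NOT NULL
);
```

Note that sharing a column name like length across two unrelated tables doesn't violate any normal form by itself; it's storing the same fact twice that normalization guards against.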
What are those lines between attributes? And why is there a line from DataInizio that goes to the relationship?
Hi everyone,
I recently published a video discussing a topic that comes up a lot in database design but isn't often fully explained: why binary trees aren't the best choice for on-disk storage systems. As I've been digging into database internals, I realised this is a critical concept for designing efficient and scalable storage solutions, so I wanted to break it down: why so much emphasis is given to B-trees, and why traditional binary trees are not suitable for on-disk storage.
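Back-of-envelope version of the core argument (my numbers, not the video's; assume 10^9 keys, roughly 500 keys per 4 KB page, and one disk read per node visited):

$$h_{\text{binary}} \approx \log_2 10^9 \approx 30 \text{ reads} \qquad h_{\text{B-tree}} \approx \lceil \log_{500} 10^9 \rceil \approx 4 \text{ reads}$$

The fanout collapses the height, and the height is the number of disk seeks per lookup.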
Whether you’re interested in system design, database engineering, or just want to understand database performance at a deeper level, I think you’ll find this valuable.
Check out the video here: https://www.youtube.com/watch?v=bsHu0W2lN8s
I’d love to hear your thoughts or answer any questions about database structures and why this kind of detail matters in real-world applications.
Thanks in advance for checking it out, and I hope it adds value to your journey!!
Hi, I'm working on a personal project and need some help. I have a Postgres database, let's call it DB1, and a schema called DB1.Sch1. There's a bunch of tables, say from T1 to T10. When my users want to connect to this database, they can connect from several interfaces, some through an API and some through direct JDBC connections. In both cases, I want to intercept the SQL query before it hits the DB, add additional attributes like the username, their team name, and location code, and store it in a log file or a separate table (say, a log table). How can I do this? Also, can I rewrite the query with an additional WHERE clause, team_name = <some name parameter>?
Can someone shed some light?
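One Postgres-native sketch, under the assumption that you can touch each client's connection setup (the setting name and team value are placeholders): have every interface declare who it is via a custom session setting right after connecting, then let row-level security add the team_name filter to every query for you.

```sql
-- Each client (API or JDBC) declares its identity after connecting;
-- 'app.team_name' is a custom setting name chosen here for illustration.
SET app.team_name = 'blue_team';

-- Row-level security then filters every query on the table automatically.
ALTER TABLE sch1.t1 ENABLE ROW LEVEL SECURITY;

CREATE POLICY team_filter ON sch1.t1
    USING (team_name = current_setting('app.team_name', true));
```

For the logging half, enabling log_statement = 'all' and putting %a (application_name) into log_line_prefix records every statement together with a client-supplied identity, without intercepting anything yourself.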
I own a small landscaping business and want to take the step of getting a database. What should I do?
Update: I believe I want some type of business intelligence, maybe the ability to create dashboards to track my business's heartbeat. I currently track everything manually in Excel.
Hi everyone,
every database editor I've tried is missing some modern feature you'd find in something like Eclipse, JetBrains IDEs, VS Code, etc.
Start with the fact that programs like SQL Developer still exist: a desktop app written in Java that is a big jump into the past, like we're in 2005 again. I'm not even mad over how ugly it is, but rather over how bad the workflow is: missing shortcuts, no drag and drop, dated UI controls, and the general lagginess that is a distinctive characteristic of Java GUI apps.
I've read somewhere that some features are not needed and existing database editors get the job done, so if that's the case, why do I need to frequently switch to more modern text editors like VSCode or Notepad++ to get the work done?
Things like advanced search and replace, better code parsing, goddamn dark mode.
That was the stupid stuff; now let's talk about what matters: the SQL language itself.
Because of its compilation strategy, stored procedures, functions, and packages bring up one error at a time. So why doesn't the editor help the developer the same way an IDE like NetBeans or Eclipse does (variable not defined, type mismatch, syntax checks, etc.)?
In compiled programming languages, not every check is made by the compiler; the IDE often helps catch errors ahead of time, allowing for fewer errors. In SQL you only have your damn compiler.
From what I see there are not many choices around, and the ones that exist all look the same, because major players are moving towards the cloud. SQL editors are now often web-based, where you get maybe 10% of the features available in a desktop counterpart. This is also because said cloud databases are managed (PaaS and IaaS gacha stuff), so why even bother with DBA tools?
Rant over, what are your thoughts?
Hi everyone,
I’m working on an e-commerce project with a large dataset – 20-30 million products per user, with a few thousand users. Data arrives separately as products, stock, and prices, with updates every 2 hours ranging from 2,000 to 4 million records depending on the supplier.
Requirements:
Fast searches (LIKE queries, keyword searches).
I'm deciding between SQL (e.g., PostgreSQL with advanced indexing and partitioning) and NoSQL (e.g., MongoDB or Cassandra) for better scalability and performance with large, frequent updates.
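If you lean PostgreSQL, a hedged sketch for the LIKE requirement (products(name) is an assumed table/column): the pg_trgm extension lets unanchored LIKE '%term%' patterns use a GIN index instead of a sequential scan.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index; supports LIKE/ILIKE with leading wildcards.
CREATE INDEX idx_products_name_trgm
    ON products USING gin (name gin_trgm_ops);

-- This query can now be index-assisted:
SELECT * FROM products WHERE name LIKE '%vacuum%';
```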
Does anyone have experience with a similar setup? Any advice on structuring data for optimal performance?
Thanks!
This may be very idiotic thinking, but bear with me.
I was studying my notes for an upcoming uni test and started thinking about the explicit definitions provided to me by my lecturer's slides (normal forms and the rest). These are the explicit rules I am following in university. I appreciate there are implicit rules, but for the sake of this idiotic thinking I'm purely going off the explicit ones.
Take a table that satisfies 1NF, where every value “y” is a random non-repeating INT.
| A | B | C |
|---|---|---|
| y | y | y |
| y | y | y |
In this case, there is no declared PK. Every row is unique and the other 1NF requirements are satisfied.
The only way to uniquely identify every row is to use A, B, and C together.
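Concretely (the DDL is mine), nothing forces the key to be declared:

```sql
-- No PRIMARY KEY is declared; but if every row is unique,
-- (a, b, c) is still a candidate key whether or not anyone names it.
CREATE TABLE experiment (
    a INT NOT NULL,
    b INT NOT NULL,
    c INT NOT NULL
);
```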
Anyway.
The superkeys / candidate keys are always present in every table. They exist whether you look for them or not.
The primary key, similarly, is there to be chosen, but it's the act of choosing it that makes it the primary key (this is getting a bit philosophical).
I know there is zero practical reason to have this table. It is purely a thought experiment. And in this thought experiment, you (from what I can tell) can satisfy the rules of normalisation fully without declaring a primary key at all.
idk what you guys will say. I just wanted to get it out tho
I'll probably get roasted for my naive Database understanding, lol.
A technical deep dive on some of the choices we made while building pg_mooncake: columnstore tables + DuckDB execution in Postgres.
https://mooncake.dev/blog/how-we-built-pgmooncake
p.s: I'm one of the founders of the project
I'm working on a personal data bank as a hobby project. My goal is to gather and analyze interesting data, with a focus on psychological and social insights. At first, I'll be capturing people's opinions on social interactions, their reasoning, and perceptions of others. While this is currently a small project for personal or small-group use, I'm open to sharing parts of it publicly or even selling it if it attracts interest from companies.
I'm looking for someone (or a few people) to collaborate with on building this data bank.
Here’s the plan and structure I've developed so far:
A shared key (user_id) will link data across tables, allowing smooth and effective cross-category analysis.
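A hypothetical sketch of that shared-key layout (everything except user_id is an invented name):

```sql
CREATE TABLE users (
    user_id    INT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL
);

-- Every category table carries user_id, so analysis can join across them.
CREATE TABLE opinions (
    opinion_id INT PRIMARY KEY,
    user_id    INT NOT NULL REFERENCES users(user_id),
    topic      TEXT,
    response   TEXT
);
```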
I'm working on a project that ingests millions of sensor readings per day. This data is processed and eventually ends up in a cloud-based SQL Server database. A real-time web app consumes the data in SQL Server. The web app runs arbitrary queries on the data, allowing users to answer questions like "What is the average temperature for all sensor readings in the last 3 months?" or "What was the average duration it took a sensor to move from NYC to London?"
Even with partitioning and index optimization, this has proven to be extremely resource-intensive for an RDBMS.
When I first read about it, this seemed like a job for a time-series database. However, from what I'm reading, time-series databases seem more like data warehouses than something a real-time web app would consume.
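One pattern worth testing before migrating anywhere: pre-aggregate into rollup tables that the web app queries, which is essentially what time-series databases call continuous aggregates. A sketch in SQL Server syntax (table and column names assumed):

```sql
-- Daily rollup: "average temperature over 3 months" scans ~90 rows
-- per sensor instead of millions of raw readings.
CREATE TABLE sensor_readings_daily (
    sensor_id   INT    NOT NULL,
    reading_day DATE   NOT NULL,
    avg_temp    FLOAT  NOT NULL,
    n_readings  BIGINT NOT NULL,
    PRIMARY KEY (sensor_id, reading_day)
);

-- Weight by row count so days with fewer readings don't skew the mean.
SELECT SUM(avg_temp * n_readings) / SUM(n_readings) AS avg_temperature
FROM sensor_readings_daily
WHERE reading_day >= DATEADD(month, -3, CAST(GETDATE() AS DATE));
```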