/r/Database
Hey everyone!
I recently put together a video explaining the different layers and components of DBMS architecture. I’ve been diving deep into how databases work and thought others might find this useful too. Understanding the internal structure of databases is super helpful for anyone working in software engineering, especially when designing scalable systems.
In the video, I cover:
• The main layers of DBMS architecture: the transport subsystem, query processor, storage engine, and execution engine
• Key components within each layer and how they interact.
I wanted to create something that breaks it down clearly, without assuming too much prior knowledge, but still goes into enough detail to be valuable for anyone wanting to level up their understanding of databases.
If you’re someone who’s learning about system design or aiming to grow as a backend engineer, I think this might be really helpful.
Would love to hear your thoughts or answer any questions you have about DBMS architecture!
Watch the video here: https://youtu.be/WWu2cCdDnso?si=scmdux7EhhUXUu4Y
Thanks in advance for checking it out, and I hope it adds value to your journey!!
I have to normalize these two problems using the first three normal forms. The problem is that my teacher's document does not specify a primary key for either exercise, and I don't really know how to start without a given primary key. If someone could help me at least up to first normal form, I would be grateful.
Branch (branch_name, account_number, branch_city, balance, client_name, account_number, operations_registration, date, customer_address, customer_phone, branch_address, branch_phone, amount, account_type)
Exercise 5
Loans (book_code, book_name, publisher, copy, author, no_control, student_name, semester, career, loan_date, return_date, employee_key, employee_name, employee_shift)
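A hedged sketch of one way to start: first normal form mostly means atomic attributes plus a declared key, so the first move is to propose a candidate key yourself and state the assumption. The composite key below is a guess for illustration, not the teacher's intended answer.

import sqlite3

# One way to put the Loans relation into 1NF: keep it flat but declare a key.
# ASSUMED key: a given copy of a book lent to a student on a date.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE loans (
    book_code      TEXT,
    book_name      TEXT,
    publisher      TEXT,
    copy           INTEGER,
    author         TEXT,
    no_control     TEXT,      -- student control number
    student_name   TEXT,
    semester       TEXT,
    career         TEXT,
    loan_date      TEXT,
    return_date    TEXT,
    employee_key   TEXT,
    employee_name  TEXT,
    employee_shift TEXT,
    PRIMARY KEY (book_code, copy, no_control, loan_date)
)
""")

The Branch exercise works the same way: pick a plausible key such as (branch_name, account_number), note it as an assumption, and then split out the repeating groups when you move to 2NF/3NF.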
Would appreciate some advice (sorry for the lame question)! I work for a relatively small business; we run an inventory management system with a very large SKU library (roughly 80k SKUs). To help with data analysis and information management, I want to create a relational database to store product information.
Data entry is currently done via a CSV. Upload to the prospective database will be done via an in-house app that currently parses and uploads differently formatted versions of the CSV to Dropbox; I imagine I will just tack on a connection to the DB and an ingestion script (a rough sketch of such a script follows below).
I'm familiar with SQL and database design, but unsure which direction and platform would be best. Any advice would be extremely valued. Thank you.
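To make that concrete, here is a minimal sketch of the kind of ingestion step described, assuming SQLite as a stand-in target and hypothetical columns (sku, description, qty); the upsert makes re-running the same file safe:

import csv
import sqlite3

conn = sqlite3.connect("inventory.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS product (
        sku         TEXT PRIMARY KEY,
        description TEXT,
        qty         INTEGER
    )
""")

with open("products.csv", newline="", encoding="utf-8") as f:
    rows = [(r["sku"], r["description"], int(r["qty"])) for r in csv.DictReader(f)]

# Upsert so loading the same file twice is harmless (idempotent loads).
conn.executemany(
    "INSERT INTO product (sku, description, qty) VALUES (?, ?, ?) "
    "ON CONFLICT(sku) DO UPDATE SET description = excluded.description, "
    "qty = excluded.qty",
    rows,
)
conn.commit()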
Hi, could anyone help me? I got this ER diagram and I need to convert it to a relational diagram by applying the mapping rules. It is a ternary relationship with two weak entities, and it is an identifying relationship.
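Without seeing the diagram, only a generic sketch is possible. The names below are placeholders, and it assumes both weak entities are owned by the same strong entity, which may not match the actual diagram; the textbook rules it illustrates are that a weak entity's table includes its owner's key in its primary key, and an identifying relationship's table combines the keys of all participants.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s (
    s_id INTEGER PRIMARY KEY
);

-- A weak entity's table carries the owner's key plus its own partial key.
CREATE TABLE w1 (
    s_id       INTEGER NOT NULL REFERENCES s(s_id),
    w1_partial INTEGER NOT NULL,
    PRIMARY KEY (s_id, w1_partial)
);

CREATE TABLE w2 (
    s_id       INTEGER NOT NULL REFERENCES s(s_id),
    w2_partial INTEGER NOT NULL,
    PRIMARY KEY (s_id, w2_partial)
);

-- The identifying ternary relationship combines all participants' keys.
CREATE TABLE r (
    s_id       INTEGER NOT NULL,
    w1_partial INTEGER NOT NULL,
    w2_partial INTEGER NOT NULL,
    PRIMARY KEY (s_id, w1_partial, w2_partial),
    FOREIGN KEY (s_id, w1_partial) REFERENCES w1(s_id, w1_partial),
    FOREIGN KEY (s_id, w2_partial) REFERENCES w2(s_id, w2_partial)
);
""")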
Greetings my fellow database enthusiasts! Alex here. I'd like to introduce you to a new high-performance open-source storage engine called K4.
K4 is a library that can be embedded into your Go applications (more languages soon) and used as a storage engine.
Benchmarks
goos: linux
goarch: amd64
pkg: github.com/guycipher/k4
cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
BenchmarkK4_Put
BenchmarkK4_Put-16 158104 6862 ns/op # 145,000 ops/s
RocksDB vs K4
+=+=+=+=+=+=+=+
Both engines were used with default settings and similar configurations.
**RocksDB v7.8.3** 1 million writes sequential key-value pairs default settings = 2.9s-3.1s
**K4 v1.0.0** 1 million writes sequential key-value pairs default settings = 1.7s-1.9s
More benchmarks to come.
Features
- High speed writes and reads
- Durability
- Variable-length binary keys and values; keys and their values can be any length
- Write-Ahead Logging (WAL). The system writes PUT and DELETE operations to a log file before applying them to the LSM tree.
- Atomic transactions. Multiple PUT and DELETE operations can be grouped together and applied atomically to the LSM tree.
- Paired compaction. SSTables are paired up during compaction and merged into a single SSTable. This reduces the number of SSTables and minimizes disk I/O for read operations.
- Memtable implemented as a skip list.
- In-memory and disk-based storage
- Configurable memtable flush threshold
- Configurable compaction interval (in seconds)
- Configurable logging
- Configurable skip list
- Bloom filter for faster lookups. Each SSTable's initial pages contain a bloom filter, which the system checks to determine whether a key might be in the SSTable before scanning it (a toy sketch follows this list).
- Recovery from WAL
- Granular page locking
- Thread-safe
- TTL support
- Optional compression support (a simple, lightweight compression algorithm inspired by Lempel-Ziv 1977)
- Background flushing and compaction, so reads and writes block less
- No dependencies
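Since the bloom filter point above is the one that usually needs unpacking, here's a toy sketch of the check-before-scan idea. This is a generic Python illustration, not K4's actual API (K4 itself is a Go library):

import hashlib

class TinyBloom:
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.bitmap = bits, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.bitmap |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; only on True would an engine
        # pay for the actual SSTable scan.
        return all(self.bitmap & (1 << pos) for pos in self._positions(key))

bloom = TinyBloom()
bloom.add(b"user:42")
assert bloom.might_contain(b"user:42")   # bloom filters have no false negatives
print(bloom.might_contain(b"user:999"))  # almost certainly False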
Do let me know your thoughts!
Thank you :)
So I've been implementing an ecommerce system. Let's say I need to gather the total order amount per month, and all-time, in a table OrderStatistics. There are two ways to go about this:
- Option 1: When an order is completed, you add the order amount to the OrderStatistics row for the current month (creating it if it doesn't exist) and to the all-time row. This happens immediately and needs a transaction to keep the numbers correct.
- Option 2: Run a cron job every day to calculate the order amounts and add them to the OrderStatistics rows for the current month and all time; it needs to be idempotent. This delays the info a bit.
Option 1 is neat, but it also kind of worries me, because the data could go wrong in some unexpected way that I don't know of; and if I later need to add another type of data to OrderStatistics, I will need to run a script to backfill it correctly.
Option 2 seems more accurate and can fix itself if something goes wrong, but the delay in real data might make the user experience suffer.
I have trouble working out how to keep this accumulated data as accurate as possible and avoid displaying incorrect numbers to the user.
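For what it's worth, a minimal sketch of Option 1 as an atomic upsert, with hypothetical table and column names (SQLite syntax here; PostgreSQL's ON CONFLICT has the same shape):

import sqlite3
from datetime import date

conn = sqlite3.connect("shop.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS order_statistics (
        period TEXT PRIMARY KEY,      -- 'YYYY-MM' or 'all_time'
        total  NUMERIC NOT NULL DEFAULT 0
    )
""")

def record_completed_order(amount):
    month = date.today().strftime("%Y-%m")
    with conn:  # one transaction: both rows move together, or neither does
        for period in (month, "all_time"):
            conn.execute(
                "INSERT INTO order_statistics (period, total) VALUES (?, ?) "
                "ON CONFLICT(period) DO UPDATE SET total = total + excluded.total",
                (period, amount),
            )

record_completed_order(19.99)

A common compromise is to do both: Option 1 for freshness, plus an Option 2-style nightly job that recomputes the rows from the orders table and corrects any drift.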
I work in competitive intelligence, and we track a lot of market and competitor information. Our team houses most of our data in Excel worksheets as we track competitor activity. However, we also have external public databases that supply information on competitor activity (business with the US Gov't), and we also have PDF documents that we store with information on each competitor.
Our team of analysts has grown, and we are searching for a solution to bring all of this data together... or at least some of it. I'm trying to understand some solutions well enough that I can take them to our IT team and speak about them knowledgeably.
Ideally, we are looking for something that can bring these sources together into one searchable place.
Can you help me understand if this is a reasonable ask and what types of solutions might exist? I'm also interested in the possibilities of RAG to interact with all of this data. Our company uses Oracle databases and analytics and is on the Microsoft Office platform for the rest. I know I may be limited to an in-house tool, but for now I want to better understand the possibilities and be better able to define what I am looking for.
As the title suggests, I've been thinking about pursuing the path of an Oracle DBA. I was laid off last month due to a reduction in force, but I recently received a job offer for another full stack developer position. I honestly don't like working as a full stack developer because I hate JavaScript/TypeScript or anything front end. Backend development jobs are rare and hard to land. I only accepted the offer because I already have 6 years of full stack development experience, which lands me interviews. I have not started the new job yet, but they use Oracle for their databases, and I will try to slip my way into doing more tasks with databases. I've been thinking about doing some self-studying to understand Linux, improve my SQL skills, and learn Oracle database administration. Does this learning path/strategy seem like a good way to get my foot in the door as an Oracle DBA?
I'm working on registering a water probe sensor device, like those from Eureka, to a database, as long as it supports Bluetooth. I'm using Python as my main language and MySQL to design the tables. Has anyone worked on connecting Bluetooth devices to a database? Any advice on how I can make it work? And what properties are necessary to register from the device into the database? I'm currently focusing on Eureka water probes, but I'm open to any other probes that can connect wirelessly.
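On the database half of the question, here is a hedged sketch of what "registering a device" might store; the field list (BLE MAC address, model, firmware) is an assumption to adapt to whatever the Eureka probe actually exposes, and the Bluetooth half would be handled separately by a BLE library:

import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(
    user="user", password="secret", host="localhost", database="sensors"  # placeholders
)
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS device (
        device_id     INT AUTO_INCREMENT PRIMARY KEY,
        mac_address   CHAR(17) NOT NULL UNIQUE,  -- the BLE address
        name          VARCHAR(100),
        model         VARCHAR(100),
        firmware      VARCHAR(50),
        registered_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")
cur.execute("""
    CREATE TABLE IF NOT EXISTS reading (
        reading_id BIGINT AUTO_INCREMENT PRIMARY KEY,
        device_id  INT NOT NULL,
        taken_at   DATETIME NOT NULL,
        parameter  VARCHAR(50) NOT NULL,   -- e.g. temperature, pH
        value      DOUBLE NOT NULL,
        FOREIGN KEY (device_id) REFERENCES device(device_id)
    )
""")
conn.commit()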
Hey y’all,
I’m working on designing a database for a program where the user will select the types of machines they want, then we collect data about each of the machines they selected.
I’m wondering what the best practice would be for something like this. I have a “project” table for the info that is required no matter what, and I have tables for each of the machines. I imagine having “has_machine1”, “has_machine2”, etc. columns in the project table, plus the project_id in each of the machine tables, would be bad design. I was considering making an empty record in each machine table after the user has made their selections and checking which machine tables have a row matching the project_id, but this would mean creating rows that haven’t been populated with information that is required to move on.
It’s a 1:(0-1) relationship between the project table and each machine table. I need to be able to remember which machines were selected regardless of whether they filled out the information for that machine yet. What would y’all call good design for this kind of thing? I can think of solutions, but I have very little database design experience and would like to make good decisions early.
Bonus question: We also take product information, one project to many products. As well as package information, many products to many packages. There are 3 “types” of products, one very common one, and two less common ones with many more columns. There are 4 “types” of packages, two common, two uncommon, none of which share more than 3-4 columns. Am I just looking at lots of junction tables?
Also, I’m using PostgreSQL if that can simplify things at all. Thanks!
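One common shape for the main question, sketched with placeholder names: record the selection itself as a row in a project/machine-type junction table the moment the user picks, and hang each machine's detail table off that row 1:(0-1). That avoids both the has_machineN columns and the half-empty placeholder rows. SQLite is used here so the sketch runs; the DDL carries over to PostgreSQL with minor type changes.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (
    project_id INTEGER PRIMARY KEY
);

CREATE TABLE machine_type (
    machine_type_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);

-- The selection is a row here, created as soon as the user picks,
-- regardless of whether the details have been filled in yet.
CREATE TABLE project_machine (
    project_id      INTEGER NOT NULL REFERENCES project(project_id),
    machine_type_id INTEGER NOT NULL REFERENCES machine_type(machine_type_id),
    PRIMARY KEY (project_id, machine_type_id)
);

-- Each per-machine detail table is 1:(0-1) off the selection row.
CREATE TABLE machine1_details (
    project_id      INTEGER NOT NULL,
    machine_type_id INTEGER NOT NULL,
    some_reading    REAL NOT NULL,
    PRIMARY KEY (project_id, machine_type_id),
    FOREIGN KEY (project_id, machine_type_id)
        REFERENCES project_machine(project_id, machine_type_id)
);
""")

As for the bonus question: the many-to-many product/package link is indeed a junction table, and the common-columns-plus-variants "types" fit the same supertype/subtype pattern used for the machine details above.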
I am looking for a simple database management application that can be used to log data on shopping centers and malls. Something that can log the name and ID number, the landlord, the location info (city, state, ZIP), leasing contact, traffic numbers, website link, etc. I would like it to be filterable, able to export to Excel files, and easy to navigate. We have been using Google AppSheet with an Excel sheet as a backup, but do not like it. Let me know some suggestions! Thanks.
The best way to perform data changes for second-line support purposes is to fix the bugs in your application so that you don't have to! If a user has a problem with their account and you run some SQL to fix it... well, that's the worst way I can think of to fix a problem, but it does fix it.
But if I was to find myself in a project which used Postgres for its storage, and sufficiently deep in technical debt that this otherwise terrible option was being used as an intermediate solution until all the related bugs could be fixed... what would you recommend, please?
My Google-fu is failing me, probably because I'm not sure what to call this beastie (for instance, are the keywords "enterprise" or "workflow" relevant?).
My ideal system for this imperfect world would include the following features:
The database schema is currently managed by Flyway, so one option, which I hope can be improved upon, is to run a second, parallel Flyway setup (the existing one for development, and a new one for support).
(Approval via the usual PR process, applied via the usual CI process, data changes recorded in git, data changes not reversible, no templating, though I could code some with Mustache relatively easily if I had to.)
Thanks for your advice, and I hope to find myself in a slightly more favorable circle of hell soon!
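Until better tooling turns up, a hedged sketch of the interim discipline described above: each support fix is a reviewed script that runs in one transaction and records itself in an audit table (the DSN, table, ticket, and account names are placeholders):

import psycopg2

conn = psycopg2.connect("dbname=app")  # placeholder DSN
with conn, conn.cursor() as cur:
    # The reviewed data fix itself.
    cur.execute(
        "UPDATE account SET status = 'active' WHERE account_id = %s",
        (12345,),
    )
    # Record what was done, by whom, and for which ticket.
    cur.execute(
        """INSERT INTO support_data_change (ticket, applied_by, description)
           VALUES (%s, %s, %s)""",
        ("SUP-123", "alex", "reactivate account 12345"),
    )
# The `with conn:` block commits both statements together, or neither.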
Hello, I am following this book and it is introducing me to SQLAlchemy. Basically, this is the code I have an issue with:
from sqlalchemy import create_engine

cnxn_string = (
    "postgresql+psycopg2://{username}:{pswd}@{host}:{port}/{database}"
)
print(cnxn_string)

engine = create_engine(
    cnxn_string.format(
        username="username",
        pswd="password",
        host="localhost",
        port=5432,
        database="sqlda"
    )
)

# Book states this works (even shows screenshots); it does on SQLAlchemy 1.x,
# where Engine.execute still existed, but it was removed in SQLAlchemy 2.0:
engine.execute("SELECT * FROM customers LIMIT 2;").fetchall()
My engine object doesn't even have an execute method, so it throws an error, as expected.
Whereas I can only get it to work by creating a connection object:
# Execute the query using a connection, wrapping the SQL in `text`
from sqlalchemy import text

with engine.connect() as connection:
    result = connection.execute(text("SELECT * FROM customers LIMIT 2;"))
    rows = result.fetchall()

# Print the fetched rows
print(rows)
Can someone explain why the book says this works? I can see from the screenshots that it does in their Jupyter notebook, as every cell seems to have executed normally.
Hi everyone,
I am currently learning how to use Neo4j and would like to know where I can get a suitable dataset. I want to first do data profiling on it, then build a database from it in Neo4j, and then run machine learning on that for analysis and a few predictions.
Hey guys,
I have some information in Jira, and I was asked to build a relational database on AWS with that data. Do you have any idea how I can transfer the info from Jira to AWS? I was thinking of making an API call, building a pipeline in Cribl, and then sending the data to the AWS database. If you have any other ideas, please tell me... I am new to this and have no idea what to do 😞
Thank you!!!!
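A hedged sketch of the simplest version of that pipeline, skipping Cribl: pull issues from Jira's REST search API and upsert them into a relational table. The site URL and credentials are placeholders, and SQLite stands in for the AWS database (e.g. RDS):

import requests
import sqlite3

JIRA = "https://your-site.atlassian.net"     # placeholder site
auth = ("you@example.com", "api-token")      # placeholder credentials

resp = requests.get(
    f"{JIRA}/rest/api/2/search",
    params={"jql": "project = ABC", "maxResults": 100},  # placeholder JQL
    auth=auth,
    timeout=30,
)
resp.raise_for_status()

conn = sqlite3.connect("jira.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS issue (
        issue_key TEXT PRIMARY KEY,
        summary   TEXT,
        status    TEXT
    )
""")
rows = [
    (i["key"], i["fields"]["summary"], i["fields"]["status"]["name"])
    for i in resp.json()["issues"]
]
conn.executemany("INSERT OR REPLACE INTO issue VALUES (?, ?, ?)", rows)
conn.commit()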
I know this is silly but is it pronounced "es-kyuu-lait" or "skyuu-lait"??
I’m being asked to consolidate four databases, each with its own application, into one. One of the applications will remain, and the other three will be combined into one new application. I'm weighing the pros and cons of combining all four databases versus combining just the three and leaving the database for the remaining application alone.
I'm volunteering at a small learning center and want to create a database. Seeing how it's a small learning center, it would be best not to use a cloud-based DB, for financial savings. So I would like to know if there is any open-source/free DB software I can use that can store a moderate amount of info and, if possible, be installed and managed on one server for everyone to use (not just local to my PC or one device).
Hello everyone, how are you guys doing?
I am posting to find out whether anyone out there knows of any books that talk about the process of picking a database for a given project.
What I am trying to ask is whether any of you know a book that describes the pros and cons of each kind of database and outlines which one to pick. I don't want only a comparison between NoSQL and SQL; I'd like something that compares MariaDB and MySQL, for instance, or something along those lines.
Thank you and may you have a lovely day!
I'm trying to create a database for the company I work for. I've already installed SQLite on my Mac and have tried exploring it with TablePlus, but the features feel limited. I currently track clients and invoices using Numbers sheets, but with so many clients, it's becoming unmanageable. The problem is that working with TablePlus (or any other database manager app) is very different from using Numbers, so I feel lost and frustrated, and I think I need more than two tables. Any advice? I'm also beginning to wonder if the database I'm trying to build is even feasible, which adds to the challenge.
I am developing an application for small businesses and have considered using SQLite as an option. Ideally, I want users to be able to edit the database only through my application. I need only an administrator user (usually the business owner) on Windows to have direct access to the database file, while non-administrator users (with limited permissions) should not have access to this file. I am considering handling this through Windows file permissions. Do you think this approach will work? Will a non-administrator user be able to use the application and edit the database through it without special permissions, or should I take additional measures? I am open to suggestions on managing security in this type of application, including other methods or other (free) database management systems. If you have experience with this, your input would be very helpful. Thank you for your time.
PS: Whether the non-administrator user can delete something is not relevant to my project; what matters, on the contrary, is that they cannot edit the database outside of my application (long to explain).
I have a table titled "SAMPLE" whose attributes include ID, Option, user_id (FK1), and device_id (FK2), since a sample is a set of information uploaded either by the user alone or by the user together with the device they're using. Here's what I'm struggling with: if the user didn't use a device to upload the information but instead uploaded it manually, what happens to the device_id foreign key? Is it possible to make it NULL? Or is there a better alternative?
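Yes, a foreign key column can be NULL, and NULL is the conventional way to say "no device was involved". A minimal sketch with placeholder names:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE user_account (user_id   INTEGER PRIMARY KEY);
CREATE TABLE device       (device_id INTEGER PRIMARY KEY);

CREATE TABLE sample (
    id        INTEGER PRIMARY KEY,
    option    TEXT,
    user_id   INTEGER NOT NULL REFERENCES user_account(user_id),
    device_id INTEGER REFERENCES device(device_id)   -- nullable on purpose
);
""")
conn.execute("INSERT INTO user_account (user_id) VALUES (1)")
# Manual upload: no device involved, so device_id is simply NULL.
conn.execute(
    "INSERT INTO sample (option, user_id, device_id) VALUES (?, ?, NULL)",
    ("manual", 1),
)

The main alternative is a separate sample_device link table that only gets a row when a device was used, which is worth it mainly if more device-related attributes come along later.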
Hi fellow DB nerds, I am currently exploring the use of MyRocks for write-intensive and storage-intensive use cases in production.
I am wondering if any of you could point me to useful resources.
Things I'm having issues with as of now:
- The information schema gives only an estimate of the disk size, so I wonder how accurate it is; at the OS level, it's almost impossible to calculate by hand because of how the files are split up inside the .rocksdb directory (a directory-walk sketch follows below).
Any form of discussion will be greatly appreciated, thanks!
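On the disk-size point: the actual on-disk footprint is just the sum of the file sizes under the data directory, which can be compared against information_schema's estimate. A small sketch (the path is an assumption; point it at your .rocksdb directory):

import os

def dir_size_bytes(path):
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.exists(fp):   # files can vanish mid-walk (compaction)
                total += os.path.getsize(fp)
    return total

print(dir_size_bytes("/var/lib/mysql/.rocksdb") / 2**30, "GiB")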