So the situation I have is that I am working for a new company that does not have any kind of database or ERP system. Any "data" they have is contained in very simple Excel sheets. One of my long-term goals is to start properly collecting and storing our data in a database, as well as providing any of the forms or sheets needed to collect the data, and coming up with a front end to at least view the data, but preferably insert it as well.
What I have experience with in production is a local midrange IBM iSeries server running a DB2 database, with built-in COBOL & RPG II compilers, and SQL access via IBM's System i Navigator or via Excel's Power Query.
We are in the process of implementing an ERP solution, but it will be cloud based, so I do not expect I will have the same level of access I had to the self-hosted solution, and I doubt I will be able to create new databases or add code to the system as I am used to. But I have made it a requirement that I at least have read access to their database, so I will be able to include that data in custom reporting when the ERP's built-in reporting tools fall short of what we need.
Now, at my old job, the fact that we didn't have a front end for everything and I had to do things manually wasn't the biggest problem. But I was only filling the role of sysadmin then; at my new company I have that as well as project management and estimating responsibilities. So I'm going to need a self-service option for pulling custom reports.
I was hoping to find some kind of solution that would have a database server already installed and preconfigured to a degree, as well as some way to write, schedule, and execute code, and a way to automatically import XML or CSV files uploaded to a select location into tables. If there were low-code options for creating the front end, that would be helpful, but I don't want low code to be my only option.
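For what it's worth, the CSV-import piece can be plain SQL on most servers. A minimal sketch, assuming PostgreSQL and hypothetical table/path names:

```sql
-- Load a dropped file into a staging table (server-side COPY; use \copy for client-side)
COPY staging_orders FROM '/data/inbox/orders.csv' WITH (FORMAT csv, HEADER true);

-- Merge into the real table on a schedule, e.g. via cron + psql;
-- assumes order_no has a unique constraint so reruns are harmless
INSERT INTO orders (order_no, customer, amount)
SELECT order_no, customer, amount FROM staging_orders
ON CONFLICT (order_no) DO NOTHING;
```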
Any advice is greatly appreciated.
Hi everyone. I just published a special edition of my book, filled with lots of practice queries for SQL Server. Check it out if you’d like; I’m super proud of it! The challenges are very realistic, based on AdventureWorks2022. It's OK for beginners but not absolute beginners. Lots of intermediate and difficult problems. Let me know if you have any questions. If you're not from the US I can give you a link to the Amazon listing in your country. Thanks! If self-promotion is not OK here, I apologize in advance!
Hello, I am quite a neophyte with databases; I know a little programming, but not at an advanced level.
I would like to develop a project to facilitate my work within my company, which deals with consumption control in apartment buildings.
My idea is to create a database with the list of all condominiums and the basic information for each condominium (name, address, etc.).
Within each condominium we find the list of the various apartments, with their info (number, owner, any tenants with the dates they lived there (start - end), etc.).
Within each apartment I must then have a list of devices for consumption control (device ID, consumption, errors, etc.).
My question is, how would you set up this project at the database level?
I can't figure out how I should create a database (apartment buildings) with a sub-database inside it (apartments).
I have a base knowledge of SQL, plus some Python, Ruby, and JavaScript.
Side question: is it possible to read a CSV file and then create a database entry based on the info written in the file?
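For what it's worth, in a relational database the "sub-database" is just another table with a foreign key back to its parent. A minimal sketch of the three levels described above (names are illustrative):

```sql
CREATE TABLE condominium (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT
);

CREATE TABLE apartment (
    id             INTEGER PRIMARY KEY,
    condominium_id INTEGER NOT NULL REFERENCES condominium(id),
    number         TEXT,
    owner          TEXT    -- tenants with start/end dates would get their own table
);

CREATE TABLE device (
    id           INTEGER PRIMARY KEY,
    apartment_id INTEGER NOT NULL REFERENCES apartment(id),
    device_code  TEXT,
    consumption  NUMERIC,
    error_state  TEXT
);
```

As for the side question: yes; for example, Python's built-in csv module can read each row and issue an INSERT, and many databases also import CSV directly (SQLite's .import, PostgreSQL's COPY).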
Hello,
I’m looking for advice on selecting a tool to track membership in a distributed system. I’m working on a CRDT-based system where clients connect with each other in a peer-to-peer (P2P) network.
To enable a specific garbage collection algorithm, I need processes to have a precise and consistent view of the system's membership (i.e., who is part of the system). Additionally, to maintain the garbage collection algorithm's liveness, I need to be able to remove processes that have crashed during execution.
Managing membership in a P2P system is notoriously challenging, which is why I'm seeking the right tool for the job. I've come across ZooKeeper and etcd as potential options for tracking system membership, and I would like your advice on this.
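For context, a sketch of what lease-based membership looks like in etcd, using the python-etcd3 client (host, port, and key layout are assumptions):

```python
import etcd3

# Connect to the etcd cluster (address is an assumption for this sketch)
etcd = etcd3.client(host="127.0.0.1", port=2379)

# Each process registers itself under a lease; if the process crashes and
# stops refreshing, etcd expires the key and the member disappears.
lease = etcd.lease(ttl=10)
etcd.put("/members/node-42", "alive", lease=lease)

# Refresh periodically while healthy (e.g., from a background thread)
lease.refresh()

# Any process can read a consistent (linearizable) view of current membership
members = [meta.key.decode() for _, meta in etcd.get_prefix("/members/")]
print(members)
```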
hey all, I'm working on a project that would basically require some sort of product catalog for an insurance company.
It's not as extensive as on your typical commerce site; it's just the title, description, image URL, second image URL, price, discounted price, etc.
Do you happen to know what the best approach for this is?
Also, would there by any chance be an open source tool for this that includes a UI? It could be locally hosted; that's okay.
Cheers
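For reference, the minimal relational version of such a catalog could be a single table; a sketch assuming PostgreSQL, with illustrative names:

```sql
CREATE TABLE product (
    id               SERIAL PRIMARY KEY,
    title            TEXT NOT NULL,
    description      TEXT,
    image_url        TEXT,
    image2_url       TEXT,
    price            NUMERIC(10, 2) NOT NULL,
    discounted_price NUMERIC(10, 2)   -- NULL when there is no discount
);
```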
Spent nearly all day on this yesterday, didn't work.
Copy only rows from Mediabay3.db whose "Filename" column matches the Serumpresetdb.dat column "PresetDisplayName" (FILE: Serumpresetdb.dat, TABLE: Serumpresettable).

Copy from Mediabay3.db only rows whose "MediaType" column is "VstPreset".

Copy these columns from Mediabay3.db (FILE: Mediabay3.db, TABLE: media):

"MediaRating"
"MediaComment"
"MediaComment"
"MusicalCategory"

into these columns of Serumpresetdb.dat:

"Rating"
"Instrument"
"Category"
"Description"

Again, only where the Cubase db column "Filename" matches the Serum db column "PresetDisplayName" (FILE: Serumpresetdb.dat, TABLE: Serumpresettable).
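Assuming both files are SQLite databases, the whole copy can be done with ATTACH plus UPDATE ... FROM (SQLite 3.33+). The source-to-target column pairing below is my guess from the order listed above; adjust as needed:

```sql
-- Run against the Serum database: sqlite3 Serumpresetdb.dat
ATTACH DATABASE 'Mediabay3.db' AS mb;

UPDATE Serumpresettable AS s
SET Rating      = m.MediaRating,
    Instrument  = m.MediaComment,
    Category    = m.MediaComment,
    Description = m.MusicalCategory
FROM mb.media AS m
WHERE m.Filename  = s.PresetDisplayName   -- use LIKE '%' || s.PresetDisplayName
  AND m.MediaType = 'VstPreset';          -- if Filename carries a path/extension

DETACH DATABASE mb;
```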
I published a new video: Compiler Applications to Query Processing. The presentation has three main themes: (1) query interpretation, (2) query compilation / code generation, and (3) an overview of the SIGMOD 2024 paper Query Compilation Without Regrets, which tries to reconcile the two. I have a more detailed outline and other comments in the description of the video.
P.S. I tried to post this twice but it kept getting deleted. I'm not sure why... As far as I can tell, this is related to databases. For example, almost all the papers I mention in the presentation are published in top DB venues.
Hi everyone! I’m working with an SQL database containing hundreds of construction products from a supplier. Each product has a specific name (e.g., Adesilex G19 Beige, Additix PE), and I need to assign a general product category (e.g., Adhesives, Concrete Additives).
The challenge is that the product names are not standardized, and I don’t have a pre-existing mapping or dictionary. To identify the correct category, I would typically need to look up each product's technical datasheet, which is impractical given the large volume of data.
My SQL table currently looks like this:
| product_code | product_name |
| --- | --- |
| 2419926 | Additix P bucket 0.9 kg (box of 6) |
| 410311 | Adesilex G19 Beige unit 10 kg |
I need to add a column like this:
| general_product_category |
| --- |
| Concrete Additives |
| Adhesives |
How can I automate this categorization without manually checking every product's technical datasheet? Are there tools, Python libraries, or SQL methods that could help with text analysis, pattern matching, or even online lookups?
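One possible starting point is a keyword-to-category map plus a fallback for manual review; fuzzy matching or an ML classifier can mop up the long tail. A sketch in Python (the keyword table is a hypothetical example):

```python
# Hypothetical keyword -> category map built from known product families
KEYWORD_CATEGORIES = {
    "additix": "Concrete Additives",
    "adesilex": "Adhesives",
}

def categorize(product_name: str) -> str | None:
    """Return the first category whose keyword appears in the product name."""
    name = product_name.lower()
    for keyword, category in KEYWORD_CATEGORIES.items():
        if keyword in name:
            return category
    return None  # leave unmatched rows for manual review / fuzzy matching

print(categorize("Additix P bucket 0.9 kg (box of 6)"))  # Concrete Additives
print(categorize("Adesilex G19 Beige unit 10 kg"))       # Adhesives
```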
Any help or pointers would be greatly appreciated! Thanks in advance 😊
Straight to the point: I'm a fairly seasoned, yet hobbyist, programmer. This generally comes with strangely placed knowledge gaps here and there, accompanied by days and days of 1 to 200 open Chrome tabs exploring the "rabbit hole of the month" far and wide. I wanted to throw this one out there: got any beautiful gems of video presentations or walk-throughs on the C-programming-level design and implementation details of these databases? Not too interested in the query language, but in the pedal-to-the-metal of queues, locks, reads, writes, management, compaction, etc., in all their glorious guts? Super bonus points for cool info on geographically spread systems with amazing wizard-like powers to stay in sync and somehow know when 50k stadium tickets sell out in a matter of minutes.
This stuff could fill a lifetime, I know, but I'd really like to lift the veil on some of it. And you know, sometimes there is just that one amazing video out there, buried in a sea of AI-generated hyper landfill, that you'd never find on luck alone 😁
Edit: made a typo in the title, meant RDBMS
Every time I see an article on double-entry accounting in databases, I get conflicting information on how it should be implemented.
The first method I see is two entries into the transactions table, which has a fk relation to the journal. They both show the same amount, but one is a debit and the other a credit, and there is just one account fk per line. Together they make up the whole transaction.
The second method is a single row in a transactions table with "to" and "from" account FK columns. While it doesn't look like old-fashioned DEA, it still stores all the information and is automatically balanced. You can always present either direction as a debit or credit in the front end.
The only downside I see is the single line transaction can only represent a to-from, not a 3 or more part transaction. Tigerbeetle recommends getting around that by creating a temporary account and adding extra rows that are linked by a journal Id, sort of a combination of both methods.
Why, then, do I see so many decry the from-to transaction table? I don't see what information it misses. I understand the benefit in a manual bookkeeping context, but in the storage layer it seems redundant to make two entries.
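For concreteness, a sketch of the classic two-row form (method 1), with names of my own choosing; a signed amount column stands in for separate debit/credit columns:

```sql
CREATE TABLE journal_entry (
    id          INTEGER PRIMARY KEY,
    entry_date  DATE NOT NULL,
    description TEXT
);

CREATE TABLE posting (
    id         INTEGER PRIMARY KEY,
    journal_id INTEGER NOT NULL REFERENCES journal_entry(id),
    account_id INTEGER NOT NULL,
    -- signed: positive = debit, negative = credit;
    -- all postings for one journal_id must sum to zero (enforced by the app
    -- or a trigger), which is what keeps 3+ part transactions balanced
    amount     NUMERIC(14, 2) NOT NULL
);
```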
Hi everyone,
Hope you’re all doing well!
I’ve been working on a project for the past few months and trying out different approaches to permission systems. Thought I’d give you a quick rundown and get your thoughts on it.
So, I’ve got a website that stores events and organisations of all sorts (shops, associations, communities, etc.). Each organisation has its members, and every member has an organisational role. These roles are tied to a permissions table (organization_role_permissions), which links resource_permission to the organisational roles. Basically, it’s an RBAC (Role-Based Access Control) setup.
For events, it’s quite similar, users are assigned roles within the event, and each role comes with a set of permissions. When an event is created, the system automatically creates roles like Owner, Admin, and Moderator with their respective permissions.
So, in essence, I’ve got two RBAC systems (one for organisations and one for events).
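For reference, a minimal sketch of how I read the organisation side; any table or column not named above is my guess:

```sql
CREATE TABLE organization_roles (
    id              SERIAL PRIMARY KEY,
    organization_id INT  NOT NULL,
    name            TEXT NOT NULL            -- e.g. Owner, Admin, Moderator
);

CREATE TABLE resource_permissions (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL                       -- e.g. 'event.create'
);

-- the linking table mentioned above
CREATE TABLE organization_role_permissions (
    role_id       INT NOT NULL REFERENCES organization_roles(id),
    permission_id INT NOT NULL REFERENCES resource_permissions(id),
    PRIMARY KEY (role_id, permission_id)
);
```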
Now for the tricky bits:
I feel like I’ve been overthinking this a lot recently, and I’d really appreciate your opinions or suggestions on how best to handle it.
Thanks a ton!
I have an app with the following structure:
- Company has many Clients
- Client has many Projects
- Project has many Tasks
- User belongs to a Company, and can only access/edit/delete tasks associated with that same company

I need to ensure that users can only access resources (like tasks) that belong to their company. I'm considering two main approaches:

1. Add company_id to all related tables (e.g., tasks, projects, clients). This would allow quick authorization checks by comparing company_id directly, reducing the need for joins when querying.
2. Keep only the hierarchy (task → project → client → company) and enforce access through it, resulting in complex joins but no redundant data.

In my opinion, Option 1 feels better because I can straight away check whether a user can edit a task, instead of joining tasks with project and client and then checking the company_ids of both.
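For comparison, the two authorization checks might look like this (column names assumed):

```sql
-- Option 1: denormalized company_id directly on tasks
SELECT 1
FROM tasks
WHERE id = :task_id
  AND company_id = :user_company_id;

-- Option 2: walk the hierarchy with joins
SELECT 1
FROM tasks t
JOIN projects p ON p.id = t.project_id
JOIN clients  c ON c.id = p.client_id
WHERE t.id = :task_id
  AND c.company_id = :user_company_id;
```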
Would there be significant performance or maintainability trade-offs with each approach? Which method would you recommend and why?
Thanks in advance for your insights!
I've been using SQLite for this project, which is to manage a company's inventory and warehouse. I chose SQLite for this C# application because it works easily and it is embedded. But the database constantly gets locked. This problem rarely happened at the start; now, due to the volume of queries, the app crashes before getting any work done.
This is my CreateConnection method:
static SQLiteConnection CreateConnection()
{
SQLiteConnection sqlite_conn;
try
{
sqlite_conn = new SQLiteConnection("Data Source=database.db; Version=3;New=False; Compress=True;");
sqlite_conn.Open();
return sqlite_conn;
}
catch (Exception ex)
{
Console.WriteLine("Connection failed: " + ex.Message);
return null;
}
}
These are the two methods that I'm calling:
public void TestExecuteNonQuery(string query)
{
SQLiteConnection connw = null;
if (connw != null)
{
Console.WriteLine("connw is not null execute");
connw = CreateConnection();
}
if (connw == null)
{
Console.WriteLine("connw is null execute");
connw = CreateConnection();
}
try
{
SQLiteCommand sqlite_cmd = connw.CreateCommand();
sqlite_cmd.CommandText = query;
sqlite_cmd.ExecuteNonQuery();
}
catch (Exception ex)
{
Console.WriteLine("Command failed execute non query: " + ex.Message);
Console.WriteLine(query);
}
finally
{
connw.Dispose();
}
}
public int get_latest_node_id_tree_exp(String tablename)
{
int lastid = 0;
int count = 0;
Console.WriteLine(lastid);
try
{
if (conn != null)
{
Console.WriteLine("conn is not null select");
conn = CreateConnection();
}
if (conn == null)
{
Console.WriteLine("conn is null select");
conn = CreateConnection();
}
string cql = "SELECT COUNT(*) FROM " + tablename + ";";
SQLiteCommand sqlite_cmd = new SQLiteCommand(cql, conn);
SQLiteDataReader sqlite_datareader = sqlite_cmd.ExecuteReader();
if (sqlite_datareader.Read() && !sqlite_datareader.IsDBNull(0)) // Check for null result
{
count = sqlite_datareader.GetInt32(0);
Console.WriteLine("count = " + count);
}
if (count > 0)
{
string sql = "SELECT id FROM " + tablename + " order by id desc limit 1;";
sqlite_cmd = new SQLiteCommand(sql, conn);
sqlite_datareader = sqlite_cmd.ExecuteReader();
Console.WriteLine(sql);
if (sqlite_datareader.Read() && !sqlite_datareader.IsDBNull(0)) // Check for null result
{
lastid = sqlite_datareader.GetInt32(0);
Console.WriteLine("last id1 = " + lastid);
}
}
}
catch (Exception ex)
{
Console.WriteLine("Error while fetching the last ID: " + ex.Message);
}
conn.Dispose();
Console.WriteLine("last id = " + lastid);
return lastid;
}
This is the OnClick function:
private void button5_Click(object sender, EventArgs e)
{
DBManager db = new DBManager();
Console.WriteLine("exe1");
db.TestExecuteNonQuery("insert into sampletable values('minu',9,3)");
Console.WriteLine("exe2");
db.TestExecuteNonQuery("insert into sampletable values('meow',9,3)");
Console.WriteLine("exe3");
Console.WriteLine(db.get_latest_node_id_tree_exp("tree"));
}
When I press the button the first time, everything executes properly. But when I click the button a second time, it just doesn't work and shows: "Database is Locked".
I've tried many things: conn.Close(), conn.Dispose(), making conn a global variable, a static, a local variable, and more. How can I prevent this database locking issue? Is it a problem to create multiple connections, or should I just reuse one connection? Please help; I need to submit this project to my company soon.
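For what it's worth, in the code as posted the SQLiteCommand and SQLiteDataReader objects are never disposed, and a still-open reader keeps the file locked for the next connection; that is a very common cause of "database is locked". A minimal sketch of the same two operations with using blocks (not the exact schema, and a MAX(id) query replaces the two-step count-then-select):

```csharp
public void TestExecuteNonQuery(string query)
{
    // 'using' guarantees connection and command are disposed even if the
    // command throws, which releases SQLite's file lock for the next caller
    using (var conn = new SQLiteConnection("Data Source=database.db;Version=3;"))
    {
        conn.Open();
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = query;
            cmd.ExecuteNonQuery();
        }
    }
}

public int GetLatestId(string tablename)
{
    using (var conn = new SQLiteConnection("Data Source=database.db;Version=3;"))
    {
        conn.Open();
        // IFNULL(MAX(id), 0) covers the empty-table case in one query
        using (var cmd = new SQLiteCommand(
            "SELECT IFNULL(MAX(id), 0) FROM " + tablename + ";", conn))
        {
            return Convert.ToInt32(cmd.ExecuteScalar());
        }
    }
}
```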
I'm trying to understand when it's best to add indexes to columns that are frequently queried, and I came across some guidelines during my research. Can anyone provide more insights or confirm these ideas?
Here’s what I found:
What do you think?
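For whatever it's worth, the usual rule of thumb is to index columns that appear in frequent WHERE, JOIN, or ORDER BY clauses, and to weigh that against the write overhead each index adds. A hypothetical example:

```sql
-- Speeds up: SELECT * FROM orders WHERE customer_id = ? ORDER BY created_at DESC
CREATE INDEX idx_orders_customer_created
    ON orders (customer_id, created_at);
```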
Hi everyone,
I am currently working on a database project and ran into an issue in the design that I can't figure out for the life of me. Would appreciate some assistance.
I have three tables relevant to this question: OWNER, ANIMAL, and INSURANCE. Every Animal must have an owner, and every Insurance ID also falls under an owner.
An Owner can have multiple Animals. An Owner does not need to have Insurance, but if they do, that insurance falls under a specific Animal.
My problem is that if I make a relationship between INSURANCE and ANIMAL to see which animal the insurance falls under, I think I will run into a CR error between the three tables. However, I can't think of another way to view the Animal under Insurance.
I have looked into a bridge table but don't understand how that would fix the issue since it just seems like a longer CR to me.
Any insight?
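One way to avoid the cycle, sketched under the assumption that a policy covers exactly one animal: let INSURANCE reference ANIMAL only, and reach the Owner through the animal, so the owner is never stored twice:

```sql
CREATE TABLE owner (
    owner_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);

CREATE TABLE animal (
    animal_id INTEGER PRIMARY KEY,
    owner_id  INTEGER NOT NULL REFERENCES owner(owner_id)
);

-- Insurance hangs off the animal; its owner is derivable via animal.owner_id,
-- so there is no second owner path that could contradict the first
CREATE TABLE insurance (
    insurance_id INTEGER PRIMARY KEY,
    animal_id    INTEGER NOT NULL REFERENCES animal(animal_id)
);
```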
So I am creating an eBay-style DB where sellers can create products, and for each product they need to pay before starting an auction. Buyers also need to pay before being able to bid (once), but if unsuccessful (either seller or buyer) I release the money back to them.
Would this SIMPLE transactions table suffice? (I just realised I should also add a product_id so I can associate the transaction with a product; originally I thought I could add it to NOTES.)
```
CREATE TABLE `transactions` (
...
`type` enum('deposit','withdrawn','used') COLLATE utf8mb4_unicode_ci NOT NULL,
....
```
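A slightly fuller sketch of what that table might need; the elided columns are my guesses, and a separate 'released' type makes refunds distinguishable from withdrawals:

```sql
CREATE TABLE `transactions` (
  `id`         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `user_id`    BIGINT UNSIGNED NOT NULL,
  `product_id` BIGINT UNSIGNED NULL,     -- ties a hold/release to its auction
  `type`       ENUM('deposit','withdrawn','used','released')
               COLLATE utf8mb4_unicode_ci NOT NULL,
  `amount`     DECIMAL(12,2) NOT NULL,
  `created_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
```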
What databases are people running on mainframes? There are many makes (IBM, Oracle, Fujitsu, etc.), but what is the DB running across all of them?
Or is it just too specialised to the hardware?
Can anyone be running PostgreSQL across a mainframe, for example?!
I have Windows Server OS and I want to use Microsoft SQL Server, but it is very expensive. I'm currently working on a setup where three PCs are connected to a shared Synology device. All three machines access a shared folder that contains code files (primarily in Visual Studio) and a SQLite database. The code, executed directly from this shared folder, reads and writes to the database thousands of times per hour. Additionally, I'm using Python threading and integrating the database with a Telegram bot.
I’ve been experiencing frequent issues, including database locks and other access-related problems. I’m looking for advice on more robust database solutions or approaches. Specifically:
Any insights or guidance on how to transition away from my current setup, while retaining functionality, would be greatly appreciated.
I am asking this again because I want to know which one you think is the better choice. Also, I heard that hosting the database on my Windows Server OS is better than on my Synology. Is that true? Thank you!
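If SQLite stays in the picture, two mitigations usually help with "database is locked": a busy timeout and WAL mode. A sketch in Python; note, though, that the SQLite docs advise against putting the database file on a network share at all, and WAL in particular does not work reliably over SMB/NFS:

```python
import sqlite3

# Wait up to 30 s for a competing writer instead of failing immediately
conn = sqlite3.connect("inventory.db", timeout=30)

# WAL lets readers proceed while one writer writes (single local disk only)
conn.execute("PRAGMA journal_mode=WAL;")
conn.execute("PRAGMA busy_timeout=30000;")
```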
There is a new database with a truly unique design!
https://medium.com/@sharikrishna1990/a-look-at-aurora-dsqls-architecture-93a5dbc3b856
Firstly apologies if I am asking a stupid question or indeed maybe asking in the wrong place, I am quite new to the data world!
The company I work for acquired another business 2 years ago, who at the time were using a different ERP system to us. In September last year (just before I joined the business) they switched to use the same one as us and all their data from that point on is on a new database associated with the ERP system, which I have access to. Everything prior to this is still on the old database so on the system it looks like there were no sales etc before September 2023.
Is that normal?? I would have thought that it should be added to the new database or linked to feed into the ERP system or something.
Neither company has historically been at all data driven but my role has slowly been evolving into business analysis for the original company, and I am now being asked to do analysis for the other company as well. I need to speak to our IT manager to request access to the old database but I just want to understand what is standard practice here because he is famously not very helpful. All their suppliers and customers have new codes matching original company's scheme, plus new analysis codes etc etc so presumably somewhere there must be tables matching them up which he should be able to share with me?
I've been using Power BI to create dashboards for original company which I want to replicate for new company, so I'm assuming the solution is to pull in data from the old database and map it to the new codes... Just a bit daunted at having to face another database with a whole new schema when I have just got to grips with the first one
Rant: I have a custom development vendor polluting all my custom applications with their company name as the prefix to the tables. What? Not a good idea, for several real reasons. What are your thoughts? Is this a new-age "marking my territory" move I need to be aware of? Seems so unprofessional.
I have a Postgres DB where there are multiple tables (call them sources) from which I want to copy data to other relevant tables (call them destinations) with an almost identical structure (same column names and types; the sources have more columns). I also want to copy the references as well. What would be best practice? I was thinking of creating a function that takes the source and destination table names and copies the data from only the specified columns.
I need to perform this operation multiple times a day. The tables can have up to 300k records.
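A sketch of the per-table copy under those assumptions (explicit column lists, generic names; ON CONFLICT assumes a unique constraint on id so the job can rerun safely several times a day):

```sql
-- Copy only the shared columns; extra source columns are simply not selected.
-- Parent tables must be copied before children so the FK references resolve.
INSERT INTO destination_table (id, name, parent_id)
SELECT id, name, parent_id
FROM source_table
ON CONFLICT (id) DO NOTHING;
```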
I'm developing an enterprise-grade warehouse management application for a chemistry laboratory. A critical feature involves mixing multiple stock solutions to create new solutions based on predefined formulas. However, due to inherent measurement errors during the mixing process, discrepancies arise between real-world measurements and digital data, and I am facing some challenges that I do not know exactly how to solve.
One of the key use cases is creating a new solution from existing stock materials. Specifically, users select one of the predefined solutions from our database and specify the desired quantity to produce. The system then calculates the required materials automatically, with a 1% tolerance. For example, to create 10 L of Dissolution A, the system displays:
NaCl 10.00 ± 0.1 g
Glucose 10.00 ± 0.1 g
NaOH 1M 100.00 ± 1 mL
Users then annotate how much of these materials (NaCl, Glucose and NaOH) they used and confirm the creation of the new Dissolution A, so the quantities of those materials are subtracted from stock and the new solution is added. However, due to measurement inaccuracies and human error, some of these scenarios can happen:
What are the best architectural practices and engineering techniques to manage these types of situations? Any strategies, design patterns, or validation techniques that could address these challenges effectively, or that you know are applied in real-world systems, would be appreciated!
Thank you in advance!
P.S.: I am not a native English speaker, so I used ChatGPT to help me write this question. Sorry if it seems a bit robotic.
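One data-level idea, offered as a hedged sketch with names of my own: store the theoretical amount next to the measured amount, so the physical/digital discrepancy is recorded rather than hidden, and let the application flag anything outside the 1% tolerance:

```sql
CREATE TABLE mix_consumption (
    id              INTEGER PRIMARY KEY,
    batch_id        INTEGER NOT NULL,          -- this Dissolution A production run
    material_id     INTEGER NOT NULL,
    planned_amount  NUMERIC(12, 4) NOT NULL,   -- e.g. 10.00 (g NaCl)
    measured_amount NUMERIC(12, 4) NOT NULL,   -- what the user actually weighed
    CHECK (measured_amount > 0)
);
```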
My current design.
So currently I'm working on a quiz system.
The rules are like this:
Users (students) attempt a topic for the quiz, and the user's answer history and results are saved.
There will be multiple topics, with questions for each topic.
Admins can create multiple topics and lessons for users to learn.
Let's say Topic 1 has around 10 questions, and Topic 2 has 20 questions.
How do I store the user's answers in the table? How do I connect them, and is my current diagram correct? What tables should I add or remove?
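A common shape for this, as a sketch (assuming users, topics, and questions tables already exist): one row per quiz sitting, plus one row per answered question, which works whether a topic has 10 questions or 20:

```sql
CREATE TABLE quiz_attempt (
    id       INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(id),
    topic_id INTEGER NOT NULL REFERENCES topics(id),
    score    INTEGER,
    taken_at TIMESTAMP NOT NULL
);

CREATE TABLE attempt_answer (
    attempt_id  INTEGER NOT NULL REFERENCES quiz_attempt(id),
    question_id INTEGER NOT NULL REFERENCES questions(id),
    chosen      TEXT,                 -- the answer the student picked
    is_correct  BOOLEAN,
    PRIMARY KEY (attempt_id, question_id)
);
```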
Aside from popular ones like Supabase, Neon, and Aiven, are there other popular DB hosting sites? Recently I have been facing an issue with the statement cache and I don't know how to resolve it with Supabase and Neon. Aiven provides a really slow DB, with one-to-two-second latency.
TLDR: Is making sure I can max out read speed across all PCIe slots as simple as going with Intel and making sure the PCIe buses aren't shared?
I'm going to be building a database (probably with Elastic) that will need a lot of read speed, and if I ever face the question of whether to add another node for speed, I want to do so knowing I've gotten the most out of the first node. I'm assuming this also involves making sure my queries hit shards spanning as many PCIe slots as possible, so that a single PCIe slot doesn't bottleneck the read speed.
I noticed on my AMD computer that if I add too many disks and USB devices, connectivity issues pop up. Sometimes the USB devices disconnect, or my mouse/keyboard becomes jittery. I'm assuming these issues would also show up in the database context I described. I ran across a line from the YouTuber Coreteks that made me think this might just be an AMD issue, at least when sticking to desktop-type hardware; he said this of Arrow Lake:
"It could be a really good option for those who stress the io on their systems, populating all m.2 slots and maximizing usb usage with external devices and so on, both am4 and am5 on the AMD side have been terrible in this regard. Once you start saturating the IO on AMD's platforms, the system completely shits the bed, becoming unstable and dropping connections left and right."
So if I go with an Intel build and make sure the PCIe slots all have their own dedicated IO, is that pretty much all there is to making sure I can max read from them all at the same time? Are there any other considerations?
Hello everyone. I've been trying for more than a week to create the E-R diagram for the relationships I'll explain below, but I still don't have a clear understanding. I don't know if it's a lack of foundation on my part, or if the situation is genuinely ambiguous. The thing is, I've created my first task management application, and it's working, but the cardinalities in the E-R diagram are still unclear to me.
Let me explain: there are 3 tables: USERS, TASK, and TASK_USERS.
This is the schema I created: https://imgur.com/a/gwRA1LT, but I think it’s wrong because I believe the relationships between USERS-TASK_USERS and TASK-TASK_USERS, depending on how you look at it, should both be N:M, right?
Honestly, I’m confused, so if anyone could help, I’d appreciate it.
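For what it's worth, with a junction table the N:M relationship lives between USERS and TASK as a whole; each individual edge to TASK_USERS is 1:N. A sketch:

```sql
CREATE TABLE task_users (
    user_id INTEGER NOT NULL REFERENCES users(id),  -- USERS 1:N TASK_USERS
    task_id INTEGER NOT NULL REFERENCES task(id),   -- TASK  1:N TASK_USERS
    PRIMARY KEY (user_id, task_id)                  -- together: USERS N:M TASK
);
```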