/r/Database

Photograph via snooOG
  • Data and database centric technologies
  • Open and closed source database systems
  • Related technologies including NOSQL (NotOnlySQL)

  • Related Reddits:


    This is a knowledge sharing forum, not a help, how-to, or homework forum, and such questions are likely to be removed.

    Try /r/DatabaseHelp instead!


    Platforms:


    /r/Database

    59,549 Subscribers

    0

    Had my first introduction into Database design curious about how I could work in my own time and what jobs it leads to.

    1 Comment
    2024/12/03
    20:41 UTC

    2

    Need ‏Help with Functional Dependencies

    Hi everyone, my group and I are struggling to identify and define the functional dependencies. We’re not sure how to properly structure them. If anyone can help us, we’d really appreciate it! Thanks in advance. =)

    1 Comment
    2024/12/03
    17:31 UTC

    1

    Is Excel the best bet for this data base or is there a better option out there?

    Beginner here. I have created an archive for a League of Legends esports league. Currently I'm forced to go into the archives after each season and manually update it on an excel sheet. Does anyone know if there is a better way to store the data? I want to get it to where all I need to do is enter in the Teams and players who played in the most recent season and it would go in to the player profiles and auto update it all.

    https://docs.google.com/spreadsheets/d/1R91Fa6erSAq5htPt2vE7m7rNW-Q9NjXXXAP-fuzUsEw/edit?usp=sharing

    4 Comments
    2024/12/03
    16:35 UTC

    1

    Unify a multiple online search engines

    There are several searchbxes from different online databases that I want to join into a single one. As an example with just two urls, searching the word "cervantes" in each returns as follows:

    https://castillalamancha.ebiblio.es/resources?q=cervantes

    https://galicia.ebiblio.es/resources?q=cervantes

    I've tried google's Programmable Search Engine, adding urls up to the ?q= sequence, but it did not work though.

    What alternatives do I have?

    0 Comments
    2024/12/03
    03:15 UTC

    0

    Vitess, Vstream, and performance

    We have a Vitess cluster on Planetscale but routinely need to scale up the memory and CPU as it grows (no surprise there) but as it gets more complex we are having trouble chasing down performance issues. We recently started utilizing Vstream to do change-data-capture but some of us are worried about performance impact. My intuition tells me that it will have negligible impact on the Vitess cluster itself, and reading the (sometimes incomplete) docs seems to support that, but I figured I would ask -- does anyone have experience with Vitess & Vstream and was the performance impact a consideration when building on it?

    0 Comments
    2024/12/02
    21:54 UTC

    0

    The best database for leaderboards/ranking

    Right now I need to implement a highly loaded ranking system with multi-value sorting. Also, all the values are highly volatile.

    This is what I think at the moment:

    • Redis: of course I know about redis/valkey with its ordered set, but the score value there is 64-bit and I need more to store all the parameters in one value by offsetting them.
    • Postgres/other popular RDBs: I know I can optimize indexing in many ways, but it won't be enough. I need to request the ranks/scores/items very frequently, and RANK or ROW_NUMBER functions are too bad for this purpose.

    I don't have a lot of experience with other databases, maybe someone could recommend me something good for this case? I know it can be realtively easily implemented in Go or something, but I don't want to introduce yet another language into the project.

    9 Comments
    2024/12/01
    18:07 UTC

    2

    TidesDB - High performance, transactional, durable key value store engine (BETA RELEASED!)

    0 Comments
    2024/12/01
    05:31 UTC

    7

    Software developer to DBA

    Hi all,

    I graduated with a software development degree in winter 2023. It took me a year to find my current job, a fullstack developer position. I've been with them for a month now. I felt I have always had a talent for SQL and ever since learning about database management I have only done well. What does the software developer to DBA pipeline in 2024/2025 look like? I looked into certifications and most people online say they aren't worth it if you are already proficient at SQL and utilized them at past jobs. Most of them are oriented towards people with non-technical backgrounds.

    My main goal with becoming a DBA is 1st the money (who isn't?), and 2nd I am always most interested in the database design or querying parts of planning/developing new features, and perhaps I've never been challenged enough but felt like I have had a talent for SQL compared to my peers.

    Sorry if I come off as egotistical, didn't mean that.

    Edit: I will say that with my current position it is an extremely well rounded position because there are no senior developers. There are 3 of us who have each been out of college for a maximum of 2 years and we are responsible for basically the entire organization's programming needs. Its a fairly large organization and we work with code that has been carried through a few generations of programmers.

    5 Comments
    2024/11/30
    20:57 UTC

    7

    Need a simple explanation of 3NF normalization

    lot of terms which i am unsure about such as transitive dependency

    what differentiaties candidate and primary key

    18 Comments
    2024/11/28
    20:56 UTC

    5

    Advice needed: Transitioning from Excel to a database system as a solo data analyst in a small company

    I've been working at a small company for the last few months as their solo data analyst. My predecessor stored everything in Excel, with occasional Power BI reports linked to Excel as the data source. I'm starting to reach my wits' end without a proper database to pull data from or upload new data to. My frequent reports involve manually downloading CSV files from various websites, saving them to data folders, and refreshing Power Queries and Pivot tables.

    In my previous job, I primarily used SQL and Power BI, where we had a setup with all necessary data stored in a database, automatic processes updating the database as new data became available, and auto-refreshes on Power BI to keep reports up to date. However, that company was much larger with dedicated data engineers managing the data warehousing.

    I'm looking for advice on how to transition to shift away from excel. Our data isn't overly complex; I estimate needing only about 10 tables to start. I believe I could put this together over a few months while learning as I go.

    Any advice on tools or what to learn or personal experiences with similar transitions would be greatly appreciated!

    20 Comments
    2024/11/28
    20:34 UTC

    1

    Adivce on obtaining data needed for machine learning project

    Hey!
    I hope the goddess of Fortune is looking after all of you!

    I'm not 100% sure, whether this subreddit is an appropriate one for this type of question. If that's not the case, I apologize to you in advance!

    I'm just starting my machine learning journey by taking the course "Statistical Machine Learning" during my master's. The goal of this project is to apply methods from a paper ( https://pages.cs.wisc.edu/~jerryzhu/pub/zgl.pdf ) either to the same data or to the similar data.

    While trying to obtain data used there, I run into a problem with the price of the data (they want 950$ for it, or for University researchers it's 250$ - I don't think as a student I qualify for this price and even if, it's still way too much ).

    The data I need are the images of the handwritten digits (preferably, but what would also work would be the images of words/letters in Latin alphabet) to analyze them and assign labels to them. The data set I need is rather large - preferably around a thousand images ( more images, the better! ).

    I am stuck - I have no idea, where I could access data sets like this without paying a lot of money. I would be very grateful for any advice for obtaining the datasets for my project/ the datasets itself.

    Thank you in advance!

    0 Comments
    2024/11/28
    18:44 UTC

    2

    Building a Database from Scratch in Go (part 01) - File Manager

    0 Comments
    2024/11/27
    15:56 UTC

    1

    mysql missing after running some commands

    Hi. I'm using mariadb version 10.6 in Ubuntu 20.04. Recently one of my colleagues asked me for access to the server, and I gave it to her using these commands;

    extracted from history

      455  useradd fai
      456  cd /home/
      457  l
      458  ls
      459  useradd -m fai
      460  userdel fai
      461  useradd -m fai
      462  groups
      463  groups workgroup
      464  groups fai
      465  getent
      466  getent group
      467  groups workgroup
      468  usermod -a -G adm sudo
      469  usermod -a -G adm sudo fai
      470  usermod -a -G adm,sudo fai
      471  passwd fai
      472  groups workgroup
      473  usermod -a -G dip,plugdev,lxd fai
      474  usermod -a -G adm,cdrom fai

    Fast forward to today, I wanted to show her how to restore the mariadb database. But a few things have been missing, such as mysql user when I want to run chown -R mysql:mysql /var/lib/mysqland even mysql service is missing. Usually I could just use systemctl stop/start mysql but now I have to use systemctl stop/start mariadb . I have checked and she did not do anything to the server yet (I have her password for now), and this is the only thing I have done to the system since.

    Do you have any idea if the commands I typed caused the issue?

    0 Comments
    2024/11/27
    04:10 UTC

    1

    [PostgreSQL] - Dynamic fields design question

    Hi there!

    I am currently working on an app to help my sports manage members and prospective members, mostly done front-end for the last while so my DB design is a bit rusty.A bit of background first, the way joining the club works is by a waitlist. You put yourself on the waitlist, and we take new members several times a year. Once an new batch is in, they are given an intro class and if they like it then they can become members. I have the following data model to represent this (there is more but this is omitted for brevity):

    https://preview.redd.it/431nh9oirb3e1.png?width=732&format=png&auto=webp&s=b81884aefa87acb90a317b05ffbd829cd48aedf6

    `user_profiles` exists because there is some overlap between the data we collect from the waitlist sign up form with the members profile. If a waitlisted person becomes a member then that data is already in `user_profiles`, the person only needs to be added to the `members` table.

    Now, the issue is that we want to experiment gathering different data points from members and prospective members (i.e how did you hear about us, what is your interest in the sport, etc). These data points might change often as we experiment, and as such I don't think altering these tables is the way to go, as I would need to write a new migration and handling dropping columns for existing data, etc.

    So between researching and asking Claude I have come to the following solution:

    https://preview.redd.it/22if9sbvtb3e1.png?width=732&format=png&auto=webp&s=db6cd8cc6b489687c56d913bb0528edab89f8356

    The idea is as follows:

    • `additional_user_info_schema` would contain a JSON schema stored as `jsonb`. Said JSON schema would contain a schema definition to define a custom field i.e (this is not final just a representation)

    {

    "fieldName": "expectations",

    "type": "list",

    "options": ["Exercise", "Have fun"]

    }

    • The current active schema would be marked as `is_active`, that way I could SSG the front-end and use this schema to know which components to render.
    • Once the user submits these fields, the JSON schema is validated by Postgres (I am using supabase so I have access to `pg_jsonschema`) and stored in the `additional_user_info`.
    • This way we can add new entries ato the additional_user_info_schema table for new experiments, whist keeping the old schema for comparison. And being `jsonb` that means we can query them for analytics, etc.

    Is this a good approach? Am I completely off track? Am I being completely overkill? Keen to hear suggestions, etc. Thanks!

    0 Comments
    2024/11/26
    23:04 UTC

    1

    Help with accessing a WMDB file

    I'm trying to explore some old music data in a ~2003 .wmdb database. No real point other than nostalgia/morbid curiosity about my former musical tastes :)

    My current windows media player does not recognize the file. Any ideas would be appreciated. Thanks!

    4 Comments
    2024/11/26
    16:48 UTC

    9

    Should I stick with postgres or use a graph database?

    Parts of my application is a contacts manager with relations, I need to keep track of who is related to whom, the only reason I am hesitant to switch to neo4j is because the tooling kind of sucks, I am used to drizzle orm and I am not aware of any strong typed orm, I tried graphql ogm but it's lacking in type safety

    I have tried modeling this in postgres but it doesn't seem possible, I am sure it is but I can't think of a way

    I am not concerned about scaling, I am gonna have have 100k contacts at most and search doesn't have to be super fast (for querying with relations)

    13 Comments
    2024/11/26
    07:02 UTC

    0

    Multi-Region Replication: Conflicts and Ordering Issues

    I’m trying to understand how conflicts and ordering issues are handled in a multi-region replication setup. Here’s the scenario: • Let’s assume we have two leaders, A and B, which are fully synced. • Two writes, wa and wb, occur at leader B, one after the other.

    My questions: 1. If wa reaches leader A before wb, how does leader A detect that there is a conflict? 2. If wb reaches leader A before wa, what happens in this case? How is the ordering resolved?

    Would appreciate any insights into how such scenarios are typically handled in distributed systems!

    Is multi-region replication used in any high scale scenarios ? Or leaderless is defecto standard?

    3 Comments
    2024/11/25
    16:14 UTC

    4

    [MySQL] Data Normalization Question - Multiple Possible Types into another Object

    I'm trying to figure out how to model, in the database, a specific concept built around "Ingredients".

    The Middle object in this hierarchy is an Ingredient. An Ingredient can be any one of: Flora(part), Fauna(part), or Fungi(part).

    Initially, I thought to make an IngredientType table that would take FK_Ingredient, and then FK_FloraId, FK_FaunaId, FK_FungiId, and just make the last three each nullable, and rely upon business logic to enforce setting one and only one for a given row.

    However, this doesn't seem the wisest way.

    What is (and why) a smarter way to handle this concept?

    Relationship: Every ingredient *IS A* aspect of a part of Flora, Fauna, or Fungi. But, each ingredient is only one of those. I want to represent this with sound naming and table structuring, that is also logical enough to program against.

    Thank you, in advance for suggestions!

    12 Comments
    2024/11/23
    23:53 UTC

    0

    Need help with an database assignment on normal forms

    I’m having a really hard time understanding how Normal Forms work and what purpose they serve. If anyone could please help me or at least guide me in the right direction, I would be really grateful. I’ve been to all my lectures, I’ve watched YouTube courses and yet I still struggle understanding these seemingly simple topics and have began doubting my understanding and knowledge of everything.

    Maybe I’ve just been unlucky with the courses I’ve been watching or maybe I’m stupid, I don’t know

    10 Comments
    2024/11/23
    22:29 UTC

    1

    Datafusion Tops Clickhouse, DuckDB, and more in Querying Parquet Data

    Fastest Single Node Query Engine For Parquet (Apache Datafusion)

    Apache Datafusion has recently been able to perform faster than huge companies like Clickhouse + DuckDB. I find this quite interesting as from what I see, Datafusion is fully open source and nobody is working on it full time. What are your thoughts?

    0 Comments
    2024/11/21
    19:53 UTC

    4

    Anyone using YugabyteDB in production?

    I'm thinking of proposing YugabyteDB as a geodistributed database with active-active clusters in a SaaS project. Has anyone already used it in production? How does it compare to CockroachDB?

    2 Comments
    2024/11/20
    16:43 UTC

    1

    Oracle DB stuck at Installer

    Tried to uninstall, restart and installed Oracle 21c but it keeps on getting stuck at the Installer page. The logs say "Checking whether the IP address of the localhost could be determined..."

    https://preview.redd.it/6ynciq0w622e1.png?width=1576&format=png&auto=webp&s=912a5f60ae362baa22ee79d4f1b47f3cb8267727

    7 Comments
    2024/11/20
    13:31 UTC

    Back To Top