4,802 Subscribers

YouTrack is working on a binary-compatible fork of OrientDB

A hybrid graph and object-oriented database, written in Java.

GitHub - https://github.com/youtrackdb/youtrackdb

Roadmap - https://youtrack.jetbrains.com/articles/YTDB-A-3/Short-term-roadmap

1 Comment

2025/01/14
04:48 UTC

Old but Gold: Introduction to NoSQL • Martin Fowler

0 Comments

2024/11/20
09:10 UTC

Non-relational database that stores its data in a single file, similar to SQLite?

Despite SQLite's simplicity and drawbacks, one of its perks is that everything is stored in a single file. Since it’s accessed via a file path (instead of over localhost) and stored in that one file, it can be committed directly to the repo of small or example projects.

Is there any non-relational equivalent to SQLite that stores its data in a single file (or folder) so it can be easily added to application repositories?

I did a quick search but can't seem to word the question in a way that doesn't just return traditional SQL databases as alternatives, or non-relational databases that don't meet the single-file criterion.

14 Comments

2024/09/29
19:26 UTC

Help needed with understanding the concept of a wide-column store

Hello,

I'm learning about NoSQL databases and I'm struggling to understand the advantages and disadvantages of wide-column stores and how they're laid out on disk. I read a few articles, but they didn't help much. I thought it might help to translate the concept into data structures in the language I know (C++), so that I could get the basics and then build my knowledge on top of that. I asked ChatGPT to help me with that and this is what it produced. Can you tell me whether it's correct?

For those not familiar with C++:
using a = x - introduces an alias "a" for "x"
std::unordered_map<key, value> - a hash map
std::map<key, value> - a balanced binary search tree, sorted by key


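```cpp
#include <map>
#include <string>
#include <unordered_map>

using ColumnFamilyName = std::string;
using ColumnName = std::string;

using RowKey = int;
using Value = std::string;

// A column maps row keys to values (hash map, unordered).
using Column = std::unordered_map<RowKey, Value>;

// A column family is a map from column name to column, sorted by name.
using ColumnFamily = std::map<ColumnName, Column>;

// The whole store maps column family names to column families, sorted by name.
using WideColumnStore = std::map<ColumnFamilyName, ColumnFamily>;

WideColumnStore db;
```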

My observations:

Data is stored on disk laid out by column family.
Accessing data from within a single column family is cheap - optimized for queries like "give me all people named John".
Accessing all the data bound to a given row key is expensive (it requires extracting nested data) - poor performance on queries like "give me all the details (columns and values) about John Smith, who is identified by RowKey 123".

Are the observations correct? Is there anything else that could help me grasp this concept, or that I should be aware of?

I would greatly appreciate any help.
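For comparison, here is a minimal C++ sketch of the layout described in Google's Bigtable paper and commonly used to explain HBase: within a column family, data is a map sorted by row key, and each row stores only the columns it actually has. This is a simplification under stated assumptions (no timestamps or versions, no persistence, no partitioning across nodes), and every name in it is hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Simplified model (assumption): no versions, no persistence, single node.
using RowKey = std::string;
using ColumnName = std::string;
using Value = std::string;
using Row = std::map<ColumnName, Value>;               // sparse: only existing columns
using ColumnFamilyData = std::map<RowKey, Row>;        // rows sorted by row key
using Table = std::map<std::string, ColumnFamilyData>; // family name -> its rows

// Reading a whole row within one family is a single lookup by row key.
const Row* getRow(const Table& t, const std::string& family, const RowKey& key) {
    auto f = t.find(family);
    if (f == t.end()) return nullptr;
    auto r = f->second.find(key);
    return r == f->second.end() ? nullptr : &r->second;
}

// Scanning a contiguous key range [from, to) is cheap because rows are sorted.
std::vector<RowKey> scanRows(const Table& t, const std::string& family,
                             const RowKey& from, const RowKey& to) {
    std::vector<RowKey> keys;
    auto f = t.find(family);
    if (f == t.end()) return keys;
    for (auto it = f->second.lower_bound(from); it != f->second.end() && it->first < to; ++it)
        keys.push_back(it->first);
    return keys;
}

// Filtering by a column *value* (e.g. name == "John") has no index here,
// so every row in the family has to be visited.
std::vector<RowKey> findByValue(const Table& t, const std::string& family,
                                const ColumnName& col, const Value& v) {
    std::vector<RowKey> keys;
    auto f = t.find(family);
    if (f == t.end()) return keys;
    for (const auto& [key, row] : f->second) {
        auto c = row.find(col);
        if (c != row.end() && c->second == v) keys.push_back(key);
    }
    return keys;
}
```

In this layout, fetching everything about row key 123 is one lookup and key ranges scan cheaply, while "all people named John" requires a full scan of the family unless a separate index is maintained.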

0 Comments

2024/09/10
06:04 UTC

Need a suggestion

I have inherited quite a lot of MS Access databases that do the same thing with minor differences (e.g. activities are tracked differently for each customer/country/project). I can normalize a structure that covers 90-95% of everything, but there is always something else "absolutely mandatory" that must be stored and cannot be left behind.

Where would you suggest a newbie look for a more flexible data structure? I'm asking for a friend who is required to maintain the aforementioned "quite a lot" of MS Access databases.

0 Comments

2024/09/09
11:52 UTC

Do other NoSQL databases have an equivalent of DynamoDB's event stream?

tl;dr: Do other NoSQL databases have an equivalent of DynamoDB's event stream?

The only NoSQL database I've ever used is DynamoDB. In my previous position we mainly used an event-driven architecture and used DynamoDB event streams all over the place to drive these events. It was a very nice way to avoid the dual-write problem.

I find myself interviewing for positions and having to do system design interviews. Since I'm unfamiliar with other NoSQL databases, I always end up using DynamoDB, which I don't love.

Do other NoSQL databases have an equivalent of the DynamoDB event stream?
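The property being relied on is that the data change and its event are recorded atomically, so a consumer can never see one without the other. A minimal in-process C++ sketch of that idea (a store with an ordered change log and a consumer checkpoint) follows; it is not DynamoDB's or any other database's API, and the class and method names are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// One entry in the change log, in commit order.
struct ChangeEvent {
    std::uint64_t seq;      // monotonically increasing sequence number
    std::string key;
    std::string newValue;
};

// Hypothetical store: every write updates the table AND appends to the log
// under the same lock, so the two can never diverge (no dual write).
class StoreWithChangeLog {
public:
    void put(const std::string& key, const std::string& value) {
        std::lock_guard<std::mutex> lock(mu_);
        table_[key] = value;
        log_.push_back({nextSeq_++, key, value});
    }

    // A consumer remembers the last sequence number it processed
    // and asks only for newer events.
    std::vector<ChangeEvent> readSince(std::uint64_t afterSeq) const {
        std::lock_guard<std::mutex> lock(mu_);
        std::vector<ChangeEvent> out;
        for (const auto& e : log_)
            if (e.seq > afterSeq) out.push_back(e);
        return out;
    }

private:
    mutable std::mutex mu_;
    std::map<std::string, std::string> table_;
    std::vector<ChangeEvent> log_;
    std::uint64_t nextSeq_ = 1;
};
```

Databases that expose a comparable ordered feed include MongoDB (change streams) and Azure Cosmos DB (change feed); where none exists, the transactional outbox pattern gets the same guarantee by writing the event row in the same transaction as the data.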

4 Comments

2024/08/12
17:01 UTC

I loved to hate NoSQL (MongoDB in particular).

However, as a JavaScript convert, I can see the lure and the benefits. Considering what you need to do as a dev to store and read some JSON, the differences between a NoSQL and a SQL DB are rather stunning.

A SQL DB will require proper backend APIs, with a dedicated dev or team. You want a field? Yeah, we are going to need the 305b form in triplicate.
Or, if you are the full-stack person doing front end and back end, you’ll need to learn a bunch of SQL and DDL and write a lot of code to manage schema changes. Then you need to redeploy your backend each time you change data queries or the schema (and coordinate that with your team!), or write even more code to make queries and schema dynamic. Then you have to fix and guard against SQL injection.

But the benefits of a SQL DB are real and worth the effort, so why is it so hard?

I decided I wanted a JSON, SQL-like query format and a JSON schema format. No backend recompile, fully dynamic: POST a JSON document to the /api/query endpoint from the client and enjoy the JSON results.

Code and more rants here: https://www.inuko.net/blog/platform_sql_for_web_schema/
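A minimal C++ sketch of the general idea of turning a JSON query into SQL without opening an injection hole: identifiers are checked against the schema, and values only ever become bound parameters. It assumes the JSON has already been parsed into plain structs; the query shape, struct names, and whitelists are hypothetical and not taken from the linked post.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Parsed form of a hypothetical JSON query such as
// {"entity": "contact", "fields": ["name"], "where": [{"field": "city", "op": "=", "value": "Bratislava"}]}
struct Condition { std::string field, op, value; };
struct Query {
    std::string entity;
    std::vector<std::string> fields;  // empty means all columns
    std::vector<Condition> where;     // conditions joined with AND
};
struct SqlStatement {
    std::string sql;
    std::vector<std::string> params;  // bound to the '?' placeholders, in order
};

// Table and column names cannot be bound as parameters, so they are checked
// against whitelists taken from the schema; values only ever become '?'.
SqlStatement toSql(const Query& q, const std::set<std::string>& tables,
                   const std::set<std::string>& columns) {
    static const std::set<std::string> ops = {"=", "<>", "<", "<=", ">", ">="};
    auto checkColumn = [&](const std::string& c) {
        if (!columns.count(c)) throw std::invalid_argument("unknown field: " + c);
    };
    if (!tables.count(q.entity)) throw std::invalid_argument("unknown entity: " + q.entity);

    SqlStatement st{"SELECT ", {}};
    if (q.fields.empty()) st.sql += "*";
    for (std::size_t i = 0; i < q.fields.size(); ++i) {
        checkColumn(q.fields[i]);
        st.sql += (i ? ", " : "") + q.fields[i];
    }
    st.sql += " FROM " + q.entity;
    for (std::size_t i = 0; i < q.where.size(); ++i) {
        const Condition& c = q.where[i];
        checkColumn(c.field);
        if (!ops.count(c.op)) throw std::invalid_argument("unsupported operator: " + c.op);
        st.sql += (i ? " AND " : " WHERE ") + c.field + " " + c.op + " ?";
        st.params.push_back(c.value);
    }
    return st;
}
```

The resulting SQL text and parameter list would then go to the database driver's prepared-statement API, so user-supplied values are never spliced into the SQL string.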

1 Comment

2024/07/11
21:20 UTC

Ecommerce Project - Product Service should use Mongo or MySQL?

I am trying to create a small e-commerce project using Spring Boot. It has various services, such as a User Service (for login/signup), a Product Service to search for products, an Order Service to add to cart, and so on.

I am working on Product Service right now.
I have identified the following models: Product, Category, Brand, Review and Seller.
I am thinking of using MongoDB for all of the models. Is that a good idea?

Here is what my schema looks like:
Product -> Category (M:M relation)
Product -> Brand (M:1 relation)
Product -> Review (1:M relation)
Product -> Seller (M:M relation)

My questions are:

Should I use embedded documents in both directions for Product and Category?
Should I keep Seller in MySQL and just store a reference to the SQL Seller record in the Product collection (Mongo)?
Does my Schema Design look okay? Is something missing?

0 Comments

2024/07/11
13:28 UTC

Advice needed for saving lots of data in the same format

I'm going to save lots of data in the same format: title, description, username, date, and ID number. Records in this format will be saved tens of thousands of times. The main application uses a PostgreSQL relational database. I think a NoSQL database will be more efficient for saving simple, repetitive data. I want to use either DynamoDB or MongoDB. Which one is better for a Python application? Are they significantly faster for the job I have described? I'll save tens of thousands of records in this format and retrieve many of them daily.

2 Comments

2024/06/24
20:38 UTC

I need visuals! NoSQL vs relational SQL

I've read a dozen articles about NoSQL and RDBMSs, and there's a LOT of text about advantages and disadvantages, but I have yet to find any practical side-by-side comparisons, e.g. this is how you do a thing in an RDB, and this is how you do a similar thing in NoSQL. Not one line of code or query. For all I know, any given NoSQL database stores the information on an enormous abacus in Portland, Maine.

"The only way to understand it is to do it." If that's the case, I'm screwed because researching this stuff isn't paid for by the Day Job. I have time to read, not time to write a new app.
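A small side-by-side sketch of the same question ("which orders belong to the customer named Ada?") answered against a relational layout and against a document layout. It is plain C++ over in-memory structs, not a real database; the struct names are hypothetical, and the SQL and MongoDB shell lines in the comments are illustrative equivalents only.

```cpp
#include <string>
#include <vector>

// Relational style: two "tables" linked by customerId, so the query needs a join.
struct CustomerRow { int id; std::string name; };
struct OrderRow { int id; int customerId; double total; };

// SELECT o.id FROM orders o JOIN customers c ON o.customer_id = c.id WHERE c.name = ?
std::vector<int> orderIdsRelational(const std::vector<CustomerRow>& customers,
                                    const std::vector<OrderRow>& orders,
                                    const std::string& name) {
    std::vector<int> ids;
    for (const auto& c : customers)
        if (c.name == name)
            for (const auto& o : orders)
                if (o.customerId == c.id) ids.push_back(o.id);
    return ids;
}

// Document style: each customer document embeds its orders, so the same
// question is answered by finding one document, with no join.
// db.customers.find({ name: "Ada" }, { orders: 1 })
struct Order { int id; double total; };
struct CustomerDoc { int id; std::string name; std::vector<Order> orders; };

std::vector<int> orderIdsDocument(const std::vector<CustomerDoc>& customers,
                                  const std::string& name) {
    std::vector<int> ids;
    for (const auto& c : customers)
        if (c.name == name)
            for (const auto& o : c.orders) ids.push_back(o.id);
    return ids;
}
```

The trade-off shows up in the shapes: the document version reads one record, but answering "all orders over 5.00 across every customer" means opening every customer document, which the relational layout handles with a scan of one table.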

6 Comments

2024/06/11
16:21 UTC

A Novel Fault-Tolerant, Scalable, and Secure Distributed Database Architecture

In my PhD thesis, I have designed a novel distributed database architecture named "Parallel Committees." This architecture addresses some of the same challenges as NoSQL databases, particularly in terms of scalability and security, but it also aims to provide stronger consistency.

The thesis explores the limitations of classic consensus mechanisms such as Paxos, Raft, or PBFT, which, despite offering strong and strict consistency, suffer from low scalability due to their high time and message complexity. As a result, many systems adopt eventual consistency to achieve higher performance, though at the cost of strong consistency.
In contrast, the Parallel Committees architecture employs classic fault-tolerant consensus mechanisms to ensure strong consistency while achieving very high transactional throughput, even in large-scale networks. This architecture offers an alternative to the trade-offs typically seen in NoSQL databases.

Additionally, my dissertation includes comparisons between the Parallel Committees architecture and various distributed databases and data replication systems, including Apache Cassandra, Amazon DynamoDB, Google Bigtable, Google Spanner, and ScyllaDB.

I have prepared a video presentation outlining the proposed distributed database architecture, which you can access via the following YouTube link:

https://www.youtube.com/watch?v=EhBHfQILX1o

A narrated PowerPoint presentation is also available on ResearchGate at the following link:

https://www.researchgate.net/publication/381187113_Narrated_PowerPoint_presentation_of_the_PhD_thesis

My dissertation can be accessed on ResearchGate via the following link: Ph.D. Dissertation

If needed, I can provide more detailed explanations of the problem and the proposed solution.

I would greatly appreciate feedback and comments on the distributed database architecture proposed in my PhD dissertation. Your insights and opinions are invaluable, so please feel free to share them without hesitation.

0 Comments

2024/06/08
01:01 UTC

Aerospike vs. Volt Active Data

Can someone help me understand the difference between these two platforms in the context of real-time data analytics and processing? Seems to me like they share a lot of the same clients, so I'm curious where they might be different vs. very similar.

Thanks in advance for your help!

0 Comments

2024/05/30
14:44 UTC

Drastically reducing the cost of MongoDB Atlas clusters with this tool

For existing users of MongoDB Atlas, you'd agree: it's a brilliant DBaaS with one major drawback: the auto-scaling sucks in a lot of use cases! It is based on hardware utilization thresholds (CPU & RAM), and it can take up to 24 hours to scale you down when your workload drops, so you end up paying for the expensive hardware until then!

I made this little tool called ScaleWithBuddha.com, which allows you to specify a schedule for upgrade and downgrade, so you don't have to pay for the expensive tier for 24 hours a day. This works best for apps with predictable workload, and works alongside all other features of MongoDB Atlas as an add-on.

For example, if your app is used heavily from morning till evening on weekdays, the tool allows you to schedule downgrading in the evening and upgrading again in the morning, repeatable on weekdays. This can help you reduce the cost of MongoDB Atlas Clusters by more than 50% in some cases!

If this interests you, do check out the tool: https://www.scalewithbuddha.com! I'm running a 10% discount with the coupon code REDDIT10.

0 Comments

2024/05/12
06:39 UTC

Exploring Azure Cosmos DB: A Guide to Scalable NoSQL Database Solutions

🚀 Dive into the future of databases with our latest blog on Azure Cosmos DB! 🌐 Discover how this fully managed NoSQL and relational database service can revolutionize your applications through global scalability, massive throughput, and efficient data partitioning. 🌟

🔗 Learn about the key features:

Scalable partitioning (Logical & Physical)
Horizontal scaling for high availability
Global distribution and multi-master replication

🛠️ Plus, get a step-by-step guide on setting up your own Cosmos DB instance!

Perfect for developers looking to elevate their applications to the next level. Check it out now!
https://erwinschleier.medium.com/exploring-azure-cosmos-db-a-guide-to-scalable-nosql-database-solutions-24c5474f74ca

#AzureCosmosDB #NoSQL #DataScalability #CloudComputing #MicrosoftAzure

0 Comments

2024/05/08
14:21 UTC

Redis, MongoDB, Cassandra, Neo4J programming tasks

Hello everyone!

I have a few tasks that I need to complete. However, I am clueless in Python and prefer using R (I do fine, but I'm definitely not the best at it). I don't know where to begin, since programming with databases is different and requires installing the databases. Is there reliable, easy-to-understand information that would let me complete these tasks using R? The tasks are below for reference.

#1 Task: Redis

The program registers video views. For each visited video (with a text identifier), a view is recorded - which user watched it and when. The program must effectively return the number of views of each video. If necessary, return the list of all unique viewers and for each viewer which videos he has watched.

Comment on why specific capabilities are needed to solve parallel data modification problems (why, for example, using a database without such capabilities would not be possible).

Requirements for the task:
a) The program should allow the creation, storage and efficient reading of at least 2 entities (entity - an object existing in the subject area, for example, a car in a car service, a student, a course, a lecture, a teacher in a university). If entities need to be read according to different keys (criteria), the application must provide for efficient reading of such data, assuming that the data may be very large.
b) The task involves modeling a complex data modification problem that would cause data anomalies in a typical key-value database.
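A C++ sketch of the data layout Task #1 points toward, using an in-memory stand-in for Redis. Each method mirrors one real Redis command (INCR, SADD, SMEMBERS, RPUSH, LLEN), and the single lock imitates Redis executing one command at a time, which is why concurrent INCRs cannot lose updates the way a client-side "GET, add 1, SET" can. The FakeRedis class and the key names ("views:", "viewlog:", "viewers", "watched:") are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// In-memory stand-in for a Redis server: each method mirrors one command
// and runs atomically under a single lock.
class FakeRedis {
public:
    long long incr(const std::string& key) {                        // INCR key
        std::lock_guard<std::mutex> lock(mu_);
        return ++counters_[key];
    }
    long long get(const std::string& key) {                         // GET key, as an integer
        std::lock_guard<std::mutex> lock(mu_);
        auto it = counters_.find(key);
        return it == counters_.end() ? 0 : it->second;
    }
    void sadd(const std::string& key, const std::string& member) {  // SADD key member
        std::lock_guard<std::mutex> lock(mu_);
        sets_[key].insert(member);
    }
    std::set<std::string> smembers(const std::string& key) {        // SMEMBERS key
        std::lock_guard<std::mutex> lock(mu_);
        return sets_[key];
    }
    void rpush(const std::string& key, const std::string& value) {  // RPUSH key value
        std::lock_guard<std::mutex> lock(mu_);
        lists_[key].push_back(value);
    }
    std::size_t llen(const std::string& key) {                      // LLEN key
        std::lock_guard<std::mutex> lock(mu_);
        return lists_[key].size();
    }

private:
    std::mutex mu_;
    std::map<std::string, long long> counters_;
    std::map<std::string, std::set<std::string>> sets_;
    std::map<std::string, std::vector<std::string>> lists_;
};

// Hypothetical key layout: one entry per way the data must be read.
void recordView(FakeRedis& r, const std::string& video, const std::string& user,
                const std::string& timestamp) {
    r.incr("views:" + video);                             // view count per video
    r.rpush("viewlog:" + video, user + "@" + timestamp);  // who watched and when
    r.sadd("viewers", user);                              // all unique viewers
    r.sadd("watched:" + user, video);                     // videos per viewer
}
```

Each command is atomic on its own, but the four calls in recordView are not atomic as a group; in real Redis they would be wrapped in MULTI/EXEC or a Lua script if they must succeed or fail together.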

#2 Task: MongoDB

Model the database assuming a document data model. Provide a UML diagram of the database model: mark foreign keys (references) with aggregation relations and embedded entities with composition relations (alternatively, an embedded entity can be marked with the stereotype <<embedded>>).

The selected domain must contain at least 3 entities (for example: universities, student groups, students). Choose a situation in which at least one relationship is a reference and at least one requires an embedded entity.

Comment on your choices for: data types, relationships.

Write queries in the program:

A query that retrieves embedded entities (for example, for a bank: all accounts of all customers). If you use a find operation, use a projection so that no unnecessary data is returned.
At least two aggregation queries (e.g. the balances of all bank customers).
Do not use banking for the database.

#3 Task: Cassandra

Provide a physical data model for the Apache Cassandra database (UML). Write a program that implements several operations in the chosen subject area.

Requirements for the domain:

At least a few entities exist
There are at least two entities with a one-to-many relationship
Use cases require multiple queries with different parameters for at least one entity.

For example, in a bank, we store customers, their accounts (one-to-many relationship) and credit cards. We want to search for accounts by customer (find all his accounts) and by account number, we want to search for customers by their customer ID or personal code. We want to search for credit cards by their number, and we also want to find the account associated with a specific card.

In at least one situation, make meaningful use of Cassandra's compare-and-set operations (hint: IF) in an INSERT or UPDATE statement. For example, we want to create a new account with a code only if it does not exist. We want to transfer money only if the balance is sufficient.

Queries must not use ALLOW FILTERING, or indexes that would cause the query to be executed on all nodes (fan-out).
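A C++ sketch of what Cassandra's compare-and-set (lightweight transaction) semantics give you, using an in-memory stand-in table. Each method mirrors the CQL statement in its comment and returns the "[applied]" flag Cassandra reports for conditional statements. The AccountsTable class and the withdraw helper are hypothetical; because CQL does not allow `balance = balance - ?` on regular (non-counter) columns, the withdrawal reads the balance and then applies an `IF balance = <value read>` condition, retrying if another client got there first.

```cpp
#include <map>
#include <mutex>
#include <string>

// In-memory stand-in for a Cassandra table keyed by account number.
class AccountsTable {
public:
    // INSERT INTO accounts (number, balance) VALUES (?, ?) IF NOT EXISTS
    bool insertIfNotExists(const std::string& number, long long balance) {
        std::lock_guard<std::mutex> lock(mu_);
        return rows_.emplace(number, balance).second;  // false if the row exists
    }
    // UPDATE accounts SET balance = ? WHERE number = ? IF balance = ?
    bool updateBalanceIf(const std::string& number, long long expected, long long newBalance) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = rows_.find(number);
        if (it == rows_.end() || it->second != expected) return false;
        it->second = newBalance;
        return true;
    }
    // SELECT balance FROM accounts WHERE number = ?
    long long balance(const std::string& number) const {
        std::lock_guard<std::mutex> lock(mu_);
        return rows_.at(number);
    }

private:
    mutable std::mutex mu_;
    std::map<std::string, long long> rows_;
};

// Withdraw only if the balance is sufficient: read, then compare-and-set on the
// value read. If another client changed the balance in between, the update is
// not applied and we retry with the fresh value.
bool withdraw(AccountsTable& t, const std::string& number, long long amount) {
    for (int attempt = 0; attempt < 5; ++attempt) {
        long long current = t.balance(number);
        if (current < amount) return false;                        // insufficient funds
        if (t.updateBalanceIf(number, current, current - amount)) return true;
    }
    return false;                                                  // gave up under contention
}
```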

#4 Task: Neo4J

Write a simple program for a domain well suited to graph databases.

Model at least a few entities with properties.
Demonstrate meaningful queries:

2.1. Find entities by attribute (eg find a person by personal identification number, find a bank account by number).
2.2. Find entities by relationship (e.g. bank accounts belonging to a person, bank cards linked to accounts of a specific person).
2.3. Find entities connected by deep connections (eg friends of friends, all roads between Birmingham and London; all buses that can go from stop X to stop Y).
2.4. Finding the shortest path by evaluating the weights (e.g. finding the shortest path between Birmingham and London; finding the cheapest way to convert from currency X to currency Y, when the conversion information of all banks is available and the optimal way can be performed in several steps).
2.5. Aggregate data (e.g. like 2.4, only find path length or conversion cost). Don't take the shortest path.

For simplicity, have test data ready. The program should allow you to make queries (say entering city X, city Y and planning a route between them).

Do not model movie or city databases!
Do not print the internal data structures of the Neo4J driver - format the result for the user.
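Requirement 2.4 asks for a weighted shortest path, which is what Dijkstra's algorithm computes for non-negative edge weights; in Neo4j this is typically done with a library procedure (APOC or Graph Data Science) rather than by hand. The C++ sketch below only illustrates the algorithm itself on an in-memory adjacency list; the Graph alias, the stop names, and the weights in the usage test are made up.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Adjacency list: node -> list of (neighbour, non-negative edge weight).
using Graph = std::map<std::string, std::vector<std::pair<std::string, double>>>;

struct PathResult { double cost; std::vector<std::string> nodes; };

PathResult dijkstra(const Graph& g, const std::string& from, const std::string& to) {
    using Item = std::pair<double, std::string>;  // (distance so far, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;  // min-heap
    std::map<std::string, double> dist;
    std::map<std::string, std::string> prev;
    dist[from] = 0.0;
    pq.push({0.0, from});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;  // stale queue entry
        if (u == to) break;         // target settled: its distance is final
        auto it = g.find(u);
        if (it == g.end()) continue;
        for (const auto& [v, w] : it->second) {
            double nd = d + w;
            auto dv = dist.find(v);
            if (dv == dist.end() || nd < dv->second) {
                dist[v] = nd;
                prev[v] = u;
                pq.push({nd, v});
            }
        }
    }
    if (!dist.count(to)) return {std::numeric_limits<double>::infinity(), {}};  // unreachable
    std::vector<std::string> path{to};
    while (path.back() != from) path.push_back(prev[path.back()]);
    std::reverse(path.begin(), path.end());
    return {dist[to], path};
}
```

For the currency variant, rates multiply rather than add, so a common transformation is to use -log(rate) as the weight; those weights can be negative, in which case Dijkstra no longer applies and Bellman-Ford is the usual choice.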

0 Comments

2024/04/20
10:07 UTC

Managing a database of 10 billion records

Hi everyone,

I have a rather unusual project.

I have a file containing 10 billion references, each 40 characters long, and each associated with another reference value of variable length.

I'd like to use an API request to retrieve the value associated with a given reference as fast as possible (ideally in under 0.5 seconds; I know it's possible in around 0.30 seconds, but I don't know how).

Which solution do you think is best suited to this problem? How can I optimize it?

I'm not really an SQL specialist, and I wanted to move towards NoSQL, but I don't have any real idea how to optimize it. The aim is to be as fast as possible without costing €1,000 a month.

The user types in a reference and should get the associated reference almost instantly: all they have to do is submit a reference via the API to retrieve the value associated with it.
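Some arithmetic frames the problem: at 40 bytes per key, 10 billion keys take about 400 GB before any values are stored, so the keys alone are unlikely to fit in RAM on a budget server. Because the keys are fixed-width, one option is a file sorted by key and searched by binary search, which needs about 34 comparisons (log2 of 10^10 is about 33.2), each potentially one random read from disk. The C++ sketch below models that layout in memory (sorted 40-byte keys, an offsets array, one blob of values); the struct and function names are hypothetical, and a real version would memory-map or read the file instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

constexpr std::size_t kKeyLen = 40;  // every reference is exactly 40 characters

// Hypothetical on-disk layout, modeled in memory.
struct SortedIndex {
    std::string keys;                    // N * kKeyLen bytes, sorted
    std::vector<std::uint64_t> offsets;  // N + 1 offsets into `values`
    std::string values;                  // all values concatenated

    std::size_t size() const { return keys.size() / kKeyLen; }
    std::string_view keyAt(std::size_t i) const {
        return std::string_view(keys).substr(i * kKeyLen, kKeyLen);
    }

    // Binary search: about log2(N) key comparisons.
    std::optional<std::string> lookup(std::string_view key) const {
        std::size_t lo = 0, hi = size();
        while (lo < hi) {
            std::size_t mid = lo + (hi - lo) / 2;
            if (keyAt(mid) < key) lo = mid + 1; else hi = mid;
        }
        if (lo == size() || keyAt(lo) != key) return std::nullopt;
        return values.substr(offsets[lo], offsets[lo + 1] - offsets[lo]);
    }
};

// Builds the index from unsorted (key, value) pairs.
SortedIndex build(std::vector<std::pair<std::string, std::string>> items) {
    std::sort(items.begin(), items.end());
    SortedIndex idx;
    idx.offsets.push_back(0);
    for (const auto& [k, v] : items) {
        if (k.size() != kKeyLen) throw std::invalid_argument("key must be 40 characters");
        idx.keys += k;
        idx.values += v;
        idx.offsets.push_back(idx.values.size());
    }
    return idx;
}
```

An embedded key-value store (for example RocksDB or LMDB) applies the same sorted-key principle with caching and compression built in, which may be simpler than maintaining a custom file.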

Many thanks to you

3 Comments

2024/04/04
22:02 UTC

Migrating data from DynamoDB tables into Google Firestore databases

I am working on a startup and we have decided to stop using AWS and start using Google Firebase. I have several small (NoSQL) data tables in DynamoDB that I need to move over to the Firestore (NoSQL) database. I can easily export the data from DynamoDB as JSON, but I don't know how to insert that data into Firestore. I needed this done yesterday, so I could really use some help. Thanks!

0 Comments

2024/04/04
16:49 UTC

Graph Your World on Windows with Apache AGE

Hey r/nosql crew!

🚀 Big news: Apache AGE's Windows installer is here! Making graph databases a breeze for our Windows-using friends. 🪟💫 Download here

Why You’ll Love It:

Easy Install: One-click away from graph power.
Open-Source Magic: Dive into graphs with the robustness of PostgreSQL.

Join In:

Got a cool graph project? Share it!
Questions or tips? Let's hear them!

Let's explore the graph possibilities together!

0 Comments

2024/03/26
20:45 UTC

Apache AGE: Graph Meets SQL in PostgreSQL

Hello r/NoSQL community!

I'm thrilled to dive into a topic that bridges the gap between the relational and graph database worlds, something I believe could spark your interest and potentially revolutionize the way you handle data complexities. As someone deeply involved in the development of Apache AGE, an innovative extension for PostgreSQL, I'm here to shed light on how it seamlessly integrates graph database capabilities into your familiar SQL environment.

Why Apache AGE?

Here's the scoop:

Seamless Integration: Imagine combining the power of graph databases with the robustness of PostgreSQL. That's what AGE offers, allowing both graph and relational data to coexist harmoniously.
Complex Relationships Simplified: Navigate intricate data relationships with ease, all while staying within the comfort and familiarity of SQL. It's about making your data work smarter, not harder.
Open-Source Innovation: Join a community that's passionate about pushing the boundaries of database technology. Apache AGE is not just a tool; it's a movement towards more flexible, interconnected data solutions.

Who stands to benefit? Whether you're untangling complex network analyses, optimizing intricate joins, or simply graph-curious, AGE opens up new possibilities for enhancing your projects and workflows.

I'm here for the conversation! Eager to explore how Apache AGE can transform your data landscape? Got burning questions or insights? Let's dive deep into the world of graph databases within PostgreSQL.

For a deep dive into the technical workings, and documentation, and to join our growing community, visit our Apache AGE GitHub and official website.

0 Comments

2024/03/20
21:01 UTC

How to explain NoSQL concepts to undergraduate kids with very little or no knowledge of SQL

Same as title

2 Comments

2024/02/29
23:11 UTC

Converting SQL peer-group table data to JSON

I’m having trouble determining the best structure for a peer group database and generating a JSON import file from sample data in table format. I’m new to MongoDB and come from an Oracle SQL background. In a relational framework, I would set up two tables: one for peer group details and a second for peers. I already have sample data I would like to load into Mongo, but it is split across those two tables. I’ve heard that I should generally try to create one collection and use embedding, but how would I create that JSON from my tabular sample data? And long term, we want to build an API on this peer data where users can look up either a peer group or an individual peer. Is an embedded structure still the best choice given that requirement? Thanks for any info, tips, or advice!
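A minimal C++ sketch of the "join then nest" step: group the peer rows by group ID and emit one JSON document per peer group with its peers embedded, one document per line, which is the format mongoimport reads by default. The table shapes, field names, and functions are hypothetical, and the string escaping only handles quotes and backslashes.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical shapes of the two relational tables.
struct PeerGroupRow { int groupId; std::string groupName; };
struct PeerRow { int groupId; std::string peerId; std::string peerName; };

// Minimal JSON string quoting: escapes quotes and backslashes only
// (sketch: control characters are not handled).
std::string jsonString(const std::string& s) {
    std::string out = "\"";
    for (char ch : s) {
        if (ch == '"' || ch == '\\') out += '\\';
        out += ch;
    }
    return out + "\"";
}

// One document per group with its peers embedded, one JSON object per line.
std::string toEmbeddedJsonLines(const std::vector<PeerGroupRow>& groups,
                                const std::vector<PeerRow>& peers) {
    std::map<int, std::vector<const PeerRow*>> byGroup;  // the "join" on groupId
    for (const auto& p : peers) byGroup[p.groupId].push_back(&p);
    std::string out;
    for (const auto& g : groups) {
        out += "{\"_id\":" + std::to_string(g.groupId) + ",\"name\":" + jsonString(g.groupName) + ",\"peers\":[";
        const auto& members = byGroup[g.groupId];
        for (std::size_t i = 0; i < members.size(); ++i) {
            if (i) out += ",";
            out += "{\"peerId\":" + jsonString(members[i]->peerId) +
                   ",\"name\":" + jsonString(members[i]->peerName) + "}";
        }
        out += "]}\n";
    }
    return out;
}
```

For the later "look up an individual peer" requirement, an index on the embedded field (peers.peerId) lets MongoDB find the containing group without a second collection, although whether embedding fits still depends on how often peers change and whether a peer can belong to several groups.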

0 Comments

2024/02/08
02:28 UTC

MongoDB vs DynamoDB vs DocumentDB vs Elasticsearch for my use case

Disclaimer: I don't have any experience with NoSQL

Hi, I'm currently developing a fantasy sports web app. A game can have many matches, and each match can have many stats results (let's say a match contains at least 20 rows of stats results, for both Player A and Player B, that will be stored in the database).

That would put one hell of a load on my MySQL database, so I thought of using NoSQL, especially since the structure of the results varies by game type.

I don't really know which one to use, and we are on a budget, so the most cost-effective DB would be preferred. We are on AWS, btw.

10 Comments

2024/01/19
13:59 UTC

Seeking Guidance: Designing a Data Platform for Efficient Image Annotation, Deep Learning, and Metadata Search

Hello everyone!

Currently, at my company, I am tasked with designing and leading a team to build a data platform to meet the company's needs. I would appreciate your assistance in making design choices.

We have a relatively small dataset of around 50,000 large S3 images, with each image having an average of 12 annotations. This results in approximately 600,000 annotations, each serving as both text metadata and images. Additionally, these 50,000 images are expected to grow to 200,000 in a few years.

Our goal is to train Deep Learning models using these images and establish the capability to search and group them based on their metadata. The plan is to store all images in a data lake (S3) and utilize a database as a metadata layer. We need a database that facilitates the easy addition of new traits/annotations (schema evolution) for images, enabling data scientists and machine learning engineers to seamlessly search and extract data.

How can we best achieve this goal, considering the growth of our dataset and the need for flexible schema evolution in the database for efficient searching and data extraction by our team?

Do you have any resources/blog posts with similar problems and solutions to those described above?

Thank you!

0 Comments

2023/12/28
12:51 UTC

MongoDB ReplicaSet Manager for Docker Swarm

I've written this tool out of a need to self-host a MongoDB-based application on Docker Swarm, because file-based shared storage of MongoDB data does not work (Mongo requires a replica set deployment).

This tool can be used with any docker based application/service that depends on Mongo. It automates the configuration, initiation, monitoring, and management of a MongoDB replica set within a Docker Swarm environment, ensuring continuous operation, and adapting to changes within the Swarm network, to maintain high availability and consistency of data.

If anybody finds this use-case useful and wishes to try it out, here's the repo:

MongoDB-ReplicaSet-Manager

0 Comments

2023/12/06
08:06 UTC

Which NoSQL databases use the new SQL++ language for querying?

Hi, I know Couchbase and Apache Asterix use the SQL++ language. But is that it so far? Or are there more?

0 Comments

2023/12/04
16:29 UTC