/r/cassandra

2,683 Subscribers

1

JSON query builder for Cassandra

I am creating an application where users can define their own queries. To avoid bad queries (and a lot of other issues, like injection), the queries will be written in JSON, in a format similar to MongoDB's queries. Example:

{
    "type": "find",
    "table": "table1",
    "conditions": { "a": 1 },
    "project": { "a": 1, "b": 1 }
}

resolves to select a, b from table1 where a = 1

Another very important feature is variable injection.

{
    "type": "find",
    "table": "table1",
    "conditions": {
        "a": {
            // get value from variable b in code. assume b to be a global variable in this case with value 2
            "type": "variable",
            "get": "b"
        }
    },
    "project": { "a": 1, "b": 1 }
}

resolves to select a, b from table1 where a = 2

This is basically to allow parameterized queries, but safely. It should also be flexible enough to allow parameters to be requested from REST APIs later on.

However, I have no idea how to go about doing this, both in terms of language and security. If there is a better way of doing this (maybe using something other than JSON), I am open to suggestions. My language of choice is Golang. I'll be using ScyllaDB, but since it is essentially a clone of Apache Cassandra, anything related to Cassandra would be relevant as well. Any help or pointers in the right direction would be massively appreciated.
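One way to keep user-defined queries safe is to never splice user input into CQL text. Identifiers are checked against a strict pattern (ideally also against a whitelist of known tables and columns), and values are sent only as bind parameters. The following is a minimal Go sketch under several assumptions. It uses the JSON shape from the post, without the `//` comment, which is not valid JSON. It supports equality conditions only, and variables come from a plain map. `Build`, `Query` and `identRe` are hypothetical names. The sketch does not check whether the resulting WHERE clause is valid for the table's primary key.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// identRe only admits plain identifiers for table and column names, so a
// name can never carry CQL syntax into the statement text.
var identRe = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// Query mirrors the JSON shape from the post; only "find" is handled.
type Query struct {
	Type       string                     `json:"type"`
	Table      string                     `json:"table"`
	Conditions map[string]json.RawMessage `json:"conditions"`
	Project    map[string]int             `json:"project"`
}

// Build turns a JSON query into CQL text with ? placeholders plus the values
// to bind, resolving {"type": "variable", "get": name} from vars.
func Build(raw []byte, vars map[string]interface{}) (string, []interface{}, error) {
	var q Query
	if err := json.Unmarshal(raw, &q); err != nil {
		return "", nil, err
	}
	if q.Type != "find" {
		return "", nil, fmt.Errorf("unsupported query type %q", q.Type)
	}
	if !identRe.MatchString(q.Table) {
		return "", nil, fmt.Errorf("invalid table name %q", q.Table)
	}

	// Projection: keep columns flagged with 1, sorted for stable output.
	var cols []string
	for col, on := range q.Project {
		if on != 1 {
			continue
		}
		if !identRe.MatchString(col) {
			return "", nil, fmt.Errorf("invalid column %q", col)
		}
		cols = append(cols, col)
	}
	sort.Strings(cols)
	proj := "*"
	if len(cols) > 0 {
		proj = strings.Join(cols, ", ")
	}

	// Conditions: equality only; values are bound, never spliced into the text.
	keys := make([]string, 0, len(q.Conditions))
	for k := range q.Conditions {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var where []string
	var args []interface{}
	for _, col := range keys {
		if !identRe.MatchString(col) {
			return "", nil, fmt.Errorf("invalid column %q", col)
		}
		var v interface{}
		if err := json.Unmarshal(q.Conditions[col], &v); err != nil {
			return "", nil, err
		}
		if m, ok := v.(map[string]interface{}); ok {
			name, _ := m["get"].(string)
			val, found := vars[name]
			if m["type"] != "variable" || !found {
				return "", nil, fmt.Errorf("unsupported or unresolved condition on %q", col)
			}
			v = val
		}
		where = append(where, col+" = ?")
		args = append(args, v)
	}

	stmt := fmt.Sprintf("SELECT %s FROM %s", proj, q.Table)
	if len(where) > 0 {
		stmt += " WHERE " + strings.Join(where, " AND ")
	}
	return stmt, args, nil
}

func main() {
	raw := []byte(`{"type": "find", "table": "table1",
		"conditions": {"a": {"type": "variable", "get": "b"}},
		"project": {"a": 1, "b": 1}}`)
	stmt, args, err := Build(raw, map[string]interface{}{"b": 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(stmt, args)
}
```

The resulting statement and values could then be handed to a driver such as gocql via `session.Query(stmt, args...)`. JSON numbers decode to `float64`, so literal values may need converting to the column's CQL type before binding.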

9 Comments
2024/04/16
23:09 UTC

3

Bulk deletion

Hi guys, please suggest a way to bulk delete several million entries in a table.

1 Comment
2024/04/04
18:48 UTC

2

Multi Host Cassandra with Stargate

Hello, I'm trying to deploy Stargate connected to Cassandra running on three different hosts. Cassandra manages to communicate with the other hosts that have the database up, but Stargate just fails. If it runs standalone, it wants a local seed; if I deploy a Cassandra node on the Stargate host, it fails. The same happens with both a Docker setup and a bare-metal setup. Any advice on how to run Cassandra on multiple hosts with Stargate in front of them? The Stargate documentation is not that great. Thanks

0 Comments
2024/03/29
14:14 UTC

2

IO problems after migration

Hello,

I recently migrated from Cassandra 3.11 to Cassandra 4.1. I also moved from Red Hat 7 to Red Hat 9.

I have a single-node setup that I use for Glowroot. It works great for a while, but every 4 hours exactly (9h, 13h...) we see an I/O peak (CPU up to 90% in wait) that lasts way too long and slows everything down.

Any idea where this comes from? Do I need to look for something specific in debug mode?

My last option is to move to a 3-node setup to try to fight this, but I'd like to be sure that it will help.

My data is around 100 GB on an 8-CPU, 32 GB RAM machine; the previous machine had half that...

Thanks for any help

6 Comments
2024/03/27
12:40 UTC

1

Repeatable migrations/transformations on cassandra data

In short:

I'd like to perform repeatable migrations/data transformations on a Cassandra database. Does anyone have experience with this kind of thing, or suggestions for tools that can manage this procedure?

More context:

We have a Cassandra database containing time-series data, hosted across multiple pods in a k8s cluster. The structure of the database is along the lines of: Name (string, pk), Type (string, pk), Value (long). We recently added a new Type to the time series, and we'd like to perform a migration that back-populates the database. The data needed for the back-population already exists in the time series; it just needs to be aggregated somehow. We have a somewhat hacky way to do this, but it would not allow us to do any rollbacks or keep a (good) record of the information that was migrated. I'd like to find a way to manage this a little more reliably.

If anyone has any input it'd be much appreciated!
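Without naming a specific tool, the core idea behind migration tools can be sketched. Each migration has an ID, an `Up` step and a `Down` step, and a ledger records which IDs have run. The ledger makes reruns safe and gives a record of what was migrated. The following is a minimal Go sketch. All names are hypothetical, and the ledger is an in-memory map standing in for what would likely be a small Cassandra table. For a back-population, `Down` can only remove what `Up` wrote if those rows are distinguishable, for example by the new Type.

```go
package main

import (
	"fmt"
	"sort"
)

// Migration is one versioned, named data transformation with an undo step.
type Migration struct {
	ID   int
	Name string
	Up   func() error
	Down func() error
}

// Ledger records applied migrations by ID. In practice this would be a small
// Cassandra table (hypothetical, e.g. data_migrations); a map keeps the sketch
// self-contained.
type Ledger map[int]string

// Migrate runs every migration missing from the ledger, in ID order, and
// returns the IDs it applied. Running it twice is a no-op the second time.
func Migrate(ms []Migration, l Ledger) ([]int, error) {
	sorted := append([]Migration(nil), ms...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })
	var applied []int
	for _, m := range sorted {
		if _, done := l[m.ID]; done {
			continue
		}
		if err := m.Up(); err != nil {
			return applied, fmt.Errorf("migration %d (%s): %w", m.ID, m.Name, err)
		}
		l[m.ID] = m.Name
		applied = append(applied, m.ID)
	}
	return applied, nil
}

// Rollback undoes the highest-numbered applied migration and unrecords it.
func Rollback(ms []Migration, l Ledger) error {
	latest, found := 0, false
	for id := range l {
		if !found || id > latest {
			latest, found = id, true
		}
	}
	if !found {
		return nil // nothing applied
	}
	for _, m := range ms {
		if m.ID != latest {
			continue
		}
		if m.Down == nil {
			return fmt.Errorf("migration %d has no Down step", m.ID)
		}
		if err := m.Down(); err != nil {
			return err
		}
		delete(l, m.ID)
		return nil
	}
	return fmt.Errorf("migration %d is recorded but not defined", latest)
}

func main() {
	// Hypothetical rows keyed by (Name, Type); "total" is the new Type.
	rows := map[[2]string]int64{{"s1", "a"}: 3, {"s1", "b"}: 4}
	total := [2]string{"s1", "total"}
	ms := []Migration{{
		ID:   1,
		Name: "backfill total",
		Up: func() error {
			rows[total] = rows[[2]string{"s1", "a"}] + rows[[2]string{"s1", "b"}]
			return nil
		},
		Down: func() error { delete(rows, total); return nil },
	}}
	l := Ledger{}
	applied, err := Migrate(ms, l)
	fmt.Println(applied, err, rows[total])
	again, _ := Migrate(ms, l)
	fmt.Println("second run applied:", again)
	if err := Rollback(ms, l); err != nil {
		panic(err)
	}
	_, stillThere := rows[total]
	fmt.Println("after rollback:", stillThere)
}
```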

2 Comments
2024/03/18
14:17 UTC

2

Token as a clustering key

Hi! Is there a way for me to add the token("partition_key") as a clustering key of my table? I need to sort the data based on the token.

4 Comments
2024/03/12
19:09 UTC

1

Is it possible to check whether the Cassandra queries in my cqlsh file are correct?

title

3 Comments
2024/03/02
14:29 UTC

3

Cassandra for bulk SMS Database

Hi,

I want to build a bulk SMS sender with Twilio on Spring Boot, and I'm looking at non-relational databases to store the SMS messages. Since the website will be used to communicate with a growing number of users, scalability is the priority. I was thinking of using MongoDB, but given the potentially high number of SMS messages the website would have to handle, the cost of MongoDB is an issue. I would like to know whether Cassandra would be just as effective, or if there's another solution, since I know it's not as easy to implement and work with as MongoDB.

3 Comments
2024/02/06
05:45 UTC

3

How to design a database?

Hello everyone, I am a junior (mainly frontend) developer and I want to build a personal full-stack project. So far I have decided to use Cassandra as my database (because it just seems to be the fastest and cheapest option). But I don't know how to design a good Cassandra database, since I cannot apply the rules for SQL databases. Does somebody have a good learning website or some information for me? Best regards

2 Comments
2023/12/19
20:55 UTC

1

How to convert map values from blob to text?

I have a table with the following schema:

    "PK" text,
    "SK" text,
    ":attrs" map<text, blob>,
    PRIMARY KEY ("PK", "SK")

I would like to get the string value of a record that I inserted into this table. Currently I am getting hexadecimal values, since it's a blob.

Something like this, but I can't get the syntax right:

select blobAsText(":attrs"['key_name']) from my_table
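If the conversion ends up happening on the client instead, a blob value as cqlsh prints it (a `0x`-prefixed hex string) can be decoded back into the text that was stored. The following is a small Go sketch. `BlobHexToText` and the sample map are hypothetical. With a driver such as gocql, blob values usually arrive as `[]byte` already, so a plain `string(b)` conversion would be enough there.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// BlobHexToText converts a blob as cqlsh prints it (e.g. 0x68656c6c6f) back
// into the UTF-8 text that was originally stored.
func BlobHexToText(s string) (string, error) {
	b, err := hex.DecodeString(strings.TrimPrefix(s, "0x"))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Hypothetical map values as they might appear in the ":attrs" column.
	attrs := map[string]string{"key_name": "0x68656c6c6f"}
	for k, v := range attrs {
		text, err := BlobHexToText(v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s = %s\n", k, text)
	}
}
```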

0 Comments
2023/11/28
18:30 UTC

1

Speed of cassandra-driver in serverless functions?

In the SQL realm, there's been all this talk about the drivers being slow. For example, folks were complaining that prisma took too long to load and then people moved on to drizzle-orm because it's only a wrapper around raw sql.

Now, for DataStax Cassandra, I started using the cassandra-driver, but I suspect it might not be as light as something like drizzle-orm.

  1. Are there any performance stats on how long it takes to connect to the database with cassandra-driver?
  2. In production with serverless functions, do folks just use the REST API instead?
  3. How do folks who use NextJS and serverless functions typically access Cassandra in production?
  4. REST API or GraphQL API - making a chat-like app with threads and messages?
  5. What's the difference between the document API and the REST API?
0 Comments
2023/11/19
17:33 UTC

3

Vote for Cassandra in LangChain integrations.

  1. Go to https://integrations.langchain.com/
  2. Sign In with email/GitHub/discord
  3. Click Vector Stores filter
  4. Press the heart button for Cassandra
0 Comments
2023/11/02
17:01 UTC

1

Has anyone used the Stargate.io proxy?

Currently we are seeing a very large number of connections to our Cassandra cluster (60k+), and it seems to be causing increased latency. We want to evaluate stargate.io. Will it significantly reduce the number of connections? What other features does it provide?
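Whether a proxy tier helps depends mostly on where the connections come from. The following is a rough back-of-envelope model in Go. It assumes each client instance keeps a fixed pool of connections to every node it talks to, and real pool sizes depend on the driver and its settings. Connected directly, the count grows with both the client fleet and the cluster size. Behind a proxy tier, the Cassandra nodes only see the proxies' pools. The client connections do not disappear; they move to the proxy tier. All numbers in `main` are illustrative, not taken from the post.

```go
package main

import "fmt"

// DirectConns estimates connections seen by the Cassandra cluster when every
// client instance keeps poolPerHost connections to every node.
func DirectConns(clients, nodes, poolPerHost int) int {
	return clients * nodes * poolPerHost
}

// ProxiedConns splits the count when clients talk only to a proxy tier (such
// as Stargate) and each proxy pools to every Cassandra node.
func ProxiedConns(clients, proxies, nodes, poolPerHost int) (toProxies, toCassandra int) {
	toProxies = clients * proxies * poolPerHost
	toCassandra = proxies * nodes * poolPerHost
	return toProxies, toCassandra
}

func main() {
	// Illustrative numbers only, not taken from the post.
	clients, nodes, proxies, pool := 500, 30, 3, 4
	fmt.Println("direct:", DirectConns(clients, nodes, pool))
	p, c := ProxiedConns(clients, proxies, nodes, pool)
	fmt.Println("client->proxy:", p, "proxy->cassandra:", c)
}
```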

0 Comments
2023/10/31
20:52 UTC

1

Stress testing cassandra with different workloads

Hey,

I want to stress test Cassandra with different workloads to see how it reacts, ideally 30% each of serial, parallel and crosstalk. But there seem to be no settings for this in cassandra-stress; it only tests one of them at a time, which is not the same.

Does anyone know a way to do this?
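Given that the tool in the post runs one workload type at a time, one workaround is a small custom load generator that decides per request which kind of operation to send. The following Go sketch covers only the scheduling part. It uses smooth weighted round-robin (the technique nginx uses for weighted upstreams), which keeps the mix close to the target proportions even over short windows. The code assumes `Names` and `Weights` have the same non-zero length. Executing the operations, and what "serial", "parallel" and "crosstalk" mean for this test, are left out; `runOp` is hypothetical.

```go
package main

import "fmt"

// Mixer picks the next workload type in proportion to its weight using smooth
// weighted round-robin, so the mix is even over short windows too.
// Names and Weights must have the same, non-zero length.
type Mixer struct {
	Names   []string
	Weights []int
	current []int
}

// Next returns the workload type for the next request.
func (m *Mixer) Next() string {
	if m.current == nil {
		m.current = make([]int, len(m.Weights))
	}
	total, best := 0, 0
	for i, w := range m.Weights {
		m.current[i] += w
		total += w
		if m.current[i] > m.current[best] {
			best = i
		}
	}
	m.current[best] -= total
	return m.Names[best]
}

func main() {
	// Weights from the post; the op names are the poster's categories.
	m := &Mixer{Names: []string{"serial", "parallel", "crosstalk"}, Weights: []int{30, 30, 30}}
	counts := map[string]int{}
	for i := 0; i < 90; i++ {
		op := m.Next()
		counts[op]++
		// runOp(op) would issue the actual queries (hypothetical, not shown).
	}
	fmt.Println(counts)
}
```

Unequal weights (for example 5:1:1) work the same way; over every full cycle of `sum(Weights)` requests, each type is picked exactly as often as its weight.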

2 Comments
2023/10/17
16:21 UTC

1

Do I lose my data?

My Cassandra instance fails to start when I point it to an existing database (created using another node).

0 Comments
2023/10/12
08:30 UTC

1

DB integrity check

Any suggestions on how to effectively run a database integrity check on a Cassandra database? For this exercise, we are planning to have two Azure VMs: VM1 for running the database operations and VM2 for performing the integrity check against VM1. Does Cassandra have any built-in command or function, similar to SQL Server's "DBCC CHECKDB"?
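For checks on a node itself, Cassandra ships `nodetool verify`, which verifies SSTable checksums. `nodetool repair` compares replicas by building Merkle trees over token ranges. For a VM1-versus-VM2 comparison, a similar digest idea can be sketched on the client side. Read the rows of each token range on both sides, hash them in a canonical order, and report the ranges whose digests differ. The following is a minimal Go sketch. `Row`, `Digest`, `Mismatched` and the bucket names are hypothetical, and reading the rows from Cassandra is not shown.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// Row is a hypothetical flattened row: primary key plus serialized values.
type Row struct {
	Key   string
	Value string
}

// Digest hashes rows in key order, so two copies of the same data produce the
// same digest regardless of the order in which they were read.
func Digest(rows []Row) [32]byte {
	sorted := append([]Row(nil), rows...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Key < sorted[j].Key })
	h := sha256.New()
	for _, r := range sorted {
		// Length-prefix each field so ("ab","c") and ("a","bc") hash differently.
		fmt.Fprintf(h, "%d:%s%d:%s", len(r.Key), r.Key, len(r.Value), r.Value)
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// Mismatched returns the buckets (e.g. token ranges) whose digests differ,
// including buckets present on only one side.
func Mismatched(a, b map[string][]Row) []string {
	seen := map[string]bool{}
	var out []string
	for _, side := range []map[string][]Row{a, b} {
		for bucket := range side {
			if seen[bucket] {
				continue
			}
			seen[bucket] = true
			if Digest(a[bucket]) != Digest(b[bucket]) {
				out = append(out, bucket)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical rows read from each VM, grouped by token range.
	vm1 := map[string][]Row{"range-0": {{"k1", "v1"}}, "range-1": {{"k2", "v2"}}}
	vm2 := map[string][]Row{"range-0": {{"k1", "v1"}}, "range-1": {{"k2", "stale"}}}
	fmt.Println("mismatched ranges:", Mismatched(vm1, vm2))
}
```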

1 Comment
2023/09/27
05:15 UTC

0

Cassandra Tree Question

I am looking for advice on how to setup an Apache Cassandra Multi Site instance.
I have one main site (location) that will only be taking replicas, then multiple other sites that will be ingesting their own data.

I understand that for each "data center", the cluster should be seeded together. As of right now, there will be only 2 Cassandra nodes per site, and one main node that will replicate the keyspaces for each of these sites.

In the future, more sites can be added, and the main replica node will need to be updated to add these sites.

Any configuration setup advice would be appreciated.
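With NetworkTopologyStrategy, replication is set per keyspace and per data center. Each site's keyspace can therefore keep its replicas in the site's own data center plus a copy in the main data center. Adding a site later then means adding the new data center's nodes to the cluster and creating one more keyspace, without touching the existing ones. The following is a small Go sketch that generates the CQL. The keyspace and data center names are hypothetical, and the data center names must match what the snitch reports (for example `dc=` in cassandra-rackdc.properties with GossipingPropertyFileSnitch). The replication factors follow the node counts in the post.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ReplicationDDL renders a CREATE KEYSPACE statement using
// NetworkTopologyStrategy with one replication factor per data center.
// Data centers are sorted so the output is deterministic.
func ReplicationDDL(keyspace string, rf map[string]int) string {
	dcs := make([]string, 0, len(rf))
	for dc := range rf {
		dcs = append(dcs, dc)
	}
	sort.Strings(dcs)
	parts := []string{"'class': 'NetworkTopologyStrategy'"}
	for _, dc := range dcs {
		parts = append(parts, fmt.Sprintf("'%s': %d", dc, rf[dc]))
	}
	return fmt.Sprintf("CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {%s};",
		keyspace, strings.Join(parts, ", "))
}

func main() {
	// Hypothetical site and data center names: 2 replicas at each site,
	// 1 replica on the single main node.
	for _, site := range []string{"site_a", "site_b"} {
		fmt.Println(ReplicationDDL(site+"_ks", map[string]int{site + "_dc": 2, "main_dc": 1}))
	}
}
```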

https://preview.redd.it/wg93xj49pglb1.png?width=721&format=png&auto=webp&s=005769fe120cf6fa2cc1314008aad98642ab3774

3 Comments
2023/08/31
15:12 UTC

4

Node.js 'cassandra-driver' very slow to establish a connection to Astra DB Serverless

Hi, I'm designing a small, niche social media site using Astra DB Serverless on Netlify. I want to create serverless functions for things like createPost, likePost, getUser, etc.

My issue is that when I run a serverless function like getUser, which uses the 'cassandra-driver', it takes 5 seconds to connect before running the query. This is much, much slower than using astrajs/rest with Node.js.

Is it fair to say that the cassandra-driver isn't suited to running in serverless functions on Netlify?
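A pattern often used with serverless functions is to create the driver client once per function instance, outside the handler. Only cold starts then pay the connection cost, and warm invocations reuse the open session. How often instances stay warm depends on the platform. The following Go sketch uses `sync.Once`. `Session`, `connect` and `handler` are hypothetical stand-ins for a real driver session and function entry point.

```go
package main

import (
	"fmt"
	"sync"
)

// Session is a stand-in for a driver session (hypothetical).
type Session struct{ id int }

var (
	once     sync.Once
	session  *Session
	connErr  error
	connects int // counts how often the expensive connect step actually ran
)

// connect is a placeholder for the slow driver connect (TLS, metadata, pools).
func connect() (*Session, error) {
	connects++
	return &Session{id: connects}, nil
}

// getSession connects on the first call in an instance and reuses the session
// on every later (warm) invocation.
func getSession() (*Session, error) {
	once.Do(func() { session, connErr = connect() })
	return session, connErr
}

// handler is a hypothetical serverless entry point such as getUser.
func handler(userID string) (string, error) {
	s, err := getSession()
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("user %s via session %d", userID, s.id), nil
}

func main() {
	for _, id := range []string{"u1", "u2", "u3"} {
		out, err := handler(id)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
	fmt.Println("connects:", connects)
}
```

`sync.Once` also caches a failed connect, so a production version would want a way to retry after an error.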

4 Comments
2023/08/20
20:56 UTC

2

Live Coding 'VersionedKeyValueStore' part1 - 'kafka-streams-cassandra-state-store'

1 Comment
2023/08/11
12:44 UTC

2

Interactive Queries with 'kafka-streams-cassandra-state-store' - Demo

1 Comment
2023/08/06
13:51 UTC

2

Any good Cassandra GUI client for viewing data in real time on Mac?

3 Comments
2023/08/04
07:18 UTC

1

How to implement custom indexes in Cassandra

Hello, I want to implement my own custom indexes for Cassandra.

How can I do it?

Thank you !

1 Comment
2023/08/02
16:54 UTC

1

Why does DevCenter freeze for 5 minutes every time I switch environment?

I use DevCenter to interact with my Cassandra database. I have multiple environments with different connection IPs. When I use DevCenter 1.6 and switch environments, the program freezes for about 5 minutes before it starts responding again. Has anyone experienced this, or does anyone know how to fix it?

0 Comments
2023/07/27
15:18 UTC

2

Could anyone tell me why I got this error and how to fix it?

The result of this count should be around 1.5 crore (15 million), since I know how many rows I pushed into the table.

https://preview.redd.it/87agj806iucb1.png?width=1846&format=png&auto=webp&s=df25a0eca87becae70f8ed5605fbe1bf9618f961

9 Comments
2023/07/19
03:59 UTC

1

How many concurrent connections can cassandra handle?

I know that it might depend on many factors, including:

  • Number of nodes
  • RAM and storage of the nodes
  • CPU power, etc.

So if I have all this data, how can I work out the maximum number of connections we can have?

We are planning to use it for an IoT application to store time-series sensor data.

5 Comments
2023/07/14
18:09 UTC

1

How can I make (game_id, user_id) unique, yet (game_id, score) indexed/clustered, in ScyllaDB?

See this in ScyllaDB/Cassandra:

CREATE TABLE scores_leaderboards (
    game_id int,
    score int,
    user_id bigint,
    PRIMARY KEY (game_id, score, user_id)
) WITH CLUSTERING ORDER BY (score DESC);

The idea is that we can get the user IDs with the top scores for a game.

This means that (game_id, score) needs to be indexed, and that's why I put it like that in the Primary Key.

However, I had to include user_id, so that 2 users can have the exact same score.

The problem is that, like this, (game_id, user_id) isn't unique. I want to make sure the table never contains 2+ pairs of the same (game_id, user_id).

My questions:

  1. What do you suggest I can do, so that (game_id, user_id) is unique, yet (game_id, score) is indexed?

  2. Ideally, (game_id, user_id) would be the primary key, and then I'd create a compound index with (game_id, score).

However, if I try to create a compound index,

CREATE INDEX scores_leaderboards_idx ON scores_leaderboards (game_id, score);

I get the following:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Only CUSTOM indexes support multiple columns"

But I'm not finding how I can create a CUSTOM index... is this an extension I need to install?
Is there any recommendation against using custom indexes?

4 Comments
2023/07/08
20:05 UTC

2

Help me understand the Tokens and Owns in nodetool status

Can someone explain what Tokens and Owns (effective) are, and why I have those weird percentages?
I haven't created any keyspace or anything yet! It is just a fresh installation.

nodetool status

2 Comments
2023/07/07
05:08 UTC
