Friday, September 9, 2016

Cassandra Query Patterns: Not using the “in” query for multiple partitions.

So let's say you're doing your best to data model everything around a single partition. You've done your homework and all your queries look like this:
SELECT * FROM my_keyspace.users where id = 1
 
Over time, however, as features are added, you make some tradeoffs and need to start querying across partitions. At first there are only a few queries like this:
SELECT * FROM my_keyspace.users where id in (1,2,3,4)
 
Your cluster is well tuned, so you have no problems, but as time goes on your dataset grows and users start running bigger searches across more users:
SELECT * FROM my_keyspace.users where id in
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23)
 
Now you start seeing GC pauses and heap pressure that lead to slower performance overall, and your queries are slow to come back. What happened?
Imagine a contrived scenario where the partition key takes the values A, B, and C, on a cluster of 9 nodes with a replication factor of 3.

When I send in a query that looks like SELECT * FROM mykeyspace.mytable WHERE id IN ('A','B','C'), the coordinator has to do something like:
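As a rough way to picture that fan-out, here is a toy model in Python. It is only a sketch under loud assumptions: the node names, the MD5-based placement, and the helpers replicas_for and coordinator_plan are all hypothetical. Real Cassandra places data using partitioner tokens (Murmur3Partitioner by default), and how many replicas are actually read depends on the consistency level. What it shows is that one IN query over three partitions makes a single coordinator responsible for data owned by several different replica sets.

```python
import hashlib

# Hypothetical 9-node cluster with replication factor 3, as in the scenario above.
NODES = [f"node{i}" for i in range(1, 10)]
RF = 3


def replicas_for(key, nodes=NODES, rf=RF):
    """Toy placement: hash the key to a starting node, then take the next
    rf nodes around the ring. This is NOT Cassandra's real partitioner."""
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


def coordinator_plan(keys):
    """Map every partition key in the IN clause to the replicas that own it."""
    return {key: replicas_for(key) for key in keys}


plan = coordinator_plan(["A", "B", "C"])
for key, replicas in plan.items():
    print(key, "->", replicas)

# All nodes that own some part of the data this single query asks for.
touched = set().union(*plan.values())
print(len(touched), "of", len(NODES), "nodes own data needed by this one query")
```

In this model the coordinator has to reach into each of those replica sets and collect the rows for every key before it can answer, so the work it does for one request grows with the length of the IN list.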
