Cassandra Query Patterns: Not using the “in” query for multiple partitions.
So let’s say you’re doing your best to model all your data around one partition. You’ve done your homework and all your queries look like this:
SELECT * FROM my_keyspace.users WHERE id = 1
Over time, however, as features are added, you make some tradeoffs and
need to start querying across partitions. At first there are only a
few queries like this:
SELECT * FROM my_keyspace.users WHERE id IN (1,2,3,4)
Your cluster is well tuned, so you have no problems. But as time
goes on, your dataset grows and users are running bigger searches
across more users:
SELECT * FROM my_keyspace.users WHERE id IN
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23)
Now you start seeing GC pauses and heap pressure that lead to
overall slower performance, and your queries are coming back more slowly. What
happened?
Imagine a contrived scenario where we have a partition key with the values A, B, and C, a cluster of 9 nodes, and a replication factor of 3.
When I send in my query that looks like
SELECT * FROM mykeyspace.mytable WHERE id IN ('A','B','C')
the coordinator has to do something like:
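As a rough sketch of that fan-out, the toy Python model below places 9 nodes on a ring and maps each key in the IN clause to 3 replicas. The node names, the MD5-based placement, and the function names are all assumptions made for illustration. A real cluster uses its configured partitioner and replication strategy, and how many replicas are actually read depends on the consistency level.

```python
import hashlib

# Hypothetical cluster: 9 nodes, replication factor 3 (as in the scenario above).
NODES = ["node%d" % i for i in range(1, 10)]
REPLICATION_FACTOR = 3


def replicas_for(partition_key):
    """Toy placement: hash the key to a ring position, then take the next
    REPLICATION_FACTOR nodes clockwise. This is NOT Cassandra's real partitioner."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


def coordinator_fan_out(keys):
    """For an IN query, map every partition key to the replicas that own it.
    A single coordinator has to collect the results for all of them."""
    return {key: replicas_for(key) for key in keys}


plan = coordinator_fan_out(["A", "B", "C"])
nodes_involved = {node for replicas in plan.values() for node in replicas}

for key, replicas in plan.items():
    print("partition %s -> replicas %s" % (key, replicas))
print("distinct nodes that may be touched:", len(nodes_involved))
```

In this model, one coordinator is responsible for gathering rows for every partition in the list, and those partitions may sit on different nodes across the cluster. The longer the IN list grows, the more of that work and memory lands on a single node for a single request.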