My team has recently decided to move from a default READ consistency level of LOCAL_QUORUM to THREE. After this change, the CassandraHealthIndicator can no longer execute the query below successfully. I'm wondering if there's a better test query that would work at all consistency levels?
@Override
protected void doHealthCheck(Health.Builder builder) throws Exception {
    // Query the node-local system table for the Cassandra release version.
    Select select = QueryBuilder.select("release_version").from("system", "local");
    // Runs at the default (or user-configured) consistency level.
    ResultSet results = this.cassandraOperations.getCqlOperations().queryForResultSet(select);
    builder.up().withDetail("version", results.one().getString(0));
}
Comment From: philwebb
The Cassandra health checks were contributed in #2064 by @jdubois. I'd be interested to know if he has any suggestions.
Comment From: jdubois
I would be very surprised if there were a reason for this request to fail at a specific consistency level. Could you provide some documentation, or anything else, to support this claim?
Comment From: ankit--sethi
Here's one GitHub issue discussing an almost identical problem.
Going by what they say -- which seems right based on my knowledge of the system tables -- some consistency levels can never successfully execute against system tables that use LocalStrategy.
The fix for this should be relatively straightforward -- ignore the default (or user-configured) consistency level set within CassandraOperations and explicitly set it to ONE (or any of the other workable values) for the health check, as sketched below.
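For illustration, here's a minimal sketch of that approach against the DataStax 3.x driver used by the indicator above; pinning the level on the statement itself bypasses the configured default (a sketch only, not the actual Spring Boot change):
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.querybuilder.QueryBuilder;
import com.datastax.driver.core.querybuilder.Select;

Select select = QueryBuilder.select("release_version").from("system", "local");
// A per-statement override takes precedence over the session/cluster
// default, so this read succeeds even when the default is TWO or THREE.
select.setConsistencyLevel(ConsistencyLevel.ONE);
ResultSet results = this.cassandraOperations.getCqlOperations().queryForResultSet(select);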
Comment From: jdubois
I'm very surprised that system tables cannot have a high consistency level... In that case, they seem to be a poor choice for checking the cluster's health - lowering the consistency level would make the cluster look healthy when in fact you cannot read or write... So if that's correct, we would need to create a specific table in the database schema, which wouldn't be easy for Spring Boot to use (because people would then need to create it, etc.).
If nobody finds a good solution, I'll check with my friends from DataStax when I'm back from holidays, in about a month.
Comment From: wilkinsona
@jdubois I hope you enjoyed your holidays. Unfortunately, I don't think we've found a good solution for this one in your absence. If you have a moment, could you please check with your friends at DataStax and see what they would recommend?
Comment From: jdubois
Indeed, let me call @bguedes @clun for help!!!
Comment From: wilkinsona
@clun @bguedes If you have a few minutes, we'd be really grateful for your recommendation here.
Comment From: clun
Hi team,
Thank you @wilkinsona for the poke, I missed the last one for some reason. The system keyspace, like any other keyspace, has a replication_factor attribute, and the default may be 1. So, if you add nodes later on and do not increase the replication factor, you will hit some errors.
Try this:
ALTER KEYSPACE system
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy',
                    'data_center_name_1' : 3, 'data_center_name_2' : 3};
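As a side note, a keyspace's current strategy class and per-data-center factors can be inspected before altering anything; a quick check, assuming Cassandra 3.x or later (where schema metadata lives in system_schema):
-- Shows the strategy class and replication factors for one keyspace.
SELECT keyspace_name, replication
FROM system_schema.keyspaces
WHERE keyspace_name = 'system';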
I personally don't like the TWO and THREE consistency levels; they don't seem generic, since they hard-code a replica count instead of scaling with the replication factor. I would go with ALL to ensure that all nodes are up, or LOCAL_QUORUM for a more optimistic approach.
select * from system.local is still an efficient query, I would say, but why not rely on the driver itself?
This is the same for any system-related keyspace. https://docs.datastax.com/en/security/6.7/security/secSystemKeyspace.html
Comment From: wilkinsona
Thanks very much, @clun. Unfortunately, we're not in a position to alter a keyspace and just have to rely upon what the user has configured.
"select * from system.local is still an efficient query, I would say, but why not rely on the driver itself?"
This is intriguing. How would we go about relying on the driver itself? Is there something provided by the driver that we can call to determine Cassandra's health?
Comment From: adutra
I was just made aware of this issue.
The system keyspace is a bit special. It has a replication factor of 1 and uses a special replication strategy called LocalStrategy. Basically, this means that this keyspace is local to each node.
Concretely speaking, this means that querying that keyspace can only work with the following consistency levels: ONE, LOCAL_ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL. This is because the quorum for replication factor 1 is 1 (quorum = floor(RF / 2) + 1, which is 1 when RF = 1), so all the aforementioned levels are equivalent to ONE with RF 1.
However, the TWO and THREE consistency levels cannot be met on that keyspace, because they require more live replicas (2 or 3, respectively) than the single replica that exists. You would get the following error:
UnavailableException: Not enough replicas available for query at consistency THREE (3 required but only 1 alive)
As a consequence, queries to system.local MUST force the consistency level to ONE or LOCAL_ONE. I will see if my team can provide a fix for this quickly.
And as a side note: do not use THREE; use QUORUM or ALL.
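To make "force the consistency level" concrete, here's a minimal sketch with the DataStax Java driver 4.x; readReleaseVersion is a hypothetical helper for illustration, not driver or Spring Boot code:
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

// Hypothetical helper: pins LOCAL_ONE on this one statement, leaving the
// session-wide default (which may be TWO or THREE) untouched.
static String readReleaseVersion(CqlSession session) {
    SimpleStatement statement = SimpleStatement
            .newInstance("SELECT release_version FROM system.local")
            .setConsistencyLevel(DefaultConsistencyLevel.LOCAL_ONE);
    return session.execute(statement).one().getString("release_version");
}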
Comment From: jdubois
Thanks so much @adutra !! Don't hesitate to ping me
Comment From: snicoll
Closing in favour of PR #20709
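For context, the driver-based direction clun suggested sidesteps the problem entirely: the driver's cluster metadata can be consulted without issuing any query, so no consistency level is involved at all. A rough sketch of that idea with the DataStax Java driver 4.x (isCassandraUp is a hypothetical helper, not the code from the PR):
import java.util.Collection;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.metadata.Node;
import com.datastax.oss.driver.api.core.metadata.NodeState;

// Hypothetical helper: healthy if the driver currently sees at least one
// node in the UP state. No CQL is executed, so the configured consistency
// level never comes into play.
static boolean isCassandraUp(CqlSession session) {
    Collection<Node> nodes = session.getMetadata().getNodes().values();
    return nodes.stream().anyMatch((node) -> node.getState() == NodeState.UP);
}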