The problem/use-case that the feature addresses
Based on existing reports and our own experience, Redis clusters suffer from high CPU usage on the cluster bus as they grow, and this is a direct limit on the size of a Redis cluster. The main cause is Redis's gossip strategy: each gossip request carries state for roughly 10% of the cluster's nodes, which amplifies the problem as the cluster grows. In a cluster of 900 Redis nodes, the gossip protocol alone can reach about 12% CPU usage, because the total volume of gossip messages grows quadratically with the size of the cluster.
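To make the scaling concrete, here is a back-of-envelope sketch of how gossip payload grows with cluster size. The constants are assumptions for illustration only (an entry size of roughly 104 bytes, in the ballpark of Redis's per-node gossip section, and an assumed 10 ping/pong exchanges per node per second), not measured values:

```python
# Back-of-envelope estimate of cluster-bus gossip volume.
# Assumptions (hypothetical, for illustration): each ping/pong carries
# gossip entries for ~10% of the cluster, each entry is ~104 bytes,
# and each node performs ~10 ping/pong exchanges per second.

ENTRY_BYTES = 104      # assumed size of one gossip entry
MSGS_PER_SEC = 10      # assumed ping+pong exchanges per node per second

def per_node_bandwidth(n_nodes: int) -> int:
    """Bytes/sec of gossip payload a single node sends."""
    entries_per_msg = n_nodes // 10          # ~10% of the cluster per message
    return entries_per_msg * ENTRY_BYTES * MSGS_PER_SEC

def cluster_bandwidth(n_nodes: int) -> int:
    """Total gossip payload across the whole cluster.

    Per-node traffic grows linearly with N, and there are N senders,
    so the cluster-wide total grows quadratically with N.
    """
    return per_node_bandwidth(n_nodes) * n_nodes

for n in (100, 300, 900):
    print(n, per_node_bandwidth(n), cluster_bandwidth(n))
```

Growing the cluster 9x (100 to 900 nodes) multiplies the total gossip volume by roughly 81x under these assumptions, which is the quadratic blow-up described above.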
https://github.com/redis/redis/issues/11740
https://github.com/redis/redis/issues/3929
This issue seems to have been noticed a long time ago, but it should be possible to use a more aggressive strategy to avoid it.
https://github.com/redis/redis/pull/10624
Description of the feature
Therefore, a fairly simple approach can be used to reduce message size.
https://ieeexplore.ieee.org/abstract/document/1028914
This paper (SWIM) proposes a design in which the size of each message does not grow with the number of nodes.
When a node joins or leaves, that event is broadcast to the other nodes via gossip. Separately, nodes synchronize their full membership state at regular intervals (on a long cycle); this full sync can be optimized with a Merkle tree (anti-entropy). For failure detection, each node pings the members of its local list sequentially. Because each node keeps its own view of the system (the set of nodes may be identical, but the order of the list differs), every node receives ping requests with roughly equal probability. If a pinged node fails to reply, the sender asks other nodes to probe it indirectly (similar to Redis's subjective-down/objective-down mechanism, except that in SWIM a single reply from any prober is enough to consider the target alive). These heartbeat messages are very small.
With this scheme, per-node traffic no longer grows with the size of the cluster, which also reduces CPU usage.
Alternatives you've considered
https://github.com/redis/redis/issues/3929
Even at 200 nodes, the traffic is already considerable (and if the cluster is deployed across data centers, this bandwidth is very expensive).
Since this would be a significant change to Redis's existing strategy, perhaps some of the auxiliary techniques mentioned above could be applied first: summarize the membership data with a Merkle tree, combined with regular full synchronization.
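The Merkle-tree idea can be illustrated with a small sketch. Two nodes with identical views compute identical root hashes and exchange nothing; when the roots differ, comparing subtree hashes narrows the diff down to the divergent entries. All names and the state encoding here are hypothetical, and for simplicity both views are assumed to cover the same node IDs:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(items):
    """Hash tree over sorted (node_id, state) pairs. Identical views
    produce identical roots, so a periodic full sync only needs to
    exchange data when the roots differ."""
    level = [h(f"{k}={v}".encode()) for k, v in sorted(items)]
    if not level:
        return h(b"")
    while len(level) > 1:
        if len(level) % 2:               # pad odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def diff(a: dict, b: dict) -> list:
    """Find node IDs whose state differs between two views (assumes both
    views cover the same IDs). Matching halves hash identically and are
    skipped, so only divergent entries are ever re-transmitted."""
    ia, ib = sorted(a.items()), sorted(b.items())
    if merkle_root(ia) == merkle_root(ib):
        return []
    if len(ia) == 1:
        return [ia[0][0]]
    mid = len(ia) // 2
    return (diff(dict(ia[:mid]), dict(ib[:mid]))
            + diff(dict(ia[mid:]), dict(ib[mid:])))
```

In a real implementation the two sides would exchange subtree hashes over the network instead of holding both views locally, but the pruning logic is the same.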
https://tech.meituan.com/2024/03/15/kv-squirrel-cellar.html#:~:text=%E8%BF%9B%E8%A1%8C%E8%AF%BB%E5%86%99%E6%93%8D%E4%BD%9C%E3%80%82-,3.1,-Squirrel%E6%B0%B4%E5%B9%B3%E6%89%A9%E5%B1%95
This strategy has already been used by a Chinese company (Meituan, per the link above), reducing gossip bandwidth by 90% (although per-node traffic still grows with the size of the cluster).
Additional information
This method has already demonstrated good performance in Memberlist (a gossip-based cluster-membership and service-discovery library); both Serf and Consul are built on top of it.
In my testing, a single high-performance machine can run 1000 Memberlist instances simultaneously.
Memberlist also has many refinements that reduce the false-positive rate of failure detection.
https://arxiv.org/abs/1707.00788