I have always been intrigued by the concept of hashing, and I spent quite some time and effort researching it:
- Stateless load balancing of connections (US patent #9560126B2, 2013)
- Cache-optimized hash table data structure (US patent #9,852,074, 2015)
Consistent hashing is a known technique for implementing distributed storage systems, in a way that avoids redistribution of stored items in case of changes. Companies like Akamai have deployed these ideas with great success.
Contemporary open source implementations like HAProxy use consistent hashing as a core feature. However, these implementations tend to be stateful: The application maintains an in-memory mapping of hash keys to buckets (representing servers, etc.). Even minimal implementations as suggested here still maintain some form of state.
The problem with maintaining state, is that it has a tendency to become stale. In a distributed system, when multiple nodes maintain state, there is a non-zero probability that the value maintained by one node does not match one or more others, for a given period of time. A second problem with in-memory state, is that it gets lost upon reboot (or restart of a Docker container).
To implement a more stateless consistent hashing scheme, consider the following:
- A set of service endpoints is represented as a bit mask - say 64 bits on modern x86 CPUs
- Endpoints that are online are '1', offline are '0'
- An incoming IP packet is hashed to a 64-bit key using a hash function h(ip_hdr)
- One of the online endpoints is selected based on a series of pairwise comparisons between endpoints, with ties (both online) resolved using a bit from the hash value
No comments:
Post a Comment