Wednesday, October 21, 2020

More or less stateful-less consistent hashing

I have always been intrigued by the concept of hashing, and I spent quite some time and effort researching it:

Consistent hashing is a known technique for implementing distributed storage systems, in a way that avoids redistribution of stored items in case of changes. Companies like Akamai have deployed these ideas with great success.

Contemporary open source implementations like HAProxy use consistent hashing as a core feature. However, these implementations tend to be stateful: The application maintains an in-memory mapping of hash keys to buckets (representing servers, etc.). Even minimal implementations as suggested here still maintain some form of state.

The problem with maintaining state, is that it has a tendency to become stale. In a distributed system, when multiple nodes maintain state, there is a non-zero probability that the value maintained by one node does not match one or more others, for a given period of time. A second problem with in-memory state, is that it gets lost upon reboot (or restart of a Docker container).

To implement a more stateless consistent hashing scheme, consider the following:

  • A set of service endpoints is represented as a bit mask - say 64 bits on modern x86 CPUs
  • Endpoints that are online are '1', offline are '0'
  • An incoming IP packet is hashed to a 64-bit key using a hash function h(ip_hdr)
  • One of the online endpoints is selected based on a series of pairwise comparisons between endpoints, with ties (both online) resolved using a bit from the hash value
In logic terms, the outputs of the first circuit can be defined as follows:
s1' = s1 and (not s2 or hash[0])
s2' = s2 and (not s1 or not hash[0])

In a programming language, this function could start something like this:

function select_server( ulong64 hash, ulong64 active ) {
   const unsigned long m0x55 = 0x5555555555555555UL;
   const unsigned long m0xaa = 0xaaaaaaaaaaaaaaaaUL;

   // For 64 bits: Do 6 rounds (2^6=64), using the hash input to     
   // resolve ties
   // A->B B->A
   unsigned long others = ((active&m0x55)<<1) | ((active&m0xaa)>>1);
   unsigned long h = (hash&m0xaa); // ....3-1-
   h |= h>>1; // ....3311
   h ^= m0x55; // ....3^1^
   active &= ~others | h;

   ... more rounds ...
}

No comments:

Post a Comment