Inverse of a hash function

I’ve used Thomas Wang’s integer hash functions for years for various purposes. Using techniques invented by Bob Jenkins for general hashing (e.g., hashes of strings), Wang derived several hash specialized for fixed size integer input. His 64-bit version is

uint64_t hash(uint64_t key) {
  key = (~key) + (key << 21); // key = (key << 21) - key - 1;
  key = key ^ (key >> 24);
  key = (key + (key << 3)) + (key << 8); // key * 265
  key = key ^ (key >> 14);
  key = (key + (key << 2)) + (key << 4); // key * 21
  key = key ^ (key >> 28);
  key = key + (key << 31);
  return key;
}

Key properties include avalanche (changing any input bit changes about half of the output bits) and invertibility. Recently I wanted to make explicit use of the inverse, in order to verify that zero would never arise as the hash of a given set of inputs. This property would allow me to initialize the hash table in question (which takes up several gigabytes) with zeros and avoid an explicit occupied bit on each entry. Thus, I needed inverse_hash(0).

Our function is the composition of the functions on each line, so we need to invert each one. Multiplication by 21 and 265 is easy; both numbers are odd, and therefore have multiplicative inverses mod 2 64. The rest of the lines are invertible because they’re Feistel functions; they break the key into two pieces, leave one piece alone, run the second function through an invertible function that depends on the first. For example, the line key = key ^ (key >> 24) leaves the top 24 bits alone. Once you know the top 24 bits, you can reconstruct the next 24 bits with an xor, and one more round gives the remaining bits. The full inverse is

uint64_t inverse_hash(uint64_t key) {
  uint64_t tmp;

  // Invert key = key + (key << 31)
  tmp = key-(key<<31);
  key = key-(tmp<<31);

  // Invert key = key ^ (key >> 28)
  tmp = key^key>>28;
  key = key^tmp>>28;

  // Invert key *= 21
  key *= 14933078535860113213u;

  // Invert key = key ^ (key >> 14)
  tmp = key^key>>14;
  tmp = key^tmp>>14;
  tmp = key^tmp>>14;
  key = key^tmp>>14;

  // Invert key *= 265
  key *= 15244667743933553977u;

  // Invert key = key ^ (key >> 24)
  tmp = key^key>>24;
  key = key^tmp>>24;

  // Invert key = (~key) + (key << 21)
  tmp = ~key;
  tmp = ~(key-(tmp<<21));
  tmp = ~(key-(tmp<<21));
  key = ~(key-(tmp<<21));

  return key;
}

The cleverness of the original hash function is that each invertible step is also extremely fast. The inverse is slower, but only moderately.

Finally, I did indeed luck out:

inverse_hash(0) = 0x7ffffbffffdfffff

This isn’t a valid pentago board in my packed representation, so zero initialization works. This turned out to be obvious in hindsight: all but the first step in the hash function leaves zero alone, and the last step is complement on the lower 21 bits, which is enough to know that inverse_hash(0) can’t be a valid pentago board. It’s still cool to have the full inverse, though.

I tried to email Thomas Wang in case he hadn’t had the occasion to write down the inverse explicitly, but unfortunately his HP email bounced. Impermanent email addresses make me sad.

Tags: , , ,

10 Responses to “Inverse of a hash function”

  1. Brian Says:

    Thanks for posting this! I was wondering if the Wang/Jenkins hash function was invertible, because I want to use it to scramble a 64-bit id. A strong encryption algorithm is overkill for my use case.

  2. Geoffrey Irving Says:

    Glad it helps! Make your scrambling function isn’t for security purposes, though: for security purposes, applying this kind of hash function is equivalent to applying the identity map.

  3. Attractivechaos Says:

    For any two integers, does Wang’s hash function always map the two to different integers, or there exists integers k!=l such that hash(k)=hash(l)? Do you know this? Thanks a lot!

  4. Geoffrey Irving Says:

    Yes, different integers map to different integers. That is part of what invertible means, and is implied by the existence of an inverse.

  5. Vipul Periwal Says:

    I’m looking for an invertible hash function that would map a given set of symbol strings into fixed length hashes maintaining some sort of string distance. I haven’t found anything near as clever as this. Do you know of anything that would shed light on this?

    Thanks .

  6. Geoffrey Irving Says:

    A hash can only be invertible if its input is the same size as its output: this one maps 64-bit integers to 64-bit integers. If you wanted to do the same thing for strings, they would need to be at most 8 characters long (shorter in unicode), so this approach probably isn’t what you want. Maintaining distance with a hash function is also going to be impossible except extremely approximately. If you specify your requirements more exactly I may be able to point out a better solution.

  7. Vipul Periwal Says:

    Right, that’s why I specified something that would be invertible on a given set of symbol strings. Basically this amounts to finding some sort of invertible perceptual hashing, but I’m driven by a need to map sets of symbol sequences into an equal length format that captures the essential features of the symbol sequences. Is there an iterative construction of a hashing function given a set of symbol sequences?

    Thanks

  8. Geoffrey Irving Says:

    You can look into perfect hashing, but I still don’t know your actual requirements: “essential features” and “perceptual” aren’t very specific.

  9. Vipul Periwal Says:

    For example, in biology we have to deal with sequences of proteins from different organisms, which often have different lengths of amino-acid chains even though they are functionally very similar. So we align them which usually means inserting gaps in some sequences at appropriate places to make the important amino-acids line up when you stack all the sequences. I would prefer it if I could find a fixed length hash that could take a given set of such amino-acid sequences. I have a technique for finding patterns in aligned symbol sequences and so I would like to find a more conceptual way of going from a set of objects (here variable length protein sequences) to fixed length sequences which is nevertheless true to invariants in the original sequences. Perceptual is a term that I found in the hashing literature when people discuss making image hashes for copyright purposes.

  10. Geoffrey Irving Says:

    Any solution to your problem is definitely going to be unrelated to conventional hash functions. It’s conceivably that a perfect hashing algorithm could be painstakingly modified to preserve a distance measure, but it would only do that on exactly the set of known inputs, and you might as well just compute the distance directly.

Leave a Reply