Siphash Concerns

This article highlights concerns about the security of Cuckoo Cycle in relationship to its use of Siphash as a seeding mechanism for the graph. I’ll begin with a technical explanation, but casual readers may skip to the Nontechnical section for a more accessible exposition.

Technical

Cuckoo Cycle is Hypersensitive to Any Loss of Entropy in Siphash

Cuckoo Cycle uses a ratio of edges-to-nodes such that the mean degree c = 2M/N = 1. In Erdos-Renyi graph models, this value is known to be a sharp transition between subcritical and supercritical networks, where subcritical finite components of size log(N) quickly coalesce into a supercritical network that contains a giant component of order N. Similarly, the number of cycles in subcritical graphs is finite as N --> +inf, while for networks at or above the critical threshold, the number of cycles approaches infinity as N also approaches infinity. Because Cuckoo Cycle is poised exactly at c=1, any change to the mean degree in the graph has dramatic effects on the number of cycles in a given graph instance. Specifically, if we can reduce the entropy of Siphash by a factor of z = H'/H < 1, then c = 2M / zN = 1/z > 1 and we have entered the supercritical regime where a giant component has emerged in the network and the number of cycles absolutely explodes.

This effect may be quickly and easily demonstrated in the existing codebase. We may remove one bit of entropy by simply ORing every node ID with 2 at the end of the dipnode function. Note that we take care to alter a bit that is not the LSB since the LSB is special in the Cuckatoo and Cuckaroo variants. The removal of one bit of entropy from the node ID’s causes a large number of 42-cycles to appear for every nonce. Entropy reductions of less than a single bit may be numerically explored by applying the OR filter with probability < 1.

Possible Misuse of Siphash

Siphash was designed to be used as an HMAC, not as a hash function. That is to say, siphash claims only that an attacker who knows message M and siphash(k,M) cannot guess key k. No claims are made about siphash(k,M) when k may be selected adversarially. This leads to concerns that specific manipulation of the key values can lead to reduced entropy in siphash and therefore an enormous excess of solution cycles. For example, ARX based hashes are known to be vulnerable to mix states of all zero’s, a state which trivializes all three ARX operations.

Siphash Key Space Searching

Since Siphash is seeded using 256 bits of the Blake2b performed on the block header, we may preselect any b bits of the siphash key by performing 2^b iterations of Blake2b. Current ASIC speeds for solving Blake2b are on the order of 2^42 so we may reasonably expect to choose a large number of the siphash key bits. If something like differential analysis or a boomerang attack could sufficiently relate keybits to final entropy, Cuckoo Cycle would be reduced to a brute-force search of Blake2b.

Possible Initialization Weakness

Cuckoo Cycle uses a variant of siphash with a modified initialization routine that may cause unforseen weaknesses in siphash. The original authors’ key initialization uses a 128-bit key which is then split into two 64-bit words k0 and k2, each of which are then XOR’ed with special constants to derive k1 and k3, completing the initialization of the four 64-bit mixing words. In the modified version for Cuckoo Cycle, all 256 bits of the initial state are taken from Blake2b, and no XOR is performed. On first glance this may appear to increase the entropy of the initial state, but the concern is that the authors’ original initialization routine forced k0 and k2 to be different and k1 and k3 to be different. The very first ARX round mixes the word pairs which are guaranteed to be different on initialization, so if they are allowed to be the same, there may be significant entropy reduction possible in the Cuckoo Cycle variant compared to the original siphash. This seemingly innocuous change to the initialization procedure may have important implications for the differential analysis of siphash.

Nontechnical

Solving Cuckoo Cycle means to “find cycles of a certain length in a graph,” but what does that mean? A graph is any network of connections, so let’s think about a network of roads that connect cities. Cuckoo Cycle basically generates a random road map and then asks you to find a round-trip loop on that map which has an exact distance. Find a round-trip loop within Belgium that has a distance of exactly 42 km. If you think about how many roads exist in a developed country like Belgium, you’re pretty sure you can find such a loop. In fact there are many many round-trips in Belgium with such a distance, because there are so many different ways to drive in a circle. But what if we had to find a round-trip of exactly 42 km using only the roads of the Sahara desert? There aren’t too many roads cutting across the Sahara so that’s going to be very VERY difficult if not impossible to find a round-trip of exactly 42km.

It turns out that finding round-trips goes from being really hard to being really easy very quickly, once you get enough roads. To explain how this works, let’s leave the Sahara and Belgium behind, and travel to the continental United States during the development of railroads. At first there were only connections between Boston, NYC, and DC in the east, a separate rail from Chicago to St Louis in the Midwest, and another separate rail along the Pacific Northwest.

If you wanted to find a round-trip loop of 2000 miles, it would be hard to do, because everything is split apart into small pieces of railroad networks. You can’t get from New York to San Francisco. Now as the railroad develops and more track is laid, we get to a point where everything is almost connected. The Pac NW network grew, and the Midwest network grew, and soon there’s a point when the Eastern and Midwest networks connect and suddenly a bunch of new ways to travel open up: Boston to Chicago, NY to Chicago, DC to Chicago, Boston to St Louis, etc…

And as the railroad grows further, we get to a point where that last connecting track is laid, the continent is fully connected, and we suddenly have lots and lots of different ways to travel, all the way from the Pacific to the Atlantic, with many variations on routes in-between. That last one-foot of track has a huge impact on the number of round-trip combinations we’re able to make on the railroad.

Cuckoo Cycle is tuned to be perfectly at this critical connection threshold where there’s a lot of railroad tracks but they just barely don’t fully connect. If we are able to increase the number of railroads by even a very tiny amount, suddenly everything is fully connected and the number of round-trip possibilities explodes. This means if we can mess with the way the map is generated, we can cause tons and tons of round-trip solutions to appear. Siphash is the way the map is generated, so the main point of this article is that any slight problem with siphash, even a very small weakness, will cause a huge number of round-trip solutions to appear on the map, effectively breaking the difficulty of Cuckoo Cycle, turning it suddenly from a “hard-to-find-loops” map into an “easy-to-find-loops” map.

Interesting and thanks for the very clear non-technical analogy.

But siphash has no entropy reducing operations like binary-or. All ARX operations Addition, Rotation, and Xor, preserve entropy. So this appears to be a non-issue.

As noted above, there is no reduction in entropy in a ARX construction.
Cuckoo Cycle does not rely on the MAC properties of siphash; that (without knowing k) having seen siphash outputs for many chosen inputs, you can predict outputs for new inputs. It relies on siphash generating graphs that are hard to distinguish from random graphs (no statistical bias).

Even if v0 and v2 are the same, it doesn’t amount to siphash having reduced entropy, but to having the first round do slightly less than full effect.

If you search millions of graphs, then you see that the number of 42-cycles follows the same distribution you expect for purely random graphs: you roughly find i 42-cycles once every 42^i graphs.

The transition you talk about is easy to quantify. With M edges on a bipartite graph of N+N nodes, the expected number of L-cycles
is (1/L) * (M/N)^L for abitrarily large N. So for instance to double the expected number of 42-cycles, you need M/N = 2^(1/42) = 1.01664
At N=2^31 that’s an additional 35.7 million “roads”. Not quite as dramatic as your one " last connecting track "

1 Like

What’s the chance siphash has unknown opperations?

What about outside the box attacks like a slight Maxwell’s demon?

I had siphash as labled as “fancy math” should that not be true?

It strikes me as though this would be of similar difficulty to cracking sha256 to “break” bitcoin mining with the same techniques. Is relating siphash keybits to final entropy fundamentally different from this?

Interesting. Thanks for the analysis.
(Secondary lesson: Analogies, while instructive, may lack quantitative accuracy :slight_smile: )