Critical PoW vulnerability closed; the accidental birth of a new PoW

tromp · July 27, 2020, 3:44pm

As you may know, Grin has a unique dual-PoW system during its first 2 years.
The primary PoW is an ASIC friendly one that initially gets only 10% of the rewards.
The secondary one, starting from 90% of the rewards, should strongly resist ASICs, in order to foster a vibrant GPU mining community, with a low barrier to entry.
Over the course of 2 years, this 10% vs 90% distribution will linearly shift to a 100% vs 0% distribution.

Apart from some inherent resistance, the major source of ASIC resistance should come from frequent tweaks that no ASIC manufacturer should be able to predict or otherwise accomodate.
Both PoW are members of the Cuckoo Cycle family, in which puzzles take the form of random graphs, i.e. nodes connected by edges, and a solution is a cycle of 42 edges along 42 distinct nodes.

Act 3

During the 6 month reign of Grin version 3, from the 2nd hard fork mid January 2020, up to the 3rd hard fork mid July, its secondary PoW was supposed to be Cuckaroom29.
Introduced in [1], this third Cuckaroo instance replaced the hitherto bipartite graphs by monopartite ones.
In a bipartite graph the nodes form 2 separate partitions, with all edges going from one partition to the other, whereas in a monopartite graph, edges can go from any node to any node.

Except, something went wrong.

Although implemented according to spec in the Cuckoo Cycle repository [2] in commit 8f6f33b [3], the Rust implementation in Pull Request #3136 [4] suffered a nasty oversight in the post copy-and-paste modifications from the previous instance, known as Cuckarood29. In particular, the nodemask setting was left unchanged:

		let nodemask = self.params.edge_mask >> 1;

This effectively sets the number of nodes equal to half the number of edges. As it should be in the bipartite Cuckarood.
But in Cuckaroom, the number of nodes should equal the number of edges. In fact the number of nodes is chosen so as to make the expected number of 42-cycles in a random graph very small.
In the first instance, Cuckaroo, it was 1/42. In the second, Cuckarood, it doubled to 1/21.
This means that most graphs don’t have any 42-cycles. You have to search many graphs before you can find one.
Even when you find a 42-cycle, it still has to meet the network difficulty.

A new PoW is born

I had accidentally created a new PoW. By having only half as many nodes, or equivalently, twice as many edges, it had many more expected 42-cycles. 2^42 times as many to be precise, resulting in over 100 billion 42-cycles. With that many, you won’t need to search more than one graph. There’s bound to be 42-cycle satisfying the network difficulty in just the one graph that you search. Hence I shall call this PoW CuckarOne.

Why nobody noticed

Almost all Cuckaroom29 solutions are also Cuckarone29 solutions. If you have a 42-cycle along 29-bit nodes, then stripping off the most significant bit of each node still results in a 42-cycle. Except when two of its nodes get identified, then it’s no longer a proper cycle. But that is exceedingly unlikely to happen, about 1 in 625,000. With only 262,080 Cuckaroom29 blocks, there was likely no miner who saw their winning solution rejected by the network.

Why I noticed eventually

In very early June, I was repeating the process of preparing a Pull Request for Grin’s next hard fork. It so happens that Cuckarooz, the final instance, has twice as many nodes as edges. This time I did (eventually) think of adjusting the nodemask, and realized that the old setting was wrong.
After then realizing the implication on the expected number of 42-cycles, I feared for the worst. That someone else would have made the same discovery and developed a much more efficient solver, minting Grin nearly for free. That would leave Grin’s coin distribution forever tainted.

Looking for exploits

Fixing the bug was as easy as making a single bit change:

		let nodemask = self.params.edge_mask >> 0;

I made this fix in my private Grin repo and started a sync from scratch. If
anyone had exploited the bug by solving Cuckarone29 instead of Cuckaroom29, my node would reject the block and just remain stuck there.
Block 524160 (2nd hard fork) came and went, and the sync kept going.
Many agonizing minutes later, I was synced. And quite relieved; there was no exploit yet. But there were still 1.5 months of Cuckarone29 to go.

Dealing with the critical vulnerability

On Saturday June 6, 2020, I informed the Grin security team. Judging purely from the explosion in 42-cycles, I suggested that an attacker might get an efficiency advantage of between a million and a billion.
With the hard fork coming up in mid-July, we decided to implement a silent fix, disguised as a PoW code refactoring.
This came with the risk of a chain split, so we decided to run monitoring nodes with the fix in place in several timezones, that would trigger alerts if invalid blocks were relayed.
In that event we were prepared to disclose the fix and contact major pools and exchanges to urge them to apply the one line fix as soon as possible.

The PoW refactoring Pull Request #3365 [5] was published on June 24 and merged 4 days later.
Thankfully, the monitoring nodes remained silent.
On Jul 17 2020, at block height 786240, the network successfully upgraded to v4.0.0.

It’s important that the new code judges Cuckatone29 solutions to be invalid, even after the succesful migration to Cuckarooz29. Otherwise, an attacker could use their ability to cheaply generate high-difficulty Cuckatone29 solutions to
perform a months deep re-org. Although the re-org would lack any primary C31+ blocks, the secondary scale would merely be driven down to its minimum value of 13, which is only 2 orders of magnitude below recent values.

Fix details

The PoW code refactoring includes the addition of
a node_mask field to the CuckooParams struct, which is set in the
constructor from an additional node_bits parameter:

    pub fn new(edge_bits: u8, node_bits: u8, proof_size: usize)
                               -> Result<CuckooParams, Error> {
        let num_edges = 1u64 << edge_bits;
        let edge_mask = num_edges - 1;
        let num_nodes = 1u64 << node_bits;
        let node_mask = num_nodes - 1;

The Cuckaroom29 struct is then created as

     let params = CuckooParams::new(edge_bits, edge_bits, proof_size)?;

which passes edge_bits=29 for parameter node_bits.

How easy is Cuckarone29?

All existing efficient Cuckoo solvers rely on edge trimming,
which is only feasible because the expected number of cycles is small.
In a Cuckarone graph, most nodes are part of many cycles, so few edges can be trimmed.

Just throwing away half of the 2^29 edges in the Cuckarone29 graph reduces it to a
Cuckaroom28 graph. That gives a speedup of 2x over solving it as a Cuckaroom29 graph.

For much larger speedups, of well over 1000x, the following method looks like it should work.

The single source double tree algorithm

We build an efficiently indexible representation of the graph in which each node has room to store up to 4 incoming edges and up to 4 outgoing edges. Then it picks a single node v that has at least 4 incoming and 4 outgoing edges.
It can be shown that the expected number of 42-cycles going through v is at least 2^14.
We then build, layer by layer, a tree of incoming paths each of length 21, and a tree of outgoing paths each of length 21.
Denote the set of nodes found to have 21-paths to v as U, and the nodes reached by 21-path from v as W. With the average in- an out-degree of the original graph equal to 2, we expect U and W to be of size at least 2^21, which is 1/128 of all nodes.
We then expect U and W to intersect in at least 2^21/128 = 2^14 nodes. These common nodes can be efficiently enumerated as can their paths to and from v, each forming a 42-cycle. These must still be filtered for self-intersection, but the vast majority will survive. The total effort to generate these roughly 2^14 42-cycles through v is proportional to size of the double trees, or about 2^24. If none of the 42-cycles meets the network difficulty, then the process can be repeated with a different v.

Links

[1] https://forum.grin.mw/t/next-pow-cuckaroom-unveiled-at-grincon1
[2] https://github.com/tromp/cuckoo/blob/master/src/cuckaroom/cuckaroom.hpp#L133-L164
[3] https://github.com/tromp/cuckoo/commit/8f6f33bc819e0d7fa954435bb53589e8d88ea9f8
[4] https://github.com/mimblewimble/grin/pull/3136
[5] https://github.com/mimblewimble/grin/pull/3365
[6] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-15899
[7] https://grin.mw/blog/2020/07/27/critical-vuln-patched-in-grin-4.html

Anynomous · July 27, 2020, 4:13pm

Luckilly no one (appart from you) noticed.
On the plus site, nice invention of a new PoW.

Kurt · July 27, 2020, 5:10pm

God Save The Grin

bluimes · July 27, 2020, 6:13pm

Which God do you mean?

david · July 27, 2020, 6:58pm

He probably means @God

bluimes · July 27, 2020, 7:02pm

Ah makes sense, now you say it

mcm-mike · July 27, 2020, 9:20pm

Thank you @tromp for disclosing the process to the public

I really hope we will find you a “sparring partner” with the deep knowledge you have designing the “Cuck*$algorithm*” for a second pari of

Thank’s also to @joltz which proofs the security team lead decision was chosen wisely.

Topic		Replies	Views
Next PoW CuckARooM unveiled at GrinCon1 Mining	18	4739	January 8, 2020
mid-July PoW hardfork: Cuckaroo29 -> Cuckarood29 Mining	12	5240	July 18, 2019
Introducing the final tweak: CuckARooZ Mining	6	6176	May 20, 2020
On dual PoW, Graph rates, gps, and Difficulty Mining	50	11257	October 1, 2021
Grin Mining FAQ - All of the answers to your mining questions Mining	49	38396	November 29, 2021