Cuckoo Cycle weakness and possible fix (Cuckatoo Cycle)

The stated memory requirements based on 2-bit counters are incorrect, aince we only require counting up to 2.
So they’re really trits (ternary digits) requiring only log 3 bits.
An ASIC would pack 5 trits into a byte, with an atomic 2-bit update becoming an atomic 1-byte update.
cuckoo30 doesn’t need 128 MB for the counters, but only 4/5 times that, or 103 MB.

I consider this a weakness, and would prefer that an optimal ASIC design accesses single bits.

To that end I will later today propose a small change in the Cuckoo Cycle construction, that I hope everyone can agree makes Cuckoo Cycle even more ASIC friendly.

Is that a good thing? I was under the impression that one of the initial purposes of Cuckoo Cycle was to be ASIC unfriendly.

Originally Cuckoo Cycle was designed to make memory latency a bottleneck. Now, many years later, we realize that the SRAM that Cuckoo Cycle makes excellent use of (needing an order of magnitude less than the DRAM needed for efficient mining) is quite affordable in ASICs.
We expect ASICs to have a large efficiency advantage over GPUs.

Here’s the change in construction that allows replacing 2-bit counters by single-bit counters:

Instead of adjacent edges sharing an endpoint in the Cuckoo Graph, we require the two endpoints to differ in the last bit only.

This is a minimal change in the verification function, we replace
line 66 of cuckoo.c

  if (uvs[k] == uvs[i]) { // find other edge endpoint identical to one at i

by

  if (uvs[k]>>1 == uvs[i]>>1) { // find other edge endpoint matching one at i

and line 72

if (j == i) return POW_DEAD_END;  // no matching endpoint

by

if (j == i || uvs[j] == uvs[i]) return POW_DEAD_END;  // no matching endpoint

We also need to replace line 52

node_t xor0 = 0, xor1  =0;

by

node_t xor0, xor1;
xor0 = xor1 = (PROOFSIZE/2) & 1;

to fix early detection of bogus solutions. This has no effect on the distribution of cycle lengths in random graphs; a random endpoint is as likely to match another as it is to differ in the last bit only.

With this change, lean mining only needs a node bitmap to record whether it has any edge adjacent to it.
An edge then survives if the bit for its endpoint^1 is set.

Thus, a graph with 2^n edges will require 2^n bits of SRAM for full speed. It makes more sense to use this n in the name of the PoW.

Let’s call this new version Cuckatoo Cycle.

 )/_
<' \
/)  )
/'-""

We will then denote by cuckatoo32 the problem with 2^32 edges
needing 2^32 bits of SRAM for full speed.
Compare with cuckoo32 that only has 2^31 edges but similarly needed 2^32 bits of SRAM (or 4/5 of that with trit packing).

So I propose that Grin uses cuckatoo32+ as main PoW instead of cuckoo32+.

1 Like

This seems to make a lot of sense to me, it’s a very minor change and simplifies the memory layout conceptually and practically. What would the impact be on the mean miner? Would that become the default Cuckoo also from your repository?

Mean miner could use bigger buckets, since 48 KB of shared mem can now hold 2^18 counters instead of 2^17. That would be helpful for cuckoo31, since Nvidia apparently is much less efficient bucketing 2^13+ way than 2^12 way.
I haven’t decided how to deal with a pair of PoW in the repo yet.

It turns out that the expected number of cycles of length L on a graph of 2^n edges is slightly different, but in the limit as n->infinity it still converges to 1/L.

Originally Cuckoo Cycle was designed to make memory latency a bottleneck. Now, many years later, we realize that the SRAM that Cuckoo Cycle makes excellent use of (needing an order of magnitude less than the DRAM needed for efficient mining) is quite affordable in ASICs.
We expect ASICs to have a large efficiency advantage over GPUs.

I’m still unclear if this was a political shift or a “fuck it ship it” shift

I can understand the position from either a “each major coin should be asic friendly, but different with asics” or a “I guess my starting assumptions were wrong but now I’m an expert in something and some people like asics, sooooo whatever”

3 Likes

I was thinking the same thing I am now having flash backs of way back when mining BTC on R9’s before Asics took over. This could posiblby be bitmain 2.0.

Hi Tromp
could you elaborate on this? I cannot understand why “adjacent edges sharing an endpoint” can be transformed to “two endpoints to differ in the last bit only”…
Thanks!

You replace the adjacency test

endpoint(edge1, side) == endpoint(edge2, side)

or its equivalent

endpoint(edge1, side) ^ endpoint(edge2, side) == 0

by

endpoint(edge1, side) ^ endpoint(edge2, side) == 1

Done.

Doesn’t this mean endpoint(edge1, side) and endpoint(edge2, side) are not the same endpoint? So edge1 and edge2 are not connected?

I’m changing what it means to be connected. Watch me explain it about 34 minutes into my GrinconUS presentation at https://www.youtube.com/watch?v=h4AJDKoeO9E

You see, nobody cares about it. I was thinking too that an ideal truly decentralized coin should be PoW (CPU+GPU)+PoC(+PoS may be). But… Nothing personal, it’s just a business…

Thanks for pointing me to the correct answer! So in general, Cuckaroo shares the same CYCLE definition with your cuckoo cycle whitepaper, but Cuckatoo uses a modifed version, am I right?

yes, cuckoo and cuckaroo use the standard notion of cycle, while cuckatoo uses a nonstandard notion which i call off-by-1 cycles.

The only problem is that Grin’s individual qualities were supposed to be 1) privacy 2) GPU encouraging/ASIC discouraging and decentralized and 3) emission rate inflationary

  1. it doesn’t have over Monero, ZCash, many others and many layer-2 protocols
  2. exists but will drilled negatively on social media when adoption is attemtped

and now 2) is gone, well before the 3 years that were supposed to be GPU encouraging.

It seems like these are coin-killing decisions. But, gotta keep Chinese hardware manufacturers happy I guess …

  1. Grin has some privacy trade-offs, but there is a lot in the works that will make grin even more private than competitors. Right now it is very comparable, in fact you can hardly call zcash a privacy coins as most transactions are not private, and Monero still largely relies on ring signatures whose plausible deniability are not fool proof or perfectly safe. Privacy alone was not grin’s unique premise.

  2. ASICs are inevitable and good in the long term (they are most efficient way to run the network), grin has always aimed to create as fair and transparent an ASIC market as possible, as this seemed the most pragmatic approach to ASICS, given they are next to impossible to prevent (without big tradeoffs). Consider the alternative: people were working on [secret] grin ASICs for launch day, which apparently grin has prevented from being the case. That would have been a total failure of decentralization. There are now multiple ASIC companies competing against each other, proof that grin’s ASIC resistance startegy is already appearing to be successful through the lack of monopoly, and minimizing the odds that there can be secret ASIC mining.

  3. Emission rate is still cool.

  4. The nubmer one quality of grin (and mimblewimble) is that it is very lightweight due to cut-through/not needing to keep all transactions to still run a verifying node. Also one of the very first to use bulletproofs live. This is still true and the efficiency of the network is coolest part of grin.

4 Likes