Scheduled PoW upgrades proposal

Grin will launch with a 10% ramp to 90%, so:

cuckatoo32 is for 10% to 55%
cuckatoo33 is for 55% to 100%
then another year of cuckatoo33
then there are four years of cuckatoo34

Is this right?

Are there reasons to believe that a memory-IO bound PoW will yield a more competitive mining ecosystem than a compute-bound PoW?

I am concerned that a memory-intensive PoW may actually yield a less competitive mining ecosystem, for a few reasons:

  • There are many possible layouts, architectures, and memory technologies to use in a high-memory ASIC. I think that this flexibility increases the chances of large discrepancies in profitability between mining ASICs. I don’t believe this is the case for PoWs which lend themselves to multiple identical hashing cores, since in that case you just make as fast a core as possible, and stuff as many as you can in the chip.

  • I think that ASIC manufacturers will look to exotic techniques, such as EMIB, in order to overcome inter-die communication latency and bandwidth issues. I think this will increase the design and capital resources needed to create competitive mining hardware, and thus reduce the number of competitive ASICs available.

  • Compute-bound PoW ASICs benefit from packing gates densely on a die, running them fast at low power, and dissipating the heat. If one ASIC manufacturer achieves a technological breakthrough in one of these areas, their compute-bound PoW ASIC will have a huge advantage. However, I think this is unlikely, since these things (gate density, speed, power, and heat dissipation) are highly sought after and researched by foundries and semiconductor companies, being central to profitability and competitive advantage. On the other hand, it’s not clear to me that the architectural concerns of a Cuckatoo Cycle ASIC are as well trodden, so to speak, so I think it more likely that one ASIC manufacturer will find an exotic optimization, perhaps not applicable to other chips, that will reduce the number of competitive ASICs available.

  • Cost of electricity is central to the profitability of compute-bound PoW ASICs. The need to find cheap electricity is actually a decentralizing factor, since cheap electricity is geographically dispersed and not unlimited. If ASICs need cheap electricity to be profitable, then the manufacturer must sell them on the open market, to geographically and politically distributed entities who will be able to mine profitably with them. If cost of electricity is of secondary importance, as may be the case with high-memory ASICs, then the best place to mine with them is as close as possible to wherever they are assembled, and a manufacturer is likely to mine with their own hardware, reducing availability of competitive ASICs.

  • This is a concern specific to Cuckatoo Cycle+, but is worth mentioning since you’ve said that single-die Cuckatoo Cycle ASICs are not meaningfully different from compute-bound PoW ASICs: If smaller graph sizes are periodically made obsolete, in order to prevent single-die ASICs, and older generations of miners are rendered useless, this gives more power to the mining hardware manufacturers, since they do not have to compete against their old chips that they’ve previously sold into the market.

  • In this article you can read about how a new generation of Ethereum ASICs has been released that trounces GPUs and even old ASICs by a huge margin. I think this is more than the difference between subsequent generations of Bitcoin ASICs these days. Unless other companies are able to respond, this may be the only competitive ASIC out there. This isn’t directly applicable, but it is an example of an existing high-memory PoW that does not have a particularly competitive mining landscape.

  • Bitcoin ASICs have been around for 5 years, so ASICs for a PoW which is similar to double SHA256 will be easier for existing mining hardware producers to bring to market, which I think would be good for competition.

Unless there are strong reasons to believe that a memory-intensive PoW would actually be better than a compute-intensive PoW, I don’t think the experiment is worth the risk, given the reasons above to believe it may be worse.


I believe the current theory is that memory chips have a fairly well-defined cost, thanks to decades of everyone wanting them smaller, while rearranging logic gates to suit this month’s way of edge trimming doesn’t.

Memory chips only have a well-defined cost if they’re standard blocks of SRAM or DRAM. If they’re custom designs, I think that can get quite expensive in terms of IP and design.

On the other hand, I think that rearranging logic gates probably has the most well-defined cost, since so much of chip production is just rearranging logic gates.

Right.

Cuckatoo32 starts getting phased out after 2 years from launch.
So it’s 2 years of cuckatoo34 next.

True; Cuckatoo Cycle is as much about board level optimization as about chip level optimization.

These are challenges that the general computing market is already focussing and competing on (unlike optimizing SHA256 circuitry), so we can expect wider applicability of any optimizations.

As they already do with compute bound ASICs.

We expect cuckatoo miners to be able to handle multiple graph sizes, to get comparable lifetimes to fixed size PoWs.

They have yet to be released, and if the delayed ETH hardfork switches PoW to ProgPoW we may never see them released. In any case, we saw such leap-frogging in earlier Bitcoin days as well. It simply takes many years of ASIC development to significantly reduce the size of the leaps.

Altogether I admit there are considerable risks to the experiment. At some point I suggested a long-term dual-PoW model with 50% compute-bound and 50% memory-bound, but this was deemed too complicated.

So cuckatoo32 and cuckatoo33 only get 1 year each?

How much memory is needed for mean mining of cuckatoo34?

Yes; recall that cuckatoo32+K gets phased out after 2^K years.
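
For concreteness, here’s that rule tabulated in a quick Rust sketch (my reading of the schedule; purely illustrative):

```rust
// Phase-out times implied by "cuckatoo(32+K) gets phased out after
// 2^K years". Each size's effective lifetime as the newest size is
// the gap between consecutive phase-outs: 1, 1, 2, 4, ... years.
fn main() {
    let mut prev_phase_out = 0u32;
    for k in 0..4u32 {
        let phase_out = 1u32 << k; // 2^K years after launch
        println!(
            "cuckatoo{}: phased out {} year(s) after launch ({} year(s) after the previous size)",
            32 + k,
            phase_out,
            phase_out - prev_phase_out
        );
        prev_phase_out = phase_out;
    }
}
```

That matches 1 year each for cuckatoo32 and cuckatoo33, then 2 years of cuckatoo34, 4 of cuckatoo35, and so on.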

Let’s say cuckatoo29 requires 3 GB. Then cuckatoo34 needs 2^(34-29) = 32 times more, or 96 GB DRAM, 33% more than what you find in an E3 ethash miner.

That’s why lean mining should be much preferable, needing only 2 GB DRAM and 2 GB SRAM (or some fraction thereof, by trading off SRAM for extra passes). Note that a Z9 mini Equihash miner has almost 2 GB of SRAM spread over 12 separate ASICs.
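
Here’s the same back-of-the-envelope arithmetic as a Rust sketch (the 3 GB cuckatoo29 baseline is the assumption above, not a measurement):

```rust
// Mean-mining DRAM estimate: scale an assumed ~3 GB requirement at
// cuckatoo29 by 2^(N-29), i.e. doubling with each graph size.
fn mean_dram_gb(n: u32) -> u64 {
    3 * (1u64 << (n - 29))
}

fn main() {
    for n in 31..=34 {
        println!("cuckatoo{}: ~{} GB DRAM for mean mining", n, mean_dram_gb(n));
    }
    // Lean mining cuckatoo34, by contrast, needs roughly one bit per
    // edge plus node counters: the ~2 GB DRAM + 2 GB SRAM cited above.
}
```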

So you can do mean mining of cuckatoo34 with only slightly more DRAM than an E3 miner? This is interesting to me.

If you were to make an ASIC for cuckatoo today, what would you do? Cuckatoo32 and cuckatoo33 seem like bad ideas, since they get so little block-reward time that they’re not worth bothering with.

This seems undesirable, right? This would be an additional source of optimizations. I would assume that the fewer sources of optimization, the better, since then different manufacturers are more likely to produce competitive miners.

It’s true that the general computing market is not focusing on optimizing SHA256 circuitry, but that’s just the algorithm. It is focusing on the supporting fabrication techniques, i.e. packing gates tightly, running them quickly, at low power, and dissipating heat. I think that the specific algorithm isn’t significant.

Right, I’m not saying that manufacturers don’t mine with compute-bound ASICs. My point is that it’ll be worse with memory-bound ASICs, because cost of electricity will probably not be much of a factor. Cost of electricity is decentralizing, since deals on electricity are localized and don’t scale infinitely.

But planned increases to the minimum graph size will only reduce the lifetime of miners, right? So they would have a strictly longer lifetime if the minimum graph size wasn’t increased.

What are the expected upsides of this experiment for Grin itself?

I agree the fewer things to optimize the better. And in this regard Cuckatoo Cycle is worse than SHA256.
I disagree about electricity cost not being much of a factor with memory bound ASICs. SRAM is quite power hungry.
The lifetime limits that graph size increases impose on miners become more and more relaxed over time, as they are exponentially spaced in time.

The upside is appealing to people who like exploring new avenues and having a more interesting PoW puzzle that will require a more computer-like miner, possibly attracting existing memory technology companies to get into miner development.

Thanks for the explanation.

I would really prefer that the proof-of-work algorithm be chosen simply to maximize predictability of the network’s hashrate and security, and minimize uncertainty surrounding future price/performance/efficiency improvements of mining hardware.

I think that a bitcoin-style proof-of-work, while not perfect, does this as well as we know how. The most efficient implementation is conceptually simple: a core of gates, replicated as many times as possible on a die. Of course, fabrication techniques and low-level design optimization still play a role in differentiating ASICs.

With Cuckoo Cycle though, there are multiple additional sources of optimization: The GPU solvers are nontrivial and still being optimized, and they will surely be optimized further and in secret. Similarly, the optimization space for hardware implementations is huge, and we seem likely to see a variety of board-level optimizations as time goes on.

I think that, since there aren’t compelling benefits to the coin or network, this makes a high-memory PoW algorithm an imprudent choice. I don’t want to belabor this point though, since I think that I’ve made all of my arguments.


One advantage of choosing a net-new PoW algo (vs. SHA256) is that Grin avoids attacks from existing ASIC miners and enforces a smoother hash-rate ramp-up.

I definitely wouldn’t advocate for SHA256. I like SHA3. It’s been scrutinized extensively, but no ASICs for it exist and the coins that use it are comparatively small.

I’ll provide some counterpoints in no particular order. I don’t think any one of them is a definitive win and you’ll likely have a “yes but” for multiple of them. But I do hope that overall they provide enough weight to move your personal balance away from SHA3 (for example) and toward Cuckoo Cycle.

  1. Proof of work security does not exist in a vacuum. It’s useless if not enforced by a node. The validation of a Cuckoo Cycle is extremely simple and short. I can look at an implementation and see if it’s correct fairly quickly. Same for other implementations. SHA3 or other complex hash functions, not so much. I do acknowledge that SHA3 libraries would generally be more reviewed, but that’s not always true, and not in all programming languages. So I choose simplicity for security.
  2. Cuckoo Cycle is an actual proof of work. This may be more of a point of design or aesthetics, but generally hash functions are not; they only become one when paired with a network target. But I like that Cuckoo has been designed as a PoW, provides several levels of difficulty, and is extremely tweakable.
  3. I can explain how Cuckoo Cycle works and why it does. Most hash functions are pretty much black magic at this point, where the right series of gates and logical operations gets stapled one after another to provide more entropy. They provide extreme levels of safety against extreme levels of opacity. Could you see where an XOR is missing and why it needs to be there? I think they’re a hammer and we don’t really have a nail.
  4. Speaking of security, what are the goals of the proof of work? I’d personally list them as security, decentralization and distribution. Note that the first two are intimately related and I wouldn’t consider the first without the second. A pure hash-based PoW has mostly been shown to lead to a winner-take-all situation for whoever can get the most silicon. Only clever chip design has been able to counterbalance that.
  5. I see custom ASICs as a local optimum. Even this industry is starting to move toward paired programmable FPGA-like chips to be able to wire up ASIC functionalities differently if the upstream algo changes (see Ethereum miners for example). And this is a fast-changing industry as well. Designing or choosing algorithms just for what this industry can do right now could leave us very ill-prepared for the future.
  6. There are multiple possible sources of optimization for Cuckoo Cycle. Is that good or bad? I don’t know, I’ve heard believable arguments on both sides. For what goal? I think this argument is a total wash.
  7. On uncertainty of mining hardware improvement, I’m not sure I share your analysis. Uncertainty used to be pretty high for Litecoin, for example, even though Scrypt is closer to SHA3 than Cuckoo is. I think that uncertainty is inherent to the market and not the PoW design. And so again, picking the design based on the market is ill-advised IMO.
  8. I don’t worry that Cuckoo Cycle solvers are too complex. As mentioned, people used to think the same of Litecoin. There’s lots of ingenuity in the hardware design space. And the lean solver is very simple to explain.

I’ll stop here. But overall I think it’s important to realize that currently this is a many-variables problem, with many unknowns. I don’t think there’s one true best answer and I don’t think there’s one best variable we should optimize on either. So in the presence of high uncertainty around the best solution to a problem, I tend to pick something simple but flexible. Cuckoo Cycle is that (and of course some other things too).


FWIW, Handshake avoided using (solely) SHA3 because it is highly likely there exist SHA3 ASICs. Sia’s David Vorick seems to agree.

Have you actually looked at Keccak? It’s pretty simple, simpler IMO than the Cuckoo Cycle validator.
Compare:
https://keccak.team/keccak_specs_summary.html

Although Cuckoo’s validator is “fewer lines of code”, I find its nested looping constructs relatively difficult to understand and follow. IMO it’s easier to read and understand Keccak than Cuckoo. Is it more LOC? Technically yes, but it’s simpler. Also, with a hash function the verifier is the solver. CC’s solvers are crazy complex.

Huh? Hash functions are designed to best ensure non-reversibility. There is no proof that one-way functions even exist (such a proof would imply P != NP), but hash functions are cryptologists’ best attempts. Cuckoo Cycle has basically no research behind it, and the potential for shortcuts is tremendous. In this regard, CC is a much weaker “proof” of work than peer-reviewed hash functions.

Also, changing the network target is an easy tuning parameter. In fact, Cuckoo Cycle hashes the final edges to produce a value that’s compared with a network target. How is this any different?
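
Schematically, the final step looks just like a hash PoW. A sketch, where `hash256` stands in for the network’s chosen digest of the packed proof (the encoding details here are illustrative, not Grin’s actual serialization):

```rust
// Difficulty filtering on top of a valid cycle: serialize the 42 cycle
// nonces, hash them, and compare the digest against the network target,
// exactly as a plain hash-based PoW would.
fn meets_target(cycle: &[u64; 42], target: u128, hash256: impl Fn(&[u8]) -> [u8; 32]) -> bool {
    let mut bytes = Vec::with_capacity(42 * 8);
    for nonce in cycle {
        bytes.extend_from_slice(&nonce.to_le_bytes());
    }
    let digest = hash256(&bytes);
    // interpret the leading 16 bytes as a big-endian number
    let value = u128::from_be_bytes(digest[..16].try_into().unwrap());
    value <= target
}
```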

Just because you don’t understand them doesn’t mean there’s no theoretical grounding for them. The ARX family, for example, is used because Add flips bits nonlinearly, XOR explodes combinations, and Rotate helps create avalanche. Furthermore, all three are constant-time operations which prevents timing attacks. Can you say anything like that about Cuckoo Cycle?

  • Security: Traditional hashes beat Cuckoo Cycle. They have wayyyy more research and peer review. Not even close.
  • Decentralization: This is related to complexity of implementation, and again traditional hashes win. CC solvers are crazy complex compared to something like Keccak or Blake.
  • Distribution: Complex PoWs are what lead to a winner-takes-all market, not the ability to print silicon. Wafer production is highly commoditized, and even my tiny startup has access to the same production facilities as Bitmain. When you use a complicated PoW, only the most clever designers can produce effective chips. A simpler PoW would improve commoditization and distribution.

FPGAs are only useful if you need to make changes. If the algorithm is fixed, ASICs win easily; no contest, never gonna change. FPGAs are gaining popularity because of their flexibility, not their speed.

Absolutely bad. The door is wide open for secret optimizations, allowing a few clever private miners to dominate the hashrate. A well-reviewed, highly researched hash function is much stronger against secret optimizations.

No way. It’s absolutely related to the design. Scrypt was a relatively new proposal which hadn’t been extensively attacked before it was put into Litecoin. The lazy-computation paper led to subsequent improvements/fixes to Scrypt, but this is exactly the use case to demonstrate why you don’t want to put an unproven, unstudied PoW into a production coin.

… Which led to unexpected ASIC development and the aggregation of hashpower to the few people who (a) understood the Scrypt attack and (b) could implement the more complex algo needed to take advantage of it. Regular people were left behind.


But WHY? What’s wrong with single-chip ASICs? If you are accepting ASIC miners as a reality, what’s your perceived important difference between single-chip ASICs and ASICs that require IO?

So what if they do or don’t “resemble” GP hardware? They cannot be used as GP hardware. And you cannot rip DRAMs off a miner’s logic board and reuse them. They’re trash. You throw away the entire unit and make new ones from scratch.

If you are imagining that CC ASIC miners will use DIMMs that can be popped off, just so they can be recycled when they’re obsolete… hell no. DIMMs are only used in GP hardware to allow end-users to select different memory sizes. Any ASIC miner will use surface-mount BGAs. And by the time the ASIC isn’t profitable anymore, those DRAMs on the board won’t be worth anything, certainly not enough to be worth ripping off the board.


They’ve been done to death. They’re rather boring IMO.

Grin is taking an admittedly risky gamble on a novel and elegant PoW to find out what benefits competition for the most efficient memory IO can bring. I’m hoping, perhaps against all odds, but hoping still, that we will see some advances in memory technology in the long run (over decades), as such advances can be immediately monetized, whereas they may falter in the overly conservative commodity memory market.

Where every coin was happy with a fixed emission or small tail emission, Grin took the bold step of trying a pure linear emission.

ZCash clearly tried to avoid single-chip ASICs (stressing how Equihash would be “equitable”), but chose its parameters poorly.

To boldly go where no blockchain has gone before.

And because I love answering the question “what algo?” with

finding 42-cycles in random bipartite graphs with billions of nodes…

I strongly disagree that Keccak is simpler. The cycle checking is conceptually trivial; it’s just following the 42 edges around the cycle to verify the adjacencies. Any half-decent programmer can code that from just the words “42-cycle”. So the real complexity is in the hash function. And Keccak is definitely way more complex than siphash.
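
To make that concrete, here’s a minimal Rust sketch of the cycle check. It’s illustrative: `node` stands in for the keyed siphash mapping an edge index to its endpoint on one side, and the real verifier also range-checks the edge indices.

```rust
const PROOF_SIZE: usize = 42;

// Verify that 42 ascending edge indices form a single 42-cycle in the
// bipartite graph whose edge endpoints are given by `node(edge, side)`.
fn verify(edges: &[u64; PROOF_SIZE], node: impl Fn(u64, u64) -> u64) -> bool {
    let mut uvs = [0u64; 2 * PROOF_SIZE];
    for n in 0..PROOF_SIZE {
        if n > 0 && edges[n] <= edges[n - 1] {
            return false; // edge indices must be strictly increasing
        }
        uvs[2 * n] = node(edges[n], 0); // endpoint in U
        uvs[2 * n + 1] = node(edges[n], 1); // endpoint in V
    }
    // Follow the cycle: hop along a shared node to the matching endpoint
    // of another edge, then cross that edge to its other endpoint.
    let (mut i, mut steps) = (0usize, 0usize);
    loop {
        let mut j = i; // index of the matching same-side endpoint
        let mut k = i;
        loop {
            k = (k + 2) % (2 * PROOF_SIZE); // same-side endpoints only
            if k == i {
                break;
            }
            if uvs[k] == uvs[i] {
                if j != i {
                    return false; // a third endpoint on this node: branch
                }
                j = k;
            }
        }
        if j == i {
            return false; // no matching endpoint: dead end
        }
        i = j ^ 1; // cross edge j to its other side
        steps += 1;
        if i == 0 {
            break;
        }
    }
    steps == PROOF_SIZE // one cycle through all 42 edges
}
```

All the cryptographic weight rests on siphash; the graph-following logic above is the whole of the validator.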

Some of them are. Others, like the lean miner, are quite simple, spending the vast majority of their time in essentially just a few loops.
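
Something like this Rust sketch captures their shape (illustrative only; the real lean miner keeps these sets as compact bitmaps, one bit per edge and per node, and runs the passes in parallel):

```rust
// Lean edge trimming: `live` holds one flag per edge. Each round, for
// each side of the bipartite graph, mark nodes with two or more
// incident live edges, then kill every live edge whose endpoint on
// that side was not marked (degree < 2 means it cannot be on a cycle).
fn trim(live: &mut [bool], node: impl Fn(u64, u64) -> usize, num_nodes: usize, rounds: u32) {
    for _ in 0..rounds {
        for side in 0..2u64 {
            let mut seen = vec![false; num_nodes];
            let mut deg2plus = vec![false; num_nodes];
            for e in 0..live.len() {
                if live[e] {
                    let v = node(e as u64, side);
                    if seen[v] {
                        deg2plus[v] = true;
                    }
                    seen[v] = true;
                }
            }
            for e in 0..live.len() {
                if live[e] && !deg2plus[node(e as u64, side)] {
                    live[e] = false;
                }
            }
        }
    }
}
```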

The complexity is due to the limitations of current-day memory technology, which Cuckoo Cycle aims to help lift.