Choice of ASIC Resistant PoW for GPU miners

I missed quite the debate. I agree with Tim that it's possible the plan could backfire, but rather than change the details I think you need a bounty for open source designs.

@timolson how do you go about designing an ASIC? I’m afraid my hardware experience is limited to the game MHRD, yet it seems to be a topic no one knows.

No, 64 MB of SRAM is the full 2^29 bits node bitmap for cuckatoo29, i.e. PART_BITS = 0.

I consider the first round of trimming. A single trimming round has 2^PART_BITS passes. Line 155 of https://github.com/tromp/cuckoo/blob/master/src/cuckatoo/lean.cu loops over rounds, and line 156 over passes.

No, it wouldn’t, since the number of remaining edges decreases geometrically. On average, an edge is hashed in fewer than 4 trimming rounds, and it is hashed twice per round: once in count_node_deg and once in kill_leaf_edges. A siphash computation should cost much less energy than an SRAM lookup.
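
To make the round/pass structure and the hash counting concrete, here is a toy sketch of lean trimming in Python. It is not the code from lean.cu: the `endpoint` function uses keyed BLAKE2b as a stand-in for siphash, degrees are kept in a `Counter` rather than a bitmap, and the names `endpoint`, `lean_trim`, `history` and `hashes` are made up for illustration. It only shows that every live edge is hashed twice per pass (count, then kill), and that splitting the node space into 2^PART_BITS passes gives the same result at the cost of more hashing.

```python
import hashlib
from collections import Counter


def endpoint(key: bytes, edge: int, side: int, edge_bits: int) -> int:
    """Toy endpoint function: keyed BLAKE2b standing in for siphash."""
    data = (2 * edge + side).to_bytes(8, "little")
    digest = hashlib.blake2b(data, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "little") & ((1 << edge_bits) - 1)


def lean_trim(key: bytes, edge_bits: int, rounds: int, part_bits: int = 0):
    """Return (live edge count after each round, total endpoint hashes)."""
    live = list(range(1 << edge_bits))
    history = [len(live)]
    hashes = 0
    for rnd in range(rounds):
        side = rnd & 1                          # alternate between the two node sides
        for part in range(1 << part_bits):      # 2^part_bits passes per round
            def in_part(node: int) -> bool:
                # A pass only tracks nodes whose top part_bits bits equal `part`.
                return node >> (edge_bits - part_bits) == part

            # First hash of the pass: count node degrees.
            deg = Counter()
            for e in live:
                node = endpoint(key, e, side, edge_bits)
                hashes += 1
                if in_part(node):
                    deg[node] += 1

            # Second hash of the pass: drop edges whose endpoint is a leaf.
            kept = []
            for e in live:
                node = endpoint(key, e, side, edge_bits)
                hashes += 1
                if not in_part(node) or deg[node] > 1:
                    kept.append(e)
            live = kept
        history.append(len(live))
    return history, hashes


if __name__ == "__main__":
    history, hashes = lean_trim(b"toy header", edge_bits=12, rounds=8)
    print("live edges per round:", history)
    print("endpoint hashes per initial edge:", hashes / history[0])
```

The number of rounds and the graph size here are arbitrary; the per-edge hash count this prints is for a tiny toy graph and a fixed round count, so it should not be read as a measurement of the real miner.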

OH! I thought we were talking about Cuckatoo-32, which explains my confusion. Didn’t we start the conversation talking about a 7GB mean table?

I am NOT going to claim that a Cuckoo 29 ASIC can’t run lean miner with an edge liveness table! Plenty of room for that.

Are you saying that Cuckatoo-29 will be allowed on mainnet? I thought it was just a testnet thing.

Before I embarrass myself further, can we clarify the parameters for the ASIC discussion? Everything I said so far assumed C-32 where an edge liveness table wouldn’t fit on-chip without cutting memory at least by 2 and suffering the extra time & power per hash.

If the lean miner uses external memory, then the bottleneck is bandwidth. To fix bandwidth issues, use more DIMMs. DIMMs have lots of storage, and for lean mining all this storage is wasted.

I think mean mining makes sense then. Lots of storage, not much bandwidth. Optimize the precious resource.

We were talking about SRAM vs DRAM, but I used cuckatoo29 for my numbers because mean can still use 32 bit edges in the first round. For bigger sizes like cuckatoo32 it will need a less “round” number like 40 bits. I now see that causes unnecessary confusion. Sorry about that.
Btw, the mean CUDA miner for cuckatoo29 uses about 7GB, while the CPU miner gets away with 2.2GB, using lots of space saving tricks that would slow the GPU way down.

No, but the ASIC-resistant (AR) Cuckaroo29 will be the secondary PoW.

Yes, Cuckatoo32+ is the mainnet primary PoW. Its incompressible edge bitmap of at least 512 MB will not fit on chip, so any ASIC will require off-chip DRAM. SRAM can be either off-chip or on-chip, but the latter only with a time-memory trade-off (TMTO).
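
The bitmap sizes quoted in this thread follow from one bit per entry. A quick check in Python, where `bitmap_bytes` is a made-up helper name and "MB" is read as 2^20 bytes:

```python
def bitmap_bytes(edge_bits: int, part_bits: int = 0) -> int:
    """Bytes for a 1-bit-per-entry bitmap over 2^(edge_bits - part_bits) entries."""
    return (1 << (edge_bits - part_bits)) // 8


MIB = 1 << 20

if __name__ == "__main__":
    for edge_bits in (29, 32):
        print(f"cuckatoo{edge_bits}: {bitmap_bytes(edge_bits) // MIB} MiB")
    # Each extra partition bit halves the bitmap that must be resident at once.
    print(f"cuckatoo32, PART_BITS=1: {bitmap_bytes(32, 1) // MIB} MiB")
```

So 2^29 bits is the 64 MB mentioned for cuckatoo29 with PART_BITS = 0, and 2^32 bits is the 512 MB figure for cuckatoo32.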

My previous reply was mostly caused by the tone, not really the content.

And I appreciate that, your input is valuable, especially so when you keep it on the constructive side of discourse.

I think we can save those changes for when we need to tweak the PoW 6 months after launch. We can then look for changes that will increase GPU hardware utilization as well.

"The idea is that a long sequential computation is required to compute the endpoints of a whole block of edges. The mean solver would do this only once and store all results, one 64-bit int per edge. "

It might be possible to compute them on another device and free the GPU for trimming (via pipelining of the jobs). CPUs can help with AES, while accelerator cards like the Acorn CLE 215 can probably do siphash pretty well.

Are accelerator cards good, or are they too close to ASICs?

When you say that the diminishing returns for GPU miners will be linear, do we know what that will look like? Will that be a monthly diminution? Daily? Every block?

It’s planned to drop by 1% every 11648 blocks (2 * HEIGHT_YEAR / 90), a little over 8 days…

It drops by an “absolute” 1%, right? This is a linear function not geometric: 90.00% 89.00% 88.00%…

Yes, absolute 1%, as implied by “linearly decreasing from 90% to 0% over 2 years”.
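
For anyone who wants to see the schedule as numbers, here is a small Python sketch of the linear drop as described above. It is not taken from the Grin source: `HEIGHT_YEAR = 524160` is inferred from the quoted 11648 = 2 * HEIGHT_YEAR / 90 (i.e. 52 weeks of one-minute blocks), and `ar_percent` is a made-up name.

```python
HEIGHT_YEAR = 52 * 7 * 24 * 60       # assumed: 52 weeks of one-minute blocks = 524160
STEP = 2 * HEIGHT_YEAR // 90         # 11648 blocks between 1% drops
BLOCKS_PER_DAY = 24 * 60             # assumed one-minute block time


def ar_percent(height: int) -> int:
    """Percentage at a given block height: 90, 89, 88, ... down to 0."""
    return max(0, 90 - height // STEP)


if __name__ == "__main__":
    print("blocks per step:", STEP)
    print("days per step:", round(STEP / BLOCKS_PER_DAY, 2))
    for height in (0, STEP - 1, STEP, 45 * STEP, 2 * HEIGHT_YEAR):
        print(height, ar_percent(height))
```

Each step subtracts one percentage point from the original 90, so the value halfway through (45 steps in) is 45%, and it reaches 0% at height 2 * HEIGHT_YEAR.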

What’s the rationale behind putting up a 3-4GB requirement?
There are many in the mining community with old GPUs, and very many in the Monero community with low-power, low-end 2GB GPUs that work just fine with Monero (in some cases better than Vegas in terms of hash/watt). Not including every possible kind of hardware, where feasible to implement, would actively hurt decentralization & egalitarianism.

The requirement for cuckaroo29 is actually 7 GB.
The rationale is that Cuckoo Cycle is a memory hard PoW,
and as such wants to exercise as much memory as possible.

The lower the memory requirements, the higher the risk of FPGAs and ASICs getting a large advantage.

There are on the order of a million GPUs available that can mine cuckaroo29,
and we hope to attract a decent fraction of them.

So by lowering the spec requirement to about 1GB instead of 7GB, and thereby allowing many more devices to participate (mobile GPUs in phones, notebooks & low to mid-end GPUs), what exactly would make it that much easier to implement an ASIC?
I don’t see a big problem with FPGAs by the way, if they can implement it with 1GB they can also do it with 7GB, I’d expect the costs to grow linearly on both specialty hardware & high-end GPUs?
Please correct me if I’m wrong. I am not a hardware expert by any means.
But I think the benefits might outweigh the potential drawbacks by allowing every kind of low-end device to participate as well, instead of only specialized high-end 8GB Vega/1080 miners (I have both a Vega farm & a 2GB low-end GPU farm).
Just imagine all the additional people that this might attract & introduce to the project. I only got into Bitcoin back in 2012/2013 because I was able to mine it on my Laptop Nvidia GPU :slight_smile:

I also don’t think ASIC manufacturers would get involved too much, since you’re taking a clear anti-ASIC stance and they would be taking quite a risk when the reward drops by 1% every week (0 reward after 2 years). It’s just a temporary measure, and personally I believe getting as many people as possible into the project is the best way forward for Grin. After all, cryptocurrencies can only establish themselves if the community is big & diverse.

By the way what’s the limiting factor with cuckaroo29? Mem bandwidth? Ops/s (timings/clocks)?

Existing FPGAs have limits on amount of memory supported. There will generally be many more FPGAs able to support 1 GB than ones able to support 8 GB.
ASICs, on second thought, should not be a problem as long as the DRAM required doesn’t fit on chip, which is certainly the case at >= 1GB.

There is another potentially large benefit to requiring more than 6GB. Many huge mining farms are dominated by <= 6GB GPUs. Excluding them makes it easier for hobby miners to compete.

Lowering memory requirements allows more people to mine with their current equipment, but most won’t be able to do so profitably.

That said, we could in principle support a range of acceptable cuckaroo sizes rather than the single 2^29 size, if deemed beneficial to adoption.

I think the most popular FPGA in crypto-mining is the BCU-1525 with 16GB of RAM. And from what I’ve heard, the Mineority group hosts and/or owns over 5000 of these (correct me if I’m wrong), so if they decide to write bitstreams for it they’ll pretty much have no competition.
I think that’s another reason why it’s important to open up mining to as many people and categories of devices as possible (as you are doing).
I also believe there are as many 8GB mining farms as there are 4GB mining farms. The added expense on the manufacturer’s side to go with 8GB rather than 4GB is minimal (about 20 bucks when I last spoke on the phone with a manager at an AMD card manufacturer), and since most big farms are Ethereum farms I’m sure they were smart enough to pay $20 more and not be out-DAG’ed.

Personally I believe it’s smaller farms & hobbyist miners that went with smaller RAM sizes, since they don’t plan to run them for years, like a big Icelandic farm with cheap power would.
Don’t exclude the hobbyists & small farms from participating with their existing infrastructure. The only argument is that going with 6-7GB makes it harder for FPGAs, but there are probably going to be FPGAs mining cuckaroo whether you like it or not. So at least open it up a bit.

Just because you have 16GB on that BCU-1525 doesn’t mean you also have GPU-like bandwidth. I’d be surprised if that thing comes even close to GTX1070 when it comes to cuckaroo. It will eat less power, that is for sure. They all should be mining cuckatoo31 with those.

I believe the AR PoW aims to deter any ASIC, including those with DRAM, by any means. More memory chips, a complex memory interface and large chunks of data to process are all part of the puzzle. Cuckaroo benefits from complex GPU designs. If you let in all the ETH farms on cheap power, your old AMD card won’t even pay for the power.

You can still mine AF PoW on any GPU with 1GB for a loss if you want.

  1. So you think ETH farms around the world all run on 4GB? You will have farms with cheap power either way.
  2. My point is offering fairness and giving people a chance to participate. If somebody hears they can mine this algo profitably on their mobile GPU, do you think they won’t just try printing magic internet money for shits & giggles? That’s how Bitcoin took off, through being mined by people at their homes and spreading the word.

By enabling only 8GB you are excluding a large potential userbase and their network, and giving all the power to 8GB GPU miners and $4000 FPGA owners.