Cuckatoo32 feasibility

I’m ensuring that there are pins in the first place, i.e. that you cannot build a single-chip ASIC. This is discussed at length in https://forum.grin.mw/t/scheduled-pow-upgrades-proposal

No; you can just increment PART_BITS, which doubles the number of passes you make over the remaining edges in each round.

You will, however, need to provide enough DRAM to store the bigger edge bitmap, which is similar to ethash miners having to account for an eventual doubling of the DAG size.
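
To make the trade-off concrete, here is a minimal sketch, in Haskell since that is the language used later in this thread (the real lean miner uses bit arrays in C/CUDA; the names trimRound, h, nodeBits, and partBits here are mine): each trimming round is split into 2^partBits passes, every pass re-hashing all live edges but tallying only the endpoints falling in its own node partition, so counter memory shrinks 2^partBits-fold at the cost of 2^partBits times the hashing.

import qualified Data.Map.Strict as M
import Data.Bits (shiftL, shiftR)

-- Toy lean trimming round (hypothetical names; h stands in for siphash).
-- Edges whose endpoint has degree <= 1 after counting are trimmed.
trimRound :: Int -> Int -> (Int -> Int) -> [Int] -> [Int]
trimRound nodeBits partBits h live = filter ((> 1) . degree . h) live
  where
    partOf u  = u `shiftR` (nodeBits - partBits)
    -- one pass per partition: re-hash every live edge, but count only
    -- endpoints falling in partition p
    onePass p = M.fromListWith (+)
                  [ (u, 1 :: Int) | e <- live, let u = h e, partOf u == p ]
    counts    = M.unions [ onePass p | p <- [0 .. (1 `shiftL` partBits) - 1] ]
    degree u  = M.findWithDefault 0 u counts

Incrementing partBits leaves the output unchanged but doubles the number of passes, i.e. doubles the siphash work per round.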

But… I thought you were being ASIC-friendly? Which is it!? Why not allow single-chip ASICs? Why even bother with the need for DRAM? What is the point?

For a lean miner, this doubles the power AND cuts the speed in half, which is effectively a 400% performance hit. Moving from S7s to S9s was a “massive” 250% efficiency boost, yet S7s were useless within months. Doubling the graph size is a much stronger effect than even that massive S7-to-S9 jump, and the only thing it achieves is this:

(image: https://www.trustnodes.com/wp-content/uploads/2018/11/bitcoin-asics-dumped-on-street-nov-2018.png)

The main point is, why bother? What good reason is there to double the graph size? Bandwidth arguments make no sense. Don’t accelerate the trashing of old hardware.

That was answered in the topic I linked to. Please address the arguments in that thread if you disagree.

Your math is off. You use the same power for a 1x-2x longer time.
On cuckatoo_lean_cuda_29, for instance, it’s 1.3x slower.
Its performance hit is under 2x.

I already addressed this above:

How is it off? You must do 2x the number of siphashes. That’s 2x power. Please explain.

Lean Miner’s 1.3x must be a peculiarity of GPUs… Honestly, I haven’t studied your lean code, but the 1.3x must be due to one or more of these effects:

  • Lean-GPU is actually bound by atomic writes to shared memory and/or by the ratio of shared memory to compute units for a given threadblock, i.e. it’s not bound by computation
  • Lean-GPU is caching siphashes in DRAM
  • Lean-GPU is bound by instruction latency, and doubling the threadblock count improves pipeline efficiency

You cannot run Lean Miner on a GPU and expect the results to hold for an ASIC. Apples and oranges.

“We’re going to trash them anyway, so let’s just throw them away faster…”

No; it’s 2x the energy. So you take twice as long at the same energy/second.
Maybe I should have said your physics is off?!
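
In symbols (notation mine: \(\varepsilon\) = energy per siphash, \(H\) = siphashes per graph, \(P\) = chip power):

\[ E_{\mathrm{graph}} = \varepsilon H, \qquad t_{\mathrm{graph}} = \frac{E_{\mathrm{graph}}}{P} = \frac{\varepsilon H}{P} \]

\[ H \mapsto 2H \;\Longrightarrow\; E_{\mathrm{graph}} \mapsto 2E_{\mathrm{graph}}, \quad t_{\mathrm{graph}} \mapsto 2t_{\mathrm{graph}} \]

Twice the energy and twice the time per graph at the same power: a 2x efficiency loss, not 4x.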

I’m not; I’m just pointing out that it can’t be more than a 2x loss in efficiency.

Ok, sorry, I was wrong about 4x. On a Joules-per-hash basis it’s 2x. I fooled myself in my eagerness to make a point. It will be very nearly 2x, which is still disastrous.

Any factor above 1x makes no sense to me. Why accelerate the trashing of hardware at all? I do not understand any reason to enforce multi-chip over single-chip ASICs… but that’s the other thread.

John, thanks for the table. As you have correctly pointed out, for i = 1 the remaining fraction is (1 - 1/e); call this T. It looks like for i = 4 the remaining fraction is T/(2e); is this correct? Any other closed-form results for i > 1? Thx.

I suspect not.

After 2 rounds, the fraction is a*b, where a = 1-e^-1, and b = 1-e^-a.
After 3 rounds, the fraction is b*c, where c = 1-e^-b.
I can formally derive these.

If we blindly continue the pattern, we get
d = 1-e^-c
c*d ~ 0.1167435
e_ = 1-e^-d (writing e_ to avoid a clash with Euler’s e)
d*e_ ~ 0.0836613
f = 1-e^-e_
e_*f ~ 0.0630385
g = 1-e^-f
f*g ~ 0.04927553
etc.

which all closely match empirical results. But I have yet to find the formal proof for this pattern.
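
Written as a recurrence (notation mine), the conjectured pattern is

\[ f_0 = 1, \qquad f_{i+1} = 1 - e^{-f_i}, \qquad r_i = f_{i-1}\, f_i \quad (i \ge 1) \]

so r_1 = 1 - e^{-1} = a, r_2 = ab, r_3 = bc, and so on; this is exactly what the Haskell expression below computes.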

Here are the conjectured fractions, as computed by the Haskell expression

let fs a b = a * b : fs b (1 - exp(-b)) in fs 1 1

together with the cumulative values (columns: round number, fraction remaining, cumulative sum):

0 1.0 1.0
1 0.6321205588285577 1.6321205588285577
2 0.29617148759453016 1.928292046423088
3 0.17527117460580915 2.1035632210288973
4 0.1167434973394182 2.2203067183683154
5 0.08366133488711419 2.3039680532554296
6 0.06303852419193594 2.3670065774473654
7 0.049275532992528674 2.416282110439894
8 0.039615078308885755 2.4558971887487795
9 0.03256468342107633 2.488461872169856
10 0.027256743026040878 2.515718615195897
11 0.023157879639198638 2.5388764948350957
12 0.019925041905058123 2.558801536740154
13 0.017329210022888588 2.5761307467630425
14 0.015212625148852563 2.5913433719118952
15 0.013463672754950931 2.6048070446668463
16 0.012001541761243502 2.61680858642809
17 0.010766532955768914 2.627575119383859
18 0.009713754228894943 2.637288873612754
19 0.008808912243031928 2.646097785855786
20 0.0080254388075724 2.6541232246633584
21 0.0073424884526321595 2.6614657131159905
22 0.006743517461381278 2.668209230577372
23 0.006215258825631778 2.6744244894030036
24 0.00574697171180243 2.680171461114806
25 0.005329884410165022 2.685501345524971
26 0.004956775717854674 2.690458121242826
27 0.004621656739671887 2.6950797779824978
28 0.004319526457507245 2.699399304440005
29 0.004046182127856039 2.703445486567861
30 0.0037980708729921666 2.707243557440853
31 0.003572172534322336 2.7108157299751756
32 0.003365906473802856 2.7141816364489784
33 0.0031770568814350818 2.7173586933304135
34 0.0030037125008642515 2.7203624058312776
35 0.0028442176745825517 2.72320662350586
36 0.0026971323403770105 2.725903755846237
37 0.002561199154372839 2.72846495500061
38 0.0024353163243945627 2.7309002713250043
39 0.002318515046580586 2.733218786371585
40 0.0022099406741146227 2.7354287270457
41 0.002108836928254842 2.7375375639739548
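
For completeness, here is a self-contained version of the one-liner that reproduces both columns above (main and fracs are just my names):

-- Prints index, conjectured surviving fraction, and running cumulative sum.
main :: IO ()
main = mapM_ putStrLn
  [ unwords [show i, show r, show c]
  | (i, r, c) <- zip3 [0 :: Int ..] fracs (scanl1 (+) fracs) ]
  where
    fs a b = a * b : fs b (1 - exp (-b))
    fracs  = take 42 (fs (1 :: Double) 1)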