Cuckatoo32 feasibility

I’m ensuring that there are pins in the first place, i.e. that you cannot build a single-chip ASIC. This is discussed at length in https://forum.grin.mw/t/scheduled-pow-upgrades-proposal

No; you can just increment PART_BITS, which doubles the number of passes you make over the remaining edges in each round.

You will, however, need to provide enough DRAM to store the bigger edge bitmap, which is similar to ethash miners having to account for an eventual doubling of the DAG size.
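
To make the trade-off concrete, here is a minimal sketch, in Haskell since that is the language used later in this thread (the real lean miner uses bit arrays in C/CUDA; the names trimRound, h, nodeBits, and partBits here are mine): each trimming round is split into 2^partBits passes, every pass re-hashing all live edges but tallying only the endpoints falling in its own node partition, so counter memory shrinks 2^partBits-fold at the cost of 2^partBits times the hashing.

import qualified Data.Map.Strict as M
import Data.Bits (shiftL, shiftR)

-- Toy lean trimming round (hypothetical names; h stands in for siphash).
-- Edges whose endpoint has degree <= 1 after counting are trimmed.
trimRound :: Int -> Int -> (Int -> Int) -> [Int] -> [Int]
trimRound nodeBits partBits h live = filter ((> 1) . degree . h) live
  where
    partOf u  = u `shiftR` (nodeBits - partBits)
    -- one pass per partition: re-hash every live edge, but count only
    -- endpoints falling in partition p
    onePass p = M.fromListWith (+)
                  [ (u, 1 :: Int) | e <- live, let u = h e, partOf u == p ]
    counts    = M.unions [ onePass p | p <- [0 .. (1 `shiftL` partBits) - 1] ]
    degree u  = M.findWithDefault 0 u counts

Incrementing partBits leaves the output unchanged but doubles the number of passes, i.e. doubles the siphash work per round.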

But… I thought you were being ASIC-friendly? Which is it!? Why not allow single-chip ASICs? Why even bother with the need for DRAM? What is the point?

For a lean miner, this doubles the power AND cuts the speed in half, which is effectively a 400% performance hit. Moving from S7s to S9s was a “massive” 250% efficiency boost, yet S7s were useless within months. Doubling the graph size is a much stronger effect than even that massive S7-to-S9 jump, and the only thing it achieves is this:

(image: https://www.trustnodes.com/wp-content/uploads/2018/11/bitcoin-asics-dumped-on-street-nov-2018.png)

The main point is, why bother? What good reason is there to double the graph size? Bandwidth arguments make no sense. Don’t accelerate the trashing of old hardware.

That was answered in the topic I linked to. Please address the arguments in that thread if you disagree.

Your math is off. You use the same power for a 1x-2x longer time.
On cuckatoo_lean_cuda_29, for instance, it’s 1.3x slower.
Its performance hit is under 2x.

I already addressed this above:

How is it off? You must do 2x the number of siphashes. That’s 2x power. Please explain.

Lean Miner’s 1.3x must be a peculiarity of GPUs… Honestly, I haven’t studied your lean code, but the 1.3x must be due to one or more of these effects:

  • Lean-GPU is actually bound by atomic writes to shared memory and/or by the ratio of shared memory to compute units for a given threadblock, i.e. it’s not bound by computation
  • Lean-GPU is caching siphashes in DRAM
  • Lean-GPU is bound by instruction latency, and doubling the threadblock count improves pipeline efficiency

You cannot run Lean Miner on a GPU and expect the results to hold for an ASIC. Apples and oranges.

“We’re going to trash them anyway, so let’s just throw them away faster…”

No; it’s 2x the energy. So you take twice as long at the same energy/second.
Maybe I should have said your physics is off?!
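
In symbols (notation mine: \(\varepsilon\) = energy per siphash, \(H\) = siphashes per graph, \(P\) = chip power):

\[ E_{\mathrm{graph}} = \varepsilon H, \qquad t_{\mathrm{graph}} = \frac{E_{\mathrm{graph}}}{P} = \frac{\varepsilon H}{P} \]

\[ H \mapsto 2H \;\Longrightarrow\; E_{\mathrm{graph}} \mapsto 2E_{\mathrm{graph}}, \quad t_{\mathrm{graph}} \mapsto 2t_{\mathrm{graph}} \]

Twice the energy and twice the time per graph at the same power: a 2x efficiency loss, not 4x.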

I’m not; I’m just pointing out that it can’t be more than a 2x loss in efficiency.

Ok, sorry, I was wrong about 4x. On a Joules-per-hash basis it’s 2x. I fooled myself in my eagerness to make a point. It will be very nearly 2x, which is still disastrous.

Any factor above 1x makes no sense to me. Why accelerate the trashing of hardware at all? I do not understand any reason to enforce multi-chip over single-chip ASICs… but that’s the other thread.

John, thanks for the table. As you have correctly pointed out, for i = 1 the remaining fraction is (1 - 1/e); call this T. It looks like for i = 4 the remaining fraction is T/(2e); is this correct? Any other closed-form results for i > 1? Thx.

I suspect not.

After 2 rounds, the fraction is a*b, where a = 1-e^-1, and b = 1-e^-a.
After 3 rounds, the fraction is b*c, where c = 1-e^-b.
I can formally derive these.

If we blindly continue the pattern, we get
d = 1-e^-c
c*d ~ 0.1167435
e_ = 1-e^-d (writing e_ to avoid a clash with Euler’s e)
d*e_ ~ 0.0836613
f = 1-e^-e_
e_*f ~ 0.0630385
g = 1-e^-f
f*g ~ 0.04927553
etc.

which all closely match empirical results. But I have yet to find the formal proof for this pattern.
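
Written as a recurrence (notation mine), the conjectured pattern is

\[ f_0 = 1, \qquad f_{i+1} = 1 - e^{-f_i}, \qquad r_i = f_{i-1}\, f_i \quad (i \ge 1) \]

so r_1 = 1 - e^{-1} = a, r_2 = ab, r_3 = bc, and so on; this is exactly what the Haskell expression below computes.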

Here are the conjectured fractions, as computed by the Haskell expression

let fs a b = a * b : fs b (1 - exp(-b)) in fs 1 1

together with the cumulative values (columns: round number, fraction remaining, cumulative sum):

0 1.0 1.0
1 0.6321205588285577 1.6321205588285577
2 0.29617148759453016 1.928292046423088
3 0.17527117460580915 2.1035632210288973
4 0.1167434973394182 2.2203067183683154
5 0.08366133488711419 2.3039680532554296
6 0.06303852419193594 2.3670065774473654
7 0.049275532992528674 2.416282110439894
8 0.039615078308885755 2.4558971887487795
9 0.03256468342107633 2.488461872169856
10 0.027256743026040878 2.515718615195897
11 0.023157879639198638 2.5388764948350957
12 0.019925041905058123 2.558801536740154
13 0.017329210022888588 2.5761307467630425
14 0.015212625148852563 2.5913433719118952
15 0.013463672754950931 2.6048070446668463
16 0.012001541761243502 2.61680858642809
17 0.010766532955768914 2.627575119383859
18 0.009713754228894943 2.637288873612754
19 0.008808912243031928 2.646097785855786
20 0.0080254388075724 2.6541232246633584
21 0.0073424884526321595 2.6614657131159905
22 0.006743517461381278 2.668209230577372
23 0.006215258825631778 2.6744244894030036
24 0.00574697171180243 2.680171461114806
25 0.005329884410165022 2.685501345524971
26 0.004956775717854674 2.690458121242826
27 0.004621656739671887 2.6950797779824978
28 0.004319526457507245 2.699399304440005
29 0.004046182127856039 2.703445486567861
30 0.0037980708729921666 2.707243557440853
31 0.003572172534322336 2.7108157299751756
32 0.003365906473802856 2.7141816364489784
33 0.0031770568814350818 2.7173586933304135
34 0.0030037125008642515 2.7203624058312776
35 0.0028442176745825517 2.72320662350586
36 0.0026971323403770105 2.725903755846237
37 0.002561199154372839 2.72846495500061
38 0.0024353163243945627 2.7309002713250043
39 0.002318515046580586 2.733218786371585
40 0.0022099406741146227 2.7354287270457
41 0.002108836928254842 2.7375375639739548
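
For completeness, here is a self-contained version of the one-liner that reproduces both columns above (main and fracs are just my names):

-- Prints index, conjectured surviving fraction, and running cumulative sum.
main :: IO ()
main = mapM_ putStrLn
  [ unwords [show i, show r, show c]
  | (i, r, c) <- zip3 [0 :: Int ..] fracs (scanl1 (+) fracs) ]
  where
    fs a b = a * b : fs b (1 - exp (-b))
    fracs  = take 42 (fs (1 :: Double) 1)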