No; you can just increment PART_BITS, which doubles the number of passes you make over the remaining edges in each round. See
You will, however, need to provide enough DRAM to store the bigger edge bitmap, which is similar to Ethash miners having to account for an eventual doubling of the DAG size.
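To make the trade-off concrete, here is a rough, simplified sketch, not the actual miner code: sipnode() is a stand-in hash, the degree counters are byte-wide rather than 2-bit packed, and EDGEBITS is a toy size. It shows how PART_BITS splits the node space into partitions; each extra bit halves the counter memory a lean miner needs on-chip, at the cost of one more full pass of siphashes over the live edges per trimming round.

```cpp
// Rough sketch of one lean-trimming round with node-space partitioning.
// Not the actual miner code: sipnode() stands in for siphash-2-4, counters are
// byte-wide instead of 2-bit packed, and EDGEBITS is a toy size (cuckatoo29 uses 29).
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr uint32_t EDGEBITS  = 20;                 // toy size for the sketch
constexpr uint64_t NEDGES    = 1ULL << EDGEBITS;
constexpr uint32_t PART_BITS = 1;                  // +1 => half the counter memory, twice the passes
constexpr uint64_t NPARTS    = 1ULL << PART_BITS;
constexpr uint64_t PART_SIZE = NEDGES >> PART_BITS;

// stand-in for the siphash-derived edge endpoint; illustrative only
uint64_t sipnode(uint64_t edge, int uorv) {
  uint64_t x = 2 * edge + uorv;
  x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
  x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
  x ^= x >> 33;
  return x & (NEDGES - 1);                         // node index in [0, NEDGES)
}

// Kill edges whose uorv-side endpoint has degree < 2, one node partition at a time.
// Every partition costs a full pass of siphashes over all live edges.
void trim_round(std::vector<bool>& alive, int uorv) {
  std::vector<uint8_t> degree(PART_SIZE);
  for (uint64_t part = 0; part < NPARTS; part++) {
    std::fill(degree.begin(), degree.end(), 0);
    for (uint64_t e = 0; e < NEDGES; e++) {        // pass 1: count degrees in this partition
      if (!alive[e]) continue;
      uint64_t node = sipnode(e, uorv);
      if (node >> (EDGEBITS - PART_BITS) == part && degree[node % PART_SIZE] < 2)
        degree[node % PART_SIZE]++;
    }
    for (uint64_t e = 0; e < NEDGES; e++) {        // pass 2: kill degree-1 edges
      if (!alive[e]) continue;
      uint64_t node = sipnode(e, uorv);
      if (node >> (EDGEBITS - PART_BITS) == part && degree[node % PART_SIZE] < 2)
        alive[e] = false;
    }
  }
}

int main() {
  std::vector<bool> alive(NEDGES, true);
  for (int round = 0; round < 10; round++) {
    trim_round(alive, 0);                          // trim on U side
    trim_round(alive, 1);                          // trim on V side
  }
  printf("edges remaining: %zu of %llu\n",
         (size_t)std::count(alive.begin(), alive.end(), true),
         (unsigned long long)NEDGES);
}
```

With PART_BITS = 0 all node counters must fit on-chip at once; bumping it to 1 halves that footprint but doubles the siphash work per round, which is the slowdown discussed below.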
But… I thought you were being ASIC-friendly? Which is it!? Why not allow single-chip ASICs? Why even bother requiring DRAM? What is the point?
For a lean miner, this doubles the power AND cuts the speed in half, which is effectively a 400% performance hit. Moving from S7s to S9s was a “massive” 250% efficiency boost, yet S7s were useless within months. Doubling the graph size is a much stronger effect than even that massive S7-to-S9 jump, and the only thing it achieves is this:
The main point is, why bother? What good reason is there to double the graph size? Bandwidth arguments make no sense. Don’t accelerate the trashing of old hardware.
That was answered in the topic I linked to. Please address the arguments in that thread if you disagree.
Your math is off. You use the same power for a 1x-2x longer time.
On cuckatoo_lean_cuda_29, for instance, it’s 1.3x slower.
Its performance hit is under 2x.
How is it off? You must do 2x the number of siphashes. That’s 2x power. Please explain.
Lean Miner’s 1.3x must be a peculiarity of GPUs… Honestly, I haven’t studied your lean code, but the 1.3x must be due to one or more of these effects:
Lean-GPU is actually bound by atomic writes to shared memory and/or the ratio of shared memory to compute units for a given threadblock, i.e. it’s not bound by computation
Lean-GPU is caching siphashes in DRAM
Lean-GPU is bound by instruction latency and doubling the threadblock count improves pipeline efficiency
You cannot run Lean Miner on a GPU and expect the results to hold for an ASIC. Apples and oranges.
OK, sorry, I’m wrong about 4x. On a joules-per-hash basis, it’s 2x. I fooled myself in my eagerness to make a point. It will be very nearly 2x, which is still disastrous.
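Spelling out the arithmetic I should have done (treating a compute-bound lean ASIC as running at a fixed power P, which is an assumption): with twice the siphashes per graph, the time per graph goes from t to 2t, so

energy per graph = P * t  ->  P * 2t = 2 * (P * t)

i.e. 2x the joules per graph searched. The 4x figure would require the power to double along with the time, which it doesn’t.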
Any factor above 1x makes no sense to me. Why accelerate the trashing of hardware at all? I do not understand any reason to enforce multi-chip vs single-chip ASICs… which is the other thread.
John - thanks for the table. As you have correctly pointed out, for i = 1 the remaining fraction is (1 - 1/e); call this T. It looks like for i = 4 the remaining fraction is T/(2e); is this correct? Any other closed-form results for i > 1? Thx.
After 2 rounds, the fraction is a*b, where a = 1-e^-1, and b = 1-e^-a.
After 3 rounds, the fraction is b*c, where c = 1-e^-b.
I can formally derive these.
If we blindly continue the pattern, we get
d = 1-e^-c
c*d ~ 0.1167435
e_ = 1-e^-d
d*e_ ~ 0.0836613
f = 1-e^-e_
e_*f ~ 0.0630385
g = 1-e^-f
f*g ~ 0.04927553
etc.
which all closely match empirical results. But I have yet to find the formal proof for this pattern.
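For anyone who wants to check the pattern numerically, here is a minimal sketch assuming the recurrence above (x_1 = 1 - 1/e, x_{n+1} = 1 - e^-x_n, with the conjectured surviving fraction x_{n-1} * x_n after n >= 2 rounds); its output reproduces the values listed above.

```cpp
// Minimal check of the pattern above: x_1 = 1 - 1/e, x_{n+1} = 1 - e^-x_n,
// conjectured surviving edge fraction after n >= 2 trimming rounds is x_{n-1} * x_n.
#include <cmath>
#include <cstdio>

int main() {
  double x = 1.0 - std::exp(-1.0);            // a = 1 - 1/e, fraction after round 1
  printf("round 1: %.7f\n", x);
  for (int n = 2; n <= 7; n++) {
    double next = 1.0 - std::exp(-x);         // b, c, d, e_, f, g in the notation above
    printf("round %d: %.7f\n", n, x * next);  // a*b, b*c, c*d, ...
    x = next;
  }
}
```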