Grin Improvement Proposal 1: put later phase outs on hold and rephrase primary PoW commitment

I hereby propose to accept Pull Request https://github.com/mimblewimble/grin/pull/2714
which preserves the planned phase out of C31 but puts all later phase outs (C32 and beyond) on hold, and to replace the term “foreseeable future” in our primary PoW commitment at https://forum.grin.mw/t/cuckatoo31-im-mutability with the more specific “next 18 months”.

[20190625 UPDATE] The proposal was accepted in today’s dev meeting: https://github.com/mimblewimble/grin-pm/blob/master/notes/20190625-meeting-development.md#decision-putting-phaseouts-on-hold

[20191118 UPDATE] To clarify, by “putting later phaseouts on hold”, I mean on hold indefinitely. They are effectively off the table. The plan is to stick with C32+, once C31 phaseout is complete.

Background:

The graph size upgrades were introduced in https://forum.grin.mw/t/scheduled-pow-upgrades-proposal as a means to resist single chip ASICs, which were deemed “not meaningfully different from pure computational ones like Bitcoin’s sha256”.
At the time, Grin was planning to use Cuckatoo32+ as its primary PoW. Cuckatoo32 requires 512MB of DRAM and 512MB of SRAM for efficient operation, an amount considered completely infeasible to put on a single chip at that time.
A while later, as argued for in https://forum.grin.mw/t/cuckatoo32-feasibility, the primary PoW was downsized to Cuckatoo31+. It was thought that even if putting the required 256 MB DRAM + 256 MB SRAM on a single C31 chip was feasible, it would make little economic sense, with the expected poor yield and limited window for ROI.

All of this was cast into serious doubt when Obelisk posted details of their GRN1 miner in https://forum.grin.mw/t/obelisk-grn1-chip-details, revealing simulation data for a single chip C31 ASIC as well as planned single chip ASICs for C32, C33, and C34. David Vorick also provided interesting data on expected heat density, showing that such chips run significantly cooler than computation-focused ones.

Motivation:

We need to re-evaluate our future PoW plans in light of

  1. single chip Cuckoo ASICs being meaningfully different from purely computational ones
  2. the likelihood of single chip ASICs outperforming multi-chip ones for the next 2-8 years

We should wait for both single chip and multi chip ASICs for C31 and/or C32 to be produced and to compete, so we can judge whether yields are reasonable and whether the performance advantage makes up for the higher die costs.
This could lead us to conclude that further phase outs are undesirable.

If we maintain our commitment to the C32 phaseout, and have ASIC manufacturers make significant investments into C33 miners, then we lose the option of freezing the primary PoW at 32 edge bits, which is a very natural size that’s easier to design circuits for. Also, there’s something nice about the memory requirement being a round 1 GB. I’d rather not decide later that single chip ASICs are best after all, only to end up with them at the less natural 33 bits and a more awkward 2 GB.
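For reference, here is a rough sketch (mine, not from the original proposal) of how the memory requirement doubles with each extra edge bit, assuming, as in the background above, that an efficient Cuckatoo-N solver needs roughly 2^N bits of DRAM plus 2^N bits of SRAM:

```python
# Back-of-envelope memory requirement per Cuckatoo graph size.
# Assumption (taken from the background above): an efficient Cuckatoo-N
# solver needs about 2^N bits of DRAM plus 2^N bits of SRAM.
def cuckatoo_memory_bytes(edge_bits: int) -> int:
    bits_per_memory = 2 ** edge_bits        # one bit per edge, per memory
    return 2 * bits_per_memory // 8         # DRAM + SRAM, in bytes

for n in range(31, 35):
    gib = cuckatoo_memory_bytes(n) / 2**30
    print(f"C{n}: {gib:.1f} GiB total")
# C31: 0.5 GiB, C32: 1.0 GiB, C33: 2.0 GiB, C34: 4.0 GiB
```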

I am in favor of this, but also have another question. In the original commitment post, it was suggested that the CC31 phase-out might be delayed if CC32 ASICs were not fast enough, but no formal speed requirement was set. I think we should clear that up as well, because slow ASICs exacerbate selfish mining and make the PoW less progress-free.

A natural requirement, to me, would be that CC32 ASICs can solve CC32 graphs faster than a 2080 Ti can solve CC31 graphs. That is, a single chip or instance of CC32 on a mining rig should be capable of at least 2 graphs per second.
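To put the proposed floor in context, a hedged back-of-envelope sketch: it only assumes Grin’s 60-second block target and the graph rates mentioned above (the 0.25 gps entry is a made-up slow instance for contrast). Fewer attempts per block means lumpier, less progress-free mining:

```python
# Rough illustration of how "progress-free" a solver is at a given graph
# rate: seconds per graph attempt versus Grin's 60-second block target.
# The 0.25 gps entry is a hypothetical slow instance, for contrast.
BLOCK_TIME_S = 60

for label, gps in [("2080 Ti on C31 (~2 gps)", 2.0),
                   ("proposed per-instance C32 floor (2 gps)", 2.0),
                   ("hypothetical slow C32 instance (0.25 gps)", 0.25)]:
    seconds_per_graph = 1.0 / gps
    fraction = seconds_per_graph / BLOCK_TIME_S
    print(f"{label}: {seconds_per_graph:.2f} s/graph "
          f"({fraction:.1%} of a block)")
```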

Does that sound reasonable?


Quoting from the cuckatoo32-feasibility post above:

“When setting the ASIC friendly PoW to be cuckatoo32+, the expectation was that ASICs would be able to search a cuckatoo32 graph within one second.”

As long as single instances are solved within 1 second, we cannot make a convincing case for a delay. When manufacturers release specs for Cuckatoo miners, it would be desirable to include the rate for single instances, or at least a lower bound thereof.


This requirement of 2 graphs per second is rather low even by your own standard.

Per my calculation, your single chip approach with 1 GiB of SRAM will draw >50 W per chip. So 1000 W / 50 W = 20 chips for 200 graphs, or at least 10 graphs per chip/instance. So your proposed requirement seems to contradict your own spec. I think your plan of making a CC31-only miner is not a good plan, aside from any technical risks. It puts your buyers at rather high risk of not even recovering their principal investment. Both single-chip and multiple-chip approaches should try to achieve a balance between compatibility and performance. I believe that, in the best interest of your investors, Obelisk should take CC32 compatibility into account on the GRN1.
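For readers following along, the arithmetic behind this objection can be spelled out as below; the 1000 W rig budget, >50 W per chip, and 200 GPS rig target are the figures asserted in this post, not confirmed Obelisk specs:

```python
# The arithmetic behind the objection above, using the figures asserted in
# this post (1000 W rig budget, >50 W per chip, 200 GPS rig target);
# none of these are confirmed Obelisk specs.
rig_power_budget_w = 1000
power_per_chip_w = 50
rig_target_gps = 200

max_chips = rig_power_budget_w // power_per_chip_w   # 20 chips
required_gps_per_chip = rig_target_gps / max_chips   # 10.0 gps per chip
print(f"{max_chips} chips -> each chip must sustain "
      f"{required_gps_per_chip:.0f} gps, well above the proposed 2 gps floor")
```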
http://www.innosilicon.com/html/grin-miner/index.html
Here is a link to our solution. Not only is it CC31/32 compatible, the power consumption is only around 500 W and the graphs-per-second number will blow you away. I want to let you know that, instead of spending all the effort doing audits and crossing fingers, Innosilicon has an open ASIC model and welcomes collaboration from Obelisk and other members of the community. You can use our ASICs to build your machine, and we have a proven track record of delivering working ASICs on first silicon.

@asic_king,
Please provide all your specs. You come with unsubstantiated claims and half of the information. It doesn’t matter if you have lower power if your performance is lower than Obelisk’s.

Who cares if you support C32, if you are not competitive in C31? If you are not competitive with Obelisk in C31, then there’s a good chance someone could mine with Obelisk C31 miners and use the profits to buy whoever’s C32 is better. But show us the numbers so we can make an informed decision.

Btw, does your Grin miner use the same memory architecture as your Ethereum A10 miner? Whatever happened to those rigs? I don’t know of any mining farms that received their orders.

My friends and I, some of whom are former engineers from AMD and Nvidia, were anxiously awaiting your performance numbers. We have done the analysis on your potential memory and siphash bandwidth. We don’t think you will have any significant advantage over GPUs except for power, which is a similar proposition to your A10 Ethminer, i.e. not enough of an improvement over GPU rigs.

Hmmm… one cannot help but wonder if the following is happening: “shoot as many arrows as you can at your target (e.g. Obelisk)… it doesn’t matter if you hit or not… we can frighten him to death or no one goes near him”. Some of the stuff on the other thread is a prime example. Back in January, concerns about a whole paper-tiger phenomenon were brought up. Now it looks to be true. To Inno: publish the real specs, not stuff from thin air. What will it be for C31, what will it be for C32, and when? From the outside looking in, it seems every transparent step from Obelisk is followed by misdirection or incomplete statements. There is now an audience who understands memory controllers, latency and trade-offs in the semiconductor space, and we will help the Grin community understand what is going on. To Tromp: thank you for initiating this thread. The other thread started with the right intentions but has degenerated beyond most of our imagination. We can keep this thread relevant and on-point. Period. Full-stop.

“We can keep this thread relevant and on-point. Period. Full-stop.”

Do try not to tempt me; I was already biting my tongue about taking this thread wildly off-topic before you said anything.


Thank you for the clarification.

At the moment, there are no CC32 ASICs with published specifications. If the decision had to be made today, I do not believe it would be wise to count on a miner that has neither published specifications nor a published roadmap to tape-out.

Does it make sense to put a date by which a CC32 ASIC needs to be taped-out at the required specification (1 graph per second per instance) before committing to keeping the phase-out? Should there be a requirement that this ASIC be publicly available, or is a private / self-mining tape-out considered good enough?

Transparently, if Obelisk were to make a CC32 ASIC, our own tape-out target date would be January 2020, after the phase-out begins. CC31 still technically has life until around August 2020, and we believe that it’s quite viable all the way until April, so we believe that it makes more sense to spend more time optimizing and refining than it does to rush to have an ASIC shipping for CC32 in January.

Another route that could be taken, given that there is currently uncertainty about whether CC32 ASICs will exist (which is a completely separate matter from whether or not CC32 ASICs are able to exist; we certainly believe they are physically practical): does it make sense to stop the phase-out before reaching 0? What if, instead of weakening CC31 to the point where it has 0 weight, we bring CC31 down to something like half weight, such that a CC32 ASIC is superior in weight-per-computation even if it is making a full 2x TMTO? Then Grin is covered in the event that CC32 ASICs do not materialize, yet a clear preference and advantage is still given to manufacturers that choose to pursue CC32 compatibility.
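To illustrate the half-weight idea, here is a toy comparison; every number in it is hypothetical and chosen only for readability, and Grin’s actual consensus weighting is more involved than this:

```python
# Toy comparison of the "half weight" idea. All numbers are hypothetical
# and for illustration only; Grin's real consensus weighting is more
# involved than this.
def earnings_per_hour(graphs_per_hour: float, weight_per_graph: float) -> float:
    # a miner's share of the reward is proportional to
    # (graphs found) x (consensus weight per graph)
    return graphs_per_hour * weight_per_graph

c31_half_weight = earnings_per_hour(100, 0.5)  # CC31 chip kept at half weight
c32_with_tmto   = earnings_per_hour(50, 1.0)   # CC32 via 2x TMTO: half the rate
c32_native      = earnings_per_hour(100, 1.0)  # dedicated single-chip CC32

print(c31_half_weight, c32_with_tmto, c32_native)  # 50.0 50.0 100.0
# Even in the worst (2x TMTO) case, a CC32 design matches a half-weight CC31
# chip, and a native single-chip CC32 design clearly comes out ahead.
```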

Beyond publishing specs, can Innosilicon clarify whether their miner is able to do CC29? Based on what little information has been released - namely the power consumption - I believe that Inno has made a mean miner, which should be capable of Cuckaroo29 in addition to Cuckatoo31 and Cuckatoo32.

At this time we are advising all people considering a GRN1 purchase to assume that the miner will be phased out starting January. By April, the potency of CC31 will be about equal to a miner that’s doing CC32 with a 2x TMTO, or equal to a miner that has about half the speed. So generally speaking we are suggesting April as the likely point at which the miners will cease to be dominant.

We are absolutely not encouraging people to count on the phase-out of CC31 being delayed, as there is clearly a present intention among Grin developers to keep the phase-out in place. Though we disagree with it, we are fully happy to go along with the plan that was originally set out by the Grin developers. We are here to serve the community.

If we successfully move forward with a single-chip CC32 miner, we believe that each instance will be capable of much more than the now-determined 1 graph per second requirement. The concern is not whether a proposed chip is able to mine fast enough to minimize selfish mining; the concern is whether a chip that is actually taped out will be capable of speeds high enough to keep Grin mining progress-free.

That is one view. Obelisk’s view is that it is better to make a dominant miner for each algorithm, as opposed to a single miner that targets both. ASICs perform best when they laser in on a single task, and we believe that when specifications are published for any multi-algorithm miner, it will be clear that supporting multiple algorithms carries a massive performance penalty.

Though making two separate chips requires paying for two separate tape-outs, we believe that the Grin block reward is enough to justify both tape-outs, and that from a competitive standpoint, no chip capable of both algorithms will be remotely competitive to what Obelisk is building for each.

No; in order to live up to our commitment, with the proposed 18 month horizon, we cannot unilaterally make any changes to the phaseout of C31, which will be completed 18 months from now. For such changes to be accepted would require agreement from all affected parties, in addition to governance approval.

Again, that would require agreement from affected parties as a first step. Can you get Innosilicon to agree to that?

I have faith that C32 ASICs will materialize. In the unlikely case that they don’t, the slean mining approach introduced by Wilke Trei at Grin Amsterdam will allow a 2080 Ti to mine C32 at speeds close to 1gps.
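For intuition on why the mining approach matters here, a very rough memory sketch follows; the bytes-per-edge figures are loose assumptions for illustration only, not measurements of any particular miner or of Wilke Trei’s implementation:

```python
# Very rough memory sketch of why C32 pushes an 11 GB card toward a
# slean-style approach. The bytes-per-edge figures are loose, illustrative
# assumptions, not measurements of any particular miner.
GPU_MEMORY_GIB = 11  # e.g. a 2080 Ti

def working_set_gib(edge_bits: int, bytes_per_edge: float) -> float:
    return (2 ** edge_bits) * bytes_per_edge / 2**30

mean_c31  = working_set_gib(31, bytes_per_edge=4)    # mean-style edge lists
mean_c32  = working_set_gib(32, bytes_per_edge=4)
slean_c32 = working_set_gib(32, bytes_per_edge=0.5)  # bitmap-oriented, extra passes

print(f"mean C31  ~{mean_c31:.0f} GiB (fits in {GPU_MEMORY_GIB} GiB)")
print(f"mean C32  ~{mean_c32:.0f} GiB (does not fit)")
print(f"slean C32 ~{slean_c32:.1f} GiB (fits, at the cost of extra passes)")
```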

We have set the announcement date around 4/15. I can provide preliminary numbers but please bear in mind that many factors can affect the final outcome. To answer your question about competitiveness, currently we are looking at a lower bound around 100 GPS for 500W for CC31.

Innosilicon is against making changes to the CC31 phase-out plan without good cause, since we have devoted considerable design resources to building an ASIC that is CC31/32 compatible. We also tuned the compatibility to match the phase-out schedule of CC31. We optimized for one algorithm and then spent additional silicon to stay compatible with the other; there is a balance between the additional silicon spent for compatibility and its cost. Much effort has also gone into architecturally extending the first algorithm to support the second one. For CC32 I can’t provide a detailed spec right now, but the current estimated per-instance lower bound is >2 GPS. The tape-out is scheduled to take place in April, with an estimated machine shipment date around August.

@asic_king, do you think we are stupid people on this forum?

How the heck can you make external DDR memory more power efficient than SRAM? DRAM inherently uses more power than SRAM, since DRAM needs constant refresh, AND you will be using significantly more memory parts and ASICs than Obelisk. Could it be you made up numbers to be comparable to Obelisk, as I show below?

~5 W/gps for Inno’s DRAM-based Grin miner (500 W / 100 gps) vs ~5.3 W/gps for the SRAM-based GRN1 (800 W / 150 gps)
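Spelled out, that comparison looks like this (both figures are claims made in this thread, not verified specs):

```python
# The efficiency comparison spelled out, using the figures claimed in this
# thread (neither number is a verified spec).
claims = {
    "Innosilicon CC31 (claimed)": (500, 100),   # watts, graphs/second
    "Obelisk GRN1 CC31 (claimed)": (800, 150),
}
for name, (watts, gps) in claims.items():
    print(f"{name}: {watts / gps:.1f} W per gps")
# ~5.0 W/gps vs ~5.3 W/gps -- suspiciously close, which is the point here.
```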

btw, where can I order some of your used A10 miners? I need some Ethminers to replace my Bitmain E3.

@asic_king, 2 gps for CC32 doesn’t make any sense when you claim to achieve 100 gps for CC31?

Could it be that you are rushing an inferior product to market? You will not be competitive with Obelisk, who claims 200 gps in CC32. Or did you pull numbers out of the air like you did in your CC31 post above?

You will be marginally better than a 2080 Ti, which will mine CC32 at 1 gps as @tromp states in a post above, but you will be much more power hungry at 500 W. Sorry, at those specs you suck and I’ll stick with GPUs.

Are you using the same tactics here as with your A10 miner? Is Innosilicon trying to scare people away from Obelisk and then not release because your rig is marginal? Sounds just like your stillborn A10 Ethminer, which was not good enough to go into production despite Innosilicon’s announcement intended to stall sales of Bitmain’s E3 Ethereum miner.

The Grin community will suffer if that’s the game you are playing. No ASICs means the network will not scale. Nvidia and AMD won’t be able to make 2080s and Navi cheap enough, or in enough volume, to support Grin when AI is in high demand and their CFOs have been burnt by crypto.

2 GPS per instance for CC32 means per chip, not for the whole miner. You are mixing this number up with the 100 GPS per miner figure for CC31.

It is not black and white. You are thinking of an all-SRAM or all-DRAM design; instead, it is a combination of memories.

Good to see some specs being posted.

“To answer your question about competitiveness, currently we are looking at a lower bound around 100 GPS for 500W for CC31”

To the rig mfg/mining decision-makers that have contacted me: you are asking the right question(s). If you were provided specs privately (say, last month) and suddenly you see public specs that are VERY different, what could have changed in that short time frame? No apologies needed for interrupting my weekend. And your permission for me to elaborate here to help the Grin community is commendable.

Short answer: certainly not any architectural changes, and certainly no implementation changes if tape-out is April (unless it is April 2020).

Your other question to me, about how realistic a 1 GPS per 5 W performance-to-power ratio is with a traditional memory controller design, is also spot on. When a lot of data has to move off-chip and back, how can you do it with such low power and such high data rates?

Other elements are not passing any basic smell test (and I am sure many others have grokked this):

2080ti details are well documented. For example:

NV (and AMD) know how to design high-speed memory interfaces. A 2080 Ti can do 1-2 GPS on CC31, per some of the online postings.

So what is going on and what/who should you believe?

2080 Ti: 1-2 GPS, 260 W TDP, GDDR6, 352-bit bus.

Public Inno details: 100 GPS, 500 W, a lower bound with hints it can be even better, touting the superior memory I/O interfaces.

Private Inno specs: (under NDA with rig manufacturers/miners)

We can continue that dialog in private, and I will respect the NDAs in place. The observation that things are not adding up is accurate. The PPAC analysis will focus on the PP (power/performance); the AC (area & cost) we can touch on too. But public details are lacking on the type of external memory being used (e.g. GDDR6, LPDDRx, or even plain DDRx), so it would not be right for me to comment in the forums until they say it first.

@asic_king, why don’t you provide comparable numbers for your CC32 rig, instead of just the ASIC, as you did for CC31? It’s very confusing, especially since we don’t have all the information about your rig. That said, how many ASICs are in your rig and at what frequency do the chips run? Surely you have that information, since you provided CC31 performance and power estimates.

Power is fairly deterministic in this case because it’s mostly physics, so it’s pretty black and white.

You use some SRAM as a cache, but you will need to go off-chip for external memory access. You need to keep that memory refreshed, so it will use more power than embedded memory. You need lots of discrete memories to provide a wide data bus in order to get sufficient bandwidth, and you will need multiple ASICs to provide enough performance. Let me know the type of memory, your memory clock, bus width, and number of ASICs so we can verify your power claim.

Before further discussion, I want to emphasize again that for official information please refer to our website: http://www.innosilicon.com/html/grin-miner/index.html
Now, back to the topic of GPS and memory architecture. First of all, even though DRAM needs to be refreshed every few milliseconds, when it is running at full speed refresh is typically less than 10% of the total power consumption. Most of the energy is spent moving data from point A to point B. Let’s examine the access energy of two popular high-speed DRAMs: GDDR and HBM.

[Figure: access-energy comparison of GDDR vs HBM]

HBM is a few times more efficient than GDDR because it is connected to the core ASIC through an interposer. The energy to access a bit in HBM2 is about 4 pJ/bit; most of that is data movement, both on the DRAM die and from the base-layer die to the I/O pins. To visualize this, let’s look at a picture showing how data travels:

Currently HBM is still too expensive for mining ASICs. On the other hand, TSMC has announced the availability of Wafer-on-Wafer (WOW) technology, which allows you to stack two logic dies and connect them through TSVs. This is a cheaper alternative to HBM, and because there is no interposer it allows higher bandwidth. This opens up the possibility of building an SRAM chip and stacking it with your ASIC core.

In order to design an energy-efficient ASIC for Grin, one has the task of minimizing data movement. I will make a few observations here; they are not meant to disclose our particular method but to stimulate thinking. For example, if you go with a single-chip ASIC, you can’t treat 512 MiB of SRAM like one big memory chunk. The SRAM block on 16nm will be over 22 mm on each side, and completely random access to it is not efficient. This is an area where AI technology overlaps with Grin mining: ideas like processing in memory (PIM) have to be considered, where logic is mixed with SRAM to minimize power consumption and increase speed. Again, instead of focusing only on building high bandwidth, you should at the same time think about minimizing movement. This is true for both single- and multiple-chip designs.
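To make the data-movement point concrete, here is a back-of-envelope sketch using the ~4 pJ/bit HBM2 access energy quoted above; the trimming-round count and bits moved per edge are illustrative guesses, not a description of Innosilicon’s or anyone else’s design:

```python
# Back-of-envelope: power spent on memory traffic alone, given an access
# energy and an assumed amount of data moved per graph search. Both the
# rounds-of-trimming count and bits-touched-per-round below are
# illustrative guesses, not a description of any real miner.
PJ = 1e-12  # joules per picojoule

def memory_power_watts(bits_moved_per_graph: float,
                       energy_pj_per_bit: float,
                       graphs_per_second: float) -> float:
    return bits_moved_per_graph * energy_pj_per_bit * PJ * graphs_per_second

# Hypothetical C31 scenario: 40 trimming rounds, each touching ~2^31 edges
# at 8 bits moved per edge, at the ~4 pJ/bit quoted above for HBM2.
bits_per_graph = 40 * 2**31 * 8
print(memory_power_watts(bits_per_graph, energy_pj_per_bit=4.0,
                         graphs_per_second=1.0))
# ~2.7 W per graph/second on data movement alone in this toy scenario;
# more data movement or a less efficient memory scales this up linearly.
```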

Thank you for the contribution. As an FYI, here is the publication where the diagrams would have originated. Sharing so we know this is public information and that, again, nothing confidential is being posted.

Glad to see after 3-4 days, the thread is not being mis-directed.

At this point, there may be 3 ASIC makers that have invested resources into C31 or C32 designs. If anyone knows of others, let us know.

Incorrect: Linzhi will be releasing their Lavasnow Ethereum miner using HBM in June. I should note that Linzhi’s performance is 3x that of Inno’s A10 miner.

@asic_king, thank you for the memory lesson. I noticed you did not give Obelisk the benefit of the doubt in the previous threads when it comes to memory segmentation and optimization. I guess those dumbasses at epic who built 10+ generations of GPUs don’t know how to build memory controllers and high-performance, low-power ASICs.

Btw, you still didn’t answer how many ASICs are in your rig. Why the reluctance, when you have released your CC31 rig performance but have only provided CC32 at 2 gps for a single ASIC?

A few things that have been on my mind:

Innosilicon, as best we know, never shipped an Ethereum miner, though I do believe that its claims and performance did damage Bitmain’s sales. I know people who ordered them and never received them. I’ve heard other rumors saying that the Ethereum miner from Inno does exist, but that Inno only mined with it internally.

During the sale of Obelisk’s Decred miner, Obelisk had the most energy-efficient miner on the market, even ahead of the Innosilicon Decred miner. While the Obelisk sale was still going, engineers claiming to be working on the firmware for Inno’s Decred machine said that they had changed the firmware to boost the hashrate to 4.4 TH/s on only 1000 watts. This put the Inno machine very firmly ahead of the Obelisk machine in terms of cost and efficiency. Note: there were no official statements by Innosilicon, and the connection between Inno and the engineers making the claims was never officially confirmed. Similar to this conversation here, actually.

This claim transparently hurt Obelisk sales. Though there were no official numbers saying the Inno machine was better, the rumors were clearly impacting our sales. When the Inno Decred machine shipped, it was only doing 2.4 TH/s at 1000w. The 4.4 TH/s claims were a paper tiger, and not even from official sources, but the damage was done.

As it happened, Whatsminer had been making a 40 TH/s miner anyway. Neither the Obelisk nor the Inno machines were viable, so it ended up not mattering so much anyway. But this was not known at the time.


There’s another super important variable that isn’t being discussed by Innosilicon, has never been discussed by Innosilicon, and has been used dramatically to their advantage and their customers’ disadvantage at multiple points in the past: total production quantity.

Mining is a zero-sum game. The reward that exists gets split between the miners, so if more miners exist, the reward for each miner is smaller. Innosilicon has sold miners in the past at extremely high prices, even using phrases such as “you can’t put a discount on dollars”, while not disclosing production volumes, not disclosing the fact that they would be mining themselves at cost, and ultimately leaving their customers with substantial losses even though Innosilicon had the best miner.


As far as GIP1 goes though, it seems like everyone is on board with delaying the phase-out of CC32 indefinitely, and stopping at CC32. And it seems like delaying the phase-out of CC31 is out of the question, and that Grin is committed to moving to a 7nm-required algorithm for their cryptocurrency.