Does anyone know if it would be possible to make a PoW for which you could prove a lower bound on the number of transistors it must take?
I’m vaguely aware of how the formal math people prove that comparison-based sorting can’t beat Ω(n log n), as you can prove that the number of comparisons required grows with the size of the list; but could you extend that to transistor counts and then publish a design that is within a few percentage points of that lower limit?
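For reference, here is the standard counting argument behind that sorting bound (a textbook sketch, nothing specific to PoW):

```latex
% A comparison sort must distinguish all n! possible input orderings.
% Each comparison has two outcomes, so k comparisons can distinguish
% at most 2^k orderings:
2^k \ge n! \quad\Longrightarrow\quad k \ge \log_2 n! = \Omega(n \log n)
```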
The problem with that statement is that you can normally trade off transistors for time, e.g. use half as many transistors but take 5x as long.
One of the Cuckoo Cycle bounties supports the claim that you need at least 2^{N-1} bits of memory (and hence transistors) to solve cuckooN without an order of magnitude slowdown.
Equihash makes a stronger claim: that the slowdown for saving a factor k in memory is exponential in k. But neither of these claims has a formal proof.
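Stated symbolically (my notation; these are the conjectured tradeoffs as claimed, not proven theorems):

```latex
% Cuckoo Cycle claim: with T_full the solving time at full memory,
% using fewer than 2^{N-1} bits costs an order-of-magnitude slowdown:
M < 2^{N-1} \;\Longrightarrow\; T \gtrsim 10\, T_{\mathrm{full}}
% Equihash claim: cutting memory by a factor of k inflates time
% exponentially in k:
T(M/k) \approx 2^{\Theta(k)}\, T(M)
```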
The drug war failed to stop even consumable resources; this is capital that generates clean dark money using a resource that’s everywhere in modern society; so no.
This is an apples-and-oranges comparison. The issue isn’t whether large and state-sponsored actors can stop ASICs; the issue is whether they can control more than 50% of them, and how cheaply they can do so.
Centralized and state-sponsored actors already do control much more than 50% of the global drug trade. The US pharmaceutical industry alone is larger than the global illegal drug trade.
Assuming different ASICs for different coins of different legal status, I don’t think the comparison to big pharma is valid.
If it’s an arms race on a new problem, I don’t believe the state can necessarily compete tech-wise.
That state actors deal in the drug trade is not quite the same as the state itself playing; if something is intelligent and motivated, it will be “corrupt” and likely act selfishly.
Ok, well first of all, saying that a state or state-backed player couldn’t win a technology arms race against non-state-backed players on the development of a single type or small group of ASICs seems like a risky bet to me. It seems safest for privacy-focused coins to avoid testing this theory.
Secondly, even if you are right, ASICs still increase barriers to entry and increase the fixed costs associated with mining, which leads to more centralization. ASICs are literally a textbook definition of an oligopolistic market, and that seems worth preventing.
Initially using an algorithm which is unlikely to allow much advantage to ASICs seems like a good idea, because it will lead to a greater distribution of coins in the economy. Then switching to an Algorand-style Proof of Stake consensus mechanism would at least be an option.
I think you might be able to do a ZK range proof that a VRF output (using the transaction’s blinding factor as the secret key), times the transaction’s value, is less than some threshold. That would allow inferences about the value in that transaction, though, which could lead to conflicting incentives for users.
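Roughly, the statement proven in zero knowledge would look something like this (purely my sketch of the idea; r, v, m, and T are illustrative names):

```latex
% r: the transaction's blinding factor, reused as the VRF secret key
% m: some public message, v: the transaction's value, T: a threshold
\text{ZK-prove:}\quad \mathrm{VRF}_r(m) \cdot v < T
% Since the probability of satisfying the inequality depends on v,
% observers can draw inferences about v -- the leak noted above.
```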
It also hinges on the idea that the memory setup used by the GPU is optimal for the task at hand. For example, it’s well known that cuckoo has two different modes - a latency-bound mode and a bandwidth-bound mode. The bandwidth-bound mode uses something like 7x the memory of the latency-bound mode, which means that if you are building custom ASICs you can make substantial improvements over a GPU as long as you can get the latency low enough.
But beyond that, there are many tricks that ASICs can employ to manage memory that aren’t readily available in GPUs. For example, putting multiple chips in the same package, or even stacking chips on top of each other. You’ve also got access to tools like embedded DRAM, which has substantially faster access times and lower energy costs than dedicated / separate DRAM.
One example optimization is that you can make specialized memory which automatically increments every time that it is accessed. This is very useful for cuckoo cycle, and it’s also not something you are ever going to find in general purpose hardware.
Homogeneous costs are actually not what you want. Economies of scale work very well in hardware because of the homogeneity - an optimization found in one spot will generalize to everything else. So you can set a large number of people on optimizations, and their optimizations will all apply globally, meaning the largest team churning out the most optimizations is going to be able to create a consistent advantage over everyone else, and even be able to use that advantage to increase their team size and further extend their advantage. This is why we only have 4 major foundries, and really only one company behind the scenes that knows how to make EUV machines at all.
Electricity, on the other hand: many of the optimizations and deal flows that make various datacenters possible happen through techniques, friendships, networks, and other skills that don’t apply to any other datacenter in the world. Finding cheap electricity and building low-cost datacenters is not a skill that you can scale easily, meaning that the best farms in the world tend to be smaller groups, and that there isn’t any particular group going around and dominating everyone the way that Bitmain and Innosilicon are in the hardware world. Mining farms are actually reasonably decentralized to the extent that hardware is easily accessible.
As we’ve seen with the new Innosilicon Zcash miner, I would not put much faith in the ability of the 2x and 8x numbers to hold. Equihash just gained another 5x (now it’s at 40x), and I’m confident we’ll see something similar out of Innosilicon or another group for Ethash by mid-next year. At this point we’re not even confident Bitmain was using application-specific hardware for Ethash and Equihash; a first teardown of the boards makes it seem possible that they are actually just throwing their more general-purpose AI chip at Ethash and Equihash and selling that.
We’ll know more in the next few months.
That’s certainly the case. Who is most able to adapt quickly to a sudden change in the PoW algorithm? A team of 10 chip devs making their first or second product (like Obelisk), or a team of 150 chip devs making their 7th-generation product (like Bitmain)? Who is more able to swallow $5 million in completely lost R&D because a cryptocurrency suddenly changed its algorithm at the last minute? A small company where $5 million is enough to kill them (like Obelisk), or a big company sitting on an estimated >$1 billion in the bank (like Bitmain)?
Agility and risk tolerance are features of large chip companies with many developers, aggressive rollout strategies, and refined product-creation pipelines.
Hardforks will not deter well-funded companies like Bitmain and Innosilicon. They make enough money off of the wins to eat losses left and right. A single win could pay for as many as ten losses, so the odds really are in favor of just aggressively making ASICs with just enough flexibility. But that means you need enough capital to eat many losses in a row before you get a win, and only big companies have that kind of capital.
Yes, if you assume that ASICs are going to happen anyway, your absolute best bet at decentralization is to broadcast your intentions clearly in advance. And actually, your best chance might be to do some variant of the Obelisk Launchpad (note: Obelisk is my company and Obelisk Launchpad is my product). Launchpad itself is aimed at slightly smaller coins, where the block reward really isn’t enough to allow more than one manufacturer to succeed. Grin probably has enough hype and potential block reward that you could do a variant of Launchpad which involves multiple manufacturers getting to market at the same time.
Well, the big difference there is that you can make drugs with a relatively small operation. 7nm ASICs require a $10+ billion facility to manufacture, which means you aren’t going to have any underground groups manufacturing 7nm ASICs. Maybe 1000nm ASICs, that might be in reach, but not modern-day competitive ASIC technology. This is in sharp contrast to weed, lsd, heroin, etc.
There are more sha256d ASIC manufacturers in the world than there are GPU manufacturers. Even more importantly, competition in sha256d manufacturing means that the margins are actually pretty low, which is very different from GPU margins.
If a GPU manufacturer decided to start mining cryptocurrencies with their GPUs, they would have a sharp competitive advantage over everyone else because they don’t need to pay their own margins. And, if they so chose, they could also easily strip off all of the features and memory bits that aren’t explicitly required for certain algorithms, further extending their edge. Unlike with sha256d ASICs, competitors to these GPUs are going to require hundreds of millions in R&D, and will require some of the most expert and sophisticated designers in the world. You will be going up against companies who have spent decades building moats and entrenching themselves, ensuring it is difficult to compete with them.
As cryptocurrencies continue to grow, centralization risk from GPUs basically amounts to a single GPU manufacturer deciding to mine directly instead of selling cards to other miners. They would compete extremely effectively, and it’s not what you want long-term for the industry.
We haven’t finished yet, but we’re working on a patent license which may help. I’m generally not a fan of viral licenses in the open source world (all of my own projects are MIT licensed); however, that’s because software is not typically a zero-sum game. That is to say, if someone takes my innovations and builds on them in private, the stuff I built still has value on its own. That’s not true in the mining world. Because mining is zero-sum, if someone takes a hardware optimization you released and builds upon it privately to get a more efficient miner, they actually reduce the value of your miner, because now you are competing with more powerful opponents for the same sized pie.
So, I believe that in order for open mining hardware to succeed, we really should look into viral patent schemes. But I also know that the BDPL has been broadly rejected by the hardware industry, because it is really too general and too threatening to the standard business models of hardware companies and design houses. Hardware developers complain that the BDPL essentially requires that they cannibalize their entire business and life’s work just to produce a single chip; it’s a non-starter for them.
So we’re actively working on a different license that hardware developers are more comfortable with. The result is likely to be per-product - that is, if you use any patent from the patent pool for a particular product, all patents related to that product must also be contributed to the patent pool. But you can also have another product which avoids the patent pool, and that product (from the same company) can have special / unique patents protecting it.
Perhaps the most aggressive idea I had though was patenting the PoW algorithm itself under this patent pool. For example, Grin could invent a new proof-of-work algorithm, and then stick it in the patent pool such that any person mining on that algorithm has to contribute every patent and every piece of software related to mining that algorithm to the patent pool. Though this won’t deter secret mining, from a legal perspective it forces all public miners and all public mining code to be open and available for anyone to copy or use.
I think it would go a long way towards blowing open the hardware world.
Interesting, can you elaborate on what that would look like? And thanks for chiming in!
I’m likely missing a lot of what you have in mind, but at face value this seems difficult as most algorithms can’t be patented, at least under U.S. and E.U. patent law. Their implementation can be copyrighted and particular methods of implementation, like a given optimization, could potentially be patented. So while there is prior art, like OIN, it’s usually aimed at protecting specific software (Linux).
Can you share more info about this? I’m not necessarily disputing it, I just want to understand the situation a bit better as it’s not very clear to me. For example, doing a quick search I found this quote from the European Patent Office (page 3, emphasis mine):
Under the EPC [European Patent Convention], a computer program claimed “as such” is not a patentable invention (Article 52(2)(c) and (3) EPC). Patents are not granted merely for program listings. Program listings as such are protected by copyright. For a patent to be granted for a computer-implemented invention, a technical problem has to be solved in a novel and non-obvious manner.
Similarly, if the USPTO granted a US patent on Schnorr, what would invalidate a Proof of Work algorithm seeking a similar type of protection?
Coin developers work with Obelisk to produce a new, ASIC-friendly PoW algorithm that is kept secret until the launch of the coin;
Obelisk produces ASICs based on the secret PoW, which can then be distributed to the community, in an attempt to ensure a certain degree of decentralisation;
Upon launch, the PoW is announced and the ASIC design is open-sourced;
At genesis, the coin is protected by community mining at an advantage over other actors, which is estimated to give a roughly 4-month head start.
The bandwidth-bound mode uses something like 7x the memory of the latency-bound mode, which means that if you are building custom ASICs you can make substantial improvements over a GPU as long as you can get the latency low enough.
It’s 11x more. I expect ASICs are going to prefer lean solving with SRAM. And they’re indeed going to leave GPUs in the dust. Btw, stacking RAM chips on top of ASICs is already being done with HBM(2)-equipped GPUs.
you can make specialized memory which automatically increments
That’s what the SRAM is used for in lean mining: it’s a huge vector of 2-bit counters. The counters only need to count up to 2 though, so the one possible optimization is to implement an SRAM of trits rather than bits.
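As an illustration of those counter semantics (a minimal software sketch only; in hardware the increment would be wired into the SRAM itself, and names like `Counters` are mine):

```rust
/// A packed vector of 2-bit saturating counters, as used for edge
/// trimming in lean Cuckoo Cycle mining: each node only needs to
/// distinguish "touched 0, 1, or 2+ times".
struct Counters {
    words: Vec<u64>, // 32 two-bit counters per u64
}

impl Counters {
    fn new(n: usize) -> Self {
        Counters { words: vec![0u64; (n + 31) / 32] }
    }

    /// Increment counter `i`, saturating at 2.
    fn bump(&mut self, i: usize) {
        let (w, s) = (i / 32, 2 * (i % 32));
        let c = (self.words[w] >> s) & 3;
        if c < 2 {
            self.words[w] += 1 << s; // no overflow: c is 0 or 1
        }
    }

    /// True once counter `i` has reached 2 (the node keeps its edges).
    fn saturated(&self, i: usize) -> bool {
        let (w, s) = (i / 32, 2 * (i % 32));
        ((self.words[w] >> s) & 3) == 2
    }
}
```

Only three of the four states a 2-bit cell can hold are ever used, which is exactly why a trit cell would suffice.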
throwing their more general-purpose AI chip at Ethash and Equihash
That makes little sense to me, since the Sophon has only 32MB of memory and tons of 32-bit floating-point circuitry for tensor calculations, which is all wasted on PoW.
Z9 teardowns show a BM1740 chip, while the Sophon is designated BM1680.
They are only similar in having significant amounts of on-chip memory and a bunch of pipelines for processing the data held in that memory; but the balance tips much more toward computation on the Sophon.
Yeah. The difficult thing is that if you don’t keep the core algorithm secret, you have no control over bigger players coming to market at the same time, and you have no control over how much hardware they are going to produce or whether it might be better than the hardware that dedicated manufacturers are going to produce. The scariest bit is the technology choice. A $10m Launchpad run is going to get you 22nm hardware. If we broadcasted the algorithm ahead of time, there would be a high risk of Innosilicon or Bitmain stepping in and doing a much bigger run of 10nm ASICs. 22nm hardware is not going to compete at all with 10nm hardware. The difference is going to be approximately the difference between the S5 and the S9.
So if you are thinking about opening up the algorithm, you minimally need to get the tooling that’s going to be competitive, which is 16nm or better. That’s going to balloon the cost though. Tooling is going to be 2x-3x as expensive, and if you want to go all the way to 7nm the total cost is going to be closer to 5x (that is, a Launchpad engagement will be closer to $50 million, not $10 million).
And then you run into lead-time problems. From what I understand, the ecosystem is pretty supply-shocked today, and lead times are very long (6+ months) to get chips at any of the competitive nodes unless you’ve already got access. Bitmain and Innosilicon probably wouldn’t have much trouble pointing some of their existing allocation at a Grin chip, but any startup manufacturer is going to need more lead time, and enough capital to convince a foundry to take them seriously.
The big thing here is that for every manufacturer you want to add, you have to add another multiplier to the cost of bringing a product to market, unless the manufacturers have some group agreement to all use the same tooling and chip production capacity. That’s actually what I would try to steer Grin towards - once you have chips, the cost of developing an efficient unit + enclosure is under $1 million (if you are efficient), and then you just need manufacturing dollars.
Financing would probably look something like a token sale. Basically, access to early ASICs is more or less equivalent to access to early tokens, and for Grin I’m pretty sure a large number of investors would be interested in coming forward to get those machines manufactured, whether for the purpose of selling them to consumers when Grin launches or for the purpose of mining themselves.
So that’s a lot to digest. The thing to remember is that if you don’t do some sort of coordinated hardware launch, you’re going to get a chaotic mess like Decred and Sia saw, where very likely a single party is first to market, and very likely one manufacturer had the financial capability of going for a node like 12nm or 10nm while everyone else was stuck at something cheaper. And likely, the manufacturer that went for the more aggressive node also got to market faster, because they’ve done it lots of times and already had an allocation of wafers that they just re-pointed at Grin, instead of having to wait out the lead times that everyone else would have had to go through to get access to wafers.
What’s the launch schedule for Grin?
Well, that’s a pretty loose guideline; we’ve seen some pretty egregious algorithm patents get granted (Netflix owns a few for applying machine learning to movie selection, if I’m remembering my college lectures correctly). But even if we can’t patent the algorithm itself, we can patent a bunch of optimizations like ASICBOOST, or other ways to improve on the implementation of whatever algorithm gets chosen. That effectively makes it so that you can only have a competitive ASIC if you’ve got a license for the patents that cover all of the critical optimizations.
Then you release those patents under some sort of (carefully chosen! BDPL missed the mark here) viral license that requires all additional patents / optimizations used in the product to be released as well, and you get some heavy leverage to open the industry, at least as far as public miners are concerned. (I’m sure you would get some secret miners in violation of the patents, so you’d need to make sure that your patents + license are generous / loose enough that the public miners can be competitive with secret ones - which I think is more than possible.)
Well, that’s the goal. The really big thing we emphasize to smaller coins (again, not sure Grin + Zcash fit this ‘smaller’ descriptor) is that if they don’t use some sort of protected scheme, they are very likely to get a single, centralized, uncontrolled, first-to-market situation anyway with either Bitmain or Innosilicon. From what we can tell, Innosilicon has had absolutely no qualms mining on >50% of the hashrate while selling miners at outrageous markup to consumers, but when they’ve got the monopoly position, there’s not much you can do about it.
Glad we finally see eye to eye on this
Indeed. Though, it turns out that developing something like a trinary SRAM cell on a traditional CMOS process is not a simple task, and probably not one you can complete in a typical hardware cycle. It’s also probably not something a mediocre team can pull off well; from what I’ve gathered, it’d be an optimization that’s only accessible to better-funded and more elite teams.
And actually, the same goes for a 2-bit cell. Foundry libraries typically don’t go below an 8-bit cell, which means you have to roll your own SRAM standard library if you want to do actual 2-bit reads and writes instead of 8-bit reads and writes.
Combine that with all of the yield complications that you see when trying to put so much SRAM onto a single chip, and you get a situation where well-funded expert teams making a Grin chip are probably going to come out quite a bit ahead of less experienced teams, even at the same technology level. It basically exacerbates the problems we already have, where smaller teams can’t easily access 16nm, 12nm, or 10nm technologies, and even if they can, they don’t have the order quantity to get the same pricing that Bitmain can get.
You may be right. It was suggested to me that the BM1740 was a failed Sophon chip, but it’s quite possible that Bitmain just had a much worse hardware implementation of Equihash than what Innosilicon had. Based on the ‘Z9’ name though… I think that their Equihash miner was an analogous design at 16nm.
@Taek
Great post, it is nice to hear from a professional chip manufacturer. While I feel that all your points are valid, I also feel like your perspective is based on your assumptions around the costs and benefits of ASIC vs ASIC-resistant algorithms. I would love to hear what criteria you think make a good PoW mining ecosystem.
I also noticed that you seem to talk as though it is impossible to avoid an ASIC being developed pre-launch, and that it is inevitable that an ASIC manufacturer will control the mining on launch. Since this is your area of expertise, I would like to hear more about this. If that is true of Cuckoo Cycle, it would certainly affect my view of the algorithm.
I totally agree with your point that ASIC manufacturing is way more decentralized than GPU manufacturing, but I don’t agree that this translates to more decentralization in mining. In fact, I think that it misses both the point of ASIC resistance and why GPUs are more decentralized.
First off, as you say, GPU manufacturers have higher margins than ASIC manufacturers. Maintaining these margins is exactly why GPU manufacturers would never get into mining. GPU manufacturers mining would be like steel manufacturers getting into construction just because it is the largest use of steel. The GPU industry is, by definition, larger than the mining industry, which, given the margins, translates to a larger total profit in selling GPUs than in mining on GPUs (assuming that they could maintain the same barriers to entry).
To elaborate on this, GPU manufacturers (AMD/NVIDIA et al.) rely on their scale and competitiveness to keep potential competitors out; if they let the market for GPUs become too lucrative, or reduce the R&D costs associated with entering the market, other large players will enter and push their margins down. This is reflected in the massive R&D budgets and constant flow of new-generation GPUs, which represent a steady stream of incremental improvements; this is how these firms keep competitors out. If a GPU firm were to fail to meet market demand for new chipsets, a new firm could easily start making old chips at incredibly low cost (after all, their R&D costs would just be reverse engineering).
That is to say, if the market for GPUs increased substantially, the returns to manufacturers would also increase substantially as long as they were able to meet demand; but if they fail to meet demand and prices go up for too long, new entrants will enter the market. This translates to an intrinsic cost GPU manufacturers would associate with mining.
The need to keep prices low, keep R&D costs high, and meet market demand is reflected in how AMD and NVIDIA keep coming out with new GPUs that they can’t even produce (due to lack of supplies) while they are still resistant to allowing prices to rise, even while they can’t manufacture enough old-generation GPUs to meet demand.
All that said, we may fundamentally be judging PoW algorithms through different lenses. I believe a PoW algorithm is more valuable the more it lowers the barriers to entry for new entrants to the ecosystem, as this will inherently create a more decentralized ecosystem. Thus what is really important is not just ASIC vs GPU, but rather general compute vs application-specific compute power. I feel that the community behind any given coin should be ready and willing to adopt algorithms that ensure general-purpose computers can mine at or above their marginal costs. This could, and in my opinion should, translate to moving away from GPU mining if control of GPUs ever became too centralized.
TL;DR - High margins for GPU chip manufacturers are what keep them from mining, since the ROI on mining is by definition lower than the ROI on making and selling GPUs; the demand for GPUs (see market price) is driven by the profits from mining plus the profits from everything else done with them.
I have written both GPU miners and ASICs for multiple PoWs, and in my not-so-humble opinion a PoW needs to be simple and well-proven, like SHA-256. Trying to create a new PoW, especially without mathematical proof behind it, is foolish. ASICs are simply inevitable for any coin of sufficient value, and using a PoW which is complex and unproven only opens the door to secret optimizations and proprietary implementations. CryptoNight, Ethash, Equihash… they have all suffered from unforeseen optimizations and ASIC implementations. It’s easy to point the finger at Bitmain, but what about the monopolistic GPU farms that have private miners faster than yours? Why isn’t anyone worried about that?