Advice on Cuckatoo hardware implementation

CouchStudyAttire · August 23, 2025, 11:40pm

Hello Grin community ! I am an electrical engineering student entering the final year of my bachelor program and I’m exploring topics for my final year project. I have interest in building a custom hardware for solving a SRAM demanding algorithm (Cuckatoo31).

From what I understand Grin now operates on Cuckatoo32 only, but the deepest knowledge on this puzzle still resides within this forum. I was hoping some of you could point me to the right resources or hint at how to form fit the hardware for optimal solve speed.

Basically from viewing some GrinCon Youtube vids and perusing the Grin forums, I’ve gathered the following :
-There were 2 companies trying to make ASICs for Grin, Innosilicon and Obelisk/ePic
-ePic jammed 512mB of SRAM into a single die, project did not ship.
-Innosilicon made an ASIC that was multi chip and multi algo(c31/c32) compatible
(Im guessing this is the chip in Ipollo miner?)

john Tromp mentionned that C32 is essentially proof of 512mb SRAM, (is C31 proof of 256mb SRAM?)
-these custom miners use the Lean solving algorithm and GPUs use the Mean mining algo.

I have limited knowledge and zero experience building custom compute solutions but am interested in exploring this avenue. For this project, I see the G1 mini as my “competition”, not necessarily as a power efficiency nor footprint(size) improvement, just purely in terms of graphs/second.
——

Essentially I want to see if its possible to hack together some kind of Hardware monstrosity to be able to mine Grin. If so, this is good news for decentralization and low bar of entry for hardware enthusiasts with enough creativity. I’ve read that DRAM has too much latency so its not a matter of gathering old DRAM sticks and running them in parallel. My current train of thought is perhaps a FPGA to be able to optimize the memory accesses and design custom memory bus. Looking at available SRAM on the market it seems like I would need to have 9+ chips to reach the needed 512mb(+ complexity involved)

TLDR; I’m trying to make a homemade Grin miner, mostly from off-the-shelf components, avoiding custom silicon($$$) if possible. I’m humbly requesting comments and suggestions from the experts in the community

Thank you

syntaxjak · August 24, 2025, 2:42am

Very cool man! One suggestion would be to get your hands on a g1 mini and basically reverse engineer it, or at least just to see what makes it tick.

You could research the chips used on the board and that might give you a starting place for what kinda chips you might need.

It would be cool to make an even more mini miner, kinda like those bitaxe devices, basically a single chip board, lowering the entry for grin mining even further, ~150$

CouchStudyAttire · August 24, 2025, 2:59am

I love your suggestions ! I agree that a cheap/low hash device would be awesome! Hopefully as I learn more it will become clear what avenue is the most feasible.
A G1 mini will definitely be looked at for inspiration, but they have a custom-designed chip, which according to my understanding at least, costs millions to do. So I’m trying to find existing hardware that can be stringed together semi-effectively. I’m willing to do some customization like maybe RISC-V for the processor, memory optimization circuitry, etc.

tromp · August 24, 2025, 7:20pm

Correct; since the final pre-planned Jan 2021 hardfork.

Taek from Obelisk announced the Cuckatoo31 GRN1 ASIC at https://forum.grin.mw/t/obelisk-grn1-chip-details/ , while Henry Quan from ePIC gave a talk about it at GrinCon1: https://www.youtube.com/watch?v=n01GnIn2Hs0&list=PLvgCPbagiHgrQa5KVt4XixK9t_NbfpkuP

I don’t think so. ASAIK the Ipollo ASIC is not multi-chip.

I meant that the most efficient implementation, minimizing Joules/graph, would need to use that much SRAM. It would need an additional 512MB of serially accessed memory for the edge bitmap, which could be external DRAM.

DRAM indeed has way too much latency for the purely random access to node bits, while also wasting too much energy, since only one bit out of a whole cache-line is needed. Thus, DRAM is only useful in mean mining as used on GPUs.

FPGAs are quite unsuitable for C32 lean mining, having very limited on-chip SRAM and limited bandwidth to external SRAM chips, which also happen to cost a fortune.

Avoiding ASICs means that lean mining is ruled out, and for mean mining, existing high end GPUs are probably already close to optimal, as your performance mostly depends on memory bandwidth.

CouchStudyAttire · August 25, 2025, 12:22am

Thank you for the thorough response ! Is there any design avenue that you see as feasible or worth exploring for targeting C32 ? Or would you recommend choosing something else as a degree project (does C32 miner design absolutely necessitate large upfront costs?)

Somebody on this forum at some point (please forgive me I don’t remember who) brainstormed the idea of a physical solver, leveraging electrical continuity to detect cycle length. I will think about this more, it might be a ridiculous idea because of the sheer amount of nodes in these graphs but I’m intrigued.. Please let me know your thoughts !

Anynomous · August 25, 2025, 8:17am

It is not possible to program an FPGA in time to find a solution. It takes minutes.

In theory, if you would be able to program FPGA chips much faster and measure voltage drop at each gates, yes you could find graph cycles very efficiently.

Edit: I see some people report programming an FPGA can be done in 100’s of ms. So perhaps it is possible. Either using voltage drop or timing of a pulse to find a graph circle of the right size.

tromp · August 25, 2025, 10:49am

Please explain where to apply and measure voltage in a network of 4 billion nodes connected completely randomly with 4 billion edges. And how to extract a cycle of 42 edges from such measurements. Or similarly for where to send and measure a pulse.

The idea seems utterly ludicrous to me:-(

Of course an FPGA is limited to connections in a 2D grid so it could never represent such a network to begin with…

Anynomous · August 25, 2025, 7:54pm

Obviously I have little undestanding of this topic and had no idea the number of nodes and edges was that vast. That is too many nodes and edges to physically represent on a FPGA, let alone measure. I was only thinking of finding if a cyclical graph is of length 42. If you would build a network of nodes where resistance is mostly in the nodes and not the edges, sure the voltage drop could tell you if you have a cycle of length 42. The idea was inspired by letting a fungus or slime mold solve a mace, it will find the most optimal path. It works similarly, distance=cost/resistance. Anyhow, it is all not applicable since there no way to physically build or measure so many nodes and edges.

tromp · August 25, 2025, 8:05pm

Except that it’s not an isolated cycle. It has lots of other graph fragments attached, that can be thousands of nodes big, with cycles of other lengths potentially among them.

Also your “voltage drop” suggests you already know of 2 nodes to set to known voltages. How would you pick those nodes, not knowing if or where there is a cycle?

Anynomous · August 25, 2025, 8:52pm

This was purely based on my day daydreaming, so not taking into account edge trimming or the vast nr of nodes and edged. But lets say you would have done trimming and be able to measure voltage and timing perfectly. In that case you could fire a random node with a pulse and measure in that same node if and expected pulse with a voltage drop equal to 42 nodes and a delay of the pulse expected for the distance of 42 nodes can be measured. Of course measuring distance would be hard since any physical representation would have unequal edge lengths. But lets say you could magically make the nodes cause a small delay so time of the pulse spend on edges can be ignored as background noise and similarly would be able to ignore the voltage drop in edges as background noise, then it would work.

With regards to other smaller cycles being present, if you would use a pulse these other paths/cycles would be shorter (cycle smaller than 42) or longer (cycle larger than 42) and as such would have different time of arrival of the pulse going through them. Meaning their signal should not affect finding the solution. If you would use a continuous current and not a pulse, then yes all these other graphs would make it impossible to find a solution in the way I propose/daydream.

tromp · August 25, 2025, 9:12pm

But trimming is over 99% of the effort of looking for cycles. It doesn’t matter much how you go about finding cycles in a trimmed graph.

Anynomous · August 25, 2025, 9:31pm

Yep, that is why I am only daydreaming. Of course if you would have billiones of nodes that could measure voltage drop and timing themselves, you could probably also use that for trimming. E.g. fire all nodes, those who did not receive any pulse at delay expected for 42 nodes auto disable/trim.

CouchStudyAttire · August 26, 2025, 10:02am

I really appreciate the discussion. The things Anynomous was describing as well as Tromp’s pointing out of the issues with it is very similar to my thought process.

@tromp you have raised important points regarding the focus on finding cycles is misguided. So imagination is to be applied to “is there a different,cheap way to trim dead end nodes?"

The word vector comes to mind but I have yet to think about it enough, I just wanted to check in and say that I appreciate your comments and it is really getting me thinking. I now believe competing with ASICSs is a fool’s errand but I’m still intrigued by the possibilities of finding another way..

Anynomous · August 26, 2025, 11:13am

I think you quite well distill from the discussion the main point, which is: Whatever solution you can think of, it would be some highly specialized chip designed specifically to solve these networks graphs. In other words, the solution you come up with would be an Application-Specific Integrated Circuit (ASIC) and not something build on general purpose hardware.

If you would go into the general purpose hardware direction with your thoughts, you would end up with high-end expensive GPU’s for their energy efficient massive parallelization with lots of fast memory. So again not something cheap you can do at home. ASIC’s will always be as long as there is enough intensive to make them.

navie · September 5, 2025, 4:26pm

Hey OP,

I’d be interested in seeing an open source miner like the Bitaxe for Grin solo mining that can be reasonably priced and sold (< $150).

Anyone know how complex or feasible this is from a hardware perspective?

transatoshi · September 17, 2025, 7:23am

Just take the case off a G1 mini and it’s basically a diet bitaxe for Grin.

navie · September 29, 2025, 7:16pm

Hah, it’s a good start but I suppose this is where the bottleneck starts and ends doesn’t it? C32 mining chip specs aren’t public knowledge and making a new chip would be the only way to accomplish this.

Topic		Replies	Views
Cuckatoo32 feasibility Mining	30	9105	March 30, 2019
Choice of ASIC Resistant PoW for GPU miners Mining	110	16065	February 23, 2019
Grin Mining FAQ - All of the answers to your mining questions Mining	49	38634	November 29, 2021
Best GPU for mining Grin Mining	180	31592	November 11, 2019
GPU mean memory reductions Mining	44	12337	August 16, 2019

Advice on Cuckatoo hardware implementation

Related topics