FPGA mining with cloud instances?

lehnberg · May 21, 2018, 2:33pm

Are you aware of the FPGA solutions that are being offered as part of AWS EC2 F1 or Nimbix?

Both take OpenCL, so porting a baseline version based on something like @photon & @urza’s Mozkomor miner, once available, should be relatively straightforward, no?

While it can’t be cheap, it also gives you access to high-performance FPGAs at zero fixed cost. As there presumably are no ASICs for Cuckoo yet, is it possible that this would be performant enough in the early days? Would the performance of these high-end FPGAs make it worthwhile compared to GPUs and “home-kit” FPGAs?

OpticFlow · May 21, 2018, 3:04pm

If it’s profitable enough to run on AWS, it would be a money printing machine. Who’d give it away for free?

lehnberg · May 21, 2018, 3:53pm

I guess I just did, no? (; Hoping some kind soul who has the project’s best interest at heart would be willing to help us test this and benchmark it. I don’t have any OpenCL programming expertise, and don’t plan to be a miner, so it doesn’t make sense for me to invest in learning this at this point. But it seems like anyone who’s thinking of mining with GPUs using OpenCL should at least look into this option, even if it’s only in order to dismiss it as non-feasible.

photon · May 21, 2018, 4:06pm

Apart from siphash, cuckoo cycle is still very memory bound. I can imagine some FPGA-specific tuning for cuckoo in OpenCL that would help, but the FPGA would still need over 512GB/s bandwidth and would then be only 2-3x faster than a 1080 Ti. Sure, it would draw less power, but the rental cost might be more expensive than the electricity.

lehnberg · May 21, 2018, 4:40pm

If I read the product page right, the f1.16xlarge has 8 FPGA cards and 976 GiB of instance memory, and thanks to dedicated PCI-e the FPGAs share a memory space and can communicate with each other at up to 12 Gbps in each direction. Is that the same bandwidth that you are referring to? And what impact would that have on mining? Is it a linear increase?

OpticFlow · May 21, 2018, 4:50pm

The vu9p boards on amazon have 4 DDR4 channels each (4x12 = 48 Gb/s). Also 346 Mb embedded RAM on the chip.

https://www.xilinx.com/video/software/developing-on-aws-f1-with-sdaccel-and-rtl-kernels-part2.html

photon · May 21, 2018, 6:32pm

It needs one of those models with 4GB HBM memory. Embedded 32MB of SRAM is good for buffering. But I would still question if it is worth the effort given the insane price tags.

tromp · May 21, 2018, 6:35pm

UltraScale+ boards feature integrated HBM2 memory with a bandwidth of 460 GB/s. For a single graph, a leaner mean solver needs to write (and read) 12 GB of data (each of 2^29 edges is written an expected 6 times, and takes only 4 bytes). So in the very best case, you could run 19 graphs per second. But in practice, you’d probably do very well to achieve half of that.
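
For anyone who wants to redo the arithmetic above with other bandwidth figures, here is a rough Python sketch. The function name and parameters are made up for illustration. It assumes the whole solve is purely bandwidth bound and that every written byte is read back exactly once, which is the best case described above and not a measured result.

```python
# Best-case graphs per second for a bandwidth-bound solver.
# Assumption: total traffic = bytes written + the same bytes read back.
def max_graphs_per_second(bandwidth_bytes_per_s,
                          edge_bits=29,        # 2^29 edges per graph
                          writes_per_edge=6,   # expected writes per edge
                          bytes_per_edge=4):
    bytes_written = (2 ** edge_bits) * writes_per_edge * bytes_per_edge
    traffic = 2 * bytes_written  # write once, read once
    return bandwidth_bytes_per_s / traffic

# Data written per graph: exactly 12 GiB (about 12.9e9 bytes).
written_bytes = (2 ** 29) * 6 * 4

# Rounded figure from the post: 460 GB/s over 2 x 12 GB.
rounded_rate = 460e9 / (2 * 12e9)

# Same bound using the exact byte count.
exact_rate = max_graphs_per_second(460e9)

print(written_bytes / 2 ** 30, rounded_rate, exact_rate)
```

With the rounded 12 GB figure this gives roughly 19 graphs per second; with the exact byte count it comes out a little under 18. Either way, it is only an upper bound on what the memory system allows.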

photon · May 21, 2018, 7:19pm

Yea, I miscalculated, the existing GPU OpenCL code could also be modified to transfer only approx 13GB using the large SRAM buffer on UltraScale+. So theoretical max speedup is 7x over 1080 Ti with real being less (3-4x ?).

OpticFlow · May 21, 2018, 7:47pm

Some have HBM2, but not the dev kits (that cost almost $5k) or the ones on AWS.
