Experimental higher-fidelity C31 in 10.5 GB

tromp · January 25, 2019, 12:13pm

In branch memred2 of my cuckoo repo, I now have seemingly working solvers cuda31.0 for RTX and cuda31.1 for GTX. See Makefile targets

cuda31.1: …/crypto/siphash.cuh mean.cu Makefile
$(NVCC) -o $@ -DNRB1=26 -DNEPS_A=133 -DNEPS_B=85 -DPART_BITS=1 -DEDGEBITS=31 -arch sm_35 mean.cu $(BLAKE_2B_SRC)

cuda31.0: …/crypto/siphash.cuh mean.cu Makefile
$(NVCC) -o $@ -DNRB1=26 -DNEPS_A=133 -DNEPS_B=85 -DPART_BITS=0 -DEDGEBITS=31 -arch sm_75 mean.cu $(BLAKE_2B_SRC)

The old solvers correspond to NRB1=32. This is the number of rows (out of 64) of bufferB that don’t overlap bufferA. Reducing it to 26 cuts total memory use significantly, which allows NEPS_A/B to be raised to more comfortable levels, resulting in much higher fidelity. Since I’m travelling, I don’t have time to properly test these versions, but I’m making them available for people to experiment with.

Using these solvers in grin-miner would require going into
grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/,
checking out branch memred2, and adjusting the compile options in CMakeLists.txt,
none of which I can test on my laptop…

g011um · January 25, 2019, 1:27pm

Unfortunately, this doesn’t seem to work. On my RTX 2080 Ti, cuda31.0 exits after 195 ms (about the seeding time), and cuda31.1 does not find the 42-cycle for nonce 99; it reports 2 28 10 10 10 10 10-cycle instead.

zphou · January 25, 2019, 3:19pm

I have it built and it has been running for about 2 minutes; I can’t tell any difference at the moment.

fatchance · January 26, 2019, 3:21am

Also compiled and grin-miner is using the updated AT31 plugin with new compile flags.

Seems like it’s working as intended, in the sense that the grin-miner hashrate is lower than the previous ~1.7 (now down to around 1.5).

futurerheza · January 26, 2019, 5:48am

Compiled also. I’m sure it has a lower rejection rate. Is there anything we could do to measure the fidelity?

zphou · January 26, 2019, 6:18am

GPS on my rig is the same as before. The only difference I can tell is that the memred2 branch uses up all the VRAM on each card, while code from the master branch uses about 50% of the VRAM.

g011um · January 26, 2019, 6:29am

Are you using the _gtx or _rtx plugin?
Which version of CUDA are you using?

zphou · January 26, 2019, 7:03am

The rtx plugin, with the latest CUDA 10.

lofino · January 28, 2019, 1:51am

Mmm, what has to be changed in the CMakeLists.txt file?

fatchance · January 28, 2019, 4:36am

You have to add the new compile flags that @tromp mentioned above to the CMakeLists.txt file, more specifically in the if (NOT SKIP_CUCKATOO_GPU) section.

cuckoo · January 28, 2019, 6:33am

The build failed with 'CMakeFiles/cuckatoo_mean_cuda_rtx_31.dir/all' failed when I updated CMakeLists.txt with the additional flags as shown below.

build_cuda_target("${AT_MEAN_CUDA_SRC}" cuckatoo_mean_cuda_rtx_31 "-DNRB1=26 -DNEPS_A=133 -DNEPS_B=85 -DPART_BITS=0 -DEDGEBITS=31")

fatchance · January 28, 2019, 8:01am

As @tromp said up above:

This means you need to merge the changes that @tromp made on the memred2 branch of his cuckoo repository on GitHub, make sure it compiles properly on your server, update the specific files that were revised, update CMakeLists.txt, and then recompile (solving any compilation errors along the way).

Modifying the CMakeLists.txt file is the second-to-last step in the process. If you didn’t do the first few critical steps then it’s better to wait until they’re packaged up.

colincr33vey · January 28, 2019, 7:58pm

grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu(618): error: no instance of function template "Round2" matches the argument list

argument types are: (int, u32, siphash_keys, u32 *, uint2 *, u32 *, u32 *)

Latest memred2 branch pulled, CMakeLists.txt updated to match cuckoo’s above.

Levicorpus · January 28, 2019, 8:18pm

Is there a quick way to tell whether you did it right? I’m running both versions now on different GPUs. The experimental version seems to get a few more solutions, so far 21 vs 28, but that could still be luck.
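
One rough way to judge whether 21 vs 28 is more than luck is to treat both solution counts as Poisson-distributed and compute a z-score for their difference. This is only a sketch: it assumes both GPUs ran for the same time at the same graph rate, which the post does not state, and the function name is made up for illustration.

```python
import math

def poisson_z(count_a: int, count_b: int) -> float:
    """Normal-approximation z-score for the difference of two Poisson counts.

    Assumes both counts were collected over the same exposure
    (same run time and graph rate), which is an assumption here.
    """
    return (count_b - count_a) / math.sqrt(count_a + count_b)

z = poisson_z(21, 28)
print(z)  # 1.0
```

A z-score of about 1 is well within ordinary random variation (a common threshold for "probably real" is around 2), so under these assumptions the counts so far cannot distinguish the two versions.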

fatchance · January 28, 2019, 8:52pm

I just ran into the same issue. I think that @tromp’s latest commits broke the compilation process. Perhaps go back a few commits, to before he went to GrinConUS. From what I understand, the improvements he made weren’t performance related at all, but more about reporting, so that the gps rates reported in grin-miner match the gps rates that mining pools are reporting.

Ultimately, I think we’re going to have to wait for him to come back before we see further commits for both C31 and (hopefully) C32.

colincr33vey · January 28, 2019, 9:09pm

Ah, glad to hear I’m not alone – I was thinking it was user error on my behalf. Yeah, no harm in waiting, thanks for the reply

tromp · January 29, 2019, 8:14pm

Ok, think I have most if not all bugs fixed now.
Please try latest memred2 commit, and let me know if you still see problems, or if it works for you. Even better if you can give fidelity stats…

NOTE: there is now an option to set expand to 3, which is the default for 30+.
This means that when trying the new solver within grin-miner, you need to avoid setting expand=2.

hans1024 · January 29, 2019, 8:41pm

How can I best test the fidelity?
Like this:
./cuda31.0 -d 0 -r 1000 | perl …/perl/cycles.pl
and report the outputs?

graemes · January 29, 2019, 9:43pm

2080ti:
./cuda31.0 -d 0 -r 1000 | perl …/perl/cycles.pl
42 24 1.008
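
The three numbers read as cycle length, cycles found, and fidelity. Assuming fidelity means cycles found divided by cycles expected, with one 42-cycle expected per 42 graphs on average, the reported figure can be reproduced as 24 × 42 / 1000 = 1.008. The sketch below encodes that assumed definition; the function name is hypothetical and this is not taken from cycles.pl itself.

```python
def fidelity(cycles_found: int, graphs_searched: int, cycle_length: int = 42) -> float:
    """Ratio of cycles found to cycles expected.

    Assumes an ideal solver finds on average one cycle of length
    `cycle_length` per `cycle_length` graphs (an assumption that is
    consistent with the numbers posted above).
    """
    expected = graphs_searched / cycle_length
    return cycles_found / expected

# Values from the 2080 Ti run above: 24 cycles of length 42 in 1000 graphs.
print(round(fidelity(24, 1000), 3))  # 1.008
```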

tromp · January 29, 2019, 10:08pm

yes, the larger the range the better.

btw, one can try to further improve fidelity by increasing NEPS_A/B until running out of memory…
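
On why a larger range is better: if the number of 42-cycles found is roughly Poisson and fidelity is near 1, the relative uncertainty of a fidelity estimate shrinks with the square root of the expected number of cycles. A minimal sketch under those assumptions (function names are illustrative only):

```python
import math

def relative_std_error(graphs_searched: int, cycle_length: int = 42) -> float:
    """Approximate relative standard error of a fidelity estimate.

    Assumes Poisson-distributed cycle counts with about one cycle
    expected per `cycle_length` graphs.
    """
    expected_cycles = graphs_searched / cycle_length
    return 1.0 / math.sqrt(expected_cycles)

def graphs_needed(target_rel_error: float, cycle_length: int = 42) -> int:
    """Approximate range needed to reach a target relative standard error."""
    return round(cycle_length / target_rel_error ** 2)

print(relative_std_error(1000))  # roughly 0.2
print(graphs_needed(0.02))       # on the order of 100,000
```

With -r 1000 the estimate is only good to about 20%, so a figure like 1.008 says little by itself; telling apart fidelities that differ by a few percent would need a range on the order of 100,000 graphs.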
