Experimental higher-fidelity C31 in 10.5 GB

In branch memred2 of my cuckoo repo, I now have seemingly working solvers cuda31.0 for RTX and cuda31.1 for GTX. See Makefile targets

cuda31.1: …/crypto/siphash.cuh mean.cu Makefile
	$(NVCC) -o $@ -DNRB1=26 -DNEPS_A=133 -DNEPS_B=85 -DPART_BITS=1 -DEDGEBITS=31 -arch sm_35 mean.cu $(BLAKE_2B_SRC)

cuda31.0: …/crypto/siphash.cuh mean.cu Makefile
	$(NVCC) -o $@ -DNRB1=26 -DNEPS_A=133 -DNEPS_B=85 -DPART_BITS=0 -DEDGEBITS=31 -arch sm_75 mean.cu $(BLAKE_2B_SRC)

The old solvers correspond to NRB1=32. This is the number of rows (out of 64) of bufferB that don’t overlap bufferA. Reducing it to 26 cuts total memory use significantly, which allows NEPS_A/B to be raised to more comfortable levels and results in much higher fidelity. Since I’m travelling, I don’t have time to properly test these versions, but I’m making them available for people to experiment with.

Using these solvers in grin-miner would require going into
grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/
checking out branch memred2 and adjusting the compile options in CMakeLists.txt
none of which I can test on my laptop…

Unfortunately, this doesn’t seem to work. On my RTX 2080 Ti, cuda31.0 exits after 195 ms (about the seeding time), and cuda31.1 does not find the 42-cycle for nonce 99; it reports 2 28 10 10 10 10 10-cycle instead.

Have it built and running for about 2 mins, can’t tell any difference atm.

Also compiled and grin-miner is using the updated AT31 plugin with new compile flags.

Seems like it’s working as intended, in the sense that the grin-miner hashrate is less than the previous ~1.7 (now down to around ~1.5 range).

Compiled also. I’m sure it has a lower rejected rate. Is there anything we could do to measure the fidelity?

GPS on my rig is the same as before. The only difference I can tell is that the memred2 branch uses up all the VRAM on each card, while code from the master branch uses about 50% of the VRAM.

Are you using the _gtx or _rtx plugin?
Which version of CUDA are you using?

rtx plugin with latest cuda-10

Mmm, what has to be changed in the CMakeLists.txt file?

You have to add the new compile flags that @tromp mentioned above to the CMakeLists.txt file, more specifically in the if (NOT SKIP_CUCKATOO_GPU) section.

The build failed with 'CMakeFiles/cuckatoo_mean_cuda_rtx_31.dir/all' failed when I updated CMakeLists.txt with the additional flags as shown below.

build_cuda_target("${AT_MEAN_CUDA_SRC}" cuckatoo_mean_cuda_rtx_31 "-DNRB1=26 -DNEPS_A=133 -DNEPS_B=85 -DPART_BITS=0 -DEDGEBITS=31")

As @tromp said up above:

This means you need to merge the changes that @tromp made to his memred2 cuckoo repository on GitHub, make sure it compiles properly on your server, update the specific files that were revised, update CMakeLists.txt, and then re-compile (and solve any compilation errors on the way).

Modifying the CMakeLists.txt file is the second-to-last step in the process. If you didn’t do the first few critical steps then it’s better to wait until they’re packaged up.

grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu(618): error: no instance of function template "Round2" matches the argument list

argument types are: (int, u32, siphash_keys, u32 *, uint2 *, u32 *, u32 *)

Latest memred2 branch pulled, CMakeList updated to the same as cuckoo’s ^

Is there a quick way to tell whether you did it right? I’m running both versions now on different GPUs. The experimental version seems to get a few more solutions, so far 21 vs 28, could still be luck though.
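
A rough way to judge whether a gap like 21 vs 28 is more than luck is to treat each solution count as a Poisson draw and look at the z-score of the difference. The sketch below assumes both GPUs ran for the same time at the same graph rate, which the post above does not confirm; `poisson_z` is a name made up for illustration.

```python
import math

def poisson_z(count_a: int, count_b: int) -> float:
    """z-score for the difference of two Poisson counts.

    Assumes equal exposure (same run time, same graphs per second).
    Under the null hypothesis of equal rates, the variance of the
    difference is approximately count_a + count_b.
    """
    total = count_a + count_b
    if total == 0:
        raise ValueError("need at least one solution to compare")
    return (count_b - count_a) / math.sqrt(total)

z = poisson_z(21, 28)
print(f"z = {z:.2f}")  # prints "z = 1.00"
```

A z of about 1 is well within ordinary noise, so counts this small can't separate the two versions yet.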

I just ran into the same issue. I think that @tromp’s latest commits broke the compilation process. Perhaps go back a few commits, to before he went to GrinConUS. From what I understand, the improvements he made weren’t performance related at all, but more around reporting, so that the gps rates reported in grin-miner match the gps rates that mining pools are reporting.

Ultimately, I think we’re going to have to wait for him to come back before we see further commits for both C31 and (hopefully) C32.

Ah, glad to hear I’m not alone – I was thinking it was user error on my behalf. Yeah, no harm in waiting, thanks for the reply

Ok, think I have most if not all bugs fixed now.
Please try latest memred2 commit, and let me know if you still see problems, or if it works for you. Even better if you can give fidelity stats…

NOTE: there is now an option to set expand to 3, which is the default for 30+. This means that when trying the new solver within grin-miner, you need to avoid setting expand=2.

How can I best test the fidelity?
Like this:
./cuda31.0 -d 0 -r 1000 | perl …/perl/cycles.pl
and report the outputs?

2080ti:
./cuda31.0 -d 0 -r 1000 | perl …/perl/cycles.pl
42 24 1.008

yes, the larger the range the better.
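
For reference, the `42 24 1.008` line above appears to read as cycle length, cycles found, and fidelity: 24 cycles found where 1000/42 ≈ 23.8 were expected. A minimal sketch of that arithmetic, assuming a random graph contains a 42-cycle with probability about 1/42 and that the count is roughly Poisson; the `fidelity` helper is hypothetical and not part of cycles.pl.

```python
import math

def fidelity(cycles_found: int, graphs: int, cycle_len: int = 42):
    """Return (estimate, one-sigma error) of solver fidelity.

    Assumption: the expected number of cycle_len-cycles over `graphs`
    random graphs is graphs / cycle_len, so fidelity is found / expected.
    The error bar treats cycles_found as a Poisson count.
    """
    expected = graphs / cycle_len
    estimate = cycles_found / expected
    if cycles_found == 0:
        return estimate, float("nan")
    return estimate, estimate / math.sqrt(cycles_found)

est, err = fidelity(24, 1000)
print(f"fidelity ~ {est:.3f} +/- {err:.2f}")  # prints "fidelity ~ 1.008 +/- 0.21"
```

With only 24 cycles the error bar is around 20%, which is why a larger range helps: the relative error shrinks roughly as one over the square root of the number of cycles found.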

btw, one can try to further improve fidelity by increasing NEPS_A/B until running out of memory…