How to mine Cuckoo 30 using Grin
This is a brief guide on how to configure mining at production-level Cuckoo 30 in the current master version of Grin, i.e. Testnet2. If you have the hardware and you’d like to help with mining testing efforts, want to get a sense of what mining will be like on Grin’s mainnet, or just want to learn more about mining in Grin, then this guide is for you.
Building Grin
The first thing you’ll need to do is build and run grin. Follow the instructions for installing prerequisites and building Grin down to the Build Grin section of the main build doc, but note that since we’re focused on mining testnet2 at Cuckoo 30, we’ll be using the master branch for now. Ensure you’re building on the correct branch with:
git checkout master
git pull
cargo build
Make sure grin compiles and runs as usual, using the default settings. By default, Grin runs a stratum server on port 13416, which is a server that communicates with remote mining workers. One or many remote ‘miners’ can mine into a single Grin wallet hosted by the grin node.
Modifications to grin.toml
The default settings in grin.toml should work for running a mining server for testnet2. Ensure you have a copy of grin.toml in your target directory in case you want to make modifications.
If everything is in order (the grin.toml file is in your current directory and the grin executable is in your path), you should be able to run grin without arguments, e.g.:
grin
Building grin-miner
grin-miner is a standalone mining client that actually runs Cuckoo Cycle solvers and communicates results back to a listening stratum mining server on a Grin node. This architecture enables multiple machines to mine into the same wallet, and ensures the grin node can’t be taken down by a mining plugin crashing (which happens quite frequently due to the bleeding-edge nature of solvers).
To build grin-miner (in a separate directory from Grin, or on a separate machine altogether):
git clone https://github.com/mimblewimble/grin-miner
cd grin-miner
cargo build
More detailed instructions can be found on the grin-miner build page.
Modifications to grin-miner.toml
As with grin, grin-miner should be run in a directory where it can find a grin-miner.toml file. To get a basic miner up and running, ensure that a grin node is running and that grin-miner is configured to point to it in grin-miner.toml in the line:
stratum_server_addr = "127.0.0.1:13416"
If your grin node and grin-miner are on the same machine, the defaults should work. Otherwise, modify this line accordingly.
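If grin-miner can’t seem to connect, it can help to rule out basic networking first. The following is a minimal Python sketch (not part of grin or grin-miner; the helper names parse_addr and can_connect are made up for illustration) that just checks whether something is accepting TCP connections at the address configured in stratum_server_addr. It doesn’t speak the stratum protocol, so a successful connection only tells you the port is open.

```python
import socket
from typing import Tuple


def parse_addr(addr: str) -> Tuple[str, int]:
    """Split a 'host:port' string such as the stratum_server_addr value."""
    host, _, port = addr.rpartition(":")
    return host, int(port)


def can_connect(addr: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to addr can be opened."""
    try:
        with socket.create_connection(parse_addr(addr), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    addr = "127.0.0.1:13416"  # the default value shown above
    print(addr, "reachable" if can_connect(addr) else "not reachable")
```

If this reports the address as not reachable, check that the grin node is running and that the host and port match what the node is listening on.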
Basic CPU Mining
If everything is working correctly, you should see grin-miner connecting, accepting solutions, and mining in the TUI. You should also see output similar to the following in the logs, which you can view while the application is running with tail -f grin-miner.log:
Feb 02 09:46:08.115 INFO Cuckoo plugin 0 - /home/yeastplume/projects/rust/grin/target/debug/plugins/mean_compat_cpu_30.cuckooplugin
Feb 02 09:46:08.115 DEBG Cuckoo Plugin 0: Setting mining parameter NUM_THREADS to 1 on Device 0
...
Followed by:
Feb 02 09:47:03.872 DEBG Plugin 0 - Device 0 (CPU) Status: OK - Last Graph time: 16.715218787; Graphs per second: 0.060
Feb 02 09:47:03.872 INFO Mining at 0.05982572006641841 graphs per second
Feb 02 09:47:20.754 DEBG Plugin 0 - Device 0 (CPU) Status: OK - Last Graph time: 16.875651791; Graphs per second: 0.059
Feb 02 09:47:20.754 INFO Mining at 0.0592569704794047 graphs per second
If you see this, then your device is mining away (very, very slowly, with the default single-threaded CPU miner). Now that we’ve confirmed that mining is working, we can move on to trying to speed it up.
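Since most of this guide involves comparing graph times between runs, it can be handy to pull the numbers out of the log rather than reading them by eye. Here’s a rough Python sketch that does so; the regular expression is based only on the log lines shown in this guide (the exact format may differ between versions), and the names DeviceStat and parse_stats are made up for illustration.

```python
import re
from typing import Iterator, NamedTuple


class DeviceStat(NamedTuple):
    plugin: int
    device: int
    name: str
    status: str
    graph_time: float  # seconds taken for the last graph
    gps: float         # graphs per second, as reported in the log


# Matches the per-device DEBG lines shown in this guide. The separator
# after the status is "-" in some listings and ":" in others, and the
# graph time may or may not carry a trailing "s".
STAT_RE = re.compile(
    r"Plugin (\d+) - Device (\d+) \((.+?)\) Status: (\w+) [-:] "
    r"Last Graph time: ([\d.]+)s?; Graphs per second: (\S+)"
)


def parse_stats(lines) -> Iterator[DeviceStat]:
    """Yield one DeviceStat per matching log line; other lines are skipped."""
    for line in lines:
        m = STAT_RE.search(line)
        if m:
            yield DeviceStat(int(m[1]), int(m[2]), m[3], m[4],
                             float(m[5]), float(m[6]))


SAMPLE = [
    "Feb 02 09:47:03.872 DEBG Plugin 0 - Device 0 (CPU) Status: OK - "
    "Last Graph time: 16.715218787; Graphs per second: 0.060",
    "Feb 02 09:47:03.872 INFO Mining at 0.05982572006641841 graphs per second",
]

for stat in parse_stats(SAMPLE):
    print(f"{stat.name}: {stat.graph_time:.2f}s per graph ({stat.gps} graphs/s)")
```

To run it against a real log, pass an open file instead of SAMPLE, e.g. parse_stats(open("grin-miner.log")).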
Plugin Configuration
Different mining implementations can be swapped in and out of the grin-miner client via a plugin system, and you can see the default plugins built by grin-miner by examining the contents of target/debug/plugins. By default, grin-miner builds ‘Lean’ and ‘Mean’ versions of the Cuckoo mining CPU plugins for mining at 2^16 and 2^30 Cuckoo sizes. The ‘Lean’ plugins are very slow but designed to use minimal memory, while the ‘Mean’ plugins are designed to go as fast as possible and use as much RAM as necessary (up to 4GB in most cases). Generally, you’ll want to be using the mean version of the CPU plugin. You’ll also want to ensure you actually have at least 4GB of RAM free, which probably means having at least 8GB of system RAM in total.
By default, grin-miner.toml is configured to run the slowest and most widely compatible plugin, which you can see in the uncommented portion of the mining plugin configuration section:
[[mining.miner_plugin_config]]
type_filter = "mean_compat_cpu"
[mining.miner_plugin_config.device_parameters.0]
NUM_THREADS = 1
type_filter here denotes that grin-miner will pick up the plugin called mean_compat_cpu at Cuckoo size 30 from its plugin directory. Each plugin can optionally support multiple devices (which we’ll explore more later), but this particular plugin only has a single device at index 0, indicating the main CPU. To configure this device, parameters are set in the appropriate section, in this case under [mining.miner_plugin_config.device_parameters.0]. This particular plugin only has a single parameter, which denotes the number of CPU threads the solver will run on.
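To see which values make sense for type_filter on your build, you can look at what actually ended up in the plugin directory. The sketch below is a small Python helper (my own, not part of grin-miner) that groups plugin files by type and Cuckoo size. It assumes files are named <type>_<size>.cuckooplugin, which is inferred from the file names shown in this guide rather than from any documented rule, and the function name plugins_by_type is made up.

```python
import re
from pathlib import Path
from typing import Dict, List

# Assumed naming pattern, inferred from the file names in this guide:
# <type_filter>_<cuckoo size>.cuckooplugin
PLUGIN_RE = re.compile(r"^(?P<kind>.+)_(?P<size>\d+)\.cuckooplugin$")


def plugins_by_type(plugin_dir) -> Dict[str, List[int]]:
    """Map each plugin type found in plugin_dir to its sorted Cuckoo sizes."""
    found: Dict[str, List[int]] = {}
    for path in Path(plugin_dir).glob("*.cuckooplugin"):
        m = PLUGIN_RE.match(path.name)
        if m:
            found.setdefault(m["kind"], []).append(int(m["size"]))
    return {kind: sorted(sizes) for kind, sizes in sorted(found.items())}


if __name__ == "__main__":
    # Run from the top-level grin-miner directory after a debug build.
    for kind, sizes in plugins_by_type("target/debug/plugins").items():
        print(kind, sizes)
```

Each key printed is a candidate type_filter value, and the list next to it shows the Cuckoo sizes that were built for it.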
The mean_compat_cpu plugin is intended to be widely compatible. However, if your CPU is relatively up to date (as in, from the past few years), you should be able to use the mean_cpu plugin instead, which contains instructions that only work on newer processors. Chances are your processor will support it, but in any case it’s worth testing this by stopping grin-miner, changing the following setting, and running again:
type_filter = "mean_cpu"
Which, in the case of my development machine (a middle-of-the-road i7-4790k at 4GHz), results in somewhat better performance, roughly 12 seconds per graph instead of 17:
Feb 02 10:05:59.409 DEBG Plugin 0 - Device 0 (CPU) Status: OK - Last Graph time: 11.978399334; Graphs per second: 0.083
Feb 02 10:05:59.409 INFO Mining at 0.08348360846190503 graphs per second
But we can do better… This particular device has 4 CPU cores (which you can see if you run cat /proc/cpuinfo), so I’ll try setting the number of threads to the number of physical cores in my CPU:
[mining.miner_plugin_config.device_parameters.0]
NUM_THREADS = 4
And try again. This time, I get output along the lines of:
Feb 02 10:08:40.870 DEBG Plugin 0 - Device 0 (CPU) Status: OK - Last Graph time: 3.79394227; Graphs per second: 0.264
Feb 02 10:08:40.871 INFO Mining at 0.26357807495051844 graphs per second
Just under 4 seconds, which is starting to look much more promising. Let’s try 8 threads (as I actually have 8 virtual cores):
Feb 02 10:11:00.097 DEBG Plugin 0 - Device 0 (CPU) Status: OK - Last Graph time: 3.359463397; Graphs per second: 0.298
Feb 02 10:11:00.097 INFO Mining at 0.29766658594732714 graphs per second
Which gets me under 3.5 seconds. For fun, let’s try 16:
Feb 02 10:12:17.759 INFO Mining at 0.30098994282890174 graphs per second
Feb 02 10:12:21.080 DEBG Plugin 0 - Device 0 (CPU) Status: OK - Last Graph time: 3.31403283; Graphs per second: 0.302
Which seems to shave about 4 hundredths of a second off graph times. I can tweak and test as much as I like, but for the purposes of this example, I’ll stop here.
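To put those runs side by side, here’s a quick Python calculation of the speedup each thread count gives over the first mean_cpu run. The graph times are copied from the log lines above; I’m assuming the first mean_cpu run (about 12 seconds) used a single thread, as the NUM_THREADS log line earlier in this guide suggests.

```python
# Last graph times (seconds) reported above for the mean_cpu plugin,
# keyed by the NUM_THREADS value used for that run. The 1-thread entry
# assumes the first mean_cpu run was single-threaded.
graph_times = {1: 11.978399334, 4: 3.79394227, 8: 3.359463397, 16: 3.31403283}


def speedups(times):
    """Speedup of each run relative to the run with the fewest threads."""
    baseline = times[min(times)]
    return {threads: baseline / t for threads, t in sorted(times.items())}


for threads, s in speedups(graph_times).items():
    print(f"{threads:>2} threads: {graph_times[threads]:6.2f}s per graph, "
          f"{s:.2f}x speedup")
```

On this machine nearly all of the gain comes from going up to the 4 physical cores; the extra virtual cores and oversubscription add comparatively little.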
Also, note that if I inspect CPU usage using the top command, I can see my entire CPU is happily engaged during one of these runs:
%Cpu0 : 94.7/1.3 96[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
%Cpu1 : 96.7/0.7 97[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
%Cpu2 : 96.7/0.0 97[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
%Cpu3 : 94.7/0.7 95[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
%Cpu4 : 97.4/0.7 98[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
%Cpu5 : 96.7/1.3 98[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
%Cpu6 : 96.0/0.0 96[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
%Cpu7 : 96.0/0.0 96[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
Your individual results will vary depending on many factors, but the takeaway here is that tuning mining for your individual system is going to be an iterative process of tweaking parameters and observing the results. From my own experience, I know that the mean_cpu miner is unlikely to get much faster than this on my system, so I’ll leave these settings for now and move on to how we can configure a GPU miner.
GPU (CUDA) Mining
CUDA and nvidia-smi
Next, I’m going to try configuring my system to run a GPU miner within Grin. As noted in the Mining FAQ, only nVidia cards are supported… ATI solvers will very probably be coming later from the community.
Assuming you have an appropriate CUDA-compatible card (anything from the 9xx or 10xx series should work) and the appropriate drivers installed (usually a package called nvidia), you first need to enable the building of the CUDA plugins on your system. For this, you’ll first need the nVidia cuda package, which contains the CUDA libraries and special versions of gcc used to compile CUDA GPU code:
#whatever package manager your distro uses.. (don't just type the below)
[apt-get install][pacman -Sy][etc] cuda
The cuda package is generally quite large, so allow it to install and then check it’s working with the following (you may have to log out and back into your shell first):
nvcc
nvcc fatal : No input files specified; use option --help for more information
Also, it’s handy to look at the output of nvidia-smi to get stats on your installed cards. On my system, nvidia-smi shows a single 980Ti:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.34 Driver Version: 387.34 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 980 Ti Off | 00000000:01:00.0 On | N/A |
| 29% 47C P0 71W / 250W | 923MiB / 6077MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
This utility is useful for measuring workload and power usage. You can also get continually updating output via:
nvidia-smi dmon
Which scrolls output like the following until interrupted:
# gpu pwr temp sm mem enc dec mclk pclk
# Idx W C % % % % MHz MHz
0 18 46 0 7 0 0 405 135
0 18 46 1 7 0 0 405 135
0 18 46 1 7 0 0 405 135
0 18 46 0 7 0 0 405 135
0 18 46 0 7 0 0 405 135
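If you want a single power figure rather than a scrolling column of numbers, the output above is easy enough to average. Below is a minimal Python sketch that parses text in the format shown and computes the mean of the pwr column. The function names are my own, the column names are taken from the header line in the sample, and the sample is the idle output from above, so it says nothing about power draw while mining.

```python
def parse_dmon(text):
    """Parse `nvidia-smi dmon` style output into a list of {column: value} rows."""
    columns, rows = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # The first comment line names the columns; the second gives units.
            if not columns:
                columns = stripped.lstrip("#").split()
            continue
        rows.append(dict(zip(columns, stripped.split())))
    return rows


def average_power(rows, gpu="0"):
    """Mean of the 'pwr' column (watts) for one GPU index, or None if no data."""
    watts = []
    for row in rows:
        if row.get("gpu") != gpu:
            continue
        try:
            watts.append(float(row.get("pwr", "")))
        except ValueError:
            continue  # skip entries that are not numeric
    return sum(watts) / len(watts) if watts else None


SAMPLE = """\
# gpu pwr temp sm mem enc dec mclk pclk
# Idx W C % % % % MHz MHz
0 18 46 0 7 0 0 405 135
0 18 46 1 7 0 0 405 135
0 18 46 1 7 0 0 405 135
"""

rows = parse_dmon(SAMPLE)
print(f"{len(rows)} samples, average power {average_power(rows)} W")
```

To use it for real, save some dmon output to a file while the miner is running and pass the file contents to parse_dmon.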
Building grin-miner’s CUDA plugin
Assuming the cuda package is installed as described above, next we need to configure grin-miner to build the CUDA-enabled plugins. We do this by editing the file util/Cargo.toml in grin-miner’s source directory as follows:
#uncomment this feature to enable cuda builds (cuda toolkit must be installed)
features=["build-cuda-plugins"]
Remove the # from the beginning of the #features=["build-cuda-plugins"] line and save the file. Then rebuild grin-miner from the top-level grin-miner directory as usual:
cargo build
Then cross your fingers and hope the build works. You may come across a situation whereby the build attempts to use a version of GCC that’s incompatible with the cuda library you’re using, in which case you can override the GCC compiler used during the build via:
CUDA_HOST_COMPILER=[PATH_TO_GCC] cargo build
Many things could go wrong at this point with the build, and many conditions and cases aren’t yet handled in the underlying cmake files controlling the build. This portion of the build should get more robust over time, but for now if you have an issue you can’t fix, seek help on gitter or the forums.
If everything goes successfully with the build, you should see the cuda plugin appear in the target/debug/plugins directory of the grin-miner build, as in the following listing:
cuda_30.cuckooplugin lean_cpu_30.cuckooplugin mean_compat_cpu_30.cuckooplugin mean_cpu_30.cuckooplugin
lean_cpu_16.cuckooplugin mean_compat_cpu_16.cuckooplugin mean_cpu_16.cuckooplugin
If you see cuda_30.cuckooplugin as listed above, you should be good to go.
Configuring grin-miner.toml for CUDA
Next, we’re going to try mining on a single CUDA device instead of the CPU, which just means enabling the CUDA plugin in grin-miner.toml. For now, comment out the mean_cpu configuration from earlier:
#[[mining.miner_plugin_config]]
#type_filter = "mean_cpu"
#[mining.miner_plugin_config.device_parameters.0]
#NUM_THREADS = 16
And uncomment the following a bit further down in the file:
[[mining.miner_plugin_config]]
type_filter = "cuda"
[mining.miner_plugin_config.device_parameters.0]
USE_DEVICE = 1
Now restart grin-miner. This will start mining on whatever CUDA device shows up on your system as device 0, in my case, a still useful but slightly ageing 980Ti:
Feb 02 11:46:06.290 DEBG Plugin 0 - Device 0 (GeForce GTX 980 Ti) Status: OK - Last Graph time: 3.60286413; Graphs per second: 0.278
Feb 02 11:46:06.291 INFO Mining at 0.27755695577673645 graphs per second
Which, in this case, is even slower than the mean CPU miner at 8 threads! However, we’ve confirmed that GPU mining is working at this stage.
Running Multiple Plugins
grin-miner can run multiple plugins, or multiple devices configured within a particular plugin. To try this, without removing our GPU configuration, let’s uncomment the earlier CPU configuration:
[[mining.miner_plugin_config]]
type_filter = "mean_cpu"
[mining.miner_plugin_config.device_parameters.0]
NUM_THREADS = 8
And run grin-miner again. This time, we can see both plugins working away in the output, and our combined graphs per second has roughly doubled:
Feb 02 11:59:39.068 DEBG Mining: Plugin 0 - Device 0 (CPU) Status: OK : Last Graph time: 3.562477409s; Graphs per second: 0.281 - Total Attempts: 3
Feb 02 11:59:39.070 DEBG Mining: Plugin 1 - Device 0 (GeForce GTX 980 Ti) Status: OK : Last Graph time: 3.890610665s; Graphs per second: 0.257 - Total Attempts: 2
Feb 02 11:59:39.070 INFO Mining at 0.5377325938934971 graphs per second
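As a sanity check on that combined figure, the total in the INFO line appears to be simply the sum of each device’s rate, where a device’s rate is one over its last graph time. The short Python calculation below reproduces it from the two DEBG lines above (the function name combined_rate is mine):

```python
# Last graph times (seconds) from the two DEBG lines above.
graph_times = {
    "CPU (mean_cpu, 8 threads)": 3.562477409,
    "GeForce GTX 980 Ti (cuda)": 3.890610665,
}


def combined_rate(times):
    """Sum of per-device rates, where each rate is 1 / last graph time."""
    return sum(1.0 / t for t in times.values())


total = combined_rate(graph_times)
for name, t in graph_times.items():
    print(f"{name}: {1.0 / t:.3f} graphs per second")
print(f"combined: {total:.4f} graphs per second")
```

The result agrees with the logged 0.5377 graphs per second, so adding a second device of similar speed roughly doubles the rate, as you’d expect.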
Running Multiple GPUs
You may have multiple GPUs in your machine (in a dedicated mining rig, for instance). In this case, you can configure the cuda plugin to run as many of them at a time as you desire.
To do this, modify the cuda plugin configuration section of grin-miner.toml to look roughly like the following (depending on how many devices you want to use):
#Parameters can be set per device, as below. In sync mode
#device 0 is currently the only device used. In async mode
#device 0 is used by default, and all other devices are
#disabled unless explicitly enabled by setting the 'USE_DEVICE'
#param to 1 on each device, as demonstrated below.
[[mining.miner_plugin_config]]
type_filter = "cuda"
[mining.miner_plugin_config.device_parameters.0]
USE_DEVICE = 1
# Below are advanced optional per-device tweakable params
#GENU_BLOCKS = 256
#GENU_TPB = 8
#GENV_STAGE1_TPB = 32
#GENV_STAGE2_TPB = 128
#TRIM_STAGE1_TPB = 32
#TRIM_STAGE2_TPB = 96
#RENAME_0_STAGE1_TPB = 32
#RENAME_0_STAGE2_TPB = 64
#RENAME_1_STAGE1_TPB = 32
#RENAME_1_STAGE2_TPB = 128
#TRIM_3_TPB = 64
#RENAME_3_TPB = 2
[mining.miner_plugin_config.device_parameters.1]
USE_DEVICE = 1
#[mining.miner_plugin_config.device_parameters.2]
#USE_DEVICE = 1
Note we’re leaving the advanced tweakable parameters commented out for now, and will just use the defaults. In my case, I’ve switched to another machine with 2 GPUs installed, and I’ve enabled another section to include the second device. If I had a third installed, I’d add another section such as [mining.miner_plugin_config.device_parameters.2] and configure parameters for it as desired.
At a minimum, each device other than 0 has to have USE_DEVICE = 1 in order to run. Whatever your system sees as device 0 will run by default, and the USE_DEVICE = 1 is optional in that case (you can set it to 0 to exclude it from mining).
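If you have a rig with a lot of cards, typing out one section per device gets tedious. As a rough illustration, the Python snippet below generates the cuda plugin section for a given number of devices, using the section and parameter names exactly as they appear in the listing above. The function name cuda_plugin_config is made up, and the output is only meant to be pasted into the config file and reviewed by hand.

```python
def cuda_plugin_config(num_devices):
    """Build a cuda plugin section enabling devices 0..num_devices-1."""
    lines = ["[[mining.miner_plugin_config]]", 'type_filter = "cuda"']
    for index in range(num_devices):
        # Device indices start at 0, so the third device is index 2.
        lines.append(f"[mining.miner_plugin_config.device_parameters.{index}]")
        lines.append("USE_DEVICE = 1")
    return "\n".join(lines) + "\n"


print(cuda_plugin_config(2), end="")
```

Calling cuda_plugin_config(2) produces the same two device sections as in my two-GPU example, minus the commented-out advanced parameters.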
Now, running grin-miner again (note that, again, I’ve switched machines for this example) on a system with an i5 CPU, a 1080 and a 1080Ti, I get output like the following:
Feb 02 12:16:25.027 DEBG Mining: Plugin 0 - Device 0 (CPU) Status: OK : Last Graph time: 4.019902621s; Graphs per second: 0.249 - Total Attempts: 2
Feb 02 12:16:25.028 DEBG Mining: Plugin 1 - Device 0 (GeForce GTX 1080 Ti) Status: OK : Last Graph time: 1.135864421s; Graphs per second: 0.880 - Total Attempts: 6
Feb 02 12:16:25.028 DEBG Mining: Plugin 1 - Device 1 (GeForce GTX 1080) Status: OK : Last Graph time: 1.789290858s; Graphs per second: 0.559 - Total Attempts: 4
Feb 02 12:16:25.029 INFO Mining at 1.6880296360141922 graphs per second
For a blazing average of 1.7 graphs per second! As grin-miner is currently configured, I should hopefully solve a block in about 30 seconds mining solo at difficulty 1.
Advanced Parameters
In the case of this particular GPU plugin, each device can be configured with a set of tweakable parameters. The exact meaning of these parameters is difficult to explain without getting into very low-level details about how the current GPU miner works, but it’s still possible to try modifying these values and see what effects they have on graph times. For instance, I’m going to attempt to change GENV_STAGE1_TPB to 128 on the 1080Ti and see what happens:
#GENU_BLOCKS = 256
#GENU_TPB = 8
GENV_STAGE1_TPB = 128
#GENV_STAGE2_TPB = 128
#TRIM_STAGE1_TPB = 32
#TRIM_STAGE2_TPB = 96
#RENAME_0_STAGE1_TPB = 32
#RENAME_0_STAGE2_TPB = 64
#RENAME_1_STAGE1_TPB = 32
#RENAME_1_STAGE2_TPB = 128
#TRIM_3_TPB = 64
#RENAME_3_TPB = 2
After uncommenting the appropriate line and changing the value, I get:
Feb 02 12:23:55.072 DEBG Mining: Plugin 0 - Device 0 (CPU) Status: OK : Last Graph time: 4.125390899s; Graphs per second: 0.242 - Total Attempts: 4
Feb 02 12:23:55.072 DEBG Mining: Plugin 1 - Device 0 (GeForce GTX 1080 Ti) Status: OK : Last Graph time: 1.250153236s; Graphs per second: 0.800 - Total Attempts: 13
Feb 02 12:23:55.072 DEBG Mining: Plugin 1 - Device 1 (GeForce GTX 1080) Status: OK : Last Graph time: 1.868723816s; Graphs per second: 0.535 - Total Attempts: 8
Feb 02 12:23:55.073 INFO Mining at 1.5774277673945534 graphs per second
Looks like it made things about a tenth of a second slower. Now I’ll try tweaking something else:
TRIM_STAGE2_TPB = 256
And I get:
Feb 02 12:27:51.839 DEBG (Server ID: Port 13414) Mining at Cuckoo30 for at most 90 secs at height 3 and difficulty 1.
Device 0 GPUassert: unspecified launch failure /home/mcordner/.cargo/git/checkouts/cuckoo-miner-4752934f0f1f2bfe/7608c1a/src/cuckoo_sys/plugins/cuckoo/src/mean_miner.cu 948
Device 1 GPUassert: unspecified launch failure /home/mcordner/.cargo/git/checkouts/cuckoo-miner-4752934f0f1f2bfe/7608c1a/src/cuckoo_sys/plugins/cuckoo/src/mean_miner.cu 928
Feb 02 12:27:54.054 DEBG Mining: Plugin 0 - Device 0 (CPU) Status: OK : Last Graph time: 0s; Graphs per second: inf - Total Attempts: 0
Feb 02 12:27:54.055 DEBG Mining: Plugin 1 - Device 0 (GeForce GTX 1080 Ti) Status: ERRORED : Last Graph time: 1.377844771s; Graphs per second: 0.726 - Total Attempts: 1
Feb 02 12:27:54.055 DEBG Mining: Plugin 1 - Device 1 (GeForce GTX 1080) Status: ERRORED : Last Graph time: 1.444037082s; Graphs per second: 0.693 - Total Attempts: 1
Feb 02 12:27:54.056 INFO Mining at 1.4182741537420638 graphs per second
Oops… I’ve likely exceeded a memory limit and caused everything to fail. (Note that the status of the GPU devices is ERRORED, which is a bad thing.) In this case, I set the values back and try something else.
Parameters and their meanings will change depending on the plugin implementation, and when new implementations come out they will most likely have completely different sets of tuning parameters.
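One thing worth noticing in the failed run above is that the errored devices still report a graphs-per-second figure and the INFO total still looks healthy, so the total alone won’t tell you something broke. When experimenting with parameters, it may be worth scanning the log for ERRORED statuses. Here’s a small Python sketch that does that, based only on the log format shown in this guide (the function name errored_devices is made up):

```python
import re

# Matches per-device status lines whose status is ERRORED, in the log
# format shown in this guide.
ERRORED_RE = re.compile(r"Plugin (\d+) - Device (\d+) \((.+?)\) Status: ERRORED")


def errored_devices(lines):
    """Return sorted unique (plugin, device, name) tuples reported as ERRORED."""
    hits = set()
    for line in lines:
        m = ERRORED_RE.search(line)
        if m:
            hits.add((int(m[1]), int(m[2]), m[3]))
    return sorted(hits)


SAMPLE = [
    "Feb 02 12:27:54.054 DEBG Mining: Plugin 0 - Device 0 (CPU) Status: OK : "
    "Last Graph time: 0s; Graphs per second: inf - Total Attempts: 0",
    "Feb 02 12:27:54.055 DEBG Mining: Plugin 1 - Device 0 (GeForce GTX 1080 Ti) "
    "Status: ERRORED : Last Graph time: 1.377844771s; Graphs per second: 0.726 "
    "- Total Attempts: 1",
    "Feb 02 12:27:54.055 DEBG Mining: Plugin 1 - Device 1 (GeForce GTX 1080) "
    "Status: ERRORED : Last Graph time: 1.444037082s; Graphs per second: 0.693 "
    "- Total Attempts: 1",
]

for plugin, device, name in errored_devices(SAMPLE):
    print(f"Plugin {plugin}, device {device} ({name}) has errored")
```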
Cuckoo cuda_30 Parameters
A full tuning guide for the current cuda_30 plugin, with explanations for each parameter that can be used in grin-miner.toml, can be found here. Note that the names of the parameters in the guide are slightly different from the names in grin-miner.toml; the mappings are:
N_TRIMS -> -m trims
N_BLOCKS -> -b blocks
GENU_BLOCKS -> -U blocks
GENU_TPB -> -u threads
GENV_STAGE1_TPB -> -V threads
GENV_STAGE2_TPB -> -v threads
TRIM_STAGE1_TPB -> -T threads
TRIM_STAGE2_TPB -> -t threads
RENAME_0_STAGE1_TPB -> -X threads
RENAME_0_STAGE2_TPB -> -x threads
RENAME_1_STAGE1_TPB -> -Y threads
RENAME_1_STAGE2_TPB -> -y threads
TRIM_3_TPB -> -Z threads
RENAME_3_TPB -> -z threads
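If you find yourself going back and forth between the tuning guide and your config file, the mapping above can be captured in a small lookup table. The Python sketch below does this, using the parameter spelling from the configuration listing earlier in this guide (e.g. GENV_STAGE1_TPB). The flags come straight from the mapping above; how they are actually passed to the standalone solver is described in the tuning guide and isn’t verified here, and the function name to_solver_flags is my own.

```python
# Configuration parameter name -> solver flag, per the mapping above.
# Names follow the spelling used in the configuration listing earlier
# in this guide (e.g. GENV_STAGE1_TPB).
PARAM_TO_FLAG = {
    "N_TRIMS": "-m",
    "N_BLOCKS": "-b",
    "GENU_BLOCKS": "-U",
    "GENU_TPB": "-u",
    "GENV_STAGE1_TPB": "-V",
    "GENV_STAGE2_TPB": "-v",
    "TRIM_STAGE1_TPB": "-T",
    "TRIM_STAGE2_TPB": "-t",
    "RENAME_0_STAGE1_TPB": "-X",
    "RENAME_0_STAGE2_TPB": "-x",
    "RENAME_1_STAGE1_TPB": "-Y",
    "RENAME_1_STAGE2_TPB": "-y",
    "TRIM_3_TPB": "-Z",
    "RENAME_3_TPB": "-z",
}


def to_solver_flags(params):
    """Translate {PARAM: value} into a flat list of flag/value strings."""
    flags = []
    for name, value in params.items():
        flags += [PARAM_TO_FLAG[name], str(value)]
    return flags


print(" ".join(to_solver_flags({"GENV_STAGE1_TPB": 128, "TRIM_STAGE2_TPB": 96})))
```

An unknown parameter name raises a KeyError, which is a cheap way to catch a misspelled parameter before it goes into the config file.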
Collecting stats
So, if you’ve got this far, hopefully you’ve been able to mine on at least your CPU and ideally on one or more CUDA devices. Now please help us out by posting your findings below! Collecting metrics for all of the varying hardware specs will provide invaluable feedback for development and for the entire Grin community!
When posting your findings, please be sure to note as much relevant information as you can about details of your setup, for instance (not exhaustive):
- Number of GPU/CPUs you’re running
- CPU: vendor, make, clock speed (e.g. Intel i7-4790K @ 4.2 GHz)
- GPU: vendor, reference spec, RAM, overclock status (e.g. ASUS 1080Ti at stock speeds)
- GPS (graphs per second) you’re getting for each device (can be found via debug logging)
- Power usage (important!) collected from nvidia-smi dmon output (see the mining guide above for details)
- Whether you’re running a single device in sync mode or async mode
- Parameters you’re setting for each device within grin-miner.toml
- How long you’ve been able to mine without seeing errors
- Whether errors occur, and any details from the logs/backtraces you can find.
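If it helps to keep findings consistent between posts, the items above could be collected in a small structure before posting. The Python sketch below is only a suggestion: the class and field names are invented for illustration, and graphs per joule (graphs per second divided by watts) is just one possible way of relating speed to power usage. The example figures are the GPU rates from the two-GPU run earlier in this guide, where no power readings were recorded.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DeviceReport:
    """One device's entry in a findings post (field names are illustrative)."""
    device: str                            # e.g. "GeForce GTX 1080 Ti"
    graphs_per_second: float               # from the DEBG status lines
    power_watts: Optional[float] = None    # from nvidia-smi dmon, if measured
    parameters: Dict[str, int] = field(default_factory=dict)

    def graphs_per_joule(self) -> Optional[float]:
        """Graphs per second divided by watts; None without a power reading."""
        if not self.power_watts:
            return None
        return self.graphs_per_second / self.power_watts


def format_report(reports: List[DeviceReport]) -> str:
    """Render one line per device, suitable for pasting into a post."""
    lines = []
    for r in reports:
        power = f"{r.power_watts:.0f} W" if r.power_watts else "power not measured"
        params = ", ".join(f"{k}={v}" for k, v in r.parameters.items()) or "defaults"
        lines.append(f"{r.device}: {r.graphs_per_second:.3f} GPS, {power}, "
                     f"params: {params}")
    return "\n".join(lines)


# Rates taken from the two-GPU run earlier in this guide; no power
# readings were recorded there, so power_watts is left unset.
print(format_report([
    DeviceReport("GeForce GTX 1080 Ti", 0.880),
    DeviceReport("GeForce GTX 1080", 0.559),
]))
```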
And more bits of info which aren’t coming to mind right this moment but I’m sure will become evident. Once we have a decent number of stats, we’ll figure out the best way to collate them (possibly a google spreadsheet or the like). Also, if anyone from the community wants to take over stat compilation it would be much appreciated and I’ll do whatever’s needed to support them!
Very much looking forward to seeing some findings below!
Happy mining!