Hey folks
For Grincon I not only developed a C32 code, but also improved my Grin C31 code a lot for AMD cards.
The updated codes got released today, here is the link:
New GRIN-AT31 performance code for Vega (+7%) and Navi (+12%). Requires amdgpu-pro 18.50 or newer or ROCm 2.10 driver
Experimental support for Cuckatoo-32 (use --coin GRIN-AT32) on 8G AMD cards (see further notes)
Windows release postponed due to incompatibilities with the new performance codes.
v0.9.3
Extended GRIN-AT31 compatibility to older drivers (18.x +)
Improved GRIN-AT31 performance on ROCm (RX 470/480/570/580/Vega/VII)
Introduced early job cancellation for GRIN-AT31+ (improves hash on pool side, see further release notes)
Deeply reworked kernel scheduler
Fixed GRIN-AT31 kernel bugs (improving stability and fidelity)
Fixed Bug: Vega FE loading 8G instead of 16G GRIN-AT31 solver in Windows
Fixed Bug: Watchdog did not call right file in Windows
Added --disablewatchdog 1 parameter to disable the 0 sol/s / 0 g/s detection
v0.9.2
Significant performance improvement of GRIN-AT31 on 8 / 16G cards (+5% on Polaris & Vega, +10% on Navi)
Experimental support for GRIN-AT31 on Polaris, Vega and VII using AMD ROCm drivers
Added range checks to GRIN-AT31 code (improves stability)
Added function to call external watchdog scripts in case a GPU fails during mining (see release notes)
v0.9.1
Added GRIN-AT31 solver for 16G AMD cards (Better performance on Radeon Vega FE, Radeon VII and Sapphire RX 570 16G)
Updated GRIN-AT31 solver for 4G AMD cards (Better performance on Fiji based GPUs, Polaris 10 4G)
Fixed a bug causing too low pool hash on GRIN-AT31
Added experimental GRIN-AT31 support for AMD Navi (8G), AMD Fiji (4G) and AMD Hawaii (4G / 8G) GPUs
The miner uses an on-GPU cycle finder and thus has very low CPU load, so it should be farm friendly.
Also the total on-chip memory use is only 7.6G, which should allow mining even on Windows 10.
The fee is 1% for all Grin algorithms.
Sample usage:
./lolMiner --coin GRIN-AT31 --pool <poolAddr>:<portNumber> --user <account or mail> --pass <password, only if needed>
Overclock hint:
The siphash rounds are more compute intensive than e.g. Ethash. Thus I would recommend not lowering the clock too much. Also, it turned out that Ethash settings often have too low a voltage for Cuckatoo, ending up in 0 g/s after a while. Best is to start with clocks slightly lower than stock (e.g. 10%) and then undervolt the card slowly as long as it remains stable.
Side note for Navi owners: I would like to support it soon, but currently the code suffers from a compiler bug causing low fidelity, therefore support is disabled until I find a fix for this. Stay tuned!
Would you mind testing your OpenCL miner with the ROCm Navi driver?
You can find the driver download here.
The Navi stability issue is not only the compiler; the kernel driver also messes up the wave ID, which causes kernel hangs. But if you use the node as u32 for all the processing, it should work well.
Overall I have not tested ROCm with this code yet, but it's on my list to be added soon. I was concentrating on the PAL drivers first, because those are more commonly used.
@Wolfshrike: Not at home, just with one remote rig (because at home energy is too expensive). The remote one is 2x V56, 1x V64 clocked 1380 / 1000 @ 0.912V (the V64 is this) using approx. 170 / 180W (56 / 64) and producing 3.2 g/s in total. At 10ct energy cost and current Grin nethash and price, that's a profit of 0.75 ct / day - more than what ETH does.
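As a rough sanity check on the rig figures above, the total draw and efficiency can be worked out in a few lines. This is only a sketch: it assumes the 10ct figure is a price per kWh and that the quoted wattages are per card, and the function and constant names are made up for illustration.

```python
# Figures quoted in the post above; the interpretation of them is an assumption.
CARD_WATTS = [170, 170, 180]   # 2x Vega 56, 1x Vega 64 (assumed per-card draw)
TOTAL_GPS = 3.2                # combined graphs per second of the rig
PRICE_CT_PER_KWH = 10          # assumed to mean 10 ct per kWh


def daily_energy_cost_ct(watts, price_ct_per_kwh):
    """Electricity cost in ct for running the cards for 24 hours."""
    kwh_per_day = sum(watts) / 1000 * 24
    return kwh_per_day * price_ct_per_kwh


def watts_per_gps(watts, gps):
    """Power needed per graph per second, a simple efficiency figure."""
    return sum(watts) / gps


if __name__ == "__main__":
    print(f"total draw: {sum(CARD_WATTS)} W")
    print(f"energy cost: {daily_energy_cost_ct(CARD_WATTS, PRICE_CT_PER_KWH):.1f} ct/day")
    print(f"efficiency: {watts_per_gps(CARD_WATTS, TOTAL_GPS):.1f} W per g/s")
```

Revenue is deliberately left out, since it depends on nethash and price at the time.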
Added GRIN-AT31 solver for 16G AMD cards (Better performance on Radeon Vega FE, Radeon VII and Sapphire RX 570 16G)
Updated GRIN-AT31 solver for 4G AMD cards (Better performance on Fiji based GPUs, Polaris 10 4G)
Fixed a bug causing too low pool hash on GRIN-AT31
Added experimental GRIN-AT31 support for AMD Navi (8G), AMD Fiji (4G) and AMD Hawaii (4G / 8G) GPUs
Because of the 3rd item I recommend that all users of 0.9 update to 0.9.1. In almost all cases it should show better pool hash, although the one in the dashboard may go down slightly. The reason is that some cycles that were previously filtered out as potentially stale now pass the test, so the GPU works a bit more to recover the nonces for submission. Either way, the new version gives a better mining reward.
Small release note:
The miner requires the amdgpu-pro 19.30 driver or newer on Linux and AMD Adrenalin 19.10.1 WHQL or newer on Windows.
Vega VII @ 1.7 g/s is a good improvement for an OpenCL implementation.
My hobby miner uses the ROCm HCC stack and gets 1.8+ g/s on the Vega VII, but I use GDS for counting. Would you like to try GDS counting in OpenCL? It will be faster, but needs embedded asm in the code.
I still wonder why you do not publish your miner …
Well, GDS would indeed help. The problem is that on Linux with amdgpu-pro one is limited to 4k of GDS, and on Windows it sometimes does not work at all with Adrenalin. If I rebuilt it all based on Vulkan that might work, but it's hard work. I guess that optimization will be limited to ROCm for a while.
OK, if GDS won’t work in lolMiner, there is another place you can optimize.
Tromp’s original GPU cycle finder can actually be 2x faster if you don’t use the 13 MSBs for tagging.
I thought you had already figured out that when you run the GPU cycle finder, the edge count won’t drop much per round, because the tag passing increases the number of “tag carrying edges” a lot.
In my algo I have a different implementation in which the edges still reduce as normal while passing tags.
Well, I do not see how other tags should help there … (I used different tags before I developed my cuckarood miner that is the basis for the code John published). But anyway: that's in a phase of the algorithm that does not have much weight, and even with a slower decreasing edge count one uses less time for cycle finding (21 rounds) than for doing 42 normal ones.
The one I mentioned above is just for this 21-round dual-direction tagging. In your and Tromp’s original algo these 21 tagging rounds cost around 40ms, but that can be lowered to 20ms on a Vega VII. It’s not much for overall performance though; it only increases perf by a few percent.
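To put the "few percent" in perspective, here is a back-of-the-envelope sketch. It assumes a card at the 1.8 g/s mentioned earlier in the thread, that the 21 tagging rounds drop from 40ms to 20ms per graph, and that nothing else in the pipeline changes. The function name is illustrative only.

```python
def gps_after_saving(gps_before, saved_ms_per_graph):
    """Graph rate after shaving a fixed amount of time off every graph."""
    ms_per_graph = 1000.0 / gps_before
    return 1000.0 / (ms_per_graph - saved_ms_per_graph)


if __name__ == "__main__":
    before = 1.8            # g/s, figure quoted in the thread
    saved = 40.0 - 20.0     # ms per graph saved in the tagging rounds
    after = gps_after_saving(before, saved)
    gain = (after / before - 1) * 100
    print(f"{before:.2f} g/s -> {after:.2f} g/s (+{gain:.1f}%)")
```

Under these assumptions the gain stays a bit under 4%, which fits the "few percent" estimate.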
Released 0.9.2 update today - Improving the performance further for C31 on 8 and 16G
Also above you see some examples for energy efficient clocks - with the right settings it is possible to get the Vega GPUs (even V56) above 1.0 g/s at <130W - finally matching the Nvidia energy efficiency.
It overall improves compatibility with older drivers (except Navi - that still needs new ones) and ROCm 2.10, which is now fully supported. Also the miner should run much more stable now.
The biggest feature for GRIN-AT31 is that the miner is now able to cancel its current calculations when a new job arrives. Here is an example of how this helps improve your hash on the pool:
Assume you have a card running 1 g/s on GRIN-AT31. Then in 2 minutes (120 seconds) it will be able to process 120 graphs. Since Grin has a 1 minute block time, on average two of these graphs are already obsolete by the time they finish.
lolMiner 0.9.2 will display the 1 g/s, because of 120 processed graphs per 120 seconds. But two of the graphs are then filtered out in the stratum module, thus the pool can see at most 118 / 120 = 0.983 g/s (minus about 1 graph fee).
lolMiner 0.9.3 will cancel the running work before it gets completed. On average you will be able to process one more graph than 0.9.2, assuming both are canceled about half way through. Thus lolMiner 0.9.3 starts 121 graphs and completes 119 of them. The displayed hash rate is now 119 graphs / 120 seconds = 0.991 g/s. This equals what the pool could see (minus one for fee), because the 119 completed are the ones that were not canceled early.
Conclusion: lolMiner 0.9.3 DISPLAYED hash rate may be a little lower than 0.9.2, but what arrives at the pool is better. The slower the cards are, the more drastic this effect is, e.g. a 580 8G running at 0.65 g/s will benefit almost 2% on pool side, a VII with 1.75 g/s only half a percent.
Overall the displayed hash is more honest now about what you really will get. On top of that the miner has - given a good enough pool connection - less than 0.5% stale shares (the test systems even run at <0.2%). So it comes very close to "what you see is what you get".
This feature will get even more beneficial for the upcoming support of C32. Stay tuned.
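The worked example for early job cancellation can be written down as a small sketch. It is a simplified model, not the miner's actual accounting: it assumes a new job arrives exactly once per block time, that without cancellation each job change leaves one finished but stale graph, that with cancellation half a graph of work is lost per job change, and it ignores the fee. Function and parameter names are made up for illustration.

```python
def pool_rate_without_cancel(gps, window_s=120.0, block_time_s=60.0):
    """0.9.2-style: each job change leaves one finished but stale graph."""
    processed = gps * window_s
    job_changes = window_s / block_time_s
    return (processed - job_changes) / window_s


def pool_rate_with_cancel(gps, window_s=120.0, block_time_s=60.0):
    """0.9.3-style: work is canceled, on average, half way through a graph."""
    job_changes = window_s / block_time_s
    completed = gps * window_s - 0.5 * job_changes
    return completed / window_s


if __name__ == "__main__":
    gps = 1.0  # card speed from the example above
    old = pool_rate_without_cancel(gps)
    new = pool_rate_with_cancel(gps)
    print(f"0.9.2: displayed {gps:.3f} g/s, at pool {old:.3f} g/s")
    print(f"0.9.3: displayed {new:.3f} g/s, at pool {new:.3f} g/s")
    print(f"pool-side gain: {(new / old - 1) * 100:.2f}%")
```

For 1 g/s this reproduces the 118 / 120 and 119 / 120 figures. Under this model the relative gain grows as the card gets slower, since the lost graphs are a larger share of the total.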
Sadly for Linux only at the moment, because I hit some driver limitation on Windows. Hope I will have that resolved soon.
So what's new: much better C31 performance on Vega (+7%) and Navi (+12%) cards - on ROCm even a bit more.
But that's not all: C32 is supported for the first time (well almost… 0.9.3 had this as an easter egg ^^). Right now it's not competitive with C31, but if you feel lucky you can try it out. More tuning for this will come.
Note that not all pools support C32 on their C31 port yet. Those running the reference pool software as well as the solo nodes do, of course. Have fun testing it.
Thank you for your work
Just got a new Radeon VII and it constantly crashes for me (C31, Windows 10 and latest Adrenalin; a regular machine, not a mining rig, with background tasks, web browsing, YouTube etc.)
The maximum I get is 1-2 hours of mining, after which it freezes and reboots.
Neither auto undervolt nor manual undervolt helps.
Are there any specific settings or tips which can help with stable mining?
That's interesting. I must say I mostly do not use Windows and its automatic tuning, especially with Linux currently giving a better hash rate. My Navi card has now been mining 9 days in a row without any incident.
Wrt settings: my favorites currently are 1460 / 800 / 0.806v @ 1.6 g/s approx (Linux, 0.9.4)
I use my regular rig as a supplemental room heater and no crashes for me. Although you need to configure the watchdog script because it can stall out every now and then. I can even watch youtube videos while it mines in the background.
Windows 10 1903, Adrenalin 19.10.2, Radeon VII. 1.7 g/s 1750 / 1000 / 0.950v