Request for funding Cuckatoo reference miner

The proposal is to rewrite this in Rust:

GitHub - NicolasFlamel1/Cuckatoo-Reference-Miner: Cuckatoo miner that supports cuckatoo10 to cuckatoo32 and implements mean, slean, and lean edge trimming on the GPU using OpenCL and Metal.

Here is the dev plan.

Cuckatoo → Rust: Progressive Milestones (easy ➜ hard)

Milestone 1 — Workspace & CPU “lean” baseline

Objective: Stand up a Rust workspace + a correct CPU reference for lean trimming and cycle verification for small EDGE_BITS (e.g., 12–16).
Why first: Gives you a truth model to compare GPU and other modes against.

Scope

· Create cargo workspace:

o cuckatoo-core (algorithms & data types)

o cuckatoo-miner (CLI runner; no networking yet)

· Implement:

o Header→edge generation (SipHash-2-4); see the SipHash sketch after this list.

o Lean edge bitmap & node degree bitmap; one trim round; full 42-round trim loop (a single-round sketch appears at the end of this milestone).

o Cycle verifier (42-cycle check).

· CLI (parity with README wording where sensible):
--edge-bits, --mode lean, --tuning (offline), print Searching time vs Trimming time like the C++ tool.
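
To make the header→edge bullet concrete, here is a minimal CPU sketch of SipHash-2-4 and the edge→node mapping, assuming grin-style siphash keys (four u64 words derived from a blake2b hash of the header); the canonical Rust version to copy from is grin's `core/src/pow/siphash.rs`:

```rust
/// SipHash-2-4 keyed by four 64-bit words, as used by cuckatoo
/// (keys derived from a blake2b hash of the block header).
pub struct SipKeys {
    pub k0: u64,
    pub k1: u64,
    pub k2: u64,
    pub k3: u64,
}

#[inline]
fn sip_round(v: &mut [u64; 4]) {
    v[0] = v[0].wrapping_add(v[1]);
    v[2] = v[2].wrapping_add(v[3]);
    v[1] = v[1].rotate_left(13);
    v[3] = v[3].rotate_left(16);
    v[1] ^= v[0];
    v[3] ^= v[2];
    v[0] = v[0].rotate_left(32);
    v[2] = v[2].wrapping_add(v[1]);
    v[0] = v[0].wrapping_add(v[3]);
    v[1] = v[1].rotate_left(17);
    v[3] = v[3].rotate_left(21);
    v[1] ^= v[2];
    v[3] ^= v[0];
    v[2] = v[2].rotate_left(32);
}

/// SipHash-2-4 of a single 64-bit nonce.
pub fn siphash24(keys: &SipKeys, nonce: u64) -> u64 {
    let mut v = [keys.k0, keys.k1, keys.k2, keys.k3 ^ nonce];
    sip_round(&mut v);
    sip_round(&mut v);
    v[0] ^= nonce;
    v[2] ^= 0xff;
    for _ in 0..4 {
        sip_round(&mut v);
    }
    v[0] ^ v[1] ^ v[2] ^ v[3]
}

/// Endpoint of `edge` on side `uorv` (0 = U, 1 = V) for a graph with
/// 2^edge_bits edges and 2^edge_bits nodes per side.
pub fn sip_node(keys: &SipKeys, edge: u64, uorv: u64, edge_bits: u32) -> u64 {
    let edge_mask = (1u64 << edge_bits) - 1;
    siphash24(keys, 2 * edge + uorv) & edge_mask
}
```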

Deliverables

· Passing unit tests on hashing, 1-round trimming behavior, and cycle verification on toy graphs.

· cargo run -- --tuning --edge-bits 12 --mode lean executes and prints timings.

Acceptance

· For fixed seeds at EDGE_BITS ≤ 16: CPU-lean produces stable survivors and can locate at least one valid 42-cycle on toy inputs.

Risks/Mitigation

· Tiny graphs may rarely contain a 42-cycle → include synthetic fixtures for the cycle checker.
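
Closing out this milestone, here is a single unpacked lean trimming round for reference, reusing `SipKeys`/`sip_node` from the sketch above. A production version packs the alive set and degree counters into bit-level bitmaps and alternates the trimmed side each round; this version uses plain `Vec<bool>` for clarity only:

```rust
/// One lean trimming round on side `uorv` (0 = U, 1 = V): every edge whose
/// endpoint on this side has degree < 2 is removed from the alive set.
pub fn lean_trim_round(keys: &SipKeys, alive: &mut [bool], uorv: u64, edge_bits: u32) {
    let n_nodes = 1usize << edge_bits;
    let mut seen_once = vec![false; n_nodes];
    let mut seen_twice = vec![false; n_nodes];

    // Pass 1: count node degrees (capped at 2) over the surviving edges.
    for edge in 0..alive.len() {
        if alive[edge] {
            let node = sip_node(keys, edge as u64, uorv, edge_bits) as usize;
            if seen_once[node] {
                seen_twice[node] = true;
            } else {
                seen_once[node] = true;
            }
        }
    }

    // Pass 2: kill edges whose endpoint on this side was seen only once.
    for edge in 0..alive.len() {
        if alive[edge] {
            let node = sip_node(keys, edge as u64, uorv, edge_bits) as usize;
            if !seen_twice[node] {
                alive[edge] = false;
            }
        }
    }
}
```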


Milestone 2 — CPU “mean” buckets + tuned memory

Objective: CPU mean trimmer with bucketed edges (locality speedup) and knobs mirroring the C++ miner.
Scope

· Implement mean trimmer (bucket sorting; contiguous access); a bucketing sketch follows this list.

· Expose build/CLI knobs used by the C++ miner’s tuning section:

o TRIMMING_ROUNDS, LOCAL_RAM_KILOBYTES (used as a logical sizing hint), SLEAN_TRIMMING_PARTS as a placeholder.

· CPU micro-benchmarks (Criterion) for lean vs mean at small EDGE_BITS.
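
A simplified illustration of the bucketing idea, reusing `SipKeys`/`sip_node` from the Milestone 1 sketch; the bucket layout and in-bucket trimming here are illustrative, not the C++ miner's exact scheme:

```rust
/// Distribute surviving edges into buckets keyed by the top bits of their
/// U endpoint, so that per-bucket degree counting touches only a small,
/// cache-resident counter array.
pub fn bucket_edges(
    keys: &SipKeys,
    edges: &[u64],
    edge_bits: u32,
    bucket_bits: u32,
) -> Vec<Vec<(u64, u32)>> {
    let num_buckets = 1usize << bucket_bits;
    let shift = edge_bits - bucket_bits;
    let mut buckets: Vec<Vec<(u64, u32)>> = vec![Vec::new(); num_buckets];
    for &edge in edges {
        let u = sip_node(keys, edge, 0, edge_bits);
        // Store (edge index, node bits local to this bucket).
        buckets[(u >> shift) as usize].push((edge, (u & ((1u64 << shift) - 1)) as u32));
    }
    buckets
}

/// Trim within one bucket: an edge survives only if its local node has degree >= 2.
pub fn trim_bucket(bucket: &[(u64, u32)], shift: u32) -> Vec<u64> {
    let mut degree = vec![0u8; 1usize << shift];
    for &(_, local) in bucket {
        degree[local as usize] = degree[local as usize].saturating_add(1);
    }
    bucket
        .iter()
        .filter(|&&(_, local)| degree[local as usize] >= 2)
        .map(|&(edge, _)| edge)
        .collect()
}
```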

Acceptance

· Mean trimmer ≥2× faster than lean on CPU for EDGE_BITS 16–20 in benchmarks.

· CLI prints timings in the same “Searching/Trimming” vocabulary.


Milestone 3 — CPU “slean” (semi-lean) multi-pass

Objective: Implement slean (K parts) to simulate lower VRAM needs.
Scope

· Add K-pass slean with SLEAN_TRIMMING_PARTS (power-of-two) and validate trade-offs: lower memory vs. more passes over the edges.

· Golden tests (sketched below): survivors after slean(K=1) == mean; increasing K maintains correctness.
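
A sketch of the golden test, assuming hypothetical `mean_trim` and `slean_trim` entry points that both return the sorted list of surviving edge indices after the configured number of rounds:

```rust
#[cfg(test)]
mod golden {
    use super::*;

    // `mean_trim` and `slean_trim` are hypothetical entry points; both are
    // expected to return the sorted surviving edge indices.
    #[test]
    fn slean_matches_mean() {
        let keys = SipKeys { k0: 1, k1: 2, k2: 3, k3: 4 };
        let edge_bits = 16;
        let rounds = 20;

        let mean = mean_trim(&keys, edge_bits, rounds);
        for parts in [1usize, 2, 4, 8] {
            let slean = slean_trim(&keys, edge_bits, rounds, parts);
            assert_eq!(slean, mean, "slean with {parts} parts diverged from mean");
        }
    }
}
```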

Acceptance

· slean results match mean for test vectors; performance/memory trade-off charted in docs.


Milestone 4 — CLI parity & multi-instance ergonomics

Objective: Mirror the C++ usage shape so users won’t be surprised.
Scope

· Flags matching README behavior/semantics (a CLI sketch follows this list):

o --display_gpus (stub prints “CPU only” for now)

o --mean_trimming/--slean_trimming/--lean_trimming allow-list

o --total_number_of_instances, --instance (CPU core partitioning)

· Config file (TOML) + env overrides.
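
A possible flag layout using clap 4 with the derive feature; the long option names mirror the README, but value types and defaults are placeholders until checked against the original:

```rust
use clap::Parser;

/// Hypothetical flag layout mirroring the original miner's long option names;
/// value types and defaults are placeholders.
#[derive(Parser, Debug)]
#[command(name = "cuckatoo-miner")]
pub struct Options {
    /// List detected devices and exit (CPU-only stub until Milestone 6).
    #[arg(long = "display_gpus")]
    pub display_gpus: bool,

    /// Comma-separated allow-lists restricting which devices may use each mode
    /// (placeholder semantics; to be matched against the original README).
    #[arg(long = "mean_trimming")]
    pub mean_trimming: Option<String>,
    #[arg(long = "slean_trimming")]
    pub slean_trimming: Option<String>,
    #[arg(long = "lean_trimming")]
    pub lean_trimming: Option<String>,

    /// Total number of miner processes running on this machine.
    #[arg(long = "total_number_of_instances", default_value_t = 1)]
    pub total_number_of_instances: usize,

    /// Index of this process among the instances (1-based here as a placeholder),
    /// used for CPU core partitioning.
    #[arg(long = "instance", default_value_t = 1)]
    pub instance: usize,
}

fn main() {
    let opts = Options::parse();
    println!("{opts:?}");
}
```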

Acceptance

· Running multiple processes with --total_number_of_instances/--instance shows non-overlapping CPU core affinity (documented benefit).


Milestone 5 — Stratum V1 minimal client

Objective: Talk to pools; mine with CPU.
Scope

· Async Stratum client (Tokio + Serde JSON): subscribe/authorize/notify/submit; a minimal framing sketch follows this list.

· CLI matches README’s pool style: -a host:port -u user[.rig] and the verbose long flags (--stratum_server_*). Pools cited in README: 2Miners, WoolyPooly, MWC Pool, Pacific Pool.

· Job switch + cancellation (short nonce batches).
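
A minimal framing and connection sketch, assuming Tokio, serde, and serde_json. Stratum method names ("login"/"subscribe", "submit", job notifications) differ per pool, so they are treated here as configuration rather than hardcoded protocol:

```rust
use serde::{Deserialize, Serialize};
use serde_json::Value;
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::net::TcpStream;

/// Generic newline-delimited JSON-RPC frame used by Stratum V1 pools.
#[derive(Serialize, Deserialize, Debug, Default)]
pub struct RpcMessage {
    #[serde(skip_serializing_if = "Option::is_none")]
    pub id: Option<u64>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub method: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub params: Option<Value>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub result: Option<Value>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub error: Option<Value>,
}

pub async fn run_stratum(host: &str, login: &str) -> std::io::Result<()> {
    let stream = TcpStream::connect(host).await?;
    let (read_half, mut write_half) = stream.into_split();
    let mut lines = BufReader::new(read_half).lines();

    // Send the initial login/subscribe request (method name is pool-specific).
    let login_msg = RpcMessage {
        id: Some(1),
        method: Some("login".to_string()),
        params: Some(serde_json::json!({ "login": login })),
        ..Default::default()
    };
    let mut payload = serde_json::to_string(&login_msg).unwrap();
    payload.push('\n');
    write_half.write_all(payload.as_bytes()).await?;

    // Read newline-delimited responses and job notifications.
    while let Some(line) = lines.next_line().await? {
        let msg: RpcMessage = serde_json::from_str(&line).unwrap_or_default();
        // Dispatch on msg.method / msg.id here: new job -> restart nonce batches,
        // submit result -> count accepted/rejected shares.
        println!("pool says: {msg:?}");
    }
    Ok(())
}
```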

Acceptance

· Connects to a test pool (or mock), receives jobs, submits valid shares for small EDGE_BITS.

Notes

· 2Miners/WoolyPooly endpoints & account format come from the README & the pool pages (GitHub, 2Miners, WoolyPooly).


Milestone 6 — GPU host harness (wgpu) + SPIR-V loader

Objective: Cross-platform GPU host setup once, then reuse everywhere.
Scope

· Add cuckatoo-gpu crate:

o Initialize wgpu (Vulkan/DX12/Metal backends).

o Enumerate adapters for --display_gpus, implement --gpu N (an enumeration sketch follows this list).

· Set up the Rust-GPU toolchain to compile a tiny shader crate to SPIR-V (no-op compute) with spirv-builder; load it with wgpu and dispatch (see the rust-gpu docs at embarkstudios.github.io).
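
A minimal adapter-listing sketch for --display_gpus; exact constructor signatures shift between wgpu releases, so this assumes a recent version where `Instance` implements `Default`:

```rust
/// List available adapters across all backends (Vulkan/DX12/Metal/GL).
pub fn display_gpus() {
    let instance = wgpu::Instance::default();
    for (index, adapter) in instance
        .enumerate_adapters(wgpu::Backends::all())
        .into_iter()
        .enumerate()
    {
        let info = adapter.get_info();
        // e.g. "GPU 0: NVIDIA GeForce RTX 3060 (DiscreteGpu, Vulkan)"
        println!("GPU {index}: {} ({:?}, {:?})", info.name, info.device_type, info.backend);
    }
}
```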

Acceptance

· --display_gpus lists actual GPUs; a dummy compute pass runs on selected device.

Why wgpu

· One codepath maps to Vulkan/D3D12/Metal automatically; future-proof vs deprecated bindings. (metal-rs is deprecated; use objc2-metal if we need native Metal later.)


Milestone 7 — GPU “mean” trimming round (Rust-GPU kernel)

Objective: Port one trimming round of mean to a Rust shader; validate against CPU.
Scope

· Write a #[spirv(compute)] kernel (sketched after this list) that:

o Reads edge segments/buckets,

o Computes per-node degree (shared memory where possible),

o Marks edges for deletion.

· Host side: per-round dispatch loop; compact survivors (prefix-sum pass or mark-and-sweep buffer).

· Correctness harness: small/mid EDGE_BITS compare GPU vs CPU survivors bit-for-bit.
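
The rough shape of such a kernel with spirv-std, compiled to SPIR-V by spirv-builder on the toolchain it pins; buffer layout and names are illustrative only:

```rust
// Shader crate: no_std when compiled for the spirv target.
#![cfg_attr(target_arch = "spirv", no_std)]

use spirv_std::glam::UVec3;
use spirv_std::spirv;

/// Mark edges whose endpoint degree is below 2. Layout is illustrative:
/// `nodes[i]` is the (bucket-local) node of edge i, `degrees` was filled by a
/// previous counting pass, and `alive[i]` is 1/0 per edge.
#[spirv(compute(threads(256)))]
pub fn mark_dead_edges(
    #[spirv(global_invocation_id)] id: UVec3,
    #[spirv(storage_buffer, descriptor_set = 0, binding = 0)] nodes: &[u32],
    #[spirv(storage_buffer, descriptor_set = 0, binding = 1)] degrees: &[u32],
    #[spirv(storage_buffer, descriptor_set = 0, binding = 2)] alive: &mut [u32],
) {
    let i = id.x as usize;
    if i < nodes.len() && degrees[nodes[i] as usize] < 2 {
        alive[i] = 0;
    }
}
```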

Acceptance

· GPU mean one round == CPU mean one round (bytes equal); multi-round pipeline produces same survivors list for EDGE_BITS up to 20.


Milestone 8 — Full GPU “mean” pipeline + perf targets

Objective: Chain 42 rounds on GPU; bring back survivors for CPU cycle finding.
Scope

· Device-local buffers for edges/bitmaps; staging buffers for download.

· Batch size & workgroup size tuning; expose specialization constants or shader feature flags for power users (documented).

· Performance target: ≥5× CPU-mean speedup for EDGE_BITS ~20–24 on a mid-tier GPU.

Acceptance

· End-to-end (job→GPU trimming→CPU cycle→submit) runs faster than CPU-only; stable over hours.


Milestone 9 — GPU “slean” (K-parts) & auto-fallback

Objective: Add slean on GPU plus automatic mode selection, matching the C++ miner’s “try mean→slean→lean” behavior.
Scope

· Implement K-segment GPU pipeline (partitions processed sequentially to fit VRAM).

· At startup, probe adapter memory; choose mean→slean→lean in that order unless the user restricts modes via flags (--mean_trimming/--slean_trimming/--lean_trimming). A selection sketch follows.
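
A sketch of the fallback decision; wgpu does not report VRAM directly, so `available_bytes` would come from adapter limits or a platform-specific probe, and the per-mode memory estimates below are placeholders that really depend on EDGE_BITS, bucket layout, and SLEAN_TRIMMING_PARTS:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TrimMode {
    Mean,
    Slean,
    Lean,
}

/// Pick the fastest trimming mode that fits in the reported device memory
/// and is allowed by the user's flags.
pub fn choose_mode(available_bytes: u64, edge_bits: u32, allowed: &[TrimMode]) -> Option<TrimMode> {
    let edges = 1u64 << edge_bits;
    // Rough placeholder estimates: mean keeps full edge buckets resident,
    // slean keeps a fraction of them, lean keeps only bitmaps.
    let need = |mode: TrimMode| -> u64 {
        match mode {
            TrimMode::Mean => edges * 8,
            TrimMode::Slean => edges * 2,
            TrimMode::Lean => edges / 4,
        }
    };
    [TrimMode::Mean, TrimMode::Slean, TrimMode::Lean]
        .into_iter()
        .filter(|m| allowed.contains(m))
        .find(|&m| need(m) <= available_bytes)
}
```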

Acceptance

· On lower-VRAM GPUs, slean runs successfully with the same correctness as CPU; auto-fallback picks the fastest feasible mode and logs the choice.


Milestone 10 — Plan-B backends (OpenCL & native Metal) for parity

Objective: If rust-gpu hits a wall, provide OpenCL (Linux/Windows/Android) and Metal (macOS/iOS) backends mirroring the repo’s kernels.
Scope

· OpenCL host (ocl/opencl3) loading lean_trimming.cl / mean_trimming.cl / slean_trimming.cl from the repo.

· Native Metal host using objc2-metal to compile lean/mean/slean .metal sources.

· Common trait GpuTrimmer (sketched below) so the wgpu/rust-gpu and OpenCL/Metal backends are swappable.
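
A possible shape for that trait, using anyhow for errors and the SipKeys type from Milestone 1; the real signatures will grow buffers, streams, and richer error types:

```rust
/// Backend-agnostic trimming interface, so the wgpu/rust-gpu path and the
/// OpenCL / native Metal fallbacks are interchangeable behind one trait.
pub trait GpuTrimmer {
    /// Human-readable backend/device description for --display_gpus.
    fn describe(&self) -> String;

    /// Upload siphash keys and reset per-job state.
    fn start_job(&mut self, keys: &SipKeys, edge_bits: u32) -> anyhow::Result<()>;

    /// Run the configured number of trimming rounds and return the
    /// surviving edge indices for CPU-side cycle finding.
    fn trim(&mut self, rounds: u32) -> anyhow::Result<Vec<u64>>;
}
```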

Acceptance

· Either path (unified SPIR-V or OpenCL/Metal) passes the same correctness suite and meets performance targets.


Milestone 11 — Runtime parity flags & GPU RAM control

Objective: Match user options from README precisely for a polished UX.
Scope

· Implement:

o --gpu_ram <GB> (where supported by platform), --gpu N, --display_gpus richer info,

o Multi-process guidance: --total_number_of_instances, --instance behavior to avoid CPU contention,

o Explicit trimming allow-lists, build-time TUNING=1 mode (skip Stratum).

Acceptance

· Commands analogous to README examples behave as documented (including tuning mode & pool commands).


Milestone 12 — Cross-platform packaging (Win/macOS/Linux)

Objective: Ship binaries that “just run”.
Scope

· Windows MSVC build; Vulkan/D3D12 backend selection verified.

· macOS (Intel/Apple Silicon) universal build; Metal path tested.

· Linux (x86_64) packages; validate with proprietary & Mesa drivers.

Acceptance

· Three OSes can mine on common GPUs; --display_gpus shows proper adapters; no driver-specific crashes.


Milestone 13 — Mobile proofs (iOS/Android)

Objective: Demonstrate feasibility, not sustained mobile mining.
Scope

· iOS test app linking Rust lib; runs tiny EDGE_BITS with Metal.

· Android NDK build via cargo-ndk; simple CLI or JNI app.

· Respect README iOS/Android build patterns (Xcode/NDK invocations).

Acceptance

· Tiny tuning jobs complete on device; Stratum socket works; resource usage is controlled.


Milestone 14 — Performance Autotuner & long-haul stability

Objective: Hit production-worthy throughput and stability.
Scope

· Autotuner: balance Searching time vs Trimming time by adjusting TRIMMING_ROUNDS, workgroup sizes, and batch sizes (aim: Searching ≤ Trimming, but close); a naive adjustment step is sketched after this list.

· 24–72h soak tests against two different pools (e.g., 2Miners + mwcpool) with telemetry (accepted/rejected shares, graphs/sec).
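
A naive adjustment step for the round-count knob, with placeholder thresholds:

```rust
/// One autotuning step: if cycle searching dominates, trim more (fewer
/// survivors, cheaper search); if trimming dominates by a wide margin,
/// trim less. Thresholds are placeholders to be tuned empirically.
pub fn adjust_trimming_rounds(rounds: u32, trimming_ms: f64, searching_ms: f64) -> u32 {
    if searching_ms > trimming_ms {
        rounds + 1
    } else if searching_ms < 0.5 * trimming_ms && rounds > 1 {
        rounds - 1
    } else {
        rounds
    }
}
```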

Acceptance

· Stable over multi-day runs, >95% share acceptance, no memory growth.


Milestone 15 — Docs, examples, and release

Objective: Make it easy to use and extend.
Scope

· Comprehensive README with:

o Build matrix, GPU support table, typical VRAM per EDGE_BITS & mode, pool command examples (as in original README).

· Troubleshooting guide (driver quirks, OpenCL fallback notes, Metal/iOS caveats).

· Versioned releases (tag + artifacts).

Acceptance

Users can replicate the original commands (e.g., the 2Miners/WoolyPooly examples) with the Rust miner and get equivalent behavior.

3 Likes

A little bit of context would be welcome

1 Like

I don’t see the point in limiting EDGE_BITS. You might as well allow the full range of 4–63.

There’s nothing “full” about 42 rounds. In fact, the number of trimming rounds has little to do with the cycle length.

No need to hardcode 42. Better to make it a runtime-configurable CYCLE_LENGTH; that way you can test tiny graphs with tiny cycle lengths.

The C implementation showed a ~ 4x speed improvement, so this seems rather modest.

3 Likes
| # | Milestone | Expected days | Price (USD) |
|---|-----------|---------------|-------------|
| 1 | CPU baseline (lean) | 8 | 1000 |
| 2 | CPU mean + knobs | 8 | 1000 |
| 3 | CPU slean (K-parts) | 8 | 1000 |
| 4 | CLI parity & multi-instance | 8 | 1000 |
| 5 | Stratum V1 (CPU mining) | 8 | 1000 |
| 6 | GPU host harness (wgpu) | 8 | 1000 |
| 7 | GPU mean – one trimming round | 12 | 1500 |
| 8 | GPU mean – full pipeline | 20 | 2000 |
| 9 | GPU slean + auto-fallback | 12 | 1500 |
| 10 | Plan-B backends (OpenCL/Metal) [optional] | 20 | 2000 |
| 11 | Runtime parity & GPU RAM control | 8 | 1000 |
| 12 | Cross-platform packaging (desktop) | 8 | 1000 |
| 13 | Mobile proofs (iOS/Android) | 8 | 1000 |
| 14 | Autotuner & long-haul stability | 12 | 1500 |
| 15 | Docs & release | 8 | 1500 |
| | Totals | 156 | 17000 |
2 Likes

I support funding this; relying on lolminer and gminer to allow GPUs on the network is a weak point of Grin mining, and Nicolas has a track record of writing good code and delivering on their promises.

4 Likes

Ya for sure! That 4x speedup that @tromp mentioned is a game changer! It would pin GRIN at the top of the most profitable GPU coins to mine for months! Maybe years.

2 Likes

Maybe someone could enlighten me, but I don’t understand the need to rewrite the C++ implementation in Rust. Is there a benefit in doing so?

The benefit of having it written in Rust is that it would be more easily integrated into other Grin/Rust-based applications. Like how cool would it be if the Grim wallet came with a miner ready to go.

1 Like

We already have a miner written in rust:

Exactly. Grin miners are written in rust - it’s the standard. Thanks for proving my point.

That miner uses non-rust plugins for the actual solvers though [1].

[1] grin-miner/cuckoo-miner/src/cuckoo_sys/ffi.rs at master · mimblewimble/grin-miner · GitHub

2 Likes

I agree that it is not a must to have it written in Rust, yet it might be a worthy addition.

A Rust implementation with wide use cases (generic and configurable parameters) as well as Rust-specific advantages like a) a potential speed boost, b) memory efficiency, and c) vastly better memory safety could make it worthwhile.

The numbers mentioned in this source match my own experience of performance boosts when using Rust libraries.

Regarding the asking price for its implementation, I cannot judge how much work this would be. @tromp, are the asking prices in the range of what you would expect?

1 Like

Yes, they are, except that the docs price seems comparatively low, and that is already grouped in with release.

The value of the project depends not only on meeting the objective milestones, but on the code and documentation quality as well.

3 Likes

I know, this is my first project.

To build the relationship, I have set the price a bit low, with long-term collaboration in mind.

3 Likes

Well, if tromp says the price is fair, and assuming Thomas can actually deliver the goods, I think it would be worth deploying the funds for it.

In that case I support this request.

5 Likes

Working on Milestone 1: (Will complete on Monday)

  1. Created Rust workspace structure with two crates: cuckatoo-core (algorithms) and cuckatoo-miner (CLI)
  2. Implemented SipHash-2-4 algorithm for generating edges from block headers and nonces
  3. Built lean trimming system with 42-round edge reduction using bitmaps for efficient operations
  4. Created cycle verification algorithm to find 42-cycles in the trimmed graph using bipartite traversal
  5. Developed CLI application with --edge-bits, --mode lean, and --tuning parameters matching the original C++ tool
6 Likes

These can be copied from the existing grin code [1] [2]

You still mention 42 rounds after I pointed out above that the number of rounds has little to do with cycle length.

[1] grin/core/src/pow/siphash.rs at master · mimblewimble/grin · GitHub

[2] grin/core/src/pow/cuckatoo.rs at master · mimblewimble/grin · GitHub

3 Likes

I am wondering about the adoption of Rust as a long-term language. Has it seen much recent enthusiasm in the overall coding community? Why so much interest in Rust?

1 Like

Because Grin is written in Rust, but also I feel Rust has become even more popular now than when Grin was originally written in it. Declarative computer stuff is pretty popular. Or maybe that’s just me, but I like it a lot.

1 Like