Rewrite this in Rust:
Here is the dev plan.
Cuckatoo → Rust: Progressive Milestones (easy ➜ hard)
Milestone 1 — Workspace & CPU “lean” baseline
Objective: Stand up a Rust workspace + a correct CPU reference for lean trimming and cycle verification for small EDGE_BITS (e.g., 12–16).
Why first: Gives you a truth model to compare GPU and other modes against.
Scope
· Create cargo workspace:
o cuckatoo-core (algorithms & data types)
o cuckatoo-miner (CLI runner; no networking yet)
· Implement:
o Header→edge generation (SipHash-2-4).
o Lean edge bitmap & node degree bitmap; one trim round; full 42-round trim loop.
o Cycle verifier (42-cycle check).
· CLI (parity with README wording where sensible): --edge-bits, --mode lean, --tuning (offline); print Searching time vs Trimming time like the C++ tool.
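The lean trim loop in this scope can be sketched on an explicit edge list. This is a minimal sketch under stated assumptions: it uses plain degree counting (an edge dies when its endpoint on either side has fewer than two live edges), which may differ from the exact leaf rule in the C++ miner, and it keeps a byte per node instead of a packed bitmap. The names trim_round and trim are hypothetical.

```rust
/// One trimming round over a bipartite edge list of `(u, v)` pairs.
/// Assumption: an edge is removed when its endpoint on the current side
/// is touched by fewer than two live edges (plain degree counting).
/// Returns the number of edges removed in this round.
pub fn trim_round(edges: &[(u32, u32)], alive: &mut [bool], nodes_per_side: usize) -> usize {
    let mut removed = 0;
    for side in 0..2 {
        let endpoint = |e: &(u32, u32)| if side == 0 { e.0 as usize } else { e.1 as usize };
        // Count live edges per node on this side; saturating add avoids overflow.
        let mut degree = vec![0u8; nodes_per_side];
        for (i, e) in edges.iter().enumerate() {
            if alive[i] {
                let n = endpoint(e);
                degree[n] = degree[n].saturating_add(1);
            }
        }
        // Kill every live edge that hangs off a node of degree below two.
        for (i, e) in edges.iter().enumerate() {
            if alive[i] && degree[endpoint(e)] < 2 {
                alive[i] = false;
                removed += 1;
            }
        }
    }
    removed
}

/// Runs up to `max_rounds` rounds, stopping early once a round removes nothing.
pub fn trim(edges: &[(u32, u32)], nodes_per_side: usize, max_rounds: usize) -> Vec<bool> {
    let mut alive = vec![true; edges.len()];
    for _ in 0..max_rounds {
        if trim_round(edges, &mut alive, nodes_per_side) == 0 {
            break;
        }
    }
    alive
}
```

Calling trim(&edges, 1 << edge_bits, 42) would give the survivor mask that later milestones can compare against.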
Deliverables
· Passing unit tests on hashing, 1-round trimming behavior, and cycle verification on toy graphs.
· cargo run -- --tuning --edge-bits 12 --mode lean executes and prints timings.
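For the hashing unit tests, a header-to-edge sketch could look like the following. The SipRound is the standard one from the SipHash specification; everything else is an assumption to verify against the C++ miner before comparing outputs: the four key words being used directly as the initial state, the 2*nonce and 2*nonce+1 inputs for the two endpoints, and the function names.

```rust
/// One SipRound as defined by the SipHash specification.
fn sip_round(v: &mut [u64; 4]) {
    v[0] = v[0].wrapping_add(v[1]);
    v[1] = v[1].rotate_left(13);
    v[1] ^= v[0];
    v[0] = v[0].rotate_left(32);
    v[2] = v[2].wrapping_add(v[3]);
    v[3] = v[3].rotate_left(16);
    v[3] ^= v[2];
    v[0] = v[0].wrapping_add(v[3]);
    v[3] = v[3].rotate_left(21);
    v[3] ^= v[0];
    v[2] = v[2].wrapping_add(v[1]);
    v[1] = v[1].rotate_left(17);
    v[1] ^= v[2];
    v[2] = v[2].rotate_left(32);
}

/// SipHash-2-4 of a single 64-bit word.
/// Assumption: `keys` is used directly as the initial state (v0..v3).
pub fn siphash24_u64(keys: [u64; 4], input: u64) -> u64 {
    let mut v = keys;
    v[3] ^= input;
    sip_round(&mut v); // two compression rounds
    sip_round(&mut v);
    v[0] ^= input;
    v[2] ^= 0xff;
    for _ in 0..4 {
        sip_round(&mut v); // four finalization rounds
    }
    v[0] ^ v[1] ^ v[2] ^ v[3]
}

/// Maps an edge nonce to its two endpoints, each masked to `edge_bits` bits.
/// Assumption: endpoints are the hashes of 2*nonce and 2*nonce+1.
pub fn edge_endpoints(keys: [u64; 4], nonce: u64, edge_bits: u32) -> (u32, u32) {
    assert!((1..=32).contains(&edge_bits));
    let mask = (1u64 << edge_bits) - 1;
    let u = siphash24_u64(keys, 2 * nonce) & mask;
    let v = siphash24_u64(keys, 2 * nonce + 1) & mask;
    (u as u32, v as u32)
}
```

Known-answer vectors for these tests should be captured from the C++ tool rather than invented here.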
Acceptance
· For fixed seeds at EDGE_BITS ≤ 16: CPU-lean produces stable survivors and can locate at least one valid 42-cycle on toy inputs.
Risks/Mitigation
· Tiny graphs may rarely contain a 42-cycle → include synthetic fixtures for the cycle checker.
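A synthetic fixture for the cycle checker might be built as follows. This is a sketch only: it checks that a set of edges forms exactly one simple cycle through all of them, and it deliberately omits the parts a real verifier also needs (recomputing each edge from the header, checking nonce ordering). The names synthetic_cycle and is_single_cycle are hypothetical.

```rust
/// Builds a bipartite cycle with `2 * half_len` edges:
/// u_i is linked to v_i and to v_{(i+1) mod half_len}.
pub fn synthetic_cycle(half_len: u32) -> Vec<(u32, u32)> {
    let mut edges = Vec::with_capacity(2 * half_len as usize);
    for i in 0..half_len {
        edges.push((i, i));
        edges.push((i, (i + 1) % half_len));
    }
    edges
}

/// Returns true when `edges` form exactly one cycle that uses every edge once.
pub fn is_single_cycle(edges: &[(u32, u32)]) -> bool {
    let n = edges.len();
    if n < 4 || n % 2 != 0 {
        return false;
    }
    let mut current = 0usize;
    let mut visited = 0usize;
    let mut match_u = true; // alternate between sharing a U node and a V node
    loop {
        // Find the unique other edge sharing the endpoint on the current side.
        let mut next = None;
        for (j, e) in edges.iter().enumerate() {
            if j == current {
                continue;
            }
            let same = if match_u { e.0 == edges[current].0 } else { e.1 == edges[current].1 };
            if same {
                if next.is_some() {
                    return false; // node has more than two edges: a branch
                }
                next = Some(j);
            }
        }
        match next {
            None => return false, // dead end
            Some(j) => current = j,
        }
        visited += 1;
        match_u = !match_u;
        if current == 0 {
            break;
        }
        if visited > n {
            return false; // safety net against malformed input
        }
    }
    visited == n
}
```

With this helper, is_single_cycle(&synthetic_cycle(21)) exercises the 42-edge case without depending on a lucky seed.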
Milestone 2 — CPU “mean” buckets + tuned memory
Objective: CPU mean trimmer with bucketed edges (locality speedup) and knobs mirroring the C++ miner.
Scope
· Implement mean trimmer (bucket sorting; contiguous access).
· Expose build/CLI knobs used by the C++ miner’s tuning section:
o TRIMMING_ROUNDS, LOCAL_RAM_KILOBYTES (use as logical sizing hint), SLEAN_TRIMMING_PARTS placeholder.
· CPU micro-benchmarks (Criterion) for lean vs mean at small EDGE_BITS.
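The bucketing idea behind the mean trimmer can be illustrated with a small CPU sketch: edges are grouped by the high bits of their U node so that each bucket only needs a small, cache-friendly degree table. This is an assumption-laden illustration, not the C++ miner's layout; the function names, the Vec-of-Vec storage and the degree-below-two rule are all hypothetical choices.

```rust
/// Groups edges by the top `bucket_bits` bits of their U endpoint.
pub fn bucket_by_u(edges: &[(u32, u32)], node_bits: u32, bucket_bits: u32) -> Vec<Vec<(u32, u32)>> {
    let shift = node_bits - bucket_bits;
    let mut buckets = vec![Vec::new(); 1usize << bucket_bits];
    for &e in edges {
        buckets[(e.0 >> shift) as usize].push(e);
    }
    buckets
}

/// Trims the U side bucket by bucket, reusing one small degree table.
pub fn trim_u_side(buckets: &mut [Vec<(u32, u32)>], node_bits: u32, bucket_bits: u32) {
    let shift = node_bits - bucket_bits;
    let local_mask = (1u32 << shift) - 1;
    // The table covers only the nodes that can fall into a single bucket.
    let mut degree = vec![0u8; 1usize << shift];
    for bucket in buckets.iter_mut() {
        degree.fill(0);
        for e in bucket.iter() {
            let d = &mut degree[(e.0 & local_mask) as usize];
            *d = d.saturating_add(1);
        }
        // Keep only edges whose U node still has at least two edges.
        bucket.retain(|e| degree[(e.0 & local_mask) as usize] >= 2);
    }
}
```

A Criterion benchmark could then time this against the unbucketed lean round on the same edge list.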
Acceptance
· Mean trimmer ≥2× faster than lean on CPU for EDGE_BITS 16–20 in benchmarks.
· CLI prints timings in the same “Searching/Trimming” vocabulary.
Milestone 3 — CPU “slean” (semi-lean) multi-pass
Objective: Implement slean (K parts) to simulate lower VRAM needs.
Scope
· Add K-pass slean with SLEAN_TRIMMING_PARTS (power-of-two) and validate trade-offs: lower memory vs more rounds.
· Golden tests: survivors after slean(K=1) == mean; increasing K maintains correctness.
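The golden test above can be prototyped with a K-part pass like the one below. It is a sketch under assumptions: the node range is split into K equal slices, each slice gets its own pass over the edges with a degree table of 1/K the size, and the trimming rule is plain degree counting. The function name and signature are hypothetical.

```rust
/// Trims the U side in `parts` passes, each covering 1/parts of the node range.
/// Memory for the degree table shrinks by `parts`; edge scans grow by `parts`.
pub fn trim_u_in_parts(edges: &[(u32, u32)], alive: &mut [bool], node_bits: u32, parts: usize) {
    assert!(parts.is_power_of_two() && parts <= 1usize << node_bits);
    let part_size = (1usize << node_bits) / parts;
    let mut degree = vec![0u8; part_size];
    for part in 0..parts {
        degree.fill(0);
        let base = part * part_size;
        let in_part = |u: u32| {
            let u = u as usize;
            u >= base && u < base + part_size
        };
        // Pass 1: count live edges for nodes in this slice only.
        for (i, e) in edges.iter().enumerate() {
            if alive[i] && in_part(e.0) {
                let d = &mut degree[e.0 as usize - base];
                *d = d.saturating_add(1);
            }
        }
        // Pass 2: remove edges whose U node in this slice has degree below two.
        for (i, e) in edges.iter().enumerate() {
            if alive[i] && in_part(e.0) && degree[e.0 as usize - base] < 2 {
                alive[i] = false;
            }
        }
    }
}
```

Because each U node falls into exactly one slice, the survivor mask should be identical for every K, which is the property the golden tests assert.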
Acceptance
· slean results match mean for test vectors; performance/memory trade-off charted in docs.
Milestone 4 — CLI parity & multi-instance ergonomics
Objective: Mirror the C++ usage shape so users won’t be surprised.
Scope
· Flags matching README behavior/semantics:
o --display_gpus (stub prints “CPU only” for now)
o --mean_trimming/--slean_trimming/--lean_trimming allow-list
o --total_number_of_instances, --instance (CPU core partitioning)
· Config file (TOML) + env overrides.
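The core partitioning behind --total_number_of_instances and --instance could be computed as in this sketch. It assumes --instance is 1-based and that cores are split into contiguous, near-equal ranges; both are assumptions to check against the README, and the function name is hypothetical. Actually pinning threads to those cores needs an OS-specific API or crate, which is not shown.

```rust
use std::ops::Range;

/// Returns the contiguous range of core indices assigned to `instance`
/// (assumed 1-based) out of `total_instances`, or None for invalid input.
pub fn cores_for_instance(
    total_cores: usize,
    total_instances: usize,
    instance: usize,
) -> Option<Range<usize>> {
    if total_instances == 0 || instance == 0 || instance > total_instances {
        return None;
    }
    let idx = instance - 1;
    let base = total_cores / total_instances;
    let extra = total_cores % total_instances; // first `extra` instances get one more core
    let start = idx * base + idx.min(extra);
    let len = base + usize::from(idx < extra);
    Some(start..start + len)
}
```

With 8 cores and 3 instances this yields 0..3, 3..6 and 6..8, so no core is shared between processes.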
Acceptance
· Running multiple processes with --total_number_of_instances/--instance shows non-overlapping CPU core affinity (documented benefit).
Milestone 5 — Stratum V1 minimal client
Objective: Talk to pools; mine with CPU.
Scope
· Async Stratum client (Tokio + Serde JSON): subscribe/authorize/notify/submit.
· CLI matches README’s pool style: -a host:port -u user[.rig] and the verbose long flags (--stratum_server_*). Pools cited in README: 2Miners, WoolyPooly, MWC Pool, Pacific Pool.
· Job switch + cancellation (short nonce batches).
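Job switching with short nonce batches can be sketched with only the standard library. The idea is that the network task bumps a shared job id when a new notify arrives, and the mining loop rechecks it between batches. This is an illustrative assumption about the design, not the Stratum client itself; run_job and its parameters are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Processes nonces in batches for `job_id` and stops at the next batch
/// boundary once `current_job` no longer matches.
/// Returns the number of batches that were completed.
pub fn run_job<F: FnMut(u64)>(
    job_id: u64,
    current_job: &AtomicU64,
    batch_size: u64,
    max_batches: u64,
    mut process_nonce: F,
) -> u64 {
    let mut done = 0;
    for batch in 0..max_batches {
        // A new job from the pool makes the remaining work stale.
        if current_job.load(Ordering::Acquire) != job_id {
            break;
        }
        let start = batch * batch_size;
        for nonce in start..start + batch_size {
            process_nonce(nonce);
        }
        done += 1;
    }
    done
}
```

Smaller batches react to a job switch sooner at the cost of more frequent atomic loads, which is the trade-off to tune.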
Acceptance
· Connects to a test pool (or mock), receives jobs, submits valid shares for small EDGE_BITS.
Notes
· 2Miners/WoolyPooly endpoints & account format come from README & pool pages.
Milestone 6 — GPU host harness (wgpu) + SPIR-V loader
Objective: Cross-platform GPU host setup once, then reuse everywhere.
Scope
· Add cuckatoo-gpu crate:
o Initialize wgpu (Vulkan/DX12/Metal backends).
o Enumerate adapters for --display_gpus, implement --gpu N.
· Set up Rust-GPU toolchain to compile a tiny shader crate to SPIR-V (no-op compute) with spirv-builder; load with wgpu and dispatch.
Acceptance
· --display_gpus lists actual GPUs; a dummy compute pass runs on selected device.
Why wgpu
· One codepath maps to Vulkan/D3D12/Metal automatically; future-proof vs deprecated bindings. (metal-rs is deprecated; use objc2-metal if we need native Metal later.)
Milestone 7 — GPU “mean” trimming round (Rust-GPU kernel)
Objective: Port one trimming round of mean to a Rust shader; validate against CPU.
Scope
· Write #[spirv(compute)] kernel that:
o Reads edge segments/buckets,
o Computes per-node degree (shared memory where possible),
o Marks edges for deletion.
· Host side: per-round dispatch loop; compact survivors (prefix-sum pass or mark-and-sweep buffer).
· Correctness harness: small/mid EDGE_BITS compare GPU vs CPU survivors bit-for-bit.
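The survivor compaction step can be prototyped on the CPU before it becomes a GPU pass. The sketch below uses an exclusive prefix sum over the keep flags to give every survivor its output slot, which is the same data flow a GPU scan-and-scatter would use; here it runs sequentially. The function names are hypothetical.

```rust
/// Exclusive prefix sum over keep flags: offsets[i] is the number of kept
/// items before index i. Also returns the total number of kept items.
pub fn exclusive_scan(keep: &[bool]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(keep.len());
    let mut running = 0usize;
    for &k in keep {
        offsets.push(running);
        running += usize::from(k);
    }
    (offsets, running)
}

/// Scatters kept items to their scanned offsets, preserving order.
pub fn compact<T: Copy + Default>(items: &[T], keep: &[bool]) -> Vec<T> {
    assert_eq!(items.len(), keep.len());
    let (offsets, total) = exclusive_scan(keep);
    let mut out = vec![T::default(); total];
    for i in 0..items.len() {
        if keep[i] {
            // On a GPU each invocation would perform this write independently.
            out[offsets[i]] = items[i];
        }
    }
    out
}
```

Running the same inputs through this function and the GPU pass gives a byte-for-byte reference for the correctness harness.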
Acceptance
· GPU mean one round == CPU mean one round (bytes equal); multi-round pipeline produces same survivors list for EDGE_BITS up to 20.
Milestone 8 — Full GPU “mean” pipeline + perf targets
Objective: Chain 42 rounds on GPU; bring back survivors for CPU cycle finding.
Scope
· Device-local buffers for edges/bitmaps; staging buffers for download.
· Batch size & workgroup size tuning; expose specialization constants or shader feature flags for power users (documented).
· Performance target: ≥5× CPU-mean speedup for EDGE_BITS ~20–24 on a mid-tier GPU.
Acceptance
· End-to-end (job→GPU trimming→CPU cycle→submit) runs faster than CPU-only; stable over hours.
Milestone 9 — GPU “slean” (K-parts) & auto-fallback
Objective: Add slean on GPU + automatic mode choosing like the C++ miner’s “try mean→slean→lean” behavior.
Scope
· Implement K-segment GPU pipeline (partitions processed sequentially to fit VRAM).
· At startup, probe adapter memory; choose mean→slean→lean in that order unless user restricts via flags (--mean_trimming/--slean_trimming/--lean_trimming).
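The mean→slean→lean fallback can be expressed as a small pure function, which keeps it easy to unit test. This is a sketch: the per-mode memory requirements are passed in by the caller because the plan does not state them, and TrimMode, ModeRequirement and choose_mode are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrimMode {
    Mean,
    Slean,
    Lean,
}

/// Memory a mode needs for the current EDGE_BITS (caller-supplied estimate).
pub struct ModeRequirement {
    pub mode: TrimMode,
    pub bytes_needed: u64,
}

/// Picks the first mode in mean, slean, lean order that is on the user's
/// allow-list and fits into the memory reported for the adapter.
pub fn choose_mode(
    available_bytes: u64,
    requirements: &[ModeRequirement],
    allowed: &[TrimMode],
) -> Option<TrimMode> {
    const PREFERENCE: [TrimMode; 3] = [TrimMode::Mean, TrimMode::Slean, TrimMode::Lean];
    PREFERENCE.iter().copied().find(|mode| {
        allowed.contains(mode)
            && requirements
                .iter()
                .any(|r| r.mode == *mode && r.bytes_needed <= available_bytes)
    })
}
```

Returning None lets the caller log that no allowed mode fits instead of failing later during buffer allocation.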
Acceptance
· On lower-VRAM GPUs, slean runs successfully with the same correctness as CPU; auto-fallback picks the fastest feasible mode and logs the choice.
Milestone 10 — Plan-B backends (OpenCL & native Metal) for parity
Objective: If rust-gpu hits a wall, provide OpenCL (Linux/Windows/Android) and Metal (macOS/iOS) backends mirroring the repo’s kernels.
Scope
· OpenCL host (ocl/opencl3) loading lean_trimming.cl / mean_trimming.cl / slean_trimming.cl from the repo.
· Native Metal host using objc2-metal to compile lean/mean/slean .metal sources.
· Common trait GpuTrimmer so wgpu/rust-gpu vs OpenCL/Metal are swappable.
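One possible shape for the GpuTrimmer trait is sketched below, with a CPU stand-in so the correctness suite can run without any GPU. Only the trait name comes from the plan; its methods, the CpuReferenceTrimmer type and the survivors_match helper are hypothetical, and the stand-in uses plain degree counting as its trimming rule.

```rust
use std::collections::HashMap;

/// Backend-agnostic trimming interface (method set is an assumption).
pub trait GpuTrimmer {
    fn name(&self) -> &str;
    /// Runs up to `rounds` trimming rounds and returns the surviving edges.
    fn trim(&mut self, edges: &[(u32, u32)], rounds: u32) -> Vec<(u32, u32)>;
}

/// CPU stand-in used as the reference implementation in tests.
pub struct CpuReferenceTrimmer;

impl GpuTrimmer for CpuReferenceTrimmer {
    fn name(&self) -> &str {
        "cpu-reference"
    }

    fn trim(&mut self, edges: &[(u32, u32)], rounds: u32) -> Vec<(u32, u32)> {
        let mut live = edges.to_vec();
        for _ in 0..rounds {
            let before = live.len();
            for side in 0..2 {
                let key = |e: &(u32, u32)| if side == 0 { e.0 } else { e.1 };
                let mut degree: HashMap<u32, u32> = HashMap::new();
                for e in &live {
                    *degree.entry(key(e)).or_insert(0) += 1;
                }
                live.retain(|e| degree[&key(e)] >= 2);
            }
            if live.len() == before {
                break; // fixed point reached
            }
        }
        live
    }
}

/// Compares any backend against the reference on the same input.
pub fn survivors_match(
    backend: &mut dyn GpuTrimmer,
    reference: &mut dyn GpuTrimmer,
    edges: &[(u32, u32)],
    rounds: u32,
) -> bool {
    backend.trim(edges, rounds) == reference.trim(edges, rounds)
}
```

A wgpu, OpenCL or Metal backend would implement the same trait and be passed to survivors_match as a trait object.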
Acceptance
· Either path (unified SPIR-V or OpenCL/Metal) passes the same correctness suite and meets performance targets.
Milestone 11 — Runtime parity flags & GPU RAM control
Objective: Match user options from README precisely for a polished UX.
Scope
· Implement:
o --gpu_ram <GB> (where supported by platform), --gpu N, --display_gpus richer info,
o Multi-process guidance: --total_number_of_instances, --instance behavior to avoid CPU contention,
o Explicit trimming allow-lists, build-time TUNING=1 mode (skip Stratum).
Acceptance
· Commands analogous to README examples behave as documented (including tuning mode & pool commands).
Milestone 12 — Cross-platform packaging (Win/macOS/Linux)
Objective: Ship binaries that “just run”.
Scope
· Windows MSVC build; Vulkan/D3D12 backend selection verified.
· macOS (Intel/Apple Silicon) universal build; Metal path tested.
· Linux (x86_64) packages; validate with proprietary & Mesa drivers.
Acceptance
· Three OSes can mine on common GPUs; --display_gpus shows proper adapters; no driver-specific crashes.
Milestone 13 — Mobile proofs (iOS/Android)
Objective: Demonstrate feasibility, not sustained mobile mining.
Scope
· iOS test app linking Rust lib; runs tiny EDGE_BITS with Metal.
· Android NDK build via cargo-ndk; simple CLI or JNI app.
· Respect README iOS/Android build patterns (Xcode/NDK invocations).
Acceptance
· Tiny tuning jobs complete on device; Stratum socket works; resource usage is controlled.
Milestone 14 — Performance Autotuner & long-haul stability
Objective: Hit production-worthy throughput and stability.
Scope
· Autotuner: balance Searching time vs Trimming time by adjusting TRIMMING_ROUNDS, workgroup sizes, and batch sizes (aim: Searching ≤ Trimming, but close).
· 24–72h soak tests against two different pools (e.g., 2Miners + mwcpool) with telemetry (accepted/rejected shares, graphs/sec).
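One step of the autotuner's round adjustment might look like the sketch below. It rests on an assumption worth validating with measurements: that adding trimming rounds leaves fewer edges and so shortens searching while lengthening trimming. The slack threshold, the one-round step size and the function name are all hypothetical.

```rust
/// Suggests the next TRIMMING_ROUNDS value from one timing sample.
/// Goal from the plan: searching time at or below trimming time, but close.
/// `slack` is the tolerated gap as a fraction of trimming time (e.g. 0.2).
pub fn adjust_rounds(
    rounds: u32,
    searching_ms: f64,
    trimming_ms: f64,
    slack: f64,
    min_rounds: u32,
    max_rounds: u32,
) -> u32 {
    let next = if searching_ms > trimming_ms {
        // Searching is the bottleneck: trim harder.
        rounds.saturating_add(1)
    } else if searching_ms < trimming_ms * (1.0 - slack) {
        // Searching is far below trimming: rounds are being wasted.
        rounds.saturating_sub(1)
    } else {
        rounds
    };
    next.clamp(min_rounds, max_rounds)
}
```

Averaging timings over several graphs before each call would keep the tuner from oscillating on noisy samples.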
Acceptance
· Stable over multi-day runs, >95% share acceptance, no memory growth.
Milestone 15 — Docs, examples, and release
Objective: Make it easy to use and extend.
Scope
· Comprehensive README with:
o Build matrix, GPU support table, typical VRAM per EDGE_BITS & mode, pool command examples (as in original README).
· Troubleshooting guide (driver quirks, OpenCL fallback notes, Metal/iOS caveats).
· Versioned releases (tag + artifacts).
Acceptance
· Users can replicate original commands (e.g., 2Miners/WoolyPooly examples) in the Rust miner with equivalent behavior.