Great job getting it all working!
With my mining software, the CPU is responsible for the Searching times and the GPU is responsible for the Trimming times. Those two pipeline stages run in parallel: the CPU searches the remaining edges in the current graph for solutions while the GPU removes the edges that are not part of a cycle in the next graph.
The downside of this pipeline approach is that the first graph for a job takes longer to process, since it needs GPU + CPU time to complete. However, all remaining graphs for that job then take max(GPU, CPU) time each. So ideally you’d want to adjust the TRIMMING_ROUNDS value until your Searching (CPU) and Trimming (GPU) times are around the same value. That maximizes your graphs/second per job over a long enough time, since your graphs/second eventually converges to 1 / max(GPU, CPU).
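Here's a rough sketch of that timing model in Python, in case it helps to see the convergence. The function name and the 30 second timings are made up for illustration. They aren't taken from the miner or from your logs.

```python
def average_rate(gpu_s, cpu_s, n_graphs):
    """Average graphs/second over the first n_graphs of one job.

    Simplified model: the first graph costs GPU + CPU time, and every
    later graph costs max(GPU, CPU) because the two stages overlap.
    """
    total_s = (gpu_s + cpu_s) + (n_graphs - 1) * max(gpu_s, cpu_s)
    return n_graphs / total_s


# Illustrative timings only: balanced stages at 30 seconds each.
gpu_s, cpu_s = 30.0, 30.0
for n in (1, 2, 10, 100, 1000):
    print(n, average_rate(gpu_s, cpu_s, n))

# The limit the average rate approaches as n grows.
print("limit", 1 / max(gpu_s, cpu_s))
```

The rate starts at 1 / (GPU + CPU) for a single graph and only gets close to 1 / max(GPU, CPU) once a job lasts for many graphs.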
However, after thinking this through, I don’t think that your mining rate is fast enough to benefit from this pipeline approach. For example, assume you reduce your TRIMMING_ROUNDS value until both your Searching and Trimming times are around 53 seconds. Then the first graph for a job will take you 53 + 53 = 106 seconds to process, while each remaining graph for that job will take you 53 seconds. With Grin blocks arriving every 60 seconds on average, you’ll likely not even finish processing a single graph for the current job by the time the job for the next block is available. So you’ll probably find more non-stale solutions with TRIMMING_ROUNDS=90 than you will with TRIMMING_ROUNDS=25 on your Mac mini M4.
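To put numbers on that, here's a small sketch that counts how many graphs finish before the next job shows up. It assumes a job lasts exactly 60 seconds, which is a simplification since real block times vary around that average. The function name is just something I made up for this example.

```python
def graphs_completed(gpu_s, cpu_s, window_s):
    """Graphs fully processed within window_s seconds of a job starting.

    Simplified model: the first graph needs GPU + CPU time, and each
    later graph needs max(GPU, CPU) time.
    """
    first_s = gpu_s + cpu_s
    if window_s < first_s:
        return 0
    steady_s = max(gpu_s, cpu_s)
    return 1 + int((window_s - first_s) // steady_s)


# Balanced 53 second stages against an assumed fixed 60 second job.
print(graphs_completed(53, 53, 60))   # 0, since the first graph needs 106 s
print(graphs_completed(53, 53, 106))  # 1
```

With balanced 53 second stages, the first graph isn't done until 106 seconds in, so a 60 second job ends with nothing searched.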
A solution is a valid proof of work that you found for the block that you are currently mining. However, a solution must have a high enough difficulty to be included as a block in the Grin blockchain. This required threshold difficulty adjusts over time to keep Grin’s average block time around 60 seconds.
Also, my mining software uses at most 4 CPU cores for the edge searching algorithm. The algorithm I used doesn’t scale very well with the number of CPU cores, and I couldn’t find a way to consistently make it faster when using more than 4 of them. You’re seeing your CPU burst at regular intervals because your TRIMMING_ROUNDS value results in your Searching time being equal to 1/6 of your Trimming time, so the CPU is idle 5/6 of the time while it waits for the GPU to finish trimming the next graph.
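The idle fraction falls out of the same pipeline model. Here's a minimal sketch, where the function name and the 60 and 10 second timings are placeholders I picked so that Searching is 1/6 of Trimming. They aren't your measured values.

```python
def cpu_idle_fraction(gpu_s, cpu_s):
    """Share of each steady-state pipeline cycle the CPU spends waiting.

    Each cycle lasts max(GPU, CPU) seconds and the CPU is busy for
    cpu_s of them.
    """
    cycle_s = max(gpu_s, cpu_s)
    return 1 - cpu_s / cycle_s


# Placeholder timings: Searching takes 1/6 as long as Trimming.
print(cpu_idle_fraction(60.0, 10.0))

# Balanced stages leave the CPU with no idle time in this model.
print(cpu_idle_fraction(53.0, 53.0))
```

That's why the CPU usage shows up as short bursts separated by long gaps instead of a steady load.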