This was discussed in the thread
https://forum.grin.mw/t/cuckatoo32-feasibility
Let me summarize here. Suppose an ASIC has 512MB of DRAM/SRAM for the edge bitmap (one bit for each of the 2^32 edges) and 512MB/2^k SRAM for the node bitmap.
To trim one C32 graph, one needs roughly 2^32 * 2.8 * 2^k * 2 = 24 * 2^k billion random node bit accesses, and 2^32 * 16 * 2^k * 3 = 206 * 2^k billion serial edge bit accesses.
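The two access counts can be tabulated for a few values of k with a small sketch like the one below. It simply evaluates the products above. The constants 2.8, 16, 2 and 3 are taken as given from the estimate, and the helper names are hypothetical.

```python
EDGE_BITS = 32  # C32: 2^32 edges, so each full bitmap is 2^32 bits = 512MB

def node_bit_accesses(k: int) -> float:
    """Random node-bitmap accesses, in billions: 2^32 * 2.8 * 2^k * 2."""
    return 2**EDGE_BITS * 2.8 * 2**k * 2 / 1e9

def edge_bit_accesses(k: int) -> float:
    """Serial edge-bitmap accesses, in billions: 2^32 * 16 * 2^k * 3."""
    return 2**EDGE_BITS * 16 * 2**k * 3 / 1e9

for k in range(4):
    node_sram_mb = 512 / 2**k  # node bitmap shrinks by 2^k
    print(f"k={k}: node SRAM {node_sram_mb:g} MB, "
          f"{node_bit_accesses(k):.0f} billion node accesses, "
          f"{edge_bit_accesses(k):.0f} billion edge accesses")
```

Shrinking the node SRAM by 2^k multiplies both access counts by 2^k, which is the trade-off the rest of the estimate works with.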
Since the edge bits can be accessed in large words, for small enough k, the limiting factor is the random node bit accesses, each of which needs 32 address bits.
If you have B banks of SRAM, each running at a frequency of G GHz, then one graph can be trimmed in 24 * 2^k / (B * G) seconds. This will also require on the order of B pipelined siphash circuits, and some overall control circuitry whose complexity is slightly superlinear in B.
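A minimal sketch of that formula, assuming (as the formula implies) that each SRAM bank serves one random node bit access per cycle, so B banks at G GHz deliver B * G billion accesses per second. The function names and the example parameters are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def trim_seconds(k: int, banks: int, ghz: float) -> float:
    """Time to trim one C32 graph when random node-bit accesses are the bottleneck.

    Assumes one node-bit access per bank per cycle (B * G billion accesses/s).
    """
    node_accesses_billions = 24 * 2**k  # rounded figure from the estimate
    return node_accesses_billions / (banks * ghz)

def banks_needed(k: int, ghz: float, target_seconds: float) -> int:
    """Smallest bank count B that trims one graph within target_seconds."""
    return math.ceil(24 * 2**k / (ghz * target_seconds))

# Hypothetical designs: 24 banks at 1 GHz with k=0, and 64 banks at 1.5 GHz with k=3.
print(trim_seconds(0, banks=24, ghz=1.0))
print(trim_seconds(3, banks=64, ghz=1.5))
print(banks_needed(0, ghz=1.0, target_seconds=0.5))
```

Since the siphash circuits scale with B as well, the bank count is the main knob trading die area for graphs per second.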
Finally, there is the post-processing of the trimmed graph, which adds a few percent of overhead.
The Grin Ipollo G1 ASIC has k=3 (64MB SRAM) and external DRAM for the edge bitmap, so it’s limited by DRAM bandwidth. It needs about 0.7 seconds to trim a C32 graph, which requires about 300 GB/s of DRAM bandwidth.
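The 300 GB/s figure can be checked against the edge access count above with a rough sketch. It assumes every serial edge bit access crosses the DRAM interface exactly once and ignores refresh, command overhead and any read/write asymmetry; the function name is hypothetical.

```python
def dram_bandwidth_gb_s(k: int, trim_seconds: float) -> float:
    """Edge-bitmap DRAM traffic per second, in GB/s.

    Assumes each of the 206 * 2^k billion serial edge bit accesses
    moves one bit over the DRAM interface.
    """
    edge_bits_billions = 206 * 2**k
    gigabytes = edge_bits_billions / 8  # bits -> bytes
    return gigabytes / trim_seconds

k = 3
print(f"node SRAM: {512 / 2**k:g} MB")
print(f"DRAM bandwidth: {dram_bandwidth_gb_s(k, trim_seconds=0.7):.0f} GB/s")
```

With k=3 this gives 64 MB of node SRAM and just under 300 GB/s, matching the figures quoted for the G1.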