Storing less kernel offsets on the blockchain

Maybe has it already been considered previously by OGs or other people, but we could probably make the blockchain slightly smaller, and the verification slighlty faster by not storing the aggregated offsets per blocks but finding another, more scalable, solution instead. Not a very important topic since the gains will not be big, but I think I have already seen some conversations on keybases where non-important scalability gains were considered.

To have this scalability improvement, we would need to make a prunable MMR (a PMMR) for kernel offsets that is indexed on the blocks. For the leaf number i in this PMMR (corresponding to the i-th block), would be stored sum_{k=1…i}(kernel_offset_k), where kernel_offset_k represents the sum of the individual kernel offsets of block k. We would for example always keep unpruned the last 5 leaves of this PMMR in order to not make the small and natural reorgs a problem, while pruning all the leaves before that.

Assymptotically, the size of the merkle proofs for this PMMR will be negligible compared to the size of the set comprised of all the per-block kernel offsets since genesis that we currently keep stored in Grin, so that we can quickly look at the gains which such a change would produce.

Currently, due to Grin having one block per minute, the kernel offsets take approximatively 32 x 60 x 24 x 365 = 16819200 bytes per year. So having instead this PMMR on the kernel offsets would result in a gain of approximately 170Mb every 10 years.

Not a big gain, but it is the sum of small gains that contribute to larger gains.

One of the half-practical-half-conceptual advantage such a construction has is that in case of a catastrophic event on earth, such as a big magnetic attack on many computers, a subsequent reorg larger than 5 blocks would be impossible here in the case where the grin blockchain survived but the past per-block kernel offsets are lost, making Grin the first blockchain resilient to certain magnetic attacks outcomes. Half joking here, but still true.

We actually store the total kernel offset on each block header. Not for that block but total offset since genesis.

So technically we only need the latest 32 bytes in the latest header. Everything else is just historical data.

If a node decided to not retain historical block headers then it could prune these historical offsets and everything would still work (presumably retaining some recent history to handle reorg/fork).

By putting them in a PMMR you dont need to download them and then prune them. you just dont download them. that is why people like Mimblewimble, they dont need to download spent outputs.

I think maybe you are misunderstanding the benefits of using a PMMR - I don’t see how this could be used for kernel offsets here.
We use a PMMR for outputs to give us a single root hash that we can commit to in the header. Same for the root of the kernel MMR.
What we need for kernel offsets is the sum which we already commit to in the header.

The header MMR already exists and the root of this commits to the latest kernel offset (and all historical kernel offsets via all previous headers). I think this is probably what you want.

You are saying that during an IBD we currently only download 32 bytes only for the kernel offsets ? that is the scalar that is the sum of all kernel offsets since genesis ?
I thought currently we download 32 bytes per block for the kernel offsets. Then, my mistake

The IBD happens only after identifying the most work chain, which involves downloading all historical headers. So that does include a cumulative offset at every height.

In the future we could support a compressed format for a range of headers. That would include only one cumulative offset for an arbitrarily large range of headers.

Thanks John!

So having the PMMR instead brings a scalability advantage, described in the original post. i did not understand why Antioch said i misunderstood PMMRs. My understanding is not crystal clear, but having a PMMR on (cumulative sums of} kernel offsets would be nice to avoid having to download them at all and still fully validate the blockchain at ibd

Maybe I misunderstand what you are proposing here - I don’t follow the scalability advantage.
Surely if we introduced an additional PMMR (for kernel offsets) we would also need to commit to this additional PMMR root in the header(s). And this would negate any benefits of not requiring the offset in every header?
Without a PMMR root in a header how would we be able to verify any merkle proof for a particular sum at a particular block height?
We’d be swapping 32 bytes of offset for 32 bytes of PMMR root plus additional complexity (merkle proofs etc.)

Maybe i need to look more at it and it is indeed not possible. I am sorry if i confused everyone…

i think i see what you mean and i failed to consider the PMMR root that would need to be included in each block header, fully defeating the purpose of the construction. i am going to try to look more at PMMR and stuff

@kurt I suspect the existing header MMR is actually closer to what you are envisioning than you maybe realize. If you think about this less in terms of “not all historical offsets” and more “not all historical headers” themselves. Particularly if we get more serious with investigating FlyClient.
i.e. download a small random subset of headers, verify they all exist under the same header MMR root, then download recent headers, all in the same MMR and we know the overall total offset from the latest header downloaded.

Instead of having a specific PMMR for the offsets, which, as you described, require (probably like any MMR) to have the root commit to by each block header, taking 32 bytes, couldn’t we append the offsets to the kernel MMR itself: The kernel MMR, in that case, instead of containing only kernels for each of its leaves, as it currently is, would contain, every each block, the leaves storing each of the kernels for that block PLUS a specific (and prunable) leaf storing the aggregated kernel offset from genesis to this block. This would become a prunable MMR, where you never prune any of the leaves storing the kernels, but you can prune all the leaves storing the offsets (one such leaf per block) except the latest 5 ones.
That way you dont add any root (32 bytes) to the block headers, since 32 bytes are taken anyway by the root of the the current MMR on the kernels, and since the construction does not add any new MMR but rather utilizes the already-present kernel MMR. and if i am not incorrect, you would get here the scalability gain that i was describing in the original post.

EDIT: except that no, you would probably need as many merkle proofs as there are blocks, since only one leaf per block are pruned. so probably it is a construction, as is, without interest.

EDIT2: Maybe the construction would have interest if instead of putting the kernel offsets in the MMR storing the kernels, you put them instead in the PMMR storing the outputs. I think this one works well (No issue anymore with the merkle proofs taking the same amount of data as the pruned offsets) and provides indeed the scalability improvement. Curious to what you think about it