Storing less kernel offsets on the blockchain

Kurt · June 16, 2020, 2:58am

Maybe has it already been considered previously by OGs or other people, but we could probably make the blockchain slightly smaller, and the verification slighlty faster by not storing the aggregated offsets per blocks but finding another, more scalable, solution instead. Not a very important topic since the gains will not be big, but I think I have already seen some conversations on keybases where non-important scalability gains were considered.

To have this scalability improvement, we would need to make a prunable MMR (a PMMR) for kernel offsets that is indexed on the blocks. For the leaf number i in this PMMR (corresponding to the i-th block), would be stored sum_{k=1…i}(kernel_offset_k), where kernel_offset_k represents the sum of the individual kernel offsets of block k. We would for example always keep unpruned the last 5 leaves of this PMMR in order to not make the small and natural reorgs a problem, while pruning all the leaves before that.

Assymptotically, the size of the merkle proofs for this PMMR will be negligible compared to the size of the set comprised of all the per-block kernel offsets since genesis that we currently keep stored in Grin, so that we can quickly look at the gains which such a change would produce.

Currently, due to Grin having one block per minute, the kernel offsets take approximatively 32 x 60 x 24 x 365 = 16819200 bytes per year. So having instead this PMMR on the kernel offsets would result in a gain of approximately 170Mb every 10 years.

Not a big gain, but it is the sum of small gains that contribute to larger gains.

One of the half-practical-half-conceptual advantage such a construction has is that in case of a catastrophic event on earth, such as a big magnetic attack on many computers, a subsequent reorg larger than 5 blocks would be impossible here in the case where the grin blockchain survived but the past per-block kernel offsets are lost, making Grin the first blockchain resilient to certain magnetic attacks outcomes. Half joking here, but still true.

antioch · June 16, 2020, 9:18am

We actually store the total kernel offset on each block header. Not for that block but total offset since genesis.

github.com

mimblewimble/grin/blob/55a929e214bd02aa564c694665148c223d6435f5/core/src/core/block.rs#L224-L227


	/// Total accumulated sum of kernel offsets since genesis block.
	/// We can derive the kernel offset sum for *this* block from
	/// the total kernel offset of the previous block header.
	pub total_kernel_offset: BlindingFactor,

So technically we only need the latest 32 bytes in the latest header. Everything else is just historical data.

If a node decided to not retain historical block headers then it could prune these historical offsets and everything would still work (presumably retaining some recent history to handle reorg/fork).

Kurt · June 16, 2020, 9:20am

By putting them in a PMMR you dont need to download them and then prune them. you just dont download them. that is why people like Mimblewimble, they dont need to download spent outputs.

antioch · June 16, 2020, 9:23am

I think maybe you are misunderstanding the benefits of using a PMMR - I don’t see how this could be used for kernel offsets here.
We use a PMMR for outputs to give us a single root hash that we can commit to in the header. Same for the root of the kernel MMR.
What we need for kernel offsets is the sum which we already commit to in the header.

The header MMR already exists and the root of this commits to the latest kernel offset (and all historical kernel offsets via all previous headers). I think this is probably what you want.

Kurt · June 16, 2020, 9:27am

You are saying that during an IBD we currently only download 32 bytes only for the kernel offsets ? that is the scalar that is the sum of all kernel offsets since genesis ?
I thought currently we download 32 bytes per block for the kernel offsets. Then, my mistake

tromp · June 16, 2020, 9:48am

The IBD happens only after identifying the most work chain, which involves downloading all historical headers. So that does include a cumulative offset at every height.

In the future we could support a compressed format for a range of headers. That would include only one cumulative offset for an arbitrarily large range of headers.

Kurt · June 16, 2020, 9:49am

Thanks John!

So having the PMMR instead brings a scalability advantage, described in the original post. i did not understand why Antioch said i misunderstood PMMRs. My understanding is not crystal clear, but having a PMMR on (cumulative sums of} kernel offsets would be nice to avoid having to download them at all and still fully validate the blockchain at ibd

antioch · June 16, 2020, 3:37pm

Maybe I misunderstand what you are proposing here - I don’t follow the scalability advantage.
Surely if we introduced an additional PMMR (for kernel offsets) we would also need to commit to this additional PMMR root in the header(s). And this would negate any benefits of not requiring the offset in every header?
Without a PMMR root in a header how would we be able to verify any merkle proof for a particular sum at a particular block height?
We’d be swapping 32 bytes of offset for 32 bytes of PMMR root plus additional complexity (merkle proofs etc.)

Kurt · June 16, 2020, 3:44pm

Maybe i need to look more at it and it is indeed not possible. I am sorry if i confused everyone…

i think i see what you mean and i failed to consider the PMMR root that would need to be included in each block header, fully defeating the purpose of the construction. i am going to try to look more at PMMR and stuff

antioch · June 16, 2020, 4:23pm

@kurt I suspect the existing header MMR is actually closer to what you are envisioning than you maybe realize. If you think about this less in terms of “not all historical offsets” and more “not all historical headers” themselves. Particularly if we get more serious with investigating FlyClient.
i.e. download a small random subset of headers, verify they all exist under the same header MMR root, then download recent headers, all in the same MMR and we know the overall total offset from the latest header downloaded.

Kurt · June 16, 2020, 4:51pm

Instead of having a specific PMMR for the offsets, which, as you described, require (probably like any MMR) to have the root commit to by each block header, taking 32 bytes, couldn’t we append the offsets to the kernel MMR itself: The kernel MMR, in that case, instead of containing only kernels for each of its leaves, as it currently is, would contain, every each block, the leaves storing each of the kernels for that block PLUS a specific (and prunable) leaf storing the aggregated kernel offset from genesis to this block. This would become a prunable MMR, where you never prune any of the leaves storing the kernels, but you can prune all the leaves storing the offsets (one such leaf per block) except the latest 5 ones.
That way you dont add any root (32 bytes) to the block headers, since 32 bytes are taken anyway by the root of the the current MMR on the kernels, and since the construction does not add any new MMR but rather utilizes the already-present kernel MMR. and if i am not incorrect, you would get here the scalability gain that i was describing in the original post.

EDIT: except that no, you would probably need as many merkle proofs as there are blocks, since only one leaf per block are pruned. so probably it is a construction, as is, without interest.

EDIT2: Maybe the construction would have interest if instead of putting the kernel offsets in the MMR storing the kernels, you put them instead in the PMMR storing the outputs. I think this one works well (No issue anymore with the merkle proofs taking the same amount of data as the pruned offsets) and provides indeed the scalability improvement. Curious to what you think about it

PaigePalmer · December 31, 2021, 5:16am

The IBD occurs only after determining the most important work chain, which includes downloading all historical headers. So there is a cumulative offset at each height.

In the future, we may be able to support a compressed format for a variety of headers. This would result in a single cumulative offset for an arbitrarily large number of headers.

oryhp · December 31, 2021, 11:17am

I suggest we ban the guy above. I’m assuming that’s a bot. Some of his comments are copy paste either from a link e.g. someone linked a research paper and his reply is an abstract from that, or from a reply above Storing less kernel offsets on the blockchain - #7 by tromp where he changed the wording a bit (similarly, he used words from Grin Website Redesign - #6 by crrdlx in his latest comment). Sometimes I didn’t find where he picked the answer from, but in any case, it seems like a spam.

vegycslol · December 31, 2021, 1:06pm

I don’t know if he’s is a bot but unless he stops spamming (mostly replying to very old topics with, at least to me, seemingly no reason) then i would ban him. His reading time is 28 minutes, has created 9 posts and read 111 of them, this is imo not a regular user

Topic		Replies	Views
Efficient solution to verify kernel uniqueness + better absolute timelocks Development and Technical Discussion	60	2439	July 23, 2020
Grin blockchain size and total blockchain capacity	6	1087	January 3, 2022
Eliminating transaction kernels Development and Technical Discussion	5	861	January 30, 2019
Unique Kernel Thread #73 Development and Technical Discussion	23	1996	September 28, 2020
Thoughts on utreexo? Research	3	184	May 10, 2024

Storing less kernel offsets on the blockchain

Related topics