(Aggregate?) Merkle Proofs

I have been doing some thinking/reading around “UTXO Obsolescence” and something has been bothering me. We would need a Merkle Proof for every output being spent this way. And these get expensive relative to simply referencing an output via its commitment.

It would be great if we could aggregate Merkle proofs somehow. But I have not seen anything in the literature about aggregating Merkle proofs, how to go about doing it, or if it is even possible to aggregate multiple proofs.

I think it is possible to aggregate Merkle proofs where the aggregate proof is smaller than simply combining multiple proofs.
So I’m wondering a couple of things -

  1. Where is the flaw in my reasoning here?
  2. And in the unlikely event this does actually work - has this been described anywhere before?

Merkle Proofs

A Merkle proof gives us a way of proving that an element exists in a Merkle tree.
It does this by providing a list of hashes, such that we can reconstruct the path from the element up to the root and derive a matching root hash.

In the example above we can prove K exists in the Merkle tree via Hk and the “sibling” hashes [HL, HIJ, HMNOP, HABCDEFGH]. By hashing these siblings together we can derive the path from HK up to the root and confirm the root matches that of our Merkle tree.

If we can do this for HK we know both that the value K was correct and that it exists at the specified leaf position.

We can also build a separate Merkle proof for a different leaf position, say HA.
This proof would contain the following sibling hashes - [HB, HCD, HEFGH, HIJKLMNOP].

Aggregate Merkle Proofs

So the question is can we aggregate these two Merkle proofs, to prove that both HA and HK are elements in the Merkle tree?

HA, Proof: [HB, HCD, HEFGH, HIJKLMNOP]
HK, Proof: [HL, HIJ, HMNOP, HABCDEFGH]

At first glance it appears unlikely because there is nothing obviously in common between these two proofs.
But on closer inspection - the pair (HABCDEFGH, HIJKLMNOP) are the two children of the root node. If we know both of these we can derive the root (as we do for a single Merkle proof).

We can derive HABCDEFGH by following the Merkle proof path up from HK.
Likewise we can derive HIJKLMNOP from the Merkle proof for HA.

So with appropriate handling of the derived hashes below the root I think we can represent this aggregate proof in a more compact way (simply omitting any pairs of child hashes that can be derived by the proofs themselves) -

HA, Proof: [HB, HCD, HEFGH]
HK, Proof: [HL, HIJ, HMNOP]

Given this set of sibling hashes we can still derive the root hash and verify it matches.

Q) Does this compact representation of an aggregate Merkle proof reduce the security of the proof(s)?

Can we prove that all leaf nodes beneath a given parent node are in a Merkle tree? And what would this aggregate proof look like?

i.e. Can we prove that [A, B, C, D, E, F, G, H] are all in the Merkle tree?

We would need [HA, HB, HC, HD, HE, HF, HG, HH] and from these we could derive HABCDEFGH. We would then only need the additional sibling hash HIJKLMNOP to verify the root hash was indeed correct.

[HA, HB, HC, HD, HE, HF, HG, HH], Proof: HIJKLMNOP

The aggregate proof here is simply the list of sibling hashes above the derived root of all the provided leaf positions. In this case simply a single hash one level beneath the root.
By proving HABCDEFGH is the sibling of HIJKLMNOP beneath the root we prove for all the leaves beneath HABCDEFGH.

This is significantly smaller than requiring a full Merkle proof for each leaf node.

Intuitively it seem to make sense - if we have all the leaves beneath a parent position we can prove that rehashing all the leaves will produce a matching parent hash.
If we had all the leaves in the full Merkle tree we would not need any additional Merkle proof data to verify that they were indeed the full set of leaf nodes in the tree. We could simply rebuild the full tree and inspect the root hash.

  • This appears to imply these aggregate proofs get more efficient, the more leaf nodes we are proving (which seems like a desirable property).

  • This also suggests these proofs can be aggregated in a non-interactive way. You don’t need to know anything about the Merkle tree or the proofs being aggregated beyond being able to identify pairs of siblings to remove.

And I don’t see how this would necessarily reduce the security of the proofs. If I can rebuild the tree and produce the same root (given a set of nodes at fixed leaf positions and a fixed Merkle tree size) then it would appear sufficient to prove all the leaves are actually leaves of said tree.

Is there a flaw here? And if this does work has it been described previously somewhere?
I cannot find anything in the literature that touches on this.

2 Likes

An aggregate Merkle proof for a set S of leaves requires hashes for all nodes s for which

  1. no element of S is in the subtree rooted at s
  2. some element of S is in the subtree rooted at s’ sibling

When |S| > 1, this saves at least 2 hashes over concatening individual merkle proofs for all leaves. There is no loss of proof security.

The savings are maximal when S consists of all leaves below some node v, in which case only hashes for siblings of v and its ancestors are needed.

This assumes the positions and hashes of all elements in S are given.

1 Like

There is an argument to be made about not doing aggregate merkle proofs: penalize clients; the more data they need to include in the transaction to prove their obsolete utxos, the more fees they will have to pay, hence, they are incentivized to not let utxos become obsolete, which in turn increases transactions and privacy in the network. The burden of storing these proofs is not in-chain anyway. Are you worried about network bandwidth?

@antioch

Yeah generally caching intermediate nodes for reuse while making proofs smaller has been discussed before in Ethereum, see Optimizing Merkle tree multi-queries - Data Structure - Ethereum Research

Ahh thanks @lehnberg!
I knew somebody must have discussed this before somewhere.

Yes @antioch, above was by Georgios Konstantopoulos. See the additional exchange between him and Oleg Andreev:

Oleg Andreev, [29 Apr 2019 at 21:12:48]:
i only had skimmed through the article, but this problem reminded me of another trick to aggregate merkle proofs: factor out all 32-byte hashes into a de-duplicated array, and for each merkle proof use short indices into that array

so you can decouple the proof-as-sequence-of-neighbours from actual hashes.

Georgios Konstantopoulos, [30 Apr 2019 at 06:40:33 (30/04/2019, 06:43:55)]:
I did that for Sparse Merkle trees, with the default leaf being the hash of 0. Prefix the proof with a bitfield indicating whether to use the default value (precomputed) or to pop an element from the proof.

Very neat optimization when there’s some structure in the Merkle proof, and the more values you can use from the bitfield, the smaller the proof - at a cost of the size of the bitfield number in the worst case

This was described here Plasma Cash with Sparse Merkle Trees, Bloom filters, and Probabilistic Transfers - Plasma - Ethereum Research when it first came to my attention. @oleganza is it an old trick? It had got me pretty excited given it’s simple and provides straightforward gains

It looks like Beam uses something they call “Multiproofs”, which looks like it’s exactly what you’re talking about.

Description: https://github.com/BeamMW/beam/wiki/Merkle-trees#multiproof
Implementation: https://github.com/BeamMW/beam/blob/master/core/merkle.h#L164-L230

2 Likes