A trustless aggregator based on a scalable BFT protocol

Hi all!

Recently I have been thinking about the trustless aggregator problem and came up with an idea.

Briefly, the idea consists of using a scalable BFT (Byzantine Fault Tolerant) consensus protocol in which a set of nodes executes a state machine and the rest of them replicate its state. The solution is not very Mimblewimble specific, as BFT protocols are meant to be more general: the notion of the computed state is well separated from the protocol. This has both advantages and disadvantages.

I have wrapped it up into an article on my website, available under the link below.

In the article I consider:

  1. Making the computing cluster more inclusive by introducing an additional, larger set of nodes which do not need to replicate the state - a candidate pool.
  2. Simple protection against Sybil attacks: a Grin deposit allows a node to join the cluster until a certain block height is reached. This is flexible; more restrictive requirements can be defined.
  3. An abstract state machine which aggregates transactions and can be executed by the BFT cluster of nodes.
  4. A brief implementation plan compatible with the scrum philosophy: it starts cheap, stays compatible with the daily aggregator, and each iteration delivers value to the users.
  5. If ever implemented, a different abstract machine (one not limited to aggregating transactions), combined with scriptless scripts, could be used to build decentralized applications on top of Grin, such as bounty services, funding services, et cetera.
  6. A brief probabilistic analysis of the difficulty of performing a Sybil attack. If this idea gains interest, I would be willing to develop a more elaborate model considering the expected number of view changes needed to perform an attack, and thus also estimating the cost of breaking the consensus.
  7. A rough estimate of how many such clusters are required to reach the theoretical average limit of Grin transactions per block.
  8. A (very rough) estimate indicating that this protocol has a low memory impact. The state appears to be quite small, as only the aggregated transactions for the next X hours (say one per hour?) and the pool participants' signatures need to be stored. Perhaps it could even run on older-generation smartphones?

I am opening a new thread because I value your feedback. I would highly appreciate a peer review and welcome all input (including criticism, of course!). I will update my website based on your messages.

Let me know if you like it!


Great writeup! Since I'm not familiar at all with BFT and have realized it would take me a long time to understand your proposal, I would like to ask a few questions (assume the tx is 2-2):

  1. If I send a tx to the aggregator, who sees my inputs and outputs? (To whom do I send my tx, or parts of it?)
  2. How many bad actors are required for the aggregator to not work as expected? (I did see the formula, but I am not sure I understand it correctly, so could you give an example?)
  3. What is the minimal number of bad actors required to find out the link between inputs and outputs?

Thank you @vegycslol !

In regular scalable BFT, a network of size n has a subset of nodes of size c called the root committee, which is supposed to be small compared to n. In my modification I add one more set, much larger than n, called the candidate pool; as an order of magnitude, let us assume the candidate pool is of size N ≈ n^2. So we have c << n << N.

The root committee nodes produce a new state. The remaining n-c nodes are called the verifying nodes and they are only in charge of validating and replicating the state proposed by the root committee. In the picture I propose, the N nodes in the candidate pool are just candidates to join the network in case a replacement is needed.

The experiment described in the paper by Jalalzai et al. uses the following numbers: c=36 and n=200. For the purpose of the example, let's imagine the candidate pool has N=10,000 nodes waiting to join (note the difference between the capital and lowercase symbols) and that f of those nodes are malicious nodes implanted by the attacker.
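To make this picture concrete, here is a tiny sketch of the three node roles as I understand them (in Rust; the names are my own choice for illustration, not taken from the Jalalzai et al. paper):

```rust
/// Node roles in the modified scalable BFT picture described above.
#[derive(Debug)]
enum NodeRole {
    /// One of the c root committee nodes, which produce the new state;
    /// exactly one of them at a time acts as the primary node.
    RootCommittee { is_primary: bool },
    /// One of the remaining n - c verifying nodes, which only validate and
    /// replicate the state proposed by the root committee.
    Verifier,
    /// One of the N candidate-pool nodes, which do not replicate the state
    /// and only join the cluster when a replacement is needed.
    Candidate,
}

fn main() {
    // The example sizes used in this thread: c = 36, n = 200, N = 10,000.
    let (c, n, big_n) = (36, 200, 10_000);
    println!("root committee: {c}, verifiers: {}, candidates: {big_n}", n - c);
    println!("example role: {:?}", NodeRole::RootCommittee { is_primary: true });
}
```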

  1. A new state is proposed only by the root committee, initiated by a single node within it called the “primary node” (which also changes randomly). The new tx to be aggregated can be encrypted using the primary node's key and broadcast in the network following the Dandelion protocol to hide the origin IP. Once the encrypted tx reaches the primary node, it will aggregate it and sync it within the c nodes of the root committee, after which it is validated by the remaining n-c nodes.
  2. That is a good question. There are multiple possible disruptions: preventing consensus from being reached and delaying the processing, leaking information that was supposed to be secure (unaggregated transactions), and killing the entire cluster. Let me address them one by one at the end of this response, as it will be a bit longer.
  3. I assume that in order to link inputs and outputs you need to leak the unaggregated transactions (correct me if I'm wrong). It is hard to tell explicitly how many you need, because the answer to that question is probabilistic. Let's say only the primary node knows the raw transactions, and only for a fixed and limited amount of time before it gets re-elected randomly, so to leak this information you need your malicious node to become the primary node via the random election process within the cluster. We established some numbers earlier; the probability of a malicious agent implanted in the candidate pool becoming the primary node should be P = (1/c) * (1/(n-c)) * (f/N). Imagine you put f=100 malicious agents in the candidate pool: given that c=36, n=200 and N=10,000, the probability of one of them becoming the primary node would be P = (1/36) * (1/(200-36)) * (100/10000) = 0.000001693766938 (see the small calculation sketch after this list). If this idea gets good feedback I can build better models, such as estimating the expected number of block heights needed given a number of malicious nodes in the candidate pool, etc. Those might answer your question more explicitly. The plot in my article shows the probability of replacing the entire committee as a function of the number of malicious nodes.
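For anyone who wants to plug in their own numbers, here is a minimal sketch of that calculation (the values of c, n, N and f are just the example numbers from this thread, not protocol constants):

```rust
// Back-of-the-envelope check of the primary-node probability from point 3 above.
fn main() {
    let c: f64 = 36.0;         // root committee size
    let n: f64 = 200.0;        // cluster size (root committee + verifying nodes)
    let big_n: f64 = 10_000.0; // candidate pool size
    let f: f64 = 100.0;        // malicious nodes implanted in the candidate pool

    // P = (1/c) * (1/(n - c)) * (f/N)
    let p = (1.0 / c) * (1.0 / (n - c)) * (f / big_n);
    println!("P(a malicious candidate becomes the primary node) = {:.3e}", p); // ~1.694e-6
}
```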

Now, regarding the scenarios of disrupting the aggregator, let's address them one by one:

  1. Preventing consensus from being reached and delaying the processing. This requires c/2 malicious nodes in the root committee, so in our little example you would need to implant 18 of them (and you can see in point 3 the probability of introducing one of them). If you managed to prevent consensus from being reached, the loss is only in the transactions waiting to be aggregated. The overall state is preserved and a new root committee gets elected. As long as it does not happen often it is not a problem, as it only affects performance.
  2. Leaking information that was supposed to be secure (unaggregated transactions). For that you only need a single node in the root committee, but this node has to be the primary node. If you put f=100 malicious actors in the candidate pool, in our little example the probability of one of them becoming the primary node is 0.000001693766938.
  3. Killing the entire cluster. It is possible to literally destroy the network. The state is replicated only by the n nodes, as the N nodes in the candidate pool do not replicate it. You would implant c/2 nodes in the root committee to prevent consensus from being reached (which is 18 nodes in our example), and then among the replica nodes you would need n/2-c/2 malicious nodes implanted to prevent a new honest root committee from being formed (that would be 82 nodes in our example). So for the case of c=36 and n=200, to destroy the aggregator you need 100 malicious nodes, and 18 of them must be in the root committee (the sketch after this list restates this arithmetic). If that occurs consensus cannot be reached; there would either be no new root committee, or it would be purposely dominated by the malicious nodes via the manipulated protocol. To my understanding such a scenario is not repairable. In that case a new BFT cluster has to be formed from scratch.
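To keep the numbers straight, here is how those thresholds look when restated as arithmetic on the example sizes (these follow my reading of the protocol, so please treat them as illustrative rather than authoritative):

```rust
// Attack thresholds from the three scenarios above, using c = 36 and n = 200.
fn main() {
    let c = 36u32;  // root committee size
    let n = 200u32; // cluster size

    let stall = c / 2;                         // scenario 1: stall consensus -> 18 nodes
    let extra_replicas = n / 2 - c / 2;        // scenario 3: block a new honest committee -> 82 nodes
    let kill_cluster = stall + extra_replicas; // scenario 3 total -> 100 nodes (n / 2)

    println!("stall consensus: {stall} committee nodes");
    println!("kill the cluster: {kill_cluster} nodes in total, {stall} of them in the root committee");
}
```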

Finally, I would like to make a few important comments:

  1. Apart from disrupting the cluster, leaking information and affecting performance, a potential attacker has nothing to gain. It is not like (hypothetically) attacking the actual Grin network, where you could produce blocks, double spend, etc.
  2. Even without PoW, it would not be easy to produce those Sybil nodes. Imagine joining the candidate pool for 24h requires a refundable deposit in Grin equivalent to $1,000.00. Implanting f=100 faulty nodes would require you to borrow $100,000.00 from someone for one day, and it would only give you a probability of 0.000001693766938 that one of them becomes the primary node (the small sketch after this list puts these numbers together).
  3. I am not fluent in MW theory. I just assume that once aggregated a transaction is safe to broadcast (of course the more txs are aggregated the safer, so the later ones benefit more from the privacy improvement of aggregation), and I also assume it is possible to validate the aggregated transaction without knowing the individual txs (as the only things to check are the zero sum and the matching signatures). Please correct me if I'm wrong here.
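A small sketch of what the attacker pays for that tiny probability, using the hypothetical deposit from comment 2 (the deposit value is an example, not a protocol parameter):

```rust
// Rough cost of the Sybil attempt described in comment 2 above.
fn main() {
    let deposit_usd = 1_000.0_f64; // hypothetical refundable deposit per candidate node
    let f = 100.0_f64;             // malicious nodes implanted in the candidate pool

    let capital_locked = deposit_usd * f; // $100,000 locked for one day
    let p_primary = (1.0 / 36.0) * (1.0 / (200.0 - 36.0)) * (f / 10_000.0);

    println!("capital locked for 24h: ${capital_locked}");
    println!("probability that one of them becomes the primary node: {:.3e}", p_primary);
}
```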

I hope this helps. I'm also quite new to BFT, so we are actually learning together. I am writing this before my first coffee, so please let me know if the calculations don't make sense so that I can review and fix them.

Thanks for the explanation, the idea is much clearer now :)

It is safe to broadcast, but the tx can still become invalid. E.g. an attacker sends his tx [i1] → [o1, o2] to the aggregator. When the aggregated tx is broadcast, he creates a new tx [i1] → [o3] for which he pays high fees. This way the miners will pick it up and the aggregated tx becomes invalid, because one of its inputs is already spent (similarly, he could create [i3] → [o1] to create the same output as in the aggregated tx, making the aggregated tx invalid, since you can't have duplicate outputs in MW).

You also need to verify that each input exists in the UTXO set and verify the rangeproof of each input.

Then there’s also a problem of “what to do when a reorg happens where one of aggregated inputs was double spent” → eg can the nodes recreate the old aggregated tx without this double spent tx (and this possibly means recreate multiple chained aggregated tx)
I’m not a math guy, but the calculations seem correct. How often does the primary node swap occur (eg in one aggregation)?

I think I see your example. I would like to think about this scenario; could you point me to some resources on how the funds locking mechanism currently works? Perhaps it would be possible to lock the outputs while they are being aggregated (really thinking out loud here, I need to sleep on that).

Maybe it could be done by the grin nodes, as those are the ones storing the entire chain? In a similar way to how payment proof verification works currently. There is no risk of the network being flooded with requests, as only the root committee nodes would need to verify it.

Hmmm, I'm not sure what you mean by a reorg, but the double spending attempt is a good point. I did not consider that the entire aggregate would become trash in such a case; I assumed that users could use double spending attempts to cancel their transactions while they are being aggregated.

It needs to be done before aggregating two txs, otherwise the aggregated tx might be invalid; at least I don't see any other way of doing it.

I meant a 51% attack which performs a double spend due to a reorg. This has happened on multiple chains; here's a blog post about one happening on the ETC chain: https://blog.coinbase.com/ethereum-classic-etc-is-currently-being-51-attacked-33be13ce32de


I will think about some sort of zk-SNARK proof of the existence of an input, which would need to be provided by the sender to the aggregator and without which the aggregation request would be rejected.

Regarding the reorg, I think I see what you mean now; let me comment on the original post.

I am thinking of introducing a consensus rule that an input which hasn't been confirmed by at least a certain number of blocks will not be accepted by the aggregator. Perhaps it would be possible to deliver a zk-SNARK with the transaction which proves two things:

  1. Input exists
  2. Input is old enough not to worry about a reorg

This would of course make the entire process longer from the user's perspective, as freshly added inputs could not be spent immediately, but I still think that users who sign up for aggregation over a long period of time value their privacy more than their time.
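Ignoring the zk-SNARK machinery for a moment, here is a minimal sketch of what the acceptance rule itself could look like (the trait and method names are placeholders of mine, not the actual grin node interface):

```rust
/// Placeholder view of the chain the root committee would query; not the real grin node API.
trait ChainView {
    /// Confirmation depth of an unspent output commitment, or None if it is
    /// not in the current UTXO set.
    fn utxo_depth(&self, commitment: &[u8; 33]) -> Option<u64>;
}

/// Accept an input into the aggregate only if it exists in the UTXO set (rule 1)
/// and is buried deep enough that a short reorg cannot invalidate the aggregate (rule 2).
fn input_is_acceptable(chain: &impl ChainView, commitment: &[u8; 33], min_depth: u64) -> bool {
    match chain.utxo_depth(commitment) {
        Some(depth) => depth >= min_depth,
        None => false, // not in the UTXO set: already spent or never existed
    }
}

fn main() {
    // Dummy chain view for illustration: pretends every commitment is 60 blocks deep.
    struct Dummy;
    impl ChainView for Dummy {
        fn utxo_depth(&self, _commitment: &[u8; 33]) -> Option<u64> {
            Some(60)
        }
    }
    println!("input accepted: {}", input_is_acceptable(&Dummy, &[0u8; 33], 60));
}
```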

If a transaction gets purposely accepted by a Byzantine committee (a very low probability event) and the verifiers are also Byzantine and do not reject it (due to the invalid zk-SNARK I mentioned above), then we are actually in a situation in which the entire BFT cluster is corrupted and the state is invalid. Such a state is not repairable, and aggregates resulting from it should not be accepted by the miners. I am not yet sure how to detect that, but I will sleep on it and try to come up with something more formal.
