Mimblewimble CoinSwap proposal

Abstract

We present a coin shuffling proposal with the following properties:

  • Users submit self-spends throughout the day. No interaction needed for shuffling.
  • Shuffling is performed at the end of the day by a set of mixnodes that cannot steal any coins.
  • Invalid self-spends are automatically filtered out. No need to abort or restart the shuffling.
  • As long as at least one mixnode is honest, then no one learns the input output links.
  • The size of the shuffle is limited only by blocksize and could easily be over a thousand.
  • Each shuffle only grows the chainsize by a small constant (~100 byte per mixnode), thanks to MW cut-through.

Widespread use of the protocol would leave the transaction graph mostly obscured.

Introduction

This document proposes a possible design for a ``trustless’’ Mimblewimble CoinSwap service to which users submit self spend data throughout the day (or hour or any desired period) which are validated, aggregated, and published at the end of the day.

Coins swaps are useful for obfuscating the Mimblewimble transaction graph. They can be applied to self spending outputs that were recently received, or to self-spends resulting from canceled/aborted transactions. The lack of urgency of such self spends lends itself well to the slow nature (and possible failure) of coinswaps.

Whereas self spends normally reveal their input-output link publicly in every mempool on the network, use of a CoinSwap service obscures this link from public view. If one further trusts at least one service node to be honest, then the link is obscured from the nodes’ view as well. In this sense little trust is needed to benefit from the service. If all nodes collude, they not only learn the input output links, but could also choose to revert the entire coinswap after it confirms on-chain. Which is still a manageable problem for wallets, even if very unlikly.

This document is not concerned with aggregation of regular transactions, or what could be called a CoinJoin service. Consistent use of coinswaps on outputs received in regular transactions should make coinjoins mostly redundant, although it does come at a roughly doubling of fees.

This design follows the Grin forum discussion at Daily Aggregator (Partial Transactions) - #3 by antioch and combines ideas of users antioch, oryhp, vegycslol, and myself.

We first present the general design, and then discuss several refinements dealing with practical issues such as timing attacks, spam attacks and ownership proofs.

The CoinSwap protocol

MWixnet Architecture

The protocol is similar to that of mixnets, but instead of mixing messages from senders to receivers, we mix self spends from coin owners to themselves. Whereas the messages in a mixnet do not change, our mixnodes transform commitments into new ones, and then achieve mixing by sorting the results.

Let there be m self spends spend 1 … spend m, and n mixnodes node 1 … node n. We let subscript i range over spends, and subscript j range over nodes.

Let xi,j be the randomly generated excess by which node j should additively transform the commitment in spend i: Ci,j = Ci,j-1 + xi,j * G.

Conceptually Ci,0 and Ci,n are the input and output of spend i, except for a necessary fee adjustment:

  • Ci,0 = Ciin - fee * H
  • Ci,n = Ci,0 + (Σj xi,j) * G = Ciout

The excesses form a matrix such as this one for 4 self spends and 3 servers:

x1,1 x1,2 x1,3

x2,1 x2,2 x2,3

x3,1 x3,2 x3,3

x4,1 x4,2 x4,3

which correspond to the linkages in this matrix C of commitments:

C1,0 <–> C1,1 <–> C1,2 <–> C1,3

C2,0 <–> C2,1 <–> C2,2 <–> C2,3

C3,0 <–> C3,1 <–> C3,2 <–> C3,3

C4,0 <–> C4,1 <–> C4,2 <–> C4,3

Each node receives a sorted column, not knowing the i indices shown, transforms each commitment by some excess specified for that commitment, sorts the results, and passes on the new column to the next node.

Data provision

Node 1 is provided, for each i, with with a tuple datai,1 containing:

  • commitment Ciin
  • a proof of ownership of Ciin
  • excess xi,1

Node n is provided, for each i, with with a tuple datai,n containing:

  • commitment Ci,n-1
  • corresponding range proof BPi
  • excess share xi,n

Each other node j is provided, for each i, with a tuple datai,j containing:

  • commitment Ci,j-1
  • excess share xi,j

Input validation

Validation starts with node 1 computing, for each i, Ci,0 = Ciin - fee * H. It also verifies that each input commitment is unique, present in the UTXO set, and has a valid proof of ownership. All invalid data is removed (resulting in a smaller m).

Output derivation

Then for each j from 1 to n in turn, node j computes, for each i, Ci,j = Ci,j-1 + xi,j * G, and if j<n, sends the ordered set of m commitments Ci,j to its neighbour node j+1.
Any commitments received by a node that are not among its tuples are invalid and get removed.
Also, any non-unique Ci,j are similarly invalid and removed. This will allow correct determination of validity in the later kernel derivation stage.

Output validation

Node n removes all received commitments that don’t have valid range proofs.
Let Out be the set of remaining commitments.

Kernel derivation

Then for each j from n to 1 in turn, node j computes kernel Kj for a public excess of G * Σvalid xi,j, and if j>1, sends Out, the set of n+1-j kernels, and the set of valid commitments Ci,j-1 to its neighbour node j-1.
This maintains an invariant that ΣC ∈ Out C - Σvalid Ci,j = Σk>j Kk.

Aggregation

Finally, node 1 ends up with a set of valid C0,j that by invariance differs from the Out sum by all n kernels.
It constructs the final coinswap transaction CS from the set of valid outputs Ciout in Out, the set of valid inputs Ciin, the n kernels Kj, and an extra input and/or output to collect the leftover fees (or pay missing fees) as appropriate.

Once confirmed, the coinswap is irreversible unless all n nodes collude.

Practical Issues

Data provision

The data provision step above suggests that users send data directly to n different nodes. While possible, this is not the most practical.
The following improvement is inspired by TOR, The Onion Router.
Let’s assume that each node j has a known public key PKj, and denote by ENC(PK, D) the pair <DHPK, E> where DHPK is a temporary public key generated by Diffie-Hellman for PK, and E is the symmetric key encryption of data D with the shared secret as key.
Then for each i we can recursively define an onion bundle OBi,j for nodes j…n as follows:

  • OBi,n = ENC(PKn, datai,n)
  • OBi,j = ENC(PKj, <datai,j, OBi,j+1>)

Now a user only needs to send OBi,1 to node 1, which can decrypt to obtain both <datai,1 and OBi,2. In the output derivation stage, each node j decrypts each OBi,j, checks that the received Ci,j-1 matches the one in datai,j (removing mismatches), computes Ci,j, sorts all OBi,j+1 by Ci,j, and passes those on to its neighbor node j+1.

This onion bundling not only prevents timing attacks on users whose spend data submisions to different nodes are correlated in time, but also allows them to send their data to a single node, possibly as soon as they receive the output they want to self spend.

Fee handling (Grin)

In a more realistic setting, each node pays itself a mixfee with a 1-input 1-output transaction replacing $K_j$, where Σvalid xi,j can instead by added to the transaction offset.
The contributed fees should then match the coinswap relay fee plus n mixfees. In Grin this gives the identity (with FB being the fee base of half a millicent):

|Out| * fee = (|Out| + n) * (1 + 21) * FB + n * 3 * FB + n * mixfee, or
mixfee = |Out| * (fee - 22 * FB) / n - 25 * FB.

For simplicity, we could require a fee of (22 + n) * FB, resulting in a mixfee of (|Out| - 25) * FB. In that case, for a mixnode to earn 10 grin-cents a day requires 225 daily self spends.

Spam attacks

A deluge of bogus onion bundles could fill up node 1’s memory buffers long before the day is done. This is mostly remedied by doing immediate input validation on every new bundle received by node 1. At the end of the day, we’d still redo the UTXO membership checks since self spend inputs could be spent during the day.

Thanks to the proofs of ownership, the number of self spends of any one attacker, surviving input validation, is limited by the number of outputs they own.
Having enough buffers to cover the entire UTXO set is one solution.

If after input validation at the end of the day, more spends remain than fit in a block (about 40000/22 = 1818) then two options are available:

  • run a modified version of the above protocol that merely filters out all invalid data. Any invalidly spent input can be banned from future coinswapping.
  • randomly partition them into equal sized (under 1818) subsets, and run coinswap on each in turn.

A proofs of ownership of commitment C can take different forms.

  • a rangeproof on C + G
  • a rangeproof on C, that differs from the one on-chain
  • if possible, a compact encoding of a small tweak of the on-chain rangeproof

Bandwidth Optimization for n=2

In the case of only 2 non-colluding nodes, we can do away with the (relatively large) proofs of ownership, and reverse the flow of data. That is, the onions are constructed with the (much smaller) node 1 data inside the bundle for node 2. Node 2 starts with output validation and sends transformed valid commitments back to node 1, which then does input validation on its transformed commitments. Kernel derivation then proceeds back to node 2. In this case it’s possible to handle spam with intermediate filtering rounds. Even though both nodes determine valid sets of inputs and outputs, they would require collusion to link them together. One downside is that node 2 can try to correlate submission order with the order that final coinswap inputs appeared on-chain.

Other Bandwidth Optimization

Node n could send the large Out set directly to node 1, reducing bandwidth through intermediate nodes on the backward pass by about half, but at the cost of those intermediate nodes no longer being able to check the invariant.

Reorg protection

A deep reorg is likely to invalidate a big coinswap transaction. In order to be able to still redo most of the contained self spends, node 1 could remember all valid input onion bundles for the past few days, and reprocess any that were undone in a reorg.

21 Likes

For people who don’t understand technology, it’s not easy to understand. Can someone give a brief overview?

When you send your output from tx1 through a coinswap service before spending it in tx2, nobody can link these two tx together except with a small 1/N probability where N is the number of other coinswappers that day.

So it’s like getting a size N anonimity set, where N is limited to the max number of self spends in a block (1818).

Suffice to say, this is pretty awesome!

7 Likes

If this plan can be realized, it will be great.

What impact will this have on Avg tx size?

No impact on the tx size as regular transactions stay exactly the same. What changes is that the outputs you get from a regular transaction can be sent to the coinswap service. When the coinswap service broadcasts the aggregated transaction, the new outputs should have a relatively high anonymity set.

This service comes at some fee cost, but that’s free today and will be for quite some time. It also seems preferable compared to a coinjoin service where the transaction needs to wait to be joined before it is published. This gets rid of the waiting as the actual tx can go on the chain directly to get confirmed sooner, but you pay some small fees for the self-spend coinswap.

1 Like

I’m not sure what’s better, the CoinSwap proposal, or this phrase. I like both.

That’s what I thought, but it sounded too good to be true.

Can we estimate the max size of the anonymity set based on current usage?

Can we estimate the max size of the anonymity set based on current usage?

It would be nice if some explorer could show number of non-coinbase txs per day (we know there are 1440 coinbase ones), but I’m not aware of any.

Doc couple of questions.
“nobody can link these two tx together except with a small 1/N probability where N is the number of other coinswappers that day”

Does the coinswap broadcast all of the transactions to the blockchain at the end of the day if a user participates in the coinswap? In other words the transactions get swapped in the memepool until some set time.

If so then the coinswap will take a day to finalize or could it be done at two times a day or three?

Could it just have a minimum number of transactions before it is broadcast to the chain from the memepool like after 100/N or maybe a smaller number like 20/N instead of once daily?

Does grin need to hardfork to implement the coinswap?

Coinswap broadcasts exactly 1 transaction at the end of the day (unless there are too many txa but to get the idea you can ignore such case). This transaction is composed of only self-spend encrypted transactions (together all coinswap nodes can decrypt it, this happens at the end of the day). Those encrypted transactions are never broadcasted to the other nodes by themselves, only as part of the coinswap merged transaction, so nothing changes in the mempools. If there are enough transactions coinswap could run every hour instead of once per day.

Not really because then it’s not secure (eg. Let’s say you want to stop mixing at 20 txs. When i see coinswap broadcast a tx then i can send 19 selfspends to it and i would be able to link the next (20th) tx that would be sent to the coinswap).

No, it’s an offchain solution of how to build a new valid transaction, wallets would need to change though

An interesting debate here will be if we default to sending to coinswap service:) should every wallet send to the coinswap? I’m leaning towards yes if we figure out it would work out well simply because privacy by default is what makes it fungible. We don’t want exchanges blacklisting coins that send to the coinswap or treat the users that do as higher risk.

2 Likes

I wrote a script to parse data from grinscan.net because it also shows also whether an output in a block has already been spent. Below is data on outputs and not txs since we are interested in the number of outputs for the coinswaps.

In this block range 2902 outputs were created out of which 736 were spent

This was for height in range(1078419-1440, 1078419). Now if we subtract 1440 outputs because I guess we had that many coinbase outputs, we get 1462 outputs. If every non-coinbase created output went to coinswap, we would send 1462 outputs to the coinswap service in this block range. However, since 736 of these were spent, we also need to subtract 736 because these outputs won’t pass the validation on the coinswap service because they have been spent in the meantime, so we get 1462-736=726. It’s kinda late so it’s quite possible I made a mistake somewhere.

The script I used is in this gist

3 Likes

I thought we should have this paper that @tromp posted in keybase because i think the general ideas are the same and it is very well written.

2 Likes

Also mentioned in bitcointalk Coinshuffle thread at

https://bitcointalk.org/index.php?topic=567625.msg56288711#msg56288711

3 Likes

Is the explanation in this article correct to describe “Coinswap” for a layman like me?

1 Like

No, MW CoinSwap is only superficially similar to Bitcoin CoinSwap, in that both look like self-spends to the users that end up getting unlinked. So those articles explain something which is quite different under the hood.

3 Likes

hi @tromp, a few questions:

  • what is the benefit of the swap nodes contributing random excess to the tx?
  • how are non-signing swap nodes able to malleate txs with extra randomness?
    • does the original sender need to sign after the swap is “complete”?
  • could similar goals be achieved by adding coinswap rules to all nodes?
    • something like: “X tx marked for coinswap, keep in coinswap pool for M blocks, then aggregate”
    • all txs in the “coinswap pool” could also go through rounds of propagation (via Dandelion?)
    • some metadata could be attached for how many propagation rounds a tx has gone through
    • only txs on their last propagation round get aggregated

Hopefully, the questions and ideas make sense (still learning inner workings of Grin/MW). What are your thoughts?

You mean the mixnodes? And what random excess are you referring to?

By txs, do you mean the user’s self-spend data? Mixnodes can’t change the input, excesses, or output of anyone.

No; as the proposal makes clear, the users just submit data. “No interaction needed for shuffling.”

That defeats the entire purpose of the coinswap protocol, since all i/o links are publicly visible in the coinswap pool.

Yes, the mixnodes, sorry for terminology mixup. Referring to the randomness mentioned here:

Whereas the messages in a mixnet do not change, our mixnodes transform commitments into new ones
...
Let xi,j be the randomly generated excess by which node j should additively transform the commitment in spend i: Ci,j = Ci,j-1 + xi,j * G.

I was confused by the snippet in the previous quote, it makes it sound like the mixnodes are adding random excesses to commitments. What is the point of the mixnodes if they don’t change the inputs / commitments in any way?

I thought they would need to sign again, because I understood your proposal to include random excesses from mixnodes. Apparently, that was a mistaken understanding.

All i/o links are still observable by mixnode 1, or am I wrong? My suggestion was to decentralize it to all participating nodes, but maybe that is just the Dandelion++ protocol.

If the mixnodes don’t mutate the inputs / commitments in any way, is their purpose just to perform aggregation / cut-through on accumulated txs in a given time period? If not, is there a one or two sentence explanation for how the mixnodes obfuscate the transaction graph?

Thanks for your replies