Objective-Dandelion

I suspect that being able to deterministically figure out who’s most likely to fluff opens the door to a wide range of attacks.

For example, suppose I know the relay_weights of all honest nodes and observe the network. If a node with a relay_weight in the 50th percentile fluffs, then I know for a fact that only nodes with a relay_weight lower than that could have been stemming to it. I can immediately exclude half my dataset and keep narrowing down the list of suspects from there.
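For illustration, here’s a rough sketch of that exclusion step, assuming the attacker has already mapped every honest node to its relay_weight for the epoch (all names here are hypothetical):

```rust
use std::collections::HashMap;

/// Keep only the nodes that could plausibly have stemmed to `fluffer`:
/// those whose relay_weight is lower than the fluffing node's weight.
fn plausible_origins(
    relay_weights: &HashMap<String, u64>, // node id -> relay_weight for this epoch
    fluffer: &str,
) -> Vec<String> {
    let fluffer_weight = relay_weights[fluffer];
    relay_weights
        .iter()
        .filter(|(_, w)| **w < fluffer_weight)
        .map(|(id, _)| id.clone())
        .collect()
}
```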

Here’s another one: If I’m running malicious supernodes that connect to most peers on the network, what’s to stop me from setting artificially high relay_weights on my nodes? Wouldn’t this mean that the honest nodes that are connected to my malicious nodes end up forwarding their own transactions to me immediately, thus exposing themselves directly?

That’s definitely a valid criticism. As it stands, the proposal is a tradeoff between transaction origination privacy and transaction graph privacy, so depending on which we value more highly, it may or may not be worth it in its current form.

My hunch is that transaction graph privacy is more valuable, for a few reasons:

Transaction graph information

  • Requires running only one archival node
  • Comes in the form of kernels, inputs, and outputs which irrefutably prove the topology of part of the transaction graph
  • Since it is objective, multiple parties who don’t trust each other can aggregate data to form a more complete picture (i.e. if party A sees some kernels/inputs/outputs and party B sees some others, they can join them together into a graph without needing to trust that the other is honest)
  • Used in the majority of real-world blockchain forensics and deanonymization attacks

Transaction origination information

  • Requires running large numbers of nodes, nonstandard software, and some sophistication to collect
  • Comes in the form of “I saw this node send this transaction, so I believe that node originated it”, and is not objective, irrefutable evidence that can be presented to a third party (unlike kernels + inputs + outputs)
  • Since the information is not objective, you must trust its source (i.e. I can lie to you about which node I saw stem which transaction)
  • Sees limited use as an attack vector (I’m sure there are examples of this that I’m missing. They probably come in the form of targeted attacks on exchanges or large coin holders.)

Adding objective dandelion as a second phase after initial stemming would remove the tradeoff between these two sources of privacy loss, at the cost of additional complexity and an increase in relay time.

Maybe it would be better to call it something other than Objective-Dandelion to emphasize that it isn’t necessarily an alternative to Dandelion itself.

The proposal contains a provision that prevents this:

So you can pick a node ID that gives a high relay weight, but other nodes will ignore it until the next epoch, at which point the relay weight will be scrambled.

Say we simplify the various timers with something along the lines of the following.

Then nodes generate ids via H(ip_address|block_hash) every n blocks (say 10 for simplicity).

Each node relays stem transactions to the outbound peer with the lowest id. The analogy here is that stem transactions always flow downhill toward minimum peer ids.

Now, instead of deciding randomly whether to stem or fluff, we fluff if our node has the lowest id among all our outbound peers, i.e. if we are a local minimum we fluff.

Stem txs will always flow toward peers with lowest ids (sinks).
Peers with lowest ids aggregate and fluff transactions.

The epoch id is simply the block hash every n blocks (every 10th block).
Nobody gains any advantage by attempting to game the system with node ids.
Everybody on the network generates new (but deterministic and global) ids for their outbound peers based on peer IP address and the latest epoch id.

Maybe we still maintain some level of randomness to keep stem paths randomized and not strictly “flowing downhill”: relay to the lowest-id peer 90% of the time and to a random outbound peer 10% of the time.

And if the tx is our own tx then we always relay to our lowest-id peer, regardless of whether we are the current local minimum.
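To make those rules concrete, here is a minimal sketch. It uses std’s DefaultHasher as a stand-in for blake2b, and the `Peer` type and function names are hypothetical, not the real grin node APIs:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::IpAddr;

/// Deterministic, globally computable epoch id for a peer:
/// H(ip_address | epoch_block_hash), recomputed every n blocks
/// (e.g. every 10th block hash).
fn epoch_id(ip: IpAddr, epoch_block_hash: &[u8; 32]) -> u64 {
    let mut h = DefaultHasher::new();
    ip.hash(&mut h);
    epoch_block_hash.hash(&mut h);
    h.finish()
}

struct Peer {
    ip: IpAddr,
}

/// Decide where a stem tx goes this hop. `None` means we are the local
/// minimum, so we aggregate and fluff.
fn stem_target<'a>(
    our_id: u64,
    outbound: &'a [(u64, Peer)],
    tx_is_ours: bool,
    relay_random: bool, // true ~10% of the time, drawn by the caller
    random_idx: usize,  // caller-drawn index into `outbound`
) -> Option<&'a Peer> {
    let (lowest_id, lowest_peer) = outbound.iter().min_by_key(|(id, _)| *id)?;
    if tx_is_ours {
        // Our own tx: always relay to the lowest-id peer, even if we are
        // currently the local minimum.
        return Some(lowest_peer);
    }
    if *lowest_id >= our_id {
        return None; // local minimum: aggregate and fluff
    }
    if relay_random {
        // Keep stem paths from being strictly "downhill" 10% of the time.
        return Some(&outbound[random_idx % outbound.len()].1);
    }
    Some(lowest_peer)
}

fn main() {
    // The epoch id is simply the hash of every 10th block; a zeroed hash
    // stands in here.
    let epoch_hash = [0u8; 32];
    let outbound: Vec<(u64, Peer)> = ["10.0.0.1", "10.0.0.2"]
        .iter()
        .map(|s| {
            let ip: IpAddr = s.parse().unwrap();
            (epoch_id(ip, &epoch_hash), Peer { ip })
        })
        .collect();
    let our_id = epoch_id("10.0.0.3".parse().unwrap(), &epoch_hash);
    match stem_target(our_id, &outbound, false, false, 0) {
        Some(p) => println!("relay stem tx to {}", p.ip),
        None => println!("local minimum: aggregate and fluff"),
    }
}
```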

Edit: I actually thought the “we aggregate and fluff if we are the local minimum” was a new idea here. But the original post actually describes this first… (apologies to @rodarmor).

And “heaviest” (aka largest) ids make more sense here, in that these are the sinks and stem txs flow downhill toward them.


Thanks for revisiting this discussion!

Naive question:

Then nodes generate ids via H(ip_address|block_hash) every n blocks (say 10 for simplicity).

Is this gameable? I.e. would malicious nodes be able to manipulate their id to get a very “heavy” ID by somehow spoofing their IP address?

I suppose it could be mitigated the same way as @rodarmor proposes above (emphasis mine):

  • Since the epoch ID is known by all, a node could pick node IDs that had high relay weight during that epoch. To prevent this, if a node connects to a new peer it should use a randomly assigned peer ID for the current epoch.

We should be OK. They would have to manipulate their IPs every 10 minutes and then reconnect. If they were able to do this, we’d have bigger problems (banning would be ineffective, for example).

There are additional improvements we could make that would make it even harder to manipulate. We could split the IP address in half and hash the first 2 octets and the last 2 octets separately. The hash of the first 2 then becomes the primary ID (higher priority), and the hash of the last 2 becomes a tiebreaker. This makes it so that gaming the system would require having IP addresses in several regions.
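A rough sketch of that split, assuming IPv4 addresses and again using DefaultHasher as a stand-in for the real hash (names are hypothetical):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::Ipv4Addr;

/// Epoch id as a (primary, tiebreaker) pair: the hash of the first two
/// octets drives the ordering and the hash of the last two only breaks ties,
/// so gaming the ordering requires addresses in different /16 ranges.
fn split_epoch_id(ip: Ipv4Addr, epoch_block_hash: &[u8; 32]) -> (u64, u64) {
    let octets = ip.octets();
    let hash_part = |bytes: &[u8]| {
        let mut h = DefaultHasher::new();
        bytes.hash(&mut h);
        epoch_block_hash.hash(&mut h);
        h.finish()
    };
    (hash_part(&octets[..2]), hash_part(&octets[2..]))
}
```

Since Rust tuples compare lexicographically, ordering peers by this pair gives exactly the primary/tiebreaker behaviour described.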

We’re already only considering outbound peer connections as stem relays. So you’d need to not just reconnect with your carefully chosen ip address but other nodes would need to actively reach out and connect to you (on your new ip address).

Maybe we can take a more hybrid approach that maintains longer stem paths (better anonymity via Dandelion itself) while still having txs “flow downhill” to give us good opportunity for aggregation.

Rather than choosing the heaviest peer to relay stem txs to we simply choose a random heavier peer for that epoch. At each hop along the stem path txs flow downhill but they take a more random path to get to one of the final “sink” nodes for aggregation and eventual broadcast.
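A quick sketch of that selection rule (names are hypothetical and the randomness is left to the caller):

```rust
/// Pick a uniformly random outbound peer whose epoch id is heavier than ours.
/// Returns None when we are a local maximum (a sink): aggregate and fluff.
fn random_heavier_peer<P>(
    our_id: u64,
    outbound: &[(u64, P)],
    random_idx: usize, // caller-drawn random index
) -> Option<&P> {
    let heavier: Vec<&P> = outbound
        .iter()
        .filter(|(id, _)| *id > our_id)
        .map(|(_, p)| p)
        .collect();
    if heavier.is_empty() {
        None
    } else {
        Some(heavier[random_idx % heavier.len()])
    }
}
```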

Intuitively at least, it feels like this more random path gives transactions a better chance of finding a more global maximum and less chance of getting stuck at a local maximum, which would increase aggregation effectiveness.

This also has the nice benefit of paths never crossing and preventing cycles in the stem paths. Txs will always end up at an aggregation/fluff/broadcast node eventually.

We’re already only considering outbound peer connections as stem relays. So you’d need to not just reconnect with your carefully chosen ip address but other nodes would need to actively reach out and connect to you (on your new ip address).

Ah, that’s right, and yes, that makes it even harder to game.

Rather than choosing the heaviest peer to relay stem txs to we simply choose a random heavier peer for that epoch. At each hop along the stem path txs flow downhill but they take a more random path to get to one of the final “sink” nodes for aggregation and eventual broadcast.

So in a given epoch, with this approach, it will take more “stem hops” to reach the “final sink” than in the previous proposal, but the path to reach the sink would be less deterministic and could potentially lead to a better overall degree of aggregation, is that right?

This also has the nice benefit of paths never crossing and preventing cycles in the stem paths. Txs will always end up at an aggregation/fluff/broadcast node eventually.

That’s a very nice benefit that I had not thought of.


In a construction such as this one, where the sink weights are determined using publicly available data (IP + block hash), a global adversary observing the entire p2p network would be able to calculate the weights for the entire network, and thus also understand the likely flows of stem transactions. If they are able to monitor the communication, would they also be able to determine how the nodes are connected to each other by studying these flows?

What’s the impact of that? Are we concerned about this helping to build up a topology of the network? Does it matter if network observers and participants can determine where the sinks will end up being during a given epoch?

I’m not sure to be honest.
You could do the same thing with the current Dandelion++ impl, knowing that stem txs can flow randomly from any node to an adjacent outbound peer.

With this weighting approach the only additional information would be that txs flow from lighter nodes to heavier nodes (so some subset of outbound peers for a given epoch).

You could do the same thing with the current Dandelion++ impl, knowing that stem txs can flow randomly from any node to an adjacent outbound peer.

Yes, you are right, this is already possible, but the impact today might be less. I.e. today, if I understand the topology of how nodes are connected, then I can only assign a probability that a certain node will be stemming or fluffing (based on DANDELION_STEM_PROBABILITY).

On the other hand, with this proposal, if I understand the topology, then in every epoch I will know which nodes will be fluffing, and which nodes most certainly will not be. Granted, your proposal of randomly choosing a heavier-weighted node to stem to mitigates this somewhat. I don’t know by how much, or what the implications of this are.


All in all, I personally like the ideas and directions here, combined with what’s proposed in Dandelion++ Thoughts on Timers/Scheduling · Issue #2880 · mimblewimble/grin · GitHub of using blocks as timers.

We’re already making considerable deviations from the D++ spec today in our implementation, which leads to some attacks being possible.

The direction of this proposal allows nodes (and passive adversaries) to deterministically understand where fluffing will occur on the network and in which direction transactions will flow during an epoch. It’s not clear to me how these changes would lead to particularly worse privacy than what we have today. I am not convinced we are able today to adequately protect against an adversary that monitors the entire p2p network in any case, and the proposed changes do not impact this as far as I can tell.

I do see, however, that with this proposal our design would become simpler and more elegant, less error-prone, and lead to increased pre-fluff aggregation. Which is good.

It would be nice to get other people’s thoughts on the matter.

@rodarmor do you have thoughts or anything to add? After all, it was your ideas above that originally put us on this path and inspired some of the thinking. For example, what do you think of @antioch’s suggestion to use

H(ip_address|block_hash) every n blocks

over your proposed

relay_weight = blake2b(node_id || epoch_id)

He and I did discuss this a bit when I proposed using IP addresses a few months ago. See here:


Thanks for that @david. Apologies, I didn’t recall that the IP address approach had been originally proposed by you. So the attribution of that idea should go to you. :v:

No worries. Several of us have independently designed systems like Casey’s, each with various advantages and disadvantages. I’m not too worried about credit personally, but it would be great if we could all work together (like we’re doing) to make sure the best of our ideas make it into the final design.

On that note, we’re already adding an additional (powerful) tool that we can probably now toss into the mix: TOR. Nodes could advertise hidden services that could be used to submit txs, possibly even eliminating some of the more inefficient parts of Dandelion.


I’m definitely not picky about attribution or credit! I’m just happy if blockchain analysis becomes much more difficult :slight_smile:

I think using IP addresses and block hashes instead of introducing new concepts is definitely better, and I don’t see any downsides.

I like the idea of picking a random heavier peer, but I don’t have a strong or well-reasoned opinion. I think design choices like these are amenable to simulation, i.e. simulating nodes, connections, and transaction flows under different relay policies. I wrote a simulator to draw the graphs in the original post, but it’s disastrously messy code, so I wasn’t going to post it unless someone was really interested.

Regarding a potential privacy tradeoff vs vanilla Dandelion, i.e. that any kind of non-random behavior can be predicted by an attacker, it could be useful to think of this as an intermediate third state, in addition to stem and fluff, that a transaction can be in. A transaction is first stemmed using the purely random vanilla Dandelion algorithm. At some point a node flips a coin and decides that it should enter the aggregation phase, where it follows non-random rules designed to maximize aggregation, and then at some point the now hopefully aggregated transaction bundle gets fluffed.
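One way to picture the third state, as a sketch (the state names and transition probability are hypothetical, not part of the proposal):

```rust
/// Hypothetical lifecycle of a tx under the hybrid scheme described above.
enum TxPhase {
    /// Vanilla Dandelion stemming: relay to a random outbound peer.
    Stem,
    /// Non-random "flow toward the sink" phase aimed at maximizing aggregation.
    Aggregate,
    /// Broadcast to all peers.
    Fluff,
}

/// Per-hop transition, where `coin` is uniform in [0, 1) and
/// STEM_TO_AGGREGATE_PROB is a tunable parameter.
fn next_phase(current: TxPhase, coin: f64, we_are_sink: bool) -> TxPhase {
    const STEM_TO_AGGREGATE_PROB: f64 = 0.1;
    match current {
        TxPhase::Stem if coin < STEM_TO_AGGREGATE_PROB => TxPhase::Aggregate,
        TxPhase::Stem => TxPhase::Stem,
        TxPhase::Aggregate if we_are_sink => TxPhase::Fluff,
        TxPhase::Aggregate => TxPhase::Aggregate,
        TxPhase::Fluff => TxPhase::Fluff,
    }
}
```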

Of course, the introduction of a third state increases complexity with an uncertain gain.


What kind of Casey? Can I learn more?

This kind of Casey: it’s my first name. I think David is referring to my original post in this thread.


@lehnberg Do you mind creating a new keybase channel on the node team, similar to what was created for tx-building? I think that would be a wise place to continue this discussion.


Done: #tx-propagation


Reviving this old thread.

I think I found something that could be useful for Objective-Dandelion. The document shows how to build a path (chain) where each vertex receives a transaction (which can be validated) and replaces the whole output set with new outputs. This effectively unlinks inputs from outputs at each step, and is more effective when we have aggregation (replacing all outputs in a 1-2 tx doesn’t really do much). Objective-Dandelion seems appropriate here because it clearly defines a path from a starting vertex to a fluff vertex, and the path has the property that the closer a vertex is to the fluff vertex, the more likely an aggregation is to happen.

The document first describes the general idea of such a ‘cleansing chain’. There is a section on how a variant of this could be used in Objective-Dandelion.

If you use hash(latest_block | IP), how do you deal with NAT and TOR?
Edit: ah, I see, you don’t care about these duplicates.