Fine-tuning Grin's Parallel Independent Block Download (PIBD)

Identifying bottlenecks in PIBD

I will use this topic as a public notebook to log the results of experiments to fine-tune the parameters used in PIBD, with the goal of reducing sync time for the grin rust node, and hopefully soon also for Grin++ nodes. I will modify this post and add new results for each experiment.

My motivation for this little exploration is simple. When I tested the sync speed of grin rust this morning, I was surprised how long it took to sync all chain data. I believe there is plenty of low-hanging fruit, such as parameters that can be fine-tuned without any need to change the actual messaging code for PIBD. For this post I will limit my exploration to Step 1, syncing the headers: since this step is not resource intensive while being responsible for >70% of the downloaded data, it is by far the easiest and most worthwhile sync step to optimize.

First observation: Step 1, downloading block headers, accounts for 70% of all data downloaded and took around 3 hours to download 3.2 GB of stored data (ignoring overhead). That is ~300 KB/second, which is rather underwhelming. From there I started experiments changing some of the default parameters for PIBD to see which ones cause the bottlenecks and slow download speed.
Warning: These are just some exploratory experiments performed on particular hardware with a specific internet speed. Any finding here might require much more testing to see if they do not cause problems when performed with different hardware and internet speed. E.g. slow peers and mobile nodes might require their own specific set of parameters.
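As a sanity check, the throughput figure follows directly from the measured numbers; a trivial sketch using only the values reported above:

```rust
/// Average download throughput in KB/s from total bytes and elapsed seconds.
fn throughput_kb_s(bytes: f64, seconds: f64) -> f64 {
    bytes / seconds / 1000.0
}

fn main() {
    // 3.2 GB of header data downloaded in ~3 hours, as measured above.
    let kb_s = throughput_kb_s(3.2e9, 3.0 * 3600.0);
    println!("{:.0} KB/s", kb_s); // ~296 KB/s, i.e. roughly 300 KB/s
}
```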

STEP 1: Experiment 1A - Increasing HEADER_BATCH_SIZE from 32 to 256, increasing HEADER_IO_TIMEOUT from 5000 ms to 20000 ms.

This led to a decrease in syncing time of approximately 3x for Step 1: Syncing headers. With 32 headers per request, it took around 3 hours (~180 minutes) to download all header data. With 256 headers per request, the download time was reduced to ~55 minutes. Note that this step is not resource intensive: CPU <1%, around 15 MB of RAM usage.
A 3x reduction in sync time for Step 1 from a single parameter change is promising.
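Most of the gain here plausibly comes from fewer request/response round trips. A quick sketch of the request counts, assuming a chain height of roughly 3.17 million headers (the heights that show up in my logs):

```rust
/// Number of header requests needed to cover `height` headers
/// at `batch` headers per request (ceiling division).
fn requests_needed(height: u64, batch: u64) -> u64 {
    (height + batch - 1) / batch
}

fn main() {
    let height = 3_175_000; // approximate chain height
    println!("batch  32: {} requests", requests_needed(height, 32));
    println!("batch 256: {} requests", requests_needed(height, 256));
}
```

With 32-header batches that is close to 100,000 round trips; at 256 it drops to roughly an eighth of that.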

STEP 1: Experiment 1B - Same as experiment 1A, but increasing the HEADER_IO_TIMEOUT to 60000 ms:

Results show a slight drop in download speed, so increasing the timeout further appears counterproductive. Note that I switched to much more detailed logging, which might reduce the speed a bit; also, one of the eight peers was Grin++, which does not support PIBD yet, meaning speed should drop by 12.5% (1/8).

STEP 1: Experiment 1C - Reducing the HEADER_IO_TIMEOUT again to the default of 5000 ms, increasing HEADER_BATCH_SIZE to 512

  • Time from requests to receiving is still low, e.g. 100-1000ms.
  • Some periodic warnings about headers refused by chain / not found; check this error. I did not notice this error much before. Perhaps reducing HEADER_IO_TIMEOUT is the cause; I should increase it again to 20000 ms and test.
  • A small increase in resource consumption, CPU now peaking at times at 3%
  • Not faster despite the larger request size, perhaps even slower than before. Increase HEADER_IO_TIMEOUT and test again.
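One way to compare these configurations is the per-header timeout budget they imply: the default 32 headers / 5000 ms pair allows ~156 ms per header, experiment 1A's 256 / 20000 ms still allows ~78 ms, while 512 headers against the 5000 ms default leaves under 10 ms per header, which could explain the refused/not-found headers. A small sketch of this heuristic (my own reasoning, not grin code):

```rust
/// Per-header timeout budget implied by a (timeout, batch size) pair.
/// A heuristic for comparing configurations, not grin logic.
fn timeout_per_header_ms(timeout_ms: u64, batch_size: u64) -> f64 {
    timeout_ms as f64 / batch_size as f64
}

fn main() {
    println!("default (32 / 5000 ms):       {:.1} ms/header", timeout_per_header_ms(5_000, 32));
    println!("experiment 1A (256 / 20000):  {:.1} ms/header", timeout_per_header_ms(20_000, 256));
    println!("experiment 1C (512 / 5000):   {:.1} ms/header", timeout_per_header_ms(5_000, 512));
}
```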

STEP 1: Experiment 1D - Increasing the HEADER_IO_TIMEOUT to 30000 ms, increasing HEADER_BATCH_SIZE to 512

  • Does not appear to be faster

STEP 1: Experiment 1E - Same as 1D, but increasing peers from 8 to 16 and blocking the specific peer that sends empty blocks

  • Surprisingly, this appears to be slower, especially at the start. I think it is because nodes need to be filtered: some, like an old 5.0 node that sends empty blocks, need to be filtered out.
    Speed per peer appears to drop significantly, but it is unclear why. I see quite a few 5.1 and 5.2 nodes; my guess is that they are not syncing well for some reason.
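The peer blocking mentioned above can be sketched as a simple deny-list consulted before choosing a peer for the next request. A minimal illustration, not grin's actual ban logic (the addresses are made up):

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Minimal sketch of a peer deny-list: peers observed sending empty or
/// unusable blocks are skipped when choosing a peer for the next request.
struct PeerFilter {
    banned: HashSet<SocketAddr>,
}

impl PeerFilter {
    fn new() -> Self {
        Self { banned: HashSet::new() }
    }

    fn ban(&mut self, addr: SocketAddr) {
        self.banned.insert(addr);
    }

    fn usable(&self, peers: &[SocketAddr]) -> Vec<SocketAddr> {
        peers.iter().copied().filter(|p| !self.banned.contains(p)).collect()
    }
}

fn main() {
    // Example addresses, made up for illustration.
    let peers: Vec<SocketAddr> = vec![
        "10.0.0.1:13414".parse().unwrap(),
        "10.0.0.2:13414".parse().unwrap(),
    ];
    let mut filter = PeerFilter::new();
    filter.ban(peers[0]); // e.g. the peer observed sending empty blocks
    println!("usable peers: {:?}", filter.usable(&peers));
}
```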

SUMMARY of findings

  • Step 1 is not resource intensive and accounts for 70% of block data, most worthwhile to optimize.
  • Best result so far is a 3X decrease in download time for this step.
  • Increasing the HEADER_BATCH_SIZE leads to a significant reduction in download time, but requires an increase of the HEADER_IO_TIMEOUT to avoid “header refused by chain …DB Not Found Error: BLOCK HEADER: XXXXXXXXXXXX” errors
  • Increasing HEADER_BATCH_SIZE beyond 256 does not lead to a decreased sync time on this specific hardware/network.
  • successfully hydrated (empty) block: 0003f4ac2f8b at 3175504 (v3) leads to a block-not-found error. Investigate what these “empty blocks” are. Note that it is always one single peer that causes this error; investigate:
    20250204 15:44:18.815 DEBUG grin_servers::common::adapters - Received compact_block 00002059283d at 3175499 from **162.19.139.184:13414** [out/kern/kern_ids: 1/1/0] going to process.

Useful resources:


I continued my quest here to speed up the initial sync for grin rust nodes. I had a look at Grin++:

  • Observation 1: Grin++ is way faster in initial syncing (5x-10x) compared to grin rust
  • Grin++ separates the download from the processing/validation of headers; grin rust downloads and verifies them in the same step. Perhaps there is a wait for verification before a new batch of headers is requested; I should check this.
  • Grin++ also uses a larger batch size than grin rust's default, 128 headers per request; for grin rust the optimum appears to be 256 headers
  • My Grin++ node is connected to 8 grin rust peers, meaning the transfer speed is being delivered by grin rust nodes.
    • Conclusion: the bottleneck is not in grin rust nodes replying to requests; the slowness is in the requesting or processing, which only occurs in grin rust nodes, not in Grin++.
    • The bottleneck must be in how grin rust nodes handle connections, e.g. unneeded pinging, or in the processing of requests. This is very surprising since grin rust uses the async package for its network adapter, so it should be 10-100x faster than it is. There must be a consciously or accidentally introduced bottleneck in the processing of header requests. For example, a limit introduced to protect grin nodes from this type of attack:
      Potential P2P Bandwidth attack · Issue #3744 · mimblewimble/grin · GitHub
      or perhaps to avoid slow peers pushing the CPU to 100% capacity. Such a limit would make sense, but it should not be used in the header sync IMO, or at least be reduced, since this step is not CPU intensive; arguably, having close to 100% CPU usage during initial sync would also be acceptable.
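If the grin rust slowdown is indeed a wait-for-verify between batches, the Grin++-style separation could be sketched as a two-stage pipeline: one thread keeps requesting header batches while another validates them as they arrive. A toy illustration (the fetch/validate functions are stand-ins, not grin APIs):

```rust
use std::sync::mpsc;
use std::thread;

// Stand-ins for network fetch and header validation; not grin APIs.
fn fetch_batch(start: u64, size: u64) -> Vec<u64> {
    (start..start + size).collect() // pretend these are header heights
}

fn validate_batch(batch: &[u64]) -> bool {
    !batch.is_empty() // placeholder for real header validation
}

/// Run `n_batches` of `size` headers through a two-stage pipeline:
/// one thread keeps requesting while the main thread validates.
fn run_pipeline(n_batches: u64, size: u64) -> usize {
    let (tx, rx) = mpsc::channel::<Vec<u64>>();

    // Downloader: requests the next batch without waiting for validation.
    let downloader = thread::spawn(move || {
        for i in 0..n_batches {
            tx.send(fetch_batch(i * size, size)).unwrap();
        }
    });

    // Validator: consumes batches as they arrive.
    let mut validated = 0;
    for batch in rx {
        assert!(validate_batch(&batch));
        validated += batch.len();
    }
    downloader.join().unwrap();
    validated
}

fn main() {
    println!("validated {} headers", run_pipeline(4, 256));
}
```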

@Yeastplume Is there a conscious limiter put on the header sync speed, or is there somewhere a real bottleneck that I should search for?

  • Important to remember: running a grin rust node with a high level of logging really slows down the node, so for benchmarking, logging should be set to Error, not Trace or Debug.

Bottleneck found :raising_hands: :party_popper:.
There is a very, very simple cause for the difference between grin rust and Grin++, so simple that I initially did not look for it. The grin rust node tries to minimize the burden on your system: it has a very minimal memory footprint (10-20 MB and very low CPU usage while running header sync), and it achieves this by using only 4 threads for the connections you make, plus some sleep timers to avoid maxing out 4 cores. Grin++ on the other hand uses up to 50 threads with maximum core load. Quite the difference.
Conclusion of the initial sync speed-up quest:
The grin rust node should allow more connections: 1) by default, 2) by detecting the system's capabilities, or 3) by adding a number_threads field to grin-server.toml.
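Option 2 could be as simple as deriving the thread count from `std::thread::available_parallelism()`, clamped so small systems are not overwhelmed. A sketch of the idea; the bounds are my guesses, and `number_threads` is only a proposed option, not an existing grin setting:

```rust
use std::thread;

/// Derive a connection/worker thread count from the machine's capabilities
/// instead of a hard-coded 4. Bounds are illustrative: a floor of 4 matches
/// grin's current behaviour, a cap of 50 matches Grin++'s observed maximum.
fn default_worker_threads() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(4) // fall back to grin's current thread count
        .clamp(4, 50)
}

fn main() {
    println!("using {} worker threads", default_worker_threads());
}
```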

Follow up experiments:

  • I will experiment a bit to see whether increasing the number of threads negatively impacts systems like a Raspberry Pi.
  • In my experiment I also decreased the timeout from 10 seconds to 1 second; I should check whether this influenced the test results.
  • I have to check that this change does not negatively affect other sync steps, such as the PIBD syncing that comes after the header sync.
  • In general, find a balance between increasing speed and keeping grin's footprint minimal and kind to all systems.