Parallel Initial Block Download (PIBD) Testing

Testing on Windows 10, four runs in total.

I encountered a forced shutdown at step 4/7. I also have some observations on the speed of the different steps and where it could possibly be improved.

Step 1) Downloading headers at 0-2 MB/s; the speed fluctuates and sometimes drops to 0 kB/s (I monitored a graph of the internet speed). Based on disk usage over time, the effective download speed is around 0.75 MB/s. Since I am connected to 24 peers and have at least 10 times the download bandwidth that is being used, I suspect there is some unnecessary waiting before new requests are made. Maybe check that new PMMRs are already requested while old ones are being validated?
Step 2) P2P protocol started, around 500 Kb according to the speed shown in the wallet, with 24 peers.
According to the wallet, the chain state for sync in step 2 is 1314 MB. It took from 17:48:11 to 17:52, so only 4 minutes for more than 1 GB: around 6 MB/s based on the disk space increase :rocket:, and around 50 MB/s according to the network monitor :rocket: Now that is the speed we also want in step 1.
Step 3) Preparing for chain-state validation, from 17:52 to 17:56; took 4 minutes XXX, CPU at 7.5%, so apparently a lot of calculations. Done.
Step 4) "Sync step 4/7: Validating chain state - range proofs", 17:56 - 17:57: grin not responding, CPU at 7.5% with one core still working at full power, high power usage on and off. I gave it some time, since it did not appear to be hanging. Looking at the logs, there are just a lot of verify_kernel_signatures entries. This all looks good, except that in the end it results in a shutdown, probably because the normal p2p connection is not maintained/prioritized. I will run another test to see if this shutdown is a one-time problem.
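
For anyone who wants to check the step 2 number: a small Python sketch of the arithmetic, using only the figures from above (1314 MB, 17:48:11 to about 17:52). The function name is just something I made up for this post, and the end time is rounded to the minute, so the result is a rough average and not a precise measurement.

```python
from datetime import datetime


def throughput_mb_per_s(size_mb: float, start: str, end: str) -> float:
    """Average throughput in MB/s between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    if elapsed <= 0:
        raise ValueError("end must be later than start")
    return size_mb / elapsed


# Step 2 figures from the post; 17:52:00 is an assumption (only "17:52" was noted).
step2 = throughput_mb_per_s(1314, "17:48:11", "17:52:00")
print(f"step 2: about {step2:.1f} MB/s")
```

This lands a bit under the "around 6 MB/s" estimate, which is consistent with the disk-based figure. It does not explain the much higher value on the network monitor.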

Update: 2nd run, everything went smoothly, around 1.5 hours for sync.
Update: 3rd run, everything went smoothly, no crashes, sync in 1 hour and 15 minutes.
Update: 4th run, enabled 16 threads instead of the default 4. Sync time was 1 hour and 37 minutes, so too many HTTP threads might even slow down syncing :thinking:
Update: 5th run, enabled 32 threads and 32 peers. Sync time was 1 hour and 18 minutes. More peers and more threads do not appear to be faster, or at least the sync is already saturated at lower numbers.
Update: 6th run, using the default 8 peers and 4 threads.

20230629 18:02:15.353 ERROR grin_p2p::peers - connected_peers: failed to get peers lock
20230629 18:02:15.353 ERROR grin_p2p::peers - connected_peers: failed to get peers lock
20230629 18:02:15.353 DEBUG grin_servers::common::adapters - send_block_request_to_peer: can’t send request to peer PeerAddr(107.174.186.153:3414), not connected
20230629 18:02:15.353 ERROR grin_p2p::peers - connected_peers: failed to get peers lock
20230629 18:02:15.353 DEBUG grin_servers::common::adapters - send_block_request_to_peer: can’t send request to peer PeerAddr(192.3.254.163:3414), not connected
20230629 18:02:15.353 ERROR grin_p2p::peers - connected_peers: failed to get peers lock
20230629 18:02:15.353 DEBUG grin_servers::common::adapters - send_block_request_to_peer: can’t send request to peer PeerAddr(45.77.150.172:3414), not connected
20230629 18:02:15.353 ERROR grin_p2p::peers - connected_peers: failed to get peers lock
20230629 18:02:15.353 DEBUG grin_servers::common::adapters - send_block_request_to_peer: can’t send request to peer PeerAddr(121.43.174.18:3414), not connected
20230629 18:02:15.353 DEBUG grin_p2p::conn - Shutting down reader connection with 121.43.174.18:3414
20230629 18:02:15.353 ERROR grin_p2p::peers - connected_peers: failed to get peers lock
20230629 18:02:15.353 DEBUG grin_servers::common::adapters - send_block_request_to_peer: can’t send request to peer PeerAddr(23.94.105.34:3414), not connected
20230629 18:02:15.353 DEBUG grin_p2p::conn - Shutting down reader connection with 23.94.105.34:3414
…More of the same, waiting then dropping peers, then a complete Shutdown!
20230629 18:02:15.356 DEBUG grin_p2p::peer - Waiting for peer PeerAddr(107.175.127.117:3414) to stop
20230629 18:02:15.356 DEBUG grin_p2p::conn - waiting for thread ThreadId(923) exit
20230629 18:02:15.356 DEBUG grin_p2p::conn - waiting for thread ThreadId(924) exit
20230629 18:02:15.356 WARN grin_servers::grin::server - Shutdown complete
RESTARTING NODE
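
To get an overview of a long DEBUG log like this, it can help to count the lines per level and module. Below is a minimal Python sketch. The regular expression is my own guess at the line layout, based only on the excerpt above (date, time, level, module, " - ", message), and the sample lines are copied from that excerpt. Lines that do not match, such as "RESTARTING NODE", are simply skipped.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt: "YYYYMMDD HH:MM:SS.mmm LEVEL module - message"
LOG_LINE = re.compile(r"^(\d{8} \d{2}:\d{2}:\d{2}\.\d{3}) (\w+) (\S+) - (.*)$")


def summarize(lines):
    """Count log lines per (level, module); lines that do not match are ignored."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            _timestamp, level, module, _message = match.groups()
            counts[(level, module)] += 1
    return counts


sample = [
    "20230629 18:02:15.353 ERROR grin_p2p::peers - connected_peers: failed to get peers lock",
    "20230629 18:02:15.353 DEBUG grin_servers::common::adapters - send_block_request_to_peer: can’t send request to peer PeerAddr(107.174.186.153:3414), not connected",
    "20230629 18:02:15.353 DEBUG grin_p2p::conn - Shutting down reader connection with 121.43.174.18:3414",
    "20230629 18:02:15.356 WARN grin_servers::grin::server - Shutdown complete",
    "RESTARTING NODE",
]

counts = summarize(sample)
for (level, module), n in counts.most_common():
    print(f"{n:5d}  {level:5s}  {module}")
```

Running the same function over the full log file (for example `summarize(open(path, encoding="utf-8"))`, with `path` pointing at the downloaded log) should show quickly how many "failed to get peers lock" errors precede the shutdown.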

After the restart, we are at step 7. This takes a bit (hanging at 99%, probably downloading the last <48h of blocks). Done after 1 hour and 30 minutes of syncing, with most of the time, more than an hour, consumed by getting headers. Not bad, but I think step 1 can be made faster by making sure there are always requests in flight and that new PMMRs are stored in some buffer.
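
To illustrate what I mean by a buffer, here is a rough Python sketch of the idea. It is not how the grin node is implemented (the node is written in Rust and I have not checked its request logic). `run_pipeline`, `fake_fetch` and `fake_validate` are hypothetical names. The point is only that a downloader thread keeps filling a bounded queue while validation runs, so the network does not sit idle waiting for the validator.

```python
import queue
import threading


def run_pipeline(segment_ids, fetch, validate, buffer_size=4):
    """Fetch segments in a background thread while the caller's thread validates them."""
    buf = queue.Queue(maxsize=buffer_size)  # bounded, so downloads cannot run far ahead
    done = object()  # sentinel marking the end of the stream

    def downloader():
        try:
            for seg_id in segment_ids:
                buf.put(fetch(seg_id))  # blocks only when the buffer is full
        finally:
            buf.put(done)

    worker = threading.Thread(target=downloader, daemon=True)
    worker.start()

    results = []
    while True:
        item = buf.get()  # blocks only when the buffer is empty
        if item is done:
            break
        results.append(validate(item))
    worker.join()
    return results


# Hypothetical stand-ins for a real network request and a real validation step.
def fake_fetch(seg_id):
    return {"id": seg_id, "data": bytes(8)}


def fake_validate(segment):
    return segment["id"]


validated = run_pipeline(range(10), fake_fetch, fake_validate)
print(validated)
```

With a single downloader the segments are validated in request order. Spreading requests over many peers would need more than one downloader and some reordering, which this sketch leaves out.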

Full log in DEBUG mode can be found here:
https://github.com/Anynomouss/grin_PIDB_testing/blob/main/grin-server-run1%2C%201.5h.log2
