Debugging sync issues

You’re either starting a brand new Grin testnet node, or have been running one for some time but it doesn’t seem to be updating with the latest blocks anymore. First, make sure you’re running a recent version of the code. If your build is more than a couple of weeks old, update and retry.

If you’re up-to-date, check the head of your local chain:

curl localhost:13413/v1/chain

Compare the resulting height and hash with what a block explorer reports for the head of the chain.

If you’re really late on the chain and not updating, here are a few things you can do to check what’s wrong. The following assumes you’re familiar with a few command line tools, like grep.
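To confirm that the head really is stuck rather than just behind, one option is to sample it twice and compare the two answers. This is a minimal shell sketch, not something shipped with Grin: the function names, the GRIN_API variable and the 120-second default wait are assumptions made for illustration. Only the /v1/chain endpoint and port come from the curl command above.

```shell
# Base URL of the node's REST API (same host and port as the curl command above).
GRIN_API="${GRIN_API:-localhost:13413}"

# Print the raw JSON describing the head of the local chain.
chain_head() {
    curl -s "$GRIN_API/v1/chain"
}

# Sample the head twice, $1 seconds apart (assumed default: 120), and
# report whether it changed. Returns 0 if it moved, 1 if it did not.
# Note: if the node is down, both samples are empty and count as "unchanged".
head_is_moving() {
    wait_secs="${1:-120}"
    before=$(chain_head)
    sleep "$wait_secs"
    after=$(chain_head)
    if [ "$before" != "$after" ]; then
        echo "head changed: the node is still receiving blocks"
        return 0
    fi
    echo "head unchanged after ${wait_secs}s"
    return 1
}
```

Run `head_is_moving` (or `head_is_moving 600` for a longer wait) in the shell where the functions are defined. A head that does not change over several minutes is a reason to go through the checks below.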

Check your peers

Grep your grin.log and look for the latest lines with monitor_peers. You should have at least a few healthy peers. If not:

  • Try restarting the server and see if you start acquiring newer peers.
  • Stop the server, remove the .grin/peers directory and restart. Check your healthy peer count again.
  • If you still don’t have enough healthy peers, come ask on Gitter so people can check whether your peer got banned and why.
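The steps above can be sketched as two small shell helpers. The function names and defaults are hypothetical, and the paths assume you run them from the directory that holds grin.log and .grin. As a cautious variation on the "remove" step, the second helper renames the peers directory instead of deleting it, so it can be restored. Only use it while the server is stopped.

```shell
# Print the most recent monitor_peers lines from a Grin log.
# $1: path to the log (default: grin.log in the current directory)
# $2: how many lines to show (default: 5)
latest_peer_reports() {
    grep 'monitor_peers' "${1:-grin.log}" | tail -n "${2:-5}"
}

# Move the peers directory out of the way (server must be stopped).
# $1: path to the peers directory (default: .grin/peers)
set_aside_peers() {
    dir="${1:-.grin/peers}"
    if [ ! -d "$dir" ]; then
        echo "no peers directory at $dir" >&2
        return 1
    fi
    # Timestamped suffix so repeated runs do not collide.
    mv "$dir" "$dir.bak.$(date +%s)"
}
```

For example, `latest_peer_reports grin.log 10` shows the last ten reports. After `set_aside_peers`, restart the server and check the reports again.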

Check how far along your peers are

Grep for the latest total_diff messages in grin.log. They tell you the height your peers are at and how much work they have accumulated compared to you (the vs us part). If they all have less than you, there are two possible situations:

  • You’re connected to stale peers that haven’t been kept up to date. Follow the last two steps in the previous section.
  • You found a block with a large amount of work that others don’t have yet. Your server is actually right to not accept new blocks. Keep it online and others should ask you for your chain. Also check the last section.

If you have peers with more work than you, continue to the next section.
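Since only the latest messages matter here, it can help to restrict the search to the end of the log. A minimal sketch, assuming grin.log is in the current directory; the function name and the 5000-line window are arbitrary choices for illustration.

```shell
# Show total_diff messages found in the last part of a Grin log.
# $1: path to the log (default: grin.log)
# $2: how many trailing log lines to scan (assumed default: 5000)
recent_total_diff() {
    tail -n "${2:-5000}" "${1:-grin.log}" | grep 'total_diff'
}
```

To watch new messages as they arrive instead, `tail -f grin.log | grep --line-buffered total_diff` does the same thing live.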

Check for the next block

There are longer chains out there but your node is stuck because it’s not getting the next block. You need to check why. Use Grin Explorer to look for the block that has the height immediately after yours (so if your latest block is at 50042, search for 50043). Take the first 8 characters of the hash of the next block and look for that in your grin.log.

Again, a couple of situations can arise:

  • You’ve received the block (process_block ...) but it’s been rejected with some error. Report the error on Gitter to see if others have had it and tag @ignopeverell so I’m aware of it. Unfortunately, due to a few bugs that still need to be fixed, your local store may have been corrupted by a restart. But please, do not re-sync from scratch without confirming that is the case. It may be a different error that we need to be aware of to debug it properly.
  • Your node has asked for that block (requesting blocks...) but never received it (no process_block ... with the right hash). Make sure your node has been running for at least 10 min without receiving a block first. Then remove .grin/peers and restart to see if a different set of peers helps. If it still doesn’t get the block you need, check the next step.
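The hash lookup described above can be sketched in shell as follows. The function names are hypothetical, and the sketch assumes grin.log is in the current directory and that the log prints the first 8 characters of the hash on the same line as process_block.

```shell
# Print log lines that mention a block, identified by the first
# 8 characters of its hash.
# $1: block hash copied from the explorer (full or already shortened)
# $2: path to the log (default: grin.log)
find_block_in_log() {
    short=$(printf '%s' "$1" | cut -c1-8)
    # -i: do not depend on the case of the hex digits
    grep -i -- "$short" "${2:-grin.log}"
}

# Succeed (exit status 0) if a process_block line mentions that block,
# meaning the node received it at some point.
block_was_received() {
    find_block_in_log "$1" "${2:-grin.log}" | grep -q 'process_block'
}
```

If `block_was_received <hash>` succeeds, read the lines printed by `find_block_in_log <hash>` for the error that caused the rejection. If it fails, the block never arrived, which is the second situation above.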

Are you on a dead high work fork?

Compare the block hash you have at your height with what block explorers have at that same height. If it’s different, your node got a branch with a lot of work but the peer that discovered it has gone away. This can only happen on testnet1 because of:

  • A bug that allows blocks that are extremely “lucky” to be considered the most worked.
  • The fact that miners don’t really care about the rewards and can go offline.

This situation should improve materially with testnet2 and mainnet. In the meantime you should:

  • Try restarting the server. That will reset your sync state and may help you switch.
  • Report the situation on Gitter, tagging @ignopeverell. I’d like to know how often this occurs.
  • If the work reported by your node seems close to the work reported by your peers, wait for some time until they get to a chain with more work than you. Your node should switch then.
  • Otherwise, unfortunately, you’ll have to resync.

I’m not sure what I’m doing wrong, but my blocks disappeared from me twice and my mining rewards once. I’m using Debian 9 on Testnet1.

The first time, I was half sync’d. I stopped the server and wallet with Ctrl+C in the terminal. After a few hours I restarted the wallet and server in the normal way, but my blocks were gone. I had shut down the server and wallet the same way before, and my blocks were still there when I restarted.

The second time, I sync’d completely with the network, to about 84,000 blocks. I shut down the server and restarted it with mining enabled. It picked up in the right spot and I mined some blocks, then I shut down the wallet and server for a couple of hours. When I restarted both in exactly the same way, the server said I had no blocks, but my wallet showed the block I had left on and my proper balance (locked, of course).

Ok, so now I’m really confused. So I let the server run while I type this up even though it’s not getting any blocks and it’s throwing tons of errors like this:

ERRO slog-async: logger dropped messages due to channel overflow, count: 8

When I now recheck my wallet info it says I’m on block 0 with no funds.