Grin node times out after running successfully for some time

I’ve noticed an issue running my Grin node: after about 12 hours, checking my wallet info fails with the following:

$ sudo -u grin /bin/sh -c 'cat wallet-password.txt | grin-wallet info'

Password: 20220831 13:26:43.956 WARN grin_wallet_libwallet::api_impl::owner_updater - Scanning - 0% complete

20220831 13:27:03.958 ERROR grin_wallet_impls::node_clients::http - Error calling get_pmmr_indices: Request error: Cannot make request: error sending request for url (http://127.0.0.1:3413/v2/foreign): operation timed out

Wallet command failed: LibWallet Error: Client Callback Error: Error calling get_pmmr_indices: Request error: Cannot make request: error sending request for url (http://127.0.0.1:3413/v2/foreign): operation timed out

My Grin node appears to time out after 12 to 24 hours. The logs seem okay…

20220831 12:21:41.234 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(156.226.20.225:3414))

20220831 12:31:51.278 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(35.181.35.93:3414))

20220831 12:42:01.320 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(23.94.220.239:3414))

20220831 12:52:11.362 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(35.181.35.93:3414))

20220831 13:02:21.405 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: false (90%), relay: Some(PeerAddr(65.21.40.28:3414))

20220831 13:12:31.454 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(192.3.254.163:3414))

20220831 13:22:41.500 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(156.226.20.225:3414))

20220831 13:32:51.541 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(156.226.20.225:3414))

20220831 13:43:01.583 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(18.204.166.78:3414))

20220831 13:53:11.632 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(156.226.20.225:3414))

20220831 14:03:21.674 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: false (90%), relay: Some(PeerAddr(23.94.120.30:3414))

Any ideas?

Just checking: are you looking at the logs for the moment the event happened, in local time? The node displays UTC, but the logs are written in the OS-configured time zone.

Are you running the beta version with PIBD, or just a regular version of the node?

Ah I’m using 5.2.0-alpha.1

Do you think the instability of an alpha is what I’m experiencing?

I see a few issues related to hanging nodes running 5.2.0-alpha.

Going to downgrade to 5.1.2 and see if that fixes the issue here

I faced the same issue running 5.1.2. As a stopgap (no time to look for a permanent solution right now), I scheduled a restart every 12 hours in the crontab. I’m still interested in a permanent fix, though :)
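
For anyone wanting the same stopgap, a crontab entry along these lines would do a blind restart every 12 hours. This is only a sketch: it assumes the node runs as a systemd service named `grin` (a hypothetical unit name, nothing in this thread confirms it) and that the entry goes into root’s crontab. Adjust the command to however your node is actually started.

```
# Assumption: the node is managed by a systemd unit called "grin".
# Runs at 00:00 and 12:00 every day.
# m  h     dom mon dow  command
0    */12  *   *   *    systemctl restart grin
```

It restarts the node whether or not it is hung, so it only hides the problem.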

Shoot, you are correct @Cadmus! I’m still having issues with 5.1.2.

There’s a different error message when checking the wallet now, though…

20220902 19:37:26.678 ERROR grin_wallet_impls::node_clients::http - Outputs by id failed: Request error: Cannot make request: error sending request for url (http://127.0.0.1:3413/v2/foreign): operation timed out
20220902 19:37:26.678 WARN grin_wallet_libwallet::api_impl::owner_updater - Updater Thread unable to contact node

It means the node cannot be contacted. Restarting the node will probably fix it. These kinds of stability issues are a problem.
They have recently started being worked on, but there are no fixes yet.
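
Since the wallet errors in this thread all come down to getting no answer from `http://127.0.0.1:3413/v2/foreign`, one way to tell a hung node from a stopped one is to probe that endpoint directly. Below is a rough sketch using `curl`. The function names, the 10-second timeout and the `NODE_URL` / `PROBE_TIMEOUT` variables are my own choices for illustration; `get_tip` is a method of the node’s v2 foreign JSON-RPC API. If the node enforces an API secret it may answer with an authentication error, and this probe still counts that as alive, because it only checks whether anything answers in time.

```shell
#!/bin/sh
# Liveness probe for a local Grin node (illustrative sketch, not an official tool).
NODE_URL="${NODE_URL:-http://127.0.0.1:3413/v2/foreign}"  # address taken from the wallet errors
PROBE_TIMEOUT="${PROBE_TIMEOUT:-10}"                       # seconds; arbitrary choice

# Returns 0 if the node's HTTP API sends any response within the timeout,
# otherwise returns curl's non-zero exit status.
node_is_responsive() {
    curl --silent --output /dev/null --max-time "$PROBE_TIMEOUT" \
         --header 'Content-Type: application/json' \
         --data '{"jsonrpc":"2.0","method":"get_tip","params":[],"id":1}' \
         "$NODE_URL"
}

# Translate curl's exit status into a short human-readable description.
describe_probe_status() {
    case "$1" in
        0)  echo "node answered" ;;
        7)  echo "connection refused (node not listening)" ;;
        28) echo "timed out (node hung?)" ;;
        *)  echo "curl failed with status $1" ;;
    esac
}
```

Running `node_is_responsive; describe_probe_status $?` prints one of the descriptions. A timeout points to a node that is still running but hung, while a refused connection means nothing is listening on the port at all.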

I got this error on testnet. Do I need to resync?

20220914 10:32:04.587 ERROR grin_servers::grin::sync::state_sync - error validating PIBD state: Invalid Root
Cause: Unknown
Backtrace:
20220914 10:32:04.610 ERROR grin_servers::grin::sync::state_sync - PIBD Reported Failure - Restarting Sync
20220914 10:32:07.944 WARN grin_chain::txhashset::txhashset - rewind_single_block: 1 output_pos entries missing for: 16e40eeec31d at 1409698
20220914 10:32:07.964 WARN grin_chain::txhashset::txhashset - rewind_single_block: 1 output_pos entries missing for: 79a6e0e302dc at 1408282
20220914 10:32:07.974 WARN grin_chain::txhashset::txhashset - rewind_single_block: 1 output_pos entries missing for: 278659fa7b1e at 1407545
20220914 10:32:07.974 WARN grin_chain::txhashset::txhashset - rewind_single_block: 1 output_pos entries missing for: 72b00b291532 at 1407540
20220914 10:32:07.974 WARN grin_chain::txhashset::txhashset - rewind_single_block: 1 output_pos entries missing for: 4ba767e6ae89 at 1407527
20220914 10:32:07.998 WARN grin_chain::txhashset::txhashset - rewind_single_block: 1 output_pos entries missing for: 596b5c839c54 at 1405853
20220914 10:32:07.999 WARN grin_chain::txhashset::txhashset - rewind_single_block: 1 output_pos entries missing for: 564dfb89f72e at 1405782
20220914 10:32:08.000 WARN grin_chain::txhashset::txhashset - rewind_single_block: 1 output_pos entries missing for: 826943cc593d at 1405751
20220914 10:32:08.000 WARN grin_chain::txhashset::txhashset - rewind_single_block: 1 output_pos entries missing for: 3b349fb0f0b4 at 1405749
20220914 10:32:08.000 WARN grin_chain::txhashset::txhashset - rewind_single_block: 4 output_pos entries missing for: 4c01fc6f8290 at 1405737
20220914 10:32:08.000 WARN grin_chain::txhashset::txhashset - rewind_single_block: 2 output_pos entries missing for: 2738884181d3 at 1405735
20220914 10:32:08.000 WARN grin_chain::txhashset::txhashset - rewind_single_block: 2 output_pos entries missing for: 69ae8ac2af6a at 1405729
20220914 10:32:08.000 WARN grin_chain::txhashset::txhashset - rewind_single_block: 2 output_pos entries missing for: 9abc0695c86f at 1405727
20220914 10:32:08.000 WARN grin_chain::txhashset::txhashset - rewind_single_block: 2 output_pos entries missing for: 324a15b90976 at 1405721
20220914 10:32:08.001 WARN grin_chain::txhashset::txhashset - rewind_single_block: 5 output_pos entries missing for: 71a852baeb09 at 1405717
20220914 10:32:08.001 WARN grin_chain::txhashset::txhashset - rewind_single_block: 2 output_pos entries missing for: 492618c1f3fc at 1405715
20220914 10:32:08.001 WARN grin_chain::txhashset::txhashset - rewind_single_block: 2 output_pos entries missing for: 318bd0b9df45 at 1405711
20220914 10:32:08.001 WARN grin_chain::txhashset::txhashset - rewind_single_block: 2 output_pos entries missing for: 9168c070c654 at 1405698
20220914 10:32:08.001 WARN grin_chain::txhashset::txhashset - rewind_single_block: 1 output_pos entries missing for: 6426b1cf1a6b at 1405685
20220914 10:32:08.002 WARN grin_chain::txhashset::txhashset - rewind_single_block: 1 output_pos entries missing for: 3fab3f84abc0 at 1405659
20220914 10:32:08.002 WARN grin_chain::txhashset::txhashset - rewind_single_block: 2 output_pos entries missing for: 6c1a7e538362 at 1405656
20220914 10:32:08.002 WARN grin_chain::txhashset::txhashset - rewind_single_block: 2 output_pos entries missing for: 7b7d935382e6 at 1405653
20220914 10:32:08.002 WARN grin_chain::txhashset::txhashset - rewind_single_block: 2 output_pos entries missing for: 3d03d768f67a at 1405649
20220914 10:32:08.002 WARN grin_chain::txhashset::txhashset - rewind_single_block: 2 output_pos entries missing for: 2415a6f4ef5c at 1405627
20220914 10:32:08.002 WARN grin_chain::txhashset::txhashset - rewind_single_block: 2 output_pos entries missing for: 82994c53f3e0 at 1405621
20220914 10:32:08.086 ERROR grin_servers::grin::sync::state_sync - pibd_sync restart: chain reset error = Store Error: NotFoundErr("Block with hash: 7568a9de8739"), reason: DB Not Found Error: Block with hash: 7568a9de8739
Cause: DB Not Found Error: Block with hash: 7568a9de8739
Backtrace:

My node appears to hang all the time. I don’t think it runs for more than 10 minutes before checking my balance results in the following error:

ERROR grin_wallet_impls::node_clients::http - Error calling get_tip: Request error: Cannot make request: error sending request for url (http://127.0.0.1:3413/v2/foreign): error trying to connect: tcp connect error: Connection refused (os error 111)

Is anyone else experiencing this? I must be doing something wrong here. I’m using the official node/wallet.

Do you see anything unusual in the node logs?
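
When looking for anything unusual, it can help to strip out the routine INFO lines first. Here is a small sketch that assumes the log line layout seen in this thread (date, time, level, module). The function name is made up for illustration, and the log path in the usage note is a placeholder.

```shell
#!/bin/sh
# Print only the WARN and ERROR lines from a Grin log file.
# Expects lines shaped like: "20220930 12:13:49.445 WARN grin::cmd::server - ..."
grin_log_problems() {
    grep -E '^[0-9]{8} [0-9:.]+ (WARN|ERROR) ' "$1"
}
```

For example, `grin_log_problems /path/to/grin-server.log | tail -n 50` shows the most recent warnings and errors. Note that some WARN lines are just normal startup messages (such as "Grin server started."), so not every hit is a problem.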

Here are the logs, starting from when I stopped the hung node and ending with the node hung again:

20220930 12:13:49.445 WARN grin::cmd::server - Received SIGINT (Ctrl+C) or SIGTERM (kill).
20220930 12:13:49.495 INFO grin_api::rest - API server has been stopped
20220930 12:13:49.643 INFO grin_servers::grin::server - connect_and_monitor thread stopped
20221001 11:36:03.573 INFO grin_util::logger - log4rs is initialized, file level: Info, stdout level: Warn, min. level: Info
20221001 11:36:03.573 INFO grin - Using configuration file at /var/lib/grin/grin-server.toml
20221001 11:36:03.573 INFO grin - This is Grin version 5.1.2 (git v5.1.2), built for x86_64-unknown-linux-gnu by rustc 1.59.0 (9d1b2106e 2022-02-23).
20221001 11:36:03.574 INFO grin - Chain: Mainnet
20221001 11:36:03.574 INFO grin - Accept Fee Base: 500000
20221001 11:36:03.574 INFO grin - Future Time Limit: 300
20221001 11:36:03.574 INFO grin - Feature: NRD kernel enabled: false
20221001 11:36:03.574 WARN grin::cmd::server - Starting GRIN w/o UI...
20221001 11:36:03.581 INFO grin_servers::grin::server - Starting server, genesis block: 40adad0aec27
20221001 11:36:15.360 INFO grin_servers::grin::server - Starting rest apis at: 127.0.0.1:3413
20221001 11:36:15.365 WARN grin_api::handlers - Starting HTTP Node APIs server at 127.0.0.1:3413.
20221001 11:36:15.366 WARN grin_api::handlers - HTTP Node listener started.
20221001 11:36:15.366 INFO grin_servers::grin::server - Starting dandelion monitor: 127.0.0.1:3413
20221001 11:36:15.366 WARN grin_servers::grin::server - Grin server started.
20221001 11:36:15.368 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: None
20221001 11:36:18.344 INFO grin_servers::common::adapters - Received 32 block headers from 194.41.36.114:3414
20221001 11:36:27.284 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:27.522 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:27.573 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:27.760 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:27.803 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:27.847 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:27.999 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:28.040 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:28.131 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:28.175 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:28.238 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:28.370 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:28.410 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:28.478 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:28.610 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:28.848 INFO grin_servers::common::adapters - Received 32 block headers from 103.82.25.152:3414
20221001 11:36:29.003 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.074 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.111 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.148 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.190 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.224 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.263 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.298 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.339 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.377 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.413 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.448 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.486 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.557 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.622 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.656 INFO grin_servers::common::adapters - Received 32 block headers from 137.184.46.144:3414
20221001 11:36:29.831 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:29.955 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:29.995 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.080 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.118 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.204 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.239 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.328 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.361 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.393 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.453 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.485 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.576 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.612 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.675 INFO grin_servers::common::adapters - Received 32 block headers from 91.245.227.209:3414
20221001 11:36:30.709 INFO grin_servers::common::adapters - Received 23 block headers from 91.245.227.209:3414
20221001 11:37:43.366 INFO grin_servers::grin::sync::syncer - synchronized at 1978502799163505 @ 1944258 [000331497eb2]
20221001 11:46:25.427 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(103.82.25.152:3414))
20221001 11:56:35.474 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: Some(PeerAddr(103.82.25.152:3414))

I saw this issue, which recommends using the “pibd_impl” branch.

I’m going to build the latest in this branch and give it a try.

One question: I attempted to build a debug build, but ran into an error…

$ cargo build --debug
error: Found argument '--debug' which wasn't expected, or isn't valid in this context

What’s the correct way to compile a debug build? Thanks!

You need to enable debug mode in your TOML file.

To install the pibd_impl branch:

  1. git clone grin repo
  2. git checkout pibd_impl
  3. cargo build (you need to have all building dependencies installed!)
  4. there is a new grin binary now (built from latest pibd_impl) in the target/debug folder; go there and run ./grin

This is the process I used. If step 1 doesn’t work, use the full URL of the Grin repository after “clone” instead.
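
The steps above can be sketched as a small shell function. Some assumptions: the URL is the upstream Grin repository on GitHub as far as I know (substitute your own fork or mirror if needed), and the function name and the two variables are only illustrative. On the debug question, cargo has no `--debug` flag. A plain `cargo build` already uses the dev (debug) profile and writes to `target/debug`, while `cargo build --release` writes to `target/release`.

```shell
#!/bin/sh
# Clone Grin, switch to the requested branch and make a debug build (sketch).
GRIN_REPO_URL="${GRIN_REPO_URL:-https://github.com/mimblewimble/grin.git}"  # assumed upstream URL
GRIN_BRANCH="${GRIN_BRANCH:-pibd_impl}"                                     # branch named in this thread

# Runs in a subshell so the caller's working directory is left unchanged.
# Stops at the first failing step.
build_grin_branch() (
    git clone "$GRIN_REPO_URL" grin &&
    cd grin &&
    git checkout "$GRIN_BRANCH" &&
    cargo build &&
    echo "built: $(pwd)/target/debug/grin"
)
```

Call `build_grin_branch` from the directory where the checkout should go. It needs git, a Rust toolchain and the other build dependencies installed, as mentioned in step 3.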