Recently there was a fault at one of the sites of the hosting provider we use at Telos UK. This caused an unclean shutdown of a couple of our nodeos instances, leaving them in a “Database dirty flag set” state. I attempted to replay from a snapshot file with the existing blocks.log file in place, as described in Replay from Snapshot with Blocks Log in my prior post Nodeos Replays, but this time the replay failed with the following error messages in the log:
Error unpacking field new_producers {"field":"new_producers"} nodeos raw.hpp:367 operator()
error unpacking eosio::chain::signed_block {"type":"eosio::chain::signed_block"} nodeos raw.hpp:644 unpack
So, I removed the blocks.log file and tried again. This succeeded, but of course it meant that the blocks required to take us from the snapshot back into synchronisation with the chain would have to be entirely sourced via the P2P (peer-to-peer) network rather than being partially replayed from the blocks.log file.
At Telos UK, we run a main and standby pair of servers in different data centres. Each server runs two instances of nodeos, a block producing node and a kind of proxy node that synchronises with the peer nodes of other Telos block producers and provides our external API service. Our block producing nodes only synchronise in a mesh with our own internal nodes.
The snapshot that I used to recover both nodes at the affected site was a couple of months old. I set the synchronisation going late at night and checked on its progress the next morning, about eight hours later. To my surprise, whereas the block producing node had already caught up with the chain, the proxy node had only received about a week’s worth of blocks – ballpark, at least 180 times the normal block production rate for the block producing node versus only about 8 times for the proxy node.
I take care not to include hosts in the P2P network that produce Net Plugin errors in our nodeos logs, but I noted that whereas the log for our block producing node reported nothing other than requests for ranges of blocks during catch up, these were interspersed with various other Net Plugin info messages in the proxy node’s log, which, though I wasn’t confident that I understood them, suggested all was not well.
I also temporarily made the proxy node in catchup unavailable for incoming P2P connections to see if this made any difference to its catchup speed. It did not.
This set me thinking: could it be that:
- The info messages that I saw in the log for our proxy node are indeed indicative that all is not well, as they appeared to be?
- Dealing with whatever it is that these info messages are reporting explains the poor catchup performance of our proxy node relative to our block producing node?
So, I set about exploring these questions by synchronising a test node with the Telos testnet. I believe that what I found out may be useful to the wider Telos community and indeed other communities who manage chains based on EOSIO. As always, comments that advance, expand on or correct the learning in this post are most welcome.
Contents
- Starting a Testnet Node from Genesis
- Using the EOSIO Net API Plugin
- Net Plugin Info Messages During Catchup
- Conclusions
Starting a Testnet Node from Genesis
I repeated the steps described under Start from Genesis and Successful Shutdown in my prior post Nodeos Replays – note, again using my Docker Compose EOSIO services, which is therefore reflected throughout what follows in this post. The successful shutdown and restart of nodeos using the command docker-compose up nodeos was done to bring nodeos up with convenient cleos access, again via my Docker Compose EOSIO services.
As described previously, I used the “Get Peers” tool in my Telos toolbox to generate a fresh list of p2p-peer-address values for the Telos testnet and added the statement to enable the EOSIO net API plugin, producing the config.ini file below:
plugin = eosio::net_api_plugin
# p2p-peer-address list generated at 06:17:32 on 21-Oct-21
# Chain id=1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f
# cryptosuviio
p2p-peer-address=testnet.telos.cryptosuvi.io:9876
# telosunlimit
p2p-peer-address=p2p.testnet.telosunlimited.io:9876
# blindblocart
p2p-peer-address=testnet.blindblocart.io:9876
# persiantelos
p2p-peer-address=mainnet.persiantelos.com:8880
# telosuknodes
p2p-peer-address=testnet-p2p.telosuk.io:9877
# eosphereiobp
p2p-peer-address=peer1-telos-testnet.eosphere.io:9876
# caleosblocks
p2p-peer-address=testnet.telos.caleos.io:9879
# telosgermany
p2p-peer-address=telosgermany-testnet.genereos.io:9876
# telosarabia1
p2p-peer-address=testnet2p2p.telosarabia.net:9876
# katalyotelos
p2p-peer-address=tlos-p2p-test.katalyo.com:29877
# bp.boid
p2p-peer-address=boid-telos-testnet-p2p.animusystems.com:3535
p2p-peer-address=boid-telos-testnet-p2p.animusystems.com:5050
# telosboidbp1
p2p-peer-address=boid-telos-testnet-p2p.animusystems.com:3535
p2p-peer-address=boid-telos-testnet-p2p.animusystems.com:5050
# edeniaedenia
p2p-peer-address=telos-testnet.eosio.cr:9879
# 3boidanimus3
p2p-peer-address=boid-telos-testnet-p2p.animusystems.com:3535
p2p-peer-address=boid-telos-testnet-p2p.animusystems.com:5050
While I have noted the inclusion of the line plugin = eosio::net_api_plugin, I should perhaps highlight the following “Caution” notice on the Net API Plugin page of the EOSIO version 2.0 manual:
This plugin exposes endpoints that allow management of p2p connections. Running this plugin on a publicly accessible node is not recommended as it can be exploited.
Of course, there is no issue enabling this plugin for my isolated test node, but we cannot enable it on our mainnet proxy nodes since they combine both peering and API functions – see Conclusions.
In the usage instructions for the “Get Peers” tool in my Telos toolbox it says:
Note that the report does not attempt nodeos communication with a P2P endpoint as a validation check for the p2p-peer-address entries. Its tests include, for example, validation by telnet to the advertised address, but they do not include anything based on the relevant EOSIO communication protocols. Thus, if an advertised p2p-peer-address passes all the tests that this tool applies, this is not a guarantee that synchronisation via that address will be possible.
In light of this, I always monitor the error messages appearing in the nodeos log when starting a node up. I do this for my test node now using:
docker logs --follow eosio-nodeos 2>&1 >/dev/null | grep error
I monitor this for a few minutes as I’ve observed before that there can be initial connection errors for peers that resolve themselves after a short period of time. Before long, I only see this report for a single peer, appearing every thirty seconds in the log:
error 2021-10-21T06:35:06.046 net-1 net_plugin.cpp:2388 operator() ] incoming message length unexpected (1347703880)
error 2021-10-21T06:35:06.046 net-1 net_plugin.cpp:2437 operator() ] Closing connection to: mainnet.persiantelos.com:8880
Fortunately, log entries in the “error” category are usually highlighted in red in the console and so are easy to spot in the nodeos logs even without filtering them.
I believe that what’s happening here is that the “Get Peers” tool in my Telos toolbox:
- Finds a producer record on the Telos testnet chain for persiantelos, with a URL indicated as https://persiantelos.com.
- Does a GET to https://persiantelos.com/chains.json, which returns the following JSON:
{ "chains": { "4667b205c6838ef70ff7988f6e8257e8be0e1284a2f59699054a018f743b1d11": "/bp.json", "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f": "/bp.json" } }
- Matches the Telos testnet chain id (1eaa082…), which directs to /bp.json. As we can see, the entries for both Telos mainnet and Telos testnet in this file direct to /bp.json, which I think is the source of the error here.
- Does a GET to https://persiantelos.com/bp.json and retrieves the values for p2p_endpoint in the array of nodes in the JSON returned, which yields the single value mainnet.persiantelos.com:8880.
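The resolution steps above can be sketched in a few lines of code. This is a minimal illustration, not the actual “Get Peers” tool: the chains.json content is the real response shown above, but the bp.json body here is hypothetical (only the fields relevant to p2p endpoint extraction), and the HTTP GET is stubbed out with a callable.

```python
import json

TESTNET_CHAIN_ID = "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f"

def resolve_p2p_endpoints(chains_json, fetch_bp_json, chain_id):
    """Resolve p2p endpoints for one chain from a producer's chains.json.

    chains_json   -- body of GET <producer-url>/chains.json
    fetch_bp_json -- callable taking the path found in chains.json and
                     returning the bp.json body (an HTTP GET in the real tool)
    """
    chains = json.loads(chains_json)["chains"]
    path = chains.get(chain_id)
    if path is None:
        return []  # this producer does not advertise the requested chain
    bp = json.loads(fetch_bp_json(path))
    # Collect p2p_endpoint from every node entry that advertises one.
    return [n["p2p_endpoint"] for n in bp.get("nodes", []) if n.get("p2p_endpoint")]

# The chains.json actually returned by persiantelos (both ids map to /bp.json):
chains_json = """{"chains": {
  "4667b205c6838ef70ff7988f6e8257e8be0e1284a2f59699054a018f743b1d11": "/bp.json",
  "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f": "/bp.json"}}"""

# Hypothetical bp.json served at /bp.json:
bp_json = '{"nodes": [{"node_type": "seed", "p2p_endpoint": "mainnet.persiantelos.com:8880"}]}'

endpoints = resolve_p2p_endpoints(chains_json, lambda path: bp_json, TESTNET_CHAIN_ID)
print(endpoints)  # ['mainnet.persiantelos.com:8880']
```

Because both chain ids point at the same /bp.json file, a testnet lookup ends up returning the mainnet endpoint – which is consistent with the error being reported.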
I believe that, as the name of this p2p_endpoint would suggest, it’s actually a peering node for Telos mainnet, and that is why my attempt to connect to it from a testnet node is producing Net Plugin errors. I should point out here that:
- The Telos block producer community works collaboratively, not in competition, for the good of Telos.
- This is the Telos testnet, not mainnet, and indeed this post would not have been possible had I not encountered both a p2p_endpoint that gave me Net Plugin errors and another that gave me Net Plugin issues manifesting as info messages.
- We can see that the inherent resilience built into EOSIO readily overcomes any such issues.
Using the EOSIO Net API Plugin
Since my test node has the EOSIO Net API plugin enabled, I am able to use cleos to get the status of all peers connected to it:
docker-compose run cleos net peers
This produces the following output:
[{ "peer": "testnet.telos.cryptosuvi.io:9876", "connecting": false, "syncing": false, "last_handshake": { "network_version": 1210, "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "node_id": "0d6400bcaa37f115a4c7f64da4ff4f54b6ae9f07e5fbe44cba5efe2f8171a86e", "key": "EOS1111111111111111111111111111111114T1Anm", "time": "1634798269583622709", "token": "0000000000000000000000000000000000000000000000000000000000000000", "sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "Ubuntu-1804-bionic-64-minimal:9876 - 0d6400b", "last_irreversible_block_num": 136399500, "last_irreversible_block_id": "08214a8c78ffc5939a2cb2b8da0eaa33e8ecb3e8bd92f39d45a15c6fa14b9ecc", "head_num": 136399828, "head_id": "08214bd487895288817e6e27fe4b8fdb8916bec564cddcff4d837f853c530ae0", "os": "linux", "agent": "\"Cryptosuvi.io\"", "generation": 1 } },{ "peer": "p2p.testnet.telosunlimited.io:9876", "connecting": false, "syncing": false, "last_handshake": { "network_version": 1210, "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "node_id": "686ec1a2c16abda10be4688ac8692a3150cc99a910237068b5c205fdac38e927", "key": "EOS1111111111111111111111111111111114T1Anm", "time": "1634798269716439000", "token": "0000000000000000000000000000000000000000000000000000000000000000", "sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "condor2233:9876 - 686ec1a", "last_irreversible_block_num": 136399500, "last_irreversible_block_id": "08214a8c78ffc5939a2cb2b8da0eaa33e8ecb3e8bd92f39d45a15c6fa14b9ecc", "head_num": 136399828, "head_id": "08214bd487895288817e6e27fe4b8fdb8916bec564cddcff4d837f853c530ae0", "os": "linux", "agent": "\"telosunlimit\"", "generation": 1 } },{ "peer": "testnet.blindblocart.io:9876", "connecting": false, "syncing": false, "last_handshake": { "network_version": 1210, "chain_id": 
"1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "node_id": "fc0c4915fbb5b08c123c843d96eab1386d7073c9fde36143145b872928ee564b", "key": "EOS1111111111111111111111111111111114T1Anm", "time": "1634798309643881526", "token": "0000000000000000000000000000000000000000000000000000000000000000", "sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "BBA-TEST.server.com:9876 - fc0c491", "last_irreversible_block_num": 136399572, "last_irreversible_block_id": "08214ad4ef030578f021e840f1ef4ac9ff76a1b7c2d0b8068ed85e1d875b0bfb", "head_num": 136399908, "head_id": "08214c24709d309b228fc89447393c4eddadc97aada563c8ee12ae2508f45184", "os": "linux", "agent": "EOS Test Agent", "generation": 8 } },{ "peer": "mainnet.persiantelos.com:8880", "connecting": false, "syncing": false, "last_handshake": { "network_version": 0, "chain_id": "0000000000000000000000000000000000000000000000000000000000000000", "node_id": "0000000000000000000000000000000000000000000000000000000000000000", "key": "EOS1111111111111111111111111111111114T1Anm", "time": 0, "token": "0000000000000000000000000000000000000000000000000000000000000000", "sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "", "last_irreversible_block_num": 0, "last_irreversible_block_id": "0000000000000000000000000000000000000000000000000000000000000000", "head_num": 0, "head_id": "0000000000000000000000000000000000000000000000000000000000000000", "os": "", "agent": "", "generation": 0 } },{ "peer": "testnet-p2p.telosuk.io:9877", "connecting": false, "syncing": false, "last_handshake": { "network_version": 1206, "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "node_id": "9beeaf6e9e957c0c65df7df3f3ea272b7690a4fe7c035bec47fba6c1abe48a07", "key": "EOS1111111111111111111111111111111114T1Anm", "time": "1634798269803860000", "token": "0000000000000000000000000000000000000000000000000000000000000000", 
"sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "telostestnet01:9877 - 9beeaf6", "last_irreversible_block_num": 136399500, "last_irreversible_block_id": "08214a8c78ffc5939a2cb2b8da0eaa33e8ecb3e8bd92f39d45a15c6fa14b9ecc", "head_num": 136399828, "head_id": "08214bd487895288817e6e27fe4b8fdb8916bec564cddcff4d837f853c530ae0", "os": "linux", "agent": "\"latin textbook\"", "generation": 1 } },{ "peer": "peer1-telos-testnet.eosphere.io:9876", "connecting": false, "syncing": false, "last_handshake": { "network_version": 1206, "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "node_id": "dae17e1c30aaeb41d4beded401301a1390aff3a421bd1c07be8064717cfa8ed0", "key": "EOS1111111111111111111111111111111114T1Anm", "time": "1634798270161575000", "token": "0000000000000000000000000000000000000000000000000000000000000000", "sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "telos-testnet.eosphere.io:9876 - dae17e1", "last_irreversible_block_num": 136399500, "last_irreversible_block_id": "08214a8c78ffc5939a2cb2b8da0eaa33e8ecb3e8bd92f39d45a15c6fa14b9ecc", "head_num": 136399829, "head_id": "08214bd5226b86ce3c71cfb5e8464f496255f2d2eb631fafe1ff345de50165ce", "os": "linux", "agent": "\"EOSphere Telos Testnet\"", "generation": 1 } },{ "peer": "testnet.telos.caleos.io:9879", "connecting": false, "syncing": false, "last_handshake": { "network_version": 1206, "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "node_id": "528d52d68af88e81e7aa9e7fd362ab63351838e45a07b7a8037a70577f0f70b9", "key": "EOS1111111111111111111111111111111114T1Anm", "time": "1634798269942640000", "token": "0000000000000000000000000000000000000000000000000000000000000000", "sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "api2.hosts.caleos.io:9876 - 528d52d", "last_irreversible_block_num": 136399500, 
"last_irreversible_block_id": "08214a8c78ffc5939a2cb2b8da0eaa33e8ecb3e8bd92f39d45a15c6fa14b9ecc", "head_num": 136399828, "head_id": "08214bd487895288817e6e27fe4b8fdb8916bec564cddcff4d837f853c530ae0", "os": "linux", "agent": "\"CalEOS testnet state-history\"", "generation": 1 } },{ "peer": "telosgermany-testnet.genereos.io:9876", "connecting": false, "syncing": false, "last_handshake": { "network_version": 1210, "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "node_id": "daf4d0033171e1cb583ae0172478d64448912988ba5f70a12103e7f7534070d6", "key": "EOS1111111111111111111111111111111114T1Anm", "time": "1634798269931051000", "token": "0000000000000000000000000000000000000000000000000000000000000000", "sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "tf-telos-testnet-fn-1:9876 - daf4d00", "last_irreversible_block_num": 136399500, "last_irreversible_block_id": "08214a8c78ffc5939a2cb2b8da0eaa33e8ecb3e8bd92f39d45a15c6fa14b9ecc", "head_num": 136399829, "head_id": "08214bd5226b86ce3c71cfb5e8464f496255f2d2eb631fafe1ff345de50165ce", "os": "linux", "agent": "\"Telos Germany\"", "generation": 1 } },{ "peer": "testnet2p2p.telosarabia.net:9876", "connecting": false, "syncing": false, "last_handshake": { "network_version": 1206, "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "node_id": "cd23b484b2a790c718c86173080b1168adf8d543b0a5de5f9291453237f9a308", "key": "EOS1111111111111111111111111111111114T1Anm", "time": "1634798269862566017", "token": "0000000000000000000000000000000000000000000000000000000000000000", "sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "65.21.11.228:9876 - cd23b48", "last_irreversible_block_num": 136399500, "last_irreversible_block_id": "08214a8c78ffc5939a2cb2b8da0eaa33e8ecb3e8bd92f39d45a15c6fa14b9ecc", "head_num": 136399828, "head_id": 
"08214bd487895288817e6e27fe4b8fdb8916bec564cddcff4d837f853c530ae0", "os": "linux", "agent": "\"EOS Test Agent\"", "generation": 1 } },{ "peer": "tlos-p2p-test.katalyo.com:29877", "connecting": false, "syncing": false, "last_handshake": { "network_version": 1206, "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "node_id": "b4d29f4ae3abfd9c3cd994ebd76c202eb205581e285d42e28c44aae9b68fcde0", "key": "EOS1111111111111111111111111111111114T1Anm", "time": "1634798269932457000", "token": "0000000000000000000000000000000000000000000000000000000000000000", "sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "telos2.katalyo.com:29877 - b4d29f4", "last_irreversible_block_num": 136399500, "last_irreversible_block_id": "08214a8c78ffc5939a2cb2b8da0eaa33e8ecb3e8bd92f39d45a15c6fa14b9ecc", "head_num": 136399829, "head_id": "08214bd5226b86ce3c71cfb5e8464f496255f2d2eb631fafe1ff345de50165ce", "os": "linux", "agent": "\"EOS Test Agent\"", "generation": 1 } },{ "peer": "boid-telos-testnet-p2p.animusystems.com:3535", "connecting": false, "syncing": false, "last_handshake": { "network_version": 1210, "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "node_id": "09223c8964768688aebde789f21041620c91a0db75ffc11c9a1ba39330d3103a", "key": "EOS1111111111111111111111111111111114T1Anm", "time": "1634798269964773000", "token": "0000000000000000000000000000000000000000000000000000000000000000", "sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "boid-telos-testnet-p2p.animusystems.com:3535 - 09223c8", "last_irreversible_block_num": 136399500, "last_irreversible_block_id": "08214a8c78ffc5939a2cb2b8da0eaa33e8ecb3e8bd92f39d45a15c6fa14b9ecc", "head_num": 136399829, "head_id": "08214bd5226b86ce3c71cfb5e8464f496255f2d2eb631fafe1ff345de50165ce", "os": "linux", "agent": "bp.boid", "generation": 1 } },{ "peer": 
"boid-telos-testnet-p2p.animusystems.com:5050", "connecting": false, "syncing": false, "last_handshake": { "network_version": 1210, "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "node_id": "051cb1d94b02a600d35267db15e299d47ce9aa8a13747caab516204f1b513cbb", "key": "EOS1111111111111111111111111111111114T1Anm", "time": "1634798269945186000", "token": "0000000000000000000000000000000000000000000000000000000000000000", "sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "boid-telos-testnet-p2p.animusystems.com:5353 - 051cb1d", "last_irreversible_block_num": 136399500, "last_irreversible_block_id": "08214a8c78ffc5939a2cb2b8da0eaa33e8ecb3e8bd92f39d45a15c6fa14b9ecc", "head_num": 136399829, "head_id": "08214bd5226b86ce3c71cfb5e8464f496255f2d2eb631fafe1ff345de50165ce", "os": "linux", "agent": "bp.boid-backup", "generation": 1 } },{ "peer": "telos-testnet.eosio.cr:9879", "connecting": false, "syncing": false, "last_handshake": { "network_version": 1210, "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "node_id": "7cd6f8576a741522fc62f84d640811225607c78c9c60403ef1f237d87548ce75", "key": "EOS1111111111111111111111111111111114T1Anm", "time": "1634798270159928000", "token": "0000000000000000000000000000000000000000000000000000000000000000", "sig": "SIG_K1_111111111111111111111111111111111111111111111111111111111111111116uk5ne", "p2p_address": "telos-testnet.eosio.cr:9879 - 7cd6f85", "last_irreversible_block_num": 136399500, "last_irreversible_block_id": "08214a8c78ffc5939a2cb2b8da0eaa33e8ecb3e8bd92f39d45a15c6fa14b9ecc", "head_num": 136399829, "head_id": "08214bd5226b86ce3c71cfb5e8464f496255f2d2eb631fafe1ff345de50165ce", "os": "linux", "agent": "\"telos-testnet.eosio.cr\"", "generation": 1 } } ]
Some observations:
- The peer field for each entry corresponds to the p2p-peer-address in the config.ini file. However, it is the p2p_address, where one can be established, that is reported in the nodeos log files. The p2p_address appears to be as reported by the peer when communication can be successfully established via the p2p-peer-address. This listing gives the mapping between the two.
- The connecting and syncing fields are always reported as false whenever I produce this peers list. However, I am definitely receiving blocks from the peers and so must be communicating with at least one of them. Perhaps these fields will show true after my node has caught up with the chain? I would welcome any insight into the meaning of these values as comments on this post.
- The entry for the peer mainnet.persiantelos.com:8880 is distinct from all the others and consistent with what we’re seeing in the nodeos logs.
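That distinctness is easy to detect programmatically: a peer with which no handshake has succeeded reports an all-zero last_handshake. A minimal sketch of flagging such peers from the net peers JSON (the field names are those in the output above; the sample here is trimmed to the relevant fields):

```python
import json

def peer_health(peers):
    """Map each configured peer address to its advertised p2p_address,
    or to None when no successful handshake has been recorded."""
    result = {}
    for p in peers:
        hs = p["last_handshake"]
        # A peer we have never completed a handshake with reports a zeroed
        # chain_id and a head block number of 0.
        healthy = hs["head_num"] > 0 and int(hs["chain_id"], 16) != 0
        result[p["peer"]] = hs["p2p_address"] if healthy else None
    return result

sample = json.loads("""[
  {"peer": "testnet.telos.cryptosuvi.io:9876",
   "last_handshake": {"chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f",
                      "p2p_address": "Ubuntu-1804-bionic-64-minimal:9876 - 0d6400b",
                      "head_num": 136399828}},
  {"peer": "mainnet.persiantelos.com:8880",
   "last_handshake": {"chain_id": "0000000000000000000000000000000000000000000000000000000000000000",
                      "p2p_address": "",
                      "head_num": 0}}
]""")

print(peer_health(sample))
```

Running this over the full listing above would single out mainnet.persiantelos.com:8880 as the only peer without a usable handshake.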
At this point, I decided to conduct some little tests to further examine the capabilities of the EOSIO Net API Plugin as follows:
Test
Run:
docker-compose run --rm cleos net disconnect mainnet.persiantelos.com:8880
Result
Error reports for this peer in the nodeos log cease and it is no longer reported by:
docker-compose run --rm cleos net peers
Inference
The net disconnect command takes effect immediately, without having to restart the node.
Test
Shut down the node normally and start it up again.
Result
Error reports for mainnet.persiantelos.com:8880 resume in the nodeos log and it is reported again by docker-compose run --rm cleos net peers
Inference
The disconnection of this peer is not remembered: the node attempts to connect to all peers identified by p2p-peer-address entries in the config.ini file on each start-up.
Test
Shut down the node normally, comment out the entry for testnet.telos.cryptosuvi.io:9876 and restart the node.
Run in turn:
docker-compose run --rm cleos net peers
docker-compose run --rm cleos net connect testnet.telos.cryptosuvi.io:9876
docker-compose run --rm cleos net peers
Result
The peer testnet.telos.cryptosuvi.io:9876 is reported the second time that the net peers command is run but not the first time.
Inference
It is possible to use the net connect command to dynamically connect to peers that are not referenced by a p2p-peer-address entry in the node’s config.ini file.
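The cleos net subcommands used in these tests are thin wrappers over HTTP endpoints exposed by the net_api_plugin: /v1/net/peers, /v1/net/connect and /v1/net/disconnect. A minimal sketch of calling them directly, assuming the node’s HTTP endpoint is at http://127.0.0.1:8888 (adjust to your own http-server-address setting):

```python
import json
import urllib.request

def net_request(action, peer=None, host="http://127.0.0.1:8888"):
    """Build a net_api_plugin request for a nodeos HTTP endpoint.

    action -- "peers", "connect" or "disconnect"
    peer   -- p2p address string, required for connect/disconnect
    """
    url = f"{host}/v1/net/{action}"
    # connect/disconnect take the peer address as a JSON string body;
    # peers takes no body at all.
    data = json.dumps(peer).encode() if peer else None
    return urllib.request.Request(url, data=data, method="POST")

req = net_request("disconnect", "mainnet.persiantelos.com:8880")
print(req.full_url)  # http://127.0.0.1:8888/v1/net/disconnect
# urllib.request.urlopen(req)  # uncomment to send it to a running node
```

This is what makes the dynamic connect/disconnect behaviour observed above possible without a node restart – and also why the Caution notice matters: anyone who can reach these endpoints can manage your node’s P2P connections.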
Net Plugin Info Messages During Catchup
For this next part of my test, I first make sure that my node is not attempting to connect to any peer that is currently being reported as an error, in this case just the single instance of mainnet.persiantelos.com:8880. This is in line with my normal operational practice. While my node remains in catchup with the chain, i.e. well behind the current head block, the Net Plugin continuously outputs info messages like this to the nodeos log:
info 2021-10-23T07:45:35.158 net-1 net_plugin.cpp:1556 operator() ] requesting range 209465 to 209564, from 65.21.11.228:9876 - cd23b48
It’s clear that while my node is in this catchup mode, it is requesting ranges of historical blocks from the peers that it is connected to. If I look closely, I can see that for one peer only, messages like this are immediately followed by two other Net Plugin info messages:
info 2021-10-23T07:45:52.323 net-1 net_plugin.cpp:1556 operator() ] requesting range 212365 to 212464, from BBA-TEST.server.com:9876 - fc0c491
info 2021-10-23T07:45:52.442 net-0 net_plugin.cpp:1670 recv_handshake ] handshake from BBA-TEST.server.com:9876 - fc0c491, lib 136753022, head 136753356, head id d9613153ec8f34a5.. sync 1
info 2021-10-23T07:45:57.324 net-0 net_plugin.cpp:1619 sync_reassign_fetch ] reassign_fetch, our last req is 212464, next expected is 212365 peer BBA-TEST.server.com:9876 - fc0c491
Every time my node requests a range of blocks from this peer, the peer responds with a handshake, after which my node appears to reassign that fetch to another, later range of blocks. I confess that I don’t understand exactly what is going on here. However, I infer that all is not well in respect of the catchup of blocks from this particular peer node.
In this test, I am only seeing info messages that are out of the ordinary in this way for this single peer. During the mainnet catchup that motivated this post, I observed out of the ordinary messages appearing for multiple peers during the uncomfortably slow catchup of our external peering node.
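One way to make spotting such peers routine is to scan the nodeos log for sync_reassign_fetch lines and count them per peer; a persistently high count singles out a catchup laggard. A rough sketch (the pattern matches the log line format shown above, which may vary between nodeos versions):

```python
import re
from collections import Counter

# Matches the peer address at the end of lines such as:
# info ... net_plugin.cpp:1619 sync_reassign_fetch ] reassign_fetch, ... peer BBA-TEST.server.com:9876 - fc0c491
REASSIGN = re.compile(r"sync_reassign_fetch\s*\].*peer\s+(\S+)")

def reassign_counts(log_lines):
    """Count sync_reassign_fetch occurrences per peer address."""
    counts = Counter()
    for line in log_lines:
        m = REASSIGN.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    "info 2021-10-23T07:45:52.323 net-1 net_plugin.cpp:1556 operator() ] requesting range 212365 to 212464, from BBA-TEST.server.com:9876 - fc0c491",
    "info 2021-10-23T07:45:57.324 net-0 net_plugin.cpp:1619 sync_reassign_fetch ] reassign_fetch, our last req is 212464, next expected is 212365 peer BBA-TEST.server.com:9876 - fc0c491",
]
print(reassign_counts(sample))  # Counter({'BBA-TEST.server.com:9876': 1})
```

Fed from something like docker logs --follow, this would have flagged BBA-TEST.server.com:9876 long before any manual log reading did.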
I will monitor the catchup speed of my experimental, testnet node, first with this peer included in the list of those that I am connecting to and then again with it removed. To do this, I run this command to get a timestamped status report on my node’s catchup with the chain:
date && docker-compose run --rm cleos get info
This produces the following output:
Sat 23 Oct 09:04:52 BST 2021 Starting eosio-keosd ... done { "server_version": "32a47e39", "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "head_block_num": 216933, "last_irreversible_block_num": 216932, "last_irreversible_block_id": "00034f64dca30c7eb2ce81a92c68eb83f767f4d71504c510afc5481df55c2ada", "head_block_id": "00034f65462965fc4730a39e627fbb6a21c2f089e43d7308bd6a6d63cc470cac", "head_block_time": "2019-08-17T04:40:44.000", "head_block_producer": "eosio", "virtual_block_cpu_limit": 200000000, "virtual_block_net_limit": 1048576000, "block_cpu_limit": 200000, "block_net_limit": 1048576, "server_version_string": "v2.0.12", "fork_db_head_block_num": 216933, "fork_db_head_block_id": "00034f65462965fc4730a39e627fbb6a21c2f089e43d7308bd6a6d63cc470cac", "server_full_version_string": "v2.0.12-32a47e396a13b948263b1dc9762a6e2292a2c2ef" }
After roughly half an hour, I run the same command again, which this time produces this output:
Sat 23 Oct 09:34:39 BST 2021 Starting eosio-keosd ... done { "server_version": "32a47e39", "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "head_block_num": 468663, "last_irreversible_block_num": 468662, "last_irreversible_block_id": "000726b6705c27f8973be5e39f11efb923ce5e79b33f3780006514b9f36af9af", "head_block_id": "000726b765d2335c9993055a30c2d805daccf36724c2fa0c94d484d3324ded40", "head_block_time": "2019-08-18T15:38:29.000", "head_block_producer": "eosio", "virtual_block_cpu_limit": 200000000, "virtual_block_net_limit": 1048576000, "block_cpu_limit": 200000, "block_net_limit": 1048576, "server_version_string": "v2.0.12", "fork_db_head_block_num": 468663, "fork_db_head_block_id": "000726b765d2335c9993055a30c2d805daccf36724c2fa0c94d484d3324ded40", "server_full_version_string": "v2.0.12-32a47e396a13b948263b1dc9762a6e2292a2c2ef" }
A gain of 251,730 blocks (468,663-216,933) in 1,787 seconds (09:34:39-09:04:52) or approximately 141 blocks per second.
I now run the command docker-compose run --rm cleos net disconnect testnet.blindblocart.io:9876, which produces the response "connection removed". I also check that my nodeos log output is “clean”, i.e. that it only reports the expected “requesting range” messages and no others.
I now once again run the command to produce a timestamped synchronisation status report:
Sat 23 Oct 09:40:01 BST 2021 Starting eosio-keosd ... done { "server_version": "32a47e39", "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "head_block_num": 539173, "last_irreversible_block_num": 539172, "last_irreversible_block_id": "00083a244ce1e7e412b646bc2a07667ef4c70bcb4a0cc422c6d3dcdb38dc5035", "head_block_id": "00083a25142563b990f3a1f777a5fe70fb059ee6396aa54b71a2a58a3a25e77e", "head_block_time": "2019-08-19T01:26:04.000", "head_block_producer": "eosio", "virtual_block_cpu_limit": 200000000, "virtual_block_net_limit": 1048576000, "block_cpu_limit": 200000, "block_net_limit": 1048576, "server_version_string": "v2.0.12", "fork_db_head_block_num": 539173, "fork_db_head_block_id": "00083a25142563b990f3a1f777a5fe70fb059ee6396aa54b71a2a58a3a25e77e", "server_full_version_string": "v2.0.12-32a47e396a13b948263b1dc9762a6e2292a2c2ef" }
And yet again approximately half an hour later:
Sat 23 Oct 10:10:02 BST 2021 Starting eosio-keosd ... done { "server_version": "32a47e39", "chain_id": "1eaa0824707c8c16bd25145493bf062aecddfeb56c736f6ba6397f3195f33c9f", "head_block_num": 1194673, "last_irreversible_block_num": 1194492, "last_irreversible_block_id": "001239fc04f6c4ec8322f374722a76edfde0adafc696ca34c417ce7f3f922ed8", "head_block_id": "00123ab1cdb18f34d9459f1bbaa428cf28ae4ef6f87a26a1baf0f2d4cf63557b", "head_block_time": "2019-08-22T20:37:42.000", "head_block_producer": "swedencornet", "virtual_block_cpu_limit": 200000000, "virtual_block_net_limit": 1048576000, "block_cpu_limit": 200000, "block_net_limit": 1048576, "server_version_string": "v2.0.12", "fork_db_head_block_num": 1194673, "fork_db_head_block_id": "00123ab1cdb18f34d9459f1bbaa428cf28ae4ef6f87a26a1baf0f2d4cf63557b", "server_full_version_string": "v2.0.12-32a47e396a13b948263b1dc9762a6e2292a2c2ef" }
This time a gain of 655,500 blocks (1,194,673-539,173) in 1,801 seconds (10:10:02-09:40:01) or approximately 364 blocks per second.
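The two rate calculations can be reproduced from the get info samples; the head block numbers and wall-clock times below are exactly those reported above:

```python
from datetime import datetime

def catchup_rate(head1, t1, head2, t2):
    """Blocks gained per second between two `cleos get info` samples."""
    fmt = "%H:%M:%S"
    seconds = (datetime.strptime(t2, fmt) - datetime.strptime(t1, fmt)).total_seconds()
    return (head2 - head1) / seconds

# With testnet.blindblocart.io still connected:
before = catchup_rate(216933, "09:04:52", 468663, "09:34:39")
# After disconnecting it:
after = catchup_rate(539173, "09:40:01", 1194673, "10:10:02")

print(round(before), round(after), round(after / before, 1))  # 141 364 2.6
```

Roughly a 2.6x improvement in catchup rate from disconnecting one misbehaving peer.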
So, there has certainly been a significant performance improvement from removing just that single peer from the list of those we were connecting to. This isn’t as dramatic as the difference we saw in the catchup performance of our mainnet block producing and proxy nodes, but there the proxy node showed connection issues with several peers, reported by the Net Plugin as info messages.
Nodes that are in synchronisation with the chain, and just peering to remain so, only have to process two blocks per second. So, I assume that they are insensitive to such issues.
Conclusions
- It is extremely useful to have the Net Plugin API available on a peering node.
- Our current arrangement whereby our peering nodes also provide our external API service is problematic. It prevents us from having the Net API plugin enabled on those nodes. A better approach would be to separate these functions and only expose the API service on the peering nodes internally for our own, secure use.
- If you have nodes in catchup mode, as new block producers for example might, then to get good catchup performance you need to pay attention to any out of the ordinary Net Plugin messages, whether or not they are categorised as errors.
Roger Davies says:
Thanks very much for your dedication and attention to detail Dave 🙂
David Williamson says:
Thanks. Some might say, “tedious droning on”!