Managing Nodeos P2P Peer Connections

Recently there was a fault at one of the sites of the hosting provider we use at Telos UK. This caused an unclean shutdown of a couple of our nodeos instances, leaving them in a “Database dirty flag set” state. I attempted to replay from a snapshot file with the existing blocks.log file in place, as described in Replay from Snapshot with Blocks Log in my prior post Nodeos Replays, but this time the replay failed with the following error message in the log:

Error unpacking field new_producers
    {"field":"new_producers"}
    nodeos  raw.hpp:367 operator()
error unpacking eosio::chain::signed_block
    {"type":"eosio::chain::signed_block"}
    nodeos  raw.hpp:644 unpack

So, I removed the blocks.log and tried again. This succeeded, but of course it meant that the blocks required to bring us from the snapshot back into synchronisation with the chain would have to be sourced entirely via the P2P (peer-to-peer) network rather than partially replayed from the blocks.log.
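The recovery steps above can be sketched as a short shell session. The data directory and snapshot path are assumptions for illustration only, and the nodeos command is echoed rather than executed so the sketch can be dry-run safely:

```shell
# Assumed locations -- substitute your own data directory and snapshot file.
DATA_DIR="$HOME/telos/data"
SNAPSHOT="$HOME/telos/snapshots/snapshot.bin"

# Remove the blocks.log (and its index) that failed to unpack.
rm -f "$DATA_DIR/blocks/blocks.log" "$DATA_DIR/blocks/blocks.index"

# Restart from the snapshot; with no blocks.log present, every block
# after the snapshot must now be sourced over the P2P network.
# (Echoed here rather than run, as a dry-run sketch.)
echo nodeos --data-dir "$DATA_DIR" --snapshot "$SNAPSHOT"
```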

At Telos UK, we run a main and standby pair of servers in different data centres. Each server runs two instances of nodeos, a block producing node and a kind of proxy node that synchronises with the peer nodes of other Telos block producers and provides our external API service. Our block producing nodes only synchronise in a mesh with our own internal nodes.

The snapshot that I used to recover both nodes at the affected site was a couple of months old. I set the synchronisation going late at night and checked on its progress the next morning, about eight hours later. To my surprise, whereas the block producing node had already caught up with the chain, the proxy node had only received about a week’s worth of blocks – ball park at least 180 times the normal synchronisation rate for the block producing node vs. only about 8 times for the proxy node.

I take care not to include hosts in the P2P network that produce Net Plugin errors in our nodeos logs. However, I noted that whereas the log for our block producing node reported nothing other than requests for ranges of blocks during catch-up, the proxy node’s log interspersed these with various other Net Plugin info messages. Though I wasn’t confident that I understood them, they suggested all was not well.

I also temporarily made the proxy node in catchup unavailable for incoming P2P connections to see if this made any difference to its catchup speed. It did not.

This set me thinking: could it be that:

  1. The info messages that I saw in the log for our proxy node are indeed indicative that all is not well, as they appeared to be.
  2. Dealing with whatever these info messages are reporting explains the poor catchup performance of our proxy node relative to our block producing node.

So, I set about exploring these questions by synchronising a test node with the Telos testnet. I believe that what I found out may be useful to the wider Telos community and indeed other communities who manage chains based on EOSIO. As always, comments that advance, expand on or correct the learning in this post are most welcome.

Continue reading

Nodeos Blocks Log Stride

A Telos mainnet blocks.log that contains all blocks from genesis up to the current chain state at the time of writing is now over 200 GB in size. This prompted me to ask myself the question, “Is it possible to break the blocks.log into fragments that can subsequently be joined together again, with the recombined blocks.log remaining usable by nodeos?”

Of course, there are lots of tools that will compress and split a large file to create an archive of it. These will enable you to recreate the whole file. My previous article Nodeos Replays showed that nodeos supports the use of a partial blocks.log, in the sense that it doesn’t have to go back to genesis. I reason that generic splitting tools, with no knowledge of the internal structure of a blocks.log, will be incapable of producing blocks.log fragments that are usable individually.
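To illustrate the point about generic tools, here is a minimal demonstration with split and cat (file names and sizes are arbitrary stand-ins, not a real blocks.log): the fragments concatenate back to an identical whole, but an individual fragment carries no internal structure that a consumer like nodeos could use on its own.

```shell
# Create a small stand-in for a large file.
printf 'block data %s\n' 1 2 3 4 5 6 7 8 > whole.log

# Split it into 32-byte fragments: part.aa, part.ab, ...
split -b 32 whole.log part.

# Concatenating the fragments in order restores the original exactly.
cat part.* > rejoined.log
cmp whole.log rejoined.log && echo identical
```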

We saw in Nodeos Replays that the eosio-blocklog tool can be used to split a blocks.log up. However, there is no documented capability to recombine the fragments that it produces. In this article, I explore whether they can be recombined by simply concatenating them. This leads me on to the new blocks log stride feature in version 2.1.0 of EOSIO and things to be aware of when upgrading to this version and enabling this feature with an existing blocks.log in place.
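For reference, here is a sketch of what enabling the stride feature might look like in config.ini. I believe these are the EOSIO 2.1 chain plugin option names, but treat both the names and the values as assumptions to verify against the 2.1 documentation before use:

```ini
; Sketch only -- option names per my reading of EOSIO 2.1; values are illustrative.
block-log-stride = 2000000        ; start a new blocks log file every 2,000,000 blocks
max-retained-block-files = 10     ; how many strided log files to keep in the blocks directory
blocks-archive-dir = archive      ; strided files beyond that count are moved here
```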

As in Nodeos Replays, I will use my Docker Compose EOSIO services to step through various test scenarios and again I will focus on nodeos and not expand on Docker or Docker Compose concepts nor on generic command line actions. I will also skip over the detail of some actions if they were explained in Nodeos Replays, so I would recommend reading that article before reading this one.

Continue reading

Nodeos Replays

EOSIO is “a highly performant open-source blockchain platform, built to support and operate safe, compliant, and predictable digital infrastructures” and nodeos is “the core service daemon that runs on every EOSIO node”. This article is about the replaying of blockchain blocks using nodeos, which can be necessary to recover from failure or as a faster alternative to synchronising from a P2P network.

To demonstrate and test the concepts involved, I will use my Docker Compose EOSIO services, while referring to the published nodeos documentation. I will step through various test scenarios using a varilink/eosio Docker image based on version 2.0.12 of EOSIO and so any links to the EOSIO documentation within the body of the article will reference the EOSIO v2.0 manual. Since nodeos is the focus of this article, I will not expand on Docker or Docker Compose concepts nor generic command line actions.
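As a quick reference for the scenarios that follow, the main replay-related startup options can be sketched as below. The data directory is an illustrative assumption, and the commands are echoed rather than executed so the sketch is safe to dry-run anywhere:

```shell
DATA_DIR="$HOME/eosio/data"

# Rebuild chain state by replaying the existing blocks.log:
echo nodeos --data-dir "$DATA_DIR" --replay-blockchain

# As above, but first repair a possibly damaged blocks.log:
echo nodeos --data-dir "$DATA_DIR" --hard-replay-blockchain

# Start from a snapshot file instead of replaying:
echo nodeos --data-dir "$DATA_DIR" --snapshot "$DATA_DIR/snapshots/snapshot.bin"
```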

Continue reading