Nodeos Blocks Log Stride

A Telos mainnet blocks.log that contains every block from genesis up to the current chain state is, at the time of writing, over 200 GB in size. This prompted me to ask: “Is it possible to break the blocks.log into fragments that can later be joined together again, such that the recombined blocks.log is usable by nodeos?”

Of course, there are lots of tools that will compress and split a large file to create an archive from which the whole file can be recreated. However, my previous article Nodeos Replays showed that nodeos supports the use of a partial blocks.log, in the sense that the log doesn’t have to go back to genesis. I reason that generic splitting tools, which have no knowledge of the internal structure of a blocks.log, will be incapable of producing blocks.log fragments that are usable individually.
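To make the "generic tool" case concrete, here is a minimal Python sketch of structure-agnostic splitting and rejoining. The function names and the `.partNNNNN` naming are my own illustration, not taken from any particular tool. The chunks are only meaningful when all of them are joined back in order, which is exactly the limitation described above.

```python
import hashlib
import shutil
from pathlib import Path


def split_file(source, out_dir, chunk_size):
    """Write source as numbered chunk files of at most chunk_size bytes each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with source.open("rb") as src:
        index = 0
        while data := src.read(chunk_size):
            chunk = out_dir / f"{source.name}.part{index:05d}"
            chunk.write_bytes(data)
            chunks.append(chunk)
            index += 1
    return chunks


def join_files(chunks, target):
    """Concatenate the chunk files, in the order given, into target."""
    with target.open("wb") as dst:
        for chunk in chunks:
            with chunk.open("rb") as src:
                shutil.copyfileobj(src, dst)


def sha256(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()
```

Comparing `sha256()` of the original and the rejoined file is a simple way to confirm that the whole file was restored intact.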

We saw in Nodeos Replays that the eosio-blocklog tool can be used to split a blocks.log up. However, there is no documented capability to recombine the fragments that it produces. In this article, I explore whether they can be recombined by simply concatenating them. This leads me on to the new blocks log stride feature in version 2.1.0 of EOSIO and things to be aware of when upgrading to this version and enabling this feature with an existing blocks.log in place.

As in Nodeos Replays, I will use my Docker Compose EOSIO services to step through various test scenarios and again I will focus on nodeos and not expand on Docker or Docker Compose concepts nor on generic command line actions. I will also skip over the detail of some actions if they were explained in Nodeos Replays, so I would recommend reading that article before reading this one.

Can Block Log Fragments be Concatenated?
Blocks Log Stride
Upgrading EOSIO to v2.1.0
Applying Blocks Stride Retrospectively
Replay from Snapshot without a Blocks Log
Restoring Blocks to a “Strided” Blocks Log
What Happens When Blocks Stride Archiving Kicks In?
Conclusions

Can Block Log Fragments be Concatenated?

Exactly as documented in Nodeos Replays, I start a test node synchronising from genesis with Telos testnet using version 2.0.12 of EOSIO and let it run for a while to build up a blocks.log to work with. I shut down nodeos normally and use eosio-blocklog --smoke-test to examine the contents of my blocks directory:

Smoke test of blocks.log and blocks.index in directory "blocks"
block log version= 3
first block= 1
last block= 2451
blocks.log and blocks.index agree on number of blocks
no problems found

Using my data Docker Compose service to gain access to my nodeos data directory, I create two directories, /data/one and /data/two, and copy my blocks.index and blocks.log to both of them. Then I use eosio-blocklog --trim to create two fragments of my original blocks.log, one (in /data/one) with --last set to 1,250 and the other (in /data/two) with --first set to 1,251. If I then use eosio-blocklog --smoke-test to examine those two fragments, to verify that I’ve created what I expected, the following is reported:

Smoke test of blocks.log and blocks.index in directory "one"
block log version= 3
first block= 1
last block= 1250
blocks.log and blocks.index agree on number of blocks

And:

Smoke test of blocks.log and blocks.index in directory "two"
block log version= 3
first block= 1251
last block= 2451
blocks.log and blocks.index agree on number of blocks

Now I concatenate the two blocks.log fragments into a single blocks.log in another new directory (/data/combined) and execute eosio-blocklog --make-index on the result. The output is as follows:

Will read existing blocks.log file combined/blocks.log
Will write new blocks.index file combined/blocks.index
block log version= 3
first block= 1         last block= 11070772
3190000 block_log_exception: Block log exception
Block log file at 'combined/blocks.log' formatting is incorrect, indicates position later location in file: 7215499557470229800, which was retrieved at: 20567770.
    {"blocks_log":"combined/blocks.log","pos":"7215499557470229800","orig_pos":20567770}
    eosio-blocklog  block_log.cpp:849 previous

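Clearly that didn’t work. As there is no indication in the documentation that it would, this isn’t a surprising result. The tool reports a last block of 11,070,772 and a stored position far beyond the end of the file, which suggests that it is misreading the data rather than finding blocks 1 to 2,451. I reason that block log files have content at their start and/or end over and above the blocks themselves.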
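The following Python sketch illustrates that reasoning with a toy log format. It is emphatically not the real blocks.log layout. I assume, purely for illustration, a file header followed by entries that each end with the file offset at which they start. In such a format, the offsets inside the second fragment are relative to that fragment, so they point at the wrong bytes once another file is placed in front of it.

```python
import struct

# Toy format only: these layouts are assumptions, not the blocks.log format.
HEADER = struct.Struct("<II")   # format version, first block number
LENGTH = struct.Struct("<I")    # entry prefix: payload length
POSITION = struct.Struct("<Q")  # entry suffix: offset at which this entry starts


def build_log(first_block, payloads):
    """Build a toy log: header, then length + payload + start offset per entry."""
    log = bytearray(HEADER.pack(3, first_block))
    for payload in payloads:
        start = len(log)
        log += LENGTH.pack(len(payload)) + payload + POSITION.pack(start)
    return bytes(log)


def read_backwards(log):
    """Recover payloads by following each trailing offset from the end of the file."""
    payloads = []
    end = len(log)
    while end > HEADER.size:
        (start,) = POSITION.unpack_from(log, end - POSITION.size)
        if not HEADER.size <= start <= end - POSITION.size - LENGTH.size:
            raise ValueError(f"offset {start} read at {end - POSITION.size} is out of range")
        (length,) = LENGTH.unpack_from(log, start)
        if start + LENGTH.size + length + POSITION.size != end:
            raise ValueError(f"entry at {start} does not end at {end}")
        payloads.append(log[start + LENGTH.size:start + LENGTH.size + length])
        end = start
    return payloads[::-1]
```

Each toy fragment reads back correctly on its own, but `read_backwards(build_log(1, [b"a", b"bb"]) + build_log(3, [b"ccc", b"dddd"]))` raises a `ValueError`, because the offsets stored in the second fragment no longer describe where its entries sit in the combined file.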

Blocks Log Stride

While researching the topic of blocks.log fragments, I discovered that version 2.1.0 of EOSIO has introduced the following options for the nodeos chain plugin that were not present in version 2.0.12:

blocks-log-stride
max-retained-block-files
blocks-retained-dir
blocks-archive-dir

There are other new options but it is those above that concern us here. The documentation suggests that this new blocks log stride feature is linked to the aspiration that is the motivation for writing this article:

[Figure: documentation of the blocks stride related options in the EOSIO v2.1 manual]

Upgrading EOSIO to v2.1.0

The release notes for EOSIO v2.1.0 state that “Node operators running version v2.0.x should be able to upgrade to v2.1.0 using a snapshot.” So, before attempting to upgrade my test node, I bring my nodeos service up again, purely because I want to have more blocks to work with. After a little while, I shut down the nodeos service normally and use the snapshot tool within my Docker Compose EOSIO services to create a snapshot, which reports:

{
   "head_block_id" : "00002ceb509d07879bc4b20f5e24b1f224cedf6670c8bb214c4182ca9be31b8b",
   "snapshot_name" : "/data/snapshots/snapshot-00002ceb509d07879bc4b20f5e24b1f224cedf6670c8bb214c4182ca9be31b8b.bin"
}

I then bring the nodeos service up again normally and let it run on a little so that I have a blocks log that extends beyond the snapshot point. While it is up, I use cleos to check the block number that my snapshot corresponds to:

docker-compose run --rm cleos get block 00002ceb509d07879bc4b20f5e24b1f224cedf6670c8bb214c4182ca9be31b8b

This confirms that the snapshot corresponds to block number 11,499:

{
  …
  "block_num": 11499,
  …
}
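As a cross-check that needs no running node, the block number can also be read straight from the block ID, because the first four bytes (eight hex digits) of an EOSIO block ID hold the block number in big-endian order. A minimal Python sketch, in which the function name is my own:

```python
def block_num_from_id(block_id):
    """Return the block number held in the first 8 hex digits of an EOSIO block ID."""
    return int(block_id[:8], 16)


snapshot_head = "00002ceb509d07879bc4b20f5e24b1f224cedf6670c8bb214c4182ca9be31b8b"
print(block_num_from_id(snapshot_head))  # 11499
```

Here 0x00002ceb is 11,499, which agrees with the cleos output.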

I’m going to rehearse the upgrade process for a block producer whose nodeos instance has a blocks.log in situ that must not be lost during the upgrade. If I shut down my nodeos service normally, update the .env file in my Docker Compose project to set VER=2.1.0 and then bring up the nodeos service again without attempting a replay from snapshot, startup fails with copious messages ending in:

eosio-nodeos | rethrow Database incompatible; All environment parameters must match:
eosio-nodeos |     {"what":"Database incompatible; All environment parameters must match"}
eosio-nodeos |     nodeos  chain_plugin.cpp:1321 plugin_initialize
eosio-nodeos |
eosio-nodeos exited with code 254

I’ve used this approach successfully in the past for minor release upgrades, but it clearly doesn’t work for the upgrade from v2.0.12 to v2.1.0 of EOSIO. The release notes indicate as much when they say that the upgrade can be performed using a snapshot. Reviewing all the messages relating to the error in the log leads me to conclude that the chain database from a prior version of EOSIO is incompatible with v2.1.0, hence the need to replay using a snapshot.

So, I replay from the snapshot that I created above while leaving my existing blocks.log in place. The nodeos service therefore replays the blocks after the snapshot from blocks.log and then begins synchronisation with the p2p network. I then shut down the nodeos service normally.

Applying Blocks Stride Retrospectively

Now I want to experiment to find out what happens if the new blocks log stride feature is applied with a blocks.log created by a prior version of EOSIO already in place. Before doing this, I do a smoke test of my blocks directory:

Smoke test of blocks.log and blocks.index in directory "blocks"
blocks.log and blocks.index agree on number of blocks
no problems found

I wanted to see which block my blocks.log had reached after I left nodeos running for a while following the replay from snapshot. However, it appears that in v2.1.0 of EOSIO the eosio-blocklog tool no longer reports the first and last block in a smoke test.

I create a /data/backup directory and copy my blocks.log into it before using the eosio-blocklog tool to make an index based on that copy:

Will read existing blocks.log file backup/blocks.log
Will write new blocks.index file backup/blocks.index
block log version= 3
first block= 1         last block= 32503
eosio-blocklog - making index took 7 msec

So, eosio-blocklog still gives me an indication of the first and last blocks in a blocks.log if I use it to make an index. This backup of my blocks.log will also come in handy later.

I set the following option in my configuration file:

blocks-log-stride=5000

After bringing my nodeos service back up, once the p2p network synchronisation of my node has passed block 35,000, the contents of my blocks directory look like this:

/data/blocks
|-- archive
|-- blocks-1-35000.index
|-- blocks-1-35000.log
|-- blocks.index
|-- blocks.log
`-- reversible
    `-- shared_memory.bin

At this point, cleos get block for block 35,001 is successful but it fails for block 35,000 with the following error message:

Failed with error: Assert Exception (10)
(b & ~1) == 0:

Based on the documentation, I was expecting still to be able to query block 35,000, as the newly created blocks-1-35000.index and blocks-1-35000.log have not yet been moved to the archive folder. Suspicious that something is wrong, I shut down nodeos normally and bring it back up again. On resumption, the service fails with the following error messages at the end of the log:

eosio-nodeos | error 2021-07-12T07:46:25.877 nodeos    main.cpp:163                  main                 ] 3190000 block_log_exception: Block log exception
eosio-nodeos | Block log file at '/data/blocks/blocks-1-35000.log' formatting indicated last block: 3929036033, first block: 1, but found 35000 blocks
eosio-nodeos |     {"blocks_log":"/data/blocks/blocks-1-35000.log","last_block_num":3929036033,"first_block_num":1,"num":35000}
eosio-nodeos |     nodeos  block_log.cpp:489 construct_index
eosio-nodeos | rethrow
eosio-nodeos |     {}
eosio-nodeos |     nodeos  chain_plugin.cpp:1321 plugin_initialize
eosio-nodeos |
eosio-nodeos exited with code 254

It appears that, whether by accident or design, it is not possible to apply blocks log stride to an in-situ blocks.log that previously didn’t use it.

Replay from Snapshot without a Blocks Log

I replay my node from the same snapshot, this time without a blocks log in place. Before long, my blocks directory has these contents:

/data/blocks
|-- archive
|-- blocks-11500-15000.index
|-- blocks-11500-15000.log
|-- blocks.index
|-- blocks.log
`-- reversible
    `-- shared_memory.bin

I shut down nodeos normally and bring the service back up again to facilitate cleos access. This time the restart is successful. As I would expect, the cleos get block command using this node’s API endpoint is successful for blocks 11,500, 14,999, 15,000 and 15,001 but fails for block 11,499 with the following error:

Error 3100002: Unknown block
Error Details:
Could not find block: 11499
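The file names seen in this section and the previous one are consistent with nodeos closing a blocks file whenever the head block number reaches a multiple of the stride, and naming the file after the first and last blocks it contains. The Python sketch below captures that reading. It is inferred only from the two directory listings in this article, not from the nodeos source, and the function names are my own.

```python
def next_split_block(head_block, stride):
    """Return the smallest multiple of stride that is >= head_block."""
    return -(-head_block // stride) * stride


def retained_file_stem(first_block, head_block, stride):
    """Name (without extension) of the file the current log would be closed as."""
    return f"blocks-{first_block}-{next_split_block(head_block, stride)}"


# Log from block 1 with head at 32,503 when a stride of 5,000 was enabled
print(retained_file_stem(1, 32503, 5000))      # blocks-1-35000
# Replay from the snapshot at 11,499, so the new log starts at block 11,500
print(retained_file_stem(11500, 11500, 5000))  # blocks-11500-15000
```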

Restoring Blocks to a “Strided” Blocks Log

Now I want to explore whether it is possible to do something that I haven’t found described in the nodeos documentation for EOSIO v2.1 but which strikes me as potentially useful. First, I shut down my running nodeos service normally.

I create two directories, /data/frag1 and /data/frag2, and copy blocks.index and blocks.log into each of them from /data/backup, which holds the copy of my blocks.log, and the index built from it, that I made before enabling blocks log stride.

Then I use eosio-blocklog with the --trim option on the contents of /data/frag1 (with --first not set and --last 5000) and /data/frag2 (with --first 10001 and --last 11499). I deliberately leave a gap from block 5,001 to block 10,000.

I then copy blocks.index and blocks.log from /data/frag1 to blocks-1-5000.index and blocks-1-5000.log respectively in /data/blocks and blocks.index and blocks.log from /data/frag2 to blocks-10001-11499.index and blocks-10001-11499.log respectively in /data/blocks.

When I have finished these actions, the contents of my /data/blocks directory look like this:

/data/blocks
|-- [drwxr-xr-x]  archive
|-- [-rwx------]  blocks-1-5000.index
|-- [-rw-r--r--]  blocks-1-5000.log
|-- [-rwx------]  blocks-10001-11499.index
|-- [-rwx------]  blocks-10001-11499.log
|-- [-rw-r--r--]  blocks-11500-15000.index
|-- [-rw-r--r--]  blocks-11500-15000.log
|-- [-rw-r--r--]  blocks.index
|-- [-rw-r--r--]  blocks.log
`-- [drwxr-xr-x]  reversible
    `-- [-rw-r--r--]  shared_memory.bin

Note that some of the files generated by using eosio-blocklog with the --trim option have had their permissions altered by that command to rwx------. I cannot discern a pattern for this behaviour, i.e. when it will happen and when it will not. The contents of /data/frag1 and /data/frag2, where I performed the trim operations, look like this:

/data/frag1
|-- [-rwx------]  blocks.index
`-- [-rw-r--r--]  blocks.log

/data/frag2
|-- [-rwx------]  blocks.index
|-- [-rwx------]  blocks.log
`-- [drwxr-xr-x]  old
    |-- [-rwx------]  old.index
    `-- [-rw-r--r--]  old.log

This confirms that it was the eosio-blocklog --trim command that modified some of the file permissions. I note also that the command created the ./old directory and its contents in /data/frag2 but not in /data/frag1. There appears to be undocumented and possibly unintended behaviour here.

Now I bring my nodeos service back up, which is successful, and set about querying various blocks using cleos get block against my running node. This is successful for blocks 1, 5,000, 10,001, 11,499, 11,500 and 11,501. It fails for blocks 5,001 and 10,000, with the same Could not find block error message that we saw earlier.

It seems that we can use the eosio-blocklog --trim command to create fragments from our old blocks log and that a node with the nodeos --blocks-log-stride option set can use them, even if there are gaps in the history.
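When assembling a blocks directory by hand like this, it is easy to lose track of which blocks are actually covered. The Python sketch below works only from the blocks-first-last.log naming convention seen in the listings above and reports the covered ranges and any gaps between them. It inspects file names only, not file contents, and the function names are my own.

```python
import re

FRAGMENT = re.compile(r"^blocks-(\d+)-(\d+)\.log$")


def fragment_ranges(file_names):
    """Return sorted (first, last) block ranges parsed from fragment file names."""
    ranges = []
    for name in file_names:
        match = FRAGMENT.match(name)
        if match:
            ranges.append((int(match.group(1)), int(match.group(2))))
    return sorted(ranges)


def gaps(ranges):
    """Return the (first, last) block ranges missing between consecutive fragments."""
    missing = []
    for (_, prev_last), (next_first, _) in zip(ranges, ranges[1:]):
        if next_first > prev_last + 1:
            missing.append((prev_last + 1, next_first - 1))
    return missing


names = ["blocks-1-5000.log", "blocks-10001-11499.log", "blocks-11500-15000.log", "blocks.log"]
print(gaps(fragment_ranges(names)))  # [(5001, 10000)]
```

The live blocks.log is ignored because its name carries no block range. The single gap reported matches the range for which cleos get block failed.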

What Happens When Blocks Stride Archiving Kicks In?

Since the situation I have created is now slightly artificial and hasn’t been arrived at in a manner that is documented in the EOSIO manuals, I was keen to observe what happened when the number of blocks files exceeded the value set for --max-retained-block-files, which in my test was the default of 10 since I hadn’t set another value.

So, I observed what happened as my nodeos service continued to add blocks via p2p network synchronisation. The results that I observed were:

Blocks files were moved to the default archive directory in the correct order, i.e. the files containing the oldest blocks first, then the files containing the second oldest blocks, and so on. I had wondered whether nodeos might base this on file timestamps, i.e. oldest files first, which would not have aligned to files containing oldest blocks first in the way that I had set things up.
The fact that I had a gap from block 5,001 to 10,000 did not cause nodeos any issues.
As each pair of block files was moved to archive, cleos get block run against my nodeos service could no longer find the blocks in the corresponding archived range.
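The ordering that I observed can be modelled as "sort by the first block number in the file name, then archive from the front". The Python sketch below is only a model of that observed behaviour, not the nodeos implementation. Treating each .log file (with its .index) as one retained file is my assumption about how --max-retained-block-files counts.

```python
import re

FRAGMENT = re.compile(r"^blocks-(\d+)-(\d+)\.log$")


def files_to_archive(file_names, max_retained):
    """Pick the fragments holding the oldest blocks once more than max_retained exist."""
    fragments = sorted(
        (name for name in file_names if FRAGMENT.match(name)),
        key=lambda name: int(FRAGMENT.match(name).group(1)),
    )
    excess = max(0, len(fragments) - max_retained)
    return fragments[:excess]


names = ["blocks-10001-15000.log", "blocks-5001-10000.log", "blocks-15001-20000.log"]
print(files_to_archive(names, 2))  # ['blocks-5001-10000.log']
```

The sort is numeric on purpose: a plain alphabetical sort would place blocks-10001-15000.log ahead of blocks-5001-10000.log, and file timestamps would have been misleading for the fragments that I copied in by hand.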

In summary, everything behaved as I would expect correct behaviour to be. The presence of this archiving mechanism within the new blocks stride feature suggests to me that the design intention is that nodes using the --blocks-log-stride option will not have access to the entire block history, only a rolling recent history. That said, there is no upper limit documented for --max-retained-block-files, so it could be possible to set it to such a high value that no archiving would occur for the foreseeable future.

Conclusions

Upgrade to v2.1.0 of EOSIO requires a nodeos replay from snapshot.
Block producers may experience nodeos failure if they retrospectively apply the --blocks-log-stride nodeos option with a blocks.log in situ.
A combination of the eosio-blocklog --trim command and the new --blocks-log-stride nodeos option seems to offer the panacea of blocks.log fragments of a manageable size with the flexibility to use as few or as many of them as you wish for your nodeos instances, possibly all the way back to genesis.
However, some of the behaviour that I have demonstrated in this article, and that leads me to say that, is undocumented, so I would suggest that it is risky at this moment to rely upon it.
Furthermore, using the nodeos --blocks-log-stride option without accepting that the node only has access to a rolling recent block history would seem to me to run counter to the obvious design intent of this feature.
I believe that the Telos community, or indeed any community using EOSIO to deliver blockchain services, would benefit from a central archive of:

A blocks.log from genesis to a recent point in time, split using one of the generic tools for this purpose. That archive could only be restored as a whole but would not rely on either the eosio-blocklog --trim command or the --blocks-log-stride nodeos option.
blocks.log fragments created using either the eosio-blocklog --trim command (for the history from genesis) or a node with the nodeos --blocks-log-stride option enabled that would generate additional fragments over time.

I would particularly welcome comments on the last of my conclusions above from the Telos community, on whether they would or would not consider that to be a valuable resource.

David Williamson

Varilink Computing Ltd
