Here, we will have a look at how the Tezos node performs (how fast it syncs the chain from scratch) on different AWS instances as well as a general-purpose Digital Ocean instance.
For this test, the following instances were used:
Note that the cost per month is calculated from the on-demand rate; you can get much cheaper instances by reserving an instance for a longer term, or by using spot instances. We won't go too deep into AWS's pricing model here, but spot pricing is definitely worth a look for fail-over nodes, for example.
On all instances we ran a Tezos node with Docker (using the official tezos/tezos:mainnet image) in all available history modes, as well as an archive node compiled from source.
The main purpose of this was to look into the effects of different storage, as this is the most important aspect of a cloud node in terms of performance and syncing speed. Given that Tezos currently runs on only one CPU core, the number of CPU cores per instance should have no influence on the outcome; single-core speed could matter, however, if the storage is fast enough not to be the bottleneck.
So let's look at our different storage options in more detail.
Digital Ocean states that data is stored on hardware separated from your Droplet and replicated multiple times across different racks, reducing the chance of data loss in case of hardware failure. This is very similar to EBS volumes on AWS, although we do not get quite the same amount of information that AWS provides about its storage implementation.
AWS offers two types of storage: EBS and instance storage. EBS is similar to Digital Ocean's block storage: the disks are not in the same physical machine as the CPU and RAM, but in the same data center, presumably spread over many physical disks. This way, the storage volume survives even if your instance fails, and you can move EBS volumes from one instance to another.
Instance store, on the other hand, is a disk physically attached to the virtualization host. This provides much lower latency and more IOPS (input/output operations per second), but comes at the cost that the volume does not survive instance failure: as soon as the instance stops, the data is gone, so you should not use it for databases, for example. For Tezos nodes specifically, instance store can be a good option if you run a setup with multiple nodes as fail-over.
EBS volumes come with caveats of their own, however. Most importantly, there are multiple types of EBS volumes, as can be seen here:
For this test, we used only gp2 volumes, the default option. But it gets more complicated: EBS volumes get faster the bigger the volume is, so a 50 GB volume may perform worse under constant load than a 500 GB volume. On a running node this should not make a big difference, as most I/O then falls under "burst IOPS", rate-limited fast access to your data. But for syncing the blockchain from scratch, the size of the volume has a big influence, as the following diagram shows:
For this test, both EBS volumes were 250 GB in size.
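As a back-of-the-envelope check, the gp2 figures from the AWS documentation (3 IOPS per provisioned GB, floored at 100 and capped at 16,000, with burst to 3,000 IOPS from a bucket of 5.4 million I/O credits) can be put into a small script. This is a sketch of the documented model, not something we measured:

```shell
# Baseline IOPS for a gp2 volume of a given size in GB.
gp2_baseline_iops() {
  iops=$(( $1 * 3 ))
  [ "$iops" -lt 100 ] && iops=100
  [ "$iops" -gt 16000 ] && iops=16000
  echo "$iops"
}

# Seconds of sustained 3,000-IOPS burst from a full credit bucket.
gp2_burst_seconds() {
  base=$(gp2_baseline_iops "$1")
  if [ "$base" -ge 3000 ]; then
    echo "unlimited"   # at a 3,000+ baseline there is no burst phase
  else
    echo $(( 5400000 / (3000 - base) ))
  fi
}

gp2_baseline_iops 250   # baseline of the 250 GB test volumes: 750
gp2_burst_seconds 250   # ~2400 s (40 min) of full-speed burst
```

This is why a small volume falls behind during an initial sync: once the credit bucket is empty, a 50 GB volume drops to a 150 IOPS baseline, while a 500 GB volume still gets 1,500.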
Enough theory; let's look at the data.
On all instances we ran a node with docker using
docker run -v <HostPath>:/var/run/tezos tezos/tezos:mainnet tezos-node --history-mode <mode>
and tracked the block height over time. Here are the results:
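The tracking itself can be done by polling the node's RPC for the current head level. A minimal sketch, assuming jq is installed and the container publishes the default RPC port 8732 on localhost:

```shell
# Extract the chain level from a block header, as returned by the
# node's RPC endpoint GET /chains/main/blocks/head/header.
head_level() {
  jq -r .level
}

# Sampling loop (uncomment to run against a live node; appends one
# "unix-timestamp,level" line per minute to a CSV):
# while true; do
#   printf '%s,%s\n' "$(date +%s)" \
#     "$(curl -s http://localhost:8732/chains/main/blocks/head/header | head_level)" \
#     >> sync-progress.csv
#   sleep 60
# done
```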
Unsurprisingly, Aws3 with the NVMe instance store performed best. What is interesting, however, is that EBS optimization had little to no effect. AWS writes about EBS-optimized instances:
An Amazon EBS–optimized instance uses an optimized configuration stack and provides additional, dedicated capacity for Amazon EBS I/O. This optimization provides the best performance for your EBS volumes by minimizing contention between Amazon EBS I/O and other traffic from your instance.
So, basically, EBS-optimized instances get dedicated bandwidth to EBS volumes that would otherwise be shared with the networking bandwidth of your instance. However, it appears that the Tezos node does not generate enough traffic for this to have any significant effect.
It is also worth noting that archive nodes sync a little faster than the other history modes, presumably because in rolling and full modes the node spends extra time cleaning up unneeded data.