Network Stability Report: Kusama With Parachains

We’ve monitored the Kusama network for stability after the first 5 parachain auctions. We’ve sampled metrics from opt-in validators to gather the information in Prometheus and Grafana for this report.

By PolkadotAugust 17, 2021

By Robert Habermeier, Polkadot Founder

We’ve monitored the Kusama network for stability after the first 5 parachain auctions. At this time, there were 6 parachains on the network.

Our interest was in 4 key areas:

  1. Candidate Production Stability
  2. Approval Voting Stats
  3. Network Connectivity
  4. Load

We’ve sampled metrics from opt-in validators to gather the information in Prometheus and Grafana as displayed in the graphs below.

Candidate Production Stability

Under ideal circumstances, in the current iteration of the protocol, each parachain produces a block for every 2 relay-chain blocks. We can determine the rate of block production for each parachain by dividing the amount of relay-chain blocks produced during the time the parachain has been online by the number of parachain blocks produced in that period of time.

The table below shows the values for the 6 current parachains as of a recent block number (#).

As all these parachains are secured by the same validator set and are validated by random validators — there should be no major discrepancies in the service provided by the validators to the parachains.

The effect of network noise is not worth considering at this time, as over these past days the parachains have not had enough time to be repeatedly exposed to every possible combination of backing validators. But noise certainly does not explain the large discrepancy between Shiden (and to a lesser extent, Khala) and the rest of the parachains, which mostly occupy the band between 5 and 10% off the ideal. It is worth noting that Statemine had a rocky period in the first few weeks of its launch which caused it to produce blocks only once every minute, and the current data is skewed by that initial issue.

There are two possible explanations for the discrepancy. The true reason may be one or some combination of the two:

  1. Heavy parachain execution or data
  2. Poor collator connectivity to validators

At the moment, the time window given to collators and validators to produce a parachain block is very short, which makes the system fragile and experience short delays in communication. For both issues, the long-term solution is to improve on the parachains protocol to allow a longer window for the next parachain block to be created. A short-term solution would be to position collators geographically closer to the bulk of the validator nodes. However, this creates a temporary regional centralization risk — one which would be mitigated by the long-term solution.

Approval Voting

The approval voting protocol is responsible for providing most of the security of parachains. It’s tightly integrated with the GRANDPA finality protocol. Roughly speaking, nodes are randomly selected to check the validity of parachain blocks. A certain number of nodes are required to finalize the relay chain block containing the candidate, and ‘no-shows’ who announce their intent to check but don’t follow up are automatically replaced. Disputes over validity are escalated to the entire validator set resulting in the slashing of at least one validator.

To benchmark approval voting, we can observe the following:

  1. GRANDPA finality lag opinions by validators
  2. Average ‘tranche’ of assignments by validators (ideally = 0)
  3. Number of assignments and approvals by validators

Finality Lag

The graph above shows on logarithmic scale, the maximum and average opinion on how many blocks should lag behind the head of the relay chain finality. Each validator has its own opinion, based on the validator’s perception of the approval status of each parachain block referenced by the relay chain.

Most of the time, this is between 2 and 5. However, it occasionally jumps up to 50. These incidents are somewhat alarming – there is a failsafe at 50 blocks which is clearly being hit very occasionally, once every few weeks.

These finality stall incidents are being investigated at the moment. We will propose a solution to governance, to resolve the issue in advance of the Polkadot parachains launch.

Average Tranche

Every validator is technically assigned to check every parachain block. The only question is when – typically only tranche 0 validators are actually enlisted to check, and later tranches only come into the picture when the tranche 0 validators fail to show up.

The graph above demonstrates that outside of the finality stall incidents, the tranches of the 50th and 95th percentile assignments are generally 0.

Assignments and Approvals

This graph demonstrates how validators’ assignments across the network are translated into their corresponding approval votes. We can see that all assignments are translated into approvals, although most of them are reported as stale. This data is inconsistent with the reported finality lag, as ‘stale’ approvals are those which become irrelevant after finality.

Almost by definition, a majority of approvals should not be stale as they are required for finality in the first place. There is a possibility that this category is being misreported or misrepresented by the node or Grafana. On Rococo, the corresponding graph shows an almost 1:1 mapping of assignments and successful approvals.

Network Connectivity

On Kusama, there are 900 validators, and every session 200 of them are randomly selected to participate in parachain consensus. Each current validator is meant to connect to the validators of the current validator set as well as the last 6 sessions.

There are a number of validators that have approximately 200 connections, and this is due to the fact that they’re part of an older validator set. Validators that are part of the current validator set should experience spikes of higher connectivity. We can see that in large part, the validators we inspected in the network are over-connected and are connected to most of the other 899 validators on the validation peer-set. This poses no problem for correctness but does present an opportunity for efficiency.

Some validators are under-connected and are not as plugged into the gossip network as they should be. Despite this, none of the validators have fewer than 100 connections, and therefore information should be gossiped to the validators, albeit over more hops.

Certain requests require point-to-point communication, and for that reason, all validators must be publicly reachable through a published node address. The node software automatically performs this function, but the node operator is responsible for ensuring that the node is reachable.

This graph shows the number of chunk requests made per second, and the number of failures of different types. The type of request here is unimportant, and the key takeaway is that ‘dial-failure’ failures (the yellow line in the lower graph) are almost exactly 10% of the number of requests made. This indicates that 10% of validators are unreachable on their published address

Load (CPU & Network)

This graph displays the CPU usage in cores by the validators. Most validators are in the 1.5-2 core utilization band. Our current recommendation is for validators to run with 4 core CPUs, so CPU utilization is within expectation.

This graph displays the CPU usage breakdown by task. The top 3 tasks dominate the CPU usage, and in descending order are the ‘libp2p-node’, ‘network-worker’, and ‘grandpa-voter’ tasks. These tasks are primarily networking-related, which indicates that optimizations in network utilization will reduce CPU utilization of the node substantially.

The majority of gossip traffic used by nodes occurs on the /polkadot/validation/1 network-protocol. This rolls up all gossip between nodes and accounts for a large portion of the network traffic. This graph shows that in general, average network bandwidth across validators is very stable at between 400-500KB/s with slightly more ‘in’ than ‘out’ bandwidth.

The majority of request/response bandwidth used by nodes is in the chunk-distribution protocol. With 200 validators and a maximum PoV size of 1MB, chunks are at peak around 15KB. At these average request/response rates, this implies a bandwidth of around 307KB/s in and 138KB/s out. However, at the moment PoVs are much, much smaller as parachains aren’t near peak transaction volume.

Recommendations

Overall, the network is running smoothly. Although average peer counts and network bandwidth appear consistent across the network, there are outlier nodes which are over-connected and experience higher levels of network bandwidth.

It appears that in the current environment, the standing recommendation of a strong 4-core CPU and 64GB Memory is more than sufficient along with a fast internet connection. Current bandwidth usage appears to be in the range of 8-16Mbps, so a typical datacenter connection of 100Mbps will be sufficient to maintain the last 5.

The only major issue is the infrequent finality stalls that are experienced by the network. These stalls are being caught by a failsafe and thus have not caused much damage, but the underlying cause is being investigated and a solution to remedy this will be proposed prior to parachains launching on Polkadot.

From the blog

Developing Analog's Timechain: Why we chose Polkadot SDK

Discover why Analog chose Polkadot SDK to power its Timechain, leveraging modularity, security, and forkless upgrades to create a seamless interoperability hub for multichain and cross-chain dapp ecosystems.

Polkadot Roundup MMXXIV

Dive into Polkadot Roundup MMXXIV for an in-depth look at 2024 milestones, including high-performance rollups, decentralized governance, and exciting expansions. Discover how Polkadot is evolving into a next-level web3 platform—plus a sneak peek at Polkadot 2, Polkadot 3, and the JAM protocol supercomputer.

Unifying Polkadot’s developer docs: A new chapter for builder experience

The PaperMoon team, supported by a Decentralized Futures grant, launches the Polkadot Developer Documentation Hub, a unified platform that simplifies developer onboarding and provides access to essential resources. This marks a transformative step for builders in the Polkadot ecosystem.

Building the future of decentralized marketing: A case study

Polkadot’s decentralized marketing sets the standard for Web3 growth. With OpenGov, bounties, and community tools, contributors are empowered to innovate, ensuring consistent, impactful campaigns and showcasing the potential of decentralized ecosystems.

Uplifting new voices and validators: Polkadot's path to sustainable decentralization

Decentralized Nodes and Decentralized Voices programs by the Web3 Foundation offer pathways for validators and governance participants to gain influence based on merit, helping to build a fairer, more resilient Polkadot ecosystem that aligns with the core principles of web3.

Is large-scale, high-quality Web3 education possible?

The Polkadot Blockchain Academy (PBA) is taking Web3 education to the next level with PBA-X, an online course crafted to deliver high-quality blockchain knowledge at scale. Learn how we’re preserving educational standards, fostering a global community, and creating new opportunities for learners everywhere.

Blockchain explained: A practical guide from concepts to use cases

Blockchain has grown far beyond its roots in cryptocurrency, reshaping industries from finance to healthcare. This guide walks you through the basics, benefits, and groundbreaking potential of blockchain and its impact on the future of the open internet.

Empowering the next wave of founders: Welcome to EasyA x Polkadot University

Unlock a structured path to start building on Polkadot with EasyA x Polkadot University.

Dynamic & Modular: Scaling Ambition with Agile Coretime

Polkadot’s Agile Coretime simplifies launching and scaling blockchain projects with dynamic blockspace allocation and flexible cost options. Learn how Agile Coretime makes it easier to build, launch, and scale ambitious Web3 projects.

How play-to-earn (P2E) is transforming onchain mobile sports gaming

Play-to-earn games are transforming mobile sports gaming. Learn how blockchain, NFTs, and platforms like Polkadot create new opportunities for digital asset ownership and cross-chain gameplay.

Polkadot Token 2049 and Decoded Asia 2024: A multichain ecosystem in action

At Token 2049 and Decoded Asia 2024 in Singapore, Polkadot teams and contributors showcased a multichain future for real-world applications. Key moments included Dr. Gavin Wood’s vision for digital individuality, Chrissy Hill’s regulatory insights, and announcements from emerging projects shaping the Web3 ecosystem.

What is a crypto wallet? Your all-access pass to the future web

In Web3, your wallet is your most valuable digital tool. It’s more than just a place to store, send, and receive cryptocurrencies securely—it’s your passport to the decentralized world.