Gas Problems in Ethereum Blocks
I decided to research and analyze Ethereum blocks, as I hadn’t seen anyone perform an in-depth analysis of the Ethereum blockchain’s blocks. After a brief initial examination, I formulated some hypotheses to confirm or reject.
Data Extraction
First of all, I extracted all the data about Ethereum blocks. The GitHub repository containing the extracted data can be viewed here. This was made possible thanks to Ethereum nodes and flipsidecrypto.xyz: I extracted data from both sources and combined it into a single dataset of approximately 18 million rows, one per block. Here’s a snippet:
Not all of the columns are displayed here, so here is a brief description of each one for reference:
- block_number: the serial number of the block;
- block_timestamp: the time when the block was produced;
- transactions_count: the number of transactions in the block;
- difficulty: the effort required to produce the block;
- total_difficulty: the cumulative difficulty of the chain up to and including this block;
- extra_data: any arbitrary data the block producer included in the block;
- gas_limit: the maximum total gas that all transactions in the block may consume;
- gas_used: the total gas actually used by the transactions in the block;
- hash: the hash of the block;
- parent_hash: the hash of the parent block;
- miner: the address that mined (produced) the block;
- nonce: the proof-of-work nonce;
- receipts_root: the root hash of the trie of the block’s transaction receipts;
- sha3_uncles: the Keccak-256 hash of the RLP-encoded list of uncle (ommer) headers; for blocks without uncles, it is the hash of an RLP-encoded empty list;
- size: the size of the block in bytes (constrained by the block’s gas limit).
It would have been impractical to analyze ~18 million rows as a whole. Therefore, I grouped the blocks by year, from 2015 to 2023, to better understand the narratives and trends of each year.
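Grouping rows by year can be sketched with the Python standard library alone. This is a minimal sketch, not the author’s pipeline: it assumes each row is a dict whose `block_timestamp` is a Unix timestamp in seconds (the real export may store timestamps as strings), and the helper name `group_by_year` is hypothetical. The toy rows are illustrative values, not real chain data.

```python
from collections import defaultdict
from datetime import datetime, timezone


def group_by_year(rows):
    """Bucket block rows by the UTC calendar year of block_timestamp.

    Assumes block_timestamp is a Unix timestamp in seconds.
    """
    by_year = defaultdict(list)
    for row in rows:
        year = datetime.fromtimestamp(row["block_timestamp"], tz=timezone.utc).year
        by_year[year].append(row)
    return dict(by_year)


# Toy rows (illustrative values only).
rows = [
    {"block_number": 1, "block_timestamp": 1438269988},  # mid-2015
    {"block_number": 2, "block_timestamp": 1483228799},  # 2016-12-31 23:59:59 UTC
    {"block_number": 3, "block_timestamp": 1483228800},  # 2017-01-01 00:00:00 UTC
]
groups = group_by_year(rows)
print({year: len(blocks) for year, blocks in sorted(groups.items())})  # {2015: 1, 2016: 1, 2017: 1}
```

Using UTC explicitly keeps blocks near midnight on New Year’s Eve from landing in a different year depending on the machine’s local time zone.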
Block Dynamics
First of all, I wanted to observe the dynamics of the main metrics in the dataset. Here’s the general hypothesis I aim to test:
- I expect that all quantitative metrics will exhibit linear growth over the specified time period.
I also expect to extract valuable insights from the Ethereum blocks data at this stage of the analysis. To that end, I created some visualizations to check how the metrics actually grew over time. Let’s explore the charts:
As we can see from the chart above, the average number of transactions in a single block grew almost linearly over the years. However, it declined in 2022 and 2023, most likely because of reduced user activity following the 2021 bull market.
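The yearly average behind a chart like this can be sketched as follows. This is a hedged sketch that assumes each row already carries a precomputed `year` key alongside `transactions_count`. The function name and the toy values are hypothetical.

```python
from collections import defaultdict


def avg_tx_per_block_by_year(rows):
    """Mean transactions_count per block, per year.

    Assumes each row is a dict with a precomputed 'year' key.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        totals[row["year"]] += row["transactions_count"]
        counts[row["year"]] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


# Toy rows (illustrative values only).
rows = [
    {"year": 2020, "transactions_count": 100},
    {"year": 2020, "transactions_count": 200},
    {"year": 2021, "transactions_count": 180},
]
averages = avg_tx_per_block_by_year(rows)
print(averages)  # {2020: 150.0, 2021: 180.0}
```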
Another possible explanation is that the network began producing more blocks with fewer transactions in each. However, this is not the case: in recent years, the network has consistently produced almost the same number of blocks per year:
- 2020: ~2.3M blocks
- 2021: ~2.3M blocks
- 2022: ~2.3M blocks
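The number of blocks produced does not depend on the number of transactions submitted: block production is paced by the protocol’s target block time (through difficulty adjustment under proof of work, and through fixed 12-second slots after the Merge).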
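A quick back-of-the-envelope check shows why the yearly block count stays flat: it is essentially the number of seconds in a year divided by the average block time. The ~13.5-second proof-of-work average below is an assumption chosen for illustration. The 12-second figure is the post-Merge slot time, which gives an upper bound because missed slots produce no block.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (non-leap year)


def expected_blocks_per_year(avg_block_time_s):
    """Blocks per year implied by an average block time in seconds."""
    return SECONDS_PER_YEAR / avg_block_time_s


pow_estimate = round(expected_blocks_per_year(13.5))  # assumed PoW-era average block time
pos_upper = round(expected_blocks_per_year(12.0))     # post-Merge slot time, no missed slots
print(pow_estimate)  # 2336000
print(pos_upper)     # 2628000
```

The transaction count does not appear anywhere in this calculation. That is the point: the block count is set by time, not by demand.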
Let’s move on to the next chart:
The effort required to produce a block grew linearly over time. As you can see, there is no data for 2023: since the Merge in September 2022, Ethereum has used proof of stake, so new blocks have a difficulty of 0. Let’s move on to the next chart:
Total difficulty is the cumulative sum of the difficulties of all blocks up to and including the current one. It can therefore only grow as new blocks are added (and it has stayed constant since the Merge, because proof-of-stake blocks add zero difficulty). Hence, everything is in order in that regard.
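Because total_difficulty is a running sum, it can be recomputed from the difficulty column. That makes it a handy consistency check on the extracted dataset. Below is a minimal sketch with hypothetical helper names and toy values. The trailing zeros mimic post-Merge blocks, where the total stops changing.

```python
from itertools import accumulate


def running_total_difficulty(difficulties):
    """Cumulative sum of per-block difficulty, starting from the first block given."""
    return list(accumulate(difficulties))


def check_total_difficulty(difficulties, total_difficulties):
    """True if each total equals the previous total plus the block's own difficulty."""
    return all(
        prev_total + diff == total
        for prev_total, diff, total in zip(
            total_difficulties, difficulties[1:], total_difficulties[1:]
        )
    )


toy_difficulty = [100, 120, 130, 0, 0]  # last two mimic post-Merge blocks
totals = running_total_difficulty(toy_difficulty)
print(totals)  # [100, 220, 350, 350, 350]
print(check_total_difficulty(toy_difficulty, totals))  # True
```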
The next two charts will display the gas limit and the actual gas usage in the block. Let’s view them below:
The amount of gas used in a single block has been increasing since the beginning. Why? The average number of transactions in a block has been growing over time.
More transactions consume more gas, so more gas is used in a single block. It’s as simple as that.
The block size, which is constrained by the block’s gas limit, has also grown linearly over time. Why? The logic is the same as above: more transactions → more gas used per block → block producers raising the gas limit to make room → greater block size.
So here are some conclusions we can draw after this brief analysis of the dynamics:
- Not all quantitative metrics grow linearly over time.
- The number of blocks produced does not depend on the number of transactions submitted.
- The difficulty of block production increased linearly (until the Merge).
- The gas used in a block has been growing linearly.
Correlations
The next step in the exploratory data analysis is to identify correlations. I plotted more than 20 charts.
However, I’ll only showcase the most interesting ones I discovered during this analysis (displaying all 20 charts would be too cluttered). First, let’s begin with the overall correlation matrix of the parameters:
We can identify several things by examining this correlation heatmap:
- None of the variables are negatively correlated
- Strong correlations exist among block_number, total_difficulty, and gas_limit
- Nearly all pairs show a moderate correlation (between 0.4 and 0.7)
Now, let’s examine the same values split by year. I created various bar plots to provide a clearer visual representation. For rigor, I’ll only treat a correlation as strong if its absolute value is ≥ 0.5.
First of all, let’s test whether the strong correlations among block_number, total_difficulty, and gas_limit remain stable over time.
- We can clearly see that the correlation between block_number and total_difficulty remains stable: with every new block added, the total difficulty of the chain grows.
- The correlation values between block_number and gas_limit, and between total_difficulty and gas_limit, are nearly the same.
- Only blocks from 2017, 2019, 2020, and 2021 exhibit a strong correlation among the listed parameters.
Since block_number and total_difficulty are almost perfectly correlated, I’ll treat them as the same parameter (both essentially track the chain’s growth over time). Thus, I’ll focus on the correlation between block_number and gas_limit.
My first idea was to find a parameter that correlates strongly with gas_limit only in 2017, 2019, 2020, and 2021, but that has a different correlation pattern with block_number (otherwise, it wouldn’t be worth analyzing separately).
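This search criterion can be written down as a filter over per-year coefficients. The sketch below is hedged: it assumes the per-year Pearson coefficients have already been computed for each candidate column, once against gas_limit and once against block_number. The column names `col_a`, `col_b`, `col_c` and their coefficients are invented for illustration, not measured values.

```python
THRESHOLD = 0.5  # |r| >= 0.5 counts as strong, as in the text
TARGET_YEARS = {2017, 2019, 2020, 2021}


def strong_years(r_by_year, threshold=THRESHOLD):
    """Years whose coefficient is strong in absolute value."""
    return {year for year, r in r_by_year.items() if abs(r) >= threshold}


def find_candidates(corr_with_gas_limit, corr_with_block_number, target=TARGET_YEARS):
    """Columns strongly correlated with gas_limit exactly in the target years,
    but whose strong years with block_number form a different set."""
    return [
        col
        for col, r_by_year in corr_with_gas_limit.items()
        if strong_years(r_by_year) == target
        and strong_years(corr_with_block_number[col]) != target
    ]


# Hypothetical per-year coefficients for made-up columns (not measured values).
matches_target = {2016: 0.2, 2017: 0.8, 2018: 0.3, 2019: 0.7, 2020: 0.9, 2021: 0.6, 2022: 0.1}
always_strong = {year: 0.9 for year in range(2016, 2023)}
corr_with_gas_limit = {"col_a": matches_target, "col_b": always_strong, "col_c": matches_target}
corr_with_block_number = {"col_a": always_strong, "col_b": always_strong, "col_c": matches_target}

print(find_candidates(corr_with_gas_limit, corr_with_block_number))  # ['col_a']
```

Here `col_c` is rejected because its pattern with block_number is identical to its pattern with gas_limit. That is exactly the "otherwise it wouldn’t be worth analyzing" case.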
I found the difficulty parameter, which shows a quite similar correlation pattern with gas_limit but has a different pattern with block_number. Let’s take a look:
According to the charts above, we can formulate several assumptions:
- The difficulty of the block did not follow a strictly linear growth pattern over time, exhibiting ups and downs in different years.
- Gas limits were adjusted to be closer to the actual gas used in some years than in others.
To test the first hypothesis, let’s compare the distributions of block difficulty across years (I used a sample of the dataset, which should still be representative):
The distribution appears to resemble a normal distribution in each case, and it’s evident that the distribution shifts to the left with each passing year, indicating a mostly linear trend.
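Sampling each year and comparing summary statistics makes the direction of the shift explicit without plotting. This is a minimal sketch: the author’s sampling scheme isn’t stated, so the per-year sample size, the fixed seed, and the helper name are all assumptions. The toy values are illustrative.

```python
import random
from statistics import mean, median


def summarize_sample(values_by_year, sample_size, seed=0):
    """Draw a random sample (without replacement) per year and summarize it."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    summary = {}
    for year, values in sorted(values_by_year.items()):
        sample = rng.sample(values, min(sample_size, len(values)))
        summary[year] = {"n": len(sample), "mean": mean(sample), "median": median(sample)}
    return summary


# Toy difficulty values (illustrative only).
values_by_year = {2018: list(range(100, 200)), 2019: list(range(50, 150))}
summary = summarize_sample(values_by_year, sample_size=1000)  # larger than each year: takes all
for year, stats in summary.items():
    print(year, stats)
```

Comparing the yearly medians side by side shows whether the distribution moved toward lower or higher difficulty, which is easier to read than eyeballing overlapping histograms.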
Now, let’s examine the gas_used/gas_limit ratio:
In fact, we observe the following:
- The size of the blocks clearly depends on gas_limit.
- gas_limit has been increasing steadily and consistently over time.
- However, the gas actually used in the blocks is not growing at the same rate as the gas limit.
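The gas_used/gas_limit ratio per year can be sketched as an average block utilization, where 1.0 means blocks were completely full. This assumes rows with a precomputed `year` key and a non-zero `gas_limit`. The function name and toy values are hypothetical.

```python
from collections import defaultdict


def mean_utilization_by_year(rows):
    """Average gas_used / gas_limit per year (1.0 means blocks were full).

    Assumes each row has a precomputed 'year' key and gas_limit > 0.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row["year"]] += row["gas_used"] / row["gas_limit"]
        counts[row["year"]] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}


# Toy rows (illustrative values only).
rows = [
    {"year": 2018, "gas_used": 6_000_000, "gas_limit": 8_000_000},
    {"year": 2018, "gas_used": 8_000_000, "gas_limit": 8_000_000},
    {"year": 2021, "gas_used": 15_000_000, "gas_limit": 30_000_000},
]
print(mean_utilization_by_year(rows))  # {2018: 0.875, 2021: 0.5}
```

A falling yearly ratio is the numeric form of the third observation above: the limit grows faster than the gas that is actually used.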
The issue here is that gas_limit correlates strongly only with time (i.e., with block_number or total_difficulty) and does not depend on the number of transactions in the block:
So, the insight here is that it doesn’t matter whether a block contains many transactions or few: the block size keeps growing because the gas limit keeps increasing.
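For context on how quickly gas_limit can move, the protocol lets each block change its gas limit by strictly less than 1/1024 of the parent’s limit (and never below 5,000), with block producers choosing the direction. Below is a minimal sketch of that validity rule as stated in the Ethereum Yellow Paper and EIP-1559. The function names are hypothetical, and the one-off doubling at the London fork block is ignored.

```python
MIN_GAS_LIMIT = 5_000
BOUND_DIVISOR = 1_024


def is_valid_gas_limit(parent_gas_limit, gas_limit):
    """True if gas_limit is an allowed successor of parent_gas_limit."""
    max_delta = parent_gas_limit // BOUND_DIVISOR
    return abs(gas_limit - parent_gas_limit) < max_delta and gas_limit >= MIN_GAS_LIMIT


def max_gas_limit_after(parent_gas_limit, blocks):
    """Highest gas limit reachable after `blocks` consecutive maximal increases."""
    limit = parent_gas_limit
    for _ in range(blocks):
        limit += limit // BOUND_DIVISOR - 1  # strict inequality: at most delta - 1
    return limit


print(is_valid_gas_limit(30_000_000, 30_029_295))  # True
print(is_valid_gas_limit(30_000_000, 30_029_296))  # False
print(max_gas_limit_after(30_000_000, 1))  # 30029295
```

Since each block can move the limit by only about 0.1%, the limit follows a slow drift chosen by block producers rather than the load of any individual block.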
Conclusion
In fact, we could use less space to store a block (as the size would be smaller) if we could reduce the gas limit for each block, making it more dynamic instead of static. However, calculating and optimizing the gas limit for each block can be costly and require more time and resources than simply retaining extra data on a node.
If you’d like to discuss this article or ask any questions about it, please visit my Twitter profile (linked below) and see the pinned tweet. Thank you!
If you want to see the data processing, visit my GitHub.
my twitter: twitter.com/paramonoww
my telegram: t.me/paramonoww
my linkedin: linkedin.com/in/paramonoww
my medium: medium.com/@paramonoww