Why Ethereum Uses a Trie for Key-Value Lookups

Ethereum is a decentralized platform that runs smart contracts, applications that run exactly as programmed without any possibility of downtime, censorship, fraud or third-party interference. Ethereum uses a blockchain to store and verify transactions, but it also has another layer of data storage that is not directly visible on the blockchain: the state trie.

The state trie is a data structure that keeps track of the current state of all accounts in the Ethereum network. Each account has an address, a balance, a nonce, and a storage root. The storage root is a pointer to another trie that contains the key-value pairs of the account’s storage. The state trie is a mapping between addresses and account states, and it is updated every time a transaction modifies an account.

But why does Ethereum use a trie data structure for storing and retrieving key-value pairs? What are the advantages of using a trie over a simple hash table or a database? In this article, we will explore the reasons behind Ethereum’s choice of using a trie, and how it benefits from its properties.

Why Ethereum Uses a Trie for Key-Value Lookups

Table of Contents

What is a Trie?
What is a Merkle Patricia Trie?
Why Does Ethereum Use a Trie for Key-Value Lookups?
Summary
Frequently Asked Questions (FAQs)
Question: What is a trie?
Question: What is a Merkle Patricia Trie?
Question: Why does Ethereum use a trie for key-value lookups?

What is a Trie?

A trie, also known as a radix tree, prefix tree, or digital tree, is a tree-like data structure that stores strings in its nodes. Each node can have multiple children, and each edge represents a character or a symbol. A string can be retrieved by following the path from the root node to a leaf node, where the value associated with the string is stored.

For example, the following image shows a trie that stores four strings: “dog”, “dot”, “cat”, and “car”. Each node has an array of 26 pointers, one for each letter of the alphabet. To find the value of “dog”, we start from the root node and follow the edges labeled “d”, “o”, and “g”, until we reach the leaf node that contains the value.

A trie has some advantages over other data structures for storing and retrieving strings:

It is fast and efficient: A trie can perform lookups, insertions, and deletions in O(k) time, where k is the length of the string. This is because each node only needs to compare one character at a time, and there is no need to hash or sort the strings.
It is compact and space-saving: A trie can store many strings with common prefixes in a shared structure, reducing the space complexity. For example, in the above image, the strings “dog” and “dot” share the same prefix “do”, so they only need one node to represent it.
It is deterministic and verifiable: A trie can be uniquely identified by its root hash, which is computed by hashing all the nodes recursively. This means that two tries with the same strings will have the same root hash, and any change in the trie will result in a different root hash. This property allows for easy verification and synchronization of tries across different nodes.

What is a Merkle Patricia Trie?

A Merkle Patricia Trie (MPT) is a special type of trie that combines the features of two other data structures: a Merkle tree and a Patricia tree.

A Merkle tree is a binary tree that stores hashes in its nodes. The leaf nodes store the hashes of the data blocks, and the internal nodes store the hashes of their children. The root node stores the hash of the entire tree, which serves as a summary or fingerprint of the data. A Merkle tree allows for efficient verification of data integrity and membership by using cryptographic proofs.

A Patricia tree, also known as a compact prefix tree or crit-bit tree, is a compressed version of a trie that eliminates unnecessary nodes and edges. A Patricia tree only has two types of nodes: branch nodes and leaf nodes. A branch node has exactly two children, and each edge represents a sequence of symbols instead of just one. A leaf node stores both the key and the value of a string. A Patricia tree reduces the space complexity and improves the lookup performance of a trie.

A Merkle Patricia Trie combines both structures by storing hashes in its nodes like a Merkle tree, and compressing its structure like a Patricia tree. A MPT has three types of nodes: extension nodes, branch nodes, and leaf nodes. An extension node has one child and represents a sequence of symbols that are shared by multiple keys. A branch node has 16 children (one for each hex digit) and optionally stores a value. A leaf node stores both the key and the value of a string.

For example, the following image shows an MPT that stores four key-value pairs: (“dog”, 1), (“dot”, 2), (“cat”, 3), and (“car”, 4). Each node has an associated hash that is computed by hashing its contents (including its children’s hashes). The root hash is 0x3c…b5.

A MPT has some advantages over a regular trie for storing and retrieving key-value pairs:

It is cryptographically authenticated: A MPT can provide proofs of existence or non-existence of a key-value pair by using the hashes along the path from the root to the node. For example, to prove that (“dog”, 1) exists in the MPT, we can provide the hashes of the nodes marked in blue in the above image. To prove that (“cow”, 5) does not exist in the MPT, we can provide the hashes of the nodes marked in red, and show that there is no edge labeled “c” from the root node.
It is fully deterministic: A MPT is guaranteed to have the same structure and root hash for the same set of key-value pairs, regardless of the order of insertion or deletion. This means that two MPTs with the same key-value pairs can be easily compared and synchronized by checking their root hashes.
It is compact and space-saving: A MPT eliminates redundant nodes and edges by using extension nodes and branch nodes, reducing the space complexity and improving the lookup performance.

Why Does Ethereum Use a Trie for Key-Value Lookups?

Ethereum uses a trie data structure for storing and retrieving key-value pairs because it benefits from its properties of being fast, efficient, compact, deterministic, and verifiable. Ethereum uses a modified version of the MPT, which has some additional features and optimizations for its specific use cases.

Ethereum uses four types of tries for different purposes:

State trie: This trie stores the state of all accounts in the Ethereum network. Each account has an address (a 20-byte hex string), a balance (a 256-bit integer), a nonce (a 256-bit integer), a code hash (a 32-byte hash of the smart contract code), and a storage root (a 32-byte hash of the storage trie). The state trie is a mapping between addresses and account states, and it is updated every time a transaction modifies an account. The state trie’s root hash is stored in the block header as the stateRoot field.
Storage trie: This trie stores the storage contents of each smart contract account. Each smart contract account can store arbitrary key-value pairs in its storage, where both keys and values are 256-bit integers. The storage trie is a mapping between keys and values for each account, and it is updated every time a smart contract modifies its storage. The storage trie’s root hash is stored in the state trie as the storage root of each account.
Transaction trie: This trie stores the transactions included in each block. Each transaction has a nonce, a gas price, a gas limit, a to address, a value, a data field, a v field, an r field, and an s field. The transaction trie is a mapping between transaction indices (0, 1, 2, …) and transactions for each block, and it is updated every time a new block is created. The transaction trie’s root hash is stored in the block header as the transactionsRoot field.
Receipt trie: This trie stores the receipts generated by each transaction execution. Each receipt has a status (0 for failure, 1 for success), a cumulative gas used, a bloom filter of logs, and a list of logs. Each log has an address, a list of topics, and a data field. The receipt trie is a mapping between transaction indices (0, 1, 2, …) and receipts for each block, and it is updated every time a new block is created. The receipt trie’s root hash is stored in the block header as the receiptsRoot field.

By using these tries, Ethereum can achieve several benefits:

It can efficiently store and retrieve key-value pairs related to accounts, contracts, transactions, and receipts.
It can verify the integrity and validity of any data by using cryptographic proofs based on hashes.
It can synchronize and compare different nodes’ states by using root hashes.
It can enable light clients to access relevant data without downloading the entire blockchain by using merkle proofs.

Summary

In this article, we have explained why Ethereum uses a trie data structure for storing and retrieving key-value pairs. We have discussed what a trie is, what a Merkle Patricia Trie is, and how Ethereum uses four types of tries for different purposes. We have also highlighted some of the benefits of using tries in terms of performance, efficiency, compactness, determinism, and verifiability.

Frequently Asked Questions (FAQs)

Question: What is a trie?

Answer: A trie is a tree-like data structure that stores strings in its nodes. Each node can have multiple children, and each edge represents a character or a symbol. A string can be retrieved by following the path from the root node to a leaf node.

Question: What is a Merkle Patricia Trie?

Answer: A Merkle Patricia Trie (MPT) is a special type of trie that combines the features of two other data structures: a Merkle tree and a Patricia tree. A MPT stores hashes in its nodes like a Merkle tree, and compresses its structure like a Patricia tree. A MPT can provide proofs of existence or non-existence of a key-value pair by using the hashes along the path from the root to the node.

Question: Why does Ethereum use a trie for key-value lookups?

Answer: Ethereum uses a trie data structure for storing and retrieving key-value pairs because it benefits from its properties of being fast, efficient, compact, deterministic, and verifiable. Ethereum uses four types of tries for different purposes: state trie, storage trie, transaction trie, and receipt trie. By using these tries, Ethereum can achieve several benefits such as verifying the integrity and validity of any data, synchronizing and comparing different nodes’ states, and enabling light clients to access relevant data.