Ethereum: What I Know
This article includes the most important information about Ethereum and is written as a continuation of the Bitcoin article.
It describes an older version of Ethereum to help you see the bigger picture; the modern updates are described closer to the end.
- 1. What Is Ethereum
- 2. Hash and Address
- 3. Transactions
- 4. Smart Contracts
- 5. Tokens
- 6. Oracles
- 7. The Ethereum Virtual Machine
- 8. Consensus
- More to come...
What Is Ethereum
Ethereum is often described as "the world computer." But what does that mean? Let’s start with a computer science–focused description, and then try to decipher that with a more practical analysis of Ethereum’s capabilities and characteristics, while comparing it to Bitcoin and other decentralized information exchange platforms (or "blockchains" for short).
From a computer science perspective, Ethereum is a deterministic but practically unbounded state machine, consisting of a globally accessible singleton state and a virtual machine that applies changes to that state.
From a more practical perspective, Ethereum is an open source, globally decentralized computing infrastructure that executes programs called smart contracts. It uses a blockchain to synchronize and store the system’s state changes, along with a cryptocurrency called ether to meter and constrain execution resource costs.
The Ethereum platform enables developers to build powerful decentralized applications with built-in economic functions. While providing high availability, auditability, transparency, and neutrality, it also reduces or eliminates censorship and reduces certain counterparty risks.
Compared to Bitcoin
Many people will come to Ethereum with some prior experience of cryptocurrencies, specifically Bitcoin. Ethereum shares many common elements with other open blockchains: a peer-to-peer network connecting participants, a Byzantine fault–tolerant consensus algorithm for synchronization of state updates (a proof-of-work blockchain), the use of cryptographic primitives such as digital signatures and hashes, and a digital currency (ether).
Yet in many ways, both the purpose and construction of Ethereum are strikingly different from those of the open blockchains that preceded it, including Bitcoin.
Ethereum’s purpose is not primarily to be a digital currency payment network. While the digital currency ether is both integral to and necessary for the operation of Ethereum, ether is intended as a utility currency to pay for use of the Ethereum platform as the world computer.
Unlike Bitcoin, which has a very limited scripting language, Ethereum is designed to be a general-purpose programmable blockchain that runs a virtual machine capable of executing code of arbitrary and unbounded complexity. Where Bitcoin’s Script language is, intentionally, constrained to simple true/false evaluation of spending conditions, Ethereum’s language is Turing complete, meaning that Ethereum can straightforwardly function as a general-purpose computer.
In Bitcoin, the reference implementation is developed by the Bitcoin Core open source project and implemented as the bitcoind client. In Ethereum, rather than a reference implementation there is a reference specification, a mathematical description of the system in the Yellow Paper. There are a number of clients, which are built according to the reference specification.
The Birth of Ethereum
All great innovations solve real problems, and Ethereum is no exception. Ethereum was conceived at a time when people recognized the power of the Bitcoin model, and were trying to move beyond cryptocurrency applications. But developers faced a conundrum: they either needed to build on top of Bitcoin or start a new blockchain. Building upon Bitcoin meant living within the intentional constraints of the network and trying to find workarounds. The limited set of transaction types, data types, and sizes of data storage seemed to limit the sorts of applications that could run directly on Bitcoin; anything else needed additional off-chain layers, and that immediately negated many of the advantages of using a public blockchain. For projects that needed more freedom and flexibility while staying on-chain, a new blockchain was the only option. But that meant a lot of work: bootstrapping all the infrastructure elements, exhaustive testing, etc.
Toward the end of 2013, Vitalik Buterin, a young programmer and Bitcoin enthusiast, started thinking about further extending the capabilities of Bitcoin and Mastercoin (an overlay protocol that extended Bitcoin to offer rudimentary smart contracts). In October of that year, Vitalik proposed a more generalized approach to the Mastercoin team, one that allowed flexible and scriptable (but not Turing-complete) contracts to replace the specialized contract language of Mastercoin. While the Mastercoin team were impressed, this proposal was too radical a change to fit into their development roadmap.
In December 2013, Vitalik started sharing a whitepaper that outlined the idea behind Ethereum: a Turing-complete, general-purpose blockchain. A few dozen people saw this early draft and offered feedback, helping Vitalik evolve the proposal.
Ethereum’s founders were thinking about a blockchain without a specific purpose, that could support a broad variety of applications by being programmed. The idea was that by using a general-purpose blockchain like Ethereum, a developer could program their particular application without having to implement the underlying mechanisms of peer-to-peer networks, blockchains, consensus algorithms, etc. The Ethereum platform was designed to abstract these details and provide a deterministic and secure programming environment for decentralized blockchain applications.
Much like Satoshi, the founders didn’t just invent a new technology; they combined new inventions with existing technologies in a novel way and delivered the prototype code to prove their ideas to the world.
The founders worked for years, building and refining the vision. And on July 30, 2015, the first Ethereum block was mined. The world’s computer started serving the world.
Ethereum’s Four Stages of Development
Ethereum’s development was planned over four distinct stages, with major changes occurring at each stage. A stage may include subreleases, known as "hard forks," that change functionality in a way that is not backward compatible.
Block #0 Frontier—The initial stage of Ethereum, lasting from July 30, 2015, to March 2016.
Block #200,000 Ice Age—A hard fork to introduce an exponential difficulty increase, to motivate a transition to PoS when ready.
Block #1,150,000 Homestead—The second stage of Ethereum, launched in March 2016.
Block #1,192,000 DAO—A hard fork that reimbursed victims of the hacked DAO contract and caused Ethereum and Ethereum Classic to split into two competing systems.
Block #2,463,000 Tangerine Whistle—A hard fork to change the gas calculation for certain I/O-heavy operations and to clear the accumulated state from a denial-of-service (DoS) attack that exploited the low gas cost of those operations.
Block #2,675,000 Spurious Dragon—A hard fork to address more DoS attack vectors, and another state clearing. Also, a replay attack protection mechanism.
Block #4,370,000 Metropolis Byzantium—Metropolis is the third stage of Ethereum. Launched in October 2017, Byzantium is the first part of Metropolis, adding low-level functionalities and adjusting the block reward and difficulty.
Block #7,280,000 Constantinople / St. Petersburg—Constantinople was planned to be the second part of Metropolis with similar improvements. A few hours before its activation, a critical bug was discovered. The hard fork was therefore postponed and renamed St. Petersburg.
Block #9,069,000 Istanbul—An additional hard fork with the same approach, and naming convention, as for the prior two.
Block #9,200,000 Muir Glacier—A hard fork whose sole purpose was to adjust the difficulty again due to the exponential increase introduced by Ice Age.
Block #12,244,000 Berlin—A hard fork implemented on April 15, 2021, refining gas costs for certain EVM (Ethereum Virtual Machine) operations, adding support for multiple transaction types, and changing the transaction refund mechanism.
Block #12,965,000 London—A significant upgrade launched on August 5, 2021, introducing EIP-1559, which changed the transaction fee mechanism to include a per-block base fee that is burned and a flexible, optional 'tip' paid to miners. This update aimed to reduce ETH inflation by burning part of the transaction fees.
Block #13,773,000 Arrow Glacier—A hard fork that took place on December 9, 2021. Its purpose was solely to delay the difficulty bomb, which is a mechanism intended to make mining more difficult and encourage the network's transition to Proof of Stake.
Block #15,050,000 Gray Glacier—Activated on June 30, 2022, this upgrade once again delayed the Ethereum difficulty bomb, pushing it further to a future date to ensure a smoother transition to Proof of Stake.
Block #15,537,393 The Merge—A highly anticipated upgrade completed on September 15, 2022, marking the official transition of Ethereum from Proof of Work (PoW) to Proof of Stake (PoS). This event significantly reduced the energy consumption of network operations by approximately 99.95% and switched the network's consensus mechanism from mining to staking.
Block #17,034,870 Shanghai / Capella (Shapella)—Activated on April 12, 2023, this upgrade included further enhancements and enabled withdrawals of staked ETH, which had previously been locked during the staking process.
Block #19,426,589 Cancun-Deneb (Dencun)—Activated on March 13, 2024, this upgrade includes Proto-Danksharding (EIP-4844), introducing temporary data blobs for cheaper layer 2 (L2) rollup storage.
A General-Purpose Blockchain
The original blockchain, namely Bitcoin’s blockchain, tracks the state of units of bitcoin and their ownership. You can think of Bitcoin as a distributed consensus state machine, where transactions cause a global state transition, altering the ownership of coins. The state transitions are constrained by the rules of consensus, allowing all participants to (eventually) converge on a common (consensus) state of the system, after several blocks are mined.
Ethereum is also a distributed state machine. But instead of tracking only the state of currency ownership, Ethereum tracks the state transitions of a general-purpose data store, i.e., a store that can hold any data expressible as a key–value tuple. A key–value data store holds arbitrary values, each referenced by some key; for example, the value "Mastering Ethereum" referenced by the key "Book Title". In some ways, this serves the same purpose as the data storage model of Random Access Memory (RAM) used by most general-purpose computers. Ethereum has memory that stores both code and data, and it uses the Ethereum blockchain to track how this memory changes over time. Like a general-purpose stored-program computer, Ethereum can load code into its state machine and run that code, storing the resulting state changes in its blockchain. Two of the critical differences from most general-purpose computers are that Ethereum state changes are governed by the rules of consensus and the state is distributed globally. Ethereum answers the question: "What if we could track any arbitrary state and program the state machine to create a world-wide computer operating under consensus?"
Turing Completeness
The term refers to English mathematician Alan Turing, who is considered the father of computer science. In 1936 he created a mathematical model of a computer consisting of a state machine that manipulates symbols by reading and writing them on sequential memory (resembling an infinite-length paper tape). With this construct, Turing went on to provide a mathematical foundation to answer (in the negative) questions about universal computability, meaning whether all problems are solvable. He proved that there are classes of problems that are uncomputable. Specifically, he proved that the halting problem (whether it is possible, given an arbitrary program and its input, to determine whether the program will eventually stop running) is not solvable.
Alan Turing further defined a system to be Turing complete if it can be used to simulate any Turing machine. Such a system is called a Universal Turing machine (UTM).
Ethereum’s ability to execute a stored program, in a state machine called the Ethereum Virtual Machine, while reading and writing data to memory makes it a Turing-complete system and therefore a UTM. Ethereum can compute any algorithm that can be computed by any Turing machine, given the limitations of finite memory.
Ethereum’s groundbreaking innovation is to combine the general-purpose computing architecture of a stored-program computer with a decentralized blockchain, thereby creating a distributed single-state (singleton) world computer. Ethereum programs run "everywhere," yet produce a common state that is secured by the rules of consensus.
Hearing that Ethereum is Turing complete, you might arrive at the conclusion that this is a feature that is somehow lacking in a system that is Turing incomplete. Rather, it is the opposite. Turing completeness is very easy to achieve; in fact, the simplest Turing-complete state machine known has 4 states and uses 6 symbols, with a state definition that is only 22 instructions long. Indeed, sometimes systems are found to be "accidentally Turing complete."
However, Turing completeness is very dangerous, particularly in open access systems like public blockchains, because of the halting problem we touched on earlier. For example, modern printers are Turing complete and can be given files to print that send them into a frozen state. The fact that Ethereum is Turing complete means that any program of any complexity can be computed by Ethereum. But that flexibility brings some thorny security and resource management problems. An unresponsive printer can be turned off and turned back on again. That is not possible with a public blockchain.
Turing proved that you cannot predict whether a program will terminate by simulating it on a computer. In simple terms, we cannot predict the path of a program without running it. Turing-complete systems can run in "infinite loops," a term used (in oversimplification) to describe a program that does not terminate. It is trivial to create a program that runs a loop that never ends. But unintended never-ending loops can arise without warning, due to complex interactions between the starting conditions and the code. In Ethereum, this poses a challenge: every participating node (client) must validate every transaction, running any smart contracts it calls. But as Turing proved, Ethereum can’t predict if a smart contract will terminate, or how long it will run, without actually running it (possibly running forever). Whether by accident or on purpose, a smart contract can be created such that it runs forever when a node attempts to validate it. This is effectively a DoS attack. And of course, between a program that takes a millisecond to validate and one that runs forever are an infinite range of nasty, resource-hogging, memory-bloating, CPU-overheating programs that simply waste resources. In a world computer, a program that abuses resources gets to abuse the world’s resources. How does Ethereum constrain the resources used by a smart contract if it cannot predict resource use in advance?
To answer this challenge, Ethereum introduces a metering mechanism called gas. As the EVM executes a smart contract, it carefully accounts for every instruction (computation, data access, etc.). Each instruction has a predetermined cost in units of gas. When a transaction triggers the execution of a smart contract, it must include an amount of gas that sets the upper limit of what can be consumed running the smart contract. The EVM will terminate execution if the amount of gas consumed by computation exceeds the gas available in the transaction. Gas is the mechanism Ethereum uses to allow Turing-complete computation while limiting the resources that any program can consume.
The next question is, 'how does one get gas to pay for computation on the Ethereum world computer?' You won’t find gas on any exchanges. It can only be purchased as part of a transaction, and can only be bought with ether. Ether needs to be sent along with a transaction and it needs to be explicitly earmarked for the purchase of gas, along with an acceptable gas price. Just like at the pump, the price of gas is not fixed. Gas is purchased for the transaction, the computation is executed, and any unused gas is refunded back to the sender of the transaction.
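To make the metering idea concrete, here is a small, purely illustrative Python sketch (not the real EVM; the instruction costs below are invented for the example) showing how the gas limit caps execution, how running out of gas halts a transaction, and how unused gas is refunded:

TOY_COSTS = {"ADD": 3, "MUL": 5, "SLOAD": 800, "SSTORE": 20000}   # invented costs, for illustration only

def execute(program, gas_limit, gas_price_wei):
    gas_used = 0
    for op in program:
        cost = TOY_COSTS[op]
        if gas_used + cost > gas_limit:
            # Out of gas: execution halts and state changes revert,
            # but the sender still pays for the full gas limit.
            return {"status": "out of gas", "fee_wei": gas_limit * gas_price_wei}
        gas_used += cost
    # Success: the sender pays only for the gas consumed; the rest is refunded.
    return {"status": "success", "fee_wei": gas_used * gas_price_wei}

print(execute(["ADD", "MUL", "SSTORE"], gas_limit=30_000, gas_price_wei=20_000_000_000))
print(execute(["SSTORE", "SSTORE"], gas_limit=30_000, gas_price_wei=20_000_000_000))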
Development Culture
In Bitcoin, development is guided by conservative principles: all changes are carefully studied to ensure that none of the existing systems are disrupted. For the most part, changes are only implemented if they are backward compatible. Existing clients are allowed to opt-in, but will continue to operate if they decide not to upgrade.
In Ethereum, by comparison, the community’s development culture is focused on the future rather than the past. The (not entirely serious) mantra is "move fast and break things." If a change is needed, it is implemented, even if that means invalidating prior assumptions, breaking compatibility, or forcing clients to update. Ethereum’s development culture is characterized by rapid innovation, rapid evolution, and a willingness to deploy forward-looking improvements, even if this is at the expense of some backward compatibility.
What this means to you as a developer is that you must remain flexible and be prepared to rebuild your infrastructure as some of the underlying assumptions change. One of the big challenges facing developers in Ethereum is the inherent contradiction between deploying code to an immutable system and a development platform that is still evolving. You can’t simply "upgrade" your smart contracts. You must be prepared to deploy new ones, migrate users, apps, and funds, and start over.
Ironically, this also means that the goal of building systems with more autonomy and less centralized control is still not fully realized. Autonomy and decentralization require a bit more stability in the platform than you’re likely to get in Ethereum in the next few years. In order to "evolve" the platform, you have to be ready to scrap and restart your smart contracts, which means you have to retain a certain degree of control over them.
But, on the positive side, Ethereum is moving forward very fast. There’s little opportunity for "bike-shedding," an expression that means holding up development by arguing over minor details such as how to build the bicycle shed at the back of a nuclear power station. If you start bike-shedding, you might suddenly discover that while you were distracted the rest of the development team changed the plan and ditched bicycles in favor of autonomous hovercraft.
Hash and Address
Asymmetric cryptography, hashing, and addresses were fully described in the Bitcoin article. For Ethereum, almost all of that stays the same. The differences between the two are described below.
Ethereum’s Cryptographic Hash Function: Keccak-256
Ethereum uses the Keccak-256 cryptographic hash function in many places. Keccak-256 was designed as a candidate for the SHA-3 Cryptographic Hash Function Competition held in 2007 by the National Institute of Standards and Technology. Keccak was the winning algorithm, which became standardized as Federal Information Processing Standard (FIPS) 202 in 2015.
However, during the period when Ethereum was developed, the NIST standardization was not yet finalized. NIST adjusted some of the parameters of Keccak after the completion of the standards process, allegedly to improve its efficiency. This was occurring at the same time as heroic whistleblower Edward Snowden revealed documents that imply that NIST may have been improperly influenced by the National Security Agency to intentionally weaken the Dual_EC_DRBG random-number generator standard, effectively placing a backdoor in the standard random number generator. The result of this controversy was a backlash against the proposed changes and a significant delay in the standardization of SHA-3. At the time, the Ethereum Foundation decided to implement the original Keccak algorithm, as proposed by its inventors, rather than the SHA-3 standard as modified by NIST.
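You can see that the original Keccak-256 and the final NIST SHA3-256 really do differ. A quick Python check, assuming the pycryptodome package is installed for Keccak (hashlib already provides the NIST SHA3-256):

import hashlib
from Crypto.Hash import keccak   # pycryptodome

print(keccak.new(digest_bits=256, data=b"").hexdigest())
# c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470   (Keccak-256, used by Ethereum)
print(hashlib.sha3_256(b"").hexdigest())
# a7ffc6f8bf1ed76651c14756a061d62e28dba4c2fde56ca87d3b2b4a0b7d6af3   (NIST SHA3-256)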
Ethereum Addresses
Ethereum addresses are unique identifiers that are derived from public keys or contracts using the Keccak-256 one-way hash function.
keccak256(<public key>) = 2a5bc342ed616b5ba5732269001d3f1ef827552ae1114027bd3ecf1f086ba0f9
Then we keep only the last 20 bytes (least significant bytes), which is our Ethereum address. Most often you will see Ethereum addresses with the prefix 0x that indicates they are hexadecimal-encoded.
0x001d3f1ef827552ae1114027bd3ecf1f086ba0f9
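As a minimal sketch of that derivation (again assuming pycryptodome for Keccak-256):

from Crypto.Hash import keccak   # pycryptodome

def eth_address(public_key: bytes) -> str:
    # public_key is the 64-byte concatenation of the x and y coordinates
    # (without the 0x04 prefix). Hash it and keep the last 20 bytes.
    digest = keccak.new(digest_bits=256, data=public_key).digest()
    return "0x" + digest[-20:].hex()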
Unlike Bitcoin addresses, which are encoded in the user interface of all clients to include a built-in checksum to protect against mistyped addresses, Ethereum addresses are presented as raw hexadecimal without any checksum.
The rationale behind that decision was that Ethereum addresses would eventually be hidden behind abstractions (such as name services) at higher layers of the system and that checksums should be added at higher layers if necessary.
In reality, these higher layers were developed too slowly and this design choice led to a number of problems in the early days of the ecosystem, including the loss of funds due to mistyped addresses and input validation errors. Furthermore, because Ethereum name services were developed slower than initially expected, alternative encodings were adopted very slowly by wallet developers.
Hex Encoding with Checksum in Capitalization (EIP-55)
Due to the slow deployment of ICAP (an earlier, IBAN-like Ethereum address encoding) and name services, a standard was proposed in Ethereum Improvement Proposal 55 (EIP-55). EIP-55 offers a backward-compatible checksum for Ethereum addresses by modifying the capitalization of the hexadecimal address. The idea is that Ethereum addresses are case-insensitive and all wallets are supposed to accept Ethereum addresses expressed in capital or lowercase characters, without any difference in interpretation.
By modifying the capitalization of the alphabetic characters in the address, we can convey a checksum that can be used to protect the integrity of the address against typing or reading mistakes. Wallets that do not support EIP-55 checksums simply ignore the fact that the address contains mixed capitalization, but those that do support it can validate it and detect errors with a 99.986% accuracy.
EIP-55 is quite simple to implement. We take the Keccak-256 hash of the lowercase hexadecimal address. This hash acts as a digital fingerprint of the address, giving us a convenient checksum. Any small change in the input (the address) should cause a big change in the resulting hash (the checksum), allowing us to detect errors effectively. The hash of our address is then encoded in the capitalization of the address itself. Let’s break it down, step by step:
1. Hash the lowercase address, without the 0x prefix:
keccak256("001d3f1ef827552ae1114027bd3ecf1f086ba0f9")=23a69c1653e4ebbb619b0b2cb8a9bad49892a8b9695d9a19d8f673ca991deae1
2. Capitalize each alphabetic address character if the corresponding hex digit of the hash is greater than or equal to 0x8. This is easier to show if we line up the address and the hash:
Address: 001d3f1ef827552ae1114027bd3ecf1f086ba0f9
Hash: 23a69c1653e4ebbb619b0b2cb8a9bad49892a8b9...
Our address contains an alphabetic character d in the fourth position. The fourth character of the hash is 6, which is less than 8. So, we leave the d lowercase. The next alphabetic character in our address is f, in the sixth position. The sixth character of the hexadecimal hash is c, which is greater than 8. Therefore, we capitalize the F in the address, and so on. As you can see, we only use the first 20 bytes (40 hex characters) of the hash as a checksum, since we only have 20 bytes in the address to capitalize appropriately.
Check the resulting mixed-capitals address yourself and see if you can tell which characters were capitalized and which characters they correspond to in the address hash:
Address: 001d3F1ef827552Ae1114027BD3ECF1f086bA0F9
Hash: 23a69c1653e4ebbb619b0b2cb8a9bad49892a8b9...
Now, let’s look at how EIP-55 addresses will help us find an error. Let’s assume we have printed out an Ethereum address, which is EIP-55 encoded:
0x001d3F1ef827552Ae1114027BD3ECF1f086bA0F9
Now let’s make a basic mistake in reading that address. The character before the last one is a capital F. For this example let’s assume we misread that as a capital E, and we type the following (incorrect) address into our wallet:
0x001d3F1ef827552Ae1114027BD3ECF1f086bA0E9
Fortunately, our wallet is EIP-55 compliant! It notices the mixed capitalization and attempts to validate the address. It converts it to lowercase, and calculates the checksum hash:
keccak256("001d3f1ef827552ae1114027bd3ecf1f086ba0e9") = 5429b5d9460122fb4b11af9cb88b7bb76d8928862e0a57d46dd18dd8e08a6927
As you can see, even though the address has only changed by one character (in fact, only one bit, as e and f are one bit apart), the hash of the address has changed radically. That’s the property of hash functions that makes them so useful for checksums!
Now, let’s line up the two and check the capitalization:
001d3F1ef827552Ae1114027BD3ECF1f086bA0E9
5429b5d9460122fb4b11af9cb88b7bb76d892886...
It’s all wrong! Several of the alphabetic characters are incorrectly capitalized. Remember that the capitalization is the encoding of the correct checksum.
The capitalization of the address we input doesn’t match the checksum just calculated, meaning something has changed in the address, and an error has been introduced.
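The whole procedure fits in a few lines. Here is a sketch of EIP-55 encoding and validation (assuming pycryptodome for Keccak-256); the expected outputs, matching the examples above, are shown as comments:

from Crypto.Hash import keccak   # pycryptodome

def to_checksum_address(address: str) -> str:
    addr = address.lower().replace("0x", "")
    digest = keccak.new(digest_bits=256, data=addr.encode("ascii")).hexdigest()
    result = ""
    for ch, nibble in zip(addr, digest):
        # Capitalize a letter if the matching hash nibble is 8 or higher.
        result += ch.upper() if ch in "abcdef" and int(nibble, 16) >= 8 else ch
    return "0x" + result

def has_valid_checksum(address: str) -> bool:
    return address == to_checksum_address(address)

print(to_checksum_address("0x001d3f1ef827552ae1114027bd3ecf1f086ba0f9"))
# 0x001d3F1ef827552Ae1114027BD3ECF1f086bA0F9
print(has_valid_checksum("0x001d3F1ef827552Ae1114027BD3ECF1f086bA0E9"))
# False -- the mistyped address from the example above is rejected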
Transactions
Let’s take a look at the basic structure of a transaction, as it is serialized and transmitted on the Ethereum network. Each client and application that receives a serialized transaction will store it in memory using its own internal data structure, perhaps embellished with metadata that doesn’t exist in the network-serialized transaction itself. The network serialization is the only standard form of a transaction.
A transaction is a serialized binary message that contains the following data:
- Nonce: A sequence number, issued by the originating EOA, used to prevent message replay.
- Gas price: The amount of ether (in wei) that the originator is willing to pay for each unit of gas.
- Gas limit: The maximum amount of gas the originator is willing to buy for this transaction.
- Recipient: The destination Ethereum address.
- Value: The amount of ether (in wei) to send to the destination.
- Data: The variable-length binary data payload.
- v, r, s: The three components of an ECDSA digital signature of the originating EOA (v also serves as a special prefix).
The transaction message’s structure is serialized using the Recursive Length Prefix (RLP) encoding scheme, which was created specifically for simple, byte-perfect data serialization in Ethereum. All numbers in Ethereum are encoded as big-endian integers, of lengths that are multiples of 8 bits.
Note that the field labels (to, gas limit, etc.) are shown here for clarity, but are not part of the transaction serialized data, which contains the field values RLP-encoded. In general, RLP does not contain any field delimiters or labels. RLP’s length prefix is used to identify the length of each field. Anything beyond the defined length belongs to the next field in the structure.
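To make the serialized layout concrete, here is a rough sketch that RLP-encodes made-up field values for a legacy transaction, assuming the rlp Python package is installed. Only the ordered values are encoded; there are no labels or delimiters, and the signature values here are placeholders for illustration:

import rlp   # pip install rlp

def be(n: int) -> bytes:
    # Minimal big-endian representation (zero encodes as empty bytes), as RLP expects.
    return n.to_bytes((n.bit_length() + 7) // 8, "big")

nonce = 0
gas_price = 20_000_000_000                  # wei per unit of gas
gas_limit = 21_000
to = bytes.fromhex("001d3f1ef827552ae1114027bd3ecf1f086ba0f9")
value = 10**18                              # 1 ether, in wei
data = b""
v, r, s = 27, 1, 1                          # placeholder signature values

raw_tx = rlp.encode([be(nonce), be(gas_price), be(gas_limit), to, be(value), data, be(v), be(r), be(s)])
print(raw_tx.hex())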
While this is the actual transaction structure transmitted, most internal representations and user interface visualizations embellish this with additional information, derived from the transaction or from the blockchain.
For example, you may notice there is no "from" field identifying the address of the originating EOA. That is because the EOA’s public key can be derived from the v,r,s components of the ECDSA signature. The address can, in turn, be derived from the public key. When you see a transaction showing a "from" field, that was added by the software used to visualize the transaction. Other metadata frequently added to the transaction by client software includes the block number (once it is mined and included in the blockchain) and a transaction ID (calculated hash).
The Transaction Nonce
Strictly speaking, the nonce is an attribute of the originating address; that is, it only has meaning in the context of the sending address. However, the nonce is not something the sender stores or tracks as an explicit piece of wallet data. Instead, wallets and nodes determine the next nonce dynamically, by counting the number of confirmed transactions that have originated from an address.
There are two scenarios where the existence of a transaction-counting nonce is important: the usability feature of transactions being included in the order of creation, and the vital feature of transaction duplication protection. Let’s look at an example scenario for each of these:
Imagine you wish to make two transactions. You have an important payment to make of 6 ether, and also another payment of 8 ether. You sign and broadcast the 6-ether transaction first, because it is the more important one, and then you sign and broadcast the second, 8-ether transaction. Sadly, you have overlooked the fact that your account contains only 10 ether, so the network can’t accept both transactions: one of them will fail. Because you sent the more important 6-ether one first, you understandably expect that one to go through and the 8-ether one to be rejected. However, in a decentralized system like Ethereum, nodes may receive the transactions in either order; there is no guarantee that a particular node will have one transaction propagated to it before the other. As such, it will almost certainly be the case that some nodes receive the 6-ether transaction first and others receive the 8-ether transaction first. Without the nonce, it would be random as to which one gets accepted and which rejected. However, with the nonce included, the first transaction you sent will have a nonce of, let’s say, 3, while the 8-ether transaction has the next nonce value (i.e., 4). So, that transaction will be ignored until the transactions with nonces from 0 to 3 have been processed, even if it is received first. Phew!
Now imagine you have an account with 100 ether. Fantastic! You find someone online who will accept payment in ether for a mcguffin-widget that you really want to buy. You send them 2 ether and they send you the mcguffin-widget. Lovely. To make that 2-ether payment, you signed a transaction sending 2 ether from your account to their account, and then broadcast it to the Ethereum network to be verified and included on the blockchain. Now, without a nonce value in the transaction, a second transaction sending 2 ether to the same address a second time will look exactly the same as the first transaction. This means that anyone who sees your transaction on the Ethereum network (which means everyone, including the recipient or your enemies) can "replay" the transaction again and again and again until all your ether is gone simply by copying and pasting your original transaction and resending it to the network. However, with the nonce value included in the transaction data, every single transaction is unique, even when sending the same amount of ether to the same recipient address multiple times. Thus, by having the incrementing nonce as part of the transaction, it is simply not possible for anyone to "duplicate" a payment you have made.
In summary, it is important to note that the use of the nonce is actually vital for an account-based protocol, in contrast to the “Unspent Transaction Output” (UTXO) mechanism of the Bitcoin protocol.
Gaps in Nonces, Duplicate Nonces, and Confirmation
It is important to keep track of nonces if you are creating transactions programmatically, especially if you are doing so from multiple independent processes simultaneously.
The Ethereum network processes transactions sequentially, based on the nonce. That means that if you transmit a transaction with nonce 0 and then transmit a transaction with nonce 2, the second transaction will not be included in any blocks. It will be stored in the mempool, while the Ethereum network waits for the missing nonce to appear. All nodes will assume that the missing nonce has simply been delayed and that the transaction with nonce 2 was received out of sequence.
If you then transmit a transaction with the missing nonce 1, both transactions (nonces 1 and 2) will be processed and included (if valid, of course). Once you fill the gap, the network can mine the out-of-sequence transaction that it held in the mempool.
What this means is that if you create several transactions in sequence and one of them does not get officially included in any blocks, all the subsequent transactions will be "stuck," waiting for the missing nonce. A transaction can create an inadvertent "gap" in the nonce sequence because it is invalid or has insufficient gas. To get things moving again, you have to transmit a valid transaction with the missing nonce. You should be equally mindful that once a transaction with the "missing" nonce is validated by the network, all the broadcast transactions with subsequent nonces will incrementally become valid; it is not possible to "recall" a transaction!
If, on the other hand, you accidentally duplicate a nonce, for example by transmitting two transactions with the same nonce but different recipients or values, then one of them will be confirmed and one will be rejected. Which one is confirmed will be determined by the sequence in which they arrive at the first validating node that receives them—i.e., it will be fairly random.
As you can see, keeping track of nonces is necessary, and if your application doesn’t manage that process correctly you will run into problems. Unfortunately, things get even more difficult if you are trying to do this concurrently.
Concurrency, Transaction Origination, and Nonces
In simple terms, concurrency is when you have simultaneous computation by multiple independent systems. These can be in the same program (e.g., multithreading), on the same CPU (e.g., multiprocessing), or on different computers (i.e., distributed systems). Ethereum, by definition, is a system that allows concurrency of operations (nodes, clients, DApps) but enforces a singleton state through consensus.
Now, imagine that you have multiple independent wallet applications that are generating transactions from the same address or addresses. One example of such a situation would be an exchange processing withdrawals from the exchange’s hot wallet (a wallet whose keys are stored online, in contrast to a cold wallet where the keys are never online). Ideally, you’d want to have more than one computer processing withdrawals, so that it doesn’t become a bottleneck or single point of failure. However, this quickly becomes problematic, as having more than one computer producing withdrawals will result in some thorny concurrency problems, not least of which is the selection of nonces. How do multiple computers generating, signing, and broadcasting transactions from the same hot wallet account coordinate?
You could use a single computer to assign nonces, on a first-come first-served basis, to computers signing transactions. However, this computer is now a single point of failure. Worse, if several nonces are assigned and one of them never gets used (because of a failure in the computer processing the transaction with that nonce), all subsequent transactions get stuck.
Another approach would be to generate the transactions, but not assign a nonce to them (and therefore leave them unsigned—remember that the nonce is an integral part of the transaction data and therefore needs to be included in the digital signature that authenticates the transaction). You could then queue them to a single node that signs them and also keeps track of nonces. Again, though, this would be a choke point in the process: the signing and tracking of nonces is the part of your operation that is likely to become congested under load, whereas the generation of the unsigned transaction is the part you don’t really need to parallelize. You would have some concurrency, but it would be lacking in a critical part of the process.
In the end, these concurrency problems, on top of the difficulty of tracking account balances and transaction confirmations in independent processes, force most implementations toward avoiding concurrency and creating bottlenecks such as a single process handling all withdrawal transactions in an exchange, or setting up multiple hot wallets that can work completely independently for withdrawals and only need to be intermittently rebalanced.
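As an illustration of the first approach (and its fragility), here is a minimal Python sketch of a lock-guarded nonce allocator for a single hot-wallet address; the starting count would come from a node, e.g. via the eth_getTransactionCount JSON-RPC call:

import threading

class NonceAllocator:
    def __init__(self, confirmed_count: int):
        self._next = confirmed_count       # next unused nonce for this address
        self._lock = threading.Lock()

    def reserve(self) -> int:
        with self._lock:                   # serialize concurrent signers
            nonce = self._next
            self._next += 1
            return nonce

allocator = NonceAllocator(confirmed_count=42)
print(allocator.reserve())   # 42
print(allocator.reserve())   # 43
# If a reserved nonce is never broadcast (e.g. the signing worker crashes),
# every later nonce stays stuck in the mempool until the gap is filled.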
Transaction Gas
Gas was covered earlier, in the discussion of Ethereum’s metering mechanism; the gas price and gas limit fields of a transaction set how much the originator pays per unit of gas and the maximum amount of gas the transaction may consume.
Transaction Recipient
The recipient of a transaction is specified in the to field. This contains a 20-byte Ethereum address. The address can be an EOA or a contract address.
Ethereum does no further validation of this field. Any 20-byte value is considered valid. If the 20-byte value corresponds to an address without a corresponding private key, or without a corresponding contract, the transaction is still valid. Ethereum has no way of knowing whether an address was correctly derived from a public key (and therefore from a private key) in existence.
Sending a transaction to the wrong address will probably burn the ether sent, rendering it forever inaccessible (unspendable), since most addresses do not have a known private key and therefore no signature can be generated to spend it. It is assumed that validation of the address happens at the user interface level (see [EIP55]). In fact, there are a number of valid reasons for burning ether—for example, as a disincentive to cheating in payment channels and other smart contracts—and since the amount of ether is finite, burning ether effectively distributes the value burned to all ether holders.
Transaction Value and Data
The main "payload" of a transaction is contained in two fields: value and data. Transactions can have both value and data, only value, only data, or neither value nor data. All four combinations are valid.
A transaction with only value is a payment. A transaction with only data is an invocation. A transaction with both value and data is both a payment and an invocation. A transaction with neither value nor data—well that’s probably just a waste of gas! But it is still possible.
When you construct an Ethereum transaction that contains a value, it is the equivalent of a payment. Such transactions behave differently depending on whether the destination address is a contract or not.
For EOA addresses, or rather for any address that isn’t flagged as a contract on the blockchain, Ethereum will record a state change, adding the value you sent to the balance of the address. If the address has not been seen before, it will be added to the client’s internal representation of the state and its balance initialized to the value of your payment.
If the destination address (to) is a contract, then the EVM will execute the contract and attempt to call the function named in the data payload of your transaction. If there is no data in your transaction, the EVM will call the contract’s fallback function and, if that function is payable, will execute it to determine what to do next. If the payable fallback function contains no code, the effect of the transaction is simply to increase the balance of the contract, exactly like a payment to a wallet. If there is no fallback function, or the fallback function is not payable, the transaction will be reverted.
A contract can reject incoming payments by throwing an exception immediately when a function is called, or as determined by conditions coded in a function. If the function terminates successfully (without an exception), then the contract’s state is updated to reflect an increase in the contract’s ether balance.
When your transaction contains data, it is most likely addressed to a contract address. That doesn’t mean you cannot send a data payload to an EOA—that is completely valid in the Ethereum protocol. However, in that case, the interpretation of the data is up to the wallet you use to access the EOA. It is ignored by the Ethereum protocol. Most wallets also ignore any data received in a transaction to an EOA they control. In the future, it is possible that standards may emerge that allow wallets to interpret data the way contracts do, thereby allowing transactions to invoke functions running inside user wallets. The critical difference is that any interpretation of the data payload by an EOA is not subject to Ethereum’s consensus rules, unlike a contract execution.
For now, let’s assume your transaction is delivering data to a contract address. In that case, the data will be interpreted by the EVM as a contract invocation. Most contracts use this data more specifically as a function invocation, calling the named function and passing any encoded arguments to the function.
The data payload sent to an ABI-compatible contract (which you can assume all contracts are) is a hex-serialized encoding of:
A function selector: The first 4 bytes of the Keccak-256 hash of the function’s prototype. This allows the contract to unambiguously identify which function you wish to invoke.
The function arguments: The function’s arguments, encoded according to the rules for the various elementary types defined in the ABI specification.
For example, a function for withdrawals:
function withdraw(uint withdraw_amount) public {
The prototype of a function is defined as the string containing the name of the function, followed by the data types of each of its arguments, enclosed in parentheses and separated by commas. The function name here is withdraw and it takes a single argument that is a uint (which is an alias for uint256), so the prototype of withdraw would be:
withdraw(uint256)
Let’s calculate the Keccak-256 hash of this string:
> keccak256("withdraw(uint256)")
'0x2e1a7d4d13322e7b96f9a57413e1525c250fb7a9021cf91d1540d5b69f16a49f'
The first 4 bytes of the hash are 0x2e1a7d4d. That’s our "function selector" value, which will tell the contract which function we want to call.
Next, let’s calculate a value to pass as the argument withdraw_amount. We want to withdraw 0.01 ether. Let’s encode that to a hex-serialized big-endian unsigned 256-bit integer, denominated in wei:
> withdraw_amount = toWei(0.01, "ether");
'10000000000000000'
> withdraw_amount_hex = toHex(withdraw_amount);
'0x2386f26fc10000'
Now, we add the function selector to the amount (padded to 32 bytes):
2e1a7d4d000000000000000000000000000000000000000000000000002386f26fc10000
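Here is a sketch that reproduces the same calldata in Python (assuming pycryptodome for Keccak-256):

from Crypto.Hash import keccak   # pycryptodome

def selector(prototype: str) -> str:
    # First 4 bytes (8 hex characters) of the Keccak-256 hash of the prototype.
    return keccak.new(digest_bits=256, data=prototype.encode("ascii")).hexdigest()[:8]

withdraw_amount_wei = 10**16     # 0.01 ether, expressed in wei
calldata = selector("withdraw(uint256)") + format(withdraw_amount_wei, "064x")
print(calldata)
# 2e1a7d4d000000000000000000000000000000000000000000000000002386f26fc10000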
Special Transaction: Contract Creation
One special case that we should mention is a transaction that creates a new contract on the blockchain, deploying it for future use. Contract creation transactions are sent to a special destination address called the zero address; the to field in a contract creation transaction contains the address 0x0. This address represents neither an EOA (there is no corresponding private–public key pair) nor a contract. It can never spend ether or initiate a transaction. It is only used as a destination, with the special meaning "create this contract."
While the zero address is intended only for contract creation, it sometimes receives payments from various addresses. There are two explanations for this: either it is by accident, resulting in the loss of ether, or it is an intentional ether burn (deliberately destroying ether by sending it to an address from which it can never be spent).
A contract creation transaction need only contain a data payload that contains the compiled bytecode which will create the contract. The only effect of this transaction is to create the contract. You can include an ether amount in the value field if you want to set the new contract up with a starting balance, but that is entirely optional. If you send a value (ether) to the contract creation address without a data payload (no contract), then the effect is the same as sending to a burn address—there is no contract to credit, so the ether is lost.
Transaction Signing
Digital signatures were described in detail in the Bitcoin article. Here is what differs in Ethereum.
To produce a valid transaction, the originator must digitally sign the message, using the Elliptic Curve Digital Signature Algorithm. When we say "sign the transaction" we actually mean "sign the Keccak-256 hash of the RLP-serialized transaction data." The signature is applied to the hash of the transaction data, not the transaction itself.
To sign a transaction in Ethereum, the originator must:
- 1. Create a transaction data structure, containing nine fields: nonce, gasPrice, gasLimit, to, value, data, chainID, 0, 0.
- 2. Produce an RLP-encoded serialized message of the transaction data structure.
- 3. Compute the Keccak-256 hash of this serialized message.
- 4. Compute the ECDSA signature, signing the hash with the originating EOA’s private key.
- 5. Append the ECDSA signature’s computed v, r, and s values to the transaction.
The special signature variable v indicates two things: the chain ID and the recovery identifier to help the ECDSArecover function check the signature. The recovery identifier is used to indicate the parity of the y component of the public key.
The EIP-155 "Simple Replay Attack Protection" standard specifies a replay-attack-protected transaction encoding, which includes a chain identifier inside the transaction data, prior to signing. This ensures that transactions created for one blockchain (e.g., the Ethereum main network) are invalid on another blockchain (e.g., Ethereum Classic or the test network). Therefore, transactions broadcast on one network cannot be replayed on another, hence the name of the standard.
EIP-155 adds three fields to the main six fields of the transaction data structure, namely the chain identifier, 0, and 0. These three fields are added to the transaction data before it is encoded and hashed. They therefore change the transaction’s hash, to which the signature is later applied. By including the chain identifier in the data being signed, the transaction signature prevents any changes, as the signature is invalidated if the chain identifier is modified. Therefore, EIP-155 makes it impossible for a transaction to be replayed on another chain, because the signature’s validity depends on the chain identifier.
The resulting transaction structure is RLP-encoded, hashed, and signed. The signature algorithm is modified slightly to encode the chain identifier in the v prefix too.
The transaction message doesn’t include a "from" field. That’s because the originator’s public key can be computed directly from the ECDSA signature. Once you have the public key, you can compute the address easily. The process of recovering the signer’s public key is called public key recovery.
Given the signature values r and s, we can compute two possible public keys.
To make things more efficient, the transaction signature includes a prefix value v, which tells us which of the two possible R values is the ephemeral public key. If v is even, then R is the correct value. If v is odd, then it is R'. That way, we need to calculate only one value for R and only one value for K. This is analogous to the 02 or 03 prefix used for compressed public keys, as described in the Bitcoin article.
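As a sketch of the whole signing and recovery flow, assuming the eth-account Python package is installed (the private key and field values below are throwaway examples):

from eth_account import Account   # pip install eth-account

private_key = "0x" + "11" * 32    # throwaway example key, never use for real funds
tx = {
    "nonce": 0,
    "gasPrice": 20_000_000_000,
    "gas": 21_000,
    "to": "0x001d3F1ef827552Ae1114027BD3ECF1f086bA0F9",
    "value": 10**18,
    "data": b"",
    "chainId": 1,                 # EIP-155 chain identifier
}

signed = Account.sign_transaction(tx, private_key)
print(signed.rawTransaction.hex())   # RLP-serialized, signed transaction (raw_transaction in newer eth-account releases)
# No "from" field anywhere: the sender is recovered from the v, r, s signature values.
print(Account.recover_transaction(signed.rawTransaction))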
Smart Contracts
There are two different types of accounts in Ethereum: externally owned accounts (EOAs) and contract accounts. EOAs are controlled by users, often via software such as a wallet application that is external to the Ethereum platform. In contrast, contract accounts are controlled by program code (also commonly referred to as “smart contracts”) that is executed by the Ethereum Virtual Machine. In short, EOAs are simple accounts without any associated code or data storage, whereas contract accounts have both associated code and data storage. EOAs are controlled by transactions created and cryptographically signed with a private key in the "real world" external to and independent of the protocol, whereas contract accounts do not have private keys and so "control themselves" in the predetermined way prescribed by their smart contract code. Both types of accounts are identified by an Ethereum address. In this chapter, we will discuss contract accounts and the program code that controls them.
The term smart contract has been used over the years to describe a wide variety of different things. In the 1990s, cryptographer Nick Szabo coined the term and defined it as “a set of promises, specified in digital form, including protocols within which the parties perform on these promises.” Since then, the concept of smart contracts has evolved, especially after the introduction of decentralized blockchain platforms with the invention of Bitcoin in 2009. In the context of Ethereum, the term is actually a bit of a misnomer, given that Ethereum smart contracts are neither smart nor legal contracts, but the term has stuck. In this article, we use the term “smart contracts” to refer to immutable computer programs that run deterministically in the context of an Ethereum Virtual Machine as part of the Ethereum network protocol—i.e., on the decentralized Ethereum world computer.
Life Cycle of a Smart Contract
Smart contracts are typically written in a high-level language, such as Solidity. But in order to run, they must be compiled to the low-level bytecode that runs in the EVM. Once compiled, they are deployed on the Ethereum platform using a special contract creation transaction, which is identified as such by being sent to the special contract creation address. Each contract is identified by an Ethereum address, which is derived from the contract creation transaction as a function of the originating account and nonce. The Ethereum address of a contract can be used in a transaction as the recipient, sending funds to the contract or calling one of the contract’s functions. Note that, unlike with EOAs, there are no keys associated with an account created for a new smart contract. As the contract creator, you don’t get any special privileges at the protocol level (although you can explicitly code them into the smart contract). You certainly don’t receive the private key for the contract account, which in fact does not exist—we can say that smart contract accounts own themselves.
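The derivation rule mentioned above (for contracts created by a transaction) is the Keccak-256 hash of the RLP encoding of the sender’s address and nonce, keeping the last 20 bytes. A sketch, assuming pycryptodome and the rlp package:

import rlp
from Crypto.Hash import keccak

def contract_address(sender_hex: str, nonce: int) -> str:
    sender = bytes.fromhex(sender_hex.replace("0x", ""))
    nonce_bytes = nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")   # minimal big-endian, empty for 0
    digest = keccak.new(digest_bits=256, data=rlp.encode([sender, nonce_bytes])).digest()
    return "0x" + digest[-20:].hex()

# The first contract created by this (example) account, at nonce 0:
print(contract_address("0x001d3f1ef827552ae1114027bd3ecf1f086ba0f9", 0))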
Importantly, contracts only run if they are called by a transaction. All smart contracts in Ethereum are executed, ultimately, because of a transaction initiated from an EOA. A contract can call another contract that can call another contract, and so on, but the first contract in such a chain of execution will always have been called by a transaction from an EOA. Contracts never run “on their own” or “in the background.” Contracts effectively lie dormant until a transaction triggers execution, either directly or indirectly as part of a chain of contract calls. It is also worth noting that smart contracts are not executed "in parallel" in any sense—the Ethereum world computer can be considered to be a single-threaded machine.
Transactions are atomic: they are either successfully terminated or reverted. A successful termination of a transaction means that if there were changes to be made with this transaction, they are recorded. Otherwise, if a transaction is reverted, all of its effects (changes in state) are “rolled back” as if the transaction never ran. A failed transaction is still recorded as having been attempted, and the ether spent on gas for the execution is deducted from the originating account, but it otherwise has no effects on contract or account state.
As mentioned previously, it is important to remember that a contract’s code cannot be changed. However, a contract can be “deleted,” removing the code and its internal state (storage) from its address, leaving a blank account. Any transactions sent to that account address after the contract has been deleted do not result in any code execution, because there is no longer any code there to execute. To delete a contract, you execute an EVM opcode called SELFDESTRUCT (previously called SUICIDE). That operation costs “negative gas,” a gas refund, thereby incentivizing the release of network client resources from the deletion of stored state. Deleting a contract in this way does not remove the transaction history (past) of the contract, since the blockchain itself is immutable. It is also important to note that the SELFDESTRUCT capability will only be available if the contract author programmed the smart contract to have that functionality. If the contract’s code does not have a SELFDESTRUCT opcode, or it is inaccessible, the smart contract cannot be deleted.
More on smart contracts will be in the sections below.
Tokens
The most obvious use of tokens is as digital private currencies. However, this is only one possible use. Tokens can be programmed to serve many different functions, often overlapping. For example, a token can simultaneously convey a voting right, an access right, and ownership of a resource.
Wikipedia says: "In economics, fungibility is the property of a good or a commodity whose individual units are essentially interchangeable."
Tokens are fungible when we can substitute any single unit of the token for another without any difference in its value or function.
Non-fungible tokens are tokens that each represent a unique tangible or intangible item and therefore are not interchangeable. For example, a token that represents ownership of a specific Van Gogh painting is not equivalent to another token that represents a Picasso, even though they might be part of the same "art ownership token" system. Similarly, a token representing a specific digital collectible such as a specific CryptoKitty is not interchangeable with any other CryptoKitty. Each non-fungible token is associated with a unique identifier, such as a serial number.
Using Tokens: Utility or Equity
Almost all projects in Ethereum today launch with some kind of token. But do all these projects really need tokens? Are there any disadvantages to using a token, or will we see the slogan "tokenize all the things" come to fruition? In principle, the use of tokens can be seen as the ultimate management or organization tool. In practice, the integration of blockchain platforms, including Ethereum, into the existing structures of society means that, so far, there are many limitations to their applicability.
Let’s start by clarifying the role of a token in a new project. The majority of projects are using tokens in one of two ways: either as "utility tokens" or as "equity tokens." Very often, those two roles are conflated.
Utility tokens are those where the use of the token is required to gain access to a service, application, or resource. Examples of utility tokens include tokens that represent resources such as shared storage, or access to services such as social media networks.
Equity tokens are those that represent shares in the control or ownership of something, such as a startup. Equity tokens can be as limited as nonvoting shares for distribution of dividends and profits, or as expansive as voting shares in a decentralized autonomous organization, where management of the platform is through some complex governance system based on votes by the token holders.
The Problem With Utility Tokens
Many startups face a difficult problem: tokens are a great fundraising mechanism, but offering securities (equity) to the public is a regulated activity in most jurisdictions. By disguising equity tokens as utility tokens, many startups hope to get around these regulatory (or, more likely, moral or logical) restrictions and raise money from a public offering while presenting it as a utility token sale. Whether these thinly disguised equity offerings will be able to skirt the regulators (or, more likely, the users) remains to be seen.
These tokens introduce significant risks and adoption barriers for startups. Perhaps in a distant future "tokenize all the things" will become reality, but at present the set of people who have an understanding of and desire to use a token is a subset of the already small cryptocurrency market.
For a startup, each innovation represents a risk and a market filter. Innovation is taking the road least traveled, walking away from the path of tradition. It is already a lonely walk. If a startup is trying to innovate in a new area of technology, such as storage sharing over P2P networks, that is a lonely enough path. Adding a utility token to that innovation and requiring users to adopt tokens in order to use the service compounds the risk and increases the barriers to adoption. It’s walking off the already lonely trail of P2P storage innovation and into the wilderness.
Think of each innovation as a filter. It limits adoption to the subset of the market that can become early adopters of this innovation. Adding a second filter compounds that effect, further limiting the addressable market. You are asking your early adopters to adopt not one but two completely new technologies: the novel application/platform/service you built, and the token economy.
For a startup, each innovation introduces risks that increase the chance of failure of the startup. If you take your already risky startup idea and add a utility token, you are adding all the risks of the underlying platform (Ethereum), broader economy (exchanges, liquidity), regulatory environment (equity/commodity regulators), and technology (smart contracts, token standards). That’s a lot of risk for a startup.
Advocates of "tokenize all the things" will likely counter that by adopting tokens they are also inheriting the market enthusiasm, early adopters, technology, innovation, and liquidity of the entire token economy. That is true too. The question is whether the benefits and enthusiasm outweigh the risks and uncertainties.
Nevertheless, some of the most innovative business ideas are indeed taking place in the crypto realm. If regulators are not quick enough to adopt laws and support new business models, entrepreneurs and associated talent will seek to operate in other jurisdictions that are more crypto-friendly. This is already happening.
Finally, "token" has the colloquial meaning as "something of insignificant value." The underlying reason for the insignificant value of most tokens is because they can only be used in a very narrow context: one bus company, one laundromat, one arcade, one hotel, or one company store. Limited liquidity, limited applicability, and high conversion costs reduce the value of tokens until they are only of "token" value. So when you add a utility token to your platform, but the token can only be used on your single platform with a small market, you are recreating the conditions that made physical tokens worthless. This may indeed be the correct way to incorporate tokenization into your project. However, if in order to use your platform a user has to convert something into your utility token, use it, and then convert the remainder back into something more generally useful, you’ve created a company scrip. The switching costs of a digital token are orders of magnitude lower than for a physical token without a market, but they are not zero. Utility tokens that work across an entire industry sector will be very interesting and probably quite valuable. But if you set up your startup to have to bootstrap an entire industry standard in order to succeed, you may have already failed.
Tokens on Ethereum Blockchain tokens existed before Ethereum. In some ways, the first blockchain currency, Bitcoin, is a token itself. Many token platforms were also developed on Bitcoin and other cryptocurrencies before Ethereum. However, the introduction of the first token standard on Ethereum led to an explosion of tokens.
Tokens are different from ether because the Ethereum protocol does not know anything about them. Sending ether is an intrinsic action of the Ethereum platform, but sending or even owning tokens is not. The ether balance of Ethereum accounts is handled at the protocol level, whereas the token balance of Ethereum accounts is handled at the smart contract level. In order to create a new token on Ethereum, you must create a new smart contract. Once deployed, the smart contract handles everything, including ownership, transfers, and access rights. You can write your smart contract to perform all the necessary actions any way you want, but it is probably wisest to follow an existing standard.
The first standard was introduced in November 2015 by Fabian Vogelsteller as an Ethereum Request for Comments (ERC). It was automatically assigned GitHub issue number 20, giving rise to the name "ERC20 token." The vast majority of tokens are currently based on the ERC20 standard. The ERC20 request for comments eventually became Ethereum Improvement Proposal 20 (EIP-20), but it is mostly still referred to by the original name, ERC20.
ERC20 is a standard for fungible tokens, meaning that different units of an ERC20 token are interchangeable and have no unique properties.
The ERC20 standard defines a common interface for contracts implementing a token, such that any compatible token can be accessed and used in the same way. The interface consists of a number of functions that must be present in every implementation of the standard, as well as some optional functions and attributes that may be added by developers.
An ERC20-compliant token contract must provide at least the following functions and events:
totalSupply - Returns the total units of this token that currently exist. ERC20 tokens can have a fixed or a variable supply.
balanceOf - Given an address, returns the token balance of that address.
transfer - Given an address and amount, transfers that amount of tokens to that address, from the balance of the address that executed the transfer.
transferFrom - Given a sender, recipient, and amount, transfers tokens from one account to another. Used in combination with approve.
approve - Given a recipient address and amount, authorizes that address to execute several transfers up to that amount, from the account that issued the approval.
allowance - Given an owner address and a spender address, returns the remaining amount that the spender is approved to withdraw from the owner.
Transfer - Event triggered upon a successful transfer (a call to transfer or transferFrom), including zero-value transfers.
Approval - Event logged upon a successful call to approve.
ERC20 optional functions In addition to the required functions listed in the previous section, the following optional functions are also defined by the standard:
name - Returns the human-readable name (e.g., "US Dollars") of the token.
symbol - Returns a human-readable symbol (e.g., "USD") for the token.
decimals - Returns the number of decimals used to divide token amounts. For example, if decimals is 2, then the token amount is divided by 100 to get its user representation.
Here’s what an ERC20 interface specification looks like in Solidity:
contract ERC20 {
function totalSupply() constant returns (uint theTotalSupply);
function balanceOf(address _owner) constant returns (uint balance);
function transfer(address _to, uint _value) returns (bool success);
function transferFrom(address _from, address _to, uint _value) returns (bool success);
function approve(address _spender, uint _value) returns (bool success);
function allowance(address _owner, address _spender) constant returns (uint remaining);
event Transfer(address indexed _from, address indexed _to, uint _value);
event Approval(address indexed _owner, address indexed _spender, uint _value);
}
One of the less obvious issues with ERC20 tokens is that they expose subtle differences between tokens and ether itself. Where ether is transferred by a transaction that has a recipient address as its destination, token transfers occur within the specific token contract state and have the token contract as their destination, not the recipient’s address. The token contract tracks balances and issues events. In a token transfer, no transaction is actually sent to the recipient of the token. Instead, the recipient’s address is added to a map within the token contract itself. A transaction sending ether to an address changes the state of an address. A transaction transferring a token to an address only changes the state of the token contract, not the state of the recipient address. Even a wallet that has support for ERC20 tokens does not become aware of a token balance unless the user explicitly adds a specific token contract to "watch." Some wallets watch the most popular token contracts to detect balances held by addresses they control, but that’s limited to a small fraction of existing ERC20 contracts.
In fact, it’s unlikely that a user would want to track all balances in all possible ERC20 token contracts. Many ERC20 tokens are more like email spam than usable tokens. They automatically create balances for accounts that have ether activity, in order to attract users. If you have an Ethereum address with a long history of activity, especially if it was created in the presale, you will find it full of "junk" tokens that appeared out of nowhere. Of course, the address isn’t really full of tokens; it’s the token contracts that have your address in them. You only see these balances if these token contracts are being watched by the block explorer or wallet you use to view your address.
Tokens don’t behave the same way as ether. Ether is sent with the send function and accepted by any payable function in a contract or any externally owned address. Tokens are sent using transfer or approve & transferFrom functions that exist only in the ERC20 contract, and do not (at least in ERC20) trigger any payable functions in a recipient contract. Tokens are meant to function just like a cryptocurrency such as ether, but they come with certain differences that break that illusion.
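To make the bookkeeping difference concrete, here is a minimal, illustrative Python model of an ERC20-style token (not real contract code, and the names are made up): balances and allowances live entirely inside the token contract's own storage, so a token transfer never modifies the recipient's account.

# Hypothetical, simplified model of the state kept by an ERC20-style token contract.
class ERC20Token:
    def __init__(self, initial_supply, creator):
        self.balances = {creator: initial_supply}   # token balances live inside the contract
        self.allowances = {}                        # (owner, spender) -> remaining allowance

    def transfer(self, sender, to, value):
        assert self.balances.get(sender, 0) >= value, "insufficient balance"
        self.balances[sender] -= value
        self.balances[to] = self.balances.get(to, 0) + value   # only this mapping changes

    def approve(self, owner, spender, value):
        self.allowances[(owner, spender)] = value

    def transfer_from(self, spender, owner, to, value):
        assert self.allowances.get((owner, spender), 0) >= value, "allowance exceeded"
        self.allowances[(owner, spender)] -= value
        self.transfer(owner, to, value)

# The recipient's Ethereum account state is never touched; a wallet has to read
# token.balances[recipient] from this specific contract to learn about the tokens.
token = ERC20Token(1000, "alice")
token.approve("alice", "exchange", 300)
token.transfer_from("exchange", "alice", "bob", 100)
print(token.balances)   # {'alice': 900, 'bob': 100}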
Oracles
A key component of the Ethereum platform is the Ethereum Virtual Machine, with its ability to execute programs and update the state of Ethereum, constrained by consensus rules, on any node in the decentralized network. In order to maintain consensus, EVM execution must be totally deterministic and based only on the shared context of the Ethereum state and signed transactions. This has two particularly important consequences: the first is that there can be no intrinsic source of randomness for the EVM and smart contracts to work with; the second is that extrinsic data can only be introduced as the data payload of a transaction.
Let’s unpack those two consequences further. To understand the prohibition of a true random function in the EVM to provide randomness for smart contracts, consider the effect on attempts to achieve consensus after the execution of such a function: node A would execute the command and store 3 on behalf of the smart contract in its storage, while node B, executing the same smart contract, would store 7 instead. Thus, nodes A and B would come to different conclusions about what the resulting state should be, despite having run exactly the same code in the same context. Indeed, it could be that a different resulting state would be achieved every time that the smart contract is evaluated. As such, there would be no way for the network, with its multitude of nodes running independently around the world, to ever come to a decentralized consensus on what the resulting state should be.
Note that pseudorandom functions, such as cryptographically secure hash functions (which are deterministic and therefore can be, and indeed are, part of the EVM), are not enough for many applications. Take a gambling game that simulates coin flips to resolve bet payouts, which needs to randomize heads or tails—a miner can gain an advantage by playing the game and only including their transactions in blocks for which they will win. How do we get around this problem? Since all nodes can agree on the contents of signed transactions, extrinsic information, including sources of randomness, price information, weather forecasts, etc., can be introduced as the data part of transactions sent to the network. However, such data simply cannot be trusted, because it comes from unverifiable sources. As such, we have just deferred the problem.
Oracles, ideally, provide a trustless (or at least near-trustless) way of getting extrinsic (i.e., "real-world" or off-chain) information, such as the results of football games, the price of gold, or truly random numbers, onto the Ethereum platform for smart contracts to use. They can also be used to relay data securely to DApp frontends directly. Oracles can therefore be thought of as a mechanism for bridging the gap between the off-chain world and smart contracts. Allowing smart contracts to enforce contractual relationships based on real-world events and data broadens their scope dramatically. However, this can also introduce external risks to Ethereum’s security model. Consider a "smart will" contract that distributes assets when a person dies. This is something frequently discussed in the smart contract space, and highlights the risks of a trusted oracle. If the inheritance amount controlled by such a contract is high enough, the incentive to hack the oracle and trigger distribution of assets before the owner dies is very high.
Note that some oracles provide data that is particular to a specific private data source, such as academic certificates or government IDs. The source of such data, such as a university or government department, is fully trusted, and the truth of the data is subjective (truth is only determined by appeal to the authority of the source). Such data cannot therefore be provided trustlessly—i.e., without trusting a source—as there is no independently verifiable objective truth. As such, we include these data sources in our definition of what counts as "oracles" because they also provide a data bridge for smart contracts. The data they provide generally takes the form of attestations, such as passports or records of achievement. Attestations will become a big part of the success of blockchain platforms in the future, particularly in relation to the related issues of verifying identity or reputation, so it is important to explore how they can be served by blockchain platforms.
Oracle Design Patterns Starting with the simplest, immediate-read oracles are those that provide data that is only needed for an immediate decision, like "What is the address for ethereumbook.info?" or "Is this person over 18?" Those wishing to query this kind of data tend to do so on a "just-in-time" basis; the lookup is done when the information is needed and possibly never again. Examples of such oracles include those that hold data about or issued by organizations, such as academic certificates, dial codes, institutional memberships, airport identifiers, self-sovereign IDs, etc. This type of oracle stores data once in its contract storage, whence any other smart contract can look it up using a request call to the oracle contract. It may be updated. The data in the oracle’s storage is also available for direct lookup by blockchain-enabled (i.e., Ethereum client–connected) applications without having to go through the palaver and incurring the gas costs of issuing a transaction. A shop wanting to check the age of a customer wishing to purchase alcohol could use an oracle in this way. This type of oracle is attractive to an organization or company that might otherwise have to run and maintain servers to answer such data requests. Note that the data stored by the oracle is likely not to be the raw data that the oracle is serving, e.g., for efficiency or privacy reasons. A university might set up an oracle for the certificates of academic achievement of past students. However, storing the full details of the certificates (which could run to pages of courses taken and grades achieved) would be excessive. Instead, a hash of the certificate is sufficient. Likewise, a government might wish to put citizen IDs onto the Ethereum platform, where clearly the details included need to be kept private. Again, hashing the data (more carefully, in Merkle trees with salts) and only storing the root hash in the smart contract’s storage would be an efficient way to organize such a service.
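As a rough sketch of the "store only a hash" idea, the following Python uses a salted hash as the on-chain commitment. This is illustrative only: it uses SHA-256 as a stand-in (an Ethereum contract would more likely use Keccak-256), and all names are invented.

import hashlib, os

def commit(certificate_text: bytes, salt: bytes) -> str:
    # The oracle contract would store only this digest, never the certificate itself.
    return hashlib.sha256(salt + certificate_text).hexdigest()

def verify(certificate_text: bytes, salt: bytes, stored_digest: str) -> bool:
    # Anyone holding the full certificate and the salt can show it matches the
    # stored commitment, without the chain ever holding the private data.
    return commit(certificate_text, salt) == stored_digest

salt = os.urandom(16)
digest = commit(b"B.Sc. Computer Science, 2017, Jane Doe", salt)
assert verify(b"B.Sc. Computer Science, 2017, Jane Doe", salt, digest)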
The next setup is publish–subscribe, where an oracle that effectively provides a broadcast service for data that is expected to change (perhaps both regularly and frequently) is either polled by a smart contract on-chain, or watched by an off-chain daemon for updates. This category has a pattern similar to RSS feeds, WebSub, and the like, where the oracle is updated with new information and a flag signals that new data is available to those who consider themselves "subscribed." Interested parties must either poll the oracle to check whether the latest information has changed, or listen for updates to oracle contracts and act when they occur. Examples include price feeds, weather information, economic or social statistics, traffic data, etc. Polling is very inefficient in the world of web servers, but not so in the peer-to-peer context of blockchain platforms: Ethereum clients have to keep up with all state changes, including changes to contract storage, so polling for data changes is a local call to a synced client. Ethereum event logs make it particularly easy for applications to look out for oracle updates, and so this pattern can in some ways even be considered a "push" service. However, if the polling is done from a smart contract, which might be necessary for some decentralized applications (e.g., where activation incentives are not possible), then significant gas expenditure may be incurred.
The request–response category is the most complicated: this is where the data space is too huge to be stored in a smart contract and users are expected to only need a small part of the overall dataset at a time. It is also an applicable model for data provider businesses. In practical terms, such an oracle might be implemented as a system of on-chain smart contracts and off-chain infrastructure used to monitor requests and retrieve and return data. A request for data from a decentralized application would typically be an asynchronous process involving a number of steps. In this pattern, firstly, an EOA transacts with a decentralized application, resulting in an interaction with a function defined in the oracle smart contract. This function initiates the request to the oracle, with the associated arguments detailing the data requested in addition to supplementary information that might include callback functions and scheduling parameters. Once this transaction has been validated, the oracle request can be observed as an EVM event emitted by the oracle contract, or as a state change; the arguments can be retrieved and used to perform the actual query of the off-chain data source. The oracle may also require payment for processing the request, gas payment for the callback, and permissions to access the requested data. Finally, the resulting data is signed by the oracle owner, attesting to the validity of the data at a given time, and delivered in a transaction to the decentralized application that made the request—either directly or via the oracle contract. Depending on the scheduling parameters, the oracle may broadcast further transactions updating the data at regular intervals (e.g., end-of-day pricing information).
A range of other schemes are also possible; for example, data can be requested from and returned directly by an EOA, removing the need for an oracle smart contract. Similarly, the request and response could be made to and from an Internet of Things–enabled hardware sensor. Therefore, oracles can be human, software, or hardware.
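Here is a very rough Python model of the request–response flow described above, with events, payment, and signatures reduced to placeholders; every name in it is invented for illustration.

# Hypothetical sketch of the request-response oracle pattern.
pending_requests = []          # stands in for events emitted by the oracle contract

def oracle_request(query, callback):
    # Step 1: a DApp calls the oracle contract, which records the request
    # (in a real system this would be an EVM event or a state change).
    pending_requests.append({"query": query, "callback": callback})

def offchain_daemon(fetch, sign):
    # Step 2: off-chain infrastructure watches for requests, fetches the data,
    # signs it, and delivers it back in a new transaction (the callback).
    while pending_requests:
        req = pending_requests.pop(0)
        data = fetch(req["query"])
        req["callback"](data, sign(data))

# Example wiring with a toy data source and a toy "signature".
oracle_request("ETH/USD", lambda value, sig: print("callback:", value, sig))
offchain_daemon(fetch=lambda q: 1800, sign=lambda d: f"signed({d})")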
Publish–subscribe is a pattern where publishers (in this context, oracles) do not send messages directly to receivers, but instead categorize published messages into distinct classes. Subscribers are able to express an interest in one or more classes and retrieve only those messages that are of interest. Under such a pattern, an oracle might write the interest rate to its own internal storage each time it changes. Multiple subscribed DApps can simply read it from the oracle contract, thereby reducing the impact on network bandwidth while minimizing storage costs.
In a broadcast or multicast pattern, an oracle would post all messages to a channel and subscribing contracts would listen to the channel under a variety of subscription modes. For example, an oracle might publish messages to a cryptocurrency exchange rate channel. A subscribing smart contract could request the full content of the channel if it required the time series for, e.g., a moving average calculation; another might require only the latest rate for a spot price calculation. A broadcast pattern is appropriate where the oracle does not need to know the identity of the subscribing contract.
Good article on modern oracles - https://www.rapidinnovation.io/post/blockchain-oracles-essential-guide-connecting-on-chain-off-chain-data
The Ethereum Virtual Machine
At the heart of the Ethereum protocol and operation is the Ethereum Virtual Machine, or EVM for short. As you might guess from the name, it is a computation engine, not hugely dissimilar to the virtual machines of Microsoft’s .NET Framework, or interpreters of other bytecode-compiled programming languages such as Java.
The EVM is the part of Ethereum that handles smart contract deployment and execution. Simple value transfer transactions from one EOA to another don’t need to involve it, practically speaking, but everything else will involve a state update computed by the EVM. At a high level, the EVM running on the Ethereum blockchain can be thought of as a global decentralized computer containing millions of executable objects, each with its own permanent data store.
The EVM has a stack-based architecture, storing all in-memory values on a stack. It works with a word size of 256 bits (mainly to facilitate native hashing and elliptic curve operations) and has several addressable data components:
- • An immutable program code ROM, loaded with the bytecode of the smart contract to be executed.
- • A volatile memory, with every location explicitly initialized to zero.
- • A permanent storage that is part of the Ethereum state, also zero-initialized.
- • A set of environment variables and data that is available during execution.
Comparison with Existing Technology The term "virtual machine" is often applied to the virtualization of a real computer, typically by a "hypervisor" such as VirtualBox or QEMU, or of an entire operating system instance, such as Linux’s KVM. These must provide a software abstraction, respectively, of actual hardware, and of system calls and other kernel functionality.
The EVM operates in a much more limited domain: it is just a computation engine, and as such provides an abstraction of just computation and storage, similar to the Java Virtual Machine (JVM) specification, for example. From a high-level viewpoint, the JVM is designed to provide a runtime environment that is agnostic of the underlying host OS or hardware, enabling compatibility across a wide variety of systems. High-level programming languages such as Java or Scala (which use the JVM) or C# (which uses .NET) are compiled into the bytecode instruction set of their respective virtual machine. In the same way, the EVM executes its own bytecode instruction set (described in the next section), which higher-level smart contract programming languages such as LLL, Serpent, Mutan, or Solidity are compiled into.
The EVM, therefore, has no scheduling capability, because execution ordering is organized externally to it—Ethereum clients run through verified block transactions to determine which smart contracts need executing and in which order. In this sense, the Ethereum world computer is single-threaded, like JavaScript. Neither does the EVM have any "system interface" handling or “hardware support”—there is no physical machine to interface with. The Ethereum world computer is completely virtual.
The EVM Instruction Set (Bytecode Operations) Let’s start our exploration of the EVM in more detail by looking at the available opcodes and what they do. As you might expect, all operands are taken from the stack, and the result (where applicable) is often put back on the top of the stack.
Arithmetic opcode instructions:
ADD //Add the top two stack items
MUL //Multiply the top two stack items
SUB //Subtract the top two stack items
DIV //Integer division
SDIV //Signed integer division
MOD //Modulo (remainder) operation
SMOD //Signed modulo operation
ADDMOD //Addition modulo any number
MULMOD //Multiplication modulo any number
EXP //Exponential operation
SIGNEXTEND //Extend the length of a two's complement signed integer
SHA3 //Compute the Keccak-256 hash of a block of memory
Stack, memory, and storage management instructions:
POP //Remove the top item from the stack
MLOAD //Load a word from memory
MSTORE //Save a word to memory
MSTORE8 //Save a byte to memory
SLOAD //Load a word from storage
SSTORE //Save a word to storage
MSIZE //Get the size of the active memory in bytes
PUSHx //Place x byte item on the stack, where x can be any integer from
// 1 to 32 (full word) inclusive
DUPx //Duplicate the x-th stack item, where x can be any integer from
// 1 to 16 inclusive
SWAPx //Exchange 1st and (x+1)-th stack items, where x can be any
// integer from 1 to 16 inclusive
Instructions for control flow:
STOP //Halt execution
JUMP //Set the program counter to any value
JUMPI //Conditionally alter the program counter
PC //Get the value of the program counter (prior to the increment
//corresponding to this instruction)
JUMPDEST //Mark a valid destination for jumps
Opcodes for the system executing the program:
LOGx //Append a log record with x topics, where x is any integer
//from 0 to 4 inclusive
CREATE //Create a new account with associated code
CALL //Message-call into another account, i.e. run another
//account's code
CALLCODE //Message-call into this account with another
//account's code
RETURN //Halt execution and return output data
DELEGATECALL //Message-call into this account with an alternative
//account's code, but persisting the current values for
//sender and value
STATICCALL //Static message-call into an account
REVERT //Halt execution, reverting state changes but returning
//data and remaining gas
INVALID //The designated invalid instruction
SELFDESTRUCT //Halt execution and register account for deletion
Opcodes for comparisons and bitwise logic:
LT //Less-than comparison
GT //Greater-than comparison
SLT //Signed less-than comparison
SGT //Signed greater-than comparison
EQ //Equality comparison
ISZERO //Simple NOT operator
AND //Bitwise AND operation
OR //Bitwise OR operation
XOR //Bitwise XOR operation
NOT //Bitwise NOT operation
BYTE //Retrieve a single byte from a full-width 256-bit word
Opcodes dealing with execution environment information:
GAS //Get the amount of available gas (after the reduction for
//this instruction)
ADDRESS //Get the address of the currently executing account
BALANCE //Get the account balance of any given account
ORIGIN //Get the address of the EOA that initiated this EVM
//execution
CALLER //Get the address of the caller immediately responsible
//for this execution
CALLVALUE //Get the ether amount deposited by the caller responsible
//for this execution
CALLDATALOAD //Get the input data sent by the caller responsible for
//this execution
CALLDATASIZE //Get the size of the input data
CALLDATACOPY //Copy the input data to memory
CODESIZE //Get the size of code running in the current environment
CODECOPY //Copy the code running in the current environment to
//memory
GASPRICE //Get the gas price specified by the originating
//transaction
EXTCODESIZE //Get the size of any account's code
EXTCODECOPY //Copy any account's code to memory
RETURNDATASIZE //Get the size of the output data from the previous call
//in the current environment
RETURNDATACOPY //Copy data output from the previous call to memory
Opcodes for accessing information on the current block:
BLOCKHASH //Get the hash of one of the 256 most recently completed
//blocks
COINBASE //Get the block's beneficiary address for the block reward
TIMESTAMP //Get the block's timestamp
NUMBER //Get the block's number
DIFFICULTY //Get the block's difficulty
GASLIMIT //Get the block's gas limit
Ethereum State The job of the EVM is to update the Ethereum state by computing valid state transitions as a result of smart contract code execution, as defined by the Ethereum protocol. This aspect leads to the description of Ethereum as a transaction-based state machine, which reflects the fact that external actors (i.e., account holders and miners) initiate state transitions by creating, accepting, and ordering transactions. It is useful at this point to consider what constitutes the Ethereum state.
At the top level, we have the Ethereum world state. The world state is a mapping of Ethereum addresses (160-bit values) to accounts. At the lower level, each Ethereum address represents an account comprising an ether balance (stored as the number of wei owned by the account), a nonce (representing the number of transactions successfully sent from this account if it is an EOA, or the number of contracts created by it if it is a contract account), the account’s storage (which is a permanent data store, only used by smart contracts), and the account’s program code (again, only if the account is a smart contract account). An EOA will always have no code and an empty storage.
When a transaction results in smart contract code execution, an EVM is instantiated with all the information required in relation to the current block being created and the specific transaction being processed. In particular, the EVM’s program code ROM is loaded with the code of the contract account being called, the program counter is set to zero, the storage is loaded from the contract account’s storage, the memory is set to all zeros, and all the block and environment variables are set. A key variable is the gas supply for this execution, which is set to the amount of gas paid for by the sender at the start of the transaction. As code execution progresses, the gas supply is reduced according to the gas cost of the operations executed. If at any point the gas supply is reduced to zero we get an "Out of Gas" (OOG) exception; execution immediately halts and the transaction is abandoned. No changes to the Ethereum state are applied, except for the sender’s nonce being incremented and their ether balance going down to pay the block’s beneficiary for the resources used to execute the code to the halting point. At this point, you can think of the EVM running on a sandboxed copy of the Ethereum world state, with this sandboxed version being discarded completely if execution cannot complete for whatever reason. However, if execution does complete successfully, then the real-world state is updated to match the sandboxed version, including any changes to the called contract’s storage data, any new contracts created, and any ether balance transfers that were initiated.
Note that because a smart contract can itself effectively initiate transactions, code execution is a recursive process. A contract can call other contracts, with each call resulting in another EVM being instantiated around the new target of the call. Each instantiation has its sandbox world state initialized from the sandbox of the EVM at the level above. Each instantiation is also given a specified amount of gas for its gas supply (not exceeding the amount of gas remaining in the level above, of course), and so may itself halt with an exception due to being given too little gas to complete its execution.
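The "sandboxed copy, commit only on success" behavior described above can be caricatured in a few lines of Python. This sketch is purely illustrative and ignores almost everything a real EVM does (opcodes, stack, memory, refund rules); all names are made up.

import copy

class OutOfGas(Exception):
    pass

def execute(world_state, program, gas_limit):
    # Run a list of (operation, gas_cost) pairs against a scratch copy of the
    # state, committing the copy only if execution completes within the gas limit.
    sandbox = copy.deepcopy(world_state)
    gas = gas_limit
    try:
        for op, cost in program:
            if gas < cost:
                raise OutOfGas()
            gas -= cost
            op(sandbox)                 # each operation mutates only the sandbox
    except OutOfGas:
        return world_state, 0           # sandbox discarded; the sender still pays for gas used
    world_state.clear()
    world_state.update(sandbox)         # success: the sandbox becomes the real state
    return world_state, gas             # unused gas is refunded to the sender

state = {"alice": {"balance": 100}}
program = [(lambda s: s["alice"].update(balance=90), 5_000)]
state, gas_left = execute(state, program, gas_limit=21_000)
print(state, gas_left)   # {'alice': {'balance': 90}} 16000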
Contract Deployment Code There is an important but subtle difference between the code used when creating and deploying a new contract on the Ethereum platform and the code of the contract itself. In order to create a new contract, a special transaction is needed that has its to field set to the special 0x0 address and its data field set to the contract’s initiation code. When such a contract creation transaction is processed, the code for the new contract account is not the code in the data field of the transaction. Instead, an EVM is instantiated with the code in the data field of the transaction loaded into its program code ROM, and then the output of the execution of that deployment code is taken as the code for the new contract account. This is so that new contracts can be programmatically initialized using the Ethereum world state at the time of deployment, setting values in the contract’s storage and even sending ether or creating further new contracts.
When compiling a contract offline, e.g., using solc on the command line, you can either get the deployment bytecode or the runtime bytecode.
The deployment bytecode is used for every aspect of the initialization of a new contract account, including the bytecode that will actually end up being executed when transactions call this new contract (i.e., the runtime bytecode) and the code to initialize everything based on the contract’s constructor.
Let’s take a simple Faucet.sol contract as an example:
pragma solidity ^0.4.19;
contract Faucet {
// Give out ether to anyone who asks
function withdraw(uint withdraw_amount) public {
// Limit withdrawal amount
require(withdraw_amount <= 100000000000000000);
// Send the amount to the address that requested it
msg.sender.transfer(withdraw_amount);
}
// fallback function
function () public payable {}
}
After getting the runtime bytecode of Faucet.sol, we can feed it into Binary Ninja (after loading the Ethersplay plug-in) to see what the EVM instructions look like.
When you send a transaction to an ABI-compatible smart contract (which you can assume all contracts are), the transaction first interacts with that smart contract’s dispatcher. The dispatcher reads in the data field of the transaction and sends the relevant part to the appropriate function. We can see an example of a dispatcher at the beginning of our disassembled Faucet.sol runtime bytecode. After the familiar MSTORE instruction, we see the following instructions:
PUSH1 0x4
CALLDATASIZE
LT
PUSH1 0x3f
JUMPI
As we have seen, PUSH1 0x4 places 0x4 onto the top of the stack, which is otherwise empty. CALLDATASIZE gets the size in bytes of the data sent with the transaction (known as the calldata) and pushes that number onto the stack. After these operations have been executed, the stack looks like this:
<length of calldata from tx>
0x4
This next instruction is LT, short for “less than.” The LT instruction checks whether the top item on the stack is less than the next item on the stack. In our case, it checks to see if the result of CALLDATASIZE is less than 4 bytes.
Why does the EVM check that the calldata of the transaction is at least 4 bytes? As mentioned earlier, it is because of how function identifiers work: each function is identified by the first 4 bytes of the Keccak-256 hash of its signature.
A function identifier is always 4 bytes long, so if the entire data field of the transaction sent to the contract is less than 4 bytes, then there’s no function with which the transaction could possibly be communicating, unless a fallback function is defined. Because we implemented such a fallback function in Faucet.sol, the EVM jumps to this function when the calldata’s length is less than 4 bytes.
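For example, the 4-byte identifier of withdraw(uint256) can be reproduced off-chain. The sketch below assumes the pycryptodome package for Keccak-256 (Ethereum's Keccak-256 is not the same as the standardized SHA3-256), so treat it as an illustration rather than canonical tooling.

from Crypto.Hash import keccak   # assumed dependency: pycryptodome

def selector(signature: str) -> str:
    k = keccak.new(digest_bits=256)
    k.update(signature.encode())
    return "0x" + k.hexdigest()[:8]   # first 4 bytes of the Keccak-256 hash

print(selector("withdraw(uint256)"))   # 0x2e1a7d4d, the identifier we will meet in the dispatcher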
LT pops the top two values off the stack and, if the transaction’s data field is less than 4 bytes, pushes 1 onto it. Otherwise, it pushes 0. In our example, let’s assume the data field of the transaction sent to our contract was less than 4 bytes.
The PUSH1 0x3f instruction pushes the byte 0x3f onto the stack. After this instruction, the stack looks like this:
0x3f
1
The next instruction is JUMPI, which stands for "jump if." It works like so:
jumpi(label, cond) // Jump to "label" if "cond" is true
In our case, label is 0x3f, which is where our fallback function lives in our smart contract. The cond argument is 1, which was the result of the LT instruction earlier. To put this entire sequence into words, the contract jumps to the fallback function if the transaction data is less than 4 bytes.
At 0x3f, only a STOP instruction follows, because although we declared a fallback function, we kept it empty. Had we not implemented a fallback function, the contract would throw an exception instead.
Let’s examine the central block of the dispatcher. Assuming we received calldata that was greater than 4 bytes in length, the JUMPI instruction would not jump to the fallback function. Instead, code execution would proceed to the following instructions:
PUSH1 0x0
CALLDATALOAD
PUSH29 0x1000000...
SWAP1
DIV
PUSH4 0xffffffff
AND
DUP1
PUSH4 0x2e1a7d4d
EQ
PUSH1 0x41
JUMPI
PUSH1 0x0 pushes 0 onto the stack, which is now otherwise empty again. CALLDATALOAD accepts as an argument an index within the calldata sent to the smart contract and reads 32 bytes from that index, like so:
calldataload(p) //load 32 bytes of calldata starting from byte position p
Since 0 was the index passed to it from the PUSH1 0x0 command, CALLDATALOAD reads 32 bytes of calldata starting at byte 0, and then pushes it to the top of the stack (after popping the original 0x0).
Then comes the PUSH29 0x1000000… instruction, which places a 29-byte value on top of the stack. SWAP1 exchanges the top element of the stack with the one below it (SWAPx swaps the top with the x-th element after it). In this case, it swaps 0x1000000… with the calldata. So the stack is:
<32 bytes of calldata starting at byte 0>
0x1000000… (29 bytes in length)
The next instruction is DIV, which means division.
The 0x100000000… we pushed earlier is 29 bytes long, consisting of a 1 at the beginning, followed by all 0s. Dividing our 32 bytes of calldata by this value will leave us only the topmost 4 bytes of our calldata load, starting at index 0. These 4 bytes—the first 4 bytes in the calldata starting at index 0—are the function identifier, and this is how the EVM extracts that field.
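Numerically, dividing by a constant consisting of a 1 followed by 28 zero bytes (i.e., 2**224) is just a right shift by 224 bits. A quick, illustrative check in Python:

# The first 32 bytes of calldata: the 4-byte selector followed by the high-order
# 28 bytes of the uint256 argument (all zero here, as for a small withdrawal amount).
calldata_word = bytes.fromhex("2e1a7d4d") + 28 * b"\x00"
word = int.from_bytes(calldata_word, "big")
divisor = 1 << 224                       # 0x01 followed by 28 zero bytes
print(hex(word // divisor))              # 0x2e1a7d4d -> the function identifier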
The result of the DIV (the function identifier) gets pushed onto the stack, and our stack is now:
<function identifier sent in data>
Since the PUSH4 0xffffffff and AND instructions are redundant, we can ignore them entirely, as the stack will remain the same after they are done. The DUP1 instruction duplicates the first item on the stack, which is the function identifier. The next instruction, PUSH4 0x2e1a7d4d, pushes the precalculated function identifier of the withdraw(uint256) function onto the stack. The stack is now:
0x2e1a7d4d
<function identifier sent in data>
<function identifier sent in data>
The next instruction, EQ, pops off the top two items of the stack and compares them. This is where the dispatcher does its main job: it compares whether the function identifier sent in the msg.data field of the transaction matches that of withdraw(uint256). If they are equal, EQ pushes 1 onto the stack, which will ultimately be used to jump to the withdraw function. Otherwise, EQ pushes 0 onto the stack.
Assuming the transaction sent to our contract indeed began with the function identifier for withdraw(uint256), our stack has become:
1
<function identifier sent in data> (now known to be 0x2e1a7d4d)
Next, we have PUSH1 0x41, which is the address at which the withdraw(uint256) function lives in the contract.
The JUMPI instruction is next, and it once again accepts the top two elements on the stack as arguments. In this case, we have jumpi(0x41, 1), which tells the EVM to execute the jump to the location of the withdraw(uint256) function, and the execution of that function’s code can proceed.
Turing Completeness and Gas As we have already touched on, in simple terms, a system or programming language is Turing complete if it can run any program. This capability, however, comes with a very important caveat: some programs take forever to run. An important aspect of this is that we can’t tell, just by looking at a program, whether it will take forever or not to execute. We have to actually go through with the execution of the program and wait for it to finish to find out. Of course, if it is going to take forever to execute, we will have to wait forever to find out. This is called the halting problem and would be a huge problem for Ethereum if it were not addressed.
However, with gas, there is a solution: if after a prespecified maximum amount of computation has been performed, the execution hasn’t ended, the execution of the program is halted by the EVM. This makes the EVM a quasi–Turing-complete machine: it can run any program you feed into it, but only if the program terminates within a particular amount of computation. That limit isn’t fixed in Ethereum—you can pay to increase it up to a maximum (called the "block gas limit"), and everyone can agree to increase that maximum over time. Nevertheless, at any one time, there is a limit in place, and transactions that consume too much gas while executing are halted.
Gas is Ethereum’s unit for measuring the computational and storage resources required to perform actions on the Ethereum blockchain. In contrast to Bitcoin, whose transaction fees only take into account the size of a transaction in kilobytes, Ethereum must account for every computational step performed by transactions and smart contract code execution.
Each operation performed by a transaction or contract costs a fixed amount of gas. Some examples, from the Ethereum Yellow Paper:
- • Adding two numbers costs 3 gas.
- • Calculating a Keccak-256 hash costs 30 gas + 6 gas for each 256 bits of data being hashed.
- • Sending a transaction costs 21,000 gas.
When an EVM is needed to complete a transaction, in the first instance it is given a gas supply equal to the amount specified by the gas limit in the transaction. Every opcode that is executed has a cost in gas, and so the EVM’s gas supply is reduced as the EVM steps through the program. Before each operation, the EVM checks that there is enough gas to pay for the operation’s execution. If there isn’t enough gas, execution is halted and the transaction is reverted.
If the EVM reaches the end of execution successfully, without running out of gas, the gas cost used is paid to the miner as a transaction fee, converted to ether based on the gas price specified in the transaction:
miner fee = gas used * gas price (this changed with the London update, see below)
The gas remaining in the gas supply is refunded to the sender.
If the transaction “runs out of gas” during execution, the operation is immediately terminated, raising an “out of gas” exception. The transaction is reverted and all changes to the state are rolled back.
Although the transaction was unsuccessful, the sender will be charged a transaction fee, as miners have already performed the computational work up to that point and must be compensated for doing so.
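As a quick worked example of this pre-London fee arithmetic (the numbers are chosen purely for illustration):

GWEI = 10**9
ETHER = 10**18

gas_limit = 50_000        # maximum gas the sender is willing to pay for
gas_used  = 21_000        # gas actually consumed (a plain ether transfer)
gas_price = 100 * GWEI    # price per unit of gas chosen by the sender

miner_fee = gas_used * gas_price                  # paid to the miner
refund    = (gas_limit - gas_used) * gas_price    # unused gas returned to the sender
print(miner_fee / ETHER)   # 0.0021 ether
print(refund / ETHER)      # 0.0029 ether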
Who decides what the block gas limit is? As mentioned above, every block has a gas limit; the miners on the network collectively decide its value. Individuals who want to mine on the Ethereum network use a mining program, such as Ethminer, which connects to a Geth or Parity Ethereum client. The Ethereum protocol has a built-in mechanism where miners can vote on the gas limit so capacity can be increased or decreased in subsequent blocks. The miner of a block can vote to adjust the block gas limit by a factor of 1/1,024 (0.0976%) in either direction. The result of this is an adjustable block size based on the needs of the network at the time. This mechanism is coupled with a default mining strategy where miners vote on a gas limit that is at least 4.7 million gas, but which targets a value of 150% of the average of recent total gas usage per block (using a 1,024-block exponential moving average).
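The 1/1,024 rule translates into a simple bound on the next block's gas limit; a small illustrative helper:

def next_gas_limit_bounds(parent_gas_limit: int) -> tuple:
    # A miner may move the gas limit by at most parent_gas_limit // 1024
    # (about 0.0976%) in either direction relative to the parent block.
    delta = parent_gas_limit // 1024
    return parent_gas_limit - delta, parent_gas_limit + delta

print(next_gas_limit_bounds(8_000_000))   # (7992188, 8007812)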
EIP-1559 in The London Update Ethereum historically priced transaction fees using a simple auction mechanism, where users send transactions with bids (“gasprices”) and miners choose transactions with the highest bids, and transactions that get included pay the bid that they specify. This leads to several large sources of inefficiency:
Mismatch between volatility of transaction fee levels and social cost of transactions: bids to include transactions on mature public blockchains, that have enough usage so that blocks are full, tend to be extremely volatile. It’s absurd to suggest that the cost incurred by the network from accepting one more transaction into a block actually is 10x more when the cost per gas is 10 nanoeth compared to when the cost per gas is 1 nanoeth; in both cases, it’s a difference between 8 million gas and 8.02 million gas.
Needless delays for users: because of the hard per-block gas limit coupled with natural volatility in transaction volume, transactions often wait for several blocks before getting included, but this is socially unproductive; no one significantly gains from the fact that there is no “slack” mechanism that allows one block to be bigger and the next block to be smaller to meet block-by-block differences in demand.
Inefficiencies of first price auctions: In the current approach, transaction senders publish a transaction with a bid (a maximum fee), miners choose the highest-paying transactions, and everyone pays what they bid. This is well known in the mechanism design literature to be highly inefficient, and so complex fee estimation algorithms are required. But even these algorithms often end up not working very well, leading to frequent fee overpayment.
Instability of blockchains with no block reward: In the long run, blockchains where there is no issuance (including Bitcoin and Zcash) at present intend to switch to rewarding miners entirely through transaction fees. However, there are known issues with this that likely lead to a lot of instability, incentivizing mining “sister blocks” that steal transaction fees, opening up much stronger selfish mining attack vectors, and more. There is at present no good mitigation for this.
The proposal in this EIP is to start with a base fee amount which is adjusted up and down by the protocol based on how congested the network is. When the network exceeds the target per-block gas usage, the base fee increases slightly and when capacity is below the target, it decreases slightly. Because these base fee changes are constrained, the maximum difference in base fee from block to block is predictable. This then allows wallets to auto-set the gas fees for users in a highly reliable fashion. It is expected that most users will not have to manually adjust gas fees, even in periods of high network activity. For most users the base fee will be estimated by their wallet and a small priority fee, which compensates miners taking on orphan risk (e.g. 1 nanoeth), will be automatically set. Users can also manually set the transaction max fee to bound their total costs.
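A simplified sketch of the base fee adjustment rule, using the EIP-1559 parameters (a gas target of half the block gas limit and a maximum change of 1/8 per block). The real specification has additional integer-rounding details not shown here, so treat this as illustrative.

BASE_FEE_MAX_CHANGE_DENOMINATOR = 8   # at most ~12.5% change per block
ELASTICITY_MULTIPLIER = 2             # gas target = gas limit / 2

def next_base_fee(base_fee: int, gas_used: int, gas_limit: int) -> int:
    gas_target = gas_limit // ELASTICITY_MULTIPLIER
    delta = base_fee * (gas_used - gas_target) // gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR
    return max(base_fee + delta, 0)

# A completely full block pushes the base fee up, an empty block lets it fall
# (by at most roughly 12.5% either way).
print(next_base_fee(100, gas_used=30_000_000, gas_limit=30_000_000))  # 112
print(next_base_fee(100, gas_used=0,          gas_limit=30_000_000))  # 87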
An important aspect of this fee system is that miners only get to keep the priority fee. The base fee is always burned (i.e. it is destroyed by the protocol). This ensures that only ETH can ever be used to pay for transactions on Ethereum, cementing the economic value of ETH within the Ethereum platform and reducing risks associated with miner extractable value (MEV). Additionally, this burn counterbalances Ethereum inflation while still giving the block reward and priority fee to miners. Finally, ensuring the miner of a block does not receive the base fee is important because it removes miner incentive to manipulate the fee in order to extract more fees from users.
Consensus
Consensus via Proof of Work The creator of the original blockchain, Bitcoin, invented a consensus algorithm called proof of work (PoW). Arguably, PoW is the most important invention underpinning Bitcoin. The colloquial term for PoW is “mining,” which creates a misunderstanding about the primary purpose of consensus. Often people assume that the purpose of mining is the creation of new currency, since the purpose of real-world mining is the extraction of precious metals or other resources. Rather, the real purpose of mining (and all other consensus models) is to secure the blockchain, while keeping control over the system decentralized and diffused across as many participants as possible. The reward of newly minted currency is an incentive to those who contribute to the security of the system: a means to an end. In that sense, the reward is the means and decentralized security is the end. In PoW consensus there is also a corresponding “punishment,” which is the cost of energy required to participate in mining. If participants do not follow the rules and earn the reward, they risk the funds they have already spent on electricity to mine. Thus, PoW consensus is a careful balance of risk and reward that drives participants to behave honestly out of self-interest.
Ethereum started as a PoW blockchain, following Bitcoin’s example, in that it used a PoW algorithm with the same basic incentive system for the same basic goal: securing the blockchain while decentralizing control. Ethereum’s PoW algorithm was slightly different from Bitcoin’s and is called Ethash.
Consensus via Proof of Stake Historically, proof of work was not the first consensus algorithm proposed. Preceding the introduction of proof of work, many researchers had proposed variations of consensus algorithms based on financial stake, now called proof of stake (PoS). In some respects, proof of work was invented as an alternative to proof of stake. Following the success of Bitcoin, many blockchains have emulated proof of work. Yet the explosion of research into consensus algorithms has also resurrected proof of stake, significantly advancing the state of the technology. From the beginning, Ethereum’s founders were hoping to eventually migrate its consensus algorithm to proof of stake. In fact, there was a deliberate handicap on Ethereum’s proof of work called the difficulty bomb, intended to gradually make proof-of-work mining of Ethereum more and more difficult, thereby forcing the transition to proof of stake.
In general, a PoS algorithm works as follows. The blockchain keeps track of a set of validators, and anyone who holds the blockchain’s base cryptocurrency (in Ethereum’s case, ether) can become a validator by sending a special type of transaction that locks up their ether into a deposit. The validators take turns proposing and voting on the next valid block, and the weight of each validator’s vote depends on the size of its deposit (i.e., stake). Importantly, a validator risks losing their deposit, i.e., “being slashed”, if the block they staked it on is rejected by the majority of validators. Conversely, validators earn a small reward, proportional to their deposited stake, for every block that is accepted by the majority. Thus, PoS forces validators to act honestly and follow the consensus rules, by a system of reward and punishment. The major difference between PoS and PoW is that the punishment in PoS is intrinsic to the blockchain (e.g., loss of staked ether), whereas in PoW the punishment is extrinsic (e.g., loss of funds spent on electricity).
Ever since Ethereum was born in 2015, the intention was to transition to a PoS consensus protocol. The first concrete step in that direction came on December 1, 2020, with the launch of the Beacon Chain. Initially the Beacon Chain was an empty blockchain that let anyone become a validator by depositing 32 ETH into a specific deposit contract, and it only handled its internal consensus of validators and their respective balances. At that time, the Ethereum blockchain was still using Ethash as its consensus protocol.
On September 15, 2022, the Merge hard fork occurred, and the Beacon Chain, with its own set of validators, extended its PoS-based consensus protocol to the Ethereum main blockchain, effectively ending the use of Ethash. However, some limitations remained, including the inability of validators to withdraw their capital and leave the validator set.
These issues were fully resolved on April 12, 2023, with the Shapella update, which conclusively transitioned Ethereum from a Proof of Work (PoW) to a Proof of Stake (PoS) consensus protocol.
The Proof of Stake consensus protocol used by Ethereum is called Gasper. In the following sections, we will explore how it works, starting with basic terminology and progressing to the fork choice rule (LMD-Ghost) and finality gadget (Casper FFG).
Nodes and Validators Nodes compose the Ethereum network’s backbone. They communicate with each other and are responsible for validating consensus adherence. Validators are attached to nodes, and despite what their name suggests, they are not in charge of validation. Validators carry out consensus by proposing and voting on new blocks, while the validators’ output is validated by nodes to ensure that blocks and transactions adhere to the network’s rules. We will look at the validators’ duties in detail in the following sections.
One peculiar property of PoS to keep in mind is that the set of active validators is known: this will be key to achieving finality, as we can identify when we have achieved a majority vote of participants.
Blocks and Attestations Strict time management is an important property of Ethereum’s Proof of Stake. The two key intervals in PoS are the slot, which is exactly 12 seconds, and the epoch, which spans 32 slots.
At every slot, exactly one validator is selected to propose a block. At every epoch, every validator gets to share its view of the world exactly once, in the form of an attestation. An attestation contains a vote for the head of the chain, used by the LMD GHOST protocol, and votes for checkpoints, used by the Casper FFG protocol, where FFG stands for “Friendly Finality Gadget.”
Attestation sharing is bandwidth intensive, so each validator attests once per epoch rather than once per slot, which spreads the necessary workload and keeps it manageable.
The protocol incentivizes block and attestation production and accuracy via a system of rewards and penalties for validators, but it tolerates empty slots and missed attestations, which can happen for both organic reasons (e.g., a node going offline) and profit-driven ones; we will expand on this in the later section about timing games.
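Slot and epoch numbers follow directly from this strict timing; a tiny illustrative helper (the genesis timestamp is just a placeholder):

SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32

def slot_and_epoch(now: int, genesis_time: int) -> tuple:
    slot = (now - genesis_time) // SECONDS_PER_SLOT
    return slot, slot // SLOTS_PER_EPOCH

# e.g., one hour after genesis we are in slot 300, epoch 9
print(slot_and_epoch(now=3600, genesis_time=0))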
LMD Ghost GHOST (Greedy Heaviest-Observed SubTree) is a fork-choice algorithm for selecting the head block of the chain. It doesn’t follow the heaviest chain, as the Bitcoin fork-choice algorithm does, but the heaviest subtree.
The rationale is that, in a PoS system, we have far more consensus data than in PoW, because every validator votes once per epoch. So we want to use all of this information to select the latest block - i.e. the head block - of the chain.
Basically, every vote for a block is not only a vote for that block, but also for every ancestor of that block. And the fork choice rule is not driven only by block proposers, but by every validator’s attestations, as the Message Driven (MD) part of LMD suggests. Latest (the L in LMD) means that we only consider the last vote of every validator, discarding older ones.
The whole idea of LMD-Ghost can be described with the following points:
- Every validator casts a vote once per epoch for what it considers to be the head of the chain at the moment it publishes its attestation.
- Every block has a score (or weight) obtained by summing the votes of all validators that chose it as the head block of their local chain.
- This score is applied recursively to all the branches that contain that block, i.e. a vote also adds weight to every ancestor of the voted block.
- Every validator always assumes that the subtree with the heaviest weight is the “right” one.
Now let’s see how LMD-Ghost works in practice.
We have a subtree of blocks, starting from a root block and the weights for every block. We know the weights of every block through attestation messages received via gossip from other validators or directly included in the blocks. In fact, by having other validators’ attestations we can calculate the score of every block by summing the votes that each block has received from all the validators.
The algorithm proceeds recursively, starting from the root block and selecting the branch with the highest weight, until it stops on a leaf node that has no descendants. Of course, if a block has only one descendant there is no choice to be made, while if two or more branches have equal weight, LMD-Ghost arbitrarily chooses the branch rooted at the child block with the highest block hash value.
Remember that a universal view of the network, in which one could see everything going on across the whole network, doesn’t exist. Every validator has its own local view and runs LMD-Ghost on top of it. The idea is that honest validators build their blocks on the best head they see and cast their votes accordingly. And with a majority (more than half) of validators behaving correctly, we can reach a unique consensus about the history of the blockchain.
Let’s see an example. We are now a validator that has to propose the next block, or simply has to publish an attestation. We need to run LMD-Ghost on our own local view to get the current head block, so that we can produce the next block on top of it, or vote for that head block if we’re just publishing an attestation.
First of all, from the latest attestations in our storage, we need to calculate the weight of votes for each block in the tree.
Then, from the weight of the blocks we can derive the weight of each branch.
Last, we start from the root block (block A in this example) and at each step we select the branch with the highest score, until we reach a leaf node; that leaf is the head block returned by the fork-choice algorithm.
Note how we first selected the branch that goes to block C even though block C has fewer votes than block B: we only care about the total votes for a branch, and that includes all the votes for block C's descendants. The idea is that the majority of validators the node knows about consider that subtree the correct chain of blocks to follow.
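Using the sketch above, here is a hypothetical block tree mirroring this example (the tree shape and vote counts are made up for illustration): block A has two children, B and C; B has more direct votes, but C's subtree is heavier, so the head ends up under C.

```python
# A -> B, A -> C, C -> D, C -> E (hypothetical tree and vote counts)
children = {b"A": [b"B", b"C"], b"C": [b"D", b"E"]}
votes = {b"B": 4, b"C": 1, b"D": 3, b"E": 2}

# Subtree weights: B = 4, C = 1 + 3 + 2 = 6, so the walk goes A -> C -> D.
assert lmd_ghost_head(b"A", children, votes) == b"D"
```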
Incentives Block proposers are implicitly incentivised to produce their block on top of the correct head block; otherwise they risk having their block orphaned and receiving no reward. All of a block's transaction fees, plus a dynamic amount of newly issued reward, go to the proposer of that block.
Attesters are explicitly incentivised to vote correctly by being rewarded when their vote is included in subsequent blocks. The maximum reward an attester can receive is when its attestation is included in the slot immediately following the one in which it was published. Block proposers are also explicitly incentivised to include as many attestations as possible, receiving a small reward for each vote they add to a block.
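The exact reward formulas have changed across network upgrades, so the following is only an illustrative sketch of the incentive shape described above - attesters earn the most when their vote lands in the very next slot, and proposers earn a small amount per attestation they include. The constants and function names are invented and do not come from the spec:

```python
BASE_ATTESTER_REWARD = 100    # invented unit, not a real spec value
PROPOSER_REWARD_PER_VOTE = 5  # invented unit, not a real spec value

def attester_reward(publish_slot: int, inclusion_slot: int) -> int:
    """Reward shrinks the later the attestation gets included on chain."""
    delay = max(1, inclusion_slot - publish_slot)  # 1 = included in the very next slot
    return BASE_ATTESTER_REWARD // delay

def proposer_reward(included_attestations: int) -> int:
    """Proposers get a small bonus for every attestation they pack into their block."""
    return PROPOSER_REWARD_PER_VOTE * included_attestations
```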
Slashing An important concern that Proof of Stake systems have to solve is how to punish bad behavior. In PoW systems dishonest participants are implicitly punished, because misbehaving costs them time, energy and real money.
Think about a Bitcoin miner that wants to create a heavier chain containing a double spend. He has to create blocks faster than all the other miners combined. If he tries to create those blocks and doesn't succeed, he is already implicitly punished by the protocol, having wasted time, energy and real money just to attempt mining them. If his blocks don't end up on the chain all other nodes consider valid, he has spent time and money for nothing.
In PoS systems it is almost free for a validator to equivocate by publishing multiple contradictory messages. This is also called the “nothing at stake” problem.
The solution is quite simple but very powerful. Every participant must lock up a minimum amount of ETH in order to become a validator. Right now this value is fixed at 32 ETH, which is more than $60,000 at the time of writing. The system is able to detect when a validator has equivocated and punishes it by destroying a portion of the stake it deposited to become a validator, and by ejecting it from the protocol. Since validators digitally sign every message, producing a proof of misbehavior is straightforward.
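As a minimal sketch of how equivocation can be detected from two signed messages by the same validator, the check below shows the "double vote" condition: two distinct head votes for the same epoch. The types are simplified and hypothetical, and the signatures are assumed to have already been verified:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SignedVote:
    validator_index: int
    epoch: int
    block_root: bytes  # block the validator voted for in that epoch
    signature: bytes   # assumed already verified against the validator's public key

def is_double_vote(a: SignedVote, b: SignedVote) -> bool:
    """Two different votes by the same validator for the same epoch are an equivocation.

    Because both messages are signed, the pair itself is the proof of misbehavior.
    """
    return (
        a.validator_index == b.validator_index
        and a.epoch == b.epoch
        and a.block_root != b.block_root
    )
```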
Casper FFG: The Finality Gadget Casper FFG is a kind of meta-consensus protocol. It is an overlay that can be run on top of an underlying consensus protocol in order to add finality to it.
In Ethereum's proof of stake consensus, the underlying protocol is LMD GHOST, which does not provide finality. Finality ensures that blocks, once confirmed in the chain, cannot be reversed: they will be part of the chain forever. So in essence Casper FFG functions as a "finality gadget", and we use it to add finality to LMD GHOST.
Casper FFG takes advantage of the fact that, as a proof of stake protocol, we know who our participants are: the validators that manage the staked Ether. This means that we can use vote counting to judge when we have seen a majority of the votes of honest validators. More precisely, votes from validators that manage the majority of the stake - in everything that follows, every validator’s vote is weighted by the value of the stake that it manages, but for simplicity we won’t spell it out every time.
Casper FFG, like all classic Byzantine fault tolerant (BFT) protocols, can ensure finality as long as fewer than a third of validators are faulty or adversarial. Once validators controlling more than two-thirds of the stake have voted to finalize a block, honest validators will never revert it, making that block irreversible. By requiring a greater-than-two-thirds supermajority, the system ensures that any finalized decision genuinely represents the view of the honest majority.
Notably, Casper FFG distinguishes itself from traditional BFT protocols by offering economic finality even if more than a third of validators are compromised: if finality were ever reverted, at least one-third of the total stake would provably belong to misbehaving validators and could be slashed, giving any attack a quantifiable economic cost.
Casper FFG requires votes from validators controlling more than two-thirds of the stake within an epoch, spreading the voting across the epoch's 32 slots (each validator is assigned to one slot) to handle the large validator set efficiently. Each validator votes once per epoch on a checkpoint, the block in the epoch's first slot, so that all votes focus on the same target. For efficiency, each attestation carries both the Casper FFG vote and the LMD GHOST vote. What gets finalized are checkpoints, not entire epochs: finality extends to the checkpoint block and everything that precedes it.
Justification and Finalization Casper FFG, like traditional Byzantine Fault Tolerance protocols, secures agreement in two stages. In the first stage, validators broadcast and gather votes on a proposed checkpoint; if a supermajority (more than two-thirds of the stake) agrees, the checkpoint is justified, signalling a tentative agreement. In the second stage, if validators see a supermajority confirming support for an already justified checkpoint, that checkpoint achieves finalization, meaning honest validators will never revert it. Ideally checkpoints are justified and then finalized within a couple of epochs, keeping the consensus mechanism reliable.
Sources and Targets, Links and Conflicts In Casper FFG, a vote is a pair of checkpoints: a source and a target. The source is a checkpoint the validator already regards as justified, an acknowledgment that it has seen widespread support for it, while the target is a more recent checkpoint the validator is conditionally committing to, dependent on similar support from others. Voting on such linked pairs gives the network a structured way to progress towards finalizing blocks.
Supermajority Links In Casper FFG, a supermajority link between source and target checkpoints, s→t, is established when more than two-thirds of validators, by stake weight, endorse the same link, with their votes timely included in the blockchain. This mechanism ensures consensus and security by validating the sequence of checkpoints through widespread validator agreement.
Justification In Casper FFG, when a node observes a supermajority of validators voting for a transition from an already justified checkpoint to a new target checkpoint, it justifies the target. This signifies that the node has seen evidence of consensus from more than two-thirds of the validator set (by stake), and it commits not to revert that checkpoint unless overwhelming consensus is shown for an alternative path.
Finalization When a node observes a consensus (a supermajority link) from one justified checkpoint to its direct child, it finalizes the parent checkpoint. This indicates a network-wide commitment not to revert from this point, backed by a strong majority of validator support. Finalization ensures network stability and security by making the blockchain history immutable past that checkpoint, preventing reversals without significant consequences for validators.
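The rules above can be summarised in a small sketch, assuming we already have the FFG votes (source -> target pairs) and each validator's stake; checkpoints are identified by their epoch number here purely for simplicity, and the function names are illustrative:

```python
def has_supermajority_link(votes: list[tuple[int, int, int]],
                           stake: dict[int, int],
                           source: int,
                           target: int) -> bool:
    """votes are (validator_index, source_epoch, target_epoch) tuples.

    A link source -> target is a supermajority link when validators controlling
    more than 2/3 of the total stake voted for exactly that pair.
    """
    total = sum(stake.values())
    supporting = sum(stake.get(v, 0) for v, s, t in votes if s == source and t == target)
    return 3 * supporting > 2 * total

def process_link(justified: set[int], finalized: set[int], source: int, target: int) -> None:
    """Apply the justification/finalization rules to one observed supermajority link."""
    if source in justified:
        justified.add(target)        # justification: the target checkpoint becomes justified
        if target == source + 1:     # the target is the direct child of the source
            finalized.add(source)    # finalization: the source checkpoint becomes finalized
```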
Fork Choice Rule Casper FFG modifies the fork choice rule: nodes must only consider chains that contain the highest justified checkpoint they know of, and run LMD GHOST starting from that checkpoint. This adaptation ensures that the network commits to checkpoints agreed upon by a supermajority of validators, so that once a checkpoint is justified the network cannot revert beyond it, reinforcing the security and stability of the blockchain. The rule is also designed to preserve liveness, in line with Casper's foundational goals.
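As a sketch, the modified fork choice simply roots the GHOST walk at the latest justified checkpoint instead of at the genesis/root block (reusing the hypothetical `lmd_ghost_head` from the earlier sketch):

```python
def get_head(justified_checkpoint_root: bytes,
             children: dict[bytes, list[bytes]],
             votes: dict[bytes, int]) -> bytes:
    """Only chains containing the highest justified checkpoint are considered:
    the GHOST walk starts from that checkpoint's block."""
    return lmd_ghost_head(justified_checkpoint_root, children, votes)
```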
Plausible Liveness Casper FFG ensures the network remains active and can always reach consensus without any honest validators being penalized, embodying the concept of “plausible liveness.” This means that, provided a supermajority of validators are honest, the protocol can continue justifying and finalizing new checkpoints, avoiding any deadlock scenarios where progress is halted due to fear of slashing. This principle ensures the network’s resilience and continuous operation, underlining Casper’s adaptability to maintain consensus even in challenging conditions.
Conflicting Justification Justifying a checkpoint means that I, as a validator, have received votes from more than ⅔ of the validators (by stake) approving the checkpoint. This approval, however, represents only my local perspective. It's possible other validators have different information; I can't be sure. Despite this uncertainty, as an honest validator, I commit to never reverting any checkpoint that I've justified based on my local data.
Finalizing a checkpoint, on the other hand, takes this a step further. It occurs when I've received assurances from more than ⅔ of the validators that they, too, have heard from more than ⅔ of their peers confirming the checkpoint. This means that a supermajority of the network - not just my local view - acknowledges and commits to this checkpoint. It's this broad consensus that protects the checkpoint from being reversed globally. Therefore, a finalized checkpoint is not just locally recognized; it's globally secured.
Let’s explore an extreme scenario to understand the consensus process better. Suppose we have four validators: A, B, C, and D. All of them are honest, but the network they operate in can experience indefinite delays. For the sake of this example, imagine that there’s a checkpoint at every block height.
Every validator in the scenario has seen checkpoint N+1 and already has checkpoint N justified; so N, the source, is justified locally for all 4 validators, and N+1 is the target of their votes.
Now let's imagine that A's network connection is severely delayed and that A is also chosen to propose a block.
A proposes a block in epoch N+2; this block contains all 4 votes needed to finalize checkpoint N, but since A's network connection is severely delayed, the other validators never see it.
A now has a supermajority link between the source N and the target N+1, so it will finalize checkpoint N and justify checkpoint N+1. Meanwhile B, C and D saw no votes included in the current epoch, so they still only have checkpoint N justified.
They will also vote for an empty checkpoint in this epoch, which is checkpoint M.
In epoch N+3 one of the validators B, C and D is chosen to propose a block.
This block contains the 3 votes with checkpoint N as the source and checkpoint M as the target; 3 out of 4 validators is more than two-thirds, so there is a supermajority link between N and M, which allows validators B, C and D to consider checkpoint N finalized and checkpoint M justified.
A, on the other hand, considers this block to be invalid, because in its local view, N+1 is justified and cannot be reverted.
The only way for validator A to keep following the chain is to delete its local state and resync with the rest of the network.
Supermajority Bug Let's also explore a scenario where a majority client - an execution client used by validators controlling at least ⅔ of the network's stake - has a bug.
We will use 2 different clients for this scenario: AliceETH, the correct one, and BobETH, the buggy one.
Due to a bug, a validator running BobETH proposes an invalid block at epoch N+2, and the other BobETH validators, sharing the same bug, accept it. At this point, there are three possible outcomes:
• The validators that run the BobETH client have less than ⅓ of the total ETH staked.
In this case the validators that run BobETH will eventually switch back to the correct chain: the buggy chain can never gather enough votes to justify or finalize anything, while the correct chain keeps finalizing.
• The validators that run the BobETH client have more than ⅓ but less than ⅔ of the total ETH staked.
In this case neither chain has enough votes to finalize any checkpoint, so two chains are built in parallel, with neither of them being finalized. Recovering requires a bug fix by the BobETH developers: once the fix is developed, published and installed by the validators running BobETH, they switch to the correct chain, the chain containing the invalid block is discarded, and the correct chain resumes finalizing. If the fix takes too long, the inactivity leak on the correct chain progressively drains the BobETH validators' balances; once they control less than ⅓ of the ETH staked on that chain, it can be finalized again even without them.
The penalty for the validators with the buggy client would be relatively light: only an "inactivity leak" for not having participated in the correct chain.
The "inactivity leak" is a penalty applied to inactive validators that grows quadratically with time - see the sketch after this list.
• The validators that run the BobETH client have more than ⅔ of the total ETH staked.
In this scenario the validators are again A, B, C and D: A runs the AliceETH client, while B, C and D run BobETH.
Just as before, every validator justifies checkpoint N in epoch N+1; they also finalize it in epoch N+2.
All the validators running AliceETH will discard the invalid block and checkpoint, but they do not have enough stake to actually finalize the correct chain they are building.
All the validators running BobETH will continue building on the invalid chain and, since they control more than ⅔ of the stake, they will be able to finalize that invalid canonical chain.
In this example a bug fix alone would not help, because validators B, C and D cannot simply switch to the correct chain being built by validator A unless that chain can be finalized.
To get out of this situation, validators B, C and D will keep losing stake to the "inactivity leak" for a long time, until their combined staked ETH no longer amounts to more than ⅓ of the total ETH staked.
At that point the chain built by validator A can be finalized, because A now controls more than ⅔ of the ETH staked counted on that chain, and B, C and D will be able to switch to it.
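As a purely illustrative sketch of the quadratic inactivity leak mentioned in the second outcome above (the real protocol tracks per-validator inactivity scores and uses different constants, so the numbers here are invented), the per-epoch penalty grows with the number of epochs without finality, which makes the cumulative loss grow roughly quadratically:

```python
LEAK_QUOTIENT = 2**26  # invented constant, not a real spec value

def inactivity_penalty(balance: int, epochs_without_finality: int) -> int:
    """Per-epoch penalty grows with time, so total losses grow roughly quadratically."""
    return balance * epochs_without_finality // LEAK_QUOTIENT

def balance_after_leak(balance: int, epochs: int) -> int:
    """Apply the leak epoch after epoch to an inactive validator's balance."""
    for n in range(1, epochs + 1):
        balance -= inactivity_penalty(balance, n)
    return balance
```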
Timing Games In Ethereum’s protocol, time is structured into 12-second units called slots. Each slot assigns a validator the role of proposing a block right at the start (t=0). A committee of attesters is then tasked with validating this block, aiming to do so by four seconds into the slot (t=4), which is considered the attestation deadline.
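In code, the timing of the proposal and the attestation deadline for a given slot might look like the following minimal sketch; the 12-second slot and 4-second deadline come from the text above, while `genesis_time` is just an assumed parameter:

```python
SECONDS_PER_SLOT = 12
ATTESTATION_DEADLINE_OFFSET = 4  # seconds into the slot

def slot_start(genesis_time: int, slot: int) -> int:
    """Unix timestamp at which the proposer of `slot` should publish its block (t=0)."""
    return genesis_time + slot * SECONDS_PER_SLOT

def attestation_deadline(genesis_time: int, slot: int) -> int:
    """Unix timestamp by which the slot's attesters are expected to vote (t=4)."""
    return slot_start(genesis_time, slot) + ATTESTATION_DEADLINE_OFFSET
```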
Timing games in Ethereum create a competitive landscape where gains from Maximal Extractable Value (MEV) for one validator may come at the expense of others. This competition can disrupt consensus by increasing the number of missed slots and potential block reorganizations. Additionally, it motivates attesters to postpone their attestations, adding further complexity to the process.
The “Principles of Consensus” section points out the importance of liveness for Ethereum’s consensus process. However, timing games pose a threat to this critical feature by compromising the network’s reliability.
What’s a timing game? It’s like waiting for the perfect moment to make a move, aiming to get the most out of it. This is what some of the people keeping the network up and running are trying to do. They’re waiting for the right time to act to get the most rewards. But, this waiting game can be risky. If their internet is slow or they’re not too experienced, they might miss their chance to do their part. And missing too many chances could make the network less dependable.
Right now, this isn't a big problem. Most of the entities working as validators aren't really getting into these timing games, or aren't playing them at all.
However, we’re seeing more and more validators starting to play these games. It’s something we need to keep an eye on so it doesn’t turn into a bigger issue.
A good article on MEV - https://ethereum.org/en/developers/docs/mev/#mev-extraction
Centralization of Supermajority The concept of supermajority client risk in Ethereum is all about balancing the network’s health and security. Ethereum decided to use multiple clients to prevent any single point of failure. This is because all software, including these clients, can have bugs. The real trouble starts when there’s a consensus bug, which could lead to something serious like creating infinite Ether out of thin air. If just one client ran the whole show and it got hit by such a bug, fixing it would be a nightmare. The network could keep running with the bug active long enough for an attacker to cause irreversible damage.
Let's walk through a quick example of what could happen if a majority client had a bug. Please note that every block below is a checkpoint, not a regular block of the blockchain:
Correctly functioning clients disregard the epoch containing the invalid block (indicated in red). The first red arrow justifies the invalid epoch, while the second finalizes it.
Assuming the bug is resolved and the validators who finalized the invalid epoch wish to switch back to the correct chain B, a preliminary action required is the justification of epoch X:
To take part in the justification of epoch X, which requires a supermajority link as shown by the dashed green arrow, validators must bypass the second red arrow, which represents the finalization of the invalid epoch. Casting votes for both links would expose these validators to penalties.
The multi-client approach offers a safety net. If a bug pops up in a client used by less than a third of the network's stake, the rest of the network, running other clients, simply ignores the buggy block. This keeps the network on track, minimizing disruption. But if a majority client, especially one used by more than two-thirds of validators, introduces a bug, it could wrongly finalize the chain, leading to a potential split.
Ethereum encourages diversifying clients because if everyone used the same client and it failed, the whole network would be at risk. The penalties for running a client that goes against the grain are there to discourage putting all our eggs in one basket. This way, if a client does have a bug, the damage is contained, affecting fewer users.
If a minority client causes trouble, it’s less of an issue because the majority can correct the path.
The more we spread out our choices across different clients, the safer Ethereum becomes. It’s not just about avoiding technical failures; it’s about safeguarding Ethereum’s future against any single point of failure. This diversity is our best defense against network-wide crises, ensuring Ethereum remains robust and resilient no matter what comes its way.