On 2 January 2018, news broke of a novel class of security flaws in modern processors. Known inside the chip and software industry since the middle of 2017, and deeply embedded in the fundamental design of the processors, the problems were reported by the Google Project Zero security research team — one of several to discover the flaws — to exist in some form in most Intel CPUs since 1995. Some AMD and ARM processors were also reported as affected, but a full list of which chips have what class of problem does not yet exist (the Raspberry Pi, however, is secure).
Meltdown can be, and has been, protected against in recent updates to macOS, Windows, and Linux-derived operating systems, albeit with a sometimes significant performance impact. Spectre, however, is much harder to defend against, with most industry analysts agreeing that chip redesigns will be needed, although at the time of writing Intel claims to have system and firmware patches that render its processors ‘immune’. Whether this is true, and if so what the performance ramifications are, remains to be confirmed. AMD has issued new firmware that reportedly disables the chip features at fault: again, the practical implications of this microcode update await discovery.
The first lesson of Meltdown and Spectre, then, is basic security hygiene: ensure that, whatever your hardware, the operating system running on it is fully patched and up to date.
How did this happen?
Meltdown and Spectre are examples of one of technology’s most common trade-offs: performance versus security.
Before digging into the details, an analogy may help. Let’s say you want to know whether the Vatican archives contain a certain book, the existence of which is denied. If the book exists, then it’s only available to certain highly qualified theologians, which you are not.
The Vatican archivists are very busy and it always takes a long time to locate anything, so they’ve evolved a way to save time. Instead of checking your credentials before dispatching a minion to the stacks to retrieve your request, the minion sets off while the front desk is checking the papal database to see whether you’re allowed to see the book you’ve asked for.
You turn up, present your credentials, and make your request. After a while, the front desk tells you to go away empty-handed, and says nothing about the book.
However, if you then sit outside the archives and watch a qualified theologian go in and return shortly afterwards clutching a book, you can deduce that the archivists found the book you requested while you were waiting and had it available for the properly accredited cleric.
Nobody has told you about the book and you didn’t get past security, but you’ve extracted information nonetheless, thanks to your knowledge of how the system works.
Meltdown and Spectre work in the same way; they know how the system works and can manipulate it to indirectly reveal information by setting up situations and seeing what happens next.
Inside the processor
Key processor techniques such as pipelining, out-of-order execution, branch prediction, and speculative execution have evolved over the past twenty years, together with memory speed-ups through increasingly complex caching systems. The interactions between these techniques have now been shown to be fundamentally insecure, without any one system being at fault.
All processors deal with the instructions that make up programs by a series of steps. Once an instruction is loaded from memory, it is analysed, any data it needs is loaded, the actual operation then carried out, and any results put in the right place. Each step is the responsibility of a sub-unit within the processor, with intermediate results being passed along.
Early processors didn’t load the next instruction until the preceding one had been completely dealt with, leaving each sub-unit idle for most of the time. By loading multiple instructions in and moving them along inside the processor as each sub-unit becomes free, more modern processors create a conveyor belt, or pipeline, that keeps everything busy most of the time, speeding things up considerably. Often, the processor knows that two or more instructions that don’t depend on each other can be dealt with at the same time even if they’re in different places in the program. This is out-of-order execution, and it further improves sub-unit utilisation.
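The gain from pipelining can be put into rough numbers. The Python sketch below assumes an idealised processor in which every stage takes exactly one cycle and nothing ever stalls; the function names and the five-stage figure are illustrative, not taken from any real chip.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """The first instruction needs n_stages cycles to travel the pipeline;
    after that, one instruction completes on every cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    print(cycles_unpipelined(100, 5))
    print(cycles_pipelined(100, 5))
```

For 100 instructions and five stages, the idealised model gives 500 cycles without a pipeline against 104 with one, which is why designers go to such lengths to keep the pipeline full.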
Keeping the pipeline full is very important. If for whatever reason it has to be flushed out and processing started afresh — a pipeline stall — this is very expensive in wasted time.
Two things reliably stall a pipeline: branches and tests. This realisation leads to two major design features of contemporary processors, branch prediction and speculative execution, and these are at the heart of the new class of security vulnerability.
Every program jumps about or branches within its execution, switching from one chain of instructions to another. When that happens, all of the instructions within the pipeline that happen after the branch instruction are no longer valid, because execution is starting afresh from a different area of memory.
To offset that, when the processor first encounters a branch instruction entering the pipeline it immediately starts to load in the new instructions from the branch destination. If the branch is conditional — jump if some data has a certain value, don’t jump otherwise, or jump to a computed value — the processor attempts to guess which path will be followed and uses that to fill its pipeline. At some point further down the pipeline, when the branch has been evaluated and the actual result computed, the processor either just carries on — it got the guess right — or stalls the pipeline and starts anew. There’s no extra performance loss from a failed branch prediction, and a large benefit if it’s right, so branch prediction is a big advantage.
Furthermore, the better branch prediction is, the better the chip performs, so it’s an area where chip vendors can profitably spend a lot of engineering time and also where trade secrets are very important. The details of how it works are often vague, although common techniques include keeping a statistical track of how a branch has proceeded in the past.
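As an illustration of keeping statistical track of a branch, here is a sketch of the textbook two-bit saturating counter. It is an assumption for teaching purposes only: real predictors are proprietary and far more elaborate, and the class and function names here are invented.

```python
class TwoBitPredictor:
    """Toy branch predictor: a single saturating counter in the range 0..3.

    Values 0 and 1 predict 'not taken'; values 2 and 3 predict 'taken'.
    """

    def __init__(self) -> None:
        self.counter = 0  # start at 'strongly not taken'

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Move one step towards the observed outcome, without overflowing.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes) -> int:
    """Replay a sequence of real branch outcomes and count the wrong guesses."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A loop branch that is taken ten times and then falls through once.
    print(count_mispredictions([True] * 10 + [False]))
```

In this toy, the loop branch costs two wrong guesses while the counter warms up and one more when the loop finally exits. The important property for what follows is that past behaviour steers future guesses, so whoever controls the past can steer the guess.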
Speculative execution is a more general form of the ideas behind branch prediction. Typically, a program does a calculation or logical test that sets up how the subsequent instructions behave. Sometimes that test can take some time to complete, especially if it involves retrieving data from external memory. Rather than wait for that result, the processor assumes what it might be, checkpoints how it’s set up at the point of the test, and carries on. As with branch prediction, if the assumption proves incorrect when the test finally completes, then nothing extra is lost: the processor restores its state to the checkpoint and carries on. Again, speculation can be a very complicated and nuanced technique to get right, so processor designers can be reluctant to share all the details.
Where it goes wrong
The major problem that leads to Meltdown- or Spectre-type misbehaviour is the assumption that the processor can recover from a failed guess and restore itself to exactly the condition it was in before. Within the processor’s instruction pipeline, barring design flaws, that’s perfectly possible. However, what happens if the failed speculative code has changed things outside the processor? Another assumption is made: if you don’t speculatively execute code that writes to memory or devices, then nothing will be changed.
This is the key assumption that fails in real life, and the heart of the new vulnerabilities. When a speculative instruction reads from memory, it goes to the cache first — and the cache’s condition can materially change or materially affect later processing in ways that persist.
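The gap between what is rolled back and what is not can be shown with a toy model. In the Python sketch below, the ToyCore class, its two registers and its set of cached addresses are all invented for illustration; the only point is that the checkpoint covers the registers and not the cache.

```python
class ToyCore:
    """Toy model: architectural registers plus a set of cached addresses."""

    def __init__(self) -> None:
        self.registers = {"r0": 0, "r1": 0}
        self.memory = {0x100: 42, 0x200: 7}
        self.cached_addresses = set()

    def load(self, register: str, address: int) -> None:
        # A load changes a register AND pulls the address into the cache.
        self.registers[register] = self.memory[address]
        self.cached_addresses.add(address)

    def speculate(self, body, guess_was_right: bool) -> None:
        checkpoint = dict(self.registers)  # the checkpoint covers registers only
        body(self)                         # run ahead on the guess
        if not guess_was_right:
            self.registers = checkpoint    # roll back; the cache is untouched


core = ToyCore()
core.speculate(lambda c: c.load("r0", 0x100), guess_was_right=False)

register_restored = core.registers["r0"] == 0
footprint_remains = 0x100 in core.cached_addresses
print(register_restored, footprint_remains)
```

After the failed guess, the register is back at its old value, yet address 0x100 is still marked as cached. That leftover footprint is what the attacks described below go looking for.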
Side-channel and the processor cache
Reading information by inference from the cache effects of a failed speculative execution is an example of a side-channel attack: extracting data independently of the processor’s explicit data-handling paths.
Modern processors have a hierarchy of caches — blocks of on-chip memory that are faster to work with than system memory reached through the CPU’s external, slow buses. Intel processors typically have three levels of cache: the first, quickest and smallest is the Level 1 or L1 cache. If a processor wants to read or write memory data, it checks whether it’s already in the L1 cache (a cache hit), in which case it uses that copy. If not (a cache miss), it checks the next level — the larger but slower L2 cache; if it’s there, the cache controller delivers the data to the processor but also moves the chunk of L2 data containing the requested information into L1. If the data wasn’t in L2, then the process repeats with the yet larger, yet slower L3 cache (the only one shared between cores), again cascading both the data and its neighbours back up the cache hierarchy. All these things take differing amounts of time and can leave the cache in different states, regardless of whether code was speculative or successful.
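The lookup cascade just described can be sketched as a toy model in Python. The latency figures are hypothetical round numbers rather than measurements, and the model ignores cache capacity, eviction, and the fact that real caches move whole lines of neighbouring data, not single addresses.

```python
# Hypothetical access costs in cycles; real figures vary by processor.
SOURCES = ["L1", "L2", "L3", "RAM"]
LATENCY = {"L1": 4, "L2": 12, "L3": 40, "RAM": 200}


class ToyCacheHierarchy:
    def __init__(self) -> None:
        # One set of cached addresses per cache level (RAM holds everything).
        self.levels = {name: set() for name in SOURCES[:-1]}

    def read(self, address: int) -> int:
        """Return the cost of the read and copy the data into faster levels."""
        for position, name in enumerate(SOURCES):
            if name == "RAM" or address in self.levels[name]:
                break
        # Cascade the data into every level faster than the one that served it.
        for faster in SOURCES[:position]:
            self.levels[faster].add(address)
        return LATENCY[name]


if __name__ == "__main__":
    caches = ToyCacheHierarchy()
    print(caches.read(0x40))  # first read comes from RAM
    print(caches.read(0x40))  # second read is served by L1
```

The same address costs 200 cycles the first time and 4 the second in this model. That difference in timing is visible to any program that can read the address and a clock.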
Side-channel cache attacks rely on facts like cache misses resulting in slower data processing than hits. Attacking code can tell whether a victim process has accessed a certain block of memory by timing how long its own access to that block takes, or by clearing the block from the cache before the victim executes and then seeing whether the memory address is back in the cache afterwards. Just the presence of cached memory is enough, and this can be detected even if the system denies access to its contents.
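The second technique, commonly called flush-and-reload, can be sketched in Python against a toy cache. Everything here is an assumption made for illustration: the hit and miss costs, the threshold, and the idea that attacker and victim share the memory being probed. A real attack would time actual memory reads instead of consulting a set.

```python
FAST, SLOW = 4, 200  # hypothetical hit and miss costs in cycles
THRESHOLD = 50       # anything quicker than this is treated as a cache hit


class ToyCache:
    def __init__(self) -> None:
        self.lines = set()

    def flush(self, address: int) -> None:
        self.lines.discard(address)

    def access(self, address: int) -> int:
        """Return the simulated cost of the access, then cache the address."""
        cost = FAST if address in self.lines else SLOW
        self.lines.add(address)
        return cost


def victim_touched(cache: ToyCache, address: int, victim) -> bool:
    cache.flush(address)                      # 1. clear the address from the cache
    victim(cache)                             # 2. let the victim run
    return cache.access(address) < THRESHOLD  # 3. time a reload of the address


if __name__ == "__main__":
    cache = ToyCache()
    print(victim_touched(cache, 0x1000, lambda c: c.access(0x1000)))
    print(victim_touched(cache, 0x1000, lambda c: c.access(0x2000)))
```

The attacker never sees what the victim read, only whether a given address became fast to reach. One bit of information per probe is enough to build on.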
Spectre: a worked example
Here’s how it works in just one case of Spectre.
The attacker identifies a standard function in the victim code — in the kernel or otherwise — that does what’s called a ‘bounds check’, before passing back data from a table. This accepts an input value, multiplies it with another value in memory to find the right entry in the table, and returns the result from that — if and only if the first value is within a safe range.
If the input value were to be outside the safe range and got through, then the victim code could pass back some other, secret area of memory. So, the first thing the victim code does is compare the input value to a safe limit held in memory. It then branches either to code that runs the ‘OK, here’s your information’ function, or branches to code that says ‘wrong number, access denied’. This is all absolutely standard data security practice.
The Spectre attack starts by repeatedly calling the victim code with a safe, valid input, training the branch predictor to expect to run the code that gives the thumbs-up and passes back safe data.
Then the attack code clears the cache that contains the information that the victim code uses to check the initial value — again, a legitimate if impolite move — and submits an incorrect value aimed at the contents of memory it’s not normally allowed to see.
The bounds check dutifully sets to work, but because the attacker has cleared the cache containing the check value it has to wait for that to come in from main memory. This can take hundreds of cycles — long enough to run quite a lot of code.
Which is exactly what branch prediction is designed to help out with. It goes immediately to work, checking which way the branch normally goes and thus, during the delay, speculatively executes the ‘OK, here’s your data’ code.
This time, the guess is wrong. The speculative execution of the mis-guessed branch first uses the malicious value to retrieve the data in the secret memory location that the attacker has targeted (let’s say it’s a six). It then uses that secret value, multiplied up as in a normal lookup, to find the sixth entry in what it thinks is the legitimate final table, but is in fact an area in memory that the attacker knows about and has also previously cleared from cache.
By now, the processor has finally got its check data, determined that the initial value was incorrect and abandoned the speculative execution, throwing away its results and running the ‘go away, buster’ code instead. Again, this is normal operation — and the processor has no reason to think anything’s wrong.
But it’s too late. The cache has reacted to the calculated entry in the fake final table and loaded in the memory corresponding to it. The attacker can now go through the fake table memory, seeing that the sixth entry was actually cached, perhaps by timing its own accesses. It now knows that the target secret memory contained a six, and can set to work on the next target memory location. This too requires no special privileges.
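To make the sequence above concrete, here is a toy simulation in Python. It is a sketch under loud assumptions, not exploit code: the ToyMachine class, its one-flag ‘branch predictor’, the STRIDE constant and the set standing in for the cache are all invented for illustration, and the attacker inspects that set directly where a real attack would time memory reads.

```python
STRIDE = 512  # hypothetical spacing so each probe entry gets its own cache line


class ToyMachine:
    def __init__(self, secret: bytes) -> None:
        self.table = bytes([1, 2, 3, 4])    # data the victim is allowed to return
        self.memory = self.table + secret   # the secret sits just past the table
        self.limit = len(self.table)        # the bounds-check value
        self.limit_cached = True            # is the limit currently in cache?
        self.predict_in_bounds = False      # one-flag stand-in for the predictor
        self.cache = set()                  # probe-table offsets currently cached

    def victim(self, index: int):
        """Bounds-checked table lookup, as described in the text."""
        if not self.limit_cached and self.predict_in_bounds:
            # The limit is slow to arrive, so the core runs ahead on its guess.
            value = self.memory[index]       # speculative read, not yet checked
            self.cache.add(value * STRIDE)   # dependent access leaves a trace
        self.limit_cached = True             # the limit has finally arrived
        in_bounds = index < self.limit
        self.predict_in_bounds = in_bounds   # predictor remembers the last outcome
        if in_bounds:
            value = self.memory[index]
            self.cache.add(value * STRIDE)
            return value
        return None                          # 'wrong number, access denied'


def leak_byte(machine: ToyMachine, target_index: int) -> int:
    for _ in range(5):
        machine.victim(0)             # train the predictor with a valid input
    machine.cache.clear()             # clear the probe table from the cache
    machine.limit_cached = False      # clear the bounds-check value too
    machine.victim(target_index)      # out of range: the visible result is a refusal
    for guess in range(256):          # see which probe entry came back into cache
        if guess * STRIDE in machine.cache:
            return guess
    raise RuntimeError("no trace found")


if __name__ == "__main__":
    machine = ToyMachine(b"key!")
    print(bytes(leak_byte(machine, machine.limit + i) for i in range(4)))
```

Every call with an out-of-range index is refused, exactly as the bounds check intends, yet the attacker recovers the secret one byte at a time from what the refused calls left behind in the cache.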
This is not a fast process, with data extraction running at a mere couple of thousand bytes per second in the researchers’ test code. But that’s more than enough for security keys and the like, which can then be used to unlock everything. None of these techniques can alter data directly; they can only retrieve information. This is enough — and, again, this example is just one of many sorts of attack that can be mounted.
Meltdown doesn’t rely on finding victim code. Instead, it uses the fact that physical memory is split up by the processor’s memory management into multiple virtual memory areas having different access permissions. The kernel containing the privileged and secure operating system components has access to all memory, but user processes have access restricted to their own areas only.
By setting up speculative execution in a user process that accesses protected memory — that of other users, or of the kernel — Meltdown relies on the protection mechanism kicking in and passing control to the kernel to handle while speculative execution is still going on. In the time it takes to handle the illegal access, the speculative execution has the contents accessed and can make requests based on the data that affect the cache. The processor subsequently discards the speculative execution results and returns to normal operation, but again too late. This is an example of a race condition, where two processes affecting the same data are operating independently, with unpredictable results, and it depends heavily on the details of the memory management unit.
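The race can be caricatured in a few lines of Python. This is a toy under stated assumptions: the ToyCpu class, the AccessFault exception and the set standing in for the cache are hypothetical, and the ‘race’ is simply the order of statements inside one method, whereas in real hardware it is a matter of timing inside the processor.

```python
class AccessFault(Exception):
    """Raised when user code touches memory it has no right to read."""


class ToyCpu:
    def __init__(self, kernel_memory: dict) -> None:
        self.kernel_memory = kernel_memory
        self.cache = set()  # probe values whose cache lines are loaded

    def user_read(self, address: int) -> int:
        # The data reaches the speculative instructions first ...
        value = self.kernel_memory[address]
        # ... which use it in a dependent access that touches the cache ...
        self.cache.add(value)
        # ... and only then does the permission check complete.
        raise AccessFault(hex(address))


def meltdown_read(cpu: ToyCpu, address: int) -> int:
    cpu.cache.clear()               # start from a known cache state
    try:
        cpu.user_read(address)      # the visible outcome is always a fault
    except AccessFault:
        pass                        # the attacker expects and absorbs the fault
    return next(iter(cpu.cache))    # stand-in for timing the probe memory


if __name__ == "__main__":
    cpu = ToyCpu({0xFFFF0000: 6})
    print(meltdown_read(cpu, 0xFFFF0000))
```

The read never officially succeeds, and no value is ever returned to the user program. The secret is recovered purely from the cache trace left by work the processor later disowned.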
Spectre is more subtle and, unlike Meltdown, must be specially implemented for particular processes and environments. However, it isn’t dependent on, and in no way needs to violate, the processor’s internal protection mechanisms.
Can they be stopped?
Meltdown is the easier class of problem to protect against without a processor redesign, using a variety of techniques such as moving kernel memory to a higher protection level within the processor and changing the processor’s internal state more significantly when a user program switches to the kernel, either through a direct call or when an error occurs. Randomising where within kernel memory different data structures and processes exist also makes it harder for attackers of all sorts to know where to look. However, enforcing stricter controls on switching to kernel mode introduces significant performance deficits.
Spectre is much harder to guard against, as it relies on the normal functioning of speculative execution and branch prediction, and the normal working of caches. Turning off any of these techniques across the board removes decades of performance improvement, assuming it’s even possible in existing designs. Modern processors can have many of their features fine-tuned after manufacture, as many very low-level functions can be reconfigured by loading new rules that control how the sub-units work. Only the vendors know the details of such reprogramming, and while particular tweaks, combined with software designed to minimise attack possibilities such as Retpoline, can potentially fix areas of attack, the basic problem remains.
The good news, such as it is, is that while this class of vulnerability adds a significant and potentially unstoppable new tool to the attacker’s toolset, it still relies on installing malicious code on the target. That code still has to get past existing security systems that block, scan, and monitor for suspicious activity, and as exploits that use Spectre-class attacks are discovered they can be guarded against, so long as security updates are religiously applied. But the zero-day attack just grew a lot more teeth.
The fact that some aspects of modern computing take longer than others is a function of electrical physics: this will not change. It’s possible to design out many of the slow aspects of inter-process security, but only at the expense of flexibility and scalability: arbitrarily complex and numerous processes running on a single system will need access and privilege control that can handle large data sets dedicated to their management. You can’t have large data sets with all components instantly accessible to central processing.
The choices are stark: things will slow down, continue to be insecure, or need a radical reimagining of basic CPU architecture with all that this implies for compatibility and continuity.
Which will happen remains to be seen.