PC Tips (dot) click: Hertzbleed Simplified

Hertzbleed is a new vulnerability that is present on Intel and AMD CPUs. It allows extracting information from a system much like Spectre does, though it is more feasible to exploit over a network.

On the whole, after reading the paper, it is not as big a problem as vulnerabilities like Spectre, but it is important to understand if you are a sysadmin.

What it can do:

Remotely break encryption keys, though only for SIKE at this time.
"Remotely" means local Ethernet networks only, not over the internet (<1ms latency in the paper). [2]
Only works on CPUs with very fine-grained clock boosting (Ryzen 2nd+ gen, Intel 9th+). [1]
Can weaken kernel defenses against hacks (ASLR).

Hardware Failures

Previously, there have been ways to mount these kinds of attacks on the local machine by reading the CPU power consumption figures that the OS exposes. Access to that data was later restricted to privileged users, closing the exploit for unprivileged users and processes on the same machine.

The novelty in Hertzbleed is that researchers found that different numbers going into the CPU can change the clock speed and therefore the calculation time.

So, say you are calculating with a 64-bit number, but the value is small enough to fit in only 32 bits:

The researchers demonstrated that leaving the top bits unused (0) draws less power.
Because it uses less power, the CPU can clock higher.
By clocking higher, it does these "small number" calculations faster.
The speed increase is large enough to be measurable and provide information to the attacker.
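
The chain above (fewer set bits, less power, higher clock, shorter run time) can be sketched as a toy model. This is only an illustration: every constant is made up, the power model (a fixed cost per set bit) is an assumption and not what the paper measured, and the function names are hypothetical.

```python
def hamming_weight(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")

# All constants are invented for illustration; none are measured values.
BASE_POWER_W = 40.0       # hypothetical power drawn regardless of the data
POWER_PER_BIT_W = 0.05    # hypothetical extra power per set bit
POWER_CAP_W = 42.0        # hypothetical sustained power limit
MAX_FREQ_GHZ = 4.0        # hypothetical clock while under the cap
CYCLES = 4_000_000_000    # hypothetical fixed amount of work, in cycles

def steady_state_freq_ghz(operand: int) -> float:
    """Clock the toy CPU settles at while crunching this operand."""
    power = BASE_POWER_W + POWER_PER_BIT_W * hamming_weight(operand)
    if power <= POWER_CAP_W:
        return MAX_FREQ_GHZ
    # Crude assumption: the clock drops in proportion to the excess power.
    return MAX_FREQ_GHZ * POWER_CAP_W / power

def run_time_s(operand: int) -> float:
    """Wall-clock time for the fixed workload at the settled clock."""
    return CYCLES / (steady_state_freq_ghz(operand) * 1e9)

small = 0x00000000FFFFFFFF   # top 32 bits are zero
large = 0xFFFFFFFFFFFFFFFF   # all 64 bits set

print(run_time_s(small), run_time_s(large))
```

With these made-up numbers the "small" operand stays under the power cap and the "large" one does not, so the same number of cycles takes longer for the large operand. The number of instructions is identical in both runs; only the data differs.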

This is only possible because newer processors have very fine-grained clock boosting that is tied to power, not to the number of active cores. The example most people are familiar with is FPU/SSE code versus AVX/AVX2 instructions affecting frequency and power usage, which, in the case of AVX512, can even send the CPU below base clock.

There are also variations on the bit patterns in a number and how they can be used to extract information, but that is too complex to convey in a "TLDR".

Software Failures

What to note here is why the attack is possible: when the hacker guesses a wrong bit, the SIKE cipher enters a cycle where the number being processed (in the CPU) becomes zero and stays that way until the calculation ends.

So, the attacker will be able to try some bits and check if they were right or wrong by measuring the time the server takes to process the unknown bits of the key. After a while, the rest of the bits can be brute forced.
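
The guess-and-time loop can be sketched as a simulation. This is a toy model that follows the simplified description above, not the real SIKE attack: the "server" is a local function, a wrong guess is assumed to finish faster (zeros draw less power, so the clock is higher), and the timings, noise level and all names are hypothetical.

```python
import random
from statistics import median

def make_timing_oracle(secret_bits, fast_s=0.9, slow_s=1.0, noise_s=0.01, seed=0):
    """Build a stand-in for the server: returns a noisy processing time."""
    rng = random.Random(seed)

    def oracle(guess):
        i = len(guess) - 1
        # Toy assumption: a wrong guess for bit i zeroes the computation,
        # which runs at a higher clock and therefore finishes sooner.
        wrong = guess[i] != secret_bits[i]
        base = fast_s if wrong else slow_s
        return base + rng.gauss(0.0, noise_s)

    return oracle

def recover_bits(oracle, n_bits, samples=50):
    """Recover the key one bit at a time by comparing median timings."""
    recovered = []
    for _ in range(n_bits):
        t0 = median([oracle(recovered + [0]) for _ in range(samples)])
        t1 = median([oracle(recovered + [1]) for _ in range(samples)])
        # In this model the slower guess is the correct one.
        recovered.append(0 if t0 > t1 else 1)
    return recovered

secret = [1, 0, 1, 1, 0, 0, 1, 0]
oracle = make_timing_oracle(secret)
guess = recover_bits(oracle, len(secret))
print(guess == secret)
```

Taking the median over many samples is what lets a small timing gap stand out from the noise, and it is also why the real attack needs so many requests.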

This is what makes it possible to exploit the frequency variations, and it was quickly patched in the reference implementations from Microsoft and Cloudflare. It is for this reason that I commented earlier that this is not a very scary vulnerability.

The attack

The attacker would do the following:

First, monitor the CPU and look for a slowdown that would indicate a power cap is in place and the Turbo window has expired.
Next, send different numbers to be processed and measure slower or faster calculations.
With this, it is possible to infer bits of the secret key on the server.
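
The first step, spotting the moment the Turbo window expires, amounts to finding a sustained jump in response times. Here is a minimal sketch, assuming the latencies have already been collected into a list; the window size, the 5% threshold and the function name are all hypothetical choices, not values from the paper.

```python
from statistics import median

def slowdown_index(latencies, window=5, ratio=1.05):
    """Start index of the first window whose median exceeds the baseline.

    The baseline is the median of the first `window` samples. Returns
    None when the series is too short or no sustained slowdown is seen.
    """
    if len(latencies) < 2 * window:
        return None
    baseline = median(latencies[:window])
    for start in range(window, len(latencies) - window + 1):
        if median(latencies[start:start + window]) > baseline * ratio:
            return start
    return None

# Made-up series: steady responses, then roughly 10% slower ones.
samples = [1.00] * 10 + [1.10] * 10
print(slowdown_index(samples))
print(slowdown_index([1.00] * 20))
```

Comparing medians over a window, instead of single samples, keeps one slow network round trip from being mistaken for the CPU hitting its power cap.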

This would also have to go unnoticed, as it would take at least a day on the unpatched Cloudflare library and certainly a lot more over the internet.

Unaffected Hardware

Since this seems to rely on slight changes to clock speed, anything that is pre-Intel 8th gen isn't affected. Likewise - although not explicitly mentioned - this will only affect Ryzen after 2nd gen, which was when AMD introduced technology that related clocks to power instead of fixed boosting tables based on the number of active cores.

For x86, the hardware tested and found safe was:

Intel Core up to 7th generation (Kaby Lake).
AMD Ryzen 1st gen and earlier CPUs.
Atom based CPUs up to "Airmont" architecture.

This list is not exhaustive. It is possible that recent Atom-like cores from "Apollo Lake" onward are affected, as I recall these having more fine-grained frequency control after Turbo time has passed.

Hertzbleed Paper Notes

I think there was a slight mistake in the researchers' report that Intel 6th and 7th gen can't boost when using more than one core. This is clearly not the case (nor was it for some generations before), but the Turbo granularity was 100MHz, not less.

Given that the differences in timings are so small (40MHz max), it is possible that the CPU doesn't transition states due to being older tech and thus the timing attack fails.
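
To make the granularity argument concrete, here is a small arithmetic sketch. It only illustrates my guess above; the target frequencies and the 25MHz step are hypothetical numbers picked for the example, and real CPUs do not necessarily pick their clocks this way.

```python
def snap_to_step(freq_mhz: int, step_mhz: int) -> int:
    """Round a target frequency down to the nearest multiple of step_mhz."""
    return (freq_mhz // step_mhz) * step_mhz

# Two hypothetical power-limited targets, 40MHz apart.
target_a = 3960
target_b = 3920

coarse = (snap_to_step(target_a, 100), snap_to_step(target_b, 100))
fine = (snap_to_step(target_a, 25), snap_to_step(target_b, 25))

print(coarse)
print(fine)
```

With 100MHz steps both targets land on the same clock, so the data-dependent difference disappears; with the finer step they land on different clocks. Note that two targets 40MHz apart can still straddle a 100MHz boundary, so coarse steps make the leak less likely to show up, not impossible.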

I would also say that if the older systems aren't boosting higher on two core loads, it most certainly involves bad motherboard defaults - which is common.

Another possible reason for the attack failing is that, as they mention, these are K CPUs, which may be running on high-end motherboards. On those, most CPUs would automatically have power limits removed, so they would always run at top frequency with an unlimited power budget.

Notes

[1] AMD's security bulletin says CPU generations belonging to the Excavator architecture are vulnerable, but going by the paper this is not the case. AMD's report does not list Ryzen 1st gen, as expected, but given there was no technical write-up, it is likely someone forgot to remove the older generations as well.

[2] I am not a security researcher or expert. Just as with the 'retpoline' fixes for Spectre, too much stuff reveals itself to be exploitable after enough research. Declaring it "only exploitable on a local network" means it could work over the internet, but at such a reduced bitrate that it would probably not be useful, or would be easily detected on the victim side.
