Side Channel Attacks — Part 1 ( Timing Analysis — Password Recovery)

18 min readFeb 14, 2021

https://soatok.blog/2020/08/27/soatoks-guide-to-side-channel-attacks/

We have just finished a course at University about Side Channel Attacks and I’m happy to share with you some basics in this part and we will get into the advanced stuff on the next one.

What I’m going to share is a one of the LABs created by New AE Technology.
Anastasia and I worked together and we have ended up with the results below and we still have an advanced one for Side Channel Attacks on AES (Part 2)

Introduction

“Side channel attacks” are attacks that are based on “Side Channel Information”.

Side channel information is information that can be retrieved from the encryption device that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process. In the past, an encryption device was perceived as a unit that receives plaintext input and produces
ciphertext output and vice-versa.

Attacks were therefore based on either knowing the ciphertext (such as ciphertext-only attacks), or knowing both (such as known plaintext attacks) or on the ability to define what plaintext is to be encrypted and then seeing the results of the encryption (known as chosen plaintext attacks).

Today, it is known that encryption devices have additional output and often additional inputs which are not the plaintext or the ciphertext. Encryption devices produce timing information (information about the time that operations take) that is easily measurable , radiation of various
sorts , power consumption statistics ( that can be easily measured as well), and more. Often the encryption device also has additional “unintentional” inputs such as voltage that can be modified to cause predictable outcomes.
Side channel attacks make use of some or all of this information, along with other (known) crypt-analytic techniques, to recover the key the device is using.

Side channel analysis techniques are of concern because the attacks can be mounted quickly and can sometimes be implemented using readily available hardware costing from only a few hundred dollars to thousands of dollars.
The amount of time required for the attack and analysis depends on the type of attack (Differential Power Analysis, Simple Power Analysis, Timing, etc.)

In the next sections, we will cover how to perform these types of attacks and how they can be used real life scenarios.

— — — — — — — — — — — — — — — — — — — — — —
Before you start reading, if you don’t want to know all the details about the requirements for this LAB and get bored if you aren’t interested, just skip and go to the section Instruction Power Differences
— — — — — — — — — — — — — — — — — — — — — —

Requirement

Hardware

It’s a Side Channel Attack, it means we need a hardware to attack !

If you don’t have one, don’t stop reading because we tried to give all the details so that you won’t have any doubts about a thing. If you still do, message me or google is your friend :D

The ChipWhisperer boards take away the frustration of setting up the hardware for side channel attacks.

To perform a side channel attack, you need two things:
● A capture board. This is an oscilloscope on steroids: it has special hardware that it uses to capture very small signals with a precisely synchronized clock.
● A target board. This is typically a processor that can be programmed to perform some kind of secure operation.

In our case, we choose to use the ChipWhisperer Light Classic :
The ChipWhisperer-Lite Bare Board consists of two main parts: a multi-purpose power analysis capture instrument, and a target board. The target board is a standard micro-controller which you can implement algorithms onto. For example if you wish to evaluate an AES library, you can program that library into the target board and perform the power analysis.

Software

There are five ways to set up ChipWhisperer:

● VMWare Virtual Machine: Get a pre-prepared virtual machine image with all of the required tools already installed. Recommended for beginners.
● Windows Installer Get a Windows binary that installs the ChipWhisperer repository to your computer. Does not include WinAVR compiler.
● ChipWhisperer Releases: Get a zip file with the latest stable ChipWhisperer code and run it on your own environment.
● PyPi Package: pip install chipwhisperer. Only includes the software — doesn’t come with the hardware source files, drivers, or example firmware.
● Git Repository: Get the latest, bleeding-edge features and bugs. Recommended if you’re an experienced developer and you want to contribute to ChipWhisperer.

In our case, we had already the the Virtual Machine proposed by NEW AE TECHNOLOGY.

LAB Architecture

An example of workflow, will be as below

Don’t mind this picture for now, you will comeback later here, if you don’t get the workflow.

Firmware Setup Build

In order to work with the ChipWhisperer, we will need to specify setup parameters :
● SCOPETYPE — Indicates the capture hardware to use. The practicals use the ChipWhisperer Lite, which corresponds to the ‘OPENADC’ value.
● PLATFORM — Selects the target we’re attacking. The XMEGA target is specified using the value ‘CWLITEXMEGA’.
● CRYPTO_TARGET — Selects the cryptographic library to use on the target. When using cryptographic functions, the value should be ‘AVRCRYPTOLIB’ for the XMEGA target. Use ‘NONE’ when no crypto library is required.

Communication protocol

SimpleSerial is the communications protocol used for almost all of the ChipWhisperer demo project. It’s a very basic serial protocol which can be easily implemented on most systems.

All messages are sent in ASCII-text, and are normally terminated with a line-feed (‘\n’). This allows you to interact with the simpleserial system over a standard terminal emulator.

Firmware build procedure

First we need to specify the setup parameters :
— SCOPETYPE = OPENADC # ChipWhisperer Lite’s capture hardware
— PLATFORM = CWLITEXMEGA # ChipWhisperer’s builtin target
— CRYPTO_TARGET = NONE # No need for a cryptography library

Firmware code

This is the main firmware function. In the first step, the platform (CWLITEXMEGA) is initialized.
Afterwards, the Universal Asynchronous Receiver Transmitter (UART) is initialized. The XMGA target can thus send ascii characters to the ChipWhisperer (see the schema above). Then, the trigger pin is initialized. The trigger is used to signal the Chipwhisperer that a capture must start.
Subsequently, the message “hello” is sent to the ChipWhisperer via the UART.

The functions simpleserial_XXXX are for setting up the Simple Serial module :
• simpleserial_init() : add a v command for the version.
• simpleserial_addcmd(char, len, fp) : adds a ‘ char ‘ command of length ‘ len ‘ and refers its functions ‘fp’
• get_key() : commands that start with ‘k’ character and accept an input of ‘16’ length : receive a key of 16 bytes.
• get_pt() : commands that start with ‘p’ character and accept an input of ‘16’ length : receive a plaintext of 16 bytes.
• reset() : commands that start with ‘x’ character and don’t need any input : resets the reception buffer.
• simpleserial_get() : reads input, finds which command is it, reads the parameter for the command, calls the command.

Compiling the code

Once, we have written the code and we have all the dependencies. We should be able to compile the code to get a binary file that can be executed by our target.

In order to compile the code, we need to have avr-gcc utility :

We write a Makefile, where we specify the target source and the dependencies file.

We run the make utility and we also provide the required parameters we have seen in the beginning of this section.

As a result we get an ELF (Executable Linked File) and a HEX file ( the firmware on a binary format) that can be flashed to the target :

Flashing the firmware

In order to upload the firmware to the target, we will be using the ChipWhisperer Python API.
First we import the module :

import chipwhisperer as cw

Next we’ll need to connect to the scope end of the hardware. `cw.scope` will attempt to auto-detect which scope type you have :

scope = cw.scope()

We’ll also need to setup the interface to the taget

target = cw.target(scope, cw.targets.SimpleSerial)

We can use the default setup for the taget

scope.default_setup()

Finally, we will use the generic programming function by providing the scope, programmer type and the path to the firmware in hex format :

prog    = cw.programmers.XMEGAProgrammer
fw_path = '../hardware/victims/firmware/simpleserial-base-lab1/simpleserial-base-{}.hex'.format(PLATFORM)
cw.program_target(scope, prog, fw_path)

Instruction Power Differences

If you got here without getting bored then congratz, else i really do understand that the setup process is really boring when you aren’t interested.

For people that have skipped the first people, here is an TL_DR for you

At this point we have :
— A target board : This will execute the firmware (program) that will flashed on it.
— A capture board : This will measure the power consumption of the capture board while executing the program.
— SimpleSerial: a protocol to communicate with our capture board so that it sends the result of power consumption and also we can communicate with the target and tell it to execute some stuff for us.

The actual program we have on the target is as shown on the above picture : sends “hello” then setup the functions for receiving plaintext, keys or even reset the target buffer.

Using the ChipWhisperer API, we can start with a simple script that connects to the target and read data :

Cool ! we received the “hello”

Now that we can communicate with our target, our next goal is to get a power trace while the target is running. To do this, we’ll arm the scope just before we send our command, then record the trace as we’ve done before.

The trace is just an array of power consumption values in Volt during a certain time. We can choose the number of samples (points) to have in our array (trace), using : scope.adc.samples = 1000

Now, we have the trace back, let’s plot it using bokeh python library :

We can communicate with the target (send commands), we can also communicate with the capture board so that we get the trace of the power consumption of the target while executing some code !

Let’s get into the interesting part :D

4- iterations loop

If we add a loop of 4 iteration with no operations inside the loop to the beginning of the code of the firmware

The trigger signal is raised for one clock cycle. This signals the ChipWhisperer that the infinite loop will be executed in the next step. Thereafter, the infinite loop is executed. The volatile keyword ensures that the compiler does not remove the infinite loop during code optimization.

We recompile the code and flash the target with our new code (firmware).
Then, we need to plot the power trace :

We can notice that there is a pattern ( number of spikes ) that get repeated 4 times.
By playing with the number of samples to capture and the width/height of the graph, the pattern will be clearer :

Wow, so it’s that easy to spot operations executed by a processor ?
Can we do that with more realistic operations ?

NOPs vs MULs

Now that we can recognize the pattern of a loop iteration, let’s compare between a MUL operation and NO Operation inside a loop of 10 iterations :

We notice that the MULs have more power consumption than the NOPs which is logical because the CPU is doing calculations when the MULs are happening meanwhile it’s not the case with the NOPs.

In order to precisely compare between MUL and NOP, we will double the number of NOPs and compare the number of cycles :

It seems like the loop period of 20 NOPs is equal to the one with 10 MULs, we can conclude that the MUL operation takes 2 clock cycles and the NOP takes 1 clock cycle.

NOPs vs MULs vs ADDs

Now we want to include the addition operation, so we will do 10 NOOs, 10 MULs, 10 NOPs then 10 ADDs :

It seems like the clock cycle for the ADD is almost equal to the NOP clock cycle. Let’s verify by doubling the number of iterations ( 20 nops, 20 mul operations, 20 nops and 20 additions ) :

Indeed, the ADD operation has a 1 clock cycle just like the NOP.

Conclusion :

For all the previous plots, the abscissa corresponds to the number of samples
meanwhile the ordinate to the power (in volts). The number of samples is “directly proportional” with the number of clock cycles (amount of time consumed by the cpu to do the operation) :

MUL > ADD, NOP for both number of clock cycles and samples :
- MUL : 2 clock cycles
- ADD : 1 clock cycle
- NOP : 1 clock cycle

This kind of interdependence (i.e: clock cycles which affect the number of samples) are helping an attacker to distinguish different performed operations.

Timing Analysis With Power

Now, we gonna discuss a realistic situation where there is some kind of a program that is implemented inside a hardware (SIM card, Bank card) :

After getting a ‘\n’ terminated password, the target checks it and sends us a result message back. Before communicating with it, we’ll need to reset it :

We will also need a function that tries (sends) a password and then receives the correspondant trace back which will be plotted :

The trace resulting from the correct password “h0px3” :

In the beginning of the trace, we see 5 repeated patterns which matches the loop that compares the password from user input with the password “h0px3” character by character:

The trace of completely wrong password, is similar to this one :

We notice that there is no pattern for a loop at the beginning of trace.

Timing analysis

NB: The correct password is always “h0px3”, keep that in mind

Since the password check is done character by character, we can exploit the fact that the program will take longer to check a password that starts with a correct character than the one that doesn’t !

The resulting traces for the code below :

• Yellow trace : password starts with bad char
• Red trace : password starts with good char

Both traces start with almost the same sample value. Then after around 60 samples, the red trace got delayed till the end.

We also can draw the trace of their difference using the code below :

From 0 to 60 samples, the difference is quite low, then after that it goes higher and unstable.

If we increment the number of correct characters in both compared password we will notice that the difference between samples will get delayed further more :

That means we are doing more loop iterations !

Attacking a single char

Now, we can proceed to attack part !

We know that all traces representing a correct letter will always be different from other traces and that the wrong chars will generate pretty the same traces.

We can use this to find the most de-correlated trace, which will be the trace corresponding to the correct char.

— — — — — — — — — — — — — — — — — — — — — — — — — — -
But, wait ! what is correlation ?

What does it mean for two variables to be correlated? Simply put, it means that we can tell the variables are related because they move together in some way. Perhaps one rises while the other falls or both rise together. This isn’t the same as a movement in one variable causing the other to rise or fall — there might be some third, unseen variable that is controlling the other two — yet knowing the value of one of the variables will still give you information about the second if they are correlated.

Many people are introduced to correlation in beginning statistics classes with a formula for Pearson’s Correlation Coefficient and a set of graphs demonstrating various levels of correlation like this:

On the far left graph, there is only a very low correlation between our x and y variables. On the right there is a very high correlation; the two variables seem to move in lockstep, as X increases, invariably Y does as well. The value of Pearson’s Correlation Coefficient reflects the important trends pretty well, with the lockstep relationship on the right represented by a value close to 1 and the low correlation on the left captured by a value close to 0. Of course, variables can also be negatively correlated, where the Y value generally falls as X increases:

So, the correlation value can be between [-1; 1]. If it’s near to 1 then the 2 variables are positively correlated else if the value is near -1 then hey are negatively correlated else they are de-correlated : values between [-0.5; 0.5] for example!
— — — — — — — — — — — — — — — — — — — — — — — — — — — -

To do so, we’ll calculate the correlation between a known bad char (for example “\xFF’) and all other characters :
- The wrong characters will have a high correlation with our bad char ( corr > 0.9 )
- The correct character will have a low correlation with our bad char ( corr < 0.9 )

We will pick the character that has the lowest correlation as the correct character :

We got “h” which is the first char of the correct password “h0px3” !

That’s how it looks in practice to find the lowest correlated char :

Attacking the full password

Now, we will use correlations to attack the full password. We will use the same logic as before :

We define a function that calculates the correlation for a longer guess :
It takes the old guess which is passed as an entry parameter and consider it as an empty password (it adds the bad char ”\xFF”, calculates correlation, compares with all other characters’s correlation
and pick the minimum) :

When executed :

Isn’t that cool !

The inconvenient here is that we performed the attack and we already knew the password and we knew its size.

Attacking any size password

Now we’ll try to guess the password when we do not know the size of the password.

Since we know that the threshold correlation for a wrong character is 0.9, we will modify the function next_char so that it return the minimum correlation in addition :

If the returned correlation is > 0.9, it means that we looped through all the candidates and we didn’t find a correct character for the next position. At that moment, we can say that we leaked the whole password.

We ask another person to choose a password for our target and then we recompile, flash the target and run the next code :

And voila !

Countermeasure

The most reliable approach that will really bother the attacker, is to try to get a constant time for all performed, secret-dependent operations.

For our countermeasure, we will keep the two arrays, one containing the correct password and another one containing the password sent by the user. Also, we keep the loop that iterates through both strings, comparing them, but slightly modified to avoid the early aborts in case of a wrong
letter.

To do so, we modify both arrays to have a fixed size (32) and fill them with “\0”. We’ll do 32 iterations, despite the fact that the real size of the password is much shorter. This allows us to have a constant execution time, even if we are sacrificing the performance.

Now, many would think that the side channel arises only because of an early abort which is definitely false. As we told before, we want a constant time for all secret-dependent operations. So, using the same approach as the developer (without the early abort), expose our program to a side channel attack, because of the comparison ( if (correct_passwd[i] != passwd[i]) ) which is secret dependent, resulting in an operation disproportion, as the program will execute an instruction in the case of equality and not otherwise. Adding an else with the same operation is not a solution, as it does affect the performance and does not protect our code from a potential branch prediction .

Our solution consists of:
For each byte of the two strings :
● Calculate (left_i XOR right_i)
● Bitwise OR the current value of passwd with the result of the XOR, store the output in passwd ( which was initialized to 0 )

At the end of the loop, passwd will be equal to 0 if and only if the two strings are equal.