Side Channel Attacks — Part 2 ( DPA & CPA applied on AES Attack )
Last month, i have shared a blog post with you about the basics of Side Channel Attacks and showed an example of Timing Analysis attack.
If you haven’t read it, i hardly suggest you to read it so that you won’t have problems understanding this one: https://yan1x0s.medium.com/side-channel-attacks-part-1-timing-analysis-password-recovery-607716bfc56a
Today, I’m sharing with you the most known (used) attacks of this field !
We gonna apply them to exploit the most used algorithm for encryption AES !
Power analysis is a branch of side channel attacks where the side channel used is the power consumption. In electronic devices, the instantaneous power consumption is dependent on the data that is being processed in the device as well as the operation performed by that device.
Therefore by analyzing the power consumed by a device when it is doing encryption or decryption the key can be deduced
Consider Figure above. The computer inputs a set of known plain texts to the cryptographic device which does the encryption. While the device performs the encryption the oscilloscope measures the power consumption.
For several hundreds of plaintext samples power traces are obtained and then they are analyzed on a computer using an algorithm such as Simple Power Analysis (SPA), Differential Power Analysis (DPA) or Correlation Power Analysis (CPA) to derive the secret key of the system. As the plaintext in this case is known to the attacker this is a known plaintext attack.
This report will focus on using differential power analysis (DPA) to recover the key used in the AES encryption.
When implementing a DPA attack, we need to choose two models: a power model and a statistical analysis model.
In the next sections, we draw theoretical and practical models, namely the Hamming weight model with the correlation coefficient (CPA) and we put them in practice combined with the DPA attack in order to recover an AES key from a non secure target.
We first need some background knowledge on the Advanced Encryption Standard and differential power analysis attacks.
The Advanced Encryption Standard (AES)
AES is a symmetric-key algorithm, which means the same key is used for both encryption and decryption. There are multiple versions of AES, which differ in the key length.
In our case we will be attacking a target that uses AES-128. This is a version of AES with a key length of 128 bits, which is 16 bytes. The plaintext and ciphertext length are also 128 bits.
AES-128 consists of 10 rounds. Before The rounds are performed, the secret key is XORed with the plaintext. The First nine rounds consist of four stages: SubBytes, ShiftRows, MixColumns, AddRoundKey. In the 10th round the MixColumns operation is not performed as can be seen in figure below.
We are particularly interested in the SubBytes step. In this step, a S-box lookup table is used to substitute each byte for another one. The SubBytes step is the only nonlinear transformation in AES, used to minimize the correlation between the input and the output of the function. This protects AES from linear cryptanalysis.
The plaintext P is XORed with the key K. AES uses a key schedule to define different keys for each of the rounds. However, the key that we XOR with the plaintext before the first round is just the secret key. The output of the first SubBytes operation is Sbox(plaintext⊕key). The output of the tenth round is the ciphertext C.
We will be using the same LAB architecture and requirements from the last report (Chipswhisperer board, Virtual Machine, Simple Serial Communication Protocol and XMEGA target)
— — — — — — — — — — — —
If you are not understanding the next part (firmware related stuff), go back to the last blog -_-”
— — — — — — — — — — — —
First we need to specify the setup parameters :
- SCOPETYPE = OPENADC # ChipWhisperer Lite’s capture hardware
- PLATFORM = CWLITEXMEGA # ChipWhisperer’s builtin target
- CRYPTO_TARGET = TINYAES128C # Cryptographic library used by our target
This is the main firmware function. In The First Step ,the platform (CWLITEXMEGA) is initialized. Afterwards, the Universal Asynchronous Receiver Transmitter (UART) is initialized. The XMGA target can thus send ascii characters the ChipWhisperer (see the schema above).
Then, the trigger pin is initialized. The trigger is used to signal the Chipwhisperer that a capture must start. Subsequently, the message “hello” is sent to the ChipWhisperer via the UART.
The functions simpleserial_XXXX are for setting up the Simple Serial module :
- simpleserial_init() : add a v command for the version.
- simpleserial_addcmd(char,len,fp) : adds a ‘char’ command length ‘len’ and refers its functions ‘fp’
- reset() : resets the reception buffer.
- simpleserial_get() : reads input, finds which command is it, reads the parameter for the command, calls the command.
- get_key(): receives a key of 16 bytes.
- get_pt(): receives a plaintext of 16 bytes.
- get_mask(): sends an input of 18 bytes to aes_indep_mask.
The functions aes_indep_XXXX are for setting up the AES library :
- aes_indep_mask(m) : sets the AES mask to m
- aes_indep_init(): initialize the AES context
- aes_indep_key(key) : sets the AES “key” schedule for encryption
After having the code ready, we compile it as shown in the previous project :
We will get a HEX file as a result which represents the firmware to the flash into our target.
Using the Chipwhisperer Python API, we can flash our target after defining the parameters :
Locating AES Phases and SubBytes operations
Now that we have a general understanding of the AES algorithm, we need to focus on the part we are interested in (SubBytes operations). We will need to approximately locate all the AES Phases.
Locating AES Phases
Let’s take a general look at the AES trace :
The trace above is so compact and it’s a bit hard to differentiate an AES operation from another.
Since we have access to the code, we can add some NOPs between each operation of the Round, by doing that we will be able to see a delay and in result, we will differentiate each operation from an other :
And the trace looks like this :
It’s hard to tell where our NOPs are. In consequence it’s also hard to distinguish the round operations.
Then, we tried increasing the number of samples and put the traces one above another so that we can see the changes from a normal trace to a trace with NOPs between rounds :
Now that we have a way to compare between operations and spot them, let’s do it in 1 shot !
The plan :
- Generate a fixed plaintext and key so that we make sure the operations consumes the same amount of time and power
- Store both traces ( one with NOPs and other without ) : add a boolean flag
- Add ticks on the x axis (every 50 samples) so that we don’t do much calculations.
Also having added 6 NOPs after each operation :
We can now spot the range of each operation :
- Line 3: AddRoundKey(0) starts at around 200 samples and takes around 1200 samples to finish ( 200 -> 1400 )
- Line 13: SubBytes(1) starts at around 1400 samples and takes 1100 samples to finish (1400 -> 2500 )
- Line 14: ShiftRows(1) starts around 2500 samples and takes around 300 samples to finish ( 2500 -> 2800 )
- Line 15: MixColumns(1) start around 2800 samples and takes around 2550 samples to finish ( 2800 -> 5350 )
- Line 16: AddRoundKey(1) start around 5325 samples and takes around 1200 samples to finish (5325 → 6525 )
This confirms our calculations for the 1st AddRounKey(0) where we found that this operation takes around 1200 samples.
AddRoundKey(1) goes from 5325 to 6525 and that’s around 1200 samples.
Meanwhile our calculation for the AES phases were approximative, we don’t wanna do that for the SBox computation since it’s our attack point !
We will be adding some NOPs before each SBox operation :
The picture below shows a trace of SubBytes operation with and without NOPs before each SBox operation :
The blue color represents the SubBytes operation and the brown corresponds to the ShiftRows operation.
We notice that :
- Without NOPs (1st trace) :
The SubBytes operation takes around 980 samples, going from 1470 to 2450. We know that the total number of SBox operations is 16. We conclude that each operation takes around 60 samples.
- With 2 NOPs (2nd trace):
The SubBytes operation takes around 4800 samples, going from 1710 to 6510. We try to calculate the number of samples for 2 loop iterations :
Knowing that 2 NOPs takes around 240 samples :
( 240 + 60 ) * 16 = 4800 samples for the SubBytes with 2 NOPs before each SBox operation. This proves the exactitude of our calculation if all the SBox operations take the same amount of samples.
To view the SBox operations in detail, we will move our adc offset to around 1400 samples and capture to the 2600th sample, we will also make more accurate precision in the x axis tick :
We detected the location where the first byte in the S-box is computed at around 1472-th sample, and the end of the last byte at 2452-th sample. The total number of samples required for capturing each of the 16 operations is around 980, with around 56 samples per operation.
All four sets of four bytes manipulated in the Sbox are stored in columns. In consequence, the second set of samples for example would correspond to the first byte of the second row in the output of the S-box and so forth.
We end up with the following ranges:
Introduction to DPA & HW Assumption
Now that we have identified the SubBytes phase and the SBox operations, we wish to attack them using Differential Power Analysis based on the Hamming Weight Assumption.
DPA Attack theory
The AES (and all block ciphers in general) is designed such that small size blocks (typically 8 bits) can be processed independently, providing efficient implementations on all kinds of platforms. The figure below illustrates a typical 8-bit key addition (i.e., AddRoundKey) and S-box layer (i.e., SubBytes) in a round of the AES.
For instance, in the case of the first round, the input “x” corresponds to a plaintext byte, and the value “s” corresponds to a byte of the first round key (hence a byte of the master key). The output of the AddRoundKey and the SubBytes are respectively denoted as “y” and “z”. Given that the input “x” is assumed to be known (as well as the AES operations), by capturing power leakages while “y” and “z” are produced, the adversary indirectly recovers information about the key byte “s”.
A DPA attack follows a divide-and-conquer strategy. In other words, key bytes are separately attacked, then reassembled to recover the 128-bit round key. The attack on an individual key byte “s” is performed with the following steps :
- The adversary captures a series of power leakage traces (corresponding to the target round) and their respective input “x” . As a reminder, the input “x” is known but not necessarily chosen. The key byte “s” keeps the same value for all of the traces.
- For now, “s” is unknown, but since it is a byte, it can be easily brute forced (s is a value between 0 and 255). Therefore, for each possible value of “s”, the adversary predicts the intermediate values “y” and “z”.
- For each prediction of “z” (or “y” ), the adversary makes use of a leakage model. The role of a leakage model is to emulate the way intermediate values are reflected in leakage traces.
- The modeled predictions (corresponding to a key candidate) are then compared to the observed leakages using a distinguisher. The correct key candidate is the one returning the highest similarity.
The efficiency of a DPA attack resides in appropriately choosing the leakage model and the distinguisher.
The Hamming weight is used as a leakage model. It requires no training, and it is based on the assumption that power leakages vary linearly with the number of 1’s in the binary representation of the target value. For instance, for an 8-bit AES S-box output “z”, the Hamming weight is expressed as follows:
How do we prove that there is a linear relation between the power consumption and the hamming weight ? Let’s plot the Hamming weight (HW) of the data to figure this out along with the power traces!
In order to plot the HW, we first need to have some data. For that, we will be capturing around 1000 traces and store them in an array.
Each trace contains 3 fields : a wave (the data), textin (input plaintext), key (input key).
After capturing the traces, we can plot one for example :
It looks like now we are identifying the AES phases easily!
Using the trace data
Now that we have some data, we can loop through all the bytes of the key for all the captured traces to make a key guess :
The function guess isn’t yet implemented and for that we need to go back to how AES works and understand how we gonna perform the attack in order to have the best guess :
The exact point of attack is the output of the 1st SBox round, why ?
Because the first SBox round contains the master key as it is due to the AddRoundKey(0) which adds a 0 to the master key which means no changes to the target key.
Then it’s passed to the SubBytes(1) operation which represents mathematically :
— — — — — — — — — SBox ( XOR( Pi , Ki ) ) — — — — — — — —
This is easily reversible or brute forceable, why ?
The SBox is a well known matrix and available for everyone, XOR is a simple operation.
We can perform a byte guess of the key and then compare the trace at the leakage point and then compare it to the original trace at that same point. The good guess will have a bigger difference than other wrong guesses.
At this point, we have all the necessary information to brute force the key byte by byte. The total tries is 256 (possible key) * 16 rounds = 4096 tries which is easily computable and a lot better than brute forcing the key bit by bit ( 2¹²⁸ ) which is almost impossible to compute nowadays.
Calculating the Hamming Weight
We previously said that the output of SBox is obviously visible on trace but this assumption is due to the property of the power consumption is lineare to the Hamming Weight !
Now, we need to calculate the HW of the SBox result, let’s define the function that does that :
After that we have the output of the SBox, we will need to calculate its HW, which is basically the numbers of 1s in its binary representation.
We will calculate the HW for all the possible outputs :
Plotting the HW
In order to spot the leakage points, we will be tracing the 16 SubBytes operations using the ranges extracted above. Then, since we have the same Key for 1000 different plaintexts, it means that the HW will vary from 0 to 8 :
SBox( XOR(K, Pi) ) ; 0 <= Pi < 256; 0 < i < 1000
With a high probability, we will have different outputs for the SBox operation and this implies all possible values for the HW.
A “red” palette (brewer) will be used to show the HW value :
- HW = 0 -> trace color is dark red
- HW = 8 -> trace color is light red
From the section where we located the AES operations and exactly from identifying the offset of 16 SBox operations, we can put the plot_start at 1472 and the plot_end at 1528 for the 1st SBox in order to leak the first byte of the key :
There is more than one point with a smooth degradation of red. After zooming a little bit, we decided to choose the one before the 1510 sample, which is 1508.
But how do we know if the chosen point is correct ?
Let’s plot the averages of HW points at the chosen point, which is 1508.
We got a linear plot, which means that the chosen point is the right one.
You might have noticed that the slope is opposite what we expected ! why is that ?
The figure below shows how the measurement process is done on the Chipwhisperer :
We are measuring the drop across the shunt resistor. An increase in the current causes a higher voltage across the resistor. When no current flows there is no drop across the resistor. But since we only measure a single end of the resistor, we see a higher voltage when no current flows.
Next, we want to make sure that our methodology is correct and for that we will re do the same process for the next range (1724 , 1780) and try to spot the leakage point.
We observe a smooth, beautiful gradation of red somewhere between 1760 and 1768. We’ll pick the 1761-th point.
We then plot the averages, as we did before.
And voilà, the long-awaited linear plot.
We have to follow the same process and find all leakage points for all rages that we defined. However, this is a long process to follow and for that we have implemented a function that finds for us all the points that are lineare on a specific range :
As we’ve seen before, there can be multiple leakage points, more precisely an interval of points. We choose one of them, which represents the output of the S-box and use it to find the key before it fades out in the following AES rounds.
Performing the DPA Attack
At this point, we have enough information to recover the AES key. We need to find a way to derive the value of this bytes using the Hamming Weight of the result of the SBox operation which, as we’ve seen, has a measurable effect on the power consumed by the microcontroller.
As we said on the DPA Attack theory section, we will need to capture enough traces with same key and different plaintext :
This capture process takes time to finish. Because of that we want to save the traces into files and load them everytime we want to perform an attack :
Here is our final Attack code :
Once executed :
It seems like we have successfully recovered the correct key !
By now, we have a lot of plaintexts and their associated traces. Also we know the approximative points corresponding to the S-box outputs on the traces.
For each SubKey from the master key and for all the traces at the associated leakage point, we will make a sub key guess and calculate the Hamming Weight of the guess we made and the associated sub plaintext of that part ( from line 13 to line 24 ).
Since we proved that the power consumption is linear to the HW, we gonna use this fact as a distinguisher of 2 groups :
- group1 : low power consumption group (HW is under a certain threshold)
- group2 : high power consumption group (HW is above a certain threshold)
The best guess (correct subkey) will have the highest mean difference between the values of these 2 groups. Why is that ?
Let’s print these values (group1 mean, group2 mean, mean difference) for both a correct subkey and a wrong one, then we can compare them manually :
Once executed :
The mean difference of the correct sub key is 10 times bigger than the wrong one. Why is that too ?
Let’s suppose we have used the correct subkey :
Correct SubKey ->
-Correct HW of the SBox output ->
— Correct distinguishing of 2 groups depending on the HW ->
— -points that will be added to group2 have higher HW ->
— — that means they have higher power consumption at that point -> [ 0.1, 0.2, …. ]
— — -points that will be added to group1 have lower HW ->
— — — that means they have lower power consumption at that point -> [ 0.01, 0.02, …. ]
— — — — — means difference these 2 groups will be high : 0.2
Meanwhile with a wrong guess :
Wrong SubKey ->
-Wrong HW of the SBox output ->
— Wrong distinguishing of 2 groups depending on the HW ->
— -points that will be added to group2 are random -> [0.1, , 0.02, … ]
— — points that will be added to group1 are random -> [0.01, 0.2, … ]
— — - means difference these 2 groups will be low : 0.02
That’s how the mean difference is used to get the best guess for all
Sub Keys (from line 25 to line 38 )
In order to confirm that the attack works for any chosen key, we will choose our key :
Then we will run our attack :
The attack succeeded and it took around 1mn51sec to finish.
Playing with the HW threshold and number of samples
In order to analyze our method and its performance, we are gonna try to play with HW threshold and the samples to be used on the key recovery process.
We did some adjustments to our code :
- Pass the correct key as a parameter
- Pass the the HW threshold as a parameter
- Return the number of wrong keys extracted
- Add a function that do some statistics on the correctness of the key recovery regarding the number of samples and the value of the threshold :
We will need to capture 100 traces and then apply the statistics on them :
The result is shown here :
In order to reach 100% ( full correct key recovery ) :
- With threshold 3, you will need at least 160 traces
- With threshold 4, you will need at least 80 traces
- With threshold 5, you will need at least 100 traces
We conclude that the best parameters are : 4 HW THRESHOLD & 80 SAMPLES
And with these parameters, we will have a quicker key recovery ( we went from 1mn 54 to 1 sec 54 ! )
Same attack different leakage model & distinguisher
In the previous attack, we used the Hamming Weight as the leakage model and the HW threshold as a distinguisher. There are other leakage models and distinguishers. LSB is one of the most known distinguisher
Everything is the same as the previous attack, except for the way of splitting the leakage points on 2 groups. This process will be based on the least significant bit of the SBox output with our guest and the plaintext as an input. if the LSB is 1 then we add it to group1 else to group2 :
This distinguisher is based only on 1 bit of the SBox output which isn’t good, this attack will rely more on probabilities than exploiting a side channel on the SBox output.
It supposes that with a good key guess, the mean difference of a group that has LSB at 1 and an other group that has LSB 0 will be bigger than the other wrong guesses :
This attack didn’t totally succeed. It recovered half of the key (8 out of 16) and it took the same execution time as the previous attack.
To improve this attack :
- Replace the leakage points by leakage ranges
- Increase the number of traces from 1000 to 2000
We notice that we recovered more keys ( 10 out of 16 )
- We change our distinguished from LSB to MSB (Most Significant Bit)
- Increase the number of samples to 3000 :
It seems like now, we are recovering the whole correct key.
The LSB/MSB distinguishers do work but they do rely a lot on parameters and probabilities ( the more samples you use, the higher probably you will recover the correct key ) and also they depend only on 1 bit, either the most significant one or the least significant one. We can adjust them to use more than 1 bit. MSB is better than LSB because the group that has the MSB at 1 will consume more power on the output of the SBox operation.
After all they will take so long to recover the key if you compare it to the HW Threshold distinguisher.
In the previous section, we presented results of practical tests sof the differential power analysis on the AES algorithm, implemented on the particular microcontroller in the C language and in the assembler.
The results were positive, we were able to reveal all 16 bytes of the secret key. The time needed for the attack was reduced to 1 sec using the HW threshold as a distinguisher and 2mn 36 seconds using LSB_MSB as a distinguisher. These timings are a negligible time in comparison with a naive brute force attack on this algorithm ( 2¹⁶ )
DPA attacks rely on the leakage models, distinguishers, number of traces and offsets !
A better choice of these parameters means a correct and a faster key recovery.
The problem with the DPA attack is that:
- It’s quite susceptible to noise
- If you don’t select the right offset and samples the attack can easily pick up other parts of the AES operation (you can try with different values to see this point)
- The attack typically requires a lot of traces. These software unprotected AES implementations are pretty weak against power analysis, but they still required thousands of traces to break
Can’t we improve these inconveniences on another variant of side channel attack ?
Introduction to CPA
Before we get into details of the Correlation Power Analysis (CPA) attack, we will need to change the way we capture, store and load traces.
This time we would have to store traces in their raw format (tess space and less time) :
For example, we are capturing 100 traces of 3000 samples and storing them as raw array of floats inside a file ‘traces.raw’
After we have stored the raw traces, we would have to use them in a C program, parse them and check if we are doing right by comparing to the Pythonish way :
This code does open the raw file then read NB_SAMPLES of size double, then it prints the first 10 values.
We compile and execute the program :
We compare it with the python way and it’s the same values :
At this point, we are asked to :
- Change the python code above in order to convert the paintext and the key in raw type, using astype(‘uint8’), and write them in separate files.
- Modify your script in order to generate raw files for a set of several traces with different random plaintexts
- The file containing the traces and the file containing the plaintext must have a length of nb_samples * nb_traces * sizeof(double) and 16 * nb_traces respectively
To solve all these problems in one shot, we opted for a specific type of file and we will be using just one, which is a database.
First, we define the structure of our database (Model), we will keep the same the structure of python trace :
The code above does store 1 trace per row in a table of type Trace (key : char type, textin: char type, wave : Blob type)
The key and the plaintext are stored in their hexadecimal format and the wave is stored in Blob (Binary Large OBject) format which corresponds to the wave values in their float format.
Our goal now is to create a C library containing a function which will take as arguments an array of traces, the number of traces and the number of samples and which will perform the CPA attack.
Before digging into details we have to see the project structure :
The launch_attack is the front door to our cpa_attack. The number of traces, the offset and number of samples were asked to be modular, so we added some options to the set by the user.
As we said before, we wish to select a given number of traces from the database into an array of traces which will be accepted as a parameter to a C function.
To do so, we define a structure, which we’ll call `t_trace` in C:
We create the same structure using ctypes in python:
The `textin` and `key` sizes are fixed and have a value of 16. As for `wave`, we have 2 cases :
- It can be either a whole wave from the start to the end of samples (0–3000 by default)
- Or it can be a list of ranges, since later we want to attack only specific parts of the wave.
For test proposes , we’ll create a function in C which prints an array of traces:
We’ll perform the `make` command which creates our library `cpa_attack.so`.
After we created the library, we’ll load it in python,
perform a query which selects from the database, the number of traces requested by the user,
create a list of TRACES (wave, textin, key) from the output of the select query and use it as a parameter of the C `print_traces` function we saw above.
We will try to print in python and in C a trace to see if we have the same values.
Looks like we have the correct values, so we can start to implement the CPA attack.
CPA attack theory
The CPA attack is based on the correlation of the power consumption according to a chosen model. In our case, the chosen model is the Hamming Weight. Before we explain the mathematics, we’ll take the time to visualise our goal. We will take the example of the first key byte and try to visualise the correlation we are searching for.
Supposing we have the following plaintexts and their associated waves of AES operations performed with the correct key byte.
Now, we try to make a guess on the first key byte. We chose the first byte of the key to have a value of Z and we will compute the HW of the first byte of the plaintext xor-ed avec our guess Z.
We see that the HW outputs are not correlated with the waves. We can convince ourselves by taking the first plaintext’s power consumption wave which is the smallest but has the greatest value of HW, which is clearly a sign of decorrelation .
Let’s try another guess (the right one), which would be Y and compute the HW as we did for Z.
We can see that compared to the other guess, Y gets us HW values which are way more correlated with the waves.
As we’ve seen from the attack illustration presented above, we have two datasets:
- Waves associated to the power consumption of AES operations of a set of plaintexts and the correct key
- The Hamming Weights of the given plaintexts xored with a key guess
We will try to measure the linear correlation between them.
To determine if there is a relationship and the strength of this relationship between these two datasets we’ll use the Pearson’s correlation coefficient. Pearson’s correlation coefficient is considered to be the best method one can use to determine if there is an association between two datasets, because it is based on the method of covariance.
The covariance, in its turn, evaluates how much or to what extent two assets change together. This can sound sufficient to see if there is a relationship between our HWs and the waves. However the covariance used alone will only tell us if our assets tend to move in the same direction or to the inverse direction, but will not tell us if there is a dependency between our assets. That’s why we have to use Pearson’s correlation, because unlike the covariance, the correlation coefficient tells us not only that there is a relationship between two datasets, but also the strength of this relationship, the dependency between the two datasets and that’s what we wish to know.
The Pearson’s correlation can be expressed with the following formula:
ρ(X,Y) — the correlation between two variables X and Y
Cov(X,Y) — the covariance between the variables X and Y
σX — the standard deviation of the X-variable
σY — the standard deviation of the Y-variable
This formula can also be written as :
Where y stands for traces (waves) and x for hypothesis, respectively x bar and y bar are the values of the hypothesis mean and wave’s mean.
The Pearson’s coefficient will always return a value between -1 and 1 and stronger the association of the two variables is, closer the Pearson correlation coefficient will be to either 1 or -1, respectively closer the output of the formula is to 0, stronger will be the variation between the two variables.
Now that we have all the theoretical tools to perform our attack, we can start to implement it following the steps below:
- Encrypt several plaintexts and record the waves representing the power consumption associated with these encryptions
- Store the waves and plaintexts along with the encryption key in files/ database as we’ll need them later to perform the attack
- Choose a model of power consumption, in our case Hamming Weight
- For each of the key’s bytes, apply the power consumption model. For our example, compute the HW of the plaintext’s each byte and the possible key’s bytes hypothesis.
- For each key byte, apply the Pearson’s correlation formula on the obtained hypothesis and the recorded waves(actual power consumption).
- Decide which subkey guess correlates best to the measured traces according to the Pearson’s output interpretation explained above.
- Put together the best subkey guesses to obtain the full secret key
First we need to capture some traces and store them in a database file :
Once our database is filled with the captured traces, we need to load the traces from the database to the C Structure of the trace :
After that we have allocated the needed space for C Traces, we can call the C function that performs the attack :
The CPA attack core code :
For each subkey we will apply correlation between the HW of all the guesses and the traces, the guess that has maximum correlation is our best guess and highly probable to be our subkey.
By applying this to all the subkeys, we will end up recovering the whole key.
The whole attack relies on the calculation of the pearson’s correlation coefficient, which is presented by this formula :
This form can be simplified to :
We notice that all the sums are done over `n` which represents the number of traces. We can exploit this to calculate them in one loop :
Once the sums are calculated, we divide the numerator ( Covariance(X,Y) ) by the square root of the multiplication of the 2 denominators ( Standard_Deviation(X), Standard_Deviation(Y) ) :
The pearson’s correlation coefficient is the maximum of the last result :
Correlation applied only on leakage intervals of the wave
Now that our code is ready, let’s launch an attack using 50 traces and we will target only leakage ranges of the wave :
Impressive ! we recovered the whole key in less than a second, it takes around 71 milliseconds.
If we decrease the number of samples to 25 :
It takes less time but recovers 14 correct subkeys out of 16.
Using 35 seems to be the threshold of number of traces to use :
When attacking only the leakage ranges of the wave, we need to use 35 traces in order to recover the whole key correctly and that takes 51 milliseconds.
Correlation applied on the whole wave
Now, we want to apply the correlation on the whole wave :
It recovered 15 subkeys out of 16 in 17 seconds using 50 traces.
We play a little bit with the number of traces and then we concluded that with 50 trace the 9th SubKey isn’t correctly recovered but adding 1 trace will fix the problem :
When attacking with the whole wave instead of leakage ranges, we need to use 51 traces in order to recover the whole key correctly and that takes around 17 seconds.
DPA vs CPA
As a bonus, we have implemented DPA attack in C, we basically translated the Python Code we already had to C version:
We execute DPA with HW Threshold as a distinguisher :
Using 40 traces, we recovered 14 subkeys correct out of 16 in 3 milliseconds.
Playing with the number of traces , we found that :
When attacking with DPA and Hamming Weight as a distinguisher, we need to use 60 traces in order to recover the whole key and that takes 5 milliseconds.
Overall comparison :
From this comparison table, we can say that :
- CPA Attack using the whole wave is the best
- DPA Attack using HW with leakage points is the fastest but it’s not convenient to search each time for the offsets and number of samples for each SBox operation
- CPA Attack using leakage ranges is also good if you can spot the ranges easily this will make win some seconds
- DPA Attack using LSB as a distinguisher is the worst one, because it requires a lot of traces, takes a lot of time and doesn’t fully recover the key.
We have seen that the implementation of AES that we used showed leakage of power consumption information. If there was no leakage,the differential power analysis attack would not have worked.
For both attacks (DPA & CPA), we succeeded in revealing the secret key of AES. A Noticeable difference between the two attacks is that the CPA attack is more stable and less dependent on finding the correct key .
The LSB model only considers the last bit of a byte and ignores all other bits. The Hamming weight model/destinguisher which is used for the CPA/DPA attack does include all bits, which makes CPA/DPA more efficient and more accurate.
The correlation coefficient considers more aspects of statistics, which leads to a better comparison between the columns of the matrices and that makes CPA way stable than DPA.
Unfortunately in the security world there are no such hundred percent perfect countermeasures. Therefore the objective of a countermeasure against power analysis is to make an attack extremely difficult. When the time and the cost for an attack is large enough such that it would be infeasible for an attacker, such a countermeasure can be considered good enough.
Thanks for reading and i hope you have learnt something new.
Please, if you notice that i miss understood something, let me know ^_^