Template attacks are a powerful type of profiling attacks. They require quite a lot of pre-processing on a copy of the device we attack, but the attack alone requires a very few power traces.

#### Overview of template attacks

General idea of the template attack is to create a profile (template) of power supply on a device for some operation for different values of subkeys and then recording some power traces with the unknown key and compare with the template to discover which subkeys are most likely.

To create a template we need to find POIs – Points Of Interest in the power traces. These are some points which depend the most on the values of the subkeys.

Based on the many power measurements in POIs we calculate empirical distribution for every subkey, assuming that those measurements can be closely described in terms of multivariate normal distribution.

Next step is to collect some small number of power traces from the attacked device, where encryption is performed with the same key every time. Then we take values in POIs and compute probability of occurring of those values in the pre-computed template (empirical distribution). Subkeys with the highest probability are most likely to represent the actual secret subkeys.

#### Template attack on AES

In this post we will go through template attack on AES-128 (16 subkeys), implementation provided in ChipWhisperer firmware on XMEGA. We encourage to read the post and simultaneously follow the Jupyter notebook because of a quite large number of technical details that can be otherwise not understood.

Ok, it’s time to collect power traces. We will build our template based on them, and we want empirical distributions to be very close to the real ones. For those reasons we collect a large numer of power traces – we decided to collect 8192 (2^13). Traces have 3000 samples, just enough to catch the first round Sbox substitution, which will be enough for us to make this attack work. In this phase bytes of the secret key (subkeys) and bytes of the plaintext are XORed and substituted by the output of Sbox on the input that is the result of XOR operation.

Collecting this number of power traces takes a lot of time. For that reason it is reasonable to save them in case we want to use them again (we comment on how to do it in the Jupyter notebook for this post).

Our template will be based on the Hamming weight of the output of the first Sbox on subkeys and plaintexts. We will get 9 “classes” for each subkey, because the Hamming weight of a byte can be from 0 to 8.

At first we organize power traces in such a way that all power traces for a particular subkey and Hamming weight of the output of the Sbox are collected together:

def calculateTracesHW(subkey, traces, textins, keys):

tnum = len(traces)

tracesHW = [[] for _ in range(9)]

for t in range(tnum):

hw = HW[intermediate(textins[t][subkey], keys[t][subkey])]

tracesHW[hw].append(traces[t])

tracesHW = [np.array(ttrace) for ttrace in tracesHW]

return tracesHW

Power traces are collected for randomly chosen keys to improve statistical properties. We need to make sure we’ll get at least a couple power traces for each class since there is a small chance that some subkeys will never be chosen. We take it into account in the notebook.

Let’s try to find some POIs. Those are the points in the power trace that depend mostly on the Hamming weight of the output of the Sbox. We take the average of power traces for each subkey and Hamming weight:

def calculateMeanTraces(traces):

tracelen = len(traces[0][0])

meantraces = np.zeros((9, tracelen))

for i in range(9):

meantraces[i] = np.average(traces[i], 0)

return meantraces

tpl_mns = []

for i in range(16):

tpl_mns.append(calculateMeanTraces(tpl_hws[i]))

This gives us a mean power trace for every subkey and Hamming weight, for example for 9th subkey and Hamming weight 5 average trace is shown below:

The average trace looks fine, just like a power trace for normal execution of AES, what was expected.

Next step is to compute absolute differences between those average traces for every subkey and sum them:

def calculateSumDiff(traces):

tracelen = len(traces[0])

tempSumDiff = np.zeros(tracelen)

for i in range(9):

for j in range(i):

tempSumDiff += np.abs(traces[i] - traces[j])

return tempSumDiff

tpl_sdif = []

for i in range(16):

tpl_sdif.append(calculateSumDiff(tpl_mns[i]))

Let’s look at a plot of sum of differences for all subkeys:

We can clearly see some characteristic points with high sum of differences for every subkey. We can also see that some operations before Sbox substitution (around samples 250-1200) also visibly depend on the choice of a subkey, so they will be certainly added to our set of POIs.

In addition to generating POIs only from the sum of differences, we also add points of the power trace that ** correlate** the most with the Hamming weight of the output of the Sbox on each subkey and bytes of plaintext. More details on this approach can be found in one of our previous posts.

#### Finding POIs

Let’s quickly look at a function that finds POIs:

def findPOIs(spikes, num=5, space=5, incr=0):

spikes = spikes.copy()

spikes[:1300] = 0

spikes[2900:] = 0

POIs = []

# Repeat until we have enough POIs

for _ in range(num):

# Find the biggest peak and add it to the list of POIs

nextPOI = spikes.argmax()

POIs.append(nextPOI)

# Zero out some of the surrounding points

# Make sure we don't go out of bounds

poiMin = max(0, nextPOI - space)

poiMax = min(nextPOI + space, len(spikes))

for j in range(poiMin, poiMax):

spikes[j] = 0

space += incr

return POIs

We don’t want POIs that are to close to each other. We must remember that we take 4 samples per one clock cycle, so taking some space between two consecutive POIs will eliminate redundancy.

We will test three sets of POIs – one from sum of differences, 2nd from correlations on the first Sbox and a 3rd which is a sum of the previous ones. We want to have at least 10 points for every subkey to construct the empirical distribution.

We compute mean and covariance matrices using previously computed average traces and array of traces grouped by the Hamming weight. The random variables are measurements in POIs in the collected traces for different Hamming weights, hence we actually have 9*(#of POIs for a subkey) random variables for every subkey. We use **multivariate_normal** module from **scipy.stats** python package to obtain probability density function (pdf) from covariance matrices and vectors of means.

#### The attack

Finally, we collect some small number of power traces, in our simulations 15 were enough. For every subkey, we take from those traces the measurements in POIs. Then, for every possible value of that subkey we check the Hamming weight of the output of the Sbox on the plaintext (which we know) XORed with that value. Having Hamming weight of the output, we can input the vector of measurements in POIs to the pdf for that HW. This way we get 256 values (for every possible value of the subkey) and choose the largest one. We can see results of such attack below:

We were able to recover whole key having just 15 traces.

#### Summary

The approach used here isn’t based on the values of the subkeys, but rather on Hamming weight of some operation in the power trace. Basing the template on the possible values of the subkeys would probably allow to recover the key with even lower number of power traces from the attacked device. However, it requires more pre-processing, since there is 256 possible values for every subkey. Before attempting such attack one must carefully assess this trade-off.

As usual, source code and Jupyter notebook for the post can be found in our public repository.