In this article we will show some ways to enhance our CPA on the simple password program in the time-invariant version. Our goal will be to lower the number of traces needed to recover the password. We will do it by focusing on those parts of the power traces that have significant information for our analysis.

Let’s revisit the part of the code we exploit:

unsignedc = 0;for(unsignedi=0; i<sizeof(q);i++) {

c |= q[i] ^ p[i];

}

Let’s look at the example power trace:

We can clearly see the iterations of the loop. If we focus only on those iterations in CPA analysis, we will be able to reduce the amount of noise in our analysis and we should recover the password with fewer power traces. It will work due to the fact that we focus our analysis on the parts that are connected to the particular bytes of the key we want to recover instead of taking the whole power trace into account. We expect that consecutive iterations calculate consecutive password byte comparisons.

We start our analysis by collecting 70 power traces with random 8-byte passwords. The average of the traces is below:

The average of the traces doesn’t really differ visually from the sample power trace we show above. Well, it can be yet another proof that the program is indeed time-invariant.

We will try to precisely deduce the number of the measurement when the loop starts and how many measurements one iteration takes. We can see that first iteration starts roughly at 10th point in the trace and that one iteration is about 60 measurement points long. More careful analysis with correlation coefficients and relative differences shows that the iteration with high probability starts at point 9 and is 56 points long. We will describe methods of finding this information in another post. In simple cases – like this one – it’s possible to do this by eye by overlaying iterations on top of each other until they match visually:

We use the same leak model as in the previous post about CPA – Hamming weight. The difference is that we compute Pearson’s correlation coefficients only for the parts of power traces where we expect a particular byte is processed:

def getiteration(x,i):

start = 8

length = 56

offs = length

return x[start+i*offs:start+i*offs+length]

...

for i in range(ntrc):

hdiff = hs[i] - meanh

tdiff = getiteration(trcs[i],byte) - getiteration(avg,byte)

covht += hdiff*tdiff

varh += hdiff2

vart += tdiff2

rs = covht / np.sqrt(varh * vart)

...

Using this method we managed to recover all the password bytes (“feedcafe”) in rank-0 (the highest correlation coefficient) with only 70 power traces, compared to 1000 traces from the previous post. The guesses are accurate enough that we don’t need the brute-force pass anymore.

Best guesses for byte 0:

66 f +0.888461

20 +0.863924

Best guesses for byte 1:

65 e +0.863597

25 % +0.703183

Best guesses for byte 2:

65 e +0.729099

64 d +0.654668

Best guesses for byte 3:

64 d +0.871467

60 ` +0.781315

...

Although the results are quite satisfying, we can improve our analysis further. We focus on iterations, but there are many things going on in them. There’s checking condition for the loop, the loop counter is incremented and there are also many operations on the lower level, like loading variables to registers. If we could pinpoint at which exact point a xor is performed, we would be able to base our CPA solely on these points in the power trace.

Turns out it’s not that hard to find those points. We need a few hundred traces once to determine those points-of-interest – let’s call this a “profiling” pass. Once we have a created a profile we can reuse it to attack new passwords with very few traces (those familiar with side-channel attacks will notice what we’re doing – we are gradually building towards a Template Attacks).

Let’s have a look at where points of high correlation for each byte:

We can conclude that xor is performed at points 29-32 of each iteration, which is actually one instruction of the processor (we collect 4 samples per cycle).

We will do one more improvement. Last time we did not care about the quality of our guesses. There were times where difference in correlation between rank-0 and rank-1 guesses was very small (say 0.643 rank-0 vs 0.642 rank-1) indicating that our guess for this byte was not very solid. This is also why we needed a brute-force pass at the end.

This time we would like to eliminate this flaw and attempt to guess the password directly, without a brute-force. We will require that the relative difference in correlation between rank-0 and rank-1 guess is at least N% – let’s call this number a quality constraint. This is an arbitrarily chosen value. The higher it is, the higher the probability that will recover the password directly, but also the larger the number of traces we will need.

Let’s start with a quality constraint of 8%. When we use the whole iteration in our calculations it turns out that we need more than 100 traces. However, if we limit ourselves only to the “hot” 3 samples at 29-31, we can recover the password with 49,83 traces on average with success rate of about 83.3% (25/30).

For a quality constraint of 10% we need an average of 65.1 traces needed and 96.6% (58/60) success rate.

Here we have some screenshots of successful trials:

In future posts we will discuss some countermeasures that can be applied to protect against these improved attacks.

As usual, our notebooks and source code can be found in our repository:

https://gitlab.com/myrelabs/devblog/tree/master/chipwhisperer/firmware/simple_pass_CPA_enhanced