Reverse engineering an unknown CRC

It's been a long time since I've been to any security CTFs, even longer time since I've written my last writeup. But, as life would have it, I came across a task during my AR adventures that was exactly like a CTF task:

I've found a non-standard CRC checksum on a firmware file, and had to reverse-engineer its parameters.

What is a writeup?

Not all of you may be familiar with CTFs and writeups. CTFs in the infosec context are competitions where you have to hack computers, applications, or both. They usually last some days, and there are 5-20 tasks, and done in teams.

After the CTF, it's common practice for the CTF teams to write so-called "writeups", which are half technical description of the task and how it was solved, and half a story of how the author came to the solution.

So prepare for 50% story, 50% information :)

The task

I had an embedded (Cortex-M4-based) device on my hand, some ~5 firmware files that I obtained fully legally.

If I booted the device while holding down the "volume UP" button, this splash screen appeared on USB-to-serial:

========================= Home Menu ==========================

==============================================================


--------------------------------------------------------------
=  Version: Vxx.xx.xx                                        =
=  Revise Date: xx-xx-xx                                     =
=  Software Name: xxxxxxx_Bootloader                         =
=  (c) COPYRIGHT xxxxxxxxxxxxxxxxxxxxxxx All rights reserved =
==============================================================

$1 ------- Download New xxxxxxxxxxx_App

$2 ------- Download xxxxProgram

$3 ------- Download xxxxxxProgram

$5 ------- Jump To xxxxx App

(It has been redacted because it's not important and I don't want to appear in specific search results)

Naturally, I pressed $ 1, and got a new prompt:

Waiting New xxxxxxx_App to be Write Flash_addr 0xxxxxxxx

Start Download xxxxx APP

Not having any better ideas, I switched to a pyserial to automate the process and with it, directly sent in one of the firmware files.

Download xxxxxApp Length is 80681 Byte!

Check New xxxxxxx_App CRC Success!

Recvd: 
 File Name : CONF7
 Size: 80681     Bytes

And it started successfully after the $5 command. OK, we are done with the positive control.

Looking at the firmware files, it was quite apparent that it didn't have any cryptographic signatures (there were no 0x80 or bigger blobs with very high entropy), and in fact the header was incredibly simple:

ee a1 19 6f 29 3b 01 00  43 4f 4e 46 37 00 56 30  |...o);..CONF7.V0|
2e 31 2e 32 00 00 a8 53  01 20 35 10 05 08 97 0d  |.1.2...S. 5.....|
05 08 f9 0d 05 08 fb 0d  05 08 fd 0d 05 08 ff 0d  |................|
05 08 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00 00 01 0e 05 08 03 0e  05 08 00 00 00 00 05 0e  |................|
05 08 f1 41 04 08 6b 10  05 08 6f 10 05 08 73 10  |...A..k...o...s.|

It may not look that simple to you, so let me break it down:

My goal here is to patch the firmware with some extra functionality that the original manufacturer didn't bother to implement.

But, as you could've guessed, modifying any bytes in the firmware binary resulted in the following error message, and the device failing to boot:

Check New xxxxxxx_App CRC Failed!

I tried some shenanigans like sending only half the firmware, or unplugging it just before any check could happen, etc., but nothing worked.

Time to figure out...

...what kind of checksum is that

Let's first list our assumptions:

(Later I've been told that some software even include the CRC field in the checksum process with dummy bytes, but fortunately it turned out this was not the case)

First, let's try the easy way: copy-pasting the bytes into an online CRC calculator and crossing fingers:

Screenshot of crccalc.com

Nope, none of them are even close to EE A1 19 6F or its inverse (11 5E E6 90).

Next try: bruteforcing parameters with pycrc:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#!/usr/bin/python3
import crc
from itertools import product

data = open("fw.bin", "rb").read()
expected_checksum = data[0] | (data[1] << 8) | (
    data[2] << 16) | (data[3] << 24)

for poly, init, revi, revo, with_size in product(
    [0x04C11DB7, 0x000000AF, 0x1EDC6F41, 0xA833982B, 0x814141AB],
    [0, 0xffffffff],
    [False, True],
    [False, True],
    [False, True]
):
    config = crc.Configuration(
        width=32,
        polynomial=poly,
        init_value=init,
        final_xor_value=0,
        reverse_input=revi,
        reverse_output=revo,
    )

    calculator = crc.Calculator(config)
    checksum = calculator.checksum(
        data[4:] if with_size else data[8:]
    )
    if checksum == expected_checksum or checksum ^ 0xffffffff == expected_checksum:
        print("Found solution: ", config)
        break

It didn't find anything.

Background

I saw that I won't really be able to brute-force this many arbitrary bits even with GPU stuff, so I resorted to the M-word: using mathemathics. Group theory, specifically.

GF(2)

Tl;dr: addition of polyomials over GF(2) is the xor operator, multiplication and division are weird bitshift and xor combinations.

To understand the stuff in this section, you need to understand GF(2).

GF(2) is a finite field field, which means it has:

GF(2) is not really useful by itself, as it operates on only a single bit.

Polynomials over GF(2) are much more useful. Let's look at these two example polynomials:

a = x2 + 1

b = x4

a + b = x4 + x2 + 1

Easy, right? Let's make it a bit more complicated:

c = x3 + x2 + x

a + c = x2 + 1 + x3 + x2 + x = x3 + x + 1

Wait what? What happened to x2? Remember that while x is a symbol in the above expression, all the constants must be either 0 or 1, since we are operating above GF(2). And according to the addition rule, x2 + x2 = (1 + 1)⋅x2 = 0⋅x2 = 0.

We will need one more operation, multiplication:

a⋅c =
(x2 + 1)⋅(x3 + x2 + x) =
x2⋅x3 + x2⋅x2 + x2⋅x + 1⋅x3 + 1⋅x1 + 1⋅x =
x5 + x4 + x3 + x3 + x1 + x =
x5 + x4 + x1 + x

It's still not really clear what this has to do with computers or anything. Let's start descending towards Earth. We will start by expanding all of our polynomial notations:

a = 0⋅x4 + 0⋅x3 + 1⋅x2 + 0⋅x1 + 1⋅x0

b = 1⋅x4 + 0⋅x3 + 0⋅x2 + 0⋅x1 + 0⋅x0

c = 0⋅x4 + 1⋅x3 + 1⋅x2 + 1⋅x1 + 0⋅x0

At this point we can leave the x-es behind:

a = 00101

b = 10000

c = 01110

The above operation results become:

a + b = 10101

a + c = 01011 (basically a xor c)

a ⋅ c = 110110

Only the last one, the multiplication is not what you would expect. This is because if you do the long multiplication, the addition part is done differently.

With pseudocode, it looks like something like this:

1
2
3
4
5
6
7
# Multiplying a and b
result = 0
while a>0:
    if a & 1:
        result ^= b
    a = a >> 1
    b = b << 1

CRC

Spoiler: CRC is literally just division over GF(2) polynomials. The CRC polynomial is the divisor, CRC checksum itself is the remainder.

It's the same as long division taught in elementary school, but spicier. Just remember that over GF(2), addition and subtraction are the same.

There's a great example of the steps on wikipedia, which I'll just copy paste here:

11010011101100 000 <--- input right padded by 3 bits
1011               <--- divisor
01100011101100 000 <--- result (note the first four bits are the XOR with the divisor beneath, the rest of the bits are unchanged)
 1011              <--- divisor ...
00111011101100 000
  1011
00010111101100 000
   1011
00000001101100 000 <--- note that the divisor moves over to align with the next 1 in the dividend (since quotient for that step was zero)
       1011             (in other words, it doesn't necessarily move one bit per iteration)
00000000110100 000
        1011
00000000011000 000
         1011
00000000001110 000
          1011
00000000000101 000
           101 1
00000000000000 100
            10 11
00000000000000 100
             1 011
-----------------
00000000000000 100 <--- remainder (3 bits).

And an example code with the CCIT-16 polynomial (a.k.a the XMODEM checksum):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
def crc16(data):
    result = 0
    for octet in data:
        result = result ^ (octet << 8)
        for bit in range(8):
            if result & 0x8000:
                result = (result << 1) ^ 0x1021
            else:
                result = result << 1
            result = result & 0xffff
    return result

CRC collisions

CRC is not designed to be secure, so finding collisions, preimages and things like that are actually easy once you know the math behind it.

We will be looking at the following equality specifically:

crc(a ^ b) = crc(a) ^ crc(b)

This should be obvious why this works, but if not, here's the same thing with math notations:

(a + b) mod crc_poly = a mod crc_poly + b mod crc_poly

which is always true (because GF(2) is a field).

This means that if you have any two messages with checksums, you can always create a third message with a correct checksum, even if you don't know anything else about them. This is very important in case of encryption algos which simply xor the message with a secret byte stream (such as RC4 or AES-CTR), because in some cases a correct checksummed message can be assembled from several secret messages.

This was used in some WEP cracks back in the day.

Further reading

There is a very interesting paper out there which shows several other interesting operations, like forcing the CRC to be always zero, or even specific value.

Unfortunately none of them helped recover the polynomial.

The first success

So I learned all the above yesterday, it was an interesting refresher for my rusty college math skills. Fortunately this little side-quest only took a few hours, which is lucky considering my untreated ADHD.

But... we were doing something important, weren't we?

Oh yeah, I still have a firmware checksum to crack.

But first, I wanted to try the xor thing. I found two firmware files that had the same length. In fact, they differed in two bits (a version character was upped from '1' to '2'). I xored them together, and got this beauty:

$ hexdump -C xxxor.bin
00000000  15 a8 97 7c 00 00 00 00  00 00 00 00 00 00 00 00  |...|............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001870  00 00 00 00 00 00 00 00  00 00 00 03 00 00 00 00  |................|
00001880  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0000fca0  00                                                |.|
0000fca1

Literally 3 bits and a supposedly correct checksum. I fixed up the size field (because size == 0 obviously did not work), and (force)fed it to the bootloader:

Download xxxxxApp Length is 64665 Byte!

Check New xxxxxxx_App CRC Success!

Recvd: 
 File Name : 
 Size: 64665     Bytes

Lol. Lmao, even.

Notice how the file name is empty (it was not redacted).

Now this is somewhat useless, but it did give some important information:

Now that we know all this, brute forcing actually became possible.

Brute forcing the CRC polynomial

I had several choices for doing the brute force:

I opted for the last one, because obviously. The tool that worked for me is called CRC RevEng. The usage was straightforward:

  1. I Modified my firmware files so they look like [data] + [checksumbytes], without the length field.
  2. Ran RevEng with the following parameters:
    ./reveng -w 32 -i 0 -x 0 -f -s fw_mod1 fw_mod2 fw_mod3 fw_mod4

It spat out a result after like 30 seconds (on a single thread):

width=32
poly=0xf4acfb13
init=0x00000000
refin=false
refout=false
xorout=0x00000000
check=0x6c9f84a8
residue=0x00000000
name=(none)

Apparently this is actually a known polynomial, known as AUTOSAR CRC. I just forgot to include this poly in the python code above.

Oh and the checksum was not little endian. It was big endian. Those sneaky bastards. I got lucky that RevEng tries for that too.

So now all I have to do is

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
def fix_crc(data):
    config = crc.Configuration(
        width=32,
        polynomial=0xf4acfb13,
        init_value=0x00,
        final_xor_value=0x00,
        reverse_input=False,
        reverse_output=False,
    )
    calculator = crc.Calculator(config)
    checksum = calculator.checksum(data[8:])
    data[0] = (checksum >> 24) & 0xff
    data[1] = (checksum >> 16) & 0xff
    data[2] = (checksum >> 8) & 0xff
    data[3] = (checksum) & 0xff

And I get my nice little SUCCESS message:

Download xxxxxApp Length is 80681 Byte!

Check New xxxxxxx_App CRC Success!

Recvd: 
 File Name : HACKD
 Size: 80681     Bytes

The device boots, and I'm free to do all kinds of modifications.

Epilogue

Hey, wait, so all that math was for nothing?

Yeah it was brute force in the end. The polynomial was even standard. Yeah I could've skipped all that math stuff if I was only a tiny bit more thorough. In fact, I could've spared you too.

But maybe the real checksum was that funky xor stuff we learned along the way.



If you need Augmented Reality problem solving, or want help implementing an AR or VR idea, drop us a mail at info@voidcomputing.hu