Writing a custom bootloader for FE310 RISC-V (Part 1)

In my last blog post, I talked about a RISC-V target for ChipWhisperer. I went through the bring-up process using a somewhat legacy Freedom E SDK revision, connecting to the microcontroller through an on-board JTAG adapter (FT2232HL). The board lacked a standard bootloader and basically required running the code from the host using an OpenOCD + GDB combo. This has several drawbacks, starting with occupying all USB slots on my laptop 😉 (basically CW capture board, FTDI, wireless transceiver for my mouse, and I had to run the logic analyzer from a different machine…). Also, the RISC-V flashing setup would not play nice in WSL, and I’m quite used to compiling firmware there (or in another virtualized/remote environment) and flashing it directly through the connected ChipWhisperer capture board.

Getting back to the microcontroller at hand, the FE310 comes with an OTP bootloader programmed with a jump to 0x20000000. The SDK, however, builds software with a start address of 0x20400000, assuming there is already a bootloader at 0x20000000 that then jumps to 0x20400000. An example of such a bootloader can be found in older SDK versions as double_tap_dontboot.c.

In this blog post series, I’ll focus on designing and writing a custom bootloader that is a drop-in replacement for double_tap_dontboot and provides enough functionality to allow flashing directly from the CW software. If you are not interested in CW, but want to check out what I’ve done with the RISC-V – keep on reading, just skip the boring parts where I talk about hooking the monstrosity up to the Python scripts.

Building double_tap_dontboot

In order to learn how to write, build and flash a bootloader, I first tried to build the default HiFive1 bootloader, called double_tap_dontboot.

As the first step, I modified the SDK BSP for the HiFive1 board, by adding another, custom LINK_TARGET. I started by simply copying freedom-e-sdk/bsp/env/freedom-e300-hifive1/flash.lds into freedom-e-sdk/bsp/env/freedom-e300-hifive1/bootloader.lds and modifying the following:

# Replace line
flash (rxai!w) : ORIGIN = 0x20400000, LENGTH = 512M
# With
flash (rxai!w) : ORIGIN = 0x20000000, LENGTH = 4M

Building the bootloader with those changes is quite simple. Flashing, however, required changing the procedure: a regular upload only unlocks pages 64-255, while pages 0-63 stay protected as the “bootloader” pages. As a temporary workaround, I simply modified the flashing procedure by changing the protected region from 0 64 to 0 0. In the main Makefile:

# In freedom-e-sdk/Makefile:
# Replace line
GDB_UPLOAD_CMDS += -ex "monitor flash protect 0 64 last off"
# With
GDB_UPLOAD_CMDS += -ex "monitor flash protect 0 0 last off"

After flashing, remember to set the protected region back to default. Alternatively, you can simply make this choice conditional on the LINK_TARGET, but then you have to add the variable to the upload procedure as well:

ifeq ($(LINK_TARGET),bootloader)
    GDB_UPLOAD_CMDS += -ex "monitor flash protect 0 0 last off"
else
    GDB_UPLOAD_CMDS += -ex "monitor flash protect 0 64 last off"
endif

The commands I used to build and flash are as follows:

make software PROGRAM=double_tap_dontboot BOARD=freedom-e300-hifive1 LINK_TARGET=bootloader

make upload PROGRAM=double_tap_dontboot BOARD=freedom-e300-hifive1
# or, if you've made the GDB commands conditional on the link target:
make upload PROGRAM=double_tap_dontboot BOARD=freedom-e300-hifive1 LINK_TARGET=bootloader

I reflashed the board with the led_fade example I discussed in the first RISC-V blog post and pressed the nRST button. The LEDs started flashing after about 2.5 seconds, which told me that flashing the bootloader was successful.

Designing a simple serial bootloader

In our scenario, we want to enable the user to perform a full re-flash of the device without a dedicated USB cable or JTAG probe, simply by downloading the firmware over the serial interface. This is similar to how the STM32F programmer works in the ChipWhisperer software (using the standard STM32F bootloader firmware), or what is available on numerous Kinetis MCUs via the Kinetis bootloader.

First and foremost, we need a way to tell the bootloader to run in a special download mode, in which we can trigger its commands for reading, writing, erasing flash, etc. Luckily, double_tap_dontboot does exactly that! It waits briefly (half a second, to be precise) for the user to press the reset button again just after starting. If a second reset arrives within that period, the bootloader enters a “safe state”: an infinite loop waiting for a debugger to connect.

Given we can easily trigger resets from the ChipWhisperer board, this is a perfect way to trigger the bootloader into this special mode. As far as I know, the target board exports only a couple of GPIO pins corresponding to CW pins 1, 2 and 4, so triggering a dedicated boot pin would interfere with trigger and/or serial functionalities.

Triggering reset from ChipWhisperer software

Manually triggering reset of the target device is quite easy via the scope interface. There is a dedicated io pin for nrst and the target boards in general respect that.

The tricky part was determining the right timing. double_tap_dontboot waits for 500ms before proceeding to the regular boot, but there is an additional, unknown delay between triggering the reset and the start of this window. I determined experimentally that on my board a gap shorter than 1315ms between the two resets does not trigger the “double tap”, and neither does a gap longer than 1800ms. I chose a safe value of 1500ms, but this may depend on the actual default core clock, temperature, voltage, etc. Recall the trouble I had with synchronizing UART when using the internal high-frequency clock (see my previous blog post here).

In general, though, this is the simple code that triggers the “dontboot” loop in the double_tap_dontboot bootloader firmware, and which I’ll use later for triggering my own bootloader:

scope.io.nrst = 'low'
time.sleep(0.005)
scope.io.nrst = 'high'
time.sleep(1.500)  # Values ranging from 1.316 to 1.800 worked fine on my setup at 28.3 deg C and 1.767V at VCC
scope.io.nrst = 'low'
time.sleep(0.005)
scope.io.nrst = 'high'
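
The same sequence can be wrapped in a small helper so that the gap between the two resets is easy to tune per board. This is only a sketch: double_tap_reset and its set_nrst parameter are names I made up for illustration, and the defaults are simply the values from the snippet above.

```python
import time

def double_tap_reset(set_nrst, gap_s=1.5, pulse_s=0.005, sleep=time.sleep):
    """Pulse nRST twice, gap_s seconds apart, to request the "dontboot" loop.

    set_nrst is a hypothetical callable taking 'low' or 'high'. With
    ChipWhisperer it could be: lambda level: setattr(scope.io, "nrst", level)
    """
    for wait_after in (gap_s, 0.0):
        set_nrst('low')
        sleep(pulse_s)
        set_nrst('high')
        if wait_after:
            # Only the first pulse is followed by the long gap.
            sleep(wait_after)
```

Passing the pin setter and the sleep function as arguments keeps the helper independent of the capture hardware, so the timing logic can be checked without a board attached.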

Modifying double_tap_dontboot

In terms of hooking up to the existing bootloader, it turned out to be very easy. I simply added a call to a custom function I called myre_bootloader_init just before the infinite loop waiting for debugger/reset, and replaced the asm volatile wait within that loop with a custom “loop handler” function I called myre_bootloader_loop. The changes are highlighted below:

    if (AON_REG(AON_BACKUP15) == BACKUP15_MAGIC) {
      // Reset was "double-tapped".

      // Re-arm the reset double-tap
      AON_REG(AON_BACKUP15) = 0;

      // PWM Red LED

      GPIO_REG(GPIO_IOF_EN)     |=   (1 << RED_LED);
      GPIO_REG(GPIO_OUTPUT_XOR) &=  ~(1 << RED_LED);
      GPIO_REG(GPIO_IOF_SEL)    |=   (1 << RED_LED);

      GPIO_REG(GPIO_OUTPUT_VAL) &=  ~(1 << GREEN_LED);
      GPIO_REG(GPIO_OUTPUT_XOR) &=  ~(1 << GREEN_LED);
      GPIO_REG(GPIO_OUTPUT_EN)  &=  ~(1 << GREEN_LED);

      PWM1_REG(PWM_CFG)   = 0;
      PWM1_REG(PWM_COUNT) = 0;
      PWM1_REG(PWM_CMP0)  = 0xFF;
      PWM1_REG(PWM_CMP3)  = 0xFF;
      PWM1_REG(PWM_CFG)   = PWM_CFG_ENALWAYS;

      int pwm_val = 255;

      // !!! CHANGE HERE !!!
      // I added this function to initialize with a semi-known clock, UART baudrate, etc.
      myre_bootloader_init();
	
      // Wait for debugger or another RESET press.
      while(1){

        // Make the PWM a fade. This is preferable to just a PWM blink
        // because it makes it clear that the processor is actually
        // running this code, not just the PWM hardware.

        now = *((volatile uint64_t*) (CLINT_CTRL_ADDR + CLINT_MTIME));
        then = now + 32768/500;
        while (*((volatile uint64_t*) (CLINT_CTRL_ADDR + CLINT_MTIME)) < then) {
          // !!! CHANGE HERE !!!
          // I replaced `asm volatile ("");` with the following function call
          // to handle commands incoming via serial interface
          myre_bootloader_loop();
        }
        pwm_val = (pwm_val == 0) ? 255 : (pwm_val -1);
        PWM1_REG(PWM_CMP3) = pwm_val << 4;
      }
           
    } // If Magic

Simple bootloader commands

For the first iteration I wanted to create a simple, (un)limited bootloader, which could be sufficient for uploading and running simple functions from RAM, writing and reading arbitrary memory regions. For this, I really needed only three commands:

  • READ for reading arbitrary memory regions
  • WRITE for writing, obviously
  • CALL for calling user functions, possibly with arguments

Additionally, I decided to add a fourth, simple command to check the actual capabilities of the bootloader:

  • GET VERSION for checking what functions are implemented in firmware

The last function may prove to be especially helpful if at some point the communication protocol or data format changes for some of the functions and the bootloader will be out of sync with the programmer version.

Each of the functions obviously needs an interface. For READ and WRITE I decided to simply take a starting address and a length as parameters, with READ dumping the data it reads into the serial stream and WRITE feeding data from the stream into memory. CALL, in turn, accepts a function address and a void* data pointer, which is passed to the function as its parameter (the user can simply provide NULL if the function ignores it). GET VERSION accepts a single byte denoting a command and returns 0xFF if the command is not supported, or its version otherwise. Versions start at 0 and increase with each functional change (e.g. adding new functionality, changing the API, etc.).

Since we’re using a protocol as unreliable as UART, we need some way to check the integrity of both commands and data. We’ll get back to this when designing the packet format, but for now I should stress that the data being read or written must also be protected with a checksum, no matter how trivial.

Summarizing:

  • GET VERSION – Returns the version of protocol command cmd (0 is the default version if no major revisions were done); If the command is not valid, returns 0xFF.
    • 1 byte argument: cmd
  • READ – Reads up to size bytes from address addr and writes them back to serial; the stream is finished with a standard checksum.
    • 4 byte argument: addr
    • 2 byte argument: size
  • WRITE – Reads up to size bytes from serial and writes them under address addr; the stream must be finished with a standard checksum.
    • 4 byte argument: addr
    • 2 byte argument: size
  • CALL – Calls a function under address func; pass a pointer args as a void* pointer to that function.
    • 4 byte argument: func
    • 4 byte argument: args

Simple bootloader protocol

We are using a very simple protocol with minimal data integrity validation: a simple “parity checksum”. To compute it, XOR all bytes of the data together and then invert all bits. When verifying, the XOR of all bytes, including the checksum byte, must equal 0xFF. If this description is not clear enough, here is some simple pseudocode:

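def compute_checksum(data):
    # Starting from 0xFF yields the inverted XOR of all data bytes.
    cs = 0xFF
    for b in data:
        cs ^= b
    return cs

def verify(data_with_checksum):
    # The XOR of all bytes, including the checksum byte, must equal 0xFF.
    cs = 0x00
    for b in data_with_checksum:
        cs ^= b
    return cs == 0xFF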

The protocol itself consists of a header byte containing a command ID (unique for each command, 0x00 and 0xFF are reserved as invalid to quickly detect broken streams), followed by an appropriate number of argument bytes (see command list) and finally the checksum. Note that the packet length is fully determined by the command type (first byte), but additional data may follow (like in case of the WRITE command) and it must be protected by its own checksum (therefore the additional data is not directly part of the packet and thus is not parsed as such).

Multi-byte values (such as 16-bit lengths and 32-bit pointers) are passed in a native byte order, i.e. in Little Endian. This may be confusing when reading the packets directly, but has obvious advantages on the device side (e.g. direct memcopies can be applied).

For the commands described in the last section, I’ve assigned the following IDs:

  • GET VERSION – 0x01
  • READ – 0x02
  • WRITE – 0x03
  • CALL – 0x21

An example of READ packet that reads 4 bytes from address 0x21FEC is:

[02] [EC1F0200] [0400] [08]
 ^-------------------------- READ header (0x02)
      ^--------------------- READ addr field (0x00021FEC)
                 ^---------- READ length (4 bytes)
                        ^--- Checksum
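
To double-check the layout, here is a minimal host-side sketch that assembles such packets from the command IDs and argument sizes listed above. The names COMMANDS, checksum and build_packet are mine and purely illustrative; the "<" in the struct format strings selects little-endian packing.

```python
import struct

# Command ID and argument layout per command ("<" = little-endian).
COMMANDS = {
    "GET_VERSION": (0x01, "<B"),   # cmd
    "READ":        (0x02, "<IH"),  # addr, size
    "WRITE":       (0x03, "<IH"),  # addr, size
    "CALL":        (0x21, "<II"),  # func, args
}

def checksum(data):
    """Inverted XOR of all bytes in data."""
    cs = 0xFF
    for b in data:
        cs ^= b
    return cs

def build_packet(name, *args):
    """Return header byte + packed arguments + checksum byte."""
    cmd_id, fmt = COMMANDS[name]
    body = bytes([cmd_id]) + struct.pack(fmt, *args)
    return body + bytes([checksum(body)])

print(build_packet("READ", 0x21FEC, 4).hex())  # prints 02ec1f0200040008
```

The printed bytes match the hand-assembled READ packet above, which is a handy sanity check for both the byte order and the checksum.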

In response, firmware must check that the command ID is valid and supported (that’s actually the first thing to be checked, because it determines how many parameter bytes must be read), then read the arguments and verify the checksum. If any of the above fails (i.e. the command is not valid/supported or the checksum doesn’t match), a NACK byte must be sent together with a simple ERROR byte. If the checks pass, however, even if the command would fail for whatever reason, an ACK byte is sent and the command starts executing.

Note that the ACK byte is always sent if the packet is correct, simply to decouple the command logic from the protocol logic. If the packet is consistent but the command is somehow malformed, the protocol logic should not care and should leave the error reporting to the command logic.

Below are the defines I used in my implementation for the ACK/NACK/ERROR bytes:

#define MYRE_BOOTLOADER_ACK    0x31
#define MYRE_BOOTLOADER_NACK   0xCE

#define MYRE_BOOTLOADER_ERR_INVALID_CMD    0x81
#define MYRE_BOOTLOADER_ERR_BAD_CHECKSUM   0x82

Therefore, a proper response for the above request (READ 4 bytes from 0x21FEC) would be:

[31] [08000000] [f7]
 ^------------------- ACK
      ^-------------- READ data
                 ^--- READ checksum
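
On the host side, such a response could be checked roughly as follows. This is a sketch under my own assumptions: parse_read_response is a hypothetical helper, and it expects the whole response (ACK, data, checksum) to have already been read from the serial port into raw.

```python
ACK, NACK = 0x31, 0xCE

def parse_read_response(raw, size):
    """Return the data bytes of a READ response, or raise ValueError."""
    if not raw:
        raise ValueError("empty response")
    if raw[0] == NACK:
        # A NACK is followed by a single error byte.
        code = "0x%02X" % raw[1] if len(raw) > 1 else "missing"
        raise ValueError("NACK, error code " + code)
    if raw[0] != ACK:
        raise ValueError("unexpected first byte 0x%02X" % raw[0])
    block = raw[1:size + 2]  # data bytes plus the trailing checksum byte
    if len(block) != size + 1:
        raise ValueError("truncated response")
    total = 0
    for b in block:
        total ^= b
    if total != 0xFF:  # XOR over data and checksum must give 0xFF
        raise ValueError("bad data checksum")
    return bytes(block[:-1])

data = parse_read_response(bytes.fromhex("3108000000f7"), 4)
print(int.from_bytes(data, "little"))  # prints 8
```

Note that the ACK byte is not covered by the data checksum; only the data block is.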

Another example, GET VERSION of 0x00 and 0x02:

Malformed packet (bad checksum):
Host:   [01] [00] [99]
         ^------------- GET VERSION header
              ^-------- GET VERSION cmd (0x00)
                   ^--- checksum (invalid)
Target: [CE] [82]
         ^------------- NACK
              ^-------- ERR_BAD_CHECKSUM

Invalid command (trying to call cmd 0x00):
Host:   [00] [10] [EF]
         ^------------- ??? header (invalid)
              ^-------- ??? param (invalid)
                   ^--- checksum (valid)
Target: [CE] [81] [CE] [81] [CE] [81]
         ^------------- NACK
              ^-------- ERR_INVALID_CMD
                   ^------------- NACK
                        ^-------- ERR_INVALID_CMD
                             ^------------- NACK
                                  ^-------- ERR_INVALID_CMD

Valid command, invalid target (GET VERSION: command 0x00 not supported):
Host:   [01] [00] [FE]
         ^------------- GET VERSION header
              ^-------- GET VERSION cmd (0x00)
                   ^--- checksum (valid)
Target: [31] [FF]
         ^------------- ACK
              ^-------- GET VERSION response (invalid/unsupported command)

Valid command, valid target:
Host:   [01] [02] [FC]
         ^------------- GET VERSION header
              ^-------- GET VERSION cmd (0x02)
                   ^--- checksum (valid)
Target: [31] [00]
         ^------------- ACK
              ^-------- GET VERSION response (version 0)

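Note the repeated NACK in the invalid-command case. After rejecting the header 0x00, the target treats each following byte (0x10, then 0xEF) as the header of a new packet, and since neither is a valid command ID, it answers each with another NACK. If the host waited a split second after the header before sending the rest of the packet, it could pick up the first NACK and stop before the target mistakes the remaining bytes for further commands.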
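
The exchanges above can be reproduced with a small host-side model of the target's protocol layer. This is not the firmware, just a sketch of the rules described so far; target_respond, ARG_LEN and VERSIONS are illustrative names, and I assume every implemented command reports version 0.

```python
ACK, NACK = 0x31, 0xCE
ERR_INVALID_CMD, ERR_BAD_CHECKSUM = 0x81, 0x82

# Number of argument bytes per command ID (GET VERSION, READ, WRITE, CALL).
ARG_LEN = {0x01: 1, 0x02: 6, 0x03: 6, 0x21: 8}
# Assumption: all implemented commands are at version 0.
VERSIONS = {cmd: 0 for cmd in ARG_LEN}

def target_respond(stream):
    """Model the replies to a host byte stream; only GET VERSION is executed."""
    out = bytearray()
    i = 0
    while i < len(stream):
        cmd = stream[i]
        if cmd not in ARG_LEN:
            # Unknown header: reject this one byte and resynchronize.
            out += bytes([NACK, ERR_INVALID_CMD])
            i += 1
            continue
        packet = stream[i:i + ARG_LEN[cmd] + 2]  # header + args + checksum
        if len(packet) < ARG_LEN[cmd] + 2:
            break  # incomplete packet; real firmware would keep waiting
        i += len(packet)
        total = 0
        for b in packet:
            total ^= b
        if total != 0xFF:
            out += bytes([NACK, ERR_BAD_CHECKSUM])
            continue
        out.append(ACK)
        if cmd == 0x01:
            # GET VERSION: 0xFF marks an unsupported command.
            out.append(VERSIONS.get(packet[1], 0xFF))
    return bytes(out)

print(target_respond(bytes.fromhex("0010EF")).hex())  # prints ce81ce81ce81
```

Because the packet length is only known once the header is accepted, an unknown header consumes exactly one byte, and that is where the three NACKs come from.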

In the next part(s)…

… I want to show you:

  • How I implemented the basic protocol, including the general program structure and function flows
  • How to interact with the bootloader via the ChipWhisperer software (low-level)
  • How the basic bootloader can be used for basic stuff such as loading programs into RAM and executing them from there
  • How more advanced functions could work (and maybe their implementation), like flash erase, flash protect, flash write, changing clock sources and baudrates, etc.
