A computer worm is a program that replicates and propagates itself across a network. Worms are typically associated with malware, but this project deploys one for good! In applications that involve a large number of independent microcontroller-based devices (e.g. swarm robotics or virtual cattle fencing), the bottleneck in the research and development process can be the time that it takes to plug in and reprogram each individual device.
One scalable approach to reprogramming a huge collection of devices is to reprogram a single device, and to have that program spread (like a virus or a computer worm) to other devices that it encounters. You then get an exponential spread of firmware updates across the swarm.
This webpage describes a custom bootloader that allows for firmware updates to propagate from device to device. The particular mechanism for wireless data transfer in this project is infrared, though the bootloader is written so that it could be modified for radio, BLE, LoRa, or something else.
This project is an augmentation of a previous project: a custom serial bootloader for the RP2040.
In order to avoid bloating this webpage too much, I've consolidated prerequisite reading onto a few other webpages. I strongly encourage folks to read these other webpages before returning to this one.
This webpage describes a custom serial bootloader which we will modify so that it behaves like a worm. As described on that webpage, this serial bootloader lives directly above the stage 2 bootloader in RP2040 memory, and receives new executable programs via a wireless UART interface. This bootloader has the ability to replace existing user application code, and it has the ability to detect and branch to valid programs in flash memory. The user can force the device into bootloader mode via a GPIO port, or an application program can force the device into bootloader mode by writing to a watchdog scratch register and then executing a soft reset.
When the system comes out of reset, the bootloader asks itself the following questions, in the following order:
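1. Is the user or an application program forcing me into bootloader mode?
- Yes: Skip straight to step 3.
- No: Go to step 2.
2. Is there a valid program in flash memory?
- Yes: Branch straight to it!
- No: Go to step 3.
3. Wait to receive a new program over the serial link.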
The mechanisms by which it asks and answers these questions are discussed at length on the serial bootloader webpage. In order to allow for the RP2040 to share its own application code in a worm-like fashion, we are going to modify the above sequence of questions and actions to the following:
In the diagram above green arrows indicate affirmative answers to questions, red arrows indicate negative answers, and black arrows indicate unconditional transitions.
The serial bootloader webpage describes each part of this state machine except for the boxes drawn in orange. These are the augmentations which are required to turn the bootloader into a worm, and the ones that the rest of this webpage will discuss.
To decide whether it should share its valid program, the bootloader checks whether the user (or a program which forced a soft reset) is commanding the RP2040 to enter worm mode. The user can force the system into worm mode by holding down a button attached to a GPIO port, and a program can force the system into bootloader mode using a watchdog peripheral scratch register.
The watchdog peripheral can perform a soft reset of the RP2040, and it offers eight scratch registers which retain their value through a soft reset (but, importantly, not through a hard reset). These scratch registers are described on page 548 of the RP2040 datasheet. If a program wants to force the system back into bootloader mode, it writes a 1 to watchdog scratch register 1 and then forces a soft reset. After the soft reset, the bootloader will check both a GPIO and scratch register 1 for an answer to question 1 above.
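As a rough illustration of the check described above, here is a minimal host-side sketch of the decision. The function name and the flag macro are hypothetical and not taken from the bootloader source. On the RP2040 the two inputs would presumably come from reading the button GPIO and watchdog scratch register 1.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical name for the value the application writes (the text says "a 1").
#define BOOT_REQUEST_FLAG 1u

// Returns true when either the button is held, or the scratch value that
// survived the soft reset carries the request flag. After a hard reset the
// scratch register does not retain its value, so only the button can
// trigger the request.
bool boot_request_pending(bool button_held, uint32_t scratch1)
{
    return button_held || (scratch1 == BOOT_REQUEST_FLAG);
}
```

Keeping the decision in a pure function like this makes it easy to exercise on a desktop before touching hardware.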
In the event that the bootloader is sent into worm mode by a GPIO port or by a watchdog scratch register, it calls the function shareApplication(). This function is linked here, and discussed section by section below.
The first thing that the shareApplication() function does is initialize the UART channel. It is initialized for 1 stop bit, 8 data bits (since we'll be sending characters), default parity, and a baud rate of 4800. The baud rate is 4800 for compatibility with the wireless infrared UART interface.
// Setup UART
// Initialize the UART channel
uart_init(UART_ID, BAUD_RATE) ;
// Set our data format
uart_set_format(UART_ID, DATA_BITS, STOP_BITS, PARITY);
// Turn off FIFO's - we want to do this character by character
uart_set_fifo_enabled(UART_ID, false);
// Set the TX and RX pins by using the function select on the GPIO
gpio_set_function(UART_TX_PIN, GPIO_FUNC_UART);
gpio_set_function(UART_RX_PIN, GPIO_FUNC_UART);
gpio_pull_up(UART_RX_PIN) ;
The shareApplication() function then initializes and starts a PWM channel to run with a 50% duty cycle at 56 kHz. This drives the IR LED which the UART TX line modulates on and off, as described on the wireless infrared via UART page.
// Initialize PWM
// Tell the PWM pin that it is allocated to the PWM
gpio_set_function(PWM_PIN, GPIO_FUNC_PWM);
// Find out which PWM slice is connected to the PWM pin
uint slice_num = pwm_gpio_to_slice_num(PWM_PIN);
// Set a period of 2233 cycles (0 to 2232 inclusive), which is
// about 56 kHz assuming a 125 MHz system clock
pwm_set_wrap(slice_num, 2232);
// Set channel B output high for 1116 cycles (about 50% duty) before dropping
pwm_set_chan_level(slice_num, PWM_CHAN_B, 1116);
// Set the PWM running
pwm_set_enabled(slice_num, true);
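The wrap and level values used above can be sanity-checked with a little arithmetic. The sketch below is a host-side calculation, not part of the bootloader. It assumes the default 125 MHz RP2040 system clock, a PWM clock divider of 1, and free-running (not phase-correct) counting. The function names are illustrative.

```c
#include <stdint.h>

// Assumed default RP2040 system clock; change this if clk_sys is reconfigured.
#define ASSUMED_SYS_CLK_HZ 125000000u

// PWM output frequency in Hz (rounded down). The counter runs from 0 to
// wrap inclusive, so one period lasts wrap + 1 clock cycles.
uint32_t pwm_freq_hz(uint32_t sys_clk_hz, uint16_t wrap)
{
    return sys_clk_hz / ((uint32_t)wrap + 1u);
}

// Duty cycle in tenths of a percent (rounded down): the output is high
// for `level` cycles out of every wrap + 1.
uint32_t pwm_duty_permille(uint16_t wrap, uint16_t level)
{
    return ((uint32_t)level * 1000u) / ((uint32_t)wrap + 1u);
}
```

Under those assumptions, a wrap of 2232 gives roughly 55.98 kHz, and a level of 1116 gives a duty cycle just under 50%.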
The function also initializes the LED pin, because it's useful to see the LED flash as the handshakes occur.
// Initialize the LED pin
gpio_init(LED_PIN);
// Configure the LED pin as an output
gpio_set_dir(LED_PIN, GPIO_OUT);
We are going to read the contents of flash memory and send those contents, over the wireless serial link, to the receiving RP2040. We'll start reading the contents of flash memory at the very beginning of the application code.
// The pointer to flash memory
uint8_t* flash_pointer = application_start ;
Per the modified linker script discussed on the serial bootloader webpage, the application code starts 12 KiB (12*1024 bytes) into flash, or at an address of 0x10003000. The global application_start is initialized as shown below.
// Application program offset in flash (12*1024)
// This should agree with the linker script for the
// application program.
#define PROGRAM_OFFSET (12 * 1024)
// Application information for programming another Pico
uint8_t *application_start = (uint8_t *)(XIP_BASE + PROGRAM_OFFSET) ;
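To make the address arithmetic concrete, the following host-side sketch computes the application start address from an offset and splits it into the two 16-bit halves that the Intel hex format carries separately. The macro and function names are illustrative. The 0x10000000 base is the start of the RP2040 XIP flash window, and the 12 KiB offset is the one quoted in the text.

```c
#include <stdint.h>

// Start of the RP2040 execute-in-place flash window.
#define XIP_BASE_ADDR 0x10000000u

// Absolute address of the application for a given offset into flash.
uint32_t app_start_address(uint32_t program_offset)
{
    return XIP_BASE_ADDR + program_offset;
}

// Upper 16 bits: carried by an extended linear address record.
uint16_t addr_upper16(uint32_t addr)
{
    return (uint16_t)(addr >> 16);
}

// Lower 16 bits: carried in the address field of each data record.
uint16_t addr_lower16(uint32_t addr)
{
    return (uint16_t)(addr & 0xFFFFu);
}
```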
The worm sends data to the receiving RP2040 in accordance with the Intel hexadecimal object file format. This file format is discussed at length on the serial bootloader webpage but, in short, each line includes an address (which will come from the flash_pointer), a length (all will be 16 bytes long, arbitrarily chosen), a record type, the data itself, and a checksum. The checksum is the 2's complement of the LSB of the sum of all bytes in the packet which precede the checksum. The DMA sniffer, described on page 97 of the RP2040 datasheet, provides an easy mechanism for computing this checksum. The code below acquires an unused DMA channel, configures it for 8-bit (1-byte) transfers, and configures the associated DMA sniffer. The DMA channel will simply move the packet to a dummy array, computing the checksum as it does so.
// Claim and configure a DMA channel
dma_chan_1 = dma_claim_unused_channel(true) ;
// Configure the first channel (performs checksum)
dma_channel_config c1 = dma_channel_get_default_config(dma_chan_1);
channel_config_set_transfer_data_size(&c1, DMA_SIZE_8);
channel_config_set_read_increment(&c1, true);
channel_config_set_write_increment(&c1, true);
dma_channel_configure(
dma_chan_1, // Channel to be configured
&c1, // Config. we just created
data_binary_dummy_dest, // write address
data_binary, // The initial read address
20, // Number of transfers
false // Don't start immediately.
);
// Configure the sniffer!
dma_sniffer_enable(dma_chan_1, 0x0F, true);
// Clear the sniffer accumulator (setting bits with a mask of 0 would do nothing)
dma_sniffer_set_data_accumulator(0x00000000) ;
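For reference, the checksum that the sniffer accumulates in hardware can be reproduced in a few lines of plain C. This is a software equivalent for checking results on a host, not the code the bootloader runs. The function name is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

// Intel hex checksum: the two's complement of the least-significant byte
// of the sum of every byte that precedes the checksum in the record.
uint8_t ihex_checksum(const uint8_t *bytes, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += bytes[i];
    }
    return (uint8_t)((~sum + 1u) & 0xFFu);
}
```

A useful property of this definition is that summing all bytes of a valid record, checksum included, yields zero in the low byte.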
Before sending any valid packets, the worm sends some packets with incorrect checksums. It sends ten of these, separated by about half a second. In the event that the application code which is running on the receiving device can receive UART messages, these dummy messages can be used to tell the other device to force itself into bootloader mode. Once in bootloader mode, any message which fails a checksum is simply ignored.
// Get the other device into bootloader mode
for (int i=0; i<10; i++) {
for (int k=0; k<17; k++) {
uart_putc(UART_ID, to_boot[k]) ;
sleep_ms(UART_SMALL_DELAY) ;
}
sleep_ms(UART_DELAY) ;
}
The to_boot array that is being sent is a statically-stored array which encodes an extended linear address message with an incorrect checksum. This is stored globally, and copied below.
unsigned char to_boot[17] = {':','0','2','0',
'0','0','0','0',
'4','1','0','0',
'0','E','B','\n', '\r'} ;
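To see why this line is rejected, here is a small host-side validator that a receiver could apply to an incoming line. It is a sketch with illustrative names, not the bootloader's own parser. It only checks the leading colon and the checksum, and does not check that the length field matches the payload.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Value of one ASCII hex digit, or -1 if the character is not a hex digit.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// True when the line starts with ':' and all of its bytes, checksum
// included, sum to zero modulo 256. Parsing stops at the first character
// that is not part of a hex pair (for example the line terminator).
bool ihex_line_ok(const char *line)
{
    if (line[0] != ':') return false;
    uint32_t sum = 0;
    size_t pairs = 0;
    const char *p = line + 1;
    while (hex_value(p[0]) >= 0 && hex_value(p[1]) >= 0) {
        sum += (uint32_t)(hex_value(p[0]) * 16 + hex_value(p[1]));
        p += 2;
        pairs++;
    }
    // A record needs at least length, two address bytes, type, and checksum.
    return pairs >= 5 && (sum & 0xFFu) == 0;
}
```

The bytes 02 00 00 04 10 00 sum to 0x16, so the correct checksum is 0xEA. The to_boot array ends in EB instead, which is why the line fails the check.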
The receiving device responds to every packet that it receives with a single character. Because the FIFO is disabled for the UART, reading a single character clears the receive buffer for subsequent transactions.
// The FIFO is disabled. One read will clear the rx register
uart_getc(UART_ID) ;
As described on the serial bootloader page, the bootloader state machine responds to a message with the extended linear address record type by preparing to receive a new program. It resets its own programming pointer, clears the first sector of memory, and clears the page buffer in preparation for subsequent data packets to arrive. We send one of these messages to get the receiving device ready to receive the program. Note that we keep sending the message for as long as the receiving device responds to those messages with the character A. This is the receiving device telling us "please send that again." As soon as it's understood the message (i.e., the checksum passed), it will respond with the character B.
// Send the hexline which indicates start of file
do {
for (int k=0; k<17; k++) {
uart_putc(UART_ID, start_of_file[k]) ;
}
}
while (uart_getc(UART_ID)=='A') ;
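The resend-until-acknowledged pattern can be tried out on a host by swapping the UART for a scripted stand-in. Everything in the sketch below (the mock_link_t type and the function names) is hypothetical and exists only to illustrate the A/B handshake. It is not code from the bootloader.

```c
#include <stddef.h>

// Hypothetical stand-in for the UART link: replies are scripted in advance.
typedef struct {
    const char *replies;   // one reply character per transmitted line
    size_t next;           // index of the next reply to hand back
    unsigned lines_sent;   // how many times a line was transmitted
} mock_link_t;

// Pretend to transmit a line; only counts the transmission.
static void mock_send_line(mock_link_t *link, const char *line)
{
    (void)line;
    link->lines_sent++;
}

// Return the receiver's next scripted reply.
static char mock_read_reply(mock_link_t *link)
{
    return link->replies[link->next++];
}

// Send the line, and keep resending for as long as the reply is 'A'
// ("please send that again"). Returns the number of transmissions.
unsigned send_until_accepted(mock_link_t *link, const char *line)
{
    do {
        mock_send_line(link, line);
    } while (mock_read_reply(link) == 'A');
    return link->lines_sent;
}
```

Like the loop above, this sketch retries indefinitely if the receiver never stops answering A. A cap on the number of attempts would be a reasonable addition in a deployed system.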
The worm then sends its own application code, 16 bytes at a time. It does so by populating an array called data_binary according to the Intel hexadecimal object file format. The first byte in the array represents the length of the data payload (0x10, or 16 bytes). The second byte represents the most-significant 8 bits of the least-significant 16 bits of the base address for the data payload (the most-significant 16 are in the extended linear address packet, per the Intel standard). The next byte is the least-significant 8 bits of the least-significant 16 bits of the address. The next byte is the record type (0x00, which represents "data"). And then we populate the next 16 bytes with the data at the address pointed to by flash_pointer, incrementing flash_pointer each time.
The code then clears the DMA sniffer register, resets the DMA read and write address, starts the DMA channel and waits for it to finish. The DMA sniffer computes a checksum on the data in-flight, and then the last byte of the data binary is populated with the 2's complement of the least-significant byte of this checksum.
All of the binary data in the data_binary array then gets converted to ASCII characters by means of a lookup table, and the resulting array of ASCII hex values gets stored in the character array data. The program reads the most recent character from the receiving device, clearing the receive buffer. It then sends the data array to the receiving device over the UART channel until it receives confirmation that the packet was received without error.
The code continues doing this until it reaches a hard-coded end address, though this hard-coded end address could be replaced with the end of binary address as indicated in the binary info header.
// Send all the data hexlines
do {
// Populate the first few bytes of the data binary
data_binary[0] = 0x10 ;
data_binary[1] = (unsigned char)((((uint32_t)(flash_pointer)) >> 8) & 0xFF) ;
data_binary[2] = (unsigned char)(((uint32_t)(flash_pointer)) & 0xFF) ;
data_binary[3] = 0x00 ;
// Populate the data into the hexline
for (int i=4; i<20; i++) {
data_binary[i] = *flash_pointer++ ;
}
// Reset the sniff register to zero
dma_sniffer_set_data_accumulator(0x00000000) ;
// Use DMA to compute a checksum
dma_channel_set_read_addr(dma_chan_1, data_binary, false) ;
dma_channel_set_write_addr(dma_chan_1, data_binary_dummy_dest, true) ;
dma_channel_wait_for_finish_blocking(dma_chan_1) ;
// Populate the checksum into the data binary
data_binary[20] = (unsigned char)((~(dma_hw->sniff_data) + 0x1) & 0x000000FF) ;
// Convert the data binary to hex characters
for (int i=0; i<21; i++) {
data[(i<<1)+1] = numToHex((data_binary[i]>>4) & 0xF) ;
data[(i<<1)+2] = numToHex(data_binary[i] & 0xF) ;
}
// Make sure the buffer is clear
uart_getc(UART_ID) ;
// Send to the receiving device. Keep sending for as long
// as the response is A
do {
for (int i=0; i<45; i++) {
uart_putc(UART_ID, data[i]) ;
}
} while (uart_getc(UART_ID)=='A') ;
// Blink the LED
gpio_put(LED_PIN, !gpio_get(LED_PIN)) ;
} while (data_binary[1]<0xA9) ;
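The record-building steps in the loop above can be condensed into a single host-side function, which is handy for checking the byte layout against known-good hex files. This is a sketch with illustrative names. It computes the checksum in software rather than with the DMA sniffer, and it assumes that data lines end with "\n\r" as the to_boot array does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_LEN 16
#define LINE_LEN 45   // ':' + 42 hex characters + "\n\r"

// Convert a 4-bit value to an uppercase ASCII hex digit.
static char nibble_to_hex(uint8_t n)
{
    return "0123456789ABCDEF"[n & 0xF];
}

// Build one Intel hex data record for 16 payload bytes at the given
// 16-bit address. `out` receives LINE_LEN characters plus a terminating NUL.
void build_data_line(uint16_t addr16, const uint8_t payload[PAYLOAD_LEN],
                     char out[LINE_LEN + 1])
{
    uint8_t rec[4 + PAYLOAD_LEN + 1];
    rec[0] = PAYLOAD_LEN;                 // payload length
    rec[1] = (uint8_t)(addr16 >> 8);      // address, high byte
    rec[2] = (uint8_t)(addr16 & 0xFF);    // address, low byte
    rec[3] = 0x00;                        // record type: data
    memcpy(&rec[4], payload, PAYLOAD_LEN);

    // Checksum: two's complement of the low byte of the sum so far.
    uint32_t sum = 0;
    for (size_t i = 0; i < 4 + PAYLOAD_LEN; i++) {
        sum += rec[i];
    }
    rec[4 + PAYLOAD_LEN] = (uint8_t)((~sum + 1u) & 0xFFu);

    // Two ASCII hex digits per byte, after the leading colon.
    out[0] = ':';
    for (size_t i = 0; i < sizeof rec; i++) {
        out[2 * i + 1] = nibble_to_hex((uint8_t)(rec[i] >> 4));
        out[2 * i + 2] = nibble_to_hex(rec[i]);
    }
    out[43] = '\n';
    out[44] = '\r';
    out[45] = '\0';
}
```

The 45-character length matches the number of characters that the send loop transmits for each data line.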
Send a packet with the "end of file" record type, which will cause the receiving device's bootloader to branch into the newly received program.
// Get the other device into application mode
do {
for (int i=0; i<45; i++) {
uart_putc(UART_ID, end_of_file[i]) ;
}
} while (uart_getc(UART_ID)=='A') ;
De-init all peripherals, clear all interrupts, free DMA channels, etc. Tidy up before we leave the bootloader.
// Clean up
// Turn off interrupts (NVIC ICER, NVIC ICPR)
hw_set_bits((io_rw_32 *)0xe000e180, 0xFFFFFFFF);
hw_set_bits((io_rw_32 *)0xe000e280, 0xFFFFFFFF);
// Free-up DMA
dma_channel_cleanup(dma_chan_1) ;
dma_channel_unclaim(dma_chan_1) ;
// Disable sniffer
dma_sniffer_disable() ;
// Release UART and GPIO
uart_deinit(UART_ID) ;
gpio_deinit(LED_PIN) ;
gpio_deinit(BOOTLOAD_PIN) ;
// Stop the PWM running
pwm_set_enabled(slice_num, false);
gpio_deinit(PWM_PIN) ;
Call the handleBranch() routine, described in the serial bootloader writeup, which sets the main stack pointer and branches to the application code in flash memory.