Generic Interface for Baremetal Task Management in Rust

In this article we will explore my solution for generalizing task management in a real-time environment.

The code here is tightly coupled to the Rust programming language, although the concepts could be implemented anywhere. This need was spawned from a baremetal kernel project. The hardware I'm developing for is a single-core ARMv7 Cortex-M chip: Teensy4.0. You can find all the source code here on GitHub.

Any baremetal program is going to require multiple tasks to execute. The question is, how concurrent do you want them to be? My kernel could be classified as a Real Time Operating System (RTOS) and that means I don't use a thread scheduler to govern tasks. I simply have a collection of processes and each one executes extremely quickly, giving the appearance of concurrency.

Here's an example of what my generic approach to task management looks like. I call these gates. In this article we will explore how gates evolved and one way to implement them.

// System entrypoint
loop {
   blink_task();
}

fn blink_task() {
    gate_open!()
       .once(|| { pin_mode(13, OUTPUT); })
       .when_ms(500, || { pin_out(13, HIGH); })
       .when_ms(500, || { pin_out(13, LOW); })
       .build();
}

Here is another example of what a gate might look like in action:

 // When data is available over serial, read the bytes into a buffer.
loop {
    gate_open!()
        .when(|| { return serial_available() > 0 },
              || { serial_buffer.append(serial_read()); }
        ).build();
}

The key takeaway here is that Gates exist in a loop { } segment but automatically handle when to yield compute power back to the kernel. You don't have to implement any state management yourself, the gate itself abstracts it all away.

The old world

In general, I've noticed a few patterns about embedded tasks:

Tasks often have at least two discreet states: initialize and work.
Tasks can have complex nested conditions.
Tasks are often placed in an infinite loop and need to be able to pick up where they left off each iteration.

Let's take our favorite example, blinking a light. In a system that requires near real-time performance, we can't just use delay() and hog resources. So here's some rust code to demonstrate how we can yield processing power in between blink states:

static mut INITIALIZED: bool = false;
static mut NEXT_ITERATION: u64 = 0;
static mut HIGH: bool = false;

// System entrypoint
loop {
    blink_task();
}

fn blink_task() {
    if unsafe { INITIALIZED } == false {
        pin_mode(13, Output);
        unsafe { INITIALIZED = true };
    } else {
        if system::millis() > unsafe { NEXT_ITERATION } {
            pin_out(13, unsafe { !HIGH });
            unsafe {
                HIGH = !HIGH;
                NEXT_ITERATION = system::millis() + 500;
            };
        }
    }
}

This definitely works, but it is not generic. Each task in our system is going to require multiple static variables to help control the flow of our process. It gets unwieldy very quickly.

Anatomy of a Gate

If we can agree on an interface to represent stateful segments of work, that's half the battle. The other half is abstracting all those pesky static variables... Introducing Gate!

A gate describes a series of connected tasks that are, well, gated by a condition. The interface for a gate looks something like this:

pub trait Gated {
    fn new() -> Self;
    fn when(&mut self, cond: CondFn, then: ExecFn) -> &mut Self;
    fn when_ms(&mut self, then: ExecFn) -> &mut Self;
    fn once(&mut self, then: ExecFn) -> &mut Self;
    fn build(&mut self) -> Gate;
    fn process(&mut self);
}

pub struct Gate {
    pub conditions: Vector::<CondFn>,
    pub functions: Vector::<ExecFn>,
    pub durations: Vector::<u64>,
    pub target_times: Vector::<u64>,
    pub current_index: usize,
    pub tail: usize,
    pub built: bool,
}

If you are familiar with design patterns, you may recognize this as the builder pattern. If one were to implement this interface with the following design principles, you would have a much cleaner mechanism for representing units of connected work.

Gate Design Principles

When the condition is not met, a gate must yield.
When the condition is met, a gate will run the accompanying work function only once.
Subsequent invocations of the gate will not re-run any code block that has already been processed.
Gates can be chained together, similar to Promises.
Once every gate in sequence has been invoked, the entire flow is reset.

Implementation

I decided to implement gates using Rust macros, which give us the unique opportunity to inject code around a given statement. Using this language feature, we are left with the challenge of writing something that is stable inside a loop { } and can be executed for any discreet gate that is created.

If you could somehow derive a unique hash that represents a gate, the following pseudo-code would be a reasonable template any gate-providing mechanism could follow:

static mut SYSTEM_GATES: BTreeMap<u32, Gate> = BTreeMap::new();

loop {
   // Given a unique identifier that describes a block of code...
   let id: u32 = code_hash();
   let gate: Gate = unsafe { SYSTEM_GATES.get(id) };

   match gate {
       None => {
           // Create gate and add it to the SYSTEM_GATES static
       },
       Some(gate) => {
           // Process gate
       }
   }
}

There is one calamitous caveat here: Rust does not support reflection. Putting that aside for a moment, let's explore what the macro might look like.

In this macro, we simply lookup whether the gate exists or not and return either a shiny new gate - or the stored instance. The cool thing about this macro is that it injects code before the gate. This is an important feature that I'll touch on in a moment.

#[macro_export]
macro_rules! gate_open {
    ( $( $x:expr ),* ) => {
        {
            // Get the hash of the gate in question
            let id = code_hash();
            let current_node = unsafe { GATES.get(id) };
            let result: &mut Gate;

            // Determine if the gate already exists, or if we need to make it
            match current_node {
                None => {
                    // Let's create a new gate.
                    let new_gate = crate::mem::alloc::<Gate>();
                    unsafe { *new_gate = Gate::new(); }

                    // This new gate is what we'll return
                    result = unsafe { &mut (*new_gate) };

                    // Insert the gate in the global gate register
                    unsafe { GATES.insert(id, new_gate as u32) };
                },
                Some(gate) => {
                    result = unsafe { (gate as *mut Gate).as_mut().unwrap() };
                }
            }

            result
        }
    };
}

I won't go into detail about what the builder pattern looks like, but at a high level it works like this:

Each time you add a component to the gate, the condition gets stored in the conditions vector, the work function gets stored in the functions vector, and that's about it. Here's some code to demonstrate:

pub fn when(&mut self, cond: CondFn, then: ExecFn) -> &mut Self {
    if self.built {
        return self;
    }

    self.conditions.add(cond);
    self.functions.add(then);
    self.tail += 1;
    return self;
}

pub fn build(&mut self) -> Gate {
    if self.built {
        self.process();
    } else {
        self.compiled = true;
    }
    return *self;
}

The secret sauce is in the build() method. If the gate is already built, then build() will just proxy to process(). And in this way, any subsequent attempt to build the gate results in running it instead.

code_hash()

All that code looks great, but it's predicated on a language feature that simply does not exist. How can you derive a unique hash of a function which is stable in a loop { }? Without this piece, the whole thing collapses.

Well, never let language limitations get in the way of beautiful code!

/// This function returns a u32 containing the
/// program counter of the line of code which
/// invokes this function.
pub fn code_hash() -> u32 {
    let result = 0;
    unsafe {
        asm!("mov r0, lr");
    }
    return result;
}

This is my pièce de résistance.

Because rust macros allow us to inject code around the function call, that means the program counter will be unique prior to invoking the Gate() constructor. But since we're running all this in a loop - it will be consistent across iterations.

Put another way, the literal line of code which invokes the gate constructor is going to be different for every gate. But it'll never change between iterations of the application loop. Using this absolute, we can come up with that coveted unique hash necessary to bring this whole solution together.

Wrapping Up

My goal is to make it extremely simple to write efficient, baremetal code in a way that automatically yields processing power without having to give it a second thought. This Gate concept has already seen value in my kernel project. There are limitations however:

The syntax of having two anonymous functions next to each other bothers me.
The gate introduces a few extra cycles that feel unnecessary.
For extremely complex timing (like writing a driver for the WS2812b LEDs) I found that gates were not powerful enough.

That being said, I can see a world where this concept fundamentally provides value. I hope to release a crate version of it soon.

If you would like to see my full gate_open!() prototype, feel free to explore the code. It's likely to change a bit after publishing this article, but the sentiment will be the same.