run-to-completion protothreads

repository: rtcproto

rtcproto.h

A single-header ANSI C library that depends on C99 for fixed-width integer types and POSIX.1-2001 for clock and time functions. It implements a run-to-completion scheduler on top of the concept of protothreads, making their otherwise stackless threads stateful. Then, it provides convenient mechanisms for cooperation with the scheduler, letting protothreads yield after a timepoint has passed.

This provides another way of deferring computation into the future, and executing it whenever it is convenient for the processor, letting it return to other tasks in the meantime without any concept of simultaneous concurrency. While many coroutine libraries prefer more nuclear solutions (which come with their own drawbacks), this library tries to see just what it can get away with using a portable toolset.

The library header comes with an introduction near the top, I recommend reading that for a short rationale on its workings (as well as the documentation above the data structures and definitions). For usage examples, I recommend checking out the tests in the repository, or building and running the demo.

  1. Motivation
  2. Coroutines
  3. Protothreads
  4. rtcproto

Motivation

Like many other projects, this one also grew out of reinventing the wheel, or at least wanting to. While working on desert-train, I realized that the heavy computations and the tight event loop together meant that I had to figure out some way to defer the former so that I can keep in sync with the latter. The approach I ended up with was to keep the logic responsible for stepping the state run on a single frame, as usual, but to defer heavier, iterative loads into a system that would execute them later.

Slap a double buffer on top, add a(n) (un)healthy level of macro layering, and try to predict how long executing an interval of a work type could take, and you end up with whatever I came up with at the time. I tried to keep the overhead on definitions you had to add to work with this system minimal, but I still ended up with someting complicated and rigid. It worked perfectly fine, but looking back on it, it was mostly just clever for what I knew at the time. Even if I reached no enlightenment into the ways of C since then, hindsight seemed to let me cobble together something far more comfortable.

Coroutines

Enter the wild world of ANSI C, where although you may lack comforts that other, more sleek languages pack with their batteries (such as coroutines), nothing can stop you from building beautiful, devilish hacks that let you imitate them. Specifically (and understandably), C has no concept of local continuations, which are central to that of coroutines. I read up on them at first after someone said that wanting to suspend execution rhymed with coroutines to their ears.

There isn’t a great segue here into how you can get a barebones local continuation in C that stores only the point of execution inside a function, so let’s look at a little language quirk that lets us do just that.

Duff’s device

On a fated day somewhere in November 1983, Tom Duff was trying to speed up some animations while working for Lucasfilm. If you have done loop unrolling by hand in assembly before, could you try to also imagine how loop unrolling with an arbitrary entry point would look at the level of C? Feast your eyes on the following forsaken piece of code.

send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}

What makes this work is that, for some reason, the case labels of a switch statement are valid inside sub-blocks of that switch statement. Even if those blocks belong to control flow constructs such as loops or conditionals. More generally, what we needed was a way to be able to jump to specific points in our code. If we could make this less of a pain to use, we could already use this with a local continuation that stores the point of execution (here, that’s the case label to jump to next).

Coroutines in C

Simon Tatham demonstrated rewriting a pair of producer-consumer functions with coroutine call primitives, with a sort of “return and continue” operation. First, he uses a switch statement to decide which goto to execute, to return to a specific point in the function.

int function(void) {
    static int i, state = 0;
    switch (state) {
        case 0: goto LABEL0;
        case 1: goto LABEL1;
    }
    LABEL0: /* start of function */
    for (i = 0; i < 10; i++) {
        state = 1; /* so we will come back to LABEL1 */
        return i;
        LABEL1: /* resume control straight after the return */
    }
}

However, Duff’s device shows that we can use the switch statement to perform the jump itself, because we can just interleave its case labels with the code.

int function(void) {
    static int i, state = 0;
    switch (state) {
        case 0: /* start of function */
        for (i = 0; i < 10; i++) {
            state = 1; /* so we will come back to "case 1" */
            return i;
            case 1: /* resume control straight after the return */
        }
    }
}

With only a slight bit of added cleverness, we can now construct a few well chosen macros, to “hide the gory details in something plausible-looking”.

#define crBegin static int state=0; switch(state) { case 0:
#define crReturn(i,x) do { state=i; return x; case i:; } while (0)
#define crFinish }
int function(void) {
    static int i;
    crBegin;
    for (i = 0; i < 10; i++)
        crReturn(1, i);
    crFinish;
}

It’s almost exactly what we wanted. We do have to obey some ground rules for this to work, though:

  1. Surround the function body with crBegin and crFinish.
  2. Declare all local variables static if they need to be preserved across a crReturn.
  3. Never put a crReturn within an explicit switch statement.

The only remaining problem is that just as we had to invent new labels to avoid them colliding with the existing ones in the previous implementation, now we have to ensure that all the parameters to each crReturn are different. The compiler would catch this if it happened, but it’s still inconvenient. Luckily, ANSI C provides the special macro name __LINE__, which expands to the current source line number. For the cost of another constraint (never putting two crReturn statements on the same line), this lets us assign state numbers automagically.

Here, the local continuation is somewhat obscured, so let’s separate that out next.

Protothreads

Adam Dunkels and Oliver Schmidt were trying to simplify event-driven explicit state machines in severly resource-constrained systems (such as embedded). I recommend reading their initial technical report and the whitepaper they co-authored to get the full picture of what a protothread is, but I’ll leave a short description here.

Protothreads are extremely lightweight stackless threads designed for severely memory constrained systems such as small embedded systems or sensor network nodes. Protothreads can be used with or without an underlying operating system.

Protothreads provide a blocking context on top of an event-driven system, without the overhead of per-thread stacks. The purpose of protothreads is to implement sequential flow of control without complex state machines or full multi-threading.

Protothreads are defined by a function and a local continuation within it. The technical report outlines two ways to work with this local continuation: with the switch statement’s quirk, or with address labels, using the labels-as-values gcc extension. I’ll be working with the former method here.

The local continuation here is just an integer. The original protothread implementation used two bytes to represent it, I happened to use four. If I had to guess, the former just means that your functions cannot exceed 65536 lines, or the compiler could assign the same line number to two __LINE__ macros in different places.

/** Initialize a local continuation.
  * Also unsets any previously set continuation state.
*/
#define LC_INIT(s) s = 0;

/** Resume a local continuation.
  * Resumes a previously set local continuation, restoring the state of the
  * function when the local continuation was set. If it was not set previously,
  * the resume operation does nothing.
*/
#define LC_RESUME(s) switch(s) { case 0:

/** Set a local continuation.
  * Saves the state of the function at the point where the operation was last
  * executed. This state does not include the call stack or local (automatic)
  * variables, only the program counter (and such CPU registers that need to be
  * saved).
*/
#define LC_SET(s) s = __LINE__; case __LINE__:

/** Mark the end of a local continuation usage.
  * The end operation signifies that local continuations should not be used
  * further on in the function. This operation is only required by some
  * implementations of local continuations, like the one here that relies on
  * the switch construct.
*/
#define LC_END(s) }

Now we can define the set of macros on top of these that will let us think in terms of a “return and continue” mechanism inside our functions.

#define PT_WAITING 0
#define PT_YIELDED 1
#define PT_EXITED  2
#define PT_ENDED   3

/** Initialize a protothread.
  * Initialization must be done prior to starting execution of the protothread.
*/
#define PT_INIT(pt) LC_INIT((pt)->lc)

/** Declaration of a protothread.
  * All protothreads must be declared with this macro, which constrains the
  * signature of the function implementing it to return a character.
*/
#define PT_THREAD(fun_name_and_args) char fun_name_and_args

/** Declare the start of a protothread inside the function that implements it.
  * Should be placed at the beginning of the function. All C statements above
  * this will be executed each time the protothread is scheduled.
*/
#define PT_BEGIN(pt) \
{ char PT_YIELD_FLAG = 1; struct timespec PT_YIELD_TIME; \
  (void)PT_YIELD_FLAG; (void)PT_YIELD_TIME; \
  LC_RESUME((pt)->lc)

/** Declare the end of a protothread inside the function that implements it.
  * Always used together with a matching PT_BEGIN() macro.
*/
#define PT_END(pt) LC_END((pt)->lc); PT_YIELD_FLAG = 0; \
                   PT_INIT(pt); return PT_ENDED; }

All we miss now are the specific mechanisms that let us yield from protothreads.

/** Block and wait until a condition is true.
  * Blocks the protothread until the specified condition is true.
*/
#define PT_WAIT_UNTIL(pt, condition) \
do { \
    LC_SET((pt)->lc); \
    if(!(condition)){ return PT_WAITING; } \
} while(0)

/** Block and wait while a condition is true.
  * Blocks the protothread while the specified condition is true.
*/
#define PT_WAIT_WHILE(pt, condition) PT_WAIT_UNTIL(pt, !(condition))

/** Yield from the current protothread.
  * Essentially, it acts as a one-time blocking wait to be rescheduled.
*/
#define PT_YIELD(pt) \
do { \
    PT_YIELD_FLAG = 0; \
    LC_SET((pt)->lc); \
    if(PT_YIELD_FLAG == 0){ return PT_YIELDED; } \
} while(0)

/** Yield from the current protothread until a condition occurs.
  * Essentially, it acts as a one-time blocking wait, and then a conditional
  * wait (always blocking at least once).
*/
#define PT_YIELD_UNTIL(pt, condition) \
do { \
    PT_YIELD_FLAG = 0; \
    LC_SET((pt)->lc); \
    if(PT_YIELD_FLAG == 0 || !(condition)){ return PT_YIELDED; } \
} while(0)

The limitation here in comparison to coroutines is that protothreads cannot yield to any other function but the calling one. Still, this does not stop us from nesting protothreads by spawning them and returning control to them until they exit.

/** Block and wait until a child protothread completes.
  * Schedules a child protohread. The current protothread will block until the
  * child protohread completes. The child protothread must be manually
  * initialized with the PT_INIT() macro before PT_WAIT_THREAD() is used.
*/
#define PT_WAIT_THREAD(pt, thread) PT_WAIT_WHILE((pt), PT_SCHEDULE(thread))

/** Spawn a child protothread and wait until it exits.
  * This macro can only be used inside a protothread.
*/
#define PT_SPAWN(pt, child, thread) \
do { \
    PT_INIT(child); \
    PT_WAIT_THREAD(pt, thread); \
} while(0)

There are a few more macros needed to round these out for actual use, but I’ll omit those here.

rtcproto

Knowing all of the above, what does rtcproto add into the mix? It implements basic data structures to store calls in priority queues and their local states in unordered vectors, while making sure that those vectors stay homogeneous, by using one for each protothread (as the local state of a specific protothread will always be represented in the same number of bytes) in a hash table. Then, it builds a run-to-completion cooperative scheduler on top of these, providing a specific kind of convenience that is out of scope for the protothreads themselves. This implementation aims for the convenience I happened to be looking for: being able to defer calls to functions into the future, and to suspend their execution periodically to make them usable in soft-realtime systems. It provides some performance statistics to help with optimization.

Finally, it gives you a loop macro to help with yielding from tight loops without wasting too much performance on checking the clock on each iteration. Overall, it lets you write code that looks a little something like the following. This snippet was taken from a demo project that keeps stepping a cellular automata, and does many steps in the background if you press a key, while still remaining in sync with its main event loop.

PT_THREAD(trophs_bigstep(rtcp_pt_t* pt, void* state, struct timespec* end))
{
    TrophsBuffer* tbuf = *(TrophsBuffer**)state;
    int i;

    PT_BEGIN(pt);
    
    /* Loop that yields conditionally only every 10 iterations */
    PT_LOOP(tbuf->i = 0, tbuf->i < 20000, i, 10){

        /* Step the cellular automata */
        trophs_step(tbuf->passive);

        PT_LOOP_INCR(pt, end, ++tbuf->i, i)
    }

    PT_END(pt);
}

created

modified