Moving CHERIoT RTOS to a tickless model (cheriot.org)
71 points by JNRowe on June 8, 2024 | 18 comments


Interesting. I don't know much about OSes, but there's an analog in userspace event loops.

Almost all of them have a timer heap or timer wheel or something (that's below the level of abstraction I work at) which can always answer the question, "When do I wake up next?". And then every event is reducible to either a timer timeout or an external event like the file driver giving you data from a file, getting a packet from the network, mouse movement, etc.
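
A minimal sketch of that "when do I wake up next?" idea (hypothetical names, not taken from any particular event library): keep pending timers in a min-heap and let the loop block for at most the time until the earliest deadline.

    // Timer heap answering "when do I wake up next?" (sketch).
    #include <chrono>
    #include <functional>
    #include <queue>
    #include <vector>

    using Clock = std::chrono::steady_clock;

    struct Timer {
        Clock::time_point deadline;
        std::function<void()> callback;
        // Earliest deadline ends up on top of the heap.
        bool operator>(const Timer &other) const { return deadline > other.deadline; }
    };

    std::priority_queue<Timer, std::vector<Timer>, std::greater<Timer>> timers;

    // The loop blocks in the OS (epoll/kqueue/select) for at most this long.
    Clock::duration next_wakeup() {
        if (timers.empty())
            return Clock::duration::max();   // nothing pending: sleep "forever"
        return timers.top().deadline - Clock::now();
    }

    // After waking (timeout or external event), run everything that is due.
    void fire_due_timers() {
        auto now = Clock::now();
        while (!timers.empty() && timers.top().deadline <= now) {
            auto cb = timers.top().callback;
            timers.pop();
            cb();
        }
    }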

Games often don't need this because they tick at 60+ Hz to render everything anyway, so it's easier to just hang everything off that and check every timer on every frame; the wasted CPU is tiny compared to the rest of the game logic.

However, in one case I was writing an event-based program for work, in C++, with limitations on what dependencies I could pull in, so I set a fixed tick (something like every 1 second) and, when I got external events, I just ran the whole tick function immediately. This introduced some latency, but it was in places that didn't matter, and it made the code really easy to follow.
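
Roughly, that fixed-tick pattern could look like the sketch below (made-up names, assuming a thread-safe queue with a timed wait; not the original program):

    // tick() runs every second, and also immediately when an event arrives.
    #include <chrono>
    #include <condition_variable>
    #include <deque>
    #include <mutex>

    struct Event { int kind; };

    std::mutex mtx;
    std::condition_variable cv;
    std::deque<Event> pending;

    void push_event(Event e) {    // called from I/O callbacks
        { std::lock_guard<std::mutex> lk(mtx); pending.push_back(e); }
        cv.notify_one();
    }

    void tick() {
        std::deque<Event> batch;
        { std::lock_guard<std::mutex> lk(mtx); batch.swap(pending); }
        for (auto &e : batch) { (void)e; /* handle the event */ }
        // ...plus whatever periodic work belongs to the tick
    }

    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lk(mtx);
            // Wake on the 1-second tick, or as soon as an event arrives.
            cv.wait_for(lk, std::chrono::seconds(1),
                        [] { return !pending.empty(); });
            lk.unlock();
            tick();
        }
    }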


It is sad that event-based code is so hard to read.

I wish some programming language would let the programmer write the code as if polling, but then transform the program automatically into something event-based at compile time.

E.g.:

    while (true):
      if (button_is_pressed) do_x()
      if (something_else) do_y()

The Linux select() API is nearly that, although it's rather cumbersome.
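
For reference, a select()-based version of that loop might look like this (sketch; button_fd, other_fd, do_x, and do_y are placeholders):

    #include <sys/select.h>

    void do_x();   // placeholder handlers
    void do_y();

    void event_loop(int button_fd, int other_fd) {
        for (;;) {
            fd_set readable;
            FD_ZERO(&readable);
            FD_SET(button_fd, &readable);
            FD_SET(other_fd, &readable);
            int maxfd = (button_fd > other_fd ? button_fd : other_fd) + 1;

            // Blocks until at least one descriptor is readable (no timeout).
            if (select(maxfd, &readable, nullptr, nullptr, nullptr) <= 0)
                continue;

            if (FD_ISSET(button_fd, &readable)) do_x();
            if (FD_ISSET(other_fd, &readable)) do_y();
        }
    }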


Why is that easier to read than something like this?

    function onButtonIsPressed() {
        doX();
    }
    function onSomethingElse() {
        doY();
    }


With the second I potentially have to worry about thread safety. What if, while I'm busy doing X in response to a button press, something else happens and I also need to do Y?

With the first version of the code it is pretty obvious what will happen (I will finish doing X and then I will do Y). With the second version I have to read my platform's documentation to know whether it will buffer the second event while the first is handled, drop it, or call the handler from a different thread.
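
The single-threaded behavior being described can be made explicit with a queue (sketch, made-up names; real frameworks may buffer, drop, or use threads, which is exactly the documentation question):

    #include <functional>
    #include <queue>

    std::queue<std::function<void()>> events;   // filled by the platform layer

    // Handlers run one at a time, so a doY() queued while doX() is running
    // simply waits its turn; nothing is dropped and nothing overlaps.
    void run_dispatcher() {
        while (!events.empty()) {
            auto handler = events.front();
            events.pop();
            handler();
        }
    }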


That's a pretty fundamental thing to understand about the platform you're using, isn't it? You learn it once, and then you never have to learn it again.

Plus, it's not entirely clear in your example what happens if "something else" occurs, and then while `do_y()` is being executed, a button press occurs. Is the button press event buffered, or is it dropped?

Also, given that you're proposing automatic transformation of programs (by the compiler?), and your `while` loop would become shorthand for some unspecified other program, there would potentially be a lot of hidden magic to be aware of. Your program could even be transformed into a program which would process the events in multiple threads.


It's easy to assume OS schedulers have a lot of intelligence behind them, but in reality, like most software, they're often shipped in a half-baked state. Even mature schedulers like the Linux kernel's are far from perfect.

It surprised me that yield wasn't a concept in this RTOS from the beginning (when they started describing the problem, I was already asking "why aren't they yielding instead of sleeping?"), but I'm glad to see the improvement in the next paragraph.

There are always things to improve, which I find refreshing - I hope AI doesn't take that away. Or if it does, it had better also eliminate poverty, war, etc. Let people spend their days playing.


No, people are using AI to do all the fun stuff like art, poetry, and programming. Humans are great at cleaning the bits out of a meat grinder, though.


> almost all of the calls to thread_sleep wanted to yield to allow another thread to make progress, rather than to guarantee some time had elapsed.

That's... deeply alarming and likely indicates bad code. I've seen many beginner programmers use sleep/yield to busy-wait on something, and it's almost always a bad design decision: using a proper synchronization primitive would be safer and more performant too.

Just to make sure my guess was correct, I searched the CHERIoT codebase for thread_sleep calls. There were only 15 hits, and most of them were related to the implementation; only two were actual uses in the code: a busy-wait in the TLS stack and polling for a response in the SNTP driver. Both of those are bad design - they should be using proper synchronization mechanisms.
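
For contrast, the two patterns look roughly like this (sketch with made-up names and generic C++ primitives, not CHERIoT's actual API):

    #include <chrono>
    #include <condition_variable>
    #include <mutex>
    #include <thread>

    std::mutex m;
    std::condition_variable cv;
    bool response_ready = false;

    // The pattern being criticized: poll a flag, sleeping between checks.
    // The thread keeps getting scheduled just to find nothing has changed.
    void wait_by_polling() {
        for (;;) {
            { std::lock_guard<std::mutex> lk(m); if (response_ready) break; }
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }

    // Proper synchronization: the thread blocks and is only woken when the
    // producer sets response_ready and calls cv.notify_one().
    void wait_properly() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return response_ready; });
    }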


You're not necessarily wrong, but I wouldn't call it alarming in the context of an RTOS. Sometimes you just want to invoke the scheduler and let other stuff run for a while. In an RTOS, yielding or sleeping for a number of ticks has deterministic behavior. If you sleep for a few ticks, come back, and then have to sleep again, it's not the end of the world. Blocking on a concurrency primitive would be better, since you wouldn't get scheduled at all, of course, but sometimes you just don't have what you need hooked up. I'm not surprised to see them in the net stack - there are a lot of layers involved in networking, and network latencies are usually orders of magnitude longer than a scheduler tick.


I searched the cheriot-rtos repo and found a lot of calls from tests (either directly or through the wrapper function `sleep`):

https://github.com/search?q=repo%3Amicrosoft%2Fcheriot-rtos%...

That's a context where it makes a lot of sense to use sleep. I've written similar test code myself for a different RTOS. Most of the callers are testing various forms of synchronization mechanisms themselves, so using proper synchronization isn't really an option.

The non-test uses you found are more dubious though.


One suggestion: they could make the thread_sleep() function always yield, and then have a lowest-priority task that actually sleeps.
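
Very roughly, and with invented names standing in for scheduler internals (not CHERIoT's actual implementation), that could look like:

    #include <cstdint>

    using Ticks = std::uint64_t;

    Ticks now();                       // hypothetical: current tick count
    void scheduler_yield();            // hypothetical: run another ready thread
    void record_wakeup(Ticks when);    // hypothetical: add caller to the sleeper list
    Ticks earliest_wakeup();           // hypothetical: min deadline over all sleepers
    void sleep_until(Ticks when);      // hypothetical: program the timer, then WFI

    // thread_sleep() never idles the CPU itself; it just yields.
    void thread_sleep(Ticks ticks) {
        record_wakeup(now() + ticks);
        scheduler_yield();
    }

    // Lowest-priority task: only runs when nothing else is runnable,
    // and sleeps exactly until the next recorded wakeup (tickless).
    void idle_task() {
        for (;;)
            sleep_until(earliest_wakeup());
    }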


Are Linux/Windows/Mac tickless?


From https://en.wikipedia.org/wiki/Tickless_kernel

"The Linux kernel on s390 from 2.6.6[2] and on i386 from release 2.6.21 can be configured to turn the timer tick off... The XNU kernel from Mac OS X 10.4 on, and the NT kernel from Windows 8 on, are also tickless. The Solaris 8 kernel introduced the cyclic subsystem which allows arbitrary resolution timers and tickless operation. FreeBSD 9 introduced a "dynamic tick mode" (aka tickless).

"


What about BeOS?


Unsure if this is clear, so I'll say it: those are not real-time OSes.


Pretty sure most people here know that, so it remains a less obvious yet interesting question (though tangential to the posted article): are Linux/Windows/Mac tickless?


I know Linux is; the rest probably are too. I remember when tickless support was merged:

https://lwn.net/Articles/549580/

https://lwn.net/Articles/659490/


My Linux sure is. It's a good idea for battery-powered devices.

    $ zgrep ^CONFIG_NO_HZ /proc/config.gz
    CONFIG_NO_HZ_COMMON=y
    CONFIG_NO_HZ_FULL=y
    CONFIG_NO_HZ=y



