Systemd in Embedded Systems: Don't Listen to the Hatemob

systemd, the (init?) system that everyone loves to hate, needs no introduction. It's pretty ubiquitous today -- to the point where you can find it, for example, in Yocto.

When it comes to its use in embedded systems, the (vocal) community response is either that it's too bloated, or that it's probably a fringe case where systemd is actually useful because it can speed up booting. Since the notorious optimum-in-the-middle -- rule 8 in my favourite set of rules ever -- is pretty hard to figure out in the climate of systemd-related online discussions, I figured I'd offer my non-flamebait take on the topic.

Fresh out of the oven, from someone who really does work on embedded systems -- some of them systemd-enabled.

In order to understand the (perfectly reasonable!) logic of using systemd in embedded systems, there are two things that the hive mind believes about embedded systems and that you should unlearn.

Embedded doesn't necessarily mean low hardware resources. Is a 4-bit microcontroller that does nothing but read a 16-position knob and drive a timer an embedded system? Sure. What about a Xeon machine with 16 GB of RAM? You bet it is! It's the dedicated workload (as opposed to special-purpose hardware) that differentiates an embedded system from a general-purpose one. Even reliability isn't all it's hyped to be. Failure-prone embedded systems definitely do exist, and as long as they control non-essential processes where failure is an option, it's a perfectly valid design choice to allow certain failure conditions to occur and handle them correctly (gracefully is such a loaded word...) instead of designing the system to be failure-proof. It's rarely the right design choice, but it can be.

Booting time isn't relevant everywhere. Embedded systems that have some form of user interaction, for example, do need to boot up quickly. No one likes to wait two minutes before they can do anything with their device. Embedded systems that are in the middle of nowhere, have no user interaction, and don't need to respond quickly, can boot up in as long as they need. I've seen boot times of twenty minutes, not twenty seconds, that were considered a nuisance worth mentioning, but not important enough to fix anytime soon.

So what's the deal with systemd?

There are embedded Linux systems which consist of nothing but the kernel and one or two daemons. Even the venerable sysvinit can be overkill for such systems: their init is often a sh script that launches the two daemons, and service supervision is implemented in the form of a hardware watchdog. Systemd is probably overkill in this case, but then so is everything else. You can make a case for the importance of minimalism in solid engineering, but is it really a good argument to make when it involves the init system of an embedded device -- especially if it's a consumer product -- that runs Linux, quite possibly glibc, on a CPU that can run circles around the ones that are deployed on Mars?
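For a sense of scale, here's what that kind of minimal init can look like -- an illustrative sketch, not a drop-in script. The daemon names are made up, and the watchdog device node varies by board:

```sh
#!/bin/sh
# Hypothetical minimal init for a two-daemon box.
# sensord, netlinkd and the watchdog path are placeholders.

/usr/sbin/sensord &
/usr/sbin/netlinkd &

# Supervision *is* the hardware watchdog: keep petting it, and if this
# loop ever stops running (hang, crash, OOM), the board resets itself.
while :; do
    echo V > /dev/watchdog
    sleep 10
done
```

That's the whole init system. There's no dependency graph, no restart policy, no logging infrastructure -- and on a box that runs two daemons, there doesn't need to be.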

Now, there are embedded systems, like the beast I linked to above, which run more things than you could possibly imagine. SDR boxes, routers, RADAR/LIDAR units -- all of them run a surprising amount of stuff besides the RF-decoding, packet-pushing or pedestrian-hitting code. They have logging and telemetry and health checking capabilities and a bunch of other things that you don't really think about when you see a white box, but they're there. There are $100,000 boxes that sit in a rack somewhere, with nothing but a serial console for interaction capabilities, doing only one job, that nonetheless use Docker and Redis while doing it.

Systemd really is useful on these devices. These systems may not tick all the boxes that would qualify them as elegant engineering solutions for a more civilized age, but they bring in billions of dollars per quarter in sales.

It's not because it speeds up booting -- it's because it gives you a unified toolbox, one that everyone in the industry knows about, which helps you manage all that complexity.

By far the most important part is the one about managing complexity. You can argue that complexity is a hole that we've dug ourselves into and that we should struggle to climb out of, not keep digging. You wouldn't be entirely incorrect, but some real-world problems really are complex enough that their solutions are also complex. Furthermore, if we're being cynical: throwing containers, Azure integration and cloud services at a problem may not be the best technical approach, but have you tried selling something that doesn't list all these things in the product sheet in 2019?

Like it or not, dealing with a tangled ball of services, developed separately by separate teams, is a very relevant aspect of embedded systems development. When you're looking at an embedded Linux device, you can be certain that at most 1% of the code it runs was developed by the company that's selling it. Its web server, RPC interface, message broker, logging system, job scheduler -- and many, many others -- are probably written by someone else. Many of them are written in ignorance or active disdain of "the Unix Way" -- but if they do what you need them to do better than anything else, you're bound to use them, even if ESR would disapprove. All these things need to play well together, and systemd helps you with that.
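To make that concrete, here's a hedged sketch of what "playing well together" looks like in a unit file. The service and file names are hypothetical, but the directives are standard systemd: ordering and dependencies are declared once, and supervision comes for free.

```ini
# /etc/systemd/system/telemetryd.service -- hypothetical example
[Unit]
Description=Telemetry collector (illustrative)
# Don't start until the network and the broker are up,
# and pull the broker in if nothing else started it.
After=network-online.target mosquitto.service
Wants=network-online.target
Requires=mosquitto.service

[Service]
ExecStart=/usr/bin/telemetryd --config /etc/telemetryd.conf
# Supervision without a hand-rolled watchdog loop:
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Whether the daemon was written in-house or by a team that's never heard of your product, it gets the same restart policy, dependency handling, and journald logging as everything else on the box.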

Can it do anything you couldn't do with OpenRC and bash scripts? Probably not, but this is where the "everyone in the industry" part comes in handy. Uniformity is useful: it makes things cheap and easy to replicate. These are highly desirable things, especially in price-driven industries.

If a system is common enough that you can find out how to do anything with it using nothing but Google and StackOverflow, that's good. It means that you can pay an intern to fumble their way through it, which in turn allows senior developers to work on things that are more likely to make a difference to customers than how the system boots and how it restarts failing processes. Granted, there are devices where how the system boots and how it restarts failing processes is crucial -- but where that's not the case, why waste experience on it?

Sure, systemd is no free lunch. It brings a lot of complexity of its own, and not all of it is easy to handle. I've worked on projects where 10% of the time we billed easily went under "figuring out why systemd hates us", and another 20% could probably go under "who thought that was a good idea?". But that's an integral part of adopting complex solutions. 30% (a rare occurrence -- it's usually zero) of the total project time is a lot, but if the lower levels of a tech stack are too simple for the complexity of its upper levels -- and you often don't have a choice about the complexity of its upper levels -- 30% is a breeze compared to the alternative.

Incidentally, I think the debugging tools are a moot point. Systemd's debugging tools are, indeed, absolutely fantastic -- you can get SVG plots of how your system boots, for crying out loud. Systemd's uniform, centralized design is why we can get them, too. But realistically, the complexity and opaqueness of systemd are part of why they're needed in the first place. You can debug old-style init scripts with vi. Good luck debugging a systemd problem with nothing but vi and console logging. Increased -- and more complicated -- troubleshooting is part of the price that you pay for extra complexity. More advanced troubleshooting tools alleviate some of that but you're still paying for it -- you just get your money back after the show's over.
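Those boot plots come from systemd-analyze, by the way. These are real subcommands, shown here as a quick sketch of the toolbox (they need to run on a live systemd machine; the unit name in the journalctl line is a placeholder):

```sh
systemd-analyze plot > boot.svg        # SVG timeline of the whole boot
systemd-analyze blame                  # units sorted by startup time
systemd-analyze critical-chain         # the dependency chain that gated boot
journalctl -u telemetryd.service -b    # one unit's logs, current boot only
```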

So here's the juicy stuff:

  • systemd is used a lot in the embedded Linux space, regardless of whether booting time is important or not
  • systemd is used precisely because it's so common and so easy to work with, even for people who don't have much Unix experience
  • We're all aware that there are less bloated, more Unix-y solutions around, but bloat and Unixness are not the only relevant qualities of a program

Do I recommend systemd? In many -- probably most -- cases, I do, actually. Non-systemd systems are rapidly becoming the exception, rather than the rule, so I tend to save exceptional recommendations for exceptional cases.

If I had complete freedom (and, consequently, responsibility) to choose and write every single piece of software that runs on a system, I would probably not choose it. But most projects are not like that. They include plenty of integration work, with pieces of variable quality and very different conventions and quirks. Systemd is very well suited for that.