How Relevant are C Portability Pitfalls?

An article about C Portability Lessons for Weird Machines has been making the headlines on the Interwebs lately. It's full of interesting examples, though none of them are from machines relevant to the last two decades of high-end computing.

I think these lessons are still relevant today, though, and that you should still pay attention to them and still write "proper" code. Here is why.

Many of the machines with interesting pointer and/or integer representation properties have quirks that smell of various types of hardware optimizations. The MOS 6502's faster access to the zero page was a clever answer to the problem of slow memory access; the 8051 was quirky not so much because Intel's designers hated elegant instruction sets, but because fitting every gadget that a microcontroller has onto 1980s-era silicon was tough.

9-bit chars and 18-bit ints on a general-purpose machine sound weird, but 10-bit or 12-bit ADCs are normal, as are TCAMs with 2000 (not 2048) or 8000 (not 8192) entries. Granted, 8-bit bytes are not "just" an accident anymore, but in the wider landscape of digital logic design, round numbers are not all that common.
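As a small illustration of not baking "round" widths into code, here is a minimal sketch that counts the value bits of unsigned int from UINT_MAX instead of assuming 32 (or multiplying sizeof by 8). The function name uint_value_bits is made up for this example.

```c
#include <assert.h>
#include <limits.h>

/* The C standard only promises that a char has at least 8 bits. */
static_assert(CHAR_BIT >= 8, "CHAR_BIT is at least 8 on any conforming implementation");

/* Hypothetical helper: number of value bits in unsigned int.
   It is derived from UINT_MAX, so it stays correct on machines where
   the width is 16, 18, 36, or anything else the hardware dictates. */
unsigned uint_value_bits(void)
{
    unsigned bits = 0;
    for (unsigned m = UINT_MAX; m != 0; m >>= 1)
        bits++;
    return bits;
}
```

On a typical desktop this returns 32, but nothing in the code depends on that.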

The hardware we see on the horizon is starting to have some of these traits again. We're looking at heterogeneous, disaggregated, and highly programmable hardware. And control-plane CPUs are a somewhat more democratic field lately, as RISC-V is coming of age, so to speak.

Granted, much of this exotic hardware is not general-purpose, but the way special-purpose hardware is integrated into systems is sometimes quirky. On the Commodore 64, for example, the 6510's on-chip I/O port was memory-mapped at addresses 0x0000 - 0x0001. That made programmers dread porting anything from the Apple II, where these addresses meant nothing special but sat on the first page (the zero page), which was faster to access and the only one you could do indirect indexed addressing with.

Programmable hardware is not (always) programmed in C, but interoperability will still be expected. We could be talking about Rust instead of C, of course, but strange hardware quirks still remain relevant, even if only to those designing runtimes. Practical experience, however, shows that hardware details still leak even past the most well-designed abstractions; and when they don't, performance constraints will eventually force upper-layer programmers to make them leak.

We are nearing the end of the era that began in 2000-2001 or so, and during which most high-level programmed logic ran on general-purpose x86(_64) processors. Multicore, big.LITTLE machines that interface peculiarly (and often share memory space) with LUT-constrained devices are what we are looking at in the future, and these devices will cause many ghosts of the old days to haunt us again.

I don't think byte width is going to be one of these ghosts (although it's worth saying that the last time I worked on a 32-bit architecture with sizeof(char) = sizeof(int) = 1 was only three years ago, and it is very much a current architecture). But assumptions about overflow behaviour, addresses, and conversion between data types (especially pointers and wherever serialization/deserialization occurs) are prime candidates.
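To make the serialization point concrete, here is a minimal sketch of writing and reading a 32-bit value without assuming anything about host byte order, 8-bit chars, or the existence of uint8_t. The function names put_u32_be and get_u32_be are made up for this example, and big-endian wire order is simply an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the low 32 bits of v into out[0..3], most
   significant octet first. Each array element carries exactly one octet
   (a value in 0..255), even on a machine where a char is wider than 8 bits. */
void put_u32_be(unsigned char out[4], uint_least32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Hypothetical helper: rebuild the value from four octets. Masking each
   element discards any bits above the low eight, and the shifts operate on
   values rather than on the in-memory representation, so the result does
   not depend on the host's endianness. */
uint_least32_t get_u32_be(const unsigned char in[4])
{
    assert(in != NULL);
    return ((uint_least32_t)(in[0] & 0xFFu) << 24)
         | ((uint_least32_t)(in[1] & 0xFFu) << 16)
         | ((uint_least32_t)(in[2] & 0xFFu) << 8)
         |  (uint_least32_t)(in[3] & 0xFFu);
}
```

The sketch uses uint_least32_t because C99 requires it on every implementation, whereas uint32_t and uint8_t are optional and are absent on machines that lack types of exactly those widths. The usual shortcut of casting a buffer pointer to a pointer to an integer type and dereferencing it is what tends to break when alignment, endianness, or char width differ.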

It may be possible to avoid exposing most programmers to this through a correct, "standard", more sane (!?) runtime, a la Rust. However, based on how hard it was to write ANSI-compliant C compilers for some of those old machines, the effort involved in writing a compliant Rust compiler and runtime may turn out to be more than what a (so far) largely volunteer-driven team can do quickly enough.

Until then, remembering which overflows are undefined behaviour and which aren't may still be relevant.
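As a refresher, here is a minimal sketch of the distinction in C: signed integer overflow is undefined behaviour, so it has to be ruled out before the addition, while unsigned arithmetic is defined to wrap, so the wrapped result can be inspected afterwards. The function names checked_add_int and checked_add_uint are made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true and stores a + b in *sum when the
   result is representable; returns false and leaves *sum untouched
   otherwise. The test runs before the addition, because executing a
   signed overflow is itself undefined behaviour. */
bool checked_add_int(int a, int b, int *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}

/* Hypothetical helper: unsigned addition wraps modulo UINT_MAX + 1,
   which is well defined, so a wrapped result shows up as r < a. */
bool checked_add_uint(unsigned a, unsigned b, unsigned *sum)
{
    assert(sum != NULL);
    unsigned r = a + b;
    if (r < a)
        return false;
    *sum = r;
    return true;
}
```

Writing the signed check as "a + b < a" instead would look similar, but a compiler is allowed to assume the overflow never happens and remove the test altogether.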