I think they're all relatively obvious, intuitive responses to the problem, and yet they may only increase complexity, tbh. For example, constexpr can be used relatively independently of template programming, yet the cases where it can be used practically before it becomes an unmaintainable mess of boilerplate are the most trivial ones, almost those which you could have hacked in with macros. TBF I think if you need serious metaprogramming, just compile and run a program at compile time.
Reflection has always been a mess no matter which implementation or language I've used. Fine for scripting languages, unusable for anything seriously complex. The data you need is never there, and the data that is there is unusable, at the wrong semantic level (programming-language level, not the semantics of your own domain model).
Also I avoid templates for the same reason, they quickly become unmaintainable. Yes, I've tried to make use of them many times, and I have a fair number of them in deployed software. They work without bugs, of course. But I still don't love them, they're boilerplatey, hard-to-maintain complexity that would be better solved with the right factoring plus a tiny bit of ad-hoc boilerplate. I would like to remove many of my deployed templates if I had the time.
And yes, I even avoid std:: template containers and such. Most uses I regret later. Again, this is for systems programming. They're fine for "scripting", leetcode, business software.
Is writing compilers, linkers, database servers, HPC and HFT platforms, OS drivers, networking stacks at IP level, considered systems programming according to you, or are they plain business software?
I said, I avoid, I don't love, I was talking about preference. And I'll state: Most of these are written mostly like I say. Please find serious counter-examples.
You must be talking about Linux, the BSDs, sqlite, postgres, gcc, the mold linker, or let's take some new kids on the block: raddebugger, FilePilot, TaskSlinger?
I am for example talking about LLVM and GCC, used to compile all those examples.
Living in the past? GCC has long adopted C++, last time it compiled with a pure C compiler was back in 2011 thereabouts, not cross-checking the exact year.
Actually care to open GCC and see what I mean? Check the newest commits and see what they do. Maybe you're living in a dream world where some magic language features do the work for you. Meanwhile people out in the field do actual work by just pushing bytes at the low level.
> Necessary to bootstrap GCC. GCC 5.4 or newer has sufficient support for used C++14 features.
> Versions of GCC prior to 15 allow bootstrapping with an ISO C++11 compiler, versions prior to 10.5 allow bootstrapping with an ISO C++98 compiler, and versions prior to 4.8 allow bootstrapping with an ISO C89 compiler.
> If you need to build an intermediate version of GCC in order to bootstrap current GCC, consider GCC 9.5: it can build the current D compiler, and was also the version that declared C++17 support stable.
Why are you unable to get my point? I understand that GCC doesn't compile with a plain C compiler anymore. A lot of my own code doesn't!
I'm saying that most features like templates, constexpr, reflection etc. don't scale well to serious use, as a broad statement. I fully acknowledge this is not a black and white situation. But I encourage you to look at actual pedestrian code, it's mostly not abstracted fluffy magic template code at all. It's pushing individual bytes with totally basic means (mostly C code). Why? Because code using these fluffy features is terribly hard to maintain. Templates lock you in their own language world with incredibly bad syntax and bad ergonomics, in short: it's a pain!
Personally I think even C++ classes (i.e. 1980's C++) are unusable because they bifurcate syntax/semantics needlessly and add implicit invisible scope. But I acknowledge it's somewhat possible to program with classes, and some people like to lean on RAII heavily. I mostly do not like to use RAII, and I've tried many times, I think it sucks for non-toy programming, even though obviously the idea is intuitive.
Caring about the actual assembler output in selected critical pieces of code is not the same as ignoring the abstract machine model. What you claim is simply not the case if you check actual proficient systems programmers, an astonishingly high share of whom are C and C++-but-mostly-C programmers.
Any user of compiled languages cares about Assembly, which is why regardless of the compiled language, an Assembler was always shipped alongside.
Also it isn't a C invention to have the compiler dump the Assembly output instead of object code.
Now the culture that C language constructs in 2026 are still 1:1 to Assembly instructions, that pretty much prevails, despite easy proof that isn't the case at various compiler optimization levels.
Proficient devs, well, many still don't know how to distinguish between what their compiler does and what ISO says.
It is the case that you can more easily know what happens when you don't use the wrong abstractions but stay in control. Highly-abstracted C++ code basically makes allocations and syscalls in the whitespace between the source code tokens.
You can't do systems software like that, you have to roll back the abstractions and roll back the use of pre-canned containers and libraries that you don't understand.
So it's all about understanding and control, not about some idea that C was defined in terms of assembly instructions, which it obviously is not. That's a total strawman.
There is not much real evidence for "devs wrongly assume" and as someone writing numerical code (clusters, NUMA, SIMD, etc.) I think C is still the ideal tool for this.
Assuming you mean the standard does not provide features for NUMA and SIMD? It doesn't necessarily have to. I think it is not surprising that you always seem bewildered that people still use C (as per your comments), as it seems you fundamentally understand neither standardization nor systems programming.
such a strawman again... You don't want to be writing explicit platform specific SIMD most of the time. You just want to write a dumb function that doesn't do any non-obvious calls, doesn't cause thread contention, doesn't hide complexity, isn't going to be a nightmare to change later, no surprises.
I am talking about self-inflicted complexity that is entirely within the C(++) machine model. Avoid that complexity and you're pretty good already. Only drop down to concrete hardware arch level where it makes sense. But largely, the C machine model is still very much suited as a model for actual hardware. Writing straightforward obvious code allows you to stay in control of memory layout and the data transformation paths. It easily gets you within <<2x of what you could achieve with hand coded assembler for the >90% of the code that are pretty boring and straightforward. And obviously you couldn't get the work done in time when coding everything in assembler.
I have seen plenty of self inflicted complexity in C, starting in the golden age of Yourdon Structured Method, and all those libraries that replicate C++ basic features with preprocessor macros.
That's true, and that's why your typical string vector code has a prelude and a postlude to do the incomplete chunks at the ends. Between the ends, it's processing larger self-aligned chunks.
You didn’t really say that, but feel free to share any reasons you might have to think so.
I don’t see any reason why it wouldn’t be perfectly fine on recent hardware, where unaligned loads are just as fast, and the cache pressure is identical for a linear search algorithm.
I asked where is the part about unaligned pointers in your string processing example. Saying that you want to load multiple bytes at a time does not imply at all that you have to do unaligned loads.
Doing unaligned loads using SSE or AVX might have been possible on Intel architectures for a long time, but it is still a little bit slower afaik. But anyway when you get into sub-architecture specific details like that, you've essentially left C-land, and you're essentially doing assembler level programming.
Every vectorized string search algorithm (including those treating an unsigned long as a "vector" of 8 bytes) currently needs a prelude that performs the search up to the first alignment boundary, and then performs the bulk of the search on well-aligned blocks, and then finally a postlude search in the tail of the string, where the tail is shorter than the block size/alignment.
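The prelude / bulk / postlude structure described above can be sketched roughly like this, treating a `uint64_t` as a "vector" of 8 bytes. `find_byte` and `has_zero_byte` are made-up names for illustration, and the block load goes through memcpy only to stay clear of aliasing rules; a real libc implementation will look different.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one byte of v is zero (classic SWAR bit trick). */
static int has_zero_byte(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Hypothetical helper: index of the first byte equal to c in s[0..n), or n if absent. */
size_t find_byte(const unsigned char *s, size_t n, unsigned char c)
{
    const uint64_t pattern = 0x0101010101010101ULL * c; /* c repeated in every byte */
    size_t i = 0;

    /* Prelude: single bytes up to the first 8-byte alignment boundary. */
    while (i < n && ((uintptr_t)(s + i) % 8) != 0) {
        if (s[i] == c)
            return i;
        i++;
    }

    /* Bulk: aligned 8-byte blocks; XOR turns matching bytes into zero bytes. */
    while (n - i >= 8) {
        uint64_t block;
        memcpy(&block, s + i, sizeof block);
        if (has_zero_byte(block ^ pattern))
            break; /* the match is somewhere in this block */
        i += 8;
    }

    /* Postlude: the matching block or the short tail, byte by byte. */
    while (i < n) {
        if (s[i] == c)
            return i;
        i++;
    }
    return n;
}
```

Dropping the first loop is all it takes to get the prelude-free variant, at which point the bulk loop starts at whatever alignment `s` happens to have.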
Using unaligned loads, you can get rid of the prelude, including the associated branches and intptr arithmetic, and just have to deal with the tail.
If you're comparing short-ish strings, almost all of the time is spent in the prelude and postlude, even if the entire substring fits in a register. This is a silly language limitation when the hardware can actually easily just support the unaligned load.
In particular, it doesn't seem justified that what at most amounts to a tiny inefficiency in hardware turns into a very expensive class of bugs (UB).
Have you ever wanted to do this? I find the premise ridiculous.
But anyway, you're complaining that you have to work too hard to do unaligned loads (i.e. the wrong thing even if it should work on a particular machine) in C, when basically every other language makes you work more for basic systems programming tasks?
Whether unaligned loads can work on the machine level, it depends on the hardware. On some other architectures, you probably get anything from traps to unpredictable behaviour. It's totally fine that C does not define the behaviour for unaligned loads.
If you want to do some weird stuff like loading a single unaligned 16 byte quantity, where there was no "middle part" to begin with, just do memcpy then. The compiler might just do the appropriate thing on this architecture. Or if you need to closely control what happens, write assembly then. But again, why would you even do this?
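For reference, the memcpy route looks roughly like this. A minimal sketch: `load_u64` and `same_first8` are hypothetical names, and whether the copy becomes a single load instruction is up to the compiler and target.

```c
#include <stdint.h>
#include <string.h>

/* Load 8 bytes from any address, aligned or not. */
static inline uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v); /* defined for any alignment of p */
    return v;
}

/* Hypothetical use: compare the first 8 bytes of two buffers in one step.
   Both buffers must have at least 8 readable bytes. */
int same_first8(const void *a, const void *b)
{
    return load_u64(a) == load_u64(b);
}
```

The loaded value depends on byte order, so it is fine for equality tests like this one but not for anything that interprets the number.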
I would agree that C is "really flexible", but I would say it's primarily flexible because it lets you cast say from a void pointer to a typed pointer without requiring much boilerplate. It's also flexible because it lets you control memory layout and resource management patterns quite closely.
If you want to be standards correct, yes you have to know the standard well. True. And you can always slip, and learn another gotcha. Also true. But it's still extremely flexible.
The problem is that a lot of the flexibility introduced by UB doesn't serve the developer.
Take signed integer overflow, for example. Making it UB might've made sense in the 1970s when PDP-11 owners would've started a fight over having to do an expensive check on every single addition. But it's 2026 now. Everyone settled on two's complement, and with speculative execution the check is basically free anyway. Leaving it UB serves no practical purpose, other than letting the compiler developer skip having to add a check for obscure weird legacy architectures. Literally all it does is serve as a footgun allowing over-eager optimizations to blow up your program.
Although often a source of bugs, C's low-level memory management is indeed a great source of flexibility with lots of useful applications. It's all the other weird little UB things which are the problem. As the article title already states: writing C means you are constantly making use of UB without even realizing it - and that's a problem.
If we're talking two's complement, it's not undefined, that is right.
Having to emit checks though, that is where I beg to differ.
A check is only useful if you want to actually change the behavior when it happens, otherwise it is useless.
Furthermore, it might be "essentially free" from a branch prediction point of view, but lo and behold, caches exist.
You would pollute both the instruction cache with those instructions _and_ the branch prediction cache.
From this it doesn't follow at all, that there is no cost.
In the end small things do add up, and if you're adding many little things "because it doesn't cost much nowadays" you will end up with slow software and not have one specific bottleneck to look at.
I do agree that having the option for checked operations is nice (see C#), but I have needed this behavior (branching on overflow) exactly once so far.
> A check is only useful if you want to actually change the behavior when it happens, otherwise it is useless.
You almost always want to change the behavior to erroring out on overflow. The few cases where overflow really is intended and fine can be handled by explicit opt-out.
And I refuse to buy the argument that "small things add up" in a world where we do string building and parsing every few microseconds. Checked math will have unnoticeable impact compared to all the other things we do, in almost every type of program.
This string manipulation stuff is very common, and that's why in 2026, an age where science fiction has become a reality, many things are still absurdly slow. Exactly because of such sloppiness, which does accumulate in many cases, and when one least expected it.
100% agreed on the sloppiness. But overflow checking is not sloppiness. It's the opposite of sloppiness. Unchecked math is sloppiness, allowing overflows to happen silently and uncontrollably is sloppiness. It just so happens this kind of sloppiness makes code faster, unlike other kinds of sloppiness that make code slower. Not doing necessary safety checks is faster than doing these necessary checks, but it doesn't make these checks any less necessary. Not validating user input also makes code faster, and is also sloppy.
It is defined as an error. That error’s default handling is wrapping when debug_assertions is off, and panic when it’s on, but since it’s an incorrect program (though not UB) either behavior is acceptable in any mode.
No. An integer getting deterministically set to an unintended value is a bug. A bug is not the same thing as UB. (Even if it were non-deterministic, it would still not be anything like UB.) It's not the same ballpark, not even the same sport.
What if the wrapped index is used to construct an invalid pointer? It might be possible, not sure. What if the integer is used to read the wrong data from disk, or corrupt data on disk by writing to the wrong location?
> What if the wrapped index is used to construct an invalid pointer?
Constructing an invalid pointer in rust is UB, yes, but integer wraparound is not.
> What if the integer is used to read the wrong data to a disk, or corrupt data on disk by writing to the wrong location?
Then it is a very bad bug.
> What if the program controls a nuclear power plant and the integer causes the control system to fail, causing memory errors due to radiation from the meltdown?
Then it is a very very bad bug.
> What if the wrapped integer causes the program to output the true name of god, and the programmer, in their last minutes of existence, looks up to see, overhead, without any fuss, the stars going out?
It's indistinguishable from unspecified behavior, not from undefined behavior. Unspecified behavior has to pick from a finite list of allowed behaviors. Undefined behavior can do anything.
A program with corrupted state can essentially do anything. Yes, it's still a question of which run-time checks are in place to protect against it. But the compiler is probably deriving a lot of assumptions from the assumption that there wasn't overflow.
But did the rust compiler assume that the integer would not overflow? It did so in Debug mode where runtime checks were added. If it's not the case in Release mode, does that mean semantics are different between Debug and Release?
The semantics are well-defined in both modes. You can predict exactly what will happen in either case. In C, the semantics are not defined at all, you can't predict what will happen and it's allowed to change between compilations of the same source.
It will probably get omitted, since Undefined Behavior isn't allowed by the C abstract machine, but sadly compilers are allowed to emit code for UB in the source (partly because some UB is only detectable at runtime). Sometimes disabling optimizations will incorrectly allow codegen to run for source lines which have UB, tricking people into thinking that optimizations are breaking their program. Compilers are allowed to do this, since behaviors other than "omit the offending statement" are unfortunately allowed by the standard, so it's not a compiler bug.
UB is a runtime property. As far as you can statically verify some code parts, you can see UB at compile time, but the point of UB is exactly that it is about stuff you can't predict, or that is hard to predict as a compiler.
Now why you can cook up trivial artificial examples where a compiler will remove some code sections based on statically detected UB, instead of printing an error, you have to ask the compiler authors.
> The semantics are well-defined in both modes.
So they're not the same? So the behaviour is not uniquely defined by the source code alone, but is actually _very_ different based on compile mode? Between two modes whose point was never to have different semantics, but to have the _same_ semantics while being debuggable vs being fast?
> You can predict exactly what will happen in either case. In C, the semantics are not defined at all, you can't predict what will happen and it's allowed to change between compilations of the same source.
You can make the same "predictability" argument for C, you can easily write a compiler that has semantics exactly laid out. Case in point: -fwrapv. Case in point: UBSAN.
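If you want wrapping for one specific addition without compiling everything with -fwrapv, it can also be spelled out by hand. A sketch, with `wrapping_add_i32` as a made-up name; it relies on `int32_t` being two's complement with no padding bits, which the standard guarantees for that type wherever it exists.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: two's complement wrapping add without relying on -fwrapv. */
int32_t wrapping_add_i32(int32_t a, int32_t b)
{
    /* Unsigned arithmetic is defined to wrap modulo 2^32. */
    uint32_t sum = (uint32_t)a + (uint32_t)b;

    /* Reinterpret the bits; this sidesteps the implementation-defined
       conversion of an out-of-range unsigned value to a signed type. */
    int32_t result;
    memcpy(&result, &sum, sizeof result);
    return result;
}
```

So `wrapping_add_i32(INT32_MAX, 1)` yields `INT32_MIN`, with no UB involved at any step.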
You can write a C compiler with exactly laid out well-defined semantics. You can't assume those semantics hold for C-the-language, because it doesn't define those semantics. UB is a property of the language, not just of a given compiler. The Rust reference defines the semantics of the safe subset of Rust without any UB, so any compliant Rust compiler won't have UB in that subset. The reference also defines the guarantees which the programmer must uphold within `unsafe` blocks to avoid UB, as long as those are upheld there's no UB at all.
I understand that. It makes no practical difference. 99.99% of my additions don't rely on signed overflow for example, and if I ever needed it there are ways to get just that.
Or tell me how you write a Rust program differently given that signed overflow is apparently defined? I bet you write it exactly the same way, and you get pretty much the same behaviour in practice. And we're even only debating actual overflow situations, meaning there is a bug whatever the compiled behaviour is.
C the language doesn't even guarantee that the machine has native integers with 8, 16, 32, 64 bits etc, that a cacheline is 64 bytes, that a page is 4K, and here I am, writing programs for exactly that.
> But did the rust compiler assume that the integer would not overflow?
It did not.
> It did so in Debug mode where runtime checks were added.
It didn't assume in that case either. It did a well defined thing: add checks.
> If it's not the case in Release mode, does that mean semantics are different between Debug and Release?
Strictly speaking, the language doesn't know about "release mode", as that's a Cargo thing. But yes, in practice, the semantics are different based on various things: it could be debug vs release, it could also be flags that control the behavior. But that's still distinct from "undefined behavior" as a concept. The behavior is well defined, with multiple possible options for behaviors.
So in Rust, you are actually specifying TWO programs with a single source? Those Rust users are surely too clever for my liking!
You can tune a C compiler as well to have a very specific defined behaviour for integer overflow. You can add -fwrapv or you can add UBSAN.
The user never intended overflow to happen, because if they did, they could have used something like __builtin_mul_overflow() or whatever. Or they are an emotionally unstable user with destructive tendencies. The user also never intended the program to abort with a (nicely formatted) error message, unless they are a very very sad depressed nihilistic user who also never runs their program in Release mode.
To say that overflow would be defined in Rust is at least half a lie. We could agree that cargo has a choice of diagnostic policy though, a policy how to handle what is essentially a state with no defined or useful path forward, or in other words, UB.
Throwing errors might be a wanted property to detect oversights. The C ecosystem has UBSAN too! But essentially the same is still true: Basic arithmetic operations are not closed over the numbers 0..2^N. Rust doesn't have a (unique and useful) definition for those operations for a subset of numbers. Even if you claim the operations are defined (say wrapping arithmetic in Release mode), it's not what the programmer wants. Probably the majority of algorithms work over natural numbers or integer numbers. These algorithms don't work when the arithmetic is on integers modulo 2^N.
So the user has to constrain the set of valid inputs, and do manual sanitization, just like in C.
> You can tune a C compiler as well to have a very specific defined behaviour for integer overflow. You can add -fwrapv or you can add UBSAN.
This is an example of a compiler flag that adds definition to undefined behavior, which is of course, legal to do. That doesn't change that in the standard, it is undefined behavior, and in Rust, it is not.
> To say that overflow would be defined in Rust is at least half a lie.
In the context of "undefined behavior", it is not a lie at all.
> So the user has to constrain the set of valid inputs, and do manual sanitization, just like in C.
No, because the consequences of how the two languages define these behaviors are very, very different.
No, “release mode vs debug mode” is defined in Cargo. What’s defined in Rust is the debug_assertions flag, which is one of the things that Cargo will set by default as part of the debug mode.
> Just saying that it's defined and then not saying what the definition is, is no different from saying it's undefined.
It actually is, because, as I said earlier, “undefined behavior” is a term of art with very specific meaning. Regardless, it is defined: there are two possible behaviors, with one guaranteed with that flag and the other chosen by implementations.
I think people make up way too much of it. What is the actual term of art? What is the meaning of UB? If you look in the standard, UB is basically what its name says, it is behaviour (or state) that is not defined. It can be anything. And that makes sense in many cases: What if you construct a random pointer, and read it or write it? It's not useful or practically possible to define the behaviour from then on. So the behaviour is left undefined, simple as that.
Now are there many cases of UB in C, many more than strictly need to exist on contemporary platforms? For sure there are. But does it affect me? Not unless I need a specific behaviour common to most contemporary platforms that I can't get within the confines of C, even considering compiler specific extensions. Honestly I can't come up with any off the top of my head. Maybe some integer-shifting stuff or such: if the compiler was able to prove I'm doing something undefined, it can leave out that code (or delete my mail, for the doomers). Personally, it hasn't happened to me, and it's on the compiler authors to not do stupid things too.
Leaving all the semantic hair-splitting aside. What is the practical difference in how you write a Rust program compared to a C program, given that integer overflow is "defined" in Rust?
> It didn't assume in that case either. It did a well defined thing: add checks.
It did. The compiler added the checks (which panic on overflow, from a quick web search) precisely so it (and importantly, the developer!) can assume the overflow didn't happen in the subsequent code. Unless you consider a panic a defined state, and consider wrap-on-overflow equally valid in all cases, it's essentially the same as UB. (panic seems to be considered "unrecoverable").
The difference is _at most_ that the C spec gives the compiler more freedom to "implement UB", but then again, hit any unsafe code in Rust with a wrapped-around integer and you probably have a comparable practical result -- the machine doing random things, corrupting memory and so on.
You can run your code under ASAN and UBSAN nowadays, it will catch many or most of issues as they happen.
But that's completely beside the point. UB on signed overflow, or really most UB, is unrelated to C's flexibility. It is a detail of the spec related to portability and performance. IIRC it is even required to make such trivial optimizations as turning
for (int i = 0; i < n; i++) func(a[i]);
into
for (Foo *p = a, *last = a + n; p < last; p++) func(*p);
saving arithmetic and saving a register, on architectures where `int` is smaller than pointers. But there are also options like -fwrapv on GCC for example, allowing you to actually use signed overflow.
IIRC computation of the address is done by computing the offset from the base pointer as a multiplication in (32-bit) int, like p + (i * sizeof (Foo)). The right term might overflow, but due to signed overflow being UB, the compiler is able to assume that it does not, so the transformation to do the arithmetic entirely in (64-bit) pointer space is valid.
Exactly. You as the programmer know that the loop counter won't overflow, and in general, essentially nobody would actually write it that way. But if you don't assume it can't happen, the possibility for signed overflow is everywhere in address computations.
This is also a major blocker for auto-vectorization. Can't coalesce a load of a[i], a[i+1], a[i+2], a[i+3] into a load of a[i:i+3] if there's a possibility that `i+1`, `i+2` or `i+3` wrapped around (thus causing your "contiguous" load to be non-contiguous). This is a big reason why you shouldn't use `unsigned` for loop counters, especially if they're going to be used as an index into an address calculation.
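The wraparound case the compiler has to account for with an unsigned 32-bit index is easy to show. A small sketch with made-up function names; `signed_step` must only be called with `i < INT32_MAX`, since anything else would be the very UB under discussion.

```c
#include <stdint.h>

/* Distance in elements between a[i] and a[i + 1] for a 32-bit unsigned index. */
int64_t unsigned_step(uint32_t i)
{
    uint32_t next = i + 1u; /* defined to wrap modulo 2^32 */
    return (int64_t)next - (int64_t)i;
}

/* Same question for a signed index. Overflow is UB here, so the compiler
   may assume the step is always 1. Only call with i < INT32_MAX. */
int64_t signed_step(int32_t i)
{
    return (int64_t)(i + 1) - (int64_t)i;
}
```

For `i == UINT32_MAX` the unsigned step is -4294967295 instead of 1, so `a[i]` and `a[i + 1]` are not neighbours and the loads can't be merged without first proving that this case never occurs.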
But surely the more natural approach than making this undefined behavior would be making the computation of a[i] take place in 64-bit pointer space rather than 32-bit int space? Why does the compiler need the freedom to emit nasal demons?
It's not flexible in practice, because knowing the standard isn't optional. If you make the choice to not follow the standard, you're making the choice to write fundamentally broken software. Sometimes with catastrophic consequences.
I'm making the choice to pass pointers as void to get low-friction polymorphism. I'm making the choice to control the memory layout of my data structures, including of levels and type of indirection. I'm making the choice to control my own memory allocators and closely control lifetimes, closely control (almost) everything that happens in the system.
That has nothing to do with not following the standard.
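The void-pointer polymorphism in question is the kind `qsort` from the standard library has always used. A minimal sketch; `Item` and the two functions are hypothetical names for illustration.

```c
#include <stdlib.h>

typedef struct {
    int id;
    double weight;
} Item; /* hypothetical record type */

/* Comparator with the signature qsort expects; void * converts implicitly. */
static int compare_item_weight(const void *a, const void *b)
{
    const Item *x = a;
    const Item *y = b;
    return (x->weight > y->weight) - (x->weight < y->weight);
}

/* Sort items in place by ascending weight. */
void sort_items_by_weight(Item *items, size_t n)
{
    qsort(items, n, sizeof items[0], compare_item_weight);
}
```

None of it goes outside the standard: converting an object pointer to `void *` and back to the original type is fully defined.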
If you don't follow the standard, gcc -O2 can introduce bugs to your code that you never even wrote. Skipping null checks, executing both branches of a conditional, and so on.
I read
> If you want to be standards correct, yes you have to know the standard well.
to mean that being standards-correct is optional. It's not. Every C programmer needs to know every possible UB by heart and never introduce any of it to their code, or else they'll be constantly introducing subtle, hard to debug bugs that contradict the actual code they wrote.
Maybe you meant something different by those words, but then I'm confused what the "if" was supposed to mean.
Of course it's optional (although I didn't mean to imply that). Even using computers at all is optional. I never said that I don't aim to follow the standard, have a clean compiling program without warnings and without UB, etc. I do strive to achieve all of that.
But it's not entirely black and white, either. In practice I'm fine accepting that some bugs are technically UB but whatever, we've found a bug by whatever manifestation (like NULL dereference most likely leading to segfault in practice). I just fix the bug as a bug, and life goes on.
The standard is not perfect, it does have shortcomings. It can be improved. And it can be interpreted to fix some issues. Let's not hold theory over practicality, and let's expect the compiler writers also strive to do the reasonable thing.
In practice, GCC -O2 will happily erase entire swathes of code and turn perfectly logical source into nonsense assembly whenever it gets as much as a sniff of UB anywhere in the code path. Nobody would be talking about UB if GCC wasn't so aggressive in abusing the freedom UB gives.
To paraphrase your earlier comment - you lose low-friction polymorphism (unpredictable compiler output causes a lot of friction). You lose control of memory allocations (because they may have been elided) and lose control of lifetimes (because free can be moved before last use causing a crash, or removed entirely causing a leak). You lose control of (almost) anything that happens in the system. And it has everything to do with not following the standard.
You do retain control of the memory layout of data structures, though.
Then I'm almost ashamed to admit that I'm not sure I've ever witnessed any surprising form of UB in the wild. For example, I will reliably get segfaults on NULL dereference in practice. Typical manifestations of UB are entirely predictable and obvious. Of course I'm also running most code without most optimizations, most of the time, while developing.
On the other hand, what I've observed with my own eyes is interesting phenomena like performance drops, e.g. memory bandwidth dropping from gigabytes/sec to 300 KB/sec due to false sharing on an ARM SOC.
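The usual remedy for false sharing is to give each thread's hot data its own cache line. A sketch assuming a 64-byte line, which is common but not universal (some ARM parts use 128); `CACHE_LINE_BYTES`, `padded_counter` and the array are names invented here.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64 /* assumption about the target, not guaranteed by C */

/* One counter per cache line: the alignment requirement also rounds the
   struct size up to a full line, so array neighbours never share a line. */
struct padded_counter {
    alignas(CACHE_LINE_BYTES) unsigned long value;
};

/* Hypothetical per-thread counters; thread t only ever touches index t. */
struct padded_counter per_thread_counters[4];
```

It trades a bit of memory for not bouncing the same line between cores on every increment.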
There was once a privilege escalation vulnerability in the Linux kernel that only happened when compiled with optimizations. In kernel space, address 0 is just regular memory that can be read from and written to if there's a page mapped to it. But in the C standard, reads and writes through a null pointer are UB.
There was some function that read from a passed pointer unconditionally whether it's null or not. It made sense in context. Then it checked if the pointer is null - if it is do early return, if it's not do privileged operation. The pointer was null iff the user didn't have permissions to do the operation.
What GCC did is notice that a pointer is accessed before its null check. Since accessing a null pointer is UB, and GCC assumes UB never happens, it figured out the null check is superfluous. And removed the check and the early return. The pointer read stayed, mind you. The optimized function would unconditionally read from the pointer even if it's null, then unconditionally execute the privileged operation without checking permissions. That allowed obtaining root access from anywhere.
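The shape of that pattern, reduced to a few lines. This is an illustrative sketch with made-up names (`struct device`, both functions), not the actual kernel code.

```c
#include <stddef.h>

struct device {
    int flags;
}; /* hypothetical */

/* Dereference first, check later. Calling this with NULL is UB, and an
   optimizer may treat the null check as dead code and drop it. */
int device_flags_risky(struct device *dev)
{
    int flags = dev->flags; /* compiler may infer dev != NULL from here on */
    if (dev == NULL)        /* candidate for removal at -O2 */
        return -1;
    return flags;
}

/* Check first, then dereference: nothing for the optimizer to infer. */
int device_flags_checked(struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```

GCC also has `-fno-delete-null-pointer-checks`, which turns off this particular inference; the kernel build uses it these days, as far as I know.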
I saw a few other writeups of interesting UB behavior on The Old New Thing blog. I especially like the time travel one: https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63... (apologies to people of the future, links to MS devblogs tend to die often).
Not sure why you're being downvoted. That's completely right. The example is silly. The code is obviously bad, doesn't matter if it's UB or not.
I'm also not convinced (yet) that the example really is UB: I agree reading a volatile is "a side effect" in some sense, and GP cited a paragraph that says just that. But GP doesn't clearly quote that it's a side effect on the object (or how a side effect on an object is defined). Reading an object doesn't mutate it after all.
But whatever language lawyer things, the code is obviously broken, with an obvious fix, so I'm not so interested in what its semantics should be. Here is the fix:
volatile int x;
// ...
int val = x; // volatile read
printf("%x %d\n", val, val);
The problem is that the function call as a whole is UB. Having the original example compile to the equivalent of
volatile int x;
int a = x;
int b = x;
printf("%x %d\n", a, b);
is equally valid as
volatile int x;
int a = x;
int b = x;
printf("%x %d\n", b, a);
, and neither needs to have the same output as your proposed fix.
C could've specified something like "arguments are evaluated left-to-right" or "if two arguments have the same expression, the expression is [only evaluated once]/[always evaluated twice]". But it didn't, so the developer is left gingerly navigating a minefield every time they use volatile.
Not only is "arguments are evaluated left-to-right" less easy to formalize than you think, it would also make all C code run slower, because the compiler would no longer be able to interleave computations for more efficient pipelining. The same goes for "expression is [only evaluated once]/[always evaluated twice]".
Of course the developer is navigating a minefield every time they use volatile, that's why it's called "volatile" - an English word otherwise only commonly used in chemistry, where it means "stuff that wants to go boom".
Your argument makes no sense since the developer is expected to perform manual sequencing. Correctly written UB free code cannot be interleaved either.
All you've achieved is that the standard C function call syntax can no longer be used as is.
The compiler can still interleave anything it can show is side-effect free; it's hard to show that something would benefit from being reordered without analyzing it well enough to determine what side effects it has.
I understand, that's why I said the code is obviously broken. The problem is not about order of evaluation. It's not about a UB arising from unsequenced volatile reads or whatever.
The problem is simply that there are two volatile reads where only one was intended. It doesn't matter if there is UB or not. The code doesn't express the intention either way. All you need to know to understand that is that a volatile might be modified concurrently (a little bit similar to atomics, but not the same semantics).
Erm... that's not just false. The point of templates is generic programming, reusable components. If you don't put them in a header, you're not reusing them much. And if you have to "selectively pick TUs where they're instantiated", you're basically admitting that you have to invest effort to reduce compile times. You are refuting the very point you're making.
C++ templates _are_ slow to compile. They require running something like a dynamically typed VM in the compiler.
**** Template sets that took longest to instantiate:
833 ms: sf::base::Optional<$> (911 times, avg 0 ms)
Each individual instantiation of this class is sub 1ms.
Including the header itself takes 3ms.
I'm sure I can optimize it even further if I wanted to.
---
Now to refute your other incorrect claims:
> The point of templates is generic programming, reusable components.
That's ONE use case. A more general use case is just reducing code repetition in a type-safe manner, which is extremely useful even within the same translation unit. Another use case is metaprogramming. And I'm sure I can come up with more. Templates are a versatile tool.
> And if you have to "selectively pick TUs where they're instantiated", you're basically admitting that you have to invest effort to reduce compile times.
...well, yeah? Of course you have to put in effort to reduce compile times. That doesn't undermine my point at all.
Not slow to compile? 0.833 seconds of extra compile time for a trivial utility class that doesn't do anything interesting other than make something perceived "safer"? Does that mean that each of the 911 instantiations took several million CPU ticks? You could convince me that it's not slow if it was 2-4 orders of magnitude less.
As I wrote elsewhere, 1 second is a timespan where we could aim to compile 1 MLOC of code on a single core.
> A more general use case is just reducing code repetition in a type-safe manner
As I said -- code reuse. And interestingly your Optional.hpp is a header...
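For reference, the arithmetic behind the "several million CPU ticks" question can be sketched as follows. The 833 ms, 911 instantiations and 1 MLOC/s figures are the ones quoted in this thread; the 3 GHz clock rate is purely an assumption for illustration, and the function names are made up.

```c
#include <assert.h>

/* Figures quoted in the thread. */
#define TOTAL_MS        833.0   /* total time for all Optional<$> instantiations */
#define INSTANTIATIONS  911.0   /* number of instantiations */
#define LINES_PER_SEC   1.0e6   /* the "1 MLOC per second per core" target */

/* Assumption, not from the thread: a 3 GHz core. */
#define ASSUMED_HZ      3.0e9

/* Average milliseconds per instantiation: 833 / 911, about 0.914 ms. */
double ms_per_instantiation(void)
{
    return TOTAL_MS / INSTANTIATIONS;
}

/* Cycles per instantiation at the assumed clock: about 2.74 million. */
double cycles_per_instantiation(void)
{
    return ms_per_instantiation() / 1000.0 * ASSUMED_HZ;
}

/* Lines of plain code that would fit in the same time at 1 MLOC/s: about 914. */
double equivalent_lines_per_instantiation(void)
{
    return ms_per_instantiation() / 1000.0 * LINES_PER_SEC;
}

void example_usage(void)
{
    assert(ms_per_instantiation() > 0.91 && ms_per_instantiation() < 0.92);
    assert(cycles_per_instantiation() > 2.7e6 && cycles_per_instantiation() < 2.8e6);
    assert(equivalent_lines_per_instantiation() > 910.0 &&
           equivalent_lines_per_instantiation() < 920.0);
}
```

So both statements in the thread are consistent with the same numbers: each instantiation is under 1 ms, and each one also costs a few million cycles on such a machine.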
That's a strange dismissal. `Optional<T>` isn't "perceived" safety -- it eliminates a whole category of bugs (null dereferences, uninitialized reads) at the type-system level, with zero runtime overhead versus a raw pointer or sentinel value.
If you think that's uninteresting, that's an aesthetic preference, not a technical argument.
But let's set that aside, because it's also irrelevant to the compile-time claim.
The point of the example wasn't "look at this fascinating class," it was "here is a real template, used 911 times across the codebase, in a public header -- exactly the scenario you said would be slow -- and it costs under 1ms per instantiation."
You can swap `Optional` for any non-trivial template of similar complexity and the numbers will look similar.
On your 1 MLOC/sec benchmark: that's a fair reference point for C-like code, but it's not the right yardstick for template instantiation, which is doing semantic work (overload resolution, SFINAE, constraint checking) that a C compiler simply isn't.
Comparing them is comparing different jobs.
The honest question is whether template compilation is slow relative to what it's actually doing, and in well-structured code, it isn't.
And yes, `Optional.hpp` is a header -- that's the whole point of the demonstration. I'm not claiming you should hide every template in a .cpp file. I'm claiming that even templates in headers, instantiated hundreds of times, are cheap when written with compile times in mind.
The "put templates in .cpp where it makes sense" advice is for the specific cases, not a blanket rule.
In practice, C means you end up with generic data structures with pointers to what they contain, rather than being inline.
You do see a lot of macro use to deal with this, but that is just primitive, non-typesafe metaprogramming, and it gets unwieldy enough that in practice, you see people add an extra pointer. This is why it gets slower.
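To make the trade-off above concrete, here is a rough sketch of both styles in C. All names (`ptr_stack`, `DEFINE_STACK`, `int_stack`) and the fixed capacity of 16 are invented for the example. The first stack is generic but stores `void *`, so every element is reached through an extra pointer and a cast; the macro stamps out a typed stack that stores elements inline.

```c
#include <assert.h>
#include <stddef.h>

/* Generic style: one implementation for all types, elements held by pointer. */
struct ptr_stack {
    void  *items[16];
    size_t len;
};

static void ptr_stack_push(struct ptr_stack *s, void *item)
{
    assert(s->len < 16);
    s->items[s->len++] = item;
}

/* Macro style: generates a typed stack with elements stored inline. */
#define DEFINE_STACK(name, type)                          \
    struct name { type items[16]; size_t len; };          \
    static void name##_push(struct name *s, type v)       \
    {                                                     \
        assert(s->len < 16);                              \
        s->items[s->len++] = v;                           \
    }

DEFINE_STACK(int_stack, int)

/* Sums three ints through the pointer-based stack. */
int sum_via_pointers(void)
{
    static int values[3] = { 1, 2, 3 };
    struct ptr_stack s = { .len = 0 };
    int total = 0;

    for (size_t i = 0; i < 3; i++)
        ptr_stack_push(&s, &values[i]);
    for (size_t i = 0; i < s.len; i++)
        total += *(int *)s.items[i];   /* extra indirection, unchecked cast */
    return total;
}

/* Sums the same ints through the macro-generated inline stack. */
int sum_inline(void)
{
    struct int_stack s = { .len = 0 };
    int total = 0;

    for (int v = 1; v <= 3; v++)
        int_stack_push(&s, v);
    for (size_t i = 0; i < s.len; i++)
        total += s.items[i];           /* direct, typed access */
    return total;
}
```

Both functions return 6; the difference is in memory layout and in how much the compiler can check.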
In practice, I see people write very performant C code where it matters, while moving on quickly where it does not. C++ code is often highly templated with annoying compile times, but still often slow because it still does not use the right data structures, and the amount of instruction bloat from specializing everything does not help for anything which is not a toy benchmark.
If you need callbacks and generics, you're not writing performance code.
99% of code in the wild is comically inefficient and is doing the wrong thing, using way too generic data structures and algorithms for very concrete problems. C++ templates may be one way to make comically slow code faster by spending a lot of compile time. But it's often much quicker to just write straightforward concrete code that the compiler can easily optimize.
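As one illustration of the generic-versus-concrete distinction made above (a sketch, not a benchmark): `qsort` is generic via `void *` and a comparison callback, while the hand-written version is tied to `int` and contains no indirect calls. The function names are made up, and insertion sort is only a sensible choice for small arrays; it is used here because it is short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback required by the generic qsort interface. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Generic route: type-erased elements, one indirect call per comparison. */
void sort_generic(int *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_int);
}

/* Concrete route: a plain insertion sort on ints, easy for a compiler to
 * inline and optimize. Quadratic, so only reasonable for small n. */
void sort_concrete(int *v, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = v[i];
        size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}

/* Both routes must produce the same ordering. */
void example_usage(void)
{
    int a[6] = { 5, 2, 9, 1, 5, 3 };
    int b[6] = { 5, 2, 9, 1, 5, 3 };
    const int expected[6] = { 1, 2, 3, 5, 5, 9 };

    sort_generic(a, 6);
    sort_concrete(b, 6);
    assert(memcmp(a, expected, sizeof expected) == 0);
    assert(memcmp(b, expected, sizeof expected) == 0);
}
```

Whether the concrete version is actually faster depends on the data and the compiler; the point is only that it gives the optimizer a fully visible, typed loop.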
IMO C++ makes for slow programs for the sole reason that it compiles so slowly (if you use its modern features), so you have much less time to actually iterate and improve.
If compilation is even more than 10% of the time it takes you to run your tests, you're probably not writing correct code. Compilation times don't even measure.
So every time you compile, you run your test suite? I don't. And you trust that I have experience writing and compiling programs too...?
It should be a goal to keep rebuild times around 1 second (often not quite possible, but 3-5 seconds, even for full rebuilds, is often realistic). I edit, compile, run, edit, compile, run. Editing and running can often take as little as 1-3 seconds, and I sometimes do it dozens of times in a row while working on a single improvement. That's why there is a 1 second rebuild time goal.
In practice I often work on codebases I don't fully control, but when the build times are excessively high, I will complain and try to improve. Build times longer than 10-15 seconds break the flow, they are a significant productivity hit. But they are quite common with C++ codebases (it can also be bad with C codebases by the way, but C++ is typically much worse because of templates and metaprogramming which is very slow).
You run your code before running tests? IMO that's bad practice.
1 second, seriously? Even the Linux kernel is based on C, and it doesn't even have compilation times approaching that.
I guess I also work on a lot of big data projects, where getting results will take... 48 hours or so, so anything shorter than that is basically some sort of unit test or dry run... so in that context, compilation times do not even register on the things slowing me down.
Running the code immediately after making changes is the first line of testing. To run a huge test suite full of tests that are completely unrelated to the current changes would be stupid, it's a huge waste of time and energy.
Yes, seriously, have you ever written a project from scratch? A simple .c file with a thousand lines in it should easily build and start within 100ms. A compiler should be able to do basic parsing and codegen at 1M lines per second per core.
If your runs take 48h, of course you need a strategy to avoid noticing bugs only after dozens of hours running. You can't tell me that it is efficient to make changes and to wait for minutes or even hours before noticing that your code wasn't even syntactically valid, or maybe it did compile but your code had a small oversight and you need to start over building.
The Linux kernel is a HUGE project, one of the biggest around. Yes, a full rebuild takes a long time, depending on configuration. Incremental rebuilds do not, though.
I'm actually working on a Linux kernel module (distributed filesystem client), it's on the order of 40 KLOC. I can do a full rebuild in 10/15 seconds (debug/release), and that includes calling into the kernel's infrastructure and doing a lot of stuff that shouldn't have to be done. An incremental rebuild after changing a single .c file is about 3 seconds. Restarting the module (swapping for the newly built one) takes less than 10 seconds also. And this can be already a stressful bottleneck depending on the task. Say you're improving logging in a particular section of code, this can easily require 5-10 attempts.
I'm working on Desktop GUIs (2D/3D) too. You need a quick turnaround time as much as possible. Many changes are trivial but you want to do many small incremental improvements, recompile, run and test (manually), often with a breakpoint on the code you're currently working on.
The projects I'm working on are written in C or conservative C++, and most have from thousands to hundreds of thousands of lines of code. They can be built from scratch in a short amount of time (< 10s for the smaller ones). And all of them do incremental builds in <= 10 seconds, except maybe when changing the most central headers, which essentially means a full rebuild.
You can also design a C/C++ codebase to always do a full rebuild, compiling everything as a single unit. That can be faster than trying to do incremental builds, for codebases of considerable size. Try out the popular raddebugger project, a complete build after checkout is about 3 seconds. It's ~300 KLOC I think.
It really depends on what you are doing. Sometimes I am not building in a day, just designing and thinking by editing. Sometimes I am refactoring for hours without building. But sometimes I am rebuilding every 10 seconds or so.
Actually, why even specify metaprograms as C-like source code? It must be for convenience. But there is little practical use: a good program always models a lot of different representations of more or less the same things, just recombined and processed a little differently. Why would we want to deal with the semantics of C types, for example, if we can model a much clearer and better constrained universe of types, as used in e.g. a de/serialization framework? Even plain pointers are quite special, and often only of very immediate use; there is no point in e.g. persisting them to disk or sending them over the network.