Great comment! I agree about comptime: as a Rust programmer, I consider it one of the areas where Zig is clearly better than Rust, with its two macro systems and its declarative generics language. It's probably the biggest "killer feature" of the language.
> as a Rust programmer I consider it one of the areas where Zig is clearly better than Rust with its two macro systems and the declarative generics language
IMHO "clearly better" might be a matter of perspective; my impression is that this is one of those things where the different approaches buy you different tradeoffs. For example, by my understanding Rust's generics allows generic functions to be completely typechecked in isolation at the definition site, whereas Zig's comptime is more like C++ templates in that type checking can only be completed upon instantiation. I believe the capabilities of Rust's macros aren't quite the same as those for Zig's comptime - Rust's macros operate on syntax, so they can pull off transformations (e.g., #[derive], completely different syntax, etc.) that Zig's comptime can't (though that's not to say that Zig doesn't have its own solutions).
Of course, different people can and will disagree on which tradeoff is more worth it. There's certainly appeal on both sides here.
Consider that Python + C++ has proved to be a very strong combo: driver in Python, heavy lifting in C++.
It's possible that something similar might be the right path for metaprogramming. Rust's generics are simple and weaker than Zig's comptime, while proc macros are complicated and stronger than Zig's comptime.
So I think the jury's still out on whether Rust's metaprogramming is "better" than Zig's.
>often more performant than Rust with lower resource usage
[citation needed]
If we are to trust this page [0], Rust beats Zig on most benchmarks. In the TechEmpower benchmarks [1] Rust submissions dominate the top, while Zig is... quite far behind.
Several posts I've seen in the past about Zig beating Rust by 3x or so all turned out to be based on low-quality Rust code with performance pitfalls, like measuring the performance of writing to stdout (which Rust locks by default and Zig does not) or iterating over ..= ranges, which are known to be problematic from a performance perspective.
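For reference, a minimal sketch of the stdout pitfall and how idiomatic Rust avoids it (the loop count and use of a buffer are arbitrary choices of mine):

    use std::io::{self, BufWriter, Write};

    fn main() -> io::Result<()> {
        // Each bare `println!` re-acquires the stdout lock and flushes per line.
        // Taking the lock once and buffering removes that per-write overhead.
        let stdout = io::stdout();
        let mut out = BufWriter::new(stdout.lock());
        for i in 0..1_000_000 {
            writeln!(out, "{i}")?;
        }
        out.flush()
    }
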
I would say that in most submission-based benchmarks between languages that should perform similarly, the results mostly reflect the size and enthusiasm of the community.
I agree. In my opinion NaNs were a big mistake in the IEEE 754 spec. Not only do they introduce a lot of special casing, they also consume a relatively big chunk of all possible 32-bit float values (~0.4%).
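A quick back-of-the-envelope check of that ~0.4% figure, just counting bit patterns:

    fn main() {
        // An f32 is a NaN when the exponent is all ones (0xFF) and the mantissa
        // is non-zero, for either sign bit: 2 * (2^23 - 1) bit patterns.
        let nan_patterns = 2u64 * ((1u64 << 23) - 1); // 16_777_214
        let all_patterns = 1u64 << 32;
        println!("{:.3}% of all f32 bit patterns are NaNs",
                 100.0 * nan_patterns as f64 / all_patterns as f64); // ~0.391%
    }
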
I am not saying we do not need NaNs (I would even love to see them in integers, see: https://news.ycombinator.com/item?id=45174074), but I would prefer it if we had fewer of them in floats, with clear sorting rules.
>This goes all the way back to the "futures are inert" design of async Rust
Yep. And this footgun is yet another addition to the long list of reasons why I consider the Rust async model, with its "inert" futures managed in user space, a fundamentally flawed, un-Rusty design.
I feel there's a difference between a preference and a flaw. Rust has targets that make anything except inert futures simply unworkable, and in my opinion it's entirely valid for a programming language to prioritise those targets.
The requirement is that the futures are not separate heap allocations, not that they are inert.
It's not at all obvious that Rust's is the only possible design that would work here. I strongly suspect it is not.
In fact, early Rust did some experimentation with exactly the sort of stack layout tricks you would need to approach this differently. For example, see Graydon's post here about the original implementation of iterators, as lightweight coroutines: https://old.reddit.com/r/ProgrammingLanguages/comments/141qm...
If it’s not inert, how do you use async in the kernel or on microcontrollers? A non-inert implementation presumes a single runtime implementation within std+compiler, and is not usable in environments where you need to implement your own meaning of dispatch.
I think the kernel and microcontroller use-case has been overstated.
A few bare-metal projects use stackless coroutines (technically resumable functions) for concurrency, but it has turned out to be a much smaller use case than anticipated. In practice C and C++ coroutines are really not worth the pain they are to use, and Rust async has mostly taken off with heavy-duty executors like Tokio that very much don't target tiny `#![no_std]` 16-bit microcontrollers.
The kernel actually doesn't use resumable functions for background work; it uses kernel threads. In the wider embedded world threads are also vastly more common than people might think, and the really low-end uniprocessor systems are usually happy to block. Since these tiny systems are not juggling dozens of requests per second that block on I/O, they don't gain that much from coroutines anyway.
We mostly see bigger Rust projects use async when they have to handle concurrent requests that block on IO (network, FS, etc), and we mostly observe that the ecosystem is converging on tokio.
Threads are not free, but most embedded projects today that process requests in parallel — including the kernel — are already using them. Eager futures are more expensive than lazy futures, and less expensive than threads. They strike an interesting middle ground.
Lazy futures are extremely cheap at runtime. But we're paying a huge complexity cost in exchange, one that benefits a very small user base that hasn't really materialized the way we hoped it would.
> it has turned out to be a much smaller use-case than anticipated
Well, no, at the time of the design of Rust's async MVP, everyone was pretty well aware that the vast majority of the users would be writing webservers, and that the embedded use case would be a decided minority, if it ever existed at all. If anything, the fact that Embassy exists, with an ecosystem as vibrant as it has, is an unexpected triumph.
But regardless of how many people were actually expected to use it in practice, the underlying philosophy remained thus: there exist no features of Rust-the-language that are incompatible with no_std environments (e.g. Rust goes well out of its way, and introduces a lot of complexity, to make things like closures work given such constraints), and it would be exceptional and unprecedented for Rust to violate this principle when it comes to async.
Point taken, I might have formed the wrong impression at the time.
With my C++ background, I'm very much at home with that philosophy, but I think there is room for nuance in how strictly orthodox we are.
C++ does have optional language features that introduce some often unwelcome runtime overhead, like RTTI and unwinding.
Rust does not come configured for freestanding environments out of the box either. Like C++, you are opting out of language features like unwinding as well as the standard library when going freestanding.
I want to affirm that I'm convinced Rust is great for embedded. It's more that I mostly love async when I get to use it for background I/O with a full-fledged, work-stealing, thread-per-core marvel of engineering like Tokio!
In freestanding Rust the I/O code is platform-specific, so suddenly I'd have to write the low-level async code myself, and it's not clear this makes the typical embedded project that much more performant, or all that easy to maintain.
So, I don't want to say anything too radical. But I think the philosophy doesn't have to be as clear-cut as "no language feature may ever be incompatible with no_std". Offering a std-only language feature is not necessarily closing a door to embedded. We sort of already make opt-out concessions to give most people a friendlier experience.
"Not inert" does not at all imply "a single runtime within std+compiler." You've jumped way too far in the opposite direction there.
The problem is that the particular interface Rust chose for controlling dispatch is not granular enough. When you are doing your own dispatch, you only get access to separate tasks, but for individual futures you are at the mercy of combinators like `select!` or `FuturesUnordered` that only have a narrow view of the system.
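For illustration, a minimal sketch (assuming the tokio crate with its runtime, macro, and timer features enabled) of how the combinator itself is the thing driving the leaf futures; whatever scheduler you wrote only ever sees the one enclosing task:

    use std::time::Duration;

    #[tokio::main]
    async fn main() {
        let a = tokio::time::sleep(Duration::from_millis(10));
        let b = tokio::time::sleep(Duration::from_millis(20));
        // `select!` polls both timers itself; a custom scheduler has no handle
        // on `a` or `b` individually, only on the task containing this block.
        tokio::select! {
            _ = a => println!("a finished first"),
            _ = b => println!("b finished first"),
        }
    }
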
A better design would continue to avoid heap allocations and allow you to do your own dispatch, but operate in terms of individual suspended leaf futures. Combinators like `join!`/`select!`/etc. would be implemented more like they are in thread-based systems, waiting for sub-tasks to complete, rather than being responsible for driving them.
If you’ve got eager dispatch, I’m eager (pun intended) to learn how you have an executor that’s not baked into the std library and limited to a single runtime per process, because at the time of construction you need the language to schedule dispatch of the created future. This is one of the main challenges behind the pluggable executor effort: the set of executors that could be written is so varied (work stealing vs thread per core) that it’s impossible to unify them without an effect system. And even then you’ve got the challenge of how to encode that in the language, because the executor is a global thing determined at runtime, but it’s also local in the sense that you don’t know which executor a given piece of code will end up actually being dispatched into, since the same async function can be invoked on different executors.
For better or worse, I think eager dispatch also generally implies not being able to cancel futures, since ownership is transferred to the executor rather than retained by your code.
You don't need any of that, and you can keep cancellation too.
The core of an eager cooperative multitasking system does not even need the concept of an executor. You can spawn a new task by giving it some stack space and running its body to its first suspension point, right there on the current thread. When it suspends, the leaf API (e.g. `lock`) grabs the current top of the stack and stashes it somewhere, and when it's time to resume it again just runs the next part of the task right there on the current thread.
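To sketch that idea (this is hypothetical, not how Rust's async works today; `EagerMutex` and the boxed continuations stand in for compiler-managed suspension points):

    use std::collections::VecDeque;

    // A toy eager `lock`: if the lock is held, the rest of the task is stashed
    // as a continuation; `unlock` resumes the next waiter right here on the
    // current thread. No executor anywhere.
    struct EagerMutex {
        locked: bool,
        waiters: VecDeque<Box<dyn FnOnce()>>,
    }

    impl EagerMutex {
        fn new() -> Self {
            EagerMutex { locked: false, waiters: VecDeque::new() }
        }

        // `cont` is "the rest of the task" after the suspension point.
        fn lock(&mut self, cont: Box<dyn FnOnce()>) {
            if self.locked {
                self.waiters.push_back(cont); // suspend: stash the continuation
            } else {
                self.locked = true;
                cont(); // run the task body eagerly, on the current thread
            }
        }

        fn unlock(&mut self) {
            match self.waiters.pop_front() {
                Some(next) => next(), // resume the next task, still on this thread
                None => self.locked = false,
            }
        }
    }

    fn main() {
        let mut m = EagerMutex::new();
        m.lock(Box::new(|| println!("task A runs immediately")));
        m.lock(Box::new(|| println!("task B was suspended, resumed by unlock")));
        m.unlock(); // resumes task B eagerly
        m.unlock();
    }
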
You can build different kinds of schedulers on top of this first-class ability to resume a particular leaf call in a task. For example, a `lock` integrated with a particular scheduler might queue up the resume somewhere instead of invoking it immediately. Or, a generic `lock` might be wrapped with an adapter that re-suspends and queues that up. None of this requires that the language know anything about the scheduler at all.
This is all typical of how higher level languages implement both stackful and stackless coroutines. The difference is that we want control over the "give it some stack space" part: we want the compiler to compute a maximum size and have us specify where to store it, whether that's on the heap (e.g. tokio::spawn) or nested in some other task's stack (e.g. join, select) or some statically-allocated storage (e.g. on a microcontroller).
(Of course the question then becomes, how do you ensure `lock` can't resume the task after it's been freed, either due to normal resumption or cancellation? Rust answers this with `Waker`, but this conflates the unit of stack ownership with the unit of scheduling, and in the process enables intermediate futures to route a given wakeup incorrectly. These must be decoupled so that `lock` can hold onto both the overall stack and the exact leaf suspension point it will eventually resume.)
Cancellation doesn't change much here. Given a task held from the "caller end" (as opposed to the leaf callee resume handles above), the language needs to provide a way to destruct the stack and let the decoupled `Waker` mechanism respond. This still propagates naturally to nested tasks like join/select arms, though there is now an additional wrinkle that a nested task may be actively running (and may even be the thing that indirectly provoked the cancellation).
On the other hand, early Rust also for instance had a tracing garbage collector; it's far from obvious to me how relevant its discarded design decisions are supposed to be to the language it is today.
This one is relevant because it avoids heap allocation while running the iterator and for loop body concurrently. Which is exactly the kind of thing that `async` does.
It avoids heap allocation in some situations. But in principle the exact same optimization could be done for stackful coroutines. Heck, right now in C I could stack-allocate an array and hand it to pthread_create (via pthread_attr_setstack) as the stack for a new thread. To avoid an overlarge allocation I would need to know exactly how much stack is needed, but this is exactly the knowledge the Rust compiler already requires for async/await.
What people care about are semantics. async/await leaks implementation details. One of the reasons Rust does it the way it currently does is because the implementation avoids requiring support from, e.g., LLVM, which might require some feature work to support a deeper level of integration of async without losing what benefits the current implementation provides. Rust has a few warts like this where semantics are stilted in order to confine the implementation work to the high-level Rust compiler.
> in principle the exact same optimization could be done for stackful coroutines.
Yes, I totally agree, and this is sort of what I imagine a better design would look like.
> One of the reasons Rust does it the way it currently does is because the implementation avoids requiring support from, e.g., LLVM
This I would argue is simply a failure of imagination. All you need from the LLVM layer is tail calls, and then you can manage the stack layout yourself in essentially the same way Rust manages Future layout.
You don't even need arbitrary tail calls. The compiler can limit itself to the sorts of things LLVM asks for (specific calling convention, matching function signatures, etc.) when transferring control between tasks, because it can store most of the state in the stack that it laid out itself.
In order to know for sure how much stack is needed (or to replace the stack with a static allocation, which used to be common on older machines and still today in deep embedded code, and even on GPU!), you must ensure that any functions you call within your thread are non-reentrant, or else that they resort to an auxiliary stack-like allocation if reentrancy is required. This is a fundamental constraint (not something limited to current LLVM) which in practice leads you right back into the "what color are your functions?" world.
>Try not to breathe any, studies are still pending but that stuff gets everywhere.
I would understand such a comment in the context of carbon nanotubes or fullerenes, but graphene? Have you forgotten that graphite is literally a bunch of stacked graphene?
Considering how much graphite pencils are used across the world, we would've seen hypothetical negative effects already with a high degree of confidence.
Yes, graphene production aims to produce larger sheets, but it only makes graphene less biologically active, not more.
> Considering how much graphite pencils are used across the world, we would've seen hypothetical negative effects already with a high degree of confidence.
Graphitosis is the graphite equivalent of silicosis and asbestosis, so yes, we’ve got plenty of evidence it’s harmful, but it’s mostly a problem with occupational exposure, where large amounts of graphite dust are produced.
That might change if there are tiny sheets of graphene flaking off everywhere from nanocoatings and it turns out to be carcinogenic for the same reason asbestos is (which isn’t out of the question given the studies on CNTs and nanotoxicity in general).
IIUC graphitosis, silicosis, and black lung require inhaling ungodly amounts of dust. It's orders of magnitude more than we can expect from flaking-based trace contamination.
Why do you expect a different result from "tiny sheets of graphene flaking off everywhere from nanocoatings" compared to the same flaking from graphite smeared across paper?
Pencil graphite breaks off in very large chunks and when you look at them in a microscope the particle size is in the micrometers. Those particles are too big to easily penetrate cells or deep tissue. You understand correctly about the dust issue.
Nanosheets are a different story and I’m worried that the graphene produced for industrial applications will be much smaller, flake off much easier in the field as distinct sheets like from abrasion, and stay airborne for longer. In that form they’re likely to behave like asbestos and the evidence is already pretty strong that they do.
If we start to have huge amounts of it spread through household objects, then yeah, we can increase people's exposure by a large multiplier and get the harmful effects we already know about.
That said, I don't think we will ever have large amounts of it in household objects. Graphene doesn't seem to be useful that way. We may have it embedded in some material, but that will limit exposure to waste management and manufacturing.
Also, unlike asbestos, graphene is not chemically stable, so very small pieces of it have a limited half-life.
>If they had invented a separate code point for I in Turkish, then when converting text from those existing ISO character encodings, you’d have to know whether the text is Turkish or English or something else, to know which Unicode code point to map the source “I” into. That’s exactly what Unicode was designed to avoid.
Great. So now we have to know the locale for handling case conversion for probably centuries to come, but it was totally worth it to save a bit of effort in the relatively short transition phase. /s
You always have to know locale to handle case conversion - this is not actually defined the same way in different human languages and it is a mistake to pretend it is.
In most cases the locale is encoded in the character itself, e.g. Latin "a" (U+0061) and Cyrillic "а" (U+0430) are two different characters, despite usually being visually indistinguishable.
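A tiny Rust illustration: because the script is part of the code point, case mapping needs no locale parameter here:

    fn main() {
        let latin = 'a';     // U+0061 LATIN SMALL LETTER A
        let cyrillic = 'а';  // U+0430 CYRILLIC SMALL LETTER A
        assert_ne!(latin, cyrillic);                            // different characters
        assert_eq!(latin.to_uppercase().next(), Some('A'));     // U+0041
        assert_eq!(cyrillic.to_uppercase().next(), Some('А'));  // U+0410
    }
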
The "language-sensitive" section of the special casing document [0] is extremely small and contains only the cases of stupid reuse of Latin I.
I call BS. Without a series of MAJOR blunders, Unicode was destined to succeed. Once the rest of the world had migrated to Unicode, I am more than certain that the Turks would've migrated as well. Yes, they may have complained for several years and would've spent a minuscule amount of resources to adopt the conversion software, but that's it; a decade or two later everyone would've forgotten about it.
I believe that even the addition of emoji was completely unnecessary, despite the pressure from Japanese telecoms. Today's landscape of messengers only confirms that.
>Even modern languages like Rust did a crappy job of enforcing it
Rust did the only sensible thing here. String-handling algorithms SHOULD NOT depend on locale, and reusing LATIN CAPITAL LETTER I arguably was a terrible decision on the Unicode side (I know there were reasons for it, but I believe they should've bitten the bullet here), same as Han unification.
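For what it's worth, a minimal sketch of what "locale-independent" means in Rust's std today; Turkish-correct casing would need an external locale-aware library (e.g. ICU bindings), which is my assumption about how one would do it:

    fn main() {
        // std applies the default Unicode case mappings; there is no locale
        // parameter anywhere in the API.
        assert_eq!("i".to_uppercase(), "I");
        assert_eq!("I".to_lowercase(), "i");
        // A Turkish locale would instead want "i" -> "İ" (U+0130) and
        // "I" -> "ı" (U+0131), which std deliberately does not attempt.
    }
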
Thank you for a great overview! I wish HTTP/3 and QUIC were the "default option" and had much wider adoption.
Unfortunately, software implementations of QUIC suffer from dealing with UDP directly. Every UDP packet involves one syscall, which is relatively expensive in modern times. And accounting for the MTU makes the situation roughly 64 times worse, since data has to be split into MTU-sized datagrams instead of being handed to the kernel in large writes.
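Rough arithmetic behind a figure of that order (the 64 KiB write size and the 1200-byte datagram size are my assumptions, not anything from a spec):

    fn main() {
        let tcp_write = 64 * 1024;  // bytes a TCP app can hand to the kernel in one write()
        let quic_datagram = 1200;   // conservative QUIC datagram size, below a ~1500-byte MTU
        let sends = (tcp_write + quic_datagram - 1) / quic_datagram;
        println!("{sends} UDP send syscalls vs 1 TCP write for the same data");
        // ~55 here; sendmmsg()/UDP GSO or io_uring batching are the usual mitigations.
    }
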
In-kernel implementations and/or io_uring may improve this unfortunate situation, but today, in practice, it's hard to achieve the same throughput as with plain TCP. I also vaguely remember that QUIC makes load balancing more challenging for ISPs, since they cannot distinguish individual streams as they can with TCP.
Finally, QUIC arrived a bit too late and it gets blocked in some jurisdictions (e.g. Russia) and corporate environments similarly to ESNI.
> In-kernel implementations and/or io-uring may improve this unfortunate situation, but today in practice it's hard to achieve the same throughput as with plain TCP.
This would depend on how the server application is written, no? Using io_uring and similar should minimise context switches from userspace to kernel space.
> I also vaguely remember that QUIC makes load-balancing more challenging for ISPs, since they can not distinguish individual streams as with TCP.
Not just for ISPs; IIRC (and I may be recalling incorrectly) reverse proxies can't currently distinguish either, so you can't easily put an application behind Nginx and use it as a load-balancer.
The server application itself has to be the proxy if you want to scale out. OTOH, if your UDP proxy is able to inspect a packet and determine the corresponding instance to send it to, far fewer resources are required on the reverse proxy/load balancer, as it doesn't have to maintain open connections at all.
It will also make some things easier; a machine that is getting overloaded can hand off (in userspace) existing streams to a freshly created instance of the server on a different machine, because the "stream" is simply a set of related UDP packets. TCP is much harder to hand off, and even if you can, it requires either networking changes or kernel functions to do the hand-off.
I still disagree with their decision to make libc THE system interface. I understand why it's important to provide a compatibility layer, but, ideally, I would like to see a Linux-like (potentially semver-versioned) stable syscall API, or at the very least something like libsystem, i.e. a thin wrapper around the technically unstable syscall API.
The wild thing here with a microkernel is that the syscall API to the actual kernel should be theoretically really small right?
I get that the various little services might change, but ultimately the kernel supporting POSIX-like threading and memory operations should be mostly enough?
The kernel ABI is notoriously backwards compatible (the famous "we do not break userspace" and all). The primary reason binaries rot on Linux is glibc and other shared-library dependencies. I can still execute a musl binary compiled more than a decade ago without any issues.
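To illustrate what a stable syscall API buys you, here's a minimal sketch (x86-64 Linux only; `raw_write` is a hypothetical helper) of calling write(2) through the raw kernel ABI with no libc involved; the syscall number and register convention are exactly the parts covered by the "we do not break userspace" promise:

    use std::arch::asm;

    fn raw_write(fd: i32, buf: &[u8]) -> isize {
        let ret: isize;
        unsafe {
            asm!(
                "syscall",
                inout("rax") 1isize => ret,       // __NR_write == 1 on x86-64
                in("rdi") fd as usize,            // arg0: file descriptor
                in("rsi") buf.as_ptr() as usize,  // arg1: pointer to the data
                in("rdx") buf.len(),              // arg2: length in bytes
                out("rcx") _,                     // clobbered by the `syscall` instruction
                out("r11") _,
                options(nostack),
            );
        }
        ret
    }

    fn main() {
        raw_write(1, b"hello from a raw write(2) syscall\n");
    }
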