The thing that concerns me most is that, looking at the fix, it is very difficult to see why it is correct. It also appears that there is a lot of code without explicit bounds checks. That worries me because, even if the logic is safe, it makes the logic very complex to verify. I wonder what the cost would be of adding an explicit, local bounds check at every array access. That would serve as a backup that is much easier to verify. I suspect the cost would be relatively small. Small enough that I personally would be happy to pay it.
This is also a great reminder that fuzzing isn't a solution to memory unsafe languages and libraries. If anything the massive amount of bugs found via fuzzing should scare us as it is likely only scratching the surface of the vulnerabilities that still lie in the code, a couple too many branches away from being likely to be found by fuzzing.
> ...isn't a solution to memory unsafe languages and libraries. If anything the massive amount of bugs found via fuzzing should scare us as it is likely only scratching the surface of the vulnerabilities that still lie in the code
Yup. For example, the Linux code for its relatively new[1] io_uring subsystem was so riddled with memory exploits that Google disabled it for apps on Android, disabled it entirely on ChromeOS, and disabled it on their servers[2]. It is insane how easy this has made it for a single person to break into Linux.
[1] released with kernel version 5.1 which came out in May 2019
The design of io_uring has nothing to do with the language the Linux kernel is implemented in. And safer languages really can't provide guarantees when the issue at hand is designing a shared-memory, cross-executable (kernel ↔ userspace) API.
Just having first-class slice types, where your pointer is paired with a length so bounds checking can happen automatically, is a huge upgrade over C, even if it doesn't solve every other problem.
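To make that concrete, here is a minimal Rust sketch (illustrative only, with made-up names, not libwebp code) of what "pointer paired with a length" buys you:

    // Minimal sketch, not libwebp code: a slice carries its length, so every
    // index is checked against it automatically.
    fn lookup(table: &[u32], index: usize) -> u32 {
        // Panics on an out-of-range index instead of reading past the buffer.
        table[index]
    }

    fn main() {
        let table = vec![10u32, 20, 30];
        println!("{}", lookup(&table, 2)); // prints 30
        // lookup(&table, 8) would abort with an index-out-of-bounds panic,
        // rather than silently reading whatever happens to follow the buffer.
    }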
Security exploits from out of bounds access should not be happening today, bounds checking is a solved problem, and has been solved for decades.
That's a pretty interesting paper. Thanks for sharing it. I wonder why Dennis Ritchie, with the weight his name carries, wasn't able to push it into mainstream implementation...
I want to create software that doesn’t harm users. That doesn’t at all imply that I’m not interested in computers. In fact guaranteeing safety for users shows a profound interest in computers.
As an example, see how C's 'int', 'short', 'long' or whatever become actual distinct types
> just admit you're not interested in computers
I already have the popcorn ready for when the next C "hotshot" shoots himself in the foot yet again because he thinks a chainsaw without an emergency brake is just more fun
A 2-5× performance hit is also a 2-5× CO₂ hit, and I wouldn't be willing to let myself or other people take that hit. We're throwing enough heat into the atmosphere already to make cats dance on our portable supercomputers as is.
I understand that the patch is correct. The point that I am trying to make is that the correctness of this code relies on non-local reasoning (the allocation/reallocation of the table before it is used). I believe that non-local reasoning raises the chance of error (and is likely one of the reasons why this bug existed in the first place).
By adding length counts and explicit bounds checks you provide local reasoning, which is much *easier* to make and keep correct, as a backup for that non-local reasoning. I think that would result in code that is less likely to have similar bugs in the future.
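To sketch the shape of what I mean (a hypothetical example with made-up names, written in Rust for brevity; in C it would be an explicit if/abort before the write):

    // Hypothetical sketch, not the real libwebp structures: the table carries
    // its length, and the precondition is asserted at the access site, so the
    // safety argument is local instead of depending on distant sizing logic.
    struct HuffmanTable {
        entries: Vec<u32>,
    }

    impl HuffmanTable {
        fn set(&mut self, index: usize, value: u32) {
            // Local, trivially verifiable backup for the non-local reasoning
            // that allocated `entries` "big enough".
            assert!(index < self.entries.len(), "huffman table index out of bounds");
            self.entries[index] = value;
        }
    }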
Over a library used by billions of people a lot of times a day, those bounds checks add up to a lot of wasted time and energy if you don't strictly need them.
If this is a common function in the webp library, this function is probably called something like a trillion times a day.
The past 40+ years of CS research has created a huge number of tools for reducing the costs of memory safety. Some of these made it into Rust, but others are sitting in (often abandoned) academic projects. It is an abject failure of industry that we are still parsing untrusted input in languages like C.
WebP gets a slight pass because the code is so old; WebP is at least partly based on code from On2 that extends back at least until the early '00s. Rewriting working code is always going to be lower priority than writing new code (and it's not always a safety win either; being deployed for 10 years gets you an awful lot of real-world testing!).
But look at the implementations for newer image formats and you see more of the same: C and C++, with the token Java implementation that won't get used outside of the JVM.
C and C++ are often used because at the optimization level that these sorts of libraries use, Rust makes tradeoffs that do not work out to 0 overhead. I would personally suggest that Google et al invest in figuring out how to make "formally-verified C" a more widely adopted language for libraries that are truly security-critical.
C that is safe enough for parsing untrusted input like this needs to be either formally verified to be safe (not necessarily to be correct, at least if it's the canonical implementation), or written in an intrinsically memory-safe way whose safety humans can fully reason about.
And at that latter point it's slower than Rust. (Not to mention that Rust's awareness of memory lifetimes enables substantial optimizations beyond plain C code that quite often result in faster code at the same level of effort.)
Apple has actually measured the impact of adding bounds checking to commonly used libraries including media libraries and found no impact on overall power use. Audio and video encoding libraries took around a 1% performance impact.
I don't think there is any evidence to support that bounds checking causes any significant increase in power consumption, and measurements currently available such as shown above show minimal runtime performance impact.
Bounds checking is the perfect situation for branch predictors, 99.9% of the time the branch is predicted correctly, and when it's not it doesn't matter because you're probably aborting the program!
There's also the point that you can opt out of bounds checking as-needed on hot loops if you really need to.
There is no technological reason to not use bounds checking as the default mode of operation today.
There is no technological reason to not use bounds checking as the default mode of operation since ALGOL exists.
My favourite quote is from C.A.R. Hoare's 1980 ACM Turing Award Lecture:
"A consequence of this principle is that every occurrence of
every subscript of every subscripted variable was on every
occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they
urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980 language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have
long been against the law."
They aren't wasted if they are preventing security vulnerabilities.
I also wouldn't be surprised if patching, rebuilding and distributing this fixed version of libwebp to all of these devices would be comparable to the extra cost of decoding.
"Preventing security vulnerabilities" is not an unadulturated good (the most secure computer does nothing but checking to make sure it hasn't been breached), and any bounds check that does not actively prevent an intrusion is wasted from a value perspective.
Your intuition about the cost of a patch vs the extra instructions in the hot path to check bounds everywhere is also very much wrong: the cost of a patch (which is also amortized among many fixes) is not even close. The hot path adds up when you call it a lot.
I think your intuition on the cost of not-taken bounds check branches is off, especially in the context of devices that spend most of their resources browsing the web, running radios, powering screen backlights. I bet any holistic measurement could not tell the difference.
The best solution to this sort of bug is not using languages that are susceptible to this sort of bug. At the very least, I think it's time to retire the fallacy that we're generally capable of producing sound programs in memory unsafe languages. Just like we don't write code manually checking raw SQL to protect against injections and we don't roll our own crypto when we need to encrypt something or do a key exchange, we shouldn't treat writing safe code in C as something anyone can do by just being vaguely diligent.
I feel like we need for C and C++ what TypeScript is for JavaScript: not a language from scratch, but something as close as possible to the thing everyone is already familiar with while doing the thing Rust does. A standard library where everything has the same names, the same kind of C++ objects and templates and RAII etc.; change only this and nothing else.
Because otherwise you make people learn 100 other things at once to start using it effectively, and then they don't.
That isn't the thing Rust does, and is slower, so then people won't use it.
The advantage of Rust is that you can produce efficient safe code. The disadvantage is that you have to learn a different language first, which deters adoption.
Zig is something like this for C, but not C++. There are a number of ongoing attempts to do something like this for C++, but it is very hard and they are all way behind Rust in terms of mindshare.
Check out the tooling from the seL4 project.
I'm aware they used intermediate levels between raw C and the proof assistant, but I don't see why that should be prohibitive.
Implementing webp and similar libraries in another language may not be quick or easy, but having these sorts of issues continue to crop up over and over doesn't feel particularly practical either; it's just easier to act like every critical vulnerability caused by memory unsafety is an isolated issue than to confront the reality that it's a systemic issue and there aren't any easy solutions.
Agreed, I'll stop saying it when we stop having these issues. I don't think people want to confront the fact that there isn't any easier solution, but we're going to keep running into the same thing over and over until we do.
If you read what they proposed and disagreed with it, then don't ask "What do you propose?" - Argue/debate why you think their proposal, which you read, is not useful.
Fuzzing needs to cover all important bits of the code to be useful. The problem I see is that incomplete coverage creates a false sense of security. Projects have some minimal fuzzing coverage (e.g. in oss-fuzz) and care less about quality of the code, thinking fuzzing will catch all security bugs.
Rust code needs proper fuzzing too. It takes a lot of effort to ensure everything is covered and stays covered as the code is developed. Crashing libraries or applications can be a denial of service. Sure, it's lower impact than an RCE due to a buffer overflow, but it is still a security issue.
This isn't even the problem. There are shitloads of open bugs found by fuzzing, even in the linux kernel, that nobody is fixing. Serious security vulns in the kernel regress because of lack of tests.
It is clear that even if we stood up the best bug finding systems the world has ever seen that critical software will still be a disaster.
This is the sort of thing where I am very curious to hear what happens if you fine-tune an uncensored version of GPT4 on some memory exploits like this along with the vulnerable code and ask it for more.
It seems like it ought to be really good at this and I am suspicious that people in the know are afraid to talk about it publicly because it's too good at it and once people start weaponizing LLMs for this purpose we just won't be able to use memory unsafe code anymore.
LLMs should indeed be very good at finding vulns. It is a safe bet state actors have already done this, and have added to their stockpile of zero-day exploits. I suspect you are right that collectively we are whistling past the graveyard and the current "solution" is to make widely available LLMs explicitly not provide that kind of capability.
My experience is they're not good at this for most vulnerability classes, especially those that are tough to discover by classical methods. Have you had any experience using them for this?
Trivial vulnerabilities are easily discoverable yes -- but, they are also trivially discoverable by standard automation available today. I've found GPT-4 to be shockingly bad at vulnerability analysis for all except the most popular vulnerability classes. My speculation is that there just isn't enough literature on these vulnerability classes for it to have practical mastery of them.
Complex vulnerabilities are the emergent phenomena of multiple events across a codebase and its dependencies, involving control flow and data flow, while missing type information and other runtime data. Even Anthropic's 100K context windows won't nearly fit it all, and if you stuff all the code into embeddings, the ability to reason across all this space will be poor.
You can train a model to ask very pointed questions about particular snippets, but wholesale LLM-based analysis to find vulnerabilities seems like it'll be extremely slow, expensive and inaccurate.
I am surprised that you can even say "it's not very good." The sorts of prompts I'm imagining, I would expect to trigger the "I'm afraid I can't do that, Dave" safeguards every time. I guess, yeah, I'm just imagining using it as an advanced fuzzer: take the fuzzer code, ask the LLM to generate slight variations, feed those into a test, and ask the LLM to flag the ones that look like they might be an exploit. When it generates nonsense code, you just throw out those runs. But on the other hand, hallucination feels like an advantage here, since it's going to try things you never would have thought to test.
I don't have any problem getting it to help with exploit development. I never had any issue with that with any of them, in fact, which is surprising in retrospect.
Inference is so slow, and almost everything about fuzzers is meant to be super fast. Maybe there's a late-stage part in crash validation/analysis where you can use it, but my bias is that we're just not there yet.
I would not expect an LLM to be good at this without specialized training. I have tried prompting for code generation.
I do not know of LLMs that have been specifically trained on, say, the testing corpus of some "lint" programs and against known vulns. As you point out, it wouldn't be possible as a user of an LLM AI to do the equivalent by showing it some vulns, while it would be perfectly reasonable to get an LLM to write, for example, business case studies by showing it examples.
I don't think specialized training solves any of the problems I mentioned. It doesn't increase the window size, or provide any of the types of highly specialized and optimized multi-file, multi-technique analysis, or make it any cheaper.
Humans find security vulnerabilities all the time too. The assumption is a lot of the misses are because highly trained security experts don't have time to carefully read and understand all code without fatigue.
> To put this in context: if this bug does affect Android, then it could potentially be turned into a remote exploit for apps like Signal and WhatsApp. I'd expect it to be fixed in the October bulletin.
Interesting quote from Ben Hawkes (former Project Zero manager) in the article. I regularly compile Signal-Android from source and happened to notice they vendored libwebp a few days ago:
Android is particularly troublesome here with the number of phones out there receiving no updates, and just a single download away from being exploited.
For me, personally, it's a race to see if Google can get this patched for my Pixel 5 before security updates stop in October.
> A phone released in October 2020 is about to stop getting security updates?!
I actually looked into it, my previous phone was released in June 2020.
The last security update for the phone came out in January 2022.
I'm not sure whether they're just updating things infrequently or whether that model is abandoned altogether, but neither would speak highly of the Android support landscape for non-flagship phones.
This ends up being kind of silly: if I want to have a mostly secure backup device (for 2FA and my bank/eID apps), then I probably need two comparatively recent devices, since a 5 year old budget phone would no longer quite do as a temporary daily driver, in case I'd lose my main phone.
I can say that their rugged designs served me well, but some of the newer phones apparently have this weird component for wireless earphones and I don't really want that, so my current and future purchases are from different manufacturers. Was actually pretty close to stock Android, though!
It stops getting guaranteed security updates, whatever that means. If this is really so serious that it enables drive-by attacks on the web, it'd be negligent not to port the fix to older OSes.
My phone came out in 2017. I'm still on Android 8. There's an update to Android 9.1 available somewhere, but AT&T never got around to porting it to their crapware-riddled fork.
The problem is that they'll use the opportunity to schlep a bunch of non-security related stuff into the update as well. That's the thing that really bothers me about these, that you can't say just the security patches and hold the telemetry/marketing/spyware/adware/crapware/malware/etc.
The point about Android is particularly important. I wouldn't like to estimate the proportion of Android phones that are in regular use that no longer receive security updates.
Android phones don't have an awful lot of attack surface for typical users, though. Messenger apps will already refuse to display arbitrary images; WhatsApp, for example, will only display JPEGs and MP4s sent from other contacts.
True, and most websites will reencode images for compression anyway. But the point still stands: this makes "hacked by clicking a link" a reality again for some people, which should have ended along with Flash and Java applets. The current mainstream assumption is that "you won't get hacked if you don't install anything" (not to say that this is the best security practice..). Since you don't install images, most people will just assume they are safe. I hope this whole ordeal doesn't hurt JPEG-XL adoption.
The reference implementation is C++, and it’s nearly guaranteed to have equally worrisome bugs in it — every image library has seen those over the years.
We live in 2023. We can deal with slightly worse compression until someone rewrites it in a sane language.
Yeah, I don't mess with webp, VP8, etc. I see near zero benefit to myself from that, with significant downsides.
Like, Google Meet insists on using VP8/9. Why, cause it's "free?" The strain on my laptop and extra energy usage for it to CPU-en/decode video ain't free. Zoom just uses h.264 instead of being annoying about it.
Especially when there are various neural net compression methods on the horizon that look like they can probably reduce file sizes 10x or more for similar user enjoyment.
It's not just compression ratios that new formats provide though: it's also higher bit depths (meaning less banding for things like gradients, i.e. in skies at sunset) and HDR support (values over 1.0 in 'linear' space, rather than being clamped).
Smart pointers aren't the beginning and end of things. C++ is riddled with places that can be converted into weird machines, even if you guarantee correct lifetimes of everything on the heap.
Look at how much fun we can just have with the stack!
    std::string_view foo(std::string_view s) {
        return s;
    }

    auto s = foo(std::string("temporary")); // kaboom: s now dangles, because the
                                            // temporary std::string it views is
                                            // destroyed at the end of this statement
Modern C++ does not force you to initialize everything before it is read. Modern C++ happily lets you ignore bounds checks with vector operator[] or by using c style arrays or by doing pointer arithmetic. Modern C++ happily lets you overflow integers or silently truncate when widths change. And on and on and on. Turning every "new" into "make_shared" is nowhere close to enough to make C++ safe in the face of bugs.
Agreed. Security shouldn't depend on not visiting certain web sites, because if nothing else the ad networks are able to get your browser to load all sorts of content from all sorts of places that you wouldn't deliberately do.
Clicking a link in a web browser means that any image decoding happens in the browser renderer sandbox.
That sandbox is pretty robust - the difficulty of finding an exploit somewhere in a browser renderer is much lower than the difficulty of finding a way out of the sandbox the renderer runs in.
Yes, not exactly "clicking a link" but still at the same level ("add to contacts"). Also many apps have users with profile pictures. An attacker can send a link that will open in an app (eg. Instagram, facebook). Also within browsers could this still leak whatever is currently rendering on the page?
Is there any site that shows what phones are still getting security updates? I'm worried that this will be the thing that makes me retire my son's old Moto G... 5 I think? It's probably out of security updates. Which kills me, he's a careful boy and it's a solid phone, this is unnecessary E-Waste.
I’m surprised, given the history of exploits, Google didn’t decide to start shipping the image decoders as an APEX system component in the Play Store, with the built-in ones serving as a fallback. They just might now.
Having spelunked through some Google announcements on this, it looks like they planned to do so back when Android 11 was in development (https://android-developers.googleblog.com/2020/02/Android-11... ), but I can’t find any evidence that image decoders are included in the current Android Mainline system updates that ship via Google Play - video codecs are, but not image ones apparently.
It looks as if they planned to include image decoders, but that was dropped sometime during the Android 11 development cycle, unless I’ve missed something (which is certainly possible).
I'm always suspicious of Google's behavior. It might be a good opportunity for them to tell everyone they need the latest version of Chrome because security. Manifest 3.0 and cohorts/FLoC, you can't escape them.
I'm tired and cranky today so this will lack subtlety, but:
You don't have to use Rust but you **can't** use C.
There's no reason to be finding these bugs in 2023, period. We can do better and we know how to do better; there's just no reason, apart from legacy code (and even then), that you should be using memory unsafe languages in production.
Responses like this are why people still use C when it may make more sense to use a different language. Don’t just tell people, “trust me bro, never use C, there’s never a good reason”. Never make broad claims that leave no wiggle room, because the one thing that is always true is that there will be exceptional cases. Instead of saying this kind of stuff, why don’t you elucidate what circumstances exist where using C makes sense? Microcontrollers, FPGAs, glue code that can expose your API to other programming languages, maintaining an old code base already written in C, etc.
And as an aside, this is also why people don’t take security seriously. Not every bounds overflow will result in my database being breached and/or my computer being pwned. Instead of freaking out every time a bounds overflow or some sort of memory error is found, it may be useful to actually find out what the impact is. Is that code sandboxed? Is it local, or does it have access to the web? Does it request elevated permissions at any point? Does it even need elevated permissions?
There are many factors that can reduce or increase the risk factor of a bug. Security researchers need to start actually explaining the risk instead of blindly proclaiming everything as a “critical” vulnerability.
And everything I’m saying here has no bearing on this specific vulnerability. It’s in response to the general claim that every memory bug is critical.
For an image decoding library, you'll still need to expose a C API if you want anything to be able to use your library. You could only provide a library for your preferred language(s) but if like Google you're trying to push an all-new format the ecosystem buy-in is a big deal. And obviously you can provide a C-compatible library without writing the library in C, but it's more complexity to handle.
(Edit: I should mention that this is one reason why I really like wuffs, mentioned elsewhere in this thread; it's a safer language that compiles down to C, so you can have mostly the best of both worlds)
I don't think the answer is to do fine-grained bounds checking absolutely everywhere, nor do I think it'll happen anyway. Even "unsafe" code has a coarse type of bounds-checking, the kernel's virtual memory layer, which adds less overhead. C etc is fine if it's not in the same memory space as the safe code.
This is pretty bad; there will be many vulnerable applications, and someone just posted how to exploit it. I just reproduced it on a VM:
SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/<name>/webp_test/examples/dwebp+0xb24e9) in BuildHuffmanTable
Shadow bytes around the buggy address:
...
Also, here's the webp it generated as base64, but chrome doesn't crash or anything. Other apps may handle it differently, but many just say invalid format.
If I recall correctly, WhatsApp supports webp for their sticker packs. And those are open to anyone to develop and integrate into WhatsApp on-device with an intent.
Remember when the primary purpose of flatpak's shared runtimes was so that libraries like libwebp are not actually bundled per-application? libwebp in the runtimes has been patched with a fix for over a week.
A memory-safe language could have easily resulted in the same vulnerability.
In this context, “memory-safe” means it does bounds checking on an array when you try to access an element. But the webp code does bounds checks up-front so that array accesses can be non-checked, to help performance. (If they didn’t want this performance, they could have easily used a std::vector and used bounds-checked access.) The vulnerability happened because this up-front bounds checking logic was incorrect.
If we were to hypothesize a counterfactual world where webp was written in rust, presumably the devs would have wanted a similar optimization where they did the bounds checking up-front still, and they would have put the actual array access in an unsafe block to have the same perf optimization. The same bug would have thus happened.
The lesson here is that bounds checking on every element access is probably a worthwhile overhead, no matter what the language is. C++ has safe ways to do this (well, safe-ish… safe for the purposes of this bug at least) with std::vector, and maybe they should just switch to that and eat the performance overhead.
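To make the counterfactual concrete, here is a sketch of the two patterns being compared (illustrative names only, not the real decoder):

    // Pattern 1: default Rust, every access bounds-checked.
    fn decode_checked(table: &[u16], code: usize) -> u16 {
        table[code] // panics if code >= table.len()
    }

    // Pattern 2: the hypothesized port that keeps webp's "check up front,
    // access unchecked" optimization. If the up-front sizing logic is wrong,
    // this has the same bug as the C code.
    fn decode_unchecked(table: &[u16], code: usize) -> u16 {
        debug_assert!(code < table.len());
        // SAFETY: relies on non-local reasoning that the table was sized so
        // every reachable `code` is in range, exactly the invariant that broke.
        unsafe { *table.get_unchecked(code) }
    }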
This is a big presumption. Yes, it could happen. In practice, doing this isn't even the first tool you'd reach for in this circumstance; the compiler can and will eliminate duplicate bounds checks, so if you've hoisted the check early, you shouldn't be using unchecked accesses, even if you care about performance, until you've demonstrated that the compiler isn't already removing them. The extra ceremony ("unsafe { foo.get_unchecked(n) }" vs "foo[n]") makes this even simpler to catch in code review.
> The extra ceremony ("unsafe { foo.get_unchecked(n) }" vs "foo[n]") makes this even simpler to catch in code review
Right, and in said code review, the webp author could have easily said “yup, we want unsafe here because we already checked up front that the buffer shall not exceed k elements”. Sure it’s easier to see that unchecked access is happening, but when the whole point of large sections of the huffman table code in webp is to make this very thing work, it wouldn’t cause any additional scrutiny in a code review. In other words, it would already be super clear to the reviewer that bounds checking is being disabled for performance sake, seeing the word `unsafe` as ceremony isn’t really adding any information here.
It’s possible webp could have been implemented in rust with a naive approach to early bounds checking and relying on the optimizer to elide it, but having looked at the code, with all the buffer sizes they’re passing around, it looks unlikely that equivalent rust code would have been able to auto-optimize it. I don’t think it’s unlikely at all that, in this parallel universe, they would have found the bounds checking to be a decent enough overhead that they would have reached for unchecked access, and would have passed code review the same way the current code did.
For sure, this absolutely could (and will, at some point) happen. But security isn't only about what can happen, but what will probably happen. Each of these things introduces a possibility to catch the bug, not an impossibility that a bug would happen.
> it wouldn’t cause any additional scrutiny in a code review.
I would hope that it would at least cause a "demonstrate that this is actually necessary" review. People do do this! And sometimes you do have to use unchecked access. A famous example of this is literally Huffman coding, by Dropbox, back in 2016. They ended up providing both bounds-checked and unchecked options as a flag, because they actually did measure and demonstrate that the checks were causing performance drops. I am curious whether they'd still see the same results today, given that a lot of time has passed, and also there were some other factors that would cause me to wonder a bit, but I haven't followed up on them myself. Regardless, this scenario will end up happening for good reasons at some point; the goal is to get it to happen only when it needs to, and not before.
That was for Dropbox's specific use case, and 11% seems like a lot in any case.
For example, what improvements has rustc gained in the meantime to better optimize bounds checking? Is the overhead still the same, or is it less? Does it actually matter for the acceptance criteria, for the definition of done on the story?
You’re arguing about a potential vulnerability in a hypothetical code that exists only in your imagination.
Meanwhile the real code that exists doesn’t work like that. It’s not just a matter of personal style. Rust has constructs that help avoid bounds checks. It has design patterns that move unsafe code into smaller, easier to verify helpers. It has a culture of caring about safety – people don’t rewrite projects in Rust to just give up on its biggest selling point.
We know that the webp project does use unchecked access. We know this is intentional because they go through a lot of effort to do up front bounds checking rather than at every access. We also know that companies like Dropbox, when implementing their own Huffman table implementation, demonstrated an 11% speed up when disabling bounds checks: https://dropbox.tech/infrastructure/lossless-compression-wit...
It is not a stretch whatsoever to assume that a direct rewrite of the webp library into rust would have uncovered the same perf findings that Dropbox did, and decided that the pattern of “up front bounds checks, disabled bounds checks at element access” is a reasonable perf enhancement, especially for a Huffman table, where you’re continually accessing the compression table over and over when expanding it.
Edit: the more I think about it, you’re probably right. I had mistakenly thought I was reading C++ code when I was looking at the original patch to the vulnerability. The fact that it’s actually written in C, erodes my argument quite a bit… my logic in my head went something like, “if they wanted bounds checking they would have used std::vector”, but now I realize that’s impossible since it’s not C++. I’ll concede that there’s no real reason to assume that they would have skipped bounds checking if written in a safe language.
While yes, that's theoretically possible, do you have data to establish this? For example, having small sections of the code be marked as unsafe would allow for greater scrutiny of those sections. Also, unsafe access is more annoying to perform in Rust than in C or C++, so maybe that would have acted as a deterrent (or at least the code would have been profiled to make sure that unsafe access was worth it).
It looks like Dropbox experimented with disabling bounds checks in their Huffman coding impl, and found that using the unsafe pattern increased throughput from 224 MB/s to 249 MB/s (11%-ish faster). We don't even need to hypothesize about whether webp would have eliminated bounds checking; we can see that other companies arrived at the same conclusion: disabling it can be worth it if you're quite sure you've gotten the up-front checking right. We can imagine that if Dropbox went to prod with the unchecked Huffman implementation (never mind that that article isn't about webp in particular), they could easily have had the same bug. And I don't think a naive code review saying "unsafe is bad" would have stopped them from doing it: they clearly did the work to show why it's worth it.
The risk probably doesn't matter for their use case because they're doing all this datacenter-scale image conversion on processes separate from the main logic (and likely not even on the same machines). Unlike in a phone's web browser or something, where it's 11% speedup with ??% added risk.
It's already common for PC apps to split potentially unsafe rendering into subprocesses, like in Chrome. If you don't want to pay the full IPC toll, there's shared memory. In theory should be about the same speed as inlined unsafe code, right? What if Rust's "unsafe" blocks could do this for you?
It isn't, but there's a big difference between writing straight C and writing some unsafe Rust with safe wrappers. It's generally possible to have comparably few lines of unsafe Rust that do the heavy lifting and can be verified to maintain memory safety without sacrificing performance.
Well, the code you mark unsafe is probably also the most complicated piece and thus the most likely to have a bug. I can trust decent C devs to write their basic logic safely, just not the hyper-optimized portions.
A big blob of complex unsafe code is the opposite of how Rust devs approach unsafe optimizations.
Rust has a pattern of isolating unsafety into small components behind a safe interface, so that the component can be understood and tested in isolation. For example, if you need some adventurous pointer arithmetic, you write an Iterator for it, rather than do it in the middle of a complex algorithm. This way the complicated logic can be in safe code.
It's sort of like Lego, where you build from safe higher-level blocks, but you can design custom blocks if you need.
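A toy example of that Lego-block pattern (not from any real codebase): the raw pointer arithmetic is confined to one small, testable component with a safe interface, and the algorithm that uses it stays in safe code.

    use std::marker::PhantomData;

    // The unsafe pointer stepping lives here and only here.
    struct StrideIter<'a> {
        ptr: *const u8,
        remaining: usize,
        stride: usize,
        _borrow: PhantomData<&'a [u8]>,
    }

    impl<'a> StrideIter<'a> {
        fn new(data: &'a [u8], stride: usize) -> Self {
            assert!(stride > 0);
            StrideIter {
                ptr: data.as_ptr(),
                remaining: data.len() / stride,
                stride,
                _borrow: PhantomData,
            }
        }
    }

    impl<'a> Iterator for StrideIter<'a> {
        type Item = u8;
        fn next(&mut self) -> Option<u8> {
            if self.remaining == 0 {
                return None;
            }
            // SAFETY: `new` guaranteed at least `remaining * stride` readable
            // bytes from `ptr`, and the borrow keeps the buffer alive.
            let byte = unsafe { *self.ptr };
            self.ptr = unsafe { self.ptr.add(self.stride) };
            self.remaining -= 1;
            Some(byte)
        }
    }

    // Callers never touch a raw pointer:
    fn sum_every_other(data: &[u8]) -> u32 {
        StrideIter::new(data, 2).map(u32::from).sum()
    }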
Likewise, C code will be organized into separate pieces with a few small super-optimized/complicated parts, and they can fuzz-test the complicated pieces that are most likely to have buffer overflows.
I can't find the actual code causing the libwebp vulnerability, so idk if mixed safe/unsafe Rust code would've been any better here. Maybe what we really need is an "unsafe-jail" block in Rust that uses a child process limited to a piece of shared mem, and you put big pieces in there to avoid overhead. Like, libwebp can screw up all it wants, just don't touch the rest of my app.
The difference is that in C you can't make a "safe" interface, which the compiler will enforce is used properly.
C's type system is not nearly expressive enough for this. You can barely declare a pointer non-null with extensions, but you can't express ownership. You can't force callers of your function to check for error before using the returned value. You can't force correct lifecycle of objects (e.g. Rust can have methods that can be called once, and no more, and then statically forbid further uses of the object. Great for cleanup without double-free.)
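A minimal sketch of that call-once idea (illustration only, not any particular API):

    // A handle whose cleanup method takes `self` by value: after `close`,
    // the compiler statically rejects any further use, so a double-free
    // style bug is a compile error rather than a runtime vulnerability.
    struct Connection {
        name: String,
    }

    impl Connection {
        fn close(self) {
            // `self` is consumed here and dropped when the function returns.
            println!("closing {}", self.name);
        }
    }

    fn main() {
        let conn = Connection { name: "db".to_string() };
        conn.close();
        // conn.close();              // compile error: use of moved value `conn`
        // println!("{}", conn.name); // compile error for the same reason
    }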
C doesn't have the ability to turn off Undefined Behavior for a section of code. You can't just forbid dangling pointers.
For a foolproof API the best you can do is use handles and opaque objects, and a ton of run-time defenses. But in Rust that is unnecessary. Use of the API can be guaranteed safe, and a lot of that is guaranteed at compile time, with zero run-time overhead.
For example, a mutable slice in Rust (a buffer) has a guarantee that the pointer is not null, is a valid allocation for the length of the slice, is aligned, points to initialized memory, is not aliased with any other pointer, and will not be freed for as long as you use it. And the compiler enforces that the safe Rust code can't break these guarantees, even if it's awful code written by the most incompetent amateur while drunk. And at run time this is still just a pointer and a length.
In C you don't get this compartmentalization and low- or zero-overhead layering. Instead of unsafe + actually enforced safe, you have unsafe + more unsafe.
Right, but I can trust a decent C developer to use it safely in the simple parts, especially with tooling like valgrind to detect obvious bugs. The only part where I'd say the usual "nobody is perfect" is in the hard parts.
There's 40 years history of trying, and it doesn't work.
These decent C programmers are like True Scotsmen. When top software companies keep getting pwned, even in their most security-sensitive projects, it's because they hire crap programmers.
Even basic, boring C can be exploitable. Android was hit by an integer overflow in `malloc(items * size)` (stagefright). Mozilla's NSS had a vulnerability due to a wrong buffer size, which fuzzing did not catch (BigSig).
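For concreteness, here is the `items * size` overflow class sketched in Rust (hypothetical numbers and function name, not the actual stagefright code):

    // In C the multiplication can silently wrap, so malloc returns a tiny
    // allocation and later writes run off the end of it. checked_mul turns
    // that wrap into an error instead.
    fn allocate_table(items: usize, size: usize) -> Option<Vec<u8>> {
        let bytes = items.checked_mul(size)?; // None on overflow
        Some(vec![0u8; bytes])
    }

    fn main() {
        assert!(allocate_table(1 << 20, 16).is_some());   // 16 MiB, fine
        assert!(allocate_table(usize::MAX, 2).is_none()); // would wrap in C
    }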
The only overhead I see in theory is the additional process's kernel resources, which are negligible. Wherever you'd normally write to memory for some unsafe code to mutate, you instead write to the shared memory. Kernel is doing the same virtual mem protection either way, only in this case it opens up access to the child process too.
Am I missing something else, like shared mem being slower? Maybe the inability to share CPU caches?
The fact that you have a whole other process, was my line of thinking. If the scheduling doesn't play nice then latency will suffer. I don't really know in practice though.
Latency is an issue with file or network-based IPC, but shared memory is supposed to be the same speed as non-shared. Apparently the X window system relies on it a lot.
Scheduling, I dunno. Would imagine it's not bad as long as you don't spawn a ton of these.
The solution, if you can consider there to be one, was always languages with semantics to guarantee logical correctness. But we're never going to get that because it requires a pace of software development that's incompatible with the money people's fetish for churn and burn. So let's put all our eggs into the Rust basket and while we're at it trust the hardware to never lead us astray :)
Buffer overflows are bugs. You're asking how bugs can still be happening in this day and age? The answer is inadequate testing and broken engineering processes.
The question is how buffer overflows can still be happening, not bugs in general. No matter how bad my eng practices are, I cannot create a buffer overflow bug in a memsafe language.
Buffer overflows happen because a bounds check is missing. Using a language with implicit bounds checking (memsafe?) does _not_ excuse you from explicit bounds checking. Depending upon the language an implicit bounds check might "recover" by intentionally crashing or raising an exception. Either approach could result in lost or corrupt data with the latter being arguably worse as the program continues running in a potentially broken state. Bounds checking is error checking and non-trivial applications need robust error checking and recovery regardless of the language they are written in.
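A small sketch of that distinction (illustrative names only): the implicit check turns a bad index into a crash, while an explicit check turns it into an error the caller can actually handle.

    fn pixel_implicit(row: &[u8], x: usize) -> u8 {
        row[x] // implicit bounds check: panics on bad input, mid-operation
    }

    fn pixel_explicit(row: &[u8], x: usize) -> Result<u8, String> {
        // Explicit bounds check treated as ordinary error handling.
        row.get(x)
            .copied()
            .ok_or_else(|| format!("index {} outside row of length {}", x, row.len()))
    }

    fn main() {
        let row = [10u8, 20, 30];
        assert_eq!(pixel_explicit(&row, 1), Ok(20));
        assert!(pixel_explicit(&row, 9).is_err()); // handled, no crash
        let _ = pixel_implicit(&row, 1); // fine here; index 9 would panic
    }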
Exception gets caught somewhere up the call stack, like a webserver returning error 500. It's not the most user-friendly error handling, so you should avoid that whenever you can, but it's not a big problem. In some languages like C++, you'll trigger a crash instead, which can be more of an issue but still isn't nearly as bad as an undetected buffer overflow.
Additional timeline info, as I was curious myself. WebP is old enough that a memory safe language was not a feasible option when the project started.
Android 12 was the first version to support Rust code, and came out in 2021 [0, link talks about the first year of integration].
On the iOS side (which also was affected by this), Swift 1.0 came out in ~2014.
As far as I can tell, Chrome doesn't yet support a memory safe language, but it does have a bunch of other safety measures built in (see MiraclePtr, sandboxing, etc.). Since both WebP and Chrome are from Google, that would have held back a possible transition.
WebP was announced in 2010, and had its first stable release in 2018 [1].
> Chrome doesn't yet support a memory safe language
In addition to the safety features you mentioned, Chrome supports Wuffs, a memory safe programming language that supports runtime correctness checks, designed for writing parsers for untrusted files. I don’t think it existed at the start of the webp project either, but that’s what I would expect the webp parser to be written in, over Rust or a garbage collected language.
Is Rust in practice a memory safe language when you're doing tricks like Huffman-decoding Huffman tables into buffers? It seems like once you optimize for performance this much, you're liable to turn off bounds checking here or there.
I agree it's definitely possible, and I'm certain we'll see a vuln due to some crazy Rust optimizations in the future.
That said, not switching over to a memory safe language is, in my opinion, letting perfect be the enemy of good. Folks will still be able to write footguns, but better language choices will prevent bugs in all the non-crazy-optimized parts.
There are already Huffman decoding routines and some parts of the webp algorithms in https://github.com/google/wuffs (a language that finds missing bounds checks at compile time). On the contrary, according to the README, the language allows writing more optimized code than C. WebP decoding is stated as a mid-term target on the roadmap.
The point of `unsafe` in Rust is that you can generally limit `unsafe` code to a few short functions and encapsulate them in a safe interface. This means less code to review and maintaining strong guarantees in the rest of the safe Rust code.
Incorrect. Semantically, Rust always inserts runtime bounds-checks, regardless of optimization level. The compiler _may_ then remove them as part of an optimization pass, iff it can prove the check is redundant.
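For example (a sketch; whether the checks are actually removed is up to the optimizer, not a language guarantee):

    // Both functions use ordinary checked indexing. The second does one
    // up-front slice (a single check) and then iterates, which gives the
    // optimizer an easy proof that no per-iteration check is needed.
    pub fn sum_first_n(data: &[u32], n: usize) -> u64 {
        let mut total = 0u64;
        for i in 0..n {
            total += data[i] as u64; // each access checked against data.len()
        }
        total
    }

    pub fn sum_first_n_hoisted(data: &[u32], n: usize) -> u64 {
        let head = &data[..n]; // one check up front: panics if n > data.len()
        head.iter().map(|&x| x as u64).sum()
    }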
Do bounds checks even matter, even for very tight loops? This article [1] seems to suggest that removing the checks can give up to 10-15% speed increase under some circumstances. Worth it? I'd say no.
Ada has the bounds checking, but doesn't (AFAIK) have a safe way to deallocate dynamic memory. At least as of now, there is a proposal to add something like Rust in the future.
It also has memory unsafe concepts like specifying an address for a variable.
No. That is true for the list upthread (well, if you consider runtimes measuring a couple of MB "large", it's reasonable but quite arguable). It doesn't have much correlation with any feature.
This is a needlessly aggressive take IMO, and hailing Rust as the be-all and end-all for secure software lacks nuance, giving the language and its adherents a bad reputation.
Downvotes of course. Programmers like to think they can write secure code with no guardrails. The fact that CVE lists are full of memory errors in 2023 says otherwise.
Even if you can write secure code in unsafe languages, it’s still a problem. Bugs creep in over time as more people work on a project, changes are merged, it’s ported to other platforms, etc.
If your project has any longevity you can’t bet on your pristine C code staying pristine.
We simply don't appreciate other people imposing their will on us. We should be able to start a C project in 2023 without shame if we want to. Simply because we like C and don't like Rust. There's literally no need to justify it any further. We are free to do what we want.
If in doubt, remember: the software is distributed in the hopes it will be useful but with ABSOLUTELY NO WARRANTY OF ANY KIND.
There's a substantial difference between a hobbyist writing a random project in C, and Google writing an image processing implementation, and then including it in Chrome and Android.
Freedom of speech is not freedom from consequences. You can start your project in C and release it on an AS IS basis, and we can make fun of you for it, object to its use in professional contexts, and so on.
Don't complain about the downvotes or the well-deserved replies then.
> object to its use in professional contexts
Liability limitation is standard legal boilerplate in literally every free software and open source license. No doubt it's also present in the terms of service documents which nobody reads before accepting. If you're gonna object to using software based on that, you might find you'll have to rewrite all of it or pay top dollar for service level agreements.
> And how is the limitation of LEGAL liability relevant
It sure as hell matters when a guy comes here posting about how C programmers need to be sued for negligence. Can you imagine some asshole suing you because you started a project on GitHub but didn't use one of the approved languages?
No worries, in a couple of years you will be able to get government clearance for C, in a similar way to how companies have to when dealing with hazardous substances.
I've been in tech for ages and I hate fads, and consider myself decent at spotting them. I was skeptical when I first started seeing Rust stuff but then I tried the language.
Definitely not a fad. It's a real innovation. Not the best language ever for language nerds / purists, but it brings a lot of important innovations down into the realm of the practical stuff you can easily use for real projects.
The only way it's going away is if something dramatically superior appears. Language design is very very hard and most "advanced" languages are impractical for real world use for various reasons, so I am not holding my breath.
The other newish language I like is Go, but for different use cases. Has a different design philosophy and imposes less cognitive load, but it's also designed for a different niche.
Observation: Uncompressed bitmaps, while bloated in terms of necessary bandwidth, still are provably the most secure form of bitmap -- just as uncompressed video (again, while super-bloaty and bandwidth intensive) would be...
That is, to abstract, our security issue exists because:
A) There is complex compression/decompression software/code;
B) To implement this compression/decompression -- there are one or more lookup tables in effect;
C) The software implementing those lookup tables and the decompression side of things was never properly fuzzed, bounds-checked, and/or mathematically proven not to create out-of-bounds errors; that is, proven that every potential use of the lookup table is correct given any possibility, any combination of data in an input stream.
Also -- Didn't stuff like this already happen in GIF and PNG formats? Weren't there past security vulnerabilities for those formats?
Isn't this just (I dare to say) -- computer history repeating itself ?
Point: Software Engineering Discipline:
If you as a Software Engineer implement a decompressor for whatever format (or hell, more broadly and generically something that uses lookup tables on incoming streams of data) -- then the "burden of proof" is on you, to prove (one way or another) that it does not have vulnerabilities.
Fuzzing can help, mathematics and inductive/deductive logic can help, testing can help, and paranoid coding (always bounds-checking array accesses, etc.) can help, running in virtual machines and other types of limited environments and sandboxes can help. Running as interpreted code (albeit slower) could help. Deferring decompression to other local network attached resources running in limited execution environments could help.
In short... a monumental challenge... summed up as:
"Prove that all code which uses lookup tables can not generate hidden/unwanted states."
Uncompressed bitmaps with absolutely NO extraneous unnecessary "handling code" (which would have existed in the case of the Windows BMP display code, if a security flaw was found in them...).
Also... many posters suggest use of Rust may be a solution...
It may turn out to be... but I think a more broader generalization of the language /compiler aspect of things may be:
Not so much to "use Rust", so much as "NOT to use C".
That is -- any library performing decompression of any sort (regardless of whether that decompression is related to visual images or not), if it uses C and lookup tables and implements decompression -- should at least be considered a potential source of future problems, and should (ideally) be migrated to a more memory-safe, bounds-checked language -- in the future.
Rust -- may or may not turn out to be this language...
Negatives for Rust -- large size and complexity of Rust's compiler source code.
Positives for Rust -- Rust's treatment of memory and bounds-checking.
Anyway, some thoughts on the language/compiler aspect of this...
Yes, but as a counterargument: there are battle-tested compression formats that offer pretty-damned-good compression. If anything, this raises questions about the risk-vs-reward calculus of embracing new compression standards. When was the last time there was a serious vulnerability in the major JPEG libs? Or even in h.264, which is much newer? Yes, a .webp is like 2/3 the size of a similar JPEG, but for most people are JPEG image sizes a huge cause for concern? Once you're doing lossy compression you're already sacrificing fidelity for file size, so you don't see a lot of machine-crushingly-huge JPEGs.
https://github.com/webmproject/libwebp/commit/902bc919033134...