That's completely crazy: the backdoor is introduced through a very cryptic addition to the configure script. Just looking at the diff, it doesn't look malicious at all; it looks like build-script gibberish.
This is my main takeaway from this. We must stop using upstream configure and other "binary" scripts. Delete them all and run "autoreconf -fi" to recreate them. (Debian already does something like this, I think.)
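For concreteness, a rough sketch of that workflow, assuming autoconf/automake (and libtool, if the project uses it) are installed; the exact file list varies per project:

    rm -f configure aclocal.m4 Makefile.in   # throw away the shipped, generated files
    autoreconf -fi                           # regenerate them from configure.ac / Makefile.am
    ./configure && make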
> We must stop using upstream configure and other "binary" scripts. Delete them all and run "autoreconf -fi" to recreate them.
I would go further than that: all files which are in a distributed tarball, but not on the corresponding git repository, should be treated as suspect.
Distributing these generated autotools files is a relic of times when it could not be expected that the target machine would have all the necessary development environment pieces. Nowadays, we should be able to assume that whoever wants to compile the code can also run autoconf/automake/etc to generate the build scripts from their sources.
And other than the autotools output, and perhaps a couple of other tarball build artifacts (like cargo simplifying the Cargo.toml file), there should be no difference between what is distributed and what is on the repository. I recall reading about some project to find the corresponding commit for all Rust crates and compare it with the published crate, though I can't find it right now; I don't know whether there's something similar being done for other ecosystems.
One small problem with this is that autoconf is not backwards-compatible. There are projects out there that need an older autoconf than distributions ship with.
The test code generated by older autoconf is not going to work correctly with newer GCC releases due to the deprecation of implicit int and implicit function declarations (see https://fedoraproject.org/wiki/Changes/PortingToModernC), so these projects already have to be updated to work with newer autoconf.
Typing `./configure` won't work, but something like `./configure CFLAGS="-Wno-error=implicit-function-declaration"` (or whatever flag) might work (IIRC it is possible to pass flags to the compiler invocations used for checking the existence of features) without needing to recreate it.
Also, chances are you can shove that flag into some old `configure.in` and have it work with an old autoconf for years before having to update it :-P.
Yes, it sucks to add yet another wrapper, but that's what you get for choosing non-backwards-compatible tools in the first place, in combination with projects that don't keep up to date on supporting later versions.
> Why do we distribute tarballs at all? A git hash should be all thats needed...
A git hash means nothing without the repository it came from, so you'd need to distribute both. A tarball is a self-contained artifact. If I store a tarball on a CD-ROM and look at it twenty years later, it will still have the same complete code; if I store a git hash on a CD-ROM, without storing a copy of the repository together with it, twenty years later there's a good chance that the repository is no longer available.
We could distribute the git hash together with a shallow copy of the repository (we don't actually need the history as long as the commit with its trees and blobs is there), but that's just reinventing a tarball with more steps.
(Setting aside that currently git hashes use SHA-1, which is not considered strong enough.)
Except it isn't reinventing the tarball, because the git hash forces verification that every single file in the repo matches what's in the release.
And git even has support for "compressed git repo in a file", "shallow git repo in a file", or even "diffs from the last release, compressed in a file". They're called "git bundles".
They're literally perfect for software distribution and archiving.
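As a rough sketch of how that could look (the branch, tag, and file names here are hypothetical):

    # Producer: package everything reachable from the release branch and tag into one file.
    git bundle create project-1.2.3.bundle main v1.2.3
    git bundle verify project-1.2.3.bundle      # checks the bundle is self-contained and intact

    # Consumer: clone straight from the file and confirm the commit hash
    # matches the one announced for the release.
    git clone project-1.2.3.bundle project-1.2.3
    git -C project-1.2.3 rev-parse --verify 'v1.2.3^{commit}'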
People don't know how to use git hashes, and it's not been "done". Whereas downloading tarballs and verifying hashes of the tarball has been "good enough" because the real thing it's been detecting is communication faults, not supply chain attacks.
People also like version numbers like 2.5.1 but that's not a hash, and you can only indirectly make it a hash.
> I would go further than that: all files which are in a distributed tarball, but not on the corresponding git repository, should be treated as suspect.
This, plus an automated A/B diff check of the tarball against the repo that flags any mismatch.
I don't think it would help much. I work on machine learning frameworks. A lot of them (and math libraries) rely on just-in-time compilation. None of us has the time or expertise to inspect JIT-ed assembly code. Not to mention that much of the code deliberately reads/writes out of bounds, which is not an issue if you always add some extra bytes at the end of each buffer, but which can make most memory sanitizer tools useless. When you run their unit tests, you run the JIT code, and then a lot of things could happen. Maybe we should ask all packaging systems to split their builds into two stages, compile and test, to ensure that test code cannot affect the binaries that are going to be published.
I would rather read and analyze the generated code than the code that generates it.
Maybe it's time to dramatically simplify autoconf?
How long do we need to (pretend to) keep compatibility with pre-ANSI C compilers, broken shells on exotic retro-unixes, and running scripts that check how many bits are in a byte?
Not just autoconf. Build systems in general are a bad abstraction, which leads to lots and lots of code to try to make them do what you want. It's a sad reality of the mismatch between a procedural task (compile files X, Y, and Z into binary A) and what we want (compile some random subset of files X, Y, and Z, doing an arbitrary number of other tasks first, into binary B).
For fun, you can read the responses to my musing that maybe build systems aren't needed: https://news.ycombinator.com/item?id=35474996 (People can't imagine programming without a build system - it's sad)
Autoconf is m4 macros and Bourne shell. Most mainstream programming languages have a packaging system that lets you invoke a shell script. This attack is a reminder to keep your shell scripts clean. Don't treat them as an afterthought.
I'm wondering: is there really no way to add an automated flagging system that A/B `diff`-checks the tarball contents against the repo's files and warns if there's a mismatch? This would be on, e.g., GitHub's end, so that there'd be this sort of automated integrity test and subsequent warning. Just a thought, since tainted tarballs like these might altogether be (and become) a threat vector, regardless of the repo.
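Something like that is also easy to prototype outside the forge; a minimal sketch, where the project name, tag, and URLs are placeholders:

    VERSION=1.2.3
    curl -LO https://example.org/releases/project-$VERSION.tar.gz
    tar xf project-$VERSION.tar.gz
    git clone --depth 1 --branch v$VERSION https://example.org/project.git project-git

    # Export a clean copy of the tagged commit and diff it against the tarball;
    # anything that exists only in the tarball (a patched configure, say) shows up here.
    git -C project-git archive --prefix=project-export/ HEAD | tar x
    diff -r project-export project-$VERSION

For autotools projects the diff will of course always show the generated configure and Makefile.in files, which is exactly the grey area this attack hid in.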
It looks like an earlier commit with a binary blob "test data" contained the bulk of the backdoor, then the configure script enabled it, and then later commits patched up valgrind errors caused by the backdoor. See the commit links in the "Compromised Repository" section.
Also, it seems like the same user who made these changes was still submitting changes to various repositories as of a few days ago. Maybe these projects need to temporarily stop accepting commits until further review is done?
The use of "eval" stands out, or at least it should stand out – but there are two more instances of it in the same script, which presumably are not used maliciously.
A while back there was a discussion[0] of an arbitrary code execution vulnerability in exiftool which was also the result of "eval".
Avoiding casual use of this overpowered footgun might make it easier to spot malicious backdoors. In almost all cases where people feel the need to reach for "eval" there is a better way to do it, unless the feature you're implementing really is "take a piece of arbitrary code from the user and execute it".
Unfortunately, eval in a shell script changes the semantics but is not necessary for some kind of parsing of a variable's contents to happen, unlike in Python or Perl or JavaScript. A bare
$goo
line (without quotes) will already do word splitting, though it won't do another layer of variable expansion and unquoting; for that you do need eval.
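A small illustration of the difference (the variable name is arbitrary):

    goo='printf "%s\n" "$HOME"'
    $goo          # word splitting only: the quote characters are passed literally and $HOME is not expanded
    eval "$goo"   # the string is re-parsed as shell code: quotes are honoured and $HOME is expanded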
You can be certain it has happened, many times. Now think of all the software we mindlessly consume via docker, language package managers, and the like.
Remember, there is no such thing as computer security. Make your decisions accordingly :)
A big part of the problem is all the tooling around git (like the default github UI) which hides diffs for binary files like these pseudo-"test" files. Makes them an ideal place to hide exploit data since comparatively few people would bother opening a hex editor manually.
How many people read autoconf scripts, though? I think those filters are a symptom of the larger problem that many popular C/C++ codebases have these gigantic build files which even experts try to avoid dealing with. I know why we have them, but it does seem like something which might be worth reconsidering now that the toolchain is considerably more stable than it was in the 80s and 90s.
The alternatives are _better_ but still not great. build.rs is much easier to read and audit, for example, but it’s definitely still the case that people probably skim past it. I know that the Rust community has been working on things like build sandboxing and I’d expect efforts to be a lot easier there than in a mess of m4/sh where everyone is afraid to break 4 decades of prior usage.
build.rs is easier to read, but it's the tip of the iceberg when it comes to auditing.
If I were to sneak in some underhanded code, I'd do it through either a dependency that is used by build.rs (not unlike what was done for xz) or a crate purporting to implement a very useful procedural macro...
I mean, autoconf is basically a set of template programs for sniffing out whether a system has X symbol available to the linker. Any replacement for it would end up morphing into it over time.
We have much better tools now and much simpler support matrices, though. When this stuff was created, you had more processor architectures, compilers, operating systems, etc. and they were all much worse in terms of features and compatibility. Any C codebase in the 90s was half #ifdef blocks with comments like “DGUX lies about supporting X” or “SCO implemented Y but without option Z so we use Q instead”.
Even in binary you can see patterns. I'm not saying it's perfect to show binary diffs (but it is better than showing nothing), but I know even my slow mammalian brain can spot obvious human-readable characters in various binary encoding formats. If I see a few in a row that don't make sense, why wouldn't I poke at it?
This particular file was described as an archive file with corrupted data somewhere in the middle. Assuming you wanted to scroll that far through a hexdump of it, there could be pretty much any data in there without being suspicious.
You're right - the two exploit files are lzma-compressed and then deliberately corrupted using `tr`, so a hex dump wouldn't show anything immediately suspicious to a reviewer.
Is this lzma compressed? Hard to tell because of the lack of formatting, but this looks like amd64 shellcode to me.
But that's not really important to the point - I'm not looking at a diff of every committed favicon.ico or ttf font or a binary test file to make sure it doesn't contain a shellcode.
Test data should not be on the same machine where the build is done. Test data (and tests generally) aren't as well audited, and therefore shouldn't be allowed to leak into the finished product.
Sure, you want to test stuff, but that can be done with a special "test build" in its own VM.
In this case the backdoor was hidden in a nesting doll of compressed data manipulated with head/tail and tr, even replacing byte ranges in between. It would've been impossible to find if you were just looking at the test fixtures.
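A toy illustration of the general trick (these are not the actual commands from the xz payload): a simple byte substitution is its own inverse, so the shipped file looks like a corrupt archive until the build script reverses it.

    printf 'payload\n' | xz -c > payload.xz
    tr 'az' 'za' < payload.xz > corrupt_testfile.xz   # swap the bytes 'a' and 'z'; the xz magic is now broken
    tr 'az' 'za' < corrupt_testfile.xz | xz -dc       # the same substitution restores it; prints "payload"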
> "Given the activity over several weeks, the committer is either directly
involved or there was some quite severe compromise of their
system. Unfortunately the latter looks like the less likely explanation, given
they communicated on various lists about the "fixes" mentioned above."