My favorite personal fuzzing story is from 1987 when a friend said his x86 graphics drawing program was solid, and I said OK and smashed both hands on the keyboard and it insta-crashed.
> The fix for this was fairly straightforward - I just made the library keep a record of the previously visited IFDs and bail out if it found a loop.
If you just want to detect loops, keep a “+1” pointer that steps through the data one entry at a time, and a “+2” pointer that advances two entries for every one the “+1” pointer advances. Either the “+2” pointer hits the end of the data, or it catches up to the “+1” pointer, in which case you have a loop (this is Floyd's tortoise-and-hare cycle detection).
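For concreteness, here's a minimal sketch of that two-pointer idea in C++, written against a hypothetical next_ifd() helper (the name and interface are made up for illustration; 0 stands for "end of chain"):

```cpp
#include <cstdint>

// Hypothetical accessor: returns the offset of the next IFD in the chain,
// or 0 when the chain ends. Stands in for whatever the real parser uses.
uint32_t next_ifd(uint32_t offset);

// Floyd's tortoise-and-hare: true if the IFD chain starting at `start`
// eventually loops back on itself.
bool ifd_chain_has_loop(uint32_t start) {
    uint32_t slow = start;               // the "+1" pointer
    uint32_t fast = start;               // the "+2" pointer
    while (fast != 0) {
        slow = next_ifd(slow);           // advance one step
        fast = next_ifd(fast);           // advance two steps...
        if (fast == 0) break;
        fast = next_ifd(fast);
        if (fast != 0 && fast == slow)   // fast caught up with slow
            return true;                 // -> the chain contains a cycle
    }
    return false;                        // fast reached the end: no loop
}
```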
In university we had to write a simple fuzzer which extracted options from man pages and ran the corresponding command with randomized but valid options. Didn't take long until we found the first bug in one of the tested commands.
A very funny thing about fuzzing is how random input testing used to be so looked down upon by the software testing community. Read old (1970s) testing books and you'll see comments like "random testing is the worst kind of testing". I still saw this even as recently as a decade ago.
Yes, and things like coverage-guided fuzzing have completely revolutionized things. Before coverage guidance, fuzzing was okay but largely unimpressive. Now it blazes through code structures that were previously used as motivating examples for symbolic execution. It is a meaningfully different technique today.
That's actually one of the few fields where I feel fuzzing has underperformed. There was an interesting paper at OOPSLA this year which found that while the fuzzing community has indeed found a lot of bugs, those bugs are almost never triggered by real code. It was a really interesting result coming from within a community that ordinarily leans towards overinflating the value of PL techniques.
That paper, if it's the one I'm thinking of, found that a compiler bug found by fuzzing was more likely to be hit by a user than a user-found bug was to be hit by another user. So if fuzzing-found bug reports are bad, user-found bug reports are even less useful.
Another thing to remember is that as blackbox fuzzing became state of the practice, its benefit declined, as the bugs it would find would be found early, by the developers themselves. All testing techniques are self-limiting this way.
I want you to look at the results of jsfunfuzz and tell me the impact of that wasn't profound.
That piece of advice was probably valid in the 1970s: computers were far too slow and far too expensive for any kind of random testing to make sense. Fuzzing became popular when multi-core CPUs became commonplace and RAM more affordable.
The original prejudice was against any sort of randomness in testing. Manually constructed tests were seen as superior. That may have been true when computer time was dear, but the bias persisted into the latest edition of a well known book on software testing, published after (for example) Csmith had been released.
Fuzzing is fun! If you're doing it on your personal computer (as opposed to a cloud VM somewhere), I'd suggest putting the testcase output directory on a spinning-rust hard drive that you don't care about instead of your (presumably much more expensive) internal SSD. It creates an impressive number of disk writes.
I've been thinking about fuzzing JavaScript code (not attacking V8 or SpiderMonkey, but the JS code itself). While JavaScript might not be vulnerable to buffer overflows and format string vulnerabilities, it certainly can have logic issues, unhandled exceptions, and DoS vulnerabilities that are exposed by fuzzing.
I took a look at the most-depended-on NPM packages. I'll try writing test harnesses on functions that take user input. Does anyone have any ideas for packages that could use some fuzz testing?
> I'd suggest putting the testcase output directory on a spinning-rust hard drive that you don't care about instead of your (presumably much more expensive) internal SSD.
Even better, use the /dev/shm RAM disk if you have memory to spare (although you should probably create a separate RAM disk with a size limit if you don't want a runaway program to accidentally eat all your RAM). On a modern development machine, setting aside 2 GiB for testcase output is usually not a problem, and it often gives a significant speedup.
If you are interested in finding possible security holes, you could try finding prototype pollution bugs in basically any library that somehow handles user input. Utility libraries like lodash and underscore, argument parsers like yargs, minimist, others like moment, handlebars, DB/ODM tools like Mongoose, Knex, etc.
You'd look for code where input would be able to modify Object.prototype (or I guess some other constructor's prototype) unintentionally (and it's basically always unintentional).
If the files you are fuzzing are small, then you could just create a couple of gigs of tmpfs ramdisk with something like "mount -t tmpfs -o size=2G none /mnt/somewhere" and put your fuzz directory on there.
Then the impressive number of writes are all to memory, which should pose no problem.
It can be difficult to evaluate the result of a test. We solved this by also running an existing (of course inferior) library that uses a different algorithm for the same task, so it fails on different inputs. We would run the same test through both libraries and compare the results. If they differed, we had to find a way to decide which library was wrong, or evaluate those cases manually.
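A rough sketch of that setup (essentially differential testing); the decode_ours/decode_reference names and the byte-vector interface are hypothetical stand-ins for the two libraries:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins: the library under test and the independent
// reference library, parsing the same format with different algorithms.
// std::nullopt means "input rejected".
std::optional<std::vector<uint8_t>> decode_ours(const std::vector<uint8_t>& in);
std::optional<std::vector<uint8_t>> decode_reference(const std::vector<uint8_t>& in);

// Differential oracle: feed the same fuzzer-generated input to both
// libraries and flag any disagreement for manual triage.
bool outputs_agree(const std::vector<uint8_t>& input) {
    return decode_ours(input) == decode_reference(input);
    // A mismatch doesn't tell you which library is wrong, only that one is.
}
```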
Can you call C# from C? If so, you can just use any C fuzzing library and have it call your C# code. I do this with C++ and Objective-C using clang's libFuzzer: you write a single C function that takes a pointer to a buffer and a length, and pass the data to whatever code you want. I just write a C wrapper that calls my Objective-C or C++ functions with the data.
Doesn't libFuzzer only require `extern "C" int LLVMFuzzerTestOneInput(...` to fuzz C++ code? What else does your C wrapper do beyond that? Google puts their fuzz tests right alongside the rest of the Chromium source code, which is C++.
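For reference, a minimal harness of the kind being discussed might look like this; parse_image() is a hypothetical stand-in for the C++ or Objective-C code under test:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical function under test; stands in for whatever C++ or
// Objective-C code the wrapper forwards the data to.
bool parse_image(const std::string& bytes);

// The only symbol libFuzzer requires; it is called once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    parse_image(std::string(reinterpret_cast<const char*>(data), size));
    return 0;  // crashes and sanitizer reports are what flag the bugs
}
```

Built with something like `clang++ -fsanitize=fuzzer,address harness.cc`, that's the whole wrapper; anything beyond it is just marshalling the buffer into whatever types the code under test expects.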
My summary of this blog post: plenty of random input data can reveal code bugs. The kind of bugs that would probably take a lot of time to think of and write unit tests for in advance.
> The kind of bugs that would probably take a lot of time to think of and write unit tests for in advance.
Would it? Maybe it's because I've had a "low-level upbringing", but whenever I'm writing parsing code for a file format, "assume any byte of data you read can have any value" is the norm. The rest of it follows from there.
Let's go one step higher and keep track of the state with a state machine. When designing/coding with correctness in mind, I try to stay focused and not think of edge cases; otherwise I end up spending more time coming up with edge cases and what can go wrong. I'm not lazy, I'm almost certain of that. But I do feel time is a limited resource and want to add more value per hour spent working. Maybe this is more a case of: if it can be automated, automate it.
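As a toy illustration of that state-machine approach (the format here is made up: one magic byte, one length byte, then the payload), every state handles every possible byte value, so arbitrary input can only land in an error state:

```cpp
#include <cstddef>
#include <cstdint>

// Toy format, invented for illustration: one magic byte (0xAB), one length
// byte, then that many payload bytes.
enum class State { Magic, Length, Body, Done, Error };

State step(State s, uint8_t byte, size_t& remaining) {
    switch (s) {
        case State::Magic:
            return byte == 0xAB ? State::Length : State::Error;
        case State::Length:
            remaining = byte;
            return remaining == 0 ? State::Done : State::Body;
        case State::Body:
            return --remaining == 0 ? State::Done : State::Body;
        default:                  // Done or Error: any extra byte is an error
            return State::Error;
    }
}

bool parse(const uint8_t* data, size_t size) {
    State s = State::Magic;
    size_t remaining = 0;
    for (size_t i = 0; i < size; ++i)
        s = step(s, data[i], remaining);
    return s == State::Done;
}
```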
Reminds me of using a slide rule. You normally push the inner part (the C scale) to the right, line up the 1 on the C scale with the first number you're multiplying on the D scale, then look on the C scale for the second number you're multiplying, and read the result off the D scale immediately below that.
But when the result is more than 10, you've wrapped: your answer is off the D scale. So now you have to push the inner part back to the left, and line up the 10 (usually marked as 1, at the right-hand end) on the C scale with the first number on the D scale. And remember to add 1 to the exponent.
I've seen slide rules where the D scale goes slightly beyond 10 (like 10.1), so if the result was just a tiny bit over 10, you wouldn't need to wrap.