My favorite personal fuzzing story is from 1987 when a friend said his x86 graphics drawing program was solid, and I said OK and smashed both hands on the keyboard and it insta-crashed.
> The fix for this was fairly straightforward - I just made the library keep a record of the previously visited IFDs and bail out if it found a loop.
If you just want to detect loops, keep a “+1” pointer that steps through the data one entry at a time, and a “+2” pointer that advances two entries for every one the “+1” pointer advances. Either the “+2” pointer hits the end of the data, or it catches up to the “+1” pointer, in which case you have a loop (this is Floyd's tortoise-and-hare cycle detection).
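For concreteness, here's a minimal sketch of that two-pointer idea in C++, written against a hypothetical next_ifd() helper (the name and interface are made up for illustration; 0 stands for "end of chain"):

```cpp
#include <cstdint>

// Hypothetical accessor: returns the offset of the next IFD in the chain,
// or 0 when the chain ends. Stands in for whatever the real parser uses.
uint32_t next_ifd(uint32_t offset);

// Floyd's tortoise-and-hare: true if the IFD chain starting at `start`
// eventually loops back on itself.
bool ifd_chain_has_loop(uint32_t start) {
    uint32_t slow = start;               // the "+1" pointer
    uint32_t fast = start;               // the "+2" pointer
    while (fast != 0) {
        slow = next_ifd(slow);           // advance one step
        fast = next_ifd(fast);           // advance two steps...
        if (fast == 0) break;
        fast = next_ifd(fast);
        if (fast != 0 && fast == slow)   // fast caught up with slow
            return true;                 // -> the chain contains a cycle
    }
    return false;                        // fast reached the end: no loop
}
```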
In university we had to write a simple fuzzer which extracted options from man pages and ran the corresponding command with randomized but valid options. Didn't take long until we found the first bug in one of the tested commands.
A very funny thing about fuzzing is how random input testing used to be so looked down upon by the software testing community. Read old (1970s) testing books and you'll see comments like "random testing is the worst kind of testing". I still saw this even as recently as a decade ago.
Yes, and things like coverage-guided fuzzing have completely revolutionized things. Before coverage guidance, fuzzing was okay but largely unimpressive. Now it blazes through code structures that were previously used as motivating examples for symbolic execution. It is a meaningfully different technique today.
That's actually one of the few fields where I feel fuzzing has underperformed. There was an interesting paper at OOPSLA this year which found that while the fuzzing community has indeed found a lot of bugs, those bugs are almost never triggered by real code. It was a really interesting result coming from within a community that ordinarily leans towards overinflating the value of PL techniques.
That paper, if it's the one I'm thinking of, found that a compiler bug found by fuzzing was more likely to be hit by a user than a user-found bug was to be hit by another user. So if fuzzing-found bug reports are bad, user-found bug reports are even less useful.
Another thing to remember is that as blackbox fuzzing became state of the practice, its benefit declined, as the bugs it would find would be found early, by the developers themselves. All testing techniques are self-limiting this way.
I want you to look at the results of jsfunfuzz and tell me the impact of that wasn't profound.
That piece of advice was probably valid in the 1970s: computers were far too slow and far too expensive for any kind of random testing to make sense. Fuzzing became popular when multi-core CPUs became commonplace and RAM more affordable.
The original prejudice was against any sort of randomness in testing. Manually constructed tests were seen as superior. That may have been true when computer time was dear, but the bias persisted into the latest edition of a well known book on software testing, published after (for example) Csmith had been released.
Fuzzing is fun! If you're doing it on your personal computer (as opposed to a cloud VM somewhere), I'd suggest putting the testcase output directory on a spinning-rust hard drive that you don't care about instead of your (presumably much more expensive) internal SSD. It creates an impressive number of disk writes.
I've been thinking about fuzzing JavaScript code (not attacking V8 or SpiderMonkey, but the JS code itself). While JavaScript might not be vulnerable to buffer overflows and format string vulnerabilities, it certainly can have logic issues, unhandled exceptions, and DoS vulnerabilities that are exposed by fuzzing.
I took a look at the most-depended-on NPM packages. I'll try writing test harnesses on functions that take user input. Does anyone have any ideas for packages that could use some fuzz testing?
> I'd suggest putting the testcase output directory on a spinning-rust hard drive that you don't care about instead of your (presumably much more expensive) internal SSD.
Even better, use the /dev/shm RAM disk if you have memory to spare (although you should probably create a separate RAM disk with a size limit if you don't want a runaway program to accidentally eat all your RAM). On a modern development machine, setting aside 2 GiB for testcase output is usually not a problem, and it often gives a significant speedup.
If you are interested in finding possible security holes, you could try finding prototype pollution bugs in basically any library that somehow handles user input. Utility libraries like lodash and underscore, argument parsers like yargs, minimist, others like moment, handlebars, DB/ODM tools like Mongoose, Knex, etc.
You'd look for code where input would be able to modify Object.prototype (or I guess some other constructor's prototype) unintentionally (and it's basically always unintentional).
If the files you are fuzzing are small, then you could just create a couple of gigs of tmpfs ramdisk with something like "mount -t tmpfs -o size=2G none /mnt/somewhere" and put your fuzz directory on there.
Then the impressive number of writes are all to memory, which should pose no problem.
It can be difficult to evaluate the result of a test. We solved this by also running an existing (of course inferior) library that uses a different algorithm for the same task, so it fails on different inputs. We would run the same test through both libraries and compare the results. If they differed, we had to find a way to decide which library was wrong, or evaluate those cases manually.
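A rough sketch of that setup (essentially differential testing); the decode_ours/decode_reference names and the byte-vector interface are hypothetical stand-ins for the two libraries:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins: the library under test and the independent
// reference library, parsing the same format with different algorithms.
// std::nullopt means "input rejected".
std::optional<std::vector<uint8_t>> decode_ours(const std::vector<uint8_t>& in);
std::optional<std::vector<uint8_t>> decode_reference(const std::vector<uint8_t>& in);

// Differential oracle: feed the same fuzzer-generated input to both
// libraries and flag any disagreement for manual triage.
bool outputs_agree(const std::vector<uint8_t>& input) {
    return decode_ours(input) == decode_reference(input);
    // A mismatch doesn't tell you which library is wrong, only that one is.
}
```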
Can you call C# from C? If so, you can just use any C fuzzing library and have it call your C# code. I do this with C++ and Objective-C using clang's libFuzzer: you write a single C function that takes a pointer to a buffer and a length, and pass the data to whatever code you want. I just write a C wrapper that calls my Objective-C or C++ functions with the data.
Doesn't libFuzzer only require `extern "C" int LLVMFuzzerTestOneInput(...` to fuzz C++ code? What else does your C wrapper do beyond that? Google puts their fuzz tests right alongside the rest of the Chromium source code, which is C++.
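For reference, a minimal harness of the kind being discussed might look like this; parse_image() is a hypothetical stand-in for the C++ or Objective-C code under test:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical function under test; stands in for whatever C++ or
// Objective-C code the wrapper forwards the data to.
bool parse_image(const std::string& bytes);

// The only symbol libFuzzer requires; it is called once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    parse_image(std::string(reinterpret_cast<const char*>(data), size));
    return 0;  // crashes and sanitizer reports are what flag the bugs
}
```

Built with something like `clang++ -fsanitize=fuzzer,address harness.cc`, that's the whole wrapper; anything beyond it is just marshalling the buffer into whatever types the code under test expects.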
My summary of this blog post: plenty of random input data can reveal code bugs. The kind of bugs that would probably take a lot of time to think of and write unit tests for in advance.
> The kind of bugs that would probably take a lot of time to think of and write unit tests for in advance.
Would it? Maybe it's because I've had a "low-level upbringing", but whenever I'm writing parsing code for a file format, "assume any byte of data you read can have any value" is the norm. The rest of it follows from there.
Let's go one step higher and keep track of the state with a state machine. When designing/coding with correctness in mind, I try to stay focused and not think of edge cases; otherwise I end up spending more time coming up with edge cases and what can go wrong. I'm not lazy, I'm almost certain of that. But I do feel time is a limited resource and want to add more value per hour spent working. Maybe this is more a case of: if it can be automated, automate it.
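As a toy illustration of that state-machine approach (the format here is made up: one magic byte, one length byte, then the payload), every state handles every possible byte value, so arbitrary input can only land in an error state:

```cpp
#include <cstddef>
#include <cstdint>

// Toy format, invented for illustration: one magic byte (0xAB), one length
// byte, then that many payload bytes.
enum class State { Magic, Length, Body, Done, Error };

State step(State s, uint8_t byte, size_t& remaining) {
    switch (s) {
        case State::Magic:
            return byte == 0xAB ? State::Length : State::Error;
        case State::Length:
            remaining = byte;
            return remaining == 0 ? State::Done : State::Body;
        case State::Body:
            return --remaining == 0 ? State::Done : State::Body;
        default:                  // Done or Error: any extra byte is an error
            return State::Error;
    }
}

bool parse(const uint8_t* data, size_t size) {
    State s = State::Magic;
    size_t remaining = 0;
    for (size_t i = 0; i < size; ++i)
        s = step(s, data[i], remaining);
    return s == State::Done;
}
```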
Reminds me of using a slide rule. You normally push the inner part (the C scale) to the right, line up the 1 on the C scale with the first number you're multiplying on the D scale, then look on the C scale for the second number you're multiplying, and read the result off the D scale immediately below that.
But when the result is more than 10, you've wrapped: your answer is off the D scale. So now you have to push the inner part back to the left, and line up the 10 (usually marked as 1, at the right-hand end) on the C scale with the first number on the D scale. And remember to add 1 to the exponent.
I've seen slide rules where the D scale goes slightly beyond 10 (like 10.1), so if the result was just a tiny bit over 10, you wouldn't need to wrap.