Riak, by default, uses a replication value (n_val) of three. On a single test machine all three replicas land on the same box, so it has to do roughly three times the work, and you should expect slower performance here. (I'm oversimplifying somewhat.)

You'll see significantly improved performance on a linear (sequential) test by adding an extra two nodes; in my informal testing, 3-4x speedups. Parallelized tests pretty much scale linearly with nodes.
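
To make the arithmetic concrete, here is a back-of-the-envelope sketch. It assumes replicas are spread perfectly evenly across nodes and ignores coordination and network overhead, so treat it as the same oversimplification as above, not a benchmark model. The function name and parameters are made up for illustration.

```python
def replica_writes_per_node(client_writes: int, n_val: int, nodes: int) -> float:
    """Rough number of replica writes each node handles.

    Assumes every client write is stored n_val times and that the
    replicas are spread evenly over the cluster (an idealization).
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return client_writes * n_val / nodes


# One node holds all three replicas of every write.
single_node = replica_writes_per_node(1000, n_val=3, nodes=1)

# Three nodes: each one holds roughly one replica of every write.
three_nodes = replica_writes_per_node(1000, n_val=3, nodes=3)

print(single_node, three_nodes)
```

Under these assumptions the single machine handles three times as many replica writes as each machine in a three-node cluster, which is in the same ballpark as the speedup I saw informally.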

In practice, I've found Riak to be slightly slower than MySQL. Direct reads/writes tend to be fast, but JSON parsing can bite you and denormalization requires more writes. The major advantages are that Riak can scale linearly with nodes and that it fails in predictable and resolvable ways.

As an example, the feed system I'm currently building on Riak will survive a total network partition and allow full reads and writes from every node with no data lost. Everything is automatically merged when the partition ends. The vclock-tagged multi-value functionality of Riak is exceptionally powerful when you want to design these types of systems, and is, in my mind, worth the performance hit and additional design complexity for certain classes of problems.
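
To give a feel for what the merge step can look like, here is a minimal sketch of resolving sibling values for a feed. It is not the code from my system and it does not call the Riak client at all; it assumes you have already fetched the sibling values for a key and decoded each one into a list of entries. The entry layout (id, timestamp, payload) and the function name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical feed entry: (entry_id, timestamp, payload).
FeedEntry = Tuple[str, float, str]


def merge_siblings(siblings: Iterable[List[FeedEntry]]) -> List[FeedEntry]:
    """Union the entries of all sibling values, de-duplicated by entry_id."""
    merged: Dict[str, FeedEntry] = {}
    for sibling in siblings:
        for entry in sibling:
            # Entries are assumed immutable, so the first copy seen is kept.
            merged.setdefault(entry[0], entry)
    # Return the feed newest first.
    return sorted(merged.values(), key=lambda e: e[1], reverse=True)


# Two sides of a partition each accepted a write to the same feed.
side_a = [("e1", 1.0, "hello"), ("e2", 2.0, "written on side A")]
side_b = [("e1", 1.0, "hello"), ("e3", 3.0, "written on side B")]

feed = merge_siblings([side_a, side_b])
print([entry[0] for entry in feed])  # ['e3', 'e2', 'e1']
```

A set union like this works because feed entries are append-only, so no write from either side of the partition is dropped. In a real client you would write the merged value back along with the vclock you read, so that Riak knows the siblings have been resolved.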

This was last year, so maybe Riak's performance has improved since then. I'd be interested to know whether TokyoCabinet has been added as a backend.

There are also InnoDB and multiple in-memory backends, which may provide performance characteristics more in line with what you are looking for.