Dyeing the cheese orange – Beware of benchmarks

mistercow · on Dec 27, 2013

The name for this is a "false proxy", and its confounding effects are pervasive just about anywhere you look. The problem is that any time you use a proxy to make decisions, you create an incentive to falsify that proxy.

The worst version of the trap is when one manipulates a proxy and honestly believes that they are helping.

Sniffnoy · on Dec 27, 2013

See also: https://en.wikipedia.org/wiki/Goodhart%27s_law

noelherrick · on Dec 27, 2013

This analogy does not work. He's trying to make the point that we don't know what caused PayPal's application to speed-up and Node.js may not be the root cause at all. The cheese analogy is not correct since he's claiming their benchmark is like the orange color - a false proxy for what you really want: good tasting cheese. You do really want your application to go faster, though. It's not a false proxy - it's what this team was optimizing for. They had a measurable increase in success - they made better cheese.

dragonwriter · on Dec 27, 2013

> He's trying to make the point that we don't know what caused PayPal's application to speed-up and Node.js may not be the root cause at all. The cheese analogy is not correct since he's claiming their benchmark is like the orange color - a false proxy for what you really want: good tasting cheese.

I think you've misunderstood the analogy.

He's claiming that without knowledge of the prior state of their systems and the differences (besides implementation language) between the Node.js implementation and the earlier one that underlie the benchmark, "using Node.js" may be analogous to the orange color in the cheese -- something that came along with the cause of the improvement but is not actually the source of the improvement -- and that splashing Node.js on other applications (as in PayPal's stated decision, based on the result of this upgrade, to use Node.js on all future consumer-facing applications, but more importantly other people using PayPal's experience to justify using Node.js for their applications) may be analogous to dyeing the cheese.

> They had a measurable increase in success - they made better cheese.

Right. He's saying that "PayPal made a faster web app. They used Node.js. Therefore, we should use Node.js if we want our web apps to go faster" may be analogous to "Farmer Bob's cheese is better tasting. His cheese is orange. Therefore, we should buy orange cheese if we want better-tasting cheese."

Though, really, like most uses of colorful analogies to get points across in technical articles, it does a lot more to obscure the point than to illustrate it.

The TL;DR: You are taking a significant risk of serious error when you accept a claim that a change that preceded an effect caused that effect when you don't know what other changes occurred at the same time that may be relevant to the effect under consideration.

raganwald · on Dec 27, 2013

Yes!

There are two blog posts here, both interesting, but not deeply connected. One is about false proxies, which is specifically about people manipulating perceptions.

The other is about getting "measurably better cheese" but not understanding why it's measurably better. Speed is not a proxy for speed, it's just speed. The meme here is that correlation does not equal causation.

bslatkin · on Dec 27, 2013

The point is speed is a false proxy for the quality of Node.js. The results don't necessarily justify using it for everything going forward.

dragonwriter · on Dec 27, 2013

> The point is speed is a false proxy for the quality of Node.js.

Well, no, speed is the actual measure of interest. The point is that reimplementations that change language also often change other design features, and that without knowing what else changed one cannot validate the attribution of the post-change speed improvement to Node.js.

bslatkin · on Dec 27, 2013

I think we're saying the same thing :)

jamesaguilar · on Dec 27, 2013

Most of the nodejs success stories I've read have taken a serial, blocking system and made it parallel and non-blocking. The wins in this context are hardly surprising, and say little about the value of nodejs specifically.

dllthomas · on Dec 28, 2013

That the latter is idiomatic use of nodejs may itself be of value.

jamesaguilar · on Dec 28, 2013

It's idiomatic in a bunch of languages. There are also good ways to do blocking IO. "We remove the capability to use an entire category of IO" is probably the weakest feature claim of any language out there.
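A minimal sketch of the point above, in Python rather than Node.js, and under loud assumptions: the "requests" are simulated with sleeps instead of real IO, and `DELAY`, `blocking_fetch`, `nonblocking_fetch`, `run_serial` and `run_concurrent` are hypothetical names made up for illustration. It only shows that moving from serial, blocking calls to concurrent, non-blocking ones produces a large win regardless of the language used.

```python
import asyncio
import time

DELAY = 0.05  # assumed per-request latency in seconds (illustrative value)


def blocking_fetch(i):
    """Simulate a blocking IO call: the whole thread waits."""
    time.sleep(DELAY)
    return i


async def nonblocking_fetch(i):
    """Simulate a non-blocking IO call: other tasks run while this one waits."""
    await asyncio.sleep(DELAY)
    return i


def run_serial(n):
    """Issue n simulated requests one after another; return (results, seconds)."""
    start = time.perf_counter()
    results = [blocking_fetch(i) for i in range(n)]
    return results, time.perf_counter() - start


async def _gather_all(n):
    # gather() schedules all coroutines concurrently and keeps input order
    return await asyncio.gather(*(nonblocking_fetch(i) for i in range(n)))


def run_concurrent(n):
    """Issue n simulated requests concurrently; return (results, seconds)."""
    start = time.perf_counter()
    results = asyncio.run(_gather_all(n))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    serial_results, serial_time = run_serial(10)
    concurrent_results, concurrent_time = run_concurrent(10)
    print(f"serial:     {serial_time:.3f}s")
    print(f"concurrent: {concurrent_time:.3f}s")
```

Both versions return the same results; only the elapsed time differs, and the difference comes from the design change rather than from the runtime that happens to host it.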

dllthomas · on Dec 28, 2013

Sure. I never made any claim that it is unique, or that nodejs is great generally.

mathattack · on Dec 27, 2013

Good point. Any benchmark can be co-opted; that's why it's useful to have multiple benchmarks. There's a Dilbert that tells the story well. [1] This is true for a lot of fields. If you over-optimize on individual player scoring in basketball, your team might play worse even if there is a correlation between individual scoring and team scoring.

[1] http://search.dilbert.com/comic/10%20Dollars%20Bug%20Fix

teddyh · on Dec 27, 2013

http://dilbert.com/fast/1995-11-13/

http://dilbert.com/fast/1995-11-14/

dded · on Dec 27, 2013

We run a number of compute servers, and we mostly run just a couple of different types of jobs. (Sort of a traditional 80/20 split: 80% of compute is spent on a couple of types of jobs, 20% on a very large variety of jobs.)

We benchmark our two important loads whenever we're buying new servers. Surprisingly, some machines perform significantly better on one load, and others perform better on the other. To us, it's non-obvious why that would be so, given what we know about the software running (this software is not written by us, ours falls in the 20% bin).

In any case, this very simplistic try-the-actual-load benchmarking serves us very well.

greatsuccess · on Dec 28, 2013

I have never known PayPal to be a company that improves their software. This may be the first rewrite since the 90s for all I know. Yes, they have a business, but as a consumer and merchant on PayPal the overriding feeling I have always gotten is that they are on cruise control as far as engineering.

So when they say they changed to node and things got better, I just take it as, OMG, PayPal tried to improve something and it improved.

I'm sure it has nothing to do with node.