Performance Culture (joeduffyblog.com)
122 points by panic on April 10, 2016 | 29 comments


This seems generic. Here are a few phrases from the article. For X, substitute "security", "quality", "user satisfaction", or "revenue".

- The cultural transformation must start from the top

- X often regresses and team members either don’t know, don’t care, or find out too late to act.

- Blame is one of the most common responses to X problems

- Performance tests swing wildly, cannot be trusted, and are generally ignored by most of the team.

- X is something one, or a few, individuals are meant to keep an eye on, instead of the whole team.

- X issues in production are common, and require ugly scrambles to address (and/or cannot be reproduced).

Unexpected performance problems typically aren't that hard to deal with once you can measure and find the bottlenecks; the fixes tend to be local rather than global. The other kind of performance problem is scaling: traffic is up, things are slow, and the architecture needs to scale. That can be hard.


As someone who has worked on HFT systems for a decade, if there is one piece of advice I can give on the matter, it is this: test with production-like data. Having processes, 'performance culture', and unit-level perf testing is all well and good, but unless you test your features under production-like loads, you can never be sure.

I've seen features that were regression/unit tested and code-reviewed to death fail in production, because all that testing was done at < 10 transactions/sec with a dataset of about 100, when typical production values were in the neighborhood of 1000 tx/sec with a dataset of perhaps 10,000.

At the very beginning of the design process, someone needs to ask, "How frequently will this feature get invoked? What kind of latency can the user tolerate?"
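
To make that concrete, here's the shape of such a check as a minimal C++ sketch. Everything in it is a made-up placeholder (the feature, the 1000/sec rate, the 10,000-row dataset, the latency budget); the point is that the test's numbers should come from production, not from whatever is convenient:

    // Drive a stand-in feature at a production-like arrival rate over a
    // production-sized dataset, and track worst-case latency.
    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Placeholder for the feature under test.
    static void process_order(int id, const std::vector<int>& book) {
        volatile long sum = 0;
        for (int v : book) sum = sum + (v ^ id);
    }

    int main() {
        using clock = std::chrono::steady_clock;
        const int kRatePerSec = 1000;            // prod-like rate, not 10/sec
        const std::size_t kDatasetSize = 10000;  // prod-like dataset, not 100 rows
        const auto kPeriod = std::chrono::microseconds(1000000 / kRatePerSec);

        std::vector<int> book(kDatasetSize, 42);
        std::chrono::nanoseconds worst{0};

        auto next = clock::now();
        for (int i = 0; i < 10 * kRatePerSec; ++i) {  // ten seconds of traffic
            next += kPeriod;
            auto t0 = clock::now();
            process_order(i, book);
            auto dt = std::chrono::duration_cast<std::chrono::nanoseconds>(
                clock::now() - t0);
            if (dt > worst) worst = dt;
            std::this_thread::sleep_until(next);      // hold the arrival rate
        }
        long long us =
            std::chrono::duration_cast<std::chrono::microseconds>(worst).count();
        std::printf("worst-case latency: %lld us\n", us);
        return us < 1000 ? 0 : 1;  // 1 ms budget: pure assumption, tune to your users
    }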


Having spent time on the exchange side, production-like data isn't enough. We had bugs that weren't caught with prod-like data hit us in prod. To get the real truth, test in prod.

Obviously you need good infrastructure to do so, but you can't fake a prod exchange: too much bandwidth, and too many users each on their own agenda.


Very true, and a huge part of why, say, game engines have a reputation for being efficient while, say, web software has a reputation for being inefficient. Performance clearly matters (or should matter) to both: it dictates what a game can do, and it dictates what your web servers cost to run and thus whether you remain profitable while sustaining growth. Yet it's a lot easier to throw more data at a game until it breaks than it is to simulate a million web sessions.


But what if testing with production-like data takes 1000x the time of testing with fake test data?


Then one makes the judgement call:

    cost of testing w/ production-like data > cost of potential production issues
and decides whether it's prudent to omit or scale down the tests.


Well, if it's a real-time system it shouldn't take too long to test with production-like data; and if it does take too long that probably indicates a problem.

For an HFT system I'd have the dev version doing overnight testing with the day's data or chunk thereof, in delayed-realtime mode.
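
Roughly like this, as a sketch; Event, feed_to_system, and the sample data are hypothetical stand-ins. The important part is preserving the capture's inter-arrival gaps so the dev build sees realistic timing:

    // Replay a day's captured events into the system under test, keeping
    // each event's original offset from the first (optionally time-scaled),
    // i.e. "delayed-realtime" mode.
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <thread>
    #include <vector>

    struct Event {
        std::uint64_t ts_ns;  // capture timestamp, nanoseconds
        // ... payload fields elided ...
    };

    // Hypothetical stand-in: hand the event to the dev build.
    static void feed_to_system(const Event& e) {
        std::printf("event at %llu ns\n", (unsigned long long)e.ts_ns);
    }

    static void replay(const std::vector<Event>& capture, double speed = 1.0) {
        using clock = std::chrono::steady_clock;
        if (capture.empty()) return;
        const auto start = clock::now();
        const std::uint64_t first = capture.front().ts_ns;
        for (const Event& e : capture) {
            // speed > 1 compresses the day into a shorter overnight run.
            auto offset = std::chrono::nanoseconds(
                (std::uint64_t)((e.ts_ns - first) / speed));
            std::this_thread::sleep_until(start + offset);
            feed_to_system(e);
        }
    }

    int main() {
        // Tiny fake capture; in practice, the day's recorded feed.
        std::vector<Event> capture = {{0}, {500000000}, {1500000000}};
        replay(capture, 1.0);
    }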


We've started to do this at our company, and it has really improved morale. We've also been giving talks at local meetups about our findings, and it has created a bit of a buzz in the community, which reflects positively back on the company.

I'm starting to tell people to "go beyond TDD with PTSD (performance testing stipulated development)". It will give your machine trauma, but it helps gamify development in an area that is otherwise dry and boring. Even worse, our company, GUN, does JavaScript development (we're an Open Source Firebase), and JS engines are sadly slow, but performance is critical to our engineering team. In the process, I've collected a humorous sample of some very basic JS operations that you can run yourself, starting from nothing all the way up to loops and closures: http://db.marknadal.com/ptsd/ptsd.html . Hope you enjoy it and don't get too depressed.
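
For anyone curious about the mechanics rather than the JS specifics: each of those micro-benchmarks boils down to timing a tight loop over an operation and comparing it against a near-empty baseline. A rough C++ sketch of the same pattern (the ops and iteration count are placeholders; the linked page does the equivalent in JS):

    // Time n iterations of an operation and report nanoseconds per op.
    #include <chrono>
    #include <cstdio>

    template <class F>
    static double ns_per_op(F op, long n = 10000000) {
        auto t0 = std::chrono::steady_clock::now();
        for (long i = 0; i < n; ++i) op(i);
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::nano>(t1 - t0).count() / n;
    }

    int main() {
        volatile long sink = 0;
        // Near-empty baseline loop vs. the same work wrapped in a closure.
        double baseline = ns_per_op([&](long i) { sink = sink + i; });
        double closure  = ns_per_op([&](long i) {
            auto f = [&] { sink = sink + i; };
            f();
        });
        std::printf("baseline %.2f ns/op, closure %.2f ns/op\n", baseline, closure);
    }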


Naming your project after an illness is a strange choice. What was the thinking behind that?

Particularly as you mention trying to create positive buzz for your team.


I think it's pretty humorous actually.


You probably don't have PTSD then.


>"I was thinking of hacking the networking stack this weekend to use it – care to join in?"

I love a weekend hack as much as anyone, but I'm assuming this is unpaid. For FOSS that might be fine, but for a multi-million-dollar project? No way.


Then you'll just be labeled as "not passionate enough." BTW, even if you worked all weekend, you're still expected to stick around at work late on Thursday for some whiskey tasting or some other bullshit, because if not, you're not a team player or not a good cultural fit :/


Just be good at your job and ignore the bullshit. Go home at 5. Has worked for me so far. Maybe I'm not a good cultural fit or whatever, but that's ok. Can't please everyone.


Is this a real problem you're seeing today? There are lots of jobs out there. Go find one with a good manager.


This has been a problem in all four jobs I've had, and almost all jobs for which I've ever interviewed (which has been a lot lately).


The reason 90% of everything is crap is that things like performance and quality are almost directly opposed to business interests/needs. That seems unlikely to change, and it is a good reason to bet on free and open source software.

Manager B would most likely be politicked out of his/her job.


Performance and quality are rarely at odds with legitimate business interests or needs. In fact, slighting performance or quality is exactly equal to slighting future business needs or interests.

The trouble is when illegitimate (generally political) business concerns are trumpeted as more important than legitimate ones, or when illegitimate views about how to discount the future value of quality are permitted to dominate the comparison against extreme short-term opportunities.


Performance and quality always take time and money. It's pretty hard to convince management to actively invest in something that doesn't add features to the sales pitch. The argument that long term it makes it easier to build more features somehow doesn't seem to work.


If you're doing anything on the web, performance should be an easy sell to the business/management. There's a ton of solid data showing that faster websites mean more conversions, which means more revenue. Then factor in hardware and licensing cost reductions from handling more capacity on less infrastructure, and you have another clear financial win. Honestly, performance work is one of the easiest places to show a clear ROI.


"In fact, if you sleight performance or quality it is exactly equal to sleighting future business needs or interests."

This assumes that you are building something you will still be using in that future, which for most startups is not true. For most, it is a better trade-off to iterate faster now and solidify the system once you know for a fact that it matches actual needs.


Most start-ups are deemed by the market to be not useful. I'm not sure it's informative to draw lessons about the value of emphasizing quality over short-term, less substantive "results" from the population of start-ups.


When developing on Windows, one thing that would make the kind of tests mentioned in this article easier is better access to CPU performance counters.

I tried it a little while ago, and I had to put Windows into test mode or something like that, and write lots of code. On Linux it seems to be much easier (for example, just run 'perf').

Edit: Joe - If you read this, please ask the Microsoft kernel team to expose the CPU hardware performance counters to userspace. Thanks!
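
For reference, this is roughly what the Linux side looks like in code: perf_event_open(2) hands userspace a hardware counter as an ordinary file descriptor. A minimal sketch along the lines of the man page's example, with error handling mostly trimmed:

    // Count retired userspace instructions around a region of interest.
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        perf_event_attr attr{};
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_INSTRUCTIONS;
        attr.disabled = 1;
        attr.exclude_kernel = 1;  // userspace work only
        attr.exclude_hv = 1;

        int fd = (int)syscall(SYS_perf_event_open, &attr,
                              0 /*this thread*/, -1 /*any cpu*/,
                              -1 /*no group*/, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        volatile long x = 0;                          // region of interest
        for (int i = 0; i < 1000000; ++i) x = x + i;

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        long long count = 0;
        read(fd, &count, sizeof(count));
        std::printf("instructions retired: %lld\n", count);
        close(fd);
    }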


Curious: Is this not what is exposed by the ETW CPU counters?


Possibly. What I want, though, is easy access through some nice C/C++ API. Maybe intrinsics exposed by the compiler.
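
The one counter compilers do already expose as an intrinsic is the timestamp counter: __rdtsc works from userspace on both MSVC and GCC/Clang. It's not the full PMU (no cache-miss or branch counters), but it needs zero setup:

    // Cycle-ish timing of a region via the compiler's __rdtsc intrinsic.
    #ifdef _MSC_VER
    #include <intrin.h>
    #else
    #include <x86intrin.h>
    #endif
    #include <cstdio>

    int main() {
        unsigned long long t0 = __rdtsc();
        volatile long x = 0;
        for (int i = 0; i < 1000; ++i) x = x + i;  // region of interest
        unsigned long long t1 = __rdtsc();
        std::printf("~%llu reference cycles\n", t1 - t0);
    }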


Should have written more about tools


Premature optimisation counterculture.

Also, when is M# / Midori going to ship?



Probably never as such, although parts of it have contributed to the .NET AOT native compilers in Windows Phone 8.x (MDIL) and the .NET CoreCLR.

Also, some of the M# features are coming in C# 7 and later versions.



