Three charts are all I need

JumpCrisscross · on Feb 1, 2013

I think I see a Pareto law of data visualisation: most of the data visualisation needs of a field can be satisfied by a small number (hypothesis: N = 3) of visualisations.

Lines for time series, histograms for frequencies, and panels for everything else is a good heuristic for web analytics. In quantitative finance, for example, a scatter plot is my go-to visualisation.

saraid216 · on Feb 1, 2013

Paying this forward: http://www.flickr.com/photos/amit-agarwal/3196386402/

sesqu · on Feb 2, 2013

For what it's worth, I disagree with that chart. It breaks conventions, makes assumptions, and has an eclectic selection, so use it only for suggestions.

swanson · on Feb 1, 2013

Food for thought, here is a visualization I've made in an app for tracking team mood: http://i.imgur.com/PUUYGKI.png

It's pretty easy to spot what days were good, bad, and in between. You can start to see patterns (Every other Tuesday seems to be better, why is that? Oh, that's when we had donuts!).

If you were to apply the Three Chart rule, this would be a line chart since it shows change over time. I find the calendar visualization much easier to interpret than a line chart in this case.

danso · on Feb 1, 2013

No, I think this would fall under the Table category, as it, like most table charts, exhibits the quality of "small multiples".

bengillies · on Feb 2, 2013

Those three may be fine (and indeed all you really need for representing most data) in some fields. In other fields though, they're entirely unsuitable.

A lot of people have mentioned scatter plots.

I'd quite like to know how you'd represent relationships between things using only a histogram or line chart.

I realise that it's covered somewhat by the "95% of all cases" statistic, but really, as with most things, if all you're doing is visualising relationships, then that statistic is likely way off.

The general point is a good one (use a visualisation that's appropriate for representing your data, not one that's appropriate only for looking nice), but I think the message is lost somewhere in amongst the rhetoric.

mshron · on Feb 1, 2013

Assuming that you're comfortable with 2d histograms (hexagonal binning is the standard) I completely agree. Otherwise scatterplots are pretty crucial.

yummyfajitas · on Feb 1, 2013

Unless you are comfortable with tweaking opacity, jitter and point size until you get a graph that is representative, don't use scatterplots. A 2d histogram with a decent color scheme is far more reliable and easy.

For dense data, scatterplots can obscure more than they show:

http://www.chrisstucchio.com/blog/2012/dont_use_scatterplots...

Even an author who was well aware of the problem made the same mistake:

http://garyrubinstein.teachforus.org/2013/01/09/the-50-milli...
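
A minimal sketch of the "2d histogram" idea discussed above, assuming square bins rather than the hexagonal binning mshron mentions (square bins are simply easier to show in a few lines). The function name `bin_2d`, the Gaussian sample data, and the text rendering are all illustrative assumptions, not anything taken from the linked posts.

```python
from collections import Counter
import random

def bin_2d(points, x_range, y_range, bins=10):
    """Count points per cell of a bins x bins grid (square bins, not hexagons)."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the plotted range
        # Clamp so that x == x_max lands in the last bin instead of overflowing.
        col = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        counts[(row, col)] += 1
    return counts

# Hypothetical data: 5,000 points from a standard 2D Gaussian.
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)]
counts = bin_2d(points, (-3, 3), (-3, 3), bins=10)

# Crude text rendering: denser cells get darker characters.
shades = " .:-=+*#%@"
peak = max(counts.values())
for row in reversed(range(10)):  # print the top row (largest y) first
    print("".join(shades[counts[(row, col)] * 9 // peak] for col in range(10)))
```

Each cell's shade depends only on its count, so a cell holding 500 points looks different from one holding 5, which is the property a scatterplot loses once markers start to pile up.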

tmoertel · on Feb 3, 2013

Using a 2D histogram when there isn't overplotting is just throwing away good data. Why do it unless you need to?

yummyfajitas · on Feb 4, 2013

Overplotting is common, and "you" (generic internet guy who posted a graph on his blog, not you personally) probably don't even know how to spot it.

Secondly, the data you throw away is usually just sampling noise. Most of the time the interesting object is the underlying probability distribution - individual points are only useful to infer that.

In the cases where individual data points are actually of interest (e.g. http://cl.ly/GvnM ), go ahead and use them. But they are a terrible default choice.

tmoertel · on Feb 4, 2013

Scatterplots are only a "terrible" choice when the underlying data set is concentrated; in all other cases they are superior. Do you not realize that using a 2D histogram on a sparse data set is every bit as foolish as using a scatterplot on a concentrated one?

Whether scatterplots make a "terrible default choice," then, depends on what you believe about the distribution of data sets out there. Based on your advice, you must believe that most people have data sets that are dense.

For a lot of people, however, that's not true. For them, "plot density, not points" is the terrible default choice. But you're telling them to make it their default anyway.
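
One rough way to check which situation a given data set is in, sketched under loud assumptions: the plot area is taken to be a hypothetical 400 x 300 pixel grid, each point is treated as a single-pixel marker, and "overplotted" just means sharing a pixel cell with another point. The name `overplot_fraction` and both sample data sets are made up for illustration; nobody in this thread proposed this exact measure.

```python
from collections import Counter
import random

def overplot_fraction(points, width=400, height=300):
    """Fraction of points that share a pixel-sized cell with at least one other point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1.0  # avoid dividing by zero for constant data
    y_span = (max(ys) - y_min) or 1.0
    cells = Counter(
        (min(int((x - x_min) / x_span * width), width - 1),
         min(int((y - y_min) / y_span * height), height - 1))
        for x, y in points
    )
    # Every point in a cell holding more than one point counts as overplotted.
    hidden = sum(n for n in cells.values() if n > 1)
    return hidden / len(points)

rng = random.Random(1)
sparse = [(rng.random(), rng.random()) for _ in range(200)]
dense = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]
print(f"sparse: {overplot_fraction(sparse):.1%}")
print(f"dense:  {overplot_fraction(dense):.1%}")
```

A low fraction suggests a scatterplot loses little; a high one suggests plotting density instead. Where to draw the line between the two is a judgment call this sketch does not settle, and larger markers would raise the fraction.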

kunalb · on Feb 2, 2013

I disagree with this post: there's another use case for visualizations -- consuming data as fast as possible to take action (e.g. firefighting). I work on tools for engineers, and I've found that a rich, non-standard visualization that people can learn to parse quickly after a tiny bit of initial acclimatization, and one showing them exactly what they need, can help a lot.

Chris_Newton · on Feb 2, 2013

> a rich, non-standard visualization that people can learn to parse quickly ... showing them exactly what they need

I do a lot of UI work, also mostly for technical users, so I’m interested in both effective visualisations and efficient interactions built around them.

I’ve found that customised and/or contextual visualisations and interactions can be very effective if, but only if, they fit how the user thinks about the situation better than any of the standard alternatives.

Put another way, if you’re dealing with a solved problem, using the solution that everyone already knows usually works best. But if you’re dealing with something new and different, and you can’t build what you need cleanly using existing tools, then creating a new kind of tool often gets better results than cobbling something together with the wrong tools for the job.

The hard part is that creating more appropriate tools generally requires understanding your users’ mental model(s) of the situation and the actions they need to take, and even a seemingly small mismatch between what a user expects and what you actually give them can really hurt when the user doesn’t have familiar conventions to fall back on.

In practice, I’ve had some success building UIs around a small number of specialised visualisations and controls (typically making up a single main screen/page) but using only mainstream presentation like tables and histograms for supporting features, but every project is different.

kunalb · on Feb 3, 2013

True -- any visualization that doesn't get the message across is just about useless.

Luckily we have very tight feedback loops -- and I can push new versions every couple of hours so I can customize it really fast to suit their needs/mental models.

pretoriusB · on Feb 2, 2013

>I disagree with this post

No, you actually AGREE with this post.

The post says "for 95% of cases", that is, for most cases.

Your case, "firefighting", is a 5% outlier.

SatvikBeri · on Feb 1, 2013

I mostly agree with the article, but disagree with the idea that unusual visualizations are harder to understand. Tuning your chart to your data often makes it easier for your audience to understand. For example, if you want to communicate the detailed difference between how well two models predict the performance of teams in the NFL, an ROC chart showing both models is usually the clearest way to present[0]. If you want to model different scenarios, a segmentation chart may be better. If you just want average conversion rates, bar charts may be more helpful.

That said, I certainly agree with using the same chart for things like weekly updates, etc. Creativity for creativity's sake is pointless -- the point of using different types of charts is to communicate a complex message as simply as possible.

[0]: http://blog.optimalbi.com/wp-content/uploads/2012/12/ROC-cha...

ChuckMcM · on Feb 1, 2013

Interesting thesis, I would like to see a cage match between Noah and Edward Tufte :-) I tend to favor Tufte's thesis that humans can extract more knowledge per millimeter of paper space out of a chart than they can out of text. The trick of course is constructing that visualization. Most folks are facile with words, while fewer can work as effectively with shapes.

I like the notion that if you cannot see at least four things in a chart then the chart isn't doing its job. It makes me ask the question, "What is the context in which this chart is expressing information? Can I show that?"

Q6T46nT668w6i3m · on Feb 1, 2013

I think Noah is echoing the work of W.S. Cleveland and Edward Tufte—both of whom advocate for using a small number of well-understood methods (e.g. tables, histograms, line plots, scatter plots, and so on).

trjordan · on Feb 1, 2013

I'm a fan of histograms above nearly all else. When I need to show change over time, I actually prefer heatmaps:

http://deliveryimages.acm.org/10.1145/1810000/1809426/gregg3... http://queue.acm.org/detail.cfm?id=1809426

It's like a scatterplot, but a bit better at showing collapsed data (i.e., there's a lot of data at that point on the graph -- is it more or less than that other jumble of Xs?).

tmoertel · on Feb 1, 2013

Histograms are great for visualizing relative density, but they have problems when items don't sort neatly into bins. They also make it hard for viewers to accurately estimate the portion of the distribution that falls within arbitrary ranges, say the percentage of smokers aged 24 thru 35. Plotting the empirical cumulative distribution function solves these problems.

For a numeric random variable X, its CDF F(x) gives P(X <= x). So if X gives the age of smokers (in whole years), the answer to our earlier question is just F(35) less F(23). Plotting F over all values of X, then, lets us not only "see" the shape of the distribution but also answer range questions: just look up two points on the graph and subtract.

Some examples:

http://docs.ggplot2.org/0.9.2.1/stat_ecdf.html
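
The range lookup described above can also be sketched in a few lines of Python using only the standard library. The `ecdf` helper and the list of ages are hypothetical, made up purely to illustrate the subtraction F(35) - F(23); this is not the ggplot2 `stat_ecdf` implementation linked above.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return F where F(x) is the fraction of the sample that is <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F

# Hypothetical ages of ten smokers (made-up numbers, for illustration only).
ages = [19, 22, 24, 27, 31, 35, 35, 41, 48, 60]
F = ecdf(ages)

# Share aged 24 through 35 inclusive: F(35) - F(23), since ages are whole numbers.
share = F(35) - F(23)
print(f"{share:.0%}")
```

In this made-up sample, five of the ten ages fall between 24 and 35, so the two lookups differ by one half. No bin edges were chosen anywhere, which is the advantage over a histogram claimed above.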

sesqu · on Feb 2, 2013

The problem with the ECDF is that it's harder to get a feel for, being cumulative. I spent a few days once playing with N-bin histograms, and still feel those should be the default (with non-compact ends, maybe gamma distributed) over smoothed hacks.

brown9-2 · on Feb 1, 2013

Very frustrating that Oracle broke all the blogs.sun.com links in that ACM article.

brendangregg · on Feb 1, 2013

Frustrating, yes; I have linked the real images here http://dtrace.org/blogs/brendan/2010/06/05/visualizing-syste...

kfcm · on Feb 1, 2013

And completely missing, the old reliable pie chart. Used when showing attributes (percentage, counts, etc) of a whole.

And then there's the scatterplot. Very useful in regression models for best fit.

Charts and plots are tools. To limit yourself to just three is using a screwdriver for a hammer, or C# for any programming task.

Understand the available tools and use the right one for the job at hand.

defrost · on Feb 1, 2013

> the old reliable pie chart.

? Weird. In the past 35 years working in numerical analysis of multitudes of data sources from toilet flushes in a city of a million people to stock movements to multichannel radiometrics, to cloud data from LIDAR, to cot death incidences, etc. I've never once used a pie chart (or, in fact, worked with anyone that's used them).

I understand they are popular when lying with statistics and in power point displays to non technical suits, but they have pretty limited use in understanding data or presenting layered attributes.

Scatter plots are somewhat useful, when combined with a means of indicating densities, such as heatmapping; box and whisker plots that show central densities, means, medians and extents of ranges are useful.

But Pie Charts? Professionally they're the joke setting in Excel . . .

danso · on Feb 1, 2013

Pie charts are controversial, as the human eye is easily misled by them. A discussion on Edward Tufte's site:

http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0...

showerst · on Feb 1, 2013

Slightly more advanced guide that I like -

http://giveupinternet.com/2009/01/16/chart-of-the-charts-cha...

They hit the most important thing about chart making right on the head: What are you trying to communicate?

nfm · on Feb 1, 2013

I really hope that the term "Infauxgraphics" gets some more traction.

lubos · on Feb 1, 2013

He forgot the pie chart. I've found that no matter how rational one can be about visualization, a single pie chart can make a world of difference to a client.

edit: thanks for the downvotes, I really appreciate how people can't recognize sarcasm. I'm very big on Tufte, but it's easier to just give the client a damn pie chart than to re-educate him on all aspects of data visualization.

Toenex · on Feb 1, 2013

If you want to convey an honest impression of the data then pie charts are regarded as a bad idea, as humans are less adept at comparing areas than they are at comparing lengths - [http://en.wikipedia.org/wiki/Pie_chart#Use.2C_effectiveness_...]

sputknick · on Feb 1, 2013

Agreed. A pie chart is best when the total will not change in size, such as when the parts always add up to 100%.

saraid216 · on Feb 1, 2013

A pie chart is best when you're trying to convey a minimal amount of information, such as extremely inexact proportions (like "roughly a third" or "about half"), within an extremely small dataset, capping at around ten items.

antidaily · on Feb 1, 2013

boring.

MSM · on Feb 1, 2013

What's wrong with being boring?

If I can completely understand it without giving a hint of effort, that's perfection.