
Using a 2D histogram when there isn't overplotting is just throwing away good data. Why do it unless you need to?


Overplotting is common, and "you" (generic internet guy who posted a graph on his blog, not you personally) probably don't even know how to spot it.

Secondly, the data you throw away is usually just sampling noise. Most of the time the interesting object is the underlying probability distribution - individual points are only useful to infer that.

In the cases where individual data points are actually of interest (e.g. http://cl.ly/GvnM ), go ahead and use them. But they are a terrible default choice.


Scatterplots are only a "terrible" choice when the underlying data set is concentrated; in all other cases they are superior. Do you not realize that using a 2D histogram on a sparse data set is every bit as foolish as using a scatterplot on a concentrated one?

Whether scatterplots make a "terrible default choice," then, depends on what you believe about the distribution of data sets out there. Based on your advice, you must believe that most people have data sets that are dense.

For a lot of people, however, that's not true. For them, "plot density, not points" is the terrible default choice. But you're telling them to make it their default anyway.
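One way to settle this per data set rather than by default is to measure how much overplotting a scatterplot would actually have. Here is a rough sketch, not a standard method: bin the points onto a grid roughly the size of the plot's markers and count how many points share a cell with another point. The function names, the 100x100 grid, and the 0.5 threshold are all arbitrary assumptions for illustration; tune them to your marker size and figure resolution.

```python
from collections import Counter


def overplot_fraction(xs, ys, nx=100, ny=100):
    """Fraction of points that share a grid cell with at least one other point.

    xs and ys are equal-length lists of numbers. The nx-by-ny grid is an
    assumed stand-in for "how many distinct markers fit on the plot".
    """
    if not xs:
        return 0.0
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)

    def cell(v, lo, hi, n):
        # Map a value to a bin index in [0, n - 1]; a degenerate axis gets bin 0.
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * n), n - 1)

    counts = Counter(
        (cell(x, x0, x1, nx), cell(y, y0, y1, ny)) for x, y in zip(xs, ys)
    )
    # Points in cells holding more than one point would be drawn on top of each other.
    shared = sum(c for c in counts.values() if c > 1)
    return shared / len(xs)


def suggest_plot(xs, ys, threshold=0.5, nx=100, ny=100):
    """Return "density" if most points would overlap, else "scatter".

    The 0.5 threshold is an arbitrary assumption, not an established rule.
    """
    return "density" if overplot_fraction(xs, ys, nx, ny) > threshold else "scatter"
```

On a sparse data set the fraction stays near zero and the scatterplot keeps every point; on a concentrated one it approaches one, which is the case where plotting density stops throwing away information you could have seen anyway.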



