
Seems very dismissive and unaware of recent advances in causal inference (cf. other comments on Pearl). Putting "throw the kitchen sink at it" regression a la early-2000s nutritional research (which is indeed garbage in, garbage out) in the same category as Mendelian randomization, DAGs, IP weighting, and G-methods is misleading. I do worry that some of these EA types dive head-first into a random smattering of Google Scholar searches with no subject matter expertise, find a mess of studies, then conclude "ah well, better just trust my super rational Bayesian priors!" instead of talking with a current subject matter expert. Research -- even observational research -- has changed a lot since the days of "one-week observational study on a few dozen preschoolers."
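
To make one of those terms concrete for readers who haven't met it: below is a rough Python sketch of IP (inverse probability) weighting on simulated data. Every variable name and effect size here is invented for illustration; the point is just that reweighting by the estimated probability of treatment can recover a causal effect that the naive comparison gets badly wrong, provided the confounder is actually measured.

    # Minimal IP-weighting sketch on simulated data (illustrative only).
    # The confounder C drives both treatment assignment and the outcome,
    # so the naive difference in means is biased.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 50_000
    C = rng.normal(size=n)                             # measured confounder
    T = rng.binomial(1, 1 / (1 + np.exp(-1.5 * C)))    # treatment depends on C
    Y = 2.0 * T + 3.0 * C + rng.normal(size=n)         # true causal effect of T is 2.0

    # Naive comparison: confounded, lands far above 2.0.
    naive = Y[T == 1].mean() - Y[T == 0].mean()

    # IP weighting: model P(T=1 | C), weight each unit by 1 / P(its observed treatment | C).
    ps = LogisticRegression().fit(C.reshape(-1, 1), T).predict_proba(C.reshape(-1, 1))[:, 1]
    w = np.where(T == 1, 1 / ps, 1 / (1 - ps))
    ipw = (np.average(Y[T == 1], weights=w[T == 1])
           - np.average(Y[T == 0], weights=w[T == 0]))

    print(f"naive:       {naive:.2f}")   # far above 2.0 (confounded)
    print(f"IP-weighted: {ipw:.2f}")     # ~2.0, assuming no unmeasured confounding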

A more general observation: If your conclusion after reading a bunch of studies is "wow I really don't understand the fancy math they're doing here" then usually you should do the work to understand that math before you conclude that it's all a load of crap. Not always, of course, but usually.



> I do worry that some of these EA types dive head-first into a random smattering of Google Scholar searches with no subject matter expertise, find a mess of studies, then conclude "ah well, better just trust my super rational Bayesian priors!" instead of talking with a current subject matter expert. Research -- even observational research -- has changed a lot since the days of "one-week observational study on a few dozen preschoolers."

EA types spend a lot of time talking with subject matter experts, see e.g. https://www.givewell.org/international/technical/programs/vi...


This is the problem of gnosis vs. doxa. Regardless of one's intelligence, you cannot know whether your priors are actually reliable, because you cannot know what you cannot know (cf. Postman). (Strictly they are posteriors rather than priors: as a Bayesian thinker I would actually look at paper X, then at who made it, how much funding they had, where, and in what context (is this obvious marketing?), and compute, from an overly safe approximation, the probability that it's BS; that probability is what you end up calling your priors.)
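
As a toy illustration of that kind of update (every number and "red flag" below is invented purely for illustration):

    # Toy Bayesian screen for "is this paper BS?". All numbers are invented.
    def update(prior, likelihood_ratios):
        """Posterior P(BS) after multiplying in likelihood ratios
        P(signal | BS) / P(signal | not BS) for each observed signal."""
        odds = prior / (1 - prior)
        for lr in likelihood_ratios:
            odds *= lr
        return odds / (1 + odds)

    # Hypothetical signals: industry funding, reads like marketing copy.
    print(update(prior=0.3, likelihood_ratios=[2.0, 3.0]))  # ~0.72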

So then you throw your hands in the air, trying to get others to internalize a discussion culture where you can, as a normal thing, claim to be (or are by default assumed to be) probably superficial.

Even if someone knows a lot about foundational things like philosophy, what they want, etc., statements have a certain feel to them regardless of their mathematical truth, and at some point you do the cost-benefit calculation: you might be superficial, but you should learn about other topics first, because you know you can gain more valuable information from them in less time.

---

Pearl's causality, as far as I can see, is best modelled with cdr/car + NBG as embedded-agency foundations + computability in the mix + signal theory (time as discrete evolution from one state in the chain to another; signal theory is relevant when you have multiple agents, or the environment and PARTS of you run at different clocks, or something more complex), i.e. as part of formal embedded agency. It doesn't feel too meaningful without that, except epistemologically (what-can-we-know type questions), where causality might be a good lens, especially to filter inquiries.


> If your conclusion after reading a bunch of studies is "wow I really don't understand the fancy math they're doing here" then usually you should do the work to understand that math before you conclude that it's all a load of crap.

While this is true, putting the onus on the reader to understand a lot of advanced math makes it easy to avoid scrutiny by increasing the complexity of your math such that the only people who are ever going to be able to critique you are the intersection between PhD-level mathematicians and experts in the field your paper actually pertains to. Anyone can just say whatever they want and assure you that they must be right because they know more math than everyone else who's interested in that problem.

Instead of, "Understand the math before you conclude that it's all a load of crap," I would say, if it's an unreasonable level of complexity for the particular problem and you can't find a large body of other papers doing something similar with the same problem, just ignore it.


We don't even need to go back to the 2000s. The author openly dismisses the Generalized Method of Moments (published in 1982 by Lars Hansen [0]) as a 'complex mathematical technique' that he's 'guessing there are a lot of weird assumptions baked into' it, the main evidence being that he 'can't really follow what it's doing'. He also admits that he has no idea what control variables are or how to explain linear regression. It's completely pointless trying to discuss the subtleties of how certain statistical techniques try to address some of his exact concerns: it's clear that he has no interest in listening, won't understand, and will just take that as further evidence that it's all BS. This post is a rant best described as Dunning-Kruger on steroids. I have no idea how it got 200 points on HN, and I can only advise anyone who reads here first to spare themselves the read.

[0] edit: Hansen was awarded the Nobel Memorial Prize in Economics in 2013 for GMM, not that that means it can't fail, but clearly a lot of people have found it useful.


I think you are significantly misrepresenting what the author said. He didn't say he has no idea what control variables are. What he said is:

> The "controlling for" thing relies on a lot of subtle assumptions and can break in all kinds of weird ways. Here's[1] a technical explanation of some of the pitfalls; here's[2] a set of deconstructions of regressions that break in weird ways.

[1] https://journals.plos.org/plosone/article?id=10.1371/journal...

[2] https://www.cold-takes.com/phil-birnbaums-regression-analysi...

To me this seems to demonstrate a stronger understanding of regression analysis than 90+% of scientists who use the technique.
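
For anyone who wants a self-contained example of a regression that "breaks in weird ways": here is a standard collider-bias simulation in Python (my own toy example, not taken from either linked piece). X has no effect on Y at all, but "controlling for" a variable that both of them cause manufactures an association out of nothing.

    # Collider bias: adjusting for the wrong variable creates a spurious effect.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 100_000
    X = rng.normal(size=n)
    Y = rng.normal(size=n)              # independent of X by construction
    Z = X + Y + rng.normal(size=n)      # collider: caused by both X and Y

    unadjusted = sm.OLS(Y, sm.add_constant(X)).fit()
    adjusted = sm.OLS(Y, sm.add_constant(np.column_stack([X, Z]))).fit()

    print(unadjusted.params[1])  # ~0.0: correctly finds no relationship
    print(adjusted.params[1])    # ~-0.5: "controlling for" Z invented one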


> He didn't say he has no idea what control variables are

He did say exactly that.

> They use a technique called regression analysis that, as far as I can determine, cannot be explained in a simple, intuitive way (especially not in terms of how it "controls for" confounders).

That's about as /noideadog as you can get.


That is unfair, he says...

> "generalized method of moments" approaches to cross-country analysis (of e.g. the effectiveness of aid)

Which is an entirely reasonable criticism. GMM is a complex mathematical process; the Wikipedia article [0] suggests it assumes the data are generated by a weakly stationary ergodic stochastic process of multivariate normal variables. There are a lot of ways that real-world data on aid distribution might be non-ergodic, non-stationary, non-normally distributed, or even deterministic!

Verifying that a paper has used a parameter estimation technique like that properly is not a trivial task even for someone who understands GMM quite well. A reader can't be expected to follow what the implications are from reading a study; there is a strong element of trust.

[0] https://en.wikipedia.org/wiki/Generalized_method_of_moments
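
For what it's worth, the simplest special case of GMM (a single moment condition, i.e. just-identified linear IV) fits in a dozen lines of Python. This is a toy simulation with invented numbers, not a reconstruction of any actual aid paper, but it shows where the load-bearing assumption sits:

    # Just-identified linear IV as a one-moment GMM: solve E[Z * (Y - b*X)] = 0.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 200_000
    Z = rng.normal(size=n)                       # instrument
    U = rng.normal(size=n)                       # unobserved confounder
    X = Z + U + rng.normal(size=n)               # endogenous regressor
    Y = 1.0 * X + 2.0 * U + rng.normal(size=n)   # true effect of X is 1.0

    ols = (X @ Y) / (X @ X)   # ~1.67: biased, X is correlated with U
    iv = (Z @ Y) / (Z @ X)    # ~1.0: solves the sample moment condition

    print(f"OLS: {ols:.2f}   IV/GMM: {iv:.2f}")

The IV number is only as credible as the untestable claim that Z moves Y through X alone, and none of the math above checks that for you; that, plus the stationarity/ergodicity conditions mentioned above, is where the element of trust comes in.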


Every statistical model makes assumptions. As a general rule, the more mathematically complex the model, the fewer (or weaker) assumptions are made. That's what the complexity is for. So the criticism 'it looks complex, so the assumptions are probably weird' doesn't make sense.

If as a reader you don't understand a paper (that's been reviewed by experts), then the best thing to conclude is that you're not the target audience, not that the findings can be dismissed.


He isn't saying that; he's saying he does understand the paper and therefore the findings can be viewed with some suspicion. That is the nature of research: clear conclusions are rare because real data is messy.

> Every statistical model makes assumptions. As a general rule, the more mathematically complex the model, the fewer (or weaker) assumptions are made. That's what the complexity is for. So the criticism 'it looks complex, so the assumptions are probably weird' doesn't make sense.

This is an argument of the form [X -> Y. Y. Y has a purpose. Therefore not(Y -> Z)]. It isn't valid; the fact that a criticism is general doesn't make it weaker (or stronger, for that matter). It is a bit like saying that meat contains bacteria, so nobody can complain that a meal gave them food poisoning. They can certainly complain, and it is possible (indeed likely) that some meat is bad because of excessive bacteria.


> He isn't saying that, he's saying he does understand the paper

He literally says 'I can't really follow what it's doing', linking to a paper that discusses some issues with instrumental variable regression (what GMM is used for).


Yeah, I found this article to be annoying AF, because it seemed to fall into the same traps he accuses these study authors of falling into in the first place. By the end it seemed he was just trying to yell "correlation is not causation!" but in an even smarter, "I am very smart" sort of way.

E.g. I certainly found myself agreeing with his points about observational studies, and there are plenty of real-world examples you can point to where experts have been led astray by these kinds of studies (e.g., alcohol-consumption recommendations, egg/cholesterol recommendations, etc.).

But when he talked about his reservations re "the wheat" studies, they seemed really weak to me and semi-bizarre:

1. Regarding "The paper doesn't make it easy to replicate its analysis." I mean, no shit Sherlock? The whole point is that it would be prohibitively expensive or unethical to carry out these real experiments, so we rely on these "natural" experiments to reach better conclusions.

2. "There was other weird stuff going on (e.g., changes in census data collection methods), during the strange historical event, so it's a little hard to generalize." First, this seems kind of hand-wavy (not all natural experiments have this issue), but second and more importantly, of course it's hard to "generalize" these kinds of experiments because their value in the first place is that they're trying to tease out one specific variable at a specific point in time.

3. The third bullet point just seemed like it could be summarized as "news flash, academics like to endlessly argue about shit."

I think the fundamental problem when looking for "does X cause Y" is that in the real world these are complex systems: lots of other things cause Y too (or can reduce its chances), so you're only ever able to make some statistical statement, e.g. X makes Y Z% more likely, on average. But even then, suppose there is something that makes Y Z% more likely in one specific sub-population but some percent less likely in another (not an exact analogy, but my understanding is that most people don't really need to worry about the cholesterol in eggs, while a sub-population of people is very reactive to dietary cholesterol).

Basically, it feels like the author is looking for some definitive, unambiguous "does X cause Y", but that's not really how complex systems work.
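
To make the sub-population point concrete, here is a toy simulation (all effect sizes invented) where a modest average effect hides a large effect in a small "responder" group:

    # An average effect can mask very different effects in sub-populations.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000
    sensitive = rng.random(n) < 0.25      # hypothetical 25% "responder" subgroup
    X = rng.binomial(1, 0.5, size=n)      # exposure

    # Baseline risk 10%; exposure adds +12 points for the sensitive group, -1 otherwise.
    Y = rng.binomial(1, 0.10 + X * np.where(sensitive, 0.12, -0.01))

    def effect(mask):
        return Y[(X == 1) & mask].mean() - Y[(X == 0) & mask].mean()

    print(f"overall:   {effect(np.ones(n, bool)):+.3f}")   # ~ +0.02
    print(f"sensitive: {effect(sensitive):+.3f}")          # ~ +0.12
    print(f"others:    {effect(~sensitive):+.3f}")         # ~ -0.01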



