More than 80% of college students get the following question wrong.[1] I'd say this is an example of naive pattern matching with no real reasoning.
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which is more probable?
a) Linda is a bank teller.
b) Linda is a bank teller and is active in the feminist movement.
Hank is 31 years old, single, outspoken, and very bright. Hank got a technical degree, and as a student was deeply interested in machine learning and the ethics & philosophy of potential artificial intelligence. He posts a problem on HN where he describes the personality, capabilities, interests, and background of a woman named Linda and then asks people to reflect on her occupation, politics, and an intersection between the two. Which is more likely?
a) Hank has written us a human interest problem.
b) Hank has written us a human interest and a probability problem.
I don't think people are simply wrong about the Linda problem. I think they're imprecise about which question they're answering: they more or less think they're being asked about the chances that Linda is a feminist versus the chances that she's a bank teller, and they draw not only on the givens and relevant priors about people but also on their priors about what kind of question they're being asked. It isn't "no real reasoning"; it's just not high resolution enough to be technically correct by the standards of a constructed probability problem.
You can argue LLMs are also not quite high resolution enough and I'd accept that. In my mind the question is what it would take to get some kind of ML software to a place where if you trained it on enough probability problems it would be able to evaluate the Hank problem above, including the issue of whether (a) and (b) are actually independent. ;)
I'd say this is the very definition of "overthinking it" instead of pattern matching.
Pattern matching would lead you straight to the correct answer - joint probability is always ≤ single probability. Meaning the information given in the question is just fluff. Pattern matching reduces to "Which is more probable: just A or else A ^ B?" at which point the correct answer becomes obvious.
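For anyone who wants the rule spelled out, here is a short derivation. The symbols are just labels for the two statements in the question: A stands for "Linda is a bank teller" and B for "Linda is active in the feminist movement". It assumes P(A) > 0 so that the conditional probability is defined.

```latex
P(A \wedge B) = P(A)\, P(B \mid A) \le P(A), \qquad \text{since } 0 \le P(B \mid A) \le 1 .
```

Equality holds only when P(B | A) = 1, that is, when every bank teller matching the description is also active in the feminist movement.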
Eh, the "correct" answer seems silly and overly mathematical to me.
The fact is, you can infer things about people based on things you know about people. I can pick a random user on HN, and knowing they're a user of HN, I can say it's probable that they work in technology. We don't need to bring statistics into it and turn it into a math problem.
Well, it's multiple-choice. So the "a" answer excludes the "b" answer by convention, and thus implies she isn't a feminist. Rephrase the question to remove that implication and I expect far more people will pick the "a" answer.
This particular question doesn't seem like a fallacy to me.
I, like most people, intuitively answered b). Given the explanation on the Wikipedia page I went "oh, of course, yeah", but then I thought about why I'd answer b) given that I'm fairly familiar with basic probability.
If you give me two options and ask me to pick between them, my brain is usually going to assume it's not a trivially true problem.
Language needs context for any sense to be made of it.
As a result, the intuitive reading of the question turns the answer choices into:
a) Linda is a bank teller (implicitly, a bank teller NOT active in the feminist movement)
b) Linda is a bank teller and is active in the feminist movement.
This question is one of language, context, and interpretation, not of people failing to understand basic probability.
I suspect that if you prime people to excise interpretation of the question by presenting it as the following, the majority of people would guess correctly:
Multiple-choice questions imply that other choices are excluded (unless an answer such as "all of the above" is a choice). So the implied question is:
a) Linda is a bank teller (and NOT active in the feminist movement)
b) Linda is a bank teller and is active in the feminist movement
What you want it to be asking here is:
a) Linda is a bank teller, and may or may not be active in the feminist movement
b) Linda is a bank teller, and is active in the feminist movement
Most college students have taken quite a few multiple-choice tests (particularly in the US, where high schools train for standardized multiple-choice tests). The question isn't asking what the mathematicians seem to think it's asking, because its format conveys extra restrictions.
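To make the two readings concrete, here is a rough sketch in Python. The two input probabilities are made up purely for illustration and do not come from any study; the variable names are likewise hypothetical. It only shows that the ranking of a) and b) can flip depending on which reading of a) you adopt.

```python
# Hypothetical inputs, chosen only for illustration (not measured values).
p_teller = 0.05                  # assumed P(Linda is a bank teller)
p_feminist_given_teller = 0.8    # assumed P(active feminist | bank teller)

# Joint probabilities derived from the two assumptions above.
p_teller_and_feminist = p_teller * p_feminist_given_teller
p_teller_not_feminist = p_teller * (1 - p_feminist_given_teller)

# Literal reading: a) "bank teller, may or may not be a feminist"
#                  b) "bank teller and feminist"
literal_a = p_teller
literal_b = p_teller_and_feminist

# Exclusive reading: a) "bank teller and NOT a feminist"
#                    b) "bank teller and feminist"
exclusive_a = p_teller_not_feminist
exclusive_b = p_teller_and_feminist

print("literal reading:   a =", literal_a, " b =", literal_b)
print("exclusive reading: a =", exclusive_a, " b =", exclusive_b)
```

Under the literal reading a) can never lose, whatever numbers you plug in. Under the exclusive reading, b) wins whenever the assumed conditional probability is above one half, so picking b) is not automatically a probability error for someone reading the question that way.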
Can you expand on that a little?