It's surprising because these models are pretty ok at some vision tasks. The existence of a clear failure mode is interesting and informative, not embarrassing.
Not only are they capable of understanding images (the kind people might actually feed into such a system: photographs), but they're pretty good at it.
Modern robots would struggle to fold socks and put them in a drawer, but they're great at making cars.
I mean, with some of the recent demos, robots have got a lot better at folding stuff and putting it away. I'm not saying they're anywhere close to human level, but they've taken a pretty massive leap from being a joke just a few years ago.
They're hardly being advertised or sold on that premise. They advertise and sell themselves, because people try them out, find out they work, and tell their friends and/or audiences. ChatGPT is probably the single biggest bona fide organic marketing success story in recorded history.
This is fantastic news for software engineers. Turns out that all those execs who've decided to incorporate AI into their product strategy have already tried it out and ensured that it will actually work.
> Turns out that all those execs who've decided to incorporate AI into their product strategy have already tried it out and ensured that it will actually work.
The 2-4-6 game (Wason's rule-discovery task) comes to mind. They may well have verified that the AI works, but it's hard to learn the skill of thinking about how to falsify a belief.
MATCH
2, 4, 6
8, 10, 12
12, 14, 16
20, 40, 60
NOT MATCH
10, 8, 6
If the answer is "numbers in ascending order", then this is a perfect illustration of synthetic vs. realistic examples. The numbers indeed fit that rule, so in theory, everything is fine. In practice, you'd be an ass to give such examples on a test, because they strongly hint the rule is more complex. Real data from a real process is almost never misleading in this way[0]. In fact, if you sampled such sequences from a real process, you'd be better off assuming the rule is "2k, 2(k+1), 2(k+2)", and treating the last example as some weird outlier.
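The dynamic can be sketched in a few lines of Python. The hidden rule and the "evens stepping by 2" hypothesis below are assumptions for illustration (the classic task uses "any ascending sequence" as the secret rule, and most people guess something like the 2k, 2(k+1), 2(k+2) pattern):

```python
def hidden_rule(seq):
    """The experimenter's actual rule: any strictly ascending sequence."""
    return all(a < b for a, b in zip(seq, seq[1:]))

def my_hypothesis(seq):
    """The guess the examples invite: even numbers stepping by 2."""
    return all(a % 2 == 0 and b - a == 2 for a, b in zip(seq, seq[1:]))

# Confirmation-style probes: sequences chosen BECAUSE they fit the guess.
# Both functions say MATCH, so these probes can never refute the guess.
for seq in [(2, 4, 6), (8, 10, 12), (20, 22, 24)]:
    assert my_hypothesis(seq) and hidden_rule(seq)

# Falsification-style probe: pick something the guess says should FAIL.
probe = (1, 3, 7)
print(my_hypothesis(probe), hidden_rule(probe))  # False True: guess too narrow
```

The point of the probe is Popper's: only a sequence your hypothesis predicts will *not* match can reveal that the real rule is broader than your guess.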
Might sound like pointless nitpicking, but I think it's something to keep in mind with regard to generative AI models, because the way they're trained biases them towards reality and away from synthetic examples.
--
[0] - It could be if you have very, very bad luck with sampling. Like winning a lottery, except the prize sucks.
That's the one. Though where I heard it, you can set your own rule, not just use the example.
I'd say that every black swan is an example of a real process that is misleading.
But more than that, I mentioned verified/falsified, as in the distinction between the two in science. We got a long way with just the first (Karl Popper only died in 1994), but the distinction does seem to make a difference?
I see this complaint about LLMs all the time - that they're advertised as being infallible but fail the moment you give them a simple logic puzzle or ask for a citation.
And yet... every interface to every LLM has a "ChatGPT can make mistakes. Check important info." style disclaimer.
The hype around this stuff may be deafening, but it's often not entirely the direct fault of the model vendors themselves, who even put out lengthy papers describing their many flaws.
A bit like how Tesla Full Self-Driving is not to be used as self-driving. Or any other small print. Or ads in general. Lying by deliberately giving the wrong impression.
Evidently, all these models still fall short.