It really shouldn't be surprising that these models fail to do anything that _th...

jazzyjackson · on July 10, 2024

I agree that these open ended transformers are way more interesting and impressive than a purpose built count-the-polygons model, but if the model doesn't generalize well enough to figure out how to count the polygons, I can't be convinced that they'll perform usefully on a more sophisticated task.

I agree this research is really interesting, but I didn't have an a priori expectation of what token prediction could accomplish, so my reaction to a lot of the claims and counterclaims of this new tech is that it's good at fooling people and giving plausible but baseless results. It makes for good research but dangerous in the hands of a market attempting to exploit it.

empath75 · on July 11, 2024

> I agree that these open ended transformers are way more interesting and impressive than a purpose built count-the-polygons model, but if the model doesn't generalize well enough to figure out how to count the polygons, I can't be convinced that they'll perform usefully on a more sophisticated task.

I think people get really wrapped into the idea that a single model needs to be able to do all the things, and LLMs can do a _lot_, but there doesn't actually need to be a _one model to rule them all_. If VLMs are kind of okay at image intepretation but not great at details, we can supplement them with something that _can_ handle the details.