
The most interesting thing to me is that the spelling is correct.

I'm not a heavy user of AI or of image generation in general, so is this also part of the new release, or has this been fixed silently since I last tried?



It very much looks like a consequence of this new architecture. In my experience, text looks much better in recent DALL-E images (i.e., what ChatGPT was using before), but it is still noticeably mangled when printing more than a few letters. This model update seems to improve text rendering a lot, at least as long as the content is clearly specified.

However, when given a prompt that requires the model to come up with the text itself, it still seems to struggle a bit, as can be seen in this hilarious example from the post: https://images.ctfassets.net/kftzwdyauwt9/21nVyfD2KFeriJXUNL...


The periodic table is absolutely hilarious; I didn't know LLMs had finally mastered absurdist humor.


Yeah, who wouldn't love a dip in the sulphur pool? But back to the question: why can't such a model recognize letters as such? Can't it be trained to pay special attention to characters? How come it can render an anatomically correct eye but not differentiate between a P and a Z?


I think the model hasn't decided whether it should print a P or a Z, so you end up with something halfway between the two.

It's a side effect of the entire model being differentiable: in a continuous output space there is always some halfway point.
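
To make that concrete, here is a toy sketch (hypothetical, not the actual model; the one-hot NumPy vectors just stand in for whatever learned glyph representation the model really uses). In a continuous, differentiable space, every interpolation between two letters is itself a valid point, so nothing forces the output to commit to one glyph:

    import numpy as np

    # Hypothetical one-hot stand-ins for the model's internal
    # representations of the glyphs 'P' and 'Z'.
    p = np.array([1.0, 0.0])
    z = np.array([0.0, 1.0])

    # Any convex combination is a perfectly valid point in this
    # continuous space, so optimization can settle on it.
    halfway = 0.5 * p + 0.5 * z
    print(halfway)  # [0.5 0.5]: neither 'P' nor 'Z', but between them

Discrete text has no such in-between point: a character is either a P or a Z. That's why rendered letters can come out looking like a blend of both.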



