I'm watching the livestream now, and the improvement over their current models on the benchmarks is very small. I know they seemed to be trying to temper our expectations leading up to this, but this is far less of an improvement than I was expecting.
I have a suspicion that while the major AI companies have been pretty samey and competing in the same space for a while now, the market is going to force them to differentiate a bit. We're going to see OpenAI begin to lose the race toward extremely high levels of intelligence, choosing instead to focus on justifying their valuation by optimizing for cost and for conversational/normal-intelligence/personal-assistant use cases. After all, most of their users just want to use it to cheat at school, get relationship advice, and write business emails. They also have Ive's company to continue investing in.
Meanwhile, Anthropic & Google have more room in their P/S ratios to continue to spend effort on logarithmic intelligence gains.
Doesn't mean we won't see more and more intelligent models out of OpenAI, especially in the o-series, but at some point you have to make payroll and reality hits.
I'm not sure what "10% performance gain" is supposed to mean here, but moving from "it does a decent job 95% of the time but screws it up 5%" to "it does a decent job 98% of the time and screws it up 2%" to "it does a decent job 99.5% of the time and only screws it up 0.5%" is a series of major qualitative improvements.
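To put rough numbers on that (a quick back-of-the-envelope sketch of my own; the 95/98/99.5% success rates are the ones above, and the 20-step chain is an arbitrary illustration, not from any benchmark):

    # Illustrative only: how per-task failure rates compound over a
    # hypothetical 20-step workflow where every step has to succeed.
    success_rates = [0.95, 0.98, 0.995]
    steps = 20  # arbitrary chain length, purely for illustration

    for p in success_rates:
        failures_per_1000 = (1 - p) * 1000
        chain_ok = p ** steps
        print(f"per-step success {p:.1%}: "
              f"{failures_per_1000:.0f} failures per 1000 tasks, "
              f"{chain_ok:.0%} chance a {steps}-step chain completes cleanly")

Halving the error rate looks small on a leaderboard, but it halves how often you get burned, and the effect compounds the longer the task.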
"+100 points" sounds like a lot until you do the ELO math and see that means 1 out of 3 people still preferred Claud Opus 4's response. Remember 1 out of 2 would place the models dead even.
Also, the code demos all use GPT-5 MAX in Cursor. Most of us will not be able to use it like that all the time. They should have shown it without MAX mode as well.
Then why increment the version number here? This is clearly styled like a "mic drop" release, but without the numbers to back it up. It's a really bad look when you compare the crazy jump from GPT-3 to GPT-4 with this slight improvement in GPT-5.
GPT-5 was highly anticipated and people have thought it would be a step change in performance for a while. I think at some point they had to just do it and rip the bandaid off, so they could move past 5.
It was relative to the number given in the comment I replied to. I would assume GPT-5 is nowhere near 100x the parameters of o3. My point is that if this release isn't notable for parameter count, nor (more importantly) for performance, what is it notable for? I guess it unifies the thinking and non-thinking models, but that's more of a product improvement than a model improvement.
The fact that it unifies the regular model and the reasoning model is a big change. I'm sure it's significant internally, but it's also a big change in terms of user experience.
I feel it’s worthy of a major increment, even if benchmarks aren’t significantly improved.
The hallucination benchmarks did show major improvement. We know existing benchmarks are nearly useless at this point. It's reliability that matters more.
I’m more worried about how they still confidently reason through things incorrectly all the time, which isn’t quite the same as hallucination, but it’s in a similar vein.
I mean, that's just the consequence of releasing a new model every couple of months. If OpenAI had stayed mostly silent since the GPT-4 release (like they did for most earlier iterations) and only now released 5, then nobody would be complaining about weak gains in benchmarks.
If everyone else had stayed silent as well, then I would agree. But as it is right now they are juuust about managing to match the current pace of the other contenders.
Which actually is fine, but they have previously set quite high expectations, so some people will probably be disappointed by this.
If they had stayed silent since GPT-4, nobody would care what OpenAI was releasing, as they would have become completely irrelevant compared to Gemini/Claude.