What would make 2.5 Pro (or anything else) categorically better would be if it could say "I don't know".
There will be things that Claude 3.7 or Gemini Pro will not know, and the interpolations they come up with will not make sense.
You have to rely on your own mental model to verify the answers it gives.
On hallucination: it is a problem, but again, it diminishes as you move to heavier models.
This is what significantly reduces the utility: if it can only be trusted to answer things I already know the answer to, why would I ask it anything?
I have written about it here: https://news.ycombinator.com/item?id=44712300