I also felt this way initially, like "that's it?". But overall, the massive reduction in hallucinations and the increase in general accuracy make it almost reliable. Math is correct, it follows all commands far more closely, it can continue when it's cut off by the reply limit, etc.
Then I tried it for writing code. Let's just say I no longer write code; I just fine-tune what it writes for me.