Hacker News

Benchmarks are favorable enough that they're comparing to non-OpenAI models again. Interesting that tokens/second is similar to 5.4. Maybe there's some genuine innovation beyond "bigger model, better" this time?


It's behind Opus 4.7 on SWE-Bench Pro, if you care about that kind of thing. It seems on-trend, even though benchmarks are less and less meaningful for the stuff we expect from models now.

Will be interesting to try.



