The test has many reasoning, coding and instruction-following questions, which I expected o1 to excel at. I don't have an explanation for such poor results on our test; I was just sharing them as a data point so people can make up their own minds. My best guess at this point is that o1 is optimized for a very specific and narrow use case, similar to what you suggest.

