Hacker News | lostmsu's comments

This would be more interesting if the commits listed everyone who voted "yes" as authors.

A one-time retrieval is 1.5x more expensive than an HDD that would fit the data.

Transformers too. JEPA any day now

Are you considering batch inference?

You are anti-progress. Pro-humanity is not the same as pro-progress.

More like 6 months now. Qwen 3.5 is at Sonnet 4.5 level.

I think previous models could do hacking just fine.

The Mythos system card shows massive improvements over Opus in hacking (e.g. a 0.8% -> 72% in "Firefox shell exploitation"). If you thought Opus was already human-professional-level, well.

What's the professional human baseline?

You are reading the percentages wrong.

Because 100% is the maximum, you should be looking at error rates instead. GPT has a 25% error rate on Terminal Bench and the new model has 18%, almost a 1.4x reduction.
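To make the arithmetic concrete, here is a minimal sketch of the comparison. The function names are hypothetical, and it assumes the 25% and 18% figures are error rates (100% minus the benchmark score), which is how the comment reads.

```python
def error_from_score(score_pct: float) -> float:
    """Convert a benchmark score (percent correct) into an error rate (percent wrong)."""
    return 100.0 - score_pct


def reduction_factor(old_pct: float, new_pct: float) -> float:
    """Return how many times smaller new_pct is compared to old_pct."""
    if new_pct <= 0:
        raise ValueError("new_pct must be positive")
    return old_pct / new_pct


# Assumed figures from the comment: 25% and 18% error, i.e. 75% and 82% scores.
old_error, new_error = 25.0, 18.0
old_score, new_score = 100.0 - old_error, 100.0 - new_error

# Comparing scores directly makes the gain look small.
print(f"score ratio: {new_score / old_score:.2f}x")  # 1.09x

# Comparing error rates shows the larger relative improvement.
print(f"error reduction: {reduction_factor(old_error, new_error):.2f}x")  # 1.39x
```

The same score change looks like a 9% gain when read as scores and a roughly 1.4x drop when read as errors, which is the point about reading the percentages.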


Agent Workstation for Codex CLI + Claude Code — with task scheduler, git worktree & remote control, skills management

It's unnoticed because it didn't. In Google's own benchmarks they are on par, and I've seen third-party benchmarks where Qwen beats G4 by a wide margin.
