lostmsu · 2026-04-09T12:09:12

This would be more interesting if the commits listed everyone who voted "yes" as authors.

lostmsu · 2026-04-09T11:49:13

One-time retrieval is 1.5x more expensive than an HDD big enough to hold the data.

lostmsu · 2026-04-08T16:02:17

Transformers too. JEPA any day now.

lostmsu · 2026-04-08T16:00:54

Are you considering batch inference?

lostmsu · 2026-04-08T15:59:29

You are anti-progress. Pro-humanity is not the same as pro-progress.

lostmsu · 2026-04-08T15:12:09

More like 6 months now. Qwen 3.5 is at Sonnet 4.5 level.

lostmsu · 2026-04-08T12:44:39

I think previous models could do hacking just fine.

stratos123 · 2026-04-09T13:57:33

The Mythos system card shows massive improvements over Opus in hacking (e.g. a jump from 0.8% to 72% on "Firefox shell exploitation"). If you thought Opus was already human-professional-level, well.

lostmsu · 2026-04-09T15:17:34

What's the professional human baseline?

lostmsu · 2026-04-08T12:34:10

You are reading the percentages wrong.

Because 100% is the maximum, you should be looking at error rates instead. GPT has a 25% error rate on Terminal Bench and the new model has 18%, almost a 1.4x reduction.
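To make the error-rate point concrete, here is a minimal sketch, assuming the benchmark reports a pass rate in percent and that error rate = 100 - score. The helper name `error_rate_reduction` is hypothetical; the 75/82 scores are just the 25%/18% error rates restated as pass rates.

```python
def error_rate_reduction(old_score: float, new_score: float) -> float:
    """Return how many times smaller the new error rate is than the old one.

    Assumes scores are pass rates in percent (0-100), so error = 100 - score.
    """
    old_error = 100.0 - old_score
    new_error = 100.0 - new_score
    return old_error / new_error


# Pass rates corresponding to 25% and 18% error rates.
old_score, new_score = 75.0, 82.0
print(f"score ratio: {new_score / old_score:.2f}x")  # prints "score ratio: 1.09x"
print(f"error-rate reduction: {error_rate_reduction(old_score, new_score):.2f}x")  # prints "error-rate reduction: 1.39x"
```

Near the ceiling, a modest-looking gain in score (about 1.09x here) is a much larger cut in failures (about 1.39x), which is why the error rate is the more informative number.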

lostmsu · 2026-04-07T02:36:45

Agent Workstation for Codex CLI + Claude Code — with task scheduler, git worktree & remote control, skills management

lostmsu · 2026-04-05T22:07:20

It's unnoticed because it didn't. In Google's own benchmarks they are on par, and I've seen third-party benchmarks where Qwen beats G4 by a wide margin.