Hacker News | RA_Fisher's comments

I don’t think there are other models near Fable’s capabilities.

That remains to be seen.

It's notable that Anthropic are still using SWEBench as a coding benchmark rather than the newer, more difficult DeepSWE, which shows them well behind GPT 5.5.

https://deepswe.datacurve.ai/

Bear in mind that all the marketing efforts, such as solving an Erdős problem, are the result of concerted RL training to impart those narrow capabilities. How much any benchmark results, or "early access" paid shill vibe reports, reflect improved performance for more general real-world use cases remains to be seen.


For how long though? The past two months have seen a ridiculous number of model releases.

Well, I have just tested it and GPT 5.5 is still smarter. It catches bugs that Fable doesn’t. Anthropic Fable is basically still sloppy like Opus 4.x. I also got the downgrade for “cyber violations” while trying to build a custom Debian ISO… that tells me their safeguards are sh**. I didn’t ask it to hack anything, just to make a script that builds a custom Debian distribution with various settings… so this Fable thing seems like a flop&slop already. That warning plus the privacy change is the wake-up call to move away from Anthropic.

Why don't you think that? What I've read is that other models can find the same bugs.

In what ways? LM Arena has Opus 4.7 w/ 1567 -/+ 7 vs. 1505 -/+ 10 from GPT-5.5 Codex in code. I'm currently using both.

Admittedly, my recent experience tilts toward Opus (now 4.8), but you and others have piqued my interest in GPT-5.5 Codex, so I'm trying that more now.


Arena is not a good benchmark; it is very susceptible to sycophancy.

Claude Code will write the whole thing for you, whereas doesn’t Copilot require input along the way while coding? I.e., it doesn’t do all the programming for you.


It can code the whole thing for you. Copilot in VS Code is simply better; people just never tried it.


If you give Copilot a file with a list of tasks to complete, it will try to churn through them (just like most other harnesses do these days).


Ah okay, can it work on a whole repo in an agentic way?


Yes, of course. It can also spawn subagents, work for an hour without interactivity if that's what you want, etc., just like any other harness.

Actually, due to GitHub's stupid billing system, which charges per "premium request" instead of per token, you could and still can abuse it so it costs nothing. They're changing it to usage-based billing from next month, though.


Ah thank you for updating me there.

Do people bring their own, then (considering work doesn’t pay for it)?


Our corp specifically prohibits that, because of code leak and training concerns.


AI gives us a means of leverage. We can do more with less. production = f(labor, capital, technology) + eps


This always comes up and the only thing I can think is: Doesn't Google make like 10B a quarter in profit from GCP alone? Did we really need a cheaper SQL injection checker?


Anthropic and Claude are running circles around Google / Gemini for me these days. Google's tooling was quite helpful for a while, but strange limit issues started popping up. The final straw was a bug that essentially broke my ability to develop. I moved over to Claude Code full time and haven't looked back. Opus 4.6 is awesome for accelerating probabilistic programming!


They’ll probably receive most if not all of Iran’s focus now.


I limit the risk and insist on payment upfront.


Location: Brighton, MI, USA

Remote: Yes

Willing to relocate: Yes

Technologies: Bayesian statistics, econometrics, causal inference, experimental design, machine learning, AI systems, R (expert), Python (expert), SQL (expert), Stan/brms, PyMC, scikit-learn, tidyverse, PostgreSQL/Redshift, AWS, Airflow, Ansible, Terraform

Résumé/CV: https://statwonk.com/about/


That’s good. Reducing healthcare costs will increase access and boost our health.

Agree that AI should replace CEOs. They’re often biased in unhelpful ways that AI isn’t, and it costs people their wellbeing.

