RA_Fisher · 2026-06-10T11:21:07 1781090467

I don’t think there are other models near Fable’s capabilities.

HarHarVeryFunny · 2026-06-10T13:36:21 1781098581

That remains to be seen.

It's notable that Anthropic are still using SWE-bench as a coding benchmark rather than the newer, more difficult DeepSWE, which shows them well behind GPT 5.5.

https://deepswe.datacurve.ai/

Bear in mind that all the marketing efforts, such as solving an Erdős problem, are the result of concerted RL training to impart those narrow capabilities. How far any benchmark results, or "early access" paid-shill vibe reports, reflect improved performance on more general real-world use cases remains to be seen.

fc417fc802 · 2026-06-10T12:18:43 1781093923

For how long though? The past two months have seen a ridiculous number of model releases.

thefounder · 2026-06-10T17:13:33 1781111613

Well, I have just tested it, and GPT 5.5 is still smarter. It catches bugs that Fable doesn’t. Anthropic’s Fable is basically still sloppy like Opus 4.x. I also got downgraded for “cyber violations” while trying to build a custom Debian ISO… that tells me their safeguards are sh**. I didn’t ask it to hack anything, just to write a script that builds a custom Debian distribution with various settings… so this Fable thing already looks like a flop & slop. That warning plus the privacy change is the wake-up call to move away from Anthropic.

ImPostingOnHN · 2026-06-10T13:03:11 1781096591

Why don't you think that? What I've read is that other models can find the same bugs.

RA_Fisher · 2026-05-30T14:08:29 1780150109

In what ways? LM Arena has Opus 4.7 at 1567 ± 7 vs. 1505 ± 10 for GPT-5.5 Codex in code. I'm currently using both.

Admittedly, my recent experience tilts toward Opus (now 4.8), but you and others have piqued my interest in GPT-5.5 Codex, so I'm trying it more now.
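For a rough sense of what the quoted gap means, here is a minimal Python sketch. It assumes the leaderboard scores sit on a conventional Elo-style logistic scale (base 10, 400-point divisor) and treats the ± values as symmetric interval half-widths; neither assumption comes from the comment, and the function names are illustrative only.

```python
def expected_win_prob(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Elo-style logistic probability that A is preferred over B.

    Assumes an Elo-like scale (base 10, 400-point divisor); this is an
    assumption about the leaderboard, not a documented guarantee.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def intervals_overlap(a: float, a_err: float, b: float, b_err: float) -> bool:
    """True if [a - a_err, a + a_err] and [b - b_err, b + b_err] intersect.

    Treats the quoted +/- values as symmetric half-widths (an assumption).
    """
    return max(a - a_err, b - b_err) <= min(a + a_err, b + b_err)


if __name__ == "__main__":
    opus, opus_err = 1567.0, 7.0      # Opus 4.7, as quoted in the comment
    codex, codex_err = 1505.0, 10.0   # GPT-5.5 Codex, as quoted in the comment
    p = expected_win_prob(opus, codex)
    print(f"Expected Opus preference rate vs Codex: {p:.3f}")
    print("Intervals overlap:", intervals_overlap(opus, opus_err, codex, codex_err))
```

Under those assumptions a 62-point gap works out to roughly a 59% expected head-to-head preference, and the two intervals do not overlap. That says nothing about whether the votes themselves are a good signal.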

spongebobstoes · 2026-05-30T15:43:58 1780155838

Arena is not a good benchmark; it is very susceptible to sycophancy.

RA_Fisher · 2026-05-23T11:58:31 1779537511

Claude Code will write the whole thing for you. Doesn’t Copilot, by contrast, require input as you code? I.e., it doesn’t do all the programming for you.

mirekrusin · 2026-05-23T12:05:01 1779537901

It can code the whole thing for you. Copilot in VS Code is simply better; people just never tried it.

__mharrison__ · 2026-05-23T12:48:19 1779540499

If you give Copilot a file with a list of tasks to complete, it will try to churn through them (just like most other harnesses would do these days).

RA_Fisher · 2026-05-23T13:59:06 1779544746

Ah, okay. Can it work on a whole repo in an agentic way?

mirekrusin · 2026-05-23T19:18:07 1779563887

Yes, of course. It can also spawn subagents, work for an hour without interaction if that's what you want, etc., just like any other harness.

Actually, because of GitHub's stupid billing system, which charges per "premium request" instead of per token, you could (and still can) abuse it so it costs nothing. They're switching to usage-based billing next month, though.
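To illustrate why per-request billing is easy to exploit, here is a minimal Python sketch comparing a flat per-request charge with an included allowance against usage-based per-token billing. All prices, the allowance, and the token counts are hypothetical placeholders, not GitHub's actual rates, and the function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    input_tokens: int
    output_tokens: int


def per_request_cost(requests: list[Request], included: int, price_per_request: float) -> float:
    """Flat fee per request beyond an included allowance; token volume is ignored."""
    overage = max(0, len(requests) - included)
    return overage * price_per_request


def per_token_cost(requests: list[Request], price_in_per_m: float, price_out_per_m: float) -> float:
    """Usage-based billing: cost scales with tokens consumed (prices per million tokens)."""
    total_in = sum(r.input_tokens for r in requests)
    total_out = sum(r.output_tokens for r in requests)
    return (total_in * price_in_per_m + total_out * price_out_per_m) / 1_000_000


if __name__ == "__main__":
    # One long agentic session billed as a single "request" (hypothetical sizes).
    session = [Request(input_tokens=2_000_000, output_tokens=100_000)]
    # Hypothetical pricing: 300 included requests, $0.04 per extra request,
    # $3 / $15 per million input / output tokens.
    print("Per-request billing:", per_request_cost(session, included=300, price_per_request=0.04))
    print("Per-token billing:  ", per_token_cost(session, price_in_per_m=3.0, price_out_per_m=15.0))
```

With these made-up numbers the long session costs nothing under the per-request scheme (it fits inside the allowance) but $7.50 under per-token billing. Once one request can carry an arbitrarily long agentic run, per-request pricing decouples cost from compute, which is presumably what the move to usage-based billing addresses.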

RA_Fisher · 2026-05-31T13:10:29 1780233029

Ah thank you for updating me there.

RA_Fisher · 2026-05-23T11:56:22 1779537382

Do people bring their own then (considering work doesn’t pay for it)?

krzyk · 2026-05-23T12:49:19 1779540559

Our corp specifically prohibits that, because of the risk of code leaks and of our code being used for training.

RA_Fisher · 2026-04-28T17:13:02 1777396382

AI gives us a means of leverage. We can do more with less. production = f(labor, capital, technology) + eps
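The formula production = f(labor, capital, technology) + eps leaves f unspecified. As one illustration, here is a minimal Python sketch that assumes a Cobb-Douglas form, Y = A·K^α·L^(1−α), with A standing in for technology. The functional form, α = 0.3, and the 20% technology gain are all assumptions, and the eps noise term is dropped. It shows how a higher A lets the same output be produced with less labor.

```python
def cobb_douglas(a: float, k: float, l: float, alpha: float = 0.3) -> float:
    # Assumed form: Y = A * K^alpha * L^(1 - alpha); A plays the role of "technology".
    return a * k ** alpha * l ** (1 - alpha)


def labor_needed(y: float, a: float, k: float, alpha: float = 0.3) -> float:
    # Invert the assumed production function for L at fixed output Y and capital K.
    return (y / (a * k ** alpha)) ** (1 / (1 - alpha))


if __name__ == "__main__":
    k = 100.0
    baseline = cobb_douglas(a=1.0, k=k, l=100.0)   # output with the original technology
    l_after = labor_needed(baseline, a=1.2, k=k)   # labor for the same output after A rises 20%
    print(f"Baseline output: {baseline:.1f}")
    print(f"Labor needed after a 20% technology gain: {l_after:.1f}")
```

In this toy setup, holding capital fixed, a 20% rise in A cuts the labor needed for the same output to about 77% of its original level. Other choices of f would give different numbers, but the "more with less" direction is the same whenever output increases in technology.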

krainboltgreene · 2026-04-28T17:42:06 1777398126

This always comes up and the only thing I can think is: Doesn't Google make like 10B a quarter in profit from GCP alone? Did we really need a cheaper SQL injection checker?

RA_Fisher · 2026-04-16T14:41:30 1776350490

Anthropic and Claude are running circles around Google / Gemini for me these days. Gemini was quite helpful for a while, but strange limit issues started popping up. The final straw was a bug that essentially broke my ability to develop. I moved over to Claude Code full time and haven't looked back. Opus 4.6 is awesome for accelerating probabilistic programming!

RA_Fisher · 2026-04-08T11:32:12 1775647932

They’ll probably receive most if not all of Iran’s focus now.

RA_Fisher · 2026-04-04T15:16:40 1775315800

I limit the risk and insist on payment upfront.

RA_Fisher · 2026-04-01T16:43:17 1775061797

Location: Brighton, MI, USA
Remote: Yes
Willing to relocate: Yes

Technologies: Bayesian statistics, econometrics, causal inference, experimental design, machine learning, AI systems, R (expert), Python (expert), SQL (expert), Stan/brms, PyMC, scikit-learn, tidyverse, PostgreSQL/Redshift, AWS, Airflow, Ansible, Terraform

Résumé/CV: https://statwonk.com/about/

RA_Fisher · 2026-04-01T13:30:44 1775050244

That’s good; reducing healthcare costs will increase access and boost our health.

Agree that AI should replace CEOs. They’re often biased in unhelpful ways that AI isn’t, and that costs people their wellbeing.