anthonypasq · 2026-06-11T17:15:22 1781198122

are we just re-inventing playwright tests except 10x slower and infinity times more expensive?

i feel like im going insane

hugs · 2026-06-11T17:53:13 1781200393

since the rise of agentic coding tools, it feels like we're in a new "eternal september" of people discovering ui end-to-end test automation.

acdha · 2026-06-11T18:11:37 1781201497

Also the merits of documentation and specs. It’s been eye-opening to see the subset of developers who were almost disdainful about writing documentation for their colleagues but are now tripping over themselves to do so for their clanker.

Daishiman · 2026-06-11T21:24:53 1781213093

Agents read the docs. People don't. That's the underlying reason.

simoncion · 2026-06-11T23:12:47 1781219567

> People don't.

People falling all over themselves to write docs for their pile-of-linear-algebra-with-a-smiley-face-painted-on-it [0] don't read the docs, no. People who give a shit about writing solid software that doesn't get them paged at three in the damn morning do.

[0] The face is there to provide social-trustworthiness signals to engage the human pack-bonding instinct, natch.

Daishiman · 2026-06-12T07:16:11 1781248571

Your sarcasm is unwarranted, because what I said is true and reflects the experience of a lot of people.

A decade ago I left a job and spent the last week thoroughly documenting every flow and code section of an app that I worked with, which was the core value proposition of the company. A couple of years later I asked around and nobody had even taken a look at it.

People just don't read, and there are actually good reasons for that, one of them being that documentation is outdated in most orgs and the effort to keep it up to date is greater than reading the code.

simoncion · 2026-06-12T11:45:49 1781264749

> ...what I said is true and reflects the experience of a lot of people.

Wow. What I said is true and reflects the experience of a lot of people. Amazing!

Daishiman · 2026-06-12T15:14:36 1781277276

That's nice, but if you write docs and people don't read them then that's clearly not a winning strategy in many orgs.

simianwords · 2026-06-11T20:04:19 1781208259

[flagged]

acdha · 2026-06-12T00:12:49 1781223169

That’s a rather stunning comparison: racism is a problem because it’s unfairly treating sentient beings but a pile of linear algebra is not even sentient, much less your peer. That’s part of why I used the term: “agent” isn’t correct because agents have, well, agency and can be held accountable.

https://lucumr.pocoo.org/2026/5/26/clankers/

dragonwriter · 2026-06-11T23:21:41 1781220101

Positing an equivalence between a dismissive term for AI bots and a racial slur against black people is, like, super racist.

inigyou · 2026-06-11T18:05:15 1781201115

People are rediscovering everything. Some people have proposed using a more formal language to tell the AI precisely what code to write. That's a compiler.

righthand · 2026-06-11T20:40:16 1781210416

Well playwright tests used to be called puppeteer tests which used to be called selenium tests, so you tell me.

dragonwriter · 2026-06-11T23:20:11 1781220011

Those are all technology variations of “automated web ui tests”, which is a subset of “automated ui tests”, which is itself almost (but not quite exactly) a subset of “automated user acceptance tests”, none of which are new categories.

anthonypasq · 2026-06-11T17:08:21 1781197701

Mythos is 20x more expensive though

ACCount37 · 2026-06-11T17:37:33 1781199453

Fable 5 is listed at merely x2 of Opus 4.8 on OpenRouter. $10/$50 per 1M I/O, vs $5/$25.

Now, Fable 5 is currently borderline unusable because of asinine filters. But I assume they'll fix this shit eventually.

anthonypasq · 2026-06-11T19:21:08 1781205668

im talking about compared to composer 2.5

anthonypasq · 2026-06-11T17:06:19 1781197579

i mean this is difficult to calculate because of prompt caching, the ratio of input/output tokens etc, but if you just do some napkin math, i find it hard to believe people are getting this many tokens on a $20 plan.

heres some napkin math

gpt-oss-120b is priced at $0.039/$0.18 per million tokens in/out on OpenRouter. heres some assumptions.

1. the ratio of input/output is about 25/1. (coding is mostly grep and fairly low output)

2. you are getting 75% prompt cache reads

Case B: 50% prompt caching discount (standard provider rate), at 75% prompt caching

Total tokens obtained: 658,749,010 (approx. 659 million tokens)

Input: ~633 mil tokens

~475 mil cached at 50% input pricing = ~$9.25

~158 mil uncached = ~$6.15

Output: ~25 mil tokens (~$4.5)

This doesnt even account for profit margins on inference providers, or the fact that openAI probably has a much more efficient inference stack.

its really hard to know what these companies are actually paying, but from everything im hearing, people are reporting API inference pricing is 50% margin.
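
The napkin math above can be reproduced with a short script. This is only a sketch of the arithmetic as stated in the comment: the prices, the 25/1 input/output ratio, the 75% cache-read rate, the 50% cache discount and the $20 budget all come from the comment, while the function and variable names are made up for illustration.

```python
# Figures quoted in the comment (USD per 1M tokens); not verified against any provider.
INPUT_PRICE = 0.039
OUTPUT_PRICE = 0.18
INPUT_OUTPUT_RATIO = 25   # 25 input tokens per output token
CACHE_HIT_RATE = 0.75     # share of input tokens served from the prompt cache
CACHE_DISCOUNT = 0.50     # cached input billed at 50% of the input price
BUDGET_USD = 20.0


def tokens_for_budget(budget_usd=BUDGET_USD):
    """Return how many tokens the budget buys under the assumptions above."""
    input_share = INPUT_OUTPUT_RATIO / (INPUT_OUTPUT_RATIO + 1)
    output_share = 1 - input_share

    cached_price = INPUT_PRICE * (1 - CACHE_DISCOUNT)
    # Average price of one million input tokens after cache reads are mixed in.
    blended_input_price = (
        CACHE_HIT_RATE * cached_price + (1 - CACHE_HIT_RATE) * INPUT_PRICE
    )
    # Average price of one million tokens of mixed input/output traffic.
    cost_per_million = (
        input_share * blended_input_price + output_share * OUTPUT_PRICE
    )

    total = budget_usd / cost_per_million * 1_000_000
    input_tokens = total * input_share
    cached = input_tokens * CACHE_HIT_RATE
    uncached = input_tokens - cached
    output_tokens = total * output_share
    return {
        "total_tokens": total,
        "cached_input_tokens": cached,
        "uncached_input_tokens": uncached,
        "output_tokens": output_tokens,
        "cached_cost": cached / 1e6 * cached_price,
        "uncached_cost": uncached / 1e6 * INPUT_PRICE,
        "output_cost": output_tokens / 1e6 * OUTPUT_PRICE,
    }


if __name__ == "__main__":
    for name, value in tokens_for_budget().items():
        print(f"{name}: {value:,.2f}")
```

Under these inputs the total comes out at roughly 659 million tokens, matching the figure in the comment. The per-bucket costs land a few cents above the rounded values quoted there.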

moralestapia · 2026-06-11T17:34:01 1781199241

I didn't say "use openrouter" as you might end up using subsidized resources, part of the argument is to avoid that and reach the true capital cost of inference per token (or something like that).

I meant, buy/lease the hardware that lets you run this model, run gpt-oss-120b and measure. I did this once and it was like 10x more expensive than any hosted alternative, and $20 wouldn't get you far there.

anthonypasq · 2026-06-11T19:43:30 1781207010

heres the creator of opencode explaining how you are wrong

https://youtu.be/1VqKUrxR2C8?si=uOAs_4XNXtTyTwCP&t=2195

moralestapia · 2026-06-11T20:08:01 1781208481

He's either incompetent or lying.

An H100 today costs $2.95 an hour on vast.ai[1], which is already a good deal.

gpt-oss-120b on an H100 gives you ~200-250 tokens per second. I will be generous and say you can get a million tokens an hour out of it.

OpenCode Go (which I gladly pay for, because of this in part) is $10 a month, that's three hours of H100 use, and the models you have there are more expensive than gpt-oss-120b. Sure, they have "scale" (although that doesn't apply to AI inference, but whatever) and this and that, they're still pricing it 20-30x below their minimum threshold of capital expense.

Apples to apples: they sell you GLM 5.1 at $4.40 per million tokens; at ~50 tps on an H100 (being generous) it costs ~$16 to do a million tokens.

The math is simple and clear, they lose money.

1: https://vast.ai/pricing
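
The arithmetic in this comment can be checked with a few lines. This is a sketch using only the numbers quoted above ($2.95 per H100-hour, ~50 tps for GLM 5.1, a $10 monthly plan). Like the comment, it treats the quoted tokens-per-second figure as the GPU's entire throughput. The function names are hypothetical.

```python
# Rental price quoted in the comment; an assumption, not a verified rate.
H100_USD_PER_HOUR = 2.95


def cost_per_million_tokens(tokens_per_second, usd_per_hour=H100_USD_PER_HOUR):
    """GPU rental cost to generate one million tokens at a given throughput."""
    tokens_per_hour = tokens_per_second * 3600
    hours_needed = 1_000_000 / tokens_per_hour
    return hours_needed * usd_per_hour


def gpu_hours_for(budget_usd, usd_per_hour=H100_USD_PER_HOUR):
    """How many GPU-hours a budget buys at the quoted hourly rate."""
    return budget_usd / usd_per_hour


if __name__ == "__main__":
    # GLM 5.1 at ~50 tps: about $16.39 per million tokens.
    print(f"GLM 5.1 @ 50 tps: ${cost_per_million_tokens(50):.2f} per 1M tokens")
    # A $10 plan covers about 3.4 H100-hours.
    print(f"$10 buys {gpu_hours_for(10):.2f} H100-hours")
```

At 50 tps the cost per million tokens is about 3.7x the quoted $4.40 sale price, which is the gap the comment is pointing at.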

anthonypasq · 2026-06-10T15:54:19 1781106859

but this has to happen during training no?

anthonypasq · 2026-06-10T15:51:55 1781106715

the role of evolution is always a confounding factor as well and all the various analogies to how it maps onto AI research are always not quite satisfactory.

anthonypasq · 2026-06-10T15:50:11 1781106611

Yes it seems most anti-LLM researchers take issue with LLMs on fundamental math/architecture based properties, but seem to miss all the engineering going on around the model to make it useful.

Those mathematical shortcomings very well might mean they arent a path to true AGI, but that honestly seems fairly irrelevant at this point tbh.

anthonypasq · 2026-06-09T17:33:47 1781026427

what incentive does Cognition have for doing this? seems like complete nonsense speculation on your part.

bel8 · 2026-06-09T17:45:35 1781027135

With billions/trillions of dollars floating around, is it hard to imagine benchmarks could be biased?

I think it's safe to assume everything AI related is heavily biased until proven otherwise. Just like in pharma.

camdenreslink · 2026-06-09T18:22:40 1781029360

People game benchmarks for fake internet points to get their favorite web framework to the top of the list. I'm pretty sure they will do it for billions of dollars.

anthonypasq · 2026-06-09T19:04:11 1781031851

you didnt answer my question. Why would cognition be biased towards making anthropic look good?

gloosx · 2026-06-10T07:42:14 1781077334

Because Cognition is a major customer of Anthropic?

anthonypasq · 2026-06-10T15:55:52 1781106952

they are also a major customer of OpenAI and every other model maker. whats your point?

anthonypasq · 2026-06-09T16:53:04 1781023984

because theyve been out for 6 months? i mean what the hell are you people expecting?

anthonypasq · 2026-06-09T16:42:29 1781023349

American AI companies are charging more, that doesnt mean inference isnt getting cheaper. idk why this is so hard for people to understand.

anthonypasq · 2026-06-09T16:41:03 1781023263

maybe its insane to think this, but if all AI providers turned off free plans tomorrow i think they would easily have enough people willing to pay $20 a month for it to sustain all their spending.

everyone is still fighting for market share so they are giving stuff away, but that doesnt mean people wouldnt be willing to pay for it if it wasnt free.

FromTheFirstIn · 2026-06-10T01:01:02 1781053262

This proposition boils down to a belief that there are 3 billion people who are interested in AI for free but aren’t currently paying $20, but who would pay $20 if that was the price. The global median income is around $12k, so this would mean that there’d roughly be a global budget of 0.5% of everyone’s annual income going to chatbots. If you’re off by half, the price doubles for each person. I think you’d make a lot of money betting against the existence of 3 billion ghost customers.

brokenmachine · 2026-06-11T04:49:36 1781153376

I wouldn't.