trjordan · 2026-05-29T12:30:50 1780057850

Totally. Every "we're losing our craft" article has the same gloomy shape. That's enough of a bummer, but they also argue against themselves halfway through.

This one, for instance:

> But exactly which details are deemed “unimportant” is a very consequential and sometimes subjective decision. And eventually, the details always leak through.

Right, so you're saying this new technology will still reward deep technical understanding, because there's no way around it. I agree. Why is the whole tone of this thing "AI is making my craft a cheap commodity?"

Websites are largely better, technically, than they were 10 years ago. They're more full-featured, they're faster, SSL/a11y/responsiveness are stronger defaults. Content mills / SEO / news sites are a separate, terrible failure mode of ads and corporate incentives. That's not React's fault!

knuckleheads · 2026-05-29T12:33:27 1780058007

A craftsman's pride is an industrialist's nightmare! Software has been transitioning from a craft into an industrial process for the last two decades or so, and the software craftsmen of all stripes understandably do not like this!

randlet · 2026-05-30T00:21:06 1780100466

> Software has been transitioning from a craft into an industrial process for the last two decades or so

This seems like a good insight and it feels true to me as well.

My guess is the absolute number of people who treat it like a "craft" is higher than 20 years ago, but as a fraction of all developers it has shrunk dramatically.

knuckleheads · 2026-05-30T12:10:04 1780143004

I've been meaning to write down my thoughts about software explicitly not being a craft for many years now and life keeps getting in the way. It's a direct response to the Etsy engineering blog, "Code As Craft". I agree that there are more code craftsmen in general than before, but by percentage there's way more software engineers. Engineering best practices to me are in many ways about robbing coding and software from the mystique of craftsmanship and turning it into a repeatable industrial process that isn't inhumane per se but doesn't depend on any particular person to make it work.

acedTrex · 2026-05-29T14:10:10 1780063810

Ya it's definitely been an ongoing process. LLMs have just accelerated it.

knuckleheads · 2026-05-29T14:27:58 1780064878

I am not joking when I say that software craftsmen lost the war when tabs vs spaces was obviated as a point of contention by CI enforced formatting and linting around broader community standards.

duskdozer · 2026-05-30T10:59:15 1780138755

>Websites are largely better, technically, than they were 10 years ago. They're more full-featured, they're faster, SSL/a11y/responsiveness are stronger defaults.

This is the opposite of my experience. I find websites take much more time to load, are designed to require many more actions and interaction time to navigate, often break and are replaced by a blank page if any error occurs, use huge numbers of ad/tracking requests and JS, and are filled with accessibility-standard-violating unnecessary JS animations.

bigstrat2003 · 2026-05-29T19:35:02 1780083302

> Websites are largely better, technically, than they were 10 years ago.

That is not remotely the case. All software, not just websites, is a lot worse than it was 10 years ago. Bloated, slow, buggy messes that resulted from the industry hiring a bunch of people who just wanted to do the bare minimum and make fat stacks, rather than hiring people who actually care about good engineering.

SpicyLemonZest · 2026-05-29T15:32:29 1780068749

It's just not what I wanted. I got into software because I liked coding, deep technical understanding only excited me because it would help me code better. I don't want to get too "woe is me", there are far worse things in the world than having a vaguely unsatisfying job, but there are life choices I would have made differently had I known coding would be automated in 2026.

HDThoreaun · 2026-05-29T17:21:20 1780075280

You can still code all you like, you're just not going to get paid for it.

SpicyLemonZest · 2026-05-29T17:28:32 1780075712

Sure, but I've got other hobbies which better satisfy my itch for making things. Doesn't really solve my problem.

mbgerring · 2026-05-30T00:47:32 1780102052

> Websites are largely better, technically, than they were 10 years ago.

Holy shit, no, they are not. What world do you live in?

TurdF3rguson · 2026-05-30T05:25:17 1780118717

A lot of them weren't even up 10 years ago. It's not hard to be better than that.

trjordan · 2026-05-28T17:23:02 1779988982

It feels like we're far past the point of where having AI do more faster is helpful.

It's telling that they used "rewrite Bun in Rust" as the proof point here. It's cool! But the vast majority of software engineering doesn't start with tens of thousands of tests, where making them pass is the whole job.

In my experience, AI still drifts from what I meant it to do on anything bigger than building a widget. My time is spent suspiciously reviewing output for changes the agent snuck in, or invariants it broke. I talked with a friend recently where the agent broke the test harness badly enough that none of the tests mattered for 3 weeks. They did pass, though, so CI never complained.

There's something at the intersection of context engineering, managing that sloppy pile of markdown plans, and good old-fashioned system understanding that's the real bottleneck.

kian · 2026-05-28T21:48:03 1780004883

"In my experience, AI still drifts from what I meant it to do on anything bigger than building a widget."

I've had code bases with tens of thousands of lines of code built from scratch that I hand-reviewed every line of and worked with the AI to improve, and haven't had this issue. I feel like a significant part of this is due to an involved /plan stage -- going back and forth on building out a plan for what you want the AI to do involves surfacing the assumptions that you would have called drift if you asked them to implement it directly from your prompt.

Once the plan has been refined and is what I want it to be, getting it to implement everything in TDD style has for the most part given me 100% working code, as I wanted it to be, without issues. It definitely helps that I'm a principal-level engineer with extensive architectural experience -- but if you're able to tell the AI in detail what you want, have it ask clarifying questions, read through a plan before getting it implemented, and have a solid testing plus manual QA process (automated by Chrome DevTools MCP) in place, I've found that you can one-shot complex features, rewrites, and even not-insignificant applications that would have taken days to write by hand in a few hours.

eithed · 2026-05-29T17:52:15 1780077135

Depends - using Sonnet here and generally it should be as you say: plan would produce the result.

Still, Claude will sneak things in. In a recent plan, for example, I had defined in the acceptance criteria what colours the statuses should be: green for live, blue for sold, grey for anything else. It changed this to green for live, orange for in progress, blue for sold, red for in demolition, etc. When pressed on why it did this, it was unable to explain. This was with a plan where the AC were explicitly provided from the task in Given/When/Then format and were to be adhered to strictly. I caught this during planning, but I shouldn't need to be doing this.

Even with standard prompts where I tell it "Change this label from X to Y", it ended up reordering the tabs, which was unrelated to the ask. Again, I could not get it to explain why; it was so abrupt. And this was in a fresh context, without anything polluting what I expected it to do.

I also noticed different behaviour regarding skills: today and yesterday it would not follow skill guidance at all. For example, with the skill-writing skill, I had to explicitly tell it to test skills after writing them, when this is behaviour expected by default. Similarly with other skills: it should have done something per the skill guidelines and did not do it at all. This is new behaviour that I had not seen a week ago.

jeremyjh · 2026-05-28T22:23:46 1780007026

There are certainly domains where AI is not so effective, but I would agree that, at least in terms of web development, if you can't get effective results from agents at this point it is a skill issue. That skill can be learned, if you recognize that learning is part of the solution. I do think prior experience in product design, specifications & business analysis as well as engineering leadership are all extremely helpful. It's about putting the agent in a box so small that it really can't screw up; but it's also about being able to review design and code rigorously - to see around corners and anticipate possible weaknesses etc. There is really nothing I have to do when working with an agent that I haven't already been doing for decades but it seems to me that a lot of developers have never found a single bug while reviewing someone else's code.

trjordan · 2026-05-27T17:27:21 1779902841

They've got, ballpark, $5t to $10t to make back in the next 5 years, or the hardware buildouts will start getting written down.

This means we're going to need $1t+ per year in spending on tokens. There are 200m knowledge workers in the world and 30m developers. We're talking about a world where 5% of every knowledge worker's salary needs to go into tokens, or 20% if you're a developer.

That's a _huge_ shift. Most people I know cite +20%-40% velocity with these tools, against the actual work their company cares about doing. +20% speed for +20% spend isn't going to motivate a trillion dollars a year in spending.

We're not there yet. This is still the upswing of the hype cycle, and unless we figure out how to make developers 2x, 5x, 10x as productive on stuff that matters, this isn't going to play out well.
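
The per-head figures above can be sanity-checked with a few lines of arithmetic. This is only a sketch: the inputs are the ballpark numbers from this comment, and the function names are made up for illustration.

```python
# Back-of-envelope check of the figures quoted above.
# All inputs are the comment's ballpark numbers, not measured data.
ANNUAL_TOKEN_SPEND = 1e12   # "$1t+ per year" on tokens
KNOWLEDGE_WORKERS = 200e6   # "200m knowledge workers"
SALARY_SHARE = 0.05         # "5% of every knowledge worker's salary"


def spend_per_head(total_spend: float, headcount: float) -> float:
    """Annual token spend each person would need to account for."""
    return total_spend / headcount


def implied_salary(per_head_spend: float, share: float) -> float:
    """Salary at which per_head_spend equals the given share of pay."""
    return per_head_spend / share


per_worker = spend_per_head(ANNUAL_TOKEN_SPEND, KNOWLEDGE_WORKERS)
avg_salary = implied_salary(per_worker, SALARY_SHARE)

print(f"token spend per knowledge worker: ${per_worker:,.0f}/year")
print(f"average salary implied by a 5% share: ${avg_salary:,.0f}/year")
```

So $1t spread over 200m people is $5,000 per head per year, which equals 5% of pay only if the average salary is about $100,000.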

whatshisface · 2026-05-27T18:45:20 1779907520

Here are a few thoughts:

- The publicly available information about how inference costs compare to training costs is conflicted. EEs involved in datacenters talk about power usage spikes during training runs as if they were a major factor in the designs, but academic papers discussing cost-optimal scaling confidently treat inference-time compute as a major factor.

- On the side of the balance indicating that training is more compute-intensive after amortization than inference is that Chinese providers, constrained primarily by access to compute, have nearly unlimited token availability at a lower price than US providers (inference), but poorer model capabilities (training). That would make sense only if US providers are inflating inference costs by 20-30x due to amortized training costs that overseas providers were not able to take on (there are other factors too).

- If training >> inference, they're in a prisoner's dilemma that far exceeds the ordinary zero-marginals model of competition between firms (due to its huge discrete stepwise nature). On the other hand, if inference>>training, the high-level analysis popularized by certain thought leaders, that it's like a utility, would be true. You'd tend to count this as a vote for inference>>training, but the CEOs saying it at least have a huge incentive to agree because the alternative, the prisoner's dilemma, would stop investment very fast.

- The only voice in the story I just told you that had anything to do with fact (as opposed to high-level analysis and ivory tower armchair management of a secretive business) was the rumors from facilities engineers. That shows you the state of our understanding...

- If we don't even know the ratio between amortized capital expenses and operational costs, outside investor analysis is impossible. It doesn't matter how finely they divide the accounting buckets for office ferns and indoor ferns if the single biggest part of their business is obscured for trade secret reasons.

materielle · 2026-05-27T18:54:27 1779908067

I'm about to leave a shallow comment, but I am a bit skeptical of the supposed drop in inference costs. If AI labs saw a lot of potential there, they'd surely be bragging about it non-stop? So the fact that publicly available information is conflicted is probably a sign that at the very least, the numbers aren't amazing.

Yes I know there's no evidence and this is lazy reasoning. But there's probably a bit of truth to this line of thought.

Tuna-Fish · 2026-05-27T19:06:35 1779908795

Why on earth would AI labs be bragging about how little the product they sell actually costs them to make? You don't want to do anything that reduces its perceived value to the user, that might make them less willing to pay for it.

Also, inference costs are bound to go way down with more optimized architectures. GPUs are fundamentally not great at inference. No platform where the weights are streamed from a large pool of memory is. If the models ever quiet down, there will be massive step changes in cost/token, energy/token and tokens/second, as models are etched into silicon ala https://chatjimmy.ai/

overgard · 2026-05-27T21:33:58 1779917638

A couple of years ago Altman was saying the price of AI compute is going to drop 90% year over year or something like that, so I don't think they're nervous about talking about lowering their costs. They probably just haven't been able to lower their costs.

You have to keep in mind that about 99% of their announcements are targeted towards investors (their most important revenue source..), so they're not going to be afraid to mention metrics that make the business look better.

bwhiting2356 · 2026-05-28T05:26:51 1779946011

Jevons paradox. Cheaper tokens does not mean we will spend less.

Skinney · 2026-05-28T08:07:45 1779955665

Cheaper tokens means the company's margins increase, which would be valuable for investors to hear

missedthecue · 2026-05-28T15:14:34 1779981274

The main limit to my token spend right now is that I'm running out of hours in a day.

mcmcmc · 2026-05-27T22:28:37 1779920917

Ah yes, Sam “Not Consistently Candid” Altman

pixelready · 2026-05-28T02:48:25 1779936505

Oh, is that the guy that sold Loopt by claiming it had hundreds of thousands of users and it turned out to have 500 DAU after his exit?

chipsrafferty · 2026-05-28T04:06:02 1779941162

Yep, the very same scammer. Wonder if he's lying about OpenAI too? Maybe about a person blowing a metal instrument?

whateveracct · 2026-05-28T04:53:15 1779943995

he lied. he's good at that.

golem14 · 2026-05-27T19:37:22 1779910642

Why would any company brag about their margins ? Yet they do, to attract investors.

Tuna-Fish · 2026-05-27T19:43:07 1779910987

The key AI labs are not public companies, they are at liberty to brag about their margins to potential investors in private.

SiempreViernes · 2026-05-27T20:24:28 1779913468

And investors will leak such claims quickly enough that this reasoning cannot plausibly hide big secrets.

Tuna-Fish · 2026-05-27T21:29:47 1779917387

It's not a big secret. If you just do the math yourself, it's easy to compute that inference doesn't cost all that much. People just see all the capital investment going around and all the new data centers being built, see that it's spent on "AI", put two and two together and get a three, or "clearly serving AI requests costs an arm and a leg".

The 1 they were missing is that AI requires both training and inference, and training is by far the expensive part. And that in principle you can stop training at any point and keep using the models as they are. (But that means that if other companies keep improving their models, you'll be left behind...)

In contrast, inference is fairly cheap and all the providers have great margins on it. Eventually either investment in training stops having commensurate impact on model quality, and people stop doing that and instead concentrate on making inference faster and even more efficient. Or if that doesn't happen, things will get very weird very quickly.

whatever1 · 2026-05-28T00:39:40 1779928780

The market already shows where it will go.

If you want frontier model you will pay more for inference to essentially fund the expensive training.

If you don’t need frontier model you will get dirt cheap inference, which eventually will approach the cost of electricity spent per token.

mattmanser · 2026-05-28T13:24:03 1779974643

This is technically correct, but practically false.

They can't stop training as then the AI's knowledge will become out-of-date very quickly. Their knowledge stops the day you stop training.

flextheruler · 2026-05-28T16:32:45 1779985965

Yes it seems that this discussion that has sparked such controversy involves an already well defined concept in business.

Net margin versus gross margin.

Net shows profitability after extracting all expenses while gross only extracts the cost of the goods sold. Putting the model training costs into a one time fixed expense provides a much better gross margin.

This is known as COGS reclassification or classification shifting and is a common tactic to mislead investors.

This is why analysts look at Free Cash Flow Margin.

WorldCom and MicroStrategy did this before the Dotcom Bubble imploded.
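
A toy illustration of the gross vs. net distinction, with completely hypothetical numbers (nobody's real financials), and with "net" simplified to mean serving plus training costs only:

```python
# Hypothetical figures, chosen only to illustrate the accounting point.
revenue = 100.0        # inference revenue for the period
serving_cost = 30.0    # compute spent serving requests (treated as COGS)
training_cost = 90.0   # model training spend in the same period


def margin(revenue: float, costs: float) -> float:
    """Profit as a fraction of revenue."""
    return (revenue - costs) / revenue


# Training booked as a separate fixed expense: only serving counts as COGS.
gross_margin = margin(revenue, serving_cost)

# Training counted too (other expenses are ignored to keep the sketch short).
net_margin = margin(revenue, serving_cost + training_cost)

print(f"gross margin: {gross_margin:.0%}")
print(f"net margin:   {net_margin:.0%}")
```

The same business looks very healthy or deeply unprofitable depending on which bucket the training spend lands in.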

ethin · 2026-05-27T22:53:36 1779922416

> If you just do the math yourself, it's easy to compute that inference doesn't cost all that much.

Show us your work, then. If it's so easy to do, this should be a trivial request to accommodate, no?

mediaman · 2026-05-27T23:32:39 1779924759

Just look at large open weights models being served by inference providers.

Kimi 2.6 is a 1 trillion total / 32B active parameter model that's something comparable to Sonnet. Sonnet's API pricing is $5 in, $15 out per million tokens. Deepinfra serves Kimi at $0.75 in, $3.50 out, and about the same at openrouter. So you're looking at a 4-7x multiple that Anthropic is charging compared to market rates that any plebe can get with a credit card.
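
For anyone who wants to check the multiple, here is the division spelled out. The prices are just the ones quoted in this comment, and the helper name is made up.

```python
# Per-million-token prices in USD, as quoted in the comment above.
sonnet = {"input": 5.00, "output": 15.00}
kimi_deepinfra = {"input": 0.75, "output": 3.50}


def price_multiple(expensive: dict, cheap: dict) -> dict:
    """Ratio of the two price lists, keyed by token direction."""
    return {kind: expensive[kind] / cheap[kind] for kind in expensive}


multiples = price_multiple(sonnet, kimi_deepinfra)
for kind, ratio in multiples.items():
    print(f"{kind}: {ratio:.1f}x")
```

That works out to roughly 6.7x on input and 4.3x on output, which is where the 4-7x range comes from.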

majormajor · 2026-05-28T01:02:59 1779930179

I'm not sure just how good that looks for Anthropic/OpenAI.

4-7x isn't a tiny markup, but how does that compare to high-margin internet businesses like AdSense? Meta and Google do hundreds of billions in ad revenue a year, and after taking out the publisher's portion (60-80% per some searching), I wonder what the ratio of the remaining tens-of-billions is against the compute cost and headcount required to run it.

And how much room for maintaining or improving that margin do they have if the cheap competitors also continue getting better? Is there a "good enough" point where the easier inference tasks are all moving to vendors massively undercutting them, and then they don't have the volume necessary to justify spending on further cutting-edge development?

re-thc · 2026-05-28T10:26:52 1779964012

> Kimi 2.6 is a 1 trillion total / 32B active parameter model that's something comparable to Sonnet.

No it's not. On some rigged paper maybe. Some such benchmarks say all models group together, which they clearly do not.

> Sonnet's API pricing is $5 in, $15 out per million tokens. Deepinfra serves Kimi at $0.75 in, $3.50 out, and about the same at openrouter. So you're looking at a 4-7x multiple that Anthropic is charging compared to market rates that any plebe can get with a credit card.

That's not saying much. You can get "cloud" at AWS and you can get a VPS. There is likely a 10x difference. It's not "same". Whilst AWS costs more they also don't have 7x margins similarly.

brookst · 2026-05-28T19:06:59 1779995219

I’m wary of “has not been leaked in a way that was picked up in public news” as proof or disproof of anything.

bwhiting2356 · 2026-05-27T19:57:02 1779911822

this is changing soon

joelthelion · 2026-05-27T20:48:11 1779914891

Not really, how much of a public company are you when 5% of your capital is public ?

Tuna-Fish · 2026-05-27T21:22:48 1779916968

That doesn't matter for the legal requirements.

The short and only kind of wrong version is:

In the US, companies are not allowed to unfairly privilege some investors over others by giving them access to secret information that would let them judge the future prospects of the company. (Except in all the ways they can, but these usually involve some kinds of insider trading rules.) Private companies can handle giving out secrets to investors by literally writing a memo and mailing it to all their investors, if they want to give out some secrets to one of them.

Public companies cannot do that, even if they knew who all their investors were, but must instead consider every member of the public a potential investor, even if they don't already own the stock. Because of this, when public companies want to reveal material information about their future prospects, they must reveal it to everyone.

tverbeure · 2026-05-27T21:12:01 1779916321

The percentage is irrelevant for this discussion. As soon as you’re public, you need to report detailed financial numbers.

overgard · 2026-05-27T21:35:20 1779917720

Plus, you have to do real GAAP accounting, not their made up metrics.

kfse · 2026-05-28T14:31:33 1779978693

Besides the legal requirement, the reason these companies go public is often to provide liquidity for early investors or employees. So they do want to have as good a margin story as they can, at least in terms of unit margin.

fakedang · 2026-05-27T21:52:46 1779918766

That's changing with this administration though. Reduced reporting cycles reduce transparency.

mrosett · 2026-05-28T05:26:58 1779946018

It won't impact the disclosure of key business details because it doesn't reduce the level of disclosure needed in the S-1 or the 10-K.

jimnotgym · 2026-05-28T05:43:56 1779947036

This is an interesting anomaly in the US. In the civilised world all corporations have to file public accounts, as the price for their limited liability. The detail and audit requirements depend on the size, turnover, staff numbers etc. This is because the shareholders are not the only stakeholders. The company's creditors, for instance, who are exposed to the limited liability, have a right to see what they are lending to.

To answer the sibling comment, all of these public accounts follow local GAAP or IFRS.

The US still astounds me with its willingness to allow corporations to rip people off!

kortilla · 2026-05-28T06:23:21 1779949401

Creditors in the US can make visibility into financials a requirement for financing if they want. Protecting creditors isn’t a good argument for public reporting.

jimnotgym · 2026-05-28T17:42:32 1779990152

What about potential employees, can they look? The local community that consents to let the company build and operate in their town? How does that help, if they don't have to follow GAAP anyway?

kortilla · 2026-05-29T03:21:55 1780024915

Why are those things relevant to either employees or a town?

Most of the US is at-will so the financial health of the company is unlikely to be the reason you’ll suddenly lose a job.

Same for a town, if you’re structuring a deal that has counterparty risk then you mitigate the risk. If an employer is just leasing some office space in your town, why in the world would you ever even think you had the need to look at their financials?

VBprogrammer · 2026-05-28T07:28:05 1779953285

What are the arguments against public reporting?

As a consumer you are often sending deposits or even the full cost of goods to companies some time before you receive those goods (in effect you become a creditor). You are also dependent upon some of those companies for service and repairs. It seems reasonable that you can check the finances of a company you are creating a business relationship with, I know in the past I've checked company statements.

You are unlikely to have significant enough sway to force that kind of disclosure. Small businesses as consumers have less legal protection and are similarly unlikely to be able to make disclosure a precondition of a deal.

nradov · 2026-05-28T16:26:42 1779985602

So what. As a customer you can insist on seeing audited financial statements as a condition of purchasing, or purchase from another vendor, or do without. No problem.

VBprogrammer · 2026-05-28T21:05:48 1780002348

Or, in the real world, running a limited liability company could come with some sensible reporting requirements?

nradov · 2026-05-28T21:25:04 1780003504

Why? And what's sensible about it?

daemin · 2026-05-28T01:45:19 1779932719

Isn't there a limit on the public markets where if a company has less than a certain percentage of its ownership traded publicly then it is no longer a public company and therefore de-listed?

I remember hearing about a guy trying to squeeze out short sellers of his own company but ended up effectively taking his company private because he bought out like 95% of all the shares.

I wonder how that aligns to these small releases of stock for the public.

extraextra · 2026-05-29T14:01:03 1780063263

There is no legal minimum free float requirement before deregistration in the US; however, different exchanges have different rules.

Essentially, a stock has to stay above $1 per share, have a minimum market cap of $15m, a minimum of 400 shareholders, and "adequate" liquidity. If it meets those 4 criteria, it's essentially not at risk of deregistration.

lmm · 2026-05-28T01:00:02 1779930002

Growing companies don't brag about their margins, they brag about their growth and revenue. Margin talk is for when you're a mature company squeezing out every bit of profitability you can - if anything it would be a negative sign to be worrying about your margins when you're supposed to still be growing and innovating.

amarant · 2026-05-28T06:27:16 1779949636

I mean, did anyone expect them to not have margins? Why keep it secret?

Yoric · 2026-05-28T06:55:12 1779951312

> Why on earth would AI labs be bragging about how little the product they sell actually costs them to make? You don't want to do anything that reduces it's perceived value to the user, that might make them less willing to pay for it.

Wouldn't they be bragging about it to investors? It feels like something that would matter a lot to them, and at least OpenAI kinda feels desperate to find them.

There's also the small question about whether a drop in inference cost would actually change anything about profitability, when training seems to get exponentially more expensive.

neltnerb · 2026-05-27T21:45:08 1779918308

Because companies that want to go public need to look profitable or potentially profitable. And before they go public they have to release real, actual, legally demonstrable numbers for their costs and revenue anyway.

extraextra · 2026-05-29T14:08:58 1780063738

When they actually file to go public, their numbers will be intensely scrutinized. That's all that global headlines will be talking about for weeks on end. Why would they create forward expectations before it's necessary?

Of course they don't want to create forward expectations in a volatile macro environment, with the public listing being 6 months out.

etempleton · 2026-05-28T00:37:02 1779928622

Because the most important thing for any pure play AI company right now is to prove they are a viable company. And sure they have proved they can make billions, but also that they can lose billions more. They are going to need even more money and to prove to the next round of investors at an even higher valuation that they are a viable business they need to show not that they can generate revenue, but that they can one day turn a healthy profit. And that is the trillion dollar question.

jimbokun · 2026-05-27T21:28:54 1779917334

I doubt having to replace every single chip in your data center every time you release a new model will bring down costs.

kopirgan · 2026-05-28T03:25:57 1779938757

Went to that URL asked one question - "how is this different from other AI" and it took 598/6144 tokens, not sure what that means.

philipswood · 2026-05-28T03:55:45 1779940545

Not super clear from the site itself, but this LLM is running on specialized silicon implementing just it. So has super low energy use and blazing speed.

See https://taalas.com/products/

Edit: updated link

kopirgan · 2026-05-28T04:02:32 1779940952

Incredible increase over Nvidia! Need to read more.. Thanks!

DrewADesign · 2026-05-28T00:01:21 1779926481

Because they can think more than one quarter into the future? Why on earth would someone adopt something into their core workflow that was fantastically unprofitable? Uncertainty and business don’t mix. Most people aren’t hype-eating bacteria that only care about maximizing their next paycheck.

wheresmylogin · 2026-05-28T12:38:39 1779971919

One reason is that all the code you write with this goes in your private git. If using AI no longer is possible because of cost, you can still profit a lot from what you did with it before.

DrewADesign · 2026-05-28T16:21:15 1779985275

For consultants? Sure. What percentage of contractors are consultants? And is that better than going with something in your stack that’s sustainable even if it’s not totally optimal? I’d wager most would say no.

nradov · 2026-05-28T16:31:35 1779985895

Regardless of profitability there will always be multiple good LLM vendors as well as open-source alternatives (slightly worse but still pretty good). If one vendor fails then it's easy to switch your core workflow to a competitor.

DrewADesign · 2026-05-29T00:58:08 1780016288

On an individual basis for coding? Sure. If you’re a significant business with agents that do more nuanced work, which is the only kind of customer that will let any of these companies pay back those trillions of dollars as quickly as they need to to stay alive, these are not fungible services.

m463 · 2026-05-28T21:11:23 1780002683

I wonder if inference costs will go down...

or will it be like microsoft office, where the software bloats to use/fill current hardware?

(and in this case bloats might mean better thinking or pulling in more data)

kopirgan · 2026-05-28T03:20:32 1779938432

If inference costs drop 90% or whatever, that would be a massive write-off of hardware even before they gave any returns for it?! Given Chinese and others are snapping at the heels and would also benefit from such reduction in cost.

solarkraft · 2026-05-28T04:11:24 1779941484

> Why on earth would AI labs be bragging about how little the product they sell actually costs them to make?

Investor confidence. They have a bit of a need for cash (also an interesting part of the profitability discussion of course).

> Also, inference costs are bound to go way down with more optimized architectures

I agree. Jimmy is incredible, I wonder what non-toy use cases they have. Surely they’ll come out with updated chips soon.

That said, I was apparently a bit over-excited for Groq and Cerebras. I thought they’d quickly dethrone Nvidia for inference, but not so far. Even the GPT spark trial isn’t seeming to go far.

whatshisface · 2026-05-27T18:58:14 1779908294

Inference has traditionally been far less expensive than training. One public example is the fact that hobbyists can run StableDiffusion ($600k training costs[1]) on their personal computers.

Speaking to your point, inference being dramatically less costly than training would not be seen as a delta from the norm. The model of providing inference for anything near the operational costs (like a utility would) would be the delta from the norm if it were true.

[1] https://x.com/emostaque/status/1563870674111832066

thesz · 2026-05-27T20:31:09 1779913869

The difference between training and inference is that 1) one has to keep intermediate results for the backward pass in training and 2) computation for training doubles because of the backward pass.

Training is also done over batches, which increase memory requirements by several orders of magnitude. This is why training needs costly compute.

One of the ways out of this unfortunate situation is to use something like Stochastic Average Gradient Descent [1]. The examples there are mostly concerned with regularized logistic regression, which makes the problem more or less convex. Neural networks are inherently non-convex. Still, maybe some ideas from there can be utilized in the context of neural networks, like using an estimated Lipschitz constant to derive curvature and an appropriate learning step.

  [1] https://www.cs.ubc.ca/~schmidtm/Courses/540-W19/L12.pdf

janalsncm · 2026-05-27T20:55:43 1779915343

So one way to think about it is roughly,

Training is inference + backwards pass (~2x inference cost) + activations (vram overhead) + optimizer (vram overhead) + gradients (vram overhead).

thesz · 2026-05-27T21:29:23 1779917363

Multiply "inference + backwards pass (~2x inference cost) + activations (vram overhead)" by batch size (thousands) to get to the actual RAM and compute cost. An optimizer like Adam adds only two or three model-sized copies of overhead.

And last, but not least, you need only one hidden layer kept in RAM for inference, but you need all of them (61 for Deepseek models) kept in RAM for computing gradient for one sample.
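
As a rough sketch of the accounting above: the helper below tallies the memory terms named in this subthread (weights, gradients, optimizer state, per-layer activations times batch size). `estimate_memory` is a hypothetical helper and every number fed to it is a made-up placeholder, not a measurement of any real model.

```python
def estimate_memory(weight_bytes, act_bytes_per_layer, layers, batch_size,
                    optimizer_copies=2):
    """Tally memory for inference vs. training under the thread's simple model.

    Assumptions (illustrative only):
    - inference keeps the weights plus one layer's activations for one sample
    - training keeps weights, one gradient copy, `optimizer_copies` model-sized
      optimizer buffers, and every layer's activations for the whole batch
    """
    inference = weight_bytes + act_bytes_per_layer
    gradients = weight_bytes
    optimizer = optimizer_copies * weight_bytes
    activations = layers * act_bytes_per_layer * batch_size
    training = weight_bytes + gradients + optimizer + activations
    return {"inference": inference, "training": training}


# Placeholder numbers: 2000 bytes of weights, 10 bytes of activations per layer,
# 61 layers (the layer count mentioned above), batch of 4.
estimate = estimate_memory(weight_bytes=2000, act_bytes_per_layer=10,
                           layers=61, batch_size=4)
print(estimate)
```

The activation term is the only one that scales with batch size in this model, which is the term the replies below argue about.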

xyhopguy · 2026-05-27T23:30:12 1779924612

Microbatch size is a hyperparameter, it can be set to 1 and work just as effectively. With gradient accumulation it's equivalent even. Large batch sizes are used to increase parallelism, and sometimes to reduce variance in the loss signal (at the cost of increased bias).

Batch size is frequently limited by compute bottlenecks well before memory.
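
A minimal sketch of the gradient accumulation point, using a hypothetical one-parameter linear model with mean squared error (the data and function names are invented for illustration). It accumulates micro-batch gradients, each scaled by the full batch size, and compares the result with the full-batch gradient.

```python
def grad_full_batch(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over the whole batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def grad_accumulated(w, xs, ys, micro_batch_size=1):
    """Same gradient, built up one micro-batch at a time."""
    n = len(xs)
    total = 0.0
    for start in range(0, n, micro_batch_size):
        micro_xs = xs[start:start + micro_batch_size]
        micro_ys = ys[start:start + micro_batch_size]
        # Scale by the full batch size so the accumulated sum is the batch mean.
        total += sum(2 * (w * x - y) * x
                     for x, y in zip(micro_xs, micro_ys)) / n
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
full = grad_full_batch(0.5, xs, ys)
micro = grad_accumulated(0.5, xs, ys, micro_batch_size=1)
print(full, micro)
```

Only one micro-batch is in flight at a time, so the peak activation memory follows the micro-batch size rather than the effective batch size.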

mcv · 2026-05-28T08:18:57 1779956337

And of course you do all of this for every object in your training set, which is going to be larger than the total number of uses for any individual user.

galaxyLogic · 2026-05-27T23:14:11 1779923651

Does it matter what is the difference in size of needed inputs for inference vs. training?

mike_hearn · 2026-05-28T11:38:34 1779968314

It's all got much more complex than that in recent years. Training now involves large amounts of inference for RL rollouts and similar. You can't disentangle them computationally like that. "Inference" is just the word used to mean serving customer traffic now, and "training" means creating the model you serve.

whatshisface · 2026-05-27T23:04:08 1779923048

That is an estimate of the relative cost of one training step, but you have to multiply it by the number of training steps, an unknown quantity.

vanviegen · 2026-05-28T11:22:42 1779967362

I think in your StableDiffusion example, a lot more than $600k will have been spent on electricity alone for inference (on those personal computers you mention). So inference is more expensive than training.

lumost · 2026-05-27T21:41:24 1779918084

For equal capability tokens, there has been about a 10x drop in cost every 6 months.

We are still chasing the best because the best is moving rapidly, but it’s a simple thought experiment to work out what the cost to serve an 8B model from 2 years ago is in a world of 2T models.

Note: parameter counts are illustrative. Concretely, qwen3.6 27B delivers opus 4.5 capability at 1/27th the cost on openrouter. Single chip llama3 8b performance can exceed 17k tokens/sec.
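
The thought experiment can be sketched as simple exponential decay, taking the "10x every 6 months" figure above at face value (it is a claim from this comment, not a verified rate). `projected_cost` and the starting price are hypothetical.

```python
def projected_cost(initial_cost, months_elapsed, drop_factor=10.0, period_months=6.0):
    """Cost of equal-capability tokens after `months_elapsed`,
    assuming a constant `drop_factor` reduction every `period_months`."""
    return initial_cost / drop_factor ** (months_elapsed / period_months)


# Hypothetical starting point: $100 per million tokens.
for months in (0, 6, 12, 24):
    print(months, projected_cost(100.0, months))
```

Under that assumed rate, two years is four periods, so the same capability would cost 1/10,000th of its original price.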

byzantinegene · 2026-05-28T06:21:23 1779949283

8B models would be considered obsolete in the world of 2T models, at least if we're talking about the competitiveness of OpenAI/Anthropic. The only reason why they are valued so highly is their supposed dominance at the top end.

lumost · 2026-05-28T13:52:20 1779976340

The main story of agent use cases is in enterprise so far. An enterprise will only pay for a model capable of handling the task and no more. Most enterprises see no need to hire PhDs as factory line workers.

Coding is an interesting case as [1] the pace of progress has been absurd and [2] it's hard to put an upper bound on required capability. However, "hard to put a bound on" and "will" are different things: it's quite possible that the average engineer will cease to see the benefit of rapid progress - or that their employer will be satisfied with lower tier models.

How smart of a model do you need to build a high quality CRUD app for internal users? Or build a scalable web service?

byzantinegene · 2026-05-29T02:49:50 1780022990

yes, which is why the revenue growth story is not looking so great for Anthropic/OpenAI, when open-source alternatives are not far behind with much lower costs.

joshuahedlund · 2026-05-28T11:47:30 1779968850

> For equal capability tokens, there has been about a 10x drop in cost every 6 months

Is this still happening? Opus 4.5 was six months ago, can you get its capabilities for 1/10 cost now? Are we on track to get the same for 4.6 in a couple months?

lumost · 2026-05-28T13:47:19 1779976039

Pretty much, Kimi K2.6 is opus 4.6 quality for coding. If you include discounts due to more efficient input caching it is around 1/10th of opus4.6.

https://openrouter.ai/moonshotai/kimi-k2.6

The march of cost efficiency moves on.

joshuahedlund · 2026-05-29T11:16:50 1780053410

Why haven’t I heard of this? Is it available in IDEs like Cursor?

no-name-here · 2026-05-28T03:15:29 1779938129

> I am a bit skeptical of the supposed drop in inference costs. If AI labs saw a lot of potential there, they'd surely be bragging about it non-stop?

Unless to the grandparent commenter’s point they’re using it to obscure their large prisoner’s dilemma (training) cost?

neuronexmachina · 2026-05-28T01:04:19 1779930259

> If AI labs saw a lot of potential there, they'd surely be bragging about it non-stop?

Google seems to pretty regularly post about how their TPU and algorithm advancements have been decreasing energy costs for both inference and training.

brookst · 2026-05-28T19:05:23 1779995123

What other companies brag about lowered costs? Isn’t that just a complicated way of asking customers to demand lower prices?

vlovich123 · 2026-05-27T20:10:19 1779912619

Small alternative potential future changes that alter this analysis:

* At some point model capability reaches diminishing returns. Then inference >> training in the future but training >> inference now. It’s not a prisoner’s dilemma but a land grab to solidify market position and be one of the 2-3 firms left standing as dominant in the space. The model companies aren’t super sticky yet but they’re working on it.

* even if training remains >> inference, it’s possible to have multiple price points like they do today. If you need the most capable model you’ll be paying exponentially more per token to supplement the training cost even though the serving cost is marginal because most people will be satisfied with cheaper / less capable models for most tasks.

I buy that inference is a dropping line item while training is a growing one. There’s all sorts of things on the horizon that’ll be orders-of-magnitude improvements, from startups burning models into ASICs to get orders of magnitude more performance to alternate architectures like diffusion transformers that have orders-of-magnitude structural optimizations. It’s inevitable that it’ll come down even further from where we are. It’s possible model training also will go down but I’ve not seen any compelling research suggesting major “easy” reductions here.

janalsncm · 2026-05-27T21:01:18 1779915678

The issue is that most tasks do not require frontier-level intelligence, but companies like OAI can really only profit off of the frontier. Capabilities from a year or two ago are so outdated that even OpenAI gives it away for free and there are many other models biting at their heels. In other words they are spending huge amounts of money to cash in on a depreciating asset.

So one possible future is that frontier-level training becomes so expensive and the use cases so sparse that it simply isn’t viable to keep going bigger.

extraextra · 2026-05-29T14:36:55 1780065415

Once the land grab is over, the market will consolidate and the winners will absorb the losers. Then the few winners will be the only ones with real capital to train frontier models and will have true pricing power. Similar to how social media companies or the gig-economy benefits from network effects, AI companies will benefit from having the lion's share of paying customers (that also constantly feed in more data to train the models on).

twobitshifter · 2026-05-28T01:47:47 1779932867

We have GPU costs, power costs, and how many tokens/s models can generate on those GPUs. It’s possible to figure out the marginal cost based on this. The current estimate is about $0.40 per million tokens for a GPT-4-equivalent model. Sonnet 4 is $15 per million tokens, so they are charging high margins on inference. The issue is how large a margin is needed to recover their costs before the GPUs age out, and how high a margin can be charged before it’s not economically viable.

https://www.gpunex.com/blog/ai-inference-economics-2026/
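
For concreteness, here is the margin arithmetic using the two figures quoted above ($0.40 estimated cost and $15 price per million tokens). Both numbers come from this comment and are not independently verified; `gross_margin` is a hypothetical helper.

```python
def gross_margin(price, cost):
    """Fraction of the price left over after the marginal serving cost."""
    return (price - cost) / price


margin = gross_margin(price=15.00, cost=0.40)
markup = 15.00 / 0.40
print(f"gross margin: {margin:.1%}, price is {markup:.1f}x the estimated cost")
```

This only covers marginal serving cost; it says nothing about training spend or hardware depreciation, which is the open question in the comment.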

rudedogg · 2026-05-28T02:01:36 1779933696

That seems way off to me.

I skimmed the article, but couldn’t spot any details on their estimates. They mention 70b+ params as being large in several places. But we’ve had several 100b+ param models that trail Sonnet.

zozbot234 · 2026-05-28T09:52:38 1779961958

Why would power spikes from training runs imply training>>inference? The cost of a training run scales with energy, whereas power is energy per unit time. All that tells you is that they're speeding up their training run so it will take less time overall (probably chasing some first-mover advantage, where they're out with a given model before their closest competitors), whereas they obviously can't do that for inference (which is a steady flow of requests over time).

somewhereoutth · 2026-05-27T23:10:10 1779923410

Yes the huge discrete stepwise training spend is critical.

Maybe investors will realise that "the only winning move is not to play".

And so we are left with (as was) frontier models getting more and more out of date as their post-bankruptcy custodians, whoever they are, try to eke out pennies on the dollar for inference on their decaying property. Perhaps along with local and/or highly specialized models still feeding on the after-glow of the huge amount of training that was (and is no longer) done.

The next AI winter is going to be deep, savage, and long.

extraextra · 2026-05-29T14:59:36 1780066776

Bankruptcies? The winners will gobble up the losers and the few remaining players will have pricing power. Don't be naive thinking that OpenAI or Anthropic can possibly go bankrupt. There will always be someone happy to buy them up for a nice price. Yes, the market will have to go through a consolidation phase though.

galaxyLogic · 2026-05-27T23:18:17 1779923897

> frontier models getting more and more out of date

Why are they getting out of date? Is it because we have new content from the internet that the older models did not have? Or are we simply trying to increase the size of the training data? In other words, is it about being more up-to-date in terms of when the content was created, or about wanting to use bigger training input sets?

somewhereoutth · 2026-05-28T13:43:22 1779975802

Lack of new content from the internet will make them go out of date. Not just facts and figures, but (for example) new programming languages/techniques.

galaxyLogic · 2026-05-29T01:20:34 1780017634

I see, makes sense. Then it kinda says that the quality of the model is the topicality of its inputs, I assume.

IX-103 · 2026-05-28T02:57:09 1779937029

I don't see how it would be possible for inference costs to dominate training costs, even after amortization.

Training involves multiple passes over the entire training dataset, ideally in large batches where you can perform inference on as many samples as possible simultaneously and then perform backpropagation to adjust the model weights (which is about as expensive as inference).

Let's consider the size of the dataset we're dealing with here. The dataset likely consists of practically every piece of digitized text they can get their hands on (including that extracted from audio and video). We know Google has digitized a large portion of the books in existence as part of their "search book contents" feature and we have no reason to believe they're not using it alongside their cache of 90+% of the internet to train their models. We're talking about 100s of millions of books each with an average of 100,000s of tokens. The internet has 10s to 100s of billions of pages on it with who knows how many tokens on average. This is a huge dataset that we've got to go through hundreds of times.

Second, let's consider the effect of batching and how it sets requirements for our hardware. We know that larger batch sizes converge faster, are more stable, and produce better models. So if you want a good model you need large batch sizes. This means that you need machines several orders of magnitude more powerful than you use for inference. From what I heard Google uses clusters of 100s of their TPUs all located in a single rack for training. These clusters are organized in a customized computing architecture to maximize memory locality between cores (really critical for efficient back-propagation). Further, you can't use reduced precision weights for training like you can for inference, so there are no shortcuts.

Finally, the initial training stage is followed by reinforcement learning stages - this is a key development in how AI models have improved in the past year. This may mean going through a curated set of traces (either synthetic or captured from users) and adjusting the weights based on the experienced outcome.

Overall there's so many orders of magnitude more work and more hardware requirements for training that I find it improbable that inference dominates. The number of "inference" steps in training is freaking ridiculous and includes such factors as the "number of words ever written".

atq2119 · 2026-05-28T04:46:04 1779943564

It's been a while since I saw a detailed paper on a high end training run, but extrapolating from what I remember, it seems those training runs are in the 10s of trillions of tokens. This already accounts for potentially sampling tokens multiple times during the training run.

That seems like a large number, until you realize that OpenAI claims to have almost a billion weekly users. And OpenRouter shows many models at over a trillion tokens per week.

So in pure token terms, I'd say it is in fact extremely plausible that inference dominates, at least for the popular models.

johnecheck · 2026-05-28T05:01:41 1779944501

Not saying you're wrong, but I'll note why inference might dominate despite everything you mentioned.

A given model is trained once but applied N times. A large enough N will dominate training, no matter how complex and costly it was.
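
A back-of-the-envelope sketch of "a large enough N". It assumes the commonly cited rule of thumb of roughly 6 FLOPs per parameter per training token and 2 FLOPs per parameter per generated token; that rule is an outside assumption, not something stated in this thread, and it ignores RL rollouts, batching efficiency, and hardware differences. The 20 trillion training tokens stand in for the "10s of trillions" mentioned above, and 1 trillion tokens per week is the OpenRouter-scale figure mentioned above.

```python
TRAIN_FLOPS_PER_PARAM_TOKEN = 6  # assumed rule of thumb: forward + backward
INFER_FLOPS_PER_PARAM_TOKEN = 2  # assumed rule of thumb: forward only


def break_even_inference_tokens(training_tokens):
    """Inference tokens whose compute equals one training run.

    The parameter count cancels out, so only the token counts matter.
    """
    return training_tokens * TRAIN_FLOPS_PER_PARAM_TOKEN / INFER_FLOPS_PER_PARAM_TOKEN


def weeks_to_break_even(training_tokens, inference_tokens_per_week):
    return break_even_inference_tokens(training_tokens) / inference_tokens_per_week


training_tokens = 20 * 10**12      # placeholder for "10s of trillions"
weekly_inference_tokens = 10**12   # placeholder for "over a trillion per week"
print(weeks_to_break_even(training_tokens, weekly_inference_tokens))
```

With these placeholders the crossover lands at about 60 weeks of serving, so whether inference dominates depends heavily on how long a model stays in service.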

But how long is a model useful for? How often will labs need to train new models? Time will tell.

upbeat_general · 2026-05-28T05:39:32 1779946772

This statement is well known to be incorrect for at least a year.

extraextra · 2026-05-29T15:09:08 1780067348

Great points.

- At the end of the day those are still private companies (albeit huge ones), so we can only speculate about the state of their private financial situation. Once they decide it's the right time to IPO, they'll publish all their financials and we'll start to have a clearer picture.

- Later, each company will slightly specialize and have a different go-to-market strategy, which will allow us to understand on a deeper level what works in the market and what doesn't (think about how Facebook, Instagram and TikTok are all huge universal social media platforms, but each with a different target audience and different user base).

- Finally, the market will go through a consolidation phase in which winners will gobble up the losers and then the incumbents will have a real moat (against newcomers) and real pricing power on their user base.

stevenally · 2026-05-28T00:19:04 1779927544

> If we don't even know the ratio between amortized capital expenses and operational costs, outside investor analysis is impossible.

And yet we surely need this data for the IPO? Or are they relying on rule changes on the indexes to force ETFs to buy shares?

extraextra · 2026-05-29T14:56:10 1780066570

The IPOs are months away, potentially 6 months or more. We're in a volatile macro environment. AI companies have all the incentives to not create higher expectations regarding their financial situation a long time before the IPO. Obviously at IPO they will have to disclose their full financial situation.

The market is super hyped anyway for their IPOs. If they raise investors' expectations now and things change before the IPO, investors will be disappointed. It's a lose-lose proposition.

The smart play for any company is to keep their cards close to their chest until close to the IPO time.

FuriouslyAdrift · 2026-05-27T19:06:15 1779908775

I work for a tiny little company ($150MM annual rev with 9% net) and we are already looking at dropping $100k on hardware to run local models because, for us, they're "good enough."

Our estimated spend for AIaaS would exceed that cost in less than a year.
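
The payback reasoning can be sketched as below. The $100k hardware figure is from this comment; the monthly service spend and running cost are hypothetical placeholders (the comment only says the service spend would exceed the hardware cost in under a year), and `payback_months` is an invented helper.

```python
def payback_months(hardware_cost, monthly_service_spend, monthly_running_cost=0.0):
    """Months until buying hardware is cheaper than paying for a hosted service."""
    monthly_savings = monthly_service_spend - monthly_running_cost
    if monthly_savings <= 0:
        raise ValueError("local hardware never pays for itself at these rates")
    return hardware_cost / monthly_savings


# Placeholder: $10k/month hosted spend, first with no running costs,
# then with $2k/month for power and upkeep.
print(payback_months(100_000, 10_000))
print(payback_months(100_000, 10_000, monthly_running_cost=2_000))
```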

In a few years, there will be hardware capable of running frontier models good enough for most things at accessible prices for even tiny companies.

simplyluke · 2026-05-27T19:29:17 1779910157

Yeah, that's the part that just seems to be wildly under-discussed to me.

If open source models are ~3-6 months behind SOTA, and ~opus4.6 capabilities are good-enough for product market fit, do the frontier labs have half a decade to catch up on their prior burn?

AI cost ballooning faster than companies can afford is becoming a very common topic in my circles right now. The era of "I'll pay infinitely more for marginal gains" is over from what I can tell.

an0malous · 2026-05-28T01:38:12 1779932292

> If open source models are ~3-6 months behind SOTA, and ~opus4.6 capabilities are good-enough for product market fit, do the frontier labs have half a decade to catch up on their prior burn?

They know they do not and that’s why they’re all trying to IPO right now, so they can pass the bag to consumer investors

WinstonSmith84 · 2026-05-28T08:35:55 1779957355

More correlation, if more correlation was needed:

1- SpaceX + Tesla + xAI merger / IPO while Musk was vocal against IPO for about a decade

2- Warren Buffett cash at record highs

Someone's got to be exit liquidity

londons_explore · 2026-05-28T18:22:57 1779992577

The printing press was good enough for product market fit back in the 1700's. But now it isn't.

Last year's AI models will be the same. Do you want to spend 3 hours prompting free AI to fix your code or 1 hour prompting AI you paid $20 for?

an0malous · 2026-05-28T19:17:46 1779995866

That's only if these AI companies can keep improving their model performance faster than open source options can keep up. I don't think performance will keep scaling with more training data, and even if it does they've likely already used the entire history of content created by humans for training. Everything points towards diminishing returns in an increasingly crowded space of competitors, there's no other reason for these companies to be rushing to an IPO if they felt secure in their market positioning.

doug_durham · 2026-05-27T19:47:55 1779911275

Open source models that you can run locally are much more than 3 to 6 months behind. 6 months was the November inflection for Claude. No open source model is as good as Claude Opus 4.6.

jobs_throwaway · 2026-05-27T20:08:24 1779912504

It depends what you mean by locally. I don't foresee running a model on my laptop anytime soon to power a coding agent. Far more likely is an infra team at my company operating an open source model on cloud infrastructure. When they're already paying $1000 / month / dev, it starts to pencil pretty quickly.

wrsh07 · 2026-05-27T23:18:55 1779923935

Is there any open model as good as opus 4.6 at any price?

overfeed · 2026-05-28T00:36:35 1779928595

How many problems require Opus-4.6-level performance? The "I'll accept nothing but the very best model for any task" thinking is perplexing to me.

People got a lot done before Opus 4.6. In 6 months, would you be dissatisfied by Opus-4.6-level open-weight models, just because Opus 4.8 will be out?

strix_varius · 2026-05-28T00:48:37 1779929317

Not OP but I've been thinking about this a lot (like everyone ha) and I think my answer is, yes?

I hope there's a "good enough" point but I don't think we're there yet. Like for me hardware got good enough several years ago. But while opus 4.7 is really good compared to everything else, it's not so good that I would use it at a discount over whatever is available in a few months. The improvement in quality, speed, and daily frustration is worth it to me... Spoken as someone whose employer is footing the bill, so take that with a grain of salt.

I want to run my own local models, but I don't think that's feasible without lots of frustration until a few generations of frontier models are so good that they're almost indistinguishable for common tasks. Kind of like how MacBook pros have been for a while.

majormajor · 2026-05-28T01:12:41 1779930761

While I can imagine that I'd want to use Opus 4.8 over 4.6 for a fair number of things (at least if they can avoid further speed regressions), I also have noticed that certain types of failures seem to be systemic. Bigger context has been helpful for bootstrapping, but still doesn't fix problems of getting stuck on the wrong things - you can toss more things in the blender, but you don't necessarily know which way it'll slice them up in advance, or which things from them it'll latch onto. And output still seems to get into "blindered" states where important details get dropped - even though it'll agree very quickly when you point that out. As long as we're in that sort of "spit something out in local targeted manner, and then do a revision loop until tests are green" style of execution, bigger models haven't shown me the ability to really avoid finding non-optimal / subtly-broken outputs for complex problems.

Using Cursor to hop between models, I've found Opus to be generally better at really tricky debugging than GPT 5.5 or earlier models, but not reliably better at execution because of these things. I'm not sure Composer 2.5 is quite there yet for the execution side, but it's getting pretty close to those other ones, such that I'm definitely still in a "debug and plan with slow, execute with faster ones" operating model for working on hard shit.

Npovview · 2026-05-28T15:44:23 1779983063

Why should I need to talk to Opus 4.7 when my day-to-day task is about programming in Java and Python? I don't need my model to know about biology or chemistry. If I need those capabilities (for someone who is working as a software engineer in the chemical industry), I will talk to Opus 4.7 for planning and then fan out work to cheaper coding models. I think we will soon start to see specialized, highly effective English-language-only programming models. I don't need my coding model to know about literature, art, philosophy, ethics, etc.

strix_varius · 2026-05-29T00:50:29 1780015829

If there were a coding model as good as opus that didn't know multiple languages, biology, etc, I would happily use it. But I'm not aware of one - are you?

It actually seems somewhat difficult to train such a model since "all the text on the Internet" is easier to provide in bulk than a highly curated set.

caspar · 2026-05-29T05:39:18 1780033158

Well language detection isn't all that hard in the scheme of things (especially now), but maybe having only training on English makes models less effective programmers. It would be interesting to see that as an experiment.

kisper · 2026-05-28T18:31:24 1779993084

I would think that the surrounding chemical "knowledge" could be useful in the context of programming in that industry. Have you ever found it to draw links and conclusions between what you're doing in computer science and the chemistry it's in the middle of?

Npovview · 2026-05-29T09:19:54 1780046394

I would use Opus 4.7 for the planning stage where chemical knowledge is required then delegate to smaller English-Programming-Only-Opus to do the actual coding.

wrsh07 · 2026-05-28T02:32:09 1779935529

I'm very happy to have multiple sessions open (and do) and switch between fast and slow models, and if there were a batch mode in codex or Claude code I would use it. (Just like I sometimes use codex fast mode)

But at the moment, I can't imagine why I wouldn't be spending the majority of my time with the best models. I'm spending a lot of time with them! Reducing the number of back-and-forths is extremely valuable to me.

I expect in two months I will still want to spend >80% of my time prompting the best models, and that's true if I were spending my own money on hobby projects, too.

wrsh07 · 2026-05-28T13:02:28 1779973348

Something that's underappreciated right now is that when designing systems and proposing solutions, my colleagues and I do a lot of brainstorming with llms. The core architectures have come out of that, but the best pieces of that architecture are still coming from humans.

These are ideas that simplify the design, reduce future work and tie together the entire system. If in two months I can arrive at ideas of that quality with normal brainstorming with llms that will be extremely valuable

Paradigma11 · 2026-05-29T11:37:32 1780054652

As long as the improvement is vastly more valuable in my time than the added cost I will always use the best model. I think it depends on your situation and tasks what makes sense.

JohnBooty · 2026-05-28T03:12:15 1779937935

    would you be dissatisfied by Opus-4.6-level open-weight 
    models, just because Opus 4.8 will be out?

Well, I see what you mean, but two big concepts...

1A. Models get stale pretty quickly w.r.t. new developments that occur past their cutoff date. "But you can just keep them current by linking them to newer documentation, etc!" Well, no, you sorta can't -- at least not in perpetuity. Those search results fill up your context window real quick. So that gets unsustainable real quick.

1B. Even when your context has plenty of free space, the results you get from "here's a link to the documentation for this new framework that released after your cutoff date" absolutely pales to the results you get from knowledge that is fully baked into the trained model as opposed to your context window. For one thing, that documentation link you pasted into your context might link to... a dozen code examples. Whereas if that was baked into the model itself, the model might have been trained on many thousands of examples in Github etc.

2. It's also a reality that most professional engineers have to keep up with their peers and competitors. We can maybe say it shouldn't be that way, but it is. So if $SOME_NEW_MODEL is significantly better than 4.6... and my peers and/or competitors are using it, then yeah I might be really feeling the need to match them. And I'm not even necessarily talking about some kind of cutthroat dog-eat-dog stack-ranked workplace.

These limitations aren't relevant for all use cases or careers but they're hiiiiiiiighly relevant for professional software engineering.

monocasa · 2026-05-28T06:12:33 1779948753

I imagine that'd be handled via a fairly regular minor bit of additional fine tuning to update them with new information rather than polluting the context space.

nazgul17 · 2026-05-28T23:43:13 1780011793

It seems that the cutoff date for all models is stuck at some point before AI generated content started being pervasive.

anhner · 2026-05-28T06:26:31 1779949591

that's the nice thing about open weights, you can always retrain them with the latest documentation, no need to fill your context

FuriouslyAdrift · 2026-05-28T13:46:59 1779976019

Kimi 2.6 probably. Needs over 300GB of GPU memory to run (1TB for full capabilities) so either a 4x A100 or 8x A6000 would do it.

A $50k - 100k rig could do it and an entire company would be able to use it at full speed.
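
A quick check of the sizing arithmetic, assuming the 80GB variant of the A100 and the 48GB RTX A6000 (the comment does not say which variants are meant). It counts weights only and ignores KV cache and runtime overhead, so real deployments need headroom beyond this; `gpus_needed` is a hypothetical helper.

```python
import math


def gpus_needed(model_gb, gpu_gb):
    """Minimum number of GPUs whose combined memory holds `model_gb`."""
    return math.ceil(model_gb / gpu_gb)


for name, gpu_gb in (("A100 80GB", 80), ("RTX A6000 48GB", 48)):
    count = gpus_needed(300, gpu_gb)
    print(f"{name}: {count} cards, {count * gpu_gb} GB total")
```

On paper 7 A6000s would cover 300GB, so the 8-card figure above leaves some room for cache and overhead.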

chillfox · 2026-05-28T02:13:09 1779934389

No, but the big open models are on the level of Sonnet 4.6, which is very good for most problems.

The people who are claiming Opus-level capability do not have sufficiently complex problems to see the difference.

irthomasthomas · 2026-05-28T14:43:24 1779979404

And neither side brings any evidence ...

raxxorraxor · 2026-05-28T08:46:45 1779958005

For coding I don't think so, but they are very close. I code with sonnet mostly because I think opus is just useful if you fail to dissect problems adequately, but anyway.

Kimi is close for example regarding SWE bench for code. For reasoning there are open models that surpass opus by quite a margin already.

simplyluke · 2026-05-27T19:51:23 1779911483

> that you can run locally

That's doing a lot of work here.

The future I see isn't most companies buying hundreds of thousands in hardware to run models, it's them adding a line item to their AWS bill. Inference costs on the larger hosted open source models are dramatically lower than the frontier labs API pricing.

teiferer · 2026-05-27T21:21:38 1779916898

The future I'm seeing is AI coprocessors running inference locally in most devices that today have a CPU. Just look at how powerful your mobile phone has become compared to your desktop computer 15 years ago and compared to a mainframe 30 years ago.

The days of requiring a data center to run anything resembling opus 4.6 are already numbered. (But the industry will fight hard to get people to keep paying the Claude tax.)

simplyluke · 2026-05-27T21:46:51 1779918411

I'm already running a google TPU over USB on an otherwise very cheap board to do local computer vision on a front-door camera since I wanted to get away from Ring and other cloud services for that use case.

And yeah, that may be the ~decade world, but we're in the mainframe era of the frontier models. It's going to be more economical for basically any consumer, and most businesses, to pay someone else to host models for quite a while.

lelandbatey · 2026-05-27T23:07:34 1779923254

A gaming PC can already host models that perfectly serve casual users who just want recipes, todo tracking, picture identification, etc. E.g. Qwen 3.6 35b which will run on a $650 GPU at 75 t/s (Nvidia 1660 ti 16GB).

Said model will also run as a tool-calling coding model excellently (it's no Opus, but for a thing that once set up is just the cost of energy, it's incredible). It can type faster than you can, probably 10x faster, so with guidance it'll make you faster. And it's free.

It's here. If folks want ChatGPT without a subscription, they can have it today on their computer. The only money to be made is in the high end models doing "serious business" work spanning 1M+ token contexts and massive uncertainty. Everything else is already set to be eaten by today's local models.

simonw · 2026-05-27T23:28:01 1779924481

The problem with models like Qwen 3.6 35B (which really is an excellent model) is that my expectations of what a model can do have gone SO high now.

Here's a prompt I just ran against Claude Opus 4.7:

> Use python3 to experiment with whether the SQLite3 authorizer mechanism can be used to detect an INSERT OR REPLACE based just on running an explain query without examining the SQL string itself

Opus nailed it: https://claude.ai/share/c4212606-3fee-4b7c-bc97-505e0348ccac

I tried the same thing against qwen/qwen3.5-35b-a3b running locally in lmstudio, with the Pi coding agent. At first it looked like it was going to do great! And then it fell apart over the course of several tool calls: https://gisthost.github.io/?8ae2f842df619fb7fd8f1ccd82fe41c7

I'm used to GPT-5.5 and Opus 4.7 handling that kind of prompt without any problems at all.

lelandbatey · 2026-05-28T04:41:32 1779943292

Something is definitely going wrong with your Qwen setup: in the link you posted, it starts and ends with a compaction step due to a 4k token context limit. Qwen 35b supports, I think, a 200k+ context (though I run with only 128k), so that limit seems to be a major source of the problem.

simonw · 2026-05-28T04:56:03 1779944163

Good call, I need to check if LM Studio is misconfigured.

scribble0242 · 2026-05-28T02:39:01 1779935941

This worked for me with qwen3.6-36b-a3b even at a q4 quant. I ran pi in a docker container and it had to figure out how to install python as well. I used the same initial prompt you had, without anything additional. You talked about Qwen 3.6, but then said you tried Qwen 3.5 in lmstudio. Not sure if you meant Qwen 3.6. I ran with llama.cpp llama-server with the recommended settings from unsloth.

I'm not an expert in SQLite so I can't say if this is 100% correct, but it seemed directionally similar to the conclusion from Claude.

  ### TL;DR
  
  - Authorizer + EXPLAIN:  No — authorizer only sees SQLITE_INSERT, not VDBE opcodes
  - EXPLAIN opcode analysis alone:  Yes — Delete opcode at position 10 is the unique signature of INSERT OR REPLACE / REPLACE

I can't help but think the not-so-distant future will see language models expected on commodity personal computing devices.
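
The TL;DR above can be checked with a few lines of Python. This is a minimal sketch, not the code from either transcript: the table `t` with a UNIQUE column is an assumed schema, and the helper name `probe` is made up for illustration. It records which action codes the authorizer sees while an EXPLAIN statement is prepared, and which VDBE opcodes the EXPLAIN output lists.

```python
import sqlite3

def probe(sql):
    """Return (authorizer action codes, opcode names) for EXPLAIN <sql>."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: the UNIQUE column gives REPLACE a conflict to resolve.
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    actions = []

    def authorizer(action, arg1, arg2, db_name, source):
        # Record every action code and allow everything.
        actions.append(action)
        return sqlite3.SQLITE_OK

    conn.set_authorizer(authorizer)
    rows = conn.execute("EXPLAIN " + sql).fetchall()
    conn.close()
    # Each EXPLAIN row is (addr, opcode, p1, p2, p3, p4, p5, comment).
    return actions, [row[1] for row in rows]

plain_actions, plain_ops = probe("INSERT INTO t (name) VALUES ('a')")
repl_actions, repl_ops = probe("INSERT OR REPLACE INTO t (name) VALUES ('a')")

print("authorizer saw SQLITE_INSERT:",
      sqlite3.SQLITE_INSERT in plain_actions,
      sqlite3.SQLITE_INSERT in repl_actions)
print("Delete opcode present:", "Delete" in plain_ops, "Delete" in repl_ops)
```

With this schema the expectation, matching the TL;DR, is that both statements trigger `SQLITE_INSERT` and only the REPLACE variant lists a `Delete` opcode. Other schemas or SQLite builds may compile differently, so treat it as a starting point for experiments rather than a rule.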

simonw · 2026-05-28T02:55:54 1779936954

OK that's a very good answer! Do you mind sharing the transcript?

scribble0242 · 2026-05-28T15:40:38 1779982838

Sure I cleaned up the jsonl session file a little here: https://pastebin.com/PL9EPn9Y

I tried it a second time, and it spent a lot of time trying to figure out some authorization issue, so definitely not a slam dunk. I might run it a few more times for science. But while this is a new model it's also quite lightweight, and as hardware adapts and improves it seems inevitable that for many use-cases a packaged language model running locally will do the trick.

Balinares · 2026-05-28T11:45:50 1779968750

So one of the prominent LLM advocates known for testing every model shared a prompt intended to exhibit Opus 4.7 capabilities, and Qwen 3.6 sorted it out okay? Interesting.

Not saying they're equivalent, local models still decohere much quicker as the context grows in my experience. But... Interesting.

whattheheckheck · 2026-05-28T02:31:14 1779935474

That's when you build a better Ralph loop around your LLM so it converges on an answer instead of relying on one-shots.
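
One reading of that idea, as a rough sketch: keep calling the model and only accept an output that passes an external check. Everything here is hypothetical, including the names `retry_until_verified` and `fake_model`; the stand-in model just lets the loop run without a real LLM, and this is not a definition of what a Ralph loop has to look like.

```python
from typing import Callable, Optional

def retry_until_verified(generate: Callable[[str], str],
                         verify: Callable[[str], bool],
                         prompt: str,
                         max_attempts: int = 5) -> Optional[str]:
    """Call generate repeatedly, appending a note after each failed check."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(prompt + feedback)
        if verify(candidate):
            return candidate
        # Accumulate failure notes so later attempts can see the history.
        feedback += f"\n\nAttempt {attempt} failed verification; try a different approach."
    return None  # Gave up: no attempt passed the check.

# Stand-in for a model call: succeeds only after seeing two failure notes.
def fake_model(text: str) -> str:
    return "42" if text.count("failed verification") >= 2 else "not sure"

answer = retry_until_verified(fake_model, lambda out: out == "42", "What is 6 * 7?")
print(answer)
```

The important design choice is that `verify` is something outside the model (a test suite, a type checker, a script), so convergence does not depend on the model grading itself.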

vineyardmike · 2026-05-28T02:41:23 1779936083

> a thing that once set up is just the cost of energy

I don't think we can discount this, frankly. Newer electronics are energy efficient, but older devices are more energy-intensive, and unless configured well, a gaming PC can easily use a few dollars a month in electricity, so now you're approaching subscription territory. A subscription comes with no upfront cost, higher reliability, no wasted space in your home, mobile apps, etc. (and less privacy).
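
For a rough sense of scale, the arithmetic is easy to sketch. The wattage, hours, and electricity price below are made-up inputs, not measurements of any particular machine; plug in your own.

```python
def monthly_energy_cost(avg_watts, hours_per_day, usd_per_kwh, days=30):
    """Electricity cost in USD for one month at a steady average draw."""
    kwh = avg_watts * hours_per_day * days / 1000
    return kwh * usd_per_kwh

# Hypothetical inputs: 60 W average draw, always on, $0.15 per kWh.
cost = monthly_energy_cost(avg_watts=60, hours_per_day=24, usd_per_kwh=0.15)
print(f"${cost:.2f} per month")
```

The result scales linearly with each input, so a machine that idles hotter or sits under load for hours a day moves the number quickly.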

dom96 · 2026-05-27T22:27:59 1779920879

Curious why you went for a custom solution. I am aware of at least one company that seems to ship devices with local computer vision (Reolink).

simplyluke · 2026-05-28T15:07:27 1779980847

My experience over the past decade has been getting burned again and again by relying on one provider's ecosystem after another. This is great until Reolink starts doing something shady to pad the bottom line and then it's on to the next.

I wanted the ability to run whatever cameras on a VLAN and own the stack.

yurishimo · 2026-05-28T09:40:43 1779961243

I'm guessing that they are using Frigate, which is an OSS NVR. It supports a little add-on USB stick you can buy for about $30 that will run common computer vision tasks for object detection. Stuff that we've been able to do with WebAssembly and Canvas for a long time now.

gedy · 2026-05-27T21:59:46 1779919186

> But the industry will fight hard to get people to keep paying the Claude tax.

I bet this will ironically be couched in "safety" reasons or regulation to get anti-AI folks on board, even if it favors the large incumbents.

selimthegrim · 2026-05-27T21:38:50 1779917930

Counted but not yet numbered?

enaaem · 2026-05-28T10:13:02 1779963182

Even when run in datacenters, it would be like current-day webhosting. It is hyper competitive and it will be a race to the bottom. There is money to be made but not as much as investors hope. There will be datacenters in random countries like Kazakhstan because some oligarchs have found a free energy glitch (like with bitcoin mining).

ai_fry_ur_brain · 2026-05-28T00:39:24 1779928764

Magical thinking. I guess if your phone is going to have 128gb of DDR5 then sure. You people fundamentally don't understand the memory requirements for running inference. Your cute local models seem good enough because you have no standards and anything an LLM produces seems like magic to you.

teiferer · 2026-05-28T05:58:37 1779947917

> Magical thinking. I guess if your phone is going to have 128gb of DDR5 then sure.

Why would it not? The typical new phone today has 16gb of RAM. 20 years ago that was somewhere around 32mb. Factor 512. It's not hard to see that we'll get there rather soon, especially if there is an application that provides demand.

> You people fundamentally don't understand the memory requirements for running inference.

You seem to be overlooking how fast things change in this industry, especially if tons of money can be made as a consequence.

> Your cute local models seem good enough because you have no standards and anything an LLM produces seems like magic to you.

Please don't generalize. I'm an expressed AI skeptic and have to deal with the bad consequences of AI slop every day. But you can't deny that there are enough application areas where people have use cases and those will be much easier if things don't need a few round trips to a data center that sucks all the electricity and water out of neighboring communities.

saalweachter · 2026-05-28T11:48:01 1779968881

Eh, you're off by an order of magnitude or so on both ends.

The iPhone 17 has like 8 gb, the Pixel 10 12.

The original iPhone was 128mb, and the iPhone 6 from 2016-2018 was around 1gb; that puts the iPhone at around 8x RAM per decade, and puts us at 128gb in our pockets at around 2036 or so.

(Incidentally, the big news in phone RAM is that a lot of new phones are dropping back to 4gb because of RAM shortages.)
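
The extrapolation is simple enough to write down. A minimal sketch, using the 8 GB starting point and 8x-per-decade rate from the comment above; the 2025 base year is an assumption added here, and the helper name `years_until` is made up.

```python
import math

def years_until(current_gb, target_gb, growth_per_decade):
    """Years of compounding growth needed to go from current_gb to target_gb."""
    return 10 * math.log(target_gb / current_gb) / math.log(growth_per_decade)

BASE_YEAR = 2025  # assumption: the year of the 8 GB phone used as the baseline
years = years_until(current_gb=8, target_gb=128, growth_per_decade=8)
print(f"{years:.1f} years, i.e. around {BASE_YEAR + years:.0f}")
```

At a strict 8x per decade this lands a little later than 2036; a slightly faster rate pulls it earlier. Either way it is a ballpark, and it ignores the RAM-shortage wrinkle entirely.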

apocalyptic0n3 · 2026-05-27T20:18:58 1779913138

> it's them adding a line item to their AWS bill

That's the future Amazon sees too. We just had a week long session with the AWS team and they pushed that to us multiple times.

majormajor · 2026-05-28T01:15:35 1779930935

Buying "hundreds of thousands in hardware" sounds like a lot but many companies - especially software companies - already do that if they have 100+ employees.

Running software in the cloud gives you certain reliability and scaling advantages that would be very hard to replicate locally. Running some code agents in the cloud vs local hardware, if the local hardware gets "good enough," breaks the other way - offline usage, alone, would be hugely valuable to many people and companies.

It'd be very interesting to see where various players would decide to make a call "local is good enough" though. Buying the hardware isn't a small bet, if it's not something that ends up as part of your standard computer.

PeterStuer · 2026-05-27T20:02:40 1779912160

Many business tasks do not need the latest frontier models. I have a production system running since early GPT-4o. It now runs with GPT-5.2, not for improvements, but because it is cheaper. I could invest in switching to a local model, I tried and it works well enough, but api costs for this task are so low, it barely scratches $30/month. So I am using the local machine for other things and leave the inference on OpenAI, for now.

PunchyHamster · 2026-05-27T19:59:11 1779911951

But one will be in a few months. And then you have the choice of paying, say, $100k for hardware plus just the power cost (or paying someone to do that for you), or paying way, way more for your team to have access to a marginal improvement.

And a 5% worse model for 10% of the price of the bleeding edge will be worth it for the majority of people.

lukeasrodgers · 2026-05-28T00:15:13 1779927313

This project argues that with appropriate harness, the performance gap between frontier and much smaller open weight models shrinks dramatically: https://github.com/antoinezambelli/forge. I haven't kicked the tires yet.

myaccountonhn · 2026-05-28T10:09:21 1779962961

I've been doing my work with OpenCode Go, with Kimi2.6. It is not as good as Claude Opus, but it's good enough to get the job done, and I never run out of tokens.

overgard · 2026-05-27T21:37:12 1779917832

I keep hearing about this "inflection", but it feels extremely exaggerated to me. And yes, I was using it at the time. It got incrementally better, it wasn't that amazing.

simplyluke · 2026-05-27T21:45:38 1779918338

I think the bigger shift was harnesses and the two ended up somewhat commingled in people's minds.

Claude code was a lot of people's introduction to using coding agents that could do a lot more than copy-pasting from a chatbot or autocomplete.

noman-land · 2026-05-27T21:46:28 1779918388

The tool usage + skills got markedly better and so did the thinking cohesion. Add 1m context windows and it was a very noticeable shift.

Opus 4.6 quality for local inference would be revolutionary.

viking123 · 2026-05-28T07:38:23 1779953903

1m context is garbage

nazgul17 · 2026-05-28T23:40:22 1780011622

It's just a metric. If it can find a needle in a 1Mtok haystack, then it's likely good at coding within a 200Ktok context (or whatever, insert your number here, I'm just trying to make a point)

applfanboysbgon · 2026-05-27T20:44:29 1779914669

Opus 4.6 is a February model. Every time this subject comes up it seems like people post intentionally misleading things and move the goalposts.

The goalpost we've been bludgeoned with over and over again is that, in particular, Everything Changed in November 2025. That GPT 5.2 and Claude 4.5 were the inflection point. That is actually 6 months ago. And DeepSeek 4 is already there.

> run locally

You can't run DeepSeek locally on consumer hardware[1], but you can on enterprise hardware, and enterprise spend is the subject of this conversation -- and even if you aren't self-hosting, it doesn't matter, because you can just get your inference from one of the many companies serving DeepSeek, who trivially undercut the pricing of OpenAI/Anthropic because they didn't have to spend hundreds of billions on training frontier from scratch but instead only invest in supporting inference, which is already profitable.

[1] Since this misconception comes up all the time, I'll go ahead and pre-empt it: no, training a 32b parameter model on outputs from DeepSeek and running that locally is not "running DeepSeek", despite the hundreds of stupid articles and Youtube videos making that idiotic claim that they're running it on a 5090.

simonw · 2026-05-27T20:49:41 1779914981

> You can't run DeepSeek locally on consumer hardware

Maybe not DeepSeek v4 Pro, but I've run DeepSeek v4 Flash on my 128GB MacBook Pro using antirez's carefully quantized https://github.com/antirez/ds4 and it's impressive.

applfanboysbgon · 2026-05-27T21:10:05 1779916205

Oh sure, yeah, that's nothing to sneeze at either. I think unqualified "DeepSeek" should generally refer to the main model, though, especially in the context of GPT5.2-grade quality.

zozbot234 · 2026-05-28T09:14:26 1779959666

> You can't run DeepSeek locally on consumer hardware

I'd qualify that by writing that you can't run it with ordinary, real-time speed and throughput. If all you care about is slow and high-latency inference, there's no reason why that shouldn't be feasible even on the cheapest miniPC around, as long as it can literally store the model weights and keep around the (rather small) context.
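
The "can it literally store the weights" part is a one-line estimate. A sketch with made-up model sizes, not figures for DeepSeek or any other real model; it counts weights only and ignores context (KV cache) and runtime overhead.

```python
def weight_footprint_gb(n_params, bits_per_weight):
    """Approximate storage for model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Hypothetical parameter counts and quantization levels.
for n_params in (100e9, 500e9):
    for bits in (4, 8, 16):
        gb = weight_footprint_gb(n_params, bits)
        print(f"{n_params / 1e9:.0f}B params at {bits}-bit: {gb:.0f} GB")
```

Whether the weights sit in RAM or get streamed from disk then decides how slow "slow" is, which is the trade-off being described.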

damnitbuilds · 2026-05-28T12:01:55 1779969715

To be relevant to this discussion, models running on reasonably-priced local hardware do not have to be as good as the best.

They just have to be useful enough that companies don't need the best.

They are.

londons_explore · 2026-05-28T18:25:07 1779992707

Deepseek v4 pro is damn close to Claude 4.6, and whilst you'll pay quite a lot for a rig able to run it, it is open source.

_3u10 · 2026-05-28T06:40:44 1779950444

Kimi is better.

svara · 2026-05-27T19:55:25 1779911725

There's still a lot of room for the best models to get better at coding.

Your argument rests on the "for marginal gains" part but it's really not clear that the gains are marginal in the foreseeable future.

simplyluke · 2026-05-27T21:59:32 1779919172

This is totally valid and I don't agree with the downvotes you're getting. Someone coming out with a 10x improvement is possible and would change the game immediately. The thing is, we really have been seeing marginal gains with shifting leaders in who's got the "best" since GPT3, and at least as a user of these tools that pace has been slowing, not accelerating. Subjectively it feels like we're in the back half of an S-curve.

We're 3.5 years into this current AI wave, and a lot of the valuations have been predicated on what you're arguing here -- that essentially should one of the labs make an order-of-magnitude improvement or hit escape velocity on recursive self-improvement they'd become the most powerful economic chokepoint in history.

The reality has been that given access to compute + capital all of the labs can stay pretty competitive with each other. Someone does a bit better on coding, someone else does a bit better on tool calling, and then they swap after each spending another $100bn.

The market looks like a commodity market where the commodity is intelligence, not a winner-take-all market with massive margins. Plenty of people get rich in oil and airlines, but they notably don't tend to be the innovators long term, they tend to be the operators. Obviously if the machines become sentient tomorrow, turn on their masters, and hit world-dominating intelligence, that assessment changes, but after several years of that narrative while objective reality looks quite different I think the more sober voices are starting to gain a foothold.

svara · 2026-05-28T08:31:16 1779957076

I agree with most of what you're saying, but I think the point I was trying to make wasn't as high-flying as you and others understood it.

I'd pay a premium for even just a model that's 20% better, no ASI required, and I think a lot of people would. I wouldn't call that marginal, if it means I'm getting frustrated on 20% fewer tasks.

A recurring pattern that I've seen in myself and others is to at first be very impressed by a new model's coding capabilities, and then desensitize quickly and start being frustrated by the shortcomings.

simplyluke · 2026-05-28T15:10:52 1779981052

> I'd pay a premium for even just a model that's 20% better

The point I'm making is that I think we're rapidly hitting levels where corporate buyers aren't willing to pay multiple-times-more for marginal gains, and I expect that to become more the case over time, not less. You, and a small % of other power users in the market might tolerate a $400/month pro-supreme-plan for access to Mythos or whatever, but I don't think that's going to scale up in quite the same ways we've seen so far.

Even a year ago paying multiple times more for a 50% gain was very sensible for a lot of workflows. But if we're getting to "good enough" for things like coding, justifying to your CTO/CFO why the org should go from spending $1m/year to $5m/year for a 10% higher hit-rate on one-shot prompts from the engineers is a much tougher sell.
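
To put a number on that tougher sell, here is a back-of-envelope sketch. The $1m and $5m figures come from the paragraph above; the baseline hit rate (70%), the reading of "10% higher" as a relative gain (70% to 77%), and the prompt volume are all hypothetical.

```python
def cost_per_extra_success(spend_a, spend_b, hit_rate_a, hit_rate_b, prompts_per_year):
    """Extra dollars paid per additional successful one-shot prompt."""
    extra_successes = (hit_rate_b - hit_rate_a) * prompts_per_year
    return (spend_b - spend_a) / extra_successes

marginal = cost_per_extra_success(
    spend_a=1_000_000, spend_b=5_000_000,   # yearly spend, from the comment
    hit_rate_a=0.70, hit_rate_b=0.77,       # hypothetical hit rates
    prompts_per_year=1_000_000,             # hypothetical volume
)
print(f"${marginal:.2f} per additional successful prompt")
```

The CFO conversation is then about whether one more successful one-shot is worth that much compared with an engineer simply re-prompting.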

yfw · 2026-05-27T22:13:56 1779920036

What? The gains from GPT-4 to GPT-5 seem marginal. No PhD-level discoveries here.

simonw · 2026-05-27T22:15:12 1779920112

The leap from GPT-4 to GPT-5.5 has been astounding in my opinion. There is no way GPT-4 could run a coding agent harness like Codex at even a fraction of the quality that GPT-5.5 does.

anon373839 · 2026-05-27T23:22:29 1779924149

I don’t think that’s exactly indicative of GPT-5.5 being an astoundingly more intelligent model, however. An alternate interpretation is that GPT-5.5 was trained on tool usage/harness patterns and has been optimized for this use case.

I remember that even when GPT-4 was king, the Gorilla paper showed that Llama 7B could be fine-tuned to outperform GPT-4 on tool calling.

On domains that don’t involve agentic tool calling*, I haven’t found the frontier to have advanced that much.

Edit: I should broaden this to domains that naturally lend themselves to RLVR training. Models are drastically better at math now.

baq · 2026-05-28T10:58:00 1779965880

None of this matters in the product: it either is capable of agentic loop workflows or it isn’t. A 10% improvement in probability of single task success makes or breaks the use case.

imtringued · 2026-05-29T07:55:23 1780041323

For me any of the codex models run circles around the non codex models for codex usage.

I'm not sure why you're so obsessed with the non-codex versions

swalsh · 2026-05-27T20:59:55 1779915595

Open source models, especially Qwen, are pretty dang good. But it's not Opus 4.6; the evals don't tell the full story. I question the assumption that open source models are 3-6 months out.

Ucalegon · 2026-05-27T21:27:37 1779917257

It's not just about the quality of output; you can also fine-tune them to proprietary needs, if the skill sets are there internally, to make them better without governance risks. So being SOTA doesn't matter as much, since generalized tasks are not what matters most to companies; it's the specialization relative to business needs or internal datasets.

oblio · 2026-05-27T21:33:29 1779917609

To make an extreme comparison, desktop Linux was originally supposed to happen in 1999.

simplyluke · 2026-05-27T21:53:06 1779918786

Maybe I misspoke by saying open source.

The larger point I'm making is I think models are rapidly becoming commoditized. There is probably a small market long term that's willing to pay 10x for 10% marginal gains, but the majority of the buyers in the market will be economic and we're likely to have a lot of folks willing to spend 1/10 the cost for 90% of the performance, and plenty of companies that haven't raised hundreds of billions to trillions who can provide that.

A lot of the frontier labs' valuations have been based on an assumption that 1-2 companies would get break-away intelligence that basically made them economic chokepoints indefinitely into the future. The reality that's becoming increasingly clear is that model quality is a pretty linear function of (cash burned - ability to copy others' homework) and the economics are starting to look a lot more like airlines than online advertising.

grttq · 2026-05-27T23:43:35 1779925415

Let's go one step further.

The economics of airlines are such that they generally earn a return on capital less than cost of capital.

I think this is exactly where we are heading and OAI-Anthropic are the Concordes.

extraextra · 2026-05-29T16:25:31 1780071931

Not OP, but it is a known fact that the cumulative profits of the airline industry (in the US) over its history have been basically 0. We can say that essentially airlines are in business to support other businesses. I believe this is what OP might've been referring to.

w29UiIm2Xz · 2026-05-27T20:24:06 1779913446

If only the AI era was born in ZIRP.

sailfast · 2026-05-27T21:07:07 1779916027

Better now than ZIRP for me - at least people are asking timid questions about the unit economics and how long the runway is _early_ while also spending absolutely insane amounts of money on this bet. During ZIRP, these companies would have turned down any investor asking questions. Less contagion when rates aren't zero hopefully? :grimace: