
I should? What problems can I solve that can only be done with an agent? As long as every AI provider is operating at a loss, starting a sustainably monetizable project doesn't feel that realistic.


The post is just about playing around with the tech for fun. Why does monetization come into it? It feels like saying you don't want to use Python because Astral, the company that makes uv, is operating at a loss. What?


Agents use APIs that I will need to pay for, and generally software dev is a job for me that needs to generate income.

If the APIs I call are not profitable for the provider, then they won't be for me either.

This post is a fly.io advertisement


"Agents use Apis that I will need to pay for"

Not if you run them against local models, which are free to download and free to run. The Qwen 3 4B models only need a couple of GBs of available RAM and will run happily on CPU as opposed to GPU. Cost isn't a reason not to explore this stuff.
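
To make that concrete, here's a minimal sketch of chatting with a locally served model over an OpenAI-compatible endpoint. It assumes Ollama running on its default port after pulling `qwen3:4b`; any local server that speaks the same API should work the same way.

    # Minimal sketch: chat with a locally served model over an OpenAI-compatible API.
    # Assumes Ollama on its default port after `ollama pull qwen3:4b`; other local
    # servers with OpenAI-compatible endpoints (llama.cpp's server, LM Studio, etc.)
    # should work the same way.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed-locally")

    response = client.chat.completions.create(
        model="qwen3:4b",
        messages=[{"role": "user", "content": "In one sentence, what is a tool call?"}],
    )
    print(response.choices[0].message.content)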


Google has what I would call a generous free tier, even including Gemini 2.5 Pro (https://ai.google.dev/gemini-api/docs/rate-limits). Just get an API key from AI Studio. It's also very easy to add a switch in your agent so that if you hit a rate limit for one model, you re-request the query with the next model. With Pro/Flash/Flash-Lite and their previews, you've got 2500+ free requests per day.
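
As a sketch of that fallback idea, using Gemini's OpenAI-compatible endpoint; the model list, its ordering, and the assumption that free-tier limits surface as a rate-limit error are all illustrative:

    # Sketch of "fall back to the next model on a rate limit" via Gemini's
    # OpenAI-compatible endpoint. Model names/ordering and the assumption that
    # quota exhaustion surfaces as RateLimitError are illustrative.
    from openai import OpenAI, RateLimitError

    client = OpenAI(
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
        api_key="YOUR_AI_STUDIO_KEY",
    )

    MODELS = ["gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.5-flash-lite"]

    def ask(prompt: str) -> str:
        last_error = None
        for model in MODELS:
            try:
                response = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                )
                return response.choices[0].message.content
            except RateLimitError as err:  # this model's quota is used up, try the next
                last_error = err
        raise RuntimeError("All models are rate limited") from last_error

    print(ask("Summarise what an agent loop is in two sentences."))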


> Not if you run them against local models, which are free to download and free to run .. run happily on CPU .. Cost isn't a reason not to explore this stuff.

Let's be realistic and not over-promise. Conversational slop and coding factorial will work. But the local experience for coding agents, tool-calling, and reasoning is still very bad until/unless you have a pretty expensive workstation. CPU inference with Qwen 4B will be disappointing to even try experiments on. The only useful thing most people can realistically do locally is fuzzy search with simple RAG. Besides factorial, maybe some other stuff that's in the training set, like help with simple shell commands. (Great for people who are new to Unix, but it won't help the veteran dev who is trying to convince themselves AI is real or figure out how to get it into their workflows.)

Anyway, admitting that AI is still very much in a "pay to play" phase is actually OK. More measured stances, fewer reflexive detractors or boosters.


Sure, you're not going to get anything close to a Claude Code style agent from a local model (unless you shell out $10,000+ for a 512GB Mac Studio or similar).

This post isn't about building Claude Code - it's about hooking up an LLM to one or two tool calls in order to run something like ping. For an educational exercise like that, a model like Qwen 4B should still be sufficient.


The expectation that reasonable people have isn't fully local Claude Code; that's a strawman. But it's also not ping tools or the simple weather agent that tutorials like to use. It's somewhere in between, isn't that obvious? If you're into evangelism, acknowledging this and actually taking a measured stance would help prevent light skeptics from turning into complete AI-deniers. If you mislead people about one thing, they will assume they are being misled about everything.


I don't think I was being misleading here.

https://fly.io/blog/everyone-write-an-agent/ is a tutorial about writing a simple "agent" - aka a thing that uses an LLM to call tools in a loop - that can make a simple tool call. The complaint I was responding to here was that there's no point trying this if you don't want to be hooked on expensive APIs. I think this is one of the areas where the existence of tiny but capable local models is relevant - especially for AI skeptics who refuse to engage with this technology at all if it means spending money with companies they don't like.
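
For anyone who wants to see the shape of it without reading the post, here's a rough sketch of that loop using the OpenAI Python SDK with a single ping tool. It is not the code from the post; the model name is illustrative, and any OpenAI-compatible endpoint with a tool-capable model (local or hosted) should behave the same way.

    # Rough sketch of a tool-in-a-loop "agent": one ping tool, one loop.
    # Not the code from the post; the model name is illustrative, and any
    # OpenAI-compatible endpoint with a tool-capable model should work similarly.
    import json
    import subprocess
    from openai import OpenAI

    client = OpenAI()  # or point base_url at a local server

    TOOLS = [{
        "type": "function",
        "function": {
            "name": "ping",
            "description": "Ping a host and return the raw output.",
            "parameters": {
                "type": "object",
                "properties": {"host": {"type": "string"}},
                "required": ["host"],
            },
        },
    }]

    def ping(host: str) -> str:
        # -c works on Unix-like systems; Windows uses -n
        result = subprocess.run(["ping", "-c", "3", host], capture_output=True, text=True)
        return result.stdout or result.stderr

    messages = [{"role": "user", "content": "Is fly.io reachable from here?"}]
    while True:
        reply = client.chat.completions.create(
            model="gpt-4.1-mini", messages=messages, tools=TOOLS
        )
        message = reply.choices[0].message
        messages.append(message)
        if not message.tool_calls:       # no tool requested: the model is done
            print(message.content)
            break
        for call in message.tool_calls:  # run each requested tool, feed the result back
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": ping(**args),
            })

That loop - send messages, run whatever tools come back, append the results, repeat - is the entire trick; everything else is tool definitions.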


I think it is misleading to suggest today that tool-calling for nontrivial stuff really works with local models. It only works in demos because those tools always accept one or two arguments, usually string literals or numbers. In the real world, functions take more complex arguments, many arguments, or a single argument that's an object with multiple attributes, etc. You can begin to work around this by passing function signatures, typing details, and JSON Schemas to set expectations in context, but local models tend to fail at handling this kind of thing long before you ever hit limits in the context window. There's a reason demos always use one string literal like a hostname, or two floats like lat/long. It's normal for passing a dictionary with a few strict requirements to need 300 retries instead of 3 to get a tool call that's syntactically correct with properly passed arguments. Actually, `ping --help` for me shows something like 20 options, and for any attempt to map things 1:1 with more args I think you'd start to see breakdown pretty quickly.
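
To make the contrast concrete, compare the kind of schema demos use with the kind real code needs; both sketches are illustrative, not taken from any particular API:

    # The kind of tool demos use: one string argument.
    ping_tool = {
        "name": "ping",
        "parameters": {
            "type": "object",
            "properties": {"host": {"type": "string"}},
            "required": ["host"],
        },
    }

    # The kind of tool real code needs: nested objects, enums, arrays, optional fields.
    # Small local models tend to start dropping required keys, inventing enum values,
    # or emitting invalid JSON at roughly this level of structure.
    deploy_tool = {
        "name": "deploy_service",
        "parameters": {
            "type": "object",
            "properties": {
                "service": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "image": {"type": "string"},
                        "replicas": {"type": "integer", "minimum": 1},
                        "env": {"type": "object", "additionalProperties": {"type": "string"}},
                    },
                    "required": ["name", "image"],
                },
                "strategy": {"type": "string", "enum": ["rolling", "blue-green", "canary"]},
                "regions": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["service", "strategy"],
        },
    }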

Zooming in on the details is fun but doesn't change the shape of what I was saying before. No need to muddy the water; very very simple stuff still requires very big local hardware or a SOTA model.


You and I clearly have a different idea of what "very very simple stuff" involves.

Even the small models are very capable of stringing together a short sequence of simple tool calls these days - and if you have 32GB of RAM (e.g. a ~$1,500 laptop) you can run models like gpt-oss:20b, which are capable of operating tools like bash in a reasonably useful way.

This wasn't true even six months ago - the local models released in 2025 have almost all had tool calling specially trained into them.


You mean like a demo for simple stuff? Something like hello world type tasks? The small models you mentioned earlier are incapable of doing anything genuinely useful for daily use. The few tasks they can handle are easier and faster to just write yourself with the added assurance that no mistakes will be made.

I’d love to have small local models capable of running tools like current SOTA models, but the reality is that small models are still incapable, and hardly anyone has a machine powerful enough to run the 1 trillion parameter Kimi model.


Yes, I mean a demo for simple stuff. This whole conversation is attached to an article about building the simplest possible tool-in-a-loop agent as a learning exercise for how they work.


> software dev is a job for me that needs to generate income

sir, this is a hackernews


> This post is a <insert-startup-here> advertisement

same thing you said but in a different context... sir, this is a hackernews


Because if you build an agent you'll need to host it in a cloud virtual machine...? I don't follow.


I have an "agent" that posts our family schedule + weather + other relevant stuff to our shared channel.

It costs like 0.000025€ per day to run. Hardly something I need to get "profitable".

I could run it on a local model, but GPT-5 is stupidly good at it so the cost is well worth it.


Practically everything is something you will need to pay for in the end. You probably spent money on an internet connection, electricity, and computing equipment to write this comment. Are you intending to make a profit from commenting here?

You don't need to run something like this against a paid API provider. You could easily rework it to run against a local model hosted on hardware you own. A number of not-stupid-expensive consumer GPUs can run some smaller models locally at home for not a lot of money. You can even play videogames with those cards after.

Get this: sometimes people write code and tinker with things for fun. Crazy, I know.


The submission is an advertisement for fly.io and OpenAI; both are paid services. We are commenting on an ad. The person who wrote it did it for money. Fly.io operates for money; OpenAI charges for their API.

They posted it here expecting to find customers. This is a sales pitch.

At this point why is it an issue to expect a developer to make money on it?

As a dev, if the chain of monetization ends with me, then there is no mainstream adoption whatsoever on the horizon.

I love to tinker, but I do it for free, not using paid services.

As for tinkering with agents, it's a solution looking for a problem.


Why are you repeatedly stating that the post is an ad as if it is some sort of dunk? Companies have blogs. Tech blogs often produce useful content. It is possible that an ad can both successfully promote the company and be useful to engineers. I find the Fly blog to be particularly well-written and thoughtful; it's taught me a good deal about WireGuard, for instance.


And that sounds fine, but WireGuard is not an overhyped industry promising huge future gains to investors and to developers who jump on the bandwagon hoping to find problems for this solution.

I have actually built agents in the past, and this is my opinion. If you read the article, the author says they want to hear the reasoning for disliking it, so this is mine: the only way to create a business is raising money and hoping somebody strikes gold with the shovel I'm paying for.


How would you feel about this post if the exact same content was posted on a developer's personal blog instead?

I ask because it's rare for a post on a corporate blog to also make sense outside of the context of that company, but this one does.


They're mentioning WireGuard because we do in fact do WireGuard, unlike LLM agents, which we do not offer as a service.


You keep saying this, but there is nothing in this post about our service. I didn't use Fly.io at all to write this post. Across the thread, someone had to remind me that I could have.


Sorry, I assumed a service offering virtual machines shares Python code with the intent of getting people to run that Python on their infra.


Yes. You've caught on to our devious plan. To do anything I suggested in this post, you'd have to use a computer. By spending compute cycles, you'd be driving scarcity of compute. By the inexorable law of supply and demand, this would drive the price of compute cycles up, allowing us to profit. We would have gotten away with it, if it wasn't for you.


Scooby Doobie Doooo!


No, we are not an LLM provider.


Yeah, we have open-source models too that we can use, and it’s actually more fun than using cloud providers, in my opinion.


> what problems can I solve, that can be only done with an agent?

The problem that you might not intuitively understand how agents work and what they are and aren't capable of - at least not as well as you would understand it if you spent half an hour building one for yourself.


>> what problems can I solve, that can be only done with an agent?

> The problem that you might not intuitively understand how agents work and what they are and aren't capable of

I don't necessarily agree with the GP here, but I also disagree with this sentiment: I don't need to go through the experience of building a piece of software to understand what the capabilities of that class of software are.

Fair enough: with most other things (software or otherwise), they're either deterministic or predictably probabilistic, so simply using them or even just reading how they work is sufficient for me to understand their capabilities.

With LLMs, the lack of determinism coupled with completely opaque inner-workings is a problem when trying to form an intuition, but that problem is not solved by building an agent.


Seems like it would be a lot easier for everyone if we knew the answer to his/her question.


> As long as every AI provider is operating at a loss

None of them are doing that.

They need funding because the next model has always cost much more to train than the previous model earned in profit. And many do offer a lot of free usage, which is of course operated at a loss. But I don't think any are operating inference at a loss; I think their margins are actually rather large.


The parent comment never said "operating inference at a loss" (though it wouldn't surprise me); they just said "operating at a loss", which they most definitely are [0].

However, knowing a few people on teams at inference-only providers, I can promise you some of them absolutely are operating inference at a loss.

0. https://www.theregister.com/2025/10/29/microsoft_earnings_q1...


> Parent comment never said operating inference at a loss

Context. Whether inference is profitable at current prices is what informs how risky it is to build a product that depends on buying inference, which is what the post was about.


So you're assuming there's a world where these companies exist solely by providing inference?

The first obvious limitation of this would be that all models would be frozen in time. These companies are operating at an insane loss, and a major part of that loss is required to continue existing. It's not realistic to imagine that there is an inference-only future for these large AI companies.

And again, there are many inference-only startups right now, and I know plenty of them are burning cash providing inference. I've done a lot of work fairly close to the inference layer, and getting model serving running with the requirements of regular business use is fairly tricky and not as cheap as you seem to think.


The models may be somewhat frozen in time, but with the right tools available to them they don't need all information innately encoded. If they're able to query for reliable information to pull in, they can talk about things that are well outside their original training data.


For a few months of news this works, but over the span of years even the statistical nature of language drifts a bit. Have you shipped natural language models to production? Even simple classifiers need to be updated periodically because of drift. There is no world where you lead the industry serving LLMs and don't train them as well.


> So you're assuming there's a world where these companies exist solely by providing inference?

Yes, obviously? There is no world where the models and hardware just vanish.


> and hardware just vanish.

Okay, this tells me you really don't understand model serving or any of the details of infrastructure. The hardware is incredibly ephemeral. Your home GPU might last a few years (and I'm starting to doubt that you've even trained a model at home), but these GPUs have incredibly short lifespans under load for production use.

Even if you're not working on the back end of these models, you should be well aware that one of the biggest concerns about all this investment is how limited the lifetime of GPUs is. It's not just about being "outdated" by superior technology; GPUs are relatively fragile hardware and don't last too long under constant load.

As far as models go, I have a hard time imagining a world in 2030 where the model replies "sorry, my cutoff date was 2026" and people have no problem with this.

Also, you still didn't address my point that startups doing inference only model serving are burning cash. Production inference is not the same as running inference locally where you can wait a few minutes for the result. I'm starting to wonder if you've ever even deployed a model of any size to production.


I didn't address the comment about how some startups are operating at a loss because it seemed like an irrelevant nitpick of my wording that "none of them" are operating inference at a loss. I don't think the comment I was replying to was referring to relying on whatever startups you're talking about. I think they were referring to Google, Anthropic, and OpenAI - and so was I.

That seems like a theme with these replies: nitpicking a minor thing, ignoring the context, or both. Or, I guess more generously, I could blame myself for not being more precise with my wording. But sure, you have to buy new GPUs after making a bunch of money burning down the ones you have.

I think your point about knowledge cutoff is interesting, and I don't know what the ongoing cost to keeping a model up to date with world knowledge is. Most of the agents I think about personally don't actually want world knowledge and have to be prompted or fine tuned such that they won't use it. So I think that requirement kind of slipped my mind.


If the game is inference, the winners are the cloud mega-scalers, not the AI labs.


This thread isn't about who wins, it's about the implication that it's too risky to build anything that depends on inference because AI companies are operating at a loss.


So AI companies are profitable when you ignore some of the things they have to spend money on to operate?

Snark aside, inference is still being done at a loss. Anthropic, the most profitable AI vendor, is operating at a roughly -140% margin. xAI is the worst at somewhere around -3,600% margin.


If they are not operating inference at a loss and current models remain useful (why would they regress?), they could just stop developing the next model.


At minimum they have to incorporate new data every month or the models will fail to know how many Shrek movies there are and become increasingly wrong in a world that isn't static.


That sort of thing isn't necessary for all use cases. But if you're relying on the system to encode wikipedia or the zeitgeist then sure.


They could, but that’s a recipe for going out of business in the current environment.


Yes, but at the same time it's unlikely for existing models to disappear. You won't get the next model, but there is no choice but to keep inference running to pay off creditors.


The interesting companies to look at here are the ones that sell inference against open weight models that were trained by other companies - Fireworks, Cloudflare, DeepInfra, Together AI etc.

They need to cover their serving costs but are not spending money on training models. Are they profitable? Probably not yet, because they're investing a lot of cash in competing with each other to R&D more efficient ways of serving etc, but they're a lot closer to profitability than the labs that are spending millions of dollars on training runs.


Where do those numbers come from?


Can you cite your source for inference being at a loss? This disagrees with most of what I've read.


Sounds quite a bit like a pyramid scheme "business model": how is it different?

If a company stops training new models until they can fund it out of previous profits, do we only slow down, or halt altogether? What if they all do?


> But I don't think any are operating inference at a loss, I think their margins are actually rather large.

Citation needed. I haven't seen any of them claim to have even positive gross margins to shareholders/investors, which surely they would do if they did.



> “if you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue. There’s some cost to inference with the model, but let’s just assume, in this cartoonish cartoon example, that even if you add those two up, you’re kind of in a good state. So, if every model was a company, the model, in this example, profitable,” he added.

“What’s going on is that while you’re reaping the benefits from one company, you’re founding another company that’s much more expensive and requires much more upfront R&D investment. The way this is going to shake out is that it’s going to keep going up until the numbers get very large, and the models can’t get larger, and then there will be a large, very profitable business. Or at some point, the models will stop getting better, and there will perhaps be some overhang — we spent some money, and we didn’t get anything for it — and then the business returns to whatever scale it’s at,” he said.

This take from Amodei is hilarious but explains so much.


When comparing the cost of an H100 GPU per hour and calculating the cost of tokens, it seems the OpenAI offering for the latest model is priced 5 times cheaper than renting the hardware.

OpenAI's balance sheet also shows an $11 billion loss.

I can't see any profit on anything they create. The product is good but it relies on investors fueling the AI bubble.


https://martinalderson.com/posts/are-openai-and-anthropic-re...

All the labs are going hard on training and new GPUs. If we ever level off, they probably will be immensely profitable. Inference is cheap, training is expensive.


To do this analysis on an hourly retail cost and an open weight model and infer anything about the situation at OpenAI or Anthropic is quite a reach.

For one (basic) thing, they buy and own their hardware, and have to size their resources for peak demand. For another, DeepSeek R1 does not come close to matching Claude's performance on many real tasks.


> When comparing the cost of an H100 GPU per hour and calculating cost of tokens, it seems the OpenAI offering for the latest model is 5 times cheaper than renting the hardware.

How did you come to that conclusion? That would be a very notable result if it did turn out OpenAI were selling tokens for 5x the cost it took to serve them.


I am reading it as OpenAI selling them for 20% of the cost to serve them (serving at the equivalent token/s with cloud pay-per-use GPUs).


You're right, I misunderstood.


It seems to me they are saying the opposite.


Isn't that operating at a loss?


> None of them are doing that.

Can you point us to the data?


You can be your own AI provider.


>starting a sustainably monetizable project doesn't feel that realistic.

and

>You can be your own AI provider.

Not sure that being your own AI provider is "sustainably monetizable"?


For internal software, maybe, but for a client-facing service the incentives are not right when the norm is to operate at a loss.


You are asking the wrong questions. You should be asking which problems you can still solve better and cheaper than an agent can. Because for anything else, you are probably doing it wrong (the slow and expensive way). That's not sustainable long term. It helps if you know how agents work, and as the article argues, there isn't a whole lot to that.


I love how programmers generally tout themselves as these tinkerers who love learning about and exploring technology… until it comes to AI and then it’s like “show me the profitable use case.” Just say you don’t like AI!


Yeah, but fly.io is a cloud provider doing this advertisement with OpenAI APIs. Both cost money, so if it's not free to operate, then the developed project should offset the costs.

It's about balance.

Really, it's the AI providers that have been promising unreal gains during this hype period, so people are more profit-oriented.


What does "cloud provider" even have to do with this post?


Or maybe some of us realize that these tools are fucking useless and don’t offer any “value” apart from the most basic thing imaginable.

And I use value in quotes because as soon as the AI providers suddenly need to start generating a profit, that “value” is going to cost more than your salary.


It doesn't have to be profitable. Elegant and clever would suffice.


I don't think HN is reflective of where programmers are today, culturally. 10 years ago, sure, it probably was.


What place is more reflective today?


I don't know about online forums, but all my IRL friends have a lot more balanced takes on AI than this forum. And honestly it extends beyond this forum to the wider internet. Online, the discourse seems extremely polarized: either it's all a pyramid scheme or stories about how development jobs are already defunct and AI can supervise AI etc.


Show me where TFA even implied that you should start a sustainably monetizable project with agents?


[flagged]


What is the point of your comment?

Lonely developer?


I think we may have both nailed it.



