Hacker News

Obviously, that's my point.

We can do the math. GPT-4o can emit about 70 tokens a second. API pricing is $10/million for output tokens and $2.5/million for input tokens.

Assume a workload where input tokens are 10:1 with output tokens, and that I can generate continuous load (constantly generating tokens). I'll end up paying about $210/day in API fees, or $76,650 in a year.
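The fee math checks out in a few lines (using the listed rates and the assumed 10:1 input:output ratio):

```python
# Back-of-envelope: one continuous 70 tok/s GPT-4o stream at list API prices.
SECONDS_PER_DAY = 24 * 60 * 60
out_tokens_per_day = 70 * SECONDS_PER_DAY      # ~6.048M output tokens/day
in_tokens_per_day = 10 * out_tokens_per_day    # assumed 10:1 input:output

daily_fees = (out_tokens_per_day / 1e6) * 10.00 \
           + (in_tokens_per_day / 1e6) * 2.50
print(f"${daily_fees:,.2f}/day, ${daily_fees * 365:,.0f}/year")
# → $211.68/day, $77,263/year (≈ the $210/$76,650 figures above)
```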

Let's assume the hardware required to service this load is a rack of 8 H100s (probably not accurate, but likely in the ballpark). That costs about $240k.

So the hardware would pay for itself in about 3 years. It probably has a service life of roughly double that.

Of course we have to consider energy too. Each H100 draws 700 watts, so our rack is 5.6 kilowatts, which works out to about 49 megawatt-hours a year. Assume they pay wholesale electricity prices of $50/MWh (not unreasonable), and you're looking at a ~$2,500 annual energy bill.
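The energy bill and the payback period can be checked the same way (taking the $240k rack and ~$76,650/year revenue figures above as given):

```python
# Annual energy cost for the assumed rack of 8 H100s at 700 W each.
rack_kw = 8 * 0.700                       # 5.6 kW
annual_mwh = rack_kw * 24 * 365 / 1000    # ~49 MWh/year
energy_cost = annual_mwh * 50             # at $50/MWh wholesale
print(f"{annual_mwh:.1f} MWh/year, ~${energy_cost:,.0f} electricity")
# → 49.1 MWh/year, ~$2,453 electricity

# Payback on $240k of hardware against ~$76,650/year in API revenue:
payback_years = 240_000 / (76_650 - energy_cost)
print(f"payback in ~{payback_years:.1f} years")
# → payback in ~3.2 years
```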

So there's no reason to think that inference alone isn't a profitable business.



That doesn't sound like brilliant margins, to be honest. You've left out the entire "running a business" costs, plus the model training costs. They need to pay their staff, offices, and especially lawyers (for all the lawsuits over the scraped content used to train the models).

It's not unusual for a startup to be unprofitable, and OpenAI obviously is, but I'm not sure why isolating one aspect of their business and declaring it profitable would justify the idea that the company is inevitably a good investment "even if the company went defunct tomorrow".

Perhaps you meant "win" in the sense of "being influential" or something, but I'm pretty sure the people who invested billions of dollars use definitions that involve more concrete returns on their investment.


Oh they are 100% losing money hand over fist if you include training costs and the eye-watering salaries they pay some of their employees.

I was responding to someone upthread suggesting that they were running even inference at a loss.


You're missing the fact that requests are batched. It's 70 tokens per second for you, but also for 10s-100s of other paying customers at the same time.


All these efficiencies just increase OpenAI's margin on inference. Of course it's not "one cluster per customer" and of course a customer can't saturate a cluster by themselves, my illustration was only to point out that the economics work.
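Batching is easy to see numerically: the ~$210/day single-stream figure scales roughly linearly with the number of concurrent streams a rack can serve (the batch sizes here are illustrative assumptions, not measured figures):

```python
# Illustrative only: if batching lets one rack serve N concurrent streams
# at 70 tok/s each, per-rack revenue scales roughly with N.
# Batch sizes below are assumptions; real serving overhead is ignored.
per_stream_daily = 210  # single-stream API fees from the estimate upthread
for batch in (1, 16, 64):
    print(f"batch {batch:3d}: ~${per_stream_daily * batch * 365:,.0f}/year")
# → batch   1: ~$76,650/year
# → batch  16: ~$1,226,400/year
# → batch  64: ~$4,905,600/year
```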


Inference alone totally can be. Just look at banana.dev, runpod, lambda labs, or replicate.

The issue is OpenAI is not just selling inference.

Though I wouldn’t be surprised if there were some hidden costs that are hard for us to account for due to the sheer amount of traffic they must be getting on an hourly basis.


Oh actually banana.dev shut down. Maybe it’s not as profitable.


70 tokens per second is slow. So it does take a very significant amount of resources, considering it's running on the same or better hardware. Sustaining 70 tokens per second for thousands of users gets expensive really quickly.


My point is that at current API pricing the users are paying enough to cover inference costs.




