>winning on cost-effectiveness Nobody is winning in this area until these things...

JSR_FDED · 2025-12-01T22:56:21 1764629781

Nobody is winning until cars are the size of a pack of cards. Which is big enough to transport even the largest cargo.

ActorNightly · 2025-12-02T04:35:15 1764650115

Lol it's kinda surprising that the level of understanding around LLMs is so low.

You already have agents that can do a lot of "thinking", which is just generating guided context, then using that context to do tasks.

You already have Vector Databases that are used as context stores with information retrieval.

Fundamentally, you can have the same exact performance on a lot of tasks whether all the information exists in the model, or you use a smaller model with a bunch of context around it for guidance.

So instead of wasting energy and time encoding the knowledge into the model and making it large, you could have an "agent-first" model along with just files of vector databases. The model can fit on a single graphics card, take the question, decide which vector DB it wants to load, and then essentially answer the question in the same way. At $50 per TB of SSD storage, not only do you gain massive cost efficiency, but you also gain the ability to run a lot more inference more cheaply, which can be used for refining things, background processing, and so on.
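A minimal sketch of the "pick a vector DB, retrieve context, then answer" flow described above, in plain Python. Everything here is illustrative: `embed` is a toy bag-of-words stand-in for a real embedding model, and `VectorStore`, `pick_store`, and `build_prompt` are hypothetical names. The final call to a small model is left out.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for one vector DB file on disk."""

    def __init__(self, docs):
        self.docs = list(docs)
        self.vectors = [embed(d) for d in self.docs]

    def top_k(self, query, k=2):
        """Return the k documents most similar to the query."""
        q = embed(query)
        scored = sorted(
            zip(self.vectors, self.docs),
            key=lambda pair: cosine(q, pair[0]),
            reverse=True,
        )
        return [doc for _, doc in scored[:k]]


def pick_store(question, stores):
    """Choose the store whose best-matching document is closest to the question."""
    q = embed(question)

    def best_score(store):
        return max((cosine(q, v) for v in store.vectors), default=0.0)

    return max(stores, key=lambda name: best_score(stores[name]))


def build_prompt(question, store, k=2):
    """Assemble retrieved context plus the question for a small model."""
    context = "\n".join(store.top_k(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

In a real system the stores would live on SSD and the prompt would be passed to the small model; this sketch only shows the routing and retrieval step.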

eru · 2025-12-02T06:52:55 1764658375

You should start a company and try your strategy. I hope it works! (Though I am doubtful.)

In any case, models are useful, even when they don't hit these efficiency targets you are projecting. Just like cars are useful, even when they are bigger than a pack of cards.

ActorNightly · 2025-12-02T17:32:14 1764696734

If someone wants to fund me, I'll gladly work on this. There is no money in this though, because selling cloud services is much more profitable.

It's also not a matter of it working or not. It already works. Take a small model that fits on a GPU with a large context window, like Gemma 27b or smaller ones, give it a whole bunch of context on the topic, and ask it questions, and it will generate very accurate results based on the context.

So instead of encoding everything into the model itself, you can just take training data, store it in vector DBs, and train a model to retrieve that data based on query, and then the rest of it is just training context extraction.

eru · 2025-12-03T00:20:48 1764721248

> There is no money in this though, because selling cloud service is much more profitable.

Oh, be more creative. One simple way to make money off your idea is:

(1) Get a hedge fund to finance your R&D.

(2) Hedge fund shorts AI cloud providers and other relevant companies.

(3) Your R&D pans out and the AI cloud providers' stock tanks.

(4) The hedge fund makes a profit.

Though I don't understand: wouldn't your idea work when served from the cloud, too? If what you are saying is true, you'd provide a better service at lower cost?

ActorNightly · 2025-12-03T20:03:36 1764792216

From a functional perspective, it would provide roughly identical performance to existing systems at a lower cost due to less dependence on compute and more dependence on storage. It would also allow more on-prem solutions.

However, the issue with "funding" isn't as simple as that statement above. Remember, modern funding is not about value, it's about hype. There is a reason why CEOs like Jensen say that if they could go back in time, they would never start their companies knowing the bullshit they have to walk through.

I've also had my fair share of experiences in trying to get startups off the ground - for example, back around 2018, I was working on a system that would take your existing AWS cloud setup and move it all to EC2s with self-hosted services, which saved people money in the long run. I had a proof of concept working and everything. The issue I ran into when trying to get funding to build this out into a full-blown product/service, and that I hadn't realized, is that for companies being on AWS services was equivalent to a person wearing an expensive business suit to a sales meeting - it was a fact that they would advertise because it was seen as industry standard and created "warm feelings" with their customers. So at most, I would get some small-time customers, while getting paid much less.

Now I just work on stuff (and yes, I am working on the issue at hand with existing models), and publish it to GitHub (not gonna share it because I don't want my HN account associated with it). If someone contacts me with a dollar figure, I'm all game.

JSR_FDED · 2025-12-02T08:49:43 1764665383

https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

ActorNightly · 2025-12-02T16:26:37 1764692797

Ok then point out where I made a mistake.

Nothing shows a lack of understanding of the subject matter more than referencing the Dunning-Kruger effect in a conversation.

bbor · 2025-12-01T22:57:17 1764629837

I mean, there are lots of models that run on home graphics cards. I'm having trouble finding reliable requirements for this new version, but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1], which is very doable for professionals in the first world. Quantization can also help immensely.
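To make the quantization point concrete, here is a back-of-the-envelope sketch of weight memory versus bits per weight. It counts weights only and ignores the KV cache, activations, and runtime overhead, so real requirements are higher. The function name is hypothetical.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same precision.
    """
    return params_billion * bits_per_weight / 8


# A 32B-parameter model at common precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(32, bits):.0f} GB")
```

Under these assumptions a 32B model only approaches 16 GB at roughly 4 bits per weight, which is why quantization matters so much for home GPUs.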

Of course, the smaller models aren't as good at complex reasoning as the bigger ones, but that seems like an inherently-impossible goal; there will always be more powerful programs that can only run in datacenters (as long as our techniques are constrained by compute, I guess).

FWIW, the small models of today are a lot better than anything I thought I'd live to see as of 5 years ago! Gemma3n (which is built to run on phones[2]!) handily beats ChatGPT 3.5 from January 2023 -- rank ~128 vs. rank ~194 on LMArena[3].

[1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...

[2] https://huggingface.co/google/gemma-3n-E4B-it

[3] https://lmarena.ai/leaderboard/text/overall

qeternity · 2025-12-01T23:02:34 1764630154

> but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1]

No. They released a distilled version of R1 based on a Qwen 32b model. This is not V3, and it's not remotely close to R1 or V3.2.

beefnugs · 2025-12-01T22:24:01 1764627841

Why does that matter? They won't be making at-home graphics cards anymore. Why would you do that when you can be pre-sold $40k servers for years into the future?

observationist · 2025-12-01T23:19:58 1764631198

Because Moore's law marches on.

We're around 35-40 orders of magnitude from computers now to computronium.

We'll need 10-15 years before handheld devices can run a couple terabytes of ram, 64-128 terabytes of storage, and 80+ TFLOPS. That's enough to run any current state of the art AI at around 50 tokens per second, but in 10 years, we're probably going to have seen lots of improvements, so I'd guess conservatively you're going to be able to see 4-5x performance per parameter, possibly much more, so at that point, you'll have the equivalent of a model with 10T parameters today.
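As a rough sanity check on those figures, here is a compute-only sketch. It assumes the common approximation of about 2 FLOPs per active parameter per generated token for a dense forward pass, and it ignores memory bandwidth, which is often the real bottleneck when decoding. The function names are hypothetical.

```python
def max_tokens_per_second(flops, active_params, flops_per_param=2.0):
    """Compute-bound upper limit on decode speed (assumes perfect utilization)."""
    return flops / (flops_per_param * active_params)


def max_active_params(flops, tokens_per_second, flops_per_param=2.0):
    """Largest active parameter count that fits a target decode speed."""
    return flops / (flops_per_param * tokens_per_second)


# 80 TFLOPS at 50 tokens per second, under the assumptions above.
params = max_active_params(80e12, 50)
print(f"{params / 1e9:.0f}B active parameters")
```

Under these assumptions, 80 TFLOPS at 50 tokens per second corresponds to roughly 800B active parameters as a ceiling; a real device would land below that.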

If we just keep scaling and there are no breakthroughs, Moore's law gets us through another century of incredible progress. My default assumption is that there are going to be lots of breakthroughs, and that they're coming faster, and eventually we'll reach a saturation of research and implementation; more, better ideas will be coming out than we can possibly implement over time, so our information processing will have to scale, and it'll create automation and AI development pressures, and things will be unfathomably weird and exotic for individuals with meat brains.

Even so, in only 10 years and steady progress we're going to have fantastical devices at hand. Imagine the enthusiast desktop - could locally host the equivalent of a 100T parameter AI, or run personal training of AI that currently costs frontier labs hundreds of millions in infrastructure and payroll and expertise.

Even without AGI that's a pretty incredible idea. If we do get to AGI (2029 according to Kurzweil) and it's open, then we're going to see truly magical, fantastical things.

What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy?

NVIDIA will be churning out chips like crazy, and we'll start seeing the solar system measured in terms of average cognitive FLOPS per gram, and be well on the way toward system scale computronium matrioshka brains and the like.

blonder · 2025-12-02T03:20:45 1764645645

I appreciate your rabid optimism, but considering that Moore's law has ceased to be true for multiple years now, I am not sure a handwave about being able to scale to infinity is a reasonable way to look at things. Plenty of things have slowed down in progress in our current age, for example airplanes.

timschmidt · 2025-12-02T04:20:58 1764649258

Someone always crawls out of the woodwork to repeat this supposed "fact" which hasn't been true for the entire half-century it's been repeated. Jim Keller (designer of most of the great CPUs of the last couple decades) gave a convincing presentation several years ago about just how not-true it is: https://www.youtube.com/watch?v=oIG9ztQw2Gc Everything he says in it still applies today.

Intel struggled for a decade, and folks think that means Moore's law died. But TSMC and Samsung just kept iterating. And hopefully Intel's 18A process will see them back in the game.

eru · 2025-12-02T06:57:27 1764658647

During the 1990s (and for some years before and after) we got 'Dennard scaling'. The frequency of processors tended to increase exponentially, too, and featured prominently in advertising and branding.

I suspect many people conflated Dennard scaling with Moore's law and the demise of Dennard scaling is what contributes to the popular imagination that Moore's law is dead: frequencies of processors have essentially stagnated.

See https://en.wikipedia.org/wiki/Dennard_scaling

timschmidt · 2025-12-02T08:57:35 1764665855

Yup. Since then we've seen scaling primarily in transistor count, though clock speed has increased slowly as well. Increased transistor count has led to increasingly complex and capable instruction decode, branch prediction, out-of-order execution, larger caches, and wider execution pipelines in an attempt to increase single-threaded performance. We've also seen the rise of embarrassingly parallel architectures like GPUs, which more effectively make use of additional transistors despite lower clock speeds. But Moore's been with us the whole time.

Chiplets and advanced packaging are the latest techniques improving scaling and yield and keeping Moore alive, along with continued innovation in transistor design, light sources, computational inverse lithography, and wafer-scale designs like Cerebras.

eru · 2025-12-02T10:08:03 1764670083

Yes. Increase in transistor count is what the original Moore's law was about. But during the golden age of Dennard scaling it was easy to get confused.

timschmidt · 2025-12-02T10:24:52 1764671092

Agreed. And specifically Moore's law is about transistors per constant dollar. Because even in his time, spending enough could get you scaling beyond what was readily commercially available. Even if transistor count had stagnated, there is still a massive improvement from the $4,000 386sx Dad somehow convinced Mom to greenlight in the late 80s compared to a $45 Raspberry Pi today. And that factors into the equation as well.

Of course, feature size (and thus chip size) and cost are intimately related (wafers are a relatively fixed cost). And related as well to production quantity and yield (equipment and labor costs divide across all chips produced). That the whole thing continues scaling is non-obvious, a real insight, and tantamount to a modern miracle. Thanks to the hard work and effort of many talented people.

eru · 2025-12-02T11:03:56 1764673436

The way I remember it, it was about the transistor count in the commercially available chip with the lowest per transistor cost. Not transistor count per constant dollar.

Wikipedia quotes it as:

> The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years.

But I'm fairly sure, if you graph how many transistors you can buy per inflation adjusted dollar, you get a very similar graph.
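For a sense of scale, the compounding in that quote can be sketched as follows. The one-year doubling period is from the quoted passage; the two-year period is the commonly cited later revision and is included only for comparison. The function names are hypothetical.

```python
import math


def growth_factor(years, doubling_period_years):
    """Total multiplication after `years` of steady doubling."""
    return 2 ** (years / doubling_period_years)


def implied_doubling_time(years, factor):
    """Doubling period implied by an observed growth factor over `years`."""
    return years * math.log(2) / math.log(factor)


print(growth_factor(10, 1))  # doubling every year for 10 years
print(growth_factor(10, 2))  # doubling every two years for 10 years
```

The second helper goes the other way: given two data points (say, transistors per inflation-adjusted dollar in two different years), it returns the doubling time they imply.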

timschmidt · 2025-12-02T11:41:58 1764675718

Yes. I think you're probably right about phrasing. And transistor count per inflation adjusted dollar is the unit most commonly used to graph it. Similar ways to say the same thing.

observationist · 2025-12-02T16:28:34 1764692914

The Law of Accelerating Returns is a better formulation, not tied to any particular substrate; it's just not as widely known.

https://imgur.com/a/UOUGYzZ - had chatgpt whip up an updated chart.

LoAR shows remarkably steady improvement. It's not about space or power efficiency, just ops per $1000, so transistor counts served as a very good proxy for a long time.

There's been sufficiently predictable progress that 80-100 TFLOPS in your pocket by 2035 is probably a solid bet, especially if a fully generative AI OS and platform catches on as a product. The LoAR frontier for compute in 2035 is going to be more advanced than the limits of prosumer/flagship handheld products like phones, so there's a bit of lag and variability.

js8 · 2025-12-02T10:13:49 1764670429

You could put 64TBs of storage into your pocket with current technology. There are 4TB microSD cards available.

Not sure about the stated TFLOPS... but I suspect we'll find that AI doesn't need that much compute to begin with.

fragmede · 2025-12-02T10:23:24 1764671004

You can run models locally on high end smartphones today with apps like PocketPal or Local LLM.

eru · 2025-12-02T06:54:22 1764658462

> What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy?

Well, these days people have the equivalent of a frontier lab from perhaps 40 years ago in their pocket. We can see what that has done to the economy, and try to extrapolate.

ActorNightly · 2025-12-02T04:45:28 1764650728

Nothing to do with Moore's law or AGI.

The current models are simply inefficient for their capability in how they handle data.

delaminator · 2025-12-01T23:50:05 1764633005

> If we do get to AGI (2029 according to Kurzweil)

if you base your life on Kurzweil's hard predictions you're going to have a bad time

ActorNightly · 2025-12-02T04:12:33 1764648753

I didn't say winning business, I said winning on cost effectiveness.