Hacker News | hackinthebochs's comments

>We know that they do not reason because we know the algorithm behind the curtain.

In other words, we didn't put the "reasoning algorithm" in LLMs, therefore they do not reason. But what is this reasoning algorithm that is a necessary condition for reasoning, and how do you know the LLM's parameters didn't converge on it during pre-training?


Model parameters are weights, not algorithms. The LLM algorithm is (relatively) fixed: generate the next token according to the existing context, the model weights, and some randomization. That’s it. The trained parameters can shift the probabilities for predicting a token given the context, but there’s no more to it than that. There is no “reasoning algorithm” in the weights to converge to.
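That fixed outer loop can be written down in a few lines. This is a toy sketch, not any real model's implementation: `logits_fn` stands in for the frozen forward pass over the weights.

```python
import math
import random

def sample_next(logits_fn, context, temperature=1.0):
    """One step of the fixed decoding loop: score every vocabulary token
    given the context, then sample from the resulting distribution."""
    logits = logits_fn(context)                 # frozen weights, fixed computation
    scaled = [l / temperature for l in logits]  # the "some randomization" knob
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]    # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs)[0]

def generate(logits_fn, context, n_tokens):
    # The entire outer algorithm: sample a token, append it, repeat.
    out = list(context)
    for _ in range(n_tokens):
        out.append(sample_next(logits_fn, out))
    return out
```

Everything interesting happens inside `logits_fn`; the loop around it never changes.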

This overly reductive description of LLMs misses the forest for the trees. LLMs are circuit builders: the converged parameters pick out specific paths through the network that define programs. In other words, LLMs are differentiable computers[1]. Analogous to how a CPU is configured by program state to execute arbitrary programs, the parameters of a converged LLM configure the high-level matmul sequences toward a wide range of information dynamics.

Statistics has little relevance to LLM operation. The statistics of the training corpus impart constraints on the converged circuit dynamics, but otherwise have no representation internal to the LLM.

[1] https://x.com/karpathy/status/1582807367988654081


> LLMs are circuit builders

I think they are circuit "approximators". In other words, the result of a glorified linear regression.


I called it a “big wad of linear algebra,” above. That’s all it is.


I see nothing to preclude a foundation model being augmented by a smaller model that serializes particulars about an individual's cumulative interaction with the model and then streamlines them into the execution thread of the foundation model.

>this LLM behavior can be analogized in terms of some human behavior, thus it follows that LLMs are human-like

No, the argument is "this behavior is similar enough to human behavior that using it as evidence against <claim regarding LLM capability that humans have> is specious"

>"Confabulation" in LLMs and "confabulation" in humans have basically nothing in common

I don't know why you think this. They seem to have a lot in common. I call it sensible nonsense. Humans are prone to this when self-reflective neural circuits break down; LLMs are characterized by a lack of self-reflective information. When critical input is missing, the algorithm crafts a narrative around the available but insufficient information, resulting in sensible nonsense (e.g. neural disorders such as somatoparaphrenia).


> No, the argument is "this behavior is similar enough to human behavior that using it as evidence against <claim regarding LLM capability that humans have> is specious"

I'm not really following. LLM capabilities are self-evident; comparing them to a human doesn't add any useful information in that context.

> LLMs are characterized by a lack of self-reflective information. When critical input is missing, the algorithm will craft a narrative around the available, but insufficient information resulting in sensible nonsense (e.g. neural disorders such as somatoparaphrenia)

You're just drawing lines between superficial descriptions of disparate concepts that have a metaphorical overlap. It's also wrong. LLMs do not "craft a narrative around available information when critical input is missing"; LLM confabulations are statistical, not a consequence of missing information or damage.


>LLM capabilities are self-evident

This is undermined by all the disagreement about what LLMs can do and/or how to characterize it.

>LLM confabulations are statistical, not a consequence of missing information or damage.

LLMs aren't statistical in any substantive sense. LLMs are a general-purpose computing paradigm. They are circuit builders: the converged parameters define pathways through the architecture that pick out specific programs. Or, as Karpathy puts it, LLMs are a differentiable computer[1]. So yes, narrative crafting, in the sense of leveraging available putative facts into a narrative, is an apt characterization of what LLMs do.

[1] https://x.com/karpathy/status/1582807367988654081


>Structurally a transformer model is so unrelated to the shape of the brain there's no reason to think they'd have many similarities.

Substrate dissimilarities will mask computational similarities. Attention surfaces affinities between nearby tokens; dendrites strengthen and weaken connections to surrounding neurons according to correlations in firing rates. Not all that dissimilar.
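The attention half of that comparison can be made concrete: scaled dot-product attention literally computes a normalized affinity score between every pair of tokens. A toy sketch with raw vectors, omitting the learned query/key projections:

```python
import math

def attention_weights(queries, keys):
    """Scaled dot-product attention scores: entry [i][j] is the affinity
    of token i for token j, softmax-normalized across all keys."""
    d = len(queries[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # stable softmax
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

A token whose query vector aligns with another token's key vector gets a proportionally larger share of the (always-normalized) attention weight.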


Linear regression has well characterized mathematical properties. But we don't know the computational limits of stacked transformers. And so declaring what LLMs can't do is wildly premature.
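For contrast, here is what "well characterized" means in the linear-regression case: the entire fit is a closed-form expression you can analyze exactly. A one-variable sketch:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, solved in closed form.
    No training loop, no emergent behavior: the whole model class is
    mathematically transparent."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx
```

Nothing analogous exists for stacked transformers: there is no closed form whose properties bound what the converged network can compute.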


> And so declaring what LLMs can't do is wildly premature.

The opposite is true as well. Emergent complexity isn’t limitless. Just like early physicists tried to explain the emergent complexity of the universe through experimentation and theory, so should we try to explain the emergent complexity of LLMs through experimentation and theory.

Specifically not pseudoscience, though.


>so should we try to explain the emergent complexity of LLMs through experimentation and theory.

Physicists had the real world to verify theories and explanations against.

So far anyone 'explaining the emergent complexity of LLMs through experimentation and theory' is essentially just making stuff up nobody can verify.


Well that’s why I provided the caveat “specifically not pseudoscience”, which is, as you described, “just making stuff up nobody can verify”.


If you say not pseudoscience and then make up pseudoscience anyway then what's the point? The field has not advanced anywhere enough in understanding for convoluted explanations about how LLMs can never do x to be anything but pseudoscience.


Sure, that's true as well. But I don't see this as a substantive response given that the only people making unsupported claims in this thread are those trying to deflate LLM capabilities.


So, to review this thread

  - OP asked for someone to make a logical argument for the separation of “training” from “model”
  - I made the argument
  - You cherry picked an argument against my specific example and made an appeal to emergent complexity
  - I pointed out that emergent complexity isn’t limitless
  - “the only people making unsupported claims in this thread are those trying to deflate LLM capabilities”


You made a pretty nonsensical argument, which pretty much seems like the bog standard for these arguments.

What does linear regression have to do with the limitations of a stacked transformer? Absolutely nothing. This is the problem here. You don't know shit and just make up whatever. You can see people doing the same thing in GPT-1, 2, 3, 4 threads, all telling us why LLMs will never be able to do things they manage to do later.


> You don’t know shit

lol. Why so emotionally charged? Are you perhaps worried that you’ve invested too much time and effort into a technology that may not deliver what influencers have been promising for years? Like a proverbial bagholder?

> What does linear regression have to do with the limitations of a stacked transformer? Absolutely nothing. This is the problem here.

We’re talking about fundamental concepts of modeling in this subthread. LLMs, despite what influencers may tell you, are simply models. I’ll even throw you a bone and admit they are models for intelligence. But they are still models, and therefore all of the things that we have learned about “models” since Plato are still relevant. Most importantly, since Plato we’ve known that “models” have fundamental limits vs. what they try to represent, otherwise they would be a facsimile, not a model.

> You can see people doing the same thing in GPT-1, 2, 3, 4 threads, all telling us why LLMs will never be able to do things they manage to do later.

I hope you enjoy winning these imaginary arguments against these imaginary comments. The fundamental limitations of LLMs discussed since GPT-1 have never been addressed by changing the architecture of the underlying model. All of the improvements we’ve experienced have been due to (1) improvements in training regime and (2) harnesses / heuristics (e.g. Agents).

Now, care to provide a counterargument that shows you know a little more than “shit”?


>We’re talking about fundamental concepts of modeling in this subthread. LLMs, despite what influencers may tell you, are simply models. I’ll even throw you a bone and admit they are models for intelligence. But they are still models, and therefore all of the things that we have learned about “models” since Plato are still relevant. Most importantly, since Plato we’ve known that “models” have fundamental limits vs. what they try to represent, otherwise they would be a facsimile, not a model.

Okay, but the brain is also “just a model” of the world in any meaningful sense, so that framing does not really get you anywhere. Calling something a model does not, by itself, establish a useful limit on what it can or cannot do. Invoking Plato here just sounds like pseudo-profundity rather than an actual argument.

>I hope you enjoy winning these imaginary arguments against these imaginary comments. The fundamental limitations of LLMs discussed since GPT-1 have never been addressed by changing the architecture of the underlying model. All of the improvements we’ve experienced have been due to (1) improvements in training regime and (2) harnesses / heuristics (e.g. Agents).

If a capability appears once training improves, scale increases, or better inference-time scaffolding is added, then it was not demonstrated to be a 'fundamental impossibility'.

That is the core issue with your argument: you keep presenting provisional limits as permanent ones, and then dressing that up as theory. A lot of people have done that before, and they have repeatedly been wrong.


To be clear, you are confusing me with other commenters in this thread. All I want is for those that liken LLMs to stochastic parrots and other deflationary claims to offer an argument that engages with the actual structure of LLMs and what we know about them. No one seems to be up to that challenge. But then I can't help but wonder where people's confident claims come from. I'm just tired of the half-baked claims and generic handwavy allusions that do nothing but short-circuit the potential for genuine insight.


>AlphaGo didn't teach itself that move. The verifier taught AlphaGo that move.

No. AlphaGo developed a heuristic by playing itself repeatedly; the heuristic then noticed the quality of that move in the moment.

Heuristics are the core of intelligence in terms of discovering novelty, and this is accessible to LLMs in principle.


Why would you want every site on the internet to traffic in government IDs? This is by far the least bad out of all possible ways to implement age checking. The benefit of this is that it can short-circuit support for more onerous age verification. The writing has been on the wall for some time now: the era of completely unrestricted internet is coming to an end. The question is how awful will the new normal be? This implementation is a win all around, a complete nothingburger. We should be celebrating it, not fighting it tooth and nail.

The tech crowd's utter derangement over this minor mandate is truly a sight to behold.


> This is by far the least bad out of all possible ways to implement age checking.

Not quite. The least bad (that I'm aware of) is to mandate RTA headers (or an equivalent more comprehensive self categorization system) and to also mandate that major platforms (presumably OS and browsers, based on MAU or some such) implement support for filtering on those headers.

But sending a binned age as per the California law is the next best thing to that.
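For reference, the RTA scheme is just a fixed self-label string that a site serves in a response header or meta tag, so a platform-side filter honoring it could be very simple. A sketch only; a real filter would parse headers and HTML properly rather than substring-match:

```python
# The published RTA (Restricted To Adults) self-label string.
RTA_LABEL = "RTA-5042-1996-1400-1577-RTA"

def is_rta_labeled(headers, html):
    """True if a response self-labels as adults-only, via an RTA-bearing
    response header (e.g. a Rating header) or the RTA meta tag in the
    markup. Sketch only, not a production check."""
    if any(RTA_LABEL in value for value in headers.values()):
        return True
    return RTA_LABEL in html
```

The appeal of this design is that all enforcement lives in the browser or OS: sites only declare, and no identity ever changes hands.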


In fact, many libraries have computers sectioned off in semi-private areas exactly for this reason...


Are you sure that's a library?

I mean we have places here like that where you can insert some coins for a private viewing cabin but we don't call them libraries :)


A law defines the nature of collective action in response to certain violations. Words on paper themselves are impotent. If there is no potential for enforcement, i.e. there is no counterfactual state of collective action, there is no law.


I've always found it strange how Americans like to validate their ideals using their kids as vehicles. Instead of teaching kids how to be successful in a less-than-ideal world, we teach them our ideal view of the world. Like teaching kids that violence is never the answer, instead of that some situations do call for violence. We raise kids for a world that doesn't exist. It's up to the kid/adult to unlearn those obviously bogus ideals after they make contact with the world. It's just odd how practiced we are at setting up our children for less success in the real world.


How did you arrive at this being uniquely American? I would say it's Western society more generally.


I mainly said America because I only feel qualified to speak on America. But I do think there is something uniquely American about seeing the march of "progress" as an ultimate ideal and stagnation in any form as a defeat. Economic and social progress is basically a founding ideal of American society and is a major driver of our success over the centuries. It permeates our culture in so many ways, e.g. the idea that your kids should have it better than you. So shaping the next generation by way of shaping the views of your kids, despite the potential mismatch between the ideal and the reality is seen as just a part of the march of progress.

