Nobody watches the same movie, just as no one steps into the same river twice.
Some people have trouble following plot. Some people excuse themselves to use the bathroom. Some people have trouble catching all the dialog. Some people close their eyes during the scary parts. Different elements call up totally different associations in different people's brains. If you watch a movie a first time and then a second time, they're different movies. So I'm OK with watching a different movie, same as everybody else.
Often, when there's a really powerful scene, I'll rewatch it two or three times before continuing, too. Because there's more richness than I can capture with just one viewing, and I want to feel like I experience it fully before moving on. So that makes it a different movie too. I'm not going to let someone else dictate my experience.
Movies aren't consumed as bit-perfect binaries to begin with. They're distributed as files that way, maybe, but even the basic viewing and sound conditions are different for everyone. Color fidelity, detail, acoustic muddiness due to room reverb. Literally everyone's watching a "differently patched binary" if that's how you want to think of it.
Since there are no humans involved, it's more like growing a tree. Sure it's good to know how trees grow, but not knowing about cells didn't stop thousands of years of agriculture.
It's not like a tree at all, because a tree is one and done.
Code is a project that has to be updated, fixed, etc.
So when something breaks, you have to ask the contractor again. It may not find the issue, or it may mess things up when it tries to fix it, making the project useless, etc.
It's more like a car. Every time something goes wrong you will pay for it - sometimes it will come back in even worse shape (no refunds though), sometimes it will cost you 100x because there is nothing you can do: you need it and you can't manage it on your own.
Trees are not static, unchanging, pop-into-existence-and-forget-about-them things.
Trees that don't get regular "updates" of adequate sunlight, water, and nutrients die. In fact, too much light or water can kill them. Or soil that is not the right coarseness or acidity level could hamper or prevent growth. Now add "bugs": literal bugs, diseases, and even competing plants that could eat, poison, or choke the tree.
You might be thinking of trees that are indigenous to an area. Even these compete for the resources and face the plagues of their area, but they are better adapted than trees accustomed to different environments, and even they go through the cycle of life.
I think his analogy was perfect, because this is the first time coding could resemble nature. We are just used to carefully curated human-made code, as there has never been such a thing as naturally occurring code, with no human interaction, before.
The Gas Town piece reminded me of this as well. The author there leaned into role playing, social and culture analogies, and it made a lot more sense than an architecture diagram in which one node is “black box intelligence” with a single line leading out of it…
I wouldn't say it is like a tree as such, since trees at least are deterministic: the input parameters (seed, environment, sunlight) define the output.
LLM outputs are akin to a mutant tree that can decide to randomly sprout a giant mushroom instead of a branch. And you won't have any idea why despite your input parameters being deterministic.
You haven't done a lot of gardening if you don't know plants get 'randomly' (there's a biological explanation, but with the massive amounts of variables it feels random) attacked by parasites all the time. Go look at pot growing subreddits, they spend an enormous chunk of their time fighting mites.
Determinism is not strictly the opposite of randomness (though I can see why one might take them to be polar opposites). Rather, we do not even have true randomness (at least, it's not proven to exist); what we call randomness should really be called pseudorandomness. Determinism just means that if you have the same input parameters (assuming all parameters have been accounted for), you will get the same result. In other words, you can start with a particular random seed (pseudorandom seed, to be precise) and always end up with the same end result, and that would be considered deterministic.
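For example, here's a minimal Python sketch of that idea (the seed values and the toy "process" are arbitrary, just for illustration): same seed in, same result out.

    import random

    def grow(seed):
        # Same seed -> same pseudorandom sequence -> same end result.
        rng = random.Random(seed)
        return [rng.randint(0, 9) for _ in range(5)]

    assert grow(42) == grow(42)  # identical runs: deterministic
    print(grow(42), grow(43))    # a different seed (almost certainly) gives a different result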
> You haven't done a lot of gardening if you don't know plants
I grow "herbs".
> there's a biological explanation
Exactly. There is always an explanation for every phenomenon that occurs in this observable, physical world. There is a defined cause and effect, even if it "feels random". That's not how it is with LLMs. Because in between your deterministic input parameters and the output that is generated, there is a black box: the model itself. You have no access to the billions of parameters within the model, which means you are not sure you can always reproduce the output. That black box is what causes non-determinism.
EDIT: just wanted to add - "attacked by parasites all the time" is why I said if you have control over the environment. Controlling the environment encompasses dealing with parasites as well. Think of a well-controlled environment like a lab.
Do you think LLMs sidestep cause and effect somehow? There's an explanation there too, we just don't know it. But that's the case for many natural phenomena.
In what world are trees deterministic? There are a set of parameters that you can control that give you a higher probability of success, but uncontrollable variables can wipe you out.
Explained here [1]. We live in a pseudorandom world. So everything is deterministic if you have the same set of input parameters. That includes trees as well.
I am not talking about controllable/uncontrollable variables. That has no bearing on whether a process is deterministic in theory or not. If you can theoretically control all variables (even if you practically cannot), you have a deterministic process as you can reproduce the entire path: from input to output. LLMs are currently a black box. You have no access to the billions of parameters within the model, making it non-deterministic. The day we have tools where we can control all the billions of parameters within the model, then we can retrace the exact path taken, thereby making it deterministic.
Except that the tree is so malformed and the core structure so unsound that it can't grow much past germination and dies of malnourishment, because you have zero understanding of biology, forestry, and related fields, so there is no knowledge to save it or help it grow healthy.
Also out of nowhere an invasive species of spiders that was inside the seed starts replicating geometrically and within seconds wraps the whole forest with webs and asks for a ransom in order to produce the secret enzyme that can dissolve it. Trying to torch it will set the whole forest on fire, brute force is futile. Unfortunately, you assumed the process would only plagiarize the good bits, but seems like it also sometimes plagiarizes the bad bits too, oops.
If it's "Who is worse Google or LLMs?", I think I'll say Google is worse. The biggest issue I see with LLMs is needing to pay a subscription to tech companies to be able to use them.
You don't even need to do that - pay a subscription, I mean. A Gemma 3 4B model will run on near-potato hardware at usable speeds and, for many purposes, achieves performance on par with ChatGPT 3.5 Turbo or better, in tasks far more beneficial than ad tech and min/maxing media engagement. Or there are the free tiers of many SOTA web LLMs, available to anyone with a web browser.
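For what it's worth, here's a minimal sketch of the local route, assuming the `ollama` Python client is installed (`pip install ollama`) and the model has been pulled with `ollama pull gemma3:4b`; the model tag and prompt here are just examples:

    import ollama

    # Assumes a local Ollama server is running and gemma3:4b has been pulled.
    reply = ollama.chat(
        model="gemma3:4b",
        messages=[{"role": "user", "content": "Name three uses for a small local LLM."}],
    )
    print(reply["message"]["content"])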
I think it's more comforting to believe that there was nothing you could have done about your life being bad. It's so obvious that you need both good luck and hard work.
Then why not say "they are just computer programs"?
I think the reason people don't say that is because they want to say "I already understand what they are, and I'm not impressed and it's nothing new". But what the comment you are replying to is saying is that the inner workings are the important innovative stuff.
> Then why not say "they are just computer programs"?
LLMs are probabilistic or non-deterministic computer programs, plenty of people say this. That is not much different than saying "LLMs are probabilistic next-token prediction based on current context".
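To make that concrete, here's a toy Python sketch of sampling a next token; the candidate tokens and scores below are made up, where a real model would compute them from the whole context:

    import math
    import random

    def sample_next_token(scores, temperature=1.0, rng=random):
        # Softmax over the model's scores (with temperature), then sample one token.
        scaled = {tok: s / temperature for tok, s in scores.items()}
        m = max(scaled.values())
        weights = {tok: math.exp(s - m) for tok, s in scaled.items()}
        total = sum(weights.values())
        probs = {tok: w / total for tok, w in weights.items()}
        return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

    # Made-up scores for the word after "The cat sat on the":
    scores = {"mat": 4.0, "sofa": 3.1, "roof": 2.2, "moon": 0.5}
    print(sample_next_token(scores, temperature=0.8))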
> I think the reason people don't say that is because they want to say "I already understand what they are, and I'm not impressed and it's nothing new". But what the comment you are replying to is saying is that the inner workings are the important innovative stuff.
But we already know the inner workings. It's transformers, embeddings, and math at a scale that we couldn't do before 2015. We already had multi-layer perceptrons with backpropagation, recurrent neural networks, and Markov chains before this, but the hardware to do this kind of contextual next-token prediction simply didn't exist at those times.
I understand that it feels like there's a lot going on with these chatbots, but half of the illusion of chatbots isn't even the LLM, it's the context management that is exceptionally mundane compared to the LLM itself. These things are combined with a carefully crafted UX to deliberately convey the impression that you're talking to a human. But in the end, it is just a program and it's just doing context management and token prediction that happens to align (most of the time) with human expectations because it was designed to do so.
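To illustrate how mundane that context management is, here's a sketch of a chat loop where generate_reply is a stand-in for whatever LLM you'd actually call; the "memory" is just a list of messages that gets re-sent every turn:

    def generate_reply(messages):
        # Stand-in for the actual LLM call; it only echoes the last message.
        return f"(model reply to: {messages[-1]['content']!r})"

    def chat_turn(history, user_text):
        history.append({"role": "user", "content": user_text})
        reply = generate_reply(history)  # the model sees the whole history each turn
        history.append({"role": "assistant", "content": reply})
        return reply

    history = [{"role": "system", "content": "You are a helpful assistant."}]
    print(chat_turn(history, "Hi there"))
    print(chat_turn(history, "What did I just say?"))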
The two of you seem to be implying there's something spooky or mysterious happening with LLMs that goes beyond our comprehension of them, but I'm not seeing the components of your argument for this.
No one understands how an LLM works. Some people just delude themselves into thinking that they do.
Saying "I know how LLMs work because I read a paper about transformer architecture" is about as delusional as saying "I read a paper about transistors, and now I understand how Ryzen 9800X3D works". Maybe more so.
It takes actual reverse engineering work to figure out how LLMs can do small bits and tiny slivers of what they do. And here you are - claiming that we actually already know everything there is to know about them.
I never claimed we already know everything about LLMs. Knowing "everything about" anything these days is impossible given the complexity of our technology. Even the antenna, a centuries-old technology, is something we're still innovating on and don't completely understand in all domains.
But that's a categorically different statement than "no one understands how an LLM works", because we absolutely do.
You're spending a lot of time on whether we do or don't know how LLMs work, but you're not talking at all about what it is that you think we do or do not understand. Instead of describing what you think the state of knowledge about LLMs is, can you talk about what it is that you think is unknown or not understood?
I think the person you are responding to is using a strange definition of "know."
I think they mean "do we understand how they process information to produce their outputs" (i.e., do we have an analytical description of the function they are trying to approximate).
You and I mean "we understand the training process that produces their behaviour" (and this training process is mainly standard statistical modelling / ML).
I agree. The two of us are talking past each other, and I wonder if it's because there's a certain strain of thought around LLMs that believes that epistemological questions and technology that we don't fully understand are somehow unique to computer science problems.
Questions about the nature of knowledge (epistemology and other philosophical/cognitive studies) in humans are still unsolved to this day, and frankly may never be fully understood. I'm not saying this makes LLM automatically similar to human intelligence, but there are plenty of behaviors, instincts, and knowledge across many kinds of objects that we don't fully understand the origin of. LLMs aren't qualitatively different in this way.
There are many technologies that we used that we didn't fully understand at the time, even iterating and improving on those designs without having a strong theory behind them. Only later did we develop the theoretical frameworks that explain how those things work. Much like we're now researching the underpinnings of how LLMs work to develop more robust theories around them.
I'm genuinely trying to engage in a conversation and understand where this person is coming from and what they think is so unique about this moment and this technology. I understand the technological feat and I think it's a huge step forward, but I don't understand the mysticism that has emerged around it.
> Saying "I know how LLMs work because I read a paper about transformer architecture" is about as delusional as saying "I read a paper about transistors, and now I understand how Ryzen 9800X3D works". Maybe more so.
Which is to say, not delusional at all.
Or else we have to accept that basically hardly anyone "understands" anything. You set an unrealistic standard.
Beginners play abstract board games terribly. We don't say that this means they "don't understand" the game until they become experts; nor do we say that the experts "haven't understood" the game because it isn't strongly solved. Knowing the rules, consistently making legal moves and perhaps having some basic tactical ideas is generally considered sufficient.
Similarly, people who took the SICP course and didn't emerge thoroughly confused can reasonably be said to "understand how to program". They don't have to create MLOC-sized systems to prove it.
> It takes actual reverse engineering work to figure out how LLMs can do small bits and tiny slivers of what they do. And here you are - claiming that we actually already know everything there is to know about them.
No; it's a dismissal of the relevance of doing more detailed analysis, specifically to the question of what "understanding" entails.
The fact that a large pile of "transformers" is capable of producing the results we see now, may be surprising; and we may lack the mental resources needed to trace through a given calculation and ascribe aspects of the result to specific outputs from specific parts of the computation. But that just means it's a massive computation. It doesn't fundamentally change how that computation works, and doesn't negate the "understanding" thereof.
Understanding a transistor is an incredibly small part of how Ryzen 9800X3D does what it does.
Is it a foundational part? Yes. But if you have it and nothing else, that adds up to knowing almost nothing about how the whole CPU works. And you could come to understand much more than that without ever learning what a "transistor" even is.
Understanding low level foundations does not automatically confer the understanding of high level behaviors! I wish I could make THAT into a nail, and drive it into people's skulls, because I keep seeing people who INSIST on making this mistake over and over and over and over and over again.
My entire point here is that one can, in fact, reasonably claim to "understand" a system without being able to model its high level behaviors. It's not a mistake; it's disagreeing with you about what the word "understand" means.
For the sake of this conversation "understanding" implicitly means "understand enough about it to be unimpressed".
This is what's being challenged: that you can discount LLMs as uninteresting because they are "just" probabilistic inference machines. This completely underestimates just how far you can push the concept.
Your pedantic definition of understand might be technically correct. But that's not what's being discussed.
That is, unless you assign metaphysical properties to the notion of intelligence. But the current consensus is that intelligence can be simulated, at least in principle.
Saying we understand the training process of LLMs does not mean that LLMs are not super impressive. They are shining testaments to the power of statistical modelling / machine learning. Arbitrarily reclassifying them as something else is not useful. It is simply untrue.
There is nothing wrong with being impressed by statistics... You seem to be saying that to call LLMs statistics is to dismiss them. I think perhaps you are just implicitly biased against statistics! :p
Is understanding a system not implicitly saying you know how, on a high level, it works?
You'd have to know a lot about transformer architecture and some reasonable LLM-specific stuff to do this, beyond just those basics listed earlier.
Where I'd put "understand" is when it's not just a black box, but you can say something meaningful to approximate its high-level behavior. Transistors won't get you to CPU architecture, and transformers don't get you to LLMs.
There is so much complexity in interactions of systems that is easy to miss.
Saying that one can understand a modern CPU by understanding how a transistor works is kinda akin to saying you can understand the operation of a country by understanding a human from it. It's a necessary step, probably, but definitely not sufficient.
It also reminds me of a pet peeve in software development where it's tempting to think you understand the system from the unit tests of each component, while all the interesting stuff happens when different components interact with each other in novel ways.
Even saving can be seen as greed. Someone can focus too much on accumulating for themselves. Both investing and saving can be seen as preparation.
To avoid things becoming evil, you just need to make sure that your interactions with others are cooperative and not zero-sum, and not all investments are zero-sum.