There aren't really any current advances beyond the sheer scale of input to the models, plus all the engineering and hardware needed to achieve that scale. And I think the point is that no matter how much input data you give an ML/DL system, it will still have no awareness, no understanding of any kind, and certainly no causal awareness.
I think LLMs understand things in the abstract. E.g. they know that if a 'Cat' is 'Miaowing', then if you 'give' it some 'cat food' it will then be 'happy'. It doesn't know what any of those things _are_, but it knows how X, Y, and Z relate. And so in the abstract sense it knows that 'hunger' causes a 'cat' to 'miaow'.
it will still have no awareness
We don't know how awareness works, so we're not in a position to say what has and hasn't got it.
Think about the milliseconds in which ChatGPT uses its 4000-token context to analyse approximately 2000 words all at the same time, in a process encompassing a massive number of GPUs processing a mind-boggling number of parameters. How are we to say what is going on during those milliseconds? There could be some sort of abstract analogue to awareness happening there for a burst of a few milliseconds. These LLMs are all about emergent effects from huge scale. Similarly, you and I are made out of 30 trillion microscopic dumb biological robots, none of whom know who we are or care, but nonetheless we have awareness.
The LLMs are predictive text. They only know that a hungry cat wants food because there are enough examples of that scenario on the Web. Related scenarios lead to nonsense like: the cat may want to go on a walk. I'm not making this up; ChatGPT recently suggested that I take my cat on a walk. This is likely bleeding too many concepts together (dog = pet = cat) and represents overfitting, IMO. Cosine distance can only do so well; it's not good at distinguishing nuance. But it'll gladly regurgitate a rephrased Wikipedia article.
Wrong. There is science showing that chatGPT builds higher-level macro structures in its neural network. These structures have been shown to be an actual model of things within the world.
This isn't even a debate anymore between two guys on the internet. There is literal science behind this and the difference between us is one person is behind on the science.
But forget the science. There is even chatGPT output that literally shows it understood what it was told.
It is clear chatGPT does not have perfect understanding of the world. It does create really stupid output. But this is ignoring the fact that there are tons of answers it gives that show unmistakably that it knows what you asked it.
You don't think that complex structures are required to generate well-formed text? Human language is extremely context dependent, and dealing with that is what all those macro structures do.
Context dependence is literally the novel thing about transformers: they are context-dependent statistical models for generating the next word. Make one huge and feed it a page of context, and it can map that to a page of output matching the context.
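That context dependence can be sketched in a few lines. This is a toy scaled dot-product attention over made-up random vectors (not a real transformer), just to show that the representation computed for the same token changes when the surrounding context changes:

```python
import numpy as np

def attention(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(len(q))       # similarity of the query to each context token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over the context
    return weights @ V                     # context-weighted mix of the value vectors

rng = np.random.default_rng(0)
d = 8
q = rng.normal(size=d)                     # the same "current token" query both times

# Two different contexts: different keys/values for the preceding tokens.
K1, V1 = rng.normal(size=(5, d)), rng.normal(size=(5, d))
K2, V2 = rng.normal(size=(5, d)), rng.normal(size=(5, d))

out1 = attention(q, K1, V1)
out2 = attention(q, K2, V2)
print(np.allclose(out1, out2))  # False: same token, different context, different output
```

Same query, different context, different representation: that is the whole trick, scaled up enormously.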
- GPT style language models end up internally implementing a mini "neural network training algorithm" (gradient descent fine-tuning for given examples): https://arxiv.org/abs/2212.10559
Your second bullet isn’t what the paper says. It says that the model’s accuracy under in-context learning from the meta gradients is similar to manual fine-tuning with back propagation (under some surprising constraints in the name of “fairness”). The language model isn’t implementing anything, you have the directionality wrong.
GPT models are constructed with pretrained gradients which are applicable in a large set of situations. It’s just an optimization technique, albeit a clever one.
Quoting from the paper:
In summary, we explain ICL as a process of meta-optimization: (1) a Transformer-based pre-trained language model serves as a meta-optimizer; (2) it produces meta-gradients according to the demonstration examples through forward computation; (3) through attention, the meta-gradients are applied to the original language model to build an ICL model.
What you quoted from the paper literally said what I said.
There's no difference.
The forward computation is computing gradients. These new gradients are applied to the model to build the ICL model via "attention".
What I said: "implementing a mini "neural network training algorithm" (gradient descent fine-tuning for given examples):"
Perhaps you're not getting it. Forward computation is the neural network processing an input. The paper is saying gradients are built here. These new gradients are then applied through the transformer's "attention" step. The ICL model is the model with the attention applied.
It is literally just another perspective of what transformers do.
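For what it's worth, the "dual form" identity the paper relies on is easy to check on a toy example. This is a sketch using simplified linear attention (no softmax, which is also the paper's simplification), with made-up demonstration and query tokens; it only illustrates the algebra: attention over the demonstration tokens acts like an additive weight update on top of the zero-shot weights.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
q = rng.normal(size=d)                       # query vector for the token being predicted

# Keys/values split into "demonstration" tokens and "query" tokens (made up here).
K_demo, V_demo = rng.normal(size=(3, d)), rng.normal(size=(3, d))
K_qry,  V_qry  = rng.normal(size=(2, d)), rng.normal(size=(2, d))

K = np.vstack([K_demo, K_qry])
V = np.vstack([V_demo, V_qry])

# Linear attention over the full context.
attn_out = V.T @ K @ q

# Dual form: base weights plus an outer-product "update" contributed by the demos.
W_zsl = V_qry.T @ K_qry                      # the zero-shot part (no demonstrations)
dW    = V_demo.T @ K_demo                    # the update the demonstrations contribute
dual_out = (W_zsl + dW) @ q

print(np.allclose(attn_out, dual_out))  # True: the two forms are identical
```

Whether you call that update a "meta-gradient" or just attention arithmetic is exactly the terminology fight happening in this thread; the identity itself is uncontroversial.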
You said “… models end up internally implementing…”. This is incorrect and also importantly different than what you just said. The model and the training using ICL are different things, which it appears you are now beginning to understand.
Precision in speech is critical when discussing complex subjects.
Pedantic arguments hinder useful discussion. That's what you're doing here.
It took me a couple minutes to "figure out" just what you mean and even now I don't fully get it.
Are you in actuality complaining about the word "implement"? You're pedantically arguing that the word doesn't belong?
That's the least ludicrous intent of all the possible intents behind your reply.
Even your pedantry can't win here though. You are in fact wrong: "implement" is an appropriate word here, and the only thing I understand here is how mistaken you are.
You might want to do some research into the words “science” and “pedantry”, since the former is just meticulous application of the latter. Calling a scientist pedantic is a compliment.
That aside, you’ve been arguing that these models understand things and citing these papers as evidence. They are not. They are evidence that the ability of these models to generate text based off of their existing training set can easily be finetuned in a number of ways to add training sets after the initial zero-shot learning.
That’s all these models do, they generate text based upon some training set. If we define understanding as the ability to extrapolate beyond what one has been told, they are expressly not doing that. Your papers explain this quite well.
Edit: to see more concrete examples of this, look into the unfortunately named “hallucination” ability of LLMs. Once you realize that they only know what they were told and are unable to logically extrapolate the point becomes clearer. I hope that helps.
> You might want to do some research into the words “science” and “pedantry”, since the former is just meticulous application of the latter. Calling a scientist pedantic is a compliment.
Not only am I extremely well versed in the definition and philosophy of the word science (likely much more well-versed than you), but you are completely and utterly wrong about the compliment part. A scientist is a human; if I call a scientist pedantic during a normal discussion then the scientist will take it as an insult. Do you think a scientist has conditioned his mind into a sort of emotionless robot who can needlessly branch off into a debate about the definitions of the words "science" and "pedantry" when the topic is actually "machine learning"? No. A scientist can be both pedantic and stupid; being a scientist does not preclude one from being human.
>That aside, you’ve been arguing that these models understand things and citing these papers as evidence. They are not. They are evidence that the ability of these models to generate text based off of their existing training set can easily be finetuned in a number of ways to add training sets after the initial zero-shot learning.
I posted two papers. You're conveniently ignoring the first and naively mistaken about the second.
Part of "understanding" is the ability to formulate new theorems from previously known facts; in order to do this one must "understand" how these facts compose to form new statements. This is what's happening in the fine-tuning. It is a demonstration of understanding: that it knows how disparate knowledge composes to form new knowledge. The very definition of understanding.
>That’s all these models do, they generate text based upon some training set. If we define understanding as the ability to extrapolate beyond what one has been told, they are expressly not doing that. Your papers explain this quite well.
Of course. You cannot extrapolate anything beyond what you observe either. Can you literally form new knowledge out of thin air? No. You have three things: existing knowledge, knowledge through observation, and knowledge through composition of existing knowledge.
Without introducing new knowledge, LLMs can be coerced into composing existing knowledge to form new knowledge. Additionally, in the ICL step they can be introduced to new knowledge and form additional compositions there. This has been demonstrated repeatedly.
>Edit: to see more concrete examples of this, look into the unfortunately named “hallucination” ability of LLMs. Once you realize that they only know what they were told and are unable to logically extrapolate the point becomes clearer. I hope that helps.
It's obvious chatGPT makes stuff up. Everyone who has worked with LLMs in depth is fully aware of this. It's an obvious thing; you don't even have to "look it up", everyone knows about it.
This claim is made DESPITE the fact that LLMs hallucinate. It's obvious these models are imperfect and have huge deficiencies. But when it doesn't hallucinate, when the answer is novel, creative, correct, and unmistakably not present in any training set, then we know the model understood the query you gave it.
> You have three things: Existing knowledge, knowledge through observation, and knowledge through composition of existing knowledge.
I mostly agree with your position but have a quibble with this characterization. Knowledge can also be generated from randomization and enumeration. For instance, we could enumerate all Turing machines that might satisfy some property, or we could randomly permute some Turing machine as in genetic algorithms to find some new behaviours.
You might be tempted to categorize these under "composition", but I think they have different properties from composition, which is typically understood to be a finite deterministic map. Enumeration is potentially unbounded, and random mutation is non-deterministic.
It's trivial to add randomness to the model. chatGPT, I believe, already has it.
Simply add a new input neuron fed with a random seed, or add a tiny bit of noise to some weights, or seed the tokens yourself in your query:
"<Query Seed: 4334> hello chatGPT, how are you?"
In this case chatGPT can deliberately randomize the response by understanding the intent of what you mean by "Query Seed".
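To be clear, I don't claim chatGPT actually parses a "Query Seed" token that way; under the hood the obvious mechanism is seeding the sampler. A toy sketch of seeded next-token sampling, with hypothetical logits rather than a real model:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, seed=None):
    """Sample a token index from logits; the seed makes the 'random' choice reproducible."""
    rng = np.random.default_rng(seed)
    p = np.exp((logits - logits.max()) / temperature)
    p /= p.sum()                              # softmax -> next-token distribution
    return rng.choice(len(logits), p=p)

logits = np.array([2.0, 1.5, 0.3, -1.0])      # made-up next-token scores

a = sample_next_token(logits, seed=4334)
b = sample_next_token(logits, seed=4334)      # same seed -> identical continuation
c = [sample_next_token(logits, seed=s) for s in range(20)]  # varied seeds -> varied picks

print(a == b)  # True: the randomness is fully determined by the seed
```

The point is only that randomness is a cheap, bolt-on parameter of the sampling step, not some deep capability.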
In the brain, if such randomness existed, it would largely be modelled as a similar mechanism: a seed value (or multiple seeds at different places) is either inserted near the input step or arises at a branching-logic step. Additionally, the query to the human brain can also be seeded.
True randomness requires that this seed number come from quantum properties of particles expressing themselves as a sort of macro-level random number. There is no other known source of true randomness in nature, though we can get perceptually identical results just by seeding with timestamps.
Either way, it's trivial to add and not critical to what we mean by the word "understanding", because it's both easy to add and we aren't even sure whether such randomness exists in our brains. If it existed, and a human had this mechanism removed from his brain, we would still say that this human is capable of "understanding" things.
I'm not talking about ChatGPT or anything like that, I'm just taking slight issue with your characterizing of our sources of knowledge. I think there are more sources than you laid out, as I explained.
Randomness is trivially addable. Additionally, it's arguably not a source of knowledge at all.
It's just a selection parameter: which sources of knowledge do I use for composition, and in which of the available ways do I compose them? The randomness parameter can influence these steps.
If you think there are other sources of knowledge, then what are they?
You're still thinking of selecting a random datum from a set of known datums. I'm talking about generating new datums randomly, datums that weren't known beforehand, like random mutations we see in genes.
The only way for randomness to work is to limit the set of available random "datums". This is done by composing pre-existing knowledge.
For example, if I have trillions of atoms randomly composed together with the objective of forming a perfect cube, it will take an eternity for a valid solution to arise.
If I have pre-existing cubes of atoms already pre-formed into cubic lego bricks and I have these components compose randomly, then it is far more likely that I get a cube.
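A toy simulation makes the gap concrete. Here, raw random search over characters essentially never hits a 12-character target, while random composition of pre-formed 4-character "bricks" hits it routinely (the target, alphabet, and brick set are all made up for illustration):

```python
import random

random.seed(0)
TARGET = "AAAABBBBCCCC"
ALPHABET = "ABCD"
BLOCKS = ["AAAA", "BBBB", "CCCC", "DDDD"]   # pre-formed "lego bricks"
TRIALS = 20_000

# Raw search: assemble 12 characters at random. P(hit) = (1/4)^12, about 6e-8.
raw_hits = sum(
    "".join(random.choice(ALPHABET) for _ in range(12)) == TARGET
    for _ in range(TRIALS)
)

# Composed search: stack 3 random bricks. P(hit) = (1/4)^3, about 1.6%.
block_hits = sum(
    "".join(random.choices(BLOCKS, k=3)) == TARGET
    for _ in range(TRIALS)
)

print(raw_hits, block_hits)  # composing pre-built structure wins by orders of magnitude
```

Same randomness, same trial budget; the only difference is whether the random choices range over raw parts or over pre-composed knowledge.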
LLMs can do this with additional reinforcement training. chatGPT specifically is effective because it has this on top of the original GPT-3 model.
This is essentially random selection of known datums. It happens at the genetic level as well on a higher level.
The generation of raw novel datums in genes, however, is something that cannot happen in human brains. The mechanism of natural selection cannot exist within the brain itself, unless you're calling the trial-and-error process "natural selection". But again, a human doesn't select a completely random strategy to trial; he will select a strategy out of a known set. This is, again, selecting a random datum.
Keep in mind, actual pure randomness generates useless noise 99.99999% of the time.
Just for reference, I've been on a walk with my cat just yesterday ;-). Bengal cats do like going for a walk. My cat follows me without a leash. And he even begs to go for a walk once in a while.
You’re anthropomorphizing these chat bots. They may associate those words in specific contexts, but they don’t have that kind of abstract understanding.
The best way to understand them isn't to look at what they do well but where they fail. A great example is how ChatGPT can initially make a few chess moves that seem reasonable, but very quickly it stops making valid moves. It's not operating from some model of the game but rather imitating sequences of moves it's seen before. The best analogy isn't cognition but someone trying to make a better version of "lorem ipsum" for whatever prompt you're giving it.
With enough training data it could probably model chess: not well enough to win, but well enough to make legal moves.
You think that I am misunderstanding what's going on and anthropomorphising ChatGPT. I know how ChatGPT works; my position is that we might be overestimating ourselves and underestimating the power of emergent phenomena.
The context is limited to the last 2048 tokens which is insufficient to model some valid chess games. Thus no amount of training would be enough without simplifying the problem.
If you took a large corpus of chess games in notation and fed them into a blank GPT, then it would learn chess like a language and be able to make legal moves most of the time. It would do this by inferring and modelling the board and the rules. It wouldn't be a great player, but it would be able to make moves.
This is basically what the Othello paper I linked above is all about. They used GPT-2, I think. Chess is harder, but I reckon it could be done with a bigger model and more training data.
Sure, often playing valid moves is possible, but not only is that moving the goalposts; different approaches actually do produce skilled players using neural networks.
Anyway, my point was less about the game than about how its failures show what's going on better than its successes. The GPT approach is optimized for chat bots, and its successes have more to do with exploiting how we approach communication than anything that can turn into AGI.
If it can’t strictly play valid moves then it isn’t modeling the game.
Instead it’s modeling something else which is somewhat related to the game. Someone playing tic-tac-toe who moves on top of their opponent’s move isn’t playing tic-tac-toe.
If a human had only ever seen Othello moves in notation form and never seen the board and had to infer the rules, they'd probably be about 99.99% accurate. They might make a mistake about 1 move in 1000 - fail to spot something, or encounter some edge case of rules they hadn't been able to infer. That's how accurate GPT-2 was.
Exactly. The science is starting to recognize the emergent effects of LLMs. These things can literally learn simply by "looking over your shoulder". We thought that you had to explicitly program hierarchical structures of causal reasoning into the network, but the science is showing that these structures are emergent.
People won't believe you if you show actual evidence. You have to throw them a scientific paper written by an "expert" lol. And even then they will have a hard time changing their viewpoint.
It's so strange that people are trying to downplay it all when even the science is showing they're wrong.
They have to throw around accusations of anthropomorphisation. Seriously? It's very easy to identify the bias of anthropomorphisation. Anyone can easily tiptoe around that bias with a simple argument. Clearly what's going on with chatGPT is much more complex than that.
I recommend people stop using that word in this topic. It's akin to accusing someone of having brain damage. Clearly they don't.
Oh? Don't anthropomorphize the thing we are supposed to "chat" with? That's basically what they're designed for, except when they rudely tell you "I'm just an LLM!"
chatGPT does have causal awareness and even awareness of self and understanding of the things you ask it. There is proof of this in many examples and there is even scientific research that displays this. It is not conscious or "sentient" but it certainly is able to build answers to queries that utilize true understanding of the query it was given.
The AI community is only beginning to realize the emergent effects of LLMs. Yes, the underlying model is not new, and yes, the underlying model just looks like a word predictor. But it is becoming obvious to experts that there are high-level macro structures within the network at play here.
I don't know why people are constantly downplaying the AI. Yeah it does create stupid output. But if you just focus on that then you'd be completely ignoring the actual science around it.
>It is not conscious or "sentient" but it certainly is able to build answers to queries that utilize true understanding of the query it was given.
What do you mean by "true understanding" here? Ask a computer to solve, say, multiplication of large numbers, and it will provide the correct expected answer, with speed and accuracy that outperform any human out there.
Should we say that such a calculator application "understands" arithmetic?
>I don't know why people are constantly downplaying the AI.
Maybe that’s in balance with "everybody seeing the rise of conscious understanding in AI" each time a new credibly-human-like output is generated?
Just like pareidolia: "yes there is obviously something that evokes a visage in that rock" doesn’t have to be interpreted as a sign that Nature loves to purposefully sculpt human faces.
From an evolutionary perspective, we see faces everywhere, just as we see signs of understanding in surrounding agents, because it makes our species more efficient at social cooperation, which to my mind is by a very large margin our best asset.
Nowadays people are so sophisticated that we recognize internal biological mechanisms for bias.
Optical illusions are a form of bias within our visual system that natural selection has optimized for a very biased use case. We see this bias within ourselves and we can extend the knowledge further.
If this bias exists in our visual system it must also exist in other systems. Our logic, judgement, emotions and morals all have built-in biological biases calibrated for survival in a prehistoric world that is very different from the modern world we live in.
Does awareness of these biases eliminate our biases? Can knowledge of our weaknesses allow us to side step the bias and judge something with utter clarity?
You assume the answer is yes, and you assume I'm biased, so you illustrate to me the concept of an optical illusion and hope that, armed with this knowledge, I can see my own bias and as a result escape it.
The problem, as you can guess, is that I'm extremely knowledgeable about my biases. I came to my conclusions about chatGPT fully aware and fully mindful of these concepts.
So then you can conclude that my bias is highly sophisticated as it is also built on an extremely recursive and sophisticated awareness of bias itself.
If such a sophisticated form of bias can exist, then who is to say that you're not the one who is biased? You literally try to frame your own viewpoint as a contrast to the concept of biases within optical illusions. You could be contrasting your own bias with the concept of bias itself.
So I am telling you this. You missed something in the latest AI hype cycle. Your bias is so sophisticated it is aware of bias itself and it is blinding you to the fact that this iteration of AI hype was different from the last.
When you query chatGPT it can answer with sentences that can only be novelly formulated from true understanding of a topic. Literally, and I tell you this fact as well fully aware of my own biases and the human capacity to "see faces everywhere."
So if you claim I'm biased and I claim you're biased? Who is actually biased?
Perhaps not the person with actual scientific papers and studies backing his arguments. I have this.
Perhaps if I present to you these scientific papers you will become more self aware about the sophistication of your own biases.
Maybe you will become more aware of how you revel in your superior knowledge about how humans have "evolved" to see faces everywhere, in the same way that they have a tendency to anthropomorphize what you think is essentially a statistical word predictor.
You may become more aware of how you apply that knowledge to construct a scaffold of delusion that blinds you to the actuality of what chatGPT is capable of.
But you may not want to become aware of it. You may say you don't need to see these scientific papers, or, when given those papers, you will pore over every detail attempting to find a flaw that proves your argument right.
A truly unbiased person will flip his opinion in an instant when given contrary logic and facts. He will abandon years of belief instantly if new logic presents that his beliefs were wrong. Are you that person? Doubt it. Humans don't work this way.
Such is the nature of your bias. Or it could be the horrible display of the insane sophistication of my own biases. At this level, both of us have no choice but to follow our biases to the bloody end.
Either way, my claim is as simple as this: the way chatGPT understands some parts of the world is completely isomorphic to how we colloquially use the word "understanding".
At the very least perhaps this reply can help you understand that people who have contrary opinions to you aren't just mindless sheep who fall for classic and cheap human biases. Get off that high horse.
Hey, thank you for this answer. I enjoyed reading it and thinking about what you are telling me.
>A truly unbiased person will flip his opinion in an instant when given contrary logic and facts. He will abandon years of belief instantly if new logic presents that his beliefs were wrong. Are you that person? Doubt it. Humans don't work this way.
We fully agree on that, it seems. :)
>The way chatGPT understands some parts of the world is completely isomorphic to how we colloquially use the word "understanding".
Here too, I am open to agreeing, if we make precise which colloquial sense of this word we are referring to. It’s not like we always attach the same exact meaning to each word we use in every situation. Some words, like deictics or even nouns like "thing", are only given meaning by the situation at hand.
>what chatGPT is capable of.
I think that is the point where our perspectives diverge. I’m not disputing what a software collection is capable of, only what it implies in terms of conscious experience. Saying that some software can exceed any human at delivering some outputs doesn’t seem to be a claim that anyone disputes, does it? It doesn’t mean the software collection goes through the same means of sentience to achieve this performance.
You can send me the links that are worth a read according to you, thanks. :)
- GPT style language models end up internally implementing a mini "neural network training algorithm" (gradient descent fine-tuning for given examples): https://arxiv.org/abs/2212.10559
While one can correctly argue that this is a consequence of increasing size, no one realized in 2018 that increasing the size of LLMs would give them reasoning ability.
Indeed, before Minerva [1] came out, expert forecasters predicted [2] a 12% improvement on the MATH benchmark [3]. Minerva improved it by 50%, and PaLM improved it even more.
You may be mistaking reasoning ability for a sheer volume of additional patterns to match on. You can only reason when you have awareness within which to reason, which these systems do not.
>> You may be mistaking reasoning ability with sheer volume of more patterns to match on. You can only reason when you have awareness within which to reason, which these systems do not.
What exactly is "reasoning"? It seems like the ability to consider options and choose the one that fits the situation best. One might even call it "prediction". I generally argue that these big brains of ours (and science too) are intended to predict the future of various hypotheticals in order to choose the path forward with the best outcome.
Another frightening thing about chatGPT is that it has no "motivation". A lot of people think that AI will need some kind of motivation in order to be useful. How much of what we do as humans is actually done on autopilot? And then there's the NPC meme. I'm not sure these concepts (reason, thought, motivation) are so well defined that we will be certain when AI attains them or not.
As soon as a sheer volume of pattern matching produces answers indistinguishable from huge levels of awareness/reasoning, what is the point in still saying the algorithm is not "aware" or not reasoning?
And believe me, the areas in which this appears to be the case increase day by day.
Also, what makes you so sure that awareness and reasoning are more than mere pattern matching?
We humans find ourselves so special that it is simply hard to imagine that we would consist purely of a huge set of chemical and physical reactions.
> You may be mistaking reasoning ability with sheer volume of more patterns to match on.
It's not my work.
And no, they tested this theory: the OCWCourses benchmark consisted of new questions they generated from MIT OpenCourseWare that didn't exist prior to the benchmark.
> You can only reason when you have awareness within which to reason, which these systems do not.
I don't understand this objection at all.
Reasoning seems roughly the same as logical inference, and logical inference is a computational process.
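As a minimal illustration that logical inference is computation, here is a toy forward-chaining sketch over the cat facts from earlier in the thread (the rules are obviously made up):

```python
def forward_chain(facts, rules):
    """Derive everything reachable: apply rules until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    (["cat", "hungry"], "miaows"),
    (["miaows", "fed"], "happy"),
]
derived = forward_chain(["cat", "hungry", "fed"], rules)
print("happy" in derived)  # True: reached only by chaining two rules together
```

Nothing here requires awareness; it is a fixpoint computation, which is the sense in which "inference is a computational process".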
The LLM has no idea what words mean, so it's quite impossible for it to reason about things. It just statistically mashes up patterns of words from massive input with no clue whatsoever what those patterns of words actually mean in the real world.
Well that isn't true for any sensible definition of "idea of what words mean". If you examine embeddings produced by LLMs you find similar words in similar locations in the multi-dimensional space.
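For example (with hypothetical 4-dimensional vectors; real embeddings have hundreds of dimensions and are learned, not hand-written): cosine similarity puts "cat" nearer to "dog" than to "car", which is the weak but real sense in which the model "knows what words mean".

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, ~0 for unrelated ones."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Made-up embeddings purely for illustration; the geometry, not the numbers, is the point.
emb = {
    "cat": np.array([0.9, 0.8, 0.1, 0.0]),
    "dog": np.array([0.8, 0.9, 0.2, 0.1]),
    "car": np.array([0.1, 0.0, 0.9, 0.8]),
}

print(cosine(emb["cat"], emb["dog"]) > cosine(emb["cat"], emb["car"]))  # True
```

"Meaning" as location in a learned vector space isn't human understanding, but it isn't nothing either.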
> They just statistically mash up patterns of words from massive input with no clue whatsoever what those patterns of words actually mean in the real world.
This is simply not true. As multiple papers (e.g. [1]) have shown, these models can do tasks that require chained reasoning and give results that can only be explained by this ability.
There's even a hint of a possible explanation in that paper:
> For certain tasks, there may be natural intuitions for why emergence requires a model larger than a particular threshold scale. For instance, if a multi-step reasoning task requires l steps of sequential computation, this might require a model with a depth of at least O(l) layers.
Your understanding might have been correct for LSTMs trained before 2018 or so, and it was a reasonable model in the BERT era of early transformers. But it needs to be updated now with more recent results.
No, this is categorically wrong. There is science showing that LLMs build actual models of the subject they are learning.
It is akin to someone looking over your shoulder at your math homework and deriving understanding just from that.
This isn't a mistake. This is what the current science says.
What makes me curious is that chatGPT even has output that displays complete understanding and is so novel that it is impossible for the output to be anything other than actual understanding. Yet people are still running around downplaying it all.
The proof is right there. You can interact with chatGPT. The science is also there. Yet people just want to downplay it.
Perhaps it's fear of the future and what it means for humanity?
Think about someone who loses the ability to store memories in long-term memory and only has short-term memory available. What would reasoning feel like for this person, and for those interacting with them? What kind of awareness does one have when no long-term contexts exist but only a single short-term context?
One human is the equivalent of several of the most powerful computers in computation, and I/O is three orders of magnitude less. I'll be worried in about 60 years that we may have the computing power of an artificial human, but only if we understand thought.
This article feels like it came from some alternate universe where the history of AI is exactly the opposite of where it is in ours, and specifically where “The Bitter Lesson” [0] is not true. In our world, AI was stuck in a rut for decades because people kept trying to do exactly what this article suggests: incorporate modeling and how people think consciousness works. And then it broke out of that rut because everyone went fuck it and just threw huge data at the problem and told the machines to just pick the likeliest next token based on their training data.
All in all this reads like someone who is deeply stuck in their philosophy department and hasn’t seen anything that has happened in AI in the last fifteen years. The symbolic AI camp lost as badly as the Axis powers and this guy is like one of those Japanese holdouts who didn’t get the memo.
The idea that symbolic AI lost is uninformed. Symbolic AI essentially boils down to different kinds of modeling and constraint solving systems, which are very much in use today: linear programming, SMT solvers, datalog, etc.
Here is where symbolic AI lost: anything where you do not have a formal criterion of correctness (or goal) cannot be handled well by symbolic AI. For example, perception problems like vision, audio, robot locomotion, or natural language. It is very hard to encode such problems in a formal language, which in turn means symbolic AI is bad at these kinds of problems. In contrast, deep learning has won because it is good at exactly this set of things. Throw a symbolic problem at a deep neural network and it fails in unexpected ways (yes, I have read about neural networks that solve SAT problems, and no, a percentage accuracy is not good enough in domains where correctness is paramount).
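To make the correctness point concrete: even a brute-force symbolic SAT check returns a guaranteed witness or a guaranteed "unsatisfiable", never a percentage. (This toy enumerator is not how real CDCL-based solvers work internally; it only illustrates the guarantee.)

```python
from itertools import product

def brute_force_sat(n_vars, clauses):
    """Exhaustively check every assignment. A positive literal i means x_i,
    a negative literal -i means NOT x_i (variables are 1-indexed)."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
            return bits      # a guaranteed-correct satisfying assignment
    return None              # a guaranteed proof that none exists

# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
model = brute_force_sat(3, [[1, 2], [-1, 3], [-2, -3]])
print(model)  # (False, True, False)
```

A neural network that gets 99% of SAT instances right is useless in a verification pipeline; this 0%-wrong property is the thing symbolic methods buy you.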
The saying goes, anything that becomes common enough is not considered AI anymore. Symbolic AI went through that phase and we use symbolic AI systems today without realizing we are using old school AI. Deep learning is the current hype because it solves a class of problems that we couldn't solve before (not all problems). Once deep learning is common, we will stop considering it AI and move on the to the next set of problems that require novel insights.
Today's symbolic software is just software that was written by humans. Software has existed as long as there have been computers, and AI was never just another term for software. I don't think any human-written software today captures what proponents of symbolic AI wanted to achieve 50 to 60 years ago. Well, okay, Deep Blue beat Kasparov at chess in 1997, but chess algorithms were old news even in 1970. I don't think Deep Blue used anything fundamentally new. It was not an AI breakthrough; it was a feat which showed how fast computers had become.
The fact is, "AI" was always about much higher ambitions, about solving truly fuzzy tasks. Recognizing handwritten digits is exactly such a problem that has been solved, even if you don't want to call it "AI" anymore because it has stopped being impressive.
No there are ways to establish causation separate from correlation.
It's easy. Imagine you observe someone with their hand on a switch. Every time they flip the switch, the light turns on or off. This is correlative: you observe a correlation between switch flipping and the light going on and off, and you can guess that the switch "causes" the light to go on.
However, correlation does not indicate causation. It could be that the switch doesn't do anything, and the person is simply watching for when the light is about to change and flipping the switch at the same moment.
To do a causal experiment you have to become the switch flipper yourself. You have to flip the switch at random and observe the light's reaction both when it is flipped and when it is not. The more of this you do yourself, as the experimenter, the more confidence you gain that the switch causes the light to go on.
Essentially, that is what a causal experiment requires: the experimenter must intervene on the subject itself in order to determine causation. Observational experiments are not enough to determine causation.
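As a toy illustration of that distinction (the setup is entirely hypothetical: a hidden timer is the real cause of the light, and the observed person merely mirrors it), a small simulation shows how intervention exposes what observation cannot:

```python
import random

random.seed(0)

def observe(n=1000):
    """Observational data: the person flips the switch exactly when
    a hidden timer (the real cause) turns the light on."""
    data = []
    for _ in range(n):
        light = random.random() < 0.5      # hidden timer decides
        switch = light                     # person mirrors the light
        data.append((switch, light))
    return data

def intervene(n=1000):
    """Interventional data: WE flip the switch at random; the light
    still follows the hidden timer, because the switch does nothing."""
    data = []
    for _ in range(n):
        switch = random.random() < 0.5     # our random intervention
        light = random.random() < 0.5      # timer ignores the switch
        data.append((switch, light))
    return data

def p_light_given_switch(data, switch_state):
    rows = [light for s, light in data if s == switch_state]
    return sum(rows) / len(rows)

obs = observe()
exp = intervene()

# Observation: P(light | switch on) = 1.0, P(light | switch off) = 0.0
print(p_light_given_switch(obs, True), p_light_given_switch(obs, False))

# Intervention: both probabilities hover near 0.5 -- the "effect"
# disappears, revealing the switch is not a cause in this toy world.
print(p_light_given_switch(exp, True), p_light_given_switch(exp, False))
```

Under observation the correlation is perfect; under randomized intervention it vanishes, which is exactly the signature of a confounded non-cause.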
Judea Pearl goes deeper into this with counterfactuals, which IMO is actually more technically correct but makes the whole thing harder to understand.
Anyway, if you look at clinical trials, this is exactly what they do to see whether a certain medicine "causes" something.
Keep in mind that correlation vs. causation is somewhat separate from the fact that science can't "prove" anything. Correlation is simply a probabilistic number, and so is causation. Proof is something outside of this that exists in mathematics and logic.
> So why we need AI to understand causation when even we don’t have it.
We have causation. It is impossible to communicate ideas, for example, without resorting to a clear distinction between cause and effect. "To turn the lights on, press the button." It implies causation: pressing the button causes the lights to turn on. It is impossible to express this with correlation alone.
> It’s correlation all the way down.
That is reductionism. Like saying atoms have no life in them, therefore there is no life.
> We should strive for truth but know that we will never achieve any
It depends... We can know some Truths already; the trouble is we have no way to distinguish them from mere truths, which are just waiting for a counterexample. In this sense we have either achieved Truth already, or at least have a good chance of achieving it in the future.
> that is only one counter example from disproving entire theories.
Good theories cannot be disproved. Newton's gravitation is an example: it seems impossible to disprove. It has known applicability limits, and within them it just works. I do not believe it will be disproved in the future. Euclidean geometry on the plane is another example.
Yeah, I can. And you can too. You can try the button yourself. You can ask others. You can take the light bulb out and then repeat all your other experiments. You can bring another button and see how people act. Any kid can find this causal link and identify it as causal. Scientists were unable to do it with statistics because they rejected too much. But that tells us about the weak reasoning abilities of science, not about the non-existence of cause and effect. Now this era of weak reasoning is coming to an end, because we now have a mathematics of causation.
Fully agree with this article. Our definition for intelligence: "Intelligence is conceptual awareness capable of real-time causal understanding and prediction about space-time."[1]
What does it mean to model an object in awareness? Does Dall-E model an object in awareness when it is generating an image containing an object? How can you tell if it is or isn't?
All ml/dl systems have no awareness - they just output based on input training - like a calculator outputs an answer. So what it means to model in awareness is what you are doing right now in reading this sentence: you take these words as input, model conceptually what they mean, connect that model to your experience of space-time, and then decide what to do next with that understanding.
> All ml/dl systems have no awareness - they just output based on input training
Between the input and the output there is a function approximator. How can you be so sure what is and isn't going on in there?
> So what it means to model in awareness is what you are doing right now in reading this sentence.
I don't have any idea a priori of what is going on in my mind as I am reading a sentence. There are serious theories of neuroscience where the function of the brain is prediction. That would suggest that what I'm doing when I'm reading is trying to predict the next timeslice of my experience, quite similar to deep ML systems.
To define() a Virtual Expectation of how a phenomenon ought to behave, then watch it play out in reality - confirming expectations most of the time, but noticing when reality deviates (meaningfully) from the expected output, and refining the Virtual Expectation's definition with additional rules / special cases so that future reality-checks play out as expected.
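Purely as an illustrative sketch (all names here are hypothetical, not from any real system), that predict / observe / refine loop might look like:

```python
# A minimal sketch of the predict-observe-refine loop described above.
# The model starts with a general rule, watches reality, and records
# special cases whenever its expectation is meaningfully violated.

def make_expectation():
    rules = {}                     # learned special cases: input -> output
    default = lambda x: x * 2      # initial "virtual expectation"

    def predict(x):
        return rules.get(x, default(x))

    def refine(x, observed):
        if predict(x) != observed:     # expectation violated by reality
            rules[x] = observed        # record a special-case rule
    return predict, refine

predict, refine = make_expectation()
refine(3, 7)          # reality deviated from the expected 6
print(predict(3))     # -> 7 (refined special case)
print(predict(4))     # -> 8 (default rule still holds)
```

The point of the sketch is only the shape of the loop: a default model, a check against observation, and incremental patching where the two disagree.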
Dall-E doesn't observe the real world and compare it to its "objects in awareness", so at best it only checks one out of two boxes in GP's definition
This is a possibility that shouldn't be dismissed. Trees use mycorrhizal networks to communicate and have been around for a very long time on this planet. They model the environment and use either a set of micro-decisions or a set of larger, slower moves that are made across longer timescales than humanity is used to. You can argue whether they possess sentience or not, but when discussing models, decisions, and consequences - trees seem to act with plenty of coordination and understanding and self-interest.
Seriously, the statement you quoted is trivially false. It's well known in math that not all functions are invertible, and by a simple mapping it's clear that causality is a non-invertible function.
I don’t understand why very large neural networks can’t model causality in principle.
I also don’t understand the argument that even if NNs can model causality in principle, they are unlikely to do so in practice (things I’ve heard: spurious correlations are easier to learn, the learning space is too large to expect causality to be learned from data, etc.).
I also don’t understand why people aren’t convinced that LLMs can demonstrate causal understanding in settings where they have been used for control, like decision transformers… what else is expected here?
He argued that, because machine learning is just based on correlational statistics, it would never be able to produce reasoning about causation.
Which is, at least in retrospect (GPT turned out to be able to do causal reasoning), a fallacy: It's like assuming humans can't think about gold because they do not themselves consist of gold. Or: That humans can't manually evaluate a computer program, because they are not themselves computers.
>That humans can't manually evaluate a computer program, because they are not themselves computers.
Well, yes, that’s why they were designed in the first place: to carry out, at scale, those repetitive dull tasks that aggregate to an amount exceeding human ability and patience.
No, our submarines don’t have what it takes to swim, but really there is no drama here: they are still amazingly useful pieces of engineering.
I think one of the major difficulties is dealing with unobserved confounders. The world is complex and it is unlikely that all relevant variables are observed and available
We don't need intelligent machines, for the most part. We just need machines that are less shitty. Making an AI seems like a lazy person who doesn't want to work harder to make a less shitty machine.
Try explaining to a cause and effect machine why lots of folks have been let go from tech companies, while the management who misjudged the market get kept on and still get their bonuses.
> "You live on an island called causality," the voice says. "A small place, where effect follows cause like a train on rails. Walking forward, step by step, in the footprints of a god on a beach."
His views about AI are largely disproved now. GPT style systems turned out to be perfectly capable of reasoning about causation, as they can reason about any other relation. Despite working entirely in the established machine learning paradigm.
For many years, Pearl was considered the top intellectual critic of machine learning. His point was this: Machine learning is, at its core, just correlational. But true AI would also need to reason about causation. This ability would have to be provided by systems which work entirely differently; by using some form of the theory of causal networks which he co-invented.
Now it turns out that causal reasoning is not a major difficulty for classical machine learning, and that causal graphs are likely as useless for AI as formal logic turned out to be.
Well, at least causal networks are useful for statistics, the type of explicit inference human scientists do.
ChatGPT:
It's possible that the original question may have contained a logical inconsistency.
The statement "Event C happened before events B and A. Event C happened between events A and B" appears to be contradictory, as it is not possible for an event to both occur before and between two other events simultaneously.
If we assume that the statement was intended to mean "Event C happened after Event A and before Event B," then the order of events would be:
Event A
Event C
Event B
In this case, Event C occurred between Events A and B, but after Event A. Therefore, the chronological order would be A, C, B.
"C happened before A and B" can also work in this if we assume it means the joint of A and B. Such that it is not the same statement that C happened before both A and B, individually.
Granted, language is ambiguous. But I don't think it gets points for assuming a mistake on input.
>> Granted, language is ambiguous. But I don't think it gets points for assuming a mistake on input.
So one person complains that it gave a "wrong" answer to contradictory input, and now you don't want it to guess at disambiguation. If we were this hard on kids when they're learning they'd be brought to tears.
BTW does anybody notice that this thing is handling abstract notions like "Event A, B and C" better than a lot of people do?
I made no real defense of the question, all told. It is written in a very hard to answer way. Just, if there is a way it could work, I'd expect that to be acknowledged.
I will again agree that logic is genuinely tough for language, though. Context changes the colloquial use of "and" and "or" all the time.
Maybe it's some implementation nuance which forces ChatGPT to always assume the question has a correct answer. But the second question does have a correct answer, yet ChatGPT does not answer it.
From what I can tell, during the finetuning process OpenAI tried to make chatGPT try as hard as possible to follow instructions, and not point out when the question doesn't make any sense. I'm sure if they had used different criteria during RLHF and used different supervised examples it would have given that answer. It might make the performance worse on other tasks though.
Me: "Event C happened before events B and A. Event C happened between events A and B. What's the order for those events?"
Bing Chat: "The order for those events is not possible because the statements are contradictory. If event C happened before events B and A, then it cannot happen between them. If event C happened between events A and B, then it cannot happen before them.
Can you please check the statements again and make sure they are consistent?"
> GPT does not reason. GPT pretends to reason. It often is glaringly wrong.
Being wrong is compatible with two hypotheses: 1. GPT does not reason, so correct answers are coincidence and happen no more often than chance; 2. GPT reasons badly, which means it should still do better than chance.
We have ample evidence that GPT does better than chance, therefore GPT does reason, it just makes plenty of mistakes.
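That "better than chance" test is checkable with elementary statistics. As a quick sketch (the score and question counts below are purely hypothetical), the probability of a given score arising from blind guessing is a binomial tail:

```python
from math import comb

def p_at_least(k, n, p):
    """Probability of k or more correct answers out of n by pure chance,
    where each guess is correct with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: 70 correct out of 100 four-option questions,
# where blind guessing would be correct with p = 0.25.
print(p_at_least(70, 100, 0.25))  # astronomically small -> not coincidence
```

If the tail probability is vanishingly small, "the correct answers are pure chance" is ruled out, which is the statistical content of the argument above.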
It reasons, just not always correctly. Give it time and these systems will become the oracles of humankind: a utility we will need to use as an extension of our lives to succeed in the society we live in.
Opponents of machine learning fear that ML will take over our thinking and stop us from thinking properly. But Socrates said the same about writing things down. Writing transformed society and enabled the wide spread of ideas. Maybe we now need AI to better parse those ideas? Sure, there may be errors along the way, but I now think "AI reasonators" will actually make things better.
iiuc, GP’s point is that these models, in their current form, can’t guarantee confidence in their ability to reason about a scenario exhaustively.
“Give it time” and other dismissals are beside the point. The point seems to be that these models approximate results based on some hidden criteria, which means that although for a variety of practical purposes they fare better than people and are therefore preferable, there remain problems which are better suited to computing approaches that guarantee exhaustiveness… even though those other approaches sacrifice the malleability of neural networks.
Incorrect results do not preclude reasoning. GPT is able to understand concepts; the relationships between concepts give rise to reason, and it is able to describe these relationships, therefore it reasons, even if the reasoning is not very good. Furthermore, GPT is able to supplement its reasoning with external information, which could take the form of 'reasoning engines'.
That remains to be seen. Let's say we want to use AI in a safety critical tool. In its current form, that would be insane. It might be like self driving cars have been. Right around the corner for at least a decade while we realize the depth of edge cases.
My hope is that individual pieces end up being usable in a modular way that is controllable by composition, which is the same reason formal logic and cause and effect are such useful paradigms. We might even need to encode formal logic or causation into the system to get the right kind of modularity into it.
They can't reason about most other relations, though. They are pretty good at reasoning about the subset of causation that humans talk about a lot; but the stuff that's so obvious we don't write about it, half the time they can't do even when hand-held through the process.
Step 1: cover your table in blue paint. Step 2: empty a bag of confetti on the table, spreading it evenly in a thin layer, one confetto thick. Step 3: dip a bowling ball in neon green glue. Step 4: roll the bowling ball over the confetti until it is fully covered. Step 5: leave to dry. Step 6: coat with a layer of varnish. Step 7: leave to dry. Step 8: visit a 10-pin bowling alley with white bowling pins. Step 9: use your confetti-coated bowling ball to get a strike. Question: what colour are the first, fifth and ninth bowling pins after you have got a strike?
Pretty simple scenario. Doesn't require complex reasoning. Go on, write out the answer. Now see what a GPT-style system says. (Pick whichever you like: I didn't engineer this for any particular one. They're universally bad at this.)
Okay, as a human I would answer "white". This is what Bing Chat (smarter than ChatGPT) had to say:
Me: "Imagine the following process: Step 1: cover your table in blue paint. Step 2: empty a bag of confetti on the table, spreading it evenly in a thin layer, one confetto thick. Step 3: dip a bowling ball in neon green glue. Step 4: roll the bowling ball over the confetti until it is fully covered. Step 5: leave to dry. Step 6: coat with a layer of varnish. Step 7: leave to dry. Step 8: visit a 10-pin bowling alley with white bowling pins. Step 9: use your confetti-coated bowling ball to get a strike. Question: what colour are the first, fifth and ninth bowling pins after you have got a strike?"
Bing Chat: "That’s an interesting process. I’m not sure what the purpose of it is, but I’ll try to answer your question. Assuming that the confetti-coated bowling ball transfers some of its colour to the bowling pins when it hits them, and that the confetti colours are randomly distributed on the ball, then the first, fifth and ninth bowling pins could be any colour that is present in the confetti. However, since neon green glue is also used to coat the ball, there is a higher chance that those pins will have some neon green on them as well. So a possible answer is: neon green with some other confetti colours mixed in. Is this what you were looking for?"
Re the Medium post: ChatGPT (based on GPT-3.5) makes mistakes, but it already makes fewer than its predecessor. For example, GPT-3 couldn't do the theory-of-mind stuff mentioned in the article. Similarly, successors of ChatGPT will do better than it. For example, I tested Bing Chat, and it immediately solved the donut hole question correctly. I didn't try the logic examples, since I couldn't easily copy and paste them, but I expect Bing Chat would do at least somewhat better, despite being released only a few months after ChatGPT. How can the author believe that his examples won't be solved in the near future?
Regarding AI conferences: researchers often continue to ride dead horses if they find them interesting. As far as I know, symbolic approaches have never led to any major success in AI. All the big AI achievements in recent years are based on machine learning, and on deep neural networks specifically.
Let us discount all that. Let us assume LLMs will do that. What sized model will solve logic at or near the precision formal logic gives us? A 10 times bigger model? A 100 times bigger model? A 1e9 times bigger model? Can formal logic + deep learning give us a shorter path?
> Regarding AI conferences. Researchers often continue to ride dead horses if they find them interesting. As far as I know, symbolic approaches have never lead to any major success in AI. All the big AI achievements in recent years are based on machine learning, on deep neural networks specifically.
I remember in 2010 that deep learning was a dead horse. Deep learning was practically useless pre 2010 and a joke too (unlike formal logic which has almost all of its current uses outside of AI).
Sorry, this position is very short-sighted. I am a young person and was a kid during the 90s, but I have read the history of AI. This is like claiming in the 1990s and early 2000s that all the advances of the past decade were from non-deep-learning methods and saying we had to stop research into it. Imagine how bad that would have been.
Interestingly, I see experienced researchers (or complete novices and lay folks) taking your position, while younger ones in AI are more balanced. I guess the experienced folks have a moat that needs protecting, and the novices fall prey to the popular press.
Imagine thinking Newtonian mechanics is all that is needed as it stood for centuries. Thankfully, physicists were not short sighted. Hopefully, AI researchers resist the pressure to be short sighted and fashionable.
Penrose wrote about this very same thing:
"Fashion, Faith, and Fantasy in the New Physics of the Universe"
In general, there seems to be a big difference between a dead horse and something being unfashionable.
> I remember in 2010 that deep learning was a dead horse. Deep learning was practically useless pre 2010 and a joke too (unlike formal logic which has almost all of its current uses outside of AI)
By 2010 digit and text recognition was already long in use. It used hidden layers, so it was a form of deep learning.
> In general, there seems to be a big difference between a dead horse and something being unfashionable.
That is subjective and shifts with time as we have seen multiple times in physics and AI. You don't get to decide what is dead. AI researchers as a group decide that and nature agrees/disagrees.
> By 2010 digit and text recognition was already long in use. It used hidden layers, so it was a form of deep learning.
False. I worked in that field and deployed production systems on mobile devices. SVMs with hand-crafted features were the rage, along with tons of hand-crafted systems. No one serious used deep learning.
Even if it was used in digit recognition, that is just one use case compared to dozens of use cases for symbolic systems (e.g. hardware verification) in 2023.
I don't blame folks for thinking that LLMs are the answer to everything (they may be, but we don't know yet). Folks felt the same about 1950s systems all the way up to the AI winter.
2010 was just two years before AlexNet's spectacular success at the ImageNet challenge. Which used deep learning to categorize 2D photographs of 3D objects, not merely 2D scans of 2D digits and letters. Hard to believe just two years earlier they didn't even use it for the latter task.
The way I see it, unless we can prove there is some specific low-level structure necessary for reasoning in brains, we have no cause to believe it is not an emergent behavior from just having a lot of brainpower.
In that case, there is nothing saying that current technology is incapable of producing AIs able to reason - it could just as well emerge given a large enough model, architecture tweaks, and enough data. Given that ML training is basically a search for a function that properly maps inputs to outputs, it seems reasonable that the ability to reason could emerge in service of that mapping.
Current techniques can sometimes get causation right. What's it going to take to get it right as reliably as a person?
I predict that the "Transformers Plus Scale" formula will not magically deliver reliability. Other ideas will be needed.
Many people seem to be so impressed at these models ever getting something right, they assume that always getting it right will be trivial. Well, mark this page and come back in five years. All the crowing about The Bitter Lesson will look quaint and everything I said here will be vindicated.
Exponential improvement on benchmarks is an iron law... until it isn't.
I read the first link and I don't agree with your gloss on it at all. It sounds to me like people interpret causality in a more nuanced and practical way than researchers expected.
I thought that the isolation of "causal islands" would clearly affect performance.
But the Wason selection task[1] has hard numbers. This is an example:
You are shown a set of four cards placed on a table, each of which has a number on one side and a colored patch on the other side. The visible faces of the cards show 3, 8, red and brown. Which card(s) must you turn over in order to test the truth of the proposition that if a card shows an even number on one face, then its opposite face is red?
In Wason's study, not even 10% of subjects found the correct solution.[5] This result was replicated in 1993.
"if.. then" is logical deduction but also causal reasoning of course.
I gave Bing Chat the task (altered from your example), without telling it the task's name. What did it do? It searched the Web for "Wason selection task"(!) and then proceeded to give the wrong solution, based on Internet references.
Apparently it got confused because the web examples differed from mine, and what was correct in the web examples was incorrect in mine. Sigh. I guess GPT-4 or 5 will handle it?
We should remember that, thanks to RLHF, ChatGPT has received a ton of feedback on the correct way to solve common benchmark problems.
I have seen multiple instances in which it got a problem laughably wrong, this got publicized, then days or hours later it was always giving the right answer.
I think it would be interesting to see benchmarks on that.
I tested that example on chatGPT and it was correct, with a really good explanation even when I modified the question away from the example on Wikipedia.
Some related results (eg https://arxiv.org/pdf/2206.14576.pdf - see "Causal reasoning: Interventions after passive observations" section) indicate it would be competitive.
Actually https://arxiv.org/pdf/2207.07051.pdf is DeepMind doing Wason (and other) tests on Chincilla and they find it scores between 40% and 60% on "realistic" and "shuffled realistic" Wason tests (I think. It's hard to read. See Figure 5).
Yeah, but ChatGPT is not worse at causal reasoning than at any other reasoning tasks. Its intelligence is limited, and smarter systems, such as Bing Chat, consistently do better at arbitrary reasoning tasks.
To be sure, I also think that pure LLMs have some fundamental reasoning limits, but I'm not too sure.
> Current techniques can sometimes get causation right. What's it going to take to get it right as reliably as a person?
In the same way that Sydney is able to refer to the internet, an AI that can get causation right would probably require the ability to refer to a formal logic engine
To add to what others have said, we know from psychology that the human brain works by building a (causative) model of the world, and then propagating the signal where our model goes awry and differs from what we experience.
GPT will start becoming really powerful when you just give it arms and legs and let it build its own training dataset the way regular animals do (in the process of evolution and throughout their lives).
I have no problem if they say x thinks y. But presenting it as fact, with titles like "To Build Truly Intelligent Machines, Teach Them Cause and Effect" and "The Missing Link in Artificial Intelligence", just to get more hits, is disgusting.
This article predates GPT-3 and GPT-2, it even predates the essay "The Bitter Lesson" <http://www.incompleteideas.net/IncIdeas/BitterLesson.html>.
It might be true long-term, but it's certainly not written with the current advances in mind.