One thing I find that constantly causes pain for users is assuming that any of these models are thinking, when in reality they're completing a sentence. This might seem like a nitpick at first, but it's a huge deal in practice: if you ask a language model to evaluate whether a solution is right, it's not evaluating the solution, it's giving you a statistically likely next sentence in which yes and no are both fairly common. If you tell it it's wrong, the likely next sentence is something affirming that, but it doesn't really make a difference.
The only way to use a tool like this is to give it a problem that fits in context, evaluate the solution it churns out, and re-roll if it isn't correct. Don't tell a language model to think, because it can't and won't; that's just a less efficient way of re-rolling the solution.
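To make that concrete, here's a rough sketch of the re-roll loop I mean, in Python. Everything in it is hypothetical: generate() stands in for whatever model call you actually use, and check() is something you write yourself (run the tests, compile the code, compare against a known answer), since the whole point is that the evaluation happens outside the model.

    from typing import Callable, Optional

    def generate(prompt: str) -> str:
        """Hypothetical model call; swap in whatever you actually use."""
        raise NotImplementedError

    def reroll(prompt: str, check: Callable[[str], bool], max_attempts: int = 5) -> Optional[str]:
        """Sample fresh completions and keep the first one that passes an external check."""
        for _ in range(max_attempts):
            candidate = generate(prompt)  # a fresh sample, not an "are you sure?" follow-up turn
            if check(candidate):          # deterministic evaluation done outside the model
                return candidate
        return None                       # give up rather than argue with the model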
You’re right and wrong at the same time. A quantum superposition of validity.
The word “thinking” is doing too much work in your argument, but arguably “assume it’s thinking” is not doing enough.
The models do compute and can reduce entropy; however, they don’t do it the way we presume, because we assume every intelligence is human, or more accurately, the same as our own mind.
To see the algorithm for what it is: you can make it work through a logical set of steps from input to output, but it requires multiple passes. The models use a heuristic pattern-matching approach to reasoning instead of a computational one like symbolic logic.
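A sketch of what I mean by multiple passes, again with a hypothetical generate() standing in for the actual model call (not any real API): you impose the decomposition yourself and feed each pass's output into the next, rather than asking for the conclusion in one shot.

    def generate(prompt: str) -> str:
        """Hypothetical model call; replace with your own."""
        raise NotImplementedError

    def multi_pass(problem: str, steps: list[str]) -> str:
        """Run one model pass per explicit step, feeding each output forward."""
        state = problem
        for step in steps:
            # Each pass gets one narrow instruction plus the current state,
            # so the logical structure comes from you, not the model.
            state = generate(f"{step}\n\nCurrent state:\n{state}")
        return state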
While the algorithms themselves are computed, the virtual space in which the input is transformed into the output is not computational.
The models remain incredible and remarkable but they are incomplete.
Further, there is a huge garbage-in, garbage-out problem, as the input to the model often lacks enough information to decide on the next transformation to the code base. That’s part of the illusion of conversationality that tricks us into thinking the algorithm is like a human.
AI has always provoked human reactions like this. ELIZA was surprisingly effective, right?
It may be that average humans are not capable of interacting with an AI reliably because the illusion is overwhelming for instinctive reasons.
As engineers we should try to accurately assess and measure what is actually happening so we can predict and reason about how the models fit into systems.
Slight quibble, but the reinforcement learning from human feedback means they're trained (somewhat) on what the specific human asking the question is likely to consider right or wrong.
This is both why they're sycophantic, and also why they're better than just median internet comments.
But this is only a slight quibble, because what you say is also somewhat true, and why they have such a hard time saying "I don't know".
I’m guessing the argument is that LLMs get worse for problems they haven’t seen before, so you might assume they think for problems that are commonly discussed on the internet or seen on GitHub, but once you step out of that zone, you get plausible but logically false results.
That, or a reductive fallacy; in either case I’m not convinced. IMO they are just not smart enough (either due to lack of complexity in the architecture or training that didn’t help them generalize reasoning patterns).
They regurgitate what they're trained on, so they're largely consensus-based. However, the consensus can frequently be wrong, especially when the information is outdated.
Someone with the ability to "think" should be able to separate oft-repeated fiction from fact.