
One thing I find constantly causes pain for users is assuming that any of these models are thinking, when in reality they're completing a sentence. This might seem like a nitpick at first, but it's a huge deal in practice: if you ask a language model to evaluate whether a solution is right, it's not evaluating the solution, it's giving you a statistically likely next sentence, where yes and no are both fairly common. If you tell it it's wrong, the likely next sentence is something affirming your correction, but that doesn't mean anything actually got re-evaluated.
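
A toy sketch of the point, in pure Python (the next_token_distribution helper and its probabilities are invented for illustration, not measured from any real model):

    import random

    def next_token_distribution(conversation):
        # Hypothetical stand-in for a model's forward pass: it returns
        # P(next reply | conversation). The numbers are made up.
        if conversation.endswith("Is this solution correct?"):
            return {"Yes, it looks correct.": 0.6, "No, there is a bug.": 0.4}
        if conversation.endswith("That's wrong."):
            return {"You're right, my apologies.": 0.8, "Actually, it is correct.": 0.2}
        return {"...": 1.0}

    def reply(conversation):
        # The "evaluation" is just a draw from the conditional distribution.
        tokens, probs = zip(*next_token_distribution(conversation).items())
        return random.choices(tokens, weights=probs)[0]

    print(reply("...solution text... Is this solution correct?"))
    print(reply("...solution text... That's wrong."))

Nothing in there ever runs or checks the solution; pushing back just makes agreement the likeliest continuation.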

The only way to use a tool like this is to give it a problem that fits in context, evaluate the solution it churns out, and re-roll if it wasn't correct. Don't tell a language model to think, because it can't and won't; that's just a far less efficient way of re-rolling the solution.
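
Roughly this loop, as a sketch; complete() and run_tests() are hypothetical placeholders for whatever model call and model-free check you actually have:

    import random

    def complete(prompt, temperature, seed):
        # Hypothetical stand-in for an LLM call returning a candidate solution.
        random.seed(seed)
        return f"candidate #{random.randint(0, 999)}"

    def run_tests(candidate):
        # Your own check: unit tests, a compiler, a type checker, a human...
        return candidate.endswith("7")  # placeholder criterion

    def solve(problem, max_attempts=5):
        # Re-roll with a fresh sample instead of telling the model it's wrong
        # and hoping it "re-thinks" the answer.
        for seed in range(max_attempts):
            candidate = complete(problem, temperature=0.8, seed=seed)
            if run_tests(candidate):
                return candidate
        return None

    print(solve("reverse a linked list in place"))

The judging happens entirely outside the model; a failed check triggers a new sample, not an argument.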





You’re right and wrong at the same time. A quantum superposition of validity.

The word “thinking” is doing too much work in your argument, but arguably “assume it’s thinking” is not doing enough.

The models do compute and can reduce entropy; however, they don’t do it the way we presume, because we assume every intelligence is human, or more accurately, the same as our own mind.

To see the algorithm for what it is, you can make it work through a logical set of steps from input to output, but it requires multiple passes. The models use a heuristic, pattern-matching approach to reasoning rather than a computational one like symbolic logic.
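
A rough sketch of what those multiple passes look like in practice (complete and check are stand-ins for a model call and an external validator, not any specific API):

    def multi_pass(task, steps, complete, check):
        # Each pass feeds the previous output back in as input; the logical
        # bookkeeping (ordering the steps, validating each one) stays on our
        # side of the loop rather than inside the model.
        state = task
        for step in steps:
            state = complete(f"{step}\n\nInput:\n{state}")
            if not check(step, state):
                raise ValueError(f"pass failed: {step}")
        return state

    # multi_pass(
    #     "remove the global variable from this function",
    #     steps=["list every read and write of the global",
    #            "propose a parameter to replace it",
    #            "rewrite the function using that parameter"],
    #     complete=my_model_call,  # hypothetical
    #     check=my_validator,      # e.g. run the test suite on the final pass
    # )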

While the algorithms are computed, the virtual space in which the input is transformed into the output is not computational.

The models remain incredible and remarkable but they are incomplete.

Further, there is a huge garbage-in, garbage-out problem, as the input to the model often lacks enough information to decide on the next transformation to the code base. That’s part of the illusion of conversationality that tricks us into thinking the algorithm is like a human.

AI has always provoked human reactions like this. ELIZA was surprisingly effective, right?

It may be that average humans are not capable of interacting with an AI reliably because the illusion is overwhelming for instinctive reasons.

As engineers we should try to accurately assess and measure what is actually happening so we can predict and reason about how the models fit into systems.


But it’s also true that the next sentence is generated by evaluating the whole conversation, including the proposed solution.

My mental model is that the LLM learned to predict what another person would say just by looking at that solution.

So it’s really telling you whether the solution is likely (likely!) to be right or wrong.
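
In other words, something like this; sequence_logprob is a placeholder for any model that exposes token log-probs, with a fake body so the snippet runs:

    import math

    def sequence_logprob(text):
        # Hypothetical stand-in: a real model would sum token log-probs over
        # the text. This placeholder is meaningless and only keeps it runnable.
        return -0.01 * len(text)

    def verdict_odds(conversation_with_solution):
        # The model never executes or checks the solution; it scores which
        # review a person would most plausibly write after reading it.
        good = sequence_logprob(conversation_with_solution + "\nReviewer: Looks correct to me.")
        bad = sequence_logprob(conversation_with_solution + "\nReviewer: This is wrong.")
        return math.exp(good - bad)

    print(verdict_odds("def add(a, b): return a - b  # oops"))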


Slight quibble, but the reinforcement learning from human feedback means they're trained (somewhat) on what the specific human asking the question is likely to consider right or wrong.

This is both why they're sycophantic, and also why they're better than just median internet comments.

But this is only a slight quibble, because what you say is also somewhat true, and why they have such a hard time saying "I don't know".
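
For reference, the pairwise objective typically used to train the reward model in RLHF looks roughly like this (the numbers are invented; this is the generic Bradley-Terry form, not any particular lab's code):

    import math

    def preference_loss(r_preferred, r_rejected):
        # -log(sigmoid(r_preferred - r_rejected)): push the rater's preferred
        # answer above the rejected one. The model learns "what this kind of
        # rater tends to approve of", not "what is true".
        return -math.log(1.0 / (1.0 + math.exp(-(r_preferred - r_rejected))))

    print(preference_loss(2.1, 0.3))  # preferred already scores higher: small loss
    print(preference_loss(0.3, 2.1))  # preferred scores lower: large loss, adjust

A confidently agreeable answer that raters liked gets pushed up even where a plain "I don't know" might have been more accurate, which lines up with the sycophancy point above.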


idk… maybe we’ll find out the reason is that on the internet no one ends a conversation saying “I don't know” :D

That's my point :)

>The only way to use a tool like this is to give it a problem that fits in context

Or give the model context that fits the problem. That's more of an art than a science at this point, it seems.

I think the people with better success are those who are better at generating prompts, but that's nontrivial.
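
Roughly the kind of thing I mean, as a sketch: the ranking below is a crude keyword overlap and the budget is arbitrary; real setups use embeddings, file proximity, call graphs, and so on:

    def build_prompt(task, snippets, budget_chars=8000):
        # Pick the context that fits the problem, most relevant first,
        # and stop before blowing the context budget.
        def overlap(snippet):
            return len(set(task.lower().split()) & set(snippet.lower().split()))
        chosen, used = [], 0
        for snippet in sorted(snippets, key=overlap, reverse=True):
            if used + len(snippet) > budget_chars:
                break
            chosen.append(snippet)
            used += len(snippet)
        return "\n\n".join(chosen) + "\n\nTask: " + task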


I get that a submarine can't swim.

I'm just not so sure of the importance of the difference between swimming and whatever the word for how a submarine moves is.

If it looks like thinking and quacks like thinking...


Can you go into a bit more detail about why the two approaches are so different, in your opinion?

I don't think I agree and I want to understand this argument better.


I’m guessing the argument is that LLMs get worse on problems they haven’t seen before, so you may assume they think for problems that are commonly discussed on the internet or seen on GitHub, but once you step out of that zone, you get plausible but logically false results.

That, or a reductive fallacy. In either case I’m not convinced; IMO they are just not smart enough (either due to a lack of complexity in the architecture or training that didn’t help them generalize reasoning patterns).


They regurgitate what they're trained on, so they're largely consensus-based. However, the consensus can frequently be wrong, especially when the information is outdated.

Someone with the ability to "think" should be able to separate oft-repeated fiction from fact.



