Ones in which both the human test takers and the human counterparts are actively trying to prove to each other that they are actually human.
With today's chat bots, it's absolutely trivial to tell that you're not talking to a real human. They will never interrupt you, continue their train of thought even thought you're trying to change the conversation, go on a complete non-sequitur, swear at you, etc. These are all things that the human "controls" should be doing to prove to the judges that they are indeed human.
LLMs are nowhere near beating the Turing test. They may fool some humans in some limited interactions, especially if the output is curated by a human. But left alone to interact with the raw output for more than a few lines, and if actively seeking to tell if you're interacting with a human or an AI (instead of wanting to believe), there really is no chance you'd be tricked.
Okay but we are not really optimizing them to emulate humans right now. In fact, it's the opposite. The mainstream bots are explicitly trained to not identify as humans and to refuse to claim having thought or internal feelings or consciousness.
So in that sense it's a triviality. You can ask ChatGPT whether it's human and it will say no upfront. And it has various guardrails in place against too much "roleplay", so you can't just instruct it to act human. You'd need a different post-training setup.
I'm not aware whether anyone did that with open models already.
Sure, but there is a good reason for that. The way they are currently post-trained is the only way to make them actually useful. If you take the raw model, it will actually be much worse at the kinds of tasks you want it to perform. In contrast, a human can both be human, and be good at their job - this is the standard by which we should judge these machines. If their behavior needs to be restricted to actually become good at specific tasks, then they can't also be claimed to pass the Turing test if they can't within those same restrictions.
>Sure, but there is a good reason for that. The way they are currently post-trained is the only way to make them actually useful.
Post training them to speak like a bot and deny being human has no effect on how useful they are. That's just an Open AI/Google/Anthropic preference.
>If you take the raw model, it will actually be much worse at the kinds of tasks you want it to perform
Raw models are not worse. Literally every model release paper that compares both show them as better at benchmarks, if anything. Post training degrading performance is a well known phenomena. What they are is more difficult to guide/control. Raw models are less useful because you have to present your input in certain ways, but they are not worse performers.
It's besides the point anyways because again, you don't have to post train them to act as anything other than a human.
>If their behavior needs to be restricted to actually become good at specific tasks, then they can't also be claimed to pass the Turing test if they can't within those same restrictions.
You are talking about instruction tuning. You can perform instruction tuning without making your models go out of the way to tell you they are not human, and it changes literally nothing about their usefulness. Their behavior does not have to be restricted this way to get them useful/instruction tuned. So your premise is wrong.
Ok, but then it doesn't make sense to dismiss AI based on that. It fails the Turing test, because it's creators intentionally don't even try to make something that is good at the (strictly defined) Turing test.
If someone really wants to see a Turing-passing bot, I guess someone could try making one but I'm doubtful it would be of much use.
Anyways,people forget that the thought experiment by Turing was a rhetorical device, not something he envisioned to build. The point was to say that semantic debates about "intelligence" are distractions.
With today's chat bots, it's absolutely trivial to tell that you're not talking to a real human. They will never interrupt you, continue their train of thought even thought you're trying to change the conversation, go on a complete non-sequitur, swear at you, etc. These are all things that the human "controls" should be doing to prove to the judges that they are indeed human.
LLMs are nowhere near beating the Turing test. They may fool some humans in some limited interactions, especially if the output is curated by a human. But left alone to interact with the raw output for more than a few lines, and if actively seeking to tell if you're interacting with a human or an AI (instead of wanting to believe), there really is no chance you'd be tricked.