But this is all extremely hypothetical. Nobody knows what the real scenario will be, if something like this ever comes up - it probably won't be one person sitting in a room with a chat terminal and a "Release" button, under pressure to make a decision on the spot.
This contrived experiment is being done in the name of research, ostensibly to arm humans with the tools we need to deal with this situation if it arises. If the AI's reasoning can't hold up to wider analysis, then you can only conclude that it's a trick of some sort, and hopefully when the time comes we won't rely solely on the testimony of one tricked person to make such a critical decision.
I have a lot of respect for Yudkowsky, but it honestly baffles me that he thinks this particular exercise should be given any weight whatsoever. It's totally unscientific, relies entirely on the unsubstantiated testimony of one individual, and if he weren't the one holding the keys, I have to think he would be highly critical of it as well.
That exercise was not research; it was a counterexample to a single specific security claim: that we can "choose not to release the AI," so we don't have to worry about it being dangerous.
You don't need to research a counterexample; you just produce one. He did so by exhibiting a relatively weak AI that couldn't be contained.
>You don't need to research a counterexample; you just produce one. He did so by exhibiting a relatively weak AI that couldn't be contained.
Well, no. He showed that a person pretending to be an AI could convince another person to let it out of the box. Sometimes.
It's not really a meaningful counterexample. Everyone involved with the event knew that the AI wasn't real and that there were zero consequences for letting it out. You'd have to concoct a much more involved test to get a decent counterexample.
A person pretending to be an AI is surely easier to contain than a super-powerful AI that is smarter than all people. A failure to contain the former -- a strictly easier task -- is strong evidence that we would also fail to contain the latter.