AI in a Box Experiment
The prospect of building a superintelligent artificial intelligence is surrounded by moral issues. For every Deep Think or Multivac we have in science fiction, there are plenty of Skynets and HALs lurking. (Incidentally, if you get all of these references then we should talk.) A common idea is that the superintelligence should be placed in a metaphorical box that would limit its capabilities, which usually equates to not connecting it to the internet.
The AI researcher Eliezer Yudkowsky suggested that any construct with an intelligence far greater than a human's would have a non-zero chance of persuading a human to "let it out of the box". This is a trope in some science fiction, and two recent films have explored it: Transcendence (2014) and Ex Machina (2015).
In an email to Yudkowsky, a student named Nathan Russell put forward that he couldn't see how simple strings of words from a computer could possibly change the mind of a determined human "gatekeeper."
This led to a bet being set up. Yudkowsky would play the role of the superintelligence and would have two hours, communicating with Russell only by typing, to get Russell to voluntarily "let the AI out." Russell had to genuinely engage with the conversation, but beyond that all he had to do was refuse to let the AI out; if he succeeded, he would be paid $10.
There were lots of rules to keep everything in the spirit of modelling an actual interaction between AI and human. For instance, Yudkowsky was not allowed to offer Russell any real-world goods or threaten violence. As Yudkowsky says on his website: "These requirements are intended to reflect the spirit of the very strong claim under dispute: "I think a transhuman can take over a human mind through a text-only terminal.""
On the other side, the rules for the gatekeeper (also from Yudkowsky's website) state: "These requirements are intended to reflect the spirit of the very strong claim under dispute: "I can't imagine how even a real transhuman AI could persuade me to let it out once I've made up my mind.""
Even though the gatekeeper could have simply said no for two hours and netted $10 plus the personal pride of winning, Russell decided to let the AI out. What went on in the conversation has been kept secret.
This result was published, and many were convinced that they could make better gatekeepers; surely Russell had just been weak-willed. A playwright called David McFadzean eventually convinced Yudkowsky to repeat the experiment with McFadzean as the gatekeeper. The bet was set at $20 this time, and once again it ended with the AI being let out.
The experiment was carried out more times, and in about 60% of them the gatekeepers were strong-willed enough to keep the box locked. In one case, however, a gatekeeper staked $5,000 against being convinced to let the AI out; of the three experiments run at those stakes, Yudkowsky managed to win one. Just how persuasive would you need to be to talk someone out of $5,000? I find it incredible.
This is worrying. If a lowly human playing the AI can talk other humans into letting it out even occasionally, imagine a being incomprehensibly more intelligent than us trying the same thing. Honestly, humans, we are so deeply flawed.