Repeated Prisoner's Dilemma — alaricstephen.com

Mar 29

Mar 29 Repeated Prisoner's Dilemma

The prisoners’ dilemma is a classic of game theory. There are two prisoners who have been caught doing a bank heist, but the police don't quite have enough to pin it to them. They do have enough evidence to get them on some more minor misdemeanor. Each prisoner is taken to a separate room where they are interrogated. While both originally agreed to cooperate with each other and not give in to the police, they are faced with a dilemma. If both stay quiet then the police will put them away for 6 months on the minor crime; unable to make a higher sentence stick. However if one prisoner defects and gives the other one’s name to the police then they get to go free, while the other gets 10 years. Finally, if both prisoners defect then they both get 8 years in prison.

So the best thing for both prisoners would be to keep quiet. However, if you know the other prisoner is going to cooperate then by defecting you could go from doing 6 months to doing no time. Similarly if you are pretty sure the other prisoner is going to defect, then once again your best move is to defect; thereby getting 8 years rather than 10 years. So although there is a best solution in theory, in practice in every scenario defecting is best.

That is all just the bog standard normal prisoners’ dilemma. But common extension is for 2 players to play the game a few times in a row. Let’s abstract it and say that if both players choose to cooperate (C) then they get 2 points. If one cooperates and the other defects (D) then the cooperator gets nothing, while the defector gets 3 points. Finally if they both defect then they both only get 1 point. This has the same structure as above but with points as a positive thing. We are going to have both players play 5 times in a row.

What makes this different is that you can now be judged by your past actions. Defection can be punished, while cooperation can rewarded. Let's look at some strategies. Always picking C would be great against someone else doing the same thing, but would get destroyed by someone that always picks D. However someone that always picks D would do rubbishly against someone doing the same. Tit for Tat is a well performing strategy which starts by picking C and then picks whatever the other player did the turn before. This will get loads of points from people that always picks C and while minimise their loss against those that always defect.

These strategies can be tested against each other by having tournaments of computer programs all playing each other. Those strategies that get lots of points can be propagated more for the next game. These experiments have been done lots of times and they have thrown up some qualities of the best strategies. There should be an element of reward for cooperation and punishment for defection. However another more subtle quality of the best strategies was forgiveness. Imagine two Tit for Taters that get caught in a loop. Each has just been punished and so each punishes the other. This is a horrific spiral. If one of them is programmed to occasionally randomly pick C even if they have just had D picked against them, then it breaks the cycle.