Iterated Prisoner's Dilemma

Co-operation in the prisoner's dilemma proves elusive, since for the one-shot game, it is impossible to be confident in the trustworthiness of the opponent and so one must defect to prevent a sucker payoff. However, a scenario exists whereby the true nature of another player can be assessed. For this to be the case, the participants must play the game multiple times, with knowledge of the previous moves. Thus duplicity is no longer of use- with a visible reputation for defection, a player is unlikely to convince another to co-operate. This introduces an incentive to co-operate early (risking the sucker payoff) to benefit from mutual co-operation later. In fact, a strategy along these lines turns out to offer greater expected payoff than persistent defection; co-operation can emerge from a non-cooperative interpretation of game theory.

Definition: In an iterated game in which each round consists of a play of the prisoner's dilemma against the same opponent, the Tit-for-Tat strategy is:

In round 1, co-operate.

In round n for n ≥ 2, play your opponent's strategy from round n - 1.

In 1974, political scientist Robert Axelrod² organised a computer simulation of the iterated prisoner's dilemma by inviting game theorists to supply programs implementing their strategy of choice. Each program was run against all the others, itself, and a random program which opted for co-operation or defection with equal probability each round. In a preliminary event, Anatol Rapport's Tit-for-Tat program only achieved second place, with victory going to look ahead, a program which employed tree-searching techniques similar to those used in computer chess programs. None-the-less, it captured the interest of many of the participants and many of them sought to improve upon tit-for-tat for the main competition. However, it transpired that the most elegant formulation was also the most effective, with Tit-for-Tat scoring higher than any of the other 13 entries to the main event (although, as Axelrod notes, no player submitted the tit for two tats program which would have scored higher still against this field).

Theorem If the prisoner's dilemma is iterated for an indefinite number of rounds with the probability of a succesive round being sufficiently high (i.e., a large number of iterations are carried out) then Tit-for-Tat is a Nash equilibrium.

Notice the stipulation on an unknown number of rounds: were the precise timing of the final round known, then that round could be played without fear of repercussions and thus reduces to the single iteration case, with defection being the rational behaviour in that round. Further, with too few iterations a strategy of initial co-operation will be unable to catch up with a defection strategy that succeeds in suckering a co-operating player.

Proof: In Hargreaves¹ it is demonstrated that there are only three types of response to a tit-for-tat player t:

c, co-operation in all future rounds.
d, defect in all future rounds.
a, alternate defection and co-operation.

Note that c is equivalent to playing t against t, since mutual co-operation will arise in this scenario. Thus to show that t is the best response to t (that is, a Nash equilibrium), we require that

E₁(t, t) ≥ E₁(d, t) and E₁(t, t) ≥ E₁(a, t)

In the context of an iterated game, (rather than the strategic forms considered earlier) we can calculate the expected returns E₁ as follows. Suppose a single round is played, and subsequent iterations occur with fixed probability p. Then the expected number of rounds is given by the geometric sum 1+p+p²+p³ +. . . = 1/(1-p) and so

E₁(t, t) = E₁(c, t) = 3/(1-p).

Using the payoff matrix given here, for a defection strategy d, there is a return of 4 in the first round, then a return of 1 for all subsequent rounds (as tit-for-tat retaliates by continually defecting). Thus

E₁(d, t) = 4×1 + 1×(p+p²+p³+...)=4 + p(1+p+p²+...) = 4 + p/(1-p)

For the alternating strategy a, the participants are perpetually out of sync and thus alternate between sucker payoffs of 4 and 0. This, via a geometric progression in p², gives rise to expectation: E₁(a, t) = 4×1 + 0×p + 4×p^2 +... = 4(1+p²+p⁴+...) = 4/(1-p²

Hence (by some algebra or graphical techniques) if p ≥ 1/3

E₁(t, t) ≥ E₁(d, t) and E₁(t, t) ≥ E₁(a, t)

as desired.

However, an optimal approach to playing the iterated prisoner's dilemma needn't be the optimal entry to an IPD tournament; under certain conditions (notably, the ability to enter multiple programs), it is possible to achieve higher scores for some programs through the sacrifice of others. For more details on this recent result, see Wntrmute's writeup under prisoner's dilemma.

¹ S.P. Hargreaves Heap, Y. Varoufakis ‘Game Theory- A critical text’ (Second edition, 2004)
² R.Axelrod (1980) ‘Effective choice in the prisoner’s dilemma’ Journal of Conflict Resolution Vol 24 No. 1 (March).

Part of A survey of game theory- see project homenode for details and links to the print version.

Prisoner's Dilemma	Grim Trigger	gun-to-the-head scenario	Writing chess software
look ahead	Qia and the liars	The Counterfeiters	Game Theory: Example games
Two finger Morra	Bimatrix game	Trust Metrics	Nash equilibrium
payoff	The Selfish Gene	Geometric Progression	geometric sequence
Grim	Defection	tit for tat	lollipop kid
iteration	Trust	Game theory	Random