Co-operation in the prisoner's dilemma proves elusive, since for the one-shot game, it is impossible to be confident in the trustworthiness of the opponent and so one must defect to prevent a sucker payoff. However, a scenario exists whereby the true nature of another player can be assessed. For this to be the case, the participants must play the game multiple times, with knowledge of the previous moves. Thus duplicity is no longer of use- with a visible reputation for defection, a player is unlikely to convince another to co-operate. This introduces an incentive to co-operate
early (risking the sucker payoff) to benefit from mutual co-operation later. In fact, a strategy along these lines turns out to offer greater expected payoff than persistent defection; co-operation can emerge from a non-cooperative interpretation of game theory.
Definition: In an iterated game in which each round consists of a play of the prisoner's dilemma against the same opponent, the Tit-for-Tat strategy is:
- In round 1, co-operate.
- In round n for n ≥ 2, play your opponent's strategy from round n - 1.
In 1974, political scientist Robert Axelrod2 organised a computer simulation of the iterated prisoner's dilemma by inviting game theorists to supply programs implementing their strategy
of choice. Each program was run against all the others, itself, and a random program which
opted for co-operation or defection with equal probability each round. In a preliminary event,
Anatol Rapport's Tit-for-Tat program only achieved second place, with victory going to look
ahead, a program which employed tree-searching techniques similar to those used in computer
chess programs. None-the-less, it captured the interest of many of the participants and many of
them sought to improve upon tit-for-tat for the main competition. However, it transpired that the
most elegant formulation was also the most effective, with Tit-for-Tat scoring higher than any of
the other 13 entries to the main event (although, as Axelrod notes, no player submitted the tit for
two tats program which would have scored higher still against this field).
Theorem If the prisoner's dilemma is iterated for an indefinite number of rounds with the
probability of a succesive round being sufficiently high (i.e., a large number of iterations are carried out) then Tit-for-Tat is a Nash equilibrium.
Notice the stipulation on an unknown number of rounds: were the precise timing of the final
round known, then that round could be played without fear of repercussions and thus reduces to
the single iteration case, with defection being the rational behaviour in that round. Further, with
too few iterations a strategy of initial co-operation will be unable to catch up with a defection
strategy that succeeds in suckering a co-operating player.
Proof: In Hargreaves1 it is demonstrated that there are only three types of response to a tit-for-tat player t:
- c, co-operation in all future rounds.
- d, defect in all future rounds.
- a, alternate defection and co-operation.
Note that c is equivalent to playing t against t, since mutual co-operation will arise in this scenario.
Thus to show that t is the best response to t (that is, a Nash equilibrium), we require that
E1(t, t) ≥ E1(d, t) and E1(t, t) ≥ E1(a, t)
In the context of an iterated game, (rather than the strategic forms considered earlier) we can
calculate the expected returns E1 as follows. Suppose a single round is played, and subsequent
iterations occur with fixed probability p. Then the expected number of rounds is given by the
geometric sum 1+p+p2+p3 +. . . = 1/(1-p) and so
E1(t, t) = E1(c, t) = 3/(1-p).
Using the payoff matrix given here, for a defection strategy d, there is a return of 4 in the first round, then a return of 1 for all subsequent rounds (as tit-for-tat retaliates by continually defecting). Thus
E1(d, t) = 4×1 + 1×(p+p2+p3+...)=4 + p(1+p+p2+...) = 4 + p/(1-p)
For the alternating strategy a, the participants are perpetually out of sync and thus alternate
between sucker payoffs of 4 and 0. This, via a geometric progression in p2, gives rise to expectation:
E1(a, t) = 4×1 + 0×p + 4×p^2 +... = 4(1+p2+p4+...) = 4/(1-p2
Hence (by some algebra or graphical techniques) if p ≥ 1/3
E1(t, t) ≥ E1(d, t) and E1(t, t) ≥ E1(a, t)
as desired.
However, an optimal approach to playing the iterated prisoner's dilemma needn't be the optimal entry to an IPD tournament; under certain conditions (notably, the ability to enter multiple programs), it is possible to achieve higher scores for some programs through the sacrifice of others. For more details on this recent result, see Wntrmute's writeup under prisoner's dilemma.
1 S.P. Hargreaves Heap, Y. Varoufakis ‘Game Theory- A critical text’ (Second edition, 2004)
2 R.Axelrod (1980) ‘Effective choice in the prisoner’s dilemma’ Journal of Conflict Resolution
Vol 24 No. 1 (March).
Part of A survey of game theory- see project homenode for details and links to the print version.