Play it safe solution to the Prisoner's Dilemma.
Suppose A and B are prisoners in the situation described above, and that neither is confident of being able to predict, from the other's previous decisions, whether the other will confess (A and B are unprofessional, so they did not get their stories straight beforehand). Thus, assume that each prisoner confesses at random (with a weighted coin, say), so that s/he confesses a certain percentage of the time (this percentage is that prisoner's "strategy" over a large number of "games"). We wish to find a strategy for A that will ensure A a moderate expected value that does not depend on B's decisions.
First, some definitions:
- Let a := the outcome if A confesses and B does not. ( 0 in the above writeup by Jennifer).
- Let b := the outcome if A does not confess and B does. ( -9 above).
- Let c := the outcome if both confess. ( -6 above).
- Let d := the outcome if both do not confess. ( -1 above).
- Note that none of these outcomes is positive (b, c and d are negative, and a is 0 above) because I like to think of positive as good (who knows why :).
- Let p := the probability that A confesses (we want to find this... ).
- Let q := the probability that B confesses (...such that this doesn't matter).
Then, the function E: [0,1] x [0,1] -> R defined by
E(p,q) := p(1 - q) a + q(1 - p) b + pq c + (1 - p)(1 - q) d
= (a - d) p + (b - d) q + (c - b + d - a) pq + d
= (a - d) p + ((b - d) + (c - b + d - a) p) q + d
returns A's expected outcome.
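As a quick sanity check, the definition of E and its regrouped form can be compared numerically. The following is only a minimal Python sketch: the function names expected_outcome and expected_outcome_grouped are made up for illustration, and the payoffs are the values quoted in the definitions above.

```python
def expected_outcome(p, q, a, b, c, d):
    """A's expected outcome, straight from the definition of E(p,q)."""
    return p * (1 - q) * a + q * (1 - p) * b + p * q * c + (1 - p) * (1 - q) * d


def expected_outcome_grouped(p, q, a, b, c, d):
    """The same quantity, regrouped so that q appears in a single term."""
    return (a - d) * p + ((b - d) + (c - b + d - a) * p) * q + d


# Payoffs quoted in the definitions above: a = 0, b = -9, c = -6, d = -1.
a, b, c, d = 0, -9, -6, -1

# Compare the two forms on a grid of (p, q) values in [0,1] x [0,1].
grid = [i / 10 for i in range(11)]
max_gap = max(
    abs(expected_outcome(p, q, a, b, c, d) - expected_outcome_grouped(p, q, a, b, c, d))
    for p in grid
    for q in grid
)
print(max_gap < 1e-12)
```

At the four corners of the square, E just returns the pure outcomes: E(1,0) = a, E(0,1) = b, E(1,1) = c and E(0,0) = d.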
Now we observe that if
(b - d) + (c - b + d - a) p = 0,
then the q term above vanishes, which is exactly what we wanted. So
p = (d - b) / ((c - a) + (d - b))
  = 1 / ((c - a)/(d - b) + 1).
For p to lie in [0,1] we need (c-a)/(d-b) >= 0; otherwise p would fall outside [0,1], and we would have no play-safe solution. (We don't worry about the case d=b here, because then p=0 is a valid solution.)
Thus we have a solution iff ((c>=a) and (d>b)) or ((c<=a) and (d<b)).
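The formula for p and the range check can be tried out on concrete numbers. This is a rough Python sketch, not a definitive implementation: the name play_safe_p is hypothetical, payoffs are assumed to be integers so that Fraction stays exact, and the second and third sets of payoffs are invented for illustration (only the first comes from the writeup above).

```python
from fractions import Fraction


def play_safe_p(a, b, c, d):
    """Return the p that cancels the q term, or None if no such p lies in [0, 1]."""
    if d == b:
        # Special case from the text: the q coefficient is (c - a) p, so p = 0 works.
        return Fraction(0)
    denominator = (c - a) + (d - b)
    if denominator == 0:
        # The q coefficient is then the nonzero constant b - d for every p.
        return None
    p = Fraction(d - b, denominator)
    return p if 0 <= p <= 1 else None


# Payoffs from the writeup above: the formula gives p = 8/2 = 4, outside [0, 1].
print(play_safe_p(0, -9, -6, -1))

# Invented payoffs with c > a and d > b: a play-safe p exists.
print(play_safe_p(-8, -9, -6, -1))

# Invented payoffs with c = a: the play-safe strategy is to always confess.
print(play_safe_p(-6, -9, -6, -1))
```

With the payoffs from the writeup, c < a while d > b, so neither half of the condition holds and the function reports that there is no play-safe solution.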
Note: An alternative approach that leads to the same result is to take the partial derivative of E with respect to q, set it equal to zero, and solve for p. This approach works because E is linear in q, so that dE/dq is constant with respect to q.
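Because E is linear in q, the derivative dE/dq at a fixed p is exactly E(p,1) - E(p,0), so the alternative approach can be checked without any calculus. The sketch below assumes invented integer payoffs (chosen so that a play-safe p exists; they are not from the writeup), and the name slope_in_q is hypothetical.

```python
from fractions import Fraction


def expected_outcome(p, q, a, b, c, d):
    """A's expected outcome E(p,q), as defined above."""
    return p * (1 - q) * a + q * (1 - p) * b + p * q * c + (1 - p) * (1 - q) * d


def slope_in_q(p, a, b, c, d):
    """dE/dq at a given p; an exact difference, since E is linear in q."""
    return expected_outcome(p, 1, a, b, c, d) - expected_outcome(p, 0, a, b, c, d)


# Invented payoffs for illustration only.
a, b, c, d = -8, -9, -6, -1

# Solving dE/dq = 0 for p gives the same formula as before.
p_safe = Fraction(d - b, (c - a) + (d - b))
print(p_safe, slope_in_q(p_safe, a, b, c, d))
```

At any other p the slope is nonzero, which means B's choice of q would shift A's expected outcome.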
Some practical discussion:
Practically speaking, we can strengthen this condition:
d<b would mean that A gets a heavier sentence when neither confesses (so that there is less proof that they did it) than when A does not confess and B turns A in. That is implausible, so the second "or" term above cannot be satisfied, and we must have:
(c>=a) and (d>b)
We have just seen that d>b. Now we can assume that if A confesses, his/her outcome will be no worse if B does not confess than if B does, so c<=a (it could be better, because A's sentence may be shortened since s/he was more cooperative than B). Together with c>=a this forces c=a, so the only practical case with a solution is where A has nothing to lose by confessing. In this case, p = (d - b) / (0 + (d - b)) = 1, as our intuition tells us. The expected outcome is then:
E(1,q) = (a - d) 1 + ((b - d) + (c - b + d - a) 1) q + d = a - d + (c - a) q + d = a - d + (0)q + d = a
Of course, this case is pretty trivial, since c=a removes the "dilemma". But isn't it nice to know that the math supports it?