4.2 The role of decentralized punishment: spontaneous co-operation in repeated PD games


A survey of prominent results in two-player games

As a unique model of strategic behaviour, the PD poses an obvious problem, all the more so as non-co-operation appears as a stable outcome in this game. Even if one player offers in advance a commitment that he will co-operate (which, given the structure of the payoffs in the PD game, can only be the result of irrational behaviour), the other player will stick to a free-riding strategy. Likewise, if both players agree to co-operate, they both have an incentive to violate such a (non-binding) agreement. Precisely because they have found such a result too limiting, game theorists have worked hard to determine analytical conditions under which the mutual defection outcome would cease to be the unique possible equilibrium even within the basic framework of the PD game. In other words, they have set about demonstrating the possibility of co-operation without giving up the payoff structure characteristic of the PD.

FIG. 4.1. A repeated PD game

As pointed out above, it is by assuming that the PD game is repeated, thereby creating what is called a (PD) supergame, that the co-operative equilibrium can be generated. The fundamental reason why co-operation—understood as a high occurrence of (C,C) outcomes—may then be consistent with self-interested behaviour is that the repetition of the game opens the door to the possibility of conditional co-operation and punishment, a possibility which was precluded as long as the PD was deemed to operate within a single-period framework (see above). More precisely, to show that co-operation is possible, the assumption must be made that the game is repeated infinitely or that information is incomplete—there is some uncertainty about the others' strategies (either because their payoffs or the degree of their rationality are imperfectly known) or about the length of the game (the game horizon is finite but indefinite). In the following, we explain these results in three distinct steps.

  1. Repetition of the PD game is not by itself sufficient to make co-operation a possible outcome. Indeed, when the game has a finite length (players know for sure when the process of their interactions will end), non-co-operation is the unique equilibrium outcome. Let us illustrate this important result with the help of a simple example. Consider two players who have to choose whether to co-operate or defect on two successive occasions. In particular, they face at each period the same payoff structure as given in Figure 4.1.

Each player, when choosing his own strategy, has to decide whether to co-operate or to defect at the first period and at the second period. A strategy is now somewhat more complicated to devise than in a one-shot PD game. Indeed, the move in the second period can now be conditioned by the outcome of the first period (that is, by the history of the game till the second stage is reached) and this implies that a strategy is a complete plan of action over the whole game: 'it specifies a feasible action for the player in every contingency in which the player might be called upon to act' (Gibbons, 1992: 93). For instance, a possible strategy for player A, called a 'strategy of brave co-operation', could be to start by co-operating in the first period, to co-operate in the second period if player B has co-operated in the first period, and to defect otherwise. In such a strategy, the fact that player B has acted co-operatively in the first period is interpreted as an act of 'goodwill' which must be reciprocated in the second period. Can such a strategy be an equilibrium strategy?

To answer this question, one has to look at what happens in the second period. If, in the second period, player A co-operates, clearly, player B should defect, since such a move yields a payoff of 10 to player B instead of 5 if he co-operates. But then it cannot be part of an equilibrium strategy for A to co-operate in the second period since, if B always chooses to defect in that period, A would be better off defecting as well (which would bring him a payoff of 0 instead of a negative payoff of -1): co-operation in the second period is not the best response to unconditional defection by player B in that period. Therefore, in such a framework, a strategy of brave co-operation cannot be an equilibrium strategy. It is also clear that, if player A defects in the second period, it is always better for player B to defect in that period too, as a result of which unconditional defection is a dominant move for both players in the second period. Reasoning backwards, it is now easy to understand why co-operation cannot be an equilibrium move even in the first period, since it cannot help establish co-operation in the second period. Unconditional defection in both periods is a dominant strategy for both players and corresponds to the unique equilibrium of this repeated game.

The same reasoning, known as the backwards induction argument, can be applied to any PD game repeated a finite number of times, T: the players anticipate that co-operation in period T - 1 cannot trigger co-operation in period T (since defection is a dominant move in that period), and therefore choose to defect in period T - 1 as well. Similarly, a co-operative move in period T - 2 will not help establish co-operation in the following two periods, and defection is again a dominant strategy. Obviously, this is equally true of any period t with respect to the T - t rounds still to be played. Unconditional defection at all periods is the equilibrium strategy of this game. Since this argument applies irrespective of the length of the game, it is moreover evident that, even as the number of rounds increases, co-operation does not become an equilibrium behaviour in a finitely repeated game.
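For readers who find a computational illustration helpful, the backwards induction argument can be reproduced in a few lines of code. The sketch below is purely illustrative: it uses the payoff values described above for the game of Figure 4.1 (5 for mutual co-operation, 10 for defecting against a co-operator, -1 for being exploited, 0 for mutual defection) and works backwards from the last stage, checking at each stage that defection strictly dominates whatever the opponent does.

    # Stage-game payoffs to the row player, as described in the text for Figure 4.1
    PAYOFF = {('C', 'C'): 5, ('C', 'D'): -1, ('D', 'C'): 10, ('D', 'D'): 0}

    def backward_induction(T):
        """Solve a T-stage repeated PD from the last stage back to the first."""
        plan = []
        continuation = 0                  # value of play after the final stage
        for stage in range(T, 0, -1):
            # In the backward-induction solution, future play does not depend on
            # today's move, so a move is dominant at this stage iff it is
            # dominant in the stage game taken on its own.
            d_dominates = all(
                PAYOFF[('D', b)] + continuation > PAYOFF[('C', b)] + continuation
                for b in ('C', 'D'))
            plan.append((stage, 'D' if d_dominates else '?'))
            continuation += PAYOFF[('D', 'D')]   # both players defect in equilibrium
        return sorted(plan)

    print(backward_induction(5))   # [(1, 'D'), (2, 'D'), (3, 'D'), (4, 'D'), (5, 'D')]

Whatever value is chosen for T, the same all-defection plan is returned, in line with the argument that lengthening a finitely repeated PD does not by itself create room for co-operation.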

  2. What deserves to be emphasized is that, in the above example, the number of rounds (stages) in the game is finite and certain. If the length of the game is infinite, co-operation becomes possible. This is because there is no longer any last period from which to reason in the way described above. In this case, it may be worth while giving co-operation a try. A similar possibility obtains when the length of the game is finite but indefinite. It is useful to elaborate on this case in order to show more precisely why and under which conditions co-operation may arise. Consider the simple strategy known as 'tit for tat', based on the following principles:
  1. Start by choosing to co-operate;
  2. Thereafter, in period n, choose the action that the other player chose in period n - 1.

FIG. 4.2. A 2 x 2 symmetrical PD game

We will now demonstrate that 'tit for tat' is an equilibrium strategy under certain conditions. Consider the version of the PD game given in Figure 4.2.

In this game, c is therefore the utility which one player loses when he (she) co-operates while the other free-rides; b is the utility he (she) gains when the situation is reversed (b > 0, c > 0). If both players co-operate, they each receive b - c, while if both defect they each get zero. It is assumed that b > c: players derive more utility from joint co-operation than from joint defection. Furthermore, individuals play the above game repeatedly and non-anonymously: each player accumulates experience of the behaviour of his (her) opponent since he (she) meets him (her) personally at each round of the game and is able to recall his (her) past moves. The game horizon is finite but the players do not know when the game will end. In other words, whenever they meet, the players do not know whether they are meeting for the last time. Let p be the probability that, after each round, the (extended) game is carried over into the next period. These assumptions, it would appear, offer a valid description of the way human interaction works in many small group settings.

In the foregoing context, a strategy is a plan for playing the whole extended (or repeated) game. Two possible strategies are unconditional co-operation (co-operating in every round, irrespective of the opponent's behaviour) and unconditional defection (defecting in every round). It is manifest that the former cannot be an equilibrium strategy: if A knows that B is always going to co-operate whatever he himself (or she herself) does, he (she) will have no interest in ever co-operating with B, and B would be an eternal sucker (hence the label S given by Sugden to such a strategy of unconditional co-operation). Clearly, S is not a best reply to itself: the only strategies that are best replies to S are those that reply to S by defecting in every round, among which is the strategy of unconditional defection denoted by N (for nasty).

The strategy N, by contrast, is evidently an equilibrium strategy: the only strategies that are best replies to N are those that reply to N by defecting in every round and, since N is such a strategy, it is a best reply to itself (Sugden, 1986: 109). Indeed, if A knows that B will always free-ride, there is no point in his (her) ever co-operating with B.

Another equilibrium strategy is precisely the simple tit-for-tat strategy (T for short). As we know, this is a strategy of conditional co-operation or reciprocity (to use Sugden's term). It may be noted that if two T-players meet, they co-operate in every round (since they both start by co-operating and then continue forever to co-operate). If, however, a T-player meets an N-player, the T-player will co-operate only in the first round; thereafter he will defect. This is because T-players are not altruists: they are prepared to co-operate only with people like themselves and they wish to avoid being suckers.

For the sake of illustration, let us now consider three (among many other) possible replies to T: T itself, N, and a new strategy, A (for alternation), which consists of defecting in odd-numbered rounds and co-operating in even-numbered ones. It can be shown that one of these three strategies must be a best reply to T (see Axelrod, 1981, 1984 or Sugden, 1986: 110-11 for proof). The expected utility derived from playing each of these strategies against T is as follows:

E(T,T) = (b - c) + p(b - c) + p²(b - c) + p³(b - c) + ... = (b - c)/(1 - p)

E(N,T) = b + 0 + 0 + ... = b

E(A,T) = b - pc + p²b - p³c + p⁴b - ... = (b - pc)/(1 - p²)

It is then easy to see that if p > c/b, we have E(T,T) > E(N,T) and E(T,T) > E(A,T). Thus, if p > c/b, T is better than N or A as a reply to T and, since one of these three strategies must be a best reply to T, T must necessarily be a best reply to itself. In other words, tit for tat is an equilibrium strategy: if player A knows that player B follows the tit-for-tat strategy, the most rational thing for player A to do is to adopt the same strategy.

Let us look at the condition p > c/b more closely. If the game is certain to end after one round, so that p = 0, this condition is violated. We are brought back to the one-shot PD game where each player's interest is to defect. Conversely, if the game horizon is infinite, p = 1 and the condition for T to be an equilibrium strategy is automatically satisfied (since c/b is smaller than one by assumption). In other words, when the players are assured that the game will be played forever, they will follow a strategy of conditional co-operation when they know that their opponents have chosen that strategy, and this they will do irrespective of the particular values taken by b and c. Moreover, it is worth noticing that the condition p > c/b does not imply that the game must be very long: for example, if b = 2 and c = 1, the condition is satisfied if p > 1/2, that is, if the average number of rounds played per game is greater than two.
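These expected payoffs are easy to check numerically. The following sketch uses the illustrative values b = 2 and c = 1 mentioned above and evaluates the three closed-form expressions derived earlier, confirming that T is the best of the three replies to T precisely when the continuation probability p exceeds c/b.

    # Expected payoffs against a tit-for-tat opponent (closed forms derived above);
    # b and c are the gain and loss parameters of the PD game in Figure 4.2.
    def E_TT(b, c, p):
        return (b - c) / (1 - p)          # T against T: (b - c) in every round

    def E_NT(b, c, p):
        return b                          # N against T: b in round 1, 0 thereafter

    def E_AT(b, c, p):
        return (b - p * c) / (1 - p**2)   # A against T: b, -c, b, -c, ...

    b, c = 2.0, 1.0                       # threshold c/b = 0.5
    for p in (0.3, 0.5, 0.6, 0.9):
        best = E_TT(b, c, p) > max(E_NT(b, c, p), E_AT(b, c, p))
        print(f"p = {p}: T is the best reply to T -> {best}")

Running it shows False for p at or below 0.5 and True above it, which is exactly the condition p > c/b.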

Notice carefully that it is only when B plays T that A also has an interest in playing T: T is a best reply to itself, that is, a Nash equilibrium strategy, but not necessarily a best reply to any other strategy. Contrary to defection in the one-shot PD game, T is thus not a dominant strategy. If, for example, B plays N (the 'nasty' strategy of defecting in every round), A's best reply to B cannot be to play a simple tit-for-tat strategy: if B plays N, A would lose c units (since his payoff would be -c in the first round and 0 in all the subsequent rounds), and would obviously be better off defecting from the beginning. In other words, N is also a best reply to itself and therefore also supports a Nash equilibrium. As a matter of fact, there are many possible equilibrium strategies.

The tit-for-tat strategy is clearly a useful pedagogical device to illustrate how a punishment mechanism can be embodied in the strategy itself and thereby allow co-operation to emerge. Bear in mind that this is only possible if the time-horizon is infinite so as to ensure that any player can always be punished for long enough to prevent any deviation from being worth while. This being said, it is important to stress that the tit-for-tat strategy suffers from a fundamental drawback inasmuch as, being myopic and mechanical, it cannot 'absorb' mistakes, understood as unintended deviations from the chosen strategy. In other words, whenever a mistake occurs, blind adoption of this strategy inescapably leads to the considerable social losses that result from repeated defections. To see this, suppose A knows that B is a T-player. Yet, in one round, say round i, A makes a mistake and defects (even though B has co-operated in round i - 1). A expects B to defect in round i + 1 in response to his occasional lapse. If A follows the strict principles of tit for tat, he will respond by defecting in round i + 2 which, in turn, will cause B to defect in round i + 3, and so on. An endless chain of retaliation and counter-retaliation is thereby initiated, which looks absurd given that the triggering factor was a simple mistake.
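The echo effect just described is easy to reproduce. In the hypothetical simulation below, both players follow strict tit for tat, but player A defects by mistake in round 3; from then on, retaliation and counter-retaliation alternate without end, even though both players are conditional co-operators.

    # Two strict tit-for-tat players; player A makes a single unintended defection.
    def tit_for_tat(opponent_history):
        return 'C' if not opponent_history else opponent_history[-1]

    a_hist, b_hist = [], []
    for t in range(10):
        a = tit_for_tat(b_hist)
        b = tit_for_tat(a_hist)
        if t == 2:                 # A's one-off mistake in round 3
            a = 'D'
        a_hist.append(a)
        b_hist.append(b)

    print(''.join(a_hist))         # CCDCDCDCDC
    print(''.join(b_hist))         # CCCDCDCDCD

The single mistake is never forgiven: one of the two players defects in every subsequent round, and the surplus from mutual co-operation is permanently reduced.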

In actual fact, this absurd outcome has to do with the fact that such a blind strategy is not subgame-perfect, because it relies on an out-of-equilibrium threat that is not credible. Indeed, how can a player believe that, if he deviates, the other player will be ready to carry out a punishment threat that everybody knows unmistakably leads to definitive partial defection? Given that, after a 'mistake' has occurred, a player is known to defect at a particular stage, clearly, the other player should also defect at that stage, instead of co-operating and being a 'sucker'.

In game-theoretical terms, this idea is captured by the concept of subgame-perfectness. A Nash equilibrium is subgame-perfect if the players' strategies constitute a Nash equilibrium in every subgame. A subgame is the piece of a game 'that remains to be played beginning at any point at which the complete history of the game thus far is common knowledge among the players' (Gibbons, 1992: 94-5). In the two-stage PD, for instance, there are four subgames, corresponding to the second-stage games that follow the four possible first-stage outcomes, namely (C,D), (D,C), (D,D), (C,C) [ibid.]. In equilibrium, of course, punishment strategies are devised in such a way that they are never actually implemented. But, to be effective, those punishments must be credible in the sense that each player must find it optimal to carry them out if needed, that is, in all the possible subgames, whether or not they lie on the equilibrium path.

As shown above, there are many possible Nash equilibria in the infinitely repeated PD game. Such a result is known as the folk theorem, which states that almost any outcome that on average yields at least the mutual defection payoff (i.e. the minimax outcome) to each player can be sustained as a Nash equilibrium. As pointed out by Kreps, 'a good way to interpret the folk theorem, when the players can engage in explicit pre-play negotiation, is that it shows how repetition can greatly expand the set of self-enforcing agreements to which the players might come' (Kreps, 1990: 512). A stronger version of the folk theorem is couched in terms of subgame-perfect Nash equilibrium strategies: it shows that it is possible to find a pair of subgame-perfect equilibrium strategies to support any possible sequence of outcomes (provided it yields on average at least the minimax outcome to each player). In other words, in the infinite version of the prisoner's dilemma, any (finite) succession of actions can be shown to belong to a subgame-perfect equilibrium strategy.

As Abreu (1988) has shown, such strategies need not be particularly complex. More precisely, any subgame-perfect outcome of an infinitely repeated game can be supported by a 'simple' strategy profile which is history-independent, that is, it specifies the same punishment for any deviation, after any previous history, by a particular player.

Now, for the folk theorem to hold true, players must be especially induced to punish those who deviate, even if it is costly to them. This is achieved by resorting to what Axelrod (1986) has called 'metanorms', that is, strategies that punish players who fail to play their part in punishing free-riders (Myerson, 1991: 335-7; Seabright, 1993: 120). Interestingly, Axelrod has shown, with the help of a computer-based simulation, that co-operation can be sustained provided that the players start with a sufficiently high level of 'vengefulness', vengefulness being defined as the probability that a player will punish someone who is seen not punishing (Axelrod, 1986; see also Elster, 1989a: 132-3 and Dasgupta and Mäler, 1990: 16).

  3. One of the most surprising results of repeated game theory is the following: in a game of finite duration, a suspicion by one party that the other may practise a tit-for-tat strategy induces the latter to mimic that strategy, and both players then have an incentive to co-operate till near the end of the game (Kreps and Wilson, 1982; Kreps et al., 1982; Kreps, 1990: 536-43; see also Friedman, 1990: 190-4). This indicates that the set of Nash equilibria is not robust to slight perturbations. Thus, 'A one-in-one-thousand chance that one's opponent is generous, or that one's opponent assesses a substantial probability that oneself is benevolent, isn't much of a "change" in the game. Yet it completely changes the theoretical prediction', implying that we must be very wary of the theoretical prediction (Kreps, 1990: 542). This agnostic conclusion can actually be tied back to reputation in the following sense:

In the finitely repeated prisoners' dilemma, suppose one player assesses a small probability that the second will 'irrationally' play the strategy of tit-for-tat.... In a long-enough (but still finite) repetition of the game, if you think your opponent plays tit-for-tat, you will want to give cooperation a try. And even if your opponent isn't irrational in this fashion, the 'rational' thing for her to do is to mimic this sort of behavior to keep cooperation alive.... We can think of these effects as 'reputation' in the sense that a player will act 'irrationally' to keep alive the possibility that she is irrational, if being thought of as irrational will later be to that player's benefit through its influence on how others play. That is, in this sort of situation, players must weigh against the short-run benefits of certain actions the long-run consequences of their actions. (Kreps, 1990: 542-3)

It is important to emphasize that, for co-operation to succeed in this kind of game, the following assumption is crucial: if the player with the uncertain type (that is, the player who is suspected of possibly following a tit-for-tat strategy) ever deviates from that strategy, then he would immediately be considered rational by the other player(s) and non-co-operation would ensue.

The above conclusion may also be reached if the number of rounds in the game is rather small. Yet, the probability that the other player can play only the tit-for-tat strategy must be large enough if co-operation is to occur in a game that is repeated only a small number of times. If this requirement is met, there exists an equilibrium in which both players co-operate in all but the last two stages of the game. It thus appears that co-operation can be sustained if a sufficiently high suspicion that the opponent plays only tit for tat makes up for a low number of game stages (for more details and proof, see Gibbons, 1992: 224-32).

Two last remarks are in order (see Myerson, 1991: 342). First, not every way of modifying the game with a small initial doubt will lead to the co-operative outcome. Perhaps paradoxically, a suspicion that the other player is still more inclined to co-operate than in tit for tat may actually make co-operation impossible. This happens in so far as his strategy entails too much tolerance regarding the other's 'accidental' defection. For instance, if the first player assigns a small positive probability to the possibility that the other player always co-operates (a 'generous' strategy), then the unique subgame-perfect equilibrium would be for each player always to defect. The best response to the generous strategy is indeed always to defect, so that no player has any incentive to cultivate the other's belief that he is going to be always generous. Second, as shown by Fudenberg and Maskin (1986), any payoff allocation that is achievable in an infinitely repeated game may also be approximately achieved in a long finite version of the game with small-probability perturbations.

Co-operation in N-player PD games

So far, attention has been limited to games involving only two players. An interesting question arises as to what extent the reported results can be extended to situations involving more than two players. To answer this question, it is necessary to focus on the decision problem of a given player interacting with the (N - 1) other players. Consider, for instance, a PD game where the collective output Y, to be divided equally among the N players, is a concave function, f(ni), of the number of voluntary contributors, ni, and where each contributor has to bear a fixed cost, c. Figure 4.3 illustrates this case.

FIG. 4.3. N-person prisoner's dilemma

We assume that f(ni + 1) - f(ni) < Nc for all values of ni between 0 and N, which implies that the 'marginal' productivity of an additional contribution is everywhere strictly smaller than N times the fixed cost incurred by one contributor. If such were not the case, then it would mean that the marginal productivity of the contributions is so high as to induce everyone to contribute even though he would internalize only a tiny fraction (1/N) of the benefits: in this case, the problem of public good provision would obviously have vanished. If the above condition holds true, as a comparison of corresponding entries in the figure's two rows indicates, the non-contributing strategy dominates regardless of the number of contributors.
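The dominance of free-riding under this assumption can be verified with a small numerical sketch. The numbers below are purely illustrative (N = 10 players, a contribution cost c = 1, and a concave output function f(n) = 8√n, chosen so that f(n + 1) - f(n) < Nc for every n): whatever the number of other contributors, a player's payoff from abstaining, f(ni)/N, exceeds his payoff from contributing, f(ni + 1)/N - c.

    import math

    # Hypothetical N-person PD: N players, fixed contribution cost c,
    # concave collective output f(n) shared equally among all N players.
    N, c = 10, 1.0

    def f(n):
        return 8 * math.sqrt(n)

    # The assumption of the text: marginal product of a contribution < N * c.
    assert all(f(n + 1) - f(n) < N * c for n in range(N))

    # For any number k of *other* contributors, compare the two payoffs.
    for k in range(N):
        free_ride = f(k) / N                  # abstain: share of output, no cost
        contribute = f(k + 1) / N - c         # contribute: larger share, pay c
        assert free_ride > contribute

    print("Non-contribution is a dominant strategy for every player.")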

If the above game is repeated a finite and certain number of times, then the backwards induction argument applies, and the dominant strategy of every player is to free-ride at every stage of the game. Conversely, if the length of the game is finite and uncertain or if it is infinite, the results obtained for a two-player game continue to hold provided that the tit-for-tat strategy of conditional co-operation (or the corresponding T1 strategy, defined below) is properly redefined. Indeed, in N-player games, it is not a priori clear what playing tit for tat means, since one does not know how many defections are needed to make a player following the tit-for-tat strategy defect. For the sake of illustration, it can easily be seen that the following is an equilibrium strategy: (i) start by co-operating, (ii) defect if at least one other player has defected in the previous round, (iii) otherwise co-operate.

Such a generalized tit-for-tat strategy is a best response to itself and, if followed by everybody, universal co-operation will become established: it is therefore a Nash equilibrium.

Consider now the situation in which all players but one follow a modified strategy where component (ii) above becomes 'defect if at least two other players have defected in the previous round'. It is then obvious that the last player will choose to start by defecting and will continue to do so. The above modified tit-for-tat strategy is therefore not a best reply to itself, yet it is a Nash equilibrium strategy for the (N - 1) remaining players. At equilibrium, co-operation is almost universal since there is only one unrepentant free-rider. Similarly, if component (ii) above is replaced by 'defect if at least M other players have defected in the previous round', there is a partial co-operation equilibrium in which (M - 1) players defect while (N - M + 1) players co-operate (as long as M remains relatively small).
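A small simulation helps to visualize such a partial co-operation equilibrium. In the hypothetical example below (N = 10 and a threshold M = 3), M - 1 = 2 players defect unconditionally while the remaining players follow the modified rule; along the equilibrium path exactly M - 1 defections are observed in every round, so the conditional co-operators never trigger their punishment.

    # Threshold version of generalized tit for tat: co-operate unless at least
    # M other players defected in the previous round. Illustrative numbers only.
    N, M, ROUNDS = 10, 3, 6
    nasty = set(range(M - 1))                    # unconditional defectors

    actions = ['D' if i in nasty else 'C' for i in range(N)]   # round-1 choices
    for r in range(ROUNDS):
        print(r + 1, ''.join(actions))
        prev = actions
        actions = []
        for i in range(N):
            if i in nasty:
                actions.append('D')
            else:
                others_defecting = sum(1 for j in range(N) if j != i and prev[j] == 'D')
                actions.append('D' if others_defecting >= M else 'C')

The printout shows the same profile, DDCCCCCCCC, in every round: co-operation is sustained by all but the M - 1 incorrigible free-riders.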

Notice carefully that the above equilibria are not subgame-perfect. This should not trouble us too much since the stronger version of the folk theorem has been generalized to N-player games (see, e.g., Myerson, 1991: 331-7), thereby ensuring that other, more sophisticated, strategies exist which support the co-operative outcome as a subgame-perfect equilibrium.

Co-operation and imperfect monitorability

So far, we have assumed that any defection can without cost be unfailingly attributed to the real culprit. In actual fact, this assumption is not as crucial as it may appear at first sight because, in an N-player repeated PD game framework, most punishing strategies carry out a threat of collective punishment: this means that, as long as the adverse consequences of defection can be detected by the players, anonymous retaliatory strategies (that is, strategies which are not explicitly directed at the culprits) can be effective in discouraging defection. Just consider the trigger strategy, which consists of co-operating as long as no other player has defected, and of defecting otherwise. It is evident that if everyone follows that strategy, an equilibrium can be sustained in which no defection occurs. Yet, off the equilibrium path, any lapse into defection would cause a collective reaction which blindly harms everyone's interests. (This does not prevent this strategy from being subgame-perfect.)

Now, it is not at all certain that, as has been assumed above, the adverse consequences of defection can easily be detected. Indeed, if there are exogenous risks (by which we mean uncertainties that are beyond man's control), the players may not be able to ascertain whether a given reduction in the productivity of a resource is due to natural factors or to a malevolent human act. This gives rise to more complex problems, as it is more difficult to relate punishment to actual defection given that punishment may be carried out even though exogenous factors are entirely responsible for the disappointing results. For example, in marine fisheries, the complexity of the ecological system is such that it may be difficult to determine whether a drop in total catches is to be ascribed to a sudden change in the marine environment (e.g. in marine currents), to overfishing, or to any other potentially harmful human practice (e.g. the use of destructive fishing gear). To take another example, in water management systems, water losses may result not only from stealing by a few participants but also from technical deficiencies that are not directly imputable to the user group. Likewise, in forestry management schemes in which exclusion is imperfectly enforced, participants may be unable to ascertain whether violation of a rule of access to the forest is to be assigned to someone from the user community or to an outsider. Note that this problem is also very common in fisheries and has actually become more serious with the introduction of mechanized, highly mobile boats.

As underlined above, imperfect observability of the aforementioned kind implies that punishment threats that deter opportunistic behaviour may actually have to be carried out with positive probability in equilibrium (since one may mistakenly believe that defection has occurred). As a result, threats in an equilibrium may have a positive expected cost (they may have to be carried out even though nobody defected) as well as an expected benefit, that of deterring opportunistic behaviour. In such circumstances, finding the best equilibria for the players 'may require a trade-off or balancing between these costs and benefits' (Myerson, 1991: 343).

In theory, one can envision a kind of punishment mechanism similar to that in the perfect information PD game which is capable of deterring defection even though observability of the players' actions is imperfect. Such a mechanism has been illustrated in the case of a class of infinitely repeated PD games by Abreu et al. (1991). Consider five players who are working independently to prevent some damage to a common infrastructure (one could think here of a water management system) from occurring by choosing an appropriate level of supervision and maintenance effort. For instance, assume that (1) in a period of time of length ε, the cost to player i of exerting effort at level ei (where 0 ≤ ei ≤ 1) is ε(ei + (ei)²)/2, (2) the probability of the damage occurring during that period is proportional to ε and decreases with the sum of all the players' effort levels, and (3) the damage, if it occurs, costs one unit of payoff to each player. The players cannot observe one another's effort levels but everyone can observe the damage whenever it occurs. At each period of time, the expected net payoff to each player is maximized over his own level of effort by letting ei = 1/2. This is of course far below the social optimum, which would require each player to exert an effort level equal to one. There is, however, a way of getting out of this awkward equilibrium. In the words of Myerson:

Because the players can observe accidents but cannot observe one another's effort, the only way to give one another an incentive to increase his or her effort is to threaten that some punishment may occur when there is an accident. Notice that the probability of an accident depends symmetrically on everyone's effort and is positive even when everyone chooses the maximum effort level 1. So when an accident occurs, there is no way to tell who, if anyone, was not exerting enough effort. Furthermore, the only way to punish (or reward) anyone in this game is by reducing (or increasing) effort to change the probability of accidents, which affects everyone equally. (Myerson, 1991: 344).

It can be shown that there exists another equilibrium, which actually maximizes the average expected payoff of each player and in which high efforts are most effectively encouraged if the players plan to punish only when there is an accident. Similarly, for punishment to be most effective, it is important that all players reduce their effort levels as much as possible, and this can be achieved if they return to their 'co-operative' effort levels only when another accident occurs. Such a result may perhaps appear surprising, yet it is perfectly in the logic of punishment strategies that are generated within the PD game itself. It is precisely because such a sanctioning system may be quite costly and complex to carry out that people usually prefer to devise external mechanisms to deter opportunistic behaviour (see below, Chapters 8 and 12).
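The under-provision of effort in this example can be illustrated numerically. The exact accident-probability function is not reproduced in the text, so the sketch below assumes a hypothetical linear form, eps * (K - sum of efforts), with K chosen large enough for the probability to remain positive even at maximal effort, as the quotation requires. Under this assumption, each player's privately optimal effort is 1/2 while the jointly optimal effort is 1, as stated above.

    # Five players, damage costing one unit to each; effort cost eps*(e + e^2)/2.
    # The accident probability eps*(K - sum of efforts) is an assumed form, not
    # the one used by Abreu et al. (1991); K = 6 keeps it positive at full effort.
    N, eps, K = 5, 0.01, 6.0

    def private_loss(e_i, others_sum):
        # expected damage borne by player i plus his own effort cost, one period
        return eps * (K - (e_i + others_sum)) + eps * (e_i + e_i**2) / 2

    def joint_loss(e):
        # total expected loss when every player exerts the same effort e
        return N * eps * (K - N * e) + N * eps * (e + e**2) / 2

    grid = [i / 1000 for i in range(1001)]                     # efforts in [0, 1]
    best_private = min(grid, key=lambda e: private_loss(e, others_sum=4 * 0.5))
    best_joint = min(grid, key=joint_loss)
    print(best_private, best_joint)                            # -> 0.5 1.0

The divergence between the two numbers is the whole point of the example: with unobservable individual efforts, only the threat of a collective, accident-triggered punishment can push efforts above 1/2.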
