Survival of the Least Fit: the Prisoner's Dilemma and Evolutionary Computation

Dr Marcus Frean and Dr Edward Abraham


In genetic algorithms it is often taken for granted that selection of the most successful members of a population will result in individuals whose fitness is higher than their ancestors. On the contrary, there are circumstances in which "survival of the fittest" is catastrophically bad, and survival of the least fit leads to the highest population fitness over time. Such situation are succinctly described in terms of the Prisoner's Dilemma concept from game theory. This is reminiscent of the 'no free lunch' result of computational learning theory, a connection which we hope to make explicit.

In the Prisoner's Dilemma, two agents make independent choices about whether they will "cooperate" with one another. What makes the situation interesting is the particular relationship between payoffs and actions: the joint payoff is highest for mutual cooperation, and lowest for mutual defection, yet at the individual level it always pays to defect (i.e. fail to cooperate). If fitnesses are determined from payoffs which have this structure, then selection of the fittest individuals leads to an erosion of the fitness of all survivors, as the number of cooperators drops. If one's goal is to evolve individuals (or populations, for that matter) with high fitness, one should actually select the LEAST fit individuals in each generation.

One widely investigated "escape" from the prisoner's dilemma lies in reciprocation: if contacts between agents are liable to be repeated, it can become "selfish" to cooperate if doing so engenders the cooperation of the other agent: the generic example of such a successful behaviour is the strategy known as tit-for-tat. A particularly interesting phenomenon in this iterate prisoner's dilemma is that a strategy X may beat strategy Y which in turn beats strategy Z, and yet Z may beat X, forming a cycle. If we consider a collection of these strategists occupying a two dimensional spatial array and "invading" one another under very simple dynamics, such cycles can persist indefinitely.

These spatial models exhibit behaviour that is very like self-organized criticality: they spontaneously form fractally distributed "clumpy" patterns having no characteristic length scale. On the other hand, this does not appear to be true in the time domain - a somewhat anomalous result.

In an echo of the situation with simple prisoner's dilemma agents, in these models the fastest (i.e. most consistent) invader is generally not the most successful, nor is the slowest invader the least successful. Simulation results (for the relative areas occupied in the long term) are in excellent agreement with the appropriate mean field theory - an intruiging result given that the mean field theory ignores all spatial structure.

Finally, we note that spatial patchiness among competing species has usually been attributed to variation in the underlying substrate or other extrinsic effects: the above model demonstrates that it can plausibly arise from competitive interactions alone.