The Last Name Puzzle


Where Did All These Smiths Come From?

It is a common practice for a child to take the father's last name. One might guess that this custom makes it unlikely that any last name would die out, or that any one name would crowd out all the others. But there is no way for a lost name to return, and in every generation there is a small chance that any particular name will disappear. So it is reasonable to wonder whether every society with such a rule will eventually end up with everyone having the same last name.

To analyze this problem, let's ruthlessly simplify it. Each person lives just over a year, born on the first second of January first, and dying a few seconds into the next year, after all the women have given birth to the next generation. Moreover, each woman gives birth to two children (not necessarily the same sex). These children get married on the first of April, the wife instantly becomes pregnant, and the cycle continues. Oddly enough, exactly half the children born each year are male.

With these few modest assumptions, it would be easy to program a simple society and see what happens to last names, assuming that there were, say, 26 original couples, with last names "A" through "Z".
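The simulation mentioned above might look like the following minimal Python sketch. It assumes the rules just listed, with "exactly half the children are male" implemented by picking the sons uniformly at random from all the children born in a generation. The function name `simulate_names` is hypothetical. Since each son heads one couple in the next generation, the number of couples stays at 26 and only the spread of last names among them changes.

```python
import random
import string


def simulate_names(n_couples=26, generations=200, seed=None):
    """Hypothetical sketch of the toy society described above.

    Assumptions: each couple has exactly two children, exactly half of all
    children born in a generation are male (chosen uniformly at random),
    and every son heads a new couple carrying his father's last name.
    Returns the number of distinct last names in generations 0..generations.
    """
    if not 1 <= n_couples <= 26:
        raise ValueError("this sketch uses the letters A-Z as names")
    rng = random.Random(seed)
    # Generation 0: one couple per letter, names "A", "B", "C", ...
    fathers = list(string.ascii_uppercase[:n_couples])
    counts = [len(set(fathers))]
    for _generation in range(generations):
        # Each couple has two children, both born with the father's name.
        children = [name for name in fathers for _ in range(2)]
        # Choose exactly half of all the children to be the sons.
        sons = rng.sample(range(len(children)), len(children) // 2)
        # Only sons pass the name on; they head next generation's couples.
        fathers = [children[i] for i in sons]
        counts.append(len(set(fathers)))
    return counts


counts = simulate_names(seed=1)
for k in (0, 1, 2, 5, 10, 50, 100, 200):
    print(k, counts[k])
```

Different seeds give different histories, but in every run the number of distinct names can only stay the same or fall, since no step ever creates a new name.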

Here are some results (not mine) for the relative number of last names existing in the K-th generation per 1,000 distinct names in the starting generation. Note that the actual population is assumed to be much larger than 1,000, and that these are simply relative proportions!

    K    Surnames
  ---    --------
    0        1000
    1         752
    2         595
    3         512
    4         451
    5         396
   10         252
   15         194
   20         148
   25         126
   30         108
   40          81
   50          66
   60          58
   80          43
  100          38
  200          24

If we look at a simple version of this problem, in which there are just 8 couples, 4 Smiths and 4 Joneses, we can regard the evolution of the names as a sort of random "stagger" (a random walk). Every January first, the number of Smith and Jones families changes in a random way (essentially recording the result of 16 fair coin flips, one for the sex of each child), and the system moves along a line between the extremes of 8 Smiths and 8 Joneses. Notice that, at any intermediate point, the system is just as likely to move Smithwards as Joneswards. Even when there are 7 Smith families and 1 Jones family, the Jones family has about a 25% chance of disappearing (both of its children are girls, and the system steps to the all-Smith end) and about a 25% chance of doubling its representation (both are boys, and the system steps toward the Jones end). It's just that, if the system ever steps all the way to one extreme, it has obliterated a name that can never come back.
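A minimal sketch of the Smith/Jones stagger, taking the toy rules literally: exactly 8 of the 16 children are boys each year, chosen uniformly at random. The function name `smith_jones_walk` is hypothetical. Under this exact rule, the chance that a lone Jones family has two girls is C(14,8)/C(16,8) = 7/30, about 23%, rather than exactly 25%. The walk is still fair, because the expected number of Smith families next year equals the number this year. For the same reason, starting from 4 and 4, each name should win about half the time.

```python
import random


def smith_jones_walk(smiths=4, couples=8, seed=None):
    """Run the hypothetical Smith/Jones walk until one name is gone.

    Each of the `couples` couples has two children, and exactly half of
    all children are boys, chosen uniformly at random.  Returns the number
    of Smith families in each generation, ending at 0 (Joneses win) or at
    `couples` (Smiths win).
    """
    rng = random.Random(seed)
    history = [smiths]
    while 0 < smiths < couples:
        # Children 0 .. 2*smiths-1 belong to Smith couples, the rest to Joneses.
        boys = rng.sample(range(2 * couples), couples)
        # Each Smith boy founds a Smith family in the next generation.
        smiths = sum(1 for child in boys if child < 2 * smiths)
        history.append(smiths)
    return history


walk = smith_jones_walk(seed=2003)
print(walk)

# Estimate how often the Smiths take over, starting from 4 and 4.
trials = 10_000
master = random.Random(0)
wins = sum(smith_jones_walk(seed=master.randrange(2**32))[-1] == 8
           for _ in range(trials))
print(wins / trials)
```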

The interesting thing to note here is that, for purposes of name passing, female children don't matter. That means we can replace this problem with the following one. A society consists of a collection of last names, each with multiple copies (the men who carry it). Each generation, each copy of a last name has a 25% chance of dying (no sons), a 50% chance of surviving (one son), and a 25% chance of being doubled (two sons), and we assume that the probabilities are "cooked" so that the total number of copies stays the same from generation to generation.
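For a single name that is a small share of the population, the "cooking" hardly matters, and one can treat each male line as an independent branching process with the 25/50/25 son distribution. Under that assumption (which ignores the fixed total), the chance that a line has died out by generation K can be computed exactly by iterating the generating function of the number of sons. The helper `surviving_per_thousand` is hypothetical.

```python
def surviving_per_thousand(generations, p=(0.25, 0.5, 0.25)):
    """Expected number of surviving male lines per 1,000 founders.

    Assumes each man independently has j sons with probability p[j],
    ignoring the "cooking" that keeps the total population fixed.
    """
    def f(s):
        # Probability generating function of the number of sons.
        return sum(prob * s**j for j, prob in enumerate(p))

    q = 0.0  # probability that the line has died out by generation k
    result = [1000.0]
    for _ in range(generations):
        # A line dies out within k+1 generations exactly when every
        # son's line dies out within k generations.
        q = f(q)
        result.append(1000.0 * (1.0 - q))
    return result


table = surviving_per_thousand(200)
for k in (1, 2, 3, 4, 5, 10, 50, 100, 200):
    print(k, round(table[k]))
```

The first entry is exactly 750, since a man has no sons with probability 1/4. These numbers can be compared with the table above, keeping in mind that those figures came from elsewhere and may rest on different assumptions. For large K the surviving fraction shrinks roughly like 1/K, a standard result for branching processes in which the average number of sons is exactly one.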

Name loss is more noticeable because a married couple usually takes the man's name, but the problem recurs under other customs. For example, if daughters take the mother's name and sons the father's, name loss still occurs: a mother can't be guaranteed to have a daughter, so we will still see the eventual "Smith death of the Name Universe". In more realistic models, where some couples have large families and others have no children, the homogenization occurs even more rapidly. Even when the population explodes (everybody has lots of babies), the relative proportion of one name is likely to increase while others dwindle.
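To illustrate why uneven family sizes speed up homogenization, the sketch below treats a single male line as a branching process and compares three hypothetical son-count distributions. All three average exactly one son per man, so each keeps the population the same size on average. The distributions and the helper `survival_fraction` are illustrative assumptions, not data. The more uneven the family sizes, the smaller the chance that a line survives 20 generations.

```python
def survival_fraction(p, generations):
    """Probability that one male line still exists after `generations`.

    p[j] is the (hypothetical) probability that a man has exactly j sons.
    """
    q = 0.0  # probability the line has already died out
    for _ in range(generations):
        q = sum(prob * q**j for j, prob in enumerate(p))
    return 1.0 - q


# Each distribution has a mean of exactly one son per man.
even = (0.25, 0.5, 0.25)             # two children each, as in the toy model
uneven = (0.5, 0.0, 0.5)             # half the men have two sons, half none
very_uneven = (0.75, 0, 0, 0, 0.25)  # a quarter of the men have four sons

for label, p in [("even", even), ("uneven", uneven), ("very uneven", very_uneven)]:
    print(label, round(1000 * survival_fraction(p, 20)), "per 1,000")
```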

In their recent article, Manrubia et al. point out that, although a static population has an eventual tendency toward name loss, as long as there are a large number of names it is possible to estimate their relative frequencies. These results hold especially accurately for growing populations. They state that the number of names belonging to exactly n people is proportional to 1/n^2. Thus, for every name belonging to 200 people, there will be about 100 names belonging to just 20 people.
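The arithmetic behind the last example, as a small sketch assuming the 1/n^2 law holds exactly (the helper `names_ratio` is hypothetical): the ratio of name counts at sizes 20 and 200 is (1/20^2)/(1/200^2) = (200/20)^2 = 100.

```python
from fractions import Fraction


def names_ratio(n_small, n_large):
    """Names held by exactly n_small people, per name held by n_large people,
    assuming the count of names of size n is proportional to 1/n**2."""
    return Fraction(1, n_small**2) / Fraction(1, n_large**2)


print(names_ratio(20, 200))  # 100
```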

Reference:

  1. Susanna Manrubia, Bernard Derrida, Damian Zanette,
    Genealogy in the Era of Genomes,
    American Scientist,
    Volume 91, March-April 2003, pages 158-165.


Last revised on 10 March 2003.