The Worley Population of England - A Simulation

A simple Monte Carlo program has been used to investigate how many separate Worley families are likely to exist in England today. The idea for the work followed the Y-STR measurements undertaken by DNA Heritage for the Worley Surname DNA Project, which has so far revealed two different haplotypes for English participants and a further five amongst US customers.

The calculation is purely theoretical and proceeds by creating a list of fictional male Worleys, saving four pieces of data for each - the year of birth, year of death, fertility and haplotype number. The program runs between the years 1300 and 2001, moving on one year at a time. At the outset, there are no Worleys, but for each year up to 1400 there is a chance that someone will adopt the name. Each founder is assumed to introduce a new haplotype, and they are numbered from one. For each new year in the sequence, Monte Carlo decisions are made regarding the death of each living person in the list and the creation of new ones as true, adopted or non-paternal sons. At the end of the sequence of years, various statistics are extracted from the list and compared with real data. The whole process is repeated, with slightly different constants and new random-number choices, until a match to the given data is obtained, at which point the desired results – the number of original haplotypes, the number surviving today and the total number of Worleys ever living (in England) – become available.
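
The yearly loop described above can be sketched as follows. This is only an illustrative skeleton, not the original program: the record name Worley, the function run_once and the five callables passed into it are hypothetical, and the order in which death and fatherhood are tested within a year is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worley:
    born: int             # year of birth
    died: Optional[int]   # year of death, None while still living
    fertility: int        # integer 0-100
    haplotype: int        # haplotype number, founders numbered from one


def run_once(founder_arrives, draw_fertility, dies, has_son, is_non_paternal,
             start=1300, end=2001):
    """Run one trial and return the full list of people ever created."""
    people: List[Worley] = []
    next_haplotype = 1
    for year in range(start, end + 1):
        # A founder adopts the name at age 24 and brings a new haplotype.
        if founder_arrives(year):
            people.append(Worley(year - 24, None, draw_fertility(), next_haplotype))
            next_haplotype += 1
        # Snapshot the living, so sons born this year are first processed next year.
        for man in [p for p in people if p.died is None]:
            if dies(man, year):
                man.died = year
            elif has_son(man, year):
                if is_non_paternal():
                    haplotype = next_haplotype      # adoption, name change, infidelity
                    next_haplotype += 1
                else:
                    haplotype = man.haplotype       # true son
                people.append(Worley(year, None, draw_fertility(), haplotype))
    return people
```

In a real run each callable would make a random-number decision; passing them in keeps the skeleton independent of how those probabilities are chosen.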

Founders

At the beginning of each run, the probability of bringing in a new founder in any year 1300-1400 is set to 0.20*random, so runs are undertaken with mean numbers of founders in the range 0-20. As each founder is created, he is allocated the next available haplotype number (thereby assuming that the founders are genetically distinct) and is assumed to be 24 years old, so that his year of birth is filled in as the current year minus 24.
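
A minimal sketch of the founder step, under stated assumptions: the function name create_founders and the dictionary keys are hypothetical, and the founding window is taken as the 100 years 1300-1399, which is what gives a maximum mean of 20 founders.

```python
import random

FOUNDER_AGE = 24                              # age assumed for every founder
FIRST_YEAR, END_OF_FOUNDING = 1300, 1400      # assumption: 1400 itself excluded


def create_founders(rng):
    """Return the founders created in one run, numbered by haplotype from one."""
    p_founder = 0.20 * rng.random()           # fixed once per run
    founders = []
    next_haplotype = 1
    for year in range(FIRST_YEAR, END_OF_FOUNDING):
        if rng.random() < p_founder:
            founders.append({"born": year - FOUNDER_AGE,
                             "haplotype": next_haplotype})
            next_haplotype += 1
    return founders


founders = create_founders(random.Random(1))
print(len(founders), "founders in this run")
```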

Fertility

Each new subject is allocated a fertility f = -25+125*random. Any negative reading is recorded as zero, and the number stored is an integer in the range 0-100. The fertility measure is used, together with the current year, the age of the subject and a random number choice to determine whether the subject has a son in any given year.
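
The fertility draw might look like the sketch below. The function name draw_fertility is hypothetical, and rounding to the nearest integer is an assumption, since the text does not say whether the value is rounded or truncated. Note that about one subject in five receives a fertility of zero.

```python
import random


def draw_fertility(rng):
    """Draw f = -25 + 125*random, clamp negatives to zero, store as an integer."""
    f = -25 + 125 * rng.random()
    return max(0, round(f))       # rounding is an assumption


rng = random.Random(42)
sample = [draw_fertility(rng) for _ in range(10)]
print(sample)
```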

Addition of Sons

Each male is regarded as having the potential to produce children from the age of 25. To build in the very different family sizes over the period studied, the age range for child production in ancient times is taken as 25-40 and in modern times as 25-30. So for each subject in a given year, we set the lower age limit as 25 and determine the upper by linear interpolation. If the subject is in the resultant age range, his probability of having a child is taken as f/100 and the probability of having a son as f/200. In 1300, a man living to 40 with a fertility of around f=50 could have (40-25)*f/100 = 7.5 children or half as many sons. For comparison, Wrigley and Schofield (p.254) have the mean number of children in completed families as about seven. In 2001, a man living to at least 30 with fertility f=50 will typically produce (30-25)*f/100 ~ 2.5 children or half as many sons.
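
The arithmetic above can be reproduced with a short sketch. The function names are hypothetical, and treating the upper age limit as exclusive is an assumption, chosen because it gives the 15 fertile years (1300) and 5 fertile years (2001) used in the worked figures.

```python
def upper_age_limit(year):
    """Linear interpolation of the upper age: 40 in 1300, 30 in 2001."""
    fraction = (year - 1300) / (2001 - 1300)
    return 40 + (30 - 40) * fraction


def son_probability(year, age, fertility):
    """Probability that a man of this age and fertility has a son this year."""
    if 25 <= age < upper_age_limit(year):     # exclusive upper limit: assumption
        return fertility / 200                # half of the child probability f/100
    return 0.0


# Expected children for f=50 at the two ends of the simulation.
for year in (1300, 2001):
    children = sum(2 * son_probability(year, age, 50) for age in range(0, 111))
    print(year, children)
```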

Having determined the probability that the subject will produce a son in a given year, the actual event is determined by whether a random number is less or greater than the calculated probability. If a son is born, then in the majority of cases (again determined by random number) he is allocated the same haplotype number as his father, but for the remaining small fraction k (taken here as 0.5%, 1% or 2%) of cases, representing adoption, name change and infidelity, the child bears the Worley surname but has a different father and thus introduces a new haplotype.
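
A sketch of the haplotype decision for a newborn son, assuming a hypothetical helper son_haplotype that also hands back the next unused haplotype number:

```python
import random


def son_haplotype(father_haplotype, next_haplotype, k, rng):
    """Return (haplotype of the son, next unused haplotype number)."""
    if rng.random() < k:
        # Non-paternal event: the son carries the surname but a new haplotype.
        return next_haplotype, next_haplotype + 1
    return father_haplotype, next_haplotype


rng = random.Random(7)
print(son_haplotype(1, 5, 0.01, rng))
```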

Child production ends with the death of the father. Because we are interested only in Y-chromosome DNA, sons born to successive partners would all carry the father's haplotype; remarriage and similar circumstances have not been modelled separately in the simulation.

Death

This has proved to be the most complex outcome to simulate. It has been achieved using a probability of death in a given year given by a*exp(-age/b) + c*exp(age/d). The first exponential represents the high risk encountered in the early years of life, and the second the rising chance of dying in later years. It was considered desirable, if possible, for c to be small enough that the second term could be fitted without affecting the first (child mortality) term. The four terms a,b,c,d are represented by a₁,b₁,c₁,d₁ in 1300 and a₂,b₂,c₂,d₂ in 2001. Death rates are very different at the two extreme years, so the probability for an intermediate year is determined by interpolation between the two sets of parameters. Because it seems implausible that progress has been linear between the two extremes, interpolation instead goes according to the square of the year difference (taken as a fraction of the full span), so that half way between 1300 and 2001 we incur only a quarter of the possible change. For a particular year the interpolation is carried out and then, using the age of the subject, the probability of death is determined. Death occurs if a chosen random number is less than the calculated probability. The constants adopted are: a₁= 0.145, b₁= 1.045 years, c₁= 0.008, d₁= 22.78 years, a₂= 0.00308, b₂= 1.045 years, c₂= 0.00004, d₂= 10.86 years.

The constants a₁ and b₁ are derived from Wrigley and Schofield’s (p.249) childhood mortality figures from 1550 onwards, fitted to an exponential.

The constants a₂ and b₂ refer to modern infant mortality, which is currently around 5.0 per thousand in the first year. We fit this to an exponential such that the b term is, arbitrarily, the same as that in 1300.

The constants c₂ and d₂ are chosen to obey two constraints: (i) the probability of death at age 110 should be 1.0 and (ii) the life expectancy at birth (neglecting the infant mortality contribution) should be ~78 years.

The constants c₁ and d₁ were similarly chosen: (i) the probability of death at age 110 should again be 1.0 and (ii) the life expectancy at birth, again neglecting the child component, should be less than the figure available for the USA in 1840, 40 years. The actual constants were chosen so that the program produced a reasonable rate of solutions, and they result in a life expectancy of 35 years in 1300. Requiring that both curves pass through 1.0 at age 110 ensures that the probability of death in 1300 is higher at any age than in 2001.

Note that in both cases the c term is much smaller than the a term, so that there is little interference between the two components. Finally, to provide a little more randomness, at the beginning of each run all of the constants a₁..d₂ are adjusted by random amounts in the range +/-5%.
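
The death probability with its squared-time interpolation can be sketched as below, using the constants quoted above. The function names are hypothetical, the cap at 1.0 is an added assumption (the raw formula slightly exceeds 1 at age 110), and the per-run +/-5% adjustment is left out for clarity.

```python
import math

PARAMS_1300 = {"a": 0.145, "b": 1.045, "c": 0.008, "d": 22.78}
PARAMS_2001 = {"a": 0.00308, "b": 1.045, "c": 0.00004, "d": 10.86}


def interpolated_params(year):
    """Move from the 1300 set to the 2001 set as the square of elapsed time."""
    weight = ((year - 1300) / (2001 - 1300)) ** 2
    return {key: PARAMS_1300[key] + weight * (PARAMS_2001[key] - PARAMS_1300[key])
            for key in PARAMS_1300}


def death_probability(year, age):
    """a*exp(-age/b) + c*exp(age/d), evaluated with the parameters for this year."""
    p = interpolated_params(year)
    prob = p["a"] * math.exp(-age / p["b"]) + p["c"] * math.exp(age / p["d"])
    return min(prob, 1.0)         # cap is an assumption, not in the original text


for year in (1300, 1650, 2001):
    print(year, [round(death_probability(year, age), 4) for age in (0, 1, 30, 70)])
```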

Monte Carlo Method

Though the constants listed above appear very precise, they are used only to determine probabilities of events. Once a probability has been determined (for example, 0.233) then the event will only occur if the next random number generated is less than 0.233. Each event is governed by random number choices and by the events that occurred before it, so there is great scope for a wide range of possible outcomes in any trial. Random numbers are generated by a subtract-and-carry routine.

Reference Data

At the end of each run the list created is compared with three pieces of specific Worley data:

The number of Worleys dying between 1837 and 1864 was long ago extracted from PRO bound volumes, quarter by quarter. The total number is 516. I assume that this implies around 258 males – small inaccuracies and even a missing quarter will not be important because the program cannot be expected to match the target exactly. So the figure is widened by three standard deviations in both directions, yielding a target range of 210-306 male deaths.

The Surname Atlas is based upon the 1881 census, and reports a total Worley population of 1321. After halving and opening up the range, we obtain a target of 583-737 males.

Finally, the 2001 Census produces 2730 Worleys, giving a target range of 1254-1476 males.
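
The three target ranges can be reproduced with the sketch below. It assumes that the standard deviation is the square root of the expected male count (a Poisson-style counting error), which is an inference from the quoted ranges rather than something stated explicitly; the function name is hypothetical.

```python
import math


def male_target_range(total_count, n_sigma=3):
    """Halve the total to estimate males, then widen by n_sigma * sqrt(males)."""
    males = total_count // 2                  # whole-number halving
    spread = n_sigma * math.sqrt(males)       # assumption: sigma = sqrt(count)
    return round(males - spread), round(males + spread)


for label, total in (("deaths 1837-1864", 516),
                     ("1881 census", 1321),
                     ("2001 census", 2730)):
    print(label, male_target_range(total))
```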

One further piece of data is of interest but has not been included as a constraint. The ONS Names List estimates that since the beginning of parish registers in the 16th century, there are, or have been, about 10,300 Worleys in all, and therefore about 5150 males. Because of the ambiguities in this statement and the differing times for the commencement of registers, no attempt has been made to extract this particular statistic from the results.

Results

With the non-paternal event rate k initially set at 1%, repeated runs were made until eighty successes were obtained, and the procedure was then repeated with rates of 0.5% and 2% for a further forty measurements each. Each success typically required between several hundred and several thousand trials. The results are tabulated below and are expressed as the mean and standard deviation in each case.

Non-Paternal Event Rate k	0.5%	1.0%	2.0%
Total Worleys W	4820 +/- 310	4850 +/- 250	4800 +/- 230
Total Haplotypes h	29 +/- 5	53 +/- 9	103 +/- 10
Surviving Haplotypes h´	13 +/- 3	24 +/- 5	48 +/- 6
Total Founders	4.3 +/- 2.3	4.4 +/- 2.6	4.9 +/- 3.3
Surviving Founders' Haplotypes	1.2 +/- 0.6	1.2 +/- 0.5	1.2 +/- 0.4
Highest Fraction	0.85 +/- 0.16	0.80 +/- 0.15	0.69 +/- 0.13

For a sample of successful trials, the population of male Worleys was tracked between 1300 and 2001, and was found to rise exponentially with time. If we express the population as p=A*exp(m*y), where y is the year, the average value for m was found to be 0.0082 +/- 0.0007 per year, corresponding to a population which doubles every 84.5 years (ln 2 / 0.0082).
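
One way to extract m from a tracked population is a least-squares fit to the logarithm of the population, sketched below. The fitting method and function names are assumptions, since the text does not say how m was obtained, and the data in the usage lines are synthetic.

```python
import math


def fit_growth_rate(years, populations):
    """Least-squares slope of ln(population) against year: m in p = A*exp(m*y)."""
    logs = [math.log(p) for p in populations]     # populations must be positive
    n = len(years)
    mean_year = sum(years) / n
    mean_log = sum(logs) / n
    sxy = sum((y - mean_year) * (l - mean_log) for y, l in zip(years, logs))
    sxx = sum((y - mean_year) ** 2 for y in years)
    return sxy / sxx


def doubling_time(m):
    """Years for an exponentially growing population to double."""
    return math.log(2) / m


# Synthetic example only: an exactly exponential population with m = 0.0082.
years = list(range(1400, 2001, 50))
populations = [3.0 * math.exp(0.0082 * (y - 1300)) for y in years]
m = fit_growth_rate(years, populations)
print(round(m, 4), round(doubling_time(m), 1))
```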

Discussion

The total number of Worleys appears to be unchanged with k, and the number itself, around 4820, is gratifyingly close to the ONS estimate of about 5150 males. Two other measurements also appear to be unaffected by changes in k: the total number of founders and the number of founders’ families surviving to this day. Despite the possibility of creating up to about twenty founders, good solutions are obtained only when the number of founders is around four or five. Furthermore, even with such a number of founders, the number of founders’ haplotypes surviving in the Worley population today is only about 1.2. It appears that a line can survive from around 1300 to today only by accumulating a large number of descendants. If two or more founders begin families, then to achieve today’s population each must rise in number at a slower rate. This appears to drive them too close to extinction, so the number of Worleys alive today can be matched only with about one progenitor. Patrick Hanks has written:

"On the basis of computational study of the current geographical location of 15,000 British family names, Hanks (1992) concluded that many surnames in Britain are monogenetic, while very many others have "become monogenetic" i.e. the name may have been coined in more than one place and in more than one time in the Middle Ages, but subsequently the family lines descended from all except one of the original medieval bearers have died out. Meanwhile, the descendants of just one medieval bearer have prospered and multiplied, resulting in the modern distribution of the name."

It was a surprise to find that the work had reproduced the above statement so accurately, given the simplicity of the model. A single dominant haplotype appears to emerge because the non-paternal event rate k is small. If k=0.01, a line must produce around 100 sons before a non-paternal event is expected, and because that takes several generations, a new haplotype is typically created by this means only 200-300 years later, which is too late for it to develop to the extent of the founder.

For all three sets of data, we find that a large fraction of living, English Worleys belong to one haplotype. The fraction ranges from 0.69 to 0.85. This suggests that we should not have to wait very long for a match within the English results. There is also a clear relationship between k and the total number of haplotypes h. The founders contribute about four haplotypes and the remainder come from the number of non-paternal events, making a total h=4+Wk. For k=0.5%, 1% and 2%, this produces, using W=4820, h=28, 52 and 100 respectively, in good agreement with the observed figures. The number of haplotypes surviving to this day h´ appears to be just below one half of the total number, and is thus also approximately proportional to k. More exactly, we find that the number is close to h´=0.51Wk.
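
The two empirical relations can be checked against the table with a few lines. The function name is hypothetical; the constants 4 and 0.51 are those quoted above.

```python
def predicted_haplotypes(total_worleys, k, founder_haplotypes=4,
                         surviving_fraction=0.51):
    """Return (total haplotypes h = 4 + W*k, surviving haplotypes h' = 0.51*W*k)."""
    non_paternal = total_worleys * k
    return founder_haplotypes + non_paternal, surviving_fraction * non_paternal


for k in (0.005, 0.01, 0.02):
    h, h_surviving = predicted_haplotypes(4820, k)
    print(f"k={k:.3f}  h={h:.1f}  h'={h_surviving:.1f}")
```

The predictions can be set beside the simulated means of 29, 53 and 103 total haplotypes and 13, 24 and 48 surviving haplotypes.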

The exponential rise in the population is a useful observation. The time for the population to double, 84.5 years, is about three generations, so each generation contributes a rise of 1.26x. On an individual level, this is hardly explosive growth – each man would expect, on average, to have two viable male great-grandchildren. If our general population were unchanging, with no early death and with every male forming a couple and having two children, one male and one female, then after three generations a man would be able to claim eight great-grandchildren, only one of whom would be a male bearing his name. The growth rate we have found above, with two male Worley great-grandchildren per male, is greater than this steady-state regime, but from the point of view of a man wishing to establish a solid family bearing his name the outlook is still precarious. If the number of male children follows a Poisson distribution with a mean of 1.26, the probability that after three generations a man has no male Worley descendants is a very significant 47%. This figure may allow us to understand how only about one of the original four or so founders has descendants alive today.
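
The 47% figure follows from the standard branching-process recursion: if q is the probability that a line has died out within n generations, then for Poisson-distributed sons the next value is exp(-mean*(1-q)). A minimal sketch, with a hypothetical function name:

```python
import math


def extinction_probability(mean_sons, generations):
    """Probability that a man has no male-line descendants after n generations,
    assuming each man's number of sons is Poisson with the given mean."""
    q = 0.0                                   # a living man is not yet extinct
    for _ in range(generations):
        q = math.exp(-mean_sons * (1.0 - q))  # Poisson generating function at q
    return q


growth_per_generation = 2 ** (1 / 3)          # doubling over three generations
print(round(growth_per_generation, 2))
print(round(extinction_probability(1.26, 3), 2))
```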

Conclusions

A Monte Carlo routine has been used to model the growth of Worley males in England between 1300 and 2001, using plausible probabilities for the number of founders, birth rates, death rates and the rate of sons with the father's surname but different haplotype. The main conclusions are:

The number of Worley males who ever existed between these years is about W=4820.

The number of founders adopting the name between 1300 and 1400 is about 4-5. Of these, the mean number of founders having DNA signatures still detectable today in the modern Worley population is about 1.2.

The number of haplotypes ever created in the period depends upon the non-paternal event rate k, and is approximately given by h=4+Wk.

The number of haplotypes surviving to 2001 is also dependent upon k, and is approximately given by h´=0.51Wk.

Modern-day Worleys fall into a number of haplotype groups, and it seems likely that there is one predominant group. If k is less than or about equal to 1%, the major haplotype should account for about 80% of all living English Worleys.

Reference

E. A. Wrigley and R. S. Schofield, The Population History of England, C.U.P., 1993.