Mathematical modelling of infectious diseases

M. J. Keeling* and L. Danon


Biological Sciences, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK

*Correspondence to: Prof. M. J. Keeling, Biological Sciences, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK. E-mail: m.j.keeling@warwick.ac.uk

Accepted September 24, 2009.


Abstract

Introduction: Mathematical models allow us to extrapolate from current information about the state and progress of an outbreak, to predict the future and, most importantly, to quantify the uncertainty in these predictions. Here, we illustrate these principles in relation to the current H1N1 epidemic.

Sources of data: Many sources of data are used in mathematical modelling, with some forms of model requiring vastly more data than others. However, a good estimate of the number of cases is vitally important.

Areas of agreement: Mathematical models, and the statistical tools that underpin them, are now a fundamental element in planning control and mitigation measures against any future epidemic of an infectious disease. Well-parameterized mathematical models allow us to test a variety of possible control strategies in computer simulations before applying them in reality.

Areas of controversy: The interaction between modellers and public-health practitioners, and the level of detail needed for models to be of use.

Growing points: The need for stronger statistical links between models and data.

Areas timely for developing research: Greater appreciation by the medical community of the uses and limitations of models, and a greater appreciation by modellers of the constraints on public-health resources.

Key words

modelling; swine flu; prediction; uncertainty


Introduction

The progress of an epidemic through a population is highly amenable to mathematical modelling. Indeed, the first attempt to model, and hence predict or explain, epidemic patterns dates back over 100 years,1 although it was the work of Kermack and McKendrick2 that established the basic foundations of the subject. These early models, and many subsequent revisions and improvements,3,4 operated on the principle that individuals can be classified by their epidemiological status: most simply, susceptible to the infection, infected and therefore infectious, or recovered and hence no longer infectious. (We stress that this classification is based upon an individual’s ability to host and transmit a pathogen, and may be relatively unconnected to their medical status.) In this review, we focus on how such models can be used to predict the future outcome of an epidemic process (or the impact of control measures); however, models may also have a more theoretical use as explanatory tools, elucidating fundamental principles of transmission and the factors driving epidemic behaviour.

The so-called SIR model is one of the simplest and most fundamental of all epidemiological models. It is based upon calculating the proportion of the population in each of the three classes (susceptible, infected and recovered) and determining the rates of transition between these classes (Fig. 1). In the simplest model of a single epidemic, births and deaths can often be ignored, so only two transitions are possible: infection (moving individuals from the susceptible to the infected class) and recovery (moving individuals from the infected to the recovered class). It is generally assumed (and supported by epidemic data) that the per capita rate at which a given susceptible individual becomes infected is proportional to the prevalence of infection in the population,5 while for simplicity it is often assumed that infected individuals recover at a constant rate.2 Even this simple model therefore requires modellers to estimate two parameters: the proportionality constant for infection and the recovery rate. This illustrates the fundamental relationship between models and statistics: without good statistical estimates of parameters from epidemiological data, models cannot be used as predictive tools and can only illustrate general concepts. This interplay between models and statistics is something we shall return to later.

Fig. 1


From left to right: a pictorial representation of the flow of individuals between classes in the SIR model; the basic differential equations for the SIR model, which give the rate of change of the proportion in each class (negative terms reflect flows out of a class, whereas positive terms reflect flows into the class); and the result of numerically solving the SIR model, showing how the proportions of susceptible, infected and recovered individuals in the population are predicted to change over time.
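Written out in standard notation (the symbols are our choice), with S, I and R the proportions of the population in each class, β the proportionality constant for infection and γ the recovery rate, the SIR equations summarised in Fig. 1 are usually given as below. Because an infectious individual in a fully susceptible population infects others at rate β for an average period 1/γ, the basic reproductive ratio discussed below is β/γ for this model.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta S I, \\
\frac{dI}{dt} &= \beta S I - \gamma I, \\
\frac{dR}{dt} &= \gamma I,
\end{aligned}
\qquad R_0 = \frac{\beta}{\gamma}.
```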

Once the recovery and transmission parameters have been estimated, the SIR model predicts an epidemic that follows a recognized pattern: the number of cases initially increases exponentially until the proportion of susceptibles in the population has been sufficiently depleted that the growth rate slows; this process continues until the epidemic can no longer be sustained, and the number of cases declines, eventually leading to extinction of the infection. The simple SIR model (Fig. 1) produces three general predictions that have important public-health implications and are supported by a range of more complex models.3,4
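The pattern just described (exponential rise, slowing growth as susceptibles are depleted, then decline) can be reproduced by numerically solving the SIR equations. The sketch below is a minimal illustration, not the authors' code: it assumes time is measured in mean infectious periods (γ = 1), sets β = 1.4 so that R0 = β/γ matches the H1N1 estimate quoted below, and uses a hand-written fixed-step Runge–Kutta integrator so that only the standard library is needed.

```python
def simulate_sir(beta, gamma, i0=1e-4, t_max=200.0, dt=0.1):
    """Solve the SIR equations for proportions S, I, R with a fixed-step
    fourth-order Runge-Kutta scheme. Returns a list of (t, S, I, R) tuples."""

    def deriv(y):
        s, i, _ = y
        infection = beta * s * i   # flow from S to I
        recovery = gamma * i       # flow from I to R
        return (-infection, infection - recovery, recovery)

    def shift(y, k, h):
        # y + h * k, component by component
        return tuple(a + h * b for a, b in zip(y, k))

    y = (1.0 - i0, i0, 0.0)
    trajectory = [(0.0, *y)]
    for n in range(1, int(round(t_max / dt)) + 1):
        k1 = deriv(y)
        k2 = deriv(shift(y, k1, dt / 2))
        k3 = deriv(shift(y, k2, dt / 2))
        k4 = deriv(shift(y, k3, dt))
        y = tuple(a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))
        trajectory.append((n * dt, *y))
    return trajectory


if __name__ == "__main__":
    # Assumed units: time in mean infectious periods (gamma = 1);
    # beta = 1.4 gives R0 = beta / gamma = 1.4.
    traj = simulate_sir(beta=1.4, gamma=1.0)
    t_peak, _, i_peak, _ = max(traj, key=lambda row: row[2])
    print(f"peak prevalence {i_peak:.3f} at t = {t_peak:.1f}")
    print(f"proportion ever infected {traj[-1][3] + traj[-1][2]:.3f}")
```

With these assumed values the proportion ever infected converges to about 0.51 and peak prevalence is roughly 4.5% of the population. In practice a library ODE solver (for example SciPy's `solve_ivp`) would normally replace the hand-written integrator.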

The fundamental parameter that governs epidemic behaviour is the basic reproductive ratio, R0, which is defined as the average number of secondary cases produced by a single infectious individual in a totally susceptible population.3 Values of R0 > 1 mean that an epidemic is possible. (For the current H1N1 pandemic in the UK, R0 is estimated to be ∼1.4; note that R0 depends on both the infection and the population.)

At the end of an epidemic, a proportion of the population remains susceptible2 (that is, has never been infected). This proportion becomes very small for even moderately large values of R0, but for R0 = 1.4 we would expect only ∼51% of the population to be infected. (More complex models change this precise value, but not the general concept.)
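The ∼51% figure follows from the classic final-size relation of the simple SIR model (fully susceptible population, negligible initial infection): the proportion ever infected, z, satisfies z = 1 − exp(−R0 z). A minimal bisection solver that assumes nothing beyond that relation:

```python
import math


def final_size(r0, tol=1e-12):
    """Proportion z ever infected in the simple SIR model: the positive root
    of z = 1 - exp(-r0 * z). Returns 0.0 when r0 <= 1 (no major epidemic)."""
    if r0 <= 1.0:
        return 0.0

    def f(z):
        return z - (1.0 - math.exp(-r0 * z))

    # For r0 > 1, f is negative just above z = 0, positive at z = 1 and convex,
    # so bisection on (0, 1] converges to the unique positive root.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for r0 in (1.4, 2.0, 4.0):
        z = final_size(r0)
        print(f"R0 = {r0}: infected {z:.3f}, still susceptible {1 - z:.3f}")
```

For R0 = 1.4 this returns about 0.511, matching the ∼51% quoted above; for R0 = 4 it returns about 0.98, so only around 2% of the population escapes infection, which illustrates how quickly the susceptible remnant shrinks as R0 grows.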

Vaccination operates by reducing the pool of susceptible individuals, and when this pool is reduced sufficiently, an infectious disease cannot spread within the population. Most importantly, it is not necessary to vaccinate everyone to prevent an epidemic; immunizing someone not only protects that person but also confers some protection on the population in general. The classic result is that to eradicate an endemic infection or prevent a novel pandemic, a proportion 1 − 1/R0 of the population needs to be successfully immunized;3 so for the current pandemic, we would need to immunize ∼29% of the population. (More complex models show that this value can be reduced if vaccination is carefully targeted.3,4)
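The threshold is simple arithmetic, but it helps to make explicit that immunizing a proportion p at random scales the reproductive ratio to R0(1 − p), which falls to exactly 1 at p = 1 − 1/R0. The optional `efficacy` parameter below is our own hypothetical addition, not from the text: if only a fraction of vaccinees are successfully immunized, proportionally more people must be vaccinated, and a result above 1 means random vaccination alone cannot reach the threshold.

```python
def critical_vaccination_fraction(r0, efficacy=1.0):
    """Proportion that must be vaccinated (at random) to prevent an epidemic.

    With a perfect vaccine this is the classic 1 - 1/R0. `efficacy` (the
    assumed proportion of vaccinees successfully immunized) is a hypothetical
    extension; a result above 1 means random vaccination alone cannot suffice.
    """
    if r0 <= 1.0:
        return 0.0          # no epidemic is possible anyway
    return (1.0 - 1.0 / r0) / efficacy


def effective_reproductive_ratio(r0, vaccinated, efficacy=1.0):
    """Reproductive ratio after a proportion `vaccinated` is vaccinated at random."""
    return r0 * (1.0 - efficacy * vaccinated)


if __name__ == "__main__":
    p = critical_vaccination_fraction(1.4)
    print(f"immunize {p:.1%} of the population")            # 28.6%
    print(f"R after vaccination: {effective_reproductive_ratio(1.4, p):.2f}")
```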

We now introduce an alternative approach to modelling the progress of an epidemic, before considering extensions of the SIR model that increase its realism and predictive accuracy. Given the recent increase in computational power, it is now feasible to develop individual-based models for relatively large populations.6 Here, the concept is to describe the status and interactions of each person in the population, rather than trying to estimate the number of people with a particular status. This change from a population-level to an individual-level perspective is extremely powerful and allows a wide range of biologically and socially realistic assumptions to be included.7 The difficulty with such individual-based approaches is threefold. First, we currently have a very limited understanding of the behaviour of individuals and its variability; while recent work using diary-based studies8 or mobile phones9,10 aims to dissect the interactions that could lead to disease transmission, it is still unclear how well these data describe the interactions of individuals with symptoms. Secondly, during an epidemic, the vast majority of data collected are at the population scale (such as estimates of the number of cases in a region), which is ideal for parameterizing population-scale models (such as the SIR model) but more statistically challenging to use in an individual-level approach.6 Finally, owing to the complexity and computational cost of individual-level models, it may be difficult to obtain general insights or to assess the implications of particular underlying assumptions. For these reasons, population-scale models based on the SIR paradigm are most often used for short-term public-health predictions, whereas individual-level models are more commonly used as planning tools.
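To make the contrast concrete, the sketch below is a deliberately tiny individual-based SIR model: every person's status is stored explicitly, and each day every infected person contacts a fixed number of randomly chosen members of the population. The function name and all parameter values are hypothetical, chosen so that the expected number of secondary cases (7 contacts per day × 0.05 transmission probability × a mean of 4 infectious days) is 1.4. Realistic individual-based models6,7 replace these random contacts with households, schools, workplaces and travel.

```python
import random

S, I, R = "S", "I", "R"


def run_ibm(n=2000, contacts_per_day=7, p_transmit=0.05, p_recover=0.25,
            n_seed=5, max_days=365, seed=1):
    """Minimal individual-based SIR model with random daily contacts.

    Every person's status is stored explicitly. Each day, every infected person
    contacts `contacts_per_day` people chosen at random; a contact with a
    susceptible transmits with probability `p_transmit`. Each infected person
    then recovers with probability `p_recover`. All values are hypothetical.
    Returns a list of daily (S, I, R) counts."""
    rng = random.Random(seed)
    status = [S] * n
    for person in rng.sample(range(n), n_seed):
        status[person] = I
    history = []
    for _ in range(max_days):
        infected = [person for person, st in enumerate(status) if st == I]
        if not infected:
            break
        newly_infected = set()
        for person in infected:
            for _ in range(contacts_per_day):
                other = rng.randrange(n)
                if status[other] == S and rng.random() < p_transmit:
                    newly_infected.add(other)
        for person in infected:          # recovery decided individually
            if rng.random() < p_recover:
                status[person] = R
        for person in newly_infected:    # new infections start tomorrow
            status[person] = I
        history.append((status.count(S), status.count(I), status.count(R)))
    return history


if __name__ == "__main__":
    days = run_ibm()
    print(f"{len(days)} days simulated; final (S, I, R) = {days[-1]}")
```

Because contacts and recoveries are drawn at random, runs with different seeds give different epidemics, and with only five initial cases some runs die out early: the stochastic behaviour discussed under Stochasticity below.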

Although the SIR model provides a simple and generic framework for understanding and predicting epidemiological dynamics, a number of modifications are possible which increase the model’s realism but also increase the number of parameters that have to be estimated.4 We consider these in the chronological order in which they were developed, focusing on the new insights that they provide and the reasons why such extra features were included.


Age structure

Largely prompted by the implementation of mass-vaccination control programmes against a range of childhood infections, mathematical models began to structure the population by age.11 This has two main implications that interact: first, older individuals are more likely to have been exposed to infection (simply because they have been around for longer), and secondly, people tend to preferentially mix with others of a similar age—a principle known as assortativity. The vast majority of this age-structured modelling was performed for measles, where the mixing between school children drives the epidemic process, and so the school holidays have a dramatic impact.11,12 This work has strong resonance with the current modelling and statistical analysis of the H1N1 pandemic, due to the age-dependent susceptibility that has been recorded (with young children being much more susceptible than adults) and due to the role that school closures and school holidays may play in limiting epidemic spread.
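A minimal age-structured sketch divides the population into children and adults and replaces the single transmission parameter by a matrix `beta[a][b]`, the rate at which infected individuals in group b infect susceptibles in group a; assortative mixing corresponds to larger diagonal entries. The names `BETA` and `POP` and their values are hypothetical illustrations, not estimates for H1N1, and the equations are integrated with a simple forward-Euler scheme.

```python
def two_group_attack_rates(beta, pop_frac, gamma=1.0, i0=1e-4,
                           t_max=300.0, dt=0.05):
    """SIR model with population groups, integrated with forward Euler.

    beta[a][b]: rate at which infected individuals in group b infect
    susceptibles in group a (a hypothetical mixing matrix).
    pop_frac[a]: proportion of the population in group a.
    S and I are proportions of the whole population.
    Returns the final attack rate (proportion ever infected) in each group."""
    groups = range(len(pop_frac))
    s = [f * (1.0 - i0) for f in pop_frac]
    i = [f * i0 for f in pop_frac]
    for _ in range(int(round(t_max / dt))):
        # Force of infection on group a sums contributions from every group b.
        force = [sum(beta[a][b] * i[b] for b in groups) for a in groups]
        new_inf = [force[a] * s[a] * dt for a in groups]
        s = [s[a] - new_inf[a] for a in groups]
        i = [i[a] + new_inf[a] - gamma * i[a] * dt for a in groups]
    return [1.0 - s[a] / pop_frac[a] for a in groups]


# Hypothetical example: children are 25% of the population and mix mostly
# with each other; contacts involving adults are less intense.
BETA = [[4.0, 1.0],   # infections of children by (children, adults)
        [1.0, 1.0]]   # infections of adults by (children, adults)
POP = [0.25, 0.75]

if __name__ == "__main__":
    children, adults = two_group_attack_rates(BETA, POP)
    print(f"attack rate: children {children:.2f}, adults {adults:.2f}")
```

With these illustrative values children end up with a markedly higher attack rate than adults (roughly 0.5 versus 0.3), even though both groups recover at the same rate. Uncertainty in the off-diagonal (between-group) entries is the kind of uncertainty explored in Fig. 3B.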


Stochasticity

The persistence of infections, particularly childhood infections, within a population prompted the study of stochastic models, in which the number of individuals in any class is always an integer (whole number) and events happen at random, but with underlying probabilities based on the rates of the associated deterministic model. These stochastic models generate a different epidemic on each realization and thus capture the variability in the epidemic profile. Apart from this obvious variability, two major results arise from the stochastic approach. First, focusing on measles in cities in England and the USA, early studies established the existence of a critical population size, below which an infectious disease is unable to persist without reintroduction.13 Secondly, even when R0 > 1, an epidemic is not guaranteed: chance events can lead to the early extinction of any outbreak. Following a single introduction of disease into the simple SIR model, the chance of extinction (without generating a major epidemic) is given by 1/R0, and so decreases rapidly as R0 increases.14 For the H1N1 outbreak, where the probability of failure from a single introduction is ∼71% (1/1.4), stochastic effects played an important role in the initial stages of the epidemic.
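The 1/R0 result can be checked by simulation. Because only the order of events matters for whether an outbreak dies out, the sketch below follows the sequence of infection and recovery events of a stochastic SIR model (the event probabilities of the Gillespie algorithm, without the event times). The population size of 10,000 and the 100-case cut-off used to call an outbreak "major" are hypothetical choices, as are the function names.

```python
import random


def minor_outbreak(r0, n=10_000, major_cases=100, rng=random):
    """Follow the sequence of events in a stochastic SIR model started by one
    infected individual. Only the order of events matters for extinction, so
    each event is an infection with probability
    infection_rate / (infection_rate + recovery_rate), otherwise a recovery.
    Returns True if infection dies out before `major_cases` cumulative cases
    (a hypothetical cut-off for calling an outbreak 'major')."""
    gamma = 1.0          # recovery rate; only the ratio beta / gamma matters here
    beta = r0 * gamma    # transmission rate, since R0 = beta / gamma
    s, i, cases = n - 1, 1, 1
    while i > 0:
        infection_rate = beta * s * i / n
        recovery_rate = gamma * i
        if rng.random() < infection_rate / (infection_rate + recovery_rate):
            s, i, cases = s - 1, i + 1, cases + 1
            if cases >= major_cases:
                return False
        else:
            i -= 1
    return True


def extinction_probability(r0, runs=5000, seed=2009):
    """Monte Carlo estimate of the chance that a single introduction fizzles out."""
    rng = random.Random(seed)
    return sum(minor_outbreak(r0, rng=rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    print(f"simulated: {extinction_probability(1.4):.3f}, 1/R0: {1 / 1.4:.3f}")
```

With R0 = 1.4 the simulated probability of early extinction should come out close to 1/1.4 ≈ 0.71, up to Monte Carlo error.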


Risk structure

With the growth of the HIV epidemic in the late 1980s and early 1990s, considerable attention was focused on understanding the spread of this and other sexually transmitted infections (STIs).15 Clearly, for STIs the dominant risk factor for becoming infected is the number of sexual partners (and unprotected sex acts), and it was therefore seen as vital to structure the population into multiple risk groups.3 Such risk-structured models highlighted how some sections of the community were at far greater risk than others, both because of their behaviour and because of their increased interaction with other high-risk individuals. For H1N1, there are clear parallels between age- and risk-structured models (as age is itself a risk factor); however, other risk groups could be considered: for example, health-care workers could be modelled as a high-risk group because of their potentially greater contact with infected individuals.
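A standard way to summarise a risk-structured model is the next-generation matrix K, where `K[a][b]` is the expected number of new infections in group a caused by one infected individual in group b; R0 is the dominant eigenvalue of K. The matrix below is hypothetical: a small high-risk group whose members infect many others in their own group, and a large low-risk group that on its own could not sustain transmission.

```python
import math


def dominant_eigenvalue_2x2(k):
    """Largest eigenvalue of a non-negative 2x2 matrix [[a, b], [c, d]].

    Uses the closed form (a + d)/2 + sqrt((a - d)**2 / 4 + b*c), which is
    real because b*c >= 0 for a non-negative matrix."""
    (a, b), (c, d) = k
    return (a + d) / 2 + math.sqrt((a - d) ** 2 / 4 + b * c)


# Hypothetical next-generation matrix; index 0 = high-risk, 1 = low-risk.
# Column b lists the infections caused by one infected individual in group b.
K = [[1.8, 0.1],
     [0.4, 0.5]]

if __name__ == "__main__":
    print(f"R0 with both groups:      {dominant_eigenvalue_2x2(K):.2f}")
    print(f"R0 within low-risk alone: {K[1][1]:.2f}")
```

With these assumed numbers R0 for the whole population is about 1.83, even though transmission within the low-risk majority alone (0.5) is below threshold: the high-risk group sustains the epidemic. Halving the infections caused by high-risk individuals (both entries of the first column) brings the dominant eigenvalue down to about 0.94.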


Infectious distributions

The chronic nature of HIV drew attention to within-host dynamics and to the distribution of infectious periods.16 The challenge is to capture accurately the impact of within-host processes at the population level. Changes in viral load, which correspond to varying infectivity, are generally modelled by movement between multiple infection states. The speed with which an epidemic spreads through a population depends on the generation time (the length of time between successive generations of infected individuals), which is determined by the infectiousness profile. Therefore, including variability in infectiousness affects predictions about the speed of epidemic spread, the impact of stochasticity, the value of R0 and the impact of control measures.17,18
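The link between the infectiousness profile and the speed of spread can be made concrete with the "multiple infection states" device mentioned above. In the sketch below (our illustration, with hypothetical parameters and function name), the infectious period is split into k sequential stages of equal infectiousness, which makes it gamma-distributed with a fixed mean 1/γ. Linearizing the model while almost everyone is susceptible gives the early growth rate r as the root of r = β[1 − (kγ/(r + kγ))^k], with β = R0γ; for k = 1 this reduces to the familiar r = β − γ.

```python
def growth_rate(r0, mean_infectious_period, stages, iterations=100):
    """Early exponential growth rate of an SIR model whose infectious period is
    split into `stages` sequential compartments of equal infectiousness (a
    gamma-distributed infectious period with the given mean).

    Solves r = beta * (1 - (k*g / (r + k*g))**k) by bisection, where
    g = 1 / mean_infectious_period, beta = r0 * g and k = stages."""
    if r0 <= 1.0:
        raise ValueError("no exponential growth unless r0 > 1")
    g = 1.0 / mean_infectious_period
    beta = r0 * g
    k = stages

    def f(r):
        return beta * (1.0 - (k * g / (r + k * g)) ** k) - r

    # f is concave with f(0) = 0, f'(0) = r0 - 1 > 0 and f(beta) < 0, so it is
    # positive below the positive root and negative above it.
    lo, hi = 0.0, beta
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for k in (1, 2, 5, 20):
        print(f"{k:>2} stages: r = {growth_rate(1.4, 1.0, k):.3f}")
```

Holding R0 = 1.4 and the mean infectious period fixed, one stage gives r = 0.4 per mean infectious period, whereas 20 stages (a nearly fixed-length infectious period, and hence a shorter mean generation time) give about 0.69. Conversely, the same observed growth rate implies different values of R0 depending on the assumed profile.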


Spatial structure

A clear failing of the basic SIR model is its inability to describe any spatial aspects of the spread of disease. The Foot and Mouth Disease epidemic of 2001 highlighted the importance of spatially explicit modelling, as transmission between farms was a highly localized process.19,20 Such models pointed to the local depletion of susceptibles as a mechanism for slowing epidemic spread compared with a fully mixed population, and to the potential for locally targeted measures to control and contain an outbreak.6 Although many of these concepts do translate to the current H1N1 outbreak, the distances moved by people generally reduce these spatial effects and lead to relatively synchronized epidemics across the whole of the UK.
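A minimal way to see both effects is a chain of identical patches (areas), each following SIR dynamics, with a fraction of each patch's force of infection coming from its neighbours as a crude stand-in for movement between areas. The sketch below is a hypothetical illustration (function name and parameters are ours), not the farm-level models of refs 19 and 20. It seeds infection in one end patch and records when prevalence peaks in each patch.

```python
def patch_peak_times(n_patches=10, coupling=0.01, beta=1.4, gamma=1.0,
                     i0=1e-4, t_max=400.0, dt=0.05):
    """SIR dynamics in a chain of identical, equal-sized patches.

    In each patch a proportion `coupling` of the force of infection comes from
    the average prevalence in its neighbouring patch(es), a hypothetical
    stand-in for movement between areas. Infection starts in patch 0 only.
    Integrated with forward Euler; returns the time of peak prevalence in
    each patch."""
    if n_patches < 2:
        raise ValueError("need at least two patches")
    s = [1.0] * n_patches
    i = [0.0] * n_patches
    s[0], i[0] = 1.0 - i0, i0
    peak_i = list(i)
    peak_t = [0.0] * n_patches
    for n in range(1, int(round(t_max / dt)) + 1):
        new_s, new_i = [], []
        for j in range(n_patches):
            nbrs = [i[m] for m in (j - 1, j + 1) if 0 <= m < n_patches]
            mixed = (1 - coupling) * i[j] + coupling * sum(nbrs) / len(nbrs)
            infection = beta * s[j] * mixed * dt
            new_s.append(s[j] - infection)
            new_i.append(i[j] + infection - gamma * i[j] * dt)
        s, i = new_s, new_i
        for j in range(n_patches):
            if i[j] > peak_i[j]:
                peak_i[j], peak_t[j] = i[j], n * dt
    return peak_t


if __name__ == "__main__":
    for c in (0.01, 0.5):
        times = patch_peak_times(coupling=c)
        print(f"coupling {c}: peaks from t = {times[0]:.1f} to t = {times[-1]:.1f}")
```

With weak coupling (0.01) the epidemic travels along the chain as a wave, each patch peaking noticeably later than its neighbour; with strong coupling (0.5) the peaks bunch together, mirroring the relatively synchronized epidemics expected when people travel widely.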

The basic SIR model and all of the above extra features are part of a general modelling framework that could be applied to a range of directly transmitted pathogens. Making models that are specific to influenza, or to the current H1N1 pandemic in the UK, requires that the models are carefully parameterized to match available data, and that this parameterization reflects both statistical uncertainty and uncertainty in the data themselves. Three main data sources are available in the UK, each of which provides important insights into particular elements of the epidemiological dynamics, and each of which has associated difficulties:

The first few hundred (FF100) database was compiled (as the name suggests) from detailed data gathered on the first few hundred cases (Fig. 2). It holds information on laboratory-confirmed cases of H1N1 together with their contacts who were successfully traced in an effort to control the initial outbreak. Such data provide the only reliable estimates of individual-level quantities such as the basic reproductive ratio (average number of secondary cases produced per identified case) and the delays between infection, subsequent transmission and the onset of symptoms. However, several problems exist with the interpretation of this information, primarily due to the nature of contact tracing itself. It is implicitly assumed that all secondary cases have been successfully traced and that we can correctly identify the infecting individual for each case. In addition, initial cases are usually identified only once symptoms arise, so infection pathways often have to be inferred retrospectively. Finally, these data were collected at a time when the UK was prophylactically administering antivirals to family members and other close contacts; therefore, there is some uncertainty in how these data translate to the current situation.

Fig. 2


(A) Examples of the type of information that can be gained from the FF100 data; black ovals represent individuals testing positive, white ovals represent individuals testing negative and arrows show the direction of tracing. As exemplified in the left-hand panel, knowing the average number of infected contacts per source case allows us to calculate the basic reproductive ratio. The right-hand panel shows how information on the date of onset of symptoms and the date of contact allows us to estimate incubation, latent and infectious periods. (B) Examples of the type of network data available from the FF100 scheme; black rectangles represent initial source infections, while ovals represent traced individuals (black ovals testing positive, white ovals testing negative). Lines show routes of tracing. The number of secondary infected individuals from each primary case provides a measure of the basic reproductive ratio.
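The calculation described in panel (A), averaging the number of infected contacts per source case, is easy to sketch, and a percentile bootstrap over source cases gives a rough measure of its statistical uncertainty. The counts below are invented purely for illustration (they are not FF100 data), the function name is hypothetical, and the estimate inherits the contact-tracing assumptions discussed above: missed secondary cases bias it downwards.

```python
import random


def mean_with_bootstrap_ci(secondary_counts, n_boot=2000, level=0.95, seed=0):
    """Mean number of secondary cases per source case, with a percentile
    bootstrap confidence interval obtained by resampling source cases with
    replacement. Returns (mean, lower, upper)."""
    rng = random.Random(seed)
    n = len(secondary_counts)
    boot_means = sorted(
        sum(rng.choices(secondary_counts, k=n)) / n for _ in range(n_boot)
    )
    lower = boot_means[int(round((1 - level) / 2 * (n_boot - 1)))]
    upper = boot_means[int(round((1 + level) / 2 * (n_boot - 1)))]
    return sum(secondary_counts) / n, lower, upper


# Invented example: secondary cases traced from each of 16 hypothetical
# source cases (NOT FF100 data).
counts = [0, 2, 1, 0, 3, 1, 0, 0, 2, 1, 4, 0, 1, 2, 0, 1]

if __name__ == "__main__":
    r_hat, lo, hi = mean_with_bootstrap_ci(counts)
    print(f"R estimate {r_hat:.2f} (95% bootstrap interval {lo:.2f} to {hi:.2f})")
```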

The Qflu database contains information from patients contacting their GP and being diagnosed with influenza-like illness (ILI). Qflu operates from around 3300 general practices spread throughout the UK, covering a total population of almost 22 million potential patients. Qflu therefore provides an age- and regionally stratified picture of the unfolding epidemic at the population scale, and was the main data source until 23 July, when the National Pandemic Flu Service came into operation in England. It would be naive to assume that the number of ILI cases accurately reflects the number of H1N1 cases in the UK; three main biases disrupt this ideal. First, not everyone who is ill with H1N1 contacts his or her GP. Secondly, not everyone who is diagnosed with an ILI actually has H1N1 infection; only 30–40% of swabs taken at sentinel GP surgeries are positive for H1N1, although the sensitivity of this methodology is unknown. Finally, there may be strong age-related biases in consulting a GP, with younger children more likely to be taken as a precautionary measure.

The third source arose on 23 July, when the National Pandemic Flu Service (NPFS) began operation in England. The majority of those with influenza symptoms were asked to call this service or visit its web site, whereas pregnant women, children under one and people living in Wales and Scotland were asked to continue to contact their GP. Again, difficulties and biases exist with these data. In addition to the points made above with respect to Qflu, there is the extreme difficulty of matching the number of cases reported before and after the start of NPFS, as this was coincident with the start of school holidays in many areas and with a decline in the epidemic.

The main difficulty with these multiple data sources is the inability to estimate accurately the number of true H1N1 cases at any point in time, often with a fourfold difference between maximum and minimum estimates. This uncertainty translates into an inability to predict the proportion of cases that require hospital treatment (although we know the current number in hospital, we do not accurately know the total number of cases in the population) and, by similar reasoning, an inability to predict accurately the proportion of cases that are fatal. Figure 3 illustrates two further difficulties in predicting the course of any outbreak from early case-reporting data. In Figure 3A, the initial proportion of the population who are susceptible to infection is unknown (and is varied from 50% to 100%), leading to a range of possible outcomes, all of which would match the early growth in the number of cases. In Figure 3B, we use a simple age-structured model of children and adults but assume that the relative mixing between these age groups is unknown; again, for parameters that match the early growth and the relative number of cases in children and adults, a wide variety of predicted outcomes is possible.
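The first two Qflu biases described above can be expressed as a simple scaling: true cases ≈ ILI consultations × (proportion of ILI that is H1N1) / (probability that an H1N1 case consults a GP). The sketch below uses the 30–40% swab positivity quoted above, but the consultation probabilities and the function name are purely hypothetical, and it ignores the unknown test sensitivity and the age-related biases. Propagating the ranges shows how quickly uncertainty in the multipliers compounds.

```python
def h1n1_case_range(ili_consultations, positivity=(0.30, 0.40),
                    consult_prob=(0.2, 0.5)):
    """Range of H1N1 cases implied by a count of GP ILI consultations.

    cases ~= ILI * positivity / consult_prob, where `positivity` is the
    proportion of ILI consultations that are H1N1 (30-40% swab positivity, as
    quoted in the text) and `consult_prob` is the probability that an H1N1 case
    consults a GP (a purely hypothetical range). Returns (low, high)."""
    low = ili_consultations * positivity[0] / consult_prob[1]
    high = ili_consultations * positivity[1] / consult_prob[0]
    return low, high


if __name__ == "__main__":
    low, high = h1n1_case_range(1000)
    print(f"{low:.0f} to {high:.0f} cases ({high / low:.1f}-fold range)")
```

With these assumed ranges, 1000 ILI consultations correspond to between 600 and 2000 H1N1 cases, a more than threefold spread arising from just two uncertain multipliers.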

Fig. 3


Examples of the uncertainty in predicting the future course of an outbreak from early epidemic data. (A) The proportion of the population initially susceptible to the infection is unknown; results are from a simple SIR model parameterized to match both the observed growth rate and the observed basic reproductive ratio (R0 = 1.4). (B) We use an age-structured model of children and adults parameterized such that all epidemics have the same early growth, basic reproductive ratio and ratio of cases in adults and children; here, uncertainty enters in the relative transmission rates within and between the two age-classes.
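The mechanism behind panel (A) can be sketched with the final-size relation. If only a proportion s0 of the population is initially susceptible and β is chosen so that the observed reproductive ratio βs0/γ equals 1.4, then the early growth rate βs0 − γ is the same in every scenario, yet the final-size relation S∞ = s0 exp[−(β/γ)(s0 − S∞)] gives very different totals. This is our reconstruction of the idea under those stated assumptions (the function name is hypothetical), not the code behind Fig. 3.

```python
import math


def scenario(s0, observed_r=1.4, gamma=1.0, tol=1e-12):
    """Simple SIR model in which only a proportion s0 is initially susceptible.

    beta is chosen so that the observed reproductive ratio beta*s0/gamma equals
    `observed_r`; the early growth rate beta*s0 - gamma is then the same for
    every s0. The final susceptible proportion S_inf solves
    S_inf = s0 * exp(-(beta/gamma) * (s0 - S_inf)), found here by bisection.
    Returns (early growth rate, proportion of the whole population infected)."""
    if observed_r <= 1.0:
        raise ValueError("observed_r must exceed 1 for an epidemic")
    beta = observed_r * gamma / s0
    growth = beta * s0 - gamma

    def g(s_inf):
        return s_inf - s0 * math.exp(-(beta / gamma) * (s0 - s_inf))

    # g < 0 at 0 and g > 0 just below the trivial root s_inf = s0, so
    # bisection on [0, s0) finds the root corresponding to an epidemic.
    lo, hi = 0.0, s0 * (1 - 1e-9)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return growth, s0 - 0.5 * (lo + hi)


if __name__ == "__main__":
    for s0 in (1.0, 0.75, 0.5):
        growth, attack = scenario(s0)
        print(f"s0 = {s0:.2f}: early growth {growth:.2f}, infected {attack:.3f}")
```

Every scenario grows at the same early rate (0.4 per mean infectious period with γ = 1), yet the proportion of the whole population infected falls from about 51% when everyone is susceptible to about 26% when only half are. Early growth alone therefore cannot pin down the eventual size of the epidemic.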

However, despite these difficulties, mathematical modelling supported by solid statistical analysis does produce many useful predictions about the current pandemic. Most importantly, mathematical models allow us to quantify rigorously our uncertainty in the epidemic to date and to extend this uncertainty into predictions about the future. Therefore, although current data preclude accurate prediction of the epidemic, recognizing and quantifying uncertainty allows us to develop plausible worst-case scenarios to aid public-health planning in the months ahead. In addition, despite the uncertainty, models for the current H1N1 pandemic can be used to explore the impact of control measures; for example, given the range of plausible models that agree with the currently available data, we can ask whether there are any scenarios in which vaccine should not be targeted to high-risk individuals, or in which the predicted autumn/winter epidemic would exceed health-service capacity. It is in this role that models, of various levels of complexity, could have the greatest public-health benefit.


Funding

This research was supported by the Medical Research Council.


Acknowledgements

This research was supported by the Medical Research Council. We thank Thomas House for his very helpful comments on the manuscript, and Sam Mason for his help visualizing the FF100 data for Figure 2.

© The Author 2009. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org


References

1. Hamer W. Epidemic diseases in England—the evidence of variability and of persistency of type. Lancet 1906;1:733-739.

2. Kermack WO, McKendrick AG. Contribution to the mathematical theory of epidemics. Proc R Soc Lond A 1927;115:700-721.

3. Anderson RM, May RM. Infectious Diseases of Humans. Oxford, UK: Oxford University Press; 1991.

4. Keeling MJ, Rohani P. Modeling Infectious Diseases. New Jersey, USA: Princeton University Press; 2008.

5. Begon M, Turner J. A clarification of transmission terms in host-microparasite models: numbers, densities and areas. Epidemiol Infect 2002;129:147-153.

6. Riley S. Large-scale spatial-transmission models of infectious disease. Science 2007;316:1298-1301.

7. Ferguson NM, Cummings DA, Fraser C, Cajka JC, Cooley PC, Burke DS. Strategies for mitigating an influenza pandemic. Nature 2006;442:448-452.

8. Mossong J, Hens N, Jit M, et al. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Med 2008;5:381-391.

9. González M, Hidalgo C, Barabasi A-L. Understanding individual human mobility patterns. Nature 2008;453:779-782.

10. Eagle N, Pentland A. Eigenbehaviors: identifying structure in routine. Behav Ecol Sociobiol 2009;63:1057-1066.

11. Schenzle D. An age-structured model of pre- and post-vaccination measles transmission. IMA J Math Appl Med Biol 1984;1:169-191.

12. Bolker BM. Chaos and complexity in measles models—a comparative numerical study. IMA J Math Appl Med Biol 1993;10:83-95.

13. Bartlett MS. Measles periodicity and community size. J R Stat Soc A 1957;120:48-70.

14. Bartlett MS. Deterministic and stochastic models for recurrent epidemics. Proc Third Berkeley Symp Math Stat Prob 1956;4:81-108.

15. May RM, Anderson RM. Transmission dynamics of HIV-infection. Nature 1987;326:137-142.

16. Nowak M, May RM. Virus Dynamics. Oxford, UK: Oxford University Press; 2005.

17. Lloyd-Smith JO, Galvani AP, Getz WM. Curtailing transmission of severe acute respiratory syndrome within a community and its hospital. Proc R Soc Lond B 2003;270:1979-1989.

18. Wearing HJ, Rohani P, Keeling MJ. Appropriate models for the management of infectious diseases. PLoS Med 2005;2:621-627.

19. Ferguson NM, Donnelly CA, Anderson RM. Transmission intensity and impact of control policies on the foot and mouth epidemic in Great Britain. Nature 2001;413:542-548.

20. Keeling MJ, Woolhouse ME, Shaw DJ, et al. Dynamics of the 2001 UK foot and mouth epidemic: stochastic dispersal in a heterogeneous landscape. Science 2001;294:813-817.


Br Med Bull (2009) 92 (1): 33-42. doi: 10.1093/bmb/ldp038. First published online: October 24, 2009.
