A Spatial Approach to Epidemic Dynamics Using Stochastic Cellular Automata with a Case Study of Novel H1N1 in Illinois
Keywords: Epidemiology Modeling, Stochastic Cellular Automata, Novel H1N1.
When Novel H1N1 first spread across North America in 2009, there was a lot of panic in the media. Most of the coverage focused on the virus’s highly contagious and fatal nature. The one helpful lesson I took from it was the importance of personal hygiene. I was interested in genomics and in investigating the biological mechanics of H1N1, yet I did not have the resources for any of it. So I set out to devise a model within my reach (one laptop and ten fingers), hoping to track, control, and eliminate the virus.
In the summer of 2009, after discussions with some researchers, I started a research project to identify the origins of the H1N1 pandemic from the data published by the WHO. The strategy was to invert the Kermack-McKendrick SIR (Susceptible-Infected-Recovered) model to solve for the approximate time the epidemic had been present in each country. I later presented the paper at a conference. That analysis of the SIR model was strictly along a timeline. I concluded that “it was plausible to use the SIR model to extrapolate backwards and forwards, making it an option to investigate pandemics when data is still limited.”
Afterwards, my curiosity led me to the question “How can we efficiently model the spatial (geographical) spread of H1N1, or any other infectious disease?” At first, I formulated the question as a system of partial differential equations (PDEs). However, solving such a system requires considerable computing power, and it is very hard to incorporate realistic scenarios such as quarantine, vaccination, and boundary conditions. Incidentally, at the same conference I attended a presentation on forest-fire simulations using Cellular Automata (CA). This bottom-up approach was far easier to implement, and it provided statistical insight while remaining computationally efficient (a personal computer versus a supercomputer). It was also simpler to change the boundary conditions or add epidemic interventions such as quarantine or vaccination.
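As a minimal illustration of the bottom-up idea, a stochastic CA for an SIR-type epidemic can be sketched in a few lines of Python. This is not the Mathematica program used in this project. The four-neighbor rule, the closed boundary, the single index case at the center, and the names `step`, `simulate`, `p_infect`, and `p_recover` are all assumptions made for the example.

```python
import random

# Cell states: susceptible, infected, recovered (immune)
S, I, R = 0, 1, 2

def step(grid, p_infect, p_recover, rng):
    """Advance the grid by one time step and return the new grid."""
    n_rows, n_cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # update synchronously from the old grid
    for r in range(n_rows):
        for c in range(n_cols):
            state = grid[r][c]
            if state == S:
                # Count infected cells among the four nearest neighbors.
                # Cells beyond the edge do not exist (closed boundary, an assumption).
                k = 0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols and grid[rr][cc] == I:
                        k += 1
                # Each infected neighbor transmits independently with p_infect.
                if k and rng.random() < 1 - (1 - p_infect) ** k:
                    new[r][c] = I
            elif state == I:
                # An infected cell recovers with probability p_recover per step.
                if rng.random() < p_recover:
                    new[r][c] = R
    return new

def simulate(size, steps, p_infect, p_recover, seed=None):
    """Run one epidemic on a size x size grid; return total cases (I + R)."""
    rng = random.Random(seed)
    grid = [[S] * size for _ in range(size)]
    grid[size // 2][size // 2] = I  # single index case at the center
    for _ in range(steps):
        grid = step(grid, p_infect, p_recover, rng)
    return sum(cell != S for row in grid for cell in row)

if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the call.
    print(simulate(size=21, steps=30, p_infect=0.3, p_recover=0.2, seed=42))
```

In a sketch like this, vaccination could be represented by setting some cells to `R` before the first step, and quarantine by skipping selected neighbors in the inner loop, which is the kind of change that is awkward to express in a PDE formulation.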
Fuentes and Kuperman (1999) developed an epidemic model using deterministic CA. Their research concentrated on converting the SIR Model to a discrete form, but they did not test their model against any real epidemic. In another paper, Schneckenreither, Popper, Zauner, and Breitenecker (2008) did a comparative study of stochastic CA, differential equations, and difference equations in modeling an epidemic. Their work examined how these different models, despite having different methods of spatial interaction, could yield similar results. Their study took a mathematical perspective and did not address CA’s applicability to a real epidemic.
To test the idea of CA, I made a few intuitive extensions to the past work, wrote a program in Mathematica using stochastic CA, and compared the results with the Winnebago County Health Department’s surveillance data. To my surprise, the model’s output closely matched the actual spread. I continued to expand and refine my model to larger scales, such as the state and regional levels. In addition, the following are investigated:
References
Fuentes, M. A., & Kuperman, M. N. (1999). Cellular automata and epidemiological models with spatial dependence. Physica A, 267, 471–486.
Schneckenreither, G., Popper, N., Zauner, G., & Breitenecker, F. (2008). Modelling SIR-type epidemics by ODEs, PDEs, difference equations and cellular automata – A comparative study. Simulation Modelling Practice and Theory, 16(8), 1014–1023.
The following is a simulation run for Winnebago County, IL.
White: susceptible, Red: infected, Green: recovered (immune)
- The simulation was run 1000 times, and the total case numbers (infected plus recovered) were recorded
- A t-test was conducted to see if the simulation reasonably matched the actual total case numbers reported by the Winnebago County Health Department
T-Test Results:
Mean: 159.161 Standard Deviation: 95.9164
T-Score: 0.00530802
P-Value: 0.995881 (I was very surprised by the extremely high p-value on the very first try)
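The comparison described above can be sketched as a one-sample t-test of the simulated totals against a single reported count. The source does not say which t-test variant was used or give the reported count, so the one-sample form, the function name `one_sample_t`, and the example numbers below are assumptions. The p-value here uses a normal approximation to the t distribution, which is reasonable only for large samples such as 1000 runs.

```python
import math
import statistics

def one_sample_t(samples, observed):
    """Compare the mean of `samples` with a single `observed` value.

    Returns (mean, sample standard deviation, t-score, two-sided p-value).
    The p-value uses a normal approximation, an assumption that is only
    reasonable when the number of samples is large.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation (n - 1)
    t = (mean - observed) / (sd / math.sqrt(n))
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail probability
    return mean, sd, t, p

if __name__ == "__main__":
    # Hypothetical totals from five runs and a hypothetical reported count.
    totals = [120, 180, 95, 210, 160]
    print(one_sample_t(totals, observed=150))
```

A high p-value in a test like this means only that the data give no evidence of a difference between the simulated mean and the reported count; it does not by itself show that the model is correct.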
