A toolkit for the infectious disease modeller

The odin, dust, mcstate and orderly suite of R packages

Marc Baguelin

4/24/24

Our Raison d’être (why?)

From 1946 to 2024

  • 1946
    • Scarcity
    • Not powerful by today’s standards
    • Technical skills needed to operate
    • Complete knowledge possible
  • 2024
    • Widely available
    • Incredibly more powerful
    • No technical skills needed
    • No one understands 100% how it works

Expectations for ML in Global/Public Health?

  • Availability of access: broad access to computing resources
  • Computing power/complexity: Models (and the associated data) are expected to be big and complex
  • Technical skills: Low entry point
  • Control: Not much discussed, but relies on expert committees and peer review, and emphasizes reproducible and transparent evidence

What happened during COVID-19?

  • Availability: broad access to computing resources, but with disparities, e.g. for HPC
  • Complexity: More models, and on average much more complex than before
  • Technical skills: People with little experience in epidemiological modelling built models
  • Control: Very variable in space and time, definitely not full transparency, explosion of preprints

How do we get 1-2-3 with 4?

  • We want (1) widely available, (2) complex models with (3) a low technical entry point BUT (4) reproducible and truly transparent
  • Many ML tools have (1*), (2) and (3) but not (4)
  • Our response to the “black box” problem: modularity (and open source!)
  • When pipelines get complex, no single person can control everything, but each module can be trusted
  • Experts can confidently focus on their bit

Core Components

  • odin: A domain-specific modelling language for generating systems of ordinary differential equations and deterministic and stochastic difference equations
  • dust: Facilitates high-performance parallel computation
  • mcstate: Integrates efficient Bayesian inference algorithms
  • orderly: Simplifies and enhances reproducibility of collaborative data analyses

A flavour of odin

  • Simple ‘R-like’ language to write models in R
  • Intuitive mapping with mathematical objects
deriv(S) <- -beta * S * I / N
deriv(I) <- beta * S * I / N - sigma * I
deriv(R) <- sigma * I

initial(S) <- N - I0
initial(I) <- I0
initial(R) <- 0

N <- user(1e6)
I0 <- user(1)
beta <- user(4)
sigma <- user(2)
\[
\begin{aligned}
\frac{dS}{dt} &= -\beta S \frac{I}{N}\\
\frac{dI}{dt} &= \beta S \frac{I}{N} - \sigma I\\
\frac{dR}{dt} &= \sigma I
\end{aligned}
\]
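
To make the mapping concrete, here is a minimal sketch that integrates the same three equations in plain base R with a hand-written fourth-order Runge-Kutta step. This is not how odin solves the system (odin generates the model code for you). The helper names `sir_rhs` and `rk4_step`, the step size `dt` and the time horizon are assumptions made for illustration only. The parameter values are the `user()` defaults from the model above.

```r
# Right-hand side of the SIR system, mirroring the deriv() lines of the model.
# (sir_rhs and rk4_step are illustrative helpers, not part of odin.)
sir_rhs <- function(y, p) {
  infection <- p$beta * y[["S"]] * y[["I"]] / p$N
  recovery <- p$sigma * y[["I"]]
  c(S = -infection, I = infection - recovery, R = recovery)
}

# One classical fourth-order Runge-Kutta step of size dt.
rk4_step <- function(y, dt, p) {
  k1 <- sir_rhs(y, p)
  k2 <- sir_rhs(y + dt / 2 * k1, p)
  k3 <- sir_rhs(y + dt / 2 * k2, p)
  k4 <- sir_rhs(y + dt * k3, p)
  y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
}

# Parameter defaults copied from the user() lines; initial state from initial().
p <- list(N = 1e6, I0 = 1, beta = 4, sigma = 2)
y <- c(S = p$N - p$I0, I = p$I0, R = 0)

# Assumed step size and horizon, chosen so the epidemic has run its course.
dt <- 0.01
times <- seq(0, 20, by = dt)
out <- matrix(NA_real_, nrow = length(times), ncol = 3,
              dimnames = list(NULL, names(y)))
out[1, ] <- y
for (i in seq_along(times)[-1]) {
  y <- rk4_step(y, dt, p)
  out[i, ] <- y
}

r0 <- p$beta / p$sigma                         # basic reproduction number
final_share <- out[nrow(out), "R"] / p$N       # share of N that ends up in R
```

With these defaults beta / sigma = 2, the share of the population ending in R comes out close to 0.8, and S + I + R stays equal to N throughout, as the equations require.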

What’s there now (and beyond)

History of project

  • odin and orderly pre-existed the COVID pandemic
  • Decision to use odin and build the Imperial UK real model “from scratch”
  • Decision based on discussion with our research software engineer team
  • Ambition to build tools with “legacy”
  • Creating mcstate (inference) and dust (efficient parallelisation)
  • The process has created a huge technical debt

Application during COVID-19

  • Huge impact, informing critical government decisions in the UK through SPI-M and SAGE, from April 2020 to May 2022, while ensuring constant public access to our model code
  • Despite no real advertisement of the tools (and lack of documentation), the toolkit has been adopted by several groups worldwide

Publications

An inclusive tool

  • Empowering researchers to efficiently utilize the toolkit for policy scenarios by lowering the entry barrier to (complex) modelling pipelines
  • Accelerating uptake and adoption in academic (incl. teaching) and operational settings
  • Supporting computation from web browser to HPC and GPUs

Web app

We can use the wodin web interface for odin to fit to data

https://epimodels.dide.ic.ac.uk/demo/apps/fit/

Harnessing contemporary approaches

  • Inference tools (mcstate) developed during the pandemic focused on PMCMC and MCMC
  • Developing mcstate2 including support for:
    • Automatic differentiation for odin models
    • Gradient descent, HMC and NUTS
    • Parallel tempering
    • Efficient hierarchical modelling of large models
  • Modular approach where “models” talk to “samplers”
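
The “models talk to samplers” idea can be illustrated with a toy sketch in base R. Nothing below is the mcstate2 API: `make_normal_model`, `random_walk_sampler` and the `density` / `initial` fields are hypothetical names chosen for this example. The point is only that the sampler relies on a small, fixed interface (a log-density function and a starting value), so either side can be swapped without touching the other.

```r
# A "model" is a list exposing a log-density and a starting point.
# (Hypothetical interface for illustration; not the mcstate2 API.)
make_normal_model <- function(mean, sd) {
  list(
    density = function(x) dnorm(x, mean, sd, log = TRUE),
    initial = mean
  )
}

# A "sampler" only calls model$density() and model$initial;
# it knows nothing else about the model.
random_walk_sampler <- function(model, n_steps, step_sd) {
  x <- model$initial
  lp <- model$density(x)
  draws <- numeric(n_steps)
  for (i in seq_len(n_steps)) {
    proposal <- x + rnorm(1, 0, step_sd)
    lp_proposal <- model$density(proposal)
    # Metropolis accept/reject step for a symmetric proposal
    if (log(runif(1)) < lp_proposal - lp) {
      x <- proposal
      lp <- lp_proposal
    }
    draws[i] <- x
  }
  draws
}

set.seed(1)
model <- make_normal_model(mean = 3, sd = 1)
draws <- random_walk_sampler(model, n_steps = 5000, step_sd = 1)
```

Replacing `random_walk_sampler` with a gradient-based sampler, or `make_normal_model` with a model whose density comes from a compiled epidemic model, would leave the other half unchanged. That is the kind of separation the modular approach is after.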

A flavour of mcstate2

Creating a community

  • Organizing a workshop in the autumn to gather user feedback for interface and API improvement and resource development
  • User input is crucial in enhancing toolkit usability and effectiveness
  • Need to build a community
  • Contact me if interested: m.baguelin@imperial.ac.uk

Summary

  • COVID-19 pandemic highlighted the critical need for rapid, reproducible pipelines for the modelling of epidemics
  • odin, dust, mcstate, and orderly were developed to enhance teaching, understanding, production, and reproducibility of population-dynamic models
  • Some modelling groups worldwide have already embraced these packages
  • We are working on paying down the technical debt accumulated during the pandemic

Our ambition

  • Improving efficiency and reproducibility of infectious disease modelling
  • Positioning the toolkit as a key global resource for population dynamic modelling
  • Transforming how infectious diseases and epidemic threats are modelled, leading to better public health outcomes worldwide

The End

Thank you for your attention. I’m happy to take any questions you may have.