Impact of Data Structure, Availability, and Noise Distribution on Parameter Identifiability

In 2020 I was accepted into one of the American Mathematical Society’s (AMS) Mathematics Research Community (MRC) working groups. The focus of our group is Dynamics of Infectious Diseases: Ecological Models Across Multiple Scales. The workshop was postponed in 2020 due to the pandemic, but we still met in small groups throughout the year to begin formalizing research questions before meeting for a week in the summer of 2021. I am in a subgroup focused on studying infectious disease model identifiability.

Identifiability analysis is a fundamental step in the parameter estimation process. It aims to establish whether parameter values can be uniquely estimated from a given model and data set.

Figure 1: Generalized SEIR model used in analysis.

Identifiability can be approached from a structural or a practical viewpoint. Structural identifiability aims to establish whether the model parameters can be uniquely determined based on the model structure using observations of the input-output behavior of the model. Practical identifiability focuses on characterizing the uncertainty in parameter estimates given deficiencies in the data used to calibrate the model. Since practical identifiability includes limitations caused by the quality or availability of experimental data, structural identifiability is a necessary but not sufficient condition for practical identifiability. In this study, we explore both structural and practical identifiability of a generalized Susceptible-Exposed-Infected-Recovered (SEIR) model (see Figure 1).
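For readers unfamiliar with the compartments, the textbook SEIR system is written out here as a reference point. This is only the standard form; the generalized model in Figure 1 may include additional compartments or parameters that are not shown. The symbols (transmission rate \beta, incubation rate \sigma, recovery rate \gamma, total population N) are assumed notation, not taken from the figure.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E, \\
\frac{dI}{dt} &= \sigma E - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \qquad N = S + E + I + R .
\end{aligned}
```

In this notation, identifiability asks whether \beta, \sigma, and \gamma can be uniquely recovered when only some function of the states (for example I(t) or cumulative incidence) is observed.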

We are examining the structural identifiability of the SEIR model using different combinations of input data, including various types of observable data, data frequencies, and noise distributions. We can rely on existing theory and software packages such as DAISY to explore conditions related to our model. Results from the structural identifiability analysis will provide new insight into how model identifiability depends on the relationship between the model structure and the data being used.

We analyze practical identifiability by generating data using known parameters and trying to recover the correct values in a least-squares estimation process. We consider data that represent outbreaks of various magnitudes and lengths. We measure the average relative error (ARE) of our estimates using Monte Carlo simulations, and compare the results with a correlation matrix approach. Our results will illustrate the conditions under which practical identifiability depends on the observable data, data frequency, and the region of the parameter space being explored.
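The Monte Carlo procedure described above can be sketched in Python. This is a minimal illustration under several assumptions that are not from our study: a standard (not generalized) SEIR model, observation of the infected compartment only, multiplicative Gaussian noise, and ARE defined as the mean absolute relative error in percent. The function names (`seir_rhs`, `simulate_infected`, `average_relative_error`) and all parameter values are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares


def seir_rhs(t, y, beta, sigma, gamma):
    """Right-hand side of a standard SEIR model (assumed form)."""
    S, E, I, R = y
    N = S + E + I + R
    new_infections = beta * S * I / N
    return [-new_infections,
            new_infections - sigma * E,
            sigma * E - gamma * I,
            gamma * I]


def simulate_infected(params, t_obs, y0):
    """Return I(t) at the observation times for the given parameters."""
    beta, sigma, gamma = params
    sol = solve_ivp(seir_rhs, (t_obs[0], t_obs[-1]), y0, t_eval=t_obs,
                    args=(beta, sigma, gamma), rtol=1e-8, atol=1e-8)
    return sol.y[2]


def average_relative_error(true_params, t_obs, y0, noise_level, n_trials, seed=0):
    """Monte Carlo estimate of the ARE (in percent) for each parameter.

    Each trial perturbs the noise-free data with multiplicative Gaussian
    noise (an assumed noise model) and refits by least squares.
    """
    rng = np.random.default_rng(seed)
    true_params = np.asarray(true_params, dtype=float)
    clean = simulate_infected(true_params, t_obs, y0)
    rel_err = np.zeros((n_trials, true_params.size))
    for k in range(n_trials):
        noisy = clean * (1.0 + noise_level * rng.standard_normal(clean.size))
        fit = least_squares(
            lambda p: simulate_infected(p, t_obs, y0) - noisy,
            x0=true_params,            # start the search at the true values
            bounds=(1e-6, np.inf),     # keep all rates positive
        )
        rel_err[k] = np.abs(fit.x - true_params) / true_params
    return 100.0 * rel_err.mean(axis=0)


# Illustrative usage with made-up values: 60 days of data every 2 days.
t_obs = np.linspace(0.0, 60.0, 31)
y0 = [990.0, 0.0, 10.0, 0.0]
are = average_relative_error([0.5, 0.2, 0.1], t_obs, y0,
                             noise_level=0.05, n_trials=10)
print(dict(zip(["beta", "sigma", "gamma"], are)))
```

Repeating the call with a sparser `t_obs`, a shorter time window, or a larger `noise_level` gives a rough sense of how the ARE of each parameter responds to data frequency and quality. A parameter whose ARE grows much faster than the noise level is a candidate for being practically unidentifiable under that data scenario.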