4.2 Generating candidate models: general principle

The approach to generating many candidate models used in regional_analysis.Rmd is to generate character strings matching trending model definitions (see previous section) and then use the eval(parse(text = my_text)) trick to turn these into actual models. The simplest approach is to:

  1. generate text matching the right-hand side of the formula of the different models (we later refer to this as model content)

  2. build character strings matching model calls using the terms in 1); sprintf is particularly handy for this part

  3. transform these character strings into actual model calls within an lapply call

Here is a toy example illustrating the approach:
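The following is a minimal sketch of the three steps rather than the code used in regional_analysis.Rmd. It assumes base R's lm and the built-in mtcars dataset in place of trending model constructors, so that it runs without extra packages; the predictor names (wt, hp) are purely illustrative.

```r
# Step 1: model content, i.e. right-hand sides of the formulas
# (wt and hp are columns of the built-in mtcars dataset, used for illustration)
mod_content <- c("1", "wt", "wt + hp")

# Step 2: build character strings matching model calls
# (lm stands in for a trending model constructor; this is an assumption)
mod_txt <- sprintf("lm(mpg ~ %s, data = mtcars)", mod_content)

# Step 3: turn each character string into an actual model
models <- lapply(mod_txt, function(txt) eval(parse(text = txt)))
names(models) <- mod_content
```

The same pattern should carry over to trending models by swapping the call text passed to sprintf for the relevant model definition.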

As the main thing that changes across models is the model content, the main task boils down to generating combinations of predictors to capture different trends in the data. To this end, we will use expand.grid, which creates all possible combinations of a given set of variables. For instance, to generate all models which:

  • include a date effect
  • may include a test effect
  • may include a weekday effect
  • may include the previous day’s incidence as a predictor (cases_lag_1), i.e. an autoregressive model

We can use:
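One possible sketch of this step is shown below. The variable names date, test and weekday are assumed for illustration (only cases_lag_1 is named in the text), and optional terms are encoded as either an empty string or the term itself.

```r
# Each optional term is either absent ("") or present; date is always included.
# Column names and predictor names other than cases_lag_1 are assumptions.
mod_content_grid <- expand.grid(
  date = "date",
  test = c("", "test"),
  weekday = c("", "weekday"),
  lag = c("", "cases_lag_1"),
  stringsAsFactors = FALSE
)

# Paste the terms of each row together to form the model content
mod_content <- apply(mod_content_grid, 1, paste, collapse = " + ")
mod_content
```

With one mandatory term and three optional ones, this gives 2 x 2 x 2 = 8 combinations.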

We see that mod_content contains the relevant model content, but with stray + signs left behind wherever an optional term was omitted, which need removing. This can be done using simple regular expressions:
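As a hedged illustration, the sketch below cleans a small hand-written vector of messy model content (the example strings are assumptions, not output copied from the original analysis). It collapses runs of + signs into a single separator, then drops any trailing +.

```r
# Illustrative messy model content with stray '+' signs
mod_content <- c(
  "date +  +  + ",
  "date + test +  + ",
  "date +  + weekday + cases_lag_1"
)

# Collapse any run of '+' signs (with surrounding spaces) into a single ' + '
mod_content_clean <- gsub("( *[+] *)+", " + ", mod_content)

# Remove a trailing '+' left at the end of the string
mod_content_clean <- sub(" *[+] *$", "", mod_content_clean)
mod_content_clean
```

No leading + needs handling here because the date term always comes first.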

We now have clean model content which can be turned into trending models using the approach illustrated before. In the following sections, we highlight tricks for capturing specific trends in the data, but all ultimately rely on the principle illustrated here.