covariates • space4time

library(space4time)

The space-for-time mark-recapture model is parameterized using formulas for $\theta$ and $p$. Some variables are made available automatically (see the table below), but other covariates require additional setup. There are two types of covariates: covariates for indices and individual grouping covariates. Covariates for indices take the place of the variables in the table below in formulas and are added using the add_covariates() function. Grouping covariates are included in the auxiliary data frame and specified when fitting models.

Variable	Description
j	Release site
k	Recapture site
s	Release time
t	Recapture time
r	Initial release group
g	Group (individual covariates)
a1	Age during time s (“release”)
a2	Age during time t (“recapture”)

Specifying covariates for indices

Note that all numeric covariates should be centered and z-scaled (i.e. using the scale() function).
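As a minimal sketch of this step, the snippet below centers and z-scales the temperature values used later in this vignette. One detail worth knowing is that scale() returns a one-column matrix with attributes, so wrapping it in as.numeric() is one way to get a plain numeric vector; whether that is needed depends on how the column is used downstream.

```r
# Raw mean annual temperatures (the same values used in the example below).
temp_raw <- c(12.1, 13.4, 11.3, 15.3)

# scale() subtracts the mean and divides by the standard deviation.
# as.numeric() drops the matrix shape and attributes that scale() adds.
temp_z <- as.numeric(scale(temp_raw))

mean(temp_z)  # approximately 0 after centering
sd(temp_z)    # 1 after scaling
```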

Using add_covariates()

The helper function add_covariates() adds covariates to space-for-time mark-recapture models, one covariate at a time. The function extract_covariates() can be used to see how these covariates were added.

If the covariate is missing values for some levels of the indices (a1, a2, j, k, s, t, r, g), two indicator variables are added. The first is named "ON_" followed by the covariate name; it is 1 where the covariate has a value for that level and 0 otherwise. The second is named "OFF_" followed by the covariate name; it is 0 where the covariate has a value for that level and 1 otherwise. The covariate itself is set to 0 wherever it is missing. These two indicator variables make it possible to include temporal covariates for particular sites without including them for the other sites. For example, transition probabilities from one site can vary with a covariate while all other sites use the fully saturated model for transition probabilities.
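The rule described above can be imitated in a few lines of base R. This is only an illustration of the resulting columns, not the internals of add_covariates(); the index grid and covariate values here are hypothetical.

```r
# Hypothetical index grid: every (s, j) combination the model needs.
index_grid <- expand.grid(s = 1:2, j = 1:2)

# Hypothetical covariate observed only at site j = 1 (already z-scaled).
cov_df <- data.frame(s = 1:2, j = c(1L, 1L), temp = c(-0.7, 0.7))

# A left join keeps every index row; unmatched rows get NA for temp.
joined <- merge(index_grid, cov_df, by = c("s", "j"), all.x = TRUE)

# ON_ is 1 where the covariate was observed, OFF_ is its complement.
joined$ON_temp  <- as.integer(!is.na(joined$temp))
joined$OFF_temp <- as.integer(is.na(joined$temp))

# Missing covariate values are set to 0.
joined$temp[is.na(joined$temp)] <- 0
joined
```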

We take advantage of a trick with formulas: interacting a term with one of these indicator variables turns that term on or off for different levels.
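The effect of this trick can be seen with base R's model.matrix(). The data frame below is hypothetical (made-up values, not package output): interacting temp with ON_temp yields a column that is zero wherever the covariate is switched off.

```r
# Hypothetical rows: the first two have a temperature value, the last two do not.
d <- data.frame(
  ON_temp  = c(1, 1, 0, 0),
  OFF_temp = c(0, 0, 1, 1),
  temp     = c(-0.5, 0.2, 0, 0)
)

# ON_temp:temp multiplies the two columns, so the temperature slope
# only contributes for rows where ON_temp is 1.
X <- model.matrix(~ ON_temp:temp, data = d)
X[, "ON_temp:temp"]
```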

Additional note

When specifying formulas for theta and p, all aliased parameters are removed from the model matrix prior to fitting the model. This is a technical detail that ensures that fitted models contain only identifiable parameters. One consequence is that a term written in a formula may be dropped, in which case it won’t appear in the output when the fitted model is printed.
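To make aliasing concrete, the sketch below builds a small model matrix in which one column is a linear combination of the others. It uses a hypothetical data frame and base R only; comparing the QR rank with the number of columns is one generic way to spot aliasing and is not necessarily how the package detects it.

```r
# Hypothetical indicators: ON_temp + OFF_temp equals 1 in every row,
# which duplicates the intercept column.
d <- data.frame(ON_temp = c(1, 1, 0, 0), OFF_temp = c(0, 0, 1, 1))

X <- model.matrix(~ ON_temp + OFF_temp, data = d)

n_cols <- ncol(X)     # columns requested by the formula
n_rank <- qr(X)$rank  # linearly independent columns

# A rank lower than the column count means at least one parameter is aliased.
n_rank < n_cols
```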

Examples

Adding temporal covariate

Example with site configuration with one initial release site

In this example, we want to incorporate an annual covariate for temperature on the transition rates. In this case, we have mean annual temperatures for the first release site.

set.seed(1)
sim.dat <- sim_simple_s4t_ch()
s4t_ch <- sim.dat$s4t_ch

print(s4t_ch)
#> Capture history object
#> 
#> There are N = 3 with N = 1 sites with holdovers
#> 
#> Sites: 1, 2, 3
#> 
#> Sites with holdovers: 1
#> 
#> Site -> site:
#> 1 -> 2
#> 2 -> 3
#> 3 -> 
#> 
#> Age range per site:
#> 1: 1-3
#> 2: 1-3
#> 3: 1-3

temperature_data <- data.frame(s =    c(1, 2, 3, 4),
                               j =    c(1, 1, 1, 1),
                               temp = c(12.1,13.4,11.3,15.3)
                               )

# center and scale data
temperature_data$temp <- scale(temperature_data$temp)

Next, we can add the data to the s4t_ch object.

s4t_ch2 <- add_covariates(cov_df = temperature_data,
                          s4t_ch = s4t_ch)
#> Joining with `by = join_by(s, j)`
#> Following levels are missing from index j when joining covariate(s) temp:
#> Level(s):2
#> Joining with `by = join_by(s, j)`
#> Some levels from covariate(s) temp were missing. Added columns to cov_theta to
#> split observations between observed and unobserved values: OFF_temp; ON_temp
#> Missing levels:
#>   s j
#> 1 1 2
#> 2 2 2
#> 3 3 2
#> 4 4 2

extract_covariates(s4t_ch2)
#> Indices (and covariates) for state transitions:
#>    a1 a2 s t j k r g       temp OFF_temp ON_temp
#> 1   1  1 1 1 1 2 1 1 -0.5297258        0       1
#> 2   1  2 1 2 1 2 1 1 -0.5297258        0       1
#> 3   1  3 1 3 1 2 1 1 -0.5297258        0       1
#> 4   2  2 1 1 1 2 1 1 -0.5297258        0       1
#> 5   2  3 1 2 1 2 1 1 -0.5297258        0       1
#> 6   3  3 1 1 1 2 1 1 -0.5297258        0       1
#> 7   1  1 2 2 1 2 1 1  0.2147537        0       1
#> 8   1  2 2 3 1 2 1 1  0.2147537        0       1
#> 9   1  3 2 4 1 2 1 1  0.2147537        0       1
#> 10  2  2 2 2 1 2 1 1  0.2147537        0       1
#> 11  2  3 2 3 1 2 1 1  0.2147537        0       1
#> 12  3  3 2 2 1 2 1 1  0.2147537        0       1
#> 13  1  1 1 1 2 3 1 1  0.0000000        1       0
#> 14  2  2 1 1 2 3 1 1  0.0000000        1       0
#> 15  3  3 1 1 2 3 1 1  0.0000000        1       0
#> 16  1  1 2 2 2 3 1 1  0.0000000        1       0
#> 17  2  2 2 2 2 3 1 1  0.0000000        1       0
#> 18  3  3 2 2 2 3 1 1  0.0000000        1       0
#> 19  2  2 3 3 2 3 1 1  0.0000000        1       0
#> 20  3  3 3 3 2 3 1 1  0.0000000        1       0
#> 21  3  3 4 4 2 3 1 1  0.0000000        1       0
#> 
#> Indices (and covariates) for detection probability:
#>    a1 a2 s t j k r g       temp
#> 1   1  1 1 1 1 2 1 1 -0.5297258
#> 2   1  2 1 2 1 2 1 1 -0.5297258
#> 3   1  3 1 3 1 2 1 1 -0.5297258
#> 4   2  2 1 1 1 2 1 1 -0.5297258
#> 5   2  3 1 2 1 2 1 1 -0.5297258
#> 6   3  3 1 1 1 2 1 1 -0.5297258
#> 7   1  1 2 2 1 2 1 1  0.2147537
#> 8   1  2 2 3 1 2 1 1  0.2147537
#> 9   1  3 2 4 1 2 1 1  0.2147537
#> 10  2  2 2 2 1 2 1 1  0.2147537
#> 11  2  3 2 3 1 2 1 1  0.2147537
#> 12  3  3 2 2 1 2 1 1  0.2147537

The fully saturated formula for theta is $\theta \sim a1 * a2 * j * s$ (see the vignette on formulas). Two examples of how I recommend including covariates are shown below. The formulas differ slightly and result in almost identical models.

Both of these models allow temperature to explain the variation in transition rates for the first site, but not for the second or third site. This is accomplished by using : to multiply ON_temp and OFF_temp by the other variables (if we used * instead, the formula would include an intercept for ON_temp and OFF_temp, which wouldn’t make sense).
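The difference between : and * can be checked without fitting anything by asking base R which terms a formula expands to. The snippet below is a small sketch using terms(); the variable names match the formulas in this vignette, but no data are needed. Note that : binds more tightly than * in R, so ON_temp:a1*a2 is read as (ON_temp:a1)*a2, and inspecting the expanded terms is one way to confirm what a formula requests.

```r
# Helper (hypothetical name) that lists the terms a formula expands to.
expand_terms <- function(f) attr(terms(f), "term.labels")

# ":" alone gives a single interaction term.
labels_colon <- expand_terms(~ ON_temp:a1:a2)

# "*" adds lower-order terms as well as the interaction.
labels_star <- expand_terms(~ ON_temp:a1*a2)

# Parentheses apply the indicator to every term inside them.
labels_paren <- expand_terms(~ ON_temp:(a1*a2))

labels_colon
labels_star
labels_paren
```

Comparing the three results shows which terms carry the ON_temp indicator and which do not.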

The first line of the theta_formula (OFF_temp:a1*a2*j*s) applies only to the transition rates of the second and third sites, because OFF_temp is 1 for those sites. This allows the saturated model to be fit for the sites where we are not modeling these covariates.

The second line of the formula (ON_temp:temp:a1*a2 or ON_temp:temp*a1*a2) for theta specifies how the effects of temperature are modeled for the first site. We allow temperature to affect transition rates and allow this effect to vary depending on the age of fish at that site and how long they take to transition to the next site.

The third line (ON_temp:a1:a2 or ON_temp:a1*a2) specifies, for the first site, how transition rates vary by age and by how long fish take to transition to the next site. I think of this third line as an intercept for how transition rates typically vary across age, while the second line describes how this varies by year as a function of temperature.

My preference is for the parameterization shown in model m1 over m2. However, the estimates of cohort transitions and apparent survival should always be compared to those from the fully saturated model (m0 below) to ensure that the covariate model fits well. Then, LOO-PSIS can be used to compare models.

m1 <- fit_s4t_cjs_rstan(p_formula = ~ t * a2,
                        theta_formula = ~ OFF_temp:a1*a2*j*s +
                                ON_temp:temp:a1*a2  + 
                                ON_temp:a1:a2,
                        ageclass_formula =  ~ FL,
                        s4t_ch = s4t_ch2)

m2 <- fit_s4t_cjs_rstan(p_formula = ~ t * a2,
                        theta_formula = ~ OFF_temp:a1*a2*j*s +
                                ON_temp:temp*a1*a2  + 
                                ON_temp:a1*a2,
                        ageclass_formula =  ~ FL,
                        s4t_ch = s4t_ch2)


m0 <- fit_s4t_cjs_rstan(p_formula = ~ t * a2,
                        theta_formula = ~ a1*a2*j*s,
                        ageclass_formula =  ~ FL,
                        s4t_ch = s4t_ch2)

Example with site configuration with two or more initial release sites

The formulas from above change if there is more than one initial release site. In this example, there are two (or more) initial release sites, for instance three rotary screw traps where we have temperature data that vary by site and year.

Again, the first line in the formula (OFF_temp:a1*a2*j*s*r) indicates fully saturated transition rates for the sites without temperature data, which are all the recapture sites.

The second line (ON_temp:temp:j or ON_temp:temp*j) allows the effect of temperature to depend on the site (initial release site).

The third line (ON_temp:temp:a1*a2 or ON_temp:temp*a1*a2) allows the effect of temperature to vary across age (age at the rotary screw traps and then how long fish take to transition to the next site).

The fourth line (ON_temp:j:a1:a2 or ON_temp:j*a1*a2) allows transition rates at each site to vary across age (age at the initial release site and then how long fish take to transition to the next site). Note that temperature is not included in this fourth line.

m3 <- fit_s4t_cjs_rstan(p_formula = ~ a2 * r * t,
                        theta_formula = ~ OFF_temp:a1*a2*j*s*r +
                                ON_temp:temp:j +
                                ON_temp:temp:a1*a2 +
                                ON_temp:j:a1:a2,
                        ageclass_formula = ~ FL,
                        s4t_ch = s4t_ch2)

m4 <- fit_s4t_cjs_rstan(p_formula = ~ a2 * r * t,
                        theta_formula = ~ OFF_temp:a1*a2*j*s*r +
                                ON_temp:temp*j +
                                ON_temp:temp*a1*a2 +
                                ON_temp:j*a1*a2,
                        ageclass_formula = ~ FL,
                        s4t_ch = s4t_ch2)

Specifying individual grouping covariates

In progress…