Simulate longitudinal categorical data from an antedependence model of order p, AD(p), with specified marginal and transition probabilities.

Usage

simulate_cat(
  n_subjects,
  n_time,
  order = 1,
  n_categories = 2,
  marginal = NULL,
  transition = NULL,
  blocks = NULL,
  homogeneous = TRUE,
  seed = NULL
)

Arguments

n_subjects

Number of subjects to simulate.

n_time

Number of time points.

order

Antedependence order p. Must be 0, 1, or 2. Default is 1.

n_categories

Number of categories c. Default is 2 (binary).

marginal

List of marginal/joint probabilities for initial time points. If NULL, uniform probabilities are used. See Details for structure.

transition

List of transition probability arrays for time points k = p+1 to n, where n is n_time. If NULL, uniform transitions are used. See Details for structure.

blocks

Optional integer vector of length n_subjects specifying group membership. Used with homogeneous = FALSE.

homogeneous

Logical. If TRUE (default), the same parameters are used for all subjects. If FALSE, marginal and transition should be lists indexed by block.

seed

Optional random seed for reproducibility.

Value

Integer matrix with n_subjects rows and n_time columns, where each entry is a category code from 1 to c.

Details

Data are simulated sequentially:

  1. For k = 1: Draw Y(1) from marginal distribution

  2. For k = 2 to p (applies only when p = 2): Draw Y(k) conditional on Y(1), ..., Y(k-1)

  3. For k = p+1 to n: Draw Y(k) conditional on Y(k-p), ..., Y(k-1)
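
The sequential scheme above can be sketched in base R for the order 1 case. This is an illustration only, not the package's implementation: `sim_ad1()` is a hypothetical helper, and the list names `t2`, `t3`, ... follow the transition structure described in this section.

```r
# Illustrative sketch of the sequential scheme for order p = 1.
# sim_ad1() is a hypothetical helper, not a function from the package.
sim_ad1 <- function(n_subjects, n_time, marginal_t1, transition) {
  n_cat <- length(marginal_t1)
  y <- matrix(NA_integer_, nrow = n_subjects, ncol = n_time)
  # Step 1: draw Y(1) from the marginal distribution.
  y[, 1] <- sample.int(n_cat, n_subjects, replace = TRUE, prob = marginal_t1)
  # Step 3: for k = 2, ..., n, draw Y(k) conditional on Y(k-1).
  # (Step 2 is empty when p = 1.)
  for (k in seq_len(n_time)[-1]) {
    P <- transition[[paste0("t", k)]]  # rows: previous value, columns: current value
    for (i in seq_len(n_subjects)) {
      y[i, k] <- sample.int(n_cat, 1, prob = P[y[i, k - 1], ])
    }
  }
  y
}

set.seed(1)
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)
transition <- list(t2 = P, t3 = P, t4 = P)
y <- sim_ad1(n_subjects = 10, n_time = 4,
             marginal_t1 = c(0.5, 0.5), transition = transition)
dim(y)
```

Each row of `y` is one subject's sequence of category codes, matching the shape described under Value.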

Parameter structure for marginal:

  • Order 0: List with elements t1, t2, ..., tn, each a vector of length c summing to 1

  • Order 1: List with element t1 (vector of length c)

  • Order 2: List with t1 (vector), t2_given_1to1 (c x c matrix where rows represent conditioning values and columns represent outcomes)
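
As a minimal sketch, a `marginal` list for order 2 with c = 2 categories could be built as follows. The probability values are arbitrary illustrations; only the element names and shapes come from the structure listed above.

```r
# Illustrative marginal specification for order 2 with c = 2 categories.
# The probability values are made up for the example.
n_cat <- 2
marginal <- list(
  # Distribution of Y(1): a vector of length c summing to 1.
  t1 = c(0.6, 0.4),
  # Distribution of Y(2) given Y(1): row = value of Y(1), column = value of Y(2).
  t2_given_1to1 = matrix(c(0.7, 0.3,
                           0.1, 0.9), nrow = n_cat, byrow = TRUE)
)

# Sanity checks: t1 sums to 1, and each conditional distribution (row) sums to 1.
t1_ok <- abs(sum(marginal$t1) - 1) < 1e-12
rows_ok <- all(abs(rowSums(marginal$t2_given_1to1) - 1) < 1e-12)
```

Using `byrow = TRUE` keeps each conditional distribution on one line of source, which makes the row-sum constraint easy to check by eye.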

Parameter structure for transition:

  • Order 0: Not used (NULL)

  • Order 1: List with elements t2, t3, ..., tn, each a c x c matrix where rows are previous values and columns are current values (rows sum to 1)

  • Order 2: List with elements t3, t4, ..., tn, each a c x c x c array where the first two indices are conditioning values and the third is the outcome
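
For order 2, one way to fill a c x c x c array without tripping over R's column-major ordering is to assign one conditional distribution at a time. The sketch below assumes c = 2 and n_time = 5, with arbitrary probability values. This section does not state which of the first two indices corresponds to Y(k-2) and which to Y(k-1), so confirm the ordering before relying on it.

```r
# Illustrative order 2 transition specification with c = 2 and n_time = 5.
# The probability values are made up for the example.
n_cat <- 2
A <- array(NA_real_, dim = c(n_cat, n_cat, n_cat))
# A[i, j, ] is the outcome distribution given conditioning values (i, j).
A[1, 1, ] <- c(0.9, 0.1)
A[1, 2, ] <- c(0.5, 0.5)
A[2, 1, ] <- c(0.4, 0.6)
A[2, 2, ] <- c(0.2, 0.8)

# One array per time point k = p+1, ..., n, named t3, t4, t5.
transition <- list(t3 = A, t4 = A, t5 = A)

# Sanity check: summing over the outcome index gives 1 for every (i, j) pair.
sums <- apply(A, c(1, 2), sum)
```

Reusing the same array at every time point gives time-constant transitions; supplying a different array for each of `t3`, `t4`, ... lets the transition probabilities vary over time.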

References

Xie, Y. and Zimmerman, D. L. (2013). Antedependence models for nonstationary categorical longitudinal data with ignorable missingness: likelihood-based inference. Statistics in Medicine, 32, 3274-3289.

Examples

y <- simulate_cat(n_subjects = 30, n_time = 5, order = 1, n_categories = 3, seed = 1)
dim(y)
#> [1] 30  5