Simulate categorical antedependence series — simulate

Simulate longitudinal categorical data from an antedependence model of order p, AD(p), with specified transition probabilities.

Usage

simulate_cat(
  n_subjects,
  n_time,
  order = 1,
  n_categories = 2,
  marginal = NULL,
  transition = NULL,
  blocks = NULL,
  homogeneous = TRUE,
  seed = NULL
)

Arguments

n_subjects: Number of subjects to simulate.
n_time: Number of time points.
order: Antedependence order p. Must be 0, 1, or 2. Default is 1.
n_categories: Number of categories c. Default is 2 (binary).
marginal: List of marginal/joint probabilities for initial time points. If NULL, uniform probabilities are used. See Details for structure.
transition: List of transition probability arrays for time points k = p+1 to n, where p is order and n is n_time. If NULL, uniform transitions are used. See Details for structure.
blocks: Optional integer vector of length n_subjects specifying group membership. Used with homogeneous = FALSE.
homogeneous: Logical. If TRUE (default), the same parameters are used for all subjects. If FALSE, marginal and transition should be lists indexed by block.
seed: Optional random seed for reproducibility.

Value

Integer matrix with n_subjects rows and n_time columns, where each entry is a category code from 1 to c.

Details

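Data are simulated sequentially, one time point at a time, with p = order and n = n_time:

For k = 1: Draw Y(1) from its marginal distribution.
For k = 2 to p: Draw Y(k) conditional on Y(1), ..., Y(k-1). This step applies only when p = 2.
For k = p+1 to n: Draw Y(k) conditional on Y(k-p), ..., Y(k-1). When p = 0 there is nothing to condition on, so each Y(k) is drawn independently.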
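
The sequential scheme above can be illustrated for the order 1 case with a few lines of base R. This is only a minimal sketch under stated assumptions, not the function's actual implementation: the helper name sim_ad1_sketch and its arguments are hypothetical, and it simulates a single subject.

```r
# Hypothetical helper (not part of the package): simulate one AD(1) series.
# marginal_t1: length-c probability vector for Y(1).
# transition:  list of (n_time - 1) c x c matrices; rows index the previous
#              value, columns the current value, and each row sums to 1.
sim_ad1_sketch <- function(n_time, marginal_t1, transition) {
  n_cat <- length(marginal_t1)
  y <- integer(n_time)
  # k = 1: draw Y(1) from the marginal distribution
  y[1] <- sample.int(n_cat, size = 1, prob = marginal_t1)
  # k = 2, ..., n: draw Y(k) conditional on Y(k-1)
  for (k in seq_len(n_time)[-1]) {
    y[k] <- sample.int(n_cat, size = 1,
                       prob = transition[[k - 1]][y[k - 1], ])
  }
  y
}

set.seed(1)
# Illustrative 2 x 2 transition matrix, reused at every time point
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)
y <- sim_ad1_sketch(n_time = 5, marginal_t1 = c(0.5, 0.5),
                    transition = rep(list(P), 4))
length(y)
```

Repeating the call once per subject and stacking the results row-wise would give a matrix of the shape described under Value.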

Parameter structure for marginal:

Order 0: List with elements t1, t2, ..., tn, each a vector of length c summing to 1
Order 1: List with element t1 (vector of length c summing to 1)
Order 2: List with t1 (vector of length c) and t2_given_1to1 (c x c matrix whose rows represent conditioning values and whose columns represent outcomes)
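
As a hedged illustration of the structures listed above, the following sketch builds a marginal list for order 0 and for order 2 with c = 2. The probability values and the object names marginal0 and marginal2 are made up for the example; only the element names follow the description above.

```r
n_cat  <- 2
n_time <- 4

# Order 0: one probability vector per time point, named t1, ..., tn
marginal0 <- setNames(
  rep(list(rep(1 / n_cat, n_cat)), n_time),
  paste0("t", seq_len(n_time))
)

# Order 2: marginal for Y(1) plus the conditional distribution of Y(2)
# given Y(1); rows are values of Y(1), columns are values of Y(2)
marginal2 <- list(
  t1 = c(0.6, 0.4),
  t2_given_1to1 = matrix(c(0.7, 0.3,
                           0.2, 0.8), nrow = n_cat, byrow = TRUE)
)

# Sanity checks: every distribution sums to 1
stopifnot(abs(sum(marginal2$t1) - 1) < 1e-12)
stopifnot(all(abs(rowSums(marginal2$t2_given_1to1) - 1) < 1e-12))
```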

Parameter structure for transition:

Order 0: Not used (NULL)
Order 1: List with elements t2, t3, ..., tn, each c x c matrix where rows are previous values and columns are current values (rows sum to 1)
Order 2: List with elements t3, t4, ..., tn, each c x c x c array where the first two indices are conditioning values and the third is the outcome (so the probabilities over the third index sum to 1)
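
A transition list in the shape described above might be assembled as follows. This is a sketch with illustrative values: the names transition1, transition2, and P are hypothetical, and the order 2 array is uniform because the ordering of the two conditioning indices is not spelled out here. The final call runs only if simulate_cat is available in the session.

```r
n_cat  <- 3
n_time <- 5

# Order 1: one c x c matrix per time point t2, ..., tn; rows sum to 1
P <- matrix(c(0.8, 0.1, 0.1,
              0.1, 0.8, 0.1,
              0.1, 0.1, 0.8), nrow = n_cat, byrow = TRUE)
transition1 <- setNames(rep(list(P), n_time - 1), paste0("t", 2:n_time))

# Order 2: one c x c x c array per time point t3, ..., tn; for each pair
# of conditioning values the probabilities over the third index sum to 1
A <- array(1 / n_cat, dim = c(n_cat, n_cat, n_cat))
transition2 <- setNames(rep(list(A), n_time - 2), paste0("t", 3:n_time))

# Pass the order 1 list to simulate_cat when the function is available;
# marginal is left NULL so that uniform initial probabilities are used
if (exists("simulate_cat")) {
  y <- simulate_cat(n_subjects = 30, n_time = n_time, order = 1,
                    n_categories = n_cat, transition = transition1,
                    seed = 1)
}
```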

References

Xie, Y. and Zimmerman, D. L. (2013). Antedependence models for nonstationary categorical longitudinal data with ignorable missingness: likelihood-based inference. Statistics in Medicine, 32, 3274-3289.

Examples

y <- simulate_cat(n_subjects = 30, n_time = 5, order = 1, n_categories = 3, seed = 1)
dim(y)
#> [1] 30  5