Generate simulated longitudinal categorical data from an antedependence model of order p, AD(p), with specified marginal and transition probabilities.
Usage
simulate_cat(
n_subjects,
n_time,
order = 1,
n_categories = 2,
marginal = NULL,
transition = NULL,
blocks = NULL,
homogeneous = TRUE,
seed = NULL
)
Arguments
- n_subjects
Number of subjects to simulate.
- n_time
Number of time points.
- order
Antedependence order p. Must be 0, 1, or 2. Default is 1.
- n_categories
Number of categories c. Default is 2 (binary).
- marginal
List of marginal (and, for order 2, conditional) probabilities for the initial time points. If NULL, uniform probabilities are used. See Details for the structure.
- transition
List of transition probability arrays for time points k = p+1 to n, where n = n_time. If NULL, uniform transitions are used. See Details for the structure.
- blocks
Optional integer vector of length n_subjects specifying group membership. Used with homogeneous = FALSE.
- homogeneous
Logical. If TRUE (default), all subjects share the same parameters. If FALSE, marginal and transition should be lists indexed by block, with one parameter set per block.
- seed
Optional random seed for reproducibility.
Value
Integer matrix with n_subjects rows and n_time columns, where each entry is a category code from 1 to c.
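A matrix of this shape can be summarized with base R. The sketch below uses a stand-in matrix drawn with `sample()` (an assumption for illustration; it is not output of `simulate_cat`) and computes the empirical distribution of the categories at each time point.

```r
n_categories <- 3
set.seed(1)
# Stand-in for simulate_cat() output: 30 subjects x 5 time points, codes 1..3
y <- matrix(sample(1:n_categories, 30 * 5, replace = TRUE), nrow = 30, ncol = 5)

# Category counts per time point: rows are categories, columns are time points
counts <- apply(y, 2, tabulate, nbins = n_categories)

# Empirical marginal probabilities: divide each column by its total
props <- sweep(counts, 2, colSums(counts), "/")
props
```

With real output, each column of `props` estimates the marginal distribution of Y(k).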
Details
Data are simulated sequentially for each subject, with n = n_time:
For k = 1: Draw Y(1) from marginal distribution
For k = 2 to p (this step applies only when p = 2): Draw Y(k) conditional on Y(1), ..., Y(k-1)
For k = p+1 to n: Draw Y(k) conditional on Y(k-p), ..., Y(k-1)
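For order 1 the sequential scheme above reduces to drawing Y(1) from t1 and then, at each later time point, sampling from the row of the transition matrix selected by the previous response. The helper `simulate_ad1_sketch` below is hypothetical (it is not part of the package and is not the package's implementation); it only illustrates the steps, assuming a homogeneous model and transition matrices named t2, ..., tn as described in Details.

```r
# Hypothetical helper, for illustration only: sequential AD(1) simulation
simulate_ad1_sketch <- function(n_subjects, n_time, marginal_t1, transition) {
  c_cat <- length(marginal_t1)
  y <- matrix(NA_integer_, nrow = n_subjects, ncol = n_time)
  # k = 1: draw Y(1) from the marginal distribution
  y[, 1] <- sample.int(c_cat, n_subjects, replace = TRUE, prob = marginal_t1)
  # k = 2..n: draw Y(k) using the row of t<k> selected by Y(k-1)
  for (k in seq_len(n_time)[-1]) {
    P <- transition[[paste0("t", k)]]
    for (i in seq_len(n_subjects)) {
      y[i, k] <- sample.int(c_cat, 1, prob = P[y[i, k - 1], ])
    }
  }
  y
}

# Usage: binary responses with the same transition matrix at every time point
P <- matrix(c(0.8, 0.2,
              0.3, 0.7), nrow = 2, byrow = TRUE)
set.seed(1)
y <- simulate_ad1_sketch(n_subjects = 10, n_time = 4, marginal_t1 = c(0.5, 0.5),
                         transition = list(t2 = P, t3 = P, t4 = P))
dim(y)
```

Looping over subjects keeps the sketch close to the description; a vectorized version would group subjects by their previous value.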
Parameter structure for marginal:
Order 0: List with elements t1, t2, ..., tn, each a vector of length c summing to 1
Order 1: List with element t1 (vector of length c)
Order 2: List with t1 (vector of length c) and t2_given_1to1 (c x c matrix whose rows index the conditioning value Y(1) and whose columns index the outcome Y(2); rows sum to 1)
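The following sketch builds `marginal` lists for c = 3, assuming the element names documented above; the probability values are arbitrary illustrations, and the final check is only a sanity test, not a validation performed by the package.

```r
n_categories <- 3
n_time <- 5

# Order 0: one marginal vector per time point, named t1..tn
marginal_ad0 <- setNames(rep(list(rep(1 / n_categories, n_categories)), n_time),
                         paste0("t", 1:n_time))

# Order 1: only the distribution of Y(1) is needed
marginal_ad1 <- list(t1 = c(0.5, 0.3, 0.2))

# Order 2: Y(1), plus Y(2) given Y(1) (rows = Y(1), columns = Y(2))
t2_given_1 <- matrix(c(0.6, 0.3, 0.1,
                       0.2, 0.6, 0.2,
                       0.1, 0.3, 0.6),
                     nrow = n_categories, byrow = TRUE)
marginal_ad2 <- list(t1 = c(0.5, 0.3, 0.2), t2_given_1to1 = t2_given_1)

# Each probability vector, and each row of the conditional matrix, sums to 1
stopifnot(isTRUE(all.equal(sum(marginal_ad2$t1), 1)),
          isTRUE(all.equal(rowSums(marginal_ad2$t2_given_1to1),
                           rep(1, n_categories))))
```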
Parameter structure for transition:
Order 0: Not used (NULL)
Order 1: List with elements t2, t3, ..., tn, each a c x c matrix whose rows are previous values and whose columns are current values (rows sum to 1)
Order 2: List with elements t3, t4, ..., tn, each a c x c x c array whose first two indices are the conditioning values and whose third index is the outcome (for each pair of conditioning values, the entries over the third index sum to 1)
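A sketch of `transition` lists for binary data (c = 2) and n_time = 4, assuming the element names documented above and arbitrary probability values. For order 2 the example does not assume which of the two conditioning indices corresponds to Y(k-2) and which to Y(k-1).

```r
n_categories <- 2
n_time <- 4

# Order 1: one c x c matrix per time point t2..tn; rows = Y(k-1), columns = Y(k)
P <- matrix(c(0.8, 0.2,
              0.3, 0.7), nrow = n_categories, byrow = TRUE)
transition_ad1 <- setNames(rep(list(P), n_time - 1), paste0("t", 2:n_time))

# Order 2: one c x c x c array per time point t3..tn
# A[i, j, ] is the distribution of Y(k) given conditioning values i and j
A <- array(0, dim = c(n_categories, n_categories, n_categories))
A[1, 1, ] <- c(0.9, 0.1)
A[1, 2, ] <- c(0.4, 0.6)
A[2, 1, ] <- c(0.6, 0.4)
A[2, 2, ] <- c(0.2, 0.8)
transition_ad2 <- setNames(rep(list(A), n_time - 2), paste0("t", 3:n_time))

# Every conditional distribution should sum to 1
ok1 <- sapply(transition_ad1, function(m)
  isTRUE(all.equal(rowSums(m), rep(1, n_categories))))
ok2 <- sapply(transition_ad2, function(a)
  isTRUE(all.equal(apply(a, c(1, 2), sum), matrix(1, n_categories, n_categories))))
stopifnot(all(ok1), all(ok2))
```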
References
Xie, Y. and Zimmerman, D. L. (2013). Antedependence models for nonstationary categorical longitudinal data with ignorable missingness: likelihood-based inference. Statistics in Medicine, 32, 3274-3289.
Examples
y <- simulate_cat(n_subjects = 30, n_time = 5, order = 1, n_categories = 3, seed = 1)
dim(y)
#> [1] 30 5