Computes maximum likelihood estimates for the parameters of an AD(p) model for categorical longitudinal data. The model is parameterized by transition probabilities, and MLEs are obtained in closed form.
Usage
fit_cat(
y,
order = 1,
blocks = NULL,
homogeneous = TRUE,
n_categories = NULL,
na_action = c("fail", "complete", "marginalize", "em"),
em_max_iter = 100,
em_tol = 1e-06,
em_epsilon = 1e-08,
em_safeguard = TRUE,
em_verbose = FALSE
)Arguments
- y
Integer matrix with n_subjects rows and n_time columns. Each entry should be a category code from 1 to c, where c is the number of categories.
- order
Antedependence order p. Must be 0, 1, or 2. Default is 1.
- blocks
Optional integer vector of length n_subjects specifying group membership. If NULL, all subjects are treated as one group.
- homogeneous
Logical. If TRUE (default), parameters are shared across all groups (blocks are ignored for estimation). If FALSE, separate transition probabilities are estimated for each group.
- n_categories
Number of categories. If NULL (default), inferred from the maximum value in y.
- na_action
Handling of missing values in
y. One of"fail"(default, error if any missing),"complete"(drop subjects with any missing values), or"marginalize"(maximize observed-data likelihood by integrating over missing outcomes), or"em"(useem_catfor orders 0 and 1).- em_max_iter
Maximum EM iterations used when
na_action = "em".- em_tol
EM convergence tolerance used when
na_action = "em".- em_epsilon
Numerical smoothing constant used when
na_action = "em".- em_safeguard
Logical; if
TRUE, use step-halving safeguard inem_catwhenna_action = "em".- em_verbose
Logical; print EM progress when
na_action = "em".
Value
A list of class "cat_fit" containing:
- marginal
List of marginal/joint probabilities for initial time points
- transition
List of transition probability arrays for k = p+1 to n
- log_l
Log-likelihood at MLE
- aic
Akaike Information Criterion
- bic
Bayesian Information Criterion
- n_params
Number of free parameters
- cell_counts
List of cell counts: observed counts for closed-form fits (
na_action = "fail"/"complete"), expected counts from the final E-step for EM fits (na_action = "em"), andNULLforna_action = "marginalize"- convergence
Optimizer convergence code (0 for closed-form solutions)
- settings
List of model settings
Details
For AD(p), the model decomposes as: $$P(Y_1, \ldots, Y_n) = P(Y_1, \ldots, Y_p) \times \prod_{k=p+1}^{n} P(Y_k | Y_{k-p}, \ldots, Y_{k-1})$$
MLEs are computed as empirical proportions:
Marginal/joint probabilities: count / N
Transition probabilities: conditional count / marginal count
Empty cells receive probability 0 (if denominator is also 0).
When na_action = "em", fit_cat() dispatches to
em_cat. In that case, em_safeguard controls step-halving
protection against likelihood-decreasing updates, and returned
log_l/AIC/BIC/cell_counts are synchronized via a final E-step
under the returned parameters.
For order = 2, na_action = "em" is not available and errors
explicitly; use na_action = "marginalize".
References
Xie, Y. and Zimmerman, D. L. (2013). Antedependence models for nonstationary categorical longitudinal data with ignorable missingness: likelihood-based inference. Statistics in Medicine, 32, 3274-3289.
Examples
if (FALSE) { # \dontrun{
# Simulate binary AD(1) data
set.seed(123)
y <- simulate_cat(n_subjects = 100, n_time = 5, order = 1, n_categories = 2)
# Fit model
fit <- fit_cat(y, order = 1)
print(fit)
# Compare orders
fit0 <- fit_cat(y, order = 0)
fit1 <- fit_cat(y, order = 1)
fit2 <- fit_cat(y, order = 2)
c(AIC_0 = fit0$aic, AIC_1 = fit1$aic, AIC_2 = fit2$aic)
# EM fit with missing data
y_miss <- y
y_miss[sample(length(y_miss), size = round(0.15 * length(y_miss)))] <- NA
fit_em <- fit_cat(
y_miss,
order = 1,
na_action = "em",
em_max_iter = 80,
em_tol = 1e-6
)
fit_em$settings$n_iter
fit_em$settings$cell_counts_type
} # }