Fit categorical antedependence model by maximum likelihood

Computes maximum likelihood estimates for the parameters of an AD(p) model for categorical longitudinal data. The model is parameterized by transition probabilities, and MLEs are obtained in closed form.

Usage

fit_cat(
  y,
  order = 1,
  blocks = NULL,
  homogeneous = TRUE,
  n_categories = NULL,
  na_action = c("fail", "complete", "marginalize", "em"),
  em_max_iter = 100,
  em_tol = 1e-06,
  em_epsilon = 1e-08,
  em_safeguard = TRUE,
  em_verbose = FALSE
)

Arguments

y: Integer matrix with n_subjects rows and n_time columns. Each entry should be a category code from 1 to c, where c is the number of categories.
order: Antedependence order p. Must be 0, 1, or 2. Default is 1.
blocks: Optional integer vector of length n_subjects specifying group membership. If NULL, all subjects are treated as one group.
homogeneous: Logical. If TRUE (default), parameters are shared across all groups (blocks are ignored for estimation). If FALSE, separate transition probabilities are estimated for each group.
n_categories: Number of categories. If NULL (default), inferred from the maximum value in y.
na_action: Handling of missing values in y. One of "fail" (default, error if any missing), "complete" (drop subjects with any missing values), or "marginalize" (maximize observed-data likelihood by integrating over missing outcomes), or "em" (use em_cat for orders 0 and 1).
em_max_iter: Maximum EM iterations used when na_action = "em".
em_tol: EM convergence tolerance used when na_action = "em".
em_epsilon: Numerical smoothing constant used when na_action = "em".
em_safeguard: Logical; if TRUE, use step-halving safeguard in em_cat when na_action = "em".
em_verbose: Logical; print EM progress when na_action = "em".

Value

A list of class "cat_fit" containing:

marginal: List of marginal/joint probabilities for initial time points
transition: List of transition probability arrays for k = p+1 to n
log_l: Log-likelihood at MLE
aic: Akaike Information Criterion
bic: Bayesian Information Criterion
n_params: Number of free parameters
cell_counts: List of cell counts: observed counts for closed-form fits (na_action = "fail"/"complete"), expected counts from the final E-step for EM fits (na_action = "em"), and NULL for na_action = "marginalize"
convergence: Optimizer convergence code (0 for closed-form solutions)
settings: List of model settings

Details

For AD(p), the model decomposes as: $$P(Y_1, \ldots, Y_n) = P(Y_1, \ldots, Y_p) \times \prod_{k=p+1}^{n} P(Y_k | Y_{k-p}, \ldots, Y_{k-1})$$

MLEs are computed as empirical proportions:

Marginal/joint probabilities: count / N
Transition probabilities: conditional count / marginal count

Empty cells receive probability 0 (if denominator is also 0).

When na_action = "em", fit_cat() dispatches to em_cat. In that case, em_safeguard controls step-halving protection against likelihood-decreasing updates, and returned log_l/AIC/BIC/cell_counts are synchronized via a final E-step under the returned parameters. For order = 2, na_action = "em" is not available and errors explicitly; use na_action = "marginalize".

References

Xie, Y. and Zimmerman, D. L. (2013). Antedependence models for nonstationary categorical longitudinal data with ignorable missingness: likelihood-based inference. Statistics in Medicine, 32, 3274-3289.

Examples

if (FALSE) { # \dontrun{
# Simulate binary AD(1) data
set.seed(123)
y <- simulate_cat(n_subjects = 100, n_time = 5, order = 1, n_categories = 2)

# Fit model
fit <- fit_cat(y, order = 1)
print(fit)

# Compare orders
fit0 <- fit_cat(y, order = 0)
fit1 <- fit_cat(y, order = 1)
fit2 <- fit_cat(y, order = 2)
c(AIC_0 = fit0$aic, AIC_1 = fit1$aic, AIC_2 = fit2$aic)

# EM fit with missing data
y_miss <- y
y_miss[sample(length(y_miss), size = round(0.15 * length(y_miss)))] <- NA
fit_em <- fit_cat(
  y_miss,
  order = 1,
  na_action = "em",
  em_max_iter = 80,
  em_tol = 1e-6
)
fit_em$settings$n_iter
fit_em$settings$cell_counts_type
} # }