Likelihood ratio test for antedependence order (categorical AD data)

Tests whether a higher-order AD model provides a significantly better fit than a lower-order AD model for categorical longitudinal data.

Usage

test_order_cat(
  y = NULL,
  order_null = 0,
  order_alt = 1,
  blocks = NULL,
  homogeneous = TRUE,
  n_categories = NULL,
  fit_null = NULL,
  fit_alt = NULL,
  test = c("lrt", "score", "mlrt", "wald")
)

Arguments

y: Integer matrix with n_subjects rows and n_time columns. Each entry should be a category code from 1 to c, where c is the number of categories. Can be NULL if both fit_null and fit_alt are provided.
order_null: Order under the null hypothesis (default 0).
order_alt: Order under the alternative hypothesis (default 1). Must be greater than order_null.
blocks: Optional integer vector of length n_subjects specifying group membership.
homogeneous: Logical. If TRUE (default), parameters are shared across all groups.
n_categories: Number of categories. If NULL, inferred from data.
fit_null: Optional pre-fitted model under null hypothesis (class "cat_fit"). If provided, y is not required for fitting under H0.
fit_alt: Optional pre-fitted model under alternative hypothesis. If provided, y is not required for fitting under H1.
test: Type of test statistic. One of "lrt" (default), "score", "mlrt", or "wald".

Value

A list of class "cat_lrt" containing:

method: Inference method used: one of "lrt", "score", "mlrt", or "wald".
lrt_stat: Likelihood ratio test statistic
df: Degrees of freedom
p_value: P-value from chi-square distribution
fit_null: Fitted model under H0
fit_alt: Fitted model under H1
order_null: Order under null
order_alt: Order under alternative
table: Summary data frame

Details

The likelihood ratio test statistic is $$\lambda = -2[\ell_0 - \ell_1]$$ where $\ell_0$ and $\ell_1$ are the maximized log-likelihoods under the null and alternative hypotheses, respectively.

Under H0, $\lambda$ asymptotically follows a chi-square distribution with degrees of freedom equal to the difference in the number of free parameters between the alternative and null models.
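
As a minimal sketch of the calculation described above, the statistic and its p-value can be computed by hand from two maximized log-likelihoods. The helper name lrt_from_loglik and the numeric inputs are hypothetical and chosen only for illustration; they are not package output, and the package may compute these quantities differently internally.

```r
# Hypothetical helper (not part of the package): LRT statistic and p-value
# from the maximized log-likelihoods of two nested models.
lrt_from_loglik <- function(loglik_null, loglik_alt, df) {
  lambda <- -2 * (loglik_null - loglik_alt)   # lambda = -2[l0 - l1]
  # Upper-tail probability of the chi-square reference distribution
  p_value <- pchisq(lambda, df = df, lower.tail = FALSE)
  list(lrt_stat = lambda, df = df, p_value = p_value)
}

# Illustrative (made-up) log-likelihoods and an assumed df of 5
res <- lrt_from_loglik(loglik_null = -812.4, loglik_alt = -798.1, df = 5)
res$lrt_stat
res$p_value
```

Because the alternative model nests the null model, $\ell_1 \ge \ell_0$ and the statistic is non-negative; a large value relative to the chi-square reference distribution favors the higher-order model.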

For testing AD(p) vs AD(p+1), the degrees of freedom are $$\mathrm{df} = (c-1)^2 \times c^p \times (n - p - 1)$$ where c is the number of categories and n is the number of time points.
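
The formula can be checked numerically against a direct parameter count. The sketch below assumes a single group and the usual unstructured parameterization, in which the joint distribution of the first p time points contributes c^p - 1 free parameters and each later time point contributes c^p (c - 1) transition probabilities. Both helper names are hypothetical and not part of the package.

```r
# Closed-form degrees of freedom for AD(p) vs AD(p+1), as stated above
df_ad_step <- function(p, n_categories, n_time) {
  (n_categories - 1)^2 * n_categories^p * (n_time - p - 1)
}

# Assumed free-parameter count of an unstructured AD(p) model (single group):
# joint distribution of the first p times plus transitions for later times
n_params_ad <- function(p, n_categories, n_time) {
  (n_categories^p - 1) +
    (n_time - p) * n_categories^p * (n_categories - 1)
}

# Binary data (c = 2) observed at n = 6 time points
df_ad_step(0, n_categories = 2, n_time = 6)  # 5
df_ad_step(1, n_categories = 2, n_time = 6)  # 8

# The closed form agrees with the difference in parameter counts
n_params_ad(1, 2, 6) - n_params_ad(0, 2, 6)  # 5
n_params_ad(2, 2, 6) - n_params_ad(1, 2, 6)  # 8
```

Under these assumptions, a test of AD(0) vs AD(1) on binary data with 6 time points has 5 degrees of freedom, and a test of AD(1) vs AD(2) has 8.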

If y contains missing values and the models are fit internally, fitting defaults to na_action = "marginalize". The score- and Wald-based variants currently require complete data.

References

Xie, Y. and Zimmerman, D. L. (2013). Antedependence models for nonstationary categorical longitudinal data with ignorable missingness: likelihood-based inference. Statistics in Medicine, 32, 3274-3289.

Examples

if (FALSE) { # \dontrun{
# Simulate AD(1) data
set.seed(123)
y <- simulate_cat(200, 6, order = 1, n_categories = 2)

# Test AD(0) vs AD(1)
test_01 <- test_order_cat(y, order_null = 0, order_alt = 1)
print(test_01$table)

# Test AD(1) vs AD(2)
test_12 <- test_order_cat(y, order_null = 1, order_alt = 2)
print(test_12$table)

# Using pre-fitted models
fit0 <- fit_cat(y, order = 0)
fit1 <- fit_cat(y, order = 1)
test_prefitted <- test_order_cat(fit_null = fit0, fit_alt = fit1)
} # }