Fits categorical antedependence models of order 0 or 1 to data with missing outcomes, using the Expectation-Maximization (EM) algorithm.

Usage

em_cat(
  y,
  order = 1,
  blocks = NULL,
  homogeneous = TRUE,
  n_categories = NULL,
  max_iter = 100,
  tol = 1e-06,
  epsilon = 1e-08,
  safeguard = TRUE,
  verbose = FALSE
)

Arguments

y

Integer matrix with n_subjects rows and n_time columns. Values are category codes in 1, ..., n_categories; NA is allowed.

order

Antedependence order. Supported values are 0 and 1. Order 2 is not yet implemented in em_cat().

blocks

Optional block/group vector of length n_subjects. Any coding is accepted (e.g., non-sequential integers or factor levels).

homogeneous

Logical. If TRUE, a single parameter set is fitted across blocks. If FALSE, separate parameters are fitted by block.

n_categories

Number of categories. If NULL, inferred from observed data.

max_iter

Maximum number of EM iterations.

tol

Convergence tolerance on the absolute change in log-likelihood between successive iterations.

epsilon

Small positive constant used for smoothing and numerical stability.

safeguard

Logical; if TRUE, apply step-halving when an M-step update decreases the observed-data log-likelihood.

verbose

Logical; if TRUE, print EM progress.

Value

A cat_fit object with fields matching fit_cat. In EM mode, cell_counts stores expected counts from the final E-step, with settings$cell_counts_type = "expected".

Details

For complete data (no missing values), this function defers to fit_cat with closed-form MLEs.
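
As an illustration of what closed-form MLEs look like for an order-1 model with complete data, the sketch below estimates the time-1 marginal distribution and one transition matrix per later time point from raw counts. The helper name mle_order1 and the returned list layout are assumptions made for this example; they are not the internals or the return structure of fit_cat.

```r
# Sketch only: closed-form MLEs for an order-1 model from complete data.
# `mle_order1` is a hypothetical helper, not the package's fit_cat().
mle_order1 <- function(y, n_categories) {
  lev <- seq_len(n_categories)
  # Marginal distribution at time 1: relative frequency of each category.
  marginal <- tabulate(y[, 1], nbins = n_categories) / nrow(y)
  # One transition matrix per time point 2..n_time: row-normalized pair counts.
  transitions <- lapply(2:ncol(y), function(t) {
    counts <- unclass(table(factor(y[, t - 1], levels = lev),
                            factor(y[, t], levels = lev)))
    counts / rowSums(counts)  # a row with no observations would give NaN
  })
  list(marginal = marginal, transitions = transitions)
}

y <- matrix(c(1, 1, 2,
              1, 2, 2,
              2, 2, 1,
              1, 1, 1), nrow = 4, byrow = TRUE)
est <- mle_order1(y, n_categories = 2)
est$marginal          # proportions of categories 1 and 2 at time 1
est$transitions[[1]]  # P(Y_2 = column | Y_1 = row)
```

No iteration is needed in this case because every count is observed directly.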

For missing data and orders 0 and 1, each EM iteration computes expected sufficient statistics with a forward-backward E-step, then updates the probabilities in the M-step by normalizing the expected counts. If safeguard = TRUE, a step-halving line search is applied to the M-step update whenever it decreases the observed-data log-likelihood.
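
The iteration described above can be sketched in base R for order 1. This is a minimal illustration under stated assumptions, not the package's implementation: the function names e_step_subject and em_iteration, the parameter layout (a marginal vector plus a list of transition matrices), and the way eps is added to the expected counts are all hypothetical. The sketch omits the step-halving safeguard and does not rescale the forward and backward quantities, so it is only suitable for short sequences.

```r
# Sketch only: one EM iteration for an order-1 model with missing values.
# All names are hypothetical; this is not the package's internal code.
e_step_subject <- function(y_i, marginal, transitions) {
  n_time <- length(y_i)
  k <- length(marginal)
  # Indicator weights (k x n_time): an observed value pins the category,
  # an NA leaves every category possible.
  obs <- sapply(y_i, function(v) if (is.na(v)) rep(1, k) else as.numeric(seq_len(k) == v))
  alpha <- matrix(0, k, n_time)
  beta <- matrix(1, k, n_time)
  alpha[, 1] <- marginal * obs[, 1]
  for (t in seq_len(n_time)[-1]) {        # forward pass
    alpha[, t] <- as.vector(alpha[, t - 1] %*% transitions[[t - 1]]) * obs[, t]
  }
  for (t in rev(seq_len(n_time - 1))) {   # backward pass
    beta[, t] <- as.vector(transitions[[t]] %*% (obs[, t + 1] * beta[, t + 1]))
  }
  lik <- sum(alpha[, n_time])
  # Posterior pair probabilities P(Y_t = a, Y_{t+1} = b | observed values).
  pairs <- lapply(seq_len(n_time - 1), function(t) {
    (alpha[, t] %o% (obs[, t + 1] * beta[, t + 1])) * transitions[[t]] / lik
  })
  list(first = alpha[, 1] * beta[, 1] / lik, pairs = pairs, log_lik = log(lik))
}

em_iteration <- function(y, marginal, transitions, eps = 1e-8) {
  k <- length(marginal)
  first_counts <- numeric(k)
  pair_counts <- lapply(transitions, function(m) matrix(0, k, k))
  log_lik <- 0
  for (i in seq_len(nrow(y))) {           # E-step: accumulate expected counts
    e <- e_step_subject(y[i, ], marginal, transitions)
    first_counts <- first_counts + e$first
    pair_counts <- Map(`+`, pair_counts, e$pairs)
    log_lik <- log_lik + e$log_lik
  }
  # M-step: normalize expected counts (eps smoothing is an assumption here).
  list(marginal = (first_counts + eps) / sum(first_counts + eps),
       transitions = lapply(pair_counts, function(m) (m + eps) / rowSums(m + eps)),
       log_lik = log_lik)  # evaluated at the parameters passed in
}

y <- matrix(c(1, 1, 2,
              1, NA, 2,
              2, 2, NA,
              NA, 1, 1,
              2, 2, 2,
              1, 2, 1), ncol = 3, byrow = TRUE)
fit <- list(marginal = c(0.5, 0.5),
            transitions = list(matrix(0.5, 2, 2), matrix(0.5, 2, 2)))
log_liks <- numeric(10)
for (iter in 1:10) {
  fit <- em_iteration(y, fit$marginal, fit$transitions)
  log_liks[iter] <- fit$log_lik
}
log_liks  # should be non-decreasing up to the tiny effect of eps
```

Treating an NA as an indicator vector of ones lets the same recursion handle observed and missing time points without branching inside the forward and backward passes.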

A final E-step is run before returning so that log_l/AIC/BIC and expected cell counts correspond exactly to the returned parameter values.
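
The reason for the final E-step can be seen in a small order-0 sketch, where each time point has its own category probabilities. Inside the loop, the log-likelihood is computed before the M-step moves the parameters, so after the last M-step it is one update behind. The names e_step0 and m_step0 are hypothetical and the code is an illustration of the idea only, not the package's implementation.

```r
# Sketch only: order-0 EM loop followed by a final E-step.
# `e_step0` and `m_step0` are hypothetical names.
e_step0 <- function(y, probs) {
  k <- nrow(probs)
  # Observed counts per category (rows) and time point (columns).
  obs <- sapply(seq_len(ncol(y)), function(t) {
    tabulate(y[!is.na(y[, t]), t], nbins = k)
  })
  n_missing <- colSums(is.na(y))
  # Each missing cell contributes its current probabilities as fractional counts.
  expected <- obs + sweep(probs, 2, n_missing, `*`)
  # Observed-data log-likelihood; assumes every category with a positive
  # observed count has positive probability.
  list(expected = expected, log_lik = sum(obs * log(probs)))
}
m_step0 <- function(expected) sweep(expected, 2, colSums(expected), `/`)

y <- matrix(c(1, 2, 1, NA, 2, 1,
              2, NA, 1, 2, 1, 2), ncol = 2)
probs <- matrix(0.5, nrow = 2, ncol = 2)
for (iter in 1:3) {
  e <- e_step0(y, probs)        # e$log_lik belongs to the current probs
  probs <- m_step0(e$expected)  # parameters move; e$log_lik is now stale
}
stale_log_lik <- e$log_lik
final <- e_step0(y, probs)      # final E-step at the returned parameters
c(stale = stale_log_lik, final = final$log_lik)
```

Here final$log_lik and final$expected are the quantities that match probs, which is the correspondence the final E-step is meant to guarantee.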

Examples

set.seed(1)
y <- simulate_cat(n_subjects = 40, n_time = 5, order = 1, n_categories = 3)
y[sample(length(y), 10)] <- NA
fit <- em_cat(y, order = 1, n_categories = 3, max_iter = 20, tol = 1e-5)
fit$settings$na_action
#> [1] "em"