Fits a zero-inflated count model: a binary inflation component combined with a Poisson or NB count component, where the count component is the *original* (non-truncated) distribution. Excess zeros above what the count model implies are absorbed by the inflation component.
a [Formula::Formula] of the form `y ~ x1 + x2 | z1 + z2` where the right-hand side after `|` specifies the inflation design. If `|` is absent, the same RHS is used for both parts.
optional data frame / environment.
standard model-frame arguments.
optional non-negative prior weights.
optional vector or matrix. If a 2-column matrix, columns are taken as `(count_offset, zero_offset)`.
count component distribution: `"poisson"` or `"negbin"`.
character, link for the inflation binomial. One of `"logit"` (default), `"probit"`, `"cloglog"`, `"log"`.
optional positive scalar starting value for the NB dispersion. If `NULL`, a method-of-moments pilot is used.
EM convergence tolerance and iteration cap.
IRLS-loop convergence tolerance and iteration cap (used inside each M-step).
Brent-loop tolerance / iteration cap for the inner theta MLE (only used when `dist = "negbin"`).
logical; keep the model frame on the result.
logical; keep the response / design matrices on the result.
A list of class `c("fastglm_zi", "fastglm")` with elements `coefficients` (a list with `count` and `zero`), `se` (likewise), `vcov` (full, including theta if NB), `loglik`, `theta`, `SE.theta`, `tau` (posterior P(Z=1|y)), `em_iter`, `converged`.
Estimation uses an EM algorithm with all numerical loops in C++. The E-step computes the posterior `tau_i = P(Z_i = 1 | y_i)` analytically; M-steps fit the inflation logit/probit/cloglog/log and the Poisson/NB count regression via the existing native fastglm IRLS (with weights `w_i (1 - tau_i)` on the count side). For NB, an inner Brent MLE re-estimates `theta` after the count-side beta step. Final vcov comes from the numerical Jacobian of the analytical observed score at the EM fixed point (block-structured for `(gamma, beta, theta)`).
This is the fastglm analogue of [pscl::zeroinfl()] – coefficients agree to 1e-5 on standard datasets (slightly looser than hurdle to allow for the EM iteration), with all numerical loops in C++.
set.seed(1)
n <- 400
x <- rnorm(n)
eta_count <- 0.8 + 0.4*x
lam <- exp(eta_count)
eta_zero <- -0.6 + 0.5*x
p_inflate <- plogis(eta_zero)
z <- rbinom(n, 1, p_inflate)
y <- ifelse(z == 1, 0, rpois(n, lam))
df <- data.frame(y = y, x = x)
fit <- fastglm_zi(y ~ x, data = df, dist = "poisson")
coef(fit, model = "count")
#> (Intercept) x
#> 0.8590558 0.3067513
coef(fit, model = "zero")
#> (Intercept) x
#> -0.5134892 0.3188130