R/fastglm-hurdle.R
fastglm_hurdle.Rd

Fits a hurdle model for count data with excess zeros: a binary (zero / non-zero) regression on the full sample combined with a zero-truncated count regression on the positive subset. The two parts have independent likelihoods, so the joint fit decomposes into two separate maximizations. With `dist = "negbin"` the negative-binomial dispersion `theta` is estimated by a one-dimensional Brent MLE inside the C++ driver, alternating with the count-side IRLS.
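The separability can be seen directly. A minimal sketch in plain R (not package code) of the Poisson-hurdle log-likelihood, where `p_pos` and `lambda` stand for the fitted zero-part probabilities and count-part means:

```r
# Hurdle log-likelihood for the Poisson case: a binomial term on the
# zero/non-zero indicator plus a zero-truncated Poisson term on the
# positive observations. The two sums share no parameters.
loglik_hurdle_pois <- function(y, p_pos, lambda) {
  pos <- y > 0
  ll_zero  <- sum(dbinom(as.integer(pos), 1, p_pos, log = TRUE))
  # zero-truncated Poisson: log f(y) - log(1 - f(0))
  ll_count <- sum(dpois(y[pos], lambda[pos], log = TRUE) -
                  log1p(-dpois(0, lambda[pos])))
  ll_zero + ll_count
}
```

Because `ll_zero` and `ll_count` share no parameters, maximizing each piece on its own gives the same fit as maximizing their sum, which is why the binary and truncated-count regressions can be run back to back.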
fastglm_hurdle(
formula,
data,
subset,
na.action,
weights,
offset,
dist = c("poisson", "negbin"),
zero.dist = "binomial",
link = c("logit", "probit", "cloglog", "log"),
init.theta = NULL,
tol = 1e-08,
maxit = 100L,
outer.tol = 1e-07,
outer.maxit = 50L,
theta.tol = 1e-08,
theta.maxit = 100L,
model = TRUE,
y = TRUE,
x = FALSE
)

formula: a [Formula::Formula] of the form `y ~ x1 + x2 | z1 + z2`, where the right-hand side after `|` specifies the zero-model design. If `|` is absent, the same RHS is used for both parts.
data: optional data frame / environment from which to draw model-matrix variables.
subset, na.action: standard model-frame arguments.
weights: optional non-negative prior weights.
offset: optional vector or matrix. If a 2-column matrix, the columns are taken as `(count_offset, zero_offset)`.
dist: count-component distribution: `"poisson"` or `"negbin"`.
zero.dist: zero/non-zero distribution; only `"binomial"` is currently supported.
link: character, link for the zero/non-zero binomial. One of `"logit"` (default), `"probit"`, `"cloglog"`, `"log"`.
init.theta: optional positive scalar starting value for the NB dispersion. If `NULL`, a method-of-moments initializer is used.
tol, maxit: IRLS-loop convergence tolerance and iteration cap.
outer.tol, outer.maxit: `(beta, theta)` outer-loop convergence tolerance and iteration cap (only used when `dist = "negbin"`).
theta.tol, theta.maxit: Brent-loop tolerance and iteration cap for the inner theta MLE (only used when `dist = "negbin"`).
model: logical; keep the model frame on the result.
y, x: logical; keep the response / design matrices on the result.
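The two negbin-specific steps above can be sketched in plain R (hypothetical helper names; the package runs these loops in C++). `theta_mom` uses one common method-of-moments choice, `theta0 = m^2 / (v - m)` for sample mean `m` and variance `v`; the package's exact initializer formula may differ. `theta_mle_trunc_nb` is the inner Brent step that profiles the dispersion for fixed positive-part means `mu`:

```r
# Hypothetical R sketches of the negbin pieces; illustrative only.

# Method-of-moments starting value: theta0 = m^2 / (v - m)
# (one common choice; the package's exact formula may differ).
theta_mom <- function(y) {
  m <- mean(y); v <- var(y)
  if (v <= m) return(10)  # under-/equi-dispersed sample: fall back
  m^2 / (v - m)
}

# Inner Brent step: maximize the zero-truncated NB log-likelihood in
# theta for fixed means mu, via base R's optimize().
theta_mle_trunc_nb <- function(y, mu, interval = c(1e-3, 1e3)) {
  nll <- function(theta)
    -sum(dnbinom(y, size = theta, mu = mu, log = TRUE) -
         log1p(-dnbinom(0, size = theta, mu = mu)))
  optimize(nll, interval, tol = 1e-8)$minimum
}
```

In the alternating scheme, the count-side IRLS updates `mu` for the current `theta`, then a call like `theta_mle_trunc_nb` updates `theta` for the current `mu`, until the outer `(beta, theta)` change falls below `outer.tol`.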
A list of class `c("fastglm_hurdle", "fastglm")` with elements `coefficients` (a list with `count` and `zero`), `se` (likewise), `vcov` (block-diagonal), `loglik`, `loglik_count`, `loglik_zero`, `theta`, `SE.theta`, `iter_*`, and `converged`.
This is the fastglm analogue of [pscl::hurdle()]; coefficients agree to within 1e-6 on standard test datasets, and all numerical loops run in C++.
set.seed(1)
n <- 300
x1 <- rnorm(n); x2 <- rnorm(n)
eta_count <- 1.0 + 0.4*x1 - 0.2*x2
lam <- exp(eta_count)
y_count <- rpois(n, lam)
eta_zero <- -0.5 + 0.6*x1
p_pos <- plogis(eta_zero)
is_pos <- rbinom(n, 1, p_pos)
# zero-truncated: resample any rare zero from positive Poisson
for (i in seq_len(n)) {
while (y_count[i] == 0) y_count[i] <- rpois(1, lam[i])
}
y <- ifelse(is_pos == 1, y_count, 0)
df <- data.frame(y = y, x1 = x1, x2 = x2)
fit <- fastglm_hurdle(y ~ x1 + x2, data = df, dist = "poisson")
coef(fit, model = "count")
#> (Intercept) x1 x2
#> 0.9710285 0.3375237 -0.3219622
coef(fit, model = "zero")
#> (Intercept) x1 x2
#> -0.49868804 0.57931690 0.03888585