Fits a hurdle model for count data with excess zeros: a binary (zero / non-zero) regression on the full sample combined with a zero-truncated count regression on the positive subset. The two parts have independent likelihoods and are estimated jointly. With `dist = "negbin"`, the negative-binomial dispersion `theta` is estimated by an inner one-dimensional Brent MLE inside the C++ driver, alternating with the count-side IRLS.
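The "independent likelihoods" claim follows from the standard hurdle decomposition (a sketch of the usual formulation, not taken verbatim from the package; `p_i` is the binomial probability of a positive outcome and `f` the untruncated count density):

```latex
\ell = \sum_{i:\,y_i = 0} \log(1 - p_i)
     + \sum_{i:\,y_i > 0} \Bigl[\, \log p_i
         + \log f(y_i;\,\mu_i,\theta)
         - \log\bigl(1 - f(0;\,\mu_i,\theta)\bigr) \Bigr]
```

The first sum depends only on the zero-model coefficients and the bracketed sum only on the count-model coefficients (and `theta`), so the two parts can be maximized separately yet reported as one joint fit.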

fastglm_hurdle(
  formula,
  data,
  subset,
  na.action,
  weights,
  offset,
  dist = c("poisson", "negbin"),
  zero.dist = "binomial",
  link = c("logit", "probit", "cloglog", "log"),
  init.theta = NULL,
  tol = 1e-08,
  maxit = 100L,
  outer.tol = 1e-07,
  outer.maxit = 50L,
  theta.tol = 1e-08,
  theta.maxit = 100L,
  model = TRUE,
  y = TRUE,
  x = FALSE
)

Arguments

formula

a [Formula::Formula] of the form `y ~ x1 + x2 | z1 + z2` where the right-hand side after `|` specifies the zero-model design. If `|` is absent, the same RHS is used for both parts.
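A hedged usage sketch of the two-part formula (assumes a data frame `df` with columns `y`, `x1`, `x2`, and `z1`; these names are illustrative, not from the package):

```r
# count model uses x1 + x2; zero/non-zero model uses z1 only
fit2 <- fastglm_hurdle(y ~ x1 + x2 | z1, data = df, dist = "negbin")

# no `|`: both parts share the same design, equivalent to y ~ x1 + x2 | x1 + x2
fit1 <- fastglm_hurdle(y ~ x1 + x2, data = df, dist = "negbin")
```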

data

optional data frame / environment from which to draw model matrix variables.

subset, na.action

standard model-frame arguments.

weights

optional non-negative prior weights.

offset

optional vector or matrix. If a 2-column matrix, columns are taken as `(count_offset, zero_offset)`.
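A hedged sketch of the two-column convention (assumes an exposure column `df$exposure`; the variable names are illustrative):

```r
# column 1 is the count-model offset, column 2 the zero-model offset,
# per the documented (count_offset, zero_offset) ordering
off <- cbind(log(df$exposure), rep(0, nrow(df)))
fit <- fastglm_hurdle(y ~ x1 + x2, data = df, offset = off, dist = "poisson")
```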

dist

count component distribution: `"poisson"` or `"negbin"`.

zero.dist

zero/non-zero distribution; only `"binomial"` is currently supported.

link

character; link for the zero/non-zero binomial. One of `"logit"` (default), `"probit"`, `"cloglog"`, `"log"`.

init.theta

optional positive scalar starting value for the NB dispersion. If `NULL`, a method-of-moments initializer is used.
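The package's exact initializer is not documented here; a common method-of-moments choice, shown as an assumption, solves `Var(Y) = mu + mu^2 / theta` for `theta` on the positive counts:

```r
# Hedged sketch of a method-of-moments theta initializer (toy data;
# not necessarily the formula fastglm_hurdle uses internally)
yp <- c(1, 2, 1, 4, 7, 3, 2, 9, 1, 5)        # positive counts
m  <- mean(yp); v <- var(yp)
theta0 <- if (v > m) m^2 / (v - m) else 10   # guard against underdispersion
```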

tol, maxit

IRLS-loop convergence tolerance and iteration cap.

outer.tol, outer.maxit

`(beta, theta)` outer-loop convergence tolerance and iteration cap (only used when `dist = "negbin"`).

theta.tol, theta.maxit

Brent-loop tolerance / iteration cap for the inner theta MLE (only used when `dist = "negbin"`).

model

logical; keep the model frame on the result.

y, x

logical; keep the response / design matrices on the result.

Value

A list of class `c("fastglm_hurdle", "fastglm")` with elements `coefficients` (a list with `count` and `zero`), `se` (likewise), `vcov` (block-diagonal), `loglik`, `loglik_count`, `loglik_zero`, `theta`, `SE.theta`, `iter_*`, and `converged`.

Details

This is the fastglm analogue of [pscl::hurdle()]: coefficients agree with pscl's to about 1e-6 on standard test datasets, with all numerical loops implemented in C++.

Examples

set.seed(1)
n <- 300
x1 <- rnorm(n); x2 <- rnorm(n)
eta_count <- 1.0 + 0.4*x1 - 0.2*x2
lam <- exp(eta_count)
y_count <- rpois(n, lam)
eta_zero  <- -0.5 + 0.6*x1
p_pos     <- plogis(eta_zero)
is_pos    <- rbinom(n, 1, p_pos)
# make the count component zero-truncated: redraw any zeros until positive
for (i in seq_len(n)) {
  while (y_count[i] == 0) y_count[i] <- rpois(1, lam[i])
}
y <- ifelse(is_pos == 1, y_count, 0)
df <- data.frame(y = y, x1 = x1, x2 = x2)
fit <- fastglm_hurdle(y ~ x1 + x2, data = df, dist = "poisson")
coef(fit, model = "count")
#> (Intercept)          x1          x2 
#>   0.9710285   0.3375237  -0.3219622 
coef(fit, model = "zero")
#> (Intercept)          x1          x2 
#> -0.49868804  0.57931690  0.03888585
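Continuing the example above, a hedged cross-check against pscl (assumes the pscl package is installed; per Details, the count coefficients should agree to roughly 1e-6):

```r
library(pscl)
fit_pscl <- hurdle(y ~ x1 + x2, data = df, dist = "poisson")
cbind(fastglm = coef(fit, model = "count"),
      pscl    = coef(fit_pscl, model = "count"))
```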