The fastglm package is a fast and stable alternative to stats::glm() for fitting generalized linear models. It is built on RcppEigen and is fully compatible with R’s family objects: the downstream methods you expect (summary(), vcov(), predict(), coef(), residuals(), logLik()) all work exactly as they do for a glm.
Beyond standard GLMs, fastglm provides dedicated fitting functions for negative-binomial regression, hurdle and zero-inflated count models, and Firth bias-reduced GLMs, all of which reuse the same C++ IRLS solver.
glm() or glm2().Matrix::dgCMatrix), on-disc big.matrix objects (bigmemory), and a streaming callback interface (fastglm_streaming()) for fitting on data that does not fit in memory.firth = TRUE) implementing the AS_mean adjustment of Kosmidis and Firth (2009, 2021) for all standard families, all six decomposition methods, and all three large-data paths.fastglm_nb()) with joint (beta, theta) MLE entirely in C++.fastglm_hurdle()) with Poisson or NB count components and a binary zero/non-zero component.fastglm_zi()) with Poisson or NB count distributions, fit by an EM algorithm in C++.vcov(), summary(), predict(se.fit = TRUE), and compatibility with sandwich::vcovHC() and sandwich::vcovCL() for robust standard errors.Install from CRAN:
install.packages("fastglm")or the development version from GitHub:
pak::pak("jaredhuling/fastglm")The main function is fastglm(). It takes a numeric design matrix x, a response y, and an R family object:
library(fastglm)
data(esoph)
x <- model.matrix(~ agegp + unclass(tobgp) + unclass(alcgp), data = esoph)
y <- cbind(esoph$ncases, esoph$ncontrols)
fit <- fastglm(x, y, family = binomial(link = "cloglog"))
summary(fit)
#>
#> Call:
#> fastglm.default(x = x, y = y, family = binomial(link = "cloglog"))
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -4.29199 0.29777 -14.414 < 2e-16 ***
#> agegp.L 3.30677 0.63454 5.211 1.88e-07 ***
#> agegp.Q -1.35525 0.57310 -2.365 0.018 *
#> agegp.C 0.20296 0.43333 0.468 0.640
#> agegp^4 0.15056 0.29318 0.514 0.608
#> agegp^5 -0.23425 0.17855 -1.312 0.190
#> unclass(tobgp) 0.33276 0.07285 4.568 4.93e-06 ***
#> unclass(alcgp) 0.80384 0.07538 10.664 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 61.609 on 87 degrees of freedom
#> Residual deviance: 96.950 on 80 degrees of freedom
#> AIC: 228
#>
#> Number of Fisher Scoring iterations: 6fastglm() operates on a pre-built design matrix. To use a formula and a data frame, pass fastglm_fit as the fitting function to base glm():
fit2 <- glm(cbind(ncases, ncontrols) ~ agegp + unclass(tobgp) + unclass(alcgp),
data = esoph, family = binomial(link = "cloglog"),
method = fastglm_fit)A third, minimal-use function, fastglmPure(), returns only the coefficient vector and working quantities, skipping dispersion, AIC, and null-deviance computation. Use this when calling fastglm from another package and you only need the coefficients.
The IRLS algorithm reduces every iteration to a weighted least-squares problem. fastglm supports six different matrix decompositions for solving that WLS step, all from RcppEigen (Bates and Eddelbuettel, 2013); the choice trades off speed against numerical stability and rank-revealing behavior:
method |
decomposition |
|---|---|
0 |
column-pivoted Householder QR (default; rank-revealing) |
1 |
unpivoted Householder QR |
2 |
LLT Cholesky |
3 |
LDLT Cholesky |
4 |
full-pivoted Householder QR |
5 |
bidiagonal divide-and-conquer SVD |
The default (method = 0) is the safe choice: it is rank-revealing, so it handles aliased or collinear columns gracefully. The Cholesky methods (2 and 3) are roughly 3–4x faster but assume full column rank.
set.seed(123)
n <- 5000; p <- 30
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x %*% rnorm(p) * 0.05))
system.time(f0 <- fastglm(x, y, family = binomial())) # default QR
#> user system elapsed
#> 0.004 0.000 0.005
system.time(f2 <- fastglm(x, y, family = binomial(), method = 2)) # LLT
#> user system elapsed
#> 0.002 0.000 0.002fastglm runs the same IRLS algorithm as glm.fit() but executes the per-iteration WLS solve in C++ via RcppEigen, which is often substantially faster than the compiled-R + LAPACK path that glm.fit() uses. The gap widens with sample size because the R-side overhead in glm.fit() is fixed per iteration:
library(microbenchmark)
library(ggplot2)
set.seed(123)
n.obs <- 10000
n.vars <- 100
x <- matrix(rnorm(n.obs * n.vars, sd = 3), n.obs, n.vars)
y <- 1 * (drop(x[, 1:25] %*% runif(25, -0.1, 0.1)) > rnorm(n.obs))
ct <- microbenchmark(
glm.fit = glm.fit(x, y, family = binomial()),
fastglm_QR = fastglm(x, y, family = binomial(), method = 0),
fastglm_LLT = fastglm(x, y, family = binomial(), method = 2),
fastglm_LDLT = fastglm(x, y, family = binomial(), method = 3),
times = 25L
)
autoplot(ct, log = FALSE) +
ggplot2::stat_summary(fun = median, geom = 'point', size = 2) +
ggplot2::theme_bw()
Coefficient estimates agree with glm.fit() to floating-point precision:
gl <- glm.fit(x, y, family = binomial())
c(fastglm_QR = max(abs(coef(gl) - coef(fastglm(x, y, family = binomial(), method = 0)))),
fastglm_LLT = max(abs(coef(gl) - coef(fastglm(x, y, family = binomial(), method = 2)))),
fastglm_LDLT = max(abs(coef(gl) - coef(fastglm(x, y, family = binomial(), method = 3)))))
#> fastglm_QR fastglm_LLT fastglm_LDLT
#> 1.360023e-15 1.249001e-15 1.165734e-15fastglm does not compromise computational stability for speed. It uses a step-halving safeguard following Marschner (2011) and starts from better-initialized values than glm() or glm2::glm2(), so it tends to converge in cases where the standard IRLS algorithm fails. As an example, consider a Gamma model with a sqrt link — a mild response misspecification combined with a badly misspecified link. In such scenarios the standard IRLS algorithm tends to have convergence issues:
set.seed(1)
x <- matrix(rnorm(10000 * 100), ncol = 100)
y <- (exp(0.25 * x[,1] - 0.25 * x[,3] + 0.5 * x[,4] - 0.5 * x[,5] + rnorm(10000))) + 0.1
gfit1 <- glm(y ~ x, family = Gamma(link = "sqrt"), method = fastglm_fit)
gfit2 <- glm(y ~ x, family = Gamma(link = "sqrt"))
## fastglm converges with a higher likelihood
c(fastglm_converged = gfit1$converged, glm_converged = gfit2$converged)
#> fastglm_converged glm_converged
#> TRUE FALSE
c(fastglm_logLik = logLik(gfit1), glm_logLik = logLik(gfit2))
#> fastglm_logLik glm_logLik
#> -16046.66 -16704.05See vignette("fastglm", package = "fastglm") for the full comparison, including glm2::glm2() and speedglm.
For the most commonly used family/link combinations, fastglm dispatches variance(), mu.eta(), linkinv(), and dev.resids() to inline C++ implementations rather than calling back into R once per IRLS iteration. The covered combinations are:
Detection is automatic: if the family object matches one of the above, the native path is used; otherwise fastglm falls back to the R-callback path. The C++ native approach is meaningfully faster on large n because it eliminates the per-iteration calls to R for each of the four family functions.
For designs that are sparse, that live on disc, or that have to be built from a parquet / arrow / DuckDB source, fastglm provides three large-data paths that share a common streaming kernel and produce identical results:
Matrix::dgCMatrix: pass directly to fastglm(). Useful for one-hot encoded categoricals and high-dimensional sparse designs.bigmemory::big.matrix: pass directly to fastglm(). The matrix is read in row-blocks and never fully materialized in memory.fastglm_streaming(chunk_callback, n_chunks, family): a user-supplied closure yields one row-block per call. The right path for fitting on a parquet dataset, DuckDB query, or any external columnar store.A short example of the streaming computation approach:
n <- 4000
X <- cbind(1, matrix(rnorm(n * 3), n, 3))
y <- rbinom(n, 1, plogis(X %*% c(0.2, 0.4, -0.2, 0.3)))
chunk_size <- 1000
chunks <- function(k) {
idx <- ((k - 1) * chunk_size + 1):(k * chunk_size)
list(X = X[idx, , drop = FALSE], y = y[idx])
}
fit_stream <- fastglm_streaming(chunks, n_chunks = 4, family = binomial())
fit_full <- fastglm(X, y, family = binomial(), method = 2)
max(abs(coef(fit_stream) - coef(fit_full)))
#> [1] 1.042751e-07See vignette("large-data-fastglm", package = "fastglm") for a detailed walk-through of all three paths.
fastglm_nb() fits negative-binomial regression with the dispersion theta estimated jointly with the regression coefficients, in the spirit of MASS::glm.nb(). The joint (beta, theta) MLE runs entirely in C++; IRLS for beta, Brent’s method for theta:
set.seed(123)
n <- 5000
X <- cbind(1, matrix(rnorm(n * 3), n, 3))
mu <- exp(X %*% c(0.5, 0.4, -0.2, 0.3))
y <- MASS::rnegbin(n, mu = mu, theta = 2)
f_nb <- fastglm_nb(X, y)
c(coef = coef(f_nb), theta = f_nb$theta)
#> coef.x1 coef.x2 coef.x3 coef.x4 theta
#> 0.4926199 0.3862258 -0.2200450 0.2991296 1.9923309fastglm_hurdle() fits a two-part count model: a binary regression for whether y > 0, plus a zero-truncated Poisson or NB regression on the positive subset. The two parts factorize and both are fit by the same C++ IRLS solver. This is the same model as pscl::hurdle() (Zeileis, Kleiber, and Jackman, 2008). Different designs for the count and zero parts are specified via the Formula package’s two-RHS syntax:
set.seed(123)
n <- 5000
x1 <- rnorm(n); x2 <- rnorm(n)
lam <- exp(0.7 + 0.4 * x1 - 0.3 * x2)
is_pos <- rbinom(n, 1, plogis(-0.4 + 0.5 * x1 + 0.2 * x2))
yt <- integer(n)
for (i in seq_len(n)) {
repeat { v <- rpois(1, lam[i]); if (v > 0) { yt[i] <- v; break } }
}
y <- ifelse(is_pos == 1, yt, 0L)
f_h <- fastglm_hurdle(y ~ x1 + x2, data = data.frame(y, x1, x2), dist = "poisson")
coef(f_h)
#> count_(Intercept) count_x1 count_x2 zero_(Intercept)
#> 0.6793788 0.4162547 -0.3137229 -0.3767285
#> zero_x1 zero_x2
#> 0.4447325 0.1999711fastglm_zi() fits a zero-inflated Poisson or NB regression, a binary inflation component overlaid on the original count distribution, fit by an EM algorithm in C++ with closed-form posterior responsibilities and an analytical observed-information vcov. This is the same model as pscl::zeroinfl():
set.seed(123)
n <- 5000
x1 <- rnorm(n); x2 <- rnorm(n)
eta_c <- 0.7 + 0.4 * x1 - 0.3 * x2
eta_z <- -0.4 + 0.5 * x1 + 0.2 * x2
z <- rbinom(n, 1, plogis(eta_z))
y <- ifelse(z == 1, 0L, rpois(n, exp(eta_c)))
f_zi <- fastglm_zi(y ~ x1 + x2, data = data.frame(y, x1, x2), dist = "poisson")
coef(f_zi)
#> count_(Intercept) count_x1 count_x2 zero_(Intercept)
#> 0.6647396 0.3965153 -0.3407168 -0.3750258
#> zero_x1 zero_x2
#> 0.4512788 0.1819040Setting firth = TRUE activates the general mean-bias reduction of Kosmidis and Firth (2009, 2021). This extends Firth’s (1993) original logistic penalty to arbitrary GLM families, producing finite estimates even under separation and removing the leading bias from maximum likelihood estimates:
data(sex2, package = "logistf")
X <- model.matrix(~ age + oc + vic + vicl + vis + dia, data = sex2)
y <- sex2$case
f_firth <- fastglm(X, y, family = binomial(), firth = TRUE)
coef(f_firth)
#> (Intercept) age oc vic vicl vis
#> 0.12025405 -1.10598133 -0.06881673 2.26887465 -2.11140819 -0.78831695
#> dia
#> 3.09601183Firth bias reduction works with all six decomposition methods, all standard R families and link functions, and all three large-data paths (sparse, big.matrix, streaming). See vignette("firth-fastglm", package = "fastglm") for verification against logistf::logistf() and brglm2::brglmFit().
The fitted object stores the unscaled covariance directly, so vcov() and summary() work as expected. Heteroskedasticity-consistent and cluster-robust covariance matrices are available via sandwich::vcovHC() and sandwich::vcovCL(), fastglm registers methods on those generics, so loading sandwich is all that is required:
library(sandwich)
V_hc <- vcovHC(fit, type = "HC0")
V_cl <- vcovCL(fit, cluster = cluster, type = "HC1")Results are numerically identical to sandwich applied to a glm fit to floating-point precision. predict() supports se.fit = TRUE:
predict(fit, newdata = xnew, type = "response", se.fit = TRUE)A comprehensive benchmarking study is available in vignette("benchmarks-fastglm", package = "fastglm"), comparing fastglm against the canonical reference implementations across standard GLMs (glm.fit, glm2, speedglm), negative-binomial regression (MASS::glm.nb), Firth bias-reduced GLMs (brglm2, logistf), and hurdle / zero-inflated count regressions (pscl::hurdle, pscl::zeroinfl).
The following summary plot shows the speedup fastglm delivers over the canonical reference for each model class, as a function of sample size. The reference for the standard GLMs is the fastest among glm.fit, glm2, and speedglm, so the comparison is conservative. Larger is better:

Across all model classes the same picture holds: fastglm matches the canonical reference implementation to floating-point precision, and the runtime gap grows with sample size. By the speedup is generally an order of magnitude or more. For models with an outer iteration (NB joint MLE, hurdle/ZI with NB), the gap is widest, since the entire outer loop is in C++ in fastglm and entirely in R in the reference implementations.
Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27–38.
Kosmidis, I. and Firth, D. (2009). Bias reduction in exponential family nonlinear models. Biometrika, 96(4), 793–804.
Kosmidis, I. and Firth, D. (2021). Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models. Biometrika, 108(1), 71–82.
Marschner, I. C. (2011). glm2: Fitting generalized linear models with convergence problems. The R Journal, 3(2), 12–15.
Bates, D. and Eddelbuettel, D. (2013). Fast and elegant numerical linear algebra using the RcppEigen package. Journal of Statistical Software, 52(5), 1–24.
Zeileis, A., Kleiber, C., and Jackman, S. (2008). Regression models for count data in R. Journal of Statistical Software, 27(8), 1–25.