Fast generalized linear model fitting — fastglm

`fastglm_fit()` is a fitting method for [glm()]. It is a drop-in alternative to `glm.fit()`: supply it to the `method` argument of `glm()` and the rest of the call stays the same.

fastglm_fit(
  x,
  y,
  weights = rep(1, NROW(y)),
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  offset = rep(0, NROW(y)),
  family = gaussian(),
  control = list(),
  intercept = TRUE,
  singular.ok = TRUE,
  firth = FALSE
)

fastglm_control(fastmethod = 0L, tol = 1e-07, maxit = 100L, ...)

# S3 method for class 'fastglmFit'
vcov(object, ...)

# S3 method for class 'fastglmFit'
summary(object, ...)

Arguments

x: a design matrix of dimension `n * p`. Can also be a `big.matrix` object from bigmemory.
y: a vector of observations of length `n`.
weights: an optional vector of 'prior weights' to be used in the fitting process. Should be `NULL` or a numeric vector.
start: optional starting values for the parameters in the linear predictor.
etastart: optional starting values for the linear predictor.
mustart: optional starting values for the vector of means.
offset: this can be used to specify an *a priori* known component to be included in the linear predictor during fitting. This should be `NULL` or a numeric vector of length equal to the number of cases.
family: a description of the error distribution and link function to be used in the model. This must be a family function or the result of a call to a family function. (See [`family`] for details of family functions.)
control: a list of parameters for controlling the fitting process. This is passed to `fastglm_control()`.
singular.ok, intercept: See [glm.fit()].
firth: `logical`; if `TRUE` apply Firth's (1993) bias-reducing penalty to the score function. Currently supported only for `family = binomial(link = "logit")` on dense `x`. See `logistf::logistf()` for the canonical reference implementation.
fastmethod: `integer`; the decomposition used for fitting. Allowable values are 0 for the column-pivoted QR decomposition, 1 for the unpivoted QR decomposition, 2 for the LLT Cholesky, 3 for the LDLT Cholesky, 4 for the full pivoted QR decomposition, and 5 for the Bidiagonal Divide and Conquer SVD. Default is 0. Inside a `control` list it can also be named `method`; when passed directly as an argument to `glm()` it must be named `fastmethod`, because `method` there selects the fitting function (see Examples).
tol: `numeric`; threshold tolerance for convergence.
maxit: `integer`; the maximum number of IRLS iterations.
...: for `vcov()` and `summary()`, other arguments passed downstream.
object: a `fastglmFit` object; the output of a call to `glm()` with `method = fastglm_fit`.
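
To make the roles of `tol`, `maxit`, and the choice of decomposition concrete, here is a minimal base-R sketch of an IRLS loop. It is an illustration under stated assumptions, not fastglm's implementation: `irls_sketch()` and its `solver` argument are hypothetical names, the convergence test copies the relative-deviance rule used by `glm.fit()` (the exact rule used by fastglm is not stated on this page), and only a QR and a Cholesky solve are shown.

```r
# Hypothetical helper for illustration only; it is not part of fastglm.
irls_sketch <- function(x, y, family = binomial(), tol = 1e-07, maxit = 100L,
                        solver = c("qr", "chol")) {
  solver <- match.arg(solver)
  y <- as.numeric(y)
  wt <- rep(1, length(y))                    # unit prior weights
  eta <- rep(0, nrow(x))                     # start from a zero linear predictor
  mu <- family$linkinv(eta)
  dev_old <- sum(family$dev.resids(y, mu, wt))
  beta <- rep(NA_real_, ncol(x))
  converged <- FALSE
  iter <- 0L
  while (iter < maxit && !converged) {
    iter <- iter + 1L
    dmu <- family$mu.eta(eta)                # derivative of the mean w.r.t. eta
    z <- eta + (y - mu) / dmu                # working response
    w <- sqrt(dmu^2 / family$variance(mu))   # square root of the IRLS weights
    xw <- x * w                              # each row of x scaled by its weight
    zw <- z * w
    beta <- if (solver == "qr") {
      qr.coef(qr(xw), zw)                    # least squares via a QR decomposition
    } else {
      r <- chol(crossprod(xw))               # upper-triangular factor of X'WX
      drop(backsolve(r, forwardsolve(t(r), crossprod(xw, zw))))
    }
    eta <- drop(x %*% beta)
    mu <- family$linkinv(eta)
    dev <- sum(family$dev.resids(y, mu, wt))
    # Assumed stopping rule, borrowed from glm.fit(): relative change in deviance.
    converged <- abs(dev - dev_old) / (abs(dev) + 0.1) < tol
    dev_old <- dev
  }
  list(coefficients = beta, iterations = iter, converged = converged)
}

set.seed(1)
n <- 500
x <- cbind(1, rnorm(n), rnorm(n))
y <- rbinom(n, 1, plogis(drop(x %*% c(0.1, 0.5, -0.5))))

fit_qr <- irls_sketch(x, y, solver = "qr")
fit_chol <- irls_sketch(x, y, solver = "chol")
ref <- glm.fit(x, y, family = binomial())

# Side-by-side coefficients from the two solvers and from glm.fit().
cbind(qr = fit_qr$coefficients,
      chol = fit_chol$coefficients,
      glm.fit = ref$coefficients)
```

A smaller `tol` asks for more iterations before the loop stops, and `maxit` caps them regardless. The Cholesky route solves the normal equations and is typically cheaper, while the QR route works on the weighted design matrix directly and is generally more stable for ill-conditioned designs.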

Details

The purpose of the functions documented on this page is to facilitate integration with existing [glm()] utilities in base R. `fastglm_fit()` is just a wrapper for [fastglmPure()] with some additional quality-of-life features. The `vcov()` and `summary()` methods use the unscaled coefficient covariance matrix returned directly from the C++ solver, so no refit is required.
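
As a reminder of what an unscaled coefficient covariance matrix is, the sketch below shows how it relates to `vcov()` and to the standard errors reported by `summary()`. It deliberately uses the default `glm.fit()` method so that it runs without fastglm installed; it illustrates the quantity involved and makes no claim about how a `fastglmFit` object stores it internally.

```r
set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Gamma response with mean exp(0.3 + 0.5 * x1): mean = shape / rate.
dat$y <- rgamma(n, shape = 2, rate = 2 / exp(0.3 + 0.5 * dat$x1))

fit <- glm(y ~ x1 + x2, data = dat, family = Gamma(link = "log"))
s <- summary(fit)

# The scaled covariance is the dispersion estimate times the unscaled matrix.
scaled <- s$dispersion * s$cov.unscaled
all.equal(scaled, vcov(fit))

# Standard errors are the square roots of its diagonal.
se <- sqrt(diag(scaled))
all.equal(se, s$coefficients[, "Std. Error"])
```

Because the unscaled matrix is a by-product of the final weighted least-squares solve, a fitting method that returns it can provide `vcov()` and `summary()` without decomposing the design matrix a second time.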

Examples

library(fastglm)

set.seed(1234)
n <- 1e4
x <- matrix(rnorm(n * 25), ncol = 25)
eta <- 0.1 + 0.25 * x[, 1] - 0.25 * x[, 3] + 0.75 * x[, 5] - 0.35 * x[, 6]
dat <- as.data.frame(x)

# binomial
dat$y <- rbinom(n, 1, pnorm(eta))

system.time({
    gl <- glm(y ~ ., data = dat,
              family = binomial)
})
#>    user  system elapsed 
#>   0.025   0.001   0.027 

system.time({
    gf0 <- glm(y ~ ., data = dat,
               family = binomial,
               method = fastglm_fit)
})
#>    user  system elapsed 
#>   0.009   0.000   0.010 

system.time({
    gf1 <- glm(y ~ ., data = dat,
               family = binomial,
               method = fastglm_fit,
               fastmethod = 1)
})
#>    user  system elapsed 
#>   0.009   0.001   0.009 

# poisson
dat$y <- rpois(n, eta^2)

system.time({
    gl <- glm(y ~ ., data = dat,
              family = poisson)
})
#>    user  system elapsed 
#>   0.035   0.002   0.037 

system.time({
    gf0 <- glm(y ~ ., data = dat,
               family = poisson,
               method = fastglm_fit)
})
#>    user  system elapsed 
#>   0.011   0.001   0.012 

system.time({
    gf1 <- glm(y ~ ., data = dat,
               family = poisson,
               method = fastglm_fit,
               fastmethod = 1)
})
#>    user  system elapsed 
#>   0.011   0.002   0.013 

# gamma
dat$y <- rgamma(n, exp(eta) * 1.75, 1.75)

system.time({
    gl <- glm(y ~ ., data = dat,
              family = Gamma(link = "log"))
})
#>    user  system elapsed 
#>   0.043   0.005   0.048 

system.time({
    gf0 <- glm(y ~ ., data = dat,
               family = Gamma(link = "log"),
               method = fastglm_fit)
})
#>    user  system elapsed 
#>   0.013   0.000   0.014 

system.time({
    gf1 <- glm(y ~ ., data = dat,
               family = Gamma(link = "log"),
               method = fastglm_fit,
               fastmethod = 1)
})
#>    user  system elapsed 
#>   0.013   0.001   0.014 

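# Three equivalent ways of supplying control arguments.

# 1. As an extra argument to glm(). It must be named `fastmethod`
#    here, because `method` already selects the fitting function.
gf1 <- glm(y ~ ., data = dat,
           family = Gamma(link = "log"),
           method = fastglm_fit,
           fastmethod = 1)

# 2. By name inside the control list.
gf1 <- glm(y ~ ., data = dat,
           family = Gamma(link = "log"),
           method = fastglm_fit,
           control = list(fastmethod = 1))

# 3. Inside the control list under the alias `method`.
gf1 <- glm(y ~ ., data = dat,
           family = Gamma(link = "log"),
           method = fastglm_fit,
           control = list(method = 1))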
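
The first two forms are interchangeable because of how `glm()` itself is defined: its `control` argument defaults to `list(...)`, so any extra named argument is folded into the control list before the fitting method is called. The sketch below checks this with a stand-in fitting method so that it runs without fastglm. `passthrough_fit()` is a hypothetical helper, not part of fastglm, and it simply ignores the control list it receives.

```r
# Hypothetical fitting method for illustration; it is not part of fastglm.
# It accepts the control list from glm() but delegates to glm.fit() with
# the default glm.control() settings.
passthrough_fit <- function(x, y, ..., control = list()) {
  glm.fit(x = x, y = y, ..., control = glm.control())
}

set.seed(7)
d <- data.frame(x1 = rnorm(50))
d$y <- rpois(50, exp(0.2 + 0.4 * d$x1))

# Extra argument: glm() collects it into `control`.
f1 <- glm(y ~ x1, data = d, family = poisson,
          method = passthrough_fit, fastmethod = 1)

# Explicit control list.
f2 <- glm(y ~ x1, data = d, family = poisson,
          method = passthrough_fit, control = list(fastmethod = 1))

# Both calls hand the fitting method the same control list.
identical(f1$control, f2$control)
str(f1$control)
```

The third form, `control = list(method = 1)`, relies on the alias described under `fastmethod` and is specific to fastglm, so it is not covered by this stand-in.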