Fast generalized linear model fitting — fastglm.fit

`fastglm.fit()` is a fitting method for [glm()]. Like `glm.fit()`, it is used by supplying it to the `method` argument of `glm()`.

fastglm.fit(
  x,
  y,
  weights = rep(1, NROW(y)),
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  offset = rep(0, NROW(y)),
  family = gaussian(),
  control = list(),
  intercept = TRUE,
  singular.ok = TRUE
)

fastglm.control(fastmethod = 0L, tol = 1e-07, maxit = 100L, ...)

# S3 method for class 'fastglmFit'
vcov(object, refit = TRUE, ...)

# S3 method for class 'fastglmFit'
summary(object, refit = TRUE, ...)

Arguments

x: a design matrix of dimension `n * p`. Can also be a `big.matrix` object from bigmemory.
y: a vector of observations of length `n`.
weights: an optional vector of 'prior weights' to be used in the fitting process. Should be `NULL` or a numeric vector.
start: optional starting values for the parameters in the linear predictor.
etastart: optional starting values for the linear predictor.
mustart: optional starting values for the vector of means.
offset: this can be used to specify an *a priori* known component to be included in the linear predictor during fitting. This should be `NULL` or a numeric vector of length equal to the number of cases.
family: a description of the error distribution and link function to be used in the model. This must be a family function or the result of a call to a family function. (See [`family`] for details of family functions.)
control: a list of parameters for controlling the fitting process. This is passed to `fastglm.control()`.
singular.ok, intercept: See [glm.fit()].
fastmethod: `integer`; the decomposition used for fitting. Allowable values are 0 for the column-pivoted QR decomposition, 1 for the unpivoted QR decomposition, 2 for the LLT Cholesky, 3 for the LDLT Cholesky, 4 for the full pivoted QR decomposition, and 5 for the Bidiagonal Divide and Conquer SVD. Default is 0. Can also be supplied as `method` inside the `control` list, i.e., when it is not passed directly as an argument to `glm()` (see Examples).
tol: `numeric`; threshold tolerance for convergence.
maxit: `integer`; the maximum number of IRLS iterations.
...: for `vcov()` and `summary()`, other arguments passed to [vcov.glm()] and [summary.glm()] when `refit = TRUE`.
object: a `fastglmFit` object; the output of a call to `glm()` with `method = fastglm.fit`.
refit: `logical`; whether to refit the model using `glm()` with `method = "glm.fit"`. If `TRUE`, the model will be refit using the estimated coefficients as starting values for a single IRLS iteration in order to produce the usual coefficient covariance matrix. If `FALSE`, `vcov` will only produce the diagonal of the covariance matrix.
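
The `fastmethod`, `tol`, and `maxit` arguments all act on the iteratively reweighted least squares (IRLS) loop: each iteration solves a weighted least squares problem, and `fastmethod` selects the decomposition used for that solve. The base R sketch below illustrates the idea with two of the options, a QR decomposition and an LLT Cholesky factorization. It is an illustration only and not the package's implementation: `irls_sketch()` and its `solver` argument are hypothetical names, prior weights and offsets are omitted, and the deviance-based stopping rule is assumed to mirror the one in `glm.fit()`.

```r
# Hypothetical helper: IRLS for a GLM in base R, with the weighted least
# squares step solved by either a QR or a Cholesky (LLT) decomposition.
irls_sketch <- function(x, y, family = binomial(),
                        solver = c("qr", "chol"),
                        tol = 1e-07, maxit = 100L) {
  solver <- match.arg(solver)
  y <- as.numeric(y)
  wt <- rep(1, length(y))            # unit prior weights (assumption)
  beta <- rep(0, ncol(x))            # start at zero; fine for logit/log links
  eta <- drop(x %*% beta)
  mu <- family$linkinv(eta)
  dev_old <- sum(family$dev.resids(y, mu, wt))
  converged <- FALSE
  for (iter in seq_len(maxit)) {
    gprime <- family$mu.eta(eta)               # d mu / d eta
    z <- eta + (y - mu) / gprime               # working response
    w <- gprime / sqrt(family$variance(mu))    # square root of working weights
    xw <- x * w
    zw <- z * w
    beta <- if (solver == "qr") {
      qr.coef(qr(xw), zw)
    } else {
      # Normal equations: (X'WX) beta = X'Wz with X'WX = R'R
      R <- chol(crossprod(xw))
      backsolve(R, forwardsolve(t(R), crossprod(xw, zw)))
    }
    beta <- drop(beta)
    eta <- drop(x %*% beta)
    mu <- family$linkinv(eta)
    dev <- sum(family$dev.resids(y, mu, wt))
    # Stop when the relative change in deviance falls below tol
    if (abs(dev - dev_old) / (abs(dev) + 0.1) < tol) {
      converged <- TRUE
      break
    }
    dev_old <- dev
  }
  list(coefficients = beta, iterations = iter, converged = converged)
}

# Usage on simulated logistic data; glm.fit() serves as the reference.
set.seed(1)
n <- 500
x <- cbind(1, matrix(rnorm(n * 3), ncol = 3))  # intercept plus 3 predictors
y <- rbinom(n, 1, plogis(drop(x %*% c(0.1, 0.5, -0.5, 0.25))))

fit_qr <- irls_sketch(x, y, solver = "qr")
fit_chol <- irls_sketch(x, y, solver = "chol")
ref <- glm.fit(x, y, family = binomial())
```

Both solvers should reach the same coefficients as `glm.fit()` up to the convergence tolerance. The Cholesky route works on the cross-product matrix and is typically faster, while QR-based solves are generally more robust when the design matrix is ill-conditioned.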

Details

The purpose of the functions documented on this page is to facilitate integration with existing [glm()] utilities in base R. `fastglm.fit()` is just a wrapper for [fastglmPure()] with some additional quality-of-life features. The `vcov()` and `summary()` methods are quick hacks to use the existing architecture for these functions in base R. Because of this, they involve refitting the GLM with the estimated coefficients as starting values.
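
The refitting idea can be illustrated in base R alone. In the sketch below, a model fitted with plain `glm()` stands in for the fast fit (an assumption made so the example runs without fastglm). Its coefficients are passed as `start` to a second `glm()` call limited to one IRLS iteration, and the resulting object works with the usual `vcov()` and `summary()` methods. Object names are illustrative, and the package's internal refit may differ in detail.

```r
set.seed(1234)
n <- 1000
x <- matrix(rnorm(n * 3), ncol = 3)
dat <- as.data.frame(x)
dat$y <- rpois(n, exp(0.1 + 0.25 * x[, 1] - 0.25 * x[, 3]))

# Stand-in for a fit produced by a fast fitting method (assumption)
fit <- glm(y ~ ., data = dat, family = poisson)

# Refit from the converged coefficients, allowing a single IRLS iteration.
# Starting at the solution, one iteration is enough to rebuild the pieces
# that vcov() and summary() need.
refit <- glm(y ~ ., data = dat, family = poisson,
             start = coef(fit),
             control = glm.control(maxit = 1))

# Largest absolute difference between the two covariance matrices
max(abs(vcov(refit) - vcov(fit)))
```

Because the refit starts at the converged solution, the single iteration costs roughly one weighted least squares solve rather than a full fit.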

Examples

library(fastglm)

set.seed(1234)
n <- 1e4
x <- matrix(rnorm(n * 25), ncol = 25)
eta <- 0.1 + 0.25 * x[, 1] - 0.25 * x[, 3] + 0.75 * x[, 5] - 0.35 * x[, 6]
dat <- as.data.frame(x)

# binomial
dat$y <- rbinom(n, 1, pnorm(eta))

system.time({
    gl <- glm(y ~ ., data = dat,
              family = binomial)
})
#>    user  system elapsed 
#>   0.026   0.003   0.028 

system.time({
    gf0 <- glm(y ~ ., data = dat,
               family = binomial,
               method = fastglm.fit)
})
#>    user  system elapsed 
#>   0.010   0.001   0.010 

system.time({
    gf1 <- glm(y ~ ., data = dat,
               family = binomial,
               method = fastglm.fit,
               fastmethod = 1)
})
#>    user  system elapsed 
#>    0.01    0.00    0.01 

# poisson
dat$y <- rpois(n, eta^2)

system.time({
    gl <- glm(y ~ ., data = dat,
              family = poisson)
})
#>    user  system elapsed 
#>   0.035   0.002   0.037 

system.time({
    gf0 <- glm(y ~ ., data = dat,
               family = poisson,
               method = fastglm.fit)
})
#>    user  system elapsed 
#>   0.011   0.001   0.013 

system.time({
    gf1 <- glm(y ~ ., data = dat,
               family = poisson,
               method = fastglm.fit,
               fastmethod = 1)
})
#>    user  system elapsed 
#>   0.011   0.000   0.012 

# gamma
dat$y <- rgamma(n, exp(eta) * 1.75, 1.75)

system.time({
    gl <- glm(y ~ ., data = dat,
              family = Gamma(link = "log"))
})
#>    user  system elapsed 
#>   0.042   0.003   0.045 

system.time({
    gf0 <- glm(y ~ ., data = dat,
               family = Gamma(link = "log"),
               method = fastglm.fit)
})
#>    user  system elapsed 
#>   0.014   0.001   0.015 

system.time({
    gf1 <- glm(y ~ ., data = dat,
               family = Gamma(link = "log"),
               method = fastglm.fit,
               fastmethod = 1)
})
#>    user  system elapsed 
#>   0.014   0.002   0.016 

# Different (equivalent) ways of supplying
# control arguments:
gf1 <- glm(y ~ ., data = dat,
           family = Gamma(link = "log"),
           method = fastglm.fit,
           fastmethod = 1)

gf1 <- glm(y ~ ., data = dat,
           family = Gamma(link = "log"),
           method = fastglm.fit,
           control = list(fastmethod = 1))

gf1 <- glm(y ~ ., data = dat,
           family = Gamma(link = "log"),
           method = fastglm.fit,
           control = list(method = 1))