Fast generalized linear model fitting
input model matrix. Must be a matrix object
numeric response vector of length nobs.
a description of the error distribution and link function to be used in the model.
For fastglmPure this can only be the result of a call to a family function.
(See family for details of family functions.)
an optional vector of 'prior weights' to be used in the fitting process. Should be a numeric vector.
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be a numeric vector of length equal to the number of cases
starting values for the parameters in the linear predictor.
starting values for the linear predictor.
values for the vector of means.
an integer scalar with value 0 for the column-pivoted QR decomposition, 1 for the unpivoted QR decomposition, 2 for the LLT Cholesky, 3 for the LDLT Cholesky, 4 for the full pivoted QR decomposition, 5 for the Bidiagonal Divide and Conquer SVD
threshold tolerance for convergence. Should be a positive real number
maximum number of IRLS iterations. Should be an integer
logical; if `TRUE` apply the Kosmidis–Firth mean bias-reducing adjusted score (AS_mean) to the IRLS iteration. The adjustment modifies each working response by \(\xi_i = \varphi \, h_i \, (d^2\mu/d\eta^2)_i \, V(\mu_i) / (2\, w_i\, (d\mu/d\eta)_i^3)\), where \(\varphi\) is the dispersion (1 for binomial/Poisson, estimated iteratively for other families). For `binomial(logit)` this reduces to the familiar \(h_i (0.5 - \mu_i) / (\mu_i (1 - \mu_i))\). Supported for all standard GLM families on dense, sparse (`dgCMatrix`), and `big.matrix` designs, and with any decomposition method. Sparse and `big.matrix` paths use lagged leverages from the previous IRLS iteration; at convergence the result is identical to the exact-leverage dense path. Convergence is checked on the sup-norm of the coefficient update.
A list with the elements
a vector of coefficients
a vector of the standard errors of the coefficient estimates
a scalar denoting the computed rank of the model matrix
a scalar denoting the degrees of freedom in the model
the vector of residuals
a numeric scalar - the root mean square for residuals
the vector of fitted values
[fastglm_fit()]
set.seed(1)
x <- matrix(rnorm(1000 * 25), ncol = 25)
eta <- 0.1 + 0.25 * x[,1] - 0.25 * x[,3] + 0.75 * x[,5] -0.35 * x[,6] #0.25 * x[,1] - 0.25 * x[,3]
y <- 1 * (eta > rnorm(1000))
yp <- rpois(1000, eta ^ 2)
yg <- rgamma(1000, exp(eta) * 1.75, 1.75)
# binomial
system.time(gl1 <- glm.fit(x, y, family = binomial()))
#> user system elapsed
#> 0.003 0.000 0.003
system.time(gf1 <- fastglmPure(x, y, family = binomial(), tol = 1e-8))
#> user system elapsed
#> 0.001 0.000 0.001
system.time(gf2 <- fastglmPure(x, y, family = binomial(), method = 1, tol = 1e-8))
#> user system elapsed
#> 0.001 0.000 0.001
system.time(gf3 <- fastglmPure(x, y, family = binomial(), method = 2, tol = 1e-8))
#> user system elapsed
#> 0.000 0.000 0.001
system.time(gf4 <- fastglmPure(x, y, family = binomial(), method = 3, tol = 1e-8))
#> user system elapsed
#> 0 0 0
max(abs(coef(gl1) - gf1$coef))
#> [1] 1.110223e-15
max(abs(coef(gl1) - gf2$coef))
#> [1] 4.440892e-16
max(abs(coef(gl1) - gf3$coef))
#> [1] 1.332268e-15
max(abs(coef(gl1) - gf4$coef))
#> [1] 1.776357e-15
# poisson
system.time(gl1 <- glm.fit(x, yp, family = poisson(link = "log")))
#> user system elapsed
#> 0.003 0.000 0.003
system.time(gf1 <- fastglmPure(x, yp, family = poisson(link = "log"), tol = 1e-8))
#> user system elapsed
#> 0.001 0.000 0.001
system.time(gf2 <- fastglmPure(x, yp, family = poisson(link = "log"), method = 1, tol = 1e-8))
#> user system elapsed
#> 0.001 0.000 0.001
system.time(gf3 <- fastglmPure(x, yp, family = poisson(link = "log"), method = 2, tol = 1e-8))
#> user system elapsed
#> 0.001 0.000 0.000
system.time(gf4 <- fastglmPure(x, yp, family = poisson(link = "log"), method = 3, tol = 1e-8))
#> user system elapsed
#> 0.001 0.000 0.001
max(abs(coef(gl1) - gf1$coef))
#> [1] 1.665335e-16
max(abs(coef(gl1) - gf2$coef))
#> [1] 2.220446e-16
max(abs(coef(gl1) - gf3$coef))
#> [1] 2.220446e-16
max(abs(coef(gl1) - gf4$coef))
#> [1] 1.457168e-16
# gamma
system.time(gl1 <- glm.fit(x, yg, family = Gamma(link = "log")))
#> user system elapsed
#> 0.008 0.001 0.008
system.time(gf1 <- fastglmPure(x, yg, family = Gamma(link = "log"), tol = 1e-8))
#> user system elapsed
#> 0.002 0.000 0.002
system.time(gf2 <- fastglmPure(x, yg, family = Gamma(link = "log"), method = 1, tol = 1e-8))
#> user system elapsed
#> 0.002 0.000 0.002
system.time(gf3 <- fastglmPure(x, yg, family = Gamma(link = "log"), method = 2, tol = 1e-8))
#> user system elapsed
#> 0.001 0.000 0.001
system.time(gf4 <- fastglmPure(x, yg, family = Gamma(link = "log"), method = 3, tol = 1e-8))
#> user system elapsed
#> 0.001 0.000 0.001
max(abs(coef(gl1) - gf1$coef))
#> [1] 5.551115e-16
max(abs(coef(gl1) - gf2$coef))
#> [1] 6.661338e-16
max(abs(coef(gl1) - gf3$coef))
#> [1] 5.551115e-16
max(abs(coef(gl1) - gf4$coef))
#> [1] 8.881784e-16