Cross validation for Orthogonalizing EM

cv.oem(
  x,
  y,
  penalty = c("elastic.net", "lasso", "ols", "mcp", "scad", "mcp.net", "scad.net",
    "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net", "grp.scad.net",
    "sparse.grp.lasso"),
  weights = numeric(0),
  lambda = NULL,
  type.measure = c("mse", "deviance", "class", "auc", "mae"),
  nfolds = 10,
  foldid = NULL,
  grouped = TRUE,
  keep = FALSE,
  parallel = FALSE,
  ncores = -1,
  ...
)

Arguments

x

Input matrix of dimension n x p, or a CsparseMatrix object from the Matrix package (sparse input is not yet implemented). Each row is an observation and each column corresponds to a covariate. cv.oem() is optimized for n >> p settings and may be very slow when p > n. When p > n or p is approximately n, consider other packages such as glmnet, ncvreg, grpreg, or gglasso instead.

y

Numeric response vector of length n (the number of rows of x).

penalty

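Specification of penalty type in lowercase letters. Choices are "elastic.net", "lasso", "ols" (ordinary least squares, no penalty), "mcp", "scad", "mcp.net", "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net", "grp.scad.net", and "sparse.grp.lasso". More than one penalty can be supplied, in which case each is cross-validated.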
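
To make the non-convex choices concrete, the sketch below evaluates three of the scalar penalties in their standard textbook forms: lasso (L1), MCP (Zhang, 2010) and SCAD (Fan and Li, 2001). It is an illustration, not oem's code. The function names lasso_pen(), mcp_pen() and scad_pen() are hypothetical, and the concavity parameters gamma = 3 and a = 3.7 are assumed values chosen for the example, not oem defaults.

```r
# Hypothetical helpers: standard forms of three penalties, evaluated elementwise.
# gamma = 3 (MCP) and a = 3.7 (SCAD) are assumed values, not oem defaults.
lasso_pen <- function(beta, lambda) {
  lambda * abs(beta)
}

mcp_pen <- function(beta, lambda, gamma = 3) {
  b <- abs(beta)
  # Quadratic taper up to gamma * lambda, constant afterwards
  ifelse(b <= gamma * lambda,
         lambda * b - b^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

scad_pen <- function(beta, lambda, a = 3.7) {
  b <- abs(beta)
  # Linear up to lambda, quadratic transition up to a * lambda, then constant
  ifelse(b <= lambda,
         lambda * b,
         ifelse(b <= a * lambda,
                (2 * a * lambda * b - b^2 - lambda^2) / (2 * (a - 1)),
                lambda^2 * (a + 1) / 2))
}

beta <- c(0, 0.5, 1, 5)
rbind(lasso = lasso_pen(beta, 1),
      mcp   = mcp_pen(beta, 1),
      scad  = scad_pen(beta, 1))
```

Unlike the lasso, MCP and SCAD stop growing for large coefficients (1.5 and 2.35 at beta = 5 here), which is why they shrink large effects less.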

weights

Observation weights. Defaults to 1 for each observation (a weight vector of length 0 sets all weights to 1).
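
The sketch below shows one plausible reading of this default: a length-0 weight vector is expanded to all ones, and the weights then enter a weighted squared-error loss. The helpers expand_weights() and weighted_mse() are hypothetical, and normalizing by sum(w) is an assumption for illustration, not a statement about oem's internals.

```r
# Hypothetical helper: a length-0 weight vector means "all weights equal to 1"
expand_weights <- function(weights, n) {
  if (length(weights) == 0) rep(1, n) else weights
}

# Hypothetical weighted squared-error loss, normalized by the total weight
weighted_mse <- function(y, yhat, weights = numeric(0)) {
  w <- expand_weights(weights, length(y))
  sum(w * (y - yhat)^2) / sum(w)
}

y    <- c(1, 2, 3, 4)
yhat <- c(1, 2, 3, 6)
weighted_mse(y, yhat)                  # default weights: ordinary MSE, 1
weighted_mse(y, yhat, c(1, 1, 1, 0))   # last observation ignored: 0
```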

lambda

A user-supplied lambda sequence. By default, the program computes its own lambda sequence based on nlambda and lambda.min.ratio (arguments of oem() that can be passed through ...). Supplying a value of lambda overrides this.
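
As an illustration of how nlambda and lambda.min.ratio typically define a path, the sketch below builds a glmnet-style log-spaced grid running from lambda_max down to lambda_max * lambda.min.ratio. It assumes oem does something similar, and oem's exact scaling may differ. Here lambda_max is the smallest penalty at which every lasso coefficient is zero for the objective (1/(2n))||yc - xs b||^2 + lambda ||b||_1 on centred and scaled data. The function lambda_path() and its default values are hypothetical.

```r
# Hypothetical glmnet-style lambda path; defaults are illustrative only.
lambda_path <- function(x, y, nlambda = 100, lambda.min.ratio = 1e-4) {
  n  <- nrow(x)
  xs <- scale(x)          # centre and scale the columns
  yc <- y - mean(y)       # centre the response
  # Smallest lambda at which all lasso coefficients are exactly zero
  lambda_max <- max(abs(crossprod(xs, yc))) / n
  # Equally spaced on the log scale, from lambda_max downwards
  exp(seq(log(lambda_max), log(lambda_max * lambda.min.ratio),
          length.out = nlambda))
}

set.seed(1)
x <- matrix(rnorm(200 * 5), 200, 5)
y <- x[, 1] + rnorm(200)
lam <- lambda_path(x, y, nlambda = 10, lambda.min.ratio = 0.01)
lam
```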

type.measure

Measure to evaluate for cross-validation. If type.measure is not supplied, the first choice in the usage, "mse", is used; for gaussian models this squared-error loss is the same as type.measure = "deviance". For logistic regression, "deviance" gives the binomial deviance. type.measure = "class" (misclassification rate) applies to binomial models only, and type.measure = "auc" applies to two-class logistic regression only. type.measure = "mse" or type.measure = "mae" (mean absolute error) can be used by all models; both measure the deviation of the fitted mean from the response.
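
The sketch below writes out each measure from its standard definition so the options can be compared side by side. It is not oem's implementation. The function names are hypothetical, the binomial response is assumed to be coded 0/1 with p the predicted probability of class 1, and the 0.5 classification threshold and 1e-5 clipping constant are assumed choices.

```r
# Hypothetical helpers written from the standard definitions of each measure
mse_loss <- function(y, yhat) mean((y - yhat)^2)
mae_loss <- function(y, yhat) mean(abs(y - yhat))

# Misclassification rate with an assumed 0.5 threshold
class_loss <- function(y, p) mean(y != as.numeric(p > 0.5))

binomial_deviance <- function(y, p) {
  p <- pmin(pmax(p, 1e-5), 1 - 1e-5)   # clip to avoid log(0)
  -2 * mean(y * log(p) + (1 - y) * log(1 - p))
}

# AUC via the Mann-Whitney statistic: the share of (positive, negative)
# pairs ranked correctly by p, counting ties as one half
auc <- function(y, p) {
  pos <- p[y == 1]
  neg <- p[y == 0]
  cmp <- outer(pos, neg, "-")
  mean((cmp > 0) + 0.5 * (cmp == 0))
}

y <- c(0, 0, 1, 1)
p <- c(0.1, 0.6, 0.4, 0.9)
mse_loss(y, p)     # 0.185
mae_loss(y, p)     # 0.35
class_loss(y, p)   # 0.5: observations 2 and 3 are misclassified
auc(y, p)          # 0.75
```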

nfolds

Number of folds for cross-validation. The default is 10; the smallest allowed value is 3.

foldid

An optional vector of values between 1 and nfolds specifying which fold each observation belongs to.
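
A foldid vector can be built by hand, for example to reuse identical folds across several cv.oem() calls. The sketch below uses the common rep()/sample() pattern, which assigns observations at random to near-equal folds; it is an assumption about a typical construction, not oem's internal code. make_foldid() is a hypothetical helper, and it enforces the nfolds >= 3 limit stated above.

```r
# Hypothetical helper: random assignment to nfolds near-equal folds
make_foldid <- function(n, nfolds = 10) {
  if (nfolds < 3) stop("nfolds must be at least 3")
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(42)
foldid <- make_foldid(n = 23, nfolds = 5)
table(foldid)   # fold sizes differ by at most one
```

The resulting vector, with one entry per row of x, can then be passed as cv.oem(x = x, y = y, foldid = foldid).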

grouped

As in glmnet, this is an experimental argument, with default TRUE, and can be ignored by most users. For all models, grouped = TRUE means computing nfolds separate statistics and then using their mean and estimated standard error to describe the CV curve. If grouped = FALSE, an error matrix is built up at the observation level from the predictions of the nfolds fits and then summarized (this does not apply to type.measure = "auc").
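
The difference between the two settings is easiest to see at a single lambda value. In the sketch below, err holds simulated held-out squared errors, one per observation. The grouped summary averages within folds first and takes the standard error over the fold means. The ungrouped summary works directly on the observation-level errors. This mirrors glmnet's unweighted formulas and is an illustration under that assumption, not oem's code. All variable names are hypothetical.

```r
# Simulated held-out squared errors for 100 observations in 5 equal folds
set.seed(7)
foldid <- rep(1:5, each = 20)
err    <- rexp(100)

# grouped = TRUE: one statistic per fold, then their mean and standard error
fold_means   <- tapply(err, foldid, mean)
cvm_grouped  <- mean(fold_means)
cvsd_grouped <- sd(fold_means) / sqrt(length(fold_means))

# grouped = FALSE: summarize the observation-level errors directly
cvm_ungrouped  <- mean(err)
cvsd_ungrouped <- sd(err) / sqrt(length(err))

c(cvm_grouped, cvm_ungrouped)
c(cvsd_grouped, cvsd_ungrouped)
```

With equal fold sizes the two means coincide, so the settings differ only in how the standard error is estimated.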

keep

If keep = TRUE, a prevalidated list of arrays is returned containing fitted values for each observation and each value of lambda for each model. These fits are computed with this observation and the rest of its fold omitted. The foldid vector is also returned. Default is keep = FALSE.
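
The sketch below illustrates what "prevalidated" means. Entry [i, k] is the prediction for observation i at lambda[k] from a model trained with i's entire fold held out. Ridge regression without an intercept serves as a stand-in model only because it has a closed form; oem fits its own penalized path. The helpers ridge_coef() and prevalidate() are hypothetical.

```r
# Stand-in model: closed-form ridge regression without an intercept
ridge_coef <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

# n x length(lambda) matrix of out-of-fold predictions
prevalidate <- function(x, y, lambda, foldid) {
  fit_prev <- matrix(NA_real_, nrow(x), length(lambda))
  for (k in unique(foldid)) {
    test <- foldid == k
    for (j in seq_along(lambda)) {
      # Train without fold k, then predict only the held-out rows
      beta <- ridge_coef(x[!test, , drop = FALSE], y[!test], lambda[j])
      fit_prev[test, j] <- x[test, , drop = FALSE] %*% beta
    }
  }
  fit_prev
}

set.seed(3)
x <- matrix(rnorm(60 * 4), 60, 4)
y <- drop(x %*% c(1, -1, 0, 0)) + rnorm(60)
foldid <- sample(rep(1:5, length.out = 60))
fit_prev <- prevalidate(x, y, lambda = c(10, 1, 0.1), foldid = foldid)
dim(fit_prev)   # 60 observations by 3 lambda values
```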

parallel

If TRUE, use parallel foreach to fit each fold. A parallel backend, such as doMC, must be registered beforehand.

ncores

Number of cores to use. If parallel = TRUE, ncores is automatically set to 1 to prevent conflicts.

...

Other arguments passed to the oem() function.

Value

An object with S3 class "cv.oem"

Examples

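library(oem)

# Simulate data: 15 nonzero coefficients out of 100
set.seed(123)
n.obs <- 1e4
n.vars <- 100
true.beta <- c(runif(15, -0.25, 0.25), rep(0, n.vars - 15))
x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta

# Cross-validate two penalties at once; groups is passed through ... to oem()
fit <- cv.oem(x = x, y = y, penalty = c("lasso", "grp.lasso"),
              groups = rep(1:20, each = 5))

# Plot the CV curves for both models side by side
layout(matrix(1:2, ncol = 2))
plot(fit)
plot(fit, which.model = 2)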