Validates subgroup treatment effects for a fitted subgroup identification model of the class proposed by Chen et al. (2017)

validate.subgroup(model, B = 50L, method = c("training_test_replication",
  "boot_bias_correction"), train.fraction = 0.5,
  benefit.score.quantiles = c(0.1666667, 0.3333333, 0.5, 0.6666667,
  0.8333333), parallel = FALSE)

Arguments

model

fitted model object returned by the fit.subgroup() function

B

integer; the number of bootstrap replications (for method = "boot_bias_correction") or training/test refitting replications (for method = "training_test_replication")

method

validation method: "boot_bias_correction" for the bootstrap bias-correction method of Harrell et al. (1996), or "training_test_replication" for repeated training/test splitting of the data (train.fraction should be specified for this option)

train.fraction

fraction (between 0 and 1) of samples to be used for training in training/test replication. Only used when method = "training_test_replication"

benefit.score.quantiles

a vector of quantiles (between 0 and 1) of the benefit score values for which to return bootstrapped information about the subgroups. For example, if one of the quantile values is 0.5, the median of the benefit scores will be used as a cutoff to determine subgroups, and summary statistics will be returned for these subgroups

parallel

logical; should the loop over replications be parallelized? If the user sets parallel = TRUE and the fitted fit.subgroup() object uses the parallel version of an internal model, say for cv.glmnet(), then the internal parallelization will be overridden so as not to create a conflict of parallelism.
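As a base-R sketch (independent of the personalized package) of how a benefit.score.quantiles value becomes a subgroup cutoff: the vector ben.score below is hypothetical stand-in data, not output from a fitted model.

```r
# Hypothetical benefit scores standing in for predictions
# from a fitted fit.subgroup() model
set.seed(1)
ben.score <- rnorm(100)

# A quantile value of 0.5 means the median benefit score
# is used as the cutoff defining the two subgroups
cutoff <- quantile(ben.score, 0.5)
recommended.trt <- as.integer(ben.score > cutoff)

table(recommended.trt)  # sizes of the two induced subgroups
```

Each quantile supplied in benefit.score.quantiles induces one such cutoff, and validate.subgroup() reports subgroup summary statistics at each of them.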

References

Chen, S., Tian, L., Cai, T. and Yu, M. (2017), A general statistical framework for subgroup identification and comparative treatment scoring. Biometrics. doi:10.1111/biom.12676

Harrell, F. E., Lee, K. L., and Mark, D. B. (1996). Tutorial in biostatistics multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in medicine, 15, 361-387. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4

See also

fit.subgroup for fitting subgroup identification models, plot.subgroup_validated for plotting validation results, and print.subgroup_validated for printing options for objects returned by validate.subgroup().

Examples

library(personalized)

set.seed(123)
n.obs  <- 500
n.vars <- 20
x <- matrix(rnorm(n.obs * n.vars, sd = 3), n.obs, n.vars)

# simulate non-randomized treatment
xbetat   <- 0.5 + 0.5 * x[,11] - 0.5 * x[,13]
trt.prob <- exp(xbetat) / (1 + exp(xbetat))
trt01    <- rbinom(n.obs, 1, prob = trt.prob)
trt      <- 2 * trt01 - 1

# simulate response
delta <- 2 * (0.5 + x[,2] - x[,3] - x[,11] + x[,1] * x[,12])
xbeta <- x[,1] + x[,11] - 2 * x[,12]^2 + x[,13]
xbeta <- xbeta + delta * trt

# continuous outcomes
y <- drop(xbeta) + rnorm(n.obs, sd = 2)

# create function for fitting propensity score model
prop.func <- function(x, trt)
{
    # fit propensity score model
    propens.model <- cv.glmnet(y = trt, x = x, family = "binomial")
    pi.x <- predict(propens.model, s = "lambda.min",
                    newx = x, type = "response")[,1]
    pi.x
}

subgrp.model <- fit.subgroup(x = x, y = y,
                             trt = trt01,
                             propensity.func = prop.func,
                             loss = "sq_loss_lasso",
                             nfolds = 5)    # option for cv.glmnet

subgrp.model$subgroup.trt.effects
#> $subgroup.effects
#> 0 effect among recommended 0 1 effect among recommended 1 
#>                     19.75833                     18.26168 
#> 
#> $avg.outcomes
#>            Recommended 0 Recommended 1
#> Received 0     -3.821617     -31.30368
#> Received 1    -23.579944     -13.04201
#> 
#> $sample.sizes
#>            Recommended 0 Recommended 1
#> Received 0            50           170
#> Received 1           159           121
#> 
#> $overall.subgroup.effect
#> [1] 19.16871
x.test <- matrix(rnorm(10 * n.obs * n.vars, sd = 3), 10 * n.obs, n.vars)

# simulate non-randomized treatment
xbetat.test   <- 0.5 + 0.5 * x.test[,11] - 0.5 * x.test[,13]
trt.prob.test <- exp(xbetat.test) / (1 + exp(xbetat.test))
trt01.test    <- rbinom(10 * n.obs, 1, prob = trt.prob.test)
trt.test      <- 2 * trt01.test - 1

# simulate response
delta.test <- 2 * (0.5 + x.test[,2] - x.test[,3] - x.test[,11] +
                   x.test[,1] * x.test[,12])
xbeta.test <- x.test[,1] + x.test[,11] - 2 * x.test[,12]^2 + x.test[,13]
xbeta.test <- xbeta.test + delta.test * trt.test
y.test <- drop(xbeta.test) + rnorm(10 * n.obs, sd = 2)

valmod <- validate.subgroup(subgrp.model, B = 10,
                            method = "training_test_replication",
                            train.fraction = 0.75)
valmod
#> family:  gaussian 
#> loss:    sq_loss_lasso 
#> method:  weighting 
#> 
#> validation method:  training_test_replication 
#> iterations:  10 
#> 
#> Average Test Set Outcomes:
#>                                Recommended 0                    Recommended 1
#> Received 0 -14.9597 (SE = 11.1886, n = 13.7)   -21.4771 (SE = 7.7994, n = 43)
#> Received 1    -21.5129 (SE = 3.6628, n = 37)  -14.965 (SE = 4.2433, n = 31.3)
#> 
#>    0 effect among recommended 0     1 effect among recommended 1 
#>  6.5532 (SE = 11.8404, n = 50.7)  6.5121 (SE = 10.2546, n = 74.3) 
#> 
#> Overall Subgroup Effect
#> 5.4065 (SE = 8.0859)
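The training/test replication summarized above repeatedly refits the model on a random training fraction and evaluates subgroup effects on the held-out rows. A minimal base-R sketch of the index logic for a single split (the refitting itself is done internally by validate.subgroup()):

```r
n.obs <- 500
train.fraction <- 0.75

# one replication: draw a random training set of the requested size
train.idx <- sample(n.obs, size = floor(train.fraction * n.obs))
test.idx  <- setdiff(seq_len(n.obs), train.idx)

# the model would be refit on rows train.idx, with subgroup
# treatment effects then estimated on rows test.idx
length(train.idx)  # 375
length(test.idx)   # 125
```

Averaging the test-set subgroup effects over B such splits yields the standard errors (SE) and average subgroup sizes (n) shown in the output above.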
print(valmod, which.quant = c(4, 5))
#> family:  gaussian 
#> loss:    sq_loss_lasso 
#> method:  weighting 
#> 
#> validation method:  training_test_replication 
#> Cutoff:  Quant_67 
#> iterations:  10 
#> 
#> Average Test Set Outcomes:
#>                                Recommended 0                     Recommended 1
#> Received 0  -14.0381 (SE = 7.9558, n = 29.5)  -30.5683 (SE = 5.5557, n = 27.2)
#> Received 1  -19.1725 (SE = 2.7345, n = 54.5)  -16.1053 (SE = 8.1065, n = 13.8)
#> 
#> 0 effect among recommended 0   1 effect among recommended 1 
#>  5.1344 (SE = 8.506, n = 84)  14.4631 (SE = 9.6631, n = 41) 
#> 
#> Overall Subgroup Effect
#> 9.5669 (SE = 5.7354)
#> 
#> <===============================================>
#> 
#> family:  gaussian 
#> loss:    sq_loss_lasso 
#> method:  weighting 
#> 
#> validation method:  training_test_replication 
#> Cutoff:  Quant_83 
#> iterations:  10 
#> 
#> Average Test Set Outcomes:
#>                                Recommended 0                     Recommended 1
#> Received 0  -15.9204 (SE = 7.3824, n = 40.2)  -33.8492 (SE = 7.3124, n = 16.5)
#> Received 1  -20.2731 (SE = 2.4943, n = 62.8)    -6.8326 (SE = 9.5111, n = 5.5)
#> 
#> 0 effect among recommended 0   1 effect among recommended 1 
#>  4.3527 (SE = 7.389, n = 103)  27.0166 (SE = 13.303, n = 22) 
#> 
#> Overall Subgroup Effect
#> 9.0653 (SE = 5.3511)
bene.score.test <- subgrp.model$predict(x.test)

# effect among those recommended the treatment
mean(y.test[bene.score.test > 0 & trt01.test == 1]) -
    mean(y.test[bene.score.test > 0 & trt01.test == 0])
#> [1] 13.18799
# effect among those recommended the control
mean(y.test[bene.score.test <= 0 & trt01.test == 0]) -
    mean(y.test[bene.score.test <= 0 & trt01.test == 1])
#> [1] 15.84766
# 2.5% and 97.5% quantiles across the validation replications
quantile(valmod$boot.results[[1]][,1], c(0.025, 0.975))
#>      2.5%     97.5% 
#> -11.53297  27.44914 
quantile(valmod$boot.results[[1]][,2], c(0.025, 0.975))
#>     2.5%    97.5% 
#> -9.99472 17.55450 