Fits subgroup identification model class of Chen, et al (2017)
fit.subgroup(x, y, trt, propensity.func = NULL, loss = c("sq_loss_lasso", "logistic_loss_lasso", "cox_loss_lasso", "owl_logistic_loss_lasso", "owl_logistic_flip_loss_lasso", "owl_hinge_loss", "owl_hinge_flip_loss", "sq_loss_lasso_gam", "logistic_loss_lasso_gam", "sq_loss_gam", "logistic_loss_gam", "owl_logistic_loss_gam", "owl_logistic_flip_loss_gam", "owl_logistic_loss_lasso_gam", "owl_logistic_flip_loss_lasso_gam", "sq_loss_gbm", "abs_loss_gbm", "logistic_loss_gbm", "cox_loss_gbm"), method = c("weighting", "a_learning"), match.id = NULL, augment.func = NULL, cutpoint = 0, larger.outcome.better = TRUE, reference.trt = NULL, retcall = TRUE, ...)
x  The design matrix (not including intercept term) 

y  The response vector 
trt  treatment vector with each element equal to a 0 or a 1, with 1 indicating treatment status is active. 
propensity.func  function that inputs the design matrix x and the treatment vector trt and outputs
the propensity score, ie Pr(trt = 1  X = x). Function should take two arguments 1) x and 2) trt. See example below.
For a randomized controlled trial this can simply be a function that returns a constant equal to the proportion
of patients assigned to the treatment group, i.e.:

loss  choice of both the M function from Chen, et al (2017) and potentially the penalty used for variable selection.
All

method  subgroup ID model type. Either the weighting or Alearning method of Chen et al, (2017) 
match.id  a (character, factor, or integer) vector with length equal to the number of observations in 
augment.func  function which inputs the response Example 2: For binary and timetoevent outcomes, make sure that predictions are returned on the scale of the predictors Example 3:

cutpoint  numeric value for patients with benefit scores above which
(or below which if 
larger.outcome.better  boolean value of whether a larger outcome is better/preferable. Set to 
reference.trt  which treatment should be treated as the reference treatment. Defaults to the first level of 
retcall  boolean value. if 
...  options to be passed to underlying fitting function. For all For all 
Chen, S., Tian, L., Cai, T. and Yu, M. (2017), A general statistical framework for subgroup identification and comparative treatment scoring. Biometrics. doi:10.1111/biom.12676 http://onlinelibrary.wiley.com/doi/10.1111/biom.12676/abstract
Xu, Y., Yu, M., Zhao, Y. Q., Li, Q., Wang, S., & Shao, J. (2015), Regularized outcome weighted subgroup identification for differential treatment effects. Biometrics, 71(3), 645653. doi: 10.1111/biom.12322 http://onlinelibrary.wiley.com/doi/10.1111/biom.12322/full
Zhao, Y., Zeng, D., Rush, A. J., & Kosorok, M. R. (2012), Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107(499), 11061118. doi: 10.1080/01621459.2012.695674 http://dx.doi.org/10.1080/01621459.2012.695674
validate.subgroup
for function which creates validation results for subgroup
identification models, predict.subgroup_fitted
for a prediction function for fitted models
from fit.subgroup
, plot.subgroup_fitted
for a function which plots
results from fitted models, and print.subgroup_fitted
for arguments for printing options for fit.subgroup()
.
from fit.subgroup
.
library(personalized) set.seed(123) n.obs < 500 n.vars < 15 x < matrix(rnorm(n.obs * n.vars, sd = 3), n.obs, n.vars) # simulate nonrandomized treatment xbetat < 0.5 + 0.5 * x[,7]  0.5 * x[,9] trt.prob < exp(xbetat) / (1 + exp(xbetat)) trt01 < rbinom(n.obs, 1, prob = trt.prob) trt < 2 * trt01  1 # simulate response delta < 2 * (0.5 + x[,2]  x[,3]  x[,11] + x[,1] * x[,12] ) xbeta < x[,1] + x[,11]  2 * x[,12]^2 + x[,13] + 0.5 * x[,15] ^ 2 xbeta < xbeta + delta * trt # continuous outcomes y < drop(xbeta) + rnorm(n.obs, sd = 2) # binary outcomes y.binary < 1 * (xbeta + rnorm(n.obs, sd = 2) > 0 ) # timetoevent outcomes surv.time < exp(20  xbeta + rnorm(n.obs, sd = 1)) cens.time < exp(rnorm(n.obs, sd = 3)) y.time.to.event < pmin(surv.time, cens.time) status < 1 * (surv.time <= cens.time) # create function for fitting propensity score model prop.func < function(x, trt) { # fit propensity score model propens.model < cv.glmnet(y = trt, x = x, family = "binomial") pi.x < predict(propens.model, s = "lambda.min", newx = x, type = "response")[,1] pi.x } subgrp.model < fit.subgroup(x = x, y = y, trt = trt01, propensity.func = prop.func, loss = "sq_loss_lasso", nfolds = 10) # option for cv.glmnet summary(subgrp.model)#> family: gaussian #> loss: sq_loss_lasso #> method: weighting #> propensity #> function: propensity.func #> #> Average Outcomes: #> Recommended 0 Recommended 1 #> Received 0 6.2631 (n = 82) 26.8532 (n = 119) #> Received 1 26.4854 (n = 128) 1.6497 (n = 171) #> #> 0 effect among recommended 0 1 effect among recommended 1 #> 20.2223 (n = 210) 25.2035 (n = 290) #> #> Benefit score quantiles: #> 0% 25% 50% 75% 100% #> 36.451 6.808 2.337 10.538 37.786 #> #> 12 out of 15 variables selected in total by the lasso (cross validation criterion). #> #> Estimate #> Trt1 2.4460 #> V1 0.1780 #> V2 2.5630 #> V3 2.5829 #> V4 0.1536 #> V5 0.2795 #> V6 0.7616 #> V7 0.3729 #> V8 0.7436 #> V10 0.8779 #> V11 1.1277 #> V13 1.6186 #> V15 0.7653# fit lasso + gam model with REML option for gam subgrp.modelg < fit.subgroup(x = x, y = y, trt = trt01, propensity.func = prop.func, loss = "sq_loss_lasso_gam", method.gam = "REML", # option for gam nfolds = 5) # option for cv.glmnet subgrp.modelg#> family: gaussian #> loss: sq_loss_lasso_gam #> method: weighting #> propensity #> function: propensity.func #> #> Average Outcomes: #> Recommended 0 Recommended 1 #> Received 0 12.6764 (n = 131) 25.1851 (n = 70) #> Received 1 21.0034 (n = 141) 2.777 (n = 158) #> #> 0 effect among recommended 0 1 effect among recommended 1 #> 8.327 (n = 272) 22.4081 (n = 228) #> #> Benefit score quantiles: #> 0% 25% 50% 75% 100% #> 57.584 11.433 1.937 8.934 44.564subgrp.model.bin < fit.subgroup(x = x, y = y.binary, trt = trt01, propensity.func = prop.func, loss = "logistic_loss_lasso", type.measure = "auc", # option for cv.glmnet nfolds = 5) # option for cv.glmnet subgrp.model.bin#> family: binomial #> loss: logistic_loss_lasso #> method: weighting #> propensity #> function: propensity.func #> #> Average Outcomes: #> Recommended 0 Recommended 1 #> Received 0 0.5161 (n = 85) 0.1788 (n = 116) #> Received 1 0.1778 (n = 130) 0.5541 (n = 169) #> #> 0 effect among recommended 0 1 effect among recommended 1 #> 0.3383 (n = 215) 0.3753 (n = 285) #> #> Benefit score quantiles: #> 0% 25% 50% 75% 100% #> 2.0266 0.3695 0.1284 0.6247 2.2427library(survival) subgrp.model.cox < fit.subgroup(x = x, y = Surv(y.time.to.event, status), trt = trt01, propensity.func = prop.func, loss = "cox_loss_lasso", nfolds = 5) # option for cv.glmnet subgrp.model.cox#> family: cox #> loss: cox_loss_lasso #> method: weighting #> propensity #> function: propensity.func #> #> Average Outcomes: #> Recommended 0 Recommended 1 #> Received 0 115.7397 (n = 137) 2.2659 (n = 64) #> Received 1 78.6273 (n = 196) 40.6071 (n = 103) #> #> 0 effect among recommended 0 1 effect among recommended 1 #> 37.1124 (n = 333) 38.3413 (n = 167) #> #> Benefit score quantiles: #> 0% 25% 50% 75% 100% #> 1.0584 0.4067 0.1596 0.1056 1.0928