Validates subgroup treatment effects for a fitted subgroup identification model of the class proposed by Chen, et al. (2017)

validate.subgroup(
  model,
  B = 50L,
  method = c("training_test_replication", "boot_bias_correction"),
  train.fraction = 0.75,
  benefit.score.quantiles = c(0.1666667, 0.3333333, 0.5, 0.6666667, 0.8333333),
  parallel = FALSE
)

Arguments

model

fitted model object returned by the fit.subgroup() function

B

integer. Number of bootstrap replications or training/test refitting replications.

method

validation method. "boot_bias_correction" for the bootstrap bias-correction method of Harrell, et al. (1996), or "training_test_replication" for repeated training/test splitting of the data (train.fraction should be specified for this option)

train.fraction

fraction (between 0 and 1) of samples to be used for training in training/test replication. Only used for method = "training_test_replication"

benefit.score.quantiles

a vector of quantiles (between 0 and 1) of the benefit score values for which to return bootstrapped information about the subgroups. For example, if one of the quantile values is 0.5, the median of the benefit scores will be used as the cutoff to determine subgroups, and summary statistics will be returned about these subgroups (see the illustration following the argument descriptions)

parallel

Should the loop over replications be parallelized? If FALSE, no; if TRUE, yes. If the user sets parallel = TRUE and the fitted fit.subgroup() object uses the parallel version of an internal model, e.g. for cv.glmnet(), then the internal parallelization will be overridden so as not to create a conflict of parallelism.
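
To make the benefit.score.quantiles argument concrete, the following minimal sketch (using simulated benefit scores rather than scores from a fitted model) shows how a single quantile level is translated into a cutoff and a subgroup assignment:

# stand-in benefit scores; in practice these come from the fitted model
set.seed(1)
benefit.scores <- rnorm(100)

# the 0.5 entry of benefit.score.quantiles corresponds to a median cutoff
cutoff <- quantile(benefit.scores, 0.5)

# observations with scores above the cutoff form the subgroup recommended treatment
recommended.trt <- as.integer(benefit.scores > cutoff)
table(recommended.trt)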

Value

An object of class "subgroup_validated"

avg.results

Estimates of average conditional treatment effects when subgroups are determined based on the provided cutoff value for the benefit score. For example, if cutoff = 0 and there is a treatment and control only, then the treatment is recommended if the benefit score is greater than 0.

se.results

Standard errors of the estimates in avg.results

boot.results

Contains the individual results for each replication; avg.results consists of averages of the values in boot.results

avg.quantile.results

Estimates of average conditional treatment effects when subgroups are determined based on different quantile cutoff values for the benefit score. For example, if benefit.score.quantiles = 0.75 and there is a treatment and control only, then the treatment is recommended if the benefit score is greater than the 0.75 quantile (75th percentile) of all benefit scores. If multiple quantile values are provided, e.g. benefit.score.quantiles = c(0.15, 0.5, 0.85), then results will be provided for all quantile levels.

se.quantile.results

Standard errors corresponding to avg.quantile.results

boot.results.quantiles

Contains the individual results for each replication; avg.quantile.results consists of averages of the values in boot.results.quantiles

family

Family of the outcome. For example, "gaussian" for continuous outcomes

method

Method used for the subgroup identification model, either weighting or A-learning

n.trts

The number of treatment levels

comparison.trts

All treatment levels other than the reference level

reference.trt

The reference level for the treatment. This should usually be the control group/level

larger.outcome.better

Whether larger outcomes are preferred for this model

cutpoint

Benefit score cutoff value used for determining subgroups

val.method

Method used for validation

iterations

Number of replications used in the validation process

nobs

Number of observations in x provided to fit.subgroup

nvars

Number of variables in x provided to fit.subgroup
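
The components listed above can be inspected directly once a model has been validated. A minimal sketch, assuming valmod is an object returned by validate.subgroup() (as in the Examples below) and that the documented components are stored as named elements of that object:

valmod$avg.results            # average conditional treatment effect estimates
valmod$se.results             # standard errors of those estimates
valmod$avg.quantile.results   # results for each quantile-based cutoff
valmod$nobs                   # number of observations passed to fit.subgroup()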

Details

Estimates of various quantities conditional on subgroups and treatment statuses are provided and displayed via the print.subgroup_validated function:

  1. "Conditional expected outcomes" The first results shown when printing a subgroup_validated object are estimates of the expected outcomes conditional on the estimated subgroups (i.e. which subgroup is 'recommended' by the model) and conditional on treatment/intervention status. If there are two total treatment options, this results in a 2x2 table of expected conditional outcomes.

  2. "Treatment effects conditional on subgroups" The second results shown when printing a subgroup_validated object are estimates of the expected outcomes conditional on the estimated subgroups. If the treatment takes levels \(j \in \{1, \dots, K\}\), a total of \(K\) conditional treatment effects will be shown. For example, of the outcome is continuous, the \(j\)th conditional treatment effect is defined as \(E(Y|Trt = j, Subgroup=j) - E(Y|Trt = j, Subgroup =/= j)\), where \(Subgroup=j\) if treatment \(j\) is recommended, i.e. treatment \(j\) results in the largest/best expected potential outcomes given the fitted model.

  3. "Overall treatment effect conditional on subgroups " The third quantity displayed shows the overall improvement in outcomes resulting from all treatment recommendations. This is essentially an average over all of the conditional treatment effects weighted by the proportion of the population recommended each respective treatment level.

References

Chen, S., Tian, L., Cai, T., and Yu, M. (2017). A general statistical framework for subgroup identification and comparative treatment scoring. Biometrics. doi:10.1111/biom.12676

Harrell, F. E., Lee, K. L., and Mark, D. B. (1996). Tutorial in biostatistics: multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine, 15, 361-387. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4

Huling, J. D. and Yu, M. (2021). Subgroup Identification Using the personalized Package. Journal of Statistical Software, 98(5), 1-60. doi:10.18637/jss.v098.i05

See also

fit.subgroup for the function which fits subgroup identification models, plot.subgroup_validated for plotting validation results, and print.subgroup_validated for printing options for objects returned by validate.subgroup().

Examples

library(personalized)

set.seed(123)
n.obs  <- 500
n.vars <- 20
x <- matrix(rnorm(n.obs * n.vars, sd = 3), n.obs, n.vars)


# simulate non-randomized treatment
xbetat   <- 0.5 + 0.5 * x[,11] - 0.5 * x[,13]
trt.prob <- exp(xbetat) / (1 + exp(xbetat))
trt01    <- rbinom(n.obs, 1, prob = trt.prob)

trt      <- 2 * trt01 - 1

# simulate response
delta <- 2 * (0.5 + x[,2] - x[,3] - x[,11] + x[,1] * x[,12])
xbeta <- x[,1] + x[,11] - 2 * x[,12]^2 + x[,13]
xbeta <- xbeta + delta * trt

# continuous outcomes
y <- drop(xbeta) + rnorm(n.obs, sd = 2)

# create function for fitting propensity score model
prop.func <- function(x, trt)
{
    # fit propensity score model
    propens.model <- cv.glmnet(y = trt,
                               x = x, family = "binomial")
    pi.x <- predict(propens.model, s = "lambda.min",
                    newx = x, type = "response")[,1]
    pi.x
}

subgrp.model <- fit.subgroup(x = x, y = y,
                             trt = trt01,
                             propensity.func = prop.func,
                             loss   = "sq_loss_lasso",
                             # option for cv.glmnet,
                             # better to use 'nfolds=10'
                             nfolds = 3)


x.test <- matrix(rnorm(10 * n.obs * n.vars, sd = 3), 10 * n.obs, n.vars)


# simulate non-randomized treatment
xbetat.test   <- 0.5 + 0.5 * x.test[,11] - 0.5 * x.test[,13]
trt.prob.test <- exp(xbetat.test) / (1 + exp(xbetat.test))
trt01.test    <- rbinom(10 * n.obs, 1, prob = trt.prob.test)

trt.test      <- 2 * trt01.test - 1

# simulate response
delta.test <- 2 * (0.5 + x.test[,2] - x.test[,3] - x.test[,11] + x.test[,1] * x.test[,12])
xbeta.test <- x.test[,1] + x.test[,11] - 2 * x.test[,12]^2 + x.test[,13]
xbeta.test <- xbeta.test + delta.test * trt.test

y.test <- drop(xbeta.test) + rnorm(10 * n.obs, sd = 2)

# B = 2 replications are used here only for brevity; the default is B = 50,
# and larger values give more stable validation estimates
valmod <- validate.subgroup(subgrp.model, B = 2,
                            method = "training_test_replication",
                            train.fraction = 0.75)
valmod
#> family:  gaussian 
#> loss:    sq_loss_lasso 
#> method:  weighting 
#> 
#> validation method:  training_test_replication 
#> cutpoint:           0 
#> replications:       2 
#> 
#> benefit score: f(x), 
#> Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
#> 
#> Average Test Set Outcomes:
#>                               Recommended 0                    Recommended 1
#> Received 0   -24.6386 (SE = 3.2025, n = 13) -23.1746 (SE = 7.9863, n = 41.5)
#> Received 1 -17.5581 (SE = 1.8924, n = 39.5)   -12.5447 (SE = 4.8997, n = 31)
#> 
#> Treatment effects conditional on subgroups:
#> Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0] 
#>            -7.0805 (SE = 5.0949, n = 52.5) 
#> Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1] 
#>            10.6299 (SE = 3.0866, n = 72.5) 
#> 
#> Est of 
#> E[Y|Trt received = Trt recom] - E[Y|Trt received =/= Trt recom]:                    
#> 4.476 (SE = 1.9268) 

print(valmod, which.quant = c(4, 5))
#> family:  gaussian 
#> loss:    sq_loss_lasso 
#> method:  weighting 
#> 
#> validation method:  training_test_replication 
#> cutpoint:           Quant_67 
#> replications:       2 
#> 
#> benefit score: f(x), 
#> Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
#> 
#> Average Test Set Outcomes:
#>                              Recommended 0                    Recommended 1
#> Received 0 -16.3927 (SE = 11.6213, n = 27) -37.9023 (SE = 7.3754, n = 27.5)
#> Received 1  -17.2144 (SE = 0.4242, n = 57)  -8.5765 (SE = 7.3799, n = 13.5)
#> 
#> Treatment effects conditional on subgroups:
#> Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0] 
#>              0.8217 (SE = 12.0455, n = 84) 
#> Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1] 
#>             29.3258 (SE = 14.7553, n = 41) 
#> 
#> Est of E[Y|Trt received = Trt recom] - E[Y|Trt received =/= Trt recom]:                      
#> 9.9253 (SE = 12.0313) 
#> 
#> <===============================================>
#> 
#> family:  gaussian 
#> loss:    sq_loss_lasso 
#> method:  weighting 
#> 
#> validation method:  training_test_replication 
#> cutpoint:           Quant_83 
#> replications:       2 
#> 
#> benefit score: f(x), 
#> Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
#> 
#> Average Test Set Outcomes:
#>                               Recommended 0                  Recommended 1
#> Received 0 -20.0343 (SE = 5.5038, n = 37.5) -35.2546 (SE = 5.7461, n = 17)
#> Received 1 -16.8211 (SE = 0.9053, n = 65.5)  -4.5877 (SE = 13.5549, n = 5)
#> 
#> Treatment effects conditional on subgroups:
#> Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0] 
#>             -3.2132 (SE = 6.4091, n = 103) 
#> Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1] 
#>              30.6669 (SE = 7.8088, n = 22) 
#> 
#> Est of E[Y|Trt received = Trt recom] - E[Y|Trt received =/= Trt recom]:                     
#> 2.4436 (SE = 7.2189)
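
# The validation results can also be visualized with the plot method noted in
# 'See also'; a minimal call, assuming plot.subgroup_validated dispatches on
# objects of class "subgroup_validated":
plot(valmod)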