Creates an propensity function that optionally utilizes cross-fitting

create.propensity.function(
  crossfit = TRUE,
  nfolds.crossfit = 10,
  cv.glmnet.args = NULL
)

Arguments

crossfit

A logical value indicating whether to use cross-fitting (TRUE) or not (FALSE). Cross-fitting is more computationally intensive, but helps to prevent overfitting, see Chernozhukov, et al. (2018)

nfolds.crossfit

An integer specifying the number of folds to use for cross-fitting. Must be greater than 1

cv.glmnet.args

A list of NAMED arguments to pass to the cv.glmnet function. For example, cv.glmnet.args = list(type.measure = "mse", nfolds = 10). See cv.glmnet and glmnet for all possible options.

Value

A function which can be passed to the augment.func argument of the fit.subgroup function.

References

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters https://arxiv.org/abs/1608.00060

See also

fit.subgroup for estimating ITRs and create.propensity.function for creation of propensity functions

Examples

library(personalized)

set.seed(123)
n.obs  <- 500
n.vars <- 15
x <- matrix(rnorm(n.obs * n.vars, sd = 3), n.obs, n.vars)


# simulate non-randomized treatment
xbetat   <- 0.5 + 0.5 * x[,7] - 0.5 * x[,9]
trt.prob <- exp(xbetat) / (1 + exp(xbetat))
trt01    <- rbinom(n.obs, 1, prob = trt.prob)

trt      <- 2 * trt01 - 1

# simulate response
# delta below drives treatment effect heterogeneity
delta <- 2 * (0.5 + x[,2] - x[,3] - x[,11] + x[,1] * x[,12] )
xbeta <- x[,1] + x[,11] - 2 * x[,12]^2 + x[,13] + 0.5 * x[,15] ^ 2
xbeta <- xbeta + delta * trt

# continuous outcomes
y <- drop(xbeta) + rnorm(n.obs, sd = 2)

aug.func <- create.augmentation.function(family = "gaussian",
                                         crossfit = TRUE,
                                         nfolds.crossfit = 10,
                                         cv.glmnet.args = list(type.measure = "mae",
                                                               nfolds = 5))

prop.func <- create.propensity.function(crossfit = TRUE,
                                        nfolds.crossfit = 10,
                                        cv.glmnet.args = list(type.measure = "mae",
                                                              nfolds = 5))

subgrp.model <- fit.subgroup(x = x, y = y,
                             trt = trt01,
                             propensity.func = prop.func,
                             augment.func = aug.func,
                             loss   = "sq_loss_lasso",
                             nfolds = 10)    # option for cv.glmnet (for ITR estimation)

summary(subgrp.model)
#> family:    gaussian 
#> loss:      sq_loss_lasso 
#> method:    weighting 
#> cutpoint:  0 
#> augmentation 
#> function: augment.func 
#> propensity 
#> function:  propensity.func 
#> 
#> benefit score: f(x), 
#> Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
#> 
#> Average Outcomes:
#>                Recommended 0      Recommended 1
#> Received 0  -7.1071 (n = 84) -28.9549 (n = 117)
#> Received 1 -26.749 (n = 112)  -3.0901 (n = 187)
#> 
#> Treatment effects conditional on subgroups:
#> Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0] 
#>                          19.6419 (n = 196) 
#> Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1] 
#>                          25.8647 (n = 304) 
#> 
#> NOTE: The above average outcomes are biased estimates of
#>       the expected outcomes conditional on subgroups. 
#>       Use 'validate.subgroup()' to obtain unbiased estimates.
#> 
#> ---------------------------------------------------
#> 
#> Benefit score quantiles (f(X) for 1 vs 0): 
#>      0%     25%     50%     75%    100% 
#> -34.541  -5.529   3.090  11.518  42.140 
#> 
#> ---------------------------------------------------
#> 
#> Summary of individual treatment effects: 
#> E[Y|T=1, X] - E[Y|T=0, X]
#> 
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> -69.082 -11.058   6.181   6.974  23.036  84.280 
#> 
#> ---------------------------------------------------
#> 
#> 10 out of 15 interactions selected in total by the lasso (cross validation criterion).
#> 
#> The first estimate is the treatment main effect, which is always selected. 
#> Any other variables selected represent treatment-covariate interactions.
#> 
#>            Trt1     V2      V3      V5     V6     V8      V9    V10     V11
#> Estimate 3.6115 2.3216 -2.4331 -0.0386 0.8201 1.2297 -0.3421 0.8635 -1.0772
#>             V13    V15
#> Estimate 1.5338 1.2956