R/fit_subgroup_two_part.R
fit_subgroup_2part.Rd
Fits subgroup identification models
fit_subgroup_2part( x, y, trt, propensity.func = NULL, propensity.func.positive = NULL, match.id = NULL, augment.func.zero = NULL, augment.func.positive = NULL, cutpoint = 1, larger.outcome.better = TRUE, penalize.ate = TRUE, y_eps = 1e06, ... )
x  The design matrix (not including intercept term) 

y  The nonnegative response vector 
trt  treatment vector with each element equal to a 0 or a 1, with 1 indicating treatment status is active. 
propensity.func  function that inputs the design matrix x and the treatment vector trt and outputs
the propensity score, ie Pr(trt = 1  X = x). Function should take two arguments 1) x and 2) trt. See example below.
For a randomized controlled trial this can simply be a function that returns a constant equal to the proportion
of patients assigned to the treatment group, i.e.:

propensity.func.positive  function that inputs the design matrix x and the treatment vector trt and outputs
the propensity score for units with positive outcome values, ie Pr(trt = 1  X = x, Z = 1). Function should take
two arguments 1) x and 2) trt. See example below.
For a randomized controlled trial this can simply be a function that returns a constant equal to the proportion
of patients assigned to the treatment group, i.e.:

match.id  a (character, factor, or integer) vector with length equal to the number of observations in Example 1: Example 2: augment.func < function(x, y, trt) { data < data.frame(x, y, trt) lmod < glm(y ~ x * trt, family = binomial()) ## get predictions when trt = 1 data$trt < 1 preds_1 < predict(lmod, data, type = "response") ## get predictions when trt = 1 data$trt < 1 preds_n1 < predict(lmod, data, type = "response") ## return predictions averaged over trt return(0.5 * (preds_1 + preds_n1)) } 
augment.func.zero  (similar to augment.func.positive) function which inputs the
indicators of whether each response is positive ( 
augment.func.positive  (similar to augment.func.zero) function which inputs the positive part response
(ie all observations in 
cutpoint  numeric value for patients with benefit scores above which
(or below which if 
larger.outcome.better  boolean value of whether a larger outcome is better/preferable. Set to 
penalize.ate  should the treatment main effect (ATE) be penalized too? 
y_eps  positive value above which observations in 
...  options to be passed to 
set.seed(42) dat < sim_semicontinuous_data(250, n.vars = 15) x < dat$x y < dat$y trt < dat$trt prop_func < function(x, trt) { propensmod < glm(trt ~ x, family = binomial()) propens < unname(fitted(propensmod)) propens } fitted_model < fit_subgroup_2part(x, y, trt, prop_func, prop_func) fitted_model#> family: 2part #> loss: 2part #> method: #> cutpoint: 1 #> propensity #> function: propensity.func #> #> benefit score: f(x), #> Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint' #> #> Average Outcomes: #> Recommended 0 Recommended 1 #> Received 0 1.5903 (n = 62) 0.1273 (n = 66) #> Received 1 0.1498 (n = 25) 4.4873 (n = 97) #> #> Treatment effects conditional on subgroups: #> Est of E[YT=0,Recom=0]E[YT=/=0,Recom=0] #> 1.4406 (n = 87) #> Est of E[YT=1,Recom=1]E[YT=/=1,Recom=1] #> 4.36 (n = 163) #> #> NOTE: The above average outcomes are biased estimates of #> the expected outcomes conditional on subgroups. #> Use 'validate.subgroup()' to obtain unbiased estimates. #> #>  #> #> Benefit score quantiles (f(X) for 1 vs 0): #> 0% 25% 50% 75% 100% #> 1.706e04 4.662e01 5.409e+00 5.855e+01 1.649e+05 #> #>  #> #> Summary of individual treatment effects: #> E[YT=1, X] / E[YT=0, X] #> #> Note: for survival outcomes, the above ratio is #> E[g(Y)T=1, X] / E[g(Y)T=0, X], #> where g() is a monotone increasing function of Y, #> the survival time #> #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> 0.00 0.47 5.41 1057.68 58.55 164909.90## correlation of estimated covariateconditional risk ratio and truth cor(fitted_model$benefit.scores, dat$treatment_risk_ratio, method = "spearman")#> [,1] #> [1,] 0.07631469