Fitting subgroup identification models for semicontinuous positive outcomes

Fits subgroup identification models

fit_subgroup_2part(
  x,
  y,
  trt,
  propensity.func = NULL,
  propensity.func.positive = NULL,
  match.id = NULL,
  augment.func.zero = NULL,
  augment.func.positive = NULL,
  cutpoint = 1,
  larger.outcome.better = TRUE,
  penalize.ate = TRUE,
  y_eps = 1e-06,
  ...
)

Arguments

x	The design matrix (not including intercept term)
y	The nonnegative response vector
trt	treatment vector with each element equal to a 0 or a 1, with 1 indicating treatment status is active.
propensity.func	function that inputs the design matrix x and the treatment vector trt and outputs the propensity score, ie Pr(trt = 1 \| X = x). Function should take two arguments 1) x and 2) trt. See example below. For a randomized controlled trial this can simply be a function that returns a constant equal to the proportion of patients assigned to the treatment group, i.e.: `propensity.func = function(x, trt) 0.5`.
propensity.func.positive	function that inputs the design matrix x and the treatment vector trt and outputs the propensity score for units with positive outcome values, ie Pr(trt = 1 \| X = x, Z = 1). Function should take two arguments 1) x and 2) trt. See example below. For a randomized controlled trial this can simply be a function that returns a constant equal to the proportion of patients assigned to the treatment group, i.e.: `propensity.func = function(x, trt) 0.5`.
match.id	a (character, factor, or integer) vector with length equal to the number of observations in `x` indicating using integers or levels of a factor vector which patients are in which matched groups. Defaults to `NULL` and assumes the samples are not from a matched cohort. Matched case-control groups can be created using any method (propensity score matching, optimal matching, etc). If each case is matched with a control or multiple controls, this would indicate which case-control pairs or groups go together. If `match.id` is supplied, then it is unecessary to specify a function via the `propensity.func` argument. A quick usage example: if the first patient is a case and the second and third are controls matched to it, and the fouth patient is a case and the fifth through seventh patients are matched with it, then the user should specify `match.id = c(1,1,1,2,2,2,2)` or `match.id = c(rep("Grp1", 3),rep("Grp2", 4))` the covariates `x`, and `trt` and outputs predicted values (on the probability scale) for the response using a model constructed with `x`. `augment.func.zero()` can also be simply a function of `x` and `y`. This function is used for efficiency augmentation. When the form of the augmentation function is correct, it can provide efficient estimation of the subgroups. Some examples of possible augmentation functions are: Example 1: `augment.func <- function(x, y) {lmod <- glm(y ~ x, family = binomial()); return(fitted(lmod))}` Example 2: augment.func <- function(x, y, trt) { data <- data.frame(x, y, trt) lmod <- glm(y ~ x * trt, family = binomial()) ## get predictions when trt = 1 data$trt <- 1 preds_1 <- predict(lmod, data, type = "response") ## get predictions when trt = -1 data$trt <- -1 preds_n1 <- predict(lmod, data, type = "response") ## return predictions averaged over trt return(0.5 * (preds_1 + preds_n1)) }
augment.func.zero	(similar to augment.func.positive) function which inputs the indicators of whether each response is positive (`1*(y > 0)`), the covariates `x`, and `trt` for all samples and outputs predicted values (on the link scale) for the response using a model constructed with `x`. `augment.func.positive()` can also be simply a function of `x` and `y`. This function is used for efficiency augmentation.
augment.func.positive	(similar to augment.func.zero) function which inputs the positive part response (ie all observations in `y` which are strictly positive), the covariates `x`, and `trt` and outputs predicted values (on the link scale) for the response using a model constructed with `x`. `augment.func.positive()` can also be simply a function of `x` and `y`. This function is used for efficiency augmentation.
cutpoint	numeric value for patients with benefit scores above which (or below which if `larger.outcome.better = FALSE`) will be recommended to be in the treatment group. Defaults to 1, since the benefit score is a risk ratio
larger.outcome.better	boolean value of whether a larger outcome is better/preferable. Set to `TRUE` if a larger outcome is better/preferable and set to `FALSE` if a smaller outcome is better/preferable. Defaults to `TRUE`.
penalize.ate	should the treatment main effect (ATE) be penalized too?
y_eps	positive value above which observations in `y` will be considered positive
...	options to be passed to `cv.hd2part`

Examples


set.seed(42)

dat <- sim_semicontinuous_data(250, n.vars = 15)
x <- dat$x
y <- dat$y
trt <- dat$trt

prop_func <- function(x, trt)
{
    propensmod <- glm(trt ~ x, family = binomial())

    propens <- unname(fitted(propensmod))
    propens
}

fitted_model <- fit_subgroup_2part(x, y, trt, prop_func, prop_func)

fitted_model
#> family:    2part 
#> loss:      2part 
#> method:    
#> cutpoint:  1 
#> propensity 
#> function:  propensity.func 
#> 
#> benefit score: f(x), 
#> Trt recom = 1*I(f(x)>c)+0*I(f(x)<=c) where c is 'cutpoint'
#> 
#> Average Outcomes:
#>              Recommended 0   Recommended 1
#> Received 0 1.5903 (n = 62) 0.1273 (n = 66)
#> Received 1 0.1498 (n = 25) 4.4873 (n = 97)
#> 
#> Treatment effects conditional on subgroups:
#> Est of E[Y|T=0,Recom=0]-E[Y|T=/=0,Recom=0] 
#>                            1.4406 (n = 87) 
#> Est of E[Y|T=1,Recom=1]-E[Y|T=/=1,Recom=1] 
#>                             4.36 (n = 163) 
#> 
#> NOTE: The above average outcomes are biased estimates of
#>       the expected outcomes conditional on subgroups. 
#>       Use 'validate.subgroup()' to obtain unbiased estimates.
#> 
#> ---------------------------------------------------
#> 
#> Benefit score quantiles (f(X) for 1 vs 0): 
#>        0%       25%       50%       75%      100% 
#> 1.706e-04 4.662e-01 5.409e+00 5.855e+01 1.649e+05 
#> 
#> ---------------------------------------------------
#> 
#> Summary of individual treatment effects: 
#> E[Y|T=1, X] / E[Y|T=0, X]
#> 
#> Note: for survival outcomes, the above ratio is 
#> E[g(Y)|T=1, X] / E[g(Y)|T=0, X], 
#> where g() is a monotone increasing function of Y, 
#> the survival time
#> 
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#>      0.00      0.47      5.41   1057.68     58.55 164909.90 

## correlation of estimated covariate-conditional risk ratio and truth
cor(fitted_model$benefit.scores, dat$treatment_risk_ratio, method = "spearman")
#>            [,1]
#> [1,] 0.07631469