sens() performs sensitivity analysis on a risk-adjusted regression by
computing the maximum and minimum regression coefficients consistent with the
data and the analyst's prior knowledge, expressed through epsilon, the
bound on the mean absolute difference between the true and estimated risks.
It can additionally provide bootstrapped pointwise confidence intervals for
the regression coefficients.
Usage
sens(
df,
group_col,
obs_col,
p_col,
base_group,
epsilon,
lwr_col = NULL,
upr_col = NULL,
eta = 0.01,
m = 101L,
N = 0L,
alpha = 0.05,
chunk_size = 100L,
n_threads = 1L
)
Arguments
- df
The data frame containing the data.
- group_col
The name of the column containing the group labels. This column should be a factor or coercible to a factor.
- obs_col
The name of the column containing whether or not the outcome was observed. This column should be a logical or coercible to a logical.
- p_col
The name of the column containing the estimated risks. These risks should be expressed on the probability scale, i.e., be between 0 and 1.
- base_group
The name of the base group. This group will be used as the reference group in the regression.
- epsilon
The bound on the mean absolute difference between the true and estimated risks.
- lwr_col
The name of the column containing the lower bounds on the true risk. (Defaults to 0 for all observations.)
- upr_col
The name of the column containing the upper bounds on the true risk. (Defaults to 1 for all observations.)
- eta
The step size for the grid search. Note that while steps are taken at the group level, the step size is expressed at the level of change in average risk across the entire population. In other words, smaller groups will have proportionally larger steps. (Defaults to 0.01.)
- m
The grid size for the maximization approximation. (Defaults to 101.)
- N
The number of bootstrap resamples to use to compute pointwise confidence intervals. (Defaults to 0, which performs no bootstrap.)
- alpha
The significance level for the pointwise confidence intervals; the intervals have confidence level 1 - alpha. (Defaults to 0.05, i.e., 95% intervals.)
- chunk_size
The number of repetitions to perform in each chunk when run in parallel. Larger chunk sizes make it less likely that separate threads will block on each other, but also make it more likely that the threads will finish at different times. (Defaults to 100.)
- n_threads
The number of threads to use when running in parallel. (Defaults to 1, i.e., serial execution.)
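The eta convention described above can be made concrete with a little arithmetic. The sketch below assumes one reading of that description: a step of eta in population-average risk corresponds to a step of eta * n / n_g in the average risk of a group with n_g of the n rows. The helper group_step() is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): converts a population-level
# step size into the implied step in each group's average risk, assuming the
# step within group g is eta * n / n_g.
group_step <- function(group, eta = 0.01) {
  counts <- table(group)                            # rows per group
  n_g <- setNames(as.vector(counts), names(counts)) # plain named vector
  n <- length(group)                                # total number of rows
  eta * n / n_g                                     # smaller groups, larger steps
}

# Made-up group labels: 900 rows in group "a", 100 rows in group "b".
group <- factor(c(rep("a", 900), rep("b", 100)))
steps <- group_step(group, eta = 0.01)
steps
```

Under this reading, group "b" holds a tenth of the rows, so its implied step is ten times eta (0.1), while group "a" gets a step of about 0.011.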
Value
A data frame containing the following columns:
- epsilon: Values of epsilon ranging from 0 to the input value of epsilon in m steps.
- beta_min_{group}: The minimum value of the regression coefficient for group {group}. (Note that the base group is not included in this list.)
- beta_max_{group}: The maximum value of the regression coefficient for group {group}. (Note that the base group is not included in this list.)
- (If N > 0) beta_min_{group}_{alpha/2}: The alpha/2 quantile of the bootstrap distribution of the minimum value of the regression coefficient for group {group}. (Note that the base group is not included in this list.)
- (If N > 0) beta_min_{group}_{1 - alpha/2}: The 1 - alpha/2 quantile of the bootstrap distribution of the minimum value of the regression coefficient for group {group}. (Note that the base group is not included in this list.)
- (If N > 0) beta_max_{group}_{alpha/2}: The alpha/2 quantile of the bootstrap distribution of the maximum value of the regression coefficient for group {group}. (Note that the base group is not included in this list.)
- (If N > 0) beta_max_{group}_{1 - alpha/2}: The 1 - alpha/2 quantile of the bootstrap distribution of the maximum value of the regression coefficient for group {group}. (Note that the base group is not included in this list.)
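To illustrate how the epsilon column and the quantile columns relate to the arguments, here is a minimal sketch in base R. It assumes the epsilon grid is m equally spaced points including both endpoints, and that the bootstrap quantiles are ordinary percentile quantiles; the vector boot_beta is made-up data standing in for N bootstrap replicates at a single value of epsilon, not output from sens().

```r
# Sketch only: the grid of epsilon values implied by the description of the
# epsilon column, assuming m equally spaced points from 0 to epsilon.
epsilon <- 0.1
m <- 101L
eps_grid <- seq(0, epsilon, length.out = m)
head(eps_grid, 3)

# Percentile interval from a vector of bootstrap replicates. boot_beta is
# simulated here purely for illustration.
set.seed(1)
alpha <- 0.1
boot_beta <- rnorm(1000, mean = 0.1, sd = 0.015)
ci <- quantile(boot_beta, probs = c(alpha / 2, 1 - alpha / 2))
ci
```

With alpha = 0.1 the two probabilities are 0.05 and 0.95, which gives a 90% pointwise interval.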
Details
The sensitivity analysis assumes that every group contains at least one observed and one unobserved individual, and that the estimated risks and upper and lower bounds are "sortable," i.e., that there exists a permutation of the rows such that the estimated risks and upper and lower bounds are all non-decreasing within each group and observation status. If these conditions are not met, the function will throw an error.
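The two conditions above can be checked before calling the function. The following is a minimal sketch in base R, not the package's own validation code: the helpers is_sortable() and check_conditions() are hypothetical, and they assume there are no missing values. Within a stratum, sorting lexicographically by the estimated risk, lower bound, and upper bound and then testing that the bounds are non-decreasing is equivalent to asking whether a suitable permutation exists.

```r
# Hypothetical helper (not part of the package): TRUE if the rows of one
# group-by-status stratum can be ordered so that p, lwr, and upr are all
# non-decreasing. Assumes no missing values.
is_sortable <- function(p, lwr, upr) {
  o <- order(p, lwr, upr)  # lexicographic sort; p[o] is sorted by construction
  !is.unsorted(lwr[o]) && !is.unsorted(upr[o])
}

# Hypothetical helper: checks both conditions for a whole data set.
check_conditions <- function(group, obs, p,
                             lwr = rep(0, length(p)),
                             upr = rep(1, length(p))) {
  # Every group needs at least one observed and one unobserved row.
  counts <- table(group, obs)
  both_statuses <- ncol(counts) == 2L && all(counts > 0)
  # Each group-by-status stratum must be sortable.
  strata <- split(seq_along(p), list(group, obs), drop = TRUE)
  sortable <- all(vapply(
    strata,
    function(i) is_sortable(p[i], lwr[i], upr[i]),
    logical(1)
  ))
  both_statuses && sortable
}

# A higher estimated risk paired with a lower lower bound is not sortable.
is_sortable(p = c(0.2, 0.4), lwr = c(0.1, 0.0), upr = c(1, 1))
```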
To ensure that these conditions continue to hold, the bootstrap resamples are stratified by group and observation status. As a result, in small samples, the confidence intervals may be slightly too narrow, since they do not account for uncertainty in the number of individuals in each group or in the number of observed and unobserved individuals within each group.
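Stratified resampling of this kind can be sketched in a few lines of base R. This illustrates the general technique only and is not the package's internal implementation; the helper stratified_resample() is hypothetical. Rows are resampled with replacement within each combination of group and observation status, so every stratum keeps its original size.

```r
# Hypothetical helper (not part of the package): draws one bootstrap resample
# of row indices, resampling with replacement within each combination of group
# and observation status so that stratum sizes are preserved.
stratified_resample <- function(group, obs) {
  strata <- split(seq_along(group), list(group, obs), drop = TRUE)
  # sample.int() avoids the surprising behavior of sample() on length-1 input.
  idx <- lapply(strata, function(i) i[sample.int(length(i), replace = TRUE)])
  unlist(idx, use.names = FALSE)
}

# Made-up labels: 6 rows in group "a", 4 in group "b", alternating status.
set.seed(1)
group <- rep(c("a", "b"), times = c(6, 4))
obs <- rep(c(TRUE, FALSE), times = 5)
idx <- stratified_resample(group, obs)

# Stratum sizes in the resample match those in the original data.
all(table(group[idx], obs[idx]) == table(group, obs))
```

Because the stratum counts are fixed across resamples, the resulting intervals reflect only within-stratum variability, which is the source of the narrowing noted above.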
Examples
# Generate some data
set.seed(1)
df <- tibble::tibble(
  group = factor(
    sample(c("a", "b"), 1000, replace = TRUE),
    levels = c("a", "b")
  ),
  p = runif(1000)^2,
  frisked = runif(1000) < p + 0.1 * (group != "a")
)
# Compute the sensitivity analysis
sens(df, group, frisked, p, "a", 0.1)
#> # A tibble: 101 × 3
#> epsilon beta_min_b beta_max_b
#> <dbl> <dbl> <dbl>
#> 1 0 0.102 0.102
#> 2 0.001 0.102 0.102
#> 3 0.002 0.102 0.102
#> 4 0.003 0.101 0.102
#> 5 0.004 0.101 0.103
#> 6 0.005 0.101 0.103
#> 7 0.006 0.101 0.103
#> 8 0.007 0.101 0.103
#> 9 0.008 0.101 0.103
#> 10 0.009 0.100 0.103
#> # ℹ 91 more rows
# Search over a finer grid
sens(df, group, frisked, p, "a", 0.1, eta = 0.001)
#> # A tibble: 101 × 3
#> epsilon beta_min_b beta_max_b
#> <dbl> <dbl> <dbl>
#> 1 0 0.102 0.102
#> 2 0.001 0.0995 0.104
#> 3 0.002 0.0972 0.106
#> 4 0.003 0.0948 0.108
#> 5 0.004 0.0924 0.110
#> 6 0.005 0.0900 0.112
#> 7 0.006 0.0876 0.114
#> 8 0.007 0.0851 0.116
#> 9 0.008 0.0827 0.119
#> 10 0.009 0.0802 0.121
#> # ℹ 91 more rows
# Increase the accuracy of the maximization approximation
sens(df, group, frisked, p, "a", 0.1, m = 1001)
#> # A tibble: 1,001 × 3
#> epsilon beta_min_b beta_max_b
#> <dbl> <dbl> <dbl>
#> 1 0 0.102 0.102
#> 2 0.0001 0.102 0.102
#> 3 0.0002 0.102 0.102
#> 4 0.0003 0.102 0.102
#> 5 0.0004 0.102 0.102
#> 6 0.0005 0.102 0.102
#> 7 0.0006 0.102 0.102
#> 8 0.0007 0.102 0.102
#> 9 0.0008 0.102 0.102
#> 10 0.0009 0.102 0.102
#> # ℹ 991 more rows
# \donttest{
# Calculate 90% pointwise confidence intervals
sens(df, group, frisked, p, "a", 0.1, N = 1000, alpha = 0.1)
#> Resamples ■■■■■■ 17% | ETA: 13s
#> Resamples ■■■■■■■■■■■■ 36% | ETA: 10s
#> Resamples ■■■■■■■■■■■■■■■■■■ 55% | ETA: 7s
#> Resamples ■■■■■■■■■■■■■■■■■■■■■■■■ 75% | ETA: 4s
#> Resamples ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 95% | ETA: 1s
#> # A tibble: 101 × 7
#> epsilon beta_min_b_05.0 beta_min_b_95.0 beta_max_b_05.0 beta_max_b_95.0
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 0.0767 0.126 0.0767 0.126
#> 2 0.001 0.0762 0.126 0.0768 0.126
#> 3 0.002 0.0758 0.126 0.0770 0.126
#> 4 0.003 0.0754 0.126 0.0772 0.126
#> 5 0.004 0.0750 0.126 0.0774 0.126
#> 6 0.005 0.0746 0.126 0.0776 0.127
#> 7 0.006 0.0742 0.126 0.0778 0.127
#> 8 0.007 0.0738 0.126 0.0779 0.127
#> 9 0.008 0.0735 0.126 0.0781 0.127
#> 10 0.009 0.0731 0.126 0.0783 0.127
#> # ℹ 91 more rows
#> # ℹ 2 more variables: beta_min_b <dbl>, beta_max_b <dbl>
# Run in parallel, adjusting the chunk size to avoid blocking
sens(df, group, frisked, p, "a", 0.1, n_threads = 2, eta = 0.0001,
     chunk_size = 1000)
#> # A tibble: 101 × 3
#> epsilon beta_min_b beta_max_b
#> <dbl> <dbl> <dbl>
#> 1 0 0.102 0.102
#> 2 0.001 0.0995 0.104
#> 3 0.002 0.0972 0.106
#> 4 0.003 0.0948 0.108
#> 5 0.004 0.0924 0.111
#> 6 0.005 0.0900 0.113
#> 7 0.006 0.0876 0.115
#> 8 0.007 0.0851 0.116
#> 9 0.008 0.0827 0.119
#> 10 0.009 0.0802 0.121
#> # ℹ 91 more rows
# }