| Title: | Sequential Probability Ratio Tests Toolbox |
|---|---|
| Description: | A toolbox for Sequential Probability Ratio Tests (SPRT) based on Wald (1945) <doi:10.2134/agronj1947.00021962003900070011x>. SPRTs are applied during the sampling process, ideally after each observation, and at every stage return a decision to either continue sampling or terminate and accept one of the specified hypotheses. The `seq_ttest()` function performs one-sample, two-sample, and paired t-tests for one- and two-sided hypotheses (Schnuerch & Erdfelder (2019) <doi:10.1037/met0000234>). The `seq_anova()` function performs a sequential one-way fixed effects ANOVA (Steinhilber et al. (2024) <doi:10.1037/met0000677>). The `plan_sample_size()` function helps plan sequential studies by simulating required sample sizes across a range of effect sizes. For more information, see the vignettes browseVignettes(package = "sprtt") or the package website <https://meikesteinhilber.github.io/sprtt/>. |
| Authors: | Meike Snijder-Steinhilber [aut, cre] (ORCID: <https://orcid.org/0000-0002-7144-2100>), Martin Schnuerch [aut, ths] (ORCID: <https://orcid.org/0000-0001-6531-2265>), Anna-Lena Schubert [aut, ths] (ORCID: <https://orcid.org/0000-0001-7248-0662>) |
| Maintainer: | Meike Snijder-Steinhilber <[email protected]> |
| License: | AGPL (>= 3) |
| Version: | 0.3.1 |
| Built: | 2026-05-20 10:45:00 UTC |
| Source: | https://github.com/meikesteinhilber/sprtt |
Removes locally cached simulation data (~150 MB) used by plan_sample_size().
Data will be automatically re-downloaded on next use of sample size planning functions.
This function is useful when:
you want to free up disk space,
the cached data may be outdated and you want to force a fresh download, or
you are troubleshooting cache-related issues.
cache_clear()
Invisibly returns TRUE if cache was cleared, FALSE if no cache existed.
    ## Not run:
    # Clear cache
    cache_clear()
    ## End(Not run)
Displays information about cached simulation data (~150 MB) used by plan_sample_size().
Shows the cache directory location, whether data is cached, file size, and dataset
version metadata.
The simulation data is automatically downloaded on first use of sample size planning functions and stored locally for faster subsequent access.
cache_info()
Invisibly returns a list with:
cache_dir: Character string with the cache directory path
data_cached: Logical indicating if simulation data is cached
file_size_mb: Numeric file size in MB (or NA if not cached)
data_version: GitHub release tag of the cached dataset (or NA if not cached)
data_created: Date the dataset was created (or NA if not cached)
cache_clear() to remove cached data
download_sample_size_data() to manually download simulation data
plan_sample_size() which uses the cached data
A dataset that includes 120 individuals.
df_cancer
A data frame with 2 variables:
A dataset that includes 120 individuals with their sex and monthly income.
df_income
A data frame with 2 variables:
A dataset that includes 120 individuals.
df_stress
A data frame with 2 variables:
Downloads pre-computed simulation results from GitHub releases. Data is cached locally and only needs to be downloaded once.
Data is hosted at: MeikeSteinhilber/sprtt_plan_sample_size
download_sample_size_data(force = FALSE)
force |
Logical. If TRUE, re-download even if data exists. Default FALSE. |
Invisibly returns the path to the cached data file.
    ## Not run:
    # Download data (only needed once)
    download_sample_size_data()

    # Force re-download (e.g., after data update)
    download_sample_size_data(force = TRUE)
    ## End(Not run)
Draws exemplary samples with a given effect size for the sequential one-way ANOVA or the sequential t-test; see Steinhilber et al. (2023) https://doi.org/10.31234/osf.io/m64ne
draw_sample_mixture(k_groups, f, max_n, counter_n = 100, verbose = FALSE)
k_groups |
number of groups (levels of factor_A) |
f |
Cohen's f. The simulated effect size. |
max_n |
sample size for the groups (total sample size = max_n*k_groups) |
counter_n |
number of times the function tries to find a possible parameter combination for the distribution. Default value is set to 100. |
verbose |
Logical. If TRUE, prints more information to the console. Default is FALSE. |
returns a data.frame with the columns y (observations) and x (factor_A).
    set.seed(333)

    data <- sprtt::draw_sample_mixture(
      k_groups = 2,
      f = 0.40,
      max_n = 2
    )
    data

    data <- sprtt::draw_sample_mixture(
      k_groups = 4,
      f = 1.2, # very large effect size
      max_n = 4,
      counter_n = 1000, # increase of counter is necessary
      verbose = TRUE # prints more information to the console
    )
    data
Draws exemplary samples with a given effect size for the sequential one-way ANOVA or the sequential t-test; see Steinhilber et al. (2023) https://doi.org/10.31234/osf.io/m64ne
draw_sample_normal(k_groups, f, max_n, sd = NULL, sample_ratio = NULL)
k_groups |
number of groups (levels of factor_A) |
f |
Cohen's f. The simulated effect size. |
max_n |
sample size for the groups (total sample size = max_n*k_groups) |
sd |
vector of standard deviations of the groups. Default value is 1 for each group. |
sample_ratio |
vector of sample ratios between the groups. Default value is 1 for each group. |
returns a data.frame with the columns y (observations) and x (factor_A).
    set.seed(333)

    data <- sprtt::draw_sample_normal(
      k_groups = 2,
      f = 0.20,
      max_n = 2
    )
    data

    data <- sprtt::draw_sample_normal(
      k_groups = 4,
      f = 0,
      max_n = 2,
      sd = c(1, 2, 1, 8)
    )
    data

    data <- sprtt::draw_sample_normal(
      k_groups = 3,
      f = 0.40,
      max_n = 2,
      sd = c(1, 0.8, 1),
      sample_ratio = c(1, 2, 3)
    )
    data
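The f argument above is Cohen's f. As a rough orientation, the sketch below computes the population value of Cohen's f from a set of group means under the textbook definition for equal group sizes (standard deviation of the group means divided by the common within-group standard deviation). The helper cohens_f() is hypothetical and not part of sprtt; how draw_sample_normal() derives its group means internally is not described here, so this only illustrates what a given f implies.

```r
# Population Cohen's f for equal group sizes:
# SD of the group means divided by the common within-group SD.
# `cohens_f()` is a hypothetical helper for illustration, not an sprtt function.
cohens_f <- function(group_means, sd_within = 1) {
  stopifnot(length(group_means) >= 2, sd_within > 0)
  # Population SD of the means (divide by k, not k - 1)
  sigma_m <- sqrt(mean((group_means - mean(group_means))^2))
  sigma_m / sd_within
}

# Two groups whose means differ by 0.8 SD (Cohen's d = 0.8)
f_two_groups <- cohens_f(c(0, 0.8))

# Identical means correspond to the null hypothesis
f_null <- cohens_f(c(0, 0, 0))

# Three groups with a larger within-group SD
f_three_groups <- cohens_f(c(-0.5, 0, 0.5), sd_within = 2)

print(c(f_two_groups, f_null, f_three_groups))
```

For two equally sized groups this definition gives f = d / 2, which is why a mean difference of 0.8 SD maps to f = 0.4 in the first call.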
Loads pre-computed simulation results for SPRT sample size planning. If not already cached locally, the data (~150 MB) will be downloaded automatically from GitHub releases. Use this function to access the complete dataset for custom analysis and visualization. See the Data Structure section below for details on available columns.
Data is hosted at: MeikeSteinhilber/sprtt_plan_sample_size
load_sample_size_data()
A named list with the following elements:
description: Short description of the dataset
version: GitHub release tag of the dataset (e.g., "v0.1.0-data")
created: Date the dataset was created (as character string)
n_rep: Number of simulation iterations per condition
data: A data frame with simulation results (see Data Structure)
The data element contains simulation results with the following columns:
Simulation Metadata:
batch: Batch identifier for the simulation run
iteration: Individual simulation iteration within a batch
source_file: Path to the file containing simulation parameters or results
Input Parameters:
f_simulated: The true effect size used to generate the simulated data
f_expected: The expected effect size specified for the SPRT
k_groups: Number of groups in the design
alpha: Significance level (Type I error rate)
power: Desired statistical power (1 - Type II error rate)
distribution: Data distribution used for simulation
sd: Standard deviation(s) used in data generation in each group
sample_ratio: Ratio of sample sizes between groups (e.g., 1:1, 2:1)
n_raw_data: Total number of raw observations generated in each group
fix_n: Fixed sample size
Individual Test Results:
n: Actual sample size at which the SPRT terminated
decision: Test decision
decision_error: Whether the decision was erroneous (Type I or Type II error)
log_lr: Log-likelihood ratio at termination
f: Calculated effect size from the data
f_adj: Adjusted effect size
f_statistic: F-statistic from ANOVA test
Summary Statistics (Aggregated across iterations):
decision_error_rate: Proportion of incorrect decisions
mean_n: Mean sample size across all iterations
sd_error_n: Standard error of the mean sample size (sd(n)/sqrt(n))
median_n: Median sample size (50th percentile)
min_n, max_n: Minimum and maximum sample sizes observed
q25_n, q50_n, q75_n, q90_n, q95_n: Sample size quantiles
decision_rate_25, decision_rate_50, decision_rate_75,
decision_rate_90, decision_rate_95, decision_rate_100:
Cumulative decision rates at various percentages of maximum sample size
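To make the aggregated columns more concrete, the following sketch computes a few of them from per-iteration results in base R. The data frame sim is made-up illustration data, summarise_condition() is a hypothetical helper, and the assumption that the standard error divides by the square root of the number of iterations is mine; none of this is taken from the package's simulation code.

```r
set.seed(1)

# Made-up per-iteration results for a single simulation condition
sim <- data.frame(
  n = sample(20:120, size = 500, replace = TRUE),
  decision_error = rbinom(500, size = 1, prob = 0.04) == 1
)

# `summarise_condition()` is a hypothetical helper, not an sprtt function
summarise_condition <- function(sim) {
  n_iter <- nrow(sim)
  list(
    decision_error_rate = mean(sim$decision_error),
    mean_n = mean(sim$n),
    # Assumed definition: SD of the stopping sample sizes divided by
    # the square root of the number of iterations
    sd_error_n = sd(sim$n) / sqrt(n_iter),
    median_n = median(sim$n),
    min_n = min(sim$n),
    max_n = max(sim$n),
    quantiles_n = quantile(sim$n, probs = c(0.25, 0.50, 0.75, 0.90, 0.95))
  )
}

summary_stats <- summarise_condition(sim)
str(summary_stats)
```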
    ## Not run:
    # Load data (downloads automatically if needed)
    loaded <- load_sample_size_data()

    # Access the simulation data frame
    head(loaded$data)

    # Check dataset version
    loaded$version # e.g. "v0.1.0-data"
    loaded$created
    ## End(Not run)
Renders a parameterized R Markdown report that helps plan the sample size for a sequential ANOVA.
The function takes the expected effect size (f_expected), the number of groups (k_groups),
the beta error rate (beta), and the decision rate (decision_rate), then generates a reproducible HTML report summarizing the simulation-based
sample size recommendations. The alpha level is always 0.05.
The template is located under:
inst/rmarkdown/templates/report_sample_size/skeleton/skeleton.Rmd.
plan_sample_size( f_expected, k_groups, beta = 0.05, decision_rate = 0.85, output_dir = tempdir(), output_file = "sprtt-report-sample-size-planning.html", open = interactive(), overwrite = FALSE )
f_expected |
Numeric scalar. The expected standardized effect size (e.g., Cohen's f). Must be between 0.1 and 0.4 (increments of 0.05). |
k_groups |
Integer scalar. The number of groups to compare. Must be between 2 and 4. |
beta |
Numeric scalar (default = 0.05). Desired beta error rate (Type II error). Possible values are 0.20, 0.10, and 0.05. |
decision_rate |
Numeric scalar (default = 0.85). Desired chance to reach a decision. Must be between 0.75 and 0.95 (increments of 0.05). |
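output_dir |
Character string. Directory in which to save the rendered HTML report.
Defaults to a temporary directory (tempdir()). |
output_file |
Character string. File name of the generated HTML report.
Defaults to "sprtt-report-sample-size-planning.html". |
open |
Logical (default = interactive()). If TRUE, the report is opened in the default browser after rendering. |
overwrite |
Logical (default = FALSE). If TRUE, an existing output file is replaced silently. |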
This function is a front-end utility for rendering a pre-defined R Markdown report using
rmarkdown::render().
Invisibly returns the path to the rendered HTML file (character string). The report is optionally opened in the default browser.
If the specified output file already exists:
With overwrite = FALSE, the user is asked whether to overwrite it (in interactive sessions);
otherwise, an error is thrown.
With overwrite = TRUE, the file is replaced silently.
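The overwrite rules above can be sketched in base R as follows. check_output_file() is a hypothetical helper written for illustration, not the package's implementation; in particular, raising an error when the user declines the prompt is my assumption.

```r
# `check_output_file()` is a hypothetical helper, not an sprtt function.
# It returns the path invisibly when writing is allowed and stops otherwise.
check_output_file <- function(path, overwrite = FALSE) {
  if (!file.exists(path) || overwrite) {
    # Nothing to protect, or the caller asked for a silent replacement
    return(invisible(path))
  }
  if (interactive()) {
    answer <- utils::askYesNo(sprintf("'%s' already exists. Overwrite?", path))
    if (isTRUE(answer)) {
      return(invisible(path))
    }
  }
  # Non-interactive session, or the user declined (assumed behavior)
  stop("Output file already exists: ", path, call. = FALSE)
}

# A file that already exists
existing <- tempfile(fileext = ".html")
file.create(existing)

# Allowed: overwrite was requested explicitly
check_output_file(existing, overwrite = TRUE)

# Allowed: the target does not exist yet
new_path <- tempfile(fileext = ".html")
check_output_file(new_path)
```

Calling check_output_file(existing) without overwrite = TRUE prompts in an interactive session and signals an error in a script.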
    ## Not run:
    # Generate and open an SPRT sample size planning report:
    plan_sample_size(
      f_expected = 0.25,
      k_groups = 3,
      decision_rate = 0.9
    )

    # Prevent overwriting an existing file:
    plan_sample_size(0.25, 3, overwrite = FALSE)
    ## End(Not run)
Creates a visualization of the sequential probability ratio test (SPRT) for ANOVA results, showing the log-likelihood ratio trajectory across sample sizes and decision boundaries.
plot_anova( anova_results, labels = TRUE, position_labels_x = 0.15, position_labels_y = 0.1, position_lr_x = NULL, position_lr_y = NULL, font_size = 15, line_size = 1, highlight_color = "#CD2626" )
anova_results |
A |
labels |
Logical. If |
position_labels_x |
Numeric value between 0 and 1 controlling the
horizontal position of decision labels as a proportion of maximum sample
size. Default is 0.15. |
position_labels_y |
Numeric value controlling the vertical spacing
between decision boundaries and their labels. The value is multiplied by
|
position_lr_x |
Optional numeric value for the x-coordinate (sample size)
of the likelihood ratio label. If |
position_lr_y |
Optional numeric value for the y-coordinate
(log-likelihood ratio) of the likelihood ratio label. If |
font_size |
Numeric. Base font size for plot text. Default is 15. |
line_size |
Numeric. Line width for the trajectory and boundaries.
Default is 1. |
highlight_color |
Character string. Color for highlighting the decision
point or final sample. Default is "#CD2626". |
A ggplot2::ggplot() object showing:
Log-likelihood ratio trajectory across sample sizes
Dashed horizontal lines indicating decision boundaries
Highlighted point showing where decision was reached (or final sample)
Optional labels for decision regions and likelihood ratio value
    # simulate data for the example ------------------------------------------------
    set.seed(3)
    data <- sprtt::draw_sample_normal(3, f = 0.25, max_n = 50)

    # calculate the SPRT -----------------------------------------------------------
    anova_results <- sprtt::seq_anova(y~x, f = 0.25, data = data, plot = TRUE)

    # plot the results -------------------------------------------------------------
    # default settings
    sprtt::plot_anova(anova_results)

    # variant 1
    sprtt::plot_anova(anova_results,
      labels = TRUE,
      position_labels_x = 0.05,
      position_lr_x = 150,
      position_lr_y = 0,
      highlight_color = "green"
    )

    # variant 2
    sprtt::plot_anova(anova_results,
      labels = TRUE,
      position_labels_x = 0.15,
      position_labels_y = 0.2,
      position_lr_x = 60,
      position_lr_y = 1,
      font_size = 25,
      line_size = 2,
      highlight_color = "darkred"
    )

    # no labels
    sprtt::plot_anova(anova_results,
      labels = FALSE
    )

    # custom additions
    sprtt::plot_anova(anova_results) +
      ggplot2::geom_vline(xintercept = 66, linewidth = 1, linetype = "dashed")

    # further information ----------------------------------------------------------
    # run this code: vignette("one_way_anova", package = "sprtt")
Performs a sequential one-way fixed effects ANOVA, which is a variant of a Sequential Probability Ratio Test (SPRT). The test allows for continuous monitoring of data collection and provides stopping boundaries based on likelihood ratios, offering efficiency gains over traditional fixed-N designs.
The sequential ANOVA continuously evaluates the likelihood ratio after each observation (or group of observations), stopping when sufficient evidence accumulates for either H0 or H1.
For methodological details, see Steinhilber et al. (2024)
https://doi.org/10.1037/met0000677. For practical guidance, see
vignette("one_way_anova", package = "sprtt").
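As a rough illustration of the stopping rule described above, the sketch below computes Wald's approximate SPRT thresholds from alpha and power and applies them to a log-likelihood ratio. The helpers sprt_boundaries() and sprt_decision() are hypothetical and not part of sprtt; the exact comparison the package uses at the boundaries (for example, strict versus non-strict inequality) is an assumption here.

```r
# Wald's approximate SPRT thresholds on the log scale.
# `sprt_boundaries()` and `sprt_decision()` are hypothetical helpers,
# not sprtt functions.
sprt_boundaries <- function(alpha = 0.05, power = 0.95) {
  stopifnot(alpha > 0, alpha < 1, power > 0, power < 1)
  beta <- 1 - power
  c(
    lower = log(beta / (1 - alpha)),  # accept H0 at or below this value
    upper = log((1 - beta) / alpha)   # accept H1 at or above this value
  )
}

sprt_decision <- function(log_lr, alpha = 0.05, power = 0.95) {
  b <- sprt_boundaries(alpha, power)
  if (log_lr >= b[["upper"]]) {
    "accept H1"
  } else if (log_lr <= b[["lower"]]) {
    "accept H0"
  } else {
    "continue sampling"
  }
}

# Boundaries for the default error rates
bounds <- sprt_boundaries(alpha = 0.05, power = 0.95)
print(round(bounds, 3))

# A log-likelihood ratio between the boundaries keeps the study running
print(sprt_decision(1.2))

# Lower power narrows the interval between the boundaries
print(round(sprt_boundaries(alpha = 0.05, power = 0.80), 3))
```

With alpha = 0.05 and power = 0.95 the thresholds are symmetric around zero on the log scale, because both error rates are equal.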
seq_anova( formula, f, alpha = 0.05, power = 0.95, data, verbose = TRUE, plot = FALSE, seq_steps = "single" )
formula |
A formula specifying the model (e.g., y ~ x). |
f |
Cohen's f (expected minimal effect size or effect size of interest), that defines the H1. |
alpha |
Type I error rate (alpha level). The probability of rejecting H0 when it is true. Default is 0.05. Must be between 0 and 1. |
power |
Statistical power (1 - beta), where beta is the Type II error rate. The probability of correctly rejecting H0 when H1 is true with effect size f. Default is 0.95. Must be between 0 and 1. Higher values lead to wider stopping boundaries and potentially larger sample sizes. |
data |
A data frame containing the variables specified in the formula. Missing values (NA) are removed with a warning. |
verbose |
Logical. Whether to print verbose output. Default is TRUE. |
plot |
Logical. If |
seq_steps |
Specifies when to calculate test statistics during sequential
testing (only relevant when
For unbalanced designs or non-standard sequences, specify custom steps as a vector. |
An object of the S4 class seq_anova_results. See the class
documentation for the full description of the slots.
To access the slots, use the
@ operator or [] brackets instead of $.
See the examples below.
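For readers less familiar with S4, the following base R sketch shows why @ and [] work on such an object while $ does not. The class demo_results and its slots are hypothetical stand-ins defined only for this illustration; they are not the package's seq_anova_results class.

```r
library(methods)

# `demo_results` is a hypothetical class for illustration only
setClass(
  "demo_results",
  slots = c(decision = "character", likelihood_ratio = "numeric")
)

# A `[` method that returns a slot by name, mirroring the bracket
# access described above
setMethod(
  "[", "demo_results",
  function(x, i, j, ..., drop = TRUE) slot(x, i)
)

res <- new("demo_results", decision = "accept H1", likelihood_ratio = 21.4)

# Slot access with the @ operator
print(res@decision)

# Slot access with brackets
print(res["likelihood_ratio"])

# `$` is not defined for this S4 class, so it signals an error
dollar_fails <- inherits(try(res$decision, silent = TRUE), "try-error")
print(dollar_fails)
```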
Only one-way fixed effects ANOVA is currently supported
Repeated measures ANOVA is not yet implemented
plot_anova() for visualizing sequential ANOVA results
plan_sample_size() for sample size planning
seq_ttest() for sequential t-tests
vignette("one_way_anova", package = "sprtt") for detailed tutorial
vignette("plan_sample_size", package = "sprtt") for planning guidance
Steinhilber et al. (2023) for theoretical background
    # simulate data ----------------------------------------------------------------
    set.seed(333)
    data <- sprtt::draw_sample_normal(k_groups = 3,
                                      f = 0.25,
                                      sd = c(1, 1, 1),
                                      max_n = 50)

    # calculate sequential ANOVA ---------------------------------------------------
    results <- sprtt::seq_anova(y ~ x, f = 0.25, data = data)
    # test decision
    results@decision
    # test results
    results

    # calculate sequential ANOVA ---------------------------------------------------
    results <- sprtt::seq_anova(y ~ x,
                                f = 0.25,
                                data = data,
                                alpha = 0.01,
                                power = .80,
                                verbose = TRUE)
    results

    # calculate sequential ANOVA ---------------------------------------------------
    results <- sprtt::seq_anova(y ~ x,
                                f = 0.15,
                                data = data,
                                alpha = 0.05,
                                power = .80,
                                verbose = FALSE)
    results
Performs one-sample, two-sample, and paired sequential t-tests, which are variants of Sequential Probability Ratio Tests (SPRT). The test allows for continuous monitoring of data collection and provides stopping boundaries based on likelihood ratios, offering efficiency gains over traditional fixed-N designs.
The sequential t-test continuously evaluates the likelihood ratio after each observation (or pair of observations), stopping when sufficient evidence accumulates for either H0 or H1.
For methodological details, see Schnuerch & Erdfelder (2019)
https://doi.org/10.1037/met0000234. For practical guidance, see
vignette("t_test", package = "sprtt").
seq_ttest( x, y = NULL, data = NULL, mu = 0, d, alpha = 0.05, power = 0.95, alternative = "two.sided", paired = FALSE, na.rm = TRUE, verbose = TRUE )
x |
Works with two classes: numeric and formula (see the examples below). |
y |
An optional (non-empty) numeric vector of data values.
Only used for two-sample tests when |
data |
An optional data frame containing the variables specified in the formula.
Only used when |
mu |
a number indicating the true value of the mean (or difference in means if you are performing a two sample test). |
d |
a number indicating the specified (expected) effect size (Cohen's d) |
alpha |
Type I error rate (alpha level). The probability of rejecting H0 when it is true. Default is 0.05. Must be between 0 and 1. |
power |
Statistical power (1 - beta), where beta is the Type II error rate. The probability of correctly rejecting H0 when H1 is true with effect size d. Default is 0.95. Must be between 0 and 1. Higher values lead to wider stopping boundaries and potentially larger sample sizes. |
alternative |
a character string specifying the alternative hypothesis,
must be one of |
paired |
Logical indicating whether to perform a paired t-test.
Default is FALSE. |
na.rm |
Logical. Whether missing values (NA) should be removed. Default is TRUE. |
verbose |
Logical. Whether to print verbose output. Default is TRUE. |
An object of the S4 class seq_ttest_results. See the class
documentation for the full description of the slots.
To access the slots, use the
@ operator or [] brackets instead of $.
See the examples below.
seq_anova() for sequential one-way ANOVA
draw_sample_normal() for simulating test data
vignette("t_test", package = "sprtt") for detailed tutorial
vignette("usage_sprtt", package = "sprtt") for package overview
Schnuerch & Erdfelder (2019) https://doi.org/10.1037/met0000234 for theoretical background
    # set seed --------------------------------------------------------------------
    set.seed(333)

    # load library ----------------------------------------------------------------
    library(sprtt)

    # one sample: numeric input ---------------------------------------------------
    treatment_group <- rnorm(20, mean = 0, sd = 1)
    results <- seq_ttest(treatment_group, mu = 1, d = 0.8)

    # get access to the slots -----------------------------------------------------
    # @ Operator
    results@likelihood_ratio
    # [] Operator
    results["likelihood_ratio"]

    # two sample: numeric input----------------------------------------------------
    treatment_group <- stats::rnorm(20, mean = 0, sd = 1)
    control_group <- stats::rnorm(20, mean = 1, sd = 1)
    seq_ttest(treatment_group, control_group, d = 0.8)

    # two sample: formula input ---------------------------------------------------
    stress_level <- stats::rnorm(20, mean = 0, sd = 1)
    sex <- as.factor(c(rep(1, 10), rep(2, 10)))
    seq_ttest(stress_level ~ sex, d = 0.8)

    # NA in the data --------------------------------------------------------------
    stress_level <- c(NA, stats::rnorm(20, mean = 0, sd = 2), NA)
    sex <- as.factor(c(rep(1, 11), rep(2, 11)))
    seq_ttest(stress_level ~ sex, d = 0.8, na.rm = TRUE)

    # work with dataset (data are in the package included) ------------------------
    seq_ttest(monthly_income ~ sex, data = df_income, d = 0.8)