Package 'sprtt'

Title: Sequential Probability Ratio Tests Toolbox
Description: A toolbox for Sequential Probability Ratio Tests (SPRT) based on Wald (1945) <doi:10.2134/agronj1947.00021962003900070011x>. SPRTs are applied during the sampling process, ideally after each observation, and at every stage return a decision to either continue sampling or terminate and accept one of the specified hypotheses. The `seq_ttest()` function performs one-sample, two-sample, and paired t-tests for one- and two-sided hypotheses (Schnuerch & Erdfelder (2019) <doi:10.1037/met0000234>). The `seq_anova()` function performs a sequential one-way fixed effects ANOVA (Steinhilber et al. (2024) <doi:10.1037/met0000677>). The `plan_sample_size()` function helps plan sequential studies by simulating required sample sizes across a range of effect sizes. For more information, see the vignettes browseVignettes(package = "sprtt") or the package website <https://meikesteinhilber.github.io/sprtt/>.
Authors: Meike Snijder-Steinhilber [aut, cre] (ORCID: <https://orcid.org/0000-0002-7144-2100>), Martin Schnuerch [aut, ths] (ORCID: <https://orcid.org/0000-0001-6531-2265>), Anna-Lena Schubert [aut, ths] (ORCID: <https://orcid.org/0000-0001-7248-0662>)
Maintainer: Meike Snijder-Steinhilber <[email protected]>
License: AGPL (>= 3)
Version: 0.3.1
Built: 2026-05-20 10:45:00 UTC
Source: https://github.com/meikesteinhilber/sprtt

Help Index


Clear cached simulation data

Description

[Experimental]

Removes locally cached simulation data (~150 MB) used by plan_sample_size(). The data will be re-downloaded automatically the next time a sample size planning function is used.

This function is useful when:

  • You want to free up disk space

  • The cached data may be outdated and you want to force a fresh download

  • You are troubleshooting cache-related issues

Usage

cache_clear()

Value

Invisibly returns TRUE if the cache was cleared, FALSE if no cache existed.

Examples

## Not run: 
# Clear cache
cache_clear()

## End(Not run)

Cache information

Description

[Experimental]

Displays information about cached simulation data (~150 MB) used by plan_sample_size(). Shows the cache directory location, whether data is cached, file size, and dataset version metadata.

The simulation data is automatically downloaded on first use of sample size planning functions and stored locally for faster subsequent access.

Usage

cache_info()

Value

Invisibly returns a list with:

  • cache_dir: Character string with the cache directory path

  • data_cached: Logical indicating if simulation data is cached

  • file_size_mb: Numeric file size in MB (or NA if not cached)

  • data_version: GitHub release tag of the cached dataset (or NA if not cached)

  • data_created: Date the dataset was created (or NA if not cached)


Test data to run the examples

Description

A dataset that includes 120 individuals.

Usage

df_cancer

Format

A data frame with 2 variables:

treatment_group
control_group

Test data to run the examples

Description

A dataset that includes the sex and monthly income of 120 individuals.

Usage

df_income

Format

A data frame with 2 variables:

monthly_income
sex

Test data to run the examples

Description

A dataset that includes 120 individuals.

Usage

df_stress

Format

A data frame with 2 variables:

baseline_stress
one_year_stress

Download simulation data for sample size planning

Description

[Experimental]

Downloads pre-computed simulation results from GitHub releases. Data is cached locally and only needs to be downloaded once.

Data is hosted at: MeikeSteinhilber/sprtt_plan_sample_size

Usage

download_sample_size_data(force = FALSE)

Arguments

force

Logical. If TRUE, re-download even if data exists. Default FALSE.

Value

Invisibly returns the path to the cached data file.

Examples

## Not run: 
# Download data (only needed once)
download_sample_size_data()

# Force re-download (e.g., after data update)
download_sample_size_data(force = TRUE)

## End(Not run)

Draw Samples from a Gaussian Mixture Distribution

Description

Draws example samples with a specified effect size for the sequential one-way ANOVA or the sequential t-test; see Steinhilber et al. (2023) https://doi.org/10.31234/osf.io/m64ne

Usage

draw_sample_mixture(k_groups, f, max_n, counter_n = 100, verbose = FALSE)

Arguments

k_groups

number of groups (levels of factor_A)

f

Cohen's f. The simulated effect size.

max_n

sample size for the groups (total sample size = max_n*k_groups)

counter_n

number of times the function tries to find a possible parameter combination for the distribution. Default value is set to 100.

verbose

TRUE or FALSE. If TRUE, prints more information about the internal process of sampling the parameters (the value the internal counter reached, some additional hints, and the drawn parameters of the Gaussian mixture distributions).

Value

returns a data.frame with the columns y (observations) and x (factor_A).
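To illustrate what sampling from a Gaussian mixture distribution means, the base-R sketch below draws from a two-component mixture. The function name rmixture() and all weights, means, and standard deviations are hypothetical; draw_sample_mixture() chooses its own parameters internally to match the requested f, which this sketch does not attempt.

```r
# Hypothetical sketch: sample from a two-component Gaussian mixture.
# 'weight' is the probability of drawing from the second component.
rmixture <- function(n, weight = 0.5, means = c(-1, 1), sds = c(1, 1)) {
  # pick a component (1 or 2) for every observation
  component <- stats::rbinom(n, size = 1, prob = weight) + 1
  # draw each observation from the normal distribution of its component
  stats::rnorm(n, mean = means[component], sd = sds[component])
}

set.seed(333)
y <- rmixture(1000, weight = 0.3, means = c(0, 3), sds = c(1, 0.5))
mean(y)  # close to the mixture mean 0.7 * 0 + 0.3 * 3 = 0.9
```

Unlike a single normal distribution, such a mixture can be skewed or bimodal, which is why it is useful for checking how robust the sequential tests are.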

Examples

set.seed(333)

data <- sprtt::draw_sample_mixture(
  k_groups = 2,
  f = 0.40,
  max_n = 2
)
data

data <- sprtt::draw_sample_mixture(
  k_groups = 4,
  f = 1.2, # very large effect size
  max_n = 4,
  counter_n = 1000, # increase of counter is necessary
  verbose = TRUE # prints more information to the console
)
data

Draw Samples from a Normal Distribution

Description

Draws example samples with a specified effect size for the sequential one-way ANOVA or the sequential t-test; see Steinhilber et al. (2023) https://doi.org/10.31234/osf.io/m64ne

Usage

draw_sample_normal(k_groups, f, max_n, sd = NULL, sample_ratio = NULL)

Arguments

k_groups

number of groups (levels of factor_A)

f

Cohen's f. The simulated effect size.

max_n

sample size for the groups (total sample size = max_n*k_groups)

sd

vector of standard deviations of the groups. Default value is 1 for each group.

sample_ratio

vector of sample ratios between the groups. Default value is 1 for each group.

Value

returns a data.frame with the columns y (observations) and x (factor_A).
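The f argument is Cohen's f. For k equally sized groups with a common within-group standard deviation sigma, f = sigma_m / sigma, where sigma_m is the standard deviation of the group means around the grand mean. The sketch below computes f and builds one set of equally spaced group means with a target f. The helper names cohens_f() and means_for_f() are hypothetical, and equal spacing is only one of many valid configurations; it is not necessarily what draw_sample_normal() does, and it ignores the sd and sample_ratio arguments.

```r
# Cohen's f for k equally sized groups with a common within-group SD:
# f = sigma_m / sigma, with sigma_m the SD of the group means around the
# grand mean (population form, i.e. dividing by k rather than k - 1).
cohens_f <- function(group_means, sd_within = 1) {
  grand_mean <- mean(group_means)
  sigma_m <- sqrt(mean((group_means - grand_mean)^2))
  sigma_m / sd_within
}

# Hypothetical helper: equally spaced group means, centred at 0,
# scaled so that their Cohen's f equals the target value.
means_for_f <- function(k_groups, f, sd_within = 1) {
  spacing <- seq_len(k_groups) - mean(seq_len(k_groups))
  spacing * f * sd_within / sqrt(mean(spacing^2))
}

mu <- means_for_f(k_groups = 3, f = 0.40)
round(mu, 3)   # -0.49  0.00  0.49
cohens_f(mu)   # 0.4
```

For two equally sized groups, f = d / 2, so means_for_f(2, 0.25) gives means of -0.25 and 0.25, a difference of 0.5 standard deviations.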

Examples

set.seed(333)

data <- sprtt::draw_sample_normal(
  k_groups = 2,
  f = 0.20,
  max_n = 2
)
data

data <- sprtt::draw_sample_normal(
  k_groups = 4,
  f = 0,
  max_n = 2,
  sd = c(1, 2, 1, 8)
)
data

data <- sprtt::draw_sample_normal(
  k_groups = 3,
  f = 0.40,
  max_n = 2,
  sd = c(1, 0.8, 1),
  sample_ratio = c(1, 2, 3)
)
data

Access sample size simulation data

Description

[Experimental]

Loads pre-computed simulation results for SPRT sample size planning. If not already cached locally, the data (~150 MB) will be downloaded automatically from GitHub releases. Use this function to access the complete dataset for custom analysis and visualization. See the Data Structure section below for details on available columns.

Data is hosted at: MeikeSteinhilber/sprtt_plan_sample_size

Usage

load_sample_size_data()

Value

A named list with the following elements:

  • description: Short description of the dataset

  • version: GitHub release tag of the dataset (e.g., "v0.1.0-data")

  • created: Date the dataset was created (as character string)

  • n_rep: Number of simulation iterations per condition

  • data: A data frame with simulation results (see Data Structure)

Data Structure

The data element contains simulation results with the following columns:

Simulation Metadata:

  • batch: Batch identifier for the simulation run

  • iteration: Individual simulation iteration within a batch

  • source_file: Path to the file containing simulation parameters or results

Input Parameters:

  • f_simulated: The true effect size used to generate the simulated data

  • f_expected: The expected effect size specified for the SPRT

  • k_groups: Number of groups in the design

  • alpha: Significance level (Type I error rate)

  • power: Desired statistical power (1 - Type II error rate)

  • distribution: Data distribution used for simulation

  • sd: Standard deviation(s) used in data generation in each group

  • sample_ratio: Ratio of sample sizes between groups (e.g., 1:1, 2:1)

  • n_raw_data: Total number of raw observations generated in each group

  • fix_n: Fixed sample size

Individual Test Results:

  • n: Actual sample size at which the SPRT terminated

  • decision: Test decision

  • decision_error: Whether the decision was erroneous (Type I or Type II error)

  • log_lr: Log-likelihood ratio at termination

  • f: Calculated effect size from the data

  • f_adj: Adjusted effect size

  • f_statistic: F-statistic from ANOVA test

Summary Statistics (Aggregated across iterations):

  • decision_error_rate: Proportion of incorrect decisions

  • mean_n: Mean sample size across all iterations

  • sd_error_n: Standard error of the mean sample size, sd(n)/sqrt(number of iterations)

  • median_n: Median sample size (50th percentile)

  • min_n, max_n: Minimum and maximum sample sizes observed

  • q25_n, q50_n, q75_n, q90_n, q95_n: Sample size quantiles

  • decision_rate_25, decision_rate_50, decision_rate_75, decision_rate_90, decision_rate_95, decision_rate_100: Cumulative decision rates at various percentages of maximum sample size
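The per-condition summary columns can be understood as ordinary statistics over the termination sample sizes n of the iterations. The sketch below computes them for a small hypothetical vector; the function summarise_n() and the input values are illustrative only, and the dataset may use a different quantile definition than R's default (type 7). The decision_rate_* columns are not reproduced here.

```r
# Hypothetical sketch: summarise SPRT termination sample sizes in the
# style of the summary columns (mean_n, sd_error_n, median_n, quantiles).
summarise_n <- function(n_terminated) {
  n_iter <- length(n_terminated)  # number of simulation iterations
  q <- stats::quantile(n_terminated,
                       probs = c(0.25, 0.50, 0.75, 0.90, 0.95),
                       names = FALSE)
  data.frame(
    mean_n     = mean(n_terminated),
    sd_error_n = stats::sd(n_terminated) / sqrt(n_iter),  # SE over iterations
    median_n   = stats::median(n_terminated),
    min_n      = min(n_terminated),
    max_n      = max(n_terminated),
    q25_n = q[1], q50_n = q[2], q75_n = q[3], q90_n = q[4], q95_n = q[5]
  )
}

summarise_n(c(30, 42, 51, 60, 75, 90, 120, 150))  # mean_n = 77.25, median_n = 67.5
```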

Examples

## Not run: 
# Load data (downloads automatically if needed)
loaded <- load_sample_size_data()

# Access the simulation data frame
head(loaded$data)

# Check dataset version
loaded$version  # e.g. "v0.1.0-data"
loaded$created

## End(Not run)

Generates HTML reports for sample size planning for sequential ANOVAs.

Description

[Experimental]

Renders a parameterized R Markdown report that helps plan the sample size for the sequential ANOVA. The function takes the expected effect size (f_expected), the number of groups (k_groups), the Type II error rate (beta; power = 1 - beta), and the desired decision rate, then generates a reproducible HTML report summarizing the simulation-based sample size recommendations. The alpha level is always 0.05.

The template is located under: inst/rmarkdown/templates/report_sample_size/skeleton/skeleton.Rmd.

Usage

plan_sample_size(
  f_expected,
  k_groups,
  beta = 0.05,
  decision_rate = 0.85,
  output_dir = tempdir(),
  output_file = "sprtt-report-sample-size-planning.html",
  open = interactive(),
  overwrite = FALSE
)

Arguments

f_expected

Numeric scalar. The expected standardized effect size (e.g., Cohen's f). Must be between 0.1 and 0.4 (increments of 0.05).

k_groups

Integer scalar. The number of groups to compare. Must be between 2 and 4.

beta

Numeric scalar (default = 0.05). Desired beta error rate (Type II error). Possible values are 0.20, 0.10, and 0.05.

decision_rate

Numeric scalar (default = 0.85). Desired chance to reach a decision. Must be between 0.75 and 0.95 (increments of 0.05).

output_dir

Character string. Directory in which to save the rendered HTML report. Defaults to a temporary directory (tempdir()).

output_file

Character string. File name of the generated HTML report. Defaults to "sprtt-report-sample-size-planning.html".

open

Logical (default = interactive()). If TRUE, the generated report is opened in the system's default web browser after rendering.

overwrite

Logical (default = FALSE). If FALSE and the target file already exists, the user is prompted interactively whether to overwrite it. In non-interactive sessions, an error is raised unless overwrite = TRUE.
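Because the report draws on pre-computed simulations, only the parameter grid listed above is available. The sketch below checks a set of inputs against that documented grid before calling plan_sample_size(). The helpers on_grid() and check_plan_inputs() are hypothetical and not part of sprtt; a tolerance is used because grids built with seq() contain floating-point values such as 0.25000000000000006.

```r
# Hypothetical helper (not part of sprtt): is x a single number that
# matches one of the grid values up to a small tolerance?
on_grid <- function(x, grid, tol = 1e-8) {
  length(x) == 1 && is.numeric(x) && any(abs(x - grid) < tol)
}

# Check plan_sample_size() inputs against the documented parameter grid.
check_plan_inputs <- function(f_expected, k_groups, beta = 0.05,
                              decision_rate = 0.85) {
  c(
    f_expected    = on_grid(f_expected, seq(0.10, 0.40, by = 0.05)),
    k_groups      = on_grid(k_groups, 2:4),
    beta          = on_grid(beta, c(0.20, 0.10, 0.05)),
    decision_rate = on_grid(decision_rate, seq(0.75, 0.95, by = 0.05))
  )
}

check_plan_inputs(0.25, 3, decision_rate = 0.9)  # all four TRUE
check_plan_inputs(0.33, 5)                       # f_expected and k_groups FALSE
```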

Details

This function is a front-end utility for rendering a pre-defined R Markdown report using rmarkdown::render().

Value

Invisibly returns the path to the rendered HTML file (character string). The report is optionally opened in the default browser.

File Overwrite Behavior

  • If the specified output file already exists:

    • and overwrite = FALSE, the user is asked whether to overwrite it in interactive sessions; in non-interactive sessions, an error is thrown.

    • and overwrite = TRUE, the file is replaced silently.

Examples

## Not run: 
# Generate and open an SPRT sample size planning report:
plan_sample_size(
  f_expected = 0.25,
  k_groups = 3,
  decision_rate = 0.9
)

# Do not overwrite an existing file silently (asks first; errors in non-interactive sessions):
plan_sample_size(0.25, 3, overwrite = FALSE)

## End(Not run)

Plot Sequential ANOVA Results

Description

[Experimental]

Creates a visualization of the sequential probability ratio test (SPRT) for ANOVA results, showing the log-likelihood ratio trajectory across sample sizes and decision boundaries.

Usage

plot_anova(
  anova_results,
  labels = TRUE,
  position_labels_x = 0.15,
  position_labels_y = 0.1,
  position_lr_x = NULL,
  position_lr_y = NULL,
  font_size = 15,
  line_size = 1,
  highlight_color = "#CD2626"
)

Arguments

anova_results

A seq_anova_results object from seq_anova(). Important: The seq_anova() function must be called with plot = TRUE to generate the necessary data for plotting.

labels

Logical. If TRUE (default), display decision labels ("Accept H0" / "Accept H1") and the likelihood ratio at the decision point.

position_labels_x

Numeric value between 0 and 1 controlling the horizontal position of decision labels as a proportion of maximum sample size. Default is 0.15 (left side); 0.5 centers the labels.

position_labels_y

Numeric value controlling the vertical spacing between decision boundaries and their labels. The value is multiplied by max(|log-likelihood ratio|) to determine the spacing. Larger values move labels further from the boundaries. Default is 0.1.

position_lr_x

Optional numeric value for the x-coordinate (sample size) of the likelihood ratio label. If NULL (default), positioned at the decision point or final sample size.

position_lr_y

Optional numeric value for the y-coordinate (log-likelihood ratio) of the likelihood ratio label. If NULL (default), positioned at y = 0 for early decisions, or slightly offset for continuing sampling scenarios.

font_size

Numeric. Base font size for plot text. Default is 15.

line_size

Numeric. Line width for the trajectory and boundaries. Default is 1.

highlight_color

Character string. Color for highlighting the decision point or final sample. Default is "#CD2626" (red).

Value

A ggplot2::ggplot() object showing:

  • Log-likelihood ratio trajectory across sample sizes

  • Dashed horizontal lines indicating decision boundaries

  • Highlighted point showing where decision was reached (or final sample)

  • Optional labels for decision regions and likelihood ratio value
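The dashed boundaries correspond to Wald's SPRT thresholds: with beta = 1 - power, sampling stops with "accept H1" once the likelihood ratio reaches A = (1 - beta) / alpha, and with "accept H0" once it falls to B = beta / (1 - alpha). The base-R sketch below finds the first step at which a log-likelihood ratio trajectory crosses log(A) or log(B), i.e., the kind of point the plot highlights. The helper first_crossing() is hypothetical and not plot_anova()'s internal code; it assumes the standard Wald boundaries.

```r
# Hypothetical helper (not plot_anova()'s internal code): find the first
# step at which a log-likelihood ratio trajectory crosses Wald's bounds.
first_crossing <- function(log_lr, alpha = 0.05, power = 0.95) {
  beta <- 1 - power
  upper <- log((1 - beta) / alpha)  # log(A): reaching it accepts H1
  lower <- log(beta / (1 - alpha))  # log(B): reaching it accepts H0
  hit <- which(log_lr >= upper | log_lr <= lower)
  if (length(hit) == 0) {
    # no boundary reached: the last step is the "final sample"
    return(list(step = length(log_lr), decision = "continue sampling"))
  }
  i <- hit[1]
  list(step = i,
       decision = if (log_lr[i] >= upper) "accept H1" else "accept H0")
}

first_crossing(c(0.2, 1.1, 2.0, 3.1, 2.5))  # step 4, "accept H1"
first_crossing(c(-0.5, -1.8, -3.0))         # step 3, "accept H0"
first_crossing(c(0.1, 0.3))                 # step 2, "continue sampling"
```

With the defaults alpha = 0.05 and power = 0.95, both boundaries lie at about plus or minus log(19), roughly 2.94.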

Examples

# simulate data for the example ------------------------------------------------
set.seed(3)
data <- sprtt::draw_sample_normal(3, f = 0.25, max_n = 50)

# calculate the SPRT -----------------------------------------------------------
anova_results <- sprtt::seq_anova(y~x, f = 0.25, data = data, plot = TRUE)

# plot the results -------------------------------------------------------------
# default settings
sprtt::plot_anova(anova_results)
# variant 1
sprtt::plot_anova(anova_results,
                  labels = TRUE,
                  position_labels_x = 0.05,
                  position_lr_x = 150,
                  position_lr_y = 0,
                  highlight_color = "green")
# variant 2
sprtt::plot_anova(anova_results,
                  labels = TRUE,
                  position_labels_x = 0.15,
                  position_labels_y = 0.2,
                  position_lr_x = 60,
                  position_lr_y = 1,
                  font_size = 25,
                  line_size = 2,
                  highlight_color = "darkred")
# no labels
sprtt::plot_anova(anova_results,
                  labels = FALSE)
# custom additions
sprtt::plot_anova(anova_results) +
  ggplot2::geom_vline(xintercept = 66, linewidth = 1, linetype = "dashed")

# further information ----------------------------------------------------------
# run this code:
vignette("one_way_anova", package = "sprtt")

Sequential Analysis of Variance

Description

Performs a sequential one-way fixed effects ANOVA, which is a variant of a Sequential Probability Ratio Test (SPRT). The test allows for continuous monitoring of data collection and provides stopping boundaries based on likelihood ratios, offering efficiency gains over traditional fixed-N designs.

The sequential ANOVA continuously evaluates the likelihood ratio after each observation (or group of observations), stopping when sufficient evidence accumulates for either H0 or H1.

For methodological details, see Steinhilber et al. (2024) https://doi.org/10.1037/met0000677. For practical guidance, see vignette("one_way_anova", package = "sprtt").
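To make the likelihood ratio concrete, the sketch below evaluates it for an observed F statistic, assuming the ratio compares the noncentral F density under H1 (noncentrality lambda = f^2 * N, the convention used by common power software) with the central F density under H0. This is an illustrative assumption, not sprtt's internal code; anova_log_lr() is a hypothetical name, and the log scale is used to avoid numerical underflow.

```r
# Minimal sketch, assuming log LR = log density of the observed F under a
# noncentral F (ncp = f^2 * N) minus log density under the central F.
anova_log_lr <- function(F_stat, k_groups, n_total, f) {
  df1 <- k_groups - 1
  df2 <- n_total - k_groups
  lambda <- f^2 * n_total
  stats::df(F_stat, df1, df2, ncp = lambda, log = TRUE) -
    stats::df(F_stat, df1, df2, log = TRUE)
}

# Example with simulated data for three groups of 20 observations
set.seed(333)
x <- factor(rep(c("a", "b", "c"), each = 20))
y <- stats::rnorm(60, mean = c(0, 0.3, 0.6)[as.integer(x)])
F_stat <- stats::anova(stats::lm(y ~ x))[["F value"]][1]
anova_log_lr(F_stat, k_groups = 3, n_total = 60, f = 0.25)
```

Re-evaluating this quantity as observations accumulate, and comparing it with the Wald boundaries, gives the sequential decision.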

Usage

seq_anova(
  formula,
  f,
  alpha = 0.05,
  power = 0.95,
  data,
  verbose = TRUE,
  plot = FALSE,
  seq_steps = "single"
)

Arguments

formula

A formula specifying the model (e.g., outcome ~ group). The response variable should be on the left side and the grouping factor on the right side. Currently only supports one-way designs.

f

Cohen's f (the expected minimal effect size or effect size of interest) that defines H1.

alpha

Type I error rate (alpha level). The probability of rejecting H0 when it is true. Default is 0.05. Must be between 0 and 1.

power

Statistical power (1 - beta), where beta is the Type II error rate. The probability of correctly rejecting H0 when H1 is true with effect size f. Default is 0.95. Must be between 0 and 1. Higher values lead to wider stopping boundaries and potentially larger sample sizes.

data

A data frame containing the variables specified in the formula. Missing values (NA) will be removed with a warning.

verbose

Logical. Whether to print verbose output.

plot

Logical. If TRUE, stores sequential test statistics at each sequential step for visualization with plot_anova(). This enables retrospective examination of the decision process but increases computation time for large datasets. Default is FALSE.

seq_steps

Specifies when to calculate test statistics during sequential testing (only relevant when plot = TRUE). Options:

  • Vector of integers: Calculate at specific sample sizes (e.g., c(10, 20, 30))

  • "single": Calculate after each observation (step size = 1). Note: Starts at k_groups * 2 to ensure sufficient data

  • "balanced": Calculate when each group gains one observation (step size = number of groups). Note: Starts at k_groups * 2

For unbalanced designs or non-standard sequences, specify custom steps as a vector.
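The documented seq_steps options can be read as rules for generating the sample sizes at which the statistics are computed. The sketch below expands them for a given total sample size. make_seq_steps() is a hypothetical helper, not sprtt's code; in particular, whether the package also appends n_total when it is not reached exactly by the "balanced" steps is not documented, and this sketch simply stops at the last full step.

```r
# Hypothetical helper (not sprtt's code): expand the documented
# seq_steps options into the sample sizes at which statistics are computed.
make_seq_steps <- function(seq_steps, k_groups, n_total) {
  if (is.numeric(seq_steps)) {
    return(sort(unique(seq_steps)))  # custom steps, sorted and de-duplicated
  }
  start <- k_groups * 2  # documented starting point for both presets
  if (n_total < start) stop("n_total must be at least k_groups * 2")
  switch(seq_steps,
         single   = seq(start, n_total, by = 1),
         balanced = seq(start, n_total, by = k_groups),
         stop("seq_steps must be \"single\", \"balanced\", or numeric"))
}

make_seq_steps("balanced", k_groups = 3, n_total = 15)  # 6 9 12 15
```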

Value

An object of the S4 class seq_anova_results; see the class documentation for a full description of its slots. Access the slots with the @ operator or [] brackets instead of $, as shown in the examples below.

Limitations

  • Only one-way fixed effects ANOVA is currently supported

  • Repeated measures ANOVA is not yet implemented

See Also

  • plot_anova() for visualizing sequential ANOVA results

  • plan_sample_size() for sample size planning

  • seq_ttest() for sequential t-tests

  • vignette("one_way_anova", package = "sprtt") for detailed tutorial

  • vignette("plan_sample_size", package = "sprtt") for planning guidance

  • Steinhilber et al. (2023) for theoretical background

Examples

# simulate data ----------------------------------------------------------------
set.seed(333)
data <- sprtt::draw_sample_normal(k_groups = 3,
                    f = 0.25,
                    sd = c(1, 1, 1),
                    max_n = 50)


# calculate sequential ANOVA ---------------------------------------------------
results <- sprtt::seq_anova(y ~ x, f = 0.25, data = data)
# test decision
results@decision
# test results
results

# calculate sequential ANOVA ---------------------------------------------------
results <- sprtt::seq_anova(y ~ x,
                            f = 0.25,
                            data = data,
                            alpha = 0.01,
                            power = .80,
                            verbose = TRUE)
results

# calculate sequential ANOVA ---------------------------------------------------
results <- sprtt::seq_anova(y ~ x,
                            f = 0.15,
                            data = data,
                            alpha = 0.05,
                            power = .80,
                            verbose = FALSE)
results

Sequential Probability Ratio Test using t-statistic

Description

Performs one-sample, two-sample, and paired sequential t-tests, which are variants of Sequential Probability Ratio Tests (SPRT). The test allows for continuous monitoring of data collection and provides stopping boundaries based on likelihood ratios, offering efficiency gains over traditional fixed-N designs.

The sequential t-test continuously evaluates the likelihood ratio after each observation (or pair of observations), stopping when sufficient evidence accumulates for either H0 or H1.

For methodological details, see Schnuerch & Erdfelder (2019) https://doi.org/10.1037/met0000234. For practical guidance, see vignette("t_test", package = "sprtt").
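As an illustration of re-evaluating the likelihood ratio after each observation, the sketch below covers only the one-sample test with alternative = "greater". It assumes the likelihood ratio is the noncentral t density of the observed t statistic (noncentrality d * sqrt(n)) divided by the central t density; this is an illustrative assumption rather than sprtt's internal code, and ttest_log_lr() is a hypothetical name. Two-sided and two-sample variants require different densities and are not shown.

```r
# Minimal sketch for a one-sample test with alternative = "greater",
# assuming log LR = log noncentral-t density (ncp = d * sqrt(n)) of the
# observed t statistic minus its log central-t density.
ttest_log_lr <- function(x, d, mu = 0) {
  n <- length(x)
  t_stat <- (mean(x) - mu) / (stats::sd(x) / sqrt(n))
  stats::dt(t_stat, df = n - 1, ncp = d * sqrt(n), log = TRUE) -
    stats::dt(t_stat, df = n - 1, log = TRUE)
}

# Re-evaluate the log LR after each new observation, starting at n = 2
set.seed(333)
obs <- stats::rnorm(30, mean = 0.5)
log_lr_path <- vapply(2:30, function(i) ttest_log_lr(obs[1:i], d = 0.5),
                      numeric(1))
```

Comparing each value of log_lr_path with log((1 - beta) / alpha) and log(beta / (1 - alpha)) yields the sequential decision at every step.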

Usage

seq_ttest(
  x,
  y = NULL,
  data = NULL,
  mu = 0,
  d,
  alpha = 0.05,
  power = 0.95,
  alternative = "two.sided",
  paired = FALSE,
  na.rm = TRUE,
  verbose = TRUE
)

Arguments

x

Accepts two classes, numeric and formula, so you can pass either a vector (e.g., x) or a formula (e.g., x ~ group).

  • "numeric input": a (non-empty) numeric vector of data values.

  • "formula input": a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs either 1 for a one-sample test or a factor with two levels giving the corresponding groups.

y

An optional (non-empty) numeric vector of data values. Only used for two-sample tests when x is numeric.

data

An optional data frame containing the variables specified in the formula. Only used when x is a formula.

mu

a number indicating the true value of the mean (or difference in means if you are performing a two sample test).

d

a number indicating the specified (expected) effect size (Cohen's d).

alpha

Type I error rate (alpha level). The probability of rejecting H0 when it is true. Default is 0.05. Must be between 0 and 1.

power

Statistical power (1 - beta), where beta is the Type II error rate. The probability of correctly rejecting H0 when H1 is true with effect size d. Default is 0.95. Must be between 0 and 1. Higher values lead to wider stopping boundaries and potentially larger sample sizes.

alternative

a character string specifying the alternative hypothesis; must be one of "two.sided" (default), "greater", or "less". You can specify just the initial letter.

paired

Logical indicating whether to perform a paired t-test. Default is FALSE.

na.rm

a logical value indicating whether NA values should be removed before the computation proceeds.

verbose

Logical. Whether to print verbose output.

Value

An object of the S4 class seq_ttest_results; see the class documentation for a full description of its slots. Access the slots with the @ operator or [] brackets instead of $, as shown in the examples below.

Examples

# set seed --------------------------------------------------------------------
set.seed(333)

# load library ----------------------------------------------------------------
library(sprtt)

# one sample: numeric input ---------------------------------------------------
treatment_group <- rnorm(20, mean = 0, sd = 1)
results <- seq_ttest(treatment_group, mu = 1, d = 0.8)

# get access to the slots -----------------------------------------------------
# @ Operator
results@likelihood_ratio

# [] Operator
results["likelihood_ratio"]

# two sample: numeric input----------------------------------------------------
treatment_group <- stats::rnorm(20, mean = 0, sd = 1)
control_group <- stats::rnorm(20, mean = 1, sd = 1)
seq_ttest(treatment_group, control_group, d = 0.8)

# two sample: formula input ---------------------------------------------------
stress_level <- stats::rnorm(20, mean = 0, sd = 1)
sex <- as.factor(c(rep(1, 10), rep(2, 10)))
seq_ttest(stress_level ~ sex, d = 0.8)

# NA in the data --------------------------------------------------------------
stress_level <- c(NA, stats::rnorm(20, mean = 0, sd = 2), NA)
sex <- as.factor(c(rep(1, 11), rep(2, 11)))
seq_ttest(stress_level ~ sex, d = 0.8, na.rm = TRUE)

# work with dataset (data are in the package included) ------------------------
seq_ttest(monthly_income ~ sex, data = df_income, d = 0.8)