Statistical Significance Calculator
Compare a control and a variant with a two-proportion z-test: the two conversion rates feed a pooled standard error, the rate gap becomes a z-score, and the z-score becomes a p-value. The result helps teams judge whether an observed uplift is strong enough to support an experiment decision.
Experiment Inputs
Compare control and variant conversion counts with a two-proportion significance check.
Significance Verdict
95% confidence · Two-sided
The variant clears the selected significance threshold.
p = 0.0167
Absolute lift: +0.66 pp · Relative lift: +14.8%
Control conversion rate
4.50%
540 / 12,000
Variant conversion rate
5.16%
612 / 11,850
95% confidence interval
+0.12 pp to +1.21 pp
Observed power
66.8%
Incremental conversions per 10k visitors: 66.5
Interpretation
At 95% confidence, the variant is statistically higher than the control. The observed lift is +0.66 pp (+14.8%) with p = 0.0167.
Decision hint
The result is statistically significant and materially sized enough for a rollout discussion. Confirm sample quality, guardrail metrics, and segment consistency before final launch.
Detailed Breakdown
Use this section to pressure-test how strong the evidence is before you roll out the variant.
| Metric | Value |
|---|---|
| Pooled conversion rate | 4.83% |
| Standard error | 0.0028 |
| Z-score | 2.393 |
| Cohen's h | 0.031 |
| Confidence threshold | 95% |
| Tail setting | Two-sided |
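Cohen's h is the one table entry that is not a direct ratio of the inputs: it is the arcsine-scale effect size that the sample-size targets are built on. A minimal stdlib-only sketch for the default example (the function name is illustrative):

```python
import math

def cohens_h(p1, p2):
    """Effect size for two proportions on the arcsine-transformed scale."""
    return 2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1))

# Default example: control 540 / 12,000 vs variant 612 / 11,850
h = cohens_h(540 / 12_000, 612 / 11_850)
print(round(h, 3))
# → 0.031
```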
80% power target
16,328
Recommended visitors per variant for a clearer read.
90% power target
21,858
Useful when false negatives are expensive.
Current traffic gap
4,478
Additional visitors per variant to hit the 80% power target.
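Under the normal approximation, both power targets follow from Cohen's h and two standard-normal quantiles. A stdlib-only sketch using `statistics.NormalDist` (Python 3.8+; the function name is illustrative):

```python
from statistics import NormalDist
import math

def required_n(h, alpha=0.05, power=0.80, two_sided=True):
    """Visitors per variant needed to detect effect size h (Cohen's h)
    at the given alpha and power, normal approximation."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_power = nd.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / h) ** 2)

# Cohen's h for the default example (control 540/12,000, variant 612/11,850)
h = 2 * math.asin(math.sqrt(612 / 11_850)) - 2 * math.asin(math.sqrt(540 / 12_000))
n80, n90 = required_n(h, power=0.80), required_n(h, power=0.90)
print(n80, n90)
# → 16328 21858
```

The traffic gap is then the 80% target minus the smaller arm: 16,328 − 11,850 = 4,478.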
Assumption notes
- Each visitor should appear in only one group during the test window.
- The conversion event should be binary and measured the same way for both groups.
- Confidence intervals use the difference in conversion rates, not the relative lift.
What to review before rollout
- Check that the experiment was not stopped early after repeated peeking.
- Confirm guardrail metrics such as refunds, churn, or downstream activation.
- Segment the result if major traffic sources or devices behaved differently.
Editorial & Review Information
Reviewed on: 2026-03-11
Published on: 2025-09-10
Author: LumoCalculator Editorial Team
What we checked: Formula math, default example arithmetic, interpretation thresholds, boundary statements, and source accessibility.
Purpose and scope: This page supports experiment planning for binary conversion events such as signups, checkouts, clicks, or submissions. It is not a replacement for a full experimentation platform or a multi-metric decision framework.
How to use this review: Keep user assignment clean, lock the primary metric before launch, compare the p-value with the selected confidence level, then review practical impact and guardrail metrics before rollout.
Use Scenarios
Landing-page and signup tests
Use the calculator when a product, growth, or lifecycle team is comparing one conversion event across two experiences and needs a fast “ship, hold, or keep running” readout.
Feature flags and rollout checks
Pair a primary conversion metric with a one-sided “variant worse than control” guardrail check before rolling a feature from limited beta to a broader audience.
Survey, email, and ops pilots
The same logic works for response rate, click-through rate, or task completion rate when each observation is a yes-or-no outcome and the two groups stay independent.
Formula Explanation
1) Conversion rate and absolute lift
Conversion rate = conversions / visitors
Absolute lift = variant rate - control rate
This first step turns raw counts into comparable rates. Absolute lift is the percentage-point gap between the variant and the control. Relative lift then divides that gap by the control rate so teams can discuss “up 12%” and “up 0.5 points” at the same time.
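With the default example's counts, step 1 is a couple of lines (variable names are illustrative):

```python
c_conv, c_n = 540, 12_000   # control conversions / visitors
v_conv, v_n = 612, 11_850   # variant conversions / visitors

p_c = c_conv / c_n                 # control rate, 0.045
p_v = v_conv / v_n                 # variant rate, ≈ 0.0516
abs_lift = p_v - p_c               # percentage-point gap (as a proportion)
rel_lift = abs_lift / p_c          # gap as a fraction of the control rate

print(f"{abs_lift * 100:+.2f} pp, {rel_lift * 100:+.1f}%")
# → +0.66 pp, +14.8%
```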
2) Pooled standard error
Pooled rate = (control conversions + variant conversions) / total visitors
Standard error = sqrt(pooled rate x (1 - pooled rate) x (1 / control visitors + 1 / variant visitors))
The pooled rate creates the no-difference benchmark used by the z-test. The standard error then measures how much random variation we would expect around the observed lift if there were no true effect.
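Applying the same two formulas to the default example's counts (a stdlib-only sketch):

```python
import math

c_conv, c_n = 540, 12_000   # control
v_conv, v_n = 612, 11_850   # variant

# Pooled rate: the single conversion rate assumed under "no difference"
pooled = (c_conv + v_conv) / (c_n + v_n)
# Expected noise around the lift if the null hypothesis were true
se = math.sqrt(pooled * (1 - pooled) * (1 / c_n + 1 / v_n))

print(round(pooled, 4), round(se, 4))
# → 0.0483 0.0028
```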
3) Z-score and p-value
Z-score = absolute lift / standard error
P-value = probability of seeing a difference this large under the null hypothesis
A larger z-score means the observed gap is less compatible with pure noise. The p-value converts that z-score into a decision threshold. If the p-value is smaller than alpha, the result is marked statistically significant at the selected confidence level.
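The z-to-p conversion needs only the normal tail, which the standard library exposes through `math.erfc` (the step-2 standard error for the default example is hard-coded here):

```python
import math

def norm_sf(z):
    """Upper-tail probability of the standard normal (1 - CDF)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

abs_lift = 612 / 11_850 - 540 / 12_000
se = 0.0027767                       # pooled standard error from step 2
z = abs_lift / se
p_two_sided = 2 * norm_sf(abs(z))    # both tails count against the null

print(round(z, 3), round(p_two_sided, 4))
# → 2.393 0.0167
```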
4) Confidence interval, power, and sample planning
Confidence interval = absolute lift +/- critical value x unpooled standard error
Required sample per variant ≈ 2 x ((critical value + power target z) / effect size)^2
The confidence interval shows the range of rate gaps still compatible with the data. The effect size in the sample formula is Cohen's h from the breakdown table. Power and required sample size translate the same evidence into an execution question: do you have enough traffic to detect the effect size you care about with acceptable false-negative risk?
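For the default example, the interval and observed power can be sketched with the standard library (`NormalDist` is Python 3.8+). The power definition used here, the chance that a repeat test at the observed lift and traffic clears the critical z, is one common post-hoc convention among several:

```python
from statistics import NormalDist
import math

nd = NormalDist()
n_c, n_v = 12_000, 11_850
p_c, p_v = 540 / n_c, 612 / n_v
lift = p_v - p_c

# Unpooled SE: each group keeps its own rate (used for the interval)
se_ci = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
z_crit = nd.inv_cdf(0.975)          # ≈ 1.96 for 95%, two-sided
lo, hi = lift - z_crit * se_ci, lift + z_crit * se_ci

# Post-hoc power: probability a repeat test at this lift and traffic
# would clear z_crit (pooled SE, as in the z-score step)
pooled = (540 + 612) / (n_c + n_v)
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_v))
power = nd.cdf(abs(lift) / se_pooled - z_crit)

print(f"CI: {lo*100:+.2f} pp to {hi*100:+.2f} pp · power {power:.1%}")
# → CI: +0.12 pp to +1.21 pp · power 66.8%
```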
Example Cases
Case 1: Homepage CTA uplift
Inputs
- Control: 12,000 visitors, 540 conversions
- Variant: 11,850 visitors, 612 conversions
- Confidence level: 95%
- Hypothesis: Two-sided
Computed Results
- Control rate: 4.50%
- Variant rate: 5.16%
- Absolute lift: +0.66 pp
- P-value: 0.0167
- Observed power: 66.8%
Interpretation
The variant clears a 95% threshold, but the traffic level is still below a comfortable power target for repeatability.
Decision Hint
Treat this as a launch candidate, then verify downstream quality metrics before a full rollout.
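These figures can be checked end-to-end with a few lines of stdlib Python (the helper name is illustrative):

```python
import math

def two_prop_z(c_conv, c_n, v_conv, v_n):
    """Two-sided two-proportion z-test; returns (lift, z, p)."""
    p_c, p_v = c_conv / c_n, v_conv / v_n
    pooled = (c_conv + v_conv) / (c_n + v_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / c_n + 1 / v_n))
    z = (p_v - p_c) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # = 2 × normal upper tail
    return p_v - p_c, z, p

lift, z, p = two_prop_z(540, 12_000, 612, 11_850)
print(f"lift {lift*100:+.2f} pp, z = {z:.3f}, p = {p:.4f}")
# → lift +0.66 pp, z = 2.393, p = 0.0167
```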
Case 2: Email subject-line test
Inputs
- Control: 800 sends, 48 clicks
- Variant: 810 sends, 59 clicks
- Confidence level: 95%
- Hypothesis: Two-sided
Computed Results
- Control rate: 6.00%
- Variant rate: 7.28%
- Absolute lift: +1.28 pp
- P-value: 0.3011
- Observed power: 17.9%
Interpretation
The uplift looks promising, but the interval is wide and the test is badly underpowered.
Decision Hint
Keep the experiment running or narrow the minimum detectable effect before calling a winner.
Case 3: Guardrail drop check
Inputs
- Control: 9,000 users, 396 conversions
- Variant: 9,050 users, 340 conversions
- Confidence level: 95%
- Hypothesis: Variant < Control
Computed Results
- Control rate: 4.40%
- Variant rate: 3.76%
- Absolute lift: -0.64 pp
- P-value: 0.0145
- Observed power: 70.6%
Interpretation
A one-sided guardrail check indicates the variant is materially worse than control.
Decision Hint
Pause or roll back the treatment and inspect UX friction or audience-quality shifts before relaunch.
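Because the guardrail hypothesis is directional, only the lower tail counts toward the p-value. A stdlib-only sketch of the one-sided check (the helper name is illustrative):

```python
import math

def one_sided_p(c_conv, c_n, v_conv, v_n):
    """P-value for H1: variant < control (lower-tail two-proportion z-test)."""
    p_c, p_v = c_conv / c_n, v_conv / v_n
    pooled = (c_conv + v_conv) / (c_n + v_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / c_n + 1 / v_n))
    z = (p_v - p_c) / se
    return 0.5 * math.erfc(-z / math.sqrt(2))   # P(Z <= z), lower tail only

p = one_sided_p(396, 9_000, 340, 9_050)
print(round(p, 4))
# → 0.0145
```

Note that flipping the tail matters: the same data evaluated two-sided would roughly double the p-value.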
Boundary Conditions
Sources & References
- ABTestGuide - A/B-Test Calculator - Calculator-first reference for conversion-rate significance and power framing.
- GraphPad - t Test Calculator - Reference for p-value interpretation, confidence-interval language, and test-result reporting.
- Qualtrics - Statistical Significance Calculator & Guide - Long-form explanation of statistical significance, confidence levels, and sample-size planning.
- VWO - A/B Test Statistical Significance Calculator - Experimentation-focused framing for significance, lift interpretation, and planning context.
- Optimizely - Sample Size Calculator - Sample-size and minimum-detectable-effect context for running experiments to adequate power.