AB Tasty Sample Size Calculator
Estimate how many visitors you need per variation before launching an A/B test. Enter your baseline conversion rate, minimum detectable effect, confidence level, statistical power, and number of variants to get a defensible testing target.
Expert guide to using an AB Tasty sample size calculator
An AB Tasty sample size calculator helps experimentation teams answer a question that sits at the center of every trustworthy test: how much traffic do we need before we can believe the outcome? Too few visitors and you risk calling noise a win. Too many visitors and you delay decisions, waste opportunity cost, and frustrate stakeholders who expect faster iteration. A rigorous calculator brings structure to that tradeoff by quantifying the number of users required to detect a meaningful difference between a control experience and one or more variants.
At its core, a sample size calculator for A/B testing uses statistical assumptions. You start with a baseline conversion rate, define the minimum detectable effect you care about, set a confidence level, and choose your target statistical power. The calculator then estimates the number of observations needed per variant. This matters because an experiment that spreads traffic across multiple versions requires more total visitors than a simple control-versus-one-variant split.
Why sample size matters in experimentation
If you stop an A/B test after a small burst of traffic, early randomness can easily make a weak idea look strong or hide a real winner. Sample size planning protects against several failure modes:
- False positives: declaring a winner when no true effect exists.
- False negatives: missing a real improvement because the test is underpowered.
- Operational bias: repeatedly checking results too early and ending tests when the chart looks favorable.
- Misallocated traffic: splitting visitors across too many variants without enough traffic to support them.
This is why mature optimization programs do not launch tests based only on design enthusiasm or anecdotal user feedback. They start with a sample size estimate, then validate whether the business has enough qualifying traffic to complete the experiment in a reasonable timeframe.
The inputs that drive the calculation
To use an AB Tasty sample size calculator well, you need to understand each variable.
- Baseline conversion rate: your current performance. If your page converts at 5%, the calculator assumes that roughly 5 out of every 100 users convert in the control condition.
- Minimum detectable effect, or MDE: the smallest lift worth detecting, expressed here as a relative change. If you set an MDE of 10%, a 5% baseline implies a treatment rate of 5.50%, so the test must be able to detect a difference of 0.50 percentage points.
- Confidence level: commonly 95%, which corresponds to a 5% significance level (α = 0.05). This controls your tolerance for false positives. Higher confidence means you require stronger evidence and usually more traffic.
- Statistical power: often 80% or 90%. Power measures how likely your test is to detect a real effect of at least the size you specified. Higher power also increases sample requirements.
- Number of variants: adding more variants increases the total audience requirement, because each version needs enough users to support valid comparisons.
These settings should reflect business reality. For instance, if a 2% uplift would not materially affect revenue, you should not force the calculator to detect such a tiny difference unless you have enormous traffic. On the other hand, if your average order value is very high, even a small uplift may justify the longer runtime.
| Statistical setting | Common value | Approximate critical value | Meaning in practice |
|---|---|---|---|
| Confidence level | 90% | Z ≈ 1.645 | Faster tests, but greater tolerance for false positives. |
| Confidence level | 95% | Z ≈ 1.960 | Most common business standard for A/B testing. |
| Confidence level | 99% | Z ≈ 2.576 | Very strict, useful when false winners are especially costly. |
| Power | 80% | Z ≈ 0.842 | Balanced default that many product teams use. |
| Power | 85% | Z ≈ 1.036 | Moderately stronger protection against false negatives. |
| Power | 90% | Z ≈ 1.282 | Higher rigor, but increased traffic and longer runtime. |
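The critical values in the table can be recomputed from the standard normal distribution. The sketch below uses Python's standard library and assumes a two-sided test for the confidence level; the helper names `z_for_confidence` and `z_for_power` are illustrative and not part of any AB Tasty API.

```python
from statistics import NormalDist

def z_for_confidence(confidence: float) -> float:
    """Two-sided critical value for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def z_for_power(power: float) -> float:
    """Standard normal quantile for a power target such as 0.80."""
    return NormalDist().inv_cdf(power)

for confidence in (0.90, 0.95, 0.99):
    print(f"confidence {confidence:.0%}: z = {z_for_confidence(confidence):.3f}")
for power in (0.80, 0.85, 0.90):
    print(f"power {power:.0%}: z = {z_for_power(power):.3f}")
```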
How the underlying formula works
For binary conversion outcomes, most calculators use a two-proportion sample size formula. In simplified terms, the method compares the current conversion rate p1 with an expected treatment rate p2. If your baseline is 5% and your MDE is 10%, then p2 = 5% × 1.10 = 5.5%. The calculator then estimates how many users are required in each group for the observed difference to be statistically detectable with your chosen confidence and power.
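One common form of the two-proportion formula is n = (z_alpha + z_beta)² × [p1(1 − p1) + p2(1 − p2)] / (p2 − p1)², where z_alpha is the two-sided critical value for the confidence level and z_beta is the quantile for the power target. The Python sketch below implements that form as an illustration. It assumes a relative MDE, a two-sided test, and unpooled variances; the function name is hypothetical, and the calculator on this page or AB Tasty's own tooling may use a different variant of the formula.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, confidence=0.95, power=0.80):
    """Visitors needed per variant under a fixed-horizon two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1.0 + relative_mde)  # relative lift, e.g. 5% -> 5.5%
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0) or p1 == p2:
        raise ValueError("rates must lie strictly between 0 and 1 and differ")
    z_alpha = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Baseline 5%, relative MDE 10%, 95% confidence, 80% power:
# the result lands near 31,200 visitors per variant.
print(sample_size_per_variant(0.05, 0.10))
```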
Although different testing platforms may apply multiple-testing adjustments, Bayesian methods, or sequential analysis, the classic fixed-horizon approach remains a strong planning tool. It is especially useful before a test starts, when teams need a practical estimate for resources, timing, and experiment prioritization.
Worked examples with real sample size statistics
The relationship between baseline rate, MDE, and required traffic is not linear. Detecting a tiny improvement requires dramatically more data than detecting a large one. The table below shows illustrative per-variant sample sizes using a two-sided 95% confidence level and 80% power.
| Baseline conversion rate | MDE | Expected treatment rate | Approx. sample size per variant | Total for 2 variants |
|---|---|---|---|---|
| 2.0% | 10% | 2.2% | ≈ 80,700 | ≈ 161,400 |
| 5.0% | 10% | 5.5% | ≈ 31,200 | ≈ 62,400 |
| 10.0% | 10% | 11.0% | ≈ 14,700 | ≈ 29,400 |
| 5.0% | 20% | 6.0% | ≈ 8,100 | ≈ 16,200 |
These figures highlight an important planning lesson: if your baseline conversion is low and your desired lift is modest, your test can become very expensive in traffic terms. That is why experimentation leaders often prioritize pages with stronger traffic and stronger intent signals before attempting subtle optimization work on lower-traffic areas.
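The scenarios in the table can be recomputed with a short script. This is a sketch under the same assumptions as the table (two-sided 95% confidence, 80% power, relative MDE) using the unpooled two-proportion formula; the function name is illustrative. The last lines show the non-linearity directly: halving the MDE from 10% to 5% multiplies the requirement by close to four.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, confidence=0.95, power=0.80):
    """Per-variant sample size from the unpooled two-proportion formula."""
    p1, p2 = baseline, baseline * (1.0 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# (baseline rate, relative MDE) pairs matching the table rows
scenarios = [(0.02, 0.10), (0.05, 0.10), (0.10, 0.10), (0.05, 0.20)]
for baseline, mde in scenarios:
    n = sample_size_per_variant(baseline, mde)
    print(f"baseline {baseline:.1%}, MDE {mde:.0%}: "
          f"{n:,} per variant, {2 * n:,} for control plus one variant")

ratio = sample_size_per_variant(0.05, 0.05) / sample_size_per_variant(0.05, 0.10)
print(f"halving the MDE multiplies the sample size by about {ratio:.1f}")
```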
Choosing a realistic MDE
The MDE is one of the most misunderstood inputs in any AB Tasty sample size calculator. Teams often choose an unrealistically small effect because it feels more rigorous. In reality, that can produce a test that would take months to finish. A better approach is to ask: what is the smallest uplift that would actually change our roadmap, justify implementation cost, or materially affect revenue?
For example:
- A homepage hero text change can usually be planned with a larger MDE: the change is cheap to ship, so running a long test to confirm a very small lift is rarely worth the traffic.
- A pricing-page redesign could justify a smaller MDE if the traffic is highly qualified and revenue impact is meaningful.
- A checkout flow experiment may deserve high rigor because even a small conversion movement can have large downstream value.
As a rule, if the computed duration is too long, consider one of four actions: increase your MDE, reduce the number of variants, target a higher-converting audience, or redesign the test to create a stronger expected effect.
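The first of those actions can be turned around into a planning question: given the traffic and time you actually have, what is the smallest relative MDE the test could detect? The sketch below answers it with a bisection search over the unpooled two-proportion formula. The function names, the search bounds, and the example traffic figures are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, confidence=0.95, power=0.80):
    """Per-variant sample size from the unpooled two-proportion formula."""
    p1, p2 = baseline, baseline * (1.0 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

def smallest_feasible_mde(baseline, monthly_traffic, max_months, variants=2):
    """Smallest relative MDE whose sample size fits the traffic budget, or None."""
    budget = monthly_traffic * max_months / variants  # visitors per variant
    lo = 0.001                                        # 0.1% relative lift
    hi = min(1.0, 0.99 / baseline - 1.0)              # keep treatment rate below 1
    if hi <= lo or sample_size_per_variant(baseline, hi) > budget:
        return None                                   # not feasible in this range
    if sample_size_per_variant(baseline, lo) <= budget:
        return lo
    for _ in range(50):                               # bisection; hi stays feasible
        mid = (lo + hi) / 2.0
        if sample_size_per_variant(baseline, mid) <= budget:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical plan: 5% baseline, 50,000 eligible visitors per month,
# at most 2 months, control plus one variant.
mde = smallest_feasible_mde(0.05, 50_000, 2)
print(f"smallest detectable relative lift: {mde:.1%}")
```

If the returned MDE is larger than any lift the change could plausibly produce, that is a signal to reduce variants, pick a different audience, or test a bolder change.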
Common mistakes that lead to bad decisions
- Using all site traffic instead of eligible traffic: only count users who can actually enter the experiment.
- Ignoring seasonality: a test spanning promotions, holidays, or campaigns may violate stable-baseline assumptions.
- Stopping early after a spike: peeking at results can inflate false positives when not handled with the right methodology.
- Testing too many variants: every extra variant dilutes traffic and extends runtime.
- Mixing metrics: plan sample size around the primary metric that determines success, not a secondary engagement indicator.
- Forgetting implementation constraints: if only a subset of devices or geographies qualifies, your effective sample shrinks.
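The effect of stopping early can be illustrated with a small A/A simulation, in which both groups share the same true conversion rate, so every "significant" result is a false positive. The sketch below compares testing once at the planned end with declaring a winner at the first of ten interim looks that crosses the 5% threshold. All parameters (batch size, number of looks, conversion rate, seed) are arbitrary assumptions, and the naive repeated z-test shown here is not a model of any platform's statistics engine.

```python
import math
import random
from statistics import NormalDist

def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two groups of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else ((conv_b - conv_a) / n) / se

def simulate(runs=1000, looks=10, batch=100, rate=0.10, seed=7):
    """Return (false positive rate at final look only, rate with peeking)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)  # two-sided 5% threshold
    fixed_hits = peek_hits = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        crossed_at_any_look = False
        z = 0.0
        for _look in range(looks):
            # Both arms draw from the same rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            z = z_stat(conv_a, conv_b, n)
            if abs(z) > z_crit:
                crossed_at_any_look = True
        fixed_hits += abs(z) > z_crit      # decision at the planned end only
        peek_hits += crossed_at_any_look   # decision at the first crossing
    return fixed_hits / runs, peek_hits / runs

fixed_rate, peeking_rate = simulate()
print(f"final look only: {fixed_rate:.1%}, stop at first crossing: {peeking_rate:.1%}")
```

In runs like this the peeking rate typically comes out well above the nominal 5%, which is why the stopping rule should be fixed before launch or handled by a method designed for sequential looks.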
How confidence and power affect business risk
Confidence level and power are not abstract academic settings. They are risk controls. A low confidence threshold may allow you to move faster, but it increases the chance of shipping a false winner. Low power might save time up front, but it makes you more likely to miss good ideas. Mature teams align these settings with business context. For low-risk creative experiments, the organization may accept lighter standards. For pricing, checkout, regulated flows, or high-revenue surfaces, stronger settings are often justified.
Public statistical references from government and university institutions can help teams understand these tradeoffs more deeply. Useful reading includes the NIST Engineering Statistics Handbook, the National Cancer Institute explanation of statistical power, and Penn State’s materials on hypothesis testing for proportions at online.stat.psu.edu.
How to estimate test duration from sample size
Once you know the sample requirement per variant, estimating runtime becomes straightforward. Multiply the per-variant requirement by the number of variants in the test, counting the control, then divide by your monthly eligible traffic. If you need 30,000 users per variant and you have 2 variants (a control plus one challenger), you need 60,000 total users. With 50,000 eligible visitors per month, the test would require about 1.2 months (60,000 ÷ 50,000), assuming traffic is evenly split and stable.
In practice, tests often take longer because not all traffic is exposed evenly across devices, geographies, or logged-in states. Also, many teams exclude internal users, bot traffic, or low-quality sessions. It is wise to add a buffer rather than planning to the exact day.
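The runtime arithmetic can be sketched in a few lines of Python. The function name and the `buffer` parameter are illustrative assumptions; the buffer is simply a fractional margin added on top of the computed total, and the right size for it depends on your own traffic quality.

```python
def estimate_runtime_months(per_variant, variants, monthly_eligible, buffer=0.0):
    """Months needed to collect the full sample, with an optional safety margin."""
    if monthly_eligible <= 0:
        raise ValueError("monthly eligible traffic must be positive")
    total_required = per_variant * variants       # the control counts as a variant
    return total_required * (1.0 + buffer) / monthly_eligible

# Example from the text: 30,000 per variant, 2 variants, 50,000 visitors per month.
print(f"{estimate_runtime_months(30_000, 2, 50_000):.2f} months without a buffer")
# Same plan with a hypothetical 20% margin for excluded or uneven traffic.
print(f"{estimate_runtime_months(30_000, 2, 50_000, buffer=0.20):.2f} months with a buffer")
```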
Interpreting results from this calculator
This calculator returns a sample estimate per variant, total required sample, expected conversions at baseline and treatment, and an estimated runtime based on your monthly eligible traffic. Those outputs can help you answer four planning questions:
- Is this experiment feasible with current traffic?
- Will the test finish within an acceptable business window?
- Should we reduce the number of variants?
- Is the expected effect too small to justify the wait?
For many teams, the most valuable use of a sample size calculator is not the exact number itself. It is the discipline it creates. It forces experiment planning to become quantitative rather than opinion-driven.
Best practices for stronger A/B testing programs
- Define one primary success metric before launch.
- Use recent, segmentation-aware baseline data.
- Prioritize tests that can plausibly create meaningful lift.
- Limit variants when traffic is constrained.
- Pre-commit your stopping rule and analysis method.
- Document assumptions so future teams can audit the decision.
In short, an AB Tasty sample size calculator is not just a utility for analysts. It is a decision framework for product managers, marketers, CRO specialists, and executives. It turns experimentation from guesswork into an investment process. When used correctly, it helps teams run fewer underpowered tests, avoid misleading wins, and allocate scarce traffic to experiments that have a realistic chance of delivering actionable learning.
If your organization is serious about experimentation maturity, use sample size planning at the very start of every test brief. Combine it with a clear hypothesis, a documented primary metric, a known rollout plan, and post-test interpretation standards. That approach dramatically improves the quality of insights you generate and reduces the chance of making product decisions based on statistical noise.