AB Test Sample Size Calculator Excel Style
Estimate how many visitors you need per variant before launching an A/B test. This calculator uses a standard two-proportion sample size method, shows the expected total traffic requirement, and visualizes how sample size changes as the minimum detectable uplift gets smaller.
Expert guide to using an AB test sample size calculator in Excel
Building an A/B test sample size calculator in Excel is one of the most practical ways to plan experiments before traffic starts flowing. Marketers, product managers, conversion rate optimization specialists, and analysts all face the same question: how many visitors do we need before we can trust the outcome? If you guess too low, your experiment ends with noise instead of insight. If you guess too high, you may delay decisions and waste time. That is exactly why sample size planning matters.
At its core, sample size estimation is a balance between business ambition and statistical discipline. You want enough observations to detect a meaningful improvement, but not so many that every small test becomes operationally expensive. Excel remains popular because it is accessible, auditable, and easy to share across teams. A well-built spreadsheet can calculate per-variant sample size, total traffic requirements, expected test duration, and even scenario comparisons for different uplift assumptions.
What the calculator is actually estimating
This calculator is built around a standard two-proportion hypothesis test. In a classic A/B test, the control and the variant each produce a conversion rate. Sample size planning asks how many observations are needed in each group to detect a target difference with a chosen confidence level and a chosen power level.
- Baseline conversion rate: the current performance of your control experience.
- Minimum detectable uplift: the smallest relative improvement worth detecting.
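- Confidence level: 1 minus the significance level α. It sets how often you accept a false positive when there is no real difference (95% confidence allows a 5% false positive rate).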
- Power: the probability of detecting a real effect if it exists.
- Test type: whether you use a one-sided or two-sided hypothesis.
In most business experiments, a 95% confidence level and 80% power are the common starting point. Those choices are popular because they strike a reasonable balance between rigor and feasibility. If you raise either threshold, required sample size rises quickly.
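To see how quickly stricter thresholds add up, here is a minimal Python sketch that compares the factor (Zα + Zβ)² across common settings, using the standard library's statistics.NormalDist. The function name z_sum_squared is hypothetical, and the comparison is approximate: it assumes the two variance terms in the full two-proportion formula are nearly equal, which holds for small lifts.

```python
from statistics import NormalDist


def z_sum_squared(confidence: float, power: float) -> float:
    """Return (Z_alpha + Z_beta)^2 for a two-sided test.

    Required sample size scales roughly with this factor when the two
    variance terms of the two-proportion formula are close (small lifts).
    """
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)                       # critical value for power
    return (z_alpha + z_beta) ** 2


reference = z_sum_squared(0.95, 0.80)
for conf, pw in [(0.95, 0.80), (0.95, 0.90), (0.99, 0.80), (0.99, 0.90)]:
    ratio = z_sum_squared(conf, pw) / reference
    print(f"{conf:.0%} confidence, {pw:.0%} power: {ratio:.2f}x the 95%/80% sample")
```

Relative to 95% confidence and 80% power, moving to 90% power needs roughly 1.34 times as many visitors, 99% confidence roughly 1.49 times, and both together roughly 1.90 times.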
Key practical takeaway: sample size is highly sensitive to the effect you want to detect. Detecting a 2% relative uplift requires dramatically more traffic than detecting a 20% uplift. That single assumption often matters more than any spreadsheet formatting or dashboard design.
Why Excel is still useful for AB test planning
Even with many online tools available, Excel still has serious advantages. First, it allows transparent formulas that stakeholders can inspect. Second, you can embed your own business assumptions such as expected weekly traffic, traffic allocation across pages, mobile versus desktop split, and seasonality adjustments. Third, Excel makes scenario planning very easy. You can create columns for several baseline rates and several uplifts, then compare outcomes side by side.
If you want to mirror the logic of this calculator in Excel, the main statistical element is the normal critical value. Analysts often use NORM.S.INV() to get the Z-scores for the confidence and power assumptions: for a two-sided test at 95% confidence, =NORM.S.INV(0.975) returns about 1.960, and for 80% power, =NORM.S.INV(0.8) returns about 0.842. Once you have those Z-scores, you plug them into the two-proportion sample size equation.
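$$n = \frac{\left(Z_{\alpha}\sqrt{2\bar{p}(1-\bar{p})} + Z_{\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)}\right)^2}{(p_2 - p_1)^2}$$
Where $p_1$ is the baseline conversion rate, $p_2$ is the expected variant conversion rate, and $\bar{p}$ is the average of the two. $Z_{\alpha}$ is the critical value for the confidence level (NORM.S.INV(1 - α/2) for a two-sided test or NORM.S.INV(1 - α) for a one-sided test, where α = 1 - confidence), and $Z_{\beta}$ is the critical value for power (NORM.S.INV(power)). The result is the estimated number of observations needed per variant.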
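The same equation translates directly into code. This is a minimal Python sketch, assuming the standard library's statistics.NormalDist.inv_cdf as a stand-in for Excel's NORM.S.INV; the function name sample_size_per_variant and its default arguments are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_uplift: float,
                            confidence: float = 0.95, power: float = 0.80,
                            two_sided: bool = True) -> int:
    """Per-variant sample size for a two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_uplift)  # expected variant rate
    p_bar = (p1 + p2) / 2                  # average of the two rates
    alpha = 1 - confidence
    # Critical values, equivalent to NORM.S.INV in Excel
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    # Round up so the plan never falls short, like ROUNDUP(n, 0)
    return math.ceil(numerator / (p2 - p1) ** 2)


n = sample_size_per_variant(0.05, 0.10)
print(n, 2 * n)  # per variant, then total for a two-arm A/B test
```

For a 5% baseline and a 10% relative uplift at 95% confidence and 80% power, this works out to 31,234 visitors per variant, or 62,468 for a two-arm test.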
Reference table: common confidence and power settings
| Setting | Probability | Approximate Z-score (confidence values are two-sided) | Typical use |
|---|---|---|---|
| Confidence level | 90% | 1.645 | Faster directional testing when risk tolerance is higher |
| Confidence level | 95% | 1.960 | Most common default for product and CRO experiments |
| Confidence level | 99% | 2.576 | High certainty, but much larger traffic requirement |
| Power | 80% | 0.842 | Standard planning threshold in many experimentation programs |
| Power | 90% | 1.282 | Useful when missing a real lift is especially costly |
| Power | 95% | 1.645 | High-sensitivity planning, often expensive in traffic terms |
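The Z-score column can be regenerated rather than typed in. Below is a minimal Python sketch, assuming statistics.NormalDist as the equivalent of Excel's NORM.S.INV; the helper names z_confidence and z_power are hypothetical, and "confidence" and "power" in the comments stand in for whatever cells hold those inputs in your sheet.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1, like NORM.S.INV


def z_confidence(confidence: float, two_sided: bool = True) -> float:
    """Critical value for a confidence level.

    Excel equivalent (two-sided): =NORM.S.INV(1 - (1 - confidence) / 2)
    """
    alpha = 1 - confidence
    return STD_NORMAL.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)


def z_power(power: float) -> float:
    """Critical value for power. Excel equivalent: =NORM.S.INV(power)"""
    return STD_NORMAL.inv_cdf(power)


for c in (0.90, 0.95, 0.99):
    print(f"confidence {c:.0%}: {z_confidence(c):.3f}")
for p in (0.80, 0.90, 0.95):
    print(f"power {p:.0%}: {z_power(p):.3f}")
```

Note that a one-sided test at 95% confidence uses 1.645, the same value as the two-sided 90% row.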
Sample size examples with realistic benchmark assumptions
To understand the scale, assume a two-sided test with 95% confidence and 80% power. The table below shows approximate per-variant sample size requirements for common business situations. These values follow the standard two-proportion calculation and show that sample size depends nonlinearly on both the baseline rate and the minimum detectable effect (MDE).
| Baseline conversion rate | Relative uplift target | Expected variant rate | Approximate sample size per variant | Total for A/B test |
|---|---|---|---|---|
| 5% | 10% | 5.5% | 31,234 | 62,468 |
| 10% | 15% | 11.5% | 6,693 | 13,386 |
| 20% | 10% | 22% | 6,510 | 13,020 |
| 30% | 5% | 31.5% | 14,856 | 29,712 |
Notice the pattern. When the uplift target gets smaller, the required sample size expands rapidly. This is one of the biggest reasons experiments underperform in practice: teams ask a low-traffic page to detect a tiny effect. The math does not cooperate.
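One way to see this pattern is to halve the uplift target repeatedly at a fixed baseline. The minimal Python sketch below reuses the two-sided two-proportion formula; the 5% baseline and the list of uplifts are illustrative assumptions. Because the denominator is (p2 - p1)², each halving of the uplift roughly quadruples the required sample.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p1, relative_uplift, confidence=0.95, power=0.80):
    """Two-sided two-proportion sample size per variant."""
    p2 = p1 * (1 + relative_uplift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    top = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(top / (p2 - p1) ** 2)


# Halve the relative uplift target at a 5% baseline (illustrative values).
uplifts = [0.20, 0.10, 0.05, 0.025]
sizes = [sample_size_per_variant(0.05, u) for u in uplifts]
for u, n in zip(uplifts, sizes):
    print(f"uplift {u:.1%}: {n:,} per variant")
```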
How to build the same logic in Excel
- Create input cells for baseline conversion, uplift, confidence, power, and number of variants.
- Convert percentages to decimals. For example, 10% becomes 0.10.
- Calculate expected variant rate as baseline multiplied by one plus relative uplift.
- Use Excel’s NORM.S.INV() function for your confidence and power critical values.
- Apply the two-proportion sample size formula.
- Round up with ROUNDUP(value, 0) to avoid underestimating required traffic.
- Multiply by the number of variants, including the control, to estimate total observations required.
- Divide the total by average daily traffic to estimate test runtime in days.
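The steps above can be mirrored line by line in code. This is a minimal Python sketch; the input values, including the 1,500 daily visitors, are hypothetical, and the comments show roughly how each step might look as an Excel formula, with cell names standing in for real cell references.

```python
import math
from statistics import NormalDist

# Input cells (hypothetical values, percentages entered as decimals)
baseline = 0.10        # 10% baseline conversion
uplift = 0.15          # 15% relative uplift
confidence = 0.95
power = 0.80
variants = 2           # control plus one challenger
daily_traffic = 1_500  # hypothetical average daily eligible visitors

# Expected variant rate: =baseline*(1+uplift)
p2 = baseline * (1 + uplift)
p_bar = (baseline + p2) / 2

# Critical values: =NORM.S.INV(1-(1-confidence)/2) and =NORM.S.INV(power)
z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
z_beta = NormalDist().inv_cdf(power)

# Two-proportion formula, rounded up: =ROUNDUP(formula, 0)
raw_n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(baseline * (1 - baseline) + p2 * (1 - p2))) ** 2
         / (p2 - baseline) ** 2)
per_variant = math.ceil(raw_n)

total = per_variant * variants                   # =per_variant*variants
runtime_days = math.ceil(total / daily_traffic)  # =ROUNDUP(total/daily_traffic, 0)

print(per_variant, total, runtime_days)
```

With these assumptions the sheet would ask for 6,693 visitors per variant, 13,386 in total, and about 9 days of traffic.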
A spreadsheet version is useful because it can also include operational constraints. For example, if your page receives 3,000 sessions per week and your calculator says you need 24,000 total observations, the test will likely run for about eight weeks before accounting for exclusions or quality filters. That insight matters more than the raw sample size number because it influences business planning, campaign sequencing, and release calendars.
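The runtime arithmetic in that example is easy to extend with a filtering allowance. Here is a minimal Python sketch; the function name weeks_needed and the 10% exclusion rate are hypothetical assumptions for illustration, not figures from a real test.

```python
import math


def weeks_needed(total_observations: int, weekly_sessions: int,
                 exclusion_rate: float = 0.0) -> int:
    """Whole weeks needed once a share of sessions is filtered out."""
    usable_per_week = weekly_sessions * (1 - exclusion_rate)
    return math.ceil(total_observations / usable_per_week)


print(weeks_needed(24_000, 3_000))        # no exclusions: 8 weeks
print(weeks_needed(24_000, 3_000, 0.10))  # hypothetical 10% filtered out: 9 weeks
```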
Common mistakes when using an AB test sample size calculator Excel sheet
- Using an unrealistic uplift target. Teams often assume a large improvement because they want a shorter test. That does not make the estimate correct.
- Stopping early. Peeking at data every day and ending when the numbers look good increases false positives.
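- Ignoring traffic splits. If traffic is not split evenly, the smaller group takes longer to reach its required sample, so the total traffic needed rises.
- Running too many variants. Each extra variant needs its own sample, so total traffic grows with every arm you add, and comparing several variants against the control raises the overall false positive risk unless you adjust the threshold.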
- Mixing users and sessions. Decide which unit matters for the KPI and keep measurement consistent.
- Not accounting for seasonality. Weekday and weekend behavior can shift results, especially on ecommerce sites.
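To put a number on the traffic-split point, the sketch below uses a common generalization of the two-proportion formula to an uneven allocation, where k is the ratio of variant to control sample size. With k = 1 it reduces to the equal-split formula used earlier. This is a minimal Python sketch under that assumption; the function name total_sample_size and the 80/20 split are hypothetical.

```python
import math
from statistics import NormalDist


def total_sample_size(p1, p2, share_control=0.5, confidence=0.95, power=0.80):
    """Total observations for a two-sided two-proportion test with an
    uneven split, where k = n_variant / n_control."""
    k = (1 - share_control) / share_control
    p_bar = (p1 + k * p2) / (1 + k)  # pooled rate under the null hypothesis
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_control = ((z_alpha * math.sqrt(p_bar * (1 - p_bar) * (1 + 1 / k))
                  + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2) / k)) ** 2
                 / (p2 - p1) ** 2)
    n_control = math.ceil(n_control)
    n_variant = math.ceil(n_control * k)
    return n_control + n_variant


even = total_sample_size(0.05, 0.055, share_control=0.5)
skewed = total_sample_size(0.05, 0.055, share_control=0.8)  # 80% control, 20% variant
print(even, skewed, f"{skewed / even:.2f}x")
```

For a 5% baseline and a 10% relative uplift, sending 80% of traffic to the control raises the total requirement to roughly 1.5 times the even-split figure.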
When a one-sided test makes sense
A one-sided test assumes you only care about detecting a change in one direction. That lowers the critical value (at 95% confidence, from 1.960 to 1.645), which at 80% power cuts the required sample size by roughly a fifth. However, it should only be used when a decrease is irrelevant to your decision framework, which is rare in live product work. In most practical experimentation programs, the safer default is a two-sided test because both wins and losses matter.
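The size of that saving can be checked directly. This minimal Python sketch computes both versions of the two-proportion formula for an illustrative 5% baseline and 5.5% variant rate; the function name n_per_variant is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_variant(p1, p2, confidence=0.95, power=0.80, two_sided=True):
    """Two-proportion sample size per variant, one- or two-sided."""
    alpha = 1 - confidence
    # One-sided tests put all of alpha in a single tail
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    top = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(top / (p2 - p1) ** 2)


two = n_per_variant(0.05, 0.055)
one = n_per_variant(0.05, 0.055, two_sided=False)
print(two, one, f"{1 - one / two:.0%} fewer")
```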
How to interpret the output correctly
The number shown by this calculator is not a guarantee that your test will produce a winner. It is simply the estimated sample size needed to detect the specific effect you asked for under the chosen statistical assumptions. If the real effect is smaller than your target uplift, you may still reach the sample size and see no significant result. That outcome does not mean the test failed. It may mean the true effect was too small to justify rollout, or that the variation did not outperform meaningfully.
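A quick way to see why a smaller real effect often ends without significance is to compute the power the planned sample actually delivers. The minimal Python sketch below rearranges the two-proportion formula to solve for power at a fixed n and ignores the negligible chance of rejecting in the wrong direction. The function name achieved_power is hypothetical, and the 31,234 figure is the per-variant plan for a 10% uplift on a 5% baseline.

```python
import math
from statistics import NormalDist


def achieved_power(p1, p2, n_per_variant, confidence=0.95):
    """Approximate power of a two-sided two-proportion test at a given n."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - (1 - confidence) / 2)
    p_bar = (p1 + p2) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_variant)              # under no difference
    se_alt = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)  # under the true effect
    return z.cdf((abs(p2 - p1) - z_alpha * se_null) / se_alt)


n = 31_234  # planned for a 10% relative uplift on a 5% baseline
for true_uplift in (0.10, 0.05):
    p2 = 0.05 * (1 + true_uplift)
    print(f"true uplift {true_uplift:.0%}: power about {achieved_power(0.05, p2, n):.0%}")
```

At the planned uplift the test keeps its intended 80% power, but if the true uplift is only half as large, power falls below 30%, so a non-significant result is the most likely outcome.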
Another important point is that sample size should be planned before launch, not retrofitted after the data starts arriving. Pre-registration is common in formal experimental settings for this reason. If you set the rules after seeing partial outcomes, your false positive risk rises.
Final recommendation
If you are building or auditing an A/B test sample size calculator in Excel, start with conservative defaults: 95% confidence, 80% power, a realistic baseline, and a minimum detectable uplift that reflects true business value. Then run scenarios. Compare a 5%, 10%, and 15% uplift target. Estimate duration based on actual traffic, not optimistic forecasts. Finally, document the assumptions directly in your spreadsheet so other stakeholders can evaluate the logic before the experiment begins.
The real purpose of sample size planning is not to impress with statistical language. It is to support better decisions. A well-designed calculator helps your team know when to test, what effect size is worth pursuing, how long the test should run, and whether the expected traffic justifies the effort. That is what turns experimentation from guesswork into disciplined optimization.