Bayes Success-Run Theorem Calculator

Estimate the posterior probability that a hidden condition, strategy, or model is true after observing a run of repeated successes. This calculator applies Bayesian updating across conditionally independent success events and visualizes how confidence changes as the streak grows.


Formula used: P(H|Sⁿ) = [P(H) × P(S|H)ⁿ] / {[P(H) × P(S|H)ⁿ] + [(1 – P(H)) × P(S|not H)ⁿ]}. This assumes repeated trials are conditionally independent given the hypothesis.

Prior probability of hypothesis being true (%)

Example: 10 means you believe the hypothesis has a 10% chance before seeing the success run.

Success probability if hypothesis is true (%)

This is P(success | true hypothesis).

Success probability if hypothesis is false (%)

This is P(success | false hypothesis). It is the competing explanation’s success rate.

Observed number of consecutive successes

Enter the streak length n. A value of 0 returns the prior.

Chart maximum run length

The chart will show posterior probability from run 0 up to this length.

Decimal precision

Choose how many decimals to display in percentages and likelihood values.

Enter values and click Calculate Posterior to see the updated probability after a run of successes.

Expert Guide to the Bayes Success-Run Theorem Calculator

A Bayes success-run theorem calculator helps you answer a practical question that appears in science, finance, operations, machine learning, sports analytics, and medical testing: if you observe a sequence of successes, how much should your belief increase that the underlying explanation is actually true? The power of Bayesian reasoning is that it does not look only at the streak itself. It also respects the starting odds and the possibility that the same streak could occur even when your preferred explanation is false.

This distinction matters because long runs can feel persuasive even when the base rate is low. A trader may see several profitable days, a manufacturer may log several defect-free checks, a scientist may see repeated positive assay results, or an analyst may observe a model making several correct predictions in a row. Without Bayes, people often overreact to the streak. With Bayes, you compare two worlds: the world where your hypothesis is true and the world where it is false. The calculator then updates the prior probability into a posterior probability.

Prior: your starting belief before seeing the run.

Likelihood: how probable each success is under each competing explanation.

Posterior: your revised belief after the streak is observed.

What the calculator is actually computing

The calculator uses a straightforward repeated-event Bayesian update. Let H be the hypothesis that you care about, such as “the process is genuinely high quality,” “the strategy has an edge,” or “the patient truly has the condition.” Let S represent one observed success, and let Sⁿ denote n successes in a row. If the successes are conditionally independent given H, then:

P(H|Sⁿ) = [P(H) × P(S|H)ⁿ] / ([P(H) × P(S|H)ⁿ] + [(1 – P(H)) × P(S|not H)ⁿ])

There are four key inputs:

Prior probability: your belief in the hypothesis before the run starts.
Success probability if true: how likely one success is if the hypothesis is correct.
Success probability if false: how likely one success is even when the hypothesis is wrong.
Run length: the number of consecutive successes observed.
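
The four inputs map directly onto a small function. The following is a minimal sketch of the formula above, not the calculator's actual source; the function and parameter names are illustrative assumptions, and probabilities are expressed as fractions rather than percentages.

```python
def posterior_after_run(prior, p_success_true, p_success_false, run_length):
    """Return P(H | n consecutive successes), assuming the successes are
    conditionally independent given whether H is true or false."""
    for name, value in (("prior", prior),
                        ("p_success_true", p_success_true),
                        ("p_success_false", p_success_false)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if not isinstance(run_length, int) or run_length < 0:
        raise ValueError("run_length must be a non-negative integer")

    # Probability of the whole run in each of the two "worlds".
    likelihood_true = p_success_true ** run_length
    likelihood_false = p_success_false ** run_length

    numerator = prior * likelihood_true
    denominator = numerator + (1.0 - prior) * likelihood_false
    if denominator == 0.0:
        raise ValueError("the observed run is impossible under both hypotheses")
    return numerator / denominator


# A run of length 0 carries no evidence, so the result equals the prior.
print(posterior_after_run(0.10, 0.70, 0.20, 0))
print(posterior_after_run(0.10, 0.70, 0.20, 5))
```

Raising an error when the run is impossible under both hypotheses is a design choice of this sketch; a real calculator might instead display a message.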

This model works best when each success is conditionally independent given the hypothesis. In plain language, once you know whether the hypothesis is true or false, one success should not mechanically force the next success. That assumption is often reasonable in screening, repeated classification tasks, quality checks, and controlled experiments, but it can fail in highly autocorrelated systems like markets, social behavior, or operational bottlenecks. In those settings the calculator still offers useful intuition, but correlated successes typically carry less evidence than the formula assumes, so treat the resulting posterior as optimistic.

Why streaks can be misleading without Bayesian updating

Humans love narratives. A run of 5, 7, or 10 successes feels powerful because our brains naturally focus on recent observations. The problem is that a sequence can be impressive under both the true and the false hypothesis. Bayes forces a direct comparison. If the success rate is 70% when the hypothesis is true but 60% when it is false, each success multiplies the odds by only 0.70 / 0.60 ≈ 1.17, so even a long run may not tell you very much. By contrast, if the success rate is 80% under the true hypothesis and only 10% under the false hypothesis, each success multiplies the odds by 8, and the posterior rises sharply with every additional success.

Base rates matter just as much. A very low prior can be hard to overcome. This is the same logic that appears in disease screening: when a disease is rare, even a good test can produce more false positives than true positives. The success-run theorem extends that idea across repeated positives or repeated successes. As the streak gets longer, the posterior rises, but the pace depends on both the prior and the ratio between the true-hypothesis and false-hypothesis success rates.
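
The roles of the prior and the ratio of success rates are easiest to see in odds form. The restatement below is algebraically equivalent to the formula used by the calculator, under the same conditional-independence assumption:

```latex
\frac{P(H \mid S^{n})}{P(\lnot H \mid S^{n})}
  = \frac{P(H)}{1 - P(H)}
    \times \left( \frac{P(S \mid H)}{P(S \mid \lnot H)} \right)^{n}
```

As an illustration, a 10% prior corresponds to odds of 1 to 9. With success rates of 70% and 60%, ten successes multiply those odds by (7/6)¹⁰ ≈ 4.7, which gives a posterior of roughly 34%.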

Interpreting the output

Posterior probability tells you the revised chance that the hypothesis is true after observing the success run.
Bayes factor for the run compares how much more likely the run is if the hypothesis is true than if it is false. For independent successes, it is simply [P(S|H) / P(S|not H)]ⁿ.
Likelihood under each scenario shows the raw probability of the observed streak in each world.
Chart trend helps you see how evidence compounds as run length increases.
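
The four outputs can be sketched together in a few lines. This is an illustrative reconstruction rather than the page's own code: `RunSummary`, `summarize_run`, and `chart_series` are hypothetical names, and the sketch assumes a prior strictly between 0 and 1 and a nonzero false-hypothesis success rate.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    likelihood_true: float   # P(run | H)
    likelihood_false: float  # P(run | not H)
    bayes_factor: float      # ratio of the two likelihoods
    posterior: float         # P(H | run)


def summarize_run(prior, p_true, p_false, n):
    """Assumes 0 < prior < 1 and 0 < p_false <= 1. Extremely long runs can
    underflow to 0.0 in floating point, which this sketch does not handle."""
    likelihood_true = p_true ** n
    likelihood_false = p_false ** n
    weighted_true = prior * likelihood_true
    weighted_false = (1.0 - prior) * likelihood_false
    return RunSummary(
        likelihood_true=likelihood_true,
        likelihood_false=likelihood_false,
        bayes_factor=likelihood_true / likelihood_false,
        posterior=weighted_true / (weighted_true + weighted_false),
    )


def chart_series(prior, p_true, p_false, max_run):
    """Posterior for every run length from 0 to max_run inclusive."""
    return [summarize_run(prior, p_true, p_false, n).posterior
            for n in range(max_run + 1)]


summary = summarize_run(0.10, 0.70, 0.20, 5)
print(summary)
print(chart_series(0.10, 0.70, 0.20, 5))
```

The first element of the series is always the prior, because a run of length 0 carries no evidence.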

For many users, the chart is the most useful part. It reveals whether confidence rises gradually, accelerates after several successes, or stalls because the competing false-hypothesis success rate remains high. This visual perspective is especially important for decision-making because it lets you ask practical questions like: “At what streak length would my posterior exceed 80%?” or “How much stronger does my edge need to be before a run becomes persuasive?”
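
The threshold question can also be answered without reading it off the chart. Solving the odds form of the update for n gives the shortest run that reaches a target posterior. The sketch below assumes conditional independence; `min_run_for_target` is a hypothetical helper name, not part of the calculator.

```python
import math


def min_run_for_target(prior, p_true, p_false, target):
    """Smallest run length n whose posterior is at least `target`.

    Returns None when no run can reach the target because a success is no
    more likely under H than under the alternative."""
    if not (0.0 < prior < 1.0 and 0.0 < target < 1.0):
        raise ValueError("prior and target must be strictly between 0 and 1")
    if not (0.0 < p_false <= 1.0 and 0.0 < p_true <= 1.0):
        raise ValueError("success rates must be in (0, 1]")
    if prior >= target:
        return 0      # already at or above the threshold before any success
    if p_true <= p_false:
        return None   # each success leaves the odds flat or lowers them

    prior_odds = prior / (1.0 - prior)
    target_odds = target / (1.0 - target)
    # Need prior_odds * (p_true / p_false) ** n >= target_odds; solve for n.
    # Rounding could matter only if the threshold is hit exactly.
    n_real = math.log(target_odds / prior_odds) / math.log(p_true / p_false)
    return math.ceil(n_real)


print(min_run_for_target(0.10, 0.70, 0.20, 0.80))  # strong discrimination
print(min_run_for_target(0.10, 0.60, 0.40, 0.80))  # weak discrimination
```

With a 10% prior and an 80% target, rates of 70% versus 20% need 3 successes, while rates of 60% versus 40% need 9.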

Comparison table: how the same streak behaves in different settings

Scenario	Prior P(H)	P(S|H)	P(S|not H)	Run Length	Posterior P(H|Sⁿ)
Weak evidence environment	10%	60%	40%	5	45.8%
Moderate evidence environment	10%	70%	20%	5	98.3%
High discrimination environment	10%	85%	5%	5	99.9994%
Low base-rate environment	1%	85%	5%	5	99.993%

The lesson from the table is clear. A streak only becomes compelling when the success probability under the true hypothesis is much larger than under the false hypothesis. If those probabilities are close, you may need many more observed successes before your confidence moves meaningfully.
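
To see why the ratio of the two success rates dominates, the scenarios can be recomputed in odds form. This is a minimal sketch under the conditional-independence assumption; the scenario tuples copy the table's inputs, and `posterior_from_odds` is an illustrative name.

```python
def posterior_from_odds(prior, p_true, p_false, n):
    """Odds-form update: posterior odds = prior odds * (p_true / p_false) ** n."""
    prior_odds = prior / (1.0 - prior)
    run_bayes_factor = (p_true / p_false) ** n
    posterior_odds = prior_odds * run_bayes_factor
    return posterior_odds / (1.0 + posterior_odds), run_bayes_factor


# (label, prior, P(S|H), P(S|not H), run length)
scenarios = [
    ("Weak evidence environment", 0.10, 0.60, 0.40, 5),
    ("Moderate evidence environment", 0.10, 0.70, 0.20, 5),
    ("High discrimination environment", 0.10, 0.85, 0.05, 5),
    ("Low base-rate environment", 0.01, 0.85, 0.05, 5),
]

results = {}
for label, prior, p_true, p_false, n in scenarios:
    posterior, bayes_factor = posterior_from_odds(prior, p_true, p_false, n)
    results[label] = posterior
    print(f"{label}: per-success ratio {p_true / p_false:.2f}, "
          f"run Bayes factor {bayes_factor:,.1f}, posterior {posterior:.4%}")
```

A per-success ratio of 1.5 compounds slowly over five successes, whereas a ratio of 17 compounds to a Bayes factor above one million, which is enough to overwhelm even a 1% prior.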

Real statistics: medical testing and repeated positive results

Bayesian reasoning is especially important in medicine because prevalence can be low and false positives are never zero. According to the U.S. Food and Drug Administration, sensitivity and specificity are the core performance metrics used to describe many diagnostic tests, and their interpretation depends on prevalence in the target population. The National Cancer Institute and other public-health sources also emphasize that screening outcomes must be read in context rather than as isolated results.

Published screening concept	Typical statistical idea	Why Bayes matters	Success-run relevance
Mammography screening examples often use roughly 80% sensitivity with false-positive rates around 7% to 10% in illustrative Bayesian teaching cases	Positive tests are more common when disease is present, but can still occur when absent	Low prevalence can keep the posterior moderate after only one positive result	Multiple independent positives can raise the posterior sharply
Many infectious disease tests are reported using sensitivity and specificity on FDA or CDC informational pages	Accuracy metrics are conditional on true disease status	Posterior probability changes with background prevalence	Repeated positives or confirmatory tests are naturally modeled as a success run
Quality control assays in public health laboratories often require repeat confirmation	Replication reduces the chance that one positive is only noise	Each additional success multiplies evidence	A success-run model quantifies that compounding evidence directly

If you use this calculator for repeated positive tests, “success” simply means “positive test result.” The hypothesis H is “the condition is actually present.” P(S|H) becomes test sensitivity and P(S|not H) becomes the false-positive rate, which equals 1 – specificity. A run of independent positives then maps directly into the Bayesian update. This is one reason confirmatory testing is so valuable in low-prevalence settings.
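
The mapping from diagnostic metrics to calculator inputs can be sketched as follows. The 80% sensitivity and 10% false-positive rate echo the illustrative teaching figures in the table above; the 1% prevalence is an assumption chosen for this example, not a published statistic, and the function name is hypothetical.

```python
def posterior_after_positives(prevalence, sensitivity, specificity, n_positives):
    """P(condition present | n positive results), assuming the repeated
    tests are conditionally independent given the true disease status."""
    false_positive_rate = 1.0 - specificity          # plays the role of P(S | not H)
    numerator = prevalence * sensitivity ** n_positives
    denominator = numerator + (1.0 - prevalence) * false_positive_rate ** n_positives
    return numerator / denominator


# Assumed inputs: 1% prevalence, 80% sensitivity, 90% specificity.
for n in range(4):
    posterior = posterior_after_positives(0.01, 0.80, 0.90, n)
    print(f"{n} positive result(s): {posterior:.1%}")
```

Under these assumed inputs, the posterior is roughly 7.5% after one positive, 39% after two, and 84% after three. Repeating the same test on the same sample may not satisfy the independence assumption, so real confirmatory protocols often use a different test method.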

When to use a Bayes success-run theorem calculator

Evaluating whether a trading or betting strategy has a real edge after several wins.
Estimating whether a production process is truly in control after repeated passes.
Updating confidence in a machine learning model after a series of correct predictions.
Interpreting repeated positive or repeated successful screening outcomes.
Assessing whether a salesperson, team, or process is outperforming chance.
Comparing competing hypotheses in experiments that generate repeated binary outcomes.

Common mistakes users make

Ignoring the prior: A streak does not erase a tiny base rate instantly.
Using unrealistic false-hypothesis success rates: If P(S|not H) is understated, the posterior will be inflated.
Assuming independence when it is not justified: Correlated successes make the streak look stronger than it really is.
Confusing posterior with certainty: Even a high posterior remains probabilistic, not absolute proof.
Overlooking sample design: Selective reporting can turn ordinary variation into an apparently meaningful run.

How to choose better inputs

If you are unsure about the prior, use a range and test several values. For example, compare a conservative prior, a neutral prior, and an optimistic prior. If you are unsure about P(S|not H), be especially careful because this parameter often drives the result more than people expect. In operational settings, a good approach is to estimate both success rates from historical data. In scientific settings, use published validation studies when available. In strategic settings such as forecasting or investing, avoid overfitting recent streaks and use a broad benchmark.

Another good practice is sensitivity analysis. Keep the observed run length fixed and vary one parameter at a time. If a tiny change in the false-hypothesis success rate dramatically changes the posterior, then the decision should be treated as fragile. The interactive chart on this page is useful for that purpose because it helps you see whether confidence grows steadily or depends on optimistic assumptions.
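
A one-parameter-at-a-time sweep is straightforward to script. The sketch below holds the run length at 5 and varies first the false-hypothesis success rate, then the prior. All numeric values, including the prior labels, are illustrative assumptions rather than recommended defaults.

```python
def posterior_after_run(prior, p_true, p_false, n):
    """Posterior after n successes, assuming conditional independence."""
    numerator = prior * p_true ** n
    return numerator / (numerator + (1.0 - prior) * p_false ** n)


RUN_LENGTH = 5

# Sweep 1: vary P(S | not H) with prior = 10% and P(S | H) = 70% held fixed.
p_false_sweep = {
    p_false: posterior_after_run(0.10, 0.70, p_false, RUN_LENGTH)
    for p_false in (0.20, 0.30, 0.40, 0.50, 0.60)
}

# Sweep 2: vary the prior with P(S | H) = 70% and P(S | not H) = 40% held fixed.
prior_sweep = {
    label: posterior_after_run(prior, 0.70, 0.40, RUN_LENGTH)
    for label, prior in (("conservative", 0.01),
                         ("neutral", 0.10),
                         ("optimistic", 0.50))
}

for p_false, posterior in p_false_sweep.items():
    print(f"P(S|not H) = {p_false:.0%}: posterior {posterior:.1%}")
for label, posterior in prior_sweep.items():
    print(f"{label} prior: posterior {posterior:.1%}")

# A large spread signals that the conclusion is fragile to this input.
spread = max(p_false_sweep.values()) - min(p_false_sweep.values())
print(f"Spread across P(S|not H) values: {spread:.1%}")
```

In this example the posterior falls from about 98% to about 19% as the false-hypothesis success rate moves from 20% to 60%, which shows how strongly that single input can drive the result.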

Best practices for decision-making

Use the posterior as one input to a decision, not the only input.
Check whether independence is plausible before trusting the run-length update.
Model the strongest reasonable alternative, not a weak strawman false hypothesis.
Look at how many successes are needed to cross important thresholds such as a 50%, 80%, or 95% posterior probability.
Document assumptions so your conclusion can be reviewed later.

Final takeaway

A Bayes success-run theorem calculator is valuable because it converts an intuitive but often misleading idea, “we have a streak, so it must be real,” into a disciplined probability update. The key insight is that evidence accumulates multiplicatively, but only in relation to the alternative explanation and the prior odds. If your true-hypothesis success rate is substantially larger than the false-hypothesis success rate, a run of successes can drive posterior belief upward very quickly. If the two rates are close, the same streak may provide surprisingly little evidence. Use the calculator to test realistic assumptions, inspect the chart, and make decisions with a more rigorous understanding of what repeated success actually means.
