Hypergeometric Random Variable Calculator
Compute exact probabilities for sampling without replacement. Enter your population size, number of successes in the population, sample size, and target number of observed successes to evaluate the hypergeometric distribution instantly.
- Population size (N): total number of items in the population.
- Successes in population (K): items classified as successes in the full population.
- Sample size (n): how many items are drawn without replacement.
- Observed successes (x): the target number of successes in your sample.
Choose whether you want an exact probability or a cumulative probability.
Chart shows the probability mass function across all feasible values of X.
Expert Guide to the Hypergeometric Random Variable Calculator
A hypergeometric random variable calculator helps you compute exact probabilities when you draw items from a finite population without replacement. This is a crucial distinction. Many probability tools assume independence between draws, but hypergeometric models recognize that each draw changes the composition of the remaining population. If you remove one success from the pool, there is one fewer success available on the next draw. That changing structure is what makes the hypergeometric distribution so useful in quality control, auditing, genetics, epidemiology, card games, survey verification, and acceptance sampling.
Suppose a shipment contains 100 components, 8 of which are defective. If an inspector selects 10 parts without replacement, what is the probability of finding exactly 1 defective part? That is a classic hypergeometric question. The calculator above solves this by using the exact combinatorial formula rather than an approximation. This matters because exact probabilities can drive operational decisions such as whether a batch is accepted, whether a process remains in statistical control, or whether a sample should trigger a deeper investigation.
What the hypergeometric distribution measures
The hypergeometric distribution models the number of successes in a fixed-size sample drawn without replacement from a finite population. It uses four values:
- N: total population size
- K: number of successes in the population
- n: sample size
- x: number of observed successes in the sample
The probability of observing exactly x successes is:
P(X = x) = [C(K, x) × C(N - K, n - x)] / C(N, n)
Here, C(a, b) is the number of combinations of b items chosen from a items.
This formula is elegant because it counts all favorable samples and divides by the total number of possible samples. The term C(K, x) counts ways to choose the successes, and C(N - K, n - x) counts ways to choose the failures. Dividing by C(N, n) converts that count into a probability.
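The counting argument above translates almost directly into code. The following is a minimal sketch, not the calculator's actual implementation; the function name `hypergeom_pmf` is an illustrative assumption, and it relies only on `math.comb` from the Python standard library.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """Return P(X = x) for n draws without replacement from N items, K of them successes."""
    # Counts outside the feasible range cannot occur, so they get probability zero.
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    favorable = comb(K, x) * comb(N - K, n - x)  # ways to pick x successes and n - x failures
    total = comb(N, n)                           # all possible samples of size n
    return favorable / total


# Shipment example from the introduction: 100 components, 8 defective, 10 inspected.
p_one = hypergeom_pmf(1, N=100, K=8, n=10)
print(f"P(exactly 1 defective) = {p_one:.4f}")
```

Because `math.comb` works with exact integers, the only rounding happens in the final division.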
When to use this calculator
Use a hypergeometric random variable calculator when sampling is done without replacement from a finite population. Common examples include:
- Manufacturing quality inspection: A lot contains a known or estimated number of defective units, and inspectors sample a subset.
- Audit testing: An auditor examines a sample of transactions from a known pool containing a number of exceptions.
- Medical screening studies: Researchers sample a finite registry and count records with a specific trait.
- Lottery or card problems: Drawing cards from a deck without replacement is naturally hypergeometric.
- Ecology and genetics: Sampling organisms or alleles from a finite population often follows this structure.
If your draws are independent or effectively independent because the population is extremely large relative to the sample, a binomial model may be acceptable. But if the sample is not tiny relative to the population, hypergeometric probabilities are usually the correct choice.
How the calculator works
The calculator above supports both exact and cumulative probabilities:
- P(X = x): probability of exactly x successes
- P(X ≤ x): probability of x or fewer successes
- P(X ≥ x): probability of x or more successes
- P(X < x): probability of fewer than x successes
- P(X > x): probability of more than x successes
It also reports the mean and variance of the distribution. For a hypergeometric random variable, these are:
E(X) = n × (K / N)
Var(X) = n × (K / N) × (1 - K / N) × ((N - n) / (N - 1))
The extra factor (N - n) / (N - 1) is called the finite population correction. It is below 1 whenever n > 1, so the variance is smaller than the binomial variance n × p × (1 - p) with p = K / N. Each draw without replacement removes an item from the pool, so the sample carries more information about the population as n grows. At n = N the factor is zero, because the sample is the entire population.
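The five probability modes and the two summary statistics can be sketched in a few lines. This is an illustration under stated assumptions: the mode keys (`"eq"`, `"le"`, and so on) and the function names are hypothetical and are not taken from the calculator itself.

```python
import operator
from math import comb

# Hypothetical mode names; the calculator's own option labels may differ.
COMPARISONS = {
    "eq": operator.eq,  # P(X = x)
    "le": operator.le,  # P(X <= x)
    "ge": operator.ge,  # P(X >= x)
    "lt": operator.lt,  # P(X < x)
    "gt": operator.gt,  # P(X > x)
}


def pmf(k, N, K, n):
    """P(X = k); zero outside the feasible range."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def probability(x, N, K, n, mode="eq"):
    """Sum the pmf over every feasible k that satisfies the chosen comparison with x."""
    keep = COMPARISONS[mode]
    lo, hi = max(0, n - (N - K)), min(n, K)
    return sum(pmf(k, N, K, n) for k in range(lo, hi + 1) if keep(k, x))


def mean(N, K, n):
    return n * K / N


def variance(N, K, n):
    p = K / N
    # (N - n) / (N - 1) is the finite population correction; this requires N > 1.
    return n * p * (1 - p) * (N - n) / (N - 1)
```

Summing over the feasible range keeps every cumulative mode consistent with the exact probabilities, for example P(X ≤ x) + P(X > x) = 1.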
Interpreting each input correctly
To avoid mistakes, it helps to map your real-world problem carefully:
- Population size (N) is the full set of items from which the sample is drawn.
- Successes in population (K) means the items with the trait you care about. A success is just a label, not necessarily a good outcome.
- Sample size (n) is how many items are actually drawn.
- Observed successes (x) is the count you want the probability for.
For instance, if a class has 30 students, 18 are undergraduates, and you randomly pick 5 students for interviews, then N = 30, K = 18, n = 5. If you want the chance that exactly 4 are undergraduates, then x = 4.
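For reference, the class example works out as follows when the values N = 30, K = 18, n = 5, and x = 4 are substituted into the probability formula:

```latex
P(X = 4) = \frac{\binom{18}{4}\,\binom{12}{1}}{\binom{30}{5}}
         = \frac{3060 \times 12}{142506}
         = \frac{36720}{142506} \approx 0.2577
```

So roughly one interview panel in four would contain exactly 4 undergraduates.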
Hypergeometric versus binomial: why the distinction matters
People often use the binomial distribution because it is simpler, but the hypergeometric model is more accurate when sampling without replacement. The difference may be modest in huge populations, but it can become substantial when the sample is a meaningful fraction of the population.
| Feature | Hypergeometric | Binomial |
|---|---|---|
| Sampling method | Without replacement | With replacement or independent trials |
| Population size | Finite and explicitly modeled | Not directly modeled |
| Probability of success per draw | Changes after each draw | Constant across draws |
| Typical uses | Quality control, auditing, finite sampling | Repeated independent experiments |
| Variance | n × p × (1 - p) × (N - n) / (N - 1), with p = K / N | n × p × (1 - p) |
As a practical rule, when the sample size exceeds about 5 percent to 10 percent of the population, the hypergeometric model is usually preferred. In regulated settings such as audits, laboratory testing, and acceptance sampling, exact modeling is often the better professional choice.
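One way to see how the two models drift apart is to hold the success rate and sample size fixed and vary the population size. The sketch below is illustrative only; the helper names and the 8 percent success rate are assumptions chosen to echo the shipment example, not values prescribed by any standard.

```python
from math import comb


def hypergeom_pmf(x, N, K, n):
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def max_abs_gap(N, K, n):
    """Largest pointwise difference between the two pmfs, using p = K / N for the binomial."""
    p = K / N
    return max(abs(hypergeom_pmf(x, N, K, n) - binom_pmf(x, n, p)) for x in range(n + 1))


# Same 8% success rate and sample of 10, with a growing population.
for N in (100, 1_000, 10_000):
    K = N * 8 // 100
    print(f"N = {N:>6}: sample fraction = {10 / N:.3%}, max gap = {max_abs_gap(N, K, 10):.5f}")
```

The gap shrinks as the sample becomes a smaller share of the population, which is the intuition behind the 5 to 10 percent rule of thumb.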
Real statistics and practical context
Finite-population sampling is not an abstract topic. It appears repeatedly in official statistics and public health methodology. The Centers for Disease Control and Prevention covers probability concepts used in epidemiologic study design, and the U.S. Census Bureau publishes extensive guidance on sampling and modeling for population-based estimation. Educational resources from Penn State University also explain discrete distributions, combinations, and exact probability methods used in formal statistics courses.
In industrial practice, sample sizes can be small in absolute terms yet still a meaningful fraction of the lot. For example, a batch of 50 or 125 units may be tested with sample sizes such as 5, 8, or 13; a sample of 5 from 50 or 13 from 125 is already about 10 percent of the lot. In those conditions, sampling without replacement is the natural model, not a binomial shortcut. Likewise, educational assessment audits and small registry reviews often involve compact finite populations where exact counts materially affect probability estimates.
| Scenario | Population N | Successes K | Sample n | Question |
|---|---|---|---|---|
| Quality control lot | 100 parts | 8 defectives | 10 inspected | Probability of exactly 1 defective |
| Card hand in poker | 52 cards | 4 aces | 5 cards | Probability of exactly 1 ace |
| Medical chart review | 200 records | 35 flagged cases | 20 reviewed | Probability of at least 5 flagged cases |
| Audit exception testing | 500 invoices | 22 exceptions | 30 sampled | Probability of zero exceptions |
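The four questions in the table can each be answered with the same formula. The following sketch is one possible way to evaluate them; the dictionary keys are made-up labels for illustration.

```python
from math import comb


def pmf(x, N, K, n):
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


scenarios = {
    # Exactly 1 defective among 10 inspected, lot of 100 with 8 defectives.
    "quality_lot_exactly_1": pmf(1, 100, 8, 10),
    # Exactly 1 ace in a 5-card hand from a 52-card deck.
    "poker_exactly_1_ace": pmf(1, 52, 4, 5),
    # At least 5 flagged cases among 20 reviewed: sum the upper tail.
    "chart_review_at_least_5": sum(pmf(x, 200, 35, 20) for x in range(5, 21)),
    # Zero exceptions among 30 sampled invoices.
    "audit_zero_exceptions": pmf(0, 500, 22, 30),
}

for name, p in scenarios.items():
    print(f"{name}: {p:.4f}")
```

Note that the "at least" question needs a sum over several values of x, while the others are single evaluations of the formula.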
Step by step example
Imagine a box contains 25 lightbulbs, of which 6 are defective. You randomly test 5 bulbs without replacement. What is the probability that exactly 2 bulbs are defective?
- Set N = 25.
- Set K = 6.
- Set n = 5.
- Set x = 2.
- Select P(X = x).
The calculator evaluates the combinations and returns the exact probability: C(6, 2) × C(19, 3) / C(25, 5) = (15 × 969) / 53130 = 14535 / 53130 ≈ 0.2736. It also displays the expected number of defective bulbs in the sample, which is 5 × (6/25) = 1.2. That does not mean you will observe 1.2 defects. It means that if the same sampling process were repeated many times, the long-run average number of defects per sample would approach 1.2.
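The lightbulb example can be cross-checked in two ways: an exact fraction for the probability, and a small simulation for the long-run average. This is a sketch under assumptions; the function name, trial count, and seed are arbitrary illustrative choices.

```python
import random
from fractions import Fraction
from math import comb

N, K, n, x = 25, 6, 5, 2  # box size, defective bulbs, bulbs tested, target count

# Exact probability as a reduced fraction, avoiding floating-point rounding.
exact = Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))
print(exact, float(exact))


def simulate_mean_defects(trials=20_000, seed=0):
    """Average number of defective bulbs per sample over many simulated draws."""
    rng = random.Random(seed)
    box = [1] * K + [0] * (N - K)  # 1 marks a defective bulb
    total = sum(sum(rng.sample(box, n)) for _ in range(trials))
    return total / trials


print(simulate_mean_defects())
```

`random.sample` draws without replacement, so each simulated sample follows the same process the formula describes, and the simulated average should settle near 1.2.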
Understanding feasible values of X
Not every x is possible. The valid range of a hypergeometric random variable is:
max(0, n - (N - K)) ≤ X ≤ min(n, K)
This range matters because impossible values should receive probability zero. If there are only 3 successes in the population, you cannot observe 4 successes in your sample. Similarly, if the sample is larger than the number of failures in the population, some successes must appear: with N = 10, K = 7, and n = 5 there are only 3 failures, so at least 2 successes are guaranteed and values below 2 are impossible. The calculator automatically builds the chart using only the feasible support of the distribution.
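The feasible range is simple to compute directly. The helper below is a minimal sketch with an assumed name (`support`) and assumed input checks; the calculator's own validation rules are not documented here.

```python
def support(N: int, K: int, n: int) -> range:
    """Feasible values of X for n draws from N items containing K successes."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    lowest = max(0, n - (N - K))   # successes forced once the failures run out
    highest = min(n, K)            # cannot exceed the sample size or the available successes
    return range(lowest, highest + 1)


print(list(support(10, 3, 5)))  # only 3 successes exist, so 4 or more is impossible
print(list(support(10, 7, 5)))  # only 3 failures exist, so at least 2 successes must appear
```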
Why the chart is valuable
The distribution chart helps you move beyond a single probability. Instead of only asking for one x value, you can visually inspect where the mass of the distribution lies. This is especially useful when comparing a threshold probability such as P(X ≥ 4) against the most likely outcomes. In operational settings, the graph can clarify whether an observed result is ordinary, moderately unusual, or highly unlikely.
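As a rough stand-in for the graphical chart, the probability mass function can be rendered as text bars. This sketch is purely illustrative and is not how the calculator draws its chart; the function name and bar width are assumptions.

```python
from math import comb


def pmf(x, N, K, n):
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def pmf_chart(N, K, n, width=40):
    """Return one text line per feasible x, with bars scaled so the peak fills `width`."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    probs = {x: pmf(x, N, K, n) for x in range(lo, hi + 1)}
    peak = max(probs.values())
    lines = []
    for x, p in probs.items():
        bar = "#" * round(width * p / peak)
        lines.append(f"{x:>3} | {bar:<{width}} {p:.4f}")
    return lines


# Lightbulb example: 25 bulbs, 6 defective, 5 tested.
print("\n".join(pmf_chart(25, 6, 5)))
```

Even in this crude form, the output makes it easy to see which counts carry most of the probability and how quickly the tail thins out.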
Common mistakes to avoid
- Using the binomial distribution when sampling is without replacement from a small or moderate population.
- Confusing the sample size n with the number of successes in the population K.
- Entering a value of x outside the feasible range.
- Interpreting expected value as the most likely single outcome.
- Ignoring cumulative probability when the question asks for at most, at least, fewer than, or more than.
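The difference between the expected value and the most likely outcome is easy to check numerically. The sketch below uses the lightbulb example; the function name is an illustrative assumption.

```python
from math import comb


def pmf(x, N, K, n):
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def mode_and_mean(N, K, n):
    """Return the most likely value of X alongside its expected value."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    mode = max(range(lo, hi + 1), key=lambda k: pmf(k, N, K, n))
    return mode, n * K / N


mode, mean = mode_and_mean(25, 6, 5)
print(f"most likely count = {mode}, expected value = {mean}")
```

The mode is always a whole number in the feasible range, while the mean is a long-run average that usually is not.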
Professional applications
Hypergeometric modeling appears in several serious analytical workflows. In acceptance sampling, managers may use exact probabilities to estimate the chance that a lot with a given number of defectives passes inspection. In internal auditing, the same logic can quantify the chance of finding a certain number of policy violations in a sample of claims or invoices. In genetics and bioinformatics, enrichment analysis often relies on hypergeometric calculations to test whether a selected subset is overrepresented for specific features. In survey validation work, the method supports finite-population reasoning when subsets are drawn from a complete frame.
How to decide whether your result is surprising
A single exact probability gives one lens, but context matters. If the observed x falls in a low-probability tail, that may indicate an unusual outcome under the assumed population composition. Analysts often compare the observation with cumulative tail areas such as P(X ≥ x) or P(X ≤ x). Those tail probabilities answer a stronger question: how unusual is this result or a more extreme one? If that probability is very small, the observation may suggest that the assumed number of successes in the population is inaccurate, or that a process has shifted.
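Tail probabilities of this kind can be computed by summing the exact probabilities over the relevant side of the distribution. The sketch below is illustrative; the function names are assumptions, and the observed count of 4 defectives is a hypothetical value invented for the example.

```python
from math import comb


def pmf(x, N, K, n):
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def upper_tail(x_obs, N, K, n):
    """P(X >= x_obs): the observed count or anything more extreme on the high side."""
    return sum(pmf(k, N, K, n) for k in range(x_obs, min(n, K) + 1))


def lower_tail(x_obs, N, K, n):
    """P(X <= x_obs): the observed count or anything more extreme on the low side."""
    return sum(pmf(k, N, K, n) for k in range(max(0, n - (N - K)), x_obs + 1))


# Hypothetical observation: a lot assumed to hold 8 defectives in 100 yields 4 in a sample of 10.
p_high = upper_tail(4, N=100, K=8, n=10)
print(f"P(X >= 4) = {p_high:.4f}")
```

A small upper-tail value here would suggest that the assumed count of 8 defectives is too low, or that the process has shifted. What counts as "small" is a judgment that depends on the application.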
Final takeaways
A hypergeometric random variable calculator is the right tool whenever you sample without replacement from a finite population and need exact discrete probabilities. It is especially valuable when the sample is a nontrivial share of the population, when compliance or quality decisions depend on precision, or when approximation error could change an interpretation. By entering N, K, n, and x, you can obtain not just a probability, but a better understanding of the full distribution, the expected number of successes, and the degree of uncertainty around that expectation.
Use the calculator above to test exact outcomes, cumulative thresholds, and distribution shape. Whether you are analyzing inspection data, designing a sample-based audit, or solving a classroom probability problem, the hypergeometric model provides a rigorous answer grounded in finite-population logic.