Calculate Prediction Interval in R

Use this interactive calculator to estimate a prediction interval for a new observation from a regression model, then learn how to reproduce the result in R with correct syntax, interpretation, and practical modeling guidance.

Predicted value (y-hat)

The fitted value from your regression model for the new x value.

Residual standard error (s)

Also called the standard error of the regression or residual standard deviation.

Leverage of new point (h)

For a new point near the center of the observed predictor values, h is small; it grows as the point moves toward the extremes of the data. In R, predict() computes this term internally.

Degrees of freedom

Usually residual degrees of freedom: n – p, where p is the number of estimated parameters.

Confidence level

The nominal coverage of the interval, such as 0.95. Higher levels produce wider intervals.

R output style

Prediction intervals are wider because they include individual outcome variability.


How to calculate a prediction interval in R

A prediction interval estimates the range where a future individual observation is likely to fall, given a fitted statistical model and a chosen confidence level. In R, this task is usually done with the predict() function on a fitted model such as lm(). While many users know how to produce a fitted value, confusion often begins when deciding whether to request a confidence interval or a prediction interval. The difference matters because each interval answers a different question.

If your goal is to estimate the expected average response at a given predictor value, you usually want a confidence interval. If your goal is to estimate where an actual single future case may land, you want a prediction interval. Since individual outcomes vary around the regression line, a prediction interval must account for both uncertainty in the estimated mean and the random scatter of future observations. That extra uncertainty makes prediction intervals noticeably wider.

The core formula behind a prediction interval

For a linear regression model, the prediction interval for a new observation at predictor value x₀ is commonly expressed as:

y-hat ± t* × s × sqrt(1 + h)

y-hat is the predicted value from the model
t* is the critical value from the t distribution for the selected confidence level and residual degrees of freedom
s is the residual standard error
h is the leverage term for the new predictor location

By contrast, a confidence interval for the mean response uses:

y-hat ± t* × s × sqrt(h)

The only visible difference is the 1 + h term instead of just h, but that difference is conceptually large. The added 1 captures the random variation of a brand-new observation.
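Where the extra 1 comes from can be seen from the variance of the prediction error. The sketch below assumes the standard ordinary least squares setup, with independent errors of constant variance and a new error that is independent of the fitted model. The symbols X (design matrix), x_0 (the new predictor row, including the intercept), and σ² (error variance) are notation introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
\operatorname{Var}(\hat{y}_0) &= \sigma^2 \, x_0^\top (X^\top X)^{-1} x_0 = \sigma^2 h, \\
\operatorname{Var}(y_{\text{new}} - \hat{y}_0) &= \operatorname{Var}(\varepsilon_{\text{new}}) + \operatorname{Var}(\hat{y}_0) = \sigma^2 (1 + h).
\end{aligned}
```

Replacing σ with its estimate s gives the s × sqrt(h) and s × sqrt(1 + h) terms used in the two intervals. In simple linear regression, h reduces to 1/n + (x₀ − x̄)² / Σ(xᵢ − x̄)², which is why leverage grows as x₀ moves away from the mean of the observed x values.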

How R calculates prediction intervals

In R, the standard workflow looks like this:

model <- lm(y ~ x, data = mydata)
newdata <- data.frame(x = 10)
predict(model, newdata = newdata, interval = "prediction", level = 0.95)

This returns a matrix with three columns:

fit: the predicted value
lwr: the lower bound of the prediction interval
upr: the upper bound of the prediction interval

If you instead write interval = "confidence", R returns the confidence interval for the mean response rather than the interval for an individual future observation. This distinction is one of the most common sources of errors in applied analysis, especially in business forecasting, quality control, and biomedical studies where the user wants to predict a single future case.
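One way to convince yourself of what predict() is doing is to rebuild both intervals by hand and compare. The following is a minimal sketch, not a description of R's internal code: it uses the built-in cars dataset as a stand-in (an assumption, since mydata is not available here) and computes the leverage of the new point from the design matrix.

```r
# Illustrative stand-in data: built-in `cars` (speed, dist), not the article's `mydata`
model     <- lm(dist ~ speed, data = cars)
new_point <- data.frame(speed = 15)

X  <- model.matrix(model)        # design matrix used in the fit
x0 <- c(1, new_point$speed)      # intercept term plus the new predictor value

# Leverage of the new point: h = x0' (X'X)^{-1} x0
h <- drop(t(x0) %*% solve(crossprod(X)) %*% x0)

s      <- sigma(model)                        # residual standard error
t_star <- qt(0.975, df = df.residual(model))  # critical value for a 95% interval
y_hat  <- sum(coef(model) * x0)               # fitted value at the new point

manual_pi <- y_hat + c(-1, 1) * t_star * s * sqrt(1 + h)
manual_ci <- y_hat + c(-1, 1) * t_star * s * sqrt(h)

# Intervals as returned by predict(): a matrix with columns fit, lwr, upr
r_pi <- predict(model, new_point, interval = "prediction", level = 0.95)
r_ci <- predict(model, new_point, interval = "confidence", level = 0.95)

all.equal(manual_pi, unname(r_pi[1, c("lwr", "upr")]))
all.equal(manual_ci, unname(r_ci[1, c("lwr", "upr")]))
```

Both comparisons should report TRUE, which confirms that the only difference between the two interval types is the sqrt(1 + h) versus sqrt(h) factor.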

Example in R

sales_model <- lm(sales ~ ad_spend, data = marketing)
new_campaign <- data.frame(ad_spend = 5000)
predict(sales_model, newdata = new_campaign, interval = "prediction", level = 0.95)

If the fitted value is 120 units, the residual standard error is 15, the leverage is 0.08, and the residual degrees of freedom are 28, the 95% prediction interval is approximately:

fit = 120
t* ≈ 2.048
standard error for prediction = 15 × sqrt(1 + 0.08) ≈ 15.59
margin of error ≈ 31.93
prediction interval ≈ [88.07, 151.93]

That means a future individual observation is expected to fall in that range about 95% of the time, assuming the model assumptions hold and the new observation is drawn from the same process as the training data.
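The arithmetic above can be reproduced in a few lines of base R. This sketch uses only the summary numbers quoted in the example (120, 15, 0.08, 28) and does not need the marketing data; the variable names are chosen here for illustration.

```r
# Inputs taken from the worked example in the text
y_hat <- 120   # fitted value
s     <- 15    # residual standard error
h     <- 0.08  # leverage of the new point
df    <- 28    # residual degrees of freedom
level <- 0.95

t_star  <- qt(1 - (1 - level) / 2, df = df)  # two-sided t critical value
se_pred <- s * sqrt(1 + h)                   # standard error for prediction
margin  <- t_star * se_pred                  # margin of error

interval <- c(lwr = y_hat - margin, upr = y_hat + margin)

# Rounded to two decimals for comparison with the figures listed above
round(c(t_star = t_star, se_pred = se_pred, margin = margin, interval), 2)
```

Changing level, h, or df in this snippet is a quick way to see how each input moves the bounds.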

Prediction interval versus confidence interval

Because people often compare these two intervals, it helps to see the practical difference clearly.

Feature	Confidence Interval	Prediction Interval
Main purpose	Estimate the mean response at x₀	Estimate a future individual response at x₀
Formula factor	s × sqrt(h)	s × sqrt(1 + h)
Width	Narrower	Wider
Typical use	Inference about expected average outcome	Forecasting a single future case
R syntax	interval = "confidence"	interval = "prediction"

Numerical comparison using the same model inputs

Using y-hat = 120, residual standard error = 15, leverage = 0.08, and df = 28, you get the following 95% intervals:

Statistic	Confidence Interval	Prediction Interval
Critical t value	2.048	2.048
Standard error term	15 × sqrt(0.08) = 4.24	15 × sqrt(1.08) = 15.59
Margin of error	8.69	31.93
Lower bound	111.31	88.07
Upper bound	128.69	151.93
Total width	17.38	63.86

The prediction interval is roughly 3.7 times wider in this example (63.86 versus 17.38). That is not a mistake. It reflects genuine uncertainty in the next individual outcome.
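Because both intervals share the same t* and s, the ratio of their widths depends only on the leverage. The short sketch below, using the same summary inputs as the table, checks that observation; the variable names are illustrative.

```r
# Same summary inputs as the comparison table
s  <- 15
h  <- 0.08
df <- 28

t_star   <- qt(0.975, df = df)
ci_width <- 2 * t_star * s * sqrt(h)       # total width of the confidence interval
pi_width <- 2 * t_star * s * sqrt(1 + h)   # total width of the prediction interval

ratio <- pi_width / ci_width

# t* and s cancel, so the ratio is sqrt((1 + h) / h)
all.equal(ratio, sqrt((1 + h) / h))
round(c(ci_width = ci_width, pi_width = pi_width, ratio = ratio), 2)
```

A practical consequence is that the gap between the two intervals is largest for low-leverage points, where the confidence interval is tight but the individual scatter is unchanged.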

Step-by-step process in R

1. Fit your regression model using lm() or another supported modeling function.
2. Create a newdata data frame containing the predictor values of interest.
3. Call predict() with interval = "prediction".
4. Set the confidence level using the level argument.
5. Inspect the returned fit, lower bound, and upper bound.
6. Validate assumptions before interpreting the interval as reliable.

fit <- lm(weight_loss ~ calories + exercise_hours, data = health)
person_a <- data.frame(
  calories = 1800,
  exercise_hours = 4
)
predict(fit, newdata = person_a, interval = "prediction", level = 0.95)
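The same steps work for several new cases at once. Since the health data is not available here, the sketch below substitutes the built-in mtcars dataset (an assumption made purely so the code runs) and binds the interval columns back onto the inputs for reporting.

```r
# Illustrative stand-in: built-in `mtcars`, not the article's `health` data
fit <- lm(mpg ~ wt + hp, data = mtcars)

# One row per new case; column names must match the predictors in the formula
new_cars <- data.frame(
  wt = c(2.5, 3.0, 3.5),
  hp = c(100, 150, 200)
)

pred <- predict(fit, newdata = new_cars, interval = "prediction", level = 0.95)

# Keep the inputs and the interval bounds side by side
result <- cbind(new_cars, pred)
result
```

predict() returns one row per row of newdata, in the same order, so cbind() lines the bounds up with the inputs that produced them.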

Assumptions you should not ignore

Prediction intervals in ordinary least squares regression depend on several assumptions. Violations do not always make the model useless, but they can leave the intervals miscalibrated, for example too narrow and therefore overly optimistic.

Linearity: the relationship between predictors and outcome is correctly represented by the model.
Independent errors: the errors are not correlated with one another, for example through time order or clustering.
Constant variance: error variance is reasonably stable across predictor values.
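Approximate normality of errors: the t-based prediction interval relies on this at any sample size, because the interval describes a single outcome rather than an average, so a larger sample does not remove the dependence.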
Comparable future cases: the new observation should come from the same process as the data used to fit the model.

If you are extrapolating far beyond your observed x range, the formal interval from R may still compute, but the real-world reliability can be poor. A mathematically valid interval is not automatically a substantively credible forecast.
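A couple of these checks are easy to script before trusting an interval. The sketch below is one possible set of checks, not a complete diagnostic routine; it uses the built-in cars dataset and a deliberately extreme new value as assumptions for illustration.

```r
# Illustrative stand-in data: built-in `cars`
model     <- lm(dist ~ speed, data = cars)
new_point <- data.frame(speed = 40)   # deliberately outside the observed range

# Extrapolation check: is the new x inside the range used to fit the model?
x_range <- range(cars$speed)
is_extrapolation <- new_point$speed < x_range[1] || new_point$speed > x_range[2]

if (is_extrapolation) {
  message("New speed ", new_point$speed, " is outside the observed range ",
          x_range[1], " to ", x_range[2], "; treat the interval with caution.")
}

# A formal normality check on the residuals (small p-values suggest non-normal errors)
normality_p <- shapiro.test(residuals(model))$p.value

# predict() still returns an interval, even though the point is an extrapolation
predict(model, new_point, interval = "prediction", level = 0.95)
```

For graphical checks of linearity and constant variance, plot(model, which = 1:2) draws the residuals-versus-fitted and normal Q-Q plots for the same model.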

When the interval gets wider

Your prediction interval becomes wider when any of the following happen:

The residual standard error is large
The confidence level increases from 90% to 95% or 99%
The sample size is smaller, which raises the t critical value
The new x value has high leverage
The model fit is weak and unexplained variability is high

This is one reason that prediction intervals are more honest than point forecasts alone. A single predicted number can look precise even when the plausible range is broad.
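Two of the drivers listed above, the confidence level and the sample size, can be illustrated directly. This is a small sketch on the built-in cars dataset (an assumption for illustration); width_at() is a helper defined here, not a base R function.

```r
# Illustrative stand-in data: built-in `cars`
model     <- lm(dist ~ speed, data = cars)
new_point <- data.frame(speed = 15)

# Hypothetical helper: total width of the prediction interval at a given level
width_at <- function(model, newdata, level) {
  p <- predict(model, newdata = newdata, interval = "prediction", level = level)
  unname(p[, "upr"] - p[, "lwr"])
}

conf_levels <- c(0.90, 0.95, 0.99)
widths <- sapply(conf_levels, function(l) width_at(model, new_point, l))
names(widths) <- conf_levels
widths   # widths grow as the confidence level rises

# Smaller samples mean fewer residual degrees of freedom and a larger t critical value
dfs    <- c(5, 10, 28, 100)
t_crit <- qt(0.975, df = dfs)
names(t_crit) <- dfs
round(t_crit, 3)
```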

Common mistakes when trying to calculate a prediction interval in R

1. Using interval = "confidence" by accident

This returns the interval for the mean response, not the interval for a future individual case. It is usually too narrow for forecasting a new observation.

2. Forgetting to supply newdata correctly

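The variable names in newdata must exactly match those used in the model formula. If a predictor is missing from newdata, R looks for an object of that name in the environment where the formula was created, often the global workspace. When no such object exists, predict() stops with an "object not found" error. When one does exist, predict() may quietly use it and return predictions for the wrong values, usually with a warning that the number of rows does not match.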
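A simple guard can catch the naming problem before predict() is called. The sketch below uses the built-in cars dataset and a deliberately misnamed column (spd) as illustrative assumptions; it assumes no object called speed exists in your workspace.

```r
# Illustrative stand-in data: built-in `cars`
model <- lm(dist ~ speed, data = cars)

bad_new  <- data.frame(spd = 15)     # wrong column name (hypothetical typo)
good_new <- data.frame(speed = 15)   # matches the formula

# Capture the error instead of stopping the script
bad_result <- tryCatch(
  predict(model, newdata = bad_new, interval = "prediction"),
  error = function(e) e
)
inherits(bad_result, "error")   # TRUE when predict() could not find `speed`

# Guard: compare the predictors the model needs with the columns supplied
needed       <- all.vars(delete.response(terms(model)))
missing_vars <- setdiff(needed, names(bad_new))
if (length(missing_vars) > 0) {
  message("newdata is missing: ", paste(missing_vars, collapse = ", "))
}

predict(model, newdata = good_new, interval = "prediction")
```

Checking names explicitly is safer than relying on the error, because the error does not occur when an object with the missing name happens to exist in the workspace.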

3. Interpreting the interval as covering 95% of all future data exactly

The interval is conditional on the model being correctly specified and the assumptions being approximately valid. In practice, actual coverage can differ from the nominal level.
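When the assumptions do hold, the long-run coverage can be checked by simulation. The sketch below is a toy experiment under assumed settings (a true line of 2 + 3x, normal errors with standard deviation 2, samples of size 30); none of these values come from the original text.

```r
set.seed(42)

n_sims  <- 2000   # number of simulated datasets
n       <- 30     # observations per dataset
covered <- logical(n_sims)

for (i in seq_len(n_sims)) {
  # Simulate training data from a correctly specified linear model
  d   <- data.frame(x = runif(n, 0, 10))
  d$y <- 2 + 3 * d$x + rnorm(n, sd = 2)
  fit <- lm(y ~ x, data = d)

  # Draw one genuinely new observation from the same process
  x_new <- runif(1, 0, 10)
  y_new <- 2 + 3 * x_new + rnorm(1, sd = 2)

  p <- predict(fit, data.frame(x = x_new), interval = "prediction", level = 0.95)
  covered[i] <- y_new >= p[, "lwr"] && y_new <= p[, "upr"]
}

coverage <- mean(covered)
coverage   # should be close to the nominal 0.95
```

Rerunning the experiment with non-normal or heteroskedastic errors is a direct way to see how far actual coverage can drift from the nominal level.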

4. Ignoring transformations

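If the response is transformed, for example log(y), predict() returns the interval on the transformed scale. Because the logarithm is monotone, exponentiating the bounds gives a valid interval on the original scale, but the exponentiated fit then estimates the median of the response rather than its mean. Transformations of predictors, such as polynomial terms, leave the response scale unchanged. When they are written inside the formula, for example I(x^2) or poly(x, 2), supply only the original predictor in newdata and let the formula apply the transformation.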
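For a log-transformed response, the back-transformation is a one-liner. The sketch below is illustrative only and uses the built-in cars dataset, where the response dist is strictly positive.

```r
# Illustrative stand-in data: built-in `cars`, response modeled on the log scale
log_model <- lm(log(dist) ~ speed, data = cars)
new_point <- data.frame(speed = 15)

# Interval on the modeled (log) scale
log_scale <- predict(log_model, new_point, interval = "prediction", level = 0.95)

# exp() is monotone, so the bounds carry over to the original scale.
# Note: exp(fit) is an estimate of the median of dist, not its mean.
original_scale <- exp(log_scale)

log_scale
original_scale
```

The back-transformed interval is no longer symmetric around the fitted value, which is expected on the original scale.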

5. Extrapolating beyond the data range

Even if R returns a prediction interval, the estimate may be unstable if the new point is far from the observed predictor values.

Why use a calculator if R already computes it?

A calculator like the one above is useful because it reveals the mechanics hidden inside predict(). You can see exactly how the fitted value, residual standard error, leverage, degrees of freedom, and confidence level combine to produce the final range. This is especially helpful for teaching, auditing analysis pipelines, checking model outputs, or documenting a report for nontechnical stakeholders.

It also helps you understand why intervals differ between scenarios. For example, if two observations have the same fitted value but one has much higher leverage, the interval for that observation will be wider. That is not due to random luck. It is because predictions become less stable at unusual or extreme predictor settings.
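The effect of leverage on width can be seen by predicting at a central and at increasingly extreme predictor values. This sketch uses the built-in cars dataset as an illustrative assumption; the interval width does not depend on the fitted value itself, only on t*, s, and h.

```r
# Illustrative stand-in data: built-in `cars`
model <- lm(dist ~ speed, data = cars)

# A central point, the edge of the data, and a point well beyond it
new_points <- data.frame(speed = c(mean(cars$speed), 25, 40))

# Leverage of each new point: diagonal of X0 (X'X)^{-1} X0'
X     <- model.matrix(model)
X0    <- cbind(1, new_points$speed)
h_new <- rowSums((X0 %*% solve(crossprod(X))) * X0)

p      <- predict(model, new_points, interval = "prediction", level = 0.95)
widths <- unname(p[, "upr"] - p[, "lwr"])

data.frame(speed = new_points$speed, leverage = h_new, width = widths)
```

At the mean of the predictor the leverage equals 1/n, its smallest possible value in simple regression, and both leverage and width increase as the new point moves away from it.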

Final takeaway

To calculate a prediction interval in R, the most direct method is to fit your model and call predict(…, interval = "prediction"). The interval you get reflects both uncertainty in the estimated mean response and the natural spread of future observations. Compared with confidence intervals, prediction intervals are wider and usually more appropriate when forecasting a single future case.

If you remember just one rule, let it be this: use confidence intervals for the mean, and use prediction intervals for an individual outcome. That small syntax choice in R changes the interpretation substantially.

Educational note: this calculator follows the standard t-based regression interval formula for a single prediction point. For generalized linear models, mixed models, time-series models, or heteroskedastic settings, interval estimation may require a different method.
