How to Calculate a 95% Confidence Interval for Linear Regression in R
This guide explains how to calculate a 95% confidence interval for linear regression in R, including the formula, practical interpretation, and step-by-step examples.
Introduction
A 95% confidence interval for a linear regression coefficient is a range of values, computed from the sample, built by a procedure that captures the true population coefficient in 95% of repeated samples. It is a fundamental tool of statistical inference for assessing how precisely a coefficient, such as the slope, has been estimated.
In R, you can calculate confidence intervals for linear regression coefficients using the confint() function or by manually calculating them using the standard error of the coefficient and the t-distribution.
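The confint() route can be sketched in a few lines. This is a minimal example that assumes the built-in mtcars dataset and the model mpg ~ wt purely for illustration; neither comes from this guide, and the name fit is arbitrary.

```r
# Fit a simple linear regression on the built-in mtcars data
# (an illustrative choice, not a dataset from this guide).
fit <- lm(mpg ~ wt, data = mtcars)

# 95% confidence intervals for the intercept and the slope.
# level = 0.95 is the default; it is spelled out here for clarity.
ci <- confint(fit, level = 0.95)
print(ci)
```

Each row of the result is one coefficient, with the lower bound in the "2.5 %" column and the upper bound in the "97.5 %" column.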
Formula
The confidence interval for a regression coefficient β in a linear regression model is calculated as:
$$\hat{\beta} \pm t^{*}_{\alpha/2,\; n-p} \times SE(\hat{\beta})$$
Where:
- β̂ is the estimated coefficient (the sample estimate of β)
- t*(α/2, n-p) is the critical t-value from the t-distribution with n-p degrees of freedom
- SE(β̂) is the standard error of the coefficient
- n is the sample size
- p is the number of estimated parameters (the predictors plus the intercept)
- α is the significance level (0.05 for 95% confidence)
For a 95% confidence interval, you use the t-value that leaves 2.5% in each tail of the t-distribution, that is, the 97.5th percentile.
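The critical value can be looked up with qt(). The sketch below uses a handful of degrees-of-freedom values chosen only for illustration and compares them with the normal quantile.

```r
# Critical t-value for a two-sided 95% interval: the 97.5th percentile.
alpha <- 0.05
df <- c(10, 30, 98, 1000)          # example residual degrees of freedom
t_crit <- qt(1 - alpha / 2, df)    # qt() is vectorised over df
print(round(t_crit, 4))

# As df grows, the critical value shrinks toward the normal quantile.
z_crit <- qnorm(1 - alpha / 2)
print(round(z_crit, 4))
```

With few degrees of freedom the critical value is noticeably larger than 1.96, which is why small samples give wider intervals for the same standard error.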
Example Calculation
Let's consider a simple linear regression model where we're predicting house prices based on square footage. Suppose we have the following regression output:
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 50000 | 10000 | 5.00 | < 0.0001 |
| sqft | 200 | 10 | 20.00 | < 0.0001 |
For the coefficient of sqft (β̂ = 200), the standard error (SE) is 10, and we have n = 100 observations with p = 2 estimated parameters (intercept + sqft).
The degrees of freedom for the t-distribution are n - p = 100 - 2 = 98. The critical t-value for a 95% confidence interval is approximately 1.9845.
The 95% confidence interval is calculated as:
$$200 \pm 1.9845 \times 10 = 200 \pm 19.845$$
This means we're 95% confident that the true population coefficient for sqft is between approximately 180.16 and 219.84.
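The arithmetic can be checked directly in R. This is a minimal sketch that takes the estimate (200) and standard error (10) from the table above and assumes the residual degrees of freedom are n minus the two estimated parameters; the variable names are illustrative.

```r
estimate  <- 200   # sqft coefficient from the table
std_error <- 10    # its standard error
n <- 100           # number of observations
p <- 2             # estimated parameters: intercept + sqft

df <- n - p                 # residual degrees of freedom (98)
t_crit <- qt(0.975, df)     # critical value, about 1.9845
ci <- estimate + c(-1, 1) * t_crit * std_error
print(round(ci, 2))         # 180.16 219.84
```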
Interpreting Results
The confidence interval provides several important insights:
- Precision: A narrow confidence interval indicates a more precise estimate of the coefficient.
- Significance: If the interval does not include zero, the coefficient is statistically significant at the 5% level (equivalently, its two-sided p-value is below 0.05).
- Practical Importance: The values at the two ends of the interval show whether even the smallest plausible effect would be practically meaningful.
For example, if the confidence interval for the sqft coefficient included zero, the data would be consistent with square footage having no linear effect on house prices, and the coefficient would not be significant at the 5% level.
Always consider the context when interpreting confidence intervals. A statistically significant result may not always be practically important, and vice versa.
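The link between "interval excludes zero" and "p-value below 0.05" can be verified on a fitted model. The sketch below assumes the built-in mtcars data and the model mpg ~ wt + qsec, both chosen only for illustration.

```r
# Illustrative model on built-in data (not from this guide).
fit <- lm(mpg ~ wt + qsec, data = mtcars)

ci <- confint(fit, level = 0.95)
p_values <- summary(fit)$coefficients[, "Pr(>|t|)"]

# An interval excludes zero when both bounds share the same sign.
excludes_zero <- ci[, 1] > 0 | ci[, 2] < 0
significant <- p_values < 0.05

# The two logical columns agree for every coefficient.
print(data.frame(lower = ci[, 1], upper = ci[, 2],
                 excludes_zero, significant))
```

The agreement holds because confint() and the t-test in summary() are built from the same estimate, standard error, and t-distribution.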
FAQ
What does a 95% confidence interval mean?
A 95% confidence interval means that if we were to take many samples and calculate a 95% confidence interval for each, approximately 95% of these intervals would contain the true population parameter.
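This repeated-sampling idea can be illustrated with a small simulation. The sketch below invents a population with a known slope (true_slope = 2, a hypothetical value), draws many samples, and counts how often the 95% interval captures that slope; the seed, sample size, and noise level are arbitrary assumptions.

```r
set.seed(42)          # arbitrary seed, for reproducibility
true_slope <- 2       # hypothetical population slope
n_sims <- 2000        # number of simulated samples
n <- 50               # observations per sample

covered <- replicate(n_sims, {
  x <- runif(n, 0, 10)
  y <- 1 + true_slope * x + rnorm(n, sd = 3)
  ci <- confint(lm(y ~ x))["x", ]   # 95% interval for the slope
  ci[1] <= true_slope && true_slope <= ci[2]
})

# Share of intervals that contain the true slope.
coverage <- mean(covered)
print(coverage)       # should land close to 0.95
```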
How do I calculate confidence intervals in R?
You can use the confint() function on your linear regression model object, or manually calculate them using the formula shown in this guide.
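To see that the two approaches agree, the manual calculation can be compared with confint(). This is a sketch assuming the built-in mtcars data and the model mpg ~ wt for illustration; the names manual_ci and builtin_ci are arbitrary.

```r
fit <- lm(mpg ~ wt, data = mtcars)   # illustrative model

# Pull estimates and standard errors from the coefficient table.
coefs <- summary(fit)$coefficients
est <- coefs[, "Estimate"]
se  <- coefs[, "Std. Error"]

# Critical value from the t-distribution with residual degrees of freedom.
t_crit <- qt(0.975, df.residual(fit))

manual_ci  <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
builtin_ci <- confint(fit)

# Compare the numbers only, ignoring row and column labels.
print(all.equal(unname(manual_ci), unname(builtin_ci)))
```

Using df.residual() avoids counting parameters by hand, which is a common source of off-by-one errors in the degrees of freedom.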
What if my confidence interval includes zero?
If the confidence interval for a coefficient includes zero, the coefficient is not statistically significant at the 5% level. The data are consistent with no effect, although this does not prove that the effect is zero.