Variance and Calculating Its Confidence Interval
Variance is a fundamental measure of statistical dispersion that quantifies how widely data points spread around their mean. A confidence interval around a variance estimate gives a range of plausible values for the true population variance. This guide explains both concepts and works through a practical example.
What is Variance?
Variance summarizes how far the values in a dataset lie from the mean (average) by averaging their squared deviations from it. A high variance indicates that the data points are spread out over a wide range of values, while a low variance indicates that they are clustered closely around the mean.
Key Point: Variance is always non-negative and is expressed in the squared units of the original data (for example, data measured in meters has a variance in square meters).
Types of Variance
There are two main types of variance calculations:
- Population Variance: Calculated when you have data for the entire population.
- Sample Variance: Calculated when you have data from a sample of the population (most common in real-world applications).
Calculating Variance
The formulas for calculating variance differ slightly between population and sample data.
Population Variance Formula
σ² = Σ(xᵢ - μ)² / N
- σ² = population variance
- xᵢ = each individual data point
- μ = population mean
- N = total number of data points in the population
Sample Variance Formula
s² = Σ(xᵢ - x̄)² / (n - 1)
- s² = sample variance
- xᵢ = each individual data point
- x̄ = sample mean
- n = number of data points in the sample
The key difference is that sample variance uses n-1 in the denominator (Bessel's correction) to provide an unbiased estimate of the population variance.
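To make the difference between the two denominators concrete, here is a minimal sketch using Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by n - 1. The data values are hypothetical and chosen only to keep the arithmetic simple.

```python
import statistics

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_var = statistics.pvariance(data)  # sum of squared deviations / N
sample_var = statistics.variance(data)       # sum of squared deviations / (n - 1)

print(f"Population variance: {population_var:.4f}")
print(f"Sample variance:     {sample_var:.4f}")
```

The sample variance is always the larger of the two, by a factor of n / (n - 1), and the gap shrinks as the sample grows.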
Steps to Calculate Variance
- Calculate the mean of your data set.
- For each data point, subtract the mean and square the result.
- Sum all these squared differences.
- Divide the sum by N (for population variance) or by n - 1 (for sample variance).
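The steps above can be sketched as a small Python function. The function name, the `sample` flag, and the input values are illustrative choices, not part of any particular library.

```python
def variance(data, sample=True):
    """Compute the variance of `data` by following the four steps in order."""
    n = len(data)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mean = sum(data) / n                              # Step 1: mean
    squared_diffs = [(x - mean) ** 2 for x in data]   # Step 2: squared differences
    total = sum(squared_diffs)                        # Step 3: sum them
    return total / (n - 1 if sample else n)           # Step 4: divide


values = [2.0, 4.0, 6.0]  # hypothetical data
print(f"Population variance: {variance(values, sample=False):.4f}")
print(f"Sample variance:     {variance(values, sample=True):.4f}")
```

Raising an error for a one-point sample is a design choice made here: with n = 1 the sample formula would divide by zero.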
Confidence Interval for Variance
A confidence interval for variance provides a range of values that is likely to contain the true population variance. The most common method uses the chi-square distribution and assumes that the data are drawn from a normally distributed population.
Confidence Interval Formula
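Lower Bound = (n-1)s² / χ²_{1-α/2, n-1}
Upper Bound = (n-1)s² / χ²_{α/2, n-1}
- s² = sample variance
- n = sample size
- χ²_{p, n-1} = critical value from the chi-square distribution with n-1 degrees of freedom, defined as the value with lower-tail probability p; the larger critical value, χ²_{1-α/2, n-1}, goes in the lower bound
- α = significance level (e.g., 0.05 for 95% confidence)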
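For readers who want to see where the bounds come from, here is a brief sketch of the standard derivation. It assumes normally distributed data and writes χ²_{p, n-1} for the value with lower-tail probability p. That notation is an assumption made here; some tables index critical values by upper-tail area instead. Under this convention, dividing by the larger critical value gives the lower bound.

```latex
\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}
\quad\Longrightarrow\quad
P\!\left(\chi^2_{\alpha/2,\,n-1} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{1-\alpha/2,\,n-1}\right) = 1-\alpha
\quad\Longrightarrow\quad
P\!\left(\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}\right) = 1-\alpha
```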
The confidence level describes how the procedure behaves over repeated sampling rather than the probability attached to any single interval. For example, a 95% confidence interval means that if you took many samples and calculated a 95% confidence interval for each, about 95% of those intervals would contain the true population variance.
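The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only. It assumes samples of size 5 drawn from a normal distribution, uses rounded table values for the 0.025 and 0.975 chi-square quantiles with 4 degrees of freedom, and picks the true standard deviation, trial count, and seed arbitrarily.

```python
import random
import statistics

# Rounded chi-square table values for 4 degrees of freedom (n = 5).
CHI2_Q025 = 0.484    # 0.025 quantile
CHI2_Q975 = 11.143   # 0.975 quantile


def variance_ci_n5(sample):
    """95% chi-square confidence interval for the variance of a 5-point sample."""
    s2 = statistics.variance(sample)  # sample variance (n - 1 denominator)
    df = len(sample) - 1
    return df * s2 / CHI2_Q975, df * s2 / CHI2_Q025


def estimate_coverage(true_sd=2.0, trials=20_000, seed=0):
    """Fraction of simulated intervals that contain the true variance."""
    rng = random.Random(seed)
    true_var = true_sd ** 2
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, true_sd) for _ in range(5)]
        low, high = variance_ci_n5(sample)
        if low <= true_var <= high:
            hits += 1
    return hits / trials


coverage = estimate_coverage()
print(f"Estimated coverage: {coverage:.3f}")
```

The printed fraction should land close to 0.95. Repeating the experiment with clearly non-normal data would typically show coverage drifting away from the nominal level, because the chi-square method relies on normality.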
Interpreting Confidence Intervals
- If the interval is wide, it indicates high uncertainty about the true variance.
- If the interval is narrow, it indicates low uncertainty and a more precise estimate.
- Always report the confidence level with your interval (e.g., "95% CI").
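Sample size is one of the main drivers of interval width. The sketch below prints the 95% bounds as multiples of s² for a few sample sizes. To stay self-contained it computes chi-square quantiles with the standard library only, using a series for the incomplete gamma function plus bisection. All function names are illustrative, and in practice a library routine such as SciPy's `scipy.stats.chi2.ppf` would normally supply the quantile.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > total * 1e-15:      # add terms until they no longer matter
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_quantile(p, df):
    """Value with lower-tail probability p, found by bisection on the CDF."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 10.0  # generous upper bracket
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def variance_ci(s2, n, confidence=0.95):
    """Chi-square confidence interval for the variance (assumes normal data)."""
    alpha = 1.0 - confidence
    df = n - 1
    lower = df * s2 / chi2_quantile(1.0 - alpha / 2.0, df)
    upper = df * s2 / chi2_quantile(alpha / 2.0, df)
    return lower, upper


for n in (5, 20, 100):
    low, high = variance_ci(1.0, n)
    print(f"n = {n:3d}: [{low:.3f} * s², {high:.3f} * s²]")
```

With the same sample variance, the interval tightens around s² as n grows, which is why small samples give such imprecise variance estimates.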
Example Calculation
Let's calculate the variance and confidence interval for the following sample data: 5, 7, 9, 11, 13.
Step 1: Calculate the Sample Mean
Mean (x̄) = (5 + 7 + 9 + 11 + 13) / 5 = 45 / 5 = 9
Step 2: Calculate Each Squared Difference
- (5 - 9)² = 16
- (7 - 9)² = 4
- (9 - 9)² = 0
- (11 - 9)² = 4
- (13 - 9)² = 16
Step 3: Calculate Sample Variance
s² = (16 + 4 + 0 + 4 + 16) / (5 - 1) = 40 / 4 = 10
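As a quick cross-check of Steps 1 to 3, the same numbers can be reproduced with Python's standard `statistics` module. This is a verification sketch only, and the variable names are arbitrary.

```python
import statistics

sample = [5, 7, 9, 11, 13]

mean = statistics.mean(sample)                    # Step 1
squared_diffs = [(x - mean) ** 2 for x in sample] # Step 2
s2 = statistics.variance(sample)                  # Step 3 (n - 1 denominator)

print(f"mean = {float(mean):.1f}")
print(f"squared differences = {[float(d) for d in squared_diffs]}")
print(f"sample variance = {float(s2):.1f}")
```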
Step 4: Calculate 95% Confidence Interval
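Using chi-square critical values for 4 degrees of freedom (the 0.025 quantile χ²_{0.025, 4} = 0.484 and the 0.975 quantile χ²_{0.975, 4} = 11.143):
Lower Bound = (4 × 10) / 11.143 ≈ 3.59
Upper Bound = (4 × 10) / 0.484 ≈ 82.64
95% CI for variance: [3.59, 82.64]
Interpretation: We are 95% confident that the true population variance lies between approximately 3.59 and 82.64. The interval is very wide because the sample contains only five observations.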
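The arithmetic in Step 4 can be re-run with a few lines of Python. This sketch takes the rounded table values quoted in Step 4 as given rather than computing them, so the second decimal of the upper bound may differ slightly from a calculation with unrounded critical values.

```python
n = 5
s2 = 10.0

# Rounded chi-square table values for n - 1 = 4 degrees of freedom.
chi2_q025 = 0.484    # 0.025 quantile
chi2_q975 = 11.143   # 0.975 quantile

# The larger critical value produces the lower bound, and vice versa.
lower = (n - 1) * s2 / chi2_q975
upper = (n - 1) * s2 / chi2_q025

print(f"95% CI for variance: [{lower:.2f}, {upper:.2f}]")
```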