How to Calculate True Negative From Confusion Matrix
In machine learning and statistics, a confusion matrix is a table that describes the performance of a classification model. One of the values read directly from this matrix is the true negative count, and the true negative rate derived from it measures how well a model correctly identifies negative cases. This guide explains how to calculate true negatives from a confusion matrix, with practical examples and common pitfalls.
What is True Negative?
A true negative (TN) occurs when a classification model correctly predicts the negative class. For example, in a medical test that identifies a disease, a true negative would be a healthy person correctly identified as not having the disease.
True negatives are important because they show the model's ability to correctly identify negative cases, which is often as important as identifying positive cases in many applications.
Key Point: True negatives count toward the model's overall accuracy, which is (TP + TN) / (TP + TN + FP + FN), so they sit alongside true positives, false positives, and false negatives in that calculation.
Confusion Matrix Basics
For a binary classifier, a confusion matrix is a 2x2 table that summarizes the performance of the classification algorithm. It has four components:
- True Positives (TP): Correctly predicted positive cases
- True Negatives (TN): Correctly predicted negative cases
- False Positives (FP): Actual negative cases incorrectly predicted as positive (Type I error)
- False Negatives (FN): Actual positive cases incorrectly predicted as negative (Type II error)
The matrix looks like this:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
True negatives are found in the bottom-right cell of this matrix.
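As a minimal sketch of how the four cells could be tallied from raw labels and arranged in the layout above, the following uses only the Python standard library. The function names `confusion_cells` and `as_matrix`, the 0/1 label encoding, and the sample labels are assumptions made for illustration.

```python
from collections import Counter

def confusion_cells(y_true, y_pred, positive=1):
    """Tally TP, FN, FP and TN for binary labels (hypothetical helper)."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            counts["TP" if predicted == positive else "FN"] += 1
        else:
            counts["FP" if predicted == positive else "TN"] += 1
    return counts

def as_matrix(counts):
    """Arrange the counts as rows = actual (positive, negative),
    columns = predicted (positive, negative), matching the table above."""
    return [
        [counts["TP"], counts["FN"]],
        [counts["FP"], counts["TN"]],
    ]

# Made-up labels: 1 = positive, 0 = negative
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

matrix = as_matrix(confusion_cells(y_true, y_pred))
true_negatives = matrix[1][1]  # bottom-right cell in this layout
print(matrix, true_negatives)
```

The position of TN depends on how a tool orders rows and columns. scikit-learn's `confusion_matrix`, for example, sorts labels so that with 0/1 labels the true negatives land in the top-left cell, so it is worth checking the layout before reading a cell by position.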
Calculating True Negative
The true negative count is simply the number of negative cases that the model correctly identified. It's one of the four components of the confusion matrix.
Formula
True Negative (TN) = Number of correctly predicted negative cases
In practice, you would:
- Count all cases in your dataset whose actual class is negative
- Count how many of these your model also predicted as negative
- This second count is your true negative value
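The steps above can be sketched directly in Python. This is an illustrative version, not a library routine: the function name `count_true_negatives`, the string labels, and the sample data are all assumptions.

```python
def count_true_negatives(y_true, y_pred, negative=0):
    """Return (actual_negatives, true_negatives) for paired labels."""
    # Step 1: keep only the cases whose actual label is negative
    negatives = [(a, p) for a, p in zip(y_true, y_pred) if a == negative]
    # Step 2: of those, count the ones the model also predicted as negative
    correct = sum(1 for _, p in negatives if p == negative)
    # Step 3: that count is the true negative value
    return len(negatives), correct

# Made-up data: "healthy" is treated as the negative class
actual    = ["healthy", "sick", "healthy", "healthy", "sick"]
predicted = ["healthy", "sick", "sick",    "healthy", "healthy"]

total_negatives, tn = count_true_negatives(actual, predicted, negative="healthy")
print(total_negatives, tn)  # 3 actual negatives, 2 of them true negatives
```

The difference between the two numbers (here 3 - 2 = 1) is the false positive count, since every actual negative ends up as either a true negative or a false positive.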
Common Pitfalls
- Confusing true negatives with true positives
- Misinterpreting false negatives as true negatives
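- Assuming a high true negative count or rate means the model performs well overall; a model that predicts negative for every case gets every actual negative right while missing every positive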
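The last pitfall can be made concrete with a small, made-up imbalanced dataset. In this sketch a hypothetical model that always predicts negative is scored on 5 positives and 95 negatives; the data and variable names are assumptions chosen only to show the effect.

```python
# Made-up imbalanced data: 5 positives (1) and 95 negatives (0)
y_true = [1] * 5 + [0] * 95
# A trivial "model" that predicts negative for every case
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)
fp = sum(1 for a, p in pairs if a == 0 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)

true_negative_rate = tn / (tn + fp)  # share of actual negatives caught
recall = tp / (tp + fn)              # share of actual positives caught

print(tn, true_negative_rate, recall)  # 95 1.0 0.0
```

All 95 negatives are true negatives, yet not a single positive case is found, which is why the true negative count is best read together with the other three cells.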
Practical Examples
Let's look at two examples to illustrate how to calculate true negatives.
Example 1: Medical Test
Suppose a disease test was given to 1000 people:
- 500 people actually have the disease (positive cases)
- 500 people do not have the disease (negative cases)
The test results were:
- 450 people correctly identified as having the disease (TP)
- 100 people incorrectly identified as having the disease (FP)
- 400 people correctly identified as not having the disease (TN)
- 50 people incorrectly identified as not having the disease (FN)
In this case, the true negative count is 400: of the 500 people without the disease, 100 were flagged incorrectly, leaving 500 - 100 = 400 true negatives.
Example 2: Spam Detection
For a spam detection system processing 10,000 emails:
- 2,000 emails are spam (positive cases)
- 8,000 emails are not spam (negative cases)
The system's performance was:
- 1,800 spam emails correctly identified (TP)
- 200 non-spam emails incorrectly marked as spam (FP)
- 7,800 non-spam emails correctly identified (TN)
- 200 spam emails incorrectly marked as non-spam (FN)
Here, the true negative count is 7,800.
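A quick way to sanity-check figures like these is to confirm that the four cells add up to the stated class totals. The sketch below does that for the spam example; the variable names are illustrative.

```python
# Figures from the spam detection example
total_emails = 10_000
actual_spam = 2_000       # positive cases
actual_not_spam = 8_000   # negative cases
tp, fp, tn, fn = 1_800, 200, 7_800, 200

# Every actual positive is either a TP or an FN
positives_consistent = (tp + fn == actual_spam)
# Every actual negative is either a TN or an FP
negatives_consistent = (tn + fp == actual_not_spam)
# All four cells together cover the whole dataset
total_consistent = (tp + fp + tn + fn == total_emails)

# TN can also be recovered from the negatives and the false positives
tn_from_negatives = actual_not_spam - fp

print(positives_consistent, negatives_consistent, total_consistent)
print(tn_from_negatives)  # 7800
```

If any of the three checks came out false, at least one of the reported counts would have to be wrong.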
FAQ
What is the difference between true negatives and false negatives?
True negatives are cases where the model correctly predicts the negative class, while false negatives are cases where the model incorrectly predicts the negative class (it actually is positive).
Why is true negative important in machine learning?
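True negatives are important because they show the model's ability to correctly identify negative cases. Every actual negative is either a true negative or a false positive, so more true negatives means fewer false positives, which is crucial in applications where false positives are costly (like medical testing).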
How do I calculate the true negative rate?
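The true negative rate (also called specificity) is calculated by dividing the number of true negatives by the total number of actual negatives: TN / (TN + FP). The denominator uses false positives, not false negatives, because actual negatives are either correctly predicted as negative (TN) or wrongly predicted as positive (FP).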
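As a small sketch, the true negative rate, also known as specificity, can be computed from the two cells in the Actual Negative row of the matrix, TN and FP. The function name `true_negative_rate` and the choice to return `None` when there are no actual negatives are assumptions made for illustration.

```python
def true_negative_rate(tn, fp):
    """Specificity: true negatives divided by all actual negatives (TN + FP)."""
    actual_negatives = tn + fp
    if actual_negatives == 0:
        # Undefined when the data contains no negative cases
        return None
    return tn / actual_negatives

# Spam detection example: 7,800 true negatives, 200 false positives
spam_tnr = true_negative_rate(tn=7_800, fp=200)
print(spam_tnr)  # 0.975
```

For the spam example this gives 7,800 / 8,000 = 0.975, meaning 97.5% of legitimate emails were correctly left unflagged.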