How to Calculate Real Recall

Real Recall is a machine learning metric that measures the proportion of actual positive cases that a classification model correctly identifies. Unlike precision, which measures the proportion of positive predictions that are correct, recall focuses on the model's ability to find all relevant instances in a dataset.

What is Real Recall?

Real Recall, also known as sensitivity or true positive rate, is a key performance metric for classification models. It answers the question: "Of all the actual positive cases, how many did the model correctly identify?"

Recall is particularly important in applications where missing a positive case is more costly than incorrectly identifying a negative case as positive. Typical examples are medical diagnosis and fraud detection, where an undetected illness or fraudulent transaction usually costs far more than a false alarm.

Recall is calculated from the confusion matrix, which shows the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Of these four counts, recall uses only TP and FN.
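The four confusion matrix counts can be tallied directly from a list of actual labels and a list of predicted labels. The following is a minimal sketch in plain Python; the function name confusion_counts, the label encoding (1 = positive, 0 = negative), and the sample labels are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for a binary classification problem."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # positive case, correctly identified
        elif actual == positive:
            fn += 1  # positive case, missed by the model
        elif predicted == positive:
            fp += 1  # negative case, wrongly flagged as positive
        else:
            tn += 1  # negative case, correctly identified
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```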

Real Recall Formula

Real Recall = TP / (TP + FN)

Where:

TP = True Positives (correctly identified positive cases)
FN = False Negatives (positive cases incorrectly identified as negative)

The formula shows that recall is the ratio of correctly identified positive cases to all actual positive cases in the dataset.

How to Calculate Real Recall

Identify the number of true positives (TP) in your model's predictions.
Identify the number of false negatives (FN) in your model's predictions.
Apply the formula: Real Recall = TP / (TP + FN)
Multiply the result by 100 to get a percentage.

For example, if your model correctly identified 80 positive cases (TP) and missed 20 positive cases (FN), your real recall would be 80 / (80 + 20) = 0.8 or 80%.
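The steps above can be sketched as a small Python function. The name real_recall is hypothetical, and raising an error when there are no actual positive cases is one possible design choice, since the formula would otherwise divide by zero.

```python
def real_recall(tp, fn):
    """Return recall as a fraction between 0 and 1."""
    if tp < 0 or fn < 0:
        raise ValueError("counts must be non-negative")
    actual_positives = tp + fn
    if actual_positives == 0:
        # No actual positive cases: TP / (TP + FN) is undefined.
        raise ValueError("recall is undefined when TP + FN is 0")
    return tp / actual_positives


# The example from the text: 80 true positives, 20 false negatives.
recall = real_recall(tp=80, fn=20)
recall_percent = recall * 100
print(f"{recall:.2f} or {recall_percent:.0f}%")  # 0.80 or 80%
```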

Example Calculation

Let's say you're building a spam filter and your model's predictions are as follows:

True Positives (TP): 120 (correctly identified spam emails)
False Negatives (FN): 30 (spam emails incorrectly identified as not spam)

Using the formula:

Real Recall = 120 / (120 + 30) = 120 / 150 = 0.8 or 80%

This means your model correctly identified 80% of all spam emails in the dataset.

Interpreting Real Recall

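A high recall score indicates that the model finds most of the positive cases. On its own, however, it says nothing about false positives: a model that labels every case as positive reaches 100% recall. For that reason, recall should be considered in conjunction with precision: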

High recall, low precision: The model identifies most positive cases but also has many false positives.
Low recall, high precision: The model has few false positives but misses many positive cases.
High recall and high precision: The model finds most positive cases and raises few false alarms, so it performs well overall.
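The cases in the list above can be illustrated by computing both metrics side by side. This sketch uses the standard definition precision = TP / (TP + FP); the function name and all counts are hypothetical values chosen only to produce each case.

```python
def precision_and_recall(tp, fp, fn):
    """Return (precision, recall); a value is None when its denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    return precision, recall


# Hypothetical counts chosen only to illustrate each case.
scenarios = {
    "high recall, low precision": {"tp": 95, "fp": 300, "fn": 5},
    "low recall, high precision": {"tp": 30, "fp": 2, "fn": 70},
    "both high": {"tp": 90, "fp": 10, "fn": 10},
}

results = {}
for name, counts in scenarios.items():
    precision, recall = precision_and_recall(**counts)
    results[name] = (precision, recall)
    print(f"{name}: recall={recall:.2f}, precision={precision:.2f}")
```

In the first scenario the model finds 95% of the positive cases, but fewer than a quarter of its positive predictions are correct.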

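In some applications, you may need to adjust the classification threshold to achieve a better balance between recall and precision. Lowering the threshold for a positive prediction typically raises recall at the expense of precision, and raising it does the opposite.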
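The effect of the classification threshold can be seen by recomputing both metrics at several cutoffs. This is a rough sketch that assumes the model outputs a score per case and predicts positive when the score is at or above the threshold; the scores, labels, and function name are made up for illustration.

```python
def recall_precision_at(y_true, scores, threshold):
    """Return (recall, precision) when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    return recall, precision


# Hypothetical labels (1 = positive) and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

for threshold in (0.75, 0.5, 0.25):
    recall, precision = recall_precision_at(y_true, scores, threshold)
    print(f"threshold={threshold}: recall={recall:.2f}, precision={precision:.2f}")
# threshold=0.75: recall=0.50, precision=1.00
# threshold=0.5: recall=0.75, precision=0.75
# threshold=0.25: recall=1.00, precision=0.67
```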

FAQ

What is the difference between recall and precision? Recall measures the model's ability to find all positive cases, while precision measures how many of its positive predictions are correct. High recall means the model finds most positives; high precision means a positive prediction is likely to be correct.
When should I use recall instead of accuracy? Use recall when false negatives are more costly than false positives, or when positive cases are rare. On an imbalanced dataset, a model can reach high accuracy by predicting the negative class almost every time while still missing most positives. In medical testing, for example, missing a positive case (false negative) is usually more serious than incorrectly identifying a negative case as positive.
How can I improve recall in my model? You can lower the classification threshold so that the model is more sensitive to positive cases, use more training data, or apply techniques like oversampling to address class imbalance.
What is a good recall score? A good recall score depends on the specific application. In some cases, 80% or higher may be acceptable, while in others, you may need 90% or more to meet business requirements.
Can recall be higher than 100%? No. Recall is a proportion of the actual positive cases, so it always lies between 0% and 100%. A score of 100% means the model identified every positive case, with no false negatives.
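To make the FAQ's contrast between recall and accuracy concrete, consider a hypothetical dataset with 10 positive cases among 1,000 and a model that always predicts negative. The numbers are invented for illustration only.

```python
# Hypothetical imbalanced dataset: 10 positives among 1,000 cases.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # a model that always predicts the negative class

tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)
correct = sum(1 for a, p in zip(y_true, y_pred) if a == p)

accuracy = correct / len(y_true)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.0%}, recall={recall:.0%}")  # accuracy=99%, recall=0%
```

The model is right 99% of the time yet finds none of the positive cases, which is why accuracy alone can be misleading when positives are rare.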