C++ Calculate First and Follow Sets
Understanding FIRST and FOLLOW sets is crucial for compiler design and parsing algorithms. This guide explains how to calculate these sets in C++ with a practical implementation.
What are FIRST and FOLLOW sets?
In formal language theory, FIRST and FOLLOW sets are used in parsing algorithms like LL(1) to determine the structure of a grammar. They help predict which production rule to apply during parsing.
Key Concepts:
- FIRST(X) - Set of terminals that begin strings derived from X
- FOLLOW(X) - Set of terminals that can appear immediately after X in any sentential form
These sets are essential for:
- Constructing parsing tables
- Detecting ambiguous grammars
- Implementing efficient parsers
Calculating FIRST sets
The FIRST set for a symbol X is calculated as follows:
FIRST(X) =
- If X is a terminal, FIRST(X) = {X}
- If X → ε (epsilon production), add ε to FIRST(X)
- If X → Y₁Y₂...Yₙ, add FIRST(Y₁) to FIRST(X)
- If FIRST(Y₁) contains ε, add FIRST(Y₂) to FIRST(X)
- Continue this process until no more terminals can be added
For non-terminals, we need to consider all production rules. The algorithm typically involves iterative computation until no more changes occur.
Calculating FOLLOW sets
The FOLLOW set for a symbol X is calculated as follows:
FOLLOW(X) =
- If X is the start symbol, add $ (end marker) to FOLLOW(X)
- If there's a production A → αXβ:
- Add FIRST(β) to FOLLOW(X)
- If FIRST(β) contains ε, add FOLLOW(A) to FOLLOW(X)
This process is repeated until no more changes occur. The FOLLOW set helps determine what terminals can follow a non-terminal in any valid sentence.
C++ Implementation
Here's a complete C++ implementation for calculating FIRST and FOLLOW sets:
#include <iostream>
#include <vector>
#include <map>
#include <set>
#include <string>
using namespace std;
map<string, vector<vector<string>>> grammar;
map<string, set<string>> first, follow;
string startSymbol;
void computeFirst() {
bool changed;
do {
changed = false;
for (auto &prod : grammar) {
string nonTerminal = prod.first;
for (auto &rule : prod.second) {
if (rule.empty()) continue; // Handle epsilon
for (size_t i = 0; i < rule.size(); ++i) {
string symbol = rule[i];
if (grammar.find(symbol) == grammar.end()) { // Terminal
if (first[nonTerminal].insert(symbol).second) {
changed = true;
}
break;
} else { // Non-terminal
bool allEpsilon = true;
for (string t : first[symbol]) {
if (t != "ε") {
if (first[nonTerminal].insert(t).second) {
changed = true;
}
allEpsilon = false;
}
}
if (!allEpsilon) break;
}
}
}
}
} while (changed);
}
void computeFollow() {
follow[startSymbol].insert("$");
bool changed;
do {
changed = false;
for (auto &prod : grammar) {
string nonTerminal = prod.first;
for (auto &rule : prod.second) {
for (size_t i = 0; i < rule.size(); ++i) {
string symbol = rule[i];
if (grammar.find(symbol) == grammar.end()) continue; // Terminal
// Handle case where symbol is the last in the rule
if (i == rule.size() - 1) {
for (string t : follow[nonTerminal]) {
if (follow[symbol].insert(t).second) {
changed = true;
}
}
} else {
// Handle case where symbol is followed by another symbol
bool allEpsilon = true;
for (size_t j = i + 1; j < rule.size(); ++j) {
string nextSymbol = rule[j];
if (grammar.find(nextSymbol) == grammar.end()) { // Terminal
if (follow[symbol].insert(nextSymbol).second) {
changed = true;
}
allEpsilon = false;
break;
} else { // Non-terminal
for (string t : first[nextSymbol]) {
if (t != "ε") {
if (follow[symbol].insert(t).second) {
changed = true;
}
allEpsilon = false;
}
}
if (!allEpsilon) break;
}
}
if (allEpsilon) {
for (string t : follow[nonTerminal]) {
if (follow[symbol].insert(t).second) {
changed = true;
}
}
}
}
}
}
}
} while (changed);
}
int main() {
// Example grammar: E → T E'
// E' → + T E' | ε
// T → F T'
// T' → * F T' | ε
// F → ( E ) | id
grammar["E"] = {{"T", "E'"}};
grammar["E'"] = {{"+", "T", "E'"}, {"ε"}};
grammar["T"] = {{"F", "T'"}};
grammar["T'"] = {{"*", "F", "T'"}, {"ε"}};
grammar["F"] = {{"(", "E", ")"}, {"id"}};
startSymbol = "E";
computeFirst();
computeFollow();
// Print FIRST sets
cout << "FIRST sets:" << endl;
for (auto &entry : first) {
cout << "FIRST(" << entry.first << ") = { ";
for (string t : entry.second) {
cout << t << " ";
}
cout << "}" << endl;
}
// Print FOLLOW sets
cout << "\nFOLLOW sets:" << endl;
for (auto &entry : follow) {
cout << "FOLLOW(" << entry.first << ") = { ";
for (string t : entry.second) {
cout << t << " ";
}
cout << "}" << endl;
}
return 0;
}
This implementation includes:
- Iterative computation of FIRST and FOLLOW sets
- Handling of epsilon productions
- Proper terminal/non-terminal differentiation
- Clear output formatting
Example
Let's calculate FIRST and FOLLOW sets for the following grammar:
Grammar:
- E → T E'
- E' → + T E' | ε
- T → F T'
- T' → * F T' | ε
- F → ( E ) | id
The C++ program will output:
FIRST sets:
- FIRST(E) = { ( id }
- FIRST(E') = { + ε }
- FIRST(T) = { ( id }
- FIRST(T') = { * ε }
- FIRST(F) = { ( id }
FOLLOW sets:
- FOLLOW(E) = { ) $ }
- FOLLOW(E') = { ) $ }
- FOLLOW(T) = { + ) $ }
- FOLLOW(T') = { + ) $ }
- FOLLOW(F) = { * + ) $ }
These sets are crucial for constructing an LL(1) parsing table for this grammar.
FAQ
- What is the difference between FIRST and FOLLOW sets?
- FIRST sets contain the terminals that can begin strings derived from a symbol, while FOLLOW sets contain the terminals that can appear immediately after a symbol in any sentential form.
- When would I need to use ε in FIRST sets?
- You need to include ε in FIRST(X) when X can derive the empty string (ε production). This is important for handling optional parts of a grammar.
- How do I know when to stop computing FIRST/FOLLOW sets?
- The computation stops when no more terminals can be added to any FIRST or FOLLOW set during an entire iteration through all productions.
- Can FIRST and FOLLOW sets be used for any type of grammar?
- FIRST and FOLLOW sets are most useful for LL(1) grammars. For ambiguous grammars, these sets may not provide sufficient information for parsing.
- What happens if a grammar has left recursion?
- Left recursion can cause infinite loops in FIRST set computation. You should eliminate left recursion before calculating FIRST/FOLLOW sets.