C++ Calculate First and Follow Sets

Understanding FIRST and FOLLOW sets is crucial for compiler design and parsing algorithms. This guide explains how to calculate these sets in C++ with a practical implementation.

What are FIRST and FOLLOW sets?

In formal language theory, FIRST and FOLLOW sets are used in parsing algorithms like LL(1) to determine the structure of a grammar. They help predict which production rule to apply during parsing.

Key Concepts:

FIRST(X) - Set of terminals that begin strings derived from X
FOLLOW(X) - Set of terminals that can appear immediately after X in any sentential form

These sets are essential for:

Constructing parsing tables
Detecting ambiguous grammars
Implementing efficient parsers

Calculating FIRST sets

The FIRST set for a symbol X is calculated as follows:

FIRST(X) =

If X is a terminal, FIRST(X) = {X}
If X → ε (epsilon production), add ε to FIRST(X)
If X → Y₁Y₂...Yₙ, add FIRST(Y₁) to FIRST(X)
If FIRST(Y₁) contains ε, add FIRST(Y₂) to FIRST(X)
Continue this process until no more terminals can be added

For non-terminals, we need to consider all production rules. The algorithm typically involves iterative computation until no more changes occur.

Calculating FOLLOW sets

The FOLLOW set for a symbol X is calculated as follows:

FOLLOW(X) =

If X is the start symbol, add $ (end marker) to FOLLOW(X)
If there's a production A → αXβ:
- Add FIRST(β) to FOLLOW(X)
- If FIRST(β) contains ε, add FOLLOW(A) to FOLLOW(X)

This process is repeated until no more changes occur. The FOLLOW set helps determine what terminals can follow a non-terminal in any valid sentence.

C++ Implementation

Here's a complete C++ implementation for calculating FIRST and FOLLOW sets:

#include <iostream>
#include <vector>
#include <map>
#include <set>
#include <string>

using namespace std;

map<string, vector<vector<string>>> grammar;
map<string, set<string>> first, follow;
string startSymbol;

void computeFirst() {
    bool changed;
    do {
        changed = false;
        for (auto &prod : grammar) {
            string nonTerminal = prod.first;
            for (auto &rule : prod.second) {
                if (rule.empty()) continue; // Handle epsilon
                for (size_t i = 0; i < rule.size(); ++i) {
                    string symbol = rule[i];
                    if (grammar.find(symbol) == grammar.end()) { // Terminal
                        if (first[nonTerminal].insert(symbol).second) {
                            changed = true;
                        }
                        break;
                    } else { // Non-terminal
                        bool allEpsilon = true;
                        for (string t : first[symbol]) {
                            if (t != "ε") {
                                if (first[nonTerminal].insert(t).second) {
                                    changed = true;
                                }
                                allEpsilon = false;
                            }
                        }
                        if (!allEpsilon) break;
                    }
                }
            }
        }
    } while (changed);
}

void computeFollow() {
    follow[startSymbol].insert("$");
    bool changed;
    do {
        changed = false;
        for (auto &prod : grammar) {
            string nonTerminal = prod.first;
            for (auto &rule : prod.second) {
                for (size_t i = 0; i < rule.size(); ++i) {
                    string symbol = rule[i];
                    if (grammar.find(symbol) == grammar.end()) continue; // Terminal

                    // Handle case where symbol is the last in the rule
                    if (i == rule.size() - 1) {
                        for (string t : follow[nonTerminal]) {
                            if (follow[symbol].insert(t).second) {
                                changed = true;
                            }
                        }
                    } else {
                        // Handle case where symbol is followed by another symbol
                        bool allEpsilon = true;
                        for (size_t j = i + 1; j < rule.size(); ++j) {
                            string nextSymbol = rule[j];
                            if (grammar.find(nextSymbol) == grammar.end()) { // Terminal
                                if (follow[symbol].insert(nextSymbol).second) {
                                    changed = true;
                                }
                                allEpsilon = false;
                                break;
                            } else { // Non-terminal
                                for (string t : first[nextSymbol]) {
                                    if (t != "ε") {
                                        if (follow[symbol].insert(t).second) {
                                            changed = true;
                                        }
                                        allEpsilon = false;
                                    }
                                }
                                if (!allEpsilon) break;
                            }
                        }
                        if (allEpsilon) {
                            for (string t : follow[nonTerminal]) {
                                if (follow[symbol].insert(t).second) {
                                    changed = true;
                                }
                            }
                        }
                    }
                }
            }
        }
    } while (changed);
}

int main() {
    // Example grammar: E → T E'
    //                  E' → + T E' | ε
    //                  T → F T'
    //                  T' → * F T' | ε
    //                  F → ( E ) | id

    grammar["E"] = {{"T", "E'"}};
    grammar["E'"] = {{"+", "T", "E'"}, {"ε"}};
    grammar["T"] = {{"F", "T'"}};
    grammar["T'"] = {{"*", "F", "T'"}, {"ε"}};
    grammar["F"] = {{"(", "E", ")"}, {"id"}};

    startSymbol = "E";

    computeFirst();
    computeFollow();

    // Print FIRST sets
    cout << "FIRST sets:" << endl;
    for (auto &entry : first) {
        cout << "FIRST(" << entry.first << ") = { ";
        for (string t : entry.second) {
            cout << t << " ";
        }
        cout << "}" << endl;
    }

    // Print FOLLOW sets
    cout << "\nFOLLOW sets:" << endl;
    for (auto &entry : follow) {
        cout << "FOLLOW(" << entry.first << ") = { ";
        for (string t : entry.second) {
            cout << t << " ";
        }
        cout << "}" << endl;
    }

    return 0;
}

This implementation includes:

Iterative computation of FIRST and FOLLOW sets
Handling of epsilon productions
Proper terminal/non-terminal differentiation
Clear output formatting

Example

Let's calculate FIRST and FOLLOW sets for the following grammar:

Grammar:

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

The C++ program will output:

FIRST sets:

FIRST(E) = { ( id }
FIRST(E') = { + ε }
FIRST(T) = { ( id }
FIRST(T') = { * ε }
FIRST(F) = { ( id }

FOLLOW sets:

FOLLOW(E) = { ) $ }
FOLLOW(E') = { ) $ }
FOLLOW(T) = { + ) $ }
FOLLOW(T') = { + ) $ }
FOLLOW(F) = { * + ) $ }

These sets are crucial for constructing an LL(1) parsing table for this grammar.

FAQ

What is the difference between FIRST and FOLLOW sets?: FIRST sets contain the terminals that can begin strings derived from a symbol, while FOLLOW sets contain the terminals that can appear immediately after a symbol in any sentential form.
When would I need to use ε in FIRST sets?: You need to include ε in FIRST(X) when X can derive the empty string (ε production). This is important for handling optional parts of a grammar.
How do I know when to stop computing FIRST/FOLLOW sets?: The computation stops when no more terminals can be added to any FIRST or FOLLOW set during an entire iteration through all productions.
Can FIRST and FOLLOW sets be used for any type of grammar?: FIRST and FOLLOW sets are most useful for LL(1) grammars. For ambiguous grammars, these sets may not provide sufficient information for parsing.
What happens if a grammar has left recursion?: Left recursion can cause infinite loops in FIRST set computation. You should eliminate left recursion before calculating FIRST/FOLLOW sets.