How to Calculate Follow in Compiler Design

In compiler design, the FOLLOW set is a crucial concept used in parsing algorithms like LL(1) parsing. It represents the set of terminals that can appear immediately to the right of a non-terminal in any valid sentence of the grammar. Understanding how to calculate FOLLOW is essential for designing efficient parsers.

What is the FOLLOW Set?

The FOLLOW set for a non-terminal A in a grammar is defined as the set of terminals that can appear immediately to the right of A in any valid derivation of the grammar. In other words, it's the set of terminals that can follow A in any sentence generated by the grammar.

FOLLOW sets are particularly important in LL(1) parsing, where they help determine whether a particular production rule can be used during parsing. The FOLLOW set for the start symbol of a grammar always includes the end-of-input marker ($).

Key Points:

FOLLOW sets are used in LL(1) parsing to resolve parsing conflicts
They help determine which production rule to apply during parsing
The FOLLOW set for the start symbol always includes $

How to Calculate FOLLOW

Calculating FOLLOW sets involves several steps that are applied iteratively until no more terminals can be added to any FOLLOW set. Here's the step-by-step process:

Initialize FOLLOW(S) = {$} where S is the start symbol
For each production rule A → αBβ:
- Add FIRST(β) to FOLLOW(B)
- If β can derive ε, add FOLLOW(A) to FOLLOW(B)
Repeat step 2 until no more terminals can be added to any FOLLOW set

Formal Definition:

For a grammar G with productions P, the FOLLOW set for a non-terminal A is defined as:

FOLLOW(A) = {a | S ⇒* αAaβ, a ∈ T ∪ {$}, α, β ∈ (T ∪ N)*}

This process continues until a fixed point is reached where no more terminals can be added to any FOLLOW set.

Worked Example

Let's consider the following simple grammar:

S → aAd | bBc

A → a | ε

B → b | ε

We'll calculate FOLLOW sets for all non-terminals:

Initialize FOLLOW(S) = {$}
For S → aAd:
- FIRST(d) = {d} → FOLLOW(A) = {d}
- FIRST(d) cannot derive ε → no change
For S → bBc:
- FIRST(c) = {c} → FOLLOW(B) = {c}
- FIRST(c) cannot derive ε → no change
For A → a:
- No change to FOLLOW(A)
For A → ε:
- FOLLOW(A) = FOLLOW(S) = {$}
For B → b:
- No change to FOLLOW(B)
For B → ε:
- FOLLOW(B) = FOLLOW(S) = {$}

The final FOLLOW sets are:

FOLLOW(S) = {$}
FOLLOW(A) = {d, $}
FOLLOW(B) = {c, $}

FAQ

What is the difference between FIRST and FOLLOW sets?: The FIRST set contains the first terminals that can appear in any string derived from a non-terminal, while the FOLLOW set contains the terminals that can appear immediately after a non-terminal in any valid derivation.
When is the FOLLOW set for a non-terminal empty?: The FOLLOW set for a non-terminal is empty if the non-terminal cannot appear in any valid derivation of the grammar. This typically happens with non-terminals that are not reachable from the start symbol.
How are FOLLOW sets used in LL(1) parsing?: In LL(1) parsing, FOLLOW sets help resolve parsing conflicts by determining which production rule to apply when multiple rules are possible. The FOLLOW set for a non-terminal indicates which terminals can follow it, helping the parser make the correct choice.
What happens if a grammar has left recursion?: Left recursion in a grammar can cause problems when calculating FOLLOW sets because it can lead to infinite loops in the derivation process. Left recursion should be eliminated before calculating FOLLOW sets.