How to Calculate Follow in Compiler Design
In compiler design, the FOLLOW set is a key concept in parsing algorithms such as LL(1) parsing. FOLLOW(A) is the set of terminals that can appear immediately to the right of the non-terminal A in some sentential form derivable from the start symbol. Computing FOLLOW sets is a required step in building LL(1) parsing tables.
What is the FOLLOW Set?
The FOLLOW set of a non-terminal A is the set of terminals that can appear immediately to the right of A in some sentential form derived from the start symbol S. If A can appear at the right end of a sentential form, the end-of-input marker $ is also in FOLLOW(A). The definition uses sentential forms rather than sentences because a sentence contains only terminals, so A never appears in one.
FOLLOW sets matter most in LL(1) parsing: when the right-hand side of a production for A can derive ε, the parser chooses that production whenever the next input token is in FOLLOW(A). The FOLLOW set of the start symbol always includes the end-of-input marker ($).
Key Points:
- FOLLOW sets are used to build LL(1) parsing tables and to detect conflicts in them
- They tell the parser when to apply a production whose right-hand side can derive ε
- The FOLLOW set for the start symbol always includes $
How to Calculate FOLLOW
Calculating FOLLOW sets requires the FIRST sets of all grammar symbols, so those are computed first. The FOLLOW rules are then applied repeatedly until no FOLLOW set changes. Here's the step-by-step process:
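- Initialize FOLLOW(S) = {$}, where S is the start symbol, and set every other FOLLOW set to empty
- For each production A → αBβ, where B is a non-terminal and α, β are (possibly empty) strings of grammar symbols:
  - Add FIRST(β) minus ε to FOLLOW(B)
  - If β can derive ε (including the case where β is empty), add FOLLOW(A) to FOLLOW(B)
- Repeat the previous step over all productions until no FOLLOW set changes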
Formal Definition:
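For a grammar G = (N, T, P, S) with non-terminals N, terminals T, productions P and start symbol S, the FOLLOW set of a non-terminal A is:
FOLLOW(A) = {a ∈ T | S ⇒* αAaβ for some α, β ∈ (N ∪ T)*} ∪ {$ | S ⇒* αA for some α ∈ (N ∪ T)*}
The iterative procedure above computes these sets by applying its rules until a fixed point is reached, that is, until a full pass over the productions adds nothing to any FOLLOW set. It always terminates, because the sets only grow and can contain at most the terminals plus $.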
Worked Example
Let's consider the following simple grammar:
S → aAd | bBc
A → a | ε
B → b | ε
We'll calculate FOLLOW sets for all non-terminals:
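- Initialize FOLLOW(S) = {$}; FOLLOW(A) and FOLLOW(B) start empty
- For S → aAd (A is followed by β = d):
  - FIRST(d) = {d}, so FOLLOW(A) = {d}
  - d cannot derive ε, so FOLLOW(S) is not added to FOLLOW(A)
- For S → bBc (B is followed by β = c):
  - FIRST(c) = {c}, so FOLLOW(B) = {c}
  - c cannot derive ε, so FOLLOW(S) is not added to FOLLOW(B)
- For A → a, A → ε, B → b and B → ε:
  - No non-terminal appears on the right-hand side, so these productions add nothing to any FOLLOW set
- A second pass over the productions changes nothing, so a fixed point has been reached
The productions A → ε and B → ε do not put $ into FOLLOW(A) or FOLLOW(B). The ε-rule applies when the symbols after a non-terminal can derive ε, not when the non-terminal itself can; here A and B are always followed by the terminals d and c.
The final FOLLOW sets are:
- FOLLOW(S) = {$}
- FOLLOW(A) = {d}
- FOLLOW(B) = {c}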
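As a cross-check, the sets can be computed directly from the formal definition by enumerating sentential forms and recording which terminal (or the end of input) sits immediately to the right of each non-terminal. The sketch below is a hypothetical brute-force helper, follow_by_definition, using an assumed encoding (a dict of right-hand sides, with [] for ε). It only explores forms of at most max_len symbols, so for grammars with infinitely many sentential forms it can miss members; this example grammar has finitely many forms, all within the bound.

```python
from collections import deque

# Worked-example grammar; [] encodes an ε right-hand side (assumed encoding).
GRAMMAR = {
    "S": [["a", "A", "d"], ["b", "B", "c"]],
    "A": [["a"], []],
    "B": [["b"], []],
}
START = "S"
END = "$"


def follow_by_definition(grammar, start, max_len=8):
    """Collect FOLLOW sets by enumerating every sentential form reachable from
    `start` with at most max_len symbols (a bounded under-approximation in general)."""
    follow = {nt: set() for nt in grammar}
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        for i, sym in enumerate(form):
            if sym not in grammar:
                continue
            # Record what immediately follows this occurrence of a non-terminal.
            if i + 1 == len(form):
                follow[sym].add(END)            # at the right end: $ follows it
            elif form[i + 1] not in grammar:
                follow[sym].add(form[i + 1])    # a terminal follows it directly
            # Rewrite this occurrence with every production (any derivation order).
            for rhs in grammar[sym]:
                new_form = form[:i] + tuple(rhs) + form[i + 1:]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
    return follow


follow = follow_by_definition(GRAMMAR, START)
for nt in GRAMMAR:
    print(nt, sorted(follow[nt]))
```

It finds FOLLOW(S) = {$}, FOLLOW(A) = {d} and FOLLOW(B) = {c}: no sentential form of this grammar ends in A or B, so $ belongs to neither set.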
FAQ
- What is the difference between FIRST and FOLLOW sets?
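- FIRST(α) is the set of terminals that can begin a string derived from α, plus ε if α can derive the empty string; it is defined for any grammar symbol or string of symbols. FOLLOW(A) is defined only for a non-terminal A and contains the terminals, plus $, that can appear immediately after A in some sentential form. FOLLOW sets never contain ε.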
- When is the FOLLOW set for a non-terminal empty?
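- FOLLOW(A) is empty when A never appears in a sentential form with a terminal or the end of input right after it. In practice this means A is unreachable from the start symbol, or A is followed only by non-terminals that never derive a terminal string. Such useless symbols are normally removed before FIRST and FOLLOW are computed. FOLLOW(S) is never empty, since it always contains $.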
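Unreachable non-terminals can be found with a simple search from the start symbol. In this sketch the function name reachable_nonterminals and the grammar are hypothetical, and the grammar uses an assumed dict-of-right-hand-sides encoding. Any non-terminal the search does not reach never appears in a sentential form, so by the definition its FOLLOW set is empty. The check is one-directional: being reachable does not by itself guarantee a non-empty FOLLOW set.

```python
from collections import deque


def reachable_nonterminals(grammar, start):
    """Non-terminals that occur in some sentential form derived from `start`."""
    reached = {start}
    queue = deque([start])
    while queue:
        nt = queue.popleft()
        for rhs in grammar[nt]:
            for sym in rhs:
                if sym in grammar and sym not in reached:
                    reached.add(sym)
                    queue.append(sym)
    return reached


# Hypothetical grammar: C has a production, but nothing ever derives C.
grammar = {
    "S": [["a", "A"]],
    "A": [["b"]],
    "C": [["c"]],
}
unreachable = set(grammar) - reachable_nonterminals(grammar, "S")
print(unreachable)
```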
- How are FOLLOW sets used in LL(1) parsing?
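- When an LL(1) parsing table M is built, each production A → α is entered in M[A, a] for every terminal a in FIRST(α). If α can derive ε, the production is also entered in M[A, b] for every b in FOLLOW(A), including $. FOLLOW sets therefore tell the parser when A should derive nothing, based on the next input token. If any cell receives two productions, the grammar is not LL(1): FOLLOW sets expose such conflicts rather than resolve them.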
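A sketch of table construction and a table-driven predictive parse for the worked-example grammar, assuming its FIRST and FOLLOW sets are already known and hard-coded (FOLLOW(A) = {d} and FOLLOW(B) = {c}, since A and B only ever appear before d and c). The names build_table and parse, the dict-based grammar encoding with [] for ε, and the single-character tokens are assumptions for illustration. The ε-productions A → ε and B → ε end up only in the FOLLOW columns M[A, d] and M[B, c].

```python
EPS = "ε"
END = "$"

# Worked-example grammar; [] encodes an ε right-hand side (assumed encoding).
GRAMMAR = {
    "S": [["a", "A", "d"], ["b", "B", "c"]],
    "A": [["a"], []],
    "B": [["b"], []],
}
FIRST = {"S": {"a", "b"}, "A": {"a", EPS}, "B": {"b", EPS}}
FOLLOW = {"S": {END}, "A": {"d"}, "B": {"c"}}


def first_of_string(symbols):
    """FIRST of a right-hand side, using the hard-coded FIRST sets."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table():
    table = {}
    for lhs, rhss in GRAMMAR.items():
        for rhs in rhss:
            first = first_of_string(rhs)
            lookaheads = first - {EPS}
            if EPS in first:
                lookaheads |= FOLLOW[lhs]  # ε-deriving productions go under FOLLOW(lhs)
            for a in lookaheads:
                if (lhs, a) in table:
                    raise ValueError(f"LL(1) conflict at M[{lhs}, {a}]")
                table[(lhs, a)] = rhs
    return table


def parse(tokens, table, start="S"):
    """Table-driven predictive parse; returns True if the tokens are accepted."""
    stack = [END, start]
    tokens = list(tokens) + [END]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top not in GRAMMAR:             # terminal or $: must match the input
            if top != look:
                return False
            pos += 1
        elif (top, look) in table:
            stack.extend(reversed(table[(top, look)]))
        else:
            return False                   # empty table cell: syntax error
    return pos == len(tokens)


table = build_table()
for word in ["ad", "aad", "bc", "bbc", "abd"]:
    print(word, parse(word, table))
```

With these sets no cell receives two productions, so build_table raises no conflict; "ad", "aad", "bc" and "bbc" are accepted, and "abd" is rejected because M[A, b] is empty.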
- What happens if a grammar has left recursion?
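- Left recursion does not stop FIRST or FOLLOW from being computed. The iterative algorithm terminates for any context-free grammar, because every pass either adds a symbol to some set or ends the loop, and the sets are finite. Only a naive recursive implementation, such as one that computes FIRST(E) by calling itself on E → E + T, loops forever. The real problem is parsing: a left-recursive grammar cannot be LL(1), because a top-down parser expanding A → Aα would expand A again without consuming input. Left recursion must therefore be eliminated before building an LL(1) table, and FIRST and FOLLOW are then computed for the transformed grammar.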
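The contrast can be seen on a small left-recursive grammar, E → E + T | T and T → id (a hypothetical example). The naive_first helper below calls itself on E → E + T without ever returning, which Python reports as a RecursionError, while the fixed-point versions terminate. To keep the sketch short, it assumes the grammar has no ε-productions, so only the symbol right after a non-terminal matters; the function names are illustrative.

```python
# Hypothetical left-recursive grammar: E -> E + T | T,  T -> id
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["id"]],
}
START = "E"
END = "$"


def naive_first(symbol):
    """Direct recursive FIRST (no ε-productions, so only the first symbol matters).
    On E -> E + T it calls itself with the same argument and never returns."""
    if symbol not in GRAMMAR:
        return {symbol}
    result = set()
    for rhs in GRAMMAR[symbol]:
        result |= naive_first(rhs[0])
    return result


def iterative_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, rhss in GRAMMAR.items():
            for rhs in rhss:
                head = rhs[0]
                new = set(first[head]) if head in GRAMMAR else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def iterative_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhss in GRAMMAR.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue
                    if i + 1 < len(rhs):        # something follows sym in this rhs
                        nxt = rhs[i + 1]
                        new = set(first[nxt]) if nxt in GRAMMAR else {nxt}
                    else:                       # sym ends the rhs: inherit FOLLOW(lhs)
                        new = set(follow[lhs])
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


try:
    naive_first("E")
except RecursionError:
    print("naive FIRST(E): RecursionError")

first = iterative_first()
follow = iterative_follow(first)
for nt in GRAMMAR:
    print(nt, sorted(first[nt]), sorted(follow[nt]))
```

The fixed-point loops give FIRST(E) = FIRST(T) = {id} and FOLLOW(E) = FOLLOW(T) = {+, $}.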