How to Calculate First and Follow in Compiler Design

In compiler design, FIRST and FOLLOW sets are fundamental concepts used in parsing algorithms like LL(1) parsing. These sets help determine the validity of productions in a grammar and are essential for constructing parsing tables. This guide explains how to calculate FIRST and FOLLOW sets with clear examples and an interactive calculator.

What Are FIRST and FOLLOW Sets?

FIRST and FOLLOW sets are used in compiler design to analyze the grammar of a programming language. They help determine which terminals can appear next in a production rule and which terminals can follow a non-terminal.

Key Points:

FIRST(X) is the set of terminals that begin the strings derived from symbol X.
FOLLOW(X) is the set of terminals that can appear immediately after X in any sentential form.
These sets are used in LL(1) parsing to build parsing tables.

The FIRST set of a symbol X is the set of terminals that can appear as the first symbol of any string derived from X. For a terminal, the FIRST set is simply the terminal itself. For a non-terminal, it's the union of the FIRST sets of all productions that start with a terminal, or the FIRST sets of the first symbols of productions that start with non-terminals.

The FOLLOW set of a symbol X is the set of terminals that can appear immediately to the right of X in any sentential form. The FOLLOW set of the start symbol includes the end-of-input marker ($).

Calculating FIRST Sets

To calculate the FIRST set for a grammar symbol, follow these steps:

For each terminal symbol, FIRST(X) = {X}.
For each non-terminal symbol, FIRST(X) is the union of the FIRST sets of the first symbols of all productions for X.
If a production starts with a non-terminal that can derive ε (the empty string), include the FIRST sets of the next symbols in the production.
If all symbols in a production can derive ε, include ε in FIRST(X).

FIRST(X) = {a | X ⇒* a...} ∪ {ε | X ⇒* ε}

For example, if we have the production A → aB | b, then FIRST(A) = FIRST(a) ∪ FIRST(b) = {a, b}.

Calculating FOLLOW Sets

To calculate the FOLLOW set for a grammar symbol, follow these steps:

FOLLOW(S) = {$}, where S is the start symbol.
If there's a production A → αBβ, then FOLLOW(B) includes FIRST(β) - {ε}.
If there's a production A → αB or A → αBβ where FIRST(β) contains ε, then FOLLOW(B) includes FOLLOW(A).

FOLLOW(X) = {a | S ⇒* αXaβ} ∪ {a | S ⇒* αXβ and FIRST(β) contains ε}

For example, if we have the production S → aBc and FIRST(c) = {c}, then FOLLOW(B) includes {c}.

Example Calculation

Let's calculate FIRST and FOLLOW sets for the following grammar:

S → aBc | bBd
B → eB | f

Calculating FIRST Sets

Symbol	FIRST Set
S	{a, b}
B	{e, f}
a, b, c, d, e, f	{a}, {b}, {c}, {d}, {e}, {f}

Calculating FOLLOW Sets

Symbol	FOLLOW Set
S	{$}
B	{c, d}

These sets help in constructing the parsing table for the grammar.

FAQ

What is the difference between FIRST and FOLLOW sets?

The FIRST set contains the terminals that can appear at the beginning of any string derived from a symbol, while the FOLLOW set contains the terminals that can appear immediately after the symbol in any sentential form.

When are ε included in FIRST sets?

ε is included in the FIRST set of a symbol if that symbol can derive the empty string. This happens when all productions for the symbol can derive ε or when a production starts with a non-terminal that can derive ε.

How are FOLLOW sets used in parsing?

FOLLOW sets are used in LL(1) parsing to determine which production to use when the current input symbol is in the FOLLOW set of the left-hand side of a production.

Can FIRST and FOLLOW sets be empty?

No, FIRST sets are never empty because they must contain at least one terminal or ε. FOLLOW sets can be empty if the symbol never appears before another terminal in any sentential form.