C Program to Calculate First and Follow Sets
This guide explains how to write a C program to calculate FIRST and FOLLOW sets, which are fundamental concepts in compiler design. We'll cover the theory, provide a complete implementation, and walk through a practical example.
Introduction
In compiler design, FIRST and FOLLOW sets are used to determine the validity of productions in a context-free grammar. These sets help in parsing and syntax analysis.
Key Concepts:
- FIRST set of a non-terminal is the set of terminals that begin the strings derived from that non-terminal.
- FOLLOW set of a non-terminal is the set of terminals that can appear immediately to the right of that non-terminal in any sentential form.
Calculating these sets is crucial for constructing parsing tables in top-down parsers like LL(1) parsers. The algorithm involves iterative computation until no more elements can be added to the sets.
Algorithm Overview
The algorithm for calculating FIRST and FOLLOW sets consists of two main parts:
FIRST Set Calculation
- Initialize FIRST sets for all non-terminals to empty sets.
- For each production rule:
- If the right-hand side starts with a terminal, add it to the FIRST set of the left-hand side non-terminal.
- If the right-hand side starts with a non-terminal, add all elements of its FIRST set to the left-hand side's FIRST set, excluding ε (epsilon).
- If all symbols in the right-hand side can derive ε, add ε to the FIRST set.
- Repeat until no more changes occur in any FIRST set.
FOLLOW Set Calculation
- Initialize FOLLOW sets for all non-terminals to empty sets, except for the start symbol which gets $ (end marker).
- For each production rule:
- For each non-terminal B in the right-hand side, look at the symbol A following B:
- If A is a terminal, add it to FOLLOW(B).
- If A is a non-terminal, add FIRST(A) to FOLLOW(B), excluding ε.
- If A can derive ε, add FOLLOW of the left-hand side non-terminal to FOLLOW(B).
- For each non-terminal B in the right-hand side, look at the symbol A following B:
- Repeat until no more changes occur in any FOLLOW set.
C Program Implementation
The following C program implements the algorithm to calculate FIRST and FOLLOW sets for a given context-free grammar.
The program includes functions to:
- Check if a symbol is terminal or non-terminal
- Add elements to sets while avoiding duplicates
- Compute FIRST sets using the iterative algorithm
- Compute FOLLOW sets using the iterative algorithm
- Print the results in a readable format
Worked Example
Let's walk through an example grammar to see how the program works:
Grammar:
- S → aB | bA
- A → a | ε
- B → b
FIRST Set Calculation
- Initialize FIRST sets to empty sets.
- Process each production:
- S → aB: FIRST(S) = {a}
- S → bA: FIRST(S) = {a, b}
- A → a: FIRST(A) = {a}
- A → ε: FIRST(A) = {a, ε}
- B → b: FIRST(B) = {b}
FOLLOW Set Calculation
- Initialize FOLLOW(S) = {$}
- Process each production:
- S → aB: FOLLOW(B) = {$, a}
- S → bA: FOLLOW(A) = {$, a}
- A → a: (no change)
- A → ε: (no change)
- B → b: (no change)
After running the program with this grammar, you should get the following results:
FAQ
What is the difference between FIRST and FOLLOW sets?
The FIRST set of a non-terminal contains all terminals that can appear as the first symbol in any string derived from that non-terminal. The FOLLOW set contains all terminals that can appear immediately after the non-terminal in any sentential form.
When would I use FIRST and FOLLOW sets in compiler design?
These sets are primarily used in constructing parsing tables for top-down parsers like LL(1) parsers. They help determine which production to use during parsing based on the current input symbol.
What does the ε symbol represent in FIRST sets?
The ε symbol represents the empty string. It appears in a FIRST set when the non-terminal can derive an empty string, meaning it's optional in some productions.
How does the algorithm handle left recursion?
The algorithm can handle left recursion, but it may require more iterations to converge. In some cases, left-recursive grammars need to be transformed before FIRST/FOLLOW sets can be computed.