C Program to Calculate First and Follow of A Grammar
This guide explains how to write a C program to calculate FIRST and FOLLOW sets for a given context-free grammar. FIRST and FOLLOW sets are fundamental concepts in compiler design and formal language theory that help in parsing and syntax analysis.
Introduction
In compiler design, FIRST and FOLLOW sets are used to determine the parsing table for predictive parsers. The FIRST set of a non-terminal symbol contains all terminals that can appear as the first symbol of any string derived from that non-terminal. The FOLLOW set contains all terminals that can appear immediately after the non-terminal in any derivation.
This guide provides a complete C program to calculate FIRST and FOLLOW sets for a given grammar. The program includes functions to compute these sets and demonstrates their calculation with an example.
What is FIRST Set?
The FIRST set of a grammar symbol (terminal or non-terminal) is the set of terminals that can appear as the first symbol of any string derived from that symbol.
For a terminal symbol, the FIRST set is simply the symbol itself. For a non-terminal symbol, the FIRST set is computed recursively:
- If the production is ε (epsilon), then FIRST includes ε.
- For each production of the non-terminal, add the FIRST of the first symbol of the production to the FIRST set.
- If the first symbol can derive ε, then add the FIRST of the next symbol in the production.
Example: For the production A → aB | b, if FIRST(B) = {c, d}, then FIRST(A) = {a, b, c, d}.
What is FOLLOW Set?
The FOLLOW set of a non-terminal symbol contains all terminals that can appear immediately to the right of the non-terminal in any derivation.
To compute the FOLLOW set:
- The FOLLOW of the start symbol includes the end-of-input marker ($).
- For a production A → αBβ, add FIRST(β) to FOLLOW(B), excluding ε.
- If β can derive ε, then add FOLLOW(A) to FOLLOW(B).
Example: For the grammar S → aA, A → bA | c, FOLLOW(A) = {a, c, $}.
Algorithm to Calculate FIRST and FOLLOW
FIRST Set Algorithm
- Initialize FIRST sets for all non-terminals to empty.
- For each production rule:
- If the first symbol is a terminal, add it to the FIRST set.
- If the first symbol is a non-terminal, add its FIRST set (excluding ε) to the current FIRST set.
- If the first symbol can derive ε, continue with the next symbol in the production.
- Repeat until no more changes occur in any FIRST set.
FOLLOW Set Algorithm
- Initialize FOLLOW sets for all non-terminals to empty.
- Add $ to FOLLOW of the start symbol.
- For each production rule A → αBβ:
- Add FIRST(β) to FOLLOW(B), excluding ε.
- If β can derive ε, add FOLLOW(A) to FOLLOW(B).
- Repeat until no more changes occur in any FOLLOW set.
C Program Implementation
The following C program implements the algorithm to calculate FIRST and FOLLOW sets for a given grammar:
The complete implementation includes functions to parse the grammar, compute FIRST and FOLLOW sets, and display the results. The program handles ε productions and recursive derivations correctly.
Example Calculation
Consider the following grammar:
Calculating FIRST and FOLLOW sets for this grammar:
FIRST Sets
- FIRST(S) = {a, b}
- FIRST(A) = {c, d}
- FIRST(B) = {e, f}
FOLLOW Sets
- FOLLOW(S) = {$}
- FOLLOW(A) = {$, d}
- FOLLOW(B) = {$, f}
FAQ
- What is the difference between FIRST and FOLLOW sets?
- The FIRST set contains terminals that can appear at the beginning of any string derived from a non-terminal, while the FOLLOW set contains terminals that can appear immediately after the non-terminal in any derivation.
- When is ε included in a FIRST set?
- ε is included in the FIRST set of a non-terminal if the non-terminal can derive the empty string (ε).
- How do I handle left recursion in FIRST/FOLLOW calculation?
- Left recursion is handled by iterating through the productions until no more changes occur in the FIRST or FOLLOW sets.
- Can I use this program for any context-free grammar?
- Yes, the program can handle any context-free grammar as long as the grammar is properly defined and the non-terminals and terminals are correctly identified.