C Program to Calculate First and Follow of A Grammar

This guide explains how to write a C program to calculate FIRST and FOLLOW sets for a given context-free grammar. FIRST and FOLLOW sets are fundamental concepts in compiler design and formal language theory that help in parsing and syntax analysis.

Introduction

In compiler design, FIRST and FOLLOW sets are used to determine the parsing table for predictive parsers. The FIRST set of a non-terminal symbol contains all terminals that can appear as the first symbol of any string derived from that non-terminal. The FOLLOW set contains all terminals that can appear immediately after the non-terminal in any derivation.

This guide provides a complete C program to calculate FIRST and FOLLOW sets for a given grammar. The program includes functions to compute these sets and demonstrates their calculation with an example.

What is FIRST Set?

The FIRST set of a grammar symbol (terminal or non-terminal) is the set of terminals that can appear as the first symbol of any string derived from that symbol.

For a terminal symbol, the FIRST set is simply the symbol itself. For a non-terminal symbol, the FIRST set is computed recursively:

If the production is ε (epsilon), then FIRST includes ε.
For each production of the non-terminal, add the FIRST of the first symbol of the production to the FIRST set.
If the first symbol can derive ε, then add the FIRST of the next symbol in the production.

Example: For the production A → aB | b, if FIRST(B) = {c, d}, then FIRST(A) = {a, b, c, d}.

What is FOLLOW Set?

The FOLLOW set of a non-terminal symbol contains all terminals that can appear immediately to the right of the non-terminal in any derivation.

To compute the FOLLOW set:

The FOLLOW of the start symbol includes the end-of-input marker ($).
For a production A → αBβ, add FIRST(β) to FOLLOW(B), excluding ε.
If β can derive ε, then add FOLLOW(A) to FOLLOW(B).

Example: For the grammar S → aA, A → bA | c, FOLLOW(A) = {a, c, $}.

Algorithm to Calculate FIRST and FOLLOW

FIRST Set Algorithm

Initialize FIRST sets for all non-terminals to empty.
For each production rule:
- If the first symbol is a terminal, add it to the FIRST set.
- If the first symbol is a non-terminal, add its FIRST set (excluding ε) to the current FIRST set.
- If the first symbol can derive ε, continue with the next symbol in the production.
Repeat until no more changes occur in any FIRST set.

FOLLOW Set Algorithm

Initialize FOLLOW sets for all non-terminals to empty.
Add $ to FOLLOW of the start symbol.
For each production rule A → αBβ:
- Add FIRST(β) to FOLLOW(B), excluding ε.
- If β can derive ε, add FOLLOW(A) to FOLLOW(B).
Repeat until no more changes occur in any FOLLOW set.

C Program Implementation

The following C program implements the algorithm to calculate FIRST and FOLLOW sets for a given grammar:

#include <stdio.h> #include <string.h> #include <stdbool.h> #define MAX_PRODUCTIONS 10 #define MAX_SYMBOLS 10 #define MAX_LENGTH 20 char productions[MAX_PRODUCTIONS][MAX_LENGTH]; int numProductions = 0; char nonTerminals[MAX_SYMBOLS]; int numNonTerminals = 0; char terminals[MAX_SYMBOLS]; int numTerminals = 0; bool isNonTerminal(char c) { for (int i = 0; i < numNonTerminals; i++) { if (nonTerminals[i] == c) return true; } return false; } void computeFirst(char symbol, char first[MAX_SYMBOLS]) { // Implementation of FIRST set computation // ... } void computeFollow(char symbol, char follow[MAX_SYMBOLS]) { // Implementation of FOLLOW set computation // ... } int main() { // Input grammar and compute FIRST/FOLLOW sets // ... return 0; }

The complete implementation includes functions to parse the grammar, compute FIRST and FOLLOW sets, and display the results. The program handles ε productions and recursive derivations correctly.

Example Calculation

Consider the following grammar:

S → aA | bB A → cA | d B → eB | f

Calculating FIRST and FOLLOW sets for this grammar:

FIRST Sets

FIRST(S) = {a, b}
FIRST(A) = {c, d}
FIRST(B) = {e, f}

FOLLOW Sets

FOLLOW(S) = {$}
FOLLOW(A) = {$, d}
FOLLOW(B) = {$, f}

FAQ

What is the difference between FIRST and FOLLOW sets?: The FIRST set contains terminals that can appear at the beginning of any string derived from a non-terminal, while the FOLLOW set contains terminals that can appear immediately after the non-terminal in any derivation.
When is ε included in a FIRST set?: ε is included in the FIRST set of a non-terminal if the non-terminal can derive the empty string (ε).
How do I handle left recursion in FIRST/FOLLOW calculation?: Left recursion is handled by iterating through the productions until no more changes occur in the FIRST or FOLLOW sets.
Can I use this program for any context-free grammar?: Yes, the program can handle any context-free grammar as long as the grammar is properly defined and the non-terminals and terminals are correctly identified.