Introduction

We will be looking at the front end, i.e., the analysis portion of a compiler.

The syntax describes the form of a program in a given language, while the semanticsdescribes the meaning of that program. We will use the standard context-free grammaror BNF (Backus-Naur Form) to describe the syntax

We will learn syntax-directed translation, where the grammar does more than specify the syntax. We augment the grammar with attributes and use this to guide the entire front end.

The front end discussed in this chapter has as source language infix expressions consisting of digits, +, and -. The target language is postfix expressions with the same components. The compiler will convert

7+4-5 to 74+5-.

Actually, our simple compiler will handle a few other operators as well.

We will tokenize the input (i.e., write a scanner), model the syntax of the source, and let this syntax direct the translation all the way to three-address code, our intermediate language.


Syntax Definition


Definition of Grammars


context-free grammar (CFG) consists of

  1. A set of terminal tokens.
  2. A set of nonterminals.
  3. A set of productions (rules for transforming nonterminals).
  4. A specific nonterminal designated as start symbol.

Example:

    Terminals: 0 1 2 3 4 5 6 7 8 9 + -
    Nonterminals: list digit
    Productions: list → list + digit
                 list → list - digit
                 list → digit
                 digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    Start symbol: list
  

If no start symbol is specifically designated, the LHS of the first production is the start symbol.


Derivations

Watch how we can generate the input 7+4-5 starting with the start symbol, applying productions, and stopping when no productions are possible (we have only terminals).

    list → list - digit
         → list - 5
         → list + digit - 5
         → list + 4 - 5
         → digit + 4 - 5
         → 7 + 4 - 5
  

This process of applying productions, starting with the start symbol and ending when only terminals are present is called aderivation and we say that the final string has been derived from the initial string (in this case the start symbol).

The set of all strings derivable from the start symbol is the language generated by the CFG

  • It is important that you see that this context-free grammar generates precisely the set of infix expressions with single digits as operands (so 25 is not allowed) and + and - as operators.
  • The way you get different final expressions is that you make different choices of which production to apply. There are 3 productions you can apply to list and 10 you can apply to digit.
  • The result cannot have blanks since blank is not a terminal.
  • The empty string is not possible since, starting from list, we cannot get to the empty string. If we wanted to include the empty string, we would add the production
  • list → ε
  • The idea is that the input language to the compiler is approximately the language generated by the grammar. It is approximate since I have ignored the scanner.

Given a grammar, parsing a string consists of determining if the string is in the language generated by the grammar. If it is in the language, parsing produces a derivation. If it is not, parsing reports an error.

The opposite of derivation is reduction, that is, the LHS of a production, produces the RHS (a derivation) and the RHS is reduced by the production to the LHS.


Amrita Bagchi

Amrita Bagchi Creator

(No description available)

Suggested Creators

Amrita Bagchi