CS 351 Translation Exercise

Barney Maccabe Last modified: Tue Oct 5 08:49:18 MDT 1999

Introduction

The goal of this exercise is to introduce the basics of program translation. We start by defining a simple, assembly-like, target language. This language provides very simple operations and primitive control structures.

After we define the target language, we define a simple programming language -- the input language. The language includes input and output statements, assignment statements, expressions written using infix notation and integer variables with implicit declarations.

The program you write for this assignment will translate programs written in the input language into the target language. Source code for an interpreter for the target language is provided so that you can actually run the programs you have compiled. If you spot a significant problem with the code (i.e., a bug, or a particularly ugly piece of code), send me mail (maccabe@cs.unm.edu) and you'll earn a bonus point provided that you carefully document the problem.

Due Date: Thursday September 30, 1999 at the start of class

Description

The Target Language

The target language is line oriented: each instruction is written on a single line, and each line holds at most one instruction. Blank lines and lines in which the first non-space character is ';' are ignored. An instruction line consists of an optional instruction label (an alphanumeric string followed by a colon, ':'), followed by an operation, followed by zero, one, or two operands. Additional text after the last operand is ignored; however, this text should be preceded by a semicolon (i.e., it should look like a comment).
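The line format described above can be sketched as a small C++ routine. This is only an illustration, not the provided interpreter: the `Instruction` struct and `parse_line` function are hypothetical names, labels are assumed to be written with the colon attached (as in `l1:`), and the sketch does not check how many operands each operation actually expects.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one target-language line.
struct Instruction {
    std::string label;                  // empty if the line has no label
    std::string op;                     // empty for blank and comment lines
    std::vector<std::string> operands;  // zero, one, or two entries
};

// Split one source line into label, operation, and operands.
Instruction parse_line(const std::string& line) {
    Instruction ins;
    // Blank lines and lines whose first non-space character is ';' are ignored.
    std::size_t first = line.find_first_not_of(" \t");
    if (first == std::string::npos || line[first] == ';') return ins;

    // Drop any trailing comment, then split what is left on whitespace.
    std::istringstream in(line.substr(0, line.find(';')));
    std::string word;
    while (in >> word) {
        if (ins.label.empty() && ins.op.empty() && word.back() == ':') {
            ins.label = word.substr(0, word.size() - 1);  // strip the colon
        } else if (ins.op.empty()) {
            ins.op = word;
        } else if (ins.operands.size() < 2) {
            ins.operands.push_back(word);
        }
        // Words after the second operand are ignored.
    }
    assert(ins.operands.size() <= 2);
    return ins;
}
```

For example, `parse_line("l1:\tnop")` yields the label `l1`, the operation `nop`, and no operands.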

The Language

Operation	Operand 1	Operand 2	Meaning
input	store label		input --> store[opnd1]
output	store label		store[opnd1] --> output
copy	store label	store label	store[opnd1] --> store[opnd2]
set	value	store label	opnd1 --> store[opnd2]
mult	store label	store label	store[opnd1] * store[opnd2] --> store[opnd2]
div	store label	store label	store[opnd1] / store[opnd2] --> store[opnd2]
add	store label	store label	store[opnd1] + store[opnd2] --> store[opnd2]
sub	store label	store label	store[opnd1] - store[opnd2] --> store[opnd2]
eq	store label	code label	if( store[opnd1] == 0 ) opnd2 --> PC
ne	store label	code label	if( store[opnd1] != 0 ) opnd2 --> PC
lt	store label	code label	if( store[opnd1] < 0 ) opnd2 --> PC
le	store label	code label	if( store[opnd1] <= 0 ) opnd2 --> PC
ge	store label	code label	if( store[opnd1] >= 0 ) opnd2 --> PC
gt	store label	code label	if( store[opnd1] > 0 ) opnd2 --> PC
goto	code label		opnd1 --> PC
nop			no operation
stop			terminates execution
end			marks the end of the program (translation time)
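To make the "Meaning" column concrete, here is a minimal sketch of how a few of these operations might update the store and the program counter. It is not the interpreter supplied with the assignment; `Instr`, `Machine`, `step`, and `run` are hypothetical names, the store is assumed to start out as zero, and only `set`, `copy`, `add`, `sub`, `eq`, `ge`, `goto`, `nop`, and `stop` are covered.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Instr {
    std::string op, a, b;   // operation, operand 1, operand 2 (may be empty)
};

struct Machine {
    std::map<std::string, long> store;        // store label -> value (default 0)
    std::map<std::string, std::size_t> code;  // code label -> instruction index
    std::size_t pc = 0;                       // index of the next instruction
};

// Execute one instruction; returns false when execution should end.
bool step(Machine& m, const std::vector<Instr>& prog) {
    assert(m.pc < prog.size());
    const Instr& i = prog[m.pc++];            // by default, continue with the next line
    if (i.op == "set") {
        m.store[i.b] = std::stol(i.a);        // opnd1 --> store[opnd2]
    } else if (i.op == "copy") {
        long x = m.store[i.a];
        m.store[i.b] = x;                     // store[opnd1] --> store[opnd2]
    } else if (i.op == "add" || i.op == "sub") {
        long x = m.store[i.a], y = m.store[i.b];
        m.store[i.b] = (i.op == "add") ? x + y : x - y;
    } else if (i.op == "eq" || i.op == "ge") {
        long x = m.store[i.a];
        bool taken = (i.op == "eq") ? (x == 0) : (x >= 0);
        if (taken) m.pc = m.code.at(i.b);     // opnd2 --> PC
    } else if (i.op == "goto") {
        m.pc = m.code.at(i.a);                // opnd1 --> PC
    } else if (i.op == "stop") {
        return false;
    }
    // "nop" (and anything this sketch does not model) changes nothing.
    return m.pc < prog.size();
}

// Run from the first instruction until "stop" or the end of the program.
void run(Machine& m, const std::vector<Instr>& prog) {
    m.pc = 0;
    while (step(m, prog)) {}
}
```

Note the operand order for `sub`: the result is store[opnd1] - store[opnd2], written back to the second operand, so `set 0 t2` followed by `sub t2 x` negates `x`.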

An Example

The following sample program reads two numbers and computes the product of the numbers using addition:

	;;
	;; a sample program -- read two integers and produce the product
	;; using addition only 
	;;

	input	a
	input	b

	set	0	prod
	copy	a	t1

	ge	t1	l1	; make sure t1 is non-negative
	set	0	t2
	sub	t2	t1
l1:	nop

l2:	nop
	eq	t1	l3	; quit when we get to zero
	add	b	prod
	set	-1	t3	; decrement t1
	add	t3	t1
	goto	l2

l3:	nop

	ge	a	l4      ; negate the result if a < 0
	set	0	t2
	sub	t2	prod

l4:     nop
	output	prod
	
	stop
	end
	
	-4
	5

Input Language

Programs written in the input language consist of a sequence of statements. The language provides five types of statements: read, write, assignment, if, and while. All variables are of type integer and are implicitly declared by their first use.

Grammar

The following is a grammar for the input language:

           <goal> :: <stmt list> eof
      <stmt list> :: <stmt> | <stmt> <stmt list>
           <stmt> :: <read stmt> | <write stmt> | <assign stmt> | <if stmt> | <while stmt>
      <read stmt> :: "read" <operand> ';'
     <write stmt> :: "write" <expr> ';'
    <assign stmt> :: <operand> '=' <expr> ';'
        <if stmt> :: "if" <expr> "then" <stmt list> "end"
                   | "if" <expr> "then" <stmt list> "else" <stmt list> "end"
     <while stmt> :: "while" <expr> "do" <stmt list> "end"
           <expr> :: <operand>
                   | <expr> <binary operator> <expr>
                   | <unary operator> <expr>
                   | '(' <expr> ')'
        <operand> :: <letter or digit>
                   | <letter or digit> <operand>

<binary operator> :: '*' | '/' | '+' | '-' | "==" | "!=" | "<" | "<=" | ">=" | ">"

 <unary operator> :: '-'

<letter or digit> :: 'a' | 'A' | 'b' | 'B' | ... '8' | '9'

Examples

The following program computes the product of its two inputs (again, using addition).

read x;
read y;

prod = 0;
temp = x;
if temp < 0 then 
    temp = -temp;
end

while temp != 0 do
    temp = temp - 1;
    prod = prod + y;
end

if x < 0 then
    prod = -prod;
end

write prod;

Grammar Manipulation

The grammar given for the input language has two significant problems. First, it is ambiguous. Second, it cannot be used directly in a recursive descent parser. We will correct these shortcomings in this section.

We start with the ambiguity, which arises in the recognition of expressions. Consider the expression:

a * b + c

This expression can be recognized by two parse trees:

                                           <expr>
                     ______________________/  | \______
                    /                         |        \
                   /                          |         \
               <expr>                         |          \
      _________/  | \__________               |           \
     /            |            \              |            \
    /             |             \             |             \
<operand> <binary operator> <operand> <binary operator> <operand>
    |             |             |             |             |
    a             *             b             +             c


               <expr>
           ____/  | \_______________________
          /       |                         \
         /        |                          \
        /         |                        <expr>
       /          |                ________/  | \_________
      /           |               /           |           \
     /            |              /            |            \
<operand> <binary operator> <operand> <binary operator> <operand>
    |             |             |             |             |
    a             *             b             +             c

The following expression grammar resolves the ambiguities in the original grammar by introducing traditional associativity and precedence rules:

    <expr> :: <expr> <rel op> <term>
            | <term>
    <term> :: <term> <add op> <factor>
            | <factor>
  <factor> :: <factor> <mult op> <primary>
            | <primary>
 <primary> :: <operand> | <unary op> <expr> | '(' <expr> ')'
  <rel op> :: "==" | "!=" | "<" | "<=" | ">=" | ">"
  <add op> :: '+' | '-'
 <mult op> :: '*' | '/'
<unary op> :: '-'

This grammar yields the following parse tree for the expression given earlier:

                                <expr>
                                  |
                                <term>
                 _______________/ |  \
                /                 |   \
               /                  |    \
            <term>                |     \
              |                   |      \
           <factor>               |      |
       ____/  |  \__              |      |
      /       |     \             |      \
     /        |      \            |       \
 <factor>     |       \           |     <factor>
    |         |        \          |        |
<primary>     |     <primary>     |    <primary>
    |         |         |         |        |
<operand> <mult op> <operand> <add op> <operand>
    |         |         |         |        |
    a         *         b         +        c

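The problem with this grammar is that it does not lend itself to predictive parsing, because the productions for <expr>, <term>, and <factor> are left recursive. Consider how a parser decides whether to expand <expr> to <term> or to <expr> <rel op> <term>. When you can see the whole expression, the decision is relatively easy; however, with a limited lookahead it becomes impossible to make this decision in the general case, and a recursive descent routine that always tried <expr> first would call itself without consuming any input. We avoid this problem by altering the grammar one final time, replacing each left-recursive production with a primed nonterminal that recognizes the remainder of the expression.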

    <expr> :: <term><term'>
   <term'> :: epsilon | <rel op><term><term'>
    <term> :: <factor><factor'>
 <factor'> :: epsilon | <add op><factor><factor'>
  <factor> :: <primary><primary'>
<primary'> :: epsilon | <mult op><primary><primary'>
 <primary> :: <operand> | <unary op> <expr> | '(' <expr> ')'

Code Structure

Compilers usually break the recognition process into two stages: lexical analysis and parsing. Lexical analysis is the process of recognizing simple structures in the code called "tokens," which include punctuation marks and words. Parsing is the process of recognizing the larger structures defined by the grammar.

For this project, your lexical analyzer should recognize the following tokens: semicolon, left parenthesis, right parenthesis, equal (==), not equal (!=), less than (<), less or equal (<=), greater or equal (>=), greater than (>), plus (+), minus (-), asterisk (*), slash (/), assign (=), the reserved words ("while", "do", "if", "then", "else", "end", "read", "write"), operands, and eof.
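One possible shape for such a lexical analyzer is sketched below. The `Token` struct, the string-valued `kind` field, and the `tokenize` function are assumptions made for this illustration; your own design may well use an enumeration of token kinds and a peek/get interface instead.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

struct Token {
    std::string kind;  // "reserved", "operand", "punct", or "eof"
    std::string text;  // the characters that make up the token
};

// Convert source text into a token sequence that always ends with "eof".
std::vector<Token> tokenize(const std::string& src) {
    static const std::set<std::string> reserved = {
        "while", "do", "if", "then", "else", "end", "read", "write"};
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (std::isalnum(static_cast<unsigned char>(c))) {
            // An operand is a run of letters and digits; reserved words
            // have the same shape, so check the table afterwards.
            std::size_t start = i;
            while (i < src.size() &&
                   std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            std::string word = src.substr(start, i - start);
            out.push_back({reserved.count(word) ? "reserved" : "operand", word});
            continue;
        }
        // Try two-character operators before their one-character prefixes,
        // so that "<=" is not split into "<" and "=".
        std::string two = src.substr(i, 2);
        if (two == "==" || two == "!=" || two == "<=" || two == ">=") {
            out.push_back({"punct", two});
            i += 2;
            continue;
        }
        if (std::string(";()<>+-*/=").find(c) != std::string::npos) {
            out.push_back({"punct", std::string(1, c)});
            ++i;
            continue;
        }
        throw std::runtime_error(std::string("unexpected character: ") + c);
    }
    out.push_back({"eof", ""});
    return out;
}
```

Because the grammar defines an operand as any string of letters and digits, this sketch treats `0` in `prod = 0;` as an operand token rather than as a separate kind of numeric literal.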

Using the final expression grammar, along with the other parts of the first grammar, parsing is relatively straightforward. The following code fragment illustrates the recognition of an expression:

    // <term> :: <factor><factor'>
    void read_term() {
        read_fact();
        read_fact_();
    }

    // <term'> :: epsilon | <rel op><term><term'>
    void read_term_() {
        Token t = lex.peek();
        switch( t.kind() ) {
        case TOK_EQOP:
        case TOK_NEOP:
        case TOK_LTOP:
        case TOK_LEOP:
        case TOK_GEOP:
        case TOK_GTOP:
            lex.get();          // consume the relational operator
            read_term();
            read_term_();
            break;
        default:
            break;              // epsilon: leave the token for the caller
        }
    }

    // <expr> :: <term><term'>
    void read_expr() {
        read_term();
        read_term_();
    }
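The fragment above leaves the routines for the remaining nonterminals undefined. As a rough sketch of how the whole expression grammar might be recognized, the following self-contained class follows the same naming pattern. Everything beyond the names `read_expr`, `read_term`, `read_term_`, `read_fact`, and `read_fact_` is an assumption made for illustration: the `Tok` enumeration, the `Recognizer` class, and the use of a token vector in place of the `lex` object. It only accepts or rejects input; it generates no code.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical token kinds, one per class of symbol in the expression grammar.
enum class Tok { Operand, RelOp, Plus, Minus, MulOp, LParen, RParen, End };

class Recognizer {
public:
    explicit Recognizer(std::vector<Tok> toks) : toks_(std::move(toks)) {
        toks_.push_back(Tok::End);   // sentinel, so peek() is always valid
    }
    // True if the whole token sequence forms exactly one <expr>.
    bool accepts() {
        try { pos_ = 0; read_expr(); return peek() == Tok::End; }
        catch (const std::runtime_error&) { return false; }
    }
private:
    std::vector<Tok> toks_;
    std::size_t pos_ = 0;

    Tok peek() const { return toks_[pos_]; }
    void get() { if (toks_[pos_] != Tok::End) ++pos_; }
    bool is_add_op() const { return peek() == Tok::Plus || peek() == Tok::Minus; }

    // One function per nonterminal; each primed rule chooses epsilon
    // whenever the next token is not one of its operators.
    void read_expr()  { read_term(); read_term_(); }
    void read_term_() { if (peek() == Tok::RelOp) { get(); read_term(); read_term_(); } }
    void read_term()  { read_fact(); read_fact_(); }
    void read_fact_() { if (is_add_op()) { get(); read_fact(); read_fact_(); } }
    void read_fact()  { read_prim(); read_prim_(); }
    void read_prim_() { if (peek() == Tok::MulOp) { get(); read_prim(); read_prim_(); } }

    // <primary> :: <operand> | <unary op> <expr> | '(' <expr> ')'
    void read_prim() {
        switch (peek()) {
        case Tok::Operand:
            get();
            break;
        case Tok::Minus:             // unary minus
            get();
            read_expr();
            break;
        case Tok::LParen:
            get();
            read_expr();
            if (peek() != Tok::RParen) throw std::runtime_error("expected ')'");
            get();
            break;
        default:
            throw std::runtime_error("expected operand, '-' or '('");
        }
    }
};
```

Each decision is made by looking at a single token, which is exactly what the final grammar was constructed to allow.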