
9 The SPARC Instruction Formats

9.1 Goal

To cover the instruction encoding and decoding for the SPARC.

9.2 Objectives

After completing this lab, you will be able to:

Hand assemble SPARC assembly language instructions, and
Hand disassemble SPARC machine language instructions.

9.3 Discussion

In this lab we consider instruction encoding and decoding for the operations that we have introduced in previous labs. In particular, we will consider encodings for instructions that use the data manipulation and branching operations. After we introduce instruction encoding, we consider the translation of synthetic operations. Finally, we conclude this lab by considering instruction decoding on the SPARC.

All SPARC instructions are encoded in a single 32-bit instruction word; there are no extension words.

9.3.1 Encoding load and store instructions

The SPARC machine language uses two different formats for load and store instructions. These formats are shown in Figure 9.1. The first format is used for instructions that use one or two registers in the effective address. The second format is used for instructions that use an integer constant in the effective address.

Figure 9.1: Instruction formats for load and store instructions

In the first format the 32-bit instruction is divided into seven fields. The first field (reading from the left) holds the 2-bit value 11, while the fifth field (bit 13) holds the 1-bit value 0. These bits are the same for all load and store instructions that use two source registers. The sixth field (bits 5 through 12) holds the address space indicator, asi. For the present, we will always set the asi field to zero. The remaining fields, rd, op3, rs1, and rs2, hold encodings for the destination register, the operation, and the two source registers, respectively.

Registers are encoded using the 5-bit binary representation of the register number. Table 9.1 summarizes the operation encodings for the load and store operations.

Table 9.1: Operation encodings for the load and store operations

Example: Hand assemble the instruction:

ldd     [%r4+%r7], %r11

Because this instruction uses two registers in the address specification, it is encoded using the first format shown in Figure 9.1. As such, we must determine the values for the rd, op3, rs1, and rs2 fields. The following table summarizes these encodings:

Field   Operand   Encoding
rd      %r11      01011
op3     ldd       000011
rs1     %r4       00100
rs2     %r7       00111

These encodings lead to the following machine instruction:

That is, 1101 0110 0001 1001 0000 0000 0000 0111 in binary, or 0xD6190007.

If the assembly language instruction uses only a single register in the address specification (e.g., register indirect addressing), the register is encoded in one of the source register fields (i.e., rs1 or rs2) while %r0 is encoded in the other. It doesn't matter which field holds the register specified in the assembly language instruction and which field holds the encoding for %r0. However, isem-as encodes %r0 in rs2.

Example: Hand assemble the instruction:

ldub     [%r23], %r19

Because this instruction uses only registers in the address specification, it is encoded using the first format shown in Figure 9.1. As such, we must determine the values for the rd, op3, rs1, and rs2 fields. The following table summarizes these encodings:

Field   Operand   Encoding
rd      %r19      10011
op3     ldub      000001
rs1     %r23      10111
rs2     %r0       00000

These encodings lead to the following machine instruction:

That is, 1110 0110 0000 1101 1100 0000 0000 0000 in binary, or 0xE60DC000.
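The two worked examples above can be checked mechanically. The following is a minimal Python sketch of the first (register) format, not part of ISEM; the function name encode_ldst_reg and the constant names are hypothetical, and the two op3 values are the ones used in the examples above.

```python
def encode_ldst_reg(op3, rd, rs1, rs2=0, asi=0):
    """Pack a load/store that uses registers in the address (first format)."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg <= 31:
            raise ValueError("register numbers must fit in 5 bits")
    return ((0b11 << 30)      # fixed 2-bit value 11
            | (rd << 25)      # destination register
            | (op3 << 19)     # operation
            | (rs1 << 14)     # first source register
            | (0 << 13)       # bit 13 is 0 in this format
            | (asi << 5)      # address space indicator, zero for now
            | rs2)            # second source register (%r0 if omitted)

LDD = 0b000011   # op3 value from the ldd example
LDUB = 0b000001  # op3 value from the ldub example

print(hex(encode_ldst_reg(LDD, rd=11, rs1=4, rs2=7)))  # 0xd6190007
print(hex(encode_ldst_reg(LDUB, rd=19, rs1=23)))       # 0xe60dc000
```

Leaving rs2 at its default of 0 mirrors the isem-as convention of placing %r0 in the second source register field.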

In the second format the 32-bit instruction is divided into six fields. As in the previous format, the first field holds the 2-bit value 11. However, unlike the previous format, the fifth field holds the 1-bit value 1. The remaining fields, rd, op3, rs1, and siconst, hold encodings for the destination register, the operation, the source register, and the constant value, respectively. When this format is used, the integer constant is encoded using the 13-bit 2's complement representation and stored in the siconst field of the instruction.
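As a rough illustration of the second format, the Python sketch below packs a load that uses a register plus a constant. The function names are hypothetical, and the instructions being encoded (ldub [%r23+5], %r19 and ldub [%r23-1], %r19) are made-up variations on the earlier ldub example, chosen only to show a positive and a negative constant.

```python
def to_siconst13(value):
    """Return the 13-bit 2's complement bit pattern of a small signed constant."""
    if not -4096 <= value <= 4095:
        raise ValueError("constant does not fit in 13 bits")
    return value & 0x1FFF

def encode_ldst_imm(op3, rd, rs1, const):
    """Pack a load/store that uses a register and a constant (second format)."""
    return ((0b11 << 30)            # fixed 2-bit value 11
            | (rd << 25)            # destination register
            | (op3 << 19)           # operation
            | (rs1 << 14)           # source register
            | (1 << 13)             # bit 13 is 1 in this format
            | to_siconst13(const))  # 13-bit signed constant

LDUB = 0b000001  # op3 value from the ldub example

print(hex(encode_ldst_imm(LDUB, rd=19, rs1=23, const=5)))   # 0xe60de005
print(hex(encode_ldst_imm(LDUB, rd=19, rs1=23, const=-1)))  # 0xe60dffff
```

Masking with 0x1FFF is what turns a negative Python integer into its 13-bit 2's complement pattern; the range check keeps out constants that would silently lose bits.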

9.3.2 Encoding sethi instructions

The format used to encode sethi instructions is shown in Figure 9.2. Sethi instructions are encoded in four fields. The first field holds the 2-bit value 00. The next field, rd, holds the 5-bit encoding of the destination register. The third field holds the 3-bit value 100. The final field holds the 22-bit binary encoding of the value specified in the instruction.

Figure 9.2: Instruction format for sethi instructions

Example: Hand assemble the instruction:

sethi   %hi(0x87654321), %r2

This instruction is encoded using the format shown in Figure 9.2. As such, we need to determine the values for the rd and const fields. The following table summarizes these encodings:

These encodings lead to the following machine instruction:

That is, 0000 0101 0010 0001 1101 1001 0101 0000 in binary, or 0x0521D950.
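The same result can be reproduced with a short Python sketch. The helper names hi and encode_sethi are hypothetical; hi() models the %hi operator as the most significant 22 bits of a 32-bit value.

```python
def hi(value):
    """Most significant 22 bits of a 32-bit value, as %hi() selects them."""
    return (value >> 10) & 0x3FFFFF

def encode_sethi(rd, const22):
    """Pack a sethi instruction: 00 | rd | 100 | 22-bit constant."""
    return (0b00 << 30) | (rd << 25) | (0b100 << 22) | (const22 & 0x3FFFFF)

print(f"{hi(0x87654321):#x}")                   # 0x21d950
print(f"{encode_sethi(2, hi(0x87654321)):#010x}")  # 0x0521d950
```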

9.3.3 Encoding integer data manipulation instructions

Data manipulation instructions are encoded using two formats: one for instructions that use two source registers and another for instructions that use a source register and a small integer constant. The formats used for integer data manipulation instructions are shown in Figure 9.3.

Figure 9.3: Instruction formats for data manipulation instructions

In the first format the 32-bit instruction is divided into seven fields. The first field (reading from the left) holds the 2-bit value 10, while the fifth field (bit 13) holds the 1-bit value 0. These bits are the same for all data manipulation instructions that use two source registers. The sixth field (bits 5 through 12) is unused; the bits in this field must be zero. The remaining fields, rd, op3, rs1, and rs2, hold encodings for the destination register, the operation, and the two source registers, respectively.

In the second format the 32-bit instruction is divided into six fields. As in the previous format, the first field holds the 2-bit value 10. However, unlike the previous format, the fifth field holds the 1-bit value 1. The remaining fields, rd, op3, rs1, and siconst, hold encodings for the destination register, the operation, the source register, and the constant value, respectively. When this format is used, the integer constant is encoded using the 13-bit 2's complement representation and stored in the siconst field of the instruction.

Recall that a SPARC assembly language instruction begins with the name of the operation, followed by the two source operands, followed by the destination operand. In considering the translation from an assembly language instruction into machine language, there are a few points to keep in mind:

The operation is encoded in the op3 field.
The first source operand must be a register and it is encoded in the rs1 field.
The second source operand can be a register or a constant value. If it is a register, it is encoded in the rs2 field; otherwise, it is encoded in the siconst field.
The destination register is encoded in the rd field.

Table 9.2 summarizes the operation encodings for the data manipulation operations that we have covered in the previous labs. When an instruction using one of these operations is encoded, the operator encoding is placed in the op3 field of the machine instruction.

Table 9.2: Operation encodings for the data manipulation operations

Example: Hand assemble the following SPARC instruction.

sub     %r27, %r16, %r26

Because this instruction uses two source registers, it is encoded using the first format shown in Figure 9.3. As such, we must determine the values for the op3, rd, rs1, and rs2 fields. The following table summarizes these encodings:

Field   Operand   Encoding
op3     sub       000100
rd      %r26      11010
rs1     %r27      11011
rs2     %r16      10000

These encodings lead to the following machine instruction:

That is, 1011 0100 0010 0110 1100 0000 0001 0000 in binary, or 0xB426C010.

Example: Hand assemble the following SPARC instruction.

smulcc  %r29, -23, %r19

Because this instruction uses one source register and a signed integer constant, it is encoded using the second format shown in Figure 9.3. As such, we must determine the values for the op3, rd, rs1, and siconst fields. The following table summarizes these encodings:

Field     Operand   Encoding
op3       smulcc    011011
rd        %r19      10011
rs1       %r29      11101
siconst   -23       1111111101001

These encodings lead to the following machine instruction:

That is, 1010 0110 1101 1111 0111 1111 1110 1001 in binary, or 0xA6DF7FE9.
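The two data manipulation formats differ only in bit 13 and in what the low-order 13 bits hold, which the following Python sketch tries to make explicit. The names encode_alu and to_siconst13 are hypothetical; the op3 value for smulcc is the one used in the example above.

```python
def to_siconst13(value):
    """Return the 13-bit 2's complement bit pattern of a small signed constant."""
    if not -4096 <= value <= 4095:
        raise ValueError("constant does not fit in 13 bits")
    return value & 0x1FFF

def encode_alu(op3, rd, rs1, rs2=None, const=None):
    """Pack a data manipulation instruction; pass either rs2 or const."""
    if (rs2 is None) == (const is None):
        raise ValueError("give exactly one of rs2 or const")
    word = (0b10 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
    if const is None:
        return word | rs2                       # bit 13 = 0, bits 5-12 = 0
    return word | (1 << 13) | to_siconst13(const)  # bit 13 = 1

SMULCC = 0b011011  # op3 value from the smulcc example

# smulcc %r29, -23, %r19
print(f"{encode_alu(SMULCC, rd=19, rs1=29, const=-23):#010x}")  # 0xa6df7fe9
```

Note that the operands are passed by field name rather than in assembly order, which helps avoid confusing the destination (written last in assembly) with the first source register.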

9.3.4 Encoding conditional branching instructions

The machine language format for the conditional branching operations on the SPARC is shown in Figure 9.4. This format divides the machine instruction into five fields. The first and fourth fields hold the fixed binary values 00 and 010. The remaining fields, a, cond, and disp, hold the encoded values for the annul bit, the branching condition, and the program counter displacement.

Figure 9.4: Instruction format for conditional branch instructions

The a field of a machine instruction is set (i.e., 1) for instructions that use the annul suffix (",a"). This field is clear (i.e., 0) for conditional branching instructions that do not nullify the results of the next instruction. The cond field of a machine instruction encodes the condition under which the branch is taken. Table 9.3 summarizes the operation encodings for the branching operations supported by the SPARC.

Table 9.3: Operation encodings for the conditional branching operations

To complete the encoding of an assembly language instruction that uses conditional branching, you need to determine the value of the disp field. We address this issue by considering how a processor uses this value. When the processor determines that the branching condition is satisfied, it multiplies the value in the disp field by 4 and adds it to the program counter (PC). To be more precise, the processor sign extends the 22-bit value stored in the disp field to 30 bits and concatenates two zeros to construct a 32-bit value, which it adds to the PC. In effect, the disp field holds the distance from the branch instruction to the target, measured in instructions.
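The relationship between addresses and the disp field can be sketched in Python as follows. Both function names and the addresses used in the example calls are hypothetical; the arithmetic follows the description above (sign extend the 22-bit field, multiply by 4, add to the PC).

```python
def disp_field(branch_addr, target_addr):
    """22-bit disp value for a branch at branch_addr that targets target_addr."""
    delta = target_addr - branch_addr
    if delta % 4 != 0:
        raise ValueError("instructions are aligned on 4-byte boundaries")
    words = delta // 4                    # distance measured in instructions
    if not -(1 << 21) <= words < (1 << 21):
        raise ValueError("target is out of range for a 22-bit displacement")
    return words & 0x3FFFFF               # 22-bit 2's complement pattern

def branch_target(pc, disp):
    """Address the processor branches to, given the PC and the disp field."""
    if disp & (1 << 21):                  # sign bit of the 22-bit field
        disp -= 1 << 22                   # sign extend
    return (pc + 4 * disp) & 0xFFFFFFFF   # appending two zeros is a multiply by 4

print(disp_field(0x1004, 0x1010))          # 3
print(hex(disp_field(0x1008, 0x1000)))     # 0x3ffffe
print(hex(branch_target(0x1008, 0x3FFFFE)))  # 0x1000
```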

Example: Hand Assemble the branch instruction in the following SPARC code fragment.

        cmp     %r2, 8
        bne     l1
        nop
        inc     %r3
l1:

In this case, the target is 3 instructions from the branch instruction, so the disp field will be the 22-bit binary encoding of 3.

Field   Operand   Encoding
a       (none)    0
cond    bne       1001
disp    3         0000000000000000000011

These encodings lead to the following machine instruction:

That is, 0001 0010 1000 0000 0000 0000 0000 0011 in binary, or 0x12800003.

Example: Hand Assemble the branch instruction in the following SPARC code fragment.

top:    add     %r2, %r3, %r2
        deccc   %r4
        bne     top

In this case, the target is 2 instructions (back) from the branch instruction, so the disp field will be the 22-bit binary encoding of -2.

Field   Operand   Encoding
a       (none)    0
cond    bne       1001
disp    -2        1111111111111111111110

These encodings lead to the following machine instruction:

That is, 0001 0010 1011 1111 1111 1111 1111 1110 in binary, or 0x12BFFFFE.
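Both branch examples can be reproduced with the short Python sketch below. The function name encode_branch is hypothetical, and the cond value for bne is the one used in the two examples above.

```python
def encode_branch(cond, disp, annul=False):
    """Pack a conditional branch: 00 | a | cond | 010 | 22-bit disp."""
    if not -(1 << 21) <= disp < (1 << 21):
        raise ValueError("displacement does not fit in 22 bits")
    return ((0b00 << 30)
            | (int(annul) << 29)   # 1 only when the ",a" suffix is used
            | (cond << 25)         # branching condition
            | (0b010 << 22)        # fixed 3-bit value 010
            | (disp & 0x3FFFFF))   # 22-bit 2's complement displacement

BNE = 0b1001  # cond value from the bne examples

print(f"{encode_branch(BNE, 3):#010x}")   # 0x12800003
print(f"{encode_branch(BNE, -2):#010x}")  # 0x12bffffe
```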

9.3.5 Synthetic Instructions

In most cases, an assembly language instruction is simply a symbolic representation of a machine language instruction. The SPARC architecture also defines a number of assembly language instructions that do not correspond directly to SPARC machine language instructions. These are called synthetic instructions. The assembler translates synthetic instructions into one or more machine language instructions. Using synthetic instructions can frequently make your programs easier to read. Table 9.4 summarizes the translation provided by the assembler for most of the synthetic instructions on the SPARC.

Table 9.4: The synthetic instructions

Most of the translations shown in Table 9.4 are straightforward. However, the implementation of the set instruction merits further discussion. The assembler will always try to use one of the first two translations if it can. That is, if the constant value can be represented in 13 bits, the assembler will select the first translation. If the least significant 10 bits of the constant value are 0, it will use the second translation. Otherwise, the assembler will use the third translation. Note that if the constant value is relocatable, the assembler will always select the third translation.
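The selection rule for set can be sketched in Python as shown below. This is only an illustration for absolute constants: the function name expand_set is hypothetical, and the three instruction sequences it emits (a single or with %r0, a single sethi, and a sethi followed by an or with %lo) are the conventional SPARC expansions, assumed here to match the three translations the text refers to.

```python
def expand_set(value, rd):
    """Return the instruction(s) assumed to implement 'set value, rd'."""
    if -4096 <= value <= 4095:
        # First translation: the constant fits in 13 bits.
        return [f"or %r0, {value}, {rd}"]
    if value & 0x3FF == 0:
        # Second translation: the least significant 10 bits are all zero.
        return [f"sethi %hi({value:#x}), {rd}"]
    # Third translation: both halves of the constant are needed.
    return [f"sethi %hi({value:#x}), {rd}",
            f"or {rd}, %lo({value:#x}), {rd}"]

for v in (100, 0x48D1400, 0x87654321):
    print(expand_set(v, "%r2"))
```

A relocatable constant would bypass the first two tests, since the assembler cannot know its final value.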

9.3.6 The read and write instructions

The Y register, introduced in Lab 4, is one of the SPARC state registers. As shown in Table 9.4, when you use a state register as the destination in a mov instruction, it is translated to a wr (write) instruction. Similarly, when you use a state register as the source register in a mov instruction, it is translated to a rd (read) instruction.

Write instructions are encoded using the formats shown in Figure 9.3. When the destination register is the Y register, the rd field is set to the 5-bit value 00000 and the op3 field is set to the 6-bit value 110000.

Read instructions are encoded using the first format shown in Figure 9.3, with bits 0 through 13 set to zero. When the source register is the Y register, the op3 field is set to the 6-bit value 101000 and the rs1 field is set to the 5-bit value 00000.

9.3.7 Relocatable expressions

In this lab, we have limited our discussion to the translation of instructions that use absolute expressions. We will consider the translation of relocatable expressions when we consider linking and loading in Lab 15.

9.3.8 Decoding SPARC instructions

We conclude our discussion of instruction formats by considering instruction decoding, that is, the process by which a SPARC processor determines the instruction it is executing.

The SPARC uses a distributed opcode. The two most significant bits in an instruction represent the primary opcode. If the primary opcode is 00, bits 22-24 of the instruction provide the secondary opcode. If the primary opcode is 01, the instruction is a call instruction and the remaining bits (bits 0-29) are a displacement for the program counter (we will discuss the call instruction at greater length in Lab 10). Otherwise, if the primary opcode is either 10 or 11, bits 19-24 of the instruction provide the secondary opcode. Figure 9.5 illustrates the positions of the secondary opcodes based on the primary opcode.
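The rule for locating the secondary opcode can be written as a small Python sketch. The function name opcodes is hypothetical; the bit positions are the ones given in the paragraph above.

```python
def opcodes(word):
    """Return (primary, secondary) opcodes; secondary is None for call."""
    primary = (word >> 30) & 0b11
    if primary == 0b00:
        secondary = (word >> 22) & 0b111      # bits 22-24
    elif primary == 0b01:
        secondary = None                      # call: bits 0-29 are a displacement
    else:
        secondary = (word >> 19) & 0b111111   # bits 19-24
    return primary, secondary

for word in (0x0521D950, 0x12800003, 0xA6DF7FE9, 0xD6190007):
    primary, secondary = opcodes(word)
    print(f"{word:#010x}: primary={primary:02b} secondary={secondary:b}")
```

The four words in the loop are machine instructions assembled earlier in this lab (a sethi, a bne, an smulcc, and an ldd).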

Figure 9.5: The primary opcode in a SPARC instruction

Once you have determined the primary and secondary opcodes, you will be able to determine the instruction and, knowing the instruction, decode the remaining fields of the instruction. If the primary opcode is 01, the instruction is a call instruction and you can easily complete the decoding of the instruction.

If the primary opcode is 00, the instruction is an unimplemented instruction, a conditional branch instruction, or a sethi instruction. Table 9.5 summarizes how the 3-bit value in op2 is used to identify the instruction.

Table 9.5: Decoding the op2 field

The data manipulation instructions are encoded with a primary opcode of 10. Table 9.6 shows how the 6-bit value in the op3 field is used to determine the instruction when the primary opcode is 10.

Table 9.6: Decoding the op3 field when the primary opcode is 10

Instructions that access memory are encoded with a primary opcode of 11. Table 9.7 shows how the 6-bit value in the op3 field is used to determine the instruction when the primary opcode is 11.

Table 9.7: Decoding the op3 field when the primary opcode is 11

When you decode an instruction that has a primary opcode of 10 or 11, you will need to examine bit 13 to determine whether bits 0-12 of the instruction hold an immediate value or a register. If bit 13 is 1, bits 0-12 hold an immediate value; otherwise, bits 0-4 hold the second source register.
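A minimal Python sketch of this step follows. The function name decode_fields is hypothetical, and the dictionary keys simply reuse the field names from Figures 9.1 and 9.3; the two words decoded are instructions assembled earlier in this lab.

```python
def decode_fields(word):
    """Split a word whose primary opcode is 10 or 11 into its fields."""
    fields = {
        "rd": (word >> 25) & 0x1F,
        "op3": (word >> 19) & 0x3F,
        "rs1": (word >> 14) & 0x1F,
    }
    if (word >> 13) & 1:
        const = word & 0x1FFF        # bits 0-12 hold an immediate value
        if const & 0x1000:           # sign bit of the 13-bit constant
            const -= 0x2000          # undo the 2's complement encoding
        fields["siconst"] = const
    else:
        fields["rs2"] = word & 0x1F  # bits 0-4 hold a register
    return fields

print(decode_fields(0xA6DF7FE9))  # smulcc %r29, -23, %r19
print(decode_fields(0xD6190007))  # ldd [%r4+%r7], %r11
```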

Example: Give an instruction that will assemble to the value 0x09012345.

In binary, this instruction is 00 00100 100 000001.... That is, the primary opcode is 00 and op2 is 100. From Table 9.5, this is a sethi instruction. Using the sethi format to partition the bits yields:

Thus, the destination register is %r4, and the integer constant is 0x12345. The following instruction will be assembled as 0x09012345.

sethi   %hi(0x12345<<10), %r4
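The partitioning used in this example can be checked with a few lines of Python. The function name decode_sethi is hypothetical; the third value it returns is the 32-bit constant whose most significant 22 bits match the encoded field.

```python
def decode_sethi(word):
    """Return (rd, 22-bit constant, value whose %hi() is that constant)."""
    rd = (word >> 25) & 0x1F
    const22 = word & 0x3FFFFF
    return rd, const22, const22 << 10

rd, const22, value = decode_sethi(0x09012345)
print(rd, hex(const22), hex(value))  # 4 0x12345 0x48d1400
```

Any value with the same upper 22 bits would assemble to the same instruction, which is why the example writes the operand as 0x12345<<10.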

Example: Give an instruction that will assemble to the value 0x10800006.

In binary, this instruction is 00 01000 010 000000.... That is, the primary opcode is 00 and op2 is 010. From Table 9.5, this is a conditional branch instruction. Using the conditional branch format to partition the bits yields:

Thus, the operator is "ba" and the displacement is +6 words. The following instruction will be assembled as 0x10800006.

ba      .+(6*4)

(When you use isem-as, "." is the address of the current instruction.)
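As a cross-check on this example, the following Python sketch splits a conditional branch word into its fields. The function name decode_branch is hypothetical; the second call decodes the backward bne assembled earlier in this lab.

```python
def decode_branch(word):
    """Return the a, cond, and signed disp fields of a conditional branch."""
    disp = word & 0x3FFFFF
    if disp & (1 << 21):   # sign bit of the 22-bit displacement
        disp -= 1 << 22    # sign extend
    return {"a": (word >> 29) & 1, "cond": (word >> 25) & 0xF, "disp": disp}

fields = decode_branch(0x10800006)
print(fields, "byte offset:", 4 * fields["disp"])
print(decode_branch(0x12BFFFFE))
```

Multiplying disp by 4 gives the byte offset from the branch, which is the 6*4 written in the operand above.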

Example: Give an instruction that will assemble to the value 0x8601600E.

In binary, the instruction is 10 00011 000000 00101 1.... That is, the primary opcode is 10 and op3 is 000000. From Table 9.6, this is an add instruction. Because bit 13 is 1, we use the second format in Figure 9.3 to decode this instruction.

Thus, the destination is %r3, the source register is %r5, and the constant is 0xE. The following instruction will be assembled as 0x8601600E.

add     %r5, 14, %r3

9.4 Summary

9.5 Review Questions

9.6 Exercises


Barney Maccabe
Mon Sep 2 20:51:56 MDT 1996