To cover the instruction encoding and decoding for the SPARC.
After completing this lab, you will be able to:
In this lab we consider instruction encoding and decoding for the operations that we have introduced in previous labs. In particular, we will consider encodings for instructions that use the data manipulation and branching operations. After we introduce instruction encoding, we consider the translation of synthetic operations. Finally, we conclude this lab by considering instruction decoding on the SPARC.
All SPARC instructions are encoded in a single 32-bit instruction word; there are no extension words.
The SPARC machine language uses two different formats for load and store instructions. These formats are shown in Figure 9.1. The first format is used for instructions that use one or two registers in the effective address. The second format is used for instructions that use an integer constant in the effective address.
Figure 9.1: Instruction formats for load and store instructions
In the first format the 32-bit instruction is divided into seven fields. The first field (reading from the left) holds the 2-bit value 11, while the fifth field (bit 13) holds the 1-bit value 0. These bits are the same for all load and store instructions that use two source registers. The sixth field (bits 5 through 12) holds the address space indicator, asi. For the present, we will always set the asi field to zero. The remaining fields, rd, op3, rs1, and rs2, hold encodings for the destination register, the operation, and the two source registers, respectively.
Registers are encoded using the 5-bit binary representation of the register number. Table 9.1 summarizes the operation encodings for the load and store operations.
Table 9.1: Operation encodings for the load and store operations
ldd [%r4+%r7], %r11
Because this instruction uses two registers in the address specification, it is encoded using the first format shown in Figure 9.1. As such, we must determine the values for the rd, op3, rs1, and rs2 fields. The following table summarizes these encodings:
These encodings lead to the following machine instruction:
That is, 1101 0110 0001 1001 0000 0000 0000 0111 in binary, or 0xD6190007.
If the assembly language instruction uses only a single register in the address specification (e.g., register indirect addressing), the register is encoded in one of the source register fields (i.e., rs1 or rs2) while %r0 is encoded in the other. It doesn't matter which field holds the register specified in the assembly language instruction and which field holds the encoding for %r0. However, isem-as encodes %r0 in rs2.
ldub [%r23], %r19
Because this instruction uses a register in the address specification, it is encoded using the first format shown in Figure 9.1. As such, we must determine the values for the rd, op3, rs1, and rs2 fields. The following table summarizes these encodings:
These encodings lead to the following machine instruction:
That is, 1110 0110 0000 1101 1100 0000 0000 0000 in binary, or 0xE60DC000.
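The field packing used in the two examples above can be sketched in a few lines of Python. This is only an illustration, not part of isem-as: the function name encode_mem_reg and the constants LDD and LDUB are hypothetical, and the bit positions are taken from the description of the first format in Figure 9.1.

```python
# Sketch: pack the seven fields of the register-address load/store format.
# Assumed layout, following the description of Figure 9.1:
#   op[31:30]=11  rd[29:25]  op3[24:19]  rs1[18:14]  bit13=0  asi[12:5]  rs2[4:0]

LDUB = 0b000001  # operation encoding implied by the ldub example
LDD = 0b000011   # operation encoding implied by the ldd example


def encode_mem_reg(op3, rd, rs1, rs2=0, asi=0):
    """Return the 32-bit word for a load/store whose address uses registers."""
    fields = (("op3", op3, 6), ("rd", rd, 5), ("rs1", rs1, 5),
              ("rs2", rs2, 5), ("asi", asi, 8))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (0b11 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14) | (asi << 5) | rs2


if __name__ == "__main__":
    # ldd [%r4+%r7], %r11
    print(f"0x{encode_mem_reg(LDD, rd=11, rs1=4, rs2=7):08X}")
    # ldub [%r23], %r19, with %r0 placed in rs2 by default
    print(f"0x{encode_mem_reg(LDUB, rd=19, rs1=23):08X}")
```

Leaving rs2 at its default of 0 mirrors the choice of encoding %r0 in the second source register field.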
In the second format the 32-bit instruction is divided into six fields. As in the previous format, the first field holds the 2-bit value 11. However, unlike the previous format, the fifth field holds the 1-bit value 1. The remaining fields, rd, op3, rs1, and siconst, hold encodings for the destination register, the operation, the source register, and the constant value, respectively. When this format is used, the integer constant is encoded using the 13-bit 2's complement representation and stored in the siconst field of the instruction.
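As a rough illustration of the 13-bit 2's complement encoding, the following Python sketch converts a signed constant to and from the siconst field and packs the second format. The function names are hypothetical, and the instruction in the usage lines (ldub [%r23-4], %r19) is an invented example rather than one taken from this lab.

```python
# Sketch: 13-bit 2's complement constants and the immediate load/store format.
# Assumed layout: op[31:30]=11  rd[29:25]  op3[24:19]  rs1[18:14]  bit13=1  siconst[12:0]

LDUB = 0b000001  # operation encoding implied by the ldub example


def to_siconst(value):
    """Encode a signed integer as a 13-bit 2's complement field."""
    if not -4096 <= value <= 4095:
        raise ValueError(f"{value} cannot be represented in 13 bits")
    return value & 0x1FFF


def from_siconst(field):
    """Recover the signed integer stored in a 13-bit field."""
    return field - 0x2000 if field & 0x1000 else field


def encode_mem_imm(op3, rd, rs1, const):
    """Return the 32-bit word for a load/store whose address uses a constant."""
    return ((0b11 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
            | (1 << 13) | to_siconst(const))


if __name__ == "__main__":
    print(f"{to_siconst(-4):013b}")                        # bit pattern for -4
    print(f"0x{encode_mem_imm(LDUB, 19, 23, -4):08X}")     # ldub [%r23-4], %r19
```

The range check matters: 13 signed bits cover only -4096 through 4095, so larger constants must be built in a register first.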
The format used to encode sethi instructions is shown in Figure 9.2. Sethi instructions are encoded in four fields. The first field holds the 2-bit value 00. The next field, rd, holds the 5-bit encoding of the destination register. The third field holds the 3-bit value 100. The final field holds the 22-bit binary encoding of the value specified in the instruction.
Figure 9.2: Instruction format for sethi instructions
sethi %hi(0x87654321), %r2
This instruction is encoded using the format shown in Figure 9.2. As such, we need to determine the values for the rd and const fields. The following table summarizes these encodings:
These encodings lead to the following machine instruction:
That is, 0000 0101 0010 0001 1101 1001 0101 0000 in binary, or 0x0521D950.
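The sethi encoding can be checked with a short Python sketch. The helper names hi and encode_sethi are hypothetical; hi is assumed to select the most significant 22 bits of a 32-bit value, which is what the example above implies.

```python
# Sketch: encode a sethi instruction.
# Assumed layout, following Figure 9.2: 00  rd[29:25]  100  const[21:0]


def hi(value):
    """Return the most significant 22 bits of a 32-bit value."""
    return (value & 0xFFFFFFFF) >> 10


def encode_sethi(rd, const22):
    """Return the 32-bit word for sethi with the given 22-bit constant."""
    if not 0 <= rd < 32:
        raise ValueError(f"rd={rd} is not a valid register number")
    if not 0 <= const22 < (1 << 22):
        raise ValueError(f"constant {const22} does not fit in 22 bits")
    return (0b00 << 30) | (rd << 25) | (0b100 << 22) | const22


if __name__ == "__main__":
    # sethi %hi(0x87654321), %r2
    print(f"0x{encode_sethi(2, hi(0x87654321)):08X}")
```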
Data manipulation instructions are encoded using two formats: one for instructions that use two source registers and another for instructions that use a source register and a small integer constant. The formats used for integer data manipulation instructions are shown in Figure 9.3.
Figure 9.3: Instruction formats for data manipulation instructions
In the first format the 32-bit instruction is divided into seven fields. The first field (reading from the left) holds the 2-bit value 10, while the fifth field (bit 13) holds the 1-bit value 0. These bits are the same for all data manipulation instructions that use two source registers. The sixth field (bits 5 through 12) is unused; the bits in this field must be zero. The remaining fields, rd, op3, rs1, and rs2, hold encodings for the destination register, the operation, and the two source registers, respectively.
In the second format the 32-bit instruction is divided into six fields. As in the previous format, the first field holds the 2-bit value 10. However, unlike the previous format, the fifth field holds the 1-bit value 1. The remaining fields, rd, op3, rs1, and siconst, hold encodings for the destination register, the operation, the source register, and the constant value, respectively. When this format is used, the integer constant is encoded using the 13-bit 2's complement representation and stored in the siconst field of the instruction.
Recall that a SPARC assembly language instruction begins with the name of the operation, followed by the two source operands, followed by the destination operand. In considering the translation from an assembly language instruction into machine language, there are a few points to keep in mind:
Table 9.2 summarizes the operation encodings for the data manipulation operations that we have covered in the previous labs. When an instruction using one of these operations is encoded, the operator encoding is placed in the op field of the machine instruction.
Table 9.2: Operation encodings for the data manipulation operations
sub %r27, %r16, %r26
Because this instruction uses two source registers, it is encoded using the first format shown in Figure 9.3. As such, we must determine the values for the op3, rd, rs1, and rs2 fields. The following table summarizes these encodings:
These encodings lead to the following machine instruction:
That is, 1011 0100 0010 0110 1100 0000 0001 0000 in binary, or 0xB426C010.
smulcc %r29, -23, %r19
Because this instruction uses one source register and a signed integer constant, it is encoded using the second format shown in Figure 9.3. As such, we must determine the values for the op3, rd, rs1, and siconst fields. The following table summarizes these encodings:
These encodings lead to the following machine instruction:
That is, 1010 0110 1101 1111 0111 1111 1110 1001 in binary, or 0xA6DF7FE9.
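Both data manipulation formats can be sketched as a pair of Python functions. The names encode_alu_reg and encode_alu_imm are hypothetical, and only two operation encodings are listed: the one for smulcc implied by the example above and the one for add (000000) used in the decoding discussion later in this lab.

```python
# Sketch: encode data manipulation instructions in both formats of Figure 9.3.
# Assumed layouts:
#   10  rd[29:25]  op3[24:19]  rs1[18:14]  0  unused[12:5]=0  rs2[4:0]
#   10  rd[29:25]  op3[24:19]  rs1[18:14]  1  siconst[12:0]

ADD = 0b000000
SMULCC = 0b011011


def _check_reg(name, value):
    if not 0 <= value < 32:
        raise ValueError(f"{name}={value} is not a valid register number")


def encode_alu_reg(op3, rs1, rs2, rd):
    """Encode `op %rs1, %rs2, %rd` (two source registers)."""
    for name, value in (("rs1", rs1), ("rs2", rs2), ("rd", rd)):
        _check_reg(name, value)
    return (0b10 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14) | rs2


def encode_alu_imm(op3, rs1, const, rd):
    """Encode `op %rs1, const, %rd` (source register and 13-bit constant)."""
    for name, value in (("rs1", rs1), ("rd", rd)):
        _check_reg(name, value)
    if not -4096 <= const <= 4095:
        raise ValueError(f"{const} cannot be represented in 13 bits")
    return ((0b10 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
            | (1 << 13) | (const & 0x1FFF))


if __name__ == "__main__":
    # smulcc %r29, -23, %r19
    print(f"0x{encode_alu_imm(SMULCC, 29, -23, 19):08X}")
```

The arguments are listed in assembly language order (sources first, destination last) to make it harder to swap rd with a source register when filling in the fields.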
The machine language format for the conditional branching operations on the SPARC is shown in Figure 9.4. This format divides the machine instruction into five fields. The first and fourth fields hold the fixed binary values 00 and 010. The remaining fields, a, cond, and disp, hold the encoded values for the annul bit, the branching condition, and the program counter displacement.
Figure 9.4: Instruction format for conditional branch instructions
The a field of a machine instruction is set (i.e., 1) for instructions that use the annul suffix (",a"). This field is clear (i.e., 0) for conditional branching instructions that do not nullify the results of the next instruction. The cond field of a machine instruction encodes the condition under which the branch is taken. Table 9.3 summarizes the operation encodings for the branching operations supported by the SPARC.
Table 9.3: Operation encodings for the conditional branching operations
To complete the encoding of an assembly language instruction that uses conditional branching, you need to determine the value of the disp field. We address this issue by considering how a processor uses this value. When the processor determines that the branching condition is satisfied, it multiplies the value in the disp field by 4 and adds it to the program counter (PC). To be more precise, the processor sign extends the 22-bit value stored in the disp field to 30 bits and appends two zeros to construct a 32-bit value, which it adds to the PC. In effect, the disp field holds the distance from the branch instruction to the target, measured in instructions.
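The way the processor uses the disp field can be modeled with the following Python sketch. The function names are hypothetical, and the addresses in the usage lines are made-up values chosen only to show a forward and a backward branch.

```python
# Sketch: compute a branch target from the PC and the 22-bit disp field.


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def branch_target(pc, disp22):
    """Return the 32-bit target address of a taken branch located at pc."""
    offset = sign_extend(disp22, 22) * 4      # same as appending two zero bits
    return (pc + offset) & 0xFFFFFFFF         # addresses wrap at 32 bits


if __name__ == "__main__":
    print(hex(branch_target(0x1000, 3)))          # three instructions forward
    print(hex(branch_target(0x1008, 0x3FFFFE)))   # two instructions back
```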
    cmp %r2, 8
    bne l1
    nop
    inc %r3
l1:
In this case, the target is 3 instructions from the branch instruction, so the disp field will be the 22-bit binary encoding of 3.
These encodings lead to the following machine instruction:
That is, 0001 0010 1000 0000 0000 0000 0000 0011 in binary, or 0x12800003.
top:
    add %r2, %r3, %r2
    deccc %r4
    bne top
In this case, the target is 2 instructions (back) from the branch instruction, so the disp field will be the 22-bit binary encoding of -2.
These encodings lead to the following machine instruction:
That is, 0001 0010 1011 1111 1111 1111 1111 1110 in binary, or 0x12BFFFFE.
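The two branch examples can be reproduced with a small Python sketch. The function names are hypothetical; the condition encodings for bne (1001) and ba (1000) are the ones that appear in the worked examples of this lab, and the field positions follow the description of Figure 9.4.

```python
# Sketch: encode a conditional branch.
# Assumed layout: 00  a[29]  cond[28:25]  010  disp[21:0]

BA = 0b1000
BNE = 0b1001


def displacement(branch_addr, target_addr):
    """Distance from the branch to its target, measured in instructions."""
    delta = target_addr - branch_addr
    if delta % 4 != 0:
        raise ValueError("instruction addresses must be multiples of 4")
    return delta // 4


def encode_branch(cond, disp, annul=False):
    """Return the 32-bit word for a conditional branch."""
    if not -(1 << 21) <= disp < (1 << 21):
        raise ValueError(f"displacement {disp} does not fit in 22 bits")
    return ((0b00 << 30) | (int(annul) << 29) | (cond << 25)
            | (0b010 << 22) | (disp & 0x3FFFFF))


if __name__ == "__main__":
    print(f"0x{encode_branch(BNE, 3):08X}")     # forward branch to l1
    print(f"0x{encode_branch(BNE, -2):08X}")    # backward branch to top
```

Masking with 0x3FFFFF keeps only the low 22 bits, which yields the 2's complement pattern for negative displacements.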
In most cases, an assembly language instruction is simply a symbolic representation of a machine language instruction. The SPARC architecture also defines a number of assembly language instructions that do not correspond directly to SPARC machine language instructions. These are called synthetic instructions. The assembler translates synthetic instructions into one or more machine language instructions. Using synthetic instructions can frequently make your programs easier to read. Table 9.4 summarizes the translation provided by the assembler for most of the synthetic instructions on the SPARC.
Table 9.4: The synthetic instructions
Most of the translations shown in Table 9.4 are straightforward. However, the implementation of the set instruction merits further discussion. The assembler will always try to use one of the first two translations if it can. That is, if the constant value can be represented in 13 bits, the assembler will select the first translation. If the least significant 10 bits of the constant value are 0, it will use the second translation. Otherwise, the assembler will use the third translation. Note that if the constant value is relocatable, the assembler will always select the third translation.
The Y register, introduced in Lab 4, is one of the SPARC state registers. As shown in Table 9.4, when you use a state register as the destination in a mov instruction, it is translated to a wr (write) instruction. Similarly, when you use a state register as the source register in a mov instruction, it is translated to a rd (read) instruction.
Write instructions are encoded using the formats shown in Figure 9.3. When the destination register is the Y register, the rd field is set to the 5-bit value 00000 and the op3 field is set to the 6-bit value 110000.
Read instructions are encoded using the second format shown in Figure 9.3. When the source register is the Y register, the op3 field is set to the 6-bit value 101000 and the rs1 field is set to the 5-bit value 00000.
In this lab, we have limited our discussion to the translation of instructions that use absolute expressions. We will consider the translation of relocatable expressions when we consider linking and loading in Lab 15.
We conclude our discussion of instruction formats by considering instruction decoding. That is, the process by which a SPARC processor determines the instruction it is executing.
The SPARC uses a distributed opcode. The two most significant bits in an instruction represent the primary opcode. If the primary opcode is 00, bits 22-24 of the instruction provide the secondary opcode. If the primary opcode is 01, the instruction is a call instruction and the remaining bits (bits 0-29) are a displacement for the program counter (we will discuss the call instruction at greater length in Lab 10). Otherwise, if the primary opcode is either 10 or 11, bits 19-24 of the instruction provide the secondary opcode. Figure 9.5 illustrates the positions of the secondary opcodes based on the primary opcode.
Figure 9.5: The primary opcode in a SPARC instruction
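The first decoding step, extracting the primary and secondary opcodes, might look like this in Python. The function name opcodes is hypothetical; the bit positions are the ones given in the text above.

```python
# Sketch: extract the primary and secondary opcodes from a 32-bit instruction.


def opcodes(word):
    """Return (primary, secondary); secondary is None for a call instruction."""
    primary = (word >> 30) & 0b11
    if primary == 0b00:
        return primary, (word >> 22) & 0b111       # bits 22-24
    if primary == 0b01:
        return primary, None                       # call: no secondary opcode
    return primary, (word >> 19) & 0b111111        # bits 19-24


if __name__ == "__main__":
    for word in (0x09012345, 0x10800006, 0x8601600E):
        primary, secondary = opcodes(word)
        print(f"0x{word:08X}: primary={primary:02b} secondary={secondary:b}")
```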
Once you have determined the primary and secondary opcodes, you will be able to determine the instruction and, knowing the instruction, decode its remaining fields. If the primary opcode is 01, the instruction is a call instruction and you can easily complete the decoding of the instruction.
If the primary opcode is 00, the instruction is an unimplemented instruction, a conditional branch instruction, or a sethi instruction. Table 9.5 summarizes how the 3-bit value in op2 is used to identify the instruction.
Table 9.5: Decoding the op2 field
The data manipulation instructions are encoded with a primary opcode of 10. Table 9.6 shows how the 6-bit value in the op3 field is used to determine the instruction when the primary opcode is 10.
Table 9.6: Decoding the op3 field when the primary opcode is 10
Instructions that access memory are encoded with a primary opcode of 11. Table 9.7 shows how the 6-bit value in the op3 field is used to determine the instruction when the primary opcode is 11.
Table 9.7: Decoding the op3 field when the primary opcode is 11
When you decode an instruction that has a primary opcode of 10 or 11, you will need to examine bit 13 to determine whether bits 0-12 of the instruction hold an immediate value or a register. If bit 13 is 1, bits 0-12 hold an immediate value; otherwise, bits 0-4 hold the second source register.
In binary, this instruction is 00 00100 100 000100.... That is, the primary opcode is 00 and op2 is 100. From Table 9.5, this is a sethi instruction. Using the sethi format to partition the bits yields:
Thus, the destination register is %r4, and the integer constant is 0x12345. The following instruction will be assembled as 0x09012345.
sethi %hi(0x12345<<10), %r4
In binary, this instruction is 00 01000 010 000000.... That is, the primary opcode is 00 and op2 is 010. From Table 9.5, this is a conditional branch instruction. Using the conditional branch format to partition the bits yields:
Thus, the operator is "ba" and the displacement is +6 words. The following instruction will be assembled as 0x10800006.
ba .+(6*4)
(When you use isem-as, `.' is the address of the current instruction.)
In binary, the instruction is 10 00011 000000 0001.... That is, the primary opcode is 10 and op3 is 000000. From Table 9.6, this is an add instruction. Because bit 13 is 1, we use the second format in Figure 9.3 to decode this instruction.
Thus, the destination is %r3, the source register is %r5, and the constant is 0xE. The following instruction will be assembled as 0x8601600E.
add %r5, 14, %r3
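The three decoding examples above can be reproduced with a compact Python sketch. The function decode and the dictionary keys it returns are hypothetical, and the decoder stops at the field level: it does not map op3 or cond values to mnemonics, since that requires Tables 9.3 and 9.5 through 9.7.

```python
# Sketch: split a 32-bit SPARC instruction into its fields.


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode(word):
    """Return a dict describing the fields of the instruction word."""
    op = (word >> 30) & 0b11
    rd = (word >> 25) & 0x1F
    if op == 0b00:
        op2 = (word >> 22) & 0b111
        if op2 == 0b100:
            return {"kind": "sethi", "rd": rd, "const": word & 0x3FFFFF}
        if op2 == 0b010:
            return {"kind": "branch", "annul": (word >> 29) & 1,
                    "cond": (word >> 25) & 0xF,
                    "disp": sign_extend(word & 0x3FFFFF, 22)}
        return {"kind": "other", "op2": op2}
    if op == 0b01:
        return {"kind": "call", "disp": sign_extend(word & 0x3FFFFFFF, 30)}
    fields = {"kind": "alu" if op == 0b10 else "memory", "rd": rd,
              "op3": (word >> 19) & 0x3F, "rs1": (word >> 14) & 0x1F}
    if (word >> 13) & 1:
        fields["const"] = sign_extend(word & 0x1FFF, 13)   # immediate form
    else:
        fields["rs2"] = word & 0x1F                        # register form
    return fields


if __name__ == "__main__":
    for word in (0x09012345, 0x10800006, 0x8601600E):
        print(f"0x{word:08X}", decode(word))
```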