To cover the instruction encoding and decoding for the SPARC.
After completing this lab, you will be able to:
In this lab we consider instruction encoding and decoding for the operations that we have introduced in previous labs. In particular, we will consider encodings for instructions that use the data manipulation and branching operations. After we introduce instruction encoding, we consider the translation of synthetic operations. Finally, we conclude this lab by considering instruction decoding on the SPARC.
All SPARC instructions are encoded in a single 32-bit instruction word; there are no extension words.
The SPARC machine language uses two different formats for load and store instructions. These formats are shown in Figure 9.1. The first format is used for instructions that use one or two registers in the effective address. The second format is used for instructions that use an integer constant in the effective address.
Figure 9.1: Instruction formats for load and store instructions
In the first format the 32-bit instruction is divided into seven fields. The first field (reading from the left) holds the 2-bit value 11, while the fifth field (bit 13) holds the 1-bit value 0. These bits are the same for all load and store instructions that use two source registers. The sixth field (bits 5 through 12) holds the address space indicator, asi. For the present, we will always set the asi field to zero. The remaining fields, rd, op3, rs1, and rs2, hold encodings for the destination register, the operation, and the two source registers, respectively.
Registers are encoded using the 5-bit binary representation of the register number. Table 9.1 summarizes the operation encodings for the load and store operations.
Table 9.1: Operation encodings for the load and store operations
ldd [%r4+%r7], %r11
Because this instruction uses two registers in the address specification, it is encoded using the first format shown in Figure 9.1. As such, we must determine the values for the rd, op3, rs1, and rs2 fields. The following table summarizes these encodings:
    Field   Value   Encoding
    rd      %r11    01011
    op3     ldd     000011
    rs1     %r4     00100
    rs2     %r7     00111
These encodings lead to the following machine instruction:
    11 01011 000011 00100 0 00000000 00111
That is, 1101 0110 0001 1001 0000 0000 0000 0111 in binary, or 0xD6190007.
If the assembly language instruction uses only a single register in the address specification (e.g., register indirect addressing), the register is encoded in one of the source register fields (i.e., rs1 or rs2) while %r0 is encoded in the other. It doesn't matter which field holds the register specified in the assembly language instruction and which field holds the encoding for %r0. However, isem-as encodes %r0 in rs2.
ldub [%r23], %r19
Because this instruction uses only registers in the address specification, it is encoded using the first format shown in Figure 9.1. As such, we must determine the values for the rd, op3, rs1, and rs2 fields. The following table summarizes these encodings:
    Field   Value   Encoding
    rd      %r19    10011
    op3     ldub    000001
    rs1     %r23    10111
    rs2     %r0     00000
These encodings lead to the following machine instruction:
    11 10011 000001 10111 0 00000000 00000
That is, 1110 0110 0000 1101 1100 0000 0000 0000 in binary, or 0xE60DC000.
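The two worked examples above can be checked mechanically. The following Python sketch packs the fields of the first load/store format into a 32-bit word. The function name encode_mem_reg and the constant names are illustrative only (they are not part of isem-as), and the two operation encodings are the values implied by the machine words in the examples.

```python
def encode_mem_reg(op3, rd, rs1, rs2=0, asi=0):
    """Pack a load/store whose address uses one or two registers."""
    for name, value, width in (("op3", op3, 6), ("rd", rd, 5),
                               ("rs1", rs1, 5), ("rs2", rs2, 5),
                               ("asi", asi, 8)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return ((0b11 << 30)    # fixed 2-bit value for loads and stores
            | (rd << 25)    # destination register
            | (op3 << 19)   # operation
            | (rs1 << 14)   # first source register
            | (0 << 13)     # bit 13 is 0 in the register format
            | (asi << 5)    # address space indicator, zero for now
            | rs2)          # second source register (%r0 if unused)

# Operation encodings implied by the two worked examples.
LDD = 0b000011
LDUB = 0b000001

print(hex(encode_mem_reg(LDD, rd=11, rs1=4, rs2=7)))   # ldd  [%r4+%r7], %r11
print(hex(encode_mem_reg(LDUB, rd=19, rs1=23)))        # ldub [%r23], %r19
```

Leaving rs2 at its default of 0 mirrors the isem-as convention of placing %r0 in the second source register field.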
In the second format the 32-bit instruction is divided into six fields. As in the previous format, the first field holds the 2-bit value 11. However, unlike the previous format, the fifth field holds the 1-bit value 1. The remaining fields, rd, op3, rs1, and siconst13, hold encodings for the destination register, the operation, the source register, and the constant value, respectively. When this format is used, the integer constant is encoded using the 13-bit 2's complement representation and stored in the siconst13 field of the instruction.
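As a rough illustration of the second format, the sketch below converts a signed constant to its 13-bit 2's complement form and packs it with the other fields. The helper names to_simm13 and encode_mem_imm are hypothetical, and the instruction in the usage line (ldub [%r23+8], %r19) is made up for this sketch rather than taken from the lab.

```python
def to_simm13(value):
    """Return the 13-bit 2's complement bit pattern of a signed constant."""
    if not -4096 <= value <= 4095:
        raise ValueError("constant does not fit in 13 bits")
    return value & 0x1FFF

def encode_mem_imm(op3, rd, rs1, const):
    """Pack a load/store whose address is a register plus a constant."""
    return ((0b11 << 30)          # fixed 2-bit value for loads and stores
            | (rd << 25)
            | (op3 << 19)
            | (rs1 << 14)
            | (1 << 13)           # bit 13 is 1 in the constant format
            | to_simm13(const))

LDUB = 0b000001  # operation encoding assumed for ldub

print(format(to_simm13(-23), "013b"))
print(hex(encode_mem_imm(LDUB, rd=19, rs1=23, const=8)))
```

Masking with 0x1FFF is enough to produce the 2's complement pattern because Python integers behave as if they had an unbounded sign extension.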
The format used to encode sethi instructions is shown in Figure 9.2. Sethi instructions are encoded in four fields. The first field holds the 2-bit value 00. The next field, rd, holds the 5-bit encoding of the destination register. The third field holds the 3-bit value 100. The final field holds the 22-bit binary encoding of the value specified in the instruction.
Figure 9.2: Instruction format for sethi instructions
sethi %hi(0x87654321), %r2
This instruction is encoded using the format shown in Figure 9.2. As such, we need to determine the values for the rd and const22 fields. The following table summarizes these encodings:
    Field     Value             Encoding
    rd        %r2               00010
    const22   %hi(0x87654321)   1000011101100101010000
These encodings lead to the following machine instruction:
    00 00010 100 1000011101100101010000
That is, 0000 0101 0010 0001 1101 1001 0101 0000 in binary, or 0x0521D950.
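The sethi example can be reproduced with a few shifts. In this sketch, hi models the %hi operator as "the most significant 22 bits of a 32-bit value", and encode_sethi packs the four fields; both names are illustrative.

```python
def hi(value):
    """Most significant 22 bits of a 32-bit value (the %hi operator)."""
    return (value >> 10) & 0x3FFFFF

def encode_sethi(rd, const22):
    """Pack a sethi instruction from its destination and 22-bit constant."""
    if not 0 <= rd < 32 or not 0 <= const22 < (1 << 22):
        raise ValueError("field out of range")
    return ((0b00 << 30)      # fixed 2-bit value
            | (rd << 25)      # destination register
            | (0b100 << 22)   # fixed 3-bit value identifying sethi
            | const22)

print(hex(encode_sethi(2, hi(0x87654321))))   # sethi %hi(0x87654321), %r2
```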
Data manipulation instructions are encoded using two formats: one for instructions that use two source registers and another for instructions that use a source register and a small integer constant. The formats used for integer data manipulation instructions are shown in Figure 9.3.
Figure 9.3: Instruction formats for data manipulation instructions
In the first format the 32-bit instruction is divided into seven fields. The first field (reading from the left) holds the 2-bit value 10, while the fifth field (bit 13) holds the 1-bit value 0. These bits are the same for all data manipulation instructions that use two source registers. The sixth field (bits 5 through 12) is unused; the bits in this field must be zero. The remaining fields, rd, op3, rs1, and rs2, hold encodings for the destination register, the operation, and the two source registers, respectively.
In the second format the 32-bit instruction is divided into six fields. As in the previous format, the first field holds the 2-bit value 10. However, unlike the previous format, the fifth field holds the 1-bit value 1. The remaining fields, rd, op3, rs1, and siconst13, hold encodings for the destination register, the operation, the source register, and the constant value, respectively. When this format is used, the integer constant is encoded using the 13-bit 2's complement representation and stored in the siconst13 field of the instruction.
Recall that a SPARC assembly language instruction begins with the name of the operation, followed by the two source operands, followed by the destination operand. In considering the translation from an assembly language instruction into machine language, there are a few points to keep in mind:
Table 9.2 summarizes the operation encodings for the data
manipulation operations that we have covered in the previous labs.
When an instruction using one of these operations is encoded, the
operator encoding is placed in the op field of the machine
instruction.
Table 9.2: Operation encodings for the data manipulation operations
sub %r27, %r16, %r26
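Because this instruction uses two source registers, it is encoded using the first format shown in Figure 9.3. As such, we must determine the values for the op3, rd, rs1, and rs2 fields. Remember that the destination register is the last operand of the assembly language instruction. The following table summarizes these encodings:
    Field   Value   Encoding
    op3     sub     000100
    rd      %r26    11010
    rs1     %r27    11011
    rs2     %r16    10000
These encodings lead to the following machine instruction:
    10 11010 000100 11011 0 00000000 10000
That is, 1011 0100 0010 0110 1100 0000 0001 0000 in binary, or 0xB426C010.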
smulcc %r29, -23, %r19
Because this instruction uses one source register and a signed integer constant, it is encoded using the second format shown in Figure 9.3. As such, we must determine the values for the op3, rd, rs1, and siconst13 fields. The following table summarizes these encodings:
    Field       Value    Encoding
    op3         smulcc   011011
    rd          %r19     10011
    rs1         %r29     11101
    siconst13   -23      1111111101001
These encodings lead to the following machine instruction:
    10 10011 011011 11101 1 1111111101001
That is, 1010 0110 1101 1111 0111 1111 1110 1001 in binary, or 0xA6DF7FE9.
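Both data manipulation formats can be handled by one small routine. The sketch below is only an illustration: encode_alu and the constant names are invented here, and the operation encodings are the ones implied by the machine words for smulcc above and for the add instruction decoded at the end of this lab.

```python
def encode_alu(op3, rd, rs1, rs2=None, const=None):
    """Pack a data manipulation instruction in either format."""
    if (rs2 is None) == (const is None):
        raise ValueError("give exactly one of rs2 or const")
    word = (0b10 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
    if const is None:
        # First format: bit 13 is 0 and bits 5-12 are zero.
        return word | rs2
    if not -4096 <= const <= 4095:
        raise ValueError("constant does not fit in 13 bits")
    # Second format: bit 13 is 1, constant in 13-bit 2's complement.
    return word | (1 << 13) | (const & 0x1FFF)

ADD = 0b000000
SMULCC = 0b011011

print(hex(encode_alu(SMULCC, rd=19, rs1=29, const=-23)))  # smulcc %r29, -23, %r19
print(hex(encode_alu(ADD, rd=3, rs1=5, const=14)))        # add %r5, 14, %r3
```

Note that the destination register is passed as rd even though it is written last in the assembly language instruction.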
The machine language format for the conditional branching operations on the SPARC is shown in Figure 9.4. This format divides the machine instruction into five fields. The first and fourth fields hold the fixed binary values 00 and 010. The remaining fields, a, cond, and disp22, hold the encoded values for the annul bit, the branching condition, and the program counter displacement.
Figure 9.4: Instruction format for conditional branch instructions
The a field of a machine instruction is set (i.e., 1) for instructions that use the annul suffix (",a"). This field is clear (i.e., 0) for conditional branching instructions that do not nullify the results of the next instruction. The cond field of a machine instruction encodes the condition under which the branch is taken. Table 9.3 summarizes the operation encodings for the branching operations supported by the SPARC.
Table 9.3: Operation encodings for the conditional branching operations
To complete the encoding of an assembly language instruction that uses conditional branching, you need to determine the value of the disp22 field. We address this issue by considering how a processor uses this value. When the processor determines that the branching condition is satisfied, it multiplies the value in the disp22 field by 4 and adds it to the program counter (PC). To be more precise, the processor sign extends the 22-bit value stored in the disp22 field to 30 bits and concatenates two zeros to construct a 32-bit value, which it adds to the PC. In effect, the disp22 field holds the distance from the branch instruction to the target, measured in instructions.
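The displacement arithmetic can be sketched in both directions: from a 22-bit field to a target address, as the processor does, and from a pair of addresses to the field value, as the assembler does. The function names and the sample addresses (0x1000, 0x1008, 0x100C) are assumptions made for this sketch.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def branch_target(pc, disp22):
    """Address the processor branches to, given the PC of the branch."""
    offset = sign_extend(disp22, 22) * 4      # same as appending two zero bits
    return (pc + offset) & 0xFFFFFFFF         # the PC is a 32-bit register

def displacement(branch_addr, target_addr):
    """22-bit field value for a branch at branch_addr to target_addr."""
    delta = target_addr - branch_addr
    if delta % 4 != 0:
        raise ValueError("instructions are word aligned")
    words = delta // 4
    if not -(1 << 21) <= words < (1 << 21):
        raise ValueError("target is out of range for a 22-bit displacement")
    return words & 0x3FFFFF

print(hex(branch_target(0x1000, 3)))            # three instructions forward
print(hex(displacement(0x1008, 0x1000)))        # two instructions back
```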
        cmp     %r2, 8
        bne     l1
        nop
        inc     %r3
l1:
In this case, the target is 3 instructions from the branch
instruction, so the disp field will be the 22-bit
binary encoding of 3.
These encodings lead to the following machine instruction:
That is, 0001 0010 1000 0000 0000 0000 0000 0011 in binary, or 0x12800003.
top:    add     %r2, %r3, %r2
        deccc   %r4
        bne     top
In this case, the target is 2 instructions (back) from the branch
instruction, so the disp field will be the 22-bit
binary encoding of -2.
These encodings lead to the following machine instruction:
That is, 0001 0010 1011 1111 1111 1111 1111 1110 in binary, or 0x12BFFFFE.
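The two branch examples can be reproduced with the following sketch. The name encode_branch is illustrative, and the encoding used for bne (1001) is the value implied by the machine words above.

```python
def encode_branch(cond, disp, annul=False):
    """Pack a conditional branch; disp is measured in instructions."""
    if not 0 <= cond < 16:
        raise ValueError("cond does not fit in 4 bits")
    if not -(1 << 21) <= disp < (1 << 21):
        raise ValueError("displacement does not fit in 22 bits")
    return ((0b00 << 30)           # fixed 2-bit value
            | (int(annul) << 29)   # 1 only when the ",a" suffix is used
            | (cond << 25)         # branching condition
            | (0b010 << 22)        # fixed 3-bit value
            | (disp & 0x3FFFFF))   # 22-bit 2's complement displacement

BNE = 0b1001

print(hex(encode_branch(BNE, 3)))    # forward branch over three instructions
print(hex(encode_branch(BNE, -2)))   # backward branch to "top"
```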
In most cases, an assembly language instruction is simply a symbolic representation of a machine language instruction. The SPARC architecture also defines a number of assembly language instructions that do not correspond directly to SPARC machine language instructions. These are called synthetic instructions. The assembler translates synthetic instructions into one or more machine language instructions. Using synthetic instructions can frequently make your programs easier to read. Table 9.4 summarizes the translation provided by the assembler for most of the synthetic instructions on the SPARC.
Table 9.4: The synthetic instructions
Most of the translations shown in Table 9.4 are straightforward. However, the implementation of the set instruction merits further discussion. The assembler will always try to use one of the first two translations if it can. That is, if the constant value can be represented in 13 bits, the assembler will select the first translation. If the least significant 10 bits of the constant value are 0, it will use the second translation. Otherwise, the assembler will use the third translation. Note that if the constant value is relocatable, the assembler will always select the third translation.
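The selection rule for set can be sketched as follows. This is not the isem-as implementation: it assumes that the three translations are the conventional SPARC ones (a single or with %r0, a single sethi, and a sethi followed by an or), that "13 bits" means a signed 13-bit constant, and that the constant is absolute rather than relocatable. The function name translate_set is invented for the sketch.

```python
def translate_set(value, rd):
    """Return the instruction(s) assumed to replace `set value, rd`."""
    value &= 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed number.
    signed = value - (1 << 32) if value & 0x80000000 else value
    if -4096 <= signed <= 4095:
        # First translation: the constant fits in 13 bits.
        return [f"or %r0, {signed}, {rd}"]
    if value & 0x3FF == 0:
        # Second translation: the least significant 10 bits are zero.
        return [f"sethi %hi({value:#x}), {rd}"]
    # Third translation: set the high 22 bits, then fill in the low 10.
    return [f"sethi %hi({value:#x}), {rd}",
            f"or {rd}, %lo({value:#x}), {rd}"]

for constant in (100, 0x12345 << 10, 0x87654321):
    print(translate_set(constant, "%r2"))
```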
The Y register, introduced in Lab 4, is one of the SPARC state registers. As shown in Table 9.4, when you use a state register as the destination in a mov instruction, it is translated to a wr (write) instruction. Similarly, when you use a state register as the source register in a mov instruction, it is translated to a rd (read) instruction.
Write instructions are encoded using the formats shown in
Figure 9.3. When the destination register is the Y
register, the rd field is set to the 5-bit value 00000 and the
op field is set to the 6-bit value 110000.
Read instructions are encoded using the second format shown in Figure 9.3. When the source register is the Y register, the op3 field is set to the 6-bit value 101000 and the rs1 field is set to the 5-bit value 00000.
In this lab, we have limited our discussion to the translation of instructions that use absolute expressions. We will consider the translation of relocatable expressions when we consider linking and loading in Lab 15.
We conclude our discussion of instruction formats by considering instruction decoding. That is, the process by which a SPARC processor determines the instruction it is executing.
The SPARC uses a distributed opcode. The two most significant bits in an instruction represent the primary opcode. If the primary opcode is 00, bits 22-24 of the instruction provide the secondary opcode. If the primary opcode is 01, the instruction is a call instruction and the remaining bits (bits 0-29) are a displacement for the program counter (we will discuss the call instruction at greater length in Lab 10). Otherwise, if the primary opcode is either 10 or 11, bits 19-24 of the instruction provide the secondary opcode. Figure 9.5 illustrates the positions of the secondary opcodes based on the primary opcode.
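The first decoding step described above, locating the primary and secondary opcodes, might be sketched like this. The function name opcodes is illustrative; the sample words are machine instructions that appear elsewhere in this lab.

```python
def opcodes(word):
    """Return (primary, secondary) opcodes of a 32-bit instruction word.

    The secondary opcode is None for call instructions, which have none.
    """
    primary = (word >> 30) & 0b11
    if primary == 0b00:
        secondary = (word >> 22) & 0b111       # bits 22-24
    elif primary == 0b01:
        secondary = None                       # call: bits 0-29 are a displacement
    else:
        secondary = (word >> 19) & 0b111111    # bits 19-24
    return primary, secondary

for word in (0x09012345, 0x10800006, 0x8601600E, 0xD6190007):
    primary, secondary = opcodes(word)
    print(f"{word:#010x}: primary={primary:02b} secondary={secondary:b}")
```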
Figure 9.5: The primary opcode in a SPARC instruction
Once you have determined the primary and secondary opcodes, you'll be able to determine the instruction and, knowing the instruction, decode the remaining fields of the instruction. If the primary opcode is 01, the instruction is a call instruction and you can easily complete the decoding of the instruction.
If the primary opcode is 00, the instruction is an unimplemented
instruction, a conditional branch instruction, or a sethi instruction.
Table 9.5 summarizes how the 3-bit value in op is
used to identify the instruction.
Table 9.5: Decoding the op field
The data manipulation instructions are encoded with a primary opcode
of 10. Table 9.6 shows how the 6-bit value in the
op field is used to determine the instruction when the primary
opcode is 10.
Table 9.6: Decoding the op field when the primary opcode is 10
Instructions that access memory are encoded with a primary opcode of
11. Table 9.7 shows how the 6-bit value in the op
field is used to determine the instruction when the primary opcode is
11.
Table 9.7: Decoding the op field when the primary opcode is 11
When you decode an instruction that has a primary opcode of 10 or 11, you will need to examine bit 13 to determine whether bits 0-12 of the instruction hold an immediate value or a register. If bit 13 is 1, bits 0-12 hold an immediate value.
In binary, this instruction is 00 00100 100 000001.... That is, the primary opcode is 00 and op2 is 100. From Table 9.5, this is a sethi instruction. Using the sethi format to partition the bits yields:
Thus, the destination register is %r4, and the integer constant is 0x12345. The following instruction will be assembled as 0x09012345.
sethi %hi(0x12345<<10), %r4
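The partitioning in this example can be reproduced with a short sketch; decode_sethi is a name made up for the illustration.

```python
def decode_sethi(word):
    """Return (rd, const22) for a sethi instruction word."""
    if (word >> 30) & 0b11 != 0b00 or (word >> 22) & 0b111 != 0b100:
        raise ValueError("not a sethi instruction")
    rd = (word >> 25) & 0x1F        # 5-bit destination register number
    const22 = word & 0x3FFFFF       # 22-bit constant
    return rd, const22

rd, const22 = decode_sethi(0x09012345)
print(f"sethi %hi({const22:#x}<<10), %r{rd}")
```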
In binary, this instruction is 00 01000 010 000000.... That is,
the primary opcode is 00 and op is 010. From
Table 9.5, this is a conditional branch instruction.
Using the conditional branch format to partition the bits yields:
Thus, the operator is "ba" and the displacement is +6 words. The following instruction will be assembled as 0x10800006.
ba .+(6*4)
(When you use isem-as, '.' is the address of the current instruction.)
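A sketch of the same decoding in Python follows. The function name decode_branch is illustrative; the condition values 1000 (ba) and 1001 (bne) are the ones that appear in the machine words in this lab.

```python
def decode_branch(word):
    """Return (annul, cond, disp) for a conditional branch word.

    disp is the signed displacement measured in instructions.
    """
    if (word >> 30) & 0b11 != 0b00 or (word >> 22) & 0b111 != 0b010:
        raise ValueError("not a conditional branch instruction")
    annul = (word >> 29) & 1
    cond = (word >> 25) & 0xF
    raw = word & 0x3FFFFF
    # Sign extend the 22-bit field: bit 21 is the sign bit.
    disp = raw - (1 << 22) if raw & (1 << 21) else raw
    return annul, cond, disp

annul, cond, disp = decode_branch(0x10800006)
print(f"annul={annul} cond={cond:04b} disp={disp:+d} words ({disp * 4:+d} bytes)")
```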
In binary, the instruction is 10 00011 000000 0010.... That is, the primary opcode is 10 and op3 is 000000. From Table 9.6, this is an add instruction. Because bit 13 is 1, we use the second format in Figure 9.3 to decode this instruction.
Thus, the destination is %r3, the source register is %r5, and the constant is 0xE. The following instruction will be assembled as 0x8601600E.
add %r5, 14, %r3
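The final example can be checked with the sketch below, which splits a data manipulation word into its fields and uses bit 13 to choose between a register and a constant for the second operand. The name decode_alu and the dictionary keys are choices made for this sketch.

```python
def decode_alu(word):
    """Split a data manipulation instruction word into named fields."""
    if (word >> 30) & 0b11 != 0b10:
        raise ValueError("primary opcode is not 10")
    fields = {
        "rd": (word >> 25) & 0x1F,
        "op3": (word >> 19) & 0x3F,
        "rs1": (word >> 14) & 0x1F,
    }
    if (word >> 13) & 1:
        # Bit 13 is 1: bits 0-12 hold a 13-bit 2's complement constant.
        raw = word & 0x1FFF
        fields["const"] = raw - 0x2000 if raw & 0x1000 else raw
    else:
        # Bit 13 is 0: bits 0-4 hold the second source register.
        fields["rs2"] = word & 0x1F
    return fields

print(decode_alu(0x8601600E))
```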