Jumps and branches
There are a handful of instructions which (conditionally) allows the programmer to set the instruction pointer to a static or runtime-dynamic address.
1 Jumps
There are three types of jump instructions used in various scenarios:
1-1011-00-#
jmp
: jumps to an absolute address (3 bytes)1-1011-01-#
jmpr
: jumps to a relative offset (2 bytes)1-1011-10-#
jmpd
: jumps to a dynamic address (1 byte)
A pipeline diagram (for jmp
in this case) is shown below. For a relative jump (jmpr
), a two byte instruction is used instead of three bytes and the offset is calculated based on the instruction pointer value at the address of the first jump instruction byte.
fetch | fetch | decode | decode | execute | writeback |
---|---|---|---|---|---|
jmp |
btb hit known | jumped if btb miss, written to btb | |||
low byte | low byte shift | low byte ready | invalidate instr. | ||
high byte or btb instr. | high byte ready | invalidate instr. if btb miss | |||
btb instr. + 1 | invalidate instr. if btb miss | ||||
btb instr. + 2 | invalidate instr. if btb miss | ||||
btb instr. + 3 or jump |
Each jump instruction can be callable, which is done by setting the LSB to 1, indicated by a l
character in the assembly (like memory instructions would add a '
for a dynamic offset):
.func: ... ; subroutine...
ret ; return from subroutine
main: jmprl .func ; assembler calculates relative offset
fetch | fetch | decode | decode | execute | writeback |
---|---|---|---|---|---|
jmpl |
btb hit known | load spl |
load copy sfl |
add 4 to spl |
…, memory low byte ready |
low byte | load sph |
load copy sfh |
addition, prop. carry | memory high byte ready | |
high byte or btb instr. |
1.1 Dynamic addresses
Sometimes, an address is present in memory (like a switch table) or argument registers (like a callback argument) and a jump must be performed to that address. The jmpd
instruction provides this dynamic functionality by piping the last two accumulator writes into the instruction pointer.
Because dynamic addressing may change, the BTB (explained in Section 3.3) compares the speculation address versus actual address and recovers/invalidates if necessary when a BTB hit was found.1
fetch | fetch | decode | decode | execute | writeback |
---|---|---|---|---|---|
ast ra |
load ra |
written to acc | |||
ast rb |
load rb |
written to acc | |||
jmpd |
btb hit known | compare target vs. actual | jumped if btb miss or invalid address |
2 Branches
A branch is a non-callable, relative, conditional jump with the synopsys 1-100-T-###
brh
(2 bytes), where bit T
is whether the jump should be taken when uncertain (only on BTB hit). The 3 bit condition is either the following:
C
- carry outS
- sign bit (MSB)Z
- zeroU
- underflow!C
- not carry out!S
- not sign bit (MSB)!Z
- not zero!U
- not underflow
fetch | fetch | decode | decode | execute | writeback |
---|---|---|---|---|---|
brh c |
btb hit known | recovery ready for taken -> not taken | jumped if mispredicted, written to btb | ||
signed offset | perform offset, ready for not taken -> taken jump | invalidate instr. | |||
btb hit and T, btb miss or !T | invalidate instr. if mispredicted | ||||
taken + 1 or not taken + 1 | invalidate instr. if mispredicted | ||||
taken + 2 or not taken + 2 | invalidate instr. if mispredicted | ||||
taken + 3 or not taken + 3 or recover |
For address calculation, branch recovery and BTB writes, the following address delay cycles are used:
- Address calculation (relative jumps/branches): 3 cycles (relative to the instruction)
- Branch recovery (taken -> not taken branches): 3 cycles (incremented)
- Branch Target Buffer writes: 4 cycles
3 Fetch security and optimisations
3.1 Instruction cache
A cache miss may occur which results in the CPU stalling until the valid/ready bit turns high for the designated page. Instruction pointer incrementation is disabled until then, unless an interrupt is received, in which case it resumes again.
fetch | fetch | decode | decode | execute | writeback |
---|---|---|---|---|---|
instr., miss known at end of cycle | new address written in cc | invalidate instr. | |||
instr. + 1, second miss, ignore | invalidate instr. | ||||
instr. + 2, hit, wait valid bit | invalidate instr. if invalid bit |
3.2 Cache miss deferring
If a cache miss happened in the middle of a multi-byte instruction, the state is held within the cycles of fetching from the upper memory. The clock doesn’t switch off during this time. Inversely, the instruction requiring multiple additional bytes can implement a latch only when a valid byte is received.
fetch | fetch | decode | decode | execute | writeback |
---|---|---|---|---|---|
imm |
|||||
cache miss value… | through alu, defer state | written to acc, deferred | |||
valid value | through alu | written to acc |
3.3 Branch Target Buffer
The Branch Target Buffer (BTB) caches the jump destination addresses to allow the instruction cache to perform a jump within a single cycle, avoiding a three cycle penalty. Section 1 provides a pipeline diagram of the BTB. Below are diagrams showing the interference of a cache miss during a jump with both a BTB hit and miss.
fetch | fetch | decode | decode | execute | writeback |
---|---|---|---|---|---|
jmpr |
btb hit | no jump necessary | |||
signed offset | cancel cache miss | perform offset | invalidate instr. | ||
valid destination | could be a cache miss |
fetch | fetch | decode | decode | execute | writeback |
---|---|---|---|---|---|
jmpr |
btb miss | jumped to address, written to btb | |||
signed offset | perform offset | invalidate instr. | |||
penalty cycle | cancel cache miss | invalidate instr. | |||
penalty cycle | cancel cache miss | invalidate instr. | |||
penalty cycle | cancel cache miss | invalidate instr. | |||
valid destination | could be a cache miss |
BTB addresses are invalidated when the page of origin gets written to, which includes context swaps in virtual memory like by switching userland processes.
Footnotes
Currently uncertain. BTB may not be used for dynamic addresses.↩︎