Memory architecture
A look into QCPU’s memory system
…
1 Memory access
A memory instruction (which is either mst
or mld
and their word equivalent) can be composed like the following:
<dynamic:1> <register:2>, low byte, high byte instr
Dynamic offsets is a bit whether to add an 8 bit accumulator offset to the address, indicated by '
after the instruction. For the register selection, it’s a special or index register used in the address calculation:
id | short | description |
---|---|---|
00 |
none | |
01 |
sf |
stack frame (user or kernel depending on current context) |
10 |
sp |
stack pointer (user or kernel depending on current context) |
11 |
idx |
general-purpose index register (rx and ry ) |
Special registers have a user and kernel copy. They aren’t addressable in userspace, but can be written to using the DMAC in kernel space or implicitly through certain instructions (like mutating the stack pointer with msp
).
Depending on the processor’s current context (either user or kernel mode), operations using the special registers will vary between the two copies. For example, if a system call is done in userland, the stack immediately switches to the kernel’s location.
2 Address resolution
Like for the imm
(load immediate) and addi
(add with immediate, and alike) instructions described in the diagrams below, the memory instructions use an instruction constant. For memory instructions specifically, it’s a 2 byte, little-endian immediate.
fetch | fetch | decode | decode | execute | writeback |
---|---|---|---|---|---|
imm |
through alu | written to acc | |||
value for immediate | ready for alu |
fetch | fetch | decode | decode | execute | writeback |
---|---|---|---|---|---|
ast ra |
load ra |
through alu | written to acc | ||
addi |
add value with acc | written to acc | |||
value for immediate | ready for alu |
The full address calculation pipeline is shown in the pipeline diagram below. Memory instruction mldw'
loads a word from memory with an additional dynamic offset (the accumulator store prefixing the instruction) and the index registers.
Together, the address is calculated as [dynamic offset +] instruction constant + register
.
fetch | fetch | decode | decode | execute | writeback |
---|---|---|---|---|---|
ast rz |
load rz |
through alu | written to acc, dynamic offset | ||
mldw' idx |
load rx |
rx + imml | dynamic offset | ||
low byte | load ry , low byte ready |
ry + imml prop. carry | dynamic offset prop. carry | ||
high byte | high byte ready | invalidate instr. | |||
sync recall | |||||
rst ra |
write to reg | ||||
rst rb |
write to reg |
The immediate and register is calculated through the ALU, but the dynamic offset is calculated with a hardware-accelerated component. Note that the intermediate address is put through in two separate bytes. The dynamic offset component consists of an adder (adding the previous accumulator value and the low byte of the intermediate address) and a conditional incrementer (if a carry-out was present in the addition of the previous cycle, an increment is done for the high byte).
3 Reference indirection
An indirect access is done by loading two bytes from an arbitrary memory location and storing the result into the index registers. This can be combined with another access using idx
as address register.
mldw 0x0000 ; load first address
rst rx
rst ry
mld idx 0x0000 ; load from indirect address rst ra
4 Async memory
Fetching from memory takes time, especially when a cache miss is present. To combat this more efficiently, results from memory can be asynchronously written to a temporary register set and read from later. If the register is still ‘pending’, the processor will stall until it resolves the register.
mldw 0x0000 ; load first address
amra ; mark buffer a as pending
amr mal ; move register a, low byte, into acc amr mah ; move register a, high byte, into acc