Memory architecture
QCPU is a load-store architecture. A memory instruction is composed of a source or destination register, an address register, and a 5 bit unsigned immediate:
instr <source or destination>, <address>, <5 bit uimm>
A memory instruction operates on the lower 8 bits of the source or destination register. Each memory instruction has an associated instruction to operate on words (the full 16 bit register).
1 Address resolution
A 5 bit unsigned immediate can be provided as an offset to the memory instruction. For byte-sized accesses, the immediate has a range of 0-31. For word-sized accesses, the immediate is shifted left by one bit, so the offset covers a range of 0-63 in steps of 2 bytes, and the LSB of the resolved address is dropped so that the access is always 2-byte aligned.
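The resolution rules above can be sketched as a small behavioural model. This is only an illustration, not the hardware implementation: the function names, the 16-bit wrap-around of addresses, and treating `imm5` as the raw encoded field (before the shift) are assumptions not stated in this section.

```python
ADDR_MASK = 0xFFFF  # assumption: addresses are 16 bits wide and wrap around


def _check_imm(imm5: int) -> None:
    """Reject values that do not fit in a 5 bit unsigned immediate."""
    if not 0 <= imm5 <= 31:
        raise ValueError("immediate must be in the range 0-31")


def resolve_byte(base: int, imm5: int) -> int:
    """Byte access: the immediate is added to the base as a byte offset."""
    _check_imm(imm5)
    return (base + imm5) & ADDR_MASK


def resolve_word(base: int, imm5: int) -> int:
    """Word access: shift the immediate left by one, then drop the LSB
    of the resolved address so the result is 2-byte aligned."""
    _check_imm(imm5)
    return (base + (imm5 << 1)) & ADDR_MASK & ~1
```

For example, `resolve_byte(0xEA00, 31)` yields `0xEA1F`, and `resolve_word(0x1001, 1)` yields `0x1002` because the odd base is aligned down after the offset is added.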
Memory from a pointer (e.g. allocated by the kernel) can be used like so:
mld t1, x1, 0 ; loads from ptr x1 into t1
Memory can be addressed by the stack pointer, like so:
mldw x1, sp, 0 ; loads sp+0 in x1
mldw x2, sp, 2 ; loads sp+2 in x2
To access memory at an address known at assembly time, the address must first be loaded into a register. If the address lies within 32 bytes (byte-sized accesses) or 64 bytes (word-sized accesses) above a page boundary, a single lui instruction suffices, because the immediate offset covers the remainder. Otherwise, an additional ioriu instruction is needed to fill in the lower byte of the address:
lui t1, 0xEA lsh 8 ; load upper byte
mld x1, t1, 0 ; loads from 0xEA00 into x1

lui t1, 0xEA lsh 8 ; load upper byte
mld x1, t1, 31 ; loads from 0xEA1F into x1

lui t1, 0xBE lsh 8 ; load upper byte
ioriu t1, 0xEF ; ORs lower byte
mld x1, t1, 0 ; loads from 0xBEEF into x1
Addresses relative to the current instruction pointer can be calculated by reading the ip CSR:
csrr t1, 0x02
addi t1, $ - .label - 2
mld x1, t1, 0
2 Memory operations
Prefetching from data memory is done by loading into zr, which discards the result (pseudo-instruction prfd).
Resetting memory (setting a byte or word to 0) is done by storing zr to the destination (pseudo-instructions mclr and mclrw for clearing a byte and word respectively).
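Both pseudo-instructions rely on the behaviour of zr. The toy model below illustrates this under some assumptions: the `Machine` class, its method names, and the helper signatures for `prfd` and `mclr` are hypothetical, and the real operand order of the pseudo-instructions is not given in this section.

```python
class Machine:
    """Toy model: named registers plus a flat, byte-addressed memory."""

    def __init__(self, size: int = 0x10000) -> None:
        self.mem = bytearray(size)
        self.regs: dict[str, int] = {}

    def read_reg(self, name: str) -> int:
        # zr always reads as zero.
        return 0 if name == "zr" else self.regs.get(name, 0)

    def write_reg(self, name: str, value: int) -> None:
        # Writes to zr are discarded.
        if name != "zr":
            self.regs[name] = value & 0xFFFF

    def load_byte(self, dst: str, base: str, imm: int) -> None:
        addr = (self.read_reg(base) + imm) % len(self.mem)
        self.write_reg(dst, self.mem[addr])

    def store_byte(self, src: str, base: str, imm: int) -> None:
        addr = (self.read_reg(base) + imm) % len(self.mem)
        self.mem[addr] = self.read_reg(src) & 0xFF


def prfd(m: Machine, base: str, imm: int) -> None:
    """Prefetch: a load whose destination is zr, so the result is dropped."""
    m.load_byte("zr", base, imm)


def mclr(m: Machine, base: str, imm: int) -> None:
    """Clear a byte: a store whose source is zr, so zero is written."""
    m.store_byte("zr", base, imm)
```

In this model a prefetch touches memory but changes neither registers nor memory contents, while a clear overwrites the addressed byte with 0. The word variant would do the same over two aligned bytes.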
3 Atomics
Atomic operations can be done using the xch atomic instruction. xch atomically swaps a memory location with a register.
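The swap semantics can be illustrated with a small model. This is a sketch only: `AtomicCell` and `swap_all` are hypothetical names, and a Python lock stands in for whatever mechanism the hardware uses to make the exchange indivisible.

```python
import threading


class AtomicCell:
    """Models one memory location that supports an atomic exchange."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self._guard = threading.Lock()  # stand-in for hardware atomicity

    def xch(self, reg_value: int) -> int:
        """Store reg_value into the cell and return the previous contents."""
        with self._guard:
            old = self.value
            self.value = reg_value
            return old


def swap_all(cell: AtomicCell, values: list[int]) -> list[int]:
    """Let one thread per value exchange it into the cell.
    Returns the old values each thread got back, in thread order."""
    results = [0] * len(values)

    def worker(i: int) -> None:
        results[i] = cell.xch(values[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each exchange is indivisible, no value is lost or duplicated: the values handed back to the threads, together with the cell's final contents, are exactly the initial contents plus every value swapped in.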
To implement a mutex, a hart swaps 1 into the lock's memory location. If the value it gets back is 0, the lock was free and the hart has now taken it:
lui x1, .mutex'u
lli x2, 1
.try_lock: xch x2, x1, .mutex'l ; 0bXXXXXXXX.000XXXXX
brh nz, .try_lock ; was already taken? try again
; ... lock taken
xch zr, x1, .mutex'l ; release lock
4 Async memory
Memory accesses are asynchronous. When a memory read targets a destination register, that GPR's async resolved bit is reset. Reading that register stalls the CPU until the async operation has resolved. Memory writes are queued in the same way, but can only be awaited using the fence pseudoinstruction. Atomic instructions (the xch family of instructions) are also async, but remain atomic under symmetric multiprocessing.
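The stalling behaviour can be sketched with a cycle-counting model. Everything here is an assumption made for illustration: the `AsyncCore` class, a fixed memory latency, one cycle per issued instruction, and the way fence waits for the most recently queued write. The actual timing is not specified in this section.

```python
class AsyncCore:
    """Toy scoreboard: tracks when pending loads and stores resolve."""

    def __init__(self, latency: int = 3) -> None:
        self.cycle = 0
        self.latency = latency               # assumed fixed memory latency
        self.ready_at: dict[str, int] = {}   # register -> cycle its load resolves
        self.writes_done_at = 0              # cycle when queued writes have landed

    def tick(self, n: int = 1) -> None:
        """Execute n instructions that do not touch pending registers."""
        self.cycle += n

    def issue_load(self, dst: str) -> None:
        """Issue a load: dst becomes unresolved until the data arrives."""
        self.cycle += 1
        self.ready_at[dst] = self.cycle + self.latency

    def issue_store(self) -> None:
        """Queue a write; the core does not wait for it."""
        self.cycle += 1
        self.writes_done_at = self.cycle + self.latency

    def read(self, reg: str) -> int:
        """Read a register, stalling while it is unresolved.
        Returns the number of stall cycles."""
        ready = self.ready_at.pop(reg, self.cycle)
        stall = max(0, ready - self.cycle)
        self.cycle += stall + 1
        return stall

    def fence(self) -> int:
        """Wait until all queued writes have landed. Returns stall cycles."""
        stall = max(0, self.writes_done_at - self.cycle)
        self.cycle += stall
        return stall
```

In this model, reading a register right after loading into it stalls for the full latency, while placing independent instructions between the load and the first use hides the latency entirely.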