Memory architecture

A look into QCPU’s memory system
Published October 10, 2025

QCPU is a load-store architecture. A memory instruction takes a source or destination register, an address register, and a 5-bit unsigned immediate:

instr <source or destination>, <address>, <5 bit uimm>

A memory instruction operates on the lower 8 bits of the source or destination register. Each memory instruction has a word-sized counterpart that operates on the full 16-bit register.

1 Address resolution

A 5-bit unsigned immediate can be provided as an offset to the memory instruction. For byte-sized accesses, the immediate covers offsets 0-31. For word-sized accesses, the immediate is shifted left by one bit, so it reaches even offsets within a 64-byte window (0-63), and the LSB of the resolved address is dropped so that the access is aligned to 2 bytes.
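The resolution rule above can be modelled in a few lines of Python. This is a minimal sketch, not the hardware's definition: the function name `effective_address` is hypothetical, `imm` is taken to be the raw 5-bit encoded field (so a listing such as `mldw x2, sp, 2` would presumably encode `imm = 1`), and wrapping at 16 bits is an assumption based on the 16-bit register width.

```python
def effective_address(base: int, imm: int, word: bool = False) -> int:
    """Resolve an address from a base register value and a 5-bit immediate.

    Assumptions (not stated in the source): `imm` is the encoded field,
    and the sum wraps at 16 bits.
    """
    if not 0 <= imm <= 0b11111:
        raise ValueError("immediate must fit in 5 unsigned bits")
    if word:
        # Word access: scale the immediate by 2, then drop the LSB so the
        # resolved address is aligned to 2 bytes.
        return ((base + (imm << 1)) & 0xFFFF) & ~1
    # Byte access: the immediate is a plain byte offset.
    return (base + imm) & 0xFFFF
```

For example, `effective_address(0xEA00, 31)` resolves to `0xEA1F`, and a word access from an odd base such as `effective_address(0x1001, 31, word=True)` resolves to the aligned address `0x103E`.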

Memory from a pointer (e.g. allocated by the kernel) can be used like so:

                  mld t1, x1, 0                 ; loads from ptr x1 into t1

Memory can be addressed by the stack pointer, like so:

                  mldw x1, sp, 0                ; loads the word at sp+0 into x1
                  mldw x2, sp, 2                ; loads the word at sp+2 into x2

To access memory at an address known at assemble time, the address must first be loaded into a register. If the address lies within the first 32 bytes (byte-sized accesses) or the first 64 bytes (word-sized accesses) of a page, a single lui instruction is enough and the immediate supplies the rest. Otherwise, an additional ioriu instruction is needed to fill in the lower byte of the address:

                  lui t1, 0xEA lsh 8            ; load upper byte
                  mld x1, t1, 0                 ; loads the byte at 0xEA00 into x1

                  lui t1, 0xEA lsh 8            ; load upper byte
                  mld x1, t1, 31                ; loads the byte at 0xEA1F into x1

                  lui t1, 0xBE lsh 8            ; load upper byte
                  ioriu t1, 0xEF                ; ORs in the lower byte
                  mld x1, t1, 0                 ; loads the byte at 0xBEEF into x1
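The choice between one and two setup instructions can be sketched as a small Python helper that emits the listing for a given address. The helper name `load_sequence` is hypothetical, the registers `t1` and `x1` are simply reused from the examples, and the word-sized operand is written as a byte offset on the assumption that the assembler accepts it that way (as `mldw x2, sp, 2` suggests).

```python
def load_sequence(addr: int, word: bool = False) -> list[str]:
    """Return assembly lines that load from an assemble-time known address."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    upper, lower = addr >> 8, addr & 0xFF
    reach = 64 if word else 32          # bytes reachable through the immediate
    load = "mldw" if word else "mld"
    seq = [f"lui t1, 0x{upper:02X} lsh 8"]
    if lower < reach:
        # The immediate alone covers the lower byte; word offsets are even.
        offset = (lower & ~1) if word else lower
    else:
        # Out of reach: OR the lower byte into the address register first.
        seq.append(f"ioriu t1, 0x{lower:02X}")
        offset = 0
    seq.append(f"{load} x1, t1, {offset}")
    return seq
```

Calling `load_sequence(0xEA1F)` yields the two-instruction form, while `load_sequence(0xBEEF)` yields the three-instruction form with ioriu.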

Addresses relative to the current instruction pointer can be calculated by reading the ip CSR:

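                  csrr t1, 0x02                 ; read the ip CSR into t1
                  addi t1, $ - .label - 2       ; apply the assemble-time offset to .label
                  mld x1, t1, 0                 ; load from the computed address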

2 Memory operations

Prefetching from data memory is done by loading into zr, which discards the result (pseudo-instruction prfd).

Resetting memory (setting a byte or word to 0) is done by storing zr to the destination (pseudo-instructions mclr and mclrw for clearing a byte and word respectively).

3 Atomics

Atomic operations are performed with the xch instruction, which atomically swaps a memory location with a register.

To implement a mutex, a hart swaps 1 into the lock location. If the value it gets back is 0, the lock was free and the hart now holds it; otherwise the lock was already taken and the hart tries again:

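                  lui x1, .mutex'u              ; upper byte of the mutex address
                  lli x2, 1                     ; value to swap in: 1 = locked
.try_lock:        xch x2, x1, .mutex'l          ; address layout 0bXXXXXXXX.000XXXXX: lower byte must fit the immediate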
                  brh nz, .try_lock             ; was already taken? try again

                  ; ... lock taken

                  xch zr, x1, .mutex'l          ; release lock
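The same swap-based protocol can be modelled in Python to show why the counter below stays consistent. This is only an illustrative sketch: the `Memory` class, the `MUTEX` address, and the use of threads as stand-ins for harts are all assumptions, and a `threading.Lock` stands in for the atomicity that xch provides in hardware.

```python
import threading
import time

MUTEX = 0x4000  # hypothetical address; its lower byte fits the 5-bit immediate


class Memory:
    """Byte memory with an atomic exchange, modelling xch."""

    def __init__(self):
        self._cells = {}
        self._bus = threading.Lock()  # stands in for hardware atomicity

    def xch(self, addr: int, value: int) -> int:
        """Atomically store `value` at `addr` and return the old byte."""
        with self._bus:
            old = self._cells.get(addr, 0)
            self._cells[addr] = value & 0xFF
            return old


def hart(mem: Memory, counter: list, rounds: int) -> None:
    for _ in range(rounds):
        while mem.xch(MUTEX, 1) != 0:  # was already taken? try again
            time.sleep(0)              # yield to other threads while spinning
        counter[0] += 1                # critical section: lock taken
        mem.xch(MUTEX, 0)              # release lock


def run(harts: int = 4, rounds: int = 200) -> int:
    mem, counter = Memory(), [0]
    threads = [threading.Thread(target=hart, args=(mem, counter, rounds))
               for _ in range(harts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Because every increment happens while the lock is held, `run(4, 200)` returns 800.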

4 Async memory

Memory accesses are asynchronous. When a load targets a destination register, that GPR's async resolved bit is reset. Reading the register before the load has resolved stalls the CPU until it does. Memory writes are queued in the same way, but can only be awaited using the fence pseudo-instruction. Atomic instructions (the xch family) are also asynchronous, but remain atomic across harts under symmetric multiprocessing.
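The stall behaviour can be pictured with a toy cycle-counting model in Python. Everything here is an assumption made for illustration: the class name `AsyncCore`, the fixed latency of 3 cycles, one cycle per issued instruction, and the simplification that a stored value becomes visible to the model immediately. The source does not specify real latencies or queue depths.

```python
class AsyncCore:
    """Toy model: loads resolve later, reads stall, fence awaits stores."""

    def __init__(self, latency: int = 3):
        self.latency = latency       # assumed fixed memory latency in cycles
        self.cycle = 0
        self.stalled = 0             # total cycles spent stalling
        self.memory = {}
        self.regs = {}
        self.pending_loads = {}      # register -> (ready cycle, value)
        self.pending_stores = []     # completion cycles of queued stores

    def _wait_until(self, ready: int) -> None:
        if ready > self.cycle:
            self.stalled += ready - self.cycle
            self.cycle = ready

    def load(self, rd: str, addr: int) -> None:
        """Issue a load; rd is unresolved until the ready cycle."""
        self.cycle += 1
        value = self.memory.get(addr, 0)
        self.pending_loads[rd] = (self.cycle + self.latency, value)

    def read(self, rs: str) -> int:
        """Read a register, stalling if its load has not resolved yet."""
        if rs in self.pending_loads:
            ready, value = self.pending_loads.pop(rs)
            self._wait_until(ready)
            self.regs[rs] = value
        return self.regs.get(rs, 0)

    def store(self, addr: int, value: int) -> None:
        """Queue a store; nothing waits for it until fence()."""
        self.cycle += 1
        self.memory[addr] = value
        self.pending_stores.append(self.cycle + self.latency)

    def fence(self) -> None:
        """Stall until every queued store has completed."""
        if self.pending_stores:
            self._wait_until(max(self.pending_stores))
            self.pending_stores.clear()
```

In this model, issuing a load and reading its register immediately costs the full latency in stall cycles, whereas doing unrelated work between the two hides some or all of it.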