Assembly
QCPU language reference
QCPU CLI is a tool which includes an assembler, linker and emulator for the QCPU 2 specification.
Examples are often the most efficient way to learn something new, so this documentation demonstrates each of QCPU CLI’s language features with an example. QCPU CLI is written in the Zig programming language and can be compiled without any other dependencies. To get started, run --help to list the available options:
… help message
1 Root section and linking
root.s:
@section root
@ialign 2
_: bkpt ; entrypoint
@linkinfo(origin) root, 0x0800
$ qcpu --dlinker root.s
... (binary dump)
$ qcpu --virtualise --step root.s
The QCPU assembler requires a root section (--rootsection root) in one of the input files. The root section is the base of the tree from which input symbols are lazily evaluated, and sections that are not referenced from it are removed by dead tree elimination. To preserve data order, the root section can be defined only once, so that the section order is guaranteed.
Linking order and placement are controlled with the @linkinfo builtin. As with the root section, only one file may contain @linkinfo builtins, which preserves the linking order of the output binary. Each section placement creates and exposes a symbol that can be referenced at link time to obtain the start address of that section (zero if the section was eliminated).
The listing above is the general skeleton of any program processed by the QCPU assembler. It defines the root section as the entrypoint, starting at 0x0800 (the instruction entrypoint in physical memory). To keep the section alive, there must be a label at its start (here _, a name that is discarded).
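The liveness rule and the link-time symbols described above can be pictured as a reachability search starting from the root section. The Python sketch below is a hypothetical model, not the assembler's implementation: the section names, the reference graph and the placement addresses are invented for illustration, and placements are given explicitly instead of being derived from @linkinfo.

```python
from collections import deque


def live_sections(refs: dict[str, set[str]], root: str) -> set[str]:
    """Return every section reachable from the root (breadth-first search)."""
    live = {root}
    queue = deque([root])
    while queue:
        section = queue.popleft()
        for target in refs.get(section, set()):
            if target not in live:
                live.add(target)
                queue.append(target)
    return live


def link_symbols(placements: dict[str, int], live: set[str]) -> dict[str, int]:
    """Map each placed section to its start address, or 0 if it was eliminated."""
    return {name: (addr if name in live else 0) for name, addr in placements.items()}


# Hypothetical program: root references data; nothing references unused.
refs = {"root": {"data"}, "data": set(), "unused": {"data"}}
placements = {"root": 0x0800, "data": 0x0900, "unused": 0x0A00}

live = live_sections(refs, "root")
symbols = link_symbols(placements, live)
print(sorted(live))  # ['data', 'root']
print({k: hex(v) for k, v in symbols.items()})  # {'root': '0x800', 'data': '0x900', 'unused': '0x0'}
```

Note that unused is eliminated even though it references data: liveness flows only outward from the root.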
2 Instructions
As in most assembly languages, an instruction line generally looks like this:
label: instr op, op, op ; comments col 80
Conventionally, a space width of 6 is used.
Two comment styles are supported, prefixed with ; and //. QCPU assembly does not support multiline comments (such as /* */ in C), which allows the assembler to tokenise any line without surrounding context.
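Because no comment spans more than one line, a line can be split into code and comment without looking at its neighbours. A minimal Python sketch of that idea, assuming that ; and // inside a double-quoted string (such as an ascii operand) do not start a comment and that strings contain no escaped quotes; the real tokeniser may treat these cases differently.

```python
def strip_comment(line: str) -> str:
    """Return the code part of one source line, without a ; or // comment.

    Assumption: comment markers inside a double-quoted string are ignored,
    and strings contain no escaped quotes.
    """
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif not in_string and (ch == ";" or line.startswith("//", i)):
            return line[:i].rstrip()
    return line.rstrip()


print(strip_comment("mldw x1, sp, 4 ; load word from sp + 4"))  # mldw x1, sp, 4
print(strip_comment('str: ascii "a;b" // trailing comment'))     # str: ascii "a;b"
```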
Instructions are taken from the instruction reference and operands are separated by commas. Unlike certain pseudo-instructions, instructions always generate 2 bytes.
mldw x1, sp, 4 ; load word from sp + 4
2.1 Pseudo-instructions
Pseudo-instructions are written like normal instructions, but the assembler interprets them to produce special output. There are two kinds of pseudo-instructions: special encodings and instruction aliases.
2.2 Pseudo-instruction encoding
The following encoding pseudo-instructions emit their operand directly as binary data of a fixed width:
u8, i8: a byte (unsigned or signed, respectively)
u16, i16: a word (unsigned or signed, respectively)
u24, i24: a 24 bit word (unsigned or signed, respectively)
u32, i32: a 32 bit word (unsigned or signed, respectively)
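Each encoding has a fixed width, which bounds the values it can hold. The Python sketch below derives the byte count from the type name and checks a value against the unsigned range or, by assumption, the two's-complement signed range. Byte order is deliberately left out because it is not specified here, and rejecting out-of-range values (rather than truncating them, or accepting 0xFF for an i8) is also an assumption.

```python
SUPPORTED_BITS = (8, 16, 24, 32)


def encoding_range(kind: str) -> tuple[int, int, int]:
    """Return (width in bytes, minimum, maximum) for u8/i8 up to u32/i32."""
    sign, bits = kind[:1], kind[1:]
    if sign not in ("u", "i") or not bits.isdigit() or int(bits) not in SUPPORTED_BITS:
        raise ValueError(f"unknown encoding {kind!r}")
    n = int(bits)
    if sign == "i":  # assumed two's-complement signed range
        return n // 8, -(1 << (n - 1)), (1 << (n - 1)) - 1
    return n // 8, 0, (1 << n) - 1


def check_value(kind: str, value: int) -> int:
    """Return how many bytes `kind` emits, raising if `value` does not fit."""
    width, lo, hi = encoding_range(kind)
    if not lo <= value <= hi:
        raise ValueError(f"{value} is out of range for {kind} ({lo}..{hi})")
    return width


print(encoding_range("i24"))       # (3, -8388608, 8388607)
print(check_value("u16", 0xEAEA))  # 2
```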
var: u16 0xEAEA ; memory location containing a two byte variable
ptr: u16 .var ; memory location containing a two byte pointer to the variable
A couple of pseudo-instructions handle variable-sized encoding of data:
str: ascii "Hello world!"
sentinel_str: ascii "This is a C-like string" 0
reserve u8, 16 ; 1 * 16 null bytes
reserve u16, 16 ; 2 * 16 null bytes
2.3 Pseudo-instruction aliases
A couple of pseudo-instructions are aliases to known instructions:
mov t1, x1 ; alias for 'add t1, x1, 0'
cut8 t1 ; alias for 'andiu t1, 0xFF'
ret ; alias for 'jmpd rp'
However, certain aliases (or, more precisely, instruction encodings) have special behaviours:
sysc 1 ; alias/special behaviour for 'jmpd 0, 1'
3 Address semantics
3.1 Masks
Instructions can use label addresses, which are resolved at link time. However, immediates are narrower than the available 16 bit address range, so each instruction masks off a subset of the address:
lui x1, .label'u ; masks 0b11111111.00000000
ioriu x1, .label'l ; masks 0b00000000.11111111
mld x2, x1, .label'l ; masks 0b00000000.00011111
mldw x3, x1, .label'l ; masks 0b00000000.00111110
To prevent silent bit loss, the assembler reports an error if an instruction would mask off bits of a given immediate value. For example, passing lli the immediate 0x0100 (bit 8, the lowest bit of the upper byte) results in such an error. To silence intended masking, reference modifiers can be used to mask a label reference explicitly:
.label'u - masks 0b11111111.00000000
.label'l - masks 0b00000000.11111111
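The error rule amounts to a bit test: any set bit of the immediate that falls outside the instruction's mask would be lost. The Python sketch below copies the masks from the listings above. The mask for lli is not listed and is assumed to be the lower byte, the label address is hypothetical, and the modifiers are modelled as plain ANDs without any shifting, which may differ from how the assembler actually encodes 'u.

```python
# Masks transcribed from the listing; "lli" is an assumption (lower byte).
INSTRUCTION_MASKS = {
    "lui": 0b11111111_00000000,
    "ioriu": 0b00000000_11111111,
    "lli": 0b00000000_11111111,  # assumed, not listed in the reference
    "mld": 0b00000000_00011111,
    "mldw": 0b00000000_00111110,
}

# Reference modifiers .label'u and .label'l
MODIFIER_MASKS = {"u": 0b11111111_00000000, "l": 0b00000000_11111111}


def modify(address: int, modifier: str) -> int:
    """Apply a reference modifier, deliberately discarding the other bits."""
    return address & MODIFIER_MASKS[modifier]


def encode_immediate(instr: str, value: int) -> int:
    """Return value unchanged if it fits instr's mask, else raise ValueError."""
    lost = value & ~INSTRUCTION_MASKS[instr]
    if lost:
        raise ValueError(f"{instr}: 0x{value:04X} would lose bits 0x{lost:04X}")
    return value


label = 0x1234  # hypothetical link-time address of .label
print(hex(encode_immediate("lui", modify(label, "u"))))    # 0x1200
print(hex(encode_immediate("ioriu", modify(label, "l"))))  # 0x34
try:
    encode_immediate("lli", 0x0100)
except ValueError as err:
    print(err)  # lli: 0x0100 would lose bits 0x0100
```

The mldw mask also explains why word loads reject odd offsets: bit 0 lies outside 0b00111110, so encode_immediate("mldw", 0x0005) raises.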
3.2 Offset expressions
References can be offset by expressions:
addi x1, .label'l + 4
lli x2, .label'l - (@len lsh 2)
3.3 Differences
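The difference between two label addresses can be computed if both labels belong to the same defined section, because their distance does not depend on where the linker places that section: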
@section data
.str: ascii "Hello world!"
.len: u8 .len - .str
4 Builtins
Documentation for a builtin can be retrieved with --docs followed by the builtin's name, for example --docs align.
Builtins are provided by the assembler and are prefixed with @.
The following documentation is in alphabetical order, but it is best to understand @section and @linkinfo first, as those builtins form the foundation of the assembly language. The examples below omit the root section and @linkinfo boilerplate.
4.1 @align
4.2 @alignop
4.3 @barrier
4.4 @buildinfo
4.5 @define
TODO: define with argument