Assembly Language
=================

RISC-V assembly programs are written as plain text files.
The file extension is mostly arbitrary, but :code:`.asm` and :code:`.S` are quite common.
An assembly program is a linear sequence of "items".
Items can be many things: labels, instructions, literal bytes and strings, etc.

The Bronzebeard assembly syntax also supports basic comments.
Single-line comments can be intermixed with the source code by using the :code:`#` character.
Multi-line comments are not supported at this time.
However, you can always emulate them by stacking multiple single-line comments into a larger block.

Instructions
------------

Instructions tell the CPU to do something with a given set of registers and/or immediate values.
Registers are named 32-bit "slots" that the CPU can use to store information at runtime.
Immediate values are typically integers whose allowed size depends on the specific instruction at hand.

An instruction is written as a name followed by its arguments.
The arguments can be separated by commas for readability, but this isn't required (commas are treated as whitespace).
Here is an example of using the :code:`addi` instruction to load the value :code:`12` into register :code:`x1`::

    # x1 = 0 + 12
    addi x1, zero, 12

Registers
---------

The RISC-V ISA specifies 32 general purpose registers.
Each register is capable of holding a single 32-bit value (or 64 bits on a 64-bit system).
Register 0 is the only special case: it always holds the value zero no matter what gets written to it.

There is also the "program counter", a register that holds the location of the program's current execution.
This :code:`pc` register can't be accessed directly but is utilized by certain instructions.

A given register can be referenced in multiple ways: by number, by name, or by its alias.
The alias and suggested usage of each register can be ignored when writing simple assembly programs.
They are given more meaning when dealing with more complex `ABIs `_ and `calling conventions `_.

============== ============== ============= ==================================
Number         Name           Alias         Suggested Usage
============== ============== ============= ==================================
:code:`0`      :code:`x0`     :code:`zero`  Hard-wired zero
:code:`1`      :code:`x1`     :code:`ra`    Return address
:code:`2`      :code:`x2`     :code:`sp`    Stack pointer
:code:`3`      :code:`x3`     :code:`gp`    Global pointer
:code:`4`      :code:`x4`     :code:`tp`    Thread pointer
:code:`5-7`    :code:`x5-7`   :code:`t0-2`  Temporary registers
:code:`8`      :code:`x8`     :code:`s0/fp` Saved register / frame pointer
:code:`9`      :code:`x9`     :code:`s1`    Saved register
:code:`10-11`  :code:`x10-11` :code:`a0-1`  Function arguments / return values
:code:`12-17`  :code:`x12-17` :code:`a2-7`  Function arguments
:code:`18-27`  :code:`x18-27` :code:`s2-11` Saved registers
:code:`28-31`  :code:`x28-31` :code:`t3-6`  Temporary registers
============== ============== ============= ==================================

Constants
---------

A constant in Bronzebeard is the named result of an integer expression.
Floating point numbers and other non-integer expression results aren't supported at this time.
Numbers can be written in decimal, binary, or hex.
Character literals can also be used if surrounded by single quotes.
Simple math operations such as addition, multiplication, shifting, and bitwise operations all work as one would expect.
The actual precedence rules and evaluation of arithmetic expressions are handled by the Python language itself (via the `eval `_ builtin).

Here are a few examples::

    RCU_BASE_ADDR = 0x40021000
    RCU_APB2EN_OFFSET = 0x18

    GPIO_BASE_ADDR_C = 0x40011000
    GPIO_CTL1_OFFSET = 0x04
    GPIO_MODE_OUT_50MHZ = 0b11
    GPIO_CTL_OUT_PUSH_PULL = 0b00

    FOO = 42
    BAR = FOO * 2
    BAZ = (BAR >> 1) & 0b11111

    QMARK = '?'
    SPACE = ' '

Labels
------

Labels are single-token items that end with a colon, such as :code:`foo:` or :code:`bar:`.
They mark a location in the assembly program with a human-readable name.
Labels have two primary use cases: serving as targets for jump / branch offsets and marking the position of data.

Here is an example that uses a label to create an infinite loop::

    loop:
        j loop

Notice how the label ends with a colon when it is defined but not when it is referenced.
This is necessary to distinguish label definitions from other keywords.

Include
-------

The :code:`include` keyword can be used to include other assembly source files into the current program.
At the moment, files are searched relative to the file containing the :code:`include` keyword.
Additional include directories can be specified on the command line via the :code:`-i` (or :code:`--include`) flag.

Here is a basic example::

    include gd32vf103.asm

You can find another example of this in the `Longan Nano LED example `_.

Include Bytes
-------------

Similar to :code:`include`, :code:`include_bytes` can be used to embed binary files into the output binary.
Regardless of the file's type, it will simply be baked into the binary as raw bytes.

Here are some examples::

    include_bytes cat.jpg
    include_bytes prelude.forth
    include_bytes my_random_file.dat

String Literals
---------------

String literals allow you to embed UTF-8 strings into your binary.
They start with the :code:`string` keyword (then a single space) and are followed by any number of characters (until the end of the line).
This item is lexed in a special way such that the literal string content remains unchanged.
This means that spaces, newlines, quotes, and comments are all preserved within the literal string value.
The regex used for lexing these items is roughly :code:`string (.*)`::

    # note that any comments after these lines would be included in the string
    string hello
    string "world"
    string "hello world"
    string hello ## world
    string hello\nworld
    string hello\\nworld

Numeric Sequence Literals
-------------------------

Numeric sequence literals allow you to embed homogeneous sequences of numbers into your binary.
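
In a numeric sequence, the keyword fixes a single width that every number in the sequence is encoded with.
The Python sketch below models that idea using the widths from the keyword table that follows.
It is a hypothetical illustration, not Bronzebeard's implementation: the :code:`encode_sequence` helper is made up for this example, and it assumes little-endian output (the default RISC-V byte order), two's complement encoding for negative values, and that out-of-range values are rejected.

```python
# Hypothetical model of numeric sequence encoding (not Bronzebeard's code).
# Widths mirror the keyword table: bytes=1, shorts=2, ints=4, longs=4, longlongs=8.
WIDTHS = {"bytes": 1, "shorts": 2, "ints": 4, "longs": 4, "longlongs": 8}


def encode_sequence(keyword, numbers):
    """Encode every number with the same width, little-endian (assumed)."""
    width = WIDTHS[keyword]
    bits = 8 * width
    out = bytearray()
    for n in numbers:
        # Assumption: accept values that fit as either signed or unsigned.
        if not -(1 << (bits - 1)) <= n < (1 << bits):
            raise ValueError(f"{n} does not fit in {width} byte(s)")
        # Masking yields the two's complement bit pattern for negative values.
        out += (n & ((1 << bits) - 1)).to_bytes(width, "little")
    return bytes(out)


print(encode_sequence("bytes", [-1, 0xFF]).hex())        # ffff
print(encode_sequence("shorts", [0x1234, 0x5678]).hex())  # 34127856
print(encode_sequence("ints", [1, 2]) == encode_sequence("longs", [1, 2]))  # True
```

Masking with :code:`(1 << bits) - 1` is what makes :code:`-1` and :code:`0xff` produce the same byte in a :code:`bytes` sequence, which matches the comment in the examples of this section.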

Integer Sequences
^^^^^^^^^^^^^^^^^

Integers can be positive or negative and expressed in decimal, binary, or hex.

================= ================
Keyword           Bytes per Number
================= ================
:code:`bytes`     1
:code:`shorts`    2
:code:`ints`      4
:code:`longs`     4
:code:`longlongs` 8
================= ================

Examples
^^^^^^^^

Here are a few examples of the various numeric sequences::

    bytes 1 2 0x03 0b100 5 0x06 0b111 8
    bytes -1 0xff  # same value once encoded as 2's comp integers
    shorts 0x1234 0x5678
    ints 1 2 3 4
    longs 1 2 3 4  # same as above (both 4 bytes each)

Packed Values
-------------

Packed values allow you to embed packed numeric literals, expressions, or references into your binary.
They start with the :code:`pack` keyword and are followed by a format specifier and a value.
The format specifier is a subset of the format outlined in Python's builtin `struct module `_.
The pack format is composed of two characters: the first specifies endianness and the second details the numeric size and type:

========= ================== =====
Character Meaning            Bytes
========= ================== =====
:code:`<` Little endian      N/A
:code:`>` Big endian         N/A
:code:`b` Signed char        1
:code:`B` Unsigned char      1
:code:`h` Signed short       2
:code:`H` Unsigned short     2
:code:`i` Signed int         4
:code:`I` Unsigned int       4
:code:`l` Signed long        4
:code:`L` Unsigned long      4
:code:`q` Signed long long   8
:code:`Q` Unsigned long long 8
========= ================== =====

Here are a few examples::

    pack