Assembly Language

RISC-V assembly programs are written as plain text files. The file extension is mostly arbitrary but .asm and .S are quite common. An assembly program is a linear sequence of “items”. Items can be many things: labels, instructions, literal bytes and strings, etc.

The Bronzebeard assembly syntax also supports basic comments. Single-line comments can be intermixed with the source code by using the # character. Multi-line comments are not supported at this point in time. However, you can always emulate multi-line comments by using multiple single-line comments to construct larger blocks.

Instructions

Instructions instruct the CPU to do something with a given set of registers and/or immediate values. Registers are named 32-bit “slots” that the CPU can use to store information at runtime. Immediate values are typically integers of varying sizes (depending on the specific instruction at hand).

An instruction is written as a name followed by its arguments. The arguments can be separated by a comma for readability but this isn’t a requirement (commas are treated as whitespace). Here is an example of using the addi instruction to the value 12 into register x1:

# x1 = 0 + 12
addi x1, zero, 12

Registers

The RISC-V ISA specifies 32 general purpose registers. Each register is cable of a holding a single 32-bit value (or 64 bits on a 64-bit system). Register 0 is the only special case: it always holds the value zero no matter what gets written to it. There also exists the “program counter” which is a register that holds the location of the current program’s execution. This pc register can’t be accessed directly but is utilized by certain instructions.

A given register can be referenced in multiple ways: by number, by name, or by its alias. The alias and suggested usage of each register can be ignored when writing simple assembly programs. They are given more meaning when dealing with more complex ABIs and calling conventions.

Number

Name

Alias

Suggested Usage

0

x0

zero

Hard-wired zero

1

x1

ra

Return address

2

x2

sp

Stack pointer

3

x3

gp

Global pointer

4

x4

tp

Thread pointer

5-7

x5-7

t0-2

Temporary registers

8

x8

s0/fp

Saved register / frame pointer

9

x9

s1

Saved register

10-11

x10-11

a0-1

Function arguments / return values

12-17

x12-17

a2-7

Function arguments

18-27

x18-27

s2-11

Saved registers

28-31

x28-31

t3-6

Temporary registers

Constants

A constant in Bronzebeard is the named result of an integer expression. Floating point numbers or any other non-integer expression results aren’t supported at this time. Numbers can be represented as decimal, binary, or hex. Character literals can also be used if surrounded by single-quotes. Simple math operations such as addition, multiplication, shifting, and binary ops all work as one would expect. The actual precedence rules and evaluation of arithmetic expressions is handled by the Python lauguage itself (via the eval builtin).

Here are a few examples:

RCU_BASE_ADDR = 0x40021000
RCU_APB2EN_OFFSET = 0x18

GPIO_BASE_ADDR_C = 0x40011000
GPIO_CTL1_OFFSET = 0x04
GPIO_MODE_OUT_50MHZ = 0b11
GPIO_CTL_OUT_PUSH_PULL = 0b00

FOO = 42
BAR = FOO * 2
BAZ = (BAR >> 1) & 0b11111

QMARK = '?'
SPACE = ' '

Labels

Labels are single-token items that end with a colon such as foo: or bar:. They effectively mark a location in the assembly program with a human-readable name. Labels have two primary use cases: being targets for jump / branch offsets and marking the position of data.

Here is an example that utilizes a label in order to create an infinite loop:

loop:
    j loop

Notice how the label ends with a colon when it is defined but not when it is referenced. This is necessary to distinguish label definitions from other keywords.

Include

The include keyword can be used to include other assembly source files into the current program. At the moment, files are searched relative to the file containing the include keyword. Additional include directories can be specified on the command via the -i (or --include) flag.

Here is a basic example:

include gd32vf103.asm

You can find another example of this in the Longan Nano LED example.

Include Bytes

Similar to include, include_bytes can be used to embed binary files into the output binary. Regardless of the file’s type, it will be simply be baked into the binary as raw bytes.

Here are some example:

include_bytes cat.jpg
include_bytes prelude.forth
include_bytes my_random_file.dat

String Literals

String literals allow you to embed UTF-8 strings into your binary. They start with the string keyword (then a single space) and are followed by any number of characters (til end of line). This item is lexed in a special way such that the literal string content remains unchanged. This means that spaces, newlines, quotes, and comments are all preserved within the literal string value.

The regex used for lexing these items is roughly: string (.*):

# note that any comments after these lines would be included in the string
string hello
string "world"
string "hello world"
string hello  ##  world
string hello\nworld
string   hello\\nworld

Numeric Sequence Literals

Numeric sequence literals allow you to embed homogeneous sequences of numbers into your binary.

Integer Sequences

Integers can be positive or negative and expressed in decimal, binary, or hex.

Keyword

Bytes per Number

bytes

1

shorts

2

ints

4

longs

4

longlongs

8

Examples

Here are a few examples of the various numeric sequences:

bytes 1 2 0x03 0b100 5 0x06 0b111 8
bytes -1 0xff  # same value once encoded as 2's comp integers
shorts 0x1234 0x5678
ints  1 2 3 4
longs 1 2 3 4  # same as above (both 4 bytes each)

Packed Values

Packed values allow you embed packed numeric literals, expressions, or references into your binary. They start with the pack keyword and are followed by a format specifier and a value. The format specifier is a subset of the format outlined in Python’s builtin struct module.

The pack format is composed of two characters: the first specifies endianness and the second details the numeric size and type:

Character

Meaning

Bytes

<

Little endian

N/A

>

Big endian

N/A

b

Signed char

1

B

Unsigned char

1

h

Signed short

2

H

Unsigned short

2

i

Signed int

4

I

Unsigned int

4

l

Signed long

4

L

Unsigned long

4

q

Signed long long

8

Q

Unsigned long long

8

Here are a few examples:

pack <B, 0
pack <B, 255
pack <h, -1234
pack <I ADDR
pack <I %position(foo, ADDR)

Shorthand Syntax

In addition to the above pack keyword, a small set of shorthand keywords (loosely based on NASM syntax) are available for embedding integers of specific widths. The specific endianness and signedness will be inferred by the assembler’s configuration and resolved integer value, respectively. Internally, these are implemented as AST transformations to the more general pack syntax.

Keyword

Bytes

db

1

dh

2

dw

4

dd

8

Here are some examples:

# 1-byte integers
db -1  # 2's complement will end up as 0xff
db 0xff
db 0x20

# 2-byte integers
dh 0x2000

# 4-byte integers
dw 0x20000000
dw some_label
dw RAM_ADDR

# 8-byte integers
dd 0x2000000000000000

Alignment

The align keyword tells the assembler to enforce alignment to a certain byte boundary. This alignment is achieved by padding the binary with 0x00 bytes until it aligns with the boundary. In pseudo-code, the assembler adds zeroes until: len(binary) % alignment == 0:

# align the current location in the binary to 4 bytes
align 4

Alignment is important when mixing instructions and data into the same binary (which happens quite often). According to the RISC-V spec, instructions MUST be aligned to a 32-bit (4 byte) boundary unless the CPU supports the “C” Standard Extension for Compressed Instructions (in which case the alignment requirement is relaxed to a 16-bit (2 byte) boundary).

For example, the following code may be invalid because the instruction is not on a 32-bit boundary:

bytes 0x42      # occupies 1 byte
addi x0, x0, 0  # misaligned :(

To fix this, we need to tell the assembler to ensure that the binary is aligned to 32 bits (4 bytes) before proceeding:

bytes 0x42      # occupies 1 byte
align 4         # will pad the binary with 3 0x00 bytes
addi x0, x0, 0  # aligned :)

Modifiers

In addition to basic arithmetic operations, Bronzebeard assembly supports a small number of “modifiers”. Note that the %offset and %position modifiers are NOT permitted within the value of a constant.

You can think of these like simple, builtin functions:

  • %hi(value) - Calculate the sign-adjusted top 20 bits of a value

  • %lo(value) - Calculate the sign-adjusted bottom 12 bits of a value

  • %offset(label) - Calculate the offset of a label relative to the current item’s address

  • %position(label, addr) - Calculate the position of a label based on a given absolute address

Error

The keyword error can be used to abort the assembler in a human-understandable fashion. For example, if a given board doesn’t support visual displays but the program requires it, an error directive can be used to explicity inform the user of the problem as soon as possible. This is much better user-experience than waiting for the assembler to drop a cryptic “symbol not found” type of error sometime later.

This keyword captures a simple message based on the regex: error (.*):

error This device doesn't support displays