Introduction

This is the book for learning how to write GameBoy Advance (GBA) games in Rust.

I'm Lokathor, the main author of the book. There's also Ketsuban who provides the technical advisement, reviews the PRs, and keeps my crazy in check.

The book is a work in progress, as you can see if you actually try to open many of the pages listed in the Table Of Contents.

Feedback

It's very often hard to tell when you've explained something properly. In the same way that your brain will read over small misspellings and correct things into the right word, if an explanation for something you already understand accidentally skips over some small detail then your brain can fill in the gaps without you realizing it.

Please, if things don't make sense then file an issue about it so I know where things need to improve.

Reader Requirements

This book naturally assumes that you've already read Rust's core book, The Rust Programming Language.

Now, I know it sounds silly to say "if you wanna program Rust on this old video game system you should already know how to program Rust", but the more people I meet and chat with the more they tell me that they jumped into Rust without reading any or all of the book. You know who you are.

Please, read the whole book!

In addition to the core book, there's also an expansion book that I will declare to be required reading for this: The Rustonomicon.

The Rustonomicon is all about trying to demystify unsafe. We'll end up using a fair bit of unsafe code as a natural consequence of doing direct hardware manipulations. Using unsafe is like swinging a sword: you should start slowly, practice carefully, and always pay attention no matter how experienced you think you've become.

That said, it's sometimes a necessary tool to get the job done, so you have to break out of the borderline pathological fear of using it that most Rust programmers tend to have.

Book Goals and Style

So, what's this book actually gonna teach you?

My goal is certainly not just showing off the crate. Programming for the GBA is weird enough that I'm trying to teach you all the rest of the stuff you need to know along the way. If I do my job right then you'd be able to write your own crate for GBA stuff just how you think it should all go by the end.

Overall the book is sorted more for easy review once you're trying to program something. The GBA has a few things that can stand on their own and many other things are a mass of interconnected concepts, so some parts of the book end up having to refer you to portions that you haven't read yet. The chapters and sections are sorted so that minimal future references are required, but it's unavoidable that it'll happen sometimes.

The actual "tutorial order" of the book is the Examples chapter. Each section of that chapter breaks down one of the provided examples in the examples directory of the repository. We go over what sections of the book you'll need to have read for the example code to make sense, and also how we apply the general concepts described in the book to the specific example cases.

Development Setup

Before you can build a GBA game you'll have to follow some special steps to set up the development environment.

Once again, extra special thanks to Ketsuban, who first dove into how to make this all work with Rust and then shared it with the world.

Per System Setup

Obviously you need your computer to have a working Rust installation. However, you'll also need to ensure that you're using a nightly toolchain (we will need it for inline assembly, among other potentially useful features). You can run rustup default nightly to set nightly as the system wide default toolchain, or you can use a toolchain file to use nightly just on a specific project, but either way we'll be assuming the use of nightly from now on. You'll also need the rust-src component so that cargo-xbuild will be able to compile the core crate for us in a bit, so run rustup component add rust-src.

Next, you need devkitpro. They've got a graphical installer for Windows that runs nicely, and I guess pacman support on Linux (I'm on Windows so I haven't tried the Linux install myself). We'll be using a few of their general binutils for the arm-none-eabi target, and we'll also be using some of their tools that are specific to GBA development, so even if you already have the right binutils for whatever reason, you'll still want devkitpro for the gbafix utility.

  • On Windows you'll want something like C:\devkitpro\devkitARM\bin and C:\devkitpro\tools\bin to be added to your PATH, depending on where you installed it to and such.
  • On Linux you can use pacman to get it, and the default install puts the stuff in /opt/devkitpro/devkitARM/bin and /opt/devkitpro/tools/bin. If you need help you can look in our repository's .travis.yml file to see exactly what our CI does.

Finally, you'll need cargo-xbuild. Just run cargo install cargo-xbuild and cargo will figure it all out for you.

Per Project Setup

Once the system wide tools are ready, you'll need some particular files each time you want to start a new project. You can find them in the root of the rust-console/gba repo.

  • thumbv4-none-agb.json describes the overall GBA to cargo-xbuild (and LLVM) so it knows what to do. Technically the GBA is thumbv4-none-eabi, but we change the eabi to agb so that we can distinguish it from other eabi devices when using cfg flags.
  • crt0.s describes some ASM startup stuff. If you have more ASM to place here later on this is where you can put it. You also need to build it into a crt0.o file before it can actually be used, but we'll cover that below.
  • linker.ld tells the linker all the critical info about the layout expectations that the GBA has about our program, and that it should also include the crt0.o file with our compiled rust code.

Compiling

Once all the tools are in place, there are particular steps that you need to follow to compile the project. For these to work you'll need some source code to compile. Unlike with other things, an empty main file and/or an empty lib file will cause a total build failure, because we'll need a no_std build, and Rust defaults to builds that use the standard library. The next section has a minimal example file you can use (along with explanation), but we'll describe the build steps here.

  • arm-none-eabi-as crt0.s -o target/crt0.o

    • This builds your text format crt0.s file into object format crt0.o that's placed in the target/ directory. Note that if the target/ directory doesn't exist yet it will fail, so you have to make the directory if it's not there. You don't need to rebuild crt0.s every single time, only when it changes, but you might as well throw a line to do it every time into your build script so that you never forget because it's a practically instant operation anyway.
  • cargo xbuild --target thumbv4-none-agb.json

    • This builds your Rust source. It accepts most of the normal options that you'd expect cargo to accept, such as --release, --bin foo, or --examples.
    • You cannot build and run tests this way, because they require std, which the GBA doesn't have. If you want you can still run some of your project's tests with cargo test --lib or similar, but that builds for your local machine, so anything specific to the GBA (such as reading and writing registers) won't be testable that way. If you want to isolate and try out some piece of code running on the GBA you'll unfortunately have to make a demo for it in your examples/ directory and then run the demo in an emulator and see if it does what you expect.
    • The file extension is important! It will work if you forget it, but cargo xbuild takes the inclusion of the extension as a flag to also compile dependencies with the same sysroot, so you can include other crates in your build. Well, crates that work in the GBA's limited environment, but you get the idea.

At this point you have an ELF binary that some emulators can execute directly (more on that later). However, if you want a "real" ROM that works in all emulators and that you could transfer to a flash cart to play on real hardware there's a little more to do.

  • arm-none-eabi-objcopy -O binary target/thumbv4-none-agb/MODE/BIN_NAME target/ROM_NAME.gba

    • This will perform an objcopy on our program. Here I've named the program arm-none-eabi-objcopy, which is what devkitpro calls their version of objcopy that's specific to the GBA in the Windows install. If the program isn't found under that name, have a look in your installation directory to see if it's under a slightly different name or something.
    • As you can see from reading the man page, the -O binary option takes our lovely ELF file with symbols and all that and strips it down to basically a bare memory dump of the program.
    • The next argument is the input file. You might not be familiar with how cargo arranges stuff in the target/ directory, and between RLS and cargo doc and stuff it gets kinda crowded, so it goes like this:
      • Since our program was built for a non-local target, first we've got a directory named for that target, thumbv4-none-agb/
      • Next, the "MODE" is either debug/ or release/, depending on if we had the --release flag included. You'll probably only be packing release mode programs all the way into GBA roms, but it works with either mode.
      • Finally, the name of the program. If your program is something out of the project's src/bin/ then it'll be that file's name, or whatever name you configured for the bin in the Cargo.toml file. If your program is something out of the project's examples/ directory there will be a similar examples/ sub-directory first, and then the example's name.
    • The final argument is the output of the objcopy, which I suggest putting at just the top level of the target/ directory. Really it could go anywhere, but if you're using git then it's likely that your .gitignore file is already set up to exclude everything in target/, so this makes sure that your intermediate game builds don't get checked into your git.
  • gbafix target/ROM_NAME.gba

    • The gbafix tool also comes from devkitpro. The GBA is very picky about a ROM's format, and gbafix patches the ROM's header and such so that it'll work right. Unlike objcopy, this tool is custom built for GBA development, so it works just perfectly without any arguments beyond the file name. The ROM is patched in place, so we don't even need to specify a new destination.

And you're finally done!

Of course, you probably want to make a script for all that, but it's up to you. On our own project we have it mostly set up within a Makefile.toml which runs using the cargo-make plugin.
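If you'd rather see the whole sequence in one place before scripting it, here's a rough sketch in Rust that only assembles the four command lines from the steps above. The function name build_commands and its parameters are made up for this illustration (they aren't part of any crate), and it assumes a program from src/bin/ rather than from examples/.

```rust
/// Turns a list of string slices into owned strings.
fn owned(args: &[&str]) -> Vec<String> {
  args.iter().map(|s| s.to_string()).collect()
}

/// Returns the four command lines from the steps above, in the order you'd run them.
/// `mode` is "debug" or "release"; `bin_name` and `rom_name` are placeholders for
/// your own program and ROM names. Hypothetical helper, for illustration only.
pub fn build_commands(mode: &str, bin_name: &str, rom_name: &str) -> Vec<Vec<String>> {
  // Where cargo puts the ELF, and where we want the finished ROM to go.
  let elf = format!("target/thumbv4-none-agb/{}/{}", mode, bin_name);
  let rom = format!("target/{}.gba", rom_name);

  let mut xbuild = vec!["cargo", "xbuild", "--target", "thumbv4-none-agb.json"];
  if mode == "release" {
    xbuild.push("--release");
  }

  vec![
    owned(&["arm-none-eabi-as", "crt0.s", "-o", "target/crt0.o"]),
    owned(&xbuild),
    owned(&["arm-none-eabi-objcopy", "-O", "binary", elf.as_str(), rom.as_str()]),
    owned(&["gbafix", rom.as_str()]),
  ]
}
```

Each inner list is one command followed by its arguments, so you could hand them to std::process::Command, or just print them out and paste them into whatever script format you prefer.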

Hello, Magic

So we know all the steps to build our source, we just need some source.

We're beginners, so we'll start small. With normal programming there's usually a console available, so the minimal program prints "Hello, world" to the terminal. On a GBA we don't have a terminal and standard out and all that, so the minimal program draws a red, green, and blue dot to the screen.

At the lowest level of device programming, it's all Magic Numbers. You write special values to special places and then the hardware does something. A clear API makes every magic number and magic location easy to understand. A clear and good API also prevents you from using the wrong magic number in the wrong place and causing problems for yourself.

This is the minimal example to just test that our build system is all set, so just this once we'll go full magic number crazy town, for fun. Ready? Here goes:

hello_magic.rs:

#![no_std]
#![feature(start)]

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
  loop {}
}

#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
  unsafe {
    (0x400_0000 as *mut u16).write_volatile(0x0403);
    (0x600_0000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
    (0x600_0000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
    (0x600_0000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
    loop {}
  }
}

Throw that into your project skeleton, build the program, and give it a run. You should see a red, green, and blue dot close-ish to the middle of the screen. If you don't, something already went wrong. Double check things, phone a friend, write your senators, try asking Lokathor or Ketsuban on the Rust Community Discord, until you're eventually able to get your three dots going.

Of course, I'm sure you want to know why those numbers are the numbers to use. Well that's what the whole rest of the book is about!
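As a tiny preview, here's a sketch of how a couple of those numbers could be given names. It assumes the usual GBA conventions: a 240 pixel wide screen, and 16-bit colors with 5 bits per channel and red in the lowest bits. The helper names rgb15 and pixel_offset are invented for this example and aren't from the gba crate.

```rust
/// Width of the GBA screen in pixels (the `240` in the offsets above).
pub const SCREEN_WIDTH: isize = 240;

/// Packs 5-bit red, green, and blue channels into one 16-bit color value.
/// Hypothetical helper for illustration; channel bits above the low 5 are masked off.
pub const fn rgb15(red: u16, green: u16, blue: u16) -> u16 {
  (red & 31) | ((green & 31) << 5) | ((blue & 31) << 10)
}

/// Element offset of the pixel at column `x`, row `y` in a 240-pixel-wide bitmap.
pub const fn pixel_offset(x: isize, y: isize) -> isize {
  x + y * SCREEN_WIDTH
}
```

With those in hand, the first dot from hello_magic would be a write of rgb15(31, 0, 0) at pixel_offset(120, 80), which is the same 0x001F at 120 + 80 * 240 as before, just easier to read.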

Help and Resources

Help

So you're stuck on a problem and the book doesn't say what to do. Where can you find out more?

The first place I would suggest is the Rust Community Discord. If it's a general Rust question then you can ask anyone in any channel you feel is appropriate. If it's GBA specific then you can try asking me (Lokathor) or Ketsuban in the #gamedev channel.

Emulators

You certainly might want to eventually write a game that you can put on a flash cart and play on real hardware, but for most of your development you'll probably want to be using an emulator for testing, because you don't have to fiddle with cables and all that.

In terms of emulators, you want to be using mGBA, and you want to be using the 0.7 Beta 1 or later. This update lets you run raw ELF files, which means that you can have full debug symbols available while you're debugging problems.

Information Resources

First, if I fail to describe something related to Rust, you can always try checking in The Rust Reference to see if they cover it. You can mostly ignore that big scary red banner at the top, things are a lot better documented than they make it sound.

If you need help trying to fiddle your math down as hard as you can, there are resources such as the Bit Twiddling Hacks page.

As to GBA related lore, Ketsuban and I didn't magically learn this all from nowhere, we read various technical manuals and guides ourselves and then distilled those works oriented around C and C++ into a book for Rust.

We have personally used some or all of the following:

  • GBATEK: This is the resource. It covers not only the GBA, but also the DS and DSi, and also a run down of ARM assembly (32-bit and 16-bit opcodes). The link there is to the 2.9b version on problemkaputt.de (the official home of the document), but if you just google for gbatek the top result is for the 2.5 version on akkit.org, so make sure you're looking at the newest version. Sometimes problemkaputt.de is a little sluggish so I've also mirrored the 2.9b version on my own site as well. GBATEK is rather large, over 2 MB of text, so if you're on a phone or similar you might want to save an offline copy to go easy on your data usage.
  • TONC: While GBATEK is basically just a huge tech specification, TONC is an actual guide on how to make sense of the GBA's abilities and organize it into a game. It's written for C of course, but as a Rust programmer you should always be practicing your ability to read C code anyway. It's the programming equivalent of learning Latin because all the old academic books are written in Latin.
  • CowBite: This is more like GBATEK, and it's less complete, but it mixes in a little more friendly explanation of things in between the hardware spec parts.

And although I haven't had time to look at it myself, The Audio Advance seems to be very good. It explains in depth how you can get audio working on the GBA. Note that the table of contents for each page goes along the top instead of down the side.

Non-Rust GBA Community

There's also the GBADev.org site, which has a forum and everything. They're coding in C and C++, but you can probably overcome that difference with a little work on your part.

I also found a place called GBATemp, which seems to have a more active forum but less of a focus on actual coding.

Quirks

The GBA supports a lot of totally normal Rust code exactly like you'd think.

However, it also is missing a lot of what you might expect, and sometimes we have to do things in slightly weird ways.

We start the book by covering the quirks our code will have, just to avoid too many surprises later.

No Std

First up, as you already saw in the hello_magic code, we have to use the #![no_std] outer attribute on our program when we target the GBA. You can find some info about no_std in two official sources: the Unstable Book and the Embedded Book.

The unstable book is borderline useless here because it's describing too many things in too many words. The embedded book is much better, but still fairly terse.

Bare Metal

The GBA falls under what the Embedded Book calls "Bare Metal Environments". Basically, the machine powers on and immediately begins executing some ASM code. Our ASM startup was provided by Ketsuban (check the crt0.s file). We'll go over how it works much later on, for now it's enough to know that it does work, and eventually control passes into Rust code.

On the Rust code side of things, we determine our starting point with the #[start] attribute on our main function. The main function also has a specific type signature that's different from the usual main that you'd see in Rust. I'd tell you to read the unstable-book entry on #[start] but they literally just tell you to look at the tracking issue for it instead, and that's not very helpful either. Basically it just has to be declared the way it is, even though there's nothing passing in the arguments and there's no place that the return value will go. The compiler won't accept it any other way.

No Standard Library

The Embedded Book tells us that we can't use the standard library, but we get access to something called "libcore", which sounds kinda funny. What they're talking about is just the core crate, which is called libcore within the rust repository for historical reasons.

The core crate is actually still a really big portion of Rust. The standard library doesn't actually hold too much code (relatively speaking), instead it just takes code from other crates and then re-exports it in an organized way. So with just core instead of std, what are we missing?

In no particular order:

  • Allocation
  • Clock
  • Network
  • File System

The allocation system and all the types that you can use if you have a global allocator are neatly packaged up in the alloc crate. The rest isn't as nicely organized.

It's possible to implement a fair portion of the entire standard library within a GBA context and make the rest just panic if you try to use it. However, do you really need all that? Eh... probably not?

  • We don't need a file system, because all of our data is just sitting there in the ROM for us to use. When programming we can organize our const data into modules and such to keep it organized, but once the game is compiled it's just one huge flat address space. TODO: Parasyte says that a FS can be handy even if it's all just read-only, so we'll eventually talk about how you might set up such a thing I guess, since we'll already be talking about replacements for the other three of the four things we "lost". Maybe we'll make Parasyte write that section.
  • Networking, well, the GBA has a Link Cable you can use to communicate with another GBA, but it's not really like a unix socket with TCP, so the standard Rust networking isn't a very good match.
  • Clock is actually two different things at once. One is the ability to store the time long term, which is a bit of hardware that some gamepaks have in them (e.g. Pokémon Ruby/Sapphire/Emerald). The GBA itself can't keep time while power is off. However, the second part is just tracking time moment to moment, which the GBA can totally do. We'll see how to access the timers soon enough.

Which just leaves us with allocation. Do we need an allocator? Depends on your game. For demos and small games you probably don't need one. For bigger games you'll maybe want to get an allocator going eventually. It's in some sense a crutch, but it's a very useful one.

So I promise that at some point we'll cover how to get an allocator going. Either a Rust Global Allocator (if practical), which would allow for a lot of the standard library types to be used "for free" once it was set up, or just a custom allocator that's GBA specific if Rust's global allocator style isn't a good fit for the GBA (I honestly haven't looked into it).

Bare Metal Panic

If our code panics, we usually want to see that panic message. Unfortunately, without a way to access something like stdout or stderr we've gotta do something a little weirder.

If our program is running within the mGBA emulator, version 0.7 or later, we can access a special set of addresses that allow us to send out CString values, which then appear within a message log that you can check.

We can capture this behavior by making an MGBADebug type, and then implement core::fmt::Write for that type. Once done, the write! macro will let us target the mGBA debug output channel.

When used, it looks like this:


#[panic_handler]
fn panic(info: &core::panic::PanicInfo) -> ! {
  use core::fmt::Write;
  use gba::mgba::{MGBADebug, MGBADebugLevel};

  if let Some(mut mgba) = MGBADebug::new() {
    let _ = write!(mgba, "{}", info);
    mgba.send(MGBADebugLevel::Fatal);
  }
  loop {}
}

If you want to follow the particulars you can check the MGBADebug source in the gba crate. Basically, there's one address you can use to try and activate the debug output, and if it works you write your message into the "array" at another address, and then finally write a send value to a third address. You'll need to have read the volatile section for the details to make sense.
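To show just the core::fmt::Write half of that idea without any emulator involved, here's a minimal sketch that targets an ordinary in-memory buffer. BufferDebug is a stand-in name I made up for this example; the real MGBADebug writes to the emulator's special addresses with volatile operations instead, and its internals may look quite different.

```rust
use core::fmt::Write;

/// Stand-in for a debug channel: a fixed 256-byte buffer in ordinary memory.
/// (Hypothetical type for illustration; not the gba crate's `MGBADebug`.)
pub struct BufferDebug {
  buf: [u8; 256],
  len: usize,
}

impl BufferDebug {
  pub fn new() -> Self {
    BufferDebug { buf: [0; 256], len: 0 }
  }

  /// The bytes written so far, viewed as text.
  pub fn as_str(&self) -> &str {
    core::str::from_utf8(&self.buf[..self.len]).unwrap_or("")
  }
}

impl Write for BufferDebug {
  fn write_str(&mut self, s: &str) -> core::fmt::Result {
    // Copy as many bytes as still fit, backing up so we never split a character.
    let space = self.buf.len() - self.len;
    let mut n = s.len().min(space);
    while !s.is_char_boundary(n) {
      n -= 1;
    }
    self.buf[self.len..self.len + n].copy_from_slice(&s.as_bytes()[..n]);
    self.len += n;
    // Report an error if the message had to be cut off.
    if n < s.len() { Err(core::fmt::Error) } else { Ok(()) }
  }
}
```

Once write_str exists, the write! macro handles all the formatting for us, so write!(out, "{}", info) works on a BufferDebug exactly the way it does in the panic handler above.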

LLVM Intrinsics

The above code will make your program fail to build in debug mode, saying that __clzsi2 can't be found. This is a special builtin function that LLVM attempts to use when there's no hardware version of an operation it wants to do (in this case, counting the leading zeros). It's not actually necessary in this case, which is why you only need it in debug mode. The higher optimization level of release mode makes LLVM pre-compute more and fold more constants or whatever and then it stops trying to call __clzsi2.

Unfortunately, sometimes a build will fail with a missing intrinsic even in release mode.

If LLVM wants core to have that intrinsic then you're in trouble: you'll have to send a PR to the compiler-builtins repository and hope to get it into Rust itself.

If LLVM wants your code to have the intrinsic then you're in less trouble. You can look up the details and then implement it yourself. It can go anywhere in your program, as long as it has the right ABI and name. In the case of __clzsi2 it takes a usize and returns a usize, so you'd write something like:


#[no_mangle]
pub extern "C" fn __clzsi2(mut x: usize) -> usize {
  // Your leading-zero counting code goes here.
}

And so on for whatever other missing intrinsic.
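For the curious, here's one way the counting itself could go, written as a plain function on u32 so that it can be tried out on any machine. The name clz32 is made up for this sketch, and returning 32 for an input of 0 is a choice I've made here, not something taken from a spec. It sticks to shifts and compares because calling a built-in leading-zeros operation inside the intrinsic may just get lowered back into a call to the intrinsic.

```rust
/// Counts the leading zero bits of a 32-bit value using only shifts and compares.
/// Illustrative stand-in for the body of `__clzsi2`; returns 32 for an input of 0.
pub fn clz32(mut x: u32) -> u32 {
  if x == 0 {
    return 32;
  }
  let mut count = 0;
  // Binary search: whenever the top chunk is all zeros, count it and shift past it.
  if x & 0xFFFF_0000 == 0 { count += 16; x <<= 16; }
  if x & 0xFF00_0000 == 0 { count += 8; x <<= 8; }
  if x & 0xF000_0000 == 0 { count += 4; x <<= 4; }
  if x & 0xC000_0000 == 0 { count += 2; x <<= 2; }
  if x & 0x8000_0000 == 0 { count += 1; }
  count
}
```

On the GBA a usize is 32 bits, so the same steps would slot into the __clzsi2 signature above with usize in place of u32.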

Fixed Only

In addition to not having much of the standard library available, we don't even have a floating point unit available! We can't do floating point math in hardware! We could still do floating point math as pure software computations if we wanted, but that's a slow, slow thing to do.

Are there faster ways? It's the same answer as always: "Yes, but not without a tradeoff."

The faster way is to represent fractional values using a system called a Fixed Point Representation. What do we trade away? Numeric range.

  • Floating point math stores bits for base value and for exponent all according to a single well defined standard for how such a complicated thing works.
  • Fixed point math takes a normal integer (either signed or unsigned) and then just "mentally associates" it (so to speak) with a fractional value for its "units". If you have 3 and it's in units of 1/2, then you have 3/2, or 1.5 using decimal notation. If your number is 256 and it's in units of 1/256th then the value is 1.0 in decimal notation.

Floating point math requires dedicated hardware to perform quickly, but it can "trade" precision when it needs to represent extremely large or small values.

Fixed point math is just integral math, which our GBA is reasonably good at, but because your number is associated with a fixed fraction your results can get out of range very easily.
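Here's a small worked sketch of that idea using a bare i16 with 8 fractional bits, so every value is in units of 1/256. The helper names are invented for this example and it's deliberately simpler than the type we build below.

```rust
/// Number of fractional bits in this sketch: values are in units of 1/256.
pub const FRAC_BITS: u32 = 8;

/// Converts a whole number into 8.8 fixed point (hypothetical helper).
pub fn from_int(i: i16) -> i16 {
  i << FRAC_BITS
}

/// Adding two values with the same units is plain integer addition.
pub fn add(a: i16, b: i16) -> i16 {
  a + b
}

/// Multiplying gives a result in units of 1/65536, so we widen to `i32` first
/// and then shift back down by 8 to return to units of 1/256.
pub fn mul(a: i16, b: i16) -> i16 {
  ((a as i32 * b as i32) >> FRAC_BITS) as i16
}
```

In this format 1.5 is stored as 384, and mul(384, 384) gives 576, which is 2.25. The tradeoff shows up fast though: with only 8 bits left for the whole part of a signed value, anything from 128 on up is already out of range.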

Representing A Fixed Point Value

So we want to associate our numbers with a mental note of what units they're in:

  • PhantomData is a type that tells the compiler "please remember this extra type info" when you add it as a field to a struct. It goes away at compile time, so it's perfect for us to use as space for a note to ourselves without causing runtime overhead.
  • The typenum crate is the best way to represent a number within a type in Rust. Since our values on the GBA are always specified as a number of fractional bits to count the number as, we can put typenum types such as U8 or U14 into our PhantomData to keep track of what's going on.

Now, those of you who know me, or perhaps just know my reputation, will of course immediately question what happened to the real Lokathor. I do not care for most crates, and I particularly don't care for using a crate in teaching situations. However, typenum has a number of factors on its side that let me suggest it in this situation:

  • It's version 1.10 with a total of 21 versions and nearly 700k downloads, so we can expect that the major troubles have been shaken out and that it will remain fairly stable for quite some time to come.
  • It has no further dependencies that it's going to drag into the compilation.
  • It happens all at compile time, so it's not clogging up our actual game with any nonsense.
  • The (interesting) subject of "how do you do math inside Rust's trait system?" is totally separate from the concern that we're trying to focus on here.

Therefore, we will consider it acceptable to use this crate.

Now the typenum crate defines a whole lot, but we'll focus down to just a single type at the moment: UInt is a type-level unsigned value. It's like u8 or u16, but while they're types that then have values, each UInt construction statically equates to a specific value. Like how the () type only has one value, which is also called (). In this case, you wrap up UInt around smaller UInt values and a B1 or B0 value to build up the binary number that you want at the type level.

In other words, instead of writing


let six = 0b110;

We write


type U6 = UInt<UInt<UInt<UTerm, B1>, B1>, B0>;

Wild, I know. If you look into the typenum crate you can do math and stuff with these type level numbers, and we will a little bit below, but to start off we just need to store one in some PhantomData.
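If you want to convince yourself that this nesting really does encode a number, here's a stripped-down re-creation of the idea that needs no crates at all. The names mirror typenum's, but this is my own toy version for illustration and not the crate's actual source; in particular the Bit trait and its VALUE constant are assumptions of this sketch.

```rust
use core::marker::PhantomData;

/// Bit markers and the terminator, mirroring typenum's names.
pub struct B0;
pub struct B1;
pub struct UTerm;

/// `UInt<U, B>` means "the number `U`, followed by one more low bit `B`".
pub struct UInt<U, B>(PhantomData<(U, B)>);

/// Gives each bit marker its numeric value (toy trait for this sketch).
pub trait Bit {
  const VALUE: u32;
}
impl Bit for B0 {
  const VALUE: u32 = 0;
}
impl Bit for B1 {
  const VALUE: u32 = 1;
}

/// Gives each type-level number its value as an associated constant.
pub trait Unsigned {
  const U32: u32;
}
impl Unsigned for UTerm {
  const U32: u32 = 0;
}
impl<U: Unsigned, B: Bit> Unsigned for UInt<U, B> {
  // Appending a bit doubles the number so far and adds the new bit.
  const U32: u32 = U::U32 * 2 + B::VALUE;
}

/// Binary 110, built from the inside out.
pub type U6 = UInt<UInt<UInt<UTerm, B1>, B1>, B0>;
```

Reading <U6 as Unsigned>::U32 walks the nesting at compile time: 0, then 1, then 3, then 6. No value of type U6 ever has to exist at runtime.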

A struct For Fixed Point

Our actual type for a fixed point value looks like this:


use core::marker::PhantomData;
use typenum::marker_traits::Unsigned;

/// Fixed point `T` value with `F` fractional bits.
#[derive(Debug, Copy, Clone, Default, PartialEq, Eq, PartialOrd, Ord)]
#[repr(transparent)]
pub struct Fx<T, F: Unsigned> {
  num: T,
  phantom: PhantomData<F>,
}

This says that Fx<T,F> is a generic type that holds some base number type T and an F type that's marking off how many fractional bits we're using. We only want people giving unsigned type-level values for the PhantomData type, so we use the trait bound F: Unsigned.

We use repr(transparent) here to ensure that Fx will always be treated just like the base type in the final program (in terms of bit pattern and ABI).

If you go and check, this is basically how the existing general purpose crates for fixed point math represent their numbers. They're a little fancier about it because they have to cover every case, and we only have to cover our GBA case.

That's quite a bit to type though. We probably want to make a few type aliases for things to be easier to look at. Unfortunately there's no standard notation for how you write a fixed point type. We also have to limit ourselves to what's valid for use in a Rust type too. I like the fx thing, so we'll use that for signed and then fxu if we need an unsigned value.


/// Alias for an `i16` fixed point value with 8 fractional bits.
pub type fx8_8 = Fx<i16,U8>;

Rust will complain about having non_camel_case_types, and you can shut that warning up by putting an #[allow(non_camel_case_types)] attribute on the type alias directly, or you can use #![allow(non_camel_case_types)] at the very top of the module to shut up that warning for the whole module (which is what I did).

Constructing A Fixed Point Value

So how do we actually make one of these values? Well, we can always just wrap or unwrap any value in our Fx type:


impl<T, F: Unsigned> Fx<T, F> {
  /// Uses the provided value directly.
  pub fn from_raw(r: T) -> Self {
    Fx {
      num: r,
      phantom: PhantomData,
    }
  }
  /// Unwraps the inner value.
  pub fn into_raw(self) -> T {
    self.num
  }
}

I'd like to use the From trait of course, but it was giving me some trouble, I think because of the orphan rule. Oh well.

If we want to be particular to the fact that these are supposed to be numbers... that gets tricky. Rust is actually quite bad at being generic about number types. You can use the num crate, or you can just use a macro and invoke it once per type. Guess what we're gonna do.


macro_rules! fixed_point_methods {
  ($t:ident) => {
    impl<F: Unsigned> Fx<$t, F> {
      /// Gives the smallest positive non-zero value.
      pub fn precision() -> Self {
        Fx {
          num: 1,
          phantom: PhantomData,
        }
      }

      /// Makes a value with the integer part shifted into place.
      pub fn from_int_part(i: $t) -> Self {
        Fx {
          num: i << F::U8,
          phantom: PhantomData,
        }
      }
    }
  };
}

fixed_point_methods! {u8}
fixed_point_methods! {i8}
fixed_point_methods! {i16}
fixed_point_methods! {u16}
fixed_point_methods! {i32}
fixed_point_methods! {u32}

Now you'd think that those can be const, but at the moment you can't have a const function with a bound on any trait other than Sized, so they have to be normal functions.

Also, we're doing something a little interesting there with from_int_part. We can take our F type and get its constant value. There's other associated constants if we want it in other types, and also non-const methods if you wanted that for some reason (maybe passing it as a closure function? dunno).
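To see the "constant comes from the type" trick in isolation, here's a self-contained sketch that swaps typenum out for a tiny stand-in trait. The Unsigned trait below is reduced to the single U8 constant, and the marker types Frac4 and Frac8 are names I made up; with the real crate you'd use its U4 and U8 types instead.

```rust
use core::marker::PhantomData;

/// Stand-in for typenum's `Unsigned` trait, reduced to the one constant used here.
pub trait Unsigned {
  const U8: u8;
}

/// Stand-in marker types for 4 and 8 fractional bits (hypothetical names).
pub struct Frac4;
pub struct Frac8;
impl Unsigned for Frac4 {
  const U8: u8 = 4;
}
impl Unsigned for Frac8 {
  const U8: u8 = 8;
}

pub struct Fx<T, F: Unsigned> {
  num: T,
  phantom: PhantomData<F>,
}

impl<T, F: Unsigned> Fx<T, F> {
  /// Unwraps the inner value.
  pub fn into_raw(self) -> T {
    self.num
  }
}

impl<F: Unsigned> Fx<i16, F> {
  /// Gives the smallest positive non-zero value.
  pub fn precision() -> Self {
    Fx { num: 1, phantom: PhantomData }
  }

  /// Makes a value with the integer part shifted into place.
  pub fn from_int_part(i: i16) -> Self {
    // The shift amount comes from the type parameter, not from a runtime field.
    Fx { num: i << F::U8, phantom: PhantomData }
  }
}
```

The same call gives different raw bits depending on the type you ask for: from_int_part(3) is 768 with 8 fractional bits and 48 with 4. Since PhantomData takes up no space, the whole Fx<i16, Frac8> is still just two bytes.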

Casting Base Values

Next, once we have a value in one base type we will need to be able to move it into another base type. Unfortunately this means we gotta use the as operator, which requires a concrete source type and a concrete destination type. There's no easy way for us to make it generic here.

We could let the user use into_raw, cast, and then do from_raw, but that's error prone because they might change the fractional bit count accidentally. This means that we have to write a function that does the casting while perfectly preserving the fractional bit quantity. If we wrote one function for each conversion it'd be like 30 different possible casts (6 base types that we support, and then 5 possible target types). Instead, we'll write it just once in a way that takes a closure, and let the user pass a closure that does the cast. The compiler should merge it all together quite nicely for us once optimizations kick in.

This code goes outside the macro, in the generic impl<T, F: Unsigned> Fx<T, F> block alongside from_raw and into_raw. I want to avoid too much code in the macro if we can, it's a little easier to cope with I think.


  /// Casts the base type, keeping the fractional bit quantity the same.
  pub fn cast_inner<Z, C: Fn(T) -> Z>(self, op: C) -> Fx<Z, F> {
    Fx {
      num: op(self.num),
      phantom: PhantomData,
    }
  }
#}

It's horrible and ugly, but Rust is just bad at numbers sometimes.
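As a rough sketch of what calling cast_inner looks like from the user's side, here's a cut-down stand-in for our type. To keep it dependency-free it swaps the typenum marker for a const generic, so FxSketch, BITS, and widen are hypothetical names for illustration only, not part of the crate.

```rust
// Hypothetical stand-in for `Fx<T, F>`: a const generic `BITS` replaces the
// typenum marker so that this sketch builds with no dependencies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FxSketch<T, const BITS: u32> {
  num: T,
}

impl<T, const BITS: u32> FxSketch<T, BITS> {
  /// Wraps raw bits without changing them.
  pub fn from_raw(num: T) -> Self {
    FxSketch { num }
  }

  /// Unwraps the raw bits.
  pub fn into_raw(self) -> T {
    self.num
  }

  /// Casts the base type, keeping the fractional bit quantity the same.
  pub fn cast_inner<Z, C: Fn(T) -> Z>(self, op: C) -> FxSketch<Z, BITS> {
    FxSketch { num: op(self.num) }
  }
}

/// Widens an `i16` with 8 fractional bits into an `i32` with the same 8 bits.
/// The closure is the only place where the concrete `as` cast appears.
pub fn widen(x: FxSketch<i16, 8>) -> FxSketch<i32, 8> {
  x.cast_inner(|n| n as i32)
}
```

The return type pins the fractional bit count, so the caller can't accidentally change it during the cast.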

Adjusting Fractional Part

In addition to the base value we might want to change our fractional bit quantity. This is actually easier than it sounds, but it also requires us to be tricky with the generics. We can actually use some typenum type level operators here.

This code goes inside the macro: we need to be able to use the left shift and right shift, which is easiest when we just use the macro's $t as our type. We could alternately put a similar function outside the macro and be generic on T having the left and right shift operators by using a where clause. As much as I'd like to avoid too much code being generated by macro, I'd even more like to avoid generic code with huge and complicated trait bounds. It comes down to style, and you gotta decide for yourself.


# #![allow(unused_variables)]
#fn main() {
      /// Changes the fractional bit quantity, keeping the base type the same.
      pub fn adjust_fractional_bits<Y: Unsigned + IsEqual<F, Output = False>>(self) -> Fx<$t, Y> {
        let leftward_movement: i32 = Y::to_i32() - F::to_i32();
        Fx {
          num: if leftward_movement > 0 {
            self.num << leftward_movement
          } else {
            self.num >> (-leftward_movement)
          },
          phantom: PhantomData,
        }
      }
#}

There's a few things at work. First, we introduce Y as the target number of fractional bits, and we use a type-level operator to require that the target bit quantity is different from the one we already have. If it's the same as we started with, why are you doing the cast at all?

Now, once we're sure that the current bits and target bits aren't the same, we compute target - start, and call this our "leftward movement". Example: if we're targeting 8 bits and we're at 4 bits, we do 8-4 and get +4 as our leftward movement. If the leftward_movement is positive we naturally shift our current value to the left. If it's not positive then it must be negative because we eliminated 0 as a possibility using the type-level operator, so we shift to the right by the negative value.
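To see the arithmetic on plain numbers, here's a minimal sketch of the same shift logic as a free function. The name adjust_raw and the runtime bit-count parameters are assumptions for illustration; the real method gets both counts from the type system.

```rust
/// Re-scales `raw` from `from_bits` fractional bits to `to_bits`.
/// Hypothetical helper that mirrors the "leftward movement" logic.
pub fn adjust_raw(raw: i32, from_bits: i32, to_bits: i32) -> i32 {
  let leftward_movement = to_bits - from_bits;
  if leftward_movement > 0 {
    // More fractional bits wanted: shift left.
    raw << leftward_movement
  } else {
    // Fewer fractional bits wanted: shift right, dropping low bits.
    raw >> (-leftward_movement)
  }
}
```

For example, 1.5 with 4 fractional bits is stored as 24, and moving it to 8 fractional bits gives 384, which is 1.5 * 256.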

Addition, Subtraction, Shifting, Negative, Comparisons

From here on we're getting help from this blog post by Job Vranish, so thank them if you learn something.

I might have given away the game a bit with those derive traits on our fixed point type. For a fair number of operations you can use the normal form of the op on the inner bits as long as the fractional parts have the same quantity. This includes equality and ordering (which we derived) as well as addition, subtraction, and bit shifting (which we need to do ourselves).

This code can go outside the macro, with sufficient trait bounds.


# #![allow(unused_variables)]
#fn main() {
impl<T: Add<Output = T>, F: Unsigned> Add for Fx<T, F> {
  type Output = Self;
  fn add(self, rhs: Fx<T, F>) -> Self::Output {
    Fx {
      num: self.num + rhs.num,
      phantom: PhantomData,
    }
  }
}
#}

The bound on T makes it so that Fx<T, F> can be added any time that T can be added to its own type with itself as the output. We can use the exact same pattern for Sub, Shl, Shr, and Neg. With enough trait bounds, we can do anything!


# #![allow(unused_variables)]
#fn main() {
impl<T: Sub<Output = T>, F: Unsigned> Sub for Fx<T, F> {
  type Output = Self;
  fn sub(self, rhs: Fx<T, F>) -> Self::Output {
    Fx {
      num: self.num - rhs.num,
      phantom: PhantomData,
    }
  }
}

impl<T: Shl<u32, Output = T>, F: Unsigned> Shl<u32> for Fx<T, F> {
  type Output = Self;
  fn shl(self, rhs: u32) -> Self::Output {
    Fx {
      num: self.num << rhs,
      phantom: PhantomData,
    }
  }
}

impl<T: Shr<u32, Output = T>, F: Unsigned> Shr<u32> for Fx<T, F> {
  type Output = Self;
  fn shr(self, rhs: u32) -> Self::Output {
    Fx {
      num: self.num >> rhs,
      phantom: PhantomData,
    }
  }
}

impl<T: Neg<Output = T>, F: Unsigned> Neg for Fx<T, F> {
  type Output = Self;
  fn neg(self) -> Self::Output {
    Fx {
      num: -self.num,
      phantom: PhantomData,
    }
  }
}
#}

Unfortunately, for Shl and Shr to have as much coverage on our type as they do on the base type (allowing just about any right hand side) we'd have to do another macro, but I think just u32 is fine. We can always add more later if we need.

We could also implement BitAnd, BitOr, BitXor, and Not, but they don't seem relevant to our fixed point math use, and this section is getting long already. Just use the same general patterns if you want to add them in your own programs. Shockingly, Rem also works directly if you want it, though I don't foresee us needing fixed point remainder. Also, the GBA can't do hardware division or remainder, and we'll have to work around that below when we implement Div (which maybe we don't need, but it's complex enough I should show it instead of letting people guess).

Note: In addition to the various Op traits, there's also OpAssign variants. Each OpAssign is the same as Op, but takes &mut self instead of self and then modifies in place instead of producing a fresh value. In other words, if you want both + and += you'll need to do the AddAssign trait too. It's not the worst thing to just write a = a+b, so I won't bother with showing all that here. It's pretty easy to figure out for yourself if you want.
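If you do want +=, a minimal sketch might look like the following. It uses a dependency-free stand-in for our type (a const generic instead of the typenum marker), so FxSketch and BITS are hypothetical names.

```rust
use core::ops::{Add, AddAssign};

// Hypothetical stand-in for `Fx<T, F>` with a const generic bit count.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FxSketch<T, const BITS: u32> {
  num: T,
}

impl<T, const BITS: u32> FxSketch<T, BITS> {
  pub fn from_raw(num: T) -> Self {
    FxSketch { num }
  }
  pub fn into_raw(self) -> T {
    self.num
  }
}

// `+` consumes both sides and produces a fresh value.
impl<T: Add<Output = T>, const BITS: u32> Add for FxSketch<T, BITS> {
  type Output = Self;
  fn add(self, rhs: Self) -> Self::Output {
    FxSketch { num: self.num + rhs.num }
  }
}

// `+=` takes `&mut self` and modifies the left side in place.
impl<T: AddAssign, const BITS: u32> AddAssign for FxSketch<T, BITS> {
  fn add_assign(&mut self, rhs: Self) {
    self.num += rhs.num;
  }
}
```

The other OpAssign traits follow the same shape.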

Multiplication

This is where things get more interesting. When we have two numbers A and B they really stand for (a*f) and (b*f). If we write A*B then we're really writing (a*f)*(b*f), which can be rewritten as (a*b)*f*f, and now it's obvious that we have one more f than we wanted to have. We have to do the multiply of the inner value and then divide out the f. We divide by 1 << bit_count, so if we have 8 fractional bits we'll divide by 256.
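To make the extra f concrete, here's a small sketch on raw unsigned values. The function name fx_mul_raw and the runtime bits parameter are assumptions for illustration.

```rust
/// Multiplies two raw fixed point values that both carry `bits`
/// fractional bits. Hypothetical helper, unsigned only.
pub fn fx_mul_raw(a: u32, b: u32, bits: u32) -> u32 {
  // The product carries the scale factor twice, so shift one copy back out.
  a.wrapping_mul(b) >> bits
}
```

With 8 fractional bits, 1.5 is stored as 384 and 2.5 as 640. The raw product is 245760, and shifting right by 8 leaves 960, which is 3.75 * 256.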

The catch is that, when we do the multiply we're extremely likely to overflow our base type with that multiplication step. Then we do that divide, and now our result is basically nonsense. We can avoid this to some extent by casting up to a higher bit type, doing the multiplication and division at higher precision, and then casting back down. We want as much precision as possible without being too inefficient, so we'll always cast up to 32-bit (on a 64-bit machine you'd cast up to 64-bit instead).

Naturally, any signed value has to be cast up to i32 and any unsigned value has to be cast up to u32, so we'll have to handle those separately.

Also, instead of doing an actual divide we can right-shift by the correct number of bits to achieve the same effect. Except when we have a signed value that's negative, because actual division truncates towards zero and right-shifting truncates towards negative infinity. We can get around this by flipping the sign, doing the shift, and flipping the sign again (which sounds silly but it's so much faster than doing an actual division).

Also, again signed values can be annoying, because if the value just happens to be i32::MIN then when you negate it you'll have... still a negative value. I'm not 100% on this, but I think the correct thing to do at that point is to give $t::MIN as the output num value.
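Here's a minimal sketch of the rounding difference and of the i32::MIN problem. The function names are hypothetical, and shift_trunc deliberately leaves i32::MIN unhandled to show why it needs a special case.

```rust
/// Plain arithmetic right shift: rounds toward negative infinity.
pub fn shift_floor(x: i32, bits: u32) -> i32 {
  x >> bits
}

/// Flip the sign, shift, flip back: rounds toward zero like `/` does.
/// `i32::MIN` is not handled here, since negating it overflows.
pub fn shift_trunc(x: i32, bits: u32) -> i32 {
  if x < 0 {
    -((-x) >> bits)
  } else {
    x >> bits
  }
}

/// Two's complement has no positive counterpart for `i32::MIN`, so a
/// wrapping negate hands the same negative value straight back.
pub fn min_negates_to_itself() -> bool {
  i32::MIN.wrapping_neg() == i32::MIN
}
```

So -3 shifted right by one bit is -2, while -3 / 2 and the sign-flipping version both give -1.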

Did you get all that? Good, because this involves casting, so we will need to implement it three times for the signed types and three more for the unsigned types, which calls for another macro (one per signedness).


# #![allow(unused_variables)]
#fn main() {
macro_rules! fixed_point_signed_multiply {
  ($t:ident) => {
    impl<F: Unsigned> Mul for Fx<$t, F> {
      type Output = Self;
      fn mul(self, rhs: Fx<$t, F>) -> Self::Output {
        let pre_shift = (self.num as i32).wrapping_mul(rhs.num as i32);
        if pre_shift < 0 {
          if pre_shift == core::i32::MIN {
            Fx {
              num: core::$t::MIN,
              phantom: PhantomData,
            }
          } else {
            Fx {
              num: (-((-pre_shift) >> F::U8)) as $t,
              phantom: PhantomData,
            }
          }
        } else {
          Fx {
            num: (pre_shift >> F::U8) as $t,
            phantom: PhantomData,
          }
        }
      }
    }
  };
}

fixed_point_signed_multiply! {i8}
fixed_point_signed_multiply! {i16}
fixed_point_signed_multiply! {i32}

macro_rules! fixed_point_unsigned_multiply {
  ($t:ident) => {
    impl<F: Unsigned> Mul for Fx<$t, F> {
      type Output = Self;
      fn mul(self, rhs: Fx<$t, F>) -> Self::Output {
        Fx {
          num: ((self.num as u32).wrapping_mul(rhs.num as u32) >> F::U8) as $t,
          phantom: PhantomData,
        }
      }
    }
  };
}

fixed_point_unsigned_multiply! {u8}
fixed_point_unsigned_multiply! {u16}
fixed_point_unsigned_multiply! {u32}
#}

Division

Division is similar to multiplication, but reversed. Which makes sense. This time A/B gives (a*f)/(b*f) which is a/b, one less f than we were after.
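A small sketch of the fix on raw values: scale the numerator up by f before dividing, so the quotient keeps one f. The name fx_div_raw is hypothetical, and it uses the ordinary / operator, which is fine on a desktop but is exactly what we can't rely on for the GBA.

```rust
/// Divides two raw fixed point values that both carry `bits`
/// fractional bits. Hypothetical helper; panics if `b` is zero.
pub fn fx_div_raw(a: i32, b: i32, bits: u32) -> i32 {
  // Pre-multiply the numerator by f so that (a*f*f)/(b*f) = (a/b)*f.
  a.wrapping_mul(1 << bits) / b
}
```

With 8 fractional bits, 3.0 is 768 and 2.0 is 512. Dividing the raw values directly gives 1 (which would mean 1/256), while the pre-scaled version gives 384, which is 1.5 * 256.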

As with the multiplication version of things, we have to up-cast our inner value as much as we can before doing the math, to allow for the most precision possible.

The snag here is that the GBA's CPU has no hardware division or remainder instruction. Instead, the GBA has a BIOS function you can call to do i32/i32 division.

This is a potential problem for us though. If we have some unsigned value, we need it to fit within the positive space of an i32 after the multiply so that we can cast it to i32, call the BIOS function that only works on i32 values, and cast it back to its actual type.

  • If you have a u8 you're always okay, even with 8 fractional bits.
  • If you have a u16 you're okay even with a maximum value up to 15 fractional bits, but having a maximum value and 16 fractional bits makes it break.
  • If you have a u32 you're probably going to be in trouble all the time.
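Those three cases can be checked with a few lines of arithmetic. This is only a sketch: fits_bios_div is a hypothetical helper, and it does the math in u64 so the check itself can't overflow.

```rust
/// Returns true when `max_raw << frac_bits` still fits in the positive
/// range of an `i32`, which is what the i32-only division needs.
pub fn fits_bios_div(max_raw: u64, frac_bits: u32) -> bool {
  (max_raw << frac_bits) <= i32::MAX as u64
}
```

The largest u16 shifted by 15 is 2147450880, just under i32::MAX (2147483647), while shifting by 16 gives 4294901760, which is far over.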

So... ugh, there's not much we can do about this. For now we'll just have to suffer some.

// TODO: find a numerics book that tells us how to do u32/u32 divisions.


# #![allow(unused_variables)]
#fn main() {
macro_rules! fixed_point_signed_division {
  ($t:ident) => {
    impl<F: Unsigned> Div for Fx<$t, F> {
      type Output = Self;
      fn div(self, rhs: Fx<$t, F>) -> Self::Output {
        let mul_output: i32 = (self.num as i32).wrapping_mul(1 << F::U8);
        let divide_result: i32 = crate::bios::div(mul_output, rhs.num as i32);
        Fx {
          num: divide_result as $t,
          phantom: PhantomData,
        }
      }
    }
  };
}

fixed_point_signed_division! {i8}
fixed_point_signed_division! {i16}
fixed_point_signed_division! {i32}

macro_rules! fixed_point_unsigned_division {
  ($t:ident) => {
    impl<F: Unsigned> Div for Fx<$t, F> {
      type Output = Self;
      fn div(self, rhs: Fx<$t, F>) -> Self::Output {
        let mul_output: i32 = (self.num as i32).wrapping_mul(1 << F::U8);
        let divide_result: i32 = crate::bios::div(mul_output, rhs.num as i32);
        Fx {
          num: divide_result as $t,
          phantom: PhantomData,
        }
      }
    }
  };
}

fixed_point_unsigned_division! {u8}
fixed_point_unsigned_division! {u16}
fixed_point_unsigned_division! {u32}
#}

Trigonometry

TODO: look up tables! arcbits!

Just Using A Crate

If, after seeing all that, and seeing that I still didn't even cover every possible trait impl that you might want for all the possible types... if after all that you feel too intimidated, then I'll cave a bit on your behalf and suggest to you that the fixed crate seems to be the best crate available for fixed point math.

I have not tested its use on the GBA myself.

It's just my recommendation from looking at the docs of the various options available, if you really wanted to just have a crate for it.

Volatile Destination

TODO: update this when we can make more stuff const

Volatile Memory

The compiler is an eager friend, so when it sees a read or a write that won't have an effect, it eliminates that read or write. For example, if we write


# #![allow(unused_variables)]
#fn main() {
let mut x = 5;
x = 7;
#}

The compiler won't actually ever put 5 into x. It'll skip straight to putting 7 in x, because we never read from x when it's 5, so that's a safe change to make. Normally, values are stored in RAM, which has no side effects when you read and write from it. RAM is purely for keeping notes about values you'll need later on.

However, what if we had a bit of hardware where we wanted to do a write and that did something other than keeping the value for us to look at later? As you saw in the hello_magic example, we have to use a write_volatile operation. Volatile means "just do it anyway". The compiler thinks that it's pointless, but we know better, so we can force it to really do exactly what we say by using write_volatile instead of write.
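Here's a minimal sketch of volatile access that runs on a normal desktop. An ordinary local variable stands in for the hardware register (an assumption made so the code is safe to run anywhere), so nothing magical happens, but both writes are still performed in order.

```rust
/// Writes twice and reads once, all with volatile operations.
/// `slot` is a plain stack variable standing in for a hardware register.
pub fn volatile_round_trip() -> u16 {
  let mut slot: u16 = 0;
  let ptr: *mut u16 = &mut slot;
  unsafe {
    // With a normal write the compiler could drop the first store.
    // A volatile write asks it to really perform both.
    ptr.write_volatile(5);
    ptr.write_volatile(7);
    ptr.read_volatile()
  }
}
```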

This is kinda error prone though, right? Because it's just a raw pointer, so we might forget to use write_volatile at some point.

Instead, we want a type that's always going to use volatile reads and writes. Also, we want a pointer type that lets our reads and writes be as safe as possible once we've unsafely constructed the initial value.

Constructing The VolAddress Type

First, we want a type that stores a location within the address space. This can be a pointer, or a usize, and we'll use a usize because that's easier to work with in a const context (and we want to have const when we can get it). We'll also have our type use NonZeroUsize instead of just usize so that Option<VolAddress<T>> stays as a single machine word. This helps quite a bit when we want to iterate over the addresses of a block of memory (such as locations within the palette memory). Hardware is never at the null address anyway. Also, if we had just an address number then we wouldn't be able to track what type the address is for. We need some PhantomData, and specifically we need the phantom data to be for *mut T:

  • If we used *const T that'd have the wrong variance.
  • If we used &mut T then that's fusing in the ideas of lifetime and exclusive access to our type. That's potentially important, but that's also an abstraction we'll build on top of this VolAddress type if we need it.
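The size claim about NonZeroUsize is easy to check for yourself. A quick sketch (the function names are just for illustration):

```rust
use core::mem::size_of;
use core::num::NonZeroUsize;

/// `None` reuses the forbidden zero value, so the `Option` costs nothing.
pub fn nonzero_option_is_one_word() -> bool {
  size_of::<Option<NonZeroUsize>>() == size_of::<usize>()
}

/// A plain `usize` has no spare value, so its `Option` needs extra space.
pub fn plain_option_is_bigger() -> bool {
  size_of::<Option<usize>>() > size_of::<usize>()
}
```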

One abstraction layer at a time, so we start with just a phantom pointer. This gives us a type that looks like this:


# #![allow(unused_variables)]
#fn main() {
#[derive(Debug)]
#[repr(transparent)]
pub struct VolAddress<T> {
  address: NonZeroUsize,
  marker: PhantomData<*mut T>,
}
#}

Now, because of how derive is specified, it derives traits if the generic parameter supports those traits. Since our type is like a pointer, the traits it supports are distinct from whatever traits the target type supports. So we'll provide those implementations manually.


# #![allow(unused_variables)]
#fn main() {
impl<T> Clone for VolAddress<T> {
  fn clone(&self) -> Self {
    *self
  }
}
impl<T> Copy for VolAddress<T> {}
impl<T> PartialEq for VolAddress<T> {
  fn eq(&self, other: &Self) -> bool {
    self.address == other.address
  }
}
impl<T> Eq for VolAddress<T> {}
impl<T> PartialOrd for VolAddress<T> {
  fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
    Some(self.address.cmp(&other.address))
  }
}
impl<T> Ord for VolAddress<T> {
  fn cmp(&self, other: &Self) -> Ordering {
    self.address.cmp(&other.address)
  }
}
#}

Boilerplate junk, not interesting. There's a reason that you derive those traits 99% of the time in Rust.
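Still, here's a tiny sketch of why the manual impls matter. AddrSketch and NotClone are hypothetical stand-ins: the target type supports none of the traits, yet the address wrapper is still Copy because our impls put no bound on T.

```rust
use core::marker::PhantomData;

/// Deliberately neither `Clone` nor `Copy`.
pub struct NotClone;

/// Hypothetical pointer-like wrapper, cut down from `VolAddress`.
pub struct AddrSketch<T> {
  address: usize,
  marker: PhantomData<*mut T>,
}

// Written by hand: no `T: Clone` or `T: Copy` bound, which is what
// `#[derive(Clone, Copy)]` would have added.
impl<T> Clone for AddrSketch<T> {
  fn clone(&self) -> Self {
    *self
  }
}
impl<T> Copy for AddrSketch<T> {}

/// Uses the same address value twice, which only compiles if it is `Copy`.
pub fn copy_twice() -> (usize, usize) {
  let a: AddrSketch<NotClone> = AddrSketch {
    address: 0x400_0000,
    marker: PhantomData,
  };
  let b = a;
  (a.address, b.address)
}
```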

Constructing A VolAddress Value

Okay so here's the next core concept: If we unsafely construct a VolAddress<T>, then we can safely use the value once it's been properly created.


# #![allow(unused_variables)]
#fn main() {
// you'll need these features enabled and a recent nightly
#![feature(const_int_wrapping)]
#![feature(min_const_unsafe_fn)]

impl<T> VolAddress<T> {
  pub const unsafe fn new_unchecked(address: usize) -> Self {
    VolAddress {
      address: NonZeroUsize::new_unchecked(address),
      marker: PhantomData,
    }
  }
  pub const unsafe fn cast<Z>(self) -> VolAddress<Z> {
    VolAddress {
      address: self.address,
      marker: PhantomData,
    }
  }
  pub unsafe fn offset(self, offset: isize) -> Self {
    VolAddress {
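      // wrapping_mul keeps negative offsets from tripping the overflow check in debug builds
      address: NonZeroUsize::new_unchecked(self.address.get().wrapping_add((offset as usize).wrapping_mul(core::mem::size_of::<T>()))),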
      marker: PhantomData,
    }
  }
}
#}

So what are the unsafety rules here?

  • Non-null, obviously.
  • Must be aligned for T
  • Must always produce valid bit patterns for T
  • Must not be part of the address space that Rust's stack or allocator will ever use.
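The first two rules are mechanical enough to check in code, at least as a debugging aid. This is only a sketch with a hypothetical helper name. The last two rules depend on what the hardware actually does at that address, so no function can check them for you.

```rust
use core::mem::align_of;

/// Checks only the "non-null" and "aligned for T" rules.
/// It says nothing about valid bit patterns or about who else uses the memory.
pub fn passes_mechanical_rules<T>(address: usize) -> bool {
  address != 0 && address % align_of::<T>() == 0
}
```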

So, again using the hello_magic example, we had


# #![allow(unused_variables)]
#fn main() {
(0x400_0000 as *mut u16).write_volatile(0x0403);
#}

And instead we could declare


# #![allow(unused_variables)]
#fn main() {
const MAGIC_LOCATION: VolAddress<u16> = unsafe { VolAddress::new_unchecked(0x400_0000) };
#}

Using A VolAddress Value

Now that we've named the magic location, we want to write to it.


# #![allow(unused_variables)]
#fn main() {
impl<T> VolAddress<T> {
  pub fn read(self) -> T
  where
    T: Copy,
  {
    unsafe { (self.address.get() as *mut T).read_volatile() }
  }
  pub unsafe fn read_non_copy(self) -> T {
    (self.address.get() as *mut T).read_volatile()
  }
  pub fn write(self, val: T) {
    unsafe { (self.address.get() as *mut T).write_volatile(val) }
  }
}
#}

So if the type is Copy we can read it as much as we want. If, somehow, the type isn't Copy, then it might be Drop, and that means if we read out a value over and over we could cause the drop method to trigger UB. Since the end user might really know what they're doing, we provide an unsafe backup read_non_copy.

On the other hand, we can write to the location as much as we want. Even if the type isn't Copy, not running Drop is safe, so a write is always safe.

Now we can write to our magical location.


# #![allow(unused_variables)]
#fn main() {
MAGIC_LOCATION.write(0x0403);
#}

VolAddress Iteration

We've already seen that sometimes we want to have a base address of some sort and then offset from that location to another. What if we wanted to iterate over all the locations? That's not particularly hard.


# #![allow(unused_variables)]
#fn main() {
impl<T> VolAddress<T> {
  pub const unsafe fn iter_slots(self, slots: usize) -> VolAddressIter<T> {
    VolAddressIter { vol_address: self, slots }
  }
}

#[derive(Debug)]
pub struct VolAddressIter<T> {
  vol_address: VolAddress<T>,
  slots: usize,
}
impl<T> Clone for VolAddressIter<T> {
  fn clone(&self) -> Self {
    VolAddressIter {
      vol_address: self.vol_address,
      slots: self.slots,
    }
  }
}
impl<T> PartialEq for VolAddressIter<T> {
  fn eq(&self, other: &Self) -> bool {
    self.vol_address == other.vol_address && self.slots == other.slots
  }
}
impl<T> Eq for VolAddressIter<T> {}
impl<T> Iterator for VolAddressIter<T> {
  type Item = VolAddress<T>;

  fn next(&mut self) -> Option<Self::Item> {
    if self.slots > 0 {
      let out = self.vol_address;
      unsafe {
        self.slots -= 1;
        self.vol_address = self.vol_address.offset(1);
      }
      Some(out)
    } else {
      None
    }
  }
}
impl<T> FusedIterator for VolAddressIter<T> {}
#}

VolAddressBlock

Obviously, having a base address and a length exist separately is error prone. There's a good reason for slices to keep their pointer and their length together. We want something like that, which we'll call a "block" because "array" and "slice" are already things in Rust.


# #![allow(unused_variables)]
#fn main() {
#[derive(Debug)]
pub struct VolAddressBlock<T> {
  vol_address: VolAddress<T>,
  slots: usize,
}
impl<T> Clone for VolAddressBlock<T> {
  fn clone(&self) -> Self {
    VolAddressBlock {
      vol_address: self.vol_address,
      slots: self.slots,
    }
  }
}
impl<T> PartialEq for VolAddressBlock<T> {
  fn eq(&self, other: &Self) -> bool {
    self.vol_address == other.vol_address && self.slots == other.slots
  }
}
impl<T> Eq for VolAddressBlock<T> {}

impl<T> VolAddressBlock<T> {
  pub const unsafe fn new_unchecked(vol_address: VolAddress<T>, slots: usize) -> Self {
    VolAddressBlock { vol_address, slots }
  }
  pub const fn iter(self) -> VolAddressIter<T> {
    VolAddressIter {
      vol_address: self.vol_address,
      slots: self.slots,
    }
  }
  pub unsafe fn index_unchecked(self, slot: usize) -> VolAddress<T> {
    self.vol_address.offset(slot as isize)
  }
  pub fn index(self, slot: usize) -> VolAddress<T> {
    if slot < self.slots {
      unsafe { self.vol_address.offset(slot as isize) }
    } else {
      panic!("Index Requested: {} >= Bound: {}", slot, self.slots)
    }
  }
  pub fn get(self, slot: usize) -> Option<VolAddress<T>> {
    if slot < self.slots {
      unsafe { Some(self.vol_address.offset(slot as isize)) }
    } else {
      None
    }
  }
}
#}

Now we can have something like:


# #![allow(unused_variables)]
#fn main() {
const OTHER_MAGIC: VolAddressBlock<u16> = unsafe {
  VolAddressBlock::new_unchecked(
    VolAddress::new_unchecked(0x600_0000),
    240 * 160
  )
};

// `index` returns a VolAddress, and its write method is already volatile
OTHER_MAGIC.index(120 + 80 * 240).write(0x001F);
OTHER_MAGIC.index(136 + 80 * 240).write(0x03E0);
OTHER_MAGIC.index(120 + 96 * 240).write(0x7C00);
#}
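The slot numbers in that example are just x + y * 240 written out by hand. A small sketch of a helper that does the math and the bounds check in one place (pixel_slot and the two constants are hypothetical names, using the 240 by 160 block size from above):

```rust
pub const SCREEN_WIDTH: usize = 240;
pub const SCREEN_HEIGHT: usize = 160;

/// Converts an (x, y) position into a slot within a 240 * 160 block,
/// or `None` if the position is off the edge.
pub fn pixel_slot(x: usize, y: usize) -> Option<usize> {
  if x < SCREEN_WIDTH && y < SCREEN_HEIGHT {
    Some(x + y * SCREEN_WIDTH)
  } else {
    None
  }
}
```

Checking x and y separately matters: an x of 240 would otherwise silently wrap around to the start of the next row.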

Docs?

If you wanna see these types and methods with a full docs write up you should check the GBA crate's source.

Volatile ASM

In addition to some memory locations being volatile, it's also possible for inline assembly to be declared volatile. This is basically the same idea, "hey just do what I'm telling you, don't get smart about it".

Normally when you have some asm! it's basically treated like a function, there's inputs and outputs and the compiler will try to optimize it so that if you don't actually use the outputs it won't bother with doing those instructions. However, asm! is basically a pure black box, so the compiler doesn't know what's happening inside at all, and it can't see if there's any important side effects going on.

An example of an important side effect that doesn't have output values would be putting the CPU into a low power state while we wait for the next VBlank. This lets us save quite a bit of battery power. It requires some setup to be done safely (otherwise the GBA won't ever actually wake back up from the low power state), but the asm! you use once you're ready is just a single instruction with no return value. The compiler can't tell what's going on, so you just have to say "do it anyway".

Note that if you use a linker script to include any ASM with your Rust program (e.g. the crt0.s file that we set up in the "Development Setup" section), all of that ASM is "volatile" for these purposes. Volatile isn't actually a hardware concept, it's just an LLVM concept, and the linker script runs after LLVM has done its work.

Newtype

TODO: we've already used newtype twice by now (fixed point values and volatile addresses), so we need to adjust how we start this section.

There's a great Zero Cost abstraction that we'll be using a lot that you might not already be familiar with: we're talking about the "Newtype Pattern"!

Now, I told you to read the Rust Book before you read this book, and I'm sure you're all good students who wouldn't sneak into this book without doing the required reading, so I'm sure you all remember exactly what I'm talking about, because they touch on the newtype concept in the book twice, in two very long named sections:

...Yeah... The Rust Book doesn't know how to make a short sub-section name to save its life. Shame.

Newtype Basics

So, we have all these pieces of data, and we want to keep them separated, and we don't wanna pay the cost for it at runtime. Well, we're in luck, we can pay the cost at compile time.


# #![allow(unused_variables)]
#fn main() {
pub struct PixelColor(u16);
#}

TODO: we've already talked about repr(transparent) by now

Ah, except that, as I'm sure you remember from The Rustonomicon (and from the RFC too, of course), if we have a single field struct that's sometimes different from having just the bare value, so we should be using #[repr(transparent)] with our newtypes.


# #![allow(unused_variables)]
#fn main() {
#[repr(transparent)]
pub struct PixelColor(u16);
#}
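If you want to convince yourself that the wrapper costs nothing, a quick layout check does the job. This is just a sketch that can run on any machine, and the function name is hypothetical.

```rust
use core::mem::{align_of, size_of};

#[repr(transparent)]
pub struct PixelColor(u16);

/// With `repr(transparent)` the newtype has the size and alignment
/// of the single field it wraps.
pub fn same_layout_as_u16() -> bool {
  size_of::<PixelColor>() == size_of::<u16>()
    && align_of::<PixelColor>() == align_of::<u16>()
}
```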

And then we'll need to do that same thing for every other newtype we want.

Except there's only two tiny parts that actually differ between newtype declarations: the new name and the base type. All the rest is just the same rote code over and over. Generating piles and piles of boilerplate code? Sounds like a job for a macro to me!

Making It A Macro

If you're going to do much with macros you should definitely read through The Little Book of Rust Macros, but we won't be doing too much so you can just follow along here a bit if you like.

The most basic version of a newtype macro starts like this:


# #![allow(unused_variables)]
#fn main() {
#[macro_export]
macro_rules! newtype {
  ($new_name:ident, $old_name:ident) => {
    #[repr(transparent)]
    pub struct $new_name($old_name);
  };
}
#}

The #[macro_export] makes it exported by the current module (like pub kinda), and then we have one expansion option that takes an identifier, a comma, and then a second identifier. The new name is the outer type we'll be using, and the old name is the inner type that's being wrapped. You'd use our new macro something like this:


# #![allow(unused_variables)]
#fn main() {
newtype! {PixelColorCurly, u16}

newtype!(PixelColorParens, u16);

newtype![PixelColorBrackets, u16];
#}

Note that you can invoke the macro with the outermost grouping as any of (), [], or {}. It makes no particular difference to the macro. Also, that space in the first version is kinda to show off that you can put white space in between the macro name and the grouping if you want. The difference is mostly style, but there are some rules and considerations here:

  • If you use curly braces then you must not put a ; after the invocation.
  • If you use parentheses or brackets then you must put the ; at the end.
  • Rustfmt cares which you use and formats accordingly:
    • Curly brace macro use mostly gets treated like a code block.
    • Parentheses macro use mostly gets treated like a function call.
    • Bracket macro use mostly gets treated like an array declaration.

As a reminder: remember that macro_rules macros have to appear before they're invoked in your source, so the newtype macro will always have to be at the very top of your file, or if you put it in a module within your project you'll need to declare the module before anything that uses it.

Upgrade That Macro!

We also want to be able to add derive stuff and doc comments to our newtype. Within the context of macro_rules! definitions these are called "meta". Since we can have any number of them we wrap it all up in a "zero or more" matcher. Then our macro looks like this:


# #![allow(unused_variables)]
#fn main() {
#[macro_export]
macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $old_name:ident) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($old_name);
  };
}
#}

So now we can write


# #![allow(unused_variables)]
#fn main() {
newtype! {
  /// Color on the GBA gives 5 bits for each channel, the highest bit is ignored.
  #[derive(Debug, Clone, Copy)]
  PixelColor, u16
}
#}

Next, we can allow for the wrapping of types that aren't just a single identifier by changing $old_name from :ident to :ty. We can't also do this for the $new_name part because declaring a new struct expects a valid identifier that's not already declared (obviously), and :ty is intended for capturing types that already exist.


# #![allow(unused_variables)]
#fn main() {
#[macro_export]
macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $old_name:ty) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($old_name);
  };
}
#}

Next of course we'll want to usually have a new method that's const and just gives a 0 value. We won't always be making a newtype over a number value, but we often will. It's usually silly to have a new method with no arguments since we might as well just impl Default, but Default::default isn't const, so having pub const fn new() -> Self is justified here.

Here, the token 0 is given the {integer} type, which can be converted into any of the integer types as needed, but it still can't be converted into an array type or a pointer or things like that. Accordingly we've added the "no frills" option which declares the struct and no new method.


# #![allow(unused_variables)]
#fn main() {
#[macro_export]
macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $old_name:ty) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($old_name);
    impl $new_name {
      /// A `const` "zero value" constructor
      pub const fn new() -> Self {
        $new_name(0)
      }
    }
  };
  ($(#[$attr:meta])* $new_name:ident, $old_name:ty, no frills) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($old_name);
  };
}
#}

Finally, we usually want to have the wrapped value be totally private, but there are occasions where that's not the case. For this, we can allow the wrapped field to accept a visibility modifier.


# #![allow(unused_variables)]
#fn main() {
#[macro_export]
macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($v $old_name);
    impl $new_name {
      /// A `const` "zero value" constructor
      pub const fn new() -> Self {
        $new_name(0)
      }
    }
  };
  ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty, no frills) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($v $old_name);
  };
}
#}
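Here's a sketch of both arms of the finished macro in use. The macro body is repeated (minus #[macro_export]) so the snippet stands alone, and the Palette type is a made-up example of something that can't be built from a 0.

```rust
macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($v $old_name);
    impl $new_name {
      /// A `const` "zero value" constructor
      pub const fn new() -> Self {
        $new_name(0)
      }
    }
  };
  ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty, no frills) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($v $old_name);
  };
}

newtype! {
  /// Private inner value, plus the `new` method from the first arm.
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  PixelColor, u16
}

newtype! {
  /// Hypothetical wrapper over an array: public inner value, no `new`.
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  Palette, pub [u16; 4], no frills
}
```

The first invocation gets PixelColor::new() for free, while the second one skips it because an array can't be made from the literal 0.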

Constant Assertions

Have you ever wanted to assert things even before runtime? We all have, of course. Particularly when the runtime machine is a poor little GBA, we'd like to have the machine doing the compile handle as much checking as possible.

Enter the static assertions crate, which provides a way to let you assert on a const expression.

This is an amazing crate that you should definitely use when you can.

It's written by Nikolai Vazquez, and they kindly wrote up a blog post that explains the thinking behind it.

However, I promised that each example would be single file, and I also promised to explain what's going on as we go, so we'll briefly touch upon giving an explanation here.

How We Const Assert

Alright, as it stands (2018-12-15), we can't use if in a const context.

Since we can't use if, we can't use a normal assert!. Some day it will be possible, and a failed assert at compile time will be a compile error and a failed assert at run time will be a panic and we'll have a nice unified programming experience. Until then, we can add compile-time-only assertions by being a little tricky with the compiler.

If we write


# #![allow(unused_variables)]
#fn main() {
const ASSERT: usize = 0 - 1;
#}

that gives a warning, since the math would underflow. We can upgrade that warning to a hard error:


# #![allow(unused_variables)]
#fn main() {
#[deny(const_err)]
const ASSERT: usize = 0 - 1;
#}

And to make our construction reusable we can enable the underscore_const_names feature in our program (or library) and then give each such const an underscore for a name.


# #![allow(unused_variables)]
#![feature(underscore_const_names)]

#fn main() {
#[deny(const_err)]
const _: usize = 0 - 1;
#}

Now we wrap this in a macro where we give a bool expression as input. We negate the bool then cast it to a usize, meaning that true negates into false, which becomes 0usize, and then there's no underflow error. Or if the input was false, it negates into true, then becomes 1usize, and then the underflow error fires.
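The negate-and-cast step is easy to check at run time. In this sketch the function names are hypothetical, and checked_sub stands in for the compile-time underflow error so that we can observe the result instead of failing the build.

```rust
/// The value that gets subtracted from `0usize` for a given condition.
pub const fn underflow_operand(condition: bool) -> usize {
  // `!` binds tighter than `as`, so this is `(!condition) as usize`.
  !condition as usize
}

/// True when `0 - operand` would not underflow, meaning the assert passes.
pub fn assertion_passes(condition: bool) -> bool {
  0usize.checked_sub(underflow_operand(condition)).is_some()
}
```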


# #![allow(unused_variables)]
#fn main() {
macro_rules! const_assert {
  ($condition:expr) => {
    #[deny(const_err)]
    #[allow(dead_code)]
    const ASSERT: usize = 0 - !$condition as usize;
  }
}
#}

Technically, written like this, the expression can be anything with a core::ops::Not implementation that can also be cast into usize with as. That's bool, but also basically all the other number types. Since we want to ensure that we get proper looking type errors when things go wrong, we can use ($condition && true) to enforce that we get a bool (thanks to Talchas for that particular suggestion).


macro_rules! const_assert {
  ($condition:expr) => {
    #[deny(const_err)]
    #[allow(dead_code)]
    const _: usize = 0 - !($condition && true) as usize;
  }
}
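If the negate-then-cast arithmetic is hard to picture, here's a minimal run-time sketch of the same expression. The function name const_assert_model is made up for this illustration and isn't part of any crate. It uses checked_sub so that the underflow shows up as a None instead of a compile error.

```rust
/// Run-time model of the expression inside `const_assert!`.
/// (Hypothetical helper, only for illustration.)
/// Returns `None` in exactly the case where the const version underflows.
fn const_assert_model(condition: bool) -> Option<usize> {
    // Unary `!` binds tighter than `as`, so this is
    // `(!(condition && true)) as usize`: true -> 0, false -> 1.
    let penalty = !(condition && true) as usize;
    // `checked_sub` reports the underflow instead of panicking.
    0usize.checked_sub(penalty)
}
```

A passing condition subtracts 0 from 0 and nothing happens. A failing condition tries to compute 0 - 1 in a usize, which is the underflow that the real macro turns into a hard error.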

Asserting Something

As an example of how we might use a const_assert, we'll do a demo with colors. There's a red, blue, and green channel. We store colors in a u16 with 5 bits for each channel.


newtype! {
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  Color, u16
}

And when we're building a color, we're passing in u16 values, but they could be using more than just 5 bits of space. We want to make sure that each channel is 31 or less, so we can make a color builder that does a const_assert! on the value of each channel.


macro_rules! rgb {
  ($r:expr, $g:expr, $b:expr) => {
    {
      const_assert!($r <= 31);
      const_assert!($g <= 31);
      const_assert!($b <= 31);
      Color($b << 10 | $g << 5 | $r)
    }
  }
}

And then we can declare some colors


const RED: Color = rgb!(31, 0, 0);

const BLUE: Color = rgb!(31, 500, 0);

The second one is clearly out of bounds and it fires an error just like we wanted.
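For comparison, here's a rough run-time sketch of the same packing and range check, written as a plain function on stable Rust. The name rgb_checked is an assumption made for this example; it returns a bare u16 rather than the Color newtype so that it stands alone.

```rust
/// Packs three 5-bit channels into the u16 layout used by `rgb!`
/// (red in bits 0-4, green in bits 5-9, blue in bits 10-14).
/// Hypothetical helper: returns `None` if any channel is above 31.
fn rgb_checked(r: u16, g: u16, b: u16) -> Option<u16> {
    if r > 31 || g > 31 || b > 31 {
        return None;
    }
    // `<<` binds tighter than `|`, same as in the macro body.
    Some(b << 10 | g << 5 | r)
}
```

The difference from the macro is only in when the failure is reported: here the bad green value of 500 gives a None at run time, while the const_assert! version refuses to compile at all.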

Broad Concepts

The GameBoy Advance sits in a middle place between the chthonic game consoles of the ancient past and the "small PC in a funny case" consoles of the modern age.

On the one hand, yeah, you're gonna find a few strange conventions as you learn all the ropes.

On the other, at least we're writing in Rust at all, and not having to do all the assembly by hand.

This chapter for "concepts" has a section for each part of the GBA's hardware memory map, going by increasing order of base address value. The sections try to explain as much as possible while sticking to just the concerns you might have regarding that part of the memory map.

For an explanation of how to wrangle all three parts of the video system (PALRAM, VRAM, and OAM), along with the correct IO registers, into something that shows a picture, you'll want the Video chapter.

Similarly, the "IO Registers" part of the GBA actually controls how you interact with every single bit of hardware connected to the GBA. A full description of everything is obviously too much for just one section of the book. Instead you get an overview of general IO register rules and advice. Each particular register is described in the appropriate sections of either the Video or Non-Video chapters.

Bus Size

TODO: describe this

Minimum Write Size

TODO: talk about parts where you can't write one byte at a time

Volatile or Not?

TODO: discuss what memory should be used volatile style and what can be used normal style.

CPU

BIOS

  • Address Span: 0x0 to 0x3FFF (16k)

The BIOS of the GBA is a small read-only portion of memory at the very base of the address space. However, it is also hardware protected against reading, so if you try to read from BIOS memory when the program counter isn't pointed into the BIOS (eg: any time code you write is executing) then you get basically garbage data back.

So we're not going to spend time here talking about what bits to read or write within BIOS memory like we do with the other sections. Instead we're going to spend time talking about inline assembly (tracking issue) and then use it to call the GBA BIOS Functions.

Note that BIOS calls have more overhead than normal function calls, so don't go using them all over the place if you don't have to. They're also usually written to be compact in terms of code rather than for raw speed. Between the increased overhead and not being as speed optimized, you can sometimes do a faster job without calling the BIOS at all. (TODO: investigate more about what parts of the BIOS we could potentially offer faster alternatives for.)

I'd like to take a moment to thank Marc Brinkmann (with contributions from Oliver Schneider and Philipp Oppermann) for writing this blog post. It's at least ten times the tutorial quality of the asm entry in the Unstable Book. In fairness to the Unstable Book, the actual spec of how inline ASM works in rust is "basically what clang does", and that's specified as "basically what GCC does", and that's basically/shockingly not specified much at all despite GCC being like 30 years old.

So let's be slow and pedantic about this process.

Inline ASM

Fair Warning: Inline asm is one of the least stable parts of Rust overall, and if you write bad things you can trigger internal compiler errors and panics and crashes and make LLVM choke and die without explanation. If you write some inline asm and then your program suddenly stops compiling without explanation, try commenting out that whole inline asm use and see if it's causing the problem. Double check that you've written every single part of the asm call absolutely correctly, etc, etc.

Bonus Warning: The general information that follows regarding the asm macro is consistent from system to system, but specific information about register names, register quantities, asm instruction argument ordering, and so on is specific to ARM on the GBA. If you're programming for any other device you'll need to carefully investigate that before you begin.

Now then, with those out of the way, the inline asm docs describe an asm call as looking like this:


asm!(assembly template
   : output operands
   : input operands
   : clobbers
   : options
   );

And once you stick a lot of stuff in there it can absolutely be hard to remember the ordering of the elements. So we'll start with a code block that has some comments thrown in on each line:


asm!(/* ASM */ TODO
    :/* OUT */ TODO
    :/* INP */ TODO
    :/* CLO */ TODO
    :/* OPT */
);

Now we have to decide what we're gonna write. Obviously we're going to do some instructions, but those instructions use registers, and how are we gonna talk about them? We've got two choices.

  1. We can pick each and every register used by specifying exact register names. In THUMB mode we have 8 registers available, named r0 through r7. If you switch into 32-bit mode there's additional registers that are also available.

  2. We can specify slots for registers we need and let LLVM decide. In this style you name your slots $0, $1 and so on. Slot numbers are assigned first to all specified outputs, then to all specified inputs, in the order that you list them.

In the case of the GBA BIOS, each BIOS function has pre-designated input and output registers, so we will use the first style. If you use inline ASM in other parts of your code you're free to use the second style.

Assembly

This is just one big string literal. You write out one instruction per line, and excess whitespace is ignored. You can also do comments within your assembly using @ to start a comment that goes until the end of the line (in ARM assembly syntax a ; separates statements, it doesn't start a comment).

Assembly convention doesn't consider it unreasonable to comment potentially as much as every single line of asm that you write when you're getting used to things. Or even if you are used to things. This is cryptic stuff, there's a reason we avoid writing in it as much as possible.

Remember that our Rust code is in 16-bit mode. You can switch to 32-bit mode within your asm as long as you switch back by the time the block ends. Otherwise you'll have a bad time.

Outputs

A comma separated list. Each entry looks like

  • "constraint" (binding)

An output constraint starts with a symbol:

  • = for write only
  • + for reads and writes
  • & for "early clobber", meaning that you'll write to this at some point before all input values have been read. It prevents this register from being assigned to an input register.

Followed by either the letter r (if you want LLVM to pick the register to use) or curly braces around a specific register (if you want to pick).

  • The binding can be any single 32-bit or smaller value.
  • If your binding has bit pattern requirements ("must be non-zero", etc) you are responsible for upholding that.
  • If your binding type will try to Drop later then you are responsible for it being in a fit state to do that.
  • The binding must be either a mutable binding or a binding that was pre-declared but not yet assigned.

Anything else is UB.

Inputs

This is a similar comma separated list.

  • "constraint" (binding)

An input constraint doesn't have the symbol prefix, you just pick either r or a named register with curly braces around it.

  • An input binding must be a single 32-bit or smaller value.
  • An input binding should be a type that is Copy but this is not an absolute requirement. Having the input be read is semantically similar to using core::ptr::read(&binding) and forgetting the value when you're done.

Clobbers

Sometimes your asm will touch registers other than the ones declared for input and output.

Clobbers are declared as a comma separated list of string literals naming specific registers. You don't use curly braces with clobbers.

LLVM needs to know this information. It can move things around to keep your data safe, but only if you tell it what's about to happen.

Failure to define all of your clobbers can cause UB.

Options

There's only one option we'd care to specify. That option is "volatile".

Just like with a function call, LLVM will skip a block of asm if it doesn't see that any outputs from the asm were used later on. Nearly every single BIOS call (other than the math operations) will need to be marked as "volatile".

BIOS ASM

  • Inputs are always r0, r1, r2, and/or r3, depending on function.
  • Outputs are always zero or more of r0, r1, and r3.
  • Any of the output registers that aren't actually used should be marked as clobbered.
  • All other registers are unaffected.

All of the GBA BIOS calls are performed using the swi instruction, combined with a value depending on what BIOS function you're trying to invoke. If you're in 16-bit code you use the value directly, and if you're in 32-bit mode you shift the value up by 16 bits first.
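As a tiny sketch of that rule, the helper below computes the immediate value that goes after swi. The name swi_immediate and the arm_mode flag are assumptions made for this example, not something from the gba crate.

```rust
/// Computes the immediate operand for a `swi` instruction.
/// Hypothetical helper: `arm_mode` is true for 32-bit (ARM) code
/// and false for 16-bit (THUMB) code.
fn swi_immediate(bios_function: u8, arm_mode: bool) -> u32 {
    if arm_mode {
        // 32-bit code: the function number is shifted up by 16 bits.
        u32::from(bios_function) << 16
    } else {
        // 16-bit code: the function number is used directly.
        u32::from(bios_function)
    }
}
```

So the division function, number 0x06, is written as swi 0x06 in THUMB code and as swi 0x060000 in ARM code.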

Example BIOS Function: Division

For our example we'll use the division function, because GBATEK gives very clear instructions on how each register is used with that one:

Signed Division, r0/r1.
  r0  signed 32bit Number
  r1  signed 32bit Denom
Return:
  r0  Number DIV Denom ;signed
  r1  Number MOD Denom ;signed
  r3  ABS (Number DIV Denom) ;unsigned
For example, incoming -1234, 10 should return -123, -4, +123.
The function usually gets caught in an endless loop upon division by zero.

The math folks tell me that the r1 value should be properly called the "remainder" not the "modulus". We'll go with that for our function, doesn't hurt to use the correct names. Our Rust function has an assert against dividing by 0, then we name some bindings without giving them a value, we make the asm call, and then return what we got.


pub fn div_rem(numerator: i32, denominator: i32) -> (i32, i32) {
  assert!(denominator != 0);
  let div_out: i32;
  let rem_out: i32;
  unsafe {
    asm!(/* ASM */ "swi 0x06"
        :/* OUT */ "={r0}"(div_out), "={r1}"(rem_out)
        :/* INP */ "{r0}"(numerator), "{r1}"(denominator)
        :/* CLO */ "r3"
        :/* OPT */
    );
  }
  (div_out, rem_out)
}

I hope this all makes sense by now.
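If you want something to compare against while testing on a desktop, here's a sketch that models the three return values from the GBATEK description using ordinary Rust arithmetic. The name div_rem_abs_model is made up for this example, and it's only a model of the documented contract, not of the BIOS code itself. In particular it doesn't try to model what the BIOS does for i32::MIN divided by -1 (plain Rust division panics there).

```rust
/// Desktop model of the BIOS division contract described above.
/// Hypothetical helper: returns (r0, r1, r3) as
/// (quotient, remainder, absolute value of the quotient).
fn div_rem_abs_model(numerator: i32, denominator: i32) -> (i32, i32, u32) {
    // The real BIOS call tends to loop forever here, so we refuse up front.
    assert!(denominator != 0);
    // Rust's `/` and `%` truncate toward zero, which matches the
    // "-1234, 10 should return -123, -4" example.
    let div = numerator / denominator;
    let rem = numerator % denominator;
    (div, rem, div.unsigned_abs())
}
```

Feeding in the GBATEK example of -1234 and 10 gives back -123, -4, and 123, the same as the table says.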

Specific BIOS Functions

For a full list of all the specific BIOS functions and their use you should check the gba::bios module within the gba crate. There's just so many of them that enumerating them all here wouldn't serve much purpose.

Which is not to say that we'll never cover any BIOS functions in this book! Instead, we'll simply mention them whenever they're relevant to the task at hand (such as controlling sound or waiting for vblank).

//TODO: list/name all BIOS functions as well as what they relate to elsewhere.

Work RAM

External Work RAM (EWRAM)

  • Address Span: 0x2000000 to 0x203FFFF (256k)

This is a big pile of space, the use of which is up to each game. However, the external work ram has only a 16-bit bus (if you read/write a 32-bit value it silently breaks it up into two 16-bit operations) and also 2 wait cycles (extra CPU cycles that you have to expend per 16-bit bus use).

It's most helpful to think of EWRAM as slower, distant memory, similar to the "heap" in a normal application. You can take the time to go store something within EWRAM, or to load it out of EWRAM, but if you've got several operations to do in a row and you're worried about time you should pull that value into local memory, work on your local copy, and then push it back out to EWRAM.

Internal Work RAM (IWRAM)

  • Address Span: 0x3000000 to 0x3007FFF (32k)

This is a smaller pile of space, but it has a 32-bit bus and no wait.
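To put rough numbers on the EWRAM versus IWRAM difference, here's a deliberately simplified cost model: one cycle per bus use, plus the wait cycles, times however many bus uses the access gets split into. The function name access_cycles is an assumption for this sketch, and the model ignores everything else that affects real timings.

```rust
/// Simplified cycle estimate for a single memory access.
/// Hypothetical helper: `access_bits` is the size of the value,
/// `bus_bits` is the bus width, `wait_cycles` is the per-use wait.
fn access_cycles(access_bits: u32, bus_bits: u32, wait_cycles: u32) -> u32 {
    // A value wider than the bus is split into several bus uses (round up).
    let bus_uses = (access_bits + bus_bits - 1) / bus_bits;
    // Each bus use costs one cycle plus the wait cycles.
    bus_uses * (1 + wait_cycles)
}
```

Under this model a 32-bit access to EWRAM (16-bit bus, 2 waits) costs 6 cycles, while the same access to IWRAM (32-bit bus, no wait) costs 1.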

By default, 0x3007F00 to 0x3007FFF is reserved for interrupt and BIOS use. The rest of it is mostly up to you. The user's stack space starts at 0x3007F00 and proceeds down from there. For best results you should probably start at 0x3000000 and then go upwards. Under normal use it's unlikely that the two memory regions will crash into each other.

IO Registers

  • Address Span: 0x400_0000 to 0x400_03FE

Palette RAM (PALRAM)

  • Address Span: 0x500_0000 to 0x500_03FF (1k)

Palette RAM has a 16-bit bus, which isn't really a problem because it conceptually just holds u16 values. There's no automatic wait state, but if you try to access the same location that the display controller is accessing you get bumped by 1 cycle. Since the display controller can use the palette ram any number of times per scanline it's basically impossible to predict if you'll have to do a wait or not during VDraw. During VBlank you won't have any wait of course.

PALRAM is among the memory where there's weirdness if you try to write just one byte: if you try to write just 1 byte, it writes that byte into both parts of the larger 16-bit location. This doesn't really affect us much with PALRAM, because palette values are all supposed to be u16 anyway.

The palette memory actually contains not one, but two sets of palettes. First there's 256 entries for the background palette data (starting at 0x500_0000), and then there's 256 entries for object palette data (starting at 0x500_0200).

The GBA also has two modes for palette access: 8-bits-per-pixel (8bpp) and 4-bits-per-pixel (4bpp).

  • In 8bpp mode an 8-bit palette index value within a background or sprite simply indexes directly into the 256 slots for that type of thing.
  • In 4bpp mode a 4-bit palette index value within a background or sprite specifies an index within a particular "palbank" (16 palette entries each), and then a separate setting outside of the graphical data determines which palbank is to be used for that background or object (the screen entry data for backgrounds, and the object attributes for objects).
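Here's a small sketch of how the two modes turn an index into a palette slot and then into an address. All of the names (BG_PALETTE_BASE, OBJ_PALETTE_BASE, slot_8bpp, slot_4bpp, slot_address) are made up for this example; only the base addresses and the 16-entries-per-palbank rule come from the text above.

```rust
// Base addresses of the two 256-entry palettes.
const BG_PALETTE_BASE: usize = 0x500_0000;
const OBJ_PALETTE_BASE: usize = 0x500_0200;

/// 8bpp: the index is the slot.
fn slot_8bpp(index: u8) -> usize {
    usize::from(index)
}

/// 4bpp: the palbank picks a group of 16 slots, the index picks within it.
fn slot_4bpp(palbank: u8, index: u8) -> usize {
    assert!(palbank < 16 && index < 16);
    usize::from(palbank) * 16 + usize::from(index)
}

/// Each slot holds one u16 color, so slots are 2 bytes apart.
fn slot_address(base: usize, slot: usize) -> usize {
    assert!(slot < 256);
    base + slot * 2
}
```

Note that the same slot can be reached both ways: 4bpp palbank 2, index 3 and 8bpp index 35 both land on slot 35.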

Transparency

When a pixel within a background or object specifies index 0 as its palette entry it is treated as a transparent pixel. This means that in 8bpp mode there's only 255 actual color options (0 being transparent), and in 4bpp mode there's only 15 actual color options available within each palbank (the 0th entry of each palbank is transparent).

Individual backgrounds, and individual objects, each determine if they're 4bpp or 8bpp separately, so a given overall palette slot might map to a used color in 8bpp and an unused/transparent color in 4bpp. If you're a palette wizard.

Palette slot 0 of the overall background palette is used to determine the "backdrop" color. That's the color you see if no background or object ends up being rendered within a given pixel.

Since display mode 3 and display mode 5 don't use the palette, they cannot benefit from transparency.

Video RAM (VRAM)

  • Address Span: 0x600_0000 to 0x601_7FFF (96k)

We've used this before! VRAM has a 16-bit bus and no wait. However, the same as with PALRAM, the "you might have to wait if the display controller is looking at it" rule applies here.

Unfortunately there's not much more exact detail that can be given about VRAM. The use of the memory depends on the video mode that you're using.

One general detail of note is that you can't write individual bytes to any part of VRAM. Depending on mode and location, you'll either get your bytes doubled into both the upper and lower parts of the 16-bit location targeted, or you won't even affect the memory. This usually isn't a big deal, except in two situations:

  • In Mode 4, if you want to change just 1 pixel, you'll have to be very careful to read the old u16, overwrite just the byte you wanted to change, and then write that back.
  • In any display mode, avoid using memcopy to place things into VRAM. It's written to be byte oriented, and only does 32-bit transfers under select conditions. The rest of the time it'll copy one byte at a time and you'll get either garbage or nothing at all.
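The read-modify-write dance for Mode 4 can be sketched like this, using a plain slice of u16 as a stand-in for the frame buffer. The function names are assumptions for this example, and on real hardware the read and the write would be volatile accesses to VRAM rather than normal slice indexing. It also assumes the usual little-endian layout, where the even pixel of each pair lives in the low byte.

```rust
/// What a one-byte write turns into when the byte gets doubled.
fn doubled_byte_write(byte: u8) -> u16 {
    u16::from(byte) * 0x0101
}

/// Replaces just one byte of a u16 and leaves the other byte alone.
fn replace_byte(old: u16, byte: u8, high_half: bool) -> u16 {
    if high_half {
        (old & 0x00FF) | (u16::from(byte) << 8)
    } else {
        (old & 0xFF00) | u16::from(byte)
    }
}

/// Sets one 8-bit pixel in a buffer of u16 values (two pixels per u16).
fn set_mode4_pixel(buffer: &mut [u16], pixel_index: usize, palette_index: u8) {
    let word = pixel_index / 2;
    let high_half = pixel_index % 2 == 1;
    // Read the old u16, patch one byte, write the whole u16 back.
    buffer[word] = replace_byte(buffer[word], palette_index, high_half);
}
```

The doubled_byte_write function is there to show the problem: a lone byte write of 0xAB would come out as 0xABAB and stomp on the neighboring pixel.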

Object Attribute Memory (OAM)

  • Address Span: 0x700_0000 to 0x700_03FF (1k)

The Object Attribute Memory has a 32-bit bus and no default wait, but suffers from the "you might have to wait if the display controller is looking at it" rule. You cannot write individual bytes to OAM at all, but that's not really a problem because all the fields of the data types within OAM are either i16 or u16 anyway.

Object attribute memory is the wildest yet: it conceptually contains two types of things, but they're interlaced with each other all the way through.

Now, GBATEK and CowByte don't quite give names to the two data types here. TONC calls them OBJ_ATTR and OBJ_AFFINE, but we'll be giving them names fitting with the Rust naming convention. Just know that if you try to talk about it with others they might not be using the same names. In Rust terms their layout would look like this:


#[repr(C)]
pub struct ObjectAttributes {
  attr0: u16,
  attr1: u16,
  attr2: u16,
  filler: i16,
}

#[repr(C)]
pub struct AffineMatrix {
  filler0: [u16; 3],
  pa: i16,
  filler1: [u16; 3],
  pb: i16,
  filler2: [u16; 3],
  pc: i16,
  filler3: [u16; 3],
  pd: i16,
}

(Note: the #[repr(C)] part just means that Rust must lay out the data exactly in the order we specify, which otherwise it is not required to do).

So, we've got 1024 bytes in OAM and each ObjectAttributes value is 8 bytes, so naturally we can support up to 128 objects.

At the same time, we've got 1024 bytes in OAM and each AffineMatrix is 32 bytes, so we can have 32 of them.

But, as I said, these things are all interlaced with each other. See how there's "filler" fields in each struct? If we imagine the OAM as being just an array of one type or the other, indexes 0/1/2/3 of the ObjectAttributes array would line up with index 0 of the AffineMatrix array. It's kinda weird, but that's just how it works. When we setup functions to read and write these values we'll have to be careful with how we do it. We probably won't want to use those representations above, at least not with the AffineMatrix type, because they're quite wasteful if you want to store just object attributes or just affine matrices.
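To make the interlacing concrete, here's a sketch that repeats the two structs and computes where things land. The helper names (OAM_BASE as a constant, object_attributes_address, affine_param_address) are assumptions for this example. The affine parameters sit in the last 2 bytes of each 8-byte slot, which is exactly where the ObjectAttributes filler field is.

```rust
#[repr(C)]
pub struct ObjectAttributes {
    attr0: u16,
    attr1: u16,
    attr2: u16,
    filler: i16,
}

#[repr(C)]
pub struct AffineMatrix {
    filler0: [u16; 3],
    pa: i16,
    filler1: [u16; 3],
    pb: i16,
    filler2: [u16; 3],
    pc: i16,
    filler3: [u16; 3],
    pd: i16,
}

const OAM_BASE: usize = 0x700_0000;

/// Address of the `index`th ObjectAttributes entry (0 through 127).
fn object_attributes_address(index: usize) -> usize {
    assert!(index < 128);
    OAM_BASE + index * core::mem::size_of::<ObjectAttributes>()
}

/// Address of one parameter of the `matrix`th AffineMatrix (0 through 31).
/// `param` is 0 for pa, 1 for pb, 2 for pc, 3 for pd.
fn affine_param_address(matrix: usize, param: usize) -> usize {
    assert!(matrix < 32 && param < 4);
    // Skip 3 u16 values (6 bytes) of each 8-byte slot to reach the parameter.
    OAM_BASE + matrix * core::mem::size_of::<AffineMatrix>() + param * 8 + 6
}
```

For example, pd of matrix 0 is at 0x700_001E, which is the filler field of object 3 (object 3 starts at 0x700_0018).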

Game Pak ROM / Flash ROM (ROM)

  • Address Span (Wait State 0): 0x800_0000 to 0x9FF_FFFF
  • Address Span (Wait State 1): 0xA00_0000 to 0xBFF_FFFF
  • Address Span (Wait State 2): 0xC00_0000 to 0xDFF_FFFF

The game's ROM data is a single set of data that's up to 32 megabytes in size. However, that data is mirrored to three different locations in the address space. Depending on which part of the address space you use, it can affect the memory timings involved.
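A quick sketch of the mirroring: given an address, work out which wait state region it's in and which byte of the ROM it refers to. The function name rom_wait_state is made up for this example.

```rust
/// Splits a ROM address into (wait state region, offset into the ROM data).
/// Hypothetical helper: returns `None` for addresses outside the ROM mirrors.
fn rom_wait_state(address: usize) -> Option<(u8, usize)> {
    match address {
        0x800_0000..=0x9FF_FFFF => Some((0, address - 0x800_0000)),
        0xA00_0000..=0xBFF_FFFF => Some((1, address - 0xA00_0000)),
        0xC00_0000..=0xDFF_FFFF => Some((2, address - 0xC00_0000)),
        _ => None,
    }
}
```

The same offset shows up once in each region, so 0x800_0100, 0xA00_0100, and 0xC00_0100 all name the same byte of ROM and differ only in timing.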

TODO: describe WAITCNT here, we won't get a better chance at it.

TODO: discuss THUMB vs ARM code and why THUMB is so much faster (because ROM is a 16-bit bus)

Save RAM (SRAM)

  • Address Span: 0xE00_0000 to 0xE00_FFFF (64k)

The actual amount of SRAM available depends on your game pak, and the 64k figure is simply the maximum possible. A particular game pak might have less, and an emulator will likely let you have all 64k if you want.

As with other portions of the address space, SRAM has some number of wait cycles per use. As with ROM, you can change the wait cycle settings via the WAITCNT register if the defaults don't work well for your game pak. See the ROM section for full details of how the WAITCNT register works.

The game pak SRAM also has only an 8-bit bus, so have fun with that.

The GBA Direct Memory Access (DMA) unit cannot access SRAM.

Also, you should not write to SRAM with code executing from ROM. Instead, you should move the code to WRAM and execute the save code from there. We'll cover how to handle that eventually.

Video

GBA Video starts with an IO register called the "Display Control Register", and then spirals out from there. You generally have to use Palette RAM (PALRAM), Video RAM (VRAM), Object Attribute Memory (OAM), as well as any number of other IO registers.

They all have to work together just right, and there's a lot going on when you first try doing it, so try to take it very slowly as you're learning each step.

RGB15 Color

TODO

Non-Video

Besides video effects the GBA still has an okay amount of stuff going on.

Obviously you'll want to know how to read the user's button inputs. That can almost go without saying, except that I said it.

Each other part can be handled in about any order you like.

Using interrupts is perhaps one of the hardest things for us as Rust programmers due to quirks in our compilation process. Our code all gets compiled to 16-bit THUMB instructions, and we don't have a way to mark a function to be compiled using 32-bit ASM instructions instead. However, an interrupt handler must be written in 32-bit ASM instructions for it to work. That means that we have to write our interrupt handler in 32-bit ASM by hand. We'll do it, but I don't think we'll be too happy about it.

The Link Cable related stuff is also probably a little harder to test than anything else, just because link cable emulation isn't always the best, and/or you need two GBAs with two flash carts and the cable for hardware testing. Still, we'll try to go over it eventually.

Buttons

It's all well and good to just show a picture, even to show an animation, but if we want a game we have to let the user interact with something.

Key Input Register

  • KEYINPUT, 0x400_0130, u16, read only

This little u16 stores the status of all the buttons on the GBA, all at once. There's only 10 of them, and we have 16 bits to work with, so that sounds easy. However, there's a bit of a catch. The register follows a "low-active" convention, where pressing a button clears that bit until it's released.


const NO_BUTTONS_PRESSED: u16 = 0b0000_0011_1111_1111;

The buttons are, going up in order from the 0th bit:

  • A
  • B
  • Select
  • Start
  • Right
  • Left
  • Up
  • Down
  • R
  • L

Bits above that are not used. However, since the left and right directions, as well as the up and down directions, can never be pressed at the same time, the KEYINPUT register should never read as zero. Of course, the register might read as zero if someone is using an emulator that allows for such inputs, so I wouldn't go so far as to make it be NonZeroU16 or anything like that.

When programming, we usually are thinking of what buttons we want to have be pressed instead of buttons we want to have not be pressed. This means that we need an inversion to happen somewhere along the line. The easiest moment of inversion is immediately as you read in from the register and wrap the value up in a newtype.


pub fn read_key_input() -> KeyInput {
  KeyInput(KEYINPUT.read() ^ 0b0000_0011_1111_1111)
}

Now the KeyInput you get can be checked for what buttons are pressed by checking for a set bit like you'd do anywhere else.


impl KeyInput {
  pub fn a_pressed(self) -> bool {
    (self.0 & A_BIT) > 0
  }
}
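Since the snippets above lean on a KEYINPUT register that only exists on the GBA, here's a self-contained sketch you can poke at on a desktop. The from_raw constructor, the start_pressed method, and the bit constants are assumptions made for this example. The bit positions follow the button list above (A is bit 0, Start is bit 3), and from_raw takes the raw low-active value as an argument instead of reading hardware.

```rust
const A_BIT: u16 = 1 << 0;
const START_BIT: u16 = 1 << 3;
const ALL_KEYS: u16 = 0b0000_0011_1111_1111;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KeyInput(u16);

impl KeyInput {
    /// Wraps a raw low-active KEYINPUT value, flipping it to high-active.
    pub fn from_raw(raw: u16) -> Self {
        KeyInput(raw ^ ALL_KEYS)
    }

    pub fn a_pressed(self) -> bool {
        (self.0 & A_BIT) != 0
    }

    pub fn start_pressed(self) -> bool {
        (self.0 & START_BIT) != 0
    }
}
```

A raw value of 0b0000_0011_1111_1110 has bit 0 cleared, so after the flip a_pressed is true and everything else is false.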

Note that the current KEYINPUT value changes in real time as the user presses or releases the buttons. To account for this, it's best to read the value just once per game frame and then use that single value as if it was the input across the whole frame. If you've worked with polling input before that should sound totally normal. If not, just remember to call read_key_input once per frame and then use that KeyInput value across the whole frame.

Detecting New Presses

The keypad only tells you what's currently pressed, but if you want to check what's newly pressed it's not too much harder.

All that you do is store the last frame's keys and compare them to the current keys with an XOR. In the gba crate it's called KeyInput::difference. Once you've got the difference between last frame and this frame, you know what changes happened.

  • If something is in the difference and not pressed in the last frame, that means it was newly pressed.
  • If something is in the difference and pressed in the last frame that means it was newly released.
  • If something is not in the difference then there's no change between last frame and this frame.
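The three rules above can be sketched with plain u16 bit masks, assuming the values have already been flipped to the high-active form (a set bit means pressed). The function names are made up for this example and aren't the gba crate's API.

```rust
/// Bits that changed between last frame and this frame.
fn difference(last: u16, now: u16) -> u16 {
    last ^ now
}

/// In the difference and NOT pressed last frame: newly pressed.
fn newly_pressed(last: u16, now: u16) -> u16 {
    difference(last, now) & !last
}

/// In the difference and pressed last frame: newly released.
fn newly_released(last: u16, now: u16) -> u16 {
    difference(last, now) & last
}
```

Say last frame had A and B held (0b0011) and this frame has B and Select held (0b0110). The difference is 0b0101, Select (0b0100) is newly pressed, A (0b0001) is newly released, and B isn't in the difference at all.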

Key Interrupt Control

  • KEYCNT, 0x400_0132, u16, read/write

This lets you control what keys will trigger a keypad interrupt. Of course, for the actual interrupt to fire you also need to set the IME and IE registers properly. See the Interrupts section for details there.

The main thing to know about this register is that the keys are in the exact same order as the key input order. However, with this register they use a high-active convention instead (eg: the bit is active when the button should be pressed as part of the interrupt).

In addition to simply having the bits for the buttons, bit 14 is a flag for enabling keypad interrupts (in addition to the flag in the IE register), and bit 15 decides how having more than one button works. If bit 15 is disabled, it's an OR combination (eg: "press any key to continue"). If bit 15 is enabled it's an AND combination (eg: "press A+B+Start+Select to reset").
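Putting a KEYCNT value together could look something like the sketch below. The names keycnt_value, KEY_IRQ_ENABLE, and KEY_IRQ_AND are assumptions for this example. It only builds the u16; actually writing it to the register (and setting up IME and IE) is a separate step.

```rust
const KEY_IRQ_ENABLE: u16 = 1 << 14;
const KEY_IRQ_AND: u16 = 1 << 15;

/// Builds a KEYCNT value with keypad interrupts enabled.
/// `keys` uses the same bit order as KEYINPUT, but high-active.
/// `require_all` picks the AND combination instead of the OR combination.
fn keycnt_value(keys: u16, require_all: bool) -> u16 {
    // Only the low 10 bits are buttons.
    let mut value = (keys & 0x03FF) | KEY_IRQ_ENABLE;
    if require_all {
        value |= KEY_IRQ_AND;
    }
    value
}
```

With A, B, Select, and Start as the low four bits, the "press A+B+Start+Select to reset" style would be keycnt_value(0b1111, true), and "press any key" would be keycnt_value(0x03FF, false).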

Timers

Direct Memory Access

The GBA has four Direct Memory Access (DMA) units that can be utilized. They're mostly the same in terms of overall operation, but each unit has special rules that make it better suited to a particular task.

Please Note: TONC and GBATEK have slightly different concepts of how a DMA unit's registers should be viewed. I've chosen to go by what GBATEK uses.

General DMA

A single DMA unit is controlled through four different IO Registers.

  • Source: (DMAxSAD, write only) A *const pointer that the DMA reads from.
  • Destination: (DMAxDAD, write only) A *mut pointer that the DMA writes to.
  • Count: (DMAxCNT_L, write only) How many transfers to perform.
  • Control: (DMAxCNT_H, read/write) A register full of bit-flags that controls all sorts of details.

Here, the x is replaced with 0 through 3 when utilizing whichever particular DMA unit.

Source Address

This is either a u32 or u16 address depending on the unit's assigned transfer mode (see Control). The address MUST be aligned.

With DMA0 the source must be internal memory. With other DMA units the source can be any non-SRAM location.

Destination Address

As with the Source, this is either a u32 or u16 address depending on the unit's assigned transfer mode (see Control). The address MUST be aligned.

With DMA0/1/2 the destination must be internal memory. With DMA3 the destination can be any non-SRAM memory (allowing writes into Game Pak ROM / FlashROM, assuming that your Game Pak hardware supports that).

Count

This is a u16 that says how many transfers (u16 or u32) to make.

DMA0/1/2 will only actually accept a 14-bit value, while DMA3 will accept a full 16-bit value. A value of 0 instead acts as if you'd used the maximum value for the DMA in question. Put another way, DMA0/1/2 transfer 1 through 0x4000 words, with 0 as the 0x4000 value, and DMA3 transfers 1 through 0x1_0000 words, with 0 as the 0x1_0000 value.

The maximum value isn't a very harsh limit. Even in just u16 mode, 0x4000 transfers is 32k, which would for example be all 32k of IWRAM (including your own user stack). If you for some reason do need to transfer more than a single DMA use can move around at once then you can just setup the DMA a second time and keep going.
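The "0 means maximum" rule can be sketched as a small function. The name effective_transfer_count is made up for this example, and masking off the upper bits for DMA0/1/2 is my assumption about what "will only actually accept a 14-bit value" means in practice.

```rust
/// How many transfers a DMA unit performs for a given raw Count value.
/// Hypothetical helper: `unit` is 0 through 3.
fn effective_transfer_count(unit: u8, raw: u16) -> u32 {
    assert!(unit < 4);
    // DMA3 takes all 16 bits, the others only 14 (assumed to be masked).
    let (mask, max): (u32, u32) = if unit == 3 {
        (0xFFFF, 0x1_0000)
    } else {
        (0x3FFF, 0x4000)
    };
    let count = u32::from(raw) & mask;
    // A count of 0 acts as the maximum for that unit.
    if count == 0 { max } else { count }
}
```

Multiply the result by 2 or 4 (for u16 or u32 transfers) to get bytes: DMA0 with a count of 0 in u16 mode moves 0x4000 * 2 bytes, which is the 32k mentioned above.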

Control

This u16 bit-flag field is where things get wild.

  • Bits 0-4 do nothing
  • Bit 5-6 control how the destination address changes per transfer:
    • 0: Offset +1
    • 1: Offset -1
    • 2: No Change
    • 3: Offset +1 and reload when a Repeat starts (below)
  • Bit 7-8 similarly control how the source address changes per transfer:
    • 0: Offset +1
    • 1: Offset -1
    • 2: No Change
    • 3: Prohibited
  • Bit 9: enables Repeat mode.
  • Bit 10: Transfer u16 (false) or u32 (true) data.
  • Bit 11: "Game Pak DRQ" flag. GBATEK says that this is only allowed for DMA3, and also your Game Pak hardware must be equipped to use DRQ mode. I don't even know what DRQ mode is all about, and GBATEK doesn't say much either. If DRQ is set then you must not set the Repeat bit as well. The gba crate simply doesn't bother to expose this flag to users.
  • Bit 12-13: DMA Start:
    • 0: "Immediate", which is 2 cycles after requested.
    • 1: VBlank
    • 2: HBlank
    • 3: Special, depending on what DMA unit is involved:
      • DMA0: Prohibited.
      • DMA1/2: Sound FIFO (see the Sound section)
      • DMA3: Video Capture, intended for use with the Repeat flag, performs a transfer per scanline (similar to HBlank) starting at VCOUNT 2 and stopping at VCOUNT 162. Intended for copying things from ROM or camera into VRAM.
  • Bit 14: Interrupt upon DMA complete.
  • Bit 15: Enable this DMA unit.
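Here's one way the bit layout above could be packed into a u16. The enum and function names are assumptions made for this sketch and aren't the gba crate's API. Like the gba crate, it leaves out the Game Pak DRQ flag (bit 11).

```rust
#[derive(Clone, Copy)]
enum AddressControl {
    Increment = 0,
    Decrement = 1,
    Fixed = 2,
    /// Only valid for the destination.
    IncrementReload = 3,
}

#[derive(Clone, Copy)]
enum StartTiming {
    Immediate = 0,
    VBlank = 1,
    HBlank = 2,
    Special = 3,
}

/// Packs the control flags into the DMAxCNT_H layout.
fn dma_control(
    dest: AddressControl,
    src: AddressControl,
    repeat: bool,
    use_u32: bool,
    start: StartTiming,
    irq: bool,
    enable: bool,
) -> u16 {
    // Value 3 is prohibited for the source address control.
    assert!(!matches!(src, AddressControl::IncrementReload));
    (dest as u16) << 5
        | (src as u16) << 7
        | (repeat as u16) << 9
        | (use_u32 as u16) << 10
        | (start as u16) << 12
        | (irq as u16) << 14
        | (enable as u16) << 15
}
```

A plain "copy u32 values right now" setup only needs bit 10 and bit 15 set. Something like an HBlank effect with Repeat on would use a Fixed destination, HBlank timing, and the repeat flag.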

DMA Life Cycle

The general technique for using a DMA unit involves first setting the relevant source, destination, and count registers, then setting the appropriate control register value with the Enable bit set.

Once the Enable flag is set the appropriate DMA unit will trigger at the assigned time (Bit 12-13). The CPU's operation is halted while any DMA unit is active, until the DMA completes its task. If more than one DMA unit is supposed to be active at once, then the DMA unit with the lower number will activate and complete before any others.

When the DMA triggers via Enable, the Source, Destination, and Count values are copied from the GBA's registers into the DMA unit's internal registers. Changes to the DMA unit's internal copy of the data don't affect the values in the GBA registers. Another Enable will read the same values as before.

If DMA is triggered via having Repeat active then only the Count is copied in to the DMA unit registers. The Source and Destination are unaffected during a Repeat. The exception to this is if the destination address control value (Bits 5-6) is set to 3 (0b11), in which case a Repeat will re-copy the Destination as well as the Count.

Once a DMA operation completes, the Enable flag of its Control register will automatically be disabled, unless the Repeat flag is on, in which case the Enable flag is left active. You will have to manually disable it if you don't want the DMA to kick in again over and over at the specified starting time.

DMA Limitations

The DMA units cannot access SRAM at all.

If you're using HBlank to access any part of the memory that the display controller utilizes (OAM, PALRAM, VRAM), you need to have enabled the "HBlank Interval Free" bit in the Display Control Register (DISPCNT).

Whenever DMA is active the CPU is not active, which means that Interrupts will not fire while DMA is happening. This can cause any number of hard to track down bugs. Try to limit your use of the DMA units if you can.

Sound

Interrupts

Link Cable

Game Pak

Examples