No more old book stuff (#123)

* stop with the book, we should focus on the crate.

* Update README.md

* Update README.md
Lokathor 2021-04-05 18:11:42 -06:00 committed by GitHub
parent 99f80d2b9a
commit 8efef6ebc5
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
54 changed files with 32 additions and 6528 deletions

@@ -1,6 +1,6 @@
 [package]
 name = "gba"
-description = "A crate (and book) for making GBA games with Rust."
+description = "A crate for making GBA games with Rust."
 version = "0.4.0-pre1"
 authors = ["Lokathor <zefria@gmail.com>", "Thomas Winwood <twwinwood@gmail.com>"]
 repository = "https://github.com/rust-console/gba"

@@ -11,43 +11,45 @@
 # gba
-_Eventually_ there will be a full [Tutorial
-Book](https://rust-console.github.io/gba/) that goes along with this crate.
-However, currently the development focus is leaning towards having minimal
-coverage of all the parts of the GBA. Until that's done, unfortunately the book
-will be in a rather messy state.
-## What's Missing
-The following major GBA features are still missing from the crate:
-* Affine Graphics
-* Interrupt Handling
-* Serial Communication
-## Build Dependencies
-Install required cargo packages
+A crate to make GBA programming easy.
+Currently we don't have as much documentation as we'd like.
+If you check out the [awesome-gbadev](https://github.com/gbdev/awesome-gbadev) repository they have many resources, though most are oriented towards C.
+## First Time Setup
+Building for the GBA requires Nightly rust, and also uses the `build-std` feature, so you'll need the rust source available.
 ```sh
 rustup install nightly
 rustup +nightly component add rust-src
+```
+You'll also need the ARM binutils so that you can have the assembler and linker for the ARMv4T architecture.
+The way to get them varies by platform:
+* Ubuntu and other debian-like linux distros will usually have them in the package manager.
+```shell
+sudo apt-get install binutils-arm-none-eabi
+```
+* With OSX you can get them via homebrew.
+```shell
+brew install --cask gcc-arm-embedded
+```
+* On Windows you can get the installer from ARM's website and run that.
+* Download the [GNU Arm Embedded Toolchain](https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads)
+* When installing the toolchain, make sure to select "Add path to environment variable" during install.
+* You'll have to restart any open command prompts after you run the installer so that they see the new PATH value.
+Finally, rustc itself is only able to make ELF format files. These can be run in emulators, but aren't able to be played on actual hardware.
+You'll need to convert the ELF file into a GBA rom. There's a `cargo-make` file in this repository to do this, and it relies on a tool called `gbafix`
+to assign the right header data to the ROM when packing it.
+```sh
 cargo install cargo-make
 cargo install gbafix
 ```
-Install arm build tools
-* Ubuntu
-```shell
-sudo apt-get install binutils-arm-none-eabi
-```
-* OSX
-```shell
-brew install --cask gcc-arm-embedded
-```
-* Windows
-* Download the [GNU Arm Embedded Toolchain](https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads)
-* Install the toolchain, make sure to select "Add path to environment variable" during install
+<!--
 ## First Time Setup
 Writing a Rust program for the GBA requires a fair amount of special setup. All
@@ -61,8 +63,9 @@ project started quickly we got you covered:
 ```sh
 curl https://raw.githubusercontent.com/rust-console/gba/master/init.sh -sSf | bash -s APP_NAME
 ```
+-->
 # Contribution
-This crate is Apache2 licensed and any contributions you submit must also be
-Apache2 licensed.
+This crate is tri-licensed under Zlib / Apache-2.0 / MIT.
+Any contributions you submit must be licensed the same.

@ -1,7 +0,0 @@
[book]
title = "Rust GBA Guide"
authors = ["Lokathor"]
[build]
build-dir = "../target/book-output"
create-missing = true

@ -1,38 +0,0 @@
# Broad Concepts
The GameBoy Advance sits in a middle place between the chthonic game consoles of
the ancient past and the "small PC in a funny case" consoles of the modern age.
On the one hand, yeah, you're gonna find a few strange conventions as you learn
all the ropes.
On the other, at least we're writing in Rust at all, and not having to do all
the assembly by hand.
This chapter for "concepts" has a section for each part of the GBA's hardware
memory map, going by increasing order of base address value. The sections try to
explain as much as possible while sticking to just the concerns you might have
regarding that part of the memory map.
For an assessment of how to wrangle all three parts of the video system (PALRAM,
VRAM, and OAM), along with the correct IO registers, into something that shows a
picture, you'll want the Video chapter.
Similarly, the "IO Registers" part of the GBA actually controls how you interact
with every single bit of hardware connected to the GBA. A full description of
everything is obviously too much for just one section of the book. Instead you
get an overview of general IO register rules and advice. Each particular
register is described in the appropriate sections of either the Video or
Non-Video chapters.
## Bus Size
TODO: describe this
## Minimum Write Size
TODO: talk about parts where you can't write one byte at a time
## Volatile or Not?
TODO: discuss what memory should be used volatile style and what can be used normal style.

@ -1,21 +0,0 @@
# Introduction
This is the book for learning how to write GameBoy Advance (GBA) games in Rust.
I'm **Lokathor**, the main author of the book. There's also **Ketsuban** who
provides the technical advisement, reviews the PRs, and keeps my crazy in check.
The book is a work in progress, as you can see if you actually try to open many
of the pages listed in the Table Of Contents.
## Feedback
It's very often hard to tell when you've explained something properly. In the
same way that your brain will read over small misspellings and correct things
into the right word, if an explanation for something you already understand
accidentally skips over some small detail then your brain can fill in the gaps
without you realizing it.
**Please**, if things don't make sense then [file an
issue](https://github.com/rust-console/gba/issues) about it so I know where
things need to improve.

@ -1,21 +0,0 @@
# Non-Video
Besides video effects the GBA still has an okay amount of stuff going on.
Obviously you'll want to know how to read the user's button inputs. That can
almost go without saying, except that I said it.
Each other part can be handled in about any order you like.
Using interrupts is perhaps one of the hardest things for us as Rust programmers
due to quirks in our compilation process. Our code all gets compiled to 16-bit
THUMB instructions, and we don't have a way to mark a function to be compiled
using 32-bit ARM instructions instead. However, an interrupt handler _must_ be
written in 32-bit ARM instructions for it to work. That means that we have to
write our interrupt handler in 32-bit ARM assembly by hand. We'll do it, but I don't
think we'll be too happy about it.
The Link Cable related stuff is also probably a little harder to test than
anything else. That's because link cable emulation isn't always the best, and
hardware testing needs two GBAs, two flash carts, and the cable.
Still, we'll try to go over it eventually.

@ -1,9 +0,0 @@
# Quirks
The GBA supports a lot of totally normal Rust code exactly like you'd think.
However, it also is missing a lot of what you might expect, and sometimes we
have to do things in slightly weird ways.
We start the book by covering the quirks our code will have, just to avoid too
many surprises later.

@ -1,9 +0,0 @@
# Video
GBA Video starts with an IO register called the "Display Control Register", and
then spirals out from there. You generally have to use Palette RAM (PALRAM),
Video RAM (VRAM), Object Attribute Memory (OAM), as well as any number of other
IO registers.
They all have to work together just right, and there's a lot going on when you
first try doing it, so try to take it very slowly as you're learning each step.

@ -1,102 +0,0 @@
# Buttons
It's all well and good to just show a picture, even to show an animation, but if
we want a game we have to let the user interact with something.
## Key Input Register
* KEYINPUT, `0x400_0130`, `u16`, read only
This little `u16` stores the status of _all_ the buttons on the GBA, all at
once. There's only 10 of them, and we have 16 bits to work with, so that sounds
easy. However, there's a bit of a catch. The register follows a "low-active"
convention, where pressing a button _clears_ that bit until it's released.
```rust
const NO_BUTTONS_PRESSED: u16 = 0b0000_0011_1111_1111;
```
The buttons are, going up in order from the 0th bit:
* A
* B
* Select
* Start
* Right
* Left
* Up
* Down
* R
* L
Bits above that are not used. However, since the left and right directions, as
well as the up and down directions, can never be pressed at the same time, the
`KEYINPUT` register should never read as zero. Of course, the register _might_
read as zero if someone is using an emulator that allows for such inputs, so I
wouldn't go so far as to make it be `NonZeroU16` or anything like that.
When programming, we usually are thinking of what buttons we want to have _be
pressed_ instead of buttons we want to have _not be pressed_. This means that we
need an inversion to happen somewhere along the line. The easiest moment of
inversion is immediately as you read in from the register and wrap the value up
in a newtype.
```rust
pub fn read_key_input() -> KeyInput {
  // KEYINPUT is low-active, so XOR with the 10 used bits flips them
  // into "1 means pressed" form.
  KeyInput(KEYINPUT.read() ^ 0b0000_0011_1111_1111)
}
```
Now the KeyInput you get can be checked for what buttons are pressed by checking
for a set bit like you'd do anywhere else.
```rust
/// Bit 0 of KEYINPUT is the A button.
const A_BIT: u16 = 1 << 0;

impl KeyInput {
  pub fn a_pressed(self) -> bool {
    (self.0 & A_BIT) != 0
  }
}
```
Note that the current `KEYINPUT` value changes in real time as the user presses
or releases the buttons. To account for this, it's best to read the value just
once per game frame and then use that single value as if it was the input across
the whole frame. If you've worked with polling input before that should sound
totally normal. If not, just remember to call `read_key_input` once per frame
and then use that `KeyInput` value across the whole frame.
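Here's a host-side sketch of the "invert once, then use it all frame" idea. It assumes a hypothetical `KeyInput::from_raw` that takes the raw `u16` instead of reading `KEYINPUT` itself, so it can run off-hardware. It is not the `gba` crate's actual API.

```rust
/// The 10 bits of KEYINPUT that are actually used (A through L).
const KEY_MASK: u16 = 0b0000_0011_1111_1111;

/// Hypothetical high-active key state: a set bit means "pressed".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KeyInput(u16);

impl KeyInput {
  /// Converts a raw, low-active KEYINPUT value into a high-active `KeyInput`.
  /// Unused upper bits are masked off.
  pub fn from_raw(raw: u16) -> Self {
    KeyInput((raw ^ KEY_MASK) & KEY_MASK)
  }

  /// Bit 0 is the A button.
  pub fn a_pressed(self) -> bool {
    (self.0 & (1 << 0)) != 0
  }

  /// Bit 3 is the Start button.
  pub fn start_pressed(self) -> bool {
    (self.0 & (1 << 3)) != 0
  }
}
```

On hardware you'd read `0x400_0130` once at the top of the frame, convert it like this, and pass the resulting `KeyInput` around for the rest of the frame.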
### Detecting New Presses
The keypad only tells you what's _currently_ pressed, but if you want to check
what's _newly_ pressed it's not too much harder.
All that you do is store the last frame's keys and compare them to the current
keys with an `XOR`. In the `gba` crate it's called `KeyInput::difference`. Once
you've got the difference between last frame and this frame, you know what
changes happened.
* If something is in the difference and _not pressed_ in the last frame, that
means it was newly pressed.
* If something is in the difference and _pressed_ in the last frame that means
it was newly released.
* If something is not in the difference then there's no change between last
frame and this frame.
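The XOR comparison above can be sketched with a hypothetical `Keys` type that holds one frame's high-active bits. The `gba` crate's own `KeyInput::difference` may not have this exact signature.

```rust
/// Hypothetical high-active key bits for one frame (1 = pressed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Keys(u16);

impl Keys {
  /// Bits that changed between last frame and this frame.
  pub fn difference(self, previous: Keys) -> u16 {
    self.0 ^ previous.0
  }

  /// In the difference and _not pressed_ last frame: newly pressed.
  pub fn newly_pressed(self, previous: Keys) -> u16 {
    self.difference(previous) & !previous.0
  }

  /// In the difference and _pressed_ last frame: newly released.
  pub fn newly_released(self, previous: Keys) -> u16 {
    self.difference(previous) & previous.0
  }
}
```

Each frame you'd keep the previous frame's `Keys`, compute these against the new value, and then overwrite the stored value.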
## Key Interrupt Control
* KEYCNT, `0x400_0132`, `u16`, read/write
This lets you control what keys will trigger a keypad interrupt. Of course, for
the actual interrupt to fire you also need to set the `IME` and `IE` registers
properly. See the [Interrupts](05-interrupts.md) section for details there.
The main thing to know about this register is that the keys are in _the exact
same order_ as the key input order. However, with this register they use a
high-active convention instead (ie: a set bit means that button is part of the
interrupt condition).
In addition to simply having the bits for the buttons, bit 14 is a flag for
enabling keypad interrupts (in addition to the flag in the `IE` register), and
bit 15 decides how having more than one button works. If bit 15 is clear,
it's an OR combination (eg: "press any key to continue"). If bit 15 is set
it's an AND combination (eg: "press A+B+Start+Select to reset").
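Here's a sketch of building a KEYCNT value from that bit layout. The constant and function names are made up for illustration. Actually writing the value to `0x400_0132`, and setting `IE` and `IME`, is left out.

```rust
/// Same bit positions as KEYINPUT (bit 0 = A ... bit 9 = L), but high-active.
pub const KEY_A: u16 = 1 << 0;
pub const KEY_B: u16 = 1 << 1;
pub const KEY_SELECT: u16 = 1 << 2;
pub const KEY_START: u16 = 1 << 3;
/// Bit 14: enable the keypad interrupt.
pub const KEYCNT_IRQ_ENABLE: u16 = 1 << 14;
/// Bit 15: clear = any listed key (OR), set = all listed keys (AND).
pub const KEYCNT_AND: u16 = 1 << 15;

/// Builds a KEYCNT value with the interrupt enabled for the given keys.
pub const fn keycnt_value(keys: u16, require_all: bool) -> u16 {
  // Only bits 0..=9 are buttons; keep anything else out of the key field.
  let mut value = (keys & 0x03FF) | KEYCNT_IRQ_ENABLE;
  if require_all {
    value |= KEYCNT_AND;
  }
  value
}
```

For example, the "press A+B+Start+Select to reset" combination would be `keycnt_value(KEY_A | KEY_B | KEY_SELECT | KEY_START, true)`.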

@ -1 +0,0 @@
# CPU

@ -1,160 +0,0 @@
# No Std
First up, as you already saw in the `hello_magic` code, we have to use the
`#![no_std]` outer attribute on our program when we target the GBA. You can find
some info about `no_std` in two official sources:
* [unstable
book section](https://doc.rust-lang.org/unstable-book/language-features/lang-items.html#writing-an-executable-without-stdlib)
* [embedded
book section](https://rust-embedded.github.io/book/intro/no-std.html?highlight=no_std#a--no_std--rust-environment)
The unstable book is borderline useless here because it's describing too many
things in too many words. The embedded book is much better, but still fairly
terse.
## Bare Metal
The GBA falls under what the Embedded Book calls "Bare Metal Environments".
Basically, the machine powers on and immediately begins executing some ASM code.
Our ASM startup was provided by `Ketsuban` (check the `crt0.s` file). We'll go
over _how_ it works much later on; for now it's enough to know that it does
work, and eventually control passes into Rust code.
On the rust code side of things, we determine our starting point with the
`#[start]` attribute on our `main` function. The `main` function also has a
specific type signature that's different from the usual `main` that you'd see in
Rust. I'd tell you to read the unstable-book entry on `#[start]` but they
[literally](https://doc.rust-lang.org/unstable-book/language-features/start.html)
just tell you to look at the [tracking issue for
it](https://github.com/rust-lang/rust/issues/29633) instead, and that's not very
helpful either. Basically it just _has_ to be declared the way it is, even
though there's nothing passing in the arguments and there's no place that the
return value will go. The compiler won't accept it any other way.
## No Standard Library
The Embedded Book tells us that we can't use the standard library, but we get
access to something called "libcore", which sounds kinda funny. What they're
talking about is just [the core
crate](https://doc.rust-lang.org/core/index.html), which is called `libcore`
within the rust repository for historical reasons.
The `core` crate is actually still a really big portion of Rust. The standard
library doesn't actually hold too much code (relatively speaking), instead it
just takes code from other crates and then re-exports it in an organized way. So
with just `core` instead of `std`, what are we missing?
In no particular order:
* Allocation
* Clock
* Network
* File System
The allocation system and all the types that you can use if you have a global
allocator are neatly packaged up in the
[alloc](https://doc.rust-lang.org/alloc/index.html) crate. The rest isn't as
nicely organized.
It's _possible_ to implement a fair portion of the entire standard library
within a GBA context and make the rest just panic if you try to use it. However,
do you really need all that? Eh... probably not?
* We don't need a file system, because all of our data is just sitting there in
the ROM for us to use. When programming we can organize our `const` data into
modules and such to keep it organized, but once the game is compiled it's just
one huge flat address space. TODO: Parasyte says that a FS can be handy even
if it's all just ReadOnly, so we'll eventually talk about how you might set up
such a thing I guess, since we'll already be talking about replacements for
three of the other four things we "lost". Maybe we'll make Parasyte write that
section.
* Networking, well, the GBA has a Link Cable you can use to communicate with
another GBA, but it's not really like a unix socket with TCP, so the standard
Rust networking isn't a very good match.
* Clock is actually two different things at once. One is the ability to store
the time long term, which is a bit of hardware that some gamepaks have in them
(eg: Pokémon Ruby/Sapphire/Emerald). The GBA itself can't keep time while
power is off. However, the second part is just tracking time moment to moment,
which the GBA can totally do. We'll see how to access the timers soon enough.
Which just leaves us with allocation. Do we need an allocator? Depends on your
game. For demos and small games you probably don't need one. For bigger games
you'll maybe want to get an allocator going eventually. It's in some sense a
crutch, but it's a very useful one.
So I promise that at some point we'll cover how to get an allocator going.
Either a Rust Global Allocator (if practical), which would allow for a lot of
the standard library types to be used "for free" once it was set up, or just a
custom allocator that's GBA specific if Rust's global allocator style isn't a
good fit for the GBA (I honestly haven't looked into it).
## Bare Metal Panic
If our code panics, we usually want to see that panic message. Unfortunately,
without a way to access something like `stdout` or `stderr` we've gotta do
something a little weirder.
If our program is running within the `mGBA` emulator, version 0.7 or later, we
can access a special set of addresses that allow us to send out `CString`
values, which then appear within a message log that you can check.
We can capture this behavior by making an `MGBADebug` type, and then implement
`core::fmt::Write` for that type. Once done, the `write!` macro will let us
target the mGBA debug output channel.
When used, it looks like this:
```rust
#[panic_handler]
fn panic(info: &core::panic::PanicInfo) -> ! {
use core::fmt::Write;
use gba::mgba::{MGBADebug, MGBADebugLevel};
if let Some(mut mgba) = MGBADebug::new() {
let _ = write!(mgba, "{}", info);
mgba.send(MGBADebugLevel::Fatal);
}
loop {}
}
```
If you want to follow the particulars you can check the `MGBADebug` source in
the `gba` crate. Basically, there's one address you can use to try and activate
the debug output, and if it works you write your message into the "array" at
another address, and then finally write a send value to a third address. You'll
need to have read the [volatile](03-volatile_destination.md) section for the
details to make sense.
## LLVM Intrinsics
The above code will make your program fail to build in debug mode, saying that
`__clzsi2` can't be found. This is a special builtin function that LLVM attempts
to use when there's no hardware version of an operation it wants to do (in this
case, counting the leading zeros). It's not _actually_ necessary in this case,
which is why you only need it in debug mode. The higher optimization level of
release mode makes LLVM pre-compute more and fold more constants or whatever and
then it stops trying to call `__clzsi2`.
Unfortunately, sometimes a build will fail with a missing intrinsic even in
release mode.
If LLVM wants _core_ to have that intrinsic then you're in
trouble: you'll have to send a PR to the
[compiler-builtins](https://github.com/rust-lang-nursery/compiler-builtins)
repository and hope to get it into rust itself.
If LLVM wants _your code_ to have the intrinsic then you're in less trouble. You
can look up the details and then implement it yourself. It can go anywhere in
your program, as long as it has the right ABI and name. In the case of
`__clzsi2` it takes a `usize` and returns a `usize`, so you'd write something
like:
```rust
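#[no_mangle]
pub extern "C" fn __clzsi2(mut x: usize) -> usize {
  // Assumes 32-bit `usize`. Avoids `leading_zeros()`, which could call back into this fn.
  let mut n = 0;
  if x & 0xFFFF_0000 == 0 { n += 16; x <<= 16; }
  if x & 0xFF00_0000 == 0 { n += 8; x <<= 8; }
  if x & 0xF000_0000 == 0 { n += 4; x <<= 4; }
  if x & 0xC000_0000 == 0 { n += 2; x <<= 2; }
  if x & 0x8000_0000 == 0 { n += 1; x <<= 1; }
  n + ((x & 0x8000_0000 == 0) as usize)
}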
```
And so on for whatever other missing intrinsic.

@ -1,29 +0,0 @@
# Reader Requirements
This book naturally assumes that you've already read Rust's core book:
* [The Rust Programming Language](https://doc.rust-lang.org/book/)
Now, I _know_ it sounds silly to say "if you wanna program Rust on this old
video game system you should already know how to program Rust", but the more
people I meet and chat with the more they tell me that they jumped into Rust
without reading any or all of the book. You know who you are.
Please, read the whole book!
In addition to the core book, there's also an expansion book that I will declare
to be required reading for this:
* [The Rustonomicon](https://doc.rust-lang.org/nomicon/)
The Rustonomicon is all about trying to demystify `unsafe`. We'll end up using a
fair bit of unsafe code as a natural consequence of doing direct hardware
manipulations. Using unsafe is like [swinging a
sword](https://www.zeldadungeon.net/wp-content/uploads/2013/04/tumblr_mlkpzij6T81qizbpto1_1280.gif),
you should start slowly, practice carefully, and always pay attention no matter
how experienced you think you've become.
That said, it's sometimes a [necessary
tool](https://www.youtube.com/watch?v=rTo2u13lVcQ) to get the job done, so you
have to break out of the borderline pathological fear of using it that most rust
programmers tend to have.

@ -1 +0,0 @@
# RGB15 Color

@ -1,239 +0,0 @@
# BIOS
* **Address Span:** `0x0` to `0x3FFF` (16k)
The [BIOS](https://en.wikipedia.org/wiki/BIOS) of the GBA is a small read-only
portion of memory at the very base of the address space. However, it is also
hardware protected against reading, so if you try to read from BIOS memory when
the program counter isn't pointed into the BIOS (eg: any time code _you_ write
is executing) then you get [basically garbage
data](https://problemkaputt.de/gbatek.htm#gbaunpredictablethings) back.
So we're not going to spend time here talking about what bits to read or write
within BIOS memory like we do with the other sections. Instead we're going to
spend time talking about [inline
assembly](https://doc.rust-lang.org/unstable-book/language-features/asm.html)
([tracking issue](https://github.com/rust-lang/rust/issues/29722)) and then use
it to call the [GBA BIOS
Functions](https://problemkaputt.de/gbatek.htm#biosfunctions).
Note that BIOS calls have _more overhead than normal function calls_, so don't
go using them all over the place if you don't have to. They're also usually
written more to be compact in terms of code than for raw speed, so you can
actually outpace them in some cases. Between the increased overhead and not being
as speed optimized, you can sometimes do a faster job without calling the BIOS
at all. (TODO: investigate more about what parts of the BIOS we could
potentially offer faster alternatives for.)
I'd like to take a moment to thank [Marc Brinkmann](https://github.com/mbr)
(with contributions from [Oliver Scherer](https://github.com/oli-obk) and
[Philipp Oppermann](https://github.com/phil-opp)) for writing [this blog
post](http://embed.rs/articles/2016/arm-inline-assembly-rust/). It's at least
ten times better as a tutorial than the `asm` entry in the Unstable Book. In
fairness to the Unstable Book, the actual spec of how inline ASM works in rust
is "basically what clang does", and that's specified as "basically what GCC
does", and that's basically/shockingly not specified much at all despite GCC
being like 30 years old.
So let's be slow and pedantic about this process.
## Inline ASM
**Fair Warning:** The general information that follows regarding the asm macro
is consistent from system to system, but specific information about register
names, register quantities, asm instruction argument ordering, and so on is
specific to ARM on the GBA. If you're programming for any other device you'll
need to carefully investigate that before you begin.
Now then, with that out of the way, the inline asm docs describe an asm call as
looking like this:
```rust
let x = 10u32;
let y = 34u32;
let result: u32;
// Using `asm!` is always `unsafe`.
unsafe {
  asm!(
    // assembly template
    "add {lhs}, {rhs}",
    lhs = inout(reg_thumb) x => result,
    rhs = in(reg_thumb) y,
    options(nostack, nomem),
  );
}
// result == 44
```
The `asm` macro follows the [RFC
2873](https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md)
syntax. The following is just a summary of the RFC.
Now we have to decide what we're gonna write. Obviously we're going to do some
instructions, but those instructions use registers, and how are we gonna talk
about them? We've got two choices.
1) We can pick each and every register used by specifying exact register names.
In THUMB mode most instructions can only use 8 registers, named `r0` through
`r7`. To use those registers you would write `in("r1") y` instead of
`rhs = in(reg_thumb) y`, and directly refer to `r1` in the assembly template.
2) We can specify slots for registers we need and let LLVM decide. This is what
we do when we write `rhs = in(reg_thumb) y` and use `{rhs}` in the assembly
template.
The `reg_thumb` stands for the register class we are using. Since we are
in THUMB mode, the set of registers we can use is limited. `reg_thumb` tells
LLVM: "use only registers available in THUMB mode". In 32-bit mode, you have
access to more registers and you should use a different register class.
The register classes [are described in the
RFC](https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#register-operands).
Look for "ARM" register classes.
In the case of the GBA BIOS, each BIOS function has pre-designated input and
output registers, so we will use the first style. If you use inline ASM in other
parts of your code you're free to use the second style.
### Assembly
This is just one big string literal. You write out one instruction per line, and
excess whitespace is ignored. You can also do comments within your assembly
using `@` to start a comment that goes until the end of the line (in ARM
assembly `;` separates statements rather than starting a comment).
Assembly convention doesn't consider it unreasonable to comment potentially as
much as _every single line_ of asm that you write when you're getting used to
things. Or even if you are used to things. This is cryptic stuff, there's a
reason we avoid writing in it as much as possible.
Remember that our Rust code is in 16-bit mode. You _can_ switch to 32-bit mode
within your asm as long as you switch back by the time the block ends. Otherwise
you'll have a bad time.
### Register bindings
After the assembly string literal, you need to define your bindings (which
rust variables are getting into your registers and which ones are going to refer
to their value afterward).
There are many operand types [as per the
RFC](https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#operand-type),
but you will most often use:
```
[alias =] in(<reg>) <binding> // input
[alias =] out(<reg>) <binding> // output
[alias =] inout(<reg>) <in binding> => <out binding> // both
out(<reg>) _ // Clobber
```
* The binding can be any single 32-bit or smaller value.
* If your binding has bit pattern requirements ("must be non-zero", etc) you are
responsible for upholding that.
* If your binding type will try to `Drop` later then you are responsible for it
being in a fit state to do that.
* An output binding must be either a mutable binding or a binding that was
pre-declared but not yet assigned.
* An input binding must be a single 32-bit or smaller value.
* An input binding _should_ be a type that is `Copy` but this is not an absolute
requirement. Having the input be read is semantically similar to using
`core::ptr::read(&binding)` and forgetting the value when you're done.
Anything else is UB.
### Clobbers
Sometimes your asm will touch registers other than the ones declared for input
and output.
With the `asm!` syntax a clobber is written as an output to a specific register
whose value is thrown away, such as `out("r3") _`. It isn't referenced in the
assembly template.
LLVM _needs_ to know this information. It can move things around to keep your
data safe, but only if you tell it what's about to happen.
Failure to define all of your clobbers can cause UB.
### Options
By default the compiler won't optimize the code you wrote in an `asm` block. You
will need to specify with the `options(..)` parameter that your code can be
optimized. The available options [are specified in the
RFC](https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#options-1).
An optimization might duplicate or remove your instructions from the final
code.
Typically when executing a BIOS call (such as `swi 0x01`, which resets the
console), it's important that the instruction is executed, and not optimized
away, even though it has no observable input and output to the compiler.
However, some BIOS calls, such as _some_ math functions, have no observable
effects outside of the registers we specified. In that case we use `options` to
tell the compiler that it's allowed to optimize them.
### BIOS ASM
* Inputs are always `r0`, `r1`, `r2`, and/or `r3`, depending on function.
* Outputs are always zero or more of `r0`, `r1`, and `r3`.
* Any of the output registers that aren't actually used should be marked as
clobbered.
* All other registers are unaffected.
All of the GBA BIOS calls are performed using the
[swi](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/BABFCEEG.html)
instruction, combined with a value depending on what BIOS function you're trying
to invoke. If you're in 16-bit code you use the value directly, and if you're in
32-bit mode you shift the value up by 16 bits first.
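The 16-bit vs 32-bit difference is just a question of where the function number goes in the instruction. The two helpers below are hypothetical and only make the arithmetic concrete. In ARM code the `swi` comment field is 24 bits wide, and the shift by 16 puts the function number in its top byte.

```rust
/// THUMB: `swi` takes the BIOS function number directly (e.g. `swi 0x06`).
pub const fn thumb_swi_immediate(function: u8) -> u32 {
  function as u32
}

/// ARM: the number is shifted up by 16 into bits 16..=23 of the 24-bit
/// comment field (e.g. `swi 0x060000`).
pub const fn arm_swi_comment(function: u8) -> u32 {
  (function as u32) << 16
}
```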
### Example BIOS Function: Division
For our example we'll use the division function, because GBATEK gives very clear
instructions on how each register is used with that one:
```txt
Signed Division, r0/r1.
r0 signed 32bit Number
r1 signed 32bit Denom
Return:
r0 Number DIV Denom ;signed
r1 Number MOD Denom ;signed
r3 ABS (Number DIV Denom) ;unsigned
For example, incoming -1234, 10 should return -123, -4, +123.
The function usually gets caught in an endless loop upon division by zero.
```
The math folks tell me that the `r1` value should be properly called the
"remainder" not the "modulus". We'll go with that for our function, doesn't hurt
to use the correct names. Our Rust function has an assert against dividing by
`0`, then we name some bindings _without_ giving them a value, we make the asm
call, and then return what we got.
```rust
pub fn div_rem(numerator: i32, denominator: i32) -> (i32, i32) {
assert!(denominator != 0);
let div_out: i32;
let rem_out: i32;
unsafe {
asm!(
"swi 0x06",
inout("r0") numerator => div_out,
inout("r1") denominator => rem_out,
out("r3") _,
options(nostack, nomem),
);
}
(div_out, rem_out)
}
```
I _hope_ this all makes sense by now.
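To sanity-check the GBATEK example off-hardware, here's a hypothetical host-side model of what `swi 0x06` returns. It relies on Rust's `/` and `%` on `i32` truncating toward zero, which matches the "-1234, 10 gives -123, -4, +123" example. It's a reference for testing, not a replacement for the BIOS call.

```rust
/// Models BIOS Div's three outputs: (r0 = div, r1 = remainder, r3 = |div|).
pub fn bios_div_model(number: i32, denom: i32) -> (i32, i32, u32) {
  // The real BIOS call would hang on a zero denominator.
  assert!(denom != 0);
  // `wrapping_*` avoids a panic on i32::MIN / -1; what the real BIOS
  // returns in that case is not modeled here.
  let div = number.wrapping_div(denom);
  let rem = number.wrapping_rem(denom);
  (div, rem, div.unsigned_abs())
}
```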
## Specific BIOS Functions
For a full list of all the specific BIOS functions and their use you should
check the `gba::bios` module within the `gba` crate. There's just so many of
them that enumerating them all here wouldn't serve much purpose.
Which is not to say that we'll never cover any BIOS functions in this book!
Instead, we'll simply mention them whenever they're relevant to the task at
hand (such as controlling sound or waiting for vblank).
//TODO: list/name all BIOS functions as well as what they relate to elsewhere.

@ -1,548 +0,0 @@
# Fixed Only
In addition to not having much of the standard library available, we don't even
have a floating point unit available! We can't do floating point math in
hardware! We _could_ still do floating point math as pure software computations
if we wanted, but that's a slow, slow thing to do.
Are there faster ways? It's the same answer as always: "Yes, but not without a
tradeoff."
The faster way is to represent fractional values using a system called a [Fixed
Point Representation](https://en.wikipedia.org/wiki/Fixed-point_arithmetic).
What do we trade away? Numeric range.
* Floating point math stores bits for a base value and for an exponent, all
according to a single [well defined](https://en.wikipedia.org/wiki/IEEE_754)
standard for how such a complicated thing works.
* Fixed point math takes a normal integer (either signed or unsigned) and then
just "mentally associates" it (so to speak) with a fractional value for its
"units". If you have 3 and it's in units of 1/2, then you have 3/2, or 1.5
using decimal notation. If your number is 256 and it's in units of 1/256th
then the value is 1.0 in decimal notation.
Floating point math requires dedicated hardware to perform quickly, but it can
"trade" precision when it needs to represent extremely large or small values.
Fixed point math is just integral math, which our GBA is reasonably good at, but
because your number is associated with a fixed fraction your results can get out
of range very easily.
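To make the "units" idea concrete, here is a minimal sketch that interprets a raw integer as a fixed point value with `frac_bits` fractional bits, so its units are `1/2^frac_bits`. It converts to `f32` purely so the result is easy to see on a desktop host (on the GBA you'd avoid floats entirely), and the name `fixed_to_f32` is hypothetical.

```rust
/// Hypothetical illustration: interprets `raw` as a fixed point number whose
/// units are 1/2^frac_bits. Uses f32 only for display; `frac_bits` must be < 32.
pub fn fixed_to_f32(raw: i32, frac_bits: u32) -> f32 {
  // The scale factor for the units, e.g. 256.0 for 8 fractional bits.
  let scale = (1u32 << frac_bits) as f32;
  raw as f32 / scale
}
```

This reproduces the examples from the list: `3` in units of 1/2 is `1.5`, and `256` in units of 1/256 is `1.0`.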
## Representing A Fixed Point Value
So we want to associate our numbers with a mental note of what units they're in:
* [PhantomData](https://doc.rust-lang.org/core/marker/struct.PhantomData.html)
is a type that tells the compiler "please remember this extra type info" when
you add it as a field to a struct. It goes away at compile time, so it's
perfect for us to use as space for a note to ourselves without causing runtime
overhead.
* The [typenum](https://crates.io/crates/typenum) crate is the best way to
represent a number within a type in Rust. Since our values on the GBA are
always specified by their number of fractional bits, we can put `typenum`
types such as `U8` or `U14` into our `PhantomData` to keep track of what's
going on.
Now, those of you who know me, or perhaps just know my reputation, will of
course _immediately_ question what happened to the real Lokathor. I do not care
for most crates, and I particularly don't care for using a crate in teaching
situations. However, `typenum` has a number of factors on its side that let me
suggest it in this situation:
* It's version 1.10 with a total of 21 versions and nearly 700k downloads, so we
can expect that the major troubles have been shaken out and that it will remain
fairly stable for quite some time to come.
* It has no further dependencies that it's going to drag into the compilation.
* It happens all at compile time, so it's not clogging up our actual game with
any nonsense.
* The (interesting) subject of "how do you do math inside Rust's trait system?" is
totally separate from the concern that we're trying to focus on here.
Therefore, we will consider it acceptable to use this crate.
Now the `typenum` crate defines a whole lot, but we'll focus down to just a
single type at the moment:
[UInt](https://docs.rs/typenum/1.10.0/typenum/uint/struct.UInt.html) is a
type-level unsigned value. It's like `u8` or `u16`, but while those are types
that have many possible values, each `UInt` construction statically equates to
one specific value, much like how the `()` type only has one value, which is
also called `()`. In this case, you wrap up `UInt` around smaller `UInt` values
and a `B1` or `B0` value to build up the binary number that you want at the type
level.
In other words, instead of writing
```rust
let six = 0b110;
```
We write
```rust
use typenum::{UInt, UTerm, B0, B1};

// Binary 110: the innermost UInt holds the highest bit.
type U6 = UInt<UInt<UInt<UTerm, B1>, B1>, B0>;
```
Wild, I know. If you look into the `typenum` crate you can do math and stuff
with these type level numbers, and we will do a little of that below, but to
start off we _just_ need to store one in some `PhantomData`.
### A struct For Fixed Point
Our actual type for a fixed point value looks like this:
```rust
use core::marker::PhantomData;
use typenum::marker_traits::Unsigned;

/// Fixed point `T` value with `F` fractional bits.
#[derive(Debug, Copy, Clone, Default, PartialEq, Eq, PartialOrd, Ord)]
#[repr(transparent)]
pub struct Fx<T, F: Unsigned> {
  // The raw bits, interpreted in units of 1/2^F.
  num: T,
  // Zero-sized marker that remembers `F` at compile time.
  phantom: PhantomData<F>,
}
```
This says that `Fx<T, F>` is a generic type that holds some base number type `T`
and an `F` type that's marking off how many fractional bits we're using. We only
want people giving unsigned type-level values for the `PhantomData` type, so we
use the trait bound `F: Unsigned`.
We use
[repr(transparent)](https://github.com/rust-lang/rfcs/blob/master/text/1758-repr-transparent.md)
here to ensure that `Fx` will always be treated just like the base type in the
final program (in terms of bit pattern and ABI).
If you go and check, this is _basically_ how the existing general purpose crates
for fixed point math represent their numbers. They're a little fancier about it
because they have to cover every case, and we only have to cover our GBA case.
That's quite a bit to type though. We probably want to make a few type aliases
for things to be easier to look at. Unfortunately there's [no standard
notation](https://en.wikipedia.org/wiki/Fixed-point_arithmetic#Notation) for how
you write a fixed point type, and we also have to limit ourselves to names that
are valid Rust identifiers. I like the `fx` thing, so we'll use that for signed
and then `fxu` if we need an unsigned value.
```rust
use typenum::U8;

/// Alias for an `i16` fixed point value with 8 fractional bits.
pub type fx8_8 = Fx<i16, U8>;
```
Rust will complain about having `non_camel_case_types`, and you can shut that
warning up by putting an `#[allow(non_camel_case_types)]` attribute on the type
alias directly, or you can use `#![allow(non_camel_case_types)]` at the very top
of the module to shut up that warning for the whole module (which is what I
did).
## Constructing A Fixed Point Value
So how do we actually _make_ one of these values? Well, we can always just wrap or unwrap any value in our `Fx` type:
```rust
impl<T, F: Unsigned> Fx<T, F> {
  /// Uses the provided value directly.
  pub fn from_raw(r: T) -> Self {
    Fx {
      num: r,
      phantom: PhantomData,
    }
  }

  /// Unwraps the inner value.
  pub fn into_raw(self) -> T {
    self.num
  }
}
```
I'd like to use the `From` trait of course, but it was giving me some trouble, I
think because of the orphan rule. Oh well.
If we want to be particular to the fact that these are supposed to be
_numbers_... that gets tricky. Rust is actually quite bad at being generic about
number types. You can use the [num](https://crates.io/crates/num) crate, or you
can just use a macro and invoke it once per type. Guess what we're gonna do.
```rust
macro_rules! fixed_point_methods {
  ($t:ident) => {
    impl<F: Unsigned> Fx<$t, F> {
      /// Gives the smallest positive non-zero value.
      pub fn precision() -> Self {
        Fx {
          num: 1,
          phantom: PhantomData,
        }
      }

      /// Makes a value with the integer part shifted into place.
      pub fn from_int_part(i: $t) -> Self {
        Fx {
          num: i << F::U8,
          phantom: PhantomData,
        }
      }
    }
  };
}

fixed_point_methods! {u8}
fixed_point_methods! {i8}
fixed_point_methods! {i16}
fixed_point_methods! {u16}
fixed_point_methods! {i32}
fixed_point_methods! {u32}
```
Now _you'd think_ that those can be `const`, but at the time of writing you
can't have a `const` function with a bound on any trait other than `Sized`, so
they have to be normal functions.
Also, we're doing something a little interesting there with `from_int_part`. We
can take our `F` type and get its constant value as a `u8` through `F::U8`.
There are other associated constants if we want it in other types, and also
non-const methods if you wanted that for some reason (maybe passing it as a
closure function? dunno).
## Casting Base Values
Next, once we have a value in one base type we will need to be able to move it
into another base type. Unfortunately this means we gotta use the `as` operator,
which requires a concrete source type and a concrete destination type. There's
no easy way for us to make it generic here.
We could let the user use `into_raw`, cast, and then do `from_raw`, but that's
error prone because they might change the fractional bit count accidentally.
This means that we have to write a function that does the casting while
perfectly preserving the fractional bit quantity. If we wrote one function for
each conversion it'd be like 30 different possible casts (6 base types that we
support, and then 5 possible target types). Instead, we'll write it just once in
a way that takes a closure, and let the user pass a closure that does the cast.
The compiler should merge it all together quite nicely for us once optimizations
kick in.
This code goes outside the macro. I want to avoid too much code in the macro if
we can, it's a little easier to cope with I think.
```rust
impl<T, F: Unsigned> Fx<T, F> {
  /// Casts the base type, keeping the fractional bit quantity the same.
  /// Example: `x.cast_inner(|n| n as i32)`.
  pub fn cast_inner<Z, C: Fn(T) -> Z>(self, op: C) -> Fx<Z, F> {
    Fx {
      num: op(self.num),
      phantom: PhantomData,
    }
  }
}
```
It's horrible and ugly, but Rust is just bad at numbers sometimes.
## Adjusting Fractional Part
In addition to the base value we might want to change our fractional bit
quantity. This is actually easier that it sounds, but it also requires us to be
tricky with the generics. We can actually use some typenum type level operators
here.
This code goes inside the macro: we need to be able to use the left shift and
right shift, which is easiest when we just use the macro's `$t` as our type. We
could alternately put a similar function outside the macro and be generic on `T`
having the left and right shift operators by using a `where` clause. As much as
I'd like to avoid too much code being generated by macro, I'd _even more_ like
to avoid generic code with huge and complicated trait bounds. It comes down to
style, and you gotta decide for yourself.
```rust
/// Changes the fractional bit quantity, keeping the base type the same.
pub fn adjust_fractional_bits<Y: Unsigned + IsEqual<F, Output = False>>(self) -> Fx<$t, Y> {
  let leftward_movement: i32 = Y::to_i32() - F::to_i32();
  Fx {
    num: if leftward_movement > 0 {
      self.num << leftward_movement
    } else {
      self.num >> (-leftward_movement)
    },
    phantom: PhantomData,
  }
}
```
There are a few things at work. First, we introduce `Y` as the target number of
fractional bits, and we _also_ use a type-level operator to require that the
target bit quantity isn't the same as what we already have. If it's the same as
we started with, why are you doing the cast at all?
Now, once we're sure that the current bits and target bits aren't the same, we
compute `target - start`, and call this our "leftward movement". Example: if
we're targeting 8 bits and we're at 4 bits, we do 8-4 and get +4 as our leftward
movement. If the leftward_movement is positive we naturally shift our current
value to the left. If it's not positive then it _must_ be negative because we
eliminated 0 as a possibility using the type-level operator, so we shift to the
right by the negative value.
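The same "leftward movement" logic can be checked with plain integers. This is a minimal sketch that drops the type-level machinery and passes the bit counts as ordinary `i32` arguments; the free function `rescale` is hypothetical, not part of the `Fx` API above.

```rust
/// Hypothetical plain-integer version of `adjust_fractional_bits`: moves a raw
/// value from `from_bits` to `to_bits` fractional bits.
pub fn rescale(raw: i32, from_bits: i32, to_bits: i32) -> i32 {
  let leftward_movement = to_bits - from_bits;
  if leftward_movement > 0 {
    // Gaining fractional bits: shift left, no precision lost.
    raw << leftward_movement
  } else {
    // Losing fractional bits: arithmetic shift right drops the low bits.
    raw >> (-leftward_movement)
  }
}
```

For example, 1.5 with 4 fractional bits is stored as `24`; moving to 8 fractional bits gives `384`, which is 1.5 in units of 1/256.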
## Addition, Subtraction, Shifting, Negative, Comparisons
From here on we're getting help from [this blog
post](https://spin.atomicobject.com/2012/03/15/simple-fixed-point-math/) by [Job
Vranish](https://spin.atomicobject.com/author/vranish/), so thank them if you
learn something.
I might have given away the game a bit with those `derive` traits on our fixed
point type. For a fair number of operations you can use the normal form of the
op on the inner bits as long as the fractional parts have the same quantity.
This includes equality and ordering (which we derived) as well as addition,
subtraction, and bit shifting (which we need to do ourselves).
This code can go outside the macro, with sufficient trait bounds.
```rust
use core::ops::Add;

impl<T: Add<Output = T>, F: Unsigned> Add for Fx<T, F> {
  type Output = Self;
  fn add(self, rhs: Fx<T, F>) -> Self::Output {
    Fx {
      num: self.num + rhs.num,
      phantom: PhantomData,
    }
  }
}
```
The bound on `T` makes it so that `Fx<T, F>` can be added any time that `T` can
be added to its own type with itself as the output. We can use the exact same
pattern for `Sub`, `Shl`, `Shr`, and `Neg`. With enough trait bounds, we can do
anything!
```rust
use core::ops::{Neg, Shl, Shr, Sub};

impl<T: Sub<Output = T>, F: Unsigned> Sub for Fx<T, F> {
  type Output = Self;
  fn sub(self, rhs: Fx<T, F>) -> Self::Output {
    Fx {
      num: self.num - rhs.num,
      phantom: PhantomData,
    }
  }
}

impl<T: Shl<u32, Output = T>, F: Unsigned> Shl<u32> for Fx<T, F> {
  type Output = Self;
  fn shl(self, rhs: u32) -> Self::Output {
    Fx {
      num: self.num << rhs,
      phantom: PhantomData,
    }
  }
}

impl<T: Shr<u32, Output = T>, F: Unsigned> Shr<u32> for Fx<T, F> {
  type Output = Self;
  fn shr(self, rhs: u32) -> Self::Output {
    Fx {
      num: self.num >> rhs,
      phantom: PhantomData,
    }
  }
}

impl<T: Neg<Output = T>, F: Unsigned> Neg for Fx<T, F> {
  type Output = Self;
  fn neg(self) -> Self::Output {
    Fx {
      num: -self.num,
      phantom: PhantomData,
    }
  }
}
```
Unfortunately, for `Shl` and `Shr` to have as much coverage on our type as it
does on the base type (allowing just about any right hand side) we'd have to do
another macro, but I think just `u32` is fine. We can always add more later if
we need.
We could also implement `BitAnd`, `BitOr`, `BitXor`, and `Not`, but they don't
seem relevant to our fixed point math use, and this section is getting long
already. Just use the same general patterns if you want to add them in your own
programs. Shockingly, `Rem` also works directly if you want it, though I don't
foresee us needing fixed point remainder. Also, the GBA can't do hardware
division or remainder, and we'll have to work around that below when we
implement `Div` (which maybe we don't need, but it's complex enough I should
show it instead of letting people guess).
**Note:** In addition to the various `Op` traits, there's also `OpAssign`
variants. Each `OpAssign` is the same as `Op`, but takes `&mut self` instead of
`self` and then modifies in place instead of producing a fresh value. In other
words, if you want both `+` and `+=` you'll need to do the `AddAssign` trait
too. It's not the worst thing to just write `a = a+b`, so I won't bother with
showing all that here. It's pretty easy to figure out for yourself if you want.
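If you do want `+=`, a minimal sketch follows. To keep it self-contained and dependency-free it swaps the `typenum` marker for a const generic parameter (`const F: u32`); that substitution is an assumption of this sketch, not the chapter's `Fx` type. The `AddAssign` impl follows the same bound pattern as `Add`, just taking `&mut self`.

```rust
use core::ops::{Add, AddAssign};

/// Stand-in for the chapter's `Fx<T, F>`, using a const generic for the
/// fractional bit count instead of a `typenum` type.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
#[repr(transparent)]
pub struct Fx<T, const F: u32> {
  num: T,
}

impl<T, const F: u32> Fx<T, F> {
  pub fn from_raw(num: T) -> Self {
    Fx { num }
  }
  pub fn into_raw(self) -> T {
    self.num
  }
}

// `a + b` produces a fresh value.
impl<T: Add<Output = T>, const F: u32> Add for Fx<T, F> {
  type Output = Self;
  fn add(self, rhs: Self) -> Self {
    Fx { num: self.num + rhs.num }
  }
}

// `a += b` modifies `a` in place.
impl<T: AddAssign, const F: u32> AddAssign for Fx<T, F> {
  fn add_assign(&mut self, rhs: Self) {
    self.num += rhs.num;
  }
}
```

With 8 fractional bits, `1.5 += 0.5` is raw `384 += 128`, leaving `512` (2.0).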
## Multiplication
This is where things get more interesting. When we have two numbers `A` and `B`
they really stand for `(a*f)` and `(b*f)`, where `f` is the scale factor
`1 << bit_count`. If we write `A*B` then we're really writing `(a*f)*(b*f)`,
which can be rewritten as `(a*b)*(f*f)`, and now it's obvious that we have one
more `f` than we wanted to have. We have to do the multiply of the inner value
and then divide out the `f`. We divide by `1 << bit_count`, so if we have 8
fractional bits we'll divide by 256.
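As a worked example, here is a minimal sketch for unsigned 8.8 values (a `u16` with 8 fractional bits); the function name `mul_8_8` is hypothetical. Widening to `u32` is enough here because `u16::MAX * u16::MAX` still fits in a `u32`.

```rust
/// Hypothetical helper: multiplies two unsigned 8.8 raw values.
pub fn mul_8_8(a: u16, b: u16) -> u16 {
  const FRAC_BITS: u32 = 8;
  // (a*f) * (b*f) = (a*b) * f*f, computed at 32 bits so it can't overflow.
  let wide = (a as u32) * (b as u32);
  // Divide out one factor of f by shifting; truncate back to 16 bits.
  (wide >> FRAC_BITS) as u16
}
```

For 1.5 times 2.5 the raw values are `384` and `640`; their product is `245760`, and shifting right by 8 gives `960`, which is 3.75 in units of 1/256.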
The catch is that, when we do the multiply we're _extremely_ likely to overflow
our base type with that multiplication step. Then we do that divide, and now our
result is basically nonsense. We can avoid this to some extent by casting up to
a higher bit type, doing the multiplication and division at higher precision,
and then casting back down. We want as much precision as possible without being
too inefficient, so we'll always cast up to 32-bit (on a 64-bit machine you'd
cast up to 64-bit instead).
Naturally, any signed value has to be cast up to `i32` and any unsigned value
has to be cast up to `u32`, so we'll have to handle those separately.
Also, instead of doing an _actual_ divide we can right-shift by the correct
number of bits to achieve the same effect. _Except_ when we have a signed value
that's negative, because actual division truncates towards zero and
right-shifting truncates towards negative infinity. We can get around _this_ by
flipping the sign, doing the shift, and flipping the sign again (which sounds
silly but it's so much faster than doing an actual division).
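The rounding difference is easy to see with `-3`: division by 2 gives `-1`, but an arithmetic right shift by 1 gives `-2`. A minimal sketch of the flip, shift, flip trick follows; the name `shr_toward_zero` is hypothetical, and it assumes `x != i32::MIN` because negating that value overflows.

```rust
/// Hypothetical helper: divides by 2^bits, truncating toward zero like `/`,
/// but using shifts. Assumes `x != i32::MIN` (negating it would overflow).
pub fn shr_toward_zero(x: i32, bits: u32) -> i32 {
  if x < 0 {
    // Flip to positive, shift (now rounds toward zero), flip back.
    -((-x) >> bits)
  } else {
    x >> bits
  }
}
```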
Also, again signed values can be annoying, because if the value _just happens_
to be `i32::MIN` then when you negate it you'll have... _still_ a negative
value. I'm not 100% on this, but I think the correct thing to do at that point
is to give `$t::MIN` as the output num value.
Did you get all that? Good, because this involves casting, so we will need to
implement it three times, which calls for another macro.
```rust
use core::ops::Mul;

macro_rules! fixed_point_signed_multiply {
  ($t:ident) => {
    impl<F: Unsigned> Mul for Fx<$t, F> {
      type Output = Self;
      fn mul(self, rhs: Fx<$t, F>) -> Self::Output {
        let pre_shift = (self.num as i32).wrapping_mul(rhs.num as i32);
        if pre_shift < 0 {
          if pre_shift == core::i32::MIN {
            Fx {
              num: core::$t::MIN,
              phantom: PhantomData,
            }
          } else {
            Fx {
              num: (-((-pre_shift) >> F::U8)) as $t,
              phantom: PhantomData,
            }
          }
        } else {
          Fx {
            num: (pre_shift >> F::U8) as $t,
            phantom: PhantomData,
          }
        }
      }
    }
  };
}

fixed_point_signed_multiply! {i8}
fixed_point_signed_multiply! {i16}
fixed_point_signed_multiply! {i32}

macro_rules! fixed_point_unsigned_multiply {
  ($t:ident) => {
    impl<F: Unsigned> Mul for Fx<$t, F> {
      type Output = Self;
      fn mul(self, rhs: Fx<$t, F>) -> Self::Output {
        Fx {
          num: ((self.num as u32).wrapping_mul(rhs.num as u32) >> F::U8) as $t,
          phantom: PhantomData,
        }
      }
    }
  };
}

fixed_point_unsigned_multiply! {u8}
fixed_point_unsigned_multiply! {u16}
fixed_point_unsigned_multiply! {u32}
```
## Division
Division is similar to multiplication, but reversed. Which makes sense. This
time `A/B` gives `(a*f)/(b*f)` which is `a/b`, one _less_ `f` than we were
after.
As with the multiplication version of things, we have to up-cast our inner value
as much as we can before doing the math, to allow for the most precision
possible.
The snag here is that the GBA has no hardware division or remainder instruction.
Instead, the GBA has a BIOS function you can call to do `i32/i32` division.
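The scaling part of division can still be checked on a desktop host. This is a minimal sketch for signed 8.8 values (an `i16` with 8 fractional bits); the name `div_8_8` is hypothetical, and it uses Rust's `/` where GBA code would call the BIOS `Div` function instead. It panics if `b` is zero.

```rust
/// Hypothetical helper: divides two signed 8.8 raw values.
pub fn div_8_8(a: i16, b: i16) -> i16 {
  const FRAC_BITS: u32 = 8;
  // Up-cast and pre-multiply by f: a*f*f. i16::MAX << 8 still fits in i32.
  let numerator = (a as i32) << FRAC_BITS;
  // (a*f*f) / (b*f) = (a/b)*f, which is back in 8.8 units.
  (numerator / (b as i32)) as i16
}
```

For 1.5 divided by 0.5 the raw values are `384` and `128`; `384 << 8` is `98304`, and `98304 / 128` is `768`, which is 3.0 in units of 1/256.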
This is a potential problem for us though. If we have some unsigned value, we
need it to fit within the positive space of an `i32` _after the multiply_ so
that we can cast it to `i32`, call the BIOS function that only works on `i32`
values, and cast it back to its actual type.
* If you have a `u8` you're always okay, even with 8 fractional bits.
* If you have a `u16` you're okay with a maximum value and up to 15 fractional
bits, but having a maximum value and 16 fractional bits makes it break.
* If you have a `u32` you're probably going to be in trouble all the time.
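Those three cases can be checked mechanically. This is a minimal sketch (the name `max_fits_i32_after_shift` is hypothetical) that asks whether the largest `value_bits`-bit unsigned value, shifted left by `frac_bits`, still fits in the positive range of an `i32`; it does the check in `u64` so the check itself can't overflow.

```rust
/// Hypothetical helper: does the largest unsigned `value_bits`-bit value,
/// pre-multiplied by 2^frac_bits, fit in i32? Assumes both inputs are <= 32.
pub fn max_fits_i32_after_shift(value_bits: u32, frac_bits: u32) -> bool {
  let max_value: u64 = (1u64 << value_bits) - 1;
  // At most (2^32 - 1) << 32, which still fits in a u64.
  let shifted = max_value << frac_bits;
  shifted <= i32::MAX as u64
}
```

A `u32` fails even with zero fractional bits, because its maximum value is already larger than `i32::MAX`.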
So... ugh, there's not much we can do about this. For now we'll just have to
suffer some.
// TODO: find a numerics book that tells us how to do `u32/u32` divisions.
```rust
use core::ops::Div;

// `crate::bios::div` is assumed to wrap the BIOS `Div` call for `i32/i32`.
macro_rules! fixed_point_signed_division {
  ($t:ident) => {
    impl<F: Unsigned> Div for Fx<$t, F> {
      type Output = Self;
      fn div(self, rhs: Fx<$t, F>) -> Self::Output {
        let mul_output: i32 = (self.num as i32).wrapping_mul(1 << F::U8);
        let divide_result: i32 = crate::bios::div(mul_output, rhs.num as i32);
        Fx {
          num: divide_result as $t,
          phantom: PhantomData,
        }
      }
    }
  };
}

fixed_point_signed_division! {i8}
fixed_point_signed_division! {i16}
fixed_point_signed_division! {i32}

macro_rules! fixed_point_unsigned_division {
  ($t:ident) => {
    impl<F: Unsigned> Div for Fx<$t, F> {
      type Output = Self;
      fn div(self, rhs: Fx<$t, F>) -> Self::Output {
        let mul_output: i32 = (self.num as i32).wrapping_mul(1 << F::U8);
        let divide_result: i32 = crate::bios::div(mul_output, rhs.num as i32);
        Fx {
          num: divide_result as $t,
          phantom: PhantomData,
        }
      }
    }
  };
}

fixed_point_unsigned_division! {u8}
fixed_point_unsigned_division! {u16}
fixed_point_unsigned_division! {u32}
```
## Trigonometry
TODO: look up tables! arcbits!
## Just Using A Crate
If, after seeing all that (and I still didn't even cover every possible trait
impl that you might want for all the possible types), you feel too intimidated,
then I'll cave a bit on your behalf and suggest to you that the
[fixed](https://crates.io/crates/fixed) crate seems to be the best crate
available for fixed point math.
_I have not tested its use on the GBA myself_.
It's just my recommendation from looking at the docs of the various options
available, if you really wanted to just have a crate for it.


@ -1,23 +0,0 @@
# Book Goals and Style
So, what's this book actually gonna teach you?
My goal is certainly not just showing off the crate. Programming for the GBA is
weird enough that I'm trying to teach you all the rest of the stuff you need to
know along the way. If I do my job right then you'd be able to write your own
crate for GBA stuff just how you think it should all go by the end.
Overall the book is sorted more for easy review once you're trying to program
something. The GBA has a few things that can stand on their own and many other
things are a mass of interconnected concepts, so some parts of the book end up
having to refer you to portions that you haven't read yet. The chapters and
sections are sorted so that _minimal_ future references are required, but it's
unavoidable that it'll happen sometimes.
The actual "tutorial order" of the book is the
[Examples](../05-examples/00-index.md) chapter. Each section of that chapter
breaks down one of the provided examples in the [examples
directory](https://github.com/rust-console/gba/tree/master/examples) of the
repository. We go over what sections of the book you'll need to have read for
the example code to make sense, and also how we apply the general concepts
described in the book to the specific example cases.


@ -1 +0,0 @@
# Timers


@ -1,133 +0,0 @@
# Direct Memory Access
The GBA has four Direct Memory Access (DMA) units that can be utilized. They're
mostly the same in terms of overall operation, but each unit has special rules
that make it better suited to a particular task.
**Please Note:** TONC and GBATEK have slightly different concepts of how a DMA
unit's registers should be viewed. I've chosen to go by what GBATEK uses.
## General DMA
A single DMA unit is controlled through four different IO Registers.
* **Source:** (`DMAxSAD`, write only) A `*const` pointer that the DMA reads from.
* **Destination:** (`DMAxDAD`, write only) A `*mut` pointer that the DMA writes
to.
* **Count:** (`DMAxCNT_L`, write only) How many transfers to perform.
* **Control:** (`DMAxCNT_H`, read/write) A register full of bit-flags that
controls all sorts of details.
Here, the `x` is replaced with 0 through 3 when utilizing whichever particular
DMA unit.
### Source Address
This is either a `u32` or `u16` address depending on the unit's assigned
transfer mode (see Control). The address MUST be aligned.
With DMA0 the source must be internal memory. With other DMA units the source
can be any non-`SRAM` location.
### Destination Address
As with the Source, this is either a `u32` or `u16` address depending on the
unit's assigned transfer mode (see Control). The address MUST be aligned.
With DMA0/1/2 the destination must be internal memory. With DMA3 the destination
can be any non-`SRAM` memory (allowing writes into Game Pak ROM / FlashROM,
assuming that your Game Pak hardware supports that).
### Count
This is a `u16` that says how many transfers (`u16` or `u32`) to make.
DMA0/1/2 will only actually accept a 14-bit value, while DMA3 will accept a full
16-bit value. A value of 0 instead acts as if you'd used the _maximum_ value for
the DMA in question. Put another way, DMA0/1/2 transfer `1` through `0x4000`
words, with `0` as the `0x4000` value, and DMA3 transfers `1` through `0x1_0000`
words, with `0` as the `0x1_0000` value.
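The "0 means maximum" encoding can be captured in a small helper. This is a minimal sketch; `encode_dma_count` is a hypothetical name, not an item from the `gba` crate, and it returns `None` for a request of zero transfers or one larger than the unit can do in a single run.

```rust
/// Hypothetical helper: converts a desired transfer count into the value to
/// write to `DMAxCNT_L` for DMA unit `unit` (0 through 3).
pub fn encode_dma_count(unit: u8, transfers: u32) -> Option<u16> {
  // DMA0/1/2 take a 14-bit count, DMA3 a 16-bit count.
  let max: u32 = if unit == 3 { 0x1_0000 } else { 0x4000 };
  if transfers == 0 || transfers > max {
    None
  } else if transfers == max {
    // The maximum count is written as 0.
    Some(0)
  } else {
    Some(transfers as u16)
  }
}
```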
The maximum value isn't a very harsh limit. Even in just `u16` mode, `0x4000`
transfers is 32k, which would for example be all 32k of `IWRAM` (including your
own user stack). If for some reason you do need to transfer more than a single
DMA run can move at once, then you can just set up the DMA a second time and
keep going.
### Control
This `u16` bit-flag field is where things get wild.
* Bits 0-4 do nothing
* Bits 5-6 control how the destination address changes per transfer:
* 0: Offset +1
* 1: Offset -1
* 2: No Change
* 3: Offset +1 and reload when a Repeat starts (below)
* Bits 7-8 similarly control how the source address changes per transfer:
* 0: Offset +1
* 1: Offset -1
* 2: No Change
* 3: Prohibited
* Bit 9: enables Repeat mode.
* Bit 10: Transfer `u16` (false) or `u32` (true) data.
* Bit 11: "Game Pak DRQ" flag. GBATEK says that this is only allowed for DMA3,
and also your Game Pak hardware must be equipped to use DRQ mode. I don't even
know what DRQ mode is all about, and GBATEK doesn't say much either. If DRQ is
set then you _must not_ set the Repeat bit as well. The `gba` crate simply
doesn't bother to expose this flag to users.
* Bits 12-13: DMA Start:
* 0: "Immediate", which is 2 cycles after requested.
* 1: VBlank
* 2: HBlank
* 3: Special, depending on what DMA unit is involved:
* DMA0: Prohibited.
* DMA1/2: Sound FIFO (see the [Sound](04-sound.md) section)
* DMA3: Video Capture, intended for use with the Repeat flag, performs a
transfer per scanline (similar to HBlank) starting at `VCOUNT` 2 and
stopping at `VCOUNT` 162. Intended for copying things from ROM or camera
into VRAM.
* Bit 14: Interrupt upon DMA complete.
* Bit 15: Enable this DMA unit.
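The bit layout above translates directly into constants that can be OR'd together. This is a minimal sketch; the constant names are hypothetical and not taken from the `gba` crate (the DRQ bit is omitted, as the crate also omits it).

```rust
// Hypothetical names for the DMAxCNT_H fields listed above.
pub const DEST_INCREMENT: u16 = 0 << 5;
pub const DEST_DECREMENT: u16 = 1 << 5;
pub const DEST_FIXED: u16 = 2 << 5;
pub const DEST_INCREMENT_RELOAD: u16 = 3 << 5;
pub const SRC_INCREMENT: u16 = 0 << 7;
pub const SRC_DECREMENT: u16 = 1 << 7;
pub const SRC_FIXED: u16 = 2 << 7;
pub const REPEAT: u16 = 1 << 9;
pub const TRANSFER_U32: u16 = 1 << 10;
pub const START_IMMEDIATE: u16 = 0 << 12;
pub const START_VBLANK: u16 = 1 << 12;
pub const START_HBLANK: u16 = 2 << 12;
pub const START_SPECIAL: u16 = 3 << 12;
pub const IRQ_ON_COMPLETE: u16 = 1 << 14;
pub const ENABLE: u16 = 1 << 15;
```

For example, a one-shot `u32` copy that starts immediately is `SRC_INCREMENT | DEST_INCREMENT | TRANSFER_U32 | START_IMMEDIATE | ENABLE`, which is `0x8400`.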
## DMA Life Cycle
The general technique for using a DMA unit involves first setting the relevant
source, destination, and count registers, then setting the appropriate control
register value with the Enable bit set.
Once the Enable flag is set the appropriate DMA unit will trigger at the
assigned time (Bits 12-13). The CPU's operation is halted while any DMA unit is
active, until the DMA completes its task. If more than one DMA unit is supposed
to be active at once, then the DMA unit with the lower number will activate and
complete before any others.
When the DMA triggers via _Enable_, the `Source`, `Destination`, and `Count`
values are copied from the GBA's registers into the DMA unit's internal
registers. Changes to the DMA unit's internal copy of the data don't affect the
values in the GBA registers. Another _Enable_ will read the same values as
before.
If DMA is triggered via having _Repeat_ active then _only_ the Count is copied
into the DMA unit registers. The `Source` and `Destination` are unaffected
during a Repeat. The exception to this is if the destination address control
value (Bits 5-6) is set to 3 (`0b11`), in which case a _Repeat_ will also
re-copy the `Destination` as well as the `Count`.
Once a DMA operation completes, the Enable flag of its Control register will
automatically be disabled, _unless_ the Repeat flag is on, in which case the
Enable flag is left active. You will have to manually disable it if you don't
want the DMA to kick in again over and over at the specified starting time.
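The setup order from this life cycle (source, destination, count, then control with Enable) can be sketched for DMA3. The register addresses are the DMA3 addresses GBATEK lists; the `write` callback is a hypothetical stand-in for volatile MMIO writes, used only so the order can be observed in a host test.

```rust
/// DMA3 register addresses (per GBATEK).
pub const DMA3SAD: usize = 0x0400_00D4;
pub const DMA3DAD: usize = 0x0400_00D8;
pub const DMA3CNT_L: usize = 0x0400_00DC;
pub const DMA3CNT_H: usize = 0x0400_00DE;

/// One register write: (address, value, width in bytes).
pub type RegWrite = (usize, u32, u8);

/// Hypothetical helper: starts DMA3 by writing source, destination, and count,
/// and only then the control register with the Enable bit (bit 15) forced on.
pub fn start_dma3(src: u32, dst: u32, count: u16, control: u16, mut write: impl FnMut(RegWrite)) {
  write((DMA3SAD, src, 4));
  write((DMA3DAD, dst, 4));
  write((DMA3CNT_L, count as u32, 2));
  // Enable goes last so the unit starts with the final register values.
  write((DMA3CNT_H, (control | 0x8000) as u32, 2));
}
```

On hardware the callback would perform volatile writes of the given width; remember that with Repeat set you must clear Enable yourself when you're done.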
## DMA Limitations
The DMA units cannot access `SRAM` at all.
If you're using HBlank to access any part of the memory that the display
controller utilizes (`OAM`, `PALRAM`, `VRAM`), you need to have enabled the
"HBlank Interval Free" bit in the Display Control Register (`DISPCNT`).
Whenever DMA is active the CPU is _not_ active, which means that
[Interrupts](05-interrupts.md) will not fire while DMA is happening. This can
cause any number of hard to track down bugs. Try to limit your use of the DMA
units if you can.


@ -1,317 +0,0 @@
# Volatile Destination
TODO: update this when we can make more stuff `const`
## Volatile Memory
The compiler is an eager friend, so when it sees a read or a write that won't
have an effect, it eliminates that read or write. For example, if we write
```rust
let mut x = 5;
x = 7;
```
The compiler won't actually ever put 5 into `x`. It'll skip straight to putting
7 in `x`, because we never read from `x` when it's 5, so that's a safe change to
make. Normally, values are stored in RAM, which has no side effects when you
read and write from it. RAM is purely for keeping notes about values you'll need
later on.
However, what if we had a bit of hardware where we wanted to do a write and that
did something _other than_ keeping the value for us to look at later? As you saw
in the `hello_magic` example, we have to use a `write_volatile` operation.
Volatile means "just do it anyway". The compiler thinks that it's pointless, but
we know better, so we can force it to really do exactly what we say by using
`write_volatile` instead of `write`.
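For comparison with the `x = 5; x = 7;` example, here is a minimal host-runnable sketch of the same two writes done with `core::ptr::write_volatile`. On ordinary RAM like this local variable it only demonstrates the API (the function name `two_volatile_writes` is hypothetical); the point is that the compiler must keep both stores, in order, which only has a visible effect when the address is a hardware register.

```rust
use core::ptr;

/// Hypothetical demo: performs two volatile stores and a volatile load.
/// Unlike `x = 5; x = 7;`, the store of 5 can't be optimized away.
pub fn two_volatile_writes(slot: &mut u16) -> u16 {
  let p: *mut u16 = slot;
  // Safety: `p` comes from a valid, aligned, exclusive reference.
  unsafe {
    ptr::write_volatile(p, 5);
    ptr::write_volatile(p, 7);
    ptr::read_volatile(p)
  }
}
```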
This is kinda error prone though, right? Because it's just a raw pointer, so we
might forget to use `write_volatile` at some point.
Instead, we want a type that's always going to use volatile reads and writes.
Also, we want a pointer type that lets our reads and writes be as safe as
possible once we've unsafely constructed the initial value.
### Constructing The VolAddress Type
First, we want a type that stores a location within the address space. This can
be a pointer, or a `usize`, and we'll use a `usize` because that's easier to
work with in a `const` context (and we want to have `const` when we can get it).
We'll also have our type use `NonZeroUsize` instead of just `usize` so that
`Option<VolAddress<T>>` stays as a single machine word. This helps quite a bit
when we want to iterate over the addresses of a block of memory (such as
locations within the palette memory). Hardware is never at the null address
anyway. Also, if we had _just_ an address number then we wouldn't be able to
track what type the address is for. We need some
[PhantomData](https://doc.rust-lang.org/core/marker/struct.PhantomData.html),
and specifically we need the phantom data to be for `*mut T`:
* If we used `*const T` that'd have the wrong
[variance](https://doc.rust-lang.org/nomicon/subtyping.html).
* If we used `&mut T` then that's fusing in the ideas of _lifetime_ and
_exclusive access_ to our type. That's potentially important, but that's also
an abstraction we'll build _on top of_ this `VolAddress` type if we need it.
One abstraction layer at a time, so we start with just a phantom pointer. This gives us a type that looks like this:
```rust
use core::marker::PhantomData;
use core::num::NonZeroUsize;

#[derive(Debug)]
#[repr(transparent)]
pub struct VolAddress<T> {
  address: NonZeroUsize,
  marker: PhantomData<*mut T>,
}
```
Now, because of how `derive` is specified, it derives traits _if the generic
parameter_ supports those traits. Since our type is like a pointer, the traits
it supports are distinct from whatever traits the target type supports. So we'll
provide those implementations manually.
```rust
use core::cmp::Ordering;

impl<T> Clone for VolAddress<T> {
  fn clone(&self) -> Self {
    *self
  }
}
impl<T> Copy for VolAddress<T> {}
impl<T> PartialEq for VolAddress<T> {
  fn eq(&self, other: &Self) -> bool {
    self.address == other.address
  }
}
impl<T> Eq for VolAddress<T> {}
impl<T> PartialOrd for VolAddress<T> {
  fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
    Some(self.address.cmp(&other.address))
  }
}
impl<T> Ord for VolAddress<T> {
  fn cmp(&self, other: &Self) -> Ordering {
    self.address.cmp(&other.address)
  }
}
```
Boilerplate junk, not interesting. There's a reason that you derive those traits
99% of the time in Rust.
### Constructing A VolAddress Value
Okay so here's the next core concept: If we unsafely _construct_ a
`VolAddress<T>`, then we can safely _use_ the value once it's been properly
created.
```rust
// Older nightlies needed `#![feature(const_int_wrapping)]` and
// `#![feature(min_const_unsafe_fn)]` at the crate root for this;
// both features have since been stabilized.
impl<T> VolAddress<T> {
  pub const unsafe fn new_unchecked(address: usize) -> Self {
    VolAddress {
      address: NonZeroUsize::new_unchecked(address),
      marker: PhantomData,
    }
  }
  pub const unsafe fn cast<Z>(self) -> VolAddress<Z> {
    VolAddress {
      address: self.address,
      marker: PhantomData,
    }
  }
  pub unsafe fn offset(self, offset: isize) -> Self {
    // Byte distance to move. Wrapping math makes negative offsets work
    // instead of overflowing in the multiplication.
    let delta = (offset as usize).wrapping_mul(core::mem::size_of::<T>());
    VolAddress {
      address: NonZeroUsize::new_unchecked(self.address.get().wrapping_add(delta)),
      marker: PhantomData,
    }
  }
}
```
So what are the unsafety rules here?
* Non-null, obviously.
* Must be aligned for `T`.
* Must always produce valid bit patterns for `T`.
* Must not be part of the address space that Rust's stack or allocator will ever
use.
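Two of those four rules can be checked at runtime. This is a minimal sketch of a checked constructor helper (the name `checked_address` is hypothetical, not part of the type above) that verifies non-null and alignment; valid bit patterns and staying clear of Rust's stack and allocator remain the caller's responsibility.

```rust
use core::num::NonZeroUsize;

/// Hypothetical helper: checks the non-null and alignment rules for `T`.
/// The other two rules can't be verified here.
pub fn checked_address<T>(address: usize) -> Option<NonZeroUsize> {
  // `NonZeroUsize::new` returns None for 0, covering the non-null rule.
  let nz = NonZeroUsize::new(address)?;
  if address % core::mem::align_of::<T>() == 0 {
    Some(nz)
  } else {
    None
  }
}
```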
So, again using the `hello_magic` example, we had
```rust
(0x400_0000 as *mut u16).write_volatile(0x0403);
```
And instead we could declare
```rust
const MAGIC_LOCATION: VolAddress<u16> = unsafe { VolAddress::new_unchecked(0x400_0000) };
```
### Using A VolAddress Value
Now that we've named the magic location, we want to write to it.
```rust
impl<T> VolAddress<T> {
  pub fn read(self) -> T
  where
    T: Copy,
  {
    unsafe { (self.address.get() as *mut T).read_volatile() }
  }
  pub unsafe fn read_non_copy(self) -> T {
    (self.address.get() as *mut T).read_volatile()
  }
  pub fn write(self, val: T) {
    unsafe { (self.address.get() as *mut T).write_volatile(val) }
  }
}
```
So if the type is `Copy` we can `read` it as much as we want. If, somehow, the
type isn't `Copy`, then it might be `Drop`, and reading a value out over and
over would give us several owned copies of the same value, each of which runs
`drop`, which can be UB (a double free, for example). Since the end user might
really know what they're doing, we provide an unsafe backup `read_non_copy`.
On the other hand, we can `write` to the location as much as we want. Even if
the type isn't `Copy`, `write_volatile` just overwrites the old value without
dropping it, and _not running `Drop` is safe_, so a `write` is always safe.
Now we can write to our magical location.
```rust
MAGIC_LOCATION.write(0x0403);
```
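To try `new_unchecked`, `offset`, `read`, and `write` off-hardware, this sketch repeats a minimal `VolAddress<T>` and points it at an ordinary local array instead of a GBA address. That is purely an assumption for the demo. On the GBA you'd use a fixed address such as `0x400_0000`. Aiming a `VolAddress` at Rust-managed memory only works here because nothing else touches the array while the `VolAddress` is in use. `write_then_read` is a hypothetical helper.

```rust
use core::marker::PhantomData;
use core::num::NonZeroUsize;

/// Minimal stand-in for the `VolAddress<T>` type from this section.
pub struct VolAddress<T> {
    address: NonZeroUsize,
    marker: PhantomData<*mut T>,
}
impl<T> Clone for VolAddress<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for VolAddress<T> {}

impl<T> VolAddress<T> {
    pub const unsafe fn new_unchecked(address: usize) -> Self {
        VolAddress { address: unsafe { NonZeroUsize::new_unchecked(address) }, marker: PhantomData }
    }
    pub unsafe fn offset(self, offset: isize) -> Self {
        let bytes = (offset as usize).wrapping_mul(core::mem::size_of::<T>());
        unsafe { Self::new_unchecked(self.address.get().wrapping_add(bytes)) }
    }
    pub fn read(self) -> T
    where
        T: Copy,
    {
        unsafe { (self.address.get() as *mut T).read_volatile() }
    }
    pub fn write(self, val: T) {
        unsafe { (self.address.get() as *mut T).write_volatile(val) }
    }
}

/// Writes `val` into `buf[slot]` through a `VolAddress`, then reads it back.
pub fn write_then_read(buf: &mut [u16; 4], slot: usize, val: u16) -> u16 {
    assert!(slot < buf.len());
    // A real, aligned, non-null buffer satisfies the construction rules.
    let base: VolAddress<u16> = unsafe { VolAddress::new_unchecked(buf.as_mut_ptr() as usize) };
    let target = unsafe { base.offset(slot as isize) };
    target.write(val);
    target.read()
}
```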
### VolAddress Iteration
We've already seen that sometimes we want to have a base address of some sort
and then offset from that location to another. What if we wanted to iterate over
_all the locations_? That's not particularly hard.
```rust
use core::iter::FusedIterator;

impl<T> VolAddress<T> {
pub const unsafe fn iter_slots(self, slots: usize) -> VolAddressIter<T> {
VolAddressIter { vol_address: self, slots }
}
}
#[derive(Debug)]
pub struct VolAddressIter<T> {
vol_address: VolAddress<T>,
slots: usize,
}
impl<T> Clone for VolAddressIter<T> {
fn clone(&self) -> Self {
VolAddressIter {
vol_address: self.vol_address,
slots: self.slots,
}
}
}
impl<T> PartialEq for VolAddressIter<T> {
fn eq(&self, other: &Self) -> bool {
self.vol_address == other.vol_address && self.slots == other.slots
}
}
impl<T> Eq for VolAddressIter<T> {}
impl<T> Iterator for VolAddressIter<T> {
type Item = VolAddress<T>;
fn next(&mut self) -> Option<Self::Item> {
if self.slots > 0 {
let out = self.vol_address;
unsafe {
self.slots -= 1;
self.vol_address = self.vol_address.offset(1);
}
Some(out)
} else {
None
}
}
}
impl<T> FusedIterator for VolAddressIter<T> {}
```
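The counting-down logic in `next` is the heart of the iterator. This host-runnable sketch keeps that logic but swaps the `VolAddress<T>` for a plain `usize` and an explicit `stride`, both assumptions made for the demo. It shows exactly which addresses come out, and that the iterator stays finished once it's done. `AddrIter` is a hypothetical name.

```rust
/// Hypothetical address-only stand-in for `VolAddressIter<T>`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AddrIter {
    pub address: usize,
    /// Plays the role of `size_of::<T>()` in the real type.
    pub stride: usize,
    pub slots: usize,
}

impl Iterator for AddrIter {
    type Item = usize;
    fn next(&mut self) -> Option<usize> {
        if self.slots > 0 {
            let out = self.address;
            self.slots -= 1;
            self.address = self.address.wrapping_add(self.stride);
            Some(out)
        } else {
            // Once `slots` hits zero it stays zero, so every later call is `None`.
            None
        }
    }
}

impl core::iter::FusedIterator for AddrIter {}
```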
### VolAddressBlock
Obviously, having a base address and a length exist separately is error prone.
There's a good reason for slices to keep their pointer and their length
together. We want something like that, which we'll call a "block" because
"array" and "slice" are already things in Rust.
```rust
#[derive(Debug)]
pub struct VolAddressBlock<T> {
vol_address: VolAddress<T>,
slots: usize,
}
impl<T> Clone for VolAddressBlock<T> {
fn clone(&self) -> Self {
VolAddressBlock {
vol_address: self.vol_address,
slots: self.slots,
}
}
}
impl<T> PartialEq for VolAddressBlock<T> {
fn eq(&self, other: &Self) -> bool {
self.vol_address == other.vol_address && self.slots == other.slots
}
}
impl<T> Eq for VolAddressBlock<T> {}
impl<T> VolAddressBlock<T> {
pub const unsafe fn new_unchecked(vol_address: VolAddress<T>, slots: usize) -> Self {
VolAddressBlock { vol_address, slots }
}
pub const fn iter(self) -> VolAddressIter<T> {
VolAddressIter {
vol_address: self.vol_address,
slots: self.slots,
}
}
pub unsafe fn index_unchecked(self, slot: usize) -> VolAddress<T> {
self.vol_address.offset(slot as isize)
}
pub fn index(self, slot: usize) -> VolAddress<T> {
if slot < self.slots {
unsafe { self.vol_address.offset(slot as isize) }
} else {
panic!("Index Requested: {} >= Bound: {}", slot, self.slots)
}
}
pub fn get(self, slot: usize) -> Option<VolAddress<T>> {
if slot < self.slots {
unsafe { Some(self.vol_address.offset(slot as isize)) }
} else {
None
}
}
}
```
Now we can have something like:
```rust
const OTHER_MAGIC: VolAddressBlock<u16> = unsafe {
    VolAddressBlock::new_unchecked(
        VolAddress::new_unchecked(0x600_0000),
        240 * 160
    )
};
OTHER_MAGIC.index(120 + 80 * 240).write(0x001F);
OTHER_MAGIC.index(136 + 80 * 240).write(0x03E0);
OTHER_MAGIC.index(120 + 96 * 240).write(0x7C00);
```
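To see what those `index` calls resolve to, here is a sketch of the block's bounds checking with plain addresses. The hypothetical `AddrBlock` stores a base, a stride, and a slot count in place of the real `VolAddress<T>`. `mode3_slot` assumes the 240-pixel-wide Mode 3 screen used above.

```rust
/// Hypothetical address-only stand-in for `VolAddressBlock<T>`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AddrBlock {
    pub base: usize,
    pub stride: usize,
    pub slots: usize,
}

impl AddrBlock {
    /// Bounds-checked lookup, like `VolAddressBlock::get`.
    pub fn get(self, slot: usize) -> Option<usize> {
        if slot < self.slots {
            Some(self.base + slot * self.stride)
        } else {
            None
        }
    }
    /// Panicking lookup, like `VolAddressBlock::index`.
    pub fn index(self, slot: usize) -> usize {
        self.get(slot)
            .unwrap_or_else(|| panic!("Index Requested: {} >= Bound: {}", slot, self.slots))
    }
}

/// Slot number of pixel (x, y) on the 240-pixel-wide Mode 3 screen.
pub const fn mode3_slot(x: usize, y: usize) -> usize {
    x + y * 240
}
```

The red pixel at (120, 80) is slot 19,320, so it lands 38,640 bytes into VRAM.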
### Docs?
If you wanna see these types and methods with a full docs write up you should
check the GBA crate's source.

@ -1,28 +0,0 @@
# Work RAM
## External Work RAM (EWRAM)
* **Address Span:** `0x2000000` to `0x203FFFF` (256k)
This is a big pile of space, the use of which is up to each game. However, the
external work ram has only a 16-bit bus (if you read/write a 32-bit value it
silently breaks it up into two 16-bit operations) and also 2 wait cycles (extra
CPU cycles that you have to expend _per 16-bit bus use_).
It's most helpful to think of EWRAM as slower, distant memory, similar to the
"heap" in a normal application. You can take the time to go store something
within EWRAM, or to load it out of EWRAM, but if you've got several operations
to do in a row and you're worried about time you should pull that value into
local memory, work on your local copy, and then push it back out to EWRAM.
## Internal Work RAM (IWRAM)
* **Address Span:** `0x3000000` to `0x3007FFF` (32k)
This is a smaller pile of space, but it has a 32-bit bus and no wait.
By default, `0x3007F00` to `0x3007FFF` is reserved for interrupt and BIOS use.
The rest of it is mostly up to you. The user's stack space starts at `0x3007F00`
and grows _down_ from there, so for best results you should place your own data
starting at `0x3000000` and work upwards. Under normal use it's unlikely that the
two memory regions will crash into each other.
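As a quick sanity check on those spans, this sketch encodes the numbers from this section as constants and computes the sizes. The constant and function names are only for illustration.

```rust
/// Inclusive address spans quoted above.
pub const EWRAM: (usize, usize) = (0x200_0000, 0x203_FFFF);
pub const IWRAM: (usize, usize) = (0x300_0000, 0x300_7FFF);
/// Start of the IWRAM area reserved for interrupt and BIOS use by default;
/// the user stack grows downward from this address.
pub const IWRAM_RESERVED: usize = 0x300_7F00;

/// Number of bytes in an inclusive span.
pub const fn span_len(span: (usize, usize)) -> usize {
    span.1 - span.0 + 1
}
```

So the reserved tail of IWRAM is 256 bytes, and 0x7F00 bytes sit below it for your data and the stack to share.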

@ -1,3 +0,0 @@
# IO Registers
* **Address Span:** `0x400_0000` to `0x400_03FE`

@ -1,206 +0,0 @@
# Newtype
TODO: we've already used newtype twice by now (fixed point values and volatile
addresses), so we need to adjust how we start this section.
There's a great Zero Cost abstraction that we'll be using a lot that you might
not already be familiar with: we're talking about the "Newtype Pattern"!
Now, I told you to read the Rust Book before you read this book, and I'm sure
you're all good students who wouldn't sneak into this book without doing the
required reading, so I'm sure you all remember exactly what I'm talking about,
because they touch on the newtype concept in the book twice, in two _very_ long
named sections:
* [Using the Newtype Pattern to Implement External Traits on External
Types](https://doc.rust-lang.org/book/ch19-03-advanced-traits.html#using-the-newtype-pattern-to-implement-external-traits-on-external-types)
* [Using the Newtype Pattern for Type Safety and
Abstraction](https://doc.rust-lang.org/book/ch19-04-advanced-types.html#using-the-newtype-pattern-for-type-safety-and-abstraction)
...Yeah... The Rust Book doesn't know how to make a short sub-section name to
save its life. Shame.
## Newtype Basics
So, we have all these pieces of data, and we want to keep them separated, and we
don't wanna pay the cost for it at runtime. Well, we're in luck, we can pay the
cost at compile time.
```rust
pub struct PixelColor(u16);
```
TODO: we've already talked about repr(transparent) by now
Ah, except that, as I'm sure you remember from [The
Rustonomicon](https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent)
(and from the RFC too, of course), a single field struct isn't guaranteed to be
treated the same as the bare value it holds (for example, when passed across an
FFI boundary). So we should use `#[repr(transparent)]` with our newtypes, which
guarantees that the struct has the same layout and ABI as the wrapped value.
```rust
#[repr(transparent)]
pub struct PixelColor(u16);
```
And then we'll need to do that same thing for _every other newtype we want_.
Except there's only two tiny parts that actually differ between newtype
declarations: the new name and the base type. All the rest is just the same rote
code over and over. Generating piles and piles of boilerplate code? Sounds like
a job for a macro to me!
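Before reaching for a macro, here is one such newtype written out by hand. The size and alignment checks in the test are things you can observe directly. The ABI guarantee itself is only promised by the attribute. The `get` accessor is hypothetical and exists only so the demo can read the field.

```rust
/// A `u16` color newtype; `repr(transparent)` guarantees it has the same
/// layout and ABI as the bare `u16` it wraps.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PixelColor(u16);

impl PixelColor {
    /// Hypothetical accessor for the wrapped value.
    pub const fn get(self) -> u16 {
        self.0
    }
}
```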
## Making It A Macro
If you're going to do much with macros you should definitely read through [The
Little Book of Rust
Macros](https://danielkeep.github.io/tlborm/book/index.html), but we won't be
doing too much so you can just follow along here a bit if you like.
The most basic version of a newtype macro starts like this:
```rust
#[macro_export]
macro_rules! newtype {
($new_name:ident, $old_name:ident) => {
#[repr(transparent)]
pub struct $new_name($old_name);
};
}
```
The `#[macro_export]` makes it exported by the current module (like `pub`
kinda), and then we have one expansion option that takes an identifier, a `,`,
and then a second identifier. The new name is the outer type we'll be using, and
the old name is the inner type that's being wrapped. You'd use our new macro
something like this:
```rust
newtype! {PixelColorCurly, u16}
newtype!(PixelColorParens, u16);
newtype![PixelColorBrackets, u16];
```
Note that you can invoke the macro with the outermost grouping as any of `()`,
`[]`, or `{}`. It makes no particular difference to the macro. Also, that space
in the first version is kinda to show off that you can put white space in
between the macro name and the grouping if you want. The difference is mostly
style, but there are some rules and considerations here:
* If you use curly braces then you _must not_ put a `;` after the invocation.
* If you use parentheses or brackets then you _must_ put the `;` at the end.
* Rustfmt cares which you use and formats accordingly:
* Curly brace macro use mostly gets treated like a code block.
* Parentheses macro use mostly gets treated like a function call.
* Bracket macro use mostly gets treated like an array declaration.
**As a reminder:** remember that `macro_rules` macros have to appear _before_
they're invoked in your source, so the `newtype` macro will always have to be at
the very top of your file, or if you put it in a module within your project
you'll need to declare the module before anything that uses it.
## Upgrade That Macro!
We also want to be able to add `derive` stuff and doc comments to our newtype.
Within the context of `macro_rules!` definitions these are called "meta". Since
we can have any number of them we wrap it all up in a "zero or more" matcher.
Then our macro looks like this:
```rust
#[macro_export]
macro_rules! newtype {
($(#[$attr:meta])* $new_name:ident, $old_name:ident) => {
$(#[$attr])*
#[repr(transparent)]
pub struct $new_name($old_name);
};
}
```
So now we can write
```rust
newtype! {
/// Color on the GBA gives 5 bits for each channel, the highest bit is ignored.
#[derive(Debug, Clone, Copy)]
PixelColor, u16
}
```
Next, we can allow for the wrapping of types that aren't just a single
identifier by changing `$old_name` from `:ident` to `:ty`. We can't _also_ do
this for the `$new_name` part because declaring a new struct expects a valid
identifier that's _not_ already declared (obviously), and `:ty` is intended for
capturing types that already exist.
```rust
#[macro_export]
macro_rules! newtype {
($(#[$attr:meta])* $new_name:ident, $old_name:ty) => {
$(#[$attr])*
#[repr(transparent)]
pub struct $new_name($old_name);
};
}
```
Next, of course, we'll usually want a `new` method that's const and just
gives a 0 value. We won't always be making a newtype over a number value, but we
often will. It's usually silly to have a `new` method with no arguments since we
might as well just impl `Default`, but `Default::default` isn't `const`, so
having `pub const fn new() -> Self` is justified here.
Here, the token `0` is given the `{integer}` type, which can be converted into
any of the integer types as needed, but it still can't be converted into an
array type or a pointer or things like that. Accordingly we've added the "no
frills" option which declares the struct and no `new` method.
```rust
#[macro_export]
macro_rules! newtype {
($(#[$attr:meta])* $new_name:ident, $old_name:ty) => {
$(#[$attr])*
#[repr(transparent)]
pub struct $new_name($old_name);
impl $new_name {
/// A `const` "zero value" constructor
pub const fn new() -> Self {
$new_name(0)
}
}
};
($(#[$attr:meta])* $new_name:ident, $old_name:ty, no frills) => {
$(#[$attr])*
#[repr(transparent)]
pub struct $new_name($old_name);
};
}
```
Finally, we usually want to have the wrapped value be totally private, but there
_are_ occasions where that's not the case. For this, we can allow the wrapped
field to accept a visibility modifier.
```rust
#[macro_export]
macro_rules! newtype {
($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty) => {
$(#[$attr])*
#[repr(transparent)]
pub struct $new_name($v $old_name);
impl $new_name {
/// A `const` "zero value" constructor
pub const fn new() -> Self {
$new_name(0)
}
}
};
($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty, no frills) => {
$(#[$attr])*
#[repr(transparent)]
pub struct $new_name($v $old_name);
};
}
```
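Here is the final macro exercised with both arms. `#[macro_export]` is left off because this sketch only uses the macro locally. `TwoBytes` is a hypothetical example of a wrapped type that can't be built from `0`. That makes it a good fit for `no frills`, and it also shows a `pub` field passing through `$v:vis`.

```rust
macro_rules! newtype {
    ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty) => {
        $(#[$attr])*
        #[repr(transparent)]
        pub struct $new_name($v $old_name);
        impl $new_name {
            /// A `const` "zero value" constructor
            pub const fn new() -> Self {
                $new_name(0)
            }
        }
    };
    ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty, no frills) => {
        $(#[$attr])*
        #[repr(transparent)]
        pub struct $new_name($v $old_name);
    };
}

newtype! {
    /// Color on the GBA gives 5 bits for each channel, the highest bit is ignored.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    PixelColor, u16
}

newtype! {
    /// Hypothetical: an array can't be made from `0`, so skip the `new` method.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    TwoBytes, pub [u8; 2], no frills
}
```

The `TwoBytes` invocation first fails to match the first arm because of the trailing `, no frills`, so `macro_rules!` moves on and matches the second arm.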

@ -1 +0,0 @@
# Sound

@ -1,130 +0,0 @@
# Constant Assertions
Have you ever wanted to assert things _even before runtime_? We all have, of
course. Particularly when the runtime machine is a poor little GBA, we'd like to
have the machine doing the compile handle as much checking as possible.
Enter the [static assertions](https://docs.rs/static_assertions/) crate, which
provides a way to let you assert on a `const` expression.
This is an amazing crate that you should definitely use when you can.
It's written by [Nikolai Vazquez](https://github.com/nvzqz), and they kindly
wrote up a [blog
post](https://nikolaivazquez.com/posts/programming/rust-static-assertions/) that
explains the thinking behind it.
However, I promised that each example would be single file, and I also promised
to explain what's going on as we go, so we'll briefly touch upon giving an
explanation here.
## How We Const Assert
Alright, as it stands (2018-12-15), we can't use `if` in a `const` context.
Since we can't use `if`, we can't use a normal `assert!`. Some day it will be
possible, and a failed assert at compile time will be a compile error and a
failed assert at run time will be a panic and we'll have a nice unified
programming experience. Until then, we can add compile-time-only assertions by
being a little tricky with the compiler.
If we write
```rust
const ASSERT: usize = 0 - 1;
```
that gives a warning, since the math would underflow. We can upgrade that
warning to a hard error:
```rust
#[deny(const_err)]
const ASSERT: usize = 0 - 1;
```
And to make our construction reusable we can enable the
[underscore_const_names](https://github.com/rust-lang/rust/issues/54912) feature
in our program (or library) and then give each such const an underscore for a
name.
```rust
#![feature(underscore_const_names)]
#[deny(const_err)]
const _: usize = 0 - 1;
```
Now we wrap this in a macro where we give a `bool` expression as input. We
negate the bool then cast it to a `usize`, meaning that `true` negates into
`false`, which becomes `0usize`, and then there's no underflow error. Or if the
input was `false`, it negates into `true`, then becomes `1usize`, and then the
underflow error fires.
```rust
macro_rules! const_assert {
($condition:expr) => {
#[deny(const_err)]
#[allow(dead_code)]
const _: usize = 0 - !$condition as usize;
}
}
```
Technically, written like this, the expression can be anything with a
`core::ops::Not` implementation that can also be `as` cast into `usize`. That's
`bool`, but also basically all the other number types. Since we want to ensure
that we get proper looking type errors when things go wrong, we can use
`($condition && true)` to enforce that we get a `bool` (thanks to `Talchas` for
that particular suggestion).
```rust
macro_rules! const_assert {
($condition:expr) => {
#[deny(const_err)]
#[allow(dead_code)]
const _: usize = 0 - !($condition && true) as usize;
}
}
```
## Asserting Something
As an example of how we might use a `const_assert`, we'll do a demo with colors.
There's a red, blue, and green channel. We store colors in a `u16` with 5 bits
for each channel.
```rust
newtype! {
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
Color, u16
}
```
And when we're building a color, we're passing in `u16` values, but they could
be using more than just 5 bits of space. We want to make sure that each channel
is 31 or less, so we can make a color builder that does a `const_assert!` on the
value of each channel.
```rust
macro_rules! rgb {
($r:expr, $g:expr, $b:expr) => {
{
const_assert!($r <= 31);
const_assert!($g <= 31);
const_assert!($b <= 31);
Color($b << 10 | $g << 5 | $r)
}
}
}
```
And then we can declare some colors
```rust
const RED: Color = rgb!(31, 0, 0);
const BLUE: Color = rgb!(31, 500, 0);
```
The second one passes 500 for the green channel, which is clearly out of
bounds, and it fires an error just like we wanted.
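The `deny(const_err)` trick above relies on a lint from compilers of that era. On stable Rust 1.57 and later, `assert!` is allowed during `const` evaluation, so the same `rgb!` idea can be sketched with a plain `const _: () = assert!(...)` item instead. This is a swapped-in newer technique, not the macro from the text. Like the original, it only works when the channel arguments are constant expressions.

```rust
/// A GBA color: 5 bits each of red, green, and blue packed into a `u16`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct Color(u16);

macro_rules! rgb {
    ($r:expr, $g:expr, $b:expr) => {{
        // Evaluated during compilation: an out-of-range channel is a build error.
        const _: () = assert!($r <= 31 && $g <= 31 && $b <= 31, "color channel out of range");
        Color(($b << 10) | ($g << 5) | $r)
    }};
}

pub const RED: Color = rgb!(31, 0, 0);
pub const BLUE: Color = rgb!(0, 0, 31);
// Uncommenting this fails to compile, because 500 doesn't fit in 5 bits:
// pub const BAD: Color = rgb!(31, 500, 0);
```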

@ -1,78 +0,0 @@
# Help and Resources
## Help
So you're stuck on a problem and the book doesn't say what to do. Where can you
find out more?
The first place I would suggest is the [Rust Community
Discord](https://discordapp.com/invite/aVESxV8). If it's a general Rust question
then you can ask anyone in any channel you feel is appropriate. If it's GBA
specific then you can try asking me (`Lokathor`) or `Ketsuban` in the `#gamedev`
channel.
## Emulators
You certainly might want to eventually write a game that you can put on a flash
cart and play on real hardware, but for most of your development you'll probably
want to be using an emulator for testing, because you don't have to fiddle with
cables and all that.
In terms of emulators, you want to be using
[mGBA](https://github.com/mgba-emu/mgba), and you want to be using the [0.7 Beta
1](https://github.com/mgba-emu/mgba/releases/tag/0.7-b1) or later. This update
lets you run raw ELF files, which means that you can have full debug symbols
available while you're debugging problems.
## Information Resources
First, if I fail to describe something related to Rust, you can always try
checking in [The Rust
Reference](https://doc.rust-lang.org/nightly/reference/introduction.html) to see
if they cover it. You can mostly ignore that big scary red banner at the top,
things are a lot better documented than they make it sound.
If you need help trying to fiddle your math down as hard as you can, there are
resources such as the [Bit Twiddling
Hacks](https://graphics.stanford.edu/~seander/bithacks.html) page.
As to GBA related lore, Ketsuban and I didn't magically learn this all from
nowhere, we read various technical manuals and guides ourselves and then
distilled those works oriented around C and C++ into a book for Rust.
We have personally used some or all of the following:
* [GBATEK](http://problemkaputt.de/gbatek.htm): This is _the_ resource. It
covers not only the GBA, but also the DS and DSi, and also a run down of ARM
assembly (32-bit and 16-bit opcodes). The link there is to the 2.9b version on
`problemkaputt.de` (the official home of the document), but if you just google
for gbatek the top result is for the 2.5 version on `akkit.org`, so make sure
you're looking at the newest version. Sometimes `problemkaputt.de` is a little
sluggish so I've also [mirrored](https://lokathor.com/gbatek.html) the 2.9b
version on my own site as well. GBATEK is rather large, over 2mb of text, so
if you're on a phone or similar you might want to save an offline copy to go
easy on your data usage.
* [TONC](https://www.coranac.com/tonc/text/): While GBATEK is basically just a
huge tech specification, TONC is an actual _guide_ on how to make sense of the
GBA's abilities and organize it into a game. It's written for C of course, but
as a Rust programmer you should always be practicing your ability to read C
code anyway. It's the programming equivalent of learning Latin because all the
old academic books are written in Latin.
* [CowBite](https://www.cs.rit.edu/~tjh8300/CowBite/CowBiteSpec.htm): This is
more like GBATEK, and it's less complete, but it mixes in a little more
friendly explanation of things in between the hardware spec parts.
And although I haven't had time to look at it myself, [The Audio
Advance](http://belogic.com/gba/) seems to be very good. It explains in depth
how you can get audio working on the GBA. Note that the table of contents for
each page goes along the top instead of down the side.
## Non-Rust GBA Community
There's also the [GBADev.org](http://www.gbadev.org/) site, which has a forum
and everything. They're coding in C and C++, but you can probably overcome that
difference with a little work on your part.
I also found a place called
[GBATemp](https://gbatemp.net/categories/nintendo-gba-discussions.32/), which
seems to have a more active forum but less of a focus on actual coding.

@ -1 +0,0 @@
# Interrupts

@ -1,50 +0,0 @@
# Palette RAM (PALRAM)
* **Address Span:** `0x500_0000` to `0x500_03FF` (1k)
Palette RAM has a 16-bit bus, which isn't really a problem because it
conceptually just holds `u16` values. There's no automatic wait state, but if
you try to access the same location that the display controller is accessing you
get bumped by 1 cycle. Since the display controller can use the palette ram any
number of times per scanline it's basically impossible to predict if you'll have
to do a wait or not during VDraw. During VBlank you won't have any wait of
course.
PALRAM is among the memory where there's weirdness if you try to write just one
byte: if you try to write just 1 byte, it writes that byte into _both_ parts of
the larger 16-bit location. This doesn't really affect us much with PALRAM,
because palette values are all supposed to be `u16` anyway.
The palette memory actually contains not one, but _two_ sets of palettes. First
there's 256 entries for the background palette data (starting at `0x500_0000`),
and then there's 256 entries for object palette data (starting at `0x500_0200`).
The GBA also has two modes for palette access: 8-bits-per-pixel (8bpp) and
4-bits-per-pixel (4bpp).
* In 8bpp mode an 8-bit palette index value within a background or sprite
simply indexes directly into the 256 slots for that type of thing.
* In 4bpp mode a 4-bit palette index value within a background or sprite
specifies an index within a particular "palbank" (16 palette entries each),
and then a _separate_ setting outside of the graphical data determines which
palbank is to be used for that background or object (the screen entry data for
backgrounds, and the object attributes for objects).
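The two lookups above come down to simple address math, since every entry is one `u16`. This sketch uses hypothetical helper names and assumes the base addresses given in this section.

```rust
/// Base addresses of the two palettes described above.
pub const BG_PALETTE: usize = 0x500_0000;
pub const OBJ_PALETTE: usize = 0x500_0200;

/// Hypothetical helper: address of the entry an 8bpp pixel value selects.
pub fn entry_address_8bpp(base: usize, index: u8) -> usize {
    // Each entry is a `u16`, so entries are 2 bytes apart.
    base + index as usize * 2
}

/// Hypothetical helper: address of the entry a 4bpp pixel value selects,
/// given the palbank chosen by its background or object.
pub fn entry_address_4bpp(base: usize, palbank: u8, index: u8) -> usize {
    assert!(palbank < 16 && index < 16, "4bpp values are only 4 bits");
    // 16 entries per palbank, 2 bytes per entry.
    base + (palbank as usize * 16 + index as usize) * 2
}
```

Palbank 1, index 0 in 4bpp is the same slot as index 16 in 8bpp, which is why one palette slot can mean different things to different layers.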
### Transparency
When a pixel within a background or object specifies index 0 as its palette
entry it is treated as a transparent pixel. This means that in 8bpp mode there's
only 255 actual color options (0 being transparent), and in 4bpp mode there's
only 15 actual color options available within each palbank (the 0th entry of
_each_ palbank is transparent).
Individual backgrounds, and individual objects, each determine if they're 4bpp
or 8bpp separately, so a given overall palette slot might map to a used color in
8bpp and an unused/transparent color in 4bpp. A palette wizard can take
advantage of that.
Palette slot 0 of the overall background palette is used to determine the
"backdrop" color. That's the color you see if no background or object ends up
being rendered within a given pixel.
Since display mode 3 and display mode 5 don't use the palette, they cannot
benefit from transparency.

@ -1 +0,0 @@
# Link Cable

@ -1,24 +0,0 @@
# Video RAM (VRAM)
* **Address Span:** `0x600_0000` to `0x601_7FFF` (96k)
We've used this before! VRAM has a 16-bit bus and no wait. However, the same as
with PALRAM, the "you might have to wait if the display controller is looking at
it" rule applies here.
Unfortunately there's not much more exact detail that can be given about VRAM.
The use of the memory depends on the video mode that you're using.
One general detail of note is that you can't write individual bytes to any part
of VRAM. Depending on mode and location, you'll either get your bytes doubled
into both the upper and lower parts of the 16-bit location targeted, or you
won't even affect the memory. This usually isn't a big deal, except in two
situations:
* In Mode 4, if you want to change just 1 pixel, you'll have to be very careful
to read the old `u16`, overwrite just the byte you wanted to change, and then
write that back.
* In any display mode, avoid using `memcpy` to place things into VRAM.
It's written to be byte oriented, and only does 32-bit transfers under select
conditions. The rest of the time it'll copy one byte at a time and you'll get
either garbage or nothing at all.
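Here is a sketch of that Mode 4 read-modify-write. It runs against a plain `u16` slice standing in for VRAM, an assumption made so it runs on a PC. On hardware each access would be a volatile read or write of the halfword. It assumes the GBA's little-endian layout, where the even-numbered pixel of each pair lives in the low byte. `mode4_set_pixel` is a hypothetical name.

```rust
/// Width of the Mode 4 screen in pixels (one byte per pixel).
pub const MODE4_WIDTH: usize = 240;

/// Sets one 8-bit palette index by rewriting the whole `u16` that holds it,
/// never by writing a single byte.
pub fn mode4_set_pixel(frame: &mut [u16], x: usize, y: usize, palette_index: u8) {
    let pixel = x + y * MODE4_WIDTH;
    let slot = pixel / 2;
    let old = frame[slot];
    let new = if pixel % 2 == 0 {
        // Even pixel: replace the low byte, keep the high byte.
        (old & 0xFF00) | palette_index as u16
    } else {
        // Odd pixel: replace the high byte, keep the low byte.
        (old & 0x00FF) | ((palette_index as u16) << 8)
    };
    frame[slot] = new;
}
```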

@ -1 +0,0 @@
# Game Pak

@ -1,62 +0,0 @@
# Object Attribute Memory (OAM)
* **Address Span:** `0x700_0000` to `0x700_03FF` (1k)
The Object Attribute Memory has a 32-bit bus and no default wait, but suffers
from the "you might have to wait if the display controller is looking at it"
rule. You cannot write individual bytes to OAM at all, but that's not really a
problem because all the fields of the data types within OAM are either `i16` or
`u16` anyway.
Object attribute memory is the wildest yet: it conceptually contains two types
of things, but they're _interlaced_ with each other all the way through.
Now, [GBATEK](http://problemkaputt.de/gbatek.htm#lcdobjoamattributes) and
[CowBite](https://www.cs.rit.edu/~tjh8300/CowBite/CowBiteSpec.htm#OAM%20(sprites))
don't quite give names to the two data types here.
[TONC](https://www.coranac.com/tonc/text/regobj.htm#sec-oam) calls them
`OBJ_ATTR` and `OBJ_AFFINE`, but we'll be giving them names fitting with the
Rust naming convention. Just know that if you try to talk about it with others
they might not be using the same names. In Rust terms their layout would look
like this:
```rust
#[repr(C)]
pub struct ObjectAttributes {
attr0: u16,
attr1: u16,
attr2: u16,
filler: i16,
}
#[repr(C)]
pub struct AffineMatrix {
filler0: [u16; 3],
pa: i16,
filler1: [u16; 3],
pb: i16,
filler2: [u16; 3],
pc: i16,
filler3: [u16; 3],
pd: i16,
}
```
(Note: the `#[repr(C)]` part just means that Rust must lay out the data exactly
in the order we specify, which otherwise it is not required to do).
So, we've got 1024 bytes in OAM and each `ObjectAttributes` value is 8 bytes, so
naturally we can support up to 128 objects.
_At the same time_, we've got 1024 bytes in OAM and each `AffineMatrix` is 32
bytes, so we can have 32 of them.
But, as I said, these things are all _interlaced_ with each other. See how
there's "filler" fields in each struct? If we imagine the OAM as being just an
array of one type or the other, indexes 0/1/2/3 of the `ObjectAttributes` array
would line up with index 0 of the `AffineMatrix` array. It's kinda weird, but
that's just how it works. When we setup functions to read and write these values
we'll have to be careful with how we do it. We probably _won't_ want to use
those representations above, at least not with the `AffineMatrix` type, because
they're quite wasteful if you want to store just object attributes or just
affine matrices.
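The interlacing can be checked with a little arithmetic. This sketch repeats the two layouts from above, then uses hypothetical offset helpers to confirm that `pa` through `pd` of matrix `m` sit in the `filler` field of objects `4m` through `4m + 3`.

```rust
use core::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
pub struct ObjectAttributes {
    attr0: u16,
    attr1: u16,
    attr2: u16,
    filler: i16,
}

#[allow(dead_code)]
#[repr(C)]
pub struct AffineMatrix {
    filler0: [u16; 3],
    pa: i16,
    filler1: [u16; 3],
    pb: i16,
    filler2: [u16; 3],
    pc: i16,
    filler3: [u16; 3],
    pd: i16,
}

pub const OAM_BYTES: usize = 1024;

/// Byte offset in OAM of object attribute entry `i`.
pub const fn obj_attr_offset(i: usize) -> usize {
    i * size_of::<ObjectAttributes>()
}

/// Byte offset in OAM of parameter `param` (0 = pa, 1 = pb, 2 = pc, 3 = pd)
/// of affine matrix `matrix`; each parameter follows 6 bytes of filler.
pub const fn affine_param_offset(matrix: usize, param: usize) -> usize {
    matrix * size_of::<AffineMatrix>() + param * 8 + 6
}
```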

@ -1,14 +0,0 @@
# Game Pak ROM / Flash ROM (ROM)
* **Address Span (Wait State 0):** `0x800_0000` to `0x9FF_FFFF`
* **Address Span (Wait State 1):** `0xA00_0000` to `0xBFF_FFFF`
* **Address Span (Wait State 2):** `0xC00_0000` to `0xDFF_FFFF`
The game's ROM data is a single set of data that's up to 32 megabytes in size.
However, that data is mirrored to three different locations in the address
space. Depending on which part of the address space you use, it can affect the
memory timings involved.
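Because the three spans are exactly 32 MiB apart, finding a given ROM byte through a particular wait state region is one multiply and add. `rom_address` is a hypothetical helper built from the spans listed above.

```rust
/// Start of the wait state 0 ROM region.
pub const ROM_WS0: usize = 0x800_0000;
/// Distance between the three mirrors: 32 MiB, the maximum ROM size.
pub const ROM_MIRROR_STRIDE: usize = 0x200_0000;

/// Hypothetical helper: the address of byte `rom_offset` of the ROM as seen
/// through wait state region `wait_state` (0, 1, or 2).
pub fn rom_address(wait_state: usize, rom_offset: usize) -> usize {
    assert!(wait_state < 3, "only wait states 0, 1, and 2 exist");
    assert!(rom_offset < ROM_MIRROR_STRIDE, "ROM is at most 32 MiB");
    ROM_WS0 + wait_state * ROM_MIRROR_STRIDE + rom_offset
}
```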
TODO: describe `WAITCNT` here, we won't get a better chance at it.
TODO: discuss THUMB vs ARM code and why THUMB is so much faster (because ROM is a 16-bit bus)

@ -1,21 +0,0 @@
# Save RAM (SRAM)
* **Address Span:** `0xE00_0000` to `0xE00_FFFF` (64k)
The actual amount of SRAM available depends on your game pak, and the 64k figure
is simply the maximum possible. A particular game pak might have less, and an
emulator will likely let you have all 64k if you want.
As with other portions of the address space, SRAM has some number of wait cycles
per use. As with ROM, you can change the wait cycle settings via the `WAITCNT`
register if the defaults don't work well for your game pak. See the ROM section
for full details of how the `WAITCNT` register works.
The game pak SRAM also has only an 8-bit bus, so have fun with that.
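With an 8-bit bus, anything wider than a byte has to be split up by hand. This sketch stores a `u32` one byte at a time into a plain byte slice standing in for SRAM. That stand-in is an assumption: on hardware each store would be a single volatile `u8` write. The little-endian byte order is an arbitrary choice of save format, and the function names are hypothetical.

```rust
/// Stores `value` into save memory one byte at a time, lowest byte first.
pub fn sram_write_u32(sram: &mut [u8], offset: usize, value: u32) {
    for (i, byte) in value.to_le_bytes().iter().enumerate() {
        // One byte per access, matching the 8-bit bus.
        sram[offset + i] = *byte;
    }
}

/// Reads back a `u32` stored by `sram_write_u32`, one byte at a time.
pub fn sram_read_u32(sram: &[u8], offset: usize) -> u32 {
    let mut bytes = [0u8; 4];
    for (i, byte) in bytes.iter_mut().enumerate() {
        *byte = sram[offset + i];
    }
    u32::from_le_bytes(bytes)
}
```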
The GBA Direct Memory Access (DMA) unit cannot access SRAM.
Also, you [should not write to SRAM with code executing from
ROM](https://problemkaputt.de/gbatek.htm#gbacartbackupsramfram). Instead, you
should move the code to WRAM and execute the save code from there. We'll cover
how to handle that eventually.

File diff suppressed because it is too large.

@ -1,52 +0,0 @@
# Ch 3: Memory and Objects
Alright so we can do some basic "movement", but we left a big trail in the video
memory of everywhere we went. Most of the time that's not what we want at all.
If we want more hardware support we're going to have to use a new video mode. So
far we've only used Mode 3, but modes 4 and 5 are basically the same. Instead,
we'll switch focus to using a tiled graphical mode.
First we will go over the complete GBA memory mapping. Part of this is the
memory for tiled graphics, but also things like all those IO registers, where
our RAM is for scratch space, all that stuff. Even if we can't put all of them
to use at once, it's helpful to have an idea of what will be available in the
long run.
Tiled modes bring us three big new concepts that each have their own complexity:
tiles, backgrounds, and objects. Backgrounds and objects both use tiles, but the
background is for creating a very large static space that you can scroll around
the view within, and the objects are about having a few moving bits that appear
over the background. Careful use of backgrounds and objects is key to having the
best looking GBA game, so we won't even be able to cover it all in a single
chapter.
And, of course, since most games are pretty boring if they're totally static
we'll touch on the kinds of RNG implementations you might want to have on a GBA.
Most general purpose RNGs that you find are rather big compared to the amount of
memory we want to give them, and they often use a lot of `u64` operations, so
they end up much slower on a 32-bit machine like the GBA (you can lower 64-bit
ops to combinations of 32-bit ops, but that's quite a bit more work). We'll
cover a few RNG options that size down the RNG to a good size and a good speed
without trading away too much in terms of quality.
To top it all off, we'll make a simple "memory game" sort of thing. There's some
face down cards in a grid, you pick one to check, then you pick the other to
check, and then if they match the pair disappears.
## Drawing Priority
Both backgrounds and objects can have "priority" values associated with them.
TONC and GBATEK have _opposite_ ideas of what it means to have the "highest"
priority. TONC goes by highest numerical value, and GBATEK goes by what's on the
z-layer closest to the user. Let's list out the rules as clearly as we can:
* Priority is always two bits, so 0 through 3.
* Priority conceptually proceeds in drawing passes that count _down_, so any
priority 3 things can get covered up by priority 2 things. In truth there's
probably depth testing and buffering stuff going on so it's all one single
pass, but conceptually we will imagine it happening as all of the 3 elements,
then all of 2, and so on.
* Objects always draw over top of backgrounds of equal priority.
* Within things of the same type and priority, the lower numbered element "wins"
and gets its pixel drawn (bg0 is favored over bg1, obj0 is favored over obj1,
etc).
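The rules above can be sketched as a sort key where the smallest key wins the pixel. This is a simplified model under loud assumptions: every candidate has an opaque pixel at that spot, and transparency, blending, and windows are ignored. `Layer`, `Candidate`, and `visible` are hypothetical names.

```rust
/// Which kind of element produced a pixel, and its number (bg0..bg3, obj0..obj127).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    Object(u8),
    Background(u8),
}

/// One element that wants to draw an opaque pixel at a given screen position.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Candidate {
    pub layer: Layer,
    /// Two-bit priority, 0 through 3.
    pub priority: u8,
}

/// Ordering key: the candidate with the smallest key ends up on top.
fn draw_key(c: &Candidate) -> (u8, u8, u8) {
    let (type_rank, number) = match c.layer {
        // Objects draw over backgrounds of equal priority.
        Layer::Object(n) => (0, n),
        Layer::Background(n) => (1, n),
    };
    // Lower priority number first, then objects before backgrounds,
    // then the lower-numbered element of the same type.
    (c.priority, type_rank, number)
}

/// The candidate whose pixel is visible, if any.
pub fn visible(candidates: &[Candidate]) -> Option<Candidate> {
    candidates.iter().copied().min_by_key(draw_key)
}
```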

@ -1,33 +0,0 @@
# IO Registers
The GBA has a large number of **IO Registers** (not to be confused with CPU
registers). These are special memory locations from `0x04000000` to
`0x040003FE`. GBATEK has a [full
list](http://problemkaputt.de/gbatek.htm#gbaiomap), but we only need to learn
about a few of them at a time as we go, so don't be worried.
The important facts to know about IO Registers are these:
* Each has its own specific size. Most are `u16`, but some are `u32`.
* All of them must be accessed in a `volatile` style.
* Each register is specifically readable or writable or both. Actually, with
some registers there are even individual bits that are read-only or
write-only.
* If you write to a read-only position, those writes are simply ignored. This
mostly matters if a writable register contains a read-only bit (such as the
Display Control, next section).
* If you read from a write-only position, you get back values that are
[basically
nonsense](http://problemkaputt.de/gbatek.htm#gbaunpredictablethings). There
aren't really any registers that mix readable bits with write-only bits, so
you're basically safe here. The only (mild) concern is that when you write a
value into a write-only register you need to keep track of what you wrote
somewhere else if you want to know what you wrote (such as to adjust an offset
value by +1, or whatever).
* You can always check GBATEK to be sure, but if I don't mention it then a bit
is probably both read and write.
* Some registers have invalid bit patterns. For example, the lowest three bits
of the Display Control register can't legally be set to the values 6 or 7.
When talking about bit positions, the numbers are _zero indexed_ just like an
array index is.
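The bookkeeping for write-only registers can be wrapped up in a small type. This is a minimal sketch under a few assumptions. `ShadowedReg` is a hypothetical helper and is not part of any crate. On the GBA, `addr` would be something like BG0HOFS at `0x400_0010`. Here it can point at any `u16`, so the sketch also runs on a PC.

```rust
use core::ptr;

/// A write-only `u16` register paired with a RAM copy of the last value
/// written, because reading the register itself gives back garbage.
pub struct ShadowedReg {
    addr: *mut u16,
    last: u16,
}

impl ShadowedReg {
    /// # Safety
    /// `addr` must stay valid for volatile `u16` writes for as long as this
    /// value is used.
    pub unsafe fn new(addr: *mut u16, initial: u16) -> Self {
        ShadowedReg { addr, last: initial }
    }

    /// Writes the register and remembers the value.
    pub fn write(&mut self, value: u16) {
        // SAFETY: upheld by the contract of `new`.
        unsafe { ptr::write_volatile(self.addr, value) };
        self.last = value;
    }

    /// Returns what we last wrote. This never reads the register.
    pub fn last_written(&self) -> u16 {
        self.last
    }

    /// Read-modify-write through the shadow copy, e.g. nudging an offset by +1.
    pub fn update(&mut self, f: impl FnOnce(u16) -> u16) {
        let new_value = f(self.last);
        self.write(new_value);
    }
}
```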

@ -1,135 +0,0 @@
# light_cycle
Now let's make a game of "light_cycle" with our new knowledge.
## Gameplay
`light_cycle` is pretty simple, and very obvious if you've ever seen Tron. The
player moves around the screen with a trail left behind them. They die if they
go off the screen or if they touch their own trail.
## Operations
We need some better drawing operations this time around.
```rust
pub unsafe fn mode3_clear_screen(color: u16) {
    let color = color as u32;
    let bulk_color = color << 16 | color;
    let mut ptr = VolatilePtr(VRAM as *mut u32);
    for _ in 0..SCREEN_HEIGHT {
        for _ in 0..(SCREEN_WIDTH / 2) {
            ptr.write(bulk_color);
            ptr = ptr.offset(1);
        }
    }
}
pub unsafe fn mode3_draw_pixel(col: isize, row: isize, color: u16) {
    VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).write(color);
}
pub unsafe fn mode3_read_pixel(col: isize, row: isize) -> u16 {
    VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).read()
}
```
The draw pixel and read pixel are both pretty obvious. What's new is the clear
screen operation. It changes the `u16` color into a `u32` and then packs the
value in twice. Then we write out `u32` values the whole way through screen
memory. That halves the number of write operations, so (roughly speaking) the
screen clear is twice as fast.
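We can check the packing arithmetic on a PC. This minimal sketch pulls the bit-packing step out into its own function and counts the writes. `double_pixel` and `clear_writes` are hypothetical names used only here.

```rust
/// Packs one 15-bit color into both halves of a `u32`, so one 32-bit write
/// covers two adjacent Mode 3 pixels.
pub const fn double_pixel(color: u16) -> u32 {
    let c = color as u32;
    c << 16 | c
}

/// Number of `u32` writes needed to fill a `width * height` Mode 3 frame
/// (assumes an even width, as with the 240 pixel wide screen).
pub const fn clear_writes(width: u32, height: u32) -> u32 {
    width / 2 * height
}
```

For the 240x160 screen that is 19,200 `u32` writes instead of 38,400 `u16` writes.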
Now we just have to fill in the main function:
```rust
#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
    unsafe {
        DISPCNT.write(MODE3 | BG2);
    }
    let mut px = SCREEN_WIDTH / 2;
    let mut py = SCREEN_HEIGHT / 2;
    let mut color = rgb16(31, 0, 0);
    loop {
        // read the input for this frame
        let this_frame_keys = key_input();
        // adjust game state and wait for vblank
        px += 2 * this_frame_keys.column_direction() as isize;
        py += 2 * this_frame_keys.row_direction() as isize;
        wait_until_vblank();
        // draw the new game and wait until the next frame starts.
        unsafe {
            if px < 0 || py < 0 || px == SCREEN_WIDTH || py == SCREEN_HEIGHT {
                // out of bounds, reset the screen and position.
                mode3_clear_screen(0);
                color = color.rotate_left(5);
                px = SCREEN_WIDTH / 2;
                py = SCREEN_HEIGHT / 2;
            } else {
                let color_here = mode3_read_pixel(px, py);
                if color_here != 0 {
                    // crashed into our own line, reset the screen
                    mode3_clear_screen(0);
                    color = color.rotate_left(5);
                } else {
                    // draw the new part of the line
                    mode3_draw_pixel(px, py, color);
                    mode3_draw_pixel(px, py + 1, color);
                    mode3_draw_pixel(px + 1, py, color);
                    mode3_draw_pixel(px + 1, py + 1, color);
                }
            }
        }
        wait_until_vdraw();
    }
}
```
Oh that's a lot more than before!
First we set Mode 3 and Background 2, we know about that.
Then we're going to store the player's x and y, along with a color value for
their light cycle. Then we enter the core loop.
We read the keys for input, and then do as much as we can without touching video
memory. Since we're using video memory as the place to store the player's light
trail, we can't do much, we just update their position and wait for VBlank to
start. The player will be a 2x2 square, so the arrows will move you 2 pixels per
frame.
Once we're in VBlank we check to see what kind of drawing we're doing. If the
player has gone out of bounds, we clear the screen, rotate their color, and then
reset their position. Why rotate the color? Just because it's fun to have
different colors.
Next, if the player is in bounds we read the video memory for their position. If
it's not black that means we've been here before and the player has crashed into
their own line. In this case, we reset the game without moving them to a new
location.
Finally, if the player is in bounds and they haven't crashed, we write their
color into memory at this position.
Regardless of how it worked out, we hold here until vdraw starts before going to
the next loop. That's all there is to it.
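To test the per-frame logic away from the hardware, we can swap VRAM for a plain array. This is a minimal host-side sketch. `Field`, `Outcome`, and `step` are hypothetical names, and a `Vec<u16>` stands in for Mode 3 VRAM. The bounds check uses `px + 1` and `py + 1` so that the whole 2x2 square has to fit.

```rust
/// What happened during one frame of the model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Drew,
    OutOfBounds,
    Crashed,
}

/// A plain pixel buffer standing in for Mode 3 VRAM.
pub struct Field {
    pub width: isize,
    pub height: isize,
    pub pixels: Vec<u16>,
}

impl Field {
    pub fn new(width: isize, height: isize) -> Self {
        Field { width, height, pixels: vec![0; (width * height) as usize] }
    }

    fn index(&self, col: isize, row: isize) -> usize {
        (col + row * self.width) as usize
    }

    pub fn clear(&mut self) {
        self.pixels.iter_mut().for_each(|p| *p = 0);
    }

    /// Mirrors the drawing half of the loop: bounds check, then collision
    /// check, then a 2x2 square at (px, py).
    pub fn step(&mut self, px: isize, py: isize, color: u16) -> Outcome {
        if px < 0 || py < 0 || px + 1 >= self.width || py + 1 >= self.height {
            self.clear();
            return Outcome::OutOfBounds;
        }
        if self.pixels[self.index(px, py)] != 0 {
            self.clear();
            return Outcome::Crashed;
        }
        for (dx, dy) in [(0, 0), (1, 0), (0, 1), (1, 1)] {
            let i = self.index(px + dx, py + dy);
            self.pixels[i] = color;
        }
        Outcome::Drew
    }
}
```

Calling `step` twice at the same position reports a crash. That matches the loop above: if no direction is held, the player lands on their own square.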
## The gba crate doesn't quite work like this
Once again, as with the `hello1` and `hello2` examples, the `gba` crate covers
much of this same ground as our example here, but in slightly different ways.
Better organization and abstractions are usually only realized once you've used
more of the whole thing you're trying to work with. If we want to have a crate
where the whole thing is well integrated with itself, then the examples would
also end up having to explain about things we haven't really touched on much
yet. It becomes a lot harder to teach.
So, going forward, we will continue to teach concepts and build examples that
don't directly depend on the `gba` crate. This allows the crate to freely grow
without all the past examples becoming a great inertia upon it.

@ -1,316 +0,0 @@
# Making A Memory Game
For this example to show off our new skills we'll make a "memory" game. The idea
is that there are some face-down cards: you pick one and it flips, then you pick
a second, and if they match they both go away; if they don't match they both
turn back face down. The player keeps going until all the cards are gone, then
we'll deal the cards again.
There are many steps to do to get such a simple seeming game going. In fact I
stumbled a bit myself when trying to get things set up and going despite having
written and explained all the parts so far. Accordingly, we'll take each part
very slowly, and review things as we build up our game.
We'll start back with a nearly blank file, calling it `memory_game.rs`:
```rust
#![feature(start)]
#![no_std]
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
loop {}
}
#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
loop {
// TODO the whole thing
}
}
```
## Displaying A Background
First let's try to get a background going. We'll display a simple checker
pattern just so that we know that we did something.
Remember, backgrounds have the following essential components:
* Background Palette
* Background Tiles
* Screenblock
* IO Registers
### Background Palette
To write to the background palette memory we'll want to name a `VolatilePtr` for
it. We'll probably also want to be able to cast between different types either
right away or later in this program, so we'll add a method for that.
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);
impl<T> VolatilePtr<T> {
    pub unsafe fn read(&self) -> T {
        core::ptr::read_volatile(self.0)
    }
    pub unsafe fn write(&self, data: T) {
        core::ptr::write_volatile(self.0, data);
    }
    pub fn offset(self, count: isize) -> Self {
        VolatilePtr(self.0.wrapping_offset(count))
    }
    pub fn cast<Z>(self) -> VolatilePtr<Z> {
        VolatilePtr(self.0 as *mut Z)
    }
}
```
Now we give ourselves an easy way to write a color into a palbank slot.
```rust
pub const BACKGROUND_PALETTE: VolatilePtr<u16> = VolatilePtr(0x500_0000 as *mut u16);
pub fn set_bg_palette_4bpp(palbank: usize, slot: usize, color: u16) {
assert!(palbank < 16);
assert!(slot > 0 && slot < 16);
unsafe {
BACKGROUND_PALETTE
.cast::<[u16; 16]>()
.offset(palbank as isize)
.cast::<u16>()
.offset(slot as isize)
.write(color);
}
}
```
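The cast-then-offset chain in `set_bg_palette_4bpp` is just address arithmetic. Offsetting a `[u16; 16]` pointer moves 32 bytes per palbank, and offsetting a `u16` pointer moves 2 bytes per slot. This sketch computes the same address directly. `bg_palette_4bpp_address` is a hypothetical helper for checking the math.

```rust
/// Base of background palette RAM.
pub const BG_PALETTE_BASE: usize = 0x500_0000;

/// The address `set_bg_palette_4bpp(palbank, slot, ..)` ends up writing to.
pub const fn bg_palette_4bpp_address(palbank: usize, slot: usize) -> usize {
    let palbank_bytes = palbank * core::mem::size_of::<[u16; 16]>();
    let slot_bytes = slot * core::mem::size_of::<u16>();
    BG_PALETTE_BASE + palbank_bytes + slot_bytes
}
```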
And of course we need to bring back in our ability to build color values, as
well as a few named colors to start us off:
```rust
pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
blue << 10 | green << 5 | red
}
pub const WHITE: u16 = rgb16(31, 31, 31);
pub const LIGHT_GRAY: u16 = rgb16(25, 25, 25);
pub const DARK_GRAY: u16 = rgb16(15, 15, 15);
```
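`rgb16` puts red in the low 5 bits, green in the next 5, and blue in bits 10 to 14. Bit 15 is unused. It does not mask its inputs, so `rgb16(32, 0, 0)` produces a faint green instead of a red. This sketch is a masked variant plus the inverse operation. `rgb16_masked` and `channels` are hypothetical names.

```rust
/// Like `rgb16`, but each channel is masked to 5 bits so it can't spill
/// into its neighbor.
pub const fn rgb16_masked(red: u16, green: u16, blue: u16) -> u16 {
    (blue & 31) << 10 | (green & 31) << 5 | (red & 31)
}

/// Splits a color back into (red, green, blue).
pub const fn channels(color: u16) -> (u16, u16, u16) {
    (color & 31, (color >> 5) & 31, (color >> 10) & 31)
}
```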
Which _finally_ allows us to set our palette colors in `main`:
```rust
fn main(_argc: isize, _argv: *const *const u8) -> isize {
set_bg_palette_4bpp(0, 1, WHITE);
set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
set_bg_palette_4bpp(0, 3, DARK_GRAY);
```
### Background Tiles
So we'll want some light gray tiles and some dark gray tiles. We could use a
single tile and then swap it between palbanks to do the color selection, but for
now we'll just use two different tiles, since we've got tons of tile space to
spare.
```rust
#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct Tile4bpp {
pub data: [u32; 8],
}
pub const ALL_TWOS: Tile4bpp = Tile4bpp {
data: [
0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222,
],
};
pub const ALL_THREES: Tile4bpp = Tile4bpp {
data: [
0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333,
],
};
```
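Each `u32` in a `Tile4bpp` is one row of eight pixels. Each pixel is a 4-bit palette index, with the leftmost pixel in the lowest nibble (the GBA is little-endian). That's why a solid row of color 2 is `0x22222222`. This sketch builds a row from individual pixels. `pack_4bpp_row` is a hypothetical helper.

```rust
/// Packs eight 4-bit palette indexes into one tile row, leftmost pixel in
/// the lowest nibble.
pub const fn pack_4bpp_row(pixels: [u8; 8]) -> u32 {
    let mut row = 0u32;
    let mut i = 0;
    while i < 8 {
        row |= ((pixels[i] & 0xF) as u32) << (4 * i);
        i += 1;
    }
    row
}
```

With this, `ALL_TWOS` could be written as `Tile4bpp { data: [pack_4bpp_row([2; 8]); 8] }`.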
And then we have to have a way to put the tiles into video memory:
```rust
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct Charblock4bpp {
pub data: [Tile4bpp; 512],
}
pub const VRAM: VolatilePtr<Charblock4bpp> = VolatilePtr(0x0600_0000 as *mut Charblock4bpp);
pub fn set_bg_tile_4bpp(charblock: usize, index: usize, tile: Tile4bpp) {
assert!(charblock < 4);
assert!(index < 512);
unsafe { VRAM.offset(charblock as isize).cast::<Tile4bpp>().offset(index as isize).write(tile) }
}
```
And finally, we can call that within `main`:
```rust
fn main(_argc: isize, _argv: *const *const u8) -> isize {
// bg palette
set_bg_palette_4bpp(0, 1, WHITE);
set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
set_bg_palette_4bpp(0, 3, DARK_GRAY);
// bg tiles
set_bg_tile_4bpp(0, 0, ALL_TWOS);
set_bg_tile_4bpp(0, 1, ALL_THREES);
```
### Setup A Screenblock
Screenblocks are a little weird because they take the same space as the
charblocks (8 screenblocks per charblock). The GBA will let you mix and match
and it's up to you to keep it all straight. We're using tiles at the base of
charblock 0, so we'll place our screenblock at the base of charblock 1.
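The "8 screenblocks per charblock" relationship comes from the sizes: 16 KiB per charblock and 2 KiB per screenblock. This sketch confirms that screenblock 8 starts exactly where charblock 1 does. The function names are hypothetical.

```rust
pub const VRAM_BASE: usize = 0x600_0000;
pub const CHARBLOCK_BYTES: usize = 16 * 1024; // 0x4000
pub const SCREENBLOCK_BYTES: usize = 2 * 1024; // 0x800

pub const fn charblock_address(charblock: usize) -> usize {
    VRAM_BASE + charblock * CHARBLOCK_BYTES
}

pub const fn screenblock_address(screenblock: usize) -> usize {
    VRAM_BASE + screenblock * SCREENBLOCK_BYTES
}

/// First screenblock index that lies inside the given charblock.
pub const fn first_screenblock_in(charblock: usize) -> usize {
    charblock * (CHARBLOCK_BYTES / SCREENBLOCK_BYTES)
}
```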
First, we have to be able to make one single screenblock entry at a time:
```rust
#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct RegularScreenblockEntry(u16);
impl RegularScreenblockEntry {
pub const SCREENBLOCK_ENTRY_TILE_ID_MASK: u16 = 0b11_1111_1111;
pub const fn from_tile_id(id: u16) -> Self {
RegularScreenblockEntry(id & Self::SCREENBLOCK_ENTRY_TILE_ID_MASK)
}
}
```
And then with 32x32 of these things we'll have a whole screenblock. Now, we
probably won't actually make values of the screenblock type itself, but we at
least need it to have the type declared with the correct size so that we can
move our pointers around by the right amount.
```rust
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct RegularScreenblock {
pub data: [RegularScreenblockEntry; 32 * 32],
}
```
Alright, so, as I said those things are kinda big, we don't really want to be
building them up on the stack if we can avoid it, so we'll write one straight
into memory at the correct location.
```rust
pub fn checker_screenblock(slot: usize, a_entry: RegularScreenblockEntry, b_entry: RegularScreenblockEntry) {
    let mut p = VRAM.cast::<RegularScreenblock>().offset(slot as isize).cast::<RegularScreenblockEntry>();
    let mut checker = true;
    for _row in 0..32 {
        for _col in 0..32 {
            unsafe { p.write(if checker { a_entry } else { b_entry }) };
            p = p.offset(1);
            checker = !checker;
        }
        checker = !checker;
    }
}
```
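The extra `checker = !checker;` after each row matters. A row has 32 entries, an even number, so toggling once per entry leaves `checker` where the row started. Without the extra flip we'd get vertical stripes instead of checkers. This host-side sketch fills an array in the same order so we can check the result. `checker_entries` is a hypothetical helper.

```rust
/// Fills a 32x32 array the same way `checker_screenblock` walks VRAM.
pub fn checker_entries(a: u16, b: u16) -> [u16; 32 * 32] {
    let mut out = [0u16; 32 * 32];
    let mut checker = true;
    for row in out.chunks_exact_mut(32) {
        for entry in row.iter_mut() {
            *entry = if checker { a } else { b };
            checker = !checker;
        }
        // flip once more so the next row starts with the other entry
        checker = !checker;
    }
    out
}
```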
And then we add this into `main`
```rust
// screenblock
let light_entry = RegularScreenblockEntry::from_tile_id(0);
let dark_entry = RegularScreenblockEntry::from_tile_id(1);
checker_screenblock(8, light_entry, dark_entry);
```
### Background IO Registers
Our most important step is of course the IO register step. There are four
different background layers, but each of them has the same format for their
control register. For the moment, all that we care about is being able to set
the "screen base block" value.
```rust
#[derive(Clone, Copy, Default, PartialEq, Eq)]
#[repr(transparent)]
pub struct BackgroundControlSetting(u16);
impl BackgroundControlSetting {
pub const SCREEN_BASE_BLOCK_MASK: u16 = 0b1_1111;
pub const fn from_base_block(sbb: u16) -> Self {
BackgroundControlSetting((sbb & Self::SCREEN_BASE_BLOCK_MASK) << 8)
}
}
pub const BG0CNT: VolatilePtr<BackgroundControlSetting> = VolatilePtr(0x400_0008 as *mut BackgroundControlSetting);
```
And... that's all it takes for us to be able to add a line into `main`
```rust
// bg0 control
unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) };
```
### Set The Display Control Register
We're finally ready to set the display control register and get things going.
We've slightly glossed over it so far, but when the GBA is first booted almost
everything within the address space will be all zeroed. However, the display
control register has the "Force VBlank" bit enabled by the BIOS, giving you a
moment to put the memory in place that you'll need for the first frame.
So, now that we've got all of our memory set, we'll overwrite the initial
display control register value with what we'll call "just enable bg0".
```rust
#[derive(Clone, Copy, Default, PartialEq, Eq)]
#[repr(transparent)]
pub struct DisplayControlSetting(u16);
impl DisplayControlSetting {
pub const JUST_ENABLE_BG0: DisplayControlSetting = DisplayControlSetting(1 << 8);
}
pub const DISPCNT: VolatilePtr<DisplayControlSetting> = VolatilePtr(0x0400_0000 as *mut DisplayControlSetting);
```
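`JUST_ENABLE_BG0` is `1 << 8` because the video mode occupies bits 0 to 2 (mode 0 here), and the BG0 to BG3 enable flags are bits 8 to 11. This sketch is a slightly more general builder, using bit positions as given in GBATEK. `DispCnt` is a hypothetical type and is distinct from the `DisplayControlSetting` used in this chapter.

```rust
/// A few DISPCNT fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DispCnt(pub u16);

impl DispCnt {
    /// The "Force VBlank" (forced blank) bit the BIOS leaves set.
    pub const FORCED_BLANK: u16 = 1 << 7;

    pub const fn mode(mode: u16) -> Self {
        // bits 0-2; modes 6 and 7 are not valid
        assert!(mode < 6);
        DispCnt(mode)
    }

    pub const fn with_bg(self, bg: u16) -> Self {
        // BG0..BG3 enable bits are bits 8..=11
        assert!(bg < 4);
        DispCnt(self.0 | 1 << (8 + bg))
    }

    pub const fn with_objects(self) -> Self {
        DispCnt(self.0 | 1 << 12)
    }
}
```

`DispCnt::mode(3).with_bg(2)` gives the `MODE3 | BG2` value used in the light_cycle example.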
And so finally we have a complete `main`
```rust
#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
// bg palette
set_bg_palette_4bpp(0, 1, WHITE);
set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
set_bg_palette_4bpp(0, 3, DARK_GRAY);
// bg tiles
set_bg_tile_4bpp(0, 0, ALL_TWOS);
set_bg_tile_4bpp(0, 1, ALL_THREES);
// screenblock
let light_entry = RegularScreenblockEntry::from_tile_id(0);
let dark_entry = RegularScreenblockEntry::from_tile_id(1);
checker_screenblock(8, light_entry, dark_entry);
// bg0 control
unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) };
// Display Control
unsafe { DISPCNT.write(DisplayControlSetting::JUST_ENABLE_BG0) };
loop {
// TODO the whole thing
}
}
```
And _It works, Marty! It works!_
![screenshot_checkers](screenshot_checkers.png)
We've got more to go, but we're well on our way.

@ -1,313 +0,0 @@
# Regular Backgrounds
So, backgrounds, they're cool. Why do we call the ones here "regular"
backgrounds? Because there's also "affine" backgrounds. However, affine math
stuff adds a complication, so for now we'll just work with regular backgrounds.
The non-affine backgrounds are sometimes called "text mode" backgrounds by other
guides.
To get your background image working you generally need to perform all of the
following steps, though I suppose the exact ordering is up to you.
## Tiled Video Modes
When you want regular tiled display, you must use video mode 0 or 1.
* Mode 0 allows for using all four BG layers (0 through 3) as regular
backgrounds.
* Mode 1 allows for using BG0 and BG1 as regular backgrounds, BG2 as an affine
background, and BG3 not at all.
* Mode 2 allows for BG2 and BG3 to be used as affine backgrounds, while BG0 and
BG1 cannot be used at all.
We will not cover affine backgrounds in this chapter, so we will naturally be
using video mode 0.
Also, note that you have to enable each background layer that you want to use
within the display control register.
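The mode list above can be written as a lookup table, which is handy for validating a configuration before writing DISPCNT. This sketch encodes exactly that list. Modes 3 to 5 are bitmap modes and return no tiled layers here. The function names are hypothetical.

```rust
/// BG layers that act as regular (tiled, non-affine) backgrounds per mode.
pub fn regular_bg_layers(mode: u8) -> &'static [u8] {
    match mode {
        0 => &[0, 1, 2, 3],
        1 => &[0, 1],
        _ => &[],
    }
}

/// BG layers that act as affine backgrounds per mode.
pub fn affine_bg_layers(mode: u8) -> &'static [u8] {
    match mode {
        1 => &[2],
        2 => &[2, 3],
        _ => &[],
    }
}
```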
## Get Your Palette Ready
Background palette starts at `0x5000000` and is 256 `u16` values long. It'd
potentially be possible to declare a static array starting at a fixed address and
use a linker script to make sure that it ends up at the right spot in the final
program, but since we have to use volatile reads and writes with PALRAM anyway,
we'll just reuse our `VolatilePtr` type. Something like this:
```rust
pub const PALRAM_BG_BASE: VolatilePtr<u16> = VolatilePtr(0x500_0000 as *mut u16);
pub fn bg_palette(slot: usize) -> u16 {
assert!(slot < 256);
unsafe { PALRAM_BG_BASE.offset(slot as isize).read() }
}
pub fn set_bg_palette(slot: usize, color: u16) {
assert!(slot < 256);
unsafe { PALRAM_BG_BASE.offset(slot as isize).write(color) }
}
```
As we discussed with the tile color depths, the palette can be utilized as a
single block of palette values (`[u16; 256]`) or as 16 palbanks of 16 palette
values each (`[[u16;16]; 16]`). This setting is assigned per background layer
via IO register.
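The two ways of reading the palette come down to a small lookup. In 4bpp, a pixel value selects a slot within the palbank chosen by the screenblock entry. In 8bpp, the pixel value is the slot. Pixel value 0 is transparent in both cases. This sketch shows that mapping. `ColorDepth` and `bg_palette_slot` are hypothetical names.

```rust
/// How a background layer reads the palette.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColorDepth {
    /// 16 palbanks of 16 colors; the palbank comes from the screenblock entry.
    Bpp4 { palbank: u8 },
    /// One block of 256 colors.
    Bpp8,
}

/// Maps a tile pixel value to a slot in the 256-entry BG palette, or `None`
/// for the transparent value 0.
pub fn bg_palette_slot(pixel: u8, depth: ColorDepth) -> Option<usize> {
    if pixel == 0 {
        return None;
    }
    match depth {
        ColorDepth::Bpp4 { palbank } => {
            assert!(palbank < 16 && pixel < 16);
            Some(palbank as usize * 16 + pixel as usize)
        }
        ColorDepth::Bpp8 => Some(pixel as usize),
    }
}
```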
## Get Your Tiles Ready
Tile data is placed into charblocks. A charblock is always 16kb, so depending on
color depth it will have either 256 or 512 tiles within that charblock.
Charblocks 0, 1, 2, and 3 are all for background tiles. That's a maximum of 2048
tiles for backgrounds, but as you'll see in a moment a particular tilemap entry
can't even index that high. Instead, each background layer is assigned a
"character base block", and then tilemap entries index relative to the character
base block of that background layer.
Now, if you want to move in a lot of tile data you'll probably want to use a DMA
routine, or at least write a function like memcopy32 for fast `u32` copying from
ROM into VRAM. However, for now, and because we're being very explicit since
this is our first time doing it, we'll write it as functions for individual tile
reads and writes.
The math works like indexing a pointer, except that we have two sizes we need to
go by. First you take the base address for VRAM (`0x600_0000`), then add the
size of a charblock (16kb) times the charblock you want to place the tile
within, and then you add the index of the tile slot you're placing it into times
the size of that type of tile. Like this:
```rust
// Assumes `VRAM: usize = 0x600_0000`, plus the tile and charblock types
// (`Tile4bpp`, `Tile8bpp`, `Charblock4bpp`, `Charblock8bpp`) defined elsewhere.
use core::mem::size_of;
pub fn bg_tile_4bpp(base_block: usize, tile_index: usize) -> Tile4bpp {
    assert!(base_block < 4);
    assert!(tile_index < 512);
    let address = VRAM + size_of::<Charblock4bpp>() * base_block + size_of::<Tile4bpp>() * tile_index;
    unsafe { VolatilePtr(address as *mut Tile4bpp).read() }
}
pub fn set_bg_tile_4bpp(base_block: usize, tile_index: usize, tile: Tile4bpp) {
    assert!(base_block < 4);
    assert!(tile_index < 512);
    let address = VRAM + size_of::<Charblock4bpp>() * base_block + size_of::<Tile4bpp>() * tile_index;
    unsafe { VolatilePtr(address as *mut Tile4bpp).write(tile) }
}
pub fn bg_tile_8bpp(base_block: usize, tile_index: usize) -> Tile8bpp {
    assert!(base_block < 4);
    assert!(tile_index < 256);
    let address = VRAM + size_of::<Charblock8bpp>() * base_block + size_of::<Tile8bpp>() * tile_index;
    unsafe { VolatilePtr(address as *mut Tile8bpp).read() }
}
pub fn set_bg_tile_8bpp(base_block: usize, tile_index: usize, tile: Tile8bpp) {
    assert!(base_block < 4);
    assert!(tile_index < 256);
    let address = VRAM + size_of::<Charblock8bpp>() * base_block + size_of::<Tile8bpp>() * tile_index;
    unsafe { VolatilePtr(address as *mut Tile8bpp).write(tile) }
}
```
For bulk operations, you'd do the exact same math to get your base destination
pointer, and then you'd get the base source pointer for the tile you're copying
out of ROM, and then you'd do the bulk copy for the correct number of `u32`
values that you're trying to move (8 per tile moved for 4bpp, or 16 per tile
moved for 8bpp).
**GBA Limitation Note:** on a modern PC (eg: `x86` or `x86_64`) you're probably
used to index based loops and iterator based loops being the same speed. Those
CPUs have addressing modes that compute "base address plus index times element
size" as part of a single instruction, so with normal arrays it's basically the
same speed to index per loop cycle as it is to take a base address and then add
+1 offset per loop cycle (it's slightly more complicated with arrays within
arrays like there are here). The GBA's CPU is more limited in this regard,
especially when running Thumb code. On the GBA, there can be a genuine speed
difference between looping over indexes and then indexing each loop (slow)
compared to using an iterator that just stores an internal pointer and does +1
offset per loop until it reaches the end (fast). In Rust, each index operation
also carries a bounds check unless the optimizer can remove it, and iterators
avoid that too. If it's like a 3 element array it's no big deal, but if you've
got a big slice of data to process, be sure to go over it with `.iter()` and
`.iter_mut()` if you can, instead of looping by index. This is Rust and all,
so probably you were gonna do that anyway, but just a heads up.
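Putting the bulk-copy numbers and the iterator advice together, here is a minimal sketch of a word-by-word volatile copy that walks both sides with iterators instead of indexing. `copy_words_volatile` is a hypothetical helper. It takes a `&mut [u32]` destination so it can be tested on a PC. On hardware you would more likely walk a raw pointer into VRAM, or use DMA.

```rust
/// `u32` words per tile: 32 bytes at 4bpp, 64 bytes at 8bpp.
pub const WORDS_PER_TILE_4BPP: usize = 8;
pub const WORDS_PER_TILE_8BPP: usize = 16;

/// Copies `src` into `dst` one volatile `u32` write at a time.
pub fn copy_words_volatile(dst: &mut [u32], src: &[u32]) {
    assert_eq!(dst.len(), src.len());
    for (d, s) in dst.iter_mut().zip(src.iter()) {
        // SAFETY: `d` is a valid, aligned, exclusive reference.
        unsafe { core::ptr::write_volatile(d, *s) };
    }
}
```

Copying `n` 4bpp tiles means `n * WORDS_PER_TILE_4BPP` words in total.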
## Get your Tilemap ready
I believe that at one point I alluded to a tilemap existing. Well, just as the
tiles are arranged into charblocks, the data describing what tile to show in
what location is arranged into a thing called a **screenblock**.
A screenblock is placed into VRAM the same as the tile data charblocks. Starting
at the base of VRAM (`0x600_0000`) there are 32 slots for the screenblock array.
Each screenblock is 2048 bytes (`0x800`). Naturally, if our tiles are using up
charblock space within VRAM and our tilemaps are using up screenblock space
within the same VRAM... well it would just be a _disaster_ if they ran into
each other. Once again, it's up to you as the programmer to determine how much
space you want to devote to each thing. Each complete charblock uses up 8
screenblocks worth of space, but you don't have to fill a complete charblock
with tiles, so you can be very fiddly with how you split the memory.
Each screenblock is composed of a series of _screenblock entry_ values, which
describe what tile index to use and if the tile should be flipped and what
palbank it should use (if any). Because both regular backgrounds and affine
backgrounds are composed of screenblocks with entries, and because the affine
background has a smaller format for screenblock entries, we'll name our types
appropriately.
```rust
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct RegularScreenblock {
pub data: [RegularScreenblockEntry; 32 * 32],
}
#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct RegularScreenblockEntry(u16);
```
So, with one entry per tile, a single screenblock allows for 32x32 tiles worth of
background.
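A 32x32 grid of 8x8 tiles covers a 256x256 pixel area, which is larger than the 240x160 screen. This sketch shows the index and size arithmetic for one screenblock. The names are hypothetical.

```rust
pub const SCREENBLOCK_SIDE: usize = 32;
pub const ENTRY_BYTES: usize = 2; // one u16 per screenblock entry

/// Index of the entry for the tile at (col, row) within one screenblock.
pub const fn entry_index(col: usize, row: usize) -> usize {
    assert!(col < SCREENBLOCK_SIDE && row < SCREENBLOCK_SIDE);
    row * SCREENBLOCK_SIDE + col
}

/// 32 * 32 entries of 2 bytes each.
pub const SCREENBLOCK_BYTES: usize = SCREENBLOCK_SIDE * SCREENBLOCK_SIDE * ENTRY_BYTES;
```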
The format of a regular screenblock entry is quite simple compared to some of
the IO register stuff:
* 10 bits for tile index (relative to the character base block of the background)
* 1 bit for horizontal flip
* 1 bit for vertical flip
* 4 bits for picking which palbank to use (if 4bpp, otherwise it's ignored)
```rust
impl RegularScreenblockEntry {
    pub fn tile_id(self) -> u16 {
        self.0 & 0b11_1111_1111
    }
    pub fn set_tile_id(&mut self, id: u16) {
        self.0 &= !0b11_1111_1111;
        // mask the input so an oversized id can't spill into the flip bits
        self.0 |= id & 0b11_1111_1111;
    }
    pub fn horizontal_flip(self) -> bool {
        (self.0 & (1 << 0xA)) > 0
    }
    pub fn set_horizontal_flip(&mut self, bit: bool) {
        if bit {
            self.0 |= 1 << 0xA;
        } else {
            self.0 &= !(1 << 0xA);
        }
    }
    pub fn vertical_flip(self) -> bool {
        (self.0 & (1 << 0xB)) > 0
    }
    pub fn set_vertical_flip(&mut self, bit: bool) {
        if bit {
            self.0 |= 1 << 0xB;
        } else {
            self.0 &= !(1 << 0xB);
        }
    }
    pub fn palbank_index(self) -> u16 {
        self.0 >> 12
    }
    pub fn set_palbank_index(&mut self, palbank_index: u16) {
        // keep the low 12 bits, replace bits 12-15
        self.0 &= 0b1111_1111_1111;
        self.0 |= (palbank_index & 0b1111) << 12;
    }
}
```
Now, at either 256 or 512 tiles per charblock, you might be thinking that with a
10 bit index you can index past the end of one charblock and into the next.
You'd be right, mostly.
As long as you stay within the background memory region for charblocks (that is,
0 through 3), then it all works out. However, if you try to get the background
rendering to reach outside of the background charblocks you'll get an
implementation defined result. It's not the dreaded "undefined behavior" we're
often worried about in programming, but the results _are_ determined by what
you're running the game on. With GBA hardware you get a bizarre result
(basically another way to put garbage on the screen). With a DS it acts as if
the tiles were all 0s. If you use an emulator it might or might not allow for
you to do this, it's up to the emulator writers.
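Whether a given entry runs off the end of the background charblocks depends on the character base block, the tile index, and the tile size (32 bytes at 4bpp, 64 at 8bpp). This sketch does that check, using `0x601_0000` (the start of the object charblocks) as the limit. `reaches_past_bg_charblocks` is a hypothetical helper.

```rust
pub const VRAM_BASE: usize = 0x600_0000;
/// Object tiles start right after background charblocks 0..=3.
pub const OBJ_VRAM_BASE: usize = 0x601_0000;

/// True if the tile data for `tile_index` (10 bits) would extend past
/// charblock 3 for a background whose character base block is `char_base`.
pub const fn reaches_past_bg_charblocks(char_base: usize, tile_index: usize, tile_bytes: usize) -> bool {
    assert!(char_base < 4 && tile_index < 1024);
    let tile_start = VRAM_BASE + char_base * 0x4000 + tile_index * tile_bytes;
    tile_start + tile_bytes > OBJ_VRAM_BASE
}
```

With charblock 0 as the base, even index 1023 stays inside, at either depth. From charblock 3, a 4bpp index of 512 or more spills over.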
## Set Your IO Registers
Instead of being just a single IO register to learn about this time, there are two
separate groups of related registers.
### Background Control
* BG0CNT (`0x400_0008`): BG0 Control
* BG1CNT (`0x400_000A`): BG1 Control
* BG2CNT (`0x400_000C`): BG2 Control
* BG3CNT (`0x400_000E`): BG3 Control
Each of these is a read/write `u16` location. This is where we get to all of
the important details that we've been putting off.
* 2 bits for the priority.
* 2 bits for "character base block", the charblock that all of the tile indexes
for this background are offset from.
* 1 bit for mosaic effect being enabled (we'll get to that below).
* 1 bit to enable 8bpp, otherwise 4bpp is used.
* 5 bits to pick the "screen base block", the screen block that serves as the
_base_ value for this background.
* 1 bit that is _not_ used in regular mode, but in affine mode it can be enabled
to cause the affine background to wrap around at the edges.
* 2 bits for the background size.
The size works a little funny. When size is 0 only the base screen block is
used. If size is 1 or 2 then the base screenblock and the following screenblock
are placed next to each other (horizontally for 1, vertically for 2). If the
size is 3 then the base screenblock and the following three screenblocks are
arranged into a 2x2 grid of screenblocks.
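Here is a sketch of packing those fields into a BGxCNT value. The bit positions follow GBATEK: priority in bits 0 to 1, character base block in 2 to 3, bits 4 to 5 unused, mosaic in bit 6, 8bpp in bit 7, screen base block in 8 to 12, affine wrap in bit 13, and size in 14 to 15. It also includes the pixel dimensions for each size setting. `bg_control` and `regular_bg_dimensions` are hypothetical helpers.

```rust
/// Packs the BGxCNT fields listed above into a `u16`.
pub const fn bg_control(
    priority: u16,
    char_base_block: u16,
    mosaic: bool,
    eight_bpp: bool,
    screen_base_block: u16,
    affine_wrap: bool,
    size: u16,
) -> u16 {
    assert!(priority < 4 && char_base_block < 4 && screen_base_block < 32 && size < 4);
    priority
        | char_base_block << 2
        // bits 4-5 are unused
        | (mosaic as u16) << 6
        | (eight_bpp as u16) << 7
        | screen_base_block << 8
        | (affine_wrap as u16) << 13
        | size << 14
}

/// Pixel dimensions of a regular background for each size setting.
pub const fn regular_bg_dimensions(size: u16) -> (u32, u32) {
    match size {
        0 => (256, 256), // one screenblock
        1 => (512, 256), // two, side by side
        2 => (256, 512), // two, stacked
        3 => (512, 512), // four, in a 2x2 grid
        _ => panic!("size must be 0..=3"),
    }
}
```

`bg_control(0, 0, false, false, 8, false, 0)` is `0x0800`, the same value that `BackgroundControlSetting::from_base_block(8)` produces in the memory game chapter.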
### Background Offset
* BG0HOFS (`0x400_0010`): BG0 X-Offset
* BG0VOFS (`0x400_0012`): BG0 Y-Offset
* BG1HOFS (`0x400_0014`): BG1 X-Offset
* BG1VOFS (`0x400_0016`): BG1 Y-Offset
* BG2HOFS (`0x400_0018`): BG2 X-Offset
* BG2VOFS (`0x400_001A`): BG2 Y-Offset
* BG3HOFS (`0x400_001C`): BG3 X-Offset
* BG3VOFS (`0x400_001E`): BG3 Y-Offset
Each of these is a _write only_ `u16` location. Bits 0 through 8 are used, so
the offsets can be 0 through 511. They also only apply in regular backgrounds.
If a background is in an affine state then you'll use different IO registers to
control it (discussed in a later chapter).
The offset that you assign determines the pixel offset of the display area
relative to the start of the background scene, as if the screen was a camera
looking at the scene. In other words, as a BG X offset value increases, you can
think of it as the camera moving to the right, or as that background moving to
the left. Like when Mario walks toward the goal. Similarly, when a BG Y offset
increases the camera is moving down, or the background is moving up, like when
Mario falls down from a high platform.
Depending on how much the background is scrolled and the size of the background,
it will loop.
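Along one axis, the looping is modular arithmetic: the screen coordinate plus the offset, wrapped at the background's size (256 or 512 pixels). This sketch applies that for a regular background. It also masks the offset to the 9 bits the register keeps. `bg_coordinate` is a hypothetical helper.

```rust
/// Which background pixel lands on screen pixel `screen` along one axis,
/// for a background `bg_size` pixels long (256 or 512) scrolled by `offset`.
pub const fn bg_coordinate(screen: u32, offset: u32, bg_size: u32) -> u32 {
    (screen + (offset & 0x1FF)) % bg_size
}
```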
## Mosaic
As a special effect, you can apply mosaic to backgrounds and objects. It's just
a single flag for each background, so all backgrounds will use the same mosaic
settings when they have it enabled. What it actually does is split the normal
image into "blocks" and then each block gets the color of the top left pixel of
that block. This is the effect you see when Link hits an electric foe with his
sword and the whole screen "buzzes" at you.
The mosaic control is a _write only_ `u16` IO register at `0x400_004C`.
There's 4 bits each for:
* Horizontal BG stretch
* Vertical BG stretch
* Horizontal object stretch
* Vertical object stretch
The inputs should be 1 _less_ than the desired block size. So if you set a
stretch value of 5 then pixels 0-5 would be part of the first block (6 pixels),
then 6-11 is the next block (another 6 pixels) and so on.
If you need to make a pixel other than the top left part of each block the one
that determines the mosaic color you can carefully offset the background or
image by a tiny bit, but of course that makes every mosaic block change its
target pixel. You can't change the target pixel on a block by block basis.
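This sketch packs the mosaic register and finds the pixel a block copies from along one axis. The nibble order follows the list above: BG horizontal in bits 0 to 3, then BG vertical, then object horizontal, then object vertical. It ignores scroll offsets. `mosaic` and `mosaic_source` are hypothetical helpers.

```rust
/// Packs the four 4-bit stretch values (each one less than the block size).
pub const fn mosaic(bg_h: u16, bg_v: u16, obj_h: u16, obj_v: u16) -> u16 {
    assert!(bg_h < 16 && bg_v < 16 && obj_h < 16 && obj_v < 16);
    bg_h | bg_v << 4 | obj_h << 8 | obj_v << 12
}

/// The pixel whose color a mosaic block copies: the first pixel of its block.
pub const fn mosaic_source(pixel: u32, stretch: u32) -> u32 {
    let block = stretch + 1;
    pixel - pixel % block
}
```

With a stretch of 5, pixels 0 to 5 all take pixel 0's color, and pixels 6 to 11 take pixel 6's color.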

@ -1,417 +0,0 @@
# Regular Objects
As with backgrounds, objects can be used in both an affine and non-affine way.
For this section we'll focus on the non-affine elements, and then we'll do all
the affine stuff in a later chapter.
## Objects vs Sprites
As [TONC](https://www.coranac.com/tonc/text/regobj.htm) helpfully reminds us
(and then proceeds to not follow its own advice), we should always try to think
in terms of _objects_, not _sprites_. A sprite is a logical / software concern,
perhaps a player concern, whereas an object is a hardware concern.
What's more, a given sprite that the player sees might need more than one object
to display. Objects must be either square or rectangular (so sprite bits that
stick out probably call for a second object), and can only be from 8x8 to 64x64
(so anything bigger has to be two objects lined up to appear as one).
## General Object Info
Unlike with backgrounds, you can enable the object layer in any video mode.
There's space for 128 object definitions in OAM.
The display gets a number of cycles per scanline to process objects: 1210 by
default, but only 954 if you enable the "HBlank interval free" setting in the
display control register. The [cycle cost per
object](http://problemkaputt.de/gbatek.htm#lcdobjoverview) depends on the
object's size and whether it's using affine or regular mode, so enabling the
HBlank interval free setting doesn't reduce the number of displayable objects by
any fixed amount. The objects are processed in order of their definitions and
if you run out of cycles then the rest just don't get shown. If there's a
concern that you might run out of cycles you can place important objects (such
as the player) at the start of the list and then less important animation
objects later on.
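The "processed in order until the cycles run out" behavior can be modeled as a running total. This is a simplified sketch: the per-object costs are inputs, because the real costs depend on size and affine mode (see the GBATEK link above) and are not computed here. `objects_processed` is a hypothetical helper. The 1210 and 954 budgets come from the text.

```rust
pub const OBJ_CYCLES_DEFAULT: u32 = 1210;
pub const OBJ_CYCLES_HBLANK_FREE: u32 = 954;

/// How many objects, in OAM order, fit in the per-scanline budget.
pub fn objects_processed(costs: &[u32], budget: u32) -> usize {
    let mut spent = 0u32;
    costs
        .iter()
        .take_while(|&&cost| {
            spent += cost;
            spent <= budget
        })
        .count()
}
```

Whatever comes first in the list is the safest from being cut, which is why the player belongs at the front.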
## Ready the Palette
Objects use the palette the same as the background does. The only difference is
that the palette data for objects starts at `0x500_0200`.
```rust
pub const PALRAM_OBJECT_BASE: VolatilePtr<u16> = VolatilePtr(0x500_0200 as *mut u16);
pub fn object_palette(slot: usize) -> u16 {
assert!(slot < 256);
unsafe { PALRAM_OBJECT_BASE.offset(slot as isize).read() }
}
pub fn set_object_palette(slot: usize, color: u16) {
assert!(slot < 256);
unsafe { PALRAM_OBJECT_BASE.offset(slot as isize).write(color) }
}
```
## Ready the Tiles
Objects, as with backgrounds, are composed of 8x8 tiles, and if you want
something bigger than 8x8 you have to use more than one tile put together.
Object tiles go into the final two charblocks of VRAM (indexes 4 and 5). Because
there's only two of them, they are sometimes called the lower block
(`0x601_0000`) and the higher/upper block (`0x601_4000`).
Tile indexes for objects always offset from the base of the lower block, and
they always go 32 bytes at a time, regardless of whether the object is set for
4bpp or 8bpp. From this we can determine that there are 512 tile slots in each of the
two object charblocks. However, in video modes 3, 4, and 5 the space for the
background cuts into the lower charblock, so you can only safely use the upper
charblock.
```rust
pub fn obj_tile_4bpp(tile_index: usize) -> Tile4bpp {
    // 512 slots per object charblock, two charblocks: indexes 0..1024
    assert!(tile_index < 1024);
    let address = VRAM + size_of::<Charblock4bpp>() * 4 + 32 * tile_index;
    unsafe { VolatilePtr(address as *mut Tile4bpp).read() }
}
pub fn set_obj_tile_4bpp(tile_index: usize, tile: Tile4bpp) {
    assert!(tile_index < 1024);
    let address = VRAM + size_of::<Charblock4bpp>() * 4 + 32 * tile_index;
    unsafe { VolatilePtr(address as *mut Tile4bpp).write(tile) }
}
pub fn obj_tile_8bpp(tile_index: usize) -> Tile8bpp {
    // an 8bpp tile fills two 32-byte slots, so only even indexes are used
    assert!(tile_index < 1024 && tile_index % 2 == 0);
    let address = VRAM + size_of::<Charblock8bpp>() * 4 + 32 * tile_index;
    unsafe { VolatilePtr(address as *mut Tile8bpp).read() }
}
pub fn set_obj_tile_8bpp(tile_index: usize, tile: Tile8bpp) {
    assert!(tile_index < 1024 && tile_index % 2 == 0);
    let address = VRAM + size_of::<Charblock8bpp>() * 4 + 32 * tile_index;
    unsafe { VolatilePtr(address as *mut Tile8bpp).write(tile) }
}
```
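Slot addresses are always `0x601_0000 + 32 * index`. In the bitmap modes (3, 4, and 5), only the upper block, slots 512 and up, is safe. This sketch captures both rules. `obj_tile_address` and `obj_tile_usable` are hypothetical helpers.

```rust
pub const OBJ_TILE_BASE: usize = 0x601_0000;

/// Address of object tile slot `tile_index` (slots are 32 bytes apart).
pub const fn obj_tile_address(tile_index: usize) -> usize {
    assert!(tile_index < 1024);
    OBJ_TILE_BASE + 32 * tile_index
}

/// Whether a slot is safe to use in the given video mode: the bitmap modes
/// overlap the lower object charblock.
pub const fn obj_tile_usable(tile_index: usize, video_mode: u8) -> bool {
    if video_mode >= 3 {
        tile_index >= 512 && tile_index < 1024
    } else {
        tile_index < 1024
    }
}
```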
With backgrounds you picked every single tile individually with a bunch of
screen entry values. Objects don't do that at all. Instead you pick a base tile,
size, and shape, then it figures out the rest from there. However, you may
recall back with the display control register something about an "object memory
1d" bit. This is where that comes into play.
* If object memory is set to be 2d (the default) then the object tile memory
(both charblocks together) is treated as a grid 32 tiles wide and 32 tiles
tall. Each object has a base tile and dimensions, and that just extracts
directly from that picture as if you were selecting an area. This mode probably
makes for the easiest image editing.
* If object memory is set to be 1d then the tiles are loaded sequentially from
the starting point, enough to fill in the object's dimensions. This probably
makes it the easiest to program with, since programming languages are pretty
good at 1d things.
I'm not sure I explained that well, so here's a picture:
![2d1d-diagram](obj_memory_2d1d.jpg)
In 2d mode, a new row of tiles starts every 32 tile indexes.
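To make the difference concrete, here is a small sketch (the function name is made up for this example, it isn't part of the crate) that computes which tile index the hardware would read for one tile inside an object. It assumes 4bpp tiles, where each tile takes exactly one 32-byte slot; 8bpp objects use two slots per tile, so the arithmetic would need adjusting for them.
```rust
/// Hypothetical helper: the tile index read for the tile at
/// (`tile_row`, `tile_col`) inside an object, assuming 4bpp tiles.
/// `width_tiles` is the object's width in tiles (a 16px wide object is 2).
pub fn obj_tile_index_4bpp(
    base_tile: u16,
    width_tiles: u16,
    tile_row: u16,
    tile_col: u16,
    one_dimensional: bool,
) -> u16 {
    if one_dimensional {
        // 1d: the object's tiles are stored one after another, row by row.
        base_tile + tile_row * width_tiles + tile_col
    } else {
        // 2d: tile memory is a grid 32 tiles wide, so each row of the
        // object starts 32 tile indexes after the previous one.
        base_tile + tile_row * 32 + tile_col
    }
}
```
For a 16x16 object (2 tiles wide) with base tile 4, the lower-right tile is index 7 in 1d mode but index 37 in 2d mode.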
Of course, the mode that you actually end up using is not particularly
important, since it should be the job of your image conversion routine to get
everything all lined up and into place anyway.
## Set the Object Attributes
The final step is to assign the correct attributes to an object. Each object has
three `u16` values that make up its overall attributes.
Before we go into the details, I want to bring up that the hardware will attempt
to process every single object every single frame if the object layer is
enabled, and also that all of the GBA's object memory is cleared to 0 at
startup. Why do these two things matter right now? As you'll see in a second an
"all zero" set of object attributes causes an 8x8 object to appear at 0,0 using
object tile index 0. This is usually _not_ what you want your unused objects to
do. When your game first starts you should take a moment to mark every object
you won't be using as disabled, so that the hardware doesn't render it.
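One way to handle this, sketched below under the assumption that you keep a shadow copy of OAM in regular RAM (the `OamShadow` type, `ATTR0_DISABLED` constant and `hide_all_objects` function are hypothetical), is to set every slot's attr0 to rendering mode 2 ("Disabled", bits 8-9 as listed in the attr0 section) and later copy the whole shadow into OAM with volatile `u16` writes during VBlank.
```rust
/// attr0 with rendering mode 2 ("Disabled") in bits 8-9: the object isn't drawn.
pub const ATTR0_DISABLED: u16 = 0b10 << 8;

/// Hypothetical RAM copy of OAM: 128 slots of four `u16`s each
/// (attr0, attr1, attr2, and a fourth halfword that holds affine data).
pub type OamShadow = [[u16; 4]; 128];

/// Marks every object as disabled. Only attr0 is touched, so the
/// fourth halfword of each slot (affine data) is left alone.
pub fn hide_all_objects(oam: &mut OamShadow) {
    for slot in oam.iter_mut() {
        slot[0] = ATTR0_DISABLED;
    }
}
```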
### ObjectAttributes.attr0
* 8 bits for row coordinate (marks the top of the sprite)
* 2 bits for object rendering: 0 = Normal, 1 = Affine, 2 = Disabled, 3 = Affine with double rendering area
* 2 bits for object mode: 0 = Normal, 1 = Alpha Blending, 2 = Object Window, 3 = Forbidden
* 1 bit for mosaic enabled
* 1 bit for 8bpp color enabled
* 2 bits for shape: 0 = Square, 1 = Horizontal, 2 = Vertical, 3 = Forbidden
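If an object's display area is 128 pixels tall (which only happens with a
64 pixel tall object in double-area affine mode) and its Y is above 128, you'll
get a strange looking result: it acts like Y > -128 and then displays partly off
screen to the top.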
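Packing those fields by hand is just shifting and masking. This is a hypothetical helper for illustration only, with each field passed as a plain number that follows the encodings in the list (for example `rendering` 2 means Disabled and `shape` 1 means Horizontal); each value is masked so it can't overwrite a neighboring field.
```rust
/// Hypothetical helper that packs the attr0 fields into one `u16`.
pub fn pack_attr0(row: u8, rendering: u16, mode: u16, mosaic: bool, bpp8: bool, shape: u16) -> u16 {
    (row as u16)                  // bits 0-7: row (Y)
        | (rendering & 0b11) << 8 // bits 8-9: rendering
        | (mode & 0b11) << 10     // bits 10-11: object mode
        | (mosaic as u16) << 12   // bit 12: mosaic
        | (bpp8 as u16) << 13     // bit 13: 8bpp color
        | (shape & 0b11) << 14    // bits 14-15: shape
}
```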
### ObjectAttributes.attr1
* 9 bits for column coordinate (marks the left of the sprite)
* Either:
* 3 empty bits, 1 bit for horizontal flip, 1 bit for vertical flip (non-affine)
* 5 bits for affine index (affine)
* 2 bits for size.
| Size | Square | Horizontal | Vertical|
|:----:|:------:|:----------:|:-------:|
| 0 | 8x8 | 16x8 | 8x16 |
| 1 | 16x16 | 32x8 | 8x32 |
| 2 | 32x32 | 32x16 | 16x32 |
| 3 | 64x64 | 64x32 | 32x64 |
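The size bits only mean something together with the shape bits from attr0. Here is a lookup-table sketch of the table above; the function name is made up for this example, and shape and size are passed as their raw 2-bit values.
```rust
/// Pixel (width, height) for a shape (0 = Square, 1 = Horizontal,
/// 2 = Vertical) and size (0..=3). Returns `None` for the forbidden
/// shape value 3 or an out-of-range size.
pub fn obj_dimensions(shape: u16, size: u16) -> Option<(u16, u16)> {
    const TABLE: [[(u16, u16); 4]; 3] = [
        [(8, 8), (16, 16), (32, 32), (64, 64)],  // Square
        [(16, 8), (32, 8), (32, 16), (64, 32)],  // Horizontal
        [(8, 16), (8, 32), (16, 32), (32, 64)],  // Vertical
    ];
    TABLE.get(shape as usize)?.get(size as usize).copied()
}
```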
### ObjectAttributes.attr2
* 10 bits for the base tile index
* 2 bits for priority
* 4 bits for the palbank index (4bpp mode only, ignored in 8bpp)
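Like attr0, attr2 can be assembled from its parts with shifts and masks. This hypothetical helper masks each argument to its field width so that an out-of-range value can't spill into the next field.
```rust
/// Hypothetical helper that packs attr2.
pub fn pack_attr2(base_tile: u16, priority: u16, palbank: u16) -> u16 {
    (base_tile & 0b11_1111_1111)   // bits 0-9: base tile index
        | (priority & 0b11) << 10  // bits 10-11: priority
        | (palbank & 0b1111) << 12 // bits 12-15: palbank (4bpp only)
}
```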
### ObjectAttributes summary
So I said in the GBA memory mapping section that C people would tell you that
the object attributes should look like this:
```rust
#[repr(C)]
pub struct ObjectAttributes {
attr0: u16,
attr1: u16,
attr2: u16,
filler: i16,
}
```
Except that:
1) It's wasteful when we store object attributes on their own outside of OAM
(which we definitely might want to do).
2) In Rust we can't access just one field through a volatile pointer (our
pointers aren't actually volatile to begin with, just the ops we do with them
are). We have to read or write the whole pointer's value at a time.
Similarly, we can't do things like `|=` and `&=` with volatile in Rust. So in
Rust we can't have a volatile pointer to an ObjectAttributes and then write
to just the three "real" values and not touch the filler field. Having the
filler value in there just means we have to dance around it more, not less.
3) We want to newtype this whole thing to prevent accidental invalid states from
being written into memory.
So we will not be using that representation. At the same time we want to have no
overhead, so we will stick to three `u16` values. We could newtype each
individual field to be its own type (`ObjectAttributesAttr0` or something silly
like that), since there aren't actual dependencies between two different fields
such that a change in one can throw another into a forbidden state. The worst
that can happen is if we disable or enable affine mode (`attr0`) it can change
the meaning of `attr1`. The changed meaning isn't actually an invalid state
though, so we _could_ make each field its own type if we wanted.
However, when you think about it, I can't imagine a common situation where we do
something like make an `attr0` value that we then want to save on its own and
apply to several different `ObjectAttributes` that we make during a game. That
just doesn't sound likely to me. So, we'll go the route where `ObjectAttributes`
is just a big black box to the outside world and we don't need to think about
the three fields internally as being separate.
First we make it so that we can get and set object attributes from memory:
```rust
pub const OAM: usize = 0x700_0000;
pub fn object_attributes(slot: usize) -> ObjectAttributes {
assert!(slot < 128);
let ptr = VolatilePtr((OAM + slot * (size_of::<u16>() * 4)) as *mut u16);
unsafe {
ObjectAttributes {
attr0: ptr.read(),
attr1: ptr.offset(1).read(),
attr2: ptr.offset(2).read(),
}
}
}
pub fn set_object_attributes(slot: usize, obj: ObjectAttributes) {
assert!(slot < 128);
let ptr = VolatilePtr((OAM + slot * (size_of::<u16>() * 4)) as *mut u16);
unsafe {
ptr.write(obj.attr0);
ptr.offset(1).write(obj.attr1);
ptr.offset(2).write(obj.attr2);
}
}
#[derive(Debug, Clone, Copy, Default)]
pub struct ObjectAttributes {
attr0: u16,
attr1: u16,
attr2: u16,
}
```
Then we add a billion methods to the `ObjectAttributes` type so that we can
actually set all the different values that we want to set.
This code block is the last thing on this page so if you don't wanna scroll past
the whole thing you can just go to the next page.
```rust
#[derive(Debug, Clone, Copy)]
pub enum ObjectRenderMode {
Normal,
Affine,
Disabled,
DoubleAreaAffine,
}
#[derive(Debug, Clone, Copy)]
pub enum ObjectMode {
Normal,
AlphaBlending,
ObjectWindow,
}
#[derive(Debug, Clone, Copy)]
pub enum ObjectShape {
Square,
Horizontal,
Vertical,
}
#[derive(Debug, Clone, Copy)]
pub enum ObjectOrientation {
Normal,
HFlip,
VFlip,
BothFlip,
Affine(u8),
}
impl ObjectAttributes {
pub fn row(&self) -> u16 {
self.attr0 & 0b1111_1111
}
pub fn column(&self) -> u16 {
self.attr1 & 0b1_1111_1111
}
pub fn rendering(&self) -> ObjectRenderMode {
match (self.attr0 >> 8) & 0b11 {
0 => ObjectRenderMode::Normal,
1 => ObjectRenderMode::Affine,
2 => ObjectRenderMode::Disabled,
3 => ObjectRenderMode::DoubleAreaAffine,
_ => unimplemented!(),
}
}
pub fn mode(&self) -> ObjectMode {
match (self.attr0 >> 0xA) & 0b11 {
0 => ObjectMode::Normal,
1 => ObjectMode::AlphaBlending,
2 => ObjectMode::ObjectWindow,
_ => unimplemented!(),
}
}
pub fn mosaic(&self) -> bool {
((self.attr0 << 3) as i16) < 0
}
pub fn two_fifty_six_colors(&self) -> bool {
((self.attr0 << 2) as i16) < 0
}
pub fn shape(&self) -> ObjectShape {
match (self.attr0 >> 0xE) & 0b11 {
0 => ObjectShape::Square,
1 => ObjectShape::Horizontal,
2 => ObjectShape::Vertical,
_ => unimplemented!(),
}
}
pub fn orientation(&self) -> ObjectOrientation {
if (self.attr0 >> 8) & 1 > 0 {
ObjectOrientation::Affine((self.attr1 >> 9) as u8 & 0b1_1111)
} else {
match (self.attr1 >> 0xC) & 0b11 {
0 => ObjectOrientation::Normal,
1 => ObjectOrientation::HFlip,
2 => ObjectOrientation::VFlip,
3 => ObjectOrientation::BothFlip,
_ => unimplemented!(),
}
}
}
pub fn size(&self) -> u16 {
self.attr1 >> 0xE
}
pub fn tile_index(&self) -> u16 {
self.attr2 & 0b11_1111_1111
}
pub fn priority(&self) -> u16 {
(self.attr2 >> 0xA) & 0b11
}
pub fn palbank(&self) -> u16 {
self.attr2 >> 0xC
}
// Setters
pub fn set_row(&mut self, row: u16) {
self.attr0 &= !0b1111_1111;
self.attr0 |= row & 0b1111_1111;
}
pub fn set_column(&mut self, col: u16) {
self.attr1 &= !0b1_1111_1111;
self.attr1 |= col & 0b1_1111_1111;
}
pub fn set_rendering(&mut self, rendering: ObjectRenderMode) {
const RENDERING_MASK: u16 = 0b11 << 8;
self.attr0 &= !RENDERING_MASK;
self.attr0 |= (rendering as u16) << 8;
}
pub fn set_mode(&mut self, mode: ObjectMode) {
const MODE_MASK: u16 = 0b11 << 0xA;
self.attr0 &= !MODE_MASK;
self.attr0 |= (mode as u16) << 0xA;
}
pub fn set_mosaic(&mut self, bit: bool) {
const MOSAIC_BIT: u16 = 1 << 0xC;
if bit {
self.attr0 |= MOSAIC_BIT
} else {
self.attr0 &= !MOSAIC_BIT
}
}
pub fn set_two_fifty_six_colors(&mut self, bit: bool) {
const COLOR_MODE_BIT: u16 = 1 << 0xD;
if bit {
self.attr0 |= COLOR_MODE_BIT
} else {
self.attr0 &= !COLOR_MODE_BIT
}
}
pub fn set_shape(&mut self, shape: ObjectShape) {
self.attr0 &= 0b0011_1111_1111_1111;
self.attr0 |= (shape as u16) << 0xE;
}
pub fn set_orientation(&mut self, orientation: ObjectOrientation) {
const AFFINE_INDEX_MASK: u16 = 0b1_1111 << 9;
self.attr1 &= !AFFINE_INDEX_MASK;
let bits = match orientation {
ObjectOrientation::Affine(index) => (index as u16) << 9,
ObjectOrientation::Normal => 0,
ObjectOrientation::HFlip => 1 << 0xC,
ObjectOrientation::VFlip => 1 << 0xD,
ObjectOrientation::BothFlip => 0b11 << 0xC,
};
self.attr1 |= bits;
}
pub fn set_size(&mut self, size: u16) {
self.attr1 &= 0b0011_1111_1111_1111;
self.attr1 |= size << 14;
}
pub fn set_tile_index(&mut self, index: u16) {
self.attr2 &= !0b11_1111_1111;
self.attr2 |= 0b11_1111_1111 & index;
}
pub fn set_priority(&mut self, priority: u16) {
self.attr2 &= !0b0000_1100_0000_0000;
self.attr2 |= (priority & 0b11) << 0xA;
}
pub fn set_palbank(&mut self, palbank: u16) {
self.attr2 &= !0b1111_0000_0000_0000;
self.attr2 |= (palbank & 0b1111) << 0xC;
}
}
```
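Every setter above follows the same clear-then-OR pattern: clear the field with `&= !MASK`, then OR in the new, shifted value. Leaving off the `!` is an easy slip that wipes out every *other* field instead. As a sketch, the pattern can be written once as a standalone helper (`replace_bits` is hypothetical, not part of the crate):
```rust
/// Hypothetical helper for the clear-then-OR pattern.
/// `mask` selects the field's bits (already shifted into place) and
/// `value` is the new field value before shifting.
pub fn replace_bits(field: u16, mask: u16, shift: u32, value: u16) -> u16 {
    // Clear the old field with `& !mask`, then OR in the new value,
    // masked so it can't overflow into neighboring fields.
    (field & !mask) | ((value << shift) & mask)
}
```
With it, a setter like `set_mode` could become `self.attr0 = replace_bits(self.attr0, 0b11 << 0xA, 0xA, mode as u16)`.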

# The Display Control Register
The display control register is our first actual IO Register. GBATEK gives it the
shorthand [DISPCNT](http://problemkaputt.de/gbatek.htm#lcdiodisplaycontrol), so
you might see it under that name if you read other guides.
Among IO Registers, it's one of the simpler ones, but it's got enough complexity
that we can get a hint of what's to come.
Also it's the one that you basically always need to set at least once in every
GBA game, so it's a good starting one to go over for that reason too.
The display control register holds a `u16` value, and is located at `0x0400_0000`.
Many of the bits here won't mean much to you right now. **That is fine.** You do
NOT need to memorize them all or what they all do right away. We'll just skim
over all the parts of this register to start, and then we'll go into more detail
in later chapters when we need to come back and use more of the bits.
## Video Modes
The lowest three bits (0-2) let you select from among the GBA's six video modes.
You'll notice that 3 bits allow for eight modes, but the values 6 and 7 are
prohibited.
Modes 0, 1, and 2 are "tiled" modes. These are actually the modes that you
should eventually learn to use as much as possible. They let the GBA's limited
video hardware do as much of the work as possible, leaving more of your CPU time
for gameplay computations. However, they're also complex enough to deserve their
own demos and chapters later on, so that's all we'll say about them for now.
Modes 3, 4, and 5 are "bitmap" modes. These let you write individual pixels to
locations on the screen.
* **Mode 3** is full resolution (240w x 160h) RGB15 color. You might not be used
to RGB15, since modern computers have 24 or 32 bit colors. In RGB15, there's 5
bits for each color channel stored within a `u16` value, and the highest bit is
simply ignored.
* **Mode 4** is full resolution paletted color. Instead of being a `u16` color, each
pixel value is a `u8` palette index entry, and then the display uses the
palette memory (which we'll talk about later) to store the actual color data.
Since each pixel is half sized, we can fit twice as many. This lets us have
two "pages". At any given moment only one page is active, and you can draw to
the other page without the user noticing. You set which page to show with
another bit we'll get to in a moment.
* **Mode 5** is full color, but also with pages. This means that we must have a
reduced resolution to compensate (video memory is only so big!). The screen is
effectively only 160w x 128h in this mode.
## CGB Mode
Bit 3 is effectively read only. Technically it can be flipped using a BIOS call,
but when you write to the display control register normally it won't write to
this bit, so we'll call it effectively read only.
This bit is on if the CPU is in CGB mode.
## Page Flipping
Bit 4 lets you pick which page to use. This is only relevant in video modes 4 or
5, and is just ignored otherwise. It's very easy to remember: when the bit is 0
the 0th page is used, and when the bit is 1 the 1st page is used.
The second page always starts at `0x0600_A000`.
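Double buffering in modes 4 and 5 comes down to drawing into whichever page is hidden and then toggling bit 4. Here's a sketch operating on a plain DISPCNT value; the constant and function names are assumptions for illustration, and actually applying the result would be a volatile `u16` write to `0x0400_0000`.
```rust
/// DISPCNT bit 4: which page is displayed in modes 4 and 5.
pub const DISPCNT_PAGE_BIT: u16 = 1 << 4;
pub const PAGE0_BASE: usize = 0x0600_0000;
pub const PAGE1_BASE: usize = 0x0600_A000;

/// Flips the displayed page, leaving every other DISPCNT bit alone.
pub fn flip_page(dispcnt: u16) -> u16 {
    dispcnt ^ DISPCNT_PAGE_BIT
}

/// Base address of the page that is *not* being shown, i.e. the one
/// that's safe to draw into.
pub fn back_buffer_base(dispcnt: u16) -> usize {
    if (dispcnt & DISPCNT_PAGE_BIT) != 0 {
        PAGE0_BASE
    } else {
        PAGE1_BASE
    }
}
```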
## OAM, VRAM, and Blanking
Bit 5 lets you access OAM during HBlank if enabled. This is cool, but it reduces
the maximum sprites per scanline, so it's not the default.
Bit 6 lets you adjust if the GBA should treat Object Character VRAM as being 2d
(off) or 1d (on). This particular control can be kinda tricky to wrap your head
around, so we'll be sure to have some extra diagrams in the chapter that deals
with it.
Bit 7 forces the screen to stay in VBlank as long as it's set. This allows the
fastest use of the VRAM, Palette, and Object Attribute Memory. Obviously if you
leave this on for too long the player will notice a blank screen, but it might
be okay to use for a moment or two every once in a while.
## Screen Layers
Bits 8 through 11 control whether background layers 0 through 3 are active.
Bit 12 does the same for the object layer.
Note that not all background layers are available in all video modes:
* Mode 0: all
* Mode 1: 0/1/2
* Mode 2: 2/3
* Mode 3/4/5: 2
Bits 13 and 14 enable the display of Windows 0 and 1, and Bit 15 enables the
object display window. We'll get into how windows work later on; they let you
do some nifty graphical effects.
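The layer list above can be turned into a small lookup. This is an illustrative sketch with a made-up function name, taking the video mode and background layer as plain numbers:
```rust
/// Whether background layer `bg` (0..=3) can be displayed in video `mode`.
pub fn bg_available(mode: u16, bg: u16) -> bool {
    match mode {
        0 => bg <= 3,
        1 => bg <= 2,
        2 => bg == 2 || bg == 3,
        3 | 4 | 5 => bg == 2,
        _ => false, // modes 6 and 7 are prohibited
    }
}
```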
## In Conclusion...
So what did we do to the display control register in `hello1`?
```rust
unsafe { (0x0400_0000 as *mut u16).write_volatile(0x0403) };
```
First let's [convert that to
binary](https://www.wolframalpha.com/input/?i=0x0403), and we get
`0b100_0000_0011`. So, that's setting Mode 3 with background 2 enabled and
nothing else special.
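Rather than writing a magic number, the same value can be assembled from named bits. The constant and function names below are hypothetical, chosen just for this sketch, but the bit positions are the ones described in this chapter.
```rust
pub const MODE_3: u16 = 3;            // bits 0-2: video mode
pub const OBJ_1D: u16 = 1 << 6;       // bit 6: 1d object tile mapping
pub const DISPLAY_BG2: u16 = 1 << 10; // bits 8-11: BG0-BG3 enable
pub const DISPLAY_OBJ: u16 = 1 << 12; // bit 12: object layer enable

/// The video mode stored in the low three bits.
pub const fn video_mode(dispcnt: u16) -> u16 {
    dispcnt & 0b111
}

/// Whether background layer `bg` (0..=3) is enabled.
pub const fn bg_enabled(dispcnt: u16, bg: u16) -> bool {
    (dispcnt & (1u16 << (8 + bg))) != 0
}
```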

# The Key Input Register
The Key Input Register is our next IO register. Its shorthand name is
[KEYINPUT](http://problemkaputt.de/gbatek.htm#gbakeypadinput) and it's a `u16`
at `0x4000130`. The entire register is obviously read only, you can't tell the
GBA what buttons are pressed.
Each button is exactly one bit:
| Bit | Button |
|:---:|:------:|
| 0 | A |
| 1 | B |
| 2 | Select |
| 3 | Start |
| 4 | Right |
| 5 | Left |
| 6 | Up |
| 7 | Down |
| 8 | R |
| 9 | L |
Bits 10 through 15 are not used at all.
Similar to other old hardware devices, the convention here is that a button's
bit is **clear when pressed, active when released**. In other words, when the
user is not touching the device at all the KEYINPUT value will read
`0b0000_0011_1111_1111`. Even when the user presses as many buttons as
possible the value can never reach 0, because left/right and up/down are on a
single arrow pad and you can't ever press every single key at once.
When dealing with key input, the register always shows the exact key values at
any moment you read it. Obviously that's what it should do, but what it means to
you as a programmer is that you should usually gather input once at the top of a
game frame and then use that single input poll as the input values across the
whole game frame.
Of course, you might want to know if a user's key state changed from frame to
frame. That's fairly easy too: We just store the last frame keys as well as the
current frame keys (it's only a `u16`) and then we can xor the two values.
Anything that shows up in the xor result is a key that changed. If it's changed
and it's now down, that means it was pushed this frame. If it's changed and it's
now up, that means it was released this frame.
The other major thing you might frequently want is to know "which way" the arrow
pad is pointing: Up/Down/None and Left/Right/None. Sounds like an enum to me.
Except that we'll often have situations where the direction just needs to
be multiplied by a speed and applied as a delta to a position. We want to
support that as well as we can too.
## Key Input Code
Let's get down to some code. First we want to make a way to read the address as
a `u16` and then wrap that in our newtype which will implement methods for
reading and writing the key bits.
```rust
pub const KEYINPUT: VolatilePtr<u16> = VolatilePtr(0x400_0130 as *mut u16);
/// A newtype over the key input state of the GBA.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
#[repr(transparent)]
pub struct KeyInputSetting(u16);
pub fn key_input() -> KeyInputSetting {
unsafe { KeyInputSetting(KEYINPUT.read()) }
}
```
Now we want a way to check if a key is _being pressed_, since that's normally
how we think of things as a game designer and even as a player. That is, usually
you'd say "if you press A, then X happens" instead of "if you don't press A,
then X does not happen".
Normally we'd pick a constant for the bit we want, `&` it with our value, and
then check for `val != 0`. Since the bit we're looking for is `0` in the "true"
state we still pick the same constant and we still do the `&`, but we test with
`== 0`. Practically the same, right? Well, since I'm asking a rhetorical
question like that you can probably already guess that it's not the same. I was
shocked to learn this too.
All we have to do is ask our good friend
[Godbolt](https://rust.godbolt.org/z/d-8oCe) what's gonna happen when the code
compiles. The link there has the page set for the `stable` 1.30 compiler just so
that the link results stay consistent if you read this book in a year or
something. Also, we've set the target to `thumbv6m-none-eabi`, which is a
slightly later version of ARM than the actual GBA, but it's close enough for
just checking. Of course, in a full program small functions like these will
probably get inlined into the calling code and disappear entirely as they're
folded and refolded by the compiler, but we can just check.
It turns out that the `!=0` test is 4 instructions and the `==0` test is 6
instructions. Since we want to get savings where we can, and we'll probably
check the keys of an input often enough, we'll just always use a `!=0` test and
then adjust how we initially read the register to compensate. By using xor with
a mask for only the 10 used bits we can flip the "low when pressed" values so
that the entire result has active bits in all positions where a key is pressed.
```rust
pub fn key_input() -> KeyInputSetting {
  // Flip the 10 key bits so that a 1 bit means "pressed".
  unsafe { KeyInputSetting(KEYINPUT.read() ^ 0b0000_0011_1111_1111) }
}
```
Now we add a method for seeing if a key is pressed. In the full library there's
a more advanced version of this that's built up via macro, but for this example
we'll just name a bunch of `const` values and then have a method that takes a
value and says if that bit is on.
```rust
pub const KEY_A: u16 = 1 << 0;
pub const KEY_B: u16 = 1 << 1;
pub const KEY_SELECT: u16 = 1 << 2;
pub const KEY_START: u16 = 1 << 3;
pub const KEY_RIGHT: u16 = 1 << 4;
pub const KEY_LEFT: u16 = 1 << 5;
pub const KEY_UP: u16 = 1 << 6;
pub const KEY_DOWN: u16 = 1 << 7;
pub const KEY_R: u16 = 1 << 8;
pub const KEY_L: u16 = 1 << 9;
impl KeyInputSetting {
pub fn contains(&self, key: u16) -> bool {
(self.0 & key) != 0
}
}
```
Because each key is a unique bit you can combine several keys with `|` and
check them at once. Note that `contains` reports whether _any_ of the combined
keys is pressed, not whether all of them are.
```rust
let a_or_l_pressed = input.contains(KEY_A | KEY_L);
```
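If you instead need to know that _every_ key in a combination is held, compare the masked bits against the whole mask. Here's a sketch with a hypothetical `contains_all` method; the newtype and two key constants are restated so the block stands alone, and the stored value is assumed to already be flipped so that 1 means pressed.
```rust
/// Key state, already flipped so that a 1 bit means "pressed".
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
#[repr(transparent)]
pub struct KeyInputSetting(u16);

pub const KEY_A: u16 = 1 << 0;
pub const KEY_L: u16 = 1 << 9;

impl KeyInputSetting {
    /// Same test as `contains`: true if *any* key in `keys` is pressed.
    pub fn contains_any(&self, keys: u16) -> bool {
        (self.0 & keys) != 0
    }
    /// Hypothetical: true only if *every* key in `keys` is pressed.
    pub fn contains_all(&self, keys: u16) -> bool {
        (self.0 & keys) == keys
    }
}
```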
We also wanted to save the state of the previous frame and compare it to the
current frame to see what was different:
```rust
pub fn difference(&self, other: KeyInputSetting) -> KeyInputSetting {
KeyInputSetting(self.0 ^ other.0)
}
```
Anything that's "in" the difference output is a key that _changed_, and then if
the key reads as pressed this frame that means it was just pressed. The exact
mechanics of all the ways you might care to do something based on new key
presses is obviously quite varied, but it might be something like this:
```rust
let this_frame_diff = this_frame_input.difference(last_frame_input);
if this_frame_diff.contains(KEY_B) && this_frame_input.contains(KEY_B) {
// the user just pressed B, react in some way
}
```
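The same xor trick gives both "just pressed" and "just released" in one expression each. A sketch with hypothetical helper names, working on raw `u16` values that use the flipped 1-means-pressed convention:
```rust
/// Keys that were up last frame and are down this frame.
pub fn just_pressed(last: u16, current: u16) -> u16 {
    // Changed bits, filtered to the ones that are now pressed.
    (last ^ current) & current
}

/// Keys that were down last frame and are up this frame.
pub fn just_released(last: u16, current: u16) -> u16 {
    // Changed bits, filtered to the ones that used to be pressed.
    (last ^ current) & last
}
```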
And for the arrow pad, we'll make an enum that easily casts into `i32`. Whenever
we're working with stuff we can try to use `i32` / `isize` as often as possible
just because it's easier on the GBA's CPU if we stick to its native number size.
Having it be an enum lets us use `match` and be sure that we've covered all our
cases.
```rust
/// A "tribool" value helps us interpret the arrow pad.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
#[repr(i32)]
pub enum TriBool {
  Minus = -1,
  // `derive(Default)` on an enum needs one variant marked `#[default]`.
  #[default]
  Neutral = 0,
  Plus = 1,
}
```
Now, how do we determine _which way_ is plus or minus? Well... I don't know.
Really. I'm not sure what the best one is because the GBA really wants the
origin at 0,0 with higher rows going down and higher cols going right. On the
other hand, all the normal math you and I learned in school is oriented with
increasing Y being upward on the page. So, at least for this demo, we're going
to go with what the GBA wants us to do and give it a try. If we don't end up
confusing ourselves then we can stick with that. Maybe we can cover it over
somehow later on.
```rust
pub fn column_direction(&self) -> TriBool {
if self.contains(KEY_RIGHT) {
TriBool::Plus
} else if self.contains(KEY_LEFT) {
TriBool::Minus
} else {
TriBool::Neutral
}
}
pub fn row_direction(&self) -> TriBool {
if self.contains(KEY_DOWN) {
TriBool::Plus
} else if self.contains(KEY_UP) {
TriBool::Minus
} else {
TriBool::Neutral
}
}
```
So then in our game, every frame we can check for `column_direction` and
`row_direction` and then apply those to the player's current position to make
them move around the screen.
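Putting it together, a direction times a speed becomes a position delta. This sketch restates the `TriBool` enum (with `#[default]` on `Neutral`, which `derive(Default)` on an enum requires) and adds a hypothetical `apply_direction` helper:
```rust
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
#[repr(i32)]
pub enum TriBool {
    Minus = -1,
    #[default]
    Neutral = 0,
    Plus = 1,
}

/// Moves `pos` by `speed` in the direction `dir` (or not at all if Neutral).
pub fn apply_direction(pos: i32, dir: TriBool, speed: i32) -> i32 {
    pos + (dir as i32) * speed
}
```
In a game loop this might look like `x = apply_direction(x, input.column_direction(), speed)`.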
With that settled I think we're all done with user input for now. There's some
other things to eventually know about like key interrupts that you can set and
stuff, but we'll cover that later on because it's not necessary right now.

# The VCount Register
There's an IO register called
[VCOUNT](http://problemkaputt.de/gbatek.htm#lcdiointerruptsandstatus) that shows
you, what else, the Vertical (row) COUNT(er). It's a `u16` at address
`0x0400_0006`, and it's how we'll be doing our very poor quality vertical sync
code to start.
* **What makes it poor?** Well, we're just going to read from the vcount value as
often as possible every time we need to wait for a specific value to come up,
and then proceed once it hits the point we're looking for.
* **Why is this bad?** Because we're making the CPU do a lot of useless work,
which uses a lot more power than necessary. Even if you're not on an actual
GBA you might be running inside an emulator on a phone or other handheld. You
wanna try to save battery if all you're doing with that power use is waiting
instead of making a game actually do something.
* **Can we do better?** We can, but not yet. The better way to do things is to
use a BIOS call to put the CPU into low power mode until a VBlank interrupt
happens. However, we don't know about interrupts yet, and we don't know about
BIOS calls yet, so we'll do the basic thing for now and then upgrade later.
So the way that display hardware actually displays each frame is that it moves a
tiny pointer left to right across each pixel row one pixel at a time. When it's
within the actual screen width (240px) it's drawing out those pixels. Then it
goes _past_ the edge of the screen for 68px during a period known as the
"horizontal blank" (HBlank). Then it starts on the next row and does that loop
over again. This happens for the whole screen height (160px) and then once again
it goes past the last row for another 68 scanlines into a "vertical blank" (VBlank)
period.
* One pixel is 4 CPU cycles
* HDraw is 240 pixels, HBlank is 68 pixels (1,232 cycles per full scanline)
* VDraw is 160 scanlines, VBlank is 68 scanlines (280,896 cycles per full refresh)
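The numbers in that list fit together as follows. This sketch just encodes them as constants (the names are made up) so the arithmetic can be checked:
```rust
pub const CYCLES_PER_PIXEL: u32 = 4;
pub const HDRAW_PIXELS: u32 = 240;
pub const HBLANK_PIXELS: u32 = 68;
pub const VDRAW_LINES: u32 = 160;
pub const VBLANK_LINES: u32 = 68;

/// (240 + 68) pixels * 4 cycles per pixel.
pub const CYCLES_PER_SCANLINE: u32 = (HDRAW_PIXELS + HBLANK_PIXELS) * CYCLES_PER_PIXEL;
/// (160 + 68) scanlines per full refresh.
pub const CYCLES_PER_FRAME: u32 = (VDRAW_LINES + VBLANK_LINES) * CYCLES_PER_SCANLINE;
/// The largest value VCOUNT reaches before wrapping back to 0.
pub const VCOUNT_MAX: u32 = VDRAW_LINES + VBLANK_LINES - 1;
```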
Now you may remember some stuff from the display control register section where
it was mentioned that some parts of memory are best accessed during VBlank, and
also during hblank with a setting applied. These blanking periods are what was
being talked about. At other times if you attempt to access video or object
memory you (the CPU) might try touching the same memory that the display device
is trying to use, in which case you get bumped back a cycle so that the display
can finish what it's doing. Also, if you really insist on doing video memory
changes while the screen is being drawn then you might get some visual glitches.
If you can, prepare all your changes ahead of time and then apply them all
quickly during the blank period.
So first we want a way to check the vcount value at all:
```rust
pub const VCOUNT: VolatilePtr<u16> = VolatilePtr(0x0400_0006 as *mut u16);
pub fn vcount() -> u16 {
unsafe { VCOUNT.read() }
}
```
Then we want two little helper functions to wait until VBlank and VDraw.
```rust
pub const SCREEN_HEIGHT: isize = 160;
pub fn wait_until_vblank() {
while vcount() < SCREEN_HEIGHT as u16 {}
}
pub fn wait_until_vdraw() {
while vcount() >= SCREEN_HEIGHT as u16 {}
}
```
And... that's it. No special types to be made this time around, it's just a
number we read out of memory.
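One catch with `wait_until_vblank` on its own: if we're already inside VBlank it returns immediately, so a fast game loop could run twice in the same blank period. A common pattern is to wait for VDraw first and then for VBlank. Here's a sketch of that idea with a hypothetical function name; to keep it testable off-hardware, the VCOUNT read is passed in as a closure (an assumption of this example).
```rust
pub const SCREEN_HEIGHT: u16 = 160;

/// Waits for the *start* of the next VBlank. `read_vcount` stands in for
/// a volatile read of VCOUNT.
pub fn wait_for_next_vblank(mut read_vcount: impl FnMut() -> u16) {
    // If we're already in VBlank, let it finish first...
    while read_vcount() >= SCREEN_HEIGHT {}
    // ...then spin until the next VBlank begins.
    while read_vcount() < SCREEN_HEIGHT {}
}
```
On the GBA you'd call it as `wait_for_next_vblank(vcount)`, since a plain `fn() -> u16` satisfies the closure bound.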

# Tile Data
When using the GBA's hardware graphics, if you want to let the hardware do most
of the work you have to use Modes 0, 1 or 2. However, to do that we first have
to learn about how tile data works inside of the GBA.
## Tiles
Fundamentally, a tile is an 8x8 image. If you want anything bigger than 8x8 you
need to arrange several tiles so that it looks like whatever you're trying to
draw.
As was already mentioned, the GBA supports two different color modes: 4 bits per
pixel and 8 bits per pixel. This means that we have two types of tile that we
need to model. The pixel bits always represent an index into the PALRAM.
* With 4 bits per pixel, the PALRAM is imagined to be 16 **palbank** sections of
16 palette entries each. The image data selects the index within the palbank,
and an external configuration selects which palbank is used.
* With 8 bits per pixel, the PALRAM is imagined to be a single 256 entry array
and the index just directly picks which of the 256 colors is used.
Knowing this, we can write the following definitions:
```rust
#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct Tile4bpp {
pub data: [u32; 8]
}
#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct Tile8bpp {
pub data: [u32; 16]
}
```
I hope this makes sense so far. At 4bpp, we have 4 bits per pixel, times 8
pixels per line, times 8 lines: 256 bits required. Similarly, at 8 bits per
pixel we'll need 512 bits. Why are we defining them as arrays of `u32` values?
Because when it comes time to do bulk copies the fastest way to do it will be to
go one whole machine word at a time. If we make the data inside the type an
array of `u32` then it'll already be aligned for fast `u32` bulk copies.
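Since the layout matters for those bulk copies, it can be worth having the compiler check it. This sketch restates the two tile types and adds compile-time assertions (const `assert!` needs Rust 1.57 or newer) that the sizes match the 256-bit and 512-bit figures above and that both types get `u32` alignment.
```rust
use core::mem::{align_of, size_of};

#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct Tile4bpp {
    pub data: [u32; 8],
}

#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct Tile8bpp {
    pub data: [u32; 16],
}

// 8x8 pixels at 4 bits = 32 bytes; at 8 bits = 64 bytes.
const _: () = assert!(size_of::<Tile4bpp>() == 32);
const _: () = assert!(size_of::<Tile8bpp>() == 64);
// Word alignment, so whole-u32 copies are allowed.
const _: () = assert!(align_of::<Tile4bpp>() == 4);
const _: () = assert!(align_of::<Tile8bpp>() == 4);
```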
Keeping track of the current color depth is naturally the _programmer's_
problem. If you get it wrong you'll see a whole ton of garbage pixels all over
the screen, and you'll probably be able to guess why. You know, unless you did
one of the other things that can make a bunch of garbage pixels show up all over
the screen. Graphics programming is fun like that.
## Charblocks
Tiles don't just sit on their own, they get grouped into **charblocks**. Long
ago in the distant past, video games were built with hardware that was also used
to make text terminals. So tile image data was called "character data". In fact
some guides will even call the regular mode for the background layers "text
mode", despite the fact that you obviously don't have to show text at all.
A charblock is 16 KiB long (`0x4000` bytes), which means that the number of tiles
that fit into a charblock depends on your color depth. With 4bpp you get 512
tiles, and with 8bpp there are 256 tiles. So they'd be something like this:
```rust
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct Charblock4bpp {
pub data: [Tile4bpp; 512],
}
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct Charblock8bpp {
pub data: [Tile8bpp; 256],
}
```
You'll note that we can't derive `Default` any more because the arrays are so
big. Rust implements `Clone`, `Copy`, and `Debug` for arrays of any size, but
`Default` is still only implemented for arrays of size 32 or less. We won't generally be making up an entire
Charblock on the fly though, so it's not a big deal. If we _absolutely_ had to,
we could call `core::mem::zeroed()`, but we really don't want to be trying to
build a whole charblock at runtime. We'll usually want to define our tile data
as `const` charblock values (or even parts of charblock values) that we then
load out of the game pak ROM at runtime.
Anyway, with 16k per charblock and only 96k total in VRAM, it's easy math to see
that there are six different charblocks in VRAM when in a tiled mode. The first
four of these are for backgrounds, and the other two are for objects. There are
rules for how a tile ID on a background or object selects a tile within a
charblock, but since they're different between backgrounds and objects we'll
cover them on their own pages.
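Because every charblock is the same size, finding one is a single multiply. A sketch with hypothetical names, assuming VRAM starts at `0x0600_0000` as noted in the Video Memory Intro:
```rust
pub const VRAM_BASE: usize = 0x0600_0000;
pub const CHARBLOCK_SIZE: usize = 0x4000; // 16 KiB

/// Start address of charblock `n` (0..=5). Blocks 0-3 are for
/// backgrounds and blocks 4-5 are for objects.
pub fn charblock_base(n: usize) -> usize {
    assert!(n < 6);
    VRAM_BASE + n * CHARBLOCK_SIZE
}
```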
## Image Editing
It's very important to note that if you use a normal image editor you'll get
very bad results if you translate that directly into GBA memory.
Imagine you have part of an image that's 16 by 16 pixels, aka 2 tiles by 2
tiles. In a normal bitmap, the first 16 pixels of data are the 1st row of the
1st tile followed by the 1st row of the 2nd tile. However, when we translate
that into the GBA, the first 8 pixels will indeed be the first 8 tile pixels,
but then the next 8 pixels in memory will be used as the _2nd row of the first
tile_, not the 1st row of the 2nd tile.
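The reordering itself is simple enough to sketch. This hypothetical function takes an 8bpp bitmap (one palette index per byte, stored row by row) and rewrites it so that each 8x8 tile's 64 bytes are contiguous, with tiles going left to right and then top to bottom. It only illustrates the idea; a real conversion tool also deals with palettes, 4bpp packing, and output formats.
```rust
/// Reorders an 8bpp bitmap into GBA tile order.
/// `width` and `height` are in pixels and must be multiples of 8.
pub fn bitmap_to_tiles_8bpp(src: &[u8], width: usize, height: usize, dst: &mut [u8]) {
    assert!(width % 8 == 0 && height % 8 == 0);
    assert!(src.len() == width * height && dst.len() == src.len());
    let mut out = 0;
    for tile_row in 0..height / 8 {
        for tile_col in 0..width / 8 {
            // Copy the 8 rows of this tile, 8 bytes each.
            for y in 0..8 {
                let row_start = (tile_row * 8 + y) * width + tile_col * 8;
                dst[out..out + 8].copy_from_slice(&src[row_start..row_start + 8]);
                out += 8;
            }
        }
    }
}
```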
So, how do we fix this?
Well, the simple but annoying way is to edit your tile image as being an 8 pixel
wide image and then have the image get super tall as you add more and more
tiles. It can work, but it's really impractical if you have any multi-tile
things that you're trying to do.
Instead, there are some image conversion tools that devkitPro provides in their
gba-dev section. They let you take normal images and then repackage them and
export it in various formats that you can then compile into your project.
Ketsuban uses the [grit](http://www.coranac.com/projects/grit/) tool, with the
following suggestions:
1) Include an actual resource file and a file describing it somewhere in your
project (see [the grit
manual](http://www.coranac.com/man/grit/html/index.htm) for all details
involved here).
2) In a `build.rs` you run `grit` on each resource+description pair, such as in
this [old gist
example](https://gist.github.com/ketsuban/526fa55fbef0a3ccd4c7cd6204f29f94)
3) Then within your rust code you use the
[include_bytes!](https://doc.rust-lang.org/core/macro.include_bytes.html)
macro to have the formatted resource be available as a const value you can
load at runtime.

# Video Memory Intro
The GBA's Video RAM is 96k stretching from `0x0600_0000` to `0x0601_7FFF`.
The Video RAM can only be accessed totally freely during a Vertical Blank (aka
"VBlank", though sometimes I forget and don't capitalize it properly). At other
times, if the CPU tries to touch the same part of video memory as the display
controller is accessing then the CPU gets bumped by a cycle to avoid a clash.
Annoyingly, VRAM can only be properly written to in 16 and 32 bit segments (same
with PALRAM and OAM). If you try to write just an 8 bit segment, then both parts
of the 16 bit segment get the same value written to them. In other words, if you
write the byte `5` to `0x0600_0000`, then both `0x0600_0000` and ALSO
`0x0600_0001` will have the byte `5` in them. We have to be extra careful when
trying to set an individual byte, and we also have to be careful if we use
`memcpy` or `memset`, because they're byte oriented by default and don't know
to follow the special rules.
## RGB15
As I said before, RGB15 stores a color within a `u16` value using 5 bits for
each color channel: red in bits 0-4, green in bits 5-9, and blue in bits 10-14,
with bit 15 unused.
```rust
pub const RED: u16 = 0b0_00000_00000_11111;
pub const GREEN: u16 = 0b0_00000_11111_00000;
pub const BLUE: u16 = 0b0_11111_00000_00000;
```
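A small `const fn` can pack the channels for you. This is a hypothetical helper (the `gba` crate has its own `Color` type), and it assumes each channel is passed in as a value from 0 to 31.

```rust
/// Packs 5-bit red, green, and blue channels into an RGB15 `u16`.
/// Out-of-range channel values are masked down to their low 5 bits.
const fn rgb15(r: u16, g: u16, b: u16) -> u16 {
    (r & 0x1F) | ((g & 0x1F) << 5) | ((b & 0x1F) << 10)
}
```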
In Mode 3 and Mode 5 we write direct color values into VRAM, and in Mode 4 we
write palette index values, and then the color values go into the PALRAM.
## Mode 3
Mode 3 is pretty easy. We have a full resolution grid of RGB15 pixels. There are
160 rows of 240 pixels each, with the base address being the top left corner. A
particular pixel uses normal "2d indexing" math (column plus row times width):
```rust
const SCREEN_WIDTH: usize = 240;
let row_five_col_seven = 7 + (5 * SCREEN_WIDTH);
```
To draw a pixel, we just write a value at the address for the row and col that
we want to draw to.
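Here's a minimal sketch of that write. It takes the base pointer as a parameter so the logic can be tried on an ordinary buffer. On hardware you'd pass `0x0600_0000 as *mut u16`. The function name is hypothetical.

```rust
const WIDTH: usize = 240;
const HEIGHT: usize = 160;

/// Writes one RGB15 pixel in Mode 3.
///
/// # Safety
/// `base` must be valid for `WIDTH * HEIGHT` consecutive `u16` writes.
unsafe fn mode3_write(base: *mut u16, x: usize, y: usize, color: u16) {
    assert!(x < WIDTH && y < HEIGHT, "pixel out of bounds");
    // Same 2d indexing as above: column + row * width.
    unsafe { base.add(x + y * WIDTH).write_volatile(color) };
}
```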
## Mode 4
Mode 4 introduces page flipping. Instead of one giant page at `0x0600_0000`,
there's Page 0 at `0x0600_0000` and then Page 1 at `0x0600_A000`. The resolution
for each page is the same as above, but instead of writing `u16` values, the
memory is treated as `u8` indexes into PALRAM. The PALRAM starts at
`0x0500_0000`, and the background palette at its start has room for 256 palette
entries (each a `u16`). To set the color of a palette entry we just do a normal
`u16` write_volatile.
```rust
// `target_index` is a `usize` in 0..256, `new_color` is an RGB15 `u16`.
unsafe { (0x0500_0000 as *mut u16).add(target_index).write_volatile(new_color) };
```
To draw a pixel we write the palette index that we want the pixel to use.
However, we must remember the "minimum size" write limitation that applies to
VRAM. So, if we want to change just a single pixel at a time we must
1) Read the full `u16` it's a part of.
2) Clear the half of the `u16` we're going to replace.
3) Fill that half of the `u16` with the new value.
4) Write the result back to the address.
So, the math for finding a byte offset is the same as Mode 3 (since they're both
a 2d grid). The GBA is little-endian, so if the byte offset is EVEN it'll be the
low bits of the `u16` at half the byte offset. If the offset is ODD it'll be the
high bits of the `u16` at half the byte offset rounded down.
Does that make sense?
* If we want to write pixel (0,0) the byte offset is 0, so we change the low
  bits of `u16` offset 0. Then we want to write to (1,0), so the byte offset is
  1, so we change the high bits of `u16` offset 0. The pixels are next to each
  other, and the target bytes are next to each other, good so far.
* If we want to write to (5,6) that'd be byte `5 + 6 * 240 = 1445`, so we'd
  target the high bits of `u16` offset `floor(1445/2) = 722`.
As you can see, trying to write individual pixels in Mode 4 is mostly a bad
time. Fret not! We don't _have_ to write individual bytes. If our data is
arranged correctly ahead of time we can just write `u16` or `u32` values
directly. The video hardware doesn't care, it'll get along just fine.
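Here's a sketch of the read-modify-write steps above. It takes the page's base pointer as a parameter so it can be tried on an ordinary buffer. On hardware you'd pass `0x0600_0000` (Page 0) or `0x0600_A000` (Page 1) cast to `*mut u16`. The function name is hypothetical.

```rust
const WIDTH: usize = 240;

/// Sets one Mode 4 pixel to palette `index` with a read-modify-write of the
/// halfword that contains it.
///
/// # Safety
/// `page` must be valid for reads and writes of `WIDTH * 160 / 2` `u16`s.
unsafe fn mode4_write(page: *mut u16, x: usize, y: usize, index: u8) {
    let byte_offset = x + y * WIDTH;
    let halfword = unsafe { page.add(byte_offset / 2) };
    // 1) read the full u16
    let old = unsafe { halfword.read_volatile() };
    // 2) + 3) clear our half and fill it with the new index
    let new = if byte_offset % 2 == 0 {
        // even byte offset: the low half of the little-endian u16
        (old & 0xFF00) | u16::from(index)
    } else {
        // odd byte offset: the high half
        (old & 0x00FF) | (u16::from(index) << 8)
    };
    // 4) write the whole u16 back
    unsafe { halfword.write_volatile(new) };
}
```

Writing a whole row of pre-arranged `u16` values skips all of this, which is why you usually want your data laid out ahead of time.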
## Mode 5
Mode 5 is also a two page mode, but instead of compressing the size of a pixel's
data to fit in two pages, we compress the resolution.
Mode 5 is full `u16` color, but only 160w x 128h per page.
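160 × 128 pixels × 2 bytes comes out to exactly `0xA000` bytes. So, assuming Page 1 starts `0xA000` bytes into VRAM as it does in Mode 4, the two Mode 5 pages sit back to back. Here's a sketch of the index math, with hypothetical names.

```rust
const MODE5_WIDTH: usize = 160;
const MODE5_HEIGHT: usize = 128;
/// Page 1 starts 0xA000 bytes (0x5000 halfwords) after Page 0.
const PAGE_STRIDE: usize = 0xA000 / 2;

// One full Mode 5 page is exactly as big as the gap between the pages.
const _: () = assert!(MODE5_WIDTH * MODE5_HEIGHT == PAGE_STRIDE);

/// Index of pixel (`x`, `y`) on `page` (0 or 1), counted in `u16`s from the
/// start of VRAM.
const fn mode5_index(page: usize, x: usize, y: usize) -> usize {
    page * PAGE_STRIDE + x + y * MODE5_WIDTH
}
```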
## In Conclusion...
So what got written into VRAM in `hello1`?
```rust
unsafe {
    (0x0600_0000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
    (0x0600_0000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
    (0x0600_0000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
}
```
So at pixels `(120,80)`, `(136,80)`, and `(120,96)` we write three values. Once
again we probably need to [convert them](https://www.wolframalpha.com/) into
binary to make sense of it.
* 0x001F: 0b0_00000_00000_11111
* 0x03E0: 0b0_00000_11111_00000
* 0x7C00: 0b0_11111_00000_00000
Ah, of course, a red pixel, a green pixel, and a blue pixel.
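Instead of converting by hand, you can pull the channels apart with shifts and masks. The helper below is hypothetical and is the inverse of packing an RGB15 value.

```rust
/// Splits an RGB15 color into its (red, green, blue) channels, each 0 to 31.
const fn channels(color: u16) -> (u16, u16, u16) {
    (color & 0x1F, (color >> 5) & 0x1F, (color >> 10) & 0x1F)
}
```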

@ -1,9 +0,0 @@
# Rust GBA Guide
* [Development Setup](development-setup.md)
* [Volatile](volatile.md)
* [The Hardware Memory Map](the-hardware-memory-map.md)
* [IO Registers](io-registers.md)
* [Bitmap Video](bitmap-video.md)
* [GBA Assembly](gba-asm.md)

@ -1,214 +0,0 @@
# Bitmap Video
We'll start with the bitmap video modes.
Not because they're the best and fastest, but because they're the
_simplest_. You can get going and practice with them really quickly. Usually
after that you end up wanting to move on to the other video modes because they
have better hardware support, so you can draw more complex things with the small
number of cycles that the GBA allows.
## The Three Bitmap Modes
As I said in the Hardware Memory Map section, the Video RAM lives in the address
space at `0x600_0000`. Depending on our video mode the display controller will
consider this memory to be in one of a few totally different formats.
### Mode 3
The screen is 160 rows, each 240 pixels long, of `u16` color values.
This is "full" resolution, and "full" color. It adds up to 76,800 bytes. VRAM is
only 98,304 bytes total though. There's enough space left over after the bitmap
for some object tile data if you want to use objects, but basically Mode 3 is
using all of VRAM as one huge canvas.
### Mode 4
The screen is 160 rows, each 240 pixels long, of `u8` palette values.
This has half as much space per pixel. What's a palette value? That's an index
into the background PALRAM which says what the color of that pixel should be. We
still have the full color space available, but we can only use 256 colors at the
same time.
What did we get in exchange for this? Well, now there's a second "page". The
second page starts `0xA000` bytes into VRAM (in both Mode 4 and Mode 5). It's an
entire second set of pixel data. You determine if Page 0 or Page 1 is shown
using bit 4 of DISPCNT. When you swap which page is being displayed it's called
page flipping or flipping the page, or something like that.
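Because the page choice is just bit 4 of DISPCNT, flipping is a single XOR. Here's a sketch that works on raw `u16` values and also picks the hidden page to draw into, using the page addresses `0x0600_0000` and `0x0600_A000`. The function names are hypothetical.

```rust
/// Bit 4 of DISPCNT: Display Frame Select (0 = Page 0 shown, 1 = Page 1 shown).
const FRAME_SELECT: u16 = 1 << 4;

/// Byte addresses of the two bitmap pages (Modes 4 and 5).
const PAGE0_ADDR: usize = 0x0600_0000;
const PAGE1_ADDR: usize = 0x0600_A000;

/// Returns a DISPCNT value with the displayed page swapped.
const fn flip_page(dispcnt: u16) -> u16 {
    dispcnt ^ FRAME_SELECT
}

/// Address of the page that is *not* currently shown, which is the one you
/// can draw the next frame into without the user seeing it half-done.
const fn back_page_addr(dispcnt: u16) -> usize {
    if (dispcnt & FRAME_SELECT) == 0 { PAGE1_ADDR } else { PAGE0_ADDR }
}
```

You'd draw into `back_page_addr(current)` during VDraw, then write `flip_page(current)` to DISPCNT once VBlank starts.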
Having two pages is cool, but Mode 4 has a big drawback: it's part of VRAM so
that "can't write 1 byte at a time" rule applies. This means that to set a
single byte we need to read a `u16`, adjust just one side of it, and then write
that `u16` back. We can hide the complication behind a method call, but it
simply takes longer to do all that, so editing pixels ends up being
unfortunately slow compared to the other bitmap modes.
### Mode 5
The screen is 128 rows, each 160 pixels long, of `u16` color values.
Mode 5 has two pages like Mode 4 does, but instead of keeping full resolution we
keep full color. The pixels are displayed in the top left and it's just black on
the right and bottom edges. You can use the background control registers to
shift it around, maybe center it, but there's no way to get around the fact that
not having full resolution is kinda awkward.
## Using Mode 3
Let's have a look at how this comes together. We'll call this one
`hello_world.rs`, since it's our first real program.
### Module Attributes and Imports
At the top of our file we're still `no_std` and we're still using
`feature(start)`, but now we're using the `gba` crate so we're 100% safe code!
Often enough we'll need a little `unsafe`, but for just bitmap drawing we don't
need it.
```rust
#![no_std]
#![feature(start)]
#![forbid(unsafe_code)]
use gba::{
    fatal,
    io::{
        display::{DisplayControlSetting, DisplayMode, DISPCNT, VBLANK_SCANLINE, VCOUNT},
        keypad::read_key_input,
    },
    vram::bitmap::Mode3,
    Color,
};
```
### Panic Handler
Before, we had a panic handler that just looped forever. Now that we're using
the `gba` crate we can rely on the debug output channel from `mGBA` to get a
message into the real world. There are macros set up for each message severity,
and they all accept a format string and arguments, like how `println` works. The
catch is that a given message is capped at a length of 255 bytes, and it should
probably be ASCII only.
In the case of the `fatal` message level, it also halts the emulator.
Of course, if the program is run on real hardware then the `fatal` message won't
stop the program, so we still need the infinite loop there too.
(Not that this program _can_ panic, but every `no_std` binary must provide a
`panic_handler`, so `rustc` demands one anyway.)
```rust
#[panic_handler]
fn panic(info: &core::panic::PanicInfo) -> ! {
    // This kills the emulation with a message if we're running within mGBA.
    fatal!("{}", info);
    // If we're _not_ running within mGBA then we still need to not return, so
    // loop forever doing nothing.
    loop {}
}
```
### Waiting Around
Like I talked about before, sometimes we need to wait around a bit for the right
moment to start doing work. However, we don't know how to do the good version of
waiting for VBlank and VDraw to start, so we'll use the really bad version of it
for now.
```rust
/// Performs a busy loop until VBlank starts.
///
/// This is very inefficient, and please keep following the lessons until we
/// cover how interrupts work!
pub fn spin_until_vblank() {
    while VCOUNT.read() < VBLANK_SCANLINE {}
}
/// Performs a busy loop until VDraw starts.
///
/// This is very inefficient, and please keep following the lessons until we
/// cover how interrupts work!
pub fn spin_until_vdraw() {
    while VCOUNT.read() >= VBLANK_SCANLINE {}
}
```
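Underneath, these loops only need the current scanline. GBATEK places that in the VCOUNT register at `0x0400_0006`, and lines 160 through 227 are VBlank. Here's a crate-free sketch that takes the register read as a closure, so the logic can be tried with a fake counter. The names are hypothetical. On hardware the closure would be `|| unsafe { (0x0400_0006 as *const u16).read_volatile() }`.

```rust
/// First scanline of VBlank (lines 160..=227 are VBlank).
const VBLANK_SCANLINE: u16 = 160;

/// Spins until `read_vcount` reports a VBlank scanline.
fn spin_until_vblank_with(mut read_vcount: impl FnMut() -> u16) {
    while read_vcount() < VBLANK_SCANLINE {}
}

/// Spins until `read_vcount` reports a VDraw scanline again.
fn spin_until_vdraw_with(mut read_vcount: impl FnMut() -> u16) {
    while read_vcount() >= VBLANK_SCANLINE {}
}
```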
### Setup in `main`
In main we set the display control value we want and declare a few variables
we're going to use in our primary loop.
```rust
#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
    const SETTING: DisplayControlSetting =
        DisplayControlSetting::new().with_mode(DisplayMode::Mode3).with_bg2(true);
    DISPCNT.write(SETTING);
    let mut px = Mode3::WIDTH / 2;
    let mut py = Mode3::HEIGHT / 2;
    let mut color = Color::from_rgb(31, 0, 0);
```
### Stuff During VDraw
When a frame starts we want to read the keys, then adjust as much of the game
state as we can without touching VRAM.
Once we're ready, we do our spin loop until VBlank starts.
In this case, we're going to adjust `px` and `py` depending on the arrow pad
input, and also we'll cycle around the color depending on L and R being pressed.
```rust
    loop {
        // read our keys for this frame
        let this_frame_keys = read_key_input();
        // adjust game state and wait for vblank
        // (casting the tribool through `isize` keeps a -1 from overflowing the
        // multiply in debug builds; `wrapping_add` then moves us backwards)
        px = px.wrapping_add((2 * this_frame_keys.x_tribool() as isize) as usize);
        py = py.wrapping_add((2 * this_frame_keys.y_tribool() as isize) as usize);
        if this_frame_keys.l() {
            color = Color(color.0.rotate_left(5));
        }
        if this_frame_keys.r() {
            color = Color(color.0.rotate_right(5));
        }
        // now we wait
        spin_until_vblank();
```
### Stuff During VBlank
When VBlank starts we want to update video memory to display the new
frame's situation.
In our case, we're going to paint a little square of the current color, but also
if you go off the map it resets the screen.
At the end, we spin until VDraw starts so we can do the whole thing again.
```rust
        // draw the new game and wait until the next frame starts.
        if px >= Mode3::WIDTH || py >= Mode3::HEIGHT {
            // out of bounds, reset the screen and position.
            Mode3::dma_clear_to(Color::from_rgb(0, 0, 0));
            px = Mode3::WIDTH / 2;
            py = Mode3::HEIGHT / 2;
        } else {
            // draw the new part of the line
            Mode3::write(px, py, color);
            Mode3::write(px, py + 1, color);
            Mode3::write(px + 1, py, color);
            Mode3::write(px + 1, py + 1, color);
        }
        // now we wait again
        spin_until_vdraw();
    }
}
```
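The reset check relies on `wrapping_add`. Moving left from column 0 wraps `px` around to a huge value, and the `px >= Mode3::WIDTH` test then catches it. Here's a crate-free sketch of that logic, with a direction of -1, 0, or +1 standing in for the tribool. The names are hypothetical.

```rust
const WIDTH: usize = 240;
const HEIGHT: usize = 160;

/// Moves a coordinate two pixels in direction `dir` (-1, 0, or +1).
/// Going below zero wraps around to a huge `usize` instead of panicking.
fn step(pos: usize, dir: isize) -> usize {
    pos.wrapping_add((2 * dir) as usize)
}

/// The same check the main loop uses to decide when to reset the screen.
fn out_of_bounds(px: usize, py: usize) -> bool {
    px >= WIDTH || py >= HEIGHT
}
```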

@ -1,189 +0,0 @@
# Development Setup
Before you can build a GBA game you'll have to follow some special steps to
setup the development environment.
Once again, extra special thanks to **Ketsuban**, who first dove into how to
make this all work with rust and then shared it with the world.
## Per System Setup
Obviously you need your computer to have a [working rust
installation](https://rustup.rs/). However, you'll also need to ensure that
you're using a nightly toolchain (we will need it for inline assembly, among
other potentially useful features). You can run `rustup default nightly` to set
nightly as the system wide default toolchain, or you can use a [toolchain
file](https://github.com/rust-lang-nursery/rustup.rs#the-toolchain-file) to use
nightly just on a specific project, but either way we'll be assuming the use of
nightly from now on. You'll also need the `rust-src` component so that
`cargo-xbuild` will be able to compile the core crate for us in a bit, so run
`rustup component add rust-src`.
Next, you need [devkitpro](https://devkitpro.org/wiki/Getting_Started). They've
got a graphical installer for Windows that runs nicely, and I guess `pacman`
support on Linux (I'm on Windows so I haven't tried the Linux install myself).
We'll be using a few of their general binutils for the `arm-none-eabi` target,
and we'll also be using some of their tools that are specific to GBA
development, so _even if_ you already have the right binutils for whatever
reason, you'll still want devkitpro for the `gbafix` utility.
* On Windows you'll want something like `C:\devkitpro\devkitARM\bin` and
`C:\devkitpro\tools\bin` to be [added to your
PATH](https://stackoverflow.com/q/44272416/455232), depending on where you
installed it to and such.
* On Linux you can use pacman to get it, and the default install puts the stuff
in `/opt/devkitpro/devkitARM/bin` and `/opt/devkitpro/tools/bin`. If you need
help you can look in our repository's
[.travis.yml](https://github.com/rust-console/gba/blob/master/.travis.yml)
file to see exactly what our CI does.
Finally, you'll need `cargo-xbuild`. Just run `cargo install cargo-xbuild` and
cargo will figure it all out for you.
## Per Project Setup
Once the system wide tools are ready, you'll need some particular files each
time you want to start a new project. You can find them in the root of the
[rust-console/gba repo](https://github.com/rust-console/gba).
* `thumbv4-none-agb.json` describes the overall GBA to cargo-xbuild (and LLVM)
so it knows what to do. Technically the GBA is `thumbv4-none-eabi`, but we
change the `eabi` to `agb` so that we can distinguish it from other `eabi`
devices when using `cfg` flags.
* `crt0.s` describes some ASM startup stuff. If you have more ASM to place here
later on this is where you can put it. You also need to build it into a
`crt0.o` file before it can actually be used, but we'll cover that below.
* `linker.ld` tells the linker all the critical info about the layout
expectations that the GBA has about our program, and that it should also
include the `crt0.o` file with our compiled rust code.
## Compiling
Once all the tools are in place, there are particular steps that you need to
follow to compile the project. For these to work you'll need some source code to compile.
Unlike with other things, an empty main file and/or an empty lib file will cause
a total build failure, because we'll need a
[no_std](https://rust-embedded.github.io/book/intro/no-std.html) build, and rust
defaults to builds that use the standard library. The next section has a minimal
example file you can use (along with explanation), but we'll describe the build
steps here.
* `arm-none-eabi-as crt0.s -o target/crt0.o`
  * This builds your text format `crt0.s` file into object format `crt0.o`
    that's placed in the `target/` directory. Note that if the `target/`
    directory doesn't exist yet it will fail, so you have to make the directory
    if it's not there. You don't need to rebuild `crt0.s` every single time,
    only when it changes, but you might as well add a line to your build script
    that does it every time so that you never forget, since it's a practically
    instant operation anyway.
* `cargo xbuild --target thumbv4-none-agb.json`
  * This builds your Rust source. It accepts _most of_ the normal options that
    you'd expect `cargo` to accept, such as `--release`, `--bin foo`, or
    `--examples`.
  * You **cannot** build and run tests this way, because they require `std`,
    which the GBA doesn't have. If you want, you can still run some of your
    project's tests with `cargo test --lib` or similar, but that builds for your
    local machine, so anything specific to the GBA (such as reading and writing
    registers) won't be testable that way. If you want to isolate and try out
    some piece of code running on the GBA you'll unfortunately have to make a
    demo for it in your `examples/` directory and then run the demo in an
    emulator and see if it does what you expect.
  * The file extension is important! It will still work if you forget it, but
    `cargo xbuild` takes the inclusion of the extension as a flag to also
    compile dependencies with the same sysroot, so you can include other crates
    in your build. Well, crates that work in the GBA's limited environment, but
    you get the idea.
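If your "build script" is a Cargo `build.rs`, the `crt0.s` step could look something like the sketch below. This assumes the `build.rs` sits in the project root, since Cargo runs build scripts from the package's source directory, and it keeps the `target/crt0.o` path used above. The function names are hypothetical.

```rust
use std::fs;
use std::path::Path;
use std::process::Command;

/// Builds (without running) the `arm-none-eabi-as` invocation that turns
/// `crt0.s` into `<out_dir>/crt0.o`.
fn crt0_command(out_dir: &Path) -> Command {
    let mut cmd = Command::new("arm-none-eabi-as");
    cmd.arg("crt0.s").arg("-o").arg(out_dir.join("crt0.o"));
    cmd
}

/// Call this from your build script's `main`.
fn assemble_crt0() {
    let out_dir = Path::new("target");
    // The assembler fails if `target/` doesn't exist yet, so make it first.
    fs::create_dir_all(out_dir).expect("couldn't create target/");
    let status = crt0_command(out_dir)
        .status()
        .expect("couldn't launch arm-none-eabi-as; is devkitARM on your PATH?");
    assert!(status.success(), "assembling crt0.s failed");
    // Ask Cargo to re-run this step only when the assembly source changes.
    println!("cargo:rerun-if-changed=crt0.s");
}
```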
At this point you have an ELF binary that some emulators can execute directly
(more on that later). However, if you want a "real" ROM that works in all
emulators and that you could transfer to a flash cart to play on real hardware
there's a little more to do.
* `arm-none-eabi-objcopy -O binary target/thumbv4-none-agb/MODE/BIN_NAME target/ROM_NAME.gba`
  * This will perform an [objcopy](https://linux.die.net/man/1/objcopy) on our
    program. Here I've named the program `arm-none-eabi-objcopy`, which is what
    devkitpro calls their ARM-targeting version of `objcopy` in the Windows
    install. If the program isn't found under that name, have a look in your
    installation directory to see if it's under a slightly different name.
  * As you can see from reading the man page, the `-O binary` option takes our
    lovely ELF file with symbols and all that and strips it down to basically a
    bare memory dump of the program.
  * The next argument is the input file. You might not be familiar with how
    `cargo` arranges stuff in the `target/` directory, and between RLS and
    `cargo doc` and stuff it gets kinda crowded, so it goes like this:
    * Since our program was built for a non-local target, first we've got a
      directory named for that target, `thumbv4-none-agb/`.
    * Next, the "MODE" is either `debug/` or `release/`, depending on whether
      we included the `--release` flag. You'll probably only be packing release
      mode programs all the way into GBA roms, but it works with either mode.
    * Finally, the name of the program. If your program is something out of the
      project's `src/bin/` then it'll be that file's name, or whatever name you
      configured for the bin in the `Cargo.toml` file. If your program is
      something out of the project's `examples/` directory there will be a
      similar `examples/` sub-directory first, and then the example's name.
  * The final argument is the output of the `objcopy`, which I suggest putting
    at just the top level of the `target/` directory. Really it could go
    anywhere, but if you're using git then it's likely that your `.gitignore`
    file is already set up to exclude everything in `target/`, so this makes
    sure that your intermediate game builds don't get checked into your git.
* `gbafix target/ROM_NAME.gba`
  * The `gbafix` tool also comes from devkitpro. The GBA is very picky about a
    ROM's format, and `gbafix` patches the ROM's header and such so that it'll
    work right. Unlike `objcopy`, this tool is custom built for GBA development,
    so it works just perfectly without any arguments beyond the file name. The
    ROM is patched in place, so we don't even need to specify a new destination.
And you're _finally_ done!
Of course, you probably want to make a script for all that, but it's up to you.
On our own project we have it mostly set up within a `Makefile.toml` which runs
using the [cargo-make](https://github.com/sagiegurari/cargo-make) plugin.
## Checking Your Setup
As I said, you need some source code to compile just to check that your
compilation pipeline is working. Here's a sample file that just puts three dots
on the screen without depending on any crates or anything at all.
`hello_magic.rs`:
```rust
#![no_std]
#![feature(start)]
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
    unsafe {
        (0x400_0000 as *mut u16).write_volatile(0x0403);
        (0x600_0000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
        (0x600_0000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
        (0x600_0000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
        loop {}
    }
}
#[no_mangle]
static __IRQ_HANDLER: extern "C" fn() = irq_handler;
extern "C" fn irq_handler() {}
```
Throw that into your project skeleton, build the program, and give it a run in
an emulator. I suggest [mGBA](https://mgba.io/2019/01/26/mgba-0.7.0/); it has
some developer tools we'll use later on. You should see a red, green, and blue
dot close-ish to the middle of the screen. If you don't, something _already_
went wrong. Double check things, phone a friend, write your senators, try asking
`Lokathor` or `Ketsuban` on the [Rust Community
Discord](https://discordapp.com/invite/aVESxV8), until you're eventually able to
get your three dots going.
Of course, I'm sure you want to know why those particular numbers are the
numbers to use. Well that's what the whole rest of the book is about!

@ -1,123 +0,0 @@
# GBA Assembly
On the GBA sometimes you just end up using assembly. Not a whole lot, but
sometimes. Accordingly, you should know how assembly works on the GBA.
* The [ARM Infocenter:
ARM7TDMI](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0210c/index.html)
is the basic authority for reference information. The GBA has a CPU with the
`ARMv4` ISA, the `ARMv4T` variant, and specifically the `ARM7TDMI`
microarchitecture. Someone at ARM decided that having both `ARM#` and `ARMv#`
was a good way to [version things](https://en.wikichip.org/wiki/arm/versions),
even when the numbers don't match. The rest of us have been sad ever since.
The link there will take you to the correct book specific to the GBA's
microarchitecture. There's a whole big pile of ARM books available within the
ARM Infocenter, so if you just google it or whatever, make sure you end up
looking at the correct one. Note that there is also a [PDF
Version](http://infocenter.arm.com/help/topic/com.arm.doc.ddi0210c/DDI0210B.pdf)
of the documentation available, if you'd like that.
* In addition to the `ARM7TDMI` book, which is specific to the GBA's CPU, you'll
need to find a copy of the ARM Architecture Reference Manual if you want
general ARM knowledge. The ARM Infocenter has the
[ARMv5](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0100i/index.html)
version of said manual hosted on their site. Unfortunately, they don't seem to
host the `ARMv4T` version of the manual any more.
* The [GBATek: ARM CPU
Overview](https://problemkaputt.de/gbatek.htm#armcpuoverview) also has quite a
bit of info. Some of it is a duplication of what you'd find in the ARM
Infocenter reference manuals. Some of it is information that's specific to the
GBA's layout and how the CPU interacts with other parts (such as how its
timings and the display adapter's timings line up). Some of it is specific to
the ARM chips _within the DS and DSi_, so be careful to make sure that you
don't wander into the wrong section. GBATEK is always a bit of a jumbled mess,
and the explanations are often "sparse" (to put it nicely), so I'd advise that
you also look at the official ARM manuals.
* The [Compiler Explorer](https://rust.godbolt.org/z/ndCnk3) can be used to
quickly look at assembly versions of your Rust code. That link there will load
up an essentially blank `no_std` file with `opt-level=3` set and targeting
`thumbv6m-none-eabi`. That's _not_ the same target as the GBA (it's two ISA
revisions later, `ARMv6` instead of `ARMv4`), but it's the closest CPU target
that is bundled with `rustc`, so it's the closest you can get with the
compiler explorer website. If you're very dedicated I suppose you could setup
a [local
instance](https://github.com/mattgodbolt/compiler-explorer#running-a-local-instance)
of compiler explorer and then add the extra target definition and so on, but
that's _probably_ overkill.
## ARM and Thumb
The "T" part in `ARMv4T` and `ARM7TDMI` means "Thumb". An ARM chip that supports
Thumb has two different instruction sets instead of just one. The chip can run
in ARM state with 32-bit instructions, or it can run in Thumb state with 16-bit
instructions. Note that the CPU _state_ (ARM or Thumb) is distinct from the
_mode_ (User, FIQ, IRQ, etc). Apparently these states are sometimes called
`a32` and `t32` in a more modern context, but I will stick with ARM and Thumb
because that's what the official ARM7TDMI manual and GBATEK both use.
On the GBA, the memory bus that physically transfers data from the cartridge into
the device is a 16-bit memory bus. This means that if you need to transfer more
than 16 bits at a time you have to do more than one transfer. Since we'd like
our instructions to get to the CPU as fast as possible, we compile the majority
of our program with the Thumb instruction set. The ARM reference says that with
Thumb instructions on a 16-bit memory bus system you get about 160% performance
compared to using ARM instructions. That's absolutely something we want to take
advantage of. Also, your Thumb compiled code is about 65% the size of the same
code compiled with ARM. Since a game ROM can only be 32MB total, and we're trying to
fit in images and sound too, we want to get space savings where we can.
You may wonder, why is the Thumb code 65% as large if the instructions
themselves are 50% as large, and why have ARM state at all if there's such a
benefit to be had with Thumb? Well, Thumb state doesn't support as many different
instructions as ARM state does. Some lines of source code that can compile to a
single ARM instruction might need to compile into more than one Thumb
instruction. Thumb still has most of the really good instructions available, so
it all averages out to about 65%.
That said, some parts of a GBA program _must_ be written for ARM state. Also,
ARM state does allow that increased instruction flexibility. So we _need_ to use
ARM some of the time, and we might just _want_ to use ARM even when we don't
need to at other times. It's possible to switch states on the fly with
extremely minimal overhead, even less than doing some function calls. The only
problem is the 16-bit memory bus of the cartridge giving us a needless speed
penalty with our ARM code. The CPU _executes_ the ARM instructions at full
speed, but then it has to wait while more instructions get sent in. What do we
do? Well, code is ultimately just a different kind of data. We can copy parts of
our code off the cartridge ROM and place it into a part of the RAM that has a
32-bit memory bus. Then the CPU can execute the code from there, going at full
speed. Of course, there's only a very small amount of RAM compared to the size
of a cartridge, so we'll only do this with a few select functions. Exactly which
functions will probably depend on your game.
There are two problems that we face as Rust programmers:
1) Rust offers no way to specify individual functions as being ARM or Thumb. The
whole program is compiled for one state or the other. Obviously this is no
good, so it's on the [2019 embedded
wishlist](https://github.com/rust-embedded/wg/issues/256#issuecomment-439677804),
and perhaps a fix will come.
2) Rust offers no way to get a pointer to a function as well as the length of
the compiled function, so we can't copy a function from the ROM to some other
location because we can't even express statements about the function's data.
I also put this [on the
wishlist](https://github.com/rust-embedded/wg/issues/256#issuecomment-450539836),
but honestly I have much less hope that this becomes a part of rust.
What this ultimately means is that some parts of our program have to be written
in external assembly files and then added to the program with the linker. We
were already going to write some assembly, and we already use more than one file
in our project all the time, those parts aren't a big problem. The big problem
is that using custom linker scripts to get assembly code into our final program
isn't transitive between crates.
What I mean is that once we have a file full of custom assembly that we're
linking in by hand, that's not "part of" the crate any more. At least not as
`cargo` sees it. So we can't just upload it to `crates.io` and then depend on it
in other projects and have `cargo` download the right version and include it
all automatically. We're back to fully manually copying files from the old
project into the new one, adding more lines to the linker script each time we
split up a new assembly file, all that stuff. Like the stone age. Sometimes ya
gotta suffer for your art.

@ -1,237 +0,0 @@
# IO Registers
As I said before, the IO registers are how you tell the GBA to do all the things
you want it to do. If you want a hint at what's available, they're all listed
out in the [GBA I/O Map](https://problemkaputt.de/gbatek.htm#gbaiomap) section
of GBATEK. Go have a quick look.
Each individual IO register has a particular address just like we talked about
in the Hardware Memory Map section. They also have a size (listed in bytes), and
a note on if they're read only, write only, or read-write. Finally, each
register has a name and a one line summary. Unfortunately for us, the names are
all C style names with heavy shorthand. I'm not normally a fan of shorthand
names, but the `gba` crate uses the register names from GBATEK as much as
possible, since they're the most commonly used set of names among GBA
programmers. That way, if you're reading other guides and they say to set the
`BG2CNT` register, then you know exactly what register to look for within the
`gba` docs.
## Register Bits
There are only about 100 registers, but there are a lot more than 100 details we
want to have control over on the GBA. How does that work? Well, let's use a
particular register to talk about it. The first one on the list is `DISPCNT`,
the "Display Control" register. It's one of the most important IO registers, so
this is a "two birds with one stone" situation.
Naturally there's a whole lot of things involved in the LCD that we want to
control, and it's all "one" value, but that value is actually many "fields"
packed into one value. When learning about an IO register, you have to look at
its bit pattern breakdown. For `DISPCNT` the GBATEK entry looks like this:
```txt
4000000h - DISPCNT - LCD Control (Read/Write)
Bit Expl.
0-2 BG Mode (0-5=Video Mode 0-5, 6-7=Prohibited)
3 Reserved / CGB Mode (0=GBA, 1=CGB; can be set only by BIOS opcodes)
4 Display Frame Select (0-1=Frame 0-1) (for BG Modes 4,5 only)
5 H-Blank Interval Free (1=Allow access to OAM during H-Blank)
6 OBJ Character VRAM Mapping (0=Two dimensional, 1=One dimensional)
7 Forced Blank (1=Allow FAST access to VRAM,Palette,OAM)
8 Screen Display BG0 (0=Off, 1=On)
9 Screen Display BG1 (0=Off, 1=On)
10 Screen Display BG2 (0=Off, 1=On)
11 Screen Display BG3 (0=Off, 1=On)
12 Screen Display OBJ (0=Off, 1=On)
13 Window 0 Display Flag (0=Off, 1=On)
14 Window 1 Display Flag (0=Off, 1=On)
15 OBJ Window Display Flag (0=Off, 1=On)
```
So what we're supposed to understand here is that we've got a `u16`, and then we
set the individual bits for the things that we want. In the `hello_magic`
example you might recall that we set this register to the value `0x0403`. That
was a bit of a trick on my part because hex numbers usually look far more
mysterious than decimal or binary numbers. If we converted it to binary it'd
look like this:
```rust
let dispcnt: u16 = 0b100_0000_0011; // the same value as 0x0403
```
And then you can just go down the list of settings to see what bits are what:
* Bits 0-2 (BG Mode) are `0b011`, so that's Video Mode 3
* Bit 10 (Display BG2) is enabled
* Everything else is disabled
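You can do the same breakdown in code by masking and shifting. The helper names below are made up for illustration.

```rust
/// Bits 0-2: BG Mode.
const fn bg_mode(dispcnt: u16) -> u16 {
    dispcnt & 0b111
}

/// Bit 10: Screen Display BG2.
const fn bg2_enabled(dispcnt: u16) -> bool {
    (dispcnt & (1 << 10)) != 0
}

/// Bit 7: Forced Blank.
const fn forced_blank(dispcnt: u16) -> bool {
    (dispcnt & (1 << 7)) != 0
}
```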
Naturally, trying to remember exactly what bit does what can be difficult. In
the `gba` crate we attempt as much as possible to make types that wrap over a
`u16` or `u32` and then have getters and setters _as if_ all the inner bits were
different fields.
* If it's a single bit then the getter/setter will use `bool`.
* If it's more than one bit and each pattern has some non-numeric meaning then
it'll use an `enum`.
* If it's more than one bit and numeric in nature then it'll just use the
wrapped integer type. Note that you generally won't get the full range of the
inner number type, and any excess gets truncated down to fit in the bits
available.
All the getters and setters are defined as `const` functions, so you can make
constant declarations for the exact setting combinations that you want.
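Here's a minimal sketch of that pattern. It uses a hypothetical `DisplayControl` type rather than the crate's real `DisplayControlSetting`, and a plain number for the mode where the crate uses an `enum`.

```rust
/// Hypothetical wrapper over a raw DISPCNT value.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DisplayControl(u16);

impl DisplayControl {
    const fn new() -> Self {
        DisplayControl(0)
    }
    /// Numeric field in bits 0-2; anything above 7 is truncated to fit.
    const fn with_mode(self, mode: u16) -> Self {
        DisplayControl((self.0 & !0b111) | (mode & 0b111))
    }
    const fn mode(self) -> u16 {
        self.0 & 0b111
    }
    /// Single-bit field in bit 10, exposed as a `bool`.
    const fn with_bg2(self, on: bool) -> Self {
        if on {
            DisplayControl(self.0 | (1 << 10))
        } else {
            DisplayControl(self.0 & !(1 << 10))
        }
    }
    const fn bg2(self) -> bool {
        (self.0 & (1 << 10)) != 0
    }
}

/// Because every method is `const`, settings can be built at compile time.
const MODE3_BG2: DisplayControl = DisplayControl::new().with_mode(3).with_bg2(true);
```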
## Some Important IO Registers
It's not always obvious which registers will be important for getting started
and which ones can be saved for later.
We'll go over the three IO registers that will help us the most to get started.
Next lesson we'll cover how Video Mode 3 bitmap drawing works, and by the end of
that lesson we'll be able to put it all together into something interactive.
### DISPCNT: Display Control
The [DISPCNT](https://problemkaputt.de/gbatek.htm#lcdiodisplaycontrol) register
lets us affect the major details of our video output. There's a lot of other
registers involved too, but it all starts here.
```rust
pub const DISPCNT: VolAddress<DisplayControlSetting> = unsafe { VolAddress::new(0x400_0000) };
```
As you can see, the display control register is, like most registers,
complicated enough that we make it a dedicated type with getters and setters for
the "phantom" fields. In this case it's mostly a bunch of `bool` values we can
set, and also the video mode is an `enum`.
We already looked at the bit listing above, so let's go over what's important
right now and skip the other bits:
* BG Mode sets how the whole screen is going to work and even how the display
adapter is going to interpret the bit layout of video memory for pixel
processing. We'll start with Mode 3, which is the simplest to learn.
* The "Forced Blank" bit is one of the very few bits that starts _on_ at the
start of the main program. When it's enabled it prevents the display adapter
from displaying anything at all. You use this bit when you need to do a very
long change to video memory and you don't want the user to see the
intermediate states being partly drawn.
* The "Screen Display" bits let us enable different display layers. We care
about BG2 right now because the bitmap modes (3, 4, and 5) are all treated as
if they were drawing into BG2 (even though it's the only BG layer available in
those modes).
There's a bunch of other stuff, but we'll get to it later. It's not relevant
right now, and there's enough to learn already. We can already see that when the
`hello_magic` demo says
```rust
// Raw volatile write of "Mode 3 + BG2 on" to DISPCNT (requires `unsafe`).
unsafe { (0x400_0000 as *mut u16).write_volatile(0x0403) };
```
We could rewrite that more readably like this:
```rust
const SETTING: DisplayControlSetting =
    DisplayControlSetting::new().with_mode(DisplayMode::Mode3).with_bg2(true);
DISPCNT.write(SETTING);
```
### VCOUNT: Vertical Display Counter
The [VCOUNT](https://problemkaputt.de/gbatek.htm#lcdiointerruptsandstatus)
register lets us find out what row of pixels (called a **scanline**) is
currently being processed.
```rust
pub const VCOUNT: ROVolAddress<u16> = unsafe { ROVolAddress::new(0x400_0006) };
```
You see, the display adapter is constantly running its own loop, alongside the
CPU. It starts at the very first pixel of the very first scanline, takes 4
cycles to determine what color that pixel is, and then processes the next
pixel. Each scanline is 240 pixels long, followed by 68 "virtual" pixels so that
you have just a moment to set up for the next scanline to be drawn if you need
it. 272 cycles (68 * 4) is not a lot of time, but it's enough that you could
change some palette colors or move some objects around if you need to.
* Horizontal pixel value `0..240`: "HDraw"
* Horizontal pixel value `240..308`: "HBlank"
There's no way to check the current horizontal counter, but there is a way to
have the CPU interrupt the normal code when the HBlank period starts, which
we'll learn about later.
Once a complete scanline has been processed (including the blank period), the
display adapter keeps going with the next scanline. Similar to how the
horizontal processing works, there are 160 scanlines in the real display,
followed by 68 "virtual" scanlines to give you time for adjusting video memory
between the frames of the game.
* Vertical Count `0..160`: "VDraw"
* Vertical Count `160..228`: "VBlank"
Once every scanline has been processed (including the VBlank period), the
display adapter starts the whole loop over again with scanline 0. A total of
280,896 cycles per display loop (4 * 308 * 228), at about 59.60ns per CPU cycle
(a 16.78MHz clock), gives us a full-speed display rate of 59.73fps. That's close
enough to 60fps that I think we can just round up a bit whenever we're not
counting it down to the exact cycle timings.
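Here is that arithmetic spelled out as a few constants, as a quick sanity check. The names are made up for this sketch. The clock rate assumes the GBA CPU runs at 2^24 Hz (about 16.78MHz), which is where the roughly 59.6ns per cycle comes from.

```rust
/// CPU clock, assumed to be 2^24 Hz (about 16.78MHz).
pub const CPU_HZ: u32 = 1 << 24;
pub const CYCLES_PER_PIXEL: u32 = 4;
pub const PIXELS_PER_SCANLINE: u32 = 240 + 68; // HDraw + HBlank
pub const SCANLINES_PER_FRAME: u32 = 160 + 68; // VDraw + VBlank
pub const CYCLES_PER_SCANLINE: u32 = CYCLES_PER_PIXEL * PIXELS_PER_SCANLINE;
pub const CYCLES_PER_FRAME: u32 = CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME;

/// Length of one CPU cycle in nanoseconds.
pub fn ns_per_cycle() -> f64 {
    1.0e9 / CPU_HZ as f64
}

/// Full-speed display refresh rate in frames per second.
pub fn frames_per_second() -> f64 {
    CPU_HZ as f64 / CYCLES_PER_FRAME as f64
}
```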
However, there's a bit of a snag. If we change video memory during the middle of
a scanline the display will _immediately_ start processing using the new state
of video memory. The picture before the change and after the change won't look
like a single, clean picture. Instead you'll get what's called "[screen
tearing](https://en.wikipedia.org/wiki/Screen_tearing)", which is usually
considered to be the mark of a badly programmed game.
To avoid this we just need to make sure we only adjust video memory during one
of the blank periods. If you're really cool you can adjust things during HBlank,
but we're not that cool yet. To start out, our general program flow will be:
1) Gather input for the frame (next part of this lesson) and update the game
state, getting everything ready for when VBlank actually starts.
2) Once VBlank starts we update all of the video memory as fast as we can.
3) Once we're done drawing we again wait for the VDraw period to begin and then
do it all again.
Now, it's not the most efficient way, but to get our timings right we can just
read from `VCOUNT` over and over in a "busy loop". Once we read a value of 160
we know that we've entered VBlank. Once it goes back to 0 we know that we're
back in VDraw.
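A busy-wait along those lines might look something like the sketch below. To keep it testable off-hardware, the function takes a closure that returns the current scanline instead of reading `VCOUNT` directly. On the GBA you could pass `|| VCOUNT.read()`, assuming the `VCOUNT` definition shown earlier. The function name is hypothetical. It first waits out any VBlank that is already in progress, so it returns at the *start* of a VBlank rather than partway through one.

```rust
/// First VBlank scanline (scanlines 160..228 are VBlank).
pub const VBLANK_START: u16 = 160;

/// Busy-waits until the start of the next VBlank period.
/// `read_vcount` stands in for a volatile read of VCOUNT.
pub fn wait_for_vblank_start(mut read_vcount: impl FnMut() -> u16) {
    // If we're already in VBlank, wait for VDraw to come back around
    // so we don't start drawing partway through a blank period.
    while read_vcount() >= VBLANK_START {}
    // Now wait for VDraw to finish and VBlank to begin.
    while read_vcount() < VBLANK_START {}
}
```

A main loop would call this once per frame, right before updating video memory.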
Doing a busy loop like this actually drains the batteries way more than
necessary. It keeps the CPU active constantly, which is what uses a fair amount
of the power. Normally you're supposed to put the CPU to sleep if you're just
waiting around for something to happen. However, that also requires learning
about some more concepts to get right. So to keep things easier starting out
we'll do the bad/lazy version and then upgrade our technique later.
### KEYINPUT: Key Input Reading
The [KEYINPUT](https://problemkaputt.de/gbatek.htm#gbakeypadinput) register is
the last one we've got to learn about this lesson. It lets you check the status
of all 10 buttons on the GBA.
```rust
pub const KEYINPUT: ROVolAddress<u16> = unsafe { ROVolAddress::new(0x400_0130) };
```
There's little to say here. It's a read only register, and the data just
contains one bit per button. The only thing that's a little weird about it is
that the bits follow a "low active" convention, so if the button is pressed then
the bit is 0, and if the button is released the bit is 1.
You _could_ work with that directly, but I think it's a lot easier to think
about having `true` for pressed and `false` for not pressed. So the `gba` crate
flips the bits when you read the keys:
```rust
/// Gets the current state of the keys
pub fn read_key_input() -> KeyInput {
    KeyInput(KEYINPUT.read() ^ 0b0000_0011_1111_1111)
}
```
Now we can treat the KeyInput values like a totally normal bitset.
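The small sketch below shows what treating the flipped value as a bitset can look like. The bit positions follow GBATEK's keypad listing: bits 0 through 9 are A, B, Select, Start, Right, Left, Up, Down, R, and L. The `Keys` type, its constants, and its methods are hypothetical names for this example, not the `gba` crate's `KeyInput` API. On hardware, the raw value would come from a volatile read of `KEYINPUT`.

```rust
/// Hypothetical "pressed = 1" view of the KEYINPUT bits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Keys(pub u16);

impl Keys {
    pub const A: u16 = 1 << 0;
    pub const B: u16 = 1 << 1;
    pub const SELECT: u16 = 1 << 2;
    pub const START: u16 = 1 << 3;
    pub const RIGHT: u16 = 1 << 4;
    pub const LEFT: u16 = 1 << 5;
    pub const UP: u16 = 1 << 6;
    pub const DOWN: u16 = 1 << 7;
    pub const R: u16 = 1 << 8;
    pub const L: u16 = 1 << 9;
    /// All ten button bits.
    pub const ALL: u16 = 0b0000_0011_1111_1111;

    /// Converts a raw, low-active KEYINPUT value so pressed keys are 1 bits.
    pub const fn from_raw(raw: u16) -> Self {
        Self((raw ^ Self::ALL) & Self::ALL)
    }

    /// True if every key in `mask` is currently pressed.
    pub const fn pressed(self, mask: u16) -> bool {
        (self.0 & mask) == mask
    }
}
```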


@ -1,379 +0,0 @@
# The Hardware Memory Map
So we saw `hello_magic.rs` and then we learned what `volatile` was all about,
but we've still got a few things that are a bit mysterious. You can't just cast
a number into a pointer and start writing to it! That's totally crazy! That's
writing to un-allocated memory! Against the rules!
Well, _kinda_. It's true that you're not allowed to write _anywhere at all_, but
those addresses were carefully selected.
You see, on a modern computer if you need to check if a key is pressed you ask
the Operating System (OS) to please go check for you. If you need to play a
sound, you ask the OS to please play the sound on a default sound output. If you
need to show a picture you ask the OS to give you access to the video driver so
that you can ask the video driver to please put some pixels on the screen.
That's mostly fine, except how does the OS actually do it? It doesn't have an OS
to go ask; it has to stop somewhere.
Ultimately, every piece of hardware is mapped into somewhere in the address
space of the CPU. You can't actually tell that this is the case as a normal user
because your program runs inside a virtualized address space. That way you can't
go writing into another program's memory and crash what they're doing or steal
their data (well, hopefully, it's obviously not perfect). Outside of the
virtualization layer the OS is running directly in the "true" address space, and
it can access the hardware on behalf of a program whenever it's asked to.
How does directly accessing the hardware work, _precisely_? It's just the same
as accessing the RAM. Each address holds some bits, and the CPU picks an address
and loads in the bits. Then the program gets the bits and has to decide what
they mean. The "driver" of a hardware device is just the layer that translates
between raw bits in the outside world and more meaningful values inside of the
program.
Of course, memory mapped hardware can change its bits at any time. The user can
press and release a key and you can't stop them. This is where `volatile` comes
in. Whenever there's memory mapped hardware you want to access it with
`volatile` operations so that you can be sure that you're sending the data every
time, and that you're getting fresh data every time.
## GBA Specifics
That's enough about the general concept of memory mapped hardware, let's get to
some GBA specifics. The GBA has the following sections in its memory map.
* BIOS
* External Work RAM (EWRAM)
* Internal Work RAM (IWRAM)
* IO Registers
* Palette RAM (PALRAM)
* Video RAM (VRAM)
* Object Attribute Memory (OAM)
* Game Pak ROM (ROM)
* Save RAM (SRAM)
Each of these has a few key points of interest:
* **Bus Width:** Also just called "bus", this is how many little wires are
_physically_ connecting a part of the address space to the CPU. If you need to
transfer more data than fits in the bus you have to do repeated transfers
until it all gets through.
* **Read/Write Modes:** Most parts of the address space can be read from in 8,
16, or 32 bits at a time (there are a few exceptions we'll see). However, a
significant portion of the address space can't accept 8 bit writes. Usually
this isn't a big deal, but the standard `memcpy` routine switches to doing a
byte-by-byte copy in some situations, so we'll have to be careful about using
it in combination with those regions of the memory.
* **Access Speed:** On top of the bus width issue, not all memory can be
accessed at the same speed. The "fast" parts of memory can do a read or write
in 1 cycle, but the slower parts of memory can take a few cycles per access.
These are called "wait cycles". The exact timings depend on what you configure
the system to use, which is also limited by what your cartridge physically
supports. You'll often see timings broken down into `N` cycles (non-sequential
memory access) and `S` cycles (sequential memory access, often faster). There
are also `I` cycles (internal cycles) which happen whenever the CPU does an
internal operation that takes more than one cycle to complete (like a multiply).
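One way to avoid the byte-write problem is to copy halfword by halfword with volatile writes instead of calling a general-purpose `memcpy`. The helper below is a hypothetical sketch, not something the `gba` crate is known to provide. The compiler may not merge volatile writes into a `memcpy` call, and in practice each one compiles to a single 16-bit store. Rust doesn't formally promise an access width, though.

```rust
/// Copies `src` into memory that must not receive 8-bit writes
/// (such as PALRAM or VRAM), one `u16` at a time.
///
/// # Safety
/// `dst` must be aligned and valid for `src.len()` consecutive
/// volatile `u16` writes.
pub unsafe fn copy_u16_volatile(src: &[u16], dst: *mut u16) {
    for (i, &half) in src.iter().enumerate() {
        // One volatile halfword store per element; no byte-wise copying.
        unsafe { dst.add(i).write_volatile(half) };
    }
}
```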
Don't worry, you don't have to count exact cycle timings unless you're on the
razor's edge of the GBA's abilities. For more normal games you just have to be
mindful of what you're doing and it'll be fine.
Let's briefly go over the major talking points of each memory region. All of
this information is also available in GBATEK, mostly in their [memory
map](http://www.akkit.org/info/gbatek.htm#gbamemorymap) section (though somewhat
spread through the rest of the document too).
Though I'm going to list the location range of each memory space below, most of
the hardware locations are actually mirrored at several points throughout the
address space.
### BIOS
* **Location:** `0x0` to `0x3FFF`
* **Bus:** 32-bit
* **Access:** Memory protected read-only (see text).
* **Wait Cycles:** None
The "basic input output system". This contains a grab bag of utilities that do
various tasks. The code is optimized for small size rather than great speed, so
you can sometimes write faster versions of these routines. Also, calling a bios
function has more overhead than a normal function call. You can think of bios
calls as being similar to system calls to the OS on a desktop computer. Useful,
but costly.
As a side note, not only is BIOS memory read only, but it's memory protected so
that you can't even read from bios memory unless the system is currently
executing a function that's in bios memory. If you try then the system just
gives back a nonsensical value that's not really what you asked for. If you
really want to know what's inside, there's actually a bug in one bios call
(`MidiKey2Freq`) that lets you read the bios section one byte at a time.
Also, there's not just one bios! Of course there's the official bios from
Nintendo that's used on actual hardware, but since that's code instead of
hardware it's protected by copyright. Since a bios is needed to run a GBA
emulator properly, people have come up with their own open source versions or
they simply make the emulator special case the bios and act _as if_ the function
call had done the right thing.
* The [TempGBA](https://github.com/Nebuleon/TempGBA) repository has an easy to
look at version written in assembly. Its API and effects are close enough to
the Nintendo version that most games will run just fine.
* You can also check out the [mGBA
bios](https://github.com/mgba-emu/mgba/blob/master/src/gba/bios.c) if you want
to see the C version of what various bios functions are doing.
### External Work RAM (EWRAM)
* **Location:** `0x200_0000` to `0x203_FFFF` (256k)
* **Bus:** 16-bit
* **Access:** Read-write, any size.
* **Wait Cycles:** 2
The external work RAM is a sizable amount of space, but the 2 wait cycles per
access and 16-bit bus mean that you should probably think of it like a "heap":
a place to avoid putting things if you don't have to.
The GBA itself doesn't use this for anything, so any use is totally up to you.
At the moment, the linker script and `crt0.s` files provided with the `gba`
crate also have no defined use for the EWRAM, so it's 100% on you to decide how
you want to use it.
(Note: There is an undocumented control register that lets you adjust the wait
cycles on EWRAM. Using it, you can turn EWRAM from the default 2 wait cycles
down to 1. However, not all GBA-like things support it. The GBA and GBA SP do;
the GBA Micro and DS do not. Emulators might or might not depending on the
particular emulator. See the [GBATEK system
control](https://problemkaputt.de/gbatek.htm#gbasystemcontrol) page for a full
description of that register, though probably only once you've read more of this
tutorial book and know how to make sense of IO registers and such.)
### Internal Work RAM (IWRAM)
* **Location:** `0x300_0000` to `0x300_7FFF` (32k)
* **Bus:** 32-bit
* **Access:** Read-write, any size.
* **Wait Cycles:** 0
This is where the "fast" memory for general purposes lives. By default the
system uses the 256 _bytes_ starting at `0x300_7F00` _and up_ for system and
interrupt purposes, while Rust's program stack starts at that same address _and
goes down_ from there.
Even though your stack exists in this space, it's totally reasonable to use the
bottom parts of this memory space for whatever quick scratch purposes, same as
EWRAM. 32k is fairly huge, and the stack going down from the top and the scratch
data going up from the bottom are unlikely to hit each other. If they do you
were probably well on your way to a stack overflow anyway.
The linker script and `crt0.s` file provided with the `gba` crate use the bottom
of IWRAM to store the `.data` and `.bss` [data
segments](https://en.wikipedia.org/wiki/Data_segment). That's where your global
variables get placed (both `static` and `static mut`). The `.data` segment holds
any variable that's initialized to non-zero, and the `.bss` section is for any
variable initialized to zero. When the GBA is powered on, some code in the
`crt0.s` file runs and copies the initial `.data` values into place within IWRAM
(all of `.bss` starts at 0, so there's no copy for those variables).
If you have no global variables at all, then you don't need to worry about those
details, but if you do have some global variables then you can use the _address
of_ the `__bss_end` symbol defined in the top of the `gba` crate as a marker for
where it's safe for you to start using IWRAM without overwriting your globals.
### IO Registers
* **Location:** `0x400_0000` to `0x400_03FE`
* **Bus:** 32-bit
* **Access:** different for each IO register
* **Wait Cycles:** 0
The IO Registers are where most of the magic happens, and it's where most of the
variety happens too. Each IO register is a specific width, usually 16-bit but
sometimes 32-bit. Most of them are fully read/write, but some of them are read
only or write only. Some of them have individual bits that are read only even
when the rest of the register is writable. Some of them can be written to, but
the write doesn't change the value you read back, it sets something else.
Really.
The IO registers are how you control every bit of hardware besides the CPU
itself. Reading the buttons, setting display modes, enabling timers, all of that
goes through different IO registers. Actually, even a few parts of the CPU's
operation can be controlled via IO register.
We'll go over IO registers more in the next section, including a few specific
registers, and then we'll constantly encounter more IO registers as we explore
each new topic through the rest of the book.
### Palette RAM (PALRAM)
* **Location:** `0x500_0000` to `0x500_03FF` (1k)
* **Bus:** 16-bit
* **Access:** Read any, single bytes mirrored (see text).
* **Wait Cycles:** Video Memory Wait (see text)
This is where the GBA stores color palette data. There are 256 slots for
background colors, and then 256 slots for object colors.
GBA colors are 15 bits each, with five bits per channel and the highest bit
being totally ignored, so we store them as `u16` values:
* `X_BBBBB_GGGGG_RRRRR`
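Packing a color into that layout only takes a couple of shifts and masks. Here is a minimal sketch; the function names are hypothetical.

```rust
/// Packs 5-bit red, green, and blue channels (0..=31) into the
/// `X_BBBBB_GGGGG_RRRRR` layout. Any extra high bits of each input
/// are masked off, and the unused top bit is left at 0.
pub const fn rgb15(r: u16, g: u16, b: u16) -> u16 {
    (r & 0x1F) | ((g & 0x1F) << 5) | ((b & 0x1F) << 10)
}

/// Splits a color back into its (red, green, blue) channels.
pub const fn channels(color: u16) -> (u16, u16, u16) {
    (color & 0x1F, (color >> 5) & 0x1F, (color >> 10) & 0x1F)
}
```

In this layout, the three colors `hello_magic` writes (`0x001F`, `0x03E0`, and `0x7C00`) are pure red, pure green, and pure blue.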
Of note is the fact that the 256 palette slots can be viewed in two different
ways, because there are two different formats for images in video memory:
"8 bits per pixel" (8bpp) and "4 bits per pixel" (4bpp).
* **8bpp:** Each pixel in the image is 8 bits and indexes directly into the full
256 entry palette array. An index of 0 means that pixel should be transparent,
so there's 255 possible colors.
* **4bpp:** Each pixel in the image is 4 bits and indexes into a "palbank" of 16
colors within the palette data. Some exterior control selects the palbank to
be used. An index of 0 still means that the pixel should be transparent, so
there's 15 possible colors.
Different images can use different modes all at once, as long as you can fit all
the colors you want to use into your palette layout.
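The small sketch below makes the two indexing schemes concrete by mapping a pixel value to a slot in a 256-entry palette. It assumes a palbank is a number from 0 to 15 that selects one run of 16 consecutive entries (256 / 16 = 16 palbanks). The function names are hypothetical.

```rust
/// 4bpp: combines a palbank (0..=15) and a 4-bit pixel value into an
/// index into the 256-entry palette. Pixel value 0 is transparent.
pub const fn palette_index_4bpp(palbank: u8, pixel: u8) -> Option<usize> {
    let pixel = pixel & 0xF; // only 4 bits per pixel
    if pixel == 0 {
        None
    } else {
        Some(((palbank & 0xF) as usize) * 16 + pixel as usize)
    }
}

/// 8bpp: the pixel value indexes the palette directly. 0 is transparent.
pub const fn palette_index_8bpp(pixel: u8) -> Option<usize> {
    if pixel == 0 {
        None
    } else {
        Some(pixel as usize)
    }
}
```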
PALRAM can't be written to in individual bytes. This isn't normally a problem at
all, because you wouldn't really want to write half of a color entry anyway. If
you do try to write a single byte then it gets "mirrored" into both halves of
the `u16` that would be associated with that address. For example, if you tried
to write `0x01u8` to either `0x500_0000` or `0x500_0001` then you'd actually
_effectively_ be writing `0x0101u16` to `0x500_0000`.
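The mirroring behavior amounts to duplicating the byte into both halves, which is the same as multiplying it by `0x0101`. This is only a host-side model for illustration, with a hypothetical function name. It is not code you'd run against PALRAM.

```rust
/// Models the `u16` that PALRAM ends up holding when a single byte is
/// written to either half of a palette entry.
pub const fn palram_byte_write_effect(byte: u8) -> u16 {
    // byte * 0x0101 == (byte << 8) | byte
    (byte as u16) * 0x0101
}
```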
PALRAM follows what we'll call the "Video Memory Wait" rule: If you access
the memory during a vertical blank or horizontal blank period there's 0 wait
cycles, and if you try to access the memory while the display controller is
drawing there is a 1 cycle wait inserted _if_ the display controller was using
that memory at that moment.
### Video RAM (VRAM)
* **Location:** `0x600_0000` to `0x601_7FFF` (96k or 64k+32k depending on mode)
* **Bus:** 16-bit
* **Access:** Read any, single bytes _sometimes_ mirrored (see text).
* **Wait Cycles:** Video Memory Wait (see text)
Video RAM is the memory for what you want the display controller to be
displaying. The GBA actually has 6 different display modes (numbered 0 through
5), and the layout you should imagine VRAM having changes depending on the mode
you're using. Because there's so much involved here, I'll leave more precise
details to the following sections, which talk about how to use VRAM in each
mode.
VRAM can't be written to in individual bytes. If you try to write a single byte
to background VRAM the byte gets mirrored like with PALRAM, and if you try with
object VRAM the write gets ignored entirely. Exactly what address ranges those
memory types are depends on video mode, but just don't bother with individual
byte writes to VRAM. If you want to change a single byte of data (and you might)
then the correct style is to read the full `u16`, mask out the old data, mask in
your new value, and then write the whole `u16`.
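That read-modify-write can be sketched as follows. The function name is hypothetical. The sketch assumes the GBA's little-endian byte order, so an even byte offset is the low half of its `u16` and an odd offset is the high half.

```rust
/// Writes a single byte into memory that only accepts 16-bit writes,
/// using a read-modify-write of the containing `u16`.
///
/// # Safety
/// `base` must be aligned and valid for volatile reads and writes of
/// the `u16` that contains byte `byte_offset`.
pub unsafe fn write_byte_via_u16(base: *mut u16, byte_offset: usize, value: u8) {
    // The halfword that holds the byte we want to change.
    let slot = unsafe { base.add(byte_offset / 2) };
    let old = unsafe { slot.read_volatile() };
    // Little-endian: even offsets are the low byte, odd offsets the high byte.
    let shift = (byte_offset % 2) * 8;
    let mask = 0x00FF_u16 << shift;
    // Mask out the old byte, mask in the new one, write the whole u16.
    let new = (old & !mask) | ((value as u16) << shift);
    unsafe { slot.write_volatile(new) };
}
```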
VRAM follows the same "Video Memory Wait" rule that PALRAM has.
### Object Attribute Memory (OAM)
* **Location:** `0x700_0000` to `0x700_03FF` (1k)
* **Bus:** 32-bit
* **Access:** Read any, single bytes no effect (see text).
* **Wait Cycles:** Video Memory Wait (see text)
This part of memory controls the "Objects" (OBJ) on the screen. An object is
_similar to_ the concept of a "sprite". However, because of an object's size
limitations, a single sprite might require more than one object to be drawn
properly. In general, if you want to think in terms of sprites at all, you
should think of sprites as being a logical / programming concept, and objects as
being a hardware concept.
While VRAM has the _image_ data for each object, this part of memory has the
_control_ data for each object. An object's "attributes" describe what part of
the VRAM to use, where to place it on the screen, any special graphical effects
to use, all that stuff. Each object has 6 bytes of attribute data (arranged as
three `u16` values), and there's a total of 128 objects (indexed 0 through 127).
But 128 objects at 6 bytes each only account for 768 of OAM's 1024 bytes, which
leaves 256 bytes. What's that other space used for? Well, it's a little weird, but after
every three `u16` object attribute fields there's one `i16` "affine parameter"
field mixed in. It takes four such fields to make a complete set of affine
parameters (a 2x2 matrix), so we get a total of 32 affine parameter entries
across all of OAM. "Affine" might sound fancy but it just means a transformation
where anything that started parallel stays parallel after the transform. The
affine parameters can be used to scale, rotate, and/or skew a background or
object as it's being displayed on the screen. It takes more computing power than
the non-affine display, so you can't display as many different things at once
when using the affine modes.
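One way to picture that interleaving is as 128 eight-byte slots. Each slot holds three attribute fields plus one affine parameter, and every four consecutive slots supply the four parameters of one affine set. The sketch below models that layout; the struct, constant, and function names are hypothetical.

```rust
use core::mem::size_of;

/// One 8-byte slice of OAM: three object attribute fields followed by
/// one interleaved affine parameter field.
#[derive(Debug, Clone, Copy, Default)]
#[repr(C)]
pub struct OamSlot {
    pub attr0: u16,
    pub attr1: u16,
    pub attr2: u16,
    pub affine_param: i16,
}

pub const OAM_BYTES: usize = 1024;
pub const OBJECT_COUNT: usize = OAM_BYTES / size_of::<OamSlot>(); // 128
pub const AFFINE_SET_COUNT: usize = OBJECT_COUNT / 4; // 32

/// Byte offset from the start of OAM of parameter `field` (0..4, one
/// entry of the 2x2 matrix) in affine set `set` (0..32).
pub const fn affine_param_offset(set: usize, field: usize) -> usize {
    // Each set spans four slots; the parameter is the last u16 of a slot.
    (set * 4 + field) * size_of::<OamSlot>() + 6
}
```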
OAM can't ever be written to with individual bytes. The write just has no effect
at all.
OAM follows the same "Video Memory Wait" rule that PALRAM has, **and** you can
also only freely access OAM during a horizontal blank if you set a special
"HBlank Interval Free" bit in one of the IO registers (the "Display Control"
register, which we'll talk about next lesson). The reason that you might _not_
want to set that bit is that when it's enabled you can't draw as many objects
at once. You don't lose the use of an exact number of objects, you actually lose
the use of a number of display adapter drawing cycles. Since not all objects
take the same number of cycles to render, it depends on what you're drawing.
GBATEK [has the details](https://problemkaputt.de/gbatek.htm#lcdobjoverview) if
you want to know precisely.
### Game Pak ROM (ROM)
* **Location:** Special (max of 32MB)
* **Bus:** 16-bit
* **Access:** Special
* **Wait Cycles:** Special
This is where your actual game is located! As you might guess, since each
cartridge is different, the details here depend quite a bit on the cartridge
that you use for your game. Even a simple statement like "you can't write to the
ROM region" isn't true for some carts if they have FlashROM.
The _most important_ thing to concern yourself with when considering the ROM
portion of memory is the 32MB limit. That's compiled code, images, sound,
everything put together. The total has to stay under 32MB.
The next most important thing to consider is that 16-bit bus. It means that we
compile our programs using "Thumb state" code instead of "ARM state" code.
Details about this can be found in the GBA Assembly section of the book, but
just be aware that there's two different types of assembly on the GBA. You can
switch between them, but the default for us is always Thumb state.
Another detail which you actually _don't_ have to think about much, but that you
might care if you're doing precise optimization, is that the ROM address space
is actually mirrored across three different locations:
* `0x800_0000` to `0x9FF_FFFF`: Wait State 0
* `0xA00_0000` to `0xBFF_FFFF`: Wait State 1
* `0xC00_0000` to `0xDFF_FFFF`: Wait State 2
These _don't_ mean 0, 1, and 2 wait cycles; they mean the wait cycles associated
with ROM mirrors 0, 1, and 2. On some carts the game will store different parts
of the data into different chips that are wired to be accessible through
different parts of the mirroring. The actual wait cycles used are even
configurable via an IO register called
[WAITCNT](https://problemkaputt.de/gbatek.htm#gbasystemcontrol) ("Wait Control";
I don't know why C programmers have to give everything the worst names when it's
not 1980 any more).
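The three regions mirror the same cartridge ROM, so an address can be split into "which wait state region" and "offset into the ROM". Here is a small sketch with hypothetical function names. It assumes the 32MB (`0x200_0000`-byte) mirror size listed above.

```rust
/// Which ROM mirror (wait state region) an address falls into, if any.
pub const fn rom_wait_state(addr: u32) -> Option<u8> {
    match addr {
        0x0800_0000..=0x09FF_FFFF => Some(0),
        0x0A00_0000..=0x0BFF_FFFF => Some(1),
        0x0C00_0000..=0x0DFF_FFFF => Some(2),
        _ => None,
    }
}

/// Offset into the cartridge ROM; the same for all three mirrors.
pub const fn rom_offset(addr: u32) -> Option<u32> {
    match rom_wait_state(addr) {
        Some(_) => Some(addr & 0x01FF_FFFF), // keep the low 32MB of address
        None => None,
    }
}
```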
### Save RAM (SRAM)
* **Location:** Special (max of 64k)
* **Bus:** 8-bit
* **Access:** Special
* **Wait Cycles:** Special
The Save RAM is also part of the cart that you've got your game on, so it also
depends on your hardware.
SRAM _starts_ at `0xE00_0000` and you can save up to however much the hardware
supports, to a maximum of 64k. However, you can only read and write SRAM one
_byte_ at a time. What's worse, while you can _write_ to SRAM using code
executing anywhere, you can only _read_ with code that's executing out of either
Internal or External Work RAM, not with code that's executing out of ROM.
This means that you need to copy the code for doing the read into some scratch
space (either at startup or on the fly, doesn't matter) and call that function
you've carefully placed. It's a bit annoying, but soon enough a routine for it
all will be provided in the `gba` crate and we won't have to worry too much
about it.
(TODO: Provide the routine that I just claimed we would provide.)
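As a rough stand-in, the reading part can be sketched as a byte-at-a-time volatile copy. This hypothetical function does not solve the placement problem. On real hardware, it must be arranged to *execute* from IWRAM or EWRAM, for example through a linker section set up in your linker script, and that step is not shown here. `src` would normally point into SRAM, starting at `0xE00_0000`.

```rust
/// Copies `out.len()` bytes out of SRAM, one byte per read.
///
/// Must run from IWRAM or EWRAM on real hardware (not arranged here).
///
/// # Safety
/// `src` must be valid for `out.len()` consecutive volatile byte reads.
pub unsafe fn read_sram_bytes(src: *const u8, out: &mut [u8]) {
    for (i, byte) in out.iter_mut().enumerate() {
        // SRAM sits on an 8-bit bus, so read exactly one byte at a time.
        *byte = unsafe { src.add(i).read_volatile() };
    }
}
```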


@ -1,48 +0,0 @@
# Volatile
I know that you just got your first program running and you're probably excited
to learn more about GBA stuff, but first we have to cover a subject that's not
quite GBA specific.
In the `hello_magic.rs` file we had these lines
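```rust
// Red, green, and blue pixels at (120, 80), (136, 80), and (120, 96) of
// the Mode 3 bitmap; raw pointer writes require `unsafe`.
unsafe {
    (0x600_0000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
    (0x600_0000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
    (0x600_0000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
}
```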
You've probably seen or heard of the
[write](https://doc.rust-lang.org/core/ptr/fn.write.html) function before, but
you'd be excused if you've never heard of its cousin function,
[write_volatile](https://doc.rust-lang.org/core/ptr/fn.write_volatile.html).
What's the difference? Well, when the compiler sees normal reads and writes, it
assumes that those go into plain old memory locations. CPU registers, RAM,
wherever it is that the value's being placed. The compiler assumes that it's
safe to optimize away some of the reads and writes, or maybe issue the reads and
writes in a different order from what you wrote. Normally this is okay, and it's
exactly what we want the compiler to be doing, quietly making things faster for us.
However, some of the time we access values from parts of memory where it's
important that each access happen, and in the exact order that we say. In our
`hello_magic.rs` example, we're writing directly into the video memory of the
display. The compiler sees that the rest of the Rust program never reads from
those locations, so it might think "oh, we can skip those writes, they're
pointless". It doesn't know that we're having a side effect besides just storing
some value at an address.
By declaring a particular read or write to be `volatile` we can force the
compiler to issue that access. Further, we're guaranteed that all `volatile`
access will happen in exactly the order it appears in the program relative to
other `volatile` access. However, non-volatile access can still be re-ordered
relative to a volatile access. In other words, for parts of the memory that are
volatile, we must _always_ use a volatile read or write for our program to
perform properly.
For exactly this reason, we've got the [voladdress](https://docs.rs/voladdress/)
crate. It used to be part of the GBA crate, but it became big enough to break
out into a standalone crate. It doesn't even do much; it just makes it a
lot less error prone to accidentally forget to use volatile with our memory
mapped addresses. We just call `read` and `write` on any `VolAddress` that we
happen to see and the right thing will happen.
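To show the idea behind such a wrapper, here is a stripped-down sketch in which every `read` and `write` goes through a volatile access. `VolPtr` is a hypothetical name, and this is not the `voladdress` API. The real crate has more to it; for instance, the `ROVolAddress` used for `VCOUNT` earlier is read-only.

```rust
/// Hypothetical minimal volatile address wrapper.
#[derive(Debug, Clone, Copy)]
pub struct VolPtr<T> {
    addr: *mut T,
}

impl<T> VolPtr<T> {
    /// # Safety
    /// `addr` must be non-null, aligned, and valid for volatile reads
    /// and writes of `T` for as long as this value is used.
    pub const unsafe fn new(addr: *mut T) -> Self {
        Self { addr }
    }

    /// Address of the `n`th `T` after this one, like `ptr.add(n)`.
    ///
    /// # Safety
    /// The resulting address must also meet the requirements of `new`.
    pub unsafe fn add(self, n: usize) -> Self {
        Self { addr: unsafe { self.addr.add(n) } }
    }
}

impl<T: Copy> VolPtr<T> {
    /// Always a volatile read, so the compiler can't skip or cache it.
    pub fn read(self) -> T {
        // Sound because `new` required a valid address.
        unsafe { self.addr.read_volatile() }
    }

    /// Always a volatile write, so the compiler can't drop it.
    pub fn write(self, value: T) {
        unsafe { self.addr.write_volatile(value) }
    }
}
```

On the GBA you might construct one from a fixed address such as `0x600_0000 as *mut u16`, the start of VRAM. Off-hardware, an ordinary array can stand in for the memory.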