mirror of
https://github.com/italicsjenga/gba.git
synced 2025-01-26 01:16:33 +11:00
No more old book stuff (#123)
* stop with the book, we should focus on the crate. * Update README.md * Update README.md
This commit is contained in:
parent
99f80d2b9a
commit
8efef6ebc5
54 changed files with 32 additions and 6528 deletions
|
@ -1,6 +1,6 @@
|
|||
[package]
|
||||
name = "gba"
|
||||
description = "A crate (and book) for making GBA games with Rust."
|
||||
description = "A crate for making GBA games with Rust."
|
||||
version = "0.4.0-pre1"
|
||||
authors = ["Lokathor <zefria@gmail.com>", "Thomas Winwood <twwinwood@gmail.com>"]
|
||||
repository = "https://github.com/rust-console/gba"
|
||||
|
|
59
README.md
|
@ -11,43 +11,45 @@
|
|||
|
||||
# gba
|
||||
|
||||
_Eventually_ there will be a full [Tutorial
|
||||
Book](https://rust-console.github.io/gba/) that goes along with this crate.
|
||||
However, currently the development focus is leaning towards having minimal
|
||||
coverage of all the parts of the GBA. Until that's done, unfortunately the book
|
||||
will be in a rather messy state.
|
||||
A crate to make GBA programming easy.
|
||||
|
||||
## What's Missing
|
||||
Currently we don't have as much documentation as we'd like.
|
||||
If you check out the [awesome-gbadev](https://github.com/gbdev/awesome-gbadev) repository they have many resources, though most are oriented towards C.
|
||||
|
||||
The following major GBA features are still missing from the crate:
|
||||
## First Time Setup
|
||||
|
||||
* Affine Graphics
|
||||
* Interrupt Handling
|
||||
* Serial Communication
|
||||
Building for the GBA requires Nightly rust, and also uses the `build-std` feature, so you'll need the rust source available.
|
||||
|
||||
## Build Dependencies
|
||||
|
||||
Install required cargo packages
|
||||
```sh
|
||||
rustup install nightly
|
||||
rustup +nightly component add rust-src
|
||||
```
|
||||
|
||||
You'll also need the ARM binutils so that you can have the assembler and linker for the ARMv4T architecture.
|
||||
The way to get them varies by platform:
|
||||
* Ubuntu and other Debian-like Linux distros will usually have them in the package manager.
|
||||
```shell
|
||||
sudo apt-get install binutils-arm-none-eabi
|
||||
```
|
||||
* With OSX you can get them via homebrew.
|
||||
```shell
|
||||
brew install --cask gcc-arm-embedded
|
||||
```
|
||||
* On Windows you can get the installer from ARM's website and run that.
|
||||
* Download the [GNU Arm Embedded Toolchain](https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads)
|
||||
* When installing the toolchain, make sure to select "Add path to environment variable" during install.
|
||||
* You'll have to restart any open command prompts after you run the installer so that they see the new PATH value.
|
||||
|
||||
Finally, rustc itself is only able to make ELF format files. These can be run in emulators, but aren't able to be played on actual hardware.
|
||||
You'll need to convert the ELF file into a GBA rom. There's a `cargo-make` file in this repository to do this, and it relies on a tool called `gbafix`
|
||||
to assign the right header data to the ROM when packing it.
|
||||
|
||||
```sh
|
||||
cargo install cargo-make
|
||||
cargo install gbafix
|
||||
```
|
||||
|
||||
Install arm build tools
|
||||
* Ubuntu
|
||||
```shell
|
||||
sudo apt-get install binutils-arm-none-eabi
|
||||
```
|
||||
* OSX
|
||||
```shell
|
||||
brew install --cask gcc-arm-embedded
|
||||
```
|
||||
* Windows
|
||||
* Download the [GNU Arm Embedded Toolchain](https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads)
|
||||
* Install the toolchain, make sure to select "Add path to environment variable" during install
|
||||
|
||||
<!--
|
||||
## First Time Setup
|
||||
|
||||
Writing a Rust program for the GBA requires a fair amount of special setup. All
|
||||
|
@ -61,8 +63,9 @@ project started quickly we got you covered:
|
|||
```sh
|
||||
curl https://raw.githubusercontent.com/rust-console/gba/master/init.sh -sSf | bash -s APP_NAME
|
||||
```
|
||||
-->
|
||||
|
||||
# Contribution
|
||||
|
||||
This crate is Apache2 licensed and any contributions you submit must also be
|
||||
Apache2 licensed.
|
||||
This crate is tri-licensed under Zlib / Apache-2.0 / MIT.
|
||||
Any contributions you submit must be licensed the same.
|
||||
|
|
|
@ -1,7 +0,0 @@
|
|||
[book]
|
||||
title = "Rust GBA Guide"
|
||||
authors = ["Lokathor"]
|
||||
|
||||
[build]
|
||||
build-dir = "../target/book-output"
|
||||
create-missing = true
|
|
@ -1,38 +0,0 @@
|
|||
# Broad Concepts
|
||||
|
||||
The GameBoy Advance sits in a middle place between the chthonic game consoles of
|
||||
the ancient past and the "small PC in a funny case" consoles of the modern age.
|
||||
|
||||
On the one hand, yeah, you're gonna find a few strange conventions as you learn
|
||||
all the ropes.
|
||||
|
||||
On the other, at least we're writing in Rust at all, and not having to do all
|
||||
the assembly by hand.
|
||||
|
||||
This chapter for "concepts" has a section for each part of the GBA's hardware
|
||||
memory map, going by increasing order of base address value. The sections try to
|
||||
explain as much as possible while sticking to just the concerns you might have
|
||||
regarding that part of the memory map.
|
||||
|
||||
For an assessment of how to wrangle all three parts of the video system (PALRAM,
|
||||
VRAM, and OAM), along with the correct IO registers, into something that shows a
|
||||
picture, you'll want the Video chapter.
|
||||
|
||||
Similarly, the "IO Registers" part of the GBA actually controls how you interact
|
||||
with every single bit of hardware connected to the GBA. A full description of
|
||||
everything is obviously too much for just one section of the book. Instead you
|
||||
get an overview of general IO register rules and advice. Each particular
|
||||
register is described in the appropriate sections of either the Video or
|
||||
Non-Video chapters.
|
||||
|
||||
## Bus Size
|
||||
|
||||
TODO: describe this
|
||||
|
||||
## Minimum Write Size
|
||||
|
||||
TODO: talk about parts where you can't write one byte at a time
|
||||
|
||||
## Volatile or Not?
|
||||
|
||||
TODO: discuss what memory should be used volatile style and what can be used normal style.
|
|
@ -1,21 +0,0 @@
|
|||
# Introduction
|
||||
|
||||
This is the book for learning how to write GameBoy Advance (GBA) games in Rust.
|
||||
|
||||
I'm **Lokathor**, the main author of the book. There's also **Ketsuban** who
|
||||
provides the technical advisement, reviews the PRs, and keeps my crazy in check.
|
||||
|
||||
The book is a work in progress, as you can see if you actually try to open many
|
||||
of the pages listed in the Table Of Contents.
|
||||
|
||||
## Feedback
|
||||
|
||||
It's very often hard to tell when you've explained something properly. In the
|
||||
same way that your brain will read over small misspellings and correct things
|
||||
into the right word, if an explanation for something you already understand
|
||||
accidentally skips over some small detail then your brain can fill in the gaps
|
||||
without you realizing it.
|
||||
|
||||
**Please**, if things don't make sense then [file an
|
||||
issue](https://github.com/rust-console/gba/issues) about it so I know where
|
||||
things need to improve.
|
|
@ -1,21 +0,0 @@
|
|||
# Non-Video
|
||||
|
||||
Besides video effects the GBA still has an okay amount of stuff going on.
|
||||
|
||||
Obviously you'll want to know how to read the user's button inputs. That can
|
||||
almost go without saying, except that I said it.
|
||||
|
||||
Each other part can be handled in about any order you like.
|
||||
|
||||
Using interrupts is perhaps one of the hardest things for us as Rust programmers
|
||||
due to quirks in our compilation process. Our code all gets compiled to 16-bit
|
||||
THUMB instructions, and we don't have a way to mark a function to be compiled
|
||||
using 32-bit ASM instructions instead. However, an interrupt handler _must_ be
|
||||
written in 32-bit ASM instructions for it to work. That means that we have to
|
||||
write our interrupt handler in 32-bit ASM by hand. We'll do it, but I don't
|
||||
think we'll be too happy about it.
|
||||
|
||||
The Link Cable related stuff is also probably a little harder to test than
|
||||
anything else. Just because link cable emulation isn't always the best, and/or
|
||||
you need two GBAs with two flash carts and the cable for hardware testing.
|
||||
Still, we'll try to go over it eventually.
|
|
@ -1,9 +0,0 @@
|
|||
# Quirks
|
||||
|
||||
The GBA supports a lot of totally normal Rust code exactly like you'd think.
|
||||
|
||||
However, it also is missing a lot of what you might expect, and sometimes we
|
||||
have to do things in slightly weird ways.
|
||||
|
||||
We start the book by covering the quirks our code will have, just to avoid too
|
||||
many surprises later.
|
|
@ -1,9 +0,0 @@
|
|||
# Video
|
||||
|
||||
GBA Video starts with an IO register called the "Display Control Register", and
|
||||
then spirals out from there. You generally have to use Palette RAM (PALRAM),
|
||||
Video RAM (VRAM), Object Attribute Memory (OAM), as well as any number of other
|
||||
IO registers.
|
||||
|
||||
They all have to work together just right, and there's a lot going on when you
|
||||
first try doing it, so try to take it very slowly as you're learning each step.
|
|
@ -1,102 +0,0 @@
|
|||
# Buttons
|
||||
|
||||
It's all well and good to just show a picture, even to show an animation, but if
|
||||
we want a game we have to let the user interact with something.
|
||||
|
||||
## Key Input Register
|
||||
|
||||
* KEYINPUT, `0x400_0130`, `u16`, read only
|
||||
|
||||
This little `u16` stores the status of _all_ the buttons on the GBA, all at
|
||||
once. There's only 10 of them, and we have 16 bits to work with, so that sounds
|
||||
easy. However, there's a bit of a catch. The register follows a "low-active"
|
||||
convention, where pressing a button _clears_ that bit until it's released.
|
||||
|
||||
```rust
|
||||
const NO_BUTTONS_PRESSED: u16 = 0b0000_0011_1111_1111;
|
||||
```
|
||||
|
||||
The buttons are, going up in order from the 0th bit:
|
||||
|
||||
* A
|
||||
* B
|
||||
* Select
|
||||
* Start
|
||||
* Right
|
||||
* Left
|
||||
* Up
|
||||
* Down
|
||||
* R
|
||||
* L
|
||||
|
||||
Bits above that are not used. However, since the left and right directions, as
|
||||
well as the up and down directions, can never be pressed at the same time, the
|
||||
`KEYINPUT` register should never read as zero. Of course, the register _might_
|
||||
read as zero if someone is using an emulator that allows for such inputs, so I
|
||||
wouldn't go so far as to make it be `NonZeroU16` or anything like that.
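
As a rough sketch of the layout just described, the ten button bits can be written out as masks. The constant and function names below are illustrative assumptions, not the `gba` crate's actual API.

```rust
// Bit masks for KEYINPUT, counting up from bit 0 in the order listed above.
// These names are illustrative assumptions, not taken from the `gba` crate.
pub const A_BIT: u16 = 1 << 0;
pub const B_BIT: u16 = 1 << 1;
pub const SELECT_BIT: u16 = 1 << 2;
pub const START_BIT: u16 = 1 << 3;
pub const RIGHT_BIT: u16 = 1 << 4;
pub const LEFT_BIT: u16 = 1 << 5;
pub const UP_BIT: u16 = 1 << 6;
pub const DOWN_BIT: u16 = 1 << 7;
pub const R_BIT: u16 = 1 << 8;
pub const L_BIT: u16 = 1 << 9;

/// All ten button bits; this equals the raw register value with nothing pressed.
pub const ALL_BUTTONS: u16 = 0b0000_0011_1111_1111;

/// Low-active check on a *raw* register value: a cleared bit means "pressed".
pub fn raw_is_pressed(raw: u16, mask: u16) -> bool {
    (raw & mask) == 0
}
```

Checking the raw value this way works, but every test reads backwards, which is the motivation for inverting once at read time.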
|
||||
|
||||
When programming, we usually are thinking of what buttons we want to have _be
|
||||
pressed_ instead of buttons we want to have _not be pressed_. This means that we
|
||||
need an inversion to happen somewhere along the line. The easiest moment of
|
||||
inversion is immediately as you read in from the register and wrap the value up
|
||||
in a newtype.
|
||||
|
||||
```rust
|
||||
pub fn read_key_input() -> KeyInput {
|
||||
KeyInput(KEYINPUT.read() ^ 0b0000_0011_1111_1111)
|
||||
}
|
||||
```
|
||||
|
||||
Now the KeyInput you get can be checked for what buttons are pressed by checking
|
||||
for a set bit like you'd do anywhere else.
|
||||
|
||||
```rust
|
||||
impl KeyInput {
|
||||
pub fn a_pressed(self) -> bool {
|
||||
(self.0 & A_BIT) > 0
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Note that the current `KEYINPUT` value changes in real time as the user presses
|
||||
or releases the buttons. To account for this, it's best to read the value just
|
||||
once per game frame and then use that single value as if it was the input across
|
||||
the whole frame. If you've worked with polling input before that should sound
|
||||
totally normal. If not, just remember to call `read_key_input` once per frame
|
||||
and then use that `KeyInput` value across the whole frame.
|
||||
|
||||
### Detecting New Presses
|
||||
|
||||
The keypad only tells you what's _currently_ pressed, but if you want to check
|
||||
what's _newly_ pressed it's not too much harder.
|
||||
|
||||
All that you do is store the last frame's keys and compare them to the current
|
||||
keys with an `XOR`. In the `gba` crate it's called `KeyInput::difference`. Once
|
||||
you've got the difference between last frame and this frame, you know what
|
||||
changes happened.
|
||||
|
||||
* If something is in the difference and _not pressed_ in the last frame, that
|
||||
means it was newly pressed.
|
||||
* If something is in the difference and _pressed_ in the last frame that means
|
||||
it was newly released.
|
||||
* If something is not in the difference then there's no change between last
|
||||
frame and this frame.
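
The three cases above can be sketched with a small stand-in newtype. The text names `KeyInput::difference`; the exact signature used here, and the `newly_pressed` / `newly_released` helpers, are assumptions made for illustration. The wrapped value is assumed to be already inverted, so a set bit means "pressed".

```rust
/// Stand-in newtype over the *inverted* KEYINPUT value: a set bit means "pressed".
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct KeyInput(pub u16);

impl KeyInput {
    /// XOR of two frames: set bits mark buttons whose state changed.
    pub fn difference(self, other: KeyInput) -> KeyInput {
        KeyInput(self.0 ^ other.0)
    }

    /// Hypothetical helper: buttons that changed and were not pressed last frame.
    pub fn newly_pressed(self, last: KeyInput) -> KeyInput {
        KeyInput(self.difference(last).0 & !last.0)
    }

    /// Hypothetical helper: buttons that changed and were pressed last frame.
    pub fn newly_released(self, last: KeyInput) -> KeyInput {
        KeyInput(self.difference(last).0 & last.0)
    }
}
```

For instance, if only bit 0 was set last frame and only bit 1 is set this frame, the difference has both bits set, bit 1 is newly pressed, and bit 0 is newly released.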
|
||||
|
||||
## Key Interrupt Control
|
||||
|
||||
* KEYCNT, `0x400_0132`, `u16`, read/write
|
||||
|
||||
This lets you control what keys will trigger a keypad interrupt. Of course, for
|
||||
the actual interrupt to fire you also need to set the `IME` and `IE` registers
|
||||
properly. See the [Interrupts](05-interrupts.md) section for details there.
|
||||
|
||||
The main thing to know about this register is that the keys are in _the exact
|
||||
same order_ as the key input order. However, with this register they use a
|
||||
high-active convention instead (eg: the bit is active when the button should be
|
||||
pressed as part of the interrupt).
|
||||
|
||||
In addition to simply having the bits for the buttons, bit 14 is a flag for
|
||||
enabling keypad interrupts (in addition to the flag in the `IE` register), and
|
||||
bit 15 decides how having more than one button works. If bit 15 is disabled,
|
||||
it's an OR combination (eg: "press any key to continue"). If bit 15 is enabled
|
||||
it's an AND combination (eg: "press A+B+Start+Select to reset").
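
A minimal sketch of assembling a `KEYCNT` value from the bit layout described above. The constant names and the `keycnt_value` helper are hypothetical; only the bit positions come from the text.

```rust
/// Bit 14 of KEYCNT: enable keypad interrupts.
pub const KEYCNT_IRQ_ENABLE: u16 = 1 << 14;
/// Bit 15 of KEYCNT: when set, every selected key is required (AND);
/// when clear, any selected key is enough (OR).
pub const KEYCNT_REQUIRE_ALL: u16 = 1 << 15;
/// Bits 0..=9 hold the buttons, in the same order as KEYINPUT.
const KEY_MASK: u16 = 0b0000_0011_1111_1111;

/// Hypothetical helper that packs the selected keys and the two flags.
/// `keys` is high-active here: a set bit selects that button.
pub fn keycnt_value(keys: u16, irq_enabled: bool, require_all: bool) -> u16 {
    let mut value = keys & KEY_MASK; // drop anything above the ten button bits
    if irq_enabled {
        value |= KEYCNT_IRQ_ENABLE;
    }
    if require_all {
        value |= KEYCNT_REQUIRE_ALL;
    }
    value
}
```

With A, B, Select, and Start selected (the low four bits) and both flags set, this yields `0xC00F`. The actual write to the register would still need to be a volatile write to `0x400_0132`, which this sketch leaves out.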
|
|
@ -1 +0,0 @@
|
|||
# CPU
|
|
@ -1,160 +0,0 @@
|
|||
# No Std
|
||||
|
||||
First up, as you already saw in the `hello_magic` code, we have to use the
|
||||
`#![no_std]` outer attribute on our program when we target the GBA. You can find
|
||||
some info about `no_std` in two official sources:
|
||||
|
||||
* [unstable
|
||||
book section](https://doc.rust-lang.org/unstable-book/language-features/lang-items.html#writing-an-executable-without-stdlib)
|
||||
* [embedded
|
||||
book section](https://rust-embedded.github.io/book/intro/no-std.html?highlight=no_std#a--no_std--rust-environment)
|
||||
|
||||
The unstable book is borderline useless here because it's describing too many
|
||||
things in too many words. The embedded book is much better, but still fairly
|
||||
terse.
|
||||
|
||||
## Bare Metal
|
||||
|
||||
The GBA falls under what the Embedded Book calls "Bare Metal Environments".
|
||||
Basically, the machine powers on and immediately begins executing some ASM code.
|
||||
Our ASM startup was provided by `Ketsuban` (check the `crt0.s` file). We'll go
|
||||
over _how_ it works much later on, for now it's enough to know that it does
|
||||
work, and eventually control passes into Rust code.
|
||||
|
||||
On the rust code side of things, we determine our starting point with the
|
||||
`#[start]` attribute on our `main` function. The `main` function also has a
|
||||
specific type signature that's different from the usual `main` that you'd see in
|
||||
Rust. I'd tell you to read the unstable-book entry on `#[start]` but they
|
||||
[literally](https://doc.rust-lang.org/unstable-book/language-features/start.html)
|
||||
just tell you to look at the [tracking issue for
|
||||
it](https://github.com/rust-lang/rust/issues/29633) instead, and that's not very
|
||||
helpful either. Basically it just _has_ to be declared the way it is, even
|
||||
though there's nothing passing in the arguments and there's no place that the
|
||||
return value will go. The compiler won't accept it any other way.
|
||||
|
||||
## No Standard Library
|
||||
|
||||
The Embedded Book tells us that we can't use the standard library, but we get
|
||||
access to something called "libcore", which sounds kinda funny. What they're
|
||||
talking about is just [the core
|
||||
crate](https://doc.rust-lang.org/core/index.html), which is called `libcore`
|
||||
within the rust repository for historical reasons.
|
||||
|
||||
The `core` crate is actually still a really big portion of Rust. The standard
|
||||
library doesn't actually hold too much code (relatively speaking), instead it
|
||||
just takes code from other crates and then re-exports it in an organized way. So
|
||||
with just `core` instead of `std`, what are we missing?
|
||||
|
||||
In no particular order:
|
||||
|
||||
* Allocation
|
||||
* Clock
|
||||
* Network
|
||||
* File System
|
||||
|
||||
The allocation system and all the types that you can use if you have a global
|
||||
allocator are neatly packaged up in the
|
||||
[alloc](https://doc.rust-lang.org/alloc/index.html) crate. The rest isn't as
|
||||
nicely organized.
|
||||
|
||||
It's _possible_ to implement a fair portion of the entire standard library
|
||||
within a GBA context and make the rest just panic if you try to use it. However,
|
||||
do you really need all that? Eh... probably not?
|
||||
|
||||
* We don't need a file system, because all of our data is just sitting there in
|
||||
the ROM for us to use. When programming we can organize our `const` data into
|
||||
modules and such to keep it organized, but once the game is compiled it's just
|
||||
one huge flat address space. TODO: Parasyte says that a FS can be handy even
|
||||
if it's all just ReadOnly, so we'll eventually talk about how you might set up
|
||||
such a thing I guess, since we'll already be talking about replacements for
|
||||
three of the other four things we "lost". Maybe we'll make Parasyte write that
|
||||
section.
|
||||
* Networking, well, the GBA has a Link Cable you can use to communicate with
|
||||
another GBA, but it's not really like a unix socket with TCP, so the standard
|
||||
Rust networking isn't a very good match.
|
||||
* Clock is actually two different things at once. One is the ability to store
|
||||
the time long term, which is a bit of hardware that some gamepaks have in them
|
||||
(eg: Pokémon Ruby/Sapphire/Emerald). The GBA itself can't keep time while
|
||||
power is off. However, the second part is just tracking time moment to moment,
|
||||
which the GBA can totally do. We'll see how to access the timers soon enough.
|
||||
|
||||
Which just leaves us with allocation. Do we need an allocator? Depends on your
|
||||
game. For demos and small games you probably don't need one. For bigger games
|
||||
you'll maybe want to get an allocator going eventually. It's in some sense a
|
||||
crutch, but it's a very useful one.
|
||||
|
||||
So I promise that at some point we'll cover how to get an allocator going.
|
||||
Either a Rust Global Allocator (if practical), which would allow for a lot of
|
||||
the standard library types to be used "for free" once it was set up, or just a
|
||||
custom allocator that's GBA specific if Rust's global allocator style isn't a
|
||||
good fit for the GBA (I honestly haven't looked into it).
|
||||
|
||||
## Bare Metal Panic
|
||||
|
||||
If our code panics, we usually want to see that panic message. Unfortunately,
|
||||
without a way to access something like `stdout` or `stderr` we've gotta do
|
||||
something a little weirder.
|
||||
|
||||
If our program is running within the `mGBA` emulator, version 0.7 or later, we
|
||||
can access a special set of addresses that allow us to send out `CString`
|
||||
values, which then appear within a message log that you can check.
|
||||
|
||||
We can capture this behavior by making an `MGBADebug` type, and then implement
|
||||
`core::fmt::Write` for that type. Once done, the `write!` macro will let us
|
||||
target the mGBA debug output channel.
|
||||
|
||||
When used, it looks like this:
|
||||
|
||||
```rust
|
||||
#[panic_handler]
|
||||
fn panic(info: &core::panic::PanicInfo) -> ! {
|
||||
use core::fmt::Write;
|
||||
use gba::mgba::{MGBADebug, MGBADebugLevel};
|
||||
|
||||
if let Some(mut mgba) = MGBADebug::new() {
|
||||
let _ = write!(mgba, "{}", info);
|
||||
mgba.send(MGBADebugLevel::Fatal);
|
||||
}
|
||||
loop {}
|
||||
}
|
||||
```
|
||||
|
||||
If you want to follow the particulars you can check the `MGBADebug` source in
|
||||
the `gba` crate. Basically, there's one address you can use to try and activate
|
||||
the debug output, and if it works you write your message into the "array" at
|
||||
another address, and then finally write a send value to a third address. You'll
|
||||
need to have read the [volatile](03-volatile_destination.md) section for the
|
||||
details to make sense.
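
The `core::fmt::Write` part of this can be tried without an emulator. The sketch below is not the `gba` crate's `MGBADebug`; it uses a hypothetical `FixedBuf` type backed by an ordinary byte array in place of the memory-mapped debug buffer, just to show how implementing `write_str` is enough for formatted output to work.

```rust
use core::fmt::{self, Write};

/// Hypothetical stand-in for a debug output channel: a fixed-size byte buffer.
pub struct FixedBuf {
    bytes: [u8; 64],
    len: usize,
}

impl FixedBuf {
    pub const fn new() -> Self {
        FixedBuf { bytes: [0; 64], len: 0 }
    }

    /// View what has been written so far.
    pub fn as_str(&self) -> &str {
        // Only whole `&str` values are ever copied in, so the bytes are valid UTF-8.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}

impl Write for FixedBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > self.bytes.len() {
            return Err(fmt::Error); // the message does not fit
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}
```

Once `write_str` exists, `write!(buf, "{}", value)` works on a `FixedBuf` for anything that implements `Display`. A real debug channel would presumably replace the array copy with volatile writes to the emulator's buffer address.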
|
||||
|
||||
## LLVM Intrinsics
|
||||
|
||||
The above code will make your program fail to build in debug mode, saying that
|
||||
`__clzsi2` can't be found. This is a special builtin function that LLVM attempts
|
||||
to use when there's no hardware version of an operation it wants to do (in this
|
||||
case, counting the leading zeros). It's not _actually_ necessary in this case,
|
||||
which is why you only need it in debug mode. The higher optimization level of
|
||||
release mode makes LLVM pre-compute more and fold more constants or whatever and
|
||||
then it stops trying to call `__clzsi2`.
|
||||
|
||||
Unfortunately, sometimes a build will fail with a missing intrinsic even in
|
||||
release mode.
|
||||
|
||||
If LLVM wants _core_ to have that intrinsic then you're in
|
||||
trouble: you'll have to send a PR to the
|
||||
[compiler-builtins](https://github.com/rust-lang-nursery/compiler-builtins)
|
||||
repository and hope to get it into rust itself.
|
||||
|
||||
If LLVM wants _your code_ to have the intrinsic then you're in less trouble. You
|
||||
can look up the details and then implement it yourself. It can go anywhere in
|
||||
your program, as long as it has the right ABI and name. In the case of
|
||||
`__clzsi2` it takes a `usize` and returns a `usize`, so you'd write something
|
||||
like:
|
||||
|
||||
```rust
|
||||
#[no_mangle]
|
||||
pub extern "C" fn __clzsi2(mut x: usize) -> usize {
|
||||
//
|
||||
}
|
||||
```
|
||||
|
||||
And so on for whatever other missing intrinsic.
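
As an illustration of what such a body might compute, here is one way to count leading zeros with only shifts and compares. It is written as an ordinary function named `clz32` (a hypothetical name) over `u32`, on the assumption that `usize` is 32 bits wide on the GBA, so it can be checked on a desktop machine. It is a sketch, not the implementation used by compiler-builtins.

```rust
/// Count the leading zero bits of a 32-bit value.
/// `clz32` is a hypothetical helper name, not an LLVM or `gba` symbol.
pub fn clz32(mut x: u32) -> u32 {
    if x == 0 {
        return 32;
    }
    let mut n = 0;
    // Binary search: whenever the top chunk is empty, count it and shift it away.
    if (x & 0xFFFF_0000) == 0 {
        n += 16;
        x <<= 16;
    }
    if (x & 0xFF00_0000) == 0 {
        n += 8;
        x <<= 8;
    }
    if (x & 0xF000_0000) == 0 {
        n += 4;
        x <<= 4;
    }
    if (x & 0xC000_0000) == 0 {
        n += 2;
        x <<= 2;
    }
    if (x & 0x8000_0000) == 0 {
        n += 1;
    }
    n
}
```

On a host machine the result can be compared against `u32::leading_zeros` before wiring the same logic into the `#[no_mangle]` function.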
|
|
@ -1,29 +0,0 @@
|
|||
# Reader Requirements
|
||||
|
||||
This book naturally assumes that you've already read Rust's core book:
|
||||
|
||||
* [The Rust Programming Language](https://doc.rust-lang.org/book/)
|
||||
|
||||
Now, I _know_ it sounds silly to say "if you wanna program Rust on this old
|
||||
video game system you should already know how to program Rust", but the more
|
||||
people I meet and chat with the more they tell me that they jumped into Rust
|
||||
without reading any or all of the book. You know who you are.
|
||||
|
||||
Please, read the whole book!
|
||||
|
||||
In addition to the core book, there's also an expansion book that I will declare
|
||||
to be required reading for this:
|
||||
|
||||
* [The Rustonomicon](https://doc.rust-lang.org/nomicon/)
|
||||
|
||||
The Rustonomicon is all about trying to demystify `unsafe`. We'll end up using a
|
||||
fair bit of unsafe code as a natural consequence of doing direct hardware
|
||||
manipulations. Using unsafe is like [swinging a
|
||||
sword](https://www.zeldadungeon.net/wp-content/uploads/2013/04/tumblr_mlkpzij6T81qizbpto1_1280.gif),
|
||||
you should start slowly, practice carefully, and always pay attention no matter
|
||||
how experienced you think you've become.
|
||||
|
||||
That said, it's sometimes a [necessary
|
||||
tool](https://www.youtube.com/watch?v=rTo2u13lVcQ) to get the job done, so you
|
||||
have to break out of the borderline pathological fear of using it that most rust
|
||||
programmers tend to have.
|
|
@ -1 +0,0 @@
|
|||
# RGB15 Color
|
|
@ -1,239 +0,0 @@
|
|||
# BIOS
|
||||
|
||||
* **Address Span:** `0x0` to `0x3FFF` (16k)
|
||||
|
||||
The [BIOS](https://en.wikipedia.org/wiki/BIOS) of the GBA is a small read-only
|
||||
portion of memory at the very base of the address space. However, it is also
|
||||
hardware protected against reading, so if you try to read from BIOS memory when
|
||||
the program counter isn't pointed into the BIOS (eg: any time code _you_ write
|
||||
is executing) then you get [basically garbage
|
||||
data](https://problemkaputt.de/gbatek.htm#gbaunpredictablethings) back.
|
||||
|
||||
So we're not going to spend time here talking about what bits to read or write
|
||||
within BIOS memory like we do with the other sections. Instead we're going to
|
||||
spend time talking about [inline
|
||||
assembly](https://doc.rust-lang.org/unstable-book/language-features/asm.html)
|
||||
([tracking issue](https://github.com/rust-lang/rust/issues/29722)) and then use
|
||||
it to call the [GBA BIOS
|
||||
Functions](https://problemkaputt.de/gbatek.htm#biosfunctions).
|
||||
|
||||
Note that BIOS calls have _more overhead than normal function calls_, so don't
|
||||
go using them all over the place if you don't have to. They're also usually
|
||||
written more to be compact in terms of code than for raw speed, so you actually
|
||||
can out speed them in some cases. Between the increased overhead and not being
|
||||
as speed optimized, you can sometimes do a faster job without calling the BIOS
|
||||
at all. (TODO: investigate more about what parts of the BIOS we could
|
||||
potentially offer faster alternatives for.)
|
||||
|
||||
I'd like to take a moment to thank [Marc Brinkmann](https://github.com/mbr)
|
||||
(with contributions from [Oliver Scherer](https://github.com/oli-obk) and
|
||||
[Philipp Oppermann](https://github.com/phil-opp)) for writing [this blog
|
||||
post](http://embed.rs/articles/2016/arm-inline-assembly-rust/). It's at least
|
||||
ten times the tutorial quality of the `asm` entry in the Unstable Book. In
|
||||
fairness to the Unstable Book, the actual spec of how inline ASM works in rust
|
||||
is "basically what clang does", and that's specified as "basically what GCC
|
||||
does", and that's basically/shockingly not specified much at all despite GCC
|
||||
being like 30 years old.
|
||||
|
||||
So let's be slow and pedantic about this process.
|
||||
|
||||
## Inline ASM
|
||||
|
||||
**Fair Warning:** The general information that follows regarding the asm macro
|
||||
is consistent from system to system, but specific information about register
|
||||
names, register quantities, asm instruction argument ordering, and so on is
|
||||
specific to ARM on the GBA. If you're programming for any other device you'll
|
||||
need to carefully investigate that before you begin.
|
||||
|
||||
Now then, with that out of the way, the inline asm docs describe an asm call as
|
||||
looking like this:
|
||||
|
||||
```rust
|
||||
let x = 10u32;
|
||||
let y = 34u32;
|
||||
let result: u32;
|
||||
asm!(
|
||||
// assembly template
|
||||
"add {lhs}, {rhs}",
|
||||
lhs = inout(reg_thumb) x => result,
|
||||
rhs = in(reg_thumb) y,
|
||||
options(nostack, nomem),
|
||||
);
|
||||
// result == 44
|
||||
```
|
||||
|
||||
The `asm` macro follows the [RFC
|
||||
2873](https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md)
|
||||
syntax. The following is just a summary of the RFC.
|
||||
|
||||
Now we have to decide what we're gonna write. Obviously we're going to do some
|
||||
instructions, but those instructions use registers, and how are we gonna talk
|
||||
about them? We've got two choices.
|
||||
|
||||
1) We can pick each and every register used by specifying exact register names.
|
||||
In THUMB mode we have 8 registers available, named `r0` through `r7`. To use
|
||||
those registers you would write `in("r0") y` instead of
`rhs = in(reg_thumb) y`, and directly refer to `r0` in the assembly template.
|
||||
|
||||
2) We can specify slots for registers we need and let LLVM decide. This is what
|
||||
we do when we write `rhs = in(reg_thumb) y` and use `{rhs}` in the assembly
|
||||
template.
|
||||
|
||||
The `reg_thumb` stands for the register class we are using. Since we are
|
||||
in THUMB mode, the set of registers we can use is limited. `reg_thumb` tells
|
||||
LLVM: "use only registers available in THUMB mode". In 32-bit mode, you have
|
||||
access to more registers and you should use a different register class.
|
||||
|
||||
The register classes [are described in the
|
||||
RFC](https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#register-operands).
|
||||
Look for "ARM" register classes.
|
||||
|
||||
In the case of the GBA BIOS, each BIOS function has pre-designated input and
|
||||
output registers, so we will use the first style. If you use inline ASM in other
|
||||
parts of your code you're free to use the second style.
|
||||
|
||||
### Assembly
|
||||
|
||||
This is just one big string literal. You write out one instruction per line, and
|
||||
excess whitespace is ignored. You can also do comments within your assembly
|
||||
using `@` to start a comment that goes until the end of the line.
|
||||
|
||||
Assembly convention doesn't consider it unreasonable to comment potentially as
|
||||
much as _every single line_ of asm that you write when you're getting used to
|
||||
things. Or even if you are used to things. This is cryptic stuff, there's a
|
||||
reason we avoid writing in it as much as possible.
|
||||
|
||||
Remember that our Rust code is in 16-bit mode. You _can_ switch to 32-bit mode
|
||||
within your asm as long as you switch back by the time the block ends. Otherwise
|
||||
you'll have a bad time.
|
||||
|
||||
### Register bindings
|
||||
|
||||
After the assembly string literal, you need to define your binding (which
|
||||
rust variables are getting into your registers and which ones are going to refer
|
||||
to their value afterward).
|
||||
|
||||
There are many operand types [as per the
|
||||
RFC](https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#operand-type),
|
||||
but you will most often use:
|
||||
|
||||
```
|
||||
[alias =] in(<reg>) <binding> // input
|
||||
[alias =] out(<reg>) <binding> // output
|
||||
[alias =] inout(<reg>) <in binding> => <out binding> // both
|
||||
out(<reg>) _ // Clobber
|
||||
```
|
||||
|
||||
* The binding can be any single 32-bit or smaller value.
|
||||
* If your binding has bit pattern requirements ("must be non-zero", etc) you are
|
||||
responsible for upholding that.
|
||||
* If your binding type will try to `Drop` later then you are responsible for it
|
||||
being in a fit state to do that.
|
||||
* The binding must be either a mutable binding or a binding that was
|
||||
pre-declared but not yet assigned.
|
||||
* An input binding must be a single 32-bit or smaller value.
|
||||
* An input binding _should_ be a type that is `Copy` but this is not an absolute
|
||||
requirement. Having the input be read is semantically similar to using
|
||||
`core::ptr::read(&binding)` and forgetting the value when you're done.
|
||||
|
||||
Anything else is UB.
|
||||
|
||||
### Clobbers
|
||||
|
||||
Sometimes your asm will touch registers other than the ones declared for input
|
||||
and output.
|
||||
|
||||
Clobbers are declared with the `out(<reg>) _` form listed under register
bindings, naming each specific register (for example `out("r3") _`).
|
||||
|
||||
LLVM _needs_ to know this information. It can move things around to keep your
|
||||
data safe, but only if you tell it what's about to happen.
|
||||
|
||||
Failure to define all of your clobbers can cause UB.
|
||||
|
||||
### Options
|
||||
|
||||
By default the compiler won't optimize the code you wrote in an `asm` block. You
|
||||
will need to specify with the `options(..)` parameter that your code can be
|
||||
optimized. The available options [are specified in the
|
||||
RFC](https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#options-1).
|
||||
|
||||
An optimization might duplicate or remove your instructions from the final
|
||||
code.
|
||||
|
||||
Typically when executing a BIOS call (such as `swi 0x01`, which resets the
|
||||
console), it's important that the instruction is executed, and not optimized
|
||||
away, even though it has no observable input and output to the compiler.
|
||||
|
||||
However some BIOS calls, such as _some_ math functions, have no observable
|
||||
effects outside of the registers we specified, in this case, we instruct the
|
||||
compiler to optimize them.
|
||||
|
||||
### BIOS ASM
|
||||
|
||||
* Inputs are always `r0`, `r1`, `r2`, and/or `r3`, depending on function.
|
||||
* Outputs are always zero or more of `r0`, `r1`, and `r3`.
|
||||
* Any of the output registers that aren't actually used should be marked as
|
||||
clobbered.
|
||||
* All other registers are unaffected.
|
||||
|
||||
All of the GBA BIOS calls are performed using the
|
||||
[swi](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/BABFCEEG.html)
|
||||
instruction, combined with a value depending on what BIOS function you're trying
|
||||
to invoke. If you're in 16-bit code you use the value directly, and if you're in
|
||||
32-bit mode you shift the value up by 16 bits first.
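
To make the shift concrete, here's a small sketch that computes the immediate value for the `swi` instruction. The name `swi_immediate` and the `thumb` flag are made up for this illustration; in real code you'd normally just write the final literal into the asm string.

```rust
/// Hypothetical helper: the immediate to encode in a `swi` instruction
/// for a given BIOS function number.
pub const fn swi_immediate(bios_fn: u8, thumb: bool) -> u32 {
  if thumb {
    // 16-bit code: the function number is used directly.
    bios_fn as u32
  } else {
    // 32-bit code: the function number is shifted up by 16 bits.
    (bios_fn as u32) << 16
  }
}
```

So the division function (`0x06`) is `swi 0x06` in 16-bit code and `swi 0x060000` in 32-bit code.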
|
||||
|
||||
### Example BIOS Function: Division
|
||||
|
||||
For our example we'll use the division function, because GBATEK gives very clear
|
||||
instructions on how each register is used with that one:
|
||||
|
||||
```txt
|
||||
Signed Division, r0/r1.
|
||||
r0 signed 32bit Number
|
||||
r1 signed 32bit Denom
|
||||
Return:
|
||||
r0 Number DIV Denom ;signed
|
||||
r1 Number MOD Denom ;signed
|
||||
r3 ABS (Number DIV Denom) ;unsigned
|
||||
For example, incoming -1234, 10 should return -123, -4, +123.
|
||||
The function usually gets caught in an endless loop upon division by zero.
|
||||
```
|
||||
|
||||
The math folks tell me that the `r1` value should be properly called the
|
||||
"remainder" not the "modulus". We'll go with that for our function, doesn't hurt
|
||||
to use the correct names. Our Rust function has an assert against dividing by
|
||||
`0`, then we name some bindings _without_ giving them a value, we make the asm
|
||||
call, and then return what we got.
|
||||
|
||||
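```rust
// On current toolchains the `asm!` macro lives in `core::arch`.
use core::arch::asm;

pub fn div_rem(numerator: i32, denominator: i32) -> (i32, i32) {
  // The BIOS routine usually hangs on division by zero, so reject it up front.
  assert!(denominator != 0);
  // Declared but not assigned: the asm block fills these in.
  let div_out: i32;
  let rem_out: i32;
  unsafe {
    asm!(
      "swi 0x06",
      // r0: numerator in, quotient out
      inout("r0") numerator => div_out,
      // r1: denominator in, remainder out
      inout("r1") denominator => rem_out,
      // r3 receives the absolute quotient, which we don't use: clobber it
      out("r3") _,
      options(nostack, nomem),
    );
  }
  (div_out, rem_out)
}
```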
|
||||
|
||||
I _hope_ this all makes sense by now.
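
If you want to sanity-check a wrapper like `div_rem` away from the hardware, a plain Rust model of the register contract quoted from GBATEK can help. This is only a host-side sketch: `div_model` is a made-up name, it uses Rust's `/` and `%` rather than the BIOS, and it panics on `i32::MIN / -1`, a case the quoted text doesn't cover.

```rust
/// Host-side model of the GBATEK description of BIOS `Div`:
/// returns what would end up in (`r0`, `r1`, `r3`).
pub fn div_model(numerator: i32, denominator: i32) -> (i32, i32, u32) {
  assert!(denominator != 0);
  // Rust's `/` truncates toward zero.
  let quotient = numerator / denominator;
  // Rust's `%` takes the sign of the numerator.
  let remainder = numerator % denominator;
  (quotient, remainder, quotient.unsigned_abs())
}
```

Feeding it the GBATEK example (`-1234`, `10`) gives `-123`, `-4`, and `123`.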
|
||||
|
||||
## Specific BIOS Functions
|
||||
|
||||
For a full list of all the specific BIOS functions and their use you should
|
||||
check the `gba::bios` module within the `gba` crate. There's just so many of
|
||||
them that enumerating them all here wouldn't serve much purpose.
|
||||
|
||||
Which is not to say that we'll never cover any BIOS functions in this book!
Instead, we'll simply mention them whenever they're relevant to the task at
hand (such as controlling sound or waiting for vblank).
|
||||
|
||||
//TODO: list/name all BIOS functions as well as what they relate to elsewhere.
|
|
@ -1,548 +0,0 @@
|
|||
# Fixed Only
|
||||
|
||||
In addition to not having much of the standard library available, we don't even
|
||||
have a floating point unit available! We can't do floating point math in
|
||||
hardware! We _could_ still do floating point math as pure software computations
|
||||
if we wanted, but that's a slow, slow thing to do.
|
||||
|
||||
Are there faster ways? It's the same answer as always: "Yes, but not without a
|
||||
tradeoff."
|
||||
|
||||
The faster way is to represent fractional values using a system called a [Fixed
|
||||
Point Representation](https://en.wikipedia.org/wiki/Fixed-point_arithmetic).
|
||||
What do we trade away? Numeric range.
|
||||
|
||||
* Floating point math stores bits for base value and for exponent all according
|
||||
to a single [well defined](https://en.wikipedia.org/wiki/IEEE_754) standard
|
||||
for how such a complicated thing works.
|
||||
* Fixed point math takes a normal integer (either signed or unsigned) and then
|
||||
just "mentally associates" it (so to speak) with a fractional value for its
|
||||
"units". If you have 3 and it's in units of 1/2, then you have 3/2, or 1.5
|
||||
using decimal notation. If your number is 256 and it's in units of 1/256th
|
||||
then the value is 1.0 in decimal notation.
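
The "units" idea can be sketched in a couple of lines. This is a host-side illustration only (it uses `f64`, which is exactly what we're trying to avoid on the GBA), and `fixed_to_f64` is a made-up name.

```rust
/// Reads `raw` as a count of `1 / 2^frac_bits` units and returns the value
/// it stands for.
pub fn fixed_to_f64(raw: i32, frac_bits: u32) -> f64 {
  raw as f64 / (1u64 << frac_bits) as f64
}
```

With this, `fixed_to_f64(3, 1)` is the "3 in units of 1/2" case and `fixed_to_f64(256, 8)` is the "256 in units of 1/256th" case.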
|
||||
|
||||
Floating point math requires dedicated hardware to perform quickly, but it can
|
||||
"trade" precision when it needs to represent extremely large or small values.
|
||||
|
||||
Fixed point math is just integral math, which our GBA is reasonably good at, but
|
||||
because your number is associated with a fixed fraction your results can get out
|
||||
of range very easily.
|
||||
|
||||
## Representing A Fixed Point Value
|
||||
|
||||
So we want to associate our numbers with a mental note of what units they're in:
|
||||
|
||||
* [PhantomData](https://doc.rust-lang.org/core/marker/struct.PhantomData.html)
|
||||
is a type that tells the compiler "please remember this extra type info" when
|
||||
you add it as a field to a struct. It goes away at compile time, so it's
|
||||
perfect for us to use as space for a note to ourselves without causing runtime
|
||||
overhead.
|
||||
* The [typenum](https://crates.io/crates/typenum) crate is the best way to
|
||||
represent a number within a type in Rust. Since our values on the GBA are
|
||||
always specified as a number of fractional bits to count the number as, we can
|
||||
put `typenum` types such as `U8` or `U14` into our `PhantomData` to keep track
|
||||
of what's going on.
|
||||
|
||||
Now, those of you who know me, or perhaps just know my reputation, will of
|
||||
course _immediately_ question what happened to the real Lokathor. I do not care
|
||||
for most crates, and I particularly don't care for using a crate in teaching
|
||||
situations. However, `typenum` has a number of factors on its side that let me
|
||||
suggest it in this situation:
|
||||
|
||||
* It's version 1.10 with a total of 21 versions and nearly 700k downloads, so we
|
||||
can expect that the major troubles have been shaken out and that it will remain
|
||||
fairly stable for quite some time to come.
|
||||
* It has no further dependencies that it's going to drag into the compilation.
|
||||
* It happens all at compile time, so it's not clogging up our actual game with
|
||||
any nonsense.
|
||||
* The (interesting) subject of "how do you do math inside Rust's trait system?" is
|
||||
totally separate from the concern that we're trying to focus on here.
|
||||
|
||||
Therefore, we will consider it acceptable to use this crate.
|
||||
|
||||
Now the `typenum` crate defines a whole lot, but we'll focus down to just a
|
||||
single type at the moment:
|
||||
[UInt](https://docs.rs/typenum/1.10.0/typenum/uint/struct.UInt.html) is a
|
||||
type-level unsigned value. It's like `u8` or `u16`, but while they're types that
|
||||
then have values, each `UInt` construction statically equates to a specific
|
||||
value. Like how the `()` type only has one value, which is also called `()`. In
|
||||
this case, you wrap up `UInt` around smaller `UInt` values and a `B1` or `B0`
|
||||
value to build up the binary number that you want at the type level.
|
||||
|
||||
In other words, instead of writing
|
||||
|
||||
```rust
|
||||
let six = 0b110;
|
||||
```
|
||||
|
||||
We write
|
||||
|
||||
```rust
|
||||
type U6 = UInt<UInt<UInt<UTerm, B1>, B1>, B0>;
|
||||
```
|
||||
|
||||
Wild, I know. If you look into the `typenum` crate you can do math and stuff
|
||||
with these type level numbers, and we will a little bit below, but to start off
|
||||
we _just_ need to store one in some `PhantomData`.
|
||||
|
||||
### A struct For Fixed Point
|
||||
|
||||
Our actual type for a fixed point value looks like this:
|
||||
|
||||
```rust
use core::marker::PhantomData;
use typenum::marker_traits::Unsigned;

/// Fixed point `T` value with `F` fractional bits.
#[derive(Debug, Copy, Clone, Default, PartialEq, Eq, PartialOrd, Ord)]
#[repr(transparent)]
pub struct Fx<T, F: Unsigned> {
  /// The raw integer holding the fixed point bits.
  num: T,
  /// Zero-sized note of how many fractional bits `num` has.
  phantom: PhantomData<F>,
}
```
|
||||
|
||||
This says that `Fx<T,F>` is a generic type that holds some base number type `T`
|
||||
and a `F` type that's marking off how many fractional bits we're using. We only
|
||||
want people giving unsigned type-level values for the `PhantomData` type, so we
|
||||
use the trait bound `F: Unsigned`.
|
||||
|
||||
We use
|
||||
[repr(transparent)](https://github.com/rust-lang/rfcs/blob/master/text/1758-repr-transparent.md)
|
||||
here to ensure that `Fx` will always be treated just like the base type in the
|
||||
final program (in terms of bit pattern and ABI).
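
A quick way to convince yourself that the `PhantomData` field costs nothing is to compare sizes. The sketch below avoids the `typenum` dependency by using a made-up marker type `Frac8` in place of `U8`, so it is a simplified stand-in rather than the exact type from this chapter.

```rust
use core::marker::PhantomData;
use core::mem::size_of;

/// Hypothetical stand-in for a `typenum` type such as `U8`.
pub struct Frac8;

#[repr(transparent)]
pub struct Fx<T, F> {
  num: T,
  phantom: PhantomData<F>,
}

/// Returns (size of the wrapper, size of the base type), in bytes.
pub fn sizes() -> (usize, usize) {
  (size_of::<Fx<i16, Frac8>>(), size_of::<i16>())
}
```

Both numbers come out the same, since the marker takes no space.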
|
||||
|
||||
If you go and check, this is _basically_ how the existing general purpose crates
|
||||
for fixed point math represent their numbers. They're a little fancier about it
|
||||
because they have to cover every case, and we only have to cover our GBA case.
|
||||
|
||||
That's quite a bit to type though. We probably want to make a few type aliases
|
||||
for things to be easier to look at. Unfortunately there's [no standard
|
||||
notation](https://en.wikipedia.org/wiki/Fixed-point_arithmetic#Notation) for how
|
||||
you write a fixed point type. We also have to limit ourselves to what's valid
|
||||
for use in a Rust type too. I like the `fx` thing, so we'll use that for signed
|
||||
and then `fxu` if we need an unsigned value.
|
||||
|
||||
```rust
|
||||
/// Alias for an `i16` fixed point value with 8 fractional bits.
|
||||
pub type fx8_8 = Fx<i16,U8>;
|
||||
```
|
||||
|
||||
Rust will complain about having `non_camel_case_types`, and you can shut that
|
||||
warning up by putting an `#[allow(non_camel_case_types)]` attribute on the type
|
||||
alias directly, or you can use `#![allow(non_camel_case_types)]` at the very top
|
||||
of the module to shut up that warning for the whole module (which is what I
|
||||
did).
|
||||
|
||||
## Constructing A Fixed Point Value
|
||||
|
||||
So how do we actually _make_ one of these values? Well, we can always just wrap or unwrap any value in our `Fx` type:
|
||||
|
||||
```rust
impl<T, F: Unsigned> Fx<T, F> {
  /// Uses the provided value directly.
  pub fn from_raw(r: T) -> Self {
    Fx {
      num: r,
      phantom: PhantomData,
    }
  }

  /// Unwraps the inner value.
  pub fn into_raw(self) -> T {
    self.num
  }
}
```
|
||||
|
||||
I'd like to use the `From` trait of course, but it was giving me some trouble, I
think because of the orphan rule. Oh well.
|
||||
|
||||
If we want to be particular to the fact that these are supposed to be
|
||||
_numbers_... that gets tricky. Rust is actually quite bad at being generic about
|
||||
number types. You can use the [num](https://crates.io/crates/num) crate, or you
|
||||
can just use a macro and invoke it once per type. Guess what we're gonna do.
|
||||
|
||||
```rust
macro_rules! fixed_point_methods {
  ($t:ident) => {
    impl<F: Unsigned> Fx<$t, F> {
      /// Gives the smallest positive non-zero value.
      pub fn precision() -> Self {
        Fx {
          num: 1,
          phantom: PhantomData,
        }
      }

      /// Makes a value with the integer part shifted into place.
      pub fn from_int_part(i: $t) -> Self {
        Fx {
          num: i << F::U8,
          phantom: PhantomData,
        }
      }
    }
  };
}

fixed_point_methods! {u8}
fixed_point_methods! {i8}
fixed_point_methods! {i16}
fixed_point_methods! {u16}
fixed_point_methods! {i32}
fixed_point_methods! {u32}
```
|
||||
|
||||
Now _you'd think_ that those can be `const`, but at the moment you can't have a
|
||||
`const` function with a bound on any trait other than `Sized`, so they have to
|
||||
be normal functions.
|
||||
|
||||
Also, we're doing something a little interesting there with `from_int_part`. We
|
||||
can take our `F` type and get its constant value. There's other associated
|
||||
constants if we want it in other types, and also non-const methods if you wanted
|
||||
that for some reason (maybe passing it as a closure function? dunno).
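
To see what those two methods produce in terms of raw bits, here's a stripped-down sketch with the fractional bit count hard-coded to 8 instead of coming from a `typenum` type. The free functions and the `FRAC_BITS` constant are assumptions made just for this example, and `to_f32` exists only so the result can be inspected on a desktop machine.

```rust
/// Number of fractional bits, fixed at 8 for this sketch.
pub const FRAC_BITS: u32 = 8;

/// The smallest positive non-zero raw value.
pub fn precision() -> i16 {
  1
}

/// Shifts an integer into the integer part of an 8.8 value.
pub fn from_int_part(i: i16) -> i16 {
  i << FRAC_BITS
}

/// Host-side helper to see which number a raw value stands for.
pub fn to_f32(raw: i16) -> f32 {
  raw as f32 / (1i32 << FRAC_BITS) as f32
}
```

So `from_int_part(3)` is the raw value `768`, which stands for 3.0, and `precision()` stands for 1/256.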
|
||||
|
||||
## Casting Base Values
|
||||
|
||||
Next, once we have a value in one base type we will need to be able to move it
|
||||
into another base type. Unfortunately this means we gotta use the `as` operator,
|
||||
which requires a concrete source type and a concrete destination type. There's
|
||||
no easy way for us to make it generic here.
|
||||
|
||||
We could let the user use `into_raw`, cast, and then do `from_raw`, but that's
|
||||
error prone because they might change the fractional bit count accidentally.
|
||||
This means that we have to write a function that does the casting while
|
||||
perfectly preserving the fractional bit quantity. If we wrote one function for
|
||||
each conversion it'd be like 30 different possible casts (6 base types that we
|
||||
support, and then 5 possible target types). Instead, we'll write it just once in
|
||||
a way that takes a closure, and let the user pass a closure that does the cast.
|
||||
The compiler should merge it all together quite nicely for us once optimizations
|
||||
kick in.
|
||||
|
||||
This code goes outside the macro. I want to avoid too much code in the macro if
|
||||
we can, it's a little easier to cope with I think.
|
||||
|
||||
```rust
impl<T, F: Unsigned> Fx<T, F> {
  /// Casts the base type, keeping the fractional bit quantity the same.
  pub fn cast_inner<Z, C: Fn(T) -> Z>(self, op: C) -> Fx<Z, F> {
    Fx {
      num: op(self.num),
      phantom: PhantomData,
    }
  }
}
```
|
||||
|
||||
It's horrible and ugly, but Rust is just bad at numbers sometimes.
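
For what it's worth, calling it isn't too bad. Here's a self-contained sketch of the call site; to keep it free of dependencies it drops the `Unsigned` bound and uses a made-up marker type `Frac8` where the real code would use `typenum::U8`, and `widen` is just an example name.

```rust
use core::marker::PhantomData;

/// Hypothetical stand-in for a `typenum` type such as `U8`.
pub struct Frac8;

pub struct Fx<T, F> {
  num: T,
  phantom: PhantomData<F>,
}

impl<T, F> Fx<T, F> {
  pub fn from_raw(r: T) -> Self {
    Fx { num: r, phantom: PhantomData }
  }

  pub fn into_raw(self) -> T {
    self.num
  }

  /// Casts the base type, keeping the fractional bit quantity the same.
  pub fn cast_inner<Z, C: Fn(T) -> Z>(self, op: C) -> Fx<Z, F> {
    Fx { num: op(self.num), phantom: PhantomData }
  }
}

/// Widens an `i16` value to `i32`; the `Frac8` marker carries over untouched.
pub fn widen(x: Fx<i16, Frac8>) -> Fx<i32, Frac8> {
  x.cast_inner(|n| n as i32)
}
```

The closure is the only place the `as` cast appears, and the return type makes it impossible to change the fractional bit count by accident.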
|
||||
|
||||
## Adjusting Fractional Part
|
||||
|
||||
In addition to the base value we might want to change our fractional bit
quantity. This is actually easier than it sounds, but it also requires us to be
tricky with the generics. We can actually use some typenum type level operators
here.
|
||||
|
||||
This code goes inside the macro: we need to be able to use the left shift and
|
||||
right shift, which is easiest when we just use the macro's `$t` as our type. We
|
||||
could alternately put a similar function outside the macro and be generic on `T`
|
||||
having the left and right shift operators by using a `where` clause. As much as
|
||||
I'd like to avoid too much code being generated by macro, I'd _even more_ like
|
||||
to avoid generic code with huge and complicated trait bounds. It comes down to
|
||||
style, and you gotta decide for yourself.
|
||||
|
||||
```rust
/// Changes the fractional bit quantity, keeping the base type the same.
pub fn adjust_fractional_bits<Y: Unsigned + IsEqual<F, Output = False>>(self) -> Fx<$t, Y> {
  let leftward_movement: i32 = Y::to_i32() - F::to_i32();
  Fx {
    num: if leftward_movement > 0 {
      self.num << leftward_movement
    } else {
      self.num >> (-leftward_movement)
    },
    phantom: PhantomData,
  }
}
```
|
||||
|
||||
There's a few things at work. First, we introduce `Y` as the target number of
|
||||
fractional bits, and we _also_ limit it that the target bits quantity can't be
|
||||
the same as we already have using a type-level operator. If it's the same as we
|
||||
started with, why are you doing the cast at all?
|
||||
|
||||
Now, once we're sure that the current bits and target bits aren't the same, we
|
||||
compute `target - start`, and call this our "leftward movement". Example: if
|
||||
we're targeting 8 bits and we're at 4 bits, we do 8-4 and get +4 as our leftward
|
||||
movement. If the leftward_movement is positive we naturally shift our current
|
||||
value to the left. If it's not positive then it _must_ be negative because we
|
||||
eliminated 0 as a possibility using the type-level operator, so we shift to the
|
||||
right by the negative value.
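
Here's the same "leftward movement" logic on plain integers, so you can try it without any type-level machinery. This is a simplified sketch: the bit counts are ordinary runtime arguments (an assumption of this example), so nothing stops you from passing the same count twice, in which case the value is shifted by zero and comes back unchanged.

```rust
/// Re-scales `raw` from `from_bits` fractional bits to `to_bits`.
pub fn adjust_fractional_bits(raw: i32, from_bits: u32, to_bits: u32) -> i32 {
  let leftward_movement = to_bits as i32 - from_bits as i32;
  if leftward_movement > 0 {
    raw << leftward_movement
  } else {
    raw >> (-leftward_movement)
  }
}
```

Going from 4 to 8 fractional bits turns `0x18` (1.5) into `0x180` (still 1.5). Going the other way throws away the low bits, so `0x181` becomes `0x18`.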
|
||||
|
||||
## Addition, Subtraction, Shifting, Negative, Comparisons
|
||||
|
||||
From here on we're getting help from [this blog
|
||||
post](https://spin.atomicobject.com/2012/03/15/simple-fixed-point-math/) by [Job
|
||||
Vranish](https://spin.atomicobject.com/author/vranish/), so thank them if you
|
||||
learn something.
|
||||
|
||||
I might have given away the game a bit with those `derive` traits on our fixed
|
||||
point type. For a fair number of operations you can use the normal form of the
|
||||
op on the inner bits as long as the fractional parts have the same quantity.
|
||||
This includes equality and ordering (which we derived) as well as addition,
|
||||
subtraction, and bit shifting (which we need to do ourselves).
|
||||
|
||||
This code can go outside the macro, with sufficient trait bounds.
|
||||
|
||||
```rust
use core::ops::{Add, Neg, Shl, Shr, Sub};

impl<T: Add<Output = T>, F: Unsigned> Add for Fx<T, F> {
  type Output = Self;
  fn add(self, rhs: Fx<T, F>) -> Self::Output {
    Fx {
      num: self.num + rhs.num,
      phantom: PhantomData,
    }
  }
}
```
|
||||
|
||||
The bound on `T` makes it so that `Fx<T, F>` can be added any time that `T` can
|
||||
be added to its own type with itself as the output. We can use the exact same
|
||||
pattern for `Sub`, `Shl`, `Shr`, and `Neg`. With enough trait bounds, we can do
|
||||
anything!
|
||||
|
||||
```rust
impl<T: Sub<Output = T>, F: Unsigned> Sub for Fx<T, F> {
  type Output = Self;
  fn sub(self, rhs: Fx<T, F>) -> Self::Output {
    Fx {
      num: self.num - rhs.num,
      phantom: PhantomData,
    }
  }
}

impl<T: Shl<u32, Output = T>, F: Unsigned> Shl<u32> for Fx<T, F> {
  type Output = Self;
  fn shl(self, rhs: u32) -> Self::Output {
    Fx {
      num: self.num << rhs,
      phantom: PhantomData,
    }
  }
}

impl<T: Shr<u32, Output = T>, F: Unsigned> Shr<u32> for Fx<T, F> {
  type Output = Self;
  fn shr(self, rhs: u32) -> Self::Output {
    Fx {
      num: self.num >> rhs,
      phantom: PhantomData,
    }
  }
}

impl<T: Neg<Output = T>, F: Unsigned> Neg for Fx<T, F> {
  type Output = Self;
  fn neg(self) -> Self::Output {
    Fx {
      num: -self.num,
      phantom: PhantomData,
    }
  }
}
```
|
||||
|
||||
Unfortunately, for `Shl` and `Shr` to have as much coverage on our type as it
|
||||
does on the base type (allowing just about any right hand side) we'd have to do
|
||||
another macro, but I think just `u32` is fine. We can always add more later if
|
||||
we need.
|
||||
|
||||
We could also implement `BitAnd`, `BitOr`, `BitXor`, and `Not`, but they don't
seem relevant to our fixed point math use, and this section is getting long
already. Just use the same general patterns if you want to add them in your own
programs. Shockingly, `Rem` also works directly if you want it, though I don't
foresee us needing fixed point remainder. Also, the GBA can't do hardware
division or remainder, and we'll have to work around that below when we
implement `Div` (which maybe we don't need, but it's complex enough I should
show it instead of letting people guess).
|
||||
|
||||
**Note:** In addition to the various `Op` traits, there's also `OpAssign`
|
||||
variants. Each `OpAssign` is the same as `Op`, but takes `&mut self` instead of
|
||||
`self` and then modifies in place instead of producing a fresh value. In other
|
||||
words, if you want both `+` and `+=` you'll need to do the `AddAssign` trait
|
||||
too. It's not the worst thing to just write `a = a+b`, so I won't bother with
|
||||
showing all that here. It's pretty easy to figure out for yourself if you want.
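
In case it helps, here's roughly what one of those looks like. As before this is a dependency-free sketch: it drops the `Unsigned` bound, uses a made-up marker type `Frac8` where the real code would use `typenum::U8`, makes the fields public for brevity, and `accumulate` is just a demo function.

```rust
use core::marker::PhantomData;
use core::ops::AddAssign;

/// Hypothetical stand-in for a `typenum` type such as `U8`.
pub struct Frac8;

pub struct Fx<T, F> {
  pub num: T,
  pub phantom: PhantomData<F>,
}

impl<T: AddAssign, F> AddAssign for Fx<T, F> {
  /// Adds `rhs` into `self` in place.
  fn add_assign(&mut self, rhs: Self) {
    self.num += rhs.num;
  }
}

/// Sums raw 8.8 values with `+=` and returns the raw total.
pub fn accumulate(values: &[i16]) -> i16 {
  let mut total: Fx<i16, Frac8> = Fx { num: 0, phantom: PhantomData };
  for &v in values {
    total += Fx { num: v, phantom: PhantomData };
  }
  total.num
}
```

Summing the raw values `256`, `128`, and `64` (1.0, 0.5, and 0.25) gives `448`, which is 1.75.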
|
||||
|
||||
## Multiplication
|
||||
|
||||
This is where things get more interesting. When we have two numbers `A` and `B`
they really stand for `(a*f)` and `(b*f)`. If we write `A*B` then we're really
writing `(a*f)*(b*f)`, which can be rewritten as `(a*b)*f*f`, and now it's
obvious that we have one more `f` than we wanted to have. We have to do the
multiply of the inner value and then divide out the `f`. We divide by `1 <<
bit_count`, so if we have 8 fractional bits we'll divide by 256.
|
||||
|
||||
The catch is that, when we do the multiply we're _extremely_ likely to overflow
|
||||
our base type with that multiplication step. Then we do that divide, and now our
|
||||
result is basically nonsense. We can avoid this to some extent by casting up to
|
||||
a higher bit type, doing the multiplication and division at higher precision,
|
||||
and then casting back down. We want as much precision as possible without being
|
||||
too inefficient, so we'll always cast up to 32-bit (on a 64-bit machine you'd
|
||||
cast up to 64-bit instead).
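
A worked example with 8 fractional bits may help. The sketch below hard-codes the `i16` base type and the bit count, and both function names are made up for the illustration. It only handles the easy case of a non-negative product.

```rust
/// Multiplies two 8.8 values (raw `i16`, 8 fractional bits) by widening
/// to `i32` first.
pub fn mul_fx8(a: i16, b: i16) -> i16 {
  // The widened product carries 16 fractional bits.
  let wide = (a as i32) * (b as i32);
  // Drop the extra 8 fractional bits, then narrow again.
  (wide >> 8) as i16
}

/// True when the same multiply would overflow if done directly in `i16`.
pub fn overflows_in_i16(a: i16, b: i16) -> bool {
  a.checked_mul(b).is_none()
}
```

Take 1.5 times 2.0: the raw values are `384` and `512`, their product is `196608` (far too big for an `i16`), and shifting right by 8 gives `768`, which is 3.0.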
|
||||
|
||||
Naturally, any signed value has to be cast up to `i32` and any unsigned value
|
||||
has to be cast up to `u32`, so we'll have to handle those separately.
|
||||
|
||||
Also, instead of doing an _actual_ divide we can right-shift by the correct
|
||||
number of bits to achieve the same effect. _Except_ when we have a signed value
|
||||
that's negative, because actual division truncates towards zero and
|
||||
right-shifting truncates towards negative infinity. We can get around _this_ by
|
||||
flipping the sign, doing the shift, and flipping the sign again (which sounds
|
||||
silly but it's so much faster than doing an actual division).
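
The difference is easy to see on a desktop machine. In this sketch `shift_toward_zero` is a made-up name for the flip, shift, flip trick; note that it doesn't handle `i32::MIN`, where the first negation overflows.

```rust
/// Right shift that truncates toward zero, like division does.
/// Does not handle `i32::MIN` (negating it overflows).
pub fn shift_toward_zero(x: i32, bits: u32) -> i32 {
  if x < 0 {
    -((-x) >> bits)
  } else {
    x >> bits
  }
}
```

For `-513` and 8 bits, a real division by 256 gives `-2`, a plain right shift gives `-3`, and the flipped version gives `-2` again.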
|
||||
|
||||
Also, again signed values can be annoying, because if the value _just happens_
|
||||
to be `i32::MIN` then when you negate it you'll have... _still_ a negative
|
||||
value. I'm not 100% on this, but I think the correct thing to do at that point
|
||||
is to give `$t::MIN` as the output num value.
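
If you haven't run into this two's complement quirk before, here's a tiny sketch of it. The helper name `negate_or_min` is made up; it just spells out the "give back `MIN`" fallback using the standard `checked_neg` method.

```rust
/// Negates `x`, falling back to `i32::MIN` when the negation doesn't fit.
pub fn negate_or_min(x: i32) -> i32 {
  x.checked_neg().unwrap_or(i32::MIN)
}
```

`i32::MIN.checked_neg()` is `None` because `+2147483648` doesn't fit in an `i32`, and the wrapping version just hands `i32::MIN` back to you.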
|
||||
|
||||
Did you get all that? Good, because this involves casting, so we will need to
|
||||
implement it three times, which calls for another macro.
|
||||
|
||||
```rust
macro_rules! fixed_point_signed_multiply {
  ($t:ident) => {
    impl<F: Unsigned> Mul for Fx<$t, F> {
      type Output = Self;
      fn mul(self, rhs: Fx<$t, F>) -> Self::Output {
        let pre_shift = (self.num as i32).wrapping_mul(rhs.num as i32);
        if pre_shift < 0 {
          if pre_shift == core::i32::MIN {
            Fx {
              num: core::$t::MIN,
              phantom: PhantomData,
            }
          } else {
            Fx {
              num: (-((-pre_shift) >> F::U8)) as $t,
              phantom: PhantomData,
            }
          }
        } else {
          Fx {
            num: (pre_shift >> F::U8) as $t,
            phantom: PhantomData,
          }
        }
      }
    }
  };
}

fixed_point_signed_multiply! {i8}
fixed_point_signed_multiply! {i16}
fixed_point_signed_multiply! {i32}

macro_rules! fixed_point_unsigned_multiply {
  ($t:ident) => {
    impl<F: Unsigned> Mul for Fx<$t, F> {
      type Output = Self;
      fn mul(self, rhs: Fx<$t, F>) -> Self::Output {
        Fx {
          num: ((self.num as u32).wrapping_mul(rhs.num as u32) >> F::U8) as $t,
          phantom: PhantomData,
        }
      }
    }
  };
}

fixed_point_unsigned_multiply! {u8}
fixed_point_unsigned_multiply! {u16}
fixed_point_unsigned_multiply! {u32}
```
|
||||
|
||||
## Division
|
||||
|
||||
Division is similar to multiplication, but reversed. Which makes sense. This
|
||||
time `A/B` gives `(a*f)/(b*f)` which is `a/b`, one _less_ `f` than we were
|
||||
after.
|
||||
|
||||
As with the multiplication version of things, we have to up-cast our inner value
as much as we can before doing the math, to allow for the most precision
possible.
|
||||
|
||||
The snag here is that the GBA has no division or remainder. Instead, the GBA has
|
||||
a BIOS function you can call to do `i32/i32` division.
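
Before dealing with the BIOS, here's the arithmetic on its own as a desktop sketch. It hard-codes an `i16` base with 8 fractional bits and uses Rust's ordinary `/` as a stand-in for the BIOS call, so treat `div_fx8` as an illustration of the scaling rather than as GBA-ready code. It panics if `b` is zero.

```rust
/// Divides two 8.8 values (raw `i16`, 8 fractional bits).
pub fn div_fx8(a: i16, b: i16) -> i16 {
  // Put the missing `f` back in before dividing.
  let scaled = (a as i32) << 8;
  // On the GBA this step would be the BIOS division instead of `/`.
  (scaled / (b as i32)) as i16
}
```

Take 3.0 divided by 2.0: the raw values are `768` and `512`, scaling gives `196608`, and dividing gives `384`, which is 1.5.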
|
||||
|
||||
This is a potential problem for us though. If we have some unsigned value, we
|
||||
need it to fit within the positive space of an `i32` _after the multiply_ so
|
||||
that we can cast it to `i32`, call the BIOS function that only works on `i32`
|
||||
values, and cast it back to its actual type.
|
||||
|
||||
* If you have a u8 you're always okay, even with 8 fractional bits.
* If you have a u16 you're okay even with a maximum value up to 15 fractional
  bits, but having a maximum value and 16 fractional bits makes it break.
|
||||
* If you have a u32 you're probably going to be in trouble all the time.
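
You can check those three cases with a few lines of host-side Rust. The helper name `fits_in_i32` is made up; it does the scaling in `u64` so that the check itself can't overflow.

```rust
/// True if `raw_max` scaled up by `frac_bits` still fits in the positive
/// range of an `i32`.
pub fn fits_in_i32(raw_max: u32, frac_bits: u32) -> bool {
  ((raw_max as u64) << frac_bits) <= i32::MAX as u64
}
```

`u8::MAX` with 8 bits and `u16::MAX` with 15 bits both pass, `u16::MAX` with 16 bits fails, and `u32::MAX` fails even with no fractional bits at all.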
|
||||
|
||||
So... ugh, there's not much we can do about this. For now we'll just have to
|
||||
suffer some.
|
||||
|
||||
// TODO: find a numerics book that tells us how to do `u32/u32` divisions.
|
||||
|
||||
```rust
macro_rules! fixed_point_signed_division {
  ($t:ident) => {
    impl<F: Unsigned> Div for Fx<$t, F> {
      type Output = Self;
      fn div(self, rhs: Fx<$t, F>) -> Self::Output {
        let mul_output: i32 = (self.num as i32).wrapping_mul(1 << F::U8);
        let divide_result: i32 = crate::bios::div(mul_output, rhs.num as i32);
        Fx {
          num: divide_result as $t,
          phantom: PhantomData,
        }
      }
    }
  };
}

fixed_point_signed_division! {i8}
fixed_point_signed_division! {i16}
fixed_point_signed_division! {i32}

macro_rules! fixed_point_unsigned_division {
  ($t:ident) => {
    impl<F: Unsigned> Div for Fx<$t, F> {
      type Output = Self;
      fn div(self, rhs: Fx<$t, F>) -> Self::Output {
        let mul_output: i32 = (self.num as i32).wrapping_mul(1 << F::U8);
        let divide_result: i32 = crate::bios::div(mul_output, rhs.num as i32);
        Fx {
          num: divide_result as $t,
          phantom: PhantomData,
        }
      }
    }
  };
}

fixed_point_unsigned_division! {u8}
fixed_point_unsigned_division! {u16}
fixed_point_unsigned_division! {u32}
```
|
||||
|
||||
## Trigonometry
|
||||
|
||||
TODO: look up tables! arcbits!
|
||||
|
||||
## Just Using A Crate
|
||||
|
||||
If, after seeing all that, and seeing that I still didn't even cover every
|
||||
possible trait impl that you might want for all the possible types... if after
|
||||
all that you feel too intimidated, then I'll cave a bit on your behalf and
|
||||
suggest to you that the [fixed](https://crates.io/crates/fixed) crate seems to
|
||||
be the best crate available for fixed point math.
|
||||
|
||||
_I have not tested its use on the GBA myself_.
|
||||
|
||||
It's just my recommendation from looking at the docs of the various options
|
||||
available, if you really wanted to just have a crate for it.
|
|
@ -1,23 +0,0 @@
|
|||
# Book Goals and Style
|
||||
|
||||
So, what's this book actually gonna teach you?
|
||||
|
||||
My goal is certainly not just showing off the crate. Programming for the GBA is
|
||||
weird enough that I'm trying to teach you all the rest of the stuff you need to
|
||||
know along the way. If I do my job right then you'd be able to write your own
|
||||
crate for GBA stuff just how you think it should all go by the end.
|
||||
|
||||
Overall the book is sorted more for easy review once you're trying to program
|
||||
something. The GBA has a few things that can stand on their own and many other
|
||||
things are a mass of interconnected concepts, so some parts of the book end up
|
||||
having to refer you to portions that you haven't read yet. The chapters and
|
||||
sections are sorted so that _minimal_ future references are required, but it's
|
||||
unavoidable that it'll happen sometimes.
|
||||
|
||||
The actual "tutorial order" of the book is the
|
||||
[Examples](../05-examples/00-index.md) chapter. Each section of that chapter
|
||||
breaks down one of the provided examples in the [examples
|
||||
directory](https://github.com/rust-console/gba/tree/master/examples) of the
|
||||
repository. We go over what sections of the book you'll need to have read for
|
||||
the example code to make sense, and also how we apply the general concepts
|
||||
described in the book to the specific example cases.
|
|
@ -1 +0,0 @@
|
|||
# Timers
|
|
@ -1,133 +0,0 @@
|
|||
# Direct Memory Access
|
||||
|
||||
The GBA has four Direct Memory Access (DMA) units that can be utilized. They're
|
||||
mostly the same in terms of overall operation, but each unit has special rules
|
||||
that make it better suited to a particular task.
|
||||
|
||||
**Please Note:** TONC and GBATEK have slightly different concepts of how a DMA
|
||||
unit's registers should be viewed. I've chosen to go by what GBATEK uses.
|
||||
|
||||
## General DMA
|
||||
|
||||
A single DMA unit is controlled through four different IO Registers.
|
||||
|
||||
* **Source:** (`DMAxSAD`, write only) A `*const` pointer that the DMA reads from.
* **Destination:** (`DMAxDAD`, write only) A `*mut` pointer that the DMA writes
  to.
* **Count:** (`DMAxCNT_L`, write only) How many transfers to perform.
* **Control:** (`DMAxCNT_H`, read/write) A register full of bit-flags that
  controls all sorts of details.
|
||||
|
||||
Here, the `x` is replaced with 0 through 3 when utilizing whichever particular
|
||||
DMA unit.
|
||||
|
||||
### Source Address
|
||||
|
||||
This is either a `u32` or `u16` address depending on the unit's assigned
|
||||
transfer mode (see Control). The address MUST be aligned.
|
||||
|
||||
With DMA0 the source must be internal memory. With other DMA units the source
|
||||
can be any non-`SRAM` location.
|
||||
|
||||
### Destination Address
|
||||
|
||||
As with the Source, this is either a `u32` or `u16` address depending on the
|
||||
unit's assigned transfer mode (see Control). The address MUST be aligned.
|
||||
|
||||
With DMA0/1/2 the destination must be internal memory. With DMA3 the destination
|
||||
can be any non-`SRAM` memory (allowing writes into Game Pak ROM / FlashROM,
|
||||
assuming that your Game Pak hardware supports that).
|
||||
|
||||
### Count
|
||||
|
||||
This is a `u16` that says how many transfers (`u16` or `u32`) to make.
|
||||
|
||||
DMA0/1/2 will only actually accept a 14-bit value, while DMA3 will accept a full
|
||||
16-bit value. A value of 0 instead acts as if you'd used the _maximum_ value for
|
||||
the DMA in question. Put another way, DMA0/1/2 transfer `1` through `0x4000`
|
||||
words, with `0` as the `0x4000` value, and DMA3 transfers `1` through `0x1_0000`
|
||||
words, with `0` as the `0x1_0000` value.
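
The "zero means maximum" rule can be written down as a small function. This is a sketch for reasoning about the register, not something from the `gba` crate: the name `effective_transfers` is made up, and masking off the upper two bits for DMA0/1/2 is my assumption about what "only accepts a 14-bit value" means.

```rust
/// How many transfers a DMA unit performs for a given count register value.
pub fn effective_transfers(unit: u8, count: u16) -> u32 {
  // DMA3 takes a full 16-bit count; DMA0/1/2 take 14 bits (assumed masked).
  let (mask, max): (u32, u32) = if unit == 3 {
    (0xFFFF, 0x1_0000)
  } else {
    (0x3FFF, 0x4000)
  };
  let masked = count as u32 & mask;
  // A count of 0 acts as the maximum for that unit.
  if masked == 0 {
    max
  } else {
    masked
  }
}
```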
|
||||
|
||||
The maximum value isn't a very harsh limit. Even in just `u16` mode, `0x4000`
transfers is 32k, which would for example be all 32k of `IWRAM` (including your
own user stack). If for some reason you do need to move more data than a single
DMA use can transfer at once, you can just set up the DMA a second time and
keep going.
|
||||
|
||||
### Control
|
||||
|
||||
This `u16` bit-flag field is where things get wild.
|
||||
|
||||
* Bits 0-4 do nothing
* Bit 5-6 control how the destination address changes per transfer:
  * 0: Offset +1
  * 1: Offset -1
  * 2: No Change
  * 3: Offset +1 and reload when a Repeat starts (below)
* Bit 7-8 similarly control how the source address changes per transfer:
  * 0: Offset +1
  * 1: Offset -1
  * 2: No Change
  * 3: Prohibited
* Bit 9: enables Repeat mode.
* Bit 10: Transfer `u16` (false) or `u32` (true) data.
* Bit 11: "Game Pak DRQ" flag. GBATEK says that this is only allowed for DMA3,
  and also your Game Pak hardware must be equipped to use DRQ mode. I don't even
  know what DRQ mode is all about, and GBATEK doesn't say much either. If DRQ is
  set then you _must not_ set the Repeat bit as well. The `gba` crate simply
  doesn't bother to expose this flag to users.
* Bit 12-13: DMA Start:
  * 0: "Immediate", which is 2 cycles after requested.
  * 1: VBlank
  * 2: HBlank
  * 3: Special, depending on what DMA unit is involved:
    * DMA0: Prohibited.
    * DMA1/2: Sound FIFO (see the [Sound](04-sound.md) section)
    * DMA3: Video Capture, intended for use with the Repeat flag, performs a
      transfer per scanline (similar to HBlank) starting at `VCOUNT` 2 and
      stopping at `VCOUNT` 162. Intended for copying things from ROM or camera
      into VRAM.
* Bit 14: Interrupt upon DMA complete.
|
||||
* Bit 15: Enable this DMA unit.
|
||||
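
To make the bit layout above a little more concrete, here's a minimal sketch that packs the individual settings into a Control value. All of the type and function names here are hypothetical, chosen just for this example; the `gba` crate's real API may look quite different.

```rust
/// How the destination address changes after each transfer (bits 5-6).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DestControl {
  Increment = 0,
  Decrement = 1,
  Fixed = 2,
  IncrementReload = 3,
}

/// How the source address changes after each transfer (bits 7-8).
/// The value 3 is prohibited, so there's no variant for it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SrcControl {
  Increment = 0,
  Decrement = 1,
  Fixed = 2,
}

/// When the DMA unit starts its work (bits 12-13).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartTiming {
  Immediate = 0,
  VBlank = 1,
  HBlank = 2,
  Special = 3,
}

/// Packs the settings into a `u16` Control value. Bits 0-4 stay zero and the
/// DRQ flag (bit 11) is deliberately not exposed.
pub fn dma_control(
  dest: DestControl,
  src: SrcControl,
  repeat: bool,
  use_u32: bool,
  start: StartTiming,
  irq: bool,
  enable: bool,
) -> u16 {
  (dest as u16) << 5
    | (src as u16) << 7
    | (repeat as u16) << 9
    | (use_u32 as u16) << 10
    | (start as u16) << 12
    | (irq as u16) << 14
    | (enable as u16) << 15
}
```

Using separate enums for the source and destination means the prohibited source value can't even be written down, which is the sort of thing we'd like the type system to handle for us.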
|
||||
## DMA Life Cycle
|
||||
|
||||
The general technique for using a DMA unit involves first setting the relevant
|
||||
source, destination, and count registers, then setting the appropriate control
|
||||
register value with the Enable bit set.
|
||||
|
||||
Once the Enable flag is set the appropriate DMA unit will trigger at the
|
||||
assigned time (Bit 12-13). The CPU's operation is halted while any DMA unit is
|
||||
active, until the DMA completes its task. If more than one DMA unit is supposed
|
||||
to be active at once, then the DMA unit with the lower number will activate and
|
||||
complete before any others.
|
||||
|
||||
When the DMA triggers via _Enable_, the `Source`, `Destination`, and `Count`
|
||||
values are copied from the GBA's registers into the DMA unit's internal
|
||||
registers. Changes to the DMA unit's internal copy of the data don't affect the
|
||||
values in the GBA registers. Another _Enable_ will read the same values as
|
||||
before.
|
||||
|
||||
If DMA is triggered via having _Repeat_ active then _only_ the Count is copied
|
||||
in to the DMA unit registers. The `Source` and `Destination` are unaffected
|
||||
during a Repeat. The exception to this is if the destination address control
|
||||
value (Bits 5-6) is set to 3 (`0b11`), in which case a _Repeat_ will also
|
||||
re-copy the `Destination` as well as the `Count`.
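
The copying rules above can be modeled in plain Rust. This is only a toy model of the behavior as described here, not real hardware access, and the names `DmaRegisters` and `DmaInternal` are invented for the sketch.

```rust
/// The values the program wrote to the GBA's DMA registers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DmaRegisters {
  pub source: u32,
  pub destination: u32,
  pub count: u16,
  pub control: u16,
}

/// The DMA unit's private working copy of those values.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct DmaInternal {
  pub source: u32,
  pub destination: u32,
  pub count: u16,
}

impl DmaInternal {
  /// Triggered via Enable: everything is copied in.
  pub fn latch_on_enable(&mut self, regs: &DmaRegisters) {
    self.source = regs.source;
    self.destination = regs.destination;
    self.count = regs.count;
  }

  /// Triggered via Repeat: only the count is copied in, plus the destination
  /// when the destination address control (bits 5-6) is 3.
  pub fn latch_on_repeat(&mut self, regs: &DmaRegisters) {
    self.count = regs.count;
    if (regs.control >> 5) & 0b11 == 3 {
      self.destination = regs.destination;
    }
  }
}
```

Note that nothing here ever writes back into `DmaRegisters`, which matches the point that the unit's internal changes don't affect the register values.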
|
||||
|
||||
Once a DMA operation completes, the Enable flag of its Control register will
|
||||
automatically be disabled, _unless_ the Repeat flag is on, in which case the
|
||||
Enable flag is left active. You will have to manually disable it if you don't
|
||||
want the DMA to kick in again over and over at the specified starting time.
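
As a small sketch of that last rule, here's how the Control value changes when a DMA operation finishes, along with the manual "turn it off" step. The function names are hypothetical; the bit positions are the ones from the Control section.

```rust
const REPEAT_BIT: u16 = 1 << 9;
const ENABLE_BIT: u16 = 1 << 15;

/// Models what the hardware does to the Control value once a transfer is done:
/// Enable is cleared, unless Repeat is on.
pub fn control_after_completion(control: u16) -> u16 {
  if control & REPEAT_BIT != 0 {
    control
  } else {
    control & !ENABLE_BIT
  }
}

/// What you'd write back yourself to stop a repeating DMA.
pub fn disabled(control: u16) -> u16 {
  control & !ENABLE_BIT
}
```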
|
||||
|
||||
## DMA Limitations
|
||||
|
||||
The DMA units cannot access `SRAM` at all.
|
||||
|
||||
If you're using HBlank to access any part of the memory that the display
|
||||
controller utilizes (`OAM`, `PALRAM`, `VRAM`), you need to have enabled the
|
||||
"HBlank Interval Free" bit in the Display Control Register (`DISPCNT`).
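
For reference, GBATEK lists "H-Blank Interval Free" as bit 5 of `DISPCNT`. Taking that as given (do double check it against GBATEK yourself), a minimal sketch of setting the flag on an existing display control value looks like this. The helper name is made up for the example.

```rust
/// Bit 5 of `DISPCNT`, per GBATEK's description of the register.
const HBLANK_INTERVAL_FREE: u16 = 1 << 5;

/// Returns the given `DISPCNT` value with "HBlank Interval Free" turned on.
pub fn with_hblank_interval_free(dispcnt: u16) -> u16 {
  dispcnt | HBLANK_INTERVAL_FREE
}
```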
|
||||
|
||||
Whenever DMA is active the CPU is _not_ active, which means that
|
||||
[Interrupts](05-interrupts.md) will not fire while DMA is happening. This can
|
||||
cause any number of hard to track down bugs. Try to limit your use of the DMA
|
||||
units if you can.
|
|
@ -1,317 +0,0 @@
|
|||
# Volatile Destination
|
||||
|
||||
TODO: update this when we can make more stuff `const`
|
||||
|
||||
## Volatile Memory
|
||||
|
||||
The compiler is an eager friend, so when it sees a read or a write that won't
|
||||
have an effect, it eliminates that read or write. For example, if we write
|
||||
|
||||
```rust
|
||||
let mut x = 5;
|
||||
x = 7;
|
||||
```
|
||||
|
||||
The compiler won't actually ever put 5 into `x`. It'll skip straight to putting
|
||||
7 in `x`, because we never read from `x` when it's 5, so that's a safe change to
|
||||
make. Normally, values are stored in RAM, which has no side effects when you
|
||||
read and write from it. RAM is purely for keeping notes about values you'll need
|
||||
later on.
|
||||
|
||||
However, what if we had a bit of hardware where we wanted to do a write and that
|
||||
did something _other than_ keeping the value for us to look at later? As you saw
|
||||
in the `hello_magic` example, we have to use a `write_volatile` operation.
|
||||
Volatile means "just do it anyway". The compiler thinks that it's pointless, but
|
||||
we know better, so we can force it to really do exactly what we say by using
|
||||
`write_volatile` instead of `write`.
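
You can try the volatile operations on a normal computer too. This sketch uses an ordinary local variable as a stand-in for a hardware register (there's no real hardware involved, and `volatile_round_trip` is just a name for the example), but the calls are the same ones you'd use on the GBA.

```rust
/// Writes `value` through a raw pointer with `write_volatile`, then reads it
/// back with `read_volatile`. Neither access may be removed by the compiler.
pub fn volatile_round_trip(value: u16) -> u16 {
  // Stand-in for a memory mapped register.
  let mut fake_register: u16 = 0;
  let ptr: *mut u16 = &mut fake_register;
  // Safety: `ptr` points at a live, properly aligned `u16`.
  unsafe {
    ptr.write_volatile(value);
    ptr.read_volatile()
  }
}
```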
|
||||
|
||||
This is kinda error prone though, right? Because it's just a raw pointer, so we
|
||||
might forget to use `write_volatile` at some point.
|
||||
|
||||
Instead, we want a type that's always going to use volatile reads and writes.
|
||||
Also, we want a pointer type that lets our reads and writes be as safe as
|
||||
possible once we've unsafely constructed the initial value.
|
||||
|
||||
### Constructing The VolAddress Type
|
||||
|
||||
First, we want a type that stores a location within the address space. This can
|
||||
be a pointer, or a `usize`, and we'll use a `usize` because that's easier to
|
||||
work with in a `const` context (and we want to have `const` when we can get it).
|
||||
We'll also have our type use `NonZeroUsize` instead of just `usize` so that
|
||||
`Option<VolAddress<T>>` stays as a single machine word. This helps quite a bit
|
||||
when we want to iterate over the addresses of a block of memory (such as
|
||||
locations within the palette memory). Hardware is never at the null address
|
||||
anyway. Also, if we had _just_ an address number then we wouldn't be able to
|
||||
track what type the address is for. We need some
|
||||
[PhantomData](https://doc.rust-lang.org/core/marker/struct.PhantomData.html),
|
||||
and specifically we need the phantom data to be for `*mut T`:
|
||||
|
||||
* If we used `*const T` that'd have the wrong
|
||||
[variance](https://doc.rust-lang.org/nomicon/subtyping.html).
|
||||
* If we used `&mut T` then that's fusing in the ideas of _lifetime_ and
|
||||
_exclusive access_ to our type. That's potentially important, but that's also
|
||||
an abstraction we'll build _on top of_ this `VolAddress` type if we need it.
|
||||
|
||||
One abstraction layer at a time, so we start with just a phantom pointer. This gives us a type that looks like this:
|
||||
|
||||
```rust
#[derive(Debug)]
#[repr(transparent)]
pub struct VolAddress<T> {
  address: NonZeroUsize,
  marker: PhantomData<*mut T>,
}
```
|
||||
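
If you'd like to see the "single machine word" claim for yourself, here's a small sketch that compares sizes. It repeats the struct so that it compiles on its own, and it leans on how rustc lays out `Option` around a `NonZeroUsize` in practice; the two helper functions are just for the demonstration.

```rust
use core::marker::PhantomData;
use core::mem::size_of;
use core::num::NonZeroUsize;

#[allow(dead_code)]
#[repr(transparent)]
pub struct VolAddress<T> {
  address: NonZeroUsize,
  marker: PhantomData<*mut T>,
}

/// `None` can use the all-zero bit pattern, so no extra tag is needed.
pub fn option_vol_address_is_one_word() -> bool {
  size_of::<Option<VolAddress<u16>>>() == size_of::<usize>()
}

/// With a plain `usize` every bit pattern is taken, so `Option` must grow.
pub fn option_usize_is_one_word() -> bool {
  size_of::<Option<usize>>() == size_of::<usize>()
}
```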
|
||||
Now, because of how `derive` is specified, it derives traits _if the generic
|
||||
parameter_ supports those traits. Since our type is like a pointer, the traits
|
||||
it supports are distinct from whatever traits the target type supports. So we'll
|
||||
provide those implementations manually.
|
||||
|
||||
```rust
impl<T> Clone for VolAddress<T> {
  fn clone(&self) -> Self {
    *self
  }
}
impl<T> Copy for VolAddress<T> {}
impl<T> PartialEq for VolAddress<T> {
  fn eq(&self, other: &Self) -> bool {
    self.address == other.address
  }
}
impl<T> Eq for VolAddress<T> {}
impl<T> PartialOrd for VolAddress<T> {
  fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
    Some(self.address.cmp(&other.address))
  }
}
impl<T> Ord for VolAddress<T> {
  fn cmp(&self, other: &Self) -> Ordering {
    self.address.cmp(&other.address)
  }
}
```
|
||||
|
||||
Boilerplate junk, not interesting. There's a reason that you derive those traits
|
||||
99% of the time in Rust.
|
||||
|
||||
### Constructing A VolAddress Value
|
||||
|
||||
Okay so here's the next core concept: If we unsafely _construct_ a
|
||||
`VolAddress<T>`, then we can safely _use_ the value once it's been properly
|
||||
created.
|
||||
|
||||
```rust
// you'll need these features enabled and a recent nightly
#![feature(const_int_wrapping)]
#![feature(min_const_unsafe_fn)]

impl<T> VolAddress<T> {
  pub const unsafe fn new_unchecked(address: usize) -> Self {
    VolAddress {
      address: NonZeroUsize::new_unchecked(address),
      marker: PhantomData,
    }
  }
  pub const unsafe fn cast<Z>(self) -> VolAddress<Z> {
    VolAddress {
      address: self.address,
      marker: PhantomData,
    }
  }
  pub unsafe fn offset(self, offset: isize) -> Self {
    VolAddress {
      // `wrapping_mul` keeps a negative offset from tripping the overflow
      // check in debug builds; the wrapped result is still the right address.
      address: NonZeroUsize::new_unchecked(
        self.address.get().wrapping_add((offset as usize).wrapping_mul(core::mem::size_of::<T>())),
      ),
      marker: PhantomData,
    }
  }
}
```
|
||||
|
||||
So what are the unsafety rules here?
|
||||
|
||||
* Non-null, obviously.
|
||||
* Must be aligned for `T`
|
||||
* Must always produce valid bit patterns for `T`
|
||||
* Must not be part of the address space that Rust's stack or allocator will ever
|
||||
  use.
|
||||
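
The first two rules are easy to check mechanically, so here's a minimal sketch of a debug helper that does so. The name `is_non_null_and_aligned` is hypothetical. It can't check the other two rules at all, since those depend on what the hardware does and on how your program's memory is laid out.

```rust
use core::mem::align_of;

/// Checks only the "non-null" and "aligned for `T`" rules for an address.
pub fn is_non_null_and_aligned<T>(address: usize) -> bool {
  address != 0 && address % align_of::<T>() == 0
}
```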
|
||||
So, again using the `hello_magic` example, we had
|
||||
|
||||
```rust
|
||||
(0x400_0000 as *mut u16).write_volatile(0x0403);
|
||||
```
|
||||
|
||||
And instead we could declare
|
||||
|
||||
```rust
|
||||
const MAGIC_LOCATION: VolAddress<u16> = unsafe { VolAddress::new_unchecked(0x400_0000) };
|
||||
```
|
||||
|
||||
### Using A VolAddress Value
|
||||
|
||||
Now that we've named the magic location, we want to write to it.
|
||||
|
||||
```rust
impl<T> VolAddress<T> {
  pub fn read(self) -> T
  where
    T: Copy,
  {
    unsafe { (self.address.get() as *mut T).read_volatile() }
  }
  pub unsafe fn read_non_copy(self) -> T {
    (self.address.get() as *mut T).read_volatile()
  }
  pub fn write(self, val: T) {
    unsafe { (self.address.get() as *mut T).write_volatile(val) }
  }
}
```
|
||||
|
||||
So if the type is `Copy` we can `read` it as much as we want. If, somehow, the
|
||||
type isn't `Copy`, then it might be `Drop`, and that means if we read out a
|
||||
value over and over we could cause the `drop` method to trigger UB. Since the
|
||||
end user might really know what they're doing, we provide an unsafe backup
|
||||
`read_non_copy`.
|
||||
|
||||
On the other hand, we can `write` to the location as much as we want. Even if
|
||||
the type isn't `Copy`, _not running `Drop` is safe_, so a `write` is always
|
||||
safe.
|
||||
|
||||
Now we can write to our magical location.
|
||||
|
||||
```rust
|
||||
MAGIC_LOCATION.write(0x0403);
|
||||
```
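
Since most of us don't have a GBA plugged into our desk, here's a trimmed down sketch of the same idea that runs on a normal computer. It points a `VolAddress` at a local variable instead of a hardware register. The `write_then_read` function is invented for the demo, and the type is repeated (without the `const` parts, so it works on stable) so that the snippet stands alone.

```rust
use core::marker::PhantomData;
use core::num::NonZeroUsize;

#[repr(transparent)]
pub struct VolAddress<T> {
  address: NonZeroUsize,
  marker: PhantomData<*mut T>,
}
impl<T> Clone for VolAddress<T> {
  fn clone(&self) -> Self {
    *self
  }
}
impl<T> Copy for VolAddress<T> {}

impl<T> VolAddress<T> {
  /// Safety: the address must follow the rules given earlier.
  pub unsafe fn new_unchecked(address: usize) -> Self {
    VolAddress {
      address: unsafe { NonZeroUsize::new_unchecked(address) },
      marker: PhantomData,
    }
  }
  pub fn read(self) -> T
  where
    T: Copy,
  {
    unsafe { (self.address.get() as *mut T).read_volatile() }
  }
  pub fn write(self, val: T) {
    unsafe { (self.address.get() as *mut T).write_volatile(val) }
  }
}

/// Uses a local `u16` as a pretend register, writes to it, and reads it back.
pub fn write_then_read(value: u16) -> u16 {
  let mut fake_register: u16 = 0;
  let address = &mut fake_register as *mut u16 as usize;
  // Safety: the address is non-null, aligned, and valid for this whole function.
  let location: VolAddress<u16> = unsafe { VolAddress::new_unchecked(address) };
  location.write(value);
  location.read()
}
```

The only `unsafe` the caller ever writes is the construction. After that, `write` and `read` are ordinary safe method calls.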
|
||||
|
||||
### VolAddress Iteration
|
||||
|
||||
We've already seen that sometimes we want to have a base address of some sort
|
||||
and then offset from that location to another. What if we wanted to iterate over
|
||||
_all the locations_? That's not particularly hard.
|
||||
|
||||
```rust
impl<T> VolAddress<T> {
  pub const unsafe fn iter_slots(self, slots: usize) -> VolAddressIter<T> {
    VolAddressIter { vol_address: self, slots }
  }
}

#[derive(Debug)]
pub struct VolAddressIter<T> {
  vol_address: VolAddress<T>,
  slots: usize,
}
impl<T> Clone for VolAddressIter<T> {
  fn clone(&self) -> Self {
    VolAddressIter {
      vol_address: self.vol_address,
      slots: self.slots,
    }
  }
}
impl<T> PartialEq for VolAddressIter<T> {
  fn eq(&self, other: &Self) -> bool {
    self.vol_address == other.vol_address && self.slots == other.slots
  }
}
impl<T> Eq for VolAddressIter<T> {}
impl<T> Iterator for VolAddressIter<T> {
  type Item = VolAddress<T>;

  fn next(&mut self) -> Option<Self::Item> {
    if self.slots > 0 {
      let out = self.vol_address;
      unsafe {
        self.slots -= 1;
        self.vol_address = self.vol_address.offset(1);
      }
      Some(out)
    } else {
      None
    }
  }
}
impl<T> FusedIterator for VolAddressIter<T> {}
```
|
||||
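
To see the iterator do something, here's a cut down sketch that runs on a normal computer. It walks over a local array (standing in for a block of hardware memory) and writes a different value into each slot. Only the parts needed for the demo are kept, and `fill_buffer` is a made up name.

```rust
use core::marker::PhantomData;
use core::mem::size_of;
use core::num::NonZeroUsize;

pub struct VolAddress<T> {
  address: NonZeroUsize,
  marker: PhantomData<*mut T>,
}
impl<T> Clone for VolAddress<T> {
  fn clone(&self) -> Self {
    *self
  }
}
impl<T> Copy for VolAddress<T> {}

impl<T> VolAddress<T> {
  pub unsafe fn new_unchecked(address: usize) -> Self {
    VolAddress {
      address: unsafe { NonZeroUsize::new_unchecked(address) },
      marker: PhantomData,
    }
  }
  pub unsafe fn offset(self, offset: isize) -> Self {
    let bytes = (offset as usize).wrapping_mul(size_of::<T>());
    unsafe { VolAddress::new_unchecked(self.address.get().wrapping_add(bytes)) }
  }
  pub fn write(self, val: T) {
    unsafe { (self.address.get() as *mut T).write_volatile(val) }
  }
}

pub struct VolAddressIter<T> {
  vol_address: VolAddress<T>,
  slots: usize,
}
impl<T> Iterator for VolAddressIter<T> {
  type Item = VolAddress<T>;

  fn next(&mut self) -> Option<Self::Item> {
    if self.slots > 0 {
      let out = self.vol_address;
      self.slots -= 1;
      // Only the address is computed here; nothing is read or written.
      self.vol_address = unsafe { self.vol_address.offset(1) };
      Some(out)
    } else {
      None
    }
  }
}

/// Writes `start`, `start + 1`, ... into a 4 slot buffer through the iterator.
pub fn fill_buffer(start: u16) -> [u16; 4] {
  let mut buffer = [0u16; 4];
  // Safety: the base address and slot count describe exactly this array.
  let base = unsafe { VolAddress::<u16>::new_unchecked(buffer.as_mut_ptr() as usize) };
  let iter = VolAddressIter { vol_address: base, slots: buffer.len() };
  for (i, slot) in iter.enumerate() {
    slot.write(start + i as u16);
  }
  buffer
}
```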
|
||||
### VolAddressBlock
|
||||
|
||||
Obviously, having a base address and a length exist separately is error prone.
|
||||
There's a good reason for slices to keep their pointer and their length
|
||||
together. We want something like that, which we'll call a "block" because
|
||||
"array" and "slice" are already things in Rust.
|
||||
|
||||
```rust
#[derive(Debug)]
pub struct VolAddressBlock<T> {
  vol_address: VolAddress<T>,
  slots: usize,
}
impl<T> Clone for VolAddressBlock<T> {
  fn clone(&self) -> Self {
    VolAddressBlock {
      vol_address: self.vol_address,
      slots: self.slots,
    }
  }
}
impl<T> PartialEq for VolAddressBlock<T> {
  fn eq(&self, other: &Self) -> bool {
    self.vol_address == other.vol_address && self.slots == other.slots
  }
}
impl<T> Eq for VolAddressBlock<T> {}

impl<T> VolAddressBlock<T> {
  pub const unsafe fn new_unchecked(vol_address: VolAddress<T>, slots: usize) -> Self {
    VolAddressBlock { vol_address, slots }
  }
  pub const fn iter(self) -> VolAddressIter<T> {
    VolAddressIter {
      vol_address: self.vol_address,
      slots: self.slots,
    }
  }
  pub unsafe fn index_unchecked(self, slot: usize) -> VolAddress<T> {
    self.vol_address.offset(slot as isize)
  }
  pub fn index(self, slot: usize) -> VolAddress<T> {
    if slot < self.slots {
      unsafe { self.vol_address.offset(slot as isize) }
    } else {
      panic!("Index Requested: {} >= Bound: {}", slot, self.slots)
    }
  }
  pub fn get(self, slot: usize) -> Option<VolAddress<T>> {
    if slot < self.slots {
      unsafe { Some(self.vol_address.offset(slot as isize)) }
    } else {
      None
    }
  }
}
```
|
||||
|
||||
Now we can have something like:
|
||||
|
||||
```rust
const OTHER_MAGIC: VolAddressBlock<u16> = unsafe {
  VolAddressBlock::new_unchecked(
    VolAddress::new_unchecked(0x600_0000),
    240 * 160
  )
};

OTHER_MAGIC.index(120 + 80 * 240).write(0x001F);
OTHER_MAGIC.index(136 + 80 * 240).write(0x03E0);
OTHER_MAGIC.index(120 + 96 * 240).write(0x7C00);
```
|
||||
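
The `x + y * 240` math in those `index` calls is the usual way to turn a 2D position into a slot within a 240 by 160 block. As a sketch, with a hypothetical helper name, it looks like this:

```rust
pub const WIDTH: usize = 240;
pub const HEIGHT: usize = 160;

/// Converts an `(x, y)` position into a slot index within a `WIDTH * HEIGHT`
/// block, or `None` if the position is outside of the block.
pub fn pixel_slot(x: usize, y: usize) -> Option<usize> {
  if x < WIDTH && y < HEIGHT {
    Some(x + y * WIDTH)
  } else {
    None
  }
}
```

Checking `x` and `y` separately matters: an `x` of 240 would otherwise quietly land on the first slot of the next row rather than being rejected.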
|
||||
### Docs?
|
||||
|
||||
If you wanna see these types and methods with a full docs write up you should
|
||||
check the GBA crate's source.
|
||||
|
|
@ -1,28 +0,0 @@
|
|||
# Work RAM
|
||||
|
||||
## External Work RAM (EWRAM)
|
||||
|
||||
* **Address Span:** `0x2000000` to `0x203FFFF` (256k)
|
||||
|
||||
This is a big pile of space, the use of which is up to each game. However, the
|
||||
external work ram has only a 16-bit bus (if you read/write a 32-bit value it
|
||||
silently breaks it up into two 16-bit operations) and also 2 wait cycles (extra
|
||||
CPU cycles that you have to expend _per 16-bit bus use_).
|
||||
|
||||
It's most helpful to think of EWRAM as slower, distant memory, similar to the
|
||||
"heap" in a normal application. You can take the time to go store something
|
||||
within EWRAM, or to load it out of EWRAM, but if you've got several operations
|
||||
to do in a row and you're worried about time you should pull that value into
|
||||
local memory, work on your local copy, and then push it back out to EWRAM.
|
||||
|
||||
## Internal Work RAM (IWRAM)
|
||||
|
||||
* **Address Span:** `0x3000000` to `0x3007FFF` (32k)
|
||||
|
||||
This is a smaller pile of space, but it has a 32-bit bus and no wait.
|
||||
|
||||
By default, `0x3007F00` to `0x3007FFF` is reserved for interrupt and BIOS use.
|
||||
The rest of it is mostly up to you. The user's stack space starts at `0x3007F00`
|
||||
and proceeds _down_ from there. For best results you should probably start at
|
||||
`0x3000000` and then go upwards. Under normal use it's unlikely that the two
|
||||
memory regions will crash into each other.
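
If you want to sanity check which region an address lands in, here's a minimal sketch using the spans listed above. The `WorkRam` enum and both function names are made up for this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkRam {
  /// EWRAM: 256k, 16-bit bus, 2 wait cycles.
  External,
  /// IWRAM: 32k, 32-bit bus, no wait.
  Internal,
}

/// Says which work RAM region (if any) an address falls within.
pub fn work_ram_region(address: usize) -> Option<WorkRam> {
  match address {
    0x200_0000..=0x203_FFFF => Some(WorkRam::External),
    0x300_0000..=0x300_7FFF => Some(WorkRam::Internal),
    _ => None,
  }
}

/// The top of IWRAM that's reserved for interrupt and BIOS use by default.
pub fn is_reserved_iwram(address: usize) -> bool {
  (0x300_7F00..=0x300_7FFF).contains(&address)
}
```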
|
|
@ -1,3 +0,0 @@
|
|||
# IO Registers
|
||||
|
||||
* **Address Span:** `0x400_0000` to `0x400_03FE`
|
|
@ -1,206 +0,0 @@
|
|||
# Newtype
|
||||
|
||||
TODO: we've already used newtype twice by now (fixed point values and volatile
|
||||
addresses), so we need to adjust how we start this section.
|
||||
|
||||
There's a great Zero Cost abstraction that we'll be using a lot that you might
|
||||
not already be familiar with: we're talking about the "Newtype Pattern"!
|
||||
|
||||
Now, I told you to read the Rust Book before you read this book, and I'm sure
|
||||
you're all good students who wouldn't sneak into this book without doing the
|
||||
required reading, so I'm sure you all remember exactly what I'm talking about,
|
||||
because they touch on the newtype concept in the book twice, in two _very_ long
|
||||
named sections:
|
||||
|
||||
* [Using the Newtype Pattern to Implement External Traits on External
|
||||
Types](https://doc.rust-lang.org/book/ch19-03-advanced-traits.html#using-the-newtype-pattern-to-implement-external-traits-on-external-types)
|
||||
* [Using the Newtype Pattern for Type Safety and
|
||||
Abstraction](https://doc.rust-lang.org/book/ch19-04-advanced-types.html#using-the-newtype-pattern-for-type-safety-and-abstraction)
|
||||
|
||||
...Yeah... The Rust Book doesn't know how to make a short sub-section name to
|
||||
save its life. Shame.
|
||||
|
||||
## Newtype Basics
|
||||
|
||||
So, we have all these pieces of data, and we want to keep them separated, and we
|
||||
don't wanna pay the cost for it at runtime. Well, we're in luck, we can pay the
|
||||
cost at compile time.
|
||||
|
||||
```rust
|
||||
pub struct PixelColor(u16);
|
||||
```
|
||||
|
||||
TODO: we've already talked about repr(transparent) by now
|
||||
|
||||
Ah, except that, as I'm sure you remember from [The
|
||||
Rustonomicon](https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent)
|
||||
(and from the RFC too, of course), if we have a single field struct that's
|
||||
sometimes different from having just the bare value, so we should be using
|
||||
`#[repr(transparent)]` with our newtypes.
|
||||
|
||||
```rust
|
||||
#[repr(transparent)]
|
||||
pub struct PixelColor(u16);
|
||||
```
|
||||
|
||||
And then we'll need to do that same thing for _every other newtype we want_.
|
||||
|
||||
Except there's only two tiny parts that actually differ between newtype
|
||||
declarations: the new name and the base type. All the rest is just the same rote
|
||||
code over and over. Generating piles and piles of boilerplate code? Sounds like
|
||||
a job for a macro to me!
|
||||
|
||||
## Making It A Macro
|
||||
|
||||
If you're going to do much with macros you should definitely read through [The
|
||||
Little Book of Rust
|
||||
Macros](https://danielkeep.github.io/tlborm/book/index.html), but we won't be
|
||||
doing too much so you can just follow along here a bit if you like.
|
||||
|
||||
The most basic version of a newtype macro starts like this:
|
||||
|
||||
```rust
#[macro_export]
macro_rules! newtype {
  ($new_name:ident, $old_name:ident) => {
    #[repr(transparent)]
    pub struct $new_name($old_name);
  };
}
```
|
||||
|
||||
The `#[macro_export]` makes it exported by the current module (like `pub`
|
||||
kinda), and then we have one expansion option that takes an identifier, a `,`,
|
||||
and then a second identifier. The new name is the outer type we'll be using, and
|
||||
the old name is the inner type that's being wrapped. You'd use our new macro
|
||||
something like this:
|
||||
|
||||
```rust
|
||||
newtype! {PixelColorCurly, u16}
|
||||
|
||||
newtype!(PixelColorParens, u16);
|
||||
|
||||
newtype![PixelColorBrackets, u16];
|
||||
```
|
||||
|
||||
Note that you can invoke the macro with the outermost grouping as any of `()`,
|
||||
`[]`, or `{}`. It makes no particular difference to the macro. Also, that space
|
||||
in the first version is kinda to show off that you can put white space in
|
||||
between the macro name and the grouping if you want. The difference is mostly
|
||||
style, but there are some rules and considerations here:
|
||||
|
||||
* If you use curly braces then you _must not_ put a `;` after the invocation.
|
||||
* If you use parentheses or brackets then you _must_ put the `;` at the end.
|
||||
* Rustfmt cares which you use and formats accordingly:
|
||||
* Curly brace macro use mostly gets treated like a code block.
|
||||
* Parentheses macro use mostly gets treated like a function call.
|
||||
* Bracket macro use mostly gets treated like an array declaration.
|
||||
|
||||
**As a reminder:** remember that `macro_rules` macros have to appear _before_
|
||||
they're invoked in your source, so the `newtype` macro will always have to be at
|
||||
the very top of your file, or if you put it in a module within your project
|
||||
you'll need to declare the module before anything that uses it.
|
||||
|
||||
## Upgrade That Macro!
|
||||
|
||||
We also want to be able to add `derive` stuff and doc comments to our newtype.
|
||||
Within the context of `macro_rules!` definitions these are called "meta". Since
|
||||
we can have any number of them we wrap it all up in a "zero or more" matcher.
|
||||
Then our macro looks like this:
|
||||
|
||||
```rust
#[macro_export]
macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $old_name:ident) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($old_name);
  };
}
```
|
||||
|
||||
So now we can write
|
||||
|
||||
```rust
newtype! {
  /// Color on the GBA gives 5 bits for each channel, the highest bit is ignored.
  #[derive(Debug, Clone, Copy)]
  PixelColor, u16
}
```
|
||||
|
||||
Next, we can allow for the wrapping of types that aren't just a single
|
||||
identifier by changing `$old_name` from `:ident` to `:ty`. We can't _also_ do
|
||||
this for the `$new_name` part because declaring a new struct expects a valid
|
||||
identifier that's _not_ already declared (obviously), and `:ty` is intended for
|
||||
capturing types that already exist.
|
||||
|
||||
```rust
#[macro_export]
macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $old_name:ty) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($old_name);
  };
}
```
|
||||
|
||||
Next of course we'll want to usually have a `new` method that's const and just
|
||||
gives a 0 value. We won't always be making a newtype over a number value, but we
|
||||
often will. It's usually silly to have a `new` method with no arguments since we
|
||||
might as well just impl `Default`, but `Default::default` isn't `const`, so
|
||||
having `pub const fn new() -> Self` is justified here.
|
||||
|
||||
Here, the token `0` is given the `{integer}` type, which can be converted into
|
||||
any of the integer types as needed, but it still can't be converted into an
|
||||
array type or a pointer or things like that. Accordingly we've added the "no
|
||||
frills" option which declares the struct and no `new` method.
|
||||
|
||||
```rust
#[macro_export]
macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $old_name:ty) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($old_name);
    impl $new_name {
      /// A `const` "zero value" constructor
      pub const fn new() -> Self {
        $new_name(0)
      }
    }
  };
  ($(#[$attr:meta])* $new_name:ident, $old_name:ty, no frills) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($old_name);
  };
}
```
|
||||
|
||||
Finally, we usually want to have the wrapped value be totally private, but there
|
||||
_are_ occasions where that's not the case. For this, we can allow the wrapped
|
||||
field to accept a visibility modifier.
|
||||
|
||||
```rust
#[macro_export]
macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($v $old_name);
    impl $new_name {
      /// A `const` "zero value" constructor
      pub const fn new() -> Self {
        $new_name(0)
      }
    }
  };
  ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty, no frills) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($v $old_name);
  };
}
```
|
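
Putting the final version to work, here's a sketch that uses both arms of the macro. The macro body is repeated (minus `#[macro_export]`, which the demo doesn't need) so that the snippet compiles by itself. `Color` and `FourBytes` are example types invented for the demo.

```rust
macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($v $old_name);
    impl $new_name {
      /// A `const` "zero value" constructor
      pub const fn new() -> Self {
        $new_name(0)
      }
    }
  };
  ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty, no frills) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($v $old_name);
  };
}

newtype! {
  /// A color value with a public inner field and a `new` method.
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  Color, pub u16
}

newtype! {
  /// An array can't be built from the literal `0`, so this one is "no frills".
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  FourBytes, pub [u8; 4], no frills
}

/// `new` is `const`, so it can be used to initialize a constant.
pub const BLACK: Color = Color::new();
```

The "no frills" input doesn't fit the first arm (there are tokens left over after the type), so the macro falls through to the second arm and no `new` method is generated.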
|
@ -1 +0,0 @@
|
|||
# Sound
|
|
@ -1,130 +0,0 @@
|
|||
# Constant Assertions
|
||||
|
||||
Have you ever wanted to assert things _even before runtime_? We all have, of
|
||||
course. Particularly when the runtime machine is a poor little GBA, we'd like to
|
||||
have the machine doing the compile handle as much checking as possible.
|
||||
|
||||
Enter the [static assertions](https://docs.rs/static_assertions/) crate, which
|
||||
provides a way to let you assert on a `const` expression.
|
||||
|
||||
This is an amazing crate that you should definitely use when you can.
|
||||
|
||||
It's written by [Nikolai Vazquez](https://github.com/nvzqz), and they kindly
|
||||
wrote up a [blog
|
||||
post](https://nikolaivazquez.com/posts/programming/rust-static-assertions/) that
|
||||
explains the thinking behind it.
|
||||
|
||||
However, I promised that each example would be single file, and I also promised
|
||||
to explain what's going on as we go, so we'll briefly touch upon giving an
|
||||
explanation here.
|
||||
|
||||
## How We Const Assert
|
||||
|
||||
Alright, as it stands (2018-12-15), we can't use `if` in a `const` context.
|
||||
|
||||
Since we can't use `if`, we can't use a normal `assert!`. Some day it will be
|
||||
possible, and a failed assert at compile time will be a compile error and a
|
||||
failed assert at run time will be a panic and we'll have a nice unified
|
||||
programming experience. Until then, we can add compile-time-only assertions by being a little
|
||||
tricky with the compiler.
|
||||
|
||||
If we write
|
||||
|
||||
```rust
|
||||
const ASSERT: usize = 0 - 1;
|
||||
```
|
||||
|
||||
that gives a warning, since the math would underflow. We can upgrade that
|
||||
warning to a hard error:
|
||||
|
||||
```rust
|
||||
#[deny(const_err)]
|
||||
const ASSERT: usize = 0 - 1;
|
||||
```
|
||||
|
||||
And to make our construction reusable we can enable the
|
||||
[underscore_const_names](https://github.com/rust-lang/rust/issues/54912) feature
|
||||
in our program (or library) and then give each such const an underscore for a
|
||||
name.
|
||||
|
||||
```rust
|
||||
#![feature(underscore_const_names)]
|
||||
|
||||
#[deny(const_err)]
|
||||
const _: usize = 0 - 1;
|
||||
```
|
||||
|
||||
Now we wrap this in a macro where we give a `bool` expression as input. We
|
||||
negate the bool then cast it to a `usize`, meaning that `true` negates into
|
||||
`false`, which becomes `0usize`, and then there's no underflow error. Or if the
|
||||
input was `false`, it negates into `true`, then becomes `1usize`, and then the
|
||||
underflow error fires.
|
||||
|
||||
```rust
macro_rules! const_assert {
  ($condition:expr) => {
    #[deny(const_err)]
    #[allow(dead_code)]
    const _: usize = 0 - !$condition as usize;
  }
}
```
|
||||
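
If the arithmetic feels a bit too clever, here's the same calculation done at runtime with `checked_sub`, so you can see both outcomes without breaking the build. This is only an illustration of the math and the function name is made up; it's not how you'd write the assertion itself.

```rust
/// Mirrors `0 - !condition as usize`, but reports underflow as `None` instead
/// of as a compile error.
pub fn assert_arithmetic(condition: bool) -> Option<usize> {
  0usize.checked_sub(!condition as usize)
}
```

A `true` condition gives `Some(0)` (no underflow, the assert passes), while a `false` condition gives `None` (underflow, the assert fails).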
|
||||
Technically, written like this, the expression can be anything with a
|
||||
`core::ops::Not` implementation that can also be `as` cast into `usize`. That's
|
||||
`bool`, but also basically all the other number types. Since we want to ensure
|
||||
that we get proper looking type errors when things go wrong, we can use
|
||||
`($condition && true)` to enforce that we get a `bool` (thanks to `Talchas` for
|
||||
that particular suggestion).
|
||||
|
||||
```rust
macro_rules! const_assert {
  ($condition:expr) => {
    #[deny(const_err)]
    #[allow(dead_code)]
    const _: usize = 0 - !($condition && true) as usize;
  }
}
```
|
||||
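
For comparison, on newer toolchains (Rust 1.57 and later) the "some day" mentioned earlier has arrived: `assert!` works in a `const` context on stable, so the subtraction trick isn't needed there. A minimal sketch of the same macro under that assumption, with `MAX_CHANNEL` and `channel_ok` invented for the demo:

```rust
macro_rules! const_assert {
  ($condition:expr) => {
    // A failed assert here is a compile error, not a runtime panic.
    const _: () = assert!($condition);
  };
}

pub const MAX_CHANNEL: u16 = 31;

/// A `const fn` can be called inside the asserted expression too.
pub const fn channel_ok(value: u16) -> bool {
  value <= MAX_CHANNEL
}

const_assert!(MAX_CHANNEL < 32);
const_assert!(channel_ok(31));
const_assert!(core::mem::size_of::<u16>() == 2);
```

Since `assert!` only accepts a `bool`, this version also gets the "proper looking type errors" without the `&& true` trick.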
|
||||
## Asserting Something
|
||||
|
||||
As an example of how we might use a `const_assert`, we'll do a demo with colors.
|
||||
There's a red, blue, and green channel. We store colors in a `u16` with 5 bits
|
||||
for each channel.
|
||||
|
||||
```rust
newtype! {
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  Color, u16
}
```
|
||||
|
||||
And when we're building a color, we're passing in `u16` values, but they could
|
||||
be using more than just 5 bits of space. We want to make sure that each channel
|
||||
is 31 or less, so we can make a color builder that does a `const_assert!` on the
|
||||
value of each channel.
|
||||
|
||||
```rust
macro_rules! rgb {
  ($r:expr, $g:expr, $b:expr) => {
    {
      const_assert!($r <= 31);
      const_assert!($g <= 31);
      const_assert!($b <= 31);
      Color($b << 10 | $g << 5 | $r)
    }
  }
}
```
|
||||
|
||||
And then we can declare some colors
|
||||
|
||||
```rust
|
||||
const RED: Color = rgb!(31, 0, 0);
|
||||
|
||||
const BLUE: Color = rgb!(31, 500, 0);
|
||||
```
|
||||
|
||||
The second one is clearly out of bounds and it fires an error just like we
|
||||
wanted.
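
If you'd rather see the channel packing without any macros involved, here's a sketch of the same check written as a `const fn` that returns an `Option`. This is an alternative for illustration, not what the macro above expands to, and it works on the raw `u16` so that the snippet stands alone.

```rust
/// Packs three 5-bit channels into a `u16`, or gives `None` if any channel is
/// greater than 31.
pub const fn rgb(r: u16, g: u16, b: u16) -> Option<u16> {
  if r <= 31 && g <= 31 && b <= 31 {
    Some(b << 10 | g << 5 | r)
  } else {
    None
  }
}
```

Full red, green, and blue come out as `0x001F`, `0x03E0`, and `0x7C00`, which are the same three values written to the screen back in the `VolAddressBlock` example.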
|
|
@ -1,78 +0,0 @@
|
|||
# Help and Resources
|
||||
|
||||
## Help
|
||||
|
||||
So you're stuck on a problem and the book doesn't say what to do. Where can you
|
||||
find out more?
|
||||
|
||||
The first place I would suggest is the [Rust Community
|
||||
Discord](https://discordapp.com/invite/aVESxV8). If it's a general Rust question
|
||||
then you can ask anyone in any channel you feel is appropriate. If it's GBA
|
||||
specific then you can try asking me (`Lokathor`) or `Ketsuban` in the `#gamedev`
|
||||
channel.
|
||||
|
||||
## Emulators
|
||||
|
||||
You certainly might want to eventually write a game that you can put on a flash
|
||||
cart and play on real hardware, but for most of your development you'll probably
|
||||
want to be using an emulator for testing, because you don't have to fiddle with
|
||||
cables and all that.
|
||||
|
||||
In terms of emulators, you want to be using
|
||||
[mGBA](https://github.com/mgba-emu/mgba), and you want to be using the [0.7 Beta
|
||||
1](https://github.com/mgba-emu/mgba/releases/tag/0.7-b1) or later. This update
|
||||
lets you run raw ELF files, which means that you can have full debug symbols
|
||||
available while you're debugging problems.
|
||||
|
||||
## Information Resources
|
||||
|
||||
First, if I fail to describe something related to Rust, you can always try
|
||||
checking in [The Rust
|
||||
Reference](https://doc.rust-lang.org/nightly/reference/introduction.html) to see
|
||||
if they cover it. You can mostly ignore that big scary red banner at the top,
|
||||
things are a lot better documented than they make it sound.
|
||||
|
||||
If you need help trying to fiddle your math down as hard as you can, there are
|
||||
resources such as the [Bit Twiddling
|
||||
Hacks](https://graphics.stanford.edu/~seander/bithacks.html) page.
|
||||
|
||||
As to GBA related lore, Ketsuban and I didn't magically learn this all from
|
||||
nowhere, we read various technical manuals and guides ourselves and then
|
||||
distilled those works oriented around C and C++ into a book for Rust.
|
||||
|
||||
We have personally used some or all of the following:
|
||||
|
||||
* [GBATEK](http://problemkaputt.de/gbatek.htm): This is _the_ resource. It
|
||||
covers not only the GBA, but also the DS and DSi, and also a run down of ARM
|
||||
assembly (32-bit and 16-bit opcodes). The link there is to the 2.9b version on
|
||||
`problemkaputt.de` (the official home of the document), but if you just google
|
||||
for gbatek the top result is for the 2.5 version on `akkit.org`, so make sure
|
||||
you're looking at the newest version. Sometimes `problemkaputt.de` is a little
|
||||
sluggish so I've also [mirrored](https://lokathor.com/gbatek.html) the 2.9b
|
||||
version on my own site as well. GBATEK is rather large, over 2MB of text, so
|
||||
if you're on a phone or similar you might want to save an offline copy to go
|
||||
easy on your data usage.
|
||||
* [TONC](https://www.coranac.com/tonc/text/): While GBATEK is basically just a
|
||||
huge tech specification, TONC is an actual _guide_ on how to make sense of the
|
||||
GBA's abilities and organize it into a game. It's written for C of course, but
|
||||
as a Rust programmer you should always be practicing your ability to read C
|
||||
code anyway. It's the programming equivalent of learning Latin because all the
|
||||
old academic books are written in Latin.
|
||||
* [CowBite](https://www.cs.rit.edu/~tjh8300/CowBite/CowBiteSpec.htm): This is
|
||||
more like GBATEK, and it's less complete, but it mixes in a little more
|
||||
friendly explanation of things in between the hardware spec parts.
|
||||
|
||||
And though I haven't had time to look at it myself, [The Audio
|
||||
Advance](http://belogic.com/gba/) seems to be very good. It explains in depth
|
||||
how you can get audio working on the GBA. Note that the table of contents for
|
||||
each page goes along the top instead of down the side.
|
||||
|
||||
## Non-Rust GBA Community
|
||||
|
||||
There's also the [GBADev.org](http://www.gbadev.org/) site, which has a forum
|
||||
and everything. They're coding in C and C++, but you can probably overcome that
|
||||
difference with a little work on your part.
|
||||
|
||||
I also found a place called
|
||||
[GBATemp](https://gbatemp.net/categories/nintendo-gba-discussions.32/), which
|
||||
seems to have a more active forum but less of a focus on actual coding.
|
|
@ -1 +0,0 @@
|
|||
# Interrupts
|
|
@ -1,50 +0,0 @@
|
|||
# Palette RAM (PALRAM)
|
||||
|
||||
* **Address Span:** `0x500_0000` to `0x500_03FF` (1k)
|
||||
|
||||
Palette RAM has a 16-bit bus, which isn't really a problem because it
|
||||
conceptually just holds `u16` values. There's no automatic wait state, but if
|
||||
you try to access the same location that the display controller is accessing you
|
||||
get bumped by 1 cycle. Since the display controller can use the palette ram any
|
||||
number of times per scanline it's basically impossible to predict if you'll have
|
||||
to do a wait or not during VDraw. During VBlank you won't have any wait of
|
||||
course.
|
||||
|
||||
PALRAM is among the memory where there's weirdness if you try to write just one
|
||||
byte: if you try to write just 1 byte, it writes that byte into _both_ parts of
|
||||
the larger 16-bit location. This doesn't really affect us much with PALRAM,
|
||||
because palette values are all supposed to be `u16` anyway.
|
||||
|
||||
The palette memory actually contains not one, but _two_ sets of palettes. First
|
||||
there's 256 entries for the background palette data (starting at `0x500_0000`),
|
||||
and then there's 256 entries for object palette data (starting at `0x500_0200`).
|
||||
|
||||
The GBA also has two modes for palette access: 8-bits-per-pixel (8bpp) and
|
||||
4-bits-per-pixel (4bpp).
|
||||
|
||||
* In 8bpp mode an 8-bit palette index value within a background or sprite
|
||||
simply indexes directly into the 256 slots for that type of thing.
|
||||
* In 4bpp mode a 4-bit palette index value within a background or sprite
|
||||
specifies an index within a particular "palbank" (16 palette entries each),
|
||||
and then a _separate_ setting outside of the graphical data determines which
|
||||
palbank is to be used for that background or object (the screen entry data for
|
||||
backgrounds, and the object attributes for objects).
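
The two indexing schemes come down to simple address arithmetic. Here is a minimal sketch, assuming the helper names below (they are illustrative and not part of the `gba` crate). Each palette entry is a `u16`, so neighboring slots are 2 bytes apart:

```rust
/// Start of background palette data (address from the text above).
pub const BG_PALETTE_BASE: usize = 0x500_0000;
/// Start of object palette data (address from the text above).
pub const OBJ_PALETTE_BASE: usize = 0x500_0200;

/// 8bpp: the pixel value indexes directly into the 256 slots.
pub const fn palette_addr_8bpp(base: usize, index: u8) -> usize {
  base + (index as usize) * 2
}

/// 4bpp: the palbank (0..16) selects a group of 16 slots, and the pixel
/// value (0..16) selects a slot within that group.
pub const fn palette_addr_4bpp(base: usize, palbank: usize, slot: usize) -> usize {
  base + (palbank * 16 + slot) * 2
}
```

Note that palbank 2, slot 3 is the same memory as 8bpp index 35: the palbanks are just a different way of slicing the same 256 entries.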
|
||||
|
||||
### Transparency
|
||||
|
||||
When a pixel within a background or object specifies index 0 as its palette
|
||||
entry it is treated as a transparent pixel. This means that in 8bpp mode there's
|
||||
only 255 actual color options (0 being transparent), and in 4bpp mode there's
|
||||
only 15 actual color options available within each palbank (the 0th entry of
|
||||
_each_ palbank is transparent).
|
||||
|
||||
Individual backgrounds, and individual objects, each determine if they're 4bpp
|
||||
or 8bpp separately, so a given overall palette slot might map to a used color in
|
||||
8bpp and an unused/transparent color in 4bpp. A palette wizard can exploit that.
|
||||
|
||||
Palette slot 0 of the overall background palette is used to determine the
|
||||
"backdrop" color. That's the color you see if no background or object ends up
|
||||
being rendered within a given pixel.
|
||||
|
||||
Since display mode 3 and display mode 5 don't use the palette, they cannot
|
||||
benefit from transparency.
|
|
@ -1 +0,0 @@
|
|||
# Link Cable
|
|
@ -1,24 +0,0 @@
|
|||
# Video RAM (VRAM)
|
||||
|
||||
* **Address Span:** `0x600_0000` to `0x601_7FFF` (96k)
|
||||
|
||||
We've used this before! VRAM has a 16-bit bus and no wait. However, the same as
|
||||
with PALRAM, the "you might have to wait if the display controller is looking at
|
||||
it" rule applies here.
|
||||
|
||||
Unfortunately there's not much more exact detail that can be given about VRAM.
|
||||
The use of the memory depends on the video mode that you're using.
|
||||
|
||||
One general detail of note is that you can't write individual bytes to any part
|
||||
of VRAM. Depending on mode and location, you'll either get your bytes doubled
|
||||
into both the upper and lower parts of the 16-bit location targeted, or you
|
||||
won't even affect the memory. This usually isn't a big deal, except in two
|
||||
situations:
|
||||
|
||||
* In Mode 4, if you want to change just 1 pixel, you'll have to be very careful
|
||||
to read the old `u16`, overwrite just the byte you wanted to change, and then
|
||||
write that back.
|
||||
* In any display mode, avoid using `memcpy` to place things into VRAM.
|
||||
It's written to be byte oriented, and only does 32-bit transfers under select
|
||||
conditions. The rest of the time it'll copy one byte at a time and you'll get
|
||||
either garbage or nothing at all.
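
The Mode 4 read-modify-write dance can be sketched as follows. This is a host-side illustration only: a plain `u16` slice stands in for VRAM, and the function name and `MODE4_WIDTH` constant are assumptions made for the example. On real hardware the read and the write would both need to be volatile.

```rust
/// Mode 4 is 240 pixels wide, one byte (palette index) per pixel.
pub const MODE4_WIDTH: usize = 240;

/// Sets one 8-bit pixel inside a buffer of 16-bit units without
/// disturbing the neighboring pixel that shares the same `u16`.
pub fn mode4_set_pixel(buffer: &mut [u16], col: usize, row: usize, pal_index: u8) {
  let byte_offset = row * MODE4_WIDTH + col;
  // read the old 16-bit value
  let old = buffer[byte_offset / 2];
  // The GBA is little-endian, so the even byte is the low half.
  let new = if byte_offset % 2 == 0 {
    (old & 0xFF00) | pal_index as u16
  } else {
    (old & 0x00FF) | ((pal_index as u16) << 8)
  };
  // write the combined value back as a single 16-bit store
  buffer[byte_offset / 2] = new;
}
```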
|
|
@ -1 +0,0 @@
|
|||
# Game Pak
|
|
@ -1,62 +0,0 @@
|
|||
# Object Attribute Memory (OAM)
|
||||
|
||||
* **Address Span:** `0x700_0000` to `0x700_03FF` (1k)
|
||||
|
||||
The Object Attribute Memory has a 32-bit bus and no default wait, but suffers
|
||||
from the "you might have to wait if the display controller is looking at it"
|
||||
rule. You cannot write individual bytes to OAM at all, but that's not really a
|
||||
problem because all the fields of the data types within OAM are either `i16` or
|
||||
`u16` anyway.
|
||||
|
||||
Object attribute memory is the wildest yet: it conceptually contains two types
|
||||
of things, but they're _interlaced_ with each other all the way through.
|
||||
|
||||
Now, [GBATEK](http://problemkaputt.de/gbatek.htm#lcdobjoamattributes) and
|
||||
[CowBite](https://www.cs.rit.edu/~tjh8300/CowBite/CowBiteSpec.htm#OAM%20(sprites))
|
||||
don't quite give names to the two data types here.
|
||||
[TONC](https://www.coranac.com/tonc/text/regobj.htm#sec-oam) calls them
|
||||
`OBJ_ATTR` and `OBJ_AFFINE`, but we'll be giving them names fitting with the
|
||||
Rust naming convention. Just know that if you try to talk about it with others
|
||||
they might not be using the same names. In Rust terms their layout would look
|
||||
like this:
|
||||
|
||||
```rust
#[repr(C)]
pub struct ObjectAttributes {
  attr0: u16,
  attr1: u16,
  attr2: u16,
  filler: i16,
}

#[repr(C)]
pub struct AffineMatrix {
  filler0: [u16; 3],
  pa: i16,
  filler1: [u16; 3],
  pb: i16,
  filler2: [u16; 3],
  pc: i16,
  filler3: [u16; 3],
  pd: i16,
}
```
|
||||
|
||||
(Note: the `#[repr(C)]` part just means that Rust must lay out the data exactly
|
||||
in the order we specify, which otherwise it is not required to do).
|
||||
|
||||
So, we've got 1024 bytes in OAM and each `ObjectAttributes` value is 8 bytes, so
|
||||
naturally we can support up to 128 objects.
|
||||
|
||||
_At the same time_, we've got 1024 bytes in OAM and each `AffineMatrix` is 32
|
||||
bytes, so we can have 32 of them.
|
||||
|
||||
But, as I said, these things are all _interlaced_ with each other. See how
|
||||
there's "filler" fields in each struct? If we imagine the OAM as being just an
|
||||
array of one type or the other, indexes 0/1/2/3 of the `ObjectAttributes` array
|
||||
would line up with index 0 of the `AffineMatrix` array. It's kinda weird, but
|
||||
that's just how it works. When we set up functions to read and write these values
|
||||
we'll have to be careful with how we do it. We probably _won't_ want to use
|
||||
those representations above, at least not with the `AffineMatrix` type, because
|
||||
they're quite wasteful if you want to store just object attributes or just
|
||||
affine matrices.
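
One way to avoid the filler-heavy structs is to compute addresses directly. The following is a sketch under that assumption; the function names are made up for illustration and are not the `gba` crate's API:

```rust
pub const OAM_BASE: usize = 0x700_0000;

/// Address of object `n` (0..128). Each entry spans 8 bytes, of which
/// the first 6 are attr0, attr1, and attr2.
pub const fn object_attr_addr(n: usize) -> usize {
  OAM_BASE + n * 8
}

/// Address of parameter `param` (0 = pa, 1 = pb, 2 = pc, 3 = pd) of
/// affine matrix `m` (0..32). Each parameter occupies the final 2 bytes
/// of one 8-byte object slot.
pub const fn affine_param_addr(m: usize, param: usize) -> usize {
  OAM_BASE + m * 32 + param * 8 + 6
}
```

With this layout, `pa` through `pd` of matrix 0 live in the "filler" position of objects 0 through 3, and matrix 1 starts in the filler of object 4.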
|
|
@ -1,14 +0,0 @@
|
|||
# Game Pak ROM / Flash ROM (ROM)
|
||||
|
||||
* **Address Span (Wait State 0):** `0x800_0000` to `0x9FF_FFFF`
|
||||
* **Address Span (Wait State 1):** `0xA00_0000` to `0xBFF_FFFF`
|
||||
* **Address Span (Wait State 2):** `0xC00_0000` to `0xDFF_FFFF`
|
||||
|
||||
The game's ROM data is a single set of data that's up to 32 megabytes in size.
|
||||
However, that data is mirrored to three different locations in the address
|
||||
space. Depending on which part of the address space you use, it can affect the
|
||||
memory timings involved.
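
Since the three spans are mirrors of the same data, translating a ROM offset into any of them is a single addition. A small sketch, with a hypothetical helper name:

```rust
/// Start of the Wait State 0 mirror.
pub const ROM_BASE: usize = 0x800_0000;
/// Each mirror covers 32 megabytes of address space.
pub const ROM_MIRROR_SIZE: usize = 0x200_0000;

/// Address of byte `offset` of the ROM as seen through the mirror for
/// the given wait state (0, 1, or 2). Returns `None` when out of range.
pub fn rom_mirror_addr(wait_state: usize, offset: usize) -> Option<usize> {
  if wait_state > 2 || offset >= ROM_MIRROR_SIZE {
    return None;
  }
  Some(ROM_BASE + wait_state * ROM_MIRROR_SIZE + offset)
}
```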
|
||||
|
||||
TODO: describe `WAITCNT` here, we won't get a better chance at it.
|
||||
|
||||
TODO: discuss THUMB vs ARM code and why THUMB is so much faster (because ROM is a 16-bit bus)
|
|
@ -1,21 +0,0 @@
|
|||
# Save RAM (SRAM)
|
||||
|
||||
* **Address Span:** `0xE00_0000` to `0xE00_FFFF` (64k)
|
||||
|
||||
The actual amount of SRAM available depends on your game pak, and the 64k figure
|
||||
is simply the maximum possible. A particular game pak might have less, and an
|
||||
emulator will likely let you have all 64k if you want.
|
||||
|
||||
As with other portions of the address space, SRAM has some number of wait cycles
|
||||
per use. As with ROM, you can change the wait cycle settings via the `WAITCNT`
|
||||
register if the defaults don't work well for your game pak. See the ROM section
|
||||
for full details of how the `WAITCNT` register works.
|
||||
|
||||
The game pak SRAM also has only an 8-bit bus, so have fun with that.
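
In practice the 8-bit bus means that anything wider than a byte has to be split up and stored one byte at a time. A host-side sketch of the idea, where a plain byte slice stands in for SRAM and the helper names are assumptions (on hardware each byte would be a volatile access):

```rust
/// Stores a `u32` as four separate byte writes, lowest byte first.
pub fn sram_write_u32(sram: &mut [u8], offset: usize, value: u32) {
  for (i, byte) in value.to_le_bytes().iter().enumerate() {
    sram[offset + i] = *byte;
  }
}

/// Rebuilds a `u32` from four separate byte reads.
pub fn sram_read_u32(sram: &[u8], offset: usize) -> u32 {
  let mut bytes = [0u8; 4];
  for (i, byte) in bytes.iter_mut().enumerate() {
    *byte = sram[offset + i];
  }
  u32::from_le_bytes(bytes)
}
```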
|
||||
|
||||
The GBA Direct Memory Access (DMA) unit cannot access SRAM.
|
||||
|
||||
Also, you [should not write to SRAM with code executing from
|
||||
ROM](https://problemkaputt.de/gbatek.htm#gbacartbackupsramfram). Instead, you
|
||||
should move the code to WRAM and execute the save code from there. We'll cover
|
||||
how to handle that eventually.
|
File diff suppressed because it is too large
Load diff
|
@ -1,52 +0,0 @@
|
|||
# Ch 3: Memory and Objects
|
||||
|
||||
Alright so we can do some basic "movement", but we left a big trail in the video
|
||||
memory of everywhere we went. Most of the time that's not what we want at all.
|
||||
If we want more hardware support we're going to have to use a new video mode. So
|
||||
far we've only used Mode 3, but modes 4 and 5 are basically the same. Instead,
|
||||
we'll switch focus to using a tiled graphical mode.
|
||||
|
||||
First we will go over the complete GBA memory mapping. Part of this is the
|
||||
memory for tiled graphics, but also things like all those IO registers, where
|
||||
our RAM is for scratch space, all that stuff. Even if we can't put all of them
|
||||
to use at once, it's helpful to have an idea of what will be available in the
|
||||
long run.
|
||||
|
||||
Tiled modes bring us three big new concepts that each have their own complexity:
|
||||
tiles, backgrounds, and objects. Backgrounds and objects both use tiles, but the
|
||||
background is for creating a very large static space that you can scroll around
|
||||
the view within, and the objects are about having a few moving bits that appear
|
||||
over the background. Careful use of backgrounds and objects is key to having the
|
||||
best looking GBA game, so we won't even be able to cover it all in a single
|
||||
chapter.
|
||||
|
||||
And, of course, since most games are pretty boring if they're totally static
|
||||
we'll touch on the kinds of RNG implementations you might want to have on a GBA.
|
||||
Most general purpose RNGs that you find are rather big compared to the amount of
|
||||
memory we want to give them, and they often use a lot of `u64` operations, so
|
||||
they end up much slower on a 32-bit machine like the GBA (you can lower 64-bit
|
||||
ops to combinations of 32-bit ops, but that's quite a bit more work). We'll
|
||||
cover a few RNG options that size down the RNG to a good size and a good speed
|
||||
without trading away too much in terms of quality.
|
||||
|
||||
To top it all off, we'll make a simple "memory game" sort of thing. There's some
|
||||
face down cards in a grid, you pick one to check, then you pick another to
|
||||
check, and then if they match the pair disappears.
|
||||
|
||||
## Drawing Priority
|
||||
|
||||
Both backgrounds and objects can have "priority" values associated with them.
|
||||
TONC and GBATEK have _opposite_ ideas of what it means to have the "highest"
|
||||
priority. TONC goes by highest numerical value, and GBATEK goes by what's on the
|
||||
z-layer closest to the user. Let's list out the rules as clearly as we can:
|
||||
|
||||
* Priority is always two bits, so 0 through 3.
|
||||
* Priority conceptually proceeds in drawing passes that count _down_, so any
|
||||
priority 3 things can get covered up by priority 2 things. In truth there's
|
||||
probably depth testing and buffering stuff going on so it's all one single
|
||||
pass, but conceptually we will imagine it happening as all of the 3 elements,
|
||||
then all of 2, and so on.
|
||||
* Objects always draw over top of backgrounds of equal priority.
|
||||
* Within things of the same type and priority, the lower numbered element "wins"
|
||||
and gets its pixel drawn (bg0 is favored over bg1, obj0 is favored over obj1,
|
||||
etc).
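
The rules above can be modeled as a sort key. This is only a sketch of the rules as listed here, not of what the hardware does internally; the `Layer` type and function names are invented for the example, and it assumes every layer passed in has an opaque pixel at the position in question.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
  Object { index: u8, priority: u8 },
  Background { index: u8, priority: u8 },
}

impl Layer {
  /// Smaller keys are drawn in front of larger keys:
  /// priority first, then objects before backgrounds, then index.
  pub fn draw_key(self) -> (u8, u8, u8) {
    match self {
      Layer::Object { index, priority } => (priority & 0b11, 0, index),
      Layer::Background { index, priority } => (priority & 0b11, 1, index),
    }
  }
}

/// Picks the layer whose pixel ends up visible.
pub fn front_most(layers: &[Layer]) -> Option<Layer> {
  layers.iter().copied().min_by_key(|layer| layer.draw_key())
}
```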
|
|
@ -1,33 +0,0 @@
|
|||
# IO Registers
|
||||
|
||||
The GBA has a large number of **IO Registers** (not to be confused with CPU
|
||||
registers). These are special memory locations from `0x04000000` to
|
||||
`0x040003FE`. GBATEK has a [full
|
||||
list](http://problemkaputt.de/gbatek.htm#gbaiomap), but we only need to learn
|
||||
about a few of them at a time as we go, so don't be worried.
|
||||
|
||||
The important facts to know about IO Registers are these:
|
||||
|
||||
* Each has their own specific size. Most are `u16`, but some are `u32`.
|
||||
* All of them must be accessed in a `volatile` style.
|
||||
* Each register is specifically readable or writable or both. Actually, with
|
||||
some registers there are even individual bits that are read-only or
|
||||
write-only.
|
||||
* If you write to a read-only position, those writes are simply ignored. This
|
||||
mostly matters if a writable register contains a read-only bit (such as the
|
||||
Display Control, next section).
|
||||
* If you read from a write-only position, you get back values that are
|
||||
[basically
|
||||
nonsense](http://problemkaputt.de/gbatek.htm#gbaunpredictablethings). There
|
||||
aren't really any registers that mix readable bits with write-only bits, so
|
||||
you're basically safe here. The only (mild) concern is that when you write a
|
||||
value into a write-only register you need to keep track of what you wrote
|
||||
somewhere else if you want to know what you wrote (such as to adjust an offset
|
||||
value by +1, or whatever).
|
||||
* You can always check GBATEK to be sure, but if I don't mention it then a bit
|
||||
is probably both read and write.
|
||||
* Some registers have invalid bit patterns. For example, the lowest three bits
|
||||
of the Display Control register can't legally be set to the values 6 or 7.
|
||||
|
||||
When talking about bit positions, the numbers are _zero indexed_ just like an
|
||||
array index is.
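
The "keep track of what you wrote" advice for write-only registers can be sketched as a small wrapper that stores a shadow copy next to the address. The type below is hypothetical (it is not something the `gba` crate provides), and on a host machine it can be pointed at an ordinary variable for testing:

```rust
use core::ptr::write_volatile;

/// A write-only `u16` register paired with a software copy of the
/// last value written to it.
pub struct ShadowedWriteOnly {
  address: *mut u16,
  shadow: u16,
}

impl ShadowedWriteOnly {
  /// # Safety
  /// `address` must stay valid for volatile `u16` writes.
  pub unsafe fn new(address: *mut u16, initial: u16) -> Self {
    let mut register = ShadowedWriteOnly { address, shadow: initial };
    register.set(initial);
    register
  }

  /// Returns the remembered value; the hardware is never read.
  pub fn get(&self) -> u16 {
    self.shadow
  }

  pub fn set(&mut self, value: u16) {
    // Safety: the caller of `new` promised that `address` is valid.
    unsafe { write_volatile(self.address, value) };
    self.shadow = value;
  }

  /// Adjusts the register relative to the last value written.
  pub fn adjust(&mut self, delta: i16) {
    let next = self.shadow.wrapping_add(delta as u16);
    self.set(next);
  }
}
```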
|
|
@ -1,135 +0,0 @@
|
|||
# light_cycle
|
||||
|
||||
Now let's make a game of "light_cycle" with our new knowledge.
|
||||
|
||||
## Gameplay
|
||||
|
||||
`light_cycle` is pretty simple, and very obvious if you've ever seen Tron. The
|
||||
player moves around the screen with a trail left behind them. They die if they
|
||||
go off the screen or if they touch their own trail.
|
||||
|
||||
## Operations
|
||||
|
||||
We need some better drawing operations this time around.
|
||||
|
||||
```rust
pub unsafe fn mode3_clear_screen(color: u16) {
  let color = color as u32;
  // pack the same 16-bit color into both halves of a u32
  let bulk_color = color << 16 | color;
  let mut ptr = VolatilePtr(VRAM as *mut u32);
  for _ in 0..SCREEN_HEIGHT {
    for _ in 0..(SCREEN_WIDTH / 2) {
      ptr.write(bulk_color);
      ptr = ptr.offset(1);
    }
  }
}

pub unsafe fn mode3_draw_pixel(col: isize, row: isize, color: u16) {
  VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).write(color);
}

pub unsafe fn mode3_read_pixel(col: isize, row: isize) -> u16 {
  VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).read()
}
```
|
||||
|
||||
The draw pixel and read pixel are both pretty obvious. What's new is the clear
|
||||
screen operation. It changes the `u16` color into a `u32` and then packs the
|
||||
value in twice. Then we write out `u32` values the whole way through screen
|
||||
memory. This means we have to do fewer write operations overall, and so the
|
||||
screen clear is twice as fast.
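
To make the packing concrete, here is the same trick in isolation along with the write counts it implies. The constant names `U16_WRITES` and `U32_WRITES` are just for this illustration. What the numbers show is that the loop issues half as many stores; the actual speedup on hardware also depends on bus timing.

```rust
pub const SCREEN_WIDTH: usize = 240;
pub const SCREEN_HEIGHT: usize = 160;

/// Duplicates a 16-bit color into both halves of a `u32`, so a single
/// 32-bit store fills two adjacent Mode 3 pixels.
pub const fn pack_color_pair(color: u16) -> u32 {
  let color = color as u32;
  color << 16 | color
}

/// Stores needed to clear the screen one pixel at a time.
pub const U16_WRITES: usize = SCREEN_WIDTH * SCREEN_HEIGHT;
/// Stores needed when clearing two pixels per store.
pub const U32_WRITES: usize = SCREEN_WIDTH * SCREEN_HEIGHT / 2;
```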
|
||||
|
||||
Now we just have to fill in the main function:
|
||||
|
||||
```rust
#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
  unsafe {
    DISPCNT.write(MODE3 | BG2);
  }

  let mut px = SCREEN_WIDTH / 2;
  let mut py = SCREEN_HEIGHT / 2;
  let mut color = rgb16(31, 0, 0);

  loop {
    // read the input for this frame
    let this_frame_keys = key_input();

    // adjust game state and wait for vblank
    px += 2 * this_frame_keys.column_direction() as isize;
    py += 2 * this_frame_keys.row_direction() as isize;
    wait_until_vblank();

    // draw the new game and wait until the next frame starts.
    unsafe {
      // `>=` rather than `==` so that an overshoot can never slip past the check
      if px < 0 || py < 0 || px >= SCREEN_WIDTH || py >= SCREEN_HEIGHT {
        // out of bounds, reset the screen and position.
        mode3_clear_screen(0);
        color = color.rotate_left(5);
        px = SCREEN_WIDTH / 2;
        py = SCREEN_HEIGHT / 2;
      } else {
        let color_here = mode3_read_pixel(px, py);
        if color_here != 0 {
          // crashed into our own line, reset the screen
          mode3_clear_screen(0);
          color = color.rotate_left(5);
        } else {
          // draw the new part of the line
          mode3_draw_pixel(px, py, color);
          mode3_draw_pixel(px, py + 1, color);
          mode3_draw_pixel(px + 1, py, color);
          mode3_draw_pixel(px + 1, py + 1, color);
        }
      }
    }
    wait_until_vdraw();
  }
}
```
|
||||
|
||||
Oh that's a lot more than before!
|
||||
|
||||
First we set Mode 3 and Background 2, we know about that.
|
||||
|
||||
Then we're going to store the player's x and y, along with a color value for
|
||||
their light cycle. Then we enter the core loop.
|
||||
|
||||
We read the keys for input, and then do as much as we can without touching video
|
||||
memory. Since we're using video memory as the place to store the player's light
|
||||
trail, we can't do much, we just update their position and wait for VBlank to
|
||||
start. The player will be a 2x2 square, so the arrows will move you 2 pixels per
|
||||
frame.
|
||||
|
||||
Once we're in VBlank we check to see what kind of drawing we're doing. If the
|
||||
player has gone out of bounds, we clear the screen, rotate their color, and then
|
||||
reset their position. Why rotate the color? Just because it's fun to have
|
||||
different colors.
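
It's worth seeing what the rotation actually produces. In this sketch `next_trail_color` is a made-up name for the `rotate_left(5)` call used in `main`. Each color channel is 5 bits wide, so a rotation by 5 shifts red into green and green into blue. A `u16` has 16 bits while the color only uses 15, though, so the third rotation wraps through the unused top bit and lands on a darker red rather than the original one.

```rust
pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
  blue << 10 | green << 5 | red
}

/// Advances the trail color the same way the game loop does.
pub fn next_trail_color(color: u16) -> u16 {
  color.rotate_left(5)
}
```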
|
||||
|
||||
Next, if the player is in bounds we read the video memory for their position. If
|
||||
it's not black that means we've been here before and the player has crashed into
|
||||
their own line. In this case, we reset the game without moving them to a new
|
||||
location.
|
||||
|
||||
Finally, if the player is in bounds and they haven't crashed, we write their
|
||||
color into memory at this position.
|
||||
|
||||
Regardless of how it worked out, we hold here until vdraw starts before going to
|
||||
the next loop. That's all there is to it.
|
||||
|
||||
## The gba crate doesn't quite work like this
|
||||
|
||||
Once again, as with the `hello1` and `hello2` examples, the `gba` crate covers
|
||||
much of this same ground as our example here, but in slightly different ways.
|
||||
|
||||
Better organization and abstractions are usually only realized once you've used
|
||||
more of the whole thing you're trying to work with. If we want to have a crate
|
||||
where the whole thing is well integrated with itself, then the examples would
|
||||
also end up having to explain about things we haven't really touched on much
|
||||
yet. It becomes a lot harder to teach.
|
||||
|
||||
So, going forward, we will continue to teach concepts and build examples that
|
||||
don't directly depend on the `gba` crate. This allows the crate to freely grow
|
||||
without all the past examples becoming a great inertia upon it.
|
|
@ -1,316 +0,0 @@
|
|||
# Making A Memory Game
|
||||
|
||||
For this example to show off our new skills we'll make a "memory" game. The idea
|
||||
is that there's some face down cards and you pick one, it flips, you pick a
|
||||
second, if they match they both go away, if they don't match they both turn back
|
||||
face down. The player keeps going until all the cards are gone, then we'll deal
|
||||
the cards again.
|
||||
|
||||
There are many steps to do to get such a simple seeming game going. In fact I
|
||||
stumbled a bit myself when trying to get things set up and going despite having
|
||||
written and explained all the parts so far. Accordingly, we'll take each part
|
||||
very slowly, and review things as we build up our game.
|
||||
|
||||
We'll start back with a nearly blank file, calling it `memory_game.rs`:
|
||||
|
||||
```rust
|
||||
#![feature(start)]
|
||||
#![no_std]
|
||||
|
||||
#[panic_handler]
|
||||
fn panic(_info: &core::panic::PanicInfo) -> ! {
|
||||
loop {}
|
||||
}
|
||||
|
||||
#[start]
|
||||
fn main(_argc: isize, _argv: *const *const u8) -> isize {
|
||||
loop {
|
||||
// TODO the whole thing
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Displaying A Background
|
||||
|
||||
First let's try to get a background going. We'll display a simple checker
|
||||
pattern just so that we know that we did something.
|
||||
|
||||
Remember, backgrounds have the following essential components:
|
||||
|
||||
* Background Palette
|
||||
* Background Tiles
|
||||
* Screenblock
|
||||
* IO Registers
|
||||
|
||||
### Background Palette
|
||||
|
||||
To write to the background palette memory we'll want to name a `VolatilePtr` for
|
||||
it. We'll probably also want to be able to cast between different types either
|
||||
right away or later in this program, so we'll add a method for that.
|
||||
|
||||
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);
impl<T> VolatilePtr<T> {
  pub unsafe fn read(&self) -> T {
    core::ptr::read_volatile(self.0)
  }
  pub unsafe fn write(&self, data: T) {
    core::ptr::write_volatile(self.0, data);
  }
  pub fn offset(self, count: isize) -> Self {
    VolatilePtr(self.0.wrapping_offset(count))
  }
  pub fn cast<Z>(self) -> VolatilePtr<Z> {
    VolatilePtr(self.0 as *mut Z)
  }
}
```
|
||||
|
||||
Now we give ourselves an easy way to write a color into a palbank slot.
|
||||
|
||||
```rust
pub const BACKGROUND_PALETTE: VolatilePtr<u16> = VolatilePtr(0x500_0000 as *mut u16);

pub fn set_bg_palette_4bpp(palbank: usize, slot: usize, color: u16) {
  assert!(palbank < 16);
  assert!(slot > 0 && slot < 16);
  unsafe {
    // step to the palbank (16 entries each), then to the slot within it
    BACKGROUND_PALETTE
      .cast::<[u16; 16]>()
      .offset(palbank as isize)
      .cast::<u16>()
      .offset(slot as isize)
      .write(color);
  }
}
```
|
||||
|
||||
And of course we need to bring back in our ability to build color values, as
|
||||
well as a few named colors to start us off:
|
||||
|
||||
```rust
|
||||
pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
|
||||
blue << 10 | green << 5 | red
|
||||
}
|
||||
|
||||
pub const WHITE: u16 = rgb16(31, 31, 31);
|
||||
pub const LIGHT_GRAY: u16 = rgb16(25, 25, 25);
|
||||
pub const DARK_GRAY: u16 = rgb16(15, 15, 15);
|
||||
```
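
As a sanity check on the bit layout, here is `rgb16` again next to an inverse that pulls the three 5-bit channels back out. The `unpack_rgb16` helper is an addition made for this illustration:

```rust
pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
  blue << 10 | green << 5 | red
}

/// Splits a color into `(red, green, blue)`, each in the range 0..=31.
pub const fn unpack_rgb16(color: u16) -> (u16, u16, u16) {
  (color & 0x1F, (color >> 5) & 0x1F, (color >> 10) & 0x1F)
}
```

Red sits in the lowest 5 bits and blue in the highest used bits, so `WHITE` comes out as `0x7FFF` with the top bit left unused.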
|
||||
|
||||
Which _finally_ allows us to set our palette colors in `main`:
|
||||
|
||||
```rust
|
||||
fn main(_argc: isize, _argv: *const *const u8) -> isize {
|
||||
set_bg_palette_4bpp(0, 1, WHITE);
|
||||
set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
|
||||
set_bg_palette_4bpp(0, 3, DARK_GRAY);
|
||||
```
|
||||
|
||||
### Background Tiles
|
||||
|
||||
So we'll want some light gray tiles and some dark gray tiles. We could use a
|
||||
single tile and then swap it between palbanks to do the color selection, but for
|
||||
now we'll just use two different tiles, since we've got tons of tile space to
|
||||
spare.
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Copy, Default)]
|
||||
#[repr(transparent)]
|
||||
pub struct Tile4bpp {
|
||||
pub data: [u32; 8],
|
||||
}
|
||||
|
||||
pub const ALL_TWOS: Tile4bpp = Tile4bpp {
|
||||
data: [
|
||||
0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222,
|
||||
],
|
||||
};
|
||||
|
||||
pub const ALL_THREES: Tile4bpp = Tile4bpp {
|
||||
data: [
|
||||
0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333,
|
||||
],
|
||||
};
|
||||
```
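
Those hex constants are easier to read once you know the packing: each `u32` is one row of the tile, holding eight 4-bit palette indexes, with the left-most pixel in the lowest 4 bits. A sketch of a helper that builds a row from individual pixels (the name `tile_row_4bpp` is an assumption for this example):

```rust
/// Packs eight palette indexes (left to right) into one 4bpp tile row.
/// Only the low 4 bits of each index are kept.
pub fn tile_row_4bpp(pixels: [u8; 8]) -> u32 {
  let mut row = 0u32;
  for (i, pixel) in pixels.iter().enumerate() {
    // pixel 0 (left-most) goes in bits 0..4, pixel 1 in bits 4..8, etc.
    row |= ((*pixel & 0xF) as u32) << (4 * i);
  }
  row
}
```

Because hex digits print most significant first, a row written out as a hex literal reads right to left compared to how it appears on screen.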
|
||||
|
||||
And then we have to have a way to put the tiles into video memory:
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Copy)]
|
||||
#[repr(transparent)]
|
||||
pub struct Charblock4bpp {
|
||||
pub data: [Tile4bpp; 512],
|
||||
}
|
||||
|
||||
pub const VRAM: VolatilePtr<Charblock4bpp> = VolatilePtr(0x0600_0000 as *mut Charblock4bpp);
|
||||
|
||||
pub fn set_bg_tile_4bpp(charblock: usize, index: usize, tile: Tile4bpp) {
|
||||
assert!(charblock < 4);
|
||||
assert!(index < 512);
|
||||
unsafe { VRAM.offset(charblock as isize).cast::<Tile4bpp>().offset(index as isize).write(tile) }
|
||||
}
|
||||
```
|
||||
|
||||
And finally, we can call that within `main`:
|
||||
|
||||
```rust
|
||||
fn main(_argc: isize, _argv: *const *const u8) -> isize {
|
||||
// bg palette
|
||||
set_bg_palette_4bpp(0, 1, WHITE);
|
||||
set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
|
||||
set_bg_palette_4bpp(0, 3, DARK_GRAY);
|
||||
// bg tiles
|
||||
set_bg_tile_4bpp(0, 0, ALL_TWOS);
|
||||
set_bg_tile_4bpp(0, 1, ALL_THREES);
|
||||
```
|
||||
|
||||
### Setup A Screenblock
|
||||
|
||||
Screenblocks are a little weird because they take the same space as the
|
||||
charblocks (8 screenblocks per charblock). The GBA will let you mix and match
|
||||
and it's up to you to keep it all straight. We're using tiles at the base of
|
||||
charblock 0, so we'll place our screenblock at the base of charblock 1.
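
The overlap is easy to verify with a bit of arithmetic. In this sketch the constant and function names are illustrative; the sizes follow from the types in this chapter (512 tiles of 32 bytes per charblock, 32x32 entries of 2 bytes per screenblock):

```rust
pub const VRAM_BASE: usize = 0x600_0000;
/// 512 tiles * 32 bytes each.
pub const CHARBLOCK_SIZE: usize = 512 * 32;
/// 32 * 32 entries * 2 bytes each.
pub const SCREENBLOCK_SIZE: usize = 32 * 32 * 2;

pub const fn charblock_addr(index: usize) -> usize {
  VRAM_BASE + index * CHARBLOCK_SIZE
}

pub const fn screenblock_addr(index: usize) -> usize {
  VRAM_BASE + index * SCREENBLOCK_SIZE
}
```

So "the base of charblock 1" and "screenblock 8" are two names for the same address.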
|
||||
|
||||
First, we have to be able to make one single screenblock entry at a time:
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Copy, Default)]
|
||||
#[repr(transparent)]
|
||||
pub struct RegularScreenblockEntry(u16);
|
||||
|
||||
impl RegularScreenblockEntry {
|
||||
pub const SCREENBLOCK_ENTRY_TILE_ID_MASK: u16 = 0b11_1111_1111;
|
||||
pub const fn from_tile_id(id: u16) -> Self {
|
||||
RegularScreenblockEntry(id & Self::SCREENBLOCK_ENTRY_TILE_ID_MASK)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And then with 32x32 of these things we'll have a whole screenblock. Now, we
|
||||
probably won't actually make values of the screenblock type itself, but we at
|
||||
least need it to have the type declared with the correct size so that we can
|
||||
move our pointers around by the right amount.
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Copy)]
|
||||
#[repr(transparent)]
|
||||
pub struct RegularScreenblock {
|
||||
pub data: [RegularScreenblockEntry; 32 * 32],
|
||||
}
|
||||
```
|
||||
|
||||
Alright, so, as I said those things are kinda big, we don't really want to be
|
||||
building them up on the stack if we can avoid it, so we'll write one straight
|
||||
into memory at the correct location.
|
||||
|
||||
```rust
pub fn checker_screenblock(slot: usize, a_entry: RegularScreenblockEntry, b_entry: RegularScreenblockEntry) {
  let mut p = VRAM.cast::<RegularScreenblock>().offset(slot as isize).cast::<RegularScreenblockEntry>();
  let mut checker = true;
  for _row in 0..32 {
    for _col in 0..32 {
      unsafe { p.write(if checker { a_entry } else { b_entry }) };
      p = p.offset(1);
      checker = !checker;
    }
    // flip once more so the next row starts on the opposite entry
    checker = !checker;
  }
}
```
|
||||
|
||||
And then we add this into `main`:
|
||||
|
||||
```rust
|
||||
// screenblock
|
||||
let light_entry = RegularScreenblockEntry::from_tile_id(0);
|
||||
let dark_entry = RegularScreenblockEntry::from_tile_id(1);
|
||||
checker_screenblock(8, light_entry, dark_entry);
|
||||
```
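
If you want to convince yourself that the toggling logic really gives a checkerboard before running it on the GBA, the same loop can be run on a host machine against a plain array. This is only a test-bench sketch; `checker_pattern` is a hypothetical stand-in that uses bare `u16` values instead of `RegularScreenblockEntry`:

```rust
/// Fills a 32x32 grid with the same toggling logic as the VRAM version.
pub fn checker_pattern(a: u16, b: u16) -> [u16; 32 * 32] {
  let mut out = [0u16; 32 * 32];
  let mut checker = true;
  for row in 0..32 {
    for col in 0..32 {
      out[row * 32 + col] = if checker { a } else { b };
      checker = !checker;
    }
    // 32 toggles bring us back to the row's starting state, so flip
    // once more to make the next row start on the other entry.
    checker = !checker;
  }
  out
}
```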
|
||||
|
||||
### Background IO Registers
|
||||
|
||||
Our most important step is of course the IO register step. There's four
|
||||
different background layers, but each of them has the same format for their
|
||||
control register. For the moment, all that we care about is being able to set
|
||||
the "screen base block" value.
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Copy, Default, PartialEq, Eq)]
|
||||
#[repr(transparent)]
|
||||
pub struct BackgroundControlSetting(u16);
|
||||
|
||||
impl BackgroundControlSetting {
|
||||
pub const SCREEN_BASE_BLOCK_MASK: u16 = 0b1_1111;
|
||||
pub const fn from_base_block(sbb: u16) -> Self {
|
||||
BackgroundControlSetting((sbb & Self::SCREEN_BASE_BLOCK_MASK) << 8)
|
||||
}
|
||||
}
|
||||
|
||||
pub const BG0CNT: VolatilePtr<BackgroundControlSetting> = VolatilePtr(0x400_0008 as *mut BackgroundControlSetting);
|
||||
```
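
To see what value actually lands in the register, here is a host-side copy of the type with one extra accessor, `screen_base_block`, added for this illustration. The field is made `pub` here only so the raw bits can be inspected:

```rust
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
#[repr(transparent)]
pub struct BackgroundControlSetting(pub u16);

impl BackgroundControlSetting {
  pub const SCREEN_BASE_BLOCK_MASK: u16 = 0b1_1111;

  pub const fn from_base_block(sbb: u16) -> Self {
    BackgroundControlSetting((sbb & Self::SCREEN_BASE_BLOCK_MASK) << 8)
  }

  /// Reads the screen base block back out of bits 8 through 12.
  pub const fn screen_base_block(self) -> u16 {
    (self.0 >> 8) & Self::SCREEN_BASE_BLOCK_MASK
  }
}
```

Note that the mask silently wraps out-of-range input: asking for screenblock 33 gives you screenblock 1, so an `assert!` in the constructor might be worth considering in non-`const` code.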
|
||||
|
||||
And... that's all it takes for us to be able to add a line into `main`:
|
||||
|
||||
```rust
|
||||
// bg0 control
|
||||
unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) };
|
||||
```
|
||||
|
||||
### Set The Display Control Register
|
||||
|
||||
We're finally ready to set the display control register and get things going.
|
||||
|
||||
We've slightly glossed over it so far, but when the GBA is first booted most
|
||||
everything within the address space will be all zeroed. However, the display
|
||||
control register has the "Force VBlank" bit enabled by the BIOS, giving you a
|
||||
moment to put the memory in place that you'll need for the first frame.
|
||||
|
||||
So, now that we have got all of our memory set, we'll overwrite the initial
|
||||
display control register value with what we'll call "just enable bg0".
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Copy, Default, PartialEq, Eq)]
|
||||
#[repr(transparent)]
|
||||
pub struct DisplayControlSetting(u16);
|
||||
|
||||
impl DisplayControlSetting {
|
||||
pub const JUST_ENABLE_BG0: DisplayControlSetting = DisplayControlSetting(1 << 8);
|
||||
}
|
||||
|
||||
pub const DISPCNT: VolatilePtr<DisplayControlSetting> = VolatilePtr(0x0400_0000 as *mut DisplayControlSetting);
|
||||
```
|
||||
|
||||
And so finally we have a complete `main`:
|
||||
|
||||
```rust
|
||||
#[start]
|
||||
fn main(_argc: isize, _argv: *const *const u8) -> isize {
|
||||
// bg palette
|
||||
set_bg_palette_4bpp(0, 1, WHITE);
|
||||
set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
|
||||
set_bg_palette_4bpp(0, 3, DARK_GRAY);
|
||||
// bg tiles
|
||||
set_bg_tile_4bpp(0, 0, ALL_TWOS);
|
||||
set_bg_tile_4bpp(0, 1, ALL_THREES);
|
||||
// screenblock
|
||||
let light_entry = RegularScreenblockEntry::from_tile_id(0);
|
||||
let dark_entry = RegularScreenblockEntry::from_tile_id(1);
|
||||
checker_screenblock(8, light_entry, dark_entry);
|
||||
// bg0 control
|
||||
unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) };
|
||||
// Display Control
|
||||
unsafe { DISPCNT.write(DisplayControlSetting::JUST_ENABLE_BG0) };
|
||||
loop {
|
||||
// TODO the whole thing
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And _It works, Marty! It works!_
|
||||
|
||||
![screenshot_checkers](screenshot_checkers.png)
|
||||
|
||||
We've got more to go, but we're well on our way.
|
Binary file not shown.
Before Width: | Height: | Size: 147 KiB |
|
@ -1,313 +0,0 @@
|
|||
# Regular Backgrounds
|
||||
|
||||
So, backgrounds, they're cool. Why do we call the ones here "regular"
|
||||
backgrounds? Because there's also "affine" backgrounds. However, affine math
|
||||
stuff adds a complication, so for now we'll just work with regular backgrounds.
|
||||
The non-affine backgrounds are sometimes called "text mode" backgrounds by other
|
||||
guides.
|
||||
|
||||
To get your background image working you generally need to perform all of the
|
||||
following steps, though I suppose the exact ordering is up to you.
|
||||
|
||||
## Tiled Video Modes
|
||||
|
||||
When you want regular tiled display, you must use video mode 0 or 1.
|
||||
|
||||
* Mode 0 allows for using all four BG layers (0 through 3) as regular
|
||||
backgrounds.
|
||||
* Mode 1 allows for using BG0 and BG1 as regular backgrounds, BG2 as an affine
|
||||
background, and BG3 not at all.
|
||||
* Mode 2 allows for BG2 and BG3 to be used as affine backgrounds, while BG0 and
|
||||
BG1 cannot be used at all.
|
||||
|
||||
We will not cover affine backgrounds in this chapter, so we will naturally be
|
||||
using video mode 0.
|
||||
|
||||
Also, note that you have to enable each background layer that you want to use
|
||||
within the display control register.
|
||||
|
||||
## Get Your Palette Ready
|
||||
|
||||
Background palette starts at `0x5000000` and is 256 `u16` values long. It'd
|
||||
potentially be possible to declare a static array starting at a fixed address and
|
||||
use a linker script to make sure that it ends up at the right spot in the final
|
||||
program, but since we have to use volatile reads and writes with PALRAM anyway,
|
||||
we'll just reuse our `VolatilePtr` type. Something like this:
|
||||
|
||||
```rust
|
||||
pub const PALRAM_BG_BASE: VolatilePtr<u16> = VolatilePtr(0x500_0000 as *mut u16);
|
||||
|
||||
pub fn bg_palette(slot: usize) -> u16 {
|
||||
assert!(slot < 256);
|
||||
unsafe { PALRAM_BG_BASE.offset(slot as isize).read() }
|
||||
}
|
||||
|
||||
pub fn set_bg_palette(slot: usize, color: u16) {
|
||||
assert!(slot < 256);
|
||||
unsafe { PALRAM_BG_BASE.offset(slot as isize).write(color) }
|
||||
}
|
||||
```
|
||||
|
||||
As we discussed with the tile color depths, the palette can be utilized as a
|
||||
single block of palette values (`[u16; 256]`) or as 16 palbanks of 16 palette
|
||||
values each (`[[u16;16]; 16]`). This setting is assigned per background layer
|
||||
via IO register.
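
To make the two views of the palette concrete, here is a small host-side sketch of the index math. Both function names are hypothetical, made up for this example; the only facts used are the `0x500_0000` base address and the 16-by-16 palbank layout described above.

```rust
/// Hypothetical helper: the flat palette slot (0 through 255) that
/// corresponds to entry `entry` of palbank `palbank`.
pub fn palbank_slot(palbank: usize, entry: usize) -> usize {
  assert!(palbank < 16);
  assert!(entry < 16);
  palbank * 16 + entry
}

/// Hypothetical helper: the byte address of a background palette slot.
/// Each slot is one `u16`, so slots are 2 bytes apart.
pub fn bg_palette_address(slot: usize) -> usize {
  assert!(slot < 256);
  0x500_0000 + slot * core::mem::size_of::<u16>()
}
```

So palbank 1, entry 0 is flat slot 16, which lives at `0x500_0020`.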
|
||||
|
||||
## Get Your Tiles Ready
|
||||
|
||||
Tile data is placed into charblocks. A charblock is always 16kb, so depending on
|
||||
color depth it will have either 256 or 512 tiles within that charblock.
|
||||
Charblocks 0, 1, 2, and 3 are all for background tiles. That's a maximum of 2048
|
||||
tiles for backgrounds, but as you'll see in a moment a particular tilemap entry
|
||||
can't even index that high. Instead, each background layer is assigned a
|
||||
"character base block", and then tilemap entries index relative to the character
|
||||
base block of that background layer.
|
||||
|
||||
Now, if you want to move in a lot of tile data you'll probably want to use a DMA
|
||||
routine, or at least write a function like memcopy32 for fast `u32` copying from
|
||||
ROM into VRAM. However, for now, and because we're being very explicit since
|
||||
this is our first time doing it, we'll write it as functions for individual tile
|
||||
reads and writes.
|
||||
|
||||
The math works like indexing a pointer, except that we have two sizes we need to
|
||||
go by. First you take the base address for VRAM (`0x600_0000`), then add the
|
||||
size of a charblock (16kb) times the charblock you want to place the tile
|
||||
within, and then you add the index of the tile slot you're placing it into times
|
||||
the size of that type of tile. Like this:
|
||||
|
||||
```rust
|
||||
pub fn bg_tile_4bpp(base_block: usize, tile_index: usize) -> Tile4bpp {
|
||||
assert!(base_block < 4);
|
||||
assert!(tile_index < 512);
|
||||
let address = VRAM + size_of::<Charblock4bpp>() * base_block + size_of::<Tile4bpp>() * tile_index;
|
||||
unsafe { VolatilePtr(address as *mut Tile4bpp).read() }
|
||||
}
|
||||
|
||||
pub fn set_bg_tile_4bpp(base_block: usize, tile_index: usize, tile: Tile4bpp) {
|
||||
assert!(base_block < 4);
|
||||
assert!(tile_index < 512);
|
||||
let address = VRAM + size_of::<Charblock4bpp>() * base_block + size_of::<Tile4bpp>() * tile_index;
|
||||
unsafe { VolatilePtr(address as *mut Tile4bpp).write(tile) }
|
||||
}
|
||||
|
||||
pub fn bg_tile_8bpp(base_block: usize, tile_index: usize) -> Tile8bpp {
|
||||
assert!(base_block < 4);
|
||||
assert!(tile_index < 256);
|
||||
let address = VRAM + size_of::<Charblock8bpp>() * base_block + size_of::<Tile8bpp>() * tile_index;
|
||||
unsafe { VolatilePtr(address as *mut Tile8bpp).read() }
|
||||
}
|
||||
|
||||
pub fn set_bg_tile_8bpp(base_block: usize, tile_index: usize, tile: Tile8bpp) {
|
||||
assert!(base_block < 4);
|
||||
assert!(tile_index < 256);
|
||||
let address = VRAM + size_of::<Charblock8bpp>() * base_block + size_of::<Tile8bpp>() * tile_index;
|
||||
unsafe { VolatilePtr(address as *mut Tile8bpp).write(tile) }
|
||||
}
|
||||
```
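
If you want to check the address math on your desktop before trusting it on hardware, a sketch like the following works without any GBA types. The constants and function names here are assumptions made for the example (the chapter's real code uses `size_of` on the tile and charblock types instead), but the arithmetic is the same.

```rust
const VRAM: usize = 0x600_0000;
const CHARBLOCK_BYTES: usize = 16 * 1024;
const TILE_4BPP_BYTES: usize = 32; // 8x8 pixels at 4 bits each
const TILE_8BPP_BYTES: usize = 64; // 8x8 pixels at 8 bits each

/// Hypothetical helper: address of a 4bpp background tile slot.
pub fn bg_tile_address_4bpp(base_block: usize, tile_index: usize) -> usize {
  assert!(base_block < 4);
  assert!(tile_index < CHARBLOCK_BYTES / TILE_4BPP_BYTES); // 512 slots
  VRAM + CHARBLOCK_BYTES * base_block + TILE_4BPP_BYTES * tile_index
}

/// Hypothetical helper: address of an 8bpp background tile slot.
pub fn bg_tile_address_8bpp(base_block: usize, tile_index: usize) -> usize {
  assert!(base_block < 4);
  assert!(tile_index < CHARBLOCK_BYTES / TILE_8BPP_BYTES); // 256 slots
  VRAM + CHARBLOCK_BYTES * base_block + TILE_8BPP_BYTES * tile_index
}
```

For example, 4bpp tile 1 of charblock 0 sits at `0x600_0020`, and the first tile of charblock 1 sits at `0x600_4000`.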
|
||||
|
||||
For bulk operations, you'd do the exact same math to get your base destination
|
||||
pointer, and then you'd get the base source pointer for the tile you're copying
|
||||
out of ROM, and then you'd do the bulk copy for the correct number of `u32`
|
||||
values that you're trying to move (8 per tile moved for 4bpp, or 16 per tile
|
||||
moved for 8bpp).
|
||||
|
||||
**GBA Limitation Note:** on a modern PC (eg: `x86` or `x86_64`) you're probably
|
||||
used to index based loops and iterator based loops being the same speed. The CPU
|
||||
has scaled-index addressing modes, so the base address of the array
|
||||
plus desired index * size per element is a single CPU operation to compute. It's
|
||||
slightly more complicated if there's arrays within arrays like there are here,
|
||||
but with normal arrays it's basically the same speed to index per loop cycle as
|
||||
it is to take a base address and then add +1 offset per loop cycle. However, the
|
||||
GBA's CPU _can't do any of that_. On the GBA, there's a genuine speed difference
|
||||
between looping over indexes and then indexing each loop (slow) compared to
|
||||
using an iterator that just stores an internal pointer and does +1 offset per
|
||||
loop until it reaches the end (fast). The repeated indexing can by itself
|
||||
be an expensive step. If it's like a 3 element array it's no big deal, but if
|
||||
you've got a big slice of data to process, be sure to go over it with `.iter()`
|
||||
and `.iter_mut()` if you can, instead of looping by index. This is Rust and all,
|
||||
so probably you were gonna do that anyway, but just a heads up.
|
||||
|
||||
## Get Your Tilemap Ready
|
||||
|
||||
I believe that at one point I alluded to a tilemap existing. Well, just as the
|
||||
tiles are arranged into charblocks, the data describing what tile to show in
|
||||
what location is arranged into a thing called a **screenblock**.
|
||||
|
||||
A screenblock is placed into VRAM the same as the tile data charblocks. Starting
|
||||
at the base of VRAM (`0x600_0000`) there are 32 slots for the screenblock array.
|
||||
Each screenblock is 2048 bytes (`0x800`). Naturally, if our tiles are using up
|
||||
charblock space within VRAM and our tilemaps are using up screenblock space
|
||||
within the same VRAM... well it would just be a _disaster_ if they ran into
|
||||
each other. Once again, it's up to you as the programmer to determine how much
|
||||
space you want to devote to each thing. Each complete charblock uses up 8
|
||||
screenblocks worth of space, but you don't have to fill a complete charblock
|
||||
with tiles, so you can be very fiddly with how you split the memory.
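
Here's a rough host-side sketch of how the screenblock slots line up against the charblocks. The function names are hypothetical; the numbers (`0x600_0000`, `0x800` bytes per screenblock, 8 screenblocks per charblock) come straight from the paragraph above.

```rust
const VRAM: usize = 0x600_0000;
const SCREENBLOCK_BYTES: usize = 0x800;
const SCREENBLOCKS_PER_CHARBLOCK: usize = 8;

/// Hypothetical helper: base address of screenblock `index`.
pub fn screenblock_address(index: usize) -> usize {
  assert!(index < 32);
  VRAM + SCREENBLOCK_BYTES * index
}

/// Hypothetical helper: which charblock's memory a screenblock overlaps.
pub fn charblock_of_screenblock(index: usize) -> usize {
  assert!(index < 32);
  index / SCREENBLOCKS_PER_CHARBLOCK
}
```

This is why the checkerboard example can keep its tiles in charblock 0 and its tilemap in screenblock 8: screenblock 8 starts at `0x600_4000`, which is the first byte of charblock 1.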
|
||||
|
||||
Each screenblock is composed of a series of _screenblock entry_ values, which
|
||||
describe what tile index to use and if the tile should be flipped and what
|
||||
palbank it should use (if any). Because both regular backgrounds and affine
|
||||
backgrounds are composed of screenblocks with entries, and because the affine
|
||||
background has a smaller format for screenblock entries, we'll name
|
||||
appropriately.
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Copy)]
|
||||
#[repr(transparent)]
|
||||
pub struct RegularScreenblock {
|
||||
pub data: [RegularScreenblockEntry; 32 * 32],
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, Default)]
|
||||
#[repr(transparent)]
|
||||
pub struct RegularScreenblockEntry(u16);
|
||||
```
|
||||
|
||||
So, with one entry per tile, a single screenblock allows for 32x32 tiles worth of
|
||||
background.
|
||||
|
||||
The format of a regular screenblock entry is quite simple compared to some of
|
||||
the IO register stuff:
|
||||
|
||||
* 10 bits for tile index (base off of the character base block of the background)
|
||||
* 1 bit for horizontal flip
|
||||
* 1 bit for vertical flip
|
||||
* 4 bits for picking which palbank to use (if 4bpp, otherwise it's ignored)
|
||||
|
||||
```rust
|
||||
impl RegularScreenblockEntry {
|
||||
pub fn tile_id(self) -> u16 {
|
||||
self.0 & 0b11_1111_1111
|
||||
}
|
||||
pub fn set_tile_id(&mut self, id: u16) {
|
||||
self.0 &= !0b11_1111_1111;
|
||||
self.0 |= id & 0b11_1111_1111;
|
||||
}
|
||||
pub fn horizontal_flip(self) -> bool {
|
||||
(self.0 & (1 << 0xA)) > 0
|
||||
}
|
||||
pub fn set_horizontal_flip(&mut self, bit: bool) {
|
||||
if bit {
|
||||
self.0 |= 1 << 0xA;
|
||||
} else {
|
||||
self.0 &= !(1 << 0xA);
|
||||
}
|
||||
}
|
||||
pub fn vertical_flip(self) -> bool {
|
||||
(self.0 & (1 << 0xB)) > 0
|
||||
}
|
||||
pub fn set_vertical_flip(&mut self, bit: bool) {
|
||||
if bit {
|
||||
self.0 |= 1 << 0xB;
|
||||
} else {
|
||||
self.0 &= !(1 << 0xB);
|
||||
}
|
||||
}
|
||||
pub fn palbank_index(self) -> u16 {
|
||||
self.0 >> 12
|
||||
}
|
||||
pub fn set_palbank_index(&mut self, palbank_index: u16) {
|
||||
self.0 &= 0b1111_1111_1111;
|
||||
self.0 |= palbank_index << 12;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Now, at either 256 or 512 tiles per charblock, you might be thinking that with a
|
||||
10 bit index you can index past the end of one charblock and into the next.
|
||||
You'd be right, mostly.
|
||||
|
||||
As long as you stay within the background memory region for charblocks (that is,
|
||||
0 through 3), then it all works out. However, if you try to get the background
|
||||
rendering to reach outside of the background charblocks you'll get an
|
||||
implementation defined result. It's not the dreaded "undefined behavior" we're
|
||||
often worried about in programming, but the results _are_ determined by what
|
||||
you're running the game on. With GBA hardware you get a bizarre result
|
||||
(basically another way to put garbage on the screen). With a DS it acts as if
|
||||
the tiles were all 0s. If you use an emulator it might or might not allow for
|
||||
you to do this, it's up to the emulator writers.
|
||||
|
||||
## Set Your IO Registers
|
||||
|
||||
Instead of being just a single IO register to learn about this time, there's two
|
||||
separate groups of related registers.
|
||||
|
||||
### Background Control
|
||||
|
||||
* BG0CNT (`0x400_0008`): BG0 Control
|
||||
* BG1CNT (`0x400_000A`): BG1 Control
|
||||
* BG2CNT (`0x400_000C`): BG2 Control
|
||||
* BG3CNT (`0x400_000E`): BG3 Control
|
||||
|
||||
Each of these is a read/write `u16` location. This is where we get to all of
|
||||
the important details that we've been putting off.
|
||||
|
||||
* 2 bits for the priority.
|
||||
* 2 bits for "character base block", the charblock that all of the tile indexes
|
||||
for this background are offset from.
|
||||
* 2 bits that are unused.
* 1 bit for mosaic effect being enabled (we'll get to that below).
|
||||
* 1 bit to enable 8bpp, otherwise 4bpp is used.
|
||||
* 5 bits to pick the "screen base block", the screen block that serves as the
|
||||
_base_ value for this background.
|
||||
* 1 bit that is _not_ used in regular mode, but in affine mode it can be enabled
|
||||
to cause the affine background to wrap around at the edges.
|
||||
* 2 bits for the background size.
|
||||
|
||||
The size works a little funny. When size is 0 only the base screen block is
|
||||
used. If size is 1 or 2 then the base screenblock and the following screenblock
|
||||
are placed next to each other (horizontally for 1, vertically for 2). If the
|
||||
size is 3 then the base screenblock and the following three screenblocks are
|
||||
arranged into a 2x2 grid of screenblocks.
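
To show how the size setting changes where a given tile's entry lives, here is a sketch that maps a tile coordinate within the whole background to a screenblock (as an offset from the screen base block) and an entry index inside it. The function name is hypothetical, and the 2x2 case assumes the usual row-major order (top-left, top-right, bottom-left, bottom-right).

```rust
/// Hypothetical helper. `size` is the 2-bit background size value and
/// `tx`/`ty` are tile coordinates within the whole background.
/// Returns (screenblock offset from the screen base block, entry index).
pub fn locate_entry(size: u16, tx: usize, ty: usize) -> (usize, usize) {
  // How many screenblocks wide and tall the background is.
  let (width_blocks, height_blocks) = match size {
    0 => (1, 1),
    1 => (2, 1),
    2 => (1, 2),
    3 => (2, 2),
    _ => panic!("size is only 2 bits"),
  };
  assert!(tx < 32 * width_blocks);
  assert!(ty < 32 * height_blocks);
  // Pick the screenblock, then the entry within that 32x32 screenblock.
  let block = (ty / 32) * width_blocks + (tx / 32);
  let entry = (ty % 32) * 32 + (tx % 32);
  (block, entry)
}
```

With size 3, tile (33, 33) ends up as entry 33 of the fourth screenblock (offset 3), since it's one tile in from the corner of the bottom-right block.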
|
||||
|
||||
### Background Offset
|
||||
|
||||
* BG0HOFS (`0x400_0010`): BG0 X-Offset
|
||||
* BG0VOFS (`0x400_0012`): BG0 Y-Offset
|
||||
* BG1HOFS (`0x400_0014`): BG1 X-Offset
|
||||
* BG1VOFS (`0x400_0016`): BG1 Y-Offset
|
||||
* BG2HOFS (`0x400_0018`): BG2 X-Offset
|
||||
* BG2VOFS (`0x400_001A`): BG2 Y-Offset
|
||||
* BG3HOFS (`0x400_001C`): BG3 X-Offset
|
||||
* BG3VOFS (`0x400_001E`): BG3 Y-Offset
|
||||
|
||||
Each of these is a _write only_ `u16` location. Bits 0 through 8 are used, so
|
||||
the offsets can be 0 through 511. They also only apply in regular backgrounds.
|
||||
If a background is in an affine state then you'll use different IO registers to
|
||||
control it (discussed in a later chapter).
|
||||
|
||||
The offset that you assign determines the pixel offset of the display area
|
||||
relative to the start of the background scene, as if the screen was a camera
|
||||
looking at the scene. In other words, as a BG X offset value increases, you can
|
||||
think of it as the camera moving to the right, or as that background moving to
|
||||
the left, like when Mario walks toward the goal. Similarly, when a BG Y offset
|
||||
increases the camera is moving down, or the background is moving up, like when
|
||||
Mario falls down from a high platform.
|
||||
|
||||
Depending on how much the background is scrolled and the size of the background,
|
||||
it will loop.
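
The looping can be sketched with plain integer math. Both helpers below are hypothetical and run on the host; they assume a background that is 256 or 512 pixels wide (the same idea applies vertically) and only model which background column ends up under a given screen column.

```rust
/// Hypothetical helper: the value that actually takes effect in an offset
/// register, since only bits 0 through 8 are used.
pub fn offset_register_bits(offset: u16) -> u16 {
  offset & 0x1FF
}

/// Hypothetical helper: which background pixel column is shown at screen
/// column `screen_x`, for a background that is `bg_width` pixels wide.
pub fn background_column(screen_x: u16, hofs: u16, bg_width: u16) -> u16 {
  assert!(bg_width == 256 || bg_width == 512);
  (screen_x + offset_register_bits(hofs)) % bg_width
}
```

So with an X offset of 250 on a 256 pixel wide background, screen column 10 shows background column 4: the scene has wrapped around. On a 512 pixel wide background the same offset shows column 260 instead.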
|
||||
|
||||
## Mosaic
|
||||
|
||||
As a special effect, you can apply mosaic to backgrounds and objects. It's just
|
||||
a single flag for each background, so all backgrounds will use the same mosaic
|
||||
settings when they have it enabled. What it actually does is split the normal
|
||||
image into "blocks" and then each block gets the color of the top left pixel of
|
||||
that block. This is the effect you see when Link hits an electric foe with his
|
||||
sword and the whole screen "buzzes" at you.
|
||||
|
||||
The mosaic control is a _write only_ `u16` IO register at `0x400_004C`.
|
||||
|
||||
There's 4 bits each for:
|
||||
|
||||
* Horizontal BG stretch
|
||||
* Vertical BG stretch
|
||||
* Horizontal object stretch
|
||||
* Vertical object stretch
|
||||
|
||||
The inputs should be 1 _less_ than the desired block size. So if you set a
|
||||
stretch value of 5 then pixels 0-5 would be part of the first block (6 pixels),
|
||||
then 6-11 is the next block (another 6 pixels) and so on.
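
Here's a small sketch of that arithmetic. The helper names are hypothetical, and the packing assumes the four 4-bit fields sit in the register from low bits to high bits in the order listed above (BG horizontal, BG vertical, object horizontal, object vertical).

```rust
/// Hypothetical helper: pack the four 4-bit stretch values into the `u16`
/// that would be written to the mosaic register.
pub fn mosaic_bits(bg_h: u16, bg_v: u16, obj_h: u16, obj_v: u16) -> u16 {
  (bg_h & 0xF) | ((bg_v & 0xF) << 4) | ((obj_h & 0xF) << 8) | ((obj_v & 0xF) << 12)
}

/// Hypothetical helper: for a pixel coordinate along one axis, the
/// coordinate of the pixel whose color its mosaic block takes.
pub fn mosaic_source(coord: u16, stretch: u16) -> u16 {
  let block_size = (stretch & 0xF) + 1; // stretch is one less than the size
  coord - coord % block_size
}
```

With a stretch of 5, pixels 0 through 5 all take the color of pixel 0, and pixels 6 through 11 all take the color of pixel 6.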
|
||||
|
||||
If you need to make a pixel other than the top left part of each block the one
|
||||
that determines the mosaic color you can carefully offset the background or
|
||||
image by a tiny bit, but of course that makes every mosaic block change its
|
||||
target pixel. You can't change the target pixel on a block by block basis.
|
|
@ -1,417 +0,0 @@
|
|||
# Regular Objects
|
||||
|
||||
As with backgrounds, objects can be used in both an affine and non-affine way.
|
||||
For this section we'll focus on the non-affine elements, and then we'll do all
|
||||
the affine stuff in a later chapter.
|
||||
|
||||
## Objects vs Sprites
|
||||
|
||||
As [TONC](https://www.coranac.com/tonc/text/regobj.htm) helpfully reminds us
|
||||
(and then proceeds to not follow its own advice), we should always try to think
|
||||
in terms of _objects_, not _sprites_. A sprite is a logical / software concern,
|
||||
perhaps a player concern, whereas an object is a hardware concern.
|
||||
|
||||
What's more, a given sprite that the player sees might need more than one object
|
||||
to display. Objects must be either square or rectangular (so sprite bits that
|
||||
stick out probably call for a second object), and can only be from 8x8 to 64x64
|
||||
(so anything bigger has to be two objects lined up to appear as one).
|
||||
|
||||
## General Object Info
|
||||
|
||||
Unlike with backgrounds, you can enable the object layer in any video mode.
|
||||
There's space for 128 object definitions in OAM.
|
||||
|
||||
The display gets a number of cycles per scanline to process objects: 1210 by
|
||||
default, but only 954 if you enable the "HBlank interval free" setting in the
|
||||
display control register. The [cycle cost per
|
||||
object](http://problemkaputt.de/gbatek.htm#lcdobjoverview) depends on the
|
||||
object's size and if it's using affine or regular mode, so enabling the HBlank
|
||||
interval free setting doesn't cut the number of objects displayable by an exact
|
||||
number of objects. The objects are processed in order of their definitions and
|
||||
if you run out of cycles then the rest just don't get shown. If there's a
|
||||
concern that you might run out of cycles you can place important objects (such
|
||||
as the player) at the start of the list and then less important animation
|
||||
objects later on.
|
||||
|
||||
## Ready the Palette
|
||||
|
||||
Objects use the palette the same as the background does. The only difference is
|
||||
that the palette data for objects starts at `0x500_0200`.
|
||||
|
||||
```rust
|
||||
pub const PALRAM_OBJECT_BASE: VolatilePtr<u16> = VolatilePtr(0x500_0200 as *mut u16);
|
||||
|
||||
pub fn object_palette(slot: usize) -> u16 {
|
||||
assert!(slot < 256);
|
||||
unsafe { PALRAM_OBJECT_BASE.offset(slot as isize).read() }
|
||||
}
|
||||
|
||||
pub fn set_object_palette(slot: usize, color: u16) {
|
||||
assert!(slot < 256);
|
||||
unsafe { PALRAM_OBJECT_BASE.offset(slot as isize).write(color) }
|
||||
}
|
||||
```
|
||||
|
||||
## Ready the Tiles
|
||||
|
||||
Objects, as with backgrounds, are composed of 8x8 tiles, and if you want
|
||||
something bigger than 8x8 you have to use more than one tile put together.
|
||||
Object tiles go into the final two charblocks of VRAM (indexes 4 and 5). Because
|
||||
there's only two of them, they are sometimes called the lower block
|
||||
(`0x601_0000`) and the higher/upper block (`0x601_4000`).
|
||||
|
||||
Tile indexes for sprites always offset from the base of the lower block, and
|
||||
they always go 32 bytes at a time, regardless of if the object is set for 4bpp
|
||||
or 8bpp. From this we can determine that there's 512 tile slots in each of the
|
||||
two object charblocks. However, in video modes 3, 4, and 5 the space for the
|
||||
background cuts into the lower charblock, so you can only safely use the upper
|
||||
charblock.
|
||||
|
||||
```rust
|
||||
pub fn obj_tile_4bpp(tile_index: usize) -> Tile4bpp {
|
||||
assert!(tile_index < 512);
|
||||
let address = VRAM + size_of::<Charblock4bpp>() * 4 + 32 * tile_index;
|
||||
unsafe { VolatilePtr(address as *mut Tile4bpp).read() }
|
||||
}
|
||||
|
||||
pub fn set_obj_tile_4bpp(tile_index: usize, tile: Tile4bpp) {
|
||||
assert!(tile_index < 512);
|
||||
let address = VRAM + size_of::<Charblock4bpp>() * 4 + 32 * tile_index;
|
||||
unsafe { VolatilePtr(address as *mut Tile4bpp).write(tile) }
|
||||
}
|
||||
|
||||
pub fn obj_tile_8bpp(tile_index: usize) -> Tile8bpp {
|
||||
assert!(tile_index < 512);
|
||||
let address = VRAM + size_of::<Charblock8bpp>() * 4 + 32 * tile_index;
|
||||
unsafe { VolatilePtr(address as *mut Tile8bpp).read() }
|
||||
}
|
||||
|
||||
pub fn set_obj_tile_8bpp(tile_index: usize, tile: Tile8bpp) {
|
||||
assert!(tile_index < 512);
|
||||
let address = VRAM + size_of::<Charblock8bpp>() * 4 + 32 * tile_index;
|
||||
unsafe { VolatilePtr(address as *mut Tile8bpp).write(tile) }
|
||||
}
|
||||
```
|
||||
|
||||
With backgrounds you picked every single tile individually with a bunch of
|
||||
screen entry values. Objects don't do that at all. Instead you pick a base tile,
|
||||
size, and shape, then it figures out the rest from there. However, you may
|
||||
recall back with the display control register something about an "object memory
|
||||
1d" bit. This is where that comes into play.
|
||||
|
||||
* If object memory is set to be 2d (the default) then the object tile region is treated
|
||||
as 32 tiles by 32 tiles square. Each object has a base tile and dimensions,
|
||||
and that just extracts directly from the charblock picture as if you were
|
||||
selecting an area. This mode probably makes for the easiest image editing.
|
||||
* If object memory is set to be 1d then the tiles are loaded sequentially from
|
||||
the starting point, enough to fill in the object's dimensions. This most
|
||||
probably makes it the easiest to program with, since programming
|
||||
languages are pretty good at 1d things.
|
||||
|
||||
I'm not sure I explained that well, here's a picture:
|
||||
|
||||
![2d1d-diagram](obj_memory_2d1d.jpg)
|
||||
|
||||
In 2d mode, a new row of tiles starts every 32 tile indexes.
|
||||
|
||||
Of course, the mode that you actually end up using is not particularly
|
||||
important, since it should be the job of your image conversion routine to get
|
||||
everything all lined up and into place anyway.
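
If the picture doesn't do it for you, here's the same idea as index math. This is a sketch for 4bpp objects only, with hypothetical function names; `tx` and `ty` are the tile column and row _within the object_. An 8bpp tile takes up two 32-byte index slots, so the numbers would need adjusting there.

```rust
/// Hypothetical helper: tile index used for tile (`tx`, `ty`) of an object
/// in 2d mapping, where a new row of tiles starts every 32 tile indexes.
pub fn object_tile_2d(base_tile: usize, tx: usize, ty: usize) -> usize {
  base_tile + ty * 32 + tx
}

/// Hypothetical helper: tile index used for tile (`tx`, `ty`) of an object
/// in 1d mapping, where the object's tiles are stored one after another.
pub fn object_tile_1d(base_tile: usize, width_tiles: usize, tx: usize, ty: usize) -> usize {
  assert!(tx < width_tiles);
  base_tile + ty * width_tiles + tx
}
```

For a 16x16 object (2 tiles wide) with base tile 4, the bottom-right tile is index 37 in 2d mode but index 7 in 1d mode.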
|
||||
|
||||
## Set the Object Attributes
|
||||
|
||||
The final step is to assign the correct attributes to an object. Each object has
|
||||
three `u16` values that make up its overall attributes.
|
||||
|
||||
Before we go into the details, I want to bring up that the hardware will attempt
|
||||
to process every single object every single frame if the object layer is
|
||||
enabled, and also that all of the GBA's object memory is cleared to 0 at
|
||||
startup. Why do these two things matter right now? As you'll see in a second an
|
||||
"all zero" set of object attributes causes an 8x8 object to appear at 0,0 using
|
||||
object tile index 0. This is usually _not_ what you want your unused objects to
|
||||
do. When your game first starts you should take a moment to mark any objects you
|
||||
won't be using as objects to not render.
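
One way to do that, sketched on the host side: build a shadow copy of all 128 attribute triples with the rendering field set to "Disabled" (value 2 in bits 8 and 9 of `attr0`, as listed just below), and copy it into OAM at startup. The constant and function names here are hypothetical, made up for this example.

```rust
/// `attr0` value with the rendering field (bits 8 and 9) set to 2, which
/// is "Disabled". Every other field is left at zero.
pub const ATTR0_DISABLED: u16 = 2 << 8;

/// Hypothetical helper: a shadow copy of OAM (128 objects, three `u16`
/// attributes each) with every object marked as not rendered.
pub fn hidden_oam_shadow() -> [[u16; 3]; 128] {
  [[ATTR0_DISABLED, 0, 0]; 128]
}
```

Each triple would then be written out to its slot, and any object you actually want on screen gets real attributes afterwards.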
|
||||
|
||||
### ObjectAttributes.attr0
|
||||
|
||||
* 8 bits for row coordinate (marks the top of the sprite)
|
||||
* 2 bits for object rendering: 0 = Normal, 1 = Affine, 2 = Disabled, 3 = Affine with double rendering area
|
||||
* 2 bits for object mode: 0 = Normal, 1 = Alpha Blending, 2 = Object Window, 3 = Forbidden
|
||||
* 1 bit for mosaic enabled
|
||||
* 1 bit 8bpp color enabled
|
||||
* 2 bits for shape: 0 = Square, 1 = Horizontal, 2 = Vertical, 3 = Forbidden
|
||||
|
||||
If an object is 128 pixels big at Y > 128 you'll get a strange looking result
|
||||
where it acts like Y > -128 and then displays partly off screen to the top.
|
||||
|
||||
### ObjectAttributes.attr1
|
||||
|
||||
* 9 bits for column coordinate (marks the left of the sprite)
|
||||
* Either:
|
||||
* 3 empty bits, 1 bit for horizontal flip, 1 bit for vertical flip (non-affine)
|
||||
* 5 bits for affine index (affine)
|
||||
* 2 bits for size.
|
||||
|
||||
| Size | Square | Horizontal | Vertical|
|
||||
|:----:|:------:|:----------:|:-------:|
|
||||
| 0 | 8x8 | 16x8 | 8x16 |
|
||||
| 1 | 16x16 | 32x8 | 8x32 |
|
||||
| 2 | 32x32 | 32x16 | 16x32 |
|
||||
| 3 | 64x64 | 64x32 | 32x64 |
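
The table translates directly into a lookup. This is just a sketch: the `Shape` enum and `object_dimensions` function are hypothetical names for this example, and the result is (width, height) in pixels.

```rust
/// Hypothetical stand-in for the 2-bit shape field of `attr0`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Shape {
  Square,
  Horizontal,
  Vertical,
}

/// Hypothetical helper: (width, height) in pixels for a shape and a 2-bit
/// size value, following the table above.
pub fn object_dimensions(shape: Shape, size: u16) -> (u16, u16) {
  match (shape, size & 0b11) {
    (Shape::Square, 0) => (8, 8),
    (Shape::Square, 1) => (16, 16),
    (Shape::Square, 2) => (32, 32),
    (Shape::Square, 3) => (64, 64),
    (Shape::Horizontal, 0) => (16, 8),
    (Shape::Horizontal, 1) => (32, 8),
    (Shape::Horizontal, 2) => (32, 16),
    (Shape::Horizontal, 3) => (64, 32),
    (Shape::Vertical, 0) => (8, 16),
    (Shape::Vertical, 1) => (8, 32),
    (Shape::Vertical, 2) => (16, 32),
    (Shape::Vertical, 3) => (32, 64),
    // `size & 0b11` can only be 0 through 3, so nothing else gets here.
    _ => unreachable!(),
  }
}
```

Dividing each dimension by 8 gives the object's size in tiles, which is what you need when working out how many tiles to load for it.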
|
||||
|
||||
### ObjectAttributes.attr2
|
||||
|
||||
* 10 bits for the base tile index
|
||||
* 2 bits for priority
|
||||
* 4 bits for the palbank index (4bpp mode only, ignored in 8bpp)
|
||||
|
||||
### ObjectAttributes summary
|
||||
|
||||
So I said in the GBA memory mapping section that C people would tell you that
|
||||
the object attributes should look like this:
|
||||
|
||||
```rust
|
||||
#[repr(C)]
|
||||
pub struct ObjectAttributes {
|
||||
attr0: u16,
|
||||
attr1: u16,
|
||||
attr2: u16,
|
||||
filler: i16,
|
||||
}
|
||||
```
|
||||
|
||||
Except that:
|
||||
|
||||
1) It's wasteful when we store object attributes on their own outside of OAM
|
||||
(which we definitely might want to do).
|
||||
2) In Rust we can't access just one field through a volatile pointer (our
|
||||
pointers aren't actually volatile to begin with, just the ops we do with them
|
||||
are). We have to read or write the whole pointer's value at a time.
|
||||
Similarly, we can't do things like `|=` and `&=` with volatile in Rust. So in
|
||||
Rust we can't have a volatile pointer to an `ObjectAttributes` and then write
|
||||
to just the three "real" values and not touch the filler field. Having the
|
||||
filler value in there just means we have to dance around it more, not less.
|
||||
3) We want to newtype this whole thing to prevent accidental invalid states from
|
||||
being written into memory.
|
||||
|
||||
So we will not be using that representation. At the same time we want to have no
|
||||
overhead, so we will stick to three `u16` values. We could newtype each
|
||||
individual field to be its own type (`ObjectAttributesAttr0` or something silly
|
||||
like that), since there aren't actual dependencies between two different fields
|
||||
such that a change in one can throw another into a forbidden state. The worst
|
||||
that can happen is if we disable or enable affine mode (`attr0`) it can change
|
||||
the meaning of `attr1`. The changed meaning isn't actually an invalid state
|
||||
though, so we _could_ make each field its own type if we wanted.
|
||||
|
||||
However, when you think about it, I can't imagine a common situation where we do
|
||||
something like make an `attr0` value that we then want to save on its own and
|
||||
apply to several different `ObjectAttributes` that we make during a game. That
|
||||
just doesn't sound likely to me. So, we'll go the route where `ObjectAttributes`
|
||||
is just a big black box to the outside world and we don't need to think about
|
||||
the three fields internally as being separate.
|
||||
|
||||
First we make it so that we can get and set object attributes from memory:
|
||||
|
||||
```rust
|
||||
pub const OAM: usize = 0x700_0000;
|
||||
|
||||
pub fn object_attributes(slot: usize) -> ObjectAttributes {
|
||||
assert!(slot < 128);
|
||||
let ptr = VolatilePtr((OAM + slot * (size_of::<u16>() * 4)) as *mut u16);
|
||||
unsafe {
|
||||
ObjectAttributes {
|
||||
attr0: ptr.read(),
|
||||
attr1: ptr.offset(1).read(),
|
||||
attr2: ptr.offset(2).read(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub fn set_object_attributes(slot: usize, obj: ObjectAttributes) {
|
||||
assert!(slot < 128);
|
||||
let ptr = VolatilePtr((OAM + slot * (size_of::<u16>() * 4)) as *mut u16);
|
||||
unsafe {
|
||||
ptr.write(obj.attr0);
|
||||
ptr.offset(1).write(obj.attr1);
|
||||
ptr.offset(2).write(obj.attr2);
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, Default)]
|
||||
pub struct ObjectAttributes {
|
||||
attr0: u16,
|
||||
attr1: u16,
|
||||
attr2: u16,
|
||||
}
|
||||
```
|
||||
|
||||
Then we add a billion methods to the `ObjectAttributes` type so that we can
|
||||
actually set all the different values that we want to set.
|
||||
|
||||
This code block is the last thing on this page so if you don't wanna scroll past
|
||||
the whole thing you can just go to the next page.
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub enum ObjectRenderMode {
|
||||
Normal,
|
||||
Affine,
|
||||
Disabled,
|
||||
DoubleAreaAffine,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub enum ObjectMode {
|
||||
Normal,
|
||||
AlphaBlending,
|
||||
ObjectWindow,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub enum ObjectShape {
|
||||
Square,
|
||||
Horizontal,
|
||||
Vertical,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub enum ObjectOrientation {
|
||||
Normal,
|
||||
HFlip,
|
||||
VFlip,
|
||||
BothFlip,
|
||||
Affine(u8),
|
||||
}
|
||||
|
||||
impl ObjectAttributes {
|
||||
pub fn row(&self) -> u16 {
|
||||
self.attr0 & 0b1111_1111
|
||||
}
|
||||
pub fn column(&self) -> u16 {
|
||||
self.attr1 & 0b1_1111_1111
|
||||
}
|
||||
pub fn rendering(&self) -> ObjectRenderMode {
|
||||
match (self.attr0 >> 8) & 0b11 {
|
||||
0 => ObjectRenderMode::Normal,
|
||||
1 => ObjectRenderMode::Affine,
|
||||
2 => ObjectRenderMode::Disabled,
|
||||
3 => ObjectRenderMode::DoubleAreaAffine,
|
||||
_ => unreachable!(),
|
||||
}
|
||||
}
|
||||
pub fn mode(&self) -> ObjectMode {
|
||||
match (self.attr0 >> 0xA) & 0b11 {
|
||||
0 => ObjectMode::Normal,
|
||||
1 => ObjectMode::AlphaBlending,
|
||||
2 => ObjectMode::ObjectWindow,
|
||||
_ => unimplemented!(),
|
||||
}
|
||||
}
|
||||
pub fn mosaic(&self) -> bool {
|
||||
((self.attr0 << 3) as i16) < 0
|
||||
}
|
||||
pub fn two_fifty_six_colors(&self) -> bool {
|
||||
((self.attr0 << 2) as i16) < 0
|
||||
}
|
||||
pub fn shape(&self) -> ObjectShape {
|
||||
match (self.attr0 >> 0xE) & 0b11 {
|
||||
0 => ObjectShape::Square,
|
||||
1 => ObjectShape::Horizontal,
|
||||
2 => ObjectShape::Vertical,
|
||||
_ => unimplemented!(),
|
||||
}
|
||||
}
|
||||
pub fn orientation(&self) -> ObjectOrientation {
|
||||
if (self.attr0 >> 8) & 1 > 0 {
|
||||
ObjectOrientation::Affine((self.attr1 >> 9) as u8 & 0b1_1111)
|
||||
} else {
|
||||
match (self.attr1 >> 0xC) & 0b11 {
|
||||
0 => ObjectOrientation::Normal,
|
||||
1 => ObjectOrientation::HFlip,
|
||||
2 => ObjectOrientation::VFlip,
|
||||
3 => ObjectOrientation::BothFlip,
|
||||
_ => unreachable!(),
|
||||
}
|
||||
}
|
||||
}
|
||||
pub fn size(&self) -> u16 {
|
||||
self.attr1 >> 0xE
|
||||
}
|
||||
pub fn tile_index(&self) -> u16 {
|
||||
self.attr2 & 0b11_1111_1111
|
||||
}
|
||||
pub fn priority(&self) -> u16 {
|
||||
(self.attr2 >> 0xA) & 0b11
|
||||
}
|
||||
pub fn palbank(&self) -> u16 {
|
||||
self.attr2 >> 0xC
|
||||
}
|
||||
//
|
||||
pub fn set_row(&mut self, row: u16) {
|
||||
self.attr0 &= !0b1111_1111;
|
||||
self.attr0 |= row & 0b1111_1111;
|
||||
}
|
||||
pub fn set_column(&mut self, col: u16) {
|
||||
self.attr1 &= !0b1_1111_1111;
|
||||
self.attr1 |= col & 0b1_1111_1111;
|
||||
}
|
||||
pub fn set_rendering(&mut self, rendering: ObjectRenderMode) {
|
||||
const RENDERING_MASK: u16 = 0b11 << 8;
|
||||
self.attr0 &= !RENDERING_MASK;
|
||||
self.attr0 |= (rendering as u16) << 8;
|
||||
}
|
||||
pub fn set_mode(&mut self, mode: ObjectMode) {
|
||||
const MODE_MASK: u16 = 0b11 << 0xA;
|
||||
self.attr0 &= !MODE_MASK;
|
||||
self.attr0 |= (mode as u16) << 0xA;
|
||||
}
|
||||
pub fn set_mosaic(&mut self, bit: bool) {
|
||||
const MOSAIC_BIT: u16 = 1 << 0xC;
|
||||
if bit {
|
||||
self.attr0 |= MOSAIC_BIT
|
||||
} else {
|
||||
self.attr0 &= !MOSAIC_BIT
|
||||
}
|
||||
}
|
||||
pub fn set_two_fifty_six_colors(&mut self, bit: bool) {
|
||||
const COLOR_MODE_BIT: u16 = 1 << 0xD;
|
||||
if bit {
|
||||
self.attr0 |= COLOR_MODE_BIT
|
||||
} else {
|
||||
self.attr0 &= !COLOR_MODE_BIT
|
||||
}
|
||||
}
|
||||
pub fn set_shape(&mut self, shape: ObjectShape) {
|
||||
self.attr0 &= 0b0011_1111_1111_1111;
|
||||
self.attr0 |= (shape as u16) << 0xE;
|
||||
}
|
||||
pub fn set_orientation(&mut self, orientation: ObjectOrientation) {
|
||||
const AFFINE_INDEX_MASK: u16 = 0b1_1111 << 9;
|
||||
self.attr1 &= !AFFINE_INDEX_MASK;
|
||||
let bits = match orientation {
|
||||
ObjectOrientation::Affine(index) => (index as u16) << 9,
|
||||
ObjectOrientation::Normal => 0,
|
||||
ObjectOrientation::HFlip => 1 << 0xC,
|
||||
ObjectOrientation::VFlip => 1 << 0xD,
|
||||
ObjectOrientation::BothFlip => 0b11 << 0xC,
|
||||
};
|
||||
self.attr1 |= bits;
|
||||
}
|
||||
pub fn set_size(&mut self, size: u16) {
|
||||
self.attr1 &= 0b0011_1111_1111_1111;
|
||||
self.attr1 |= size << 14;
|
||||
}
|
||||
pub fn set_tile_index(&mut self, index: u16) {
|
||||
self.attr2 &= !0b11_1111_1111;
|
||||
self.attr2 |= 0b11_1111_1111 & index;
|
||||
}
|
||||
pub fn set_priority(&mut self, priority: u16) {
|
||||
self.attr2 &= !0b0000_1100_0000_0000;
|
||||
self.attr2 |= (priority & 0b11) << 0xA;
|
||||
}
|
||||
pub fn set_palbank(&mut self, palbank: u16) {
|
||||
self.attr2 &= !0b1111_0000_0000_0000;
|
||||
self.attr2 |= (palbank & 0b1111) << 0xC;
|
||||
}
|
||||
}
|
||||
```
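
The setters above all follow the same "clear the field, then OR in the new bits" pattern. As a minimal sketch of that pattern on its own (the `set_field` name and signature are made up for illustration, they're not part of the crate), it might look like this:

```rust
/// Replaces `width` bits of `value`, starting at bit `shift`, with `field`.
///
/// Hypothetical helper: assumes `width < 16` and `shift + width <= 16`.
pub fn set_field(value: u16, shift: u32, width: u32, field: u16) -> u16 {
  // Build a mask covering only the target field.
  let mask = ((1u16 << width) - 1) << shift;
  // Clear the old field, then OR in the new bits (masked so they can't spill
  // into neighboring fields).
  (value & !mask) | ((field << shift) & mask)
}
```

Masking the incoming value as well as the stored one means an out-of-range argument can't corrupt the adjacent fields.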
|
Binary file not shown.
Before Width: | Height: | Size: 5.4 KiB |
|
@ -1,109 +0,0 @@
|
|||
# The Display Control Register
|
||||
|
||||
The display control register is our first actual IO Register. GBATEK gives it the
|
||||
shorthand [DISPCNT](http://problemkaputt.de/gbatek.htm#lcdiodisplaycontrol), so
|
||||
you might see it under that name if you read other guides.
|
||||
|
||||
Among IO Registers, it's one of the simpler ones, but it's got enough complexity
|
||||
that we can get a hint of what's to come.
|
||||
|
||||
Also it's the one that you basically always need to set at least once in every
|
||||
GBA game, so it's a good starting one to go over for that reason too.
|
||||
|
||||
The display control register holds a `u16` value, and is located at `0x0400_0000`.
|
||||
|
||||
Many of the bits here won't mean much to you right now. **That is fine.** You do
|
||||
NOT need to memorize them all or what they all do right away. We'll just skim
|
||||
over all the parts of this register to start, and then we'll go into more detail
|
||||
in later chapters when we need to come back and use more of the bits.
|
||||
|
||||
## Video Modes
|
||||
|
||||
The lowest three bits (0-2) let you select from among the GBA's six video modes.
|
||||
You'll notice that 3 bits allows for eight modes, but the values 6 and 7 are
|
||||
prohibited.
|
||||
|
||||
Modes 0, 1, and 2 are "tiled" modes. These are actually the modes that you
|
||||
should eventually learn to use as much as possible. It lets the GBA's limited
|
||||
video hardware do as much of the work as possible, leaving more of your CPU time
|
||||
for gameplay computations. However, they're also complex enough to deserve their
|
||||
own demos and chapters later on, so that's all we'll say about them for now.
|
||||
|
||||
Modes 3, 4, and 5 are "bitmap" modes. These let you write individual pixels to
|
||||
locations on the screen.
|
||||
|
||||
* **Mode 3** is full resolution (240w x 160h) RGB15 color. You might not be used
|
||||
to RGB15, since modern computers have 24 or 32 bit colors. In RGB15, there's 5
|
||||
bits for each color channel stored within a `u16` value, and the highest bit is
|
||||
simply ignored.
|
||||
* **Mode 4** is full resolution paletted color. Instead of being a `u16` color, each
|
||||
pixel value is a `u8` palette index entry, and then the display uses the
|
||||
palette memory (which we'll talk about later) to look up the actual color data.
|
||||
Since each pixel is half sized, we can fit twice as many. This lets us have
|
||||
two "pages". At any given moment only one page is active, and you can draw to
|
||||
the other page without the user noticing. You set which page to show with
|
||||
another bit we'll get to in a moment.
|
||||
* **Mode 5** is full color, but also with pages. This means that we must have a
|
||||
reduced resolution to compensate (video memory is only so big!). The screen is
|
||||
effectively only 160w x 128h in this mode.
|
||||
|
||||
## CGB Mode
|
||||
|
||||
Bit 3 is effectively read only. Technically it can be flipped using a BIOS call,
|
||||
but when you write to the display control register normally it won't write to
|
||||
this bit, so we'll call it effectively read only.
|
||||
|
||||
This bit is on if the CPU is in CGB mode.
|
||||
|
||||
## Page Flipping
|
||||
|
||||
Bit 4 lets you pick which page to use. This is only relevant in video modes 4 or
|
||||
5, and is just ignored otherwise. It's very easy to remember: when the bit is 0
|
||||
the 0th page is used, and when the bit is 1 the 1st page is used.
|
||||
|
||||
The second page always starts at `0x0600_A000`.
|
||||
|
||||
## OAM, VRAM, and Blanking
|
||||
|
||||
Bit 5 lets you access OAM during HBlank if enabled. This is cool, but it reduces
|
||||
the maximum sprites per scanline, so it's not default.
|
||||
|
||||
Bit 6 lets you adjust if the GBA should treat Object Character VRAM as being 2d
|
||||
(off) or 1d (on). This particular control can be kinda tricky to wrap your head
|
||||
around, so we'll be sure to have some extra diagrams in the chapter that deals
|
||||
with it.
|
||||
|
||||
Bit 7 forces the screen to stay in VBlank as long as it's set. This allows the
|
||||
fastest use of the VRAM, Palette, and Object Attribute Memory. Obviously if you
|
||||
leave this on for too long the player will notice a blank screen, but it might
|
||||
be okay to use for a moment or two every once in a while.
|
||||
|
||||
## Screen Layers
|
||||
|
||||
Bits 8 through 11 control if Background layers 0 through 3 should be active.
|
||||
|
||||
Bit 12 affects the Object layer.
|
||||
|
||||
Note that not all background layers are available in all video modes:
|
||||
|
||||
* Mode 0: all
|
||||
* Mode 1: 0/1/2
|
||||
* Mode 2: 2/3
|
||||
* Mode 3/4/5: 2
|
||||
|
||||
Bit 13 and 14 enable the display of Windows 0 and 1, and Bit 15 enables the
|
||||
object display window. We'll get into how windows work later on, they let you do
|
||||
some nifty graphical effects.
|
||||
|
||||
## In Conclusion...
|
||||
|
||||
So what did we do to the display control register in `hello1`?
|
||||
|
||||
```rust
|
||||
(0x04000000 as *mut u16).write_volatile(0x0403);
|
||||
```
|
||||
|
||||
First let's [convert that to
|
||||
binary](https://www.wolframalpha.com/input/?i=0x0403), and we get
|
||||
`0b100_0000_0011`. So, that's setting Mode 3 with background 2 enabled and
|
||||
nothing else special.
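
If you'd rather not decode the bits by hand, here's a small sketch that pulls a few of the fields described on this page out of a raw `u16`. The `DisplayControlFields` struct and `decode_dispcnt` function are hypothetical names for illustration only, and only some of the bits are covered.

```rust
/// A decoded view of some DISPCNT fields (illustrative, not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DisplayControlFields {
  /// Bits 0-2: the video mode.
  pub mode: u16,
  /// Bit 4: which page is shown in modes 4 and 5.
  pub page1: bool,
  /// Bit 7: forced blank.
  pub forced_blank: bool,
  /// Bits 8-11: background layers 0 through 3.
  pub bg_enabled: [bool; 4],
  /// Bit 12: the object layer.
  pub obj_enabled: bool,
}

pub fn decode_dispcnt(value: u16) -> DisplayControlFields {
  // Small helper: is bit `n` of the value set?
  let bit = |n: u16| (value >> n) & 1 != 0;
  DisplayControlFields {
    mode: value & 0b111,
    page1: bit(4),
    forced_blank: bit(7),
    bg_enabled: [bit(8), bit(9), bit(10), bit(11)],
    obj_enabled: bit(12),
  }
}
```

Feeding it `0x0403` gives mode 3 with only background 2 enabled, which matches the by-hand reading.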
|
|
@ -1,213 +0,0 @@
|
|||
# The Key Input Register
|
||||
|
||||
The Key Input Register is our next IO register. Its shorthand name is
|
||||
[KEYINPUT](http://problemkaputt.de/gbatek.htm#gbakeypadinput) and it's a `u16`
|
||||
at `0x4000130`. The entire register is obviously read only, you can't tell the
|
||||
GBA what buttons are pressed.
|
||||
|
||||
Each button is exactly one bit:
|
||||
|
||||
| Bit | Button |
|
||||
|:---:|:------:|
|
||||
| 0 | A |
|
||||
| 1 | B |
|
||||
| 2 | Select |
|
||||
| 3 | Start |
|
||||
| 4 | Right |
|
||||
| 5 | Left |
|
||||
| 6 | Up |
|
||||
| 7 | Down |
|
||||
| 8 | R |
|
||||
| 9 | L |
|
||||
|
||||
The higher bits above are not used at all.
|
||||
|
||||
Similar to other old hardware devices, the convention here is that a button's
|
||||
bit is **clear when pressed, active when released**. In other words, when the
|
||||
user is not touching the device at all the KEYINPUT value will read
|
||||
`0b0000_0011_1111_1111`. There's similar values for when the user is pressing as
|
||||
many buttons as possible, but since the left/right and up/down keys are on an
|
||||
arrow pad the value can never be 0 since you can't ever press every single key
|
||||
at once.
|
||||
|
||||
When dealing with key input, the register always shows the exact key values at
|
||||
any moment you read it. Obviously that's what it should do, but what it means to
|
||||
you as a programmer is that you should usually gather input once at the top of a
|
||||
game frame and then use that single input poll as the input values across the
|
||||
whole game frame.
|
||||
|
||||
Of course, you might want to know if a user's key state changed from frame to
|
||||
frame. That's fairly easy too: We just store the last frame keys as well as the
|
||||
current frame keys (it's only a `u16`) and then we can xor the two values.
|
||||
Anything that shows up in the xor result is a key that changed. If it's changed
|
||||
and it's now down, that means it was pushed this frame. If it's changed and it's
|
||||
now up, that means it was released this frame.
|
||||
|
||||
The other major thing you might frequently want is to know "which way" the arrow
|
||||
pad is pointing: Up/Down/None and Left/Right/None. Sounds like an enum to me.
|
||||
Except that oftentimes we'll have situations where the direction just needs to
|
||||
be multiplied by a speed and applied as a delta to a position. We want to
|
||||
support that as well as we can too.
|
||||
|
||||
## Key Input Code
|
||||
|
||||
Let's get down to some code. First we want to make a way to read the address as
|
||||
a `u16` and then wrap that in our newtype which will implement methods for
|
||||
reading the key bits.
|
||||
|
||||
```rust
|
||||
pub const KEYINPUT: VolatilePtr<u16> = VolatilePtr(0x400_0130 as *mut u16);
|
||||
|
||||
/// A newtype over the key input state of the GBA.
|
||||
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
|
||||
#[repr(transparent)]
|
||||
pub struct KeyInputSetting(u16);
|
||||
|
||||
pub fn key_input() -> KeyInputSetting {
|
||||
unsafe { KeyInputSetting(KEYINPUT.read()) }
|
||||
}
|
||||
```
|
||||
|
||||
Now we want a way to check if a key is _being pressed_, since that's normally
|
||||
how we think of things as a game designer and even as a player. That is, usually
|
||||
you'd say "if you press A, then X happens" instead of "if you don't press A,
|
||||
then X does not happen".
|
||||
|
||||
Normally we'd pick a constant for the bit we want, `&` it with our value, and
|
||||
then check for `val != 0`. Since the bit we're looking for is `0` in the "true"
|
||||
state we still pick the same constant and we still do the `&`, but we test with
|
||||
`== 0`. Practically the same, right? Well, since I'm asking a rhetorical
|
||||
question like that you can probably already guess that it's not the same. I was
|
||||
shocked to learn this too.
|
||||
|
||||
All we have to do is ask our good friend
|
||||
[Godbolt](https://rust.godbolt.org/z/d-8oCe) what's gonna happen when the code
|
||||
compiles. The link there has the page set for the `stable` 1.30 compiler just so
|
||||
that the link results stay consistent if you read this book in a year or
|
||||
something. Also, we've set the target to `thumbv6m-none-eabi`, which is a
|
||||
slightly later version of ARM than the actual GBA, but it's close enough for
|
||||
just checking. Of course, in a full program small functions like these will
|
||||
probably get inlined into the calling code and disappear entirely as they're
|
||||
folded and refolded by the compiler, but we can just check.
|
||||
|
||||
It turns out that the `!=0` test is 4 instructions and the `==0` test is 6
|
||||
instructions. Since we want to get savings where we can, and we'll probably
|
||||
check the keys of an input often enough, we'll just always use a `!=0` test and
|
||||
then adjust how we initially read the register to compensate. By using xor with
|
||||
a mask for only the 10 used bits we can flip the "low when pressed" values so
|
||||
that the entire result has active bits in all positions where a key is pressed.
|
||||
|
||||
```rust
|
||||
pub fn key_input() -> KeyInputSetting {
|
||||
unsafe { KeyInputSetting(KEYINPUT.read() ^ 0b0000_0011_1111_1111) }
|
||||
}
|
||||
```
|
||||
|
||||
Now we add a method for seeing if a key is pressed. In the full library there's
|
||||
a more advanced version of this that's built up via macro, but for this example
|
||||
we'll just name a bunch of `const` values and then have a method that takes a
|
||||
value and says if that bit is on.
|
||||
|
||||
```rust
|
||||
pub const KEY_A: u16 = 1 << 0;
|
||||
pub const KEY_B: u16 = 1 << 1;
|
||||
pub const KEY_SELECT: u16 = 1 << 2;
|
||||
pub const KEY_START: u16 = 1 << 3;
|
||||
pub const KEY_RIGHT: u16 = 1 << 4;
|
||||
pub const KEY_LEFT: u16 = 1 << 5;
|
||||
pub const KEY_UP: u16 = 1 << 6;
|
||||
pub const KEY_DOWN: u16 = 1 << 7;
|
||||
pub const KEY_R: u16 = 1 << 8;
|
||||
pub const KEY_L: u16 = 1 << 9;
|
||||
|
||||
impl KeyInputSetting {
|
||||
pub fn contains(&self, key: u16) -> bool {
|
||||
(self.0 & key) != 0
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Because each key is a unique bit you can even check for more than one key at
|
||||
once by combining key values with `|` (as written, `contains` is true if _any_ of them is pressed).
|
||||
|
||||
```rust
|
||||
let input_contains_a_or_l = input.contains(KEY_A | KEY_L);
|
||||
```
|
||||
|
||||
And we wanted to save the state of an old frame and compare it to the current
|
||||
frame to see what was different:
|
||||
|
||||
```rust
|
||||
pub fn difference(&self, other: KeyInputSetting) -> KeyInputSetting {
|
||||
KeyInputSetting(self.0 ^ other.0)
|
||||
}
|
||||
```
|
||||
|
||||
Anything that's "in" the difference output is a key that _changed_, and then if
|
||||
the key reads as pressed this frame that means it was just pressed. The exact
|
||||
mechanics of all the ways you might care to do something based on new key
|
||||
presses is obviously quite varied, but it might be something like this:
|
||||
|
||||
```rust
|
||||
let this_frame_diff = this_frame_input.difference(last_frame_input);
|
||||
|
||||
if this_frame_diff.contains(KEY_B) && this_frame_input.contains(KEY_B) {
|
||||
// the user just pressed B, react in some way
|
||||
}
|
||||
```
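
The same idea can be split into "just pressed" and "just released" helpers. This is only a sketch on raw `u16` values: the function names are made up here, and it assumes the inputs have already been flipped so that a `1` bit means "pressed".

```rust
pub const KEY_B: u16 = 1 << 1;

/// Keys that changed since last frame and are down now.
pub fn just_pressed(this_frame: u16, last_frame: u16) -> u16 {
  (this_frame ^ last_frame) & this_frame
}

/// Keys that changed since last frame and are up now.
pub fn just_released(this_frame: u16, last_frame: u16) -> u16 {
  (this_frame ^ last_frame) & last_frame
}
```

A key that's held across both frames doesn't show up in the xor at all, so it lands in neither result.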
|
||||
|
||||
And for the arrow pad, we'll make an enum that easily casts into `i32`. Whenever
|
||||
we're working with stuff we can try to use `i32` / `isize` as often as possible
|
||||
just because it's easier on the GBA's CPU if we stick to its native number size.
|
||||
Having it be an enum lets us use `match` and be sure that we've covered all our
|
||||
cases.
|
||||
|
||||
```rust
|
||||
/// A "tribool" value helps us interpret the arrow pad.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
|
||||
#[repr(i32)]
|
||||
pub enum TriBool {
|
||||
Minus = -1,
|
||||
Neutral = 0,
|
||||
Plus = 1,
|
||||
}
|
||||
```
|
||||
|
||||
Now, how do we determine _which way_ is plus or minus? Well... I don't know.
|
||||
Really. I'm not sure what the best one is because the GBA really wants the
|
||||
origin at 0,0 with higher rows going down and higher cols going right. On the
|
||||
other hand, all the normal math you and I learned in school is oriented with
|
||||
increasing Y being upward on the page. So, at least for this demo, we're going
|
||||
to go with what the GBA wants us to do and give it a try. If we don't end up
|
||||
confusing ourselves then we can stick with that. Maybe we can cover it over
|
||||
somehow later on.
|
||||
|
||||
```rust
|
||||
pub fn column_direction(&self) -> TriBool {
|
||||
if self.contains(KEY_RIGHT) {
|
||||
TriBool::Plus
|
||||
} else if self.contains(KEY_LEFT) {
|
||||
TriBool::Minus
|
||||
} else {
|
||||
TriBool::Neutral
|
||||
}
|
||||
}
|
||||
|
||||
pub fn row_direction(&self) -> TriBool {
|
||||
if self.contains(KEY_DOWN) {
|
||||
TriBool::Plus
|
||||
} else if self.contains(KEY_UP) {
|
||||
TriBool::Minus
|
||||
} else {
|
||||
TriBool::Neutral
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
So then in our game, every frame we can check for `column_direction` and
|
||||
`row_direction` and then apply those to the player's current position to make
|
||||
them move around the screen.
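
As a rough sketch of that last step, here's a direction being multiplied by a speed and applied as a delta. The `step` function is a hypothetical name, and the `TriBool` here is re-declared just so that the snippet stands alone.

```rust
/// Stand-alone copy of the arrow pad "tribool" for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
pub enum TriBool {
  Minus = -1,
  Neutral = 0,
  Plus = 1,
}

/// Moves `position` by `speed` units in the given direction.
pub fn step(position: i32, direction: TriBool, speed: i32) -> i32 {
  // The enum casts straight to -1, 0, or 1, so no `match` is needed here.
  position + (direction as i32) * speed
}
```

With the GBA's convention, `Plus` on the row axis moves the player down the screen and `Minus` moves them up.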
|
||||
|
||||
With that settled I think we're all done with user input for now. There's some
|
||||
other things to eventually know about like key interrupts that you can set and
|
||||
stuff, but we'll cover that later on because it's not necessary right now.
|
|
@ -1,71 +0,0 @@
|
|||
# The VCount Register
|
||||
|
||||
There's an IO register called
|
||||
[VCOUNT](http://problemkaputt.de/gbatek.htm#lcdiointerruptsandstatus) that shows
|
||||
you, what else, the Vertical (row) COUNT(er). It's a `u16` at address
|
||||
`0x0400_0006`, and it's how we'll be doing our very poor quality vertical sync
|
||||
code to start.
|
||||
|
||||
* **What makes it poor?** Well, we're just going to read from the vcount value as
|
||||
often as possible every time we need to wait for a specific value to come up,
|
||||
and then proceed once it hits the point we're looking for.
|
||||
* **Why is this bad?** Because we're making the CPU do a lot of useless work,
|
||||
which uses a lot more power than necessary. Even if you're not on an actual
|
||||
GBA you might be running inside an emulator on a phone or other handheld. You
|
||||
wanna try to save battery if all you're doing with that power use is waiting
|
||||
instead of making a game actually do something.
|
||||
* **Can we do better?** We can, but not yet. The better way to do things is to
|
||||
use a BIOS call to put the CPU into low power mode until a VBlank interrupt
|
||||
happens. However, we don't know about interrupts yet, and we don't know about
|
||||
BIOS calls yet, so we'll do the basic thing for now and then upgrade later.
|
||||
|
||||
So the way that display hardware actually displays each frame is that it moves a
|
||||
tiny pointer left to right across each pixel row one pixel at a time. When it's
|
||||
within the actual screen width (240px) it's drawing out those pixels. Then it
|
||||
goes _past_ the edge of the screen for 68px during a period known as the
|
||||
"horizontal blank" (HBlank). Then it starts on the next row and does that loop
|
||||
over again. This happens for the whole screen height (160px) and then once again
|
||||
it goes past the last row for another 68 rows into a "vertical blank" (VBlank)
|
||||
period.
|
||||
|
||||
* One pixel is 4 CPU cycles
|
||||
* HDraw is 240 pixels, HBlank is 68 pixels (1,232 cycles per full scanline)
|
||||
* VDraw is 160 scanlines, VBlank is 68 scanlines (280,896 cycles per full refresh)
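
The cycle totals fall straight out of the pixel and scanline counts. Here's the arithmetic as constants, assuming 160 visible scanlines (the screen height); the constant names are just for illustration.

```rust
pub const CYCLES_PER_PIXEL: u32 = 4;
pub const HDRAW_PIXELS: u32 = 240;
pub const HBLANK_PIXELS: u32 = 68;
pub const VDRAW_LINES: u32 = 160;
pub const VBLANK_LINES: u32 = 68;

/// One full scanline: the visible pixels plus the horizontal blank.
pub const CYCLES_PER_SCANLINE: u32 =
  (HDRAW_PIXELS + HBLANK_PIXELS) * CYCLES_PER_PIXEL;

/// One full refresh: the visible scanlines plus the vertical blank.
pub const CYCLES_PER_REFRESH: u32 =
  CYCLES_PER_SCANLINE * (VDRAW_LINES + VBLANK_LINES);
```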
|
||||
|
||||
Now you may remember some stuff from the display control register section where
|
||||
it was mentioned that some parts of memory are best accessed during VBlank, and
|
||||
also during hblank with a setting applied. These blanking periods are what was
|
||||
being talked about. At other times if you attempt to access video or object
|
||||
memory you (the CPU) might try touching the same memory that the display device
|
||||
is trying to use, in which case you get bumped back a cycle so that the display
|
||||
can finish what it's doing. Also, if you really insist on doing video memory
|
||||
changes while the screen is being drawn then you might get some visual glitches.
|
||||
If you can, just prepare all your changes ahead of time and then assign them all
|
||||
quickly during the blank period.
|
||||
|
||||
So first we want a way to check the vcount value at all:
|
||||
|
||||
```rust
|
||||
pub const VCOUNT: VolatilePtr<u16> = VolatilePtr(0x0400_0006 as *mut u16);
|
||||
|
||||
pub fn vcount() -> u16 {
|
||||
unsafe { VCOUNT.read() }
|
||||
}
|
||||
```
|
||||
|
||||
Then we want two little helper functions to wait until VBlank and vdraw.
|
||||
|
||||
```rust
|
||||
pub const SCREEN_HEIGHT: isize = 160;
|
||||
|
||||
pub fn wait_until_vblank() {
|
||||
while vcount() < SCREEN_HEIGHT as u16 {}
|
||||
}
|
||||
|
||||
pub fn wait_until_vdraw() {
|
||||
while vcount() >= SCREEN_HEIGHT as u16 {}
|
||||
}
|
||||
```
|
||||
|
||||
And... that's it. No special types to be made this time around, it's just a
|
||||
number we read out of memory.
|
|
@ -1,130 +0,0 @@
|
|||
# Tile Data
|
||||
|
||||
When using the GBA's hardware graphics, if you want to let the hardware do most
|
||||
of the work you have to use Modes 0, 1 or 2. However, to do that we first have
|
||||
to learn about how tile data works inside of the GBA.
|
||||
|
||||
## Tiles
|
||||
|
||||
Fundamentally, a tile is an 8x8 image. If you want anything bigger than 8x8 you
|
||||
need to arrange several tiles so that it looks like whatever you're trying to
|
||||
draw.
|
||||
|
||||
As was already mentioned, the GBA supports two different color modes: 4 bits per
|
||||
pixel and 8 bits per pixel. This means that we have two types of tile that we
|
||||
need to model. The pixel bits always represent an index into the PALRAM.
|
||||
|
||||
* With 4 bits per pixel, the PALRAM is imagined to be 16 **palbank** sections of
|
||||
16 palette entries each. The image data selects the index within the palbank,
|
||||
and an external configuration selects which palbank is used.
|
||||
* With 8 bits per pixel, the PALRAM is imagined to be a single 256 entry array
|
||||
and the index just directly picks which of the 256 colors is used.
|
||||
|
||||
Knowing this, we can write the following definitions:
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Copy, Default)]
|
||||
#[repr(transparent)]
|
||||
pub struct Tile4bpp {
|
||||
pub data: [u32; 8]
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, Default)]
|
||||
#[repr(transparent)]
|
||||
pub struct Tile8bpp {
|
||||
pub data: [u32; 16]
|
||||
}
|
||||
```
|
||||
|
||||
I hope this makes sense so far. At 4bpp, we have 4 bits per pixel, times 8
|
||||
pixels per line, times 8 lines: 256 bits required. Similarly, at 8 bits per
|
||||
pixel we'll need 512 bits. Why are we defining them as arrays of `u32` values?
|
||||
Because when it comes time to do bulk copies the fastest way to do it will be to go
|
||||
one whole machine word at a time. If we make the data inside the type be an
|
||||
array of `u32` then it'll already be aligned for fast `u32` bulk copies.
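
If you want to double check the sizes, a quick sketch with `size_of` and `align_of` does it. The two tile types are re-declared here only so the snippet stands alone, and the constant names are made up.

```rust
use core::mem::{align_of, size_of};

#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct Tile4bpp {
  pub data: [u32; 8],
}

#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct Tile8bpp {
  pub data: [u32; 16],
}

/// 4 bits * 8 pixels * 8 lines = 256 bits = 32 bytes.
pub const TILE4_BYTES: usize = size_of::<Tile4bpp>();
/// 8 bits * 8 pixels * 8 lines = 512 bits = 64 bytes.
pub const TILE8_BYTES: usize = size_of::<Tile8bpp>();
/// Both types inherit the alignment of `u32`.
pub const TILE_ALIGN: usize = align_of::<Tile4bpp>();
```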
|
||||
|
||||
Keeping track of the current color depth is naturally the _programmer's_
|
||||
problem. If you get it wrong you'll see a whole ton of garbage pixels all over
|
||||
the screen, and you'll probably be able to guess why. You know, unless you did
|
||||
one of the other things that can make a bunch of garbage pixels show up all over
|
||||
the screen. Graphics programming is fun like that.
|
||||
|
||||
## Charblocks
|
||||
|
||||
Tiles don't just sit on their own, they get grouped into **charblocks**. Long
|
||||
ago in the distant past, video games were built with hardware that was also used
|
||||
to make text terminals. So tile image data was called "character data". In fact
|
||||
some guides will even call the regular mode for the background layers "text
|
||||
mode", despite the fact that you obviously don't have to show text at all.
|
||||
|
||||
A charblock is 16 KiB long (`0x4000` bytes), which means that the number of tiles
|
||||
that fit into a charblock depends on your color depth. With 4bpp you get 512
|
||||
tiles, and with 8bpp there's 256 tiles. So they'd be something like this:
|
||||
|
||||
```rust
|
||||
#[derive(Clone, Copy)]
|
||||
#[repr(transparent)]
|
||||
pub struct Charblock4bpp {
|
||||
pub data: [Tile4bpp; 512],
|
||||
}
|
||||
|
||||
#[derive(Clone, Copy)]
|
||||
#[repr(transparent)]
|
||||
pub struct Charblock8bpp {
|
||||
pub data: [Tile8bpp; 256],
|
||||
}
|
||||
```
|
||||
|
||||
You'll note that we can't even derive `Debug` or `Default` any more because the
|
||||
arrays are so big. Rust supports Clone and Copy for arrays of any size, but the
|
||||
rest is still size 32 or less. We won't generally be making up an entire
|
||||
Charblock on the fly though, so it's not a big deal. If we _absolutely_ had to,
|
||||
we could call `core::mem::zeroed()`, but we really don't want to be trying to
|
||||
build a whole charblock at runtime. We'll usually want to define our tile data
|
||||
as `const` charblock values (or even parts of charblock values) that we then
|
||||
load out of the game pak ROM at runtime.
|
||||
|
||||
Anyway, with 16k per charblock and only 96k total in VRAM, it's easy math to see
|
||||
that there's 6 different charblocks in VRAM when in a tiled mode. The first four
|
||||
of these are for backgrounds, and the other two are for objects. There's rules
|
||||
for how a tile ID on a background or object selects a tile within a charblock,
|
||||
but since they're different between backgrounds and objects we'll cover that on
|
||||
their own pages.
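
The charblock arithmetic can be written down as a few constants and a helper. This is a sketch with made-up names, assuming the charblocks sit back to back starting at the VRAM base address of `0x0600_0000`.

```rust
pub const VRAM_BASE: usize = 0x0600_0000;
pub const VRAM_BYTES: usize = 96 * 1024;
pub const CHARBLOCK_BYTES: usize = 0x4000;

/// How many charblocks fit in VRAM.
pub const CHARBLOCK_COUNT: usize = VRAM_BYTES / CHARBLOCK_BYTES;
/// Tiles per charblock at 4bpp (32 bytes per tile).
pub const TILES_4BPP_PER_CHARBLOCK: usize = CHARBLOCK_BYTES / 32;
/// Tiles per charblock at 8bpp (64 bytes per tile).
pub const TILES_8BPP_PER_CHARBLOCK: usize = CHARBLOCK_BYTES / 64;

/// Start address of charblock `index`, or `None` if it's out of range.
pub fn charblock_address(index: usize) -> Option<usize> {
  if index < CHARBLOCK_COUNT {
    Some(VRAM_BASE + index * CHARBLOCK_BYTES)
  } else {
    None
  }
}
```

Charblock 4 is the first of the two object charblocks, so it's where object tile data begins.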
|
||||
|
||||
## Image Editing
|
||||
|
||||
It's very important to note that if you use a normal image editor you'll get
|
||||
very bad results if you translate that directly into GBA memory.
|
||||
|
||||
Imagine you have part of an image that's 16 by 16 pixels, aka 2 tiles by 2
|
||||
tiles. The data for that bitmap is the 1st row of the 1st tile, then the 1st row
|
||||
of the 2nd tile. However, when we translate that into the GBA, the first 8
|
||||
pixels will indeed be the first 8 tile pixels, but then the next 8 pixels in
|
||||
memory will be used as the _2nd row of the first tile_, not the 1st row of the
|
||||
2nd tile.
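
To make the reordering concrete, here's a rough sketch of converting a row-major 8bpp image into tile order. It's meant as host-side code (say, in a build script) since it uses `Vec`, and the `linear_to_tiles` name is made up for illustration; a real converter like the ones mentioned below handles much more than this.

```rust
/// Reorders a row-major 8bpp image so that each 8x8 tile is contiguous.
///
/// Assumes `width` and `height` are multiples of 8 and that `pixels` holds
/// exactly `width * height` bytes.
pub fn linear_to_tiles(pixels: &[u8], width: usize, height: usize) -> Vec<u8> {
  assert!(width % 8 == 0 && height % 8 == 0);
  assert_eq!(pixels.len(), width * height);
  let mut out = Vec::with_capacity(pixels.len());
  // Walk the tiles left to right, top to bottom.
  for tile_row in 0..height / 8 {
    for tile_col in 0..width / 8 {
      // Within a tile, emit its 8 rows one after another.
      for y in 0..8 {
        let start = (tile_row * 8 + y) * width + tile_col * 8;
        out.extend_from_slice(&pixels[start..start + 8]);
      }
    }
  }
  out
}
```

For a 16 by 16 image, the 9th output byte comes from the start of the image's second row, not from the 9th pixel of the first row.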
|
||||
|
||||
So, how do we fix this?
|
||||
|
||||
Well, the simple but annoying way is to edit your tile image as being an 8 pixel
|
||||
wide image and then have the image get super tall as you add more and more
|
||||
tiles. It can work, but it's really impractical if you have any multi-tile
|
||||
things that you're trying to do.
|
||||
|
||||
Instead, there are some image conversion tools that devkitpro provides in their
|
||||
gba-dev section. They let you take normal images and then repackage them and
|
||||
export it in various formats that you can then compile into your project.
|
||||
|
||||
Ketsuban uses the [grit](http://www.coranac.com/projects/grit/) tool, with the
|
||||
following suggestions:
|
||||
|
||||
1) Include an actual resource file and a file describing it somewhere in your
|
||||
project (see [the grit
|
||||
manual](http://www.coranac.com/man/grit/html/index.htm) for all details
|
||||
involved here).
|
||||
2) In a `build.rs` you run `grit` on each resource+description pair, such as in
|
||||
this [old gist
|
||||
example](https://gist.github.com/ketsuban/526fa55fbef0a3ccd4c7cd6204f29f94)
|
||||
3) Then within your rust code you use the
|
||||
[include_bytes!](https://doc.rust-lang.org/core/macro.include_bytes.html)
|
||||
macro to have the formatted resource be available as a const value you can
|
||||
load at runtime.
|
|
@ -1,113 +0,0 @@
|
|||
# Video Memory Intro
|
||||
|
||||
The GBA's Video RAM is 96k stretching from `0x0600_0000` to `0x0601_7FFF`.
|
||||
|
||||
The Video RAM can only be accessed totally freely during a Vertical Blank (aka
|
||||
"VBlank", though sometimes I forget and don't capitalize it properly). At other
|
||||
times, if the CPU tries to touch the same part of video memory as the display
|
||||
controller is accessing then the CPU gets bumped by a cycle to avoid a clash.
|
||||
|
||||
Annoyingly, VRAM can only be properly written to in 16 and 32 bit segments (same
|
||||
with PALRAM and OAM). If you try to write just an 8 bit segment, then both parts
|
||||
of the 16 bit segment get the same value written to them. In other words, if you
|
||||
write the byte `5` to `0x0600_0000`, then both `0x0600_0000` and ALSO
|
||||
`0x0600_0001` will have the byte `5` in them. We have to be extra careful when
|
||||
trying to set an individual byte, and we also have to be careful if we use
|
||||
`memcpy` or `memset` as well, because they're byte oriented by default and
|
||||
don't know to follow the special rules.
|
||||
|
||||
## RGB15
|
||||
|
||||
As I said before, RGB15 stores a color within a `u16` value using 5 bits for
|
||||
each color channel.
|
||||
|
||||
```rust
|
||||
pub const RED: u16 = 0b0_00000_00000_11111;
|
||||
pub const GREEN: u16 = 0b0_00000_11111_00000;
|
||||
pub const BLUE: u16 = 0b0_11111_00000_00000;
|
||||
```
|
||||
|
||||
In Mode 3 and Mode 5 we write direct color values into VRAM, and in Mode 4 we
|
||||
write palette index values, and then the color values go into the PALRAM.
|
||||
|
||||
## Mode 3
|
||||
|
||||
Mode 3 is pretty easy. We have a full resolution grid of rgb15 pixels. There's
|
||||
160 rows of 240 pixels each, with the base address being the top left corner. A
|
||||
particular pixel uses normal "2d indexing" math:
|
||||
|
||||
```rust
|
||||
let row_five_col_seven = 7 + (5 * SCREEN_WIDTH);
|
||||
```
|
||||
|
||||
To draw a pixel, we just write a value at the address for the row and col that
|
||||
we want to draw to.
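
Here's that indexing math as a small bounds-checked helper. It's a sketch with a made-up name (`mode3_index`), and it returns an index in `u16` units from the start of VRAM rather than touching memory itself.

```rust
pub const SCREEN_WIDTH: usize = 240;
pub const SCREEN_HEIGHT: usize = 160;

/// Index (in `u16` units) of the Mode 3 pixel at `col`, `row`.
///
/// Returns `None` when the position is off the screen.
pub fn mode3_index(col: usize, row: usize) -> Option<usize> {
  if col < SCREEN_WIDTH && row < SCREEN_HEIGHT {
    Some(col + row * SCREEN_WIDTH)
  } else {
    None
  }
}
```

The result is what you'd pass to `offset` on a `*mut u16` pointing at `0x0600_0000`.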
|
||||
|
||||
## Mode 4
|
||||
|
||||
Mode 4 introduces page flipping. Instead of one giant page at `0x0600_0000`,
|
||||
there's Page 0 at `0x0600_0000` and then Page 1 at `0x0600_A000`. The resolution
|
||||
for each page is the same as above, but instead of writing `u16` values, the
|
||||
memory is treated as `u8` indexes into PALRAM. The PALRAM starts at
|
||||
`0x0500_0000`, and there's enough space for 256 palette entries (each a `u16`).
|
||||
|
||||
To set the color of a palette entry we just do a normal `u16` write_volatile.
|
||||
|
||||
```rust
|
||||
(0x0500_0000 as *mut u16).offset(target_index).write_volatile(new_color)
|
||||
```
|
||||
|
||||
To draw a pixel we set the palette entry that we want the pixel to use. However,
|
||||
we must remember the "minimum size" write limitation that applies to VRAM. So,
|
||||
if we want to change just a single pixel at a time we must
|
||||
|
||||
1) Read the full `u16` it's a part of.
|
||||
2) Clear the half of the `u16` we're going to replace
|
||||
3) Write the half of the `u16` we're going to replace with the new value
|
||||
4) Write that result back to the address.
|
||||
|
||||
So, the math for finding a byte offset is the same as Mode 3 (since they're both
|
||||
a 2d grid). If the byte offset is EVEN it'll be the low bits of the `u16` at
|
||||
half the byte offset (the GBA is little-endian). If the offset is ODD it'll be the high bits of
|
||||
the `u16` at half the byte offset rounded down.
|
||||
|
||||
Does that make sense?
|
||||
|
||||
* If we want to write pixel (0,0) the byte offset is 0, so we change the low
|
||||
bits of `u16` offset 0. Then we want to write to (1,0), so the byte offset is
|
||||
1, so we change the high bits of `u16` offset 0. The pixels are next to each
|
||||
other, and the target bytes are next to each other, good so far.
|
||||
* If we want to write to (5,6) that'd be byte `5 + 6 * 240 = 1445`, so we'd
|
||||
target the high bits of `u16` offset `floor(1445/2) = 722`.
|
||||
|
||||
As you can see, trying to write individual pixels in Mode 4 is mostly a bad
|
||||
time. Fret not! We don't _have_ to write individual bytes. If our data is
|
||||
arranged correctly ahead of time we can just write `u16` or `u32` values
|
||||
directly. The video hardware doesn't care, it'll get along just fine.
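
For the times when you do need a single pixel, the read-modify-write steps might look like the sketch below. It works on a plain `&mut [u16]` standing in for one page (on real hardware these would be volatile reads and writes to VRAM), the `mode4_set_pixel` name is made up, and it assumes the GBA's little-endian layout where the even byte is the low half of each `u16`.

```rust
pub const MODE4_WIDTH: usize = 240;

/// Sets one Mode 4 pixel inside a page held as `u16` values.
pub fn mode4_set_pixel(page: &mut [u16], col: usize, row: usize, pal_index: u8) {
  let byte_offset = col + row * MODE4_WIDTH;
  // 1) Find the full `u16` that the pixel is a part of.
  let word = &mut page[byte_offset / 2];
  if byte_offset % 2 == 0 {
    // 2+3) Even byte: clear and replace the low half.
    *word = (*word & 0xFF00) | (pal_index as u16);
  } else {
    // 2+3) Odd byte: clear and replace the high half.
    *word = (*word & 0x00FF) | ((pal_index as u16) << 8);
  }
  // 4) The write back happens through the `&mut` reference above.
}
```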
|
||||
|
||||
## Mode 5
|
||||
|
||||
Mode 5 is also a two page mode, but instead of compressing the size of a pixel's
|
||||
data to fit in two pages, we compress the resolution.
|
||||
|
||||
Mode 5 is full `u16` color, but only 160w x 128h per page.
|
||||
|
||||
## In Conclusion...
|
||||
|
||||
So what got written into VRAM in `hello1`?
|
||||
|
||||
```rust
|
||||
(0x06000000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
|
||||
(0x06000000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
|
||||
(0x06000000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
|
||||
```
|
||||
|
||||
So at pixels `(120,80)`, `(136,80)`, and `(120,96)` we write three values. Once
|
||||
again we probably need to [convert them](https://www.wolframalpha.com/) into
|
||||
binary to make sense of it.
|
||||
|
||||
* 0x001F: 0b0_00000_00000_11111
|
||||
* 0x03E0: 0b0_00000_11111_00000
|
||||
* 0x7C00: 0b0_11111_00000_00000
|
||||
|
||||
Ah, of course, a red pixel, a green pixel, and a blue pixel.
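
If you'd rather not do the binary conversion by hand, the same numbers fall out of a small helper. This is a sketch with made-up names (`rgb15` and `mode3_index` are not from any crate), assuming the usual layout of red in bits 0-4, green in bits 5-9, and blue in bits 10-14.

```rust
/// Packs 5-bit red, green, and blue channels into one 15-bit color value.
/// Red sits in bits 0-4, green in bits 5-9, and blue in bits 10-14.
/// Hypothetical helper, for illustration only.
const fn rgb15(r: u16, g: u16, b: u16) -> u16 {
    (r & 0x1F) | ((g & 0x1F) << 5) | ((b & 0x1F) << 10)
}

/// Index (counted in `u16` units) of pixel `(x, y)` in the 240-wide Mode 3
/// bitmap. This is the value passed to `offset` in the snippet above.
const fn mode3_index(x: usize, y: usize) -> usize {
    x + y * 240
}
```

So `rgb15(31, 0, 0)` is `0x001F`, and `mode3_index(120, 80)` is the `120 + 80 * 240` from the first write.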
|
|
@ -1,9 +0,0 @@
|
|||
|
||||
# Rust GBA Guide
|
||||
|
||||
* [Development Setup](development-setup.md)
|
||||
* [Volatile](volatile.md)
|
||||
* [The Hardware Memory Map](the-hardware-memory-map.md)
|
||||
* [IO Registers](io-registers.md)
|
||||
* [Bitmap Video](bitmap-video.md)
|
||||
* [GBA Assembly](gba-asm.md)
|
|
@ -1,214 +0,0 @@
|
|||
# Bitmap Video
|
||||
|
||||
Our first video modes to talk about are the bitmap video modes.
|
||||
|
||||
It's not because they're the best and fastest, it's because they're the
|
||||
_simplest_. You can get going and practice with them really quickly. Usually
|
||||
after that you end up wanting to move on to the other video modes because they
|
||||
have better hardware support, so you can draw more complex things with the small
|
||||
number of cycles that the GBA allows.
|
||||
|
||||
## The Three Bitmap Modes
|
||||
|
||||
As I said in the Hardware Memory Map section, the Video RAM lives in the address
|
||||
space at `0x600_0000`. Depending on our video mode the display controller will
|
||||
consider this memory to be in one of a few totally different formats.
|
||||
|
||||
### Mode 3
|
||||
|
||||
The screen is 160 rows, each 240 pixels long, of `u16` color values.
|
||||
|
||||
This is "full" resolution, and "full" color. It adds up to 76,800 bytes. VRAM is
|
||||
only 98,304 bytes (96 KiB) total though. There's enough space left over after the bitmap
|
||||
for some object tile data if you want to use objects, but basically Mode3 is
|
||||
using all of VRAM as one huge canvas.
|
||||
|
||||
### Mode 4
|
||||
|
||||
The screen is 160 rows, each 240 pixels long, of `u8` palette values.
|
||||
|
||||
This has half as much space per pixel. What's a palette value? That's an index
|
||||
into the background PALRAM which says what the color of that pixel should be. We
|
||||
still have the full color space available, but we can only use 256 colors at the
|
||||
same time.
|
||||
|
||||
What did we get in exchange for this? Well, now there's a second "page". The
|
||||
second page starts `0xA000` bytes into VRAM (in both Mode 4 and Mode 5). It's an
|
||||
entire second set of pixel data. You determine if Page 0 or Page 1 is shown
|
||||
using bit 4 of DISPCNT. When you swap which page is being displayed it's called
|
||||
page flipping or flipping the page, or something like that.
|
||||
|
||||
Having two pages is cool, but Mode 4 has a big drawback: it's part of VRAM so
|
||||
that "can't write 1 byte at a time" rule applies. This means that to set a
|
||||
single byte we need to read a `u16`, adjust just one side of it, and then write
|
||||
that `u16` back. We can hide the complication behind a method call, but it
|
||||
simply takes longer to do all that, so editing pixels ends up being
|
||||
unfortunately slow compared to the other bitmap modes.
|
||||
|
||||
### Mode 5
|
||||
|
||||
The screen is 128 rows, each 160 pixels long, of `u16` color values.
|
||||
|
||||
Mode 5 has two pages like Mode 4 does, but instead of keeping full resolution we
|
||||
keep full color. The pixels are displayed in the top left and it's just black on
|
||||
the right and bottom edges. You can use the background control registers to
|
||||
shift it around, maybe center it, but there's no way to get around the fact that
|
||||
not having full resolution is kinda awkward.
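
To see why Mode 3 gets one page while Modes 4 and 5 get two, it helps to just do the arithmetic. The constant and function names below are made up for this sketch; the sizes follow from the resolutions given above and the `0xA000` page offset.

```rust
// All names here are hypothetical, for illustration only.
const VRAM_BYTES: usize = 96 * 1024; // 98,304 bytes of VRAM in total
const PAGE1_OFFSET: usize = 0xA000; // 40,960: where the second page starts

const MODE3_BYTES: usize = 240 * 160 * 2; // 76,800: `u16` per pixel
const MODE4_PAGE_BYTES: usize = 240 * 160; // 38,400: `u8` per pixel
const MODE5_PAGE_BYTES: usize = 160 * 128 * 2; // 40,960: `u16` per pixel

/// True if a page of `page_bytes` fits before the second page starts, and a
/// second copy placed at `PAGE1_OFFSET` still fits inside VRAM.
const fn two_pages_fit(page_bytes: usize) -> bool {
    page_bytes <= PAGE1_OFFSET && PAGE1_OFFSET + page_bytes <= VRAM_BYTES
}
```

A Mode 5 page fills its `0xA000` slot exactly, a Mode 4 page leaves a little slack, and a Mode 3 bitmap is far too big to have a twin.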
|
||||
|
||||
## Using Mode 3
|
||||
|
||||
Let's have a look at how this comes together. We'll call this one
|
||||
`hello_world.rs`, since it's our first real program.
|
||||
|
||||
### Module Attributes and Imports
|
||||
|
||||
At the top of our file we're still `no_std` and we're still using
|
||||
`feature(start)`, but now we're using the `gba` crate so we're 100% safe code!
|
||||
Often enough we'll need a little `unsafe`, but for just bitmap drawing we don't
|
||||
need it.
|
||||
|
||||
```rust
|
||||
#![no_std]
|
||||
#![feature(start)]
|
||||
#![forbid(unsafe_code)]
|
||||
|
||||
use gba::{
|
||||
fatal,
|
||||
io::{
|
||||
display::{DisplayControlSetting, DisplayMode, DISPCNT, VBLANK_SCANLINE, VCOUNT},
|
||||
keypad::read_key_input,
|
||||
},
|
||||
vram::bitmap::Mode3,
|
||||
Color,
|
||||
};
|
||||
```
|
||||
|
||||
### Panic Handler
|
||||
|
||||
Before we had a panic handler that just looped forever. Now that we're using the
|
||||
`gba` crate we can rely on the debug output channel from `mGBA` to get a message
|
||||
into the real world. There are macros set up for each message severity, and they
|
||||
all accept a format string and arguments, like how `println` works. The catch is
|
||||
that a given message is capped at a length of 255 bytes, and it should probably
|
||||
be ASCII only.
|
||||
|
||||
In the case of the `fatal` message level, it also halts the emulator.
|
||||
|
||||
Of course, if the program is run on real hardware then the `fatal` message won't
|
||||
stop the program, so we still need the infinite loop there too.
|
||||
|
||||
(not that this program _can_ panic, but `rustc` doesn't know that so it demands
|
||||
we have a `panic_handler`)
|
||||
|
||||
```rust
|
||||
#[panic_handler]
|
||||
fn panic(info: &core::panic::PanicInfo) -> ! {
|
||||
// This kills the emulation with a message if we're running within mGBA.
|
||||
fatal!("{}", info);
|
||||
// If we're _not_ running within mGBA then we still need to not return, so
|
||||
// loop forever doing nothing.
|
||||
loop {}
|
||||
}
|
||||
```
|
||||
|
||||
### Waiting Around
|
||||
|
||||
Like I talked about before, sometimes we need to wait around a bit for the right
|
||||
moment to start doing work. However, we don't know how to do the good version of
|
||||
waiting for VBlank and VDraw to start, so we'll use the really bad version of it
|
||||
for now.
|
||||
|
||||
```rust
|
||||
/// Performs a busy loop until VBlank starts.
|
||||
///
|
||||
/// This is very inefficient, and please keep following the lessons until we
|
||||
/// cover how interrupts work!
|
||||
pub fn spin_until_vblank() {
|
||||
while VCOUNT.read() < VBLANK_SCANLINE {}
|
||||
}
|
||||
|
||||
/// Performs a busy loop until VDraw starts.
|
||||
///
|
||||
/// This is very inefficient, and please keep following the lessons until we
|
||||
/// cover how interrupts work!
|
||||
pub fn spin_until_vdraw() {
|
||||
while VCOUNT.read() >= VBLANK_SCANLINE {}
|
||||
}
|
||||
```
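
Both loops are testing the same thing from opposite sides. As a sketch, assuming the VBlank scanline is 160 (the first line after the 160 visible rows) and that the counter runs over 228 lines per frame, the predicate looks like this. `VBLANK_START`, `TOTAL_SCANLINES`, and `in_vblank` are made-up names for this example.

```rust
/// First scanline of VBlank: the 160 visible rows are lines 0 through 159.
/// Hypothetical name; assumed to match the crate's `VBLANK_SCANLINE`.
const VBLANK_START: u16 = 160;

/// Scanlines per frame, counting the ones that are never displayed.
const TOTAL_SCANLINES: u16 = 228;

/// True when a vertical counter value falls inside VBlank.
fn in_vblank(vcount: u16) -> bool {
    vcount >= VBLANK_START
}
```

`spin_until_vblank` loops while this is false, and `spin_until_vdraw` loops while it's true. Only 68 of the 228 lines are VBlank, which is why we do as much work as possible before it starts.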
|
||||
|
||||
### Setup in `main`
|
||||
|
||||
In main we set the display control value we want and declare a few variables
|
||||
we're going to use in our primary loop.
|
||||
|
||||
```rust
|
||||
#[start]
|
||||
fn main(_argc: isize, _argv: *const *const u8) -> isize {
|
||||
const SETTING: DisplayControlSetting =
|
||||
DisplayControlSetting::new().with_mode(DisplayMode::Mode3).with_bg2(true);
|
||||
DISPCNT.write(SETTING);
|
||||
|
||||
let mut px = Mode3::WIDTH / 2;
|
||||
let mut py = Mode3::HEIGHT / 2;
|
||||
let mut color = Color::from_rgb(31, 0, 0);
|
||||
```
|
||||
|
||||
### Stuff During VDraw
|
||||
|
||||
When a frame starts we want to read the keys, then adjust as much of the game
|
||||
state as we can without touching VRAM.
|
||||
|
||||
Once we're ready, we do our spin loop until VBlank starts.
|
||||
|
||||
In this case, we're going to adjust `px` and `py` depending on the arrow pad
|
||||
input, and also we'll cycle around the color depending on L and R being pressed.
|
||||
|
||||
```rust
|
||||
loop {
|
||||
// read our keys for this frame
|
||||
let this_frame_keys = read_key_input();
|
||||
|
||||
// adjust game state and wait for vblank
|
||||
px = px.wrapping_add(2 * this_frame_keys.x_tribool() as usize);
|
||||
py = py.wrapping_add(2 * this_frame_keys.y_tribool() as usize);
|
||||
if this_frame_keys.l() {
|
||||
color = Color(color.0.rotate_left(5));
|
||||
}
|
||||
if this_frame_keys.r() {
|
||||
color = Color(color.0.rotate_right(5));
|
||||
}
|
||||
|
||||
// now we wait
|
||||
spin_until_vblank();
|
||||
```
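
The rotation trick deserves a closer look. Here's a tiny sketch of what the L branch does to the raw value, assuming the color wraps a plain `u16`. `cycle_left` is a made-up name for this example.

```rust
/// Rotates a raw 16-bit color value left by one 5-bit channel, the same
/// operation the L button branch applies. Hypothetical helper.
fn cycle_left(raw: u16) -> u16 {
    raw.rotate_left(5)
}
```

Starting from pure red (`0x001F`) one rotation gives green (`0x03E0`) and a second gives blue (`0x7C00`). A third rotation pushes bits through the unused bit 15 and gives `0x800F`, so the colors drift instead of looping cleanly. That's fine for a demo, but it's not a true three-color cycle.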
|
||||
|
||||
### Stuff During VBlank
|
||||
|
||||
When VBlank starts we want to update video memory to display the new
|
||||
frame's situation.
|
||||
|
||||
In our case, we're going to paint a little square of the current color, but also
|
||||
if you go off the map it resets the screen.
|
||||
|
||||
At the end, we spin until VDraw starts so we can do the whole thing again.
|
||||
|
||||
```rust
|
||||
// draw the new game and wait until the next frame starts.
|
||||
if px >= Mode3::WIDTH || py >= Mode3::HEIGHT {
|
||||
// out of bounds, reset the screen and position.
|
||||
Mode3::dma_clear_to(Color::from_rgb(0, 0, 0));
|
||||
px = Mode3::WIDTH / 2;
|
||||
py = Mode3::HEIGHT / 2;
|
||||
} else {
|
||||
// draw the new part of the line
|
||||
Mode3::write(px, py, color);
|
||||
Mode3::write(px, py + 1, color);
|
||||
Mode3::write(px + 1, py, color);
|
||||
Mode3::write(px + 1, py + 1, color);
|
||||
}
|
||||
|
||||
// now we wait again
|
||||
spin_until_vdraw();
|
||||
}
|
||||
}
|
||||
```
|
|
@ -1,189 +0,0 @@
|
|||
# Development Setup
|
||||
|
||||
Before you can build a GBA game you'll have to follow some special steps to
|
||||
set up the development environment.
|
||||
|
||||
Once again, extra special thanks to **Ketsuban**, who first dove into how to
|
||||
make this all work with rust and then shared it with the world.
|
||||
|
||||
## Per System Setup
|
||||
|
||||
Obviously you need your computer to have a [working rust
|
||||
installation](https://rustup.rs/). However, you'll also need to ensure that
|
||||
you're using a nightly toolchain (we will need it for inline assembly, among
|
||||
other potential useful features). You can run `rustup default nightly` to set
|
||||
nightly as the system wide default toolchain, or you can use a [toolchain
|
||||
file](https://github.com/rust-lang-nursery/rustup.rs#the-toolchain-file) to use
|
||||
nightly just on a specific project, but either way we'll be assuming the use of
|
||||
nightly from now on. You'll also need the `rust-src` component so that
|
||||
`cargo-xbuild` will be able to compile the core crate for us in a bit, so run
|
||||
`rustup component add rust-src`.
|
||||
|
||||
Next, you need [devkitpro](https://devkitpro.org/wiki/Getting_Started). They've
|
||||
got a graphical installer for Windows that runs nicely, and I guess `pacman`
|
||||
support on Linux (I'm on Windows so I haven't tried the Linux install myself).
|
||||
We'll be using a few of their general binutils for the `arm-none-eabi` target,
|
||||
and we'll also be using some of their tools that are specific to GBA
|
||||
development, so _even if_ you already have the right binutils for whatever
|
||||
reason, you'll still want devkitpro for the `gbafix` utility.
|
||||
|
||||
* On Windows you'll want something like `C:\devkitpro\devkitARM\bin` and
|
||||
`C:\devkitpro\tools\bin` to be [added to your
|
||||
PATH](https://stackoverflow.com/q/44272416/455232), depending on where you
|
||||
installed it to and such.
|
||||
* On Linux you can use pacman to get it, and the default install puts the stuff
|
||||
in `/opt/devkitpro/devkitARM/bin` and `/opt/devkitpro/tools/bin`. If you need
|
||||
help you can look in our repository's
|
||||
[.travis.yml](https://github.com/rust-console/gba/blob/master/.travis.yml)
|
||||
file to see exactly what our CI does.
|
||||
|
||||
Finally, you'll need `cargo-xbuild`. Just run `cargo install cargo-xbuild` and
|
||||
cargo will figure it all out for you.
|
||||
|
||||
## Per Project Setup
|
||||
|
||||
Once the system wide tools are ready, you'll need some particular files each
|
||||
time you want to start a new project. You can find them in the root of the
|
||||
[rust-console/gba repo](https://github.com/rust-console/gba).
|
||||
|
||||
* `thumbv4-none-agb.json` describes the overall GBA to cargo-xbuild (and LLVM)
|
||||
so it knows what to do. Technically the GBA is `thumbv4-none-eabi`, but we
|
||||
change the `eabi` to `agb` so that we can distinguish it from other `eabi`
|
||||
devices when using `cfg` flags.
|
||||
* `crt0.s` describes some ASM startup stuff. If you have more ASM to place here
|
||||
later on this is where you can put it. You also need to build it into a
|
||||
`crt0.o` file before it can actually be used, but we'll cover that below.
|
||||
* `linker.ld` tells the linker all the critical info about the layout
|
||||
expectations that the GBA has about our program, and that it should also
|
||||
include the `crt0.o` file with our compiled rust code.
|
||||
|
||||
## Compiling
|
||||
|
||||
Once all the tools are in place, there's particular steps that you need to
|
||||
compile the project. For these to work you'll need some source code to compile.
|
||||
Unlike with other things, an empty main file and/or an empty lib file will cause
|
||||
a total build failure, because we'll need a
|
||||
[no_std](https://rust-embedded.github.io/book/intro/no-std.html) build, and rust
|
||||
defaults to builds that use the standard library. The next section has a minimal
|
||||
example file you can use (along with explanation), but we'll describe the build
|
||||
steps here.
|
||||
|
||||
* `arm-none-eabi-as crt0.s -o target/crt0.o`
|
||||
* This builds your text format `crt0.s` file into object format `crt0.o`
|
||||
that's placed in the `target/` directory. Note that if the `target/`
|
||||
directory doesn't exist yet it will fail, so you have to make the directory
|
||||
if it's not there. You don't need to rebuild `crt0.s` every single time,
|
||||
only when it changes, but you might as well throw a line to do it every time
|
||||
into your build script so that you never forget because it's a practically
|
||||
instant operation anyway.
|
||||
|
||||
* `cargo xbuild --target thumbv4-none-agb.json`
|
||||
* This builds your Rust source. It accepts _most of_ the normal flags, such
|
||||
as `--release`, and options, such as `--bin foo` or `--examples`, that you'd
|
||||
expect `cargo` to accept.
|
||||
* You **can not** build and run tests this way, because they require `std`,
|
||||
which the GBA doesn't have. If you want you can still run some of your
|
||||
project's tests with `cargo test --lib` or similar, but that builds for your
|
||||
local machine, so anything specific to the GBA (such as reading and writing
|
||||
registers) won't be testable that way. If you want to isolate and try out
|
||||
some piece of code running on the GBA you'll unfortunately have to make a demo
|
||||
for it in your `examples/` directory and then run the demo in an emulator
|
||||
and see if it does what you expect.
|
||||
* The file extension is important! It will work if you forget it, but `cargo
|
||||
xbuild` takes the inclusion of the extension as a flag to also compile
|
||||
dependencies with the same sysroot, so you can include other crates in your
|
||||
build. Well, crates that work in the GBA's limited environment, but you get
|
||||
the idea.
|
||||
|
||||
At this point you have an ELF binary that some emulators can execute directly
|
||||
(more on that later). However, if you want a "real" ROM that works in all
|
||||
emulators and that you could transfer to a flash cart to play on real hardware
|
||||
there's a little more to do.
|
||||
|
||||
* `arm-none-eabi-objcopy -O binary target/thumbv4-none-agb/MODE/BIN_NAME target/ROM_NAME.gba`
|
||||
* This will perform an [objcopy](https://linux.die.net/man/1/objcopy) on our
|
||||
program. Here I've named the program `arm-none-eabi-objcopy`, which is what
|
||||
devkitpro calls their version of `objcopy` that's specific to the GBA in the
|
||||
Windows install. If the program isn't found under that name, have a look in
|
||||
your installation directory to see if it's under a slightly different name
|
||||
or something.
|
||||
* As you can see from reading the man page, the `-O binary` option takes our
|
||||
lovely ELF file with symbols and all that and strips it down to basically a
|
||||
bare memory dump of the program.
|
||||
* The next argument is the input file. You might not be familiar with how
|
||||
`cargo` arranges stuff in the `target/` directory, and between RLS and
|
||||
`cargo doc` and stuff it gets kinda crowded, so it goes like this:
|
||||
* Since our program was built for a non-local target, first we've got a
|
||||
directory named for that target, `thumbv4-none-agb/`
|
||||
* Next, the "MODE" is either `debug/` or `release/`, depending on if we had
|
||||
the `--release` flag included. You'll probably only be packing release
|
||||
mode programs all the way into GBA roms, but it works with either mode.
|
||||
* Finally, the name of the program. If your program is something out of the
|
||||
project's `src/bin/` then it'll be that file's name, or whatever name you
|
||||
configured for the bin in the `Cargo.toml` file. If your program is
|
||||
something out of the project's `examples/` directory there will be a
|
||||
similar `examples/` sub-directory first, and then the example's name.
|
||||
* The final argument is the output of the `objcopy`, which I suggest putting
|
||||
at just the top level of the `target/` directory. Really it could go
|
||||
anywhere, but if you're using git then it's likely that your `.gitignore`
|
||||
file is already setup to exclude everything in `target/`, so this makes sure
|
||||
that your intermediate game builds don't get checked into your git.
|
||||
|
||||
* `gbafix target/ROM_NAME.gba`
|
||||
* The `gbafix` tool also comes from devkitpro. The GBA is very picky about a
|
||||
ROM's format, and `gbafix` patches the ROM's header and such so that it'll
|
||||
work right. Unlike `objcopy`, this tool is custom built for GBA development,
|
||||
so it works just perfectly without any arguments beyond the file name. The
|
||||
ROM is patched in place, so we don't even need to specify a new destination.
|
||||
|
||||
And you're _finally_ done!
|
||||
|
||||
Of course, you probably want to make a script for all that, but it's up to you.
|
||||
On our own project we have it mostly set up within a `Makefile.toml` which runs
|
||||
using the [cargo-make](https://github.com/sagiegurari/cargo-make) plugin.
|
||||
|
||||
## Checking Your Setup
|
||||
|
||||
As I said, you need some source code to compile just to check that your
|
||||
compilation pipeline is working. Here's a sample file that just puts three dots
|
||||
on the screen without depending on any crates or anything at all.
|
||||
|
||||
`hello_magic.rs`:
|
||||
|
||||
```rust
|
||||
#![no_std]
|
||||
#![feature(start)]
|
||||
|
||||
#[panic_handler]
|
||||
fn panic(_info: &core::panic::PanicInfo) -> ! {
|
||||
loop {}
|
||||
}
|
||||
|
||||
#[start]
|
||||
fn main(_argc: isize, _argv: *const *const u8) -> isize {
|
||||
unsafe {
|
||||
(0x400_0000 as *mut u16).write_volatile(0x0403);
|
||||
(0x600_0000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
|
||||
(0x600_0000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
|
||||
(0x600_0000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
|
||||
loop {}
|
||||
}
|
||||
}
|
||||
|
||||
#[no_mangle]
|
||||
static __IRQ_HANDLER: extern "C" fn() = irq_handler;
|
||||
|
||||
extern "C" fn irq_handler() {}
|
||||
```
|
||||
|
||||
Throw that into your project skeleton, build the program, and give it a run in
|
||||
an emulator. I suggest [mgba](https://mgba.io/2019/01/26/mgba-0.7.0/), it has
|
||||
some developer tools we'll use later on. You should see a red, green, and blue
|
||||
dot close-ish to the middle of the screen. If you don't, something _already_
|
||||
went wrong. Double check things, phone a friend, write your senators, try asking
|
||||
`Lokathor` or `Ketsuban` on the [Rust Community
|
||||
Discord](https://discordapp.com/invite/aVESxV8), until you're eventually able to
|
||||
get your three dots going.
|
||||
|
||||
Of course, I'm sure you want to know why those particular numbers are the
|
||||
numbers to use. Well that's what the whole rest of the book is about!
|
|
@ -1,123 +0,0 @@
|
|||
# GBA Assembly
|
||||
|
||||
On the GBA sometimes you just end up using assembly. Not a whole lot, but
|
||||
sometimes. Accordingly, you should know how assembly works on the GBA.
|
||||
|
||||
* The [ARM Infocenter:
|
||||
ARM7TDMI](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0210c/index.html)
|
||||
is the basic authority for reference information. The GBA has a CPU with the
|
||||
`ARMv4` ISA, the `ARMv4T` variant, and specifically the `ARM7TDMI`
|
||||
microarchitecture. Someone at ARM decided that having both `ARM#` and `ARMv#`
|
||||
was a good way to [version things](https://en.wikichip.org/wiki/arm/versions),
|
||||
even when the numbers don't match. The rest of us have been sad ever since.
|
||||
The link there will take you to the correct book specific to the GBA's
|
||||
microarchitecture. There's a whole big pile of ARM books available within the
|
||||
ARM Infocenter, so if you just google it or whatever make sure you end up
|
||||
looking at the correct one. Note that there is also a [PDF
|
||||
Version](http://infocenter.arm.com/help/topic/com.arm.doc.ddi0210c/DDI0210B.pdf)
|
||||
of the documentation available, if you'd like that.
|
||||
|
||||
* In addition to the `ARM7TDMI` book, which is specific to the GBA's CPU, you'll
|
||||
need to find a copy of the ARM Architecture Reference Manual if you want
|
||||
general ARM knowledge. The ARM Infocenter has the
|
||||
[ARMv5](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0100i/index.html)
|
||||
version of said manual hosted on their site. Unfortunately, they don't seem to
|
||||
host the `ARMv4T` version of the manual any more.
|
||||
|
||||
* The [GBATek: ARM CPU
|
||||
Overview](https://problemkaputt.de/gbatek.htm#armcpuoverview) also has quite a
|
||||
bit of info. Some of it is a duplication of what you'd find in the ARM
|
||||
Infocenter reference manuals. Some of it is information that's specific to the
|
||||
GBA's layout and how the CPU interacts with other parts (such as how its
|
||||
timings and the display adapter's timings line up). Some of it is specific to
|
||||
the ARM chips _within the DS and DSi_, so be careful to make sure that you
|
||||
don't wander into the wrong section. GBATEK is always a bit of a jumbled mess,
|
||||
and the explanations are often "sparse" (to put it nicely), so I'd advise that
|
||||
you also look at the official ARM manuals.
|
||||
|
||||
* The [Compiler Explorer](https://rust.godbolt.org/z/ndCnk3) can be used to
|
||||
quickly look at assembly versions of your Rust code. That link there will load
|
||||
up an essentially blank `no_std` file with `opt-level=3` set and targeting
|
||||
`thumbv6m-none-eabi`. That's _not_ the same target as the GBA (it's two ISA
|
||||
revisions later, `ARMv6` instead of `ARMv4`), but it's the closest CPU target
|
||||
that is bundled with `rustc`, so it's the closest you can get with the
|
||||
compiler explorer website. If you're very dedicated I suppose you could setup
|
||||
a [local
|
||||
instance](https://github.com/mattgodbolt/compiler-explorer#running-a-local-instance)
|
||||
of compiler explorer and then add the extra target definition and so on, but
|
||||
that's _probably_ overkill.
|
||||
|
||||
## ARM and Thumb
|
||||
|
||||
The "T" part in `ARMv4T` and `ARM7TDMI` means "Thumb". An ARM chip that supports
|
||||
Thumb has two different instruction sets instead of just one. The chip can run
|
||||
in ARM state with 32-bit instructions, or it can run in Thumb state with 16-bit
|
||||
instructions. Note that the CPU _state_ (ARM or Thumb) is distinct from the
|
||||
_mode_ (User, FIQ, IRQ, etc). Apparently these states are sometimes called
|
||||
`a32` and `t32` in a more modern context, but I will stick with ARM and Thumb
|
||||
because that's what the official ARM7TDMI manual and GBATEK both use.
|
||||
|
||||
On the GBA, the memory bus that physically transfers data from the cartridge into
|
||||
the device is a 16-bit memory bus. This means that if you need to transfer more
|
||||
than 16 bits at a time you have to do more than one transfer. Since we'd like
|
||||
our instructions to get to the CPU as fast as possible, we compile the majority
|
||||
of our program with the Thumb instruction set. The ARM reference says that with
|
||||
Thumb instructions on a 16-bit memory bus system you get about 160% performance
|
||||
compared to using ARM instructions. That's absolutely something we want to take
|
||||
advantage of. Also, your Thumb compiled code is about 65% of the same code
|
||||
compiled with ARM. Since a game ROM can only be 32MB total, and we're trying to
|
||||
fit in images and sound too, we want to get space savings where we can.
|
||||
|
||||
You may wonder, why is the Thumb code 65% as large if the instructions
|
||||
themselves are 50% as large, and why have ARM state at all if there's such a
|
||||
benefit to be had with Thumb? Well, Thumb state doesn't support as many different
|
||||
instructions as ARM state does. Some lines of source code that can compile to a
|
||||
single ARM instruction might need to compile into more than one Thumb
|
||||
instruction. Thumb still has most of the really good instructions available, so
|
||||
it all averages out to about 65%.
|
||||
|
||||
That said, some parts of a GBA program _must_ be written for ARM state. Also,
|
||||
ARM state does allow that increased instruction flexibility. So we _need_ to use
|
||||
ARM some of the time, and we might just _want_ to use ARM even when we don't
|
||||
need to at other times. It is possible to switch states on the fly, there's
|
||||
extremely minimal overhead, even less than doing some function calls. The only
|
||||
problem is the 16-bit memory bus of the cartridge giving us a needless speed
|
||||
penalty with our ARM code. The CPU _executes_ the ARM instructions at full
|
||||
speed, but then it has to wait while more instructions get sent in. What do we
|
||||
do? Well, code is ultimately just a different kind of data. We can copy parts of
|
||||
our code off the cartridge ROM and place it into a part of the RAM that has a
|
||||
32-bit memory bus. Then the CPU can execute the code from there, going at full
|
||||
speed. Of course, there's only a very small amount of RAM compared to the size
|
||||
of a cartridge, so we'll only do this with a few select functions. Exactly which
|
||||
functions will probably depend on your game.
|
||||
|
||||
There's two problems that we face as Rust programmers:
|
||||
|
||||
1) Rust offers no way to specify individual functions as being ARM or Thumb. The
|
||||
whole program is compiled for one state or the other. Obviously this is no
|
||||
good, so it's on the [2019 embedded
|
||||
wishlist](https://github.com/rust-embedded/wg/issues/256#issuecomment-439677804),
|
||||
and perhaps a fix will come.
|
||||
|
||||
2) Rust offers no way to get a pointer to a function as well as the length of
|
||||
the compiled function, so we can't copy a function from the ROM to some other
|
||||
location because we can't even express statements about the function's data.
|
||||
I also put this [on the
|
||||
wishlist](https://github.com/rust-embedded/wg/issues/256#issuecomment-450539836),
|
||||
but honestly I have much less hope that this becomes a part of rust.
|
||||
|
||||
What this ultimately means is that some parts of our program have to be written
|
||||
in external assembly files and then added to the program with the linker. We
|
||||
were already going to write some assembly, and we already use more than one file
|
||||
in our project all the time, those parts aren't a big problem. The big problem
|
||||
is that using custom linker scripts to get assembly code into our final program
|
||||
isn't transitive between crates.
|
||||
|
||||
What I mean is that once we have a file full of custom assembly that we're
|
||||
linking in by hand, that's not "part of" the crate any more. At least not as
|
||||
`cargo` sees it. So we can't just upload it to `crates.io` and then depend on it
|
||||
in other projects and have `cargo` download the right version and include it
|
||||
all automatically. We're back to fully manually copying files from the old
|
||||
project into the new one, adding more lines to the linker script each time we
|
||||
split up a new assembly file, all that stuff. Like the stone age. Sometimes ya
|
||||
gotta suffer for your art.
|
|
@ -1,237 +0,0 @@
|
|||
# IO Registers
|
||||
|
||||
As I said before, the IO registers are how you tell the GBA to do all the things
|
||||
you want it to do. If you want a hint at what's available, they're all listed
|
||||
out in the [GBA I/O Map](https://problemkaputt.de/gbatek.htm#gbaiomap) section
|
||||
of GBATEK. Go have a quick look.
|
||||
|
||||
Each individual IO register has a particular address just like we talked about
|
||||
in the Hardware Memory Map section. They also have a size (listed in bytes), and
|
||||
a note on if they're read only, write only, or read-write. Finally, each
|
||||
register has a name and a one line summary. Unfortunately for us, the names are
|
||||
all C style names with heavy shorthand. I'm not normally a fan of shorthand
|
||||
names, but the `gba` crate uses the register names from GBATEK as much as
|
||||
possible, since they're the most commonly used set of names among GBA
|
||||
programmers. That way, if you're reading other guides and they say to set the
|
||||
`BG2CNT` register, then you know exactly what register to look for within the
|
||||
`gba` docs.
|
||||
|
||||
## Register Bits
|
||||
|
||||
There's only about 100 registers, but there's a lot more than 100 details we
|
||||
want to have control over on the GBA. How does that work? Well, let's use a
|
||||
particular register to talk about it. The first one on the list is `DISPCNT`,
|
||||
the "Display Control" register. It's one of the most important IO registers, so
|
||||
this is a "two birds with one stone" situation.
|
||||
|
||||
Naturally there's a whole lot of things involved in the LCD that we want to
|
||||
control, and it's all "one" value, but that value is actually many "fields"
|
||||
packed into one value. When learning about an IO register, you have to look at
|
||||
its bit pattern breakdown. For `DISPCNT` the GBATEK entry looks like this:
|
||||
|
||||
```txt
|
||||
4000000h - DISPCNT - LCD Control (Read/Write)
|
||||
Bit Expl.
|
||||
0-2 BG Mode (0-5=Video Mode 0-5, 6-7=Prohibited)
|
||||
3 Reserved / CGB Mode (0=GBA, 1=CGB; can be set only by BIOS opcodes)
|
||||
4 Display Frame Select (0-1=Frame 0-1) (for BG Modes 4,5 only)
|
||||
5 H-Blank Interval Free (1=Allow access to OAM during H-Blank)
|
||||
6 OBJ Character VRAM Mapping (0=Two dimensional, 1=One dimensional)
|
||||
7 Forced Blank (1=Allow FAST access to VRAM,Palette,OAM)
|
||||
8 Screen Display BG0 (0=Off, 1=On)
|
||||
9 Screen Display BG1 (0=Off, 1=On)
|
||||
10 Screen Display BG2 (0=Off, 1=On)
|
||||
11 Screen Display BG3 (0=Off, 1=On)
|
||||
12 Screen Display OBJ (0=Off, 1=On)
|
||||
13 Window 0 Display Flag (0=Off, 1=On)
|
||||
14 Window 1 Display Flag (0=Off, 1=On)
|
||||
15 OBJ Window Display Flag (0=Off, 1=On)
|
||||
```
|
||||
|
||||
So what we're supposed to understand here is that we've got a `u16`, and then we
|
||||
set the individual bits for the things that we want. In the `hello_magic`
|
||||
example you might recall that we set this register to the value `0x0403`. That
|
||||
was a bit of a trick on my part because hex numbers usually look far more
|
||||
mysterious than decimal or binary numbers. If we converted it to binary it'd
|
||||
look like this:
|
||||
|
||||
```rust
|
||||
0b100_0000_0011
|
||||
```
|
||||
|
||||
And then you can just go down the list of settings to see what bits are what:
|
||||
|
||||
* Bits 0-2 (BG Mode) are `0b011`, so that's Video Mode 3
|
||||
* Bit 10 (Display BG2) is enabled
|
||||
* Everything else is disabled
|
||||
|
||||
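If you'd rather let the computer walk the bit list for you, a minimal sketch could look like the following. The helper names `bg_mode` and `bit_is_set` are made up for this illustration and are not part of the `gba` crate.

```rust
/// Bits 0-2 of a raw DISPCNT value: the BG (video) mode.
/// (Hypothetical helper, not part of the `gba` crate.)
fn bg_mode(dispcnt: u16) -> u16 {
    dispcnt & 0b111
}

/// Checks whether bit `n` (0 through 15) is set in a raw DISPCNT value.
/// (Hypothetical helper, not part of the `gba` crate.)
fn bit_is_set(dispcnt: u16, n: u32) -> bool {
    ((dispcnt >> n) & 1) == 1
}
```

Feeding `0x0403` through these should give mode 3 with only bit 10 (Display BG2) set, which matches the list above.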
Naturally, trying to remember exactly what bit does what can be difficult. In
|
||||
the `gba` crate we attempt as much as possible to make types that wrap over a
|
||||
`u16` or `u32` and then have getters and setters _as if_ all the inner bits were
|
||||
different fields.
|
||||
|
||||
* If it's a single bit then the getter/setter will use `bool`.
|
||||
* If it's more than one bit and each pattern has some non-numeric meaning then
|
||||
it'll use an `enum`.
|
||||
* If it's more than one bit and numeric in nature then it'll just use the
|
||||
wrapped integer type. Note that you generally won't get the full range of the
|
||||
inner number type, and any excess gets truncated down to fit in the bits
|
||||
available.
|
||||
|
||||
All the getters and setters are defined as `const` functions, so you can make
|
||||
constant declarations for the exact setting combinations that you want.
|
||||
|
||||
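To make the "phantom fields" idea concrete, here's a rough sketch of what such a wrapper type could look like. This is an illustration only: the name `DisplayControl` and its methods are invented for this example, the video mode is a plain integer here rather than an `enum` to keep it short, and the real types in the `gba` crate may be written quite differently.

```rust
/// Hypothetical wrapper over the raw DISPCNT bits (not the crate's real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DisplayControl(u16);

const MODE_MASK: u16 = 0b111; // bits 0-2
const BG2_BIT: u16 = 1 << 10; // bit 10

impl DisplayControl {
    pub const fn new() -> Self {
        DisplayControl(0)
    }
    /// Multi-bit numeric field: reads back as an integer.
    pub const fn mode(self) -> u16 {
        self.0 & MODE_MASK
    }
    /// Excess bits of `mode` are truncated so they can't spill into other fields.
    pub const fn with_mode(self, mode: u16) -> Self {
        DisplayControl((self.0 & !MODE_MASK) | (mode & MODE_MASK))
    }
    /// Single-bit field: reads back as a `bool`.
    pub const fn bg2(self) -> bool {
        (self.0 & BG2_BIT) != 0
    }
    pub const fn with_bg2(self, on: bool) -> Self {
        DisplayControl((self.0 & !BG2_BIT) | ((on as u16) << 10))
    }
    pub const fn to_bits(self) -> u16 {
        self.0
    }
}

/// Because everything is `const fn`, a whole setting can be a constant.
pub const MODE3_BG2: DisplayControl = DisplayControl::new().with_mode(3).with_bg2(true);
```

With this sketch, `MODE3_BG2.to_bits()` works out to `0x0403`, the same magic number from `hello_magic`.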
## Some Important IO Registers
|
||||
|
||||
It's not easy to automatically see what registers will be important for getting
|
||||
started and what registers can be saved to learn about later.
|
||||
|
||||
We'll go over three IO registers here that will help us the most to get started,
|
||||
then next lesson we'll cover how that Video Mode 3 bitmap drawing works, and
|
||||
then by the end of the next lesson we'll be able to put it all together into
|
||||
something interactive.
|
||||
|
||||
### DISPCNT: Display Control
|
||||
|
||||
The [DISPCNT](https://problemkaputt.de/gbatek.htm#lcdiodisplaycontrol) register
|
||||
lets us affect the major details of our video output. There's a lot of other
|
||||
registers involved too, but it all starts here.
|
||||
|
||||
```rust
|
||||
pub const DISPCNT: VolAddress<DisplayControlSetting> = unsafe { VolAddress::new(0x400_0000) };
|
||||
```
|
||||
|
||||
As you can see, the display control register is, like most registers,
|
||||
complicated enough that we make it a dedicated type with getters and setters for
|
||||
the "phantom" fields. In this case it's mostly a bunch of `bool` values we can
|
||||
set, and also the video mode is an `enum`.
|
||||
|
||||
We already looked at the bit listing above, let's go over what's important right
|
||||
now and skip the other bits:
|
||||
|
||||
* BG Mode sets how the whole screen is going to work and even how the display
|
||||
adapter is going to interpret the bit layout of video memory for pixel
|
||||
processing. We'll start with Mode 3, which is the simplest to learn.
|
||||
* The "Forced Blank" bit is one of the very few bits that starts _on_ at the
|
||||
start of the main program. When it's enabled it prevents the display adapter
|
||||
from displaying anything at all. You use this bit when you need to do a very
|
||||
long change to video memory and you don't want the user to see the
|
||||
intermediate states being partly drawn.
|
||||
* The "Screen Display" bits let us enable different display layers. We care
|
||||
about BG2 right now because the bitmap modes (3, 4, and 5) are all treated as
|
||||
if they were drawing into BG2 (even though it's the only BG layer available in
|
||||
those modes).
|
||||
|
||||
There's a bunch of other stuff, but we'll get to those things later. They're not
|
||||
relevant right now, and there's enough to learn already. Already we can see that
|
||||
when the `hello_magic` demo says
|
||||
|
||||
```rust
|
||||
(0x400_0000 as *mut u16).write_volatile(0x0403);
|
||||
```
|
||||
|
||||
We could re-write that more sensibly like this
|
||||
|
||||
```rust
|
||||
const SETTING: DisplayControlSetting =
|
||||
DisplayControlSetting::new().with_mode(DisplayMode::Mode3).with_bg2(true);
|
||||
DISPCNT.write(SETTING);
|
||||
```
|
||||
|
||||
### VCOUNT: Vertical Display Counter
|
||||
|
||||
The [VCOUNT](https://problemkaputt.de/gbatek.htm#lcdiointerruptsandstatus)
|
||||
register lets us find out what row of pixels (called a **scanline**) is
|
||||
currently being processed.
|
||||
|
||||
```rust
|
||||
pub const VCOUNT: ROVolAddress<u16> = unsafe { ROVolAddress::new(0x400_0006) };
|
||||
```
|
||||
|
||||
You see, the display adapter is constantly running its own loop, alongside the
|
||||
CPU. It starts at the very first pixel of the very first scanline, takes 4
|
||||
cycles to determine what color that pixel is, and then processes the next
|
||||
pixel. Each scanline is 240 pixels long, followed by 68 "virtual" pixels so that
|
||||
you have just a moment to set up for the next scanline to be drawn if you need
|
||||
it. 272 cycles (68*4) is not a lot of time, but it's enough that you could
|
||||
change some palette colors or move some objects around if you need to.
|
||||
|
||||
* Horizontal pixel value `0..240`: "HDraw"
|
||||
* Horizontal pixel value `240..308`: "HBlank"
|
||||
|
||||
There's no way to check the current horizontal counter, but there is a way to
|
||||
have the CPU interrupt the normal code when the HBlank period starts, which
|
||||
we'll learn about later.
|
||||
|
||||
Once a complete scanline has been processed (including the blank period), the
|
||||
display adapter keeps going with the next scanline. Similar to how the
|
||||
horizontal processing works, there's 160 scanlines in the real display, and then
|
||||
it's followed by 68 "virtual" scanlines to give you time for adjusting video
|
||||
memory between the frames of the game.
|
||||
|
||||
* Vertical Count `0..160`: "VDraw"
|
||||
* Vertical Count `160..228`: "VBlank"
|
||||
|
||||
Once every scanline has been processed (including the vblank period), the
|
||||
display adapter starts the whole loop over again with scanline 0. A total of
|
||||
280,896 cycles per display loop (4 * 308 * 228), and about 59.6ns per CPU
|
||||
cycle, gives us a full speed display rate of 59.73fps. That's close enough to
|
||||
60fps that I think we can just round up a bit whenever we're not counting it
|
||||
down to the exact cycle timings.
|
||||
|
||||
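If you want to double check the timing arithmetic, here's a small sketch that derives the numbers from the pixel and scanline counts given above. The constant and function names are invented for this example, and the CPU clock of 2^24 Hz (about 16.78 MHz) is an assumption brought in from outside this chapter.

```rust
const CYCLES_PER_PIXEL: u32 = 4;
const HDRAW_PIXELS: u32 = 240;
const HBLANK_PIXELS: u32 = 68;
const VDRAW_LINES: u32 = 160;
const VBLANK_LINES: u32 = 68;

/// Cycles available in one horizontal blank: 68 * 4.
const HBLANK_CYCLES: u32 = HBLANK_PIXELS * CYCLES_PER_PIXEL;

/// Cycles in one full display loop: 4 * 308 * 228.
const CYCLES_PER_FRAME: u32 =
    CYCLES_PER_PIXEL * (HDRAW_PIXELS + HBLANK_PIXELS) * (VDRAW_LINES + VBLANK_LINES);

/// Assumption: the CPU runs at 2^24 Hz.
const CPU_HZ: f64 = 16_777_216.0;

fn frames_per_second() -> f64 {
    CPU_HZ / f64::from(CYCLES_PER_FRAME)
}

fn nanoseconds_per_cycle() -> f64 {
    1.0e9 / CPU_HZ
}
```

Under that clock assumption the frame rate comes out to roughly 59.73fps, in line with the figure above.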
However, there's a bit of a snag. If we change video memory during the middle of
|
||||
a scanline the display will _immediately_ start processing using the new state
|
||||
of video memory. The picture before the change and after the change won't look
|
||||
like a single, clean picture. Instead you'll get what's called "[screen
|
||||
tearing](https://en.wikipedia.org/wiki/Screen_tearing)", which is usually
|
||||
considered to be the mark of a badly programmed game.
|
||||
|
||||
To avoid this we just need to only adjust video memory during one of the blank
|
||||
periods. If you're really cool you can adjust things during HBlank, but we're
|
||||
not that cool yet. Starting out our general program flow will be:
|
||||
|
||||
1) Gather input for the frame (next part of this lesson) and update the game
|
||||
state, getting everything ready for when VBlank actually starts.
|
||||
2) Once VBlank starts we update all of the video memory as fast as we can.
|
||||
3) Once we're done drawing we again wait for the VDraw period to begin and then
|
||||
do it all again.
|
||||
|
||||
Now, it's not the most efficient way, but to get our timings right we can just
|
||||
read from `VCOUNT` over and over in a "busy loop". Once we read a value of 160
|
||||
we know that we've entered VBlank. Once it goes back to 0 we know that we're
|
||||
back in VDraw.
|
||||
|
||||
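As a rough sketch, the two busy loops could be written like this. The function names are hypothetical, and the `read_vcount` closure is a stand-in for something like `VCOUNT.read()` so that the logic can also be tried out on a desktop. Note that this version compares against the VBlank range instead of waiting for exactly 160, which is a small change from the description above so that a missed scanline can't make the loop spin for a whole extra frame.

```rust
/// First scanline of the vertical blank period.
const VBLANK_START: u16 = 160;

/// Spins until the counter reports a VBlank scanline (160 or higher).
/// `read_vcount` stands in for a volatile read of the VCOUNT register.
fn spin_until_vblank(mut read_vcount: impl FnMut() -> u16) {
    while read_vcount() < VBLANK_START {}
}

/// Spins until the counter wraps back around into VDraw (below 160).
fn spin_until_vdraw(mut read_vcount: impl FnMut() -> u16) {
    while read_vcount() >= VBLANK_START {}
}
```

On real hardware you'd pass a closure that does the register read; in a test you can pass a fake counter that just increments and wraps at 228.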
Doing a busy loop like this actually drains the batteries way more than
|
||||
necessary. It keeps the CPU active constantly, which is what uses a fair amount
|
||||
of the power. Normally you're supposed to put the CPU to sleep if you're just
|
||||
waiting around for something to happen. However, that also requires learning
|
||||
about some more concepts to get right. So to keep things easier starting out
|
||||
we'll do the bad/lazy version and then upgrade our technique later.
|
||||
|
||||
### KEYINPUT: Key Input Reading
|
||||
|
||||
The [KEYINPUT](https://problemkaputt.de/gbatek.htm#gbakeypadinput) register is
|
||||
the last one we've got to learn about this lesson. It lets you check the status
|
||||
of all 10 buttons on the GBA.
|
||||
|
||||
```rust
|
||||
pub const KEYINPUT: ROVolAddress<u16> = unsafe { ROVolAddress::new(0x400_0130) };
|
||||
```
|
||||
|
||||
There's little to say here. It's a read only register, and the data just
|
||||
contains one bit per button. The only thing that's a little weird about it is
|
||||
that the bits follow a "low active" convention, so if the button is pressed then
|
||||
the bit is 0, and if the button is released the bit is 1.
|
||||
|
||||
You _could_ work with that directly, but I think it's a lot easier to think
|
||||
about having `true` for pressed and `false` for not pressed. So the `gba` crate
|
||||
flips the bits when you read the keys:
|
||||
|
||||
```rust
|
||||
/// Gets the current state of the keys
|
||||
pub fn read_key_input() -> KeyInput {
|
||||
KeyInput(KEYINPUT.read() ^ 0b0000_0011_1111_1111)
|
||||
}
|
||||
```
|
||||
|
||||
Now we can treat the KeyInput values like a totally normal bitset.
|
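Here's a small sketch of working with the flipped value as a bitset. The constant and function names are invented for this example, and the bit positions (A in bit 0, B in bit 1, Start in bit 3) are taken from the GBATEK keypad listing rather than from this chapter.

```rust
/// The ten button bits of KEYINPUT.
const KEY_MASK: u16 = 0b0000_0011_1111_1111;

// Bit positions per GBATEK (an assumption from outside this chapter).
const KEY_A: u16 = 1 << 0;
const KEY_B: u16 = 1 << 1;
const KEY_START: u16 = 1 << 3;

/// Turns the raw "low active" register value into "1 means pressed".
fn pressed_from_raw(raw: u16) -> u16 {
    raw ^ KEY_MASK
}

/// Checks a single button in the flipped value.
fn is_pressed(pressed: u16, key: u16) -> bool {
    (pressed & key) != 0
}
```

So a raw reading of `0x03FF` means nothing is held, and a raw reading of `0x03F6` means A and Start are both held.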
|
@ -1,379 +0,0 @@
|
|||
# The Hardware Memory Map
|
||||
|
||||
So we saw `hello_magic.rs` and then we learned what `volatile` was all about,
|
||||
but we've still got a few things that are a bit mysterious. You can't just cast
|
||||
a number into a pointer and start writing to it! That's totally crazy! That's
|
||||
writing to un-allocated memory! Against the rules!
|
||||
|
||||
Well, _kinda_. It's true that you're not allowed to write just _anywhere_, but
|
||||
those locations were carefully selected.
|
||||
|
||||
You see, on a modern computer if you need to check if a key is pressed you ask
|
||||
the Operating System (OS) to please go check for you. If you need to play a
|
||||
sound, you ask the OS to please play the sound on a default sound output. If you
|
||||
need to show a picture you ask the OS to give you access to the video driver so
|
||||
that you can ask the video driver to please put some pixels on the screen.
|
||||
That's mostly fine, except how does the OS actually do it? It doesn't have an OS
|
||||
to go ask, it has to stop somewhere.
|
||||
|
||||
Ultimately, every piece of hardware is mapped into somewhere in the address
|
||||
space of the CPU. You can't actually tell that this is the case as a normal user
|
||||
because your program runs inside a virtualized address space. That way you can't
|
||||
go writing into another program's memory and crash what they're doing or steal
|
||||
their data (well, hopefully, it's obviously not perfect). Outside of the
|
||||
virtualization layer the OS is running directly in the "true" address space, and
|
||||
it can access the hardware on behalf of a program whenever it's asked to.
|
||||
|
||||
How does directly accessing the hardware work, _precisely_? It's just the same
|
||||
as accessing the RAM. Each address holds some bits, and the CPU picks an address
|
||||
and loads in the bits. Then the program gets the bits and has to decide what
|
||||
they mean. The "driver" of a hardware device is just the layer that translates
|
||||
between raw bits in the outside world and more meaningful values inside of the
|
||||
program.
|
||||
|
||||
Of course, memory mapped hardware can change its bits at any time. The user can
|
||||
press and release a key and you can't stop them. This is where `volatile` comes
|
||||
in. Whenever there's memory mapped hardware you want to access it with
|
||||
`volatile` operations so that you can be sure that you're sending the data every
|
||||
time, and that you're getting fresh data every time.
|
||||
|
||||
## GBA Specifics
|
||||
|
||||
That's enough about the general concept of memory mapped hardware, let's get to
|
||||
some GBA specifics. The GBA has the following sections in its memory map.
|
||||
|
||||
* BIOS
|
||||
* External Work RAM (EWRAM)
|
||||
* Internal Work RAM (IWRAM)
|
||||
* IO Registers
|
||||
* Palette RAM (PALRAM)
|
||||
* Video RAM (VRAM)
|
||||
* Object Attribute Memory (OAM)
|
||||
* Game Pak ROM (ROM)
|
||||
* Save RAM (SRAM)
|
||||
|
||||
Each of these has a few key points of interest:
|
||||
|
||||
* **Bus Width:** Also just called "bus", this is how many little wires are
|
||||
_physically_ connecting a part of the address space to the CPU. If you need to
|
||||
transfer more data than fits in the bus you have to do repeated transfers
|
||||
until it all gets through.
|
||||
* **Read/Write Modes:** Most parts of the address space can be read from in 8,
|
||||
16, or 32 bits at a time (there's a few exceptions we'll see). However, a
|
||||
significant portion of the address space can't accept 8 bit writes. Usually
|
||||
this isn't a big deal, but the standard `memcpy` routine switches to doing a
|
||||
byte-by-byte copy in some situations, so we'll have to be careful about using
|
||||
it in combination with those regions of the memory.
|
||||
* **Access Speed:** On top of the bus width issue, not all memory can be
|
||||
accessed at the same speed. The "fast" parts of memory can do a read or write
|
||||
in 1 cycle, but the slower parts of memory can take a few cycles per access.
|
||||
These are called "wait cycles". The exact timings depend on what you configure
|
||||
the system to use, which is also limited by what your cartridge physically
|
||||
supports. You'll often see timings broken down into `N` cycles (non-sequential
|
||||
memory access) and `S` cycles (sequential memory access, often faster). There
|
||||
are also `I` cycles (internal cycles) which happen whenever the CPU does an
|
||||
internal operation that's more than one cycle to complete (like a multiply).
|
||||
Don't worry, you don't have to count exact cycle timings unless you're on the
|
||||
razor's edge of the GBA's abilities. For more normal games you just have to be
|
||||
mindful of what you're doing and it'll be fine.
|
||||
|
||||
Let's briefly go over the major talking points of each memory region. All of
|
||||
this information is also available in GBATEK, mostly in their [memory
|
||||
map](http://www.akkit.org/info/gbatek.htm#gbamemorymap) section (though somewhat
|
||||
spread through the rest of the document too).
|
||||
|
||||
Though I'm going to list the location range of each memory space below, most of
|
||||
the hardware locations are actually mirrored at several points throughout the
|
||||
address space.
|
||||
|
||||
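As a quick orientation aid, here's a sketch that sorts an address into one of the regions, using the primary location ranges listed in the sections below. The `Region` type and `region_of` function are made up for this example, and the sketch deliberately ignores mirrors, so a mirrored address shows up as `Unmapped` here even though the hardware would respond to it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Region {
    Bios,
    Ewram,
    Iwram,
    Io,
    Palram,
    Vram,
    Oam,
    Rom,
    Sram,
    Unmapped,
}

/// Classifies an address by its primary (non-mirrored) range.
fn region_of(addr: u32) -> Region {
    match addr {
        0x0000_0000..=0x0000_3FFF => Region::Bios,
        0x0200_0000..=0x0203_FFFF => Region::Ewram,
        0x0300_0000..=0x0300_7FFF => Region::Iwram,
        0x0400_0000..=0x0400_03FE => Region::Io,
        0x0500_0000..=0x0500_03FF => Region::Palram,
        0x0600_0000..=0x0601_7FFF => Region::Vram,
        0x0700_0000..=0x0700_03FF => Region::Oam,
        // All three ROM wait state views, taken together.
        0x0800_0000..=0x0DFF_FFFF => Region::Rom,
        // SRAM starts here and is at most 64k.
        0x0E00_0000..=0x0E00_FFFF => Region::Sram,
        _ => Region::Unmapped,
    }
}
```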
### BIOS
|
||||
|
||||
* **Location:** `0x0` to `0x3FFF`
|
||||
* **Bus:** 32-bit
|
||||
* **Access:** Memory protected read-only (see text).
|
||||
* **Wait Cycles:** None
|
||||
|
||||
The "basic input output system". This contains a grab bag of utilities that do
|
||||
various tasks. The code is optimized for small size rather than great speed, so
|
||||
you can sometimes write faster versions of these routines. Also, calling a bios
|
||||
function has more overhead than a normal function call. You can think of bios
|
||||
calls as being similar to system calls to the OS on a desktop computer. Useful,
|
||||
but costly.
|
||||
|
||||
As a side note, not only is BIOS memory read only, but it's memory protected so
|
||||
that you can't even read from bios memory unless the system is currently
|
||||
executing a function that's in bios memory. If you try then the system just
|
||||
gives back a nonsensical value that's not really what you asked for. If you
|
||||
really want to know what's inside, there's actually a bug in one bios call
|
||||
(`MidiKey2Freq`) that lets you read the bios section one byte at a time.
|
||||
|
||||
Also, there's not just one bios! Of course there's the official bios from
|
||||
Nintendo that's used on actual hardware, but since that's code instead of
|
||||
hardware it's protected by copyright. Since a bios is needed to run a GBA
|
||||
emulator properly, people have come up with their own open source versions or
|
||||
they simply make the emulator special case the bios and act _as if_ the function
|
||||
call had done the right thing.
|
||||
|
||||
* The [TempGBA](https://github.com/Nebuleon/TempGBA) repository has an easy to
|
||||
look at version written in assembly. Its API and effects are close enough to
|
||||
the Nintendo version that most games will run just fine.
|
||||
* You can also check out the [mGBA
|
||||
bios](https://github.com/mgba-emu/mgba/blob/master/src/gba/bios.c) if you want
|
||||
to see the C version of what various bios functions are doing.
|
||||
|
||||
### External Work RAM (EWRAM)
|
||||
|
||||
* **Location:** `0x200_0000` to `0x203_FFFF` (256k)
|
||||
* **Bus:** 16-bit
|
||||
* **Access:** Read-write, any size.
|
||||
* **Wait Cycles:** 2
|
||||
|
||||
The external work ram is a sizable amount of space, but the 2 wait cycles per
|
||||
access and 16-bit bus mean that you should probably think of it as being a
|
||||
"heap" to avoid putting things in if you don't have to.
|
||||
|
||||
The GBA itself doesn't use this for anything, so any use is totally up to you.
|
||||
|
||||
At the moment, the linker script and `crt0.s` files provided with the `gba`
|
||||
crate also have no defined use for the EWRAM, so it's 100% on you to decide how
|
||||
you wanna use it.
|
||||
|
||||
(Note: There is an undocumented control register that lets you adjust the wait
|
||||
cycles on EWRAM. Using it, you can turn EWRAM from the default 2 wait cycles
|
||||
down to 1. However, not all GBA-like things support it. The GBA and GBA SP do,
|
||||
the GBA Micro and DS do not. Emulators might or might not depending on the
|
||||
particular emulator. See the [GBATEK system
|
||||
control](https://problemkaputt.de/gbatek.htm#gbasystemcontrol) page for a full
|
||||
description of that register, though probably only once you've read more of this
|
||||
tutorial book and know how to make sense of IO registers and such.)
|
||||
|
||||
### Internal Work RAM (IWRAM)
|
||||
|
||||
* **Location:** `0x300_0000` to `0x300_7FFF` (32k)
|
||||
* **Bus:** 32-bit
|
||||
* **Access:** Read-write, any size.
|
||||
* **Wait Cycles:** 0
|
||||
|
||||
This is where the "fast" memory for general purposes lives. By default the
|
||||
system uses the 256 _bytes_ starting at `0x300_7F00` _and up_ for system and
|
||||
interrupt purposes, while Rust's program stack starts at that same address _and
|
||||
goes down_ from there.
|
||||
|
||||
Even though your stack exists in this space, it's totally reasonable to use the
|
||||
bottom parts of this memory space for whatever quick scratch purposes, same as
|
||||
EWRAM. 32k is fairly huge, and the stack going down from the top and the scratch
|
||||
data going up from the bottom are unlikely to hit each other. If they do you
|
||||
were probably well on your way to a stack overflow anyway.
|
||||
|
||||
The linker script and `crt0.s` file provided with the `gba` crate use the bottom
|
||||
of IWRAM to store the `.data` and `.bss` [data
|
||||
segments](https://en.wikipedia.org/wiki/Data_segment). That's where your global
|
||||
variables get placed (both `static` and `static mut`). The `.data` segment holds
|
||||
any variable that's initialized to non-zero, and the `.bss` section is for any
|
||||
variable initialized to zero. When the GBA is powered on, some code in the
|
||||
`crt0.s` file runs and copies the initial `.data` values into place within IWRAM
|
||||
(all of `.bss` starts at 0, so there's no copy for those variables).
|
||||
|
||||
If you have no global variables at all, then you don't need to worry about those
|
||||
details, but if you do have some global variables then you can use the _address
|
||||
of_ the `__bss_end` symbol defined in the top of the `gba` crate as a marker for
|
||||
where it's safe for you to start using IWRAM without overwriting your globals.
|
||||
|
||||
### IO Registers
|
||||
|
||||
* **Location:** `0x400_0000` to `0x400_03FE`
|
||||
* **Bus:** 32-bit
|
||||
* **Access:** different for each IO register
|
||||
* **Wait Cycles:** 0
|
||||
|
||||
The IO Registers are where most of the magic happens, and it's where most of the
|
||||
variety happens too. Each IO register is a specific width, usually 16-bit but
|
||||
sometimes 32-bit. Most of them are fully read/write, but some of them are read
|
||||
only or write only. Some of them have individual bits that are read only even
|
||||
when the rest of the register is writable. Some of them can be written to, but
|
||||
the write doesn't change the value you read back, it sets something else.
|
||||
Really.
|
||||
|
||||
The IO registers are how you control every bit of hardware besides the CPU
|
||||
itself. Reading the buttons, setting display modes, enabling timers, all of that
|
||||
goes through different IO registers. Actually, even a few parts of the CPU's
|
||||
operation can be controlled via IO register.
|
||||
|
||||
We'll go over IO registers more in the next section, including a few specific
|
||||
registers, and then we'll constantly encounter more IO registers as we explore
|
||||
each new topic through the rest of the book.
|
||||
|
||||
### Palette RAM (PALRAM)
|
||||
|
||||
* **Location:** `0x500_0000` to `0x500_03FF` (1k)
|
||||
* **Bus:** 16-bit
|
||||
* **Access:** Read any, single bytes mirrored (see text).
|
||||
* **Wait Cycles:** Video Memory Wait (see text)
|
||||
|
||||
This is where the GBA stores color palette data. There's 256 slots for
|
||||
Background color, and then 256 slots for Object color.
|
||||
|
||||
GBA colors are 15 bits each, with five bits per channel and the highest bit
|
||||
being totally ignored, so we store them as `u16` values:
|
||||
|
||||
* `X_BBBBB_GGGGG_RRRRR`
|
||||
|
||||
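A minimal sketch of packing a color into that layout might look like this. The `rgb15` name is made up for this example; each channel is assumed to already be in the range 0 through 31, and anything larger gets masked down.

```rust
/// Packs 5-bit red, green, and blue channels into `X_BBBBB_GGGGG_RRRRR`.
/// (Hypothetical helper; channel values above 31 are masked down.)
const fn rgb15(red: u16, green: u16, blue: u16) -> u16 {
    (red & 0b1_1111) | ((green & 0b1_1111) << 5) | ((blue & 0b1_1111) << 10)
}
```

Full red, full green, and full blue come out as `0x001F`, `0x03E0`, and `0x7C00`, which you might recognize as the three values written by `hello_magic`.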
Of note is the fact that the 256 palette slots can be viewed in two different
|
||||
ways. There's two different formats for images in video memory: "8 bit per
|
||||
pixel" (8bpp) and "4 bit per pixel" (4bpp).
|
||||
|
||||
* **8bpp:** Each pixel in the image is 8 bits and indexes directly into the full
|
||||
256 entry palette array. An index of 0 means that pixel should be transparent,
|
||||
so there's 255 possible colors.
|
||||
* **4bpp:** Each pixel in the image is 4 bits and indexes into a "palbank" of 16
|
||||
colors within the palette data. Some exterior control selects the palbank to
|
||||
be used. An index of 0 still means that the pixel should be transparent, so
|
||||
there's 15 possible colors.
|
||||
|
||||
Different images can use different modes all at once, as long as you can fit all
|
||||
the colors you want to use into your palette layout.
|
||||
|
||||
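To illustrate the two views, here's a sketch of how a pixel value could be turned into a slot within one 256 entry palette. The function names are invented for this example, and `None` is used to represent a transparent pixel.

```rust
/// 8bpp: the pixel value is the palette slot, except 0 which is transparent.
fn slot_8bpp(pixel: u8) -> Option<usize> {
    if pixel == 0 {
        None
    } else {
        Some(pixel as usize)
    }
}

/// 4bpp: the low 4 bits index into one of 16 palbanks of 16 colors each.
/// Index 0 within any palbank is still transparent.
fn slot_4bpp(palbank: u8, pixel: u8) -> Option<usize> {
    let index = (pixel & 0xF) as usize;
    if index == 0 {
        None
    } else {
        Some((palbank & 0xF) as usize * 16 + index)
    }
}
```

For example, index 5 in palbank 2 lands on slot 37, which is the same slot that an 8bpp pixel with value 37 would use.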
PALRAM can't be written to in individual bytes. This isn't normally a problem at
|
||||
all, because you wouldn't really want to write half of a color entry anyway. If
|
||||
you do try to write a single byte then it gets "mirrored" into both halves of
|
||||
the `u16` that would be associated with that address. For example, if you tried
|
||||
to write `0x01u8` to either `0x500_0000` or `0x500_0001` then you'd actually
|
||||
_effectively_ be writing `0x0101u16` to `0x500_0000`.
|
||||
|
||||
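Here's a tiny simulation of that mirroring behavior, using an ordinary slice of `u16` values as a stand-in for PALRAM. The function name is hypothetical and this only models the effect described above, it doesn't touch real hardware.

```rust
/// Models a single byte write to PALRAM: the byte ends up in both halves
/// of the `u16` that contains the target address.
/// `byte_offset` is the offset in bytes from the start of PALRAM.
fn palram_write_u8(palram: &mut [u16], byte_offset: usize, value: u8) {
    let mirrored = ((value as u16) << 8) | (value as u16);
    palram[byte_offset / 2] = mirrored;
}
```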
PALRAM follows what we'll call the "Video Memory Wait" rule: If you access
|
||||
the memory during a vertical blank or horizontal blank period there's 0 wait
|
||||
cycles, and if you try to access the memory while the display controller is
|
||||
drawing there is a 1 cycle wait inserted _if_ the display controller was using
|
||||
that memory at that moment.
|
||||
|
||||
### Video RAM (VRAM)
|
||||
|
||||
* **Location:** `0x600_0000` to `0x601_7FFF` (96k or 64k+32k depending on mode)
|
||||
* **Bus:** 16-bit
|
||||
* **Access:** Read any, single bytes _sometimes_ mirrored (see text).
|
||||
* **Wait Cycles:** Video Memory Wait (see text)
|
||||
|
||||
Video RAM is the memory for what you want the display controller to be
|
||||
displaying. The GBA actually has 6 different display modes (numbered 0 through
|
||||
5), and depending on the mode you're using the layout that you should imagine
|
||||
VRAM having changes. Because there's so much involved here, I'll leave more
|
||||
precise details to the following sections which talk about how to use VRAM in
|
||||
each mode.
|
||||
|
||||
VRAM can't be written to in individual bytes. If you try to write a single byte
|
||||
to background VRAM the byte gets mirrored like with PALRAM, and if you try with
|
||||
object VRAM the write gets ignored entirely. Exactly what address ranges those
|
||||
memory types are depends on video mode, but just don't bother with individual
|
||||
byte writes to VRAM. If you want to change a single byte of data (and you might)
|
||||
then the correct style is to read the full `u16`, mask out the old data, mask in
|
||||
your new value, and then write the whole `u16`.
|
||||
|
||||
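The "mask out, mask in" step can be sketched as a pure function on the `u16` you read. The name is made up for this example, and the byte order assumes the GBA's little-endian layout, where the byte at the even address is the low half.

```rust
/// Replaces one byte inside a `u16`, leaving the other byte alone.
/// `byte_index` 0 is the low byte (even address), 1 is the high byte.
fn set_byte_in_halfword(old: u16, byte_index: usize, value: u8) -> u16 {
    let shift = (byte_index & 1) * 8;
    let cleared = old & !(0xFF_u16 << shift); // mask out the old data
    cleared | ((value as u16) << shift) // mask in the new value
}
```

On hardware you'd do a volatile `u16` read, pass the result through a function like this, and then do a volatile `u16` write of what comes back.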
VRAM follows the same "Video Memory Wait" rule that PALRAM has.
|
||||
|
||||
### Object Attribute Memory (OAM)
|
||||
|
||||
* **Location:** `0x700_0000` to `0x700_03FF` (1k)
|
||||
* **Bus:** 32-bit
|
||||
* **Access:** Read any, single bytes no effect (see text).
|
||||
* **Wait Cycles:** Video Memory Wait (see text)
|
||||
|
||||
This part of memory controls the "Objects" (OBJ) on the screen. An object is
|
||||
_similar to_ the concept of a "sprite". However, because of an object's size
|
||||
limitations, a single sprite might require more than one object to be drawn
|
||||
properly. In general, if you want to think in terms of sprites at all, you
|
||||
should think of sprites as being a logical / programming concept, and objects as
|
||||
being a hardware concept.
|
||||
|
||||
While VRAM has the _image_ data for each object, this part of memory has the
|
||||
_control_ data for each object. An object's "attributes" describe what part of
|
||||
the VRAM to use, where to place it on the screen, any special graphical effects
|
||||
to use, all that stuff. Each object has 6 bytes of attribute data (arranged as
|
||||
three `u16` values), and there's a total of 128 objects (indexed 0 through 127).
|
||||
|
||||
But 6 bytes each times 128 entries out of 1024 bytes leaves us with 256 bytes
|
||||
left over. What's the other space used for? Well, it's a little weird, but after
|
||||
every three `u16` object attribute fields there's one `i16` "affine parameter"
|
||||
field mixed in. It takes four such fields to make a complete set of affine
|
||||
parameters (a 2x2 matrix), so we get a total of 32 affine parameter entries
|
||||
across all of OAM. "Affine" might sound fancy but it just means a transformation
|
||||
where anything that started parallel stays parallel after the transform. The
|
||||
affine parameters can be used to scale, rotate, and/or skew a background or
|
||||
object as it's being displayed on the screen. It takes more computing power than
|
||||
the non-affine display, so you can't display as many different things at once
|
||||
when using the affine modes.
|
||||
|
||||
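The interleaving is easier to see as address arithmetic. This sketch assumes the layout described above (three attribute `u16` values followed by one affine parameter `i16` in every 8 byte slot); the function names are invented for this example.

```rust
const OAM_BASE: usize = 0x700_0000;

/// Address of attribute `attr` (0, 1, or 2) of object `obj` (0 through 127).
const fn obj_attr_addr(obj: usize, attr: usize) -> usize {
    OAM_BASE + obj * 8 + attr * 2
}

/// Address of parameter `param` (0 through 3) of affine set `set` (0 through 31).
/// Each set borrows the fourth `u16` slot of four consecutive objects.
const fn affine_param_addr(set: usize, param: usize) -> usize {
    OAM_BASE + set * 32 + param * 8 + 6
}

/// Bytes left over once 128 objects have taken 6 bytes each.
const LEFTOVER_BYTES: usize = 1024 - 128 * 6;

/// Leftover bytes, 2 bytes per parameter, 4 parameters per set.
const AFFINE_SETS: usize = LEFTOVER_BYTES / 2 / 4;
```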
OAM can't ever be written to with individual bytes. The write just has no effect
|
||||
at all.
|
||||
|
||||
OAM follows the same "Video Memory Wait" rule that PALRAM has, **and** you can
|
||||
also only freely access OAM during a horizontal blank if you set a special
|
||||
"HBlank Interval Free" bit in one of the IO registers (the "Display Control"
|
||||
register, which we'll talk about next lesson). The reason that you might _not_
|
||||
want to set that bit is because when it's enabled you can't draw as many objects
|
||||
at once. You don't lose the use of an exact number of objects, you actually lose
|
||||
the use of a number of display adapter drawing cycles. Since not all objects
|
||||
take the same number of cycles to render, it depends on what you're drawing.
|
||||
GBATEK [has the details](https://problemkaputt.de/gbatek.htm#lcdobjoverview) if
|
||||
you want to know precisely.
|
||||
|
||||
### Game Pak ROM (ROM)
|
||||
|
||||
* **Location:** Special (max of 32MB)
|
||||
* **Bus:** 16-bit
|
||||
* **Access:** Special
|
||||
* **Wait Cycles:** Special
|
||||
|
||||
This is where your actual game is located! As you might guess, since each
|
||||
cartridge is different, the details here depend quite a bit on the cartridge
|
||||
that you use for your game. Even a simple statement like "you can't write to the
|
||||
ROM region" isn't true for some carts if they have FlashROM.
|
||||
|
||||
The _most important_ thing to concern yourself with when considering the ROM
|
||||
portion of memory is the 32MB limit. That's compiled code, images, sound,
|
||||
everything put together. The total has to stay under 32MB.
|
||||
|
||||
The next most important thing to consider is that 16-bit bus. It means that we
|
||||
compile our programs using "Thumb state" code instead of "ARM state" code.
|
||||
Details about this can be found in the GBA Assembly section of the book, but
|
||||
just be aware that there's two different types of assembly on the GBA. You can
|
||||
switch between them, but the default for us is always Thumb state.
|
||||
|
||||
Another detail which you actually _don't_ have to think about much, but that you
|
||||
might care about if you're doing precise optimization, is that the ROM address space
|
||||
is actually mirrored across three different locations:
|
||||
|
||||
* `0x800_0000` to `0x9FF_FFFF`: Wait State 0
|
||||
* `0xA00_0000` to `0xBFF_FFFF`: Wait State 1
|
||||
* `0xC00_0000` to `0xDFF_FFFF`: Wait State 2
|
||||
|
||||
These _don't_ mean 0, 1, and 2 wait cycles, they mean the wait cycles associated
|
||||
with ROM mirrors 0, 1, and 2. On some carts the game will store different parts
|
||||
of the data into different chips that are wired to be accessible through
|
||||
different parts of the mirroring. The actual wait cycles used are even
|
||||
configurable via an IO register called the
|
||||
[WAITCNT](https://problemkaputt.de/gbatek.htm#gbasystemcontrol) ("Wait Control",
|
||||
I don't know why C programmers have to give everything the worst names it's not
|
||||
1980 any more).
|
||||
|
||||
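Since the three views are the same size and sit back to back, finding which view an address belongs to is simple arithmetic. Here's a sketch; the function name is hypothetical.

```rust
const ROM_BASE: u32 = 0x0800_0000;
const ROM_VIEW_SIZE: u32 = 0x0200_0000; // 32MB per view

/// For an address in the ROM area, gives back which wait state view it is in
/// (0, 1, or 2) and the offset into the cartridge ROM. `None` if the address
/// isn't in the ROM area at all.
fn rom_view(addr: u32) -> Option<(u8, u32)> {
    match addr {
        0x0800_0000..=0x0DFF_FFFF => {
            let relative = addr - ROM_BASE;
            Some(((relative / ROM_VIEW_SIZE) as u8, relative % ROM_VIEW_SIZE))
        }
        _ => None,
    }
}
```

So `0x0A00_0010` is offset `0x10` as seen through Wait State 1, the same ROM offset that `0x0800_0010` reaches through Wait State 0.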
### Save RAM (SRAM)
|
||||
|
||||
* **Location:** Special (max of 64k)
|
||||
* **Bus:** 8-bit
|
||||
* **Access:** Special
|
||||
* **Wait Cycles:** Special
|
||||
|
||||
The Save RAM is also part of the cart that you've got your game on, so it also
|
||||
depends on your hardware.
|
||||
|
||||
SRAM _starts_ at `0xE00_0000` and you can save up to however much the hardware
|
||||
supports, to a maximum of 64k. However, you can only read and write SRAM one
|
||||
_byte_ at a time. What's worse, while you can _write_ to SRAM using code
|
||||
executing anywhere, you can only _read_ with code that's executing out of either
|
||||
Internal or External Work RAM, not with code that's executing out of ROM.
|
||||
This means that you need to copy the code for doing the read into some scratch
|
||||
space (either at startup or on the fly, doesn't matter) and call that function
|
||||
you've carefully placed. It's a bit annoying, but soon enough a routine for it
|
||||
all will be provided in the `gba` crate and we won't have to worry too much
|
||||
about it.
|
||||
|
||||
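Just to show the "one byte at a time" part, here's a sketch of the write direction only. This is not the routine from the `gba` crate; the function name is invented, and an ordinary byte slice stands in for the SRAM region so the code can run anywhere. The read direction has the extra "must run from Work RAM" requirement described above, which this sketch doesn't attempt to handle.

```rust
use std::ptr::write_volatile;

/// Copies `data` into `sram` strictly one byte at a time, using a volatile
/// write for each byte. Gives back how many bytes were actually written
/// (the copy stops early if `sram` is too small).
fn save_bytes(sram: &mut [u8], data: &[u8]) -> usize {
    let count = data.len().min(sram.len());
    let base = sram.as_mut_ptr();
    for (i, &byte) in data.iter().take(count).enumerate() {
        // Safety: `i < count <= sram.len()`, so the pointer stays in bounds.
        unsafe { write_volatile(base.add(i), byte) };
    }
    count
}
```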
(TODO: Provide the routine that I just claimed we would provide.)
|
|
@ -1,48 +0,0 @@
|
|||
# Volatile
|
||||
|
||||
I know that you just got your first program running and you're probably excited
|
||||
to learn more about GBA stuff, but first we have to cover a subject that's not
|
||||
quite GBA specific.
|
||||
|
||||
In the `hello_magic.rs` file we had these lines
|
||||
|
||||
```rust
|
||||
(0x600_0000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
|
||||
(0x600_0000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
|
||||
(0x600_0000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
|
||||
```
|
||||
|
||||
You've probably seen or heard of the
|
||||
[write](https://doc.rust-lang.org/core/ptr/fn.write.html) function before, but
|
||||
you'd be excused if you've never heard of its cousin function,
|
||||
[write_volatile](https://doc.rust-lang.org/core/ptr/fn.write_volatile.html).
|
||||
|
||||
What's the difference? Well, when the compiler sees normal reads and writes, it
|
||||
assumes that those go into plain old memory locations. CPU registers, RAM,
|
||||
wherever it is that the value's being placed. The compiler assumes that it's
|
||||
safe to optimize away some of the reads and writes, or maybe issue the reads and
|
||||
writes in a different order from what you wrote. Normally this is okay, and it's
|
||||
exactly what we want the compiler to be doing, quietly making things faster for us.
|
||||
|
||||
However, some of the time we access values from parts of memory where it's
|
||||
important that each access happen, and in the exact order that we say. In our
|
||||
`hello_magic.rs` example, we're writing directly into the video memory of the
|
||||
display. The compiler sees that the rest of the Rust program never reads out of
|
||||
those locations, so it might think "oh, we can skip those writes, they're
|
||||
pointless". It doesn't know that we're having a side effect besides just storing
|
||||
some value at an address.
|
||||
|
||||
By declaring a particular read or write to be `volatile`, we can force the
|
||||
compiler to issue that access. Further, we're guaranteed that all `volatile`
|
||||
access will happen in exactly the order it appears in the program relative to
|
||||
other `volatile` access. However, non-volatile access can still be re-ordered
|
||||
relative to a volatile access. In other words, for parts of the memory that are
|
||||
volatile, we must _always_ use a volatile read or write for our program to
|
||||
perform properly.
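To make that concrete, here's a small sketch using only `core::ptr`. The function name `write_twice_then_read` is made up for illustration. It targets an ordinary `u16` instead of a hardware address so that it can run anywhere. With plain writes the compiler would be free to drop the first store, since it is overwritten immediately; with volatile writes both stores must be issued, in the order written.

```rust
use core::ptr;

/// Stores two values to the same location back to back, then reads it.
pub fn write_twice_then_read(slot: &mut u16) -> u16 {
    let p: *mut u16 = slot;
    unsafe {
        // Both volatile stores are emitted, in this order, even though
        // the first value is never read before being overwritten.
        ptr::write_volatile(p, 0x001F);
        ptr::write_volatile(p, 0x03E0);
        // The volatile load is also emitted rather than being replaced
        // with a value the compiler already "knows".
        ptr::read_volatile(p)
    }
}
```

On a normal variable the end result is the same as with plain writes. The difference only matters when something other than our program is watching the location, as with video memory.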
|
||||
|
||||
For exactly this reason, we've got the [voladdress](https://docs.rs/voladdress/)
|
||||
crate. It used to be part of the GBA crate, but it became big enough to break
|
||||
out into a standalone crate. It doesn't even do too much; it just makes it a
|
||||
lot harder to accidentally forget to use volatile with our memory
|
||||
mapped addresses. We just call `read` and `write` on any `VolAddress` that we
|
||||
happen to see and the right thing will happen.
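To show the idea, here's a deliberately tiny sketch of such a wrapper. `VolPtr` is a hypothetical stand-in written for this page; it is not the `voladdress` API, which has more features and may differ in its details. The point is only that the unsafe promise is made once, at construction, and after that every `read` and `write` is volatile whether we remember or not.

```rust
use core::marker::PhantomData;
use core::ptr;

/// Hypothetical stand-in for the idea behind `VolAddress`.
#[derive(Clone, Copy)]
pub struct VolPtr<T> {
    address: usize,
    _marker: PhantomData<*mut T>,
}

impl<T: Copy> VolPtr<T> {
    /// # Safety
    /// `address` must be a valid, aligned location for volatile access
    /// to a `T` for as long as the returned value is in use.
    pub const unsafe fn new(address: usize) -> Self {
        Self { address, _marker: PhantomData }
    }

    /// Always a volatile load.
    pub fn read(self) -> T {
        unsafe { ptr::read_volatile(self.address as *const T) }
    }

    /// Always a volatile store.
    pub fn write(self, value: T) {
        unsafe { ptr::write_volatile(self.address as *mut T, value) }
    }
}
```

On the GBA you'd build one of these from a fixed address such as `0x600_0000`; on a desktop you can try it out by building one from the address of a local variable.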
|