Here's a book that'll help you program in Rust on the Game Boy Advance (GBA).
It's a work in progress of course, but so is almost everything in Rust.

Style and Purpose

I'm out to teach you how to program in Rust on the GBA, obviously. However,
while there is a `gba` crate, and while I genuinely believe it to be a good and
useful crate for GBA programming, we will not be using the `gba` crate within
this book. In fact we won't be using any crates at all. We can call it the
Handmade Hero approach, if you like.

I don't want to just teach you how to use the `gba` crate; I want to teach you
what you'd need to know to write the crate from scratch if it weren't there.

Each chapter of the book will focus on a few things you'll need to know about
GBA programming and then present a fully self-contained example that puts those
ideas into action. Just one file per example, no dependencies, no external
assets, no fuss. The examples will be in the text of the book within code
blocks, but you can also find them in the examples directory of the repo if you
want to get them that way.

I will try not to ask too much of the reader ahead of time, but you are expected
to have already read The Rust Book. Having also read through the Rustonomicon is
appreciated but not required.

It's very difficult to know when you've said something that someone else won't
already know about, or if you're presenting ideas out of order. If things aren't
clear, please file an issue and we'll try to address it.

If you want to contact us you should join the Rust Community Discord and ask in
the #gamedev channel.

- `Ketsuban` is the wizard who knows much more about how it all works.
- `Lokathor` is the fool who decided to write a crate and book for it.

If it's not a GBA-specific question then you can probably ask any of the other
folks in the server as well (there are a few hundred folks).

If you want to read more about developing on the GBA there are some other good
resources as well:

- TONC, a tutorial series written for C, but it's what I based the ordering of
  this book's sections on.
- GBATEK, a homebrew tech manual for GBA/NDS/DSi. We will regularly link to
  parts of it when talking about various bits of the GBA.
- Getting Something for Nothing by James Munns (RustConf 2018). It specifically
  talks about zero-cost abstraction within an embedded Rust context, which is
  exactly what we're all about.

Before you can build a GBA game you'll have to follow some special steps to set
up the development environment. Perhaps unfortunately, there's enough detail
here to warrant a mini-chapter all on its own.

Once again, extra special thanks to Ketsuban, who first dove into how to make
this all work with Rust and then shared it with the world.

Obviously you need your computer to have a working Rust installation. However,
you'll also need to ensure that you're using a nightly toolchain (we will need
it for inline assembly, among other potentially useful features). You can run
`rustup default nightly` to set nightly as the system-wide default toolchain, or
you can use a toolchain file to use nightly just on a specific project, but
either way we'll be assuming the use of nightly from now on. You'll also need
the `rust-src` component so that `cargo-xbuild` will be able to compile the
`core` crate for us in a bit, so run `rustup component add rust-src`.

Next, you need devkitpro. They've got a graphical installer for Windows that
runs nicely, and I guess pacman support on Linux (I'm on Windows so I haven't
tried the Linux install myself). We'll be using a few of their general binutils
for the `arm-none-eabi` target, and we'll also be using some of their tools that
are specific to GBA development, so even if you already have the right binutils
for whatever reason, you'll still want devkitpro for the `gbafix` utility.

- On Windows you'll want something like `C:\devkitpro\devkitARM\bin` and
  `C:\devkitpro\tools\bin` to be added to your PATH, depending on where you
  installed it to and such.
- On Linux you can use pacman to get it, and the default install puts the stuff
  in `/opt/devkitpro/devkitARM/bin` and `/opt/devkitpro/tools/bin`. If you need
  help you can look in our repository's `.travis.yml` file to see exactly what
  our CI does.

Finally, you'll need `cargo-xbuild`. Just run `cargo install cargo-xbuild` and
cargo will figure it all out for you.

Once the system-wide tools are ready, you'll need some particular files each
time you want to start a new project. You can find them in the root of the
rust-console/gba repo.

- `thumbv4-none-agb.json` describes the overall GBA to cargo-xbuild (and LLVM)
  so it knows what to do. Technically the GBA is `thumbv4-none-eabi`, but we
  change the `eabi` to `agb` so that we can distinguish it from other `eabi`
  devices when using `cfg` flags.
- `crt0.s` describes some ASM startup stuff. If you have more ASM to place here
  later on, this is where you can put it. You also need to build it into a
  `crt0.o` file before it can actually be used, but we'll cover that below.
- `linker.ld` tells the linker all the critical info about the layout
  expectations that the GBA has about our program, and that it should also
  include the `crt0.o` file with our compiled Rust code.

The next steps only work once you've got some source code to build. If you need
a quick test, copy the `hello1.rs` file from our examples directory in the
repository.
Once you've got something to build, you perform the following steps:

At this point you have an ELF binary that some emulators can execute directly.
This is helpful because it'll have debug symbols and all that, assuming a debug
build. Specifically, mGBA 0.7 beta 1 can do it, and perhaps other emulators can
also do it.
However, if you want a "real" ROM that works in all emulators and that you could
transfer to a flash cart, there's a little more to do.

And you're finally done!
Of course, you probably want to make a script for all that, but it's up to you.
On our own project we have it mostly set up within a `Makefile.toml` which runs
using the cargo-make plugin. It's not really the best plugin, but it's what's
available.

Traditionally a person writes a "hello, world" program so that they can test
that their development environment is set up properly, get a feel for using the
tools involved, and get an idea of what a small source file will look like. All
that stuff.
Normally, you write a program that prints "hello, world" to the terminal. The
GBA has no terminal, but it does have a screen, so instead we're going to draw
three dots to the screen.

Our first example will be a totally minimal, full magic number crazy town.
Ready? Here goes:
hello1.rs

```rust
#![feature(start)]
#![no_std]

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
    unsafe {
        (0x04000000 as *mut u16).write_volatile(0x0403);
        (0x06000000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
        (0x06000000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
        (0x06000000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
        loop {}
    }
}
```

Throw that into your project skeleton, build the program (as described back in
Chapter 0), and give it a run in your emulator. You should see a red, green, and
blue dot close-ish to the middle of the screen. If you don't, something already
went wrong. Double check things, phone a friend, write your senators, try asking
Ketsuban on the Rust Community Discord, until you're able to get your three dots
going.

So, what just happened? Even if you're used to Rust that might look pretty
strange. We'll go over most of the little parts right here, and then bigger
parts will get their own sections.

```rust
#![feature(start)]
```

This enables the `start` feature, which you would normally be able to read about
in the unstable book, except that the book tells you nothing at all except to
look at the tracking issue.
Basically, a GBA game is even more low-level than the normal amount of low-level
that you get from Rust, so we have to tell the compiler to account for that by
specifying a `#[start]`, and we need this feature on to do that.

```rust
#![no_std]
```

There's no standard library available on the GBA, so we'll have to live a
core-only life.

```rust
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
```

This sets our panic handler.
Basically, if we somehow trigger a panic, this is where the program goes.
However, right now we don't know how to get any sort of message out to the user
so... we do nothing at all. We can't even return from here, so we just sit in
an infinite loop. The player will have to reset the universe from the outside.

```rust
#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
```

This is our `#[start]`. We call it `main`, but it's not like a `main` that you'd
see in a Rust program. It's more like the sort of `main` that you'd see in a C
program, but it's still not that either. If you compile a `#[start]` program for
a target with an OS and then look through the symbols in the result (with an
`nm`-style tool such as `arm-none-eabi-nm`), you'll see the symbol for the C
`main` alongside the symbol for the start `main` that we write here. Our start
`main` is just its own unique thing, and the inputs and outputs have to be like
that because that's how `#[start]` is specified to work in Rust.
If you think about it for a moment you'll probably realize that those inputs and
outputs are totally useless to us on a GBA. There's no OS on the GBA to call our
program, and there's no place for our program to "return to" when it's done.
Side note: if you want to learn more about stuff "before main gets called" you
can watch a great CppCon talk by Matt Godbolt (yes, that Godbolt) where he
delves into quite a bit of it. The talk doesn't really apply to the GBA, but
it's pretty good.

```rust
    unsafe {
```

I hope you're all set for some `unsafe`, because there's a lot of it to be had.

```rust
        (0x04000000 as *mut u16).write_volatile(0x0403);
```

Sure!

```rust
        (0x06000000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
        (0x06000000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
        (0x06000000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
```

Ah, of course.

```rust
        loop {}
    }
}
```

And, as mentioned above, there's no place for a GBA program to "return to", so
we can't ever let `main` try to return there. Instead, we go into an infinite
`loop` that does nothing. The fact that this never returns an `isize` value
doesn't bother Rust: a `loop` with no `break` in it has the "never" type `!`,
which can be coerced into any other type, `isize` included.
Fun fact: unlike in C++, an infinite loop with no side effects isn't Undefined
Behavior for us rustaceans... semantically. In truth LLVM has a known bug in
this area, so we won't actually be relying on empty loops in any future
programs.

Alright, I cheated quite a bit in the middle there. The program works, but I
didn't really tell you why because I didn't really tell you what any of those
magic numbers mean or do.

- `0x04000000` is the address of an IO Register called the Display Control.
- `0x06000000` is the start of Video RAM.

So we write some magic to the display control register once, then we write some
other magic to three magic locations in the Video RAM. Somehow that shows three
dots. Gotta read on to find out why!

Before we focus on what the numbers mean, first let's ask ourselves: Why are we
doing volatile writes? You've probably never used volatile operations before at
all. What is volatile anyway?
Well, the optimizer is pretty aggressive, and so it'll skip reads and writes
when it thinks it can. For example, if you write to a pointer once, and then
again a moment later, and it didn't see any other reads in between, it'll think
that it can just skip doing that first write since it'll get overwritten anyway.
Sometimes that's correct, but sometimes it's not.
Marking a read or write as volatile tells the compiler that it really must do
that action, and in the exact order that we wrote it out. It says that there
might even be special hardware side effects going on that the compiler isn't
aware of. In this case, the write to the display control register sets a video
mode, and the writes to the Video RAM set pixels that will show up on the
screen.
Similar to "atomic" operations you might have heard about, all volatile
operations are enforced to happen in the exact order that you specify them, but
only relative to other volatile operations. So something like

```rust
c.write_volatile(5);
a += b;
d.write_volatile(7);
```

might end up changing `a` either before or after the change to `c` (since the
value of `a` doesn't affect the write to `c`), but the write to `d` will always
happen after the write to `c`, even though the compiler doesn't see any direct
data dependency there.
If you ever go on to use volatile stuff on other platforms it's important to
note that volatile doesn't make things thread-safe; you still need atomics for
that. However, the GBA doesn't have threads, so we don't have to worry about
those sorts of thread safety concerns (there are interrupts, but that's another
matter).

Of course, writing out `write_volatile` every time is more than we wanna do.
There's clarity and then there's excessive. This is a chance to write our first
newtype: basically a type that's got the exact same binary representation as
some other type, but new methods and trait implementations.
We want a `*mut T` that's volatile by default, and also when we offset it...
well, the verdict is slightly unclear on how `offset` vs `wrapping_offset` work
when you're using pointers that you made up out of nowhere. I've asked the
experts and they genuinely weren't sure, so we'll make an `offset` method that
does a `wrapping_offset` just to be careful.

```rust
/// A `*mut T` that always uses volatile reads and writes.
#[derive(Debug, Clone, Copy, Hash, PartialEq, Eq, PartialOrd, Ord)]
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);
impl<T> VolatilePtr<T> {
    /// Volatile read of the value being pointed at.
    pub unsafe fn read(&self) -> T {
        core::ptr::read_volatile(self.0)
    }
    /// Volatile write of `data` to the location being pointed at.
    pub unsafe fn write(&self, data: T) {
        core::ptr::write_volatile(self.0, data);
    }
    /// Moves the pointer by `count` elements (not bytes) using
    /// `wrapping_offset`, as discussed above.
    pub unsafe fn offset(self, count: isize) -> Self {
        VolatilePtr(self.0.wrapping_offset(count))
    }
}
```
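
Because `VolatilePtr` only wraps a raw pointer, you can try it out in an ordinary desktop Rust program by pointing it at a local buffer instead of real GBA memory. Here's a minimal sketch of that. The `demo_volatile_ptr` function and its buffer are hypothetical and exist only for illustration. On the GBA you'd point the wrapper at real addresses like `0x0600_0000` instead.

```rust
#[derive(Debug, Clone, Copy, Hash, PartialEq, Eq, PartialOrd, Ord)]
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);

impl<T> VolatilePtr<T> {
    pub unsafe fn read(&self) -> T {
        core::ptr::read_volatile(self.0)
    }
    pub unsafe fn write(&self, data: T) {
        core::ptr::write_volatile(self.0, data);
    }
    pub unsafe fn offset(self, count: isize) -> Self {
        VolatilePtr(self.0.wrapping_offset(count))
    }
}

/// Hypothetical demo: a stack array stands in for "memory" so this runs on a PC.
pub fn demo_volatile_ptr() -> [u16; 4] {
    let mut buffer = [0u16; 4];
    let base = VolatilePtr(buffer.as_mut_ptr());
    unsafe {
        // Every write goes through write_volatile, so none can be optimized out.
        base.write(0x001F);
        // `offset` counts in u16 elements, so this is the third element.
        base.offset(2).write(0x7C00);
        // Read the first element back and copy it into the last one.
        let first = base.read();
        base.offset(3).write(first);
    }
    buffer
}
```

`VolatilePtr` is `Copy`, and `offset` takes `self` by value. That means the same `base` can be reused for as many offsets as you like.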
+
+Book Goals and Style
+
+
+
+
+
+
+
+
+
+
+
The GBA has a large number of IO Registers (not to be confused with CPU
registers). These are special memory locations from `0x04000000` to
`0x040003FE`. GBATEK has a full list, but we only need to learn about a few of
them at a time as we go, so don't be worried.
The important facts to know about IO Registers are these:

- Each has its own specific size. Most are `u16`, but some are `u32`.
- All of them must be accessed in a `volatile` style.
- Each register is specifically readable or writable or both. Actually, with
  some registers there are even individual bits that are read-only or
  write-only.
  - If you write to a read-only position, those writes are simply ignored. This
    mostly matters if a writable register contains a read-only bit (such as the
    Display Control, next section).
  - If you read from a write-only position, you get back values that are
    basically nonsense. There aren't really any registers that mix readable
    bits with write-only bits, so you're basically safe here. The only (mild)
    concern is that when you write a value into a write-only register you need
    to keep track of what you wrote somewhere else if you want to know what you
    wrote (such as to adjust an offset value by +1, or whatever).
  - You can always check GBATEK to be sure, but if I don't mention it then a
    bit is probably both read and write.
- Some registers have invalid bit patterns. For example, the lowest three bits
  of the Display Control register can't legally be set to the values 6 or 7.

When talking about bit positions, the numbers are zero indexed just like an
array index is.

The Display Control register is our first actual IO Register. GBATEK gives it
the shorthand DISPCNT, so you might see it under that name if you read other
guides.
Among IO Registers, it's one of the simpler ones, but it's got enough complexity
that we can get a hint of what's to come.
Also it's the one that you basically always need to set at least once in every
GBA game, so it's a good one to start with for that reason too.
The display control register holds a `u16` value, and is located at
`0x0400_0000`.
Many of the bits here won't mean much to you right now. That is fine. You do
NOT need to memorize them all or what they all do right away. We'll just skim
over all the parts of this register to start, and then we'll go into more detail
in later chapters when we need to come back and use more of the bits.

The lowest three bits (0-2) let you select from among the GBA's six video modes.
You'll notice that 3 bits allow for eight modes, but the values 6 and 7 are
prohibited.
Modes 0, 1, and 2 are "tiled" modes. These are actually the modes that you
should eventually learn to use as much as possible. They let the GBA's limited
video hardware do as much of the work as possible, leaving more of your CPU time
for gameplay computations. However, they're also complex enough to deserve their
own demos and chapters later on, so that's all we'll say about them for now.
Modes 3, 4, and 5 are "bitmap" modes. These let you write individual pixels to
locations on the screen.

- Mode 3 is full resolution (240w x 160h) RGB15 color. You might not be used
  to RGB15, since modern computers have 24 or 32 bit colors. In RGB15, there are
  5 bits for each color channel stored within a `u16` value, and the highest bit
  is simply ignored.
- Mode 4 is full resolution paletted color. Instead of being a `u16` color, each
  pixel value is a `u8` palette index entry, and then the display uses the
  palette memory (which we'll talk about later) to store the actual color data.
  Since each pixel is half sized, we can fit twice as many. This lets us have
  two "pages". At any given moment only one page is active, and you can draw to
  the other page without the user noticing. You set which page to show with
  another bit we'll get to in a moment.
- Mode 5 is full color, but also with pages. This means that we must have a
  reduced resolution to compensate (video memory is only so big!). The screen is
  effectively only 160w x 128h in this mode.

Bit 3 is effectively read-only. Technically it can be flipped using a BIOS call,
but when you write to the display control register normally it won't write to
this bit, so we'll call it effectively read-only.
This bit is on if the CPU is in CGB mode.

Bit 4 lets you pick which page to use. This is only relevant in video modes 4 or
5, and is just ignored otherwise. It's very easy to remember: when the bit is 0
the 0th page is used, and when the bit is 1 the 1st page is used.
The second page always starts at `0x0600_A000`.

OAM, VRAM, and Blanking

Bit 5 lets you access OAM during HBlank if enabled. This is cool, but it reduces
the maximum sprites per scanline, so it's not the default.
Bit 6 lets you choose whether the GBA should treat Object Character VRAM as
being 2d (off) or 1d (on). This particular control can be kinda tricky to wrap
your head around, so we'll be sure to have some extra diagrams in the chapter
that deals with it.
Bit 7 forces the screen to stay in VBlank as long as it's set. This allows the
fastest use of the VRAM, Palette, and Object Attribute Memory. Obviously if you
leave this on for too long the player will notice a blank screen, but it might
be okay to use for a moment or two every once in a while.

Bits 8 through 11 control whether Background layers 0 through 3 should be
active. Bit 12 affects the Object layer.
Note that not all background layers are available in all video modes:

- Mode 0: all
- Mode 1: 0/1/2
- Mode 2: 2/3
- Mode 3/4/5: 2

Bits 13 and 14 enable the display of Windows 0 and 1, and Bit 15 enables the
object display window. We'll get into how windows work later on; they let you do
some nifty graphical effects.

So what did we do to the display control register in `hello1`?

```rust
(0x04000000 as *mut u16).write_volatile(0x0403);
```

First let's convert that to binary, and we get `0b100_0000_0011`. So, that's
setting Mode 3 with background 2 enabled and nothing else special.
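
If you'd rather let the computer do that bit-picking, here's a small sketch that decodes a DISPCNT value using the bit layout described above. It runs on a PC because it only does math on a `u16`. The names (`MODE_MASK`, `OBJ_BIT`, `video_mode`, `bg_enabled`, `obj_enabled`) are made up for this example and aren't taken from the `gba` crate.

```rust
/// Bits 0-2 of DISPCNT hold the video mode.
pub const MODE_MASK: u16 = 0b111;
/// Bit 12 of DISPCNT enables the object (sprite) layer.
pub const OBJ_BIT: u16 = 1 << 12;

/// Returns the video mode, or `None` for the prohibited values 6 and 7.
pub const fn video_mode(dispcnt: u16) -> Option<u16> {
    let mode = dispcnt & MODE_MASK;
    if mode <= 5 {
        Some(mode)
    } else {
        None
    }
}

/// Background layers 0 through 3 are switched on by bits 8 through 11.
/// Layers outside 0..=3 don't exist, so they report `false`.
pub const fn bg_enabled(dispcnt: u16, layer: u16) -> bool {
    layer < 4 && dispcnt & (1 << (8 + layer)) != 0
}

/// Whether the object layer (bit 12) is switched on.
pub const fn obj_enabled(dispcnt: u16) -> bool {
    dispcnt & OBJ_BIT != 0
}
```

Decoding `0x0403` this way gives mode 3, with background 2 on and the object layer off. That matches the decoding we did by hand.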

The GBA's Video RAM is 96k stretching from `0x0600_0000` to `0x0601_7FFF`.
The Video RAM can only be accessed totally freely during a Vertical Blank (aka
"VBlank", though sometimes I forget and don't capitalize it properly). At other
times, if the CPU tries to touch the same part of video memory as the display
controller is accessing, then the CPU gets bumped by a cycle to avoid a clash.
Annoyingly, VRAM can only be properly written to in 16 and 32 bit segments (the
same goes for PALRAM and OAM). If you try to write just an 8 bit segment, then
both parts of the 16 bit segment get the same value written to them (at least
for the background part of VRAM and for PALRAM; per GBATEK, 8 bit writes to OAM
and to sprite VRAM are ignored instead). In other words, if you write the byte
`5` to `0x0600_0000`, then both `0x0600_0000` and ALSO `0x0600_0001` will have
the byte `5` in them. We have to be extra careful when trying to set an
individual byte, and we also have to be careful if we use `memcpy` or `memset`,
because they're byte oriented by default and don't know to follow the special
rules.
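
To make that rule concrete, here's a sketch that simulates it on an ordinary slice of `u16` values. It's only a model for illustration and doesn't touch real hardware. The function `simulated_vram_byte_write` is hypothetical. It shows why writing byte `5` to offset 0 or to offset 1 leaves `0x0505` in that halfword either way.

```rust
/// Hypothetical model of an 8 bit store to background VRAM: the hardware
/// writes the byte into BOTH halves of the 16 bit unit that contains it.
/// `vram` stands in for VRAM as a slice of halfwords, and `byte_offset` is
/// counted in bytes from the start of VRAM.
pub fn simulated_vram_byte_write(vram: &mut [u16], byte_offset: usize, value: u8) {
    let byte = u16::from(value);
    // The same byte lands in the high half and the low half, whichever one
    // the program actually meant to target.
    vram[byte_offset / 2] = (byte << 8) | byte;
}
```

The neighboring byte is clobbered. That's why single-byte updates need a read-modify-write of the whole `u16` instead.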

As I said before, RGB15 stores a color within a `u16` value using 5 bits for
each color channel.

```rust
pub const RED: u16 = 0b0_00000_00000_11111;
pub const GREEN: u16 = 0b0_00000_11111_00000;
pub const BLUE: u16 = 0b0_11111_00000_00000;
```
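
Going the other way, you can pull each channel back out with a shift and a mask, because each channel is 5 bits wide (values 0 through 31). Here's a short sketch. The `red_of`, `green_of`, and `blue_of` helpers are made-up names for illustration.

```rust
pub const RED: u16 = 0b0_00000_00000_11111;
pub const GREEN: u16 = 0b0_00000_11111_00000;
pub const BLUE: u16 = 0b0_11111_00000_00000;

/// Red lives in bits 0-4.
pub const fn red_of(color: u16) -> u16 {
    color & 0x1F
}

/// Green lives in bits 5-9.
pub const fn green_of(color: u16) -> u16 {
    (color >> 5) & 0x1F
}

/// Blue lives in bits 10-14. Bit 15 is masked off here, just as the display
/// ignores it.
pub const fn blue_of(color: u16) -> u16 {
    (color >> 10) & 0x1F
}
```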

In Mode 3 and Mode 5 we write direct color values into VRAM, and in Mode 4 we
write palette index values, and then the color values go into the PALRAM.

Mode 3 is pretty easy. We have a full resolution grid of RGB15 pixels. There are
160 rows of 240 pixels each, with the base address being the top left corner. A
particular pixel uses normal "2d indexing" math:

```rust
// column + (row * width); SCREEN_WIDTH is 240 in Mode 3
let row_five_col_seven = 7 + (5 * SCREEN_WIDTH);
```

To draw a pixel, we just write a value at the address for the row and col that
we want to draw to.
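
Here's a sketch of that same math written as functions. `mode3_index` produces the same `offset` values that `hello1` uses. `mode3_address` turns that index into a byte address. Both function names and the `VRAM_BASE` constant are our own, for illustration only.

```rust
pub const SCREEN_WIDTH: isize = 240;
/// Start of VRAM. Mode 3 treats it as one 240x160 grid of u16 colors.
pub const VRAM_BASE: usize = 0x0600_0000;

/// Offset of a pixel, counted in u16 units: column + (row * width).
pub const fn mode3_index(col: isize, row: isize) -> isize {
    col + row * SCREEN_WIDTH
}

/// Byte address of the same pixel. Each pixel takes 2 bytes.
/// Assumes 0 <= col < 240 and 0 <= row < 160.
pub const fn mode3_address(col: isize, row: isize) -> usize {
    VRAM_BASE + 2 * mode3_index(col, row) as usize
}
```

The bottom-right pixel, (239,159), lands at `0x0601_2BFE`. That's comfortably inside the 96k of VRAM.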

Mode 4 introduces page flipping. Instead of one giant page at `0x0600_0000`,
there's Page 0 at `0x0600_0000` and then Page 1 at `0x0600_A000`. The resolution
for each page is the same as above, but instead of writing `u16` values, the
memory is treated as `u8` indexes into PALRAM. The PALRAM starts at
`0x0500_0000`, and there's enough space for 256 palette entries (each a `u16`).
To set the color of a palette entry we just do a normal `u16` `write_volatile`.

```rust
// inside an `unsafe` block; target_index: isize (0 to 255), new_color: u16
(0x0500_0000 as *mut u16).offset(target_index).write_volatile(new_color);
```

To draw a pixel we write the palette index that we want the pixel to use.
However, we must remember the "minimum size" write limitation that applies to
VRAM. So, if we want to change just a single pixel at a time we must:

- Read the full `u16` it's a part of.
- Clear the half of the `u16` we're going to replace.
- Write the new value into the half of the `u16` we just cleared.
- Write that result back to the address.

So, the math for finding a byte offset is the same as Mode 3 (since they're both
a 2d grid). The GBA is little-endian, so if the byte offset is EVEN it'll be the
low bits of the `u16` at half the byte offset, and if the byte offset is ODD
it'll be the high bits of the `u16` at half the byte offset rounded down.
Does that make sense?

- If we want to write pixel (0,0) the byte offset is 0, so we change the low
  bits of `u16` offset 0. Then we want to write to (1,0), so the byte offset is
  1, so we change the high bits of `u16` offset 0. The pixels are next to each
  other, and the target bytes are next to each other, good so far.
- If we want to write to (5,6) that'd be byte `5 + 6 * 240 = 1445`, so we'd
  target the high bits of `u16` offset `floor(1445/2) = 722`.
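
Here's a sketch of that read-modify-write sequence. It's written as a host-side model so it can run anywhere. The page is just a `&mut [u16]`, and `mode4_set_pixel` is a hypothetical helper, not part of any crate. On the GBA, the read and the write-back would be volatile operations on VRAM (for example through `VolatilePtr`). The sketch assumes little-endian byte order, which is what the GBA's ARM CPU uses.

```rust
pub const SCREEN_WIDTH: usize = 240;
/// One Mode 4 page: 240 * 160 one-byte pixels, stored as 19200 halfwords.
pub const PAGE_HALFWORDS: usize = 240 * 160 / 2;

/// Hypothetical host-side model. `page` stands in for one Mode 4 page as
/// halfwords. Sets pixel (col, row) to palette `index` without disturbing the
/// other pixel that shares its u16.
pub fn mode4_set_pixel(page: &mut [u16], col: usize, row: usize, index: u8) {
    let byte_offset = col + row * SCREEN_WIDTH;
    let halfword = byte_offset / 2; // integer division rounds down
    let old = page[halfword]; // 1. read the full u16
    let new = if byte_offset % 2 == 0 {
        // Even byte: the low 8 bits. Clear them (2), then fill them (3).
        (old & 0xFF00) | u16::from(index)
    } else {
        // Odd byte: the high 8 bits. Clear them (2), then fill them (3).
        (old & 0x00FF) | (u16::from(index) << 8)
    };
    page[halfword] = new; // 4. write the result back
}
```

Pixels (0,0) and (1,0) share halfword 0, and (5,6) lands in the high byte of halfword 722, matching the worked examples.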

As you can see, trying to write individual pixels in Mode 4 is mostly a bad
time. Fret not! We don't have to write individual bytes. If our data is
arranged correctly ahead of time we can just write `u16` or `u32` values
directly. The video hardware doesn't care; it'll get along just fine.

Mode 5 is also a two page mode, but instead of compressing the size of a pixel's
data to fit in two pages, we compress the resolution.
Mode 5 is full `u16` color, but only 160w x 128h per page.
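
The numbers work out neatly: 160 * 128 pixels * 2 bytes = 40960 = `0xA000` bytes. So a Mode 5 page fills exactly the space up to where the second page begins at `0x0600_A000`. Here's a small sketch of the address math. The `mode5_address` helper and the constant names are illustrative only.

```rust
pub const VRAM_BASE: usize = 0x0600_0000;
/// Page 1 starts 0xA000 bytes into VRAM, for Mode 5 just as for Mode 4.
pub const PAGE1_OFFSET: usize = 0xA000;
pub const MODE5_WIDTH: usize = 160;
pub const MODE5_HEIGHT: usize = 128;

/// Byte address of Mode 5 pixel (col, row) on page 0 or 1.
/// Each pixel is a full u16 color, so it takes 2 bytes.
/// Assumes page <= 1, col < 160, and row < 128.
pub const fn mode5_address(page: usize, col: usize, row: usize) -> usize {
    VRAM_BASE + page * PAGE1_OFFSET + 2 * (col + row * MODE5_WIDTH)
}
```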

So what got written into VRAM in `hello1`?

```rust
(0x06000000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
(0x06000000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
(0x06000000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
```

So at pixels `(120,80)`, `(136,80)`, and `(120,96)` we write three values. Once
again we probably need to convert them into binary to make sense of it.

- `0x001F`: `0b0_00000_00000_11111`
- `0x03E0`: `0b0_00000_11111_00000`
- `0x7C00`: `0b0_11111_00000_00000`

Ah, of course, a red pixel, a green pixel, and a blue pixel.

Okay so let's have a look again:

hello1

```rust
#![feature(start)]
#![no_std]

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
    unsafe {
        (0x04000000 as *mut u16).write_volatile(0x0403);
        (0x06000000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
        (0x06000000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
        (0x06000000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
        loop {}
    }
}
```

Now let's clean this up so that it's clearer what's going on.
First we'll label that display control stuff, including using the `VolatilePtr`
type from the volatile explanation:

```rust
pub const DISPCNT: VolatilePtr<u16> = VolatilePtr(0x04000000 as *mut u16);
pub const MODE3: u16 = 3;
pub const BG2: u16 = 0b100_0000_0000;
```

Next we make some const values for the actual pixel drawing:

```rust
pub const VRAM: usize = 0x06000000;
pub const SCREEN_WIDTH: isize = 240;
```

Note that VRAM has to be interpreted in different ways depending on mode, so we
just leave it as `usize` and we'll cast it into the right form closer to the
actual use.
Next we want a small helper function for putting together a color value.
Happily, this one can even be declared as a `const` function. At the time of
writing, we've got the "minimal const fn" support in nightly. It really is quite
limited, but I'm happy to let rustc and LLVM pre-compute as much as they can
when it comes to the GBA's tiny CPU.

```rust
pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
    blue << 10 | green << 5 | red
}
```
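
One way to confirm that `rgb16` reproduces the magic numbers from `hello1` is to compute them in `const` items, which forces the evaluation to happen at compile time. Here's a short sketch. The constant names are our own.

```rust
pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
    blue << 10 | green << 5 | red
}

// `const` items force these calls to run at compile time, so the GBA never
// spends cycles building these colors.
pub const RED: u16 = rgb16(31, 0, 0);
pub const GREEN: u16 = rgb16(0, 31, 0);
pub const BLUE: u16 = rgb16(0, 0, 31);
```

Note that `rgb16` doesn't mask its inputs, so a channel value above 31 spills into the next channel. For example, `rgb16(32, 0, 0)` produces the same value as `rgb16(0, 1, 0)`. Keep each argument in the 0 to 31 range.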

Finally, we'll make a function for drawing a pixel in Mode 3. Even though it's
just a one-liner, having the "important parts" be labeled as function arguments
usually helps you think about it a lot better.

```rust
pub unsafe fn mode3_pixel(col: isize, row: isize, color: u16) {
    VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).write(color);
}
```

So now we've got this:

hello2

```rust
#![feature(start)]
#![no_std]

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
    unsafe {
        DISPCNT.write(MODE3 | BG2);
        mode3_pixel(120, 80, rgb16(31, 0, 0));
        mode3_pixel(136, 80, rgb16(0, 31, 0));
        mode3_pixel(120, 96, rgb16(0, 0, 31));
        loop {}
    }
}

#[derive(Debug, Clone, Copy, Hash, PartialEq, Eq, PartialOrd, Ord)]
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);
impl<T> VolatilePtr<T> {
    pub unsafe fn read(&self) -> T {
        core::ptr::read_volatile(self.0)
    }
    pub unsafe fn write(&self, data: T) {
        core::ptr::write_volatile(self.0, data);
    }
    pub unsafe fn offset(self, count: isize) -> Self {
        VolatilePtr(self.0.wrapping_offset(count))
    }
}

pub const DISPCNT: VolatilePtr<u16> = VolatilePtr(0x04000000 as *mut u16);
pub const MODE3: u16 = 3;
pub const BG2: u16 = 0b100_0000_0000;

pub const VRAM: usize = 0x06000000;
pub const SCREEN_WIDTH: isize = 240;

pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
    blue << 10 | green << 5 | red
}

pub unsafe fn mode3_pixel(col: isize, row: isize, color: u16) {
    VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).write(color);
}
```

Exact same program that we started with, but much easier to read.
Of course, in the full `gba` crate that this book is a part of we have these and
other elements all labeled and sorted out for you (not identically, but
similarly). Still, for educational purposes it's often best to do it yourself at
least once.

It's all well and good to draw three pixels, but they don't do anything yet. We
want them to do something, and for that we need to get some input from the user.
The GBA, as I'm sure you know, has an arrow pad, A and B, L and R, Start and
Select. That's a little more than the NES/GB/CGB had, and a little less than the
SNES had. As you can guess, we get key state info from an IO register.
Also, we will need a way to keep the program from running "too fast". On a
modern computer or console you do this with vsync info from the GPU and Monitor,
and on the GBA we'll be using vsync info from an IO register that tracks what
the display hardware is doing.
As a way to apply our knowledge, we'll make a simple "light cycle" game where
your dot leaves a trail behind it and you die if you go off the screen or if
you touch your own trail. We just make a copy of `hello2.rs` named
`light_cycle.rs` and then fill it in as we go through the chapter. Normally you
might not place the entire program into a single source file, particularly as it
grows over time, but since these are small examples it's much better to have
them be completely self contained than it is to have them be "properly
organized" for the long term.

The Key Input Register is our next IO register. Its shorthand name is
KEYINPUT and it's a `u16` at `0x0400_0130`. The entire register is obviously
read-only; you can't tell the GBA what buttons are pressed.
Each button is exactly one bit:

| Bit | Button |
|-----|--------|
| 0   | A      |
| 1   | B      |
| 2   | Select |
| 3   | Start  |
| 4   | Right  |
| 5   | Left   |
| 6   | Up     |
| 7   | Down   |
| 8   | R      |
| 9   | L      |

The higher bits (10 through 15) are not used at all.
Similar to other old hardware devices, the convention here is that a button's
bit is clear when pressed and set when released. In other words, when the
user is not touching the device at all the KEYINPUT value will read
`0b0000_0011_1111_1111`. There are similar values for when the user is pressing
as many buttons as possible, but since the left/right and up/down keys are on an
arrow pad the value can never be 0, since you can't ever press every single key
at once.
-When dealing with key input, the register always shows the exact key values at
-any moment you read it. Obviously that's what it should do, but what it means to
-you as a programmer is that you should usually gather input once at the top of a
-game frame and then use that single input poll as the input values across the
-whole game frame.
-Of course, you might want to know if a user's key state changed from frame to
-frame. That's fairly easy too: we just store the last frame's keys as well as the
-current frame's keys (it's only a u16) and then xor the two values.
-Anything that shows up in the xor result is a key that changed. If it's changed
-and it's now down, that means it was pushed this frame. If it's changed and it's
-now up, that means it was released this frame.
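Here is a minimal sketch of that bookkeeping using plain u16 values. It assumes the bits have already been flipped so that 1 means "pressed" (the raw register is the other way around, as described above). The function names are hypothetical helpers, not part of any crate:

```rust
/// Keys whose state differs between the two frames.
/// Assumes 1 = pressed in both values.
pub fn changed_keys(last: u16, current: u16) -> u16 {
    last ^ current
}

/// Keys that changed and are now down: pushed this frame.
pub fn pressed_this_frame(last: u16, current: u16) -> u16 {
    changed_keys(last, current) & current
}

/// Keys that changed and are now up: released this frame.
pub fn released_this_frame(last: u16, current: u16) -> u16 {
    changed_keys(last, current) & !current
}
```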
-The other major thing you might frequently want is to know "which way" the arrow
-pad is pointing: Up/Down/None and Left/Right/None. Sounds like an enum to me.
-Except that we'll often have situations where the direction just needs to
-be multiplied by a speed and applied as a delta to a position. We want to
-support that as well as we can too.
-
-Let's get down to some code. First we want a way to read the address as a u16,
-and then we wrap that value in a newtype that will have methods for inspecting
-the key bits.
-
-pub const KEYINPUT: VolatilePtr<u16> = VolatilePtr(0x400_0130 as *mut u16);
-
-/// A newtype over the key input state of the GBA.
-#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
-#[repr(transparent)]
-pub struct KeyInputSetting(u16);
-
-pub fn key_input() -> KeyInputSetting {
-    unsafe { KeyInputSetting(KEYINPUT.read()) }
-}
-
-Now we want a way to check if a key is being pressed, since that's normally
-how we think of things as a game designer and even as a player. That is, usually
-you'd say "if you press A, then X happens" instead of "if you don't press A,
-then X does not happen".
-Normally we'd pick a constant for the bit we want, & it with our value, and
-then check for val != 0. Since the bit we're looking for is 0 in the "true"
-state, we still pick the same constant and we still do the &, but we test with
-== 0. Practically the same, right? Well, since I'm asking a rhetorical
-question like that you can probably already guess that it's not the same. I was
-shocked to learn this too.
-All we have to do is ask our good friend Godbolt what's gonna happen when the
-code compiles. The link there has the page set for the stable 1.30 compiler
-just so that the link results stay consistent if you read this book in a year
-or something. Also, we've set the target to thumbv6m-none-eabi, which is a later
-ARM architecture (ARMv6-M) than the GBA's ARMv4T, but it's close enough for a
-quick check. Of course, in a full program small functions like these will
-probably get inlined into the calling code and disappear entirely as they're
-folded and refolded by the compiler, but it's still worth checking.
-It turns out that the != 0 test is 4 instructions and the == 0 test is 6
-instructions. Since we want to get savings where we can, and we'll probably
-check the keys of an input often enough, we'll just always use a != 0 test and
-then adjust how we initially read the register to compensate. By using xor with
-a mask for only the 10 used bits we can flip the "low when pressed" values so
-that the result has a set bit in every position where a key is pressed.
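The xor trick is easy to check in isolation. This sketch pulls it out into a plain function on the raw register value. normalize_keys and KEY_MASK are hypothetical names, and the hardware read is left out so the logic can run anywhere:

```rust
/// Mask covering the 10 key bits of KEYINPUT (bits 0 through 9).
pub const KEY_MASK: u16 = 0b0000_0011_1111_1111;

/// Flips the raw "0 = pressed" register value into "1 = pressed".
/// Takes the raw value as a parameter so it can be tried off-hardware.
pub fn normalize_keys(raw: u16) -> u16 {
    raw ^ KEY_MASK
}
```

With nothing pressed the raw value is all ones in the low 10 bits, so the result is 0. Pressing A clears bit 0 in the raw value, so only bit 0 is set afterwards.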
-
-pub fn key_input() -> KeyInputSetting {
-    // Flip the 10 key bits so that 1 means "pressed".
-    unsafe { KeyInputSetting(KEYINPUT.read() ^ 0b0000_0011_1111_1111) }
-}
-
-Now we add a method for seeing if a key is pressed. In the full library there's
-a more advanced version of this that's built up via macro, but for this example
-we'll just name a bunch of const values and then have a method that takes a
-value and says if that bit is on.
-
-pub const KEY_A: u16 = 1 << 0;
-pub const KEY_B: u16 = 1 << 1;
-pub const KEY_SELECT: u16 = 1 << 2;
-pub const KEY_START: u16 = 1 << 3;
-pub const KEY_RIGHT: u16 = 1 << 4;
-pub const KEY_LEFT: u16 = 1 << 5;
-pub const KEY_UP: u16 = 1 << 6;
-pub const KEY_DOWN: u16 = 1 << 7;
-pub const KEY_R: u16 = 1 << 8;
-pub const KEY_L: u16 = 1 << 9;
-
-impl KeyInputSetting {
-    pub fn contains(&self, key: u16) -> bool {
-        (self.0 & key) != 0
-    }
-}
-
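-Because each key is a unique bit, you can even check for more than one key at
-once by combining the key values with | (adding them works too, since the bits
-never overlap). Note that, because contains tests for != 0, this asks whether
-any of those keys is pressed, not whether all of them are.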
-
-// true if A is pressed, L is pressed, or both are
-let a_or_l_pressed = input.contains(KEY_A | KEY_L);
-
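If you do need "all of these keys at once," a second method can compare the masked bits against the whole mask. This is a sketch: contains_any and contains_all are hypothetical names, and the value is assumed to already be normalized so that 1 means pressed. Note that contains_all uses the == test that the Godbolt comparison above found slightly more expensive:

```rust
pub const KEY_A: u16 = 1 << 0;
pub const KEY_L: u16 = 1 << 9;

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct KeyInputSetting(pub u16);

impl KeyInputSetting {
    /// True if at least one of the keys in `keys` is pressed.
    pub fn contains_any(self, keys: u16) -> bool {
        (self.0 & keys) != 0
    }

    /// True only if every key in `keys` is pressed.
    pub fn contains_all(self, keys: u16) -> bool {
        (self.0 & keys) == keys
    }
}
```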
-We also wanted to save the state of an old frame and compare it to the current
-frame to see what was different. That's one more method for KeyInputSetting:
-
-impl KeyInputSetting {
-    /// Keys whose state differs between `self` and `other`.
-    pub fn difference(&self, other: KeyInputSetting) -> KeyInputSetting {
-        KeyInputSetting(self.0 ^ other.0)
-    }
-}
-
-Anything that's "in" the difference output is a key that changed, and then if
-the key reads as pressed this frame that means it was just pressed. The exact
-mechanics of all the ways you might care to do something based on new key
-presses is obviously quite varied, but it might be something like this:
-
-let this_frame_diff = this_frame_input.difference(last_frame_input);
-
-if this_frame_diff.contains(KEY_B) && this_frame_input.contains(KEY_B) {
-    // the user just pressed B, react in some way
-}
-
-And for the arrow pad, we'll make an enum that easily casts into i32. In
-general we try to use i32/isize as often as possible, just because it's easier
-on the GBA's CPU if we stick to its native number size. Having it be an enum
-lets us use match and be sure that we've covered all our cases.
-
-/// A "tribool" value helps us interpret the arrow pad.
-#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
-#[repr(i32)]
-pub enum TriBool {
-    Minus = -1,
-    // Deriving Default on an enum needs a #[default] variant (Rust 1.62+).
-    #[default]
-    Neutral = 0,
-    // Rust has no unary +, so this is written as plain 1.
-    Plus = 1,
-}
-
-Now, how do we determine which way is plus or minus? Well... I don't know.
-Really. I'm not sure what the best choice is, because the GBA really wants the
-origin at 0,0 in the top left, with higher rows going down and higher columns
-going right. On the other hand, all the normal math you and I learned in school
-is oriented with increasing Y being upward on the page. So, at least for this
-demo, we're going to go with what the GBA wants us to do and give it a try. If
-we don't end up confusing ourselves then we can stick with that. Maybe we can
-paper over it somehow later on.
-
-impl KeyInputSetting {
-    /// Right is Plus, Left is Minus (columns increase to the right).
-    pub fn column_direction(&self) -> TriBool {
-        if self.contains(KEY_RIGHT) {
-            TriBool::Plus
-        } else if self.contains(KEY_LEFT) {
-            TriBool::Minus
-        } else {
-            TriBool::Neutral
-        }
-    }
-
-    /// Down is Plus, Up is Minus (rows increase going down).
-    pub fn row_direction(&self) -> TriBool {
-        if self.contains(KEY_DOWN) {
-            TriBool::Plus
-        } else if self.contains(KEY_UP) {
-            TriBool::Minus
-        } else {
-            TriBool::Neutral
-        }
-    }
-}
-
-So then in our game, every frame we can check column_direction and
-row_direction and then apply those to the player's current position to make
-them move around the screen.
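Putting the enum to work then takes a single multiply per axis. In this sketch, step is a hypothetical helper and positions are (column, row) pairs of i32. The enum is repeated so the block stands alone:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
pub enum TriBool {
    Minus = -1,
    Neutral = 0,
    Plus = 1,
}

/// Moves a (col, row) position by `speed` pixels along each axis.
pub fn step(pos: (i32, i32), col_dir: TriBool, row_dir: TriBool, speed: i32) -> (i32, i32) {
    // The enum casts straight to -1, 0, or +1, so no match is needed here.
    (pos.0 + speed * col_dir as i32, pos.1 + speed * row_dir as i32)
}
```

Because row_direction returns Plus for Down, a positive row delta moves the player down the screen, which matches the GBA's coordinates.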
-With that settled I think we're all done with user input for now. There are
-other things to eventually know about, like key interrupts, but we'll cover
-those later because they're not necessary right now.
-
-There's an IO register called VCOUNT that shows you, what else, the Vertical
-(row) COUNT(er). It's a u16 at address 0x0400_0006, and it's how we'll be doing
-our very poor quality vertical sync code to start.
-
-- What makes it poor? Well, we're just going to read from the vcount value as
-often as possible every time we need to wait for a specific value to come up,
-and then proceed once it hits the point we're looking for.
-- Why is this bad? Because we're making the CPU do a lot of useless work,
-which uses a lot more power than necessary. Even if you're not on an actual
-GBA you might be running inside an emulator on a phone or other handheld. You
-wanna try to save battery if all you're doing with that power is waiting
-instead of making the game actually do something.
-- Can we do better? We can, but not yet. The better way to do things is to
-use a BIOS call to put the CPU into low power mode until a VBlank interrupt
-happens. However, we don't know about interrupts yet, and we don't know about
-BIOS calls yet, so we'll do the basic thing for now and then upgrade later.
-
-The way the display hardware actually displays each frame is that it moves a
-tiny pointer left to right across each pixel row, one pixel at a time. While
-it's within the actual screen width (240px) it's drawing out those pixels. Then
-it goes past the edge of the screen for 68px during a period known as the
-"horizontal blank" (HBlank). Then it starts on the next row and does that loop
-over again. This happens for the whole screen height (160px), and then it goes
-past the last row for another 68 rows (scanlines) during the "vertical blank"
-(VBlank) period.
-
-- One pixel is 4 CPU cycles
-- HDraw is 240 pixels, HBlank is 68 pixels (1,232 cycles per full scanline)
-- VDraw is 160 scanlines, VBlank is 68 scanlines (280,896 cycles per full refresh)
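Those figures multiply out as follows. The 2^24 Hz (about 16.78 MHz) CPU clock comes from outside this chapter. It's included only to show where the roughly 60 frames per second comes from:

```rust
pub const CYCLES_PER_PIXEL: u32 = 4;
pub const HDRAW_PIXELS: u32 = 240;
pub const HBLANK_PIXELS: u32 = 68;
pub const VDRAW_LINES: u32 = 160;
pub const VBLANK_LINES: u32 = 68;

pub const CYCLES_PER_SCANLINE: u32 = (HDRAW_PIXELS + HBLANK_PIXELS) * CYCLES_PER_PIXEL;
pub const CYCLES_PER_FRAME: u32 = (VDRAW_LINES + VBLANK_LINES) * CYCLES_PER_SCANLINE;

/// Assumed CPU clock: 2^24 Hz (about 16.78 MHz).
pub const CPU_HZ: u32 = 1 << 24;

/// Frames per second implied by the numbers above (roughly 59.73).
pub fn refresh_rate_hz() -> f64 {
    CPU_HZ as f64 / CYCLES_PER_FRAME as f64
}
```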
-
-Now you may remember some stuff from the display control register section where
-it was mentioned that some parts of memory are best accessed during VBlank, and
-also during HBlank with a setting applied. These blanking periods are what was
-being talked about. At other times if you attempt to access video or object
-memory you (the CPU) might try touching the same memory that the display device
-is trying to use, in which case you get bumped back a cycle so that the display
-can finish what it's doing. Also, if you really insist on doing video memory
-changes while the screen is being drawn then you might get some visual glitches.
-If you can, just prepare all your changes ahead of time and then assign them all
-quickly during the blank period.
-So first we want a way to read the VCOUNT value at all:
-
-pub const VCOUNT: VolatilePtr<u16> = VolatilePtr(0x0400_0006 as *mut u16);
-
-pub fn vcount() -> u16 {
-    unsafe { VCOUNT.read() }
-}
-
-Then we want two little helper functions to wait until VBlank and VDraw.
-
-pub const SCREEN_HEIGHT: isize = 160;
-
-/// Spin until VCOUNT reaches a VBlank line (160 or above).
-pub fn wait_until_vblank() {
-    while vcount() < SCREEN_HEIGHT as u16 {}
-}
-
-/// Spin until VCOUNT is back on a visible line (below 160).
-pub fn wait_until_vdraw() {
-    while vcount() >= SCREEN_HEIGHT as u16 {}
-}
-
-And... that's it. No special types to be made this time around, it's just a
-number we read out of memory.
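One wrinkle: if wait_until_vblank is called while VBlank is already underway, it returns immediately, so a frame's drawing could start partway through the blank period. A common fix is to wait for VDraw first and then for VBlank. Here is a sketch of that ordering. It's written against a hypothetical read_vcount closure instead of the real register so it can be exercised off-hardware:

```rust
/// Waits for the *start* of the next VBlank by first waiting out any
/// VBlank already in progress. `read_vcount` stands in for reading VCOUNT.
pub fn vsync_with<F: FnMut() -> u16>(mut read_vcount: F) {
    const SCREEN_HEIGHT: u16 = 160;
    // If we're already in VBlank, wait for VDraw to begin...
    while read_vcount() >= SCREEN_HEIGHT {}
    // ...then wait for the next VBlank to begin.
    while read_vcount() < SCREEN_HEIGHT {}
}
```

On hardware you would pass something like `|| vcount()`. The main loop in this chapter gets the same effect by calling wait_until_vdraw at the end of each frame.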
-Now let's make a game of "light_cycle" with our new knowledge.
-
-light_cycle is pretty simple, and very obvious if you've ever seen Tron. The
-player moves around the screen with a trail left behind them. They die if they
-go off the screen or if they touch their own trail.
-
-We need some better drawing operations this time around.
-
-pub unsafe fn mode3_clear_screen(color: u16) {
-    // Pack the color into both halves of a u32 so each write covers 2 pixels.
-    let color = color as u32;
-    let bulk_color = color << 16 | color;
-    let mut ptr = VolatilePtr(VRAM as *mut u32);
-    for _ in 0..SCREEN_HEIGHT {
-        for _ in 0..(SCREEN_WIDTH / 2) {
-            ptr.write(bulk_color);
-            ptr = ptr.offset(1);
-        }
-    }
-}
-
-pub unsafe fn mode3_draw_pixel(col: isize, row: isize, color: u16) {
-    VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).write(color);
-}
-
-pub unsafe fn mode3_read_pixel(col: isize, row: isize) -> u16 {
-    VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).read()
-}
-
-The draw pixel and read pixel functions are both pretty obvious. What's new is
-the clear screen operation. It changes the u16 color into a u32 and then packs
-the value in twice. Then we write out u32 values the whole way through screen
-memory. This means we do half as many write operations (and half as many loop
-iterations), so the screen clear is a good deal faster. It isn't quite twice as
-fast, though: VRAM has a 16-bit bus, so each 32-bit write still costs two bus
-accesses.
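The doubled value is correct because the GBA is little-endian. The low half of the u32 lands at the lower address (the first pixel) and the high half lands at the next one. Here is a small host-side sketch of the packing and the write count, using hypothetical helper names:

```rust
pub const SCREEN_WIDTH: usize = 240;
pub const SCREEN_HEIGHT: usize = 160;

/// Packs one 16-bit color into both halves of a 32-bit word.
pub fn bulk_color(color: u16) -> u32 {
    let color = color as u32;
    (color << 16) | color
}

/// Number of stores needed to clear a Mode 3 screen (2 bytes per pixel)
/// when each store writes `bytes_per_write` bytes.
pub fn clear_writes(bytes_per_write: usize) -> usize {
    SCREEN_WIDTH * SCREEN_HEIGHT * 2 / bytes_per_write
}
```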
-Now we just have to fill in the main function:
-#[start]
-fn main(_argc: isize, _argv: *const *const u8) -> isize {
-    unsafe {
-        DISPCNT.write(MODE3 | BG2);
-    }
-
-    let mut px = SCREEN_WIDTH / 2;
-    let mut py = SCREEN_HEIGHT / 2;
-    let mut color = rgb16(31, 0, 0);
-
-    loop {
-        // read the input for this frame
-        let this_frame_keys = key_input();
-
-        // adjust game state and wait for vblank
-        px += 2 * this_frame_keys.column_direction() as isize;
-        py += 2 * this_frame_keys.row_direction() as isize;
-        wait_until_vblank();
-
-        // draw the new game and wait until the next frame starts.
-        unsafe {
-            if px < 0 || py < 0 || px == SCREEN_WIDTH || py == SCREEN_HEIGHT {
-                // out of bounds, reset the screen and position.
-                mode3_clear_screen(0);
-                color = color.rotate_left(5);
-                px = SCREEN_WIDTH / 2;
-                py = SCREEN_HEIGHT / 2;
-            } else {
-                let color_here = mode3_read_pixel(px, py);
-                if color_here != 0 {
-                    // crashed into our own line, reset the screen
-                    mode3_clear_screen(0);
-                    color = color.rotate_left(5);
-                } else {
-                    // draw the new part of the line
-                    mode3_draw_pixel(px, py, color);
-                    mode3_draw_pixel(px, py + 1, color);
-                    mode3_draw_pixel(px + 1, py, color);
-                    mode3_draw_pixel(px + 1, py + 1, color);
-                }
-            }
-        }
-        wait_until_vdraw();
-    }
-}
-
-Oh that's a lot more than before!
-First we set Mode 3 and Background 2; we already know about that.
-Then we're going to store the player's x and y, along with a color value for
-their light cycle. Then we enter the core loop.
-We read the keys for input, and then do as much as we can without touching video
-memory. Since we're using video memory as the place to store the player's light
-trail, we can't do much: we just update their position and wait for VBlank to
-start. The player will be a 2x2 square, so the arrows will move you 2 pixels per
-frame.
-Once we're in VBlank we check to see what kind of drawing we're doing. If the
-player has gone out of bounds, we clear the screen, rotate their color, and then
-reset their position. Why rotate the color? Just because it's fun to have
-different colors.
-Next, if the player is in bounds we read the video memory for their position. If
-it's not black that means we've been here before and the player has crashed into
-their own line. In this case, we reset the game without moving them to a new
-location.
-Finally, if the player is in bounds and they haven't crashed, we write their
-color into memory at this position.
-Regardless of how it worked out, we wait here until VDraw starts before going
-around the loop again. That's all there is to it.
-
-Once again, as with the hello1 and hello2 examples, the gba crate covers
-much of the same ground as our example here, but in slightly different ways.
-Better organization and abstractions are usually only realized once you've used
-more of the whole thing you're trying to work with. If we wanted a crate where
-the whole thing is well integrated with itself, then the examples would also end
-up having to explain things we haven't really touched on much yet. It becomes a
-lot harder to teach.
-So, going forward, we will continue to teach concepts and build examples that
-don't directly depend on the gba crate. This allows the crate to grow freely
-without all the past examples weighing it down.
-Ch 3: Memory and Objects
-Alright so we can do some basic "movement", but we left a big trail in the video
-memory of everywhere we went. Most of the time that's not what we want at all.
-If we want more hardware support we're going to have to use a new video mode. So
-far we've only used Mode 3, and Modes 4 and 5 are also bitmap modes that work in
-basically the same way. Instead, we'll switch focus to using a tiled graphical
-mode.
-First we will go over the complete GBA memory mapping. Part of this is the
-memory for tiled graphics, but also things like all those IO registers, where
-our RAM is for scratch space, all that stuff. Even if we can't put all of them
-to use at once, it's helpful to have an idea of what will be available in the
-long run.
-Tiled modes bring us three big new concepts that each have their own complexity:
-tiles, backgrounds, and objects. Backgrounds and objects both use tiles, but the
-background is for creating a very large static space that you can scroll around
-the view within, and the objects are about having a few moving bits that appear
-over the background. Careful use of backgrounds and objects is key to having the
-best looking GBA game, so we won't even be able to cover it all in a single
-chapter.
-And, of course, since most games are pretty boring if they're totally static
-we'll touch on the kinds of RNG implementations you might want to have on a GBA.
-Most general purpose RNGs that you find are rather big compared to the amount of
-memory we want to give them, and they often use a lot of u64 operations, so
-they end up much slower on a 32-bit machine like the GBA (you can lower 64-bit
-ops to combinations of 32-bit ops, but that's quite a bit more work). We'll
-cover a few RNG options that size down the RNG to a good size and a good speed
-without trading away too much in terms of quality.
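Here is one example of the size and speed class meant here. It is not necessarily a generator this book will end up using. Marsaglia's xorshift32 keeps four bytes of state and needs only 32-bit shifts and xors. It's quick but statistically weak by modern standards, so treat it as a sketch:

```rust
/// Marsaglia's 32-bit xorshift generator: only 32-bit shifts and xors.
/// The state must never be zero, or it will stay zero forever.
pub struct XorShift32 {
    state: u32,
}

impl XorShift32 {
    pub fn new(seed: u32) -> Self {
        // Swap a zero seed for an arbitrary non-zero constant.
        XorShift32 { state: if seed == 0 { 0x1234_5678 } else { seed } }
    }

    pub fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }
}
```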
-To top it all off, we'll make a simple "memory game" sort of thing. There are some
-face-down cards in a grid; you pick one to check, then you pick another to
-check, and if they match, the pair disappears.
-
-Both backgrounds and objects can have "priority" values associated with them.
-TONC and GBATEK have opposite ideas of what it means to have the "highest"
-priority. TONC goes by highest numerical value, and GBATEK goes by what's on the
-z-layer closest to the user. Let's list out the rules as clearly as we can:
-
-- Priority is always two bits, so 0 through 3.
-- Priority conceptually proceeds in drawing passes that count down, so any
-priority 3 things can get covered up by priority 2 things. In truth there's
-probably depth testing and buffering stuff going on so it's all one single
-pass, but conceptually we will imagine it happening as all of the 3 elements,
-then all of 2, and so on.
-- Objects always draw over top of backgrounds of equal priority.
-- Within things of the same type and priority, the lower numbered element "wins"
-and gets its pixel drawn (bg0 is favored over bg1, obj0 is favored over obj1,
-etc).
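Those rules amount to sorting by a three-part key. This sketch ignores transparency: it assumes every layer passed in has an opaque pixel at the spot in question. The Layer and Kind types are made up for illustration:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Kind {
    // Declared first so it sorts first: objects beat backgrounds at equal priority.
    Object,
    Background,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Layer {
    pub priority: u8, // 0..=3
    pub kind: Kind,
    pub index: u8, // bg0..bg3 or obj0..obj127
}

/// Picks which opaque layer gets its pixel drawn: lowest priority number
/// first, then objects over backgrounds, then the lower-numbered element.
pub fn winner(layers: &[Layer]) -> Option<Layer> {
    layers.iter().copied().min_by_key(|l| (l.priority, l.kind, l.index))
}
```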
-
-
-The GBA Memory Map has several memory portions to it, each with their own
-little differences. Most of the memory has pre-determined use according to the
-hardware, but there is also space for games to use as a scratch pad in whatever
-way the game sees fit.
-The memory ranges listed here are inclusive, so they end with a lot of F's
-and E's.
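Because the ends are inclusive, a region's size is end - start + 1. Here are the ranges listed in the rest of this section, written out that way. The Region type is only for illustration:

```rust
/// An inclusive address range, as written in the memory map.
pub struct Region {
    pub start: u32,
    pub end_inclusive: u32,
}

impl Region {
    /// Inclusive ranges need the +1 to count the last byte.
    pub const fn size(&self) -> u32 {
        self.end_inclusive - self.start + 1
    }
}

pub const EWRAM: Region = Region { start: 0x0200_0000, end_inclusive: 0x0203_FFFF };
pub const IWRAM: Region = Region { start: 0x0300_0000, end_inclusive: 0x0300_7FFF };
pub const PALRAM: Region = Region { start: 0x0500_0000, end_inclusive: 0x0500_03FF };
pub const VRAM: Region = Region { start: 0x0600_0000, end_inclusive: 0x0601_7FFF };
pub const OAM: Region = Region { start: 0x0700_0000, end_inclusive: 0x0700_03FF };
```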
-We've talked about volatile memory before, but just as a reminder I'll say that
-all of the memory we'll talk about here should be accessed using volatile, with
-one exception (and one caveat to that exception):
-
-- Work RAM (both internal and external) can be used normally, and if the
-compiler is able to totally elide some reads and writes that's okay.
-- However, if you set aside any space in Work RAM where an interrupt will
-communicate with the main program then that specific location will have to
-keep using volatile access, since the compiler never knows when an interrupt
-will actually happen.
-
-
-
-This is special memory for the BIOS. It is "read-only", but even then it's only
-accessible when the program counter is pointing into the BIOS region. At all
-other times you get a garbage value back when you try to read out of the BIOS.
-
-
-External Work RAM (EWRAM): 0x2000000 to 0x203FFFF (256k)
-
-This is a big pile of space, the use of which is up to each game. However, the
-external work RAM has only a 16-bit bus (if you read or write a 32-bit value it
-silently breaks it up into two 16-bit operations) and also 2 wait cycles (extra
-CPU cycles that you have to expend per 16-bit bus use).
-It's most helpful to think of EWRAM as slower, distant memory, similar to the
-"heap" in a normal application. You can take the time to go store something
-within EWRAM, or to load it out of EWRAM, but if you've got several operations
-to do in a row and you're worried about time you should pull that value into
-local memory, work on your local copy, and then push it back out to EWRAM.
-
-
-Internal Work RAM (IWRAM): 0x3000000 to 0x3007FFF (32k)
-
-This is a smaller pile of space, but it has a 32-bit bus and no wait.
-By default, 0x3007F00 to 0x3007FFF is reserved for interrupt and BIOS use.
-The rest of it is totally up to you. The user's stack space starts at
-0x3007F00 and grows downward from there. For best results you should probably
-place your own data starting at 0x3000000 and working upward. Under normal use
-it's unlikely that the two will crash into each other.
-
-
-IO Registers: 0x4000000 to 0x40003FE
-
-We've touched upon a few of these so far, and we'll get to more later. At the
-moment it is enough to say that, as you might have guessed, all of them live in
-this region. Each individual register is a u16 or u32 and they control all
-sorts of things. We'll actually be talking about some more of them in this very
-chapter, because that's how we'll control some of the background and object
-stuff.
-
-
-Palette RAM (PALRAM): 0x5000000 to 0x50003FF (1k)
-
-Palette RAM has a 16-bit bus, which isn't really a problem because it
-conceptually just holds u16 values. There's no automatic wait state, but if
-you try to access the same location that the display controller is accessing you
-get bumped by 1 cycle. Since the display controller can use the palette RAM any
-number of times per scanline, it's basically impossible to predict whether you'll
-have to wait during VDraw. During VBlank you won't have any wait, of course.
-PALRAM is one of the memory regions where there's weirdness if you try to write
-just one byte: the byte gets written into both halves of the larger 16-bit
-location. This doesn't really affect us much with PALRAM, because palette values
-are all supposed to be u16 anyway.
-The palette memory actually contains not one, but two sets of palettes. First
-there are 256 entries for the background palette data (starting at 0x5000000),
-and then there are 256 entries for object palette data (starting at 0x5000200).
-The GBA also has two modes for palette access: 8-bits-per-pixel (8bpp) and
-4-bits-per-pixel (4bpp).
-
-- In 8bpp mode an 8-bit palette index value within a background or sprite
-simply indexes directly into the 256 slots for that type of thing.
-- In 4bpp mode a 4-bit palette index value within a background or sprite
-specifies an index within a particular "palbank" (16 palette entries each),
-and then a separate setting outside of the graphical data determines which
-palbank is to be used for that background or object (the screen entry data for
-backgrounds, and the object attributes for objects).
-
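Here is a sketch of how a pixel's palette index turns into a PALRAM address in each mode. The function names are hypothetical. The base addresses and the 2-bytes-per-entry size come from the text above:

```rust
pub const BG_PALRAM: u32 = 0x0500_0000;
pub const OBJ_PALRAM: u32 = 0x0500_0200;

/// Address of the PALRAM entry a 4bpp pixel ends up using.
/// `palbank` and `index` are both 0..=15; each entry is a 2-byte u16.
pub fn palette_entry_addr_4bpp(base: u32, palbank: u32, index: u32) -> u32 {
    debug_assert!(palbank < 16 && index < 16);
    base + (palbank * 16 + index) * 2
}

/// In 8bpp mode the pixel value indexes all 256 entries directly.
pub fn palette_entry_addr_8bpp(base: u32, index: u32) -> u32 {
    debug_assert!(index < 256);
    base + index * 2
}
```

Palbank 2, index 3 in 4bpp mode reaches the same slot as index 35 in 8bpp mode, which is why one palette slot can play different roles depending on the color depth.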
-
-When a pixel within a background or object specifies index 0 as its palette
-entry it is treated as a transparent pixel. This means that in 8bpp mode there
-are only 255 actual color options (0 being transparent), and in 4bpp mode there
-are only 15 actual color options available within each palbank (the 0th entry of
-each palbank is transparent).
-Individual backgrounds, and individual objects, each determine if they're 4bpp
-or 8bpp separately, so a given overall palette slot might map to a used color in
-8bpp and an unused/transparent color in 4bpp, which a palette wizard can take
-advantage of.
-Palette slot 0 of the overall background palette is used to determine the
-"backdrop" color. That's the color you see if no background or object ends up
-being rendered within a given pixel.
-Since display mode 3 and display mode 5 don't use the palette, they cannot
-benefit from transparency.
-
-
-Video RAM (VRAM): 0x6000000 to 0x6017FFF (96k)
-
-We've used this before! VRAM has a 16-bit bus and no wait. However, the same as
-with PALRAM, the "you might have to wait if the display controller is looking at
-it" rule applies here.
-Unfortunately there's not much more exact detail that can be given about VRAM.
-The use of the memory depends on the video mode that you're using.
-One general detail of note is that you can't write individual bytes to any part
-of VRAM. Depending on mode and location, you'll either get your bytes doubled
-into both the upper and lower parts of the 16-bit location targeted, or you
-won't even affect the memory. This usually isn't a big deal, except in two
-situations:
-
-- In Mode 4, if you want to change just 1 pixel, you'll have to be very careful
-to read the old u16, overwrite just the byte you wanted to change, and then
-write that back.
-- In any display mode, avoid using memcpy to place things into VRAM.
-It's written to be byte oriented, and only does 32-bit transfers under select
-conditions. The rest of the time it'll copy one byte at a time and you'll get
-either garbage or nothing at all.
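For the Mode 4 case, the read-modify-write step might look like the sketch below. The function names are hypothetical. The actual VRAM read and write (through something like the VolatilePtr type from earlier) are left out so the byte juggling can be checked on its own:

```rust
/// Mode 4 stores one byte (a palette index) per pixel, but VRAM can't take
/// single-byte writes. This returns the u16 to write back after changing
/// one of the two pixels packed into `old`. The GBA is little-endian, so
/// the even-numbered pixel lives in the low byte.
pub fn mode4_replace_pixel(old: u16, pixel_is_odd: bool, new_index: u8) -> u16 {
    if pixel_is_odd {
        (old & 0x00FF) | ((new_index as u16) << 8)
    } else {
        (old & 0xFF00) | new_index as u16
    }
}

/// Which u16 (counting from the start of the 240-pixel-wide page)
/// holds pixel (col, row).
pub fn mode4_halfword_index(col: usize, row: usize) -> usize {
    (row * 240 + col) / 2
}
```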
-
-
-
-Object Attribute Memory (OAM): 0x7000000 to 0x70003FF (1k)
-
-The Object Attribute Memory has a 32-bit bus and no default wait, but suffers
-from the "you might have to wait if the display controller is looking at it"
-rule. You cannot write individual bytes to OAM at all, but that's not really a
-problem because all the fields of the data types within OAM are either i16 or
-u16 anyway.
-Object attribute memory is the wildest yet: it conceptually contains two types
-of things, but they're interlaced with each other all the way through.
-Now, GBATEK and CowByte don't quite give names to the two data types here.
-TONC calls them OBJ_ATTR and OBJ_AFFINE, but we'll be giving them names fitting
-with the Rust naming convention. Just know that if you try to talk about it with
-others they might not be using the same names. In Rust terms their layout would
-look like this:
-
-#[repr(C)]
-pub struct ObjectAttributes {
-    attr0: u16,
-    attr1: u16,
-    attr2: u16,
-    filler: i16,
-}
-
-#[repr(C)]
-pub struct AffineMatrix {
-    filler0: [u16; 3],
-    pa: i16,
-    filler1: [u16; 3],
-    pb: i16,
-    filler2: [u16; 3],
-    pc: i16,
-    filler3: [u16; 3],
-    pd: i16,
-}
-
-(Note: the #[repr(C)] part just means that Rust must lay out the data exactly
-in the order we specify, which otherwise it is not required to do.)
-So, we've got 1024 bytes in OAM and each ObjectAttributes value is 8 bytes, so
-naturally we can support up to 128 objects.
-At the same time, we've got 1024 bytes in OAM and each AffineMatrix is 32
-bytes, so we can have 32 of them.
-But, as I said, these things are all interlaced with each other. See how
-there are "filler" fields in each struct? If we imagine the OAM as being just an
-array of one type or the other, indexes 0/1/2/3 of the ObjectAttributes array
-would line up with index 0 of the AffineMatrix array. It's kinda weird, but
-that's just how it works. When we set up functions to read and write these
-values we'll have to be careful with how we do it. We probably won't want to use
-those representations above, at least not with the AffineMatrix type, because
-they're quite wasteful if you want to store just object attributes or just
-affine matrices.
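The interleaving boils down to a little offset arithmetic. In this sketch (the helper names are made up), affine matrix j's four parameters sit in the last two bytes of objects 4j through 4j+3:

```rust
#[repr(C)]
pub struct ObjectAttributes {
    pub attr0: u16,
    pub attr1: u16,
    pub attr2: u16,
    pub filler: i16,
}

/// Byte offset into OAM of object `i`'s attributes (i in 0..128).
pub fn object_offset(i: usize) -> usize {
    i * core::mem::size_of::<ObjectAttributes>()
}

/// Byte offset into OAM of affine matrix `j`'s parameter `k`
/// (k = 0..4 for pa, pb, pc, pd). Each parameter sits in the
/// "filler" slot (byte 6) of object 4 * j + k.
pub fn affine_param_offset(j: usize, k: usize) -> usize {
    object_offset(4 * j + k) + 6
}
```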
-
-
-Game Pak ROM:
-- 0x8000000 to 0x9FFFFFF (wait 0)
-- 0xA000000 to 0xBFFFFFF (wait 1)
-- 0xC000000 to 0xDFFFFFF (wait 2)
-- Max of 32MB
-
-These portions of the memory are less fixed, because they depend on the precise
-details of the game pak you've inserted into the GBA. In general, they connect
-to the game pak ROM and/or Flash memory, using a 16-bit bus. The ROM is
-read-only, but the Flash memory (if any) allows writes.
-The game pak ROM is listed as being in three sections, but it's actually the
-same memory being effectively mirrored into three different locations. The
-mirror that you choose to access the game pak through affects which wait state
-setting it uses (configured via IO register of course). Unfortunately, the
-details come down more to the game pak hardware that you load your game onto
-than anything else, so there's not much I can say right here. We'll eventually
-talk about it more later when I'm forced to do the boring thing and just cover
-all the IO registers that aren't covered anywhere else.
-One thing of note is the way that the 16-bit bus affects us: the instructions to
-execute are coming through the same bus as the rest of the game data, so we want
-them to be as compact as possible. The ARM chip in the GBA supports two
-different instruction sets, "thumb" and "non-thumb" (ARM). The thumb mode
-instructions are 16-bit, so each one fits in a single bus access, and the
-non-thumb instructions are 32-bit, so we're at a penalty if we execute them
-directly out of the game pak. However, some things will demand that we use
-non-thumb code, so we'll have to deal with that eventually. It's possible to
-switch between modes, but it's a pain to keep track of what mode you're in
-because there's not currently support for it in Rust itself (perhaps some day).
-So we'll stick with thumb code as much as we possibly can, which is why our
-target profile for our builds starts with thumbv4.
-
-
-Game Pak SRAM: 0xE000000 to 0xE00FFFF (64k)
-
-The game pak SRAM has an 8-bit bus. Why did Pokémon always take so long to save?
-Saving the whole game one byte at a time is why. The SRAM also has some amount
-of wait, but as with the ROM, the details depend on your game pak hardware (and
-also as with ROM, you can adjust the settings with an IO register, should you
-need to).
-One thing to note about the SRAM is that the GBA has a Direct Memory Access
-(DMA) feature that can be used for bulk memory movements in some cases, but the
-DMA cannot access the SRAM region. You really are stuck reading and writing
-one byte at a time when you're using the SRAM.
-
-When using the GBA's hardware graphics, if you want to let the hardware do most
-of the work you have to use Modes 0, 1 or 2. However, to do that we first have
-to learn about how tile data works inside of the GBA.
-
-Fundamentally, a tile is an 8x8 image. If you want anything bigger than 8x8 you
-need to arrange several tiles so that it looks like whatever you're trying to
-draw.
-As was already mentioned, the GBA supports two different color modes: 4 bits per
-pixel and 8 bits per pixel. This means that we have two types of tile that we
-need to model. The pixel bits always represent an index into the PALRAM.
-
-- With 4 bits per pixel, the PALRAM is imagined to be 16 palbank sections of
-16 palette entries each. The image data selects the index within the palbank,
-and an external configuration selects which palbank is used.
-- With 8 bits per pixel, the PALRAM is imagined to be a single 256 entry array
-and the index just directly picks which of the 256 colors is used.
-
-Knowing this, we can write the following definitions:
-
-#[derive(Debug, Clone, Copy, Default)]
-#[repr(transparent)]
-pub struct Tile4bpp {
-    pub data: [u32; 8],
-}
-
-#[derive(Debug, Clone, Copy, Default)]
-#[repr(transparent)]
-pub struct Tile8bpp {
-    pub data: [u32; 16],
-}
-
-I hope this makes sense so far. At 4bpp, we have 4 bits per pixel, times 8
-pixels per line, times 8 lines: 256 bits (32 bytes, or 8 u32 values) required.
-Similarly, at 8 bits per pixel we'll need 512 bits (64 bytes, or 16 u32
-values). Why are we defining them as arrays of u32 values? Because when it
-comes time to do bulk copies the fastest way to do it will be to go one whole
-machine word at a time. If we make the data inside the type be an array of u32
-then it'll already be aligned for fast u32 bulk copies.
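The compiler can check this arithmetic itself. This sketch repeats the two tile types so it stands alone and adds compile-time size and alignment checks. tile_bytes is a made-up helper, and const assert! needs Rust 1.57 or later:

```rust
#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct Tile4bpp {
    pub data: [u32; 8],
}

#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct Tile8bpp {
    pub data: [u32; 16],
}

/// Bytes per tile for a given color depth: 8 x 8 pixels, `bpp` bits each.
pub const fn tile_bytes(bpp: usize) -> usize {
    8 * 8 * bpp / 8
}

// Compile-time checks: the build fails if any of these is false.
const _: () = assert!(core::mem::size_of::<Tile4bpp>() == tile_bytes(4));
const _: () = assert!(core::mem::size_of::<Tile8bpp>() == tile_bytes(8));
const _: () = assert!(core::mem::align_of::<Tile4bpp>() == 4);
```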
-Keeping track of the current color depth is naturally the programmer's
-problem. If you get it wrong you'll see a whole ton of garbage pixels all over
-the screen, and you'll probably be able to guess why. You know, unless you did
-one of the other things that can make a bunch of garbage pixels show up all over
-the screen. Graphics programming is fun like that.
-
-Tiles don't just sit on their own, they get grouped into charblocks. Long
-ago in the distant past, video games were built with hardware that was also used
-to make text terminals. So tile image data was called "character data". In fact
-some guides will even call the regular mode for the background layers "text
-mode", despite the fact that you obviously don't have to show text at all.
-A charblock is 16KB long (0x4000 bytes), which means that the number of tiles
-that fit into a charblock depends on your color depth. With 4bpp you get 512
-tiles, and with 8bpp there are 256 tiles. So they'd be something like this:
-
-#[derive(Clone, Copy)]
-#[repr(transparent)]
-pub struct Charblock4bpp {
-    pub data: [Tile4bpp; 512],
-}
-
-#[derive(Clone, Copy)]
-#[repr(transparent)]
-pub struct Charblock8bpp {
-    pub data: [Tile8bpp; 256],
-}
-
-You'll note that we can't derive `Default` any more, because the arrays are so
-big: the standard library only implements `Default` for arrays of 32 or fewer
-elements. (When this was written the same limit also applied to `Debug`;
-current Rust implements `Debug`, `Clone`, and `Copy` for arrays of any size.)
-We won't generally be making up an entire charblock on the fly though, so it's
-not a big deal. If we absolutely had to, we could call `core::mem::zeroed()`
-(an `unsafe` function), but we really don't want to be trying to build a whole
-charblock at runtime. We'll usually want to define our tile data as `const`
-charblock values (or even parts of charblock values) that we then load out of
-the game pak ROM at runtime.
-Anyway, with 16k per charblock and only 96k total in VRAM, it's easy math to see
-that there are 6 different charblocks in VRAM when in a tiled mode. The first
-four of these are for backgrounds, and the other two are for objects. There are
-rules for how a tile ID on a background or object selects a tile within a
-charblock, but since they're different between backgrounds and objects we'll
-cover them on their own pages.
-
-It's very important to note that if you take the pixel data from a normal image
-editor and copy it directly into GBA memory, you'll get very bad results.
-Imagine you have part of an image that's 16 by 16 pixels, aka 2 tiles by 2
-tiles. In a normal bitmap, the first 16 pixels of data are the 1st row of the
-1st tile followed by the 1st row of the 2nd tile. However, when we translate
-that into the GBA, the first 8 pixels will indeed be the first 8 tile pixels,
-but then the next 8 pixels in memory will be used as the 2nd row of the first
-tile, not the 1st row of the 2nd tile.
-So, how do we fix this?
-Well, the simple but annoying way is to edit your tile image as being an 8 pixel
-wide image and then have the image get super tall as you add more and more
-tiles. It can work, but it's really impractical if you have any multi-tile
-things that you're trying to do.
-Instead, there are some image conversion tools that devkitPro provides in their
-gba-dev section. They let you take normal images, repackage them, and export
-them in various formats that you can then compile into your project.
-Ketsuban uses the grit tool, with the
-following suggestions:
-
-- Include an actual resource file and a file describing it somewhere in your
-project (see the grit manual for all the details involved here).
-- In a `build.rs` you run `grit` on each resource+description pair, such as in
-this old gist example.
-- Then within your Rust code you use the `include_bytes!` macro to have the
-formatted resource be available as a const value you can load at runtime.
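Whichever tool you pick, the heart of the conversion is a reordering of pixels. Here's a minimal host-side sketch (the kind of thing a `build.rs` could run) of that reordering for 8bpp data. It assumes the image is already one palette index per byte and that its width and height are multiples of 8. `bitmap_to_tiles_8bpp` is a made-up name for illustration; it isn't part of grit or any crate.

```rust
/// Reorders a row-major 8bpp bitmap (one palette index per byte) into GBA
/// tile order: all 8 rows of tile 0, then all 8 rows of tile 1, and so on.
/// Assumes `width` and `height` are multiples of 8.
pub fn bitmap_to_tiles_8bpp(pixels: &[u8], width: usize, height: usize) -> Vec<u8> {
    assert!(width % 8 == 0 && height % 8 == 0);
    assert_eq!(pixels.len(), width * height);
    let mut out = Vec::with_capacity(pixels.len());
    // Walk the tiles left to right, top to bottom...
    for tile_y in 0..height / 8 {
        for tile_x in 0..width / 8 {
            // ...and copy each tile's 8 rows of 8 pixels in order.
            for row in 0..8 {
                let start = (tile_y * 8 + row) * width + tile_x * 8;
                out.extend_from_slice(&pixels[start..start + 8]);
            }
        }
    }
    out
}
```

For 4bpp you'd additionally pack each pair of neighboring pixels into one byte, with the left pixel in the low nibble.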
-
-
-So, backgrounds, they're cool. Why do we call the ones here "regular"
-backgrounds? Because there's also "affine" backgrounds. However, affine math
-stuff adds a complication, so for now we'll just work with regular backgrounds.
-The non-affine backgrounds are sometimes called "text mode" backgrounds by other
-guides.
-To get your background image working you generally need to perform all of the
-following steps, though I suppose the exact ordering is up to you.
-
-When you want regular tiled display, you must use video mode 0 or 1.
-
-- Mode 0 allows for using all four BG layers (0 through 3) as regular
-backgrounds.
-- Mode 1 allows for using BG0 and BG1 as regular backgrounds, BG2 as an affine
-background, and BG3 not at all.
-- Mode 2 allows for BG2 and BG3 to be used as affine backgrounds, while BG0 and
-BG1 cannot be used at all.
-
-We will not cover affine backgrounds in this chapter, so we will naturally be
-using video mode 0.
-Also, note that you have to enable each background layer that you want to use
-within the display control register.
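As a sketch of what that looks like in numbers: the video mode lives in bits 0-2 of the display control register (DISPCNT, `0x400_0000`), and the BG0 through BG3 enable flags are bits 8 through 11. The constant and function names below are our own, not from any library.

```rust
// DISPCNT bits used here: 0-2 select the video mode, 8-11 enable BG0-BG3.
pub const DISPCNT_BG0: u16 = 1 << 8;
pub const DISPCNT_BG1: u16 = 1 << 9;
pub const DISPCNT_BG2: u16 = 1 << 10;
pub const DISPCNT_BG3: u16 = 1 << 11;

/// Builds a display control value from a video mode and a set of layer flags.
pub fn display_control(mode: u16, layers: u16) -> u16 {
    assert!(mode <= 5, "video modes are 0 through 5");
    (mode & 0b111) | layers
}
```

On hardware you'd then write the result to `0x400_0000` with a volatile write; for example, `display_control(0, DISPCNT_BG0)` gives mode 0 with only BG0 shown.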
-
-The background palette starts at `0x500_0000` and is 256 `u16` values long. It'd
-potentially be possible to declare a static array starting at a fixed address
-and use a linker script to make sure that it ends up at the right spot in the
-final program, but since we have to use volatile reads and writes with PALRAM
-anyway, we'll just reuse our `VolatilePtr` type. Something like this:
-
-```rust
-// `VolatilePtr` is the volatile pointer type from earlier chapters.
-pub const PALRAM_BG_BASE: VolatilePtr<u16> = VolatilePtr(0x500_0000 as *mut u16);
-
-pub fn bg_palette(slot: usize) -> u16 {
-    assert!(slot < 256);
-    unsafe { PALRAM_BG_BASE.offset(slot as isize).read() }
-}
-
-pub fn set_bg_palette(slot: usize, color: u16) {
-    assert!(slot < 256);
-    unsafe { PALRAM_BG_BASE.offset(slot as isize).write(color) }
-}
-```
-As we discussed with the tile color depths, the palette can be utilized as a
-single block of palette values (`[u16; 256]`) or as 16 palbanks of 16 palette
-values each (`[[u16; 16]; 16]`). This setting is assigned per background layer
-via an IO register.
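Two small helpers make those views easier to use. The GBA stores each color as 15-bit BGR (red in bits 0-4, green in bits 5-9, blue in bits 10-14), and in the 16-palbank view, entry `e` of palbank `b` is simply slot `b * 16 + e` of the 256-entry view. `rgb15` and `palbank_slot` are hypothetical names used only for this sketch.

```rust
/// Packs 5-bit red, green, and blue channels (0-31) into a GBA BGR555 color.
pub const fn rgb15(red: u16, green: u16, blue: u16) -> u16 {
    (red & 0x1F) | ((green & 0x1F) << 5) | ((blue & 0x1F) << 10)
}

/// Converts (palbank, entry) from the 16x16 view into a 0-255 palette slot.
pub fn palbank_slot(palbank: usize, entry: usize) -> usize {
    assert!(palbank < 16 && entry < 16);
    palbank * 16 + entry
}
```

Combined with the functions above, `set_bg_palette(palbank_slot(2, 1), rgb15(31, 0, 0))` would make entry 1 of palbank 2 pure red.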
-
-Tile data is placed into charblocks. A charblock is always 16kb, so depending on
-color depth it will have either 256 or 512 tiles within that charblock.
-Charblocks 0, 1, 2, and 3 are all for background tiles. That's a maximum of 2048
-tiles for backgrounds (at 4bpp), but as you'll see in a moment a particular
-tilemap entry can't even index that high. Instead, each background layer is
-assigned a
-"character base block", and then tilemap entries index relative to the character
-base block of that background layer.
-Now, if you want to move in a lot of tile data you'll probably want to use a DMA
-routine, or at least write a function like `memcopy32` for fast `u32` copying
-from ROM into VRAM. However, for now, and because we're being very explicit
-since this is our first time doing it, we'll write it as functions for
-individual tile reads and writes.
-The math works like indexing a pointer, except that we have two sizes we need to
-go by. First you take the base address for VRAM (`0x600_0000`), then add the
-size of a charblock (16kb) times the charblock you want to place the tile
-within, and then you add the index of the tile slot you're placing it into times
-the size of that type of tile. Like this:
-
-```rust
-use core::mem::size_of;
-
-// Assumes `VRAM` is a `usize` constant holding `0x600_0000` and that
-// `VolatilePtr` is the volatile pointer type from earlier chapters.
-pub fn bg_tile_4bpp(base_block: usize, tile_index: usize) -> Tile4bpp {
-    assert!(base_block < 4);
-    assert!(tile_index < 512);
-    let address = VRAM
-        + size_of::<Charblock4bpp>() * base_block
-        + size_of::<Tile4bpp>() * tile_index;
-    unsafe { VolatilePtr(address as *mut Tile4bpp).read() }
-}
-
-pub fn set_bg_tile_4bpp(base_block: usize, tile_index: usize, tile: Tile4bpp) {
-    assert!(base_block < 4);
-    assert!(tile_index < 512);
-    let address = VRAM
-        + size_of::<Charblock4bpp>() * base_block
-        + size_of::<Tile4bpp>() * tile_index;
-    unsafe { VolatilePtr(address as *mut Tile4bpp).write(tile) }
-}
-
-pub fn bg_tile_8bpp(base_block: usize, tile_index: usize) -> Tile8bpp {
-    assert!(base_block < 4);
-    assert!(tile_index < 256);
-    let address = VRAM
-        + size_of::<Charblock8bpp>() * base_block
-        + size_of::<Tile8bpp>() * tile_index;
-    unsafe { VolatilePtr(address as *mut Tile8bpp).read() }
-}
-
-pub fn set_bg_tile_8bpp(base_block: usize, tile_index: usize, tile: Tile8bpp) {
-    assert!(base_block < 4);
-    assert!(tile_index < 256);
-    let address = VRAM
-        + size_of::<Charblock8bpp>() * base_block
-        + size_of::<Tile8bpp>() * tile_index;
-    unsafe { VolatilePtr(address as *mut Tile8bpp).write(tile) }
-}
-```
-For bulk operations, you'd do the exact same math to get your base destination
-pointer, and then you'd get the base source pointer for the tile you're copying
-out of ROM, and then you'd do the bulk copy for the correct number of u32
-values that you're trying to move (8 per tile moved for 4bpp, or 16 per tile
-moved for 8bpp).
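Here's a minimal sketch of such a bulk copy, assuming the destination pointer was computed with the same math as above. `copy_words_volatile` is a hypothetical helper, not something defined elsewhere in the book; a DMA transfer or a hand-tuned assembly copy would typically be faster on real hardware.

```rust
use core::ptr;

/// Copies every word of `src` to consecutive `u32` slots starting at `dst`,
/// using volatile writes so the compiler can't skip or merge them.
///
/// # Safety
/// `dst` must be 4-byte aligned and valid for `src.len()` `u32` writes.
pub unsafe fn copy_words_volatile(src: &[u32], dst: *mut u32) {
    let mut out = dst;
    // Step a pointer forward instead of re-indexing on every pass.
    for &word in src {
        // SAFETY: the caller promised room for `src.len()` words at `dst`.
        unsafe {
            ptr::write_volatile(out, word);
            out = out.add(1);
        }
    }
}
```

Copying `n` 4bpp tiles means passing `n * 8` words (`n * 16` for 8bpp); for a single `Tile4bpp` you could pass `&tile.data` as the source.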
-GBA Limitation Note: on a modern PC (eg: `x86` or `x86_64`) you're probably
-used to index based loops and iterator based loops being the same speed. The CPU
-can compute "base address of the array plus desired index times the size per
-element" as part of a single load or store instruction (x86 calls this
-scaled-index addressing). It's slightly more complicated if there are arrays
-within arrays like there are here, but with normal arrays it's basically the
-same speed to index per loop cycle as it is to take a base address and then add
-+1 offset per loop cycle. However, the GBA's CPU is much more limited here. GBA
-code is usually compiled to Thumb instructions, and a Thumb load can add a
-register offset to a base address but can't scale that offset at the same time,
-so each indexed access needs extra instructions (and Rust may add a bounds check
-on top of that). On the GBA, there's a genuine speed difference between looping
-over indexes and then indexing each loop (slow) compared to using an iterator
-that just stores an internal pointer and does +1 offset per loop until it
-reaches the end (fast). If it's like a 3 element array it's no big deal, but if
-you've got a big slice of data to process, be sure to go over it with `.iter()`
-and `.iter_mut()` if you can, instead of looping by index. This is Rust and all,
-so you were probably gonna do that anyway, but just a heads up.
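The two loop styles look like this side by side. They compute the same result; how much faster the iterator version actually is on the GBA depends on what the optimizer does with each, so it's worth checking the generated assembly (for example in Compiler Explorer) rather than taking it on faith.

```rust
/// Sums a slice by indexing it on every pass through the loop.
pub fn sum_by_index(data: &[u32]) -> u32 {
    let mut total = 0u32;
    for i in 0..data.len() {
        // `data[i]` is conceptually `base + i * 4`, plus a bounds check
        // unless the optimizer can prove `i` is always in range.
        total = total.wrapping_add(data[i]);
    }
    total
}

/// Sums the same slice with an iterator, which just steps a pointer forward.
pub fn sum_by_iter(data: &[u32]) -> u32 {
    data.iter().fold(0u32, |total, &x| total.wrapping_add(x))
}
```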
-
-I believe that at one point I alluded to a tilemap existing. Well, just as the
-tiles are arranged into charblocks, the data describing what tile to show in
-what location is arranged into a thing called a screenblock.
-A screenblock is placed into VRAM the same as the tile data charblocks. Starting
-at the base of VRAM (`0x600_0000`) there are 32 slots for the screenblock array.
-Each screenblock is 2048 bytes (`0x800`). Naturally, if our tiles are using up
-charblock space within VRAM and our tilemaps are using up screenblock space
-within the same VRAM... well it would just be a disaster if they ran into each
-other. Once again, it's up to you as the programmer to determine how much space
-you want to devote to each thing. Each complete charblock uses up 8 screenblocks
-worth of space, but you don't have to fill a complete charblock with tiles, so
-you can be very fiddly with how you split the memory.
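To see which screenblocks a given charblock collides with, it helps to put both in terms of VRAM addresses. This sketch uses made-up helper names and only covers the four background charblocks.

```rust
use core::ops::Range;

pub const VRAM: usize = 0x600_0000;
pub const CHARBLOCK_SIZE: usize = 0x4000; // 16kb
pub const SCREENBLOCK_SIZE: usize = 0x800; // 2kb

/// Address of screenblock `index` (0 through 31).
pub fn screenblock_address(index: usize) -> usize {
    assert!(index < 32);
    VRAM + index * SCREENBLOCK_SIZE
}

/// Address of background charblock `index` (0 through 3).
pub fn charblock_address(index: usize) -> usize {
    assert!(index < 4);
    VRAM + index * CHARBLOCK_SIZE
}

/// The screenblock indexes that share memory with a completely filled charblock.
pub fn screenblocks_in_charblock(index: usize) -> Range<usize> {
    assert!(index < 4);
    let per_charblock = CHARBLOCK_SIZE / SCREENBLOCK_SIZE; // 8
    index * per_charblock..(index + 1) * per_charblock
}
```

For example, tiles in charblock 0 and a tilemap in screenblock 31 (the very end of charblock 3) won't collide, as long as charblock 3 isn't also full of tiles.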
-Each screenblock is composed of a series of screenblock entry values, which
-describe what tile index to use, whether the tile should be flipped, and what
-palbank it should use (if any). Because both regular backgrounds and affine
-backgrounds are composed of screenblocks with entries, and because the affine
-background has a smaller format for screenblock entries, we'll name our types
-appropriately.
-
-```rust
-#[derive(Clone, Copy)]
-#[repr(transparent)]
-pub struct RegularScreenblock {
-    pub data: [RegularScreenblockEntry; 32 * 32],
-}
-
-#[derive(Debug, Clone, Copy, Default)]
-#[repr(transparent)]
-pub struct RegularScreenblockEntry(u16);
-```
-So, with one entry per tile, a single screenblock allows for 32x32 tiles worth of
-background.
-The format of a regular screenblock entry is quite simple compared to some of
-the IO register stuff:
-
-- 10 bits (0-9) for the tile index (relative to the character base block of the
-background)
-- 1 bit (10) for horizontal flip
-- 1 bit (11) for vertical flip
-- 4 bits (12-15) for picking which palbank to use (if 4bpp, otherwise it's
-ignored)
-
-
-```rust
-impl RegularScreenblockEntry {
-    pub fn tile_id(self) -> u16 {
-        self.0 & 0b11_1111_1111
-    }
-    pub fn set_tile_id(&mut self, id: u16) {
-        self.0 &= !0b11_1111_1111;
-        // Mask the input so a too-large id can't spill into the flip bits.
-        self.0 |= id & 0b11_1111_1111;
-    }
-    pub fn horizontal_flip(self) -> bool {
-        (self.0 & (1 << 0xA)) > 0
-    }
-    pub fn set_horizontal_flip(&mut self, bit: bool) {
-        if bit {
-            self.0 |= 1 << 0xA;
-        } else {
-            self.0 &= !(1 << 0xA);
-        }
-    }
-    pub fn vertical_flip(self) -> bool {
-        (self.0 & (1 << 0xB)) > 0
-    }
-    pub fn set_vertical_flip(&mut self, bit: bool) {
-        if bit {
-            self.0 |= 1 << 0xB;
-        } else {
-            self.0 &= !(1 << 0xB);
-        }
-    }
-    pub fn palbank_index(self) -> u16 {
-        self.0 >> 12
-    }
-    pub fn set_palbank_index(&mut self, palbank_index: u16) {
-        // Clear the top 4 bits, then store the (masked) palbank there.
-        self.0 &= 0b1111_1111_1111;
-        self.0 |= (palbank_index & 0b1111) << 12;
-    }
-}
-```
-Now, at either 256 or 512 tiles per charblock, you might be thinking that with a
-10 bit index you can index past the end of one charblock and into the next.
-You'd be right, mostly.
-As long as you stay within the background memory region for charblocks (that is,
-0 through 3), then it all works out. However, if you try to get the background
-rendering to reach outside of the background charblocks you'll get an
-implementation defined result. It's not the dreaded "undefined behavior" we're
-often worried about in programming, but the results are determined by what
-you're running the game on. With GBA hardware you get a bizarre result
-(basically another way to put garbage on the screen). With a DS it acts as if
-the tiles were all 0s. If you use an emulator, it might or might not allow you
-to do this; it's up to the emulator writers.
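If you'd like to catch that situation before it reaches the hardware, the check is plain arithmetic: the selected tile's bytes have to end at or before the start of charblock 4 (`0x1_0000` bytes into VRAM). `bg_tile_in_bounds` is a hypothetical debugging helper, not part of the book's code.

```rust
/// Byte offset into VRAM where the object charblocks (4 and 5) begin.
const OBJ_CHARBLOCKS_START: usize = 4 * 0x4000;

/// Returns true if the tile selected by a tilemap entry lies entirely
/// within background charblocks 0 through 3.
pub fn bg_tile_in_bounds(char_base_block: usize, tile_index: usize, is_8bpp: bool) -> bool {
    assert!(char_base_block < 4);
    assert!(tile_index < 1024); // tilemap entries have a 10-bit tile index
    let tile_size = if is_8bpp { 64 } else { 32 };
    let start = char_base_block * 0x4000 + tile_index * tile_size;
    start + tile_size <= OBJ_CHARBLOCKS_START
}
```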
-
-Instead of there being just a single IO register to learn about this time, there
-are two separate groups of related registers.
-
-
-- BG0CNT (`0x400_0008`): BG0 Control
-- BG1CNT (`0x400_000A`): BG1 Control
-- BG2CNT (`0x400_000C`): BG2 Control
-- BG3CNT (`0x400_000E`): BG3 Control
-
-Each of these is a read/write `u16` location. This is where we get to all of
-the important details that we've been putting off.
-
-- 2 bits (0-1) for the priority.
-- 2 bits (2-3) for the "character base block", the charblock that all of the
-tile indexes for this background are offset from.
-- 2 bits (4-5) that are unused.
-- 1 bit (6) for mosaic effect being enabled (we'll get to that below).
-- 1 bit (7) to enable 8bpp, otherwise 4bpp is used.
-- 5 bits (8-12) to pick the "screen base block", the screenblock that serves as
-the base value for this background.
-- 1 bit (13) that is not used in regular mode, but in affine mode it can be
-enabled to cause the affine background to wrap around at the edges.
-- 2 bits (14-15) for the background size.
-
-The size works a little funny. When size is 0 only the base screenblock is used
-(256x256 pixels). If size is 1 or 2 then the base screenblock and the following
-screenblock are placed next to each other (horizontally for 1, giving 512x256
-pixels, or vertically for 2, giving 256x512 pixels). If the size is 3 then the
-base screenblock and the following three screenblocks are arranged into a 2x2
-grid of screenblocks (512x512 pixels).
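Putting that bit layout into code, a BGxCNT value for a regular background could be assembled like this. It's a sketch with a made-up function name; the affine wrap bit (13) and the two unused bits (4-5) are left at zero.

```rust
/// Builds a BGxCNT value for a regular (non-affine) background.
pub fn bg_control(
    priority: u16,
    char_base_block: u16,
    mosaic: bool,
    is_8bpp: bool,
    screen_base_block: u16,
    size: u16,
) -> u16 {
    assert!(priority < 4 && char_base_block < 4);
    assert!(screen_base_block < 32 && size < 4);
    priority                           // bits 0-1
        | (char_base_block << 2)       // bits 2-3
        | ((mosaic as u16) << 6)       // bit 6
        | ((is_8bpp as u16) << 7)      // bit 7
        | (screen_base_block << 8)     // bits 8-12
        | (size << 14)                 // bits 14-15
}
```

For example, `bg_control(0, 0, false, false, 31, 0)` takes tiles from charblock 0 and the tilemap from screenblock 31; you'd write that value to BG0CNT at `0x400_0008`.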
-
-
-- BG0HOFS (`0x400_0010`): BG0 X-Offset
-- BG0VOFS (`0x400_0012`): BG0 Y-Offset
-- BG1HOFS (`0x400_0014`): BG1 X-Offset
-- BG1VOFS (`0x400_0016`): BG1 Y-Offset
-- BG2HOFS (`0x400_0018`): BG2 X-Offset
-- BG2VOFS (`0x400_001A`): BG2 Y-Offset
-- BG3HOFS (`0x400_001C`): BG3 X-Offset
-- BG3VOFS (`0x400_001E`): BG3 Y-Offset
-
-Each of these is a write-only `u16` location. Bits 0 through 8 are used, so
-the offsets can be 0 through 511. They also only apply in regular backgrounds.
-If a background is in an affine state then you'll use different IO registers to
-control it (discussed in a later chapter).
-The offset that you assign determines the pixel offset of the display area
-relative to the start of the background scene, as if the screen was a camera
-looking at the scene. In other words, as a BG X offset value increases, you can
-think of it as the camera moving to the right, or as that background moving to
-the left, like when Mario walks toward the goal. Similarly, when a BG Y offset
-increases the camera is moving down, or the background is moving up, like when
-Mario falls down from a high platform.
-Regular backgrounds wrap around: once the offset moves the view past the edge of
-the background (256 or 512 pixels, depending on the size setting), the image
-loops back to the other side.
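Here's that camera model as a small sketch: given a screen pixel, the two offset values, and the size setting, it works out which pixel of the background scene shows up there. `scene_pixel` is a hypothetical helper for reasoning about scrolling, not an emulator of the hardware.

```rust
/// Returns the background-scene pixel shown at screen pixel
/// (`screen_x`, `screen_y`) for a regular background.
pub fn scene_pixel(screen_x: u32, screen_y: u32, hofs: u32, vofs: u32, size: u16) -> (u32, u32) {
    let (width, height) = match size {
        0 => (256, 256),
        1 => (512, 256),
        2 => (256, 512),
        3 => (512, 512),
        _ => panic!("background size must be 0 through 3"),
    };
    // Only bits 0 through 8 of each offset register are used.
    let hofs = hofs & 0x1FF;
    let vofs = vofs & 0x1FF;
    // The scene wraps around at its edges.
    ((screen_x + hofs) % width, (screen_y + vofs) % height)
}
```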
-
-As a special effect, you can apply mosaic to backgrounds and objects. It's just
-a single flag for each background, so all backgrounds will use the same mosaic
-settings when they have it enabled. What it actually does is split the normal
-image into "blocks" and then each block gets the color of the top left pixel of
-that block. This is the effect you see when Link hits an electric foe with his
-sword and the whole screen "buzzes" at you.
-The mosaic control is a write-only `u16` IO register at `0x400_004C`.
-There are 4 bits each for:
-
-- Horizontal BG stretch (bits 0-3)
-- Vertical BG stretch (bits 4-7)
-- Horizontal object stretch (bits 8-11)
-- Vertical object stretch (bits 12-15)
-
-The inputs should be 1 less than the desired block size. So if you set a
-stretch value of 5 then pixels 0-5 would be part of the first block (6 pixels),
-then 6-11 is the next block (another 6 pixels) and so on.
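A sketch of packing that register, taking the block sizes in pixels (1 through 16) and doing the "minus one" for you. The function name is our own.

```rust
/// Packs the mosaic register value (`0x400_004C`) from block sizes in pixels.
pub fn mosaic_value(bg_w: u16, bg_h: u16, obj_w: u16, obj_h: u16) -> u16 {
    for size in [bg_w, bg_h, obj_w, obj_h] {
        assert!((1..=16).contains(&size), "mosaic block sizes are 1 through 16");
    }
    // Each 4-bit field holds "block size minus one".
    (bg_w - 1) | ((bg_h - 1) << 4) | ((obj_w - 1) << 8) | ((obj_h - 1) << 12)
}
```

So 6x6 background blocks with objects left unmosaiced (1x1 blocks) would be `mosaic_value(6, 6, 1, 1)`, which is `0x0055`.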
-If you need a pixel other than the top left of each block to determine the
-mosaic color, you can carefully offset the background or image by a tiny bit,
-but of course that makes every mosaic block change its target pixel. You can't
-change the target pixel on a block by block basis.
-
-As with backgrounds, objects can be used in both an affine and non-affine way.
-For this section we'll focus on the non-affine elements, and then we'll do all
-the affine stuff in a later chapter.
-
-As TONC helpfully reminds us
-(and then proceeds to not follow its own advice), we should always try to think
-in terms of objects, not sprites. A sprite is a logical / software concern,
-perhaps a player concern, whereas an object is a hardware concern.
-What's more, a given sprite that the player sees might need more than one object
-to display. Objects must be either square or rectangular (so sprite bits that
-stick out probably call for a second object), and can only be from 8x8 to 64x64
-(so anything bigger has to be built from several objects lined up to appear as
-one).
-
-Unlike with backgrounds, you can enable the object layer in any video mode.
-There's space for 128 object definitions in OAM.
-The display gets a number of cycles per scanline to process objects: 1210 by
-default, but only 954 if you enable the "HBlank interval free" setting in the
-display control register. The cycle cost per object depends on the object's
-size and on whether it's using affine or regular mode, so enabling the HBlank
-interval free setting doesn't reduce the number of displayable objects by any
-fixed amount. The objects are processed in order of their definitions, and if
-you run out of cycles then the rest just don't get shown. If there's a concern
-that you might run out of cycles you can place important objects (such as the
-player) at the start of the list and then less important animation objects
-later on.
-
-Objects use the palette the same as the background does. The only difference is
-that the palette data for objects starts at `0x500_0200`.
-
-```rust
-// `VolatilePtr` is the volatile pointer type from earlier chapters.
-pub const PALRAM_OBJECT_BASE: VolatilePtr<u16> = VolatilePtr(0x500_0200 as *mut u16);
-
-pub fn object_palette(slot: usize) -> u16 {
-    assert!(slot < 256);
-    unsafe { PALRAM_OBJECT_BASE.offset(slot as isize).read() }
-}
-
-pub fn set_object_palette(slot: usize, color: u16) {
-    assert!(slot < 256);
-    unsafe { PALRAM_OBJECT_BASE.offset(slot as isize).write(color) }
-}
-```
-
-Objects, as with backgrounds, are composed of 8x8 tiles, and if you want
-something bigger than 8x8 you have to use more than one tile put together.
-Object tiles go into the final two charblocks of VRAM (indexes 4 and 5). Because
-there are only two of them, they are sometimes called the lower block
-(`0x601_0000`) and the higher/upper block (`0x601_4000`).
-Tile indexes for objects always offset from the base of the lower block, and
-they always go 32 bytes at a time, regardless of whether the object is set for
-4bpp or 8bpp. From this we can determine that there are 512 tile slots in each
-of the two object charblocks (1024 in total, which is exactly what a 10-bit
-index can reach). However, in video modes 3, 4, and 5 the space for the
-background cuts into the lower charblock, so you can only safely use the upper
-charblock.
-
-```rust
-use core::mem::size_of;
-
-// Assumes `VRAM` (`0x600_0000`), `VolatilePtr`, and the tile and charblock
-// types from earlier. Object tile indexes count in 32-byte steps from the
-// start of charblock 4, so indexes 0..1024 cover both object charblocks.
-// In video modes 3, 4, and 5 only indexes 512 and up are safe to use.
-pub fn obj_tile_4bpp(tile_index: usize) -> Tile4bpp {
-    assert!(tile_index < 1024);
-    let address = VRAM + size_of::<Charblock4bpp>() * 4 + 32 * tile_index;
-    unsafe { VolatilePtr(address as *mut Tile4bpp).read() }
-}
-
-pub fn set_obj_tile_4bpp(tile_index: usize, tile: Tile4bpp) {
-    assert!(tile_index < 1024);
-    let address = VRAM + size_of::<Charblock4bpp>() * 4 + 32 * tile_index;
-    unsafe { VolatilePtr(address as *mut Tile4bpp).write(tile) }
-}
-
-// An 8bpp tile is 64 bytes, so the last index that still fits is 1022.
-pub fn obj_tile_8bpp(tile_index: usize) -> Tile8bpp {
-    assert!(tile_index < 1023);
-    let address = VRAM + size_of::<Charblock8bpp>() * 4 + 32 * tile_index;
-    unsafe { VolatilePtr(address as *mut Tile8bpp).read() }
-}
-
-pub fn set_obj_tile_8bpp(tile_index: usize, tile: Tile8bpp) {
-    assert!(tile_index < 1023);
-    let address = VRAM + size_of::<Charblock8bpp>() * 4 + 32 * tile_index;
-    unsafe { VolatilePtr(address as *mut Tile8bpp).write(tile) }
-}
-```
-With backgrounds you picked every single tile individually with a bunch of
-screen entry values. Objects don't do that at all. Instead you pick a base tile,
-size, and shape, then it figures out the rest from there. However, you may
-recall back with the display control register something about an "object memory
-1d" bit. This is where that comes into play.
-
-- If object memory is set to be 2d (the default) then the object tile memory
-(both object charblocks together) is treated as a grid 32 tiles wide and 32
-tiles tall. Each object has a base tile and dimensions, and that just extracts
-directly from the charblock picture as if you were selecting an area. This mode
-probably makes for the easiest image editing.
-- If object memory is set to be 1d then the tiles are loaded sequentially from
-the starting point, enough to fill in the object's dimensions. This is probably
-the easiest mode to program with, since programming languages are pretty good
-at 1d things.
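In numbers, for a 4bpp object that is `w` tiles wide, the tile at column `x`, row `y` of the object is at index `base + y * w + x` in 1d mode and `base + y * 32 + x` in 2d mode. Here's a small sketch of that; `object_tile_index` is a hypothetical name, and it only handles 4bpp (8bpp tiles take up two index slots each, so the math changes accordingly).

```rust
/// Index (in 32-byte tile slots) of the tile at column `tx`, row `ty` inside
/// a 4bpp object that is `width_tiles` tiles wide, starting at `base_tile`.
pub fn object_tile_index(base_tile: u16, tx: u16, ty: u16, width_tiles: u16, mapping_1d: bool) -> u16 {
    assert!(tx < width_tiles);
    if mapping_1d {
        // 1d: each row of the object starts right after the previous row ends.
        base_tile + ty * width_tiles + tx
    } else {
        // 2d: object tile memory is a 32-tile-wide grid, so each row of the
        // object is 32 indexes further along.
        base_tile + ty * 32 + tx
    }
}
```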
-
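-I'm not sure I explained that well, so here's the key difference once more: in
-2d mode, a new row of tiles starts every 32 tile indexes, while in 1d mode the
-next row of an object's tiles starts right after the previous row ends.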
-Of course, the mode that you actually end up using is not particularly
-important, since it should be the job of your image conversion routine to get
-everything all lined up and into place anyway.
-
-The final step is to assign the correct attributes to an object. Each object has
-three `u16` values that make up its overall attributes.
-Before we go into the details, I want to bring up that the hardware will attempt
-to process every single object every single frame if the object layer is
-enabled, and also that all of the GBA's object memory is cleared to 0 at
-startup. Why do these two things matter right now? As you'll see in a second, an
-"all zero" set of object attributes causes an 8x8 object to appear at 0,0 using
-object tile index 0. This is usually not what you want your unused objects to
-do. When your game first starts you should take a moment to mark any objects you
-won't be using as objects to not render.
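Marking an object as not rendered means setting the two rendering bits of attribute 0 (bits 8-9) to 2, "Disabled", which makes `0x0200` a handy attribute 0 value. This host-side sketch works on a plain array shaped like OAM (128 entries of four `u16` values: the three attributes plus the filler); on hardware you'd do the same writes through volatile pointers, for example with the `set_object_attributes` function from later on this page. The names here are hypothetical.

```rust
/// Attribute 0 value for "don't render this object": rendering bits = 0b10.
pub const ATTR0_DISABLED: u16 = 0b10 << 8;

/// Marks every object in an OAM-shaped buffer as not rendered.
pub fn hide_all_objects(oam: &mut [[u16; 4]; 128]) {
    for entry in oam.iter_mut() {
        // Only attribute 0 needs to change; the rest are left alone.
        entry[0] = ATTR0_DISABLED;
    }
}
```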
-
-Attribute 0 (`attr0`):
-
-- 8 bits (0-7) for the row coordinate (marks the top of the object)
-- 2 bits (8-9) for object rendering: 0 = Normal, 1 = Affine, 2 = Disabled, 3 = Affine with double rendering area
-- 2 bits (10-11) for object mode: 0 = Normal, 1 = Alpha Blending, 2 = Object Window, 3 = Forbidden
-- 1 bit (12) for mosaic enabled
-- 1 bit (13) for 8bpp color enabled
-- 2 bits (14-15) for shape: 0 = Square, 1 = Horizontal, 2 = Vertical, 3 = Forbidden
-
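-If an object is 128 pixels tall (possible with a 64x64 affine object using the
-double rendering area) and its Y coordinate is greater than 128, you'll get a
-strange looking result: it acts as though Y were a negative value (between -128
-and 0) and displays partly off screen to the top.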
-
-Attribute 1 (`attr1`):
-
-- 9 bits (0-8) for the column coordinate (marks the left of the object)
-- Either:
-    - 3 unused bits (9-11), 1 bit (12) for horizontal flip, and 1 bit (13) for vertical flip (non-affine)
-    - 5 bits (9-13) for the affine index (affine)
-- 2 bits (14-15) for size, in pixels (width x height):
-
-| Size | Square | Horizontal | Vertical |
-|------|--------|------------|----------|
-| 0    | 8x8    | 16x8       | 8x16     |
-| 1    | 16x16  | 32x8       | 8x32     |
-| 2    | 32x32  | 32x16      | 16x32    |
-| 3    | 64x64  | 64x32      | 32x64    |
-
-Attribute 2 (`attr2`):
-
-- 10 bits (0-9) for the base tile index
-- 2 bits (10-11) for priority
-- 4 bits (12-15) for the palbank index (4bpp mode only, ignored in 8bpp)
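The shape and size table above turns into a simple lookup. This is a sketch with a hypothetical function name that takes the raw 2-bit shape and size values and returns width and height in pixels.

```rust
/// Width and height in pixels for a shape (0 = square, 1 = horizontal,
/// 2 = vertical) and a size (0 through 3), following the size table.
pub fn object_dimensions(shape: u16, size: u16) -> (u32, u32) {
    match (shape, size) {
        (0, 0) => (8, 8),
        (0, 1) => (16, 16),
        (0, 2) => (32, 32),
        (0, 3) => (64, 64),
        (1, 0) => (16, 8),
        (1, 1) => (32, 8),
        (1, 2) => (32, 16),
        (1, 3) => (64, 32),
        (2, 0) => (8, 16),
        (2, 1) => (8, 32),
        (2, 2) => (16, 32),
        (2, 3) => (32, 64),
        _ => panic!("shape 3 is forbidden and size must be 0 through 3"),
    }
}
```

Dividing each dimension by 8 gives the object's size in tiles, for example a 32x16 object is 4 tiles wide and 2 tiles tall.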
-
-
-So I said in the GBA memory mapping section that C people would tell you that
-the object attributes should look like this:
-
-```rust
-#[repr(C)]
-pub struct ObjectAttributes {
-    attr0: u16,
-    attr1: u16,
-    attr2: u16,
-    filler: i16,
-}
-```
-Except that:
-
-- It's wasteful when we store object attributes on their own outside of OAM
-(which we definitely might want to do).
-- Through our `VolatilePtr` type we can't access just one field of a struct
-(our pointers aren't actually volatile to begin with, just the ops we do with
-them are). We have to read or write the whole pointer's value at a time.
-Similarly, we can't do things like `|=` and `&=` with volatile in Rust. So in
-Rust we can't have a volatile pointer to an `ObjectAttributes` and then write
-to just the three "real" values and not touch the `filler` field. Having the
-filler value in there just means we have to dance around it more, not less.
-- We want to newtype this whole thing to prevent accidental invalid states from
-being written into memory.
-
-So we will not be using that representation. At the same time we want to have no
-overhead, so we will stick to three `u16` values. We could newtype each
-individual field to be its own type (`ObjectAttributesAttr0` or something silly
-like that), since there aren't actual dependencies between two different fields
-such that a change in one can throw another into a forbidden state. The worst
-that can happen is if we disable or enable affine mode (`attr0`) it can change
-the meaning of `attr1`. The changed meaning isn't actually an invalid state
-though, so we could make each field its own type if we wanted.
-However, when you think about it, I can't imagine a common situation where we do
-something like make an `attr0` value that we then want to save on its own and
-apply to several different `ObjectAttributes` that we make during a game. That
-just doesn't sound likely to me. So, we'll go the route where `ObjectAttributes`
-is just a big black box to the outside world and we don't need to think about
-the three fields internally as being separate.
-First we make it so that we can get and set object attributes from memory:
-
-```rust
-use core::mem::size_of;
-
-// `VolatilePtr` is the volatile pointer type from earlier chapters.
-pub const OAM: usize = 0x700_0000;
-
-pub fn object_attributes(slot: usize) -> ObjectAttributes {
-    assert!(slot < 128);
-    // Each OAM slot is four u16 values: attr0, attr1, attr2, and the filler.
-    let ptr = VolatilePtr((OAM + slot * (size_of::<u16>() * 4)) as *mut u16);
-    unsafe {
-        ObjectAttributes {
-            attr0: ptr.read(),
-            attr1: ptr.offset(1).read(),
-            attr2: ptr.offset(2).read(),
-        }
-    }
-}
-
-pub fn set_object_attributes(slot: usize, obj: ObjectAttributes) {
-    assert!(slot < 128);
-    let ptr = VolatilePtr((OAM + slot * (size_of::<u16>() * 4)) as *mut u16);
-    unsafe {
-        // Write the three real attributes and leave the filler untouched.
-        ptr.write(obj.attr0);
-        ptr.offset(1).write(obj.attr1);
-        ptr.offset(2).write(obj.attr2);
-    }
-}
-
-#[derive(Debug, Clone, Copy, Default)]
-pub struct ObjectAttributes {
-    attr0: u16,
-    attr1: u16,
-    attr2: u16,
-}
-```
-Then we add a billion methods to the `ObjectAttributes` type so that we can
-actually set all the different values that we want to set.
-This code block is the last thing on this page, so if you don't wanna scroll
-past the whole thing you can just go to the next page.
-
-```rust
-#[derive(Debug, Clone, Copy)]
-pub enum ObjectRenderMode {
-    Normal,
-    Affine,
-    Disabled,
-    DoubleAreaAffine,
-}
-
-#[derive(Debug, Clone, Copy)]
-pub enum ObjectMode {
-    Normal,
-    AlphaBlending,
-    ObjectWindow,
-}
-
-#[derive(Debug, Clone, Copy)]
-pub enum ObjectShape {
-    Square,
-    Horizontal,
-    Vertical,
-}
-
-#[derive(Debug, Clone, Copy)]
-pub enum ObjectOrientation {
-    Normal,
-    HFlip,
-    VFlip,
-    BothFlip,
-    Affine(u8),
-}
-
-impl ObjectAttributes {
-    pub fn row(&self) -> u16 {
-        self.attr0 & 0b1111_1111
-    }
-    pub fn column(&self) -> u16 {
-        self.attr1 & 0b1_1111_1111
-    }
-    pub fn rendering(&self) -> ObjectRenderMode {
-        match (self.attr0 >> 8) & 0b11 {
-            0 => ObjectRenderMode::Normal,
-            1 => ObjectRenderMode::Affine,
-            2 => ObjectRenderMode::Disabled,
-            3 => ObjectRenderMode::DoubleAreaAffine,
-            _ => unreachable!(),
-        }
-    }
-    pub fn mode(&self) -> ObjectMode {
-        match (self.attr0 >> 0xA) & 0b11 {
-            0 => ObjectMode::Normal,
-            1 => ObjectMode::AlphaBlending,
-            2 => ObjectMode::ObjectWindow,
-            _ => unimplemented!("object mode 3 is forbidden"),
-        }
-    }
-    pub fn mosaic(&self) -> bool {
-        // Shift bit 12 up into bit 15 and test the sign.
-        ((self.attr0 << 3) as i16) < 0
-    }
-    pub fn two_fifty_six_colors(&self) -> bool {
-        // Shift bit 13 up into bit 15 and test the sign.
-        ((self.attr0 << 2) as i16) < 0
-    }
-    pub fn shape(&self) -> ObjectShape {
-        match (self.attr0 >> 0xE) & 0b11 {
-            0 => ObjectShape::Square,
-            1 => ObjectShape::Horizontal,
-            2 => ObjectShape::Vertical,
-            _ => unimplemented!("object shape 3 is forbidden"),
-        }
-    }
-    pub fn orientation(&self) -> ObjectOrientation {
-        if (self.attr0 >> 8) & 1 > 0 {
-            ObjectOrientation::Affine((self.attr1 >> 9) as u8 & 0b1_1111)
-        } else {
-            match (self.attr1 >> 0xC) & 0b11 {
-                0 => ObjectOrientation::Normal,
-                1 => ObjectOrientation::HFlip,
-                2 => ObjectOrientation::VFlip,
-                3 => ObjectOrientation::BothFlip,
-                _ => unreachable!(),
-            }
-        }
-    }
-    pub fn size(&self) -> u16 {
-        self.attr1 >> 0xE
-    }
-    pub fn tile_index(&self) -> u16 {
-        self.attr2 & 0b11_1111_1111
-    }
-    pub fn priority(&self) -> u16 {
-        // Mask off the palbank bits that sit above the priority bits.
-        (self.attr2 >> 0xA) & 0b11
-    }
-    pub fn palbank(&self) -> u16 {
-        self.attr2 >> 0xC
-    }
-    // Setters
-    pub fn set_row(&mut self, row: u16) {
-        self.attr0 &= !0b1111_1111;
-        self.attr0 |= row & 0b1111_1111;
-    }
-    pub fn set_column(&mut self, col: u16) {
-        // The column lives in attr1, so that's the field we update.
-        self.attr1 &= !0b1_1111_1111;
-        self.attr1 |= col & 0b1_1111_1111;
-    }
-    pub fn set_rendering(&mut self, rendering: ObjectRenderMode) {
-        const RENDERING_MASK: u16 = 0b11 << 8;
-        self.attr0 &= !RENDERING_MASK;
-        self.attr0 |= (rendering as u16) << 8;
-    }
-    pub fn set_mode(&mut self, mode: ObjectMode) {
-        const MODE_MASK: u16 = 0b11 << 0xA;
-        // Clear only the mode bits (note the `!`), then set the new mode.
-        self.attr0 &= !MODE_MASK;
-        self.attr0 |= (mode as u16) << 0xA;
-    }
-    pub fn set_mosaic(&mut self, bit: bool) {
-        const MOSAIC_BIT: u16 = 1 << 0xC;
-        if bit {
-            self.attr0 |= MOSAIC_BIT
-        } else {
-            self.attr0 &= !MOSAIC_BIT
-        }
-    }
-    pub fn set_two_fifty_six_colors(&mut self, bit: bool) {
-        const COLOR_MODE_BIT: u16 = 1 << 0xD;
-        if bit {
-            self.attr0 |= COLOR_MODE_BIT
-        } else {
-            self.attr0 &= !COLOR_MODE_BIT
-        }
-    }
-    pub fn set_shape(&mut self, shape: ObjectShape) {
-        self.attr0 &= 0b0011_1111_1111_1111;
-        self.attr0 |= (shape as u16) << 0xE;
-    }
-    pub fn set_orientation(&mut self, orientation: ObjectOrientation) {
-        // Bits 9-13 hold either the affine index or the flip bits (12-13).
-        const AFFINE_INDEX_MASK: u16 = 0b1_1111 << 9;
-        self.attr1 &= !AFFINE_INDEX_MASK;
-        let bits = match orientation {
-            // Mask the index so it can't spill into the size bits.
-            ObjectOrientation::Affine(index) => ((index as u16) & 0b1_1111) << 9,
-            ObjectOrientation::Normal => 0,
-            ObjectOrientation::HFlip => 1 << 0xC,
-            ObjectOrientation::VFlip => 1 << 0xD,
-            ObjectOrientation::BothFlip => 0b11 << 0xC,
-        };
-        self.attr1 |= bits;
-    }
-    pub fn set_size(&mut self, size: u16) {
-        self.attr1 &= 0b0011_1111_1111_1111;
-        self.attr1 |= size << 14;
-    }
-    pub fn set_tile_index(&mut self, index: u16) {
-        self.attr2 &= !0b11_1111_1111;
-        self.attr2 |= 0b11_1111_1111 & index;
-    }
-    pub fn set_priority(&mut self, priority: u16) {
-        self.attr2 &= !0b0000_1100_0000_0000;
-        self.attr2 |= (priority & 0b11) << 0xA;
-    }
-    pub fn set_palbank(&mut self, palbank: u16) {
-        self.attr2 &= !0b1111_0000_0000_0000;
-        self.attr2 |= (palbank & 0b1111) << 0xC;
-    }
-}
-```
-
-You often hear of the "Random Number Generator" in video games. First of all,
-usually a game doesn't have access to any source of "true randomness". On a PC
-you can send out a web request to random.org which
-uses atmospheric data, or even just point a camera at some lava
-lamps. Even
-then, the rate at which you'll want random numbers far exceeds the rate at which
-those services can offer them up. So instead you'll get a pseudo-random number
-generator and "seed" it with the true random data and then use that.
-However, we don't even have that! On the GBA, we can't ask any external anything
-what we should do for our initial seed. So we will not only need to come up with
-a few PRNG options, but we'll also need to come up with some seed source
-options. More than with other topics in this book, I think this is an area
-where you can tailor what you do to your specific game.
-What is a Pseudo-random Number Generator?
-For those of you who somehow read The Rust Book, plus possibly The Rustonomicon,
-and then found this book, but somehow still don't know what a PRNG is... Well,
-I don't think there are many such people. Still, we'll define it anyway I
-suppose.
-
-A PRNG is any mathematical process that takes an initial input (of some fixed
-size) and then produces a series of outputs (of a possibly different size).
-
-So, if you seed your PRNG with a 32-bit value you might get 32-bit values out or
-you might get 16-bit values out, or something like that.
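As a concrete (and deliberately simple) instance of that definition, here's a 32-bit linear congruential generator whose seed and state are 32 bits but whose outputs are 16 bits. The multiplier and increment are the commonly published "Numerical Recipes" constants. This only illustrates the seed-in, stream-out shape; it isn't a recommendation of which generator to use, and `Lcg32` is a name made up for this sketch.

```rust
/// A minimal 32-bit linear congruential generator with 16-bit outputs.
pub struct Lcg32 {
    state: u32,
}

impl Lcg32 {
    /// Creates a generator from a 32-bit seed.
    pub fn new(seed: u32) -> Self {
        Self { state: seed }
    }

    /// Advances the state and returns its high 16 bits, which are of better
    /// quality than the low bits in an LCG with a power of two modulus.
    pub fn next_u16(&mut self) -> u16 {
        self.state = self
            .state
            .wrapping_mul(1_664_525)
            .wrapping_add(1_013_904_223);
        (self.state >> 16) as u16
    }
}
```

The same seed always produces the same series of outputs, which is exactly what makes it "pseudo" random.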
-We measure the quality of a PRNG based upon:
-
-- Is the output range easy to work with? Most PRNG techniques that you'll
-find these days are already hip to the idea that we'll have the fastest
-operations with numbers that match our register width and all that, so
-they're usually designed around power of two inputs and power of two outputs.
-Still, every once in a while you might find some old page intended for
-compatibility with the `rand()` function in the C standard library that'll
-talk about something crazy like having 15-bit PRNG outputs. Stupid as it
-sounds, that's real. Avoid those. Whenever possible we want generators that
-give us uniformly distributed `u8`, `u16`, `u32`, or whatever size value
-we're producing. From there we can mold our random bits into whatever else we
-need (eg: turning a `u8` into a "1d6" roll).
-- How long does each generation cycle take? This can be tricky for us. A
-lot of the top quality PRNGs you'll find these days are oriented towards
-64-bit machines so they do a bunch of 64-bit operations. You can do that on
-a 32-bit machine if you have to, and the compiler will automatically "lower"
-the 64-bit operation into a series of 32-bit operations. What we'd really
-like to pick is something that sticks to just 32-bit operations though, since
-those will be our best candidates for fast results. We can use Compiler
-Explorer and tell it to build for the thumbv6m-none-eabi target to get a basic
-idea of what the ASM for a generator looks like. That's not our exact target
-(the GBA's CPU is an ARMv4T chip), but it's the closest target that's shipped
-with the standard Rust distribution.
-- What is the statistical quality of the output? This involves heavy
-amounts of math. Since computers are quite good at large amounts of repeated
-math you might wonder if there are programs for this already, and there are.
-Many in fact. They take a generator, run it over and over, perform the
-necessary tests, and report the results. I won't be explaining how to hook
-our generators up to those tools; they each have their own user manuals.
-However, if someone says that a generator "passes BigCrush" (the biggest
-suite in TestU01) or "fails PractRand" or anything similar it's useful to
-know what they're referring to. Example test suites include TestU01 and
-PractRand.
-
-Note that if a generator is called upon to produce enough output relative to its
-state size it will basically always end up failing statistical tests. This means
-that any generator with only 32 bits of state will fail every one of those test
-suites. The theoretical minimum state size for any generator at all to pass the
-standard suites is 36 bits, but most generators need many more than that.
-
-I've mostly chosen to discuss generators that are towards the smaller end of the
-state size scale. In fact we'll be going over many generators that are below the
-36-bit theoretical minimum to pass all those fancy statistical tests. Why so?
-Well, we don't always need the highest possible quality generators.
-"But Lokathor!", I can already hear you shouting. "I want the highest quality
-randomness at all times! The game depends on it!", you cry out.
-Well... does it? Like, really?
-The GBA Pokemon games use a dead simple 32-bit LCG (we'll see it below). Then
-starting with the DS they moved to also using Mersenne Twister, which also fails
-several statistical tests and is one of the most predictable PRNGs around.
-Metroid Fusion has a 100% goofy PRNG system for enemies that would definitely
-never pass any sort of statistics tests at all. But like, those games were still
-awesome. Since we're never going to be keeping secrets safe with our PRNG, it's
-okay if we trade in some quality for something else in return (we obviously
-don't want to trade quality for nothing).
-And you have to ask yourself: Where's the space used for the Metroid Fusion
-PRNG? Nowhere at all. They were already using everything involved for other
-things too, so they're paying no extra cost to have the randomization they do.
-How much does it cost Pokemon to throw in a 32-bit LCG? Just 4 bytes, might as
-well. How much does it cost to add in a Mersenne Twister? ~2,500 bytes ya say?
-I'm sorry, what on Earth? Yeah, that sounds crazy, we're probably not doing
-that one.
-
-So, wait, why did the Pokemon developers add in the Mersenne Twister generator?
-They're smart people, surely they had a reason. Obviously we can't know for
-sure, but Mersenne Twister is terrible in a lot of ways, so what's its single
-best feature? Well, that gets us to a funky thing called k-dimensional
-equidistribution. Basically, if your generator's output is uniform, you can
-always chop it down to get a uniform result over a smaller range (though
-sometimes you will have to reject a result and run the generator again).
-Imagine you have a u32 output from your generator. If you want a u16 value from
-that you can just pick either half. If you want a [bool; 4] from that you can
-just pick four bits. However you wanna do it, as long as the final form of
-random thing we're getting needs a number of bits equal to or less than the
-number of bits that come out of a single generator use, we're totally fine.
-What happens if the thing you want to make requires more bits than a single
-generator's output? You obviously have to run the generator more than once and
-then stick two or more outputs together, duh. Except, that doesn't always work.
-What I mean is that obviously you can always put two u8 side by side to get a
-u16, but if you start with a uniform u8 generator and then you run it twice
-and stick the results together you don't always get a uniform u16 generator.
-Imagine a byte generator that just does state+=1 and then outputs the state.
-It's not good by almost any standard, but it does give uniform output. Then we
-run it twice in a row, put the two bytes together, and suddenly a whole ton of
-potential u16 values can never be generated: the second byte is always one more
-than the first, so only 256 of the 65,536 possible values can ever show up.
-That's what k-dimensional equidistribution is all about. Every uniform output
-generator is 1-dimensionally equidistributed, but if you need to combine outputs
-and still have uniform results then you need a higher k value. So why does
-Pokemon have Mersenne Twister in it? Because it's got 623-dimensional
-equidistribution (at full 32-bit precision). That means when you're combining
-PRNG calls for all those little IVs and Pokemon Abilities and other things
-you're sure to have every potential pokemon actually be a pokemon that the game
-can generate. Do you need that for most situations? Absolutely not. Do you need
-it for pokemon? No, not even then, but a lot of the hot new PRNGs have come out
-just within the past 10 years, so we can't fault them too much for it.
-TLDR: 1-dimensional equidistribution just means "a normal uniform generator",
-and higher k values mean "you can actually combine up to k output chains and
-maintain uniformity". Generators that aren't uniform to begin with effectively
-have a k value of 0.
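To make the state+=1 example concrete, here's a small host-side sketch (plain std Rust for a PC, not GBA code; the function names are made up for illustration). It glues two consecutive outputs of the counter generator into a u16 for every possible starting state and counts how many distinct u16 values ever appear.

```rust
use std::collections::HashSet;

/// The deliberately bad byte generator from the text: add one, output the state.
fn counter_u8(state: &mut u8) -> u8 {
    *state = state.wrapping_add(1);
    *state
}

/// Hypothetical helper: for every starting state, glue two consecutive outputs
/// into one u16 and count how many distinct u16 values show up in total.
fn distinct_pairs() -> usize {
    let mut seen = HashSet::new();
    for start in 0..=u8::MAX {
        let mut state = start;
        let high = counter_u8(&mut state) as u16;
        let low = counter_u8(&mut state) as u16;
        seen.insert((high << 8) | low);
    }
    seen.len()
}

fn main() {
    // Every single byte is uniform, but the pairs are badly lopsided,
    // because the low byte is always the high byte plus one.
    println!("{} of 65536 u16 values are reachable", distinct_pairs());
}
```

Each byte on its own is perfectly uniform, yet the pairs only cover 256 of the 65,536 possible u16 values, so this generator is 1-dimensionally equidistributed and no better.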
-
-Finally, some generators have other features that aren't strictly quantifiable.
-Two tricks of note are "jump ahead" and "multiple streams":
-
-- Jump ahead lets you advance the generator's state by some enormous number of
-outputs in a relatively small number of operations.
-- Multi-stream generators have more than one output sequence. Some part of their
-total state space picks a "stream" rather than being part of the actual seed,
-and each possible stream puts the potential output sequence in a different
-order.
-
-They're normally used as a way to do multi-threaded stuff (we don't care about
-that on GBA), but another interesting potential use is to take one world seed
-and then split off a generator for each "type" of thing you'd use a PRNG for
-(combat, world events, etc). This can become quite useful: you can
-procedurally generate a world region, and then when the player leaves the
-region you only need to store a single generator seed and a small amount of
-"delta" information for what the player changed there that you want to save.
-When they come back you can regenerate the region without having stored much
-at all. This is the basis for how old games with limited memory like
-Starflight did their whole thing (800 planets to explore on just two 5.25"
-floppy disks!).
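One way to sketch the "one world seed, one generator per kind of thing" idea is with the multi-stream LCG trick covered further on in this chapter: every subsystem starts from the same world seed but gets its own odd additive value. The Subsystem enum, the StreamLcg type, and the particular increments are all made up for illustration; the multiplier is the Gen III Pokemon LCG constant, and any odd increments would do.

```rust
/// Hypothetical subsystems that each get their own PRNG stream.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Subsystem {
    Combat = 0,
    WorldEvents = 1,
    Loot = 2,
}

struct StreamLcg {
    state: u32,
    // Must be odd so the LCG keeps its full 2^32 period.
    increment: u32,
}

impl StreamLcg {
    /// Derive the generator for one subsystem from a single world seed.
    fn for_subsystem(world_seed: u32, which: Subsystem) -> Self {
        // Distinct odd increments 1, 3, 5, ... (an arbitrary choice).
        let increment = (which as u32) * 2 + 1;
        StreamLcg { state: world_seed, increment }
    }

    fn next_u32(&mut self) -> u32 {
        self.state = self
            .state
            .wrapping_mul(0x41C6_4E6D)
            .wrapping_add(self.increment);
        self.state
    }
}

fn main() {
    let world_seed = 0xC0FF_EE00;
    let mut combat = StreamLcg::for_subsystem(world_seed, Subsystem::Combat);
    let mut loot = StreamLcg::for_subsystem(world_seed, Subsystem::Loot);
    // Re-deriving from the same seed replays the same sequence, which is what
    // lets a region be regenerated from nothing but its seed.
    let mut combat_again = StreamLcg::for_subsystem(world_seed, Subsystem::Combat);
    assert_eq!(combat.next_u32(), combat_again.next_u32());
    println!("combat: {:08X}, loot: {:08X}", combat.next_u32(), loot.next_u32());
}
```

Storing a region then only needs the world seed plus the player's changes; the per-subsystem generators can be rebuilt on demand.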
-
-Oh, I bet you thought we could somehow get through a section without learning
-about yet another IO register. Ha, wishful thinking.
-There's actually not much involved. Starting at 0x400_0100 there's an array of
-registers that go "data", "control", "data", "control", etc. TONC and GBATEK use
-different names here, and we'll go by the TONC names because they're much
-clearer:
-
-pub const TM0D: VolatilePtr<u16> = VolatilePtr(0x400_0100 as *mut u16);
-pub const TM0CNT: VolatilePtr<u16> = VolatilePtr(0x400_0102 as *mut u16);
-
-pub const TM1D: VolatilePtr<u16> = VolatilePtr(0x400_0104 as *mut u16);
-pub const TM1CNT: VolatilePtr<u16> = VolatilePtr(0x400_0106 as *mut u16);
-
-pub const TM2D: VolatilePtr<u16> = VolatilePtr(0x400_0108 as *mut u16);
-pub const TM2CNT: VolatilePtr<u16> = VolatilePtr(0x400_010A as *mut u16);
-
-pub const TM3D: VolatilePtr<u16> = VolatilePtr(0x400_010C as *mut u16);
-pub const TM3CNT: VolatilePtr<u16> = VolatilePtr(0x400_010E as *mut u16);
-Basically there are 4 timers, numbered 0 to 3. Each one has a Data register and
-a Control register. They're all u16 and you can read from all of them normally,
-but then it gets a little weird. You can also write to the Control registers
-normally, but when you write to the Data register of a timer, that sets the
-value the timer reloads to, without changing its current count. So if TM0D is
-paused on some value other than 5 and you write 5 to it, when you read it back
-you won't get a 5. When the next timer run starts it'll begin counting at 5
-instead of whatever value it currently reads as (and the timer also reloads
-that value every time it overflows).
-The Data registers are just a u16 number, no special bits to know about.
-The Control registers are also pretty simple compared to most IO registers:
-
-- 2 bits for the Frequency (bits 0-1): 1, 64, 256, or 1024. While active, the
-timer's value ticks up once every N CPU cycles, where N is the selected
-frequency value. On the GBA, 1 CPU cycle is about 59.59ns (2^(-24) seconds).
-One display controller cycle (a full frame) is 280,896 CPU cycles.
-- 1 bit for Cascade Mode (bit 2): If this is on the timer doesn't count on its
-own; instead it ticks up whenever the preceding timer overflows its counter
-(eg: if t0 overflows, t1 will tick up if it's in cascade mode). You still have
-to also enable this timer for it to do that (below). This naturally has no
-effect when used with timer 0.
-- 3 bits that do nothing (bits 3-5).
-- 1 bit for Interrupt (bit 6): Whenever this timer overflows it will signal an
-interrupt. We still haven't gotten into interrupts yet (since you have to hand
-write some ASM for that, it's annoying), but when we cover them this is how
-you do them with timers.
-- 1 bit to Enable the timer (bit 7). When you disable a timer it retains its
-current value, but when you enable it again the value jumps to whatever reload
-value is currently assigned.
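Since the CPU runs at 2^24 cycles per second, the frequency setting directly tells you how fast a timer counts and how long a 16-bit timer takes to overflow. Here's a small host-side sketch of that arithmetic; the constant and function names are made up for illustration and aren't registers or APIs from anywhere.

```rust
/// CPU clock: 2^24 cycles per second (about 59.59 ns per cycle).
const CPU_HZ: u32 = 1 << 24;
/// Cycles in one full display frame, per the text above.
const CYCLES_PER_FRAME: u32 = 280_896;

/// How many times per second a timer ticks with the given frequency setting
/// (1, 64, 256, or 1024 cycles per tick).
fn ticks_per_second(prescaler: u32) -> u32 {
    CPU_HZ / prescaler
}

/// CPU cycles from a timer starting at `reload` until it overflows past 0xFFFF.
fn cycles_until_overflow(prescaler: u32, reload: u16) -> u64 {
    (0x1_0000 - reload as u64) * prescaler as u64
}

fn main() {
    for prescaler in [1, 64, 256, 1024] {
        let cycles = cycles_until_overflow(prescaler, 0);
        println!(
            "every {:4} cycles: {:8} ticks/s, overflow after {} cycles (~{:.2} frames)",
            prescaler,
            ticks_per_second(prescaler),
            cycles,
            cycles as f64 / CYCLES_PER_FRAME as f64
        );
    }
}
```

At the slowest setting a timer starting from 0 takes 65,536 * 1024 = 2^26 cycles, exactly 4 seconds, to overflow; at the fastest setting it ticks 2^24 times per second.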
-
-
-#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
-#[repr(transparent)]
-pub struct TimerControl(u16);
-
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub enum TimerFrequency {
-    One = 0,
-    SixFour = 1,
-    TwoFiveSix = 2,
-    OneZeroTwoFour = 3,
-}
-
-impl TimerControl {
-    pub fn frequency(self) -> TimerFrequency {
-        match self.0 & 0b11 {
-            0 => TimerFrequency::One,
-            1 => TimerFrequency::SixFour,
-            2 => TimerFrequency::TwoFiveSix,
-            3 => TimerFrequency::OneZeroTwoFour,
-            _ => unreachable!(),
-        }
-    }
-    pub fn cascade_mode(self) -> bool {
-        self.0 & 0b100 > 0
-    }
-    pub fn interrupt(self) -> bool {
-        self.0 & 0b100_0000 > 0
-    }
-    pub fn enabled(self) -> bool {
-        self.0 & 0b1000_0000 > 0
-    }
-    //
-    pub fn set_frequency(&mut self, frequency: TimerFrequency) {
-        self.0 &= !0b11;
-        self.0 |= frequency as u16;
-    }
-    pub fn set_cascade_mode(&mut self, bit: bool) {
-        if bit {
-            self.0 |= 0b100;
-        } else {
-            self.0 &= !0b100;
-        }
-    }
-    pub fn set_interrupt(&mut self, bit: bool) {
-        if bit {
-            self.0 |= 0b100_0000;
-        } else {
-            self.0 &= !0b100_0000;
-        }
-    }
-    pub fn set_enabled(&mut self, bit: bool) {
-        if bit {
-            self.0 |= 0b1000_0000;
-        } else {
-            self.0 &= !0b1000_0000;
-        }
-    }
-}
-
-Okay, so how do we turn some timers into a PRNG seed? Well, usually our seed is
-a u32. So we'll take two timers, string them together with that cascade deal,
-and then set them off. Then we wait until the user presses any key. Since the
-timers tick millions of times per second and nobody can time a button press
-that precisely, the count we stop on is effectively unpredictable. We'd probably
-do this as the first thing at startup, maybe while showing the title and a
-"press any key to continue" message or something.
-
-/// Mucks with the settings of Timers 0 and 1.
-unsafe fn u32_from_user_wait() -> u32 {
-    let mut t = TimerControl::default();
-    t.set_enabled(true);
-    // Timer 1 counts Timer 0's overflows, so together they form a 32-bit counter.
-    t.set_cascade_mode(true);
-    TM1CNT.write(t.0);
-    t.set_cascade_mode(false);
-    TM0CNT.write(t.0);
-    // Spin until any key is pressed.
-    while key_input().0 == 0 {}
-    t.set_enabled(false);
-    TM0CNT.write(t.0);
-    TM1CNT.write(t.0);
-    let low = TM0D.read() as u32;
-    let high = TM1D.read() as u32;
-    // Each timer is only 16 bits wide, so the high half shifts up by 16, not 32.
-    (high << 16) | low
-}
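The cascaded pair behaves like one 32-bit counter: timer 1 counts how many times timer 0 wrapped past 0xFFFF. Combining the two 16-bit reads therefore puts timer 1 in the top 16 bits. Here's a tiny host-side check of that arithmetic; combine_cascaded is a hypothetical name used only for illustration.

```rust
/// Combine the low (timer 0) and high (timer 1) 16-bit counts of a cascaded pair.
fn combine_cascaded(low: u16, high: u16) -> u32 {
    ((high as u32) << 16) | low as u32
}

fn main() {
    // Timer 0 wrapped twice (so timer 1 reads 2) and now reads 0x1234.
    let value = combine_cascaded(0x1234, 2);
    println!("{:#010X}", value); // 0x00021234
    // The combined value is just "overflows * 65536 + current count".
    assert_eq!(value, 2 * 65_536 + 0x1234);
}
```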
-
-
-Our first PRNG to mention isn't one that's at all good, but it sure might be
-cute to use. It's the PRNG that Super Mario 64 had (video explanation,
-long).
-With a PRNG this simple the output of one call is also the seed to the next
-call, so we don't need to make a struct for it or anything. The game just seeds
-it with a plain 0 value at startup. The generator has a painfully small period,
-and the game simply keeps looping through that state space as it runs.
-
-/// Super Mario 64's RNG: the output of one call is the input to the next.
-pub fn sm64(mut input: u16) -> u16 {
-    if input == 0x560A {
-        input = 0;
-    }
-    let mut s0 = input << 8;
-    s0 ^= input;
-    input = s0.rotate_left(8);
-    // Keep 9 bits here: shifting the low byte left must not drop its top bit,
-    // which a `u8` shift would do.
-    s0 = ((s0 & 0x00FF) << 1) ^ input;
-    let s1 = (s0 >> 1) ^ 0xFF80;
-    if (s0 & 1) == 0 {
-        if s1 == 0xAA55 {
-            input = 0;
-        } else {
-            input = s1 ^ 0x1FF4;
-        }
-    } else {
-        input = s1 ^ 0x8180;
-    }
-    input
-}
-If you watch the video explanation about this generator you'll note that the
-first if-check, for 0x560A, prevents you from being locked into a 2-step cycle,
-but it's only important if you want to feed bad seeds to the generator. A bad
-seed is unhelpfully defined as "any value that the generator can't output". The
-second if-check, for 0xAA55, doesn't seem to be important at all from a
-mathematical perspective. It cuts the generator's period shorter by an
-arbitrary amount for no known reason. It's left in there only for
-authenticity.
-
-The Linear Congruential Generator is a well known PRNG family. You pick a
-multiplier and an additive and you're done. Right? Well, not exactly, because
-(as the Wikipedia article explains) the values that you pick can easily make
-your LCG better or worse all on its own. You want a good multiplier, and you
-want your additive to be odd. In our example here we've got the values that
-Bulbapedia says were used in the actual GBA Pokemon games, though Bulbapedia
-also lists values for a few other games as well.
-I don't actually know if any of the constants used in the official games are
-particularly good from a statistical viewpoint, though with only 32 bits an LCG
-isn't gonna be passing any of the major statistical tests anyway (you need way
-more bits in your LCG for that to happen). In my mind the main reason to use a
-plain LCG like this is just for the fun of using the same PRNG that an official
-Pokemon game did.
-You should not use this as your default generator if you care about quality.
-It is very fast though... if you want to set everything else on fire for
-speed. If you do, please at least remember that the highest bits are the best
-ones: if you're after fewer than 32 bits you should shift the high ones down
-and keep those, and if you want a bool, cast to i32 and check whether it's
-negative, etc.
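To see why the low bits are the weak ones: with an odd multiplier and an odd additive, the lowest bit of each output simply alternates 0, 1, 0, 1. This host-side sketch repeats lcg32 so it stands alone, and shows that alternation next to the two suggested ways of pulling out good bits; high_byte and coin_flip are illustrative helper names.

```rust
pub fn lcg32(seed: u32) -> u32 {
    seed.wrapping_mul(0x41C6_4E6D).wrapping_add(0x6073)
}

/// Keep the best (highest) 8 bits of an LCG output.
fn high_byte(x: u32) -> u8 {
    (x >> 24) as u8
}

/// Turn an LCG output into a bool using its top bit (the sign bit as an i32).
fn coin_flip(x: u32) -> bool {
    (x as i32) < 0
}

fn main() {
    let mut seed = 0u32;
    for _ in 0..6 {
        seed = lcg32(seed);
        // The low bit just alternates; the high bits are far less predictable.
        println!(
            "low bit {} | high byte {:3} | flip {}",
            seed & 1,
            high_byte(seed),
            coin_flip(seed)
        );
    }
}
```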
-
-/// One step of a 32-bit LCG using the constants Bulbapedia lists for the GBA
-/// Pokemon games. The returned value is also the next seed.
-pub fn lcg32(seed: u32) -> u32 {
-    seed.wrapping_mul(0x41C6_4E6D).wrapping_add(0x6073)
-}
-
-Note that you don't have to add a compile time constant; you could add a
-runtime value instead. Doing so allows the generator to be "multi-stream", with
-each different (odd) additive value being its own unique output stream. This is
-true of LCGs as well as all the PCGs below (since they're LCG based). The
-examples here just use a fixed stream for simplicity and to save space, but if
-you want streams you can add that in for only a small amount of extra space
-used:
-
-/// `stream` must be odd, or the LCG loses its full 2^32 period.
-pub fn lcg_streaming(seed: u32, stream: u32) -> u32 {
-    seed.wrapping_mul(0x41C6_4E6D).wrapping_add(stream)
-}
-With a streaming LCG you should pass the same stream value every single time. If
-you don't, then your generator will jump between streams in some crazy way and
-you lose your nice uniformity properties.
-There is the possibility of intentionally changing the stream value exactly when
-the seed lands on a pre-determined value (after the multiply and add). This
-basically makes the stream selection value's bit size (minus one bit, because
-it must be odd) count into the LCG's state bit size for calculating the overall
-period of the generator. So an LCG32 with a 32-bit stream selection would have a
-period of 2^32 * 2^31 = 2^63.
-
-// `seed` and `stream` are assumed to be `mut u32` variables, with `stream` odd.
-seed = lcg_streaming(seed, stream);
-// Test the *new* seed. It's cheapest to test for 0, so we pick 0.
-if seed == 0 {
-    // Step by 2 so the stream value stays odd.
-    stream = stream.wrapping_add(2);
-}
-However, this isn't a particularly effective way to extend the generator's
-period, and we'll see a much better extension technique below.
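To check the period claim above without waiting on 2^63 steps, here's the same stream-switching trick scaled down to an 8-bit LCG so a PC can brute-force it. The multiplier 5 (which is 5 mod 8, so every odd stream has the full 2^8 period), the u8 sizes, and the helper names are all illustrative choices. With 8 bits of state and 7 usable stream bits (the stream must stay odd), the expected combined period is 2^8 * 2^7 = 2^15.

```rust
/// One step of a tiny 8-bit LCG whose stream (additive) advances by 2 every
/// time the state lands on 0, mirroring the snippet above.
fn step(state: &mut u8, stream: &mut u8) {
    *state = state.wrapping_mul(5).wrapping_add(*stream);
    if *state == 0 {
        *stream = stream.wrapping_add(2);
    }
}

/// Count steps until (state, stream) returns to (0, start_stream).
/// `start_stream` must be odd.
fn combined_period(start_stream: u8) -> u32 {
    let (mut state, mut stream) = (0u8, start_stream);
    let mut steps = 0;
    loop {
        step(&mut state, &mut stream);
        steps += 1;
        if state == 0 && stream == start_stream {
            return steps;
        }
    }
}

fn main() {
    // 256 states per stream, times 128 odd stream values.
    println!("{}", combined_period(1)); // 32768
}
```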
-
-The Permuted Congruential Generator family is the next step in LCG technology.
-We start with LCG output, which is good but not great, and then we apply one of
-several possible permutations to bump up the quality. There's basically a bunch
-of permutation components that are each defined in terms of the bit width that
-you're working with.
-The "default" variant of PCG, PCG32, has 64 bits of state and 32 bits of output,
-and it uses the "XSH-RR" permutation. Here we'll put together a 32-bit version
-with 16-bit output using the "XSH-RS" permutation (but we'll show the other one
-too for comparison).
-Of course, since PCG is based on an LCG, we have to start with a good LCG base.
-As I said above, a better or worse set of LCG constants can make your generator
-better or worse. The Wikipedia example for PCG has a good 64-bit constant, but
-not a 32-bit constant. So we gotta ask an expert about what a good 32-bit
-constant would be. I'm definitely not the best at reading math papers, but the
-general idea seems to be that we want the multiplier a to satisfy a % 8 == 5
-(which also makes it odd), while the additive stays odd. There are three
-suggested LCG multipliers in a chart on page 10. A chart that's quite hard to
-understand. Truth be told I asked several folks that are good at math papers
-and even they couldn't make sense of the chart. Eventually timutable read the
-whole paper in depth and concluded the same as I did: that we probably want to
-pick the 32310901 option.
-For an additive value, we can pick any odd value, so we might as well pick
-something small so that we can do an immediate add. Immediate add? That sounds
-new. An immediate instruction is when one side of an operation is small enough
-that you can encode the value directly into the space that'd normally be for the
-register you want to use. It basically means one fewer load you have to do, if
-you're working with small enough numbers. To see what I mean, compare the
-output for loading the add value against an immediate add value. It's something
-you might have seen frequently in x86 or x86_64 ASM output, but because a thumb
-instruction is only 16 bits total, we can only get immediate instructions if the
-target value is 8 bits or less, so we haven't used them too much ourselves yet.
-I guess we'll pick 5, because I happen to personally like the number.
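One way to sanity-check a multiplier and additive pair without the paper: by the Hull-Dobell theorem, an LCG modulo a power of two has full period exactly when the additive is odd and the multiplier is 1 more than a multiple of 4 (which a % 8 == 5 satisfies). Because the low k bits of a mod-2^32 LCG form a mod-2^k LCG on their own, we can also brute-force a small modulus on a PC. Both helpers below are hypothetical illustrations, not part of any crate.

```rust
/// Step an LCG reduced to its low `bits` bits, starting from 0, and report how
/// many steps it takes to get back to 0 (None if it never does within 2^bits).
fn period_from_zero(mult: u32, add: u32, bits: u32) -> Option<u32> {
    let mask = (1u32 << bits) - 1;
    let mut x = 0u32;
    for step in 1..=(1u32 << bits) {
        x = x.wrapping_mul(mult).wrapping_add(add) & mask;
        if x == 0 {
            return Some(step);
        }
    }
    None
}

/// Hull-Dobell for a power-of-two modulus: odd additive, multiplier == 1 (mod 4).
fn full_period_pow2(mult: u32, add: u32) -> bool {
    add & 1 == 1 && mult & 0b11 == 1
}

fn main() {
    // The PCG base constants chosen above, brute-forced at a 12-bit modulus.
    println!("{:?}", period_from_zero(32310901, 5, 12)); // Some(4096)
    println!("{}", full_period_pow2(32310901, 5)); // true
    // An even additive can never give the full period.
    println!("{}", full_period_pow2(32310901, 4)); // false
}
```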
-
-// Demo only. The "default" PCG permutation, for use when rotate is cheaper.
-pub fn pcg16_xsh_rr(seed: &mut u32) -> u16 {
-    *seed = seed.wrapping_mul(32310901).wrapping_add(5);
-    const INPUT_SIZE: u32 = 32;
-    const OUTPUT_SIZE: u32 = 16;
-    const ROTATE_BITS: u32 = 4;
-    let mut out32 = *seed;
-    // The top 4 bits pick the rotation amount.
-    let rot = out32 >> (INPUT_SIZE - ROTATE_BITS);
-    out32 ^= out32 >> ((OUTPUT_SIZE + ROTATE_BITS) / 2); // >> 10
-    ((out32 >> (OUTPUT_SIZE - ROTATE_BITS)) as u16).rotate_right(rot) // >> 12
-}
-
-// This has slightly worse statistics but runs much better on the GBA.
-pub fn pcg16_xsh_rs(seed: &mut u32) -> u16 {
-    *seed = seed.wrapping_mul(32310901).wrapping_add(5);
-    const INPUT_SIZE: u32 = 32;
-    const OUTPUT_SIZE: u32 = 16;
-    const SHIFT_BITS: u32 = 2;
-    // Largest random shift the top SHIFT_BITS bits can select (3).
-    const MAX_SHIFT: u32 = (1 << SHIFT_BITS) - 1;
-    // Bits below the output once the top SHIFT_BITS are spoken for (14).
-    const BOTTOM_SPARE: u32 = INPUT_SIZE - OUTPUT_SIZE - SHIFT_BITS;
-    let mut out32 = *seed;
-    // The top 2 bits pick an extra shift of 0 to 3.
-    let shift = out32 >> (INPUT_SIZE - SHIFT_BITS);
-    out32 ^= out32 >> (SHIFT_BITS + (OUTPUT_SIZE + MAX_SHIFT) / 2); // >> 11
-    // >> (11 + shift) keeps all 16 output bits populated.
-    (out32 >> (BOTTOM_SPARE - MAX_SHIFT + shift)) as u16
-}
-
-Having the output be smaller than the input is great because you can keep just
-the best quality bits that the LCG stage puts out, and the discarded bits give
-the generator room for better equidistribution across multiple outputs (though
-with 32 bits of state and 16-bit outputs no generator can manage more than
-2-dimensional equidistribution, since k outputs of r bits each need at least k*r
-bits of state). However, if your output size has to be the same as your input
-size, the PCG family is still up to the task.
-
-pub fn pcg32_rxs_m_xs(seed: &mut u32) -> u32 {
-    *seed = seed.wrapping_mul(32310901).wrapping_add(5);
-    let mut out32 = *seed;
-    // Random xorshift: the top 4 bits choose how far to shift.
-    let rxs = out32 >> 28;
-    out32 ^= out32 >> (4 + rxs);
-    const PURE_MAGIC: u32 = 277803737;
-    // Must wrap: a plain `*` would panic on overflow in debug builds.
-    out32 = out32.wrapping_mul(PURE_MAGIC);
-    out32 ^ (out32 >> 22)
-}
-This permutation is the slowest but gives the strongest statistical benefits. If
-you're going to be keeping 100% of the output bits you want the added strength
-obviously. However, the period isn't actually any longer, so each output will be
-given only once within the full period (1-dimensional equidistribution).
-
-As a general improvement to any PCG you can hook on an "extension array" to give
-yourself a longer period. It's all described in the PCG paper, but here are the
-bullet points:
-
-- In addition to your generator's state (and possible stream) you keep an array
-of "extension" values. The array type is the same as your output type, and
-the array count must be a power of two value that's less than the maximum
-value of your state size.
-- When you run the generator, use the lowest bits of the state to select from
-your extension array according to the array's power of two. Eg: if the size is
-2 then use the single lowest bit, if it's 4 then use the lowest 2 bits, etc.
-- Every time you run the generator, XOR the output with the selected value from
-the array.
-- Every time the generator state lands on 0, cycle the array. We want to be
-careful with what we mean here by "cycle". We want the entire pattern of
-possible array bits to occur eventually. However, we obviously can't do
-arbitrary adds for as many bits as we like, so we'll have to "carry the 1"
-between the portions of the array by hand.
-
-Here's an example using an 8 slot array and pcg16_xsh_rs:
-
-// uses pcg16_xsh_rs from above
-
-pub struct PCG16Ext8 {
-    state: u32,
-    ext: [u16; 8],
-}
-
-impl PCG16Ext8 {
-    pub fn next_u16(&mut self) -> u16 {
-        // PCG as normal.
-        let mut out = pcg16_xsh_rs(&mut self.state);
-        // XOR with the extension value picked by the lowest 3 bits of the state.
-        // Masking with 0b111 keeps the index in 0..8, so the compiler should be
-        // able to drop the bounds check without any unsafe code.
-        out ^= self.ext[(self.state & 0b111) as usize];
-        // if state == 0 we cycle the array with a series of overflowing adds,
-        // treating the whole array as one big multi-word counter
-        if self.state == 0 {
-            let mut carry = true;
-            let mut index = 0;
-            while carry && index < self.ext.len() {
-                let (add_output, next_carry) = self.ext[index].overflowing_add(1);
-                self.ext[index] = add_output;
-                carry = next_carry;
-                index += 1;
-            }
-        }
-        out
-    }
-}
-The period gained from using an extension array is quite impressive. For a b-bit
-generator giving r-bit outputs, and k array slots, the period goes from 2^b to
-2^(k*r+b). So our 2^32 period generator has been extended to 2^(8*16+32) =
-2^160.
-Of course, we might care to seed the array itself so that it doesn't start out
-as all 0 bits, but that's not strictly necessary. All 0s is a legitimate part of
-the extension cycle, so we have to pass through it at some point.
-
-The Xoshiro128** generator is an advancement of the Xorshift family.
-It was specifically requested, and I'm not aware of Xorshift specifically being
-used in any of my favorite games, so instead of going over Xorshift and then
-leading up to this, we'll just jump straight to this. Take care not to confuse
-this generator with the very similarly named Xoroshiro128** generator, which is
-the 64-bit variant (128 bits of state, 64-bit outputs). Note the extra "ro"
-hiding in the 64-bit version's name near the start.
-Anyway, weird names aside, it's fairly zippy. The biggest downside is that you
-can't have a seed state that's all 0s, and as a result 0 will be produced one
-fewer time than all other outputs within a full cycle, making it non-uniform by
-just a little bit. You also can't do a simple stream selection like with the
-LCG-based generators; instead it has a fixed jump function that advances a seed
-as if you'd done 2^64 normal generator advancements.
-Note that Xoshiro256** has been reported to fail some statistical tests, so the
-128 version is unlikely to pass them either, though I admit that I didn't check
-myself.
-
-pub fn xoshiro128_starstar(seed: &mut [u32; 4]) -> u32 {
-    // The "**" scrambler reads the second state word, not the first.
-    let output = seed[1].wrapping_mul(5).rotate_left(7).wrapping_mul(9);
-    let t = seed[1] << 9;
-
-    seed[2] ^= seed[0];
-    seed[3] ^= seed[1];
-    seed[1] ^= seed[2];
-    seed[0] ^= seed[3];
-
-    seed[2] ^= t;
-
-    seed[3] = seed[3].rotate_left(11);
-
-    output
-}
-
-/// Advances `seed` as if xoshiro128_starstar had been called 2^64 times.
-pub fn xoshiro128_starstar_jump(seed: &mut [u32; 4]) {
-    const JUMP: [u32; 4] = [0x8764000b, 0xf542d2d3, 0x6fa035c3, 0x77f2db5b];
-    let mut s0 = 0;
-    let mut s1 = 0;
-    let mut s2 = 0;
-    let mut s3 = 0;
-    for j in JUMP.iter() {
-        for b in 0..32 {
-            if *j & (1 << b) > 0 {
-                s0 ^= seed[0];
-                s1 ^= seed[1];
-                s2 ^= seed[2];
-                s3 ^= seed[3];
-            }
-            xoshiro128_starstar(seed);
-        }
-    }
-    seed[0] = s0;
-    seed[1] = s1;
-    seed[2] = s2;
-    seed[3] = s3;
-}
-
-This is Bob Jenkins's small noncryptographic PRNG, often called JSF ("Jenkins
-Small Fast"), hence the JSF32 name below. It's a little faster than
-Xoshiro128** (no multiplication involved), and can pass any statistical test
-that's been thrown at it.
-Interestingly, the generator's period isn't fixed for the algorithm as a whole;
-it depends on the exact internal state you start from. There are even six
-possible internal states where the generator becomes a fixed point. Because of
-this, we should use the verified seeding method provided. Using the provided
-seeding, the minimum period is expected to be 2^94, the average is about 2^126,
-and no seed given to the generator is likely to overlap with another seed's
-output for at least 2^64 uses.
-
-pub struct JSF32 {
-    a: u32,
-    b: u32,
-    c: u32,
-    d: u32,
-}
-
-impl JSF32 {
-    /// The verified seeding: a fixed `a`, the seed in `b`, `c`, and `d`, then
-    /// 20 warm-up rounds.
-    pub fn new(seed: u32) -> Self {
-        let mut output = JSF32 {
-            a: 0xf1ea5eed,
-            b: seed,
-            c: seed,
-            d: seed,
-        };
-        for _ in 0..20 {
-            output.next();
-        }
-        output
-    }
-
-    pub fn next(&mut self) -> u32 {
-        // All arithmetic is mod 2^32, so use wrapping ops (plain + and -
-        // would panic on overflow in debug builds).
-        let e = self.a.wrapping_sub(self.b.rotate_left(27));
-        self.a = self.b ^ self.c.rotate_left(17);
-        self.b = self.c.wrapping_add(self.d);
-        self.c = self.d.wrapping_add(e);
-        self.d = e.wrapping_add(self.a);
-        self.d
-    }
-}
-Here it's presented with (27,17), but you can also use any of the following if
-you want alternative generator flavors that use this same core technique:
-
-- (9,16), (9,24), (10,16), (10,24), (11,16), (11,24), (25,8), (25,16), (26,8),
-(26,16), (26,17), or (27,16).
-
-Note that these alternate flavors haven't had as much testing as the (27,17)
-version, though they are likely to be just as good.
-
-
-- Mersenne Twister: Gosh, 2.5k bytes of state is just way too much for me to
-ever want to use this thing. If you'd really like to use it, there is a crate
-that already implements it. Small catch: it uses a ton of stuff from std that
-could be imported from core instead, so you'll have to fork it and patch it
-yourself to get it working on the GBA. It also stupidly depends on an old
-version of rand, so you'll have to cut out that nonsense.
-
-
-I said earlier that you can always take a uniform output and then throw out some
-bits, and possibly the whole result, to reduce it down into a smaller range. How
-exactly does one do that? Well, it turns out that it's very tricky to get right,
-and we could be losing as much as 60% of our execution time if we don't do it
-carefully.
-The best possible case is if you can cleanly take a specific number of bits
-out of your result without even doing any branching. The rest can be discarded
-or kept for another step as you choose. I know that I keep referencing Pokemon,
-but it's a very good example for the use of randomization. Each pokemon has,
-among many values, a thing called an "IV" for each of 6 stats. The IVs range
-from 0 to 31, which is total nonsense to anyone not familiar with decimal/binary
-conversions, but to us programmers that's clearly a 5 bit range. Rather than
-making math that's better for people using decimal (such as a 1-20 range or
-something like that) they went with what's easiest for the computer.
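As a sketch of the "just take the bits" approach: six 5-bit values fit in 30 bits, so a single u32 of generator output can fill all six IV-style stats with no branching at all. This only illustrates bit slicing; it is not a claim about how the Pokemon games actually lay out their IV generation, and six_5bit_values is a hypothetical name.

```rust
/// Slice six 5-bit values (0..=31) out of one 32-bit random number.
/// The top 2 bits are simply left over.
fn six_5bit_values(random: u32) -> [u8; 6] {
    let mut out = [0u8; 6];
    for (i, slot) in out.iter_mut().enumerate() {
        // Value i lives in bits 5*i through 5*i + 4.
        *slot = ((random >> (5 * i)) & 0b1_1111) as u8;
    }
    out
}

fn main() {
    let ivs = six_5bit_values(0x1234_5678);
    println!("{:?}", ivs);
}
```

Because each 5-bit field of a uniform u32 is itself uniform over 0..=31, no rejection or division is ever needed here.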
-The next best case is if you have a designated range that you want to
-generate within that's known at compile time. This at least gives us a chance to
-write some bit of extremely specialized code that can take random bits and get
-them into range. Hopefully your range can be "close enough" to a binary range
-that you can get things into place. Example: if you want a "1d6" result then you
-can generate a u16, look at just 3 bits (0..8), and if they're in the range
-you're after (0..6) you're good. If not you can discard those and look at the
-next 3 bits. We started with 16 of them, so you get five chances before you
-have to run the generator again entirely.
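Here's a sketch of that 1d6 idea, assuming one u16 of generator output: peel off 3 bits at a time, keep the first chunk that lands in 0..6, and report failure if all five chunks miss so the caller can run the generator again. The function name and the Option-based interface are just one way to shape it.

```rust
/// Try to turn one u16 of random bits into a 1d6 roll (1..=6) without division.
/// Returns None if all five 3-bit chunks were 6 or 7; the caller should then
/// draw a fresh u16 and try again.
fn d6_from_u16(mut bits: u16) -> Option<u8> {
    // 16 bits hold five full 3-bit chunks; the last bit goes unused.
    for _ in 0..5 {
        let chunk = (bits & 0b111) as u8;
        if chunk < 6 {
            return Some(chunk + 1);
        }
        bits >>= 3;
    }
    None
}

fn main() {
    // The lowest chunk is 0b110 (6), which is rejected; the next chunk,
    // 0b010 (2), becomes a roll of 3.
    println!("{:?}", d6_from_u16(0b010_110)); // Some(3)
}
```

Each accepted chunk is uniform over 0..6, so the roll stays fair; the cost is an occasional extra generator call instead of a divmod.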
-The goal here is to avoid having to do one of the worst things possible in
-computing: divmod. It's terribly expensive; even on a modern computer it's
-about 10x as expensive as any other arithmetic, and on a GBA it's even worse for
-us. The GBA's CPU has no hardware divide instruction, so we have to call into
-the BIOS to have it do a software division. Calling into the BIOS at all is
-about a 60 cycle overhead (for comparison, a normal function call is more like
-30 cycles of overhead), plus the time it takes to do the math itself. Remember
-earlier how we were happy to have a savings of 5 instructions here or there?
-Compared to this, all our previous efforts are basically useless if we can't
-evade having to do a divmod. You can do quite a bit of if-checking and
-potential additional generator calls before it exceeds the cost of having to do
-even a single divmod.
-
-How do we do the actual divmod when we're forced to? Easy: inline assembly, of
-course (there's also an ARM-oriented blog post about it that I found most
-helpful). The GBA has many BIOS Functions, each of which has a designated
-number. We use the swi op (short for "SoftWare Interrupt") combined with the
-BIOS function number that we want performed. Our code halts, some setup happens
-(hence that 60 cycles of overhead I mentioned), the BIOS does its thing, and
-then eventually control returns to us.
-The precise details of what the BIOS call does depend on the function number
-that we call. We'd even have to potentially mark it as volatile asm if there are
-no clear outputs, otherwise the compiler would "helpfully" eliminate it for us
-during optimization. In our case there are clear outputs. The numerator goes
-into register 0 and the denominator goes into register 1, the divmod happens,
-and then the division output is left in register 0 and the modulus output is
-left in register 1. The BIOS also writes the absolute value of the quotient into
-register 3, which is why r3 is listed as clobbered. I keep calling it "divmod"
-because div and modulus are two sides of the same coin. There's no way to do one
-of them faster by not doing the other or anything like that, so we'll first
-define it as a unified function that returns a tuple:
-
-#![feature(asm)]
-// put the above at the top of any program and/or library that uses inline asm
-
-pub fn div_modulus(numerator: i32, denominator: i32) -> (i32, i32) {
-    assert!(denominator != 0);
-    let div_out: i32;
-    let mod_out: i32;
-    unsafe {
-        // the BIOS Div call also overwrites r3, so it's listed as clobbered
-        asm!(/* assembly template */ "swi 0x06"
-            :/* output operands */ "={r0}"(div_out), "={r1}"(mod_out)
-            :/* input operands */ "{r0}"(numerator), "{r1}"(denominator)
-            :/* clobbers */ "r3"
-            :/* options */
-        );
-    }
-    (div_out, mod_out)
-}
-And next, since most of the time we really do want just the div or modulus
-without having to explicitly throw out the other half, we also define
-intermediary functions to unpack the correct values.
-
-pub fn div(numerator: i32, denominator: i32) -> i32 {
-    div_modulus(numerator, denominator).0
-}
-
-pub fn modulus(numerator: i32, denominator: i32) -> i32 {
-    div_modulus(numerator, denominator).1
-}
-We can generally trust the compiler to inline single-line functions correctly
-even without an #[inline] directive when it's not going cross-crate or when
-LTO is on. I'd point you to some exact output from the Compiler Explorer, but at
-the time of writing their nightly compiler is broken, and you can only use
-inline asm with a nightly compiler. Unfortunate. Hopefully they'll fix it soon
-and I can come back to this section with some links.
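The BIOS call only exists on the GBA, so testing code that calls `div_modulus` on a desktop machine needs a stand-in. The sketch below assumes the BIOS rounds the quotient toward zero and gives the remainder the numerator's sign, which is exactly what Rust's `/` and `%` do for i32. The `#[cfg]` gating is one possible way to switch between the two versions; it is not something the book sets up.

```rust
/// Host-only stand-in for the BIOS divmod.
/// Assumption: the BIOS uses the same rounding as Rust's i32 division.
/// On the GBA itself, the `swi 0x06` version would be used instead.
#[cfg(not(target_arch = "arm"))]
pub fn div_modulus(numerator: i32, denominator: i32) -> (i32, i32) {
    assert!(denominator != 0);
    // truncating division; the remainder takes the numerator's sign
    (numerator / denominator, numerator % denominator)
}
```

Rust panics on `i32::MIN / -1` because the quotient doesn't fit, so avoid that input regardless of which version you call.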
-Finally Those Random Ranges We Mentioned
-Of course, now that we can do divmod if we need to, let's get back to random
-numbers in ranges that aren't exact powers of two.
-Yada yada yada: if you just use x % n to place x into the range 0..n, then
-you'll turn an unbiased value into a biased value (or you'll turn a biased value
-into an arbitrarily more biased value). You should never do this, etc. etc.
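To see the bias concretely, the small sketch below pushes every possible u8 through `x % n` and counts how often each result appears. The function name is made up for illustration. It uses `Vec`, so it's meant to run on a desktop, not inside the no_std GBA program.

```rust
/// Hypothetical helper: how many of the 256 possible u8 values land in each
/// bucket of `0..n` when mapped with `x % n`.
pub fn modulo_histogram(n: u8) -> Vec<u32> {
    let mut counts = vec![0u32; n as usize];
    for x in 0..=u8::MAX {
        counts[(x % n) as usize] += 1;
    }
    counts
}
```

With n = 3, result 0 shows up 86 times while results 1 and 2 show up 85 times each, because 256 isn't a multiple of 3. With n = 4 every bucket gets exactly 64.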
-So what's a good way to get unbiased outputs? We're going to be adapting some
-C++ code from the blog post I first hinted at way up above. It's specifically
-all about the various ways you can go about getting unbiased random results for
-various bounds. There are actually many different methods offered, and for
-specific situations there are sometimes different winners for speed. The best
-overall performer looks like this:
-uint32_t bounded_rand(rng_t& rng, uint32_t range) {
-    uint32_t x = rng();
-    uint64_t m = uint64_t(x) * uint64_t(range);
-    uint32_t l = uint32_t(m);
-    if (l < range) {
-        uint32_t t = -range;
-        if (t >= range) {
-            t -= range;
-            if (t >= range)
-                t %= range;
-        }
-        while (l < t) {
-            x = rng();
-            m = uint64_t(x) * uint64_t(range);
-            l = uint32_t(m);
-        }
-    }
-    return m >> 32;
-}
-
-And, wow, I sure don't know what a lot of that means (well, I do, but let's
-pretend I don't for dramatic effect, don't tell anyone). Let's try to pick it
-apart some.
-First, all the uint32_t and uint64_t are C nonsense names for what we just
-call u32 and u64. You probably guessed that on your own.
-Next, rng_t& rng is more properly written as rng: &rng_t. Though, here
-there's a catch: as you can see we're calling rng within the function, so in
-Rust we'd need to declare it as rng: &mut rng_t, because C++ doesn't track
-mutability the same way we do (barbaric, I know).
-Finally, what's rng_t actually defined as? Well, I sure don't know, but in our
-context it's taking nothing and then spitting out a u32. We'll also presume
-that it's a different u32 each time (not a huge leap in this context). To us
-Rust programmers that means we'd want something like impl FnMut() -> u32.
-
-pub fn bounded_rand(rng: &mut impl FnMut() -> u32, range: u32) -> u32 {
-    let mut x: u32 = rng();
-    let mut m: u64 = x as u64 * range as u64;
-    let mut l: u32 = m as u32;
-    if l < range {
-        let mut t: u32 = range.wrapping_neg();
-        if t >= range {
-            t -= range;
-            if t >= range {
-                t = modulus(t, range);
-            }
-        }
-        while l < t {
-            x = rng();
-            m = x as u64 * range as u64;
-            l = m as u32;
-        }
-    }
-    (m >> 32) as u32
-}
-So, now we can read it. Can we compile it? No, actually. Turns out we can't.
-Remember how our modulus function is (i32, i32) -> i32? Here we're doing
-(u32, u32) -> u32. You can't just cast, modulus, and cast back. You'll get
-totally wrong results most of the time because of sign-bit stuff. Since it's
-fairly probable that range fits in a positive i32, its wrapping negation,
-viewed as an i32, is necessarily negative, which triggers exactly the bad
-situation where casting around gives us the wrong results.
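The sketch below shows the problem with a hypothetical `cast_modulus` helper. For any range that fits in a positive i32, `range.wrapping_neg()` reinterpreted as an i32 is just `-range`, and `-range % range` is always 0. So the cast version returns 0 no matter what the real answer is.

```rust
/// The tempting shortcut: reuse a signed modulus on unsigned values by casting.
/// (Hypothetical helper, shown only to demonstrate why it's wrong.)
pub fn cast_modulus(a: u32, b: u32) -> u32 {
    ((a as i32) % (b as i32)) as u32
}
```

For range = 10, the real value of `range.wrapping_neg() % range` (that is, 2^32 mod 10) is 6, but `cast_modulus` gives 0.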
-Well, that's not the worst thing in the world either, since we also didn't
-really wanna be doing those 64-bit multiplies. Let's try again with everything
-scaled down one stage:
-
-pub fn bounded_rand16(rng: &mut impl FnMut() -> u16, range: u16) -> u16 {
-    let mut x: u16 = rng();
-    let mut m: u32 = x as u32 * range as u32;
-    let mut l: u16 = m as u16;
-    if l < range {
-        let mut t: u16 = range.wrapping_neg();
-        if t >= range {
-            t -= range;
-            if t >= range {
-                t = modulus(t as i32, range as i32) as u16;
-            }
-        }
-        while l < t {
-            x = rng();
-            m = x as u32 * range as u32;
-            l = m as u16;
-        }
-    }
-    (m >> 16) as u16
-}
-Okay, so the code compiles, and it plays nicely with the known limits of the
-various number types involved. We know that if we cast a u16 up into i32
-it's assured to fit properly and also be positive, and the output is assured to
-be smaller than range, so it'll fit when we cast it back down to u16.
-What's even happening though? Well, this is a variation on Lemire's method.
-One of the biggest attempts at a speedup here is that when you have
-
-a %= b;
-you can translate that into:
-
-if a >= b {
-    a -= b;
-    if a >= b {
-        a %= b;
-    }
-}
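That rewrite never changes the answer. It just lets us skip the expensive `%` whenever `a < 2 * b`. The sketch below packages it as a function (the name is hypothetical) so it can be checked against plain `%` on a desktop. `b` must be nonzero, just as with `%`.

```rust
/// Hypothetical helper: the "subtract first" version of `a % b`.
pub fn subtract_then_mod(mut a: u16, b: u16) -> u16 {
    if a >= b {
        a -= b; // handles every case where b <= a < 2 * b
        if a >= b {
            a %= b; // only reached when a was at least 2 * b
        }
    }
    a
}
```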
-Now... if we're being real with ourselves, let's just think about this for a
-moment. How often will this help us? I genuinely don't know. But I do know how
-to find out: we write a program to just enumerate all possible cases and run
-the code. You can't always do this, but there aren't many possible u16 values.
-The output is this:
-skip_all:32767
-sub_worked:10923
-had_to_modulus:21846
-Some skips:
-32769
-32770
-32771
-32772
-32773
-Some subs:
-21846
-21847
-21848
-21849
-21850
-Some mods:
-0
-1
-2
-3
-4
-
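The enumeration program isn't reproduced here. Below is a sketch of that kind of enumeration, a reconstruction rather than the author's code. For each possible range, it computes the starting value `range.wrapping_neg()` and checks which branch of the threshold calculation we'd end up in.

```rust
/// Hypothetical reconstruction: classify every u16 range by how the threshold
/// calculation resolves. Returns (skip_all, sub_worked, had_to_modulus).
pub fn classify_all_ranges() -> (u32, u32, u32) {
    let (mut skip_all, mut sub_worked, mut had_to_modulus) = (0, 0, 0);
    for range in 0..=u16::MAX {
        let t = range.wrapping_neg();
        if t < range {
            skip_all += 1; // no threshold work needed at all
        } else if t - range < range {
            sub_worked += 1; // a single subtraction was enough
        } else {
            had_to_modulus += 1; // fell through to the expensive modulus
        }
    }
    (skip_all, sub_worked, had_to_modulus)
}
```

Running it gives the same three totals as the output above.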
-So, about half the time, we're able to skip all our work, about a sixth of the
-time we're able to solve it with just the subtract, and the other third of the
-time we have to do the mod. However, what I personally care about the most is
-smaller ranges, and we can see that we'll have to do the mod if our target
-range size is in 0..21846, and just the subtract if our target range size is
-in 21846..32769, and we can only skip all work if our range size is 32769
-and above. So that's not cool.
-But what is cool is that we're doing the modulus at most once per call, and the
-rest of the time we've just got the cheap operations. Better still, the
-threshold depends only on range, so we can compute it once for a range of some
-particular size, cache it, and reuse it. We can get that going pretty easily.
-
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub struct RandRangeU16 {
-    range: u16,
-    threshold: u16,
-}
-
-impl RandRangeU16 {
-    pub fn new(range: u16) -> Self {
-        // threshold = 2^16 mod range, computed with the subtract-first trick
-        let mut threshold = range.wrapping_neg();
-        if threshold >= range {
-            threshold -= range;
-            if threshold >= range {
-                threshold = modulus(threshold as i32, range as i32) as u16;
-            }
-        }
-        RandRangeU16 { range, threshold }
-    }
-
-    pub fn roll_random(&self, rng: &mut impl FnMut() -> u16) -> u16 {
-        let mut x: u16 = rng();
-        let mut m: u32 = x as u32 * self.range as u32;
-        let mut l: u16 = m as u16;
-        if l < self.range {
-            while l < self.threshold {
-                x = rng();
-                m = x as u32 * self.range as u32;
-                l = m as u16;
-            }
-        }
-        (m >> 16) as u16
-    }
-}
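Why does the threshold remove the bias? One way to convince yourself is to push every possible u16 through the accept/reject step exactly once and count where the accepted values land. The sketch below does that. The names are hypothetical, and it uses Rust's `%` on the host in place of the BIOS modulus. Every bucket ends up with exactly `65536 / range` hits (integer division).

```rust
/// Threshold as computed in RandRangeU16::new: 2^16 mod range (range != 0).
pub fn lemire_threshold(range: u16) -> u16 {
    range.wrapping_neg() % range
}

/// Pushes every u16 through the Lemire map once and counts accepted outputs.
pub fn lemire_bucket_counts(range: u16) -> Vec<u32> {
    let threshold = lemire_threshold(range);
    let mut counts = vec![0u32; range as usize];
    for x in 0..=u16::MAX {
        let m = x as u32 * range as u32;
        let l = m as u16;
        if l >= threshold {
            // accepted: the high half is the result
            counts[(m >> 16) as usize] += 1;
        }
        // otherwise roll_random would reject x and draw again
    }
    counts
}
```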
-What if you really want to use ranges bigger than u16? Well, that's possible,
-but we'd want a whole new technique, preferably one that doesn't do divmod at
-all, to avoid any nastiness with sign-bit nonsense. Thankfully there is one such
-method listed in the blog post, "Bitmask with Rejection (Unbiased)":
-uint32_t bounded_rand(rng_t& rng, uint32_t range) {
-    uint32_t mask = ~uint32_t(0);
-    --range;
-    mask >>= __builtin_clz(range|1);
-    uint32_t x;
-    do {
-        x = rng() & mask;
-    } while (x > range);
-    return x;
-}
-
-And in Rust:
-
-pub fn bounded_rand32(rng: &mut impl FnMut() -> u32, mut range: u32) -> u32 {
-    let mut mask: u32 = !0;
-    // wrapping_sub mirrors C's unsigned `--range` instead of panicking on 0
-    range = range.wrapping_sub(1);
-    mask >>= (range | 1).leading_zeros();
-    let mut x = rng() & mask;
-    while x > range {
-        x = rng() & mask;
-    }
-    x
-}
-Wow, that's so much less code. What the heck? Less code is supposed to be the
-faster version, so why is this rated slower? Basically, because of how the math
-works out on how often you have to run the PRNG again and stuff, Lemire's method
-usually works better with smaller ranges and the masking method usually works
-better with larger ranges. If your target range fits in a u8, probably use
-Lemire's. If it's bigger than u8, or if you need to do it just once and can't
-benefit from the cached modulus, you might want to start moving toward the
-masking version at some point in there. Obviously if your target range is more
-than a u16 can hold, then you have to use the masking method. The fact that
-they're each oriented towards different size generator outputs only makes things
-more complicated.
-Life just be that way, I guess.
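To put rough numbers on "how often you have to run the PRNG again", the sketch below reports what fraction of generator outputs each method keeps. The helper names are hypothetical. It only counts generator calls and ignores the cost of the multiply and the occasional modulus, so treat it as one input to the speed question rather than the answer.

```rust
/// Mask used by bitmask-with-rejection for a range in 1..=u32::MAX
/// (same math as bounded_rand32, minus the loop).
pub fn rejection_mask(range: u32) -> u32 {
    !0u32 >> ((range - 1) | 1).leading_zeros()
}

/// (kept, out of) for the masking method: masked values are uniform in
/// 0..=mask, and only the first `range` of them are accepted.
pub fn mask_acceptance(range: u32) -> (u64, u64) {
    (range as u64, rejection_mask(range) as u64 + 1)
}

/// (kept, out of) for the 16-bit Lemire method: exactly `2^16 mod range`
/// of the 2^16 possible inputs get rejected. `range` must be nonzero.
pub fn lemire16_acceptance(range: u16) -> (u64, u64) {
    let threshold = range.wrapping_neg() % range;
    (65536 - threshold as u64, 65536)
}
```

For a range of 17, for example, the masking method keeps 17 of every 32 masked values (about 53%), while the 16-bit Lemire method rejects only 1 input in 65536.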
-
-That was a whole lot. Let's put them in a table:
- Generator | State Bytes | Output | Period | k-Dim |
- sm64 | 2 | u16 | 65,114 | 0 |
- lcg32 | 4 | u16 | 2^32 | 1 |
- pcg16_xsh_rs | 4 | u16 | 2^32 | 1 |
- pcg32_rxs_m_xs | 4 | u32 | 2^32 | 1 |
- PCG16Ext8 | 20 | u16 | 2^160 | 8 |
- xoshiro128** | 16 | u32 | 2^128-1 | 0 |
- jsf32 | 16 | u32 | ~2^126 | 0 |
-
-
-For this example, to show off our new skills, we'll make a "memory" game. The
-idea is that there are some face-down cards and you pick one, it flips, you pick
-a second, and if they match they both go away; if they don't match they both
-turn back face down. The player keeps going until all the cards are gone, then
-we'll deal the cards again.
-There are many steps needed to get such a simple-seeming game going. In fact I
-stumbled a bit myself when trying to get things set up, despite having written
-and explained all the parts so far. Accordingly, we'll take each part very
-slowly, and review things as we build up our game.
-We'll start back with a nearly blank file, calling it memory_game.rs:
-#![feature(start)]
-#![no_std]
-
-#[panic_handler]
-fn panic(_info: &core::panic::PanicInfo) -> ! {
-    loop {}
-}
-
-#[start]
-fn main(_argc: isize, _argv: *const *const u8) -> isize {
-    loop {
-        // TODO the whole thing
-    }
-}
-
-
-First let's try to get a background going. We'll display a simple checker
-pattern just so that we know that we did something.
-Remember, backgrounds have the following essential components:
-
-- Background Palette
-- Background Tiles
-- Screenblock
-- IO Registers
-
-
-To write to the background palette memory we'll want to name a VolatilePtr for
-it. We'll probably also want to be able to cast between different types either
-right away or later in this program, so we'll add a method for that.
-
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-#[repr(transparent)]
-pub struct VolatilePtr<T>(pub *mut T);
-impl<T> VolatilePtr<T> {
-    pub unsafe fn read(&self) -> T {
-        core::ptr::read_volatile(self.0)
-    }
-    pub unsafe fn write(&self, data: T) {
-        core::ptr::write_volatile(self.0, data);
-    }
-    pub fn offset(self, count: isize) -> Self {
-        VolatilePtr(self.0.wrapping_offset(count))
-    }
-    pub fn cast<Z>(self) -> VolatilePtr<Z> {
-        VolatilePtr(self.0 as *mut Z)
-    }
-}
-Now we give ourselves an easy way to write a color into a palbank slot.
-
-pub const BACKGROUND_PALETTE: VolatilePtr<u16> = VolatilePtr(0x500_0000 as *mut u16);
-
-pub fn set_bg_palette_4bpp(palbank: usize, slot: usize, color: u16) {
-    assert!(palbank < 16);
-    assert!(slot > 0 && slot < 16);
-    unsafe {
-        BACKGROUND_PALETTE
-            .cast::<[u16; 16]>()
-            .offset(palbank as isize)
-            .cast::<u16>()
-            .offset(slot as isize)
-            .write(color);
-    }
-}
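The cast/offset chain in set_bg_palette_4bpp is really just address arithmetic. Each palbank is 16 colors of 2 bytes each (32 bytes), and each slot is 2 bytes. The sketch below (hypothetical names) computes the same address as a plain number, which is handy for checking your math on a desktop.

```rust
/// Hypothetical helper mirroring the pointer math in set_bg_palette_4bpp.
pub const BG_PALETTE_BASE: usize = 0x0500_0000;

pub const fn bg_palette_address(palbank: usize, slot: usize) -> usize {
    // .cast::<[u16; 16]>().offset(palbank) moves 32 bytes per palbank,
    // then .cast::<u16>().offset(slot) moves 2 bytes per slot
    BG_PALETTE_BASE
        + palbank * core::mem::size_of::<[u16; 16]>()
        + slot * core::mem::size_of::<u16>()
}
```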
-And of course we need to bring back in our ability to build color values, as
-well as a few named colors to start us off:
-
-pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
-    blue << 10 | green << 5 | red
-}
-
-pub const WHITE: u16 = rgb16(31, 31, 31);
-pub const LIGHT_GRAY: u16 = rgb16(25, 25, 25);
-pub const DARK_GRAY: u16 = rgb16(15, 15, 15);
-Which finally allows us to set our palette colors in main:
-fn main(_argc: isize, _argv: *const *const u8) -> isize {
-    set_bg_palette_4bpp(0, 1, WHITE);
-    set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
-    set_bg_palette_4bpp(0, 3, DARK_GRAY);
-
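One thing worth knowing about rgb16 is that it trusts its inputs. Each channel only has 5 bits, so something like rgb16(32, 0, 0) would quietly set the lowest green bit instead. The sketch below shows two hypothetical helpers. The first masks each channel to 5 bits. The second converts 8-bit-per-channel colors (like the values an image editor shows you) by dropping the low 3 bits of each channel.

```rust
/// Hypothetical variant of rgb16 that masks each channel to 5 bits, so an
/// out-of-range value can't spill into a neighboring channel.
pub const fn rgb16_masked(red: u16, green: u16, blue: u16) -> u16 {
    (blue & 31) << 10 | (green & 31) << 5 | (red & 31)
}

/// Hypothetical: convert an 8-bit-per-channel color by keeping the top 5 bits.
pub const fn rgb16_from_rgb888(red: u8, green: u8, blue: u8) -> u16 {
    rgb16_masked((red >> 3) as u16, (green >> 3) as u16, (blue >> 3) as u16)
}
```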
-
-So we'll want some light gray tiles and some dark gray tiles. We could use a
-single tile and then swap it between palbanks to do the color selection, but for
-now we'll just use two different tiles, since we've got tons of tile space to
-spare.
-
-#[derive(Debug, Clone, Copy, Default)]
-#[repr(transparent)]
-pub struct Tile4bpp {
-    pub data: [u32; 8],
-}
-
-// each u32 is one row of 8 pixels, 4 bits (one palette index) per pixel
-pub const ALL_TWOS: Tile4bpp = Tile4bpp { data: [0x22222222; 8] };
-
-pub const ALL_THREES: Tile4bpp = Tile4bpp { data: [0x33333333; 8] };
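ALL_TWOS and ALL_THREES are easy to write by hand because every pixel is the same. For anything else, it helps to know the layout. Each u32 is one row of 8 pixels at 4 bits per pixel, and the leftmost pixel lives in the lowest nibble: each byte holds two pixels with the left one in the low half, and the GBA is little-endian. The sketch below is a hypothetical packing helper.

```rust
/// Hypothetical helper: pack one row of 8 palette indices (each 0..16) into
/// the u32 layout Tile4bpp uses. Pixel 0 (leftmost) goes in bits 0-3.
pub fn tile_row_4bpp(pixels: [u8; 8]) -> u32 {
    let mut row = 0u32;
    for (i, &p) in pixels.iter().enumerate() {
        row |= ((p & 0xF) as u32) << (4 * i);
    }
    row
}
```

An alternating row like [2, 3, 2, 3, ...] packs to 0x32323232, which is why the nibbles look "backwards" when written in hex.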
-And then we have to have a way to put the tiles into video memory:
-
-#[derive(Clone, Copy)]
-#[repr(transparent)]
-pub struct Charblock4bpp {
-    pub data: [Tile4bpp; 512],
-}
-
-pub const VRAM: VolatilePtr<Charblock4bpp> = VolatilePtr(0x0600_0000 as *mut Charblock4bpp);
-
-pub fn set_bg_tile_4bpp(charblock: usize, index: usize, tile: Tile4bpp) {
-    assert!(charblock < 4);
-    assert!(index < 512);
-    unsafe {
-        VRAM.offset(charblock as isize)
-            .cast::<Tile4bpp>()
-            .offset(index as isize)
-            .write(tile)
-    }
-}
-And finally, we can call that within main:
-fn main(_argc: isize, _argv: *const *const u8) -> isize {
-    // bg palette
-    set_bg_palette_4bpp(0, 1, WHITE);
-    set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
-    set_bg_palette_4bpp(0, 3, DARK_GRAY);
-    // bg tiles
-    set_bg_tile_4bpp(0, 0, ALL_TWOS);
-    set_bg_tile_4bpp(0, 1, ALL_THREES);
-
-
-Screenblocks are a little weird because they take the same space as the
-charblocks (8 screenblocks per charblock). The GBA will let you mix and match
-and it's up to you to keep it all straight. We're using tiles at the base of
-charblock 0, so we'll place our screenblock at the base of charblock 1.
-First, we have to be able to make one single screenblock entry at a time:
-
-#[derive(Debug, Clone, Copy, Default)]
-#[repr(transparent)]
-pub struct RegularScreenblockEntry(u16);
-
-impl RegularScreenblockEntry {
-    pub const SCREENBLOCK_ENTRY_TILE_ID_MASK: u16 = 0b11_1111_1111;
-    pub fn from_tile_id(id: u16) -> Self {
-        RegularScreenblockEntry(id & Self::SCREENBLOCK_ENTRY_TILE_ID_MASK)
-    }
-}
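from_tile_id only fills in the low 10 bits. A regular screenblock entry also has a horizontal flip bit (bit 10), a vertical flip bit (bit 11), and a 4-bit palbank (bits 12-15) that we'll likely want eventually. The sketch below is a hypothetical helper that packs all four fields into the raw u16.

```rust
/// Hypothetical helper: build a regular screenblock entry's raw bits.
/// bits 0-9 tile id, bit 10 horizontal flip, bit 11 vertical flip, bits 12-15 palbank.
pub fn screenblock_entry_bits(tile_id: u16, hflip: bool, vflip: bool, palbank: u16) -> u16 {
    (tile_id & 0b11_1111_1111)
        | (hflip as u16) << 10
        | (vflip as u16) << 11
        | (palbank & 0xF) << 12
}
```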
-And then with 32x32 of these things we'll have a whole screenblock. Now, we
-probably won't actually make values of the screenblock type itself, but we at
-least need it to have the type declared with the correct size so that we can
-move our pointers around by the right amount.
-
-#[derive(Clone, Copy)]
-#[repr(transparent)]
-pub struct RegularScreenblock {
-    pub data: [RegularScreenblockEntry; 32 * 32],
-}
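RegularScreenblock exists mostly so that pointer offsets move by the right amount, so it can help to see those amounts as plain numbers. The sketch below uses hypothetical constant and function names and does the same arithmetic as the .cast()/.offset() chains. It also shows why screenblock 8 sits exactly at the base of charblock 1.

```rust
// Hypothetical names for the VRAM layout numbers used in this chapter.
pub const VRAM_BASE: usize = 0x0600_0000;
pub const TILE_4BPP_BYTES: usize = 8 * 4; // [u32; 8]
pub const CHARBLOCK_BYTES: usize = 512 * TILE_4BPP_BYTES; // 16 KiB
pub const SCREENBLOCK_BYTES: usize = 32 * 32 * 2; // 1024 u16 entries = 2 KiB

/// Where set_bg_tile_4bpp's pointer ends up.
pub const fn charblock_tile_address(charblock: usize, index: usize) -> usize {
    VRAM_BASE + charblock * CHARBLOCK_BYTES + index * TILE_4BPP_BYTES
}

/// Where VRAM.cast::<RegularScreenblock>().offset(slot) ends up.
pub const fn screenblock_address(slot: usize) -> usize {
    VRAM_BASE + slot * SCREENBLOCK_BYTES
}
```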
-Alright, so, as I said, those things are kind of big (2KB each), so we don't
-really want to be building them up on the stack if we can avoid it. Instead
-we'll write one straight into memory at the correct location.
-
-pub fn checker_screenblock(slot: usize, a_entry: RegularScreenblockEntry, b_entry: RegularScreenblockEntry) {
-    let mut p = VRAM
-        .cast::<RegularScreenblock>()
-        .offset(slot as isize)
-        .cast::<RegularScreenblockEntry>();
-    let mut checker = true;
-    for _row in 0..32 {
-        for _col in 0..32 {
-            unsafe { p.write(if checker { a_entry } else { b_entry }) };
-            p = p.offset(1);
-            checker = !checker;
-        }
-        // a row has an even number of entries, so flip again to offset the next row
-        checker = !checker;
-    }
-}
-And then we add this into main:
-
-    // screenblock
-    let light_entry = RegularScreenblockEntry::from_tile_id(0);
-    let dark_entry = RegularScreenblockEntry::from_tile_id(1);
-    checker_screenblock(8, light_entry, dark_entry);
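The toggling in checker_screenblock is easy to get subtly wrong. The sketch below (hypothetical name, with plain u16 values instead of RegularScreenblockEntry) runs the same loop into an ordinary array, so you can check the result on a desktop. The expected pattern is that the entry at (row, col) is the first value when row + col is even and the second value otherwise.

```rust
/// Hypothetical host-side copy of checker_screenblock's loop, writing into a
/// plain array instead of VRAM.
pub fn checker_into(buf: &mut [u16; 32 * 32], a: u16, b: u16) {
    let mut checker = true;
    let mut i = 0;
    for _row in 0..32 {
        for _col in 0..32 {
            buf[i] = if checker { a } else { b };
            i += 1;
            checker = !checker;
        }
        // 32 toggles per row bring `checker` back to where it started,
        // so flip once more to offset the next row
        checker = !checker;
    }
}
```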
-
-Our most important step is of course the IO register step. There are four
-different background layers, but each of them has the same format for its
-control register. For the moment, all that we care about is being able to set
-the "screen base block" value.
-
-#[derive(Clone, Copy, Default, PartialEq, Eq)]
-#[repr(transparent)]
-pub struct BackgroundControlSetting(u16);
-
-impl BackgroundControlSetting {
-    pub fn from_base_block(sbb: u16) -> Self {
-        BackgroundControlSetting(sbb << 8)
-    }
-}
-
-pub const BG0CNT: VolatilePtr<BackgroundControlSetting> = VolatilePtr(0x400_0008 as *mut BackgroundControlSetting);
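BackgroundControlSetting only sets the screen base block for now. For reference, a regular background's control register packs a few more fields:

- priority in bits 0-1
- the charblock holding the tiles in bits 2-3
- mosaic in bit 6
- 8bpp mode in bit 7
- the screenblock in bits 8-12
- the screen size in bits 14-15

The sketch below is a hypothetical helper that packs them all, masking each field to its width.

```rust
/// Hypothetical builder for a regular background's control register value.
pub fn bg_control_bits(
    priority: u16,
    charblock: u16,
    mosaic: bool,
    is_8bpp: bool,
    screenblock: u16,
    size: u16,
) -> u16 {
    (priority & 0b11)
        | (charblock & 0b11) << 2
        | (mosaic as u16) << 6
        | (is_8bpp as u16) << 7
        | (screenblock & 0b1_1111) << 8
        | (size & 0b11) << 14
}
```

With everything else zero, passing screenblock 8 gives 0x0800, the same value from_base_block(8) produces.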
-And... that's all it takes for us to be able to add a line into main:
-
-    // bg0 control
-    unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) };
-
-We're finally ready to set the display control register and get things going.
-We've slightly glossed over it so far, but when the GBA is first booted most
-everything within the address space will be all zeroed. However, the display
-control register has the "Force VBlank" bit enabled by the BIOS, giving you a
-moment to put the memory in place that you'll need for the first frame.
-So, now that we've got all of our memory set, we'll overwrite the initial
-display control register value with what we'll call "just enable bg0".
-
-#[derive(Clone, Copy, Default, PartialEq, Eq)]
-#[repr(transparent)]
-pub struct DisplayControlSetting(u16);
-
-impl DisplayControlSetting {
-    pub const JUST_ENABLE_BG0: DisplayControlSetting = DisplayControlSetting(1 << 8);
-}
-
-pub const DISPCNT: VolatilePtr<DisplayControlSetting> = VolatilePtr(0x0400_0000 as *mut DisplayControlSetting);
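DisplayControlSetting::JUST_ENABLE_BG0 is one specific value of a register with several fields. The video mode is in bits 0-2, forced blank is bit 7, and bits 8-11 hold one enable bit per background layer. The sketch below is a hypothetical helper for building other values.

```rust
/// Hypothetical helper: build a display control value from a video mode,
/// which background layers to show (BG0..BG3), and the forced blank bit.
pub fn display_control_bits(mode: u16, bg_enabled: [bool; 4], forced_blank: bool) -> u16 {
    let mut bits: u16 = (mode & 0b111) | (forced_blank as u16) << 7;
    for (i, &on) in bg_enabled.iter().enumerate() {
        bits |= (on as u16) << (8 + i); // BG0 is bit 8, BG3 is bit 11
    }
    bits
}
```

Mode 0 with only BG0 enabled gives 1 << 8, matching JUST_ENABLE_BG0, and mode 3 with only BG2 enabled comes out as 0x0403.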
-And so finally we have a complete main:
-#[start]
-fn main(_argc: isize, _argv: *const *const u8) -> isize {
-    // bg palette
-    set_bg_palette_4bpp(0, 1, WHITE);
-    set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
-    set_bg_palette_4bpp(0, 3, DARK_GRAY);
-    // bg tiles
-    set_bg_tile_4bpp(0, 0, ALL_TWOS);
-    set_bg_tile_4bpp(0, 1, ALL_THREES);
-    // screenblock
-    let light_entry = RegularScreenblockEntry::from_tile_id(0);
-    let dark_entry = RegularScreenblockEntry::from_tile_id(1);
-    checker_screenblock(8, light_entry, dark_entry);
-    // bg0 control
-    unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) };
-    // Display Control
-    unsafe { DISPCNT.write(DisplayControlSetting::JUST_ENABLE_BG0) };
-    loop {
-        // TODO the whole thing
-    }
-}
-
-And it works, Marty! It works!
-
+
diff --git a/docs/searchindex.js b/docs/searchindex.js
index 6f61e06..0fe790f 100644
--- a/docs/searchindex.js
+++ b/docs/searchindex.js
@@ -1 +1 @@
-window.search = {"doc_urls":["introduction.html#introduction","introduction.html#style-and-purpose","introduction.html#expected-knowledge","introduction.html#getting-help","introduction.html#further-reading","ch00/index.html#chapter-0-development-setup","ch00/index.html#per-system-setup","ch00/index.html#per-project-setup","ch00/index.html#compiling","ch01/index.html#ch-1-hello-gba","ch01/hello1.html#hello1","ch01/hello1.html#a-basic-hello1-explanation","ch01/hello1.html#all-those-magic-numbers","ch01/volatile.html#volatile","ch01/volatile.html#volatile-by-default","ch01/io_registers.html#io-registers","ch01/the_display_control_register.html#the-display-control-register","ch01/the_display_control_register.html#video-modes","ch01/the_display_control_register.html#cgb-mode","ch01/the_display_control_register.html#page-flipping","ch01/the_display_control_register.html#oam-vram-and-blanking","ch01/the_display_control_register.html#screen-layers","ch01/the_display_control_register.html#in-conclusion","ch01/video_memory_intro.html#video-memory-intro","ch01/video_memory_intro.html#rgb15","ch01/video_memory_intro.html#mode-3","ch01/video_memory_intro.html#mode-4","ch01/video_memory_intro.html#mode-5","ch01/video_memory_intro.html#in-conclusion","ch01/hello2.html#hello2","ch02/index.html#ch-2-user-input","ch02/the_key_input_register.html#the-key-input-register","ch02/the_key_input_register.html#key-input-code","ch02/the_vcount_register.html#the-vcount-register","ch02/light_cycle.html#light_cycle","ch02/light_cycle.html#gameplay","ch02/light_cycle.html#operations","ch02/light_cycle.html#the-gba-crate-doesnt-quite-work-like-this","ch03/index.html#ch-3-memory-and-objects","ch03/index.html#drawing-priority","ch03/gba_memory_mapping.html#gba-memory-mapping","ch03/gba_memory_mapping.html#bios--system-rom","ch03/gba_memory_mapping.html#external-work-ram--ewram","ch03/gba_memory_mapping.html#internal-work-ram--iwram","ch03/gba_memory_mapping.html#io-registers","ch03/gba_memory_mapp
ing.html#palette-ram--palram","ch03/gba_memory_mapping.html#transparency","ch03/gba_memory_mapping.html#video-ram--vram","ch03/gba_memory_mapping.html#object-attribute-memory--oam","ch03/gba_memory_mapping.html#game-pak-rom--flash-rom","ch03/gba_memory_mapping.html#game-pak-sram","ch03/tile_data.html#tile-data","ch03/tile_data.html#tiles","ch03/tile_data.html#charblocks","ch03/tile_data.html#image-editing","ch03/regular_backgrounds.html#regular-backgrounds","ch03/regular_backgrounds.html#tiled-video-modes","ch03/regular_backgrounds.html#get-your-palette-ready","ch03/regular_backgrounds.html#get-your-tiles-ready","ch03/regular_backgrounds.html#get-your-tilemap-ready","ch03/regular_backgrounds.html#set-your-io-registers","ch03/regular_backgrounds.html#background-control","ch03/regular_backgrounds.html#background-offset","ch03/regular_backgrounds.html#mosaic","ch03/regular_objects.html#regular-objects","ch03/regular_objects.html#objects-vs-sprites","ch03/regular_objects.html#general-object-info","ch03/regular_objects.html#ready-the-palette","ch03/regular_objects.html#ready-the-tiles","ch03/regular_objects.html#set-the-object-attributes","ch03/regular_objects.html#objectattributesattr0","ch03/regular_objects.html#objectattributesattr1","ch03/regular_objects.html#objectattributesattr2","ch03/regular_objects.html#objectattributes-summary","ch03/gba_prng.html#gba-prng","ch03/gba_prng.html#what-is-a-pseudo-random-number-generator","ch03/gba_prng.html#generator-size","ch03/gba_prng.html#k-dimensional-equidistribution","ch03/gba_prng.html#other-tricks","ch03/gba_prng.html#how-to-seed","ch03/gba_prng.html#a-timer-based-seed","ch03/gba_prng.html#various-generators","ch03/gba_prng.html#sm64-16-bit-state-16-bit-output-non-uniform-bonkers","ch03/gba_prng.html#lcg32-32-bit-state-32-bit-output-uniform","ch03/gba_prng.html#pcg16-xsh-rs-32-bit-state-16-bit-output-uniform","ch03/gba_prng.html#pcg32-rxs-m-xs-32-bit-state-32-bit-output-uniform","ch03/gba_prng.html#pcg-extension-array","c
h03/gba_prng.html#xoshiro128-128-bit-state-32-bit-output-non-uniform","ch03/gba_prng.html#jsf32-128-bit-state-32-bit-output-non-uniform","ch03/gba_prng.html#other-generators","ch03/gba_prng.html#placing-a-value-in-range","ch03/gba_prng.html#calling-the-bios","ch03/gba_prng.html#finally-those-random-ranges-we-mentioned","ch03/gba_prng.html#summary-table","ch03/memory_game.html#making-a-memory-game","ch03/memory_game.html#displaying-a-background","ch03/memory_game.html#background-palette","ch03/memory_game.html#background-tiles","ch03/memory_game.html#setup-a-screenblock","ch03/memory_game.html#background-io-registers","ch03/memory_game.html#set-the-display-control-register"],"index":{"documentStore":{"docInfo":{"0":{"body":16,"breadcrumbs":1,"title":1},"1":{"body":84,"breadcrumbs":2,"title":2},"10":{"body":93,"breadcrumbs":5,"title":1},"100":{"body":132,"breadcrumbs":8,"title":4},"11":{"body":298,"breadcrumbs":7,"title":3},"12":{"body":49,"breadcrumbs":7,"title":3},"13":{"body":164,"breadcrumbs":5,"title":1},"14":{"body":99,"breadcrumbs":6,"title":2},"15":{"body":143,"breadcrumbs":6,"title":2},"16":{"body":83,"breadcrumbs":7,"title":3},"17":{"body":174,"breadcrumbs":6,"title":2},"18":{"body":25,"breadcrumbs":6,"title":2},"19":{"body":32,"breadcrumbs":6,"title":2},"2":{"body":41,"breadcrumbs":2,"title":2},"20":{"body":70,"breadcrumbs":7,"title":3},"21":{"body":54,"breadcrumbs":6,"title":2},"22":{"body":21,"breadcrumbs":5,"title":1},"23":{"body":94,"breadcrumbs":7,"title":3},"24":{"body":47,"breadcrumbs":5,"title":1},"25":{"body":41,"breadcrumbs":6,"title":2},"26":{"body":211,"breadcrumbs":6,"title":2},"27":{"body":25,"breadcrumbs":6,"title":2},"28":{"body":47,"breadcrumbs":5,"title":1},"29":{"body":359,"breadcrumbs":5,"title":1},"3":{"body":34,"breadcrumbs":2,"title":2},"30":{"body":111,"breadcrumbs":4,"title":4},"31":{"body":198,"breadcrumbs":7,"title":3},"32":{"body":548,"breadcrumbs":7,"title":3},"33":{"body":320,"breadcrumbs":6,"title":2},"34":{"body":7,"breadcrum
bs":5,"title":1},"35":{"body":20,"breadcrumbs":5,"title":1},"36":{"body":337,"breadcrumbs":5,"title":1},"37":{"body":74,"breadcrumbs":9,"title":5},"38":{"body":209,"breadcrumbs":4,"title":4},"39":{"body":99,"breadcrumbs":2,"title":2},"4":{"body":48,"breadcrumbs":2,"title":2},"40":{"body":85,"breadcrumbs":7,"title":3},"41":{"body":23,"breadcrumbs":7,"title":3},"42":{"body":78,"breadcrumbs":8,"title":4},"43":{"body":44,"breadcrumbs":8,"title":4},"44":{"body":35,"breadcrumbs":6,"title":2},"45":{"body":174,"breadcrumbs":7,"title":3},"46":{"body":96,"breadcrumbs":5,"title":1},"47":{"body":116,"breadcrumbs":7,"title":3},"48":{"body":208,"breadcrumbs":8,"title":4},"49":{"body":199,"breadcrumbs":9,"title":5},"5":{"body":33,"breadcrumbs":4,"title":4},"50":{"body":64,"breadcrumbs":7,"title":3},"51":{"body":19,"breadcrumbs":6,"title":2},"52":{"body":206,"breadcrumbs":5,"title":1},"53":{"body":184,"breadcrumbs":5,"title":1},"54":{"body":151,"breadcrumbs":6,"title":2},"55":{"body":43,"breadcrumbs":6,"title":2},"56":{"body":64,"breadcrumbs":7,"title":3},"57":{"body":95,"breadcrumbs":6,"title":2},"58":{"body":376,"breadcrumbs":6,"title":2},"59":{"body":339,"breadcrumbs":6,"title":2},"6":{"body":148,"breadcrumbs":3,"title":3},"60":{"body":13,"breadcrumbs":7,"title":3},"61":{"body":117,"breadcrumbs":6,"title":2},"62":{"body":123,"breadcrumbs":6,"title":2},"63":{"body":120,"breadcrumbs":5,"title":1},"64":{"body":19,"breadcrumbs":6,"title":2},"65":{"body":55,"breadcrumbs":7,"title":3},"66":{"body":84,"breadcrumbs":7,"title":3},"67":{"body":39,"breadcrumbs":6,"title":2},"68":{"body":266,"breadcrumbs":6,"title":2},"69":{"body":74,"breadcrumbs":7,"title":3},"7":{"body":80,"breadcrumbs":3,"title":3},"70":{"body":73,"breadcrumbs":5,"title":1},"71":{"body":48,"breadcrumbs":5,"title":1},"72":{"body":16,"breadcrumbs":5,"title":1},"73":{"body":617,"breadcrumbs":6,"title":2},"74":{"body":83,"breadcrumbs":6,"title":2},"75":{"body":323,"breadcrumbs":8,"title":4},"76":{"body":158,"breadcrumbs":6,"
title":2},"77":{"body":262,"breadcrumbs":7,"title":3},"78":{"body":126,"breadcrumbs":5,"title":1},"79":{"body":388,"breadcrumbs":5,"title":1},"8":{"body":423,"breadcrumbs":1,"title":1},"80":{"body":65,"breadcrumbs":7,"title":3},"81":{"body":0,"breadcrumbs":6,"title":2},"82":{"body":145,"breadcrumbs":14,"title":10},"83":{"body":281,"breadcrumbs":12,"title":8},"84":{"body":329,"breadcrumbs":14,"title":10},"85":{"body":97,"breadcrumbs":15,"title":11},"86":{"body":227,"breadcrumbs":7,"title":3},"87":{"body":188,"breadcrumbs":13,"title":9},"88":{"body":156,"breadcrumbs":13,"title":9},"89":{"body":37,"breadcrumbs":5,"title":1},"9":{"body":39,"breadcrumbs":4,"title":4},"90":{"body":240,"breadcrumbs":7,"title":3},"91":{"body":237,"breadcrumbs":6,"title":2},"92":{"body":813,"breadcrumbs":9,"title":5},"93":{"body":47,"breadcrumbs":6,"title":2},"94":{"body":101,"breadcrumbs":7,"title":3},"95":{"body":24,"breadcrumbs":6,"title":2},"96":{"body":175,"breadcrumbs":6,"title":2},"97":{"body":136,"breadcrumbs":6,"title":2},"98":{"body":153,"breadcrumbs":6,"title":2},"99":{"body":58,"breadcrumbs":7,"title":3}},"docs":{"0":{"body":"Here's a book that'll help you program in Rust on the Game Boy Advance (GBA). It's a work in progress of course, but so is most of everything in Rust.","breadcrumbs":"Introduction","id":"0","title":"Introduction"},"1":{"body":"I'm out to teach you how to program in Rust on the GBA, obviously. However, while there is a gba crate, and while I genuinely believe it to be a good and useful crate for GBA programming, we will not be using the gba crate within this book. In fact we won't be using any crates at all. We can call it the Handmade Hero approach, if you like. I don't want to just teach you how to use the gba crate, I want to teach you what you'd need to know to write the crate from scratch if it wasn't there. 
Each chapter of the book will focus on a few things you'll need to know about GBA programming and then present a fully self-contained example that puts those ideas into action. Just one file per example, no dependencies, no external assets, no fuss. The examples will be in the text of the book within code blocks, but also you can find them in the examples directory of the repo if you want to get them that way.","breadcrumbs":"Style and Purpose","id":"1","title":"Style and Purpose"},"10":{"body":"Our first example will be a totally minimal, full magic number crazy town. Ready? Here goes: hello1.rs #![feature(start)]\n#![no_std] #[panic_handler]\nfn panic(_info: &core::panic::PanicInfo) -> ! { loop {}\n} #[start]\nfn main(_argc: isize, _argv: *const *const u8) -> isize { unsafe { (0x04000000 as *mut u16).write_volatile(0x0403); (0x06000000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F); (0x06000000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0); (0x06000000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00); loop {} }\n} Throw that into your project skeleton, build the program (as described back in Chapter 0), and give it a run in your emulator. You should see a red, green, and blue dot close-ish to the middle of the screen. If you don't, something already went wrong. Double check things, phone a friend, write your senators, try asking Ketsuban on the Rust Community Discord , until you're able to get your three dots going.","breadcrumbs":"Ch 1: Hello GBA » hello1","id":"10","title":"hello1"},"100":{"body":"We're finally ready to set the display control register and get things going. We've slightly glossed over it so far, but when the GBA is first booted most everything within the address space will be all zeroed. However, the display control register has the \"Force VBlank\" bit enabled by the BIOS, giving you a moment to put the memory in place that you'll need for the first frame. 
So, now that have got all of our memory set, we'll overwrite the initial display control register value with what we'll call \"just enable bg0\". #[derive(Clone, Copy, Default, PartialEq, Eq)]\n#[repr(transparent)]\npub struct DisplayControlSetting(u16); impl DisplayControlSetting { pub const JUST_ENABLE_BG0: DisplayControlSetting = DisplayControlSetting(1 << 8);\n} pub const DISPCNT: VolatilePtr