diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index 533c0e7..91f1553 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -1,42 +1,42 @@ # Rust GBA Guide -* Introduction - * Reader Requirements - * Book Goals and Style - * Getting Outside Help - * Development Setup - * Hello Magic -* GBA Limitations - * No Floats - * Core Only - * Volatile Destination -* Broad Concepts - * CPU - * BIOS - * Working RAM - * IO Registers - * Palette RAM - * Video RAM - * Object Attribute Memory - * Game Pak ROM / Flash ROM - * Save RAM -* Video - * RBG15 Color - * Bitmap Modes - * Tiled Modes - * Affine Math - * Special Effects -* Non-Video - * Buttons - * Timers - * Direct Memory Access - * Sound - * Interrupts - * Network - * Game Pak -* Examples - * hello_magic - * hello_world - * light_cycle - * bg_demo +* [Introduction](introduction/index.md) + * [Reader Requirements](introduction/requirements.md) + * [Book Goals and Style](introduction/goals_and_style.md) + * [Getting Outside Help](introduction/outside_help.md) + * [Development Setup](introduction/setup.md) + * [Hello Magic](introduction/magic.md) +* [GBA Limitations](limitations/index.md) + * [No Floats](limitations/no_floats.md) + * [Core Only](limitations/core_only.md) + * [Volatile Destination](limitations/volatile_destination.md) +* [Broad Concepts](concepts/index.md) + * [CPU](concepts/cpu.md) + * [BIOS](concepts/bios.md) + * [Working RAM](concepts/wram.md) + * [IO Registers](concepts/io-registers.md) + * [Palette RAM](concepts/palram.md) + * [Video RAM](concepts/vram.md) + * [Object Attribute Memory](concepts/oam.md) + * [Game Pak ROM / Flash ROM](concepts/rom.md) + * [Save RAM](concepts/sram.md) +* [Video](video/index.md) + * [RBG15 Color](video/rgb15.md) + * [Bitmap Modes](video/bitmap.md) + * [Tiled Modes](video/tiled_modes.md) + * [Affine Math](video/affine_math.md) + * [Special Effects](video/specials.md) +* [Non-Video](non-video/index.md) + * [Buttons](non-video/buttons.md) + * 
[Timers](non-video/timers.md) + * [Direct Memory Access](non-video/dma.md) + * [Sound](non-video/sound.md) + * [Interrupts](non-video/interrupts.md) + * [Network](non-video/network.md) + * [Game Pak](non-video/game_pak.md) +* [Examples](examples/index.md) + * [hello_magic](examples/hello_magic.md) + * [hello_world](examples/hello_world.md) + * [light_cycle](examples/light_cycle.md) + * [bg_demo](examples/bg_demo.md) diff --git a/book/src/concepts/bios.md b/book/src/concepts/bios.md new file mode 100644 index 0000000..435d69f --- /dev/null +++ b/book/src/concepts/bios.md @@ -0,0 +1 @@ +# BIOS diff --git a/book/src/concepts/cpu.md b/book/src/concepts/cpu.md new file mode 100644 index 0000000..894d34b --- /dev/null +++ b/book/src/concepts/cpu.md @@ -0,0 +1 @@ +# CPU diff --git a/book/src/concepts/index.md b/book/src/concepts/index.md new file mode 100644 index 0000000..864e1ff --- /dev/null +++ b/book/src/concepts/index.md @@ -0,0 +1 @@ +# Broad Concepts diff --git a/book/src/concepts/io-registers.md b/book/src/concepts/io-registers.md new file mode 100644 index 0000000..3a3e53f --- /dev/null +++ b/book/src/concepts/io-registers.md @@ -0,0 +1 @@ +# IO Registers diff --git a/book/src/concepts/oam.md b/book/src/concepts/oam.md new file mode 100644 index 0000000..78d8d02 --- /dev/null +++ b/book/src/concepts/oam.md @@ -0,0 +1 @@ +# Object Attribute Memory diff --git a/book/src/concepts/palram.md b/book/src/concepts/palram.md new file mode 100644 index 0000000..5353b1c --- /dev/null +++ b/book/src/concepts/palram.md @@ -0,0 +1 @@ +# Palette RAM diff --git a/book/src/concepts/rom.md b/book/src/concepts/rom.md new file mode 100644 index 0000000..753857b --- /dev/null +++ b/book/src/concepts/rom.md @@ -0,0 +1 @@ +# Game Pak ROM / Flash ROM diff --git a/book/src/concepts/sram.md b/book/src/concepts/sram.md new file mode 100644 index 0000000..aa68e68 --- /dev/null +++ b/book/src/concepts/sram.md @@ -0,0 +1 @@ +# Save RAM diff --git a/book/src/concepts/vram.md 
b/book/src/concepts/vram.md new file mode 100644 index 0000000..e6915fd --- /dev/null +++ b/book/src/concepts/vram.md @@ -0,0 +1 @@ +# Video RAM diff --git a/book/src/concepts/wram.md b/book/src/concepts/wram.md new file mode 100644 index 0000000..fb0f112 --- /dev/null +++ b/book/src/concepts/wram.md @@ -0,0 +1 @@ +# Working RAM diff --git a/book/src/examples/bg_demo.md b/book/src/examples/bg_demo.md new file mode 100644 index 0000000..8cc8680 --- /dev/null +++ b/book/src/examples/bg_demo.md @@ -0,0 +1 @@ +# bg_demo diff --git a/book/src/examples/hello_magic.md b/book/src/examples/hello_magic.md new file mode 100644 index 0000000..ba1a038 --- /dev/null +++ b/book/src/examples/hello_magic.md @@ -0,0 +1 @@ +# hello_magic diff --git a/book/src/examples/hello_world.md b/book/src/examples/hello_world.md new file mode 100644 index 0000000..115e729 --- /dev/null +++ b/book/src/examples/hello_world.md @@ -0,0 +1 @@ +# hello_world diff --git a/book/src/examples/index.md b/book/src/examples/index.md new file mode 100644 index 0000000..df635b4 --- /dev/null +++ b/book/src/examples/index.md @@ -0,0 +1 @@ +# Examples diff --git a/book/src/examples/light_cycle.md b/book/src/examples/light_cycle.md new file mode 100644 index 0000000..f72194e --- /dev/null +++ b/book/src/examples/light_cycle.md @@ -0,0 +1 @@ +# light_cycle diff --git a/book/src/introduction/goals_and_style.md b/book/src/introduction/goals_and_style.md new file mode 100644 index 0000000..d901353 --- /dev/null +++ b/book/src/introduction/goals_and_style.md @@ -0,0 +1 @@ +# Book Goals and Style diff --git a/book/src/introduction/index.md b/book/src/introduction/index.md new file mode 100644 index 0000000..e10b99d --- /dev/null +++ b/book/src/introduction/index.md @@ -0,0 +1 @@ +# Introduction diff --git a/book/src/introduction/magic.md b/book/src/introduction/magic.md new file mode 100644 index 0000000..d70a7ca --- /dev/null +++ b/book/src/introduction/magic.md @@ -0,0 +1 @@ +# Hello Magic diff --git 
a/book/src/introduction/outside_help.md b/book/src/introduction/outside_help.md new file mode 100644 index 0000000..ce89685 --- /dev/null +++ b/book/src/introduction/outside_help.md @@ -0,0 +1 @@ +# Getting Outside Help diff --git a/book/src/introduction/requirements.md b/book/src/introduction/requirements.md new file mode 100644 index 0000000..0cdf4ef --- /dev/null +++ b/book/src/introduction/requirements.md @@ -0,0 +1 @@ +# Reader Requirements diff --git a/book/src/introduction/setup.md b/book/src/introduction/setup.md new file mode 100644 index 0000000..4698bc8 --- /dev/null +++ b/book/src/introduction/setup.md @@ -0,0 +1 @@ +# Development Setup diff --git a/book/src/limitations/core_only.md b/book/src/limitations/core_only.md new file mode 100644 index 0000000..c213ee4 --- /dev/null +++ b/book/src/limitations/core_only.md @@ -0,0 +1 @@ +# Core Only diff --git a/book/src/limitations/index.md b/book/src/limitations/index.md new file mode 100644 index 0000000..71e12e6 --- /dev/null +++ b/book/src/limitations/index.md @@ -0,0 +1 @@ +# GBA Limitations diff --git a/book/src/limitations/no_floats.md b/book/src/limitations/no_floats.md new file mode 100644 index 0000000..e27e28b --- /dev/null +++ b/book/src/limitations/no_floats.md @@ -0,0 +1 @@ +# No Floats diff --git a/book/src/limitations/volatile_destination.md b/book/src/limitations/volatile_destination.md new file mode 100644 index 0000000..693d129 --- /dev/null +++ b/book/src/limitations/volatile_destination.md @@ -0,0 +1 @@ +# Volatile Destination diff --git a/book/src/non-video/buttons.md b/book/src/non-video/buttons.md new file mode 100644 index 0000000..8694b48 --- /dev/null +++ b/book/src/non-video/buttons.md @@ -0,0 +1 @@ +# Buttons diff --git a/book/src/non-video/dma.md b/book/src/non-video/dma.md new file mode 100644 index 0000000..08754f5 --- /dev/null +++ b/book/src/non-video/dma.md @@ -0,0 +1 @@ +# Direct Memory Access diff --git a/book/src/non-video/game_pak.md b/book/src/non-video/game_pak.md new 
file mode 100644 index 0000000..7e1ac79 --- /dev/null +++ b/book/src/non-video/game_pak.md @@ -0,0 +1 @@ +# Game Pak diff --git a/book/src/non-video/index.md b/book/src/non-video/index.md new file mode 100644 index 0000000..d7d1113 --- /dev/null +++ b/book/src/non-video/index.md @@ -0,0 +1 @@ +# Non-Video diff --git a/book/src/non-video/interrupts.md b/book/src/non-video/interrupts.md new file mode 100644 index 0000000..81df6c7 --- /dev/null +++ b/book/src/non-video/interrupts.md @@ -0,0 +1 @@ +# Interrupts diff --git a/book/src/non-video/network.md b/book/src/non-video/network.md new file mode 100644 index 0000000..05db335 --- /dev/null +++ b/book/src/non-video/network.md @@ -0,0 +1 @@ +# Network diff --git a/book/src/non-video/sound.md b/book/src/non-video/sound.md new file mode 100644 index 0000000..26f833d --- /dev/null +++ b/book/src/non-video/sound.md @@ -0,0 +1 @@ +# Sound diff --git a/book/src/non-video/timers.md b/book/src/non-video/timers.md new file mode 100644 index 0000000..2f76034 --- /dev/null +++ b/book/src/non-video/timers.md @@ -0,0 +1 @@ +# Timers diff --git a/book/src/video/affine_math.md b/book/src/video/affine_math.md new file mode 100644 index 0000000..1cd510c --- /dev/null +++ b/book/src/video/affine_math.md @@ -0,0 +1 @@ +# Affine Math diff --git a/book/src/video/bitmap.md b/book/src/video/bitmap.md new file mode 100644 index 0000000..194e111 --- /dev/null +++ b/book/src/video/bitmap.md @@ -0,0 +1 @@ +# Bitmap Modes diff --git a/book/src/video/index.md b/book/src/video/index.md new file mode 100644 index 0000000..f076b5d --- /dev/null +++ b/book/src/video/index.md @@ -0,0 +1 @@ +# Video diff --git a/book/src/video/rgb15.md b/book/src/video/rgb15.md new file mode 100644 index 0000000..adf5784 --- /dev/null +++ b/book/src/video/rgb15.md @@ -0,0 +1 @@ +# RBG15 Color diff --git a/book/src/video/specials.md b/book/src/video/specials.md new file mode 100644 index 0000000..b33a9a2 --- /dev/null +++ b/book/src/video/specials.md @@ -0,0 +1 @@ +# 
Special Effects diff --git a/book/src/video/tiled_modes.md b/book/src/video/tiled_modes.md new file mode 100644 index 0000000..35a02fb --- /dev/null +++ b/book/src/video/tiled_modes.md @@ -0,0 +1 @@ +# Tiled Modes diff --git a/docs/ch00/index.html b/docs/ch00/index.html deleted file mode 100644 index 555d97f..0000000 --- a/docs/ch00/index.html +++ /dev/null @@ -1,347 +0,0 @@ - - - - - - Ch 0: Development Setup - Rust GBA Guide - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- -
- - - - - - - - - - -
-
-

Chapter 0: Development Setup

-

Before you can build a GBA game you'll have to follow some special steps to -set up the development environment. Perhaps unfortunately, there's enough detail -here to warrant a mini-chapter all on its own.

-

Once again, extra special thanks to Ketsuban, who first dove into how to -make this all work with rust and then shared it with the world.

-

Per System Setup

-

Obviously you need your computer to have a working Rust -installation. However, you'll also need to ensure that -you're using a nightly toolchain (we will need it for inline assembly, among -other potentially useful features). You can run rustup default nightly to set -nightly as the system-wide default toolchain, or you can use a toolchain -file to use -nightly just on a specific project, but either way we'll be assuming the use of -nightly from now on. You'll also need the rust-src component so that -cargo-xbuild will be able to compile the core crate for us in a bit, so run -rustup component add rust-src.

-

Next, you need devkitpro. They've -got a graphical installer for Windows that runs nicely, and I guess pacman -support on Linux (I'm on Windows so I haven't tried the Linux install myself). -We'll be using a few of their general binutils for the arm-none-eabi target, -and we'll also be using some of their tools that are specific to GBA -development, so even if you already have the right binutils for whatever -reason, you'll still want devkitpro for the gbafix utility.

-
    -
  • On Windows you'll want something like C:\devkitpro\devkitARM\bin and -C:\devkitpro\tools\bin to be added to your -PATH, depending on where you -installed it to and such.
  • -
  • On Linux you can use pacman to get it, and the default install puts the stuff -in /opt/devkitpro/devkitARM/bin and /opt/devkitpro/tools/bin. If you need -help you can look in our repository's -.travis.yml -file to see exactly what our CI does.
  • -
-

Finally, you'll need cargo-xbuild. Just run cargo install cargo-xbuild and -cargo will figure it all out for you.

-

Per Project Setup

-

Once the system wide tools are ready, you'll need some particular files each -time you want to start a new project. You can find them in the root of the -rust-console/gba repo.

-
    -
  • thumbv4-none-agb.json describes the overall GBA to cargo-xbuild (and LLVM) -so it knows what to do. Technically the GBA is thumbv4-none-eabi, but we -change the eabi to agb so that we can distinguish it from other eabi -devices when using cfg flags.
  • -
  • crt0.s describes some ASM startup stuff. If you have more ASM to place here -later on this is where you can put it. You also need to build it into a -crt0.o file before it can actually be used, but we'll cover that below.
  • -
  • linker.ld tells the linker all the critical info about the layout -expectations that the GBA has about our program, and that it should also -include the crt0.o file with our compiled rust code.
  • -
-

Compiling

-

The next steps only work once you've got some source code to build. If you need -a quick test, copy the hello1.rs file from our examples directory in the -repository.

-

Once you've got something to build, you perform the following steps:

-
    -
  • -

    arm-none-eabi-as crt0.s -o crt0.o

    -
      -
    • This assembles your text format crt0.s file into object format crt0.o. You -only need to do it when crt0.s changes, but it's -practically instant, so you might as well do it every time and -never forget.
    • -
    -
  • -
  • -

    cargo xbuild --target thumbv4-none-agb.json

    -
      -
    • This builds your Rust source. It accepts most of the options that you'd -expect cargo to accept, such as --release, --bin foo, or ---examples.
    • -
    • You cannot build and run tests this way, because they require std, -which the GBA doesn't have. If you want you can still run some of your -project's tests with cargo test --lib or similar, but that builds for your -local machine, so anything specific to the GBA (such as reading and writing -registers) won't be testable that way. If you want to isolate and try out -some piece of code running on the GBA you'll unfortunately have to make a demo -for it in your examples/ directory and then run the demo in an emulator -and see if it does what you expect.
    • -
    • The file extension is important. cargo xbuild takes it as a flag to -compile dependencies with the same sysroot, so you can include crates -normally. Well, crates that work in the GBA's limited environment, but you -get the idea.
    • -
    -
  • -
-

At this point you have an ELF binary that some emulators can execute directly. -This is helpful because it'll have debug symbols and all that, assuming a debug -build. Specifically, mgba 0.7 beta -1 can do it, and perhaps other -emulators can also do it.

-

However, if you want a "real" ROM that works in all emulators and that you could -transfer to a flash cart there's a little more to do.

-
    -
  • -

    arm-none-eabi-objcopy -O binary target/thumbv4-none-agb/MODE/BIN_NAME target/ROM_NAME.gba

    -
      -
    • This will perform an objcopy on our -program. Here I've named the program arm-none-eabi-objcopy, which is what -devkitpro calls their version of objcopy that's specific to the GBA in the -Windows install. If the program isn't found under that name, have a look in -your installation directory to see if it's under a slightly different name -or something.
    • -
    • As you can see from reading the man page, the -O binary option takes our -lovely ELF file with symbols and all that and strips it down to basically a -bare memory dump of the program.
    • -
    • The next argument is the input file. You might not be familiar with how -cargo arranges stuff in the target/ directory, and between RLS and -cargo doc and stuff it gets kinda crowded, so it goes like this: -
        -
      • Since our program was built for a non-local target, first we've got a -directory named for that target, thumbv4-none-agb/
      • -
      • Next, the "MODE" is either debug/ or release/, depending on whether we -included the --release flag. You'll probably only be packing release -mode programs all the way into GBA ROMs, but it works with either mode.
      • -
      • Finally, the name of the program. If your program is something out of the -project's src/bin/ then it'll be that file's name, or whatever name you -configured for the bin in the Cargo.toml file. If your program is -something out of the project's examples/ directory there will be a -similar examples/ sub-directory first, and then the example's name.
      • -
      -
    • -
    • The final argument is the output of the objcopy, which I suggest putting -at just the top level of the target/ directory. Really it could go -anywhere, but if you're using git then it's likely that your .gitignore -file is already set up to exclude everything in target/, so this makes sure -that your intermediate game builds don't get checked into your git.
    • -
    -
  • -
  • -

    gbafix target/ROM_NAME.gba

    -
      -
    • The gbafix tool also comes from devkitpro. The GBA is very picky about a -ROM's format, and gbafix patches the ROM's header and such so that it'll -work right. Unlike objcopy, this tool is custom built for GBA development, -so it works just perfectly without any arguments beyond the file name. The -ROM is patched in place, so we don't even need to specify a new destination.
    • -
    -
  • -
-

And you're finally done!

-

Of course, you probably want to make a script for all that, but it's up to you. -On our own project we have it mostly set up within a Makefile.toml which runs -using the cargo-make plugin. It's -not really the best plugin, but it's what's available.
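
As a rough sketch of what such a script has to do, the four commands above can be assembled in Rust like this. The names `elf_path`, `build_steps`, and `argv` are hypothetical helpers made up for this illustration; only the command lines themselves come from this chapter, and the sketch assumes the ROM is written to the top level of target/ as suggested above.

```rust
/// Name of the target directory, taken from the thumbv4-none-agb.json file name.
const TARGET: &str = "thumbv4-none-agb";

/// Turns a slice of string slices into an owned argument vector.
fn argv(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|part| part.to_string()).collect()
}

/// Path of the ELF file that cargo xbuild produces (hypothetical helper).
/// `release` picks release/ over debug/, and `is_example` adds the
/// examples/ sub-directory described in the objcopy step.
fn elf_path(release: bool, is_example: bool, bin_name: &str) -> String {
    let mode = if release { "release" } else { "debug" };
    let sub_dir = if is_example { "examples/" } else { "" };
    format!("target/{}/{}/{}{}", TARGET, mode, sub_dir, bin_name)
}

/// The four build commands, in order, each as a program name plus arguments.
fn build_steps(
    release: bool,
    is_example: bool,
    bin_name: &str,
    rom_name: &str,
) -> Vec<Vec<String>> {
    let elf = elf_path(release, is_example, bin_name);
    let rom = format!("target/{}.gba", rom_name);

    let mut xbuild = argv(&["cargo", "xbuild", "--target", "thumbv4-none-agb.json"]);
    if release {
        xbuild.push("--release".to_string());
    }

    vec![
        // 1. Assemble the startup code.
        argv(&["arm-none-eabi-as", "crt0.s", "-o", "crt0.o"]),
        // 2. Build the Rust source.
        xbuild,
        // 3. Strip the ELF down to a bare memory dump.
        argv(&["arm-none-eabi-objcopy", "-O", "binary", elf.as_str(), rom.as_str()]),
        // 4. Patch the ROM header in place.
        argv(&["gbafix", rom.as_str()]),
    ]
}
```

Each entry could then be handed to `std::process::Command` (first element as the program, the rest as arguments), stopping at the first step that fails. Keeping the path logic in one function means the MODE and examples/ rules only have to be written down once.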

- -
- - -
-
- - - -
- - - - - - - - - - - - - - - - - - - - - - - - diff --git a/docs/ch01/hello1.html b/docs/ch01/hello1.html deleted file mode 100644 index 7c49512..0000000 --- a/docs/ch01/hello1.html +++ /dev/null @@ -1,331 +0,0 @@ - - - - - - hello1 - Rust GBA Guide - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- -
- - - - - - - - - - -
-
-

hello1

-

Our first example will be a totally minimal, full magic number crazy town. -Ready? Here goes:

-

hello1.rs

-
#![feature(start)]
-#![no_std]
-
-#[panic_handler]
-fn panic(_info: &core::panic::PanicInfo) -> ! {
-  loop {}
-}
-
-#[start]
-fn main(_argc: isize, _argv: *const *const u8) -> isize {
-  unsafe {
-    (0x04000000 as *mut u16).write_volatile(0x0403);
-    (0x06000000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
-    (0x06000000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
-    (0x06000000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
-    loop {}
-  }
-}
-
-

Throw that into your project skeleton, build the program (as described back in -Chapter 0), and give it a run in your emulator. You should see a red, green, and -blue dot close-ish to the middle of the screen. If you don't, something already -went wrong. Double check things, phone a friend, write your senators, try asking -Ketsuban on the Rust Community Discord, -until you're able to get your three dots going.

-

A basic hello1 explanation

-

So, what just happened? Even if you're used to Rust that might look pretty -strange. We'll go over most of the little parts right here, and then bigger -parts will get their own sections.

-

-#![feature(start)]
-

This enables the start -feature, -which you would normally be able to read about in the unstable book, except that -the book tells you nothing at all except to look at the tracking -issue.

-

Basically, a GBA game is even more low-level than the normal amount of -low-level that you get from Rust, so we have to tell the compiler to account for -that by specifying a #[start], and we need this feature on to do that.

-

-#![no_std]
-

There's no standard library available on the GBA, so we'll have to live a core -only life.

-

-#[panic_handler]
-fn panic(_info: &core::panic::PanicInfo) -> ! {
-  loop {}
-}
-

This sets our panic -handler. -Basically, if we somehow trigger a panic, this is where the program goes. -However, right now we don't know how to get any sort of message out to the user -so... we do nothing at all. We can't even return from here, so we just sit in -an infinite loop. The player will have to reset the universe from the outside.

-
#[start]
-fn main(_argc: isize, _argv: *const *const u8) -> isize {
-
-

This is our #[start]. We call it main, but it's not like a main that you'd -see in a normal Rust program. It's more like the sort of main that you'd see in a C -program, but it's still not that either. If you compile a #[start] program -for a target with an OS and inspect the result with a tool such as arm-none-eabi-nm, you'll -see the symbol for the C main alongside -the symbol for the start main that we write here. Our start main is just its -own unique thing, and the inputs and outputs have to be like that because that's -how #[start] is specified to work in Rust.

-

If you think about it for a moment you'll probably realize that those inputs -and outputs are totally useless to us on a GBA. There's no OS on the GBA to call -our program, and there's no place for our program to "return to" when it's done.

-

Side note: if you want to learn more about stuff "before main gets called" you -can watch a great CppCon talk by -Matt Godbolt (yes, that Godbolt) where he delves into quite a bit of it. The -talk doesn't really apply to the GBA, but it's pretty good.

-

-  unsafe {
-

I hope you're all set for some unsafe, because there's a lot of it to be had.

-

-    (0x04000000 as *mut u16).write_volatile(0x0403);
-

Sure!

-

-    (0x06000000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
-    (0x06000000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
-    (0x06000000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
-

Ah, of course.

-

-    loop {}
-  }
-}
-

And, as mentioned above, there's no place for a GBA program to "return to", so -we can't ever let main try to return there. Instead, we go into an infinite -loop that does nothing. The fact that this never returns an isize -value doesn't bother Rust: an infinite loop has the never type (!), -which coerces to any other type, so the function still type checks.

-

Fun fact: unlike in C++, an infinite loop with no side effects isn't Undefined -Behavior for us rustaceans... semantically. In truth LLVM has a known -bug in this area, so we won't -actually be relying on empty loops in any future programs.

-

All Those Magic Numbers

-

Alright, I cheated quite a bit in the middle there. The program works, but I -didn't really tell you why because I didn't really tell you what any of those -magic numbers mean or do.

-
    -
  • 0x04000000 is the address of an IO Register called the Display Control.
  • -
  • 0x06000000 is the start of Video RAM.
  • -
-

So we write some magic to the display control register once, then we write some -other magic to three magic locations in the Video RAM. Somehow that shows three -dots. Gotta read on to find out why!

- -
- - -
-
- - - -
- - - - - - - - - - - - - - - - - - - - - - - - diff --git a/docs/ch01/hello2.html b/docs/ch01/hello2.html deleted file mode 100644 index 2acf3f0..0000000 --- a/docs/ch01/hello2.html +++ /dev/null @@ -1,319 +0,0 @@ - - - - - - hello2 - Rust GBA Guide - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- -
- - - - - - - - - - -
-
-

hello2

-

Okay so let's have a look again:

-

hello1

-
#![feature(start)]
-#![no_std]
-
-#[panic_handler]
-fn panic(_info: &core::panic::PanicInfo) -> ! {
-  loop {}
-}
-
-#[start]
-fn main(_argc: isize, _argv: *const *const u8) -> isize {
-  unsafe {
-    (0x04000000 as *mut u16).write_volatile(0x0403);
-    (0x06000000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
-    (0x06000000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
-    (0x06000000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
-    loop {}
-  }
-}
-
-

Now let's clean this up so that it's clearer what's going on.

-

First we'll label that display control stuff, including using the VolatilePtr -type from the volatile explanation:

-

-pub const DISPCNT: VolatilePtr<u16> = VolatilePtr(0x04000000 as *mut u16);
-pub const MODE3: u16 = 3;
-pub const BG2: u16 = 0b100_0000_0000;
-

Next we make some const values for the actual pixel drawing

-

-pub const VRAM: usize = 0x06000000;
-pub const SCREEN_WIDTH: isize = 240;
-

Note that VRAM has to be interpreted in different ways depending on mode, so we -just leave it as usize and we'll cast it into the right form closer to the -actual use.

-

Next we want a small helper function for putting together a color value. -Happily, this one can even be declared as a const function. At the time of -writing, we've got the "minimal const fn" support in nightly. It really is quite -limited, but I'm happy to let rustc and LLVM pre-compute as much as they can -when it comes to the GBA's tiny CPU.

-

-pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
-  blue << 10 | green << 5 | red
-}
-
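
Since this helper is plain integer math, it can be checked on a desktop machine before it ever touches the GBA. The sketch below repeats `rgb16` and adds `unpack`, a hypothetical inverse written only for this illustration, to show that each channel occupies its own 5-bit field.

```rust
/// Packs three 5-bit channels into one u16: red in bits 0-4,
/// green in bits 5-9, blue in bits 10-14. Bit 15 is left as zero.
pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
    blue << 10 | green << 5 | red
}

/// Hypothetical inverse of rgb16, not part of the book's code:
/// splits a packed color back into (red, green, blue).
pub const fn unpack(color: u16) -> (u16, u16, u16) {
    (color & 0x1F, (color >> 5) & 0x1F, (color >> 10) & 0x1F)
}
```

With these definitions, `rgb16(31, 0, 0)`, `rgb16(0, 31, 0)`, and `rgb16(0, 0, 31)` come out as `0x001F`, `0x03E0`, and `0x7C00`, which are the three magic color values written in hello1. Note that the helper does not mask its inputs, so a channel value above 31 would spill into the neighboring field.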

Finally, we'll make a function for drawing a pixel in Mode 3. Even though it's -just a one-liner, having the "important parts" be labeled as function arguments -usually helps you think about it a lot better.

-

-pub unsafe fn mode3_pixel(col: isize, row: isize, color: u16) {
-  VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).write(color);
-}
-
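
The indexing inside `mode3_pixel` is worth working through once by hand. The following sketch does the same arithmetic with plain integers so that it runs anywhere; `mode3_index` and `mode3_address` are hypothetical names for this illustration, and the two-bytes-per-pixel figure assumes the u16 pixels that Mode 3 uses.

```rust
const VRAM_BASE: usize = 0x0600_0000;
const SCREEN_WIDTH: usize = 240;
/// Each Mode 3 pixel is one u16.
const BYTES_PER_PIXEL: usize = 2;

/// Position of a pixel counted in u16 units from the start of VRAM.
/// Rows are stored one after another, so skipping `row` rows means
/// skipping `row * SCREEN_WIDTH` pixels.
const fn mode3_index(col: usize, row: usize) -> usize {
    col + row * SCREEN_WIDTH
}

/// Byte address that the pointer ends up at after the offset.
const fn mode3_address(col: usize, row: usize) -> usize {
    VRAM_BASE + mode3_index(col, row) * BYTES_PER_PIXEL
}
```

For the first dot at column 120, row 80, the index is 120 + 80 * 240 = 19320 and the address is 0x0600_96F0. Pointer `offset` counts in elements rather than bytes, which is why the code in this chapter never multiplies by two itself.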

So now we've got this:

-

hello2

-
#![feature(start)]
-#![no_std]
-
-#[panic_handler]
-fn panic(_info: &core::panic::PanicInfo) -> ! {
-  loop {}
-}
-
-#[start]
-fn main(_argc: isize, _argv: *const *const u8) -> isize {
-  unsafe {
-    DISPCNT.write(MODE3 | BG2);
-    mode3_pixel(120, 80, rgb16(31, 0, 0));
-    mode3_pixel(136, 80, rgb16(0, 31, 0));
-    mode3_pixel(120, 96, rgb16(0, 0, 31));
-    loop {}
-  }
-}
-
-#[derive(Debug, Clone, Copy, Hash, PartialEq, Eq, PartialOrd, Ord)]
-#[repr(transparent)]
-pub struct VolatilePtr<T>(pub *mut T);
-impl<T> VolatilePtr<T> {
-  pub unsafe fn read(&self) -> T {
-    core::ptr::read_volatile(self.0)
-  }
-  pub unsafe fn write(&self, data: T) {
-    core::ptr::write_volatile(self.0, data);
-  }
-  pub unsafe fn offset(self, count: isize) -> Self {
-    VolatilePtr(self.0.wrapping_offset(count))
-  }
-}
-
-pub const DISPCNT: VolatilePtr<u16> = VolatilePtr(0x04000000 as *mut u16);
-pub const MODE3: u16 = 3;
-pub const BG2: u16 = 0b100_0000_0000;
-
-pub const VRAM: usize = 0x06000000;
-pub const SCREEN_WIDTH: isize = 240;
-
-pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
-  blue << 10 | green << 5 | red
-}
-
-pub unsafe fn mode3_pixel(col: isize, row: isize, color: u16) {
-  VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).write(color);
-}
-
-

Exact same program that we started with, but much easier to read.
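
Because `VolatilePtr` is only a thin wrapper around a raw pointer, its behavior can also be tried against ordinary memory on a desktop machine. This is a minimal sketch under that assumption: the tiny 8 by 4 "screen" and the `fake_pixel` function are invented for the illustration, and `std::ptr` stands in for the `core::ptr` paths used on the GBA.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);

impl<T> VolatilePtr<T> {
    pub unsafe fn read(&self) -> T {
        std::ptr::read_volatile(self.0)
    }
    pub unsafe fn write(&self, data: T) {
        std::ptr::write_volatile(self.0, data);
    }
    pub unsafe fn offset(self, count: isize) -> Self {
        VolatilePtr(self.0.wrapping_offset(count))
    }
}

/// Width of the pretend screen, standing in for SCREEN_WIDTH.
const FAKE_WIDTH: isize = 8;

/// Same shape as mode3_pixel, but drawing into a slice instead of VRAM.
pub fn fake_pixel(buffer: &mut [u16], col: isize, row: isize, color: u16) {
    let index = col + row * FAKE_WIDTH;
    // Unlike the GBA version, we can afford a bounds check here.
    assert!(col >= 0 && col < FAKE_WIDTH && row >= 0);
    assert!((index as usize) < buffer.len());
    unsafe {
        VolatilePtr(buffer.as_mut_ptr()).offset(index).write(color);
    }
}
```

Calling `fake_pixel(&mut buffer, 3, 2, 0x7C00)` on a 32-element buffer changes only element 3 + 2 * 8 = 19. The real `mode3_pixel` has no such bounds check, which is one reason it is marked unsafe.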

-

Of course, in the full gba crate that this book is a part of we have these and -other elements all labeled and sorted out for you (not identically, but -similarly). Still, for educational purposes it's often best to do it yourself at -least once.

- -
- - -
-
- - - -
- - - - - - - - - - - - - - - - - - - - - - - - diff --git a/docs/ch01/the_display_control_register.html b/docs/ch01/the_display_control_register.html deleted file mode 100644 index 717bc87..0000000 --- a/docs/ch01/the_display_control_register.html +++ /dev/null @@ -1,284 +0,0 @@ - - - - - - The Display Control Register - Rust GBA Guide - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- -
- - - - - - - - - - -
-
-

The Display Control Register

-

The display control register is our first actual IO Register. GBATEK gives it the -shorthand DISPCNT, so -you might see it under that name if you read other guides.

-

Among IO Registers, it's one of the simpler ones, but it's got enough complexity -that we can get a hint of what's to come.

-

Also it's the one that you basically always need to set at least once in every -GBA game, so it's a good starting one to go over for that reason too.

-

The display control register holds a u16 value, and is located at 0x0400_0000.

-

Many of the bits here won't mean much to you right now. That is fine. You do -NOT need to memorize them all or what they all do right away. We'll just skim -over all the parts of this register to start, and then we'll go into more detail -in later chapters when we need to come back and use more of the bits.

-

Video Modes

-

The lowest three bits (0-2) let you select from among the GBA's six video modes. -You'll notice that 3 bits allows for eight modes, but the values 6 and 7 are -prohibited.

-

Modes 0, 1, and 2 are "tiled" modes. These are actually the modes that you -should eventually learn to use as much as possible. They let the GBA's limited -video hardware do as much of the work as possible, leaving more of your CPU time -for gameplay computations. However, they're also complex enough to deserve their -own demos and chapters later on, so that's all we'll say about them for now.

-

Modes 3, 4, and 5 are "bitmap" modes. These let you write individual pixels to -locations on the screen.

-
    -
  • Mode 3 is full resolution (240w x 160h) RGB15 color. You might not be used -to RGB15, since modern computers have 24 or 32 bit colors. In RGB15, there are 5 -bits for each color channel stored within a u16 value, and the highest bit is -simply ignored.
  • -
  • Mode 4 is full resolution paletted color. Instead of being a u16 color, each -pixel value is a u8 palette index, and the display uses the -palette memory (which we'll talk about later) to look up the actual color. -Since each pixel is half sized, we can fit twice as many. This lets us have -two "pages". At any given moment only one page is active, and you can draw to -the other page without the user noticing. You set which page to show with -another bit we'll get to in a moment.
  • -
  • Mode 5 is full color, but also with pages. This means that we must have a -reduced resolution to compensate (video memory is only so big!). The screen is -effectively only 160w x 128h in this mode.
  • -
-

CGB Mode

-

Bit 3 is effectively read only. Technically it can be flipped using a BIOS call, -but when you write to the display control register normally it won't write to -this bit, so we'll call it effectively read only.

-

This bit is on if the CPU is in CGB mode.

-

Page Flipping

-

Bit 4 lets you pick which page to use. This is only relevant in video modes 4 or -5, and is just ignored otherwise. It's very easy to remember: when the bit is 0 -the 0th page is used, and when the bit is 1 the 1st page is used.

-

The second page always starts at 0x0600_A000.
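
A little arithmetic shows why only Modes 4 and 5 get a second page. The sketch below is an illustration rather than anything from the gba crate: the constant and function names are made up here, and the 96k total is the Video RAM size given in the video memory chapter.

```rust
const VRAM_BASE: usize = 0x0600_0000;
const PAGE1_BASE: usize = 0x0600_A000;
/// Distance from the start of page 0 to the start of page 1.
const PAGE_STRIDE: usize = PAGE1_BASE - VRAM_BASE;
/// Total Video RAM: 96k.
const VRAM_SIZE: usize = 96 * 1024;

/// Bytes needed to hold one full frame.
const fn frame_bytes(width: usize, height: usize, bytes_per_pixel: usize) -> usize {
    width * height * bytes_per_pixel
}

const MODE3_FRAME: usize = frame_bytes(240, 160, 2); // u16 colors, full size
const MODE4_PAGE: usize = frame_bytes(240, 160, 1); // u8 palette indexes
const MODE5_PAGE: usize = frame_bytes(160, 128, 2); // u16 colors, reduced size

/// True when a page fits below the second page's start address and a
/// second copy still fits inside Video RAM.
const fn fits_two_pages(page_bytes: usize) -> bool {
    page_bytes <= PAGE_STRIDE && PAGE_STRIDE + page_bytes <= VRAM_SIZE
}
```

A Mode 3 frame needs 76,800 bytes, which already runs past the 40,960 byte mark where the second page would start. A Mode 4 page needs 38,400 bytes and a Mode 5 page needs exactly 40,960, so both fit twice.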

-

OAM, VRAM, and Blanking

-

Bit 5 lets you access OAM during HBlank if enabled. This is cool, but it reduces -the maximum sprites per scanline, so it's not default.

-

Bit 6 lets you adjust if the GBA should treat Object Character VRAM as being 2d -(off) or 1d (on). This particular control can be kinda tricky to wrap your head -around, so we'll be sure to have some extra diagrams in the chapter that deals -with it.

-

Bit 7 forces the screen to stay in VBlank as long as it's set. This allows the -fastest use of the VRAM, Palette, and Object Attribute Memory. Obviously if you -leave this on for too long the player will notice a blank screen, but it might -be okay to use for a moment or two every once in a while.

-

Screen Layers

-

Bits 8 through 11 control if Background layers 0 through 3 should be active.

-

Bit 12 controls whether the Object layer is active.

-

Note that not all background layers are available in all video modes:

-
    -
  • Mode 0: all
  • -
  • Mode 1: 0/1/2
  • -
  • Mode 2: 2/3
  • -
  • Mode 3/4/5: 2
  • -
-

Bits 13 and 14 enable the display of Windows 0 and 1, and Bit 15 enables the object display window. We'll get into how windows work later on; they let you do some nifty graphical effects.

-

In Conclusion...

-

So what did we do to the display control register in hello1?

-

-# #![allow(unused_variables)]
-#fn main() {
-    (0x04000000 as *mut u16).write_volatile(0x0403);
-#}
-

First let's convert that to -binary, and we get -0b100_0000_0011. So, that's setting Mode 3 with background 2 enabled and -nothing else special.
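If you'd rather let the computer do the bit-picking, here's a minimal sketch that decodes a display control value using the bit positions described above. The helper names (`video_mode`, `page1_selected`, `bg_enabled`) are made up for this illustration and aren't part of any crate, and the sketch assumes the video mode sits in bits 0 through 2, just below the CGB bit.

```rust
/// Video mode number, assumed to live in bits 0-2 (just below the CGB bit).
pub const fn video_mode(dispcnt: u16) -> u16 {
  dispcnt & 0b111
}

/// True if page 1 is selected (bit 4). Only meaningful in Modes 4 and 5.
pub const fn page1_selected(dispcnt: u16) -> bool {
  (dispcnt & (1 << 4)) != 0
}

/// True if background layer `bg` (0 through 3) is enabled (bits 8-11).
pub const fn bg_enabled(dispcnt: u16, bg: u16) -> bool {
  ((dispcnt >> (8 + bg)) & 1) != 0
}
```

Feeding it `0x0403` gives mode 3 with only background 2 switched on, which matches what we worked out by hand.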

diff --git a/docs/ch01/video_memory_intro.html b/docs/ch01/video_memory_intro.html
deleted file mode 100644
index 1a75137..0000000
--- a/docs/ch01/video_memory_intro.html
+++ /dev/null
@@ -1,296 +0,0 @@

Video Memory Intro

-

The GBA's Video RAM is 96k stretching from 0x0600_0000 to 0x0601_7FFF.

-

The Video RAM can only be accessed totally freely during a Vertical Blank (aka -"VBlank", though sometimes I forget and don't capitalize it properly). At other -times, if the CPU tries to touch the same part of video memory as the display -controller is accessing then the CPU gets bumped by a cycle to avoid a clash.

-

Annoyingly, VRAM can only be properly written to in 16 and 32 bit segments (same -with PALRAM and OAM). If you try to write just an 8 bit segment, then both parts -of the 16 bit segment get the same value written to them. In other words, if you -write the byte 5 to 0x0600_0000, then both 0x0600_0000 and ALSO -0x0600_0001 will have the byte 5 in them. We have to be extra careful when -trying to set an individual byte, and we also have to be careful if we use -memcopy or memset as well, because they're byte oriented by default and -don't know to follow the special rules.

-

RGB15

-

As I said before, RGB15 stores a color within a u16 value using 5 bits for -each color channel.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub const RED:   u16 = 0b0_00000_00000_11111;
-pub const GREEN: u16 = 0b0_00000_11111_00000;
-pub const BLUE:  u16 = 0b0_11111_00000_00000;
-#}
-
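Writing colors out as binary literals gets old fast, so here's a small sketch of a packing helper. The name `rgb15` is just a placeholder for this example; it assumes the layout shown in the constants above, with red in the lowest 5 bits, then green, then blue.

```rust
/// Packs three 5-bit channels into an RGB15 value: red in the low bits,
/// then green, then blue. Channel values above 31 are masked down to 5 bits.
pub const fn rgb15(red: u16, green: u16, blue: u16) -> u16 {
  ((blue & 0x1F) << 10) | ((green & 0x1F) << 5) | (red & 0x1F)
}
```

With this, `rgb15(31, 0, 0)` produces the same bits as the `RED` constant, and likewise for green and blue.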

In Mode 3 and Mode 5 we write direct color values into VRAM, and in Mode 4 we -write palette index values, and then the color values go into the PALRAM.

-

Mode 3

-

Mode 3 is pretty easy. We have a full resolution grid of rgb15 pixels. There's -160 rows of 240 pixels each, with the base address being the top left corner. A -particular pixel uses normal "2d indexing" math:

-

-# #![allow(unused_variables)]
-#fn main() {
-let row_five_col_seven = 7 + (5 * SCREEN_WIDTH);
-#}
-

To draw a pixel, we just write a value at the address for the row and col that -we want to draw to.
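As a sketch of that math, the two hypothetical helpers below turn a column and row into a u16 index and then into a byte address. The `SCREEN_WIDTH` and `VRAM` constants are declared here only so the snippet stands alone; the function names are made up for illustration.

```rust
pub const SCREEN_WIDTH: isize = 240;
pub const VRAM: usize = 0x0600_0000;

/// Index of a Mode 3 pixel, counted in u16 pixels (not bytes) from the
/// top left corner: skip `row` full rows, then move `col` pixels right.
pub const fn mode3_index(col: isize, row: isize) -> isize {
  col + row * SCREEN_WIDTH
}

/// Byte address of a Mode 3 pixel. Each pixel is one u16, so 2 bytes.
pub const fn mode3_address(col: isize, row: isize) -> usize {
  VRAM + (mode3_index(col, row) as usize) * 2
}
```

Note that the row gets multiplied by the width and the column is added on top. Mixing those two up is an easy mistake to make.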

-

Mode 4

-

Mode 4 introduces page flipping. Instead of one giant page at 0x0600_0000, -there's Page 0 at 0x0600_0000 and then Page 1 at 0x0600_A000. The resolution -for each page is the same as above, but instead of writing u16 values, the -memory is treated as u8 indexes into PALRAM. The PALRAM starts at -0x0500_0000, and there's enough space for 256 palette entries (each a u16).

-

To set the color of a palette entry we just do a normal u16 write_volatile.

-

-# #![allow(unused_variables)]
-#fn main() {
-(0x0500_0000 as *mut u16).offset(target_index).write_volatile(new_color)
-#}
-

To draw a pixel we set the palette entry that we want the pixel to use. However, -we must remember the "minimum size" write limitation that applies to VRAM. So, -if we want to change just a single pixel at a time we must

-
    -
  1. Read the full u16 it's a part of.
  2. Clear the half of the u16 we're going to replace
  3. Write the half of the u16 we're going to replace with the new value
  4. Write that result back to the address.
-

So, the math for finding a byte offset is the same as Mode 3 (since they're both a 2d grid). The GBA is little-endian, so if the byte offset is EVEN it'll be the low bits of the u16 at half the byte offset. If the offset is ODD it'll be the high bits of the u16 at half the byte offset rounded down.

Does that make sense?

  • If we want to write pixel (0,0) the byte offset is 0, so we change the low bits of u16 offset 0. Then we want to write to (1,0), so the byte offset is 1, so we change the high bits of u16 offset 0. The pixels are next to each other, and the target bytes are next to each other, good so far.
  • If we want to write to (5,6) that'd be byte 5 + 6 * 240 = 1445, so we'd target the high bits of u16 offset floor(1445/2) = 722.
-
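Here's a rough sketch of the bit fiddling involved, kept as pure functions so it doesn't touch real VRAM. The names are invented for this example. It assumes the GBA's little-endian byte order, where the byte at the even address is the low half of the u16 and the byte at the odd address is the high half.

```rust
pub const SCREEN_WIDTH: isize = 240;

/// Which u16 (counting from the start of the page) holds the pixel.
pub const fn mode4_u16_offset(col: isize, row: isize) -> isize {
  (col + row * SCREEN_WIDTH) / 2
}

/// Returns `old` with the byte for column `col` replaced by `index`.
/// The screen width is even, so the byte offset is even exactly when the
/// column is even. Even columns use the low byte, odd columns the high byte.
pub const fn mode4_replace_byte(old: u16, col: isize, index: u8) -> u16 {
  if col % 2 == 0 {
    (old & 0xFF00) | (index as u16)
  } else {
    (old & 0x00FF) | ((index as u16) << 8)
  }
}
```

A real pixel write would read the u16 at `mode4_u16_offset`, run it through `mode4_replace_byte`, and write the result back with a volatile u16 write.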

As you can see, trying to write individual pixels in Mode 4 is mostly a bad -time. Fret not! We don't have to write individual bytes. If our data is -arranged correctly ahead of time we can just write u16 or u32 values -directly. The video hardware doesn't care, it'll get along just fine.

-

Mode 5

-

Mode 5 is also a two page mode, but instead of compressing the size of a pixel's -data to fit in two pages, we compress the resolution.

-

Mode 5 is full u16 color, but only 160w x 128h per page.
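To see why the resolution has to shrink, it helps to just do the arithmetic. This is only a back-of-the-envelope sketch; the constant and function names are made up here, and the page stride comes from Page 1 starting at 0x0600_A000.

```rust
/// Bytes between the start of page 0 and the start of page 1
/// (0x0600_A000 - 0x0600_0000).
pub const PAGE_STRIDE: usize = 0xA000;

/// Bytes needed for one page of `width` x `height` pixels.
pub const fn page_bytes(width: usize, height: usize, bytes_per_pixel: usize) -> usize {
  width * height * bytes_per_pixel
}

/// Mode 3: full resolution, u16 pixels. Too big for two pages.
pub const MODE3_BYTES: usize = page_bytes(240, 160, 2);
/// Mode 4: full resolution, u8 pixels. Fits in one page slot.
pub const MODE4_PAGE_BYTES: usize = page_bytes(240, 160, 1);
/// Mode 5: reduced resolution, u16 pixels. Exactly one page slot.
pub const MODE5_PAGE_BYTES: usize = page_bytes(160, 128, 2);
```

A Mode 5 page works out to exactly 0xA000 bytes, so the 160w x 128h size is precisely what fits before Page 1 begins.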

-

In Conclusion...

-

So what got written into VRAM in hello1?

-

-# #![allow(unused_variables)]
-#fn main() {
-    (0x06000000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
-    (0x06000000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
-    (0x06000000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
-#}
-

So at pixels (120,80), (136,80), and (120,96) we write three values. Once -again we probably need to convert them into -binary to make sense of it.

-
    -
  • 0x001F: 0b0_00000_00000_11111
  • -
  • 0x03E0: 0b0_00000_11111_00000
  • -
  • 0x7C00: 0b0_11111_00000_00000
  • -
-

Ah, of course, a red pixel, a green pixel, and a blue pixel.

diff --git a/docs/ch02/light_cycle.html b/docs/ch02/light_cycle.html
deleted file mode 100644
index e1e5fbd..0000000
--- a/docs/ch02/light_cycle.html
+++ /dev/null
@@ -1,314 +0,0 @@

light_cycle

-

Now let's make a game of "light_cycle" with our new knowledge.

-

Gameplay

-

light_cycle is pretty simple, and very obvious if you've ever seen Tron. The -player moves around the screen with a trail left behind them. They die if they -go off the screen or if they touch their own trail.

-

Operations

-

We need some better drawing operations this time around.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub unsafe fn mode3_clear_screen(color: u16) {
-  let color = color as u32;
-  let bulk_color = color << 16 | color;
-  let mut ptr = VolatilePtr(VRAM as *mut u32);
-  for _ in 0..SCREEN_HEIGHT {
-    for _ in 0..(SCREEN_WIDTH / 2) {
-      ptr.write(bulk_color);
-      ptr = ptr.offset(1);
-    }
-  }
-}
-
-pub unsafe fn mode3_draw_pixel(col: isize, row: isize, color: u16) {
-  VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).write(color);
-}
-
-pub unsafe fn mode3_read_pixel(col: isize, row: isize) -> u16 {
-  VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).read()
-}
-#}
-

The draw pixel and read pixel are both pretty obvious. What's new is the clear screen operation. It changes the u16 color into a u32 and then packs the value in twice. Then we write out u32 values the whole way through screen memory. This means we have to do half as many write operations overall, and so the screen clear is roughly twice as fast.
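The packing step is easy to check on its own. Below is a small sketch that pulls it out into a pure function and counts the writes for a full screen; `bulk_color` and `clear_write_count` are hypothetical names used only for this example.

```rust
/// Duplicates a u16 color into both halves of a u32, so one 32-bit
/// write covers two neighboring Mode 3 pixels.
pub const fn bulk_color(color: u16) -> u32 {
  let color = color as u32;
  (color << 16) | color
}

/// Number of u32 writes needed to fill a 240x160 Mode 3 screen:
/// half the width (two pixels per write) times the height.
pub const fn clear_write_count() -> usize {
  (240 / 2) * 160
}
```

That's 19,200 writes instead of the 38,400 we'd need going one u16 at a time.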

-

Now we just have to fill in the main function:

-
#[start]
-fn main(_argc: isize, _argv: *const *const u8) -> isize {
-  unsafe {
-    DISPCNT.write(MODE3 | BG2);
-  }
-
-  let mut px = SCREEN_WIDTH / 2;
-  let mut py = SCREEN_HEIGHT / 2;
-  let mut color = rgb16(31, 0, 0);
-
-  loop {
-    // read the input for this frame
-    let this_frame_keys = key_input();
-
-    // adjust game state and wait for vblank
-    px += 2 * this_frame_keys.column_direction() as isize;
-    py += 2 * this_frame_keys.row_direction() as isize;
-    wait_until_vblank();
-
-    // draw the new game and wait until the next frame starts.
-    unsafe {
-      if px < 0 || py < 0 || px == SCREEN_WIDTH || py == SCREEN_HEIGHT {
-        // out of bounds, reset the screen and position.
-        mode3_clear_screen(0);
-        color = color.rotate_left(5);
-        px = SCREEN_WIDTH / 2;
-        py = SCREEN_HEIGHT / 2;
-      } else {
-        let color_here = mode3_read_pixel(px, py);
-        if color_here != 0 {
-          // crashed into our own line, reset the screen
-          mode3_clear_screen(0);
-          color = color.rotate_left(5);
-        } else {
-          // draw the new part of the line
-          mode3_draw_pixel(px, py, color);
-          mode3_draw_pixel(px, py + 1, color);
-          mode3_draw_pixel(px + 1, py, color);
-          mode3_draw_pixel(px + 1, py + 1, color);
-        }
-      }
-    }
-    wait_until_vdraw();
-  }
-}
-
-

Oh that's a lot more than before!

-

First we set Mode 3 and Background 2, we know about that.

-

Then we're going to store the player's x and y, along with a color value for -their light cycle. Then we enter the core loop.

-

We read the keys for input, and then do as much as we can without touching video -memory. Since we're using video memory as the place to store the player's light -trail, we can't do much, we just update their position and wait for VBlank to -start. The player will be a 2x2 square, so the arrows will move you 2 pixels per -frame.

-

Once we're in VBlank we check to see what kind of drawing we're doing. If the -player has gone out of bounds, we clear the screen, rotate their color, and then -reset their position. Why rotate the color? Just because it's fun to have -different colors.

-

Next, if the player is in bounds we read the video memory for their position. If -it's not black that means we've been here before and the player has crashed into -their own line. In this case, we reset the game without moving them to a new -location.

-

Finally, if the player is in bounds and they haven't crashed, we write their -color into memory at this position.

-

Regardless of how it worked out, we hold here until vdraw starts before going to -the next loop. That's all there is to it.

-

The gba crate doesn't quite work like this

-

Once again, as with the hello1 and hello2 examples, the gba crate covers -much of this same ground as our example here, but in slightly different ways.

-

Better organization and abstractions are usually only realized once you've used -more of the whole thing you're trying to work with. If we want to have a crate -where the whole thing is well integrated with itself, then the examples would -also end up having to explain about things we haven't really touched on much -yet. It becomes a lot harder to teach.

-

So, going forward, we will continue to teach concepts and build examples that -don't directly depend on the gba crate. This allows the crate to freely grow -without all the past examples becoming a great inertia upon it.

diff --git a/docs/ch02/the_key_input_register.html b/docs/ch02/the_key_input_register.html
deleted file mode 100644
index f51ded3..0000000
--- a/docs/ch02/the_key_input_register.html
+++ /dev/null
@@ -1,398 +0,0 @@

The Key Input Register

-

The Key Input Register is our next IO register. Its shorthand name is KEYINPUT and it's a u16 at 0x4000130. The entire register is read only: you can't tell the GBA what buttons are pressed.

-

Each button is exactly one bit:

| Bit | Button |
|:---:|:-------|
| 0 | A |
| 1 | B |
| 2 | Select |
| 3 | Start |
| 4 | Right |
| 5 | Left |
| 6 | Up |
| 7 | Down |
| 8 | R |
| 9 | L |

The higher bits above are not used at all.

-

Similar to other old hardware devices, the convention here is that a button's -bit is clear when pressed, active when released. In other words, when the -user is not touching the device at all the KEYINPUT value will read -0b0000_0011_1111_1111. There's similar values for when the user is pressing as -many buttons as possible, but since the left/right and up/down keys are on an -arrow pad the value can never be 0 since you can't ever press every single key -at once.

-

When dealing with key input, the register always shows the exact key values at -any moment you read it. Obviously that's what it should do, but what it means to -you as a programmer is that you should usually gather input once at the top of a -game frame and then use that single input poll as the input values across the -whole game frame.

-

Of course, you might want to know if a user's key state changed from frame to -frame. That's fairly easy too: We just store the last frame keys as well as the -current frame keys (it's only a u16) and then we can xor the two values. -Anything that shows up in the xor result is a key that changed. If it's changed -and it's now down, that means it was pushed this frame. If it's changed and it's -now up, that means it was released this frame.

-

The other major thing you might frequently want is to know "which way" the arrow pad is pointing: Up/Down/None and Left/Right/None. Sounds like an enum to me. Except that oftentimes we'll have situations where the direction just needs to be multiplied by a speed and applied as a delta to a position. We want to support that as well as we can.

-

Key Input Code

-

Let's get down to some code. First we want to make a way to read the address as a u16 and then wrap that in our newtype which will implement methods for reading the key bits.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub const KEYINPUT: VolatilePtr<u16> = VolatilePtr(0x400_0130 as *mut u16);
-
-/// A newtype over the key input state of the GBA.
-#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
-#[repr(transparent)]
-pub struct KeyInputSetting(u16);
-
-pub fn key_input() -> KeyInputSetting {
-  unsafe { KeyInputSetting(KEYINPUT.read()) }
-}
-#}
-

Now we want a way to check if a key is being pressed, since that's normally -how we think of things as a game designer and even as a player. That is, usually -you'd say "if you press A, then X happens" instead of "if you don't press A, -then X does not happen".

-

Normally we'd pick a constant for the bit we want, & it with our value, and -then check for val != 0. Since the bit we're looking for is 0 in the "true" -state we still pick the same constant and we still do the &, but we test with -== 0. Practically the same, right? Well, since I'm asking a rhetorical -question like that you can probably already guess that it's not the same. I was -shocked to learn this too.

-

All we have to do is ask our good friend -Godbolt what's gonna happen when the code -compiles. The link there has the page set for the stable 1.30 compiler just so -that the link results stay consistent if you read this book in a year or -something. Also, we've set the target to thumbv6m-none-eabi, which is a -slightly later version of ARM than the actual GBA, but it's close enough for -just checking. Of course, in a full program small functions like these will -probably get inlined into the calling code and disappear entirely as they're -folded and refolded by the compiler, but we can just check.

-

It turns out that the !=0 test is 4 instructions and the ==0 test is 6 -instructions. Since we want to get savings where we can, and we'll probably -check the keys of an input often enough, we'll just always use a !=0 test and -then adjust how we initially read the register to compensate. By using xor with -a mask for only the 10 used bits we can flip the "low when pressed" values so -that the entire result has active bits in all positions where a key is pressed.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn key_input() -> KeyInputSetting {
-  unsafe { KeyInputSetting(KEYINPUT.read() ^ 0b0000_0011_1111_1111) }
-}
-#}
-
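To convince ourselves the flip does what we want, here's the same xor pulled out into a pure function we can poke at without any hardware. `KEY_MASK` and `pressed_bits` are names invented for this sketch.

```rust
/// Mask of the 10 bits that correspond to real buttons.
pub const KEY_MASK: u16 = 0b0000_0011_1111_1111;

/// Converts a raw KEYINPUT reading (bit clear = pressed) into the
/// friendlier form where a set bit means the button is pressed.
pub const fn pressed_bits(raw: u16) -> u16 {
  raw ^ KEY_MASK
}
```

With nothing held the raw reading is `0b0000_0011_1111_1111`, which flips to 0, and a raw reading with only bit 0 clear flips to a value with only the A bit set.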

Now we add a method for seeing if a key is pressed. In the full library there's -a more advanced version of this that's built up via macro, but for this example -we'll just name a bunch of const values and then have a method that takes a -value and says if that bit is on.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub const KEY_A: u16 = 1 << 0;
-pub const KEY_B: u16 = 1 << 1;
-pub const KEY_SELECT: u16 = 1 << 2;
-pub const KEY_START: u16 = 1 << 3;
-pub const KEY_RIGHT: u16 = 1 << 4;
-pub const KEY_LEFT: u16 = 1 << 5;
-pub const KEY_UP: u16 = 1 << 6;
-pub const KEY_DOWN: u16 = 1 << 7;
-pub const KEY_R: u16 = 1 << 8;
-pub const KEY_L: u16 = 1 << 9;
-
-impl KeyInputSetting {
-  pub fn contains(&self, key: u16) -> bool {
-    (self.0 & key) != 0
-  }
-}
-#}
-

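Because each key is a unique bit you can even check whether any one of several keys is pressed by just combining their key values with a bitwise or. Since the test is != 0, this is true if at least one of the keys is down, not only when all of them are.

-

-# #![allow(unused_variables)]
-#fn main() {
-let input_contains_a_or_l = input.contains(KEY_A | KEY_L);
-#}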
-

And we wanted to save the state of an old frame and compare it to the current -frame to see what was different:

-

-# #![allow(unused_variables)]
-#fn main() {
-  pub fn difference(&self, other: KeyInputSetting) -> KeyInputSetting {
-    KeyInputSetting(self.0 ^ other.0)
-  }
-#}
-

Anything that's "in" the difference output is a key that changed, and then if -the key reads as pressed this frame that means it was just pressed. The exact -mechanics of all the ways you might care to do something based on new key -presses is obviously quite varied, but it might be something like this:

-

-# #![allow(unused_variables)]
-#fn main() {
-let this_frame_diff = this_frame_input.difference(last_frame_input);
-
-if this_frame_diff.contains(KEY_B) && this_frame_input.contains(KEY_B) {
-  // the user just pressed B, react in some way
-}
-#}
-
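If you find yourself doing that check a lot, the two directions of change can be split out up front. This is only a sketch working on plain u16 values that have already been flipped so that a set bit means "pressed"; the function names are made up for the example.

```rust
/// Keys that are down this frame but were up last frame.
pub const fn just_pressed(this_frame: u16, last_frame: u16) -> u16 {
  (this_frame ^ last_frame) & this_frame
}

/// Keys that are up this frame but were down last frame.
pub const fn just_released(this_frame: u16, last_frame: u16) -> u16 {
  (this_frame ^ last_frame) & last_frame
}
```

The xor finds everything that changed, and the final `&` keeps only the changes that ended up pressed (or started out pressed, for releases). A key that's simply held down shows up in neither result.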

And for the arrow pad, we'll make an enum that easily casts into i32. Whenever -we're working with stuff we can try to use i32 / isize as often as possible -just because it's easier on the GBA's CPU if we stick to its native number size. -Having it be an enum lets us use match and be sure that we've covered all our -cases.

-

-# #![allow(unused_variables)]
-#fn main() {
-/// A "tribool" value helps us interpret the arrow pad.
-#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
-#[repr(i32)]
-pub enum TriBool {
-  Minus = -1,
-  // `derive(Default)` on an enum needs the default variant marked.
-  #[default]
-  Neutral = 0,
-  Plus = 1,
-}
-#}
-

Now, how do we determine which way is plus or minus? Well... I don't know. -Really. I'm not sure what the best one is because the GBA really wants the -origin at 0,0 with higher rows going down and higher cols going right. On the -other hand, all the normal math you and I learned in school is oriented with -increasing Y being upward on the page. So, at least for this demo, we're going -to go with what the GBA wants us to do and give it a try. If we don't end up -confusing ourselves then we can stick with that. Maybe we can cover it over -somehow later on.

-

-# #![allow(unused_variables)]
-#fn main() {
-  pub fn column_direction(&self) -> TriBool {
-    if self.contains(KEY_RIGHT) {
-      TriBool::Plus
-    } else if self.contains(KEY_LEFT) {
-      TriBool::Minus
-    } else {
-      TriBool::Neutral
-    }
-  }
-
-  pub fn row_direction(&self) -> TriBool {
-    if self.contains(KEY_DOWN) {
-      TriBool::Plus
-    } else if self.contains(KEY_UP) {
-      TriBool::Minus
-    } else {
-      TriBool::Neutral
-    }
-  }
-#}
-

So then in our game, every frame we can check for column_direction and -row_direction and then apply those to the player's current position to make -them move around the screen.

-

With that settled I think we're all done with user input for now. There's some -other things to eventually know about like key interrupts that you can set and -stuff, but we'll cover that later on because it's not necessary right now.

diff --git a/docs/ch02/the_vcount_register.html b/docs/ch02/the_vcount_register.html
deleted file mode 100644
index 6f74b78..0000000
--- a/docs/ch02/the_vcount_register.html
+++ /dev/null
@@ -1,268 +0,0 @@

The VCount Register

-

There's an IO register called -VCOUNT that shows -you, what else, the Vertical (row) COUNT(er). It's a u16 at address -0x0400_0006, and it's how we'll be doing our very poor quality vertical sync -code to start.

-
    -
  • What makes it poor? Well, we're just going to read from the vcount value as -often as possible every time we need to wait for a specific value to come up, -and then proceed once it hits the point we're looking for.
  • -
  • Why is this bad? Because we're making the CPU do a lot of useless work, which uses a lot more power than necessary. Even if you're not on an actual GBA you might be running inside an emulator on a phone or other handheld. You wanna try to save battery if all you're doing with that power use is waiting instead of making a game actually do something.
  • -
  • Can we do better? We can, but not yet. The better way to do things is to -use a BIOS call to put the CPU into low power mode until a VBlank interrupt -happens. However, we don't know about interrupts yet, and we don't know about -BIOS calls yet, so we'll do the basic thing for now and then upgrade later.
  • -
-

So the way that display hardware actually displays each frame is that it moves a tiny pointer left to right across each pixel row one pixel at a time. When it's within the actual screen width (240px) it's drawing out those pixels. Then it goes past the edge of the screen for 68px during a period known as the "horizontal blank" (HBlank). Then it starts on the next row and does that loop over again. This happens for the whole screen height (160px) and then once again it goes past the last row for another 68 rows into a "vertical blank" (VBlank) period.

-
    -
  • One pixel is 4 CPU cycles
  • -
  • HDraw is 240 pixels, HBlank is 68 pixels (1,232 cycles per full scanline)
  • -
  • VDraw is 160 scanlines, VBlank is 68 scanlines (280,896 cycles per full refresh)
  • -
-
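Those cycle counts fall straight out of the pixel and scanline numbers, which we can sketch as constants. The names below are made up for this example, and the visible height of 160 scanlines is taken from the screen height mentioned above.

```rust
pub const CYCLES_PER_PIXEL: u32 = 4;
pub const HDRAW_PIXELS: u32 = 240;
pub const HBLANK_PIXELS: u32 = 68;
pub const VDRAW_LINES: u32 = 160;
pub const VBLANK_LINES: u32 = 68;

/// Cycles for one full scanline, visible part plus HBlank.
pub const CYCLES_PER_SCANLINE: u32 =
  (HDRAW_PIXELS + HBLANK_PIXELS) * CYCLES_PER_PIXEL;

/// Cycles for one full refresh, visible scanlines plus VBlank.
pub const CYCLES_PER_FRAME: u32 =
  (VDRAW_LINES + VBLANK_LINES) * CYCLES_PER_SCANLINE;

/// Cycles available during the VBlank part of each frame.
pub const CYCLES_IN_VBLANK: u32 = VBLANK_LINES * CYCLES_PER_SCANLINE;
```

So out of 280,896 cycles per frame, 83,776 of them land inside VBlank, which is the budget for touching video memory without contention.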

Now you may remember some stuff from the display control register section where it was mentioned that some parts of memory are best accessed during VBlank, and also during HBlank with a setting applied. These blanking periods are what was being talked about. At other times if you attempt to access video or object memory you (the CPU) might try touching the same memory that the display device is trying to use, in which case you get bumped back a cycle so that the display can finish what it's doing. Also, if you really insist on doing video memory changes while the screen is being drawn then you might get some visual glitches. If you can, just prepare all your changes ahead of time and then assign them all quickly during the blank period.

-

So first we want a way to check the vcount value at all:

-

-# #![allow(unused_variables)]
-#fn main() {
-pub const VCOUNT: VolatilePtr<u16> = VolatilePtr(0x0400_0006 as *mut u16);
-
-pub fn vcount() -> u16 {
-  unsafe { VCOUNT.read() }
-}
-#}
-

Then we want two little helper functions to wait until VBlank and vdraw.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub const SCREEN_HEIGHT: isize = 160;
-
-pub fn wait_until_vblank() {
-  while vcount() < SCREEN_HEIGHT as u16 {}
-}
-
-pub fn wait_until_vdraw() {
-  while vcount() >= SCREEN_HEIGHT as u16 {}
-}
-#}
-

And... that's it. No special types to be made this time around, it's just a -number we read out of memory.

diff --git a/docs/ch03/gba_memory_mapping.html b/docs/ch03/gba_memory_mapping.html
deleted file mode 100644
index ce58985..0000000
--- a/docs/ch03/gba_memory_mapping.html
+++ /dev/null
@@ -1,426 +0,0 @@

GBA Memory Mapping

-

The GBA Memory Map has -several memory portions to it, each with their own little differences. Most of -the memory has pre-determined use according to the hardware, but there is also -space for games to use as a scratch pad in whatever way the game sees fit.

-

The memory ranges listed here are inclusive, so they end with a lot of F's -and E's.

-

We've talked about volatile memory before, but just as a reminder I'll say that -all of the memory we'll talk about here should be accessed using volatile with -two exceptions:

-
    -
  1. Work RAM (both internal and external) can be used normally, and if the compiler is able to totally elide some reads and writes that's okay.
  2. However, if you set aside any space in Work RAM where an interrupt will communicate with the main program then that specific location will have to keep using volatile access, since the compiler never knows when an interrupt will actually happen.
-

BIOS / System ROM

-
    -
  • 0x0 to 0x3FFF (16k)
  • -
-

This is special memory for the BIOS. It is "read-only", but even then it's only -accessible when the program counter is pointing into the BIOS region. At all -other times you get a garbage -value back when you -try to read out of the BIOS.

-

External Work RAM / EWRAM

-
    -
  • 0x2000000 to 0x203FFFF (256k)
  • -
-

This is a big pile of space, the use of which is up to each game. However, the -external work ram has only a 16-bit bus (if you read/write a 32-bit value it -silently breaks it up into two 16-bit operations) and also 2 wait cycles (extra -CPU cycles that you have to expend per 16-bit bus use).

-

It's most helpful to think of EWRAM as slower, distant memory, similar to the -"heap" in a normal application. You can take the time to go store something -within EWRAM, or to load it out of EWRAM, but if you've got several operations -to do in a row and you're worried about time you should pull that value into -local memory, work on your local copy, and then push it back out to EWRAM.

-

Internal Work RAM / IWRAM

-
    -
  • 0x3000000 to 0x3007FFF (32k)
  • -
-

This is a smaller pile of space, but it has a 32-bit bus and no wait.

-

By default, 0x3007F00 to 0x3007FFF is reserved for interrupt and BIOS use. -The rest of it is totally up to you. The user's stack space starts at -0x3007F00 and proceeds down from there. For best results you should probably -start at 0x3000000 and then go upwards. Under normal use it's unlikely that -the two memory regions will crash into each other.

-

IO Registers

-
    -
  • 0x4000000 to 0x40003FE
  • -
-

We've touched upon a few of these so far, and we'll get to more later. At the -moment it is enough to say that, as you might have guessed, all of them live in -this region. Each individual register is a u16 or u32 and they control all -sorts of things. We'll actually be talking about some more of them in this very -chapter, because that's how we'll control some of the background and object -stuff.

-

Palette RAM / PALRAM

-
    -
  • 0x5000000 to 0x50003FF (1k)
  • -
-

Palette RAM has a 16-bit bus, which isn't really a problem because it -conceptually just holds u16 values. There's no automatic wait state, but if -you try to access the same location that the display controller is accessing you -get bumped by 1 cycle. Since the display controller can use the palette ram any -number of times per scanline it's basically impossible to predict if you'll have -to do a wait or not during VDraw. During VBlank you won't have any wait of -course.

-

PALRAM is among the memory where there's weirdness if you try to write just one -byte: if you try to write just 1 byte, it writes that byte into both parts of -the larger 16-bit location. This doesn't really affect us much with PALRAM, -because palette values are all supposed to be u16 anyway.

-

The palette memory actually contains not one, but two sets of palettes. First -there's 256 entries for the background palette data (starting at 0x5000000), -and then there's 256 entries for object palette data (starting at 0x5000200).

-

The GBA also has two modes for palette access: 8-bits-per-pixel (8bpp) and -4-bits-per-pixel (4bpp).

-
    -
  • In 8bpp mode an 8-bit palette index value within a background or sprite -simply indexes directly into the 256 slots for that type of thing.
  • -
  • In 4bpp mode a 4-bit palette index value within a background or sprite -specifies an index within a particular "palbank" (16 palette entries each), -and then a separate setting outside of the graphical data determines which -palbank is to be used for that background or object (the screen entry data for -backgrounds, and the object attributes for objects).
  • -
-
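As a sketch of how those indexes turn into addresses, the helpers below compute a palette slot and its byte address. The function names are hypothetical; the base addresses are the ones given above, and each palette entry is assumed to be one u16.

```rust
pub const BG_PALETTE: usize = 0x500_0000;
pub const OBJ_PALETTE: usize = 0x500_0200;

/// Slot (0 through 255) within one 256-entry palette for a 4bpp pixel:
/// pick the palbank, then the entry within that palbank.
pub const fn slot_4bpp(palbank: usize, index: usize) -> usize {
  palbank * 16 + index
}

/// Byte address of a palette slot. Each entry is a u16, so 2 bytes.
pub const fn palette_address(base: usize, slot: usize) -> usize {
  base + slot * 2
}
```

In 8bpp mode the pixel's index is the slot directly, so only `palette_address` is needed.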

Transparency

-

When a pixel within a background or object specifies index 0 as its palette -entry it is treated as a transparent pixel. This means that in 8bpp mode there's -only 255 actual color options (0 being transparent), and in 4bpp mode there's -only 15 actual color options available within each palbank (the 0th entry of -each palbank is transparent).

-

Individual backgrounds, and individual objects, each determine if they're 4bpp or 8bpp separately, so a given overall palette slot might map to a used color in 8bpp and an unused/transparent color in 4bpp. If you're a palette wizard, you can take advantage of that.

-

Palette slot 0 of the overall background palette is used to determine the -"backdrop" color. That's the color you see if no background or object ends up -being rendered within a given pixel.

-

Since display mode 3 and display mode 5 don't use the palette, they cannot -benefit from transparency.

-

Video RAM / VRAM

-
    -
  • 0x6000000 to 0x6017FFF (96k)
  • -
-

We've used this before! VRAM has a 16-bit bus and no wait. However, the same as -with PALRAM, the "you might have to wait if the display controller is looking at -it" rule applies here.

-

Unfortunately there's not much more exact detail that can be given about VRAM. -The use of the memory depends on the video mode that you're using.

-

One general detail of note is that you can't write individual bytes to any part -of VRAM. Depending on mode and location, you'll either get your bytes doubled -into both the upper and lower parts of the 16-bit location targeted, or you -won't even affect the memory. This usually isn't a big deal, except in two -situations:

-
    -
  • In Mode 4, if you want to change just 1 pixel, you'll have to be very careful -to read the old u16, overwrite just the byte you wanted to change, and then -write that back.
  • -
  • In any display mode, avoid using memcopy to place things into VRAM. -It's written to be byte oriented, and only does 32-bit transfers under select -conditions. The rest of the time it'll copy one byte at a time and you'll get -either garbage or nothing at all.
  • -
-
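The Mode 4 read-modify-write can be sketched as a pure function, so that it can be checked off-device. The function name and parameters are assumptions for this example. In Mode 4 each pixel is one byte and the GBA is little-endian, so the even column of a pair lives in the low byte.

```rust
/// Returns the u16 to write back to VRAM so that only one Mode 4 pixel
/// changes. `old` is the u16 that was read from the location holding the
/// pixel pair, and `x` is the pixel's column (hypothetical parameters).
pub fn replace_mode4_pixel(old: u16, x: usize, palette_index: u8) -> u16 {
  if x % 2 == 0 {
    // Even column: keep the high byte, replace the low byte.
    (old & 0xFF00) | (palette_index as u16)
  } else {
    // Odd column: keep the low byte, replace the high byte.
    (old & 0x00FF) | ((palette_index as u16) << 8)
  }
}
```

On hardware you'd do a volatile u16 read, pass it through something like this, and then do a volatile u16 write of the result.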

Object Attribute Memory / OAM

-
    -
  • 0x7000000 to 0x70003FF (1k)
  • -
-

The Object Attribute Memory has a 32-bit bus and no default wait, but suffers -from the "you might have to wait if the display controller is looking at it" -rule. You cannot write individual bytes to OAM at all, but that's not really a -problem because all the fields of the data types within OAM are either i16 or -u16 anyway.

-

Object attribute memory is the wildest yet: it conceptually contains two types -of things, but they're interlaced with each other all the way through.

-

Now, GBATEK and CowByte don't quite give names to the two data types here. TONC calls them OBJ_ATTR and OBJ_AFFINE, but we'll be giving them names fitting with the Rust naming convention. Just know that if you try to talk about it with others they might not be using the same names. In Rust terms their layout would look like this:

-

-# #![allow(unused_variables)]
-#fn main() {
-#[repr(C)]
-pub struct ObjectAttributes {
-  attr0: u16,
-  attr1: u16,
-  attr2: u16,
-  filler: i16,
-}
-
-#[repr(C)]
-pub struct AffineMatrix {
-  filler0: [u16; 3],
-  pa: i16,
-  filler1: [u16; 3],
-  pb: i16,
-  filler2: [u16; 3],
-  pc: i16,
-  filler3: [u16; 3],
-  pd: i16,
-}
-#}
-

(Note: the #[repr(C)] part just means that Rust must lay out the data exactly -in the order we specify, which otherwise it is not required to do).

-

So, we've got 1024 bytes in OAM and each ObjectAttributes value is 8 bytes, so -naturally we can support up to 128 objects.

-

At the same time, we've got 1024 bytes in OAM and each AffineMatrix is 32 -bytes, so we can have 32 of them.

-

But, as I said, these things are all interlaced with each other. See how there are "filler" fields in each struct? If we imagine the OAM as being just an array of one type or the other, indexes 0/1/2/3 of the ObjectAttributes array would line up with index 0 of the AffineMatrix array. It's kinda weird, but that's just how it works. When we set up functions to read and write these values we'll have to be careful with how we do it. We probably won't want to use those representations above, at least not with the AffineMatrix type, because they're quite wasteful if you want to store just object attributes or just affine matrices.
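One way to picture the interlacing is to compute byte offsets from the start of OAM directly. This is just a sketch based on the struct layouts above, and the helper names are invented for the example.

```rust
/// Byte offset (from the start of OAM) of the `index`th ObjectAttributes.
pub const fn object_attributes_offset(index: usize) -> usize {
  index * 8
}

/// Byte offset (from the start of OAM) of one affine parameter.
/// `param` is 0 for pa, 1 for pb, 2 for pc, 3 for pd. Each parameter sits
/// after three u16 of filler, which is where the 6 comes from.
pub const fn affine_param_offset(matrix: usize, param: usize) -> usize {
  matrix * 32 + param * 8 + 6
}
```

So the `pa` of matrix 0 lands exactly on the `filler` field of object 0, `pd` of matrix 0 lands on the `filler` of object 3, and so on through all 1024 bytes.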

-

Game Pak ROM / Flash ROM

-
    -
  • 0x8000000 to 0x9FFFFFF (wait 0)
  • -
  • 0xA000000 to 0xBFFFFFF (wait 1)
  • -
  • 0xC000000 to 0xDFFFFFF (wait 2)
  • -
  • Max of 32MB
  • -
-

These portions of the memory are less fixed, because they depend on the precise -details of the game pak you've inserted into the GBA. In general, they connect -to the game pak ROM and/or Flash memory, using a 16-bit bus. The ROM is -read-only, but the Flash memory (if any) allows writes.

-

The game pak ROM is listed as being in three sections, but it's actually the -same memory being effectively mirrored into three different locations. The -mirror that you choose to access the game pak through affects which wait state -setting it uses (configured via IO register of course). Unfortunately, the -details come down more to the game pak hardware that you load your game onto -than anything else, so there's not much I can say right here. We'll eventually -talk about it more later when I'm forced to do the boring thing and just cover -all the IO registers that aren't covered anywhere else.
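Since the three regions are mirrors, an address tells you two things: which wait state setting applies, and which byte of the ROM you're actually touching. Here's a small sketch of that mapping using the address ranges listed above (the function names are hypothetical).

```rust
/// Which of the three wait state settings (0, 1, or 2) a Game Pak ROM
/// address goes through, or `None` if the address isn't in a ROM mirror.
pub fn rom_wait_state(addr: u32) -> Option<u8> {
  match addr {
    0x0800_0000..=0x09FF_FFFF => Some(0),
    0x0A00_0000..=0x0BFF_FFFF => Some(1),
    0x0C00_0000..=0x0DFF_FFFF => Some(2),
    _ => None,
  }
}

/// Offset into the ROM itself, which is the same no matter which mirror
/// the address came from. Each mirror spans 32MB (25 bits of offset).
pub fn rom_offset(addr: u32) -> Option<u32> {
  rom_wait_state(addr).map(|_| addr & 0x01FF_FFFF)
}
```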

-

One thing of note is the way that the 16-bit bus affects us: the instructions to execute are coming through the same bus as the rest of the game data, so we want them to be as compact as possible. The ARM chip in the GBA supports two different instruction sets, "thumb" and "non-thumb". The thumb mode instructions are 16-bit, so they can each be loaded one at a time, and the non-thumb instructions are 32-bit, so we're at a penalty if we execute them directly out of the game pak. However, some things will demand that we use non-thumb code, so we'll have to deal with that eventually. It's possible to switch between modes, but it's a pain to keep track of what mode you're in because there's not currently support for it in Rust itself (perhaps some day). So we'll stick with thumb code as much as we possibly can, which is why the target profile for our builds starts with thumbv4.

-

Game Pak SRAM

-
    -
  • 0xE000000 to 0xE00FFFF (64k)
  • -
-

The game pak SRAM has an 8-bit bus. Why did Pokémon always take so long to save? -Saving the whole game one byte at a time is why. The SRAM also has some amount -of wait, but as with the ROM, the details depend on your game pak hardware (and -also as with ROM, you can adjust the settings with an IO register, should you -need to).

-

One thing to note about the SRAM is that the GBA has a Direct Memory Access -(DMA) feature that can be used for bulk memory movements in some cases, but the -DMA cannot access the SRAM region. You really are stuck reading and writing -one byte at a time when you're using the SRAM.

diff --git a/docs/ch03/gba_prng.html b/docs/ch03/gba_prng.html
deleted file mode 100644
index b653867..0000000
--- a/docs/ch03/gba_prng.html
+++ /dev/null
@@ -1,1232 +0,0 @@
- GBA PRNG - Rust GBA Guide

GBA PRNG

-

You often hear of the "Random Number Generator" in video games. First of all, -usually a game doesn't have access to any source of "true randomness". On a PC -you can send out a web request to random.org which -uses atmospheric data, or even just point a camera at some lava -lamps. Even -then, the rate at which you'll want random numbers far exceeds the rate at which -those services can offer them up. So instead you'll get a pseudo-random number -generator and "seed" it with the true random data and then use that.

-

However, we don't even have that! On the GBA, we can't ask any external anything -what we should do for our initial seed. So we will not only need to come up with -a few PRNG options, but we'll also need to come up with some seed source -options. More than with other options within the book, I think this is an area -where you can tailor what you do to your specific game.

-

What is a Pseudo-random Number Generator?

-

For those of you who somehow read The Rust Book, plus possibly The Rustonomicon, -and then found this book, but somehow still don't know what a PRNG is... Well, -I don't think there are many such people. Still, we'll define it anyway I -suppose.

-
-

A PRNG is any mathematical process that takes an initial input (of some fixed -size) and then produces a series of outputs (of a possibly different size).

-
-

So, if you seed your PRNG with a 32-bit value you might get 32-bit values out or -you might get 16-bit values out, or something like that.

-

We measure the quality of a PRNG based upon:

-
    -
  1. Is the output range easy to work with? Most PRNG techniques that you'll find these days are already hip to the idea that we'll have the fastest operations with numbers that match our register width and all that, so they're usually designed around power of two inputs and power of two outputs. Still, every once in a while you might find some old page intended for compatibility with the rand() function in the C standard library that'll talk about something crazy like having 15-bit PRNG outputs. Stupid as it sounds, that's real. Avoid those. Whenever possible we want generators that give us uniformly distributed u8, u16, u32, or whatever size value we're producing. From there we can mold our random bits into whatever else we need (eg: turning a u8 into a "1d6" roll).
  2. How long does each generation cycle take? This can be tricky for us. A lot of the top quality PRNGs you'll find these days are oriented towards 64-bit machines so they do a bunch of 64-bit operations. You can do that on a 32-bit machine if you have to, and the compiler will automatically "lower" the 64-bit operation into a series of 32-bit operations. What we'd really like to pick is something that sticks to just 32-bit operations though, since those will be our best candidates for fast results. We can use Compiler Explorer and tell it to build for the thumbv6m-none-eabi target to get a basic idea of what the ASM for a generator looks like. That's not our exact target, but it's the closest target that's shipped with the standard rust distribution.
  3. What is the statistical quality of the output? This involves heavy amounts of math. Since computers are quite good at large amounts of repeated math you might wonder if there are programs for this already, and there are. Many in fact. They take a generator and then run it over and over and perform the necessary tests and report the results. I won't be explaining how to hook our generators up to those tools, they each have their own user manuals. However, if someone says that a generator "passes BigCrush" (the biggest suite in TestU01) or "fails PractRand" or anything similar it's useful to know what they're referring to. Example test suites include TestU01 and PractRand.
-
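For the "1d6" remark in point 1, a plain `% 6` on a byte is slightly biased because 256 isn't a multiple of 6. A common fix is to reject the few values at the top of the range and try again. Here's a minimal sketch of that, where `next_u8` is a stand-in for whatever uniform byte source you end up using (the name and the closure-based signature are just assumptions for this example).

```rust
/// Turns uniform bytes into a uniform roll in 1..=6 by rejection.
/// `next_u8` is any source of uniformly distributed bytes.
pub fn roll_1d6(mut next_u8: impl FnMut() -> u8) -> u8 {
  loop {
    let b = next_u8();
    // 252 = 6 * 42, so bytes 0..252 split evenly into six groups.
    // The four leftover values (252..=255) are thrown away.
    if b < 252 {
      return (b % 6) + 1;
    }
  }
}
```

The loop almost always finishes on the first try, since only 4 out of 256 bytes get rejected.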

Note that if a generator is called upon to produce enough output relative to its -state size it will basically always end up failing statistical tests. This means -that any generator with 32-bit state will always fail in any of those test sets. -The theoretical minimum state size for any generator at all to pass the -standard suites is 36 bits, but most generators need many more than that.

-

Generator Size

-

I've mostly chosen to discuss generators that are towards the smaller end of the -state size scale. In fact we'll be going over many generators that are below the -36-bit theoretical minimum to pass all those fancy statistical tests. Why so? -Well, we don't always need the highest possible quality generators.

-

"But Lokathor!", I can already hear you shouting. "I want the highest quality -randomness at all times! The game depends on it!", you cry out.

-

Well... does it? Like, really?

-

The GBA -Pokemon -games use a dead simple 32-bit LCG (we'll see it below). Then starting with -the DS they moved to also using Mersenne Twister, which also fails several -statistical tests and is one of the most predictable PRNGs around. Metroid -Fusion -has a 100% goofy PRNG system for enemies that would definitely never pass any -sort of statistics tests at all. But like, those games were still awesome. Since -we're never going to be keeping secrets safe with our PRNG, it's okay if we -trade in some quality for something else in return (we obviously don't want to -trade quality for nothing).

-

And you have to ask yourself: Where's the space used for the Metroid Fusion PRNG? Nowhere at all. They were already using everything involved for other things too, so they're paying no extra cost to have the randomization they do. How much does it cost Pokemon to throw in a 32-bit LCG? Just 4 bytes, might as well. How much does it cost to add in a Mersenne Twister? ~2,500 bytes ya say? I'm sorry what on Earth? Yeah, that sounds crazy, we're probably not doing that one.

-

k-Dimensional Equidistribution

-

So, wait, why did the Pokemon developers add in the Mersenne Twister generator? -They're smart people, surely they had a reason. Obviously we can't know for -sure, but Mersenne Twister is terrible in a lot of ways, so what's its single -best feature? Well, that gets us to a funky thing called k-dimensional -equidistribution. Basically, if you take a generator's output and chop it down -to get some value you want, with uniform generator output you can always get a -smaller ranged uniform result (though sometimes you will have to reject a result -and run the generator again). Imagine you have a u32 output from your -generator. If you want a u16 value from that you can just pick either half. If -you want a [bool; 4] from that you can just pick four bits. However you wanna -do it, as long as the final form of random thing we're getting needs a number of -bits equal to or less than the number of bits that come out of a single -generator use, we're totally fine.

-

What happens if the thing you want to make requires more bits than a single -generator's output? You obviously have to run the generator more than once and -then stick two or more outputs together, duh. Except, that doesn't always work. -What I mean is that obviously you can always put two u8 side by side to get a -u16, but if you start with a uniform u8 generator and then you run it twice -and stick the results together you don't always get a uniform u16 generator. -Imagine a byte generator that just does state+=1 and then outputs the state. -It's not good by almost any standard, but it does give uniform output. Then we -run it twice in a row, put the two bytes together, and suddenly a whole ton of -potential u16 values can never be generated. That's what k-dimensional -equidistribution is all about. Every uniform output generator is 1-dimensional -equidistributed, but if you need to combine outputs and still have uniform -results then you need a higher k value. So why does Pokemon have Mersenne -Twister in it? Because it's got 623-dimensional equidistribution. That means -when you're combining PRNG calls for all those little IVs and Pokemon Abilities -and other things you're sure to have every potential pokemon actually be a -pokemon that the game can generate. Do you need that for most situations? -Absolutely not. Do you need it for pokemon? No, not even then, but a lot of the -hot new PRNGs have come out just within the past 10 years, so we can't fault -them too much for it.
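The `state+=1` byte generator is easy to try out on a desktop. This sketch uses `std` for the `HashSet`, so it's a PC-side experiment and not GBA code, and the type and function names are invented for the example. It glues pairs of bytes into `u16` values and counts how many distinct values ever show up.

```rust
use std::collections::HashSet;

/// The toy byte generator from the text: add 1, output the state.
pub struct Counter8(pub u8);

impl Counter8 {
  pub fn next_u8(&mut self) -> u8 {
    self.0 = self.0.wrapping_add(1);
    self.0
  }
  /// Two byte outputs stuck side by side.
  pub fn next_u16(&mut self) -> u16 {
    let high = self.next_u8() as u16;
    let low = self.next_u8() as u16;
    (high << 8) | low
  }
}

/// Counts the distinct u16 values seen over 65,536 draws.
pub fn distinct_u16_outputs(seed: u8) -> usize {
  let mut counter = Counter8(seed);
  let mut seen = HashSet::new();
  for _ in 0..65_536 {
    seen.insert(counter.next_u16());
  }
  seen.len()
}
```

Every byte value comes out equally often, but from any one seed only 128 of the 65,536 possible `u16` values ever appear, because the low byte is always the high byte plus one.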

-

TLDR: 1-dimensional equidistribution just means "a normal uniform generator", -and higher k values mean "you can actually combine up to k output chains and -maintain uniformity". Generators that aren't uniform to begin with effectively -have a k value of 0.

-

Other Tricks

-

Finally, some generators have other features that aren't strictly quantifiable. Two tricks of note are "jump ahead" and "multiple streams":

-
    -
  • Jump ahead lets you advance the generator's state by some enormous number of -outputs in a relatively small number of operations.
  • -
  • Multi-stream generators have more than one output sequence, and then some part -of their total state space picks a "stream" rather than being part of the -actual seed, with each possible stream causing the potential output sequence -to be in a different order.
  • -
-

They're normally used as a way to do multi-threaded stuff (we don't care about that on GBA), but another interesting potential is to take one world seed and then split off a generator for each "type" of thing you'd use PRNG for (combat, world events, etc). This can become quite useful, where you can do things like procedurally generate a world region, and then when the player leaves the region you only need to store a single generator seed and a small amount of "delta" information for what the player changed there that you want to save, and then when they come back you can regenerate the region without having stored much at all. This is the basis for how old games with limited memory like Starflight did their whole thing (800 planets to explore on just two 5.25" floppy disks!).
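Here's a rough sketch of the "one world seed, one generator per region" idea, using a streaming LCG step with the Pokemon multiplier that's covered later in this chapter. The `RegionRng` type and the way the region id is turned into a stream are assumptions made up for this example. I'm not claiming anything about how statistically independent the streams are, only that regeneration is repeatable.

```rust
/// One step of a streaming 32-bit LCG.
fn lcg_step(state: u32, stream: u32) -> u32 {
  state.wrapping_mul(0x41C6_4E6D).wrapping_add(stream)
}

/// A per-region generator derived from a single world seed (hypothetical).
pub struct RegionRng {
  state: u32,
  stream: u32,
}

impl RegionRng {
  pub fn new(world_seed: u32, region_id: u32) -> Self {
    // The additive has to be odd, so shift the id up and set the low bit.
    Self { state: world_seed, stream: (region_id << 1) | 1 }
  }
  pub fn next_u16(&mut self) -> u16 {
    self.state = lcg_step(self.state, self.stream);
    // Keep the high bits, they're the better ones in an LCG.
    (self.state >> 16) as u16
  }
}
```

To "save" a region you'd store just the world seed, the region id, and whatever the player changed. Building `RegionRng::new` with the same two numbers later replays the same sequence.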

-

How To Seed

-

Oh I bet you thought we could somehow get through a section without learning -about yet another IO register. Ha, wishful thinking.

-

There's actually not much involved. Starting at 0x400_0100 there's an array of -registers that go "data", "control", "data", "control", etc. TONC and GBATEK use -different names here, and we'll go by the TONC names because they're much -clearer:

-

-# #![allow(unused_variables)]
-#fn main() {
-pub const TM0D: VolatilePtr<u16> = VolatilePtr(0x400_0100 as *mut u16);
-pub const TM0CNT: VolatilePtr<u16> = VolatilePtr(0x400_0102 as *mut u16);
-
-pub const TM1D: VolatilePtr<u16> = VolatilePtr(0x400_0104 as *mut u16);
-pub const TM1CNT: VolatilePtr<u16> = VolatilePtr(0x400_0106 as *mut u16);
-
-pub const TM2D: VolatilePtr<u16> = VolatilePtr(0x400_0108 as *mut u16);
-pub const TM2CNT: VolatilePtr<u16> = VolatilePtr(0x400_010A as *mut u16);
-
-pub const TM3D: VolatilePtr<u16> = VolatilePtr(0x400_010C as *mut u16);
-pub const TM3CNT: VolatilePtr<u16> = VolatilePtr(0x400_010E as *mut u16);
-#}
-

Basically there's 4 timers, numbered 0 to 3. Each one has a Data register and a Control register. They're all u16 and you can definitely read from all of them normally, but then it gets a little weird. You can also write to the Control portions normally, but when you write to the Data portion of a timer, that sets the value that the timer resets to, without changing its current Data value. So if TM0D is paused on some value other than 5 and you write 5 to it, when you read it back you won't get a 5. When the next timer run starts it'll begin counting at 5 instead of whatever value it currently reads as.
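The read/write asymmetry is easier to see in a tiny software model. To be clear, this is only a toy that mimics the behavior described above so it can be run on a PC. The `TimerModel` type and its methods are invented for illustration and don't touch any real registers.

```rust
/// A toy model of one timer's Data register behavior (not real hardware).
#[derive(Debug, Default)]
pub struct TimerModel {
  counter: u16, // what a read of the Data register gives you
  reload: u16,  // what a write to the Data register actually sets
  enabled: bool,
}

impl TimerModel {
  /// Writing Data only changes the reload value.
  pub fn write_data(&mut self, value: u16) {
    self.reload = value;
  }
  /// Reading Data gives the current count.
  pub fn read_data(&self) -> u16 {
    self.counter
  }
  /// Going from disabled to enabled jumps the count to the reload value.
  pub fn set_enabled(&mut self, on: bool) {
    if on && !self.enabled {
      self.counter = self.reload;
    }
    self.enabled = on;
  }
  /// One timer tick. On overflow the count restarts at the reload value.
  pub fn tick(&mut self) {
    if !self.enabled {
      return;
    }
    let (next, overflowed) = self.counter.overflowing_add(1);
    self.counter = if overflowed { self.reload } else { next };
  }
}
```

If the model is paused at 9 and you call `write_data(5)`, then `read_data()` still gives 9. You only see the 5 after the timer is enabled again.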

-

The Data registers are just a u16 number, no special bits to know about.

-

The Control registers are also pretty simple compared to most IO registers:

-
    -
  • 2 bits for the Frequency: 1, 64, 256, 1024. While active, the timer's -value will tick up once every frequency CPU cycles. On the GBA, 1 CPU cycle -is about 59.59ns (2^(-24) seconds). One display controller cycle is 280,896 -CPU cycles.
  • -
  • 1 bit for Cascade Mode: If this is on the timer doesn't count on its own, -instead it ticks up whenever the preceding timer overflows its counter (eg: -if t0 overflows, t1 will tick up if it's in cascade mode). You still have to -also enable this timer for it to do that (below). This naturally doesn't have -an effect when used with timer 0.
  • -
  • 3 bits that do nothing
  • -
  • 1 bit for Interrupt: Whenever this timer overflows it will signal an -interrupt. We still haven't gotten into interrupts yet (since you have to hand -write some ASM for that, it's annoying), but when we cover them this is how -you do them with timers.
  • -
  • 1 bit to Enable the timer. When you disable a timer it retains the current -value, but when you enable it again the value jumps to whatever its currently -assigned default value is.
  • -
-
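To get a feel for the frequency settings, here's the arithmetic as a small sketch. The constant and function names are made up for the example, and the numbers come straight from the list above (2^24 cycles per second, 280,896 cycles per display frame).

```rust
pub const CYCLES_PER_SECOND: u32 = 1 << 24;
pub const CYCLES_PER_FRAME: u32 = 280_896;

/// CPU cycles from a timer starting at `reload` until it overflows,
/// when it ticks once every `frequency` cycles (1, 64, 256, or 1024).
pub const fn cycles_until_overflow(frequency: u32, reload: u16) -> u32 {
  (0x1_0000 - reload as u32) * frequency
}
```

At frequency 1 a timer starting from 0 overflows after 65,536 cycles (1/256 of a second), and at frequency 1024 it takes exactly 4 seconds. A frame is exactly 4,389 ticks at frequency 64, so a reload value of 65,536 - 4,389 = 61,147 overflows once per frame.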

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
-#[repr(transparent)]
-pub struct TimerControl(u16);
-
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub enum TimerFrequency {
-  One = 0,
-  SixFour = 1,
-  TwoFiveSix = 2,
-  OneZeroTwoFour = 3,
-}
-
-impl TimerControl {
-  pub fn frequency(self) -> TimerFrequency {
-    match self.0 & 0b11 {
-      0 => TimerFrequency::One,
-      1 => TimerFrequency::SixFour,
-      2 => TimerFrequency::TwoFiveSix,
-      3 => TimerFrequency::OneZeroTwoFour,
-      _ => unreachable!(),
-    }
-  }
-  pub fn cascade_mode(self) -> bool {
-    self.0 & 0b100 > 0
-  }
-  pub fn interrupt(self) -> bool {
-    self.0 & 0b100_0000 > 0
-  }
-  pub fn enabled(self) -> bool {
-    self.0 & 0b1000_0000 > 0
-  }
-  //
-  pub fn set_frequency(&mut self, frequency: TimerFrequency) {
-    self.0 &= !0b11;
-    self.0 |= frequency as u16;
-  }
-  pub fn set_cascade_mode(&mut self, bit: bool) {
-    if bit {
-      self.0 |= 0b100;
-    } else {
-      self.0 &= !0b100;
-    }
-  }
-  pub fn set_interrupt(&mut self, bit: bool) {
-    if bit {
-      self.0 |= 0b100_0000;
-    } else {
-      self.0 &= !0b100_0000;
-    }
-  }
-  pub fn set_enabled(&mut self, bit: bool) {
-    if bit {
-      self.0 |= 0b1000_0000;
-    } else {
-      self.0 &= !0b1000_0000;
-    }
-  }
-}
-#}
-

A Timer Based Seed

-

Okay so how do we turn some timers into a PRNG seed? Well, usually our seed is a u32. So we'll take two timers, string them together with that cascade deal, and then set them off. Then we wait until the user presses any key. We probably do this as our first thing at startup, but we might show the title and like a "press any key to continue" message, or something.

-

-# #![allow(unused_variables)]
-#fn main() {
-/// Mucks with the settings of Timers 0 and 1.
-unsafe fn u32_from_user_wait() -> u32 {
-  let mut t = TimerControl::default();
-  t.set_enabled(true);
-  t.set_cascade_mode(true);
-  TM1CNT.write(t.0);
-  t.set_cascade_mode(false);
-  TM0CNT.write(t.0);
-  while key_input().0 == 0 {}
-  t.set_enabled(false);
-  TM0CNT.write(t.0);
-  TM1CNT.write(t.0);
-  let low = TM0D.read() as u32;
-  let high = TM1D.read() as u32;
-  // Each timer is 16 bits wide, so the high half moves up by 16.
-  (high << 16) | low
-}
-#}
-

Various Generators

-

SM64 (16-bit state, 16-bit output, non-uniform, bonkers)

-

Our first PRNG to mention isn't one that's at all good, but it sure might be -cute to use. It's the PRNG that Super Mario 64 had (video explanation, -long).

-

With a PRNG this simple the output of one call is also the seed to the next -call, so we don't need to make a struct for it or anything. You're also assumed -to just seed with a plain 0 value at startup. The generator has a painfully -small period, and you're assumed to be looping through the state space -constantly while the RNG goes.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn sm64(mut input: u16) -> u16 {
-  if input == 0x560A {
-    input = 0;
-  }
-  let mut s0 = input << 8;
-  s0 ^= input;
-  input = s0.rotate_left(8);
-  // Widen before shifting so bit 7 carries into bit 8, like C integer promotion does.
-  s0 = (((s0 as u8) as u16) << 1) ^ input;
-  let s1 = (s0 >> 1) ^ 0xFF80;
-  if (s0 & 1) == 0 {
-    if s1 == 0xAA55 {
-      input = 0;
-    } else {
-      input = s1 ^ 0x1FF4;
-    }
-  } else {
-    input = s1 ^ 0x8180;
-  }
-  input
-}
-#}
-

Compiler Explorer

-

If you watch the video explanation about this generator you'll note that the first if checking for 0x560A prevents you from being locked into a 2-step cycle, but it's only important if you want to feed bad seeds to the generator. A bad seed is unhelpfully defined as "any value that the generator can't output". The second if that checks for 0xAA55 doesn't seem to be important at all from a mathematical perspective. It cuts the generator's period shorter by an arbitrary amount for no known reason. It's left in there only for authenticity.

-

LCG32 (32-bit state, 32-bit output, uniform)

-

The Linear Congruential Generator is a well known PRNG family. You pick a multiplier and an additive and you're done. Right? Well, not exactly, because (as the Wikipedia article explains) the values that you pick can easily make your LCG better or worse all on its own. You want a good multiplier, and you want your additive to be odd. In our example here we've got the values that Bulbapedia says were used in the actual GBA Pokemon games, though Bulbapedia also lists values for a few other games as well.

-

I don't actually know if any of the constants used in the official games are -particularly good from a statistical viewpoint, though with only 32 bits an LCG -isn't gonna be passing any of the major statistical tests anyway (you need way -more bits in your LCG for that to happen). In my mind the main reason to use a -plain LCG like this is just for the fun of using the same PRNG that an official -Pokemon game did.

-

You should not use this as your default generator if you care about quality.

-

It is very fast though... if you want to set everything else on fire for -speed. If you do, please at least remember that the highest bits are the best -ones, so if you're after less than 32 bits you should shift the high ones down -and keep those, or if you want to turn it into a bool cast to i32 and then -check if it's negative, etc.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn lcg32(seed: u32) -> u32 {
-  seed.wrapping_mul(0x41C6_4E6D).wrapping_add(0x6073)
-}
-#}
-

Compiler Explorer
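The "highest bits are the best ones" advice can be sketched like this. The `lcg32` function is repeated so the block stands alone, and the two helper names are made up for the example. The reason the low bits are bad is simple to see: with an odd multiplier and an odd additive, the lowest bit of the state just flips back and forth on every single step.

```rust
pub fn lcg32(seed: u32) -> u32 {
  seed.wrapping_mul(0x41C6_4E6D).wrapping_add(0x6073)
}

/// Advances the seed and returns a bool taken from the highest bit.
pub fn lcg32_bool(seed: &mut u32) -> bool {
  *seed = lcg32(*seed);
  (*seed as i32) < 0
}

/// Advances the seed and returns the top 8 bits.
pub fn lcg32_u8(seed: &mut u32) -> u8 {
  *seed = lcg32(*seed);
  (*seed >> 24) as u8
}
```

If you used `seed & 1` for your coin flips you'd get heads, tails, heads, tails, forever.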

-

Multi-stream Generators

-

Note that you don't have to add a compile time constant, you could add a runtime value instead. Doing so allows the generator to be "multi-stream", with each different additive value being its own unique output stream. This is true of LCGs as well as all the PCGs below (since they're LCG based). The examples here just use a fixed stream for simplicity and to save space, but if you want streams you can add that in for only a small amount of extra space used:

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn lcg_streaming(seed: u32, stream: u32) -> u32 {
-  seed.wrapping_mul(0x41C6_4E6D).wrapping_add(stream)
-}
-#}
-

With a streaming LCG you should pass the same stream value every single time. If -you don't, then your generator will jump between streams in some crazy way and -you lose your nice uniformity properties.

-

There is the possibility of intentionally changing the stream value exactly when -the seed lands on a pre-determined value (after the multiply and add). This -basically makes the stream selection value's bit size (minus one bit, because -it must be odd) count into the LCG's state bit size for calculating the overall -period of the generator. So an LCG32 with a 32-bit stream selection would have a -period of 2^32 * 2^31 = 2^63.

-

-# #![allow(unused_variables)]
-#fn main() {
-let next_seed = lcg_streaming(seed, stream);
-// It's cheapest to test for 0, so we pick 0
-if next_seed == 0 {
-  stream = stream.wrapping_add(2)
-}
-#}
-

However, this isn't a particularly effective way to extend the generator's -period, and we'll see a much better extension technique below.

-

PCG16 XSH-RS (32-bit state, 16-bit output, uniform)

-

The Permuted Congruential -Generator family -is the next step in LCG technology. We start with LCG output, which is good but -not great, and then we apply one of several possible permutations to bump up the -quality. There's basically a bunch of permutation components that are each -defined in terms of the bit width that you're working with.

-

The "default" variant of PCG, PCG32, has 64 bits of state and 32 bits of output, -and it uses the "XSH-RR" permutation. Here we'll put together a 32 bit version -with 16-bit output, and using the "XSH-RS" permutation (but we'll show the other -one too for comparison).

-

Of course, since PCG is based on a LCG, we have to start with a good LCG base. As I said above, a better or worse set of LCG constants can make your generator better or worse. The Wikipedia example for PCG has a good 64-bit constant, but not a 32-bit constant. So we gotta ask an expert about what a good 32-bit constant would be. I'm definitely not the best at reading math papers, but it seems that the general idea is that we want m % 8 == 5 and is_odd(a) to both hold for the multiplier m and the additive a that we pick. There are three suggested LCG multipliers in a chart on page 10. A chart that's quite hard to understand. Truth be told I asked several folks that are good at math papers and even they couldn't make sense of the chart. Eventually timutable read the whole paper in depth and concluded the same as I did: that we probably want to pick the 32310901 option.

-

For an additive value, we can pick any odd value, so we might as well pick -something small so that we can do an immediate add. Immediate add? That sounds -new. An immediate instruction is when one side of an operation is small enough -that you can encode the value directly into the space that'd normally be for the -register you want to use. It basically means one less load you have to do, if -you're working with small enough numbers. To see what I mean compare loading -the add value and immediate add -value. It's something you might have seen -frequently in x86 or x86_64 ASM output, but because a thumb instruction is -only 16 bits total, we can only get immediate instructions if the target value -is 8 bits or less, so we haven't used them too much ourselves yet.

-

I guess we'll pick 5, because I happen to personally like the number.

-

-# #![allow(unused_variables)]
-#fn main() {
-// Demo only. The "default" PCG permutation, for use when rotate is cheaper
-pub fn pcg16_xsh_rr(seed: &mut u32) -> u16 {
-  *seed = seed.wrapping_mul(32310901).wrapping_add(5);
-  const INPUT_SIZE: u32 = 32;
-  const OUTPUT_SIZE: u32 = 16;
-  const ROTATE_BITS: u32 = 4;
-  let mut out32 = *seed;
-  let rot = out32 >> (INPUT_SIZE - ROTATE_BITS);
-  out32 ^= out32 >> ((OUTPUT_SIZE + ROTATE_BITS) / 2);
-  ((out32 >> (OUTPUT_SIZE - ROTATE_BITS)) as u16).rotate_right(rot)
-}
-
-// This has slightly worse statistics but runs much better on the GBA
-pub fn pcg16_xsh_rs(seed: &mut u32) -> u16 {
-  *seed = seed.wrapping_mul(32310901).wrapping_add(5);
-  const INPUT_SIZE: u32 = 32;
-  const OUTPUT_SIZE: u32 = 16;
-  const SHIFT_BITS: u32 = 2;
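-  // A base of 19 would leave at most 13 significant bits in the output.
-  // With 11, the 16-bit window (plus a shift of 0..=3) tops out at bit 29,
-  // so it never overlaps the two bits that select the shift.
-  const NEXT_MOST_BITS: u32 = 11;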
-  let mut out32 = *seed;
-  let shift = out32 >> (INPUT_SIZE - SHIFT_BITS);
-  out32 ^= out32 >> ((OUTPUT_SIZE + SHIFT_BITS) / 2);
-  (out32 >> (NEXT_MOST_BITS + shift)) as u16
-}
-#}
-

Compiler Explorer

-

PCG32 RXS-M-XS (32-bit state, 32-bit output, uniform)

-

Having the output be smaller than the input is great because you can keep just the best quality bits that the LCG stage puts out, and you basically get 1 point of dimensional equidistribution for each bit you discard as the size goes down (so 32->16 gives 16). However, if your output size has to be the same as your input size, the PCG family is still up to the task.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn pcg32_rxs_m_xs(seed: &mut u32) -> u32 {
-  *seed = seed.wrapping_mul(32310901).wrapping_add(5);
-  let mut out32 = *seed;
-  let rxs = out32 >> 28;
-  out32 ^= out32 >> (4 + rxs);
-  const PURE_MAGIC: u32 = 277803737;
-  out32 = out32.wrapping_mul(PURE_MAGIC);
-  out32 ^ (out32 >> 22)
-}
-#}
-

Compiler Explorer

-

This permutation is the slowest but gives the strongest statistical benefits. If -you're going to be keeping 100% of the output bits you want the added strength -obviously. However, the period isn't actually any longer, so each output will be -given only once within the full period (1-dimensional equidistribution).

-

PCG Extension Array

-

As a general improvement to any PCG you can hook on an "extension array" to give -yourself a longer period. It's all described in the PCG -Paper, but here's the bullet points:

-
    -
  • In addition to your generator's state (and possible stream) you keep an array -of "extension" values. The array type is the same as your output type, and -the array count must be a power of two value that's less than the maximum -value of your state size.
  • -
  • When you run the generator, use the lowest bits to select from your -extension array according to the array's power of two. Eg: if the size is 2 -then use the single lowest bit, if it's 4 then use the lowest 2 bits, etc.
  • -
  • Every time you run the generator, XOR the output with the selected value from -the array.
  • -
  • Every time the generator state lands on 0, cycle the array. We want to be -careful with what we mean here by "cycle". We want the entire pattern of -possible array bits to occur eventually. However, we obviously can't do -arbitrary adds for as many bits as we like, so we'll have to "carry the 1" -between the portions of the array by hand.
  • -
-

Here's an example using an 8 slot array and pcg16_xsh_rs:

-

-# #![allow(unused_variables)]
-#fn main() {
-// uses pcg16_xsh_rs from above
-
-pub struct PCG16Ext8 {
-  state: u32,
-  ext: [u16; 8],
-}
-
-impl PCG16Ext8 {
-  pub fn next_u16(&mut self) -> u16 {
-    // PCG as normal.
-    let mut out = pcg16_xsh_rs(&mut self.state);
-    // XOR with a selected extension array value
-    // The low 3 bits of the state pick one of the 8 slots.
-    out ^= self.ext[(self.state & 0b111) as usize];
-    // if state == 0 we cycle the array with a series of overflowing adds
-    if self.state == 0 {
-      let mut carry = true;
-      let mut index = 0;
-      while carry && index < self.ext.len() {
-        let (add_output, next_carry) = self.ext[index].overflowing_add(1);
-        self.ext[index] = add_output;
-        carry = next_carry;
-        index += 1;
-      }
-    }
-    out
-  }
-}
-#}
-

Compiler Explorer

-

The period gained from using an extension array is quite impressive. For a b-bit -generator giving r-bit outputs, and k array slots, the period goes from 2^b to -2^(k*r+b). So our 2^32 period generator has been extended to 2^160.
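The period formula is small enough to write down as a helper, mostly as a sanity check on the numbers. The function name is invented for this example.

```rust
/// log2 of the period for a b-bit generator with r-bit outputs
/// and k extension array slots: the period is 2^(k*r + b).
pub const fn extended_period_log2(b: u32, r: u32, k: u32) -> u32 {
  k * r + b
}
```

For the example above that's `extended_period_log2(32, 16, 8)`, which gives 160. That also happens to be the total number of state bits we're storing (a 4 byte state plus a 16 byte array is 20 bytes), so every bit we keep around is going towards the period.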

-

Of course, we might care to seed the array itself so that it's not all 0 bits all the way through, but that's not strictly necessary. All 0s is a legitimate part of the extension cycle, so we have to pass through it at some point.

-

Xoshiro128** (128-bit state, 32-bit output, non-uniform)

-

The Xoshiro128** generator is an advancement of the Xorshift family. It was specifically requested, and I'm not aware of Xorshift specifically being used in any of my favorite games, so instead of going over Xorshift and then leading up to this, we'll just jump straight to this. Take care not to confuse this generator with the very similarly named Xoroshiro128** generator, which is the 64 bit variant. Note the extra "ro" hiding in the 64-bit version's name near the start.

-

Anyway, weird names aside, it's fairly zippy. The biggest downside is that you can't have a seed state that's all 0s, and as a result 0 will be produced one less time than all other outputs within a full cycle, making it non-uniform by just a little bit. You also can't do a simple stream selection like with the LCG based generators; instead it has a fixed jump function that advances a seed as if you'd done 2^64 normal generator advancements.

-

Note that Xoshiro256** is known to fail statistical tests, so the 128 version is unlikely to pass them, though I admit that I didn't check myself.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn xoshiro128_starstar(seed: &mut [u32; 4]) -> u32 {
-  let output = seed[0].wrapping_mul(5).rotate_left(7).wrapping_mul(9);
-  let t = seed[1] << 9;
-
-  seed[2] ^= seed[0];
-  seed[3] ^= seed[1];
-  seed[1] ^= seed[2];
-  seed[0] ^= seed[3];
-
-  seed[2] ^= t;
-
-  seed[3] = seed[3].rotate_left(11);
-
-  output
-}
-
-pub fn xoshiro128_starstar_jump(seed: &mut [u32; 4]) {
-  const JUMP: [u32; 4] = [0x8764000b, 0xf542d2d3, 0x6fa035c3, 0x77f2db5b];
-  let mut s0 = 0;
-  let mut s1 = 0;
-  let mut s2 = 0;
-  let mut s3 = 0;
-  for j in JUMP.iter() {
-    for b in 0 .. 32 {
-        if *j & (1 << b) > 0 {
-            s0 ^= seed[0];
-            s1 ^= seed[1];
-            s2 ^= seed[2];
-            s3 ^= seed[3];
-        }
-        xoshiro128_starstar(seed);
-    }
-  }
-  seed[0] = s0;
-  seed[1] = s1;
-  seed[2] = s2;
-  seed[3] = s3;
-}
-#}
-

Compiler Explorer
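The all-zero restriction mentioned above is easy to see for yourself: every step of the state update is an XOR, a shift, or a rotate, so an all-zero state maps right back onto itself and the output is stuck at 0. Here's a minimal sketch that repeats the generator from above so it stands alone. The `xoshiro128_seed_or` helper is a hypothetical name of my own, not part of any reference implementation.

```rust
/// Same update as `xoshiro128_starstar` above, repeated so this sketch stands alone.
pub fn xoshiro128_starstar(seed: &mut [u32; 4]) -> u32 {
  let output = seed[0].wrapping_mul(5).rotate_left(7).wrapping_mul(9);
  let t = seed[1] << 9;

  seed[2] ^= seed[0];
  seed[3] ^= seed[1];
  seed[1] ^= seed[2];
  seed[0] ^= seed[3];

  seed[2] ^= t;
  seed[3] = seed[3].rotate_left(11);

  output
}

/// Hypothetical helper: hands back `seed` unless it is all zeros, in which
/// case it hands back `fallback` (which the caller must make non-zero).
pub fn xoshiro128_seed_or(seed: [u32; 4], fallback: [u32; 4]) -> [u32; 4] {
  if seed == [0; 4] {
    fallback
  } else {
    seed
  }
}
```

If you feed user-provided or clock-derived seeds into the generator, running them through a guard like this first is probably the cheapest way to stay out of the fixed point.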

-

jsf32 (128-bit state, 32-bit output, non-uniform)

-

This is Bob Jenkins's Small/Fast PRNG (a small noncryptographic PRNG). It's a little faster than Xoshiro128** (no multiplication involved), and can pass any statistical test that's been thrown at it.

-

Interestingly the generator's period is not fixed based on the generator overall. It's actually set by the exact internal generator state. There are even six possible internal generator states where the generator becomes a fixed point. Because of this, we should use the verified seeding method provided. Using the provided seeding, the minimum period is expected to be 2^94, the average is about 2^126, and no seed given to the generator is likely to overlap with another seed's output for at least 2^64 uses.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub struct JSF32 {
-  a: u32,
-  b: u32,
-  c: u32,
-  d: u32,
-}
-
-impl JSF32 {
-  pub fn new(seed: u32) -> Self {
-    let mut output = JSF32 {
-      a: 0xf1ea5eed,
-      b: seed,
-      c: seed,
-      d: seed
-    };
-    for _ in 0 .. 20 {
-      output.next();
-    }
-    output
-  }
-
-  pub fn next(&mut self) -> u32 {
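-    // wrapping ops: overflow is expected here, and plain `+`/`-` would panic in debug builds
-    let e = self.a.wrapping_sub(self.b.rotate_left(27));
-    self.a = self.b ^ self.c.rotate_left(17);
-    self.b = self.c.wrapping_add(self.d);
-    self.c = self.d.wrapping_add(e);
-    self.d = e.wrapping_add(self.a);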
-    self.d
-  }
-}
-#}
-

Compiler Explorer

-

Here it's presented with (27,17), but you can also use any of the following if you want alternative generator flavors that use this same core technique:

-
    -
  • (9,16), (9,24), (10,16), (10,24), (11,16), (11,24), (25,8), (25,16), (26,8), (26,16), (26,17), or (27,16).
  • -
-

Note that these alternate flavors haven't had as much testing as the (27,17) version, though they are likely to be just as good.
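If you do want to play with the alternate flavors, one way to avoid copy-pasting the generator is to make the two rotation amounts compile-time parameters. This is only a sketch under my own assumptions: the `Jsf32Flavor` type and its aliases are hypothetical names, and it uses const generics, which need a reasonably recent compiler.

```rust
/// jsf32 with the two rotation amounts as const parameters.
/// (Hypothetical wrapper type; the core step is the same as JSF32 above.)
pub struct Jsf32Flavor<const R1: u32, const R2: u32> {
  a: u32,
  b: u32,
  c: u32,
  d: u32,
}

impl<const R1: u32, const R2: u32> Jsf32Flavor<R1, R2> {
  pub fn new(seed: u32) -> Self {
    let mut output = Self { a: 0xf1ea5eed, b: seed, c: seed, d: seed };
    // same 20 warm-up rounds as the seeding shown above
    for _ in 0..20 {
      output.next();
    }
    output
  }

  pub fn next(&mut self) -> u32 {
    let e = self.a.wrapping_sub(self.b.rotate_left(R1));
    self.a = self.b ^ self.c.rotate_left(R2);
    self.b = self.c.wrapping_add(self.d);
    self.c = self.d.wrapping_add(e);
    self.d = e.wrapping_add(self.a);
    self.d
  }
}

/// The flavor presented above.
pub type Jsf32Standard = Jsf32Flavor<27, 17>;
/// One of the alternates from the list, picked arbitrarily.
pub type Jsf32Alt = Jsf32Flavor<9, 16>;
```

Since the rotation amounts are constants, each alias should compile down to the same code you'd get from writing that flavor out by hand.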

-

Other Generators?

-
    -
  • Mersenne Twister: Gosh, 2.5k of state is just way too much for me to ever want to use this thing. If you'd really like to use it, there is a crate that already implements it. Small catch, they use a ton of stuff from std that they could be importing from core, so you'll have to fork it and patch it yourself to get it working on the GBA. They also stupidly depend on an old version of rand, so you'll have to cut out that nonsense.
  • -
-

Placing a Value In Range

-

I said earlier that you can always take a uniform output and then throw out some bits, and possibly the whole result, to reduce it down into a smaller range. How exactly does one do that? Well it turns out that it's very tricky to get right, and we could be losing as much as 60% of our execution time if we don't do it carefully.

-

The best possible case is if you can cleanly take a specific number of bits out of your result without even doing any branching. The rest can be discarded or kept for another step as you choose. I know that I keep referencing Pokemon, but it's a very good example for the use of randomization. Each pokemon has, among many values, a thing called an "IV" for each of 6 stats. The IVs range from 0 to 31, which is total nonsense to anyone not familiar with decimal/binary conversions, but to us programmers that's clearly a 5 bit range. Rather than making math that's better for people using decimal (such as a 1-20 range or something like that) they went with what's easiest for the computer.
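To make the "just take the bits" case concrete, here's a small sketch that carves six 5-bit values out of a single `u32`. I'm not claiming this is how any actual game lays out its data; the function name `six_ivs_from_u32` is made up for illustration.

```rust
/// Splits the low 30 bits of one `u32` into six 5-bit values (each 0..=31).
/// The top 2 bits are simply left unused.
pub fn six_ivs_from_u32(bits: u32) -> [u8; 6] {
  let mut ivs = [0u8; 6];
  for (i, iv) in ivs.iter_mut().enumerate() {
    // shift the i-th group of 5 bits down to the bottom, then mask it off
    *iv = ((bits >> (5 * i)) & 0b1_1111) as u8;
  }
  ivs
}
```

No branches and no rejection: one generator call gives all six stats, and every value in 0..=31 is exactly as likely as any other as long as the generator's output is uniform.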

-

The next best case is if you can have a designated range that you want to generate within that's known at compile time. This at least gives us a chance to write some bit of extremely specialized code that can take random bits and get them into range. Hopefully your range can be "close enough" to a binary range that you can get things into place. Example: if you want a "1d6" result then you can generate a u16, look at just 3 bits (0..8), and if they're in the range you're after you're good. If not you can discard those and look at the next 3 bits. We started with 16 of them, so you get five chances before you have to run the generator again entirely.
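The 1d6 idea can be sketched roughly like this. The function name `d6_from_u16` and the choice to return an `Option` (so the caller decides when to run the generator again) are my own assumptions, not something fixed by the technique.

```rust
/// Looks at up to five 3-bit chunks of `bits`, lowest chunk first.
/// Returns `Some(1..=6)` for the first chunk that lands in range, or `None`
/// if all five chunks were 6 or 7 and the caller needs fresh random bits.
pub fn d6_from_u16(mut bits: u16) -> Option<u8> {
  for _ in 0..5 {
    let chunk = (bits & 0b111) as u8; // a value in 0..8
    if chunk < 6 {
      return Some(chunk + 1);
    }
    bits >>= 3; // throw those 3 bits away and try the next 3
  }
  None
}
```

Each chunk is rejected 2 times out of 8, so all five chunks failing happens only about once in a thousand calls, and there's no division anywhere.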

-

The goal here is to avoid having to do one of the worst things possible in computing: divmod. It's terribly expensive, even on a modern computer it's about 10x as expensive as any other arithmetic, and on a GBA it's even worse for us. We have to call into the BIOS to have it do a software division. Calling into the BIOS at all is about a 60 cycle overhead (for comparison, a normal function call is more like 30 cycles of overhead), plus the time it takes to do the math itself. Remember earlier how we were happy to have a savings of 5 instructions here or there? Compared to this, all our previous efforts are basically useless if we can't evade having to do a divmod. You can do quite a bit of if checking and potential additional generator calls before it exceeds the cost of having to do even a single divmod.

-

Calling The BIOS

-

How do we do the actual divmod when we're forced to? Easy: inline assembly of course (There's also an ARM-oriented blog post about it that I found most helpful). The GBA has many BIOS Functions, each of which has a designated number. We use the swi op (short for "SoftWare Interrupt") combined with the BIOS function number that we want performed. Our code halts, some setup happens (hence that 60 cycles of overhead I mentioned), the BIOS does its thing, and then eventually control returns to us.

-

The precise details of what the BIOS call does depend on the function number that we call. We'd even have to potentially mark it as volatile asm if there's no clear outputs, otherwise the compiler would "helpfully" eliminate it for us during optimization. In our case there are clear outputs. The numerator goes into register 0, and the denominator goes into register 1, the divmod happens, and then the division output is left in register 0 and the modulus output is left in register 1. I keep calling it "divmod" because div and modulus are two sides of the same coin. There's no way to do one of them faster by not doing the other or anything like that, so we'll first define it as a unified function that returns a tuple:

-

-# #![allow(unused_variables)]
-#![feature(asm)]
-#fn main() {
-// put the above at the top of any program and/or library that uses inline asm
-
-pub fn div_modulus(numerator: i32, denominator: i32) -> (i32, i32) {
-  assert!(denominator != 0);
-  {
-    let div_out: i32;
-    let mod_out: i32;
-    unsafe {
-      asm!(/* assembly template */ "swi 0x06"
-          :/* output operands */ "={r0}"(div_out), "={r1}"(mod_out)
-          :/* input operands */ "{r0}"(numerator), "{r1}"(denominator)
-          :/* clobbers */ "r3"
-          :/* options */
-    );
-    }
-    (div_out, mod_out)
-  }
-}
-#}
-

And next, since most of the time we really do want just the div or modulus without having to explicitly throw out the other half, we also define intermediary functions to unpack the correct values.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn div(numerator: i32, denominator: i32) -> i32 {
-  div_modulus(numerator, denominator).0
-}
-
-pub fn modulus(numerator: i32, denominator: i32) -> i32 {
-  div_modulus(numerator, denominator).1
-}
-#}
-

We can generally trust the compiler to inline single line functions correctly even without an #[inline] directive when it's not going cross-crate or when LTO is on. I'd point you to some exact output from the Compiler Explorer, but at the time of writing their nightly compiler is broken, and you can only use inline asm with a nightly compiler. Unfortunate. Hopefully they'll fix it soon and I can come back to this section with some links.

-

Finally Those Random Ranges We Mentioned

-

Of course, now that we can do divmod if we need to, let's get back to random numbers in ranges that aren't exact powers of two.

-

yada yada yada, if you just use x % n to place x into the range 0..n then you'll turn an unbiased value into a biased value (or you'll turn a biased value into an arbitrarily more biased value). You should never do this, etc etc.
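If you'd rather see the bias than take it on faith, it's small enough to count by hand at the `u8` scale. This sketch (the function name `modulo_histogram_u8` is just for illustration) runs every possible `u8` through `x % n` and tallies where each one lands.

```rust
/// Counts how many of the 256 possible `u8` inputs land on each output of `x % n`.
/// `n` must be non-zero, otherwise the `%` panics.
pub fn modulo_histogram_u8(n: u8) -> [u32; 256] {
  let mut counts = [0u32; 256];
  for x in 0..=u8::MAX {
    counts[(x % n) as usize] += 1;
  }
  counts
}
```

With `n = 6` the outputs 0 through 3 each come up 43 times while 4 and 5 only come up 42 times, because 256 isn't a multiple of 6. The same leftover shows up at any input width whenever the range doesn't evenly divide the number of possible inputs.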

-

So what's a good way to get unbiased outputs? We're going to be adapting some CPP code from that blog post that I first hinted at way up above. It's specifically all about the various ways you can go about getting unbiased random results for various bounds. There's actually many different methods offered, and for specific situations there's sometimes different winners for speed. The best overall performer looks like this:

-
uint32_t bounded_rand(rng_t& rng, uint32_t range) {
-    uint32_t x = rng();
-    uint64_t m = uint64_t(x) * uint64_t(range);
-    uint32_t l = uint32_t(m);
-    if (l < range) {
-        uint32_t t = -range;
-        if (t >= range) {
-            t -= range;
-            if (t >= range) 
-                t %= range;
-        }
-        while (l < t) {
-            x = rng();
-            m = uint64_t(x) * uint64_t(range);
-            l = uint32_t(m);
-        }
-    }
-    return m >> 32;
-}
-
-

And, wow, I sure don't know what a lot of that means (well, I do, but let's pretend I don't for dramatic effect, don't tell anyone). Let's try to pick it apart some.

-

First, all the uint32_t and uint64_t are C nonsense names for what we just call u32 and u64. You probably guessed that on your own.

-

Next, rng_t& rng is more properly written as rng: &rng_t. Though, here there's a catch: as you can see we're calling rng within the function, so in Rust we'd need to declare it as rng: &mut rng_t, because C++ doesn't track mutability the same as we do (barbaric, I know).

-

Finally, what's rng_t actually defined as? Well, I sure don't know, but in our context it's taking nothing and then spitting out a u32. We'll also presume that it's a different u32 each time (not a huge leap in this context). To us Rust programmers that means we'd want something like impl FnMut() -> u32.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn bounded_rand(rng: &mut impl FnMut() -> u32, range: u32) -> u32 {
-  let mut x: u32 = rng();
-  let mut m: u64 = x as u64 * range as u64;
-  let mut l: u32 = m as u32;
-  if l < range {
-    let mut t: u32 = range.wrapping_neg();
-    if t >= range {
-      t -= range;
-      if t >= range {
-        t = modulus(t, range);
-      }
-    }
-    while l < t {
-      x = rng();
-      m = x as u64 * range as u64;
-      l = m as u32;
-    }
-  }
-  (m >> 32) as u32
-}
-#}
-

So, now we can read it. Can we compile it? No, actually. Turns out we can't. Remember how our modulus function is (i32, i32) -> i32? Here we're doing (u32, u32) -> u32. You can't just cast, modulus, and cast back. You'll get totally wrong results most of the time because of sign-bit stuff. Since it's fairly probable that range fits in a positive i32, its negation must necessarily be a negative value, which triggers exactly the bad situation where casting around gives us the wrong results.
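Here's a tiny sketch of that sign-bit problem. To keep it runnable anywhere it uses Rust's built-in `%` as a stand-in for our BIOS-backed `modulus` (an assumption on my part: the point is only what the casts do to the inputs), and both function names are made up for the comparison.

```rust
/// What we actually want: `(2^32 - range) % range`, done as unsigned math.
/// `range` must be non-zero.
pub fn threshold_unsigned(range: u32) -> u32 {
  range.wrapping_neg() % range
}

/// The tempting version: cast to `i32`, take the remainder, cast back.
/// For small ranges `t as i32` is just `-range`, so the remainder is 0.
pub fn threshold_via_i32(range: u32) -> u32 {
  let t = range.wrapping_neg();
  ((t as i32) % (range as i32)) as u32
}
```

For a range of 10 the unsigned version gives 6 (because 2^32 ends in a 6), while the cast version sees -10 % 10 and reports 0. A threshold of 0 means nothing ever gets rejected, which quietly puts the bias right back in.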

-

Well, that's not the worst thing in the world either, since we also didn't really wanna be doing those 64-bit multiplies. Let's try again with everything scaled down one stage:

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn bounded_rand16(rng: &mut impl FnMut() -> u16, range: u16) -> u16 {
-  let mut x: u16 = rng();
-  let mut m: u32 = x as u32 * range as u32;
-  let mut l: u16 = m as u16;
-  if l < range {
-    let mut t: u16 = range.wrapping_neg();
-    if t >= range {
-      t -= range;
-      if t >= range {
-        t = modulus(t as i32, range as i32) as u16;
-      }
-    }
-    while l < t {
-      x = rng();
-      m = x as u32 * range as u32;
-      l = m as u16;
-    }
-  }
-  (m >> 16) as u16
-}
-#}
-

Okay, so the code compiles, and it plays nicely with the known limits of the various number types involved. We know that if we cast a u16 up into i32 it's assured to fit properly and also be positive, and the output is assured to be smaller than the input so it'll fit when we cast it back down to u16. What's even happening though? Well, this is a variation on Lemire's method. One of the biggest attempts at a speedup here is that when you have

-

-# #![allow(unused_variables)]
-#fn main() {
-a %= b;
-#}
-

You can translate that into

-

-# #![allow(unused_variables)]
-#fn main() {
-if a >= b {
-  a -= b;
-  if a >= b {
-    a %= b;
-  }
-}
-#}
-

Now... if we're being real with ourselves, let's just think about this for a moment. How often will this help us? I genuinely don't know. But I do know how to find out: we write a program to just enumerate all possible cases and run the code. You can't always do this, but there's not many possible u16 values. The output is this:

-
skip_all:32767
-sub_worked:10923
-had_to_modulus:21846
-Some skips:
-32769
-32770
-32771
-32772
-32773
-Some subs:
-21846
-21847
-21848
-21849
-21850
-Some mods:
-0
-1
-2
-3
-4
-
-

So, about half the time, we're able to skip all our work, and about a sixth of the time we're able to solve it with just the subtract, with the other third of the time we have to do the mod. However, what I personally care about the most is smaller ranges, and we can see that we'll have to do the mod if our target range size is in 0..21846, and just the subtract if our target range size is in 21846..32769, and we can only skip all work if our range size is 32769 and above. So that's not cool.
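For reference, a minimal sketch of that kind of enumeration looks like this. It's a reconstruction, not necessarily the exact program that produced the output above, and the function name is made up. It walks every possible `u16` range and records which of the three branches the threshold calculation would take, without actually doing the modulus.

```rust
/// Returns `(skip_all, sub_worked, had_to_modulus)` counted over every `u16` range.
pub fn classify_all_u16_ranges() -> (u32, u32, u32) {
  let mut skip_all = 0u32;
  let mut sub_worked = 0u32;
  let mut had_to_modulus = 0u32;
  for range in 0..=u16::MAX {
    let mut t = range.wrapping_neg();
    if t >= range {
      t -= range; // can't underflow, we just checked t >= range
      if t >= range {
        had_to_modulus += 1; // the real code would do `t %= range` here
      } else {
        sub_worked += 1;
      }
    } else {
      skip_all += 1;
    }
  }
  (skip_all, sub_worked, had_to_modulus)
}
```

The three counts come out to 32767, 10923, and 21846, matching the totals listed in the output.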

-

But what is cool is that we're doing the modulus only once, and the rest of the time we've just got the cheap operations. Sounds like we can maybe try to cache that work and reuse a range of some particular size. We can also get that going pretty easily.

-

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub struct RandRangeU16 {
-  range: u16,
-  threshold: u16,
-}
-
-impl RandRangeU16 {
-  pub fn new(range: u16) -> Self {
-    let mut threshold = range.wrapping_neg();
-    if threshold >= range {
-      threshold -= range;
-      if threshold >= range {
-        threshold = modulus(threshold as i32, range as i32) as u16;
-      }
-    }
-    RandRangeU16 { range, threshold }
-  }
-
-  pub fn roll_random(&self, rng: &mut impl FnMut() -> u16) -> u16 {
-    let mut x: u16 = rng();
-    let mut m: u32 = x as u32 * self.range as u32;
-    let mut l: u16 = m as u16;
-    if l < self.range {
-      while l < self.threshold {
-        x = rng();
-        m = x as u32 * self.range as u32;
-        l = m as u16;
-      }
-    }
-    (m >> 16) as u16
-  }
-}
-#}
-
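Here's a rough sketch of what using a cached range might look like. To keep it self-contained it repeats a condensed `RandRangeU16` with plain `%` standing in for the BIOS-backed `modulus` (that swap is my assumption so it runs off the GBA), and the `roll_3d6` helper is a hypothetical example caller.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RandRangeU16 {
  range: u16,
  threshold: u16,
}

impl RandRangeU16 {
  pub fn new(range: u16) -> Self {
    assert!(range != 0);
    // 2^16 mod range, computed once. Plain `%` stands in for the BIOS call.
    let threshold = range.wrapping_neg() % range;
    RandRangeU16 { range, threshold }
  }

  pub fn roll_random(&self, rng: &mut impl FnMut() -> u16) -> u16 {
    loop {
      let m: u32 = rng() as u32 * self.range as u32;
      // keep the result unless its low half falls under the threshold
      if (m as u16) >= self.threshold {
        return (m >> 16) as u16;
      }
    }
  }
}

/// Hypothetical caller: rolls three six-sided dice and sums them (3..=18).
pub fn roll_3d6(rng: &mut impl FnMut() -> u16) -> u16 {
  let d6 = RandRangeU16::new(6);
  let mut total = 0;
  for _ in 0..3 {
    total += d6.roll_random(rng) + 1; // shift 0..6 up to 1..=6
  }
  total
}
```

In a real game you'd probably build the `RandRangeU16` once and keep it around (it's `Copy` and only 4 bytes) rather than rebuilding it inside the helper, since avoiding the repeated modulus is the whole point of caching.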

What if you really want to use ranges bigger than u16? Well, that's possible, but we'd want a whole new technique. Preferably one that didn't do divmod at all, to avoid any nastiness with sign bit nonsense. Thankfully there is one such method listed in the blog post, "Bitmask with Rejection (Unbiased)":

-
uint32_t bounded_rand(rng_t& rng, uint32_t range) {
-    uint32_t mask = ~uint32_t(0);
-    --range;
-    mask >>= __builtin_clz(range|1);
-    uint32_t x;
-    do {
-        x = rng() & mask;
-    } while (x > range);
-    return x;
-}
-
-

And in Rust

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn bounded_rand32(rng: &mut impl FnMut() -> u32, mut range: u32) -> u32 {
-  let mut mask: u32 = !0;
-  range = range.wrapping_sub(1); // wraps like the C++ `--range` when range is 0
-  mask >>= (range | 1).leading_zeros();
-  let mut x = rng() & mask;
-  while x > range {
-    x = rng() & mask;
-  }
-  x
-}
-#}
-

Wow, that's so much less code. What the heck? Less code is supposed to be the faster version, why is this rated slower? Basically, because of how the math works out on how often you have to run the PRNG again and stuff, Lemire's method is usually better with smaller ranges and the masking method usually works better with larger ranges. If your target range fits in a u8, probably use Lemire's. If it's bigger than u8, or if you need to do it just once and can't benefit from the cached modulus, you might want to start moving toward the masking version at some point in there. Obviously if your target range is more than a u16 then you have to use the masking method. The fact that they're each oriented towards different size generator outputs only makes things more complicated.

-

Life just be that way, I guess.

-

Summary Table

-

That was a whole lot. Let's put them in a table:

| Generator | Bytes | Output | Period | k-Dim |
|:-|:-|:-|:-|:-|
| sm64 | 2 | u16 | 65,114 | 0 |
| lcg32 | 4 | u16 | 2^32 | 1 |
| pcg16_xsh_rs | 4 | u16 | 2^32 | 1 |
| pcg32_rxs_m_xs | 4 | u32 | 2^32 | 1 |
| PCG16Ext8 | 20 | u16 | 2^160 | 8 |
| xoshiro128** | 16 | u32 | 2^128-1 | 0 |
| jsf32 | 16 | u32 | ~2^126 | 0 |
diff --git a/docs/ch03/memory_game.html b/docs/ch03/memory_game.html
deleted file mode 100644
index 6797e77..0000000
--- a/docs/ch03/memory_game.html
+++ /dev/null
@@ -1,478 +0,0 @@

Making A Memory Game

-

For this example to show off our new skills we'll make a "memory" game. The idea is that there's some face down cards and you pick one, it flips, you pick a second, if they match they both go away, if they don't match they both turn back face down. The player keeps going until all the cards are gone, then we'll deal the cards again.

-

There are many steps to do to get such a simple seeming game going. In fact I stumbled a bit myself when trying to get things set up and going despite having written and explained all the parts so far. Accordingly, we'll take each part very slowly, and review things as we build up our game.

-

We'll start back with a nearly blank file, calling it memory_game.rs:

-
#![feature(start)]
-#![no_std]
-
-#[panic_handler]
-fn panic(_info: &core::panic::PanicInfo) -> ! {
-  loop {}
-}
-
-#[start]
-fn main(_argc: isize, _argv: *const *const u8) -> isize {
-  loop {
-    // TODO the whole thing
-  }
-}
-
-

Displaying A Background

-

First let's try to get a background going. We'll display a simple checker pattern just so that we know that we did something.

-

Remember, backgrounds have the following essential components:

-
    -
  • Background Palette
  • -
  • Background Tiles
  • -
  • Screenblock
  • -
  • IO Registers
  • -
-

Background Palette

-

To write to the background palette memory we'll want to name a VolatilePtr for it. We'll probably also want to be able to cast between different types either right away or later in this program, so we'll add a method for that.

-

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-#[repr(transparent)]
-pub struct VolatilePtr<T>(pub *mut T);
-impl<T> VolatilePtr<T> {
-  pub unsafe fn read(&self) -> T {
-    core::ptr::read_volatile(self.0)
-  }
-  pub unsafe fn write(&self, data: T) {
-    core::ptr::write_volatile(self.0, data);
-  }
-  pub fn offset(self, count: isize) -> Self {
-    VolatilePtr(self.0.wrapping_offset(count))
-  }
-  pub fn cast<Z>(self) -> VolatilePtr<Z> {
-    VolatilePtr(self.0 as *mut Z)
-  }
-}
-#}
-

Now we give ourselves an easy way to write a color into a palbank slot.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub const BACKGROUND_PALETTE: VolatilePtr<u16> = VolatilePtr(0x500_0000 as *mut u16);
-
-pub fn set_bg_palette_4bpp(palbank: usize, slot: usize, color: u16) {
-  assert!(palbank < 16);
-  assert!(slot > 0 && slot < 16);
-  unsafe {
-    BACKGROUND_PALETTE
-      .cast::<[u16; 16]>()
-      .offset(palbank as isize)
-      .cast::<u16>()
-      .offset(slot as isize)
-      .write(color);
-  }
-}
-#}
-

And of course we need to bring back in our ability to build color values, as well as a few named colors to start us off:

-

-# #![allow(unused_variables)]
-#fn main() {
-pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
-  blue << 10 | green << 5 | red
-}
-
-pub const WHITE: u16 = rgb16(31, 31, 31);
-pub const LIGHT_GRAY: u16 = rgb16(25, 25, 25);
-pub const DARK_GRAY: u16 = rgb16(15, 15, 15);
-#}
-

Which finally allows us to set our palette colors in main:

-
fn main(_argc: isize, _argv: *const *const u8) -> isize {
-  set_bg_palette_4bpp(0, 1, WHITE);
-  set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
-  set_bg_palette_4bpp(0, 3, DARK_GRAY);
-
-

Background Tiles

-

So we'll want some light gray tiles and some dark gray tiles. We could use a single tile and then swap it between palbanks to do the color selection, but for now we'll just use two different tiles, since we've got tons of tile space to spare.

-

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Debug, Clone, Copy, Default)]
-#[repr(transparent)]
-pub struct Tile4bpp {
-  pub data: [u32; 8],
-}
-
-pub const ALL_TWOS: Tile4bpp = Tile4bpp {
-  data: [
-    0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222,
-  ],
-};
-
-pub const ALL_THREES: Tile4bpp = Tile4bpp {
-  data: [
-    0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333,
-  ],
-};
-#}
-

And then we have to have a way to put the tiles into video memory:

-

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Clone, Copy)]
-#[repr(transparent)]
-pub struct Charblock4bpp {
-  pub data: [Tile4bpp; 512],
-}
-
-pub const VRAM: VolatilePtr<Charblock4bpp> = VolatilePtr(0x0600_0000 as *mut Charblock4bpp);
-
-pub fn set_bg_tile_4bpp(charblock: usize, index: usize, tile: Tile4bpp) {
-  assert!(charblock < 4);
-  assert!(index < 512);
-  unsafe { VRAM.offset(charblock as isize).cast::<Tile4bpp>().offset(index as isize).write(tile) }
-}
-#}
-

And finally, we can call that within main:

-
fn main(_argc: isize, _argv: *const *const u8) -> isize {
-  // bg palette
-  set_bg_palette_4bpp(0, 1, WHITE);
-  set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
-  set_bg_palette_4bpp(0, 3, DARK_GRAY);
-  // bg tiles
-  set_bg_tile_4bpp(0, 0, ALL_TWOS);
-  set_bg_tile_4bpp(0, 1, ALL_THREES);
-
-

Setup A Screenblock

-

Screenblocks are a little weird because they take the same space as the charblocks (8 screenblocks per charblock). The GBA will let you mix and match and it's up to you to keep it all straight. We're using tiles at the base of charblock 0, so we'll place our screenblock at the base of charblock 1.
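The overlap is easier to keep straight with the arithmetic written out. This is just a sketch of the address math using plain `usize` values; the constant and function names are made up for illustration and aren't used elsewhere in this example.

```rust
pub const VRAM_BASE: usize = 0x0600_0000;
/// A charblock is 16kb.
pub const CHARBLOCK_BYTES: usize = 16 * 1024;
/// A screenblock is 32x32 entries of 2 bytes each, so 2kb.
pub const SCREENBLOCK_BYTES: usize = 32 * 32 * 2;

/// Address of the start of charblock `cb`.
pub const fn charblock_address(cb: usize) -> usize {
  VRAM_BASE + cb * CHARBLOCK_BYTES
}

/// Address of the start of screenblock `sb`.
pub const fn screenblock_address(sb: usize) -> usize {
  VRAM_BASE + sb * SCREENBLOCK_BYTES
}
```

Since 16kb / 2kb is 8, screenblock 8 starts at exactly the same address as charblock 1, which is why "the base of charblock 1" means screenblock slot 8.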

-

First, we have to be able to make one single screenblock entry at a time:

-

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Debug, Clone, Copy, Default)]
-#[repr(transparent)]
-pub struct RegularScreenblockEntry(u16);
-
-impl RegularScreenblockEntry {
-  pub const SCREENBLOCK_ENTRY_TILE_ID_MASK: u16 = 0b11_1111_1111;
-  pub fn from_tile_id(id: u16) -> Self {
-    RegularScreenblockEntry(id & Self::SCREENBLOCK_ENTRY_TILE_ID_MASK)
-  }
-}
-#}
-

And then with 32x32 of these things we'll have a whole screenblock. Now, we probably won't actually make values of the screenblock type itself, but we at least need it to have the type declared with the correct size so that we can move our pointers around by the right amount.

-

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Clone, Copy)]
-#[repr(transparent)]
-pub struct RegularScreenblock {
-  pub data: [RegularScreenblockEntry; 32 * 32],
-}
-#}
-

Alright, so, as I said those things are kinda big, we don't really want to be building them up on the stack if we can avoid it, so we'll write one straight into memory at the correct location.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn checker_screenblock(slot: usize, a_entry: RegularScreenblockEntry, b_entry: RegularScreenblockEntry) {
-  let mut p = VRAM.cast::<RegularScreenblock>().offset(slot as isize).cast::<RegularScreenblockEntry>();
-  let mut checker = true;
-  for _row in 0..32 {
-    for _col in 0..32 {
-      unsafe { p.write(if checker { a_entry } else { b_entry }) };
-      p = p.offset(1);
-      checker = !checker;
-    }
-    checker = !checker;
-  }
-}
-#}
-

And then we add this into main

-

-# #![allow(unused_variables)]
-#fn main() {
-  // screenblock
-  let light_entry = RegularScreenblockEntry::from_tile_id(0);
-  let dark_entry = RegularScreenblockEntry::from_tile_id(1);
-  checker_screenblock(8, light_entry, dark_entry);
-#}
-

Background IO Registers

-

Our most important step is of course the IO register step. There are four different background layers, but each of them has the same format for their control register. For the moment, all that we care about is being able to set the "screen base block" value.

-

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Clone, Copy, Default, PartialEq, Eq)]
-#[repr(transparent)]
-pub struct BackgroundControlSetting(u16);
-
-impl BackgroundControlSetting {
-  pub fn from_base_block(sbb: u16) -> Self {
-    BackgroundControlSetting(sbb << 8)
-  }
-}
-
-pub const BG0CNT: VolatilePtr<BackgroundControlSetting> = VolatilePtr(0x400_0008 as *mut BackgroundControlSetting);
-#}
-

And... that's all it takes for us to be able to add a line into main

-

-# #![allow(unused_variables)]
-#fn main() {
-  // bg0 control
-  unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) };
-#}
-

Set The Display Control Register

-

We're finally ready to set the display control register and get things going.

-

We've slightly glossed over it so far, but when the GBA is first booted most everything within the address space will be all zeroed. However, the display control register has the "Force VBlank" bit enabled by the BIOS, giving you a moment to put the memory in place that you'll need for the first frame.

-

So, now that we have got all of our memory set, we'll overwrite the initial display control register value with what we'll call "just enable bg0".

-

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Clone, Copy, Default, PartialEq, Eq)]
-#[repr(transparent)]
-pub struct DisplayControlSetting(u16);
-
-impl DisplayControlSetting {
-  pub const JUST_ENABLE_BG0: DisplayControlSetting = DisplayControlSetting(1 << 8);
-}
-
-pub const DISPCNT: VolatilePtr<DisplayControlSetting> = VolatilePtr(0x0400_0000 as *mut DisplayControlSetting);
-#}
-

And so finally we have a complete main

-
#[start]
-fn main(_argc: isize, _argv: *const *const u8) -> isize {
-  // bg palette
-  set_bg_palette_4bpp(0, 1, WHITE);
-  set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
-  set_bg_palette_4bpp(0, 3, DARK_GRAY);
-  // bg tiles
-  set_bg_tile_4bpp(0, 0, ALL_TWOS);
-  set_bg_tile_4bpp(0, 1, ALL_THREES);
-  // screenblock
-  let light_entry = RegularScreenblockEntry::from_tile_id(0);
-  let dark_entry = RegularScreenblockEntry::from_tile_id(1);
-  checker_screenblock(8, light_entry, dark_entry);
-  // bg0 control
-  unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) };
-  // Display Control
-  unsafe { DISPCNT.write(DisplayControlSetting::JUST_ENABLE_BG0) };
-  loop {
-    // TODO the whole thing
-  }
-}
-
-

And It works, Marty! It works!

-

screenshot_checkers

diff --git a/docs/ch03/obj_memory_2d1d.jpg b/docs/ch03/obj_memory_2d1d.jpg
deleted file mode 100644
index 4cec80c..0000000
Binary files a/docs/ch03/obj_memory_2d1d.jpg and /dev/null differ
diff --git a/docs/ch03/regular_backgrounds.html b/docs/ch03/regular_backgrounds.html
deleted file mode 100644
index 82bb6f1..0000000
--- a/docs/ch03/regular_backgrounds.html
+++ /dev/null
@@ -1,484 +0,0 @@

Regular Backgrounds

-

So, backgrounds, they're cool. Why do we call the ones here "regular" backgrounds? Because there's also "affine" backgrounds. However, affine math stuff adds a complication, so for now we'll just work with regular backgrounds. The non-affine backgrounds are sometimes called "text mode" backgrounds by other guides.

-

To get your background image working you generally need to perform all of the following steps, though I suppose the exact ordering is up to you.

-

Tiled Video Modes

-

When you want regular tiled display, you must use video mode 0 or 1.

-
    -
  • Mode 0 allows for using all four BG layers (0 through 3) as regular backgrounds.
  • Mode 1 allows for using BG0 and BG1 as regular backgrounds, BG2 as an affine background, and BG3 not at all.
  • Mode 2 allows for BG2 and BG3 to be used as affine backgrounds, while BG0 and BG1 cannot be used at all.
  • -
-

We will not cover affine backgrounds in this chapter, so we will naturally be using video mode 0.

-

Also, note that you have to enable each background layer that you want to use within the display control register.
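As a rough sketch of what that looks like in bits: the video mode sits in the lowest 3 bits of the display control value, and the enable flags for BG0 through BG3 sit at bits 8 through 11. That layout is taken from the usual GBA hardware documentation rather than from anything defined in this chapter, and the `display_control` function is a hypothetical helper, so treat it as an assumption to double check.

```rust
/// Builds a display control value from a video mode and four BG enable flags.
/// Assumed layout: bits 0-2 = video mode, bits 8-11 = enable BG0..BG3.
pub fn display_control(mode: u16, bg_enabled: [bool; 4]) -> u16 {
  let mut bits = mode & 0b111;
  for (i, enabled) in bg_enabled.iter().enumerate() {
    if *enabled {
      bits |= 1 << (8 + i);
    }
  }
  bits
}
```

Mode 0 with only BG0 turned on works out to `1 << 8`, and leaving the forced-blank bit out of the value is what lets the display start drawing once you write it.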

-

Get Your Palette Ready

-

Background palette starts at 0x5000000 and is 256 u16 values long. It'd potentially be possible to declare a static array starting at a fixed address and use a linker script to make sure that it ends up at the right spot in the final program, but since we have to use volatile reads and writes with PALRAM anyway, we'll just reuse our VolatilePtr type. Something like this:

-

-# #![allow(unused_variables)]
-#fn main() {
-pub const PALRAM_BG_BASE: VolatilePtr<u16> = VolatilePtr(0x500_0000 as *mut u16);
-
-pub fn bg_palette(slot: usize) -> u16 {
-  assert!(slot < 256);
-  unsafe { PALRAM_BG_BASE.offset(slot as isize).read() }
-}
-
-pub fn set_bg_palette(slot: usize, color: u16) {
-  assert!(slot < 256);
-  unsafe { PALRAM_BG_BASE.offset(slot as isize).write(color) }
-}
-#}
-

As we discussed with the tile color depths, the palette can be utilized as a single block of palette values ([u16; 256]) or as 16 palbanks of 16 palette values each ([[u16;16]; 16]). This setting is assigned per background layer via IO register.
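Both views are the same 256 slots underneath, so converting a (palbank, slot) pair into a flat slot number is just a multiply and an add. A minimal sketch, with function names that are made up for illustration:

```rust
/// Flat index (0..256) of `slot` within `palbank`, both in 0..16.
pub const fn bg_palette_index(palbank: usize, slot: usize) -> usize {
  palbank * 16 + slot
}

/// Byte address of that palette entry: base of background PALRAM plus
/// 2 bytes per `u16` entry.
pub const fn bg_palette_address(palbank: usize, slot: usize) -> usize {
  0x500_0000 + 2 * bg_palette_index(palbank, slot)
}
```

So palbank 1, slot 0 is the same memory as flat slot 16, and the last entry (palbank 15, slot 15) is flat slot 255.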

-

Get Your Tiles Ready

-

Tile data is placed into charblocks. A charblock is always 16kb, so depending on color depth it will have either 256 or 512 tiles within that charblock. Charblocks 0, 1, 2, and 3 are all for background tiles. That's a maximum of 2048 tiles for backgrounds, but as you'll see in a moment a particular tilemap entry can't even index that high. Instead, each background layer is assigned a "character base block", and then tilemap entries index relative to the character base block of that background layer.
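To give a feel for how the character base block gets assigned, here's a sketch of packing a background control value. The bit positions (character base block in bits 2-3, the 8bpp flag in bit 7, screen base block in bits 8-12) come from the usual GBA hardware documentation and should be treated as an assumption to verify; `bg_control` is a hypothetical helper name.

```rust
/// Packs a background control value.
/// Assumed layout: bits 2-3 = character base block, bit 7 = 8bpp flag,
/// bits 8-12 = screen base block. All other bits are left as 0.
pub const fn bg_control(char_base_block: u16, is_8bpp: bool, screen_base_block: u16) -> u16 {
  ((char_base_block & 0b11) << 2)
    | ((is_8bpp as u16) << 7)
    | ((screen_base_block & 0b1_1111) << 8)
}
```

With a character base block of 0 and 4bpp tiles this collapses to just the screen base block shifted up by 8 bits.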

-

Now, if you want to move in a lot of tile data you'll probably want to use a DMA -routine, or at least write a function like memcopy32 for fast u32 copying from -ROM into VRAM. However, for now, and because we're being very explicit since -this is our first time doing it, we'll write it as functions for individual tile -reads and writes.

-

The math works like indexing a pointer, except that we have two sizes we need to -go by. First you take the base address for VRAM (0x600_0000), then add the -size of a charblock (16kb) times the charblock you want to place the tile -within, and then you add the index of the tile slot you're placing it into times -the size of that type of tile. Like this:

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn bg_tile_4bpp(base_block: usize, tile_index: usize) -> Tile4bpp {
-  assert!(base_block < 4);
-  assert!(tile_index < 512);
-  let address = VRAM + size_of::<Charblock4bpp>() * base_block + size_of::<Tile4bpp>() * tile_index;
-  unsafe { VolatilePtr(address as *mut Tile4bpp).read() }
-}
-
-pub fn set_bg_tile_4bpp(base_block: usize, tile_index: usize, tile: Tile4bpp) {
-  assert!(base_block < 4);
-  assert!(tile_index < 512);
-  let address = VRAM + size_of::<Charblock4bpp>() * base_block + size_of::<Tile4bpp>() * tile_index;
-  unsafe { VolatilePtr(address as *mut Tile4bpp).write(tile) }
-}
-
-pub fn bg_tile_8bpp(base_block: usize, tile_index: usize) -> Tile8bpp {
-  assert!(base_block < 4);
-  assert!(tile_index < 256);
-  let address = VRAM + size_of::<Charblock8bpp>() * base_block + size_of::<Tile8bpp>() * tile_index;
-  unsafe { VolatilePtr(address as *mut Tile8bpp).read() }
-}
-
-pub fn set_bg_tile_8bpp(base_block: usize, tile_index: usize, tile: Tile8bpp) {
-  assert!(base_block < 4);
-  assert!(tile_index < 256);
-  let address = VRAM + size_of::<Charblock8bpp>() * base_block + size_of::<Tile8bpp>() * tile_index;
-  unsafe { VolatilePtr(address as *mut Tile8bpp).write(tile) }
-}
-#}
-

For bulk operations, you'd do the exact same math to get your base destination -pointer, and then you'd get the base source pointer for the tile you're copying -out of ROM, and then you'd do the bulk copy for the correct number of u32 -values that you're trying to move (8 per tile moved for 4bpp, or 16 per tile -moved for 8bpp).
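As a rough sketch of what such a bulk copy could look like, assuming we just want a plain loop of volatile u32 writes (a real project would more likely use DMA or a tuned assembly routine): the names `copy_u32_volatile` and `words_for_tiles` are hypothetical, not something defined elsewhere in this guide.

```rust
/// Copies `src` to `dest` one u32 at a time using volatile writes.
///
/// # Safety
/// `dest` must be aligned for u32 and valid for `src.len()` consecutive writes.
pub unsafe fn copy_u32_volatile(src: &[u32], dest: *mut u32) {
  let mut out = dest;
  for word in src {
    unsafe {
      out.write_volatile(*word);
      // `add(1)` advances by one whole u32, not by one byte.
      out = out.add(1);
    }
  }
}

/// Number of u32 words taken up by `tiles` tiles (8 per 4bpp tile, 16 per 8bpp tile).
pub const fn words_for_tiles(tiles: usize, is_8bpp: bool) -> usize {
  tiles * if is_8bpp { 16 } else { 8 }
}
```

On the GBA the destination pointer would come from the charblock math shown earlier; the function itself doesn't care where the pointer goes as long as the safety requirements hold.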

-

GBA Limitation Note: on a modern PC (eg: x86 or x86_64) you're probably used to index based loops and iterator based loops being the same speed. The CPU's addressing modes can fold "base address of the array plus desired index * size per element" into the memory access itself, so computing the element address costs basically nothing. It's slightly more complicated if there are arrays within arrays like there are here, but with normal arrays it's basically the same speed to index per loop cycle as it is to take a base address and then add +1 offset per loop cycle. However, the GBA's CPU gives you much less help here. On the GBA, there's a genuine speed difference between looping over indexes and then indexing each loop (slow) compared to using an iterator that just stores an internal pointer and does +1 offset per loop until it reaches the end (fast). The repeated indexing can itself be an expensive step. If it's like a 3 element array it's no big deal, but if you've got a big slice of data to process, be sure to go over it with .iter() and .iter_mut() if you can, instead of looping by index. This is Rust and all, so probably you were gonna do that anyway, but just a heads up.
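Just to show the two loop styles side by side, here's a small sketch with made-up function names (`sum_by_index` and `sum_by_iter`). Both compute the same thing; whether the compiler actually emits different machine code for them depends on the optimization level and target, so treat this as an illustration of the shapes and not as a benchmark.

```rust
/// Index based loop: every access goes through `data[i]`,
/// which means an address computation plus a bounds check.
pub fn sum_by_index(data: &[u32]) -> u32 {
  let mut total = 0u32;
  for i in 0..data.len() {
    total = total.wrapping_add(data[i]);
  }
  total
}

/// Iterator based loop: the iterator walks forward one element at a time.
pub fn sum_by_iter(data: &[u32]) -> u32 {
  data.iter().fold(0u32, |total, x| total.wrapping_add(*x))
}
```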

-

Get your Tilemap ready

-

I believe that at one point I alluded to a tilemap existing. Well, just as the -tiles are arranged into charblocks, the data describing what tile to show in -what location is arranged into a thing called a screenblock.

-

A screenblock is placed into VRAM the same as the tile data charblocks. Starting at the base of VRAM (0x600_0000) there are 32 slots for the screenblock array. Each screenblock is 2048 bytes (0x800). Naturally, if our tiles are using up charblock space within VRAM and our tilemaps are using up screenblock space within the same VRAM... well it would just be a disaster if they ran into each other. Once again, it's up to you as the programmer to determine how much space you want to devote to each thing. Each complete charblock uses up 8 screenblocks worth of space, but you don't have to fill a complete charblock with tiles, so you can be very fiddly with how you split the memory.
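Since charblocks and screenblocks share the same memory, it can help to see the overlap as plain arithmetic. This is a minimal sketch using only the sizes given above; the constant and function names are hypothetical.

```rust
pub const VRAM_BASE: usize = 0x600_0000;
pub const SCREENBLOCK_SIZE: usize = 0x800; // 2048 bytes
pub const CHARBLOCK_SIZE: usize = 0x4000; // 16kb

/// Start address of screenblock `index` (0..32).
pub fn screenblock_address(index: usize) -> usize {
  assert!(index < 32);
  VRAM_BASE + index * SCREENBLOCK_SIZE
}

/// Which background charblock (0..4) shares memory with a given screenblock.
pub fn charblock_overlapped_by(screenblock: usize) -> usize {
  assert!(screenblock < 32);
  screenblock * SCREENBLOCK_SIZE / CHARBLOCK_SIZE
}
```

For example, screenblocks 0 through 7 sit inside charblock 0, so if your tiles fill all of charblock 0 the first free screenblock is number 8.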

-

Each screenblock is composed of a series of screenblock entry values, which describe what tile index to use, whether the tile should be flipped, and what palbank it should use (if any). Because both regular backgrounds and affine backgrounds are composed of screenblocks with entries, and because the affine background has a smaller format for screenblock entries, we'll name our types accordingly.

-

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Clone, Copy)]
-#[repr(transparent)]
-pub struct RegularScreenblock {
-  pub data: [RegularScreenblockEntry; 32 * 32],
-}
-
-#[derive(Debug, Clone, Copy, Default)]
-#[repr(transparent)]
-pub struct RegularScreenblockEntry(u16);
-#}
-

So, with one entry per tile, a single screenblock allows for 32x32 tiles worth of -background.

-

The format of a regular screenblock entry is quite simple compared to some of -the IO register stuff:

-
    -
  • 10 bits for tile index (relative to the character base block of the background)
  • -
  • 1 bit for horizontal flip
  • -
  • 1 bit for vertical flip
  • -
  • 4 bits for picking which palbank to use (if 4bpp, otherwise it's ignored)
  • -
-

-# #![allow(unused_variables)]
-#fn main() {
-impl RegularScreenblockEntry {
-  pub fn tile_id(self) -> u16 {
-    self.0 & 0b11_1111_1111
-  }
-  pub fn set_tile_id(&mut self, id: u16) {
-    self.0 &= !0b11_1111_1111;
-    self.0 |= id & 0b11_1111_1111;
-  }
-  pub fn horizontal_flip(self) -> bool {
-    (self.0 & (1 << 0xA)) > 0
-  }
-  pub fn set_horizontal_flip(&mut self, bit: bool) {
-    if bit {
-      self.0 |= 1 << 0xA;
-    } else {
-      self.0 &= !(1 << 0xA);
-    }
-  }
-  pub fn vertical_flip(self) -> bool {
-    (self.0 & (1 << 0xB)) > 0
-  }
-  pub fn set_vertical_flip(&mut self, bit: bool) {
-    if bit {
-      self.0 |= 1 << 0xB;
-    } else {
-      self.0 &= !(1 << 0xB);
-    }
-  }
-  pub fn palbank_index(self) -> u16 {
-    self.0 >> 12
-  }
-  pub fn set_palbank_index(&mut self, palbank_index: u16) {
-    self.0 &= 0b1111_1111_1111;
-    self.0 |= palbank_index << 12;
-  }
-}
-#}
-

Now, at either 256 or 512 tiles per charblock, you might be thinking that with a -10 bit index you can index past the end of one charblock and into the next. -You'd be right, mostly.

-

As long as you stay within the background memory region for charblocks (that is, -0 through 3), then it all works out. However, if you try to get the background -rendering to reach outside of the background charblocks you'll get an -implementation defined result. It's not the dreaded "undefined behavior" we're -often worried about in programming, but the results are determined by what -you're running the game on. With GBA hardware you get a bizarre result -(basically another way to put garbage on the screen). With a DS it acts as if -the tiles were all 0s. If you use an emulator it might or might not allow for -you to do this, it's up to the emulator writers.

-

Set Your IO Registers

-

Instead of there being just a single IO register to learn about this time, there are two separate groups of related registers.

-

Background Control

-
    -
  • BG0CNT (0x400_0008): BG0 Control
  • -
  • BG1CNT (0x400_000A): BG1 Control
  • -
  • BG2CNT (0x400_000C): BG2 Control
  • -
  • BG3CNT (0x400_000E): BG3 Control
  • -
-

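Each of these is a read/write u16 location. This is where we get to all of the important details that we've been putting off. The fields are listed from the lowest bit upward; they account for 14 of the 16 bits, and the other 2 bits (bits 4 and 5, between the character base block and the mosaic flag) are unused.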

-
    -
  • 2 bits for the priority.
  • -
  • 2 bits for "character base block", the charblock that all of the tile indexes -for this background are offset from.
  • -
  • 1 bit for mosaic effect being enabled (we'll get to that below).
  • -
  • 1 bit to enable 8bpp, otherwise 4bpp is used.
  • -
  • 5 bits to pick the "screen base block", the screen block that serves as the -base value for this background.
  • -
  • 1 bit that is not used in regular mode, but in affine mode it can be enabled -to cause the affine background to wrap around at the edges.
  • -
  • 2 bits for the background size.
  • -
-

The size works a little funny. When size is 0 only the base screen block is -used. If size is 1 or 2 then the base screenblock and the following screenblock -are placed next to each other (horizontally for 1, vertically for 2). If the -size is 3 then the base screenblock and the following three screenblocks are -arranged into a 2x2 grid of screenblocks.
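Here's a minimal sketch of packing a background control value by hand. The bit positions are an assumption taken from GBATEK (the list above only gives field widths and order): priority in bits 0-1, character base block in bits 2-3, bits 4-5 unused, mosaic in bit 6, 8bpp in bit 7, screen base block in bits 8-12, affine wrap in bit 13, and size in bits 14-15. The function names are hypothetical, and the wrap bit is left out because it does nothing for regular backgrounds.

```rust
/// Packs a BGxCNT value for a regular background (bit layout assumed from GBATEK).
pub fn bg_control(
  priority: u16,
  char_base_block: u16,
  mosaic: bool,
  is_8bpp: bool,
  screen_base_block: u16,
  size: u16,
) -> u16 {
  assert!(priority < 4 && char_base_block < 4);
  assert!(screen_base_block < 32 && size < 4);
  priority
    | (char_base_block << 2)
    | ((mosaic as u16) << 6)
    | ((is_8bpp as u16) << 7)
    | (screen_base_block << 8)
    | (size << 14)
}

/// Regular background dimensions in tiles (width, height) for each size value.
pub fn regular_bg_size_in_tiles(size: u16) -> (u16, u16) {
  match size {
    0 => (32, 32), // just the base screenblock
    1 => (64, 32), // two screenblocks side by side
    2 => (32, 64), // two screenblocks stacked vertically
    3 => (64, 64), // 2x2 grid of screenblocks
    _ => panic!("size must be 0 through 3"),
  }
}
```

A newtype with getters and setters, like the one used for screenblock entries, would be the safer long term design; the free function is just the shortest way to show the layout.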

-

Background Offset

-
    -
  • BG0HOFS (0x400_0010): BG0 X-Offset
  • -
  • BG0VOFS (0x400_0012): BG0 Y-Offset
  • -
  • BG1HOFS (0x400_0014): BG1 X-Offset
  • -
  • BG1VOFS (0x400_0016): BG1 Y-Offset
  • -
  • BG2HOFS (0x400_0018): BG2 X-Offset
  • -
  • BG2VOFS (0x400_001A): BG2 Y-Offset
  • -
  • BG3HOFS (0x400_001C): BG3 X-Offset
  • -
  • BG3VOFS (0x400_001E): BG3 Y-Offset
  • -
-

Each of these is a write only u16 location. Bits 0 through 8 are used, so the offsets can be 0 through 511. They also only apply in regular backgrounds. If a background is in an affine state then you'll use different IO registers to control it (discussed in a later chapter).

-

The offset that you assign determines the pixel offset of the display area relative to the start of the background scene, as if the screen was a camera looking at the scene. In other words, as a BG X offset value increases, you can think of it as the camera moving to the right, or as that background moving to the left. Like when Mario walks toward the goal. Similarly, when a BG Y offset increases the camera is moving down, or the background is moving up, like when Mario falls down from a high platform.

-

Depending on how much the background is scrolled and the size of the background, -it will loop.
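The looping is just modular arithmetic on the background's size in pixels. A minimal sketch, assuming a regular background that is 256 or 512 pixels along the axis in question (the function name is made up):

```rust
/// Background pixel coordinate that ends up at a given screen coordinate,
/// for one axis, given the scroll offset and the background extent in pixels.
pub fn scrolled_coordinate(screen: u16, offset: u16, bg_extent: u16) -> u16 {
  assert!(bg_extent == 256 || bg_extent == 512);
  // Only bits 0 through 8 of the offset register are used.
  let offset = offset & 0x1FF;
  (screen + offset) % bg_extent
}
```

With a 256 pixel wide background and an X offset of 250, screen column 10 shows background column 4, because the camera has wrapped past the right edge.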

-

Mosaic

-

As a special effect, you can apply mosaic to backgrounds and objects. It's just a single flag for each background, so all backgrounds will use the same mosaic settings when they have it enabled. What it actually does is split the normal image into "blocks" and then each block gets the color of the top left pixel of that block. This is the effect you see when Link hits an electric foe with his sword and the whole screen "buzzes" at you.

-

The mosaic control is a write only u16 IO register at 0x400_004C.

-

There are 4 bits each for:

-
    -
  • Horizontal BG stretch
  • -
  • Vertical BG stretch
  • -
  • Horizontal object stretch
  • -
  • Vertical object stretch
  • -
-

The inputs should be 1 less than the desired block size. So if you set a -stretch value of 5 then pixels 0-5 would be part of the first block (6 pixels), -then 6-11 is the next block (another 6 pixels) and so on.
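The "1 less than the block size" rule is easy to get wrong, so here's a small sketch that takes block sizes in pixels and does the subtraction for you. The field order within the register (BG horizontal in bits 0-3, BG vertical in bits 4-7, object horizontal in bits 8-11, object vertical in bits 12-15) is an assumption taken from GBATEK, and both function names are hypothetical.

```rust
/// Packs the mosaic register from block sizes in pixels (1 through 16 each).
pub fn mosaic_value(bg_h: u16, bg_v: u16, obj_h: u16, obj_v: u16) -> u16 {
  for size in [bg_h, bg_v, obj_h, obj_v] {
    assert!((1..=16).contains(&size));
  }
  (bg_h - 1) | ((bg_v - 1) << 4) | ((obj_h - 1) << 8) | ((obj_v - 1) << 12)
}

/// Coordinate whose color gets shown at `coord` for a given stretch value
/// (the start of the block that `coord` falls in).
pub fn mosaic_source(coord: u16, stretch: u16) -> u16 {
  coord - coord % (stretch + 1)
}
```

With a stretch value of 5, pixels 0 through 5 all show pixel 0 and pixels 6 through 11 all show pixel 6, which matches the example above.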

-

If you need to make a pixel other than the top left part of each block the one -that determines the mosaic color you can carefully offset the background or -image by a tiny bit, but of course that makes every mosaic block change its -target pixel. You can't change the target pixel on a block by block basis.

diff --git a/docs/ch03/regular_objects.html b/docs/ch03/regular_objects.html
deleted file mode 100644
index 7f7cd0c..0000000
--- a/docs/ch03/regular_objects.html
+++ /dev/null
@@ -1,596 +0,0 @@

Regular Objects

-

As with backgrounds, objects can be used in both an affine and non-affine way. -For this section we'll focus on the non-affine elements, and then we'll do all -the affine stuff in a later chapter.

-

Objects vs Sprites

-

As TONC helpfully reminds us -(and then proceeds to not follow its own advice), we should always try to think -in terms of objects, not sprites. A sprite is a logical / software concern, -perhaps a player concern, whereas an object is a hardware concern.

-

What's more, a given sprite that the player sees might need more than one object -to display. Objects must be either square or rectangular (so sprite bits that -stick out probably call for a second object), and can only be from 8x8 to 64x64 -(so anything bigger has to be two objects lined up to appear as one).

-

General Object Info

-

Unlike with backgrounds, you can enable the object layer in any video mode. -There's space for 128 object definitions in OAM.

-

The display gets a number of cycles per scanline to process objects: 1210 by default, but only 954 if you enable the "HBlank interval free" setting in the display control register. The cycle cost per object depends on the object's size and if it's using affine or regular mode, so enabling the HBlank interval free setting doesn't cut the number of objects displayable by an exact number of objects. The objects are processed in order of their definitions and if you run out of cycles then the rest just don't get shown. If there's a concern that you might run out of cycles you can place important objects (such as the player) at the start of the list and then less important animation objects later on.

-

Ready the Palette

-

Objects use the palette the same as the background does. The only difference is -that the palette data for objects starts at 0x500_0200.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub const PALRAM_OBJECT_BASE: VolatilePtr<u16> = VolatilePtr(0x500_0200 as *mut u16);
-
-pub fn object_palette(slot: usize) -> u16 {
-  assert!(slot < 256);
-  unsafe { PALRAM_OBJECT_BASE.offset(slot as isize).read() }
-}
-
-pub fn set_object_palette(slot: usize, color: u16) {
-  assert!(slot < 256);
-  unsafe { PALRAM_OBJECT_BASE.offset(slot as isize).write(color) }
-}
-#}
-

Ready the Tiles

-

Objects, as with backgrounds, are composed of 8x8 tiles, and if you want -something bigger than 8x8 you have to use more than one tile put together. -Object tiles go into the final two charblocks of VRAM (indexes 4 and 5). Because -there's only two of them, they are sometimes called the lower block -(0x601_0000) and the higher/upper block (0x601_4000).

-

Tile indexes for sprites always offset from the base of the lower block, and -they always go 32 bytes at a time, regardless of if the object is set for 4bpp -or 8bpp. From this we can determine that there's 512 tile slots in each of the -two object charblocks. However, in video modes 3, 4, and 5 the space for the -background cuts into the lower charblock, so you can only safely use the upper -charblock.

-

-# #![allow(unused_variables)]
-#fn main() {
-pub fn obj_tile_4bpp(tile_index: usize) -> Tile4bpp {
-  assert!(tile_index < 512);
-  let address = VRAM + size_of::<Charblock4bpp>() * 4 + 32 * tile_index;
-  unsafe { VolatilePtr(address as *mut Tile4bpp).read() }
-}
-
-pub fn set_obj_tile_4bpp(tile_index: usize, tile: Tile4bpp) {
-  assert!(tile_index < 512);
-  let address = VRAM + size_of::<Charblock4bpp>() * 4 + 32 * tile_index;
-  unsafe { VolatilePtr(address as *mut Tile4bpp).write(tile) }
-}
-
-pub fn obj_tile_8bpp(tile_index: usize) -> Tile8bpp {
-  assert!(tile_index < 512);
-  let address = VRAM + size_of::<Charblock8bpp>() * 4 + 32 * tile_index;
-  unsafe { VolatilePtr(address as *mut Tile8bpp).read() }
-}
-
-pub fn set_obj_tile_8bpp(tile_index: usize, tile: Tile8bpp) {
-  assert!(tile_index < 512);
-  let address = VRAM + size_of::<Charblock8bpp>() * 4 + 32 * tile_index;
-  unsafe { VolatilePtr(address as *mut Tile8bpp).write(tile) }
-}
-#}
-

With backgrounds you picked every single tile individually with a bunch of -screen entry values. Objects don't do that at all. Instead you pick a base tile, -size, and shape, then it figures out the rest from there. However, you may -recall back with the display control register something about an "object memory -1d" bit. This is where that comes into play.

-
    -
  • If object memory is set to be 2d (the default) then each charblock is treated -as 32 tiles by 32 tiles square. Each object has a base tile and dimensions, -and that just extracts directly from the charblock picture as if you were -selecting an area. This mode probably makes for the easiest image editing.
  • -
  • If object memory is set to be 1d then the tiles are loaded sequentially from the starting point, enough to fill in the object's dimensions. This probably makes it the easiest mode to program with, since programming languages are pretty good at 1d things.
  • -
-

I'm not sure I explained that well, here's a picture:

-

2d1d-diagram

-

In 2d mode, a new row of tiles starts every 32 tile indexes.
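In code, the difference between the two mappings comes down to the row stride. This is a minimal sketch for 4bpp objects only (8bpp tiles take up two 32 byte slots each, which changes the math), with hypothetical function names:

```rust
/// Tile index used for tile (tx, ty) of a 4bpp object in 2d mapping:
/// each row of the object starts 32 tile indexes after the previous one.
pub fn tile_index_2d(base: u16, tx: u16, ty: u16) -> u16 {
  base + ty * 32 + tx
}

/// Tile index used for tile (tx, ty) of a 4bpp object in 1d mapping:
/// rows follow each other directly, `width_tiles` tiles apart.
pub fn tile_index_1d(base: u16, width_tiles: u16, tx: u16, ty: u16) -> u16 {
  assert!(tx < width_tiles);
  base + ty * width_tiles + tx
}
```

For a 16x16 object (2 tiles by 2 tiles) with base tile 4, the bottom right tile is index 37 in 2d mapping but index 7 in 1d mapping.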

-

Of course, the mode that you actually end up using is not particularly -important, since it should be the job of your image conversion routine to get -everything all lined up and into place anyway.

-

Set the Object Attributes

-

The final step is to assign the correct attributes to an object. Each object has -three u16 values that make up its overall attributes.

-

Before we go into the details, I want to bring up that the hardware will attempt -to process every single object every single frame if the object layer is -enabled, and also that all of the GBA's object memory is cleared to 0 at -startup. Why do these two things matter right now? As you'll see in a second an -"all zero" set of object attributes causes an 8x8 object to appear at 0,0 using -object tile index 0. This is usually not what you want your unused objects to -do. When your game first starts you should take a moment to mark any objects you -won't be using as objects to not render.
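One way to handle this, sketched here under the assumption that you keep a copy of the object attributes in normal RAM and write it out to OAM later, is to set the rendering field of attr0 (bits 8-9, described just below) to 2, "Disabled", for every slot you aren't using. `ShadowOam` and `hide_unused` are hypothetical names for illustration.

```rust
/// Mask for the two rendering bits of attr0 (bits 8-9).
pub const RENDERING_MASK: u16 = 0b11 << 8;
/// attr0 rendering value 2, "Disabled", shifted into place.
pub const ATTR0_DISABLED: u16 = 2 << 8;

/// A RAM side copy of the three attribute values for all 128 objects
/// (hypothetical layout, not the OAM hardware layout with its filler slots).
pub type ShadowOam = [[u16; 3]; 128];

/// Marks every object from `first_unused` onward as not rendered.
pub fn hide_unused(shadow: &mut ShadowOam, first_unused: usize) {
  for attrs in shadow.iter_mut().skip(first_unused) {
    // Clear the rendering bits, then set them to Disabled.
    attrs[0] = (attrs[0] & !RENDERING_MASK) | ATTR0_DISABLED;
  }
}
```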

-

ObjectAttributes.attr0

-
    -
  • 8 bits for row coordinate (marks the top of the sprite)
  • -
  • 2 bits for object rendering: 0 = Normal, 1 = Affine, 2 = Disabled, 3 = Affine with double rendering area
  • -
  • 2 bits for object mode: 0 = Normal, 1 = Alpha Blending, 2 = Object Window, 3 = Forbidden
  • -
  • 1 bit for mosaic enabled
  • -
  • 1 bit for 8bpp color enabled
  • -
  • 2 bits for shape: 0 = Square, 1 = Horizontal, 2 = Vertical, 3 = Forbidden
  • -
-

If an object is 128 pixels big at Y > 128 you'll get a strange looking result -where it acts like Y > -128 and then displays partly off screen to the top.

-

ObjectAttributes.attr1

-
    -
  • 9 bits for column coordinate (marks the left of the sprite)
  • -
  • Either: -
      -
    • 3 empty bits, 1 bit for horizontal flip, 1 bit for vertical flip (non-affine)
    • -
    • 5 bits for affine index (affine)
    • -
    -
  • -
  • 2 bits for size.
  • -
- - - - - -
Size Square Horizontal Vertical
0 8x8 16x8 8x16
1 16x16 32x8 8x32
2 32x32 32x16 16x32
3 64x64 64x32 32x64
-
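The table above translates directly into a lookup. Here's a small sketch with a made-up function name, returning `None` for the forbidden shape value 3 or an out of range size:

```rust
/// (width, height) in pixels for an object's shape and size values,
/// following the table above. Shape: 0 = Square, 1 = Horizontal, 2 = Vertical.
pub fn object_dimensions(shape: u16, size: u16) -> Option<(u16, u16)> {
  const SQUARE: [(u16, u16); 4] = [(8, 8), (16, 16), (32, 32), (64, 64)];
  const HORIZONTAL: [(u16, u16); 4] = [(16, 8), (32, 8), (32, 16), (64, 32)];
  const VERTICAL: [(u16, u16); 4] = [(8, 16), (8, 32), (16, 32), (32, 64)];
  let table = match shape {
    0 => SQUARE,
    1 => HORIZONTAL,
    2 => VERTICAL,
    _ => return None, // shape 3 is forbidden
  };
  table.get(size as usize).copied()
}
```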

ObjectAttributes.attr2

-
    -
  • 10 bits for the base tile index
  • -
  • 2 bits for priority
  • -
  • 4 bits for the palbank index (4bpp mode only, ignored in 8bpp)
  • -
-

ObjectAttributes summary

-

So I said in the GBA memory mapping section that C people would tell you that -the object attributes should look like this:

-

-# #![allow(unused_variables)]
-#fn main() {
-#[repr(C)]
-pub struct ObjectAttributes {
-  attr0: u16,
-  attr1: u16,
-  attr2: u16,
-  filler: i16,
-}
-#}
-

Except that:

-
    -
  1. It's wasteful when we store object attributes on their own outside of OAM (which we definitely might want to do).
  2. In Rust we can't access just one field through a volatile pointer (our pointers aren't actually volatile to begin with, just the ops we do with them are). We have to read or write the pointer's whole value at a time. Similarly, we can't do things like |= and &= with volatile in Rust. So in Rust we can't have a volatile pointer to an ObjectAttributes and then write to just the three "real" values and not touch the filler field. Having the filler value in there just means we have to dance around it more, not less.
  3. We want to newtype this whole thing to prevent accidental invalid states from being written into memory.
-

So we will not be using that representation. At the same time we want to have no overhead, so we will stick to three u16 values. We could newtype each individual field to be its own type (ObjectAttributesAttr0 or something silly like that), since there aren't actual dependencies between two different fields such that a change in one can throw another into a forbidden state. The worst that can happen is if we disable or enable affine mode (attr0) it can change the meaning of attr1. The changed meaning isn't actually an invalid state though, so we could make each field its own type if we wanted.

-

However, when you think about it, I can't imagine a common situation where we do -something like make an attr0 value that we then want to save on its own and -apply to several different ObjectAttributes that we make during a game. That -just doesn't sound likely to me. So, we'll go the route where ObjectAttributes -is just a big black box to the outside world and we don't need to think about -the three fields internally as being separate.

-

First we make it so that we can get and set object attributes from memory:

-

-# #![allow(unused_variables)]
-#fn main() {
-pub const OAM: usize = 0x700_0000;
-
-pub fn object_attributes(slot: usize) -> ObjectAttributes {
-  assert!(slot < 128);
-  let ptr = VolatilePtr((OAM + slot * (size_of::<u16>() * 4)) as *mut u16);
-  unsafe {
-    ObjectAttributes {
-      attr0: ptr.read(),
-      attr1: ptr.offset(1).read(),
-      attr2: ptr.offset(2).read(),
-    }
-  }
-}
-
-pub fn set_object_attributes(slot: usize, obj: ObjectAttributes) {
-  assert!(slot < 128);
-  let ptr = VolatilePtr((OAM + slot * (size_of::<u16>() * 4)) as *mut u16);
-  unsafe {
-    ptr.write(obj.attr0);
-    ptr.offset(1).write(obj.attr1);
-    ptr.offset(2).write(obj.attr2);
-  }
-}
-
-#[derive(Debug, Clone, Copy, Default)]
-pub struct ObjectAttributes {
-  attr0: u16,
-  attr1: u16,
-  attr2: u16,
-}
-#}
-

Then we add a billion methods to the ObjectAttributes type so that we can -actually set all the different values that we want to set.

-

This code block is the last thing on this page so if you don't wanna scroll past -the whole thing you can just go to the next page.

-

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Debug, Clone, Copy)]
-pub enum ObjectRenderMode {
-  Normal,
-  Affine,
-  Disabled,
-  DoubleAreaAffine,
-}
-
-#[derive(Debug, Clone, Copy)]
-pub enum ObjectMode {
-  Normal,
-  AlphaBlending,
-  ObjectWindow,
-}
-
-#[derive(Debug, Clone, Copy)]
-pub enum ObjectShape {
-  Square,
-  Horizontal,
-  Vertical,
-}
-
-#[derive(Debug, Clone, Copy)]
-pub enum ObjectOrientation {
-  Normal,
-  HFlip,
-  VFlip,
-  BothFlip,
-  Affine(u8),
-}
-
-impl ObjectAttributes {
-  pub fn row(&self) -> u16 {
-    self.attr0 & 0b1111_1111
-  }
-  pub fn column(&self) -> u16 {
-    self.attr1 & 0b1_1111_1111
-  }
-  pub fn rendering(&self) -> ObjectRenderMode {
-    match (self.attr0 >> 8) & 0b11 {
-      0 => ObjectRenderMode::Normal,
-      1 => ObjectRenderMode::Affine,
-      2 => ObjectRenderMode::Disabled,
-      3 => ObjectRenderMode::DoubleAreaAffine,
-      _ => unimplemented!(),
-    }
-  }
-  pub fn mode(&self) -> ObjectMode {
-    match (self.attr0 >> 0xA) & 0b11 {
-      0 => ObjectMode::Normal,
-      1 => ObjectMode::AlphaBlending,
-      2 => ObjectMode::ObjectWindow,
-      _ => unimplemented!(),
-    }
-  }
-  pub fn mosaic(&self) -> bool {
-    ((self.attr0 << 3) as i16) < 0
-  }
-  pub fn two_fifty_six_colors(&self) -> bool {
-    ((self.attr0 << 2) as i16) < 0
-  }
-  pub fn shape(&self) -> ObjectShape {
-    match (self.attr0 >> 0xE) & 0b11 {
-      0 => ObjectShape::Square,
-      1 => ObjectShape::Horizontal,
-      2 => ObjectShape::Vertical,
-      _ => unimplemented!(),
-    }
-  }
-  pub fn orientation(&self) -> ObjectOrientation {
-    if (self.attr0 >> 8) & 1 > 0 {
-      ObjectOrientation::Affine((self.attr1 >> 9) as u8 & 0b1_1111)
-    } else {
-      match (self.attr1 >> 0xC) & 0b11 {
-        0 => ObjectOrientation::Normal,
-        1 => ObjectOrientation::HFlip,
-        2 => ObjectOrientation::VFlip,
-        3 => ObjectOrientation::BothFlip,
-        _ => unimplemented!(),
-      }
-    }
-  }
-  pub fn size(&self) -> u16 {
-    self.attr1 >> 0xE
-  }
-  pub fn tile_index(&self) -> u16 {
-    self.attr2 & 0b11_1111_1111
-  }
-  pub fn priority(&self) -> u16 {
-    (self.attr2 >> 0xA) & 0b11
-  }
-  pub fn palbank(&self) -> u16 {
-    self.attr2 >> 0xC
-  }
-  //
-  pub fn set_row(&mut self, row: u16) {
-    self.attr0 &= !0b1111_1111;
-    self.attr0 |= row & 0b1111_1111;
-  }
-  pub fn set_column(&mut self, col: u16) {
-    self.attr1 &= !0b1_1111_1111;
-    self.attr1 |= col & 0b1_1111_1111;
-  }
-  pub fn set_rendering(&mut self, rendering: ObjectRenderMode) {
-    const RENDERING_MASK: u16 = 0b11 << 8;
-    self.attr0 &= !RENDERING_MASK;
-    self.attr0 |= (rendering as u16) << 8;
-  }
-  pub fn set_mode(&mut self, mode: ObjectMode) {
-    const MODE_MASK: u16 = 0b11 << 0xA;
-    self.attr0 &= !MODE_MASK;
-    self.attr0 |= (mode as u16) << 0xA;
-  }
-  pub fn set_mosaic(&mut self, bit: bool) {
-    const MOSAIC_BIT: u16 = 1 << 0xC;
-    if bit {
-      self.attr0 |= MOSAIC_BIT
-    } else {
-      self.attr0 &= !MOSAIC_BIT
-    }
-  }
-  pub fn set_two_fifty_six_colors(&mut self, bit: bool) {
-    const COLOR_MODE_BIT: u16 = 1 << 0xD;
-    if bit {
-      self.attr0 |= COLOR_MODE_BIT
-    } else {
-      self.attr0 &= !COLOR_MODE_BIT
-    }
-  }
-  pub fn set_shape(&mut self, shape: ObjectShape) {
-    self.attr0 &= 0b0011_1111_1111_1111;
-    self.attr0 |= (shape as u16) << 0xE;
-  }
-  pub fn set_orientation(&mut self, orientation: ObjectOrientation) {
-    const AFFINE_INDEX_MASK: u16 = 0b1_1111 << 9;
-    self.attr1 &= !AFFINE_INDEX_MASK;
-    let bits = match orientation {
-      ObjectOrientation::Affine(index) => (index as u16) << 9,
-      ObjectOrientation::Normal => 0,
-      ObjectOrientation::HFlip => 1 << 0xC,
-      ObjectOrientation::VFlip => 1 << 0xD,
-      ObjectOrientation::BothFlip => 0b11 << 0xC,
-    };
-    self.attr1 |= bits;
-  }
-  pub fn set_size(&mut self, size: u16) {
-    self.attr1 &= 0b0011_1111_1111_1111;
-    self.attr1 |= size << 14;
-  }
-  pub fn set_tile_index(&mut self, index: u16) {
-    self.attr2 &= !0b11_1111_1111;
-    self.attr2 |= 0b11_1111_1111 & index;
-  }
-  pub fn set_priority(&mut self, priority: u16) {
-    self.attr2 &= !0b0000_1100_0000_0000;
-    self.attr2 |= (priority & 0b11) << 0xA;
-  }
-  pub fn set_palbank(&mut self, palbank: u16) {
-    self.attr2 &= !0b1111_0000_0000_0000;
-    self.attr2 |= (palbank & 0b1111) << 0xC;
-  }
-}
-#}
diff --git a/docs/ch03/screenshot_checkers.png b/docs/ch03/screenshot_checkers.png
deleted file mode 100644
index dc6d71a..0000000
Binary files a/docs/ch03/screenshot_checkers.png and /dev/null differ
diff --git a/docs/ch03/tile_data.html b/docs/ch03/tile_data.html
deleted file mode 100644
index d5e0f24..0000000
--- a/docs/ch03/tile_data.html
+++ /dev/null
@@ -1,314 +0,0 @@

Tile Data

-

When using the GBA's hardware graphics, if you want to let the hardware do most -of the work you have to use Modes 0, 1 or 2. However, to do that we first have -to learn about how tile data works inside of the GBA.

-

Tiles

-

Fundamentally, a tile is an 8x8 image. If you want anything bigger than 8x8 you -need to arrange several tiles so that it looks like whatever you're trying to -draw.

-

As was already mentioned, the GBA supports two different color modes: 4 bits per -pixel and 8 bits per pixel. This means that we have two types of tile that we -need to model. The pixel bits always represent an index into the PALRAM.

-
    -
  • With 4 bits per pixel, the PALRAM is imagined to be 16 palbank sections of -16 palette entries each. The image data selects the index within the palbank, -and an external configuration selects which palbank is used.
  • -
  • With 8 bits per pixel, the PALRAM is imagined to be a single 256 entry array -and the index just directly picks which of the 256 colors is used.
  • -
-

Knowing this, we can write the following definitions:

-

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Debug, Clone, Copy, Default)]
-#[repr(transparent)]
-pub struct Tile4bpp {
-  pub data: [u32; 8]
-}
-
-#[derive(Debug, Clone, Copy, Default)]
-#[repr(transparent)]
-pub struct Tile8bpp {
-  pub data: [u32; 16]
-}
-#}
-

I hope this makes sense so far. At 4bpp, we have 4 bits per pixel, times 8 pixels per line, times 8 lines: 256 bits required. Similarly, at 8 bits per pixel we'll need 512 bits. Why are we defining them as arrays of u32 values? Because when it comes time to do bulk copies the fastest way to do it will be to go one whole machine word at a time. If we make the data inside the type be an array of u32 then it'll already be aligned for fast u32 bulk copies.
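To show how individual pixels sit inside those u32 values, here's a sketch of pixel accessors for a 4bpp tile. It assumes each u32 is one row of the tile and that the leftmost pixel is in the lowest 4 bits (this pixel order comes from GBATEK, not from anything stated above). The `pixel` and `set_pixel` methods are hypothetical additions; the struct is repeated only so that the example stands alone.

```rust
#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct Tile4bpp {
  pub data: [u32; 8],
}

impl Tile4bpp {
  /// Palette index (0..16) of the pixel at column `x`, row `y`.
  pub fn pixel(&self, x: usize, y: usize) -> u8 {
    assert!(x < 8 && y < 8);
    ((self.data[y] >> (4 * x)) & 0xF) as u8
  }

  /// Sets the palette index of the pixel at column `x`, row `y`.
  pub fn set_pixel(&mut self, x: usize, y: usize, index: u8) {
    assert!(x < 8 && y < 8 && index < 16);
    let shift = 4 * x;
    // Clear the pixel's 4 bits, then write the new index into them.
    self.data[y] = (self.data[y] & !(0xF_u32 << shift)) | ((index as u32) << shift);
  }
}
```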

-

Keeping track of the current color depth is naturally the programmer's -problem. If you get it wrong you'll see a whole ton of garbage pixels all over -the screen, and you'll probably be able to guess why. You know, unless you did -one of the other things that can make a bunch of garbage pixels show up all over -the screen. Graphics programming is fun like that.

-

Charblocks

-

Tiles don't just sit on their own, they get grouped into charblocks. Long -ago in the distant past, video games were built with hardware that was also used -to make text terminals. So tile image data was called "character data". In fact -some guides will even call the regular mode for the background layers "text -mode", despite the fact that you obviously don't have to show text at all.

-

A charblock is 16kb long (0x4000 bytes), which means that the number of tiles -that fit into a charblock depends on your color depth. With 4bpp you get 512 -tiles, and with 8bpp there's 256 tiles. So they'd be something like this:

-

-# #![allow(unused_variables)]
-#fn main() {
-#[derive(Clone, Copy)]
-#[repr(transparent)]
-pub struct Charblock4bpp {
-  pub data: [Tile4bpp; 512],
-}
-
-#[derive(Clone, Copy)]
-#[repr(transparent)]
-pub struct Charblock8bpp {
-  pub data: [Tile8bpp; 256],
-}
-#}
-

You'll note that we can't even derive Debug or Default any more because the -arrays are so big. Rust supports Clone and Copy for arrays of any size, but the -rest is still size 32 or less. We won't generally be making up an entire -Charblock on the fly though, so it's not a big deal. If we absolutely had to, -we could call core::mem::zeroed(), but we really don't want to be trying to -build a whole charblock at runtime. We'll usually want to define our tile data -as const charblock values (or even parts of charblock values) that we then -load out of the game pak ROM at runtime.

-

Anyway, with 16k per charblock and only 96k total in VRAM, it's easy math to see -that there's 6 different charblocks in VRAM when in a tiled mode. The first four -of these are for backgrounds, and the other two are for objects. There's rules -for how a tile ID on a background or object selects a tile within a charblock, -but since they're different between backgrounds and objects we'll cover that on -their own pages.

-

Image Editing

-

It's very important to note that if you use a normal image editor you'll get -very bad results if you translate that directly into GBA memory.

-

Imagine you have part of an image that's 16 by 16 pixels, aka 2 tiles by 2 -tiles. The data for that bitmap is the 1st row of the 1st tile, then the 1st row -of the 2nd tile. However, when we translate that into the GBA, the first 8 -pixels will indeed be the first 8 tile pixels, but then the next 8 pixels in -memory will be used as the 2nd row of the first tile, not the 1st row of the -2nd tile.

-

So, how do we fix this?

-

Well, the simple but annoying way is to edit your tile image as being an 8 pixel wide image and then have the image get super tall as you add more and more tiles. It can work, but it's really impractical if you have any multi-tile things that you're trying to do.

-

Instead, there are some image conversion tools that devkitpro provides in their gba-dev section. They let you take normal images, repackage them, and export them in various formats that you can then compile into your project.

-

Ketsuban uses the grit tool, with the following suggestions:

-
    -
  1. Include an actual resource file and a file describing it somewhere in your project (see the grit manual for all details involved here).
  2. In a build.rs you run grit on each resource+description pair, such as in this old gist example.
  3. Then within your rust code you use the include_bytes! macro to have the formatted resource be available as a const value you can load at runtime.
- -
- - -
-
- - - -
diff --git a/docs/concepts/bios.html b/docs/concepts/bios.html
new file mode 100644
index 0000000..a03d8c9
--- /dev/null
+++ b/docs/concepts/bios.html
@@ -0,0 +1,200 @@
+BIOS - Rust GBA Guide
+ +
+ + + + + + + + + + +
+
+

BIOS

+ +
+ + +
+
+ + + +
diff --git a/docs/concepts/cpu.html b/docs/concepts/cpu.html
new file mode 100644
index 0000000..707acaf
--- /dev/null
+++ b/docs/concepts/cpu.html
@@ -0,0 +1,200 @@
+CPU - Rust GBA Guide
+ +
+ + + + + + + + + + +
+
+

CPU

+ +
+ + +
+
+ + + +
diff --git a/docs/concepts/index.html b/docs/concepts/index.html
new file mode 100644
index 0000000..345d15a
--- /dev/null
+++ b/docs/concepts/index.html
@@ -0,0 +1,200 @@
+Broad Concepts - Rust GBA Guide
+ +
+ + + + + + + + + + +
+
+

Broad Concepts

+ +
+ + +
+
+ + + +
diff --git a/docs/ch01/io_registers.html b/docs/concepts/io-registers.html
similarity index 66%
rename from docs/ch01/io_registers.html
rename to docs/concepts/io-registers.html
index 6f60d8b..e858342 100644
--- a/docs/ch01/io_registers.html
+++ b/docs/concepts/io-registers.html
@@ -72,7 +72,7 @@
@@ -137,52 +137,19 @@

IO Registers

-

The GBA has a large number of IO Registers (not to be confused with CPU registers). These are special memory locations from 0x04000000 to 0x040003FE. GBATEK has a full list, but we only need to learn about a few of them at a time as we go, so don't be worried.

-

The important facts to know about IO Registers are these:

-
    -
  • Each has its own specific size. Most are u16, but some are u32.
  • All of them must be accessed in a volatile style.
  • Each register is specifically readable or writable or both. Actually, with some registers there are even individual bits that are read-only or write-only.
    • If you write to a read-only position, those writes are simply ignored. This mostly matters if a writable register contains a read-only bit (such as the Display Control, next section).
    • If you read from a write-only position, you get back values that are basically nonsense. There aren't really any registers that mix writable bits with read only bits, so you're basically safe here. The only (mild) concern is that when you write a value into a write-only register, you need to keep a copy of that value somewhere else if you want to know what you wrote later (such as to adjust an offset value by +1, or whatever).
    • You can always check GBATEK to be sure, but if I don't mention it then a bit is probably both read and write.
  • Some registers have invalid bit patterns. For example, the lowest three bits of the Display Control register can't legally be set to the values 6 or 7.
-
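As a rough illustration of the points in the list above, here is a minimal sketch of volatile register access, plus a "shadow copy" for write-only registers. The `VolReg16` and `ShadowedReg16` types are hypothetical names invented for this sketch, not an API from this book or from any crate; only `core::ptr::read_volatile` and `core::ptr::write_volatile` are real.

```rust
use core::ptr::{read_volatile, write_volatile};

/// Hypothetical handle to a 16-bit IO register.
#[derive(Clone, Copy)]
pub struct VolReg16(*mut u16);

impl VolReg16 {
  /// # Safety
  /// `addr` must be aligned and valid for volatile `u16` reads and writes
  /// for as long as the handle is in use.
  pub unsafe fn new(addr: *mut u16) -> Self {
    VolReg16(addr)
  }

  /// Volatile read: the compiler is not allowed to optimize it away.
  pub fn read(self) -> u16 {
    unsafe { read_volatile(self.0) }
  }

  /// Volatile write: the compiler is not allowed to optimize it away.
  pub fn write(self, value: u16) {
    unsafe { write_volatile(self.0, value) }
  }
}

/// A write-only register paired with a copy of the last value written,
/// since reading the register itself would give back nonsense.
pub struct ShadowedReg16 {
  reg: VolReg16,
  shadow: u16,
}

impl ShadowedReg16 {
  /// Writes `initial` so the register and the shadow copy start in sync.
  pub fn new(reg: VolReg16, initial: u16) -> Self {
    reg.write(initial);
    ShadowedReg16 { reg, shadow: initial }
  }

  /// Computes a new value from the last one written and writes it out.
  pub fn update(&mut self, f: impl FnOnce(u16) -> u16) {
    self.shadow = f(self.shadow);
    self.reg.write(self.shadow);
  }

  /// The value most recently written to the register.
  pub fn last_written(&self) -> u16 {
    self.shadow
  }
}
```

On real hardware the pointer would be one of the register addresses listed in GBATEK; on a desktop machine you can point it at an ordinary `u16` variable to try the logic out. Adjusting an offset by +1 then becomes something like `offset.update(|v| v.wrapping_add(1))`.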

When talking about bit positions, the numbers are zero indexed just like an array index is.