No more old book stuff (#123)

* stop with the book, we should focus on the crate.
* Update README.md
* Update README.md
2025-01-11 03:21:30 +11:00 · 2021-04-05 18:11:42 -06:00 · 2021-04-05 18:11:42 -06:00 · 8efef6ebc5
parent 99f80d2b9a
commit 8efef6ebc5
54 changed files with 32 additions and 6528 deletions
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "gba"
-description = "A crate (and book) for making GBA games with Rust."
+description = "A crate for making GBA games with Rust."
 version = "0.4.0-pre1"
 authors = ["Lokathor <zefria@gmail.com>", "Thomas Winwood <twwinwood@gmail.com>"]
 repository = "https://github.com/rust-console/gba"
--- a/README.md
+++ b/README.md
@@ -11,43 +11,45 @@
 # gba
-_Eventually_ there will be a full [Tutorial
+A crate to make GBA programming easy.
 Book](https://rust-console.github.io/gba/) that goes along with this crate.
 However, currently the development focus is leaning towards having minimal
 coverage of all the parts of the GBA. Until that's done, unfortunately the book
 will be in a rather messy state.
-## What's Missing
+Currently we don't have as much documentation as we'd like.
 If you check out the [awesome-gbadev](https://github.com/gbdev/awesome-gbadev) repository they have many resources, though most are oriented towards C.
-The following major GBA features are still missing from the crate:
+## First Time Setup
-* Affine Graphics
+Building for the GBA requires Nightly rust, and also uses the `build-std` feature, so you'll need the rust source available.
 * Interrupt Handling
 * Serial Communication
 ## Build Dependencies
 Install required cargo packages
 ```sh
 rustup install nightly
 rustup +nightly component add rust-src
 ```
 You'll also need the ARM binutils so that you can have the assembler and linker for the ARMv4T architecture.
 The way to get them varies by platform:
 * Ubuntu and other debian-like linux distros will usually have them in the package manager.
  ```shell
  sudo apt-get install binutils-arm-none-eabi
  ```
 * With OSX you can get them via homebrew.
  ```shell
  brew install --cask gcc-arm-embedded
  ```
 * On Windows you can get the installer from ARM's website and run that.
  * Download the [GNU Arm Embedded Toolchain](https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads)
  * When installing the toolchain, make sure to select "Add path to environment variable" during install.
  * You'll have to restart any open command prompts after you run the installer so that they see the new PATH value.
 Finally, rustc itself is only able to make ELF format files. These can be run in emulators, but aren't able to be played on actual hardware.
 You'll need to convert the ELF file into a GBA rom. There's a `cargo-make` file in this repository to do this, and it relies on a tool called `gbafix`
 to assign the right header data to the ROM when packing it.
 ```sh
 cargo install cargo-make
 cargo install gbafix
 ```
-Install arm build tools
+<!--
 * Ubuntu
  ```shell
  sudo apt-get install binutils-arm-none-eabi
  ```
 * OSX
  ```shell
  brew install --cask gcc-arm-embedded
  ```
 * Windows
  * Download the [GNU Arm Embedded Toolchain](https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads)
  * Install the toolchain, make sure to select "Add path to environment variable" during install
 ## First Time Setup
 Writing a Rust program for the GBA requires a fair amount of special setup. All
@@ -61,8 +63,9 @@ project started quickly we got you covered:
 ```sh
 curl https://raw.githubusercontent.com/rust-console/gba/master/init.sh -sSf | bash -s APP_NAME
 ```
 -->
 # Contribution
-This crate is Apache2 licensed and any contributions you submit must also be
+This crate is tri-licensed under Zlib / Apache-2.0 / MIT.
-Apache2 licensed.
+Any contributions you submit must be licensed the same.
--- a/book/book.toml
+++ b/book/book.toml
@@ -1,7 +0,0 @@
 [book]
 title = "Rust GBA Guide"
 authors = ["Lokathor"]
 [build]
 build-dir = "../target/book-output"
 create-missing = true
--- a/book/src-bak/00-concepts-index.md
+++ b/book/src-bak/00-concepts-index.md
@@ -1,38 +0,0 @@
 # Broad Concepts
 The GameBoy Advance sits in a middle place between the chthonic game consoles of
 the ancient past and the "small PC in a funny case" consoles of the modern age.
 On the one hand, yeah, you're gonna find a few strange conventions as you learn
 all the ropes.
 On the other, at least we're writing in Rust at all, and not having to do all
 the assembly by hand.
 This chapter for "concepts" has a section for each part of the GBA's hardware
 memory map, going by increasing order of base address value. The sections try to
 explain as much as possible while sticking to just the concerns you might have
 regarding that part of the memory map.
 For an assessment of how to wrangle all three parts of the video system (PALRAM,
 VRAM, and OAM), along with the correct IO registers, into something that shows a
 picture, you'll want the Video chapter.
 Similarly, the "IO Registers" part of the GBA actually controls how you interact
 with every single bit of hardware connected to the GBA. A full description of
 everything is obviously too much for just one section of the book. Instead you
 get an overview of general IO register rules and advice. Each particular
 register is described in the appropriate sections of either the Video or
 Non-Video chapters.
 ## Bus Size
 TODO: describe this
 ## Minimum Write Size
 TODO: talk about parts where you can't write one byte at a time
 ## Volatile or Not?
 TODO: discuss what memory should be used volatile style and what can be used normal style.
--- a/book/src-bak/00-introduction-index.md
+++ b/book/src-bak/00-introduction-index.md
@@ -1,21 +0,0 @@
 # Introduction
 This is the book for learning how to write GameBoy Advance (GBA) games in Rust.
 I'm **Lokathor**, the main author of the book. There's also **Ketsuban** who
 provides the technical advisement, reviews the PRs, and keeps my crazy in check.
 The book is a work in progress, as you can see if you actually try to open many
 of the pages listed in the Table Of Contents.
 ## Feedback
 It's very often hard to tell when you've explained something properly. In the
 same way that your brain will read over small misspellings and correct things
 into the right word, if an explanation for something you already understand
 accidentally skips over some small detail then your brain can fill in the gaps
 without you realizing it.
 **Please**, if things don't make sense then [file an
 issue](https://github.com/rust-console/gba/issues) about it so I know where
 things need to improve.
--- a/book/src-bak/00-non-video-index.md
+++ b/book/src-bak/00-non-video-index.md
@@ -1,21 +0,0 @@
 # Non-Video
 Besides video effects the GBA still has an okay amount of stuff going on.
 Obviously you'll want to know how to read the user's button inputs. That can
 almost go without saying, except that I said it.
 Each other part can be handled in about any order you like.
 Using interrupts is perhaps one of the hardest things for us as Rust programmers
 due to quirks in our compilation process. Our code all gets compiled to 16-bit
 THUMB instructions, and we don't have a way to mark a function to be compiled
 using 32-bit ASM instructions instead. However, an interrupt handler _must_ be
 written in 32-bit ASM instructions for it to work. That means that we have to
 write our interrupt handler in 32-bit ASM by hand. We'll do it, but I don't
 think we'll be too happy about it.
 The Link Cable related stuff is also probably a little harder to test than
 anything else. Just because link cable emulation isn't always the best, and/or
 you need two GBAs with two flash carts and the cable for hardware testing.
 Still, we'll try to go over it eventually.
--- a/book/src-bak/00-quirks-index.md
+++ b/book/src-bak/00-quirks-index.md
@@ -1,9 +0,0 @@
 # Quirks
 The GBA supports a lot of totally normal Rust code exactly like you'd think.
 However, it also is missing a lot of what you might expect, and sometimes we
 have to do things in slightly weird ways.
 We start the book by covering the quirks our code will have, just to avoid too
 many surprises later.
--- a/book/src-bak/00-video-index.md
+++ b/book/src-bak/00-video-index.md
@@ -1,9 +0,0 @@
 # Video
 GBA Video starts with an IO register called the "Display Control Register", and
 then spirals out from there. You generally have to use Palette RAM (PALRAM),
 Video RAM (VRAM), Object Attribute Memory (OAM), as well as any number of other
 IO registers.
 They all have to work together just right, and there's a lot going on when you
 first try doing it, so try to take it very slowly as you're learning each step.
--- a/book/src-bak/01-buttons.md
+++ b/book/src-bak/01-buttons.md
@@ -1,102 +0,0 @@
 # Buttons
 It's all well and good to just show a picture, even to show an animation, but if
 we want a game we have to let the user interact with something.
 ## Key Input Register
 * KEYINPUT, `0x400_0130`, `u16`, read only
 This little `u16` stores the status of _all_ the buttons on the GBA, all at
 once. There's only 10 of them, and we have 16 bits to work with, so that sounds
 easy. However, there's a bit of a catch. The register follows a "low-active"
 convention, where pressing a button _clears_ that bit until it's released.
 ```rust
 const NO_BUTTONS_PRESSED: u16 = 0b0000_0011_1111_1111;
 ```
 The buttons are, going up in order from the 0th bit:
 * A
 * B
 * Select
 * Start
 * Right
 * Left
 * Up
 * Down
 * R
 * L
 Bits above that are not used. However, since the left and right directions, as
 well as the up and down directions, can never be pressed at the same time, the
 `KEYINPUT` register should never read as zero. Of course, the register _might_
 read as zero if someone is using an emulator that allows for such inputs, so I
 wouldn't go so far as to make it be `NonZeroU16` or anything like that.
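The bit order above can be written down as constants. The following is a minimal sketch, not the `gba` crate's API: the constant names and the `is_pressed_raw` helper are made up for illustration. It checks a button directly against the raw, low-active value, where a cleared bit means "pressed".

```rust
// Bit positions in the order listed above, lowest bit first.
// These names are illustrative only.
pub const A: u16 = 1 << 0;
pub const B: u16 = 1 << 1;
pub const SELECT: u16 = 1 << 2;
pub const START: u16 = 1 << 3;
pub const RIGHT: u16 = 1 << 4;
pub const LEFT: u16 = 1 << 5;
pub const UP: u16 = 1 << 6;
pub const DOWN: u16 = 1 << 7;
pub const R: u16 = 1 << 8;
pub const L: u16 = 1 << 9;

/// Checks one button against the raw, low-active register value:
/// the button is pressed when its bit is clear.
pub fn is_pressed_raw(raw_keyinput: u16, button_bit: u16) -> bool {
  (raw_keyinput & button_bit) == 0
}
```

All ten constants OR'd together give `0b0000_0011_1111_1111`, the same value as `NO_BUTTONS_PRESSED`.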
 When programming, we usually are thinking of what buttons we want to have _be
 pressed_ instead of buttons we want to have _not be pressed_. This means that we
 need an inversion to happen somewhere along the line. The easiest moment of
 inversion is immediately as you read in from the register and wrap the value up
 in a newtype.
 ```rust
 pub fn read_key_input() -> KeyInput {
  KeyInput(KEYINPUT.read() ^ 0b0000_0011_1111_1111)
 }
 ```
 Now the KeyInput you get can be checked for what buttons are pressed by checking
 for a set bit like you'd do anywhere else.
 ```rust
 impl KeyInput {
  pub fn a_pressed(self) -> bool {
    (self.0 & A_BIT) > 0
  }
 }
 ```
 Note that the current `KEYINPUT` value changes in real time as the user presses
 or releases the buttons. To account for this, it's best to read the value just
 once per game frame and then use that single value as if it was the input across
 the whole frame. If you've worked with polling input before that should sound
 totally normal. If not, just remember to call `read_key_input` once per frame
 and then use that `KeyInput` value across the whole frame.
 ### Detecting New Presses
 The keypad only tells you what's _currently_ pressed, but if you want to check
 what's _newly_ pressed it's not too much harder.
 All that you do is store the last frame's keys and compare them to the current
 keys with an `XOR`. In the `gba` crate it's called `KeyInput::difference`. Once
 you've got the difference between last frame and this frame, you know what
 changes happened.
 * If something is in the difference and _not pressed_ in the last frame, that
  means it was newly pressed.
 * If something is in the difference and _pressed_ in the last frame that means
  it was newly released.
 * If something is not in the difference then there's no change between last
  frame and this frame.
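The three cases above can be sketched with a small stand-in type. `Keys` here is a hypothetical newtype holding the already-inverted value (set bit means pressed); only the name `difference` is taken from the text, and `newly_pressed` / `newly_released` are names invented for this sketch.

```rust
/// Key state after inversion: a set bit means "pressed".
/// A stand-in for illustration, not the crate's `KeyInput`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Keys(pub u16);

pub const A_BIT: u16 = 1 << 0;
pub const B_BIT: u16 = 1 << 1;

impl Keys {
  /// Bits that changed between the two frames (an XOR).
  pub fn difference(self, other: Keys) -> Keys {
    Keys(self.0 ^ other.0)
  }

  /// In the difference and not pressed last frame: newly pressed.
  pub fn newly_pressed(self, last: Keys) -> Keys {
    Keys(self.difference(last).0 & !last.0)
  }

  /// In the difference and pressed last frame: newly released.
  pub fn newly_released(self, last: Keys) -> Keys {
    Keys(self.difference(last).0 & last.0)
  }
}
```

If A was held last frame and only B is held this frame, the difference contains both bits, `newly_pressed` contains B, and `newly_released` contains A.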
 ## Key Interrupt Control
 * KEYCNT, `0x400_0132`, `u16`, read/write
 This lets you control what keys will trigger a keypad interrupt. Of course, for
 the actual interrupt to fire you also need to set the `IME` and `IE` registers
 properly. See the [Interrupts](05-interrupts.md) section for details there.
 The main thing to know about this register is that the keys are in _the exact
 same order_ as the key input order. However, with this register they use a
 high-active convention instead (eg: the bit is active when the button should be
 pressed as part of the interrupt).
 In addition to simply having the bits for the buttons, bit 14 is a flag for
 enabling keypad interrupts (in addition to the flag in the `IE` register), and
 bit 15 decides how having more than one button works. If bit 15 is disabled,
 it's an OR combination (eg: "press any key to continue"). If bit 15 is enabled
 it's an AND combination (eg: "press A+B+Start+Select to reset").
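As a rough sketch of the layout just described, the value to write into `KEYCNT` could be assembled like this. The function and constant names are hypothetical, and the actual register write (a volatile store to `0x400_0132`) is left out.

```rust
/// Bit 14: enable the keypad interrupt (illustrative name).
pub const KEYCNT_IRQ_ENABLE: u16 = 1 << 14;
/// Bit 15: clear = OR combination, set = AND combination (illustrative name).
pub const KEYCNT_AND_MODE: u16 = 1 << 15;

/// Builds a KEYCNT value from a high-active key mask.
/// `keys` uses the same bit order as KEYINPUT; bits above bit 9 are dropped.
pub fn keycnt_value(keys: u16, require_all: bool) -> u16 {
  let mut out = (keys & 0b0000_0011_1111_1111) | KEYCNT_IRQ_ENABLE;
  if require_all {
    out |= KEYCNT_AND_MODE;
  }
  out
}
```

For "press A+B+Start+Select to reset" that would be `keycnt_value(0b1111, true)`, and for "press any key to continue" it would be `keycnt_value(0b0000_0011_1111_1111, false)`.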
--- a/book/src-bak/01-cpu.md
+++ b/book/src-bak/01-cpu.md
@@ -1 +0,0 @@
 # CPU
--- a/book/src-bak/01-no_std.md
+++ b/book/src-bak/01-no_std.md
@@ -1,160 +0,0 @@
 # No Std
 First up, as you already saw in the `hello_magic` code, we have to use the
 `#![no_std]` outer attribute on our program when we target the GBA. You can find
 some info about `no_std` in two official sources:
 * [unstable
  book section](https://doc.rust-lang.org/unstable-book/language-features/lang-items.html#writing-an-executable-without-stdlib)
 * [embedded
  book section](https://rust-embedded.github.io/book/intro/no-std.html?highlight=no_std#a--no_std--rust-environment)
 The unstable book is borderline useless here because it's describing too many
 things in too many words. The embedded book is much better, but still fairly
 terse.
 ## Bare Metal
 The GBA falls under what the Embedded Book calls "Bare Metal Environments".
 Basically, the machine powers on and immediately begins executing some ASM code.
 Our ASM startup was provided by `Ketsuban` (check the `crt0.s` file). We'll go
 over _how_ it works much later on, for now it's enough to know that it does
 work, and eventually control passes into Rust code.
 On the rust code side of things, we determine our starting point with the
 `#[start]` attribute on our `main` function. The `main` function also has a
 specific type signature that's different from the usual `main` that you'd see in
 Rust. I'd tell you to read the unstable-book entry on `#[start]` but they
 [literally](https://doc.rust-lang.org/unstable-book/language-features/start.html)
 just tell you to look at the [tracking issue for
 it](https://github.com/rust-lang/rust/issues/29633) instead, and that's not very
 helpful either. Basically it just _has_ to be declared the way it is, even
 though there's nothing passing in the arguments and there's no place that the
 return value will go. The compiler won't accept it any other way.
 ## No Standard Library
 The Embedded Book tells us that we can't use the standard library, but we get
 access to something called "libcore", which sounds kinda funny. What they're
 talking about is just [the core
 crate](https://doc.rust-lang.org/core/index.html), which is called `libcore`
 within the rust repository for historical reasons.
 The `core` crate is actually still a really big portion of Rust. The standard
 library doesn't actually hold too much code (relatively speaking), instead it
 just takes code from other crates and then re-exports it in an organized way. So
 with just `core` instead of `std`, what are we missing?
 In no particular order:
 * Allocation
 * Clock
 * Network
 * File System
 The allocation system and all the types that you can use if you have a global
 allocator are neatly packaged up in the
 [alloc](https://doc.rust-lang.org/alloc/index.html) crate. The rest isn't as
 nicely organized.
 It's _possible_ to implement a fair portion of the entire standard library
 within a GBA context and make the rest just panic if you try to use it. However,
 do you really need all that? Eh... probably not?
 * We don't need a file system, because all of our data is just sitting there in
  the ROM for us to use. When programming we can organize our `const` data into
  modules and such to keep it organized, but once the game is compiled it's just
  one huge flat address space. TODO: Parasyte says that a FS can be handy even
  if it's all just ReadOnly, so we'll eventually talk about how you might set up
  such a thing I guess, since we'll already be talking about replacements for
  three of the other four things we "lost". Maybe we'll make Parasyte write that
  section.
 * Networking, well, the GBA has a Link Cable you can use to communicate with
  another GBA, but it's not really like a unix socket with TCP, so the standard
  Rust networking isn't a very good match.
 * Clock is actually two different things at once. One is the ability to store
  the time long term, which is a bit of hardware that some gamepaks have in them
  (eg: pokemon ruby/sapphire/emerald). The GBA itself can't keep time while
  power is off. However, the second part is just tracking time moment to moment,
  which the GBA can totally do. We'll see how to access the timers soon enough.
 Which just leaves us with allocation. Do we need an allocator? Depends on your
 game. For demos and small games you probably don't need one. For bigger games
 you'll maybe want to get an allocator going eventually. It's in some sense a
 crutch, but it's a very useful one.
 So I promise that at some point we'll cover how to get an allocator going.
 Either a Rust Global Allocator (if practical), which would allow for a lot of
 the standard library types to be used "for free" once it was set up, or just a
 custom allocator that's GBA specific if Rust's global allocator style isn't a
 good fit for the GBA (I honestly haven't looked into it).
 ## Bare Metal Panic
 If our code panics, we usually want to see that panic message. Unfortunately,
 without a way to access something like `stdout` or `stderr` we've gotta do
 something a little weirder.
 If our program is running within the `mGBA` emulator, version 0.7 or later, we
 can access a special set of addresses that allow us to send out `CString`
 values, which then appear within a message log that you can check.
 We can capture this behavior by making an `MGBADebug` type, and then implement
 `core::fmt::Write` for that type. Once done, the `write!` macro will let us
 target the mGBA debug output channel.
 When used, it looks like this:
 ```rust
 #[panic_handler]
 fn panic(info: &core::panic::PanicInfo) -> ! {
  use core::fmt::Write;
  use gba::mgba::{MGBADebug, MGBADebugLevel};
  if let Some(mut mgba) = MGBADebug::new() {
    let _ = write!(mgba, "{}", info);
    mgba.send(MGBADebugLevel::Fatal);
  }
  loop {}
 }
 ```
 If you want to follow the particulars you can check the `MGBADebug` source in
 the `gba` crate. Basically, there's one address you can use to try and activate
 the debug output, and if it works you write your message into the "array" at
 another address, and then finally write a send value to a third address. You'll
 need to have read the [volatile](03-volatile_destination.md) section for the
 details to make sense.
 ## LLVM Intrinsics
 The above code will make your program fail to build in debug mode, saying that
 `__clzsi2` can't be found. This is a special builtin function that LLVM attempts
 to use when there's no hardware version of an operation it wants to do (in this
 case, counting the leading zeros). It's not _actually_ necessary in this case,
 which is why you only need it in debug mode. The higher optimization level of
 release mode makes LLVM pre-compute more and fold more constants or whatever and
 then it stops trying to call `__clzsi2`.
 Unfortunately, sometimes a build will fail with a missing intrinsic even in
 release mode.
 If LLVM wants _core_ to have that intrinsic then you're in
 trouble, you'll have to send a PR to the
 [compiler-builtins](https://github.com/rust-lang-nursery/compiler-builtins)
 repository and hope to get it into rust itself.
 If LLVM wants _your code_ to have the intrinsic then you're in less trouble. You
 can look up the details and then implement it yourself. It can go anywhere in
 your program, as long as it has the right ABI and name. In the case of
 `__clzsi2` it takes a `usize` and returns a `usize`, so you'd write something
 like:
 ```rust
 #[no_mangle]
 pub extern "C" fn __clzsi2(mut x: usize) -> usize {
  //
 }
 ```
 And so on for whatever other missing intrinsic.
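For the leading-zero count specifically, the body could look something like the sketch below. It is written as an ordinary function named `clz32` (a name made up here) so it can be tried on a desktop machine without clashing with the real symbol. Returning 32 for an input of zero is a choice made for this sketch, not something the text above specifies. On the GBA, where `usize` is 32 bits, an `extern "C"` `__clzsi2` would presumably just forward to it.

```rust
/// Software count-leading-zeros for a 32-bit value.
/// Illustrative only: a simple loop, not tuned for speed.
pub fn clz32(mut x: u32) -> u32 {
  // No bit is set, so every one of the 32 bits is a leading zero.
  if x == 0 {
    return 32;
  }
  let mut count = 0;
  // Shift left until the top bit is set, counting the steps.
  while (x & 0x8000_0000) == 0 {
    x <<= 1;
    count += 1;
  }
  count
}
```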
--- a/book/src-bak/01-requirements.md
+++ b/book/src-bak/01-requirements.md
@@ -1,29 +0,0 @@
 # Reader Requirements
 This book naturally assumes that you've already read Rust's core book:
 * [The Rust Programming Language](https://doc.rust-lang.org/book/)
 Now, I _know_ it sounds silly to say "if you wanna program Rust on this old
 video game system you should already know how to program Rust", but the more
 people I meet and chat with the more they tell me that they jumped into Rust
 without reading any or all of the book. You know who you are.
 Please, read the whole book!
 In addition to the core book, there's also an expansion book that I will declare
 to be required reading for this:
 * [The Rustonomicon](https://doc.rust-lang.org/nomicon/)
 The Rustonomicon is all about trying to demystify `unsafe`. We'll end up using a
 fair bit of unsafe code as a natural consequence of doing direct hardware
 manipulations. Using unsafe is like [swinging a
 sword](https://www.zeldadungeon.net/wp-content/uploads/2013/04/tumblr_mlkpzij6T81qizbpto1_1280.gif),
 you should start slowly, practice carefully, and always pay attention no matter
 how experienced you think you've become.
 That said, it's sometimes a [necessary
 tool](https://www.youtube.com/watch?v=rTo2u13lVcQ) to get the job done, so you
 have to break out of the borderline pathological fear of using it that most rust
 programmers tend to have.
--- a/book/src-bak/01-rgb15.md
+++ b/book/src-bak/01-rgb15.md
@@ -1 +0,0 @@
 # RGB15 Color
--- a/book/src-bak/02-bios.md
+++ b/book/src-bak/02-bios.md
@@ -1,239 +0,0 @@
 # BIOS
 * **Address Span:** `0x0` to `0x3FFF` (16k)
 The [BIOS](https://en.wikipedia.org/wiki/BIOS) of the GBA is a small read-only
 portion of memory at the very base of the address space. However, it is also
 hardware protected against reading, so if you try to read from BIOS memory when
 the program counter isn't pointed into the BIOS (eg: any time code _you_ write
 is executing) then you get [basically garbage
 data](https://problemkaputt.de/gbatek.htm#gbaunpredictablethings) back.
 So we're not going to spend time here talking about what bits to read or write
 within BIOS memory like we do with the other sections. Instead we're going to
 spend time talking about [inline
 assembly](https://doc.rust-lang.org/unstable-book/language-features/asm.html)
 ([tracking issue](https://github.com/rust-lang/rust/issues/29722)) and then use
 it to call the [GBA BIOS
 Functions](https://problemkaputt.de/gbatek.htm#biosfunctions).
 Note that BIOS calls have _more overhead than normal function calls_, so don't
 go using them all over the place if you don't have to. They're also usually
 written more to be compact in terms of code than for raw speed, so you actually
 can out speed them in some cases. Between the increased overhead and not being
 as speed optimized, you can sometimes do a faster job without calling the BIOS
 at all. (TODO: investigate more about what parts of the BIOS we could
 potentially offer faster alternatives for.)
 I'd like to take a moment to thank [Marc Brinkmann](https://github.com/mbr)
 (with contributions from [Oliver Scherer](https://github.com/oli-obk) and
 [Philipp Oppermann](https://github.com/phil-opp)) for writing [this blog
 post](http://embed.rs/articles/2016/arm-inline-assembly-rust/). It's at least
 ten times the tutorial quality of the `asm` entry in the Unstable Book. In
 fairness to the Unstable Book, the actual spec of how inline ASM works in rust
 is "basically what clang does", and that's specified as "basically what GCC
 does", and that's basically/shockingly not specified much at all despite GCC
 being like 30 years old.
 So let's be slow and pedantic about this process.
 ## Inline ASM
 **Fair Warning:** The general information that follows regarding the asm macro
 is consistent from system to system, but specific information about register
 names, register quantities, asm instruction argument ordering, and so on is
 specific to ARM on the GBA. If you're programming for any other device you'll
 need to carefully investigate that before you begin.
 Now then, with that out of the way, the inline asm docs describe an asm call as
 looking like this:
 ```rust
 let x = 10u32;
 let y = 34u32;
 let result: u32;
 asm!(
  // assembly template
  "add {lhs}, {rhs}",
  lhs = inout(reg_thumb) x => result,
  rhs = in(reg_thumb) y,
  options(nostack, nomem),
 );
 // result == 44
 ```
 The `asm` macro follows the [RFC
 2873](https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md)
 syntax. The following is just a summary of the RFC.
 Now we have to decide what we're gonna write. Obviously we're going to do some
 instructions, but those instructions use registers, and how are we gonna talk
 about them? We've got two choices.
 1) We can pick each and every register used by specifying exact register names.
   In THUMB mode we have 8 registers available, named `r0` through `r7`. To use
    those registers you would write `in("r0") x` instead of
   `rhs = in(reg_thumb) x`, and directly refer to `r0` in the assembly template.
 2) We can specify slots for registers we need and let LLVM decide. This is what
   we do when we write `rhs = in(reg_thumb) y` and use `{rhs}` in the assembly
   template.
   The `reg_thumb` stands for the register class we are using. Since we are
   in THUMB mode, the set of registers we can use is limited. `reg_thumb` tells
   LLVM: "use only registers available in THUMB mode". In 32-bit mode, you have
    access to more registers and you should use a different register class.
   The register classes [are described in the
   RFC](https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#register-operands).
   Look for "ARM" register classes.
 In the case of the GBA BIOS, each BIOS function has pre-designated input and
 output registers, so we will use the first style. If you use inline ASM in other
 parts of your code you're free to use the second style.
 ### Assembly
 This is just one big string literal. You write out one instruction per line, and
 excess whitespace is ignored. You can also do comments within your assembly
 using `@` to start a comment that goes until the end of the line (in ARM assembly `;` separates statements instead).
 Assembly convention doesn't consider it unreasonable to comment potentially as
 much as _every single line_ of asm that you write when you're getting used to
 things. Or even if you are used to things. This is cryptic stuff, there's a
 reason we avoid writing in it as much as possible.
 Remember that our Rust code is in 16-bit mode. You _can_ switch to 32-bit mode
 within your asm as long as you switch back by the time the block ends. Otherwise
 you'll have a bad time.
 ### Register bindings
 After the assembly string literal, you need to define your binding (which
 rust variables are getting into your registers and which ones are going to refer
 to their value afterward).
 There are many operand types [as per the
 RFC](https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#operand-type),
 but you will most often use:
 ```
 [alias =] in(<reg>) <binding> // input
 [alias =] out(<reg>) <binding> // output
 [alias =] inout(<reg>) <in binding> => <out binding> // both
 out(<reg>) _ // Clobber
 ```
 * The binding can be any single 32-bit or smaller value.
 * If your binding has bit pattern requirements ("must be non-zero", etc) you are
  responsible for upholding that.
 * If your binding type will try to `Drop` later then you are responsible for it
  being in a fit state to do that.
 * The binding must be either a mutable binding or a binding that was
  pre-declared but not yet assigned.
 * An input binding must be a single 32-bit or smaller value.
 * An input binding _should_ be a type that is `Copy` but this is not an absolute
  requirement. Having the input be read is semantically similar to using
  `core::ptr::read(&binding)` and forgetting the value when you're done.
 Anything else is UB.
 ### Clobbers
 Sometimes your asm will touch registers other than the ones declared for input
 and output. 
 Clobbers are declared as a comma separated list of string literals naming
 specific registers. You don't use curly braces with clobbers.
 LLVM _needs_ to know this information. It can move things around to keep your
 data safe, but only if you tell it what's about to happen.
 Failure to define all of your clobbers can cause UB.
 ### Options
 By default the compiler won't optimize the code you wrote in an `asm` block. You
 will need to specify with the `options(..)` parameter that your code can be
 optimized. The available options [are specified in the
 RFC](https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#options-1).
 An optimization might duplicate or remove your instructions from the final
 code.
 Typically when executing a BIOS call (such as `swi 0x01`, which resets the
 console), it's important that the instruction is executed, and not optimized
 away, even though it has no observable input and output to the compiler.
 However some BIOS calls, such as _some_ math functions, have no observable
 effects outside of the registers we specified, in this case, we instruct the
 compiler to optimize them.
 ### BIOS ASM
 * Inputs are always `r0`, `r1`, `r2`, and/or `r3`, depending on function.
 * Outputs are always zero or more of `r0`, `r1`, and `r3`.
 * Any of the output registers that aren't actually used should be marked as
  clobbered.
 * All other registers are unaffected.
 All of the GBA BIOS calls are performed using the
 [swi](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/BABFCEEG.html)
 instruction, combined with a value depending on what BIOS function you're trying
 to invoke. If you're in 16-bit code you use the value directly, and if you're in
 32-bit mode you shift the value up by 16 bits first.
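The rule for the `swi` value can be shown as a tiny helper. The name `swi_comment` and the `arm_mode` flag are assumptions for this sketch; in real code the value is normally written straight into the assembly string rather than computed at run time.

```rust
/// Returns the immediate to put after `swi` for a given BIOS function number.
/// THUMB (16-bit) code uses the number directly; ARM (32-bit) code
/// shifts it up by 16 bits first.
pub const fn swi_comment(bios_fn: u8, arm_mode: bool) -> u32 {
  if arm_mode {
    (bios_fn as u32) << 16
  } else {
    bios_fn as u32
  }
}
```

So the division function, number `0x06`, is `swi 0x06` from THUMB code and `swi 0x060000` from ARM code.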
 ### Example BIOS Function: Division
 For our example we'll use the division function, because GBATEK gives very clear
 instructions on how each register is used with that one:
 ```txt
 Signed Division, r0/r1.
  r0  signed 32bit Number
  r1  signed 32bit Denom
 Return:
  r0  Number DIV Denom ;signed
  r1  Number MOD Denom ;signed
  r3  ABS (Number DIV Denom) ;unsigned
 For example, incoming -1234, 10 should return -123, -4, +123.
 The function usually gets caught in an endless loop upon division by zero.
 ```
 The math folks tell me that the `r1` value should be properly called the
 "remainder" not the "modulus". We'll go with that for our function, doesn't hurt
 to use the correct names. Our Rust function has an assert against dividing by
 `0`, then we name some bindings _without_ giving them a value, we make the asm
 call, and then return what we got.
 ```rust
 pub fn div_rem(numerator: i32, denominator: i32) -> (i32, i32) {
  assert!(denominator != 0);
  let div_out: i32;
  let rem_out: i32;
  unsafe {
    asm!(
      "swi 0x06",
      inout("r0") numerator => div_out,
      inout("r1") denominator => rem_out,
      out("r3") _,
      options(nostack, nomem),
    );
  }
  (div_out, rem_out)
 }
 ```
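Since the `asm!` version only runs on the GBA itself, it can help to have a plain-Rust model of the three results GBATEK lists, for checking expectations on a desktop machine. This is a sketch under assumptions: the name `div_model` is made up, and the use of wrapping operations for the `i32::MIN / -1` case is a choice made here, not a claim about what the BIOS returns for that input.

```rust
/// Host-side model of the BIOS division results:
/// (r0: quotient, r1: remainder, r3: absolute value of the quotient).
pub fn div_model(numerator: i32, denominator: i32) -> (i32, i32, u32) {
  assert!(denominator != 0);
  // Rust's integer division truncates toward zero, and the remainder
  // takes the sign of the numerator, matching GBATEK's example.
  let quot = numerator.wrapping_div(denominator);
  let rem = numerator.wrapping_rem(denominator);
  (quot, rem, quot.unsigned_abs())
}
```

With GBATEK's example inputs, `div_model(-1234, 10)` gives `(-123, -4, 123)`.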
 I _hope_ this all makes sense by now.
 ## Specific BIOS Functions
 For a full list of all the specific BIOS functions and their use you should
 check the `gba::bios` module within the `gba` crate. There's just so many of
 them that enumerating them all here wouldn't serve much purpose.
 Which is not to say that we'll never cover any BIOS functions in this book!
 Instead, we'll simply mention them whenever they're relevant to the task at
 hand (such as controlling sound or waiting for vblank).
 //TODO: list/name all BIOS functions as well as what they relate to elsewhere.
--- a/book/src-bak/02-fixed_only.md
+++ b/book/src-bak/02-fixed_only.md
@ -1,548 +0,0 @@
 # Fixed Only
 In addition to not having much of the standard library available, we don't even
 have a floating point unit available! We can't do floating point math in
 hardware! We _could_ still do floating point math as pure software computations
 if we wanted, but that's a slow, slow thing to do.
 Are there faster ways? It's the same answer as always: "Yes, but not without a
 tradeoff."
 The faster way is to represent fractional values using a system called a [Fixed
 Point Representation](https://en.wikipedia.org/wiki/Fixed-point_arithmetic).
 What do we trade away? Numeric range.
 * Floating point math stores bits for base value and for exponent all according
  to a single [well defined](https://en.wikipedia.org/wiki/IEEE_754) standard
  for how such a complicated thing works.
 * Fixed point math takes a normal integer (either signed or unsigned) and then
  just "mentally associates" it (so to speak) with a fractional value for its
  "units". If you have 3 and it's in units of 1/2, then you have 3/2, or 1.5
  using decimal notation. If your number is 256 and it's in units of 1/256th
  then the value is 1.0 in decimal notation.
 Floating point math requires dedicated hardware to perform quickly, but it can
 "trade" precision when it needs to represent extremely large or small values.
 Fixed point math is just integral math, which our GBA is reasonably good at, but
 because your number is associated with a fixed fraction your results can get out
 of range very easily.
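To make the "units" idea concrete, here's a tiny host-side sketch. The helper name `fixed_to_f32` is an assumption made for this example, and it uses floats only to print what a raw value means; you wouldn't do this on the GBA itself.

```rust
/// Interprets `raw` as a count of `1 / 2^frac_bits` units.
/// Hypothetical helper for display and debugging on a host machine only.
pub fn fixed_to_f32(raw: i32, frac_bits: u32) -> f32 {
  raw as f32 / (1u32 << frac_bits) as f32
}
```

With this, `3` in units of 1/2 reads back as `1.5`, and `256` in units of 1/256 reads back as `1.0`. It also shows the range problem: an `i16` with 8 fractional bits can't even reach `128.0`.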
 ## Representing A Fixed Point Value
 So we want to associate our numbers with a mental note of what units they're in:
 * [PhantomData](https://doc.rust-lang.org/core/marker/struct.PhantomData.html)
  is a type that tells the compiler "please remember this extra type info" when
  you add it as a field to a struct. It goes away at compile time, so it's
  perfect for us to use as space for a note to ourselves without causing runtime
  overhead.
 * The [typenum](https://crates.io/crates/typenum) crate is the best way to
  represent a number within a type in Rust. Since our values on the GBA are
  always specified as a number of fractional bits to count the number as, we can
  put `typenum` types such as `U8` or `U14` into our `PhantomData` to keep track
  of what's going on.
 Now, those of you who know me, or perhaps just know my reputation, will of
 course _immediately_ question what happened to the real Lokathor. I do not care
 for most crates, and I particularly don't care for using a crate in teaching
 situations. However, `typenum` has a number of factors on its side that let me
 suggest it in this situation:
 * It's version 1.10 with a total of 21 versions and nearly 700k downloads, so we
  can expect that the major troubles have been shaken out and that it will remain
  fairly stable for quite some time to come.
 * It has no further dependencies that it's going to drag into the compilation.
 * It happens all at compile time, so it's not clogging up our actual game with
  any nonsense.
 * The (interesting) subject of "how do you do math inside Rust's trait system?" is
  totally separate from the concern that we're trying to focus on here.
 Therefore, we will consider it acceptable to use this crate.
 Now the `typenum` crate defines a whole lot, but we'll focus down to just a
 single type at the moment:
 [UInt](https://docs.rs/typenum/1.10.0/typenum/uint/struct.UInt.html) is a
 type-level unsigned value. It's like `u8` or `u16`, but while they're types that
 then have values, each `UInt` construction statically equates to a specific
 value. Like how the `()` type only has one value, which is also called `()`. In
 this case, you wrap up `UInt` around smaller `UInt` values and a `B1` or `B0`
 value to build up the binary number that you want at the type level.
 In other words, instead of writing
 ```rust
 let six = 0b110;
 ```
 We write
 ```rust
 type U6 = UInt<UInt<UInt<UTerm, B1>, B1>, B0>;
 ```
 Wild, I know. If you look into the `typenum` crate you can do math and stuff
 with these type level numbers, and we will a little bit below, but to start off
 we _just_ need to store one in some `PhantomData`.
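If the nesting looks like magic, here's a hand-rolled imitation of the idea using only `core`. To be clear, this is _not_ how `typenum` is actually implemented; the traits `Bit` and `TypeNum` and their `VALUE` constants are names invented for this sketch, and only the shape of `UInt`, `UTerm`, `B0`, and `B1` is borrowed from the crate.

```rust
use core::marker::PhantomData;

// Type-level bits and the "no more bits" terminator.
pub struct B0;
pub struct B1;
pub struct UTerm;
/// A type-level number: all the higher bits `U`, followed by one low bit `B`.
pub struct UInt<U, B>(PhantomData<(U, B)>);

/// Hypothetical trait: the value of a single type-level bit.
pub trait Bit {
  const VALUE: u32;
}
impl Bit for B0 {
  const VALUE: u32 = 0;
}
impl Bit for B1 {
  const VALUE: u32 = 1;
}

/// Hypothetical trait: the value of a whole type-level number.
pub trait TypeNum {
  const VALUE: u32;
}
impl TypeNum for UTerm {
  const VALUE: u32 = 0;
}
impl<U: TypeNum, B: Bit> TypeNum for UInt<U, B> {
  // Shift the higher bits up by one place and add the low bit.
  const VALUE: u32 = U::VALUE * 2 + B::VALUE;
}

/// The bits 1, 1, 0 from most to least significant.
pub type U6 = UInt<UInt<UInt<UTerm, B1>, B1>, B0>;
```

Here `<U6 as TypeNum>::VALUE` is worked out entirely by the compiler, and it comes to the same number as `0b110`.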
 ### A struct For Fixed Point
 Our actual type for a fixed point value looks like this:
 ```rust
 use core::marker::PhantomData;
 use typenum::marker_traits::Unsigned;
 /// Fixed point `T` value with `F` fractional bits.
 #[derive(Debug, Copy, Clone, Default, PartialEq, Eq, PartialOrd, Ord)]
 #[repr(transparent)]
 pub struct Fx<T, F: Unsigned> {
  num: T,
  phantom: PhantomData<F>,
 }
 ```
 This says that `Fx<T,F>` is a generic type that holds some base number type `T`
 and a `F` type that's marking off how many fractional bits we're using. We only
 want people giving unsigned type-level values for the `PhantomData` type, so we
 use the trait bound `F: Unsigned`.
 We use
 [repr(transparent)](https://github.com/rust-lang/rfcs/blob/master/text/1758-repr-transparent.md)
 here to ensure that `Fx` will always be treated just like the base type in the
 final program (in terms of bit pattern and ABI).
 If you go and check, this is _basically_ how the existing general purpose crates
 for fixed point math represent their numbers. They're a little fancier about it
 because they have to cover every case, and we only have to cover our GBA case.
 That's quite a bit to type though. We probably want to make a few type aliases
 for things to be easier to look at. Unfortunately there's [no standard
 notation](https://en.wikipedia.org/wiki/Fixed-point_arithmetic#Notation) for how
 you write a fixed point type. We also have to limit ourselves to what's valid
 for use in a Rust type too. I like the `fx` thing, so we'll use that for signed
 and then `fxu` if we need an unsigned value.
 ```rust
 /// Alias for an `i16` fixed point value with 8 fractional bits.
 pub type fx8_8 = Fx<i16,U8>;
 ```
 Rust will complain about having `non_camel_case_types`, and you can shut that
 warning up by putting an `#[allow(non_camel_case_types)]` attribute on the type
 alias directly, or you can use `#![allow(non_camel_case_types)]` at the very top
 of the module to shut up that warning for the whole module (which is what I
 did).
 ## Constructing A Fixed Point Value
 So how do we actually _make_ one of these values? Well, we can always just wrap or unwrap any value in our `Fx` type:
 ```rust
 impl<T, F: Unsigned> Fx<T, F> {
  /// Uses the provided value directly.
  pub fn from_raw(r: T) -> Self {
    Fx {
      num: r,
      phantom: PhantomData,
    }
  }
  /// Unwraps the inner value.
  pub fn into_raw(self) -> T {
    self.num
  }
 }
 ```
 I'd like to use the `From` trait of course, but it was giving me some trouble, I
 think because of the orphan rule. Oh well.
 If we want to be particular to the fact that these are supposed to be
 _numbers_... that gets tricky. Rust is actually quite bad at being generic about
 number types. You can use the [num](https://crates.io/crates/num) crate, or you
 can just use a macro and invoke it once per type. Guess what we're gonna do.
 ```rust
 macro_rules! fixed_point_methods {
  ($t:ident) => {
    impl<F: Unsigned> Fx<$t, F> {
      /// Gives the smallest positive non-zero value.
      pub fn precision() -> Self {
        Fx {
          num: 1,
          phantom: PhantomData,
        }
      }
      /// Makes a value with the integer part shifted into place.
      pub fn from_int_part(i: $t) -> Self {
        Fx {
          num: i << F::U8,
          phantom: PhantomData,
        }
      }
    }
  };
 }
 fixed_point_methods! {u8}
 fixed_point_methods! {i8}
 fixed_point_methods! {i16}
 fixed_point_methods! {u16}
 fixed_point_methods! {i32}
 fixed_point_methods! {u32}
 ```
 Now _you'd think_ that those can be `const`, but at the moment you can't have a
 `const` function with a bound on any trait other than `Sized`, so they have to
 be normal functions.
 Also, we're doing something a little interesting there with `from_int_part`. We
 can take our `F` type and get its constant value. There's other associated
 constants if we want it in other types, and also non-const methods if you wanted
 that for some reason (maybe passing it as a closure function? dunno).
 ## Casting Base Values
 Next, once we have a value in one base type we will need to be able to move it
 into another base type. Unfortunately this means we gotta use the `as` operator,
 which requires a concrete source type and a concrete destination type. There's
 no easy way for us to make it generic here.
 We could let the user use `into_raw`, cast, and then do `from_raw`, but that's
 error prone because they might change the fractional bit count accidentally.
 This means that we have to write a function that does the casting while
 perfectly preserving the fractional bit quantity. If we wrote one function for
 each conversion it'd be like 30 different possible casts (6 base types that we
 support, and then 5 possible target types). Instead, we'll write it just once in
 a way that takes a closure, and let the user pass a closure that does the cast.
 The compiler should merge it all together quite nicely for us once optimizations
 kick in.
 This code goes outside the macro. I want to avoid too much code in the macro if
 we can, it's a little easier to cope with I think.
 ```rust
  /// Casts the base type, keeping the fractional bit quantity the same.
  pub fn cast_inner<Z, C: Fn(T) -> Z>(self, op: C) -> Fx<Z, F> {
    Fx {
      num: op(self.num),
      phantom: PhantomData,
    }
  }
 ```
 It's horrible and ugly, but Rust is just bad at numbers sometimes.
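Here's roughly what calling it looks like. To keep the sketch self-contained it drops the `typenum` bound and uses a made-up marker type `Frac8` in place of `U8`; the `widen` function is also just a name picked for this example.

```rust
use core::marker::PhantomData;

/// Stand-in marker for "8 fractional bits" (hypothetical; the text uses typenum's `U8`).
pub struct Frac8;

/// Trimmed-down copy of the fixed point type, without the `Unsigned` bound.
pub struct Fx<T, F> {
  num: T,
  phantom: PhantomData<F>,
}

impl<T, F> Fx<T, F> {
  pub fn from_raw(r: T) -> Self {
    Fx { num: r, phantom: PhantomData }
  }
  pub fn into_raw(self) -> T {
    self.num
  }
  /// Casts the base type, keeping the fractional bit quantity the same.
  pub fn cast_inner<Z, C: Fn(T) -> Z>(self, op: C) -> Fx<Z, F> {
    Fx { num: op(self.num), phantom: PhantomData }
  }
}

/// Widens the base type; the `Frac8` marker can't change by accident.
pub fn widen(x: Fx<i16, Frac8>) -> Fx<i32, Frac8> {
  x.cast_inner(|n| n as i32)
}
```

The closure is the only place the `as` cast appears, and the return type pins the fractional bit marker to whatever it was before.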
 ## Adjusting Fractional Part
 In addition to the base value we might want to change our fractional bit
 quantity. This is actually easier than it sounds, but it also requires us to be
 tricky with the generics. We can actually use some typenum type level operators
 here.
 This code goes inside the macro: we need to be able to use the left shift and
 right shift, which is easiest when we just use the macro's `$t` as our type. We
 could alternately put a similar function outside the macro and be generic on `T`
 having the left and right shift operators by using a `where` clause. As much as
 I'd like to avoid too much code being generated by macro, I'd _even more_ like
 to avoid generic code with huge and complicated trait bounds. It comes down to
 style, and you gotta decide for yourself.
 ```rust
      /// Changes the fractional bit quantity, keeping the base type the same.
      pub fn adjust_fractional_bits<Y: Unsigned + IsEqual<F, Output = False>>(self) -> Fx<$t, Y> {
        let leftward_movement: i32 = Y::to_i32() - F::to_i32();
        Fx {
          num: if leftward_movement > 0 {
            self.num << leftward_movement
          } else {
            self.num >> (-leftward_movement)
          },
          phantom: PhantomData,
        }
      }
 ```
 There's a few things at work. First, we introduce `Y` as the target number of
 fractional bits, and we _also_ limit it that the target bits quantity can't be
 the same as we already have using a type-level operator. If it's the same as we
 started with, why are you doing the cast at all?
 Now, once we're sure that the current bits and target bits aren't the same, we
 compute `target - start`, and call this our "leftward movement". Example: if
 we're targeting 8 bits and we're at 4 bits, we do 8-4 and get +4 as our leftward
 movement. If the leftward_movement is positive we naturally shift our current
 value to the left. If it's not positive then it _must_ be negative because we
 eliminated 0 as a possibility using the type-level operator, so we shift to the
 right by the negative value.
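The same shifting logic can be tried out with plain runtime numbers. This sketch assumes a hypothetical free function `rescale` that takes the bit counts as ordinary arguments instead of type parameters, so unlike the type-level version it can't reject the "same bit count" case at compile time (it just shifts by zero).

```rust
/// Re-scales `raw` from `from_bits` fractional bits to `to_bits` fractional bits.
/// Hypothetical runtime analogue of the type-level method in the text.
pub fn rescale(raw: i32, from_bits: u32, to_bits: u32) -> i32 {
  let leftward_movement = to_bits as i32 - from_bits as i32;
  if leftward_movement > 0 {
    raw << leftward_movement
  } else {
    // Moving right throws away the lowest fractional bits.
    raw >> (-leftward_movement)
  }
}
```

For example, 1.5 with 4 fractional bits is `24`, and moving it to 8 fractional bits gives `384`. Going back the other way gives `24` again, but any bits below the new precision are lost.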
 ## Addition, Subtraction, Shifting, Negative, Comparisons
 From here on we're getting help from [this blog
 post](https://spin.atomicobject.com/2012/03/15/simple-fixed-point-math/) by [Job
 Vranish](https://spin.atomicobject.com/author/vranish/), so thank them if you
 learn something.
 I might have given away the game a bit with those `derive` traits on our fixed
 point type. For a fair number of operations you can use the normal form of the
 op on the inner bits as long as the fractional parts have the same quantity.
 This includes equality and ordering (which we derived) as well as addition,
 subtraction, and bit shifting (which we need to do ourselves).
 This code can go outside the macro, with sufficient trait bounds.
 ```rust
 impl<T: Add<Output = T>, F: Unsigned> Add for Fx<T, F> {
  type Output = Self;
  fn add(self, rhs: Fx<T, F>) -> Self::Output {
    Fx {
      num: self.num + rhs.num,
      phantom: PhantomData,
    }
  }
 }
 ```
 The bound on `T` makes it so that `Fx<T, F>` can be added any time that `T` can
 be added to its own type with itself as the output. We can use the exact same
 pattern for `Sub`, `Shl`, `Shr`, and `Neg`. With enough trait bounds, we can do
 anything!
 ```rust
 impl<T: Sub<Output = T>, F: Unsigned> Sub for Fx<T, F> {
  type Output = Self;
  fn sub(self, rhs: Fx<T, F>) -> Self::Output {
    Fx {
      num: self.num - rhs.num,
      phantom: PhantomData,
    }
  }
 }
 impl<T: Shl<u32, Output = T>, F: Unsigned> Shl<u32> for Fx<T, F> {
  type Output = Self;
  fn shl(self, rhs: u32) -> Self::Output {
    Fx {
      num: self.num << rhs,
      phantom: PhantomData,
    }
  }
 }
 impl<T: Shr<u32, Output = T>, F: Unsigned> Shr<u32> for Fx<T, F> {
  type Output = Self;
  fn shr(self, rhs: u32) -> Self::Output {
    Fx {
      num: self.num >> rhs,
      phantom: PhantomData,
    }
  }
 }
 impl<T: Neg<Output = T>, F: Unsigned> Neg for Fx<T, F> {
  type Output = Self;
  fn neg(self) -> Self::Output {
    Fx {
      num: -self.num,
      phantom: PhantomData,
    }
  }
 }
 ```
 Unfortunately, for `Shl` and `Shr` to have as much coverage on our type as it
 does on the base type (allowing just about any right hand side) we'd have to do
 another macro, but I think just `u32` is fine. We can always add more later if
 we need.
 We could also implement `BitAnd`, `BitOr`, `BitXor`, and `Not`, but they don't
 seem relevant to our fixed point math use, and this section is getting long
 already. Just use the same general patterns if you want to add it in your own
 programs. Shockingly, `Rem` also works directly if you want it, though I don't
 foresee us needing fixed point remainder. Also, the GBA can't do hardware
 division or remainder, and we'll have to work around that below when we
 implement `Div` (which maybe we don't need, but it's complex enough I should
 show it instead of letting people guess).
 **Note:** In addition to the various `Op` traits, there's also `OpAssign`
 variants. Each `OpAssign` is the same as `Op`, but takes `&mut self` instead of
 `self` and then modifies in place instead of producing a fresh value. In other
 words, if you want both `+` and `+=` you'll need to do the `AddAssign` trait
 too. It's not the worst thing to just write `a = a+b`, so I won't bother with
 showing all that here. It's pretty easy to figure out for yourself if you want.
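For the curious, one `OpAssign` impl looks something like the following. It's a sketch on a trimmed-down copy of the type (no `typenum` bound, so any marker type works for `F`), and the other `OpAssign` traits would follow the same shape.

```rust
use core::marker::PhantomData;
use core::ops::AddAssign;

/// Trimmed-down copy of the fixed point type, without the `Unsigned` bound.
pub struct Fx<T, F> {
  num: T,
  phantom: PhantomData<F>,
}

impl<T, F> Fx<T, F> {
  pub fn from_raw(r: T) -> Self {
    Fx { num: r, phantom: PhantomData }
  }
  pub fn into_raw(self) -> T {
    self.num
  }
}

impl<T: AddAssign, F> AddAssign for Fx<T, F> {
  fn add_assign(&mut self, rhs: Self) {
    // Same fractional bit count on both sides, so the raw values add directly.
    self.num += rhs.num;
  }
}
```

With that in place `a += b` works on two values with the same base type and the same fractional bit marker.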
 ## Multiplication
 This is where things get more interesting. When we have two numbers `A` and `B`
 they really stand for `(a*f)` and `(b*f)`. If we write `A*B` then we're really
 writing `(a*f)*(b*f)`, which can be rewritten as `(a*b)*(f*f)`, and now it's
 obvious that we have one more `f` than we wanted to have. We have to do the
 multiply of the inner value and then divide out the `f`. We divide by `1 <<
 bit_count`, so if we have 8 fractional bits we'll divide by 256.
 The catch is that, when we do the multiply we're _extremely_ likely to overflow
 our base type with that multiplication step. Then we do that divide, and now our
 result is basically nonsense. We can avoid this to some extent by casting up to
 a higher bit type, doing the multiplication and division at higher precision,
 and then casting back down. We want as much precision as possible without being
 too inefficient, so we'll always cast up to 32-bit (on a 64-bit machine you'd
 cast up to 64-bit instead).
 Naturally, any signed value has to be cast up to `i32` and any unsigned value
 has to be cast up to `u32`, so we'll have to handle those separately.
 Also, instead of doing an _actual_ divide we can right-shift by the correct
 number of bits to achieve the same effect. _Except_ when we have a signed value
 that's negative, because actual division truncates towards zero and
 right-shifting truncates towards negative infinity. We can get around _this_ by
 flipping the sign, doing the shift, and flipping the sign again (which sounds
 silly but it's so much faster than doing an actual division).
 Also, again signed values can be annoying, because if the value _just happens_
 to be `i32::MIN` then when you negate it you'll have... _still_ a negative
 value. I'm not 100% on this, but I think the correct thing to do at that point
 is to give `$t::MIN` as the output num value.
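A concrete case may help before the macro. The sketch below hard-codes a signed value with 8 fractional bits in an `i16` (the name `mul_8_8` is made up for this example) so you can follow the numbers: 1.5 is `384`, 2.5 is `640`, and their product 3.75 should come out as `960`.

```rust
/// Multiplies two signed values that each have 8 fractional bits.
/// Hypothetical standalone version of what the macro generates for `i16`.
pub fn mul_8_8(a: i16, b: i16) -> i16 {
  // Widen first so the intermediate product has room.
  let pre_shift = (a as i32).wrapping_mul(b as i32);
  if pre_shift < 0 {
    // Negate, shift, negate again so the result truncates toward zero.
    // With `i16` inputs the product can never be `i32::MIN`, so the
    // negation here can't overflow.
    (-((-pre_shift) >> 8)) as i16
  } else {
    (pre_shift >> 8) as i16
  }
}
```

The sign flipping matters for small negative products: `-1` times `128` has a raw product of `-128`, which a plain `>> 8` would turn into `-1`, while actual division by 256 gives `0`.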
 Did you get all that? Good, because this involves casting, so we will need to
 implement it three times, which calls for another macro.
 ```rust
 macro_rules! fixed_point_signed_multiply {
  ($t:ident) => {
    impl<F: Unsigned> Mul for Fx<$t, F> {
      type Output = Self;
      fn mul(self, rhs: Fx<$t, F>) -> Self::Output {
        let pre_shift = (self.num as i32).wrapping_mul(rhs.num as i32);
        if pre_shift < 0 {
          if pre_shift == core::i32::MIN {
            Fx {
              num: core::$t::MIN,
              phantom: PhantomData,
            }
          } else {
            Fx {
              num: (-((-pre_shift) >> F::U8)) as $t,
              phantom: PhantomData,
            }
          }
        } else {
          Fx {
            num: (pre_shift >> F::U8) as $t,
            phantom: PhantomData,
          }
        }
      }
    }
  };
 }
 fixed_point_signed_multiply! {i8}
 fixed_point_signed_multiply! {i16}
 fixed_point_signed_multiply! {i32}
 macro_rules! fixed_point_unsigned_multiply {
  ($t:ident) => {
    impl<F: Unsigned> Mul for Fx<$t, F> {
      type Output = Self;
      fn mul(self, rhs: Fx<$t, F>) -> Self::Output {
        Fx {
          num: ((self.num as u32).wrapping_mul(rhs.num as u32) >> F::U8) as $t,
          phantom: PhantomData,
        }
      }
    }
  };
 }
 fixed_point_unsigned_multiply! {u8}
 fixed_point_unsigned_multiply! {u16}
 fixed_point_unsigned_multiply! {u32}
 ```
 ## Division
 Division is similar to multiplication, but reversed. Which makes sense. This
 time `A/B` gives `(a*f)/(b*f)` which is `a/b`, one _less_ `f` than we were
 after.
 As with the multiplication version of things, we have to up-cast our inner value
 as much as we can before doing the math, to allow for the most precision
 possible.
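Here's the same idea with real numbers, as a host-side sketch. The name `div_8_8` is made up, it assumes 8 fractional bits in an `i16`, and it uses Rust's `/` operator where GBA code would have to call the BIOS instead.

```rust
/// Divides two signed values that each have 8 fractional bits.
/// Hypothetical host-side version; it uses `/` rather than a BIOS call.
pub fn div_8_8(a: i16, b: i16) -> i16 {
  assert!(b != 0);
  // Put the missing `f` back in *before* dividing, at 32-bit width.
  let widened = (a as i32) << 8;
  (widened / (b as i32)) as i16
}
```

3.75 divided by 1.5 is 2.5, so `960` divided by `384` should give `640`. Dividing the raw values without the pre-shift gives just `2`, which is the "one less `f`" problem in action.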
 The snag here is that the GBA has no division or remainder. Instead, the GBA has
 a BIOS function you can call to do `i32/i32` division.
 This is a potential problem for us though. If we have some unsigned value, we
 need it to fit within the positive space of an `i32` _after the multiply_ so
 that we can cast it to `i32`, call the BIOS function that only works on `i32`
 values, and cast it back to its actual type.
 * If you have a u8 you're always okay, even with 8 fractional bits.
 * If you have a u16 you're okay even with a maximum value up to 15 fractional
  bits, but having a maximum value and 16 fractional bits makes it break.
 * If you have a u32 you're probably going to be in trouble all the time.
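Those three cases can be checked with a bit of arithmetic. The helper name `premultiplied_fits_i32` is an assumption made for this sketch; it just asks whether the largest raw value, once shifted up by the fractional bit count, still lands in the positive half of an `i32`.

```rust
/// True if `max_raw << frac_bits` still fits in the positive range of an `i32`.
/// Hypothetical helper; `frac_bits` is expected to be less than 32.
pub fn premultiplied_fits_i32(max_raw: u32, frac_bits: u32) -> bool {
  // Do the shift at 64-bit width so the check itself can't overflow.
  ((max_raw as u64) << frac_bits) <= i32::MAX as u64
}
```

It agrees with the list: `u8::MAX` with 8 bits fits, `u16::MAX` with 15 bits fits, `u16::MAX` with 16 bits doesn't, and `u32::MAX` doesn't fit even with zero fractional bits.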
 So... ugh, there's not much we can do about this. For now we'll just have to
 suffer some.
 // TODO: find a numerics book that tells us how to do `u32/u32` divisions.
 ```rust
 macro_rules! fixed_point_signed_division {
  ($t:ident) => {
    impl<F: Unsigned> Div for Fx<$t, F> {
      type Output = Self;
      fn div(self, rhs: Fx<$t, F>) -> Self::Output {
        let mul_output: i32 = (self.num as i32).wrapping_mul(1 << F::U8);
        let divide_result: i32 = crate::bios::div(mul_output, rhs.num as i32);
        Fx {
          num: divide_result as $t,
          phantom: PhantomData,
        }
      }
    }
  };
 }
 fixed_point_signed_division! {i8}
 fixed_point_signed_division! {i16}
 fixed_point_signed_division! {i32}
 macro_rules! fixed_point_unsigned_division {
  ($t:ident) => {
    impl<F: Unsigned> Div for Fx<$t, F> {
      type Output = Self;
      fn div(self, rhs: Fx<$t, F>) -> Self::Output {
        let mul_output: i32 = (self.num as i32).wrapping_mul(1 << F::U8);
        let divide_result: i32 = crate::bios::div(mul_output, rhs.num as i32);
        Fx {
          num: divide_result as $t,
          phantom: PhantomData,
        }
      }
    }
  };
 }
 fixed_point_unsigned_division! {u8}
 fixed_point_unsigned_division! {u16}
 fixed_point_unsigned_division! {u32}
 ```
 ## Trigonometry
 TODO: look up tables! arcbits!
 ## Just Using A Crate
 If, after seeing all that, and seeing that I still didn't even cover every
 possible trait impl that you might want for all the possible types... if after
 all that you feel too intimidated, then I'll cave a bit on your behalf and
 suggest to you that the [fixed](https://crates.io/crates/fixed) crate seems to
 be the best crate available for fixed point math.
 _I have not tested its use on the GBA myself_.
 It's just my recommendation from looking at the docs of the various options
 available, if you really wanted to just have a crate for it.
--- a/book/src-bak/02-goals_and_style.md
+++ b/book/src-bak/02-goals_and_style.md
@ -1,23 +0,0 @@
 # Book Goals and Style
 So, what's this book actually gonna teach you?
 My goal is certainly not just showing off the crate. Programming for the GBA is
 weird enough that I'm trying to teach you all the rest of the stuff you need to
 know along the way. If I do my job right then you'd be able to write your own
 crate for GBA stuff just how you think it should all go by the end.
 Overall the book is sorted more for easy review once you're trying to program
 something. The GBA has a few things that can stand on their own and many other
 things are a mass of interconnected concepts, so some parts of the book end up
 having to refer you to portions that you haven't read yet. The chapters and
 sections are sorted so that _minimal_ future references are required, but it's
 unavoidable that it'll happen sometimes.
 The actual "tutorial order" of the book is the
 [Examples](../05-examples/00-index.md) chapter. Each section of that chapter
 breaks down one of the provided examples in the [examples
 directory](https://github.com/rust-console/gba/tree/master/examples) of the
 repository. We go over what sections of the book you'll need to have read for
 the example code to make sense, and also how we apply the general concepts
 described in the book to the specific example cases.
--- a/book/src-bak/02-timers.md
+++ b/book/src-bak/02-timers.md
@ -1 +0,0 @@
 # Timers
--- a/book/src-bak/03-dma.md
+++ b/book/src-bak/03-dma.md
@ -1,133 +0,0 @@
 # Direct Memory Access
 The GBA has four Direct Memory Access (DMA) units that can be utilized. They're
 mostly the same in terms of overall operation, but each unit has special rules
 that make it better suited to a particular task.
 **Please Note:** TONC and GBATEK have slightly different concepts of how a DMA
 unit's registers should be viewed. I've chosen to go by what GBATEK uses.
 ## General DMA
 A single DMA unit is controlled through four different IO Registers.
 * **Source:** (`DMAxSAD`, write only) A `*const` pointer that the DMA reads from.
 * **Destination:** (`DMAxDAD`, write only) A `*mut` pointer that the DMA writes
  to.
 * **Count:** (`DMAxCNT_L`, write only) How many transfers to perform.
 * **Control:** (`DMAxCNT_H`, read/write) A register full of bit-flags that
  controls all sorts of details.
 Here, the `x` is replaced with 0 through 3 when utilizing whichever particular
 DMA unit.
 ### Source Address
 This is either a `u32` or `u16` address depending on the unit's assigned
 transfer mode (see Control). The address MUST be aligned.
 With DMA0 the source must be internal memory. With other DMA units the source
 can be any non-`SRAM` location.
 ### Destination Address
 As with the Source, this is either a `u32` or `u16` address depending on the
 unit's assigned transfer mode (see Control). The address MUST be aligned.
 With DMA0/1/2 the destination must be internal memory. With DMA3 the destination
 can be any non-`SRAM` memory (allowing writes into Game Pak ROM / FlashROM,
 assuming that your Game Pak hardware supports that).
 ### Count
 This is a `u16` that says how many transfers (`u16` or `u32`) to make.
 DMA0/1/2 will only actually accept a 14-bit value, while DMA3 will accept a full
 16-bit value. A value of 0 instead acts as if you'd used the _maximum_ value for
 the DMA in question. Put another way, DMA0/1/2 transfer `1` through `0x4000`
 words, with `0` as the `0x4000` value, and DMA3 transfers `1` through `0x1_0000`
 words, with `0` as the `0x1_0000` value.
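Put as code, the rule looks something like this. It's a sketch: `effective_transfer_count` is a made-up name, and masking off the top two bits for DMA0/1/2 is my reading of "will only actually accept a 14-bit value", not something confirmed on hardware here.

```rust
/// How many transfers a DMA unit performs for a given raw count value.
/// Hypothetical helper, not part of the `gba` crate.
pub fn effective_transfer_count(raw: u16, is_dma3: bool) -> u32 {
  if is_dma3 {
    // DMA3 accepts all 16 bits; 0 means the maximum.
    if raw == 0 { 0x1_0000 } else { raw as u32 }
  } else {
    // Assumption: DMA0/1/2 ignore the upper two bits of the written value.
    let masked = (raw & 0x3FFF) as u32;
    if masked == 0 { 0x4000 } else { masked }
  }
}
```

A count of `0` therefore gives `0x4000` transfers on DMA0/1/2 and `0x1_0000` transfers on DMA3.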
 The maximum value isn't a very harsh limit. Even in just `u16` mode, `0x4000`
 transfers is 32k, which would for example be all 32k of `IWRAM` (including your
 own user stack). If for some reason you do need to move more than a single DMA
 use can transfer at once, you can just set up the DMA a second time
 and keep going.
 ### Control
 This `u16` bit-flag field is where things get wild.
 * Bits 0-4 do nothing
 * Bit 5-6 control how the destination address changes per transfer:
  * 0: Offset +1
  * 1: Offset -1
  * 2: No Change
  * 3: Offset +1 and reload when a Repeat starts (below)
 * Bit 7-8 similarly control how the source address changes per transfer:
  * 0: Offset +1
  * 1: Offset -1
  * 2: No Change
  * 3: Prohibited
 * Bit 9: enables Repeat mode.
 * Bit 10: Transfer `u16` (false) or `u32` (true) data.
 * Bit 11: "Game Pak DRQ" flag. GBATEK says that this is only allowed for DMA3,
  and also your Game Pak hardware must be equipped to use DRQ mode. I don't even
  know what DRQ mode is all about, and GBATEK doesn't say much either. If DRQ is
  set then you _must not_ set the Repeat bit as well. The `gba` crate simply
  doesn't bother to expose this flag to users.
 * Bit 12-13: DMA Start:
  * 0: "Immediate", which is 2 cycles after requested.
  * 1: VBlank
  * 2: HBlank
  * 3: Special, depending on what DMA unit is involved:
    * DMA0: Prohibited.
    * DMA1/2: Sound FIFO (see the [Sound](04-sound.md) section)
    * DMA3: Video Capture, intended for use with the Repeat flag, performs a
      transfer per scanline (similar to HBlank) starting at `VCOUNT` 2 and
      stopping at `VCOUNT` 162. Intended for copying things from ROM or camera
      into VRAM.
 * Bit 14: Interrupt upon DMA complete.
 * Bit 15: Enable this DMA unit.
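The bit layout above can be packed into a `u16` with a few shifts. This is only a sketch: `dma_control` and its parameter names are invented for the example (the `gba` crate has its own types for this), and the DRQ flag is left out, same as the crate does.

```rust
/// Packs the fields described above into a DMA control value.
/// Hypothetical helper; `dest_ctrl`, `src_ctrl`, and `start` are 2-bit fields.
pub fn dma_control(
  dest_ctrl: u16,
  src_ctrl: u16,
  repeat: bool,
  use_u32: bool,
  start: u16,
  irq: bool,
  enable: bool,
) -> u16 {
  ((dest_ctrl & 0b11) << 5) // bits 5-6: destination address control
    | ((src_ctrl & 0b11) << 7) // bits 7-8: source address control
    | ((repeat as u16) << 9) // bit 9: Repeat
    | ((use_u32 as u16) << 10) // bit 10: `u16` (false) or `u32` (true)
    | ((start & 0b11) << 12) // bits 12-13: start timing
    | ((irq as u16) << 14) // bit 14: interrupt on completion
    | ((enable as u16) << 15) // bit 15: enable
}
```

An immediate `u32` copy with both addresses incrementing is just the `u32` bit plus the enable bit, which comes out to `0x8400`.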
 ## DMA Life Cycle
 The general technique for using a DMA unit involves first setting the relevant
 source, destination, and count registers, then setting the appropriate control
 register value with the Enable bit set.
 Once the Enable flag is set the appropriate DMA unit will trigger at the
 assigned time (Bit 12-13). The CPU's operation is halted while any DMA unit is
 active, until the DMA completes its task. If more than one DMA unit is supposed
 to be active at once, then the DMA unit with the lower number will activate and
 complete before any others.
 When the DMA triggers via _Enable_, the `Source`, `Destination`, and `Count`
 values are copied from the GBA's registers into the DMA unit's internal
 registers. Changes to the DMA unit's internal copy of the data don't affect the
 values in the GBA registers. Another _Enable_ will read the same values as
 before.
 If DMA is triggered via having _Repeat_ active then _only_ the Count is copied
 in to the DMA unit registers. The `Source` and `Destination` are unaffected
 during a Repeat. The exception to this is if the destination address control
 value (Bits 5-6) is set to 3 (`0b11`), in which case a _Repeat_ will also
 re-copy the `Destination` as well as the `Count`.
 Once a DMA operation completes, the Enable flag of its Control register will
 automatically be disabled, _unless_ the Repeat flag is on, in which case the
 Enable flag is left active. You will have to manually disable it if you don't
 want the DMA to kick in again over and over at the specified starting time.
 ## DMA Limitations
 The DMA units cannot access `SRAM` at all.
 If you're using HBlank to access any part of the memory that the display
 controller utilizes (`OAM`, `PALRAM`, `VRAM`), you need to have enabled the
 "HBlank Interval Free" bit in the Display Control Register (`DISPCNT`).
 Whenever DMA is active the CPU is _not_ active, which means that
 [Interrupts](05-interrupts.md) will not fire while DMA is happening. This can
 cause any number of hard to track down bugs. Try to limit your use of the DMA
 units if you can.
--- a/book/src-bak/03-volatile_destination.md
+++ b/book/src-bak/03-volatile_destination.md
@ -1,317 +0,0 @@
 # Volatile Destination
 TODO: update this when we can make more stuff `const`
 ## Volatile Memory
 The compiler is an eager friend, so when it sees a read or a write that won't
 have an effect, it eliminates that read or write. For example, if we write
 ```rust
 let mut x = 5;
 x = 7;
 ```
 The compiler won't actually ever put 5 into `x`. It'll skip straight to putting
 7 in `x`, because we never read from `x` when it's 5, so that's a safe change to
 make. Normally, values are stored in RAM, which has no side effects when you
 read and write from it. RAM is purely for keeping notes about values you'll need
 later on.
 However, what if we had a bit of hardware where we wanted to do a write and that
 did something _other than_ keeping the value for us to look at later? As you saw
 in the `hello_magic` example, we have to use a `write_volatile` operation.
 Volatile means "just do it anyway". The compiler thinks that it's pointless, but
 we know better, so we can force it to really do exactly what we say by using
 `write_volatile` instead of `write`.
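You can try the volatile methods on a host machine too, using an ordinary variable as a stand-in for a hardware register. This is just a sketch (`write_both` is a made-up name), and a test like this can't actually observe whether the first write was kept; it only shows the calls and that the last write wins.

```rust
/// Writes 5 and then 7 through a raw pointer using volatile operations.
/// Hypothetical demo; `slot` stands in for a memory-mapped register.
pub fn write_both(slot: &mut u16) -> u16 {
  let p: *mut u16 = slot;
  unsafe {
    // Both writes are requested as volatile, so the compiler is not
    // allowed to drop the first one even though it's overwritten.
    p.write_volatile(5);
    p.write_volatile(7);
    p.read_volatile()
  }
}
```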
 This is kinda error prone though, right? Because it's just a raw pointer, so we
 might forget to use `write_volatile` at some point.
 Instead, we want a type that's always going to use volatile reads and writes.
 Also, we want a pointer type that lets our reads and writes be as safe as
 possible once we've unsafely constructed the initial value.
 ### Constructing The VolAddress Type
 First, we want a type that stores a location within the address space. This can
 be a pointer, or a `usize`, and we'll use a `usize` because that's easier to
 work with in a `const` context (and we want to have `const` when we can get it).
 We'll also have our type use `NonZeroUsize` instead of just `usize` so that
 `Option<VolAddress<T>>` stays as a single machine word. This helps quite a bit
 when we want to iterate over the addresses of a block of memory (such as
 locations within the palette memory). Hardware is never at the null address
 anyway. Also, if we had _just_ an address number then we wouldn't be able to
 track what type the address is for. We need some
 [PhantomData](https://doc.rust-lang.org/core/marker/struct.PhantomData.html),
 and specifically we need the phantom data to be for `*mut T`:
 * If we used `*const T` that'd have the wrong
  [variance](https://doc.rust-lang.org/nomicon/subtyping.html).
 * If we used `&mut T` then that's fusing in the ideas of _lifetime_ and
  _exclusive access_ to our type. That's potentially important, but that's also
  an abstraction we'll build _on top of_ this `VolAddress` type if we need it.
 One abstraction layer at a time, so we start with just a phantom pointer. This gives us a type that looks like this:
 ```rust
 #[derive(Debug)]
 #[repr(transparent)]
 pub struct VolAddress<T> {
  address: NonZeroUsize,
  marker: PhantomData<*mut T>,
 }
 ```
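The "single machine word" claim about `NonZeroUsize` is easy to check on whatever machine you're building from. A small sketch, with `option_sizes` being a name made up for the example:

```rust
use core::mem::size_of;
use core::num::NonZeroUsize;

/// Sizes in bytes of `usize`, `Option<NonZeroUsize>`, and `Option<usize>`.
pub fn option_sizes() -> (usize, usize, usize) {
  (
    size_of::<usize>(),
    size_of::<Option<NonZeroUsize>>(),
    size_of::<Option<usize>>(),
  )
}
```

The `Option<NonZeroUsize>` is the same size as a bare `usize` because the all-zero bit pattern is free to represent `None`, while `Option<usize>` needs extra space for its tag.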
 Now, because of how `derive` is specified, it derives traits _if the generic
 parameter_ supports those traits. Since our type is like a pointer, the traits
 it supports are distinct from whatever traits the target type supports. So we'll
 provide those implementations manually.
 ```rust
 impl<T> Clone for VolAddress<T> {
  fn clone(&self) -> Self {
    *self
  }
 }
 impl<T> Copy for VolAddress<T> {}
 impl<T> PartialEq for VolAddress<T> {
  fn eq(&self, other: &Self) -> bool {
    self.address == other.address
  }
 }
 impl<T> Eq for VolAddress<T> {}
 impl<T> PartialOrd for VolAddress<T> {
  fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
    Some(self.address.cmp(&other.address))
  }
 }
 impl<T> Ord for VolAddress<T> {
  fn cmp(&self, other: &Self) -> Ordering {
    self.address.cmp(&other.address)
  }
 }
 ```
 Boilerplate junk, not interesting. There's a reason that you derive those traits
 99% of the time in Rust.
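 To see why the manual impls matter, here's a small sketch with a stand-in type.
 `Handle` and `NotClone` are hypothetical names for this demo only. With
 `#[derive(Clone, Copy)]` the copy below would be rejected, because the derive
 would demand `T: Copy`.
 ```rust
 use core::marker::PhantomData;

 /// A target type that is deliberately neither `Clone` nor `Copy`.
 pub struct NotClone;

 /// Hypothetical stand-in for `VolAddress<T>`.
 pub struct Handle<T> {
   address: usize,
   marker: PhantomData<*mut T>,
 }

 // Manual impls: note there is no `T: Clone` or `T: Copy` bound.
 impl<T> Clone for Handle<T> {
   fn clone(&self) -> Self {
     *self
   }
 }
 impl<T> Copy for Handle<T> {}

 /// Copies a `Handle<NotClone>` twice and returns both stored addresses.
 pub fn copy_twice(address: usize) -> (usize, usize) {
   let h: Handle<NotClone> = Handle { address, marker: PhantomData };
   let a = h;
   let b = h; // still usable: the handle is `Copy` even though `NotClone` isn't
   (a.address, b.address)
 }
 ```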
 ### Constructing A VolAddress Value
 Okay so here's the next core concept: If we unsafely _construct_ a
 `VolAddress<T>`, then we can safely _use_ the value once it's been properly
 created.
 ```rust
 // you'll need these features enabled and a recent nightly
 #![feature(const_int_wrapping)]
 #![feature(min_const_unsafe_fn)]
 impl<T> VolAddress<T> {
  pub const unsafe fn new_unchecked(address: usize) -> Self {
    VolAddress {
      address: NonZeroUsize::new_unchecked(address),
      marker: PhantomData,
    }
  }
  pub const unsafe fn cast<Z>(self) -> VolAddress<Z> {
    VolAddress {
      address: self.address,
      marker: PhantomData,
    }
  }
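  pub unsafe fn offset(self, offset: isize) -> Self {
    // `wrapping_mul`: a negative offset casts to a huge `usize`, and a plain `*`
    // would panic on overflow in a debug build.
    VolAddress {
      address: NonZeroUsize::new_unchecked(self.address.get().wrapping_add((offset as usize).wrapping_mul(core::mem::size_of::<T>()))),
      marker: PhantomData,
    }
  }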
 }
 ```
 So what are the unsafety rules here?
 * Non-null, obviously.
 * Must be aligned for `T`
 * Must always produce valid bit patterns for `T`
 * Must not be part of the address space that Rust's stack or allocator will ever
  use.
 So, again using the `hello_magic` example, we had
 ```rust
 (0x400_0000 as *mut u16).write_volatile(0x0403);
 ```
 And instead we could declare
 ```rust
 const MAGIC_LOCATION: VolAddress<u16> = unsafe { VolAddress::new_unchecked(0x400_0000) };
 ```
 ### Using A VolAddress Value
 Now that we've named the magic location, we want to write to it.
 ```rust
 impl<T> VolAddress<T> {
  pub fn read(self) -> T
  where
    T: Copy,
  {
    unsafe { (self.address.get() as *mut T).read_volatile() }
  }
  pub unsafe fn read_non_copy(self) -> T {
    (self.address.get() as *mut T).read_volatile()
  }
  pub fn write(self, val: T) {
    unsafe { (self.address.get() as *mut T).write_volatile(val) }
  }
 }
 ```
 So if the type is `Copy` we can `read` it as much as we want. If, somehow, the
 type isn't `Copy`, then it might be `Drop`, and that means if we read out a
 value over and over we could cause the `drop` method to trigger UB. Since the
 end user might really know what they're doing, we provide an unsafe backup
 `read_non_copy`.
 On the other hand, we can `write` to the location as much as we want. Even if
 the type isn't `Copy`, _not running `Drop` is safe_, so a `write` is always
 safe.
 Now we can write to our magical location.
 ```rust
 MAGIC_LOCATION.write(0x0403);
 ```
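 You can't poke `0x400_0000` on a desktop machine, but you can try the same API
 against an ordinary variable. This is a host-only sketch: `write_then_read` and
 `fake_register` are made-up names, and the local variable simply stands in for a
 hardware register.
 ```rust
 use core::marker::PhantomData;
 use core::num::NonZeroUsize;

 pub struct VolAddress<T> {
   address: NonZeroUsize,
   marker: PhantomData<*mut T>,
 }
 impl<T> Clone for VolAddress<T> {
   fn clone(&self) -> Self {
     *self
   }
 }
 impl<T> Copy for VolAddress<T> {}
 impl<T> VolAddress<T> {
   pub const unsafe fn new_unchecked(address: usize) -> Self {
     VolAddress { address: unsafe { NonZeroUsize::new_unchecked(address) }, marker: PhantomData }
   }
   pub fn read(self) -> T
   where
     T: Copy,
   {
     unsafe { (self.address.get() as *mut T).read_volatile() }
   }
   pub fn write(self, val: T) {
     unsafe { (self.address.get() as *mut T).write_volatile(val) }
   }
 }

 /// Hypothetical demo: volatile-write `val` to a local "register", then read it back.
 pub fn write_then_read(val: u16) -> u16 {
   let mut fake_register: u16 = 0;
   // Safety: the address is non-null, aligned for `u16`, and valid for this call.
   let addr = unsafe { VolAddress::<u16>::new_unchecked(&mut fake_register as *mut u16 as usize) };
   addr.write(val);
   addr.read()
 }
 ```
 The only `unsafe` the caller touches is the construction. After that, `write`
 and `read` are ordinary safe calls.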
 ### VolAddress Iteration
 We've already seen that sometimes we want to have a base address of some sort
 and then offset from that location to another. What if we wanted to iterate over
 _all the locations_? That's not particularly hard.
 ```rust
 impl<T> VolAddress<T> {
  pub const unsafe fn iter_slots(self, slots: usize) -> VolAddressIter<T> {
    VolAddressIter { vol_address: self, slots }
  }
 }
 #[derive(Debug)]
 pub struct VolAddressIter<T> {
  vol_address: VolAddress<T>,
  slots: usize,
 }
 impl<T> Clone for VolAddressIter<T> {
  fn clone(&self) -> Self {
    VolAddressIter {
      vol_address: self.vol_address,
      slots: self.slots,
    }
  }
 }
 impl<T> PartialEq for VolAddressIter<T> {
  fn eq(&self, other: &Self) -> bool {
    self.vol_address == other.vol_address && self.slots == other.slots
  }
 }
 impl<T> Eq for VolAddressIter<T> {}
 impl<T> Iterator for VolAddressIter<T> {
  type Item = VolAddress<T>;
  fn next(&mut self) -> Option<Self::Item> {
    if self.slots > 0 {
      let out = self.vol_address;
      unsafe {
        self.slots -= 1;
        self.vol_address = self.vol_address.offset(1);
      }
      Some(out)
    } else {
      None
    }
  }
 }
 impl<T> FusedIterator for VolAddressIter<T> {}
 ```
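 As a rough host-side check of the iterator, we can walk over a small array
 instead of real hardware memory. Everything named `fake_palette` or
 `fill_and_read_back` is hypothetical, and the types are trimmed copies of the
 ones above.
 ```rust
 use core::marker::PhantomData;
 use core::num::NonZeroUsize;

 pub struct VolAddress<T> {
   address: NonZeroUsize,
   marker: PhantomData<*mut T>,
 }
 impl<T> Clone for VolAddress<T> {
   fn clone(&self) -> Self {
     *self
   }
 }
 impl<T> Copy for VolAddress<T> {}
 impl<T> VolAddress<T> {
   pub const unsafe fn new_unchecked(address: usize) -> Self {
     VolAddress { address: unsafe { NonZeroUsize::new_unchecked(address) }, marker: PhantomData }
   }
   pub unsafe fn offset(self, offset: isize) -> Self {
     let bytes = (offset as usize).wrapping_mul(core::mem::size_of::<T>());
     unsafe { VolAddress::new_unchecked(self.address.get().wrapping_add(bytes)) }
   }
   pub fn read(self) -> T
   where
     T: Copy,
   {
     unsafe { (self.address.get() as *mut T).read_volatile() }
   }
   pub fn write(self, val: T) {
     unsafe { (self.address.get() as *mut T).write_volatile(val) }
   }
   pub const unsafe fn iter_slots(self, slots: usize) -> VolAddressIter<T> {
     VolAddressIter { vol_address: self, slots }
   }
 }
 pub struct VolAddressIter<T> {
   vol_address: VolAddress<T>,
   slots: usize,
 }
 impl<T> Iterator for VolAddressIter<T> {
   type Item = VolAddress<T>;
   fn next(&mut self) -> Option<Self::Item> {
     if self.slots > 0 {
       let out = self.vol_address;
       self.slots -= 1;
       self.vol_address = unsafe { self.vol_address.offset(1) };
       Some(out)
     } else {
       None
     }
   }
 }

 /// Hypothetical demo: fill a 4-slot buffer through the iterator, then read it back.
 pub fn fill_and_read_back(value: u16) -> [u16; 4] {
   let mut fake_palette = [0u16; 4]; // stands in for a block of hardware memory
   // Safety: the buffer is non-null, aligned, and has exactly 4 slots.
   let base = unsafe { VolAddress::<u16>::new_unchecked(fake_palette.as_mut_ptr() as usize) };
   let slots = unsafe { base.iter_slots(4) };
   for slot in slots {
     slot.write(value);
   }
   let mut out = [0u16; 4];
   let slots = unsafe { base.iter_slots(4) };
   for (i, slot) in slots.enumerate() {
     out[i] = slot.read();
   }
   out
 }
 ```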
 ### VolAddressBlock
 Obviously, having a base address and a length exist separately is error prone.
 There's a good reason for slices to keep their pointer and their length
 together. We want something like that, which we'll call a "block" because
 "array" and "slice" are already things in Rust.
 ```rust
 #[derive(Debug)]
 pub struct VolAddressBlock<T> {
  vol_address: VolAddress<T>,
  slots: usize,
 }
 impl<T> Clone for VolAddressBlock<T> {
  fn clone(&self) -> Self {
    VolAddressBlock {
      vol_address: self.vol_address,
      slots: self.slots,
    }
  }
 }
 impl<T> PartialEq for VolAddressBlock<T> {
  fn eq(&self, other: &Self) -> bool {
    self.vol_address == other.vol_address && self.slots == other.slots
  }
 }
 impl<T> Eq for VolAddressBlock<T> {}
 impl<T> VolAddressBlock<T> {
  pub const unsafe fn new_unchecked(vol_address: VolAddress<T>, slots: usize) -> Self {
    VolAddressBlock { vol_address, slots }
  }
  pub const fn iter(self) -> VolAddressIter<T> {
    VolAddressIter {
      vol_address: self.vol_address,
      slots: self.slots,
    }
  }
  pub unsafe fn index_unchecked(self, slot: usize) -> VolAddress<T> {
    self.vol_address.offset(slot as isize)
  }
  pub fn index(self, slot: usize) -> VolAddress<T> {
    if slot < self.slots {
      unsafe { self.vol_address.offset(slot as isize) }
    } else {
      panic!("Index Requested: {} >= Bound: {}", slot, self.slots)
    }
  }
  pub fn get(self, slot: usize) -> Option<VolAddress<T>> {
    if slot < self.slots {
      unsafe { Some(self.vol_address.offset(slot as isize)) }
    } else {
      None
    }
  }
 }
 ```
 Now we can have something like:
 ```rust
 const OTHER_MAGIC: VolAddressBlock<u16> = unsafe {
  VolAddressBlock::new_unchecked(
    VolAddress::new_unchecked(0x600_0000),
    240 * 160
  )
 };
 OTHER_MAGIC.index(120 + 80 * 240).write(0x001F);
 OTHER_MAGIC.index(136 + 80 * 240).write(0x03E0);
 OTHER_MAGIC.index(120 + 96 * 240).write(0x7C00);
 ```
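 Those `x + y * 240` expressions are easy to get wrong by hand, so a tiny helper
 can do the math. This is only a sketch: `mode3_slot` and the two constants are
 hypothetical names, assuming the 240x160 layout used in the block above.
 ```rust
 pub const MODE3_WIDTH: usize = 240;
 pub const MODE3_HEIGHT: usize = 160;

 /// Hypothetical helper: slot index of pixel `(x, y)`, or `None` if off screen.
 pub const fn mode3_slot(x: usize, y: usize) -> Option<usize> {
   if x < MODE3_WIDTH && y < MODE3_HEIGHT {
     Some(x + y * MODE3_WIDTH)
   } else {
     None
   }
 }
 ```
 The three writes above land on slots 19320, 19336, and 23160 respectively.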
 ### Docs?
 If you wanna see these types and methods with a full docs write up you should
 check the GBA crate's source.
--- a/book/src-bak/03-wram.md
+++ b/book/src-bak/03-wram.md
@ -1,28 +0,0 @@
 # Work RAM
 ## External Work RAM (EWRAM)
 * **Address Span:** `0x2000000` to `0x203FFFF` (256k)
 This is a big pile of space, the use of which is up to each game. However, the
 external work ram has only a 16-bit bus (if you read/write a 32-bit value it
 silently breaks it up into two 16-bit operations) and also 2 wait cycles (extra
 CPU cycles that you have to expend _per 16-bit bus use_).
 It's most helpful to think of EWRAM as slower, distant memory, similar to the
 "heap" in a normal application. You can take the time to go store something
 within EWRAM, or to load it out of EWRAM, but if you've got several operations
 to do in a row and you're worried about time you should pull that value into
 local memory, work on your local copy, and then push it back out to EWRAM.
 ## Internal Work RAM (IWRAM)
 * **Address Span:** `0x3000000` to `0x3007FFF` (32k)
 This is a smaller pile of space, but it has a 32-bit bus and no wait.
 By default, `0x3007F00` to `0x3007FFF` is reserved for interrupt and BIOS use.
 The rest of it is mostly up to you. The user's stack space starts at `0x3007F00`
 and proceeds _down_ from there. For best results you should probably start at
 `0x3000000` and then go upwards. Under normal use it's unlikely that the two
 memory regions will crash into each other.
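 If you want to double check the sizes, the arithmetic is simple enough to do in
 a `const`. This is a sketch with made-up names (`span_len` and friends), using
 only the address spans quoted above.
 ```rust
 /// Hypothetical helper: number of bytes in an inclusive address span.
 pub const fn span_len(first: usize, last: usize) -> usize {
   last - first + 1
 }

 pub const EWRAM_LEN: usize = span_len(0x200_0000, 0x203_FFFF);
 pub const IWRAM_LEN: usize = span_len(0x300_0000, 0x300_7FFF);
 /// The part of IWRAM below the area reserved for interrupt and BIOS use.
 pub const IWRAM_USER_LEN: usize = span_len(0x300_0000, 0x300_7EFF);
 ```
 So your own data growing up from `0x3000000` and the stack growing down from
 `0x3007F00` share 32,512 bytes between them.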
--- a/book/src-bak/04-io-registers.md
+++ b/book/src-bak/04-io-registers.md
@ -1,3 +0,0 @@
 # IO Registers
 * **Address Span:** `0x400_0000` to `0x400_03FE`
--- a/book/src-bak/04-newtype.md
+++ b/book/src-bak/04-newtype.md
@ -1,206 +0,0 @@
 # Newtype
 TODO: we've already used newtype twice by now (fixed point values and volatile
 addresses), so we need to adjust how we start this section.
 There's a great Zero Cost abstraction that we'll be using a lot that you might
 not already be familiar with: we're talking about the "Newtype Pattern"!
 Now, I told you to read the Rust Book before you read this book, and I'm sure
 you're all good students who wouldn't sneak into this book without doing the
 required reading, so I'm sure you all remember exactly what I'm talking about,
 because they touch on the newtype concept in the book twice, in two _very_ long
 named sections:
 * [Using the Newtype Pattern to Implement External Traits on External
  Types](https://doc.rust-lang.org/book/ch19-03-advanced-traits.html#using-the-newtype-pattern-to-implement-external-traits-on-external-types)
 * [Using the Newtype Pattern for Type Safety and
  Abstraction](https://doc.rust-lang.org/book/ch19-04-advanced-types.html#using-the-newtype-pattern-for-type-safety-and-abstraction)
 ...Yeah... The Rust Book doesn't know how to make a short sub-section name to
 save its life. Shame.
 ## Newtype Basics
 So, we have all these pieces of data, and we want to keep them separated, and we
 don't wanna pay the cost for it at runtime. Well, we're in luck, we can pay the
 cost at compile time.
 ```rust
 pub struct PixelColor(u16);
 ```
 TODO: we've already talked about repr(transparent) by now
 Ah, except that, as I'm sure you remember from [The
 Rustonomicon](https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent)
 (and from the RFC too, of course), a single-field struct is sometimes treated
 differently from the bare value it wraps (layout and ABI aren't guaranteed to
 match), so we should be using `#[repr(transparent)]` with our newtypes.
 ```rust
 #[repr(transparent)]
 pub struct PixelColor(u16);
 ```
 And then we'll need to do that same thing for _every other newtype we want_.
 Except there's only two tiny parts that actually differ between newtype
 declarations: the new name and the base type. All the rest is just the same rote
 code over and over. Generating piles and piles of boilerplate code? Sounds like
 a job for a macro to me!
 ## Making It A Macro
 If you're going to do much with macros you should definitely read through [The
 Little Book of Rust
 Macros](https://danielkeep.github.io/tlborm/book/index.html), but we won't be
 doing too much so you can just follow along here a bit if you like.
 The most basic version of a newtype macro starts like this:
 ```rust
 #[macro_export]
 macro_rules! newtype {
  ($new_name:ident, $old_name:ident) => {
    #[repr(transparent)]
    pub struct $new_name($old_name);
  };
 }
 ```
 The `#[macro_export]` makes the macro available outside the crate, exported at
 the crate root (like `pub` kinda), and then we have one expansion option that
 takes an identifier, a `,`,
 and then a second identifier. The new name is the outer type we'll be using, and
 the old name is the inner type that's being wrapped. You'd use our new macro
 something like this:
 ```rust
 newtype! {PixelColorCurly, u16}
 newtype!(PixelColorParens, u16);
 newtype![PixelColorBrackets, u16];
 ```
 Note that you can invoke the macro with the outermost grouping as any of `()`,
 `[]`, or `{}`.  It makes no particular difference to the macro. Also, that space
 in the first version is kinda to show off that you can put white space in
 between the macro name and the grouping if you want. The difference is mostly
 style, but there are some rules and considerations here:
 * If you use curly braces then you _must not_ put a `;` after the invocation.
 * If you use parentheses or brackets then you _must_ put the `;` at the end.
 * Rustfmt cares which you use and formats accordingly:
  * Curly brace macro use mostly gets treated like a code block.
  * Parentheses macro use mostly gets treated like a function call.
  * Bracket macro use mostly gets treated like an array declaration.
 **As a reminder:** remember that `macro_rules` macros have to appear _before_
 they're invoked in your source, so the `newtype` macro will always have to be at
 the very top of your file, or if you put it in a module within your project
 you'll need to declare the module before anything that uses it.
 ## Upgrade That Macro!
 We also want to be able to add `derive` stuff and doc comments to our newtype.
 Within the context of `macro_rules!` definitions these are called "meta". Since
 we can have any number of them we wrap it all up in a "zero or more" matcher.
 Then our macro looks like this:
 ```rust
 #[macro_export]
 macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $old_name:ident) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($old_name);
  };
 }
 ```
 So now we can write
 ```rust
 newtype! {
  /// Color on the GBA gives 5 bits for each channel, the highest bit is ignored.
  #[derive(Debug, Clone, Copy)]
  PixelColor, u16
 }
 ```
 Next, we can allow for the wrapping of types that aren't just a single
 identifier by changing `$old_name` from `:ident` to `:ty`. We can't _also_ do
 this for the `$new_name` part because declaring a new struct expects a valid
 identifier that's _not_ already declared (obviously), and `:ty` is intended for
 capturing types that already exist.
 ```rust
 #[macro_export]
 macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $old_name:ty) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($old_name);
  };
 }
 ```
 Next of course we'll want to usually have a `new` method that's const and just
 gives a 0 value. We won't always be making a newtype over a number value, but we
 often will. It's usually silly to have a `new` method with no arguments since we
 might as well just impl `Default`, but `Default::default` isn't `const`, so
 having `pub const fn new() -> Self` is justified here.
 Here, the token `0` is given the `{integer}` type, which can be converted into
 any of the integer types as needed, but it still can't be converted into an
 array type or a pointer or things like that. Accordingly we've added the "no
 frills" option which declares the struct and no `new` method.
 ```rust
 #[macro_export]
 macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $old_name:ty) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($old_name);
    impl $new_name {
      /// A `const` "zero value" constructor
      pub const fn new() -> Self {
        $new_name(0)
      }
    }
  };
  ($(#[$attr:meta])* $new_name:ident, $old_name:ty, no frills) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($old_name);
  };
 }
 ```
 Finally, we usually want to have the wrapped value be totally private, but there
 _are_ occasions where that's not the case. For this, we can allow the wrapped
 field to accept a visibility modifier.
 ```rust
 #[macro_export]
 macro_rules! newtype {
  ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($v $old_name);
    impl $new_name {
      /// A `const` "zero value" constructor
      pub const fn new() -> Self {
        $new_name(0)
      }
    }
  };
  ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty, no frills) => {
    $(#[$attr])*
    #[repr(transparent)]
    pub struct $new_name($v $old_name);
  };
 }
 ```
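 Here's a sketch of how both arms of that final macro might get used. The macro
 body is the one from above (minus `#[macro_export]`, so the snippet works as a
 standalone file), while `TilePair` is a hypothetical type invented just to show
 the "no frills" arm.
 ```rust
 macro_rules! newtype {
   ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty) => {
     $(#[$attr])*
     #[repr(transparent)]
     pub struct $new_name($v $old_name);
     impl $new_name {
       /// A `const` "zero value" constructor
       pub const fn new() -> Self {
         $new_name(0)
       }
     }
   };
   ($(#[$attr:meta])* $new_name:ident, $v:vis $old_name:ty, no frills) => {
     $(#[$attr])*
     #[repr(transparent)]
     pub struct $new_name($v $old_name);
   };
 }

 newtype! {
   /// Color on the GBA gives 5 bits for each channel, the highest bit is ignored.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   PixelColor, pub u16
 }

 newtype! {
   /// Hypothetical wrapper over an array: `0` isn't a valid array value, so we
   /// ask for the "no frills" arm and get no `new` method.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   TilePair, pub [u16; 2], no frills
 }
 ```
 Leaving out the `pub` before the wrapped type gives you the usual private field.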
--- a/book/src-bak/04-sound.md
+++ b/book/src-bak/04-sound.md
@ -1 +0,0 @@
 # Sound
--- a/book/src-bak/05-const_asserts.md
+++ b/book/src-bak/05-const_asserts.md
@ -1,130 +0,0 @@
 # Constant Assertions
 Have you ever wanted to assert things _even before runtime_? We all have, of
 course. Particularly when the runtime machine is a poor little GBA, we'd like to
 have the machine doing the compile handle as much checking as possible.
 Enter the [static assertions](https://docs.rs/static_assertions/) crate, which
 provides a way to let you assert on a `const` expression.
 This is an amazing crate that you should definitely use when you can.
 It's written by [Nikolai Vazquez](https://github.com/nvzqz), and they kindly
 wrote up a [blog
 post](https://nikolaivazquez.com/posts/programming/rust-static-assertions/) that
 explains the thinking behind it.
 However, I promised that each example would be single file, and I also promised
 to explain what's going on as we go, so we'll briefly touch upon giving an
 explanation here.
 ## How We Const Assert
 Alright, as it stands (2018-12-15), we can't use `if` in a `const` context.
 Since we can't use `if`, we can't use a normal `assert!`. Some day it will be
 possible, and a failed assert at compile time will be a compile error and a
 failed assert at run time will be a panic and we'll have a nice unified
 programming experience. Until then, we can add compile-time-only assertions by
 being a little tricky with the compiler.
 If we write
 ```rust
 const ASSERT: usize = 0 - 1;
 ```
 that gives a warning, since the math would underflow. We can upgrade that
 warning to a hard error:
 ```rust
 #[deny(const_err)]
 const ASSERT: usize = 0 - 1;
 ```
 And to make our construction reusable we can enable the
 [underscore_const_names](https://github.com/rust-lang/rust/issues/54912) feature
 in our program (or library) and then give each such const an underscore for a
 name.
 ```rust
 #![feature(underscore_const_names)]
 #[deny(const_err)]
 const _: usize = 0 - 1;
 ```
 Now we wrap this in a macro where we give a `bool` expression as input. We
 negate the bool then cast it to a `usize`, meaning that `true` negates into
 `false`, which becomes `0usize`, and then there's no underflow error. Or if the
 input was `false`, it negates into `true`, then becomes `1usize`, and then the
 underflow error fires.
 ```rust
 macro_rules! const_assert {
  ($condition:expr) => {
    #[deny(const_err)]
    #[allow(dead_code)]
    const ASSERT: usize = 0 - !$condition as usize;
  }
 }
 ```
 Technically, written like this, the expression can be anything with a
 `core::ops::Not` implementation that can also be `as` cast into `usize`. That's
 `bool`, but also basically all the other number types. Since we want to ensure
 that we get proper looking type errors when things go wrong, we can use
 `($condition && true)` to enforce that we get a `bool` (thanks to `Talchas` for
 that particular suggestion).
 ```rust
 macro_rules! const_assert {
  ($condition:expr) => {
    #[deny(const_err)]
    #[allow(dead_code)]
    const _: usize = 0 - !($condition && true) as usize;
  }
 }
 ```
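 The trick above reflects the compiler as of the date given. As a hedged aside:
 on newer toolchains (assuming Rust 1.57 or later, where `assert!` is allowed in
 a `const` context) the same idea can be sketched without the underflow trick.
 The `fits_five_bits` helper is a hypothetical name for the demo.
 ```rust
 // A sketch for newer compilers, not the macro this chapter builds.
 macro_rules! const_assert {
   ($condition:expr) => {
     // Evaluated at compile time: a false condition is a compile error.
     const _: () = assert!($condition);
   };
 }

 /// Hypothetical helper: does a color channel fit in 5 bits?
 pub const fn fits_five_bits(channel: u16) -> bool {
   channel <= 31
 }

 const_assert!(fits_five_bits(31));
 const_assert!(core::mem::size_of::<u16>() == 2);
 ```
 Changing the first assertion to `fits_five_bits(32)` should stop the build.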
 ## Asserting Something
 As an example of how we might use a `const_assert`, we'll do a demo with colors.
 There's a red, blue, and green channel. We store colors in a `u16` with 5 bits
 for each channel.
 ```rust
 newtype! {
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  Color, u16
 }
 ```
 And when we're building a color, we're passing in `u16` values, but they could
 be using more than just 5 bits of space. We want to make sure that each channel
 is 31 or less, so we can make a color builder that does a `const_assert!` on the
 value of each channel.
 ```rust
 macro_rules! rgb {
  ($r:expr, $g:expr, $b:expr) => {
    {
      const_assert!($r <= 31);
      const_assert!($g <= 31);
      const_assert!($b <= 31);
      Color($b << 10 | $g << 5 | $r)
    }
  }
 }
 ```
 And then we can declare some colors
 ```rust
 const RED: Color = rgb!(31, 0, 0);
 const BLUE: Color = rgb!(31, 500, 0);
 ```
 The second one is clearly out of bounds and it fires an error just like we
 wanted.
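 To convince yourself the bit packing is right, here's a runtime-checkable
 sketch of the same math. It swaps the macro for a hypothetical `const fn` that
 returns `None` on an out of range channel, so it's a variation on the idea
 rather than the `rgb!` macro itself.
 ```rust
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 #[repr(transparent)]
 pub struct Color(pub u16);

 /// Hypothetical checked builder: 5 bits per channel, red in the low bits.
 pub const fn rgb(r: u16, g: u16, b: u16) -> Option<Color> {
   if r <= 31 && g <= 31 && b <= 31 {
     Some(Color(b << 10 | g << 5 | r))
   } else {
     None
   }
 }
 ```
 Full red, green, and blue come out as `0x001F`, `0x03E0`, and `0x7C00`.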
--- a/book/src-bak/05-help_and_resources.md
+++ b/book/src-bak/05-help_and_resources.md
@ -1,78 +0,0 @@
 # Help and Resources
 ## Help
 So you're stuck on a problem and the book doesn't say what to do. Where can you
 find out more?
 The first place I would suggest is the [Rust Community
 Discord](https://discordapp.com/invite/aVESxV8). If it's a general Rust question
 then you can ask anyone in any channel you feel is appropriate. If it's GBA
 specific then you can try asking me (`Lokathor`) or `Ketsuban` in the `#gamedev`
 channel.
 ## Emulators
 You certainly might want to eventually write a game that you can put on a flash
 cart and play on real hardware, but for most of your development you'll probably
 want to be using an emulator for testing, because you don't have to fiddle with
 cables and all that.
 In terms of emulators, you want to be using
 [mGBA](https://github.com/mgba-emu/mgba), and you want to be using the [0.7 Beta
 1](https://github.com/mgba-emu/mgba/releases/tag/0.7-b1) or later. This update
 lets you run raw ELF files, which means that you can have full debug symbols
 available while you're debugging problems.
 ## Information Resources
 First, if I fail to describe something related to Rust, you can always try
 checking in [The Rust
 Reference](https://doc.rust-lang.org/nightly/reference/introduction.html) to see
 if they cover it. You can mostly ignore that big scary red banner at the top,
 things are a lot better documented than they make it sound.
 If you need help trying to fiddle your math down as hard as you can, there are
 resources such as the [Bit Twiddling
 Hacks](https://graphics.stanford.edu/~seander/bithacks.html) page.
 As to GBA related lore, Ketsuban and I didn't magically learn this all from
 nowhere, we read various technical manuals and guides ourselves and then
 distilled those works oriented around C and C++ into a book for Rust.
 We have personally used some or all of the following:
 * [GBATEK](http://problemkaputt.de/gbatek.htm): This is _the_ resource. It
  covers not only the GBA, but also the DS and DSi, and also a run down of ARM
  assembly (32-bit and 16-bit opcodes). The link there is to the 2.9b version on
  `problemkaputt.de` (the official home of the document), but if you just google
  for gbatek the top result is for the 2.5 version on `akkit.org`, so make sure
  you're looking at the newest version. Sometimes `problemkaputt.de` is a little
  sluggish so I've also [mirrored](https://lokathor.com/gbatek.html) the 2.9b
  version on my own site as well. GBATEK is rather large, over 2mb of text, so
  if you're on a phone or similar you might want to save an offline copy to go
  easy on your data usage.
 * [TONC](https://www.coranac.com/tonc/text/): While GBATEK is basically just a
  huge tech specification, TONC is an actual _guide_ on how to make sense of the
  GBA's abilities and organize it into a game. It's written for C of course, but
  as a Rust programmer you should always be practicing your ability to read C
  code anyway. It's the programming equivalent of learning Latin because all the
  old academic books are written in Latin.
 * [CowBite](https://www.cs.rit.edu/~tjh8300/CowBite/CowBiteSpec.htm): This is
  more like GBATEK, and it's less complete, but it mixes in a little more
  friendly explanation of things in between the hardware spec parts.
 Also, although I haven't had time to look at it myself, [The Audio
 Advance](http://belogic.com/gba/) seems to be very good. It explains in depth
 how you can get audio working on the GBA. Note that the table of contents for
 each page goes along the top instead of down the side.
 ## Non-Rust GBA Community
 There's also the [GBADev.org](http://www.gbadev.org/) site, which has a forum
 and everything. They're coding in C and C++, but you can probably overcome that
 difference with a little work on your part.
 I also found a place called
 [GBATemp](https://gbatemp.net/categories/nintendo-gba-discussions.32/), which
 seems to have a more active forum but less of a focus on actual coding.
--- a/book/src-bak/05-interrupts.md
+++ b/book/src-bak/05-interrupts.md
@ -1 +0,0 @@
 # Interrupts
--- a/book/src-bak/05-palram.md
+++ b/book/src-bak/05-palram.md
@ -1,50 +0,0 @@
 # Palette RAM (PALRAM)
 * **Address Span:** `0x500_0000` to `0x500_03FF` (1k)
 Palette RAM has a 16-bit bus, which isn't really a problem because it
 conceptually just holds `u16` values. There's no automatic wait state, but if
 you try to access the same location that the display controller is accessing you
 get bumped by 1 cycle. Since the display controller can use the palette ram any
 number of times per scanline it's basically impossible to predict if you'll have
 to do a wait or not during VDraw. During VBlank you won't have any wait of
 course.
 PALRAM is among the memory where there's weirdness if you try to write just one
 byte: if you try to write just 1 byte, it writes that byte into _both_ parts of
 the larger 16-bit location. This doesn't really affect us much with PALRAM,
 because palette values are all supposed to be `u16` anyway.
 The palette memory actually contains not one, but _two_ sets of palettes. First
 there's 256 entries for the background palette data (starting at `0x500_0000`),
 and then there's 256 entries for object palette data (starting at `0x500_0200`).
 The GBA also has two modes for palette access: 8-bits-per-pixel (8bpp) and
 4-bits-per-pixel (4bpp).
 * In 8bpp mode an 8-bit palette index value within a background or sprite
  simply indexes directly into the 256 slots for that type of thing.
 * In 4bpp mode a 4-bit palette index value within a background or sprite
  specifies an index within a particular "palbank" (16 palette entries each),
  and then a _separate_ setting outside of the graphical data determines which
  palbank is to be used for that background or object (the screen entry data for
  backgrounds, and the object attributes for objects).
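 The two indexing schemes can be sketched as plain arithmetic. All of the names
 here (`slot_4bpp`, `slot_address`, and the two base constants) are hypothetical,
 and the only facts assumed are the ones above: 2 bytes per entry, 16 entries per
 palbank, and the two starting addresses.
 ```rust
 pub const BG_PALETTE_BASE: usize = 0x500_0000;
 pub const OBJ_PALETTE_BASE: usize = 0x500_0200;

 /// 8bpp: the pixel's value is the palette slot directly.
 pub const fn slot_8bpp(index: u8) -> usize {
   index as usize
 }

 /// 4bpp: the palbank picks a group of 16 slots, the pixel picks within it.
 pub const fn slot_4bpp(palbank: u8, index: u8) -> Option<usize> {
   if palbank < 16 && index < 16 {
     Some(palbank as usize * 16 + index as usize)
   } else {
     None
   }
 }

 /// Address of a palette slot: each entry is one `u16`, so 2 bytes apart.
 pub const fn slot_address(base: usize, slot: usize) -> usize {
   base + slot * 2
 }
 ```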
 ### Transparency
 When a pixel within a background or object specifies index 0 as its palette
 entry it is treated as a transparent pixel. This means that in 8bpp mode there's
 only 255 actual color options (0 being transparent), and in 4bpp mode there's
 only 15 actual color options available within each palbank (the 0th entry of
 _each_ palbank is transparent).
 Individual backgrounds, and individual objects, each determine if they're 4bpp
 or 8bpp separately, so a given overall palette slot might map to a used color in
 8bpp and an unused/transparent color in 4bpp, which you can exploit if you're a
 palette wizard.
 Palette slot 0 of the overall background palette is used to determine the
 "backdrop" color. That's the color you see if no background or object ends up
 being rendered within a given pixel.
 Since display mode 3 and display mode 5 don't use the palette, they cannot
 benefit from transparency.
--- a/book/src-bak/06-link_cable.md
+++ b/book/src-bak/06-link_cable.md
@ -1 +0,0 @@
 # Link Cable
--- a/book/src-bak/06-vram.md
+++ b/book/src-bak/06-vram.md
@ -1,24 +0,0 @@
 # Video RAM (VRAM)
 * **Address Span:** `0x600_0000` to `0x601_7FFF` (96k)
 We've used this before! VRAM has a 16-bit bus and no wait. However, the same as
 with PALRAM, the "you might have to wait if the display controller is looking at
 it" rule applies here.
 Unfortunately there's not much more exact detail that can be given about VRAM.
 The use of the memory depends on the video mode that you're using.
 One general detail of note is that you can't write individual bytes to any part
 of VRAM. Depending on mode and location, you'll either get your bytes doubled
 into both the upper and lower parts of the 16-bit location targeted, or you
 won't even affect the memory. This usually isn't a big deal, except in two
 situations:
 * In Mode 4, if you want to change just 1 pixel, you'll have to be very careful
  to read the old `u16`, overwrite just the byte you wanted to change, and then
  write that back.
 * In any display mode, avoid using `memcopy` to place things into VRAM.
  It's written to be byte oriented, and only does 32-bit transfers under select
  conditions. The rest of the time it'll copy one byte at a time and you'll get
  either garbage or nothing at all.
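 The Mode 4 read-modify-write step is just some masking. Here's a sketch, with
 hypothetical helper names, that assumes one byte per pixel with the even `x`
 pixel in the low byte of each `u16` (the GBA is little-endian).
 ```rust
 /// Hypothetical helper: which half of the `u16` holds the pixel at column `x`?
 pub const fn is_high_byte(x: usize) -> bool {
   x % 2 == 1
 }

 /// Returns `old` with just one of its two bytes replaced by `byte`.
 pub const fn replace_byte(old: u16, high: bool, byte: u8) -> u16 {
   if high {
     (old & 0x00FF) | ((byte as u16) << 8)
   } else {
     (old & 0xFF00) | (byte as u16)
   }
 }
 ```
 On hardware you'd read the old `u16` with a volatile read, pass it through
 something like `replace_byte`, and then write the full `u16` back.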
--- a/book/src-bak/07-game_pak.md
+++ b/book/src-bak/07-game_pak.md
@ -1 +0,0 @@
 # Game Pak
--- a/book/src-bak/07-oam.md
+++ b/book/src-bak/07-oam.md
@ -1,62 +0,0 @@
 # Object Attribute Memory (OAM)
 * **Address Span:** `0x700_0000` to `0x700_03FF` (1k)
 The Object Attribute Memory has a 32-bit bus and no default wait, but suffers
 from the "you might have to wait if the display controller is looking at it"
 rule. You cannot write individual bytes to OAM at all, but that's not really a
 problem because all the fields of the data types within OAM are either `i16` or
 `u16` anyway.
 Object attribute memory is the wildest yet: it conceptually contains two types
 of things, but they're _interlaced_ with each other all the way through.
 Now, [GBATEK](http://problemkaputt.de/gbatek.htm#lcdobjoamattributes) and
 [CowBite](https://www.cs.rit.edu/~tjh8300/CowBite/CowBiteSpec.htm#OAM%20(sprites))
 don't quite give names to the two data types here.
 [TONC](https://www.coranac.com/tonc/text/regobj.htm#sec-oam) calls them
 `OBJ_ATTR` and `OBJ_AFFINE`, but we'll be giving them names fitting with the
 Rust naming convention. Just know that if you try to talk about it with others
 they might not be using the same names. In Rust terms their layout would look
 like this:
 ```rust
 #[repr(C)]
 pub struct ObjectAttributes {
  attr0: u16,
  attr1: u16,
  attr2: u16,
  filler: i16,
 }
 #[repr(C)]
 pub struct AffineMatrix {
  filler0: [u16; 3],
  pa: i16,
  filler1: [u16; 3],
  pb: i16,
  filler2: [u16; 3],
  pc: i16,
  filler3: [u16; 3],
  pd: i16,
 }
 ```
 (Note: the `#[repr(C)]` part just means that Rust must lay out the data exactly
 in the order we specify, which otherwise it is not required to do).
 So, we've got 1024 bytes in OAM and each `ObjectAttributes` value is 8 bytes, so
 naturally we can support up to 128 objects.
 _At the same time_, we've got 1024 bytes in OAM and each `AffineMatrix` is 32
 bytes, so we can have 32 of them.
 But, as I said, these things are all _interlaced_ with each other. See how
 there's "filler" fields in each struct? If we imagine the OAM as being just an
 array of one type or the other, indexes 0/1/2/3 of the `ObjectAttributes` array
 would line up with index 0 of the `AffineMatrix` array. It's kinda weird, but
 that's just how it works. When we set up functions to read and write these values
 we'll have to be careful with how we do it. We probably _won't_ want to use
 those representations above, at least not with the `AffineMatrix` type, because
 they're quite wasteful if you want to store just object attributes or just
 affine matrices.
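 The counts and the interlacing can be checked with `size_of`. This sketch
 reuses the two layouts from above (fields made `pub` only to keep the compiler
 quiet) and adds hypothetical helpers for the arithmetic.
 ```rust
 use core::mem::size_of;

 #[repr(C)]
 pub struct ObjectAttributes {
   pub attr0: u16,
   pub attr1: u16,
   pub attr2: u16,
   pub filler: i16,
 }

 #[repr(C)]
 pub struct AffineMatrix {
   pub filler0: [u16; 3],
   pub pa: i16,
   pub filler1: [u16; 3],
   pub pb: i16,
   pub filler2: [u16; 3],
   pub pc: i16,
   pub filler3: [u16; 3],
   pub pd: i16,
 }

 pub const OAM_BYTES: usize = 1024;

 pub const fn object_slots() -> usize {
   OAM_BYTES / size_of::<ObjectAttributes>()
 }

 pub const fn affine_slots() -> usize {
   OAM_BYTES / size_of::<AffineMatrix>()
 }

 /// Hypothetical helper: the four object slots whose `filler` fields hold
 /// `pa`, `pb`, `pc`, and `pd` of affine matrix `i`.
 pub const fn objects_overlapping_affine(i: usize) -> [usize; 4] {
   [i * 4, i * 4 + 1, i * 4 + 2, i * 4 + 3]
 }
 ```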
--- a/book/src-bak/08-rom.md
+++ b/book/src-bak/08-rom.md
@ -1,14 +0,0 @@
 # Game Pak ROM / Flash ROM (ROM)
 * **Address Span (Wait State 0):** `0x800_0000` to `0x9FF_FFFF`
 * **Address Span (Wait State 1):** `0xA00_0000` to `0xBFF_FFFF`
 * **Address Span (Wait State 2):** `0xC00_0000` to `0xDFF_FFFF`
 The game's ROM data is a single set of data that's up to 32 megabytes in size.
 However, that data is mirrored to three different locations in the address
 space. Depending on which part of the address space you use, it can affect the
 memory timings involved.
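 Since the three spans are mirrors, any ROM address can be split into "which
 mirror" and "which byte of the ROM". A sketch, with hypothetical names and only
 the spans listed above assumed:
 ```rust
 pub const ROM_BASE: usize = 0x800_0000;
 /// Each mirror covers 32 megabytes of address space.
 pub const ROM_MIRROR_LEN: usize = 0x200_0000;

 /// Hypothetical helper: `(wait state mirror, offset into ROM)` for an
 /// address, or `None` if the address isn't in any of the three ROM spans.
 pub const fn rom_mirror(addr: usize) -> Option<(usize, usize)> {
   if addr >= ROM_BASE && addr < ROM_BASE + 3 * ROM_MIRROR_LEN {
     let rel = addr - ROM_BASE;
     Some((rel / ROM_MIRROR_LEN, rel % ROM_MIRROR_LEN))
   } else {
     None
   }
 }
 ```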
 TODO: describe `WAITCNT` here, we won't get a better chance at it.
 TODO: discuss THUMB vs ARM code and why THUMB is so much faster (because ROM is a 16-bit bus)
--- a/book/src-bak/09-sram.md
+++ b/book/src-bak/09-sram.md
@ -1,21 +0,0 @@
 # Save RAM (SRAM)
 * **Address Span:** `0xE00_0000` to `0xE00_FFFF` (64k)
 The actual amount of SRAM available depends on your game pak, and the 64k figure
 is simply the maximum possible. A particular game pak might have less, and an
 emulator will likely let you have all 64k if you want.
 As with other portions of the address space, SRAM has some number of wait cycles
 per use. As with ROM, you can change the wait cycle settings via the `WAITCNT`
 register if the defaults don't work well for your game pak. See the ROM section
 for full details of how the `WAITCNT` register works.
 The game pak SRAM also has only an 8-bit bus, so have fun with that.
 The GBA Direct Memory Access (DMA) unit cannot access SRAM.
 Also, you [should not write to SRAM with code executing from
 ROM](https://problemkaputt.de/gbatek.htm#gbacartbackupsramfram). Instead, you
 should move the code to WRAM and execute the save code from there. We'll cover
 how to handle that eventually.
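Because of the 8-bit bus, anything wider than a byte has to be split up before it goes to SRAM. Here is a rough sketch of that idea. The `write_byte` and `read_byte` closures are hypothetical stand-ins for volatile `u8` accesses, and the sketch does not address running the save code from WRAM.
```rust
/// Stores `value` one byte at a time, in little-endian order.
/// `write_byte(offset, byte)` stands in for a volatile `u8` write to SRAM.
pub fn save_u32(offset: usize, value: u32, mut write_byte: impl FnMut(usize, u8)) {
  for (i, byte) in value.to_le_bytes().iter().enumerate() {
    write_byte(offset + i, *byte);
  }
}

/// Loads a little-endian `u32` back, again one byte at a time.
pub fn load_u32(offset: usize, mut read_byte: impl FnMut(usize) -> u8) -> u32 {
  let mut bytes = [0u8; 4];
  for (i, byte) in bytes.iter_mut().enumerate() {
    *byte = read_byte(offset + i);
  }
  u32::from_le_bytes(bytes)
}
```
Passing the byte access in as a closure keeps the packing logic testable on a desktop machine with a plain array in place of the real save memory.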
--- a/book/src-bak/gba_prng.md
+++ b/book/src-bak/gba_prng.md
--- a/book/src-bak/index.md
+++ b/book/src-bak/index.md
@ -1,52 +0,0 @@
 # Ch 3: Memory and Objects
 Alright so we can do some basic "movement", but we left a big trail in the video
 memory of everywhere we went. Most of the time that's not what we want at all.
 If we want more hardware support we're going to have to use a new video mode. So
 far we've only used Mode 3, but modes 4 and 5 are basically the same. Instead,
 we'll switch focus to using a tiled graphical mode.
 First we will go over the complete GBA memory mapping. Part of this is the
 memory for tiled graphics, but also things like all those IO registers, where
 our RAM is for scratch space, all that stuff. Even if we can't put all of them
 to use at once, it's helpful to have an idea of what will be available in the
 long run.
 Tiled modes bring us three big new concepts that each have their own complexity:
 tiles, backgrounds, and objects. Backgrounds and objects both use tiles, but the
 background is for creating a very large static space that you can scroll around
 the view within, and the objects are about having a few moving bits that appear
 over the background. Careful use of backgrounds and objects is key to having the
 best looking GBA game, so we won't even be able to cover it all in a single
 chapter.
 And, of course, since most games are pretty boring if they're totally static
 we'll touch on the kinds of RNG implementations you might want to have on a GBA.
 Most general purpose RNGs that you find are rather big compared to the amount of
 memory we want to give them, and they often use a lot of `u64` operations, so
 they end up much slower on a 32-bit machine like the GBA (you can lower 64-bit
 ops to combinations of 32-bit ops, but that's quite a bit more work). We'll
 cover a few RNG options that size down the RNG to a good size and a good speed
 without trading away too much in terms of quality.
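To give a feel for what "sized down" can mean, here is a sketch of a generator that only uses 32-bit shifts and XORs: Marsaglia's xorshift32 with the 13/17/5 shift triple. It is shown purely as an illustration and is not necessarily one of the options this chapter settles on; the type name `XorShift32` is invented here.
```rust
/// A tiny 32-bit xorshift generator (shift triple 13, 17, 5).
pub struct XorShift32 {
  state: u32,
}

impl XorShift32 {
  /// The state must never be zero, so a zero seed is swapped for a fixed constant.
  pub fn new(seed: u32) -> Self {
    XorShift32 { state: if seed == 0 { 0x9E37_79B9 } else { seed } }
  }

  /// Advances the state and returns the next value.
  pub fn next_u32(&mut self) -> u32 {
    let mut x = self.state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    self.state = x;
    x
  }
}
```
The whole generator is four bytes of state and three shift/XOR pairs per output, which is about as cheap as it gets on a 32-bit CPU, at the cost of fairly modest statistical quality.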
 To top it all off, we'll make a simple "memory game" sort of thing. There's some
 face down cards in a grid, you pick one to check, then you pick the other to
 check, and then if they match the pair disappears.
 ## Drawing Priority
 Both backgrounds and objects can have "priority" values associated with them.
 TONC and GBATEK have _opposite_ ideas of what it means to have the "highest"
 priority. TONC goes by highest numerical value, and GBATEK goes by what's on the
 z-layer closest to the user. Let's list out the rules as clearly as we can:
 * Priority is always two bits, so 0 through 3.
 * Priority conceptually proceeds in drawing passes that count _down_, so any
  priority 3 things can get covered up by priority 2 things. In truth there's
  probably depth testing and buffering stuff going on so it's all one single
  pass, but conceptually we will imagine it happening as all of the 3 elements,
  then all of 2, and so on.
 * Objects always draw over top of backgrounds of equal priority.
 * Within things of the same type and priority, the lower numbered element "wins"
  and gets its pixel drawn (bg0 is favored over bg1, obj0 is favored over obj1,
  etc).
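The rules above can be sketched as a single sort key, where the smaller key ends up closer to the viewer. The `Layer` type and its methods are invented for this illustration, and the sketch only models the rules as listed here.
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
  Background { index: u8, priority: u8 },
  Object { index: u8, priority: u8 },
}

impl Layer {
  /// Sort key: (priority, kind, index). Smaller keys are drawn on top.
  fn key(self) -> (u8, u8, u8) {
    match self {
      // Objects rank ahead of backgrounds at equal priority.
      Layer::Object { index, priority } => (priority & 0b11, 0, index),
      Layer::Background { index, priority } => (priority & 0b11, 1, index),
    }
  }

  /// True if `self` covers `other` wherever both have an opaque pixel.
  pub fn covers(self, other: Layer) -> bool {
    self.key() < other.key()
  }
}
```
Tuples compare field by field, so priority is checked first, then object versus background, then the element number.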
--- a/book/src-bak/io_registers.md
+++ b/book/src-bak/io_registers.md
@ -1,33 +0,0 @@
 # IO Registers
 The GBA has a large number of **IO Registers** (not to be confused with CPU
 registers). These are special memory locations from `0x04000000` to
 `0x040003FE`. GBATEK has a [full
 list](http://problemkaputt.de/gbatek.htm#gbaiomap), but we only need to learn
 about a few of them at a time as we go, so don't be worried.
 The important facts to know about IO Registers are these:
 * Each has its own specific size. Most are `u16`, but some are `u32`.
 * All of them must be accessed in a `volatile` style.
 * Each register is specifically readable or writable or both. Actually, with
  some registers there are even individual bits that are read-only or
  write-only.
  * If you write to a read-only position, those writes are simply ignored. This
    mostly matters if a writable register contains a read-only bit (such as the
    Display Control, next section).
  * If you read from a write-only position, you get back values that are
    [basically
    nonsense](http://problemkaputt.de/gbatek.htm#gbaunpredictablethings). There
    aren't really any registers that mix writable bits with read only bits, so
    you're basically safe here. The only (mild) concern is that when you write a
    value into a write-only register you need to keep track of what you wrote
    somewhere else if you want to know what you wrote (such as to adjust an offset
    value by +1, or whatever).
  * You can always check GBATEK to be sure, but if I don't mention it then a bit
    is probably both read and write.
 * Some registers have invalid bit patterns. For example, the lowest three bits
  of the Display Control register can't legally be set to the values 6 or 7.
 When talking about bit positions, the numbers are _zero indexed_ just like an
 array index is.
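The "keep track of what you wrote" advice for write-only registers can be sketched as a small shadow-copy wrapper. Everything here is hypothetical: `Shadowed` is an invented name, and the `sink` closure stands in for the volatile write to the real register.
```rust
/// Remembers the last value sent to a write-only register.
pub struct Shadowed<F: FnMut(u16)> {
  last: u16,
  sink: F,
}

impl<F: FnMut(u16)> Shadowed<F> {
  /// Writes `initial` so the shadow and the register start out in agreement.
  pub fn new(initial: u16, mut sink: F) -> Self {
    sink(initial);
    Shadowed { last: initial, sink }
  }

  /// The value most recently written.
  pub fn get(&self) -> u16 {
    self.last
  }

  /// Writes a new value and remembers it.
  pub fn set(&mut self, value: u16) {
    self.last = value;
    (self.sink)(value);
  }

  /// Read-modify-write against the remembered value.
  pub fn update(&mut self, f: impl FnOnce(u16) -> u16) {
    let value = f(self.last);
    self.set(value);
  }
}
```
With something like this, adjusting an offset by +1 becomes `reg.update(|v| v + 1)` without ever reading the register itself.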
--- a/book/src-bak/light_cycle.md
+++ b/book/src-bak/light_cycle.md
@ -1,135 +0,0 @@
 # light_cycle
 Now let's make a game of "light_cycle" with our new knowledge.
 ## Gameplay
 `light_cycle` is pretty simple, and very obvious if you've ever seen Tron. The
 player moves around the screen with a trail left behind them. They die if they
 go off the screen or if they touch their own trail.
 ## Operations
 We need some better drawing operations this time around.
 ```rust
 pub unsafe fn mode3_clear_screen(color: u16) {
  let color = color as u32;
  let bulk_color = color << 16 | color;
  let mut ptr = VolatilePtr(VRAM as *mut u32);
  for _ in 0..SCREEN_HEIGHT {
    for _ in 0..(SCREEN_WIDTH / 2) {
      ptr.write(bulk_color);
      ptr = ptr.offset(1);
    }
  }
 }
 pub unsafe fn mode3_draw_pixel(col: isize, row: isize, color: u16) {
  VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).write(color);
 }
 pub unsafe fn mode3_read_pixel(col: isize, row: isize) -> u16 {
  VolatilePtr(VRAM as *mut u16).offset(col + row * SCREEN_WIDTH).read()
 }
 ```
 The draw pixel and read pixel are both pretty obvious. What's new is the clear
 screen operation. It changes the `u16` color into a `u32` and then packs the
 value in twice. Then we write out `u32` values the whole way through screen
 memory. This means we have to do half as many write operations overall, so the
 screen clear is roughly twice as fast.
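The packing step on its own looks like the sketch below. `pack_color_pair` and `WORDS_PER_SCREEN` are names made up for this illustration; the screen dimensions are the usual 240x160.
```rust
pub const SCREEN_WIDTH: usize = 240;
pub const SCREEN_HEIGHT: usize = 160;

/// Number of `u32` writes needed to cover a Mode 3 screen of `u16` pixels.
pub const WORDS_PER_SCREEN: usize = SCREEN_WIDTH * SCREEN_HEIGHT / 2;

/// Duplicates one 16-bit color into both halves of a 32-bit word.
pub const fn pack_color_pair(color: u16) -> u32 {
  let c = color as u32;
  (c << 16) | c
}
```
Each packed word covers two neighboring pixels, which is why the inner loop in `mode3_clear_screen` only runs `SCREEN_WIDTH / 2` times per row.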
 Now we just have to fill in the main function:
 ```rust
 #[start]
 fn main(_argc: isize, _argv: *const *const u8) -> isize {
  unsafe {
    DISPCNT.write(MODE3 | BG2);
  }
  let mut px = SCREEN_WIDTH / 2;
  let mut py = SCREEN_HEIGHT / 2;
  let mut color = rgb16(31, 0, 0);
  loop {
    // read the input for this frame
    let this_frame_keys = key_input();
    // adjust game state and wait for vblank
    px += 2 * this_frame_keys.column_direction() as isize;
    py += 2 * this_frame_keys.row_direction() as isize;
    wait_until_vblank();
    // draw the new game and wait until the next frame starts.
    unsafe {
      if px < 0 || py < 0 || px >= SCREEN_WIDTH || py >= SCREEN_HEIGHT {
        // out of bounds, reset the screen and position.
        mode3_clear_screen(0);
        color = color.rotate_left(5);
        px = SCREEN_WIDTH / 2;
        py = SCREEN_HEIGHT / 2;
      } else {
        let color_here = mode3_read_pixel(px, py);
        if color_here != 0 {
          // crashed into our own line, reset the screen
          mode3_clear_screen(0);
          color = color.rotate_left(5);
        } else {
          // draw the new part of the line
          mode3_draw_pixel(px, py, color);
          mode3_draw_pixel(px, py + 1, color);
          mode3_draw_pixel(px + 1, py, color);
          mode3_draw_pixel(px + 1, py + 1, color);
        }
      }
    }
    wait_until_vdraw();
  }
 }
 ```
 Oh that's a lot more than before!
 First we set Mode 3 and Background 2, we know about that.
 Then we're going to store the player's x and y, along with a color value for
 their light cycle. Then we enter the core loop.
 We read the keys for input, and then do as much as we can without touching video
 memory. Since we're using video memory as the place to store the player's light
 trail, we can't do much, we just update their position and wait for VBlank to
 start. The player will be a 2x2 square, so the arrows will move you 2 pixels per
 frame.
 Once we're in VBlank we check to see what kind of drawing we're doing. If the
 player has gone out of bounds, we clear the screen, rotate their color, and then
 reset their position. Why rotate the color? Just because it's fun to have
 different colors.
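To see what the rotation actually does to the color, here is a small sketch that replays it outside the game loop. `color_after_resets` is a helper invented for this illustration; `rgb16` is the same helper used elsewhere in the book.
```rust
pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
  blue << 10 | green << 5 | red
}

/// The trail color after `resets` resets, starting from pure red.
pub fn color_after_resets(resets: u32) -> u16 {
  let mut color = rgb16(31, 0, 0);
  for _ in 0..resets {
    color = color.rotate_left(5);
  }
  color
}
```
Each channel is 5 bits wide, so one rotation moves red into green and the next moves it into blue. Since 16 is not a multiple of 5, the third rotation wraps unevenly: the result is `0x800F`, a darker red with the unused top bit set.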
 Next, if the player is in bounds we read the video memory for their position. If
 it's not black that means we've been here before and the player has crashed into
 their own line. In this case, we reset the game without moving them to a new
 location.
 Finally, if the player is in bounds and they haven't crashed, we write their
 color into memory at this position.
 Regardless of how it worked out, we hold here until vdraw starts before going to
 the next loop. That's all there is to it.
 ## The gba crate doesn't quite work like this
 Once again, as with the `hello1` and `hello2` examples, the `gba` crate covers
 much of this same ground as our example here, but in slightly different ways.
 Better organization and abstractions are usually only realized once you've used
 more of the whole thing you're trying to work with. If we want to have a crate
 where the whole thing is well integrated with itself, then the examples would
 also end up having to explain about things we haven't really touched on much
 yet. It becomes a lot harder to teach.
 So, going forward, we will continue to teach concepts and build examples that
 don't directly depend on the `gba` crate. This allows the crate to freely grow
 without all the past examples becoming a great inertia upon it.
--- a/book/src-bak/memory_game.md
+++ b/book/src-bak/memory_game.md
@ -1,316 +0,0 @@
 # Making A Memory Game
 For this example to show off our new skills we'll make a "memory" game. The idea
 is that there's some face down cards and you pick one, it flips, you pick a
 second, if they match they both go away, if they don't match they both turn back
 face down. The player keeps going until all the cards are gone, then we'll deal
 the cards again.
 There are many steps to do to get such a simple seeming game going. In fact I
 stumbled a bit myself when trying to get things set up and going despite having
 written and explained all the parts so far. Accordingly, we'll take each part
 very slowly, and review things as we build up our game.
 We'll start back with a nearly blank file, calling it `memory_game.rs`:
 ```rust
 #![feature(start)]
 #![no_std]
 #[panic_handler]
 fn panic(_info: &core::panic::PanicInfo) -> ! {
  loop {}
 }
 #[start]
 fn main(_argc: isize, _argv: *const *const u8) -> isize {
  loop {
    // TODO the whole thing
  }
 }
 ```
 ## Displaying A Background
 First let's try to get a background going. We'll display a simple checker
 pattern just so that we know that we did something.
 Remember, backgrounds have the following essential components:
 * Background Palette
 * Background Tiles
 * Screenblock
 * IO Registers
 ### Background Palette
 To write to the background palette memory we'll want to name a `VolatilePtr` for
 it. We'll probably also want to be able to cast between different types either
 right away or later in this program, so we'll add a method for that.
 ```rust
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 #[repr(transparent)]
 pub struct VolatilePtr<T>(pub *mut T);
 impl<T> VolatilePtr<T> {
  pub unsafe fn read(&self) -> T {
    core::ptr::read_volatile(self.0)
  }
  pub unsafe fn write(&self, data: T) {
    core::ptr::write_volatile(self.0, data);
  }
  pub fn offset(self, count: isize) -> Self {
    VolatilePtr(self.0.wrapping_offset(count))
  }
  pub fn cast<Z>(self) -> VolatilePtr<Z> {
    VolatilePtr(self.0 as *mut Z)
  }
 }
 ```
 Now we give ourselves an easy way to write a color into a palbank slot.
 ```rust
 pub const BACKGROUND_PALETTE: VolatilePtr<u16> = VolatilePtr(0x500_0000 as *mut u16);
 pub fn set_bg_palette_4bpp(palbank: usize, slot: usize, color: u16) {
  assert!(palbank < 16);
  assert!(slot > 0 && slot < 16);
  unsafe {
    BACKGROUND_PALETTE
      .cast::<[u16; 16]>()
      .offset(palbank as isize)
      .cast::<u16>()
      .offset(slot as isize)
      .write(color);
  }
 }
 ```
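The chain of casts and offsets above boils down to plain address arithmetic. As a sanity check, here is the same computation written out with integers. `bg_palette_4bpp_address` is an invented helper; unlike `set_bg_palette_4bpp` it also accepts slot 0.
```rust
pub const PALRAM_BG_BASE: usize = 0x500_0000;

/// Address of `slot` within `palbank`, or `None` if either is out of range.
pub fn bg_palette_4bpp_address(palbank: usize, slot: usize) -> Option<usize> {
  if palbank >= 16 || slot >= 16 {
    return None;
  }
  // 16 entries per palbank, 2 bytes per entry.
  Some(PALRAM_BG_BASE + (palbank * 16 + slot) * core::mem::size_of::<u16>())
}
```
So palbank 1 starts 32 bytes after palbank 0, and the final entry of the final palbank sits at `0x500_01FE`.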
 And of course we need to bring back in our ability to build color values, as
 well as a few named colors to start us off:
 ```rust
 pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
  blue << 10 | green << 5 | red
 }
 pub const WHITE: u16 = rgb16(31, 31, 31);
 pub const LIGHT_GRAY: u16 = rgb16(25, 25, 25);
 pub const DARK_GRAY: u16 = rgb16(15, 15, 15);
 ```
 Which _finally_ allows us to set our palette colors in `main`:
 ```rust
 fn main(_argc: isize, _argv: *const *const u8) -> isize {
  set_bg_palette_4bpp(0, 1, WHITE);
  set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
  set_bg_palette_4bpp(0, 3, DARK_GRAY);
 ```
 ### Background Tiles
 So we'll want some light gray tiles and some dark gray tiles. We could use a
 single tile and then swap it between palbanks to do the color selection, but for
 now we'll just use two different tiles, since we've got tons of tile space to
 spare.
 ```rust
 #[derive(Debug, Clone, Copy, Default)]
 #[repr(transparent)]
 pub struct Tile4bpp {
  pub data: [u32; 8],
 }
 pub const ALL_TWOS: Tile4bpp = Tile4bpp {
  data: [
    0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222,
  ],
 };
 pub const ALL_THREES: Tile4bpp = Tile4bpp {
  data: [
    0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333,
  ],
 };
 ```
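The repeated hex digits are not a coincidence: each hex digit of a 4bpp row is one pixel's palette index. A sketch of building and reading such rows follows; both helper names are made up for this illustration.
```rust
/// One 8-pixel row of a 4bpp tile with every pixel set to `palette_index`.
pub const fn solid_row_4bpp(palette_index: u8) -> u32 {
  (palette_index as u32 & 0xF) * 0x1111_1111
}

/// Palette index of pixel `x` within a packed row, where 0 is the leftmost
/// pixel and is stored in the lowest 4 bits.
pub const fn pixel_in_row_4bpp(row: u32, x: u32) -> u8 {
  ((row >> (4 * (x & 7))) & 0xF) as u8
}
```
With these, `ALL_TWOS` is just `[solid_row_4bpp(2); 8]` and `ALL_THREES` is `[solid_row_4bpp(3); 8]`.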
 And then we have to have a way to put the tiles into video memory:
 ```rust
 #[derive(Clone, Copy)]
 #[repr(transparent)]
 pub struct Charblock4bpp {
  pub data: [Tile4bpp; 512],
 }
 pub const VRAM: VolatilePtr<Charblock4bpp> = VolatilePtr(0x0600_0000 as *mut Charblock4bpp);
 pub fn set_bg_tile_4bpp(charblock: usize, index: usize, tile: Tile4bpp) {
  assert!(charblock < 4);
  assert!(index < 512);
  unsafe { VRAM.offset(charblock as isize).cast::<Tile4bpp>().offset(index as isize).write(tile) }
 }
 ```
 And finally, we can call that within `main`:
 ```rust
 fn main(_argc: isize, _argv: *const *const u8) -> isize {
  // bg palette
  set_bg_palette_4bpp(0, 1, WHITE);
  set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
  set_bg_palette_4bpp(0, 3, DARK_GRAY);
  // bg tiles
  set_bg_tile_4bpp(0, 0, ALL_TWOS);
  set_bg_tile_4bpp(0, 1, ALL_THREES);
 ```
 ### Setup A Screenblock
 Screenblocks are a little weird because they take the same space as the
 charblocks (8 screenblocks per charblock). The GBA will let you mix and match
 and it's up to you to keep it all straight. We're using tiles at the base of
 charblock 0, so we'll place our screenblock at the base of charblock 1.
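The "8 screenblocks per charblock" relationship is easy to check with plain address math. This is a sketch with invented constant and function names; the sizes are the 16k charblock and 2k screenblock sizes.
```rust
pub const VRAM_BASE: usize = 0x600_0000;
pub const CHARBLOCK_SIZE: usize = 0x4000;
pub const SCREENBLOCK_SIZE: usize = 0x800;

/// Start address of the given charblock.
pub const fn charblock_address(charblock: usize) -> usize {
  VRAM_BASE + charblock * CHARBLOCK_SIZE
}

/// Start address of the given screenblock.
pub const fn screenblock_address(screenblock: usize) -> usize {
  VRAM_BASE + screenblock * SCREENBLOCK_SIZE
}
```
Screenblock 8 and charblock 1 start at the same address, which is why the base of charblock 1 is addressed as screenblock slot 8.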
 First, we have to be able to make one single screenblock entry at a time:
 ```rust
 #[derive(Debug, Clone, Copy, Default)]
 #[repr(transparent)]
 pub struct RegularScreenblockEntry(u16);
 impl RegularScreenblockEntry {
  pub const SCREENBLOCK_ENTRY_TILE_ID_MASK: u16 = 0b11_1111_1111;
  pub const fn from_tile_id(id: u16) -> Self {
    RegularScreenblockEntry(id & Self::SCREENBLOCK_ENTRY_TILE_ID_MASK)
  }
 }
 ```
 And then with 32x32 of these things we'll have a whole screenblock. Now, we
 probably won't actually make values of the screenblock type itself, but we at
 least need it to have the type declared with the correct size so that we can
 move our pointers around by the right amount.
 ```rust
 #[derive(Clone, Copy)]
 #[repr(transparent)]
 pub struct RegularScreenblock {
  pub data: [RegularScreenblockEntry; 32 * 32],
 }
 ```
 Alright, so, as I said those things are kinda big, we don't really want to be
 building them up on the stack if we can avoid it, so we'll write one straight
 into memory at the correct location.
 ```rust
 pub fn checker_screenblock(slot: usize, a_entry: RegularScreenblockEntry, b_entry: RegularScreenblockEntry) {
  let mut p = VRAM.cast::<RegularScreenblock>().offset(slot as isize).cast::<RegularScreenblockEntry>();
  let mut checker = true;
  for _row in 0..32 {
    for _col in 0..32 {
      unsafe { p.write(if checker { a_entry } else { b_entry }) };
      p = p.offset(1);
      checker = !checker;
    }
    checker = !checker;
  }
 }
 ```
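The extra flip at the end of each row is what makes the pattern a checkerboard instead of vertical stripes: a row has an even number of entries, so without it every row would start with the same entry. Put another way, the loop above picks `a_entry` exactly when row plus column is even. A tiny sketch, with an invented function name:
```rust
/// True if the checker pattern puts `a_entry` at this position.
pub fn uses_a_entry(row: usize, col: usize) -> bool {
  (row + col) % 2 == 0
}
```
The toggle-based loop avoids any per-entry arithmetic, while this closed form is handy when you only need to know a single position.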
 And then we add this into `main`:
 ```rust
  // screenblock
  let light_entry = RegularScreenblockEntry::from_tile_id(0);
  let dark_entry = RegularScreenblockEntry::from_tile_id(1);
  checker_screenblock(8, light_entry, dark_entry);
 ```
 ### Background IO Registers
 Our most important step is of course the IO register step. There's four
 different background layers, but each of them has the same format for their
 control register. For the moment, all that we care about is being able to set
 the "screen base block" value.
 ```rust
 #[derive(Clone, Copy, Default, PartialEq, Eq)]
 #[repr(transparent)]
 pub struct BackgroundControlSetting(u16);
 impl BackgroundControlSetting {
  pub const SCREEN_BASE_BLOCK_MASK: u16 = 0b1_1111;
  pub const fn from_base_block(sbb: u16) -> Self {
    BackgroundControlSetting((sbb & Self::SCREEN_BASE_BLOCK_MASK) << 8)
  }
 }
 pub const BG0CNT: VolatilePtr<BackgroundControlSetting> = VolatilePtr(0x400_0008 as *mut BackgroundControlSetting);
 ```
 And... that's all it takes for us to be able to add a line into `main`:
 ```rust
  // bg0 control
  unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) };
 ```
 ### Set The Display Control Register
 We're finally ready to set the display control register and get things going.
 We've slightly glossed over it so far, but when the GBA is first booted most
 everything within the address space will be all zeroed. However, the display
 control register has the "Force VBlank" bit enabled by the BIOS, giving you a
 moment to put the memory in place that you'll need for the first frame.
 So, now that we have got all of our memory set, we'll overwrite the initial
 display control register value with what we'll call "just enable bg0".
 ```rust
 #[derive(Clone, Copy, Default, PartialEq, Eq)]
 #[repr(transparent)]
 pub struct DisplayControlSetting(u16);
 impl DisplayControlSetting {
  pub const JUST_ENABLE_BG0: DisplayControlSetting = DisplayControlSetting(1 << 8);
 }
 pub const DISPCNT: VolatilePtr<DisplayControlSetting> = VolatilePtr(0x0400_0000 as *mut DisplayControlSetting);
 ```
 And so finally we have a complete `main`:
 ```rust
 #[start]
 fn main(_argc: isize, _argv: *const *const u8) -> isize {
  // bg palette
  set_bg_palette_4bpp(0, 1, WHITE);
  set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
  set_bg_palette_4bpp(0, 3, DARK_GRAY);
  // bg tiles
  set_bg_tile_4bpp(0, 0, ALL_TWOS);
  set_bg_tile_4bpp(0, 1, ALL_THREES);
  // screenblock
  let light_entry = RegularScreenblockEntry::from_tile_id(0);
  let dark_entry = RegularScreenblockEntry::from_tile_id(1);
  checker_screenblock(8, light_entry, dark_entry);
  // bg0 control
  unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) };
  // Display Control
  unsafe { DISPCNT.write(DisplayControlSetting::JUST_ENABLE_BG0) };
  loop {
    // TODO the whole thing
  }
 }
 ```
 And _It works, Marty! It works!_
 ![screenshot_checkers](screenshot_checkers.png)
 We've got more to go, but we're well on our way.
--- a/book/src-bak/obj_memory_2d1d.jpg
+++ b/book/src-bak/obj_memory_2d1d.jpg
--- a/book/src-bak/regular_backgrounds.md
+++ b/book/src-bak/regular_backgrounds.md
@ -1,313 +0,0 @@
 # Regular Backgrounds
 So, backgrounds, they're cool. Why do we call the ones here "regular"
 backgrounds? Because there's also "affine" backgrounds. However, affine math
 stuff adds a complication, so for now we'll just work with regular backgrounds.
 The non-affine backgrounds are sometimes called "text mode" backgrounds by other
 guides.
 To get your background image working you generally need to perform all of the
 following steps, though I suppose the exact ordering is up to you.
 ## Tiled Video Modes
 When you want regular tiled display, you must use video mode 0 or 1.
 * Mode 0 allows for using all four BG layers (0 through 3) as regular
  backgrounds.
 * Mode 1 allows for using BG0 and BG1 as regular backgrounds, BG2 as an affine
  background, and BG3 not at all.
 * Mode 2 allows for BG2 and BG3 to be used as affine backgrounds, while BG0 and
  BG1 cannot be used at all.
 We will not cover affine backgrounds in this chapter, so we will naturally be
 using video mode 0.
 Also, note that you have to enable each background layer that you want to use
 within the display control register.
 ## Get Your Palette Ready
 Background palette starts at `0x5000000` and is 256 `u16` values long. It'd
 potentially be possible to declare a static array starting at a fixed address and
 use a linker script to make sure that it ends up at the right spot in the final
 program, but since we have to use volatile reads and writes with PALRAM anyway,
 we'll just reuse our `VolatilePtr` type. Something like this:
 ```rust
 pub const PALRAM_BG_BASE: VolatilePtr<u16> = VolatilePtr(0x500_0000 as *mut u16);
 pub fn bg_palette(slot: usize) -> u16 {
  assert!(slot < 256);
  unsafe { PALRAM_BG_BASE.offset(slot as isize).read() }
 }
 pub fn set_bg_palette(slot: usize, color: u16) {
  assert!(slot < 256);
  unsafe { PALRAM_BG_BASE.offset(slot as isize).write(color) }
 }
 ```
 As we discussed with the tile color depths, the palette can be utilized as a
 single block of palette values (`[u16; 256]`) or as 16 palbanks of 16 palette
 values each (`[[u16; 16]; 16]`). This setting is assigned per background layer
 via IO register.
 ## Get Your Tiles Ready
 Tile data is placed into charblocks. A charblock is always 16kb, so depending on
 color depth it will have either 256 or 512 tiles within that charblock.
 Charblocks 0, 1, 2, and 3 are all for background tiles. That's a maximum of 2048
 tiles for backgrounds, but as you'll see in a moment a particular tilemap entry
 can't even index that high. Instead, each background layer is assigned a
 "character base block", and then tilemap entries index relative to the character
 base block of that background layer.
 Now, if you want to move in a lot of tile data you'll probably want to use a DMA
 routine, or at least write a function like memcopy32 for fast `u32` copying from
 ROM into VRAM. However, for now, and because we're being very explicit since
 this is our first time doing it, we'll write it as functions for individual tile
 reads and writes.
 The math works like indexing a pointer, except that we have two sizes we need to
 go by. First you take the base address for VRAM (`0x600_0000`), then add the
 size of a charblock (16kb) times the charblock you want to place the tile
 within, and then you add the index of the tile slot you're placing it into times
 the size of that type of tile. Like this:
 ```rust
 // Assumes `VRAM` is the base address `0x600_0000` as a `usize`.
 use core::mem::size_of;
 pub fn bg_tile_4bpp(base_block: usize, tile_index: usize) -> Tile4bpp {
  assert!(base_block < 4);
  assert!(tile_index < 512);
  let address = VRAM + size_of::<Charblock4bpp>() * base_block + size_of::<Tile4bpp>() * tile_index;
  unsafe { VolatilePtr(address as *mut Tile4bpp).read() }
 }
 pub fn set_bg_tile_4bpp(base_block: usize, tile_index: usize, tile: Tile4bpp) {
  assert!(base_block < 4);
  assert!(tile_index < 512);
  let address = VRAM + size_of::<Charblock4bpp>() * base_block + size_of::<Tile4bpp>() * tile_index;
  unsafe { VolatilePtr(address as *mut Tile4bpp).write(tile) }
 }
 pub fn bg_tile_8bpp(base_block: usize, tile_index: usize) -> Tile8bpp {
  assert!(base_block < 4);
  assert!(tile_index < 256);
  let address = VRAM + size_of::<Charblock8bpp>() * base_block + size_of::<Tile8bpp>() * tile_index;
  unsafe { VolatilePtr(address as *mut Tile8bpp).read() }
 }
 pub fn set_bg_tile_8bpp(base_block: usize, tile_index: usize, tile: Tile8bpp) {
  assert!(base_block < 4);
  assert!(tile_index < 256);
  let address = VRAM + size_of::<Charblock8bpp>() * base_block + size_of::<Tile8bpp>() * tile_index;
  unsafe { VolatilePtr(address as *mut Tile8bpp).write(tile) }
 }
 ```
 For bulk operations, you'd do the exact same math to get your base destination
 pointer, and then you'd get the base source pointer for the tile you're copying
 out of ROM, and then you'd do the bulk copy for the correct number of `u32`
 values that you're trying to move (8 per tile moved for 4bpp, or 16 per tile
 moved for 8bpp).
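As a sketch of the bookkeeping for such a bulk copy, the helpers below compute the destination address and the number of `u32` values to move. The function names are invented for illustration, and the actual copy loop or DMA setup is left out.
```rust
/// Number of `u32` values in one tile: 8 for 4bpp, 16 for 8bpp.
pub const fn words_per_tile(is_8bpp: bool) -> usize {
  if is_8bpp { 16 } else { 8 }
}

/// Number of `u32` values to copy when moving `tile_count` tiles.
pub const fn words_for_tiles(tile_count: usize, is_8bpp: bool) -> usize {
  tile_count * words_per_tile(is_8bpp)
}

/// Destination address: VRAM base, plus 16k per charblock, plus the tile size
/// in bytes per tile slot.
pub const fn bg_tile_address(base_block: usize, tile_index: usize, is_8bpp: bool) -> usize {
  0x600_0000 + 0x4000 * base_block + words_per_tile(is_8bpp) * 4 * tile_index
}
```
For example, copying three 4bpp tiles into charblock 1 starting at tile slot 2 means moving 24 words to `0x600_4040`.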
 **GBA Limitation Note:** on a modern PC (eg: `x86` or `x86_64`) you're probably
 used to index based loops and iterator based loops being the same speed. The CPU
 has scaled-index addressing modes, so the base address of the array plus desired
 index * size per element can be computed as part of the memory access itself. It's
 slightly more complicated if there's arrays within arrays like there are here,
 but with normal arrays it's basically the same speed to index per loop cycle as
 it is to take a base address and then add +1 offset per loop cycle. However, the
 GBA's CPU _can't do any of that_. On the GBA, there's a genuine speed difference
 between looping over indexes and then indexing each loop (slow) compared to
 using an iterator that just stores an internal pointer and does +1 offset per
 loop until it reaches the end (fast). The repeated indexing can by itself
 be an expensive step. If it's like a 3 element array it's no big deal, but if
 you've got a big slice of data to process, be sure to go over it with `.iter()`
 and `.iter_mut()` if you can, instead of looping by index. This is Rust and all,
 so probably you were gonna do that anyway, but just a heads up.
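The two loop styles being compared look like this. It's only a sketch to make the terms concrete: both functions return the same result, and whether the compiler ends up generating different code for them depends on the optimizer, so measure before relying on it.
```rust
/// Index-based loop: each step computes an element address from `i`
/// (and does a bounds check unless the optimizer removes it).
pub fn sum_by_index(data: &[u32]) -> u32 {
  let mut total = 0u32;
  for i in 0..data.len() {
    total = total.wrapping_add(data[i]);
  }
  total
}

/// Iterator-based loop: walks a pointer forward one element at a time.
pub fn sum_by_iter(data: &[u32]) -> u32 {
  let mut total = 0u32;
  for value in data.iter() {
    total = total.wrapping_add(*value);
  }
  total
}
```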
 ## Get your Tilemap ready
 I believe that at one point I alluded to a tilemap existing. Well, just as the
 tiles are arranged into charblocks, the data describing what tile to show in
 what location is arranged into a thing called a **screenblock**.
 A screenblock is placed into VRAM the same as the tile data charblocks. Starting
 at the base of VRAM (`0x600_0000`) there are 32 slots for the screenblock array.
 Each screenblock is 2048 bytes (`0x800`). Naturally, if our tiles are using up
 charblock space within VRAM and our tilemaps are using up screenblock space
 within the same VRAM... well it would just be a _disaster_ if they ran into
 each other. Once again, it's up to you as the programmer to determine how much
 space you want to devote to each thing. Each complete charblock uses up 8
 screenblocks worth of space, but you don't have to fill a complete charblock
 with tiles, so you can be very fiddly with how you split the memory.
 Each screenblock is composed of a series of _screenblock entry_ values, which
 describe what tile index to use and if the tile should be flipped and what
 palbank it should use (if any). Because both regular backgrounds and affine
 backgrounds are composed of screenblocks with entries, and because the affine
 background has a smaller format for screenblock entries, we'll name
 appropriately.
 ```rust
 #[derive(Clone, Copy)]
 #[repr(transparent)]
 pub struct RegularScreenblock {
  pub data: [RegularScreenblockEntry; 32 * 32],
 }
 #[derive(Debug, Clone, Copy, Default)]
 #[repr(transparent)]
 pub struct RegularScreenblockEntry(u16);
 ```
 So, with one entry per tile, a single screenblock allows for 32x32 tiles worth of
 background.
 The format of a regular screenblock entry is quite simple compared to some of
 the IO register stuff:
 * 10 bits for tile index (base off of the character base block of the background)
 * 1 bit for horizontal flip
 * 1 bit for vertical flip
 * 4 bits for picking which palbank to use (if 4bpp, otherwise it's ignored)
 ```rust
 impl RegularScreenblockEntry {
  pub fn tile_id(self) -> u16 {
    self.0 & 0b11_1111_1111
  }
  pub fn set_tile_id(&mut self, id: u16) {
    self.0 &= !0b11_1111_1111;
    self.0 |= id & 0b11_1111_1111;
  }
  pub fn horizontal_flip(self) -> bool {
    (self.0 & (1 << 0xA)) > 0
  }
  pub fn set_horizontal_flip(&mut self, bit: bool) {
    if bit {
      self.0 |= 1 << 0xA;
    } else {
      self.0 &= !(1 << 0xA);
    }
  }
  pub fn vertical_flip(self) -> bool {
    (self.0 & (1 << 0xB)) > 0
  }
  pub fn set_vertical_flip(&mut self, bit: bool) {
    if bit {
      self.0 |= 1 << 0xB;
    } else {
      self.0 &= !(1 << 0xB);
    }
  }
  pub fn palbank_index(self) -> u16 {
    self.0 >> 12
  }
  pub fn set_palbank_index(&mut self, palbank_index: u16) {
    self.0 &= 0b1111_1111_1111;
    self.0 |= palbank_index << 12;
  }
 }
 ```
 Now, at either 256 or 512 tiles per charblock, you might be thinking that with a
 10 bit index you can index past the end of one charblock and into the next.
 You'd be right, mostly.
 As long as you stay within the background memory region for charblocks (that is,
 0 through 3), then it all works out. However, if you try to get the background
 rendering to reach outside of the background charblocks you'll get an
 implementation defined result. It's not the dreaded "undefined behavior" we're
 often worried about in programming, but the results _are_ determined by what
 you're running the game on. With GBA hardware you get a bizarre result
 (basically another way to put garbage on the screen). With a DS it acts as if
 the tiles were all 0s. If you use an emulator it might or might not allow for
 you to do this, it's up to the emulator writers.
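Whether a given tile index stays inside the background charblocks can be checked up front. This sketch assumes the background tile region is the 64k covered by charblocks 0 through 3; `tile_in_bg_region` is an invented name.
```rust
/// True if the tile selected by `tile_id`, relative to `char_base_block`,
/// lies entirely within charblocks 0 through 3.
pub fn tile_in_bg_region(char_base_block: usize, tile_id: usize, is_8bpp: bool) -> bool {
  let tile_size = if is_8bpp { 64 } else { 32 };
  // Byte offset from the start of VRAM to the end of the tile.
  let end = char_base_block * 0x4000 + (tile_id + 1) * tile_size;
  char_base_block < 4 && tile_id < 1024 && end <= 0x1_0000
}
```
From charblock 0 every 10-bit index is safe at either color depth, while from charblock 3 only the tiles of that one charblock are.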
 ## Set Your IO Registers
 Instead of being just a single IO register to learn about this time, there's two
 separate groups of related registers.
 ### Background Control
 * BG0CNT (`0x400_0008`): BG0 Control
 * BG1CNT (`0x400_000A`): BG1 Control
 * BG2CNT (`0x400_000C`): BG2 Control
 * BG3CNT (`0x400_000E`): BG3 Control
 Each of these is a read/write `u16` location. This is where we get to all of
 the important details that we've been putting off.
 * 2 bits for the priority.
 * 2 bits for "character base block", the charblock that all of the tile indexes
  for this background are offset from.
 * 2 bits that are unused (bits 4 and 5).
 * 1 bit for mosaic effect being enabled (we'll get to that below).
 * 1 bit to enable 8bpp, otherwise 4bpp is used.
 * 5 bits to pick the "screen base block", the screen block that serves as the
  _base_ value for this background.
 * 1 bit that is _not_ used in regular mode, but in affine mode it can be enabled
  to cause the affine background to wrap around at the edges.
 * 2 bits for the background size.
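Packing those fields into a `u16` could look like the sketch below. The struct and field names are invented for illustration. Bits 4 and 5 are left at zero, since GBATEK lists them as unused, which is why the mosaic flag lands on bit 6.
```rust
#[derive(Debug, Clone, Copy, Default)]
pub struct BgControlFields {
  pub priority: u16,
  pub char_base_block: u16,
  pub mosaic: bool,
  pub is_8bpp: bool,
  pub screen_base_block: u16,
  pub affine_wrap: bool,
  pub size: u16,
}

impl BgControlFields {
  /// Packs the fields into the register layout, masking each to its width.
  pub const fn to_bits(self) -> u16 {
    (self.priority & 0b11)
      | (self.char_base_block & 0b11) << 2
      | (self.mosaic as u16) << 6
      | (self.is_8bpp as u16) << 7
      | (self.screen_base_block & 0b1_1111) << 8
      | (self.affine_wrap as u16) << 13
      | (self.size & 0b11) << 14
  }
}
```
Setting only `screen_base_block: 8` gives `0x0800`, the same value that a "screen base block 8 and nothing else" setting should produce.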
 The size works a little funny. When size is 0 only the base screen block is
 used. If size is 1 or 2 then the base screenblock and the following screenblock
 are placed next to each other (horizontally for 1, vertically for 2). If the
 size is 3 then the base screenblock and the following three screenblocks are
 arranged into a 2x2 grid of screenblocks.
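In terms of tiles, the four size values work out as in this sketch (`regular_bg_size_tiles` is an invented name):
```rust
/// (width, height) of a regular background in tiles for each size value.
pub const fn regular_bg_size_tiles(size: u16) -> (usize, usize) {
  match size & 0b11 {
    0 => (32, 32), // one screenblock
    1 => (64, 32), // two screenblocks side by side
    2 => (32, 64), // two screenblocks stacked vertically
    _ => (64, 64), // 2x2 grid of screenblocks
  }
}
```
Multiply by 8 for pixels: the smallest background is 256x256 and the largest is 512x512.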
 ### Background Offset
 * BG0HOFS (`0x400_0010`): BG0 X-Offset
 * BG0VOFS (`0x400_0012`): BG0 Y-Offset
 * BG1HOFS (`0x400_0014`): BG1 X-Offset
 * BG1VOFS (`0x400_0016`): BG1 Y-Offset
 * BG2HOFS (`0x400_0018`): BG2 X-Offset
 * BG2VOFS (`0x400_001A`): BG2 Y-Offset
 * BG3HOFS (`0x400_001C`): BG3 X-Offset
 * BG3VOFS (`0x400_001E`): BG3 Y-Offset
 Each of these are a _write only_ `u16` location. Bits 0 through 8 are used, so

 the offsets can be 0 through 511. They also only apply in regular backgrounds.
 If a background is in an affine state then you'll use different IO registers to
 control it (discussed in a later chapter).
 The offset that you assign determines the pixel offset of the display area
 relative to the start of the background scene, as if the screen was a camera
 looking at the scene. In other words, as a BG X offset value increases, you can
 think of it as the camera moving to the right, or as that background moving to
 the left. Like when Mario walks toward the goal. Similarly, when a BG Y offset
 increases the camera is moving down, or the background is moving up, like when
 Mario falls down from a high platform.
 Depending on how much the background is scrolled and the size of the background,
 it will loop.
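 Because the offset registers are write only and keep just 9 bits, one way to
 work with them is to track the camera position yourself and reduce it to the
 register's range right before writing. This is only a sketch, and
 `bg_offset_bits` is a hypothetical name, not something from the crate.
 ```rust
 /// Reduces a camera coordinate to the 9 bits an offset register keeps.
 /// Negative positions wrap around the same way the hardware value would.
 pub const fn bg_offset_bits(camera: i32) -> u16 {
  (camera & 0x1FF) as u16
 }
 ```
 Keeping the camera as an `i32` on your side means you never need to read the
 register back, which you couldn't do anyway.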
 ## Mosaic
 As a special effect, you can apply mosaic to backgrounds and objects. It's just
 a single flag for each background, so all backgrounds will use the same mosaic
 settings when they have it enabled. What it actually does is split the normal
 image into "blocks" and then each block gets the color of the top left pixel of
 that block. This is the effect you see when Link hits an electric foe with his
 sword and the whole screen "buzzes" at you.
 The mosaic control is a _write only_ `u16` IO register at `0x400_004C`.
 There are 4 bits each for the following, from the lowest bits to the highest:
 * Horizontal BG stretch
 * Vertical BG stretch
 * Horizontal object stretch
 * Vertical object stretch
 The inputs should be 1 _less_ than the desired block size. So if you set a
 stretch value of 5 then pixels 0-5 would be part of the first block (6 pixels),
 then 6-11 is the next block (another 6 pixels) and so on.
 If you need a pixel other than the top left of each block to determine the
 mosaic color, you can carefully offset the background or image by a tiny bit,
 but of course that changes the target pixel of every mosaic block. You can't
 change the target pixel on a block by block basis.
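 Here's a small sketch of the "one less than the block size" rule. It assumes the
 four fields are packed in the order listed above, starting from the lowest bits.
 Both function names are invented for this example.
 ```rust
 /// Converts a block size of 1..=16 pixels into the 4-bit stretch value.
 const fn stretch(block_size: u16) -> u16 {
  let s = block_size.saturating_sub(1);
  if s > 15 { 15 } else { s }
 }
 /// Builds a mosaic control value from block sizes given in pixels.
 pub const fn mosaic_bits(bg_w: u16, bg_h: u16, obj_w: u16, obj_h: u16) -> u16 {
  stretch(bg_w) | (stretch(bg_h) << 4) | (stretch(obj_w) << 8) | (stretch(obj_h) << 12)
 }
 /// The coordinate whose color a given pixel ends up showing.
 /// `block_size` must be at least 1.
 pub const fn mosaic_source(coord: u16, block_size: u16) -> u16 {
  coord - coord % block_size
 }
 ```
 With 6 pixel blocks, pixels 0 through 5 all show pixel 0 and pixels 6 through 11
 all show pixel 6, which matches the description above.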
--- a/book/src-bak/regular_objects.md
+++ b/book/src-bak/regular_objects.md
@ -1,417 +0,0 @@
 # Regular Objects
 As with backgrounds, objects can be used in both an affine and non-affine way.
 For this section we'll focus on the non-affine elements, and then we'll do all
 the affine stuff in a later chapter.
 ## Objects vs Sprites
 As [TONC](https://www.coranac.com/tonc/text/regobj.htm) helpfully reminds us
 (and then proceeds to not follow its own advice), we should always try to think
 in terms of _objects_, not _sprites_. A sprite is a logical / software concern,
 perhaps a player concern, whereas an object is a hardware concern.
 What's more, a given sprite that the player sees might need more than one object
 to display. Objects must be either square or rectangular (so sprite bits that
 stick out probably call for a second object), and can only be from 8x8 to 64x64
 (so anything bigger has to be two objects lined up to appear as one).
 ## General Object Info
 Unlike with backgrounds, you can enable the object layer in any video mode.
 There's space for 128 object definitions in OAM.
 The display gets a number of cycles per scanline to process objects: 1210 by
 default, but only 954 if you enable the "HBlank interval free" setting in the
 display control register. The [cycle cost per
 object](http://problemkaputt.de/gbatek.htm#lcdobjoverview) depends on the
 object's size and whether it's using affine or regular mode, so enabling the
 HBlank interval free setting doesn't reduce the number of displayable objects by
 a fixed amount. The objects are processed in order of their definitions and
 if you run out of cycles then the rest just don't get shown. If there's a
 concern that you might run out of cycles you can place important objects (such
 as the player) at the start of the list and then less important animation
 objects later on.
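 As a rough sketch of that budget, the code below uses the per-object costs as I
 read them from the GBATEK page linked above: a regular object costs one cycle
 per pixel of width, and an affine object costs 10 cycles plus two per pixel of
 rendered width. Treat those formulas as an assumption to check against GBATEK,
 and the function names as hypothetical.
 ```rust
 /// Cycles available per scanline for object processing.
 pub const fn object_cycle_budget(hblank_interval_free: bool) -> u32 {
  if hblank_interval_free { 954 } else { 1210 }
 }
 /// Assumed cost of one object on a scanline (see the lead-in above).
 pub const fn object_cycle_cost(width_px: u32, affine: bool) -> u32 {
  if affine { 10 + 2 * width_px } else { width_px }
 }
 /// Counts how many objects, taken in order, fit in one scanline's budget.
 /// Each entry is (width in pixels, is affine).
 pub fn objects_that_fit(objects: &[(u32, bool)], hblank_interval_free: bool) -> usize {
  let budget = object_cycle_budget(hblank_interval_free);
  let mut used = 0;
  let mut shown = 0;
  for &(width_px, affine) in objects {
    used += object_cycle_cost(width_px, affine);
    if used > budget {
      break;
    }
    shown += 1;
  }
  shown
 }
 ```
 This ignores any other details of how the hardware spends cycles, so use it for
 estimates only.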
 ## Ready the Palette
 Objects use the palette the same as the background does. The only difference is
 that the palette data for objects starts at `0x500_0200`.
 ```rust
 pub const PALRAM_OBJECT_BASE: VolatilePtr<u16> = VolatilePtr(0x500_0200 as *mut u16);
 pub fn object_palette(slot: usize) -> u16 {
  assert!(slot < 256);
  unsafe { PALRAM_OBJECT_BASE.offset(slot as isize).read() }
 }
 pub fn set_object_palette(slot: usize, color: u16) {
  assert!(slot < 256);
  unsafe { PALRAM_OBJECT_BASE.offset(slot as isize).write(color) }
 }
 ```
 ## Ready the Tiles
 Objects, as with backgrounds, are composed of 8x8 tiles, and if you want
 something bigger than 8x8 you have to use more than one tile put together.
 Object tiles go into the final two charblocks of VRAM (indexes 4 and 5). Because
 there's only two of them, they are sometimes called the lower block
 (`0x601_0000`) and the higher/upper block (`0x601_4000`).
 Tile indexes for objects are always offset from the base of the lower block, and
 they always go 32 bytes at a time, regardless of whether the object is set for
 4bpp or 8bpp. From this we can determine that there are 512 tile slots in each of
 the two object charblocks. However, in video modes 3, 4, and 5 the space for the
 background cuts into the lower charblock, so you can only safely use the upper
 charblock.
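 The address arithmetic can be sketched without touching memory at all. The
 names here are made up for the example.
 ```rust
 /// Start of the lower object charblock.
 pub const OBJ_TILE_BASE: usize = 0x601_0000;
 /// Address of an object tile slot. Slots are always 32 bytes apart.
 pub const fn obj_tile_address(tile_index: usize) -> usize {
  OBJ_TILE_BASE + 32 * tile_index
 }
 /// How many 32-byte slots one tile occupies at the given color depth.
 pub const fn obj_tile_slots(is_8bpp: bool) -> usize {
  if is_8bpp { 2 } else { 1 }
 }
 ```
 Index 512 lands exactly on the start of the upper block, so in the bitmap modes
 that's the first index that's safe to use.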
 ```rust
 pub fn obj_tile_4bpp(tile_index: usize) -> Tile4bpp {
  assert!(tile_index < 512);
  let address = VRAM + size_of::<Charblock4bpp>() * 4 + 32 * tile_index;
  unsafe { VolatilePtr(address as *mut Tile4bpp).read() }
 }
 pub fn set_obj_tile_4bpp(tile_index: usize, tile: Tile4bpp) {
  assert!(tile_index < 512);
  let address = VRAM + size_of::<Charblock4bpp>() * 4 + 32 * tile_index;
  unsafe { VolatilePtr(address as *mut Tile4bpp).write(tile) }
 }
 pub fn obj_tile_8bpp(tile_index: usize) -> Tile8bpp {
  assert!(tile_index < 512);
  let address = VRAM + size_of::<Charblock8bpp>() * 4 + 32 * tile_index;
  unsafe { VolatilePtr(address as *mut Tile8bpp).read() }
 }
 pub fn set_obj_tile_8bpp(tile_index: usize, tile: Tile8bpp) {
  assert!(tile_index < 512);
  let address = VRAM + size_of::<Charblock8bpp>() * 4 + 32 * tile_index;
  unsafe { VolatilePtr(address as *mut Tile8bpp).write(tile) }
 }
 ```
 With backgrounds you picked every single tile individually with a bunch of
 screen entry values. Objects don't do that at all. Instead you pick a base tile,
 size, and shape, then it figures out the rest from there. However, you may
 recall back with the display control register something about an "object memory
 1d" bit. This is where that comes into play.
 * If object memory is set to be 2d (the default) then each charblock is treated
  as 32 tiles by 32 tiles square. Each object has a base tile and dimensions,
  and that just extracts directly from the charblock picture as if you were
  selecting an area. This mode probably makes for the easiest image editing.
 * If object memory is set to be 1d then the tiles are loaded sequentially from
  the starting point, enough to fill in the object's dimensions. This is
  probably the easiest mode to program with, since programming languages are
  pretty good at 1d things.
 I'm not sure I explained that well, here's a picture:
 ![2d1d-diagram](obj_memory_2d1d.jpg)
 In 2d mode, a new row of tiles starts every 32 tile indexes.
 Of course, the mode that you actually end up using is not particularly
 important, since it should be the job of your image conversion routine to get
 everything all lined up and into place anyway.
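 If it helps, here's the same idea as arithmetic. This sketch only covers 4bpp
 objects, where each tile takes one slot, and `obj_tile_for` is a hypothetical
 name.
 ```rust
 /// Tile index used for the tile at (`tx`, `ty`) inside an object, counting
 /// in tiles from the object's top left. 4bpp only.
 pub const fn obj_tile_for(
  base_tile: usize,
  tx: usize,
  ty: usize,
  width_in_tiles: usize,
  one_d: bool,
 ) -> usize {
  if one_d {
    // 1d: rows of the object follow each other directly.
    base_tile + ty * width_in_tiles + tx
  } else {
    // 2d: each new row of tiles is 32 indexes further along.
    base_tile + ty * 32 + tx
  }
 }
 ```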
 ## Set the Object Attributes
 The final step is to assign the correct attributes to an object. Each object has
 three `u16` values that make up its overall attributes.
 Before we go into the details, I want to bring up that the hardware will attempt
 to process every single object every single frame if the object layer is
 enabled, and also that all of the GBA's object memory is cleared to 0 at
 startup. Why do these two things matter right now? As you'll see in a second an
 "all zero" set of object attributes causes an 8x8 object to appear at 0,0 using
 object tile index 0. This is usually _not_ what you want your unused objects to
 do. When your game first starts you should take a moment to mark any objects you
 won't be using as objects to not render.
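 Looking ahead to the `attr0` layout just below, "don't render" means putting
 the value 2 into the two object rendering bits (bits 8 and 9). This sketch does
 that on a plain array standing in for a copy of the attributes, so it can run
 anywhere. The array shape and names are assumptions for the example.
 ```rust
 /// `attr0` rendering field set to 2 (Disabled).
 pub const ATTR0_DISABLED: u16 = 0b10 << 8;
 /// Marks every object in a copy of the attribute table as not rendered.
 /// Each entry is `[attr0, attr1, attr2]`.
 pub fn hide_all(objects: &mut [[u16; 3]; 128]) {
  for obj in objects.iter_mut() {
    obj[0] = (obj[0] & !(0b11 << 8)) | ATTR0_DISABLED;
  }
 }
 ```
 After this you'd copy the values into OAM, then re-enable just the objects you
 actually use.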
 ### ObjectAttributes.attr0
 * 8 bits for row coordinate (marks the top of the sprite)
 * 2 bits for object rendering: 0 = Normal, 1 = Affine, 2 = Disabled, 3 = Affine with double rendering area
 * 2 bits for object mode: 0 = Normal, 1 = Alpha Blending, 2 = Object Window, 3 = Forbidden
 * 1 bit for mosaic enabled
 * 1 bit 8bpp color enabled
 * 2 bits for shape: 0 = Square, 1 = Horizontal, 2 = Vertical, 3 = Forbidden
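 If an object's rendered area is 128 pixels tall (a 64 pixel tall object using the
 double rendering area) and it's placed at Y > 128, you'll get a strange looking
 result where the Y value is treated as negative (Y - 256) and the object
 displays partly off the top of the screen.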
 ### ObjectAttributes.attr1
 * 9 bits for column coordinate (marks the left of the sprite)
 * Either:
  * 3 empty bits, 1 bit for horizontal flip, 1 bit for vertical flip (non-affine)
  * 5 bits for affine index (affine)
 * 2 bits for size.
 | Size | Square | Horizontal | Vertical|
 |:----:|:------:|:----------:|:-------:|
 | 0    | 8x8    | 16x8       | 8x16    |
 | 1    | 16x16  | 32x8       | 8x32    |
 | 2    | 32x32  | 32x16      | 16x32   |
 | 3    | 64x64  | 64x32      | 32x64   |
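 The table translates directly into a lookup. This is just a sketch with a
 made-up name, taking the raw shape bits from `attr0` and the raw size bits from
 `attr1`.
 ```rust
 /// Width and height in pixels for a shape/size pair, or `None` for the
 /// forbidden shape value 3.
 pub const fn object_dimensions(shape: u16, size: u16) -> Option<(u16, u16)> {
  const SQUARE: [(u16, u16); 4] = [(8, 8), (16, 16), (32, 32), (64, 64)];
  const HORIZONTAL: [(u16, u16); 4] = [(16, 8), (32, 8), (32, 16), (64, 32)];
  const VERTICAL: [(u16, u16); 4] = [(8, 16), (8, 32), (16, 32), (32, 64)];
  let i = (size & 0b11) as usize;
  match shape {
    0 => Some(SQUARE[i]),
    1 => Some(HORIZONTAL[i]),
    2 => Some(VERTICAL[i]),
    _ => None,
  }
 }
 ```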
 ### ObjectAttributes.attr2
 * 10 bits for the base tile index
 * 2 bits for priority
 * 4 bits for the palbank index (4bpp mode only, ignored in 8bpp)
 ### ObjectAttributes summary
 So I said in the GBA memory mapping section that C people would tell you that
 the object attributes should look like this:
 ```rust
 #[repr(C)]
 pub struct ObjectAttributes {
  attr0: u16,
  attr1: u16,
  attr2: u16,
  filler: i16,
 }
 ```
 Except that:
 1) It's wasteful when we store object attributes on their own outside of OAM
   (which we definitely might want to do).
 2) In Rust we can't access just one field through a volatile pointer (our
   pointers aren't actually volatile to begin with, just the ops we do with them
   are). We have to read or write the whole pointed-to value at a time.
   Similarly, we can't do things like `|=` and `&=` with volatile in Rust. So in
   Rust we can't have a volatile pointer to an ObjectAttributes and then write
   to just the three "real" values and not touch the filler field. Having the
   filler value in there just means we have to dance around it more, not less.
 3) We want to newtype this whole thing to prevent accidental invalid states from
   being written into memory.
 So we will not be using that representation. At the same time we want to have no
 overhead, so we will stick to three `u16` values. We could newtype each
 individual field to be its own type (`ObjectAttributesAttr0` or something silly
 like that), since there aren't actual dependencies between two different fields
 such that a change in one can throw another into a forbidden state. The worst
 that can happen is if we disable or enable affine mode (`attr0`) it can change
 the meaning of `attr1`. The changed meaning isn't actually an invalid state
 though, so we _could_ make each field its own type if we wanted.
 However, when you think about it, I can't imagine a common situation where we do
 something like make an `attr0` value that we then want to save on its own and
 apply to several different `ObjectAttributes` that we make during a game. That
 just doesn't sound likely to me. So, we'll go the route where `ObjectAttributes`
 is just a big black box to the outside world and we don't need to think about
 the three fields internally as being separate.
 First we make it so that we can get and set object attributes from memory:
 ```rust
 pub const OAM: usize = 0x700_0000;
 pub fn object_attributes(slot: usize) -> ObjectAttributes {
  assert!(slot < 128);
  let ptr = VolatilePtr((OAM + slot * (size_of::<u16>() * 4)) as *mut u16);
  unsafe {
    ObjectAttributes {
      attr0: ptr.read(),
      attr1: ptr.offset(1).read(),
      attr2: ptr.offset(2).read(),
    }
  }
 }
 pub fn set_object_attributes(slot: usize, obj: ObjectAttributes) {
  assert!(slot < 128);
  let ptr = VolatilePtr((OAM + slot * (size_of::<u16>() * 4)) as *mut u16);
  unsafe {
    ptr.write(obj.attr0);
    ptr.offset(1).write(obj.attr1);
    ptr.offset(2).write(obj.attr2);
  }
 }
 #[derive(Debug, Clone, Copy, Default)]
 pub struct ObjectAttributes {
  attr0: u16,
  attr1: u16,
  attr2: u16,
 }
 ```
 Then we add a billion methods to the `ObjectAttributes` type so that we can
 actually set all the different values that we want to set.
 This code block is the last thing on this page so if you don't wanna scroll past
 the whole thing you can just go to the next page.
 ```rust
 #[derive(Debug, Clone, Copy)]
 pub enum ObjectRenderMode {
  Normal,
  Affine,
  Disabled,
  DoubleAreaAffine,
 }
 #[derive(Debug, Clone, Copy)]
 pub enum ObjectMode {
  Normal,
  AlphaBlending,
  ObjectWindow,
 }
 #[derive(Debug, Clone, Copy)]
 pub enum ObjectShape {
  Square,
  Horizontal,
  Vertical,
 }
 #[derive(Debug, Clone, Copy)]
 pub enum ObjectOrientation {
  Normal,
  HFlip,
  VFlip,
  BothFlip,
  Affine(u8),
 }
 impl ObjectAttributes {
  pub fn row(&self) -> u16 {
    self.attr0 & 0b1111_1111
  }
  pub fn column(&self) -> u16 {
    self.attr1 & 0b1_1111_1111
  }
  pub fn rendering(&self) -> ObjectRenderMode {
    match (self.attr0 >> 8) & 0b11 {
      0 => ObjectRenderMode::Normal,
      1 => ObjectRenderMode::Affine,
      2 => ObjectRenderMode::Disabled,
      3 => ObjectRenderMode::DoubleAreaAffine,
      _ => unimplemented!(),
    }
  }
  pub fn mode(&self) -> ObjectMode {
    match (self.attr0 >> 0xA) & 0b11 {
      0 => ObjectMode::Normal,
      1 => ObjectMode::AlphaBlending,
      2 => ObjectMode::ObjectWindow,
      _ => unimplemented!(),
    }
  }
  pub fn mosaic(&self) -> bool {
    ((self.attr0 << 3) as i16) < 0
  }
  pub fn two_fifty_six_colors(&self) -> bool {
    ((self.attr0 << 2) as i16) < 0
  }
  pub fn shape(&self) -> ObjectShape {
    match (self.attr0 >> 0xE) & 0b11 {
      0 => ObjectShape::Square,
      1 => ObjectShape::Horizontal,
      2 => ObjectShape::Vertical,
      _ => unimplemented!(),
    }
  }
  pub fn orientation(&self) -> ObjectOrientation {
    if (self.attr0 >> 8) & 1 > 0 {
      ObjectOrientation::Affine((self.attr1 >> 9) as u8 & 0b1_1111)
    } else {
      match (self.attr1 >> 0xC) & 0b11 {
        0 => ObjectOrientation::Normal,
        1 => ObjectOrientation::HFlip,
        2 => ObjectOrientation::VFlip,
        3 => ObjectOrientation::BothFlip,
        _ => unimplemented!(),
      }
    }
  }
  pub fn size(&self) -> u16 {
    self.attr1 >> 0xE
  }
  pub fn tile_index(&self) -> u16 {
    self.attr2 & 0b11_1111_1111
  }
  pub fn priority(&self) -> u16 {
    // Mask to 2 bits so the palbank bits above don't leak into the result.
    (self.attr2 >> 0xA) & 0b11
  }
  pub fn palbank(&self) -> u16 {
    self.attr2 >> 0xC
  }
  //
  pub fn set_row(&mut self, row: u16) {
    self.attr0 &= !0b1111_1111;
    self.attr0 |= row & 0b1111_1111;
  }
  pub fn set_column(&mut self, col: u16) {
    self.attr1 &= !0b1_1111_1111;
    self.attr1 |= col & 0b1_1111_1111;
  }
  pub fn set_rendering(&mut self, rendering: ObjectRenderMode) {
    const RENDERING_MASK: u16 = 0b11 << 8;
    self.attr0 &= !RENDERING_MASK;
    self.attr0 |= (rendering as u16) << 8;
  }
  pub fn set_mode(&mut self, mode: ObjectMode) {
    const MODE_MASK: u16 = 0b11 << 0xA;
    self.attr0 &= !MODE_MASK;
    self.attr0 |= (mode as u16) << 0xA;
  }
  pub fn set_mosaic(&mut self, bit: bool) {
    const MOSAIC_BIT: u16 = 1 << 0xC;
    if bit {
      self.attr0 |= MOSAIC_BIT
    } else {
      self.attr0 &= !MOSAIC_BIT
    }
  }
  pub fn set_two_fifty_six_colors(&mut self, bit: bool) {
    const COLOR_MODE_BIT: u16 = 1 << 0xD;
    if bit {
      self.attr0 |= COLOR_MODE_BIT
    } else {
      self.attr0 &= !COLOR_MODE_BIT
    }
  }
  pub fn set_shape(&mut self, shape: ObjectShape) {
    self.attr0 &= 0b0011_1111_1111_1111;
    self.attr0 |= (shape as u16) << 0xE;
  }
  pub fn set_orientation(&mut self, orientation: ObjectOrientation) {
    const AFFINE_INDEX_MASK: u16 = 0b1_1111 << 9;
    self.attr1 &= !AFFINE_INDEX_MASK;
    let bits = match orientation {
      // Keep only 5 bits so a large index can't spill into the size bits.
      ObjectOrientation::Affine(index) => ((index as u16) & 0b1_1111) << 9,
      ObjectOrientation::Normal => 0,
      ObjectOrientation::HFlip => 1 << 0xC,
      ObjectOrientation::VFlip => 1 << 0xD,
      ObjectOrientation::BothFlip => 0b11 << 0xC,
    };
    self.attr1 |= bits;
  }
  pub fn set_size(&mut self, size: u16) {
    self.attr1 &= 0b0011_1111_1111_1111;
    self.attr1 |= size << 14;
  }
  pub fn set_tile_index(&mut self, index: u16) {
    self.attr2 &= !0b11_1111_1111;
    self.attr2 |= 0b11_1111_1111 & index;
  }
  pub fn set_priority(&mut self, priority: u16) {
    self.attr2 &= !0b0000_1100_0000_0000;
    self.attr2 |= (priority & 0b11) << 0xA;
  }
  pub fn set_palbank(&mut self, palbank: u16) {
    self.attr2 &= !0b1111_0000_0000_0000;
    self.attr2 |= (palbank & 0b1111) << 0xC;
  }
 }
 ```
--- a/book/src-bak/screenshot_checkers.png
+++ b/book/src-bak/screenshot_checkers.png
--- a/book/src-bak/the_display_control_register.md
+++ b/book/src-bak/the_display_control_register.md
@ -1,109 +0,0 @@
 # The Display Control Register
 The display control register is our first actual IO Register. GBATEK gives it the
 shorthand [DISPCNT](http://problemkaputt.de/gbatek.htm#lcdiodisplaycontrol), so
 you might see it under that name if you read other guides.
 Among IO Registers, it's one of the simpler ones, but it's got enough complexity
 that we can get a hint of what's to come.
 Also it's the one that you basically always need to set at least once in every
 GBA game, so it's a good starting one to go over for that reason too.
 The display control register holds a `u16` value, and is located at `0x0400_0000`.
 Many of the bits here won't mean much to you right now. **That is fine.** You do
 NOT need to memorize them all or what they all do right away. We'll just skim
 over all the parts of this register to start, and then we'll go into more detail
 in later chapters when we need to come back and use more of the bits.
 ## Video Modes
 The lowest three bits (0-2) let you select from among the GBA's six video modes.
 You'll notice that 3 bits allows for eight modes, but the values 6 and 7 are
 prohibited.
 Modes 0, 1, and 2 are "tiled" modes. These are actually the modes that you
 should eventually learn to use as much as possible. They let the GBA's limited
 video hardware do as much of the work as possible, leaving more of your CPU time
 for gameplay computations. However, they're also complex enough to deserve their
 own demos and chapters later on, so that's all we'll say about them for now.
 Modes 3, 4, and 5 are "bitmap" modes. These let you write individual pixels to
 locations on the screen.
 * **Mode 3** is full resolution (240w x 160h) RGB15 color. You might not be used
  to RGB15, since modern computers have 24 or 32 bit colors. In RGB15, there's 5
  bits for each color channel stored within a `u16` value, and the highest bit is
  simply ignored.
 * **Mode 4** is full resolution paletted color. Instead of being a `u16` color, each
  pixel value is a `u8` palette index, and then the display looks up the actual
  color in the palette memory (which we'll talk about later).
  Since each pixel is half sized, we can fit twice as many. This lets us have
  two "pages". At any given moment only one page is active, and you can draw to
  the other page without the user noticing. You set which page to show with
  another bit we'll get to in a moment.
 * **Mode 5** is full color, but also with pages. This means that we must have a
  reduced resolution to compensate (video memory is only so big!). The screen is
  effectively only 160w x 128h in this mode.
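 For Mode 3, here's a sketch of building an RGB15 value and finding where a
 pixel lives. The text above doesn't say which channel goes where, so note that
 this uses the GBA's layout of red in the lowest 5 bits, then green, then blue.
 The function names are made up, and the pixel address assumes the Mode 3 bitmap
 starts at the base of VRAM (`0x600_0000`) with rows stored one after another.
 ```rust
 /// Packs three 5-bit channels into an RGB15 color. Bit 15 stays clear.
 pub const fn rgb15(red: u16, green: u16, blue: u16) -> u16 {
  (red & 0b1_1111) | ((green & 0b1_1111) << 5) | ((blue & 0b1_1111) << 10)
 }
 /// Address of the `u16` for a Mode 3 pixel (240 pixels per row).
 pub const fn mode3_pixel_address(col: usize, row: usize) -> usize {
  0x600_0000 + (row * 240 + col) * 2
 }
 ```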
 ## CGB Mode
 Bit 3 is effectively read only. Technically it can be flipped using a BIOS call,
 but when you write to the display control register normally it won't write to
 this bit, so we'll call it effectively read only.
 This bit is on if the CPU is in CGB mode.
 ## Page Flipping
 Bit 4 lets you pick which page to use. This is only relevant in video modes 4 or
 5, and is just ignored otherwise. It's very easy to remember: when the bit is 0
 the 0th page is used, and when the bit is 1 the 1st page is used.
 The second page always starts at `0x0600_A000`.
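 A typical use is double buffering: draw to the page that isn't showing, then
 flip the bit. Here's a minimal sketch of the bookkeeping on a plain `u16`
 standing in for the register value. The names are hypothetical, and it assumes
 the first page starts at the base of VRAM (`0x600_0000`).
 ```rust
 /// The page select bit of the display control value.
 pub const PAGE_BIT: u16 = 1 << 4;
 /// Returns the display control value with the shown page swapped.
 pub const fn flip_page(dispcnt: u16) -> u16 {
  dispcnt ^ PAGE_BIT
 }
 /// Address of the page that is currently hidden, which is the one that's
 /// safe to draw into.
 pub const fn hidden_page_address(dispcnt: u16) -> usize {
  if (dispcnt & PAGE_BIT) == 0 { 0x600_A000 } else { 0x600_0000 }
 }
 ```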
 ## OAM, VRAM, and Blanking
 Bit 5 lets you access OAM during HBlank if enabled. This is cool, but it reduces
 the maximum sprites per scanline, so it's not default.
 Bit 6 lets you adjust if the GBA should treat Object Character VRAM as being 2d
 (off) or 1d (on). This particular control can be kinda tricky to wrap your head
 around, so we'll be sure to have some extra diagrams in the chapter that deals
 with it.
 Bit 7 forces the screen to stay in VBlank as long as it's set. This allows the
 fastest use of the VRAM, Palette, and Object Attribute Memory. Obviously if you
 leave this on for too long the player will notice a blank screen, but it might
 be okay to use for a moment or two every once in a while.
 ## Screen Layers
 Bits 8 through 11 control if Background layers 0 through 3 should be active.
 Bit 12 affects the Object layer.
 Note that not all background layers are available in all video modes:
 * Mode 0: all
 * Mode 1: 0/1/2
 * Mode 2: 2/3
 * Mode 3/4/5: 2
 Bits 13 and 14 enable the display of Windows 0 and 1, and Bit 15 enables the
 object display window. We'll get into how windows work later on; they let you do
 some nifty graphical effects.
 ## In Conclusion...
 So what did we do to the display control register in `hello1`?
 ```rust
    (0x04000000 as *mut u16).write_volatile(0x0403);
 ```
 First let's [convert that to
 binary](https://www.wolframalpha.com/input/?i=0x0403), and we get
 `0b100_0000_0011`. So, that's setting Mode 3 with background 2 enabled and
 nothing else special.
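 If you'd rather let code do the decoding, here's a sketch that pulls the same
 facts out of a display control value. The function names are invented for the
 example.
 ```rust
 /// The video mode held in bits 0-2.
 pub const fn video_mode(dispcnt: u16) -> u16 {
  dispcnt & 0b111
 }
 /// Whether background layer `bg` (0 through 3) is enabled.
 pub const fn bg_enabled(dispcnt: u16, bg: u16) -> bool {
  (dispcnt >> (8 + (bg & 0b11))) & 1 != 0
 }
 /// Whether the object layer (bit 12) is enabled.
 pub const fn obj_enabled(dispcnt: u16) -> bool {
  (dispcnt >> 12) & 1 != 0
 }
 ```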
--- a/book/src-bak/the_key_input_register.md
+++ b/book/src-bak/the_key_input_register.md
@ -1,213 +0,0 @@
 # The Key Input Register
 The Key Input Register is our next IO register. Its shorthand name is
 [KEYINPUT](http://problemkaputt.de/gbatek.htm#gbakeypadinput) and it's a `u16`
 at `0x400_0130`. The entire register is read only, since you can't tell the
 GBA what buttons are pressed.
 Each button is exactly one bit:
 | Bit | Button |
 |:---:|:------:|
 |  0  | A |
 |  1  | B |
 |  2  | Select |
 |  3  | Start |
 |  4  | Right |
 |  5  | Left |
 |  6  | Up |
 |  7  | Down |
 |  8  | R |
 |  9  | L |
 The bits above bit 9 are not used at all.
 Similar to other old hardware devices, the convention here is that a button's
 bit is **clear when pressed, active when released**. In other words, when the
 user is not touching the device at all the KEYINPUT value will read
 `0b0000_0011_1111_1111`. There's similar values for when the user is pressing as
 many buttons as possible, but since the left/right and up/down keys are on an
 arrow pad the value can never be 0 since you can't ever press every single key
 at once.
 When dealing with key input, the register always shows the exact key values at
 any moment you read it. Obviously that's what it should do, but what it means to
 you as a programmer is that you should usually gather input once at the top of a
 game frame and then use that single input poll as the input values across the
 whole game frame.
 Of course, you might want to know if a user's key state changed from frame to
 frame. That's fairly easy too: We just store the last frame keys as well as the
 current frame keys (it's only a `u16`) and then we can xor the two values.
 Anything that shows up in the xor result is a key that changed. If it's changed
 and it's now down, that means it was pushed this frame. If it's changed and it's
 now up, that means it was released this frame.
 The other major thing you might frequently want is to know "which way" the arrow
 pad is pointing: Up/Down/None and Left/Right/None. Sounds like an enum to me.
 Except that we'll often have situations where the direction just needs to
 be multiplied by a speed and applied as a delta to a position. We want to
 support that as well as we can too.
 ## Key Input Code
 Let's get down to some code. First we want to make a way to read the address as
 a `u16` and then wrap that in our newtype which will implement methods for
 reading and writing the key bits.
 ```rust
 pub const KEYINPUT: VolatilePtr<u16> = VolatilePtr(0x400_0130 as *mut u16);
 /// A newtype over the key input state of the GBA.
 #[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
 #[repr(transparent)]
 pub struct KeyInputSetting(u16);
 pub fn key_input() -> KeyInputSetting {
  unsafe { KeyInputSetting(KEYINPUT.read()) }
 }
 ```
 Now we want a way to check if a key is _being pressed_, since that's normally
 how we think of things as a game designer and even as a player. That is, usually
 you'd say "if you press A, then X happens" instead of "if you don't press A,
 then X does not happen".
 Normally we'd pick a constant for the bit we want, `&` it with our value, and
 then check for `val != 0`. Since the bit we're looking for is `0` in the "true"
 state we still pick the same constant and we still do the `&`, but we test with
 `== 0`. Practically the same, right? Well, since I'm asking a rhetorical
 question like that you can probably already guess that it's not the same. I was
 shocked to learn this too.
 All we have to do is ask our good friend
 [Godbolt](https://rust.godbolt.org/z/d-8oCe) what's gonna happen when the code
 compiles. The link there has the page set for the `stable` 1.30 compiler just so
 that the link results stay consistent if you read this book in a year or
 something. Also, we've set the target to `thumbv6m-none-eabi`, which is a
 slightly later version of ARM than the actual GBA, but it's close enough for
 just checking. Of course, in a full program small functions like these will
 probably get inlined into the calling code and disappear entirely as they're
 folded and refolded by the compiler, but we can just check.
 It turns out that the `!=0` test is 4 instructions and the `==0` test is 6
 instructions. Since we want to get savings where we can, and we'll probably
 check the keys of an input often enough, we'll just always use a `!=0` test and
 then adjust how we initially read the register to compensate. By using xor with
 a mask for only the 10 used bits we can flip the "low when pressed" values so
 that the entire result has active bits in all positions where a key is pressed.
 ```rust
 pub fn key_input() -> KeyInputSetting {
  unsafe { KeyInputSetting(KEYINPUT.read() ^ 0b0000_0011_1111_1111) }
 }
 ```
 Now we add a method for seeing if a key is pressed. In the full library there's
 a more advanced version of this that's built up via macro, but for this example
 we'll just name a bunch of `const` values and then have a method that takes a
 value and says if that bit is on.
 ```rust
 pub const KEY_A: u16 = 1 << 0;
 pub const KEY_B: u16 = 1 << 1;
 pub const KEY_SELECT: u16 = 1 << 2;
 pub const KEY_START: u16 = 1 << 3;
 pub const KEY_RIGHT: u16 = 1 << 4;
 pub const KEY_LEFT: u16 = 1 << 5;
 pub const KEY_UP: u16 = 1 << 6;
 pub const KEY_DOWN: u16 = 1 << 7;
 pub const KEY_R: u16 = 1 << 8;
 pub const KEY_L: u16 = 1 << 9;
 impl KeyInputSetting {
  pub fn contains(&self, key: u16) -> bool {
    (self.0 & key) != 0
  }
 }
 ```
 Because each key is a unique bit you can even check several keys at once by
 combining their values with `|`. Note that `contains` is true when _any_ of
 the given keys is pressed, not only when all of them are.
 ```rust
 let input_contains_a_or_l = input.contains(KEY_A | KEY_L);
 ```
 And we wanted to save the state of an old frame and compare it to the current
 frame to see what was different:
 ```rust
  pub fn difference(&self, other: KeyInputSetting) -> KeyInputSetting {
    KeyInputSetting(self.0 ^ other.0)
  }
 ```
 Anything that's "in" the difference output is a key that _changed_, and then if
 the key reads as pressed this frame that means it was just pressed. The exact
 mechanics of all the ways you might care to do something based on new key
 presses is obviously quite varied, but it might be something like this:
 ```rust
 let this_frame_diff = this_frame_input.difference(last_frame_input);
 if this_frame_diff.contains(KEY_B) && this_frame_input.contains(KEY_B) {
  // the user just pressed B, react in some way
 }
 ```
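 The same trick works for releases, and you can get every changed key at once
 instead of checking them one by one. This sketch works on plain `u16` values
 that have already been flipped so that a set bit means "pressed". The function
 names are made up.
 ```rust
 /// Bits for keys that are down this frame but were up last frame.
 pub const fn just_pressed(this_frame: u16, last_frame: u16) -> u16 {
  (this_frame ^ last_frame) & this_frame
 }
 /// Bits for keys that were down last frame but are up this frame.
 pub const fn just_released(this_frame: u16, last_frame: u16) -> u16 {
  (this_frame ^ last_frame) & last_frame
 }
 ```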
 And for the arrow pad, we'll make an enum that easily casts into `i32`. Whenever
 we're working with numbers we should try to use `i32` / `isize` as often as
 possible, just because it's easier on the GBA's CPU if we stick to its native
 number size. Having it be an enum lets us use `match` and be sure that we've
 covered all our cases.
 ```rust
 /// A "tribool" value helps us interpret the arrow pad.
 #[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
 #[repr(i32)]
 pub enum TriBool {
  Minus = -1,
  // `derive(Default)` on an enum needs one variant marked as the default.
  #[default]
  Neutral = 0,
  Plus = 1,
 }
 ```
 Now, how do we determine _which way_ is plus or minus? Well... I don't know.
 Really. I'm not sure what the best one is because the GBA really wants the
 origin at 0,0 with higher rows going down and higher cols going right. On the
 other hand, all the normal math you and I learned in school is oriented with
 increasing Y being upward on the page. So, at least for this demo, we're going
 to go with what the GBA wants us to do and give it a try. If we don't end up
 confusing ourselves then we can stick with that. Maybe we can cover it over
 somehow later on.
 ```rust
  pub fn column_direction(&self) -> TriBool {
    if self.contains(KEY_RIGHT) {
      TriBool::Plus
    } else if self.contains(KEY_LEFT) {
      TriBool::Minus
    } else {
      TriBool::Neutral
    }
  }
  pub fn row_direction(&self) -> TriBool {
    if self.contains(KEY_DOWN) {
      TriBool::Plus
    } else if self.contains(KEY_UP) {
      TriBool::Minus
    } else {
      TriBool::Neutral
    }
  }
 ```
 So then in our game, every frame we can check for `column_direction` and
 `row_direction` and then apply those to the player's current position to make
 them move around the screen.
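 Here's a self-contained sketch of that last step. To keep it runnable on its
 own it works on a plain `u16` of pressed-key bits (already flipped so that set
 means pressed) and plain `i32` directions, rather than the types above. `axis`
 and `step` are hypothetical names.
 ```rust
 pub const KEY_RIGHT: u16 = 1 << 4;
 pub const KEY_LEFT: u16 = 1 << 5;
 pub const KEY_UP: u16 = 1 << 6;
 pub const KEY_DOWN: u16 = 1 << 7;
 /// +1, -1, or 0 for one axis. The plus key wins if both are held.
 pub const fn axis(pressed: u16, plus_key: u16, minus_key: u16) -> i32 {
  if pressed & plus_key != 0 {
    1
  } else if pressed & minus_key != 0 {
    -1
  } else {
    0
  }
 }
 /// Moves a (column, row) position by `speed` pixels per held direction.
 pub const fn step(position: (i32, i32), pressed: u16, speed: i32) -> (i32, i32) {
  (
    position.0 + axis(pressed, KEY_RIGHT, KEY_LEFT) * speed,
    position.1 + axis(pressed, KEY_DOWN, KEY_UP) * speed,
  )
 }
 ```
 Holding right and up from (10, 10) at speed 2 gives (12, 8), since rows count
 downward.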
 With that settled I think we're all done with user input for now. There's some
 other things to eventually know about like key interrupts that you can set and
 stuff, but we'll cover that later on because it's not necessary right now.
--- a/book/src-bak/the_vcount_register.md
+++ b/book/src-bak/the_vcount_register.md
@ -1,71 +0,0 @@
 # The VCount Register
 There's an IO register called
 [VCOUNT](http://problemkaputt.de/gbatek.htm#lcdiointerruptsandstatus) that shows
 you, what else, the Vertical (row) COUNT(er). It's a `u16` at address
 `0x0400_0006`, and it's how we'll be doing our very poor quality vertical sync
 code to start.
 * **What makes it poor?** Well, we're just going to read from the vcount value as
  often as possible every time we need to wait for a specific value to come up,
  and then proceed once it hits the point we're looking for.
 * **Why is this bad?** Because we're making the CPU do a lot of useless work,
  which uses a lot more power than necessary. Even if you're not on an actual
  GBA you might be running inside an emulator on a phone or other handheld. You
  wanna try to save battery if all you're doing with that power use is waiting
  instead of making a game actually do something.
 * **Can we do better?** We can, but not yet. The better way to do things is to
  use a BIOS call to put the CPU into low power mode until a VBlank interrupt
  happens. However, we don't know about interrupts yet, and we don't know about
  BIOS calls yet, so we'll do the basic thing for now and then upgrade later.
 So the way that display hardware actually displays each frame is that it moves a
 tiny pointer left to right across each pixel row one pixel at a time. When it's
 within the actual screen width (240px) it's drawing out those pixels. Then it
 goes _past_ the edge of the screen for 68px during a period known as the
 "horizontal blank" (HBlank). Then it starts on the next row and does that loop
 over again. This happens for the whole screen height (160px) and then once again
 it goes past the last row for another 68 scanlines into a "vertical blank" (VBlank)
 period.
 * One pixel is 4 CPU cycles
 * HDraw is 240 pixels, HBlank is 68 pixels (1,232 cycles per full scanline)
 * VDraw is 160 scanlines, VBlank is 68 scanlines (280,896 cycles per full refresh)
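 The totals are easy to check as constants. This sketch takes VDraw as 160
 scanlines (the screen height), which is what makes the full refresh come out to
 280,896 cycles. The frame rate uses the GBA's CPU clock of 16,777,216 Hz, which
 isn't stated above.
 ```rust
 pub const CYCLES_PER_PIXEL: u32 = 4;
 /// HDraw plus HBlank, in pixels.
 pub const PIXELS_PER_SCANLINE: u32 = 240 + 68;
 /// VDraw plus VBlank, in scanlines.
 pub const SCANLINES_PER_FRAME: u32 = 160 + 68;
 pub const CYCLES_PER_SCANLINE: u32 = CYCLES_PER_PIXEL * PIXELS_PER_SCANLINE;
 pub const CYCLES_PER_FRAME: u32 = CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME;
 /// CPU clock in Hz (2 to the 24th power).
 pub const CPU_HZ: u32 = 16_777_216;
 /// Full refreshes per second, a little under 60.
 pub fn frames_per_second() -> f64 {
  CPU_HZ as f64 / CYCLES_PER_FRAME as f64
 }
 ```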
 Now you may remember some stuff from the display control register section where
 it was mentioned that some parts of memory are best accessed during VBlank, and
 also during HBlank with a setting applied. These blanking periods are what was
 being talked about. At other times if you attempt to access video or object
 memory you (the CPU) might try touching the same memory that the display device
 is trying to use, in which case you get bumped back a cycle so that the display
 can finish what it's doing. Also, if you really insist on doing video memory
 changes while the screen is being drawn then you might get some visual glitches.
 If you can, just prepare all your changes ahead of time and then assign them all
 quickly during the blank period.
 So first we want a way to check the vcount value at all:
 ```rust
 pub const VCOUNT: VolatilePtr<u16> = VolatilePtr(0x0400_0006 as *mut u16);
 pub fn vcount() -> u16 {
  unsafe { VCOUNT.read() }
 }
 ```
 Then we want two little helper functions to wait until VBlank and VDraw.
 ```rust
 pub const SCREEN_HEIGHT: isize = 160;
 pub fn wait_until_vblank() {
  while vcount() < SCREEN_HEIGHT as u16 {}
 }
 pub fn wait_until_vdraw() {
  while vcount() >= SCREEN_HEIGHT as u16 {}
 }
 ```
 And... that's it. No special types to be made this time around, it's just a
 number we read out of memory.
--- a/book/src-bak/tile_data.md
+++ b/book/src-bak/tile_data.md
@ -1,130 +0,0 @@
 # Tile Data
 When using the GBA's hardware graphics, if you want to let the hardware do most
 of the work you have to use Modes 0, 1 or 2. However, to do that we first have
 to learn about how tile data works inside of the GBA.
 ## Tiles
 Fundamentally, a tile is an 8x8 image. If you want anything bigger than 8x8 you
 need to arrange several tiles so that it looks like whatever you're trying to
 draw.
 As was already mentioned, the GBA supports two different color modes: 4 bits per
 pixel and 8 bits per pixel. This means that we have two types of tile that we
 need to model. The pixel bits always represent an index into the PALRAM.
 * With 4 bits per pixel, the PALRAM is imagined to be 16 **palbank** sections of
  16 palette entries each. The image data selects the index within the palbank,
  and an external configuration selects which palbank is used.
 * With 8 bits per pixel, the PALRAM is imagined to be a single 256 entry array
  and the index just directly picks which of the 256 colors is used.
 Knowing this, we can write the following definitions:
 ```rust
 #[derive(Debug, Clone, Copy, Default)]
 #[repr(transparent)]
 pub struct Tile4bpp {
  pub data: [u32; 8]
 }
 #[derive(Debug, Clone, Copy, Default)]
 #[repr(transparent)]
 pub struct Tile8bpp {
  pub data: [u32; 16]
 }
 ```
 I hope this makes sense so far. At 4bpp, we have 4 bits per pixel, times 8
 pixels per line, times 8 lines: 256 bits required. Similarly, at 8 bits per
 pixel we'll need 512 bits. Why are we defining them as arrays of `u32` values?
 Because when it comes time to do bulk copies the fastest way to do it will be to go
 one whole machine word at a time. If we make the data inside the type be an
 array of `u32` then it'll already be aligned for fast `u32` bulk copies.
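 To make the packing concrete, here's a minimal sketch of reading and writing a
 single pixel of a 4bpp tile. The `get` and `set` methods are hypothetical
 helpers, and the sketch assumes that each `u32` is one row of the tile with the
 leftmost pixel stored in the lowest 4 bits.
 ```rust
 #[derive(Debug, Clone, Copy, Default)]
 #[repr(transparent)]
 pub struct Tile4bpp {
   pub data: [u32; 8],
 }
 impl Tile4bpp {
   /// Reads the palbank index (0 to 15) of the pixel at column `x`, row `y`.
   /// Assumption: row `y` is `data[y]`, and column 0 is the lowest nibble.
   pub fn get(&self, x: usize, y: usize) -> u8 {
     assert!(x < 8 && y < 8);
     ((self.data[y] >> (4 * x)) & 0xF) as u8
   }
   /// Replaces one pixel, leaving the other 7 pixels of the row alone.
   pub fn set(&mut self, x: usize, y: usize, index: u8) {
     assert!(x < 8 && y < 8);
     let shift = 4 * x;
     let cleared = self.data[y] & !(0xF << shift);
     self.data[y] = cleared | (((index & 0xF) as u32) << shift);
   }
 }
 ```
 You'd rarely edit tiles one pixel at a time like this, but it shows why a whole
 row of a 4bpp tile fits in exactly one machine word.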
 Keeping track of the current color depth is naturally the _programmer's_
 problem. If you get it wrong you'll see a whole ton of garbage pixels all over
 the screen, and you'll probably be able to guess why. You know, unless you did
 one of the other things that can make a bunch of garbage pixels show up all over
 the screen. Graphics programming is fun like that.
 ## Charblocks
 Tiles don't just sit on their own, they get grouped into **charblocks**. Long
 ago in the distant past, video games were built with hardware that was also used
 to make text terminals. So tile image data was called "character data". In fact
 some guides will even call the regular mode for the background layers "text
 mode", despite the fact that you obviously don't have to show text at all.
 A charblock is 16kb long (`0x4000` bytes), which means that the number of tiles
 that fit into a charblock depends on your color depth. With 4bpp you get 512
 tiles, and with 8bpp there's 256 tiles. So they'd be something like this:
 ```rust
 #[derive(Clone, Copy)]
 #[repr(transparent)]
 pub struct Charblock4bpp {
  pub data: [Tile4bpp; 512],
 }
 #[derive(Clone, Copy)]
 #[repr(transparent)]
 pub struct Charblock8bpp {
  pub data: [Tile8bpp; 256],
 }
 ```
 You'll note that we can't even derive `Debug` or `Default` any more because the
 arrays are so big. Rust supports Clone and Copy for arrays of any size, but the
 rest is still size 32 or less. We won't generally be making up an entire
 Charblock on the fly though, so it's not a big deal. If we _absolutely_ had to,
 we could call `core::mem::zeroed()`, but we really don't want to be trying to
 build a whole charblock at runtime. We'll usually want to define our tile data
 as `const` charblock values (or even parts of charblock values) that we then
 load out of the game pak ROM at runtime.
 Anyway, with 16k per charblock and only 96k total in VRAM, it's easy math to see
 that there's 6 different charblocks in VRAM when in a tiled mode. The first four
 of these are for backgrounds, and the other two are for objects. There's rules
 for how a tile ID on a background or object selects a tile within a charblock,
 but since they're different between backgrounds and objects we'll cover that on
 their own pages.
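 The "easy math" can be written out as constants. This is a sketch with
 illustrative names; `charblock_address` is a hypothetical helper, not something
 from the crate.
 ```rust
 pub const VRAM_BASE: usize = 0x0600_0000;
 pub const VRAM_SIZE: usize = 96 * 1024;
 pub const CHARBLOCK_SIZE: usize = 0x4000;
 /// 96k of VRAM divided into 16k charblocks.
 pub const CHARBLOCK_COUNT: usize = VRAM_SIZE / CHARBLOCK_SIZE;
 /// A 4bpp tile is 32 bytes, an 8bpp tile is 64 bytes.
 pub const TILES_4BPP_PER_CHARBLOCK: usize = CHARBLOCK_SIZE / 32;
 pub const TILES_8BPP_PER_CHARBLOCK: usize = CHARBLOCK_SIZE / 64;
 /// Start address of a charblock, or `None` if the index is out of range.
 pub const fn charblock_address(index: usize) -> Option<usize> {
   if index < CHARBLOCK_COUNT {
     Some(VRAM_BASE + index * CHARBLOCK_SIZE)
   } else {
     None
   }
 }
 ```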
 ## Image Editing
 It's very important to note that if you use a normal image editor you'll get
 very bad results if you translate that directly into GBA memory.
 Imagine you have part of an image that's 16 by 16 pixels, aka 2 tiles by 2
 tiles. The data for that bitmap is the 1st row of the 1st tile, then the 1st row
 of the 2nd tile. However, when we translate that into the GBA, the first 8
 pixels will indeed be the first 8 tile pixels, but then the next 8 pixels in
 memory will be used as the _2nd row of the first tile_, not the 1st row of the
 2nd tile.
 So, how do we fix this?
 Well, the simple but annoying way is to edit your tile image as being an 8 pixel
 wide image and then have the image get super tall as you add more and more
 tiles. It can work, but it's really impractical if you have any multi-tile
 things that you're trying to do.
 Instead, there are some image conversion tools that devkitpro provides in their
 gba-dev section. They let you take normal images and then repackage them and
 export it in various formats that you can then compile into your project.
 Ketsuban uses the [grit](http://www.coranac.com/projects/grit/) tool, with the
 following suggestions:
 1) Include an actual resource file and a file describing it somewhere in your
   project (see [the grit
   manual](http://www.coranac.com/man/grit/html/index.htm) for all details
   involved here).
 2) In a `build.rs` you run `grit` on each resource+description pair, such as in
   this [old gist
   example](https://gist.github.com/ketsuban/526fa55fbef0a3ccd4c7cd6204f29f94)
 3) Then within your rust code you use the
   [include_bytes!](https://doc.rust-lang.org/core/macro.include_bytes.html)
   macro to have the formatted resource be available as a const value you can
   load at runtime.
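 For intuition about the reordering that a conversion tool has to do, here's a
 minimal sketch for 8bpp data. It is not how `grit` is implemented, just the
 basic idea, and `bitmap_to_tiles_8bpp` is a made-up name. It uses `Vec`, so it
 would run on your development machine (say, from a `build.rs`), not on the GBA.
 ```rust
 /// Reorders a row-major 8bpp bitmap into tile order: all 8 rows of the first
 /// tile, then all 8 rows of the tile to its right, and so on.
 pub fn bitmap_to_tiles_8bpp(pixels: &[u8], width: usize, height: usize) -> Vec<u8> {
   // This sketch only handles images that are a whole number of tiles.
   assert!(width % 8 == 0 && height % 8 == 0);
   assert_eq!(pixels.len(), width * height);
   let mut out = Vec::with_capacity(pixels.len());
   for tile_y in 0..height / 8 {
     for tile_x in 0..width / 8 {
       for row in 0..8 {
         // Where this 8 pixel strip starts within the normal bitmap.
         let start = (tile_y * 8 + row) * width + tile_x * 8;
         out.extend_from_slice(&pixels[start..start + 8]);
       }
     }
   }
   out
 }
 ```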
--- a/book/src-bak/video_memory_intro.md
+++ b/book/src-bak/video_memory_intro.md
@ -1,113 +0,0 @@
 # Video Memory Intro
 The GBA's Video RAM is 96k stretching from `0x0600_0000` to `0x0601_7FFF`.
 The Video RAM can only be accessed totally freely during a Vertical Blank (aka
 "VBlank", though sometimes I forget and don't capitalize it properly). At other
 times, if the CPU tries to touch the same part of video memory as the display
 controller is accessing then the CPU gets bumped by a cycle to avoid a clash.
 Annoyingly, VRAM can only be properly written to in 16 and 32 bit segments (same
 with PALRAM and OAM). If you try to write just an 8 bit segment, then both parts
 of the 16 bit segment get the same value written to them. In other words, if you
 write the byte `5` to `0x0600_0000`, then both `0x0600_0000` and ALSO
 `0x0600_0001` will have the byte `5` in them. We have to be extra careful when
 trying to set an individual byte, and we also have to be careful if we use
 `memcpy` or `memset` as well, because they're byte oriented by default and
 don't know to follow the special rules.
 ## RGB15
 As I said before, RGB15 stores a color within a `u16` value using 5 bits for
 each color channel.
 ```rust
 pub const RED:   u16 = 0b0_00000_00000_11111;
 pub const GREEN: u16 = 0b0_00000_11111_00000;
 pub const BLUE:  u16 = 0b0_11111_00000_00000;
 ```
 In Mode 3 and Mode 5 we write direct color values into VRAM, and in Mode 4 we
 write palette index values, and then the color values go into the PALRAM.
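 If you'd rather not write the binary literals by hand, a small helper can pack
 and unpack the channels. This is a sketch; `rgb15` and `channels` are
 hypothetical names.
 ```rust
 /// Packs three 5-bit channels (0 to 31 each) into an RGB15 value.
 /// Red is the lowest 5 bits, then green, then blue. Bit 15 is unused.
 pub const fn rgb15(red: u16, green: u16, blue: u16) -> u16 {
   (red & 0x1F) | ((green & 0x1F) << 5) | ((blue & 0x1F) << 10)
 }
 /// Splits an RGB15 value back into `(red, green, blue)`.
 pub const fn channels(color: u16) -> (u16, u16, u16) {
   (color & 0x1F, (color >> 5) & 0x1F, (color >> 10) & 0x1F)
 }
 ```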
 ## Mode 3
 Mode 3 is pretty easy. We have a full resolution grid of rgb15 pixels. There's
 160 rows of 240 pixels each, with the base address being the top left corner. A
 particular pixel uses normal "2d indexing" math:
 ```rust
 let row_five_col_seven = 7 + (5 * SCREEN_WIDTH);
 ```
 To draw a pixel, we just write a value at the address for the row and col that
 we want to draw to.
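 As a sketch of that address math (the function names are made up, and this only
 computes addresses, it doesn't write anything):
 ```rust
 pub const VRAM_BASE: usize = 0x0600_0000;
 pub const SCREEN_WIDTH: usize = 240;
 pub const SCREEN_HEIGHT: usize = 160;
 /// Index of a Mode 3 pixel, counted in `u16` units from the start of VRAM.
 pub fn mode3_index(col: usize, row: usize) -> Option<usize> {
   if col < SCREEN_WIDTH && row < SCREEN_HEIGHT {
     Some(col + row * SCREEN_WIDTH)
   } else {
     None
   }
 }
 /// Byte address of a Mode 3 pixel. Each pixel is 2 bytes.
 pub fn mode3_address(col: usize, row: usize) -> Option<usize> {
   mode3_index(col, row).map(|index| VRAM_BASE + index * 2)
 }
 ```
 Returning `Option` keeps an out of bounds position from turning into a write
 somewhere else in VRAM.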
 ## Mode 4
 Mode 4 introduces page flipping. Instead of one giant page at `0x0600_0000`,
 there's Page 0 at `0x0600_0000` and then Page 1 at `0x0600_A000`. The resolution
 for each page is the same as above, but instead of writing `u16` values, the
 memory is treated as `u8` indexes into PALRAM. The PALRAM starts at
 `0x0500_0000`, and there's enough space for 256 palette entries (each a `u16`).
 To set the color of a palette entry we just do a normal `u16` write_volatile.
 ```rust
 (0x0500_0000 as *mut u16).offset(target_index).write_volatile(new_color)
 ```
 To draw a pixel we set the palette entry that we want the pixel to use. However,
 we must remember the "minimum size" write limitation that applies to VRAM. So,
 if we want to change just a single pixel at a time we must
 1) Read the full `u16` it's a part of.
 2) Clear the half of the `u16` we're going to replace
 3) Write the half of the `u16` we're going to replace with the new value
 4) Write that result back to the address.
 So, the math for finding a byte offset is the same as Mode 3 (since they're both
 a 2d grid). The GBA is little-endian, so if the byte offset is EVEN it'll be the
 low bits of the `u16` at half the byte offset. If the offset is ODD it'll be the
 high bits of the `u16` at half the byte offset, rounded down.
 Does that make sense?
 * If we want to write pixel (0,0) the byte offset is 0, so we change the low
  bits of `u16` offset 0. Then we want to write to (1,0), so the byte offset is
  1, so we change the high bits of `u16` offset 0. The pixels are next to each
  other, and the target bytes are next to each other, good so far.
 * If we want to write to (5,6) that'd be byte `5 + 6 * 240 = 1445`, so we'd
  target the high bits of `u16` offset `floor(1445/2) = 722`.
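 The four steps can be sketched as follows. To keep it runnable anywhere, a
 plain `&mut [u16]` slice stands in for a page of VRAM; on real hardware the
 read and the write would both need to be volatile. The function name is
 hypothetical, and the sketch assumes the GBA's little-endian layout, where the
 even byte of a pair is the low half of the `u16`.
 ```rust
 pub const MODE4_WIDTH: usize = 240;
 /// Sets one Mode 4 pixel using only whole `u16` reads and writes.
 pub fn mode4_write(page: &mut [u16], col: usize, row: usize, palette_index: u8) {
   let byte_offset = col + row * MODE4_WIDTH;
   // 1) read the full u16 that the pixel is a part of
   let old = page[byte_offset / 2];
   // 2) clear the half we're replacing, 3) put the new value in that half
   let new = if byte_offset % 2 == 0 {
     (old & 0xFF00) | palette_index as u16
   } else {
     (old & 0x00FF) | ((palette_index as u16) << 8)
   };
   // 4) write the result back
   page[byte_offset / 2] = new;
 }
 ```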
 As you can see, trying to write individual pixels in Mode 4 is mostly a bad
 time. Fret not! We don't _have_ to write individual bytes. If our data is
 arranged correctly ahead of time we can just write `u16` or `u32` values
 directly. The video hardware doesn't care, it'll get along just fine.
 ## Mode 5
 Mode 5 is also a two page mode, but instead of compressing the size of a pixel's
 data to fit in two pages, we compress the resolution.
 Mode 5 is full `u16` color, but only 160w x 128h per page.
 ## In Conclusion...
 So what got written into VRAM in `hello1`?
 ```rust
    (0x06000000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
    (0x06000000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
    (0x06000000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
 ```
 So at pixels `(120,80)`, `(136,80)`, and `(120,96)` we write three values. Once
 again we probably need to [convert them](https://www.wolframalpha.com/) into
 binary to make sense of it.
 * 0x001F: 0b0_00000_00000_11111
 * 0x03E0: 0b0_00000_11111_00000
 * 0x7C00: 0b0_11111_00000_00000
 Ah, of course, a red pixel, a green pixel, and a blue pixel.
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@ -1,9 +0,0 @@
 # Rust GBA Guide
 * [Development Setup](development-setup.md)
 * [Volatile](volatile.md)
 * [The Hardware Memory Map](the-hardware-memory-map.md)
 * [IO Registers](io-registers.md)
 * [Bitmap Video](bitmap-video.md)
 * [GBA Assembly](gba-asm.md)
--- a/book/src/bitmap-video.md
+++ b/book/src/bitmap-video.md
@ -1,214 +0,0 @@
 # Bitmap Video
 Our first video modes to talk about are the bitmap video modes.
 It's not because they're the best and fastest, it's because they're the
 _simplest_. You can get going and practice with them really quickly. Usually
 after that you end up wanting to move on to the other video modes because they
 have better hardware support, so you can draw more complex things with the small
 number of cycles that the GBA allows.
 ## The Three Bitmap Modes
 As I said in the Hardware Memory Map section, the Video RAM lives in the address
 space at `0x600_0000`. Depending on our video mode the display controller will
 consider this memory to be in one of a few totally different formats.
 ### Mode 3
 The screen is 160 rows, each 240 pixels long, of `u16` color values.
 This is "full" resolution, and "full" color. It adds up to 76,800 bytes. VRAM is
 only 98,304 bytes total though. There's enough space left over after the bitmap
 for some object tile data if you want to use objects, but basically Mode3 is
 using all of VRAM as one huge canvas.
 ### Mode 4
 The screen is 160 rows, each 240 pixels long, of `u8` palette values.
 This has half as much space per pixel. What's a palette value? That's an index
 into the background PALRAM which says what the color of that pixel should be. We
 still have the full color space available, but we can only use 256 colors at the
 same time.
 What did we get in exchange for this? Well, now there's a second "page". The
 second page starts `0xA000` bytes into VRAM (in both Mode 4 and Mode 5). It's an
 entire second set of pixel data. You determine if Page 0 or Page 1 is shown
 using bit 4 of DISPCNT. When you swap which page is being displayed it's called
 page flipping or flipping the page, or something like that.
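 Here's a minimal sketch of the bit math for a page flip. It works on a plain
 `u16` rather than the real register, so the names are illustrative only; you'd
 still have to read DISPCNT and write the result back yourself.
 ```rust
 /// Bit 4 of DISPCNT selects which page is shown in Mode 4 and Mode 5.
 pub const FRAME_SELECT_BIT: u16 = 1 << 4;
 /// Byte distance from Page 0 to Page 1.
 pub const PAGE1_OFFSET: usize = 0xA000;
 /// Returns the DISPCNT value with the displayed page swapped.
 pub const fn flip_page(dispcnt: u16) -> u16 {
   dispcnt ^ FRAME_SELECT_BIT
 }
 /// VRAM byte offset of the page that is currently NOT being shown, which is
 /// the one you'd want to draw into.
 pub const fn hidden_page_offset(dispcnt: u16) -> usize {
   if dispcnt & FRAME_SELECT_BIT != 0 {
     0
   } else {
     PAGE1_OFFSET
   }
 }
 ```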
 Having two pages is cool, but Mode 4 has a big drawback: it's part of VRAM so
 that "can't write 1 byte at a time" rule applies. This means that to set a
 single byte we need to read a `u16`, adjust just one side of it, and then write
 that `u16` back. We can hide the complication behind a method call, but it
 simply takes longer to do all that, so editing pixels ends up being
 unfortunately slow compared to the other bitmap modes.
 ### Mode 5
 The screen is 128 rows, each 160 pixels long, of `u16` color values.
 Mode 5 has two pages like Mode 4 does, but instead of keeping full resolution we
 keep full color. The pixels are displayed in the top left and it's just black on
 the right and bottom edges. You can use the background control registers to
 shift it around, maybe center it, but there's no way to get around the fact that
 not having full resolution is kinda awkward.
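 To see how the three modes fit into VRAM, here's the size arithmetic as a
 sketch (the constant names are made up for illustration):
 ```rust
 pub const VRAM_SIZE: usize = 96 * 1024;
 /// Byte distance between Page 0 and Page 1 in Mode 4 and Mode 5.
 pub const PAGE_STRIDE: usize = 0xA000;
 /// Mode 3: 240x160 pixels, 2 bytes each, one page.
 pub const MODE3_BYTES: usize = 240 * 160 * 2;
 /// Mode 4: 240x160 pixels, 1 byte each, per page.
 pub const MODE4_PAGE_BYTES: usize = 240 * 160;
 /// Mode 5: 160x128 pixels, 2 bytes each, per page.
 pub const MODE5_PAGE_BYTES: usize = 160 * 128 * 2;
 /// What's left of VRAM after the single Mode 3 bitmap.
 pub const MODE3_LEFTOVER: usize = VRAM_SIZE - MODE3_BYTES;
 ```
 Note that a Mode 5 page fills the `0xA000` byte page stride exactly, while a
 Mode 4 page leaves a little unused space before Page 1 begins.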
 ## Using Mode 3
 Let's have a look at how this comes together. We'll call this one
 `hello_world.rs`, since it's our first real program.
 ### Module Attributes and Imports
 At the top of our file we're still `no_std` and we're still using
 `feature(start)`, but now we're using the `gba` crate so we're 100% safe code!
 Often enough we'll need a little `unsafe`, but for just bitmap drawing we don't
 need it.
 ```rust
 #![no_std]
 #![feature(start)]
 #![forbid(unsafe_code)]
 use gba::{
  fatal,
  io::{
    display::{DisplayControlSetting, DisplayMode, DISPCNT, VBLANK_SCANLINE, VCOUNT},
    keypad::read_key_input,
  },
  vram::bitmap::Mode3,
  Color,
 };
 ```
 ### Panic Handler
 Before we had a panic handler that just looped forever. Now that we're using the
 `gba` crate we can rely on the debug output channel from `mGBA` to get a message
 into the real world. There's macros setup for each message severity, and they
 all accept a format string and arguments, like how `println` works. The catch is
 that a given message is capped at a length of 255 bytes, and it should probably
 be ASCII only.
 In the case of the `fatal` message level, it also halts the emulator.
 Of course, if the program is run on real hardware then the `fatal` message won't
 stop the program, so we still need the infinite loop there too.
 (not that this program _can_ panic, but `rustc` doesn't know that so it demands
 we have a `panic_handler`)
 ```rust
 #[panic_handler]
 fn panic(info: &core::panic::PanicInfo) -> ! {
  // This kills the emulation with a message if we're running within mGBA.
  fatal!("{}", info);
  // If we're _not_ running within mGBA then we still need to not return, so
  // loop forever doing nothing.
  loop {}
 }
 ```
 ### Waiting Around
 Like I talked about before, sometimes we need to wait around a bit for the right
 moment to start doing work. However, we don't know how to do the good version of
 waiting for VBlank and VDraw to start, so we'll use the really bad version of it
 for now.
 ```rust
 /// Performs a busy loop until VBlank starts.
 ///
 /// This is very inefficient, and please keep following the lessons until we
 /// cover how interrupts work!
 pub fn spin_until_vblank() {
  while VCOUNT.read() < VBLANK_SCANLINE {}
 }
 /// Performs a busy loop until VDraw starts.
 ///
 /// This is very inefficient, and please keep following the lessons until we
 /// cover how interrupts work!
 pub fn spin_until_vdraw() {
  while VCOUNT.read() >= VBLANK_SCANLINE {}
 }
 ```
 ### Setup in `main`
 In main we set the display control value we want and declare a few variables
 we're going to use in our primary loop.
 ```rust
 #[start]
 fn main(_argc: isize, _argv: *const *const u8) -> isize {
  const SETTING: DisplayControlSetting =
    DisplayControlSetting::new().with_mode(DisplayMode::Mode3).with_bg2(true);
  DISPCNT.write(SETTING);
  let mut px = Mode3::WIDTH / 2;
  let mut py = Mode3::HEIGHT / 2;
  let mut color = Color::from_rgb(31, 0, 0);
 ```
 ### Stuff During VDraw
 When a frame starts we want to read the keys, then adjust as much of the game
 state as we can without touching VRAM.
 Once we're ready, we do our spin loop until VBlank starts.
 In this case, we're going to adjust `px` and `py` depending on the arrow pad
 input, and also we'll cycle around the color depending on L and R being pressed.
 ```rust
  loop {
    // read our keys for this frame
    let this_frame_keys = read_key_input();
    // adjust game state and wait for vblank
    px = px.wrapping_add(2 * this_frame_keys.x_tribool() as usize);
    py = py.wrapping_add(2 * this_frame_keys.y_tribool() as usize);
    if this_frame_keys.l() {
      color = Color(color.0.rotate_left(5));
    }
    if this_frame_keys.r() {
      color = Color(color.0.rotate_right(5));
    }
    // now we wait
    spin_until_vblank();
 ```
 ### Stuff During VBlank
 When VBlank starts we want to update video memory to display the new
 frame's situation.
 In our case, we're going to paint a little square of the current color, but also
 if you go off the map it resets the screen.
 At the end, we spin until VDraw starts so we can do the whole thing again.
 ```rust
    // draw the new game and wait until the next frame starts.
    if px >= Mode3::WIDTH || py >= Mode3::HEIGHT {
      // out of bounds, reset the screen and position.
      Mode3::dma_clear_to(Color::from_rgb(0, 0, 0));
      px = Mode3::WIDTH / 2;
      py = Mode3::HEIGHT / 2;
    } else {
      // draw the new part of the line
      Mode3::write(px, py, color);
      Mode3::write(px, py + 1, color);
      Mode3::write(px + 1, py, color);
      Mode3::write(px + 1, py + 1, color);
    }
    // now we wait again
    spin_until_vdraw();
  }
 }
 ```
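 The out of bounds check only needs `>=` because of how wrapping works with
 unsigned positions. Here's a sketch of the idea, using a plain `i32` direction
 (-1, 0, or 1) as a stand-in for the tribool value. The `step` function and the
 constant are hypothetical, not the crate's API.
 ```rust
 pub const WIDTH: usize = 240;
 /// Moves a position 2 pixels in the given direction (-1, 0, or 1).
 /// A negative step is cast to a huge `usize`, and `wrapping_add` with that
 /// value is the same as subtracting.
 pub fn step(pos: usize, direction: i32) -> usize {
   pos.wrapping_add((2 * direction) as usize)
 }
 ```
 Stepping left from position 0 wraps around to an enormous value, which is
 caught by the same `>= WIDTH` test that catches walking off the right edge.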
--- a/book/src/development-setup.md
+++ b/book/src/development-setup.md
@ -1,189 +0,0 @@
 # Development Setup
 Before you can build a GBA game you'll have to follow some special steps to
 setup the development environment.
 Once again, extra special thanks to **Ketsuban**, who first dove into how to
 make this all work with rust and then shared it with the world.
 ## Per System Setup
 Obviously you need your computer to have a [working rust
 installation](https://rustup.rs/). However, you'll also need to ensure that
 you're using a nightly toolchain (we will need it for inline assembly, among
 other potentially useful features). You can run `rustup default nightly` to set
 nightly as the system wide default toolchain, or you can use a [toolchain
 file](https://github.com/rust-lang-nursery/rustup.rs#the-toolchain-file) to use
 nightly just on a specific project, but either way we'll be assuming the use of
 nightly from now on. You'll also need the `rust-src` component so that
 `cargo-xbuild` will be able to compile the core crate for us in a bit, so run
 `rustup component add rust-src`.
 Next, you need [devkitpro](https://devkitpro.org/wiki/Getting_Started). They've
 got a graphical installer for Windows that runs nicely, and I guess `pacman`
 support on Linux (I'm on Windows so I haven't tried the Linux install myself).
 We'll be using a few of their general binutils for the `arm-none-eabi` target,
 and we'll also be using some of their tools that are specific to GBA
 development, so _even if_ you already have the right binutils for whatever
 reason, you'll still want devkitpro for the `gbafix` utility.
 * On Windows you'll want something like `C:\devkitpro\devkitARM\bin` and
  `C:\devkitpro\tools\bin` to be [added to your
  PATH](https://stackoverflow.com/q/44272416/455232), depending on where you
  installed it to and such.
 * On Linux you can use pacman to get it, and the default install puts the stuff
  in `/opt/devkitpro/devkitARM/bin` and `/opt/devkitpro/tools/bin`. If you need
  help you can look in our repository's
  [.travis.yml](https://github.com/rust-console/gba/blob/master/.travis.yml)
  file to see exactly what our CI does.
 Finally, you'll need `cargo-xbuild`. Just run `cargo install cargo-xbuild` and
 cargo will figure it all out for you.
 ## Per Project Setup
 Once the system wide tools are ready, you'll need some particular files each
 time you want to start a new project. You can find them in the root of the
 [rust-console/gba repo](https://github.com/rust-console/gba).
 * `thumbv4-none-agb.json` describes the overall GBA to cargo-xbuild (and LLVM)
  so it knows what to do. Technically the GBA is `thumbv4-none-eabi`, but we
  change the `eabi` to `agb` so that we can distinguish it from other `eabi`
  devices when using `cfg` flags.
 * `crt0.s` describes some ASM startup stuff. If you have more ASM to place here
  later on this is where you can put it. You also need to build it into a
  `crt0.o` file before it can actually be used, but we'll cover that below.
 * `linker.ld` tells the linker all the critical info about the layout
  expectations that the GBA has about our program, and that it should also
  include the `crt0.o` file with our compiled rust code.
 ## Compiling
 Once all the tools are in place, there's particular steps that you need to
 compile the project. For these to work you'll need some source code to compile.
 Unlike with other things, an empty main file and/or an empty lib file will cause
 a total build failure, because we'll need a
 [no_std](https://rust-embedded.github.io/book/intro/no-std.html) build, and rust
 defaults to builds that use the standard library. The next section has a minimal
 example file you can use (along with explanation), but we'll describe the build
 steps here.
 * `arm-none-eabi-as crt0.s -o target/crt0.o`
  * This builds your text format `crt0.s` file into object format `crt0.o`
    that's placed in the `target/` directory. Note that if the `target/`
    directory doesn't exist yet it will fail, so you have to make the directory
    if it's not there. You don't need to rebuild `crt0.s` every single time,
    only when it changes, but you might as well throw a line to do it every time
    into your build script so that you never forget because it's a practically
    instant operation anyway.
 * `cargo xbuild --target thumbv4-none-agb.json`
  * This builds your Rust source. It accepts _most of_ the normal flags that
    you'd expect `cargo` to accept, such as `--release`, as well as the usual
    target selection options, such as `--bin foo` or `--examples`.
  * You **can not** build and run tests this way, because they require `std`,
    which the GBA doesn't have. If you want you can still run some of your
    project's tests with `cargo test --lib` or similar, but that builds for your
    local machine, so anything specific to the GBA (such as reading and writing
    registers) won't be testable that way. If you want to isolate and try out
    some piece of code running on the GBA you'll unfortunately have to make a demo
    for it in your `examples/` directory and then run the demo in an emulator
    and see if it does what you expect.
  * The file extension is important! It will work if you forget it, but `cargo
    xbuild` takes the inclusion of the extension as a flag to also compile
    dependencies with the same sysroot, so you can include other crates in your
    build. Well, crates that work in the GBA's limited environment, but you get
    the idea.
 At this point you have an ELF binary that some emulators can execute directly
 (more on that later). However, if you want a "real" ROM that works in all
 emulators and that you could transfer to a flash cart to play on real hardware
 there's a little more to do.
 * `arm-none-eabi-objcopy -O binary target/thumbv4-none-agb/MODE/BIN_NAME target/ROM_NAME.gba`
  * This will perform an [objcopy](https://linux.die.net/man/1/objcopy) on our
    program. Here I've named the program `arm-none-eabi-objcopy`, which is what
    devkitpro calls their version of `objcopy` that's specific to the GBA in the
    Windows install. If the program isn't found under that name, have a look in
    your installation directory to see if it's under a slightly different name
    or something.
  * As you can see from reading the man page, the `-O binary` option takes our
    lovely ELF file with symbols and all that and strips it down to basically a
    bare memory dump of the program.
  * The next argument is the input file. You might not be familiar with how
    `cargo` arranges stuff in the `target/` directory, and between RLS and
    `cargo doc` and stuff it gets kinda crowded, so it goes like this:
    * Since our program was built for a non-local target, first we've got a
      directory named for that target, `thumbv4-none-agb/`
    * Next, the "MODE" is either `debug/` or `release/`, depending on if we had
      the `--release` flag included. You'll probably only be packing release
      mode programs all the way into GBA roms, but it works with either mode.
    * Finally, the name of the program. If your program is something out of the
      project's `src/bin/` then it'll be that file's name, or whatever name you
      configured for the bin in the `Cargo.toml` file. If your program is
      something out of the project's `examples/` directory there will be a
      similar `examples/` sub-directory first, and then the example's name.
  * The final argument is the output of the `objcopy`, which I suggest putting
    at just the top level of the `target/` directory. Really it could go
    anywhere, but if you're using git then it's likely that your `.gitignore`
    file is already setup to exclude everything in `target/`, so this makes sure
    that your intermediate game builds don't get checked into your git.
 * `gbafix target/ROM_NAME.gba`
  * The `gbafix` tool also comes from devkitpro. The GBA is very picky about a
    ROM's format, and `gbafix` patches the ROM's header and such so that it'll
    work right. Unlike `objcopy`, this tool is custom built for GBA development,
    so it works just perfectly without any arguments beyond the file name. The
    ROM is patched in place, so we don't even need to specify a new destination.
 And you're _finally_ done!
 Of course, you probably want to make a script for all that, but it's up to you.
 On our own project we have it mostly set up within a `Makefile.toml` which runs
 using the [cargo-make](https://github.com/sagiegurari/cargo-make) plugin.
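 If you'd like to script it in Rust instead, here's a minimal sketch that just
 assembles the command lines from the steps above. `build_steps` is a made-up
 helper, it assumes the ROM is named after the binary, and it doesn't handle
 things from `examples/`. Actually running the commands (with
 `std::process::Command`, for instance) is left out.
 ```rust
 /// Returns the four command lines needed to go from source to a ROM, in order.
 pub fn build_steps(bin_name: &str, release: bool) -> Vec<String> {
   let mode = if release { "release" } else { "debug" };
   // Where cargo puts the ELF file for a non-local target.
   let elf = format!("target/thumbv4-none-agb/{}/{}", mode, bin_name);
   // Assumption: the ROM gets the same name as the binary.
   let rom = format!("target/{}.gba", bin_name);
   let mut xbuild = String::from("cargo xbuild --target thumbv4-none-agb.json");
   if release {
     xbuild.push_str(" --release");
   }
   vec![
     String::from("arm-none-eabi-as crt0.s -o target/crt0.o"),
     xbuild,
     format!("arm-none-eabi-objcopy -O binary {} {}", elf, rom),
     format!("gbafix {}", rom),
   ]
 }
 ```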
 ## Checking Your Setup
 As I said, you need some source code to compile just to check that your
 compilation pipeline is working. Here's a sample file that just puts three dots
 on the screen without depending on any crates or anything at all.
 `hello_magic.rs`:
 ```rust
 #![no_std]
 #![feature(start)]
 #[panic_handler]
 fn panic(_info: &core::panic::PanicInfo) -> ! {
  loop {}
 }
 #[start]
 fn main(_argc: isize, _argv: *const *const u8) -> isize {
  unsafe {
    (0x400_0000 as *mut u16).write_volatile(0x0403);
    (0x600_0000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
    (0x600_0000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
    (0x600_0000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
    loop {}
  }
 }
 #[no_mangle]
 static __IRQ_HANDLER: extern "C" fn() = irq_handler;
 extern "C" fn irq_handler() {}
 ```
 Throw that into your project skeleton, build the program, and give it a run in
 an emulator. I suggest [mgba](https://mgba.io/2019/01/26/mgba-0.7.0/), it has
 some developer tools we'll use later on. You should see a red, green, and blue
 dot close-ish to the middle of the screen. If you don't, something _already_
 went wrong. Double check things, phone a friend, write your senators, try asking
 `Lokathor` or `Ketsuban` on the [Rust Community
 Discord](https://discordapp.com/invite/aVESxV8), until you're eventually able to
 get your three dots going.
 Of course, I'm sure you want to know why those particular numbers are the
 numbers to use. Well that's what the whole rest of the book is about!
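 As a small preview, the first magic number can be rebuilt from its parts. This
 is a sketch with a made-up function name, and it assumes the usual DISPCNT
 layout where the video mode is the low 3 bits and bit 10 turns on background 2.
 ```rust
 /// Builds a display control value from a video mode and the BG2 enable flag.
 /// Assumption: mode is bits 0 to 2, and BG2 enable is bit 10.
 pub const fn dispcnt_value(mode: u16, bg2: bool) -> u16 {
   (mode & 0b111) | ((bg2 as u16) << 10)
 }
 ```
 So `0x0403` is just "Mode 3, with background 2 displayed".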
--- a/book/src/gba-asm.md
+++ b/book/src/gba-asm.md
@ -1,123 +0,0 @@
 # GBA Assembly
 On the GBA sometimes you just end up using assembly. Not a whole lot, but
 sometimes. Accordingly, you should know how assembly works on the GBA.
 * The [ARM Infocenter:
  ARM7TDMI](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0210c/index.html)
  is the basic authority for reference information. The GBA has a CPU with the
  `ARMv4` ISA, the `ARMv4T` variant, and specifically the `ARM7TDMI`
  microarchitecture. Someone at ARM decided that having both `ARM#` and `ARMv#`
  was a good way to [version things](https://en.wikichip.org/wiki/arm/versions),
  even when the numbers don't match. The rest of us have been sad ever since.
  The link there will take you to the correct book specific to the GBA's
  microarchitecture. There's a whole big pile of ARM books available within the
  ARM Infocenter, so if you just google it or whatever make sure you end up
  looking at the correct one. Note that there is also a [PDF
  Version](http://infocenter.arm.com/help/topic/com.arm.doc.ddi0210c/DDI0210B.pdf)
  of the documentation available, if you'd like that.
 * In addition to the `ARM7TDMI` book, which is specific to the GBA's CPU, you'll
  need to find a copy of the ARM Architecture Reference Manual if you want
  general ARM knowledge. The ARM Infocenter has the
  [ARMv5](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0100i/index.html)
  version of said manual hosted on their site. Unfortunately, they don't seem to
  host the `ARMv4T` version of the manual any more.
 * The [GBATek: ARM CPU
  Overview](https://problemkaputt.de/gbatek.htm#armcpuoverview) also has quite a
  bit of info. Some of it is a duplication of what you'd find in the ARM
  Infocenter reference manuals. Some of it is information that's specific to the
  GBA's layout and how the CPU interacts with other parts (such as how its
  timings and the display adapter's timings line up). Some of it is specific to
  the ARM chips _within the DS and DSi_, so be careful to make sure that you
  don't wander into the wrong section. GBATEK is always a bit of a jumbled mess,
  and the explanations are often "sparse" (to put it nicely), so I'd advise that
  you also look at the official ARM manuals.
 * The [Compiler Explorer](https://rust.godbolt.org/z/ndCnk3) can be used to
  quickly look at assembly versions of your Rust code. That link there will load
  up an essentially blank `no_std` file with `opt-level=3` set and targeting
  `thumbv6m-none-eabi`. That's _not_ the same target as the GBA (it's two ISA
  revisions later, `ARMv6` instead of `ARMv4`), but it's the closest CPU target
  that is bundled with `rustc`, so it's the closest you can get with the
  compiler explorer website. If you're very dedicated I suppose you could set up
  a [local
  instance](https://github.com/mattgodbolt/compiler-explorer#running-a-local-instance)
  of compiler explorer and then add the extra target definition and so on, but
  that's _probably_ overkill.
 ## ARM and Thumb
 The "T" part in `ARMv4T` and `ARM7TDMI` means "Thumb". An ARM chip that supports
 Thumb has two different instruction sets instead of just one. The chip can run
 in ARM state with 32-bit instructions, or it can run in Thumb state with 16-bit
 instructions. Note that the CPU _state_ (ARM or Thumb) is distinct from the
 _mode_ (User, FIQ, IRQ, etc). Apparently these states are sometimes called
 `a32` and `t32` in a more modern context, but I will stick with ARM and Thumb
 because that's what the official ARM7TDMI manual and GBATEK both use.
 On the GBA, the memory bus that physically transfers data from the cartridge into
 the device is a 16-bit memory bus. This means that if you need to transfer more
 than 16 bits at a time you have to do more than one transfer. Since we'd like
 our instructions to get to the CPU as fast as possible, we compile the majority
 of our program with the Thumb instruction set. The ARM reference says that with
 Thumb instructions on a 16-bit memory bus system you get about 160% performance
 compared to using ARM instructions. That's absolutely something we want to take
 advantage of. Also, your Thumb compiled code is about 65% of the size of the same
 code compiled for ARM. Since a game ROM can only be 32MB total, and we're trying to
 fit in images and sound too, we want to get space savings where we can.
 You may wonder, why is the Thumb code 65% as large if the instructions
 themselves are 50% as large, and why have ARM state at all if there's such a
 benefit to be had with Thumb? Well, Thumb state doesn't support as many different
 instructions as ARM state does. Some lines of source code that can compile to a
 single ARM instruction might need to compile into more than one Thumb
 instruction. Thumb still has most of the really good instructions available, so
 it all averages out to about 65%.
 That said, some parts of a GBA program _must_ be written for ARM state. Also,
 ARM state does allow that increased instruction flexibility. So we _need_ to use
 ARM some of the time, and we might just _want_ to use ARM even when we don't
 need to at other times. It is possible to switch states on the fly, and there's
 extremely minimal overhead, even less than doing some function calls. The only
 problem is the 16-bit memory bus of the cartridge giving us a needless speed
 penalty with our ARM code. The CPU _executes_ the ARM instructions at full
 speed, but then it has to wait while more instructions get sent in. What do we
 do? Well, code is ultimately just a different kind of data. We can copy parts of
 our code off the cartridge ROM and place it into a part of the RAM that has a
 32-bit memory bus. Then the CPU can execute the code from there, going at full
 speed. Of course, there's only a very small amount of RAM compared to the size
 of a cartridge, so we'll only do this with a few select functions. Exactly which
 functions will probably depend on your game.
 There are two problems that we face as Rust programmers:
 1) Rust offers no way to specify individual functions as being ARM or Thumb. The
   whole program is compiled for one state or the other. Obviously this is no
   good, so it's on the [2019 embedded
   wishlist](https://github.com/rust-embedded/wg/issues/256#issuecomment-439677804),
   and perhaps a fix will come.
 2) Rust offers no way to get a pointer to a function as well as the length of
   the compiled function, so we can't copy a function from the ROM to some other
   location because we can't even express statements about the function's data.
   I also put this [on the
   wishlist](https://github.com/rust-embedded/wg/issues/256#issuecomment-450539836),
   but honestly I have much less hope that this becomes a part of rust.
 What this ultimately means is that some parts of our program have to be written
 in external assembly files and then added to the program with the linker. We
 were already going to write some assembly, and we already use more than one file
 in our project all the time, those parts aren't a big problem. The big problem
 is that using custom linker scripts to get assembly code into our final program
 isn't transitive between crates.
 What I mean is that once we have a file full of custom assembly that we're
 linking in by hand, that's not "part of" the crate any more. At least not as
 `cargo` sees it. So we can't just upload it to `crates.io` and then depend on it
 in other projects and have `cargo` download the right version and include it
 all automatically. We're back to fully manually copying files from the old
 project into the new one, adding more lines to the linker script each time we
 split up a new assembly file, all that stuff. Like the stone age. Sometimes ya
 gotta suffer for your art.
--- a/book/src/io-registers.md
+++ b/book/src/io-registers.md
@ -1,237 +0,0 @@
 # IO Registers
 As I said before, the IO registers are how you tell the GBA to do all the things
 you want it to do. If you want a hint at what's available, they're all listed
 out in the [GBA I/O Map](https://problemkaputt.de/gbatek.htm#gbaiomap) section
 of GBATEK. Go have a quick look.
 Each individual IO register has a particular address just like we talked about
 in the Hardware Memory Map section. They also have a size (listed in bytes), and
 a note on if they're read only, write only, or read-write. Finally, each
 register has a name and a one line summary. Unfortunately for us, the names are
 all C style names with heavy shorthand. I'm not normally a fan of shorthand
 names, but the `gba` crate uses the register names from GBATEK as much as
 possible, since they're the most commonly used set of names among GBA
 programmers. That way, if you're reading other guides and they say to set the
 `BG2CNT` register, then you know exactly what register to look for within the
 `gba` docs.
 ## Register Bits
 There's only about 100 registers, but there's a lot more than 100 details we
 want to have control over on the GBA. How does that work? Well, let's use a
 particular register to talk about it. The first one on the list is `DISPCNT`,
 the "Display Control" register. It's one of the most important IO registers, so
 this is a "two birds with one stone" situation.
 Naturally there's a whole lot of things involved in the LCD that we want to
 control, and it's all "one" value, but that value is actually many "fields"
 packed into one value. When learning about an IO register, you have to look at
 its bit pattern breakdown. For `DISPCNT` the GBATEK entry looks like this:
 ```txt
 4000000h - DISPCNT - LCD Control (Read/Write)
  Bit   Expl.
  0-2   BG Mode                (0-5=Video Mode 0-5, 6-7=Prohibited)
  3     Reserved / CGB Mode    (0=GBA, 1=CGB; can be set only by BIOS opcodes)
  4     Display Frame Select   (0-1=Frame 0-1) (for BG Modes 4,5 only)
  5     H-Blank Interval Free  (1=Allow access to OAM during H-Blank)
  6     OBJ Character VRAM Mapping (0=Two dimensional, 1=One dimensional)
  7     Forced Blank           (1=Allow FAST access to VRAM,Palette,OAM)
  8     Screen Display BG0  (0=Off, 1=On)
  9     Screen Display BG1  (0=Off, 1=On)
  10    Screen Display BG2  (0=Off, 1=On)
  11    Screen Display BG3  (0=Off, 1=On)
  12    Screen Display OBJ  (0=Off, 1=On)
  13    Window 0 Display Flag   (0=Off, 1=On)
  14    Window 1 Display Flag   (0=Off, 1=On)
  15    OBJ Window Display Flag (0=Off, 1=On)
 ```
 So what we're supposed to understand here is that we've got a `u16`, and then we
 set the individual bits for the things that we want. In the `hello_magic`
 example you might recall that we set this register to the value `0x0403`. That
 was a bit of a trick on my part because hex numbers usually look far more
 mysterious than decimal or binary numbers. If we converted it to binary it'd
 look like this:
 ```rust
 0b100_0000_0011
 ```
 And then you can just go down the list of settings to see what bits are what:
 * Bits 0-2 (BG Mode) are `0b011`, so that's Video Mode 3
 * Bit 10 (Display BG2) is enabled
 * Everything else is disabled
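
 As a rough sketch of that walk-through, the same decoding can be done with plain
 shifts and masks. The helper names here (`bg_mode`, `bit_is_set`) are made up for
 illustration and are not part of the `gba` crate:

 ```rust
 /// Extracts bits 0-2 (BG Mode) from a raw DISPCNT value.
 pub const fn bg_mode(dispcnt: u16) -> u16 {
   dispcnt & 0b111
 }

 /// Checks whether a single bit (0 through 15) is set in a raw DISPCNT value.
 pub const fn bit_is_set(dispcnt: u16, bit: u32) -> bool {
   (dispcnt >> bit) & 1 == 1
 }
 ```

 For `0x0403`, `bg_mode` gives 3, `bit_is_set(0x0403, 10)` is `true`, and every
 other bit from 3 through 15 reports as unset.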
 Naturally, trying to remember exactly what bit does what can be difficult. In
 the `gba` crate we attempt as much as possible to make types that wrap over a
 `u16` or `u32` and then have getters and setters _as if_ all the inner bits were
 different fields.
 * If it's a single bit then the getter/setter will use `bool`.
 * If it's more than one bit and each pattern has some non-numeric meaning then
  it'll use an `enum`.
 * If it's more than one bit and numeric in nature then it'll just use the
  wrapped integer type. Note that you generally won't get the full range of the
  inner number type, and any excess gets truncated down to fit in the bits
  available.
 All the getters and setters are defined as `const` functions, so you can make
 constant declarations for the exact setting combinations that you want.
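
 To give a feel for the pattern, here's a minimal sketch of such a wrapper type.
 This is _not_ the crate's actual code: `DisplaySetting` and its methods are
 hypothetical stand-ins, only two "fields" are shown, and the mode is kept numeric
 for brevity where the real type would use an `enum`.

 ```rust
 /// Hypothetical stand-in for a register setting type: a `u16` newtype whose
 /// "fields" are really bit ranges inside the wrapped value.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 #[repr(transparent)]
 pub struct DisplaySetting(pub u16);

 impl DisplaySetting {
   const MODE_MASK: u16 = 0b111;
   const BG2_BIT: u16 = 1 << 10;

   pub const fn new() -> Self {
     DisplaySetting(0)
   }

   /// Numeric field: excess bits are truncated to fit the 3 bits available.
   pub const fn with_mode(self, mode: u16) -> Self {
     DisplaySetting((self.0 & !Self::MODE_MASK) | (mode & Self::MODE_MASK))
   }

   pub const fn mode(self) -> u16 {
     self.0 & Self::MODE_MASK
   }

   /// Single-bit field: exposed as a `bool`.
   pub const fn with_bg2(self, on: bool) -> Self {
     if on {
       DisplaySetting(self.0 | Self::BG2_BIT)
     } else {
       DisplaySetting(self.0 & !Self::BG2_BIT)
     }
   }

   pub const fn bg2(self) -> bool {
     (self.0 & Self::BG2_BIT) != 0
   }
 }

 /// Because everything is a `const fn`, a full setting can be a constant.
 pub const MODE3_BG2: DisplaySetting = DisplaySetting::new().with_mode(3).with_bg2(true);
 ```

 The `MODE3_BG2` constant wraps the value `0x0403`, the same bits as the magic
 number from `hello_magic`.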
 ## Some Important IO Registers
 It's not easy to automatically see what registers will be important for getting
 started and what registers can be saved to learn about later.
 We'll go over three IO registers here that will help us the most to get started,
 then next lesson we'll cover how that Video Mode 3 bitmap drawing works, and
 then by the end of the next lesson we'll be able to put it all together into
 something interactive.
 ### DISPCNT: Display Control
 The [DISPCNT](https://problemkaputt.de/gbatek.htm#lcdiodisplaycontrol) register
 lets us affect the major details of our video output. There's a lot of other
 registers involved too, but it all starts here.
 ```rust
 pub const DISPCNT: VolAddress<DisplayControlSetting> = unsafe { VolAddress::new(0x400_0000) };
 ```
 As you can see, the display control register is, like most registers,
 complicated enough that we make it a dedicated type with getters and setters for
 the "phantom" fields. In this case it's mostly a bunch of `bool` values we can
 set, and also the video mode is an `enum`.
 We already looked at the bit listing above, let's go over what's important right
 now and skip the other bits:
 * BG Mode sets how the whole screen is going to work and even how the display
  adapter is going to interpret the bit layout of video memory for pixel
  processing. We'll start with Mode 3, which is the simplest to learn.
 * The "Forced Blank" bit is one of the very few bits that starts _on_ at the
  start of the main program. When it's enabled it prevents the display adapter
  from displaying anything at all. You use this bit when you need to do a very
  long change to video memory and you don't want the user to see the
  intermediate states being partly drawn.
 * The "Screen Display" bits let us enable different display layers. We care
  about BG2 right now because the bitmap modes (3, 4, and 5) are all treated as
  if they were drawing into BG2 (even though it's the only BG layer available in
  those modes).
 There's a bunch of other stuff, but we'll get to those things later. They're not
 relevant right now, and there's enough to learn already. Already we can see that
 when the `hello_magic` demo says
 ```rust
  (0x400_0000 as *mut u16).write_volatile(0x0403);
 ```
 We could re-write that more sensibly like this
 ```rust
  const SETTING: DisplayControlSetting =
    DisplayControlSetting::new().with_mode(DisplayMode::Mode3).with_bg2(true);
  DISPCNT.write(SETTING);
 ```
 ### VCOUNT: Vertical Display Counter
 The [VCOUNT](https://problemkaputt.de/gbatek.htm#lcdiointerruptsandstatus)
 register lets us find out what row of pixels (called a **scanline**) is
 currently being processed.
 ```rust
 pub const VCOUNT: ROVolAddress<u16> = unsafe { ROVolAddress::new(0x400_0006) };
 ```
 You see, the display adapter is constantly running its own loop, alongside the
 CPU. It starts at the very first pixel of the very first scanline, takes 4
 cycles to determine what color that pixel is, and then processes the next
 pixel. Each scanline is 240 pixels long, followed by 68 "virtual" pixels so that
 you have just a moment to set up for the next scanline to be drawn if you need
 it. 272 cycles (68*4) is not a lot of time, but it's enough that you could
 change some palette colors or move some objects around if you need to.
 * Horizontal pixel value `0..240`: "HDraw"
 * Horizontal pixel value `240..308`: "HBlank"
 There's no way to check the current horizontal counter, but there is a way to
 have the CPU interrupt the normal code when the HBlank period starts, which
 we'll learn about later.
 Once a complete scanline has been processed (including the blank period), the
 display adapter keeps going with the next scanline. Similar to how the
 horizontal processing works, there's 160 scanlines in the real display, and then
 it's followed by 68 "virtual" scanlines to give you time for adjusting video
 memory between the frames of the game.
 * Vertical Count `0..160`: "VDraw"
 * Vertical Count `160..228`: "VBlank"
 Once every scanline has been processed (including the vblank period), the
 display adapter starts the whole loop over again with scanline 0. A total of
 280,896 cycles per display loop (4 * 308 * 228), and about 59.6ns per CPU
 cycle, gives us a full speed display rate of 59.73fps. That's close enough to
 60fps that I think we can just round up a bit whenever we're not counting it
 down to the exact cycle timings.
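
 If you'd like to check that arithmetic yourself, here's a small sketch. The
 constant names are made up for this example, and the CPU clock of 2^24 Hz
 (about 16.78 MHz) is an assumption that isn't stated in the text above:

 ```rust
 /// Cycles the display adapter spends on each pixel.
 pub const CYCLES_PER_PIXEL: u32 = 4;
 /// 240 visible pixels plus 68 "virtual" HBlank pixels.
 pub const PIXELS_PER_SCANLINE: u32 = 240 + 68;
 /// 160 visible scanlines plus 68 "virtual" VBlank scanlines.
 pub const SCANLINES_PER_FRAME: u32 = 160 + 68;
 /// Assumed CPU clock: 2^24 Hz (about 16.78 MHz).
 pub const CPU_HZ: u32 = 1 << 24;

 /// Total cycles for one full pass of the display loop.
 pub const CYCLES_PER_FRAME: u32 =
   CYCLES_PER_PIXEL * PIXELS_PER_SCANLINE * SCANLINES_PER_FRAME;

 /// Frames per second when the display loop runs at full speed.
 pub fn frames_per_second() -> f64 {
   CPU_HZ as f64 / CYCLES_PER_FRAME as f64
 }
 ```

 `CYCLES_PER_FRAME` works out to 280,896, and `frames_per_second()` lands a little
 above 59.7.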
 However, there's a bit of a snag. If we change video memory during the middle of
 a scanline the display will _immediately_ start processing using the new state
 of video memory. The picture before the change and after the change won't look
 like a single, clean picture. Instead you'll get what's called "[screen
 tearing](https://en.wikipedia.org/wiki/Screen_tearing)", which is usually
 considered to be the mark of a badly programmed game.
 To avoid this we just need to only adjust video memory during one of the blank
 periods. If you're really cool you can adjust things during HBlank, but we're
 not that cool yet. Starting out our general program flow will be:
 1) Gather input for the frame (next part of this lesson) and update the game
   state, getting everything ready for when VBlank actually starts.
 2) Once VBlank starts we update all of the video memory as fast as we can.
 3) Once we're done drawing we again wait for the VDraw period to begin and then
   do it all again.
 Now, it's not the most efficient way, but to get our timings right we can just
 read from `VCOUNT` over and over in a "busy loop". Once we read a value of 160
 we know that we've entered VBlank. Once it goes back to 0 we know that we're
 back in VDraw.
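
 A minimal sketch of those two busy loops might look like the following. The
 function names are hypothetical, and the register read is passed in as a closure
 so that the sketch can also be tried off-device; on the GBA you would pass
 something like `|| VCOUNT.read()`.

 ```rust
 /// First scanline of the VBlank period.
 pub const VBLANK_START: u16 = 160;

 /// Spins until the counter reports a VBlank scanline (160 or later).
 pub fn spin_until_vblank(mut read_vcount: impl FnMut() -> u16) {
   while read_vcount() < VBLANK_START {}
 }

 /// Spins until the counter reports a VDraw scanline (below 160).
 pub fn spin_until_vdraw(mut read_vcount: impl FnMut() -> u16) {
   while read_vcount() >= VBLANK_START {}
 }
 ```

 Comparing against the whole range, rather than against exactly 160 or exactly 0,
 is a design choice in this sketch: if the loop starts late it still can't miss
 the single scanline it was waiting for.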
 Doing a busy loop like this actually drains the batteries way more than
 necessary. It keeps the CPU active constantly, which is what uses a fair amount
 of the power. Normally you're supposed to put the CPU to sleep if you're just
 waiting around for something to happen. However, that also requires learning
 about some more concepts to get right. So to keep things easier starting out
 we'll do the bad/lazy version and then upgrade our technique later.
 ### KEYINPUT: Key Input Reading
 The [KEYINPUT](https://problemkaputt.de/gbatek.htm#gbakeypadinput) register is
 the last one we've got to learn about this lesson. It lets you check the status
 of all 10 buttons on the GBA.
 ```rust
 pub const KEYINPUT: ROVolAddress<u16> = unsafe { ROVolAddress::new(0x400_0130) };
 ```
 There's little to say here. It's a read only register, and the data just
 contains one bit per button. The only thing that's a little weird about it is
 that the bits follow a "low active" convention, so if the button is pressed then
 the bit is 0, and if the button is released the bit is 1.
 You _could_ work with that directly, but I think it's a lot easier to think
 about having `true` for pressed and `false` for not pressed. So the `gba` crate
 flips the bits when you read the keys:
 ```rust
 /// Gets the current state of the keys
 pub fn read_key_input() -> KeyInput {
  KeyInput(KEYINPUT.read() ^ 0b0000_0011_1111_1111)
 }
 ```
 Now we can treat the KeyInput values like a totally normal bitset.
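
 Here's a rough, stand-alone sketch of what such a bitset could look like. The
 `Keys` type and its methods are hypothetical, not the crate's `KeyInput`
 definition, and the bit positions (A = bit 0, B = bit 1, Start = bit 3) follow
 the GBATEK keypad table:

 ```rust
 /// Mask covering the 10 button bits of KEYINPUT.
 pub const KEY_MASK: u16 = 0b0000_0011_1111_1111;

 /// Button state with the hardware's "low active" bits already flipped,
 /// so a set bit means "pressed".
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub struct Keys(pub u16);

 impl Keys {
   /// Converts a raw KEYINPUT reading into a "1 = pressed" bitset.
   pub const fn from_raw(raw: u16) -> Self {
     Keys(raw ^ KEY_MASK)
   }

   pub const fn a(self) -> bool {
     (self.0 & (1 << 0)) != 0
   }

   pub const fn b(self) -> bool {
     (self.0 & (1 << 1)) != 0
   }

   pub const fn start(self) -> bool {
     (self.0 & (1 << 3)) != 0
   }
 }
 ```

 A raw reading of `0x03FF` (nothing held) becomes an empty bitset, while `0x03FE`
 (only bit 0 low) reports `a()` as `true`.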
--- a/book/src/the-hardware-memory-map.md
+++ b/book/src/the-hardware-memory-map.md
@ -1,379 +0,0 @@
 # The Hardware Memory Map
 So we saw `hello_magic.rs` and then we learned what `volatile` was all about,
 but we've still got a few things that are a bit mysterious. You can't just cast
 a number into a pointer and start writing to it! That's totally crazy! That's
 writing to un-allocated memory! Against the rules!
 Well, _kinda_. It's true that you're not allowed to write _anywhere at all_, but
 those locations were carefully selected locations.
 You see, on a modern computer if you need to check if a key is pressed you ask
 the Operating System (OS) to please go check for you. If you need to play a
 sound, you ask the OS to please play the sound on a default sound output. If you
 need to show a picture you ask the OS to give you access to the video driver so
 that you can ask the video driver to please put some pixels on the screen.
 That's mostly fine, except how does the OS actually do it? It doesn't have an OS
 to go ask, it has to stop somewhere.
 Ultimately, every piece of hardware is mapped into somewhere in the address
 space of the CPU. You can't actually tell that this is the case as a normal user
 because your program runs inside a virtualized address space. That way you can't
 go writing into another program's memory and crash what they're doing or steal
 their data (well, hopefully, it's obviously not perfect). Outside of the
 virtualization layer the OS is running directly in the "true" address space, and
 it can access the hardware on behalf of a program whenever it's asked to.
 How does directly accessing the hardware work, _precisely_? It's just the same
 as accessing the RAM. Each address holds some bits, and the CPU picks an address
 and loads in the bits. Then the program gets the bits and has to decide what
 they mean. The "driver" of a hardware device is just the layer that translates
 between raw bits in the outside world and more meaningful values inside of the
 program.
 Of course, memory mapped hardware can change its bits at any time. The user can
 press and release a key and you can't stop them. This is where `volatile` comes
 in. Whenever there's memory mapped hardware you want to access it with
 `volatile` operations so that you can be sure that you're sending the data every
 time, and that you're getting fresh data every time.
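
 As a tiny illustration, here's a sketch of a volatile write followed by a
 volatile read. It goes through a pointer to an ordinary variable so it can run
 anywhere; the function name is made up for this example. On the GBA the pointer
 would instead be one of the fixed hardware addresses.

 ```rust
 use core::ptr;

 /// Writes `value` through a volatile store, then reads it back through a
 /// volatile load. The compiler may not elide or merge either access.
 pub fn volatile_round_trip(slot: &mut u16, value: u16) -> u16 {
   let p: *mut u16 = slot;
   // Safety: `p` comes from a valid, aligned, exclusive reference.
   unsafe {
     ptr::write_volatile(p, value);
     ptr::read_volatile(p)
   }
 }
 ```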
 ## GBA Specifics
 That's enough about the general concept of memory mapped hardware, let's get to
 some GBA specifics. The GBA has the following sections in its memory map.
 * BIOS
 * External Work RAM (EWRAM)
 * Internal Work RAM (IWRAM)
 * IO Registers
 * Palette RAM (PALRAM)
 * Video RAM (VRAM)
 * Object Attribute Memory (OAM)
 * Game Pak ROM (ROM)
 * Save RAM (SRAM)
 Each of these has a few key points of interest:
 * **Bus Width:** Also just called "bus", this is how many little wires are
  _physically_ connecting a part of the address space to the CPU. If you need to
  transfer more data than fits in the bus you have to do repeated transfers
  until it all gets through.
 * **Read/Write Modes:** Most parts of the address space can be read from in 8,
  16, or 32 bits at a time (there's a few exceptions we'll see). However, a
  significant portion of the address space can't accept 8 bit writes. Usually
  this isn't a big deal, but the standard `memcpy` routine switches to doing a
  byte-by-byte copy in some situations, so we'll have to be careful about using
  it in combination with those regions of the memory.
 * **Access Speed:** On top of the bus width issue, not all memory can be
  accessed at the same speed. The "fast" parts of memory can do a read or write
  in 1 cycle, but the slower parts of memory can take a few cycles per access.
  These are called "wait cycles". The exact timings depend on what you configure
  the system to use, which is also limited by what your cartridge physically
  supports. You'll often see timings broken down into `N` cycles (non-sequential
  memory access) and `S` cycles (sequential memory access, often faster). There
  are also `I` cycles (internal cycles) which happen whenever the CPU does an
  internal operation that's more than one cycle to complete (like a multiply).
  Don't worry, you don't have to count exact cycle timings unless you're on the
  razor's edge of the GBA's abilities. For more normal games you just have to be
  mindful of what you're doing and it'll be fine.
 Let's briefly go over the major talking points of each memory region. All of
 this information is also available in GBATEK, mostly in their [memory
 map](http://www.akkit.org/info/gbatek.htm#gbamemorymap) section (though somewhat
 spread through the rest of the document too).
 Though I'm going to list the location range of each memory space below, most of
 the hardware locations are actually mirrored at several points throughout the
 address space.
 ### BIOS
 * **Location:** `0x0` to `0x3FFF`
 * **Bus:** 32-bit
 * **Access:** Memory protected read-only (see text).
 * **Wait Cycles:** None
 The "basic input output system". This contains a grab bag of utilities that do
 various tasks. The code is optimized for small size rather than great speed, so
 you can sometimes write faster versions of these routines. Also, calling a bios
 function has more overhead than a normal function call. You can think of bios
 calls as being similar to system calls to the OS on a desktop computer. Useful,
 but costly.
 As a side note, not only is BIOS memory read only, but it's memory protected so
 that you can't even read from bios memory unless the system is currently
 executing a function that's in bios memory. If you try then the system just
 gives back a nonsensical value that's not really what you asked for. If you
 really want to know what's inside, there's actually a bug in one bios call
 (`MidiKey2Freq`) that lets you read the bios section one byte at a time.
 Also, there's not just one bios! Of course there's the official bios from
 Nintendo that's used on actual hardware, but since that's code instead of
 hardware it's protected by copyright. Since a bios is needed to run a GBA
 emulator properly, people have come up with their own open source versions or
 they simply make the emulator special case the bios and act _as if_ the function
 call had done the right thing.
 * The [TempGBA](https://github.com/Nebuleon/TempGBA) repository has an easy to
  look at version written in assembly. Its API and effects are close enough to
  the Nintendo version that most games will run just fine.
 * You can also check out the [mGBA
  bios](https://github.com/mgba-emu/mgba/blob/master/src/gba/bios.c) if you want
  to see the C version of what various bios functions are doing.
 ### External Work RAM (EWRAM)
 * **Location:** `0x200_0000` to `0x203_FFFF` (256k)
 * **Bus:** 16-bit
 * **Access:** Read-write, any size.
 * **Wait Cycles:** 2
 The external work ram is a sizable amount of space, but the 2 wait cycles per
 access and 16-bit bus mean that you should probably think of it as being a
 "heap" to avoid putting things in if you don't have to.
 The GBA itself doesn't use this for anything, so any use is totally up to you.
 At the moment, the linker script and `crt0.s` files provided with the `gba`
 crate also have no defined use for the EWRAM, so it's 100% on you to decide how
 you wanna use it.
 (Note: There is an undocumented control register that lets you adjust the wait
 cycles on EWRAM. Using it, you can turn EWRAM from the default 2 wait cycles
 down to 1. However, not all GBA-like things support it. The GBA and GBA SP do,
 the GBA Micro and DS do not. Emulators might or might not depending on the
 particular emulator. See the [GBATEK system
 control](https://problemkaputt.de/gbatek.htm#gbasystemcontrol) page for a full
 description of that register, though probably only once you've read more of this
 tutorial book and know how to make sense of IO registers and such.)
 ### Internal Work RAM (IWRAM)
 * **Location:** `0x300_0000` to `0x300_7FFF` (32k)
 * **Bus:** 32-bit
 * **Access:** Read-write, any size.
 * **Wait Cycles:** 0
 This is where the "fast" memory for general purposes lives. By default the
 system uses the 256 _bytes_ starting at `0x300_7F00` _and up_ for system and
 interrupt purposes, while Rust's program stack starts at that same address _and
 goes down_ from there.
 Even though your stack exists in this space, it's totally reasonable to use the
 bottom parts of this memory space for whatever quick scratch purposes, same as
 EWRAM. 32k is fairly huge, and the stack going down from the top and the scratch
 data going up from the bottom are unlikely to hit each other. If they do you
 were probably well on your way to a stack overflow anyway.
 The linker script and `crt0.s` file provided with the `gba` crate use the bottom
 of IWRAM to store the `.data` and `.bss` [data
 segments](https://en.wikipedia.org/wiki/Data_segment). That's where your global
 variables get placed (both `static` and `static mut`). The `.data` segment holds
 any variable that's initialized to non-zero, and the `.bss` section is for any
 variable initialized to zero. When the GBA is powered on, some code in the
 `crt0.s` file runs and copies the initial `.data` values into place within IWRAM
 (all of `.bss` starts at 0, so there's no copy for those variables).
 If you have no global variables at all, then you don't need to worry about those
 details, but if you do have some global variables then you can use the _address
 of_ the `__bss_end` symbol defined in the top of the `gba` crate as a marker for
 where it's safe for you to start using IWRAM without overwriting your globals.
 ### IO Registers
 * **Location:** `0x400_0000` to `0x400_03FE`
 * **Bus:** 32-bit
 * **Access:** different for each IO register
 * **Wait Cycles:** 0
 The IO Registers are where most of the magic happens, and it's where most of the
 variety happens too. Each IO register is a specific width, usually 16-bit but
 sometimes 32-bit. Most of them are fully read/write, but some of them are read
 only or write only. Some of them have individual bits that are read only even
 when the rest of the register is writable. Some of them can be written to, but
 the write doesn't change the value you read back, it sets something else.
 Really.
 The IO registers are how you control every bit of hardware besides the CPU
 itself. Reading the buttons, setting display modes, enabling timers, all of that
 goes through different IO registers. Actually, even a few parts of the CPU's
 operation can be controlled via IO register.
 We'll go over IO registers more in the next section, including a few specific
 registers, and then we'll constantly encounter more IO registers as we explore
 each new topic through the rest of the book.
 ### Palette RAM (PALRAM)
 * **Location:** `0x500_0000` to `0x500_03FF` (1k)
 * **Bus:** 16-bit
 * **Access:** Read any, single bytes mirrored (see text).
 * **Wait Cycles:** Video Memory Wait (see text)
 This is where the GBA stores color palette data. There's 256 slots for
 Background color, and then 256 slots for Object color.
 GBA colors are 15 bits each, with five bits per channel and the highest bit
 being totally ignored, so we store them as `u16` values:
 * `X_BBBBB_GGGGG_RRRRR`
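
 Packing three 5-bit channels into that layout is just a few shifts. This is a
 sketch with a made-up function name (`rgb15`), not something taken from the
 crate:

 ```rust
 /// Packs 5-bit red, green, and blue channels into a GBA color value.
 /// Each channel is masked to 0..=31, and the top bit is left as 0.
 pub const fn rgb15(red: u16, green: u16, blue: u16) -> u16 {
   (red & 0x1F) | ((green & 0x1F) << 5) | ((blue & 0x1F) << 10)
 }
 ```

 Full red is `rgb15(31, 0, 0)`, which is `0x001F`, and full white is
 `rgb15(31, 31, 31)`, which is `0x7FFF`.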
 Of note is the fact that the 256 palette slots can be viewed in two different
 ways. There's two different formats for images in video memory: "8 bit per
 pixel" (8bpp) and "4 bit per pixel" (4bpp).
 * **8bpp:** Each pixel in the image is 8 bits and indexes directly into the full
  256 entry palette array. An index of 0 means that pixel should be transparent,
  so there's 255 possible colors.
 * **4bpp:** Each pixel in the image is 4 bits and indexes into a "palbank" of 16
  colors within the palette data. Some exterior control selects the palbank to
  be used. An index of 0 still means that the pixel should be transparent, so
  there's 15 possible colors.
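
 In terms of plain array math, the two lookups differ only in how the palette
 slot is computed. The helper names below are hypothetical:

 ```rust
 /// 8bpp: the pixel value is the palette slot (0..=255).
 pub const fn palette_index_8bpp(pixel: u8) -> usize {
   pixel as usize
 }

 /// 4bpp: the palbank (0..=15) picks a run of 16 slots, and the 4-bit pixel
 /// value picks a slot within that run.
 pub const fn palette_index_4bpp(palbank: usize, pixel: usize) -> usize {
   (palbank & 0xF) * 16 + (pixel & 0xF)
 }

 /// In both formats a pixel value of 0 means "transparent".
 pub const fn is_transparent(pixel: usize) -> bool {
   pixel == 0
 }
 ```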
 Different images can use different modes all at once, as long as you can fit all
 the colors you want to use into your palette layout.
 PALRAM can't be written to in individual bytes. This isn't normally a problem at
 all, because you wouldn't really want to write half of a color entry anyway. If
 you do try to write a single byte then it gets "mirrored" into both halves of
 the `u16` that would be associated with that address. For example, if you tried
 to write `0x01u8` to either `0x500_0000` or `0x500_0001` then you'd actually
 _effectively_ be writing `0x0101u16` to `0x500_0000`.
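
 That mirroring can be modeled in a couple of lines. These helpers are purely
 illustrative (they describe the effect, they don't touch any hardware):

 ```rust
 /// The `u16` that effectively gets stored when a single byte is written.
 pub const fn mirrored_byte(byte: u8) -> u16 {
   (byte as u16) * 0x0101
 }

 /// The `u16`-aligned address that a byte write actually lands on.
 pub const fn halfword_address(addr: usize) -> usize {
   addr & !1
 }
 ```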
 PALRAM follows what we'll call the "Video Memory Wait" rule: If you try to access
 the memory during a vertical blank or horizontal blank period there's 0 wait
 cycles, and if you try to access the memory while the display controller is
 drawing there is a 1 cycle wait inserted _if_ the display controller was using
 that memory at that moment.
 ### Video RAM (VRAM)
 * **Location:** `0x600_0000` to `0x601_7FFF` (96k or 64k+32k depending on mode)
 * **Bus:** 16-bit
 * **Access:** Read any, single bytes _sometimes_ mirrored (see text).
 * **Wait Cycles:** Video Memory Wait (see text)
 Video RAM is the memory for what you want the display controller to be
 displaying. The GBA actually has 6 different display modes (numbered 0 through
 5), and depending on the mode you're using the layout that you should imagine
 VRAM having changes. Because there's so much involved here, I'll leave more
 precise details to the following sections which talk about how to use VRAM in
 each mode.
 VRAM can't be written to in individual bytes. If you try to write a single byte
 to background VRAM the byte gets mirrored like with PALRAM, and if you try with
 object VRAM the write gets ignored entirely. Exactly what address ranges those
 memory types are depends on video mode, but just don't bother with individual
 byte writes to VRAM. If you want to change a single byte of data (and you might)
 then the correct style is to read the full `u16`, mask out the old data, mask in
 your new value, and then write the whole `u16`.
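
 Here's a sketch of that read, mask, and write sequence. A plain `u16` slice
 stands in for VRAM so that it can run anywhere, and the function names are made
 up for this example; on real hardware the read and the write would both be
 volatile.

 ```rust
 /// Returns `old` with either its low or its high byte replaced.
 pub const fn replace_byte(old: u16, byte: u8, high: bool) -> u16 {
   if high {
     (old & 0x00FF) | ((byte as u16) << 8)
   } else {
     (old & 0xFF00) | (byte as u16)
   }
 }

 /// Changes one byte of `mem` using only whole `u16` reads and writes.
 /// The GBA is little-endian, so an odd byte index is the high byte.
 pub fn set_byte(mem: &mut [u16], byte_index: usize, byte: u8) {
   let slot = &mut mem[byte_index / 2];
   *slot = replace_byte(*slot, byte, byte_index % 2 == 1);
 }
 ```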
 VRAM follows the same "Video Memory Wait" rule that PALRAM has.
 ### Object Attribute Memory (OAM)
 * **Location:** `0x700_0000` to `0x700_03FF` (1k)
 * **Bus:** 32-bit
 * **Access:** Read any, single bytes no effect (see text).
 * **Wait Cycles:** Video Memory Wait (see text)
 This part of memory controls the "Objects" (OBJ) on the screen. An object is
 _similar to_ the concept of a "sprite". However, because of an object's size
 limitations, a single sprite might require more than one object to be drawn
 properly. In general, if you want to think in terms of sprites at all, you
 should think of sprites as being a logical / programming concept, and objects as
 being a hardware concept.
 While VRAM has the _image_ data for each object, this part of memory has the
 _control_ data for each object. An object's "attributes" describe what part of
 the VRAM to use, where to place it on the screen, any special graphical effects
 to use, all that stuff. Each object has 6 bytes of attribute data (arranged as
 three `u16` values), and there's a total of 128 objects (indexed 0 through 127).
 But 6 bytes each times 128 entries out of 1024 bytes leaves us with 256 bytes
 left over. What's the other space used for? Well, it's a little weird, but after
 every three `u16` object attribute fields there's one `i16` "affine parameter"
 field mixed in. It takes four such fields to make a complete set of affine
 parameters (a 2x2 matrix), so we get a total of 32 affine parameter entries
 across all of OAM. "Affine" might sound fancy but it just means a transformation
 where anything that started parallel stays parallel after the transform. The
 affine parameters can be used to scale, rotate, and/or skew a background or
 object as it's being displayed on the screen. It takes more computing power than
 the non-affine display, so you can't display as many different things at once
 when using the affine modes.
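
The interleaving described above works out to some simple address arithmetic. This is only a sketch of the layout as described in the text; the constant and function names are invented for illustration:

```rust
/// Start of OAM, per the location given above.
const OAM_BASE: usize = 0x700_0000;

/// Address of attribute `attr` (0, 1, or 2) of object `obj` (0 through 127).
/// Each object occupies an 8-byte slot: three `u16` attributes plus one
/// interleaved `i16` affine parameter.
fn obj_attr_addr(obj: usize, attr: usize) -> usize {
    assert!(obj < 128 && attr < 3);
    OAM_BASE + obj * 8 + attr * 2
}

/// Address of parameter `param` (0 through 3) of affine entry `entry`
/// (0 through 31). One entry is spread over four consecutive object slots,
/// using the last 2 bytes of each slot.
fn affine_param_addr(entry: usize, param: usize) -> usize {
    assert!(entry < 32 && param < 4);
    OAM_BASE + entry * 32 + param * 8 + 6
}
```

The last parameter of the last affine entry lands on the final two bytes of the 1k region, which is a handy sanity check on the arithmetic.
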
 OAM can't ever be written to with individual bytes. The write just has no effect
 at all.
 OAM follows the same "Video Memory Wait" rule that PALRAM has, **and** you can
 also only freely access OAM during a horizontal blank if you set a special
 "HBlank Interval Free" bit in one of the IO registers (the "Display Control"
 register, which we'll talk about next lesson). The reason that you might _not_
 want to set that bit is because when it's enabled you can't draw as many objects
 at once. You don't lose the use of an exact number of objects, you actually lose
 the use of a number of display adapter drawing cycles. Since not all objects
 take the same number of cycles to render, it depends on what you're drawing.
 GBATEK [has the details](https://problemkaputt.de/gbatek.htm#lcdobjoverview) if
 you want to know precisely.
 ### Game Pak ROM (ROM)
 * **Location:** Special (max of 32MB)
 * **Bus:** 16-bit
 * **Access:** Special
 * **Wait Cycles:** Special
 This is where your actual game is located! As you might guess, since each
 cartridge is different, the details here depend quite a bit on the cartridge
 that you use for your game. Even a simple statement like "you can't write to the
 ROM region" isn't true for some carts if they have FlashROM.
 The _most important_ thing to concern yourself with when considering the ROM
 portion of memory is the 32MB limit. That's compiled code, images, sound,
 everything put together. The total has to stay under 32MB.
 The next most important thing to consider is that 16-bit bus. It means that we
 compile our programs using "Thumb state" code instead of "ARM state" code.
 Details about this can be found in the GBA Assembly section of the book, but
 just be aware that there's two different types of assembly on the GBA. You can
 switch between them, but the default for us is always Thumb state.
 Another detail which you actually _don't_ have to think about much, but that you
 might care about if you're doing precise optimization, is that the ROM address
 space is actually mirrored across three different locations:
 * `0x800_0000` to `0x9FF_FFFF`: Wait State 0
 * `0xA00_0000` to `0xBFF_FFFF`: Wait State 1
 * `0xC00_0000` to `0xDFF_FFFF`: Wait State 2
 These _don't_ mean 0, 1, and 2 wait cycles, they mean the wait cycles associated
 with ROM mirrors 0, 1, and 2. On some carts the game will store different parts
 of the data into different chips that are wired to be accessible through
 different parts of the mirroring. The actual wait cycles used are even
 configurable via an IO register called the
 [WAITCNT](https://problemkaputt.de/gbatek.htm#gbasystemcontrol) ("Wait Control",
 I don't know why C programmers have to give everything the worst names it's not
 1980 any more).
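
To make the mirroring concrete, here's a small sketch that classifies an address by mirror and recovers the offset into the cart. The function names are hypothetical, and the ranges are just the three listed above:

```rust
/// Size of one ROM mirror: 32MB.
const ROM_MIRROR_SIZE: u32 = 0x200_0000;
/// Start of the first ROM mirror.
const ROM_BASE: u32 = 0x800_0000;

/// Which ROM mirror (0, 1, or 2) an address falls in, if any.
fn rom_wait_state(addr: u32) -> Option<u8> {
    match addr {
        0x800_0000..=0x9FF_FFFF => Some(0),
        0xA00_0000..=0xBFF_FFFF => Some(1),
        0xC00_0000..=0xDFF_FFFF => Some(2),
        _ => None,
    }
}

/// Offset into the cart ROM that an address refers to, regardless of which
/// mirror it was accessed through.
fn rom_offset(addr: u32) -> Option<u32> {
    rom_wait_state(addr).map(|_| (addr - ROM_BASE) % ROM_MIRROR_SIZE)
}
```

So `0x800_0100` and `0xA00_0100` name the same offset in the cart, just with the wait cycles configured for mirror 0 and mirror 1 respectively.
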
 ### Save RAM (SRAM)
 * **Location:** Special (max of 64k)
 * **Bus:** 8-bit
 * **Access:** Special
 * **Wait Cycles:** Special
 The Save RAM is also part of the cart that you've got your game on, so it also
 depends on your hardware.
 SRAM _starts_ at `0xE00_0000` and you can save up to however much the hardware
 supports, to a maximum of 64k. However, you can only read and write SRAM one
 _byte_ at a time. What's worse, while you can _write_ to SRAM using code
 executing anywhere, you can only _read_ with code that's executing out of either
 Internal or External Work RAM, not with code that's executing out of ROM.
 This means that you need to copy the code for doing the read into some scratch
 space (either at startup or on the fly, doesn't matter) and call that function
 you've carefully placed. It's a bit annoying, but soon enough a routine for it
 all will be provided in the `gba` crate and we won't have to worry too much
 about it.
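
In the meantime, the "one byte at a time" rule for the write side can be sketched like this. This is not the routine the crate will provide: `write_bytes_volatile` is a made-up name, and it deliberately only covers writing, since reading has the extra "must run from Work RAM" requirement described above.

```rust
/// Where SRAM starts, per the text above.
const SRAM_BASE: usize = 0xE00_0000;

/// Copies `src` to `dst` using one volatile byte write per element, so no
/// access wider than 8 bits is ever issued.
///
/// # Safety
/// `dst` must be valid for `src.len()` consecutive byte writes.
unsafe fn write_bytes_volatile(dst: *mut u8, src: &[u8]) {
    for (i, &byte) in src.iter().enumerate() {
        unsafe { dst.add(i).write_volatile(byte) };
    }
}
```

On hardware you'd pass something like `SRAM_BASE as *mut u8` as the destination, keeping the length within whatever your cart actually supports.
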
 (TODO: Provide the routine that I just claimed we would provide.)
--- a/book/src/volatile.md
+++ b/book/src/volatile.md
@ -1,48 +0,0 @@
 # Volatile
 I know that you just got your first program running and you're probably excited
 to learn more about GBA stuff, but first we have to cover a subject that's not
 quite GBA specific.
 In the `hello_magic.rs` file we had these lines:
 ```rust
    (0x600_0000 as *mut u16).offset(120 + 80 * 240).write_volatile(0x001F);
    (0x600_0000 as *mut u16).offset(136 + 80 * 240).write_volatile(0x03E0);
    (0x600_0000 as *mut u16).offset(120 + 96 * 240).write_volatile(0x7C00);
 ```
 You've probably seen or heard of the
 [write](https://doc.rust-lang.org/core/ptr/fn.write.html) function before, but
 you'd be excused if you've never heard of its cousin function,
 [write_volatile](https://doc.rust-lang.org/core/ptr/fn.write_volatile.html).
 What's the difference? Well, when the compiler sees normal reads and writes, it
 assumes that those go into plain old memory locations. CPU registers, RAM,
 wherever it is that the value's being placed. The compiler assumes that it's
 safe to optimize away some of the reads and writes, or maybe issue the reads and
 writes in a different order from what you wrote. Normally this is okay, and it's
 exactly what we want the compiler to be doing, quietly making things faster for us.
 However, some of the time we access values from parts of memory where it's
 important that each access happen, and in the exact order that we say. In our
 `hello_magic.rs` example, we're writing directly into the video memory of the
 display. The compiler sees that the rest of the Rust program never reads out of
 those locations, so it might think "oh, we can skip those writes, they're
 pointless". It doesn't know that we're having a side effect besides just storing
 some value at an address.
 By declaring a particular read or write to be `volatile`, we can force the
 compiler to issue that access. Further, we're guaranteed that all `volatile`
 accesses will happen in exactly the order they appear in the program relative
 to other `volatile` accesses. However, non-volatile accesses can still be
 re-ordered relative to a volatile access. In other words, for parts of the
 memory that are volatile, we must _always_ use a volatile read or write for our
 program to perform properly.
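
Here's a small sketch of the difference, using an ordinary pointer so it can run anywhere (the function name `poke_three` is invented for illustration). With plain writes the compiler would be free to keep only the last store; with `write_volatile` all three are issued, in order:

```rust
/// Writes three values to the same location, then reads it back.
/// Every one of these accesses is volatile, so none may be skipped or
/// re-ordered relative to the others.
///
/// # Safety
/// `addr` must be valid and aligned for volatile `u16` reads and writes.
unsafe fn poke_three(addr: *mut u16) -> u16 {
    unsafe {
        addr.write_volatile(0x001F);
        addr.write_volatile(0x03E0);
        addr.write_volatile(0x7C00);
        addr.read_volatile()
    }
}
```

On the GBA the pointer would be something like `0x600_0000 as *mut u16`, where each of those stores has a visible effect on the display.
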
 For exactly this reason, we've got the [voladdress](https://docs.rs/voladdress/)
 crate. It used to be part of the GBA crate, but it became big enough to break
 out into a standalone crate. It doesn't even do too much, it just makes it a
 lot harder to accidentally forget to use volatile with our memory mapped
 addresses. We just call `read` and `write` on any `VolAddress` that we happen
 to see and the right thing will happen.
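
To show the idea (and only the idea), here's a stripped-down wrapper in the same spirit. `SketchVolAddr` is a hypothetical type written for this example; it is not the actual `voladdress` API, so check that crate's docs for the real types and signatures.

```rust
use core::marker::PhantomData;

/// Hypothetical wrapper: an address that can only be accessed volatilely.
#[derive(Clone, Copy)]
struct SketchVolAddr<T> {
    addr: usize,
    _marker: PhantomData<*mut T>,
}

impl<T: Copy> SketchVolAddr<T> {
    /// # Safety
    /// `addr` must be valid and aligned for volatile reads and writes of `T`
    /// for as long as the returned value is in use.
    const unsafe fn new(addr: usize) -> Self {
        Self { addr, _marker: PhantomData }
    }

    /// Always a volatile read; there's no way to do a plain one by mistake.
    fn read(self) -> T {
        unsafe { (self.addr as *mut T).read_volatile() }
    }

    /// Always a volatile write.
    fn write(self, value: T) {
        unsafe { (self.addr as *mut T).write_volatile(value) }
    }
}
```

The design choice is to put all of the `unsafe` into `new`: once you've promised that the address is good, every later `read` and `write` is a safe call that can't forget to be volatile.
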
		`@ -1,3 +0,0 @@`
			`# IO Registers`

			* Address Span: `0x400_0000` to `0x400_03FE`