diff --git a/book/src/ch03/memory_game.md b/book/src/ch03/memory_game.md index ae854cd..0bb11af 100644 --- a/book/src/ch03/memory_game.md +++ b/book/src/ch03/memory_game.md @@ -1,4 +1,4 @@ -# memory_game +# Making A Memory Game For this example to show off our new skills we'll make a "memory" game. The idea is that there's some face down cards and you pick one, it flips, you pick a @@ -6,207 +6,310 @@ second, if they match they both go away, if they don't match they both turn back face down. The player keeps going until all the cards are gone, then we'll deal the cards again. -For this example, I started with the `light_cycle.rs` example and then just -copied it into a new file, `memory_game.rs`. Then I added most all the code from -the previous sections right into that file, so we'll assume that all those -definitions are in scope. +There are many steps to do to get such a simple seeming game going. In fact I +stumbled a bit myself when trying to get things set up and going despite having +written and explained all the parts so far. Accordingly, we'll take each part +very slowly, and review things as we build up our game. -## Getting Some Images - -First we need some images to show! Let's have one for our little selector thingy -that we'll move around to pick cards with. How about some little triangles at -the corner of a square like on a picture frame. +We'll start back with a nearly blank file, calling it `memory_game.rs`: ```rust -#[rustfmt::skip] -pub const CARD_SELECTOR: Tile4bpp = Tile4bpp { - data : [ - 0x44400444, - 0x44000044, - 0x40000004, - 0x00000000, - 0x00000000, - 0x40000004, - 0x44000044, - 0x44400444 - ] -}; -``` +#![feature(start)] +#![no_std] -That weird looking attribute keeps rustfmt from spreading out the values, so -that we can see it as an ASCII art. Now that we understand what an individual -tile looks like, let's add some mono-color squares. - -```rust -#[rustfmt::skip] -pub const FULL_ONE: Tile4bpp = Tile4bpp { - data : [ - 0x11111111, - 0x11111111, - 0x11111111, - 0x11111111, - 0x11111111, - 0x11111111, - 0x11111111, - 0x11111111, - ] -}; - -#[rustfmt::skip] -pub const FULL_TWO: Tile4bpp = Tile4bpp { - data : [ - 0x22222222, - 0x22222222, - 0x22222222, - 0x22222222, - 0x22222222, - 0x22222222, - 0x22222222, - 0x22222222 - ] -}; - -#[rustfmt::skip] -pub const FULL_THREE: Tile4bpp = Tile4bpp { - data : [ - 0x33333333, - 0x33333333, - 0x33333333, - 0x33333333, - 0x33333333, - 0x33333333, - 0x33333333, - 0x33333333 - ] -}; -``` - -We can control the rest with palbank selection. Since there's 16 palbanks, -that's 48 little colored squares we can make, and 16 different selector colors, -which should be plenty in both cases. - -## Setup The Images - -### Arrange the PALRAM - -Alright, so, as we went over, the first step is to make sure that we've got our -palette data in order. We'll be using this to set our palette values. - -```rust -pub fn set_bg_palette(slot: usize, color: u16) { - assert!(slot < 256); - unsafe { PALRAM_BG_BASE.offset(slot as isize).write(color) } +#[panic_handler] +fn panic(_info: &core::panic::PanicInfo) -> ! { + loop {} } -``` -Should the type of `slot` be changed to `u8` instead of `usize`? Well, maybe. -Let's not worry about it at the moment. - -Of course, we don't need to set the color black, all the values start as black. -We just need to set the other colors we'll be wanting. For this demo, we'll just -use the same basic colors for both the BG and Object stuff. - -```rust -pub fn init_palette() { - // palbank 0: black/white/gray - set_bg_palette(2, rgb16(31, 31, 31)); - set_bg_palette(3, rgb16(15, 15, 15)); - // palbank 1 is reds - set_bg_palette(1 * 16 + 1, rgb16(31, 0, 0)); - set_bg_palette(1 * 16 + 2, rgb16(22, 0, 0)); - set_bg_palette(1 * 16 + 3, rgb16(10, 0, 0)); - // palbank 2 is greens - set_bg_palette(2 * 16 + 1, rgb16(0, 31, 0)); - set_bg_palette(2 * 16 + 2, rgb16(0, 22, 0)); - set_bg_palette(2 * 16 + 3, rgb16(0, 10, 0)); - // palbank 2 is blues - set_bg_palette(3 * 16 + 1, rgb16(0, 0, 31)); - set_bg_palette(3 * 16 + 2, rgb16(0, 0, 22)); - set_bg_palette(3 * 16 + 3, rgb16(0, 0, 10)); - - // Direct copy all BG selections into OBJ palette too - let mut bgp = PALRAM_BG_BASE; - let mut objp = PALRAM_OBJECT_BASE; - for _ in 0..(4 * 16) { - objp.write(bgp.read()); - bgp = bgp.offset(1); - objp = objp.offset(1); +#[start] +fn main(_argc: isize, _argv: *const *const u8) -> isize { + loop { + // TODO the whole thing } } ``` -### Arrange the Objects +## Displaying A Background -So, time to think about objects. I'm thinking we'll have 13 objects in use. One -for the selector, and then 12 for the cards (a simple grid that's 4 wide and 3 -tall). +First let's try to get a background going. We'll display a simple checker +pattern just so that we know that we did something. -We want a way to easily clear away all the objects that we're not using, which -is all the slots starting at some index and then going to the end. +Remember, backgrounds have the following essential components: + +* Background Palette +* Background Tiles +* Screenblock +* IO Registers + +### Background Palette + +To write to the background palette memory we'll want to name a `VolatilePtr` for +it. We'll probably also want to be able to cast between different types either +right away or later in this program, so we'll add a method for that. ```rust -pub fn clear_objects_starting_with(base_slot: usize) { - let mut obj = ObjectAttributes::default(); - obj.set_rendering(ObjectRenderMode::Disabled); - for s in base_slot..128 { - set_object_attributes(s, obj); +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +#[repr(transparent)] +pub struct VolatilePtr(pub *mut T); +impl VolatilePtr { + pub unsafe fn read(&self) -> T { + core::ptr::read_volatile(self.0) + } + pub unsafe fn write(&self, data: T) { + core::ptr::write_volatile(self.0, data); + } + pub fn offset(self, count: isize) -> Self { + VolatilePtr(self.0.wrapping_offset(count)) + } + pub fn cast(self) -> VolatilePtr { + VolatilePtr(self.0 as *mut Z) } } ``` -Next we set out the positions of our cards. We set the tile data we need, and -then assign the object attributes to go with it. For this, we'll make the -position finder function be its own thing since we'll also need it for the card -selector to move around. Finally, we set our selector as being at position 0,0 -of the card grid. +Now we give ourselves an easy way to write a color into a palbank slot. ```rust -pub fn position_of_card(card_col: usize, card_row: usize) -> (u16, u16) { - (10 + card_col as u16 * 17, 5 + card_row as u16 * 15) +pub const BACKGROUND_PALETTE: VolatilePtr = VolatilePtr(0x500_0000 as *mut u16); + +pub fn set_bg_palette_4bpp(palbank: usize, slot: usize, color: u16) { + assert!(palbank < 16); + assert!(slot > 0 && slot < 16); + unsafe { + BACKGROUND_PALETTE + .cast::<[u16; 16]>() + .offset(palbank as isize) + .cast::() + .offset(slot as isize) + .write(color); + } +} +``` + +And of course we need to bring back in our ability to build color values, as +well as a few named colors to start us off: + +```rust +pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 { + blue << 10 | green << 5 | red } -pub fn arrange_cards() { - set_obj_tile_4bpp(1, FULL_ONE); - set_obj_tile_4bpp(2, FULL_TWO); - set_obj_tile_4bpp(3, FULL_THREE); - let mut obj = ObjectAttributes::default(); - obj.set_tile_index(2); // along with palbank0, this is a white card - for card_row in 0..3 { - for card_col in 0..4 { - let (col, row) = position_of_card(card_col, card_row); - obj.set_column(col); - obj.set_row(row); - set_object_attributes(1 + card_col as usize + (card_row as usize * 3), obj); +pub const WHITE: u16 = rgb16(31, 31, 31); +pub const LIGHT_GRAY: u16 = rgb16(25, 25, 25); +pub const DARK_GRAY: u16 = rgb16(15, 15, 15); +``` + +Which _finally_ allows us to set our palette colors in `main`: + +```rust +fn main(_argc: isize, _argv: *const *const u8) -> isize { + set_bg_palette_4bpp(0, 1, WHITE); + set_bg_palette_4bpp(0, 2, LIGHT_GRAY); + set_bg_palette_4bpp(0, 3, DARK_GRAY); +``` + +### Background Tiles + +So we'll want some light gray tiles and some dark gray tiles. We could use a +single tile and then swap it between palbanks to do the color selection, but for +now we'll just use two different tiles, since we've got tons of tile space to +spare. + +```rust +#[derive(Debug, Clone, Copy, Default)] +#[repr(transparent)] +pub struct Tile4bpp { + pub data: [u32; 8], +} + +pub const ALL_TWOS: Tile4bpp = Tile4bpp { + data: [ + 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, + ], +}; + +pub const ALL_THREES: Tile4bpp = Tile4bpp { + data: [ + 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, + ], +}; +``` + +And then we have to have a way to put the tiles into video memory: + +```rust +#[derive(Clone, Copy)] +#[repr(transparent)] +pub struct Charblock4bpp { + pub data: [Tile4bpp; 512], +} + +pub const VRAM: VolatilePtr = VolatilePtr(0x0600_0000 as *mut Charblock4bpp); + +pub fn set_bg_tile_4bpp(charblock: usize, index: usize, tile: Tile4bpp) { + assert!(charblock < 4); + assert!(index < 512); + unsafe { VRAM.offset(charblock as isize).cast::().offset(index as isize).write(tile) } +} +``` + +And finally, we can call that within `main`: + +```rust +fn main(_argc: isize, _argv: *const *const u8) -> isize { + // bg palette + set_bg_palette_4bpp(0, 1, WHITE); + set_bg_palette_4bpp(0, 2, LIGHT_GRAY); + set_bg_palette_4bpp(0, 3, DARK_GRAY); + // bg tiles + set_bg_tile_4bpp(0, 0, ALL_TWOS); + set_bg_tile_4bpp(0, 1, ALL_THREES); +``` + +### Setup A Screenblock + +Screenblocks are a little weird because they take the same space as the +charblocks (8 screenblocks per charblock). The GBA will let you mix and match +and it's up to you to keep it all straight. We're using tiles at the base of +charblock 0, so we'll place our screenblock at the base of charblock 1. + +First, we have to be able to make one single screenblock entry at a time: + +```rust +#[derive(Debug, Clone, Copy, Default)] +#[repr(transparent)] +pub struct RegularScreenblockEntry(u16); + +impl RegularScreenblockEntry { + pub const SCREENBLOCK_ENTRY_TILE_ID_MASK: u16 = 0b11_1111_1111; + pub fn from_tile_id(id: u16) -> Self { + RegularScreenblockEntry(id & Self::SCREENBLOCK_ENTRY_TILE_ID_MASK) + } +} +``` + +And then with 32x32 of these things we'll have a whole screenblock. Now, we +probably won't actually make values of the screenblock type itself, but we at +least need it to have the type declared with the correct size so that we can +move our pointers around by the right amount. + +```rust +#[derive(Clone, Copy)] +#[repr(transparent)] +pub struct RegularScreenblock { + pub data: [RegularScreenblockEntry; 32 * 32], +} +``` + +Alright, so, as I said those things are kinda big, we don't really want to be +building them up on the stack if we can avoid it, so we'll write one straight +into memory at the correct location. + +```rust +pub fn checker_screenblock(slot: usize, a_entry: RegularScreenblockEntry, b_entry: RegularScreenblockEntry) { + let mut p = VRAM.cast::().offset(slot as isize).cast::(); + let mut checker = true; + for _row in 0..32 { + for _col in 0..32 { + unsafe { p.write(if checker { a_entry } else { b_entry }) }; + p = p.offset(1); + checker = !checker; } + checker = !checker; } } - -pub fn init_selector() { - set_obj_tile_4bpp(0, CARD_SELECTOR); - let mut obj = ObjectAttributes::default(); - let (col, row) = position_of_card(0, 0); - obj.set_column(col); - obj.set_row(row); - set_object_attributes(0, obj); -} ``` -### Arrange the Background +And then we add this into `main` -TODO +```rust + // screenblock + let light_entry = RegularScreenblockEntry::from_tile_id(0); + let dark_entry = RegularScreenblockEntry::from_tile_id(1); + checker_screenblock(8, light_entry, dark_entry); +``` -## Shuffling The Cards +### Background IO Registers -TODO +Our most important step is of course the IO register step. There's four +different background layers, but each of them has the same format for their +control register. For the moment, all that we care about is being able to set +the "screen base block" value. -## Picking One Card +```rust +#[derive(Clone, Copy, Default, PartialEq, Eq)] +#[repr(transparent)] +pub struct BackgroundControlSetting(u16); -TODO +impl BackgroundControlSetting { + pub fn from_base_block(sbb: u16) -> Self { + BackgroundControlSetting(sbb << 8) + } +} -## Picking The Second Card +pub const BG0CNT: VolatilePtr = VolatilePtr(0x400_0008 as *mut BackgroundControlSetting); +``` -TODO +And... that's all it takes for us to be able to add a line into `main` -## Resetting The Game +```rust + // bg0 control + unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) }; +``` -TODO +### Set The Display Control Register + +We're finally ready to set the display control register and get things going. + +We've slightly glossed over it so far, but when the GBA is first booted most +everything within the address space will be all zeroed. However, the display +control register has the "Force VBlank" bit enabled by the BIOS, giving you a +moment to put the memory in place that you'll need for the first frame. + +So, now that have got all of our memory set, we'll overwrite the initial +display control register value with what we'll call "just enable bg0". + +```rust +#[derive(Clone, Copy, Default, PartialEq, Eq)] +#[repr(transparent)] +pub struct DisplayControlSetting(u16); + +impl DisplayControlSetting { + pub const JUST_ENABLE_BG0: DisplayControlSetting = DisplayControlSetting(1 << 8); +} + +pub const DISPCNT: VolatilePtr = VolatilePtr(0x0400_0000 as *mut DisplayControlSetting); +``` + +And so finally we have a complete `main` + +```rust +#[start] +fn main(_argc: isize, _argv: *const *const u8) -> isize { + // bg palette + set_bg_palette_4bpp(0, 1, WHITE); + set_bg_palette_4bpp(0, 2, LIGHT_GRAY); + set_bg_palette_4bpp(0, 3, DARK_GRAY); + // bg tiles + set_bg_tile_4bpp(0, 0, ALL_TWOS); + set_bg_tile_4bpp(0, 1, ALL_THREES); + // screenblock + let light_entry = RegularScreenblockEntry::from_tile_id(0); + let dark_entry = RegularScreenblockEntry::from_tile_id(1); + checker_screenblock(8, light_entry, dark_entry); + // bg0 control + unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) }; + // Display Control + unsafe { DISPCNT.write(DisplayControlSetting::JUST_ENABLE_BG0) }; + loop { + // TODO the whole thing + } +} +``` + +And _It works, Marty! It works!_ + +![screenshot_checkers](screenshot_checkers.png) + +We've got more to go, but we're well on our way. diff --git a/book/src/ch03/regular_objects.md b/book/src/ch03/regular_objects.md index ddc661b..f32c86f 100644 --- a/book/src/ch03/regular_objects.md +++ b/book/src/ch03/regular_objects.md @@ -123,14 +123,14 @@ everything all lined up and into place anyway. The final step is to assign the correct attributes to an object. Each object has three `u16` values that make up its overall attributes. -Before we go into the details, I want to remind you that the hardware will -attempt to process every single object every single frame, and also that all of -the GBA's memory is cleared to 0 at startup. Why do these two things matter -right now? As you'll see in a second an "all zero" set of object attributes -causes an 8x8 object to appear at 0,0 using object tile index 0. This is usually -_not_ what you want your unused objects to do. When your game first starts you -should take a moment to mark any objects you won't be using as objects to not -render. +Before we go into the details, I want to bring up that the hardware will attempt +to process every single object every single frame if the object layer is +enabled, and also that all of the GBA's object memory is cleared to 0 at +startup. Why do these two things matter right now? As you'll see in a second an +"all zero" set of object attributes causes an 8x8 object to appear at 0,0 using +object tile index 0. This is usually _not_ what you want your unused objects to +do. When your game first starts you should take a moment to mark any objects you +won't be using as objects to not render. ### ObjectAttributes.attr0 diff --git a/book/src/ch03/screenshot_checkers.png b/book/src/ch03/screenshot_checkers.png new file mode 100644 index 0000000..dc6d71a Binary files /dev/null and b/book/src/ch03/screenshot_checkers.png differ diff --git a/docs/ch00/index.html b/docs/ch00/index.html index 86f5140..555d97f 100644 --- a/docs/ch00/index.html +++ b/docs/ch00/index.html @@ -72,7 +72,7 @@
diff --git a/docs/ch01/hello1.html b/docs/ch01/hello1.html index 1db437b..7c49512 100644 --- a/docs/ch01/hello1.html +++ b/docs/ch01/hello1.html @@ -72,7 +72,7 @@
diff --git a/docs/ch01/hello2.html b/docs/ch01/hello2.html index 240d7eb..2acf3f0 100644 --- a/docs/ch01/hello2.html +++ b/docs/ch01/hello2.html @@ -72,7 +72,7 @@
diff --git a/docs/ch01/index.html b/docs/ch01/index.html index 0a98833..953dcc4 100644 --- a/docs/ch01/index.html +++ b/docs/ch01/index.html @@ -72,7 +72,7 @@
diff --git a/docs/ch01/io_registers.html b/docs/ch01/io_registers.html index 62b4d09..6f60d8b 100644 --- a/docs/ch01/io_registers.html +++ b/docs/ch01/io_registers.html @@ -72,7 +72,7 @@
diff --git a/docs/ch01/the_display_control_register.html b/docs/ch01/the_display_control_register.html index 1f6a15e..717bc87 100644 --- a/docs/ch01/the_display_control_register.html +++ b/docs/ch01/the_display_control_register.html @@ -72,7 +72,7 @@
diff --git a/docs/ch01/video_memory_intro.html b/docs/ch01/video_memory_intro.html index 09d44d1..1a75137 100644 --- a/docs/ch01/video_memory_intro.html +++ b/docs/ch01/video_memory_intro.html @@ -72,7 +72,7 @@
diff --git a/docs/ch01/volatile.html b/docs/ch01/volatile.html index 7a0a0d3..e88dfb4 100644 --- a/docs/ch01/volatile.html +++ b/docs/ch01/volatile.html @@ -72,7 +72,7 @@
diff --git a/docs/ch02/index.html b/docs/ch02/index.html index 77ca662..932b1be 100644 --- a/docs/ch02/index.html +++ b/docs/ch02/index.html @@ -72,7 +72,7 @@
diff --git a/docs/ch02/light_cycle.html b/docs/ch02/light_cycle.html index 3380812..e1e5fbd 100644 --- a/docs/ch02/light_cycle.html +++ b/docs/ch02/light_cycle.html @@ -72,7 +72,7 @@
diff --git a/docs/ch02/the_key_input_register.html b/docs/ch02/the_key_input_register.html index ac05277..f51ded3 100644 --- a/docs/ch02/the_key_input_register.html +++ b/docs/ch02/the_key_input_register.html @@ -72,7 +72,7 @@
diff --git a/docs/ch02/the_vcount_register.html b/docs/ch02/the_vcount_register.html index 8f5a0ce..6f74b78 100644 --- a/docs/ch02/the_vcount_register.html +++ b/docs/ch02/the_vcount_register.html @@ -72,7 +72,7 @@
diff --git a/docs/ch03/gba_memory_mapping.html b/docs/ch03/gba_memory_mapping.html index fdfc501..ce58985 100644 --- a/docs/ch03/gba_memory_mapping.html +++ b/docs/ch03/gba_memory_mapping.html @@ -72,7 +72,7 @@
diff --git a/docs/ch03/gba_prng.html b/docs/ch03/gba_prng.html new file mode 100644 index 0000000..b653867 --- /dev/null +++ b/docs/ch03/gba_prng.html @@ -0,0 +1,1232 @@ + + + + + + GBA PRNG - Rust GBA Guide + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + + +
+
+

GBA PRNG

+

You often hear of the "Random Number Generator" in video games. First of all, +usually a game doesn't have access to any source of "true randomness". On a PC +you can send out a web request to random.org which +uses atmospheric data, or even just point a camera at some lava +lamps. Even +then, the rate at which you'll want random numbers far exceeds the rate at which +those services can offer them up. So instead you'll get a pseudo-random number +generator and "seed" it with the true random data and then use that.

+

However, we don't even have that! On the GBA, we can't ask any external anything +what we should do for our initial seed. So we will not only need to come up with +a few PRNG options, but we'll also need to come up with some seed source +options. More than with other options within the book, I think this is an area +where you can tailor what you do to your specific game.

+

What is a Pseudo-random Number Generator?

+

For those of you who somehow read The Rust Book, plus possibly The Rustonomicon, +and then found this book, but somehow still don't know what a PRNG is... Well, +I don't think there are many such people. Still, we'll define it anyway I +suppose.

+
+

A PRNG is any mathematical process that takes an initial input (of some fixed +size) and then produces a series of outputs (of a possibly different size).

+
+

So, if you seed your PRNG with a 32-bit value you might get 32-bit values out or +you might get 16-bit values out, or something like that.

+

We measure the quality of a PRNG based upon:

+
    +
  1. Is the output range easy to work with? Most PRNG techniques that you'll +find these days are already hip to the idea that we'll have the fastest +operations with numbers that match our register width and all that, so +they're usually designed around power of two inputs and power of two outputs. +Still, every once in a while you might find some page old page intended for +compatibility with the rand() function in the C standard library that'll +talk about something crazy like having 15-bit PRNG outputs. Stupid as it +sounds, that's real. Avoid those. Whenever possible we want generators that +give us uniformly distributed u8, u16, u32, or whatever size value +we're producing. From there we can mold our random bits into whatever else we +need (eg: turning a u8 into a "1d6" roll).
  2. +
  3. How long does each generation cycle take? This can be tricky for us. A +lot of the top quality PRNGs you'll find these days are oriented towards +64-bit machines so they do a bunch of 64-bit operations. You can do that on +a 32-bit machine if you have to, and the compiler will automatically "lower" +the 64-bit operation into a series of 32-bit operations. What we'd really +like to pick is something that sticks to just 32-bit operations though, since +those will be our best candidates for fast results. We can use Compiler +Explorer and tell it to build for the +thumbv6m-none-eabi target to get a basic idea of what the ASM for a +generator looks like. That's not our exact target, but it's the closest +target that's shipped with the standard rust distribution.
  4. +
  5. What is the statistical quality of the output? This involves heavy +amounts of math. Since computers are quite good a large amounts of repeated +math you might wonder if there's programs for this already, and there are. +Many in fact. They take a generator and then run it over and over and perform +the necessary tests and report the results. I won't be explaining how to hook +our generators up to those tools, they each have their own user manuals. +However, if someone says that a generator "passes BigCrush" (the biggest +suite in TestU01) or "fails PractRand" or anything similar it's useful to +know what they're referring to. Example test suites include: + +
  6. +
+

Note that if a generator is called upon to produce enough output relative to its +state size it will basically always end up failing statistical tests. This means +that any generator with 32-bit state will always fail in any of those test sets. +The theoretical minimum state size for any generator at all to pass the +standard suites is 36 bits, but most generators need many more than that.

+

Generator Size

+

I've mostly chosen to discuss generators that are towards the smaller end of the +state size scale. In fact we'll be going over many generators that are below the +36-bit theoretical minimum to pass all those fancy statistical tests. Why so? +Well, we don't always need the highest possible quality generators.

+

"But Lokathor!", I can already hear you shouting. "I want the highest quality +randomness at all times! The game depends on it!", you cry out.

+

Well... does it? Like, really?

+

The GBA +Pokemon +games use a dead simple 32-bit LCG (we'll see it below). Then starting with +the DS they moved to also using Mersenne Twister, which also fails several +statistical tests and is one of the most predictable PRNGs around. Metroid +Fusion +has a 100% goofy PRNG system for enemies that would definitely never pass any +sort of statistics tests at all. But like, those games were still awesome. Since +we're never going to be keeping secrets safe with our PRNG, it's okay if we +trade in some quality for something else in return (we obviously don't want to +trade quality for nothing).

+

And you have to ask yourself: Where's the space used for the Metroid Fusion +PRNG? No where at all. They were already using everything involved for other +things too, so they're paying no extra cost to have the randomization they do. +How much does it cost Pokemon to throw in a 32-bit LCG? Just 4 bytes, might as +well. How much does it cost to add in a Mersenne Twister? ~2,500 bytes ya say? +I'm sorry what on Earth? Yeah, that sounds crazy, we're probably not doing +that one.

+

k-Dimensional Equidistribution

+

So, wait, why did the Pokemon developers add in the Mersenne Twister generator? +They're smart people, surely they had a reason. Obviously we can't know for +sure, but Mersenne Twister is terrible in a lot of ways, so what's its single +best feature? Well, that gets us to a funky thing called k-dimensional +equidistribution. Basically, if you take a generator's output and chop it down +to get some value you want, with uniform generator output you can always get a +smaller ranged uniform result (though sometimes you will have to reject a result +and run the generator again). Imagine you have a u32 output from your +generator. If you want a u16 value from that you can just pick either half. If +you want a [bool; 4] from that you can just pick four bits. However you wanna +do it, as long as the final form of random thing we're getting needs a number of +bits equal to or less than the number of bits that come out of a single +generator use, we're totally fine.

+

What happens if the thing you want to make requires more bits than a single +generator's output? You obviously have to run the generator more than once and +then stick two or more outputs together, duh. Except, that doesn't always work. +What I mean is that obviously you can always put two u8 side by side to get a +u16, but if you start with a uniform u8 generator and then you run it twice +and stick the results together you don't always get a uniform u16 generator. +Imagine a byte generator that just does state+=1 and then outputs the state. +It's not good by almost any standard, but it does give uniform output. Then we +run it twice in a row, put the two bytes together, and suddenly a whole ton of +potential u16 values can never be generated. That's what k-dimensional +equidistribution is all about. Every uniform output generator is 1-dimensional +equidistributed, but if you need to combine outputs and still have uniform +results then you need a higher k value. So why does Pokemon have Mersenne +Twister in it? Because it's got 623-dimensional equidistribution. That means +when you're combining PRNG calls for all those little IVs and Pokemon Abilities +and other things you're sure to have every potential pokemon actually be a +pokemon that the game can generate. Do you need that for most situations? +Absolutely not. Do you need it for pokemon? No, not even then, but a lot of the +hot new PRNGs have come out just within the past 10 years, so we can't fault +them too much for it.

+

TLDR: 1-dimensional equidistribution just means "a normal uniform generator", +and higher k values mean "you can actually combine up to k output chains and +maintain uniformity". Generators that aren't uniform to begin with effectively +have a k value of 0.

+

Other Tricks

+

Finally, some generators have other features that aren't strictly quantifiable. +Two tricks of note are "jump ahead" or "multiple streams":

+
    +
  • Jump ahead lets you advance the generator's state by some enormous number of +outputs in a relatively small number of operations.
  • +
  • Multi-stream generators have more than one output sequence, and then some part +of their total state space picks a "stream" rather than being part of the +actual seed, with each possible stream causing the potential output sequence +to be in a different order.
  • +
+

They're normally used as a way to do multi-threaded stuff (we don't care about +that on GBA), but another interesting potential is to take one world seed and +then split off a generator for each "type" of thing you'd use PRNG for (combat, +world events, etc). This can become quite useful, where you can do things like +procedurally generate a world region, and then when they leave the region you +only need to store a single generator seed and a small amount of "delta" +information for what the player changed there that you want to save, and then +when they come back you can regenerate the region without having stored much at +all. This is the basis for how old games with limited memory like +Starflight did their whole thing +(800 planets to explore on just to 5.25" floppy disks!).

+

How To Seed

+

Oh I bet you thought we could somehow get through a section without learning +about yet another IO register. Ha, wishful thinking.

+

There's actually not much involved. Starting at 0x400_0100 there's an array of +registers that go "data", "control", "data", "control", etc. TONC and GBATEK use +different names here, and we'll go by the TONC names because they're much +clearer:

+

+# #![allow(unused_variables)]
+#fn main() {
+pub const TM0D: VolatilePtr<u16> = VolatilePtr(0x400_0100 as *mut u16);
+pub const TM0CNT: VolatilePtr<u16> = VolatilePtr(0x400_0102 as *mut u16);
+
+pub const TM1D: VolatilePtr<u16> = VolatilePtr(0x400_0104 as *mut u16);
+pub const TM1CNT: VolatilePtr<u16> = VolatilePtr(0x400_0106 as *mut u16);
+
+pub const TM2D: VolatilePtr<u16> = VolatilePtr(0x400_0108 as *mut u16);
+pub const TM2CNT: VolatilePtr<u16> = VolatilePtr(0x400_010A as *mut u16);
+
+pub const TM3D: VolatilePtr<u16> = VolatilePtr(0x400_010C as *mut u16);
+pub const TM3CNT: VolatilePtr<u16> = VolatilePtr(0x400_010E as *mut u16);
+#}
+

Basically there's 4 timers, numbered 0 to 3. Each one has a Data register and a +Control register. They're all u16 and you can definitely read from all of +them normally, but then it gets a little weird. You can also write to the +Control portions normally, when you write to the Data portion of a timer that +writes the value that the timer resets to, without changing its current Data +value. So if TM0D is paused on some value other than 5 and you write 5 to +it, when you read it back you won't get a 5. When the next timer run starts +it'll begin counting at 5 instead of whatever value it currently reads as.

+

The Data registers are just a u16 number, no special bits to know about.

+

The Control registers are also pretty simple compared to most IO registers:

+
    +
  • 2 bits for the Frequency: 1, 64, 256, 1024. While active, the timer's +value will tick up once every frequency CPU cycles. On the GBA, 1 CPU cycle +is about 59.59ns (2^(-24) seconds). One display controller cycle is 280,896 +CPU cycles.
  • +
  • 1 bit for Cascade Mode: If this is on the timer doesn't count on its own, +instead it ticks up whenever the preceding timer overflows its counter (eg: +if t0 overflows, t1 will tick up if it's in cascade mode). You still have to +also enable this timer for it to do that (below). This naturally doesn't have +an effect when used with timer 0.
  • +
  • 3 bits that do nothing
  • +
  • 1 bit for Interrupt: Whenever this timer overflows it will signal an +interrupt. We still haven't gotten into interrupts yet (since you have to hand +write some ASM for that, it's annoying), but when we cover them this is how +you do them with timers.
  • +
  • 1 bit to Enable the timer. When you disable a timer it retains the current +value, but when you enable it again the value jumps to whatever its currently +assigned default value is.
  • +
+

+# #![allow(unused_variables)]
+#fn main() {
+#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
+#[repr(transparent)]
+pub struct TimerControl(u16);
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum TimerFrequency {
+  One = 0,
+  SixFour = 1,
+  TwoFiveSix = 2,
+  OneZeroTwoFour = 3,
+}
+
+impl TimerControl {
+  pub fn frequency(self) -> TimerFrequency {
+    match self.0 & 0b11 {
+      0 => TimerFrequency::One,
+      1 => TimerFrequency::SixFour,
+      2 => TimerFrequency::TwoFiveSix,
+      3 => TimerFrequency::OneZeroTwoFour,
+      _ => unreachable!(),
+    }
+  }
+  pub fn cascade_mode(self) -> bool {
+    self.0 & 0b100 > 0
+  }
+  pub fn interrupt(self) -> bool {
+    self.0 & 0b100_0000 > 0
+  }
+  pub fn enabled(self) -> bool {
+    self.0 & 0b1000_0000 > 0
+  }
+  //
+  pub fn set_frequency(&mut self, frequency: TimerFrequency) {
+    self.0 &= !0b11;
+    self.0 |= frequency as u16;
+  }
+  pub fn set_cascade_mode(&mut self, bit: bool) {
+    if bit {
+      self.0 |= 0b100;
+    } else {
+      self.0 &= !0b100;
+    }
+  }
+  pub fn set_interrupt(&mut self, bit: bool) {
+    if bit {
+      self.0 |= 0b100_0000;
+    } else {
+      self.0 &= !0b100_0000;
+    }
+  }
+  pub fn set_enabled(&mut self, bit: bool) {
+    if bit {
+      self.0 |= 0b1000_0000;
+    } else {
+      self.0 &= !0b1000_0000;
+    }
+  }
+}
+#}
+

A Timer Based Seed

+

Okay so how do we turns some timers into a PRNG seed? Well, usually our seed is +a u32. So we'll take two timers, string them together with that cascade deal, +and then set them off. Then we wait until the user presses any key. We probably +do this as our first thing at startup, but we might show the title and like a +"press any key to continue" message, or something.

+

+# #![allow(unused_variables)]
+#fn main() {
+/// Mucks with the settings of Timers 0 and 1.
+unsafe fn u32_from_user_wait() -> u32 {
+  let mut t = TimerControl::default();
+  t.set_enabled(true);
+  t.set_cascading(true);
+  TM1CNT.write(t.0);
+  t.set_cascading(false);
+  TM0CNT.write(t.0);
+  while key_input().0 == 0 {}
+  t.set_enabled(false);
+  TM0CNT.write(t.0);
+  TM1CNT.write(t.0);
+  let low = TM0D.read() as u32;
+  let high = TM1D.read() as u32;
+  (high << 32) | low
+}
+#}
+

Various Generators

+

SM64 (16-bit state, 16-bit output, non-uniform, bonkers)

+

Our first PRNG to mention isn't one that's at all good, but it sure might be +cute to use. It's the PRNG that Super Mario 64 had (video explanation, +long).

+

With a PRNG this simple the output of one call is also the seed to the next +call, so we don't need to make a struct for it or anything. You're also assumed +to just seed with a plain 0 value at startup. The generator has a painfully +small period, and you're assumed to be looping through the state space +constantly while the RNG goes.

+

+# #![allow(unused_variables)]
+#fn main() {
+pub fn sm64(mut input: u16) -> u16 {
+  if input == 0x560A {
+    input = 0;
+  }
+  let mut s0 = input << 8;
+  s0 ^= input;
+  input = s0.rotate_left(8);
+  s0 = ((s0 as u8) << 1) as u16 ^ input;
+  let s1 = (s0 >> 1) ^ 0xFF80;
+  if (s0 & 1) == 0 {
+    if s1 == 0xAA55 {
+        input = 0;
+    } else {
+        input = s1 ^ 0x1FF4;
+    }
+  } else {
+    input = s1 ^ 0x8180;
+  }
+  input
+}
+#}
+

Compiler Explorer

+

If you watch the video explanation about this generator you'll note that the +first if checking for 0x560A prevents you from being locked into a 2-step +cycle, but it's only important if you want to feed bad seeds to the generator. A +bad seed is unhelpfully defined defined as "any value that the generator can't +output". The second if that checks for 0xAA55 doesn't seem to be important +at all from a mathematical perspective. It cuts the generator's period shorter +by an arbitrary amount for no known reason. It's left in there only for +authenticity.

+

LCG32 (32-bit state, 32-bit output, uniform)

+

The Linear Congruential +Generator is a +well known PRNG family. You pick a multiplier and an additive and you're done. +Right? Well, not exactly, because (as the wikipedia article explains) the values +that you pick can easily make your LCG better or worse all on its own. You want +a good multiplier, and you want your additive to be odd. In our example here +we've got the values that +Bulbapedia +says were used in the actual GBA Pokemon games, though Bulbapedia also lists +values for a few other other games as well.

+

I don't actually know if any of the constants used in the official games are +particularly good from a statistical viewpoint, though with only 32 bits an LCG +isn't gonna be passing any of the major statistical tests anyway (you need way +more bits in your LCG for that to happen). In my mind the main reason to use a +plain LCG like this is just for the fun of using the same PRNG that an official +Pokemon game did.

+

You should not use this as your default generator if you care about quality.

+

It is very fast though... if you want to set everything else on fire for +speed. If you do, please at least remember that the highest bits are the best +ones, so if you're after less than 32 bits you should shift the high ones down +and keep those, or if you want to turn it into a bool cast to i32 and then +check if it's negative, etc.

+

+# #![allow(unused_variables)]
+#fn main() {
+pub fn lcg32(seed: u32) -> u32 {
+  seed.wrapping_mul(0x41C6_4E6D).wrapping_add(0x6073)
+}
+#}
+

Compiler Explorer

+

Multi-stream Generators

+

Note that you don't have to add a compile time constant, you could add a runtime +value instead. Doing so allows the generator to be "multi-stream", with each +different additive value being its own unique output stream. This true of LCGs +as well as all the PCGs below (since they're LCG based). The examples here just +use a fixed stream for simplicity and to save space, but if you want streams you +can add that in for only a small amount of extra space used:

+

+# #![allow(unused_variables)]
+#fn main() {
+pub fn lcg_streaming(seed: u32, stream: u32) -> u32 {
+  seed.wrapping_mul(0x41C6_4E6D).wrapping_add(stream)
+}
+#}
+

With a streaming LCG you should pass the same stream value every single time. If +you don't, then your generator will jump between streams in some crazy way and +you lose your nice uniformity properties.

+

There is the possibility of intentionally changing the stream value exactly when +the seed lands on a pre-determined value (after the multiply and add). This +basically makes the stream selection value's bit size (minus one bit, because +it must be odd) count into the LCG's state bit size for calculating the overall +period of the generator. So an LCG32 with a 32-bit stream selection would have a +period of 2^32 * 2^31 = 2^63.

+

+# #![allow(unused_variables)]
+#fn main() {
+let next_seed = lcg_streaming(seed, stream);
+// It's cheapest to test for 0, so we pick 0
+if seed == 0 {
+  stream = stream.wrapping_add(2)
+}
+#}
+

However, this isn't a particularly effective way to extend the generator's +period, and we'll see a much better extension technique below.

+

PCG16 XSH-RS (32-bit state, 16-bit output, uniform)

+

The Permuted Congruential +Generator family +is the next step in LCG technology. We start with LCG output, which is good but +not great, and then we apply one of several possible permutations to bump up the +quality. There's basically a bunch of permutation components that are each +defined in terms of the bit width that you're working with.

+

The "default" variant of PCG, PCG32, has 64 bits of state and 32 bits of output, +and it uses the "XSH-RR" permutation. Here we'll put together a 32 bit version +with 16-bit output, and using the "XSH-RS" permutation (but we'll show the other +one too for comparison).

+

Of course, since PCG is based on a LCG, we have to start with a good LCG base. +As I said above, a better or worse set of LCG constants can make your generator +better or worse. The Wikipedia example for PCG has a good 64-bit constant, but +not a 32-bit constant. So we gotta ask an +expert +about what a good 32-bit constant would be. I'm definitely not the best at +reading math papers, but it seems that the general idea is that we want m % 8 == 5 and is_even(a) to both hold for the values we pick. There are three +suggested LCG multipliers in a chart on page 10. A chart that's quite hard to +understand. Truth be told I asked several folks that are good at math papers and +even they couldn't make sense of the chart. Eventually timutable read the +whole paper in depth and concluded the same as I did: that we probably want to +pick the 32310901 option.

+

For an additive value, we can pick any odd value, so we might as well pick +something small so that we can do an immediate add. Immediate add? That sounds +new. An immediate instruction is when one side of an operation is small enough +that you can encode the value directly into the space that'd normally be for the +register you want to use. It basically means one less load you have to do, if +you're working with small enough numbers. To see what I mean compare loading +the add value and immediate add +value. It's something you might have seen +frequently in x86 or x86_64 ASM output, but because a thumb instruction is +only 16 bits total, we can only get immediate instructions if the target value +is 8 bits or less, so we haven't used them too much ourselves yet.

+

I guess we'll pick 5, because I happen to personally like the number.

+

+# #![allow(unused_variables)]
+#fn main() {
+// Demo only. The "default" PCG permutation, for use when rotate is cheaper
+pub fn pcg16_xsh_rr(seed: &mut u32) -> u16 {
+  *seed = seed.wrapping_mul(32310901).wrapping_add(5);
+  const INPUT_SIZE: u32 = 32;
+  const OUTPUT_SIZE: u32 = 16;
+  const ROTATE_BITS: u32 = 4;
+  let mut out32 = *seed;
+  let rot = out32 >> (INPUT_SIZE - ROTATE_BITS);
+  out32 ^= out32 >> ((OUTPUT_SIZE + ROTATE_BITS) / 2);
+  ((out32 >> (OUTPUT_SIZE - ROTATE_BITS)) as u16).rotate_right(rot)
+}
+
+// This has slightly worse statistics but runs much better on the GBA
+pub fn pcg16_xsh_rs(seed: &mut u32) -> u16 {
+  *seed = seed.wrapping_mul(32310901).wrapping_add(5);
+  const INPUT_SIZE: u32 = 32;
+  const OUTPUT_SIZE: u32 = 16;
+  const SHIFT_BITS: u32 = 2;
+  const NEXT_MOST_BITS: u32 = 19;
+  let mut out32 = *seed;
+  let shift = out32 >> (INPUT_SIZE - SHIFT_BITS);
+  out32 ^= out32 >> ((OUTPUT_SIZE + SHIFT_BITS) / 2);
+  (out32 >> (NEXT_MOST_BITS + shift)) as u16
+}
+#}
+

Compiler Explorer

+

PCG32 RXS-M-XS (32-bit state, 32-bit output, uniform)

+

Having the output be smaller than the input is great because you can keep just +the best quality bits that the LCG stage puts out, and you basically get 1 point +of dimensional equidistribution for each bit you discard as the size goes down +(so 32->16 gives 16). However, if your output size has to the the same as your +input size, the PCG family is still up to the task.

+

+# #![allow(unused_variables)]
+#fn main() {
+pub fn pcg32_rxs_m_xs(seed: &mut u32) -> u32 {
+  *seed = seed.wrapping_mul(32310901).wrapping_add(5);
+  let mut out32 = *seed;
+  let rxs = out32 >> 28;
+  out32 ^= out32 >> (4 + rxs);
+  const PURE_MAGIC: u32 = 277803737;
+  out32 *= PURE_MAGIC;
+  out32^ (out32 >> 22)
+}
+#}
+

Compiler Explorer

+

This permutation is the slowest but gives the strongest statistical benefits. If +you're going to be keeping 100% of the output bits you want the added strength +obviously. However, the period isn't actually any longer, so each output will be +given only once within the full period (1-dimensional equidistribution).

+

PCG Extension Array

+

As a general improvement to any PCG you can hook on an "extension array" to give +yourself a longer period. It's all described in the PCG +Paper, but here's the bullet points:

+
    +
  • In addition to your generator's state (and possible stream) you keep an array +of "extension" values. The array type is the same as your output type, and +the array count must be a power of two value that's less than the maximum +value of your state size.
  • +
  • When you run the generator, use the lowest bits to select from your +extension array according to the array's power of two. Eg: if the size is 2 +then use the single lowest bit, if it's 4 then use the lowest 2 bits, etc.
  • +
  • Every time you run the generator, XOR the output with the selected value from +the array.
  • +
  • Every time the generator state lands on 0, cycle the array. We want to be +careful with what we mean here by "cycle". We want the entire pattern of +possible array bits to occur eventually. However, we obviously can't do +arbitrary adds for as many bits as we like, so we'll have to "carry the 1" +between the portions of the array by hand.
  • +
+

Here's an example using an 8 slot array and pcg16_xsh_rs:

+

+# #![allow(unused_variables)]
+#fn main() {
+// uses pcg16_xsh_rs from above
+
+pub struct PCG16Ext8 {
+  state: u32,
+  ext: [u16; 8],
+}
+
+impl PCG16Ext8 {
+  pub fn next_u16(&mut self) -> u16 {
+    // PCG as normal.
+    let mut out = pcg16_xsh_rs(&mut self.state);
+    // XOR with a selected extension array value
+    out ^= unsafe { self.ext.get_unchecked((self.state & !0b111) as usize) };
+    // if state == 0 we cycle the array with a series of overflowing adds
+    if self.state == 0 {
+      let mut carry = true;
+      let mut index = 0;
+      while carry && index < self.ext.len() {
+        let (add_output, next_carry) = self.ext[index].overflowing_add(1);
+        self.ext[index] = add_output;
+        carry = next_carry;
+        index += 1;
+      }
+    }
+    out
+  }
+}
+#}
+

Compiler Explorer

+

The period gained from using an extension array is quite impressive. For a b-bit +generator giving r-bit outputs, and k array slots, the period goes from 2^b to +2^(k*r+b). So our 2^32 period generator has been extended to 2^160.

+

Of course, we might care to seed the array itself so that it's not all 0 bits +all the way though, but that's not strictly necessary. All 0s is a legitimate +part of the extension cycle, so we have to pass through it at some point.

+

Xoshiro128** (128-bit state, 32-bit output, non-uniform)

+

The Xoshiro128** generator is +an advancement of the Xorshift family. +It was specifically requested, and I'm not aware of Xorshift specifically being +used in any of my favorite games, so instead of going over Xorshift and then +leading up to this, we'll just jump straight to this. Take care not to confuse +this generator with the very similarly named +Xoroshiro128** generator, +which is the 64 bit variant. Note the extra "ro" hiding in the 64-bit version's +name near the start.

+

Anyway, weird names aside, it's fairly zippy. The biggest downside is that you +can't have a seed state that's all 0s, and as a result 0 will be produced one +less time than all other outputs within a full cycle, making it non-uniform by +just a little bit. You also can't do a simple stream selection like with the LCG +based generators, instead it has a fixed jump function that advances a seed as +if you'd done 2^64 normal generator advancements.

+

Note that Xoshiro256** is known to fail statistical tests, so the 128 version +is unlikely to pass them, though I admit that I didn't check myself.

+

+# #![allow(unused_variables)]
+#fn main() {
+pub fn xoshiro128_starstar(seed: &mut [u32; 4]) -> u32 {
+  let output = seed[0].wrapping_mul(5).rotate_left(7).wrapping_mul(9);
+  let t = seed[1] << 9;
+
+  seed[2] ^= seed[0];
+  seed[3] ^= seed[1];
+  seed[1] ^= seed[2];
+  seed[0] ^= seed[3];
+
+  seed[2] ^= t;
+
+  seed[3] = seed[3].rotate_left(11);
+
+  output
+}
+
+pub fn xoshiro128_starstar_jump(seed: &mut [u32; 4]) {
+  const JUMP: [u32; 4] = [0x8764000b, 0xf542d2d3, 0x6fa035c3, 0x77f2db5b];
+  let mut s0 = 0;
+  let mut s1 = 0;
+  let mut s2 = 0;
+  let mut s3 = 0;
+  for j in JUMP.iter() {
+    for b in 0 .. 32 {
+        if *j & (1 << b) > 0 {
+            s0 ^= seed[0];
+            s1 ^= seed[1];
+            s2 ^= seed[2];
+            s3 ^= seed[3];
+        }
+        xoshiro128_starstar(seed);
+    }
+  }
+  seed[0] = s0;
+  seed[1] = s1;
+  seed[2] = s2;
+  seed[3] = s3;
+}
+#}
+

Compiler Explorer

+

jsf32 (128-bit state, 32-bit output, non-uniform)

+

This is Bob Jenkins's [Small/Fast PRNG](small noncryptographic PRNG). It's a +little faster than Xoshiro128** (no multiplication involved), and can pass any +statistical test that's been thrown at it.

+

Interestingly the generator's period is not fixed based on the generator +overall. It's actually set by the exact internal generator state. There's even +six possible internal generator states where the generator becomes a fixed +point. Because of this, we should use the verified seeding method provided. +Using the provided seeding, the minimum period is expected to be 2^94, the +average is about 2^126, and no seed given to the generator is likely to overlap +with another seed's output for at least 2^64 uses.

+

+# #![allow(unused_variables)]
+#fn main() {
+pub struct JSF32 {
+  a: u32,
+  b: u32,
+  c: u32,
+  d: u32,
+}
+
+impl JSF32 {
+  pub fn new(seed: u32) -> Self {
+    let mut output = JSF32 {
+      a: 0xf1ea5eed,
+      b: seed,
+      c: seed,
+      d: seed
+    };
+    for _ in 0 .. 20 {
+      output.next();
+    }
+    output
+  }
+
+  pub fn next(&mut self) -> u32 {
+    let e = self.a - self.b.rotate_left(27);
+    self.a = self.b ^ self.c.rotate_left(17);
+    self.b = self.c + self.d;
+    self.c = self.d + e;
+    self.d = e + self.a;
+    self.d
+  }
+}
+#}
+

Compiler Explorer

+

Here it's presented with (27,17), but you can also use any of the following if +you want alternative generator flavors that use this same core technique:

+
    +
  • (9,16), (9,24), (10,16), (10,24), (11,16), (11,24), (25,8), (25,16), (26,8), +(26,16), (26,17), or (27,16).
  • +
+

Note that these alternate flavors haven't had as much testing as the (27,17) +version, though they are likely to be just as good.

+

Other Generators?

+
    +
  • Mersenne Twister: Gosh, 2.5k +is just way too many for me to ever want to use this thing. If you'd really +like to use it, there is a +crate for it that +already has it. Small catch, they use a ton of stuff from std that they +could be importing from core, so you'll have to fork it and patch it +yourself to get it working on the GBA. They also stupidly depend on an old +version of rand, so you'll have to cut out that nonsense.
  • +
+

Placing a Value In Range

+

I said earlier that you can always take a uniform output and then throw out some +bits, and possibly the whole result, to reduce it down into a smaller range. How +exactly does one do that? Well it turns out that it's very +tricky to get right, and we +could be losing as much as 60% of our execution time if we don't do it carefully.

+

The best possible case is if you can cleanly take a specific number of bits +out of your result without even doing any branching. The rest can be discarded +or kept for another step as you choose. I know that I keep referencing Pokemon, +but it's a very good example for the use of randomization. Each pokemon has, +among many values, a thing called an "IV" for each of 6 stats. The IVs range +from 0 to 31, which is total nonsense to anyone not familiar with decimal/binary +conversions, but to us programmers that's clearly a 5 bit range. Rather than +making math that's better for people using decimal (such as a 1-20 range or +something like that) they went with what's easiest for the computer.

+

The next best case is if you can have a designated range that you want to +generate within that's known at compile time. This at least gives us a chance to +write some bit of extremely specialized code that can take random bits and get +them into range. Hopefully your range can be "close enough" to a binary range +that you can get things into place. Example: if you want a "1d6" result then you +can generate a u16, look at just 3 bits (0..8), and if they're in the range +you're after you're good. If not you can discard those and look at the next 3 +bits. We started with 16 of them, so you get five chances before you have to run +the generator again entirely.

+

The goal here is to avoid having to do one of the worst things possible in +computing: divmod. It's terribly expensive, even on a modern computer it's +about 10x as expensive as any other arithmetic, and on a GBA it's even worse for +us. We have to call into the BIOS to have it do a software division. Calling +into the BIOS at all is about a 60 cycle overhead (for comparison, a normal +function call is more like 30 cycles of overhead), plus the time it takes to +do the math itself. Remember earlier how we were happy to have a savings of 5 +instructions here or there? Compared to this, all our previous efforts are +basically useless if we can't evade having to do a divmod. You can do quite a +bit of if checking and potential additional generator calls before it exceeds +the cost of having to do even a single divmod.

+

Calling The BIOS

+

How do we do the actual divmod when we're forced to? Easy: inline +assembly of +course (There's also an ARM +oriented blog post +about it that I found most helpful). The GBA has many BIOS +Functions, each of which has +a designated number. We use the +swi +op (short for "SoftWare Interrupt") combined with the BIOS function number that +we want performed. Our code halts, some setup happens (hence that 60 cycles of +overhead I mentioned), the BIOS does its thing, and then eventually control +returns to us.

+

The precise details of what the BIOS call does depends on the function number +that we call. We'd even have to potentially mark it as volatile asm if there's +no clear outputs, otherwise the compiler would "helpfully" eliminate it for us +during optimization. In our case there are clear outputs. The numerator goes +into register 0, and the denominator goes into register 1, the divmod happens, +and then the division output is left in register 0 and the modulus output is +left in register 1. I keep calling it "divmod" because div and modulus are two +sides of the same coin. There's no way to do one of them faster by not doing the +other or anything like that, so we'll first define it as a unified function that +returns a tuple:

+

+# #![allow(unused_variables)]
+#![feature(asm)]
+#fn main() {
+// put the above at the top of any program and/or library that uses inline asm
+
+pub fn div_modulus(numerator: i32, denominator: i32) -> (i32, i32) {
+  assert!(denominator != 0);
+  {
+    let div_out: i32;
+    let mod_out: i32;
+    unsafe {
+      asm!(/* assembly template */ "swi 0x06"
+          :/* output operands */ "={r0}"(div_out), "={r1}"(mod_out)
+          :/* input operands */ "{r0}"(numerator), "{r1}"(denominator)
+          :/* clobbers */ "r3"
+          :/* options */
+    );
+    }
+    (div_out, mod_out)
+  }
+}
+#}
+

And next, since most of the time we really do want just the div or modulus +without having to explicitly throw out the other half, we also define +intermediary functions to unpack the correct values.

+

+# #![allow(unused_variables)]
+#fn main() {
+pub fn div(numerator: i32, denominator: i32) -> i32 {
+  div_modulus(numerator, denominator).0
+}
+
+pub fn modulus(numerator: i32, denominator: i32) -> i32 {
+  div_modulus(numerator, denominator).1
+}
+#}
+

We can generally trust the compiler to inline single line functions correctly +even without an #[inline] directive when it's not going cross-crate or when +LTO is on. I'd point you to some exact output from the Compiler Explorer, but at +the time of writing their nightly compiler is broken, and you can only use +inline asm with a nightly compiler. Unfortunate. Hopefully they'll fix it soon +and I can come back to this section with some links.

+

Finally Those Random Ranges We Mentioned

+

Of course, now that we can do divmod if we need to, let's get back to random +numbers in ranges that aren't exact powers of two.

+

yada yada yada, if you just use x % n to place x into the range 0..n then +you'll turn an unbiased value into a biased value (or you'll turn a biased value +into an arbitrarily more biased value). You should never do this, etc etc.

+

So what's a good way to get unbiased outputs? We're going to be adapting some +CPP code from that that I first hinted at way up above. It's specifically all +about the various ways you can go about getting unbiased random results for +various bounds. There's actually many different methods offered, and for +specific situations there's sometimes different winners for speed. The best +overall performer looks like this:

+
uint32_t bounded_rand(rng_t& rng, uint32_t range) {
+    uint32_t x = rng();
+    uint64_t m = uint64_t(x) * uint64_t(range);
+    uint32_t l = uint32_t(m);
+    if (l < range) {
+        uint32_t t = -range;
+        if (t >= range) {
+            t -= range;
+            if (t >= range) 
+                t %= range;
+        }
+        while (l < t) {
+            x = rng();
+            m = uint64_t(x) * uint64_t(range);
+            l = uint32_t(m);
+        }
+    }
+    return m >> 32;
+}
+
+

And, wow, I sure don't know what a lot of that means (well, I do, but let's +pretend I don't for dramatic effect, don't tell anyone). Let's try to pick it +apart some.

+

First, all the uint32_t and uint64_t are C nonsense names for what we just +call u32 and u64. You probably guessed that on your own.

+

Next, rng_t& rng is more properly written as rng: &rng_t. Though, here +there's a catch: as you can see we're calling rng within the function, so in +rust we'd need to declare it as rng: &mut rng_t, because C++ doesn't track +mutability the same as we do (barbaric, I know).

+

Finally, what's rng_t actually defined as? Well, I sure don't know, but in our +context it's taking nothing and then spitting out a u32. We'll also presume +that it's a different u32 each time (not a huge leap in this context). To us +rust programmers that means we'd want something like impl FnMut() -> u32.

+

+# #![allow(unused_variables)]
+#fn main() {
+pub fn bounded_rand(rng: &mut impl FnMut() -> u32, range: u32) -> u32 {
+  let mut x: u32 = rng();
+  let mut m: u64 = x as u64 * range as u64;
+  let mut l: u32 = m as u32;
+  if l < range {
+    let mut t: u32 = range.wrapping_neg();
+    if t >= range {
+      t -= range;
+      if t >= range {
+        t = modulus(t, range);
+      }
+    }
+    while l < t {
+      x = rng();
+      m = x as u64 * range as u64;
+      l = m as u32;
+    }
+  }
+  (m >> 32) as u32
+}
+#}
+

So, now we can read it. Can we compile it? No, actually. Turns out we can't. +Remember how our modulus function is (i32, i32) -> i32? Here we're doing +(u32, u32) -> u32. You can't just cast, modulus, and cast back. You'll get +totally wrong results most of the time because of sign-bit stuff. Since it's +fairly probable that range fits in a positive i32, its negation must +necessarily be a negative value, which triggers exactly the bad situation where +casting around gives us the wrong results.

+

Well, that's not the worst thing in the world either, since we also didn't +really wanna be doing those 64-bit multiplies. Let's try again with everything +scaled down one stage:

+

+# #![allow(unused_variables)]
+#fn main() {
+pub fn bounded_rand16(rng: &mut impl FnMut() -> u16, range: u16) -> u16 {
+  let mut x: u16 = rng();
+  let mut m: u32 = x as u32 * range as u32;
+  let mut l: u16 = m as u16;
+  if l < range {
+    let mut t: u16 = range.wrapping_neg();
+    if t >= range {
+      t -= range;
+      if t >= range {
+        t = modulus(t as i32, range as i32) as u16;
+      }
+    }
+    while l < t {
+      x = rng();
+      m = x as u32 * range as u32;
+      l = m as u16;
+    }
+  }
+  (m >> 16) as u16
+}
+#}
+

Okay, so the code compiles, and it plays nicely what the known limits of the +various number types involved. We know that if we cast a u16 up into i32 +it's assured to fit properly and also be positive, and the output is assured to +be smaller than the input so it'll fit when we cast it back down to u16. +What's even happening though? Well, this is a variation on Lemire's +method. One of the biggest attempts at a +speedup here is that when you have

+

+# #![allow(unused_variables)]
+#fn main() {
+a %= b;
+#}
+

You can translate that into

+

+# #![allow(unused_variables)]
+#fn main() {
+if a >= b {
+  a -= b;
+  if a >= b {
+    a %= b;
+  }
+}
+#}
+

Now... if we're being real with ourselves, let's just think about this for a +moment. How often will this help us? I genuinely don't know. But I do know how +to find out: we write a program to just enumerate all possible +cases +and run the code. You can't always do this, but there's not many possible u16 +values. The output is this:

+
skip_all:32767
+sub_worked:10923
+had_to_modulus:21846
+Some skips:
+32769
+32770
+32771
+32772
+32773
+Some subs:
+21846
+21847
+21848
+21849
+21850
+Some mods:
+0
+1
+2
+3
+4
+
+

So, about half the time, we're able to skip all our work, and about a sixth of +the time we're able to solve it with just the subtract, with the other third of +the time we have to do the mod. However, what I personally care about the most +is smaller ranges, and we can see that we'll have to do the mod if our target +range size is in 0..21846, and just the subtract if our target range size is +in 21846..32769, and we can only skip all work if our range size is 32769 +and above. So that's not cool.

+

But what is cool is that we're doing the modulus only once, and the rest of +the time we've just got the cheap operations. Sounds like we can maybe try to +cache that work and reuse a range of some particular size. We can also get that +going pretty easily.

+

+# #![allow(unused_variables)]
+#fn main() {
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct RandRangeU16 {
+  range: u16,
+  threshold: u16,
+}
+
+impl RandRangeU16 {
+  pub fn new(mut range: u16) -> Self {
+    let mut threshold = range.wrapping_neg();
+    if threshold >= range {
+      threshold -= range;
+      if threshold >= range {
+        threshold = modulus(threshold as i32, range as i32) as u16;
+      }
+    }
+    RandRangeU16 { range, threshold }
+  }
+
+  pub fn roll_random(&self, rng: &mut impl FnMut() -> u16) -> u16 {
+    let mut x: u16 = rng();
+    let mut m: u32 = x as u32 * self.range as u32;
+    let mut l: u16 = m as u16;
+    if l < self.range {
+      while l < self.threshold {
+        x = rng();
+        m = x as u32 * self.range as u32;
+        l = m as u16;
+      }
+    }
+    (m >> 16) as u16
+  }
+}
+#}
+

What if you really want to use ranges bigger than u16? Well, that's possible, +but we'd want a whole new technique. Preferably one that didn't do divmod at +all, to avoid any nastiness with sign bit nonsense. Thankfully there is one such +method listed in the blog post, "Bitmask with Rejection (Unbiased)"

+
uint32_t bounded_rand(rng_t& rng, uint32_t range) {
+    uint32_t mask = ~uint32_t(0);
+    --range;
+    mask >>= __builtin_clz(range|1);
+    uint32_t x;
+    do {
+        x = rng() & mask;
+    } while (x > range);
+    return x;
+}
+
+

And in Rust

+

+# #![allow(unused_variables)]
+#fn main() {
+pub fn bounded_rand32(rng: &mut impl FnMut() -> u32, mut range: u32) -> u32 {
+  let mut mask: u32 = !0;
+  range -= 1;
+  mask >>= (range | 1).leading_zeros();
+  let mut x = rng() & mask;
+  while x > range {
+    x = rng() & mask;
+  }
+  x
+}
+#}
+

Wow, that's so much less code. What the heck? Less code is supposed to be the +faster version, why is this rated slower? Basically, because of how the math +works out on how often you have to run the PRNG again and stuff, Lemire's method +usually better with smaller ranges and the masking method usually works +better with larger ranges. If your target range fits in a u8, probably use +Lemire's. If it's bigger than u8, or if you need to do it just once and can't +benefit from the cached modulus, you might want to start moving toward the +masking version at some point in there. Obviously if your target range is more +than a u16 then you have to use the masking method. The fact that they're each +oriented towards different size generator outputs only makes things more +complicated.

+

Life just be that way, I guess.

+

Summary Table

+

That was a whole lot. Let's put them in a table:

+ + + + + + + + +
Generator Bytes Output Period k-Dim
sm64 2 u16 65,114 0
lcg32 4 u16 2^32 1
pcg16_xsh_rs 4 u16 2^32 1
pcg32_rxs_m_xs 4 u32 2^32 1
PCG16Ext8 20 u16 2^160 8
xoshiro128** 16 u32 2^128-1 0
jsf32 16 u32 ~2^126 0
+ +
+ + +
+
+ + + +
+ + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/ch03/gba_rng.html b/docs/ch03/gba_rng.html deleted file mode 100644 index 1a48e10..0000000 --- a/docs/ch03/gba_rng.html +++ /dev/null @@ -1,201 +0,0 @@ - - - - - - GBA RNG - Rust GBA Guide - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- -
- - - - - - - - - - -
-
-

GBA RNG

-

TODO

- -
- - -
-
- - - -
- - - - - - - - - - - - - - - - - - - - - - - - diff --git a/docs/ch03/index.html b/docs/ch03/index.html index 56c468b..7f33fbd 100644 --- a/docs/ch03/index.html +++ b/docs/ch03/index.html @@ -72,7 +72,7 @@
diff --git a/docs/ch03/memory_game.html b/docs/ch03/memory_game.html index e4aac3d..6797e77 100644 --- a/docs/ch03/memory_game.html +++ b/docs/ch03/memory_game.html @@ -72,7 +72,7 @@
@@ -136,15 +136,300 @@
-

memory_game

-

TODO

+

Making A Memory Game

+

For this example to show off our new skills we'll make a "memory" game. The idea +is that there's some face down cards and you pick one, it flips, you pick a +second, if they match they both go away, if they don't match they both turn back +face down. The player keeps going until all the cards are gone, then we'll deal +the cards again.

+

There are many steps to do to get such a simple seeming game going. In fact I +stumbled a bit myself when trying to get things set up and going despite having +written and explained all the parts so far. Accordingly, we'll take each part +very slowly, and review things as we build up our game.

+

We'll start back with a nearly blank file, calling it memory_game.rs:

+
#![feature(start)]
+#![no_std]
+
+#[panic_handler]
+fn panic(_info: &core::panic::PanicInfo) -> ! {
+  loop {}
+}
+
+#[start]
+fn main(_argc: isize, _argv: *const *const u8) -> isize {
+  loop {
+    // TODO the whole thing
+  }
+}
+
+

Displaying A Background

+

First let's try to get a background going. We'll display a simple checker +pattern just so that we know that we did something.

+

Remember, backgrounds have the following essential components:

+
    +
  • Background Palette
  • +
  • Background Tiles
  • +
  • Screenblock
  • +
  • IO Registers
  • +
+

Background Palette

+

To write to the background palette memory we'll want to name a VolatilePtr for +it. We'll probably also want to be able to cast between different types either +right away or later in this program, so we'll add a method for that.

+

+# #![allow(unused_variables)]
+#fn main() {
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+#[repr(transparent)]
+pub struct VolatilePtr<T>(pub *mut T);
+impl<T> VolatilePtr<T> {
+  pub unsafe fn read(&self) -> T {
+    core::ptr::read_volatile(self.0)
+  }
+  pub unsafe fn write(&self, data: T) {
+    core::ptr::write_volatile(self.0, data);
+  }
+  pub fn offset(self, count: isize) -> Self {
+    VolatilePtr(self.0.wrapping_offset(count))
+  }
+  pub fn cast<Z>(self) -> VolatilePtr<Z> {
+    VolatilePtr(self.0 as *mut Z)
+  }
+}
+#}
+

Now we give ourselves an easy way to write a color into a palbank slot.

+

+# #![allow(unused_variables)]
+#fn main() {
+pub const BACKGROUND_PALETTE: VolatilePtr<u16> = VolatilePtr(0x500_0000 as *mut u16);
+
+pub fn set_bg_palette_4bpp(palbank: usize, slot: usize, color: u16) {
+  assert!(palbank < 16);
+  assert!(slot > 0 && slot < 16);
+  unsafe {
+    BACKGROUND_PALETTE
+      .cast::<[u16; 16]>()
+      .offset(palbank as isize)
+      .cast::<u16>()
+      .offset(slot as isize)
+      .write(color);
+  }
+}
+#}
+

And of course we need to bring back in our ability to build color values, as +well as a few named colors to start us off:

+

+# #![allow(unused_variables)]
+#fn main() {
+pub const fn rgb16(red: u16, green: u16, blue: u16) -> u16 {
+  blue << 10 | green << 5 | red
+}
+
+pub const WHITE: u16 = rgb16(31, 31, 31);
+pub const LIGHT_GRAY: u16 = rgb16(25, 25, 25);
+pub const DARK_GRAY: u16 = rgb16(15, 15, 15);
+#}
+

Which finally allows us to set our palette colors in main:

+
fn main(_argc: isize, _argv: *const *const u8) -> isize {
+  set_bg_palette_4bpp(0, 1, WHITE);
+  set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
+  set_bg_palette_4bpp(0, 3, DARK_GRAY);
+
+

Background Tiles

+

So we'll want some light gray tiles and some dark gray tiles. We could use a +single tile and then swap it between palbanks to do the color selection, but for +now we'll just use two different tiles, since we've got tons of tile space to +spare.

+

+# #![allow(unused_variables)]
+#fn main() {
+#[derive(Debug, Clone, Copy, Default)]
+#[repr(transparent)]
+pub struct Tile4bpp {
+  pub data: [u32; 8],
+}
+
+pub const ALL_TWOS: Tile4bpp = Tile4bpp {
+  data: [
+    0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222,
+  ],
+};
+
+pub const ALL_THREES: Tile4bpp = Tile4bpp {
+  data: [
+    0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333, 0x33333333,
+  ],
+};
+#}
+

And then we have to have a way to put the tiles into video memory:

+

+# #![allow(unused_variables)]
+#fn main() {
+#[derive(Clone, Copy)]
+#[repr(transparent)]
+pub struct Charblock4bpp {
+  pub data: [Tile4bpp; 512],
+}
+
+pub const VRAM: VolatilePtr<Charblock4bpp> = VolatilePtr(0x0600_0000 as *mut Charblock4bpp);
+
+pub fn set_bg_tile_4bpp(charblock: usize, index: usize, tile: Tile4bpp) {
+  assert!(charblock < 4);
+  assert!(index < 512);
+  unsafe { VRAM.offset(charblock as isize).cast::<Tile4bpp>().offset(index as isize).write(tile) }
+}
+#}
+

And finally, we can call that within main:

+
fn main(_argc: isize, _argv: *const *const u8) -> isize {
+  // bg palette
+  set_bg_palette_4bpp(0, 1, WHITE);
+  set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
+  set_bg_palette_4bpp(0, 3, DARK_GRAY);
+  // bg tiles
+  set_bg_tile_4bpp(0, 0, ALL_TWOS);
+  set_bg_tile_4bpp(0, 1, ALL_THREES);
+
+

Setup A Screenblock

+

Screenblocks are a little weird because they take the same space as the +charblocks (8 screenblocks per charblock). The GBA will let you mix and match +and it's up to you to keep it all straight. We're using tiles at the base of +charblock 0, so we'll place our screenblock at the base of charblock 1.

+

First, we have to be able to make one single screenblock entry at a time:

+

+# #![allow(unused_variables)]
+#fn main() {
+#[derive(Debug, Clone, Copy, Default)]
+#[repr(transparent)]
+pub struct RegularScreenblockEntry(u16);
+
+impl RegularScreenblockEntry {
+  pub const SCREENBLOCK_ENTRY_TILE_ID_MASK: u16 = 0b11_1111_1111;
+  pub fn from_tile_id(id: u16) -> Self {
+    RegularScreenblockEntry(id & Self::SCREENBLOCK_ENTRY_TILE_ID_MASK)
+  }
+}
+#}
+

And then with 32x32 of these things we'll have a whole screenblock. Now, we +probably won't actually make values of the screenblock type itself, but we at +least need it to have the type declared with the correct size so that we can +move our pointers around by the right amount.

+

+# #![allow(unused_variables)]
+#fn main() {
+#[derive(Clone, Copy)]
+#[repr(transparent)]
+pub struct RegularScreenblock {
+  pub data: [RegularScreenblockEntry; 32 * 32],
+}
+#}
+

Alright, so, as I said those things are kinda big, we don't really want to be +building them up on the stack if we can avoid it, so we'll write one straight +into memory at the correct location.

+

+# #![allow(unused_variables)]
+#fn main() {
+pub fn checker_screenblock(slot: usize, a_entry: RegularScreenblockEntry, b_entry: RegularScreenblockEntry) {
+  let mut p = VRAM.cast::<RegularScreenblock>().offset(slot as isize).cast::<RegularScreenblockEntry>();
+  let mut checker = true;
+  for _row in 0..32 {
+    for _col in 0..32 {
+      unsafe { p.write(if checker { a_entry } else { b_entry }) };
+      p = p.offset(1);
+      checker = !checker;
+    }
+    checker = !checker;
+  }
+}
+#}
+

And then we add this into main

+

+# #![allow(unused_variables)]
+#fn main() {
+  // screenblock
+  let light_entry = RegularScreenblockEntry::from_tile_id(0);
+  let dark_entry = RegularScreenblockEntry::from_tile_id(1);
+  checker_screenblock(8, light_entry, dark_entry);
+#}
+

Background IO Registers

+

Our most important step is of course the IO register step. There's four +different background layers, but each of them has the same format for their +control register. For the moment, all that we care about is being able to set +the "screen base block" value.

+

+# #![allow(unused_variables)]
+#fn main() {
+#[derive(Clone, Copy, Default, PartialEq, Eq)]
+#[repr(transparent)]
+pub struct BackgroundControlSetting(u16);
+
+impl BackgroundControlSetting {
+  pub fn from_base_block(sbb: u16) -> Self {
+    BackgroundControlSetting(sbb << 8)
+  }
+}
+
+pub const BG0CNT: VolatilePtr<BackgroundControlSetting> = VolatilePtr(0x400_0008 as *mut BackgroundControlSetting);
+#}
+

And... that's all it takes for us to be able to add a line into main

+

+# #![allow(unused_variables)]
+#fn main() {
+  // bg0 control
+  unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) };
+#}
+

Set The Display Control Register

+

We're finally ready to set the display control register and get things going.

+

We've slightly glossed over it so far, but when the GBA is first booted most +everything within the address space will be all zeroed. However, the display +control register has the "Force VBlank" bit enabled by the BIOS, giving you a +moment to put the memory in place that you'll need for the first frame.

+

So, now that have got all of our memory set, we'll overwrite the initial +display control register value with what we'll call "just enable bg0".

+

+# #![allow(unused_variables)]
+#fn main() {
+#[derive(Clone, Copy, Default, PartialEq, Eq)]
+#[repr(transparent)]
+pub struct DisplayControlSetting(u16);
+
+impl DisplayControlSetting {
+  pub const JUST_ENABLE_BG0: DisplayControlSetting = DisplayControlSetting(1 << 8);
+}
+
+pub const DISPCNT: VolatilePtr<DisplayControlSetting> = VolatilePtr(0x0400_0000 as *mut DisplayControlSetting);
+#}
+

And so finally we have a complete main

+
#[start]
+fn main(_argc: isize, _argv: *const *const u8) -> isize {
+  // bg palette
+  set_bg_palette_4bpp(0, 1, WHITE);
+  set_bg_palette_4bpp(0, 2, LIGHT_GRAY);
+  set_bg_palette_4bpp(0, 3, DARK_GRAY);
+  // bg tiles
+  set_bg_tile_4bpp(0, 0, ALL_TWOS);
+  set_bg_tile_4bpp(0, 1, ALL_THREES);
+  // screenblock
+  let light_entry = RegularScreenblockEntry::from_tile_id(0);
+  let dark_entry = RegularScreenblockEntry::from_tile_id(1);
+  checker_screenblock(8, light_entry, dark_entry);
+  // bg0 control
+  unsafe { BG0CNT.write(BackgroundControlSetting::from_base_block(8)) };
+  // Display Control
+  unsafe { DISPCNT.write(DisplayControlSetting::JUST_ENABLE_BG0) };
+  loop {
+    // TODO the whole thing
+  }
+}
+
+

And It works, Marty! It works!

+

screenshot_checkers