Regular Backgrounds

So, backgrounds, they're cool. Why do we call the ones here "regular" backgrounds? Because there are also "affine" backgrounds. The affine math adds a complication, though, so for now we'll just work with regular backgrounds. Other guides sometimes call the regular (non-affine) backgrounds "text mode" backgrounds.

To get your background image working you generally need to perform all of the following steps, though I suppose the exact ordering is up to you.

Tiled Video Modes

When you want a regular tiled display, you must use video mode 0 or 1. The three tiled video modes divide up the background layers like this:

  • Mode 0 allows for using all four BG layers (0 through 3) as regular backgrounds.
  • Mode 1 allows for using BG0 and BG1 as regular backgrounds, BG2 as an affine background, and BG3 not at all.
  • Mode 2 allows for BG2 and BG3 to be used as affine backgrounds, while BG0 and BG1 cannot be used at all.

We will not cover affine backgrounds in this chapter, so we will naturally be using video mode 0.

Also, note that you have to enable each background layer that you want to use within the display control register.
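As a minimal sketch of what that enabling looks like, the function below builds a display control value from a video mode and four layer-enable flags. It assumes the usual display control layout (video mode in bits 0-2, enable flags for BG0 through BG3 in bits 8-11). The function name is made up for illustration, and actually writing the value to the register is left out so that the snippet runs anywhere.

```rust
/// Builds a display control value for a tiled video mode.
/// Hypothetical helper: assumes the mode lives in bits 0-2 and the
/// enable flags for BG0..BG3 live in bits 8..11.
pub fn display_control_value(mode: u16, bg_enabled: [bool; 4]) -> u16 {
  assert!(mode < 3, "only the tiled modes 0, 1, and 2 are handled here");
  let mut value = mode;
  for (layer, enabled) in bg_enabled.iter().enumerate() {
    if *enabled {
      // Bit 8 enables BG0, bit 9 enables BG1, and so on.
      value |= 1 << (8 + layer);
    }
  }
  value
}
```

So mode 0 with only BG0 turned on comes out as 0x0100, which is the value you'd then write to the display control register.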

Get Your Palette Ready

The background palette starts at 0x500_0000 and is 256 u16 values long. It'd potentially be possible to declare a static array starting at a fixed address and use a linker script to make sure that it ends up at the right spot in the final program, but since we have to use volatile reads and writes with PALRAM anyway, we'll just reuse our VolatilePtr type. Something like this:


pub const PALRAM_BG_BASE: VolatilePtr<u16> = VolatilePtr(0x500_0000 as *mut u16);

/// Reads the color stored in the given background palette slot (0 through 255).
pub fn bg_palette(slot: usize) -> u16 {
  assert!(slot < 256);
  unsafe { PALRAM_BG_BASE.offset(slot as isize).read() }
}

/// Writes a color into the given background palette slot (0 through 255).
pub fn set_bg_palette(slot: usize, color: u16) {
  assert!(slot < 256);
  unsafe { PALRAM_BG_BASE.offset(slot as isize).write(color) }
}

As we discussed with the tile color depths, the palette can be utilized as a single block of palette values ([u16; 256]) or as 16 palbanks of 16 palette values each ([[u16;16]; 16]). This setting is assigned per background layer via IO register.
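To make the two views of the palette concrete, here's a small sketch that converts a (palbank, index) pair into the flat slot number that the functions above expect, and then into an address. The names are made up for this example, and the only assumption is the one already stated: 16 palbanks of 16 u16 colors each, starting at 0x500_0000.

```rust
/// Base address of background PALRAM, kept as a plain number so that
/// this sketch doesn't depend on any other types.
pub const PALRAM_BG_BASE_ADDRESS: usize = 0x500_0000;

/// Flat slot (0 through 255) of a color within one of the 16 palbanks.
pub fn palbank_slot(palbank: usize, index: usize) -> usize {
  assert!(palbank < 16);
  assert!(index < 16);
  palbank * 16 + index
}

/// Address of a background palette slot. Each color is one u16.
pub fn bg_palette_address(slot: usize) -> usize {
  assert!(slot < 256);
  PALRAM_BG_BASE_ADDRESS + slot * core::mem::size_of::<u16>()
}
```

For example, color 3 of palbank 2 is flat slot 35, so `set_bg_palette(palbank_slot(2, 3), color)` would touch the same memory either way you think about it.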

Get Your Tiles Ready

Tile data is placed into charblocks. A charblock is always 16 KB, so depending on color depth it will hold either 256 tiles (8bpp) or 512 tiles (4bpp). Charblocks 0, 1, 2, and 3 are all for background tiles. That's a maximum of 2048 4bpp tiles for backgrounds, but as you'll see in a moment, a particular tilemap entry can't even index that high. Instead, each background layer is assigned a "character base block", and then tilemap entries index relative to the character base block of that background layer.

Now, if you want to move in a lot of tile data you'll probably want to use a DMA routine, or at least write a function like memcopy32 for fast u32 copying from ROM into VRAM. However, for now, and because we're being very explicit since this is our first time doing it, we'll write it as functions for individual tile reads and writes.

The math works like indexing a pointer, except that we have two sizes we need to go by. First you take the base address for VRAM (0x600_0000), then add the size of a charblock (16 KB) times the index of the charblock you want to place the tile within, and then you add the index of the tile slot you're placing it into times the size of that type of tile. Like this:


use core::mem::size_of;

// VRAM, the tile types, and the charblock types are the ones we defined earlier.

pub fn bg_tile_4bpp(base_block: usize, tile_index: usize) -> Tile4bpp {
  assert!(base_block < 4);
  assert!(tile_index < 512);
  let address = VRAM + size_of::<Charblock4bpp>() * base_block + size_of::<Tile4bpp>() * tile_index;
  unsafe { VolatilePtr(address as *mut Tile4bpp).read() }
}

pub fn set_bg_tile_4bpp(base_block: usize, tile_index: usize, tile: Tile4bpp) {
  assert!(base_block < 4);
  assert!(tile_index < 512);
  let address = VRAM + size_of::<Charblock4bpp>() * base_block + size_of::<Tile4bpp>() * tile_index;
  unsafe { VolatilePtr(address as *mut Tile4bpp).write(tile) }
}

pub fn bg_tile_8bpp(base_block: usize, tile_index: usize) -> Tile8bpp {
  assert!(base_block < 4);
  assert!(tile_index < 256);
  let address = VRAM + size_of::<Charblock8bpp>() * base_block + size_of::<Tile8bpp>() * tile_index;
  unsafe { VolatilePtr(address as *mut Tile8bpp).read() }
}

pub fn set_bg_tile_8bpp(base_block: usize, tile_index: usize, tile: Tile8bpp) {
  assert!(base_block < 4);
  assert!(tile_index < 256);
  let address = VRAM + size_of::<Charblock8bpp>() * base_block + size_of::<Tile8bpp>() * tile_index;
  unsafe { VolatilePtr(address as *mut Tile8bpp).write(tile) }
}

For bulk operations, you'd do the exact same math to get your base destination pointer, then get the base source pointer for the tile data you're copying out of ROM. After that you'd do the bulk copy for the correct number of u32 values that you're trying to move (8 per tile moved for 4bpp, or 16 per tile moved for 8bpp).
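Here's a rough sketch of what such a bulk copy could look like. It isn't the DMA routine or the memcopy32 mentioned earlier, just a plain loop over raw pointers with volatile writes, and both function names are made up for this example. On the GBA the source would point into ROM and the destination into a charblock; the code itself doesn't care, so it can be tried out on ordinary arrays.

```rust
/// Copies `count` u32 values from `src` to `dest`, using a volatile
/// write for each value.
///
/// # Safety
/// Both pointers must be aligned and valid for `count` u32 values,
/// and the two regions must not overlap.
pub unsafe fn copy_u32_volatile(mut src: *const u32, mut dest: *mut u32, count: usize) {
  unsafe {
    for _ in 0..count {
      dest.write_volatile(src.read());
      // Step both pointers forward by one u32 each pass.
      src = src.add(1);
      dest = dest.add(1);
    }
  }
}

/// Number of u32 values that `tiles` tiles occupy:
/// 8 per tile for 4bpp, 16 per tile for 8bpp.
pub fn tile_word_count(tiles: usize, is_8bpp: bool) -> usize {
  tiles * if is_8bpp { 16 } else { 8 }
}
```

Note that the loop walks both pointers forward by one element each pass instead of indexing.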

GBA Limitation Note: on a modern PC (eg: x86 or x86_64) you're probably used to index based loops and iterator based loops being the same speed. The CPU can fold "base address of the array plus desired index * size per element" into the memory access itself (scaled index addressing), so computing it costs essentially nothing extra. It's slightly more complicated if there are arrays within arrays like there are here, but with normal arrays it's basically the same speed to index per loop cycle as it is to take a base address and then add +1 offset per loop cycle. However, the GBA's CPU generally can't do that for free. On the GBA, there's a genuine speed difference between looping over indexes and then indexing each loop (slow) compared to using an iterator that just stores an internal pointer and does +1 offset per loop until it reaches the end (fast). The repeated indexing can itself be an expensive step. If it's like a 3 element array it's no big deal, but if you've got a big slice of data to process, be sure to go over it with .iter() and .iter_mut() if you can, instead of looping by index. This is Rust and all, so probably you were gonna do that anyway, but just a heads up.
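To show the two loop shapes side by side, here's a small sketch. Both functions compute the same sum, and the names are made up for the example. How much the compiler optimizes either one depends on your build settings, so treat the comments as the conceptual difference and not as a measured result.

```rust
/// Index based loop: conceptually, every pass has to work out
/// base + i * size (and bounds check `i`) before it can read.
pub fn sum_by_index(data: &[u32]) -> u32 {
  let mut total = 0u32;
  for i in 0..data.len() {
    total = total.wrapping_add(data[i]);
  }
  total
}

/// Iterator based loop: the iterator keeps an internal pointer and
/// just moves it forward by one element each pass.
pub fn sum_by_iter(data: &[u32]) -> u32 {
  let mut total = 0u32;
  for value in data.iter() {
    total = total.wrapping_add(*value);
  }
  total
}
```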

Get Your Tilemap Ready

I believe that at one point I alluded to a tilemap existing. Well, just as the tiles are arranged into charblocks, the data describing what tile to show in what location is arranged into a thing called a screenblock.

A screenblock is placed into VRAM the same as the tile data charblocks. Starting at the base of VRAM (0x600_0000) there are 32 slots for the screenblock array. Each screenblock is 2048 bytes (0x800). Naturally, if our tiles are using up charblock space within VRAM and our tilemaps are using up screenblock space within the same VRAM... well, it would just be a disaster if they ran into each other. Once again, it's up to you as the programmer to determine how much space you want to devote to each thing. Each complete charblock uses up 8 screenblocks' worth of space, but you don't have to fill a complete charblock with tiles, so you can be very fiddly with how you split the memory.
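The overlap is easy to see once you write the address math down. This is a small sketch with made-up names that works on plain numbers (no pointers), using only the sizes given above: 0x800 bytes per screenblock and 16 KB (0x4000 bytes) per charblock, both counted from the base of VRAM.

```rust
pub const VRAM: usize = 0x600_0000;
pub const SCREENBLOCK_SIZE: usize = 0x800;
pub const CHARBLOCK_SIZE: usize = 0x4000;

/// Address of the start of one of the 32 screenblock slots.
pub fn screenblock_address(index: usize) -> usize {
  assert!(index < 32);
  VRAM + index * SCREENBLOCK_SIZE
}

/// Address of the start of one of the 4 background charblocks.
pub fn charblock_address(index: usize) -> usize {
  assert!(index < 4);
  VRAM + index * CHARBLOCK_SIZE
}
```

Screenblock 8 and charblock 1 start at the very same address, so a tilemap stored in screenblock 8 would clobber the first tiles of charblock 1 (and the other way around).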

Each screenblock is composed of a series of screenblock entry values, which describe what tile index to use, whether the tile should be flipped, and what palbank it should use (if any). Because both regular backgrounds and affine backgrounds are composed of screenblocks with entries, and because the affine background has a smaller format for screenblock entries, we'll name our types accordingly.


#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct RegularScreenblock {
  pub data: [RegularScreenblockEntry; 32 * 32],
}

#[derive(Debug, Clone, Copy, Default)]
#[repr(transparent)]
pub struct RegularScreenblockEntry(u16);

So, with one entry per tile, a single screenblock allows for 32x32 tiles worth of background.
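To find the entry for a given tile position you index the screenblock row by row. Here's a minimal sketch with a made-up function name; it assumes the entries are stored one full row of 32 after another, which is the usual layout for a regular screenblock.

```rust
/// Index into a 32x32 screenblock for the tile at column `tile_x`
/// and row `tile_y`, assuming the entries are stored row by row.
pub fn screenblock_entry_index(tile_x: usize, tile_y: usize) -> usize {
  assert!(tile_x < 32);
  assert!(tile_y < 32);
  tile_y * 32 + tile_x
}
```

With the RegularScreenblock type above, the entry for the tile at column 3 of row 2 would be `data[screenblock_entry_index(3, 2)]`.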

The format of a regular screenblock entry is quite simple compared to some of the IO register stuff:

  • 10 bits for tile index (relative to the character base block of the background)
  • 1 bit for horizontal flip
  • 1 bit for vertical flip
  • 4 bits for picking which palbank to use (if 4bpp, otherwise it's ignored)

impl RegularScreenblockEntry {
  pub fn tile_id(self) -> u16 {
    self.0 & 0b11_1111_1111
  }
  pub fn set_tile_id(&mut self, id: u16) {
    self.0 &= !0b11_1111_1111;
    // Mask the input so that an out of range id can't clobber the flip or palbank bits.
    self.0 |= id & 0b11_1111_1111;
  }
  pub fn horizontal_flip(self) -> bool {
    (self.0 & (1 << 0xA)) > 0
  }
  pub fn set_horizontal_flip(&mut self, bit: bool) {
    if bit {
      self.0 |= 1 << 0xA;
    } else {
      self.0 &= !(1 << 0xA);
    }
  }
  pub fn vertical_flip(self) -> bool {
    (self.0 & (1 << 0xB)) > 0
  }
  pub fn set_vertical_flip(&mut self, bit: bool) {
    if bit {
      self.0 |= 1 << 0xB;
    } else {
      self.0 &= !(1 << 0xB);
    }
  }
  pub fn palbank_index(self) -> u16 {
    self.0 >> 12
  }
  pub fn set_palbank_index(&mut self, palbank_index: u16) {
    self.0 &= 0b1111_1111_1111;
    self.0 |= palbank_index << 12;
  }
}

Now, at either 256 or 512 tiles per charblock, you might be thinking that with a 10 bit index you can index past the end of one charblock and into the next. You'd be right, mostly.

As long as you stay within the background memory region for charblocks (that is, 0 through 3), then it all works out. However, if you try to get the background rendering to reach outside of the background charblocks you'll get an implementation defined result. It's not the dreaded "undefined behavior" we're often worried about in programming, but the results are determined by what you're running the game on. With GBA hardware you get a bizarre result (basically another way to put garbage on the screen). With a DS it acts as if the tiles were all 0s. If you use an emulator it might or might not allow for you to do this, it's up to the emulator writers.

Set Your IO Registers

Instead of just a single IO register to learn about this time, there are two separate groups of related registers.

Background Control

  • BG0CNT (0x400_0008): BG0 Control
  • BG1CNT (0x400_000A): BG1 Control
  • BG2CNT (0x400_000C): BG2 Control
  • BG3CNT (0x400_000E): BG3 Control

Each of these is a read/write u16 location. This is where we get to all of the important details that we've been putting off. Starting from the lowest bit:

  • 2 bits for the priority.
  • 2 bits for "character base block", the charblock that all of the tile indexes for this background are offset from.
  • 2 bits that are not used.
  • 1 bit for mosaic effect being enabled (we'll get to that below).
  • 1 bit to enable 8bpp, otherwise 4bpp is used.
  • 5 bits to pick the "screen base block", the screenblock that serves as the base value for this background.
  • 1 bit that is not used in regular mode, but in affine mode it can be enabled to cause the affine background to wrap around at the edges.
  • 2 bits for the background size.
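
Packing those fields into a u16 could be sketched like this. The function name and parameter names are made up for the example, and it assumes the usual hardware layout: the fields sit in the order listed from the lowest bit upward, with bits 4 and 5 unused between the character base block and the mosaic flag. The affine wrap bit (bit 13) is simply left as 0, since it doesn't apply to regular backgrounds.

```rust
/// Builds a background control value for a regular background.
/// Assumed layout: priority in bits 0-1, character base block in
/// bits 2-3, mosaic in bit 6, 8bpp in bit 7, screen base block in
/// bits 8-12, and size in bits 14-15.
pub fn bg_control_value(
  priority: u16,
  char_base_block: u16,
  mosaic: bool,
  is_8bpp: bool,
  screen_base_block: u16,
  size: u16,
) -> u16 {
  assert!(priority < 4);
  assert!(char_base_block < 4);
  assert!(screen_base_block < 32);
  assert!(size < 4);
  priority
    | (char_base_block << 2)
    | ((mosaic as u16) << 6)
    | ((is_8bpp as u16) << 7)
    | (screen_base_block << 8)
    | (size << 14)
}
```

For instance, a 4bpp background with tiles in charblock 0 and its tilemap in screenblock 8 comes out as 0x0800.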

The size works a little funny. When size is 0 only the base screenblock is used. If size is 1 or 2 then the base screenblock and the following screenblock are placed next to each other (horizontally for 1, vertically for 2). If the size is 3 then the base screenblock and the following three screenblocks are arranged into a 2x2 grid of screenblocks.
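
As a sketch of how that plays out, the function below works out which screenblock (counted from the screen base block) holds a given tile of the whole background. The name is made up for the example, and for size 3 it assumes that the 2x2 grid is filled left to right and then top to bottom.

```rust
/// Which screenblock, counted from the screen base block, holds the
/// tile at (`tile_x`, `tile_y`) of the whole background.
pub fn screenblock_offset(size: u16, tile_x: usize, tile_y: usize) -> usize {
  // Background dimensions measured in screenblocks.
  let (width_blocks, height_blocks) = match size {
    0 => (1, 1),
    1 => (2, 1),
    2 => (1, 2),
    3 => (2, 2),
    _ => panic!("size is a 2 bit value"),
  };
  assert!(tile_x < width_blocks * 32);
  assert!(tile_y < height_blocks * 32);
  (tile_y / 32) * width_blocks + (tile_x / 32)
}
```

So with size 3, tile (40, 40) lands in the fourth screenblock (offset 3), while with size 1 the tile (40, 0) is in the second one (offset 1).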

Background Offset

  • BG0HOFS (0x400_0010): BG0 X-Offset
  • BG0VOFS (0x400_0012): BG0 Y-Offset
  • BG1HOFS (0x400_0014): BG1 X-Offset
  • BG1VOFS (0x400_0016): BG1 Y-Offset
  • BG2HOFS (0x400_0018): BG2 X-Offset
  • BG2VOFS (0x400_001A): BG2 Y-Offset
  • BG3HOFS (0x400_001C): BG3 X-Offset
  • BG3VOFS (0x400_001E): BG3 Y-Offset

Each of these is a write-only u16 location. Bits 0 through 8 are used, so the offsets can be 0 through 511. They also only apply to regular backgrounds. If a background is in an affine state then you'll use different IO registers to control it (discussed in a later chapter).

The offset that you assign determines the pixel offset of the display area relative to the start of the background scene, as if the screen was a camera looking at the scene. In other words, as a BG X offset value increases, you can think of it as the camera moving to the right, or as that background moving to the left. Like when Mario walks toward the goal. Similarly, when a BG Y offset increases the camera is moving down, or the background is moving up, like when Mario falls down from a high platform.

If the display area runs past the edge of the background, the background wraps around. So depending on how much the background is scrolled and the size of the background, it will loop.
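
Both ideas fit in a few lines. This is a sketch with made-up names: the first function trims a camera position down to the 9 bits that the offset registers keep, and the second shows which background pixel column ends up under a given screen column once the looping is taken into account. The background width is passed in as pixels (256 or 512, depending on the size setting).

```rust
/// The value an offset register would keep for a camera position.
/// Only bits 0 through 8 are used, so the position wraps at 512.
pub fn scroll_register_value(offset: i32) -> u16 {
  (offset & 0x1FF) as u16
}

/// Background pixel column shown at `screen_x` for a given X offset,
/// for a background that is `bg_width` pixels wide.
pub fn background_x(screen_x: u32, scroll_x: u16, bg_width: u32) -> u32 {
  (screen_x + scroll_x as u32) % bg_width
}
```

A camera position of -1 becomes an offset of 511, and on a 256 pixel wide background with an X offset of 20 the rightmost screen column (239) shows background column 3, because the background has looped.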

Mosaic

As a special effect, you can apply mosaic to backgrounds and objects. It's just a single flag for each background, so all backgrounds will use the same mosaic settings when they have it enabled. What it actually does is split the normal image into "blocks" and then each block gets the color of the top left pixel of that block. This is the effect you see when Link hits an electric foe with his sword and the whole screen "buzzes" at you.

The mosaic control is a write-only u16 IO register at 0x400_004C.

Starting from the lowest bit, there are 4 bits each for:

  • Horizontal BG stretch
  • Vertical BG stretch
  • Horizontal object stretch
  • Vertical object stretch

The inputs should be 1 less than the desired block size. So if you set a stretch value of 5 then pixels 0-5 would be part of the first block (6 pixels), then 6-11 is the next block (another 6 pixels) and so on.
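
Here's a sketch of both halves of that: packing the four stretch values into one u16, and working out which pixel a block takes its color from along one axis. The names are made up for the example, and the packing assumes the four fields sit in the order listed above, starting from the lowest bit.

```rust
/// Builds a mosaic control value from the four 4 bit stretch values.
pub fn mosaic_value(bg_h: u16, bg_v: u16, obj_h: u16, obj_v: u16) -> u16 {
  assert!(bg_h < 16 && bg_v < 16 && obj_h < 16 && obj_v < 16);
  bg_h | (bg_v << 4) | (obj_h << 8) | (obj_v << 12)
}

/// Along one axis, the pixel whose color gets shown at `pixel` when
/// the given stretch value is in effect (block size is stretch + 1).
pub fn mosaic_source_pixel(pixel: u32, stretch: u32) -> u32 {
  let block_size = stretch + 1;
  pixel - (pixel % block_size)
}
```

With a stretch of 5, pixels 0 through 5 all show pixel 0, and pixels 6 through 11 all show pixel 6.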

If you need a pixel other than the top left one of each block to determine the mosaic color, you can carefully offset the background or image by a tiny bit, but of course that makes every mosaic block change its target pixel. You can't change the target pixel on a block by block basis.