diff --git a/book/src/the-hardware-memory-map.md b/book/src/the-hardware-memory-map.md index bfa4f61..5ef79a5 100644 --- a/book/src/the-hardware-memory-map.md +++ b/book/src/the-hardware-memory-map.md @@ -43,8 +43,8 @@ That's enough about the general concept of memory mapped hardware, let's get to some GBA specifics. The GBA has the following sections in its memory map. * BIOS -* Internal Work RAM (IWRAM) * External Work RAM (EWRAM) +* Internal Work RAM (IWRAM) * IO Registers * Palette RAM (PALRAM) * Video RAM (VRAM) @@ -82,11 +82,15 @@ this information is also available in GBATEK, mostly in their [memory map](http://www.akkit.org/info/gbatek.htm#gbamemorymap) section (though somewhat spread through the rest of the document too). +Though I'm going to list the location range of each memory space below, most of +the hardware locations are actually mirrored at several points throughout the +address space. + ### BIOS -* **Location:** `0x0` to `0x3FFF` (16k) +* **Location:** `0x0` to `0x3FFF` * **Bus:** 32-bit -* **Access:** Protected read-only (see text). +* **Access:** Memory protected read-only (see text). * **Wait Cycles:** None The "basic input output system". This contains a grab bag of utilities that do @@ -98,63 +102,278 @@ but costly. As a side note, not only is BIOS memory read only, but it's memory protected so that you can't even read from bios memory unless the system is currently -executing a function that's in bios memory. There's actually a bug in one bios -call that lets you read the bytes of the rest of the bios (if you really want), -but a normal `read_volatile` won't do the trick. +executing a function that's in bios memory. If you try then the system just +gives back a nonsensical value that's not really what you asked for. If you +really want to know what's inside, there's actually a bug in one bios call +(`MidiKey2Freq`) that lets you read the bios section one byte at a time. -### Internal Work RAM (IWRAM) +Also, there's not just one bios! Of course there's the official bios from +Nintendo that's used on actual hardware, but since that's code instead of +hardware it's protected by copyright. Since a bios is needed to run a GBA +emulator properly, people have come up with their own open source versions or +they simply make the emulator special case the bios and act _as if_ the function +call had done the right thing. -* **Location:** . -* **Bus:** . -* **Access:** . -* **Wait Cycles:** . +* The [TempGBA](https://github.com/Nebuleon/TempGBA) repository has an easy to + look at version written in assembly. It's API and effects are close enough to + the Nintendo version that most games will run just fine. +* You can also check out the [mGBA + bios](https://github.com/mgba-emu/mgba/blob/master/src/gba/bios.c) if you want + to see the C version of what various bios functions are doing. ### External Work RAM (EWRAM) -* **Location:** . -* **Bus:** . -* **Access:** . -* **Wait Cycles:** . +* **Location:** `0x200_0000` to `0x203_FFFF` (256k) +* **Bus:** 16-bit +* **Access:** Read-write, any size. +* **Wait Cycles:** 2 + +The external work ram is a sizable amount of space, but the 2 wait cycles per +access and 16-bit bus mean that you should probably think of it as being a +"heap" to avoid putting things in if you don't have to. + +The GBA itself doesn't use this for anything, so any use is totally up to you. + +At the moment, the linker script and `crt0.s` files provided with the `gba` +crate also have no defined use for the EWRAM, so it's 100% on you to decide how +you wanna use them. + +(Note: There is an undocumented control register that lets you adjust the wait +cycles on EWRAM. Using it, you can turn EWRAM from the default 2 wait cycles +down to 1. However, not all GBA-like things support it. The GBA and GBA SP do, +the GBA Micro and DS do not. Emulators might or might not depending on the +particular emulator. See the [GBATEK system +control](https://problemkaputt.de/gbatek.htm#gbasystemcontrol) page for a full +description of that register, though probably only once you've read more of this +tutorial book and know how to make sense of IO registers and such.) + +### Internal Work RAM (IWRAM) + +* **Location:** `0x300_0000` to `0x300_7FFF` (32k) +* **Bus:** 32-bit +* **Access:** Read-write, any size. +* **Wait Cycles:** 0 + +This is where the "fast" memory for general purposes lives. By default the +system uses the 256 _bytes_ starting at `0x300_7F00` _and up_ for system and +interrupt purposes, while Rust's program stack starts at that same address _and +goes down_ from there. + +Even though your stack exists in this space, it's totally reasonable to use the +bottom parts of this memory space for whatever quick scratch purposes, same as +EWRAM. 32k is fairly huge, and the stack going down from the top and the scratch +data going up from the bottom are unlikely to hit each other. If they do you +were probably well on your way to a stack overflow anyway. + +The linker script and `crt0.s` file provided with the `gba` crate use the bottom +of IWRAM to store the `.data` and `.bss` [data +segments](https://en.wikipedia.org/wiki/Data_segment). That's where your global +variables get placed (both `static` and `static mut`). The `.data` segment holds +any variable that's initialized to non-zero, and the `.bss` section is for any +variable initialized to zero. When the GBA is powered on, some code in the +`crt0.s` file runs and copies the initial `.data` values into place within IWRAM +(all of `.bss` starts at 0, so there's no copy for those variables). + +If you have no global variables at all, then you don't need to worry about those +details, but if you do have some global variables then you can use the _address +of_ the `__bss_end` symbol defined in the top of the `gba` crate as a marker for +where it's safe for you to start using IWRAM without overwriting your globals. ### IO Registers -* **Location:** . -* **Bus:** . -* **Access:** . -* **Wait Cycles:** . +* **Location:** `0x400_0000` to `0x400_03FE` +* **Bus:** 32-bit +* **Access:** different for each IO register +* **Wait Cycles:** 0 + +The IO Registers are where most of the magic happens, and it's where most of the +variety happens too. Each IO register is a specific width, usually 16-bit but +sometimes 32-bit. Most of them are fully read/write, but some of them are read +only or write only. Some of them have individual bits that are read only even +when the rest of the register is writable. Some of them can be written to, but +the write doesn't change the value you read back, it sets something else. +Really. + +The IO registers are how you control every bit of hardware besides the CPU +itself. Reading the buttons, setting display modes, enabling timers, all of that +goes through different IO registers. Actually, even a few parts of the CPU's +operation can be controlled via IO register. + +We'll go over IO registers more in the next section, including a few specific +registers, and then we'll constantly encounter more IO registers as we explore +each new topic through the rest of the book. ### Palette RAM (PALRAM) -* **Location:** . -* **Bus:** . -* **Access:** . -* **Wait Cycles:** . +* **Location:** `0x500_0000` to `0x500_03FF` (1k) +* **Bus:** 16-bit +* **Access:** Read any, single bytes mirrored (see text). +* **Wait Cycles:** Video Memory Wait (see text) + +This is where the GBA stores color palette data. There's 256 slots for +Background color, and then 256 slots for Object color. + +GBA colors are 15 bits each, with five bits per channel and the highest bit +being totally ignored, so we store them as `u16` values: + +* `X_BBBBB_GGGGG_RRRRR` + +Of note is the fact that the 256 palette slots can be viewed in two different +ways. There's two different formats for images in video memory: "8 bit per +pixel" (8bpp) and "4 bit per pixel mode" (4bpp). + +* **8bpp:** Each pixel in the image is 8 bits and indexes directly into the full + 256 entry palette array. An index of 0 means that pixel should be transparent, + so there's 255 possible colors. +* **4bpp:** Each pixel in the image is 4 bits and indexes into a "palbank" of 16 + colors within the palette data. Some exterior control selects the palbank to + be used. An index of 0 still means that the pixel should be transparent, so + there's 15 possible colors. + +Different images can use different modes all at once, as long as you can fit all +the colors you want to use into your palette layout. + +PALRAM can't be written to in individual bytes. This isn't normally a problem at +all, because you wouldn't really want to write half of a color entry anyway. If +you do try to write a single byte then it gets "mirrored" into both halves of +the `u16` that would be associated with that address. For example, if you tried +to write `0x01u8` to either `0x500_0000` or `0x500_0001` then you'd actually +_effectively_ be writing `0x0101u16` to `0x500_0000`. + +PALRAM follows what we'll call the "Video Memory Wait" rule: If you to access +the memory during a vertical blank or horizontal blank period there's 0 wait +cycles, and if you try to access the memory while the display controller is +drawing there is a 1 cycle wait inserted _if_ the display controller was using +that memory at that moment. ### Video RAM (VRAM) -* **Location:** . -* **Bus:** . -* **Access:** . -* **Wait Cycles:** . +* **Location:** `0x600_0000` to `0x601_7FFF` (96k or 64k+32k depending on mode) +* **Bus:** 16-bit +* **Access:** Read any, single bytes _sometimes_ mirrored (see text). +* **Wait Cycles:** Video Memory Wait (see text) + +Video RAM is the memory for what you want the display controller to be +displaying. The GBA actually has 6 different display modes (numbered 0 through +5), and depending on the mode you're using the layout that you should imagine +VRAM having changes. Because there's so much involved here, I'll leave more +precise details to the following sections which talk about how to use VRAM in +each mode. + +VRAM can't be written to in individual bytes. If you try to write a single byte +to background VRAM the byte gets mirrored like with PALRAM, and if you try with +object VRAM the write gets ignored entirely. Exactly what address ranges those +memory types are depends on video mode, but just don't bother with individual +byte writes to VRAM. If you want to change a single byte of data (and you might) +then the correct style is to read the full `u16`, mask out the old data, mask in +your new value, and then write the whole `u16`. + +VRAM follows the same "Video Memory Wait" rule that PALRAM has. ### Object Attribute Memory (OAM) -* **Location:** . -* **Bus:** . -* **Access:** . -* **Wait Cycles:** . +* **Location:** `0x700_0000` to `0x700_03FF` (1k) +* **Bus:** 32-bit +* **Access:** Read any, single bytes no effect (see text). +* **Wait Cycles:** Video Memory Wait (see text) + +This part of memory controls the "Objects" (OBJ) on the screen. An object is +_similar to_ the concept of a "sprite". However, because of an object's size +limitations, a single sprite might require more than one object to be drawn +properly. In general, if you want to think in terms of sprites at all, you +should think of sprites as being a logical / programming concept, and objects as +being a hardware concept. + +While VRAM has the _image_ data for each object, this part of memory has the +_control_ data for each object. An objects "attributes" describe what part of +the VRAM to use, where to place is on the screen, any special graphical effects +to use, all that stuff. Each object has 6 bytes of attribute data (arranged as +three `u16` values), and there's a total of 128 objects (indexed 0 through 127). + +But 6 bytes each times 128 entries out of 1024 bytes leaves us with 256 bytes +left over. What's the other space used for? Well, it's a little weird, but after +every three `u16` object attribute fields there's one `i16` "affine parameter" +field mixed in. It takes four such fields to make a complete set of affine +parameters (a 2x2 matrix), so we get a total of 32 affine parameter entries +across all of OAM. "Affine" might sound fancy but it just means a transformation +where anything that started parallel stays parallel after the transform. The +affine parameters can be used to scale, rotate, and/or skew a background or +object as it's being displayed on the screen. It takes more computing power than +the non-affine display, so you can't display as many different things at once +when using the affine modes. + +OAM can't ever be written to with individual bytes. The write just has no effect +at all. + +OAM follows the same "Video Memory Wait" rule that PALRAM has, **and** you can +also only freely access OAM during a horizontal blank if you set a special +"HBlank Interval Free" bit in one of the IO registers (the "Display Control" +register, which we'll talk about next lesson). The reason that you might _not_ +want to set that bit is because when it's enabled you can't draw as many objects +at once. You don't lose the use of an exact number of objects, you actually lose +the use of a number of display adapter drawing cycles. Since not all objects +take the same number of cycles to render, it depends on what you're drawing. +GBATEK [has the details](https://problemkaputt.de/gbatek.htm#lcdobjoverview) if +you want to know precisely. ### Game Pak ROM (ROM) -* **Location:** . -* **Bus:** . -* **Access:** . -* **Wait Cycles:** . +* **Location:** Special (max of 32MB) +* **Bus:** 16-bit +* **Access:** Special +* **Wait Cycles:** Special + +This is where your actual game is located! As you might guess, since each +cartridge is different, the details here depend quite a bit on the cartridge +that you use for your game. Even a simple statement like "you can't write to the +ROM region" isn't true for some carts if they have FlashROM. + +The _most important_ thing to concern yourself with when considering the ROM +portion of memory is the 32MB limit. That's compiled code, images, sound, +everything put together. The total has to stay under 32MB. + +The next most important thing to consider is that 16-bit bus. It means that we +compile our programs using "Thumb state" code instead of "ARM state" code. +Details about this can be found in the GBA Assembly section of the book, but +just be aware that there's two different types of assembly on the GBA. You can +switch between them, but the default for us is always Thumb state. + +Another detail which you actually _don't_ have to think about much, but that you +might care if you're doing precise optimization, is that the ROM address space +is actually mirrored across three different locations: + +* `0x800_0000` to `0x9FF_FFFF`: Wait State 0 +* `0xA00_0000` to `0xBFF_FFFF`: Wait State 1 +* `0xC00_0000` to `0xDFF_FFFF`: Wait State 2 + +These _don't_ mean 0, 1, and 2 wait cycles, they mean the wait cycles associated +with ROM mirrors 0, 1, and 2. On some carts the game will store different parts +of the data into different chips that are wired to be accessible through +different parts of the mirroring. The actual wait cycles used are even +configurable via an IO register called the +[WAITCNT](https://problemkaputt.de/gbatek.htm#gbasystemcontrol) ("Wait Control", +I don't know why C programmers have to give everything the worst names it's not +1980 any more). ### Save RAM (SRAM) -* **Location:** . -* **Bus:** . -* **Access:** . -* **Wait Cycles:** . +* **Location:** Special (max of 64k) +* **Bus:** 8-bit +* **Access:** Special +* **Wait Cycles:** Special +The Save RAM is also part of the cart that you've got your game on, so it also +depends on your hardware. + +SRAM _starts_ at `0xE00_0000` and you can save up to however much the hardware +supports, to a maximum of 64k. However, you can only read and write SRAM one +_byte_ at a time. What's worse, while you can _write_ to SRAM using code +executing anywhere, you can only _read_ with code that's executing out of either +Internal or External Work RAM, not from with code that's executing out of ROM. +This means that you need to copy the code for doing the read into some scratch +space (either at startup or on the fly, doesn't matter) and call that function +you've carefully placed. It's a bit annoying, but soon enough a routine for it +all will be provided in the `gba` crate and we won't have to worry too much +about it. + +(TODO: Provide the routine that I just claimed we would provide.)