gba/book/src/01-limitations/03-volatile_destination.md
2018-12-14 22:57:14 -07:00

Volatile Destination

TODO: replace all this once "the rant" is finalized

There's a reasonable chance that you've never heard of volatile before, so what's that? Well, it's a term that can be used in more than one context, but basically it means "get your grubby mitts off my stuff you over-eager compiler".

Volatile Memory

The first, and most common, form of volatile thing is volatile memory. Volatile memory can change without your program changing it, usually because it's not a location in RAM, but instead some special location that represents an actual hardware device, or part of a hardware device perhaps. The compiler doesn't know what's going on in this situation, but when the program is actually run and the CPU gets an instruction to read or write from that location, instead of just accessing some place in RAM like with normal memory, it accesses whatever bit of hardware and does something. The details of that something depend on the hardware, but what's important is that we need to actually, definitely execute that read or write instruction.

This is not how normal memory works. Normally when the compiler sees us write values into variables and read values from variables, it's free to optimize those expressions and eliminate some of the reads and writes if it can, and generally try to save us time. Maybe it even knows some stuff about the data dependencies in our expressions and so it does some of the reads or writes out of order from what the source says, because the compiler knows that it won't actually make a difference to the operation of the program. A good and helpful friend, that compiler.

Volatile memory works almost the opposite way. With volatile memory we need the compiler to definitely emit an instruction to do a read or write and they need to happen exactly in the order that we say to do it. Each volatile read or write might have any sort of side effect that the compiler doesn't know about, and it shouldn't try to be clever about the optimization. Just do what we say, please.

In Rust, we don't mark volatile things as being a separate type of thing. Instead, we use normal raw pointers and call the read_volatile and write_volatile functions (also available as methods, if you like), which then delegate to the LLVM volatile_load and volatile_store intrinsics. In C and C++ you can tag a pointer as being volatile and then any normal read and write with it becomes the volatile version, but in Rust we have to remember to use the correct alternate function instead.
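As a minimal sketch of what that looks like, here are both spellings side by side. The helper name write_then_read is made up for this example, and an ordinary local variable stands in for a hardware register so the code can run on a desktop machine; on the GBA the pointer would come from a fixed address instead.

```rust
use core::ptr;

/// Writes `value` through a raw pointer and reads it back, using volatile
/// accesses for both steps. (Hypothetical helper, for illustration only.)
fn write_then_read(slot: &mut u16, value: u16) -> u16 {
    // The `&mut u16` coerces to a raw pointer; from here on we only use that.
    let reg: *mut u16 = slot;
    unsafe {
        // Free-function form.
        ptr::write_volatile(reg, value);
        // Method form; it performs the same kind of volatile access.
        reg.read_volatile()
    }
}

fn main() {
    // Stand-in for a hardware register.
    let mut fake_register: u16 = 0;
    let seen = write_then_read(&mut fake_register, 0x0403);
    assert_eq!(seen, 0x0403);
}
```

Neither access can be removed or merged with another one by the optimizer, even though nothing else in the program appears to look at the value.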

I'm told by the experts that this makes for a cleaner and saner design from a language design perspective, but it really kinda screws us when doing low level code. References, both mutable and shared, aren't volatile, so they compile into normal reads and writes. This means we can't do anything we'd normally do in Rust that utilizes references of any kind. Volatile blocks of memory can't use normal .iter() or .iter_mut() based iteration (which give &T or &mut T), and they also can't use normal Index and IndexMut sugar like a + x[i] or x[i] = 7.
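To make the cost concrete, here's a rough sketch of what x[i] = 7 and a + x[i] turn into when we can only use raw pointers. The function names fill_volatile and sum_volatile are invented for this example, and a plain array stands in for the volatile block.

```rust
/// Writes `value` into `len` consecutive slots starting at `base`.
///
/// Safety: `base` must be valid for `len` aligned `u16` writes.
unsafe fn fill_volatile(base: *mut u16, len: usize, value: u16) {
    for i in 0..len {
        // Stands in for `x[i] = value`.
        unsafe { base.add(i).write_volatile(value) };
    }
}

/// Adds up `len` consecutive slots starting at `base`.
///
/// Safety: `base` must be valid for `len` aligned `u16` reads.
unsafe fn sum_volatile(base: *mut u16, len: usize) -> u32 {
    let mut total = 0u32;
    for i in 0..len {
        // Stands in for `total + x[i]`.
        total += u32::from(unsafe { base.add(i).read_volatile() });
    }
    total
}

fn main() {
    let mut block = [0u16; 4];
    let base = block.as_mut_ptr();
    unsafe {
        fill_volatile(base, 4, 7);
        assert_eq!(sum_volatile(base, 4), 28);
    }
}
```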

Unlike with normal raw pointers, this pain point never goes away. There's no way to abstract over the difference with Rust as it exists now. You'd need to actually adjust the core language by adding an additional pointer type (*vol T) and possibly a reference type to go with it (&vol T) to get the right semantics. And then you'd need an IndexVol trait, and you'd need .iter_vol(), and so on for every other little thing. It would be a lot of work, and the Rust developers just aren't interested in doing all that for such a limited portion of their user population. We'll just have to deal with not having any syntax sugar.

VolatilePtr

No syntax sugar doesn't mean we can't at least make things a little easier for ourselves. Enter the VolatilePtr<T> type, which is a newtype over a *mut T. One of those "manual" newtypes I mentioned where we can't use our nice macro.

#[derive(Debug, Clone, Copy, Hash, PartialEq, Eq, PartialOrd, Ord)]
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);

Obviously we want to be able to read and write:

impl<T> VolatilePtr<T> {
  /// Performs a `read_volatile`.
  pub unsafe fn read(self) -> T {
    self.0.read_volatile()
  }

  /// Performs a `write_volatile`.
  pub unsafe fn write(self, data: T) {
    self.0.write_volatile(data);
  }

And we want a way to jump around when we do have volatile memory that's in blocks. This is where we can get ourselves into some trouble if we're not careful. We have to decide between offset and wrapping_offset. The difference is that offset optimizes better, but also it can be Undefined Behavior if the result is not "in bounds or one byte past the end of the same allocated object".

I asked ubsan (who is the expert that you should always listen to on matters like this) what that means exactly when memory mapped hardware is involved (since we never allocated anything), and the answer was that you can use an offset in statically memory mapped situations like this as long as you don't use it to jump to the address of something that Rust itself allocated at some point. Cool, we all like being able to use the one that optimizes better.

Unfortunately, the downside to using offset instead of wrapping_offset is that with offset, it's Undefined Behavior simply to calculate the out of bounds result (with wrapping_offset it's not Undefined Behavior until you use the out of bounds result). We'll have to be quite careful when we're using offset.
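Here's a small sketch of that difference, run against an ordinary array rather than real hardware. The helper names wrapped_address and in_bounds_offset are made up for this example.

```rust
/// Computes the address `count` elements away from `base` without ever
/// reading or writing through the result.
fn wrapped_address(base: *mut u16, count: isize) -> usize {
    // `wrapping_offset` is a safe method: calculating an out of bounds
    // pointer is allowed, only using it for an access would be UB.
    base.wrapping_offset(count) as usize
}

/// Moves `count` elements away from `base` with the faster `offset`.
///
/// Safety: the result must stay in bounds (or one past the end) of the
/// block that `base` points into, otherwise just calling this is UB.
unsafe fn in_bounds_offset(base: *mut u16, count: isize) -> *mut u16 {
    unsafe { base.offset(count) }
}

fn main() {
    let mut block = [0u16; 4];
    let base = block.as_mut_ptr();

    // In bounds, so `offset` is fine here.
    let third = unsafe { in_bounds_offset(base, 2) };
    assert_eq!(third as usize, base as usize + 2 * core::mem::size_of::<u16>());

    // Way out of bounds: only okay because we use `wrapping_offset`
    // and never touch the memory at the resulting address.
    let far = wrapped_address(base, 1000);
    assert_eq!(far, (base as usize).wrapping_add(2000));
}
```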

  /// Performs a normal `offset`.
  pub unsafe fn offset(self, count: isize) -> Self {
    VolatilePtr(self.0.offset(count))
  }

Now, one thing of note is that doing the offset isn't const. The math for it is something that's possible to do in a const way of course, but Rust basically doesn't allow you to fiddle raw pointers much during const right now. Maybe in the future that will improve.

If we did want to have a const function for finding the correct address within a volatile block of memory we'd have to do all the math using usize values, and then cast that value into being a pointer once we were done. It'd look something like this:

const fn address_index<T>(address: usize, index: usize) -> usize {
  address + (index * core::mem::size_of::<T>())
}
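As a quick usage sketch, this is how the helper could feed a const and then get cast into a pointer at the very end. The names BLOCK_BASE and FOURTH_SLOT and the address itself are just placeholders picked for the example, and nothing is read or written here, so it's fine to run on a desktop.

```rust
const fn address_index<T>(address: usize, index: usize) -> usize {
    address + (index * core::mem::size_of::<T>())
}

// Placeholder base address of some block of `u16` slots.
const BLOCK_BASE: usize = 0x0600_0000;

// All of the math happens at compile time, using only `usize` values.
const FOURTH_SLOT: usize = address_index::<u16>(BLOCK_BASE, 3);

fn main() {
    // Index 3 of a `u16` block is 6 bytes past the base.
    assert_eq!(FOURTH_SLOT, 0x0600_0006);

    // Only once the math is done do we turn the number into a pointer.
    let ptr = FOURTH_SLOT as *mut u16;
    assert_eq!(ptr as usize, FOURTH_SLOT);
}
```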

But back to the methods for VolatilePtr. We sometimes want to be able to cast a VolatilePtr between pointer types. Since we won't be able to do that with as, we'll have to write a method for it:

  /// Performs a cast into some new pointer type.
  pub fn cast<Z>(self) -> VolatilePtr<Z> {
    VolatilePtr(self.0 as *mut Z)
  }
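A small usage sketch, with a trimmed down copy of the type so that it stands alone. The helper as_bytes is hypothetical and a local variable stands in for the hardware location. The point is only that the address stays the same while the pointee type changes.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);

impl<T> VolatilePtr<T> {
    /// Performs a cast into some new pointer type.
    pub fn cast<Z>(self) -> VolatilePtr<Z> {
        VolatilePtr(self.0 as *mut Z)
    }
}

/// Views the same address as bytes instead of as `u32` values.
pub fn as_bytes(p: VolatilePtr<u32>) -> VolatilePtr<u8> {
    // `p as VolatilePtr<u8>` would be rejected by the compiler, because
    // `as` only works on primitive types and not on our newtype.
    p.cast()
}

fn main() {
    let mut word: u32 = 0;
    let p = VolatilePtr(&mut word as *mut u32);
    let bytes = as_bytes(p);
    // Same address, different pointee type.
    assert_eq!(bytes.0 as usize, p.0 as usize);
}
```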

Volatile Iterating

How about that Iterator stuff I said we'd be missing? We can actually make an Iterator available, it's just not the normal "iterate by shared reference or unique reference" Iterator. Instead, it's more like a "throw out a series of VolatilePtr values" style Iterator. Other than that small difference it's totally normal, and we'll be able to use map and skip and take and all those neat methods.

So how do we make this thing we need? First we check out the Implementing Iterator section in the core documentation. It says we need a struct for holding the iterator state. Right-o, probably something like this:

#[derive(Debug, Clone, Hash, PartialEq, Eq)]
pub struct VolatilePtrIter<T> {
  vol_ptr: VolatilePtr<T>,
  slots: usize,
}

And then we just implement core::iter::Iterator on that struct. Wow, that's quite the trait though! Don't worry, we only need to implement two small things and then the rest of it comes free as a bunch of default methods.

So, the code that we want to write looks like this:

impl<T> Iterator for VolatilePtrIter<T> {
  type Item = VolatilePtr<T>;

  fn next(&mut self) -> Option<VolatilePtr<T>> {
    if self.slots > 0 {
      let out = Some(self.vol_ptr);
      self.slots -= 1;
      self.vol_ptr = unsafe { self.vol_ptr.offset(1) };
      out
    } else {
      None
    }
  }
}

Except we can't write that code. What? The problem is that we used derive(Clone, Copy) on VolatilePtr. Because of a quirk in how derive works, VolatilePtr<T> will only be Copy if the T is Copy, even though the pointer itself is always Copy regardless of what it points to. Ugh, terrible. We've got three basic ways to handle this:

  • Make the Iterator implementation be for <T:Clone>, and then hope that we always have types that are Clone.
  • Hand implement every trait we want VolatilePtr (and VolatilePtrIter) to have so that we can override the fact that derive is basically broken in this case.
  • Make VolatilePtr store a usize value instead of a pointer, and then cast it to *mut T when we actually need to read and write. This would require us to also store a PhantomData<T> so that the type of the address is tracked properly, which would make it a lot more verbose to construct a VolatilePtr value.
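For comparison, here's a rough sketch of what the second option involves for just Clone and Copy (every other derived trait would need the same treatment). The NotClone type and the use_twice function are made up purely to show that the bound on T is gone.

```rust
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);

// No `T: Clone` bound here: copying the pointer never touches the pointee.
impl<T> Clone for VolatilePtr<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for VolatilePtr<T> {}

/// A pointee type that is deliberately neither `Clone` nor `Copy`.
pub struct NotClone(pub u32);

/// Uses the pointer twice by value, which only compiles because
/// `VolatilePtr<T>` is now `Copy` for every `T`.
pub fn use_twice<T>(p: VolatilePtr<T>) -> (usize, usize) {
    let first = p;
    let second = p;
    (first.0 as usize, second.0 as usize)
}

fn main() {
    let mut value = NotClone(5);
    let p = VolatilePtr(&mut value as *mut NotClone);
    let (a, b) = use_twice(p);
    assert_eq!(a, b);
}
```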

None of those options are particularly appealing. I guess we'll do the first one because it's the least amount of up front trouble, and I don't think we'll need to be iterating non-Clone values. All we do to pick that option is add the bound to the very start of the impl block, where we introduce the T:

impl<T: Clone> Iterator for VolatilePtrIter<T> {
  type Item = VolatilePtr<T>;

  fn next(&mut self) -> Option<VolatilePtr<T>> {
    if self.slots > 0 {
      let out = Some(self.vol_ptr.clone());
      self.slots -= 1;
      self.vol_ptr = unsafe { self.vol_ptr.clone().offset(1) };
      out
    } else {
      None
    }
  }
}

What's going on here? Okay so our iterator has a number of slots that it'll go over, and then when it's out of slots it starts producing None forever. That's actually pretty simple. We're also masking some unsafety. In this case, we'll rely on the person who made the VolatilePtrIter to have selected the correct number of slots. This gives us a new method for VolatilePtr:

  pub unsafe fn iter_slots(self, slots: usize) -> VolatilePtrIter<T> {
    VolatilePtrIter {
      vol_ptr: self,
      slots,
    }
  }

With this design, making the VolatilePtrIter at the start is unsafe (we have to trust the caller that the right number of slots exists), and then using it after that is totally safe (if the right number of slots was given we'll never screw up our end of it).
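Here's a sketch of how that plays out in use, with a trimmed down copy of the types so the example stands alone. The fill_and_sum helper is invented for this example, and it walks over an ordinary slice as a stand-in for a block of hardware memory.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);

impl<T> VolatilePtr<T> {
    pub unsafe fn read(self) -> T {
        unsafe { self.0.read_volatile() }
    }
    pub unsafe fn write(self, data: T) {
        unsafe { self.0.write_volatile(data) }
    }
    pub unsafe fn offset(self, count: isize) -> Self {
        VolatilePtr(unsafe { self.0.offset(count) })
    }
    pub unsafe fn iter_slots(self, slots: usize) -> VolatilePtrIter<T> {
        VolatilePtrIter { vol_ptr: self, slots }
    }
}

pub struct VolatilePtrIter<T> {
    vol_ptr: VolatilePtr<T>,
    slots: usize,
}

impl<T: Clone> Iterator for VolatilePtrIter<T> {
    type Item = VolatilePtr<T>;
    fn next(&mut self) -> Option<VolatilePtr<T>> {
        if self.slots > 0 {
            let out = Some(self.vol_ptr.clone());
            self.slots -= 1;
            self.vol_ptr = unsafe { self.vol_ptr.clone().offset(1) };
            out
        } else {
            None
        }
    }
}

/// Writes `value` to every slot of `block`, then adds all the slots up.
pub fn fill_and_sum(block: &mut [u16], value: u16) -> u32 {
    let base = VolatilePtr(block.as_mut_ptr());
    let slots = block.len();
    // The unsafe part: we promise that `slots` matches the real block size.
    for p in unsafe { base.iter_slots(slots) } {
        unsafe { p.write(value) };
    }
    // The usual iterator adapters like `map` and `sum` work as normal.
    unsafe { base.iter_slots(slots) }
        .map(|p| u32::from(unsafe { p.read() }))
        .sum()
}

fn main() {
    let mut block = [0u16; 4];
    assert_eq!(fill_and_sum(&mut block, 3), 12);
}
```

Note that stepping through the iterator is safe, but each read and write on the pointers it hands out is still an unsafe call.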

VolatilePtr Formatting

Also, just as a little bonus that we probably won't use, we could enable our new pointer type to be formatted as a pointer value.

impl<T> core::fmt::Pointer for VolatilePtr<T> {
  /// Formats exactly like the inner `*mut T`.
  fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
    write!(f, "{:p}", self.0)
  }
}

Neat!
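If we ever did want to use it, it might look something like this sketch. It uses format! and String, so it assumes a desktop build with the standard library rather than the GBA itself, and the describe helper and the address are just placeholders.

```rust
use core::fmt;

#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);

impl<T> fmt::Pointer for VolatilePtr<T> {
    /// Formats exactly like the inner `*mut T`.
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{:p}", self.0)
    }
}

/// Formats the pointer as text.
pub fn describe(p: VolatilePtr<u16>) -> String {
    // Pass the `VolatilePtr` itself and not a reference to it: `{:p}` on a
    // `&VolatilePtr<u16>` would print the address of the wrapper instead.
    format!("{:p}", p)
}

fn main() {
    // Placeholder address; it is only formatted, never accessed.
    let p = VolatilePtr(0x0400_0000usize as *mut u16);
    println!("{}", describe(p)); // prints "0x4000000"
}
```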

VolatilePtr Complete

That was a lot of small code blocks, let's look at it all put together:

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);
impl<T> VolatilePtr<T> {
  pub unsafe fn read(self) -> T {
    self.0.read_volatile()
  }
  pub unsafe fn write(self, data: T) {
    self.0.write_volatile(data);
  }
  pub unsafe fn offset(self, count: isize) -> Self {
    VolatilePtr(self.0.offset(count))
  }
  pub fn cast<Z>(self) -> VolatilePtr<Z> {
    VolatilePtr(self.0 as *mut Z)
  }
  pub unsafe fn iter_slots(self, slots: usize) -> VolatilePtrIter<T> {
    VolatilePtrIter {
      vol_ptr: self,
      slots,
    }
  }
}
impl<T> core::fmt::Pointer for VolatilePtr<T> {
  fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
    write!(f, "{:p}", self.0)
  }
}

#[derive(Debug, Clone, Hash, PartialEq, Eq)]
pub struct VolatilePtrIter<T> {
  vol_ptr: VolatilePtr<T>,
  slots: usize,
}
impl<T: Clone> Iterator for VolatilePtrIter<T> {
  type Item = VolatilePtr<T>;
  fn next(&mut self) -> Option<VolatilePtr<T>> {
    if self.slots > 0 {
      let out = Some(self.vol_ptr.clone());
      self.slots -= 1;
      self.vol_ptr = unsafe { self.vol_ptr.clone().offset(1) };
      out
    } else {
      None
    }
  }
}

Volatile ASM

In addition to some memory locations being volatile, it's also possible for inline assembly to be declared volatile. This is basically the same idea, "hey just do what I'm telling you, don't get smart about it".

Normally when you have some asm! it's basically treated like a function: there are inputs and outputs, and the compiler will try to optimize it so that if you don't actually use the outputs it won't bother with doing those instructions. However, asm! is basically a pure black box, so the compiler doesn't know what's happening inside at all, and it can't see if there are any important side effects going on.

An example of an important side effect that doesn't have output values would be putting the CPU into a low power state while we wait for the next VBlank. This lets us save quite a bit of battery power. It requires some setup to be done safely (otherwise the GBA won't ever actually wake back up from the low power state), but the asm! you use once you're ready is just a single instruction with no return value. The compiler can't tell what's going on, so you just have to say "do it anyway".