14 KiB
Volatile Destination
TODO: replace all this one "the rant" is finalized
There's a reasonable chance that you've never heard of volatile
before, so
what's that? Well, it's a term that can be used in more than one context, but
basically it means "get your grubby mitts off my stuff you over-eager compiler".
Volatile Memory
The first, and most common, form of volatile thing is volatile memory. Volatile memory can change without your program changing it, usually because it's not a location in RAM, but instead some special location that represents an actual hardware device, or part of a hardware device perhaps. The compiler doesn't know what's going on in this situation, but when the program is actually run and the CPU gets an instruction to read or write from that location, instead of just accessing some place in RAM like with normal memory, it accesses whatever bit of hardware and does something. The details of that something depend on the hardware, but what's important is that we need to actually, definitely execute that read or write instruction.
This is not how normal memory works. Normally when the compiler sees us write values into variables and read values from variables, it's free to optimize those expressions and eliminate some of the reads and writes if it can, and generally try to save us time. Maybe it even knows some stuff about the data dependencies in our expressions and so it does some of the reads or writes out of order from what the source says, because the compiler knows that it won't actually make a difference to the operation of the program. A good and helpful friend, that compiler.
Volatile memory works almost the opposite way. With volatile memory we need the compiler to definitely emit an instruction to do a read or write and they need to happen exactly in the order that we say to do it. Each volatile read or write might have any sort of side effect that the compiler doesn't know about, and it shouldn't try to be clever about the optimization. Just do what we say, please.
In Rust, we don't mark volatile things as being a separate type of thing, instead we use normal raw pointers and then call the read_volatile and write_volatile functions (also available as methods, if you like), which then delegate to the LLVM volatile_load and volatile_store intrinsics. In C and C++ you can tag a pointer as being volatile and then any normal read and write with it becomes the volatile version, but in Rust we have to remember to use the correct alternate function instead.
I'm told by the experts that this makes for a cleaner and saner design from a
language design perspective, but it really kinda screws us when doing low
level code. References, both mutable and shared, aren't volatile, so they
compile into normal reads and writes. This means we can't do anything we'd
normally do in Rust that utilizes references of any kind. Volatile blocks of
memory can't use normal .iter()
or .iter_mut()
based iteration (which give
&T
or &mut T
), and they also can't use normal Index
and IndexMut
sugar
like a + x[i]
or x[i] = 7
.
Unlike with normal raw pointers, this pain point never goes away. There's no way
to abstract over the difference with Rust as it exists now, you'd need to
actually adjust the core language by adding an additional pointer type (*vol T
) and possibly a reference type to go with it (&vol T
) to get the right
semantics. And then you'd need an IndexVol
trait, and you'd need
.iter_vol()
, and so on for every other little thing. It would be a lot of
work, and the Rust developers just aren't interested in doing all that for such
a limited portion of their user population. We'll just have to deal with not
having any syntax sugar.
VolatilePtr
No syntax sugar doesn't mean we can't at least make things a little easier for
ourselves. Enter the VolatilePtr<T>
type, which is a newtype over a *mut T
.
One of those "manual" newtypes I mentioned where we can't use our nice macro.
#[derive(Debug, Clone, Copy, Hash, PartialEq, Eq, PartialOrd, Ord)]
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);
Obviously we want to be able to read and write:
impl<T> VolatilePtr<T> {
/// Performs a `read_volatile`.
pub unsafe fn read(self) -> T {
self.0.read_volatile()
}
/// Performs a `write_volatile`.
pub unsafe fn write(self, data: T) {
self.0.write_volatile(data);
}
And we want a way to jump around when we do have volatile memory that's in
blocks. This is where we can get ourselves into some trouble if we're not
careful. We have to decide between
offset and
wrapping_offset.
The difference is that offset
optimizes better, but also it can be Undefined
Behavior if the result is not "in bounds or one byte past the end of the same
allocated object". I asked ubsan (who is the expert
that you should always listen to on matters like this) what that means exactly
when memory mapped hardware is involved (since we never allocated anything), and
the answer was that you can use an offset
in statically memory mapped
situations like this as long as you don't use it to jump to the address of
something that Rust itself allocated at some point. Cool, we all like being able
to use the one that optimizes better. Unfortunately, the downside to using
offset
instead of wrapping_offset
is that with offset
, it's Undefined
Behavior simply to calculate the out of bounds result (with wrapping_offset
it's not Undefined Behavior until you use the out of bounds result). We'll
have to be quite careful when we're using offset
.
/// Performs a normal `offset`.
pub unsafe fn offset(self, count: isize) -> Self {
VolatilePtr(self.0.offset(count))
}
Now, one thing of note is that doing the offset
isn't const
. The math for it
is something that's possible to do in a const
way of course, but Rust
basically doesn't allow you to fiddle raw pointers much during const
right
now. Maybe in the future that will improve.
If we did want to have a const
function for finding the correct address within
a volatile block of memory we'd have to do all the math using usize
values,
and then cast that value into being a pointer once we were done. It'd look
something like this:
const fn address_index<T>(address: usize, index: usize) -> usize {
address + (index * std::mem::size_of::<T>())
}
But, back to methods for VolatilePtr
, well we sometimes want to be able to
cast a VolatilePtr
between pointer types. Since we won't be able to do that
with as
, we'll have to write a method for it:
/// Performs a cast into some new pointer type.
pub fn cast<Z>(self) -> VolatilePtr<Z> {
VolatilePtr(self.0 as *mut Z)
}
Volatile Iterating
How about that Iterator
stuff I said we'd be missing? We can actually make
an Iterator available, it's just not the normal "iterate by shared reference
or unique reference" Iterator. Instead, it's more like a "throw out a series of
VolatilePtr
values" style Iterator. Other than that small difference it's
totally normal, and we'll be able to use map and skip and take and all those
neat methods.
So how do we make this thing we need? First we check out the Implementing Iterator section in the core documentation. It says we need a struct for holding the iterator state. Right-o, probably something like this:
#[derive(Debug, Clone, Hash, PartialEq, Eq)]
pub struct VolatilePtrIter<T> {
vol_ptr: VolatilePtr<T>,
slots: usize,
}
And then we just implement core::iter::Iterator on that struct. Wow, that's quite the trait though! Don't worry, we only need to implement two small things and then the rest of it comes free as a bunch of default methods.
So, the code that we want to write looks like this:
impl<T> Iterator for VolatilePtrIter<T> {
type Item = VolatilePtr<T>;
fn next(&mut self) -> Option<VolatilePtr<T>> {
if self.slots > 0 {
let out = Some(self.vol_ptr);
self.slots -= 1;
self.vol_ptr = unsafe { self.vol_ptr.offset(1) };
out
} else {
None
}
}
}
Except we can't write that code. What? The problem is that we used
derive(Clone, Copy
on VolatilePtr
. Because of a quirk in how derive
works,
this makes VolatilePtr<T>
will only be Copy
if the T
is Copy
, even
though the pointer itself is always Copy
regardless of what it points to.
Ugh, terrible. We've got three basic ways to handle this:
- Make the
Iterator
implementation be for<T:Clone>
, and then hope that we always have types that areClone
. - Hand implement every trait we want
VolatilePtr
(andVolatilePtrIter
) to have so that we can override the fact thatderive
is basically broken in this case. - Make
VolatilePtr
store ausize
value instead of a pointer, and then cast it to*mut T
when we actually need to read and write. This would require us to also store aPhantomData<T>
so that the type of the address is tracked properly, which would make it a lot more verbose to construct aVolatilePtr
value.
None of those options are particularly appealing. I guess we'll do the first one
because it's the least amount of up front trouble, and I don't think we'll
need to be iterating non-Clone values. All we do to pick that option is add the
bound to the very start of the impl
block, where we introduce the T
:
impl<T: Clone> Iterator for VolatilePtrIter<T> {
type Item = VolatilePtr<T>;
fn next(&mut self) -> Option<VolatilePtr<T>> {
if self.slots > 0 {
let out = Some(self.vol_ptr.clone());
self.slots -= 1;
self.vol_ptr = unsafe { self.vol_ptr.clone().offset(1) };
out
} else {
None
}
}
}
What's going on here? Okay so our iterator has a number of slots that it'll go
over, and then when it's out of slots it starts producing None
forever. That's
actually pretty simple. We're also masking some unsafety too. In this case,
we'll rely on the person who made the VolatilePtrIter
to have selected the
correct number of slots. This gives us a new method for VolatilePtr
:
pub unsafe fn iter_slots(self, slots: usize) -> VolatilePtrIter<T> {
VolatilePtrIter {
vol_ptr: self,
slots,
}
}
With this design, making the VolatilePtrIter
at the start is unsafe
(we have
to trust the caller that the right number of slots exists), and then using it
after that is totally safe (if the right number of slots was given we'll never
screw up our end of it).
VolatilePtr Formatting
Also, just as a little bonus that we probably won't use, we could enable our new pointer type to be formatted as a pointer value.
impl<T> core::fmt::Pointer for VolatilePtr<T> {
/// Formats exactly like the inner `*mut T`.
fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
write!(f, "{:p}", self.0)
}
}
Neat!
VolatilePtr Complete
That was a lot of small code blocks, let's look at it all put together:
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
pub struct VolatilePtr<T>(pub *mut T);
impl<T> VolatilePtr<T> {
pub unsafe fn read(self) -> T {
self.0.read_volatile()
}
pub unsafe fn write(self, data: T) {
self.0.write_volatile(data);
}
pub unsafe fn offset(self, count: isize) -> Self {
VolatilePtr(self.0.offset(count))
}
pub fn cast<Z>(self) -> VolatilePtr<Z> {
VolatilePtr(self.0 as *mut Z)
}
pub unsafe fn iter_slots(self, slots: usize) -> VolatilePtrIter<T> {
VolatilePtrIter {
vol_ptr: self,
slots,
}
}
}
impl<T> core::fmt::Pointer for VolatilePtr<T> {
fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
write!(f, "{:p}", self.0)
}
}
#[derive(Debug, Clone, Hash, PartialEq, Eq)]
pub struct VolatilePtrIter<T> {
vol_ptr: VolatilePtr<T>,
slots: usize,
}
impl<T: Clone> Iterator for VolatilePtrIter<T> {
type Item = VolatilePtr<T>;
fn next(&mut self) -> Option<VolatilePtr<T>> {
if self.slots > 0 {
let out = Some(self.vol_ptr.clone());
self.slots -= 1;
self.vol_ptr = unsafe { self.vol_ptr.clone().offset(1) };
out
} else {
None
}
}
}
Volatile ASM
In addition to some memory locations being volatile, it's also possible for inline assembly to be declared volatile. This is basically the same idea, "hey just do what I'm telling you, don't get smart about it".
Normally when you have some asm!
it's basically treated like a function,
there's inputs and outputs and the compiler will try to optimize it so that if
you don't actually use the outputs it won't bother with doing those
instructions. However, asm!
is basically a pure black box, so the compiler
doesn't know what's happening inside at all, and it can't see if there's any
important side effects going on.
An example of an important side effect that doesn't have output values would be
putting the CPU into a low power state while we want for the next VBlank. This
lets us save quite a bit of battery power. It requires some setup to be done
safely (otherwise the GBA won't ever actually wake back up from the low power
state), but the asm!
you use once you're ready is just a single instruction
with no return value. The compiler can't tell what's going on, so you just have
to say "do it anyway".