more stuff

This commit is contained in:
Lokathor 2018-11-27 16:58:13 -07:00
parent 7629ce8117
commit caec27da7b
2 changed files with 158 additions and 105 deletions

View file

@ -37,27 +37,21 @@ We measure the quality of a PRNG based upon:
Still, every once in a while you might find some old page intended for Still, every once in a while you might find some old page intended for
compatibility with the `rand()` function in the C standard library that'll compatibility with the `rand()` function in the C standard library that'll
talk about something _crazy_ like having 15-bit PRNG outputs. Stupid as it talk about something _crazy_ like having 15-bit PRNG outputs. Stupid as it
sounds, that's real. Avoid those. We almost always want generators that give sounds, that's real. Avoid those. Whenever possible we want generators that
us uniformly distributed `u8`, `u16`, `u32`, or whatever size value we're give us uniformly distributed `u8`, `u16`, `u32`, or whatever size value
producing. From there we can mold our random bits into whatever else we need we're producing. From there we can mold our random bits into whatever else we
(eg: turning a `u8` into a "1d6" roll). need (eg: turning a `u8` into a "1d6" roll).
2) **How long does each generation cycle take?** This can be tricky for us. A 2) **How long does each generation cycle take?** This can be tricky for us. A
lot of the top quality PRNGs you'll find these days are oriented towards lot of the top quality PRNGs you'll find these days are oriented towards
64-bit machines so they do a bunch of 64-bit operations. You _can_ do that on 64-bit machines so they do a bunch of 64-bit operations. You _can_ do that on
a 32-bit machine if you have to, and the compiler will automatically "lower" a 32-bit machine if you have to, and the compiler will automatically "lower"
the 64-bit operation into a series of 32-bit operations. What we'd really the 64-bit operation into a series of 32-bit operations. What we'd really
like to pick is something that sticks to just 32-bit operations though, since like to pick is something that sticks to just 32-bit operations though, since
those will be our best candidates for fast results. As with other those will be our best candidates for fast results. We can use [Compiler
benchmarking related things, we can use [Compiler Explorer](https://rust.godbolt.org/z/JyX7z-) and tell it to build for the
Explorer](https://rust.godbolt.org/z/JyX7z-) set for the `thumbv6m-none-eabi` `thumbv6m-none-eabi` target to get a basic idea of what the ASM for a
target as a basic approximation, which we'll do in this section. Of course, generator looks like. That's not our exact target, but it's the closest
not every instruction is the same time to execute, but basically less ASM is target that's shipped with the standard rust distribution.
better for us. If you wanted to be even more precise you could also try to
coax rustc to spit out the ASM directly (though `xbuild` makes that a hair
tricky) and then pick through that and use the [execution
times](http://problemkaputt.de/gbatek.htm#armcpuoverview) listed in GBATEK to
figure out a total cycle cost, or you could even try to make some sort of
benchmarking harness for the GBA itself if you were really dedicated.
3) **What is the statistical quality of the output?** This involves heavy 3) **What is the statistical quality of the output?** This involves heavy
amounts of math. Since computers are quite good at large amounts of repeated amounts of math. Since computers are quite good at large amounts of repeated
math you might wonder if there's programs for this already, and there are. math you might wonder if there's programs for this already, and there are.
@ -73,48 +67,43 @@ We measure the quality of a PRNG based upon:
* [NIST Statistical Test * [NIST Statistical Test
Suite](https://csrc.nist.gov/projects/random-bit-generation/documentation-and-software) Suite](https://csrc.nist.gov/projects/random-bit-generation/documentation-and-software)
Note that generators with a small state size will _always_ fail the statistical Note that if a generator is called upon to produce enough output relative to its
test suites simply because the suites ask them to produce too much output state size it will basically always end up failing statistical tests. This means
relative to their state size. The same _would_ also happen to larger generators that any generator with 32-bit state will always fail in any of those test sets.
too if you ran them long enough, it's just that the amount of required output to The theoretical _minimum_ state size for any generator at all to pass the
make the generators fail can quickly range up into "100s of years" and beyond as standard suites is 36 bits, but most generators need many more than that.
your generator gets bigger. With a modern "actual" computer (desktop, server,
cloud VM, etc) a good PRNG can produce an output in about 1 nanosecond (depends
on your exact CPU of course). If we wanted to see how long it'd take to run
through a PRNG's whole state, well [2^32
nanoseconds](https://www.wolframalpha.com/input/?i=2%5E32+nanoseconds+in+years)
is 4.295 seconds, but [2^64
nanoseconds](https://www.wolframalpha.com/input/?i=2%5E64+nanoseconds+in+years)
is 584.9 _years_. Of course, the GBA can't actually run a PRNG that fast (with
our poor little 16.78MHz), but the difference in scale is still there. A small
amount of extra state can make a big difference in generator quality if your
algorithm is putting it to good use.
### Generator Size ### Generator Size
Of course, generator quality has to be held in comparison to generator size and I've mostly chosen to discuss generators that are towards the smaller end of the
features. We don't always need the highest possible quality generators. "But state size scale. In fact we'll be going over many generators that are below the
Lokathor!", I can already hear you shouting. "I want the highest quality 36-bit theoretical minimum to pass all those fancy statistical tests. Why so?
randomness at all times! The game depends on it!", you cry out. Well... does it? Well, we don't always need the highest possible quality generators.
Like, really? The [GBA
"But Lokathor!", I can already hear you shouting. "I want the highest quality
randomness at all times! The game depends on it!", you cry out.
Well... does it? Like, _really_?
The [GBA
Pokemon](https://bulbapedia.bulbagarden.net/wiki/Pseudorandom_number_generation_in_Pok%C3%A9mon) Pokemon](https://bulbapedia.bulbagarden.net/wiki/Pseudorandom_number_generation_in_Pok%C3%A9mon)
games use a _dead simple_ PRNG technique called LCG, which fails statistical games use a _dead simple_ 32-bit LCG (we'll see it below). Then starting with
tests when it's only 32 bits big like the GBA games had. Then starting with the the DS they moved to also using Mersenne Twister, which also fails several
DS they moved to also using Mersenne Twister, which fails several statistical statistical tests and is one of the most predictable PRNGs around. [Metroid
tests and is one of the most predictable PRNGs around. [Metroid
Fusion](http://wiki.metroidconstruction.com/doku.php?id=fusion:technical:rng) Fusion](http://wiki.metroidconstruction.com/doku.php?id=fusion:technical:rng)
has a 100% goofy PRNG system for enemies that would definitely never pass any has a 100% goofy PRNG system for enemies that would definitely never pass any
sort of statistics tests at all. But like, those games were still awesome. Since sort of statistics tests at all. But like, those games were still awesome. Since
we're never going to be keeping secrets safe with our generator, it's okay if we we're never going to be keeping secrets safe with our PRNG, it's okay if we
trade in some quality for something else in return (we obviously don't want to trade in some quality for something else in return (we obviously don't want to
trade quality for nothing). trade quality for nothing).
So let's talk about size: Where's the space used for the Metroid Fusion PRNG? No And you have to ask yourself: Where's the space used for the Metroid Fusion
where at all! They were already using everything involved for other things too, PRNG? No where at all. They were already using everything involved for other
so they're paying no extra cost to have the randomization they do. How much does things too, so they're paying no extra cost to have the randomization they do.
it cost Pokemon to throw in a 32-bit LCG? Just 4 bytes, might as well. How much How much does it cost Pokemon to throw in a 32-bit LCG? Just 4 bytes, might as
does it cost to add in a Mersenne Twister? ~2,500 bytes ya say? I'm sorry _what well. How much does it cost to add in a Mersenne Twister? ~2,500 bytes ya say?
on Earth_? Yeah, that's crazy, we're probably not doing that. I'm sorry _what on Earth_? Yeah, that sounds crazy, we're probably not doing
that one.
### k-Dimensional Equidistribution ### k-Dimensional Equidistribution
@ -153,6 +142,9 @@ Absolutely not. Do you need it for pokemon? No, not even then, but a lot of the
hot new PRNGs have come out just within the past 10 years, so we can't fault hot new PRNGs have come out just within the past 10 years, so we can't fault
them too much for it. them too much for it.
Note that generators that aren't uniform to begin with naturally don't have any
amount of k-Dimensional Equidistribution.
### Other Tricks ### Other Tricks
Finally, some generators have other features that aren't strictly quantifiable. Finally, some generators have other features that aren't strictly quantifiable.
@ -189,6 +181,9 @@ Our first PRNG to mention isn't one that's at all good, but it sure might be
cute to use. It's the PRNG that Super Mario 64 had ([video explanation, cute to use. It's the PRNG that Super Mario 64 had ([video explanation,
long](https://www.youtube.com/watch?v=MiuLeTE2MeQ)). long](https://www.youtube.com/watch?v=MiuLeTE2MeQ)).
With a PRNG this simple the output of one call is _also_ the seed to the next
call.
```rust ```rust
pub fn sm64(mut input: u16) -> u16 { pub fn sm64(mut input: u16) -> u16 {
if input == 0x560A { if input == 0x560A {
@ -246,12 +241,12 @@ You should _not_ use this as your default generator if you care about quality.
It is _very_ fast though... if you want to set everything else on fire for It is _very_ fast though... if you want to set everything else on fire for
speed. If you do, please _at least_ remember that the highest bits are the best speed. If you do, please _at least_ remember that the highest bits are the best
ones, so if you're after less than 32 bits you should shift the high ones down ones, so if you're after less than 32 bits you should shift the high ones down
and keep those. If you want to turn it into a `bool` cast to `i32` and then and keep those, or if you want to turn it into a `bool` cast to `i32` and then
check if it's negative. check if it's negative, etc.
```rust ```rust
pub fn pkmn_lcg(seed: u32) -> u32 { pub fn lcg32(seed: u32) -> u32 {
seed.wrapping_mul(0x41C6_4E6D).wrapping_add(0x0000_6073) seed.wrapping_mul(0x41C6_4E6D).wrapping_add(0x6073)
} }
``` ```
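The "keep the high bits" advice can be sketched like this. The helper names `high_u16` and `high_bool` are made up for this example and aren't part of the chapter's code; the `lcg32` function is repeated only so that the block stands on its own.

```rust
/// Same 32-bit LCG step as shown in the text.
pub fn lcg32(seed: u32) -> u32 {
    seed.wrapping_mul(0x41C6_4E6D).wrapping_add(0x6073)
}

/// Hypothetical helper: keep the 16 highest bits of an LCG output,
/// since the low bits of an LCG are the weakest ones.
pub fn high_u16(output: u32) -> u16 {
    (output >> 16) as u16
}

/// Hypothetical helper: turn an LCG output into a `bool` by checking
/// the top bit, which is the sign bit once the value is cast to `i32`.
pub fn high_bool(output: u32) -> bool {
    (output as i32) < 0
}
```

Both helpers only look at the upper part of the output, so they never touch the short-period low bits.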
@ -264,6 +259,46 @@ flag to change how debug mode works, or (for more "portable" code) you can just
make the call to `wrapping_mul`. All the same goes for add and subtract and so make the call to `wrapping_mul`. All the same goes for add and subtract and so
on. on.
#### Multi-stream Generators
Note that you don't have to add a compile time constant, you could add a runtime
value instead. Doing so allows the generator to be "multi-stream", with each
different additive value being its own unique output stream. This is true of LCGs
as well as all the PCGs below (since they're LCG based). The examples here just
use a fixed stream for simplicity and to save space, but if you want streams you
can add that in for only a small amount of extra space used:
```rust
pub fn lcg_streaming(seed: u32, stream: u32) -> u32 {
seed.wrapping_mul(0x41C6_4E6D).wrapping_add(stream)
}
```
With a streaming LCG you should _probably_ pass the same stream value every
single time. If you don't, then your generator will jump between streams in some
crazy way and you lose your nice uniformity properties.
However, there is also the possibility of changing the stream value exactly when
the seed lands on a pre-determined value after transformation. We need to keep
odd stream values, and we would like to ensure our stream performs a full cycle
itself, so we'll just add 2 for simplicity:
```rust
let next_seed = lcg_streaming(seed, stream);
// It's cheapest to test for 0, so we pick 0
if next_seed == 0 {
stream = stream.wrapping_add(2)
}
```
If you adjust streams at a fixed time like that then you end up going cleanly
through one stream cycle, and then the next stream cycle, and so on. This lets
you have a vastly increased generator period for minimal additional overhead.
The number of usable stream values gets directly multiplied into your base
generator's period (2^state_size, for LCGs and PCGs). That count is 2 to the
power of the increment type's bit size minus 1, since the 1s bit must always be
set to keep the increment odd. So an LCG32 with a 32-bit stream selection would
have a period of 2^32 * 2^31 = 2^63.
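Nobody wants to wait around for 2^63 steps to check that claim, so here's a scaled-down sketch of the same idea. It assumes an 8-bit LCG with a multiplier of 109 (an illustrative pick that isn't from any reference; what matters is that `109 % 4 == 1`, which gives a full 256-step cycle for any odd increment), and it moves to the next odd stream whenever the _new_ seed lands on 0. The names `lcg8_streaming`, `step`, and `combined_period` are invented for this example.

```rust
/// 8-bit LCG step with a caller-chosen odd stream value.
/// The multiplier 109 is an assumption made for this sketch.
pub fn lcg8_streaming(seed: u8, stream: u8) -> u8 {
    seed.wrapping_mul(109).wrapping_add(stream)
}

/// Advance the combined (seed, stream) state by one step, switching to
/// the next odd stream whenever the new seed lands on 0.
pub fn step(seed: u8, stream: u8) -> (u8, u8) {
    let next_seed = lcg8_streaming(seed, stream);
    let next_stream = if next_seed == 0 {
        stream.wrapping_add(2)
    } else {
        stream
    };
    (next_seed, next_stream)
}

/// Count how many steps it takes for the combined state to come back to
/// where it started. The stream is forced odd before counting.
pub fn combined_period(start_seed: u8, start_stream: u8) -> u32 {
    let start = (start_seed, start_stream | 1);
    let mut state = step(start.0, start.1);
    let mut count: u32 = 1;
    while state != start {
        state = step(state.0, state.1);
        count += 1;
    }
    count
}
```

With 8 bits of state and 7 usable bits of stream, `combined_period(0, 1)` comes out to 32,768, which is 2^8 * 2^7 = 2^15. That's the same shape as the 2^32 * 2^31 calculation, just small enough to actually count.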
### PCG16 XSH-RR (32-bit state, 16-bit output, uniform) ### PCG16 XSH-RR (32-bit state, 16-bit output, uniform)
The [Permuted Congruential The [Permuted Congruential
@ -291,27 +326,29 @@ Obviously we'll have 32 bits of state, and so 16 bits of output.
Of course, since PCG is based on a LCG, we have to start with a good LCG base. Of course, since PCG is based on a LCG, we have to start with a good LCG base.
As I said above, a better or worse set of LCG constants can make your generator As I said above, a better or worse set of LCG constants can make your generator
better or worse. I'm not an expert, so I [asked an better or worse. The Wikipedia example for PCG has a good 64-bit constant, but
expert](http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-00996-5/S0025-5718-99-00996-5.pdf). not a 32-bit constant. So we gotta [ask an
I'm definitely not the best at reading math papers, but it seems that the expert](http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-00996-5/S0025-5718-99-00996-5.pdf)
general idea is that we want `m % 8 == 5` and `is_even(a)` to both hold for the about what a good 32-bit constant would be. I'm definitely not the best at
values we pick. There are three suggested LCG multipliers. In a chart. A chart reading math papers, but it seems that the general idea is that we want `m % 8
that's hard to understand. Truth be told I asked some folks that are good at == 5` and `is_even(a)` to both hold for the values we pick. There are three
math papers and even they couldn't make sense of the chart. They concluded the suggested LCG multipliers in a chart on page 10. A chart that's quite hard to
same as I did that we probably want to pick the `32310901` option. For an understand. Truth be told I asked several folks that are good at math papers and
additive value, we can pick any odd value, so we might as well pick something even they couldn't make sense of the chart. Eventually `timutable` read the
small so that we can do an immediate add. whole paper in depth and concluded the same as I did: that we probably want to
pick the `32310901` option.
_Immediate_ add? That sounds new. An immediate instruction is where the op code For an additive value, we can pick any odd value, so we might as well pick
bits of an instruction (add, mul, etc) don't take up much space within the full something small so that we can do an immediate add. _Immediate_ add? That sounds
instruction, so the rest of the bits can encode one side of the operation new. An immediate instruction is when one side of an operation is small enough
instead of having to specify two separate registers. It usually means one less that you can encode the value directly into the space that'd normally be for the
load you have to do, if you're working with small enough numbers. To see what I register you want to use. It basically means one less load you have to do, if
mean compare [loading the add value](https://rust.godbolt.org/z/LKCFUS) to you're working with small enough numbers. To see what I mean compare [loading
[immediate add value](https://rust.godbolt.org/z/SnZW9a). It's something you the add value](https://rust.godbolt.org/z/LKCFUS) and [immediate add
might have seen frequently in `x86` or `x86_64` ASM output, but because a thumb value](https://rust.godbolt.org/z/SnZW9a). It's something you might have seen
instruction is only 16 bits total, we can only get immediate instructions if the frequently in `x86` or `x86_64` ASM output, but because a thumb instruction is
target value is 8 bits or less, so we haven't used them too much ourselves yet. only 16 bits total, we can only get immediate instructions if the target value
is 8 bits or less, so we haven't used them too much ourselves yet.
I guess we'll pick 5, because I happen to personally like the number. I guess we'll pick 5, because I happen to personally like the number.
@ -399,21 +436,22 @@ pub fn pcg32_rxs_m_xs(seed: &mut u32) -> u32 {
### Xoshiro128** (128-bit state, 32-bit output, non-uniform) ### Xoshiro128** (128-bit state, 32-bit output, non-uniform)
It was suggested that I not show complete favoritism to just the PCG, and so we The [Xoshiro128**](http://xoshiro.di.unimi.it/xoshiro128starstar.c) generator is
will also look at the an advancement of the [Xorshift family](https://en.wikipedia.org/wiki/Xorshift).
[Xoshiro128**](http://xoshiro.di.unimi.it/xoshiro128starstar.c) generator. Take It was specifically requested, and I'm not aware of Xorshift specifically being
care not to confuse it with the used in any of my favorite games, so instead of going over Xorshift and then
[Xoroshiro128**](http://xoshiro.di.unimi.it/xoroshiro128starstar.c) generator leading up to this, we'll just jump straight to this. Take care not to confuse
this generator with the very similarly named
[Xoroshiro128**](http://xoshiro.di.unimi.it/xoroshiro128starstar.c) generator,
which is the 64 bit variant. Note the extra "ro" hiding in the 64-bit version's which is the 64 bit variant. Note the extra "ro" hiding in the 64-bit version's
name. name near the start.
Anyway, weird names aside, you can look at the C version that I linked to, or Anyway, weird names aside, it's fairly zippy. The biggest downside is that you
this Rust translation below. It's zippy and all, though 0 will be produced one can't have a seed state that's all 0s, and as a result 0 will be produced one
less time than all other outputs, making it non-uniform by just a little bit. It less time than all other outputs within a full cycle, making it non-uniform by
also has a fixed jump function. just a little bit. You also can't do a simple stream selection like with the LCG
based generators, instead it has a fixed jump function that advances a seed as
**Important:** With this generator you _must_ initialize the seed array to not if you'd done 2^64 normal generator advancements.
be all 0s before you start using the generator.
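One way to guard against the all 0s seed is a small check before the generator is ever used. This is only a sketch: the function name `nonzero_seed` and the fallback constant are assumptions made up for this example, and any non-zero array would do just as well.

```rust
/// Hypothetical helper: return a seed array that's safe to hand to a
/// generator that can't start from an all 0s state.
pub fn nonzero_seed(seed: [u32; 4]) -> [u32; 4] {
    if seed == [0; 4] {
        // Arbitrary non-zero fallback, picked only for this sketch.
        [0x9E37_79B9, 0, 0, 0]
    } else {
        seed
    }
}
```

You'd call this once on whatever seed material you gathered, and then pass the result along as the generator's state.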
```rust ```rust
pub fn xoshiro128_starstar(seed: &mut [u32; 4]) -> u32 { pub fn xoshiro128_starstar(seed: &mut [u32; 4]) -> u32 {
@ -458,27 +496,33 @@ pub fn xoshiro128_starstar_jump(seed: &mut [u32; 4]) {
[Compiler Explorer](https://rust.godbolt.org/z/PGvwZw) [Compiler Explorer](https://rust.godbolt.org/z/PGvwZw)
### More Generators? ### jsf
For completeness I'll even list some generators that I looked at as potential TODO https://gist.github.com/imneme/85cff47d4bad8de6bdeb671f9c76c814
options and then _didn't_ include, along with why I chose to skip them.
### gjrand
TODO https://gist.github.com/imneme/7a783e20f71259cc13e219829bcea4ac
### sfc
TODO https://gist.github.com/imneme/f1f7821f07cf76504a97f6537c818083
### v3b
TODO http://cipherdev.org/v3b.c
### Other Generators?
* [Xorshift family](https://en.wikipedia.org/wiki/Xorshift): the base form gives
N->N with a period of 2^N-1 (aka, non-uniform output). We already have the
LCG32 example for fast 32->32 with uniform output. There's other Xorshift
variants but none of them stood out to me since we also have `Xoshiro128**`,
which is basically the even more refined version of this general group.
* [Mersenne Twister](https://en.wikipedia.org/wiki/Mersenne_Twister): Gosh, 2.5k * [Mersenne Twister](https://en.wikipedia.org/wiki/Mersenne_Twister): Gosh, 2.5k
is just way too many for me to ever want to use this thing. If you'd really is just way too many for me to ever want to use this thing. If you'd really
like to use it, there is a like to use it, there is [a
[crate](https://docs.rs/mersenne_twister/1.1.1/mersenne_twister/) for it that crate](https://docs.rs/mersenne_twister/1.1.1/mersenne_twister/) for it that
already has it. Small catch, they use a ton of stuff from `std` that they already has it. Small catch, they use a ton of stuff from `std` that they
could be importing from `core`, so you'll have to fork it and patch it could be importing from `core`, so you'll have to fork it and patch it
yourself to get it working on the GBA. They also stupidly depend on an old yourself to get it working on the GBA. They also stupidly depend on an old
version of `rand`, so you'll have to cut out that nonsense. version of `rand`, so you'll have to cut out that nonsense.
TODO
## Placing a Value In Range ## Placing a Value In Range
I said earlier that you can always take a uniform output and then throw out some I said earlier that you can always take a uniform output and then throw out some
@ -641,12 +685,10 @@ mutability the same as we do (barbaric, I know).
Finally, what's `rng_t` actually defined as? Well, I sure don't know, but in our Finally, what's `rng_t` actually defined as? Well, I sure don't know, but in our
context it's taking nothing and then spitting out a `u32`. We'll also presume context it's taking nothing and then spitting out a `u32`. We'll also presume
that it's a different `u32` each time (not a huge leap in this context). To us that it's a different `u32` each time (not a huge leap in this context). To us
rust programmers that means we'd want something like `FnMut() -> u32`. rust programmers that means we'd want something like `impl FnMut() -> u32`.
TODO: use `impl FnMut` to avoid the trait object nonsense
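To show what passing an `impl FnMut() -> u32` looks like from the caller's side, here's a minimal sketch that wraps a seed in a closure. The `take_three` and `demo` functions are hypothetical stand-ins invented for this example; `lcg32` is repeated so the block is self-contained.

```rust
/// Same 32-bit LCG step as shown earlier in the text.
pub fn lcg32(seed: u32) -> u32 {
    seed.wrapping_mul(0x41C6_4E6D).wrapping_add(0x6073)
}

/// Hypothetical consumer: anything that takes `&mut impl FnMut() -> u32`
/// can simply call the closure each time it wants a fresh value.
pub fn take_three(rng: &mut impl FnMut() -> u32) -> [u32; 3] {
    [rng(), rng(), rng()]
}

/// Hypothetical caller: the closure owns the "advance the seed, then
/// return it" logic, so the consumer never sees the seed itself.
pub fn demo(start: u32) -> [u32; 3] {
    let mut seed = start;
    let mut rng = || {
        seed = lcg32(seed);
        seed
    };
    take_three(&mut rng)
}
```

Since `impl FnMut` is resolved at compile time, there's no trait object involved, and the compiler is free to inline the closure into the consumer.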
```rust ```rust
pub fn bounded_rand(rng: &mut FnMut() -> u32, range: u32) -> u32 { pub fn bounded_rand(rng: &mut impl FnMut() -> u32, range: u32) -> u32 {
let mut x: u32 = rng(); let mut x: u32 = rng();
let mut m: u64 = x as u64 * range as u64; let mut m: u64 = x as u64 * range as u64;
let mut l: u32 = m as u32; let mut l: u32 = m as u32;
@ -681,7 +723,7 @@ really wanna be doing those 64-bit multiplies. Let's try again with everything
scaled down one stage: scaled down one stage:
```rust ```rust
pub fn bounded_rand16(rng: &mut FnMut() -> u16, range: u16) -> u16 { pub fn bounded_rand16(rng: &mut impl FnMut() -> u16, range: u16) -> u16 {
let mut x: u16 = rng(); let mut x: u16 = rng();
let mut m: u32 = x as u32 * range as u32; let mut m: u32 = x as u32 * range as u32;
let mut l: u16 = m as u16; let mut l: u16 = m as u16;
@ -789,7 +831,7 @@ impl RandRangeU16 {
RandRangeU16 { range, threshold } RandRangeU16 { range, threshold }
} }
pub fn roll_random(&self, rng: &mut FnMut() -> u16) -> u16 { pub fn roll_random(&self, rng: &mut impl FnMut() -> u16) -> u16 {
let mut x: u16 = rng(); let mut x: u16 = rng();
let mut m: u32 = x as u32 * self.range as u32; let mut m: u32 = x as u32 * self.range as u32;
let mut l: u16 = m as u16; let mut l: u16 = m as u16;
@ -826,7 +868,7 @@ uint32_t bounded_rand(rng_t& rng, uint32_t range) {
And in Rust And in Rust
```rust ```rust
pub fn bounded_rand32(rng: &mut FnMut() -> u32, mut range: u32) -> u32 { pub fn bounded_rand32(rng: &mut impl FnMut() -> u32, mut range: u32) -> u32 {
let mut mask: u32 = !0; let mut mask: u32 = !0;
range -= 1; range -= 1;
mask >>= (range | 1).leading_zeros(); mask >>= (range | 1).leading_zeros();
@ -854,4 +896,15 @@ Life just be that way, I guess.
## Summary ## Summary
That was a whole lot. Let's put them in a table:
| Generator | Bytes | Output | Period | k-Dimensionality |
|:---------------|:-----:|:------:|:------:|:----------------:|
| sm64 | 2 | u16 | 65,114 | 0 |
| lcg32          |   4   |  u32   |  2^32  |        1         |
| pcg16_xsh_rr | 4 | u16 | 2^32 | 16 |
| pcg16_xsh_rs | 4 | u16 | 2^32 | 16 |
| pcg32_rxs_m_xs | 4 | u32 | 2^32 | 1 |
| xoshiro128**   |  16   |  u32   | 2^128-1 |        0         |
TODO TODO

View file

@ -536,7 +536,7 @@ impl RandRangeU16 {
RandRangeU16 { range, threshold } RandRangeU16 { range, threshold }
} }
pub fn roll_random(&self, rng: &mut FnMut() -> u16) -> u16 { pub fn roll_random(&self, rng: &mut impl FnMut() -> u16) -> u16 {
let mut x: u16 = rng(); let mut x: u16 = rng();
let mut m: u32 = x as u32 * self.range as u32; let mut m: u32 = x as u32 * self.range as u32;
let mut l: u16 = m as u16; let mut l: u16 = m as u16;
@ -551,7 +551,7 @@ impl RandRangeU16 {
} }
} }
pub fn bounded_rand32(rng: &mut FnMut() -> u32, mut range: u32) -> u32 { pub fn bounded_rand32(rng: &mut impl FnMut() -> u32, mut range: u32) -> u32 {
let mut mask: u32 = !0; let mut mask: u32 = !0;
range -= 1; range -= 1;
mask >>= (range | 1).leading_zeros(); mask >>= (range | 1).leading_zeros();