Reduce allocation of descriptor heaps. This change also enables clearing
of buffers, as the handles are needed at command dispatch time.
Also updates the tests to use clear_buffers on DX12. Looking forward to
being able to get rid of the compute shader workaround on Metal.
This is a followup on #125, and progress toward #95
This is our version of the standard message passing litmus test for
atomics. It does a bunch in parallel and permutes the reads and writes
extensively, so it's been more sensitive than existing tests.