Use compute passes for the tests in the tests subdir. This also shakes out some issues that weren't apparent from the collatz example alone.
In particular, we need more autorelease pools to prevent things from leaking. As of this commit, the "clear" test runs correctly but the others haven't yet been converted to the compute_pass format.
The current plan is to more or less follow the wgpu/wgpu-hal approach. In the mux/backend layer (which corresponds fairly strongly to wgpu-hal), there isn't explicit construction of a compute encoder, but there are new methods for beginning and ending a compute pass. At the hub layer (which corresponds to wgpu) there will be a ComputeEncoder object.
That said, there will be some differences. The WebGPU "end" method on a compute encoder is implemented in wgpu as Drop, and that is not ideal. Also, the wgpu-hal approach to timer queries (still based on write_timestamp) is not up to the task of Metal timer queries, where the query offsets have to be specified at compute encoder creation. That's why there are different projects :)
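As a rough illustration of the two differences above, here is a minimal sketch in Rust. All names in it (`ComputePassDescriptor`, `CmdBuf`, `ComputePass`, `Event`) are hypothetical stand-ins rather than the actual piet-gpu, wgpu, or wgpu-hal types. It assumes timer query slots are passed in when the pass begins, as Metal requires, and that the pass is closed by an explicit `end` call rather than by Drop.

```rust
/// Hypothetical descriptor: query slots are fixed when the pass begins,
/// which matches the Metal requirement described above.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct ComputePassDescriptor {
    /// (start slot, end slot) in some query pool, if timing is wanted.
    pub timer_queries: Option<(u32, u32)>,
}

/// Events recorded by the toy command buffer, standing in for real backend calls.
#[derive(Debug, PartialEq)]
pub enum Event {
    BeginPass(ComputePassDescriptor),
    Dispatch([u32; 3]),
    EndPass,
}

/// Toy command buffer (assumption: a real one would wrap a backend object).
#[derive(Default)]
pub struct CmdBuf {
    pub events: Vec<Event>,
}

impl CmdBuf {
    /// Begin a compute pass; the descriptor carries the timer query slots.
    pub fn begin_compute_pass(&mut self, desc: ComputePassDescriptor) -> ComputePass<'_> {
        self.events.push(Event::BeginPass(desc));
        ComputePass { cmd_buf: self }
    }
}

/// Encoder-like handle that borrows the command buffer for the pass duration.
pub struct ComputePass<'a> {
    cmd_buf: &'a mut CmdBuf,
}

impl<'a> ComputePass<'a> {
    /// Record a dispatch with the given workgroup counts.
    pub fn dispatch(&mut self, workgroups: [u32; 3]) {
        self.cmd_buf.events.push(Event::Dispatch(workgroups));
    }

    /// Explicitly end the pass. Taking `self` by value means the handle
    /// cannot be used afterwards, and nothing relies on Drop.
    pub fn end(self) {
        self.cmd_buf.events.push(Event::EndPass);
    }
}
```

A caller would begin a pass with `timer_queries: Some((0, 1))`, record dispatches, and then call `end()`. Because `end` consumes the handle, the compiler rejects any later use of it, which is one way to make the end of the pass explicit.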
WIP: current state is that stage-style queries work on Apple Silicon, but non-Metal backends are broken, and piet-gpu is not yet updated to use the new API.
This puts most of the infrastructure in place but I'm hitting an error
that "sampleCountersInBuffer is not supported on this device".
The issue is that M1 supports stage boundaries and not command boundaries.
We'll have to rework the logic a bit (and, in the most general case, support
both).
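One possible shape for that reworked logic, sketched in Rust. The types and function here (`CounterSupport`, `TimerStrategy`, `choose_strategy`) are hypothetical and not part of any existing API. The sketch assumes the backend has already asked the device which sampling points it supports, and the preference for stage boundaries when both are available is an arbitrary choice made for illustration.

```rust
/// Hypothetical capability flags, assumed to be filled in by asking the
/// device which counter sampling points it supports.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CounterSupport {
    /// Sampling at the start and end of an encoder (works on M1).
    pub stage_boundary: bool,
    /// Sampling between commands, e.g. via sampleCountersInBuffer.
    pub command_boundary: bool,
}

/// How timer queries would be recorded on this device.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TimerStrategy {
    /// Query slots are attached when the compute encoder is created.
    StageBoundary,
    /// Timestamps are written between commands inside the encoder.
    CommandBoundary,
    /// No timer queries available.
    Unsupported,
}

/// Pick a strategy. Preferring stage boundaries when both are supported
/// is an assumption of this sketch, not a documented rule.
pub fn choose_strategy(support: CounterSupport) -> TimerStrategy {
    if support.stage_boundary {
        TimerStrategy::StageBoundary
    } else if support.command_boundary {
        TimerStrategy::CommandBoundary
    } else {
        TimerStrategy::Unsupported
    }
}
```

With flags matching the M1 behavior described above (stage boundaries only), `choose_strategy` returns `TimerStrategy::StageBoundary`.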
Start implementing stage boundaries, but it will probably require an API
change.
This gets a swapchain displayed and fills out a number of the image-related
parts of the API: image creation, binding to descriptor sets,
and blitting.