Make max workgroup size 256 and respect LG_WG_FACTOR.
Because the monoid scans only support a height of 2, this will reduce
the maximum scene complexity we can render. But it also increases
compatibility. Supporting larger scans is a TODO.
There's a bit of reorganizing as well. Shader stages are made available
from piet-gpu to the test rig, config is now a proper structure
(marshaled with bytemuck).
This commit just has the transform stage, which is a simple monoid scan
of affine transforms.
Progress toward #119