Set the maximum workgroup size to 256 and respect LG_WG_FACTOR. Because the monoid scans only support a height of 2, this reduces the maximum scene complexity we can render, but it also improves compatibility. Supporting larger scans is a TODO.
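To make the trade-off concrete, here is a rough sketch of how the capacity of a fixed-height scan shrinks with the workgroup size. It assumes each workgroup covers `wg_size * n_rows` elements and that each scan level multiplies capacity by that partition size. The function names, the base log2 size, and the `N_ROWS` value are hypothetical illustrations, not the constants used in the shaders.

```rust
// Hypothetical value for illustration only; the real constant lives in the shaders.
const N_ROWS: u32 = 4; // assumed number of elements handled per thread

/// Workgroup size derived from an assumed base log2 size plus an
/// LG_WG_FACTOR-style shift.
fn workgroup_size(lg_base: u32, lg_wg_factor: u32) -> u32 {
    1 << (lg_base + lg_wg_factor)
}

/// Number of elements covered by one workgroup (one partition).
fn partition_size(wg_size: u32, n_rows: u32) -> u64 {
    wg_size as u64 * n_rows as u64
}

/// Capacity of a scan with the given height, assuming every level
/// fans out by one partition's worth of elements.
fn scan_capacity(partition: u64, height: u32) -> u64 {
    partition.pow(height)
}

fn main() {
    for wg in [256u32, 512, 1024] {
        let part = partition_size(wg, N_ROWS);
        println!(
            "wg_size={:4} partition={:5} height-2 capacity={}",
            wg,
            part,
            scan_capacity(part, 2)
        );
    }
}
```

Under these assumptions, halving the workgroup size cuts the height-2 capacity by a factor of four, which is why a taller scan would be needed to recover the lost headroom.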
Translate all piet-gpu shaders into DXIL and MSL; move generated files into the shader/gen directory.