This is a bit of a revert of the load-balanced ("more parallel") coarse
path rasterizer, but includes fills and also uses atomicExchange.
I'm doing it this way because it should be considerably easier to do
flattening in this structure, even though there will be some performance
regression.
Path segments are unsorted, but other elements are using the same
sort-middle approach as before.
This is a checkpoint. At this point, there are unoptimized versions
of tile init and coarse path raster, but it isn't wired up into a
working pipeline. Also observing about a 3x performance regression in
element processing, which needs to be investigated.