Both the Vulkan and OpenGL ES spec allow implementations to limit workgroups to
128 threads. Add a LG_WG_FACTOR setting for easy switching between 128 and 256
threads, with 256 being kept as the default setting.
Manually tested that LG_WG_FACTOR = 0 (128 threads) works as expected.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Path segments are unsorted, but other elements are using the same
sort-middle approach as before.
This is a checkpoint. At this point, there are unoptimized versions
of tile init and coarse path raster, but it isn't wired up into a
working pipeline. Also observing about a 3x performance regression in
element processing, which needs to be investigated.