Coarse rasterization wasn't entirely taking line width into account.
Also fix swizzle in matrix (not yet used). And fix missing End command
in ptcl output (hasn't been a problem because buffer was cleared).
Trying to fit it into the fancy monad doesn't really work, so use a
more straightforward approach to compute it from the aggregate.
Also add yEdge logic (basically copying piet-metal). With a fix to
ELEMENT_BINNING_RATIO (which I had simply gotten wrong), the example
renders almost correctly, with small bounding box artifacts.
Write the right_edge to the binning output.
More work on encoding the fill/stroke distinction and plumbing that
through the pipeline. This is a bit unsatisfying because of the code
duplication; having an extra fill/stroke bool might be better, but I
want to avoid making the structs bigger (this could be solved by
better packing in the struct encoding).
Fills are plumbed through to the last stage. Backdrop is WIP.
This should get the "right_edge" value for each segment plumbed through
to the binning phase. It also needs to be plumbed to coarse raster and
wired up there.
Also considering WIP because none of this logic has been tested yet.
Compute tile coverage of segments using optimized algorithm. This
algorithm does a bit of setup, then uses an efficient formula to
compute the span per scan-line.
As of this point, it mostly renders stroke outlines for tiger. Some
dropouts are because the scan in the elements pass doesn't do lookback
yet, others are probably a bug.
This version seems to work but the allocation of segments has low
utilization. Probably best to allocate in chunks rather than try to
make them contiguous.
This just adds the first step of polyline stroking, which is adding it
to the scene. Also just a bit of cleaning up of dimensions into one
header file.
Handling f16 requires special work, compared to other scalars, as the minimum conversion operation for u32->f16 in GLSL (unpackHalf2x16) loads two f16s from one u32. This means that in order to minimize unnecessary calls to unpackHalf2x16, we should look-ahead to see if the current f16 has already been extracted in the process of dealing with the last f16. Similar considerations exist for write operations, where we want to pack, when possible, two f16s in one go (using packHalf2x16).