Defining MEM_DEBUG in mem.h will add a size field to Alloc and enable
bounds and alignment checks for every memory read and write.
Notes:
- Deriving an Alloc from Path.tiles is unsound, but it's more trouble to
convert Path.tiles from TileRef to a variable sized Alloc.
- elements.comp note that "We should be able to use an array of structs but the
NV shader compiler doesn't seem to like it". If that's still relevant, does
the shared arrays of Allocs work?
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The binning shader supports up to N_TILE bins. To efficiently cover wide or
tall viewports, convert the rigid N_TILE_X x N_TILE_Y bin layout to a variable
width_in_bins x height_in_bins layout.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
If WIDTH_IN_TILES or HEIGHT_IN_TILES are not divisible by N_TILE_X or N_TILE_Y
respectively, the previously unconditional Cmd_End_write would write out of
bounds.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Merge all static and dynamic buffers to just one, "memory". Add a malloc
function for dynamic allocations.
Unify static allocation offsets into a "config" buffer containing scene setup
(number of paths, number of path segments), as well as the memory offsets of
the static allocations.
Finally, set an overflow flag when an allocation fail, and make sure to exit
shader execution as soon as that triggers. Add checks before beginning
execution in case the client wants to run two or more shaders before checking
the flag.
The "state" buffer is left alone because it needs zero'ing and because it is
accessed with the "volatile" keyword.
Fixes#40
Signed-off-by: Elias Naur <mail@eliasnaur.com>
I realized there's a problem with encoding clip bboxes relative to the
current transform (see #36 for a more detailed explanation), so this is
changing it to absolute bboxes.
This more or less gets clips working. There are optimization
opportunities (all-clear and all-opaque mask tiles), and it doesn't deal
with overflow of the blend stack, but it seems to basically work.
Expand the the final kernel4 stage to maintain a per-pixel mask.
Introduce two new path elements, FillMask and FillMaskInv, to fill
the mask. FillMask acts like Fill, while FillMaskInv fills the area
outside the path.
SVG clipPaths is then representable by a FillMaskInv(0.0) for every nested
path, preceded by a FillMask(1.0) to clear the mask.
The bounding box for FillMaskInv elements is the entire screen; tightening of
the bounding box is left for future work. Note that a fullscreen bounding
box is not hopelessly inefficient because completely filling a tile with
a mask is just a single CmdSolidMask per tile.
Fixes#30
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Both the Vulkan and OpenGL ES spec allow implementations to limit workgroups to
128 threads. Add a LG_WG_FACTOR setting for easy switching between 128 and 256
threads, with 256 being kept as the default setting.
Manually tested that LG_WG_FACTOR = 0 (128 threads) works as expected.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Have a more-parallel read of the tile structures based on bbox coverage,
and only set the bit when the tile isn't empty.
This is a speedup, but there is some duplicated work and it is possible
to improve it further.
Another speedup might be to special-case when the number of chunks in a
stroke or fill command is 1, then the segment header doesn't need
allocation and memory traffic is reduced. But right now we'll avoid the
complexity.
Coarse rasterization wasn't entirely taking line width into account.
Also fix swizzle in matrix (not yet used). And fix missing End command
in ptcl output (hasn't been a problem because buffer was cleared).
Trying to fit it into the fancy monad doesn't really work, so use a
more straightforward approach to compute it from the aggregate.
Also add yEdge logic (basically copying piet-metal). With a fix to
ELEMENT_BINNING_RATIO (which I had simply gotten wrong), the example
renders almost correctly, with small bounding box artifacts.
Write the right_edge to the binning output.
More work on encoding the fill/stroke distinction and plumbing that
through the pipeline. This is a bit unsatisfying because of the code
duplication; having an extra fill/stroke bool might be better, but I
want to avoid making the structs bigger (this could be solved by
better packing in the struct encoding).
Fills are plumbed through to the last stage. Backdrop is WIP.
Compute tile coverage of segments using optimized algorithm. This
algorithm does a bit of setup, then uses an efficient formula to
compute the span per scan-line.
As of this point, it mostly renders stroke outlines for tiger. Some
dropouts are because the scan in the elements pass doesn't do lookback
yet, others are probably a bug.