Defining MEM_DEBUG in mem.h will add a size field to Alloc and enable
bounds and alignment checks for every memory read and write.
Notes:
- Deriving an Alloc from Path.tiles is unsound, but it's more trouble to
convert Path.tiles from TileRef to a variable sized Alloc.
- elements.comp note that "We should be able to use an array of structs but the
NV shader compiler doesn't seem to like it". If that's still relevant, does
the shared arrays of Allocs work?
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The binning shader supports up to N_TILE bins. To efficiently cover wide or
tall viewports, convert the rigid N_TILE_X x N_TILE_Y bin layout to a variable
width_in_bins x height_in_bins layout.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
If WIDTH_IN_TILES or HEIGHT_IN_TILES are not divisible by N_TILE_X or N_TILE_Y
respectively, the previously unconditional Cmd_End_write would write out of
bounds.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Merge all static and dynamic buffers to just one, "memory". Add a malloc
function for dynamic allocations.
Unify static allocation offsets into a "config" buffer containing scene setup
(number of paths, number of path segments), as well as the memory offsets of
the static allocations.
Finally, set an overflow flag when an allocation fail, and make sure to exit
shader execution as soon as that triggers. Add checks before beginning
execution in case the client wants to run two or more shaders before checking
the flag.
The "state" buffer is left alone because it needs zero'ing and because it is
accessed with the "volatile" keyword.
Fixes#40
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.
The first case is that for top intersections with the very first row would fail
the test
tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z
In particular, y is not larger than y0 when y0 has been clipped to 0.
Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check that does work
and is just as consistent. Add similar check, min(p0.x, p1.x) < tile_x0, for
deciding when to clip the segment to the left edge (but keep consistent xray check
for deciding left edge *intersections*).
The second case is that the tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray value.
Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.
Finally, ensure that xx0 and xx1 doesn't overflow the allocated number of tiles
by clamping them *after* setting them. Adjust xx0 to include xray, just as xx1
is adjusted; I haven't seen corruption without it, but it's not obvious xx0
always includes xray.
While here, replace a "+=" on a guaranteed zero value to just "=".
Updates #23
[0] 29cfb8b63e
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The state header is only one word (flags), not two.
Move the partition atomic counter to a separate field instead of state[0],
simplifying state offset calculations.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The finite precision of floating point computations can lead the coarse
renderer into inconsistent tile intersections, which implies impossible line
segments such as lines with gaps or double intersections. The winding number
algorithm is sensitive to these errors which show up as incorrectly filled
paths.
This change forces all intersections to be consistent.
First, the floating point top edge intersection test is removed; top edge intersections are
completely determined by left edge intersections.
Then, left edge intersections are inserted from the tile with the last top edge
intersection. The next top edge is then fixed to be the last tile with a left
edge intersection.
More details in the patch comments.
Fixes#23
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Eliminates the precision loss of the subtraction in the sign(end.x - start.x)
expression in kernel4. That's important for the next change that avoids
inconsistent line intersections in path_coarse.
Updates #23
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Provide images to fine rasterization kernel as readonly textures with a
sampler, rather than storage images. That lets us use the GPU's hardware
for sampling, which should be considerably more efficient.
There are a bunch of parameters that are hardcoded, but it does seem to
work.
This patch passes a dynamically sized array of textures to the fine
rasterizer.
A bunch of the low level Vulkan stuff is done, but only enough of the
shaders and encoders to do minimal testing. We'll want to switch from
storage images to sampled images, track the actual array of textures
during encoding, use that to build the descriptor set (which will need
to be more dynamic), and of course run image elements through the
pipeline.
Progress towards #38
We keep a small window of the clip stack in registers in the fine
rasterization kernel, and when that window is exceeded, spill to global
memory, so the clip stack can be unbounded.
I realized there's a problem with encoding clip bboxes relative to the
current transform (see #36 for a more detailed explanation), so this is
changing it to absolute bboxes.
This more or less gets clips working. There are optimization
opportunities (all-clear and all-opaque mask tiles), and it doesn't deal
with overflow of the blend stack, but it seems to basically work.
Actually handle transforms in RenderCtx (was implemented in renderer but
not actually plumbed through). This also requires maintaining a state
stack, which will also be required for clipping.
This PR also starts work on encoding clipping, including tracking
bounding boxes.
WIP, none of this is tested yet.