Commit graph

131 commits

Elias Naur fd746ea7a6 name and comment magic constant
Follow-up to review of PR #61.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
Elias Naur 79d722df48 remove unused commands from pathseg
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
Ishi Tatsuyuki 8a499bc50e Always close fill paths, fix #68 2021-03-17 01:16:00 +09:00
Elias Naur b73eabf4eb kernel4.comp: remove unused commands
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-02-24 15:32:24 +01:00
Elias Naur 6a4e26ef2a all: add optional memory checks
Defining MEM_DEBUG in mem.h will add a size field to Alloc and enable
bounds and alignment checks for every memory read and write.
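
A minimal sketch of the mechanism, assuming the shared memory[] word buffer;
touch_mem, mem_error, and ERR_OUT_OF_BOUNDS are illustrative names, not the
literal patch:

    #ifdef MEM_DEBUG
    struct Alloc { uint offset; uint size; };
    #else
    struct Alloc { uint offset; };
    #endif

    // Reject word accesses outside the allocation.
    bool touch_mem(Alloc alloc, uint offset) {
    #ifdef MEM_DEBUG
        if (offset < alloc.offset / 4 || offset >= (alloc.offset + alloc.size) / 4) {
            atomicMax(mem_error, ERR_OUT_OF_BOUNDS);
            return false;
        }
    #endif
        return true;
    }

    uint read_mem(Alloc alloc, uint offset) {
        if (!touch_mem(alloc, offset)) {
            return 0;
        }
        return memory[offset];
    }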

Notes:
- Deriving an Alloc from Path.tiles is unsound, but it's more trouble to
  convert Path.tiles from TileRef to a variable-sized Alloc.
- elements.comp notes that "We should be able to use an array of structs but the
  NV shader compiler doesn't seem to like it". If that's still relevant, do
  shared arrays of Allocs work?

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-02-15 16:07:45 +01:00
Elias Naur ee67a0a515 kernel4: simplify a tiny bit
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur 716517cc04 coarse,binning: organize bins into width_in_bins x height_in_bins
The binning shader supports up to N_TILE bins. To efficiently cover wide or
tall viewports, convert the rigid N_TILE_X x N_TILE_Y bin layout to a variable
width_in_bins x height_in_bins layout.
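
Illustratively, a bin index then derives from the variable layout rather than
from fixed constants (a sketch; everything except width_in_bins and
height_in_bins is assumed):

    // A tile's bin under the variable layout; the layout is chosen so
    // that width_in_bins * height_in_bins <= N_TILE.
    uint bin_x = tile_x / N_TILE_X;
    uint bin_y = tile_y / N_TILE_Y;
    uint bin_ix = bin_y * width_in_bins + bin_x;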

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur ef4ec772ad backdrop: repair unsound optimization
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur 8b62022749 backdrop: avoid a (benign) zero-sized read
Found with the MEM_DEBUG checks added in a later change.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur c4f5a69a0d implement variable output sizing
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur c67696714b coarse.comp: don't write Cmd_End to tiles out of bounds
If WIDTH_IN_TILES or HEIGHT_IN_TILES is not divisible by N_TILE_X or N_TILE_Y
respectively, the previously unconditional Cmd_End write would write out of
bounds.
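
The fix amounts to guarding the terminating write on the viewport bounds,
along these lines (a sketch, not the literal patch):

    // Only terminate tiles that exist in the output surface.
    if (tile_x < WIDTH_IN_TILES && tile_y < HEIGHT_IN_TILES) {
        Cmd_End_write(cmd_ref);
    }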

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur 4de67d9081 unify GPU memory management
Merge all static and dynamic buffers into just one, "memory". Add a malloc
function for dynamic allocations.

Unify static allocation offsets into a "config" buffer containing scene setup
(number of paths, number of path segments), as well as the memory offsets of
the static allocations.

Finally, set an overflow flag when an allocation fails, and make sure to exit
shader execution as soon as that triggers. Add checks before beginning
execution in case the client wants to run two or more shaders before checking
the flag.
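
A sketch of the allocation path; treating mem_offset and mem_overflow as
header fields of the memory buffer is an assumption here:

    // Bump-allocate size bytes from the shared "memory" buffer.
    uint malloc(uint size) {
        uint offset = atomicAdd(mem_offset, size);
        if (offset + size > uint(memory.length()) * 4) {
            atomicOr(mem_overflow, 1);  // client checks this after dispatch
        }
        return offset;
    }

    // At the top of each shader's main(): bail out early if a previous
    // dispatch (or an earlier stage of this one) already overflowed.
    if (mem_overflow != 0) {
        return;
    }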

The "state" buffer is left alone because it needs zero'ing and because it is
accessed with the "volatile" keyword.

Fixes #40

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur a2a2d12c5d path_coarse.comp: fix intersection inconsistencies, take 2
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.

The first case is that top intersections with the very first row would fail
the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z

In particular, y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check, which does work
and is just as consistent. Add a similar check, min(p0.x, p1.x) < tile_x0, for
deciding when to clip the segment to the left edge (but keep the consistent
xray check for deciding left edge *intersections*).
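
In sketch form, the two clipping decisions become (illustrative, not the
literal patch):

    // min() of the endpoints is exact, unlike a computed intersection,
    // so neighboring tiles always agree on it.
    if (min(p0.y, p1.y) < tile_y0) {
        // clip the segment to the tile's top edge
    }
    if (min(p0.x, p1.x) < tile_x0) {
        // clip the segment to the tile's left edge; left edge
        // *intersections* still use the xray test for consistency
    }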

The second case is that tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Finally, ensure that xx0 and xx1 don't overflow the allocated number of tiles
by clamping them *after* setting them. Adjust xx0 to include xray, just as xx1
is adjusted; I haven't seen corruption without it, but it's not obvious that
xx0 always includes xray.

While here, replace a "+=" on a guaranteed zero value with just "=".

Updates #23

[0] 29cfb8b63e

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur d21f2b68de all: add SPDX license headers
Fixes #53

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-11 18:24:35 +01:00
Elias Naur 5c04e4882b remove unused tilegroup.h and extra spaces from kernel4.comp
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-11 15:00:58 +01:00
Elias Naur 580b63e558 elements.comp: tighten state size calculations
The state header is only one word (flags), not two.

Move the partition atomic counter to a separate field instead of state[0],
simplifying state offset calculations.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-10 18:48:16 +01:00
Raph Levien 56aaf7c19a
Merge pull request #52 from linebender/fix_query_pool
Fetch correct query pool
2020-12-09 22:42:23 -08:00
Raph Levien 769d71915e Fetch correct query pool
It should fetch the last query pool, but was off by one. That worked on
my machine (Windows), but did cause panics.
2020-12-09 08:34:01 -08:00
Elias Naur 1c6ca7e5fb remove unused BinChunk type
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-08 00:45:08 +01:00
Raph Levien 634530fb91 Merge branch 'master' into image_work 2020-12-02 11:58:45 -08:00
Raph Levien 3906f348fd
Merge pull request #47 from linebender/clip_opt
Optimize clips
2020-12-02 11:57:14 -08:00
Elias Naur 29cfb8b63e eliminate inconsistent line intersections from path_coarse.comp
The finite precision of floating point computations can lead the coarse
renderer into inconsistent tile intersections, which implies impossible line
segments such as lines with gaps or double intersections. The winding number
algorithm is sensitive to these errors, which show up as incorrectly filled
paths.

This change forces all intersections to be consistent.
First, the floating point top edge intersection test is removed; top edge intersections are
completely determined by left edge intersections.
Then, left edge intersections are inserted from the tile with the last top edge
intersection, and the next top edge intersection is in turn fixed to the last
tile with a left edge intersection.

More details in the patch comments.

Fixes #23

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-01 18:35:29 +01:00
Elias Naur 19f4d9fa95 change tile segment representation to (origin, vector)
Eliminates the precision loss of the subtraction in the sign(end.x - start.x)
expression in kernel4. That's important for the next change that avoids
inconsistent line intersections in path_coarse.
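
The representation change, roughly (field names are illustrative):

    struct TileSeg {
        vec2 origin;
        vec2 vector;   // replaces an end point: vector = end - origin
        float y_edge;
    };
    // kernel4 reads sign(seg.vector.x) directly instead of computing
    // sign(end.x - start.x) from two nearly equal stored endpoints.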

Updates #23

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-01 18:34:40 +01:00
Elias Naur 2068171f96 path_coarse.comp: tighten variable scopes, delete unused variables
No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-01 18:01:04 +01:00
Raph Levien 97dcb5122e Merge branch 'master' into image_work 2020-11-29 17:09:48 -08:00
Raph Levien b8ea1e35cf Merge branch 'master' into clip_opt 2020-11-29 17:07:46 -08:00
Elias Naur feeb459fa1 remove FillMask and FillMaskInv
Obsoleted by BeginClip/EndClip.

Updates #36

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-11-29 16:59:58 +01:00
Elias Naur bd450ef461 piet-gpu-types: remove unused Segment and SegChunk types
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-11-29 16:51:35 +01:00
Raph Levien 4138f8a516 Optimize clips
Optimize tiles with clip masks that are all-zero or all-one.
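
Conceptually, the coarse stage classifies each clip tile before emitting
commands (a sketch; the flag names are assumptions):

    if (mask_all_zero) {
        // contents fully clipped out: emit nothing for this tile
    } else if (mask_all_one) {
        // clip has no effect here: elide Cmd_BeginClip/Cmd_EndClip
    } else {
        // partial coverage: emit Cmd_BeginClip, contents, Cmd_EndClip
    }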

Part of #36
2020-11-27 09:30:35 -08:00
Raph Levien facc9e0982 Use sampler for texture images
Provide images to fine rasterization kernel as readonly textures with a
sampler, rather than storage images. That lets us use the GPU's hardware
for sampling, which should be considerably more efficient.

There are a bunch of parameters that are hardcoded, but it does seem to
work.
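
The switch in sketch form (binding numbers and the array size are
illustrative):

    // Before: a storage image, manual addressing, no filtering:
    //   layout(rgba8, binding = 3) uniform readonly image2D images[N];
    //   vec4 rgba = imageLoad(images[tex_ix], ivec2(st));
    // After: sampled textures; the hardware does the filtering.
    layout(binding = 3) uniform sampler2D textures[MAX_TEXTURES];
    vec4 rgba = texture(textures[tex_ix], st);  // st in normalized coords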
2020-11-25 18:05:10 -08:00
Raph Levien 047a0830d1 Towards wiring up images to k4
This patch passes a dynamically sized array of textures to the fine
rasterizer.

A bunch of the low level Vulkan stuff is done, but only enough of the
shaders and encoders to do minimal testing. We'll want to switch from
storage images to sampled images, track the actual array of textures
during encoding, use that to build the descriptor set (which will need
to be more dynamic), and of course run image elements through the
pipeline.

Progress towards #38
2020-11-24 22:11:38 -08:00
Raph Levien 6b06d249ab Builder pattern for pipelines
Use a builder pattern for pipelines and descriptor sets, so we can add richer
functionality without hugely complicating the existing code.

WIP
2020-11-24 22:11:38 -08:00
Raph Levien a60c2dd3c8 Scratch buffer for clip stack
We keep a small window of the clip stack in registers in the fine
rasterization kernel, and when that window is exceeded, spill to global
memory, so the clip stack can be unbounded.
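
In outline (the window size and all names here are illustrative):

    #define CLIP_WINDOW 4            // top-of-stack entries kept in registers
    float clip_window[CLIP_WINDOW];
    uint clip_sp = 0;                // logical stack depth

    void clip_push(float mask) {
        if (clip_sp >= CLIP_WINDOW) {
            // Window full: spill the entry falling out of the window to a
            // global scratch buffer; popping reloads it when the window drains.
            scratch[scratch_base + clip_sp - CLIP_WINDOW] =
                clip_window[clip_sp % CLIP_WINDOW];
        }
        clip_window[clip_sp % CLIP_WINDOW] = mask;
        clip_sp++;
    }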
2020-11-22 18:14:09 -08:00
Raph Levien b928c7a3ed Restore FillMaskInv logic 2020-11-21 10:47:28 -08:00
Raph Levien 13134e7cb3 Restore FillMask logic
Per discussion, don't remove FillMask until we get unbounded clip stacks.
2020-11-21 07:00:03 -08:00
Raph Levien d14895b107 Continuing work on clips
I realized there's a problem with encoding clip bboxes relative to the
current transform (see #36 for a more detailed explanation), so this is
changing it to absolute bboxes.

This more or less gets clips working. There are optimization
opportunities (all-clear and all-opaque mask tiles), and it doesn't deal
with overflow of the blend stack, but it seems to basically work.
2020-11-20 18:25:27 -08:00
Raph Levien f53d00e6bc Add transforms and state stack
Actually handle transforms in RenderCtx (was implemented in renderer but
not actually plumbed through). This also requires maintaining a state
stack, which will also be required for clipping.

This PR also starts work on encoding clipping, including tracking
bounding boxes.

WIP, none of this is tested yet.
2020-11-20 18:25:27 -08:00
Raph Levien 47e24ec9d5 Start adding support for creating images
This is still WIP, focused on creating image resources and making them
available GPU-side.

Progress toward #38
2020-11-19 16:32:29 -08:00
Raph Levien 75c4b62730 Add hub abstraction
The hub does a little better lifetime tracking of resources (so
Rust-side references can be dropped), and in the future will be used for
dynamic selection of backend.

The migration is still a bit half-baked, as there are a bunch of
Vulkan-specific types in the signatures, but it shouldn't be too much
work to sort that out. Perhaps it can wait until there is a second
backend though.

The main motivation for this is to create image objects with lifetime
tracking, one of the things required for #38.
2020-11-18 16:06:08 -08:00
Raph Levien 301abf4db7 Minor cleanups
Mostly cleaning up some comments. Also adds a host barrier and a command
to copy a buffer to an image (in preparation for images; see #38).
2020-11-17 14:18:30 -08:00
Raph Levien 8e2f2aeeba Update dependencies
Update to latest versions of all dependencies. Among other things, this
gets us on piet 0.2, though almost all of the changes were around text,
which is not yet implemented.
2020-11-14 08:25:43 -08:00
Elias Naur b942e4035b piet-gpu/shader: ensure forward progress in decoupled lookback
The Vulkan and OpenGL specifications offer only weak forward progress guarantees, and
in practice several mobile devices fail to complete the decoupled lookback
spinloop without mitigation.

This patch implements Raph's suggestion from the "Forward Progress"
section from

https://raphlinus.github.io/gpu/2020/04/30/prefix-sum.html
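
For context, the wait in question is the classic decoupled look-back spin,
sketched below; flag_ix and the flag constant are illustrative, and the actual
mitigation follows the linked post rather than this sketch:

    // Partition ix spins until its predecessor publishes a prefix. With
    // only weak forward-progress guarantees the producer may never be
    // scheduled while this loop runs, so the wait must not be unbounded.
    while (true) {
        uint flag = state[flag_ix(ix - 1)];  // volatile, coherent read
        if (flag == FLAG_PREFIX_READY) {
            break;
        }
    }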

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-25 21:02:58 +01:00
Elias Naur bc01180519 piet-gpu/shader: delete unused is_fill from elements.comp
Delete debug code as well.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-25 20:59:54 +01:00
Elias Naur 8fab45544e shader: implement clip paths
Expand the final kernel4 stage to maintain a per-pixel mask.

Introduce two new path elements, FillMask and FillMaskInv, to fill
the mask. FillMask acts like Fill, while FillMaskInv fills the area
outside the path.

An SVG clipPath is then representable by a FillMaskInv(0.0) for every nested
path, preceded by a FillMask(1.0) to clear the mask.

The bounding box for FillMaskInv elements is the entire screen; tightening of
the bounding box is left for future work. Note that a fullscreen bounding
box is not hopelessly inefficient because completely filling a tile with
a mask is just a single CmdSolidMask per tile.
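
Sketched in kernel4 terms (the mix() formulation and names are illustrative;
area is the pixel's coverage of the path):

    // Each mask element carries a value; coverage selects where it applies.
    if (tag == Cmd_FillMask) {
        mask = mix(mask, value, area);        // apply inside the path
    } else if (tag == Cmd_FillMaskInv) {
        mask = mix(mask, value, 1.0 - area);  // apply outside the path
    }
    // FillMask(1.0) over the whole screen resets the mask; FillMaskInv(0.0)
    // per nested path then zeroes everything outside that path.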

Fixes #30

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-09 13:20:26 +02:00
Elias Naur 55cfd472a5 shader: delete unused code
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-09 13:20:26 +02:00
Elias Naur 9be0faba6f piet-gpu-types: remove unused scene elements
Delete the image compute shader as well; it is unused.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-27 18:57:53 +02:00
Elias Naur fa9bf0dc2b piet-gpu-types: remove unused ptcl types
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-27 18:30:33 +02:00
Elias Naur dceb0f9412 piet-gpu-types: remove unused annotated types
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-21 10:55:58 +02:00
Elias Naur ac3ac3ddff shader: introduce a crude setting for adjusting the maximum workgroup size
Both the Vulkan and OpenGL ES specs allow implementations to limit workgroups to
128 threads. Add an LG_WG_FACTOR setting for easy switching between 128 and 256
threads, with 256 kept as the default.
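
In sketch form (only LG_WG_FACTOR comes from the patch; the other macros are
illustrative):

    #define LG_WG_FACTOR 1                   // 0 -> 128 threads, 1 -> 256
    #define WG_SIZE (128 << LG_WG_FACTOR)
    layout(local_size_x = WG_SIZE) in;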

Manually tested that LG_WG_FACTOR = 0 (128 threads) works as expected.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-13 13:04:13 +02:00
Elias Naur 326f7f0d03 shader: delete more unused code and variables
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-13 13:03:56 +02:00