Commit graph

26 commits

Tatsuyuki Ishi a7e926d67b shaders: Add .clang-format and reformat
Helps keep the code tidy.

The style is chosen to minimize the diff, but includes a bit of personal taste.
2022-01-30 16:33:14 +09:00
Raph Levien 44327fe49f Beginnings of new element pipeline
This successfully renders the tiger; fills and strokes are supported.
Other parts of the imaging model are not supported yet.

Progress toward #119
2021-12-03 15:33:01 -08:00
Raph Levien 62df7c0bd5 Remove leftover debug stuff
In response to review by Elias.
2021-07-19 08:39:44 -07:00
Raph Levien 29a8975a9a Retain subdivision results
Don't recompute the parameters from quadratic subdivision, but rather
retain them across the two phases (summing the subdivision estimate, and
generating the subdivisions). The motivation for this is that the values
were subtly different (differing by 1 or 2 least significant bits) across
the two phases. It *might* also be faster depending on ALU/memory
relative performance.

Fixes #107
2021-07-15 11:18:48 -07:00
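
A minimal sketch of the retain-instead-of-recompute pattern described above, with hypothetical names (sh_subdiv, estimate_subdiv); the real shader keeps its quadratic subdivision parameters alongside its other per-segment data:

    #version 450
    layout(local_size_x = 64) in;

    // Hypothetical per-invocation storage for the subdivision estimate, so that
    // phase 2 reuses exactly the value summed in phase 1 instead of recomputing
    // it (and possibly differing in the last couple of bits).
    shared float sh_subdiv[64];

    float estimate_subdiv(uint ix) {
        // Placeholder for the per-quadratic subdivision estimate.
        return 1.0;
    }

    void main() {
        uint ix = gl_LocalInvocationID.x;
        // Phase 1: compute the estimate once and retain it.
        sh_subdiv[ix] = estimate_subdiv(ix);
        barrier();
        // ... prefix sum over sh_subdiv to get the total subdivision count ...
        barrier();
        // Phase 2: reuse the retained value; do not call estimate_subdiv again.
        float n_subdiv = sh_subdiv[ix];
        // ... generate the subdivisions from n_subdiv ...
    }
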
Elias Naur d9d518b248 avoid non-uniform barrier control flow when exhausting memory
The compute shaders have a check for the successful completion of their
preceding stage. However, consider a shader execution path like the
following:

    void main() {
        if (mem_error != NO_ERROR) {
            return;
        }
        ...
        malloc(...);
        ...
        barrier();
        ...
    }

and a shader execution that fails to allocate memory, thereby setting
mem_error to ERR_MALLOC_FAILED in malloc before reaching the barrier. If
another shader execution then begins, its mem_error check will
make it return early and never reach the barrier.

All GPU APIs require (dynamically) uniform control flow for barriers,
and the above case may lead to GPU hangs in practice.

Fix this issue by replacing the early exits with careful checks that
don't interrupt barrier control flow.

Unfortunately, it's harder to prove the soundness of the new checks, so
this change also clears dynamic memory ranges in MEM_DEBUG mode when
memory is exhausted. The result is that accessing memory after
exhaustion triggers an error.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-20 10:15:29 +02:00
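
A minimal sketch of the fixed pattern, assuming the buffer layout and error codes shown; the point is that no invocation returns before the barrier:

    #version 450
    layout(local_size_x = 64) in;

    layout(set = 0, binding = 0) buffer Memory {
        uint mem_error;
        uint memory[];
    };

    #define NO_ERROR 0u
    #define ERR_MALLOC_FAILED 1u

    void main() {
        // Check once and remember the result instead of returning early.
        bool ok = (mem_error == NO_ERROR);

        if (ok) {
            // ... allocate and do the pre-barrier work; on allocation failure,
            // set mem_error and clear ok ...
        }

        // Every invocation reaches the barrier, keeping control flow uniform.
        barrier();

        if (ok) {
            // ... post-barrier work ...
        }
    }
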
Elias Naur 44bff2726c collapse FillCubic and StrokeCubic into Cubic with flags for fill mode
Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur a5b6bda941 add support for element flags to shaders
Commit 9afa9b86b6 added Rust support for
encoding flags into elements. This change adds support to shaders by
introducing variant tag structs:

struct VariantTag {
    uint tag;
    uint flags;
};

and returning them from Variant_tag functions.

It also adds a flags argument to write functions for enum variants that
include TagFlags.

No functional changes.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
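
A hedged sketch of what a tag-with-flags reader and a flag-aware tag word might look like; the packing and the names Element_tag / Element_FillLine are illustrative, not the actual generated code:

    #version 450
    layout(local_size_x = 1) in;

    struct VariantTag {
        uint tag;
        uint flags;
    };

    // Illustrative packing: variant in the low 16 bits, flags in the high 16
    // bits of the tag word.
    VariantTag Element_tag(uint tag_word) {
        VariantTag t;
        t.tag = tag_word & 0xffffu;
        t.flags = tag_word >> 16;
        return t;
    }

    // Illustrative flag-aware writer for a variant whose enum includes
    // TagFlags: the caller passes the flags along with the payload.
    uint Element_FillLine_tag_word(uint flags) {
        const uint Element_FillLine = 1u;
        return Element_FillLine | (flags << 16);
    }

    void main() {}
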
Elias Naur 07e07c7544 ensure consistent path segment transformation
As described in #62, the non-deterministic scene monoid may result in
slightly different transformations for path segments in an otherwise
closed path.

This change ensures consistent transformation across paths in three steps.

First, the absolute transformations computed by the scene monoid are stored
along with path segments and annotated elements.

Second, elements.comp no longer transforms path segments. Instead, each
segment is stored untransformed along with a reference to its absolute
transformation.

Finally, path_coarse performs the transformation of path segments.
Because all segments in a path share a single transformation reference,
the inconsistency in #62 is avoided.

Fixes #62

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
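
A rough sketch of the scheme with assumed names (Transform, trans_ix, apply_transform): each untransformed segment references an absolute transform, and path_coarse applies it once for every segment of the path:

    #version 450
    layout(local_size_x = 64) in;

    // Assumed layout, for illustration: the scene monoid writes absolute
    // transforms to a buffer, and each untransformed path segment carries an
    // index into it.
    struct Transform {
        vec4 mat;        // 2x2 matrix packed as (a, b, c, d)
        vec2 translate;
    };

    layout(set = 0, binding = 0) buffer Transforms {
        Transform transforms[];
    };

    struct LineSeg {
        vec2 p0;
        vec2 p1;
        uint trans_ix;   // reference to the absolute transform
    };

    vec2 apply_transform(Transform t, vec2 p) {
        return t.mat.xy * p.x + t.mat.zw * p.y + t.translate;
    }

    void main() {
        // In path_coarse (sketch): all segments of a path look up the same
        // transform, so shared endpoints transform identically.
        // Transform t = transforms[seg.trans_ix];
        // vec2 p0 = apply_transform(t, seg.p0);
        // vec2 p1 = apply_transform(t, seg.p1);
    }
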
Elias Naur 6a4e26ef2a all: add optional memory checks
Defining MEM_DEBUG in mem.h will add a size field to Alloc and enable
bounds and alignment checks for every memory read and write.

Notes:
- Deriving an Alloc from Path.tiles is unsound, but it's more trouble to
  convert Path.tiles from TileRef to a variable-sized Alloc.
- elements.comp notes that "We should be able to use an array of structs but the
  NV shader compiler doesn't seem to like it". If that's still relevant, do
  the shared arrays of Allocs work?

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-02-15 16:07:45 +01:00
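
A hedged sketch of the kind of check MEM_DEBUG enables, with assumed field and error names; Alloc carries its size in debug builds and reads are range-checked against it:

    #version 450
    layout(local_size_x = 1) in;

    layout(set = 0, binding = 0) buffer Memory {
        uint mem_error;
        uint memory[];
    };

    #define ERR_OUT_OF_BOUNDS 2u
    #define MEM_DEBUG  // normally defined in mem.h

    struct Alloc {
        uint offset;
    #ifdef MEM_DEBUG
        uint size;  // only present in debug builds
    #endif
    };

    uint read_mem(Alloc alloc, uint offset) {
    #ifdef MEM_DEBUG
        // Bounds check; offsets in this sketch are in whole words, so the
        // alignment check reduces to a range check here.
        if (offset >= alloc.size) {
            atomicMax(mem_error, ERR_OUT_OF_BOUNDS);
            return 0u;
        }
    #endif
        return memory[alloc.offset + offset];
    }

    void main() {}
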
Elias Naur 4de67d9081 unify GPU memory management
Merge all static and dynamic buffers to just one, "memory". Add a malloc
function for dynamic allocations.

Unify static allocation offsets into a "config" buffer containing scene setup
(number of paths, number of path segments), as well as the memory offsets of
the static allocations.

Finally, set an overflow flag when an allocation fails, and make sure to exit
shader execution as soon as that triggers. Add checks before beginning
execution in case the client wants to run two or more shaders before checking
the flag.

The "state" buffer is left alone because it needs zero'ing and because it is
accessed with the "volatile" keyword.

Fixes #40

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
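
A rough sketch of a bump-allocating malloc with an overflow flag, using assumed names (mem_offset, ERR_MALLOC_FAILED); the real implementation lives in mem.h:

    #version 450
    layout(local_size_x = 1) in;

    layout(set = 0, binding = 0) buffer Memory {
        uint mem_offset;   // bump pointer, in words
        uint mem_error;    // overflow / error flag
        uint memory[];     // one unified buffer for all dynamic data
    };

    #define NO_ERROR 0u
    #define ERR_MALLOC_FAILED 1u

    struct Alloc {
        uint offset;
    };

    // Bump-allocate `size` words; on exhaustion, set the overflow flag so the
    // caller (and the CPU side) can detect the failure after the dispatch.
    Alloc malloc(uint size) {
        Alloc a;
        a.offset = atomicAdd(mem_offset, size);
        if (a.offset + size > uint(memory.length())) {
            atomicMax(mem_error, ERR_MALLOC_FAILED);
        }
        return a;
    }

    void main() {
        // Illustrative use: check the flag before doing real work, in case an
        // earlier dispatch already overflowed.
        if (mem_error != NO_ERROR) {
            return;
        }
        Alloc a = malloc(64u);
        // ... write through `a` ...
    }
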
Elias Naur a2a2d12c5d path_coarse.comp: fix intersection inconsistencies, take 2
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.

The first case is that top intersections with the very first row would fail
the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z

In particular, y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check, which does work
and is just as consistent. Add a similar check, min(p0.x, p1.x) < tile_x0, for
deciding when to clip the segment to the left edge (but keep the consistent
xray check for deciding left edge *intersections*).

The second case is that tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Finally, ensure that xx0 and xx1 don't overflow the allocated number of tiles
by clamping them *after* setting them. Adjust xx0 to include xray, just as xx1
is adjusted; I haven't seen corruption without it, but it's not obvious xx0
always includes xray.

While here, replace a "+=" on a guaranteed zero value with just "=".

Updates #23

[0] 29cfb8b63e

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
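
A tiny sketch of the consistent clip decisions; the min() conditions are taken from the message above, everything around them is assumed:

    #version 450
    layout(local_size_x = 1) in;

    // Both neighboring tiles evaluate the same min() expression on the same
    // original endpoints, so they cannot disagree the way recomputed
    // floating-point intersections can.
    bool clip_to_top_edge(vec2 p0, vec2 p1, float tile_y0) {
        return min(p0.y, p1.y) < tile_y0;
    }

    bool clip_to_left_edge(vec2 p0, vec2 p1, float tile_x0) {
        return min(p0.x, p1.x) < tile_x0;
    }

    void main() {}
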
Elias Naur d21f2b68de all: add SPDX license headers
Fixes #53

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-11 18:24:35 +01:00
Elias Naur 29cfb8b63e eliminate inconsistent line intersections from path_coarse.comp
The finite precision of floating point computations can lead the coarse
renderer into inconsistent tile intersections, which imply impossible line
segments such as lines with gaps or double intersections. The winding number
algorithm is sensitive to these errors, which show up as incorrectly filled
paths.

This change forces all intersections to be consistent.
First, the floating point top edge intersection test is removed; top edge intersections are
completely determined by left edge intersections.
Then, left edge intersections are inserted from the tile with the last top edge
intersection. The next top edge is then fixed to be the last tile with a left
edge intersection.

More details in the patch comments.

Fixes #23

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-01 18:35:29 +01:00
Elias Naur 19f4d9fa95 change tile segment representation to (origin, vector)
Eliminates the precision loss of the subtraction in the sign(end.x - start.x)
expression in kernel4. That's important for the next change that avoids
inconsistent line intersections in path_coarse.

Updates #23

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-01 18:34:40 +01:00
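
A hedged sketch of the representation change with assumed struct layouts; kernel4 can then read sign(vector.x) without subtracting two stored endpoints first:

    #version 450
    layout(local_size_x = 1) in;

    // Before (illustrative): start and end points; the direction had to be
    // recovered as sign(end.x - start.x), and the commit points out that this
    // subtraction loses precision.
    struct TileSegOld {
        vec2 start;
        vec2 end;
        float y_edge;
    };

    // After (illustrative): origin plus vector; the direction is just
    // sign(vector.x), with no subtraction in kernel4.
    struct TileSeg {
        vec2 origin;
        vec2 vector;
        float y_edge;
    };

    void main() {
        // float backdrop_sign = sign(seg.vector.x);
    }
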
Elias Naur 2068171f96 path_coarse.comp: tighten variable scopes, delete unused variables
No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-01 18:01:04 +01:00
Elias Naur 326f7f0d03 shader: delete more unused code and variables
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-13 13:03:56 +02:00
Raph Levien 79cc9da811 Fancy flattening
Implement same flattening algorithm as kurbo.
2020-06-09 20:45:19 -07:00
Raph Levien eaa1d261c3 Sederberg error metric
Use proper math to compute number of subdivisions. This works but is not
very satisfying, as it over-subdivides.
2020-06-09 18:43:49 -07:00
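
For reference, a commonly cited form of this kind of bound (a sketch, not necessarily the exact metric used in the commit): approximating a cubic by n quadratics has error at most sqrt(3)/36 * ||p3 - 3 p2 + 3 p1 - p0|| / n^3, which yields the subdivision count directly:

    #version 450
    layout(local_size_x = 1) in;

    // Sketch of a Sederberg-style subdivision estimate (assumed formula):
    //   err(n) <= sqrt(3)/36 * ||p3 - 3*p2 + 3*p1 - p0|| / n^3
    // so n = ceil((err(1) / tolerance)^(1/3)).
    uint cubic_to_quad_subdivisions(vec2 p0, vec2 p1, vec2 p2, vec2 p3, float tol) {
        float err = (sqrt(3.0) / 36.0) * length(p3 - 3.0 * p2 + 3.0 * p1 - p0);
        return max(uint(ceil(pow(err / tol, 1.0 / 3.0))), 1u);
    }

    void main() {}
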
Raph Levien b571e0d10c Continue wiring up gpu-side flattening
All segments given to path coarse raster are cubics. Flatten to
quadratics.

This works but the quality is not (yet) good.
2020-06-09 17:56:11 -07:00
Raph Levien 3a8227d025 Non-load balanced coarse path raster
This is a bit of a revert of the load-balanced ("more parallel") coarse
path rasterizer, but includes fills and also uses atomicExchange.

I'm doing it this way because it should be considerably easier to do
flattening in this structure, even though there will be some performance
regression.
2020-06-09 15:09:53 -07:00
Raph Levien 7118c8efc1 Fix backdrop of segments to left of viewport
Make sure we account for backdrop in segments clipped out of viewport.
2020-06-09 10:25:22 -07:00
Raph Levien af0a1af8e1 Make fills work
The backdrop propagation is slow but it does work.
2020-06-05 22:40:44 -07:00
Raph Levien f9f5961428 Use atomicExchange over atomicCompSwap
Significant perf win (approx 2x in the path coarse rasterizer)
2020-06-05 08:24:26 -07:00
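
A sketch of why the exchange suffices here (assumed pattern: per-tile linked lists of segments that are only read by a later dispatch): atomicExchange returns the previous list head in one step, where atomicCompSwap needs a retry loop:

    #version 450
    layout(local_size_x = 64) in;

    layout(set = 0, binding = 0) buffer Tiles {
        // Head of a per-tile linked list of segments; 0 means empty.
        uint tile_head[];
    };

    // With atomicCompSwap, publishing a node requires a retry loop:
    //   do { old = tile_head[tile_ix]; /* node.next = old */ }
    //   while (atomicCompSwap(tile_head[tile_ix], old, node_ref) != old);
    //
    // With atomicExchange, one operation swaps in the new head and returns the
    // previous one, which the caller then stores as the node's next pointer.
    // That ordering is safe when the list is only consumed by a later dispatch.
    uint push_segment(uint tile_ix, uint node_ref) {
        uint prev = atomicExchange(tile_head[tile_ix], node_ref);
        return prev;
    }

    void main() {}
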
Raph Levien e5dd9ae01e More parallel path coarse raster
Use fancier load balancing algorithm for coarse rendering of paths.

Seems to work and is an improvement in some cases.
2020-06-04 17:42:33 -07:00
Raph Levien 70a9c17e23 Continue building out pipeline
Plumbs the new tiling scheme to k4. This works (stroke only) but still
has some performance issues.
2020-06-03 12:21:09 -07:00
Raph Levien 294f6fd1db Experiment with new sorting scheme
Path segments are unsorted, but other elements are using the same
sort-middle approach as before.

This is a checkpoint. At this point, there are unoptimized versions
of tile init and coarse path raster, but it isn't wired up into a
working pipeline. Also observing about a 3x performance regression in
element processing, which needs to be investigated.
2020-06-03 09:29:25 -07:00