91 commits

Author SHA1 Message Date
Raph Levien 8e2f2aeeba Update dependencies
Update to latest versions of all dependencies. Among other things, this
gets us on piet 0.2, though almost all of the changes were around text,
which is not yet implemented.
2020-11-14 08:25:43 -08:00
Elias Naur b942e4035b piet-gpu/shader: ensure forward progress in decoupled lookback
The Vulkan and OpenGL specifications offer only weak forward progress guarantees, and
in practice several mobile devices fail to complete the decoupled lookback
spinloop without mitigation.

This patch implements Raph's suggestion in the "Forward Progress"
section of

https://raphlinus.github.io/gpu/2020/04/30/prefix-sum.html

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-25 21:02:58 +01:00
Elias Naur bc01180519 piet-gpu/shader: delete unused is_fill from elements.comp
Delete debug code as well.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-25 20:59:54 +01:00
Elias Naur 8fab45544e shader: implement clip paths
Expand the final kernel4 stage to maintain a per-pixel mask.

Introduce two new path elements, FillMask and FillMaskInv, to fill
the mask. FillMask acts like Fill, while FillMaskInv fills the area
outside the path.

An SVG clipPath is then representable by a FillMaskInv(0.0) for every nested
path, preceded by a FillMask(1.0) to clear the mask.

The bounding box for FillMaskInv elements is the entire screen; tightening of
the bounding box is left for future work. Note that a fullscreen bounding
box is not hopelessly inefficient because completely filling a tile with
a mask is just a single CmdSolidMask per tile.

Fixes #30

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-09 13:20:26 +02:00
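The mask scheme described in the commit above can be sketched on the CPU as follows. This is a minimal illustration, not the shader code: it assumes binary per-pixel coverage (the real kernel presumably works with fractional coverage), a one-dimensional row of pixels, and hypothetical helper names `fill_mask`, `fill_mask_inv`, and `clip_mask`.

```rust
/// Hypothetical model of FillMask: write `value` where the path covers the pixel.
pub fn fill_mask(mask: &mut [f32], inside: &[bool], value: f32) {
    for (m, &covered) in mask.iter_mut().zip(inside.iter()) {
        if covered {
            *m = value;
        }
    }
}

/// Hypothetical model of FillMaskInv: write `value` where the path does NOT cover the pixel.
pub fn fill_mask_inv(mask: &mut [f32], inside: &[bool], value: f32) {
    for (m, &covered) in mask.iter_mut().zip(inside.iter()) {
        if !covered {
            *m = value;
        }
    }
}

/// Build a clip mask for `n` pixels from a set of nested paths:
/// one FillMask(1.0) to clear the mask, then FillMaskInv(0.0) per path.
pub fn clip_mask(paths: &[&[bool]], n: usize) -> Vec<f32> {
    let mut mask = vec![0.0; n];
    // Assumption: the clearing FillMask covers every pixel.
    let everywhere = vec![true; n];
    fill_mask(&mut mask, &everywhere, 1.0);
    for path in paths {
        fill_mask_inv(&mut mask, path, 0.0);
    }
    mask
}
```

With this model the mask stays at 1.0 only where every nested path covers the pixel, which is the intersection a clip path needs.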
Elias Naur 55cfd472a5 shader: delete unused code
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-09 13:20:26 +02:00
Elias Naur 9be0faba6f piet-gpu-types: remove unused scene elements
Delete image compute shader as well; it is unused.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-27 18:57:53 +02:00
Elias Naur fa9bf0dc2b piet-gpu-types: remove unused ptcl types
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-27 18:30:33 +02:00
Elias Naur dceb0f9412 piet-gpu-types: remove unused annotated types
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-21 10:55:58 +02:00
Elias Naur ac3ac3ddff shader: introduce a crude setting for adjusting the maximum workgroup size
Both the Vulkan and OpenGL ES specs allow implementations to limit workgroups to
128 threads. Add a LG_WG_FACTOR setting for easy switching between 128 and 256
threads, with 256 being kept as the default setting.

Manually tested that LG_WG_FACTOR = 0 (128 threads) works as expected.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-13 13:04:13 +02:00
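One way to read the setting in the commit above is as a log2 multiplier on a 128-thread base. The commit only confirms that LG_WG_FACTOR = 0 gives 128 threads and that 256 is the default. The shift below, the value 1 for 256 threads, and the names `BASE_WG_SIZE` and `workgroup_size` are assumptions made for illustration.

```rust
/// Assumed base workgroup size, matching the 128-thread minimum mentioned in the commit.
pub const BASE_WG_SIZE: u32 = 128;

/// Workgroup size for a given log2 factor (hypothetical helper).
/// lg_wg_factor = 0 gives 128 threads; under this reading, 1 gives 256.
pub fn workgroup_size(lg_wg_factor: u32) -> u32 {
    BASE_WG_SIZE << lg_wg_factor
}
```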
Elias Naur 326f7f0d03 shader: delete more unused code and variables
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-13 13:03:56 +02:00
Elias Naur 05636995dd compute IMAGE_WIDTH and IMAGE_HEIGHT; remove dead code from setup.h
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-08-29 15:03:40 +02:00
Elias Naur de4f963ba0 shader: remove dead code
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-08-28 17:37:46 +02:00
Elias Naur cfd57361c4 Fix linewidth transformations
The transformation determinant is signed, but we're only interested in
the absolute scale for transforming linewidths.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-08-24 16:12:18 +02:00
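The fix in the commit above comes down to taking the absolute value of the determinant before deriving a scale from it. A minimal sketch, assuming a hypothetical `Mat2` type for the linear part of the transform and a uniform scale estimate of sqrt(|det|):

```rust
/// Hypothetical 2x2 linear part of an affine transform, laid out as [a b; c d].
#[derive(Clone, Copy)]
pub struct Mat2 {
    pub a: f32,
    pub b: f32,
    pub c: f32,
    pub d: f32,
}

/// Scale factor to apply to a linewidth under transform `m`.
/// The determinant is negative for transforms that include a reflection,
/// so take its absolute value before the square root.
pub fn linewidth_scale(m: Mat2) -> f32 {
    let det = m.a * m.d - m.b * m.c;
    det.abs().sqrt()
}
```

Without the `abs()`, a mirrored transform such as a = -2, d = 2 has determinant -4, and the square root would be NaN rather than 2.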
bhmerchant@gmail.com d836d21d12 Clean up bits of right edge tracking logic left over from sort-middle. 2020-08-12 19:57:14 -07:00
msiglreith 1cc5c7ac0d Shader documentation and a slight cleanup 2020-06-28 15:37:27 +02:00
msiglreith eed71721eb Update winit example 2020-06-14 23:32:59 +02:00
Raph Levien 65f802894c Merge branch 'master' into sorta 2020-06-13 07:30:40 -07:00
Raph Levien b23113461b Minor cleanups
Get rid of warnings. Do cargo update to bump deps.
2020-06-10 14:10:28 -07:00
Raph Levien 79cc9da811 Fancy flattening
Implement same flattening algorithm as kurbo.
2020-06-09 20:45:19 -07:00
Raph Levien eaa1d261c3 Sederberg error metric
Use proper math to compute number of subdivisions. This works but is not
very satisfying, as it over-subdivides.
2020-06-09 18:43:49 -07:00
Raph Levien b571e0d10c Continue wiring up gpu-side flattening
All segments given to path coarse raster are cubics. Flatten to
quadratics.

This works but the quality is not (yet) good.
2020-06-09 17:56:11 -07:00
Raph Levien 0f44bc8b78 Start GPU-side flattening
This starts the work on GPU-side flattening by plumbing curves through.
2020-06-09 16:01:47 -07:00
Raph Levien 3a8227d025 Non-load balanced coarse path raster
This is a bit of a revert of the load-balanced ("more parallel") coarse
path rasterizer, but includes fills and also uses atomicExchange.

I'm doing it this way because it should be considerably easier to do
flattening in this structure, even though there will be some performance
regression.
2020-06-09 15:09:53 -07:00
Raph Levien 7118c8efc1 Fix backdrop of segments to left of viewport
Make sure we account for backdrop in segments clipped out of viewport.
2020-06-09 10:25:22 -07:00
Raph Levien 6db4e20bbb More parallel backdrop propagation
This is a nice improvement but still not great on tiger.
2020-06-06 08:23:40 -07:00
Raph Levien af0a1af8e1 Make fills work
The backdrop propagation is slow but it does work.
2020-06-05 22:40:44 -07:00
Raph Levien f9f5961428 Use atomicExchange over atomicCompSwap
Significant perf win (approx 2x in the path coarse rasterizer)
2020-06-05 08:24:26 -07:00
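The commit above does not say which data structure the atomics protect. As an illustration only, assume an intrusive linked list with a shared head index, a common place where a compare-and-swap retry loop can be replaced by a single exchange. The names `push_cas`, `push_exchange`, `walk`, and `NIL` are hypothetical, and the exchange variant relies on the list being traversed only after all pushes have completed.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Sentinel index marking the end of the list.
pub const NIL: u32 = u32::MAX;

/// Push `node` with a compare-and-swap loop: retries whenever another
/// thread changed `head` between the load and the swap.
pub fn push_cas(head: &AtomicU32, next: &[AtomicU32], node: u32) {
    let mut old = head.load(Ordering::Relaxed);
    loop {
        next[node as usize].store(old, Ordering::Relaxed);
        match head.compare_exchange_weak(old, node, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => break,
            Err(current) => old = current,
        }
    }
}

/// Push `node` with a single unconditional exchange: no retry loop.
/// The link is written after the exchange, so the list is briefly
/// incomplete; readers must wait until all pushes are done.
pub fn push_exchange(head: &AtomicU32, next: &[AtomicU32], node: u32) {
    let old = head.swap(node, Ordering::AcqRel);
    next[node as usize].store(old, Ordering::Release);
}

/// Walk the list from `head`, collecting node indices in order.
pub fn walk(head: &AtomicU32, next: &[AtomicU32]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut cur = head.load(Ordering::Acquire);
    while cur != NIL {
        out.push(cur);
        cur = next[cur as usize].load(Ordering::Acquire);
    }
    out
}
```

Both variants produce the same list when pushes are followed by a separate read pass; the exchange version simply never spins under contention.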
Raph Levien e5dd9ae01e More parallel path coarse raster
Use fancier load balancing algorithm for coarse rendering of paths.

Seems to work and is an improvement in some cases.
2020-06-04 17:42:33 -07:00
Raph Levien 877da4a98e Faster coarse raster
Store a lot more tile context in shared memory and do the work from
that.
2020-06-04 10:39:08 -07:00
Raph Levien e1aa9b2f5d Remove bbox guard
It's probably not necessary.

This development is still a work in progress.
2020-06-03 20:59:19 -07:00
Raph Levien 7f4a6523a8 Filter sparse tiles
Have a more-parallel read of the tile structures based on bbox coverage,
and only set the bit when the tile isn't empty.

This is a speedup, but there is some duplicated work and it is possible
to improve it further.
2020-06-03 17:55:42 -07:00
Raph Levien 63ba45c774 Fix performance issues
Use larger workgroup for tile initialization (utilization was poor).
Provide correct element count to coarse rasterizer.
2020-06-03 15:32:58 -07:00
Raph Levien ff8cee059c Optimize tile allocation
Use parallel scheme to zero out tiles.
2020-06-03 14:46:41 -07:00
Raph Levien 70a9c17e23 Continue building out pipeline
Plumbs the new tiling scheme to k4. This works (stroke only) but still
has some performance issues.
2020-06-03 12:21:09 -07:00
Raph Levien 294f6fd1db Experiment with new sorting scheme
Path segments are unsorted, but other elements are using the same
sort-middle approach as before.

This is a checkpoint. At this point, there are unoptimized versions
of tile init and coarse path raster, but it isn't wired up into a
working pipeline. Also observing about a 3x performance regression in
element processing, which needs to be investigated.
2020-06-03 09:29:25 -07:00
Raph Levien f3cb904f86 Add command line args for loading svg 2020-05-31 09:57:25 -07:00
Raph Levien c603cafc6c Merge branch 'more_svg' into new_merge 2020-05-31 09:19:34 -07:00
Raph Levien 2c185c3718 Simplify ringbuf
We don't really need a ring buffer, as we only read what we're actually
going to process.
2020-05-30 21:20:48 -07:00
Raph Levien 192ddc5eab Parallel merge
The fancy stuff :)
2020-05-30 21:11:13 -07:00
Raph Levien 121f29fef6 Merge one segment at a time
No parallelism yet, but seems to improve performance.
2020-05-30 08:51:52 -07:00
Raph Levien 894ef156e1 Change to new merge strategy in binning
WIP

We get "device lost" on NV :/
2020-05-29 20:06:16 -07:00
Raph Levien 3e83972606 Improve SVG parsing
WIP
2020-05-28 11:48:36 -07:00
Raph Levien 319aa703c4 Output multiple pixels per thread in k4
In kernel 4, compute a chunk of pixels rather than just one per thread.
This is a dramatic speedup.

(This commit cherry-picked from another working branch)
2020-05-28 07:54:24 -07:00
Raph Levien e16f68d89d Fix buffer overrun
Was a little too eager zeroing out sh_is_segment[]
2020-05-26 22:47:28 -07:00
Raph Levien dbcffb10db Reinstate fills
Add fills back in.
2020-05-25 15:27:03 -07:00
Raph Levien 3d422d9243 Allocate segment chunks in slabs
Another speedup might be to special-case when the number of chunks in a
stroke or fill command is 1, then the segment header doesn't need
allocation and memory traffic is reduced. But right now we'll avoid the
complexity.
2020-05-25 12:22:29 -07:00
Raph Levien 8eaf49a04d Checkpoint parallel output
Parallel segment output seems to be working for strokes.
2020-05-25 12:14:18 -07:00
Raph Levien 24b3def0a1 Start work on parallel segment output
Output of segments is in parallel. Getting closer, some problems with
chaining but mostly correct.
2020-05-24 21:02:19 -07:00
Raph Levien 55df3e6cc8 Fix linewidth math
Coarse rasterization wasn't entirely taking line width into account.

Also fix swizzle in matrix (not yet used). And fix missing End command
in ptcl output (hasn't been a problem because buffer was cleared).
2020-05-24 09:43:41 -07:00
Raph Levien 7d040dff37 Bit magic for backdrop accumulation
Use bit counting rather than iterating backdrop increments one by one.
A nice if not huge speedup.
2020-05-22 07:30:32 -07:00
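The idea of counting bits instead of iterating can be sketched as below. The layout is an assumption: backdrop increments for a row of up to 32 tiles are packed into one `u32`, one bit per tile, and the backdrop at tile `x` is the number of increments in tiles to its left. The function names are hypothetical and the actual shader encoding may differ.

```rust
/// Reference version: visit the increments one by one.
pub fn backdrop_loop(bits: u32, x: u32) -> u32 {
    let mut count = 0;
    for i in 0..x.min(32) {
        if (bits >> i) & 1 == 1 {
            count += 1;
        }
    }
    count
}

/// Bit-counting version: mask off the bits below position `x`
/// and count them in a single population-count operation.
pub fn backdrop_popcount(bits: u32, x: u32) -> u32 {
    // Shifting a u32 by 32 would overflow, so handle the full-width case separately.
    let mask = if x >= 32 { u32::MAX } else { (1u32 << x) - 1 };
    (bits & mask).count_ones()
}
```

In GLSL the same operation is available as `bitCount`, so the per-tile loop collapses to a mask and one instruction.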