Commit graph

180 commits

Author SHA1 Message Date
Raph Levien 368954a643 Merge branch 'master' into blend_mem
This does the merge and also rebuilds the generated shaders.
2022-05-19 15:42:45 -07:00
Raph Levien aac6513409 Compile shaders on Windows
Updates DXIL on generated shaders.
2022-05-18 15:41:00 -07:00
Raph Levien 307bf8d227 More blend mode fixes
Adds a test to visualize the blend modes. Fixes a dumb bug in blend.h and also a more subtle issue where default blending is not the same as clipping, as the former needs to always push a blend group (to cause isolation) and the latter does not. This might be something we need to get back to.

This should fix the rendering, so it fairly closely resembles the Mozilla reference image. There's also a compile-time switch to disable sRGB conversion, which is (sadly) needed for compatible rendering.
2022-05-17 16:12:05 -07:00
Raph Levien e73049fe98 First cut at split blend stack
Split the blend stack into register and memory segments. Do blending in registers up to that size, then spill to memory if needed.

This version may regress performance on Pixel 4, as it uses common memory for the blend stack, rather than keeping that memory read-only in fine rasterization, and using a separate buffer for blend stack. This needs investigation. It's possible we'll want to have single common memory as a config option, as it pools allocations and decreases the probability of failure.

Also a flaw in this version: there is no checking of memory overflow.

For understanding code history: this commit largely reverts #77, but there were some intervening changes to blending, and this commit also implements the split so some of the stack is in registers.

Closes #156
2022-05-16 11:12:33 -07:00
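As a rough illustration of the split described in e73049fe98, the sketch below models a blend stack whose first few entries live in a fixed-size array (standing in for registers) and whose remaining entries spill to a growable buffer (standing in for memory). The type `SplitBlendStack`, the constant `REGISTER_DEPTH` with its value of 4, and the packed `u32` color are all assumptions made for this example; the actual shader code is not shown in this log.

```rust
/// Number of blend-stack entries kept in "registers" (a fixed-size array here).
/// The value 4 is an illustrative assumption, not taken from the commit.
const REGISTER_DEPTH: usize = 4;

/// A blend stack split into a fixed register segment and a spill segment.
/// `spill` stands in for the memory buffer; all names are hypothetical.
struct SplitBlendStack {
    registers: [u32; REGISTER_DEPTH],
    spill: Vec<u32>,
    depth: usize,
}

impl SplitBlendStack {
    fn new() -> Self {
        SplitBlendStack { registers: [0; REGISTER_DEPTH], spill: Vec::new(), depth: 0 }
    }

    /// Push a packed color; entries past REGISTER_DEPTH spill to memory.
    fn push(&mut self, rgba: u32) {
        if self.depth < REGISTER_DEPTH {
            self.registers[self.depth] = rgba;
        } else {
            self.spill.push(rgba);
        }
        self.depth += 1;
    }

    /// Pop the most recent entry, draining the spill segment before the registers.
    fn pop(&mut self) -> Option<u32> {
        if self.depth == 0 {
            return None;
        }
        self.depth -= 1;
        if self.depth < REGISTER_DEPTH {
            Some(self.registers[self.depth])
        } else {
            self.spill.pop()
        }
    }

    /// How many entries currently live in the spill segment.
    fn spilled(&self) -> usize {
        self.spill.len()
    }
}
```

The commit notes that memory overflow is unchecked. The `Vec` here grows on demand, so the sketch sidesteps that problem rather than solving it; a fixed-size GPU buffer would need an explicit bounds check in `push`.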
Raph Levien 18563101b2 Fix blending math
The blending math had two errors: first, colors were not separated for the purpose of blending (blending was wrongly applied to premultiplied values), and second, alpha was applied over-aggressively to the alpha channel.

This PR does *not* address the issue of gamma correctness. That is a complex issue and should probably be handled in the short term by disabling sRGB conversions and doing the internal math in sRGB color space rather than linear. This will degrade the quality of antialiasing but on the other hand give spec-compliant results for compositing.

We remove the plus-darker mode as its specification does not appear to be valid. The plus-lighter mode remains as it is quite useful for cross-fading effects.

Also, the generated shaders were compiled on a Mac, so the DXIL is unsigned. Those should be compiled on Windows before this PR is merged (and we should figure out a better strategy for all that).
2022-05-13 10:18:29 -07:00
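The first fix in 18563101b2 (blending on separated colors rather than premultiplied values) can be sketched on the CPU as follows. This is a minimal illustration based on the W3C Compositing and Blending formulas, not the shader code from the commit; `Premul`, `separate`, `multiply`, and `blend_over` are hypothetical names, and gamma handling is ignored entirely.

```rust
/// Premultiplied RGBA color with components in [0, 1].
#[derive(Clone, Copy, Debug, PartialEq)]
struct Premul {
    rgb: [f32; 3],
    a: f32,
}

/// Recover the separated (non-premultiplied) color; fully transparent maps to black.
fn separate(c: Premul) -> [f32; 3] {
    if c.a > 0.0 {
        [c.rgb[0] / c.a, c.rgb[1] / c.a, c.rgb[2] / c.a]
    } else {
        [0.0; 3]
    }
}

/// Multiply blend mode, defined on separated color channels.
fn multiply(backdrop: f32, source: f32) -> f32 {
    backdrop * source
}

/// Blend `src` over `backdrop`. The blend function only ever sees separated
/// colors, and the alpha channel is composited with plain source-over.
fn blend_over(backdrop: Premul, src: Premul, blend: fn(f32, f32) -> f32) -> Premul {
    let cb = separate(backdrop);
    let cs = separate(src);
    let mut rgb = [0.0f32; 3];
    for i in 0..3 {
        // Mix the blended color with the plain source color by backdrop alpha.
        let mixed = (1.0 - backdrop.a) * cs[i] + backdrop.a * blend(cb[i], cs[i]);
        // Source-over, producing a premultiplied result.
        rgb[i] = src.a * mixed + (1.0 - src.a) * backdrop.rgb[i];
    }
    Premul { rgb, a: src.a + backdrop.a * (1.0 - src.a) }
}
```

Over a fully transparent backdrop the blend function has no effect and the source comes back unchanged, which is a quick sanity check that alpha is not being applied twice.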
Raph Levien 0f91149b49 Radial gradients
This patch adds radial gradients, including both the piet API and some
new methods specifically to support COLRv1, including the ability to
transform the gradient separately from the path.
2022-03-30 20:32:13 -07:00
Raph Levien 05dc88b70f Fix is_clip nit
Not necessary to test is_clip when is_blend is set.

Also recompiles shaders on Windows machine.
2022-03-30 07:27:29 -07:00
Raph Levien 7134be2329 Fix missing blend/clip logic
We always do BeginClip/EndClip if it's a solid tile and the blend mode
is not default.

Also fix missing entry in pipeline layout (affects Vulkan but not Metal).
2022-03-16 14:40:58 -07:00
Raph Levien acb3933d94 Variable size encoding of draw objects
This patch switches to a variable size encoding of draw objects.

In addition to the CPU-side scene encoding, it changes the representation of intermediate per-draw-object state from the `Annotated` struct to a variable "info" encoding. The bounding boxes are also moved to a separate array (for a more "structure of arrays" approach). Data that's unchanged from the scene encoding is not copied. Rather, downstream stages can access the data from the scene buffer (reducing allocation and copying).

Prefix sums, computed in `DrawMonoid`, track the offsets of both scene and intermediate data. The tags for the CPU-side encoding have been split into their own stream (again a change from AoS to SoA style).

This is not necessarily the final form. There's some stuff (including at least one piet-gpu-derive type) that can be deleted. In addition, the linewidth field should probably move from the info encoding to path-specific data. Also, the 1:1 correspondence between draw object and path has not yet been broken.

Closes #152
2022-03-14 16:32:08 -07:00
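The offset tracking described in acb3933d94 can be pictured as an exclusive prefix sum over a tag stream. In the sketch below only the name `DrawMonoid` comes from the commit message; its fields, the `DrawTag` variants, and the per-tag sizes are invented for illustration, and the sequential loop stands in for what would be a parallel scan on the GPU.

```rust
/// Hypothetical draw tags; the real tag set and sizes are not given in the log.
#[derive(Clone, Copy)]
enum DrawTag {
    Nop,
    FillColor,
    FillLinGradient,
}

/// Running offsets into the scene stream and the intermediate "info" stream.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct DrawMonoid {
    scene_offset: u32,
    info_offset: u32,
}

impl DrawMonoid {
    /// Sizes (in 32-bit words) contributed by one tag; the numbers are made up.
    fn from_tag(tag: DrawTag) -> Self {
        match tag {
            DrawTag::Nop => DrawMonoid { scene_offset: 0, info_offset: 0 },
            DrawTag::FillColor => DrawMonoid { scene_offset: 1, info_offset: 1 },
            DrawTag::FillLinGradient => DrawMonoid { scene_offset: 5, info_offset: 3 },
        }
    }

    /// Associative combine: offsets add component-wise.
    fn combine(self, other: Self) -> Self {
        DrawMonoid {
            scene_offset: self.scene_offset + other.scene_offset,
            info_offset: self.info_offset + other.info_offset,
        }
    }
}

/// Exclusive prefix sum over the tag stream: entry i holds where draw
/// object i starts in each stream.
fn draw_offsets(tags: &[DrawTag]) -> Vec<DrawMonoid> {
    let mut acc = DrawMonoid::default();
    tags.iter()
        .map(|&tag| {
            let start = acc;
            acc = acc.combine(DrawMonoid::from_tag(tag));
            start
        })
        .collect()
}
```

Because the tags live in their own stream, the scan only has to read one small value per draw object; the variable-size payloads stay in place in the scene buffer and are located through the computed offsets.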
Raph Levien 90774f1f46 Regenerate generated shaders
This just runs ninja on the piet-gpu/shaders on a Windows machine, so
translated shaders match the existing pipeline.

At some point, we'll rework this to reduce friction.
2022-03-07 12:49:59 -08:00
Chad Brokaw d3b08e4c52 Initial implementation of blend modes
* Add blend and composition mode enums to API
* Mirror these in the shaders
* Add new public blend function to PietGpuRenderContext that mirrors clip
* Plumb the modes through the pipeline from scene to kernel4
2022-02-28 12:38:14 -05:00
Raph Levien 3b67a4e7c1 New clip implementation
This PR reworks the clip implementation. The highlight is that clip bounding box accounting is now done on GPU rather than CPU. The clip mask is also rasterized on EndClip rather than BeginClip, which decreases memory traffic needed for the clip stack.

This is a pretty good working state, but not all cleanup has been applied. An important next step is to remove the CPU clip accounting (it is computed and encoded, but that result is not used). Another step is to remove the Annotated structure entirely.

Fixes #88. Also relevant to #119
2022-02-17 17:13:28 -08:00
Raph Levien 43c5ed29b2 Fix generated shaders 2022-02-07 17:30:17 -08:00
Raph Levien 012d679e3d Merge branch 'master' into mtl_guest 2022-02-07 17:28:52 -08:00
Tatsuyuki Ishi 8bee553e6e Merge pull request #147 from ishitatsuyuki/clang-format 2022-02-01 10:13:30 +09:00
Tatsuyuki Ishi a7e926d67b shaders: Add .clang-format and reformat
Helps keep the code tidy.

Style is chosen to minimize diff, but contains a slight bit of personal taste.
2022-01-30 16:33:14 +09:00
Tatsuyuki Ishi 2bdca399ab piet-gpu: Add phony targets for spv, dxil and msl
Makes building without dxc or spirv-cross easier.
2022-01-20 19:34:05 +09:00
Raph Levien 2613a7e500 Add generated kernel4_gray shaders 2022-01-19 12:18:39 -08:00
Raph Levien 0cf370f9c7 Mostly working rendering
This exposes interfaces to render glyphs into a texture atlas. The main changes are:

* Methods to plumb raw Metal GPU resources (device, texture, etc) into piet-gpu-hal objects.

* A new glyph_render API specialized to rendering glyphs. This is basically the same as just painting to a canvas, but will allow better caching (and has more direct access to fonts, bypassing the Piet font type which is underdeveloped).

* Ability to render to A8 target in addition to RGBA.

WIP: there are some rough edges, not least of which is that the image format changes are only on mac and cause compile errors elsewhere.
2022-01-19 12:10:51 -08:00
Raph Levien d948126c16 Adjust workgroup sizes
Make max workgroup size 256 and respect LG_WG_FACTOR.

Because the monoid scans only support a height of 2, this will reduce
the maximum scene complexity we can render. But it also increases
compatibility. Supporting larger scans is a TODO.
2021-12-08 11:48:38 -08:00
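To see why a fixed scan height caps scene complexity in d948126c16, assume each level of a scan tree multiplies capacity by the number of elements one workgroup can reduce. Under that assumption (the actual partition size used by piet-gpu is not stated here, and `max_scan_len` is a hypothetical helper), the capacity is the partition size raised to the height:

```rust
/// Largest input a multi-level scan can cover when each workgroup reduces
/// `partition_size` elements and the scan tree has `height` levels.
/// Returns None if the result does not fit in a u64.
fn max_scan_len(partition_size: u64, height: u32) -> Option<u64> {
    partition_size.checked_pow(height)
}
```

With a partition of 256 elements and a height of 2 this gives 65,536 elements, and halving the partition size quarters the limit, which is the trade-off between compatibility and scene size that the commit message describes.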
Raph Levien 49c3a3923b Restore gradients and clips
This changes gradients and clips to the new encoding. Lightly tested.
2021-12-07 18:39:33 -08:00
Raph Levien c503ff28b0 Make shaders cross-platform
Translate all piet-gpu shaders into DXIL and MSL; move generated files
into the shader/gen directory.
2021-12-03 15:49:58 -08:00
Raph Levien 44327fe49f Beginnings of new element pipeline
This successfully renders the tiger; fills and strokes are supported.
Other parts of the imaging model, not yet.

Progress toward #119
2021-12-03 15:33:01 -08:00
Raph Levien 875c8badf4 Add draw object stage
This is one of the stages in the new element pipeline. It's a simple
one, just a prefix sum of a couple counts, and some of it will probably
get merged with a downstream stage, but we'll do it separately for now
for convenience.

This patch also contains an update to Vulkan tools 1.2.198, which
accounts for the large diff of translated shaders.
2021-12-02 13:37:16 -08:00
Raph Levien 1d1801c1aa Cross-platform path stage shaders 2021-12-01 08:42:06 -08:00
Raph Levien 8af4707525 Fix uninitialized variable 2021-12-01 08:34:41 -08:00
Raph Levien 178761dcb3 Path stream processing
This patch contains the core of the path stream processing, though some
integration bits are missing. The core logic is tested, though
combinations of path types, transforms, and line widths are not (yet).

Progress towards #119
2021-12-01 07:33:24 -08:00
Raph Levien 47f8812e2f Start work on new element pipeline
There's a bit of reorganizing as well. Shader stages are made available
from piet-gpu to the test rig, config is now a proper structure
(marshaled with bytemuck).

This commit just has the transform stage, which is a simple monoid scan
of affine transforms.

Progress toward #119
2021-11-24 08:01:43 -08:00
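The transform stage in 47f8812e2f is described as a monoid scan of affine transforms. A sequential CPU sketch of that idea follows; the `Affine` layout, the composition order in `combine`, and the function `scan_transforms` are assumptions for illustration, and the GPU version would run the same associative operation in parallel.

```rust
/// 2D affine transform [a, b, c, d, e, f] mapping (x, y) to
/// (a*x + c*y + e, b*x + d*y + f).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Affine([f32; 6]);

impl Affine {
    const IDENTITY: Affine = Affine([1.0, 0.0, 0.0, 1.0, 0.0, 0.0]);

    /// Monoid operation: the transform that applies `other` first, then `self`.
    fn combine(self, other: Affine) -> Affine {
        let (s, o) = (self.0, other.0);
        Affine([
            s[0] * o[0] + s[2] * o[1],
            s[1] * o[0] + s[3] * o[1],
            s[0] * o[2] + s[2] * o[3],
            s[1] * o[2] + s[3] * o[3],
            s[0] * o[4] + s[2] * o[5] + s[4],
            s[1] * o[4] + s[3] * o[5] + s[5],
        ])
    }
}

/// Inclusive scan: entry i is the composition of transforms 0..=i.
fn scan_transforms(xs: &[Affine]) -> Vec<Affine> {
    let mut acc = Affine::IDENTITY;
    xs.iter()
        .map(|&t| {
            acc = acc.combine(t);
            acc
        })
        .collect()
}
```

Composition is associative and has an identity element, which is what makes it a monoid and lets the scan be split across workgroups and recombined.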
Raph Levien 8015eb25a1 Also fix write-after-read in elements.comp
On further testing, this resolves a hard lockup on Intel 630 on the
mmark stress test, so is worth getting into the repo.
2021-11-14 08:23:37 -08:00
Raph Levien 95aad3e6c7 Put memory barrier reliably before flag write 2021-11-02 13:02:12 -07:00
Raph Levien e50d5c1f58 Add memory barrier to elements shader
The flag read needs acquire semantics. There are a number of ways that
could be expressed, but a generally portable way is to have a barrier
after. However, in the translation to Metal, that barrier needs to be in
uniform control flow. This patch does some workarounds to ensure that.
2021-11-02 12:50:11 -07:00
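The pattern in e50d5c1f58 and 95aad3e6c7 (a barrier after the flag read, and a barrier before the flag write) has a close CPU analogue in Rust's atomic fences. The sketch below is that analogue only, not a translation of the shader; `publish`, `consume`, and `round_trip` are hypothetical names.

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Publish a value through `data`, then raise `flag`. The release fence sits
/// before the flag write so the data write cannot be reordered after it.
fn publish(data: &AtomicU32, flag: &AtomicBool, value: u32) {
    data.store(value, Ordering::Relaxed);
    fence(Ordering::Release);
    flag.store(true, Ordering::Relaxed);
}

/// Spin until `flag` is set, then read `data`. The flag read itself is
/// relaxed; the acquire fence after it supplies the acquire semantics.
fn consume(data: &AtomicU32, flag: &AtomicBool) -> u32 {
    while !flag.load(Ordering::Relaxed) {
        std::hint::spin_loop();
    }
    fence(Ordering::Acquire);
    data.load(Ordering::Relaxed)
}

/// Run one producer thread against a consumer and return what the consumer saw.
fn round_trip(value: u32) -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let flag = Arc::new(AtomicBool::new(false));
    let (d, f) = (Arc::clone(&data), Arc::clone(&flag));
    let producer = thread::spawn(move || publish(&d, &f, value));
    let seen = consume(&data, &flag);
    producer.join().unwrap();
    seen
}
```

On the CPU a fence may sit inside any branch. The extra constraint in the commit, that the barrier must be in uniform control flow, comes from the Metal translation and has no counterpart in this sketch.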
Elias Naur 039cfcf0de piet-gpu/shader: treat memoryBarrierBuffer as a control barrier
memoryBarrierBuffer is mapped to the threadgroup_barrier function in
Metal, which is a control barrier that must be executed by all threads
(or none). This change establishes that property for the two memory
barriers we have.

While here, remove ENABLE_IMAGE_INDICES completely; it was disabled in
an earlier change.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-08-20 20:41:35 +02:00
Raph Levien 59728868de Merge branch 'master' into gradient 2021-08-16 10:53:19 -07:00
Raph Levien 05e81acebc Basically get gradients working
Separate out render context upload from renderer creation. Upload ramps
to GPU buffer. Encode gradients to scene description. Fix a number of
bugs in uploading and processing.

This renders gradients in a test image, but has some shortcomings. For
one, staging buffers need to be applied for a couple things (they're
just host mapped for now). Also, the interaction between sRGB and
premultiplied alpha isn't quite right. The size of the gradient ramp
buffer is fixed and should be dynamic.

And of course there's always more optimization to be done, including
making the upload of gradient ramps more incremental, and probably
hashing of the stops instead of the processed ramps.
2021-08-09 16:16:46 -07:00
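The "upload ramps to GPU buffer" step in 05e81acebc presupposes turning gradient stops into a fixed-size lookup table. One plausible way to build such a ramp is sketched below; `Stop`, `make_ramp`, `sample`, and the choice of plain linear interpolation in non-premultiplied space are assumptions, and the sRGB and premultiplied-alpha interaction the commit flags as unresolved is not addressed.

```rust
/// A gradient stop: offset in [0, 1] and an RGBA color.
#[derive(Clone, Copy)]
struct Stop {
    offset: f32,
    color: [f32; 4],
}

/// Evaluate the gradient at parameter `t`; `stops` must be non-empty and
/// sorted by offset.
fn sample(stops: &[Stop], t: f32) -> [f32; 4] {
    let first = stops.first().expect("at least one stop");
    let last = stops.last().expect("at least one stop");
    if t <= first.offset {
        return first.color;
    }
    if t >= last.offset {
        return last.color;
    }
    // Find the pair of stops that brackets t and interpolate between them.
    for pair in stops.windows(2) {
        let (a, b) = (pair[0], pair[1]);
        if t <= b.offset {
            let u = (t - a.offset) / (b.offset - a.offset);
            let mut out = [0.0; 4];
            for k in 0..4 {
                out[k] = a.color[k] + u * (b.color[k] - a.color[k]);
            }
            return out;
        }
    }
    last.color
}

/// Sample `n` evenly spaced entries to form the ramp that would be uploaded.
fn make_ramp(stops: &[Stop], n: usize) -> Vec<[f32; 4]> {
    (0..n)
        .map(|i| {
            let t = if n > 1 { i as f32 / (n - 1) as f32 } else { 0.0 };
            sample(stops, t)
        })
        .collect()
}
```

Hashing the stops rather than the processed ramp, as the commit suggests for later, would let a cache skip `make_ramp` entirely when the same gradient is reused.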
Raph Levien 3af033f71f Merge pull request #108 from linebender/path_hang2
Retain subdivision results
2021-07-19 10:22:55 -07:00
Raph Levien 62df7c0bd5 Remove leftover debug stuff
In response to review by Elias.
2021-07-19 08:39:44 -07:00
Raph Levien 29a8975a9a Retain subdivision results
Don't recompute the parameters from quadratic subdivision, but rather
retain them across the two phases (summing the subdivision estimate, and
generating the subdivisions). The motivation for this is that the values
were subtly different (differing by 1 or 2 least significant bits) across
the two phases. It *might* also be faster depending on ALU/memory
relative performance.

Fixes #107
2021-07-15 11:18:48 -07:00
Raph Levien 6f707c4c62 Start work on gradients
WIP. Most of the GPU-side work should be done (though it's not tested
end-to-end and it's certainly possible I missed something), but still
needs work on encoding side.
2021-07-12 06:56:52 -07:00
Ishi Tatsuyuki 7a2dc37d36 Remove manual blend stack spilling and rely on scratch memory instead
v2: Add a panic when the nested blend depth exceeds the limit.
v3: Rebase and partially remove code introduced in 22507de.
2021-06-25 17:13:01 +09:00
Ishi Tatsuyuki d77dfb8c00 Runtime querying of threadgroup size 2021-06-08 16:29:40 +09:00
Ishi Tatsuyuki c2772ceac7 Boost backdrop parallelism for the prefix sums 2021-06-08 15:09:32 +09:00
Elias Naur 4b59525e1f use mediump precision for kernel4 colors and areas
Improves kernel4 performance for a Gio scene from ~22ms to ~15ms.

Updates #83

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-20 10:15:42 +02:00
Elias Naur d9d518b248 avoid non-uniform barrier control flow when exhausting memory
The compute shaders have a check for the successful completion of their
preceding stage. However, consider a shader execution path like the
following:

	void main() {
		if (mem_error != NO_ERROR) {
			return;
		}
		...
		malloc(...);
		...
		barrier();
		...
	}

and a shader execution that fails to allocate memory, thereby setting
mem_error to ERR_MALLOC_FAILED in malloc before reaching the barrier. If
another shader execution then begins, its mem_error check will make it
return early and not reach the barrier.

All GPU APIs require (dynamically) uniform control flow for barriers,
and the above case may lead to GPU hangs in practice.

Fix this issue by replacing the early exits with careful checks that
don't interrupt barrier control flow.

Unfortunately, it's harder to prove the soundness of the new checks, so
this change also clears dynamic memory ranges in MEM_DEBUG mode when
memory is exhausted. The result is that accessing memory after
exhaustion triggers an error.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-20 10:15:29 +02:00
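The fix described in d9d518b248 (guard the work instead of exiting early, so that every invocation still reaches the barrier) can be mimicked on the CPU with `std::sync::Barrier`. This is an analogy under stated assumptions, not the shader code: `run_workers` and its `fail_on` parameter are hypothetical, and threads stand in for shader invocations.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// Run `n` workers that each do fallible work and then meet at a barrier.
/// `fail_on` names a worker whose simulated allocation fails.
/// Returns how many workers saw a clean error state after the barrier.
fn run_workers(n: usize, fail_on: Option<usize>) -> usize {
    let barrier = Arc::new(Barrier::new(n));
    let mem_error = Arc::new(AtomicBool::new(false));
    let handles: Vec<_> = (0..n)
        .map(|id| {
            let (barrier, mem_error) = (Arc::clone(&barrier), Arc::clone(&mem_error));
            thread::spawn(move || {
                // Guard the work instead of returning early.
                if !mem_error.load(Ordering::SeqCst) && fail_on == Some(id) {
                    // Simulated allocation failure.
                    mem_error.store(true, Ordering::SeqCst);
                }
                // Every worker reaches the barrier, failed or not.
                barrier.wait();
                // After the barrier, all workers agree on the error state.
                !mem_error.load(Ordering::SeqCst)
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&ok| ok)
        .count()
}
```

If a worker returned before `barrier.wait()`, the remaining workers would block forever waiting for it, which is the CPU counterpart of the GPU hang the commit message warns about.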
Elias Naur 3b4a72deb9 elements.comp: remove redundant assignment
The assignment was made redundant by eb86456f31.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-20 10:14:04 +02:00
Raph Levien 1c842f8471 Merge branch 'master' into ext_query 2021-04-11 15:33:49 -07:00
Elias Naur 45ea43c157 kernel4: replace continue in switch to support D3D11 shader model 5.0
Without this change, the fxc.exe compiler complains

error X3708: continue cannot be used in a switch

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-11 21:49:57 +02:00
Raph Levien 01e4024599 Merge branch 'master' into ext_query 2021-04-11 09:08:46 -07:00
Raph Levien 115cb855d9 Query extensions at runtime
Don't run extensions unless they're available. This includes querying
for descriptor indexing, and running one of two versions of kernel4
depending on whether it's enabled.

Part of the support needed for #78
2021-04-08 15:11:15 -07:00
Elias Naur eb86456f31 elements.comp: don't modify BeginClip bounding box
The BeginClip and EndClip bounding boxes are absolute and must pairwise
match. I mistakenly modified the BeginClip bounding box for stroked
clips.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-08 19:56:37 +02:00
Elias Naur 5db427c549 kernel4: compute and output alpha
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 19:51:49 +02:00