alex/vello

mirror of https://github.com/italicsjenga/vello.git synced 2025-01-10 20:51:29 +11:00

Author	SHA1	Message	Date
Chad Brokaw	753b97c342	Rebase on radial branch	2022-04-11 05:30:08 -04:00
Raph Levien	05dc88b70f	Fix is_clip nit Not necessary to test is_clip when is_blend is set. Also recompiles shaders on Windows machine.	2022-03-30 07:27:29 -07:00
Raph Levien	7134be2329	Fix missing blend/clip logic We always do BeginClip/EndClip if it's a solid tile and the blend mode is not default. Also fix missing entry in pipeline layout (affects Vulkan but not Metal).	2022-03-16 14:40:58 -07:00
Raph Levien	acb3933d94	Variable size encoding of draw objects This patch switches to a variable size encoding of draw objects. In addition to the CPU-side scene encoding, it changes the representation of intermediate per draw object state from the `Annotated` struct to a variable "info" encoding. In addition, the bounding boxes are moved to a separate array (for a more "structure of arrays" approach). Data that's unchanged from the scene encoding is not copied. Rather, downstream stages can access the data from the scene buffer (reducing allocation and copying). Prefix sums, computed in `DrawMonoid` track the offset of both scene and intermediate data. The tags for the CPU-side encoding have been split into their own stream (again a change from AoS to SoA style). This is not necessarily the final form. There's some stuff (including at least one piet-gpu-derive type) that can be deleted. In addition, the linewidth field should probably move from the info to path-specific. Also, the 1:1 correspondence between draw object and path has not yet been broken. Closes #152	2022-03-14 16:32:08 -07:00
Raph Levien	90774f1f46	Regenerate generated shaders This just runs ninja on the piet-gpu/shaders on a Windows machine, so translated shaders match the existing pipeline. At some point, we'll rework this to reduce friction.	2022-03-07 12:49:59 -08:00
Chad Brokaw	d3b08e4c52	Initial implementation of blend modes * Add blend and composition mode enums to API * Mirror these in the shaders * Add new public blend function to PietGpuRenderContext that mirrors clip * Plumb the modes through the pipeline from scene to kernel4	2022-02-28 12:38:14 -05:00
Raph Levien	3b67a4e7c1	New clip implementation This PR reworks the clip implementation. The highlight is that clip bounding box accounting is now done on GPU rather than CPU. The clip mask is also rasterized on EndClip rather than BeginClip, which decreases memory traffic needed for the clip stack. This is a pretty good working state, but not all cleanup has been applied. An important next step is to remove the CPU clip accounting (it is computed and encoded, but that result is not used). Another step is to remove the Annotated structure entirely. Fixes #88. Also relevant to #119	2022-02-17 17:13:28 -08:00
Raph Levien	43c5ed29b2	Fix generated shaders	2022-02-07 17:30:17 -08:00
Raph Levien	012d679e3d	Merge branch 'master' into mtl_guest	2022-02-07 17:28:52 -08:00
Tatsuyuki Ishi	8bee553e6e	Merge pull request #147 from ishitatsuyuki/clang-format	2022-02-01 10:13:30 +09:00
Tatsuyuki Ishi	a7e926d67b	shaders: Add .clang-format and reformat Helps keeping the code tidy. Style is chosen to minimize diff, but contains a slight bit of personal taste.	2022-01-30 16:33:14 +09:00
Tatsuyuki Ishi	2bdca399ab	piet-gpu: Add phony targets for spv, dxil and msl Makes building without dxc or spirv-cross easier.	2022-01-20 19:34:05 +09:00
Raph Levien	2613a7e500	Add generated kernel4_gray shaders	2022-01-19 12:18:39 -08:00
Raph Levien	0cf370f9c7	Mostly working rendering This exposes interfaces to render glyphs into a texture atlas. The main changes are: * Methods to plumb raw Metal GPU resources (device, texture, etc) into piet-gpu-hal objects. * A new glyph_render API specialized to rendering glyphs. This is basically the same as just painting to a canvas, but will allow better caching (and has more direct access to fonts, bypassing the Piet font type which is underdeveloped). * Ability to render to A8 target in addition to RGBA. WIP, there are some rough edges, not least of which is that the image format changes are only on mac and cause compile errors elsewhere.	2022-01-19 12:10:51 -08:00
Raph Levien	d948126c16	Adjust workgroup sizes Make max workgroup size 256 and respect LG_WG_FACTOR. Because the monoid scans only support a height of 2, this will reduce the maximum scene complexity we can render. But it also increases compatibility. Supporting larger scans is a TODO.	2021-12-08 11:48:38 -08:00
Raph Levien	49c3a3923b	Restore gradients and clips This changes gradients and clips to the new encoding. Lightly tested.	2021-12-07 18:39:33 -08:00
Raph Levien	c503ff28b0	Make shaders cross-platform Translate all piet-gpu shaders into DXIL and MSL; move generated files into the shader/gen directory.	2021-12-03 15:49:58 -08:00
Raph Levien	44327fe49f	Beginnings of new element pipeline This successfully renders the tiger; fills and strokes are supported. Other parts of the imaging model, not yet. Progress toward #119	2021-12-03 15:33:01 -08:00
Raph Levien	875c8badf4	Add draw object stage This is one of the stages in the new element pipeline. It's a simple one, just a prefix sum of a couple counts, and some of it will probably get merged with a downstream stage, but we'll do it separately for now for convenience. This patch also contains an update to Vulkan tools 1.2.198, which accounts for the large diff of translated shaders.	2021-12-02 13:37:16 -08:00
Raph Levien	1d1801c1aa	Cross-platform path stage shaders	2021-12-01 08:42:06 -08:00
Raph Levien	8af4707525	Fix uninitialized variable	2021-12-01 08:34:41 -08:00
Raph Levien	178761dcb3	Path stream processing This patch contains the core of the path stream processing, though some integration bits are missing. The core logic is tested, though combinations of path types, transforms, and line widths are not (yet). Progress towards #119	2021-12-01 07:33:24 -08:00
Raph Levien	47f8812e2f	Start work on new element pipeline There's a bit of reorganizing as well. Shader stages are made available from piet-gpu to the test rig, config is now a proper structure (marshaled with bytemuck). This commit just has the transform stage, which is a simple monoid scan of affine transforms. Progress toward #119	2021-11-24 08:01:43 -08:00
Raph Levien	8015eb25a1	Also fix write-after-read in elements.comp On further testing, this resolves a hard lockup on Intel 630 on the mmark stress test, so is worth getting into the repo.	2021-11-14 08:23:37 -08:00
Raph Levien	95aad3e6c7	Put memory barrier reliably before flag write	2021-11-02 13:02:12 -07:00
Raph Levien	e50d5c1f58	Add memory barrier to elements shader The flag read needs acquire semantics. There are a number of ways that could be expressed, but a generally portable way is to have a barrier after. However, in the translation to Metal, that barrier needs to be in uniform control flow. This patch does some workarounds to ensure that.	2021-11-02 12:50:11 -07:00
Elias Naur	039cfcf0de	piet-gpu/shader: treat memoryBarrierBuffer as a control barrier memoryBarrierBuffer is mapped to the threadgroup_barrier function in Metal, which is a control barrier that must be executed by all threads (or none). This change establishes that property for the two memory barriers we have. While here, remove ENABLE_IMAGE_INDICES completely; it was disabled in an earlier change. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-08-20 20:41:35 +02:00
Raph Levien	59728868de	Merge branch 'master' into gradient	2021-08-16 10:53:19 -07:00
Raph Levien	05e81acebc	Basically get gradients working Separate out render context upload from renderer creation. Upload ramps to GPU buffer. Encode gradients to scene description. Fix a number of bugs in uploading and processing. This renders gradients in a test image, but has some shortcomings. For one, staging buffers need to be applied for a couple things (they're just host mapped for now). Also, the interaction between sRGB and premultiplied alpha isn't quite right. The size of the gradient ramp buffer is fixed and should be dynamic. And of course there's always more optimization to be done, including making the upload of gradient ramps more incremental, and probably hashing of the stops instead of the processed ramps.	2021-08-09 16:16:46 -07:00
Raph Levien	3af033f71f	Merge pull request #108 from linebender/path_hang2 Retain subdivision results	2021-07-19 10:22:55 -07:00
Raph Levien	62df7c0bd5	Remove leftover debug stuff In response to review by Elias.	2021-07-19 08:39:44 -07:00
Raph Levien	29a8975a9a	Retain subdivision results Don't recompute the parameters from quadratic subdivision, but rather retain them across the two phases (summing the subdivision estimate, and generating the subdivisions). The motivation for this is that the values were subtly different (differing by 1 or 2 least significant bits) across the two phases. It might also be faster depending on ALU/memory relative performance. Fixes #107	2021-07-15 11:18:48 -07:00
Raph Levien	6f707c4c62	Start work on gradients WIP. Most of the GPU-side work should be done (though it's not tested end-to-end and it's certainly possible I missed something), but still needs work on encoding side.	2021-07-12 06:56:52 -07:00
Ishi Tatsuyuki	7a2dc37d36	Remove manual blend stack spilling and rely on scratch memory instead v2: Add a panic when the nested blend depth exceeds the limit. v3: Rebase and partially remove code introduced in `22507de`.	2021-06-25 17:13:01 +09:00
Ishi Tatsuyuki	d77dfb8c00	Runtime querying of threadgroup size	2021-06-08 16:29:40 +09:00
Ishi Tatsuyuki	c2772ceac7	Boost backdrop parallelism for the prefix sums	2021-06-08 15:09:32 +09:00
Elias Naur	4b59525e1f	use mediump precision for kernel4 colors and areas Improves kernel4 performance for a Gio scene from ~22ms to ~15ms. Updates #83 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-04-20 10:15:42 +02:00
Elias Naur	d9d518b248	avoid non-uniform barrier control flow when exhausting memory The compute shaders have a check for the successful completion of their preceding stage. However, consider a shader execution path like the following: void main() { if (mem_error != NO_ERROR) { return; } ... malloc(...); ... barrier(); ... } and shader execution that fails to allocate memory, thereby setting mem_error to ERR_MALLOC_FAILED in malloc before reaching the barrier. If another shader execution then begins execution, its mem_error check will make it return early and not reach the barrier. All GPU APIs require (dynamically) uniform control flow for barriers, and the above case may lead to GPU hangs in practice. Fix this issue by replacing the early exits with careful checks that don't interrupt barrier control flow. Unfortunately, it's harder to prove the soundness of the new checks, so this change also clears dynamic memory ranges in MEM_DEBUG mode when memory is exhausted. The result is that accessing memory after exhaustion triggers an error. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-04-20 10:15:29 +02:00
Elias Naur	3b4a72deb9	elements.comp: remove redundant assignment The assignment was made redundant by `eb86456f31`. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-04-20 10:14:04 +02:00
Raph Levien	1c842f8471	Merge branch 'master' into ext_query	2021-04-11 15:33:49 -07:00
Elias Naur	45ea43c157	kernel4: replace continue in switch to support D3D11 shader model 5.0 Without this change, the fxc.exe compiler complains error X3708: continue cannot be used in a switch Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-04-11 21:49:57 +02:00
Raph Levien	01e4024599	Merge branch 'master' into ext_query	2021-04-11 09:08:46 -07:00
Raph Levien	115cb855d9	Query extensions at runtime Don't run extensions unless they're available. This includes querying for descriptor indexing, and running one of two versions of kernel4 depending on whether it's enabled. Part of the support needed for #78	2021-04-08 15:11:15 -07:00
Elias Naur	eb86456f31	elements.comp: don't modify BeginClip bounding box The BeginClip and EndClip bounding boxes are absolute and must pairwise match. I mistakenly modified the BeginClip bounding box for stroked clips. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-04-08 19:56:37 +02:00
Elias Naur	5db427c549	kernel4: compute and output alpha Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-31 19:51:49 +02:00
Elias Naur	ee4429a26f	kernel4: separate area from alpha in clip stack This change prepares for kernel4 to output alpha. No functional changes. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-31 19:51:42 +02:00
Elias Naur	22507dea0e	pre-allocate kernel4 scratch space in coarse.comp coarse.comp knows the maximum stack depth, and can pre-allocate scratch space for kernel4.comp. Kernel4 no longer contains allocations nor control barriers. The invocation local blend stack is gone as well; it didn't seem to make any difference in performance to always use global memory for pushing and popping. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-31 18:48:19 +02:00
Elias Naur	e6b535d942	coarse.comp: extract area commands into function No functional changes. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-30 19:56:09 +02:00
Elias Naur	d916a9e2c4	backdrop.comp: support stroked Annotated_Image and Annotated_BeginClip Commit `8db77e180e` added support for strokes to FillImage and BeginClip, but missed backdrop.comp. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-30 19:33:25 +02:00
Elias Naur	678bfedfca	kernel4: assume colors in alpha-premultiplied sRGB format See http://ssp.impulsetrain.com/gamma-premult.html for a description of the format. Pre-multiplied alpha only matters for translucent objects; draw a few such shapes in the test render. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-29 21:17:01 +02:00
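
Commit 678bfedfca above says kernel4 assumes colors in alpha-premultiplied format and that this only matters for translucent objects. As a rough illustration of that point, the sketch below composites one premultiplied color over another with the standard source-over rule. The `Rgba` type and the `over` function are hypothetical names made up for this example; they are not taken from the piet-gpu shaders, and the sRGB side of the format described at the linked page is left out entirely.

```rust
/// A color with premultiplied alpha: r, g, b are already scaled by a.
/// Hypothetical type for illustration; not a piet-gpu type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rgba {
    r: f32,
    g: f32,
    b: f32,
    a: f32,
}

/// Source-over compositing for premultiplied colors:
/// out = src + dst * (1 - src.a), applied to every channel including alpha.
fn over(src: Rgba, dst: Rgba) -> Rgba {
    let k = 1.0 - src.a;
    Rgba {
        r: src.r + dst.r * k,
        g: src.g + dst.g * k,
        b: src.b + dst.b * k,
        a: src.a + dst.a * k,
    }
}

fn main() {
    // 50% translucent red, premultiplied: (0.5, 0, 0, 0.5).
    let src = Rgba { r: 0.5, g: 0.0, b: 0.0, a: 0.5 };
    // Opaque white background.
    let dst = Rgba { r: 1.0, g: 1.0, b: 1.0, a: 1.0 };
    println!("{:?}", over(src, dst));
}
```

For a fully opaque source (`a = 1.0`) the destination term vanishes, so premultiplied and non-premultiplied inputs give the same result; the two only differ for translucent shapes.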

175 commits
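
Commit 47f8812e2f describes the transform stage as "a simple monoid scan of affine transforms". The sketch below is a sequential CPU reference for that idea, not the GPU shader: composition of affine transforms is associative and has an identity, so a running (inclusive) scan yields the cumulative transform at each position. The `Affine` layout, the `combine` ordering convention, and the `inclusive_scan` helper are all assumptions made for this example.

```rust
/// 2D affine transform stored as [a, b, c, d, e, f], mapping
/// (x, y) to (a*x + c*y + e, b*x + d*y + f).
/// Hypothetical layout chosen for this example.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Affine([f32; 6]);

impl Affine {
    /// Identity element of the monoid.
    const IDENTITY: Affine = Affine([1.0, 0.0, 0.0, 1.0, 0.0, 0.0]);

    /// Monoid operation: the transform that applies `other` first, then `self`.
    fn combine(self, other: Affine) -> Affine {
        let s = self.0;
        let o = other.0;
        Affine([
            s[0] * o[0] + s[2] * o[1],
            s[1] * o[0] + s[3] * o[1],
            s[0] * o[2] + s[2] * o[3],
            s[1] * o[2] + s[3] * o[3],
            s[0] * o[4] + s[2] * o[5] + s[4],
            s[1] * o[4] + s[3] * o[5] + s[5],
        ])
    }
}

/// Inclusive scan: out[i] = ts[0] combined with ts[1], ..., ts[i], in order.
fn inclusive_scan(ts: &[Affine]) -> Vec<Affine> {
    let mut acc = Affine::IDENTITY;
    ts.iter()
        .map(|t| {
            acc = acc.combine(*t);
            acc
        })
        .collect()
}

fn main() {
    let ts = [
        Affine([1.0, 0.0, 0.0, 1.0, 1.0, 2.0]), // translate by (1, 2)
        Affine([2.0, 0.0, 0.0, 2.0, 0.0, 0.0]), // uniform scale by 2
        Affine([1.0, 0.0, 0.0, 1.0, 3.0, 0.0]), // translate by (3, 0)
    ];
    for t in inclusive_scan(&ts) {
        println!("{:?}", t);
    }
}
```

Because `combine` is associative, the same result can be computed by splitting the input into chunks and combining partial results, which is the property a parallel scan relies on.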