vello

alex/vello

mirror of https://github.com/italicsjenga/vello.git synced 2025-01-07 11:21:30 +11:00

Author	SHA1	Message	Date
Bruce Mitchener	fba1b46971	Update to wgpu 0.17.	2023-08-05 07:25:14 +07:00
Arman Uguray	2c46228d06	[binning] Correctly handle disjoint bounding-box intersections When the bounding boxes of a path and its clip are disjoint (i.e. they do not intersect) the result of their intersection is a negative rectangle. When calculating the intersection of the bboxes, the binning stage ensures that the bbox is non-negative. It then normalizes the coordinates to bin dimensions and rounds the top-left corner down to the nearest integer and the bottom-right corner up. However, this rounding gives zero-area bounding boxes a non-zero area and places the bottom-right corner in the next bin. This causes fully clipped-out draw objects to be included in binning, with an incorrect clip bounding box that causes them to be erroneously drawn with partial clipping. `binning` now handles this case so that a zero-area intersected bbox gets skipped and clipped out. `tile_alloc` applies the same care in its logic. Fixes #286 and #333	2023-06-28 20:42:52 -07:00
Arman Uguray	1dea6c0ef0	Fix invalid buffer access errors caught by shader validation Fixed several other shader validation errors caught when running vello_shaders natively on Metal. These were primarily caused by reading an invalid drawtag while accessing the scene buffer. Scene buffer accesses in the offending pipelines now initialize the draw tag to DRAWTAG_NOP if an invocation ID would land beyond the valid index range of encoded draw objects.	2023-06-28 12:59:21 -07:00
Arman Uguray	a016fc19de	[draw_leaf] Don't write past the end of the draw_monoids buffer The number of global invocations for draw_leaf can exceed the size of the draw_monoids buffer which gets conservatively set to the number of draw objects. Added an explicit bounds check to prevent the invalid write. This is not an issue when targeting wgpu as the WGSL compiler emits implicit bounds checking. When targeting Metal, we disable implicit bounds checks as that requires an extra buffer binding containing buffer sizes. This was caught by Xcode's Metal shader validation and resulted in visual artifacts in native rendering.	2023-06-28 12:39:24 -07:00
Chad Brokaw	7b68630d6a	refactor common scale ratio code	2023-05-15 14:54:44 -04:00
Chad Brokaw	58c7df469d	Address review feedback * replace one_minus_focal_x and abs_one_minus_focal_x variables with the actual expressions * replace division by r^2-1 with multiplication by reciprocal * revert chain selects to branchy code for clarity. Branching is dynamically uniform so shouldn't affect performance * add suggested comment describing gradient kind/flags constants	2023-05-15 14:45:38 -04:00
Chad Brokaw	5e1188f968	replace branches with chained selects This exchanges the per-pixel branching with additional ALU + selects. My expectation is that this will be faster, but that may be hardware/driver dependent and likely requires profiling and examination of generated code. The original code is kept in a comment with notes to explain the more obfuscated select version.	2023-05-11 12:37:36 -04:00
Chad Brokaw	b103a55301	rework radial gradients Adds full support for COLRv1 radial gradients based on the two-point conical gradient algorithm at https://skia.org/docs/dev/design/conical/ Also adds robustness to degenerate cases in gradient encoding: * Radial where p0 == p1 && r0 == r1 renders transparent solid * Empty stops render as transparent solid * Single stop renders as solid	2023-05-09 18:09:53 -04:00
Chad Brokaw	ced6309a3b	support two point radial with r0 > 0.0	2023-05-06 03:27:53 -04:00
Chad Brokaw	15cd306af6	Extend modes for gradients This patch implements the pad, repeat and reflect extend modes for gradient brushes. Adds a new example demonstrating the functionality. Also fixes a few bugs: * Clamps alpha in blend.wgsl for the `blend_compose` function. The `Plus` mode was generating `alpha > 1.0` leading to incorrect rendering. * Small change to radial gradients in fine.wgsl to reject pixels outside the cone when the circles don't nest. This requires further work to properly extend the cone when one of the radii is not 0.	2023-04-30 23:11:57 -04:00
Arman Uguray	bc903d1c3b	Add check for division-by-zero in path_coarse_full The potential division by zero in this line led to visible artifacts when running against WebGPU in Chrome.	2023-04-23 16:28:52 -07:00
Arman Uguray	ceeb0b33b6	[shaders] Explicitly guard writes to clip_bboxes The very last statement of the `clip_leaf` shader is the assignment to the `clip_bboxes` buffer. The buffer write is indexed on the global invocation ID. It is possible for this index to be larger than the total number of clips in at least one workgroup since the clip count isn't strictly a multiple of workgroup size. Currently the size of the clip_bboxes buffer matches the number of clips. This means the buffer index is likely to run past the buffer. This is not an issue when running on wgpu as it internally enables bounds checking when compiling WGSL (so all buffer accesses are implicitly conditional). When compiling the shaders to native backends the vello_shaders crate currently does not enable implicit bounds checking, so a buffer overrun is possible. There are a few potential solutions: 1. Have an explicit bounds check in the shader. This is straightforward and consistent with the existing code that reads from clip_inp. The downside is that with bounds checking enabled, this extra check is redundant in the generated code. This is the solution included in this PR. 2. Make sure that the clip_bboxes buffer has a size that is a multiple of clip_leaf's workgroup size. This was the approach taken by piet-gpu on its native HALs. This effectively wastes up to 4080 bytes (255 * 16) to store unused bbox values. 3. Enable Naga's implicit bounds checks when compiling to native. This would make the behavior consistent with the wgpu backend, however it comes at the cost of increased renderer complexity as the native implementation must supply the sizes of each buffer in an implicitly generated buffer binding to every shader stage.	2023-04-21 18:43:51 -07:00
Chad Brokaw	0db71153ad	Playing with shader permutations and AOT compilation	2023-03-29 10:38:10 -07:00
Chad Brokaw	a6307a2520	predicate image loads on non-zero mask	2023-03-15 08:37:00 -04:00
Chad Brokaw	d12b711fe1	premultiply alpha before filtering	2023-03-10 02:04:21 -05:00
Chad Brokaw	a8585781cd	move atlas rect to info Atlas offset and image size were originally stored in the ptcl but are not tile dependent. Moving these to info saves 8 bytes per image tile.	2023-03-10 01:42:50 -05:00
Chad Brokaw	165b3a083b	Let's add images	2023-03-09 17:18:03 -05:00
Chad Brokaw	15efb8b3f6	fixes after rebase * remove SceneBuilder::finish() calls * remove old Config struct * comment about syncing structs in config.wgsl	2023-03-03 20:46:50 -05:00
Arman Uguray	3bbf108df5	Renamed clear_color to base_color; addressed review comments	2023-03-02 14:29:44 -08:00
Arman Uguray	8eabc12a72	Add a clear_color uniform Introduced an RGBA8 config parameter to apply as a base blend color in the fine stage of the full pipeline.	2023-03-02 09:26:31 -08:00
Chad Brokaw	f657b88018	use matrix math!	2023-02-23 01:19:04 -05:00
Chad Brokaw	659ab2ff7e	simplify	2023-02-22 23:25:45 -05:00
Chad Brokaw	1f938e5f49	linear algebra refresher	2023-02-22 23:18:51 -05:00
Chad Brokaw	3c15bff867	Proper inverse of translation components	2023-02-22 23:08:38 -05:00
Chad Brokaw	033870d91e	Fix brush transforms This fixes an incorrect application of the inverse transform for radial gradients in fine. Also fixes an edge case in `SceneBuilder` where a brush transform is identical to the path transform leading to a corrupt encoding.	2023-02-22 20:10:00 -05:00
Raph Levien	27e6fdd267	Partially revert uniform load of bump.failed Just load the atomic bump counter directly instead of piping it through a shared variable, when workgroupUniformLoad is not available. The value is in fact dynamically uniform, but that depends on the stage not setting its own failure flag, a fairly subtle invariant. I think there was a write-after-read hazard for the reuse of sh_part_count[0]. However, doing the experiment of just changing that doesn't fix the problem on mac. It's possible there's a shader compilation problem (possibly the same one as provoking the storageBarrier workaround in tile_alloc), or also possibly a logic error I'm not understanding. In any case, this change does appear to fix the hangs on mac. Fixes #267	2023-01-29 09:01:13 -08:00
Raph Levien	d6cbae2a3f	Fixes to get example running in wasm A number of things were wrong: * The args were missing to `run` * The robust memory changes introduced uniformity errors * `clear_buffer` is a todo for wgpu on wasm * Some more time calls crept in * Initializing both env_logger and console_logger fails In addition, we conditionally opt the shaders into `workgroupUniformLoad`, as that's available on wasm but not yet native. Some of the things (args, uniformity errors) are important fixes. Other things (clear_buffer, wUL being optional) are workarounds for wgpu limitations and have TODO items to be removed when wgpu catches up.	2023-01-26 12:19:12 -08:00
Chad Brokaw	0c0c61dc82	Address review feedback * Add counts to offsets when comparing against buffer size limits * Remove multiplication by 4 in blend buffer allocation (we use units of u32) * Move buffer sizes from BumpAllocators to Config * Add comments about early exit	2023-01-18 21:36:32 -05:00
Chad Brokaw	db7d93b85c	remove unnecessary limit adjustment	2023-01-17 22:56:52 -05:00
Chad Brokaw	f0587b6770	add comment about syncing BumpAllocators struct	2023-01-17 22:51:46 -05:00
Chad Brokaw	1e8d194b6a	initial GPU side work for robust memory This should handle everything on the GPU side except for blend stack loading/storing in fine.	2023-01-17 14:08:20 -05:00
Daniel McNab	3902e65618	Fix missing barriers in the `pathtag_scan`s (#255) Co-authored-by: Raph Levien <raph.levien@gmail.com>	2023-01-16 20:20:20 +00:00
Raph Levien	02d8b28439	Merge pull request #245 from linebender/reuse_buf Prototype of buffer reuse	2023-01-13 11:18:03 -08:00
Raph Levien	5c469013c7	Fix even-odd rule This works with winding numbers even larger than 2.	2023-01-12 21:08:51 -08:00
Raph Levien	4907186de4	Prototype of buffer reuse This helps performance but not all performance issues have been resolved. Nontrivial CPU goes into write_buffer, and it's also possible that there isn't enough overlapping between CPU and GPU work.	2023-01-12 20:43:58 -08:00
Chad Brokaw	a9aa3f9cab	Merge pull request #242 from linebender/evenodd Support even-odd fill rule	2023-01-11 14:13:24 -05:00
Raph Levien	3003e42acb	Merge pull request #235 from linebender/large_pathtag Support for larger pathtags	2023-01-11 07:56:15 -08:00
Chad Brokaw	c6ac5bf590	Support even-odd fill rule Add logic for handling the even-odd fill rule to `SceneBuilder` and the coarse and fine shaders.	2023-01-10 15:22:04 -05:00
Raph Levien	080277bcd9	Merge pull request #241 from linebender/fine_fixes Fixes to fine rasterization	2023-01-08 13:50:51 -08:00
Duane Johnson	3e8a4813f0	Update references to piet-gpu where it makes sense (#240)	2023-01-08 16:15:51 +00:00
Raph Levien	29a16eb210	Fixes to fine rasterization The area bug was found and fixed by @dfrg, and is adapted from #239. I wanted to move it to a separate PR so that one would be more focused on API. The other bug is currently silent because the two quantities swapped are both 4, but it is triggered when experimenting with tuning for performance.	2023-01-08 07:41:01 -08:00
Raph Levien	d94257a7c5	Support for larger pathtags Previously there was a limit of 256k pathtags in a scene, due to the need for multi-dispatch prefix sum for the pathtag monoid. This patch increases the limit to 64M, which ought to be enough for most applications. It works by having 4 dispatches for the pathtag prefix sum: 2 to reduce, then 2 to scan.	2023-01-05 14:25:21 -08:00
Daniel McNab	ff59839737	Move the vello crate to the workspace root (#231) * Move the vello crate to the root of the crate * Add warning that README is work in progress * Add newline in warning * Move the unlicense into the shader folder * Fixup wgsl-analyzer include paths	2023-01-05 09:32:09 +00:00

43 commits
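
The rounding problem described in commit 2c46228d06 can be illustrated with a small CPU-side sketch. This is not the shader code from the commit: the `BBox` type, the function names, and the 256-pixel `BIN_SIZE` are assumptions made for illustration. `naive_bin_range` clamps a disjoint intersection to zero area and then rounds outward, which yields a one-bin box; `bin_range` skips an empty intersection before rounding.

```rust
/// Axis-aligned bounding box in pixels: [x0, y0, x1, y1].
type BBox = [f32; 4];

/// Bin edge length in pixels (assumed value for this sketch).
const BIN_SIZE: f32 = 256.0;

/// Component-wise intersection. For disjoint inputs the result is a
/// "negative" rectangle with x1 < x0 and/or y1 < y0.
fn intersect(a: BBox, b: BBox) -> BBox {
    [a[0].max(b[0]), a[1].max(b[1]), a[2].min(b[2]), a[3].min(b[3])]
}

/// Round a pixel-space box outward to bin coordinates:
/// top-left rounds down, bottom-right rounds up.
fn round_to_bins(b: BBox) -> [i32; 4] {
    let s = 1.0 / BIN_SIZE;
    [
        (b[0] * s).floor() as i32,
        (b[1] * s).floor() as i32,
        (b[2] * s).ceil() as i32,
        (b[3] * s).ceil() as i32,
    ]
}

/// Problematic variant: clamp the intersection to non-negative extent,
/// then round. A zero-area box can grow to cover a whole bin.
fn naive_bin_range(path: BBox, clip: BBox) -> [i32; 4] {
    let i = intersect(path, clip);
    let clamped = [i[0], i[1], i[2].max(i[0]), i[3].max(i[1])];
    round_to_bins(clamped)
}

/// Guarded variant: an empty intersection is skipped before rounding,
/// so a fully clipped-out object covers no bins.
fn bin_range(path: BBox, clip: BBox) -> Option<[i32; 4]> {
    let i = intersect(path, clip);
    if i[0] >= i[2] || i[1] >= i[3] {
        return None; // zero or negative area: clipped out
    }
    Some(round_to_bins(i))
}
```

With a path box of `[0, 0, 100, 100]` and a disjoint clip box of `[300, 300, 400, 400]`, the naive variant reports a one-bin range while the guarded variant reports that the object is clipped out.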