Merge all static and dynamic buffers into a single buffer, "memory". Add a
malloc function for dynamic allocations.
Unify static allocation offsets into a "config" buffer containing scene setup
(number of paths, number of path segments), as well as the memory offsets of
the static allocations.
Finally, set an overflow flag when an allocation fails, and make sure to exit
shader execution as soon as that triggers. Also check the flag before beginning
execution, in case the client wants to run two or more shaders before checking
the flag.
The "state" buffer is left alone because it needs zero'ing and because it is
accessed with the "volatile" keyword.
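A minimal sketch of the scheme, assuming a word-addressed memory buffer; the
names (mem_offset, mem_error, MALLOC_FAILED) are assumptions, not necessarily
the patch's:

    #define MALLOC_FAILED 0xffffffffu

    layout(set = 0, binding = 0) buffer Memory {
        uint mem_offset; // bump-allocation cursor, in words
        uint mem_error;  // overflow flag, read back by the client
        uint memory[];
    };

    uint malloc(uint size) {
        uint offset = atomicAdd(mem_offset, size);
        if (offset + size > uint(memory.length())) {
            atomicMax(mem_error, 1u); // flag the overflow for the client
            return MALLOC_FAILED;
        }
        return offset;
    }

    void main() {
        // bail out early in case a previously run shader overflowed
        if (mem_error != 0u) {
            return;
        }
        // ... shader body ...
    }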
Fixes #40
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The previous attempt to fix inconsistent intersections caused by floating
point inaccuracy [0] missed two cases.
The first case is that top intersections with the very first row would fail
the test
tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z
In particular, y is not larger than y0 when y0 has been clipped to 0.
Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check, which does work
and is just as consistent. Add a similar check, min(p0.x, p1.x) < tile_x0, for
deciding when to clip the segment to the left edge (but keep the consistent
xray check for deciding left edge *intersections*).
The second case is that tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray
value.
Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.
Finally, ensure that xx0 and xx1 don't overflow the allocated number of tiles
by clamping them *after* setting them. Adjust xx0 to include xray, just as xx1
is adjusted; I haven't seen corruption without it, but it's not obvious that
xx0 always includes xray.
While here, replace a "+=" on a guaranteed-zero value with a plain "=".
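A sketch of the resulting range computation; tile_range and its parameter
names are illustrative, not the patch's code:

    // xray and next_xray are the ray columns at the top of this row and
    // the next; x_tile_first/x_tile_last span the segment's own tiles
    ivec2 tile_range(int x_tile_first, int x_tile_last,
                     int xray, int next_xray, int n_tile_x) {
        int xx0 = min(x_tile_first, min(xray, next_xray));
        int xx1 = max(x_tile_last, max(xray, next_xray)) + 1;
        // clamp *after* setting, so the range can never overflow the
        // allocated number of tiles
        xx0 = clamp(xx0, 0, n_tile_x);
        xx1 = clamp(xx1, xx0, n_tile_x);
        return ivec2(xx0, xx1);
    }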
Updates #23
[0] 29cfb8b63e
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The state header is only one word (flags), not two.
Move the partition atomic counter to a separate field instead of state[0],
simplifying state offset calculations.
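Sketched, the resulting buffer layout (the binding number is an assumption):

    layout(set = 0, binding = 2) volatile buffer StateBuf {
        uint part_counter; // partition atomic counter, no longer state[0]
        uint[] state;      // one flags word, then the per-partition records
    };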
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The finite precision of floating point computations can lead the coarse
renderer into inconsistent tile intersections, which imply impossible line
segments such as lines with gaps or double intersections. The winding number
algorithm is sensitive to these errors, which show up as incorrectly filled
paths.
This change forces all intersections to be consistent.
First, the floating point top edge intersection test is removed; top edge intersections are
completely determined by left edge intersections.
Then, left edge intersections are inserted from the tile with the last top edge
intersection. The next top edge is then fixed to be the last tile with a left
edge intersection.
More details in the patch comments.
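A sketch of the idea; the helper, SX and TILE_H (16-pixel tiles) are assumed,
while xray and next_xray are names from the patch:

    #define TILE_H 16.0
    #define SX (1.0 / 16.0)

    // tile column occupied by the segment where it crosses a horizontal line
    int tile_col_at(vec2 p0, vec2 p1, float y) {
        float t = (y - p0.y) / (p1.y - p0.y);
        return int(floor(mix(p0.x, p1.x, t) * SX));
    }

    void emit_left_intersections(vec2 p0, vec2 p1, float row_y0) {
        int xray = tile_col_at(p0, p1, row_y0);               // top of this row
        int next_xray = tile_col_at(p0, p1, row_y0 + TILE_H); // next top edge
        // left edge intersections are exactly the columns between the two
        // ray positions, so top and left intersections can never disagree
        for (int x = min(xray, next_xray); x < max(xray, next_xray); x++) {
            // record a left edge intersection between columns x and x + 1
        }
    }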
Fixes #23
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Eliminates the precision loss of the subtraction in the sign(end.x - start.x)
expression in kernel4. That's important for the next change that avoids
inconsistent line intersections in path_coarse.
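A contrived illustration of the hazard (the values are made up): nearby
coordinates collapse under float rounding, so the sign degenerates to zero and
the winding direction is lost:

    float sign_demo() {
        float start_x = 1000000.01; // rounds to 1000000.0 as a 32-bit float
        float end_x   = 1000000.02; // also rounds to 1000000.0
        return sign(end_x - start_x); // 0.0, though end_x > start_x exactly
    }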
Updates #23
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Provide images to fine rasterization kernel as readonly textures with a
sampler, rather than storage images. That lets us use the GPU's hardware
for sampling, which should be considerably more efficient.
There are a bunch of parameters that are hardcoded, but it does seem to
work.
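Shader-side, the switch is roughly from imageLoad on a storage image to
textureLod through a sampler; the binding number and names are assumptions:

    // before: storage image, manual access
    // layout(rgba8, set = 0, binding = 3) uniform readonly image2D image_atlas;
    // vec4 fg = imageLoad(image_atlas, ivec2(xy));

    // after: sampled texture, hardware filtering
    layout(set = 0, binding = 3) uniform sampler2D image_atlas;

    vec4 fill_image(vec2 uv) {
        return textureLod(image_atlas, uv, 0.0);
    }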
This patch passes a dynamically sized array of textures to the fine
rasterizer.
A bunch of the low level Vulkan stuff is done, but only enough of the
shaders and encoders to do minimal testing. We'll want to switch from
storage images to sampled images, track the actual array of textures
during encoding, use that to build the descriptor set (which will need
to be more dynamic), and of course run image elements through the
pipeline.
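Shader-side, a runtime-sized image array needs descriptor indexing; a sketch
with an assumed binding and names:

    #extension GL_EXT_nonuniform_qualifier : enable

    layout(rgba8, set = 0, binding = 4) uniform readonly image2D images[];

    vec4 load_image(uint ix, ivec2 xy) {
        // the image index can vary per pixel, hence nonuniformEXT
        return imageLoad(images[nonuniformEXT(ix)], xy);
    }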
Progress towards #38
We keep a small window of the clip stack in registers in the fine
rasterization kernel, and when that window is exceeded, spill to global
memory, so the clip stack can be unbounded.
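A sketch of the windowed stack as a ring buffer of registers; the window size,
buffer layout and names are assumptions:

    #define CLIP_WINDOW 4 // in-register window size

    layout(set = 0, binding = 6) buffer ClipSpill { uint clip_spill[]; };

    vec4 clip_window[CLIP_WINDOW]; // per-thread registers
    uint clip_depth = 0u;
    uint spill_base = 0u; // per-tile base offset into clip_spill

    void clip_push(vec4 rgba) {
        if (clip_depth >= CLIP_WINDOW) {
            // window full: spill the entry this push will overwrite
            clip_spill[spill_base + clip_depth - CLIP_WINDOW] =
                packUnorm4x8(clip_window[clip_depth % CLIP_WINDOW]);
        }
        clip_window[clip_depth % CLIP_WINDOW] = rgba;
        clip_depth++;
    }

    vec4 clip_pop() {
        clip_depth--;
        vec4 top = clip_window[clip_depth % CLIP_WINDOW];
        if (clip_depth >= CLIP_WINDOW) {
            // reload the entry that re-enters the window
            clip_window[clip_depth % CLIP_WINDOW] =
                unpackUnorm4x8(clip_spill[spill_base + clip_depth - CLIP_WINDOW]);
        }
        return top;
    }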
I realized there's a problem with encoding clip bboxes relative to the
current transform (see #36 for a more detailed explanation), so this changes
them to absolute bboxes.
This more or less gets clips working. There are optimization
opportunities (all-clear and all-opaque mask tiles), and it doesn't deal
with overflow of the blend stack, but it seems to basically work.
Actually handle transforms in RenderCtx (they were implemented in the renderer
but not actually plumbed through). This requires maintaining a state stack,
which will also be needed for clipping.
This PR also starts work on encoding clipping, including tracking
bounding boxes.
WIP, none of this is tested yet.
The Vulkan and OpenGL specifications offer only weak forward progress guarantees, and
in practice several mobile devices fail to complete the decoupled lookback
spinloop without mitigation.
This patch implements Raph's suggestion from the "Forward Progress"
section from
https://raphlinus.github.io/gpu/2020/04/30/prefix-sum.html
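A sketch of the shape of that mitigation: bound the spin, then fall back to
recomputing the missing aggregate from the input, so progress never depends on
another workgroup being scheduled. The constants, flag encoding and names are
assumptions:

    #define MAX_SPIN 1024u        // polls before giving up
    #define FLAG_PREFIX_READY 2u

    layout(set = 0, binding = 2) volatile buffer StateBuf { uint state[]; };

    bool wait_for_prefix(uint flag_word_ix) {
        for (uint i = 0u; i < MAX_SPIN; i++) {
            if (state[flag_word_ix] == FLAG_PREFIX_READY) {
                return true; // the predecessor published its prefix
            }
        }
        // give up; the caller recomputes the predecessor's aggregate from
        // the raw input instead of waiting any longer
        return false;
    }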
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Expand the final kernel4 stage to maintain a per-pixel mask.
Introduce two new path elements, FillMask and FillMaskInv, to fill
the mask. FillMask acts like Fill, while FillMaskInv fills the area
outside the path.
SVG clipPaths are then representable by a FillMaskInv(0.0) for every nested
path, preceded by a FillMask(1.0) to clear the mask.
The bounding box for FillMaskInv elements is the entire screen; tightening of
the bounding box is left for future work. Note that a fullscreen bounding
box is not hopelessly inefficient because completely filling a tile with
a mask is just a single CmdSolidMask per tile.
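A sketch of the two mask updates, assuming area is the path coverage in
[0, 1] and value comes from the element; the helper names are illustrative:

    float apply_fill_mask(float mask, float value, float area) {
        // FillMask: write the value where the path covers the pixel
        return mix(mask, value, area);
    }

    float apply_fill_mask_inv(float mask, float value, float area) {
        // FillMaskInv: write the value where the path does not cover it
        return mix(value, mask, area);
    }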
Fixes #30
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Both the Vulkan and OpenGL ES specs allow implementations to limit workgroups to
128 threads. Add a LG_WG_FACTOR setting for easy switching between 128 and 256
threads, with 256 being kept as the default setting.
Manually tested that LG_WG_FACTOR = 0 (128 threads) works as expected.
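Sketched; LG_WG_FACTOR is the setting from the patch, while the derived macros
and layout are assumptions:

    #define LG_WG_FACTOR 1 // 0 => 128 threads, 1 => 256 threads
    #define LG_WG_SIZE (7 + LG_WG_FACTOR)
    #define WG_SIZE (1 << LG_WG_SIZE)

    layout(local_size_x = WG_SIZE, local_size_y = 1) in;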
Signed-off-by: Elias Naur <mail@eliasnaur.com>
The transformation determinant is signed, but we're only interested in
the absolute scale for transforming linewidths.
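Sketched, assuming the transform is stored as a vec4 (a, b, c, d):

    float transform_linewidth(vec4 mat, float linewidth) {
        // the determinant's sign encodes orientation; only the absolute
        // scale matters for stroke widths
        return linewidth * sqrt(abs(mat.x * mat.w - mat.y * mat.z));
    }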
Signed-off-by: Elias Naur <mail@eliasnaur.com>
This is a bit of a revert of the load-balanced ("more parallel") coarse
path rasterizer, but includes fills and also uses atomicExchange.
I'm doing it this way because it should be considerably easier to do
flattening in this structure, even though there will be some performance
regression.
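A sketch of the atomicExchange pattern for building per-tile segment lists;
the buffer layout and names are assumptions:

    layout(set = 0, binding = 2) buffer Tiles { uint tile_head[]; };
    layout(set = 0, binding = 3) buffer Segments { uint seg_next[]; };

    void push_segment(uint tile_ix, uint seg_ix) {
        // atomically prepend this segment to the tile's linked list;
        // readers run in a later dispatch, after a pipeline barrier
        uint old_head = atomicExchange(tile_head[tile_ix], seg_ix);
        seg_next[seg_ix] = old_head;
    }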