vello

alex/vello

mirror of https://github.com/italicsjenga/vello.git synced 2025-01-11 04:51:32 +11:00

Author	SHA1	Message	Date
Raph Levien	59728868de	Merge branch 'master' into gradient	2021-08-16 10:53:19 -07:00
Raph Levien	6f707c4c62	Start work on gradients WIP. Most of the GPU-side work should be done (though it's not tested end-to-end and it's certainly possible I missed something), but still needs work on encoding side.	2021-07-12 06:56:52 -07:00
Ishi Tatsuyuki	7a2dc37d36	Remove manual blend stack spilling and rely on scratch memory instead v2: Add a panic when the nested blend depth exceeds the limit. v3: Rebase and partially remove code introduced in `22507de`.	2021-06-25 17:13:01 +09:00
Elias Naur	4b59525e1f	use mediump precision for kernel4 colors and areas Improves kernel4 performance for a Gio scene from ~22ms to ~15ms. Updates #83 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-04-20 10:15:42 +02:00
Elias Naur	d9d518b248	avoid non-uniform barrier control flow when exhausting memory The compute shaders have a check for the successful completion of their preceding stage. However, consider a shader execution path like the following: void main() { if (mem_error != NO_ERROR) { return; } ... malloc(...); ... barrier(); ... } and shader execution that fails to allocate memory, thereby setting mem_error to ERR_MALLOC_FAILED in malloc before reaching the barrier. If another shader execution then begins execution, its mem_error check will make it return early and not reach the barrier. All GPU APIs require (dynamically) uniform control flow for barriers, and the above case may lead to GPU hangs in practice. Fix this issue by replacing the early exits with careful checks that don't interrupt barrier control flow. Unfortunately, it's harder to prove the soundness of the new checks, so this change also clears dynamic memory ranges in MEM_DEBUG mode when memory is exhausted. The result is that accessing memory after exhaustion triggers an error. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-04-20 10:15:29 +02:00
Raph Levien	1c842f8471	Merge branch 'master' into ext_query	2021-04-11 15:33:49 -07:00
Elias Naur	45ea43c157	kernel4: replace continue in switch to support D3D11 shader model 5.0 Without this change, the fxc.exe compiler complains error X3708: continue cannot be used in a switch Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-04-11 21:49:57 +02:00
Raph Levien	115cb855d9	Query extensions at runtime Don't run extensions unless they're available. This includes querying for descriptor indexing, and running one of two versions of kernel4 depending on whether it's enabled. Part of the support needed for #78	2021-04-08 15:11:15 -07:00
Elias Naur	5db427c549	kernel4: compute and output alpha Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-31 19:51:49 +02:00
Elias Naur	ee4429a26f	kernel4: separate area from alpha in clip stack This change prepares for kernel4 to output alpha. No functional changes. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-31 19:51:42 +02:00
Elias Naur	22507dea0e	pre-allocate kernel4 scratch space in coarse.comp coarse.comp knows the maximum stack depth, and can pre-allocate scratch space for kernel4.comp. Kernel4 no longer contains allocations nor control barriers. The invocation local blend stack is gone as well; it didn't seem to make any difference in performance to always use global memory for pushing and popping. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-31 18:48:19 +02:00
Elias Naur	678bfedfca	kernel4: assume colors in alpha-premultiplied sRGB format See http://ssp.impulsetrain.com/gamma-premult.html for a description of the format. Pre-multiplied alpha only matters for translucent objects; draw a few such shapes in the test render. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-29 21:17:01 +02:00
Elias Naur	bb61f875dc	kernel4: remove dead code left over from previous clipping approach Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-29 21:10:17 +02:00
Tatsuyuki Ishi	4864a7fe0f	Create chunks over the x axis in addition to y axis This allows more coalescing with image loads/stores, since all of our images are stored with a tiled layout.	2021-03-23 20:54:49 +09:00
Elias Naur	f0127812eb	tightly pack fine rasterizer commands Reclaims the space waste from splitting fill mode commands from fill commands. For example, a CmdStroke + CmdColor use an extra tag word compared to the former combined CmdStroke. This change shaves off that one word. In the future, we can pack several command tags into one tag word, saving even more space. Fixes #66 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 16:43:33 +01:00
Elias Naur	db59b5d570	coarse,kernel4: make stroke, (non-zero) fill, solid separate commands Before this change, every command (FillColor, FillImage, BeginClip) had (or would need) stroke, (non-zero) fill and solid variants. This change adds a command for each fill mode and their parameters, reducing code duplication and adding support for stroked FillImage and BeginClip as a side-effect. The rest of the pipeline doesn't yet support stroked FillImage and BeginClip. That's a follow-up change. Since each command includes a tag, this change adds an extra word for each fill and stroke. That waste is also addressed in a follow-up. Updates #70 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 16:43:33 +01:00
Elias Naur	a5b6bda941	add support for element flags to shaders Commit `9afa9b86b6` added Rust support for encoding flags into elements. This change adds support to shaders by introducing variant tag structs: struct VariantTag { uint tag; uint flags; }; and returning them from Variant_tag functions. It also adds a flags argument to write functions for enum variants that include TagFlags. No functionality changes. Updates #70 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 12:50:12 +01:00
Elias Naur	903ab1fb59	implement FillImage command and sRGB support FillImage is like Fill, except that it takes its color from one or more image atlases. kernel4 uses a single image for non-Vulkan hosts, and the dynamic sized array of image descriptors on Vulkan. A previous version of this commit used textures. I think images are a better choice for piet-gpu, for several reasons: - Texture sampling, in particular textureGrad, is slow on lower spec devices such as Google Pixel. Texture sampling is particularly slow and difficult to implement for CPU fallbacks. - Texture sampling needs more parameters, in particular the full u,v transformation matrix, leading to a large increase in the command size. Since all commands use the same size, that memory penalty is paid by all scenes, not just scenes with textures. - It is unlikely that piet-gpu will support every kind of fill for every client, because each kind must be added to kernel4. With FillImage, a client will prepare the image(s) in separate shader stages, sampling and applying transformations and special effects as needed. Textures that align with the output pixel grid can be used directly, without pre-processing. Note that the pre-processing step can run concurrently with the piet-gpu pipeline; only the last stage, kernel4, needs the images. Pre-processing most likely uses fixed function vertex/fragment programs, which on some GPUs may run in parallel with piet-gpu's compute programs. While here, fix a few validation errors: - Explicitly enable EXT_descriptor_indexing, KHR_maintenance3, KHR_get_physical_device_properties2. - Specify a VkDescriptorSetVariableDescriptorCountAllocateInfo for vkAllocateDescriptorSets. Otherwise, variable image2D arrays won't work (but sampler2D arrays do, at least on my setup). Updates #38 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 12:50:12 +01:00
Elias Naur	b73eabf4eb	kernel4.comp: remove unused commands Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-02-24 15:32:24 +01:00
Elias Naur	6a4e26ef2a	all: add optional memory checks Defining MEM_DEBUG in mem.h will add a size field to Alloc and enable bounds and alignment checks for every memory read and write. Notes: - Deriving an Alloc from Path.tiles is unsound, but it's more trouble to convert Path.tiles from TileRef to a variable sized Alloc. - elements.comp notes that "We should be able to use an array of structs but the NV shader compiler doesn't seem to like it". If that's still relevant, do shared arrays of Allocs work? Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-02-15 16:07:45 +01:00
Elias Naur	ee67a0a515	kernel4: simplify a tiny bit Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-27 20:24:29 +01:00
Elias Naur	c4f5a69a0d	implement variable output sizing Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-27 20:24:29 +01:00
Elias Naur	4de67d9081	unify GPU memory management Merge all static and dynamic buffers to just one, "memory". Add a malloc function for dynamic allocations. Unify static allocation offsets into a "config" buffer containing scene setup (number of paths, number of path segments), as well as the memory offsets of the static allocations. Finally, set an overflow flag when an allocation fails, and make sure to exit shader execution as soon as that triggers. Add checks before beginning execution in case the client wants to run two or more shaders before checking the flag. The "state" buffer is left alone because it needs zeroing and because it is accessed with the "volatile" keyword. Fixes #40 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-27 20:24:29 +01:00
Elias Naur	d21f2b68de	all: add SPDX license headers Fixes #53 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-11 18:24:35 +01:00
Elias Naur	5c04e4882b	remove unused tilegroup.h and extra spaces from kernel4.comp Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-11 15:00:58 +01:00
Raph Levien	634530fb91	Merge branch 'master' into image_work	2020-12-02 11:58:45 -08:00
Elias Naur	19f4d9fa95	change tile segment representation to (origin, vector) Eliminates the precision loss of the subtraction in the sign(end.x - start.x) expression in kernel4. That's important for the next change that avoids inconsistent line intersections in path_coarse. Updates #23 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-01 18:34:40 +01:00
Raph Levien	97dcb5122e	Merge branch 'master' into image_work	2020-11-29 17:09:48 -08:00
Elias Naur	feeb459fa1	remove FillMask and FillMaskInv Obsoleted by BeginClip/EndClip. Updates #36 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-11-29 16:59:58 +01:00
Raph Levien	facc9e0982	Use sampler for texture images Provide images to fine rasterization kernel as readonly textures with a sampler, rather than storage images. That lets us use the GPU's hardware for sampling, which should be considerably more efficient. There are a bunch of parameters that are hardcoded, but it does seem to work.	2020-11-25 18:05:10 -08:00
Raph Levien	047a0830d1	Towards wiring up images to k4 This patch passes a dynamically sized array of textures to the fine rasterizer. A bunch of the low level Vulkan stuff is done, but only enough of the shaders and encoders to do minimal testing. We'll want to switch from storage images to sampled images, track the actual array of textures during encoding, use that to build the descriptor set (which will need to be more dynamic), and of course run image elements through the pipeline. Progress towards #38	2020-11-24 22:11:38 -08:00
Raph Levien	a60c2dd3c8	Scratch buffer for clip stack We keep a small window of the clip stack in registers in the fine rasterization kernel, and when that window is exceeded, spill to global memory, so the clip stack can be unbounded.	2020-11-22 18:14:09 -08:00
Raph Levien	d14895b107	Continuing work on clips I realized there's a problem with encoding clip bboxes relative to the current transform (see #36 for a more detailed explanation), so this is changing it to absolute bboxes. This more or less gets clips working. There are optimization opportunities (all-clear and all-opaque mask tiles), and it doesn't deal with overflow of the blend stack, but it seems to basically work.	2020-11-20 18:25:27 -08:00
Raph Levien	f53d00e6bc	Add transforms and state stack Actually handle transforms in RenderCtx (was implemented in renderer but not actually plumbed through). This also requires maintaining a state stack, which will also be required for clipping. This PR also starts work on encoding clipping, including tracking bounding boxes. WIP, none of this is tested yet.	2020-11-20 18:25:27 -08:00
Elias Naur	8fab45544e	shader: implement clip paths Expand the final kernel4 stage to maintain a per-pixel mask. Introduce two new path elements, FillMask and FillMaskInv, to fill the mask. FillMask acts like Fill, while FillMaskInv fills the area outside the path. SVG clipPaths are then representable by a FillMaskInv(0.0) for every nested path, preceded by a FillMask(1.0) to clear the mask. The bounding box for FillMaskInv elements is the entire screen; tightening of the bounding box is left for future work. Note that a fullscreen bounding box is not hopelessly inefficient because completely filling a tile with a mask is just a single CmdSolidMask per tile. Fixes #30 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-10-09 13:20:26 +02:00
Elias Naur	de4f963ba0	shader: remove dead code Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-08-28 17:37:46 +02:00
msiglreith	1cc5c7ac0d	Shader documentation and a slight cleanup	2020-06-28 15:37:27 +02:00
msiglreith	eed71721eb	Update winit example	2020-06-14 23:32:59 +02:00
Raph Levien	af0a1af8e1	Make fills work The backdrop propagation is slow but it does work.	2020-06-05 22:40:44 -07:00
Raph Levien	70a9c17e23	Continue building out pipeline Plumbs the new tiling scheme to k4. This works (stroke only) but still has some performance issues.	2020-06-03 12:21:09 -07:00
Raph Levien	319aa703c4	Output multiple pixels per thread in k4 In kernel 4, compute a chunk of pixels rather than just one per thread. This is a dramatic speedup. (This commit cherry-picked from another working branch)	2020-05-28 07:54:24 -07:00
Raph Levien	dbcffb10db	Reinstate fills Add fills back in.	2020-05-25 15:27:03 -07:00
Raph Levien	8eaf49a04d	Checkpoint parallel output Parallel segment output seems to be working for strokes.	2020-05-25 12:14:18 -07:00
Raph Levien	1240da3870	Delete old-style kernels and buffers Pave the way for the coarse raster pass to write to the ptcl buffer.	2020-05-15 15:24:37 -07:00
Raph Levien	3a6428238b	Start writing tiles This is the first checkpoint where it actually runs a pipeline end to end, though it's far from accurate.	2020-05-15 14:31:52 -07:00
msiglreith	abd238bff3	Address review comments	2020-05-05 18:13:07 +02:00
msiglreith	e2ed54361d	Fix rebase issues and split into library and cli/winit binaries	2020-05-04 17:05:54 +02:00
msiglreith	b38e43f0c2	Initial work for surface support surface: handle extensions Implement swapchain creation and blit image to screen	2020-05-04 16:24:42 +02:00
Raph Levien	dcdd35e0b8	Implement solid color cmd Avoids empty fill segment list, which was a minor bug. Also increase tolerance to 0.25 to juice performance.	2020-05-02 10:53:16 -07:00
Raph Levien	aa83d782ed	Fills Adds fills, and has more or less working tiger render (with artifacts).	2020-05-01 19:42:20 -07:00
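Commit 678bfedfca says kernel4 assumes colors are in alpha-premultiplied sRGB format. It also notes that premultiplication only matters for translucent objects. The Python sketch below is a rough illustration only, not the kernel4 shader code. It premultiplies straight-alpha colors and composites them with the standard Porter-Duff source-over operator on premultiplied values. The function names `premultiply` and `over` are hypothetical. It also assumes channels are gamma-encoded floats in [0, 1], multiplied by alpha directly.

```python
def premultiply(r, g, b, a):
    """Convert a straight-alpha RGBA color to premultiplied form.

    Assumption: r, g, b are gamma-encoded sRGB floats in [0, 1], and the
    multiplication by alpha is applied to the encoded values.
    """
    return (r * a, g * a, b * a, a)


def over(src, dst):
    """Porter-Duff source-over for premultiplied RGBA tuples.

    With premultiplied inputs every channel, alpha included, uses the same
    formula: result = src + (1 - src_alpha) * dst.
    """
    inv = 1.0 - src[3]
    return tuple(s + inv * d for s, d in zip(src, dst))


# A half-transparent red shape drawn over opaque white.
red = premultiply(1.0, 0.0, 0.0, 0.5)
white = premultiply(1.0, 1.0, 1.0, 1.0)
print(over(red, white))  # (1.0, 0.5, 0.5, 1.0)
```

For an opaque color (a = 1.0), `premultiply` returns the input unchanged. This is why only translucent shapes exercise the premultiplied path in the test render.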
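Commit 4de67d9081 merges the GPU buffers into one "memory" buffer with a malloc function. It also sets an overflow flag when an allocation fails and checks that flag before a stage begins. The Python sketch below models that idea as a simple bump allocator, assuming one shared offset counter. On the GPU, that counter would be advanced atomically, so the sketch is sequential. The class `Memory` and the function `run_stage` are hypothetical names, not piet-gpu identifiers.

```python
class Memory:
    """Hypothetical model of a single unified buffer with a bump allocator."""

    def __init__(self, size):
        self.size = size
        self.offset = 0        # next free byte; an atomic counter on a GPU
        self.overflow = False  # sticky flag, set once any allocation fails

    def malloc(self, nbytes):
        """Return (failed, offset) for a request of nbytes.

        The offset is advanced even when the request does not fit, mirroring
        an unconditional atomic add; the failure is recorded in the flag.
        """
        start = self.offset
        self.offset += nbytes
        if self.offset > self.size:
            self.overflow = True
            return True, 0
        return False, start


def run_stage(mem, requests):
    """Skip a stage entirely if an earlier allocation already failed."""
    if mem.overflow:
        return []
    return [mem.malloc(n) for n in requests]


mem = Memory(64)
print(mem.malloc(32))  # (False, 0)
print(mem.malloc(16))  # (False, 32)
print(mem.malloc(32))  # (True, 0): only 16 bytes were left
print(run_stage(mem, [4]))  # []: the stage is skipped after overflow
```

The early return in `run_stage` is only safe when no barrier follows. Commit d9d518b248 later replaced such early exits in the shaders, because threads returning before a `barrier()` break the uniform control flow that GPU APIs require.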
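Commit a60c2dd3c8 describes a clip stack that keeps a small window in registers and spills to global memory when that window is exceeded. This lets the stack grow without bound. The Python sketch below is an illustrative model of that scheme under stated assumptions. A short local list stands in for registers, and a second list stands in for the global scratch buffer. The class name `SpillingStack` and its `window` parameter are hypothetical.

```python
class SpillingStack:
    """LIFO stack whose newest `window` entries live in a fast local list
    (standing in for registers); older entries spill to `scratch`
    (standing in for global memory), so the depth is unbounded."""

    def __init__(self, window):
        self.window = window
        self.local = []    # newest entries, at most `window` of them
        self.scratch = []  # spilled entries, oldest first

    def push(self, value):
        if len(self.local) == self.window:
            # Window full: move the oldest local entry out to scratch.
            self.scratch.append(self.local.pop(0))
        self.local.append(value)

    def pop(self):
        if self.local:
            return self.local.pop()
        # Local window drained: the next-newest entry is the last spilled one.
        return self.scratch.pop()

    def __len__(self):
        return len(self.local) + len(self.scratch)


stack = SpillingStack(window=2)
for clip in [1, 2, 3, 4]:
    stack.push(clip)
print(stack.scratch)                      # [1, 2]
print([stack.pop() for _ in range(4)])    # [4, 3, 2, 1]
```

Later entries in this log move away from manual spilling. Commit 22507dea0e pre-allocates scratch space in coarse.comp, and 7a2dc37d36 relies on scratch memory instead.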


55 commits