alex/vello

mirror of https://github.com/italicsjenga/vello.git synced 2025-01-11 04:51:32 +11:00

Author	SHA1	Message	Date
Raph Levien	05e81acebc	Basically get gradients working Separate out render context upload from renderer creation. Upload ramps to GPU buffer. Encode gradients to scene description. Fix a number of bugs in uploading and processing. This renders gradients in a test image, but has some shortcomings. For one, staging buffers need to be applied for a couple things (they're just host mapped for now). Also, the interaction between sRGB and premultiplied alpha isn't quite right. The size of the gradient ramp buffer is fixed and should be dynamic. And of course there's always more optimization to be done, including making the upload of gradient ramps more incremental, and probably hashing of the stops instead of the processed ramps.	2021-08-09 16:16:46 -07:00
Raph Levien	6f707c4c62	Start work on gradients WIP. Most of the GPU-side work should be done (though it's not tested end-to-end and it's certainly possible I missed something), but still needs work on encoding side.	2021-07-12 06:56:52 -07:00
Elias Naur	d9d518b248	avoid non-uniform barrier control flow when exhausting memory The compute shaders have a check for the successful completion of their preceding stage. However, consider a shader execution path like the following: void main() { if (mem_error != NO_ERROR) { return; } ... malloc(...); ... barrier(); ... } and shader execution that fails to allocate memory, thereby setting mem_error to ERR_MALLOC_FAILED in malloc before reaching the barrier. If another shader execution then begins execution, its mem_error check will make it return early and not reach the barrier. All GPU APIs require (dynamically) uniform control flow for barriers, and the above case may lead to GPU hangs in practice. Fix this issue by replacing the early exits with careful checks that don't interrupt barrier control flow. Unfortunately, it's harder to prove the soundness of the new checks, so this change also clears dynamic memory ranges in MEM_DEBUG mode when memory is exhausted. The result is that accessing memory after exhaustion triggers an error. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-04-20 10:15:29 +02:00
Elias Naur	3b4a72deb9	elements.comp: remove redundant assignment The assignment was made redundant by `eb86456f31`. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-04-20 10:14:04 +02:00
Elias Naur	eb86456f31	elements.comp: don't modify BeginClip bounding box The BeginClip and EndClip bounding boxes are absolute and must pairwise match. I mistakenly modified the BeginClip bounding box for stroked clips. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-04-08 19:56:37 +02:00
Elias Naur	eb37db1b05	replace per-element fill mode flags with a SetFillMode element Fixes #70 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-29 21:10:25 +02:00
Elias Naur	8db77e180e	support stroked fills for clips, images This change completes general support for stroked fills for clips and images. Annotated_size increases from 28 to 32, because of the linewidth field added to AnnoImage. Stroked image fills are presumably rare, and if memory pressure turns out to be a bottleneck, we could replace the linewidth field with a separate AnnoLinewidth element. Updates #70 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 16:43:33 +01:00
Elias Naur	44bff2726c	collapse FillCubic and StrokeCubic into Cubic with flags for fill mode Updates #70 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 12:50:12 +01:00
Elias Naur	df055563bd	collapse annotated Fill and Stroke to Color with fill mode flag No functionality changes, just different encoding. Updates #70 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 12:50:12 +01:00
Elias Naur	e9ff509ab9	use tag flags for fill vs stroke modes in scene elements Encode stroke vs fill as tag flags, thereby reducing the number of scene elements. Encoding change only, no functional changes. The previous Stroke and Fill commands are merged to one command, FillColor. The encoding to annotated element is divergent, which is fixed when annotated elements move to tag flags. Updates #70 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 12:50:12 +01:00
Elias Naur	a5b6bda941	add support for element flags to shaders Commit `9afa9b86b6` added Rust support for encoding flags into elements. This change adds support to shaders by introducing variant tag structs: struct VariantTag { uint tag; uint flags; } and returning them from Variant_tag functions. It also adds a flags argument to write functions for enum variants that include TagFlags. No functionality changes. Updates #70 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 12:50:12 +01:00
Elias Naur	903ab1fb59	implement FillImage command and sRGB support FillImage is like Fill, except that it takes its color from one or more image atlases. kernel4 uses a single image for non-Vulkan hosts, and the dynamic sized array of image descriptors on Vulkan. A previous version of this commit used textures. I think images are a better choice for piet-gpu, for several reasons: - Texture sampling, in particular textureGrad, is slow on lower spec devices such as Google Pixel. Texture sampling is particularly slow and difficult to implement for CPU fallbacks. - Texture sampling needs more parameters, in particular the full u,v transformation matrix, leading to a large increase in the command size. Since all commands use the same size, that memory penalty is paid by all scenes, not just scenes with textures. - It is unlikely that piet-gpu will support every kind of fill for every client, because each kind must be added to kernel4. With FillImage, a client will prepare the image(s) in separate shader stages, sampling and applying transformations and special effects as needed. Textures that align with the output pixel grid can be used directly, without pre-processing. Note that the pre-processing step can run concurrently with the piet-gpu pipeline; only the last stage, kernel4, needs the images. Pre-processing most likely uses fixed function vertex/fragment programs, which on some GPUs may run in parallel with piet-gpu's compute programs. While here, fix a few validation errors: - Explicitly enable EXT_descriptor_indexing, KHR_maintenance3, KHR_get_physical_device_properties2. - Specify a vkDescriptorSetVariableDescriptorCountAllocateInfo for vkAllocateDescriptorSets. Otherwise, variable image2D arrays won't work (but sampler2D arrays do, at least on my setup). Updates #38 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 12:50:12 +01:00
Elias Naur	07e07c7544	ensure consistent path segment transformation As described in #62, the non-deterministic scene monoid may result in slightly different transformations for path segments in an otherwise closed path. This change ensures consistent transformation across paths in three steps. First, absolute transformations computed by the scene monoid are stored along with path segments and annotated elements. Second, elements.comp no longer transforms path segments. Instead, each segment is stored untransformed along with a reference to its absolute transformation. Finally, path_coarse performs the transformation of path segments. Because all segments in a path share a single transformation reference, the inconsistency in #62 is avoided. Fixes #62 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 12:45:23 +01:00
Elias Naur	ad444f615c	elements.comp: use shared array of structs directly The NVIDIA shader compiler bug that forced splitting of the state struct into primitive types is now fixed. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 12:45:23 +01:00
Elias Naur	6a4e26ef2a	all: add optional memory checks Defining MEM_DEBUG in mem.h will add a size field to Alloc and enable bounds and alignment checks for every memory read and write. Notes: - Deriving an Alloc from Path.tiles is unsound, but it's more trouble to convert Path.tiles from TileRef to a variable sized Alloc. - elements.comp notes that "We should be able to use an array of structs but the NV shader compiler doesn't seem to like it". If that's still relevant, do the shared arrays of Allocs work? Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-02-15 16:07:45 +01:00
Elias Naur	4de67d9081	unify GPU memory management Merge all static and dynamic buffers to just one, "memory". Add a malloc function for dynamic allocations. Unify static allocation offsets into a "config" buffer containing scene setup (number of paths, number of path segments), as well as the memory offsets of the static allocations. Finally, set an overflow flag when an allocation fails, and make sure to exit shader execution as soon as that triggers. Add checks before beginning execution in case the client wants to run two or more shaders before checking the flag. The "state" buffer is left alone because it needs zero'ing and because it is accessed with the "volatile" keyword. Fixes #40 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-27 20:24:29 +01:00
Elias Naur	d21f2b68de	all: add SPDX license headers Fixes #53 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-11 18:24:35 +01:00
Elias Naur	580b63e558	elements.comp: tighten state size calculations The state header is only one word (flags), not two. Move the partition atomic counter to a separate field instead of state[0], simplifying state offset calculations. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-10 18:48:16 +01:00
Elias Naur	feeb459fa1	remove FillMask and FillMaskInv Obsoleted by BeginClip/EndClip. Updates #36 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-11-29 16:59:58 +01:00
Raph Levien	d14895b107	Continuing work on clips I realized there's a problem with encoding clip bboxes relative to the current transform (see #36 for a more detailed explanation), so this is changing it to absolute bboxes. This more or less gets clips working. There are optimization opportunities (all-clear and all-opaque mask tiles), and it doesn't deal with overflow of the blend stack, but it seems to basically work.	2020-11-20 18:25:27 -08:00
Elias Naur	b942e4035b	piet-gpu/shader: ensure forward progress in decoupled lookback The Vulkan and OpenGL specifications offer only weak forward progress guarantees, and in practice several mobile devices fail to complete the decoupled lookback spinloop without mitigation. This patch implements Raph's suggestion from the "Forward Progress" section from https://raphlinus.github.io/gpu/2020/04/30/prefix-sum.html Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-10-25 21:02:58 +01:00
Elias Naur	bc01180519	piet-gpu/shader: delete unused is_fill from elements.comp Delete debug code as well. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-10-25 20:59:54 +01:00
Elias Naur	8fab45544e	shader: implement clip paths Expand the final kernel4 stage to maintain a per-pixel mask. Introduce two new path elements, FillMask and FillMaskInv, to fill the mask. FillMask acts like Fill, while FillMaskInv fills the area outside the path. SVG clipPaths are then representable by a FillMaskInv(0.0) for every nested path, preceded by a FillMask(1.0) to clear the mask. The bounding box for FillMaskInv elements is the entire screen; tightening of the bounding box is left for future work. Note that a fullscreen bounding box is not hopelessly inefficient because completely filling a tile with a mask is just a single CmdSolidMask per tile. Fixes #30 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-10-09 13:20:26 +02:00
Elias Naur	55cfd472a5	shader: delete unused code Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-10-09 13:20:26 +02:00
Elias Naur	cfd57361c4	Fix linewidth transformations The transformation determinant is signed, but we're only interested in the absolute scale for transforming linewidths. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-08-24 16:12:18 +02:00
bhmerchant@gmail.com	d836d21d12	Clean up bits of right edge tracking logic left over from sort-middle.	2020-08-12 19:57:14 -07:00
Raph Levien	eaa1d261c3	Sederberg error metric Use proper math to compute number of subdivisions. This works but is not very satisfying, as it over-subdivides.	2020-06-09 18:43:49 -07:00
Raph Levien	b571e0d10c	Continue wiring up gpu-side flattening All segments given to path coarse raster are cubics. Flatten to quadratics. This works but the quality is not (yet) good.	2020-06-09 17:56:11 -07:00
Raph Levien	0f44bc8b78	Start GPU-side flattening This starts the work on GPU-side flattening by plumbing curves through.	2020-06-09 16:01:47 -07:00
Raph Levien	294f6fd1db	Experiment with new sorting scheme Path segments are unsorted, but other elements are using the same sort-middle approach as before. This is a checkpoint. At this point, there are unoptimized versions of tile init and coarse path raster, but it isn't wired up into a working pipeline. Also observing about a 3x performance regression in element processing, which needs to be investigated.	2020-06-03 09:29:25 -07:00
Raph Levien	55df3e6cc8	Fix linewidth math Coarse rasterization wasn't entirely taking line width into account. Also fix swizzle in matrix (not yet used). And fix missing End command in ptcl output (hasn't been a problem because buffer was cleared).	2020-05-24 09:43:41 -07:00
Raph Levien	a616b4d010	Rework right_edge computation in elements Trying to fit it into the fancy monad doesn't really work, so use a more straightforward approach to compute it from the aggregate. Also add yEdge logic (basically copying piet-metal). With a fix to ELEMENT_BINNING_RATIO (which I had simply gotten wrong), the example renders almost correctly, with small bounding box artifacts.	2020-05-21 10:00:56 -07:00
Raph Levien	076e6d600d	Progress on wiring up fills Write the right_edge to the binning output. More work on encoding the fill/stroke distinction and plumbing that through the pipeline. This is a bit unsatisfying because of the code duplication; having an extra fill/stroke bool might be better, but I want to avoid making the structs bigger (this could be solved by better packing in the struct encoding). Fills are plumbed through to the last stage. Backdrop is WIP.	2020-05-20 11:14:19 -07:00
Raph Levien	03da52cff8	Start implementing fills This should get the "right_edge" value for each segment plumbed through to the binning phase. It also needs to be plumbed to coarse raster and wired up there. Also consider this WIP because none of this logic has been tested yet.	2020-05-19 20:40:04 -07:00
Raph Levien	fe1790e724	Fix bbox bug Bounding boxes were being calculated as way too large in the element processing. Also wire up counters so winit binary is happy.	2020-05-16 21:20:25 -07:00
Raph Levien	93044b469b	Fix prefix sum First, add decoupled lookback. Second, fix problem with monoid that was overly aggressive in resetting the bbox.	2020-05-15 20:09:39 -07:00
Raph Levien	868b0320a4	Render strokes As of this point, it mostly renders stroke outlines for tiger. Some dropouts are because the scan in the elements pass doesn't do lookback yet, others are probably a bug.	2020-05-15 17:38:17 -07:00
Raph Levien	343e4c3075	Binning stage Adds a binning stage. This is a first draft, and a number of loose ends exist.	2020-05-12 17:34:15 -07:00
Raph Levien	736f883f66	Store annotated elements Apply transform to paths and annotate with computed linewidth and bounding box information, storing the result.	2020-05-12 12:13:39 -07:00
Raph Levien	9a8854ffab	Experimenting with sort-middle Starting a prototype that explores the sort-middle approach. This commit has a prefix sum pass computing state per element.	2020-05-12 08:54:09 -07:00

40 commits