alex/vello

mirror of https://github.com/italicsjenga/vello.git synced 2024-10-17 23:11:30 +11:00

Author	SHA1	Message	Date
Elias Naur	a5b6bda941	add support for element flags to shaders Commit `9afa9b86b6` added Rust support for encoding flags into elements. This change adds support to shaders by introducing variant tag structs: struct VariantTag { uint tag; uint flags; } and returning them from Variant_tag functions. It also adds a flags argument to write functions for enum variants that include TagFlags. No functionality changes. Updates #70 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 12:50:12 +01:00
Elias Naur	903ab1fb59	implement FillImage command and sRGB support FillImage is like Fill, except that it takes its color from one or more image atlases. kernel4 uses a single image for non-Vulkan hosts, and the dynamic sized array of image descriptors on Vulkan. A previous version of this commit used textures. I think images are a better choice for piet-gpu, for several reasons: - Texture sampling, in particular textureGrad, is slow on lower spec devices such as Google Pixel. Texture sampling is particularly slow and difficult to implement for CPU fallbacks. - Texture sampling needs more parameters, in particular the full u,v transformation matrix, leading to a large increase in the command size. Since all commands use the same size, that memory penalty is paid by all scenes, not just scenes with textures. - It is unlikely that piet-gpu will support every kind of fill for every client, because each kind must be added to kernel4. With FillImage, a client will prepare the image(s) in separate shader stages, sampling and applying transformations and special effects as needed. Textures that align with the output pixel grid can be used directly, without pre-processing. Note that the pre-processing step can run concurrently with the piet-gpu pipeline; only the last stage, kernel4, needs the images. Pre-processing most likely uses fixed function vertex/fragment programs, which on some GPUs may run in parallel with piet-gpu's compute programs. While here, fix a few validation errors: - Explicitly enable EXT_descriptor_indexing, KHR_maintenance3, KHR_get_physical_device_properties2. - Specify a vkDescriptorSetVariableDescriptorCountAllocateInfo for vkAllocateDescriptorSets. Otherwise, variable image2D arrays won't work (but sampler2D arrays do, at least on my setup). Updates #38 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 12:50:12 +01:00
Elias Naur	07e07c7544	ensure consistent path segment transformation As described in #62, the non-deterministic scene monoid may result in slightly different transformations for path segments in an otherwise closed path. This change ensures consistent transformation across paths in three steps. First, absolute transformations computed by the scene monoid are stored along with path segments and annotated elements. Second, elements.comp no longer transforms path segments. Instead, each segment is stored untransformed along with a reference to its absolute transformation. Finally, path_coarse performs the transformation of path segments. Because all segments in a path share a single transformation reference, the inconsistency in #62 is avoided. Fixes #62 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-03-19 12:45:23 +01:00
Elias Naur	b73eabf4eb	kernel4.comp: remove unused commands Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-02-24 15:32:24 +01:00
Elias Naur	6a4e26ef2a	all: add optional memory checks Defining MEM_DEBUG in mem.h will add a size field to Alloc and enable bounds and alignment checks for every memory read and write. Notes: - Deriving an Alloc from Path.tiles is unsound, but it's more trouble to convert Path.tiles from TileRef to a variable sized Alloc. - elements.comp notes that "We should be able to use an array of structs but the NV shader compiler doesn't seem to like it". If that's still relevant, do the shared arrays of Allocs work? Signed-off-by: Elias Naur <mail@eliasnaur.com>	2021-02-15 16:07:45 +01:00
Elias Naur	716517cc04	coarse,binning: organize bins into width_in_bins x height_in_bins The binning shader supports up to N_TILE bins. To efficiently cover wide or tall viewports, convert the rigid N_TILE_X x N_TILE_Y bin layout to a variable width_in_bins x height_in_bins layout. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-27 20:24:29 +01:00
Elias Naur	c4f5a69a0d	implement variable output sizing Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-27 20:24:29 +01:00
Elias Naur	c67696714b	coarse.comp: don't write Cmd_End to tiles out of bounds If WIDTH_IN_TILES or HEIGHT_IN_TILES are not divisible by N_TILE_X or N_TILE_Y respectively, the previously unconditional Cmd_End_write would write out of bounds. Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-27 20:24:29 +01:00
Elias Naur	4de67d9081	unify GPU memory management Merge all static and dynamic buffers to just one, "memory". Add a malloc function for dynamic allocations. Unify static allocation offsets into a "config" buffer containing scene setup (number of paths, number of path segments), as well as the memory offsets of the static allocations. Finally, set an overflow flag when an allocation fails, and make sure to exit shader execution as soon as that triggers. Add checks before beginning execution in case the client wants to run two or more shaders before checking the flag. The "state" buffer is left alone because it needs zeroing and because it is accessed with the "volatile" keyword. Fixes #40 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-12-27 20:24:29 +01:00
Raph Levien	b8ea1e35cf	Merge branch 'master' into clip_opt	2020-11-29 17:07:46 -08:00
Elias Naur	feeb459fa1	remove FillMask and FillMaskInv Obsoleted by BeginClip/EndClip. Updates #36 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-11-29 16:59:58 +01:00
Raph Levien	4138f8a516	Optimize clips Optimize tiles with clip masks that are all-zero or all-one. Part of #36	2020-11-27 09:30:35 -08:00
Raph Levien	b928c7a3ed	Restore FillMaskInv logic	2020-11-21 10:47:28 -08:00
Raph Levien	d14895b107	Continuing work on clips I realized there's a problem with encoding clip bboxes relative to the current transform (see #36 for a more detailed explanation), so this is changing it to absolute bboxes. This more or less gets clips working. There are optimization opportunities (all-clear and all-opaque mask tiles), and it doesn't deal with overflow of the blend stack, but it seems to basically work.	2020-11-20 18:25:27 -08:00
Raph Levien	f53d00e6bc	Add transforms and state stack Actually handle transforms in RenderCtx (was implemented in renderer but not actually plumbed through). This also requires maintaining a state stack, which will also be required for clipping. This PR also starts work on encoding clipping, including tracking bounding boxes. WIP, none of this is tested yet.	2020-11-20 18:25:27 -08:00
Elias Naur	8fab45544e	shader: implement clip paths Expand the final kernel4 stage to maintain a per-pixel mask. Introduce two new path elements, FillMask and FillMaskInv, to fill the mask. FillMask acts like Fill, while FillMaskInv fills the area outside the path. SVG clipPaths are then representable by a FillMaskInv(0.0) for every nested path, preceded by a FillMask(1.0) to clear the mask. The bounding box for FillMaskInv elements is the entire screen; tightening of the bounding box is left for future work. Note that a fullscreen bounding box is not hopelessly inefficient because completely filling a tile with a mask is just a single CmdSolidMask per tile. Fixes #30 Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-10-09 13:20:26 +02:00
Elias Naur	fa9bf0dc2b	piet-gpu-types: remove unused ptcl types Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-09-27 18:30:33 +02:00
Elias Naur	dceb0f9412	piet-gpu-types: remove unused annotated types Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-09-21 10:55:58 +02:00
Elias Naur	326f7f0d03	shader: delete more unused code and variables Signed-off-by: Elias Naur <mail@eliasnaur.com>	2020-09-13 13:03:56 +02:00
msiglreith	1cc5c7ac0d	Shader documentation and a slight cleanup	2020-06-28 15:37:27 +02:00
Raph Levien	af0a1af8e1	Make fills work The backdrop propagation is slow but it does work.	2020-06-05 22:40:44 -07:00
Raph Levien	877da4a98e	Faster coarse raster Store a lot more tile context in shared memory and do the work from that.	2020-06-04 10:39:08 -07:00
Raph Levien	e1aa9b2f5d	Remove bbox guard It's probably not necessary. This development is still a work in progress.	2020-06-03 20:59:19 -07:00
Raph Levien	7f4a6523a8	Filter sparse tiles Have a more-parallel read of the tile structures based on bbox coverage, and only set the bit when the tile isn't empty. This is a speedup, but there is some duplicated work and it is possible to improve it further.	2020-06-03 17:55:42 -07:00
Raph Levien	70a9c17e23	Continue building out pipeline Plumbs the new tiling scheme to k4. This works (stroke only) but still has some performance issues.	2020-06-03 12:21:09 -07:00
Raph Levien	294f6fd1db	Experiment with new sorting scheme Path segments are unsorted, but other elements are using the same sort-middle approach as before. This is a checkpoint. At this point, there are unoptimized versions of tile init and coarse path raster, but it isn't wired up into a working pipeline. Also observing about a 3x performance regression in element processing, which needs to be investigated.	2020-06-03 09:29:25 -07:00
Raph Levien	2c185c3718	Simplify ringbuf We don't really need a ring buffer, as we only read what we're actually going to process.	2020-05-30 21:20:48 -07:00
Raph Levien	192ddc5eab	Parallel merge The fancy stuff :)	2020-05-30 21:11:13 -07:00
Raph Levien	121f29fef6	Merge one segment at a time No parallelism yet, but seems to improve performance.	2020-05-30 08:51:52 -07:00
Raph Levien	894ef156e1	Change to new merge strategy in binning WIP We get "device lost" on NV :/	2020-05-29 20:06:16 -07:00
Raph Levien	e16f68d89d	Fix buffer overrun Was a little too eager zeroing out sh_is_segment[]	2020-05-26 22:47:28 -07:00
Raph Levien	dbcffb10db	Reinstate fills Add fills back in.	2020-05-25 15:27:03 -07:00
Raph Levien	3d422d9243	Allocate segment chunks in slabs Another speedup might be to special-case when the number of chunks in a stroke or fill command is 1, then the segment header doesn't need allocation and memory traffic is reduced. But right now we'll avoid the complexity.	2020-05-25 12:22:29 -07:00
Raph Levien	8eaf49a04d	Checkpoint parallel output Parallel segment output seems to be working for strokes.	2020-05-25 12:14:18 -07:00
Raph Levien	24b3def0a1	Start work on parallel segment output Output of segments is in parallel. Getting closer, some problems with chaining but mostly correct.	2020-05-24 21:02:19 -07:00
Raph Levien	55df3e6cc8	Fix linewidth math Coarse rasterization wasn't entirely taking line width into account. Also fix swizzle in matrix (not yet used). And fix missing End command in ptcl output (hasn't been a problem because buffer was cleared).	2020-05-24 09:43:41 -07:00
Raph Levien	7d040dff37	Bit magic for backdrop accumulation Use bit counting rather than iterating backdrop increments one by one. A nice if not huge speedup.	2020-05-22 07:30:32 -07:00
Raph Levien	a616b4d010	Rework right_edge computation in elements Trying to fit it into the fancy monad doesn't really work, so use a more straightforward approach to compute it from the aggregate. Also add yEdge logic (basically copying piet-metal). With a fix to ELEMENT_BINNING_RATIO (which I had simply gotten wrong), the example renders almost correctly, with small bounding box artifacts.	2020-05-21 10:00:56 -07:00
Raph Levien	ed4ed30708	Adding backdrop logic Calculation of backdrops kinda works but with issues, so WIP.	2020-05-20 16:03:27 -07:00
Raph Levien	076e6d600d	Progress on wiring up fills Write the right_edge to the binning output. More work on encoding the fill/stroke distinction and plumbing that through the pipeline. This is a bit unsatisfying because of the code duplication; having an extra fill/stroke bool might be better, but I want to avoid making the structs bigger (this could be solved by better packing in the struct encoding). Fills are plumbed through to the last stage. Backdrop is WIP.	2020-05-20 11:14:19 -07:00
Raph Levien	0ed759814b	Smarter line segment coverage Compute tile coverage of segments using optimized algorithm. This algorithm does a bit of setup, then uses an efficient formula to compute the span per scan-line.	2020-05-19 09:26:44 -07:00
Raph Levien	9bb06ec340	Correct rendering (on Intel) Handle multiple passes in coarse raster. Doesn't work on NV, WIP to find out why.	2020-05-16 06:43:31 -07:00
Raph Levien	868b0320a4	Render strokes As of this point, it mostly renders stroke outlines for tiger. Some dropouts are because the scan in the elements pass doesn't do lookback yet, others are probably a bug.	2020-05-15 17:38:17 -07:00
Raph Levien	3a6428238b	Start writing tiles This is the first checkpoint where it actually runs a pipeline end to end, though it's far from accurate.	2020-05-15 14:31:52 -07:00
Raph Levien	06cad48dca	Start output stage in coarse pass Still very much WIP but it's progress.	2020-05-14 17:27:18 -07:00
Raph Levien	cc89d0e285	Starting coarse rasterizer Working down the pipeline. WIP	2020-05-13 21:39:47 -07:00
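
Commit `4de67d9081` above describes a single "memory" buffer with a malloc function and an overflow flag that is set when an allocation fails. The following is only a CPU-side sketch of that idea in Rust, not the shader code from the repository: the `Memory` type, its fields, and the word-based sizes are hypothetical names chosen for illustration.

```rust
/// Hypothetical CPU-side model of one shared "memory" buffer.
/// All names here are assumptions, not identifiers from the repository.
struct Memory {
    /// Backing storage, counted in 32-bit words.
    data: Vec<u32>,
    /// Offset (in words) of the next free word.
    next: usize,
    /// Set once any allocation fails; later allocations are refused.
    overflow: bool,
}

impl Memory {
    fn new(words: usize) -> Self {
        Memory { data: vec![0; words], next: 0, overflow: false }
    }

    /// Bump-allocates `words` words and returns the start offset,
    /// or sets the overflow flag and returns None if the buffer is too small.
    fn malloc(&mut self, words: usize) -> Option<usize> {
        // Mirror the "check before beginning execution" idea:
        // once the flag is set, do no further work.
        if self.overflow {
            return None;
        }
        match self.next.checked_add(words) {
            Some(end) if end <= self.data.len() => {
                let offset = self.next;
                self.next = end;
                Some(offset)
            }
            _ => {
                self.overflow = true;
                None
            }
        }
    }
}
```

In this sketch a caller would check `overflow` after a batch of work and retry with a larger buffer; whether the real client does so is not stated in the log.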

46 commits
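
Commits `716517cc04` and `c67696714b` above mention a variable `width_in_bins x height_in_bins` layout and an out-of-bounds write when the tile counts are not divisible by `N_TILE_X` or `N_TILE_Y`. The arithmetic can be sketched roughly as follows. This is an illustrative Rust sketch, not the shader source: the value 16 for both constants and the function names `bin_grid` and `tile_in_bounds` are assumptions.

```rust
// Assumed bin dimensions in tiles; the real values are not given in the log.
const N_TILE_X: u32 = 16;
const N_TILE_Y: u32 = 16;

/// Ceiling division for a positive divisor.
fn div_round_up(n: u32, d: u32) -> u32 {
    (n + d - 1) / d
}

/// Number of bins needed to cover a viewport of the given size in tiles.
fn bin_grid(width_in_tiles: u32, height_in_tiles: u32) -> (u32, u32) {
    (
        div_round_up(width_in_tiles, N_TILE_X),
        div_round_up(height_in_tiles, N_TILE_Y),
    )
}

/// True if the tile at (local_x, local_y) inside bin (bin_x, bin_y)
/// lies inside the viewport. Bins on the right and bottom edges may
/// extend past it, so a write must be guarded by a check like this.
fn tile_in_bounds(
    bin_x: u32,
    bin_y: u32,
    local_x: u32,
    local_y: u32,
    width_in_tiles: u32,
    height_in_tiles: u32,
) -> bool {
    let tile_x = bin_x * N_TILE_X + local_x;
    let tile_y = bin_y * N_TILE_Y + local_y;
    tile_x < width_in_tiles && tile_y < height_in_tiles
}
```

With these assumed constants, a 40 x 20 tile viewport needs a 3 x 2 bin grid, and the last column of bins is only half covered, which is the case the guard exists for.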