Commit graph

181 commits

Elias Naur 5db427c549 kernel4: compute and output alpha
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 19:51:49 +02:00
Elias Naur ee4429a26f kernel4: separate area from alpha in clip stack
This change prepares for kernel4 to output alpha. No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 19:51:42 +02:00
Elias Naur 22507dea0e pre-allocate kernel4 scratch space in coarse.comp
coarse.comp knows the maximum stack depth, and can pre-allocate scratch
space for kernel4.comp. Kernel4 no longer contains allocations or
control barriers.

The invocation-local blend stack is gone as well; it didn't seem to make
any difference in performance to always use global memory for pushing
and popping.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 18:48:19 +02:00
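
As a back-of-the-envelope sketch of what that pre-allocation costs, assuming
one 32-bit blend value per pixel per clip level (the tile size and names here
are illustrative, not the actual constants):

fn kernel4_scratch_bytes(max_clip_depth: u32) -> u32 {
    const TILE_WIDTH_PX: u32 = 16; // illustrative tile size
    const TILE_HEIGHT_PX: u32 = 16;
    max_clip_depth * TILE_WIDTH_PX * TILE_HEIGHT_PX * 4
}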
Elias Naur e6b535d942 coarse.comp: extract area commands into function
No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-30 19:56:09 +02:00
Elias Naur d916a9e2c4 backdrop.comp: support stroked Annotated_Image and Annotated_BeginClip
Commit 8db77e180e added support for
strokes to FillImage and BeginClip, but missed backdrop.comp.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-30 19:33:25 +02:00
Elias Naur 678bfedfca kernel4: assume colors in alpha-premultiplied sRGB format
See http://ssp.impulsetrain.com/gamma-premult.html for a description of
the format.

Pre-multiplied alpha only matters for translucent objects; draw a few
such shapes in the test render.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-29 21:17:01 +02:00
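
For reference, a minimal sketch of the over operator on premultiplied colors,
assuming both operands have already been converted to linear space (the helper
is illustrative, not kernel4's actual code):

// Composite src over dst; both are alpha-premultiplied RGBA.
// With premultiplied colors, over needs no division:
//   out = src + (1 - src.alpha) * dst
fn over(src: [f32; 4], dst: [f32; 4]) -> [f32; 4] {
    let k = 1.0 - src[3];
    [
        src[0] + k * dst[0],
        src[1] + k * dst[1],
        src[2] + k * dst[2],
        src[3] + k * dst[3],
    ]
}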
Elias Naur eb37db1b05 replace per-element fill mode flags with a SetFillMode element
Fixes #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-29 21:10:25 +02:00
Elias Naur bb61f875dc kernel4: remove dead code left over from previous clipping approach
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-29 21:10:17 +02:00
Tatsuyuki Ishi 4864a7fe0f Create chunks over the x axis in addition to the y axis
This allows more coalescing with image loads/stores, since all of our images are stored with a tiled layout.
2021-03-23 20:54:49 +09:00
Elias Naur f0127812eb tightly pack fine rasterizer commands
Reclaims the space wasted by splitting fill mode commands from fill
commands.

For example, a CmdStroke + CmdColor pair uses an extra tag word compared
to the former combined CmdStroke. This change shaves off that one word.

In the future, we can pack several command tags into one tag word,
saving even more space.

Fixes #66

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
Elias Naur 8db77e180e support stroked fills for clips, images
This change completes general support for stroked fills for clips and
images.

Annotated_size increases from 28 to 32, because of the linewidth field
added to AnnoImage. Stroked image fills are presumably rare, and if
memory pressure turns out to be a bottleneck, we could replace the
linewidth field with separate AnnoLinewidth elements.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
Elias Naur db59b5d570 coarse,kernel4: make stroke, (non-zero) fill, solid separate commands
Before this change, every command (FillColor, FillImage, BeginClip)
had (or would need) stroke, (non-zero) fill and solid variants.

This change adds a command for each fill mode and its parameters,
reducing code duplication and adding support for stroked FillImage and
BeginClip as a side effect.

The rest of the pipeline doesn't yet support stroked FillImage and
BeginClip. That's a follow-up change.

Since each command includes a tag, this change adds an extra word for
each fill and stroke. That waste is also addressed in a follow-up.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
Elias Naur 44bff2726c collapse FillCubic and StrokeCubic into Cubic with flags for fill mode
Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur df055563bd collapse annotated Fill and Stroke to Color with fill mode flag
No functionality changes, just different encoding.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur e9ff509ab9 use tag flags for fill vs stroke modes in scene elements
Encode stroke vs fill as tag flags, thereby reducing the number of scene
elements. Encoding change only, no functional changes.

The previous Stroke and Fill commands are merged into one command,
FillColor. The encoding to annotated elements is divergent; that is
fixed when annotated elements move to tag flags.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur a5b6bda941 add support for element flags to shaders
Commit 9afa9b86b6 added Rust support for
encoding flags into elements. This change adds support to shaders by
introducing variant tag structs:

struct VariantTag {
    uint tag;
    uint flags;
};

and returning them from Variant_tag functions.

It also adds a flags argument to write functions for enum variants that
include TagFlags.

No functionality changes.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
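
For illustration, a Rust sketch of packing a tag and its flags into a single
word; the bit layout here is hypothetical, not necessarily the one piet-gpu
uses:

// Hypothetical layout: tag in the low 16 bits, flags in the high 16 bits.
fn pack_tag(tag: u32, flags: u32) -> u32 {
    (flags << 16) | (tag & 0xffff)
}

fn unpack_tag(word: u32) -> (u32, u32) {
    (word & 0xffff, word >> 16) // (tag, flags), mirroring VariantTag
}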
Elias Naur 903ab1fb59 implement FillImage command and sRGB support
FillImage is like Fill, except that it takes its color from one or
more image atlases.

kernel4 uses a single image on non-Vulkan hosts, and a dynamically sized
array of image descriptors on Vulkan.

A previous version of this commit used textures. I think images are a better
choice for piet-gpu, for several reasons:

- Texture sampling, in particular textureGrad, is slow on lower spec devices
  such as Google Pixel. Texture sampling is particularly slow and difficult to
  implement for CPU fallbacks.
- Texture sampling needs more parameters, in particular the full u,v
  transformation matrix, leading to a large increase in the command size.
  Since all commands use the same size, that memory penalty is paid by all
  scenes, not just scenes with textures.
- It is unlikely that piet-gpu will support every kind of fill for every
  client, because each kind must be added to kernel4.

With FillImage, a client will prepare the image(s) in separate shader stages,
sampling and applying transformations and special effects as needed. Textures
that align with the output pixel grid can be used directly, without
pre-processing.

Note that the pre-processing step can run concurrently with the piet-gpu
pipeline; only the last stage, kernel4, needs the images.

Pre-processing most likely uses fixed function vertex/fragment programs,
which on some GPUs may run in parallel with piet-gpu's compute programs.

While here, fix a few validation errors:
- Explicitly enable EXT_descriptor_indexing, KHR_maintenance3,
  KHR_get_physical_device_properties2.
- Specify a VkDescriptorSetVariableDescriptorCountAllocateInfo for
  vkAllocateDescriptorSets. Otherwise, variable image2D arrays won't work
  (but sampler2D arrays do, at least on my setup).

Updates #38

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur 07e07c7544 ensure consistent path segment transformation
As described in #62, the non-deterministic scene monoid may result in
slightly different transformations for path segments in an otherwise
closed path.

This change ensures consistent transformation across paths in three steps.

First, the absolute transformations computed by the scene monoid are
stored along with path segments and annotated elements.

Second, elements.comp no longer transforms path segments. Instead, each
segment is stored untransformed along with a reference to its absolute
transformation.

Finally, path_coarse performs the transformation of path segments.
Because all segments in a path share a single transformation reference,
the inconsistency in #62 is avoided.

Fixes #62

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
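
A sketch of the decoupled representation (field names are illustrative, not
the actual encoding): every segment of a path carries the same transform
reference, so path_coarse applies one transform consistently:

struct PathCubic {
    p0: [f32; 2], p1: [f32; 2], p2: [f32; 2], p3: [f32; 2], // untransformed
    trans_ix: u32, // reference to the absolute transform for this path
}

// Affine transform [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f.
fn apply(t: &[f32; 6], p: [f32; 2]) -> [f32; 2] {
    [t[0] * p[0] + t[2] * p[1] + t[4], t[1] * p[0] + t[3] * p[1] + t[5]]
}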
Elias Naur ad444f615c elements.comp: use shared array of structs directly
The NVIDIA shader compiler bug that forced splitting of the state struct
into primitive types is now fixed.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
Elias Naur 79d722df48 remove unused commands from pathseg
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
Elias Naur b73eabf4eb kernel4.comp: remove unused commands
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-02-24 15:32:24 +01:00
Elias Naur 6a4e26ef2a all: add optional memory checks
Defining MEM_DEBUG in mem.h will add a size field to Alloc and enable
bounds and alignment checks for every memory read and write.

Notes:
- Deriving an Alloc from Path.tiles is unsound, but it's more trouble to
  convert Path.tiles from TileRef to a variable-sized Alloc.
- elements.comp notes that "We should be able to use an array of structs but
  the NV shader compiler doesn't seem to like it". If that's still relevant,
  do the shared arrays of Allocs work?

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-02-15 16:07:45 +01:00
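
A host-side model of the kind of check MEM_DEBUG enables; illustrative, not
the shader's exact code:

struct Alloc {
    offset: u32, // byte offset of the allocation in the memory buffer
    size: u32,   // tracked only when MEM_DEBUG is defined
}

// A one-word read at absolute byte address addr passes the check if it is
// aligned and falls inside the allocation.
fn check_read(a: &Alloc, addr: u32) -> bool {
    addr % 4 == 0 && addr >= a.offset && addr + 4 <= a.offset + a.size
}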
Elias Naur ee67a0a515 kernel4: simplify a tiny bit
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur 716517cc04 coarse,binning: organize bins into width_in_bins x height_in_bins
The binning shader supports up to N_TILE bins. To efficiently cover wide or
tall viewports, convert the rigid N_TILE_X x N_TILE_Y bin layout to a variable
width_in_bins x height_in_bins layout.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
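
A sketch of the layout computation, with illustrative constants:

const N_TILE_X: u32 = 16; // tiles per bin in x (illustrative value)
const N_TILE_Y: u32 = 16; // tiles per bin in y

fn bin_layout(width_in_tiles: u32, height_in_tiles: u32) -> (u32, u32) {
    // Round up so the bins cover the whole viewport.
    let width_in_bins = (width_in_tiles + N_TILE_X - 1) / N_TILE_X;
    let height_in_bins = (height_in_tiles + N_TILE_Y - 1) / N_TILE_Y;
    (width_in_bins, height_in_bins)
}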
Elias Naur ef4ec772ad backdrop: repair unsound optimization
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur 8b62022749 backdrop: avoid a (benign) zero-sized read
Found with the MEM_DEBUG support added in a later change.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur c4f5a69a0d implement variable output sizing
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur c67696714b coarse.comp: don't write Cmd_End to tiles out of bounds
If WIDTH_IN_TILES or HEIGHT_IN_TILES are not divisible by N_TILE_X or N_TILE_Y
respectively, the previously unconditional Cmd_End_write would write out of
bounds.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur 4de67d9081 unify GPU memory management
Merge all static and dynamic buffers into just one, "memory". Add a malloc
function for dynamic allocations.

Unify static allocation offsets into a "config" buffer containing scene setup
(number of paths, number of path segments), as well as the memory offsets of
the static allocations.

Finally, set an overflow flag when an allocation fails, and make sure to exit
shader execution as soon as that triggers. Add checks before beginning
execution in case the client wants to run two or more shaders before checking
the flag.

The "state" buffer is left alone because it needs zeroing and because it is
accessed with the "volatile" keyword.

Fixes #40

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
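
A host-side sketch of the bump-allocator idea, including the overflow flag;
names are illustrative and the real malloc lives in the shaders:

use std::sync::atomic::{AtomicU32, Ordering};

struct GpuMemory {
    mem_offset: AtomicU32, // next free byte in the unified "memory" buffer
    mem_size: u32,
    overflow: AtomicU32, // set once any allocation has failed
}

impl GpuMemory {
    fn malloc(&self, size: u32) -> Option<u32> {
        let offset = self.mem_offset.fetch_add(size, Ordering::Relaxed);
        if offset + size > self.mem_size {
            self.overflow.store(1, Ordering::Relaxed);
            return None; // callers exit shader execution as soon as possible
        }
        Some(offset)
    }
}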
Elias Naur a2a2d12c5d path_coarse.comp: fix intersection inconsistencies, take 2
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.

The first case is that top intersections with the very first row would
fail the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z

In particular, y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check that does work
and is just as consistent. Add a similar check, min(p0.x, p1.x) < tile_x0, for
deciding when to clip the segment to the left edge (but keep the consistent
xray check for deciding left edge *intersections*).

The second case is that the tracking of left intersections in the
[xray, next_xray] range of tiles may fail when next_xray is forced to
last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Finally, ensure that xx0 and xx1 don't overflow the allocated number of tiles
by clamping them *after* setting them. Adjust xx0 to include xray, just as xx1
is adjusted; I haven't seen corruption without it, but it's not obvious xx0
always includes xray.

While here, replace a "+=" on a guaranteed zero value with just "=".

Updates #23

[0] 29cfb8b63e

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur d21f2b68de all: add SPDX license headers
Fixes #53

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-11 18:24:35 +01:00
Elias Naur 5c04e4882b remove unused tilegroup.h and extra spaces from kernel4.comp
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-11 15:00:58 +01:00
Elias Naur 580b63e558 elements.comp: tighten state size calculations
The state header is only one word (flags), not two.

Move the partition atomic counter to a separate field instead of state[0],
simplifying state offset calculations.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-10 18:48:16 +01:00
Elias Naur 1c6ca7e5fb remove unused BinChunk type
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-08 00:45:08 +01:00
Raph Levien 634530fb91 Merge branch 'master' into image_work 2020-12-02 11:58:45 -08:00
Raph Levien 3906f348fd Merge pull request #47 from linebender/clip_opt
Optimize clips
2020-12-02 11:57:14 -08:00
Elias Naur 29cfb8b63e eliminate inconsistent line intersections from path_coarse.comp
The finite precision of floating point computations can lead the coarse
renderer into inconsistent tile intersections, which imply impossible line
segments such as lines with gaps or double intersections. The winding number
algorithm is sensitive to these errors, which show up as incorrectly filled
paths.

This change forces all intersections to be consistent.
First, the floating point top edge intersection test is removed; top edge intersections are
completely determined by left edge intersections.
Then, left edge intersections are inserted from the tile with the last top edge
intersection. The next top edge is then fixed to be the last tile with a left
edge intersection.

More details in the patch comments.

Fixes #23

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-01 18:35:29 +01:00
Elias Naur 19f4d9fa95 change tile segment representation to (origin, vector)
Eliminates the precision loss of the subtraction in the sign(end.x - start.x)
expression in kernel4. That's important for the next change that avoids
inconsistent line intersections in path_coarse.

Updates #23

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-01 18:34:40 +01:00
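
A sketch of the new representation and why it helps (names are illustrative):

struct TileSeg {
    origin: [f32; 2],
    vector: [f32; 2], // direction stored explicitly
}

impl TileSeg {
    // The old (start, end) form required sign(end.x - start.x) in kernel4,
    // and that subtraction could lose the sign to rounding. Here the sign
    // is read straight off the stored vector.
    fn x_sign(&self) -> f32 {
        if self.vector[0] >= 0.0 { 1.0 } else { -1.0 }
    }
}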
Elias Naur 2068171f96 path_coarse.comp: tighten variable scopes, delete unused variables
No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-01 18:01:04 +01:00
Raph Levien 97dcb5122e Merge branch 'master' into image_work 2020-11-29 17:09:48 -08:00
Raph Levien b8ea1e35cf Merge branch 'master' into clip_opt 2020-11-29 17:07:46 -08:00
Elias Naur feeb459fa1 remove FillMask and FillMaskInv
Obsoleted by BeginClip/EndClip.

Updates #36

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-11-29 16:59:58 +01:00
Elias Naur bd450ef461 piet-gpu-types: remove unused Segment and SegChunk types
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-11-29 16:51:35 +01:00
Raph Levien 4138f8a516 Optimize clips
Optimize tiles with clip masks that are all-zero or all-one.

Part of #36
2020-11-27 09:30:35 -08:00
Raph Levien facc9e0982 Use sampler for texture images
Provide images to fine rasterization kernel as readonly textures with a
sampler, rather than storage images. That lets us use the GPU's hardware
for sampling, which should be considerably more efficient.

There are a bunch of parameters that are hardcoded, but it does seem to
work.
2020-11-25 18:05:10 -08:00
Raph Levien 047a0830d1 Towards wiring up images to k4
This patch passes a dynamically sized array of textures to the fine
rasterizer.

A bunch of the low level Vulkan stuff is done, but only enough of the
shaders and encoders to do minimal testing. We'll want to switch from
storage images to sampled images, track the actual array of textures
during encoding, use that to build the descriptor set (which will need
to be more dynamic), and of course run image elements through the
pipeline.

Progress towards #38
2020-11-24 22:11:38 -08:00
Raph Levien a60c2dd3c8 Scratch buffer for clip stack
We keep a small window of the clip stack in registers in the fine
rasterization kernel, and when that window is exceeded, spill to global
memory, so the clip stack can be unbounded.
2020-11-22 18:14:09 -08:00
Raph Levien b928c7a3ed Restore FillMaskInv logic 2020-11-21 10:47:28 -08:00
Raph Levien 13134e7cb3 Restore FillMask logic
Per discussion, don't remove FillMask until we get unbounded clip stacks.
2020-11-21 07:00:03 -08:00
Raph Levien d14895b107 Continuing work on clips
I realized there's a problem with encoding clip bboxes relative to the
current transform (see #36 for a more detailed explanation), so this is
changing it to absolute bboxes.

This more or less gets clips working. There are optimization
opportunities (all-clear and all-opaque mask tiles), and it doesn't deal
with overflow of the blend stack, but it seems to basically work.
2020-11-20 18:25:27 -08:00
Raph Levien f53d00e6bc Add transforms and state stack
Actually handle transforms in RenderCtx (was implemented in renderer but
not actually plumbed through). This also requires maintaining a state
stack, which will also be required for clipping.

This PR also starts work on encoding clipping, including tracking
bounding boxes.

WIP, none of this is tested yet.
2020-11-20 18:25:27 -08:00
Elias Naur b942e4035b piet-gpu/shader: ensure forward progress in decoupled lookback
The Vulkan and OpenGL specifications offer only weak forward progress guarantees, and
in practice several mobile devices fail to complete the decoupled lookback
spinloop without mitigation.

This patch implements Raph's suggestion from the "Forward Progress"
section from

https://raphlinus.github.io/gpu/2020/04/30/prefix-sum.html

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-25 21:02:58 +01:00
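
A heavily simplified, single-threaded sketch of one mitigation along those
lines: bound the spin, then fall back to recomputing the predecessor's
contribution locally. Structure and names are illustrative; the actual shader
logic differs:

use std::sync::atomic::{AtomicU32, Ordering};

const MAX_SPINS: u32 = 1024; // bound is illustrative

fn lookback(
    pred_flag: &AtomicU32,
    read_published: impl Fn() -> u32,
    recompute: impl Fn() -> u32,
) -> u32 {
    for _ in 0..MAX_SPINS {
        if pred_flag.load(Ordering::Acquire) != 0 {
            return read_published(); // predecessor published in time
        }
    }
    recompute() // forward progress no longer depends on the other workgroup
}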
Elias Naur bc01180519 piet-gpu/shader: delete unused is_fill from elements.comp
Delete debug code as well.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-25 20:59:54 +01:00
Elias Naur 8fab45544e shader: implement clip paths
Expand the final kernel4 stage to maintain a per-pixel mask.

Introduce two new path elements, FillMask and FillMaskInv, to fill
the mask. FillMask acts like Fill, while FillMaskInv fills the area
outside the path.

SVG clipPaths are then representable by a FillMaskInv(0.0) for every nested
path, preceded by a FillMask(1.0) to clear the mask.

The bounding box for FillMaskInv elements is the entire screen; tightening of
the bounding box is left for future work. Note that a fullscreen bounding
box is not hopelessly inefficient because completely filling a tile with
a mask is just a single CmdSolidMask per tile.

Fixes #30

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-09 13:20:26 +02:00
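
One plausible per-pixel reading of the two commands, as a sketch rather than
the shader's exact code; coverage is the path's coverage of the pixel in
[0, 1]:

fn fill_mask(mask: f32, coverage: f32, value: f32) -> f32 {
    // Write value inside the path, keep the existing mask outside.
    mask * (1.0 - coverage) + value * coverage
}

fn fill_mask_inv(mask: f32, coverage: f32, value: f32) -> f32 {
    // Write value outside the path, keep the existing mask inside.
    value * (1.0 - coverage) + mask * coverage
}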
Elias Naur 55cfd472a5 shader: delete unused code
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-09 13:20:26 +02:00
Elias Naur 9be0faba6f piet-gpu-types: remove unused scene elements
Delete image compute shader as well; it is unused.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-27 18:57:53 +02:00
Elias Naur fa9bf0dc2b piet-gpu-types: remove unused ptcl types
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-27 18:30:33 +02:00
Elias Naur dceb0f9412 piet-gpu-types: remove unused annotated types
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-21 10:55:58 +02:00
Elias Naur ac3ac3ddff shader: introduce a crude setting for adjusting the maximum workgroup size
Both the Vulkan and OpenGL ES specs allow implementations to limit workgroups
to 128 threads. Add a LG_WG_FACTOR setting for easy switching between 128 and
256 threads, with 256 kept as the default setting.

Manually tested that LG_WG_FACTOR = 0 (128 threads) works as expected.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-13 13:04:13 +02:00
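
A sketch of the scaling scheme the setting implies, expressed in Rust for
illustration:

const LG_WG_FACTOR: u32 = 1; // 1 selects 256 threads (default), 0 selects 128
const WG_SIZE: u32 = 128 << LG_WG_FACTOR;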
Elias Naur 326f7f0d03 shader: delete more unused code and variables
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-13 13:03:56 +02:00
Elias Naur 05636995dd compute IMAGE_WIDTH and IMAGE_HEIGHT; remove dead code from setup.h
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-08-29 15:03:40 +02:00
Elias Naur de4f963ba0 shader: remove dead code
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-08-28 17:37:46 +02:00
Elias Naur cfd57361c4 Fix linewidth transformations
The transformation determinant is signed, but we're only interested in
the absolute scale for transforming linewidths.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-08-24 16:12:18 +02:00
bhmerchant@gmail.com d836d21d12 Clean up bits of right edge tracking logic left over from sort-middle. 2020-08-12 19:57:14 -07:00
msiglreith 1cc5c7ac0d Shader documentation and a slight cleanup 2020-06-28 15:37:27 +02:00
msiglreith eed71721eb Update winit example 2020-06-14 23:32:59 +02:00
Raph Levien 79cc9da811 Fancy flattening
Implement the same flattening algorithm as kurbo.
2020-06-09 20:45:19 -07:00
Raph Levien eaa1d261c3 Sederberg error metric
Use proper math to compute the number of subdivisions. This works but is not
very satisfying, as it over-subdivides.
2020-06-09 18:43:49 -07:00
Raph Levien b571e0d10c Continue wiring up gpu-side flattening
All segments given to path coarse raster are cubics. Flatten to
quadratics.

This works but the quality is not (yet) good.
2020-06-09 17:56:11 -07:00
Raph Levien 0f44bc8b78 Start GPU-side flattening
This starts the work on GPU-side flattening by plumbing curves through.
2020-06-09 16:01:47 -07:00
Raph Levien 3a8227d025 Non-load balanced coarse path raster
This is a bit of a revert of the load-balanced ("more parallel") coarse
path rasterizer, but includes fills and also uses atomicExchange.

I'm doing it this way because it should be considerably easier to do
flattening in this structure, even though there will be some performance
regression.
2020-06-09 15:09:53 -07:00
Raph Levien 7118c8efc1 Fix backdrop of segments to left of viewport
Make sure we account for backdrop in segments clipped out of viewport.
2020-06-09 10:25:22 -07:00
Raph Levien 6db4e20bbb More parallel backdrop propagation
This is a nice improvement but still not great on tiger.
2020-06-06 08:23:40 -07:00
Raph Levien af0a1af8e1 Make fills work
The backdrop propagation is slow but it does work.
2020-06-05 22:40:44 -07:00
Raph Levien f9f5961428 Use atomicExchange over atomicCompSwap
Significant perf win (approx 2x in the path coarse rasterizer)
2020-06-05 08:24:26 -07:00
Raph Levien e5dd9ae01e More parallel path coarse raster
Use fancier load balancing algorithm for coarse rendering of paths.

Seems to work and is an improvement in some cases.
2020-06-04 17:42:33 -07:00
Raph Levien 877da4a98e Faster coarse raster
Store a lot more tile context in shared memory and do the work from
that.
2020-06-04 10:39:08 -07:00
Raph Levien e1aa9b2f5d Remove bbox guard
It's probably not necessary.

This development is still work in progress.
2020-06-03 20:59:19 -07:00
Raph Levien 7f4a6523a8 Filter sparse tiles
Have a more-parallel read of the tile structures based on bbox coverage,
and only set the bit when the tile isn't empty.

This is a speedup, but there is some duplicated work and it is possible
to improve it further.
2020-06-03 17:55:42 -07:00
Raph Levien 63ba45c774 Fix performance issues
Use larger workgroup for tile initialization (utilization was poor).
Provide correct element count to coarse rasterizer.
2020-06-03 15:32:58 -07:00
Raph Levien ff8cee059c Optimize tile allocation
Use parallel scheme to zero out tiles.
2020-06-03 14:46:41 -07:00
Raph Levien 70a9c17e23 Continue building out pipeline
Plumbs the new tiling scheme to k4. This works (stroke only) but still
has some performance issues.
2020-06-03 12:21:09 -07:00
Raph Levien 294f6fd1db Experiment with new sorting scheme
Path segments are unsorted, but other elements are using the same
sort-middle approach as before.

This is a checkpoint. At this point, there are unoptimized versions
of tile init and coarse path raster, but it isn't wired up into a
working pipeline. Also observing about a 3x performance regression in
element processing, which needs to be investigated.
2020-06-03 09:29:25 -07:00
Raph Levien 2c185c3718 Simplify ringbuf
We don't really need a ring buffer, as we only read what we're actually
going to process.
2020-05-30 21:20:48 -07:00
Raph Levien 192ddc5eab Parallel merge
The fancy stuff :)
2020-05-30 21:11:13 -07:00
Raph Levien 121f29fef6 Merge one segment at a time
No parallelism yet, but seems to improve performance.
2020-05-30 08:51:52 -07:00
Raph Levien 894ef156e1 Change to new merge strategy in binning
WIP

We get "device lost" on NV :/
2020-05-29 20:06:16 -07:00
Raph Levien 319aa703c4 Output multiple pixels per thread in k4
In kernel 4, compute a chunk of pixels rather than just one per thread.
This is a dramatic speedup.

(This commit cherry-picked from another working branch)
2020-05-28 07:54:24 -07:00
Raph Levien e16f68d89d Fix buffer overrun
Was a little too eager in zeroing out sh_is_segment[].
2020-05-26 22:47:28 -07:00
Raph Levien dbcffb10db Reinstate fills
Add fills back in.
2020-05-25 15:27:03 -07:00
Raph Levien 3d422d9243 Allocate segment chunks in slabs
Another speedup might be to special-case when the number of chunks in a
stroke or fill command is 1, in which case the segment header doesn't need
allocation and memory traffic is reduced. But right now we'll avoid the
complexity.
2020-05-25 12:22:29 -07:00
Raph Levien 8eaf49a04d Checkpoint parallel output
Parallel segment output seems to be working for strokes.
2020-05-25 12:14:18 -07:00
Raph Levien 24b3def0a1 Start work on parallel segment output
Output of segments is in parallel. Getting closer, some problems with
chaining but mostly correct.
2020-05-24 21:02:19 -07:00
Raph Levien 55df3e6cc8 Fix linewidth math
Coarse rasterization wasn't entirely taking line width into account.

Also fix a swizzle in the matrix (not yet used), and fix a missing End
command in the ptcl output (it hasn't been a problem because the buffer
was cleared).
2020-05-24 09:43:41 -07:00
Raph Levien 7d040dff37 Bit magic for backdrop accumulation
Use bit counting rather than iterating backdrop increments one by one.
A nice if not huge speedup.
2020-05-22 07:30:32 -07:00
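
A sketch of the idea; the data layout is illustrative, but it shows the
one-by-one loop being replaced by a popcount:

// Bit i of mask is set when a segment crossing increments the backdrop
// entering tile i of the row (tile_ix < 32 in this sketch).
fn backdrop_at(mask: u32, tile_ix: u32) -> u32 {
    (mask & ((1u32 << tile_ix) - 1)).count_ones()
}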
Raph Levien a616b4d010 Rework right_edge computation in elements
Trying to fit it into the fancy monad doesn't really work, so use a
more straightforward approach to compute it from the aggregate.

Also add yEdge logic (basically copying piet-metal). With a fix to
ELEMENT_BINNING_RATIO (which I had simply gotten wrong), the example
renders almost correctly, with small bounding box artifacts.
2020-05-21 10:00:56 -07:00
Raph Levien ed4ed30708 Adding backdrop logic
Calculation of backdrops kinda works but with issues, so WIP.
2020-05-20 16:03:27 -07:00
Raph Levien 076e6d600d Progress on wiring up fills
Write the right_edge to the binning output.

More work on encoding the fill/stroke distinction and plumbing that
through the pipeline. This is a bit unsatisfying because of the code
duplication; having an extra fill/stroke bool might be better, but I
want to avoid making the structs bigger (this could be solved by
better packing in the struct encoding).

Fills are plumbed through to the last stage. Backdrop is WIP.
2020-05-20 11:14:19 -07:00
Raph Levien 03da52cff8 Start implementing fills
This should get the "right_edge" value for each segment plumbed through
to the binning phase. It also needs to be plumbed to coarse raster and
wired up there.

Consider this WIP, because none of this logic has been tested yet.
2020-05-19 20:40:04 -07:00
Raph Levien 0ed759814b Smarter line segment coverage
Compute tile coverage of segments using an optimized algorithm. This
algorithm does a bit of setup, then uses an efficient formula to
compute the span per scan-line.
2020-05-19 09:26:44 -07:00
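
A sketch of the per-row span formula for a non-horizontal segment with
p0.y <= p1.y; names and tile size are illustrative:

const TILE_W: f32 = 16.0;
const TILE_H: f32 = 16.0;

fn row_span(p0: (f32, f32), p1: (f32, f32), row: i32) -> (i32, i32) {
    let dxdy = (p1.0 - p0.0) / (p1.1 - p0.1); // setup: inverse slope
    // Clamp the tile row's y-range to the segment.
    let y_top = (row as f32 * TILE_H).max(p0.1);
    let y_bot = ((row as f32 + 1.0) * TILE_H).min(p1.1);
    // x-extent of the segment within that y-range.
    let xa = p0.0 + (y_top - p0.1) * dxdy;
    let xb = p0.0 + (y_bot - p0.1) * dxdy;
    let (xmin, xmax) = (xa.min(xb), xa.max(xb));
    // Inclusive range of tile columns touched on this row.
    ((xmin / TILE_W).floor() as i32, (xmax / TILE_W).floor() as i32)
}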