Commit graph

31 commits

Author SHA1 Message Date
Raph Levien e73049fe98 First cut at split blend stack
Split the blend stack into register and memory segments. Do blending in registers up to that size, then spill to memory if needed.

This version may regress performance on Pixel 4, as it uses common memory for the blend stack, rather than keeping that memory read-only in fine rasterization, and using a separate buffer for blend stack. This needs investigation. It's possible we'll want to have single common memory as a config option, as it pools allocations and decreases the probability of failure.

Also a flaw in this version: there is no checking of memory overflow.

For understanding code history: this commit largely reverts #77, but there were some intervening changes to blending, and this commit also implements the split so some of the stack is in registers.

Closes #156
2022-05-16 11:12:33 -07:00
Raph Levien acb3933d94 Variable size encoding of draw objects
This patch switches to a variable size encoding of draw objects.

In addition to the CPU-side scene encoding, it changes the representation of intermediate per draw object state from the `Annotated` struct to a variable "info" encoding. In addition, the bounding boxes are moved to a separate array (for a more "structure of "arrays" approach). Data that's unchanged from the scene encoding is not copied. Rather, downstream stages can access the data from the scene buffer (reducing allocation and copying).

Prefix sums, computed in `DrawMonoid` track the offset of both scene and intermediate data. The tags for the CPU-side encoding have been split into their own stream (again a change from AoS to SoA style).

This is not necessarily the final form. There's some stuff (including at least one piet-gpu-derive type) that can be deleted. In addition, the linewidth field should probably move from the info to path-specific. Also, the 1:1 correspondence between draw object and path has not yet been broken.

Closes #152
2022-03-14 16:32:08 -07:00
Raph Levien 3b67a4e7c1 New clip implementation
This PR reworks the clip implementation. The highlight is that clip bounding box accounting is now done on GPU rather than CPU. The clip mask is also rasterized on EndClip rather than BeginClip, which decreases memory traffic needed for the clip stack.

This is a pretty good working state, but not all cleanup has been applied. An important next step is to remove the CPU clip accounting (it is computed and encoded, but that result is not used). Another step is to remove the Annotated structure entirely.

Fixes #88. Also relevant to #119
2022-02-17 17:13:28 -08:00
Raph Levien d948126c16 Adjust workgroup sizes
Make max workgroup size 256 and respect LG_WG_FACTOR.

Because the monoid scans only support a height of 2, this will reduce
the maximum scene complexity we can render. But it also increases
compatibility. Supporting larger scans is a TODO.
2021-12-08 11:48:38 -08:00
Raph Levien 44327fe49f Beginnings of new element pipeline
This successfully renders the tiger; fills and strokes are supported.
Other parts of the imaging model, not yet.

Progress toward #119
2021-12-03 15:33:01 -08:00
Raph Levien 875c8badf4 Add draw object stage
This is one of the stages in the new element pipeline. It's a simple
one, just a prefix sum of a couple counts, and some of it will probably
get merged with a downstream stage, but we'll do it separately for now
for convenience.

This patch also contains an update to Vulkan tools 1.2.198, which
accounts for the large diff of translated shaders.
2021-12-02 13:37:16 -08:00
Raph Levien 178761dcb3 Path stream processing
This patch contains the core of the path stream processing, though some
integration bits are missing. The core logic is tested, though
combinations of path types, transforms, and line widths are not (yet).

Progress towards #119
2021-12-01 07:33:24 -08:00
Raph Levien 47f8812e2f Start work on new element pipeline
There's a bit of reorganizing as well. Shader stages are made available
from piet-gpu to the test rig, config is now a proper structure
(marshaled with bytemuck).

This commit just has the transform stage, which is a simple monoid scan
of affine transforms.

Progress toward #119
2021-11-24 08:01:43 -08:00
Elias Naur 039cfcf0de piet-gpu/shader: treat memoryBarrierBuffer as a control barrier
memoryBarrierBuffer is mapped to the threadgroup_barrier function in
Metal, which is a control barrier that must be executed by all threads
(or none). This change establishes that property for the two memory
barriers we have.

While here, remove ENABLE_IMAGE_INDICES completely; it was disabled in
an earlier change.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-08-20 20:41:35 +02:00
Raph Levien 6f707c4c62 Start work on gradients
WIP. Most of the GPU-side work should be done (though it's not tested
end-to-end and it's certainly possible I missed something), but still
needs work on encoding side.
2021-07-12 06:56:52 -07:00
Raph Levien 115cb855d9 Query extensions at runtime
Don't run extensions unless they're available. This includes querying
for descriptor indexing, and running one of two versions of kernel4
depending on whether it's enabled.

Part of the support needed for #78
2021-04-08 15:11:15 -07:00
Elias Naur ee4429a26f kernel4: separate area from alpha in clip stack
This change prepares for kernel4 to output alpha. No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 19:51:42 +02:00
Elias Naur e9ff509ab9 use tag flags for fill vs stroke modes in scene elements
Encode stroke vs fill as tag flags, thereby reducing the number of scene
elements. Encoding change only, no functional changes.

The previous Stroke and Fill commands are merged to one command,
FillColor. The encoding to annotated element is divergent, which is
fixed when annotated elements move to tag flags.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur 903ab1fb59 implement FillImage command and sRGB support
FillImage is like Fill, except that it takes its color from one or
more image atlases.

kernel4 uses a single image for non-Vulkan hosts, and the dynamic sized array
of image descriptors on Vulkan.

A previous version of this commit used textures. I think images are a better
choice for piet-gpu, for several reasons:

- Texture sampling, in particular textureGrad, is slow on lower spec devices
  such as Google Pixel. Texture sampling is particularly slow and difficult to
implement for CPU fallbacks.
- Texture sampling need more parameters, in particular the full u,v
  transformation matrix, leading to a large increase in the command size. Since
all commands use the same size, that memory penalty is paid by all scenes, not
just scenes with textures.
- It is unlikely that piet-gpu will support every kind of fill for every
  client, because each kind must be added to kernel4.

With FillImage, a client will prepare the image(s) in separate shader stages,
sampling and applying transformations and special effects as needed. Textures
that align with the output pixel grid can be used directly, without
pre-processing.

Note that the pre-processing step can run concurrently with the piet-gpu pipeline;
Only the last stage, kernel4, needs the images.

Pre-processing most likely uses fixed function vertex/fragment programs,
which on some GPUs may run in parallel with piet-gpu's compute programs.

While here, fix a few validation errors:
- Explicitly enable EXT_descriptor_indexing, KHR_maintenance3,
  KHR_get_physical_device_properties2.
- Specify a vkDescriptorSetVariableDescriptorCountAllocateInfo for
  vkAllocateDescriptorSets. Otherwise, variable image2D arrays won't work (but
sampler2D arrays do, at least on my setup).

Updates #38

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur 07e07c7544 ensure consistent path segment transformation
As described in #62, the non-deterministic scene monoid may result in
slightly different transformations for path segments in an otherwise
closed path.

This change ensures consistent transformation across paths in three steps.

First, absolute transformations computed by the scene monoid is stored
along with path segments and annotated elements.

Second, elements.comp no longer transforms path segments. Instead, each
segment is stored untransformed along with a reference to its absolute
transformation.

Finally, path_coarse performs the transformation of path segments.
Because all segments in a path share a single transformation reference,
the inconsistency in #62 is avoided.

Fixes #62

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
Elias Naur 6a4e26ef2a all: add optional memory checks
Defining MEM_DEBUG in mem.h will add a size field to Alloc and enable
bounds and alignment checks for every memory read and write.

Notes:
- Deriving an Alloc from Path.tiles is unsound, but it's more trouble to
  convert Path.tiles from TileRef to a variable sized Alloc.
- elements.comp note that "We should be able to use an array of structs but the
  NV shader compiler doesn't seem to like it". If that's still relevant, does
  the shared arrays of Allocs work?

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-02-15 16:07:45 +01:00
Elias Naur c4f5a69a0d implement variable output sizing
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur 4de67d9081 unify GPU memory management
Merge all static and dynamic buffers to just one, "memory". Add a malloc
function for dynamic allocations.

Unify static allocation offsets into a "config" buffer containing scene setup
(number of paths, number of path segments), as well as the memory offsets of
the static allocations.

Finally, set an overflow flag when an allocation fail, and make sure to exit
shader execution as soon as that triggers. Add checks before beginning
execution in case the client wants to run two or more shaders before checking
the flag.

The "state" buffer is left alone because it needs zero'ing and because it is
accessed with the "volatile" keyword.

Fixes #40

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur d21f2b68de all: add SPDX license headers
Fixes #53

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-11 18:24:35 +01:00
Elias Naur ac3ac3ddff shader: introduce a crude setting for adjusting the maximum workgroup size
Both the Vulkan and OpenGL ES spec allow implementations to limit workgroups to
128 threads. Add a LG_WG_FACTOR setting for easy switching between 128 and 256
threads, with 256 being kept as the default setting.

Manually tested that LG_WG_FACTOR = 0 (128 threads) works as expected.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-13 13:04:13 +02:00
Elias Naur 326f7f0d03 shader: delete more unused code and variables
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-13 13:03:56 +02:00
Elias Naur 05636995dd compute IMAGE_WIDTH and IMAGE_HEIGHT; remove dead code from setup.h
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-08-29 15:03:40 +02:00
Raph Levien 294f6fd1db Experiment with new sorting scheme
Path segments are unsorted, but other elements are using the same
sort-middle approach as before.

This is a checkpoint. At this point, there are unoptimized versions
of tile init and coarse path raster, but it isn't wired up into a
working pipeline. Also observing about a 3x performance regression in
element processing, which needs to be investigated.
2020-06-03 09:29:25 -07:00
Raph Levien a616b4d010 Rework right_edge computation in elements
Trying to fit it into the fancy monad doesn't really work, so use a
more straightforward approach to compute it from the aggregate.

Also add yEdge logic (basically copying piet-metal). With a fix to
ELEMENT_BINNING_RATIO (which I had simply gotten wrong), the example
renders almost correctly, with small bounding box artifacts.
2020-05-21 10:00:56 -07:00
Raph Levien 03da52cff8 Start implementing fills
This should get the "right_edge" value for each segment plumbed through
to the binning phase. It also needs to be plumbed to coarse raster and
wired up there.

Also considering WIP because none of this logic has been tested yet.
2020-05-19 20:40:04 -07:00
Raph Levien cc89d0e285 Starting coarse rasterizer
Working down the pipeline.

WIP
2020-05-13 21:39:47 -07:00
Raph Levien aa83d782ed Fills
Adds fills, and has more or less working tiger render (with artifacts).
2020-05-01 19:42:20 -07:00
Raph Levien b23fe25177 Use linked list strategy for segments
Trying to allocate them contiguously wasn't good.
2020-04-28 22:25:57 -07:00
Raph Levien cb06b1bc3d Implement stroked polylines
This version seems to work but the allocation of segments has low
utilization. Probably best to allocate in chunks rather than try to
make them contiguous.
2020-04-28 18:45:59 -07:00
Raph Levien 55e35dd879 Dynamic allocation of intermediate buffers
When the initial allocation is exceeded, do an atomic bump allocation.
This is done for both tilegroup instances and per tile command lists.
2020-04-25 10:45:47 -07:00
Raph Levien e1c0e448ef Encode stroke in scene
This just adds the first step of polyline stroking, which is adding it
to the scene. Also just a bit of cleaning up of dimensions into one
header file.
2020-04-25 08:24:46 -07:00