vello/piet-gpu/shader/setup.h

// SPDX-License-Identifier: Apache-2.0 OR MIT OR Unlicense

// Various constants for the sizes of groups and tiles.

// Much of this will be made dynamic in various ways, but for now it's easiest
// to hardcode and keep all in one place.

// An LG_WG_FACTOR of n scales workgroup sizes by 2^n. Use 0 for a
// maximum workgroup size of 128 threads, or 1 for a maximum of 256.
#define LG_WG_FACTOR 1
#define WG_FACTOR (1<<LG_WG_FACTOR)
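The comment above ties each value of LG_WG_FACTOR to a maximum workgroup size. The host-side C sketch below mirrors the two macros to show that arithmetic. The helper `max_workgroup_size` is hypothetical and not part of setup.h. It assumes a base of 128 threads at a factor of 0, as the comment states.

```c
#include <assert.h>

/* Host-side C mirror of the GLSL macros in setup.h. */
#define LG_WG_FACTOR 1
#define WG_FACTOR (1 << LG_WG_FACTOR)

/* Hypothetical helper: maximum workgroup size for a given factor,
   assuming 128 threads when lg_wg_factor is 0. */
static unsigned max_workgroup_size(unsigned lg_wg_factor) {
    return 128u << lg_wg_factor;
}

/* The comment only documents 0 (128 threads) and 1 (256 threads). */
static_assert(LG_WG_FACTOR == 0 || LG_WG_FACTOR == 1,
              "LG_WG_FACTOR must be 0 or 1");
```

Shifting by the log, rather than storing the factor directly, keeps every derived size a power of two. The binning constants further down rely on this.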

#define TILE_WIDTH_PX 16
#define TILE_HEIGHT_PX 16
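With 16×16-pixel tiles, an output image must be covered by a whole number of tiles in each direction. The C sketch below is one plausible way for the host to derive Config's `width_in_tiles` and `height_in_tiles` from a pixel size. The helper `tiles_for` is hypothetical, and the round-up behaviour is an assumption rather than something this header specifies.

```c
/* Host-side C mirror of the tile size macros. */
#define TILE_WIDTH_PX 16
#define TILE_HEIGHT_PX 16

/* Hypothetical helper: number of tiles needed to cover `px` pixels,
   rounding up so a partial tile at the right or bottom edge still
   gets a tile of its own. */
static unsigned tiles_for(unsigned px, unsigned tile_px) {
    return (px + tile_px - 1) / tile_px;
}
```

For example, a 1920×1080 target would need `tiles_for(1920, TILE_WIDTH_PX)` = 120 tiles across and `tiles_for(1080, TILE_HEIGHT_PX)` = 68 tiles down. The last row covers only 8 pixels of image.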

#define PTCL_INITIAL_ALLOC 1024

// ENABLE_IMAGE_INDICES is now defined by the ninja build file at compile time.
//#define ENABLE_IMAGE_INDICES

// These should probably be renamed and/or reworked. In the binning
// kernel, they represent the number of bins. Also, the workgroup size
// of that kernel is equal to the number of bins, but should probably
// be more flexible (it's 512 in the K&L paper).
#define N_TILE_X 16
#define N_TILE_Y (8 * WG_FACTOR)
#define N_TILE (N_TILE_X * N_TILE_Y)
#define LG_N_TILE (7 + LG_WG_FACTOR)
#define N_SLICE (N_TILE / 32)
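LG_N_TILE is written by hand rather than derived from N_TILE, so the two can drift apart if one is edited without the other. The C sketch below mirrors the macros and checks that they agree at compile time. It also reads N_SLICE as the number of 32-bit words needed for a bitmap with one bit per bin, and that reading is an interpretation rather than something the header states. With the default LG_WG_FACTOR of 1, the result is 16 × 16 = 256 bins, LG_N_TILE = 8, and N_SLICE = 8.

```c
#include <assert.h>

/* Host-side C mirror of the binning constants. */
#define LG_WG_FACTOR 1
#define WG_FACTOR (1 << LG_WG_FACTOR)

#define N_TILE_X 16
#define N_TILE_Y (8 * WG_FACTOR)
#define N_TILE (N_TILE_X * N_TILE_Y)
#define LG_N_TILE (7 + LG_WG_FACTOR)
#define N_SLICE (N_TILE / 32)

/* LG_N_TILE is hand-written; check that it really is log2(N_TILE). */
static_assert((1 << LG_N_TILE) == N_TILE, "LG_N_TILE must equal log2(N_TILE)");
/* Packing one bit per bin into 32-bit words must divide evenly. */
static_assert(N_TILE % 32 == 0, "N_TILE must be a multiple of 32");
```

Both assertions also hold with LG_WG_FACTOR set to 0, which gives 128 bins, LG_N_TILE = 7, and N_SLICE = 4.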

#define GRADIENT_WIDTH 512

struct Config {
    uint n_elements; // paths
    uint n_pathseg;
    uint width_in_tiles;
    uint height_in_tiles;
    Alloc tile_alloc;
    Alloc bin_alloc;
    Alloc ptcl_alloc;
    Alloc pathseg_alloc;
    Alloc anno_alloc;
    Alloc trans_alloc;
};
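The host must upload a Config whose byte layout matches what the shaders read. The C sketch below mirrors the struct field for field. It relies on two assumptions that this header does not confirm:

- Alloc (defined elsewhere, in mem.h) is a single 32-bit offset when MEM_DEBUG is not defined.
- The buffer uses a std430-style layout, under which a struct of uint members packs into consecutive 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of Alloc without MEM_DEBUG: one word holding an
   offset into the shared memory buffer. */
typedef struct {
    uint32_t offset;
} Alloc;

/* Host-side mirror of the GLSL Config struct. */
typedef struct {
    uint32_t n_elements; /* paths */
    uint32_t n_pathseg;
    uint32_t width_in_tiles;
    uint32_t height_in_tiles;
    Alloc tile_alloc;
    Alloc bin_alloc;
    Alloc ptcl_alloc;
    Alloc pathseg_alloc;
    Alloc anno_alloc;
    Alloc trans_alloc;
} Config;

/* Ten 32-bit words: four counters followed by six one-word Allocs. */
static_assert(sizeof(Config) == 10 * sizeof(uint32_t), "Config is 10 words");
static_assert(offsetof(Config, tile_alloc) == 16, "Allocs follow the counters");
```

If MEM_DEBUG adds a size field to Alloc, each Alloc doubles in size. The host mirror would then need to change in step with mem.h.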

// Fill modes.
#define MODE_NONZERO 0
#define MODE_STROKE 1

// Size of kernel4 clip state, in words.
#define CLIP_STATE_SIZE 2

// fill_mode_from_flags extracts the fill mode from tag flags.
uint fill_mode_from_flags(uint flags) {
    return flags & 0x1;
}
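fill_mode_from_flags keeps only bit 0 of the tag flags. A clear bit selects MODE_NONZERO and a set bit selects MODE_STROKE, and all higher bits are ignored. The C sketch below copies the function so this behaviour can be checked on the host. The flag values in the test are arbitrary examples, not flags the encoder is known to emit.

```c
#include <stdint.h>

/* Host-side C mirror of the fill modes. */
#define MODE_NONZERO 0
#define MODE_STROKE 1

/* Same bit test as the GLSL version: bit 0 of the tag flags selects
   the mode, and every other bit is masked off. */
static uint32_t fill_mode_from_flags(uint32_t flags) {
    return flags & 0x1;
}
```

Because the mask discards the other bits, later flags can share the same word without changing the fill mode that is extracted.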