Commit graph

900 commits

Author SHA1 Message Date
Raph Levien 4dcf385b18 Remove MemFlags trait 2021-05-21 21:51:33 -07:00
Raph Levien e9a8b4643b Migrate to BufferUsage
Adopt the BufferUsage concept from WebGPU, and replace MemFlags, which
is inadequate.
2021-05-21 19:43:55 -07:00
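A minimal sketch of WebGPU-style usage flags in plain Rust, written without the `bitflags` crate. The flag values follow WebGPU's `GPUBufferUsage` constants. The `BufferUsage` type shown here and its `needs_host_visible` policy are hypothetical illustrations, not piet-gpu's actual API.

```rust
use std::ops::BitOr;

/// Hypothetical WebGPU-style buffer usage flags (values follow GPUBufferUsage).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BufferUsage(u32);

impl BufferUsage {
    pub const MAP_READ: BufferUsage = BufferUsage(0x0001);
    pub const MAP_WRITE: BufferUsage = BufferUsage(0x0002);
    pub const COPY_SRC: BufferUsage = BufferUsage(0x0004);
    pub const COPY_DST: BufferUsage = BufferUsage(0x0008);
    pub const STORAGE: BufferUsage = BufferUsage(0x0080);

    /// True if every bit in `other` is also set in `self`.
    pub fn contains(self, other: BufferUsage) -> bool {
        self.0 & other.0 == other.0
    }

    /// Hypothetical policy: only mappable buffers need host-visible memory.
    /// Everything else can live in device-local memory.
    pub fn needs_host_visible(self) -> bool {
        self.0 & (Self::MAP_READ.0 | Self::MAP_WRITE.0) != 0
    }
}

impl BitOr for BufferUsage {
    type Output = BufferUsage;
    fn bitor(self, rhs: BufferUsage) -> BufferUsage {
        BufferUsage(self.0 | rhs.0)
    }
}
```

In this design, callers state their intent (map for reading, copy destination, storage), and each back-end translates that intent into its own memory types.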
Raph Levien cd5e799d1a Beginning of Metal back-end
Work in progress: some types are in place, but it's mostly a skeleton.
2021-05-21 17:44:49 -07:00
Raph Levien e4b16e706a Timestamp queries
These function, but could use some work.

First, the buffer situation is worse than it should be. It should be
possible to create a single readback buffer rather than copying from
gpu-local to host-coherent memory.

Second, the command buffer `finish_timestamps` call doesn't correspond to
anything in Vulkan, so it will need plumbing up through the hub in one form
or another when that happens. I'm inclined to make it ergonomic by doing a
bit of resource tracking that will trigger the appropriate call (and
subsequent host barrier) in the `finish` method on the command buffer.
2021-05-21 13:19:10 -07:00
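Raw timestamp query results are tick counts. The sketch below converts them to durations, assuming Vulkan semantics, where `VkPhysicalDeviceLimits::timestampPeriod` gives the number of nanoseconds per tick. The function names are hypothetical. Reading the query pool back into host memory (the buffer copy discussed above) is not shown.

```rust
/// Elapsed time in seconds between two raw timestamp values.
/// `period_ns` is nanoseconds per tick, as in Vulkan's
/// `VkPhysicalDeviceLimits::timestampPeriod`.
fn elapsed_seconds(start: u64, end: u64, period_ns: f32) -> f64 {
    // Assumes `end` was written after `start` and the counter did not wrap.
    let ticks = end.checked_sub(start).expect("end timestamp precedes start");
    ticks as f64 * period_ns as f64 / 1e9
}

/// Per-interval durations for timestamps written between pipeline stages.
fn stage_durations(timestamps: &[u64], period_ns: f32) -> Vec<f64> {
    timestamps
        .windows(2)
        .map(|w| elapsed_seconds(w[0], w[1], period_ns))
        .collect()
}
```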
Raph Levien f482921806 Create compute pipelines
Create compute pipelines from shader source and descriptor sets. This
gets it to the point where it can run the collatz example.

Still WIP and with rough edges, of course.
2021-05-18 10:08:23 -07:00
Raph Levien ee0802133b Add new types and methods
This brings the signatures up to date so that it compiles, but the
implementations are just stubs for now.
2021-05-16 10:38:09 -07:00
Raph Levien 619fc8d4eb Merge branch 'master' into dx12 2021-05-16 10:19:06 -07:00
Raph Levien a28c0c8c83 A bit more work
Chipping away at the dx12 backend. This should more or less do the
signalling to the CPU that the command buffer is done (i.e., wire up the
fence). It also creates buffer objects.
2021-05-16 10:18:58 -07:00
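The fence mentioned above follows the D3D12 model. An `ID3D12Fence` holds a monotonically increasing 64-bit value. The command queue's `Signal` sets that value once the GPU reaches that point, and the CPU either polls `GetCompletedValue` or blocks through `SetEventOnCompletion`. The sketch below is a hypothetical CPU-only model of that protocol built on `Mutex` and `Condvar`. It calls no D3D12 APIs.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical CPU-side model of a D3D12-style fence: a monotonically
/// increasing u64 that the "GPU" signals and the CPU waits on.
struct FenceModel {
    completed: Mutex<u64>,
    cond: Condvar,
}

impl FenceModel {
    fn new() -> Self {
        FenceModel { completed: Mutex::new(0), cond: Condvar::new() }
    }

    /// Stands in for the GPU signalling `value` when a command buffer finishes.
    fn signal(&self, value: u64) {
        let mut completed = self.completed.lock().unwrap();
        if value > *completed {
            *completed = value;
        }
        self.cond.notify_all();
    }

    /// Blocks the CPU until the fence reaches at least `value`.
    fn wait(&self, value: u64) {
        let mut completed = self.completed.lock().unwrap();
        while *completed < value {
            completed = self.cond.wait(completed).unwrap();
        }
    }

    /// Non-blocking query, analogous to GetCompletedValue.
    fn completed_value(&self) -> u64 {
        *self.completed.lock().unwrap()
    }
}
```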
Raph Levien 34d8fa358b
Merge pull request #91 from linebender/gpu-test
Expand runtime query of GPU capabilities
2021-05-09 06:44:06 -07:00
Raph Levien a5991ecf97 Expand runtime query of GPU capabilities
Test whether the GPU supports subgroups (including size control) and
the Vulkan memory model.

This patch does all the ceremony needed for runtime query, including
testing the Vulkan version and only probing the extensions when
available. Thus, it should work fine on older devices (not yet tested).

The reporting of capabilities follows Vulkan concepts, but is not
particularly Vulkan-specific.
2021-05-08 11:41:47 -07:00
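Below is a minimal sketch of version-gated capability probing. `make_version` reproduces the bit packing of Vulkan's `VK_MAKE_VERSION` macro. The `GpuCaps` struct and `probe_caps` function are hypothetical. A real probe would also query feature structs (via `vkGetPhysicalDeviceFeatures2`) rather than trust extension names alone.

```rust
/// Same packing as Vulkan's VK_MAKE_VERSION macro.
const fn make_version(major: u32, minor: u32, patch: u32) -> u32 {
    (major << 22) | (minor << 12) | patch
}

/// Hypothetical capability report; fields follow Vulkan concepts.
#[derive(Debug, Default, PartialEq)]
struct GpuCaps {
    subgroups: bool,
    subgroup_size_control: bool,
    memory_model: bool,
}

/// Hypothetical probe: extensions are only consulted when the API version is
/// new enough to query them, so older devices simply report "unsupported".
fn probe_caps(api_version: u32, extensions: &[&str]) -> GpuCaps {
    if api_version < make_version(1, 1, 0) {
        return GpuCaps::default();
    }
    let has = |name: &str| extensions.iter().any(|e| *e == name);
    GpuCaps {
        // Simplification: treats Vulkan 1.1 as implying basic subgroup support;
        // a real probe would inspect VkPhysicalDeviceSubgroupProperties.
        subgroups: true,
        subgroup_size_control: has("VK_EXT_subgroup_size_control"),
        memory_model: has("VK_KHR_vulkan_memory_model"),
    }
}
```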
Raph Levien 951f3aa508 Start text rendering
This commit puts in basic integration with ttf-parser and starts
populating the various piet text objects. The font is currently
hard-coded.
2021-05-04 08:21:22 -07:00
Raph Levien f6c2558743
Merge pull request #82 from linebender/android2
Add example Android apk
2021-04-27 08:35:39 -07:00
Raph Levien 6602d58054 Merge branch 'master' into android2 2021-04-20 07:15:10 -07:00
Elias Naur 4b59525e1f use mediump precision for kernel4 colors and areas
Improves kernel4 performance for a Gio scene from ~22ms to ~15ms.

Updates #83

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-20 10:15:42 +02:00
Elias Naur d9d518b248 avoid non-uniform barrier control flow when exhausting memory
The compute shaders have a check for the successful completion of their
preceding stage. However, consider a shader execution path like the
following:

	void main() {
		if (mem_error != NO_ERROR) {
			return;
		}
		...
		malloc(...);
		...
		barrier();
		...
	}

and a shader execution that fails to allocate memory, thereby setting
mem_error to ERR_MALLOC_FAILED in malloc before reaching the barrier. If
another shader execution in the same workgroup then begins, its mem_error
check will make it return early, so it never reaches the barrier.

All GPU APIs require (dynamically) uniform control flow for barriers,
and the above case may lead to GPU hangs in practice.

Fix this issue by replacing the early exits with careful checks that
don't interrupt barrier control flow.

Unfortunately, it's harder to prove the soundness of the new checks, so
this change also clears dynamic memory ranges in MEM_DEBUG mode when
memory is exhausted. The result is that accessing memory after
exhaustion triggers an error.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-20 10:15:29 +02:00
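The sketch below is a CPU analogy of the fix, with `std::sync::Barrier` standing in for a workgroup `barrier()`. If any participating thread returned early, the others would block in `wait()` forever, much like the GPU hang described above. Here each hypothetical "invocation" records whether it may proceed instead of returning, so every invocation reaches the barrier. The bump allocator and all names are assumptions for illustration.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// Runs `n` hypothetical invocations that each allocate `per_alloc` units
/// from a shared bump allocator of size `capacity`.
/// Returns (memory error flag, number of invocations that did their work).
fn run_workgroup(n: usize, capacity: usize, per_alloc: usize) -> (bool, usize) {
    let mem_error = Arc::new(AtomicBool::new(false));
    let next = Arc::new(AtomicUsize::new(0)); // bump allocator offset
    let barrier = Arc::new(Barrier::new(n));
    let done = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..n)
        .map(|_| {
            let (mem_error, next, barrier, done) =
                (mem_error.clone(), next.clone(), barrier.clone(), done.clone());
            thread::spawn(move || {
                // Instead of `if (mem_error) return;`, remember the state.
                let mut ok = !mem_error.load(Ordering::SeqCst);
                if ok {
                    let offset = next.fetch_add(per_alloc, Ordering::SeqCst);
                    if offset + per_alloc > capacity {
                        mem_error.store(true, Ordering::SeqCst);
                        ok = false;
                    }
                }
                // Every invocation reaches the barrier: uniform control flow.
                barrier.wait();
                if ok {
                    done.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    (mem_error.load(Ordering::SeqCst), done.load(Ordering::SeqCst))
}
```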
Elias Naur 3b4a72deb9 elements.comp: remove redundant assignment
The assignment was made redundant by eb86456f31.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-20 10:14:04 +02:00
Raph Levien e1aced9c5a Merge branch 'master' into android2 2021-04-12 16:00:50 -07:00
Raph Levien 74f2003a1d
Merge pull request #79 from linebender/ext_query
Query extensions at runtime
2021-04-11 15:34:19 -07:00
Raph Levien 1c842f8471 Merge branch 'master' into ext_query 2021-04-11 15:33:49 -07:00
Elias Naur 45ea43c157 kernel4: replace continue in switch to support D3D11 shader model 5.0
Without this change, the fxc.exe compiler complains

error X3708: continue cannot be used in a switch

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-11 21:49:57 +02:00
Raph Levien 01e4024599 Merge branch 'master' into ext_query 2021-04-11 09:08:46 -07:00
Tatsuyuki Ishi 5e0cdcb193
Merge pull request #85 from ishitatsuyuki/render-ctx-premul
Encode premultiplied alpha in render_ctx.rs
2021-04-11 18:34:18 +09:00
Tatsuyuki Ishi 0637e2d6e5 Encode premultiplied alpha in render_ctx.rs 2021-04-11 13:20:40 +09:00
Elias Naur f4be74c07f winit: fix n_trans count
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-10 18:12:21 +02:00
Raph Levien bcb26c931e Clean up device create extensions 2021-04-08 15:11:17 -07:00
Raph Levien 115cb855d9 Query extensions at runtime
Don't use extensions unless they're available. This includes querying
for descriptor indexing, and running one of two versions of kernel4
depending on whether it's enabled.

Part of the support needed for #78
2021-04-08 15:11:15 -07:00
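Below is a minimal sketch of choosing between two kernel4 variants at runtime. It assumes the choice hinges only on whether `VK_EXT_descriptor_indexing` is listed. The shader file names are hypothetical. A real implementation would also check, and then enable, the relevant descriptor indexing features.

```rust
/// Hypothetical selection of a kernel4 variant from the device extension list.
fn select_kernel4(device_extensions: &[&str]) -> &'static str {
    let has_descriptor_indexing = device_extensions
        .iter()
        .any(|&e| e == "VK_EXT_descriptor_indexing");
    if has_descriptor_indexing {
        "kernel4_idx.spv" // hypothetical variant using a variable-size image array
    } else {
        "kernel4.spv" // hypothetical fallback with a single image binding
    }
}
```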
Elias Naur eb86456f31 elements.comp: don't modify BeginClip bounding box
The BeginClip and EndClip bounding boxes are absolute and must match
pairwise. I mistakenly modified the BeginClip bounding box for stroked
clips.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-08 19:56:37 +02:00
Raph Levien e6b2cc7b2b Android test application
Adds an example binary that can be run with `cargo apk`.

One thing that will still need manual tuning (for now) is the size of
the canvas. A good followup is to sense that from the window size.
2021-04-05 16:23:11 -07:00
Raph Levien d89d0964ec Clean up device create extensions 2021-04-03 08:45:36 -07:00
Raph Levien d1b9821fa8 Query extensions at runtime
Don't use extensions unless they're available. This includes querying
for descriptor indexing, and running one of two versions of kernel4
depending on whether it's enabled.

Part of the support needed for #78
2021-04-02 19:58:48 -07:00
Elias Naur 5db427c549 kernel4: compute and output alpha
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 19:51:49 +02:00
Elias Naur ee4429a26f kernel4: separate area from alpha in clip stack
This change prepares for kernel4 to output alpha. No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 19:51:42 +02:00
Elias Naur 22507dea0e pre-allocate kernel4 scratch space in coarse.comp
coarse.comp knows the maximum stack depth, and can pre-allocate scratch
space for kernel4.comp. Kernel4 no longer contains allocations nor
control barriers.

The invocation-local blend stack is gone as well; it didn't seem to make
any difference in performance to always use global memory for pushing
and popping.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 18:48:19 +02:00
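Below is a minimal sketch of how a coarse pass could size kernel4's scratch space ahead of time. It assumes the depth is computed per command list and that each stack level stores one `u32` per pixel. The `Cmd` subset and the memory layout are hypothetical.

```rust
/// Hypothetical command subset: only clip push/pop affect stack depth.
#[derive(Clone, Copy)]
enum Cmd {
    BeginClip,
    EndClip,
    Fill,
}

/// Maximum clip nesting depth reached in one command list.
fn max_clip_depth(cmds: &[Cmd]) -> usize {
    let (mut depth, mut deepest) = (0usize, 0usize);
    for cmd in cmds {
        match cmd {
            Cmd::BeginClip => {
                depth += 1;
                deepest = deepest.max(depth);
            }
            // saturating_sub guards against an unbalanced EndClip.
            Cmd::EndClip => depth = depth.saturating_sub(1),
            Cmd::Fill => {}
        }
    }
    deepest
}

/// Scratch bytes kernel4 would need, assuming one u32 per pixel per level.
fn scratch_bytes(cmds: &[Cmd], tile_pixels: usize) -> usize {
    max_clip_depth(cmds) * tile_pixels * std::mem::size_of::<u32>()
}
```

Because the size is known before kernel4 runs, kernel4 itself needs no allocation and therefore none of the barriers that allocation failure handling would require.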
Elias Naur e6b535d942 coarse.comp: extract area commands into function
No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-30 19:56:09 +02:00
Elias Naur d916a9e2c4 backdrop.comp: support stroked Annotated_Image and Annotated_BeginClip
Commit 8db77e180e added support for
strokes to FillImage and BeginClip, but missed backdrop.comp.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-30 19:33:25 +02:00
Elias Naur 678bfedfca kernel4: assume colors in alpha-premultiplied sRGB format
See http://ssp.impulsetrain.com/gamma-premult.html for a description of
the format.

Pre-multiplied alpha only matters for translucent objects; draw a few
such shapes in the test render.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-29 21:17:01 +02:00
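Below is a minimal sketch of premultiplied alpha and the Porter-Duff "over" operator that it simplifies. The sketch treats channels as plain numbers in whatever encoding they arrive in (sRGB, as the commit assumes) and does no linearization. The `Rgba` type is hypothetical. When alpha is 1, `premultiply` is the identity, which is why only translucent shapes exercise this path.

```rust
/// RGBA color with components in [0, 1]. Hypothetical type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rgba {
    r: f32,
    g: f32,
    b: f32,
    a: f32,
}

/// Straight alpha to premultiplied alpha: scale color channels by alpha.
fn premultiply(c: Rgba) -> Rgba {
    Rgba { r: c.r * c.a, g: c.g * c.a, b: c.b * c.a, a: c.a }
}

/// Porter-Duff "source over" for premultiplied colors:
/// result = src + dst * (1 - src.a), applied to every channel including alpha.
fn over(src: Rgba, dst: Rgba) -> Rgba {
    let k = 1.0 - src.a;
    Rgba {
        r: src.r + dst.r * k,
        g: src.g + dst.g * k,
        b: src.b + dst.b * k,
        a: src.a + dst.a * k,
    }
}
```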
Elias Naur eb37db1b05 replace per-element fill mode flags with a SetFillMode element
Fixes #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-29 21:10:25 +02:00
Elias Naur bb61f875dc kernel4: remove dead code left over from previous clipping approach
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-29 21:10:17 +02:00
Tatsuyuki Ishi 76f528c491
Merge pull request #76 from ishitatsuyuki/chunk-x 2021-03-26 03:02:38 +09:00
Tatsuyuki Ishi 4864a7fe0f Create chunks over the x axis in addition to y axis
This allows more coalescing with image loads/stores, since all of our images are stored with a tiled layout.
2021-03-23 20:54:49 +09:00
Elias Naur f0127812eb tightly pack fine rasterizer commands
Reclaims the space wasted by splitting fill mode commands from fill
commands.

For example, a CmdStroke + CmdColor pair uses an extra tag word compared
to the former combined CmdStroke. This change shaves off that one word.

In the future, we can pack several command tags into one tag word,
saving even more space.

Fixes #66

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
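The future packing mentioned above could look like the following sketch, which stores four 8-bit tags per 32-bit word with the first tag in the lowest byte. The tag width and byte order are assumptions, not piet-gpu's encoding.

```rust
/// Hypothetical packing of up to four 8-bit command tags into one u32 word.
fn pack_tags(tags: &[u8]) -> Vec<u32> {
    tags.chunks(4)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .fold(0u32, |word, (i, &t)| word | (u32::from(t) << (8 * i)))
        })
        .collect()
}

/// Read tag `index` back out of the packed words.
fn tag_at(words: &[u32], index: usize) -> u8 {
    ((words[index / 4] >> (8 * (index % 4))) & 0xff) as u8
}
```

A zero byte is indistinguishable from padding. A reader of such a format would therefore need the command count, or a nonzero end tag, to know where the stream stops.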
Elias Naur 8db77e180e support stroked fills for clips, images
This change completes general support for stroked fills for clips and
images.

Annotated_size increases from 28 to 32, because of the linewidth field
added to AnnoImage. Stroked image fills are presumably rare, and if
memory pressure turns out to be a bottleneck, we could replace the
linewidth field with a separate AnnoLinewidth element.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
Elias Naur db59b5d570 coarse,kernel4: make stroke, (non-zero) fill, solid separate commands
Before this change, every command (FillColor, FillImage, BeginClip)
had (or would need) stroke, (non-zero) fill and solid variants.

This change adds a separate command for each fill mode and its
parameters. That reduces code duplication and, as a side effect, adds
support for stroked FillImage and BeginClip.

The rest of the pipeline doesn't yet support stroked FillImage and
BeginClip. That's a follow-up change.

Since each command includes a tag, this change adds an extra word for
each fill and stroke. That waste is also addressed in a follow-up.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
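Below is a minimal sketch of the command split described above. A fill-mode command precedes each draw command, so a single `Color`, `Image`, or `BeginClip` handler serves every mode. The `SetMode` wrapper, the way the mode carries over between commands, and the default of `Solid` are all assumptions for illustration. Each `SetMode` costs its own tag word, which is the overhead mentioned in the last paragraph.

```rust
/// Fill modes that used to be baked into each draw command.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FillMode {
    Solid,
    NonZero,
    Stroke { half_width: f32 },
}

/// Hypothetical command stream: mode commands precede draw commands.
enum Cmd {
    SetMode(FillMode),
    Color(u32),
    Image(u32),
    BeginClip,
}

/// Pairs each draw command with the most recent mode. A real fine
/// rasterizer would compute coverage for that mode here.
fn interpret(cmds: &[Cmd]) -> Vec<(&'static str, FillMode)> {
    let mut mode = FillMode::Solid;
    let mut out = Vec::new();
    for cmd in cmds {
        match cmd {
            Cmd::SetMode(m) => mode = *m,
            Cmd::Color(_) => out.push(("color", mode)),
            Cmd::Image(_) => out.push(("image", mode)),
            Cmd::BeginClip => out.push(("clip", mode)),
        }
    }
    out
}
```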
Elias Naur 22eb418832 fix Vulkan errors on Wayland and Intel GPU
capabilities.min_image_count is 4 on my system, which is larger than
the hard-coded 2.

Use a default swapchain size if we're not getting any size information
from the surface capabilities.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
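Below is a minimal sketch of the two rules in this fix. In Vulkan, a `VkSurfaceCapabilitiesKHR::maxImageCount` of 0 means there is no upper limit. A `currentExtent` of (0xFFFFFFFF, 0xFFFFFFFF) means the swapchain determines the surface size, as is common on Wayland. The `SurfaceCaps` struct is a hypothetical subset of those capabilities. For brevity, the sketch does not clamp the fallback extent to `minImageExtent`/`maxImageExtent`.

```rust
/// Hypothetical subset of VkSurfaceCapabilitiesKHR.
struct SurfaceCaps {
    min_image_count: u32,
    max_image_count: u32,       // 0 means "no upper limit"
    current_extent: (u32, u32), // (u32::MAX, u32::MAX) means "set by the swapchain"
}

/// Honor the desired image count, but never go below the surface minimum
/// or above a nonzero maximum.
fn choose_image_count(caps: &SurfaceCaps, desired: u32) -> u32 {
    let mut count = desired.max(caps.min_image_count);
    if caps.max_image_count != 0 {
        count = count.min(caps.max_image_count);
    }
    count
}

/// Use the surface's extent when it reports one, else a default size.
fn choose_extent(caps: &SurfaceCaps, default: (u32, u32)) -> (u32, u32) {
    if caps.current_extent == (u32::MAX, u32::MAX) {
        default
    } else {
        caps.current_extent
    }
}
```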
Elias Naur 44bff2726c collapse FillCubic and StrokeCubic into Cubic with flags for fill mode
Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur df055563bd collapse annotated Fill and Stroke to Color with fill mode flag
No functionality changes, just different encoding.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur e9ff509ab9 use tag flags for fill vs stroke modes in scene elements
Encode stroke vs fill as tag flags, thereby reducing the number of scene
elements. Encoding change only, no functional changes.

The previous Stroke and Fill commands are merged into one command,
FillColor. The encoding to annotated elements still diverges; that is
fixed when annotated elements move to tag flags.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur a5b6bda941 add support for element flags to shaders
Commit 9afa9b86b6 added Rust support for
encoding flags into elements. This change adds support to shaders by
introducing variant tag structs:

struct VariantTag {
    uint tag;
    uint flags;
};

and returning them from Variant_tag functions.

It also adds a flags argument to write functions for enum variants that
include TagFlags.

No functionality changes.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
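Below is a Rust-side sketch of the tag-plus-flags idea. It assumes a hypothetical layout with the tag in the low 16 bits of a word and the flags in the high 16 bits. The real piet-gpu encoding may differ.

```rust
/// Rust mirror of the shader-side VariantTag struct.
#[derive(Debug, PartialEq)]
struct VariantTag {
    tag: u32,
    flags: u32,
}

/// Hypothetical layout: tag in the low 16 bits, flags in the high 16 bits.
const FLAGS_SHIFT: u32 = 16;

fn encode_tag_word(tag: u32, flags: u32) -> u32 {
    debug_assert!(tag < (1u32 << FLAGS_SHIFT) && flags < (1u32 << (32 - FLAGS_SHIFT)));
    tag | (flags << FLAGS_SHIFT)
}

fn decode_tag_word(word: u32) -> VariantTag {
    VariantTag { tag: word & 0xffff, flags: word >> FLAGS_SHIFT }
}
```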
Elias Naur 903ab1fb59 implement FillImage command and sRGB support
FillImage is like Fill, except that it takes its color from one or
more image atlases.

kernel4 uses a single image for non-Vulkan hosts, and a dynamically sized
array of image descriptors on Vulkan.

A previous version of this commit used textures. I think images are a better
choice for piet-gpu, for several reasons:

- Texture sampling, in particular textureGrad, is slow on lower-spec devices
  such as Google Pixel. Texture sampling is particularly slow and difficult to
  implement for CPU fallbacks.
- Texture sampling needs more parameters, in particular the full u,v
  transformation matrix, leading to a large increase in the command size. Since
  all commands use the same size, that memory penalty is paid by all scenes, not
  just scenes with textures.
- It is unlikely that piet-gpu will support every kind of fill for every
  client, because each kind must be added to kernel4.

With FillImage, a client will prepare the image(s) in separate shader stages,
sampling and applying transformations and special effects as needed. Textures
that align with the output pixel grid can be used directly, without
pre-processing.

Note that the pre-processing step can run concurrently with the piet-gpu pipeline;
only the last stage, kernel4, needs the images.

Pre-processing most likely uses fixed function vertex/fragment programs,
which on some GPUs may run in parallel with piet-gpu's compute programs.

While here, fix a few validation errors:
- Explicitly enable EXT_descriptor_indexing, KHR_maintenance3,
  KHR_get_physical_device_properties2.
- Specify a VkDescriptorSetVariableDescriptorCountAllocateInfo for
  vkAllocateDescriptorSets. Otherwise, variable image2D arrays won't work (but
  sampler2D arrays do, at least on my setup).

Updates #38

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
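Below is a minimal sketch of why images suit kernel4. Once the image has been pre-transformed to line up with the output pixel grid, the fine rasterizer needs only an exact integer texel fetch, the CPU equivalent of GLSL `imageLoad`. No filtering, derivatives, or u,v matrix are needed in the command. The `Atlas` and `FillImage` types, and the offset convention, are hypothetical.

```rust
/// Hypothetical RGBA8 image atlas stored row-major.
struct Atlas {
    width: u32,
    height: u32,
    pixels: Vec<[u8; 4]>,
}

/// Hypothetical FillImage payload: atlas coordinate = output pixel + offset.
struct FillImage {
    offset: (i32, i32),
}

impl Atlas {
    /// Exact integer texel fetch: no filtering, no derivatives.
    /// Out-of-range coordinates yield transparent black.
    fn load(&self, x: i32, y: i32) -> [u8; 4] {
        if x < 0 || y < 0 || (x as u32) >= self.width || (y as u32) >= self.height {
            return [0, 0, 0, 0];
        }
        self.pixels[((y as u32) * self.width + (x as u32)) as usize]
    }
}

/// Color for output pixel (px, py), assuming texels align 1:1 with pixels.
fn fill_image_color(atlas: &Atlas, cmd: &FillImage, px: i32, py: i32) -> [u8; 4] {
    atlas.load(px + cmd.offset.0, py + cmd.offset.1)
}
```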
Elias Naur 07e07c7544 ensure consistent path segment transformation
As described in #62, the non-deterministic scene monoid may result in
slightly different transformations for path segments in an otherwise
closed path.

This change ensures consistent transformation across paths in three steps.

First, absolute transformations computed by the scene monoid are stored
along with path segments and annotated elements.

Second, elements.comp no longer transforms path segments. Instead, each
segment is stored untransformed along with a reference to its absolute
transformation.

Finally, path_coarse performs the transformation of path segments.
Because all segments in a path share a single transformation reference,
the inconsistency in #62 is avoided.

Fixes #62

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
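Below is a minimal sketch of the final step, assuming a simple 2D affine type. Segments are stored untransformed with an index into a transform table, and the transform is applied late. Every segment of a path applies the same transform to the same shared endpoint coordinates, so the transformed endpoints are bit-identical and the path stays closed. All type names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: f32,
    y: f32,
}

/// 2D affine transform mapping (x, y) to (a*x + c*y + e, b*x + d*y + f).
#[derive(Clone, Copy)]
struct Affine {
    a: f32,
    b: f32,
    c: f32,
    d: f32,
    e: f32,
    f: f32,
}

impl Affine {
    fn apply(&self, p: Point) -> Point {
        Point {
            x: self.a * p.x + self.c * p.y + self.e,
            y: self.b * p.x + self.d * p.y + self.f,
        }
    }
}

/// Untransformed segment plus an index into a table of absolute transforms.
struct Segment {
    p0: Point,
    p1: Point,
    transform_ix: usize,
}

/// Transform segments late, looking up each segment's shared transform.
fn transform_segments(segs: &[Segment], transforms: &[Affine]) -> Vec<(Point, Point)> {
    segs.iter()
        .map(|s| {
            let t = &transforms[s.transform_ix];
            (t.apply(s.p0), t.apply(s.p1))
        })
        .collect()
}
```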