Commit graph

179 commits

Author SHA1 Message Date
Tatsuyuki Ishi bc7b2106b0
Merge pull request #77 from ishitatsuyuki/blend-scratch
Remove manual blend stack spilling and rely on scratch memory instead
2021-07-13 09:49:34 +09:00
Ishi Tatsuyuki afe72804e1 Add command line parameters to winit
So that I don't need to modify lib.rs every time I want to benchmark...
2021-06-26 11:42:33 +09:00
Ishi Tatsuyuki 7a2dc37d36 Remove manual blend stack spilling and rely on scratch memory instead
v2: Add a panic when the nested blend depth exceeds the limit.
v3: Rebase and partially remove code introduced in 22507de.
2021-06-25 17:13:01 +09:00
Raph Levien 379fb1caaa
Merge pull request #89 from linebender/text
Start text rendering
2021-06-23 07:56:24 -07:00
Ishi Tatsuyuki d77dfb8c00 Runtime querying of threadgroup size 2021-06-08 16:29:40 +09:00
Ishi Tatsuyuki c2772ceac7 Boost backdrop parallelism for the prefix sums 2021-06-08 15:09:32 +09:00
Raph Levien bae185efbd API reorg
Move types into the toplevel and hide implementation details. Remove
deref of hub CmdBuf to mux. Restrict public visibility of internals.

Most items have some docs, though improvements are still possible. In
particular, there should be detailed safety info.
2021-05-29 21:11:02 -07:00
Raph Levien 7d7c86c44b API changes and cleanup
Add workgroup size to dispatch call (needed by metal). Change all fence
references to mutable for consistency.

Move backend traits to a separate file (move them out of the toplevel
namespace in preparation for the hub types going there, to make the
public API nicer).

Add a method and macro for automatically choosing shader code, and
change collatz example to generate all 3 kinds on build.
2021-05-28 16:14:39 -07:00
Raph Levien 2ecfc7a414 Wire hub to mux
Make the hub abstraction connect to the mux, rather than directly to the
Vulkan back-end.

As of this commit, both command line and winit examples work (on
Vulkan). In theory it should be possible to get them working on Dx12 as
well by translating the shader code, but there's a lot that can go
wrong.

This commit also contains a bunch of changes to mux to make conditional
compilation of match arms work, and new methods to support swapchain.
2021-05-26 09:30:07 -07:00
Raph Levien f04da3af9d Add multiplexer abstraction
Adds a new "mux" module which can have multiple backends. As of this
commit, it's not wired up at all, but the functionality should be
reasonably complete.

Minor tweaks to the backend trait to accommodate this, mostly changing
Fence and Semaphore to references so they don't need to be Copy.

Part of the work toward #95
2021-05-25 15:12:37 -07:00
Raph Levien 47d2e0a756 Add create_buffer_init method
Add a method to create a buffer with initial content, which requires
staging buffers under the hood.

This patch also changes the lower-level (Vulkan) interface to be closer
to the raw Vulkan call.
2021-05-24 13:18:11 -07:00
Raph Levien e9a8b4643b Migrate to BufferUsage
Adopt the BufferUsage concept from WebGPU, and replace MemFlags, which
is inadequate.
2021-05-21 19:43:55 -07:00
Raph Levien a5991ecf97 Expand runtime query of GPU capabilities
Test whether the GPU supports subgroups (including size control) and
memory model.

This patch does all the ceremony needed for runtime query, including
testing the Vulkan version and only probing the extensions when
available. Thus, it should work fine on older devices (not yet tested).

The reporting of capabilities follows Vulkan concepts, but is not
particularly Vulkan-specific.
2021-05-08 11:41:47 -07:00
Raph Levien 951f3aa508 Start text rendering
This commit puts in basic integration with ttf-parser and starts
populating the various piet text objects. The font is currently
hard-coded.
2021-05-04 08:21:22 -07:00
Raph Levien 6602d58054 Merge branch 'master' into android2 2021-04-20 07:15:10 -07:00
Elias Naur 4b59525e1f use mediump precision for kernel4 colors and areas
Improves kernel4 performance for a Gio scene from ~22ms to ~15ms.

Updates #83

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-20 10:15:42 +02:00
Elias Naur d9d518b248 avoid non-uniform barrier control flow when exhausting memory
The compute shaders have a check for the succesful completion of their
preceding stage. However, consider a shader execution path like the
following:

	void main()
		if (mem_error != NO_ERROR) {
		    return;
		}
		...
		malloc(...);
		...
		barrier();
		...
	}

and  shader execution that fails to allocate memory, thereby setting
mem_error to ERR_MALLOC_FAILED in malloc before reaching the barrier. If
another shader execution then begins execution, its mem_eror check will
make it return early and not reach the barrier.

All GPU APIs require (dynamically) uniform control flow for barriers,
and the above case may lead to GPU hangs in practice.

Fix this issue by replacing the early exits with careful checks that
don't interrupt barrier control flow.

Unfortunately, it's harder to prove the soundness of the new checks, so
this change also clears dynamic memory ranges in MEM_DEBUG mode when
memory is exhausted. The result is that accessing memory after
exhaustion triggers an error.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-20 10:15:29 +02:00
Elias Naur 3b4a72deb9 elements.comp: remove redundant assignment
The assignment was made redundant by eb86456f31.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-20 10:14:04 +02:00
Raph Levien e1aced9c5a Merge branch 'master' into android2 2021-04-12 16:00:50 -07:00
Raph Levien 1c842f8471 Merge branch 'master' into ext_query 2021-04-11 15:33:49 -07:00
Elias Naur 45ea43c157 kernel4: replace continue in switch to support D3D11 shader model 5.0
Without this change, the fxc.exe compiler complains

error X3708: continue cannot be used in a switch

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-11 21:49:57 +02:00
Raph Levien 01e4024599 Merge branch 'master' into ext_query 2021-04-11 09:08:46 -07:00
Tatsuyuki Ishi 0637e2d6e5 Encode premultiplied alpha in render_ctx.rs 2021-04-11 13:20:40 +09:00
Elias Naur f4be74c07f winit: fix n_trans count
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-10 18:12:21 +02:00
Raph Levien 115cb855d9 Query extensions at runtime
Don't run extensions unless they're available. This includes querying
for descriptor indexing, and running one of two versions of kernel4
depending on whether it's enabled.

Part of the support needed for #78
2021-04-08 15:11:15 -07:00
Elias Naur eb86456f31 elements.comp: don't modify BeginClip bounding box
The BeginClip and EndClip bounding boxes are absolute and must pairwise
match. I mistakenly modified the BeginClip bounding box for stroked
clips.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-04-08 19:56:37 +02:00
Raph Levien e6b2cc7b2b Android test application
Adds an example binary that can be run with `cargo apk`.

One thing that will still need manual tuning (for now) is the size of
the canvas. A good followup is to sense that from the window size.
2021-04-05 16:23:11 -07:00
Raph Levien d1b9821fa8 Query extensions at runtime
Don't run extensions unless they're available. This includes querying
for descriptor indexing, and running one of two versions of kernel4
depending on whether it's enabled.

Part of the support needed for #78
2021-04-02 19:58:48 -07:00
Elias Naur 5db427c549 kernel4: compute and output alpha
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 19:51:49 +02:00
Elias Naur ee4429a26f kernel4: separate area from alpha in clip stack
This change prepares for kernel4 to output alpha. No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 19:51:42 +02:00
Elias Naur 22507dea0e pre-allocate kernel4 scratch space in coarse.comp
coarse.comp knows the maximum stack depth, and can pre-allocate scratch
space for kernel4.comp. Kernel4 no longer contains allocations nor
control barriers.

The invocation local blend stack is gone as well; it didn't seem to make
any difference in performance to always use global memory for pushing
and popping.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 18:48:19 +02:00
Elias Naur e6b535d942 coarse.comp: extract area commands into function
No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-30 19:56:09 +02:00
Elias Naur d916a9e2c4 backdrop.comp: support stroked Annotated_Image and Annotated_BeginClip
Commit 8db77e180e added support for
strokes to FillImage and BeginClip, but missed backdrop.comp.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-30 19:33:25 +02:00
Elias Naur 678bfedfca kernel4: assume colors in alpha-premultiplied sRGB format
See http://ssp.impulsetrain.com/gamma-premult.html for a description of
the format.

Pre-multiplied alpha only matters for translucent objects; draw a few
such shapes in the test render.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-29 21:17:01 +02:00
Elias Naur eb37db1b05 replace per-element fill mode flags with a SetFillMode element
Fixes #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-29 21:10:25 +02:00
Elias Naur bb61f875dc kernel4: remove dead code left over from previous clipping approach
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-29 21:10:17 +02:00
Tatsuyuki Ishi 4864a7fe0f Create chunks over the x axis in addition to y axis
This allows more coalescing with image loads/stores, since all of our images are stored with a tiled layout.
2021-03-23 20:54:49 +09:00
Elias Naur f0127812eb tightly pack fine rasterizer commands
Reclaims the space waste from splitting fill mode commands from fill
commands.

For example, a CmdStroke + CmdColor use an extra tag word compared to
the former combined CmdStroke. This change shaves off that one word.

In the future, we can pack several command tags into one tag word,
saving even more space.

Fixes #66

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
Elias Naur 8db77e180e support stroked fills for clips, images
This change completes general support for stroked fills for clips and
images.

Annotated_size increases from 28 to 32, because of the linewidth field
added to AnnoImage. Stroked image fills are presumably rare, and if
memory pressure turns out to be a bottleneck, we could replace the
linewidth field with a separate AnnoLinewidth elements.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
Elias Naur db59b5d570 coarse,kernel4: make stroke, (non-zero) fill, solid separate commands
Before this change, every command (FillColor, FillImage, BeginClip)
had (or would need) stroke, (non-zero) fill and solid variants.

This change adds a command for each fill mode and their parameters,
reducing code duplication and adds support for stroked FillImage and
BeginClip as a side-effect.

The rest of the pipeline doesn't yet support Stroked FillImage and
BeginClip. That's a follow-up change.

Since each command includes a tag, this change adds an extra word for
each fill and stroke. That waste is also addressed in a follow-up.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
Elias Naur 22eb418832 fix Vulkan errors on Wayland and Intel GPU
capabilities.min_image_count is 4 on my system, which is larger than
the hard-coded 2.

Use a default swapchain size if we're not getting any size information
from the surface capabilities.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
Elias Naur 44bff2726c collapse FillCubic and StrokeCubic into Cubic with flags for fill mode
Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur df055563bd collapse annotated Fill and Stroke to Color with fill mode flag
No functionality changes, just different encoding.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur e9ff509ab9 use tag flags for fill vs stroke modes in scene elements
Encode stroke vs fill as tag flags, thereby reducing the number of scene
elements. Encoding change only, no functional changes.

The previous Stroke and Fill commands are merged to one command,
FillColor. The encoding to annotated element is divergent, which is
fixed when annotated elements move to tag flags.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur a5b6bda941 add support for element flags to shaders
Commit 9afa9b86b6 added Rust support for
encoding flags into elements. This change adds support to shaders by
introducing variant tag structs:

struct VariantTag {
    uint tag;
    uint flags;
}

and returning them from Variant_tag functions.

It also adds a flags argument to write functions for enum variants that
include TagFlags.

No functionality changes.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur 903ab1fb59 implement FillImage command and sRGB support
FillImage is like Fill, except that it takes its color from one or
more image atlases.

kernel4 uses a single image for non-Vulkan hosts, and the dynamic sized array
of image descriptors on Vulkan.

A previous version of this commit used textures. I think images are a better
choice for piet-gpu, for several reasons:

- Texture sampling, in particular textureGrad, is slow on lower spec devices
  such as Google Pixel. Texture sampling is particularly slow and difficult to
implement for CPU fallbacks.
- Texture sampling need more parameters, in particular the full u,v
  transformation matrix, leading to a large increase in the command size. Since
all commands use the same size, that memory penalty is paid by all scenes, not
just scenes with textures.
- It is unlikely that piet-gpu will support every kind of fill for every
  client, because each kind must be added to kernel4.

With FillImage, a client will prepare the image(s) in separate shader stages,
sampling and applying transformations and special effects as needed. Textures
that align with the output pixel grid can be used directly, without
pre-processing.

Note that the pre-processing step can run concurrently with the piet-gpu pipeline;
Only the last stage, kernel4, needs the images.

Pre-processing most likely uses fixed function vertex/fragment programs,
which on some GPUs may run in parallel with piet-gpu's compute programs.

While here, fix a few validation errors:
- Explicitly enable EXT_descriptor_indexing, KHR_maintenance3,
  KHR_get_physical_device_properties2.
- Specify a vkDescriptorSetVariableDescriptorCountAllocateInfo for
  vkAllocateDescriptorSets. Otherwise, variable image2D arrays won't work (but
sampler2D arrays do, at least on my setup).

Updates #38

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur 07e07c7544 ensure consistent path segment transformation
As described in #62, the non-deterministic scene monoid may result in
slightly different transformations for path segments in an otherwise
closed path.

This change ensures consistent transformation across paths in three steps.

First, absolute transformations computed by the scene monoid is stored
along with path segments and annotated elements.

Second, elements.comp no longer transforms path segments. Instead, each
segment is stored untransformed along with a reference to its absolute
transformation.

Finally, path_coarse performs the transformation of path segments.
Because all segments in a path share a single transformation reference,
the inconsistency in #62 is avoided.

Fixes #62

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
Elias Naur ad444f615c elements.comp: use shared array of structs directly
The NVIDIA shader compiler bug that forced splitting of the state struct
into primitive types is now fixed.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
Elias Naur fd746ea7a6 name and comment magic constant
Follow-up to review of PR #61.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
Elias Naur 79d722df48 remove unused commands from pathseg
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00