Make max workgroup size 256 and respect LG_WG_FACTOR.
Because the monoid scans only support a height of 2, this will reduce
the maximum scene complexity we can render. But it also increases
compatibility. Supporting larger scans is a TODO.
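A rough capacity estimate shows why a scan height of 2 limits scene complexity. This is a sketch under stated assumptions, not piet-gpu's actual formula. Each level is assumed to be processed by workgroups of `wg_size` threads, each thread handling `n_seq` elements (a hypothetical parameter). Each additional level then multiplies the capacity by the block size.

```rust
/// Maximum number of elements a scan of the given height can handle,
/// assuming every level uses workgroups of `wg_size` threads, each
/// thread handling `n_seq` elements (hypothetical parameters).
fn max_scan_elements(wg_size: u64, n_seq: u64, height: u32) -> u64 {
    // One workgroup reduces a block of `wg_size * n_seq` elements to a
    // single partial result. The next level scans those partials, so
    // each level multiplies the capacity by the block size.
    let block = wg_size * n_seq;
    block.pow(height)
}
```

Under these assumptions, a height-2 scan with 256 threads and one element per thread covers 65,536 elements. Halving the workgroup size quarters the capacity.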
Fix incorrect workgroup sizes, and change strategy for assigning binding
numbers; ultimately we should get correct values for those from shader
compilation, but this works for now.
This is one of the stages in the new element pipeline. It's a simple
one, just a prefix sum of a couple of counts, and some of it will probably
get merged with a downstream stage, but we'll do it separately for now
for convenience.
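The prefix sum of counts can be sketched sequentially. The two counts (`paths`, `segments`) are hypothetical stand-ins. Componentwise addition is associative and has an identity, so an exclusive scan gives each element its starting offset in every output stream. That same property lets the GPU compute the result in parallel.

```rust
/// A pair of counts combined componentwise; the field names are
/// hypothetical stand-ins for whatever counts the stage tracks.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Counts {
    paths: u32,
    segments: u32,
}

impl Counts {
    /// Monoid combine: componentwise addition. It is associative, and
    /// `Counts::default()` (all zeros) is the identity.
    fn combine(self, other: Counts) -> Counts {
        Counts {
            paths: self.paths + other.paths,
            segments: self.segments + other.segments,
        }
    }
}

/// Exclusive prefix sum: element i receives the combined counts of
/// elements 0..i, i.e. its starting offset in each output stream.
fn exclusive_scan(input: &[Counts]) -> Vec<Counts> {
    let mut acc = Counts::default();
    let mut out = Vec::with_capacity(input.len());
    for &c in input {
        out.push(acc);
        acc = acc.combine(c);
    }
    out
}
```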
This patch also contains an update to Vulkan tools 1.2.198, which
accounts for the large diff of translated shaders.
This patch contains the core of the path stream processing, though some
integration bits are missing. The core logic is tested, though
combinations of path types, transforms, and line widths are not (yet).
Progress towards #119
There's a bit of reorganizing as well. Shader stages are made available
from piet-gpu to the test rig, config is now a proper structure
(marshaled with bytemuck).
This commit just has the transform stage, which is a simple monoid scan
of affine transforms.
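A transform scan works because composing affine transforms is associative and has an identity, which is exactly what a monoid scan needs. This minimal sequential sketch assumes the coefficient order `[a, b, c, d, e, f]`, mapping `(x, y)` to `(a*x + c*y + e, b*x + d*y + f)`. The GPU version computes the same prefix products in parallel.

```rust
/// A 2D affine transform stored as [a, b, c, d, e, f], mapping
/// (x, y) to (a*x + c*y + e, b*x + d*y + f) (assumed layout).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Affine([f32; 6]);

impl Affine {
    /// Monoid identity.
    const IDENTITY: Affine = Affine([1.0, 0.0, 0.0, 1.0, 0.0, 0.0]);

    /// Monoid combine: `self.mul(other)` applies `other` first, then `self`.
    fn mul(self, other: Affine) -> Affine {
        let [a1, b1, c1, d1, e1, f1] = self.0;
        let [a2, b2, c2, d2, e2, f2] = other.0;
        Affine([
            a1 * a2 + c1 * b2,
            b1 * a2 + d1 * b2,
            a1 * c2 + c1 * d2,
            b1 * c2 + d1 * d2,
            a1 * e2 + c1 * f2 + e1,
            b1 * e2 + d1 * f2 + f1,
        ])
    }

    fn apply(self, x: f32, y: f32) -> (f32, f32) {
        let [a, b, c, d, e, f] = self.0;
        (a * x + c * y + e, b * x + d * y + f)
    }
}

/// Inclusive scan: element i is the product of transforms 0..=i,
/// i.e. the absolute transform in effect at that point of the scene.
fn inclusive_scan(transforms: &[Affine]) -> Vec<Affine> {
    let mut acc = Affine::IDENTITY;
    transforms
        .iter()
        .map(|&t| {
            acc = acc.mul(t);
            acc
        })
        .collect()
}
```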
Progress toward #119
This gets it working on mac. Also delete old implementation.
There's also an update to winit 0.25 in here, because it was easier to
roll forward than to fix an inconsistent Cargo.lock. At some point, we should
systematically update all deps.
This was motivated by experiments with the Vulkan memory model. To use
that, we actually need to explicitly enable the relevant feature at
device creation time. That's a lot easier to do now that push_next works
on the structs in that chain. This PR doesn't do that though, it only
upgrades the dependency and cleans up deprecations.
This patch gets rid of warnings and runs cargo fmt.
A lot of the warnings were unused items (especially in DX12 land). At
some point we might want to bring some of that back, at which point it
might be useful to refer to what was deleted in this commit.
Pipeline the CPU and GPU work so that two frames can be in flight at
once.
This dramatically improves performance, especially on Android. Note
that I've also changed the default configuration to be 3 frames in
flight and FIFO mode.
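The scheduling idea behind several frames in flight can be sketched as a ring of per-frame resource slots, indexed by frame number modulo the ring size. The CPU only has to block when it wants to reuse a slot whose previous frame may still be running on the GPU. `FrameRing` and `FrameSlot` are hypothetical names. This sketch only reports which earlier frame must complete before a slot is reused; a real renderer would wait on that frame's fence at that point.

```rust
/// Hypothetical per-frame resources (command buffer, fence, ...),
/// reduced here to a record of which frame last used the slot.
struct FrameSlot {
    last_frame: Option<u64>,
}

/// Round-robin scheduler for a fixed number of frames in flight.
struct FrameRing {
    slots: Vec<FrameSlot>,
    frame: u64,
}

impl FrameRing {
    fn new(frames_in_flight: usize) -> Self {
        FrameRing {
            slots: (0..frames_in_flight)
                .map(|_| FrameSlot { last_frame: None })
                .collect(),
            frame: 0,
        }
    }

    /// Begins the next frame. Returns the slot index and, if the slot was
    /// used before, the frame whose GPU work must finish (i.e. whose fence
    /// must be waited on) before the slot's resources can be reused.
    fn begin_frame(&mut self) -> (usize, Option<u64>) {
        let idx = (self.frame % self.slots.len() as u64) as usize;
        let must_wait_for = self.slots[idx].last_frame.replace(self.frame);
        self.frame += 1;
        (idx, must_wait_for)
    }
}
```

With three slots, the first three frames start without waiting, and frame 3 only has to wait for frame 0.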
Make the scene dependent on timing.
This commit patches the HAL to reuse command buffers; this works well on
Vulkan and prevents a leak, but breaks the other back-ends. That will
require a solution, possibly including plumbing up the resource lifetime
responsibilities to the client.
Other things might be hacky as well.
Separate out render context upload from renderer creation. Upload ramps
to GPU buffer. Encode gradients to scene description. Fix a number of
bugs in uploading and processing.
This renders gradients in a test image, but has some shortcomings. For
one, staging buffers need to be used for a couple of things (they're
just host mapped for now). Also, the interaction between sRGB and
premultiplied alpha isn't quite right. The size of the gradient ramp
buffer is fixed and should be dynamic.
And of course there's always more optimization to be done, including
making the upload of gradient ramps more incremental, and probably
hashing of the stops instead of the processed ramps.
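Hashing the stops, as suggested above, can be sketched as a cache keyed on the raw stop data. A ramp is then built and uploaded only the first time a given set of stops is seen. `Stop` and `RampCache` are hypothetical names. Because `f32` implements neither `Hash` nor `Eq`, the key uses each float's bit pattern.

```rust
use std::collections::HashMap;

/// A gradient stop: offset in [0, 1] and an RGBA color (hypothetical layout).
#[derive(Clone, Copy)]
struct Stop {
    offset: f32,
    rgba: [f32; 4],
}

/// Hashable key built from the raw bit patterns of the stops,
/// since f32 implements neither Hash nor Eq.
fn stops_key(stops: &[Stop]) -> Vec<u32> {
    stops
        .iter()
        .flat_map(|s| {
            std::iter::once(s.offset.to_bits()).chain(s.rgba.iter().map(|c| c.to_bits()))
        })
        .collect()
}

/// Maps each distinct set of stops to a ramp index in the ramp buffer.
struct RampCache {
    slots: HashMap<Vec<u32>, usize>,
}

impl RampCache {
    fn new() -> Self {
        RampCache { slots: HashMap::new() }
    }

    /// Returns the ramp index for these stops, allocating one only on a miss.
    fn get_or_insert(&mut self, stops: &[Stop]) -> usize {
        let key = stops_key(stops);
        if let Some(&idx) = self.slots.get(&key) {
            return idx;
        }
        let idx = self.slots.len();
        // A real implementation would rasterize and upload the ramp here.
        self.slots.insert(key, idx);
        idx
    }

    /// Number of distinct ramps built so far.
    fn len(&self) -> usize {
        self.slots.len()
    }
}
```

Keying on bit patterns treats `0.0` and `-0.0` as different stops. That only costs an occasional duplicate ramp, never a wrong one.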
WIP. Most of the GPU-side work should be done (though it's not tested
end-to-end and it's certainly possible I missed something), but it still
needs work on the encoding side.
Move types into the toplevel and hide implementation details. Remove
deref of hub CmdBuf to mux. Restrict public visibility of internals.
Most items have some docs, though improvements are still possible. In
particular, there should be detailed safety info.
Add workgroup size to dispatch call (needed by metal). Change all fence
references to mutable for consistency.
Move backend traits to a separate file (move them out of the toplevel
namespace in preparation for the hub types going there, to make the
public API nicer).
Add a method and macro for automatically choosing shader code, and
change collatz example to generate all 3 kinds on build.
Make the hub abstraction connect to the mux, rather than directly to the
Vulkan back-end.
As of this commit, both command line and winit examples work (on
Vulkan). In theory it should be possible to get them working on Dx12 as
well by translating the shader code, but there's a lot that can go
wrong.
This commit also contains a bunch of changes to mux to make conditional
compilation of match arms work, and new methods to support swapchain.
Add a method to create a buffer with initial content, which requires
staging buffers under the hood.
This patch also changes the lower-level (Vulkan) interface to be closer
to the raw Vulkan call.
Test whether the GPU supports subgroups (including size control) and
memory model.
This patch does all the ceremony needed for runtime query, including
testing the Vulkan version and only probing the extensions when
available. Thus, it should work fine on older devices (not yet tested).
The reporting of capabilities follows Vulkan concepts, but is not
particularly Vulkan-specific.
Don't run extensions unless they're available. This includes querying
for descriptor indexing, and running one of two versions of kernel4
depending on whether it's enabled.
Part of the support needed for #78
See http://ssp.impulsetrain.com/gamma-premult.html for a description of
the format.
Pre-multiplied alpha only matters for translucent objects; draw a few
such shapes in the test render.
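Premultiplied alpha stores color channels already scaled by alpha, which turns source-over compositing into a single multiply-add per channel. This sketch shows that blend with hypothetical names (`Premul`, `premultiply`, `over`). It deliberately ignores whether blending happens in linear or sRGB space, which is the subject of the linked article. It also shows why only translucent shapes exercise the change: with alpha = 1 the premultiplied and straight forms coincide, and the source simply replaces the destination.

```rust
/// RGBA color with premultiplied alpha: rgb channels already scaled by a.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Premul([f32; 4]);

/// Converts a straight-alpha RGBA color to premultiplied form.
fn premultiply(rgba: [f32; 4]) -> Premul {
    let a = rgba[3];
    Premul([rgba[0] * a, rgba[1] * a, rgba[2] * a, a])
}

/// Porter-Duff source-over for premultiplied colors: every channel,
/// including alpha, is src + dst * (1 - src_alpha).
fn over(src: Premul, dst: Premul) -> Premul {
    let k = 1.0 - src.0[3];
    let mut out = [0.0f32; 4];
    for (i, o) in out.iter_mut().enumerate() {
        *o = src.0[i] + dst.0[i] * k;
    }
    Premul(out)
}
```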
Signed-off-by: Elias Naur <mail@eliasnaur.com>
This change completes general support for stroked fills for clips and
images.
Annotated_size increases from 28 to 32, because of the linewidth field
added to AnnoImage. Stroked image fills are presumably rare, and if
memory pressure turns out to be a bottleneck, we could replace the
linewidth field with a separate AnnoLinewidth element.
Updates #70
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Encode stroke vs fill as tag flags, thereby reducing the number of scene
elements. Encoding change only, no functional changes.
The previous Stroke and Fill commands are merged to one command,
FillColor. The encoding into annotated elements still diverges; this will be
fixed when annotated elements move to tag flags.
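The tag-flag idea can be sketched as a single tag word carrying both the element kind and a stroke bit. One `FillColor` element can then express what previously needed separate `Stroke` and `Fill` elements. The constants and bit positions are hypothetical, not piet-gpu's actual encoding.

```rust
/// Hypothetical element kinds; the numeric values are illustrative only.
const KIND_FILL_COLOR: u32 = 1;
const KIND_FILL_IMAGE: u32 = 2;

/// Hypothetical layout: low 16 bits hold the kind, bit 16 marks a stroke.
const KIND_MASK: u32 = 0xffff;
const FLAG_STROKE: u32 = 1 << 16;

/// Packs an element kind and the stroke/fill choice into one tag word.
fn encode_tag(kind: u32, stroke: bool) -> u32 {
    let flags = if stroke { FLAG_STROKE } else { 0 };
    kind | flags
}

/// Splits a tag word back into (kind, is_stroke).
fn decode_tag(tag: u32) -> (u32, bool) {
    (tag & KIND_MASK, (tag & FLAG_STROKE) != 0)
}
```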
Updates #70
Signed-off-by: Elias Naur <mail@eliasnaur.com>
FillImage is like Fill, except that it takes its color from one or
more image atlases.
kernel4 uses a single image for non-Vulkan hosts, and the dynamically sized array
of image descriptors on Vulkan.
A previous version of this commit used textures. I think images are a better
choice for piet-gpu, for several reasons:
- Texture sampling, in particular textureGrad, is slow on lower-spec devices
such as Google Pixel. Texture sampling is particularly slow and difficult to
implement for CPU fallbacks.
- Texture sampling needs more parameters, in particular the full u,v
transformation matrix, leading to a large increase in the command size. Since
all commands use the same size, that memory penalty is paid by all scenes, not
just scenes with textures.
- It is unlikely that piet-gpu will support every kind of fill for every
client, because each kind must be added to kernel4.
With FillImage, a client will prepare the image(s) in separate shader stages,
sampling and applying transformations and special effects as needed. Textures
that align with the output pixel grid can be used directly, without
pre-processing.
Note that the pre-processing step can run concurrently with the piet-gpu pipeline;
only the last stage, kernel4, needs the images.
Pre-processing most likely uses fixed function vertex/fragment programs,
which on some GPUs may run in parallel with piet-gpu's compute programs.
While here, fix a few validation errors:
- Explicitly enable EXT_descriptor_indexing, KHR_maintenance3,
KHR_get_physical_device_properties2.
- Specify a VkDescriptorSetVariableDescriptorCountAllocateInfo for
vkAllocateDescriptorSets. Otherwise, variable image2D arrays won't work (but
sampler2D arrays do, at least on my setup).
Updates #38
Signed-off-by: Elias Naur <mail@eliasnaur.com>
As described in #62, the non-deterministic scene monoid may result in
slightly different transformations for path segments in an otherwise
closed path.
This change ensures that all segments of a path are transformed consistently, in three steps.
First, absolute transformations computed by the scene monoid are stored
along with path segments and annotated elements.
Second, elements.comp no longer transforms path segments. Instead, each
segment is stored untransformed along with a reference to its absolute
transformation.
Finally, path_coarse performs the transformation of path segments.
Because all segments in a path share a single transformation reference,
the inconsistency in #62 is avoided.
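Why a shared reference fixes the seams: if every segment of a path reads the same transform, a point shared by two adjacent segments goes through bit-identical arithmetic and lands at exactly the same place, so the path still closes. This sketch assumes a hypothetical `PathSeg` layout with a `trans_ix` index into a table of transforms. It mimics only the step that path_coarse performs in the described design.

```rust
type Point = (f32, f32);

/// 2D affine transform [a, b, c, d, e, f]: (x, y) -> (a*x + c*y + e, b*x + d*y + f).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Affine([f32; 6]);

impl Affine {
    fn apply(&self, p: Point) -> Point {
        let [a, b, c, d, e, f] = self.0;
        (a * p.0 + c * p.1 + e, b * p.0 + d * p.1 + f)
    }
}

/// An untransformed line segment plus an index into the transform table
/// (hypothetical layout; the real structures carry more fields).
struct PathSeg {
    p0: Point,
    p1: Point,
    trans_ix: usize,
}

/// Transforms segments at the point of use. Segments of one path share
/// `trans_ix`, so they all see identical transform coefficients.
fn transform_segments(segs: &[PathSeg], transforms: &[Affine]) -> Vec<(Point, Point)> {
    segs.iter()
        .map(|s| {
            let t = transforms[s.trans_ix];
            (t.apply(s.p0), t.apply(s.p1))
        })
        .collect()
}
```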
Fixes #62
Signed-off-by: Elias Naur <mail@eliasnaur.com>