Separate out render context upload from renderer creation. Upload ramps
to GPU buffer. Encode gradients to scene description. Fix a number of
bugs in uploading and processing.
This renders gradients in a test image, but has some shortcomings. For
one, staging buffers need to be applied for a couple things (they're
just host mapped for now). Also, the interaction between sRGB and
premultiplied alpha isn't quite right. The size of the gradient ramp
buffer is fixed and should be dynamic.
And of course there's always more optimization to be done, including
making the upload of gradient ramps more incremental, and probably
hashing of the stops instead of the processed ramps.
Don't recompute the parameters from quadratic subdivision, but rather
retain them across the two phases (summing the subdivision estimate, and
generating the subdivisions). The motivation for this is that the values
were subtly different (differing by 1 or 2 least signficant bits) across
the two phases. It *might* also be faster depending on ALU/memory
relative performance.
Fixes#107
WIP. Most of the GPU-side work should be done (though it's not tested
end-to-end and it's certainly possible I missed something), but still
needs work on encoding side.
Move types into the toplevel and hide implementation details. Remove
deref of hub CmdBuf to mux. Restrict public visibility of internals.
Most items have some docs, though improvements are still possible. In
particular, there should be detailed safety info.
Add workgroup size to dispatch call (needed by metal). Change all fence
references to mutable for consistency.
Move backend traits to a separate file (move them out of the toplevel
namespace in preparation for the hub types going there, to make the
public API nicer).
Add a method and macro for automatically choosing shader code, and
change collatz example to generate all 3 kinds on build.
Make the hub abstraction connect to the mux, rather than directly to the
Vulkan back-end.
As of this commit, both command line and winit examples work (on
Vulkan). In theory it should be possible to get them working on Dx12 as
well by translating the shader code, but there's a lot that can go
wrong.
This commit also contains a bunch of changes to mux to make conditional
compilation of match arms work, and new methods to support swapchain.
Adds a new "mux" module which can have multiple backends. As of this
commit, it's not wired up at all, but the functionality should be
reasonably complete.
Minor tweaks to the backend trait to accommodate this, mostly changing
Fence and Semaphore to references so they don't need to be Copy.
Part of the work toward #95
Add a method to create a buffer with initial content, which requires
staging buffers under the hood.
This patch also changes the lower-level (Vulkan) interface to be closer
to the raw Vulkan call.
Test whether the GPU supports subgroups (including size control) and
memory model.
This patch does all the ceremony needed for runtime query, including
testing the Vulkan version and only probing the extensions when
available. Thus, it should work fine on older devices (not yet tested).
The reporting of capabilities follows Vulkan concepts, but is not
particularly Vulkan-specific.
The compute shaders have a check for the succesful completion of their
preceding stage. However, consider a shader execution path like the
following:
void main()
if (mem_error != NO_ERROR) {
return;
}
...
malloc(...);
...
barrier();
...
}
and shader execution that fails to allocate memory, thereby setting
mem_error to ERR_MALLOC_FAILED in malloc before reaching the barrier. If
another shader execution then begins execution, its mem_eror check will
make it return early and not reach the barrier.
All GPU APIs require (dynamically) uniform control flow for barriers,
and the above case may lead to GPU hangs in practice.
Fix this issue by replacing the early exits with careful checks that
don't interrupt barrier control flow.
Unfortunately, it's harder to prove the soundness of the new checks, so
this change also clears dynamic memory ranges in MEM_DEBUG mode when
memory is exhausted. The result is that accessing memory after
exhaustion triggers an error.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Don't run extensions unless they're available. This includes querying
for descriptor indexing, and running one of two versions of kernel4
depending on whether it's enabled.
Part of the support needed for #78
The BeginClip and EndClip bounding boxes are absolute and must pairwise
match. I mistakenly modified the BeginClip bounding box for stroked
clips.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Adds an example binary that can be run with `cargo apk`.
One thing that will still need manual tuning (for now) is the size of
the canvas. A good followup is to sense that from the window size.
Don't run extensions unless they're available. This includes querying
for descriptor indexing, and running one of two versions of kernel4
depending on whether it's enabled.
Part of the support needed for #78
coarse.comp knows the maximum stack depth, and can pre-allocate scratch
space for kernel4.comp. Kernel4 no longer contains allocations nor
control barriers.
The invocation local blend stack is gone as well; it didn't seem to make
any difference in performance to always use global memory for pushing
and popping.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
See http://ssp.impulsetrain.com/gamma-premult.html for a description of
the format.
Pre-multiplied alpha only matters for translucent objects; draw a few
such shapes in the test render.
Signed-off-by: Elias Naur <mail@eliasnaur.com>
Reclaims the space waste from splitting fill mode commands from fill
commands.
For example, a CmdStroke + CmdColor use an extra tag word compared to
the former combined CmdStroke. This change shaves off that one word.
In the future, we can pack several command tags into one tag word,
saving even more space.
Fixes#66
Signed-off-by: Elias Naur <mail@eliasnaur.com>
This change completes general support for stroked fills for clips and
images.
Annotated_size increases from 28 to 32, because of the linewidth field
added to AnnoImage. Stroked image fills are presumably rare, and if
memory pressure turns out to be a bottleneck, we could replace the
linewidth field with a separate AnnoLinewidth elements.
Updates #70
Signed-off-by: Elias Naur <mail@eliasnaur.com>