The very last statement of the `clip_leaf` shader is the assignment to
the `clip_bboxes` buffer. The buffer write is indexed on the global
invocation ID. It is possible for this index to be larger than the total
number of clips in at least one workgroup since the clip count isn't
strictly a multiple of workgroup size.
Currently the size of the clip_bboxes buffer matches the number of
clips. This means the buffer index is likely to run past the buffer.
This is not an issue when running on wgpu as it internally enables
bounds checking when compiling WGSL (so all buffer accesses are
implicitly conditional). When compiling the shaders to native backends
the vello_shaders crate currently does not enable implicit bounds
checking, so a buffer overrun is possible.
There are a few potential solutions:
1. Have an explicit bounds check in the shader. This is straightforward
and consistent with the existing code that reads from clip_inp. The
downside is that with bounds checking enabled, this extra check is
redundant in the generated code. This is the solution included in
this PR.
2. Make sure that the clip_bboxes buffer has a size that is a multiple
of clip_leaf's workgroup size. This was the approach taken by
piet-gpu on its native HALs. This effectively wastes up to 4080 bytes
(255 * 16) to store unused bbox values.
3. Enable Naga's implicit bounds checks when compiling to native. This
would make the behavior consistent with the wgpu backend, however it
comes at the cost of increased renderer complexity as the native
implementation must supply the sizes of each buffer in an implicitly
generated buffer binding to every shader stage.
The examples logged the total processing time for an SVG including both
parse and GPU buffer encoding times. Times for these two operations are
now logged separately.
Removed the workgroup count and buffer size calculations from
src/render.rs. This code now uses the types returned by the
vello_encoding crate for this purpose.
This method was intended for the coverage mask variant of the pipelines
that was present in piet-gpu. This code has regressed since the wgpu
rewrite and the mask rendering variant of the pipelines will be
redesigned. Remove this for now instead of having to maintain it until
the rewrite.
- Use the buffer sizes in src/render.rs as the current 128K is not
sufficient for even the test scenes
- Add BumpAllocators type and bump buffer size
- Support the `base_color` render option
- Use immutable type construction where possible
- Fix the path tag stream length calculation to use the offsets stored
in Layout. This both matches the current behavior in src/render.rs and
makes it so that CpuConfig's construction no longer needs the Encoding
type as an input
- Renamed CpuConfig & GpuConfig types to 'RenderConfig' and
'ConfigUniform'
This change moves the vello encoding logic to a new crate under
crates/encoding. Combined with the `vello_shaders` crate, this enables
lightweight integration of the Vello pipelines into renderers that don't
depend on wgpu (or perhaps written in languages other than Rust).
The Scene/Fragment API currently remain the vello crate.
Naga traslates workgroup variable declarations to threadgroup
address-space entry-point parameters when generating MSL. Metal API
validation requires that the memory sizes for these parameters be set
explicitly by calling setThreadgroupMemoryLength:index on the
MTLComputeCommandEncoder.
The crate now calculates the required memory size for global workgroup
variables that are accessed by the entry point and provides them
alongside the binding list. This is abstracted separately from the
binding list.
While the current usage that we're aware of is limited to Metal, this
information is being provided as part of the generic ComputeShader type
instead of a MSL-specific type, as the information itself is computed
from the parsed WGSL IR and not specific to Metal.
Using vector graphics for the image doesn't make a huge amount of sense. The flower photo is by Raph and happily licensed to anyone who wants to use it.
This allows the graph to display at a reasonable scale in the face of
fluctuations and a max recorded sample that is much larger than the
current average.
The Auto* modes should have wider compatibility as they implement
fallback behavior based on what the platform supports. This also means
that on web "vsync off" will likely be incorrect.