* remove CmdBuff::dispatch() which was moved to ComputePass
* remove CmdBuff::write_timestamp() which is replaced by timestamp index pair in ComputePassDescriptor
This commit adds timestamps to compute pass boundaries for command style timer queries on metal.
It also updates the code in piet-gpu/stages, piet-gpu/lib.rs and tests/ to use the new ComputePass type.
We need to be able to call memory_barrier() on ComputePass, to avoid the borrow checker complaining if we tried to call it on the underlying command buffer.
Use compute pass for tests in tests subdir. This is also shaking out some issues that weren't apparent from just collatz.
In particular, we need more autorelease pools to prevent things from leaking. As of this commit, the "clear" test runs correctly but the others haven't yet been converted to the compute_pass format.
This commit implements actual rendering for the new piet-scene crate.
In particular, it adds a new EncodedSceneRef type in piet_gpu::encoder to store references to the raw stream data. This is used as a proxy for both PietGpuRenderContext and piet_scene::scene::Scene to feed data into the renderer.
Much of this is still hacky for expedience and because the intention is to be a transitional state that maintains both interfaces until we can move to the new scene API and add a wrapper for the traditional piet API.
The general structure and relationships between these crates need further discussion.
Current status: the piet-gpu-hal module (including the collatz example)
have the new API (with queries set on compute pass) implemented. The
other uses have not yet been updated.
On Metal, only M1 is tested. The "command" counter style is partly
implemented, but not fully wired up.
The current plan is to more or less follow the wgpu/wgpu-hal approach. In the mux/backend layer (which corresponds fairly strongly to wgpu-hal), there isn't explicit construction of a compute encoder, but there are new methods for beginning and ending a compute pass. At the hub layer (which corresponds to wgpu) there will be a ComputeEncoder object.
That said, there will be some differences. The WebGPU "end" method on a compute encoder is implemented in wgpu as Drop, and that is not ideal. Also, the wgpu-hal approach to timer queries (still based on write_timestamp) is not up to the task of Metal timer queries, where the query offsets have to be specified at compute encoder creation. That's why there are different projects :)
WIP: current state is that stage-style queries work on Apple Silicon, but non-Metal backends are broken, and piet-gpu is not yet updated to use new API.
This puts most of the infrastructure in place but I'm hitting an error
that "sampleCountersInBuffer is not supported on this device".
The issue is that M1 supports stage boundaries and not command boundaries.
We'll have to rework the logic a bit. (And, in the most general case, support
both)
Start implementing stage boundaries, but it will probably require an API
change.
This patch adds radial gradients, including both the piet API and some
new methods specifically to support COLRv1, including the ability to
transform the gradient separately from the path.
We always do BeginClip/EndClip if it's a solid tile and the blend mode
is not default.
Also fix missing entry in pipeline layout (affects Vulkan but not Metal).
This patch switches to a variable size encoding of draw objects.
In addition to the CPU-side scene encoding, it changes the representation of intermediate per draw object state from the `Annotated` struct to a variable "info" encoding. In addition, the bounding boxes are moved to a separate array (for a more "structure of "arrays" approach). Data that's unchanged from the scene encoding is not copied. Rather, downstream stages can access the data from the scene buffer (reducing allocation and copying).
Prefix sums, computed in `DrawMonoid` track the offset of both scene and intermediate data. The tags for the CPU-side encoding have been split into their own stream (again a change from AoS to SoA style).
This is not necessarily the final form. There's some stuff (including at least one piet-gpu-derive type) that can be deleted. In addition, the linewidth field should probably move from the info to path-specific. Also, the 1:1 correspondence between draw object and path has not yet been broken.
Closes#152
This just runs ninja on the piet-gpu/shaders on a Windows machine, so
translated shaders match the existing pipeline.
At some point, we'll rework this to reduce friction.
* Add blend and composition mode enums to API
* Mirror these in the shaders
* Add new public blend function to PietGpuRenderContext that mirrors clip
* Plumb the modes through the pipeline from scene to kernel4
This PR reworks the clip implementation. The highlight is that clip bounding box accounting is now done on GPU rather than CPU. The clip mask is also rasterized on EndClip rather than BeginClip, which decreases memory traffic needed for the clip stack.
This is a pretty good working state, but not all cleanup has been applied. An important next step is to remove the CPU clip accounting (it is computed and encoded, but that result is not used). Another step is to remove the Annotated structure entirely.
Fixes#88. Also relevant to #119
Also updates comment.
We know the implementation is incomplete and needs refinement, but it
seems useful to commit as a starting point for further work.
This exposes interfaces to render glyphs into a texture atlas. The main changes are:
* Methods to plumb raw Metal GPU resources (device, texture, etc) into piet-gpu-hal objects.
* A new glyph_render API specialized to rendering glyphs. This is basically the same as just painting to a canvas, but will allow better caching (and has more direct access to fonts, bypassing the Piet font type which is underdeveloped).
* Ability to render to A8 target in addition to RGBA.
WIP, there are some rough edges, not least of which is that the image format changes are only on mac and cause compile errors elsewhere.
Make max workgroup size 256 and respect LG_WG_FACTOR.
Because the monoid scans only support a height of 2, this will reduce
the maximum scene complexity we can render. But it also increases
compatibility. Supporting larger scans is a TODO.