Editing improvements in response to feedback

This addresses most of the feedback other than the need to rework the opening paragraphs to be clearer to a casual observer.
This commit is contained in:
Raph Levien 2023-01-09 18:24:02 -08:00
parent cd92a64ea0
commit e59a87c281
3 changed files with 30 additions and 23 deletions


@ -84,34 +84,39 @@ The only currently implemented `Engine` uses `wgpu`.
The idea is that this can abstract easily over multiple GPU back-ends, without requiring the render logic to be polymorphic or introducing dynamic dispatch at the GPU abstraction layer.
The goal is to be more agile.
## Goals
The major goal of Vello is to provide a high-quality, GPU-accelerated renderer suitable for a range of 2D graphics applications, including rendering for GUI applications, creative tools, and scientific visualization. The [roadmap for 2023](doc/roadmap_2023.md) explains the goals and plans for the next few months of development.
Vello began as a research project that attempts to answer these questions:
- To what extent is a compute-centered approach better than rasterization ([Direct2D])?
- To what extent do "advanced" GPU features (subgroups, descriptor arrays, device-scoped barriers) help?
- Can we improve quality and extend the imaging model in useful ways?
Another goal of the overall project is to explain how the renderer is built, and to advance the state of building applications on GPU compute shaders more generally. Much of the progress on piet-gpu is documented in blog entries. See [doc/blogs.md](doc/blogs.md) for pointers to those.
## History
Vello was previously known as `piet-gpu`. This prior incarnation used a custom cross-API hardware abstraction layer, called `piet-gpu-hal`, instead of [`wgpu`].
<!-- Some discussion of this transition can be found in the blog post [A requiem to piet-gpu-hal]() TODO: Once the blog post is published -->
Vello was previously known as `piet-gpu`. This prior incarnation used a custom cross-API hardware abstraction layer, called `piet-gpu-hal`, instead of [`wgpu`]. The decision to lay down `piet-gpu-hal` in favor of WebGPU is discussed in detail in the blog post [Requiem for piet-gpu-hal].
Our [roadmap for 2023](doc/roadmap.md) explains our goals and plans for the next few months of development. There is also a [vision](doc/vision.md) document which explains the longer-term goals of the project, and how we might get there.
A [vision](doc/vision.md) document dated December 2020 explained the longer-term goals of the project, and how we might get there.
Many of these items are out-of-date or completed, but it still may provide some useful background.
An archive of this version can be found in the branches [`custom-hal-archive-with-shaders`] and [`custom-hal-archive`].
This succeeded the previous prototype, [piet-metal], and included work adapted from [piet-dx12] by Brian Merchant.
## Goals
## Related projects
<!-- TODO: Are these goals still correct? Are there new goals? Are these useful to have in the readme specifically, now that we're actually "encouraging" users -->
Vello takes inspiration from many other rendering projects, including:
The main goal is to answer research questions about the future of 2D rendering:
- Is a compute-centered approach better than rasterization ([Direct2D])? How much so?
- To what extent do "advanced" GPU features (subgroups, descriptor arrays) help?
- Can we improve quality and extend the imaging model in useful ways?
## Blogs and other writing
Much of the research progress on piet-gpu is documented in blog entries. See [doc/blogs.md](doc/blogs.md) for pointers to those.
<!-- Some mention of `google/forma` here -->
* [Pathfinder](https://github.com/servo/pathfinder)
* [Spinel](https://fuchsia.googlesource.com/fuchsia/+/refs/heads/master/src/graphics/lib/compute/spinel/)
* [Forma](https://github.com/google/forma)
* [Massively Parallel Vector Graphics](https://w3.impa.br/~diego/projects/GanEtAl14/)
* [Random-access rendering of general vector graphics](https://hhoppe.com/proj/ravg/)
## License
@ -152,3 +157,4 @@ licensed as above, without any additional terms or conditions.
[winit]: https://github.com/rust-windowing/winit
[Bevy]: https://bevyengine.org/
[`wgsl-analyzer`]: https://marketplace.visualstudio.com/items?itemName=wgsl-analyzer.wgsl-analyzer
[Requiem for piet-gpu-hal]: https://raphlinus.github.io/rust/gpu/2023/01/07/requiem-piet-gpu-hal.html


@ -2,6 +2,7 @@
Much of the research progress on piet-gpu is documented in blog entries. Here are the most relevant:
* [Requiem for piet-gpu-hal](https://raphlinus.github.io/rust/gpu/2023/01/07/requiem-piet-gpu-hal.html)
* [piet-gpu progress: clipping](https://raphlinus.github.io/rust/graphics/gpu/2022/02/24/piet-gpu-clipping.html), Feb 24, 2022
* [Fast 2D rendering on GPU](https://raphlinus.github.io/rust/graphics/gpu/2020/06/13/fast-2d-rendering.html), Jun 13, 2020
* [A sort-middle architecture for 2D graphics](https://raphlinus.github.io/rust/graphics/gpu/2020/06/12/sort-middle.html), Jun 12, 2020


@ -14,7 +14,7 @@ A 2D renderer needs to support at least a basic imaging model. The biggest singl
Supporting images *well* is tricky, in large part because of limitations in GPU infrastructure. The number of images that may appear in a scene is not bounded, which is not a good fit for the basic descriptor binding model. Ideally a single shader (the fine rasterization stage) can sample from all the images in the scene directly, but that's not really possible in WebGPU 1.0. Perhaps a future extension will have a version of this; in Vulkan it's descriptor indexing (and also [buffer device address] and [descriptor buffer], as GPU approaches to this problem keep evolving, though the latter two are less likely to be standardized in WebGPU, as they're essentially variants of raw pointers and thus extremely difficult to make safe).
Until then, we'll do a workaround of having a single atlas image containing all the images in the scene. That's extremely annoying and takes memory, but the cost of copying is itself not expected to be that bad. And in the common case where an image is reused across multiple frames, it should in most cases be possible to avoid those copies.
Until then, we'll do a workaround of having a single atlas image containing all the images in the scene. That has nontrivial cost in memory allocation and bandwidth for texture copying, and the logic is tricky to write robustly, but the impact of copying on total rendering time is not expected to be that bad. And in the common case where an image is reused across multiple frames, it should in most cases be possible to avoid those copies.
One tricky part is changes to the scene encoding. At the moment, it's more or less self-contained, but it will need to be extended so that scene fragments can contain references to image resources (a reference-counted pointer to either the image bytes or to an external image reference, which might be rendered by some other WebGPU task). In addition, there needs to be a pass between encoding and submission to the GPU, where references to these resources are replaced by uv quads in the texture atlas. Similar logic is needed to resolve cached glyphs, about which more below.
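The atlas side of that resolve pass could be sketched roughly as follows. This is a hedged illustration, not Vello's actual code: the names `Atlas` and `UvQuad` and the simple shelf-packing strategy are assumptions made for exposition.

```rust
// Illustrative sketch of a texture-atlas allocator: each image in the scene
// is assigned a rectangle in one large atlas texture, and references to the
// image in the encoded scene are replaced by the resulting uv quad.

#[derive(Clone, Copy, PartialEq, Debug)]
struct UvQuad {
    u0: f32,
    v0: f32,
    u1: f32,
    v1: f32,
}

struct Atlas {
    size: u32, // atlas is size x size texels
    cursor_x: u32,
    cursor_y: u32,
    row_height: u32,
}

impl Atlas {
    fn new(size: u32) -> Self {
        Atlas { size, cursor_x: 0, cursor_y: 0, row_height: 0 }
    }

    /// Simple shelf packing: place images left to right, opening a new
    /// row (shelf) when the current one is full. Returns normalized uv
    /// coordinates into the atlas, or None when the atlas is exhausted.
    fn alloc(&mut self, w: u32, h: u32) -> Option<UvQuad> {
        if self.cursor_x + w > self.size {
            // Start a new shelf below the tallest image in this row.
            self.cursor_y += self.row_height;
            self.cursor_x = 0;
            self.row_height = 0;
        }
        if w > self.size || self.cursor_y + h > self.size {
            return None; // A real implementation would grow or evict.
        }
        let (x, y) = (self.cursor_x, self.cursor_y);
        self.cursor_x += w;
        self.row_height = self.row_height.max(h);
        let s = self.size as f32;
        Some(UvQuad {
            u0: x as f32 / s,
            v0: y as f32 / s,
            u1: (x + w) as f32 / s,
            v1: (y + h) as f32 / s,
        })
    }
}

fn main() {
    let mut atlas = Atlas::new(256);
    let a = atlas.alloc(128, 64).unwrap();
    // 200 texels don't fit in the remaining 128, so this opens a new shelf.
    let b = atlas.alloc(200, 32).unwrap();
    assert_eq!(a, UvQuad { u0: 0.0, v0: 0.0, u1: 0.5, v1: 0.25 });
    assert_eq!(b.v0, 64.0 / 256.0);
    println!("a = {:?}, b = {:?}", a, b);
}
```

Shelf packing is chosen here only because it is the simplest scheme to state; a production allocator would also handle growth, eviction of images no longer referenced, and padding between rectangles to avoid sampling bleed.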
@ -30,11 +30,11 @@ The last thing that belongs in "basic imaging model" is a proper API for glyph r
Vello primarily runs in GPU compute shader stages, but there are three motivations for also having a CPU fallback path.
First, in some cases a competent GPU won't be available, or perhaps it is on a denylist because of known bugs. In that case, a CPU implementation is necessary in order to display anything.
The most important reason is to improve testing and debuggability. At the moment, we have two or three cases where there are artifacts or missing objects, but only in certain configurations. The systematic approach to this problem is to have a CPU implementation of each compute stage, and then the CPU and GPU outputs can be compared. Other problems might be isolated by swapping out one implementation for another.
Second, because of various overhead, GPU dispatch is only efficient when working with large datasets. When rendering an extremely simple scene, it might be more efficient just to do the compute work on CPU, and save the GPU dispatch. Generally you'll still want to do fine rasterization (production of actual pixels) on GPU, as even if the CPU could do that really quickly there would still be the cost of getting them uploaded.
In addition, in some cases a competent GPU won't be available, or perhaps it is on a denylist because of known bugs. In that case, a CPU implementation is necessary in order to display anything.
But perhaps the most important reason is to improve testing and debuggability. At the moment, we have two or three cases where there are artifacts or missing objects, but only in certain configurations. The systematic approach to this problem is to have a CPU implementation of each compute stage, and then the CPU and GPU outputs can be compared. Other problems might be isolated by swapping out one implementation for another.
Lastly, because of various overheads, GPU dispatch is only efficient when working with large datasets. When rendering an extremely simple scene, it might be more efficient just to do the compute work on the CPU and save the GPU dispatch. Generally you'll still want to do fine rasterization (production of actual pixels) on the GPU, as even if the CPU could produce those pixels quickly, there would still be the cost of uploading them.
Because of the emphasis on testing, at least the initial CPU implementations will be optimized for clarity and simplicity, not so much performance. It is possible to imagine doing SIMD optimization and running the work on multiple threads, but that is not planned (see non-goals below).
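The differential-testing idea above could be sketched like this. This is a hedged illustration under assumed names (`Stage`, `diff_stage`, and prefix sum as the example workload); it is not Vello's actual testing harness.

```rust
// Illustrative sketch of differential testing between a CPU reference
// implementation of a compute stage and its GPU counterpart: run both on
// the same input and report the first element where the outputs disagree.

/// A pipeline stage that transforms an input buffer into an output buffer.
trait Stage {
    fn run(&self, input: &[u32]) -> Vec<u32>;
}

/// Clarity-first CPU reference: an exclusive prefix sum, a common building
/// block in GPU pipelines (e.g. for computing output offsets per element).
struct CpuPrefixSum;

impl Stage for CpuPrefixSum {
    fn run(&self, input: &[u32]) -> Vec<u32> {
        let mut sum = 0;
        input
            .iter()
            .map(|&x| {
                let out = sum;
                sum += x;
                out
            })
            .collect()
    }
}

/// Stand-in for the GPU path; in practice this would record a compute
/// dispatch, submit it, and read back the result buffer.
struct FakeGpuPrefixSum;

impl Stage for FakeGpuPrefixSum {
    fn run(&self, input: &[u32]) -> Vec<u32> {
        CpuPrefixSum.run(input) // placeholder for GPU readback
    }
}

/// Compare two implementations, returning the index of the first mismatch.
fn diff_stage(a: &dyn Stage, b: &dyn Stage, input: &[u32]) -> Option<usize> {
    let (ra, rb) = (a.run(input), b.run(input));
    ra.iter().zip(&rb).position(|(x, y)| x != y)
}

fn main() {
    let input = [3, 1, 4, 1, 5];
    assert_eq!(CpuPrefixSum.run(&input), vec![0, 3, 4, 8, 9]);
    assert_eq!(diff_stage(&CpuPrefixSum, &FakeGpuPrefixSum, &input), None);
    println!("CPU and GPU stage outputs match");
}
```

The same trait boundary is what makes "swapping out one implementation for another" cheap when isolating a bug to a particular stage.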
@ -112,7 +112,7 @@ Another issue is the overhead of WebGPU compared to native, which we hope is not
Vello now has a "recording" abstraction that includes lightweight proxies for resources such as buffers and textures, and a declarative approach to specifying the graph of compute shader dispatches. This abstraction is an appealing alternative to an object-oriented hardware abstraction layer (as was the case in the pre-WGSL version of piet-gpu). We also think this abstraction could support focused, lightweight back-ends for more native GPU APIs. The relative priority of building such back-ends is not clear, but we did want a design where we could regain some of the performance that was given up in the move to WebGPU.
* TODO: link to "requiem for piet-gpu-hal" when done
* [Requiem for piet-gpu-hal](https://raphlinus.github.io/rust/gpu/2023/01/07/requiem-piet-gpu-hal.html)
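The shape of such a recording abstraction could be sketched as below. The names (`BufferProxy`, `Command`, `Recording`, the shader names) are illustrative assumptions, not Vello's exact types.

```rust
// Illustrative sketch of a "recording" abstraction: resources are lightweight
// integer proxies, and the graph of compute dispatches is declared as plain
// data before any back-end (wgpu or a native API) maps it to real resources.

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct BufferProxy(u32);

#[derive(Debug)]
enum Command {
    /// Allocate a buffer of the given byte size.
    Alloc(BufferProxy, u64),
    /// Dispatch a named compute shader with a workgroup count and bindings.
    Dispatch {
        shader: &'static str,
        workgroups: (u32, u32, u32),
        bindings: Vec<BufferProxy>,
    },
    /// Read a buffer back to the host.
    Download(BufferProxy),
}

#[derive(Default, Debug)]
struct Recording {
    commands: Vec<Command>,
    next_id: u32,
}

impl Recording {
    fn alloc(&mut self, size: u64) -> BufferProxy {
        let proxy = BufferProxy(self.next_id);
        self.next_id += 1;
        self.commands.push(Command::Alloc(proxy, size));
        proxy
    }

    fn dispatch(
        &mut self,
        shader: &'static str,
        workgroups: (u32, u32, u32),
        bindings: Vec<BufferProxy>,
    ) {
        self.commands.push(Command::Dispatch { shader, workgroups, bindings });
    }
}

fn main() {
    // Declare a two-stage pipeline as data; a back-end would later walk
    // `commands`, create real buffers for the proxies, and submit the work.
    let mut rec = Recording::default();
    let scene = rec.alloc(1 << 20);
    let ptcl = rec.alloc(1 << 16);
    rec.dispatch("coarse", (64, 1, 1), vec![scene, ptcl]);
    rec.dispatch("fine", (64, 64, 1), vec![ptcl]);
    assert_eq!(rec.commands.len(), 4);
    println!("{} commands recorded", rec.commands.len());
}
```

Because the recording is inert data rather than live API objects, a focused native back-end only has to interpret the command list; the render logic itself never needs to be generic over back-ends.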
### Conflation artifacts