This helps performance but not all performance issues have been resolved. Nontrivial CPU goes into write_buffer, and it's also possible that there isn't enough overlapping between CPU and GPU work.
The area bug was found and fixed by @dfrg, and is adapted from #239. I wanted to move it to a separate PR so that one would be more focused on API.
The other bug is currently silent because the two quantities swapped are both 4, but it is triggered when experimenting with tuning for performance.
Previously there was a limit of 256k pathtags in a scene, due to the need for multi-dispatch prefix sum for the pathtag monoid. This patch increases the limit to 64M, which ought to be enough for most applications.
It works by having 4 dispatches for the pathtag prefix sum: 2 to reduce, then 2 to scan.
* Move the vello crate to the root of the crate
* Add warning that README is work in progress
* Add newline in warning
* Move the unlicense into the shader folder
* Fixup wgsl-analyzer include paths