Commit graph

181 commits

Elias Naur 5db427c549 kernel4: compute and output alpha
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 19:51:49 +02:00
Elias Naur ee4429a26f kernel4: separate area from alpha in clip stack
This change prepares for kernel4 to output alpha. No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 19:51:42 +02:00
Elias Naur 22507dea0e pre-allocate kernel4 scratch space in coarse.comp
coarse.comp knows the maximum stack depth, and can pre-allocate scratch
space for kernel4.comp. Kernel4 no longer contains allocations or
control barriers.

The invocation-local blend stack is gone as well; it didn't seem to make
any difference in performance to always use global memory for pushing
and popping.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-31 18:48:19 +02:00
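
As a back-of-the-envelope sketch of what that pre-allocation costs, assuming
one 32-bit blend value per pixel per clip level (the tile size and names here
are illustrative, not the actual constants):

fn kernel4_scratch_bytes(max_clip_depth: u32) -> u32 {
    const TILE_WIDTH_PX: u32 = 16; // illustrative tile size
    const TILE_HEIGHT_PX: u32 = 16;
    max_clip_depth * TILE_WIDTH_PX * TILE_HEIGHT_PX * 4
}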
Elias Naur e6b535d942 coarse.comp: extract area commands into function
No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-30 19:56:09 +02:00
Elias Naur d916a9e2c4 backdrop.comp: support stroked Annotated_Image and Annotated_BeginClip
Commit 8db77e180e added support for
strokes to FillImage and BeginClip, but missed backdrop.comp.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-30 19:33:25 +02:00
Elias Naur 678bfedfca kernel4: assume colors in alpha-premultiplied sRGB format
See http://ssp.impulsetrain.com/gamma-premult.html for a description of
the format.

Pre-multiplied alpha only matters for translucent objects; draw a few
such shapes in the test render.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-29 21:17:01 +02:00
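
For reference, a minimal sketch of the over operator on premultiplied colors,
assuming both operands have already been converted to linear space (the helper
is illustrative, not kernel4's actual code):

// Composite src over dst; both are alpha-premultiplied RGBA.
// With premultiplied colors, over needs no division:
//   out = src + (1 - src.alpha) * dst
fn over(src: [f32; 4], dst: [f32; 4]) -> [f32; 4] {
    let k = 1.0 - src[3];
    [
        src[0] + k * dst[0],
        src[1] + k * dst[1],
        src[2] + k * dst[2],
        src[3] + k * dst[3],
    ]
}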
Elias Naur eb37db1b05 replace per-element fill mode flags with a SetFillMode element
Fixes #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-29 21:10:25 +02:00
Elias Naur bb61f875dc kernel4: remove dead code left over from previous clipping approach
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-29 21:10:17 +02:00
Tatsuyuki Ishi 4864a7fe0f Create chunks over the x axis in addition to the y axis
This allows more coalescing with image loads/stores, since all of our images are stored with a tiled layout.
2021-03-23 20:54:49 +09:00
Elias Naur f0127812eb tightly pack fine rasterizer commands
Reclaims the space wasted by splitting fill mode commands from fill
commands.

For example, a CmdStroke + CmdColor pair uses an extra tag word compared
to the former combined CmdStroke. This change shaves off that one word.

In the future, we can pack several command tags into one tag word,
saving even more space.

Fixes #66

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
Elias Naur 8db77e180e support stroked fills for clips, images
This change completes general support for stroked fills for clips and
images.

Annotated_size increases from 28 to 32, because of the linewidth field
added to AnnoImage. Stroked image fills are presumably rare, and if
memory pressure turns out to be a bottleneck, we could replace the
linewidth field with separate AnnoLinewidth elements.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
Elias Naur db59b5d570 coarse,kernel4: make stroke, (non-zero) fill, solid separate commands
Before this change, every command (FillColor, FillImage, BeginClip)
had (or would need) stroke, (non-zero) fill and solid variants.

This change adds a command for each fill mode and its parameters,
reducing code duplication and adding support for stroked FillImage and
BeginClip as a side effect.

The rest of the pipeline doesn't yet support stroked FillImage and
BeginClip. That's a follow-up change.

Since each command includes a tag, this change adds an extra word for
each fill and stroke. That waste is also addressed in a follow-up.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 16:43:33 +01:00
Elias Naur 44bff2726c collapse FillCubic and StrokeCubic into Cubic with flags for fill mode
Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur df055563bd collapse annotated Fill and Stroke to Color with fill mode flag
No functionality changes, just different encoding.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur e9ff509ab9 use tag flags for fill vs stroke modes in scene elements
Encode stroke vs fill as tag flags, thereby reducing the number of scene
elements. Encoding change only, no functional changes.

The previous Stroke and Fill commands are merged into one command,
FillColor. The encoding to annotated elements is divergent; that is
fixed when annotated elements move to tag flags.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur a5b6bda941 add support for element flags to shaders
Commit 9afa9b86b6 added Rust support for
encoding flags into elements. This change adds support to shaders by
introducing variant tag structs:

struct VariantTag {
    uint tag;
    uint flags;
};

and returning them from Variant_tag functions.

It also adds a flags argument to write functions for enum variants that
include TagFlags.

No functionality changes.

Updates #70

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
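
For illustration, a Rust sketch of packing a tag and its flags into a single
word; the bit layout here is hypothetical, not necessarily the one piet-gpu
uses:

// Hypothetical layout: tag in the low 16 bits, flags in the high 16 bits.
fn pack_tag(tag: u32, flags: u32) -> u32 {
    (flags << 16) | (tag & 0xffff)
}

fn unpack_tag(word: u32) -> (u32, u32) {
    (word & 0xffff, word >> 16) // (tag, flags), mirroring VariantTag
}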
Elias Naur 903ab1fb59 implement FillImage command and sRGB support
FillImage is like Fill, except that it takes its color from one or
more image atlases.

kernel4 uses a single image on non-Vulkan hosts, and a dynamically sized
array of image descriptors on Vulkan.

A previous version of this commit used textures. I think images are a better
choice for piet-gpu, for several reasons:

- Texture sampling, in particular textureGrad, is slow on lower spec devices
  such as Google Pixel. Texture sampling is particularly slow and difficult to
  implement for CPU fallbacks.
- Texture sampling needs more parameters, in particular the full u,v
  transformation matrix, leading to a large increase in the command size.
  Since all commands use the same size, that memory penalty is paid by all
  scenes, not just scenes with textures.
- It is unlikely that piet-gpu will support every kind of fill for every
  client, because each kind must be added to kernel4.

With FillImage, a client will prepare the image(s) in separate shader stages,
sampling and applying transformations and special effects as needed. Textures
that align with the output pixel grid can be used directly, without
pre-processing.

Note that the pre-processing step can run concurrently with the piet-gpu
pipeline; only the last stage, kernel4, needs the images.

Pre-processing most likely uses fixed function vertex/fragment programs,
which on some GPUs may run in parallel with piet-gpu's compute programs.

While here, fix a few validation errors:
- Explicitly enable EXT_descriptor_indexing, KHR_maintenance3,
  KHR_get_physical_device_properties2.
- Specify a VkDescriptorSetVariableDescriptorCountAllocateInfo for
  vkAllocateDescriptorSets. Otherwise, variable image2D arrays won't work
  (but sampler2D arrays do, at least on my setup).

Updates #38

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:50:12 +01:00
Elias Naur 07e07c7544 ensure consistent path segment transformation
As described in #62, the non-deterministic scene monoid may result in
slightly different transformations for path segments in an otherwise
closed path.

This change ensures consistent transformation across paths in three steps.

First, the absolute transformations computed by the scene monoid are
stored along with path segments and annotated elements.

Second, elements.comp no longer transforms path segments. Instead, each
segment is stored untransformed along with a reference to its absolute
transformation.

Finally, path_coarse performs the transformation of path segments.
Because all segments in a path share a single transformation reference,
the inconsistency in #62 is avoided.

Fixes #62

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
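
A sketch of the decoupled representation (field names are illustrative, not
the actual encoding): every segment of a path carries the same transform
reference, so path_coarse applies one transform consistently:

struct PathCubic {
    p0: [f32; 2], p1: [f32; 2], p2: [f32; 2], p3: [f32; 2], // untransformed
    trans_ix: u32, // reference to the absolute transform for this path
}

// Affine transform [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f.
fn apply(t: &[f32; 6], p: [f32; 2]) -> [f32; 2] {
    [t[0] * p[0] + t[2] * p[1] + t[4], t[1] * p[0] + t[3] * p[1] + t[5]]
}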
Elias Naur ad444f615c elements.comp: use shared array of structs directly
The NVIDIA shader compiler bug that forced splitting of the state struct
into primitive types is now fixed.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
Elias Naur 79d722df48 remove unused commands from pathseg
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-03-19 12:45:23 +01:00
Elias Naur b73eabf4eb kernel4.comp: remove unused commands
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-02-24 15:32:24 +01:00
Elias Naur 6a4e26ef2a all: add optional memory checks
Defining MEM_DEBUG in mem.h will add a size field to Alloc and enable
bounds and alignment checks for every memory read and write.

Notes:
- Deriving an Alloc from Path.tiles is unsound, but it's more trouble to
  convert Path.tiles from TileRef to a variable-sized Alloc.
- elements.comp notes that "We should be able to use an array of structs but
  the NV shader compiler doesn't seem to like it". If that's still relevant,
  do the shared arrays of Allocs work?

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2021-02-15 16:07:45 +01:00
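
A host-side model of the kind of check MEM_DEBUG enables; illustrative, not
the shader's exact code:

struct Alloc {
    offset: u32, // byte offset of the allocation in the memory buffer
    size: u32,   // tracked only when MEM_DEBUG is defined
}

// A one-word read at absolute byte address addr passes the check if it is
// aligned and falls inside the allocation.
fn check_read(a: &Alloc, addr: u32) -> bool {
    addr % 4 == 0 && addr >= a.offset && addr + 4 <= a.offset + a.size
}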
Elias Naur ee67a0a515 kernel4: simplify a tiny bit
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur 716517cc04 coarse,binning: organize bins into width_in_bins x height_in_bins
The binning shader supports up to N_TILE bins. To efficiently cover wide or
tall viewports, convert the rigid N_TILE_X x N_TILE_Y bin layout to a variable
width_in_bins x height_in_bins layout.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
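
A sketch of the layout computation, with illustrative constants:

const N_TILE_X: u32 = 16; // tiles per bin in x (illustrative value)
const N_TILE_Y: u32 = 16; // tiles per bin in y

fn bin_layout(width_in_tiles: u32, height_in_tiles: u32) -> (u32, u32) {
    // Round up so the bins cover the whole viewport.
    let width_in_bins = (width_in_tiles + N_TILE_X - 1) / N_TILE_X;
    let height_in_bins = (height_in_tiles + N_TILE_Y - 1) / N_TILE_Y;
    (width_in_bins, height_in_bins)
}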
Elias Naur ef4ec772ad backdrop: repair unsound optimization
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur 8b62022749 backdrop: avoid a (benign) zero-sized read
Found with the MEM_DEBUG support added in a later change.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur c4f5a69a0d implement variable output sizing
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur c67696714b coarse.comp: don't write Cmd_End to tiles out of bounds
If WIDTH_IN_TILES or HEIGHT_IN_TILES are not divisible by N_TILE_X or N_TILE_Y
respectively, the previously unconditional Cmd_End_write would write out of
bounds.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur 4de67d9081 unify GPU memory management
Merge all static and dynamic buffers into just one, "memory". Add a malloc
function for dynamic allocations.

Unify static allocation offsets into a "config" buffer containing scene setup
(number of paths, number of path segments), as well as the memory offsets of
the static allocations.

Finally, set an overflow flag when an allocation fails, and make sure to exit
shader execution as soon as that triggers. Add checks before beginning
execution in case the client wants to run two or more shaders before checking
the flag.

The "state" buffer is left alone because it needs zeroing and because it is
accessed with the "volatile" keyword.

Fixes #40

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
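
A host-side sketch of the bump-allocator idea, including the overflow flag;
names are illustrative and the real malloc lives in the shaders:

use std::sync::atomic::{AtomicU32, Ordering};

struct GpuMemory {
    mem_offset: AtomicU32, // next free byte in the unified "memory" buffer
    mem_size: u32,
    overflow: AtomicU32, // set once any allocation has failed
}

impl GpuMemory {
    fn malloc(&self, size: u32) -> Option<u32> {
        let offset = self.mem_offset.fetch_add(size, Ordering::Relaxed);
        if offset + size > self.mem_size {
            self.overflow.store(1, Ordering::Relaxed);
            return None; // callers exit shader execution as soon as possible
        }
        Some(offset)
    }
}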
Elias Naur a2a2d12c5d path_coarse.comp: fix intersection inconsistencies, take 2
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.

The first case is that top intersections with the very first row would
fail the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z

In particular, y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check that does work
and is just as consistent. Add a similar check, min(p0.x, p1.x) < tile_x0, for
deciding when to clip the segment to the left edge (but keep the consistent
xray check for deciding left edge *intersections*).

The second case is that the tracking of left intersections in the
[xray, next_xray] range of tiles may fail when next_xray is forced to
last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Finally, ensure that xx0 and xx1 don't overflow the allocated number of tiles
by clamping them *after* setting them. Adjust xx0 to include xray, just as xx1
is adjusted; I haven't seen corruption without it, but it's not obvious xx0
always includes xray.

While here, replace a "+=" on a guaranteed zero value with just "=".

Updates #23

[0] 29cfb8b63e

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-27 20:24:29 +01:00
Elias Naur d21f2b68de all: add SPDX license headers
Fixes #53

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-11 18:24:35 +01:00
Elias Naur 5c04e4882b remove unused tilegroup.h and extra spaces from kernel4.comp
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-11 15:00:58 +01:00
Elias Naur 580b63e558 elements.comp: tighten state size calculations
The state header is only one word (flags), not two.

Move the partition atomic counter to a separate field instead of state[0],
simplifying state offset calculations.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-10 18:48:16 +01:00
Elias Naur 1c6ca7e5fb remove unused BinChunk type
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-08 00:45:08 +01:00
Raph Levien 634530fb91 Merge branch 'master' into image_work 2020-12-02 11:58:45 -08:00
Raph Levien 3906f348fd Merge pull request #47 from linebender/clip_opt
Optimize clips
2020-12-02 11:57:14 -08:00
Elias Naur 29cfb8b63e eliminate inconsistent line intersections from path_coarse.comp
The finite precision of floating point computations can lead the coarse
renderer into inconsistent tile intersections, which imply impossible line
segments such as lines with gaps or double intersections. The winding number
algorithm is sensitive to these errors, which show up as incorrectly filled
paths.

This change forces all intersections to be consistent.
First, the floating point top edge intersection test is removed; top edge intersections are
completely determined by left edge intersections.
Then, left edge intersections are inserted from the tile with the last top edge
intersection. The next top edge is then fixed to be the last tile with a left
edge intersection.

More details in the patch comments.

Fixes #23

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-01 18:35:29 +01:00
Elias Naur 19f4d9fa95 change tile segment representation to (origin, vector)
Eliminates the precision loss of the subtraction in the sign(end.x - start.x)
expression in kernel4. That's important for the next change that avoids
inconsistent line intersections in path_coarse.

Updates #23

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-01 18:34:40 +01:00
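
A sketch of the new representation and why it helps (names are illustrative):

struct TileSeg {
    origin: [f32; 2],
    vector: [f32; 2], // direction stored explicitly
}

impl TileSeg {
    // The old (start, end) form required sign(end.x - start.x) in kernel4,
    // and that subtraction could lose the sign to rounding. Here the sign
    // is read straight off the stored vector.
    fn x_sign(&self) -> f32 {
        if self.vector[0] >= 0.0 { 1.0 } else { -1.0 }
    }
}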
Elias Naur 2068171f96 path_coarse.comp: tighten variable scopes, delete unused variables
No functional changes.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-12-01 18:01:04 +01:00
Raph Levien 97dcb5122e Merge branch 'master' into image_work 2020-11-29 17:09:48 -08:00
Raph Levien b8ea1e35cf Merge branch 'master' into clip_opt 2020-11-29 17:07:46 -08:00
Elias Naur feeb459fa1 remove FillMask and FillMaskInv
Obsoleted by BeginClip/EndClip.

Updates #36

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-11-29 16:59:58 +01:00
Elias Naur bd450ef461 piet-gpu-types: remove unused Segment and SegChunk types
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-11-29 16:51:35 +01:00
Raph Levien 4138f8a516 Optimize clips
Optimize tiles with clip masks that are all-zero or all-one.

Part of #36
2020-11-27 09:30:35 -08:00
Raph Levien facc9e0982 Use sampler for texture images
Provide images to fine rasterization kernel as readonly textures with a
sampler, rather than storage images. That lets us use the GPU's hardware
for sampling, which should be considerably more efficient.

There are a bunch of parameters that are hardcoded, but it does seem to
work.
2020-11-25 18:05:10 -08:00
Raph Levien 047a0830d1 Towards wiring up images to k4
This patch passes a dynamically sized array of textures to the fine
rasterizer.

A bunch of the low level Vulkan stuff is done, but only enough of the
shaders and encoders to do minimal testing. We'll want to switch from
storage images to sampled images, track the actual array of textures
during encoding, use that to build the descriptor set (which will need
to be more dynamic), and of course run image elements through the
pipeline.

Progress towards #38
2020-11-24 22:11:38 -08:00
Raph Levien a60c2dd3c8 Scratch buffer for clip stack
We keep a small window of the clip stack in registers in the fine
rasterization kernel, and when that window is exceeded, spill to global
memory, so the clip stack can be unbounded.
2020-11-22 18:14:09 -08:00
Raph Levien b928c7a3ed Restore FillMaskInv logic 2020-11-21 10:47:28 -08:00
Raph Levien 13134e7cb3 Restore FillMask logic
Per discussion, don't remove FillMask until we get unbounded clip stacks.
2020-11-21 07:00:03 -08:00
Raph Levien d14895b107 Continuing work on clips
I realized there's a problem with encoding clip bboxes relative to the
current transform (see #36 for a more detailed explanation), so this is
changing it to absolute bboxes.

This more or less gets clips working. There are optimization
opportunities (all-clear and all-opaque mask tiles), and it doesn't deal
with overflow of the blend stack, but it seems to basically work.
2020-11-20 18:25:27 -08:00
Raph Levien f53d00e6bc Add transforms and state stack
Actually handle transforms in RenderCtx (was implemented in renderer but
not actually plumbed through). This also requires maintaining a state
stack, which will also be required for clipping.

This PR also starts work on encoding clipping, including tracking
bounding boxes.

WIP, none of this is tested yet.
2020-11-20 18:25:27 -08:00
Elias Naur b942e4035b piet-gpu/shader: ensure forward progress in decoupled lookback
The Vulkan and OpenGL specifications offer only weak forward progress guarantees, and
in practice several mobile devices fail to complete the decoupled lookback
spinloop without mitigation.

This patch implements Raph's suggestion from the "Forward Progress"
section from

https://raphlinus.github.io/gpu/2020/04/30/prefix-sum.html

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-25 21:02:58 +01:00
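
A heavily simplified, single-threaded sketch of one mitigation along those
lines: bound the spin, then fall back to recomputing the predecessor's
contribution locally. Structure and names are illustrative; the actual shader
logic differs:

use std::sync::atomic::{AtomicU32, Ordering};

const MAX_SPINS: u32 = 1024; // bound is illustrative

fn lookback(
    pred_flag: &AtomicU32,
    read_published: impl Fn() -> u32,
    recompute: impl Fn() -> u32,
) -> u32 {
    for _ in 0..MAX_SPINS {
        if pred_flag.load(Ordering::Acquire) != 0 {
            return read_published(); // predecessor published in time
        }
    }
    recompute() // forward progress no longer depends on the other workgroup
}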
Elias Naur bc01180519 piet-gpu/shader: delete unused is_fill from elements.comp
Delete debug code as well.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-25 20:59:54 +01:00
Elias Naur 8fab45544e shader: implement clip paths
Expand the final kernel4 stage to maintain a per-pixel mask.

Introduce two new path elements, FillMask and FillMaskInv, to fill
the mask. FillMask acts like Fill, while FillMaskInv fills the area
outside the path.

SVG clipPaths are then representable by a FillMaskInv(0.0) for every nested
path, preceded by a FillMask(1.0) to clear the mask.

The bounding box for FillMaskInv elements is the entire screen; tightening of
the bounding box is left for future work. Note that a fullscreen bounding
box is not hopelessly inefficient because completely filling a tile with
a mask is just a single CmdSolidMask per tile.

Fixes #30

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-09 13:20:26 +02:00
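
One plausible per-pixel reading of the two commands, as a sketch rather than
the shader's exact code; coverage is the path's coverage of the pixel in
[0, 1]:

fn fill_mask(mask: f32, coverage: f32, value: f32) -> f32 {
    // Write value inside the path, keep the existing mask outside.
    mask * (1.0 - coverage) + value * coverage
}

fn fill_mask_inv(mask: f32, coverage: f32, value: f32) -> f32 {
    // Write value outside the path, keep the existing mask inside.
    value * (1.0 - coverage) + mask * coverage
}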
Elias Naur 55cfd472a5 shader: delete unused code
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-10-09 13:20:26 +02:00
Elias Naur 9be0faba6f piet-gpu-types: remove unused scene elements
Delete image compute shader as well; it is unused.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-27 18:57:53 +02:00
Elias Naur fa9bf0dc2b piet-gpu-types: remove unused ptcl types
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-27 18:30:33 +02:00
Elias Naur dceb0f9412 piet-gpu-types: remove unused annotated types
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-21 10:55:58 +02:00
Elias Naur ac3ac3ddff shader: introduce a crude setting for adjusting the maximum workgroup size
Both the Vulkan and OpenGL ES specs allow implementations to limit workgroups
to 128 threads. Add a LG_WG_FACTOR setting for easy switching between 128 and
256 threads, with 256 kept as the default setting.

Manually tested that LG_WG_FACTOR = 0 (128 threads) works as expected.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-13 13:04:13 +02:00
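
A sketch of the scaling scheme the setting implies, expressed in Rust for
illustration:

const LG_WG_FACTOR: u32 = 1; // 1 selects 256 threads (default), 0 selects 128
const WG_SIZE: u32 = 128 << LG_WG_FACTOR;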
Elias Naur 326f7f0d03 shader: delete more unused code and variables
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-09-13 13:03:56 +02:00
Elias Naur 05636995dd compute IMAGE_WIDTH and IMAGE_HEIGHT; remove dead code from setup.h
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-08-29 15:03:40 +02:00
Elias Naur de4f963ba0 shader: remove dead code
Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-08-28 17:37:46 +02:00
Elias Naur cfd57361c4 Fix linewidth transformations
The transformation determinant is signed, but we're only interested in
the absolute scale for transforming linewidths.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
2020-08-24 16:12:18 +02:00
bhmerchant@gmail.com d836d21d12 Clean up bits of right edge tracking logic left over from sort-middle. 2020-08-12 19:57:14 -07:00
msiglreith 1cc5c7ac0d Shader documentation and a slight cleanup 2020-06-28 15:37:27 +02:00
msiglreith eed71721eb Update winit example 2020-06-14 23:32:59 +02:00
Raph Levien 79cc9da811 Fancy flattening
Implement the same flattening algorithm as kurbo.
2020-06-09 20:45:19 -07:00
Raph Levien eaa1d261c3 Sederberg error metric
Use proper math to compute the number of subdivisions. This works but is not
very satisfying, as it over-subdivides.
2020-06-09 18:43:49 -07:00
Raph Levien b571e0d10c Continue wiring up gpu-side flattening
All segments given to path coarse raster are cubics. Flatten to
quadratics.

This works but the quality is not (yet) good.
2020-06-09 17:56:11 -07:00
Raph Levien 0f44bc8b78 Start GPU-side flattening
This starts the work on GPU-side flattening by plumbing curves through.
2020-06-09 16:01:47 -07:00
Raph Levien 3a8227d025 Non-load balanced coarse path raster
This is a bit of a revert of the load-balanced ("more parallel") coarse
path rasterizer, but includes fills and also uses atomicExchange.

I'm doing it this way because it should be considerably easier to do
flattening in this structure, even though there will be some performance
regression.
2020-06-09 15:09:53 -07:00
Raph Levien 7118c8efc1 Fix backdrop of segments to left of viewport
Make sure we account for backdrop in segments clipped out of viewport.
2020-06-09 10:25:22 -07:00
Raph Levien 6db4e20bbb More parallel backdrop propagation
This is a nice improvement but still not great on tiger.
2020-06-06 08:23:40 -07:00
Raph Levien af0a1af8e1 Make fills work
The backdrop propagation is slow but it does work.
2020-06-05 22:40:44 -07:00
Raph Levien f9f5961428 Use atomicExchange over atomicCompSwap
Significant perf win (approx 2x in the path coarse rasterizer)
2020-06-05 08:24:26 -07:00
Raph Levien e5dd9ae01e More parallel path coarse raster
Use fancier load balancing algorithm for coarse rendering of paths.

Seems to work and is an improvement in some cases.
2020-06-04 17:42:33 -07:00
Raph Levien 877da4a98e Faster coarse raster
Store a lot more tile context in shared memory and do the work from
that.
2020-06-04 10:39:08 -07:00
Raph Levien e1aa9b2f5d Remove bbox guard
It's probably not necessary.

This development is still work in progress.
2020-06-03 20:59:19 -07:00
Raph Levien 7f4a6523a8 Filter sparse tiles
Have a more-parallel read of the tile structures based on bbox coverage,
and only set the bit when the tile isn't empty.

This is a speedup, but there is some duplicated work and it is possible
to improve it further.
2020-06-03 17:55:42 -07:00
Raph Levien 63ba45c774 Fix performance issues
Use larger workgroup for tile initialization (utilization was poor).
Provide correct element count to coarse rasterizer.
2020-06-03 15:32:58 -07:00
Raph Levien ff8cee059c Optimize tile allocation
Use parallel scheme to zero out tiles.
2020-06-03 14:46:41 -07:00
Raph Levien 70a9c17e23 Continue building out pipeline
Plumbs the new tiling scheme to k4. This works (stroke only) but still
has some performance issues.
2020-06-03 12:21:09 -07:00
Raph Levien 294f6fd1db Experiment with new sorting scheme
Path segments are unsorted, but other elements are using the same
sort-middle approach as before.

This is a checkpoint. At this point, there are unoptimized versions
of tile init and coarse path raster, but it isn't wired up into a
working pipeline. Also observing about a 3x performance regression in
element processing, which needs to be investigated.
2020-06-03 09:29:25 -07:00
Raph Levien 2c185c3718 Simplify ringbuf
We don't really need a ring buffer, as we only read what we're actually
going to process.
2020-05-30 21:20:48 -07:00
Raph Levien 192ddc5eab Parallel merge
The fancy stuff :)
2020-05-30 21:11:13 -07:00
Raph Levien 121f29fef6 Merge one segment at a time
No parallelism yet, but seems to improve performance.
2020-05-30 08:51:52 -07:00
Raph Levien 894ef156e1 Change to new merge strategy in binning
WIP

We get "device lost" on NV :/
2020-05-29 20:06:16 -07:00
Raph Levien 319aa703c4 Output multiple pixels per thread in k4
In kernel 4, compute a chunk of pixels rather than just one per thread.
This is a dramatic speedup.

(This commit cherry-picked from another working branch)
2020-05-28 07:54:24 -07:00
Raph Levien e16f68d89d Fix buffer overrun
Was a little too eager in zeroing out sh_is_segment[].
2020-05-26 22:47:28 -07:00
Raph Levien dbcffb10db Reinstate fills
Add fills back in.
2020-05-25 15:27:03 -07:00
Raph Levien 3d422d9243 Allocate segment chunks in slabs
Another speedup might be to special-case when the number of chunks in a
stroke or fill command is 1, in which case the segment header doesn't need
allocation and memory traffic is reduced. But right now we'll avoid the
complexity.
2020-05-25 12:22:29 -07:00
Raph Levien 8eaf49a04d Checkpoint parallel output
Parallel segment output seems to be working for strokes.
2020-05-25 12:14:18 -07:00
Raph Levien 24b3def0a1 Start work on parallel segment output
Output of segments is in parallel. Getting closer, some problems with
chaining but mostly correct.
2020-05-24 21:02:19 -07:00
Raph Levien 55df3e6cc8 Fix linewidth math
Coarse rasterization wasn't entirely taking line width into account.

Also fix a swizzle in the matrix (not yet used), and fix a missing End
command in the ptcl output (it hasn't been a problem because the buffer
was cleared).
2020-05-24 09:43:41 -07:00
Raph Levien 7d040dff37 Bit magic for backdrop accumulation
Use bit counting rather than iterating backdrop increments one by one.
A nice if not huge speedup.
2020-05-22 07:30:32 -07:00
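
A sketch of the idea; the data layout is illustrative, but it shows the
one-by-one loop being replaced by a popcount:

// Bit i of mask is set when a segment crossing increments the backdrop
// entering tile i of the row (tile_ix < 32 in this sketch).
fn backdrop_at(mask: u32, tile_ix: u32) -> u32 {
    (mask & ((1u32 << tile_ix) - 1)).count_ones()
}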
Raph Levien a616b4d010 Rework right_edge computation in elements
Trying to fit it into the fancy monad doesn't really work, so use a
more straightforward approach to compute it from the aggregate.

Also add yEdge logic (basically copying piet-metal). With a fix to
ELEMENT_BINNING_RATIO (which I had simply gotten wrong), the example
renders almost correctly, with small bounding box artifacts.
2020-05-21 10:00:56 -07:00
Raph Levien ed4ed30708 Adding backdrop logic
Calculation of backdrops kinda works but with issues, so WIP.
2020-05-20 16:03:27 -07:00
Raph Levien 076e6d600d Progress on wiring up fills
Write the right_edge to the binning output.

More work on encoding the fill/stroke distinction and plumbing that
through the pipeline. This is a bit unsatisfying because of the code
duplication; having an extra fill/stroke bool might be better, but I
want to avoid making the structs bigger (this could be solved by
better packing in the struct encoding).

Fills are plumbed through to the last stage. Backdrop is WIP.
2020-05-20 11:14:19 -07:00
Raph Levien 03da52cff8 Start implementing fills
This should get the "right_edge" value for each segment plumbed through
to the binning phase. It also needs to be plumbed to coarse raster and
wired up there.

Consider this WIP, because none of this logic has been tested yet.
2020-05-19 20:40:04 -07:00
Raph Levien 0ed759814b Smarter line segment coverage
Compute tile coverage of segments using an optimized algorithm. This
algorithm does a bit of setup, then uses an efficient formula to
compute the span per scan-line.
2020-05-19 09:26:44 -07:00
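
A sketch of the per-row span formula for a non-horizontal segment with
p0.y <= p1.y; names and tile size are illustrative:

const TILE_W: f32 = 16.0;
const TILE_H: f32 = 16.0;

fn row_span(p0: (f32, f32), p1: (f32, f32), row: i32) -> (i32, i32) {
    let dxdy = (p1.0 - p0.0) / (p1.1 - p0.1); // setup: inverse slope
    // Clamp the tile row's y-range to the segment.
    let y_top = (row as f32 * TILE_H).max(p0.1);
    let y_bot = ((row as f32 + 1.0) * TILE_H).min(p1.1);
    // x-extent of the segment within that y-range.
    let xa = p0.0 + (y_top - p0.1) * dxdy;
    let xb = p0.0 + (y_bot - p0.1) * dxdy;
    let (xmin, xmax) = (xa.min(xb), xa.max(xb));
    // Inclusive range of tile columns touched on this row.
    ((xmin / TILE_W).floor() as i32, (xmax / TILE_W).floor() as i32)
}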