This info pertains to the Blurs by TroggleMonkey:

DESCRIPTION:
Gaussian blurs are common building blocks in multi-pass shaders, and this
library of optimized and tested blurs should make it easier to use whatever size
blur you need. All of these shaders are based on the tex2Dblur* functions in
include/blur-functions.h, so you can use those directly if you ever need to
add more processing to the same pass as a Gaussian blur.

PICK THE RIGHT BLUR FOR YOUR USE CASE:
There are several different types of blurs, ranging in size from 3-12 texels:
a.) "Resize" separable blurs use vertical and horizontal passes and require N
    taps for an Nx blur. These are arbitrarily resizable.
b.) "Fast" separable blurs use vertical and horizontal passes and require N taps
    for an (N*2 - 1)x blur. They exploit bilinear filtering to reduce the
    required taps from e.g. 9 to 5. These are always faster, but they have
    strict image scale requirements.
c.) "Resize" one-pass blurs combine the vertical/horizontal passes of the
    "resize" separable blurs, and they require NxN taps for an NxN blur. These
    perform slowly enough that only tex2Dblur3x3resize is useful/included.
d.) Other one-pass blurs combine the vertical/horizontal passes of the "fast"
    separable blurs, and they exploit bilinear filtering the same way. They're
    faster than separable blurs at 3x3, competitive at 5x5 depending on options,
    and slower at 7x7 and above...but larger blurs may still be useful if you're
    hurting for passes.
e.) "Shared" one-pass blurs go a step further: They also use quad-pixel
    communication with fine-grained derivatives to distribute texture samples
    across a 2x2 pixel quad. (ddx() and ddy() are required, as well as a GPU
    that uses fine-grained derivatives). These blurs are faster than the other
    one-pass blurs, but they have some artifacts from combining sample-sharing
    with bilinear sampling, so they're best reserved for reblurring an
    already-blurred input.

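The tap reduction described for the "fast" blurs (e.g. 9 texels from 5 taps) can be sketched outside of a shader. The Python below is only an illustration of the general bilinear-merging idea, not the code in include/blur-functions.h: the function names, the sigma value, and the restriction to blur widths of 5, 9, 13, ... are all assumptions made for this example.

```python
import math

def gaussian_weights(width, sigma):
    """Normalized discrete Gaussian weights for an odd blur width."""
    radius = width // 2
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def merge_for_bilinear(weights):
    """Merge neighboring off-center texels into single bilinear taps.

    Returns (offset_in_texels, weight) pairs. Texels at distances d and d+1
    with weights w1 and w2 are replaced by one fetch at d + w2/(w1 + w2),
    weighted by (w1 + w2). Hypothetical helper: handles widths 5, 9, 13, ...
    """
    radius = len(weights) // 2
    if len(weights) % 2 == 0 or radius % 2 != 0:
        raise ValueError("this sketch handles widths 5, 9, 13, ... only")
    taps = [(0.0, weights[radius])]          # center texel keeps its own tap
    for d in range(1, radius, 2):
        w1 = weights[radius + d]
        w2 = weights[radius + d + 1]
        offset = d + w2 / (w1 + w2)
        taps.append((offset, w1 + w2))       # right-hand side
        taps.append((-offset, w1 + w2))      # left-hand side (symmetric)
    return taps

def bilinear_fetch(row, x):
    """Emulate linear texture filtering along one axis of texel values."""
    x0 = math.floor(x)
    frac = x - x0
    return row[x0] * (1.0 - frac) + row[x0 + 1] * frac

def blur_direct(row, center, weights):
    """Reference blur: one tap per texel."""
    radius = len(weights) // 2
    return sum(w * row[center - radius + i] for i, w in enumerate(weights))

def blur_fast(row, center, taps):
    """Same blur using the merged bilinear taps."""
    return sum(w * bilinear_fetch(row, center + off) for off, w in taps)
```

With a 9-wide kernel, merge_for_bilinear returns 5 taps, and blur_fast matches blur_direct up to floating-point rounding. The match relies on each fetch landing exactly between two texel centers, which is one way to see why these blurs need linear filtering and a 1:1 pixel:texel ratio.
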
Every blur expects linear filtering. Except for resize separable blurs, all
require a pass scale of (1/(2^M)) for some M >= 0. That is, the output image
has to have a 1:1 pixel:texel ratio with some mipmap of the input image, so use
e.g. scaleN = "1.0" or scaleN = "0.25", not scaleN = "0.33" or scaleN = "2.0".
Note: mipmap_inputN must be "true" in your .slangp file for scales other than 1.0.

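The scale rule above is easy to check by hand, but a small helper makes it explicit. This is a sketch only: mip_level_for_scale and needs_mipmap_input are hypothetical names introduced for this example and are not part of the shader library or of RetroArch.

```python
import math

def mip_level_for_scale(scale):
    """Return M if scale == 1/(2**M) for an integer M >= 0, else None."""
    if scale <= 0.0 or scale > 1.0:
        return None
    m = round(-math.log2(scale))
    return m if math.isclose(scale, 2.0 ** -m) else None

def needs_mipmap_input(scale):
    """True when a valid scale is below 1.0, i.e. mipmap_inputN must be "true"."""
    m = mip_level_for_scale(scale)
    return m is not None and m > 0
```

For the values named above, mip_level_for_scale accepts 1.0 (M = 0) and 0.25 (M = 2) and rejects 0.33 and 2.0 by returning None.
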
There are two suffixes on the .slang files relating to gamma correction:
* Blurs with no suffix assume linear RGB input and output.
* Blurs with a "-last-pass" suffix use pow() to gamma-correct their output.
* Blurs with a "-gamma-encode-every-fbo" suffix use pow() to linearize each
  input sample and again to gamma-correct the output. These blurs are MUCH
  slower than blurs without this suffix, but they're provided in case you want
  to be [almost] gamma-correct on platforms without sRGB FBO's. (The "almost"
  is because bilinear filtering still won't be gamma-correct without sRGB.)
* There are also blurs with both suffixes. This may seem redundant, but they
  make it easier to use a different output gamma for the last pass than for
  the rest of the pipeline (such as when simulating another display device like
  a Game Boy Advance or CRT). See srgb-helpers/README.txt for more information.

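To see why the "-gamma-encode-every-fbo" variants bother with two pow() calls per sample, it helps to compare a weighted average taken on gamma-encoded values with one taken in linear light. The sketch below assumes a plain power-law gamma of 2.2 purely for illustration; the real shaders take their gamma from their own settings, and the function names here are hypothetical.

```python
GAMMA = 2.2  # assumed power-law gamma, for illustration only

def linearize(encoded):
    """Gamma-encoded value in [0, 1] to linear light."""
    return encoded ** GAMMA

def gamma_encode(linear):
    """Linear light in [0, 1] back to a gamma-encoded value."""
    return linear ** (1.0 / GAMMA)

def blur_naive(encoded_samples, weights):
    """Weighted average taken directly on gamma-encoded samples."""
    return sum(w * s for s, w in zip(encoded_samples, weights))

def blur_gamma_encode_every_fbo(encoded_samples, weights):
    """Linearize each sample, average in linear light, then re-encode."""
    linear = sum(w * linearize(s) for s, w in zip(encoded_samples, weights))
    return gamma_encode(linear)
```

Averaging a black texel (0.0) and a white texel (1.0) with equal weights gives 0.5 with blur_naive, while blur_gamma_encode_every_fbo gives a noticeably brighter encoded value (roughly 0.73 under the assumed gamma), which is the physically averaged result.
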
BENCHMARK RESULTS:
Blurs have different performance characteristics depending on whether the input
is mipmapped and depending on whether they're gamma-encoding every FBO. Here's
an excerpt from the blur-functions.h description with a comparison. Note that
benchmarks without an sRGB heading use "-gamma-encode-every-fbo" suffixes, and
you can just look at the sRGB performance figures if you don't care about gamma:
//  Here are some framerates from a GeForce 8800GTS. The first pass resizes to
//  viewport size (4x in this test) and linearizes for sRGB codepaths, and the
//  remaining passes perform 6 full blurs. Mipmapped tests are performed at the
//  same scale, so they just measure the cost of mipmapping each FBO (only every
//  other FBO is mipmapped for separable blurs, to mimic realistic usage).
//  Mipmap      Neither     sRGB+Mipmap sRGB        Function
//  76.0        92.3        131.3       193.7       tex2Dblur3fast
//  63.2        74.4        122.4       175.5       tex2Dblur3resize
//  93.7        121.2       159.3       263.2       tex2Dblur3x3
//  59.7        68.7        115.4       162.1       tex2Dblur3x3resize
//  63.2        74.4        122.4       175.5       tex2Dblur5fast
//  49.3        54.8        100.0       132.7       tex2Dblur5resize
//  59.7        68.7        115.4       162.1       tex2Dblur5x5
//  64.9        77.2        99.1        137.2       tex2Dblur6x6shared
//  55.8        63.7        110.4       151.8       tex2Dblur7fast
//  39.8        43.9        83.9        105.8       tex2Dblur7resize
//  40.0        44.2        83.2        104.9       tex2Dblur7x7
//  56.4        65.5        71.9        87.9        tex2Dblur8x8shared
//  49.3        55.1        99.9        132.5       tex2Dblur9fast
//  33.3        36.2        72.4        88.0        tex2Dblur9resize
//  27.8        29.7        61.3        72.2        tex2Dblur9x9
//  37.2        41.1        52.6        60.2        tex2Dblur10x10shared
//  44.4        49.5        91.3        117.8       tex2Dblur11fast
//  28.8        30.8        63.6        75.4        tex2Dblur11resize
//  33.6        36.5        40.9        45.5        tex2Dblur12x12shared

BASIC USAGE:
The .slangp presets in the quality-test-presets folder provide usage examples
for basically every .slang blur shader. The "-srgb" suffix on some .slangp
presets is an explicit notice that they use sRGB FBO's. Note how and when the
"-last-pass" suffix is used for each .slang file, etc.

The provided .slangp files with the "-mipmap" suffix are used to test quality and
benchmarking with mipmapping enabled, but none of them actually use mipmapping
as a feature in and of itself. The following contrived .slangp would do that:
shaders = "5"

# Pass0: Linearize RGB:
shader0 = ../../srgb-helpers/first-pass-linearize.slang
filter_linear0 = "true"
scale_type0 = "source"
scale0 = "1.0"
srgb_framebuffer0 = "true"

# Pass1: Upsize to 4x. Pretend this pass does significant processing at 4x.
shader1 = ../../stock.slang
filter_linear1 = "true"
scale_type1 = "source"
scale1 = "4.0"
srgb_framebuffer1 = "true"

# Pass2: Blur a source-sized mipmap 9x vertically; just shrink horizontally.
shader2 = ../blur9fast-vertical.slang
filter_linear2 = "true"
scale_type2 = "source"
scale2 = "0.25"
srgb_framebuffer2 = "true"
mipmap_input2 = "true"

# Pass3: Blur 9x horizontally
shader3 = ../blur9fast-horizontal.slang
filter_linear3 = "true"
scale_type3 = "source"
scale3 = "1.0"
srgb_framebuffer3 = "true"

# Pass4: Scale to the screen size and gamma-correct the output:
shader4 = ../../srgb-helpers/last-pass-gamma-correct.slang
filter_linear4 = "true"
scale_type4 = "viewport"
scale4 = "1.0"
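
The scale chain in this preset can be followed with a little arithmetic. The sketch below assumes that a "source" scale type multiplies the size of the previous pass's output, and it expresses sizes relative to the original input; cumulative_scales is a hypothetical helper written for this example. The final viewport-scaled pass is left out because its size depends on the screen.

```python
def cumulative_scales(scales):
    """Running product of per-pass "source" scales, relative to the original input."""
    sizes = []
    size = 1.0
    for scale in scales:
        size *= scale
        sizes.append(size)
    return sizes

# scale0..scale3 from the preset above (Pass4 uses scale_type "viewport").
preset_scales = [1.0, 4.0, 0.25, 1.0]
relative_sizes = cumulative_scales(preset_scales)
```

Here relative_sizes is [1.0, 4.0, 1.0, 1.0]: Pass1 works at 4x, and Pass2's scale of 0.25 brings the image back to the original size, which is why its comment describes blurring a source-sized mipmap of the 4x image.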