LDR tone-mapping and how to do it properly

I’m a huge fan of post-processing in games. It seems that no matter what I’m writing, I can’t resist the temptation to install an over-the-top bloom effect and some tone-mapping. And that’s me being conservative. The great thing about tone-mapping is that you can throw it on the end of just about any rendering pipeline and instantly glitz up the visuals, giving the game that ‘digitally remastered’ feel.

Tone-Mapping

So what is tone-mapping? Well, it’s a post-processing effect that remaps the render’s colour-dynamics to change the overall appearance of the game. Some of the more common tone-mapping operations are contrast & brightness, saturation and HDR exposure. HDR tone-mapping is an art unto itself and it can only be used for HDR pipelines (with floating-point render-targets and HDR textures), so I’ll restrict the conversation to the more universal LDR tone-mapping.

The Conjugate Transform

If you plan on doing anything interesting in a tone-mapping pass, then it’s rather necessary, for the sake of performance, readability and maintainability, to convert to a more suitable colour-space than RGB. The first such space that springs to mind is HSL, and indeed tone-mapping in HSL is like a gentle walk in the park, but it’s wise to look a little further afield to YCC. But why YCC? Sure, it does offer a luma component for brightness & contrast mapping, but the saturation is tied up in the two chroma components. Granted, this is a bad thing, but it’s not nearly as bad as the cost of a full RGB->HSL->RGB conversion.

The Problem With HSL

I spent a fair while trying to optimise the HSL-detour code in HLSL, hoping that I could make it viable for small shaders, but came out rather disappointed. Despite the availability of vector SIMD instructions, the piecewise-linear nature of the transformation demands a worryingly large number of conditional branches, and unless you have the luxury of Shader Model 4’s true branching, this amounts to a horror story of register-juggling and lerp operations. I didn’t exhaust every trick, but I doubt the round trip through HSL can be done in under 100 ps_3_0 instructions, which immediately rules out assembling for a Shader Model 2 target, where ps_2_0 allows only 64 arithmetic instruction slots.
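To give a feel for where the branches come from, here is a minimal sketch of just the forward half of the detour, using the textbook RGB-to-HSL formulas. It is not the code I was optimising; the function name RGBToHSL and its local variables are made up for illustration, and the return trip (HSL back to RGB) needs a similar set of piecewise cases again.

```hlsl
// Hypothetical helper for illustration only: textbook RGB -> HSL.
// Input channels are assumed to be in [0, 1]; output is (hue, saturation, lightness),
// with hue normalised to [0, 1).
float3 RGBToHSL(float3 rgb)
{
    float max_c = max(rgb.r, max(rgb.g, rgb.b));
    float min_c = min(rgb.r, min(rgb.g, rgb.b));
    float delta = max_c - min_c;

    float l = 0.5 * (max_c + min_c);
    float h = 0.0;
    float s = 0.0;

    // Greys (delta == 0) have no defined hue, so leave h and s at zero
    if (delta > 0.0)
    {
        s = delta / (1.0 - abs(2.0 * l - 1.0));

        // Hue is piecewise: the formula depends on which channel is largest
        if (max_c == rgb.r)
            h = (rgb.g - rgb.b) / delta;
        else if (max_c == rgb.g)
            h = (rgb.b - rgb.r) / delta + 2.0;
        else
            h = (rgb.r - rgb.g) / delta + 4.0;

        // Map the six hue sectors onto [0, 1), wrapping negative values
        h = frac(h / 6.0);
    }

    return float3(h, s, l);
}
```

On targets without real branching, each of those conditionals is likely to be flattened into compare-and-select arithmetic, so every path is paid for on every pixel.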

YCC To The Rescue

Contrast this with the simplicity of the truly linear RGB->YCC->RGB transformation. If there’s one thing that the GPU does best, it’s vector-matrix multiplication, and that’s exactly what this boils down to:

float4x4 RGBToYCC = 
{ 0.299,  0.587,  0.114,  0.000,
  0.701, -0.587, -0.114,  0.000,
 -0.299, -0.587,  0.886,  0.000,
  0.000,  0.000,  0.000,  1.000};
 
float4x4 YCCToRGB = 
{ 1.000,  1.000,  0.000,  0.000,
  1.000, -0.509, -0.194,  0.000,
  1.000,  0.000,  1.000,  0.000,
  0.000,  0.000,  0.000,  1.000};
 
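// linear_sampler is assumed to be declared elsewhere and bound to the rendered scene
float4 PS_LDRToneMap(float2 tex_coord : TEXCOORD0) : COLOR
{
    // Fetch the rendered pixel and transform it into YCC; alpha passes through untouched
    float4 RGBA = tex2D(linear_sampler, tex_coord);
    float4 YCCA = mul(RGBToYCC, RGBA);
 
    // Work goes here
 
    // Transform back to RGB and clamp to the displayable [0, 1] range
    RGBA = mul(YCCToRGB, YCCA);
    return saturate(RGBA);
}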

This assembles to a handsome 9 instructions, leaving plenty of room even with the arcane ps_1_4’s instruction limit.
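If you’re wondering where the odd-looking green row of YCCToRGB comes from, here is a short derivation. It assumes the two chroma components are the plain colour differences R − Y and B − Y, which is what the second and third rows of RGBToYCC encode; the symbols C_r and C_b are just shorthand for those two components.

```latex
\begin{aligned}
Y &= 0.299R + 0.587G + 0.114B, \qquad C_r = R - Y, \qquad C_b = B - Y \\
R &= Y + C_r, \qquad B = Y + C_b \\
G &= \frac{Y - 0.299R - 0.114B}{0.587}
   = \frac{Y - 0.299\,(Y + C_r) - 0.114\,(Y + C_b)}{0.587} \\
  &= Y - \frac{0.299}{0.587}\,C_r - \frac{0.114}{0.587}\,C_b
   \approx Y - 0.509\,C_r - 0.194\,C_b
\end{aligned}
```

The three results are the first three rows of YCCToRGB, so the two matrices are inverses of one another up to the rounding in the green coefficients.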

The Prize

My current project makes use of this code to ramp the contrast and saturation up and down, according to the scene. The code is simple, and the results rather dramatic.

// Contrast
YCCA.x -= contrast_midpoint;
YCCA.x *= contrast_gain;
YCCA.x += contrast_midpoint;
 
// Chroma
YCCA.y -= chroma_red_midpoint;
YCCA.y *= chroma_red_gain;
YCCA.y += chroma_red_midpoint;
 
YCCA.z -= chroma_blue_midpoint;
YCCA.z *= chroma_blue_gain;
YCCA.z += chroma_blue_midpoint;
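Since saturation is split across the two chroma components, a single saturation knob can be sketched by scaling both of them by the same gain. This is a minimal illustration rather than what my project does: the helper ApplySaturation and its saturation parameter are hypothetical names, and it assumes the chroma midpoint is zero, which holds for the matrices above because greys (R = G = B) have zero R − Y and B − Y.

```hlsl
// Hypothetical helper for illustration only.
// saturation = 1.0 leaves the colour unchanged, 0.0 gives greyscale,
// and values above 1.0 push colours away from grey.
float4 ApplySaturation(float4 YCCA, float saturation)
{
    // Luma (x) and alpha (w) are untouched; both chroma components
    // are scaled about their zero midpoint by the same amount.
    YCCA.yz *= saturation;
    return YCCA;
}
```

It would be called in the ‘Work goes here’ slot of the pixel shader, for example as YCCA = ApplySaturation(YCCA, 1.25); the final saturate() then clamps anything pushed out of range.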

