Deriving HDR-IPT direct and inverse transformations

Following my work on the filmic tonemapping, several users have reported issues with very saturated blue areas (stage spotlights, bright skies) and red areas. The grail of image processing is being able to affect colors and brightness independently. This is promised by several (too many) colorspaces (L*a*b*, Luv, CIECAM02, etc.) that, unfortunately, fail to deliver. The big conundrum of tonemapping is raising luminance without affecting perceptual colors, and, by color, I mean hue and saturation/chroma. The problem is, once you have lifted the luminance, you need to correct the saturation too, because it will look oversaturated. That triggers the need for a proper gamut mapping in addition to the tonemapping.

This is where the nightmare begins. Again, dealing with saturation or chroma is pretty trivial… except when you want to do so while avoiding hue shifts. These hue shifts can be produced either by the subsequent gamut clipping or by color models that don’t entirely decouple hue and saturation/chroma. There is one little-known color space that seems to do well on the matter of hue linearity: IPT, and its HDR variant, by Fairchild and Chen. Unfortunately, the backward transformations are not given in the paper and are nowhere to be found on the internet. We will derive them here.

Forward transformations

From XYZ at D65 illuminant, to HDR-IPT. This section comes directly from Fairchild’s paper.

Let us denote $Y_s$ the relative luminance of the surround, for which Fairchild recommends a standard assumption of 0.2, and $Y_{abs}$ the absolute luminance of the scene, which can be assumed to be 100 cd/m² when no additional information is provided. Then:

$$ s_f = 1.25-0.25 ×\left(\frac{Y_s}{0.184}\right) $$
$$ l_f = \frac{\log(318)}{\log(Y_{abs})}$$

So, with default values:
$$\epsilon = \frac{0.59}{s_f × l_f} =  0.5213250183194932$$

Then, the Michaelis-Menten function tweaked between $[0;1]$ to fit the original IPT space on top of their data:
$$f(\omega) = 246 × \frac{\omega^\epsilon}{\omega^\epsilon + 2^\epsilon} + 0.02$$

In the above equation, 0.02 is assumed to be the noise level.

Then, we go from XYZ (the color space of the wavelength spectrum) to LMS (the color space of the physiological cone response) by:

$$\begin{bmatrix}L\\M\\S\end{bmatrix}_{HDR} =
\begin{bmatrix}
0.4002 & 0.7075 & -0.0809\\
-0.2280 & 1.1500 & 0.0612\\
0.0 & 0.0 & 0.9184
\end{bmatrix} \begin{bmatrix}X \\ Y \\ Z\end{bmatrix}_{D65}$$

where XYZ coordinates are scaled between $[0; 1]$ and LMS between [-100;100].
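As a quick sanity check of this matrix, here is a Numpy sketch; the D65 white point coordinates used below are the standard values, an assumption of mine, not taken from the article. The matrix being normalized for D65, the white point should land close to $[1, 1, 1]$ in LMS:

```python
import numpy as np

# XYZ -> LMS matrix from the article
XYZ_TO_LMS = np.array([[ 0.4002, 0.7075, -0.0809],
                       [-0.2280, 1.1500,  0.0612],
                       [ 0.0,    0.0,     0.9184]])

# D65 white point scaled to [0; 1] (standard values, assumed here)
xyz_d65 = np.array([0.9504, 1.0, 1.0889])

lms = XYZ_TO_LMS @ xyz_d65
print(lms)  # close to [1, 1, 1]: the matrix is normalized for D65
```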

Now, let’s define an element-wise operation $F(\omega)$ where $\omega$ are the elements of the $[L M S]^T$ vector, such that:

$$F(\omega) = \begin{cases}f(\omega), &\forall \omega \geq 0\\ -f(-\omega), &\forall \omega < 0\end{cases}$$

Or, alternatively:

$$F(\omega) = \text{sign}(\omega) \cdot f(\text{abs}(\omega))$$

which is equivalent. So:

$$ \begin{bmatrix}L’\\M’\\S’\end{bmatrix}_{HDR} = \begin{bmatrix}F(L)\\F(M)\\F(S)\end{bmatrix}_{HDR} $$

Notice here that the final goal is to turn the equations into code. For performance, we want to use SIMD, aka streamlined, systematic operations on full vectors, and the branches that $F(\omega)$ creates with its element-wise checks will either prevent the compiler from auto-vectorizing or harm the performance of the pure-SSE code we might write. So we can describe that transform using a masking approach, more compatible with SIMD vector logic.

Let’s define a masking vector $I$ containing 1 where $[LMS]^T \geq 0$ and 0 elsewhere, masking the positive values. Then, $\overline I$ is the element-wise binary complement of $I$, masking the negative values. Then, redefine $f(\omega)$ as a vectorial function $f(\vec \omega)$. Building $I$ is an element-wise comparison (a slow operation), but all the rest becomes vector operations. Now:

$$ \begin{bmatrix}L’\\M’\\S’\end{bmatrix}_{HDR} = f \left(I \cdot \begin{bmatrix}L\\M\\S\end{bmatrix}_{HDR} \right) - f \left( -\overline{I} \cdot \begin{bmatrix}L\\M\\S\end{bmatrix}_{HDR} \right) $$
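A sketch of this masking logic in Numpy (branchless, like the SIMD version would be), using the default $\epsilon$ from above. The comparison with the sign-based form at the end is my own check, not from the paper:

```python
import numpy as np

EPS = 0.5213250183194932

def f(w):
    # offset Michaelis-Menten function, valid for w >= 0
    return 246 * w**EPS / (w**EPS + 2**EPS) + 0.02

lms = np.array([1.5, -0.3, 0.0])

# I masks the positive entries, (1 - I) the negative ones
I = (lms >= 0).astype(float)
masked = f(I * lms) - f(-(1.0 - I) * lms)

# sign-based formulation, for reference
signed = np.sign(lms) * f(np.abs(lms))

# caveat (my observation): since f(0) = 0.02, the masked form equals the
# sign-based one minus 0.02 * sign(lms); the two coincide exactly once the
# offset is dropped from f, as proposed at the end of the article
```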

Finally:

$$\begin{bmatrix}I\\P\\T\end{bmatrix}_{HDR} =
\begin{bmatrix}
0.4000 & 0.4000 & 0.2000\\
4.4550 & -4.8510 & 0.3960\\
0.8056 & 0.3572 & -1.1628
\end{bmatrix} \begin{bmatrix}L’\\M’\\S’\end{bmatrix}_{HDR}$$

From which we can derive the chrominance as a simple Euclidean norm $C = \sqrt{P^2+T^2}$ and the hue as a simple angle $H = \arctan(\frac{T}{P})$, which then define a cylindrical polar space.
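In code, the hue is better computed with a quadrant-aware arctangent so that $P \leq 0$ does not fold the angle over; a minimal Numpy sketch (the sample P, T values are arbitrary):

```python
import numpy as np

P, T = 3.0, 4.0        # arbitrary opponent coordinates, for illustration
C = np.hypot(P, T)     # chroma: sqrt(P**2 + T**2) -> 5.0
H = np.arctan2(T, P)   # hue angle in radians, quadrant-aware
```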

Of course, by computing a Euclidean norm, we assume the IPT space is an orthonormal basis, meaning complete linear independence between the basis vectors, which is a prerequisite to computing coordinates and projections with scalar products. As usual, Fairchild does not prove it: in so-called color science, you just plug in the formula and shut up, it works! Well, it’s true for believers.

Backward transformations

From IPT-HDR to XYZ at D65 illuminant. The above matrix has, at least, a non-zero determinant, so we can invert it (using Numpy):

$$\begin{bmatrix}L’\\M’\\S’\end{bmatrix}_{HDR}=
\begin{bmatrix}
1. & 0.09756893 & 0.20522643\\
1. & -0.11387649 & 0.13321716\\
1. & 0.03261511 & -0.67688718\\
\end{bmatrix}
\begin{bmatrix}I\\P\\T\end{bmatrix}_{HDR} $$

Doing the dot product of the original matrix by its inverse gives an identity matrix with an L2 error norm of 2.8089328150338845e-16. We are good.
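The check can be reproduced with a few lines of Numpy (a sketch; the exact error norm depends on the platform's float rounding):

```python
import numpy as np

LMS_TO_IPT = np.array([[0.4000,  0.4000,  0.2000],
                       [4.4550, -4.8510,  0.3960],
                       [0.8056,  0.3572, -1.1628]])

IPT_TO_LMS = np.linalg.inv(LMS_TO_IPT)

# the product of the matrix by its inverse should be the identity
error = np.linalg.norm(LMS_TO_IPT @ IPT_TO_LMS - np.eye(3))
print(error)  # on the order of 1e-16
```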

Now, let’s invert the $f(\omega)$ function. Again, Fairchild does not provide the domains of definition of the direct function and its parameters… (Americans suck at maths) So we need to actually check whether it holds the necessary properties of an invertible function.

To be invertible, $f$ needs to be a bijection $\mathbb{R} \rightarrow \mathbb{R}$. From the look of it, it is a kinked sigmoid branch, so we check that, and plot the function just to be sure (using the default parameters from above, that is $\epsilon = 0.52$):

Looking at the form of $\epsilon$, it seems $\epsilon \in \mathbb{R}^+$, meaning $\epsilon$ can be lower than 1, for which $\omega^\epsilon$ has no real solution if $\omega$ is negative. So:

$$\exists \, f(\omega) \, \in \, \mathbb{R}^+,  \, \forall \, \omega \, \in \, \mathbb{R}^{+}, \forall \, \epsilon \, \in \, \mathbb{R}^{*+}$$

hence the trick to deal with negative values of $\omega$; thus the piece-wise function $F$ looks like this:

Oops. The junction between the two branches at $0$ looks suspiciously non-smooth. In fact, checking more closely, the derivative of $f$ is:

$$f'(\omega) = - \frac{246 \epsilon \omega^{2 \epsilon}}{\omega \left(2^{\epsilon} + \omega^{\epsilon}\right)^{2}} + \frac{246 \epsilon \omega^{\epsilon}}{\omega \left(2^{\epsilon} + \omega^{\epsilon}\right)}, \forall \, \omega \, \in \, \mathbb{R}^{*+}, \forall \, \epsilon \, \in \, \mathbb{R}^{+}$$

which is not defined at $0$, so there is no continuity of the slope nor of the curvature at $0$. Joining the two branches like that is kind of careless and could break the smoothness of the tonal gradients around 0 if you don’t pay attention (that kind of thing is important for pixel pushers). Even worse, $F(0^+) = 0.02$ and $F(0^-) = -0.02$, so we have a $0.04$ gap at $0$.

Now, one could think LMS values represent light energy, so negative values should never happen because there is no negative light or negative energy. Well, working in digital raw processing software, that is true only assuming no black level shifting/offsetting has gone berserk earlier in the pipe, which puts quite a few numeric layers between our color space conversion and the physical reality we try to represent.

Besides, looking closely at the transformation matrix XYZ -> LMS, it is neither symmetric nor Hermitian, so it can’t be positive (semi-)definite; to confirm it, it has two complex eigenvalues. In non-maths language, it means perfectly valid (positive) XYZ values close to $0$ can output invalid (negative) LMS vectors, noticeably when Z is greater than Y (dark saturated blues). So we can’t be clever and clip or normalize input XYZ values: we have no way of avoiding dealing with negative values, and the slope/curvature discontinuity they create at $0$ (plus the slowdown of element-wise operations in the program). It seems IPT solves the problem of hue linearity by creating another one. You will need to see for yourself how damaging it could be for your specific application, so be careful to test dark saturated blues and reds.

To invert $f$, we need to isolate the independent variable $\omega$ in the left-hand side and swap it with the dependent variable $y = f(\omega)$. Or just let Sympy do it, because it’s boring calculus and we won’t be smarter for doing it manually:

$$f^{-1} (\omega) = \left(\frac{2^{\epsilon} \left(- 50 \omega + 1\right)}{50 \omega - 12301}\right)^{1 / \epsilon}, \epsilon \, \in \, \mathbb{R}^+$$

Now, we need to establish the domain of definition:

$$\exists \, f^{-1} (\omega) \, \in \mathbb{R}^+ \Leftrightarrow
\begin{cases}\dfrac{2^{\epsilon} \left(- 50 \omega + 1\right)}{50 \omega - 12301} & \geq 0\\ 50 \omega - 12301 &\neq 0 \Rightarrow \omega \neq 246.02 \end{cases}$$

See here why not giving the bounds and domain of definition of the parameters hurts? Since, in the direct transform, $\omega \, \in \, [0;100]$, we have $50 \omega - 12301 \leq 5000 - 12301 < 0$, meaning the denominator is always negative, so the first condition is met only if $2^\epsilon (-50 \omega + 1) \leq 0$. That yields:

$$\exists \, f^{-1} (\omega) \, \in \mathbb{R}^+ \Leftrightarrow
\begin{cases}\omega & \geq 0.02\\ \omega &< 246.02 \end{cases}$$

That’s a very important result that I haven’t found in the literature: during the backward transform $[L’M’S’]^T \rightarrow [LMS]^T$, the values of $[L’M’S’]^T$ are bounded by the offset and the coefficient of $f$. Yet Fairchild says (slide 19) that the maximum lightness is unconstrained (at the input; the whole paper does not seem to care about coming back the other way), and you have to be careful since no out-of-bounds extrapolation is provided. That makes the IPT space not really suitable for luminance editing (although it has not been designed for that, but just in case some people would want to try it).

Then we symmetrize the inverse, the same way as before:

$$F^{-1}(\omega) = \text{sign}(\omega) \cdot f^{-1}(\text{abs}(\omega))$$

Notice that you need to symmetrize the function even if $\omega \leq 1$ (in which case the absolute value is not necessary), because otherwise $f^{-1}$ would always output positive values, while negative values of the LMS coordinates are to be expected if $F^{-1} \circ F(\omega)$ is to be (sort of) an identity.

But now, since $f^{-1}$ is undefined for $\omega < 0.02$, it creates a gap in $F^{-1}$ over $[-0.02; 0.02]$:

This is where the offset in the direct Michaelis-Menten function leads us in the inverse function.

Then, we can see that $F^{-1} \circ F(\omega)$ looks like a no-op:

The forbidden $\omega$ values that output $f(\omega) = 0.02$, and for which $f^{-1}$ is undefined, are limited to $0$.

Finally, the LMS to XYZ transform is a piece of cake to invert:

$$ \begin{bmatrix}X \\ Y \\ Z\end{bmatrix}_{D65} =
\begin{bmatrix}
1.85024294 & -1.13830164 & 0.23883789\\
0.36683078 & 0.64388454 & -0.01059356\\
0. & 0. & 1.08885017
\end{bmatrix}\begin{bmatrix}L\\M\\S\end{bmatrix}_{HDR} $$

Fixing IPT-HDR

This started as a simple algebra project, but, being careful with limits and domains of definition, we uncovered a few traps along the way. Let’s recall where we sit so far:

  1. the XYZ to LMS transformation matrix is not positive definite, so positive XYZ vectors can and will get negative LMS projections,
  2. the Michaelis-Menten function is defined only for positive LMS vectors,
  3. the negative values of LMS are dealt with using a symmetrized MM function,
  4. the junction of the two MM branches is not smooth,
  5. the symmetrization of the MM reciprocal bijection has a discontinuity over $[-0.02;+0.02]$ that might cause loss of signal.

The offset of the Michaelis-Menten function is meant to represent the noise level. From a signal-processing point of view, when you define a noise level, it means everything below is non-data, so we clip it. But here, we clearly have data below $0.02$ and even below $0$ because of the XYZ to LMS transformation, and we certainly don’t clip them, since we symmetrize the MM function. So, the physical justification for the existence of this noise level is a non-sequitur to me, and the mathematics show it’s harmful to the reciprocal bijection.

I generated a 63×63×63 lookup table containing all the possible XYZ vectors having coordinates between 0 and 1, made them take a roundtrip XYZ -> IPT-HDR -> XYZ, and counted the volume of valid input vectors that became NaNs after the roundtrip. I got 99.22 % of valid outputs (output = input ± epsilon), so 0.78 % of the XYZ volume is not transferable losslessly (output = NaN). Some may argue these values are outside the visible locus, and they might be right. The thing is: from a math point of view, the transformation XYZ -> IPT-HDR -> XYZ is not an automorphism; from a signal-processing point of view, it does not conserve the energy of the signal; and from an image-processing point of view, that sucks for pushing pixels.

My proposal is to remove the offset term, which does not exist in the original formulation of the Michaelis-Menten equation (for good reason). Doing so, we will re-adjust its coefficient to minimize the error, considering that the function given by Fairchild matches some experimental measurements. Indeed, Fairchild’s paper shows a lot of work to build a mathematical model matching the visual experimental data, but no concern for the mathematical stability and usability of the model. In that sense, building a one-way colorspace doesn’t seem to bother him.

We use a non-linear least-squares method to optimize the coefficient of the pure Michaelis-Menten function to match the one with an offset. The new coefficient is not much different, $246.06076715$, with a variance of $5.76 \cdot 10^{-10}$. The new function is then:

$$f(\omega) = 246.06076715 × \frac{\omega^\epsilon}{\omega^\epsilon + 2^\epsilon}, \, \omega \, \in \, \mathbb{R}^+, \epsilon \, \in \, \mathbb{R}^+$$

This function can then be inverted on $[0; 246.06[$, which is still not an endomorphism but is better, since it has no discontinuity around zero:

$$ f^{-1}(\omega) =  \left(- \frac{2 \cdot 10^7 \cdot 2.0^{\epsilon} \omega}{2 \cdot 10^7 \omega – 4921215343}\right)^{\frac{1}{\epsilon}}, \, \omega \, \in \, [0; 246.06[, \epsilon \, \in \, \mathbb{R}^+ $$

And now this function is safe to use in IPT-HDR as a working color space, where no pixel pushing will lead to undefined values while going back to XYZ. Also, the direct function will be a little bit less harmful to gradient continuity around zero. We can re-run the same LUT benchmark as before to confirm everything behaves on 100 % of the XYZ volume (see the code below).

Conclusion

This article reflects the epistemological problem I have with color research and much other research in image processing: it’s not enough to stack complicated computations on top of each other to be a legitimate science. First of all, calculus is not maths; calculus is the far-end technicality of maths. So, defining a color space with the sole purpose of matching experimental datasets is being careless about the fact that a color space is first and foremost a vector space that needs to be mathematically stable both ways, meaning having proper forward and backward base transformations.

Then, new color spaces pop up every once in a while, always more or less the same: 14 to 18 undergrad students from a western-country university sort out color patches based on their perceptual color difference for their teacher, who numerically fits the experimental data and the spectral measurements with some regression, derives transfer functions and matrices, bakes all that into a color model, and we have yet another color model describing how western undergrads see colors. Well done!

Finally, where are the linear independence tests for the basis vectors of the color space? Where are the boundary analyses and domains of validity of the functions? Where are the issues of using non-positive and non-definite matrices explained? (aka, am I the only one perturbed by the fact that positive XYZ vectors can produce negative cone responses in LMS vectors?) Where are the warnings about the nasty calculus workarounds performed to deal with artificially-created negative scalars? (yes, discontinuity of the slope and of the curvature is a problem, maybe hidden deep inside black point adjustments and such, but a problem nonetheless). Where is the goddam mathematical due process?

I have found them nowhere. But look at the Munsell colors: it’s a linear function when we draw their representation in name-of-our-better-space!

*Sigh*

Bonus

Here is some Python/Numpy/Scipy code used for the computations (copy-pasted as is in case you want to check):

The Michaelis-Menten function (notice we need to write our own sign function, since Numpy’s one returns 0 for x == 0):
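A reconstruction consistent with the equations above (the constants are the defaults derived earlier; function names are mine):

```python
import numpy as np

EPS = 0.5213250183194932  # default epsilon derived above

def sign(x):
    # Numpy's sign() returns 0 for x == 0, which would kill the offset term,
    # so we write our own strict +/-1 version
    return np.where(np.asarray(x) >= 0, 1.0, -1.0)

def MM(x):
    # Fairchild's offset Michaelis-Menten function, symmetrized over R
    ax = np.abs(x)
    return sign(x) * (246.0 * ax**EPS / (ax**EPS + 2.0**EPS) + 0.02)
```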

Get its derivative and invert it through Sympy:
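A reconstruction of that derivation; I solve for the intermediate $u = \omega^\epsilon$, which keeps the equation linear in $u$ and Sympy happy (the symbol names are mine):

```python
import sympy as sp

w, y, u, eps = sp.symbols('omega y u epsilon', positive=True)

# Fairchild's offset MM function (0.02 written exactly as 1/50)
f = 246 * w**eps / (w**eps + 2**eps) + sp.Rational(1, 50)

# derivative of the direct function
fprime = sp.simplify(sp.diff(f, w))

# inverse: solve for u = omega**eps, then take the 1/eps root
sol_u = sp.solve(sp.Eq(y, 246 * u / (u + 2**eps) + sp.Rational(1, 50)), u)[0]
f_inverse = sol_u**(1 / eps)
```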

Write it into Numpy:
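Again a reconstruction, matching the closed form of $f^{-1}$ obtained above (valid for $0.02 \leq |\omega| < 246.02$):

```python
import numpy as np

EPS = 0.5213250183194932

def sign(x):
    return np.where(np.asarray(x) >= 0, 1.0, -1.0)

def MM(x):
    # direct offset MM function, symmetrized
    ax = np.abs(x)
    return sign(x) * (246.0 * ax**EPS / (ax**EPS + 2.0**EPS) + 0.02)

def MM_inverse(y):
    # inverse of the offset MM function, undefined for |y| < 0.02
    ay = np.abs(y)
    return sign(y) * (2.0**EPS * (1.0 - 50.0 * ay) / (50.0 * ay - 12301.0))**(1.0 / EPS)
```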

Find the best coefficient to match Fairchild’s MM function with a pure MM function (without the offset):
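A reconstruction using Scipy's `curve_fit`; note that the exact optimum (the article reports $246.06076715$) depends on the sampling grid, which I can only guess, so this sketch checks the ballpark rather than that exact figure:

```python
import numpy as np
from scipy.optimize import curve_fit

EPS = 0.5213250183194932

def MM_offset(x):
    # Fairchild's function, with the 0.02 offset
    return 246.0 * x**EPS / (x**EPS + 2.0**EPS) + 0.02

def MM_pure(x, a):
    # pure Michaelis-Menten, coefficient a to be fitted
    return a * x**EPS / (x**EPS + 2.0**EPS)

# fit the pure function on top of the offset one (sampling grid assumed)
x = np.linspace(0.0, 100.0, 10000)
popt, pcov = curve_fit(MM_pure, x, MM_offset(x), p0=[246.0])
print(popt[0], pcov[0, 0])  # coefficient slightly above 246, tiny variance
```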

The corrected pure MM functions:
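A reconstruction of the corrected pair, with the offset removed and the re-fitted coefficient:

```python
import numpy as np

EPS = 0.5213250183194932
COEF = 246.06076715  # re-fitted coefficient from the least-squares step

def sign(x):
    return np.where(np.asarray(x) >= 0, 1.0, -1.0)

def MM_pure(x):
    # offset-free Michaelis-Menten, symmetrized, continuous at 0
    ax = np.abs(x)
    return sign(x) * COEF * ax**EPS / (ax**EPS + 2.0**EPS)

def MM_pure_inverse(y):
    # inverse, defined for |y| < 246.06...
    ay = np.abs(y)
    return sign(y) * (2.0**EPS * ay / (COEF - ay))**(1.0 / EPS)
```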

Build a 63×63×63 lookup table and print the xyY coordinates of the vectors for which the whole XYZ -> IPT -> XYZ roundtrip is not a no-op (i.e. input ≠ output):
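A reconstruction of the benchmark; for brevity, this sketch counts the vectors that fail the roundtrip instead of printing their xyY coordinates, using the matrices and functions derived above:

```python
import numpy as np

EPS = 0.5213250183194932

XYZ_TO_LMS = np.array([[ 0.4002, 0.7075, -0.0809],
                       [-0.2280, 1.1500,  0.0612],
                       [ 0.0,    0.0,     0.9184]])
LMS_TO_IPT = np.array([[0.4000,  0.4000,  0.2000],
                       [4.4550, -4.8510,  0.3960],
                       [0.8056,  0.3572, -1.1628]])
LMS_TO_XYZ = np.linalg.inv(XYZ_TO_LMS)
IPT_TO_LMS = np.linalg.inv(LMS_TO_IPT)

def sign(x):
    return np.where(x >= 0, 1.0, -1.0)

def MM(x):
    ax = np.abs(x)
    return sign(x) * (246.0 * ax**EPS / (ax**EPS + 2.0**EPS) + 0.02)

def MM_inverse(y):
    ay = np.abs(y)
    with np.errstate(invalid='ignore'):
        # |y| slipping below 0.02 through float rounding makes the radicand
        # negative, hence the NaNs counted below
        return sign(y) * (2.0**EPS * (1.0 - 50.0 * ay) / (50.0 * ay - 12301.0))**(1.0 / EPS)

# 63x63x63 grid of XYZ vectors with coordinates in [0; 1]
g = np.linspace(0.0, 1.0, 63)
XYZ = np.stack(np.meshgrid(g, g, g, indexing='ij'), axis=-1).reshape(-1, 3)

# roundtrip XYZ -> IPT-HDR -> XYZ
IPT = MM(XYZ @ XYZ_TO_LMS.T) @ LMS_TO_IPT.T
XYZ_back = MM_inverse(IPT @ IPT_TO_LMS.T) @ LMS_TO_XYZ.T

valid = np.all(np.isclose(XYZ_back, XYZ, atol=1e-6), axis=1)
print(f"{100.0 * valid.mean():.2f} % of the XYZ volume survives the roundtrip")
```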

Output: