A report by Georg Petschnigg, June 2002

Figures: noisy ISO 1600 image; noise removed using FlashSharpen.

Taking pictures in low-light environments, where the scene illumination is part of the appeal of the shot, is tricky without a tripod. Imagine a candlelit dinner, a sculpture garden at night, a thoughtfully lit interior, or a glowing holiday tree. Using a flash for these pictures ruins the ambient lighting, and tripods are cumbersome to use. As digital camera technology improves, cameras are able to take snapshots at higher ISO settings. Most of these so-called high-ISO images, however, exhibit significant noise. FlashSharpen is an algorithm that uses a flash image's noise characteristics to improve a noisy natural-light image. Given a camera that can take two images in rapid succession, one with flash and one without, this algorithm can be used to improve low-light photography.

Description

The motivation behind FlashSharpen was developed in my Computer Vision project. The basic idea of FlashSharpen is to combine a flash image's high-frequency components with a non-flash image's low-frequency components.

Essentially, FlashSharpen uses the high-frequency components of the flash image to construct a plausible, noise-reduced high-frequency band for the non-flash image. The compositing operation can be expressed as follows:

noflash_new_highfrequency = noflash_highfrequency + (flash_highfrequency - noflash_highfrequency) * alpha;

or written as the OVER operator for frequency images:

noflash_new_highfrequency = noflash_highfrequency * (1 - alpha) + flash_highfrequency * alpha;

The final image is constructed as follows:

noflash_noise_enhanced = noflash_new_highfrequency + noflash_lowfrequency
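
As a quick sanity check, the two ways of writing the compositing step can be compared numerically. The sketch below uses made-up sample values for three pixels (not taken from any real image), and the short variable names hf_nf, hf_f, and lf_nf are assumptions chosen only for illustration.

```matlab
% Hypothetical sample values for three pixels (not from a real image).
hf_nf = [ 0.10 -0.05  0.20];   % noisy high frequencies of the non-flash image
hf_f  = [ 0.02 -0.01  0.15];   % cleaner high frequencies of the flash image
lf_nf = [ 0.40  0.50  0.30];   % low frequencies of the non-flash image
alpha = [ 1.00  0.50  0.00];   % per-pixel blending weight

% Interpolation form and OVER form of the same blend.
hf_lerp = hf_nf + (hf_f - hf_nf) .* alpha;
hf_over = hf_nf .* (1 - alpha) + hf_f .* alpha;

% Final image: blended high frequencies plus non-flash low frequencies.
result = hf_lerp + lf_nf;
```

With alpha = 1 the pixel takes its detail entirely from the flash image, and with alpha = 0 it keeps the non-flash detail unchanged.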

The main intellectual challenge is how to compute the blending operator alpha so that the result looks pleasant to the human eye. The difficulty is that shadows and specular highlights present in the flash image create unwanted edges, which translate into unwanted high frequencies. I spent almost the entire quarter attempting to solve this problem using Bayesian techniques and color segmentation, none of which yielded desirable results. Most of these efforts were misguided by a desire to "detect" artifacts such as shadows and speculars and to prevent compositing in those regions. The current solution performs a frequency domain analysis and does not combine the frequency bands where they strongly disagree.

The algorithm works as follows:

1. Split the two images into their respective low and high pass components
% Low Frequencies
f_l = blur_using_gaussian(img_flash, SIGMA);
nf_l = blur_using_gaussian(img_noflash, SIGMA);

% High Frequencies
f_hf = img_flash - f_l;
nf_hf = img_noflash - nf_l;

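% Helper: normalized 1-D Gaussian kernel, truncated at +/- 3 sigma
function G = gauss(sig)
x = floor(-3*sig):ceil(3*sig);   % integer sample positions
G = exp(-0.5*x.^2/sig^2);        % unnormalized Gaussian weights
G = G/sum(G);                    % normalize so the weights sum to 1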

The main challenge here is how to pick the filter width SIGMA for the Gaussian. The noisier an image, the greater SIGMA should be, because more of the non-flash image's high-frequency band has to be removed. I found that a value of 2 works for most images, and larger values are needed as the image quality degrades. See the Very Noisy Image, SIGMA = 10, and SIGMA = 30. SIGMA sets the low-pass filter's cut-off frequency: the larger SIGMA, the lower the cut-off.
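
The helper blur_using_gaussian is not listed in this report. A minimal sketch of what it might look like, assuming a single-channel image with values between 0 and 1, is a separable convolution with the same Gaussian kernel as gauss. The anonymous-function form and the renormalization near the image borders are assumptions of this sketch, not necessarily what the original code does. A color image would have to be processed one channel at a time.

```matlab
% Sketch only: one possible blur_using_gaussian for a 2-D (grayscale) image.
% Same kernel as gauss(sig), written as anonymous functions.
kernel  = @(sig) exp(-0.5 * (floor(-3*sig):ceil(3*sig)).^2 / sig^2);
gauss1d = @(sig) kernel(sig) / sum(kernel(sig));

% Separable blur: filter the columns, then the rows. Dividing by the blurred
% all-ones image compensates for the zero padding that conv2 applies at the
% borders (an assumption of this sketch).
blur_using_gaussian = @(img, sig) ...
    conv2(gauss1d(sig)', gauss1d(sig), img, 'same') ./ ...
    conv2(gauss1d(sig)', gauss1d(sig), ones(size(img)), 'same');

% Small synthetic test image: a horizontal ramp from 0 to 1.
SIGMA = 2;
img = repmat(linspace(0, 1, 32), 32, 1);

low  = blur_using_gaussian(img, SIGMA);   % low frequencies
high = img - low;                         % high frequencies
```

By construction, low + high reproduces the input, so the split itself loses no information. Only the later blending changes the image.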

2. Compute the cost function
Now I compute a cost function that prevents disagreeing frequency components from being combined. My proposed measure is the absolute difference between each pixel's high-frequency component and its counterpart in the other image.

%Cost function
cost = abs(f_hf - nf_hf);

The cost is then scaled to the range 0 to 1.

max_difference = max(cost(:));   % Find largest cost
cost = cost ./ max_difference;   % Normalize to 0 - 1 range
cost = 1 - cost;                 % Invert for compositing step
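
To make the normalization concrete, here is a small worked example on four hypothetical pixels (the values are invented for illustration). The guard against a zero maximum is an addition of this sketch: if both high-frequency bands were identical, the division would otherwise produce 0/0.

```matlab
% Hypothetical high-frequency values for four pixels.
f_hf  = [ 0.20  0.10 -0.30  0.05];   % flash image
nf_hf = [ 0.18  0.00  0.10  0.05];   % non-flash image

cost = abs(f_hf - nf_hf);            % [0.02 0.10 0.40 0.00]
max_difference = max(cost(:));       % 0.40

% Guard (an addition of this sketch): identical bands would give 0/0.
if max_difference > 0
    cost = cost ./ max_difference;   % normalize to the 0 to 1 range
end
cost = 1 - cost;                     % 1 = full agreement, 0 = worst disagreement
```

After inversion the values are [0.95 0.75 0 1]: the third pixel, where the two bands disagree most, receives no flash detail at all, while the fourth, where they agree exactly, is eligible for full blending.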


Finally, we have to weight the cost by how bad it would look to the human eye if we were to blend a pixel at that particular location. Ideally this would be a function with some basis in the physiology of vision, but for the time being I use a power function: the cost is raised to an exponent. This weights small differences more favorably than large ones. The exponent also determines which image is favored. Because the inverted cost lies between 0 and 1, an exponent greater than one lowers it and favors the non-flash image, while an exponent less than one raises it and favors the flash image. In my case I automatically generate an exponent that maps the current mean cost to 0.6, which brings the mean of the weighted cost to roughly 0.6. The weighted cost directly translates into alpha.

mean_cost  = mean(cost (:));
exponent = log(0.6) / log(mean_cost);
cost = cost .^ exponent;
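
The effect of the automatic exponent can be checked on a few hypothetical cost values (invented for illustration, already normalized and inverted). The names remapped, mapped_mean, and new_mean are assumptions of this sketch.

```matlab
% Hypothetical inverted, normalized cost values (1 = bands agree).
cost = [0.90 0.80 0.95 0.20];

mean_cost = mean(cost(:));                 % 0.7125
exponent  = log(0.6) / log(mean_cost);     % about 1.51
remapped  = cost .^ exponent;

% By construction the old mean value maps exactly to 0.6 ...
mapped_mean = mean_cost ^ exponent;
% ... while the mean of the remapped costs only lands near that target.
new_mean = mean(remapped(:));
```

In this example the exponent comes out above one, so every cost value is pulled down toward zero. Note that the formula breaks down if the mean cost is exactly 0 or 1, since the logarithm is then infinite or zero. A robust implementation would probably need to handle those cases separately.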


3. Final Composition using cost function
In the final step we combine the images using the OVER operator described above.

nf_new_hf = nf_hf + (f_hf - nf_hf) .* cost;
fused_image = nf_l + nf_new_hf;

Since this operation is not energy preserving, out-of-bound values have to be clamped to zero or one, respectively.

fused_image(fused_image < 0) = 0.0;
fused_image(fused_image > 1) = 1.0;
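
The three steps can be strung together into one short script. The sketch below runs them on synthetic data: the test scene, the deterministic stand-in for sensor noise, and the border-renormalized blur are all assumptions made so that the example is self-contained and repeatable. It is not the original implementation and says nothing about results on real photographs.

```matlab
% End-to-end sketch on synthetic data (all inputs below are made up).
SIGMA = 2;
[x, y] = meshgrid(linspace(0, 1, 64));

% "Scene": a bright square on a gray background.
scene = 0.3 + 0.4 * double(x > 0.3 & x < 0.7 & y > 0.3 & y < 0.7);

% Deterministic stand-in for sensor noise, so the run is repeatable.
noise = 0.05 * sin(97 * x + 61 * y) .* cos(83 * x - 45 * y);

img_flash   = scene;                  % clean flash exposure
img_noflash = 0.8 * scene + noise;    % dimmer, noisy ambient exposure

% Gaussian blur with border renormalization (an assumption of this sketch).
g = exp(-0.5 * (floor(-3*SIGMA):ceil(3*SIGMA)).^2 / SIGMA^2);
g = g / sum(g);
blur = @(img) conv2(g', g, img, 'same') ./ conv2(g', g, ones(size(img)), 'same');

% Step 1: split into low and high frequencies.
f_l  = blur(img_flash);
nf_l = blur(img_noflash);
f_hf  = img_flash   - f_l;
nf_hf = img_noflash - nf_l;

% Step 2: cost function, normalized, inverted, and remapped.
cost = abs(f_hf - nf_hf);
cost = 1 - cost ./ max(cost(:));
exponent = log(0.6) / log(mean(cost(:)));
cost = cost .^ exponent;

% Step 3: composite and clamp to the valid range.
nf_new_hf   = nf_hf + (f_hf - nf_hf) .* cost;
fused_image = nf_l + nf_new_hf;
fused_image(fused_image < 0) = 0.0;
fused_image(fused_image > 1) = 1.0;
```

The low frequencies of the output come only from the non-flash image, so the overall brightness and ambient look follow that exposure. The flash image contributes only through the blended high-frequency band.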

Results

After spending so much time on this problem I am particularly pleased by the results. See them here.
Download Matlab Code here (coming soon).

Discussion

If the presented algorithm were implemented in a digital camera, the flash could be used as an active sensor to denoise low ambient light images. With the arrival of JPEG 2000, which stores images in a wavelet tree, the compositing operation described above could be performed in real time on compressed data, with direct feedback to users. This would facilitate picking a blending exponent (paired with a look-up table) via user interaction.

This algorithm relies on semi-decent-looking input images, but scales fully as hardware and image quality improve.

Copyright © 2002 Georg Petschnigg