Saturday, April 28, 2007

Better Video -- Gamma and A/D converters

Okay folks, this is going to get somewhat detailed, because I think I have a half-decent and possibly new idea here. As you read along, just keep in mind that the overall goal is to make a ramp-compare A/D converter that has really large dynamic range (16 bits) and goes fast so we can minimize the frame scan time.
[The referenced Wikipedia article has some significantly wrong bits. Just reading the articles it refers to shows the problems.]
First we're going to talk about gamma. Most digital sensors generate digital values from the A/D converter which are linear with the amount of light received by the pixel. One of the first steps in the processing pipeline is to convert this linear value (12 bits, say) into a nonlinear 8-bit value. You might wonder why we would go to all that trouble to get 12 bits off the sensor, only to throw away 4 of them.

Consider just four sources of noise in the image for a moment:
  1. Readout noise. This noise is pretty much constant across varying light levels. For the purposes of discussion, let's suppose we have a standard deviation of 20 electrons of readout noise.
  2. kTC noise. Turn off a switch to a capacitor, and you unavoidably sample the state of the electrons diffusing back and forth across the switch. What you are left with is kTC noise, e.g. 28 electrons in a 5 fF well at 300 K. Correlated double sampling (described below) can cancel this noise.
  3. Photon shot noise. This rises as the square root of the electrons captured.
  4. Quantization noise. This is the difference between the true signal and what the digital system is able to represent. Standard deviation is the step size between quantization levels divided by the square root of 12 (about 0.29 of a step).
You can't add standard deviations but you can add variances. To add these noise sources, take the square root of the sum of the squares. So, if we have a sensor (such as the Kodak KAF-18000) with 20 electrons of readout noise, and a full well capacity of 100,000 electrons, read by a 14 bit A/D with a range that perfectly matches the sensor, then we will see total noise which is dominated by photon shot noise. I've done a spreadsheet which lets you see this here.
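As a rough illustration of adding noise in quadrature, here is a minimal Python sketch using the example numbers above (20 e- readout noise, 100,000 e- full well, 14-bit A/D). The function name and its parameters are made up for this example, and the quantization term uses the standard step/sqrt(12) result for uniform quantization error. It is not the spreadsheet, just the same arithmetic under those assumptions.

```python
import math

def total_noise_electrons(signal_e, read_noise_e=20.0, full_well_e=100_000,
                          adc_bits=14, ktc_noise_e=0.0):
    """Combine independent noise sources in quadrature (all in electrons)."""
    shot = math.sqrt(signal_e)               # photon shot noise
    step = full_well_e / 2 ** adc_bits       # electrons per A/D step
    quant = step / math.sqrt(12)             # uniform quantization noise
    # Variances add; standard deviations do not.
    return math.sqrt(read_noise_e ** 2 + ktc_noise_e ** 2 + shot ** 2 + quant ** 2)

for signal in (100, 1_000, 10_000, 100_000):
    noise = total_noise_electrons(signal)
    print(f"{signal:>7} e-  total {noise:6.1f} e-  shot only {math.sqrt(signal):6.1f} e-")
```

With these numbers the shot-noise term is the largest one once the signal is above about 400 electrons (where the square root of the signal exceeds the 20 e- readout noise).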

Amazingly enough, we can represent the response of this sensor in just 7 bits without adding significant quantization noise. This is why an 8-bit JPG output from a camera with a 12-bit A/D converter really can capture nearly all of what the sensor saw. JPG uses a standard gamma value, which is tuned for visually pleasing results rather than optimal data compression, but the effect is similar. 8-bit JPG doesn't have quite the dynamic range of today's sensors, but it is pretty good.

The ramp-compare A/D converters described in my last blog entry work by comparing the voltage to be converted to a reference voltage which increases over time. When the comparator says the voltages are equal, the A/D samples a digital counter which rises along with the analog reference voltage. Each extra bit of precision doubles the time taken to find the voltage value. When we realize that much of that time is spent discerning small differences in large signal values that will subsequently be ignored, the extra time spent seems quite wasteful.
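To make the cost of each extra bit concrete, here is a small Python sketch of a linear ramp-compare conversion. It is a software toy under simple assumptions (an ideal comparator, one ramp step per clock); `ramp_compare` and its arguments are names invented for this example.

```python
def ramp_compare(voltage, full_scale, bits):
    """Return (code, compares) for an idealized linear ramp-compare conversion."""
    levels = 2 ** bits
    step = full_scale / levels
    for count in range(levels):
        reference = (count + 1) * step   # ramp rises one step per clock
        if reference > voltage:          # comparator trips, latch the counter
            return count, count + 1
    return levels - 1, levels            # input at or above full scale

# Worst case (a full-scale input) takes 2**bits compares, so each bit doubles it.
for bits in (8, 10, 12):
    code, compares = ramp_compare(1.0, 1.0, bits)
    print(bits, "bits:", compares, "compares")
```

In a real sensor every column's comparator watches the same ramp, so the whole ramp has to run to the top regardless of any one pixel's value; the worst-case count is the one that matters.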

Instead of having the reference voltage linearly ramp up, we could have the reference voltage exponentially ramp up, so that the A/D converter would generate the 8b values from the gamma curve directly. The advantage would be that the ramp could take 2^8=256 compares instead of, say, 2^12=4096 compares -- a lot faster!

It's not quite so easy, however. In order to eliminate kTC noise, the A/D converter actually makes two measurements: one of the pixel value after reset (which has sampled kTC noise), and another of the pixel value after exposure (which has the same sample of kTC noise plus the accumulated photoelectrons). Because the kTC sample is the same, the difference between the two has no kTC noise. This technique is called correlated double sampling (CDS), and it is essential. Because gamma-coded values are nonlinear, there is no easy way to subtract them -- you have to convert to linear space, then subtract, then convert back. As I mentioned, for a typical 5 fF capacitance, kTC noise at room temperature is 28 electrons, so this can easily dominate the noise in low illumination operation.
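A short Python sketch of the two points in that paragraph: the size of kTC noise, and why the subtraction only works on linear values. The function names are invented for this example, and `toy_gamma` is a plain square root standing in for a gamma curve (an assumption, not any real camera's encoding).

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # coulombs

def ktc_noise_electrons(capacitance_f, temperature_k=300.0):
    """RMS kTC (reset) noise on a capacitor, expressed in electrons."""
    return math.sqrt(BOLTZMANN * temperature_k * capacitance_f) / ELECTRON_CHARGE

def correlated_double_sample(reset_reading, exposed_reading):
    """Subtract the reset reading; both must be in linear units."""
    return exposed_reading - reset_reading

def toy_gamma(linear):
    """Stand-in nonlinear encoding (square root); not a real gamma curve."""
    return math.sqrt(linear)

print(round(ktc_noise_electrons(5e-15), 1))   # RMS electrons for 5 fF at 300 K

ktc_offset = 25           # one particular frozen kTC sample, in electrons
photoelectrons = 1000
reset = ktc_offset
exposed = ktc_offset + photoelectrons

print(correlated_double_sample(reset, exposed))   # linear: the offset cancels
print(toy_gamma(exposed) - toy_gamma(reset))      # encoded difference
print(toy_gamma(photoelectrons))                  # what we actually wanted
```

The last two lines print different numbers, which is the whole problem: subtracting in the encoded domain does not give the encoding of the difference.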

So what we need is an A/D that produces logarithmically encoded values that are easy to subtract. That's easy -- floating point numbers!

If we assume we have a full well capacity of 8000 electrons and we want the equivalent of 10b dynamic range but only need 6b of precision, then the floating-point ramp-compare A/D does the following:
Mantissa 6 bits, 8 e- step size
64 steps of 8 e- to 512 electrons, measure kTC noise
64 steps of 8 e- to 512 electrons
32 more steps of 16 e- to 1024
32 more steps of 32 e- to 2048
32 more steps of 64 e- to 4096
32 more steps of 128 e- to 8192

That's just 256 compares, and gets 10b dynamic range, so it's 4x faster than a normal 10b ramp-compare, which needs 2^10=1024 compares for a single ramp.
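The step schedule above can be sketched in Python as a piecewise-linear ramp. This is only a model of the counting, with invented names (`SEGMENTS`, `ramp_levels`, `convert`); it reports the result directly as a linear electron count, which is what a mantissa-and-exponent code would decode to, so the reset reading can simply be subtracted.

```python
# (number of steps, electrons per step) for the signal ramp described above
SEGMENTS = [(64, 8), (32, 16), (32, 32), (32, 64), (32, 128)]

def ramp_levels(segments=SEGMENTS):
    """Return every reference level (in electrons) the ramp visits, in order."""
    levels, level = [], 0
    for steps, size in segments:
        for _ in range(steps):
            level += size
            levels.append(level)
    return levels

def convert(electrons, levels):
    """Return (linear_value, compares): the last level at or below the input."""
    previous = 0
    for compares, level in enumerate(levels, start=1):
        if level > electrons:            # comparator trips
            return previous, compares
        previous = level
    return previous, len(levels)         # input at or above the top of the ramp

signal_ramp = ramp_levels()
reset_ramp = signal_ramp[:64]            # 64 steps of 8 e- for the kTC reading

print(len(reset_ramp) + len(signal_ramp), "compares in total")

reset_value, _ = convert(30, reset_ramp)       # reset level, kTC sample included
signal_value, _ = convert(1000, signal_ramp)   # exposed level
print(signal_value - reset_value, "electrons after subtraction")
```

The step size doubles each time the level doubles, so the relative precision stays between 5 and 6 bits over the upper segments while the compare count grows by only 32 per octave.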

In the last blog post, I described how you could do sequential, faster exposures per pixel to get increased dynamic range (in highlights, not shadow, of course). For example, each faster exposure might be 1/4 the time of the exposure before. The value from one of these faster exposures would only be used if the well had collected between 2000 and 8000 electrons, since if there are fewer electrons the next longer exposure would be used for more accuracy, and if there are more electrons the well is saturated and inaccurate.
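One way to picture the selection rule is the following Python sketch. It assumes an 8000 e- full well and a 4:1 ratio between successive exposures, as in the example; `combine_exposures` is a hypothetical helper, and it ignores noise and any calibration a real sensor would need.

```python
FULL_WELL = 8000   # electrons
RATIO = 4          # each later exposure is 4x shorter than the one before

def combine_exposures(readings, full_well=FULL_WELL, ratio=RATIO):
    """Pick the longest unsaturated exposure and scale it to base-exposure units.

    readings[0] is the base exposure; readings[i] was exposed ratio**i times
    shorter. Values are electron counts, clipped at full_well when saturated.
    """
    for index, electrons in enumerate(readings):
        if electrons < full_well:
            return electrons * ratio ** index
    # Every exposure saturated: report the largest representable value.
    return full_well * ratio ** (len(readings) - 1)

print(combine_exposures([500, 125, 31, 8]))        # dim pixel: base exposure used
print(combine_exposures([8000, 8000, 3000, 750]))  # bright pixel: third exposure used
```

Because the longest unsaturated exposure is always chosen, a faster exposure is only ever used when the one before it saturated, which puts its own reading at 2000 electrons or more.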

One nice thing about having a minimum of 2000 electrons in the signal you are sampling is that the signal-to-noise ratio will be around 40, limited mainly by photon shot noise. kTC noise will be swamped, so there is no need for correlated double sampling for these extra exposures. 40:1 is a good SNR. For comparison, you can read tiny white-on-black text through a decent lens with just 10:1 SNR.
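The "around 40" figure can be checked with a few lines of Python. This sketch reuses the earlier example values of 20 e- readout noise and 28 e- kTC noise as assumptions; `snr` is a name invented here.

```python
import math

def snr(signal_e, read_noise_e=0.0, ktc_noise_e=0.0):
    """Signal-to-noise ratio with shot, readout and (uncancelled) kTC noise."""
    noise = math.sqrt(signal_e + read_noise_e ** 2 + ktc_noise_e ** 2)
    return signal_e / noise

print(round(snr(2000), 1))           # shot noise only
print(round(snr(2000, 20, 28), 1))   # plus readout noise and uncancelled kTC noise
print(round(snr(8000, 20, 28), 1))   # top of the well
```

At 2000 electrons the two cases land at roughly 45 and 35, which brackets the figure quoted above, and skipping correlated double sampling costs only a modest amount of SNR at these signal levels.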

If you make the ratio between exposures larger, say 8:1, then you either lose SNR at the bottom portion of the subsequent exposures, or you need a larger well capacity, and in either case the A/D conversion will take more steps. These highlight exposures are very quick to convert because they don't need lots of high-precision LSB steps.

When digitizing the faster exposures, the ramp-compare A/D converters just do:
64 steps of 64 e- to 4096
32 more steps of 128 e- to 8192

That's 96 compares and gets another 2 bits of dynamic range.

1 base exposure and 3 such faster exposures would give 16b equivalent dynamic range in 544 compares (256 + 3 x 96), which is faster than the 10b linear ramp-compare A/D converters used by Micron and Sony. Now, as I said in my previous post, this is a dream camera, not a reality. There is a lot of technical risk in this A/D scheme. These ADCs are very touchy devices. For example, 8000 electrons on a 5 fF capacitor is just 0.256 volts, and resolving the 8 e- steps requires distinguishing 0.256 millivolt signal levels. If the compare rate is 50 MHz, you get just 20 ns to make that quarter-millivolt distinction. It's tough.
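The arithmetic in that paragraph is easy to reproduce. Here is a minimal Python sketch, with invented names, that tallies the compares and converts electron counts to volts on an assumed 5 fF capacitance using V = Q/C.

```python
ELECTRON_CHARGE = 1.602176634e-19   # coulombs

def electrons_to_volts(electrons, capacitance_f=5e-15):
    """Voltage produced by a number of electrons on a capacitor (V = Q / C)."""
    return electrons * ELECTRON_CHARGE / capacitance_f

base_compares = 64 + 64 + 4 * 32    # reset ramp + signal ramp of the base exposure
highlight_compares = 64 + 32        # each faster exposure
total_compares = base_compares + 3 * highlight_compares

compare_rate_hz = 50e6
compare_period_ns = 1e9 / compare_rate_hz

print(total_compares, "compares")
print(f"full well: {electrons_to_volts(8000):.3f} V")
print(f"smallest step: {electrons_to_volts(8) * 1e3:.3f} mV")
print(f"time per compare: {compare_period_ns:.0f} ns")
print(f"total conversion: {total_compares * compare_period_ns / 1e3:.2f} us")
```

The last line is just compares times the clock period; it ignores any settling or reset time between ramps, which a real design would have to budget for.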

But, the bottom line is that this scheme can deliver a wall of A/Ds which can do variable dynamic range with short conversion times. The next post will show how we'll use these to construct a very high resolution, high sensitivity, high frame rate sensor for reasonable cost.