Friday, April 20, 2007

Better Video -- A/D converters

Most of what I shoot with my camera is my kids and extended family, vacations and so forth. I need a better camera, one that can do DSLR quality stills, and also better-than-HDTV video. I'm going to write about a few things I'd like to see in that better camera.

Wall of A/D Converters

Modern DSLRs have a focal plane shutter which transits the focal plane in 4 to 5 ms. This shutter is rated for only 200k to 300k operations, or at most about 166 minutes (under three hours) of 30 fps video, so it's incompatible with video camera operation.
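The lifetime figure is easy to check. This is a minimal sketch that assumes one shutter actuation per frame at 30 fps. The frame rate is an assumption, since the paragraph above doesn't name one.

```python
# Rough shutter-lifetime arithmetic: how much video would a mechanical
# focal-plane shutter survive if it fired once per frame?
# Assumption: one actuation per frame, 30 fps.

def shutter_life_minutes(actuations: int, fps: float) -> float:
    """Minutes of video before the rated actuation count is used up."""
    return actuations / fps / 60.0

for rating in (200_000, 300_000):
    minutes = shutter_life_minutes(rating, 30)
    print(f"{rating:,} actuations at 30 fps: {minutes:.0f} min")
```

At 30 fps the rated life works out to roughly 111 to 167 minutes. That is a few hours of footage, not hundreds.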

Video cameras typically read their images out in 16 to 33 ms with what is known as an electronic rolling shutter. The camera has two counters which count through the rows of pixels on the sensor. Each row pointed to by the first counter is reset, and each row pointed to by the second counter is read. The time delay between the two counters sets the exposure time, up to a maximum of the time between frames, which is usually 33 ms.
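Here is a minimal sketch of the two-counter scheme. The 1080 rows and the 30 µs line time are illustrative numbers, picked to give a readout of about 33 ms. The reset pointer leads the read pointer by the exposure time, and both pointers advance one row per line time.

```python
from dataclasses import dataclass

# Minimal model of an electronic rolling shutter: a reset pointer and a
# read pointer sweep down the rows, separated by the exposure time.
# Row count and line time below are illustrative assumptions.

@dataclass
class RowTiming:
    row: int
    reset_us: int  # time the reset counter reaches this row
    read_us: int   # time the read counter reaches this row

def rolling_shutter(rows: int, line_time_us: int, exposure_us: int,
                    frame_us: int) -> list[RowTiming]:
    """Both counters advance one row per line time; reset leads read by exposure_us."""
    if exposure_us > frame_us:
        raise ValueError("exposure can be at most the time between frames")
    return [
        RowTiming(r, r * line_time_us, r * line_time_us + exposure_us)
        for r in range(rows)
    ]

timings = rolling_shutter(rows=1080, line_time_us=30, exposure_us=10_000, frame_us=33_333)
skew = timings[-1].read_us - timings[0].read_us
print(f"top row exposed {timings[0].reset_us}-{timings[0].read_us} us, "
      f"bottom row {timings[-1].reset_us}-{timings[-1].read_us} us, skew {skew} us")
```

Every row gets the same 10 ms exposure, but the bottom row's window starts about 32 ms after the top row's.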

A lot can happen in 33 ms, so the action at the top of the frame can look different from that at the bottom. In video, since the picture is displayed with the bottom delayed relative to the top, this can be okay, but it looks weird in still shots. The rolling-shutter effect is even worse in most higher-resolution CMOS sensors, which can take 100 ms or more to read out.

It turns out there is a solution which serves both camps. Micron and Sony both have CMOS sensors (Micron's 4 MP, 200 fps part and Sony's 6.6 MP, 60 fps part) designed to scan the image out in about the same time a DSLR shutter takes to cross the focal plane. Instead of running all the pixels through one or a few A/D converters, they have an A/D converter per column and digitize all the pixels in a row simultaneously. Each of these A/D converters is slower, so there is still a limit to how fast the sensor can run, but reading the whole sensor in 5 ms is feasible (the Micron chip does it).
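The arithmetic below shows why one converter per column helps. It compares the conversion rate a single A/D would need with the rate each column A/D needs. The 2000 × 2000 layout is a hypothetical stand-in for a 4 MP array, because the real dimensions aren't given here.

```python
# Conversion rate needed from one shared A/D versus one A/D per column,
# for a full-frame readout in readout_s seconds.
# The 2000 x 2000 array is a hypothetical 4 MP layout.

def required_rates(rows: int, cols: int, readout_s: float) -> tuple[float, float]:
    serial_sps = rows * cols / readout_s   # a single A/D converts every pixel
    per_column_sps = rows / readout_s      # each column A/D converts one pixel per row
    return serial_sps, per_column_sps

serial, per_col = required_rates(rows=2000, cols=2000, readout_s=0.005)
print(f"single A/D: {serial / 1e6:.0f} Msample/s; "
      f"per-column A/D: {per_col / 1e3:.0f} ksample/s")
```

A 5 ms readout would need about 800 Msample/s from a single converter. With one converter per column, each only needs about 400 ksample/s, which is why slower per-column converters can still be fast in aggregate.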

These A/D converters are cool because they allow good-looking stop motion like a focal plane DSLR shutter, they can be used for video, and here's the kicker: you get the capability of shooting video at 200 frames per second!

Currently these A/D converters have 10 bits of precision. Sony's chip can digitize at 1/4 speed and get 12 bits of precision, matching what DSLRs have delivered for years. We can do better than that.

The basic idea is to combine multiple exposures. Generally this is done by taking one exposure at, say, 16 ms and another at 4 ms immediately afterwards, then combining them in software. The trouble with this technique is that the two exposures are separated by at least the sensor's frame readout time, call it 5 ms. Enough motion happens in those 5 ms to blur bright objects that one would otherwise expect to be sharp.

Instead, let's have all the exposures at each pixel happen back to back, with no intervening gaps. Three counters progress down the sensor: the first resets, the second reads and then immediately resets, and the third just reads. The delay between the first and second waves is 16 times the delay between the second and third waves. The sensor alternates between reading the rows at the second and third waves, and alternates between resetting the rows at the first and second waves.
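The three-wave scheme can be sketched as an idealized event schedule. In this model every wave advances one row per line time. The 5 µs line time and the 16 ms / 1 ms exposure pair are illustrative assumptions. For each row, the short exposure starts at the same instant the long one ends.

```python
# Idealized three-wave, gap-free double exposure:
#   wave 1 resets a row,
#   wave 2 reads it and immediately resets it (ends long, starts short),
#   wave 3 reads it again (ends short).
# Times are integer microseconds; all numbers are illustrative.

def three_wave_schedule(rows: int, line_time_us: int, long_us: int,
                        ratio: int = 16) -> list[tuple[int, int, str]]:
    if long_us % ratio:
        raise ValueError("long exposure must be divisible by the ratio")
    short_us = long_us // ratio
    events = []
    for r in range(rows):
        t0 = r * line_time_us
        events.append((t0, r, "reset"))                      # wave 1
        events.append((t0 + long_us, r, "read+reset"))       # wave 2
        events.append((t0 + long_us + short_us, r, "read"))  # wave 3
    events.sort()  # chronological order, as the sensor would execute them
    return events

events = three_wave_schedule(rows=1000, line_time_us=5, long_us=16_000)
for e in events[:3]:
    print(e)
```

In this simplified model, the wave-2 read of one row and the wave-3 read of a row further up land in the same line slot. Real hardware would alternate between them within the slot, as described above.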

Because one exposure is 16x the other, we get 4 more bits (log2 16 = 4) than the basic A/D converter would otherwise give us. If the base A/D converter is 10 bits, this gets us to 14 bits. We don't want more than a 16x difference, because pixels that just barely saturate the long-exposure A/D have just 6 bits of precision in the short-exposure A/D. With 5 bits or fewer, the result might look funny: you'd see a little quantization noise right at the switchover, where the slightly darker pixels just below it, still read from the long exposure, had less.
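Here is a minimal sketch of the two-exposure merge, assuming a 10-bit base converter and a 16x ratio. Treating code 1023 as "saturated" and simply scaling up the short reading are simplifying assumptions. A real pipeline would also deal with black level and noise.

```python
# Merge a long exposure and a 16x-shorter exposure into one 14-bit value.
# Assumptions: 10-bit readings, code 1023 means "saturated".

BASE_BITS = 10
RATIO = 16
FULL_SCALE = (1 << BASE_BITS) - 1  # 1023

def merge(long_code: int, short_code: int) -> int:
    if long_code < FULL_SCALE:
        return long_code          # long exposure in range: use it directly
    return short_code * RATIO     # otherwise scale the short exposure up

def bits_at_switchover() -> int:
    # A pixel that just saturates the long exposure reads about 1023 / 16
    # in the short one; count how many bits that reading has.
    return (FULL_SCALE // RATIO).bit_length()

print(merge(500, 31), merge(1023, 100), bits_at_switchover())
```

The largest merged value is 1023 × 16 = 16368, which fits in 14 bits. At the switchover, the short exposure carries 6 bits (a reading of 63), which matches the limit argued above.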

But we can do better still. These column-parallel 10-bit A/D converters work by sharing a common voltage line, which a single D/A converter cycles through 1024 possible voltage levels. So a 1000-row sensor has to cycle through 1,024,000 voltages in 5 ms; the D/A is running at an effective 205 MHz. I'm pretty sure they actually run at 1/2 to 1/4 this clock speed and take multiple samples during each clock cycle. Each column A/D is actually just a comparator which signals when the common voltage exceeds the pixel voltage. If we're willing to settle for 9 bits of precision, there are only 512 levels to step through, so the whole thing can run twice as fast. In low light, that gives us ample time for 4 successive exposures down the sensor (not just two), each, say, 8x shorter than the one before. Now we have 9+3+3+3 = 18 bits of dynamic range, good for about 14 stops of variation in the scene, with at least six significant bits everywhere but the bottom of the range.
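The shared-ramp converter can be modeled in a few lines. This is an idealization: the ramp takes one step per level, and each column's comparator latches the step count the first time the ramp exceeds its pixel voltage. The post itself guesses that real parts clock slower and take several samples per cycle. The 1.0 V reference is an arbitrary normalization.

```python
# Idealized column-parallel single-slope A/D: one shared D/A ramp steps
# through 2**bits levels; each column's comparator latches the step number
# the first time the ramp exceeds that column's pixel voltage.

def single_slope_row(pixel_volts: list[float], bits: int, v_ref: float = 1.0) -> list[int]:
    levels = 1 << bits
    codes = [levels - 1] * len(pixel_volts)   # a column that never trips reads full scale
    latched = [False] * len(pixel_volts)
    for step in range(levels):
        ramp = (step + 1) * v_ref / levels    # shared D/A output at this step
        for col, v in enumerate(pixel_volts):
            if not latched[col] and ramp > v:
                codes[col] = step
                latched[col] = True
    return codes

def effective_dac_hz(rows: int, bits: int, readout_s: float) -> float:
    """Ramp steps per second needed to convert every row within readout_s."""
    return rows * (1 << bits) / readout_s

print(single_slope_row([0.0, 0.25, 0.5, 1.2], bits=10))   # [0, 256, 512, 1023]
print(f"{effective_dac_hz(1000, 10, 0.005) / 1e6:.1f} MHz")  # 204.8 MHz
```

Dropping to 9 bits halves the number of ramp steps per row. The same readout then needs only 102.4 MHz, which is where the factor-of-two speedup comes from.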

Why bother? Well, if the sensor has a decent pixel size and reasonably low readout noise (I'm thinking of the Micron sensor, but can't say numbers), then a 16 ms shot with an f/2.8 lens, for example, should capture an EV 4 interior reasonably well (see Wikipedia's explanation of EV). That's a dim house interior, or something like a church. Using the 18-bit A/D scheme above, we could capture an EV 18 stained-glass window in that church and a bride and groom at the same time, with no artificial lighting, assuming the camera is on a tripod. That's pretty cool.
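The exposure numbers can be checked with the standard definition EV = log2(N²/t). This sketch uses the usual convention that a scene's EV is quoted for ISO 100. The computed ISO is only the nominal gain that setting implies; it is not a claim about the Micron sensor's actual sensitivity.

```python
import math

# EV of a camera setting, and the ISO at which that setting would give a
# standard exposure for a scene quoted as "EV n" at ISO 100.

def exposure_value(f_number: float, shutter_s: float) -> float:
    """EV = log2(N^2 / t)."""
    return math.log2(f_number ** 2 / shutter_s)

def iso_for_scene(scene_ev_iso100: float, f_number: float, shutter_s: float) -> float:
    return 100 * 2 ** (exposure_value(f_number, shutter_s) - scene_ev_iso100)

ev = exposure_value(2.8, 0.016)
print(f"f/2.8 at 16 ms is EV {ev:.1f}")  # EV 8.9
print(f"EV 4 scene implies about ISO {iso_for_scene(4, 2.8, 0.016):.0f}")
print(f"window to interior: {18 - 4} stops")
```

f/2.8 at 16 ms is about EV 8.9. A standard exposure of an EV 4 interior would therefore need roughly 5 stops of gain over ISO 100, around ISO 3000. The window sits 14 stops above that, which is the "about 14 stops" of range the 18-bit scheme is meant to cover.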

The fact that it takes twice as long (e.g. 10 ms instead of 5 ms) to read the sensor is fine. You'd only do this in low light, where your exposures will have to be long anyway. Even if you could read the sensor in 5 ms, if the exposure is 16 ms you can't possibly have better than about 60 frames per second (1 / 16 ms = 62.5) anyway. And people who want slow-motion high-resolution video with natural lighting in church interiors are simply asking for too much.

When the scene doesn't need the dynamic range (say, you are outdoors), you can drop down to 12 bits and run as fast as the 10-bit column-parallel A/Ds in the Micron and Sony chips allow. This gives you 8 stops of EV range, similar to what most DSLRs deliver today. If you want extra-high frame rates (400 fps full frame), drop to 9 bits of precision.

Actually, if the camcorder back end can handle 8x the data rate, you can imagine very high frame rates (and correspondingly short exposures) achieved by dropping to 8 or 7 bits of precision and either binning the CMOS pixels together or reading only a subset of the sensor rows. I think 432-line resolution at 8000 fps would be a pretty awesome feature on a consumer camcorder, even if it couldn't sustain that for more than a second or two after a shutter trigger. By using a subset of the sensor columns or binning CMOS pixels horizontally, you might get the back-end data rate down to 1-2x normal operation. That'd be amazing: normal TV resolution, sustained capture of 8000 fps video. Looking at it another way, it gives an idea of how hard it is to swallow all the data coming off a sensor like the one I am describing. (I'm getting ahead of myself, talking about resolution here, but bear with me.)
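Here is a rough pixel-rate comparison for the burst mode. It rests on two assumptions: a 3264 × 2448 array stands in for an 8 MP sensor, and "normal operation" means full-frame readout at 200 fps. The sketch counts pixels, not bits. The lower 7- or 8-bit depth in burst mode would cut the bit rate a little further.

```python
# Pixel-rate comparison for the high-speed modes.
# Assumptions (hypothetical): 3264 x 2448 array, normal operation is
# full-frame readout at 200 fps.

def pixel_rate(rows: int, cols: int, fps: float) -> float:
    return rows * cols * fps

ROWS, COLS = 2448, 3264
normal = pixel_rate(ROWS, COLS, 200)
burst = pixel_rate(432, COLS, 8000)        # 432 of the rows, every column
binned = pixel_rate(432, COLS // 4, 8000)  # plus 4:1 horizontal binning

print(f"432 lines @ 8000 fps: {burst / normal:.1f}x normal")   # 7.1x
print(f"with 4:1 horizontal binning: {binned / normal:.1f}x")  # 1.8x
```

Under these assumptions the numbers land near the "8x" and "1-2x" figures above.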

Side note: you don't have, and don't want, an actual 18-bit number to represent the brightness at a pixel. Instead, the sensor selects the brightest of the 4 exposures that isn't saturated. The output value is then 2 bits to indicate the exposure and 9 bits to indicate the value. This data reduction step happens in the sensor: if the maximum exposure time at full frame rate is 16 ms, then the sensor needs to carry just 1 ms worth of data from the first wave of pixel readouts to the second and later waves, at most about 1/20 of the total number of pixels. That's 560 KB of state for an 8 MP chip. Since the chip is CMOS, that's a pretty reasonable amount of state to carry around.
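Here is a minimal sketch of the 2 + 9 bit encoding. It assumes index 0 is the longest exposure and each later exposure is 8x shorter. Code 511 is treated as saturated. The `simulate` helper is hypothetical; it generates ideal readings for a pixel of known linear brightness so the encoding can be exercised.

```python
# Pack 4 exposures (each 8x shorter than the last) into an 11-bit word:
# 2 bits of exposure index + 9 bits of value. Index 0 = longest exposure.
# Assumption: a 9-bit reading of 511 means "saturated".

BITS = 9
FULL = (1 << BITS) - 1  # 511
STEP_BITS = 3           # 8x = 2**3 between exposures

def encode(readings: list[int]) -> int:
    """Pick the longest (brightest) exposure that isn't saturated."""
    for k, code in enumerate(readings):
        if code < FULL:
            return (k << BITS) | code
    return ((len(readings) - 1) << BITS) | FULL  # all saturated: clip

def decode(word: int) -> int:
    """Expand an 11-bit word back to an 18-bit linear value."""
    k, code = word >> BITS, word & FULL
    return code << (STEP_BITS * k)

def simulate(linear: int) -> list[int]:
    """Hypothetical ideal readings for a pixel of the given linear brightness."""
    return [min(FULL, linear >> (STEP_BITS * k)) for k in range(4)]

for v in (300, 1000, 200_000):
    w = encode(simulate(v))
    print(v, w, decode(w))
```

Above the bottom exposure's range, the chosen reading is always at least 63, which is six significant bits. This matches the precision claim made for the 18-bit scheme.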

Stay tuned for an even better place to stuff that 560 KB.