Monday, August 27, 2012

ARGUS-IS

Here's an exotic high flying camera: ARGUS-IS.  I've been trying to figure out what these folks have been up to for years, and today I found an SPIE paper they published on the thing.  What follows is a summary and my guesses for some of the undocumented details. [Updated 30-Jan-2013, to incorporate new info from a Nova documentary and a closer reading of the SPIE paper.]
Vexcel Ultracams also use four cameras with interleaved sensors


  • It looks like BAE is the main contractor.  They've subcontracted some of the software, probably the ground station stuff, to ObjectVideo.  BAE employs Yiannis Antoniades, who appears to be the main system architect.  The lenses were subcontracted out to yet another unnamed vendor, and I suspect the electronics were too.
  • Field of view: 62 degrees.  Implied by the 6 km altitude and 7.2 km ground coverage diameter.
  • Image sensor: 4 cameras, each has 92 Aptina MT9P031 5 megapixel sensors.  The paper has a typo claiming MT9P301, but no such sensor exists.  The MT9P031 is a nice sensor, we used it on the R5 and R7 Street View cameras.
    • 15 frames/second, 96 MHz pixel clock, 5 megapixels, 2592 x 1944.
    • It's easy to interface with, has high performance (quantum efficiency over 30%, 4 electrons readout noise, 7000 electrons well capacity), and is (or was) easy to buy from Digikey.  (Try getting a Sony or Omnivision sensor in small quantities.)
  • Focal length: 85mm.  Implied by 2.2 micron pixel, altitude of 6 km, GSD of 15 cm (see the back-of-the-envelope check after this list).  Focal plane diameter is 102mm.  The lens must resolve about 1.7 gigapixels.  I must say that two separate calculations suggest that the focal length is actually 88mm, but I don't believe it, since they would have negative sensor overlap at that focal length.
  • F/#: 3.5 to 4.  There is talk of upgrading this system to 3.7 gigapixels, probably by upgrading the sensor to the Aptina MT9J003.  An f/4.0 lens has an Airy disk diameter of 5.3 microns, and it's probably okay for the pixels to be 2.2 microns.  But 1.66 micron pixels won't get much more information from an f/4.0 lens.  So, either the lens is already faster than f/4.0, or they are going to upgrade the lens as well as the sensors.
  • The reason to use four cameras is the same as the Vexcel Ultracam XP: the array of sensors on the focal plane cannot cover the entire field of view of the lens.  So, instead, they use a rectangular array of sensors, spaced closely enough so that the gaps between their active areas are smaller than the active areas.  By the way guys (Vexcel and ObjectVideo), you don't need four cameras to solve this problem; it can be done with three (the patent just expired on 15-Jul-2012).  You will still need to mount bare die.
  • The four cameras are pointed in exactly the same direction.  Offsetting the lenses by one sensor's width reduces the required lens field of view by 2.86 degrees, to about 59 degrees.  That's not much help.  And, you have to deal with the nominal distortion between the lenses.  Lining up the optical axes means the nominal distortion has no effect on alignment between sensors, which I'm sure is a relief.
  • The sensor pattern shown in the paper has 105 sensors per camera, and at one point they mention 398 total sensors.  The first may be an earlier configuration and the latter is probably a typo.  I think the correct number is 92 sensors per camera, 368 total.  So I think the actual pattern is a 12x9 rectangular grid with 11.33mm x 8.50mm centers.  16 corner sensors (but not 4 in each corner) are missing from the 9x12=108 rectangle, to get to 92 sensors per focal plane.  The smallest package that those sensors come in is 10mm x 10mm, which won't fit on the 8.5mm center-to-center spacing, so that implies they are mounting bare die to the focal plane structure.
  • They are carefully timing the rolling shutters of the sensors so that all the rolling shutters in each row are synchronized, and each row starts its shutter right as the previous row finishes.  This is important, because otherwise when the camera rotates around the optical axis they will get coverage gaps on the ground.  I think there is a prior version of this camera called Gorgon Stare which didn't get this rolling shutter synchronization right, because there are reports of "floating black triangles" in the imagery, which is consistent with what you would see on the outside of the turn if all the rolling shutters were fired simultaneously while the camera was rotating.  Even so, I'm disappointed that the section on electronics doesn't mention how they globally synchronize those rolling shutters, which can be an irritatingly difficult problem.
  • They are storing some of the data to laptop disk drives with 160 GB of storage.  It appears they may have 32 of these drives, in which case they've got enough space to potentially store the entire video stream, but only with very lossy video compression.  The design presented has only JPEG2000 (not video) compression, which will be good for stepping through the frames, but the compressed frames will be bulky enough that there is no way they are storing all the video.
  • They have 184 FPGAs at the focal plane for local sensor control, timestamping, and serialization of the data onto 3.3 Gb/s fiber optics.  Supposedly the 3.3 Gb/s SerDes is on the FPGA, which sounds like a Virtex-5 20T.  But something is odd here, because having the SerDes on the FPGA forces them to choose a fairly beefy FPGA, but then they hardly do anything with it: the document even suggests that multiplexing the two sensor data streams, as well as serialization of those streams, happens outside the FPGA (another typo?).  So what's left for a Virtex-5 to do with a pair of sensors?  For comparison, I paired one Spartan-3 3400A with each sensor in R7, and we were able to handle 15 fps compression as well as storage to and simultaneous retrieval from 32 GB of SLC flash, in that little FPGA.  Maybe the SerDes is on some other device, and the FPGA is more of a PLD.
  • The data flows over fiber optics to a pile of 32 6U single board computers, each of which has two mezzanine cards, each carrying a Virtex-5 FPGA and two JPEG2000 compressors.
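Most of the numbers above fall out of a few lines of arithmetic.  Here's a quick Python check; the inputs are the figures quoted in the list, and the 92-sensors-per-camera count is my guess rather than anything the paper states.

```python
# Back-of-the-envelope check of the numbers above.  Inputs are the figures
# quoted in the list; the sensor count is my guess, not a published spec.
import math

altitude_m     = 6000.0        # operating altitude
coverage_m     = 7200.0        # ground coverage diameter
pixel_m        = 2.2e-6        # MT9P031 pixel pitch
gsd_m          = 0.15          # ground sample distance
sensors        = 4 * 92        # 4 cameras x 92 sensors (my guess)
pix_per_sensor = 2592 * 1944

# Field of view implied by altitude and coverage diameter: ~62 degrees.
fov_deg = 2 * math.degrees(math.atan(coverage_m / 2 / altitude_m))

# Focal length implied by pixel pitch, altitude, and GSD: the straight
# arithmetic gives ~88 mm, though as noted above I think the real value is 85 mm.
focal_mm = pixel_m * altitude_m / gsd_m * 1000

# Focal plane diameter for an 85 mm lens covering that field: ~102 mm.
fp_diam_mm = 2 * 85.0 * math.tan(math.radians(fov_deg / 2))

# Pixels the lens must resolve (focal plane circle / pixel area): ~1.7 Gpix.
lens_gpix = math.pi * (fp_diam_mm / 2 * 1e-3) ** 2 / pixel_m ** 2 / 1e9

# Airy disk diameter at f/4 and 550 nm: ~5.4 microns.
airy_um = 2.44 * 0.55 * 4.0

print(fov_deg, focal_mm, fp_diam_mm, lens_gpix, airy_um)
print("total sensor pixels:", sensors * pix_per_sensor / 1e9, "Gpix")   # ~1.85
```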
Now here's my critique of this system design:
  • They pushed a lot of complexity into the lens.
    • It's a wide angle, telecentric lens.  Telecentric means the chief rays coming out the back, heading to the focal plane, are going straight back, even at the edges of the focal plane.  Said another way, when you look in the lens from the back, the bright exit pupil that you see appears to be at infinity.  Bending the light around to do that requires extra elements.  This looks a lot like the lenses used on the Leica ADS40/ADS80, which are also wide angle telecentric designs.  The Leica design is forced into a wide angle telecentric because they want consistent colors across the focal plane, and they use dichroic filters to make their colors.  The ARGUS-IS doesn't need consistent color and doesn't use dichroics... they ended up with a telecentric lens because their focal plane is flat.  More on that below.
    • The focal lengths and distortions between the four lenses must be matched very, very closely.  The usual specification for a lens focal length is +/- 1% of nominal.  If the ARGUS-IS lens were built like that, the image registration at the edge of field would vary by +/- 500 microns.  If my guesses are right, the ARGUS-IS focal plane appears to have 35x50 microns of overlap, so the focal lengths of the four lenses will have to match to within +/- 0.07% (see the check after this list).  Wow.
    • "The lenses are athermalized through the choice of glasses and barrel materials to maintain optical resolution and focus over the operational temperature range."  Uh, sure.  The R7 StreetView rosette has 15 5 megapixel cameras.  Those lenses are athermalized over a 40 C temperature range, and it was easy as pie.  We just told Zemax a few temperature points, assumed an isothermal aluminum barrel, and a small tweak to the design got us there.  But those pixels have a field of view of 430 microradians, compared to the pixels behind the ARGUS-IS lens, which have a 25 microradian PFOV.  MIL-STD-810G, test 520.3, specifies -40 C to 54 C as a typical operating temperature range for an aircraft equipment bay.  If they had anything like this temperature range specified, I would guess that this athermalization requirement (nearly 100 degrees!) came close to sinking the project.  The paper mentions environmental control within the payload, so hopefully things aren't as bad as MIL-STD-810G.
    • The lenses have to be pressure compensated somehow, because the index of refraction of air changes significantly at lower pressures.  This is really hard, since glasses, being less compressible than air, don't change their refractive indices as fast as air.  I have no particularly good ideas how to do it, other than to relax the other requirements so that the lens guys have a fighting chance with this one.  Maybe the camera can be specified to only focus properly over a restricted range of altitudes, like 4km to 8km.  (ARGUS-IR specifies 0 to 10km.  It's likely ARGUS-IS is the same, so no luck there.)  Or maybe everything behind their big flat window is pressurized.
  • They made what I think is a classic system design mistake: they used FPGAs to glue together a bunch of specialized components (SerDes, JPEG compressors, single board computers), instead of simply getting the job done inside the FPGAs themselves.  This stems from fear of the complexity of implementing things like compression.  I've seen other folks do exactly the same thing.  Oftentimes writing the interface to an off-the-shelf component, like a compressor or an encryption engine, is just as much work as implementing the equivalent functionality.  They mention that each Virtex-5 on the SBC has two 0.6 watt JPEG2000 chips attached.  It probably burns 200 mW just talking to those chips.  It seems to me that a Virtex-5 could probably do JPEG2000 on 80 Mpix/s in less than 1.4 watts.  Our Spartan-3 did DPCM on 90+ Mpix/s, along with a number of other things, all in less than 1 watt.
  • I think I remember reading that the original RFP for this system had the idea that it would store all the video shot while airborne, and allow the folks on the ground to peruse forward and backward in time.  This is totally achievable, but not with limited power using an array of single-board PCs.
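To put a number on the lens-matching point: a fractional focal length error scales the whole image by that fraction, so the registration shift at the edge of field is just the error times the field radius.  The 35 micron overlap below is my guess at the focal plane layout, not a published figure.

```python
# Focal-length matching requirement, per the argument above.
edge_radius_mm = 51.0     # half the ~102 mm focal plane diameter
overlap_um     = 35.0     # my guessed overlap between adjacent sensors

# +/-1% focal length error -> ~510 microns of registration shift at the edge.
shift_um_at_1pct = 0.01 * edge_radius_mm * 1000

# To keep the shift inside the overlap budget, the lenses must match to ~0.07%.
required_match_pct = overlap_um / (edge_radius_mm * 1000) * 100

print(shift_um_at_1pct, required_match_pct)
```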
Let me explain how they ended up with a telecentric lens.  A natural 85mm focal length lens would have an exit pupil 85mm from the center of the focal plane.  Combine that with a flat focal plane and sensors that accept an f/1.8 beam cone (and no offset microlenses), and you get something like the following picture.  The rectangle on the left is the lens, looking from the side.  The left face of the right rectangle is the focal plane.  The big triangle is the light cone from the exit pupil to a point at the edge of the focal plane, and the little triangle is the light cone that the sensor accepts.  Note that the sensor won't accept the light from the exit pupil -- that's bad.


There are two ways to fix this problem.  One way is to make the lens telecentric, which pushes the exit pupil infinitely far away from the focal plane.  If you do that, the light cone from the exit pupil arrives everywhere with its chief ray (center of the light cone) orthogonal to the focal plane.  This is what ARGUS-IS and ADS-80 do.
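Here are the rough numbers behind that picture, assuming the f/1.8 sensor acceptance cone mentioned above and my guessed 85 mm focal length and 102 mm focal plane diameter.

```python
# Chief ray angle at the edge of a flat focal plane with a "natural" exit
# pupil 85 mm away, versus the acceptance cone of an f/1.8 sensor.
import math

focal_mm       = 85.0
edge_radius_mm = 51.0

chief_ray_deg   = math.degrees(math.atan(edge_radius_mm / focal_mm))  # ~31 deg off normal
accept_half_deg = math.degrees(math.atan(1 / (2 * 1.8)))              # ~15.5 deg acceptance
beam_half_deg   = math.degrees(math.atan(1 / (2 * 4.0)))              # ~7 deg f/4 light cone

# The f/4 light cone at the edge sensor spans roughly 31 +/- 7 degrees off
# normal, entirely outside the +/- 15.5 degree acceptance -- hence the
# telecentric lens (or a curved focal plane).
print(chief_ray_deg, accept_half_deg, beam_half_deg)
```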

The other way is to curve the focal plane (and rename it a Petzval surface to avoid the oxymoron of a curved focal plane).  Your retina is curved behind the lens in your eye, for example.  Cellphone camera designers are now looking at curving their focal planes, but it's pretty hard with one piece of silicon.  The focal plane array in ARGUS-IS is made of many small sensors, so it can be piecewise curved.  The sensors are 7.12 mm diagonally, and the sag of an 85 mm radius sphere across 7.12 mm is 74 microns.  The +/- 9 micron focus budget won't allow that, so curving the ARGUS-IS focal plane isn't going to allow a natural exit pupil.  The best you can do is curve the focal plane with a radius of 360 mm, getting 3.6 mm of sag, and push the exit pupil out to about 180 mm.  It's generally going to be easier to design and build a lens with an exit pupil at 2x focal length rather than telecentric, but I don't know how much easier.  Anyway, the result looks like this:
As I said, the ARGUS-IS designers didn't bother with this, but instead left the focal plane flat and pushed the exit pupil to infinity.  It's a solution, but it's not the one I would have chosen.
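For what it's worth, the sag numbers above are easy to check.  The last comment in the sketch is my guess at why 360 mm is about the limit.

```python
# Sag of a spherical "focal plane" (Petzval surface) across a chord.
import math

def sag_mm(radius_mm, chord_mm):
    """Depth of a spherical cap of the given radius across the given chord."""
    return radius_mm - math.sqrt(radius_mm ** 2 - (chord_mm / 2) ** 2)

sensor_diag_mm = 7.12
fp_diam_mm     = 102.0

print(sag_mm(85.0, sensor_diag_mm) * 1000)   # ~74 um across one sensor at R = 85 mm
print(sag_mm(360.0, fp_diam_mm))             # ~3.6 mm across the focal plane at R = 360 mm

# At R = 360 mm each sensor sees ~18 um of sag; tilting each die to its local
# chord leaves about half of that, ~9 um -- which is presumably how the
# +/- 9 micron focus budget sets the 360 mm limit.
print(sag_mm(360.0, sensor_diag_mm) * 1000 / 2)
```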

Here's what I would have done to respond to the original RFP at the time.  Note that I've given this about two hours' thought, so I might be off a bit:
  • I'd have the lenses and sensors sitting inside an airtight can with a thermoelectric cooler to a heat sink with a variable speed fan, and I'd use that control to hold the can interior to between 30 and 40 C (toward the top of the temperature range), or maybe even tighter.  I might put a heater on the inside of the window with a thermostat to keep the inside surface isothermal to the lens.  I know, you're thinking that a thermoelectric cooler is horribly inefficient, but they pump 3 watts for every watt consumed when you are pumping heat across essentially no temperature difference.  The reason for the thermoelectric heat pump isn't to get the sensor cold, it's to get tight control.  The sensors burn about 600 mW each, so I'm pumping 250 watts out with maybe 100 watts.
  • I'd use a few more sensors and get the sensor overlap up to 0.25mm, which means +/-0.5% focal length is acceptable.  I designed R5 and R7 with too little overlap between sensors and regretted it when we went to volume production.  (See Jason, you were right, I was wrong, and I've learned.)
  • Focal plane is 9 x 13 sensors on 10.9 x 8.1 mm centers.  Total diameter: 105mm.  This adds 32 sensors, so we're up to an even 400 sensors.
  • Exiting the back of the fine gimbal would be something like 100 flex circuits carrying the signals from the sensors.
  • Hook up each sensor to a Spartan-3A 3400.  Nowadays I'd use an Aptina AR0330 connected to a Spartan-6, but back then the MT9P001 and Spartan-3A were a good choice.
  • I'd have each FPGA connected directly to 32GB of SLC flash in 8 TSOPs, and a 32-bit LPDDR DRAM, just like we did in R7.  That's 5 bytes per pixel of memory bandwidth, which is plenty for video compression.
  • I'd connect a bunch of those FPGAs, let's say 8, to another FPGA which connects to gigabit ethernet, all on one board, just like we did in R7.  This is a low power way to get connectivity to everything.  I'd need 12 of those boards per focal plane.  This all goes in the gimbal.  The 48 boards, and their power and timing control are mounted to the coarse gimbal, and the lenses and sensors are mounted to the fine gimbal.
  • Since this is a military project, and goes on a helicopter, I would invoke my fear of connectors and vibration, and I'd have all 9 FPGAs, plus the 8 sensors, mounted on a single rigid/flex circuit.  One end goes on the focal plane inside the fine gimbal and the other goes on the coarse gimbal, and in between it's flexible.
  • I'd connect all 52 boards together with a backplane that included a gigabit ethernet switch.  No cables -- all the gigE runs are on 50 ohm differential pairs on the board.  I'd run a single shielded CAT-6 to the chopper's avionics bay.  No fiber optics.  They're really neat, but power hungry.  Maybe you are thinking that I'll never get 274 megabits/second for the Common Data Link through that single gigE.  My experience is otherwise: FPGAs will happily run a gigE with minimum interpacket gap forever, without a hiccup.  Cheap gigE switches can switch fine at full rate but have problems when they fill their buffers.  These problems are fixed by having the FPGAs round-robin arbitrate between themselves with signals across that backplane.  Voila, no bandwidth problem.
  • The local FPGA does real time video compression directly into the flash.  The transmission compression target isn't all that incredible: 1 bit per pixel for video.  That gets 63 channels of 640x400x15 frames/sec into 274 Mb/s (the budget is sketched after this list).  The flash should give 1 hour of storage at that rate.  If we want 10 hours of storage, that's 0.1 bits/pixel, which will require more serious video compression.  I think it's still doable in that FPGA, but it will be challenging.  In a modern Spartan-6 this is duck soup.
  • The computer tells the local FPGAs how to configure the sensors, and what bits of video to retrieve.  The FPGAs send the data to the computer, which gathers it up for the common data link and hands it off.
  • I'll make a guess of 2 watts per sensor+FPGA+flash, or 736 watts.  Add the central computer and switch and we're at 1 kilowatt.  Making the FPGAs work hard with 0.1 bit/pixel video compression might add another 400 watts, at most.
  • No SSDs, no RAID, no JPEG compression chips, no multiplexors, no fiber optic drivers, no high speed SerDes, no arrays of multicore X86 CPUs.  That's easily half the electronics complexity, gone.
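Here's the link and flash budget behind those numbers, a quick Python check using the figures from the list above (full-rate capture at 15 frames/second, 32 GB of SLC flash per sensor).

```python
# Link and storage budget for the design sketched above (my numbers).
pix_per_frame = 2592 * 1944     # one 5 Mpix sensor
fps           = 15

# 63 channels of 640x400 video at 15 fps and 1 bit/pixel through the
# 274 Mb/s Common Data Link: ~242 Mb/s, which fits.
cdl_mbps = 63 * 640 * 400 * fps * 1.0 / 1e6

# Per-sensor flash: 1 bit/pixel full-rate video is ~9.4 MB/s, so 32 GB of
# SLC flash holds roughly an hour.
bytes_per_s   = pix_per_frame * fps * 1.0 / 8
hours_in_32GB = 32e9 / bytes_per_s / 3600

# Ten hours in the same 32 GB needs ~0.1 bits/pixel.
bpp_for_10h = 32e9 * 8 / (pix_per_frame * fps * 10 * 3600)

print(cdl_mbps, hours_in_32GB, bpp_for_10h)
```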
UPDATE 25-Jan-2013: Nova ran a program on 23-Jan-2013 (Rise of the Drones) which talks about ARGUS-IS.  They present Yiannis Antoniades of BAE Systems as the inventor, which suggests I have the relationship between BAE and ObjectVideo wrong in my description above.  They also say something stupid about a million terabytes of data per mission, which is BS: if the camera runs for 16 hours the 368 sensors generate around 2,000 terabytes of raw data.

They also say that the ARGUS-IS stores the entire flight's worth of data.  I don't think they're doing that at 12 hertz, certainly not on 160 GB drives.  They've got 32 laptop drives in the system (one per single board computer).  If those store 300 GB apiece, that's 10 terabytes of total storage.  16 hours of storage would require 0.05 bits/pixel -- no way without actual video compression.  The JPEG2000 compressor chips are more likely to deliver at best 0.2 bits/pixel, which means they might be storing one of every four frames.
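For the record, here's the arithmetic behind both claims.  The frame rate (12 to 15 Hz) and the 300 GB drive size are my assumptions from the discussion above.

```python
# Sanity check of the Nova numbers: raw data per 16-hour mission, and the
# bits/pixel needed to keep it all on ~10 TB of laptop drives.
pix_per_frame = 368 * 2592 * 1944       # ~1.85 Gpix per snapshot
mission_s     = 16 * 3600

for fps in (12, 15):
    pixels = pix_per_frame * fps * mission_s
    # ~1,300-2,400 TB depending on frame rate and whether you count 8 or 12
    # bits per pixel; call it ~2,000 TB.  Nowhere near a million terabytes.
    print(fps, "Hz:", pixels / 1e12, "TB at 1 byte/pixel")

storage_bits = 32 * 300e9 * 8           # 32 drives at ~300 GB apiece
print(storage_bits / (pix_per_frame * 12 * mission_s))   # ~0.06 bits/pixel
```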

UPDATE 27-Jan-2013: An alert reader (thanks mgc!) sent in this article from the April/May 2011 edition of Science and Technology Review, which is the Lawrence Livermore National Laboratory's own magazine.  It has a bunch of helpful hints, including this non-color-balanced picture from ARGUS-IS which lets you see the 368 sensor array that they ended up with.  It is indeed a 24 x 18 array with 16 sensors missing from each corner, just as I had hypothesized.
The article mentions something else as well: the Persistics software appears to do some kind of super-resolution by combining information from multiple video frames of the same nearly static scene.  They didn't mention the other two big benefits of such a scheme: dynamic range improvement and noise reduction (hence better compression).  With software like this, the system can benefit from increasing the focal plane to 3.8 gigapixels by using the new sensor with 1.66 micron pixels.  As I said above, if the lens is f/3.5 to f/4.0 they won't get any more spatial frequency information out of it with the smaller pixels, but they will pick up phase information.  Combine that with some smart super-resolution software and they ought to be able to find smaller details.  Question though: why not just go to the MT9F002, which gives you 14 million 1.4 micron pixels?  This is a really nice, fast sensor -- I've used it myself.

The article also mentions 1000:1 video compression.  That's very good: for comparison, H.264 level 4 compresses 60 megapixels/second of HDTV into 20 megabits/second, which is 0.33 bits/pixel or 36:1 compression.  This isn't a great comparison, though, because Persistics runs on almost completely static content and H.264 has to deal with action movie sequences.  In any case, I think the Persistics compression is being used to archive ARGUS-IS flight data.  I don't think they are using this compression in the aircraft.
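The comparison in numbers, assuming 12-bit raw pixels (which is what makes 0.33 bits/pixel come out to roughly 36:1):

```python
# H.264 HDTV comparison quoted above, assuming 12-bit raw pixels.
hdtv_pix_per_s = 60e6                    # ~60 Mpix/s of HDTV
bits_per_pixel = 20e6 / hdtv_pix_per_s   # ~0.33 bits/pixel
ratio          = 12 / bits_per_pixel     # ~36:1, versus 1000:1 claimed for Persistics
print(bits_per_pixel, ratio)
```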

Monday, August 13, 2012

More vision at 360nm

I thought of two other consequences of birds, especially hawks, seeing ultraviolet light.

The first has to do with scattered light.  The nitrogen and oxygen molecules in the atmosphere act like little dipoles, scattering some of the light passing through.  The amount of light scattered increases as the fourth power of frequency (or inverse fourth power of wavelength).  The process is called Rayleigh scattering (yep, same guy as the criterion in the last post).

Because of that strong wavelength dependence, our blue-sensitive cones receive 3 times as much light scattered by nitrogen and oxygen as our red cones, and when we look up at a cloudless sky in the day, we see the sky is blue.  Here are the response curves for the three types of cones and the rods in human vision.  (That's right, you actually have four-color vision, but you can only see blue-green (498nm) with your rods when it's too dim for your cones to see.)


But now imagine what a hawk sees.  It has another color channel at 360nm, which sees 6 times as much scattered light as red.  When looking up, the sky will appear more UV than blue.  But there is more to it than that.

Rayleigh scattering is not isotropic.  The dipoles scatter most strongly at right angles to the incoming light.  When you look up at the sky, the intensity of blue changes from near the sun to 90 degrees away.  It's a little hard to see because when looking up you also see Mie scattering which adds yellow light to the visible sky near the sun.  But when you look down, like a hawk does, Mie scattering isn't an issue (instead you have less air Rayleigh scattering and more ground signal).  The overall color gradient from Rayleigh scattering that a hawk sees looking down will be twice as strong as the color gradient we see, because the hawk sees in UV. In direct sunlight, the hawk has a measure of the sun's position whenever it is looking at the ground, even when there are no shadows to read.
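The 3x and 6x figures come straight from the 1/wavelength^4 dependence.  The cone peak wavelengths below are rough numbers I'm assuming, not measured bird data.

```python
# Rayleigh scattering goes as 1/wavelength^4.
red_nm, blue_nm, uv_nm = 565.0, 440.0, 360.0   # rough cone peak wavelengths (my assumption)

print((red_nm / blue_nm) ** 4)   # ~2.7x: blue cones see ~3x the scattered light of red
print((red_nm / uv_nm) ** 4)     # ~6.1x: a 360 nm cone sees ~6x the scattered light of red
```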

The other consequence is axial chromatic aberration.  Most materials have dispersion, that is, they present a different index of refraction to different wavelengths of light.  One consequence of dispersion is that blue focusses in front of red (for lenses like the eye).  I used to think this was a bad thing.  But if your cones tend to absorb light of one color and pass light of the others, and your resolution is limited by the density of cones, a little chromatic aberration is a good thing, because it allows you to stack the UV cones in front of the blue cones in front of the green and red cones, and they all get light focussed from the same range.  I know that retinas have layered stacks of rods, cones, and ganglion cells, but I don't know that any animals, hawks in particular, have actually taken advantage of axial chromatic aberration to stack cones of different colors.  It's certainly something I'll be looking for now.

Sunday, August 12, 2012

Hawk eyed



The resolution of a good long-distance camera is limited by diffraction. The simple rule for this is the Rayleigh criterion: theta = 1.22 * lambda / D.

Theta is the angular resolution, the smallest angular separation between two bright lines that can be differentiated by the system.

Lambda is the wavelength of light. You can see reds down at 700nm, and blues to 400nm. Interestingly, birds can see ultraviolet light at 360nm. The usual explanation for this is that some flowers have features only visible in ultraviolet, but I think raptors are using ultraviolet to improve their visual acuity.

D is the diameter of the pupil. Bigger pupils not only gather more light, but they also improve the diffraction limit of the optical system. The trouble with bigger pupils is that they make various optical aberrations, like spherical aberration, worse. These aberrations are typically minimized near the optical axis and get much worse farther off-axis.  So a big pupil is good if you want a high-resolution fovea and are willing to settle for crummy resolution but good light gathering outside that fovea.  This is the tradeoff the human eye makes.

A human's eye has a pupil about 4mm across in bright light. According to the Rayleigh criterion, human resolution at 550nm should be about 170 microradians. According to a Wikipedia article on the eye, humans can see up to 60 cycles per degree, which corresponds to 290 microradians per line pair. That suggests the human eye is not diffraction limited, but rather limited by something else, such as a combination of focal length and the density of cone cells on the retina.

I wasn't able to find a good reference for the pupil diameter of a red-tailed hawk. Judging from various pictures, I'm guessing it could be smaller than a human pupil, since it appears that the hawk's eyeball is quite a bit smaller than the human eye (absolute scale). This doesn't seem good enough, since hawks are reputed to have fabulous vision. The first reference I found online suggested that hawks have visual acuity that's actually worse than that of humans.

Suppose that this last study was using paints that were undifferentiated in UV, in particular, around 360 nm. The researchers would not have noticed this. Suppose further that hawks are using 360 nm light for high acuity vision. The diffraction limit of a 4 mm aperture in 360 nm light is 110 microradians. This isn't 8 times better than human vision, but it is sufficient to distinguish two twigs 1 cm apart from 100 meters up.
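Here are the numbers used in this post, collected in one place:

```python
# Rayleigh-criterion numbers used in this post.
import math

def rayleigh_urad(wavelength_nm, pupil_mm):
    """Angular resolution theta = 1.22 * lambda / D, in microradians."""
    return 1.22 * wavelength_nm * 1e-9 / (pupil_mm * 1e-3) * 1e6

print(rayleigh_urad(550, 4))        # ~170 urad: 4 mm human pupil, green light
print(math.radians(1) / 60 * 1e6)   # ~290 urad per line pair at 60 cycles/degree
print(rayleigh_urad(360, 4))        # ~110 urad: same pupil, 360 nm UV
print(0.01 / 100 * 1e6)             # 100 urad: twigs 1 cm apart seen from 100 m up
```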