Ambivalent Engineer: ARGUS-IS

Here's an exotic high flying camera: ARGUS-IS. I've been trying to figure out what these folks have been up to for years, and today I found an SPIE paper they published on the thing. What follows is a summary and my guesses for some of the undocumented details. [Updated 30-Jan-2013, to incorporate new info from a Nova documentary and a closer reading of the SPIE paper.]

Vexcel Ultracams also use four cameras with interleaved sensors

It looks like BAE is the main contractor. They've subcontracted some of the software, probably the ground station stuff, to ObjectVideo. BAE employs Yiannis Antoniades, who appears to be the main system architect. The lenses were subcontracted out to yet another unnamed vendor, and I suspect the electronics were too.
Field of view: 62 degrees. Implied by altitude 6 km and object diameter 7.2 km.
Image sensor: 4 cameras, each has 92 Aptina MT9P031 5 megapixel sensors. The paper has a typo claiming MT9P301, but no such sensor exists. The MT9P031 is a nice sensor, we used it on the R5 and R7 Street View cameras.

15 frame/second rate, 96 MHz pixel clock, 5 megapixels, 2592 x 1944.
It's easy to interface with, has high performance (quantum efficiency over 30%, 4 electrons readout noise, 7000 electrons well capacity), and is (or was) easy to buy from Digikey. (Try getting a Sony or Omnivision sensor in small quantities.)

Focal length: 85mm. Implied by 2.2 micron pixel, altitude of 6 km, GSD of 15 cm. Focal plane diameter is 102mm. The lens must resolve about 1.7 gigapixels. I must say that two separate calculations suggest that the focal length is actually 88mm, but I don't believe it, since they would have negative sensor overlap if they did that.
F/#: 3.5 to 4. There is talk of upgrading this system to 3.7 gigapixels, probably by upgrading the sensor to the Aptina MT9J003. An f/4.0 lens has an Airy disk diameter of 5.3 microns, and it's probably okay for the pixels to be 2.2 microns. But 1.66 micron pixels won't get much more information from an f/4.0 lens. So, either the lens is already faster than f/4.0, or they are going to upgrade the lens as well as the sensors.
The reason to use four cameras is the same as the Vexcel Ultracam XP: the array of sensors on the focal plane cannot cover the entire field of view of the lens. So, instead, they use a rectangular array of sensors, spaced closely enough so that the gaps between their active areas are smaller than the active areas. By the way guys (Vexcel and ObjectVideo), you don't need four cameras to solve this problem; it can be done with three (the patent just expired on 15-Jul-2012). You will still need to mount bare die.
The four cameras are pointed in exactly the same direction. Offsetting the lenses by one sensor's width reduces the required lens field of view by 2.86 degrees, to about 59 degrees. That's not much help. And, you have to deal with the nominal distortion between the lenses. Lining up the optical axes means the nominal distortion has no effect on alignment between sensors, which I'm sure is a relief.
The sensor pattern shown in the paper has 105 sensors per camera, and at one point they mention 398 total sensors. The first may be an earlier configuration and the latter is probably a typo. I think the correct number is 92 sensors per camera, 368 total. So I think the actual pattern is a 12x9 rectangular grid with 11.33mm x 8.50mm centers. 16 corner sensors (but not 4 in each corner) are missing from the 9x12=108 rectangle, to get to 92 sensors per focal plane. The smallest package that those sensors come in is 10mm x 10mm, which won't fit on the 8.5mm center-to-center spacing, so that implies they are mounting bare die to the focal plane structure.
They are carefully timing the rolling shutters of the sensors so that all the rolling shutters in each row are synchronized, and each row starts its shutter right as the previous row finishes. This is important, because otherwise when the camera rotates around the optical axis they will get coverage gaps on the ground. I think there is a prior version of this camera called Gorgon Stare which didn't get this rolling shutter synchronization right, because there are reports of "floating black triangles" in the imagery, which is consistent with what you would see on the outside of the turn if all the rolling shutters were fired simultaneously while the camera was rotating. Even so, I'm disappointed that the section on electronics doesn't mention how they globally synchronize those rolling shutters, which can be an irritatingly difficult problem.
They are storing some of the data to laptop disk drives with 160 GB of storage. It appears they may have 32 of these drives, in which case they've got enough space to potentially store the entire video stream, but only with very lossy video compression. The design presented has only JPEG2000 (not video) compression, which will be good for stepping through the frames, but the compressed output will be bulky enough that there is no way they are storing all the video.
They have 184 FPGAs at the focal plane for local sensor control, timestamping, and serialization of the data onto 3.3 Gb/s fiber optics. Supposedly the 3.3 Gb/s SerDes is on the FPGA, which sounds like a Virtex-5 20T. But something is odd here, because having the SerDes on the FPGA forces them to choose a fairly beefy FPGA, but then they hardly do anything with it: the document even suggests that multiplexing the two sensor data streams, as well as serialization of those streams, happens outside the FPGA (another typo?). So what's left for a Virtex-5 to do with a pair of sensors? For comparison, I paired one Spartan-3 3400A with each sensor in R7, and we were able to handle 15 fps compression as well as storage to and simultaneous retrieval from 32 GB of SLC flash, in that little FPGA. Maybe the SerDes is on some other device, and the FPGA is more of a PLD.
The data flows over fiber optics to a pile of 32 6U single board computers, each of which has two mezzanine cards with a Virtex 5 FPGA and two JPEG2000 compressors on it.
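The optical numbers in the list above can be rechecked with a few lines of arithmetic. This is a minimal sketch, not anything from the SPIE paper: the altitude, ground footprint, pixel pitch, GSD and the 85mm focal length are the figures quoted in this post, the 550 nm wavelength for the Airy disk is an assumption (mid-visible green), and the function names are purely illustrative.

```python
import math

# Figures quoted in the post.
ALTITUDE_M = 6000.0
GROUND_DIAMETER_M = 7200.0
PIXEL_PITCH_M = 2.2e-6
GSD_M = 0.15
FOCAL_LENGTH_M = 0.085
# Assumption, not from the post: green light for the diffraction estimate.
WAVELENGTH_M = 550e-9


def full_field_of_view_deg(altitude_m, ground_diameter_m):
    """Full cone angle needed to see a ground circle from a given altitude."""
    return 2 * math.degrees(math.atan((ground_diameter_m / 2) / altitude_m))


def focal_length_from_gsd_m(pixel_pitch_m, altitude_m, gsd_m):
    """Similar triangles at nadir: pixel / focal length = GSD / altitude."""
    return pixel_pitch_m * altitude_m / gsd_m


def focal_plane_diameter_m(focal_length_m, fov_deg):
    """Image circle diameter for a distortion-free lens on a flat focal plane."""
    return 2 * focal_length_m * math.tan(math.radians(fov_deg / 2))


def airy_disk_diameter_m(f_number, wavelength_m):
    """Diameter of the first dark ring of the Airy pattern."""
    return 2.44 * wavelength_m * f_number


if __name__ == "__main__":
    fov = full_field_of_view_deg(ALTITUDE_M, GROUND_DIAMETER_M)
    print(f"field of view: {fov:.1f} degrees")
    f_gsd = focal_length_from_gsd_m(PIXEL_PITCH_M, ALTITUDE_M, GSD_M)
    print(f"focal length implied by GSD: {f_gsd * 1e3:.1f} mm")
    dia = focal_plane_diameter_m(FOCAL_LENGTH_M, fov)
    print(f"focal plane diameter at 85 mm: {dia * 1e3:.1f} mm")
    airy = airy_disk_diameter_m(4.0, WAVELENGTH_M)
    print(f"Airy disk at f/4: {airy * 1e6:.2f} microns")
```

Taken literally, the GSD relation lands on the 88mm figure mentioned above rather than 85mm, which is the discrepancy noted in the focal length item.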

Now here's my critique of this system design:

They pushed a lot of complexity into the lens.

It's a wide angle, telecentric lens. Telecentric means the chief rays coming out the back, heading to the focal plane, are going straight back, even at the edges of the focal plane. Said another way, when you look in the lens from the back, the bright exit pupil that you see appears to be at infinity. Bending the light around to do that requires extra elements. This looks a lot like the lenses used on the Leica ADS40/ADS80, which are also wide angle telecentric designs. The Leica design is forced into a wide angle telecentric because they want consistent colors across the focal plane, and they use dichroic filters to make their colors. The ARGUS-IS doesn't need consistent color and doesn't use dichroics... they ended up with a telecentric lens because their focal plane is flat. More on that below.
The focal lengths and distortions between the four lenses must be matched very, very closely. The usual specification for a lens focal length is +/- 1% of nominal. If the ARGUS-IS lens were built like that, the image registration at the edge of field would vary by +/- 500 microns. If my guesses are right, the ARGUS-IS focal plane appears to have 35x50 microns of overlap, so the focal lengths of the four lenses will have to match to within +/- 0.07%. Wow.
"The lenses are athermalized through the choice of glasses and barrel materials to maintain optical resolution and focus over the operational temperature range." Uh, sure. The R7 StreetView rosette has 15 five-megapixel cameras. Those lenses are athermalized over a 40 C temperature range, and it was easy as pie. We just told Zemax a few temperature points, assumed an isothermal aluminum barrel, and a small tweak to the design got us there. But those pixels have a field of view of 430 microradians, compared to the pixels behind the ARGUS-IS lens, which have a 25 microradian PFOV. MIL-STD-810G, test 520.3, specifies -40 C to 54 C as a typical operating temperature range for an aircraft equipment bay. If they had anything like this temperature range specified, I would guess that this athermalization requirement (nearly 100 degrees!) came close to sinking the project. The paper mentions environmental control within the payload, so hopefully things aren't as bad as MIL-STD-810G.
The lenses have to be pressure compensated somehow, because the index of refraction of air changes significantly at lower pressures. This is really hard, since glasses, being less compressible than air, don't change their refractive indices as fast as air. I have no particularly good ideas how to do it, other than to relax the other requirements so that the lens guys have a fighting chance with this one. Maybe the camera can be specified to only focus properly over a restricted range of altitudes, like 4km to 8km. (ARGUS-IR specifies 0 to 10km. It's likely ARGUS-IS is the same, so no luck there.) Or maybe everything behind their big flat window is pressurized.
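The focal length matching and pixel field of view figures in the lens critique follow from simple scaling, sketched below. The model is an assumption on my reading of the post: a focal length error scales the whole image, so the registration shift at the edge of the field is the edge radius times the fractional error, and the lenses match if that shift stays inside the sensor overlap. The 102mm focal plane diameter, 85mm focal length, 2.2 micron pixel and 35 micron overlap are the post's numbers; the function names are illustrative.

```python
FOCAL_LENGTH_MM = 85.0
FOCAL_PLANE_DIAMETER_MM = 102.0
EDGE_RADIUS_MM = FOCAL_PLANE_DIAMETER_MM / 2
PIXEL_PITCH_UM = 2.2


def edge_shift_um(focal_tolerance_fraction, edge_radius_mm=EDGE_RADIUS_MM):
    """Image shift at the edge of field for a fractional focal length error."""
    return edge_radius_mm * 1000.0 * focal_tolerance_fraction


def allowed_tolerance_fraction(overlap_um, edge_radius_mm=EDGE_RADIUS_MM):
    """Largest fractional focal length error that keeps the shift within the overlap."""
    return overlap_um / (edge_radius_mm * 1000.0)


def pixel_fov_urad(pixel_pitch_um, focal_length_mm):
    """Angle subtended by one pixel, in microradians (small angle approximation)."""
    return pixel_pitch_um / (focal_length_mm * 1000.0) * 1e6


if __name__ == "__main__":
    print(f"shift at +/-1%: {edge_shift_um(0.01):.0f} microns")
    tol = allowed_tolerance_fraction(35.0)
    print(f"tolerance for 35 micron overlap: +/-{tol * 100:.3f}%")
    print(f"pixel FOV: {pixel_fov_urad(PIXEL_PITCH_UM, FOCAL_LENGTH_MM):.1f} microradians")
```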

They made what I think is a classic system design mistake: they used FPGAs to glue together a bunch of specialized components (SerDes, JPEG compressors, single board computers), instead of simply getting the job done inside the FPGAs themselves. This stems from fear of the complexity of implementing things like compression. I've seen other folks do exactly the same thing. Oftentimes writing the interface to an off-the-shelf component, like a compressor or an encryption engine, is just as much work as writing the equivalent functionality. They mention that each Virtex-5 on the SBC has two 0.6 watt JPEG2000 chips attached. It probably burns 200 mW just talking to those chips. It seems to me that Virtex could probably do JPEG2000 on 80 Mpix/s in less than 1.4 watts. Our Spartan-3 did DPCM on 90+ Mpix/s, along with a number of other things, all in less than 1 watt.
I think I remember reading that the original RFP for this system had the idea that it would store all the video shot while airborne, and allow the folks on the ground to peruse forward and backward in time. This is totally achievable, but not with limited power using an array of single-board PCs.

Let me explain how they ended up with a telecentric lens. A natural 85mm focal length lens would have an exit pupil 85mm from the center of the focal plane. Combine that with a flat focal plane and sensors that accept an f/1.8 beam cone (and no offset microlenses), and you get something like the following picture. The rectangle on the left is the lens, looking from the side. The left face of the right rectangle is the focal plane. The big triangle is the light cone from the exit pupil to a point at the edge of the focal plane, and the little triangle is the light cone that the sensor accepts. Note that the sensor won't accept the light from the exit pupil -- that's bad.

There are two ways to fix this problem. One way is to make the lens telecentric, which pushes the exit pupil infinitely far away from the focal plane. If you do that, the light cone from the exit pupil arrives everywhere with its chief ray (center of the light cone) orthogonal to the focal plane. This is what ARGUS-IS and ADS-80 do.

The other way is to curve the focal plane (and rename it a Petzval surface to avoid the oxymoron of a curved focal plane). Your retina is curved behind the lens in your eye, for example. Cellphone camera designers are now looking at curving their focal planes, but it's pretty hard with one piece of silicon. The focal plane array in ARGUS-IS is made of many small sensors, so it can be piecewise curved. The sensors are 7.12 mm diagonally, and the sag of a 85 mm radius sphere across 7.12 mm is 74 microns. The +/- 9 micron focus budget won't allow that, so curving the ARGUS-IS focal plane isn't going to allow a natural exit pupil. The best you can do is curve the focal plane with a radius of 360 mm, getting 3.6 mm of sag, and push the exit pupil out to about 180 mm. It's generally going to be easier to design and build a lens with an exit pupil at 2x focal length rather than telecentric, but I don't know how much easier. Anyway, the result looks like this:

As I said, the ARGUS-IS designers didn't bother with this, but instead left the focal plane flat and pushed the exit pupil to infinity. It's a solution, but it's not the one I would have chosen.
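The sag and chief ray numbers in the telecentric discussion can be reproduced with the short sketch below. It assumes the usual spherical sag formula and treats each sensor as a flat chord across its 7.12mm diagonal; the 85mm and 360mm radii, the 102mm focal plane diameter and the f/1.8 acceptance cone are the post's figures, and the function names are illustrative.

```python
import math


def sag_mm(radius_mm, chord_mm):
    """Depth of a spherical cap of the given radius across a chord."""
    half = chord_mm / 2
    return radius_mm - math.sqrt(radius_mm ** 2 - half ** 2)


def chief_ray_angle_deg(field_height_mm, exit_pupil_distance_mm):
    """Chief ray angle from the focal plane normal, for a flat focal plane."""
    return math.degrees(math.atan(field_height_mm / exit_pupil_distance_mm))


def acceptance_half_angle_deg(f_number):
    """Half angle of the light cone a sensor rated for this f-number accepts."""
    return math.degrees(math.atan(1 / (2 * f_number)))


if __name__ == "__main__":
    # Flat 7.12 mm sensor sitting on a sphere of radius 85 mm.
    print(f"sag, R=85: {sag_mm(85.0, 7.12) * 1000:.1f} microns")
    # Same sensor on a 360 mm sphere; half of this is the +/- focus error.
    print(f"sag, R=360: {sag_mm(360.0, 7.12) * 1000:.1f} microns")
    # Whole 102 mm focal plane on the 360 mm sphere.
    print(f"focal plane sag, R=360: {sag_mm(360.0, 102.0):.2f} mm")
    # Natural exit pupil at 85 mm, point at the edge of a flat focal plane.
    print(f"chief ray at edge: {chief_ray_angle_deg(51.0, 85.0):.1f} degrees")
    print(f"f/1.8 acceptance half angle: {acceptance_half_angle_deg(1.8):.1f} degrees")
```

With a natural exit pupil the chief ray at the edge of the flat focal plane arrives roughly twice as far off normal as the half angle of the f/1.8 cone, which is why the sensor rejects that light.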

Here's what I would have done to respond to the original RFP at the time. Note that I've given this about two hours' thought, so I might be off a bit:

I'd have the lenses and sensors sitting inside an airtight can with a thermoelectric cooler to a heat sink with a variable speed fan, and I'd use that control to hold the can interior to between 30 and 40 C (toward the top of the temperature range), or maybe even tighter. I might put a heater on the inside of the window with a thermostat to keep the inside surface isothermal to the lens. I know, you're thinking that a thermoelectric cooler is horribly inefficient, but they pump 3 watts for every watt consumed when you are pumping heat across a level. The reason for the thermoelectric heat pump isn't to get the sensor cold, it's to get tight control. The sensors burn about 600 mW each, so I'm pumping 250 watts out with maybe 100 watts.
I'd use a few more sensors and get the sensor overlap up to 0.25mm, which means +/-0.5% focal length is acceptable. I designed R5 and R7 with too little overlap between sensors and regretted it when we went to volume production. (See Jason, you were right, I was wrong, and I've learned.)
Focal plane is 9 x 13 sensors on 10.9 x 8.1 mm centers. Total diameter: 105mm. This adds 32 sensors, so we're up to an even 400 sensors.
Exiting the back of the fine gimbal would be something like 100 flex circuits carrying the signals from the sensors.
Hook up each sensor to a Spartan-3A 3400. Nowadays I'd use an Aptina AR0330 connected to a Spartan-6, but back then the MT9P001 and Spartan-3A was a good choice.
I'd have each FPGA connected directly to 32GB of SLC flash in 8 TSOPs, and a 32-bit LPDDR DRAM, just like we did in R7. That's 5 bytes per pixel of memory bandwidth, which is plenty for video compression.
I'd connect a bunch of those FPGAs, let's say 8, to another FPGA which connects to gigabit ethernet, all on one board, just like we did in R7. This is a low power way to get connectivity to everything. I'd need 13 of those boards per focal plane. This all goes in the gimbal. The 52 boards, and their power and timing control are mounted to the coarse gimbal, and the lenses and sensors are mounted to the fine gimbal.
Since this is a military project, and goes on a helicopter, I would invoke my fear of connectors and vibration, and I'd have all 9 FPGAs, plus the 8 sensors, mounted on a single rigid/flex circuit. One end goes on the focal plane inside the fine gimbal and the other goes on the coarse gimbal, and in between it's flexible.
I'd connect all 52 boards together with a backplane that included a gigabit ethernet switch. No cables -- all the gigE runs are on 50 ohm differential pairs on the board. I'd run a single shielded CAT-6 to the chopper's avionics bay. No fiber optics. They're really neat, but power hungry. Maybe you are thinking that I'll never get 274 megabits/second for the Common Data Link through that single gigE. My experience is otherwise: FPGAs will happily run a gigE with minimum interpacket gap forever, without a hiccup. Cheap gigE switches can switch fine at full rate but have problems when they fill their buffers. These problems are fixed by having the FPGAs round-robin arbitrate between themselves with signals across that backplane. Voila, no bandwidth problem.
The local FPGA does real time video compression directly into the flash. The transmission compression target isn't all that incredible: 1 bit per pixel for video. That gets 63 channels of 640x400x15 frames/sec into 274 Mb/s. The flash should give 1 hour of storage at that rate. If we want 10 hours of storage, that's 0.1 bits/pixel, which will require more serious video compression. I think it's still doable in that FPGA, but it will be challenging. In a modern Spartan-6 this is duck soup.
The computer tells the local FPGAs how to configure the sensors, and what bits of video to retrieve. The FPGAs send the data to the computer, which gathers it up for the common data link and hands it off.
I'll make a guess of 2 watts per sensor+FPGA+flash, or 736 watts. Add the central computer and switch and we're at 1 kilowatt. Making the FPGAs work hard with 0.1 bit/pixel video compression might add another 400 watts, at most.
No SSDs, no RAID, no JPEG compression chips, no multiplexors, no fiber optic drivers, no high speed SerDes, no arrays of multicore X86 CPUs. That's easily half the electronics complexity, gone.
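The storage, downlink and power budgets in the list above are easy to recheck. The sketch below is only bookkeeping under stated assumptions: 32GB is taken as 32e9 bytes, each sensor runs at the full 2592 x 1944 at 15 frames/second, and the 2 watt per sensor figure is the guess from the post. The function names are illustrative.

```python
SENSOR_WIDTH_PX = 2592
SENSOR_HEIGHT_PX = 1944
FRAME_RATE_HZ = 15
FLASH_BYTES_PER_SENSOR = 32e9   # assumption: decimal gigabytes
DOWNLINK_BPS = 274e6            # Common Data Link rate quoted in the post


def sensor_pixel_rate():
    """Pixels per second produced by one sensor at full resolution."""
    return SENSOR_WIDTH_PX * SENSOR_HEIGHT_PX * FRAME_RATE_HZ


def flash_hours(bits_per_pixel):
    """Hours of video one sensor's flash holds at a given compressed rate."""
    bits_per_second = sensor_pixel_rate() * bits_per_pixel
    return FLASH_BYTES_PER_SENSOR * 8 / bits_per_second / 3600


def window_bitrate_bps(width, height, fps, bits_per_pixel):
    """Bit rate of one transmitted video window."""
    return width * height * fps * bits_per_pixel


def total_power_w(n_sensors, watts_per_sensor):
    """Power for all sensor + FPGA + flash groups."""
    return n_sensors * watts_per_sensor


if __name__ == "__main__":
    print(f"flash at 1 bit/pixel: {flash_hours(1.0):.2f} hours")
    print(f"flash at 0.1 bit/pixel: {flash_hours(0.1):.1f} hours")
    used = 63 * window_bitrate_bps(640, 400, 15, 1.0)
    print(f"63 windows: {used / 1e6:.1f} Mb/s of {DOWNLINK_BPS / 1e6:.0f} Mb/s")
    print(f"sensor electronics: {total_power_w(368, 2.0):.0f} watts")
```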

UPDATE 25-Jan-2013: Nova ran a program on 23-Jan-2013 (Rise of the Drones) which talks about ARGUS-IS. They present Yiannis Antoniades of BAE Systems as the inventor, which suggests I have the relationship between BAE and ObjectVideo wrong in my description above. They also say something stupid about a million terabytes of data per mission, which is BS: if the camera runs for 16 hours the 368 sensors generate 2,000 terabytes of raw data.

They also say that the ARGUS-IS stores the entire flight's worth of data. I don't think they're doing that at 12 hertz, certainly not on 160 GB drives. They've got 32 laptop drives in the system (one per single board computer). If those store 300 GB apiece, that's 10 terabytes of total storage. 16 hours of storage would require 0.05 bits/pixel -- no way without actual video compression. The JPEG2000 compressor chips are more likely to deliver at best 0.2 bits/pixel, which means they might be storing one of every four frames.
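Here is a rough sketch of the data volume arithmetic behind these two updates. The post does not state the raw bit depth it used, so the 12 bits per pixel below is an assumption (the ADC depth I believe the MT9P031 has), and the frame rate is left as a parameter since both 12 Hz and 15 frames/second appear in this post. The 368 sensors, 16 hour mission, 32 drives and 300 GB per drive are the post's figures; the function names are illustrative.

```python
PIXELS_PER_SENSOR = 2592 * 1944
N_SENSORS = 368
MISSION_HOURS = 16
RAW_BITS_PER_PIXEL = 12      # assumption: not stated in the post
N_DRIVES = 32
DRIVE_BYTES = 300e9


def mission_pixels(frame_rate_hz, hours=MISSION_HOURS):
    """Total pixels captured by the whole array over a mission."""
    return PIXELS_PER_SENSOR * N_SENSORS * frame_rate_hz * hours * 3600


def raw_terabytes(frame_rate_hz):
    """Uncompressed data volume in decimal terabytes."""
    return mission_pixels(frame_rate_hz) * RAW_BITS_PER_PIXEL / 8 / 1e12


def storage_bits_per_pixel(frame_rate_hz):
    """Compressed rate needed to fit the whole mission on the drives."""
    return N_DRIVES * DRIVE_BYTES * 8 / mission_pixels(frame_rate_hz)


def stored_frame_fraction(required_bpp, achievable_bpp):
    """Fraction of frames that fit if the compressor can't reach the required rate."""
    return min(1.0, required_bpp / achievable_bpp)


if __name__ == "__main__":
    for fps in (12, 15):
        print(f"{fps} Hz: {raw_terabytes(fps):.0f} TB raw, "
              f"{storage_bits_per_pixel(fps):.3f} bits/pixel to fit on disk")
    print(f"frames kept at 0.2 bits/pixel: {stored_frame_fraction(0.05, 0.2):.2f}")
```

Either frame rate gives a raw volume in the low thousands of terabytes and a required rate in the neighborhood of 0.05 bits/pixel, nowhere near a million terabytes.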

UPDATE 27-Jan-2013: An alert reader (thanks mgc!) sent in this article from the April/May 2011 edition of Science and Technology Review, which is the Lawrence Livermore National Laboratory's own magazine. It has a bunch of helpful hints, including this non-color-balanced picture from ARGUS-IS which lets you see the 368 sensor array that they ended up with. It is indeed a 24 x 18 array with 16 sensors missing from each corner, just as I had hypothesized.

The article mentions something else as well: the Persistics software appears to do some kind of super-resolution by combining information from multiple video frames of the same nearly static scene. They didn't mention the other two big benefits of such a scheme: dynamic range improvement and noise reduction (hence better compression). With software like this, the system can benefit from increasing the focal plane to 3.8 gigapixels by using the new sensor with 1.66 micron pixels. As I said above, if the lens is f/3.5 to f/4.0 they won't get any more spatial frequency information out of it with the smaller pixels, but they will pick up phase information. Combine that with some smart super-resolution software and they ought to be able to find smaller details. Question though: why not just go to the MT9F002, which gives you 14 million 1.4 micron pixels? This is a really nice, fast sensor -- I've used it myself.

The article also mentions 1000:1 video compression. That's very good: for comparison, H.264 level 4 compresses 60 megapixels/second of HDTV into 20 megabits/second, which is 0.33 bits/pixel or 36:1 compression. This isn't a great comparison, though, because Persistics runs on almost completely static content and H.264 has to deal with action movie sequences. In any case, I think the Persistics compression is being used to archive ARGUS-IS flight data. I don't think they are using this compression in the aircraft.
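The H.264 comparison works out as below. The 36:1 figure only follows if the uncompressed video is counted at 12 bits per pixel, which is what 8-bit 4:2:0 video carries; that raw depth is an assumption here, since the post doesn't say so explicitly. The function names are illustrative.

```python
RAW_BITS_PER_PIXEL = 12  # assumption: 8-bit 4:2:0 video


def bits_per_pixel(bitrate_bps, pixel_rate_per_s):
    """Compressed bits spent per pixel."""
    return bitrate_bps / pixel_rate_per_s


def compression_ratio(raw_bpp, compressed_bpp):
    """Raw size divided by compressed size."""
    return raw_bpp / compressed_bpp


if __name__ == "__main__":
    h264_bpp = bits_per_pixel(20e6, 60e6)
    print(f"H.264: {h264_bpp:.2f} bits/pixel, "
          f"{compression_ratio(RAW_BITS_PER_PIXEL, h264_bpp):.0f}:1")
    # What 1000:1 would mean against the same raw depth.
    print(f"1000:1 is {RAW_BITS_PER_PIXEL / 1000:.3f} bits/pixel")
```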

Monday, August 27, 2012