Sunday, June 29, 2014

Early Days on Street View

A friend was asking about the early Street View timeline, which prompted a trip down memory lane.  It's been at least five years for most of this stuff, so if anyone has corrections I'm more than happy to apply them.

In 1978, DARPA funded MIT to produce the Aspen Movie Map, an interactive LaserDisc-based app for tooling around Aspen, Colorado.

In 1996, the Clickwalk project (which I've just learned about thanks to a comment on this post) started in Norway.  By 1999 they had panoramic imagery online.  I'll add more info about this as I find it.

Amazon launched BlockView in January 2005. Because Amazon had no top-down maps interface, it was nearly impossible to navigate using BlockView, and it remained a curiosity until it was canned in September 2006.

In 2001, Larry Page shot a video from the side window of his car while driving around a few spots in the Bay Area.  His point was that Google had no way to index most of the stuff people interacted with every day.  He showed the video to Marc Levoy, a professor at Stanford, and got Marc some funding.  Marc started the Stanford Cityblock project.  This culminated in Augusto Roman's 2006 thesis "Multiperspective imaging for automated urban visualization". Augusto was producing image strips like this from shooting lots of pictures at the side of the street.


I was hired in June 2006 to work on a Google project which had grown out of the Stanford Cityblock project.  At the time I was hired, we had two copies of the first camera set, which I dubbed R1. These had been assembled by bolting five 11-megapixel CCD-based book-scanning cameras (shown below) to a plywood board, and bolting that to the roof of a car, much of which was accomplished by Elliot Kroo when he was, if I'm not mistaken, 14 years old (youngest intern ever at Google). Neither R1 worked much, due to problems with the cameras, not Elliot!

R1 camera

That summer we put together R2, which repackaged R1's electronics and off-the-shelf Sigma 28mm DSLR lenses.  R2 had eight cameras, rather than five, for higher resolution, which turned out to exacerbate the camera problems we were already having with R1.  R2 looked a lot like a tank turret. The vertical slot side-facing port you can see below was for a 4-megapixel, 200 frame-per-second camera that would produce the multiperspective strips from Augusto's thesis, and there was one on each side. The need to sink 1600 megapixels per second from those cameras drove us to a very large and heavy disk array in the van, one of the many flaky pieces of hardware in this rig. The 8-lens pano camera set on top was a backup for a different user interface, in case the strip UI didn't work. There was also an upward-facing 3-megapixel CMOS camera, meant to complete a hemispherical panorama, but it never really worked.  We never figured out that the camera set wasn't watertight, because so many other things in the system were so broken.
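For anyone checking the arithmetic on that disk array, here is a quick sketch of where the 1600 megapixels per second comes from. The camera numbers are the ones given above; the bytes-per-pixel figure is purely an assumption for illustration (I'm not quoting the actual raw format), so treat the resulting disk rate as a rough order of magnitude only.

```python
# Back-of-the-envelope check of the strip-camera data rate.
MEGAPIXELS_PER_FRAME = 4   # from the text
FRAMES_PER_SECOND = 200    # from the text
STRIP_CAMERAS = 2          # one on each side, from the text
BYTES_PER_PIXEL = 1.5      # ASSUMPTION: 12-bit raw samples, packed


def pixel_rate_mp_per_s(megapixels, fps, cameras):
    """Total pixel rate in megapixels per second."""
    return megapixels * fps * cameras


def disk_rate_gb_per_s(mp_per_s, bytes_per_pixel):
    """Sustained write rate in gigabytes per second (1 GB = 1e9 bytes)."""
    return mp_per_s * 1e6 * bytes_per_pixel / 1e9


rate = pixel_rate_mp_per_s(MEGAPIXELS_PER_FRAME, FRAMES_PER_SECOND, STRIP_CAMERAS)
print(rate, "megapixels per second")
print(disk_rate_gb_per_s(rate, BYTES_PER_PIXEL), "GB per second (assumed format)")
```

Whatever the real sample format was, sustaining anything in that neighborhood on 2006 hardware meant a lot of spindles in parallel.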

R2 camera

The only watertight things in the rig were those Sick LMS291 laser scanners.  The name is German.

Sick LMS291 laser scanner

The multiperspective strip user interface wasn't very usable and the top pano cam became the focus of our attention.

In late 2006 we were shooting San Francisco with R2 and getting some horrible tearing and blooming artifacts. Whenever an image of the sun was projected onto the sensor, the photocurrent from that image would overflow the pixel capacitors and flow into other places. Shiny surfaces like car windows and paint would lead to overflows into the readout circuitry, which led to streaks. If the sun was actually in the field of view, the photocurrent was so large it would locally raise the ground voltage in that portion of the sensor. Locally, the CCD would stop shifting, leading to the tearing effect.

We didn't know it at the time, but the problems we were having were essentially due to the high resolution we were trying to shoot. High resolution from a moving platform leads to short exposure times. Short exposure times require large relative apertures to get enough sensitivity. Compared to small format cameras, our focal lengths were fairly long (28 mm), so that meant much larger apertures, much larger photocurrents, and thus the problems we were fighting. In hindsight, the two real solutions were lower resolution or CMOS sensors. Rather than try those, I tried to make the high resolution CCD work with shutters (the R3 camera) and choppers (R4).  Oops.
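The chain of reasoning above can be put into rough numbers. This is only a sketch under assumptions I'm making up for illustration: a 9 micron pixel pitch, a subject 10 m away, a car moving at 10 m/s, and a hypothetical 4 mm small-format lens for comparison. Only the 28 mm focal length comes from the story. It uses the simple thin-lens approximation that a sideways-moving subject sweeps across the sensor at speed times focal length over distance.

```python
def max_exposure_s(pixel_pitch_m, focal_length_m, distance_m, speed_m_s,
                   max_blur_px=1.0):
    """Longest exposure that keeps motion blur within max_blur_px.

    Assumes the subject moves sideways relative to the camera and is
    much farther away than the focal length.
    """
    image_speed_m_s = speed_m_s * focal_length_m / distance_m
    return max_blur_px * pixel_pitch_m / image_speed_m_s


def aperture_area_ratio(focal_a_m, focal_b_m):
    """Ratio of aperture areas for two lenses at the same f-number.

    Aperture diameter is focal length / f-number, so area scales with
    focal length squared.
    """
    return (focal_a_m / focal_b_m) ** 2


PIXEL_PITCH_M = 9e-6     # ASSUMPTION: illustrative pixel size
FOCAL_LENGTH_M = 0.028   # from the text
DISTANCE_M = 10.0        # ASSUMPTION: subject distance
SPEED_M_S = 10.0         # ASSUMPTION: vehicle speed
SMALL_FOCAL_M = 0.004    # ASSUMPTION: hypothetical small-format lens

t_full = max_exposure_s(PIXEL_PITCH_M, FOCAL_LENGTH_M, DISTANCE_M, SPEED_M_S)
# Halving linear resolution (pixels twice as wide) doubles the allowed exposure.
t_half_res = max_exposure_s(2 * PIXEL_PITCH_M, FOCAL_LENGTH_M, DISTANCE_M, SPEED_M_S)

print(f"full resolution: {t_full * 1e3:.2f} ms")
print(f"half resolution: {t_half_res * 1e3:.2f} ms")
print(f"aperture area vs small lens: {aperture_area_ratio(FOCAL_LENGTH_M, SMALL_FOCAL_M):.0f}x")
```

With these made-up inputs the exposure budget comes out well under a millisecond, and the long lens collects tens of times more light from a bright source than a small-format lens at the same f-number, which is the photocurrent problem in a nutshell.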

Anthony Levandowski


Sometime in 2006, Sebastian Thrun started VueTool with six Stanford students who had previously worked on the DARPA/Stanford self-driving car project, one of whom was Anthony Levandowski (later a major player on the Google self-driving car project, then stole the tech and took it to Uber, then pardoned by Trump on the last day of his first term). Sebastian had been one of Augusto's thesis advisors, and I think he was a bit exasperated by the slow progress at Google. His team built a far lighter rooftop rig from a Point Grey Ladybug2 camera and three LIDAR scanners, and stitched it together with 80/20 aluminum extrusions. Operating at much lower resolution, the camera did not have the worst artifacts. They had the rig working, on the road, in months, and had a very slick UI with movie-like transitions between navigation points. In early 2007, Google purchased VueTool and most of the existing StreetView team was refocused on deploying the VueTool rig. By October, we'd shot 1 million miles, at which time dew and rain filled and quickly killed the LadyBug2 cameras, which had been designed for indoor use.

LadyBug2 camera

In May 2007, Street View launched with R2 imagery and imagery purchased from Immersive Media. The VueTool rig was such a success that by the end of the year, most if not all of the launched imagery was from that.

In the last few months of 2007, we knew that the LadyBug2 was not going to work long-term, so we needed another camera for the 2008 driving season. Point Grey was promising the LadyBug3, which would be watertight. I was promising R3 or R4 (and fighting with shutter reliability problems in the lab).  Jason Holt on the Google Geo cam team put together nine 5-megapixel CMOS cameras (as opposed to the CCD cameras used by R2 and LadyBug2) and called it R5. The race was on. We dropped R3 & R4, and the camera team went nuts that winter getting R5 to work (and be watertight)! I remember assembling the first R5 to use my custom lenses in February 2008 and driving San Francisco... the day before Jason, in Europe, got the first French imagery from an R5 with off-the-shelf lenses. I was still pretty sore from eating crow on the R3/R4 decision, and I wanted to show at least my lenses worked well.

R5 Camera

LadyBug3 ended up over a year late. R5 had a rough start in Europe, barely making an imagery launch just before the 2008 Tour de France. But it worked.

2008 was a great year for the team, as we had a working rig and strong demand. We spent a lot of time on the assembly line getting the camera yields up and improving focus quality. At the same time, the ops guys were gobbling larger and larger hunks of ground. At each launch, we'd see major traffic blips as the first news reports would let people know they had local imagery.

When an average person discovers that his house is online, he tells a few friends, and some number of them discover their houses are online too. When that number is less than one, which was the case for nearly all our launches, we would see a burst of traffic which was some multiple of the initial population that found out about the new coverage through news reports, etc.

However, when that number is more than one, the discovery ripples through the entire networked population. How big is that network? We found out in August 2008, when we launched all of Australia (and a bit of Japan) in one go. We hit critical mass, and the network effect was insane. I don't remember the exact number, but at one point something like a few percent of the entire Australian population, as well as their overseas friends, were actively clicking on Street View. The traffic spike broke all sorts of things, including the help button on Maps. The next day, I remember seeing one email thread from seriously angry SREs which was getting nasty messages faster than I could read them. SREs are the folks who keep google.com running; they are very smart and enjoy excellent management support -- you do not mess with SREs. So it was important not to be seen high-fiving one another or sporting winning smirks. Thereafter we were required to have very conservative and very expensive budgets for launch traffic spikes.
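The less-than-one versus more-than-one behavior described above is the textbook branching-process threshold, and it is easy to sketch. The numbers here are invented for illustration (I don't have the real ones), and the model ignores the fact that a real population is finite, so the runaway case would eventually saturate rather than grow forever.

```python
def total_reach(seed, r, generations):
    """People reached after a number of word-of-mouth generations.

    seed: people who hear about the launch from news reports.
    r:    average number of friends each person gets to discover it too.
    """
    total = 0.0
    current = float(seed)
    for _ in range(generations):
        total += current
        current *= r
    return total


def subcritical_limit(seed, r):
    """Closed-form total when r < 1: the geometric series seed / (1 - r)."""
    if r >= 1:
        raise ValueError("no finite limit when r >= 1")
    return seed / (1 - r)


# ASSUMPTION: all values below are made up for illustration.
print(total_reach(1000, 0.5, 200))   # settles at a fixed multiple of the seed
print(subcritical_limit(1000, 0.5))  # the same multiple, from the formula
print(total_reach(1000, 1.2, 50))    # keeps growing with every generation
```

With r at 0.5 the burst tops out at twice the news-driven audience. Push r just over one and there is no ceiling in the model at all, which is roughly what Australia felt like.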

Fresh from their 2007 success, in 2008 Sebastian Thrun and several members of the former VueTool team pointed out that (a) Google now had all the data sources necessary to make its own street vector database, something it had previously licensed from others, particularly TeleAtlas, and (b) those licenses were very expensive. The inevitable project to build this massive database was known as Ground Truth. By year end, not only did the Street View team have a working product, but we had an internal customer laying down a tight schedule to shoot the entire world. This was like spraying lighter fluid on an already-burning heap of charcoal.

Right in the middle of this huge burst of activity, a wrench went into the works.  Over the winter of 2009 and into 2010 we had fleet stoppages due to lawsuits over collection of WiFi packet payloads. Cellphones use WiFi hotspots, even ones which are encrypted, to figure out where they are, because the stronger signal is easier to pick up and lower power to process than GPS. This works even without a password, because all that is necessary is the station ID (from the unencrypted header), signal strength... and the station location. You can get this data from SkyHook Wireless, but once again Google wanted its own database. So in theory, we should have just grabbed the header. However, a single wrong line in the configuration file caused us to grab the packet payload too. Had we ever audited the data on the disks, we would have found the bug and fixed it, but... nobody ever noticed. Oops.
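To show why the header alone is enough, here is a minimal sketch of positioning from station ID, signal strength, and a table of station locations. This is not Google's or SkyHook's actual algorithm, which I'm not describing here; it is a plain weighted centroid swapped in for illustration, and the station IDs and coordinates are invented.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    station_id: str   # read from the unencrypted header
    rssi_dbm: float   # received signal strength


def estimate_position(sightings, station_locations):
    """Weighted centroid of known station positions.

    Stronger signals (converted from dBm to linear power) pull the
    estimate closer to that station. Averaging latitude and longitude
    directly is only reasonable over short distances.
    Returns (lat, lon), or None if no sighted station is in the table.
    """
    weight_sum = 0.0
    lat_sum = 0.0
    lon_sum = 0.0
    for s in sightings:
        location = station_locations.get(s.station_id)
        if location is None:
            continue  # station not in the database yet
        weight = 10 ** (s.rssi_dbm / 10)
        lat_sum += weight * location[0]
        lon_sum += weight * location[1]
        weight_sum += weight
    if weight_sum == 0.0:
        return None
    return lat_sum / weight_sum, lon_sum / weight_sum


# ASSUMPTION: hypothetical station table, as a collection car might build it.
STATIONS = {
    "00:11:22:33:44:55": (37.4220, -122.0840),
    "66:77:88:99:aa:bb": (37.4230, -122.0850),
}

heard = [
    Sighting("00:11:22:33:44:55", -50.0),
    Sighting("66:77:88:99:aa:bb", -50.0),
    Sighting("de:ad:be:ef:00:01", -40.0),  # unknown station, ignored
]
print(estimate_position(heard, STATIONS))
```

Note that nothing in the sketch ever touches a packet payload, which is the point: the lookup table only needs the ID, the strength, and where the car was when it heard the station.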

R7 camera

Quick aside: the field R5s were slowly accumulating moisture, had 1-5 milliseconds of jitter between cameras, had autoexposure issues, and had a low-resolution upward-facing fisheye that led to an ugly resolution change whenever the user looked up at all.  My boss, Chris Uhlik, wanted me to design our own boards rather than rely on the Elphel electronics we had been using, so that we could kill the synchronization problems and bring the electronic reliability issues in-house where we could apply whatever resources we wanted.  So in the summer of 2008, I stole an intern, George Hotz, from Diego Ruspini (who chose the wrong week to go on paternity leave) and started work on R7, shown above.  George, a.k.a. Geohot, was the same guy who had rooted several generations of iPhones, and later went on to become a rap star.  George and I (but mostly George) prototyped the R7 in 2008, and got some basic bits working by the end of the year.  This was my first board-level design and first FPGA design, and it showed: the power system and signal integrity were a joke, and (in retrospect this is all so clear) power glitches were causing unending flaky behavior.  I spent the spring of 2009 redesigning the boards.  During this stage Alessandro Temil transferred into our team and we finally had the engineering critical mass to get the thing done.

George Hotz

It was a good thing too.  Sometime in the summer of 2009 Julie Kim (ops) took my Real Soon Now promises too seriously and filled the basement garage of Google's new offices with Subaru Imprezas waiting for new R7 cameras... which weren't working yet.  Once again, getting the thing watertight was proving to be a problem, and the external baffle (the big red ball) was warping while being welded, which led to some of the overlap areas between cameras not getting sampled at all. The WiFi standdown relaxed pressure on the camera team and let us get the assembly line working better.  We finally got the thing into production during the early part of the 2010 driving season. And amid all the secrecy, a bunch of Googlers actually published an article featuring a teardown of the R7 -- without telling me ahead of time!

As I was the tech lead on this camera, and as the details have been published, allow me to brag a bit. For the first time, the circuit boards and firmware were an entirely internal development. The R7 eliminated essentially all of the major problems with earlier cameras: the individual cameras were synchronized to nanoseconds, the auto-exposure was fast and predictable (and it could shoot HDR if needed), it shot the same resolution in all directions, the relative unit camera orientations were very stable, and it was really watertight. It has a 480GB internal FIFO (flash) that lets the camera outpace the disk drive in the car for long periods if needed. I felt this camera was what I had been hired by Google to build. It still had CMOS rolling shutters, which in combination with vehicle movement make for difficult stitching problems.


The R7 is what Google is using today. I expect they'll be using it for quite some time, as advances in sensors haven't really fixed the problem of high resolution from moving platforms. If a CMOS sensor with a global shutter (no rolling shutter) comes out, they might do a new camera based on that, but I'd be very careful to verify that it works when pointed right at the sun first, as you need to read those pixels near the sun before they've marinated in excess photocurrent for too long.

Postscript

Microsoft chose a different path for Bing Maps, partly funding and contributing to the OpenStreetMap project, which is an open-source alternative to the Google Maps street database. To my knowledge, their imagery collection cars do not directly feed into this project, in the sense of having budgets and deadlines. As a result, their StreetSide project has no revenue source to drive it.

Navteq and TeleAtlas continue to drive the world's streets with cars similar to Google's, but at a somewhat smaller scale, and as far as I know the imagery from those collections is not published. One wonders why this imagery doesn't end up at Microsoft, as the oblique imagery from Pictometry does.

The success of Ground Truth cemented Sebastian Thrun's position at Google and made it possible for him to get serious long-term funding for the self-driving car project, which eventually became Chauffeur and then Waymo.

I went on to develop aerial cameras for Google, but that story will have to wait for Google to publish some details.