Some might have seen the videos of Microsoft PhotoSynth. That of course makes many people go all “ooooh, must have!”.

Actually, the videos from the University of Washington's PhotoTour are much more impressive. And it actually tells you how it works. Read the PDF of the paper published at this year's SIGGRAPH, also linked from that site, for more details.

The Microsoft video only shows the UI; the Washington video gives a bit more information, but still doesn’t really talk about the “backend requirements”. You can get some facts from the published paper, though.

Basically it’s a smart combination of existing technologies, together with some good optimization (I guess). It uses SIFT feature extraction and matching (also used by panotools in autopano-sift; unfortunately SIFT is patent-encumbered in the US, though I doubt it would be patentable in Germany because of its very mathematical nature) and essentially does the same as panorama tools, except in 3D, using “Structure from Motion” approaches.

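To make the idea a bit more concrete, here is a minimal sketch of the two-view building block: extract and match SIFT features between two photos, then recover the relative camera pose from the matches. This uses today's OpenCV, which is *not* what the paper or autopano-sift use, and the file names and camera matrix below are made-up placeholders; the real system repeats this over thousands of images and then runs a global bundle adjustment, which is presumably where most of the CPU weeks go.

```python
# Sketch: SIFT matching between two photos, then relative pose (two-view SfM).
# Assumptions: OpenCV >= 4.4 (SIFT included), placeholder file names, and a
# guessed pinhole camera matrix K -- the real pipeline estimates calibration.
import cv2
import numpy as np

img1 = cv2.imread("photo_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("photo_b.jpg", cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and compute descriptors in both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors and keep only unambiguous matches (Lowe's ratio test).
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# Rough pinhole camera: focal length f in pixels, principal point at centre.
f = 1000.0
h, w = img1.shape
K = np.array([[f, 0, w / 2], [0, f, h / 2], [0, 0, 1]])

# Estimate the essential matrix with RANSAC, then decompose it into the
# relative rotation R and translation direction t between the two cameras.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                               prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
print("relative rotation:\n", R)
print("translation direction:", t.ravel())
```
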
So you might wonder why I still call this vapourware, when there is a SIGGRAPH paper backing it, and an interesting demo.

Well, in the paper they mention that their test machine was a 3.4 GHz computer, that the Notre Dame photo set took two weeks to process, and that only 597 of the 2635 images were actually placed; the others maybe had too much other stuff in them, or didn’t match properly.

This gives the impression that for this to work you need

  1. lots of CPU time
  2. lots of well-suited images (clean projection, not too many moving people in them, etc.)
  3. no gaps between the pictures, but complete coverage

I don’t want to talk down the scientific work done here, but the Microsoft announcement makes it sound (well, just look at who the video implies their users will be) like it would be a product for everyone to use within a year. It won’t. That’s why I call it vapourware.

It’s good research, but not a realistic product.

It will be very interesting for professionals, however. They can set their camera to a well-behaved field of view (no fisheye; I don’t expect the software to like fisheye distortion much) and take tons of photos, trying to cover all of a building. The software will then eventually allow them to construct and texture a 3D model, at the cost of some weeks of computation.

The viewer is still interesting for data sets computed by someone with access to excess CPU power, e.g. a company offering virtual sightseeing.