Applet: Katie Dektar
Text: Marc Levoy
Technical assistance: Andrew Adams
One way to capture a wide-angle view of the world is to use a wide-angle lens. However, very wide-angle lenses are expensive, and no lens is wide enough to capture a full-surround (360-degree) view. But as most photographers know, you can instead capture a sequence of images, rotating the camera between views, then stitch them together using Photoshop or another tool.
The first step in creating a panorama is to capture a sequence of images while aiming the camera in different directions. To ensure you can stitch the images together, they should overlap by at least 25%. Equally importantly, for stitching to work correctly you must rotate the camera around its center of perspective (COP). This is the point where the lines of sight viewed by the camera converge. In the applet above, the COP is represented by a red ball at the center of the 3D view.
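As a rough planning aid, the overlap rule can be turned into a shot count. The sketch below assumes evenly spaced shots and treats overlap as a fraction of the horizontal field of view measured in angle, which only approximates overlap measured in pixels. The function name and its parameters are illustrative, not part of the applet.

```python
import math

def shots_for_full_circle(hfov_deg, overlap=0.25):
    """Fewest evenly spaced shots that cover 360 degrees with at least
    `overlap` (a fraction of the field of view) shared by neighbors."""
    if not 0 < hfov_deg < 360:
        raise ValueError("horizontal field of view must be between 0 and 360 degrees")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be a fraction in [0, 1)")
    max_step = hfov_deg * (1 - overlap)  # largest allowed rotation between shots
    # The small tolerance keeps floating-point noise from adding an extra shot.
    return math.ceil(360 / max_step - 1e-9)

print(shots_for_full_circle(48))  # hypothetical lens with a 48-degree field of view
print(shots_for_full_circle(60))  # hypothetical lens with a 60-degree field of view
```

Under these assumptions, a lens with a 48-degree horizontal field of view can turn at most 36 degrees between shots, so a full circle takes ten shots.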
Where is the center of perspective in a camera? In a pinhole camera, it's at the pinhole. In an idealized thin lens, it's in the middle of the lens. In an idealized thick lens, it's at the first principal point, which is one of the so-called cardinal points of the lens. In a real photographic lens, it's in the center of the entrance pupil, which is the apparent position of the in-focus image of the aperture diaphragm as viewed from the object side of the lens. Typically, this image is buried somewhere in the lens assembly; exactly where depends on the design of the lens. Here's a web site that tells you how to find the center of perspective of a photographic lens, which they call the panoramic pivot point.
Note that the center of perspective is rarely at the tripod screw, which typically lies in the plane of the sensor. If you rotate your camera around the tripod screw, or any other point except the correct center of perspective, then objects close to the camera will shift position relative to their backgrounds from image to image in the sequence, as demonstrated on this web site. This shift is called parallax error. If you try stitching such a sequence of images together, you'll get ghosts (double images) in the resulting panorama. Of course, if there are no objects close to the camera, you're fine; you won't see any parallax errors.
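To get a feel for the size of the parallax error, here is a minimal sketch in a top-down 2D view. It assumes the pivot sits at the origin and the center of perspective starts a given distance in front of it; the 10 cm offset, the 20-degree rotation, and the object distances are made-up numbers for illustration.

```python
import math

def bearing(cop, point):
    """Direction (radians) from the center of perspective to a scene point."""
    return math.atan2(point[1] - cop[1], point[0] - cop[0])

def parallax_shift_deg(pivot_offset, rotation_deg, near, far):
    """Change in the angle between `near` and `far` when the camera turns.

    The pivot is at the origin of a top-down view; the center of perspective
    starts `pivot_offset` meters in front of it, on the +x axis.
    """
    phi = math.radians(rotation_deg)
    cop_before = (pivot_offset, 0.0)
    cop_after = (pivot_offset * math.cos(phi), pivot_offset * math.sin(phi))
    sep_before = bearing(cop_before, far) - bearing(cop_before, near)
    sep_after = bearing(cop_after, far) - bearing(cop_after, near)
    return math.degrees(sep_after - sep_before)

background = (100.0, 0.0)  # a far-away object, 100 m straight ahead
print(parallax_shift_deg(0.10, 20, (1.0, 0.0), background))   # near object at 1 m
print(parallax_shift_deg(0.10, 20, (50.0, 0.0), background))  # near object at 50 m
print(parallax_shift_deg(0.00, 20, (1.0, 0.0), background))   # pivot at the COP
```

With these numbers the object 1 m away shifts by roughly two degrees against the background, the object 50 m away shifts by a few hundredths of a degree, and pivoting at the center of perspective gives no shift at all.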
The lens in your camera forms a perspective view of the world on the camera's sensor, which thus plays the role of the picture plane. You can think of the world as being projected onto this plane along lines of sight that converge at the center of perspective.
As you rotate your camera around its center of perspective, its sensor also rotates. Hence, a sequence of images captured while rotating the camera horizontally (e.g. left-to-right) produces a collection of perspective views on picture planes that are rotated to different azimuthal directions. (Azimuthal angle means parallel to the horizon, as opposed to up-down.)
Such a sequence of perspective views is depicted in the applet above, for a camera that was rotated 360 degrees in ten steps (comprising ten pictures) while the photographer was standing in the middle of the old quadrangle at Stanford University. Each picture is texture mapped onto a plane hanging in 3D space. The orientation of each plane represents the orientation of the camera sensor when that picture was captured. If you spin the collection of planes around, you can find Stanford's Memorial Church, with the quadrangle's famous arcades extending away from it on both sides.
In order to display this collection of pictures as a single image on a flat display screen (or piece of paper), you need to map them somehow to a plane, which represents your screen or paper. As with making maps of the Earth, this is a standard problem in map projection, and there are standard solutions to it.
If your collection consists of only two or three pictures, representing a camera rotation of less than about 120 degrees, you can easily reproject them to a single plane, by tracing rays outward from the red ball through the 2-3 pictures to a plane placed outside the circle of images. Unfortunately, this simple method doesn't work if you have many pictures, especially if these pictures span more than 120 degrees. After all, where would you place the plane?
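One way to see why a single plane stops working is to compute how wide it would have to be. The sketch below assumes the plane is perpendicular to the central viewing direction and sits one unit from the center of perspective; the function name is illustrative.

```python
import math

def plane_width(fov_deg, distance=1.0):
    """Width of a flat picture plane, placed `distance` from the center of
    perspective, needed to hold a view spanning `fov_deg` degrees."""
    if not 0 < fov_deg < 180:
        raise ValueError("a single plane cannot hold a view of 180 degrees or more")
    return 2 * distance * math.tan(math.radians(fov_deg) / 2)

for fov in (60, 90, 120, 150, 170):
    print(fov, round(plane_width(fov), 2))
```

The width grows slowly up to about 120 degrees, then explodes as the span approaches 180 degrees, where lines of sight run parallel to the plane and never reach it.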
The solution to this problem is to project the pictures onto the surface of a cylinder, then unroll the cylinder to make a flat plane. This yields a single image you can display on a screen, or print on a piece of paper. To see how this works, click on the Project button in the applet above. You'll see red lines of sight extend from the center of perspective (red ball) through each picture and continue until they strike the surface of a cylinder (gray mesh). This produces a distorted version of the picture, which appears affixed to the mesh, as well as at the right side of the applet (briefly).
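The projection onto the cylinder can be sketched in a few lines. This is not the applet's code; it assumes a cylinder of radius 1 whose axis is vertical and passes through the center of perspective, and a camera that was rotated only horizontally (no tilt). The names `focal` and `yaw_deg` are illustrative.

```python
import math

def plane_to_cylinder(x, y, focal, yaw_deg=0.0):
    """Map a picture point to (azimuth in degrees, height) on a unit-radius cylinder.

    x, y    : position on the picture plane, measured from the picture center
    focal   : distance from the center of perspective to the picture plane,
              in the same units as x and y
    yaw_deg : azimuthal direction the camera faced for this picture
    """
    azimuth = yaw_deg + math.degrees(math.atan2(x, focal))
    height = y / math.hypot(x, focal)  # where the line of sight meets the cylinder
    return azimuth, height

# Three points on a horizontal straight line in the picture (y = 0.5):
for x in (-1.0, 0.0, 1.0):
    print(plane_to_cylinder(x, 0.5, focal=1.0))
```

The three points share the same height in the picture but land at different heights on the cylinder, lower at the sides than in the middle, so the straight line becomes a curve once the cylinder is unrolled.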
Once all pictures have been projected in this way (hit Skip Animation if you're impatient), click on Blend to feather the pictures one into the other. This removes the obvious seams between pictures. If you now slit the mesh and unroll it, you produce an image (shown at the bottom) you could print. It contains distortions; in particular, straight lines in the scene are no longer straight in the panorama. But like a map projection of the Earth, it's the best we can do. And if your panorama is of a mountainscape instead of an architectural scene like Stanford's old quad, you may never notice the distortions.
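Feathering can be done in several ways, and the weighting the applet uses is not stated here. The sketch below shows one common choice as an assumption: each picture's contribution at a pixel is weighted by that pixel's distance from the picture's nearest side edge, so a picture fades out as you approach its border.

```python
def feather_weight(column, width):
    """Weight that grows linearly with distance from the nearest side edge."""
    return min(column, width - 1 - column) + 1  # +1 keeps edge pixels above zero

def feather_blend(samples):
    """Blend overlapping pixels; each sample is (value, column, picture_width)."""
    weights = [feather_weight(col, width) for _, col, width in samples]
    total = sum(w * value for w, (value, _, _) in zip(weights, samples))
    return total / sum(weights)

# Two 100-pixel-wide pictures overlap; picture A has brightness 100, picture B 200.
print(feather_blend([(100, 90, 100), (200, 9, 100)]))   # middle of the overlap
print(feather_blend([(100, 99, 100), (200, 18, 100)]))  # at A's right edge
```

In the middle of the overlap both pictures get equal weight and the result is 150. At the very edge of picture A the result is 195, almost entirely picture B, which is why no seam is visible there.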
Is there no hope for obtaining views from a cylindrical panorama that don't exhibit distortion? Well, there's one way out. Suppose the view you'd like to take (think of it as a virtual camera) lies at the original center of perspective (the red ball), but has a modest field of view, say 90 degrees.
There are two ways to extract such a view. First, you could simply clip out 1/4 of the 360-degree cylindrical panorama. Looking at only a small portion of the whole panorama, the distortions aren't so obvious. Second, you could roll the panorama back into a cylinder, treat it as a 3D scene, and form a new linear perspective view of this scene with a virtual camera having a 90-degree field of view.
To see this second solution in action, click on Reproject. You'll see red lines start at the cylinder surface and extend inwards toward the center of perspective. At an intermediate position the intersection of these lines of sight and a plane is computed, forming a perspective view. This view is a correct linear perspective of the quadrangle, under the (slightly unrealistic) assumption that the quadrangle is infinitely far away from the camera, like an astronomer's map of the stars. Look at the resulting picture. It represents only a portion of the original scene, although this portion spans more than one original picture, and it's undistorted. Straight lines in the original scene are straight in this reprojection.
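The reprojection can be sketched as the reverse mapping, from a point on the cylinder to the virtual camera's picture plane. As before, this assumes a unit-radius cylinder with a vertical axis through the center of perspective and an untilted virtual camera; the function and parameter names are illustrative.

```python
import math

def cylinder_to_plane(azimuth_deg, height, focal, yaw_deg=0.0):
    """Map a point on a unit-radius cylinder to the virtual camera's picture plane.

    Returns (x, y) measured from the center of the new view, or None if the
    point lies 90 degrees or more away from the viewing direction.
    """
    # Wrap the angular offset into [-180, 180) so views can straddle the slit.
    offset_deg = (azimuth_deg - yaw_deg + 180.0) % 360.0 - 180.0
    offset = math.radians(offset_deg)
    if abs(offset) >= math.pi / 2:
        return None
    x = focal * math.tan(offset)
    y = height * focal / math.cos(offset)
    return x, y

# Points along a curve on the cylinder that came from a straight horizontal line:
for azimuth in (-40.0, -20.0, 0.0, 20.0, 40.0):
    height = 0.5 * math.cos(math.radians(azimuth))
    print(cylinder_to_plane(azimuth, height, focal=1.0))
```

Every point comes back with the same y value (0.5, up to rounding), so the curve on the cylinder is a straight line again in the reprojected view. For a 90-degree field of view, the focal distance equals half the width of the output image.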
When might you use such a planar reprojection of a cylindrical panorama? When you're interested in panning across the panorama while looking at only a small portion of it at any one moment. For an experimental viewer that does exactly this, check out Microsoft Research's HDView, which appeared as a paper at SIGGRAPH 2007.
To "reverse engineer" what Photoshop did, we viewed our 3D cylinder from somewhere along the cylinder's axis, then shifted this viewpoint up or down until the curved boundaries of the projected images (on the cylinder) became straight lines. When this happened, we knew we were at the correct station point. Then, standing at this point, we superimposed the original pictures, setting their common size arbitrarily, and we fiddled with the 3D locations of the four vertices of each one until they lined up (in our perspective view) with the corresponding vertices on the cylinder surface. When this happened, we knew the original picture's plane was correctly oriented. Looking at the 3D view, you can see that the station point is near the bottom of the cylinder, and the original pictures are tilted towards us, indicating that we were standing on the ground looking slightly upwards. Of course, all of this could be computed rigorously using pose estimation methods from computer vision, but that's a different course...