Video Panoramas

People

Proposal

Objectives

Display an online tour of Stanford campus on DTV with a full surround view and audio.
Provide panoramas (a la QuickTime VR) that are time varying and whose viewpoint moves over time.
Allow viewer to interact with the content and steer the viewing experience.

Overview

Capture surround view using 8 cameras mounted on a golf cart
MPEG-2 encode and multiplex 8 separate video streams into a single ATSC DTV stream
Receive DTV stream on PC and forward across 100 Mbit/s Ethernet to multiprocessor machine
Demultiplex and decode 8 video streams in parallel on 8 CPUs
Video-Panorama viewer requests required images and displays them on textured polygons
Position and orientation of textured polygons can be adjusted in order to stitch images seamlessly
Display campus map along with panorama view and allow navigation through map
Maintain audio along the way for running commentary
Planned extension: Combine with Virtual VCR so that viewer can move in time (watch some object longer than the guide does)

Description

In the late 18th and the 19th century, panoramas were a popular form of mass entertainment. Panoramas were invented by Robert Barker in 1792 as a means of showing complete 360 degree views of interesting environments (such as landscapes) to the interested and paying public. In those days, panoramas were created in specially built circular buildings with a small viewing platform in the middle and a painting on a large cylindrical canvas around it. These panorama canvases reached heights of up to 18 m and measured up to 130 m in circumference. To achieve the correct perspective view, the painting (which often took months to complete) had to be warped accordingly (see Figure 1).

Figure 1: Interior view of a panorama shortly before completion of the painting [S. Oettermann, The Panorama, Zone Books, p.56]

Because the large, highly realistic paintings were far from the viewer, the vertical field of view was limited, and movement on the viewing platform was restricted, the human senses were tricked into believing they were seeing the real landscape. This experience was often enhanced by placing real 3D objects in front of the image that merged into the background (providing otherwise non-existent parallax information) and by adjusting the lighting conditions accordingly. In that sense, panoramas provided an early form of virtual reality experience and caused dramatic responses among the viewers of their time.

Animated "video panoramas" were first created by Raoul Grimoin-Sanson (Cineorama) in 1897 and are still used today, e.g. in Disney's Circle-Vision installations. 3D IMAX and OMNIMAX are limited forms of panoramas but also include stereo effects. Panoramas recently got a new push in the context of image-based rendering through the work of Chen and Williams. Here, a cylindrical panorama is created and a small view can be displayed and moved interactively on a monitor, providing some sense of the surround view of a real panorama. These panoramas, however, are limited to static viewpoints.

Using the infrastructure of DTV, we decided to tackle the problem of video panoramas by providing the viewer with an interactive view of a moving panorama. The context of this application is a guided tour of the Stanford campus. At each instant the video panorama provides a 360 degree surround view from the current location. As the location or the surroundings change, new panoramas become visible. The user should also be able to interactively move the current view within the full panorama, for example to watch a particular object longer than the guide anticipated or simply to enjoy the surrounding campus.

Figure 2: The camera rig used for capturing the video panoramas.

To create the 360 degree panorama, we capture live video from eight CCD cameras arranged on a circular platform (see Figure 2). To manage the immense video bandwidth of roughly 240 Mbit/s, two quad video mergers combine the analog signals from the eight cameras into two NTSC video signals, each divided into four quadrants that show one camera apiece. These two video signals are captured at 752x480 resolution using Motion-JPEG capture boards on two separate PCs. Finally, the eight video streams are extracted from the captured footage, encoded using MPEG-2, and multiplexed into a single ATSC DTV channel.
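The extraction of the individual camera views from a quad-merged frame can be sketched as follows. This is only an illustration, not the tool used in the project: the function name `split_quad_frame` is hypothetical, the frame is modeled as a plain list of pixel rows, and the quadrant order (top-left, top-right, bottom-left, bottom-right) is an assumption, since the layout produced by the quad mergers is not stated here.

```python
FRAME_W, FRAME_H = 752, 480  # capture resolution given in the text


def split_quad_frame(frame):
    """Split one quad-merged frame into four per-camera sub-images.

    `frame` is a list of rows, each row a list of pixels.
    Assumed quadrant order: top-left, top-right, bottom-left, bottom-right.
    """
    half_h = len(frame) // 2
    half_w = len(frame[0]) // 2
    top, bottom = frame[:half_h], frame[half_h:]
    return [
        [row[:half_w] for row in top],
        [row[half_w:] for row in top],
        [row[:half_w] for row in bottom],
        [row[half_w:] for row in bottom],
    ]


# Synthetic frame whose pixels record their own (row, column) position.
frame = [[(y, x) for x in range(FRAME_W)] for y in range(FRAME_H)]
quads = split_quad_frame(frame)
```

With a 752x480 capture, each extracted view is 376x240 pixels before MPEG-2 encoding.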

Figure 3: The arrangement of the polygons onto which the eight DTV streams are projected. By adjusting the position, orientation, and brightness of the polygons, we calibrate the panorama to show a seamless surround view.
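As a starting point for such a calibration, the nominal placement of eight polygons around the virtual camera can be sketched as below. The function `octagon_layout`, the `overlap` factor, and the coordinate convention (camera at the origin looking down the negative z axis, yaw measured clockwise when seen from above) are assumptions made for illustration; the actual positions in the project were adjusted by hand.

```python
import math

NUM_POLYGONS = 8


def octagon_layout(distance=1.0, overlap=1.1):
    """Return (center_x, center_z, yaw_deg, width) for each polygon.

    Each polygon is centered `distance` away from the camera and faces it.
    A regular octagon needs sides of width 2 * distance * tan(pi / 8);
    `overlap` > 1 widens each polygon so that neighbors overlap slightly.
    """
    base_width = 2.0 * distance * math.tan(math.pi / NUM_POLYGONS)
    layout = []
    for i in range(NUM_POLYGONS):
        yaw = 360.0 * i / NUM_POLYGONS
        a = math.radians(yaw)
        center_x = distance * math.sin(a)
        center_z = -distance * math.cos(a)
        layout.append((center_x, center_z, yaw, base_width * overlap))
    return layout


layout = octagon_layout()
```

Per-polygon offsets in position, orientation, and brightness would then be applied on top of this nominal layout during calibration.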

On the receiver side, a pipeline of programs works together to display this transport stream. The DTV stream is received on a PC, from which it is retransmitted over 100 Mbit/s Ethernet to an SGI Onyx 2 with eight CPUs. The standard reference MPEG-2 decoder software has been modified so that eight copies can run synchronously, delivering frames in YUV format. A master process controls the eight decoders and requests the frames that are needed to construct the current view. These frames are then passed on to the panorama viewer. We have chosen to always decode all eight video streams because of the startup delay of about half a second when decoding MPEG-2 streams with large GOPs (groups of pictures). If streams were decoded only on demand, this delay would severely restrict the speed at which viewers could change their view.
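The selection of the frames needed for the current view can be sketched as follows. This is a minimal illustration under stated assumptions, not the master process itself: the cameras are assumed to be evenly spaced at 45 degree intervals with camera 0 at heading 0, and the field-of-view values and the function `cameras_in_view` are hypothetical. All eight decoders keep running regardless; only the hand-off of frames to the viewer is filtered.

```python
NUM_CAMERAS = 8
CAMERA_SPACING = 360.0 / NUM_CAMERAS  # 45 degrees between camera axes (assumed)


def angular_distance(a, b):
    """Smallest absolute angle in degrees between two headings."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def cameras_in_view(view_deg, view_fov_deg=60.0, camera_fov_deg=50.0):
    """Indices of cameras whose image overlaps the current view.

    A camera contributes when the angle between its axis and the view
    direction is less than half the view width plus half the camera width.
    Both field-of-view defaults are illustrative values.
    """
    reach = (view_fov_deg + camera_fov_deg) / 2.0
    return [
        i
        for i in range(NUM_CAMERAS)
        if angular_distance(view_deg, i * CAMERA_SPACING) < reach
    ]


needed = cameras_in_view(0.0)
```

Looking straight along camera 0 with these values requires the frames of camera 0 and of its two neighbors, cameras 1 and 7.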

The viewer program is an OpenGL application that texture maps the video frames onto eight rectangular polygons, which form an overlapping octagon around the virtual camera (see Figure 3). The spherical aberration of the capture cameras is corrected by suitably warping the input video onto each display polygon. During display, the viewer application determines which frames are needed based on the current view direction and requests them from the MPEG-2 decoder system. The images are loaded as OpenGL textures, and a color transformation matrix is used to convert them from YUV format into RGB in hardware. For a given camera rig, the exact arrangement of the display polygons needs to be adjusted so that the frames appear stitched together into a seamless panorama. Currently, this is accomplished by manually adjusting their positions, but automatic calibration is planned for a future version of the application.
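The effect of such a color transformation matrix can be illustrated in software. The sketch below assumes full-range 8-bit YUV with the commonly used BT.601 coefficients; the exact matrix and value range used in the viewer are not stated here, and the names `YUV_TO_RGB` and `yuv_to_rgb` are hypothetical.

```python
# Rows give R, G, B as linear combinations of (Y, U - 128, V - 128).
# Assumption: full-range BT.601 coefficients.
YUV_TO_RGB = (
    (1.0, 0.0, 1.402),
    (1.0, -0.344136, -0.714136),
    (1.0, 1.772, 0.0),
)


def clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(x))))


def yuv_to_rgb(y, u, v):
    """Convert one 8-bit YUV pixel to an 8-bit RGB tuple."""
    vec = (y, u - 128, v - 128)
    return tuple(
        clamp8(sum(c * x for c, x in zip(row, vec))) for row in YUV_TO_RGB
    )


mid_gray = yuv_to_rgb(128, 128, 128)
```

In the viewer the same linear map is applied per pixel by the graphics hardware, so the CPU only has to upload the decoded YUV frames as textures.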

Figure 4: This image shows three frames from the video panorama project. The video panorama consists of a tour of Stanford campus. At each point in time a full 360 degree panorama is received and the user is free to browse around, but a default view is provided. The display also shows a map of the campus with a pointer indicating the current position. When the user clicks on a location on the map, the current viewing direction rotates and locks onto this feature, allowing the user to watch it while the tour continues.

In addition to the basic panorama viewer, several features were added to make the video panorama more useful (see Figure 4 and the video clip (~90 MB, MPEG-2)). Since the captured material is a tour of Stanford campus, a campus map is displayed alongside the video panorama window. This map shows the position and direction of the current view. The user can point on the map to indicate a new viewing direction, e.g. toward a certain object, which is then tracked. This is useful for getting one's bearings as the tour progresses. Finally, on command or when the user stops interacting, the view drifts back toward a predefined view sequence chosen by the tour guide. Thus, lazy viewers are guaranteed to be looking in the direction of the object currently being discussed by the guide on the associated audio track. At any time, the user has the freedom to override the tour guide or to rejoin the tour, very much like in reality. Combining the current application with the Virtual VCR support would also allow a user to stay behind and enjoy a particular view before catching up or resuming the tour. This is planned as a future extension.
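Both the map-based tracking and the drift back to the guide's view reduce to simple angle arithmetic, sketched below. The function names, the map convention (x to the east, y to the north, headings clockwise from north), and the `rate` parameter are assumptions for illustration and are not taken from the application.

```python
import math


def bearing_deg(pos, target):
    """Heading in degrees from `pos` to `target` on the map.

    Assumed convention: 0 is map north (+y), angles increase clockwise.
    """
    dx = target[0] - pos[0]
    dy = target[1] - pos[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def shortest_turn(current, goal):
    """Signed smallest rotation from `current` to `goal`, in [-180, 180)."""
    return (goal - current + 180.0) % 360.0 - 180.0


def drift_toward(current, goal, rate=0.1):
    """Move the view a fraction `rate` of the way toward `goal` per frame."""
    return (current + rate * shortest_turn(current, goal)) % 360.0


# Tracking: recompute the bearing each frame as the cart position changes.
view = bearing_deg((0.0, 0.0), (1.0, 0.0))

# Drift: once the user stops interacting, ease back to the guide's heading.
view = drift_toward(view, 0.0, rate=0.5)
```

Taking the shortest signed turn keeps the view from spinning the long way around when the current and target headings lie on opposite sides of north.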

Last Updated: Feb. 25, 1999, slusallek@graphics.stanford.edu