Telepresence Information


Overview

With increasing bandwidth, memory, and processor speeds, it is becoming easier to feel present at remote locations in a meaningful sense. This ability to "feel" like you are someplace other than your physical location is known as "telepresence." There is a wide range of ways to be telepresent, and of places at which one might want to be present. For example, two friends could share a conversation in a 3-d model of a cafe overlooking the rings of Saturn, or two executives could hold a video conference, each being somehow present in the other's location. The goal of this page is to survey the research currently being done on telepresence, and to define more firmly which research is of interest to us. Specifically, the main interest is in determining the feasibility of using telepresence to enhance digital television broadcasts. The emphasis is on the broadcasting of real and augmented reality scenes, as opposed to completely virtual ones. In addition, this page contains some information on related topics, such as MPEG.


What is Telepresence?

This section contains general observations made about Telepresence and its related fields during the collection of the material in this note. Table 1 above gives rise to several observations:

Problem Specification

Although telepresence is the general area being addressed, our actual focus is narrower. Specifically, we are interested in what Table 1 classifies as "multi-user reality or augmented reality telepresence with viewing interactivity." In simpler terms, the interest is in broadcast telepresence of real locations, possibly with supplemental information. Details of this focus, and related ideas, are maintained in this section. The most important question, and the one that drives the answers to many of the other points, is what benefit this "telepresence" or 3-D TV offers over traditional programming. Below is a list of types of programming that would probably benefit, and also programs that probably wouldn't benefit:

Panoramic Presentations

The previous section gave three problems that need to be solved for broadcast telepresence: what to broadcast, how to record it, and how to reconstruct it. This section proposes an idea for what to broadcast, the problem that needs to be solved before the other two. Two things need to be found to answer the question of what to broadcast:

The solution proposed here is that of a "Panoramic Presentation". The basic idea is that instead of broadcasting just a flat 2-d view of the scene being portrayed, a complete 3-d local environment is transmitted. This gives the viewer the ability to look around and get a better appreciation of the location, hopefully allowing them to feel more like they are actually there. In addition, the viewer would be allowed to change their position within some fixed boundary, or movement box. As the show goes on, the local area will change, either as the viewer's position changes (as in a tour or a nature show), or as actors enter and depart (as in a sit-com or play).

There is a problem with such a concept in that the viewer may get lost looking in some odd direction and miss something important. To avoid this, preassigned viewpoints (items to look at) and positions (places from which to look) can be defined, including a main viewpoint and position which serve as defaults. This way, if the viewer does nothing at all, they have the same experience as watching a standard show (with the possible benefit of depth information). This raises a new problem, which is how to switch viewpoints and positions. One option would be to allow users to toggle through a set of fixed viewpoints, which amounts to just broadcasting several 2-d views of the scene. The next option, similar to that given by QuickTime VR, is to jump from position to position, but allow viewers to look around at each location. This still allows the user to get lost looking in some strange direction, though. A final option is to have each of the viewpoints and viewing positions serve as a sort of gravity well, such that as the viewer looks and moves around they are naturally attracted to the predefined viewpoints and positions. Figure 1 below shows how all of this might work.
 

(a) Panoramic Presentation of a Fixed Set Show (e.g. Sitcom)
(b) Panoramic Presentation of a Show with Changing Environment (e.g. Tour or nature show)

Figure 1: Examples of Scene Gravity for Different Shows

Figure 1a shows a typical fixed set presentation, while Figure 1b shows a scene with movement. Here are the key features common to each:

In both types of presentation, viewpoints and positions are defined. The viewer may change position and view at will as long as they stay within the movement box. As they move, their point of view will be drawn towards the predefined viewpoints, accelerating as it comes nearer to them. This mimics the way our eyes are naturally drawn to certain items. Similarly, certain positions will be attractive. This allows the viewer to roam around somewhat if they wish, but makes it easy to be drawn back to the most interesting (or guaranteed interesting) viewpoints. Gravity could of course be adjusted to be stronger or weaker. As gravity becomes very high, the scene becomes one of multiple broadcast views, and as gravity becomes very weak, complete freedom of motion is granted.
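
The gravity behaviour described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not part of the original proposal: the function name `apply_gravity`, the choice of an inverse-square pull, the restriction to the nearest predefined point, and the representation of a view or position as a tuple of coordinates.

```python
import math


def apply_gravity(view, anchors, gravity, dt=1.0):
    """Move a view point one time step toward its nearest predefined anchor.

    Assumption (not from the text): the pull follows an inverse-square law,
    so the step grows as the view comes nearer to the anchor.
    """
    nearest = min(anchors, key=lambda a: math.dist(view, a))
    dist = math.dist(view, nearest)
    if dist < 1e-9:
        # Already at the anchor; also avoids dividing by zero below.
        return tuple(nearest)
    # Cap the step at the remaining distance so the view never overshoots.
    step = min(gravity * dt / (dist * dist), dist)
    t = step / dist
    return tuple(v + t * (a - v) for v, a in zip(view, nearest))
```

With `gravity=0.0` the view is returned unchanged (complete freedom of motion), while a very large `gravity` snaps the view to the nearest anchor in a single step, which corresponds to switching among a fixed set of broadcast views.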

Of the two types of panoramic presentation shown in Figure 1, the static scene in Figure 1a is simpler. In that case there is a fixed set, which could be cached on the viewer's machine. The viewpoints, positions and movement box could change over time, but are constrained by the fixed nature of the scene. The situation in Figure 1b is more complicated in that everything changes as time goes on. Imagine a nature show of swimming through the Great Barrier Reef. The movement region in this case is a volume which moves as time progresses and the viewer is guided along the reef. The positions all move as time goes by, and the viewpoints may change to various fish and coral. The viewer may decide to continue looking at a fish after it is described, then allow themselves to be swept forward as they reach the back of the viewing region, and have their viewpoint drawn to some new object. Of course, shows need not be entirely fixed or entirely dynamic. The types can be intermixed or interleaved at will; the examples given are the two extremes.
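
As a rough illustration of the moving movement region, the sketch below models it as an axis-aligned box that translates at a constant velocity and clamps the viewer's position into it, so that a viewer who lingers is swept forward once the back of the box reaches them. The names `box_at` and `keep_in_box`, the constant velocity, and the box shape are all assumptions made for this example; the text does not specify how the region is represented.

```python
def box_at(t, start_min, start_max, velocity):
    """Corners of a hypothetical movement box at time t.

    The box starts at (start_min, start_max) and translates by velocity * t.
    """
    lo = tuple(m + v * t for m, v in zip(start_min, velocity))
    hi = tuple(m + v * t for m, v in zip(start_max, velocity))
    return lo, hi


def keep_in_box(pos, lo, hi):
    """Clamp each coordinate of pos into the box [lo, hi]."""
    return tuple(min(max(p, l), h) for p, l, h in zip(pos, lo, hi))


# A viewer who stays put while the box drifts along the x axis.
viewer = (0.5, 0.5)
lo, hi = box_at(3.0, (0.0, 0.0), (2.0, 1.0), (1.0, 0.0))
viewer = keep_in_box(viewer, lo, hi)  # pushed forward by the back wall
```

A fixed-set show is the special case where the velocity is zero, so the same clamp covers both extremes.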

As shown in Figure 1, both primary and secondary positions and viewpoints could be sent to the viewer at any given time. These will of course change as a show progresses. The main viewpoint and position over time define the standard way of watching the show, as it would be seen on conventional television. This idea could be extended by having several different sets of time-varying positions and viewpoints which extend throughout the show. Instead of constantly choosing viewpoints and positions as the show progresses, the viewer could lock in one particular "story-line" at the beginning of the show, and follow the action from that sequence of positions and viewpoints. This would allow a viewer who likes one character better than the others to choose a "story-line" that emphasizes that character. A show could also have several different "directors", each of whom comes up with their own "story-line." The viewer could then choose which "director's" story-line to follow through the show. Eventually, as homes become better networked, viewers could exchange "story-lines", or choose another viewer to be the active one who controls their viewing perspective through the show.
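
One possible way to represent such a "story-line" is as a list of keyframes, each holding a time, a viewing position, and a viewpoint to look at, with the receiver interpolating between them. This is only a sketch under stated assumptions: the keyframe layout, the linear interpolation, and the name `sample_storyline` are hypothetical and do not come from the original text.

```python
from bisect import bisect_right


def sample_storyline(keyframes, t):
    """Return (position, look_at) for a story-line at time t.

    keyframes is a list of (time, position, look_at) tuples with strictly
    increasing times. Values between keyframes are linearly interpolated;
    times outside the range hold the first or last keyframe.
    """
    times = [k[0] for k in keyframes]
    i = bisect_right(times, t)
    if i == 0:
        return keyframes[0][1], keyframes[0][2]
    if i == len(keyframes):
        return keyframes[-1][1], keyframes[-1][2]
    t0, p0, v0 = keyframes[i - 1]
    t1, p1, v1 = keyframes[i]
    w = (t - t0) / (t1 - t0)

    def lerp(a, b):
        return tuple(x + w * (y - x) for x, y in zip(a, b))

    return lerp(p0, p1), lerp(v0, v1)


# Hypothetical story-lines for one show, keyed by name.
storylines = {
    "main": [(0.0, (0.0, 0.0), (1.0, 0.0)), (10.0, (4.0, 0.0), (1.0, 2.0))],
    "director_b": [(0.0, (0.0, 1.0), (2.0, 2.0)), (10.0, (0.0, 1.0), (3.0, 2.0))],
}
position, look_at = sample_storyline(storylines["main"], 5.0)
```

Under this representation, exchanging story-lines between viewers would amount to exchanging keyframe lists, and the default experience is simply the "main" entry.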

Two questions were posed at the beginning of this section. The first was what could be broadcast, and that has been discussed throughout the section. The second was what the benefit would be, and whether the result might actually be worse than regular TV. It is fairly easy to see that there is no loss over regular viewing in this scheme. If you just sit back and watch, your viewpoint and position will be changed automatically for you. It is also difficult to shift your view accidentally, since gravity will tend to bring you toward a desirable view. There are also advantages over regular TV, since each viewer can look at what is most interesting to them, and can choose to linger on certain items or dash ahead to see what is coming next.

Before ending this section, a brief mention of how this might actually be displayed to the user is appropriate.  The panoramic presentation method presented here is scalable, and could be displayed at least in the following ways:


A Scenario for the Near Future

The previous section outlined the type of scene information that could be transmitted in a broadcast telepresence situation: a panoramic presentation. This section presents some ideas for a prototype panoramic presentation transmission and reception system that could be constructed within the next year or so. It goes into some specifics on the entire process, including capturing and recording the scene, compression and data segmentation for transmission, the transmission itself, decompression, and viewing and interaction by the user.


MPEG Information

The broadcast channels we are looking at are transmitted using the MPEG-2 transport stream format, and would normally carry an MPEG-2 encoded television program. Because of this, it is helpful to have some idea of how the MPEG standards fit together:

 

General Notes

This is stuff that doesn't fit well in other sections.


Telepresence Resources

Web Resources

Papers

  1. H. Fuchs, G. Bishop, K. Arthur, L. McMillan, R. Bajcsy, S. Wook Lee, H. Farid, and T. Kanade, Virtual Space Teleconferencing Using a Sea of Cameras, Proceedings of the First International Symposium on Medical Robotics and Computer Assisted Surgery, Vol. 2, Pittsburgh, PA, September 22-24, 1994 - This paper presents an idea for creating a room with a "sea of cameras" which could be used to allow remote users to navigate the room with a "virtual camera."  The authors then describe an improved algorithm for wide-baseline stereo that can acquire a depth map using multiple cameras along a single baseline.  Some preliminary work is presented demonstrating the effectiveness of their algorithm, and they propose creating a real-time camera to capture RGB-Z images this way (now complete; see Video-Rate Stereo Machine).  Although the authors propose that this technique and a "sea of cameras" would allow users complete freedom in a room, they provide no specifics as to how the multiple depth maps could be combined to create one 3-d scene, or how to warp or switch between depth map scenes at different locations.

Related Topics Resources

Web Resources

Papers


Brad Johanson