Experiments in Digital Television: Project Ideas

The main goal of CS448a is to experiment with different concepts for future television systems, and the goal of each project is to prototype such a system. Each project should be interesting at two levels. At one level, it should demonstrate a compelling application of the new technology. At another level, it should involve new technology and/or the development of new algorithms. Each project will be done by a team of 3-5 students. We expect each project to be subdivided into multiple smaller units that are "owned" by individual team members. Project teams should be diverse; try to find a mixture of people with complementary talents and skills.

The first step will be to write a project proposal using the CS448a Project Proposal Template. The proposal itself should be written in HTML and placed on a publicly accessible web site. Project proposals are due on Friday, Feb 4th. During the week of Feb 6th, each project team will be expected to present an overview of its project plan. Students in the course will critique each project and provide constructive suggestions about how to improve it. After that critique, and after some initial experiments, teams will be asked for a final set of deliverables.

Here is the list of ideas for possible projects. This list is meant to be suggestive, so feel free to propose a completely different idea or to modify these suggestions to suit your interests.

Project Ideas

Digital Michelangelo Documentary Concept Pilot
Enhanced Teleconferencing System for Linking Classrooms
Lecture of the Future
Video Panoramas
Linux OpenSource DTV Receiver
User Interfaces for Interactive Television
Personal Video Database and Channel Browser
Augmented Sportcast

Digital Michelangelo Documentary Concept Pilot

Here is a project idea from Marc Levoy:

We came back from Italy with a variety of ideas for making PBS-style documentaries about the sculptures of Michelangelo, either with or without digital enhancement.

First, some documentary story ideas. Imagine a dark room, perhaps a sculptor's studio. The unfinished St. Matthew stands alone on the dusty floor. We hear Michelangelo reading from his own letters about problems with the Pope, problems finding enough good marble for the twelve apostles (of which St. Matthew is the first), problems finding the time to work in peace. The camera begins to move around the statue; shafts of light play across its face. Now we hear Michelangelo's biographer Vasari describing the pivotal role the St. Matthew played in the artist's career; with this statue Michelangelo introduces the multiple torsions that characterize his mature work. Suddenly, St. Matthew lurches forward along the lines of torsion, trying to tear himself free of the block. It lasts only a second; by the time we notice it, the statue is frozen again. Did it really move, or was it a trick played by the shadows?

Here's another story idea. A giant rectangular block of marble stands in a corner of the Florence cathedral workshop. The block is flawed: it has obvious veins, and it has been partially carved. We hear the board of the cathedral arguing among themselves about whether they should grant Michelangelo's request to carve the block. As they argue, we see the David gradually appear, shimmering in the center of the block. Moving around the block, we see how Michelangelo shifted the figure inside the block, gently splitting its legs, to avoid the flaws.

Now the technology. We've talked about making dual-channel enhanced documentaries: a conventional documentary for homes with only analog TV sets, and an "enhancement layer" for homes with DTVs. One scenario would be to download our computer model of a statue to a set-top box during the course of the documentary (for example, one of the story ideas outlined above), allowing suitably equipped home viewers to move around the statue or change its lighting at critical points in the narrative. Viewers would control this experience either by waving their remote controls or by moving their head or another prop back and forth. This interaction would occur either during the course of the conventional documentary narrative or by digitally pausing the video. On these and many other questions, we won't know what works until we try.
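As a rough illustration of the interaction, here is a minimal Python sketch, assuming the set-top box receives a single normalized control value in [-1, 1] from whatever tracks the remote control or the viewer's head. That input, the function name orbit_eye, and the 60-degree swing limit are all hypothetical. The function maps the control value to a virtual camera position on a horizontal circle around the downloaded statue model; the renderer would then look from that point toward the statue's center.

```python
import math

def orbit_eye(control, center, radius, max_angle_deg=60.0, height=0.0):
    """Map a normalized control value to a camera position orbiting a statue.

    control: hypothetical input in [-1, 1] from a tracked remote or head;
             values outside that range are clamped.
    center:  (x, y, z) of the statue model, with y as the vertical axis.
    """
    c = max(-1.0, min(1.0, control))
    theta = math.radians(c * max_angle_deg)
    cx, cy, cz = center
    # At control == 0 the camera sits directly in front of the statue (+z side);
    # positive values swing it toward +x, negative values toward -x.
    return (cx + radius * math.sin(theta),
            cy + height,
            cz + radius * math.cos(theta))
```

Lighting changes could be driven the same way, by mapping the control value to a light direction instead of a camera position.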

Enhanced Teleconferencing System for Linking Classrooms

Traditionally, distance learning technologies have been used to deliver lectures to students at remote sites, à la SITN or Stanford OnLine. An interesting alternative, one we have been experimenting with in this course, is to provide a real-time link between distributed classrooms and laboratories.

Here is a future scenario:

The two linked classrooms are outfitted with multiple digital cameras on computerized pan and tilt stages. The digital, compressed outputs of these cameras are connected directly to a control and routing computer attached to a high-speed network. As in today's SITN classrooms, the main camera captures the lecturer; this camera is controlled by a computer and automatically tracks the lecturer as he moves around the classroom. Each site also contains several audience cameras. The main audience camera captures a wide-angle shot of those in attendance, and other audience cameras capture other people and objects of interest. The remote feed of the audience is projected at the back of the room so that, from the point of view of the lecturer at the front of the room, the remote audience appears to extend behind the local audience. People and operators at each site may teleoperate the cameras at the other site. The front of the classroom consists of a large display surface that serves as a virtual blackboard. Images of the people speaking at the remote site are inset into the virtual blackboard and are visible to all. The virtual blackboards are used to project PowerPoint slides and other visual material, and also serve as a distributed virtual drawing surface. Finally, the lecturer is able to run simulations and other demonstrations, and these may be controlled from either site.
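The automatic tracking of the lecturer could be sketched as a simple control loop. The Python below assumes a hypothetical person detector that reports the lecturer's horizontal pixel position in each frame, and a pan stage that accepts a rate command in degrees per second; the function name and the gain, dead-zone, and slew-limit values are illustrative, not tuned.

```python
def pan_command(target_x, frame_width, dead_zone=0.1, gain=30.0, max_rate=20.0):
    """Return a pan rate (degrees/second) that re-centers a tracked lecturer.

    target_x:    horizontal pixel position of the lecturer, as reported by a
                 hypothetical person detector.
    frame_width: image width in pixels.
    """
    # Normalized offset from the image center: -1 at the left edge, +1 at the right.
    half = frame_width / 2
    offset = (target_x - half) / half
    # Ignore small offsets so the camera does not jitter while the lecturer stands still.
    if abs(offset) <= dead_zone:
        return 0.0
    # Proportional control, clamped to the pan stage's maximum slew rate.
    rate = gain * offset
    return max(-max_rate, min(max_rate, rate))
```

The dead zone matters in a lecture hall: without it the camera would hunt back and forth every time the lecturer shifts weight. Tilt could be handled the same way using the vertical offset.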

Lecture of the Future

What is the lecture of the future going to look like? How can streaming media technologies be used to improve upon the current SITN system?

One major issue with lectures is that there is much more to see in a lecture hall than is transmitted. For example, when you are in the lecture hall you can easily see, or switch your attention between, the speaker, the slides and/or blackboard, the video monitors, and the class. At any time, some of these items may be more interesting than others, depending on the lecture's content and the viewer.

Some ideas:

Multiple Stream Lecture

Virtual Lecture Hall

Automatic Camera Control

Many interesting components for such a future system were built last year. However, a complete demonstration was not done.

Linux OpenSource DTV Receiver

Compared to the Internet, which is a relatively open system, DTV is almost completely closed. Broadcasters tightly control what is broadcast, and receivers are built in hardware and are not generally reprogrammable. The closed, proprietary nature of the system results in little technological innovation. An interesting project would be to build an OpenSource Linux TV receiver that anyone could modify. Coupling this with the Intel/KICU Datacasting Center would make it easy for entrepreneurs, hobbyists, and universities to experiment with the technology.

The Santa Barbara class is building several components of this system.
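An open receiver's first job, once the tuner and demodulator hand it raw bytes, is to demultiplex the transport stream. ATSC DTV carries programs in MPEG-2 transport streams made of 188-byte packets, each starting with the sync byte 0x47 and carrying a 13-bit packet identifier (PID). The minimal Python sketch below parses that 4-byte header and counts packets per PID. It assumes the input is already packet-aligned, and the function and class names are our own, not part of any existing receiver codebase.

```python
from dataclasses import dataclass

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

@dataclass
class TSHeader:
    transport_error: bool
    payload_unit_start: bool
    pid: int                       # 13-bit packet identifier
    scrambling: int                # 2 bits
    adaptation_field_control: int  # 2 bits
    continuity_counter: int        # 4 bits

def parse_ts_header(packet: bytes) -> TSHeader:
    """Decode the 4-byte header of one MPEG-2 transport stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not an aligned 188-byte transport stream packet")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TSHeader(
        transport_error=bool(b1 & 0x80),
        payload_unit_start=bool(b1 & 0x40),
        pid=((b1 & 0x1F) << 8) | b2,  # low 5 bits of byte 1 + all of byte 2
        scrambling=(b3 >> 6) & 0x03,
        adaptation_field_control=(b3 >> 4) & 0x03,
        continuity_counter=b3 & 0x0F,
    )

def count_pids(stream: bytes) -> dict:
    """Tally packets per PID in a packet-aligned byte string."""
    counts = {}
    for i in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        header = parse_ts_header(stream[i:i + TS_PACKET_SIZE])
        counts[header.pid] = counts.get(header.pid, 0) + 1
    return counts
```

A real receiver would go on to read the program tables (PID 0 carries the Program Association Table) to find which PIDs hold each program's video, audio, and data.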

Video Panoramas

Panoramas are a great way to create an immersive experience of a place without the hassle of high-end VR equipment. So far, panoramas have been mostly static, but this could change by using DTV techniques and the bandwidth DTV makes available. In this respect the project is much in the spirit of QuickTime VR, but extended into the time domain. Instead of stitching a number of still images to form a panorama, we could stitch a number of video streams in much the same way. Possible applications include games (animated instead of mostly static environments), virtual travel (imagine a number of environment cameras in your favorite museum or location), and many more.
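Stitching video streams can reuse the same machinery as still panoramas, applied frame by frame. The sketch below assumes the rig is rigid and already calibrated, so that after warping (for example onto a cylinder, as still-image panorama tools do) each pair of neighboring streams overlaps by a fixed, known number of columns; registration, lens correction, and exposure matching are omitted. Frames are plain lists of grayscale rows to keep it dependency-free, and the names are hypothetical.

```python
def stitch_pair(left, right, overlap):
    """Feather-blend two equal-height frames whose last/first `overlap`
    columns (assumed known from calibration) see the same part of the scene."""
    out = []
    for row_l, row_r in zip(left, right):
        w = len(row_l)
        blended = []
        for i in range(overlap):
            # Weight ramps from nearly 1 (all left) to nearly 0 (all right) across the seam.
            a = 1.0 - (i + 1) / (overlap + 1)
            blended.append(a * row_l[w - overlap + i] + (1 - a) * row_r[i])
        out.append(list(row_l[:w - overlap]) + blended + list(row_r[overlap:]))
    return out

def stitch_video(streams, overlap):
    """Stitch synchronized streams (each a list of frames) into panoramic frames.
    Because the rig is assumed rigid, the same overlap is reused for every frame."""
    for frames in zip(*streams):
        pano = frames[0]
        for frame in frames[1:]:
            pano = stitch_pair(pano, frame, overlap)
        yield pano
```

Because the overlap is computed once for the rig rather than per frame, the per-frame cost is just a blend. The sketch also assumes the streams are already synchronized, so that zip pairs up frames captured at the same instant.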

A similar project was done last year with very promising results. The group working on it had many ideas for future work:

Camera Rig

Stitching of Video Streams

User Interface

User Interfaces for Interactive Television

One of the great unanswered questions is how members of a family sitting in a living room will interact with their television. PC interaction is designed for a single person using a mouse and keyboard while sitting 2 ft from the screen. A family viewing situation involves multiple people sitting in easy chairs or on couches 6-10 ft from the screen, and the keyboard and mouse are replaced by a remote control. This project would involve mocking up a television viewing situation and experimenting with different input technologies and viewing situations.

Here is a scenario to get you started:

Suppose you are viewing a large-screen TV at a distance from your couch. On the coffee table in front of you is a pen-based laptop computer similar to the Clio. Could the Clio be used for interactive input? Could auxiliary material be displayed on the Clio?

Personal Video Database and Channel Browser

TV broadcasting has always been a top-down, linear technology. Except for the selection of the current channel, all decisions have been made for us in advance at the head-end (such as the timing of programs, the insertion of commercials, etc.). A VCR made some limited non-linear control possible, but it was difficult in practice; for example, VCRs allow us to time-shift programs. Some ideas:

Virtual VCR

Local Insertion of Video

News Filtering
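A "virtual VCR" can make time-shifting effortless by recording the live broadcast continuously and letting playback lag behind it. Here is a minimal Python sketch of that idea, assuming frames arrive one per tick and that a fixed-size buffer (standing in for a disk) holds the most recent ones; the class and method names are hypothetical.

```python
from collections import deque

class TimeShiftBuffer:
    """Hypothetical virtual VCR: record live frames into a bounded buffer and
    play back with a viewer-controlled delay behind the live broadcast."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)  # oldest frames drop off when full
        self.delay = 0                        # frames of lag behind live
        self.paused = False

    def record(self, frame):
        """Called once per tick with the newest live frame."""
        self.frames.append(frame)
        if self.paused:
            # Live TV keeps arriving, so a paused viewer falls further behind.
            self.delay += 1
        # Playback cannot lag further than what the buffer still holds.
        self.delay = min(self.delay, len(self.frames) - 1)

    def current(self):
        """Frame the viewer sees now (requires at least one recorded frame)."""
        return self.frames[-1 - self.delay]

    def pause(self):
        self.paused = True

    def play(self):
        self.paused = False

    def rewind(self, n):
        self.delay = min(self.delay + n, len(self.frames) - 1)

    def skip_ahead(self, n):
        self.delay = max(self.delay - n, 0)  # cannot skip past live
```

Pausing simply lets the lag grow while recording continues, and skipping ahead shrinks it back toward zero (live). When the buffer fills, the oldest material is discarded, so a very long pause eventually loses frames.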

Augmented Sportcast

Lots of new television technology is deployed at sporting events. For example, you may have seen the First and Ten system developed by Sportvision, in which a line marking the first-down position is overlaid on the football field. Sportvision has also developed, or is experimenting with, systems for hockey, basketball, golf, and NASCAR racing.
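One generic way to place such a line, sketched here under stated assumptions, exploits the fact that the field is approximately planar: a 3x3 homography maps field coordinates (in yards) to image pixels. The Python below assumes that homography is already known for the current frame, for example from calibrated pan, tilt, and zoom sensors. It illustrates the technique only; it is not a description of how the First and Ten system actually works, and the function names are hypothetical.

```python
FIELD_WIDTH_YARDS = 160 / 3  # a football field is 53 1/3 yards (160 ft) wide

def apply_homography(H, x, y):
    """Map a point on the (planar) field to image pixel coordinates."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # perspective divide

def first_down_segment(H, yard_line, field_width=FIELD_WIDTH_YARDS):
    """Image-space endpoints of a line across the field at `yard_line`.
    Field coordinates: x along the field, y across it, both in yards."""
    return (apply_homography(H, yard_line, 0.0),
            apply_homography(H, yard_line, field_width))
```

The homography must be recomputed every frame as the camera moves; a broadcast-quality system must also avoid drawing the line over the players, which this sketch ignores.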

3D freeze-frame and 3D instant-replay

One idea we've been kicking around in the Stanford Immersive Television Project is 3D freeze-frame and 3D instant-replay. The idea is to build a system that can capture a critical instant in a sporting event, process it or compress it offline, broadcast it to the home, perhaps alongside a continuing live video feed, and replay it three-dimensionally and under user control, perhaps in a window-in-window or on a second screen.

One obvious format for such a replay scenario is a single panoramic image. Another idea is to use an array of cameras. If the array is 1D, the viewer can slide back and forth among the available views. This permits the viewer to experience horizontal or vertical parallax, but not both. If the array is 2D, the viewer can fly freely around and toward the object of interest. The viewer's experience is of a 3D scene frozen in space, a sort of 3D freeze-frame.
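For a 1D array, the viewer's control can be as simple as a slider. The Python sketch below (hypothetical names) maps a slider position in [0, 1] to the two nearest cameras and a blend weight, and then cross-dissolves between them. A cross-dissolve is only a stand-in for genuine view interpolation, which would warp the images using scene geometry rather than just fading between them.

```python
def views_for_slider(s, num_cameras):
    """Map slider position s in [0, 1] to (left camera, right camera, weight)."""
    if num_cameras < 2:
        return 0, 0, 0.0
    s = max(0.0, min(1.0, s))
    pos = s * (num_cameras - 1)          # continuous camera index
    i = min(int(pos), num_cameras - 2)   # left neighbor, kept in range at s == 1
    t = pos - i                          # 0 -> camera i, 1 -> camera i + 1
    return i, i + 1, t

def cross_dissolve(frame_a, frame_b, t):
    """Linear blend of two frames given as lists of rows of pixel values."""
    return [[(1 - t) * a + t * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]
```

For a 2D array the same idea extends to the four surrounding cameras with bilinear weights.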

Choosing the right size and form factor for such a 2D array seems very important. A small array won't provide enough range of motion for an interesting replay, a large dense array might be hard to build, and a large sparse array won't provide smooth motion. A large sparse array of video and range cameras, used in conjunction with a view interpolation algorithm, seems to offer the best chance of success. Since the array probably must be pre-positioned closely around the object, this technique is unsuitable for events where the object of interest moves around or becomes significantly occluded, as in a football game. On the other hand, the timing of the critical event need not be predictable, since the array can free-run until it occurs. Alternatively, the array can be triggered by the event itself.
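The free-running array with an event trigger is essentially a ring buffer. The minimal Python sketch below uses a hypothetical class in which each pushed item stands for one synchronized frame from every camera in the array. It keeps the most recent frames and, once triggered by an operator or an event detector, records a few more and then freezes the window around the critical instant.

```python
from collections import deque

class TriggeredCapture:
    """Free-run into a ring buffer; freeze a window of frames around a trigger."""

    def __init__(self, pre_frames, post_frames):
        self.buffer = deque(maxlen=pre_frames + post_frames)
        self.post_frames = post_frames
        self.remaining = None  # frames still to capture after the trigger
        self.frozen = None     # the saved window once capture completes

    def push(self, array_frame):
        """array_frame: one synchronized frame from every camera."""
        if self.frozen is not None:
            return  # already captured; ignore further frames
        self.buffer.append(array_frame)
        if self.remaining is not None:
            self.remaining -= 1
            if self.remaining == 0:
                self.frozen = list(self.buffer)

    def trigger(self):
        """Mark the critical instant (the most recently pushed frame)."""
        if self.remaining is None:
            self.remaining = self.post_frames
            if self.remaining == 0:
                self.frozen = list(self.buffer)
```

Because the buffer only ever holds a fixed window, the array can run indefinitely, which is what lets the timing of the critical event be unpredictable.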

As an example of this application, consider the instant an Olympic pole vaulter crosses the bar, a Major League batter hits the ball, or a diver enters the water. While the judges are reporting their scores, viewers could be studying the diver's form or judging the size of the entry splash for themselves.

An obvious extension of this application is to continuous video and range capture, which would permit 3D instant-replay of an extended event. This extension obviously poses additional challenges for capturing, processing, transmitting, and displaying the event.

Of course, we don't expect students in this class to develop a complete 3D instant-replay system; that's a goal for the Immersive Television research project. However, there are probably intermediate experiments that can be done using available technology.

Coffee-table diorama

Another idea we've had in the Immersive Television project is to create a miniature, three-dimensional, autostereoscopic display of a live event. In this application, the principal object of interest at the event is recorded using multiple video cameras and real-time range cameras. There need not be an equal number of video and range cameras, and the two types of cameras need not be collocated, although processing is simpler if they are. For each stream, the foreground object is segmented from its background using range, color, motion, or other techniques. Then the set of rgbaz streams is compressed and broadcast to the viewer. A detailed 3D model of the event venue is constructed in advance and also broadcast.
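Of the segmentation cues listed above, range is the simplest to illustrate. The Python sketch below assumes collocated, registered video and range cameras (the easier case mentioned above) and keeps pixels whose range lies inside a depth interval around the foreground object, producing rgbaz pixels with a hard 0/1 alpha. The name segment_rgbaz and the interval thresholds are hypothetical; color and motion cues, and soft alpha at object edges, are omitted.

```python
def segment_rgbaz(rgb, depth, near, far):
    """Build an rgbaz image by keeping pixels whose range lies in [near, far].

    Assumes rgb[y][x] = (r, g, b) and depth[y][x] see the same scene point
    (collocated, registered cameras). Missing range samples (None) are
    treated as background. Background pixels get alpha 0 and z = infinity.
    """
    out = []
    for rgb_row, z_row in zip(rgb, depth):
        row = []
        for (r, g, b), z in zip(rgb_row, z_row):
            fg = z is not None and near <= z <= far
            row.append((r, g, b, 1.0 if fg else 0.0, z if fg else float("inf")))
        out.append(row)
    return out
```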

In the receiver, the foreground colored range images and background model are rendered in perspective and displayed in a window-in-window, on a second screen, or on a tabletop display that doubles as a coffee table. Viewpoint control could be performed using a joystick or an autostereoscopic technique such as vision-based head-tracking. Ideally, the viewer's experience is of a miniature moving diorama beneath the glass top of the coffee table.
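Compositing the foreground over the venue model is a per-pixel depth test followed by the "over" operator. The Python sketch below assumes the rgbaz foreground has already been view-interpolated to the viewer's chosen viewpoint, and that the background model was rendered from that same viewpoint into a color image and a depth buffer; view interpolation itself is not shown, and the function name is hypothetical.

```python
def composite(fg, bg_rgb, bg_z):
    """Z-composite an rgbaz foreground over a rendered background.

    fg[y][x] = (r, g, b, a, z); bg_rgb[y][x] = (r, g, b); bg_z[y][x] = depth.
    Both are assumed to be rendered from the same viewpoint.
    """
    out = []
    for fg_row, bg_row, bz_row in zip(fg, bg_rgb, bg_z):
        row = []
        for (r, g, b, a, z), (br, bgr, bb), bz in zip(fg_row, bg_row, bz_row):
            if a > 0 and z < bz:
                # Foreground is in front of the model: blend with the "over" operator.
                row.append((a * r + (1 - a) * br,
                            a * g + (1 - a) * bgr,
                            a * b + (1 - a) * bb))
            else:
                # Foreground is absent here or hidden behind part of the model.
                row.append((br, bgr, bb))
        out.append(row)
    return out
```

Keeping z in the foreground stream is what lets parts of the venue model, such as the net, correctly occlude a player standing behind them.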

As an example, consider a bird's-eye view of a tennis match. On your main TV set (or your only one, if you have just a single set), you see the "director's cut", the same way tennis matches are broadcast today. If you can receive and display the "diorama enhancement layer", you can view the tennis players on your coffee table as view-interpolated rgbaz imagery composited over a 3D model of the court and stadium. Since this diorama is intended as an adjunct to the primary video stream, not as a replacement, its resolution is not critical. It seems entirely plausible to capture, transmit, interpolate, composite, and display two 200 x 200 pixel rgbaz tennis players over a 640 x 480 pixel rendering of a 3D model of a tennis court, in real time, and without heroic technology.
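To give a feel for why the rgbaz streams must be compressed, here is the raw data rate for the two 200 x 200 players, computed in Python under assumed (not specified) parameters: 8 bits each for r, g, b, and a, 16 bits for z, and 30 frames per second. The function name is hypothetical.

```python
def raw_bitrate_mbps(width, height, objects, bytes_per_pixel, fps):
    """Uncompressed data rate in megabits per second."""
    bits_per_second = width * height * objects * bytes_per_pixel * 8 * fps
    return bits_per_second / 1e6

# Assumed format: 8-bit r, g, b, a plus 16-bit z = 6 bytes per pixel, 30 frames/s.
rate = raw_bitrate_mbps(200, 200, objects=2, bytes_per_pixel=6, fps=30)
print(f"{rate:.1f} Mbit/s per camera view, before compression")
# prints "115.2 Mbit/s per camera view, before compression"
```

That is roughly six times the ~19.4 Mbit/s payload of an entire ATSC broadcast channel, so substantial compression is essential, and sending several camera views for interpolation multiplies the figure.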

Here again, we cannot expect students to build a complete diorama system as a class project. However, there are probably interesting and useful subsets of this system that can be built as course projects.