Teleconferencing System for Linking Classrooms

Ben Mowery
Donald Tanguay
Feng Xie
Kigen Kandie
Svetoslav Tzvetkov


For the past 30 years, Stanford University has pioneered distance education technologies: first through SITN using microwave-linked television and then with Stanford-Online leveraging the Internet. A major drawback of these deployed systems is the limited student-teacher and student-student interactions for remote students. During a live broadcast, the only way for the remote student to connect to the teacher is through a telephone call. As a consequence, the educational experience is poor for both teachers and remote students: the teacher cannot tailor her lecture to her students and the remote students feel isolated.

All video conferencing systems that we are aware of are designed for linking a small number of people. The standard setup is a single camera and a display at each site. Video is streamed at roughly a quarter of the NTSC TV resolution, which is far below the quality needed to capture a classroom. When this setup is used to link two classrooms, the users are usually painfully aware of the boundary between what is local and what is remote. The goal of this project is to create an immersive teleconferencing system for linking remote classrooms and desktop students.


We envision a system where remote classrooms and desktop students are merged into a single shared space in which local and remote students can freely interact with each other. Each classroom is fitted with multiple pan-and-tilt cameras, multiple pan-and-tilt projectors, a microphone array, and a spatializable sound system. Together with the microphone array, the camera system automatically tracks the speaker and the speaker's gaze direction. Common rules of cinematography are used to create a pleasant video representation of the speaker. This video representation is further refined by segmenting the speaker from the background. Using DTV and Internet2, the segmented speaker is transported to the remote site, where a pan-and-tilt projector recreates a life-size projection of the speaker. The spatialized sound system is then used to create the sensation that the speaker's voice is coming from the projected speaker. Besides capturing the current speaker, the camera system also creates a coarse-grained, foveated, and contrast-enhanced video of the audience and the blackboard.
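As a rough illustration of how a microphone array can help locate the speaker, the sketch below estimates a bearing from the time difference of arrival between two microphones. The proposal does not specify a localization method, so the far-field formula sin(theta) = c * delay / spacing, the two-microphone setup, and the names `bearing_from_delay` and `SPEED_OF_SOUND_M_S` are all assumptions made for this example.

```python
import math

# Assumed value: approximate speed of sound in air at room temperature.
SPEED_OF_SOUND_M_S = 343.0


def bearing_from_delay(delay_s, mic_spacing_m):
    """Estimate the speaker's bearing in degrees from a two-microphone delay.

    delay_s is the arrival-time difference between the two microphones and
    mic_spacing_m is the distance between them. Assumes a far-field source,
    so sin(theta) = c * delay / spacing. 0 degrees means straight ahead.
    """
    if mic_spacing_m <= 0:
        raise ValueError("mic_spacing_m must be positive")
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp to the valid asin domain to absorb noise and rounding error.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    # Hypothetical measurement: 0.5 m spacing, 0.4 ms delay.
    print(bearing_from_delay(0.0004, 0.5))
```

A bearing like this could then drive the pan angle of a camera, with the cameras refining the estimate visually; a real array would combine more than two microphones.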

Each remote desktop student provides a single video feed back to the classroom. These video feeds are projected in the classroom, allowing the teacher to be aware of the remote students at all times. Each student is also supplied with a comprehension knob. The instantaneous comprehension of the whole class is displayed to the lecturer through a reverse multicast channel.
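One simple way to turn individual knob settings into a single class-wide indicator is to average them. The sketch below assumes each knob reports a value between 0.0 (lost) and 1.0 (following); the scale, the use of a plain mean, and the name `class_comprehension` are illustrative assumptions, not details given in the proposal.

```python
def class_comprehension(knob_readings):
    """Combine per-student knob readings into one class-wide value.

    Each reading is assumed to lie in [0.0, 1.0]; out-of-range values are
    clamped. Returns None when no student has reported a reading.
    """
    if not knob_readings:
        return None
    clamped = [min(1.0, max(0.0, reading)) for reading in knob_readings]
    return sum(clamped) / len(clamped)


if __name__ == "__main__":
    # Hypothetical readings from four students.
    print(class_comprehension([0.9, 0.5, 0.25, 0.75]))
```

Because a mean can be built from partial sums and counts, intermediate nodes could merge readings on the way back to the lecturer instead of forwarding every student's value.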

Core Technology

To realize our vision, we will develop the following core technologies:
1) speaker and gaze tracking using a microphone array and cameras (Parham and Itay from the IMTV Project)
2) segmentation of people from a known background (Feng)
3) capture and display of foveated video (Donald and Svetoslav)
4) enhancement of blackboard video (Ben)
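To make item 2 concrete, the sketch below shows the simplest form of segmentation against a known background: per-pixel differencing with a fixed threshold. It is only an illustration of the idea, not the method the project will use; the grayscale list-of-rows image format, the default threshold of 30, and the name `segment_foreground` are assumptions.

```python
def segment_foreground(frame, background, threshold=30):
    """Return a boolean mask marking pixels that differ from the background.

    frame and background are grayscale images stored as lists of rows of
    integer intensities (0-255) with identical dimensions. A pixel is marked
    as foreground when its absolute difference from the background exceeds
    the threshold.
    """
    if len(frame) != len(background) or any(
        len(frame_row) != len(bg_row)
        for frame_row, bg_row in zip(frame, background)
    ):
        raise ValueError("frame and background must have the same dimensions")
    return [
        [abs(pixel - bg_pixel) > threshold
         for pixel, bg_pixel in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


if __name__ == "__main__":
    # Hypothetical 2x3 images: a bright "person" occupies the middle column.
    background = [[10, 10, 10], [10, 10, 10]]
    frame = [[10, 200, 12], [11, 180, 10]]
    for row in segment_foreground(frame, background):
        print(row)
```

A fixed threshold is sensitive to lighting changes and camera noise, which is one reason a practical system would need something more robust than this.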

Besides the core technologies, we will investigate incorporating the latest teleconferencing products into our system. Of particular interest are:
1) echo-cancelling audio system for classrooms
2) better-than-TV-quality video conferencing system (Kigen)
3) low-illumination video capture (Kigen)


one microphone array
one spatializable sound system
two pan and tilt cameras
two pan and tilt projectors

Segmentation report