Teleconferencing System for Linking Classrooms
Introduction
For the past 30 years, Stanford University has pioneered distance
education technologies: first through SITN, using microwave-linked
television, and then with Stanford-Online, leveraging the Internet.
A major drawback of these deployed systems is the limited student-teacher
and student-student interaction available to remote students. During a live
broadcast, the only way for the remote student to connect to the teacher
is through a telephone call. As a consequence, the
educational experience is poor for both teachers and remote students: the
teacher cannot tailor her lecture to her students and the remote students
feel isolated.
All video conferencing
systems that we are aware of are designed for linking a small number of
people. The standard setup is a single camera and a display at each site. Video is streamed
at roughly a quarter of the NTSC TV resolution, which is far below the quality needed to capture
a classroom. When
this setup is used to link two classrooms, the users are usually painfully
aware of the boundary between what is local and what is remote.
The goal of this project is to create an immersive teleconferencing system
for linking remote classrooms and desktop students.
Vision
We envision a system where remote classrooms and desktop students are
merged into a single shared space where local and remote students can
freely interact with each other. Each classroom is fitted with multiple pan
and tilt cameras, multiple pan and tilt projectors, a microphone array,
and a spatializable sound system.
Together with the microphone array, the
camera system automatically tracks the speaker and his gaze direction. Common rules of
cinematography are used to create a pleasant video representation of the speaker.
This video representation is further refined by segmenting the speaker from the background.
Using DTV and Internet2, the segmented speaker is transported to the remote site, where
a pan and tilt projector recreates a life-size projection of the speaker. The
spatialized sound system is then used to create the sensation that the
speaker's voice is coming from the projected speaker.
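As a rough illustration of how a microphone array can help locate the speaker, the sketch below estimates a bearing from a single pair of microphones using the time difference of arrival. This is one common approach, not necessarily the method used in this project. The function names, the far-field assumption, the speed of sound (343 m/s), and the brute-force cross-correlation are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def best_lag(a, b, max_lag):
    """Return the lag in samples (b relative to a) with the highest cross-correlation.

    A positive result means the sound reached microphone a first.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += x * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def bearing_degrees(lag, sample_rate, mic_spacing):
    """Convert a sample lag to an angle from broadside (far-field assumption)."""
    delay = lag / sample_rate  # seconds
    # Clamp so that noisy lags beyond the physical maximum do not break asin.
    s = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    # Synthetic impulse that arrives 3 samples later at the second microphone.
    a = [0.0] * 32
    b = [0.0] * 32
    a[10] = 1.0
    b[13] = 1.0
    lag = best_lag(a, b, max_lag=8)
    print(lag, bearing_degrees(lag, sample_rate=16000, mic_spacing=0.2))
```

In a real classroom the bearing from several microphone pairs would be combined, and the result could steer a pan and tilt camera toward the speaker.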
Besides capturing the current speaker, the camera system also creates a
coarse-grained, foveated, contrast-enhanced video of the audience and the
blackboard.
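A minimal sketch of what coarse-grained foveation could look like: a rectangular region of interest keeps full resolution, while the rest of the frame is replaced by block averages. The function name, the rectangular fovea, and the block size are assumptions for illustration; the text above does not specify how foveation is performed.

```python
def foveate(image, fovea, block=4):
    """Keep full detail inside the fovea and block-average everything else.

    image: list of rows of grayscale values.
    fovea: (top, left, bottom, right) with exclusive bottom/right bounds.
    """
    h, w = len(image), len(image[0])
    top, left, bottom, right = fovea
    out = [row[:] for row in image]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            mean = sum(image[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    inside = top <= y < bottom and left <= x < right
                    if not inside:
                        out[y][x] = mean  # coarse periphery
    return out


if __name__ == "__main__":
    frame = [[y * 8 + x for x in range(8)] for y in range(8)]
    result = foveate(frame, fovea=(0, 0, 4, 4), block=4)
    for row in result:
        print(row)
```

Only the fovea needs to be sent at full resolution, so the periphery can be transmitted as one value per block.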
Each remote desktop student provides a single video feed back to the classroom.
These video feeds are projected in the classroom, thus allowing the teacher to be aware of
his students at all times. Each student is also supplied with a comprehension knob.
The instantaneous comprehension of the whole class
is displayed to the lecturer through a reverse multicast channel.
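To make the comprehension display concrete, here is a small sketch of how knob readings might be combined into one class-wide summary. The 0-to-1 knob scale, the "struggling" threshold, and all names are assumptions for the example; the text above does not define them.

```python
from dataclasses import dataclass


@dataclass
class ClassComprehension:
    mean: float                 # average knob value across students
    fraction_struggling: float  # share of students below the threshold
    responses: int              # number of students who reported


def aggregate(readings, struggling_below=0.4):
    """Summarize (student_id, knob_value) readings, keeping each student's latest value."""
    latest = {}
    for student_id, value in readings:
        latest[student_id] = max(0.0, min(1.0, value))  # clamp to the assumed 0..1 scale
    if not latest:
        return ClassComprehension(0.0, 0.0, 0)
    values = list(latest.values())
    struggling = sum(1 for v in values if v < struggling_below)
    return ClassComprehension(
        mean=sum(values) / len(values),
        fraction_struggling=struggling / len(values),
        responses=len(values),
    )


if __name__ == "__main__":
    readings = [("a", 0.9), ("b", 0.2), ("a", 0.5), ("c", 1.4)]
    print(aggregate(readings))
```

Aggregating before display means the lecturer sees a single indicator rather than one reading per student.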
Core Technology
To realize our vision, we will develop the following core technologies:
1) speaker and gaze tracking using a microphone array and cameras (Parham and Itay from the IMTV Project)
2) segmentation of people from a known background (Feng)
3) capture and display of foveated video (Donald and Svetoslav)
4) enhancement of blackboard video (Ben)
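Item 2 above, segmentation of people from a known background, can be illustrated with simple per-pixel background subtraction. This is a generic baseline and not necessarily the method developed in the project. The threshold value and the function names are assumptions for the example.

```python
def segment_foreground(frame, background, threshold=30):
    """Return a boolean mask that is True where the frame differs from the background."""
    return [
        [abs(f - b) > threshold for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def apply_mask(frame, mask, fill=0):
    """Keep foreground pixels and replace background pixels with a fill value."""
    return [
        [f if keep else fill for f, keep in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]


if __name__ == "__main__":
    background = [[100] * 3 for _ in range(3)]
    frame = [row[:] for row in background]
    frame[1][1] = 200  # a "person" pixel
    frame[0][0] = 110  # small lighting change, below the threshold
    mask = segment_foreground(frame, background)
    print(apply_mask(frame, mask))
```

A fixed threshold is fragile under changing lighting, so a practical system would likely need a more robust background model.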
Besides the core technologies, we will investigate the possibilities of incorporating the
latest teleconferencing products into our system. Of particular interest are:
1) echo-cancelling audio system for classrooms
2) better-than-TV-quality video conferencing system (kigen)
3) low illumination video capture (kigen)
Resources
one microphone array
one spatializable sound system
two pan and tilt cameras
two pan and tilt projectors
Segmentation report