The purpose of the final project is to provide hands-on experience in designing, implementing, and evaluating a new visualization method, algorithm, or tool. Projects will be carried out by small teams and will be assigned a mentor who will guide the project team. Projects can cover a wide variety of concrete visualization problems (see examples below) but should propose a novel, creative solution to the problem. The deliverable will be an implementation of the proposed solution and an 8-12 page paper written as a conference paper submission. Each group will be given the opportunity to present the final project to the class.
The group mentor will provide feedback as to whether we think your idea is reasonable, and will also try to offer some technical guidance, such as additional papers you might be interested in reading. You should arrange a meeting with your mentor before 04/26/02.
The project proposal should be submitted as a single web page, and will be presented in class either on 05/01/02 or on 05/08/02 depending on schedule constraints.
Your implementation should be able to handle typical data sets for the problem at hand, and run at a speed compatible with the intended use (for example, an interactive visualization should run at interactive frame rates). Developing algorithms that scale to large data sets is particularly challenging and interesting. However, the project is not a programming contest, and mega-lines of code will not help your grade.
We are very flexible about the underlying implementation of your projects. You can start from scratch using OpenGL or any other graphics and windowing toolkit, use an available visualization toolkit (such as Vtk, Rivet, GeomView or OpenInventor), extend a commercial application (such as AutoCAD, Excel...) or use fast prototyping tools (such as MacroMedia Director or Flash).
You will be expected to demo your implementation during the project presentation on 06/12/02.
The final project report should take the shape of an 8-12 page paper written as a conference paper submission. It should present related work, give a detailed description of your visualization, and include a discussion of your design. Final papers are due on 06/05/02.
Projects are done by small groups, but each person will be graded individually. A good group project is a system consisting of a collection of well-defined subsystems. Each subsystem should be the responsibility of one person and be clearly identified as their project. A good criterion for whether you should work in a group is whether the system as a whole is greater than the sum of its parts!
To get you started, here are some topics that we think deserve more research.
Polaris makes it very easy to set up visualizations of relational databases. Data fields are dropped onto the axes of the graph or onto visual variables, and that uniquely specifies the visualization. An interesting extension would be to develop a 3D version of Polaris, in which 3D visualizations would be produced instead of graphs and charts.
Bertin is famous for his idea of reorderable matrices. If data is a function of two nominal variables, then the rows and columns of the matrix can be permuted. He advocated permuting them until the data is clustered and patterns emerge. Develop a tool that does this automatically or semi-automatically.
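One way such a tool could automate the permutation step is a barycenter-style heuristic: repeatedly sort rows and columns by the weighted mean position of their values, which tends to pull similar rows and columns together. This is only a sketch of one possible starting point (Bertin's own method was manual; the `reorder` function and the heuristic it implements are illustrative assumptions, not a prescribed algorithm):

```python
def reorder(matrix, passes=5):
    """Return (row_order, col_order) that tends to cluster large values.

    A barycenter-style heuristic: each row is sorted by the weighted mean
    column position of its values, then columns are sorted the same way
    against the new row order, for a few passes.
    """
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    for _ in range(passes):
        # Sort rows by the barycenter (weighted mean column position) of values.
        def row_key(r):
            total = sum(matrix[r][c] for c in cols)
            if total == 0:
                return 0.0
            return sum(pos * matrix[r][c] for pos, c in enumerate(cols)) / total
        rows.sort(key=row_key)
        # Then sort columns the same way against the current row order.
        def col_key(c):
            total = sum(matrix[r][c] for r in rows)
            if total == 0:
                return 0.0
            return sum(pos * matrix[r][c] for pos, r in enumerate(rows)) / total
        cols.sort(key=col_key)
    return rows, cols
```

A semi-automatic tool might run one pass at a time and let the user accept, undo, or manually tweak each permutation.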
It is very common to perform large simulations using supercomputers. Often these take days or weeks to run and generate very large files of results. After the run, post-processing is done to analyze the data and produce visualizations. Systems are currently being developed to help manage and monitor these types of simulations. What would be nice to have is a web-based lab notebook that is automatically created for scientists doing computational experiments. A computer-generated notebook would proactively produce a large number of pages with lots of interesting visualizations of different types, all linked together. Develop a system that would make it easy to create such visual notebooks.
As micro-arrays are used more and more extensively in biology for gene expression analysis (see NYT), the lack of good visualizations is impeding progress. One of the main difficulties is the identification of patterns in very large tables of results. Neither commercial offerings (see GeneMaths) nor research projects (see Jinwook Seo) have struck a good balance between functionality and scalability. Develop a tool that can help biologists during gene expression analysis.
Magnetic resonance imaging (MRI) is still a rapidly improving imaging modality. New acquisition protocols are continually being developed (such as acquiring diffusion tensors), and the accuracy and resolution of the data are being improved. Functional MRI, by adding the time dimension, is making analysis of the data even more difficult. Develop some new ways of looking at new types of MRI data sets.
Fluid flow visualization is a major area of research in Aeronautics, Mechanical Engineering, and Chemical Engineering. At Stanford, the Center for Integrated Turbulence Simulations is using supercomputers to run complete numerical simulations of fluid flow and turbulence in aircraft gas turbine engines. These simulations are producing large amounts of data that need to be visualized. Projects could research techniques for abstracting these data sets to produce comprehensible visualizations, apply interaction techniques like cutting planes for visual inspection of simulation results, or find techniques for integrating traditional graph visualizations with physical visualizations of the turbine models.
The Sloan Digital Sky Survey is "the most ambitious astronomical survey project ever undertaken". The survey will map in detail one-quarter of the entire sky, determining the positions and absolute brightness of more than 100 million celestial objects. It will also measure the distances to more than a million galaxies and quasars. We have a 10% sample of the Sloan data available in a SQL Server database. One project would be to develop an interface for exploring and visualizing this data (or some specific type of entity, e.g. quasars, within the data). The data is complex and some astronomy knowledge is recommended. Although we can provide help designing visualizations, we do not have extensive knowledge of the data.
In the first lecture we discussed the many possible ways of visually representing data values and demonstrated Polaris as one tool for visualizing relational data. Using Polaris, an analyst can encode their data in many different ways, many of them ineffective or misleading. Rather than having the analyst construct the visualizations directly, the system could automatically suggest good views of the data fields they have selected. Alternatively, there may be ways to refine the Polaris interface to still give the user control but encourage the development of effective visualizations.
A common type of visualization that almost everyone has used is the assembly instructions that come with mechanical objects such as modular furniture, Lego, etc. Designing effective assembly instructions is a complex task typically performed by human designers; it requires picking an assembly sequence for the object, dividing the individual actions into separate diagrams ("frames"), picking a viewpoint for each frame and an orientation for each object in the frame, labeling the parts and actions, etc. A reasonable project would address one aspect of the design of these instructions (e.g., automatically picking a good viewpoint or object placement for each step of the assembly). See the Augmented Cognition web page for more details.
Visualizing paths through 3D environments (such as directions through a hospital or a character's movement in a video game) introduces many challenges. Effectively showing the route may require showing several views of the environment or using transparency, cutting planes, and exploded views. A good project in this area would be to visualize a player's movement in a game of Quake. See the Augmented Cognition web page for more details.
When designing presentations or documents, one of the challenges we often face is finding ways to arrange all of the content (pictures, text, etc.) on a page. One project would be to develop automated algorithms that apply graphic design techniques to effectively perform document/presentation layout; this layout could include operations such as cropping and scaling of pictures.
Cartographers have developed techniques and guidelines for displaying thematic maps. In a thematic map, geographically varying statistics are overlaid on a map. These values are normally portrayed using color and/or gray scale, but usually only a few intervals are shown, and these are mapped to an ordered set of colors. In scientific visualization, it is common to display data on a surface. Can cartographic techniques be used to make data on surfaces easier to understand? Note that there seems to be one important difference: curved surfaces are normally smooth, and displaying discrete colors makes them seem faceted. Can you develop a technique that makes curved surfaces appear smooth but still makes it easy to see values over the surface?
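The cartographic classing step described above can be sketched in a few lines. This is a minimal illustration assuming the simplest scheme, equal-interval breaks (the function names and the choice of five classes are illustrative; cartographers also use quantile and natural-breaks classing):

```python
def class_breaks(values, n_classes=5):
    """Compute equal-interval break points over the data range.

    Returns n_classes - 1 interior break values dividing [min, max]
    into n_classes ordered intervals.
    """
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes)]

def classify(value, breaks):
    """Return the index of the ordered class containing value.

    The index can then be used to look up an entry in an ordered
    color ramp (e.g. light-to-dark of a single hue).
    """
    for i, b in enumerate(breaks):
        if value < b:
            return i
    return len(breaks)
```

Applying `classify` per vertex of a surface mesh would reproduce the faceted look the paragraph mentions; the research question is how to soften that appearance without losing the readability of the ordered classes.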
Tcpdump is a powerful tool that shows all network traffic on a link, but it can be quite hard to understand what's going on when confronted with the raw tcpdump output. A "visual tcpdump" would ideally run off either a log file of a past tcpdump session or in real time on a live tcpdump connection. There are several tasks one might target with this data set. First, visually characterizing traffic patterns, for example by showing the distribution of session lengths or packet types. Second, highlighting dangerous packets that could occur in a stream, for example passwords sent in plain text. Third, characterizing protocols, for example by showing how the TCP window size changes over the course of a session. Some previous knowledge of networking will be helpful for this project.
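A first pre-processing step for any of these tasks is aggregating the raw log into per-flow statistics that a view can then encode. The sketch below tallies bytes per source/destination pair; note that the line format it parses ("&lt;time&gt; &lt;src&gt; &gt; &lt;dst&gt;: &lt;length&gt;") is a simplified assumption for illustration, not real tcpdump output syntax, which varies by protocol:

```python
from collections import Counter

def tally_traffic(lines):
    """Aggregate bytes per (src, dst) pair from simplified log lines.

    Assumed line format (illustrative, not real tcpdump syntax):
        "<time> <src> > <dst>: <length>"
    """
    totals = Counter()
    for line in lines:
        try:
            _time, src, _arrow, dst, length = line.split()
            totals[(src, dst.rstrip(':'))] += int(length)
        except ValueError:
            continue  # skip lines that don't match the assumed format
    return totals
```

The resulting totals could feed a histogram of session sizes or a node-link view of hosts; a real implementation would need a parser for each tcpdump output variant (or would read pcap files directly).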
Noticing that a network is under attack is difficult because of the sheer volume of benign traffic, and the number of attack methods. The two main tasks are real-time detection that an attack is occurring, and forensic analysis of a past attack. There is a publicly available dataset of network traces with four different simulated attacks plus a control baseline with no attacks. Previous knowledge of networking and security issues will be helpful for this project.
One way to map "the Internet" is to consider the structure of the backbone router interconnections. Bill Cheswick has been keeping archives of the daily changes in the roughly 100,000 core reachable routers for over three years. Even the static dataset from a single day is a difficult challenge to show comprehensibly, and showing growth and changes over time is an even more interesting problem. The H3 browser for large graphs is a potential resource. This project should be feasible without previous knowledge of networking.