There are several relevant threads of related work. We have already discussed some of the core information visualization data- and task-based taxonomies in Chapter 1.
We begin with the previous work in the deliberate use of distortion to show as much context as possible around a focus point. The bulk of this chapter is a discussion of the many previous systems for drawing graphs and hierarchies, both topologically and geographically. Our main focus when reviewing previous systems for automatic graph drawing is their limited scalability. Of all the systems that we discuss in this chapter, only two of them handle very large datasets. In section 2.2.2 we cover the Cheops system, which has a highly compact display footprint for tree display that is more suited for an index than for exploration [BPV96]. On page 2.2.6 we cover the Nicheworks system for large graph exploration [Wil97,Wil99b], which was concurrent with our work on H3. Our discussion of effectiveness is more limited, since few of these systems attempt to specialize for a particular task to the degree that we pursue with our Constellation system. We end by justifying our choice to embark on design studies by discussing the limited relevance of automatic presentation systems, since their finite palette of visual encoding techniques does not extend to the domain of interactive presentations of large graphs.
One of the important challenges in a visualization system is how to present as much important information as possible given a finite display area. When the structure of interest is too big to see in detail all at once, the most straightforward solution is to allow the user to pan and zoom the visible area. The disadvantage of simply providing navigation controls is that users often lose track of the position of their current viewport with respect to the global structure. Adding a smaller secondary window showing a global overview with the current viewport location marked can provide some guidance, but forcing users to continually switch their locus of attention from one window to another can still lead to disorientation.
A large class of visualization techniques have been developed to address this problem by attempting to smoothly integrate detail views with as much surrounding context as possible, so that users can see all relevant information in a single view. Distortion techniques of this sort have been given several more or less general names, including Focus+Context [RC94], nonlinear magnification [KR97], fisheye views [SB94,Fur86], and pliable surfaces [CCF95]. These categories are not completely interchangeable: the Magic Lens system [SFB94], which featured movable filters, falls into the Focus+Context category but is not a distortion technique. Multiscale views such as Pad++, where the visual appearance of an object changes radically based on the distance to the virtual viewpoint, share some of the ideas of distortion-based systems [BH94,PF93,FB95]. Leung and Apperly taxonomize distortion techniques that appeared in the literature before 1994 [LA94].
Early work on automatic graph layout and drawing is scattered through the computer science literature [FPF88,WS79,Moe90]. The first book devoted solely to graph drawing, by Battista and colleagues [BETT99], summarizes large areas of the field. The Graph Drawing conference series beginning in 1994 has resulted in proceedings that cover recent work in both systems and theory. The focus of this thesis is systems, so we do not concentrate on the wealth of theoretical proofs about upper and lower algorithmic bounds: suffice it to say that most interesting computations on general graphs are NP-hard [Bra88].
A geographical view of a graph or network is appropriate for some tasks, particularly when showing telecommunication network topology or traffic information. The 1992 video by Cox and Patterson showed a ``2''-dimensional view of the NSFNet backbone rising above a flat map of the US from an oblique viewpoint [CP92]. The SeeNet [BEW95] system featured a totally flat 2D geographic layout of links on a map. The SeeNet3D system [CE95] presented two different visualization approaches. The first was a somewhat abstract 3D view of arcs lofted into the third dimension over a flat map, seen from an oblique viewpoint. The second layout approach, which showed links as arcs on a three-dimensional globe, inspired the similar visual encoding in the Planet Multicast system.
All the systems in the previous paragraph were highly literal, since each graph node had a geographic location attribute that was used for placement. Although the drawn links in some sense corresponded to physical cables in the real world, drawing them is an abstraction since those cables are not visible to the casual real-world observer.
The fsn system from Tesler and Strasnick of SGI [TS92] also places nodes on a ground plane, but has two major differences. First, the node locations are an abstract visual encoding of file system directory structure rather than inherent attributes of the data. Second, the geographic scale is that of a city rather than a country or the entire world, since the file size is encoded as the height of a building-like structure. The MineSet system from SGI includes an implementation of this algorithm. The Harmony system [And95] also used a similar visual metaphor.
Several systems for showing geographic traceroutes in both 2D and 3D [Too,Jon,Chr96,PN99,Aug98], have appeared either concurrently with or later than the Planet Multicast project, which was published in the fall of 1996. None of these newer systems have a more sophisticated interface from an information visualization point of view, and moreover many of them are more primitive.
It is worth noting that these systems discussed here fall into the realm of information visualization even though GIS (geographic information systems) and terrain rendering do not. In the latter two cases, no notion of an abstraction or a visual encoding choice spatializes data that is not inherently spatial.
Strict hierarchies are a subset of general graphs, and aesthetic tree layout has been proved to be possible in polynomial time, making it a more tractable problem than general graph layout [SR83]. The literature on creating aesthetically pleasing 2D drawings of trees includes both rectilinear [WS79,RT81,Moe90] and radial [BETT99, pp. 52--55,] methods, all of which are recursive. However, the example datasets are quite small.
A number of early Web visualization systems use either straightforward or previously presented tree layouts to show hyperlink structure [Döm94,AS95,PB94]. None of these approaches are suited for more than one hundred nodes.
Treemaps are a visualization method for hierarchies based on enclosure rather than connection [JS91]. Treemaps make it easy to spot outliers (for example, the few large files that are using up most of the space on a disk) as opposed to parent-child structure.
The Cheops system provides a highly compact interface to navigating very large hierarchies [BPV96]. They discuss an example hierarchy with a depth of 9 and a branching factor of 8 that can contain up to 20 million nodes. Cheops is compact to the point of being terse; it functions well as an index, but is not well-suited for browsing local areas or serving as the substrate for encoding auxiliary information in addition to the structure encoded in spatial layout.
Multitrees allow the depiction of several different link structures atop the same set of nodes [FZ94]. The H3 approach of distinguishing between links that belong to the spanning tree and non-tree links can be described as a 2-way multitree.
Noik's taxonomy of distortion-based graph layouts [Noi94] summarizes the systems that appeared in the literature before 1994. Several systems have explored fisheye-style distortions, including Generalized Fisheye Views [Fur86], Graphical Fisheye Views and Rubber Sheets [SSTR93,SB94], and SemNet [FPF88]. In all instances the dataset size was quite limited, which remained the case in later systems such as the work of Kaugars [KRB94].
The SemNet system for visualizing semantic networks [FPF88] offered a choice of 3D layout algorithms, a few of which used deliberate distortion. SemNet was also one of only a few early systems to address navigation as well as layout. In addition to absolute and relative viewpoint positioning controls, the user could jump directly to a previously saved site, or navigate hop by hop through the graph structure. The SemNet visualization was bidirectionally linked to a knowledge management system, so that the knowledge base could be directly manipulated via interacting with the graph structure, and knowledge base operations could be reflected in the graph view.
The 3D Pliable Surface graph viewer [CCFS95] epitomizes the difference between the H3 algorithms and all of the previous work in distortion-based graph drawing: Carpendale uses a known algorithm from the GraphEd system [Him94] for layout, and uses distortion techniques only for navigation. In H3, both layout and navigation occur in the 3D hyperbolic space. The layout algorithm is carefully tuned for the characteristics of the distorted space, resulting in a more uniform information density.
The SHRiMP graph viewer is based on the multiscale Pad++ system [SWFM97,SM98]. The paper makes no explicit claims on scalability but the example datasets were limited to a few dozen nodes. One of the challenges of Pad-style multiscale views is the propensity for users to lose track of their whereabouts, a problem that is only partially addressed in recent work on multiscale navigation [JF98]. The earlier Continuous Zoom system allows multiple focal points [BHDH95], and has a similar multiscale feel and scalability limits.
The influential Cone Tree approach [RMC91] for recursive 3D tree layout is more similar to the radial 2D tree layouts than the rectilinear ones. The authors argue that Cone Trees fall into the Focus+Context domain, since the standard 3D Euclidean perspective projection emphasizes near objects at the expense of distant ones. There have been many extensions and refinements to Cone Trees, including the bottom-up algorithm of Carrière and Kazman that minimizes the chances of territory overlap [CK95]. Koike [KY93] presented the extension of fractal trees to visualize what he called ``huge'' hierarchies. His scalability claim is ``hundreds or thousands'' of nodes, and the example images show at least 1000 nodes.
The 2D Hyperbolic Tree browser from Lamping, Rao and Pirolli is one of the best known examples of a distortion-based layout for hierarchies [LR94,LRP95]. Their system was limited to strict hierarchies and used a two-dimensional layout algorithm. Section 3.4 contains a detailed analysis of the scalability and information density differences between this system and H3. Another difference between the two systems is covered in in Section 3.2: the PARC system uses the conformal projection from hyperbolic to Euclidean space, as opposed to the projective model used in H3.
Force-directed layout [BETT99, Chapter 10,] is a popular choice for general graph layout. One reason for its appeal is the intuitively clear analogy of a system with attracting springs along the links and magnetic-style repelling forces at the nodes. Nodes are seeded in an initial position and the system converges to an energy-minimizing state using methods such as gradient descent [KK89] or simulated annealing [DH96]. Although the graph drawing literature includes some quite sophisticated combinations of the best features of several previous algorithms [BHR95,Tun93], many straightforward force-directed implementations have appeared that do not seem to benefit from this previous work. Examples include the 3D Narcissus system for visualizing the hyperlink structure of the web [HDWB95], and some recent applets for browsing semantic and conceptual networks [Des,The98].
The Gem system [FLM94] has been recognized [BETT99, p. 323,] as one of the most scalable 2D force-directed systems by virtue of handling datasets of more than 100 nodes. The Gem3D system [BF95] extends their algorithm to three dimensions and scales to a few hundred nodes. The Gem3D paper is one of the few in the Graph Drawing proceedings to explicitly discuss navigation separately from layout.
The literature on n-body simulation has a different but equivalent problem statement, and contains results which scale to large datasets where n is 10,000 by using a hierarchical algorithm instead of a naive approach [App85].
Scalability is not the only problem with force-directed systems when considered from an information visualization perspective. The final visual appearance of a dataset is usually different on each invocation of the system, either because the initial node positions are random or because the minor tweaking of layout parameters results in major changes in the final layout. In contrast, systems that repeatably lay out a graph the same way on each invocation can help a user form a stable mental model of the graph structure, and are well-suited for incremental or online layouts.
A few systems try to address the problem of handling extremely large graphs by incrementally processing only a subset of the graph. These systems have an operational window of a fixed number of nodes, and can continually add nodes and vertices dynamically to the considered set, dropping older items when the window fills. However, these windows are extremely modest in size: both the DaTu [HE97] and OFDAV [CH97] systems have windows of only a few dozen nodes.
North's 1995 paper on an incremental layout which could handle graphs of modest size proposed online hierarchical layout for the exploration of large graphs as a fruitful area for possible future work [Nor95]. A incremental layout approach that features a resistive circuit distance model provides notable speedups, handling datasets of over one hundred nodes in a few seconds [CH97].
Some 2D graph drawing systems attempt to highlight symmetry [LNS85], circular structure [TX94], or hierarchy [GKNV93]. The dot system [GKNV93] is one of the most popular 2D hierarchical systems, and scales to datasets of hundreds or even thousands of nodes. However, the noninteractive interface limits its applicability for very large datasets.
All attempts to use three dimensions have at least some form of interactivity. However, simply adding a third dimension does not solve scalability limits: the orthogonal Giotto3D system [GT96] uses computationally intensive display elements such as Bézier tubes, but the example figures feature less than two dozen nodes.
Nicheworks [Wil97,Wil99b] is the only graph drawing system to date besides our own H3 system that scales to large datasets. The Nicheworks papers are concurrent with the 1997-1998 H3 work. The layout algorithms easily scale to well over 100,000 nodes. (Although the system can theoretically handle one million nodes, layout would take one day or more.) Nicheworks features interactive navigation and a divide-and-conquer layout approach, with a suite of 2D layout algorithms that separately handle each connected component. Both focus on scalability, but their presentation strategies are quite different: Nicheworks allows the user to zoom between an overview and a detail view, while H3 focuses on showing a very large neighborhood around a focus point. We discussed the tradeoffs of these two approaches earlier in this chapter, in Section 2.1.
Computer-based system for the automatic design of graphical presentations have been gradually expanding their coverage. Mackinlay's influential APT system used a data-oriented taxonomy and a palette of noninteractive 2D techniques including scatterplots, bar charts, and simple networks [Mac86a,Mac86b]. Casner's BOZ system introduced a task-based taxonomy [Cas91]. The Sage systems provided a slightly expanded palette of encoding techniques, but was still limited to two dimensions and no interactivity [RKMG94]. The ANDD system of Joe Marks brought automatic design to graph drawing [Mar91], but was limited to very small datasets of fewer than two dozen nodes.
None of the automated presentation systems exploit interactivity. Our three design studies on interactive navigation of large graphs can be thought of as forays into an unexplored part of the design space that could expand the vocabulary of future automatic presentation systems. A full characterization of this space will require both successful case studies and the kind of principled analysis of non-literal interaction techniques for which we argued in section 1.3.1.