Lecture on Sep 30, 2009. (Slides)


  • Required

    • Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases. Stolte, Tang, and Hanrahan. IEEE Transactions on Visualization and Computer Graphics, 8(1), Jan 2002. (pdf)

    • Chapter 8: Data Density and Small Multiples, In The Visual Display of Quantitative Information. Tufte.
    • Chapter 2: Macro/Micro Readings, In Envisioning Information. Tufte.
    • Chapter 4: Small Multiples, In Envisioning Information. Tufte.
  • Optional

    • Multidimensional detective. A. Inselberg. Proc. IEEE InfoVis 1997. (pdf)

    • Dynamic queries, starfield displays, and the path to Spotfire. Shneiderman. (html)


nmarrocc wrote:

I do not agree with Chapter 8's assertion that more information is better than less information, at least all at once. We have computers now; we don't need to pack everything onto one picture. We can use all kinds of techniques to abstract some of the data. In Chapter One of Information Visualization, the authors say that the whole point of a graph is to aid in cognition. What does that mean? It means to look for patterns in data: take a bunch of data and try to eliminate the data that does not fit into a pattern, reducing or abstracting the dataset in our minds.

  • Sure, we could start by looking at a graphic with a high data density, but ultimately what we will form in our minds is some simple pattern. The tools/visualizations we use should support simplifying the datasets we're looking at. If we take in a picture with a high data density, it's going to be harder to think about because we'll have to keep more information in our heads at once; but if we can use methods of sifting through the data so we don't have to look at it all at once, that's going to help us simplify faster, which is ultimately what we're doing anyway.
gankit wrote:

Yeah. I don't think Tufte was considering interactive visualizations of the data in that chapter. But I definitely feel that more information is better than less. The ideal visualization would be one that shows a lot of information at various levels so that, depending on their interest, users can drill down to a level and understand the data.

I really liked the idea of small multiples as it naturally fits the paradigm of animating visualizations. Understanding what works using small multiples is a great way to know which variable you can animate upon :)

cwcw wrote:

In their creation of the Polaris interface, Stolte, Tang, & Hanrahan discussed the need for more sophisticated HCI systems, as we amass greater collections of data for analysis. With rapidly advancing innovations in digital technology, they note that an important area to apply these innovations is in data visualization. Their program, Polaris, works to that end. Operating under the model of the "data cube," Polaris allows for data transformations, comparisons, isolations, and groupings. As such, the user can then command certain data for examination, from a very wide pool.
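The cube-style grouping and isolation described above can be mimicked with a plain group-by. This is only a sketch of the idea, not Polaris's actual algebra; the tiny sales table and field names below are invented for illustration:

```python
from collections import defaultdict

sales = [
    {"region": "East", "year": 2000, "units": 120},
    {"region": "East", "year": 2001, "units": 150},
    {"region": "West", "year": 2000, "units": 90},
    {"region": "West", "year": 2001, "units": 110},
]

def rollup(rows, dims, measure="units"):
    """Sum `measure` over every combination of values of the `dims` fields."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)

# Projecting the cube onto a single dimension at a time:
by_region = rollup(sales, ["region"])  # {('East',): 270, ('West',): 200}
by_year = rollup(sales, ["year"])      # {(2000,): 210, (2001,): 260}
```

Each choice of `dims` corresponds to one face of the cube; Polaris's shelf interface can be thought of as selecting which of these groupings feeds the display.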

As such, I think it's beneficial to HAVE a greater amount of information, but not necessarily to USE all of it. Indeed, by using computers to abstract the useful information, we can pare it down into an understandable, digestible, and ideally interesting/exciting visualization. To me, the most exciting feature that computer technology offers data visualization is animation, as I am highly interested in how things change over time. The makers of Polaris mark animation as an important next step for their product. By adding the animation "shelf," they can use digital visualization to tell a story of how things change over time. It's almost like having incredibly advanced time-lapse photography (in the abstract) at your fingertips. What incredible perspective this has to offer humans: a view of creation we've never seen before.

vagrant wrote:

So far in this course, I have leaned toward simpler data visualizations that emphasize specific information (i.e. practicing data-ink principles). The argument for greater degrees of data density does not necessarily conflict with this attitude, however. The macro/micro illustrations in Tufte's text allow for multiple layers of observation that less dense representations would struggle to afford. Some examples displayed seem like a cluttered mass of data overload, but others have an immediate high-level story to convey and allow for detailed exploration upon further scrutiny. I believe that the data-ink principles are not violated by a well-formed high-density data representation. In fact, some of the maps and schedules shown in the text print a high volume of data and allow for multiple patterns to be recognized, all with reasonable amounts of ink. Modern graphical representations generated by software such as Polaris allow for clearer filtering of such density, but there is a certain aesthetic blend of vision and elegance that lets a static image express as much as a good macro/micro case does.

On the topic of small multiples, however, I feel that software representations are the best way to go. I wager that the fidelity of intervals in between small multiples is a value that is best determined after a battery of experimentation and review, so having each frame automatically generated (or digitally captured) seems like the only practical way to accomplish that.

fxchen wrote:

Polaris is software that can fuse any number of data collections at different granularities in order to make the data useful for a specific analysis. In discussing the ramifications of Polaris's design and (more importantly) the usefulness of visualization, I believe cwcw captured the essence of vis fabulously: offering "a view of creation we've never seen before".

After the Tufte readings on Micro/Macro design and (to a lesser degree) Small Multiples, I could not help but be struck by the power of interactive vis (e.g. Polaris, ManyEyes) that allows a user to consider and discern patterns in high-density data. In writing Envisioning Information and focusing (almost to a fault) on beautiful vis, Tufte praises high-density design that "allows viewers to select, to narrate, to recast, and personalize data for their own uses". This statement applies even better to interactive vis, which tempers high-density visualization's risk of information overload with the power to filter and control the data being represented.

wchoi25 wrote:

The Tufte reading this week superbly argues for an easy-to-miss point:

"Simplicity is another aesthetic preference, not an information display strategy, not a guide to clarity. What we seek instead is a rich texture of data, a comparative context, an understanding of complexity revealed with an economy of means."

At first reading, I wondered how this fits in with his previous chapter on data-ink maximization, which seemed to emphasize simplicity, such as in the redesign of the box plot. However, on further thought it is clear that there is no contradiction here, but two distinct visualization guidelines that are almost synergistic. Good visualizations don't shy away from showing the enormous complexities in data sets, and carefully designing the graphics such that this complexity is apparent and ready to be examined necessarily means that there is not much room for fluff: non-data or repeated-data visual elements. It is then the portrayal of complexity that helps trim the fat and bring about simplicity where it matters.

I also liked how these readings were paired up with the Polaris reading, which discusses how interactivity, rather than small multiples or other static techniques, could help in exploring large, complex data sets.

malee wrote:

Having visited DC recently, I found Tufte's macro/micro discussion of the Vietnam War Memorial to be particularly thought-provoking. Regardless of one's distance from the memorial, the gravity of the war and its death toll is consistent. Interaction with the memorial (i.e. walking toward and away from it, touching the engravings, etc.) only changes the amount of detail that the viewer absorbs.

I like the analogy between the memorial and a data graphic because it highlights the viewer's ability to control his/her experience with the information. Just as walking towards/away/around the memorial is a natural interaction, interactions (and their resulting actions) with a data visualization should be just as intuitive. As with the memorial, no matter how the viewer interacts with the visualization, the overall message should remain consistent, too.

joeld wrote:

I really think that multivariate visualization is one area that is very important and where good tools (or more widespread knowledge of good tools) could have a big benefit. I frequently encounter optimization and design problems with tens of variables and being able to visualize the convex hull of possible solutions would be quite valuable. Designers could get a lot of intuition by being able to enter some constraints and see the hyperplanes involved.

At first I thought that the scatter plot matrix would provide a very good solution for this problem, until I realized that the individual scatter plots are projections of the data. In order to visualize a surface we would need to draw slices, which would require us to choose a slicing hyperplane for each coordinate pair, which would perhaps be a large burden. I wonder if plotting the vertices of the convex hull would provide enough assistance to make the data usable.
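Plotting only the hull vertices is easy to prototype. A minimal sketch, assuming SciPy's qhull wrapper and an invented random "design space" (nothing here comes from the post itself):

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
solutions = rng.standard_normal((500, 4))  # 500 hypothetical designs, 4 variables

hull = ConvexHull(solutions)               # qhull handles d > 2 dimensions
extreme = solutions[hull.vertices]         # boundary points of the solution set

# In a scatter-plot matrix we could draw `extreme` instead of `solutions`:
# each 2-D panel's outline is spanned by projections of these vertices,
# though interior structure (the slices mentioned above) is still lost.
print(len(hull.vertices), "of", len(solutions), "points lie on the hull")
```

Since the vertices of a projected polytope are projections of vertices of the original polytope, the hull vertices are enough to recover each panel's 2-D outline.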

I may explore this as part of homework 1.

jieun5 wrote:

I found some of the artistic examples given in pp. 69-73 (Chapter 4: Small Multiples, In Envisioning Information. Tufte) particularly unique and interesting. Unlike most other multi-dimensional visualization examples we've seen, which deal with relationships between measurable (often quantitative) data, the artistic examples are meant to convey aesthetic meanings.

Hence, I feel it is inappropriate to apply principles such as efficiency of communication or saving data-ink to these examples. Rather, as Tufte nicely articulates, the goal for these artistic representations should be to "isolate detail and place it in context" (p. 71).

This goal naturally reminded me of my recent visit to Indianapolis, where the walls of the Indiana State Museum featured artworks representing the 92 counties of the state (which were so unique that I ended up taking many pictures). From far away, I saw several counties' art embedded in a giant stone wall of the museum, which provided me with the "context". Literally taking a few steps closer, I realized that one of the artworks comprised nine glass jars, eight of which had pickles in them, and one a strange orange object. Curious, I "zoomed in" to get the detail and found that the mystery orange object was poor Garfield. Having this realization was quite fun and memorable; I feel that this is a great example of an aesthetic-concept visualization that "isolates detail and places it in context", not unlike the Vietnam War Memorial example.

bowenli wrote:

I agree with @cwcw that having more information is better, but perhaps not all of it will be in use. Given how incredibly effective the preprocessed variables we saw in class were, I would think that visualizations are really subject to overloading very quickly. On the idea of small multiples, I think separating the visualization out into different levels can be helpful, in that there may be a lot of data, but you find relevant information by searching in different tiers.

Polaris: Amazing that in 2002 people still used terms like "very large" without qualification. I liked part 5, the data transformations and visual queries section, because it shows how complete the tool is. The brushing technique they mentioned and the different binning techniques sound really useful.

cabryant wrote:

In his discussion of micro-readings of detailed cityscapes (p. 37, Envisioning Information), Tufte references an inspired statement by Italo Calvino: “[cities are] relationships between the measurements of its space and the events of its past . . . the height of that railing and the leap of the adulterer who climbed over it at dawn.” I would argue that an engaging, underlying story is present in all visualizations, in the implied causes, the present data, and the future possibilities. One pertinent example of the narrative potential of data, particularly with respect to micro/macro interpretations, is the DIVER (Digital Interactive Video Exploration & Reflection) Project here at Stanford. This interactive tool allows users to create annotated perspectives of a video record. This may involve focusing on a small interaction within a large mass of activity, or creating a first-person navigation of a complex scene. The resulting “DIVEs” may then be shared in a collaborative space to convey unique interpretations of the raw video data. It would be interesting to determine the potential for this type of interaction on other forms of data visualization.

On a side note, I think I will pull a Tufte on one of Tufte's visualizations and question why ordinality is not imposed on the “leaf” values in the volcano height plot (p. 46, Envisioning Information). Given that the digits signify a quantity, a sorted, ordered list of leaves would better convey the number of volcanoes at each height and promote more rapid and facile comparisons of volcanoes with similar heights.

vad wrote:

The data-ink rule is good when applied to the area of the graph itself, but when things like the axes, borders, and labels are included it raises a red flag. Case in point: have a look at the visualization on page 59 of The Visual Display of Quantitative Information; this clearly is, without doubt, the best visualization ever produced by anyone ever; yet if you were to compute its data-ink ratio it would be very low. This is why I think data-ink should only be measured on the area of the graph itself.

anuraag wrote:

@gankit and @cwcw, I agree that animation offers great potential in visualization, but I think it is worth considering what sorts of cognition and analysis tasks it might be suited or unsuited for. Animation strikes me as fantastic for general pattern recognition in a time series, as we're quite accustomed to detecting movement and making judgments about what moving objects in our field of vision mean. I speak without any scientific background to support my claim, but I'd intuitively guess that we are trained to focus our vision on the items in sight that change frame to frame, while readily casting out of mind the data that stays constant.

That said, there are some ways in which animation strikes me as a poor substitute for the use of small multiples. For close analysis, having several frames in sight at once for side-by-side comparison is useful. While animation requires you to hold what happened before in memory, small multiples allow you to focus on a particular time point's image and directly compare it to both the time point before and the one after.

dmac wrote:

I think one pitfall that quickly becomes more dangerous when using small multiples is an excess of non-data-ink. One must take extra care to not include any extraneous ink in a small multiple frame, because by the nature of small multiples that excess will increase as you collect frames. Case in point: I feel like the chart of the train lights on page 68 in Envisioning Information has a fair amount of extraneous non-data-ink that could have been excluded.

An interesting side note: the data-ink *ratio* does not actually increase with the addition of more frames; only the *absolute* amount of non-data-ink does. Whether this is bad or not can be debated.
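The side note above is a line of arithmetic: if each frame carries data-ink d and non-data-ink n, then k identical frames give a ratio of kd/(kd + kn) = d/(d + n), independent of k, while absolute non-data-ink kn grows linearly. A throwaway check, with arbitrary made-up ink quantities:

```python
def data_ink_ratio(k, d=10.0, n=3.0):
    """Data-ink ratio of k identical small-multiple frames."""
    return (k * d) / (k * d + k * n)

# The ratio is constant in k, while absolute non-data-ink (k * n) grows.
for k in (1, 5, 50):
    print(k, round(data_ink_ratio(k), 4), "non-data-ink:", k * 3.0)
```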

zdevito wrote:

In his discussion of data density and small multiples, Tufte argues for a high density of information but downplays the confusion that too much information may cause. Increasing data density makes sense when the displayed information can still be processed pre-attentively, since we automatically scale up to the additional information; but if the additional information requires more attention or prompts a linear search, it may be a poor choice.

  • Another interesting question that comes up in the discussion of data density is how you would intelligently scale a visualization given more or less space. A simple image scale might not be appropriate, as text must remain readable and the data points must remain separable given the resolution of the medium. Some GPS maps already do somewhat intelligent scaling by omitting small roads in favor of highways at a lower resolution while including everything when zoomed in. A similar operation may be possible for non-physically-based visualizations: a specific data density is targeted, and information is omitted or aggregated to hit that target.
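One way to prototype that density targeting is grid aggregation: bin the data on a grid sized to the available pixels and draw one aggregate mark per occupied cell. A rough sketch, with invented names and a made-up "one mark per 8-pixel cell" target:

```python
from collections import Counter

def aggregate_to_grid(points, width_px, height_px, mark_px=8):
    """Bin (x, y) points so at most one mark falls in each mark_px-sized cell."""
    cols = max(1, width_px // mark_px)
    rows = max(1, height_px // mark_px)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)

    def cell(p):
        c = min(cols - 1, int((p[0] - x0) / ((x1 - x0) or 1) * cols))
        r = min(rows - 1, int((p[1] - y0) / ((y1 - y0) or 1) * rows))
        return (c, r)

    # One aggregate mark per occupied cell, weighted by point count.
    return Counter(cell(p) for p in points)

points = [(i % 13, i % 7) for i in range(200)]
marks_small = aggregate_to_grid(points, 80, 80)    # coarse grid: marks merge
marks_large = aggregate_to_grid(points, 800, 800)  # fine grid: marks separate
```

Shrinking the viewport coarsens the grid, so nearby marks merge automatically, much like the GPS map dropping small roads when zoomed out.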
jqle09 wrote:

I also agree with Tufte that making visualizations information dense is desirable. Of course, this shouldn't come at the expense of clarity. I think the visualization should be made such that all the data needed for the story is included. As others have mentioned, it does not seem like Tufte considered interactivity at all in this chapter. But I think small multiples are Tufte's way of creating interactivity within static graphics. In a small-multiples graphic you can drill down into pieces of interest by expending greater focus, or you can process the larger picture by blurring your focus.

It has a different feel than a truly interactive graphic, but it is as close as you can get with a static visualization. Truly interactive graphics should be flexible enough to offer far greater routes for exploration than anything static. To Tufte's credit, the technology of his time did not make creating interactive graphics as easy as it is today. Even today this is still a difficult problem, so hopefully we'll be able to work on flexible tools that make it simple to visualize extremely multivariate data very interactively. Playing around with Tableau has been really nice, though.

nornaun wrote:

I agree with @anuraag that animation is sometimes a poor substitute for small multiples. I believe, without scientific support yet, that humans intuitively associate animation with changes in time. In this sense, one dimension of the data should be time if we want to put animation to good use. Animation does not provide much aid in displaying data that are not time-related. Still, I think interaction will be of much help in such cases, maybe even more powerful than small multiples. I got this idea while trying Tableau. Interaction brings users into direct manipulation of the data display. It can be fun and give users insight inconceivable from a static display.

rajsat wrote:

In Tufte's data density reading, he defends highly dense graphics by stating that the cost of handling and interpreting additional information is low in most cases. There is one anomaly to notice here. When the data measures are shrunk, the cluttering non-data ink and redundant data ink also come down, which certainly helps in the cognition of a high-information graphic. Agreed. The shrink principle eventually leads to another effective graphical design, the small multiples. But here there is a contradiction: shifting to MANY smaller graphics showing the same combination of variables in turn increases non-data ink. For instance, the graphic showing the hourly average distribution of hydrocarbon emissions on page 170 increases the non-data ink to a large extent, even though the concept of small multiples aids cognition. Hence I think achieving both, a reduction in redundant data ink and shrunken graphics through small multiples, is tough, if not impossible (of course, all this applies to graphics in printed form).

As many have already pointed out, the best way to get over this is to have an interactive system where a slider can be used to move between the different stages of the variable that changes across the small multiples. It not only lets us step through a series of images in a single window, one on top of another, but also helps us pick and choose the data that we want to view, leaving out variables (from a huge collection in multivariate data) that are not of interest.

tessaro wrote:

Based on some of the posts thus far, I think that the chapter on small multiples in The Visual Display of Quantitative Information has unintentionally conflated two notions whose separability is actually the crux of Tufte's thesis. The emphasis on more information being inherently beneficial is easily confused with the various descriptions of data density and data ink vs. non-data ink. Tufte's use of the term "information" is, I think, meant to be anything useful and compelling that can be obtained through investing attention in a graphic; it's the good stuff, as in the root word "inform". It stands in opposition to anything that is not of cognitive value in the abstract; it is not simply another way of expressing sheer data size, nor another euphemism for visual complexity per se. It is about ink in the service of purpose, which explains the maxim "For data-ink, less is a bore". Simple data sets that do not justify the ink they are printed in are instances of non-interesting, and therefore information-poor, graphics. The presentation of such data could adhere perfectly to the precepts of minimal chart-junk and the steadfast elimination of non-data ink and still not be a valuable carrier of interest. This point is unfortunately easy to confuse because of the flexible use of the term "information" to mean both anything added for visual consideration and the useful bit that is extracted: the thing that is the goal rather than the descriptor of a graphic.

rmnoon wrote:

This might be more of a technical comment, but Polaris (and thus Tableau) seems like as much a UI challenge as a data structure challenge. It can't be easy to craft an intuitive wrapper around a new kind of relational algebra. I wonder why the designers have chosen to stick to such traditional UI toolkits instead of pushing the envelope with new interaction types. Specifically, a 3D-accelerated interface, even for 2D data, could be so much more data rich. I must think about this for the class project.

alai24 wrote:

I think the key to maximizing data ink is that only pertinent information should be presented, but as much of it as possible. This requires knowing who the audience is and designing accordingly. It is difficult to find patterns in an overloaded visualization, so I am personally a fan of the use of small multiples. As mentioned by previous posters, small multiples should probably only be used in static images, since if one has the luxury of interactivity, a nice animation or user input should be able to filter data and move across dimensions.

akothari wrote:

I'm split between too much data and less data. There's something about looking at a limited data viz and understanding it quickly, without spending a ton of time and energy trying to figure out what the viz is trying to communicate; but there are also vizzes that pack a whole lot of information in very aesthetic ways. It might take a lot more cognitive load and time to decipher all the information, but you also learn a lot. Maybe we can calculate an (amount of information gained / time and energy spent) ratio and figure out which is better?

Here are some fantastic examples of packing a ton of data together: http://www.webdesignerdepot.com/2009/06/50-great-examples-of-data-visualization/

aallison wrote:

It is interesting that Tufte seems to go against (or at least, lend some nuance to) his earlier principles in the "Micro/Macro" chapter. He claims that simplicity "is another aesthetic preference, not an information display strategy, not a guide to clarity." His old ideas of maximizing data ink have been given depth, now labeled as "rich texture of data, comparative context, [...] economy of means". This book seems to drive at a more abstract, broader sense of aesthetics for data visualizations than the first book does. Frankly, I find the graphics in this book to be much more provocative than those in the first.

codeb87 wrote:

In reading Multidimensional Detective, I couldn't help but question the effectiveness of the parallel multi-dimensional display when analyzing variables with drastically different ranges and distributions. When analyzing a diverse set of variables, it seems like a lot of front-end work would be necessary to format the data along the y-axis so that apt comparisons could be made. Perhaps this would be as simple as centering the data about the horizontal line through the center of the graph. But it also seems like every variable would have to be formatted along the same scale (i.e. everything would have to be flattened into linearly comparable scales). Thus different variables would be transformed logarithmically or exponentially, then translated and scaled so that the data were centered, with a range that covered the entire vertical length of the graph. Now this is not an impossible task, but it seems to me that in order to use the parallel display as the paper presents it, the reader must have an intimate knowledge of what is going on in each vertical axis. Once the reader knows this, he will be able to more accurately compare heights, trends, et cetera. I would like to know if there is a programmatic process whereby these data transformations are made, and if there is, what algorithm it uses to maximize the accessibility of the data relations.
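A programmatic first pass at the axis formatting described above is per-axis min-max normalization, so every variable spans the full vertical extent of its parallel axis. This is only the simplest option, sketched with an invented function and data; Inselberg's paper does not prescribe it:

```python
def normalize_columns(rows):
    """Min-max scale each column of a list of equal-length records to [0, 1]."""
    cols = list(zip(*rows))
    scaled = []
    for col in cols:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1  # a constant column maps to all zeros
        scaled.append([(v - lo) / span for v in col])
    return [list(r) for r in zip(*scaled)]

# Three records over three variables with wildly different ranges.
data = [[1, 100, 0.002], [3, 250, 0.004], [2, 400, 0.010]]
normalized = normalize_columns(data)
# Each column now runs from 0 to 1, so axis heights are comparable.
```

Log or power transforms for skewed variables would be applied per column before this rescaling step.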

nornaun wrote:

Wow, I just looked at the link @akothari posted; it is really cool. One thing, though: some of the examples there outright violate Tufte's concept of efficient visualization. The graphics have different moods and feels to them too. Does anyone know of a study on how different presentations affect the mood and performance of viewers? I tried to Google it but haven't found anything yet.

rnarayan wrote:

A top-of-mind roadblock I see with the use of small multiples is the issue of scale. For instance, if one were to design a molecular-level simulation (with millions of data points) of a ribosome in motion (to perhaps identify potential target sites for antibiotics), then the static snapshots afforded by small multiples would be woefully inadequate. Some form of motion simulation would be indispensable.

Further, if any kind of dynamic behavior, such as what-if scenarios, is to be compared and contrasted, a simple before-during-after time-step sequencing of static graphs/images would again fall short. An example here would be visualizing traffic buildups, air-flow patterns, etc. based on simultaneous simulated control of multiple flow parameters.

In retrospect, the small multiple seems like a reasonable starting point for any visualization problem that requires comparison and contrasting. If supplemented with interactive techniques such as brushing and linking, it can be capably deployed to track changes and patterns in domains where the degree of parameterization is not very high.
