Lecture on Sep 23, 2009. (Slides)

Readings

  • Required

    • The Structure of the Information Visualization Design Space. Stuart Card and Jock Mackinlay, IEEE InfoVis 97. (pdf)

    • Chapter 1: Graphical Excellence, In The Visual Display of Quantitative Information. Tufte.
    • Chapter 2: Graphical Integrity, In The Visual Display of Quantitative Information. Tufte.
    • Chapter 3: Sources of Graphical Integrity, In The Visual Display of Quantitative Information. Tufte.
  • Optional

    • On the theory of scales of measurement. S.S. Stevens. (jstor)

    • The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations, Shneiderman, Proc. IEEE Conference on Visual Languages, Boulder 1996. (pdf)

Comments

vad wrote:

Chapter 2 on Graphical Integrity illustrates something I have felt since I was a kid; it's nice to finally see it in print. Any circular graph that isn't a pie chart, or any 3-D graph that doesn't actually use the third dimension to illustrate anything (especially if it's painted in "real" 3-D with a weird perspective), sets off red flags for me.

bowenli wrote:

The paper seems a bit loose in its definitions. For example, when talking about automatic vs. controlled processing, I believe this applies to the "real world": what humans can judge based on the literal graphical properties of something (for example, 2 pixels down and 5 to the right).
However, when talking about the set of Graphical Properties, they switch to a kind of meta-language about the world inside the visualization. In their example of GIS-based visualizations (Table 2, SDM), they take longitude and latitude to be X,Y and financial data to be Z. In this case, though, they mean the PERCEIVED X,Y,Z values, which are not literal translations of the pixel values; instead they must be viewed through human processing and recognition of perspective, depth, etc.
I make this point because, if you define Graphical Properties as these meta-properties, it may actually interfere with whether something is automatic or controlled processing. For example, imagine that instead of a slightly skewed American map, the map were displayed on some strange bumpy surface bending in every possible direction. Then something seemingly automatic, like position processing, may actually become difficult.

Short paper? While they have definitely proposed *a* framework for characterizing visualizations, I'm a bit disappointed that they never really argue why it's any good, and they spend little time on potential uses.

On the Theory of Scales
They sort of lost me with the argument about perceptions and scales. It's interesting that, out of all the problems involved in measuring human senses, Stevens chose a very cut-and-dried definitional argument. Still an interesting read, though.

aallison wrote:

I suppose the main purpose of the Card & Mackinlay paper is to present their taxonomy, which enables us to view the whole design space in terms of possible permutations of properties of the data and visualization. Each of the examples provided in the paper helps demonstrate how to classify a visualization using their taxonomy.

I agree with Bowen that the distinction between controlled processing and automatic processing is a bit vague. They only provide two examples with the CP column filled in (one is text, the other is area). I suppose area is controlled processing because a viewer must approximate the area under a curve to compare one area to another? Not sure. What else would be controlled processing, in their view? Symbols for different elements in a legend? Connections between nodes that represent some relationship?

I love how Tufte has created a way to calculate how much any given graphic is lying. I see the dialog going:

  • "Oh, it doesn't make that much of a difference..." "Um, you distorted the numbers by a factor of almost 10"

It makes me wonder how the publications that lacked graphical integrity when Tufte wrote his book are doing today. I think the misconception that "statistics are boring" will soon be extinct in our increasingly data-driven culture. Many companies use statistics as their main way to judge whether a decision or design choice was successful.

jieun5 wrote:

In chapter 2 (Graphical Integrity), I was pretty shocked to see several examples that used inconsistent time steps on the x-axis to falsely convey a drop or decline in an aggregate amount. For instance, the bar chart at the bottom of p. 54 compares the first half of 1978 with all of 1976 and 1977, and the line graph at the bottom left of p. 60 compares the first four years of the 1970s with the full decade preceding 1970.

I think what makes these examples especially "awful" is that, even though the graphs *correctly label* the time intervals (and the second example even uses x-axis spacing proportional to the time span, so that 1971-1974 is 4/10 as wide as 1961-1970), the visuals -- whether bar heights or the up/down contour of a line graph -- are perceived much more immediately than the small text at the bottom. It actually took me quite a while to spot the labels, and once I did, I felt quite cheated. :)

dmac wrote:

I found Playfair's depiction of Britain's skyrocketing national debt quite interesting (Tufte, 64-65). While Tufte often lauds Playfair for pioneering many new ideas in visualization design, this is one example where Playfair intentionally exaggerates the data. It reminded me of Florence Nightingale's diagram of the causes of mortality among soldiers that we saw in the first class, where she also exaggerates the data to make a point. I'm curious to explore the line between designers who create poor visualizations because they are ignorant of good and fair design principles, and those who intentionally skew the data in order to make a rhetorical point.

I suspect Tufte would also find fault in Nightingale's diagram, as he states the following principle on page 71: "The number of information carrying (variable) dimensions depicted should not exceed the number of dimensions in the data." In other mediums, such as writing, it is often taken for granted that the author or designer creates their work with some bias and intent in mind, and we the audience must interpret it within that context. Is Tufte suggesting that we hold visualizations to a higher standard, and attempt to remove all bias and rhetoric from our depictions of data?

vagrant wrote:

I found Card and Mackinlay's taxonomy of information visualization rather dry. The article struck me more as a retrospective summary than an informative exploration. To be frank, keeping my eyes moving through the text required determined effort.

I found Tufte's exploration in the first chapter considerably more elucidating. In particular, I enjoyed the documented historical evolution of data representation; the study encouraged me to reevaluate the common visualizations I take for granted today, and also made my mind percolate with ideas for new, innovative visualization techniques.

My favorite section of the reading was the third chapter of Tufte's text. While the second chapter on integrity was certainly entertaining, the lessons presented were for the most part common sense. What was striking to me was that this piece of common sense, that data representation often obscures or exaggerates relationships within the data, had become so common that I as a reader had simply accepted it. Until I came upon the third chapter, there had never been a time when I questioned why data integrity was so poor in publications, or when I expected better. The brief but pointed examination of the causes behind data integrity issues caused me to look introspectively for a moment, in particular at my first assignment.

My first assignment strove to be easy on the eyes and minimalist; I wanted to emphasize certain patterns, and was willing to leave out labels and frills that did not contribute toward that end. While I am confident that I avoided skewing my visualization through any inconsistency, Tufte makes a good point about respecting the reader's intellect. I personally found some of the examples Tufte illustrates in the first chapter overly busy from a visual perspective. As a technically minded (and reasonably intelligent) computer scientist, I wonder if there is an artist inside me, one who prefers visual representations that are simple and incremental instead of dense and spatially economical.

zdevito wrote:

While the distinction between automatic and controlled processing is left vague by Card and Mackinlay, this phenomenon has actually been studied in psychology, and the results do suggest that certain visual tasks can be processed 'automatically' while others are controlled. A standard experiment in this area is a visual search task: the subject is asked to determine whether or not an object with some specific quality (color, shape, texture, etc.) exists in an image that contains distractor objects with different properties. Experimenters vary the number of distractors and measure the average time to do the search task. For certain features (e.g. color), the task does not take significantly longer as the number of distractors is increased, indicating so-called 'automatic' processing. For others (e.g. identifying an F in a set of T's), the time scales linearly with the number of distractors, indicating some active search on the part of the participants.

One example in this space is Treisman and Gelade's paper "A feature-integration theory of attention," which attempts to categorize which tasks will require extra attention (http://dx.doi.org/10.1016/0010-0285(80)90005-5). They suggest that there is some set of primitive features, like color or shape, that can be processed automatically, but that conjunctions of these features require controlled/serial processing. Of course, the set of primitive features and what constitutes a conjunction is not always well defined, but the experimental results do reveal two pathways in the brain: one that can multitask and handle large datasets immediately, and one that scales linearly with the dataset and is capable of more complicated decisions.

Another good resource for more information is Shiffrin and Schneider (1977), http://psycnet.apa.org/index.cfm?fa=search.displayRecord&uid=1977-24785-001, which is the one cited by Card and Mackinlay for this distinction. One issue with applying the distinction to visualizations is the difficulty of determining when a mapping will be automatic and when it will require serial attention, especially considering that conjunctions of 'automatic' properties easily become controlled.
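
A toy sketch of the flat-versus-linear pattern such experiments look for; all of the timing numbers below are invented, not data from any real study:

    import random

    def simulate_rt(n_distractors, search_type):
        """Toy reaction-time model in milliseconds (parameters invented).
        Feature ('automatic') search is roughly flat in the number of
        distractors; conjunction ('controlled') search grows linearly."""
        base = 450.0
        slope = 0.0 if search_type == "feature" else 25.0
        return base + slope * n_distractors + random.gauss(0, 20)

    for search_type in ("feature", "conjunction"):
        for n in (1, 5, 15, 30):
            rts = [simulate_rt(n, search_type) for _ in range(100)]
            print(search_type, n, round(sum(rts) / len(rts)))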

tessaro wrote:

Tufte's critique of the historical shortcomings of information graphics, and of the distortions (intentional or not) manifest in myriad examples, leaves out an important component of improving the quality of data graphics: raising the statistical/analytic skills of the consumers of data. Because he emphasizes only the role that practitioners/designers play in communicating data, through the worthy adherence to a design ethic (as embodied in his excellent rules of thumb for pursuing graphical excellence), the reciprocal role of data consumers is treated as a static (if unfortunate) circumstance inherent to our culture. Interestingly, Tufte's own books are perhaps the most striking example of the kind of impact one can have championing quality in data graphics, with a somewhat surprising following of devotees from many fields.

The following link is to a short (3 min.) TED lecture by math educator and performer Arthur Benjamin, who makes the case for improving the quality of our collective understanding of information by restructuring the high school mathematics curriculum. He proposes changing the culture's default toolset for engaging with the escalating flood of digitized information by emphasizing statistics and probability. In a world that increasingly uses visualization as a tool for analysis and decision making, a better informed audience can also play a role in raising the bar of graphical excellence.

LINK:

http://www.ted.com/talks/arthur_benjamin_s_formula_for_changing_math_education.html

alai24 wrote:

To expound on vagrant's observation that poor designs and misleading graphics are widely accepted: I remember a lot of the educational material and kids' magazines from elementary school being covered in crazy diagrams. I don't remember any being intentionally deceptive, but I'm guessing their goal was to hold kids' attention. It seems this sort of attitude never really goes away, so people come to see it as the way it's always been.

On another note, it blew my mind when I learned the first scatter plots emerged in the 1700s. Something that seems so natural to us now probably took quite a leap in abstract thinking.

anuraag wrote:

The Mackinlay rankings presented in lecture made me think a bit about the very concept of creating generalized rankings of "effectiveness" for various encodings. In particular, I wonder to what extent the 'effectiveness' of an encoding for a particular viewer is culturally determined, or determined by familiarity with a particular type of graph. It's easy to imagine that each type of encoding has its own characteristic learning curve, and that there might be a different ranking of encoding effectiveness when they are considered for novices vs. for experienced 'viewers'.

I'm looking forward to delving more deeply into the concept of effectiveness in the graphical perception lecture.

mattgarr wrote:

Chapter 2 of Tufte, on Graphical Integrity, brings up the concept of a "lie factor" in graphics, and certainly does not hesitate in applying it. The example I found particularly interesting was his calling the use of dollar values not adjusted for inflation a source of "lie factor." Non-inflation-adjusted data has always irked me in printed graphics, but I have come to take it for granted, and I will often compare one data graphic against another that shows the adjusted value of money over time. I have always just assumed this is work I have to do to get the real story.
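
For what it's worth, the adjustment itself is just a ratio of price-index values; a quick sketch, where the CPI numbers are rough placeholders rather than authoritative index data:

    # Convert nominal dollars to base-year dollars via a CPI ratio.
    # The CPI values below are approximate placeholders, not official figures.
    cpi = {1970: 38.8, 1980: 82.4, 2009: 214.5}

    def to_real_dollars(nominal, year, base_year=2009):
        """Deflate a nominal dollar amount into base-year dollars."""
        return nominal * cpi[base_year] / cpi[year]

    # Roughly what $1000 of 1970 purchasing power amounts to in 2009 dollars.
    print(to_real_dollars(1000, 1970))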

Tufte's term, "lie factor" is liberating for me. I now look at such charts and feel that the author was not doing his or her job--it is his or her responsibility to ensure that the graphic tells a true story without my doing additional work.

Unfortunately, in the particular case of using non-inflation-adjusted money, the practice has become so widespread, it is hard to see how we can break the wider community of graphic designers out of what could be called a standard practice of lying.

nornaun wrote:

I agree that those 3-D graphs were quite obtrusive when I first saw them as a kid. Still, I surprisingly adapted to them after being exposed to that kind of presentation for a year. More examples in chapter 2 expose the kind of outright lies in graphs that we should avoid. I appreciate Tufte's effort to establish the Lie Factor as a measurement of graph distortion, though I still believe there is merit in some distorted graphs. This goes back to the story a visualization wants to tell. Some visualizations aim to make an argument for something that is not obvious without an elaborated presentation; Nightingale's visualization of causes of death is one example. As long as the visualizer explicitly informs her readers of such elaboration, I think it is fine. Still, you should limit this kind of distortion as much as possible. An elaborated presentation is a double-edged sword: it can serve the good or lend strong support to the bad.

mpolcari wrote:

I was pleased to read the criticism of unnecessarily '3d' graphs. Like an author overly dependent on arcane vocabulary, they are the essence of confounding complexity and complicatedness.

akothari wrote:

I concur with other comments on the dryness of the paper. It was a good overview of the methods and techniques used in different fields, but I was confused by several statements it makes, particularly about automatic vs. controlled processing. Bowen provides a great example with maps, with which I couldn't agree more.

I enjoyed reading about the "lie factor" in Chapter 2. I remember a lot of data viz on news channels and in newspapers during last year's elections. Several times, I wondered whether the visualization told the "correct story." Essentially, the same news was visualized very differently by Fox News and CNN. And we can see this across all sorts of media, from big newspapers to small blogs (like the one below): http://techincolor.blogspot.com/2009/03/lie-factor-gizmodo-popcorn-isnt-that.html

rmnoon wrote:

I'm trying to figure out what I think about the sentiment expressed by Terrence above that we should endeavor to raise the statistical and analytical skills of data consumers.

On many levels I agree, and of course I would love to see a more mathematically inclined world, but on the other hand it seems more fruitful to design visualizations that work with the perceptual abilities humans already have. If we can pinpoint the qualities that underlie our collective spatial and quantitative abilities, we have much less need for wider mathematical knowledge. Tufte is popular, but while he's had a significant impact on, say, the graphic designers at the NY Times, I wouldn't say he's had a significant impact on the readership.

It also seems to me that there are a lot of inherent dangers in stratifying or escalating the skills necessary to consume data effectively. The more formalism we build into these methods, the more we leave ourselves open to groupthink. Having seen a lot of this unfold in financial models recently, I think that when you build assumptions about the end user's academic knowledge into your work, there are two very likely and very unfortunate outcomes:

1. If the user understands your assumptions too well, it's possible to get lost in them and leave off otherwise important context that doesn't fit cleanly into them. Think of this as an inability 'to see the forest for the trees'. This exposes all of us to the 'long tail' of the hidden consequences of narrow mindedness.

2. If we're making assumptions about our users that require them to have undergone training outside of their own nature we're playing a dangerous game in a society where information flows so freely. If our audience changes (which in today's world a single cross-blog post can do), we run the risk of saying two different things to two different groups of people. Our figures could be used as justification for opposing viewpoints. Also likely is the exploitation of my credibility as an analyst to convey distorted information to the less educated.

These outcomes are dangerous, and although we're nowhere near that level of cultural severity at the moment, an increasingly data-sensitive planet will likely have more and more conflict derived from misperceived data. If we can be deconstructionists appealing to the fundamentals of human perceptual nature we can help avoid or stave off this slippery slope into weaponized information usage.

fxchen wrote:

I really enjoyed most of the readings this week (though the Card paper was a bit dry). Chapters 2 & 3 in Tufte's text were the most interesting for me. Many visualization (mis)representations in reputable publications pass all too easily as true information in my mind. Anyway, I usually learn best through hands-on experience, so I think the data re-design we're doing today is particularly valuable for learning how to transform a bad visualization into a good one (my group's example is from Good Mag).

For those who are interested in collaborative visualization, I would recommend you check out this IBM research: http://manyeyes.alphaworks.ibm.com/manyeyes/ It's pretty rad

cabryant wrote:

In Lecture 2, a distinction is made between Data Models and Conceptual Models. My appreciation of the latter has grown considerably, particularly during the group analysis of the initial project submissions. In the absence of a guiding Conceptual Model, there is a tendency toward accentuating all of the available data. The result is a visual display that may hide the critical trends and data points, by the very fact that all aspects are treated equally in the visual-perceptual space.

One example of a visual representation with a strong conceptual underpinning is Florence Nightingale's coxcomb depicting causes of death to British soldiers during the Crimean War. Nightingale's admitted objective, "to affect thro' the Eyes what we fail to convey to the public through their word-proof ears," concedes to the criticism that the representation visually exaggerates the number of deaths due to preventable diseases by depicting a linear quantity as an area. However, Nightingale's visualization convincingly expresses a conceptual model that transcends numbers. It is likely that her objective was to depict a moral failing, as a wartime death due to preventable disease could be considered a tragedy of greater proportion than a death due to battlefield injury. The former represents a failing of government to adequately protect troops who have dedicated their lives to country (the national psychological impact of which is likely profound), while the latter is an assumed and accepted consequence of war. As such, Nightingale's choice of visualization is an arguably acceptable approach given her conceptual model.

malee wrote:

Tufte's analysis of the NY state government spending chart (pg. 67) unravels the 'graphical gimmicks' used to exaggerate the actual data. Although I agree that chartjunk and (over)exaggeration detract from a visualization, sometimes charts with 'hyperactive design' are just more interesting and fun to look at. I dislike unnecessary use of a third dimension too, but if a 3-D and a 2-D chart were on the same page of the newspaper, I'd probably want to look at the 3-D one first. The goal, I suppose, is to be minimalist enough to present the truth while still being visually appealing.

rnarayan wrote:

Valid point (cabryant) about the separation of the conceptual model from the data model. A related, but perhaps more common, issue in visualization design is that of partitioning the data itself to show or accentuate what is relevant or desired. The present explosion in both the number of recorded instances and the dimensionality of data for any non-trivial space cannot be visually accommodated without proper slicing and dicing; large data warehouses are a case in point.

The population census pyramid presented in class was one nifty example of data partitioning mechanics. The stacked graph paradigm that was demonstrated selectively brings into view the visual for the variable of interest, such as geographic region, marital status, etc., and this selection is strictly under user control. If designs are able to achieve such a clean separation and a logical mapping for slicing and dicing the data, discerning trends and patterns can become synonymous with viewing them.

I wonder what other known/documented techniques exist for data partitioning. Is there a coherent methodology or theory that supports them? As a tiny, concrete illustration of the slice/dice step itself, see the sketch below.
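
A minimal pandas sketch of slicing and dicing; the table and every column name in it are invented for illustration:

    import pandas as pd

    # Hypothetical census-style records.
    df = pd.DataFrame({
        "year":       [1990, 1990, 2000, 2000, 1990, 2000],
        "region":     ["West", "East", "West", "East", "East", "West"],
        "population": [10.2, 12.4, 11.8, 13.1, 3.3, 2.9],
    })

    # "Dicing": one partitioning variable per axis of the pivot.
    by_region = df.pivot_table(index="year", columns="region",
                               values="population", aggfunc="sum")

    # "Slicing": the viewer (or a UI control) picks which partition to show.
    print(by_region["West"])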

wchoi25 wrote:

I must echo previous comments and say the Card & Mackinlay paper did not go sufficiently in depth to provide insightful details about what exactly the framework reveals about general principles behind visualization, or why and how exactly such an analysis helps us distinguish between good and bad visualizations. The examples were interesting enough, but I'm not sure the charts used to concisely show which visual encodings were mapped to which variables were the best visualizations themselves. I found it hard to keep referring back to what each of the abbreviations meant, and the charts didn't reveal much in the way of "trends" or "patterns" in encodings.

From the Tufte reading, I found Marey's graphical train schedule (p.31) to be one of the more fascinating ones. It uses only a 2D position encoding with lines, but manages to capture much more data in condensed space, making it easy to find train speed, departure/arrival times, duration of travel, possible connection routes, fastest connections, and distance between cities.

This is interesting because the lines themselves are constructed from only very basic information: each connects two points in the (place, time) 2D space. What's powerful is the information this implicitly exposes. Slope gives you speed; visually, diagonals running from top-left to bottom-right go one way and those running from bottom-left to top-right go the opposite direction; places where lines meet or come close together are natural transfer points; and so on. It is interesting to think about how these kinds of implicit information could be utilized in a visualization. A rough sketch of how little is needed to draw such a chart appears below.
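
A minimal matplotlib sketch of a Marey-style schedule; the stations, distances, and times below are made up:

    import matplotlib.pyplot as plt

    # Made-up stations (km from origin) and (time, distance) stops per train.
    stations = {"Paris": 0, "Dijon": 315, "Lyon": 512}
    trains = [
        ("Paris to Lyon", [(8.0, 0), (11.5, 315), (13.5, 512)]),
        ("Lyon to Paris", [(9.0, 512), (11.0, 315), (14.5, 0)]),
    ]

    # Time on x, distance on y: each train is one polyline; steeper slope
    # means a faster train, and crossings are potential transfer points.
    for name, stops in trains:
        times, dists = zip(*stops)
        plt.plot(times, dists, marker="o", label=name)

    plt.yticks(list(stations.values()), list(stations.keys()))
    plt.xlabel("time of day (hours)")
    plt.ylabel("distance along the line")
    plt.legend()
    plt.show()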

nmarrocc wrote:

Maybe in some cases it can be a good thing to violate graphical integrity. Let's say you have a graphic and you are representing one-dimensional data using area. Instead of depicting the actual difference in whatever you're measuring, you exaggerate things by 50% (the lie factor, as Tufte would say). You could argue that by exaggerating things a bit you're not actually deceiving, you're just making it easier for the viewer to see your point, which (sometimes) is the whole purpose of having a graph in the first place. I bet if you did a study, showed one group of people an honest graph and another group one that was exaggerated by 50%, and then asked them all what they thought the designer's main point was, you would not be able to distinguish between the two groups of responses. But if you then tested which graph was easier to read, you might find that the dishonest one was easier. Of course, this only works if the purpose of the design is to convey a point and not to aid cognition.

jqle09 wrote:

I think having a slight lie factor to emphasize your point is reasonable as long as the actual numbers are provided, a defense Tufte mentions at the end of the chapter. Depending on the context, I think such visualizations can be all right. Nightingale's pie chart is a distortion that I think was necessary to evoke the intended emotional response from me, particularly because I think preventing preventable deaths is important. But data distortions for political ends usually incite my ire, particularly when they concern government spending, as in the New York State budget graph in Tufte.

Also I thought it was interesting when Tufte noted that time series graphs are so widely used in publications. I suppose this happens because many interesting stories happen over time and visualizing progression across time particularly evokes emotional responses to our past experiences.

Also, as many people have already mentioned, the Card and Mackinlay paper seemed to lack focus (and sadly was also very dry). I thought it was a very comprehensive paper with regard to what types of visualizations are out there, but I was expecting more information on how we can infer from our data which mapping is most useful and then how to apply it. Their notation for the different visualizations also seemed difficult to parse and understand, so I am not sure how conducive it is to automatic visualization generation.

joeld wrote:

I really like Mackinlay's approach of trying to formalize visualization as a space and find automated ways of generating "good" visualizations using rules. Even if such a system didn't produce results superior to a human's, the existence of a formalism provides a way to compare different visualizations.

One could even imagine rules that are completely data-driven: for example, mapping variables to different visual feature types based on the distribution of the data in each variable. PCA does this in effect by finding a new coordinate system that expresses the greatest variation in the data with the fewest vectors, at the cost of mixing the input dimensions. We could do something simpler with a "Principal Axis Analysis": choose what to plot based on the most salient features, without mixing. A rough sketch of both ideas follows.
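
A minimal numpy sketch of the two alternatives, on toy data with invented per-column scales; "Principal Axis Analysis" here is taken loosely to mean ranking the original columns by variance:

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy data table: rows are records, columns are variables with made-up scales.
    X = rng.normal(size=(200, 5)) * np.array([5.0, 0.5, 2.0, 0.1, 1.0])

    # PCA: new axes that mix the original variables, ordered by variance explained.
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:2]
    pcs = X @ eigvecs[:, top]          # coordinates on the top two components

    # The simpler idea: keep the original variables, rank them by variance,
    # and map the two most variable ones to the x and y axes directly.
    top_two_columns = np.argsort(X.var(axis=0))[::-1][:2]
    xy = X[:, top_two_columns]

    print("columns chosen without mixing:", top_two_columns)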

gankit wrote:

Echoing some of the previous comments, I was amazed at the richness of the analysis presented in Tufte's second chapter on Graphical Integrity. It gives us a lot of insight into how one should avoid misleading the reader and should present the data truthfully, but it also lets us understand how the media are capable of biasing our opinions and imposing their own. I was particularly surprised by the budget vs. time graph (pg. 67) and how 3-D elements (and the optical illusions they introduced) exaggerated the data.

But, looking at it from another perspective, data graphics are essentially supposed to let the user get answers to certain questions and so evoke a certain response. Thus, it is only natural that the producer of the graphic would want to make his or her view highly apparent (and maybe cross the line in the process). Understanding how (not) to bias user opinion could be a really useful tool if you want to evoke a strong response from people while still following the honor code.

cabryant wrote:

Of the four uses of color enumerated by Tufte, the nominative, the decorative, and the imitative carry the most promise. In fact, the only quantitative renditions of color that seem reasonable are those that seek to imitate reality (e.g., visual correlates of depth, temperature, and the like). Those that do not imitate reality run the risk of unnecessarily violating Tufte's maxim: above all, do no harm.

One color effect that deserves additional consideration is the desirable instances of 1 + 1 = 3. The road map on page 93 of Envisioning Information makes the case for color use that blurs the boundary between decoration and nominality. Although this aspect of color would be incredibly difficult to automate (indeed, it would likely be foolhardy to do so), there may be merit in providing support for such manual alterations in visualization software.

Finally, the description of the use of Munsell's coloring scheme to systematize the world of manufacturing raises the question: although Munsell drew upon nature as the source of inspiration for his schemes, has this application altered our collective perception of color, subtly affecting and/or standardizing our preferences?
