


Lecture on Sep 23, 2010. (Slides)


  • Required

    • Chapter 1: Graphical Excellence, In The Visual Display of Quantitative Information. Tufte.
    • Chapter 2: Graphical Integrity, In The Visual Display of Quantitative Information. Tufte.
    • Chapter 3: Sources of Graphical Integrity, In The Visual Display of Quantitative Information. Tufte.
    • Levels of Measurement, Wikipedia.

  • Optional

    • The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations, Shneiderman, Proc. IEEE Conference on Visual Languages, Boulder 1996. (pdf)

    • The Structure of the Information Visualization Design Space. Stuart Card and Jock Mackinlay, IEEE InfoVis 97. (pdf)

    • On the theory of scales of measurement. S.S. Stevens. (jstor)


gdavo wrote:

During the lecture we saw that "color value" could be used to encode a continuous quantitative variable, but that this was controversial. Indeed, human perception of weight, sound, and brightness obeys Weber's law to a certain extent. Weber conducted experiments showing that the smallest perceivable difference in a stimulus (such as brightness) is roughly proportional to the starting magnitude of the stimulus. In the case of brightness:

$\Delta I / I = K_{\text{Weber}} \approx 1\text{--}2\,\%$, where $I$ is luminance, measured in cd/m².

So brightness is effectively perceived in discrete steps rather than continuously. Given typical display contrast ratios (cathode ray tube: 100:1; print on paper: 10:1), this law helps explain why digital images are generally quantized to 256 levels.

Source: pp. 6-8
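
As a rough back-of-the-envelope check, assume pure Weber's law: each just-noticeable step multiplies luminance by a constant factor (1 + K). The number of distinguishable levels across a contrast ratio C is then about ln C / ln(1 + K). The sketch below evaluates this for the contrast ratios and Weber fractions quoted above; it is an idealization, since real perception departs from Weber's law, especially at low luminance.

```python
import math


def jnd_steps(contrast_ratio: float, weber_fraction: float) -> float:
    """Approximate number of just-noticeable luminance steps between the darkest and
    brightest levels, assuming each step multiplies luminance by (1 + weber_fraction)."""
    return math.log(contrast_ratio) / math.log(1.0 + weber_fraction)


# Contrast ratios and Weber fractions taken from the comment above
for medium, ratio in [("CRT", 100.0), ("paper", 10.0)]:
    for k in (0.01, 0.02):
        print(f"{medium:5s} {ratio:>5.0f}:1  K={k:.0%}  ~{jnd_steps(ratio, k):.0f} steps")
```

Under these assumptions the estimates range from roughly 116 steps (paper, K = 2%) to roughly 463 steps (CRT, K = 1%), the same order of magnitude as the 256 levels of 8-bit images.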

mariasan wrote:

Before I forget, here's the info for free Amazon Prime.

jasonch wrote:

The "trivariate data" example in today's lecture reminds me of this TED Talk, whose very impressive visualizations partially inspired me to take this class.

In one example, it uses position to encode fertility rate and life expectancy, size for population, color for continent, and animation for time. It expresses five dimensions in a single, very expressive visual.

esegel wrote:

My favorite principles from the reading:

  1. “Show data variation, not design variation.”
  2. Data should be shown in context, answering the question "Compared to what?"
  3. Write out explanations and label important events on the graphic.
  4. The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.

rakasaka wrote:

Clarity, context, and consistency: I find it interesting that design ideas that seem rational at first glance can be so misleading. I am slightly apprehensive about the suggestion that most graphical work, "particularly at news publications, is under the direction of but a single expertise - the artistic". Surely journalistic integrity has advanced to the point where bias need not and should not creep into the most salient means of communication? It is perhaps reassuring, at least, to realize that graphical representation does require careful attention to the math, even though that isn't the first thing one notices when seeing a graphic.

selassid wrote:

I have been thinking about the differences between sci-vis and info-vis a bit more. We discussed that some of sci-vis's feel comes from there being obvious graphical dimensions for the physical dimensions being measured (position, time, etc.). I don't think we pointed out the need for sci-vis to show quantitative relations very explicitly. Yes, an info-vis or a sci-vis could be based on tables and tables of accurate numbers, but a scientific argument is often trying to show the applicability of some mathematical model. I would say it's unacceptable to provide a visualization in support of a model that does not demonstrate quantitatively how well that model fits; e.g., it is hard to show a linear fit to data encoded as point shape, or to show statistical analyses on color. This might just mean that sci-vis is more limited in the number of variables it can show at once, since you may be restricted to the most numerically expressive graphical techniques, or that you'd have to compare the model with the data separately.
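
One way to act on this is to report the fit numerically alongside the plot. Below is a minimal sketch in plain Python (no plotting library assumed) that computes an ordinary least-squares line and its coefficient of determination R²; the sample measurements are made up for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared). Requires at least two distinct x values
    (otherwise sxx is zero) and non-constant ys (otherwise ss_tot is zero)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical measurements
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r2 = linear_fit(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r2:.3f}")
```

Printing the fitted equation and R² in the caption or legend is one way to let a position-based chart carry the quantitative evidence for the model.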

rakasaka wrote:

@jasonch I was thinking the same. Apparently Google bought his software and renamed it Google Motion Chart.

jorgeh wrote:

One of the goals of visualizations, as we have discussed in class, is to pose a question and possibly to offer an answer to that question. Even though I agree with that goal, I also think another potential use of data visualization is to help us find questions that we didn't think of at the beginning. In that sense, systems that automatically propose different visualizations of a dataset, such as the one proposed by Mackinlay (and probably others I don't know of), could be a useful tool for finding those questions, don't you think?
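
To make the idea concrete, here is a toy sketch in the spirit of Mackinlay's approach; it is not his APT system. Each data field's level of measurement selects a ranked list of visual channels, and fields (in priority order) are greedily given the best channel not yet taken. The rankings are abbreviated subsets of Mackinlay's effectiveness orderings with their relative order preserved; the greedy strategy, the capacity rule for position, and the field names are illustrative assumptions.

```python
# Abbreviated effectiveness rankings per level of measurement (most effective first).
# Relative order follows Mackinlay's rankings; many channels are omitted for brevity.
RANKINGS = {
    "quantitative": ["position", "length", "angle", "area", "density",
                     "color saturation", "color hue"],
    "ordinal": ["position", "density", "color saturation", "color hue",
                "length", "area"],
    "nominal": ["position", "color hue", "density", "color saturation",
                "shape", "length", "area"],
}

# Assumption: a 2D chart offers two positional axes (x and y); other channels are used once.
CAPACITY = {"position": 2}


def assign_channels(fields):
    """Greedily map (name, level) pairs, in priority order, to the best still-available channel."""
    used = {}
    assignment = {}
    for name, level in fields:
        for channel in RANKINGS[level]:
            if used.get(channel, 0) < CAPACITY.get(channel, 1):
                used[channel] = used.get(channel, 0) + 1
                assignment[name] = channel
                break
        else:  # loop finished without break: every ranked channel is already taken
            raise ValueError(f"no channel left for field {name!r}")
    return assignment


# Hypothetical dataset fields, most important first
catalog_fields = [("price", "quantitative"), ("weight", "quantitative"),
                  ("rating", "ordinal"), ("maker", "nominal")]
print(assign_channels(catalog_fields))
```

Mackinlay's actual work also applies expressiveness criteria (whether a channel can encode the data without implying false structure) and reasons about whole chart designs; this greedy pass only illustrates the ranking idea.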

trcarden wrote:

@jorgeh I agree completely. However, let's take it a step further for fun. Imagine a machine that could find statistically significant patterns or outliers, visualize them, and, based on the dimensions of the data, propose possible facts rather than questions. For example, if you look at Tufte's reproduction of the space shuttle O-ring damage visualization, the fact we deduce is that the launch-day temperature was unsafe because of the high potential for O-ring damage. Do you think it would be possible to create a crude algorithm that generates facts from the data labels/dimensions and the outlier/pattern areas? I think fact generation with supporting visualizations would be a really cool extension to Mackinlay's work.

Also, if you are interested in Mackinlay's research, he published a book that collects quite a few of his research contributions. Here is the Amazon link.

acravens wrote:

Long story short, my econometrics-steeped roommate saw the van Wijk reading from Tuesday on the kitchen table, and we got into a discussion about the difference between data visualization and statistics. My claim was that they are sometimes complementary modes of analysis and sometimes alternative strategies. The complementary aspect seems clear from yesterday's readings on communicating statistical information and from the lecture (e.g., plotting things on a graph to find initial patterns in your dataset, as we did in class yesterday, and then using what you find to inspire new questions about which variables to investigate further with statistical techniques). But I don’t think my roommate is very familiar with dynamic, algorithm-based visualizations that are used to think with, rather than to communicate insights generated using statistics. I think that’s a case where statistics and visualization may actually be alternative methods. So I'm curious what others think. What characteristics of the data lend themselves to one or the other? Is this related to the info-vis vs. sci-vis debate at all?

yanzhudu wrote:

@trcarden Outlier detection algorithms are already in use. For example, computer security has statistical anomaly-based intrusion detection systems (IDS).

Essentially, such a system assumes that illegal access (e.g., a hacker trying to copy everything out as fast as possible) has a statistically different access pattern from legitimate access (e.g., small edits). Illegal accesses therefore show up as outliers that an algorithm can detect.
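
The core idea can be sketched with a simple z-score rule: flag any session whose access rate lies more than a chosen number of standard deviations from the mean. This is only an illustration on made-up numbers; real anomaly-based IDSs use much richer features and models, and the threshold of 2 standard deviations is an arbitrary choice for this toy data.

```python
import statistics


def flag_outliers(values, threshold=2.0):
    """Return indices of values more than `threshold` population standard deviations
    away from the mean (a simple z-score rule)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


# Hypothetical files read per minute by ten sessions; the last one copies data in bulk.
rates = [3, 4, 2, 5, 3, 4, 3, 2, 4, 120]
print(flag_outliers(rates))
```

A single extreme value also inflates the standard deviation it is measured against, so with small samples, robust statistics such as the median absolute deviation are often preferred.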

On the other hand, visualization, in some applications, deliberately capitalizes on the human visual system's ability to detect patterns and outliers; for example, visualization is used to discover gene expression patterns in cancer.

andreaz wrote:

I was curious to read more critiques of Bertin's fundamental graphic variables, and I came across an article by Alan MacEachren that sums up some of the research on this topic. Some of the main points MacEachren makes:

• There is a distinction between density and size of texture elements. The size of texture elements corresponds to Bertin's variable grain, but density of texture elements is unaccounted for.

• Bertin's variables do not account for saturation, which is often used in visualizations to distinguish foreground from background and to represent the degree of certainty in the data.

• Bertin's variables are not based on empirical research or cognitive theory. Despite this fact, most of Bertin's intuitions have been supported by research.

• Evolution in graphic display technology has resulted in more potentially fundamental variables, including transparency and crispness. Developments in technology have made it easy to add touch, temporal variation, and sound to graphic representations, prompting some researchers to propose sets of tactile, dynamic, and sonic variables modeled on Bertin's approach.

Here's a link to the pdf of the article:

jtamayo wrote:

Prof. Heer suggested that there is a mistake in the "Large design space (visual metaphors)" slide. Graphs 2, 5, and 8 suggest there are some linear trends in the data, and they are drawn explicitly in graph 8. This might be misleading, though, since the factors A, B, C, D, E are just nominal data, and a linear trend whose independent variable is nominal makes no sense.
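
A quick way to see why such a trend line is meaningless: nominal factors have no inherent order, so fitting a line requires first mapping them to numbers, and any permutation of that mapping is equally valid. The sketch below, using made-up values for factors A to E, fits a least-squares slope under two arbitrary codings and gets different answers, even with opposite signs.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical values for the nominal factors A..E
values = {"A": 10, "B": 14, "C": 9, "D": 20, "E": 12}


def slope_for_order(order):
    """Code each factor by its position in `order` (an arbitrary choice) and fit a line."""
    xs = list(range(len(order)))
    ys = [values[f] for f in order]
    return ols_slope(xs, ys)


print(slope_for_order("ABCDE"))  # alphabetical coding: slope 1.0
print(slope_for_order("DEBAC"))  # another equally valid coding: slope -2.4
```

Because the "trend" depends entirely on an arbitrary ordering, it reflects the coding rather than the data.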

amirg wrote:

Also, on the "Large design space" slide, Row 2 in Graph 7 corresponds to Row 3 in Graph 8 and Row 3 in Graph 7 corresponds to Row 2 in Graph 8. That's a fairly minor detail, but the reason I noticed it is that I was trying to understand the visual encoding of Row 2 in Graph 9, which seems to be done by the size of the points. But there is also an axis labeled "2" extending out of the page in this graph, so I guess the size of the points is also supposed to reflect their position along this axis. Additionally, by using the size of the points to represent the value of Row 2, the graph uses an area to represent one-dimensional data, which Tufte argues against in the chapter on graphical integrity. In general, I think one has to be very careful when using size as a visual encoding for data because it has great potential to be misleading, which is shown clearly in the reading.
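
The size pitfall is easy to quantify. If a circle's radius is scaled in proportion to the data value, its area (the quantity the eye compares, under this sketch's assumption) grows with the square of the value; scaling the radius by the square root of the value keeps area proportional. The sketch assumes circular marks and computes Tufte's lie factor (size of effect shown in the graphic divided by size of effect in the data) for a value that quadruples.

```python
import math


def relative_change(old, new):
    return (new - old) / old


def circle_area(radius):
    return math.pi * radius ** 2


def lie_factor_for_circles(v1, v2, radius_of):
    """Tufte's lie factor when values v1 -> v2 are drawn as circles whose radius is
    radius_of(value), taking circle area as the effect shown in the graphic."""
    shown = relative_change(circle_area(radius_of(v1)), circle_area(radius_of(v2)))
    actual = relative_change(v1, v2)
    return shown / actual


naive = lie_factor_for_circles(1, 4, lambda v: v)   # radius proportional to value
area_scaled = lie_factor_for_circles(1, 4, math.sqrt)  # area proportional to value
print(f"radius proportional to value: lie factor {naive:.1f}")
print(f"area proportional to value:   lie factor {area_scaled:.1f}")
```

Quadrupling the value gives a lie factor of 5.0 with naive radius scaling and 1.0 with square-root scaling.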

trcarden wrote:

@yanzhudu: Interesting note on the outlier detection algorithms for hacker detection (looks really cool!); however, the distinction between the algorithmic approach and the data visualization approach is still blurry for me.

I agree that data visualization capitalizes on the human visual system and our native ability to make sense of visual data, but how do we create effective visualizations automatically? That's what I want to figure out. From the lecture, it seems like Mackinlay’s approach is very similar to what I am advocating, except it was done before I was born (i.e., a long time ago). What I would be interested in is whether new AI methods or additional computational power would allow it to serve as the basis for a visualization design tool. In this regard, I see visualization going hand in hand with statistical methods.

msavva wrote:

After seeing Tufte's tables on the topic of graphical sophistication in the press/education and his subsequent argument that primary school age children are easily capable of interpreting bivariate plots, I was reminded of various visual interface designs in video games. Tufte's claim somehow clicked in my mind with another mistaken popular belief that the majority of video games are cognitively simplistic (i.e. childish). In my opinion many video games exemplify how players are expected to quickly internalize novel visualizations of complicated player states, often with minimal explicit instructions. Just a few examples that come to mind:

- Overview mini-maps, multiple unit selection and range indicators in most strategy games (cf. overview/detail, focus/context, brushing in Card et al's formalization of the visualization model -- some early examples such as "Dune II" predate the book by almost a decade)

- Inventory management and organization screens with area-based representations of equipment size/weight (e.g., the grid system in "Deus Ex"; arguably this is an instance of a visualization with a Lie Factor not equal to 1, since size or weight is not proportional to 2D area)

- Player-centric orientation, state, and enemy direction indicators (note the multiple color-coded radar rings and sectors centered on the player's robot in "Zone of the Enders: The 2nd Runner"; many modern jet fighters have heads-up display systems that seem to rank lower in expressiveness and effectiveness)

In all of the above, much can probably be said about how the game's inherent interactivity creates a feedback loop that trains the player to interpret the interface visualization. Video games can serve as good examples of how interaction and animation make visualizations accessible.

adh15 wrote:

One of Tufte's principles of graphical excellence is that a visualization should use the smallest space possible. This principle could be improved by citing the need to use the smallest perceptual space, since efficiency in the use of physical space (area) and resolution (dots/pixels per inch) is different than efficiency in the use of perceptual space and resolution. To this end, the designer must know the anticipated viewing distance, lighting conditions, and other attributes of the viewing environment before making any judgement about the efficiency of perceptual space & resolution. Through the end of chapter 3 (and based on a quick glance at chapter 8), Tufte seems to assume that the designer is creating a visualization meant to be viewed by one person at a time, in a well-lit office, and at a distance of about 18 inches. To his credit, such scenarios are common, but in general, visualization designers should not make assumptions about the settings in which their work will be used.
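
To make the viewing-distance point concrete: what matters perceptually is visual angle, not physical size. Assuming a visual acuity limit of about 1 arcminute (a common textbook figure for normal vision), the smallest resolvable detail grows linearly with viewing distance. The sketch computes that size for the 18-inch reading distance mentioned above and, as a hypothetical comparison, a screen viewed from 10 feet.

```python
import math

ACUITY_ARCMIN = 1.0  # assumed resolution limit of normal vision, in arcminutes


def min_detail_size(distance, acuity_arcmin=ACUITY_ARCMIN):
    """Smallest feature size (same units as `distance`) that subtends `acuity_arcmin`
    at the eye: size = 2 * distance * tan(theta / 2)."""
    theta = math.radians(acuity_arcmin / 60.0)
    return 2.0 * distance * math.tan(theta / 2.0)


for label, inches in [("desk, 18 in", 18.0), ("screen, 10 ft", 120.0)]:
    size = min_detail_size(inches)
    print(f"{label}: ~{size:.4f} in (about {1 / size:.0f} resolvable details per inch)")
```

Under this assumption, a detail of about 0.005 inch is resolvable at 18 inches, while at 10 feet details must be about 6.7 times larger, so the same graphic's "smallest perceptual space" changes with the setting.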

Along these lines, does anyone know whether user testing is common in the design of visualizations for various media (e.g. newspapers, academic journals)? If it is not, facilitating such testing could be an interesting design opportunity.

gneokleo wrote:

I found Tufte's chapter on graphical integrity very interesting. Tufte makes a good point that most people who produce graphics are trained in the fine arts rather than in an analytical field like statistics, and they often focus on the "visual", artistic part of a visualization, when in fact the right balance of accuracy, correct representation, and detail must exist for a visualization to be good. I also thought the Lie Factor equation was an interesting way of figuring out whether a visualization is lying, but will people really spend the time to investigate whether a visualization is lying? The purpose of a visualization is to let people look at it quickly and draw conclusions fast (especially in a magazine or newspaper with a big audience).

ankitak wrote:

I have been thinking about the social applications of visualization, and how the principles of visualization we discussed in class might differ for mobile devices, considering the small screen size, limited hardware and bandwidth, and the possibility that the user is on the move and needs information in a concise form.

I came across an interesting introductory paper by Nokia on visualization and graphic design principles for mobile devices here.

Though this paper goes on to discuss many graphic design issues not relevant in the current context, Section 3 collects a few basic but interesting principles:

1. Various colors and hues are generally used as standard indicators. Even though the use of color is controversial because people perceive it differently, as discussed in class, certain basic colors are almost universally used to convey the same meaning.
2. Animation may be a great way to get the user's attention, but it should be used judiciously on a mobile device.
3. Sounds can be used as an aid to visualization.

lekanw wrote:

It seems like, in the age of computers, animation, and interaction, Tufte's space-minimization principles should be modified to allow for ways of interacting like panning and zooming, or more generally, drilling down and rolling up. A simple example is the visualizer for the Netflix movie clustering data built with the GUESS package, located here. When representing such data visually, we should ideally optimize not just the space the visualization uses, but also the space used by the subset of data present on the screen, and the ease of bringing the relevant parts of the rest of the data into view. The relevance of off-screen data brought on screen, and the ease of that transition, then become paramount.

emrosenf wrote:

@ankitak You got me thinking about colors in mobile apps and SaaS web apps. I'm very interested in optimizing conversion funnels, and a surprising result I discovered recently is that red buttons convert better than green buttons. See these 'studies': here, and here.

This seems to contradict what you might expect from red, which is usually an urgency color (error, warning, emergency). Green, on the other hand, makes one think of "go", "money", "start".

Perhaps red is better at drawing attention, and here attention is paramount? I wonder if anyone has ever categorized colors by power in an application-specific way.
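
Before drawing conclusions from button-color experiments like these, it helps to check whether a difference in conversion rates could plausibly be noise. A standard tool is the two-proportion z-test; the sketch below implements it with the standard library only. The visitor and conversion counts are made up for illustration and are not taken from the linked studies.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates, using the
    pooled standard error. Returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical A/B test: red button (a) vs. green button (b), 2000 visitors each
z, p = two_proportion_z_test(conv_a=230, n_a=2000, conv_b=200, n_b=2000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up numbers, a 15% relative lift (11.5% vs. 10%) gives z of about 1.53 and p of about 0.13, which is not significant at the conventional 0.05 level; small tests of this kind deserve some skepticism.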

jbastien wrote:

Tufte is at times fascinating and mind-opening, and at times frustrating in how he presents things. His view of the world is Manichean, and he often exempts himself from his own criticism.

He talks a lot about the evils of using pretty visualizations to advance one's point, but he's obviously very good at using writing to advance his own points.

I especially like that a self-published statistician "filled with all sorts of opinions about design and typography" ridicules book publishers for being "appalled at the prospect that an author might govern design" while saying that "allowing artist-illustrators to control the design and content of statistical graphics is almost like allowing typographers to control the content, style, and editing of prose." He obviously believes the converse of his argument doesn't apply.

I doubt statisticians always have pure motives and artist merely want to create nice pictures. I strongly agree with his questioning, since it raises points worth taking into consideration, but I think what he often despises is more easily attributed to ignorance than to malice.

dlburke wrote:

In Prof. Heer's critique of the visual encoding variables, he suggests adding area and volume alongside length. But as I understand it, the variables deal with images, which are inherently 2D. Since the image is static, volume cannot actually be measured from it: the 3D appearance comes from color and texture cues, and while the projection of a 3D object may suggest depth, that depth does not really exist in the image. Therefore, even if one could argue that volume can be a visual encoding in an image, it does not seem like it would be a very useful one.

nikil wrote:

I was rather surprised to see in the Wikipedia article on levels of measurement that there is still debate over the system for classifying measures. Specifically, a system of 10 different levels has been proposed for geographic cartography. The system of nominal, ordinal, and quantitative has worked fairly well for the examples we have seen in class, but I am curious to explore new measures for complex examples.
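
For reference, Stevens' classic scheme ties each level of measurement to the summary statistics that remain meaningful, and each level inherits everything permitted at the levels below it; the lecture's quantitative category covers Stevens' interval and ratio levels. A minimal sketch encoding a few rows of that table (the four classic levels only; richer schemes such as the ten-level cartographic one are not modeled) might look like this:

```python
from enum import IntEnum


class Level(IntEnum):
    """Stevens' levels of measurement, ordered so each level permits everything below it."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Statistic -> weakest level at which it is meaningful (after Stevens, 1946)
MIN_LEVEL = {
    "mode": Level.NOMINAL,
    "median": Level.ORDINAL,
    "percentile": Level.ORDINAL,
    "mean": Level.INTERVAL,
    "standard deviation": Level.INTERVAL,
    "coefficient of variation": Level.RATIO,
}


def permissible(statistic, level):
    """True if `statistic` is meaningful for data measured at `level`."""
    return level >= MIN_LEVEL[statistic]


print(permissible("mean", Level.ORDINAL))  # False: ranks have no meaningful distances
print(permissible("median", Level.RATIO))  # True: stronger scales inherit weaker operations
```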

anomikos wrote:

During the lecture I couldn't stop thinking about the possibility of "lying" about anything with a good visualization. Other commenters like @gneokleo have also identified the problem, which unfortunately runs deeper than simple artistic beautification of a visualization. What if someone wants to explicitly steer viewers toward a specific conclusion? The US went to war against Iraq because of some photos. We base our decisions on information, but what if the information has been tampered with? Cognition plays a more important role for us humans than data does. Are there any intuitive methods we can learn to help us distinguish visualizations that lie? Can a visualization framework be defined in a way that aids this process?

msewak wrote:

I loved the chapter on graphical integrity. The lie factor for the Fuel Economy Standards for Autos chart is a great example. I thought the ordering of years was pretty unconventional. As Tufte points out, the lie factor was 14.8, which is very large.
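
The lie factor itself is simple arithmetic: the relative change shown in the graphic divided by the relative change in the data. Plugging in the figures Tufte reports for the fuel-economy chart (a standard rising from 18 to 27.5 miles per gallon, drawn as lines growing from 0.6 to 5.3 inches long) reproduces his value:

```python
def relative_change(old, new):
    return (new - old) / old


def lie_factor(data_old, data_new, graphic_old, graphic_new):
    """Tufte's Lie Factor: size of effect shown in graphic / size of effect in data."""
    return relative_change(graphic_old, graphic_new) / relative_change(data_old, data_new)


# Data: 18 -> 27.5 mpg (+53%); graphic: lines of 0.6 -> 5.3 inches (+783%)
lf = lie_factor(data_old=18.0, data_new=27.5, graphic_old=0.6, graphic_new=5.3)
print(f"{lf:.1f}")  # 14.8
```

A faithful graphic would have a lie factor close to 1.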

jeffwear wrote:

In reply to @anomikos and @gneokleo, perhaps a standardized visualization framework and rules of application, as might be implemented by something like Mackinlay's APT system, would aid in preparing "correct representations" of data. But I find the concept of "correctness" somewhat ambiguous in the context of preparing representations. There are two relevant processes at stake when mapping data to a visualization:

* choosing how to represent the data (the type of visualization)

* rendering the data within the chosen format

Certain datasets and types of data lend themselves to particular types of representations - for instance quantitative, statistical data to a boxplot - and in these instances it is also easy to verify that the data was rendered accurately in that representation.

But other types of data, for instance nominal categories, may not inherently suggest a certain visualization. It may become harder to verify that the data is represented correctly, for the rules of representation may no longer be bound empirically by the dataset but rather subjectively by the designer. We could verify that the representation is coherent with the data, given the designer's explanation - if we have the time to analyze the visualization, as @gneokleo points out, let alone access to the underlying data - but then, as @anomikos suggests, the designer might have chosen, quite faithfully, to emphasize certain aspects of the data even at the expense of other important aspects.

I think that as we might expect to compare multiple secondary news sources to arrive at the best depiction of what did "actually happen", we should seek to compare multiple visualizations of a certain dataset to arrive at definitive conclusions regarding its contents.

A counterpoint to further the discussion: to minimize misinterpretation by the viewer and to fully convey the intended message, might the most effective visualizations attempt to constrain interpretations of the data?

skairam wrote:

Above, @jbastien said: "I doubt statisticians always have pure motives and artist merely want to create nice pictures."

I think this is a great point.

So much of what I have read about visualizations assumes that there is an objective truth and that a "good" visualization is one that surfaces this truth as effectively as possible. The analogy to writing or painting is apt because those are domains where the roles of perspective and bias are clearly understood (or at least up for discussion). At least with respect to visualizations designed for communication (rather than exploration), the communication of content is inherently fraught with perspective and bias.

Great rhetoric consists of words which convince, and likewise, a great visualization should be judged by its ability to convince.

jsnation wrote:

I found the chapter on Graphical Integrity really interesting to read. Specifically, it is crazy that so many graphs are skewed to show someone's point of view rather than the facts in the data. I don't even know why some graphic creators felt the need to exaggerate their proportions that way, since the data seemed to support whatever trend they were trying to show without lying. It seems like this wasn't always a purposeful attempt to mislead people, but rather the result of the graphic artist making an artistic decision instead of following the statistics.

I also really like the list of principles for maintaining graphical integrity at the end of chapter 2. I think I have broken the principle that the number of dimensions depicted should not exceed the number of dimensions in the data. The book gave me the impression that graphic design used to be dominated by artists more than by the creators of the scientific data, but that this has been shifting more recently. It seems obvious that the people with the most knowledge about the data should be involved in creating visualizations of that data.

ericruth wrote:

I think the issue @anomikos brings up about "lying" or deceiving with visualizations is a really interesting and dangerous issue. There are so many ways to transform and slice data, that it almost seems possible to tell completely separate and seemingly contradictory stories about the same data set.

For me, this really drives home the importance of the data source. Prof. Heer mentioned this about the first assignment - and although it seemed a bit silly in the context of an assignment with a fixed data set - I definitely see why he emphasized the importance of the source.

Even with a source, however, it's by no means reasonable to expect all viewers to check the original data source. This is the scary part in my eyes, because the person presenting the data has so much control over how it is perceived (or even what parts are perceived). This makes me wonder if there are good techniques for presenting data in an easy-to-parse yet neutral manner: one that allows viewers to find their own story rather than being spoon-fed one.

asindhu wrote:

@ericruth and everyone else commenting about how visualizations can be deceiving, I just wanted to add my take on it:

The basic takeaway point from all of this discussion seems to be that if the way we design a visualization has so much impact on what conclusions we draw from the underlying data, we can imagine specifically designing a visualization with a certain "agenda," as it were, to convey the message we want it to convey. However, as Eric points out, there's more to it than that, because we have to take into account the underlying data. And I would take it a step further to say that in fact, the underlying data is really the only thing that matters, and here's why:

As educated viewers of visualizations, we have to approach each visualization with the understanding that it was designed to convey a certain message in the data. If it is a well-designed visualization, it's not going to convey any information that wasn't originally in the source data, it's just going to choose which message to highlight. In other words, a visualization can't fabricate something that's not in the data, but it will be selective about what higher-level messages it conveys from the data. As long as we approach a visualization with this understanding in mind, I don't think we need to be worried about being deceived or "lied to" by the design of a visualization.

avogel wrote:

@jasonch and @rakasaka, those are some remarkable charts. They seem a logical extension of having animation and database techniques (roll-up, drill-down, etc.) available. I hope to have the time to try working with those tools this quarter.

Also @rakasaka, regarding journalistic integrity, I take a more jaded view. I'm sure there was a period when journalistic integrity advanced, but I believe at some point it hit a natural plateau; as a race, humans have not changed that much. Perhaps the best we can hope for is that natural competition will ensure that the best information drowns out bad information. However, my fear is that the information that sells best will dominate, and the best information is not what sells.

Regarding the reading, I like Tufte's devotion to minimalism, but feel he takes it rather too far. I think the best way to validate his thoughts would be an extensive study of applying his principles; lacking good information on this, my gut instinct is that he undervalues tradition and familiarity in his critique of basic visualizations like scatterplots. Of course, were his principles to be commonly adopted and taught at young ages, they could readily become the standards.

In contrast to my own argument, or perhaps as an extension, I like @lekanw's point that we have new and different ways of interacting with visualizations and need to consider how that might affect Tufte's principles.

I'm coming to think of Tufte as an early philosopher - not so much valuable for exactly what he's saying, but for the way of thinking about his topic and his way of presenting it.

hyatt4 wrote:

In regard to one of @adh15's comments above:

Along these lines, does anyone know whether user testing is common in the design of visualizations for various media (e.g. newspapers, academic journals)? If it is not, facilitating such testing could be an interesting design opportunity.

I can only speculate about user testing for various media, but I do know there is considerable interest in testing visualizations that compare the different evaluation schemes that can be used to gather such information. For example, if you want to survey an audience about which designs work best visually, which visualization best helps that audience navigate the provided choices truthfully and effectively? It is a visualization design of a visualization design (which, I will admit, starts getting pretty abstract, but is still quite useful).

jayhp9 wrote:

@hyatt4 Extending your answer to adh15 about user testing: I asked in class today at what stage in the process of building a visualization user testing is suitable, if at all.

The answer, in brief: in many cases, the person making the visualization is the 'user' himself. This is because often, as in exploratory data analysis, we are interested in studying data and visualizations of that data to see what other questions we can come up with and explore. Therefore, if one wants to gain useful insights from a visualization, it should be made in a way that one likes, since that visualization is probably not going to be used by anyone else. Still, showing an intermediate visualization to an expert in the domain you are working in may be very useful, because he or she can point out insights you may not have known about.

Another side of the issue is when you are creating a website to post your visualizations for the general public. In this case, user testing is important to gauge how users interact with the visualization, what components attract their attention, what parts completely escape their notice, etc. It is usually preferable to have a small, carefully chosen group that is close to you, so that you can iterate quickly. This process is very similar to product design, where the user's opinion is often more reliable than that of the product's developer. By 'carefully chosen', I mean picking a few members from each of the various categories within your target user group.
