Lecture on Thursday, September 29, 2011. (Slides)

Readings

  • Required

    • Chapter 1: Graphical Excellence, In The Visual Display of Quantitative Information. Tufte.
    • Chapter 2: Graphical Integrity, In The Visual Display of Quantitative Information. Tufte.
    • Chapter 3: Sources of Graphical Integrity, In The Visual Display of Quantitative Information. Tufte.
    • Levels of Measurement, Wikipedia.

  • Optional

    • The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations, Shneiderman, Proc. IEEE Conference on Visual Languages, Boulder 1996. (pdf)

    • The Structure of the Information Visualization Design Space. Stuart Card and Jock Mackinlay. InfoVis 97. (pdf)

    • On the theory of scales of measurement. S.S. Stevens. (jstor)

Comments

eschkufz wrote:

I enjoyed today's discussion about automatically generating the best possible visualization for a dataset. However, it sounds like the big open issue in the area is how to extend the generality of the approach.

One idea is to relax the notion of 'best', and to consider designing a tool to serve as a 'creativity aid'. Ranjitha Kumar has done something similar in the area of web design: http://hci.stanford.edu/publications/2011/Bricolage/Bricolage-CHI2011.pdf.

Could the approach be extended to visualization? It does rely on a large pre-existing corpus of known-good exemplars, though. Does such a corpus exist for visualization?

yeyleo wrote:

I'm not sure there can be one generalized "best" possible visualization for a dataset. As Tufte points out with the maps of deaths from various types of cancer (pp. 16-19), although the maps are great at displaying vast amounts of data in a small amount of space, they still wrongly emphasize the geographical areas of the counties rather than the number of people living in each county or the number of people in a county who died from cancer. So the definition of "best" will vary depending on what story one wants to show the viewer. It also seems that standards in visualization are a bad thing, at least according to Tufte, who writes, "Not a word about deception; no tortured attempts to construct more 'graphical standards' in a hopeless effort to end all distortions" (p. 53). However, if standards are not desirable, then what should algorithms that attempt to create correct data visualizations do?

jojo0808 wrote:

It's discussions like today's that remind me that art and science are not as separate as I sometimes tend to think. Without an understanding of the data itself and the interesting questions that could be asked of it, a visualization will not be able to tell a very good story. Neither can the story be told without a sense of design and the ability to skillfully use visual properties to represent what the data is saying.

zhenghao wrote:

I was amazed by the work done in automatic generation of visualizations. To echo a comment in class today, why don't we have this on our computers? An interesting reference on this topic is Andrew Csinger's survey on the psychology of visualization. In section 3, Csinger discusses Bertin and Mackinlay's work on formalizing design in some detail and introduces the different components of visualization processes. It is also interesting to note how similar Bertin's definitions of group operations (association and selection) are to the operations on relational databases we saw today in class (rolling up and drilling down).

Another topic that I find especially interesting from today's lecture is the use of color (both hue and saturation) as a visual encoding. There is a paper by Chris Healey which contains some very interesting psychophysical findings! For example, colors are more easily distinguished if they can be linearly separated from all other background colors in color space. The section on perceptual visualization also describes preattentive visual features, such as isolating visual elements, which can be detected in under 200 ms! It seems exciting that we can directly take advantage of the quirks of low-level vision to make more effective use of color (and other visual elements). Pre-attentive discrimination is also mentioned in Csinger's work linked above.

jhlau wrote:

I found our discussion about guiding visualization concepts to be very useful today, especially in coming up with a "taxonomy" of visualization techniques. I can see how these concepts arose from Bertin's original collection of visual encoding variables, and I'm somewhat surprised that they've held true without major modification up until now. One interesting area to research would be in what order the visual encoding variables showed up historically. I imagine the first encoding variables ended up being those most common to our thinking and most easily perceived.

Another interesting research idea would be to look at how quickly we can discern different visual encoding variables. For example, color is probably very easy to discern, as is position. Something like texture or brightness would be more difficult, and it seems like this correlates with "strong" or "weak" visual encoding variables. In class it seemed like we suggested using no more than 4 encoding variables, and it seems like it'd be pretty simple to figure out the "best" 4 to use. Then again, I'm sure different data would require different visualizations and therefore different encoding variables. Still, it seems that some of the variables are redundant - for example, color and texture are both very similar in that they modify the area of a shape. I can't imagine that you'd pick 2 of your 4 variables to be color and texture....

And yet another interesting area of research (this topic brings up many questions, I feel) would be to investigate how well these concepts hold across cultures. For example, some Asian languages write down and to the left... so do graphs that go up and to the right mean something completely different to them?

Also, it's been proven (the four color theorem) that mapmakers need only four colors to color any map so that no region is adjacent to another region of the same color. I'm curious whether this mathematical property of spatial color arrangement has anything to do with our perception of different colors as an unordered arrangement.

abless wrote:

I have two comments.

First, I'd like to comment on Mackinlay's APT. The above comments talked about the notion of "best" visualization. I would like to point out that Mackinlay wasn't talking about "best", but rather about the most _effective_ visualization. Now, effective in this context has a very special definition (see slides). However, it is worth noting that even Mackinlay points out the challenge in measuring effectiveness:

"The difficulty is that there does not yet exist an empirically verified theory of human perceptual capabilities that can be used to prove theorems about the effectiveness of the graphical language. Therefore, one must conjecture a theory of effectiveness that is both intuitively motivated and consistent with current empirically verified knowledge about human perceptual abilities." [Mackinlay, 86].

Basically, we don't really know how to measure effectiveness, but Mackinlay's approach is an approximation.

My second comment concerns the distinction of data into different types (Nominal, Ordinal, etc.). While my first reaction to such abstraction is rather skeptical (why have that distinction? why aren't there others?), I came to appreciate the different types later during the lecture when talking about Bertin's "Levels of Organization". I think this can really help in choosing an effective visualization.

zgalant wrote:

On page 51 of "The Visual Display of Quantitative Information," Tufte defines Graphical Excellence.

"Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space."

At first, this seems reasonable, since it sounds like the graphic is more effective, but I don't think that reducing the ink or space is necessarily helpful. For example, more space may be required to accurately show scale. A diagram of the solar system that is drawn in a small space may not accurately highlight the vast distance between the outer planets and the sun. The use of blank space should actually increase the excellence of the graphic rather than detracting from it.

I think Tufte's definition is generally correct, but it can't be an accurate calculation of actual excellence, as it fails to account for situations like this.

jsadler wrote:

Hey @zgalant nice point about the potential shortcomings of what Tufte thinks makes an "excellent" visualization.

It seems that Tufte is indeed in the camp that considers extreme "data density" a good thing. I agree with you that a good ratio of data to ink is probably a good thing, but taken to the extreme we can lose information in our quest to pack more data into the same amount of ink. Is there a medical phobia yet for fear of white space in data vis?

Tufte's affinity for the Marey Train Schedule (which he even chooses as the cover of the book) shows his appetite for packing in a lot of "bang for the buck".

When is it too much?

-- By the way, the dynamic population pyramid shown in class today blew my mind. So much information is encoded in the data, brought alive by movement...

luyota wrote:

What most interested me today was the question of what the extreme of information visualization is. In class we learned from Bertin's quotation that visualization has its limitations in terms of the number of dimensions a single image can encode, which is three, since that is the limit of human visual perception. But is it really true? Indeed, an image can only show the world within three dimensions; a higher-dimensional space is not something we can see directly without imagination. However, information visualization doesn't necessarily have to reflect the real world exactly. Maybe with the combination of color, size, and other variables we can encode a table with more dimensions and yet make the visualization easy to understand?

Besides this, I found one thing in the class especially useful - which variables are suitable for encoding which types of data (nominal, ordinal, and quantitative). Putting their relations in a single table makes it much easier to decide which element to use when designing a visualization.

olliver wrote:

Since visualizations are so dependent on the medium through which they are displayed, it is interesting to go back and add to our list of visual cues the new possibilities enabled by technological advancements. 3D displays are becoming much more common, and here at Stanford we even have our own virtual reality lab where a person can become immersed in a virtual world. Demos that let the user see cars rushing toward them, or fly, add many new visual dimensions: proximity and feelings of danger, excitement, and confusion. (While those are more pronounced in VR, we shouldn't forget their importance in typical screen/paper visualizations.)

crfsanct wrote:

I would like to see some more examples of 5 or more visual encoding variables in one plot. I want to get a better feeling for how good or bad that would actually be. I did a quick search online and found a really cool 4-dimensional example at http://www.gapminder.org/world/. It shows how position (x2), size, and color can effectively encode a wide variety of world data in one plot. It's interactive, so you can change the data sets and also play back through time.
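For concreteness, here is a minimal matplotlib sketch of that encoding scheme, using purely synthetic data (every variable name and value below is made up for illustration, not taken from Gapminder):

```python
# Sketch: four data dimensions in one static scatter plot, Gapminder-style.
# x/y position and point size carry quantitative variables; hue carries a
# nominal one. All data here is synthetic and purely illustrative.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 60
income = rng.lognormal(mean=9, sigma=1, size=n)              # quantitative -> x
life_exp = 50 + 8 * np.log10(income) + rng.normal(0, 3, n)   # quantitative -> y
population = rng.lognormal(mean=16, sigma=1, size=n)         # quantitative -> size
region = rng.integers(0, 4, size=n)                          # nominal -> hue

plt.scatter(income, life_exp,
            s=30 + 300 * population / population.max(),  # scale pop. to point area
            c=region, cmap="tab10",                      # categorical palette
            alpha=0.7, edgecolors="k")
plt.xscale("log")                   # income spans orders of magnitude
plt.xlabel("income per capita (synthetic)")
plt.ylabel("life expectancy (synthetic)")
plt.title("x, y, size, and hue as four encoding channels")
plt.show()
```

Animation over time (as on the Gapminder site) would be a fifth channel on top of these four, which a static plot like this can't show.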

bbunge wrote:

On the note of using a tool to aid data visualization, it can only get one so far.

To use the example of web design, tools like Dreamweaver, pre-made templates, drag-and-drop site builders, etc. limit the expressive power of the designer. Knowing the capabilities of html, css, and javascript unleashes the designer to conceptualize more interesting and complicated designs. I am not saying that this is impossible with the tools, but the results are not likely to be as compelling and unique. Granted, though, visual tools such as Photoshop are very useful for experimenting with different visual designs.

In the space of data visualization, tools such as Excel, R, etc. minimize the effort it takes to create many different types of graphs. However, this class has shown me that there are many techniques and strategies that go beyond these types of graphs and beyond what the creators of the aforementioned tools may have had in mind. I am excited to learn how to use d3, a lower-level approach, because it seems as though d3 is to data vis what html, css, and javascript are to web design.

I feel that a tool for visual exploration (like PhotoShop for web design) would be useful before coding up something using a graphing library. What might this look like?

jkeeshin wrote:

I thought the demo and visualization of census data in class was very interesting. At first the way to present the data seemed obvious, but what became apparent was that the story you wanted to tell, and the data you chose to hide or show, were crucial to making a point clear. Even when we got to the marital status slide, it seemed that our number of buckets could have been a few fewer, making the graphic clearer.

A few other ideas from class that I found interesting were redundant encodings, for example using hue and color in a stock chart, and finding a way to tell a story or reason about your data.

One thing I was not sure about from class and also in the readings was this scale for evaluating effectiveness. It seems we are trying to make visualizations which are most easily understood by the viewer, and which can compress lots of data into something much simpler. Tufte, though, seems overly concerned with how many numbers you can condense onto a graphic.

"...this decomposition of economic data, arraying 1,296 numbers" (38)

"thus 28,800 pollutant readings are shown" (42)

"Each map portrays some 21,000 numbers" (16)

Honestly, the way he writes some of these sentences it sounds like he is bragging that one can fit so many numbers onto a graphic. And although I think he makes a lot of good points, I think he is a little off here. I don't think the amount of data that we can shrink into a graphic is one of the main indicators of how great it is. When he writes about the graphic of Napoleon's march, he seems so proud that the creator could show six variables! Six variables on one chart!

I don't think it should be about the number of variables or data points you can fit onto a graphic. It sounds like high school boys bragging to each other. I think it should be about the idea that you are able to convey, whether it is a data-set of 100 or 1 million data points, or one variable to 10 variables. Ideas and clarity are more important than the data-count.

babchick wrote:

I'd like to respond to the question posed in class regarding Mackinlay's Design Algorithm and why it doesn't exist in any popular form today (ignoring Tableau for a moment): I think all the formal taxonomies, the breakdown of visual encodings, and Mackinlay's ranking of effectiveness are useful for designers who already know the story they want to tell. The main limitation of an idea like this is that it still doesn't solve the problem of surfacing the most relevant data; similar to Jeff's note on the Car Nationality chart slide, it wouldn't be surprising to see a computer produce something like that based on some arbitrary mapping of data types to visual encodings.

As far as I'm aware, figuring out how to effectively tell a visual story is still a uniquely human task. The real potential computers have in automating our design process, in my opinion, is to aid us in rapidly manipulating and intelligently iterating on these permutations of visual encodings. This is something like Tableau's true power, beyond the aesthetically pleasing GUI – in that it allows you to, without coding at all, churn through 10-15 ways of visualizing a complex data set with respect to selected dimensions and discover quickly which dimensions are worth exploring, and more importantly, tell interesting stories.

mbarrien wrote:

To build off of what crfsanct said, I know this lecture was about using a static picture to display data, but there is a lot to be said for using a 5th dimension of time if you have it. The 4-D charts he mentioned are actually 5-D, since they allow scrolling through time.

The founder of Gapminder that crfsanct mentioned has given several talks at TED conferences specifically showing off the power of that 5th dimension using his visualization tool. An example is at http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html

I'm curious how effectively some of the other encoding variables, such as texture and orientation, would work in a visualization that uses time.

chanind wrote:

I'm very skeptical of automated tools being able to generate high quality visualizations. Making a great visualization requires an understanding of what data is being visualized and for what purpose. I like the idea brought up in class that a visualization can be an argument. How can an automated algorithm ever understand the meaning of the problem being visualized well enough to form an argument?

I also agree that trying to cram as many variables as possible onto a visualization is missing the point. Distilling the variables being displayed to their simplest possible subset and visualizing less should aid cognition more than trying to fit 5 or 6 dimensions onto a single page. This reminds me of a famous book in the web design field called "Don't Make me Think". In this book the author argues your goal as a designer is to make it as simple and easy as possible for your user to get the information he wants (minimal thinking required). Adding extreme amounts of variables just means it takes more effort for the viewer of your visualization to understand what's going on.

ividya wrote:

Similar to bbunge, I was thinking about the data type taxonomy and the use of tools to aid data visualization. Jeff mentioned in class that there were a couple of examples we could use nowadays that were not within the reach of the technology at the time. This leads me to wonder whether there's even a point in trying to organize and categorize the different data types we have today, if technology is just going to render that list obsolete. Considering that technology is innovating at an ever faster rate, it seems that any list would be obsolete almost by the time it was compiled. I can see how such a list must be useful in a design algorithm or in building an application that creates visualizations, but I don't think it's possible to create such an application without some serious limits.

Take for example the census data that Jeff presented in class. There were a multitude of obvious and simple ways to show that data, but the end result was much more effective because it was designed with a use case and goal in mind. The representation of the census numbers could not have been produced by a design algorithm, no matter how many data types it could handle. It was clear in the visualization that the author took a lot of time considering the relationships between the different columns of the data and was looking for a visualization that would emphasize some of those relationships over others. For example, the slider at the bottom of the visualization that allowed the user to switch between different time periods could have been generated by a design algorithm. But I would argue that the use of area to represent the number of people married across the varying ages, while splitting up females and males, requires the understanding that the chance of being married increases with age, and that since marriage generally involves one male and one female, there should be a sort of symmetry across the gender axis. This is in large part why I'm excited to learn how to design effective visualizations, since the task seems like one that cannot be reduced simply to finding the ultimate algorithm that will produce the most effective visualization.

pcish wrote:

It was mentioned in class on several occasions that "interaction" was missing from the visualization taxonomies that were presented. In the optional reading by Shneiderman, the "traditional" visualization taxonomy of 7 data types is further augmented with 7 verb classes (e.g. zoom, filter). Although the author does not explicitly say so, these 7 classes essentially represent the actions that can be accomplished with interactive visualizations. The paper is interesting in a few ways. The first is its publication date: 1996, back when the WWW was still gaining popularity and interactive web pages were probably rare, yet there is mention of HTML in the paper. Interactive presentation of data obviously first appeared in non-web-based applications, though. The second is that it seems confusing to discuss these action-oriented classifications in conjunction with data type categorizations. It feels that these verb types are not so much new data types as methods for portraying the underlying data, and they would fall in more naturally with the classification of data encoding techniques (size, texture, colour) that was discussed in class. A final minor thing of note is that the manuscript had a distinct lack of visualizations for a paper discussing them.

kpoppen wrote:

I agree completely with babchick and awpharr. Ultimately it's unclear, at least in the interim before the singularity, that computers are / will be smart enough to make human-caliber decisions about what is the optimal visualization (at least for high-dimensional / complex datasets). That is not to say, however, that there isn't a use for computers to help automatically visualize data, or to help in the ideation process, and I think that computers today are well within reach of having a "button in Excel" that does basically the right thing.

I think that awpharr's point about Garry Kasparov (beside the fact that Bobby Fischer was better :P) was very cogent in this regard. Until we have Strong AI, I don't think it even makes sense to claim that we could produce a system that could produce all possible viable designs and choose between them. Take even the example of the visualization of Napoleon's failed invasion of Russia. The reason why that chart is effective is because even though there is a huge amount of information overload, the overloading is done in such a way that the result was still intuitive and useful. That said, I'm not sure any visualization algorithm I would ever write would even endorse such a chart, nor am I even sure that I would recognize its value if one spat the chart out to me (maybe after this class :).

arvind30 wrote:

The sentence that jumped out most to me in the readings was Tufte's claim that "data graphics are no different from words...any means of communication can be used to deceive. There is no reason to believe that graphics are especially vulnerable to exploitation by liars" (Visual Display, p. 53). At first, I was inclined to believe him but then I began to think about it more and I'm not sure that I agree any longer.

I believe that graphics are typically more vulnerable to exploitation because they rely so heavily on our powers of perception of visual encoding variables. I think we touched on this in lecture, but as Cleveland et al. discovered (https://secure.cs.uvic.ca/twiki/pub/Research/Chisel/ComputationalAestheticsProject/cleveland.pdf), we're not able to perceive every type of encoding with an equal amount of accuracy, and "liars" are able to use this to their advantage. This is why I'm also not entirely convinced of his claim that we "have pretty good graphical lie detectors." Tufte's examples of the mileage road and oil barrels highlighted this for me - had it not been for his calculations of Lie Factors, I would not even have begun to guess that the graphics weren't accurately representing their statistics. Perhaps I'm not cynical enough! With words, on the other hand, while they can certainly be twisted and greased to argue a point of view (e.g. by politicians), I feel like we (by which I mean, again, laypeople) have more tools at our disposal to verify them - whether that's a quick google search or reading a wikipedia article.

This reminds me of a visualization I frequently saw during the debt ceiling talks - http://usdebt.kleptocracy.us/ - a visualization of the US Debt in $100 bills. I knew several members of my family that found it a very effective (and scary) visualization and yet, I was skeptical because of how it had been presented, particularly because there was no way to verify the dimensions the author was claiming. Plus, wouldn't it have been even more scary if he'd used $50 bills rather than $100, or $1 for that matter...

On a different note, some of my favourite visualizations are on http://www.worldmapper.org/ that resize the geographic areas of countries based on other data such as population, income etc.

ifc wrote:

'Best' is a vague word, especially in a subjective area like design. For simple situations (a few columns, perhaps nominal variables), there may be visualizations that most people would call the 'best', and these might even be derivable by Mackinlay's APT. For more typical situations, I don't think there would be an automatic way of deriving them for a general audience.

As chanind pointed out, the data itself should have some underlying semantics that probably won't be understood by a computer. Mackinlay had users tell the system which were the most important things to show in the visualization as input to help address this issue, but I feel like even this would not be sufficient. First, I think it would be difficult to know exactly what you want to emphasize in a visualization before playing around with the data in some type of visualizer (if you already had a good visualization, what would be the point of using a tool to generate a new one?). Second, there would be issues with the expressivity of what you want to emphasize. In Mackinlay's case, I think it was just selecting a column, but in a more general setting it would be much more difficult.

Beyond this issue is the subjectivity of design. I don't think I really have to explain this, but suffice it to say that there are a million tradeoffs between designs (some people may be willing to give up a little air resistance to have a slightly cooler-looking bumper on their car). Personally I don't think there is really a way around subjectivity, though we can do our best to make a majority of people happy.

jessyue wrote:

In class we briefly touched on the drawbacks of overloading a visualization with too many dimensions of data. I believe this is parallel to cumbersome user interface design. In the early days, pilots made a lot of errors due to inefficient dashboard displays and information overload. Since then, a lot of effort has been put into designing dashboards better for viewing and into training pilots to read and interact with them efficiently. Along the same lines, I believe selecting and filtering the information to present in a visualization is crucial for effectiveness, and that takes a lot of human insight. This is why computer algorithms can only automate visualization in basic cases.

ajoyce wrote:

Algorithmic data visualization is a bit like algorithmic sentence generation. Sure, it might be readable and grammatically correct, but it won't be poetry. As others have said, there is no quantitatively "best" visualization just as there is no "best" poetic meter. Crafting an excellent visualization requires a sense for composition and aesthetics that would be intractable given current algorithmic methods.

In a broader sense, the drive to enumerate and dissect visualization elements comes from a natural human urge to organize and categorize. However, the work of Bertin and others attempts to create a finite representation of an infinite problem space. There are innumerable methods for representing information visually, so it is largely futile to try to capture them in their entirety. While it is reasonable to list examples or detail common visualization techniques, I don't believe it is productive to pursue a common representational framework for the practice of data visualization.

tpurtell wrote:

@pcish, one of the compelling parts of an interactive visualization is the chance to visually explore how the data is processed. Being able to subtly perturb the data being presented can help us focus on a part of the data we understand, to get a grasp of how to use the visualization on other parts of the data set. This is especially true when highly abstract data is involved, because commonly only a small window of the data is presented at any time.

As mentioned in class, basic texture as a visual cue in a diagram does seem to be a legacy of the printing process; however, a modern interpretation of texture as a cue still seems fruitful. There are some nice examples that use fine-grained, data-driven texture as a means of visualizing fluid flow: http://www.idav.ucdavis.edu/~garth/pdfs/eacfm08.pdf . I'm curious whether there are any instances of similar texturing applied to abstract data.

scbhagat wrote:

It was interesting to look at the visual encoding variables listed by Bertin, even though they weren't complete. I was wondering on what basis he compiled his "Levels of Organization". What was the sample of people on which he conducted these experiments?

I firmly believe that every person receives these variables differently, and that this cannot be generalized to the mass population. Which raises the question: can visualization ever be truly customized for each person?

awpharr wrote:

I agree with babchick above. The point of graphical computer algorithms for visualization design is to help humans make quick and efficient decisions by generating many different visualizations to choose from. At least at this point, I would expect graphs like the Car Nationality example to appear frequently when using Mackinlay's Design Algorithm.

This discussion brings to mind Garry Kasparov (who is considered widely to be the greatest chess player of all time) and his review of Diego Rasskin Gutman’s book Chess Metaphors: Artificial Intelligence and the Human Mind. The book discusses a 2005 chess tournament in San Diego where players could compete with or without the aid of a computer. Additionally, just a computer could be entered as a contestant. Many grandmasters entered, but a pair of amateur chess players used three computers, which they manipulated and “coached” to counteract their opponents’ strategies, and ended up winning the tournament. Kasparov noted that “Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.” I feel that this observation is applicable to our conversation about Mackinlay’s Design Algorithm. We work best when the human-computer symbiosis is at its highest. We must learn to let computers do what they are good at, let humans do what they are good at and then synthesize the two together instead of inferring that one could be better without the other.

netj wrote:

Tufte's argument on "graphical excellence" was enlightening, and sharpened my eyes by teaching me how to actually judge the integrity of visualizations. I was at first skeptical about his words concluding the first chapter, that graphical excellence requires telling the truth about the data. Since a hypothesis usually drives the creation of visualizations, and the purpose of such graphics is to communicate some message to the audience, I thought they would be inherently biased. However, with his introduction of the lie factor and his demonstration of many distorted examples from the past, I learned that there is a clear way to determine the extent of bias. Moreover, he also discusses distorting the truth by hiding data, which was one of my main concerns.

However, for some parts of his argument, such as nominal vs. adjusted money, it was not so clear to me why they had to be included among the principles of "graphical" integrity or excellence. I see that rather as a problem at the level of processing the data, similar to errors in statistical analysis.

At the beginning of last class, while Prof. Heer was showing Nightingale's coxcomb diagram and mentioning the power of visualization to communicate messages, I became curious what its lie factor would be and whether it was telling the truth. I suspected some distortion from displaying data with radius instead of area. But I found an article explaining that she made a mistake at first and immediately corrected it. One interesting point was that if she had used a chart with bars instead of the circular one with areas, it would have been more effective at showing the dramatic change. However, I found another article explaining the reason behind her choice and understood why it is a great historical example.

blouie wrote:

@ividya I'm not really sure that's the best argument to make, though I definitely understand where you're coming from. As you say yourself, we can reasonably assert a generalization about the rapid pace with which technology grows and develops. In other words, we can extrapolate this trend both inside and outside the field of data visualization. But if we reserve the categorization of pertinent things (in this case, data viz and data types) for when technology has exhausted all possibilities of data visualization, will we ever have a real taxonomy? It's sort of like waiting to buy the best and brightest cell phone. If we wait for what is truly the greatest, we'll probably never end up actually buying a new phone. (This is, ironically enough, a problem I'm dealing with right now.)

Furthermore, even if the scope of possible data visualization has barely been opened, thanks to the limits of modern technology, it probably helps in present contexts to have some sort of vocabulary. After all, the visualizations we create now have nothing to do with future visualizations made plausible only by innovation. And the study of data visualizations is likely something that will take place soon after their creation, so a more "primitive" vocabulary should be sufficient for getting across the primary points.

stojanik wrote:

@Jhlau – I agree, I find the process of encoding/decoding visual artifacts in visualizations an interesting area of discussion. One thing you commented on provoked my curiosity even more about the judicious use of encoding variables. You mentioned that "…it seems that some of the variables are redundant - for example, color and texture are both very similar in that they modify the area of a shape." Thanks for injecting that idea into the discussion. I think you are right to say that they modify the area of the shape, but they may also modify/sculpt the emotional response to that "area" in very different ways, either individually or together. Working with texture and color can add dimension and depth (density), guide movement through a visualization, decorate elements, and support groupings in ways that color or texture alone may not achieve. A flat grey bar and a flat grey textured bar may elicit different emotional responses and provoke different inferences about the data. But whether or not color and texture would be in your top 4, I don't know; it's a grey area. It may depend on the visualization and the audience. When thinking about the use of color, we may need to think about how color is "rendered" to an audience with color vision deficiencies - things get knocked into grey scale or color distortions. When designing websites I usually pass the site through a colorblindness filter (Colorfilter) to see how the colors may appear. I don't know, maybe it's useful to pass a visualization through such a filter to get a sense of density or hierarchy?

And a quick response to your comment on encoding/decoding of visualizations through the “cultural filter” – I agree. I think (maybe want to believe) that different groups of people react to spatial relationships, color-coding, movement, usage of time-space, etc. through a cultural framework that requires additional attentiveness when designing visualizations or exploratory visualization tools/libraries. By not doing so are we in effect negating the advantages that “effective visualizations” provide, in that we force another layer of decoding? Do we need the Masai algorithm ...

blakec wrote:

I really liked the idea of automating visualization creation. I feel like this could work with very simple sets of data, but wouldn't be as useful with more complex features. A person creates a great visualization by truly understanding each feature in a data set and figuring out which visual technique would be best for it based on their knowledge of a feature. A computer wouldn't have this understanding of the features and thus wouldn't always create an amazing visualization.

Also, I think there is an artistic component to making a great visualization, like how the colors interact, the curvature of the lines, and other attributes. A computer wouldn't be able to make the artistic choices that create a truly great visualization.

diwu wrote:

@scbhagat I agree that effectiveness has a subjective component. However, human beings are wired in such a way that we tend to respond in similar ways to certain things. Our eyes perceive color, shape, and movement, in very similar manners (albeit likely not exactly the same). In this sense, although the effectiveness of visualizations may have some variability based on the subjects, in general I think it is possible to rank perhaps a set of "best" visualizations, even if there is no absolute "best" or "right answer".

The reasoning for a "representative framework" as proposed by Mackinlay relies on exactly this--his belief that there are distinct classes of human understanding and perception of the world. Certainly, these distinctions are probably self-supported, and as @jhlau mentioned, likely have a cultural basis (and would thus vary at least somewhat between different cultures).

vulcan wrote:

I also found Tufte's concept of the "lie factor" quite interesting. However, I believe that graphics with large lie factors can be legitimate, especially if the spread of the data is very large.

For example, in digital communications the BER (bit error rate) is used as a figure of merit and is often plotted as a function of the SNR (signal to noise ratio). Because the data spans several orders of magnitude, it necessitates the use of a log scale on both axes. So while a communications engineer would make perfect sense out of these curves, someone without that background might not fully appreciate, for example, the difference a 1/10 of a dB can make.

Perhaps the above is too domain-specific an example, but I would argue that although the lie factor is useful in quantifying how misleading a graphic is, it is not always a good metric.
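For instance, here is a minimal sketch (assuming matplotlib and SciPy) of a textbook BER-versus-SNR curve; it uses the standard BPSK-over-AWGN formula purely as an illustration of why a logarithmic y-axis is the natural choice here:

```python
# Sketch: why a log-scale y-axis is natural for BER-vs-SNR curves.
# Uses the standard BPSK-over-AWGN bit error rate, 0.5*erfc(sqrt(Eb/N0)),
# purely as an illustration.
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import erfc

snr_db = np.linspace(0, 12, 200)           # SNR (Eb/N0) in dB
snr_lin = 10 ** (snr_db / 10)              # convert dB to a linear ratio
ber = 0.5 * erfc(np.sqrt(snr_lin))         # theoretical BPSK bit error rate

plt.semilogy(snr_db, ber)                  # log y-axis: BER spans many decades
plt.xlabel("Eb/N0 (dB)")
plt.ylabel("Bit error rate")
plt.grid(True, which="both")
plt.show()
```

On a linear y-axis the same curve would look like a flat line near zero, which is exactly the point: the "distortion" of the log scale is what makes the data readable.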

insunj wrote:

How should we present and transform data? I think this is the question that we are going to formalize and learn about throughout the quarter. As some people above have mentioned, there may not be a right or best way to do it, but there are pre-established rules and elements that make up a good one. I was curious: could there be a metric that evaluates the effectiveness of a visualization?

As the lecture covered, there are distinct elements that make up a data visualization. The elements play together to convey the information, and our brains grasp that information with limited capacity. It would be very interesting to analyze how the elements interact within a visual, whether some help or negate each other, and whether there is a cap on the amount of information that can be grasped most effectively.

Data visualization seems to be not just about being creative or presenting data visually; the crux is about understanding human limitations and fitting the visualization to them.

yangyh wrote:

I was impressed by the material taught in class yesterday. As a student who's been doing some sort of visualization (well, I bet everyone has done a bunch of slides/reports) for years, I've never known a systematic way to categorize all sorts of variables and data. Bertin's quotation makes it much easier to see a data set clearly, which in turn lets us visualize the data in an effective and expressive way.

Also, the graphs of U.S. census data demonstrated in class were really impressive. I realized that every attribute matters if we truly want to make a visualization informative and effective, something I hadn't paid much attention to. Before, I would usually dive directly into making slides/presentations without thinking about how the data should be presented in a thoughtful and meaningful way. Now, after the class, each time I try to present data to people I will start by thinking about how to make it most effective for my readers. I'm looking forward to the future class material!

joshuav wrote:

@bbunge I think a tool like this would be very cool. It's difficult to say how it would work, though. Referring to the data visualization highlighted in the lecture, actually prepping the data for use in such a tool would be quite time consuming and would probably detract from the power of such a tool.

I appreciate that this class will give us the tools to imagine beautiful visualizations without such a tool and then implement them directly. Without that vision, I think a powerful tool in the wrong hands would do little in the way of enabling better visualizations.

Also, today's Washington Post had an interesting article on how Twitter can display our emotions and the visualizations behind it are pretty neat.

Here: http://www.washingtonpost.com/national/health-science/twitter-tweets-our-emotional-states/2011/09/28/gIQAVb9r7K_story_1.html

And the research queries here: http://timeu.se/

wtperl wrote:

I think a lot of the unordered perception of color comes from the looping nature of the spectrum. Focusing on a partial range of colors allows the potential for ordering. For example, blue-through-red can represent temperature in a fairly universal way. As we saw in class, red-to-green also conveys information in an ordered way. However, those are still relatively limited examples. I'd be curious to see if a wider range of colors (one not based on existing conventions like red-hot, blue-cold), but a range still less than the whole spectrum, could be used for ordering. In tests, participants had trouble ordering the colors, likely because they could start anywhere. However, if they were given colors from a more limited range (combined with the natural knowledge of the progression of colors in the rainbow/spectrum), I bet participants could offer more consistent orderings.

elopez1 wrote:

I agree with babchick's point that the role computers can play in automating the design process is to allow the designer to rapidly manipulate and iterate on different permutations of visual encodings. I was thinking that maybe we can do even better at automating the design process. We've come a long way in machine learning and personalized recommendation algorithms. Perhaps we can make a "smart" design algorithm that learns as it's used. It could learn what types of visualizations fit certain tasks well and what kinds of visual encodings fit certain types of data well. Maybe it could even use natural-language processing to figure out which fields in the schema are important for a stated visualization goal.

However, I am also aware that as cool as Machine Learning and NLP are, they can only go so far. As bbunge posited above, tools cannot have the expressive power of the designer.

junjie87 wrote:

Although the general rule of thumb is that colors are unordered, I think choosing a specific subset of colors allows them to represent ordered values, similar to what wtperl mentioned. In particular, I think colors are unordered because color spaces are usually three-dimensional (RGB, HSV, etc.), and two- or three-dimensional spaces cannot be ordered well. However, if we were to project these spaces down to a one-dimensional line, then I think it is very possible for colors to be used. In particular, I would like to explore what happens when we take colors along a line in the LAB color space, because the LAB color space is designed to be close to human perception and so seems to have the highest chance of success.
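A minimal sketch of that idea, assuming scikit-image is available for the RGB/LAB conversions (the endpoint colors below are arbitrary choices, not a recommendation):

```python
# Sketch: take colors along a straight line in CIELAB space to get a
# (hopefully) perceptually ordered ramp. Endpoints are arbitrary.
import numpy as np
from skimage.color import rgb2lab, lab2rgb

def lab_ramp(rgb_start, rgb_end, n=9):
    """Linearly interpolate in LAB between two RGB colors (components in [0, 1])."""
    lab_start = rgb2lab(np.array([[rgb_start]], dtype=float))[0, 0]
    lab_end = rgb2lab(np.array([[rgb_end]], dtype=float))[0, 0]
    t = np.linspace(0, 1, n)[:, None]
    lab_line = (1 - t) * lab_start + t * lab_end   # points along a line in LAB
    rgb = lab2rgb(lab_line[None, :, :])[0]         # convert back to displayable RGB
    return np.clip(rgb, 0, 1)

# Example: a light-yellow-to-dark-blue ramp for an ordinal scale.
print(lab_ramp((1.0, 0.95, 0.7), (0.1, 0.1, 0.5)))
```

Whether the resulting ramp actually reads as ordered would still need the kind of perceptual testing wtperl describes above.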

Also, there has been some debate in the comments on whether algorithmic programs like Tableau will eventually be able to (mass?) produce data visualizations. I believe that true art can only be made by a human. However, data visualizations serve a very important real-world need, allowing people to understand data in an intuitive way, and I think from what we have seen of Tableau, research has already reached a level good enough that we should embrace the new technology.

phillish wrote:

To echo what others have posted before, I too have always had an intrinsic sense of what visualization model to use for a given data source, but Thursday's lecture gave more structure and proper definition. To give some examples, we talked about the categorization of differentiating features (e.g. color, shape, form) as well as data types (nominal, ordinal, quantitative).

The US census in-class demo was pretty fascinating. It emphasized that showing more information is not always better and can often detract from the overall message of the graph. One example was the marital status of the population. When we have an absolute scale, we see the raw numbers of people in each marital status, but the "married" and "single" categories dominated the graph and made it hard to even see the slivers for the other marital statuses. By scaling by the entire population, we lose the raw numbers but can better see, for instance, the percentage of the population widowed at age 50.

schneibe wrote:

I really liked the concept of the "lie factor". However, it would be handy to have a list of typical things you should be aware of when you read a graph, something like:

  • check the scale of the effect (a small effect can look important because we "zoom in")
  • always keep in mind that correlation does not imply a causal effect
  • make sure that the effect is not due to outliers
  • know which transformations have been performed (ratio, log, ...)
  • check whether 3D effects bias the viewer's perception of the graph
  • and so on...

It would be really useful to have a rating scheme like this to evaluate the quality or correctness of a graph (or to make sure that you are not making one of these mistakes when you design your own visualization).

njoubert wrote:

I want to go back to what @crfsanct posted. That 4-dimensional visualization is quite fantastic, allowing you to graph multiple aspects in meaningful ways, and with time allowing animation, we can track changes well and find historic events. It raised interesting questions about the data: for example, tracking life expectancy versus income for South Africa, we see that life expectancy plummeted after 1994. This happens to coincide with the fall of apartheid. Now we've raised an interesting question that the data helped us ask - what is the connection between the historic timeline and the unusual life expectancy of South Africans? My hypothesis is that immigration from poorer surrounding countries shot up once apartheid fell. We could test this by using one of the other dimensions of the graph to show immigration. Unfortunately they don't have this data! And here's the biggest failing of this chart - it's extremely hard (in comparison) to explore axes not currently selected. I think this highlights a general concern with multivariate data - not only do you want an excellent visualization for the currently chosen subsets, but you also need to be able to navigate through the axes of the data cube well. So far this hasn't been addressed as much, but it is clearly also a concern.

bsee wrote:

There were a few thoughts that came to my mind while doing the reading.

One of them was how the types of graphs mentioned in the first chapter are still relevant to computer science right now. For example, in computer game programming, death/heat maps are still being used to improve the design of games. Even with advancements in the field of data visualization, some classic visualizations are still considered the "best" visualization for their data type. I am in fact amazed at how the different visualizations even started, and how they have survived until today.

The other thought I had while reading the book was about the chapter on graphical integrity. Tufte mentions the lie factor as a metric of how correct the representation is. I think this lie factor is definitely missing the cognitive side of the reader. In particular, I remember that in class the professor mentioned how humans perceive volume, loudness, brightness, etc. on different scales. Doubling the size of a 3-D representation in print may in fact produce a smaller effective distortion because of that (if the volume of an object is doubled, humans generally perceive the increase as less than a doubling). That said, the lie factor does a good job of giving a metric for the encoding part of the visualization. It will be interesting to see how we can add the cognitive/decoding part of visualization. With that, we could tell the true effect of an incorrectly represented graphic.
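To make that concrete, here is a small sketch of Tufte's lie factor calculation plus a crude "perceived" variant; the Stevens-style exponent is an illustrative assumption rather than a measured value for any particular graphic, and the fuel-economy numbers are approximately those quoted in the book:

```python
# Sketch: Tufte's Lie Factor, plus a rough "perceived" variant using a
# Stevens-style power law for apparent size. The exponent below is an
# illustrative assumption, not a measured constant for any given graphic.

def effect(first, second):
    """Size of effect as Tufte defines it: relative change from first to second."""
    return abs(second - first) / abs(first)

def lie_factor(data_first, data_second, shown_first, shown_second):
    """Ratio of the effect shown in the graphic to the effect in the data."""
    return effect(shown_first, shown_second) / effect(data_first, data_second)

def perceived(size, exponent=0.7):
    """Apparent magnitude under a Stevens-style power law (illustrative exponent)."""
    return size ** exponent

# Tufte's fuel-economy example: the data go from 18 to 27.5 mpg, while the
# drawn lines grow from roughly 0.6 to 5.3 inches (values roughly as in the book).
print(lie_factor(18, 27.5, 0.6, 5.3))    # roughly 14.8, Tufte's published value
# A crude perceptual correction applied to the drawn sizes gives a smaller number:
print(lie_factor(18, 27.5, perceived(0.6), perceived(5.3)))
```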

chamals wrote:

In regards to the role of software in generating high quality data visualizations, I think software like Tableau can produce reasonable data visualizations that would otherwise be out of reach for the average person. I still believe, however, that software will not be able to produce the most "effective" data visualizations alone; it will need a human hand in the creation.

I think there is great power in Tableau and other software like it to help people rapidly create many variations of visualizations very easily. I think the design of data visualization software should allow for quick iteration and comparison of several visualizations on the same data set. This type of software reminds me of Juxtapose (http://hci.stanford.edu/publications/paper.php?id=16), which allows for the creation of multiple user interfaces to be used and compared directly with each other at the same time.

stubbs wrote:

After reading the '97 InfoVis Card/Mackinlay paper: from a computer science perspective, the possibility of codifying visualization design becomes very seductive when you consider the possibilities for quantitative/objective analysis, or even applying high-throughput computational methods (pick your favorite ML algorithm) to spontaneously create an ordered framework of visualizations. But to me, this feels a bit like "physics envy", and perhaps the field is too nascent to create a hard standard, which may only serve to limit and create a false barrier/burden (similar to codifying largely-unknown epigenetics "circuits" or creating excessive software documentation).

I love the idea of automated visualizations, but like many above, I agree they have their place. Perhaps that place is in domain-specific implementations (e.g. an automated viz algorithm for any/all financial information).

It was so apparent from the collective reaction to the class demo that movement and time are universally and intrinsically powerful tools in visualizations (a few fun evolutionary theories spring to mind); they arouse emotion without needing the dulling cognitive filter of matching color/shape/etc. to intended meaning (i.e., you don't need a legend to know that the blob is growing). Regardless, you have to admit, watching the population perturb like a greedy, amorphous, autonomous entity was very viscerally interesting.

angelx wrote:

I also enjoyed Tufte's discussion of graphical integrity and the concept of the lie factor, which attempts to quantitatively measure by how much a graphic can mislead. I sometimes wonder how often a graphic is designed deliberately to mislead and how often it is just due to a lack of understanding on the part of the designer. Tufte argues that part of the problem is due to the lack of quantitative skills of graphic designers and the attempt to make statistical data more exciting. However, it also seems that there are conscious decisions to misinform (as in the bar chart illustrating net income).

It is also interesting to me to discover how very recent some of the innovations in visualization are (like the scatterplot, pie chart, or the box-and-whisker plot). I find that while visualizations can be very effective at presenting information, it often takes me a while to get used to a new type of visualization, and visualizations are often displayed without instructions on how to read them (familiarity with the type of visualization seems to be implicitly assumed). I am curious at what point a visualization technique becomes a standard way of displaying data that is easily understood by most of the audience.

bgeorges wrote:

vulcan and schneibe already touched on this, but I too think that the "lie factor" definition put forward by Tufte is unnecessarily narrow. My classmates have already brought up the examples of when logarithmic and other non-linear scales are more helpful for evaluating data in a chart, but I think on a conceptual basis (to the extent that false coloring actually communicates more than "accurate" coloring just like a logarithmic scale may communicate more), the idea of false-color imaging is related. Some of the most iconic images of deep space that we've seen (such as the Gaseous Pillars) are actually colored based on measurements of different wavelengths of light that are not visible by the human eye, but broadly disseminated as "photographs." Some might argue that these images are misleading, since no human can ever hope to experience a view like this when looking into a powerful telescope, but it can be argued that presenting the data in this way (and it is data, since a digital photograph is just a series of measurements of the light intensity hitting a sensor, and false coloring can thus be thought of as an extreme instance of a choropleth) is actually more helpful because it allows the viewer to understand the structure of what was imaged. Both the logarithmic scales and the false coloring example demonstrate that intent is also an important factor in determining the "lie factor" of a visualization. In other words, maybe a scientist using a logarithmic scale to better communicate findings is not as bad as Day Mines Inc. making a bar-chart where 0 is above the x-axis (Tufte p. 54) in order to mislead investors.

jneid wrote:

I wanted to comment on the last chapter of the reading. While I agree with Tufte that statistics and numerical data are often incorrectly categorized as uninteresting, his claim that this view results in poor, unsophisticated graphics is unjustified. First, his measure of "sophistication" is too limited. He defines a sophisticated graphic precisely as a statistical graphic based on more than one variable that is not a time-series or a map. I think that there are many more measures of sophistication, such as the complexity of the data format. A specific example, as mentioned above, is using a logarithmic scale to better visualize data. This should be viewed as sophistication, not as a lie. Further, more complex graphics are not always better. We've mentioned previously the benefits of the use of a small multiple, which separates data into visualizations with fewer variables, rather than attempting to fit all information into one image. A primary role of data visualization is to make the data understandable, and if this is better achieved by including only a single variable in a graphic, it should be done that way. It is better to make it too easy for an audience to understand than to make it too hard. Thus, the important thing is not to add complexity to a graphic, but to create a good balance of complexity and clarity so that the data and the statistics involved may actually be viewed as interesting.

dsmith2 wrote:

In Tuesday's Card, Mackinlay, and Shneiderman reading, the authors discuss how one of the initial steps for creating a data visualization is selecting the variables that will be compared and ultimately visualized (a schema). In lecture on Thursday, Prof. Heer discussed classifying these variables as nominal, ordinal, and quantitative.

The fact that variables have different types, while seemingly obvious, is valuable to take note of. Just as the previous lecture's reading explicitly stated the ways in which visualizations amplify cognition, explicitly identifying the variable types within the schema allows for a more formal examination of a visualization and its successes and failures.

As Prof. Heer went on to discuss, different variable types can and should only be used in specific ways. Bertin's "Levels of Organization" was introduced as a formal way to understand how the different variables can be used in visualizations. Unfortunately, the example of countries of car origin being used as a length variable in a horizontal bar chart leads me to believe that Bertin's rules are a little flimsy and should be taken with a grain of salt.
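
The gist of those rules can be sketched as a lookup table; this is a rough paraphrase from memory, not an exact reproduction of Bertin's levels, and the variable names are only for illustration:

```python
# Simplified sketch: which data types each visual variable is well suited to.
LEVELS = {
    "position": {"nominal", "ordinal", "quantitative"},
    "length":   {"ordinal", "quantitative"},   # implies order and magnitude
    "value":    {"ordinal"},                   # light-to-dark implies order
    "hue":      {"nominal"},                   # no inherent order
    "shape":    {"nominal"},
}

def check_encoding(visual_variable, data_type):
    """Flag pairings that break the (sketched) rules, e.g. encoding a
    nominal field such as country of car origin as bar length."""
    if data_type in LEVELS[visual_variable]:
        return "ok"
    return f"questionable: {visual_variable} is poorly suited to {data_type} data"

print(check_encoding("length", "nominal"))         # the car-origin bar chart case
print(check_encoding("position", "quantitative"))  # ok
```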

marcoct wrote:

I am interested in unsupervised machine learning, which is, in simple terms, the task of finding structure in data. This area is closely related to Professor Heer's discussion of a taxonomy of data types (beyond machine format), such as the nominal/ordinal/quantitative groups, as well as other, more complex types that were mentioned, such as hierarchies and general networks/graphs. For an interesting computational-cognitive-science paper on algorithms that discover these types of structure automatically from data, see: http://www.stanford.edu/~ngoodman/papers/tkgg-science11-reprint.pdf

An interesting philosophical question (maybe not so relevant for visualization) is where these data types actually come from---that is, why have our brains evolved the capacity to understand the nominal/ordinal/quantitative distinction, or the concept of a hierarchy/tree? How do natural phenomena come to organize themselves in this way?

More relevant to visualization is the computational task of dimensionality reduction: taking very high-dimensional data (say, thousands of dimensions) and projecting it onto a smaller number of dimensions. If the goal is to make human interpretation easier, the projected data can be visualized directly, and the target number of dimensions is around 1, 2, or 3. Often, however, dimensionality reduction is used to make the data more interpretable by an artificial neural network or other learning algorithm, in which case the target dimension can be much larger. See http://www.cs.toronto.edu/~hinton/science.pdf for a (somewhat) recent machine learning paper on dimensionality reduction with an interesting visualization graphic.
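
As a minimal sketch of the "project down to 2-3 dimensions for viewing" case, here is PCA from scikit-learn applied to random stand-in data; note that the Hinton paper linked above uses deep autoencoders rather than PCA, so this is only the simplest linear baseline:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical high-dimensional data: 500 samples, 1000 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))

# Project to 2 dimensions so the result can be plotted directly.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                     # (500, 2)
print(pca.explained_variance_ratio_)  # how much variance survives the projection
```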

ardakara wrote:

Like marcoct, I'm also interested in the cognition of visualized data and the physical analogies we sometimes call upon for help. In the reading, Tufte talks about the first shift from labeling maps with coordinates that have clear physical analogies to labeling charts of abstract data. Quoting Playfair:

"Suppose the money we pay in any one year for the expence of the Navy were in guineas, and that these guineas were laid down upon a large table in a straight line, and touching each other [...] The Chars are exactly this upon a small scale"

I found it very intriguing that people needed an introduction to graphically plotting numerical values. A basic scatter plot was not easy to understand and needed explanation; however, it was a powerful enough tool that it got accepted. Are there other more complicated ways to visualize data that would look incomprehensible at first but would prove to be similarly useful in time? Yu talks about visualization as analogy making here: http://www.springerlink.com/content/v37p8x5404521782/fulltext.pdf

jofo wrote:

I agree with @chamals that automatic visualization software should be treated as a tool for generating design alternatives as well as for exploration.

Regarding Tufte and his Lie Factor, I like the discussion about "lying," but the quantification does not make sense to me. Actually, I find his obsession with (generating) quantitative data, which @jkeeshin describes, a bit paradoxical, when in my view visualization is all about generating qualitative conclusions from the data. Correcting for inflation would apply just as well to reports of the data in forms other than diagrams. One big "lie" that often occurs in newspapers is related to ignoring Bayes' theorem (as in the in-class example of ignoring population when reporting on the cancer distribution). This is done all the time regardless of the form of presentation.

I find the deconstruction of data into O, N, Q and the larger parts of Card et al.'s framework to be very useful, and I'm hoping to apply it to analyze the medical visualizations I currently study ("color" Doppler ultrasound, etc.). In the article they treat Year as Q, which I didn't really understand (is 2010 0.5% more than 2000?), but as we discussed in class the classification "depends," and Prof. Heer added the distinction between Q-ratio and Q-interval, which fixes that problem.
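
The Q-interval vs. Q-ratio distinction can be made concrete in a few lines; the 1990 reference year below is arbitrary and only for illustration:

```python
# Interval vs. ratio, using Year as the example from the discussion.
years = [2000, 2010]

diff = years[1] - years[0]    # 10 -- differences are meaningful (Q-interval)
ratio = years[1] / years[0]   # ~1.005 -- "0.5% more" is NOT meaningful,
                              # because year 0 is an arbitrary zero point

# Elapsed time since a chosen reference, by contrast, is Q-ratio:
elapsed = [y - 1990 for y in years]  # [10, 20]
print(elapsed[1] / elapsed[0])       # 2.0 -- "twice as long" is meaningful
```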

mkanne wrote:

@jneid: I would also like to comment on the third chapter of Tufte. I agree with jneid that Tufte's measure of sophistication in graphical representations of data is not rigorous. His attempt to provide quantitative support for the distinctions he is making, by counting multivariate graphics that are not maps or time-series, condenses complex visualizations down to one binary variable. Like jneid, I am not sure that this evidence convincingly supports the abstractions he is gleaning from the data.

However, I believe that the overall trend in widespread visualizations (e.g., in periodicals) that Tufte is trying to highlight is relevant. The graphics featured in chapter 2, as well as the quotes from newspaper editors in chapter 3, support Tufte's claim that visual artists are regularly employed to create misleading visualizations. While visualizations can be most beneficial to the audience when they are simple, the graphical sophistication Tufte proposes does not necessarily mean more complication, but simply the integrity to reveal the "real story" in the data without the use of "lies" or misleading visual choices. I believe that making data understandable often does not require using the fewest graphic elements possible; a bit of sophistication can represent the data more clearly while making the visualization more compelling from a visual perspective.

cmellina wrote:

@bsee, your point that the lie factor misses cognitive burdens on the viewer is a good one. Take, for example, a visual illusion like the Ebbinghaus illusion, which creates systematic misperceptions of size. If, for some accidental reason, a graphical design unintentionally induced the Ebbinghaus illusion, the graphic could have a lie factor of 1 provided the actual circle sizes in the graphic corresponded correctly to effect size, in spite of the fact that there would be an illusory lie factor that cannot be captured by Tufte's definition. Perhaps this scenario sounds far-fetched, but consider that the prototypical form of the Ebbinghaus illusion (as shown on the Wikipedia page) is exactly that: prototypical. The illusion can exist in a variety of forms. Take this graphic of CA motor vehicle deaths, for example. Who is to say that the Ebbinghaus illusion is not subtly active in that graphic? What's more, that style of graphic is by no means uncommon. These considerations, coupled with the facts that 1) there is a large variety of similarly pervasive perceptual illusions (Ponzo illusion, Checker shadow illusion, etc. - I mean, just look at them all!) and 2) most people are not aware of these illusions, even people designing visualizations, make accidental (or intentional…) illusory distortions of visualizations a quite real threat.

emjaykim wrote:

I found the "Phillips curve" plots on p.48 interesting- whereas all of the other graphs are meant to prove a relationship and a pattern, these ones "demonstrate the collapse of what was once thought to be an inverse relationship between the random variables."

fcai10 wrote:

I was surprised to see The New York Times do so poorly in Graphical Sophistication (Tufte, p. 83) and have its graphics showcased as bad examples (Tufte, pp. 67, 81). Tufte would be happy with the way the newspaper has improved its visualizations since his book was written. I read the New York Times often and think it has very interesting, interactive visualizations that convey a lot of information in a clear way. I came across this article in which a NYTimes graphics editor describes his team's approach to visualizing data: Lessons in Data Journalism From The New York Times. Data visualization is not only useful for conveying information to readers, but also as a tool for reporters as they do their investigations -- data-driven journalism. For more about that: Using Data Visualization as a Reporting Tool Can Reveal Story's Shape.

grnstrnd wrote:

In response to the third chapter, I agree that Tufte's definition of sophistication is very limited. However, I feel that part of the limitation we perceive comes from our extrapolations from the few examples he gives us and from his less-than-comprehensive definition. His definition of sophistication as relational graphs with multiple variables seems fairly limited, particularly because his examples are often x-vs-y graphs of some form, yet I feel that his definition encompasses every decent graphic he has shown us, especially the simpler ones. While I first read this definition as referring to scientific charts (scientific here meaning scientific relational data, not the previous week's definition), which naturally conform to line graphs and the like, his definition also covers all of the atypical charts we've seen that encode many variables in different ways. Many of these have been colorful and eye-catching but simple in presentation, whether or not they were x-vs-y relational line graphs. What his definition excludes are graphics that are colorful for the sake of color, such as the frivolous cartoon examples that abound in chapter two: these encode perhaps two variables, and yet the data itself is not well represented or strongly relational. In conclusion, I think his attack on unsophisticated graphics is much more comprehensive than it first seems.

daniel89 wrote:

@blakec's and @chamals's comments about the automatic generation of visualizations are interesting, but I view it as merely a means of exploring the design space. Creating a good visualization (as mentioned in class) is a confluence of many different factors (visual aesthetics, etc.).

I found Tufte's sophistication argument fascinating. I think there is a nuanced difference between visualizations (which should communicate something very quickly, almost instantly) and diagrams, which function more like maps. Sophistication in the form of trying to communicate so much that the simplicity of meaning is lost seems a plausible problem; conflating it with trying to communicate multiple variables does not address the root cause.

rc8138 wrote:

I also very much enjoyed Tufte's discussion of graphical integrity. The famous quote "lies, damned lies, and statistics" can be applied equally well to visualizations. Connecting this to the discussion of visual perception: a lot of the time we do not intend to deceive the audience, but our lack of understanding of human perception often misleads them anyway.

Tufte lists six principles that help us adhere to graphical integrity. I found his advocacy for clear labeling, though simple, often ignored by creators of visualizations. Sometimes we simply forget to put down the labels, but more often we make assumptions about how the reader should be able to draw the logical or cognitive connection to what the visualization is trying to convey from the data.
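
In that spirit, here is a minimal matplotlib sketch of what clear, detailed labeling might look like in practice; the data, title, and units are all invented for the example:

```python
import matplotlib.pyplot as plt

# Hypothetical yearly values; the point is the labeling, not the data.
years = [2007, 2008, 2009, 2010, 2011]
values = [3.1, 3.4, 2.9, 3.8, 4.2]

fig, ax = plt.subplots()
ax.plot(years, values, marker="o")
ax.set_title("Widget shipments, 2007-2011 (hypothetical data)")
ax.set_xlabel("Year")
ax.set_ylabel("Shipments (millions of units)")  # units spelled out, not assumed
ax.annotate("2009 dip: see note in text", xy=(2009, 2.9),
            xytext=(2009.2, 3.4), arrowprops={"arrowstyle": "->"})
plt.show()
```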

His principle of showing data variation, not design variation, is also enlightening. A lot of the time we try to find the best possible design for presenting the data, but we forget that this is different from viewing the data and digging out its variation in its various forms. Keeping this principle in mind can help us focus our efforts on the data and the truth.
