Lecture on Sep 30, 2010. (Slides)


  • Required

    • Chapter 8: Data Density and Small Multiples, In The Visual Display of Quantitative Information. Tufte.
    • Chapter 2: Macro/Micro Readings, In Envisioning Information. Tufte.
    • Chapter 4: Small Multiples, In Envisioning Information. Tufte.
    • Low-Level Components of Analytic Activity in Information Visualization. Robert Amar, James Eagan, and John Stasko. IEEE InfoVis 2005 (pdf)

  • Optional


trcarden wrote:

Perhaps the books have changed or I have the wrong version, but Chapter 8 in The Visual Display of Quantitative Information is titled "High Resolution Data Graphics," which seems different from a chapter on small multiples and data density.

mariasan wrote:

@trcarden I think it's the right chapter; if you look further, one of the sections is called "Small multiples".

amirg wrote:

@rakasaka There are advantages to train timetables in the United States as well, I think. For example, as far as I can tell the Japanese schedule only tells when the train will be arriving, which, represented as a stem-and-leaf plot, can give you a sense of the frequencies. However, if you want information about the direction of the train, when it will arrive at your destination, or what stops are along the way, you would have to supplement that with another table or graphic. By contrast, in many US train schedules, and I'm thinking of the Caltrain schedule in particular as an example, you can see not only when the next train arrives at your stop, but when it arrives at each stop en route to your destination. For a daily commuter this might not be as useful, since you probably already know the details of your route quite intimately, but I actually find it quite useful. To summarize, I would say that it's important to think about the context in which you are using the schedule, as both versions have their advantages and disadvantages.

gneokleo wrote:

I have always liked dense representations that portray a lot of information in an organized, uncluttered, and readable manner, something that is sometimes very hard to achieve. They allow readers to come up with their own interpretations or stories by looking at the data; as Tufte interestingly says, "Control of information is given to the viewers not the editors". Since readers can choose to "read" the data in many different ways, this also ties in well with the paper by Robert Amar, James Eagan, and John Stasko, where they try to provide low-level analysis tasks for "navigating" through data. The authors drew the tasks from frequent questions that people ask when interpreting data, after observing a number of people trying to interpret data.

Tufte makes a lot of good points in his book. Specifically, I liked how he presented the significance of macro/micro visualizations and how they sometimes tell different stories at the macro and micro levels, e.g. the memorial wall in D.C. Tufte also points out how difficult it sometimes is to represent high-density information, and doing so on paper in a static image is harder still. The example he gave of midtown NY, where the streets were widened to allow for less overlap between the buildings, is an interesting trick. This also reminded me of Google Maps, where the digital medium in which streets are represented, with the ability to zoom in and out, simplifies the task a lot. While a lot of the examples Tufte gave in his book were interesting, I wish he would give some more general points (or tips) on how to make a successful high-density representation.

rakasaka wrote:

I think what's particularly fascinating is the use of the metric of "how easily can a chart/graphic retain its information if it is shrunk?" When we are forced to display information within a restricted area, it forces us to make the best use of the space available, while aiming to provide the greatest amount of information.

In that sense I was intrigued by the mention of the Japanese bus/train schedules, because it had been something I had taken for granted when I was there. The efficiency attributed to a stem-and-leaf structure is evident: with the hours on a vertical axis, people can easily inspect at what times the buses run most frequently, as well as find out when the next bus/train arrives. By comparison, timetables in the United States often attempt to encode the location of the bus/train at any particular time, which may or may not be most efficient.
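The stem-and-leaf timetable described above can be sketched in a few lines; this is a minimal illustration with made-up departure times, not an actual Japanese timetable:

```python
from collections import defaultdict

def stem_and_leaf_timetable(departures):
    """Group "HH:MM" departure times into an hour -> minutes stem-and-leaf display."""
    by_hour = defaultdict(list)
    for time in departures:
        hour, minute = time.split(":")
        by_hour[int(hour)].append(minute)
    lines = []
    for hour in sorted(by_hour):
        # One row per hour: the stem is the hour, the leaves are the minutes.
        lines.append(f"{hour:02d} | " + " ".join(sorted(by_hour[hour])))
    return "\n".join(lines)

departures = ["07:05", "07:15", "07:25", "07:40", "08:00", "08:30", "09:10"]
print(stem_and_leaf_timetable(departures))
# 07 | 05 15 25 40
# 08 | 00 30
# 09 | 10
```

Each row is an hour; the number of leaves in a row directly encodes frequency, so peak hours stand out as long rows.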

msavva wrote:

Reading through the Amar, Eagan and Stasko paper I found that one of the most interesting parts was the section on questions omitted from their taxonomy of analytic tasks. In particular, it struck me as rather insightful that many of the questions people ask can't be included because they invoke uncertain criteria or they imply comparison operations at too high a level. However, it unfortunately seems that these are exactly the questions people will initially ask when presented with a given data set (i.e. we tend to start analyzing at a high level and have to work our way down to more fundamental analytic tasks in pursuit of having our high level question answered). With some knowledge of the taxonomy of analytic tasks we can hopefully deconstruct our high level questions more easily and then have a clearer idea of what tasks we need our visualizations to achieve. Maybe a good approach would be to ask people to explicitly perform such high level query deconstructions into analytic task primitives and then use that information to drive interactive visualization environments more effectively towards the user's goals.

strazz wrote:

"The principle, then, is: Maximize data density and the size of the data matrix, within reason." I think that in Chapter 8 Tufte makes several interesting points, such as demonstrating the degree to which human eyes can make visual distinctions, and he actually provides a mathematical model to calculate it. After reading this I can think of several corrections I could make to the first assignment in order to maximize the data-ink within my graphic in less space. The more data your graphic can represent by itself, the easier it is for the viewer to compare and analyze the information without losing focus, facilitating the process of gathering knowledge and noticing the patterns contained within it.

adh15 wrote:

My initial reaction to Amar, Eagan and Stasko is positive. I agree with their argument for analytic primacy. I find this to be a significant contrast from Tufte, who comes across as more of an advocate for the data. In the end, both the analytic needs of those viewing a visualization and the truth revealed in the data are important, a fact that Wattenberg touched on in the interview we read for Tuesday when he noted that, "...having real data as a part of your project is as important as having real users look at your project."

The taxonomy proposed should be helpful in analysis. I am interested to see examples of how it can be used for design as well.

@gneokleo I also enjoyed the micro/macro discussion, but agree that more instruction in generating visualizations would be helpful, since there's often a gap between analytic and generative ability.

dlburke wrote:

I've noticed in a lot of the redesigns that there is a lot of movement toward Tuftesque simplicity. This tends to lead to near-complete eradication of aesthetic qualities. The point becomes entirely about answering a particular question via the most straightforward method. Indeed, some of the biggest criticisms were leveled at visualizations that attempted to make some sort of aesthetic connection with their topic, e.g. the wine grapes. So I've been wondering: do aesthetics have a place in visualization? Are they only allowable to the extent that they keep things simple and/or attract the eye? It seems that in creating a visualization, especially when it is going to be published in a magazine/website/etc., the designer should seek to embed the design in the context of the topic.

yanzhudu wrote:

As for the train timetable example: there is also a data-quantity issue in making design decisions. Japan has a vast train/subway system (Tokyo's subway system looks like a spider web: http://speedymole.com/Tubes/Tokyo/tokyo-subway-map.html). Train frequency is also quite high; at peak hours, trains can arrive at a station at three-minute intervals. Listing the timetable for all trains at all stations is simply impossible. In the US, by contrast, railway lines tend to be linear with fewer stations (e.g. Caltrain), so listing every train on the timetable is possible.

gdavo wrote:

In Chapter 8 of the Visual Display of Quantitative Information, Tufte defines a measure of graphical performance as:

data density of a graphic = number of entries in the data matrix / area of data graphic

Tufte claims that the higher the data density, the better the graphic. Consequently he expounds the principle that "Graphics can be shrunk way down".

I quite disagree with this maxim: the quality of a graphic is not strongly correlated with its size. If you have a really bad graphic with very little information, shrinking it a lot will not make it better. On the other hand, if you expand a very information-rich graphic to make it clearer, that won't work either. The human eye can only catch information within a reasonable span, so I think graphics' sizes should (and do) not vary too much, and should depend more on constraints such as page size.
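For concreteness, Tufte's measure is trivially computable; the sketch below uses hypothetical entry counts and areas, and the last line illustrates the objection above: shrinking a graphic raises its density without adding any information.

```python
def data_density(num_entries, area_sq_in):
    """Tufte's measure: entries in the data matrix per unit area of the graphic."""
    return num_entries / area_sq_in

# Hypothetical examples: a sparse pie chart vs. a dense weather history.
sparse = data_density(num_entries=5, area_sq_in=20)     # 0.25 entries per sq. in.
dense = data_density(num_entries=2000, area_sq_in=30)   # ~66.7 entries per sq. in.

# Halving both linear dimensions quarters the area, quadrupling the density
# of the very same graphic -- the number rises but no information is added.
shrunk = data_density(num_entries=5, area_sq_in=20 / 4)
```

This is why the measure rewards shrinking: density depends only on the ratio, not on whether the graphic communicates anything.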

jayhp9 wrote:

@sholbert Overloading the user with data might certainly be very instructional, as long as it is done in a way that lets the user know what the key components of the infographic are. If the extra data increases distraction, it serves little purpose other than to degrade the effectiveness of the visualization. Of course, it boils down to how thoughtfully the person making the visualization conveys the important data while still including other, possibly less significant, data.

I think that adding graphical components that serve no form of visual encoding might still be particularly useful, as in one case we saw today in class. At the beginning of the class, we went through one group's work on the infographic about India. The revised version of the visualization created by the group actually included a not-so-detailed map of India. As Professor Heer pointed out, while the map of India did little to convey anything, it would immediately let the viewer of the visualization know that the graphic is about India, without ever needing to write 'India' anywhere on it. It helps to build context.

sholbert wrote:

@strazz I agree, I found this section very interesting too. Overloading the user with data isn't a bad thing, as long as the data is decipherable. A great example of this, I think, are the picture books for kids that describe in extreme detail how large mechanical products work (ships, blimps, planes, etc.). I loved these books when I was a kid, and they greatly facilitated the learning process and engagement with the text.

acravens wrote:

@dlburke and @ankitak -- I noticed this same focus on visualization redesign to maximize information conveyance, especially regarding the one about high school students studying abroad. I know the source was National Geographic, but we ran out of time before I was able to ask where in the magazine that graphic originally appeared. One thing we didn't discuss in any of the redesigns is that sometimes stylized charts/visualizations can serve a dual function of conveying information and creating a spatially pleasing layout. I found myself wondering how large that graphic appeared and whether it was in the "intro" part of the article, on the title page, etc. While there were certainly valid points made about the misleading nature of the sweeping curves in the original, I'm not convinced any of the proposed redesigns could have served this dual function, if that is indeed what the original was doing. And despite its faults, I did come away with a strong impression of the order-of-magnitude difference between the students coming into and out of the US to study abroad. That might have been all I needed to understand, if this were presented early in an article I subsequently read. Thoughts or counter-arguments?

ankitak wrote:

@dlburke I noticed the same thing in class today. It seemed interesting that though all the redesigns were done by different groups of students on different visualizations and completely different sets of data, the underlying common principle was simplification of the design. This inevitably resulted in the removal of aesthetic features in most cases and a movement toward bar/line graphs combined in simple ways. Though this definitely aided in understanding the data and its properties, I wonder how much attention such visualizations would get in print.

abhatta1 wrote:

@jayhp9 I agree with you that a hidden dimension in data visualization comes from utilizing the imagery embedded in the minds of the users. It helps people easily connect with and interpret the data. One possible method to explore is how we might transform/plot data so as to bring out visual imagery through the plot itself.

clfong wrote:

Reading Chapter 2, Macro/Micro Readings, from Envisioning Information, I find that the idea of adding more and more detail to a design for clarity makes even more sense with modern display technology such as Microsoft Seadragon. (http://www.youtube.com/watch?v=PKwTurQgiak) With the ability to zoom in/out of a huge image with the aid of multi-touch technology, many more details can be encoded in a single image, and a potential audience could freely explore whichever part of the image they find most interesting, leading to a more interactive experience. Some examples in the chapter, such as the satellite debris, could be made into a much more informative display using this kind of technology.

saahmad wrote:

In the "Small Multiples" chapter (Chapter 4) I really liked the idea of focusing on a comparison rather than just the raw data. I think this notion is really important because oftentimes trying to understand and appreciate a visualization takes a lot of effort. It would be much easier if the author provided two graphics, one being "normal" while the other clearly presents the interesting anomaly. It brings back memories of childhood puzzles in Highlights magazine (I spent a lot of time at the dentist as a kid), one of which would ask "What is wrong with this picture?" and the other "Find the 10 differences between these two pictures". Anecdotally, it was always easier for me to do the difference puzzle.

Now, obviously sometimes you do not have actual data to show a comparison. Many times you have a restricted data set, which is self-described, and the visualization wants to draw attention to the raw data rather than to a change in that data. Obviously in these cases there is not much you can do, but when at all possible, I think it would be helpful to do additional data collection to serve as a baseline. For example, if you have a graph of the financial data of China and you want to present it, I think it would be really valuable to also show financial data from a country the viewer is already familiar with (say, the USA) to serve as the baseline graphic and guide the user's sense-making process. Another thing to point out is that this baseline visualization does not even need to be generated from real data. For example, if you wanted to visualize the number of transistors on CPUs every year for the last decade, it might also be interesting to add an artificial line plotting what it *would* have looked like if CPUs had doubled their transistor count every year. By just glancing at that visualization, the author is much more effective in telling his story, and the user can more easily generate insights and questions.
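The artificial-baseline idea is easy to prototype; the transistor counts below are invented for illustration (not real CPU data), and the baseline simply doubles the first value each year:

```python
# Hypothetical transistor counts (in millions) for a decade -- illustration only.
years = list(range(2000, 2011))
actual = [42, 55, 105, 220, 290, 380, 580, 820, 1200, 1900, 2300]

# Artificial baseline: what the counts *would* be if they doubled every year.
baseline = [actual[0] * 2 ** (year - years[0]) for year in years]

for year, a, b in zip(years, actual, baseline):
    print(f"{year}: actual {a:>6} M, doubling baseline {b:>8} M")
```

Plotting `actual` and `baseline` on the same axes gives the viewer an immediate anchor for how far reality falls short of (or exceeds) the doubling story.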

malcdi wrote:

I enjoyed the Amar et al. paper, and so will comment mostly on that. First and foremost, I was highly amused that not only did it contain exactly 0 visualizations, but also that the layout was highly aesthetically unappealing. That said, I found their taxonomy clean and intuitive, and their ideas well explained. This made me wonder about whether visualizations should be used to explain visualization. Perhaps it tethers ideas more than it illustrates them. For example, in the discussion for Tuesday's lecture, a number of people pointed out "contradictions" between Tufte's designs and his design principles. But it's hard to create a design that follows all principles. I wonder whether a large part of visualization design is instinct/talent, which renders "rules" into guidelines in comparison.

I feel that this applies less to exploratory data analysis: the process of EDA is principled and ordered, and so it is much easier to follow design rules (because you're following a sequence of standard statistical tests). An artistic rendering designed to communicate a unique aspect of a data set seems to have quite a different process. Is this true?

A final note on the Amar et al. paper: in their conclusion they note: "For example, how does the visualization presented by a system support characterizing distributions or finding anomalies?" I think that the overall paper would have been much improved by incorporating small discussions along these lines into each taxonomy-item section.

@saahmad: I like the analogy with the kids' puzzles! It's much easier to start a task when you know that it is finite. However, what if (hypothetically speaking) there were 11 differences, and you stopped after finding 10? I am not sure how to solve the problem of giving the viewer all the data they need to make their own inferences, but at the same time making it simple enough that the anomalies or interesting parts are clear. Perhaps the workload of the task should be split: as researchers / designers we look at the big picture, and illustrate the results for a general audience?

lekanw wrote:

@ankitak Yes, I agree with you and the comments before. I think also in addition, there was a real effort to find the uses of the graphic, and the story it was trying to tell before the actual redesign. For the Dulles map, the wine, and the Indian utilities graphics, especially, groups thought very thoroughly about the use cases--navigating to your gate, finding which wines to drink when, and finding the causes of the utilities delays--in designing their graphics. So, even if the graphics are simpler, they are far clearer in what they are designed to do.

asindhu wrote:

I think the heart of the debate about the place of aesthetics in visualizations comes down to the fundamental purpose of the graphic. Tufte has a very "utilitarian" view of visualizations -- from his perspective, a graphic serves a very specific purpose intended to communicate a very specific message from the data, and he very much emphasizes the technical/statistical aspects of the underlying data, lamenting that most people who create visualizations do not have expertise in those areas.

From that perspective, I think it makes sense that nearly all of the redesigns in class focused on stripping away the "unnecessary" aesthetic elements. In this utilitarian view of visualization, you have to draw a very solid line between what serves a useful purpose and what doesn't, and any aesthetic element that might obscure clarity should be removed.

However, as people have pointed out, this isn't the only way to look at visualizations. We can also create visualizations where our main goal is to create something visually compelling for artistic or aesthetic reasons, while also conveying some useful information, just perhaps not in the optimal way.

One interesting analogy I thought of to compare these two perspectives is comparing a visualization to a piece of writing. Tufte's view of visualization is equivalent to the writing in a scientific paper: the writing is (ideally) concise, clear, unambiguous, and gets the point across effectively with no frills. On the other extreme, however, we could design visualizations analogous to poetry, where the merit is in the aesthetic sense; while there is meaning, it is often subtle or hidden from first view.

So these are the two extremes. I think the overall point is that we have to decide where we lie on that spectrum before we create a visualization. I don't think there's anything wrong with building an aesthetic sense into a visualization as long as we make a conscious decision in the relative priorities of aesthetics versus clarity and stick with that throughout the design.

estrat wrote:

In lecture, talking about missing data and other problems with data hit close to home for me. I think it's the sort of thing where people think that it won't happen to them so they don't have to put much thought into it. Earlier this year I was working on a paper for a conference and was collecting some data for the paper. The data came from software users had installed which sent us anonymous data about their browsing habits and we had been running the experiment for a month. We had gone over what sort of questions we wanted to answer and what data we would need, but when it came time to analyze and visualize the data we had collected, we realized we had forgotten to collect an important piece of information. We ended up throwing away all the data and had to start over. Fixing the code only took a few minutes, but all the data we had was useless.

hyatt4 wrote:

While I have enjoyed learning how we can encode data in the most informative way, I am not convinced that this is always the best way. The best way would depend on the audience and the goal of the visual designer. For instance, during the lecture there was a record design that conveyed information about various music/artist samples. While the color scheme did not encode much information, and there was some difficulty with the circularity, I would argue that it may have worked just fine in a teeny-bopper pop magazine. It looked like the kind of visualization I might catch my wife reading in People magazine. Yet in class it felt like we were beginning to transform the information so that it could be used as a reference for record-label executives or DJs who could use it to put together playlists.

I am not qualified to speak as an editor of a magazine (particularly entertainment magazines), but I do wonder how much they choose to trade-off a more informative visualization design over a more flashy, catchy, stylish, or artistic one. Would my magazine (by way of sales) be better served by a graduate of CS448B or a graphic designer? I will admit that sales and informativeness do not have to go hand-in-hand (i.e. I assume the National Enquirer or Weekly World News are still selling well).

emrosenf wrote:

I found the discussion of the Vietnam Memorial to be especially interesting given the controversy that surrounded the creation of the memorial. From reading about it, I gather that people were insulted by the bareness and size of the memorial, when these seem to be the very features that give it the micro/macro effect.

I think that we have trouble imagining how minimalism can clarify. It's easy to think 'more is more'. I remember in the reading about data-ink minimization thinking to myself, 'there's no way erasing can make things clearer'. I wonder whether the power of minimalism is hard to appreciate without seeing it live, and whether that may be what happened with the Vietnam Memorial.

amaeda10 wrote:

Several comments on data density and gdavo's comment.

1. I agree with gdavo's idea that "The human eye can only catch information on a reasonable span, so .. graphics' size should (and do) not vary very much". If this is the case, then the data density is pretty much decided by the amount of information you put in the graphic area, and this is what we should primarily care about in terms of data density. Starting from the size of the graphic area is the wrong way around.

2. I would like to add to gdavo's idea that "If you have a really bad graphic with very few information, if you shrunk it a lot it will not become better". Are you saying "bad graphic" because the graphic has too little information? Or because the graphic has other flaws, like wrong data, a bad color scheme, etc.? (Or both?) If the former, then I actually think shrinking the graphic to a reasonable extent has some benefit: viewers will wonder why very simple data is shown in a large area.

3. Tufte says that data density is related to credibility. I agreed with this idea and liked it a lot because it explains clearly why we should care about data density. Although not every time, many times we create data visualizations to convince somebody. As Tufte says, if we were shown just a super simple bar chart for a complex problem, we would be skeptical and think that something must be missing.

4. I also think that data density is important to make the visualization fun to play with. If the data shows only a very simple fact, there is not much room for exploration. As Martin Wattenberg said in the previous reading, being fun to play with is one of the major criteria for a good visualization.

5. In the last class, Prof. Heer showed the human-relationship matrix in Facebook. Although he claimed that having symmetry is beneficial for some reasons (sorry, I forgot the details on this. Maybe it is easier to emphasize the pattern? Or the shape of the triangle is hard to see? Does anyone remember?), it might take some time to notice that the data is symmetric, which could mislead viewers. In addition to those reasons, I think omitting the symmetry of the data visualization is beneficial from the point of view of data density. (Do you think omitting the symmetry is better or worse? If it depends on the context, please explain.)

selassid wrote:

@asindhu When Jeffrey said in one class something to the effect of "your failed visualizations might make great art," I think it comments on the trade-off being described between something visually interesting and something precisely informative. If there's a visualization that looks cool but does not communicate quickly and effectively, then by some definition it's not a good visualization. It's not really a trade-off: it's possible to have a good vis that communicates well and looks awesome, but it's not possible to have a good one that looks awesome and communicates poorly. I would say the Girl Talk vis we saw was fun to look at, but didn't communicate as well as it could have.

Also about the Girl Talk vis, I like the interpretation that its cluttered feeling was an attempt to recreate the cluttered feeling of the music. It would be an interesting technique to take advantage of the viewer being overwhelmed by a vis to guide them through a story, if the designer could control the order in which aspects of the mess are comprehended so as to lead the viewer through the same realization they had. We haven't talked much in class about the order in which people's eyes tend to traverse graphics, so I don't know what design decisions guide a viewer around a space, but it might be possible.

nikil wrote:

I really like how Amar, Eagan, and Stasko were cognizant of the pitfalls and the scope of their research in "Low-Level Components of Analytic Activity in Information Visualization". Their taxonomy and classification system seemed to cover all of the bases for their examples, but as I was about to say (before they preempted it), that was only based on their own dataset. The people who had classification ideas and designed taxonomy systems before them had some overlap in their primitives, but the differences suggest that the data set is indeed important in deciding the primitives for a specific task. This suggests that in order to have a more accurate and complete classification system, a new way of creating and collecting data (or at least a lot more data) must be used.

The wise authors are cognizant of this, and it definitely adds credibility to their argument. The system is interesting but, as they acknowledge, not yet complete.

jtamayo wrote:

On the subject of Micro/Macro readings, I found very interesting how some visualization have finite, explicit levels, whereas others simply present all the information at once and the mind chooses how to abstract information to the appropriate level. Stem-and-leaf plots, for example, have essentially two levels: the high-level distribution of numbers (histogram) and the detailed information of each entry. Maps, on the other hand, contain infinitely many levels, depending on how we choose to interpret the information.

Also very interesting is how maps aid in building a consistent mental representation of the information. Physical maps use different font sizes and layouts, so that it is easy to ignore detailed information when looking at the big picture. Digital maps, on the other hand, require an explicit zoom level, making the process much harder by introducing an additional "context switch".

felixror wrote:

Upon reading Tufte's chapter on data density, I am quite skeptical of his statement "Maximize data density and the size of the data matrix, within reason". His reasoning is that high data density, so long as it does not interfere too much with the visual quality, can make for a more credible and objective account of the data being presented. I agree with his view; however, I think that maximizing data density might not always be the best way out. As Martin Wattenberg remarks, what is fascinating about data visualization is its ability to tell a story that cannot be told, or is extremely hard to tell, otherwise. In this regard, the low-information design refuted by Tufte might serve Wattenberg's purpose very well. For example, if we obtain a very noisy data set and we want to show its underlying pattern, we can deliberately devise a statistical model to filter out the noise, so that only the revealing information is displayed at the end. So this is a case where maximizing data density might not serve us very well.

andreaz wrote:

I appreciate the emphasis Tufte places on how clutter and confusion are not attributes of high-density designs, but rather a failure of design. If complex data is presented correctly, the truth in the data will be revealed by its overall texture on a global scale. The quote from Josef Albers about how we read words as a whole really hammers home the point that our perceptual system is naturally suited to looking for overall patterns in complexity. We don't need to oversimplify data to detect a pattern; rather, we must present complex data in a way that reveals the truth in its totality. I thought the visualization of a newborn infant's sleep-wake cycle from the Card, Mackinlay, and Shneiderman reading was a great example of a micro/macro visualization, because the normalization of the infant's sleeping patterns was revealed through viewing the complexity of the data on a macro level.

It's also interesting to see micro/macro visualizations applied to UI design. I feel that Ableton Live is one product that employs this type of visualization particularly well. The ease at which the user can easily zoom in and out of a track is really helpful for navigating through a complex project quickly and effortlessly because the macro view prevents the user from having to make frequent context switches.

jsnation wrote:

I found the chapter on Small Multiples interesting, but I wished it had more concrete detail on the relationship between figure size / level of detail and user perception of that figure. It makes sense that "comparisons must be enforced within the scope of the eyespan", but I was wondering what size the "eyespan" is. Even more useful: has anyone studied the ideal size of a figure, or the trade-offs between increasing size and understandability? I guess the size of a figure is often determined by the medium it will be presented in anyway, though... I also liked the train pictures example, as it was a bit counterintuitive to me. I would think that if you were only concerned with the arrangement of lights on the train, you would de-emphasize the train portion of the image by drawing it with as little detail as possible. But actually, you should draw the train with a lot of detail and then change only the portion with the lights, because we will naturally perceive the changes better and de-emphasize the rest of the image in our minds.

jdudley wrote:

I was happy to see mention of Principal Components Analysis (PCA) as a useful tool for EDA in the NIST handbook (although you had to dig for it). It can be a very useful method for EDA because it lets you reduce the dimensionality of your data and view it in reduced dimensions (e.g., first and second principal components on an X-Y scatter). Still, I feel like PCA is becoming one of the most abused EDA methods. The main problem is in assuming that the components have meaning. Since PCA looks for maximal variance along orthogonal components, the first and second principal components could simply be different sources of noise in the data, and the "good" stuff could be hidden away in smaller components. In fact, by definition PCA components are not interpretable as meaningful factors, although they are often seen this way. In many cases I like to apply Independent Component Analysis (ICA), which attempts to "unmix" statistically independent, non-Gaussian components from high-dimensional data. The nice thing is that the components need not be orthogonal, which is great because the orthogonality assumption often does not reflect reality for many types of data (e.g., genes differentially expressing in two different pathways).
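To make the caveat above concrete, here is a minimal, pure-Python PCA sketch (power iteration on a 2x2 covariance matrix; the data is made up). It shows that the first component is simply the direction of maximal variance, whatever that direction happens to mean: here it latches onto x only because x is the noisiest measurement.

```python
import random

def first_component(data, iters=200):
    """Power iteration on the covariance matrix of 2-D data.
    Returns the unit vector of the first principal component."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    # 2x2 covariance matrix entries
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    # Repeatedly multiply a vector by the covariance matrix and renormalize;
    # it converges to the dominant eigenvector (the first PC).
    v = (1.0, 1.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)
    return v

random.seed(0)
# x has much larger variance than y, so the first component aligns with x,
# regardless of whether x is signal or merely noise.
data = [(random.gauss(0, 5), random.gauss(0, 1)) for _ in range(500)]
v = first_component(data)
print(abs(v[0]))  # close to 1.0
```

The point of the toy example: nothing in the algorithm knows whether the high-variance axis is the "good" stuff or a noise source, which is exactly why reading meaning into the components is risky.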

esegel wrote:

Regarding macro/micro readings:

Visualizations choose a level of analysis (e.g. countries, states, counties, towns) that is relevant to the task at hand. For example, when trying to decide which states should receive funding, it likely doesn't make sense to look at data on individual towns. Presenting data at the right level of analysis involves aggregating it (e.g. summing, averaging) to better convey patterns while hiding smaller, irrelevant details. Macro/micro visualizations (as presented by Tufte) do this just like other visualizations, revealing "emergent" macro patterns by combining smaller pieces into larger wholes. At the same time, they allow the user to "drill down" into the composition of the larger patterns, examining the tiny details that add up to the bigger picture. This lets the user "choose" his own level of analysis. Tufte does a good job describing this process. He writes, "Micro/macro designs enforce both local and global comparisons, and at the same time, avoid the disruption of context switching...High-density designs also allow viewers to select, to narrate, to recast and personalize the data for their own uses. Thus control of information is given over to viewers, not to editors, designers, or decorators."

This is all good. But Tufte does a much worse job of describing when this design is appropriate. Ideally, one would also be able to explore the data in more depth to better understand the patterns, but this seems to work well only with some datasets and not others. Sometimes (often) drilling into the details breeds confusion. What determines this distinction? When should I choose a macro/micro design and when should I avoid it? Tufte doesn't explain this well.

anomikos wrote:

@dlburke, @ankitak et al. It is interesting to see how the readings affected the design efforts of the groups. Most of us are inclined to do everything by the book. I wonder, though, to what extent this limits our capability to approach a problem from radically different angles. It seems that an iterative process (d.school style) with user testing is more appropriate for creating visualizations.

On another note I really want to challenge the necessity of all the taxonomies like the one described by Amar, Eagan and Stasko. What are the metrics that can help us evaluate the performance of one taxonomy in describing a set of data?

"We have found that the proposed ten tasks cover the vast majority of the corpus of analytic questions we studied." Quotes like that make me really skeptical. What does "vast majority" imply? Are there any theorems on the completeness of each taxonomy? Are some taxonomies better at describing datasets with specific elements? As a metaphor, I was thinking about domain-specific languages: not Turing complete, but really good for implementing programs in a specific area.

ankitak wrote:

@lekanw: I completely agree that "they [the graphics] are far clearer in what they are designed to do" after the redesign. And that's precisely my point: Is simplicity one of the underlying principles of good design?

@acravens: Though the visualization of students coming into and going out of the US for education did provide the visual impact required, most of the other original visualizations didn't. They were definitely more aesthetically pleasing than the redesigns. But the question here is probably about balancing both. In this regard, Tufte's micro/macro readings contain many good examples of visuals that precisely convey data patterns and let the user clearly answer various questions about the data, while being aesthetically pleasing at the same time.

jbastien wrote:

I was procrastinating and reading my Google Reader feeds instead of working on the homework when I stumbled on this: http://datajournalism.stanford.edu/

There's a special appearance in it!

I guess the tubes are telling me to work!

jasonch wrote:

In the "Low-Level Components of Analytic Activity in Information Visualization" reading, I couldn't help but try to associate the 10 categories with corresponding SQL commands where I could. For example: 1. SELECT; 2. SELECT ... WHERE; 3. SELECT aggr(..); 4. SELECT max(..); 5. SELECT ... ORDER BY. Of course, higher-order questions like 7, 8, 9, and 10 can't be done with simple SQL commands, but they can still be performed by combinations of 1-5. Perhaps, following this logic, we could even further reduce the 10 "low-level" tasks to simpler forms?
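The task-to-SQL mapping above can be sketched with Python's built-in sqlite3 module. The table name and data here are made up purely for illustration; the comments name the Amar/Eagan/Stasko tasks each query corresponds to.

```python
import sqlite3

# Toy table of cars (hypothetical data) to exercise the mapping.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cars (model TEXT, mpg REAL, hp REAL)")
con.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [("civic", 33, 110), ("mustang", 21, 300), ("prius", 50, 121)],
)

# Task 2 (Filter) -> SELECT ... WHERE
print(con.execute("SELECT model FROM cars WHERE mpg > 30").fetchall())

# Task 3 (Compute derived value) -> aggregate functions
print(con.execute("SELECT AVG(mpg) FROM cars").fetchone())

# Task 4 (Find extremum) -> MAX/MIN
print(con.execute("SELECT model, MAX(hp) FROM cars").fetchone())

# Task 5 (Sort) -> ORDER BY
print(con.execute("SELECT model FROM cars ORDER BY mpg").fetchall())
```

Higher-order tasks such as finding correlations or characterizing distributions indeed have no single SQL verb, but they can be composed from these primitives plus some client-side computation, which supports the reduction idea.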

zeyangl wrote:

Just some more thoughts on the "Pandemic Flu Hits the U.S." charts presented in class. It's an example of small multiples that invites readers to do both micro and macro reading. One of the dominant problems in that chart is the scale. The top of the scale, colored red, is >= 30 in every 1000 people. This makes the day-60 chart almost entirely red and excessively alarming. Another problem is the use of circles to represent cases, because very close samples tend to create overlapping circles. Coupled with the problematic color scale, this creates very cluttered results, like the day-45 chart in the series. It would be much cleaner if the map were pixelated like the Tokyo population chart on p. 40 of Envisioning Information, because grid squares do not overlap and thus bring out the natural gradient of the color scale.
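The grid-square idea amounts to binning: collapse nearby samples into one cell with a count, then let the count drive the color scale. A minimal sketch (the point coordinates are made up, imagined as pixel positions on the map):

```python
from collections import Counter

def grid_bin(points, cell=10):
    """Count points per grid cell: one colored square per cell
    instead of N overlapping circles."""
    return Counter((x // cell, y // cell) for x, y in points)

# Hypothetical case locations; the first two are nearly coincident,
# which would produce overlapping circles on a dot map.
points = [(12, 80), (13, 81), (55, 40), (56, 41), (90, 10)]
counts = grid_bin(points)
print(counts[(1, 8)])  # prints 2: both nearby points fall in one cell
```

Because cells tile the plane, no mark ever occludes another, so the color scale's gradient stays readable even where cases cluster.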

mariasan wrote:

In both "Envisioning Information" and "The Visual Display of Quantitative Information" Tufte writes that thin data prompts suspicions. I'm not convinced that's the case for people outside of his field of expertise. On the contrary, I think thin data is a common and effective trick for communicating with "the masses".

There are two daily newspapers that are distributed for free in and around the public transportation entry points in Stockholm. Every day a column on the front page displays a little visualization of some arbitrary data. It's often ill-constructed, skews the data, and/or has errors (many times the percentages do not add up to 100). But people read it. And believe it. They talk about it with their colleagues at lunch, and the message has spread, regardless of whether it's right or wrong.

Don't get me wrong, I'm not a fan of these visualizations, but my point is that they work. And I believe that for their audience they work better than what Tufte would call a "high-density design". A lot of people enjoy getting the light version, the one that doesn't show everything and leaves it up to them to navigate. And to be honest, on a cold, dark, rainy and windy mid-November Monday morning I can't say that I blame them.

arievans wrote:

My comment is mostly a discussion based on @mariasan's post above as I felt similarly in some respects. I agree that the masses put a certain trust into printed charts and graphs in sources that may or may not be legitimate. Even if the data is "thin," they have no real reason to doubt it. After all, the source of the data is always cited somewhere, and therefore it must be authentic, right? In general people seem to be too busy to question much of what they hear. We're all continuously moving towards easily consumable data. We want everything summed up and presented to us neatly so that we require minimal further effort to assimilate the information. Thin data, unfortunately, when summed up, appears to be just that. As @mariasan writes above, this is a super effective "trick" to communicate with the masses. In fact, it's so apparent in television news that I no longer watch it.

So the question is, can we do anything about the thin-data problem? Are there any ways we can prompt people to be suspicious of the summarized and pretty statistical graphs they see? I'm going to go out on a limb and say that no, we cannot do anything about it...yet. The main issue I see at the moment is that the majority of the content in our lives is non-interactive--especially anything data-intensive. Imagine the future, however, when we view news in a more collaborative way: data and visualizations that are open source and modifiable in real time. Someone could take data that has been skewed, rework it, and republish it in the same location. If people could then vote on which presentation divulged the greatest value--a la digg or similar, for example--we would be able to fight off this situation. Given our history as humans and our inherent granting of authority to print media, however, I don't think we can do much about it yet. We may still be presented a "light version," but the ways it gets to us and the ways in which we put trust into the sources of these summaries need to change.

ericruth wrote:

I like @anomikos's point about taxonomies and principles laid out in our readings. I see the didactic value of them, but I also worry that they stifle creativity and experimentation. It seems like human nature (or computer scientist nature) to want to categorize and quantify data visualization methods, but I think the value of such principles diminishes as they get less general.

Data visualization incorporates an interesting mix of skills (technical, quantitative, aesthetic, artistic, etc), so it seems important to study it from all of these different angles, rather than focusing on one or two. To that end, I think it would be beneficial to try more unconventional, creative approaches to our work. It seems like there is a lot to be learned from an iterative/experimental approach.

msewak wrote:

Multifunctioning Graphical Elements in Tufte's book

The War with Germany is a beautiful chart. I like how the numbers serve two purposes: read horizontally, they tell us how long each unit was in France; read vertically, which units were in France in any given month. However, I wonder if this graph could still be used if the data were non-contiguous. What if unit 1 was not present in France from April to June 1918? Wouldn't holes in the graph distort the sorting (by length of each unit's presence)?

rparikh wrote:

Tufte's vehemence about data density made me feel bad about my first assignment submission :(

I think generally Tufte is right; the bar chart he showed was clearly an example of a low-density, unnecessary graph. Some of my favorite graphics are ones like Napoleon's march or the New York City weather chart included in the chapter. I can pore over those for hours, deconstructing the data in my mind and drawing multiple inferences about many different things. However, I think it's very important to keep in mind the task at hand. In our first assignment, the task was explicitly to show our "coworker" the general changing landscape of the smartphone industry, with less emphasis on things like exact figures and numbers. The goal was to answer a very simple question with a simple chart (at least that was the goal in my mind). While numbers are important and metrics-driven thinking has ultimately proven effective, some situations may call for abstracting away much of the data to show a clearer picture. This attitude can be used for positive efforts, such as very obviously showing the benefits of things like water conservation, or it can be used to "dumb down" a message and suppress critical thought. The latter is evident in the table Tufte uses to show the data density of various publications, where in dead last, by a large margin, is Pravda.

jeffwear wrote:

Tufte begins his chapter on "High-Resolution Data Graphics" by criticizing a bar chart from the OMB for its sparsity. At first, I puzzled over how the graph might be improved. The "Total Participation" bar seemed somewhat superfluous: the total could be rendered in other ways (for instance, by stacked areas) without necessarily sacrificing easy comparison of the two participation metrics (for instance, if the stacked areas were arrayed one above and one below a common baseline). But what if all the OMB had, or desired to display, was the contrast between the data rendered by the two bars on the right? Was there not a place for simple graphics?

Reading onward, I came to recognize that it would be very simple to increase the data density by shrinking the graph. Especially if the "total" bar was removed, the graph was much bigger than it needed to be. But I would like to consider Tufte's first suggestion instead, that we increase data density by enlarging the data matrix. While I don't believe that Tufte suggests this possibility explicitly, I feel that the emphasis on high-information displays might prompt us not just to reevaluate our designs but also our investigatory methods, our data sources, and our hypotheses.

In the context of the OMB chart, maybe it was the case that all they wanted to display was a comparison of the two participation metrics. Might this have limited their explorations? In the pursuit of such a narrow goal, maybe it became the case that all they had to display were these two categories of participation - that they could not have displayed more.

If they had instead been encouraged to seek out a breadth of data to display, they might have been able to answer questions concerning college participation vs. adult education participation and more. As Tufte says, "Summary graphics can emerge from high-information displays, but there is nowhere to go if we begin with a low-information design." To revisit my question above, "Is there not a place for simple graphics?", the answer may well be, "To answer simple questions." But since we cannot be sure that the problems we investigate are simple, we are well advised to dig deeper for data.

Leave a comment