Lecture on Tuesday, October 11, 2011. (Slides)


  • Required

    • Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases. Stolte, Tang, and Hanrahan. IEEE TVCG 2002. (pdf)

    • Multidimensional detective. A. Inselberg. InfoVis 1997. (pdf)

  • Optional

    • Dynamic queries, starfield displays, and the path to Spotfire. Shneiderman. (html)


jkeeshin wrote:

In the paper, "Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases," the authors present a tool that lets you create visualizations of, and sort through, large multivariate datasets. It's an interesting tool, but I found most of the paper to be simply working definitions, and they only presented a few examples of use cases for their tool.

It is interesting to read their justifications for data types, organization, and operators (concatenation, cross, and nest), and what sorts of visualizations they allow you to create. However, I found the images much more informative than the explanation, and imagine that with a tool like this, a few minutes of browsing or a short video would explain it much better than a long-winded discussion of types, graphics, and mappings.
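
To make the operator discussion concrete, here is a rough sketch of the three table-algebra operators, not taken from the paper: I'm representing each operand as a list of column headings (tuples of field values), and `present` stands in for the set of combinations that actually occur in the records.

```python
def concat(a, b):
    """Concatenation (+): ordered union of the two heading sets."""
    return list(a) + list(b)

def cross(a, b):
    """Cross (x): every pairing of headings from a and b, in order."""
    return [x + y for x in a for y in b]

def nest(a, b, data):
    """Nest (/): like cross, but keep only pairings present in the data."""
    return [h for h in cross(a, b) if h in data]

quarters = [("Q1",), ("Q2",)]
months = [("Jan",), ("Feb",), ("Apr",)]
# Only these (quarter, month) pairs occur in the records:
present = {("Q1", "Jan"), ("Q1", "Feb"), ("Q2", "Apr")}

print(cross(quarters, months))           # all 6 quarter/month panes
print(nest(quarters, months, present))   # only the 3 pairs that occur
```

The difference between cross and nest is exactly the point the paper labors over: cross lays out every combination of panes, while nest suppresses the empty ones.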

I think the examples they present at the end about a CFO making decisions about a marketing budget and researchers trying to figure out causes about performance times are interesting--they seem to be "real life applications," but at the same time, relatively contrived. They make it sound like a "magic" tool that makes sorting through complicated data very easy--I imagine it makes it easier, but that it is still difficult to find the right perspective on the data.

bbunge wrote:

What surprised me the most about the Polaris paper is that it was authored in 2002! But, after watching some of the demos on the Tableau website, I see that the legacy of this initial idea (of providing an automated and visual way of working with data) has lived on.

The Polaris study used several sophisticated data sets. Is there some obstacle that prevents this system from being used on data in practice? How difficult is it to set up a relational database so that it is compatible with the structure required by the system?

I went to a series of talks on Monday that described how interpreting the world's massive amount of data is the way of the future, specifically in the bio-tech and legal domains. I imagine that these sorts of tools and their use will become more and more commonplace.

The thought of using a tool to aid in the exploration of data excites me. However, I can't help but think that this to some degree oversimplifies statistical analysis and data visualization. The Tableau website and videos heavily emphasized "drag & drop" and "Ease of use that gives you chills". Statements like this make me wonder what the market is for the product. No doubt, the examples were impressive. The Polaris paper emphasized the consideration of data visualization principles and how they were incorporated into the project. I see these elements in Tableau too, but they aren't the focus. Nevertheless, I am very excited to use Tableau and experience its benefits, especially during the stage of figuring out what story to tell.

abless wrote:

Jeremy, I think it is not surprising that in an academic paper you would find "working definitions" and a description of the system's underlying functionality. Clearly, the paper is not meant as a tutorial, but rather as a presentation to the scientific community. The videos on the Tableau website are intended for the end user, and are maybe more what you were looking for.

I think the Inselberg paper on Parallel Coordinates was _really_ interesting. It amazes me what insights a simple plot over the various variables can yield. This really seems to be a great help in exploratory data visualization. However, as pointed out in lecture, the relative ordering of the variables on the x axis is probably quite important, depending on the data and the questions you might want to ask.

So far, I have seen how visualizations can serve two different purposes. On the one hand, they can be presented to a viewer in order to convey a message. In that case, it is important that the visualization be easy to decode, easy to understand, and convey the underlying message. The other purpose of visualization seems to be in exploratory data analysis. In that case, even highly complex (and hard to decode) visualizations (like parallel coordinates) seem to be helpful in understanding patterns. However, I doubt that these kinds of visualizations would be presented to a viewer in an article. Rather, they seem to be the first steps towards finding a suitable visualization.

That leads me to the question of whether certain visualizations are more suitable for data exploration, and others more suitable for presentation to the end user.

cmellina wrote:

@jkeeshin, I found the Polaris paper to be a nice explanation of how design decisions in the tool were made based on a task analysis of data exploration. I think it's nice to think back to the "Low-Level Components of Analytic Activity in Information Visualization" paper as well, and I found the contrast between their approaches interesting, as the Polaris paper is more directly concerned with exploration. I also enjoyed their formalization, which seems like a necessary step to building a system that is both complex and self-consistent. But in general, you're right, I didn't get a full sense of the power of that sort of system until seeing the demo of Tableau in class today.

@abless, I also really liked the parallel coordinates visualization. I especially appreciated how, in the paper, multi-dimensional constraints within the data were translated to 2d constraints on the parallel coordinates graph in the form of a curve envelope. But the Protovis in-class demo really illustrated that this visualization too is most powerful in an interactive format.

With respect to your question about presentation vs. exploration, I think there's sort of a mixed answer. It's true that some techniques, esp. dimensionality reduction, are best suited to the exploration phase, as they don't provide easily interpretable graphics. But an increasingly important consideration in presenting a visualization on the web is that letting the user explore a data set interactively can itself make for the best form of presentation. So generally I'm not sure that some visualizations are more suitable for exploration and others for presentation. In both cases, you're trying to maximize the ease with which you comprehend the data.

jojo0808 wrote:

@bbunge, I was also really surprised at how similar Polaris was to Tableau! Until Jeff mentioned it in class, I didn't make the connection that Polaris was actually Tableau's precursor. Interesting to see how the tool evolved over time!

I enjoyed reading the taxonomy of graphics described in the Polaris paper. It was yet another interesting way of breaking down a graphic into smaller parts; here, it seems that 2-dimensional graphics (either Ordinal-Ordinal, Ordinal-Quantitative, or Quantitative-Quantitative) are the basic "building blocks" of a graphic, which can be put together into more complicated graphics by layering them, using a small multiples scheme, or changing the visual properties of the marks used to represent records. I still have a hard time imagining how I would visualize data, so this might be a useful way of thinking about how to build up a graphic from smaller, more manageable parts.

ajoyce wrote:

Brie, I think your point about oversimplification is a valid criticism, but I think a major point of Tableau as a commercial product is actually that it does abstract away the need to deal directly with such principles of data visualization. While relatively few people are familiar with many of the concepts discussed in the Polaris paper, almost everyone can interface with "drag & drop" simplicity. In many cases, the corporate executive or scientific researcher doesn't need or want to be familiar with the underlying processes involved in Tableau's functionality, but simply wants to draw conclusions from the data as efficiently and easily as possible.

Similarly, from what I've seen of exploratory data analysis practices, a common theme seems to be that experimentation takes precedence over precision. Generating a dozen different exploratory visualizations of a data set is typically more valuable to the process than producing a single, superbly crafted visualization. The goal is to quickly and efficiently get a broad idea of the data's characteristics, something a tool like Tableau can be of great help in, even if it requires a bit of simplification to do so.

jkeeshin wrote:

@everyone -- Fair, then I guess I didn't make such a good point. But I found the demo in class today to be orders of magnitude more illuminating than the papers.

mlchu wrote:

@bbunge: I agree that using Tableau for data visualization simplifies statistical analysis and data visualization. However, I do not see the visualization produced by the tool as the absolute end result of data exploration. As with any data exploration technique, the generated visualizations (in any format, such as graphs, animations, etc.) serve as a “stimulus” for further analysis and interpretation. Using Tableau gives the user a hugely convenient way to explore the dataset without spending much time parsing the data. If an interesting pattern is found, the user can always scrutinize the subset with more rigorous analysis (like PCA or other statistical methods). This helps the data user understand the characteristics of the dataset and apply the appropriate advanced analysis, rather than becoming entangled in the results of complex analysis without intuition beforehand.

@abless: I think the choice of visualization also depends on which stage of data exploration we are at. It seems to me that the class started off looking at different elements in static visualizations that help us convey the message. In today’s and last Thursday’s classes, we discussed more about exploring the data itself and finding, by visual inspection, the underlying message we would like to convey. So it becomes clear to me that interactive and dynamic visualizations help us spot the interesting stories hidden in unorganized data. Once we identify the message or the answer to our initial questions, the issue is to convey it effectively, subject to the limitations of the presentation medium, such as monochrome printing, lack of interactivity, etc.

grnstrnd wrote:

I found The Multidimensional Detective to be a particularly engaging example of data exploration (the parallel coordinates example from class today). In part, this was because it was such a successful exploration, in that it directly assisted the firm that provided the data. Also, it was simply encouraging to know that data visualization can have a tangible effect on the world. Moreover, the parallel coordinate visualization was simply captivating, and the interactive demonstration in class even more so. I found myself wondering: why do we need any visualization except parallel coordinate plots? The first weakness I see is that other encodings are so much more tangible; simple scatter plots certainly make relationships between two variables much easier to see. However, any time we have more than a few relevant variables, I will almost certainly try parallel coordinates first.
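
For anyone who wants to try this outside of class, pandas ships a basic parallel-coordinates helper; the car-like columns and values below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Invented car data; the last column colors the polylines by class,
# which makes it easier to follow one group of records across axes.
df = pd.DataFrame({
    "mpg":    [30.0, 18.0, 24.0, 33.0],
    "weight": [2.1, 3.6, 2.9, 2.0],
    "accel":  [16.0, 11.0, 14.0, 17.0],
    "origin": ["US", "US", "Europe", "Japan"],
})

ax = parallel_coordinates(df, class_column="origin", colormap="viridis")
plt.savefig("parallel_coords.png")
```

Each row becomes one polyline across the vertical axes, so clusters and crossings between adjacent axes jump out immediately.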

luyota wrote:

I think previous comments on how Tableau seems to oversimplify the data visualization work is kinda true. Indeed, drag & drop functions make it easy for general users to generate various kinds of visualizations based on their needs, and it might involve some automatic decision making inside the software so that users can probably think less about visualization design principles. It greatly facilitates the process even for non-experts.

However, as for experts, the software can still help build the design ideas. While the visualizations provided by the tool might be limited, it establishes the first step for these experts to think deeper. For example, since it's fast, people can spend more time on refining better insights through thinking about more ideas. Also, the tool can help them quickly understand the data's characteristics. I think for both general users and experts, the tool can be helpful.

stubbs wrote:

Inselberg mitigates his contribution by calling it a 'preprocessor', but it functions as an elegant optimization nomogram. Many times we wish to avoid aggregation/PCA/SVD/etc. and explore the differences between individual tuples in a database, sometimes in a sparse, high-dimensional (and heterogeneous) space; Inselberg's parallel coordinates are a compelling, clever solution. It may be interesting to weight/scale the variables, per a given linear combination, by vertical height, thereby giving an implicit (albeit heuristic) semantic significance to the integral of the curve (particularly with heterogeneous variables).

Neither Polaris nor any of the systems mentioned throughout the Polaris paper considers the semantics of each variable. Automatically mapping state names to a map is a start; however, would an underlying ontology or a base of structured domain knowledge be pertinent to a variable's 'domain'? Could something like this be implemented as a plug-in for Tableau?

olliver wrote:

I found Inselberg's plots both inspiring and extremely poorly executed. It seems to me that there are a number of ways a similar plot could be made that would better suit perception. First of all, since each line represents a different object in the data, why not color each one separately? That way users could follow the same piece of data across all variables and therefore see relationships between non-adjacent axes. In fact, I'm not convinced that one should use lines at all, since they look like a tangled web. Vectors would work just as well, as would non-linear shapes, such as smoothed curves that pass through each intersection point. I'm also curious about extending the display into 3D. What if each variable corresponded to a vertical pole, and one could draw lines (or whatever) from each pole to every other pole? Of course, you would need some camera and interaction to move around in the space, but then you could visualize all the n-squared relationships! Which would be pretty rad.

jnriggs wrote:

Inselberg's Multidimensional Detective paper (and today's corresponding demo) are some of my favorite parts of this course. I believe that nothing else we've come across has made such a strong case for human perception's irreplaceable power to locate hidden meaning in large datasets.

However, while I'm basically in love with the ideas in this article, it asks a lot of questions that I'd love to learn more about. First, how arbitrary (or not) were some of the decisions they made when forming the parallel design they describe? It would be great to see some task-driven research proving that even the smallest details of their framework are "perceptually optimal."

As a simple example, how does the left-right orientation affect our ability to find patterns in the data compared with a top-bottom orientation? If we're looking for different patterns in the data, can we somehow know in advance what the optimal presentation would be?

Maybe this whole process can even work the other way; patterns regarding how we locate meaning within data might inform us about perception itself.

blakec wrote:

Inselberg's multivariate parallel coordinate frame works very well for finding correlations among lots of variables and seeing some patterns among sets of data points. This system, however, doesn't create the same impact and general understanding as other forms of visualization that display fewer dimensions of data. Yes, his technique can be useful for sifting through high-dimensional data to find interesting properties, but to get a better understanding of how those chosen variables interact, a visualization that shows fewer variables and uses varying techniques (color, shape, size, etc.) needs to be used. For example, if you plot all the different measurements of a car, such as engine speed, miles per gallon, engine temperature, and many other properties, you might be able to see some trends in how engine temperature, speed, and mpg change together. But this pattern will only appear as sets of lines. Encoding one feature, such as temperature, as color and putting the other two properties on the axes can produce a visualization that displays the relationship more clearly.

phillish wrote:

The Inselberg paper goes over a sample visualization of a large dataset and gives some pointers toward finding "visual cues"--patterns or anomalies in the chart. Finding patterns in such a cluttered mess, however, proves extremely difficult in a single static visualization. Like we mentioned in lecture, separating data points into multiple smaller instances of the graph more clearly reveals insights (e.g. "Batches with low defect rates have poor yield and low quality").

This got me thinking about other ways to visualize large datasets. In class we mentioned utilizing interactivity (data on-demand) and animation to show more information while minimizing clutter. In any case, the point is to never show a visualization of the entire data set. The key is to selectively show just enough information that the reader can make an insight from. In the likely case that multiple insights can be inferred (such as the VLSI chip yield example), the same principle holds. Then all we need is another dimension to differentiate between different graphs, such as juxtaposing them (small multiples), transitioning between them (animation), and differentiating them (through color, opacity, etc.).

fcai10 wrote:

I thought the Multidimensional Detective paper was very underwhelming. Parallel coordinates looked like a pretty blunt tool with which to slice the metaphorical data watermelon. The graphs look like a clot of hairs in the bathtub sink, and while Inselberg exhorts us to "not let the picture intimidate you", I can't help but be a bit scared of what I see. Inselberg also "admonishes" us to "carefully scrutinize the picture", "no matter how messy it looks". Yes, there should always be careful scrutiny, but there also need to be more visualization types available so that the data can be presented in as non-messy a way as possible. A multitude of black lines representing data points, overlaid on another set of black grid lines, makes for an unnecessarily messy and sometimes unintuitive presentation of data. The Inselberg paper seems like it might be one of the very early papers on exploratory data visualization tools, so I am surprised that the Polaris paper (which I thought described a much better tool) was published only 5 years later.

njoubert wrote:

The Polaris paper (aka Tableau) has generated a lot of discussion about the paper's and application's relative merits, all within the implicit type of data the authors assume. Everything we've seen so far falls into the "business intelligence" type of data: sales prices, computer systems, etc. In other words, fields where formal statisticians have had less time to work, in comparison to pure scientific data, where rigorous statistical testing is a necessary part of data exploration.

What I'm most curious about is whether Tableau can be bootstrapped into supporting more formal methods, or whether their design tradeoffs have made this infeasible. The "Data Transformations" section appears to be the place that needs the most extension, since aggregation, partitioning, and grouping no longer support the full range of formal statistical methods. Because these tools are applied to a single quantitative field at a time, we cannot easily infer causal relationships or test covariance between fields. So possibly we can extend this part of Tableau?

As a different point in the design space, Graphpad Prism attempts to solve the data statistics problem directly for scientists. http://www.graphpad.com/welcome.htm. My colleagues in biosciences swear by this product, since it allows them to apply many different statistical methods to their data and quickly converge on the appropriate tests and models to use on their dataset. It would be interesting to compare and contrast these tools.

jneid wrote:

I think the comments discussing the purpose of Polaris/Tableau and the parallel coordinates visualization are especially relevant. As mentioned, it is a departure from last assignment's goal of displaying data and a message to a general audience. In these cases, the tools are made specifically for exploratory analysis (Polaris is defined as "an interface for exploring large multidimensional databases", and parallel coordinates is meant for "Knowledge Discovery" and "Decision Support"). For this reason, some of the discussed "downfalls" of the tools are actually beneficial. For instance, Brie mentioned that Tableau's "drag-and-drop" mentality oversimplifies statistical analysis and data visualization. The purpose of the tool, however, is to explore the data, which seems to be best achieved through an iterative process. For this, the most important aspects are low-latency interactivity and ease of use. Once the data has been explored to find overall patterns and more specific interesting instances or clusters, it can be more heavily analyzed using statistics or transformed into a visualization with a more specific purpose.

For assignment 1, even though the assignment was to create a visualization with a message, using a tool like Tableau would have been incredibly helpful in finding a message to portray. Likewise, the parallel coordinates visualization seems at first overwhelming, appearing to cram too much information into a small area. For the purpose of exploration, however, this is optimal: general patterns can be observed, and you can then drill down into interesting areas (the interactive version of this tool is great for that). In general, I think the tools presented, although perhaps not the best for presenting a single idea, are incredibly helpful for exploring data.

crfsanct wrote:

Obviously, Tableau is effective for databases and is useful when we want to know correlations, trends, or patterns among various dimensions. However, I cannot say how easily it visualizes other structures, such as connections in a network or flows and cycles of some kind of process (air flow over an airplane wing). This brings me all the way back to the first discussion about scientific vs. information visualization. Now that I have seen Tableau in action, I think I have a better grasp of the distinction. Tableau is definitely good at information visualization, where you can arbitrarily arrange dimensions to explore the data. But I can see how scientific phenomena would need more targeted exploration and require a more specifically tailored visualization to convey a message as effectively.

jsadler wrote:

@fcai10, you have made my favorite critique of the Inselberg example graph with "The graphs look like a clot of hairs in the bathtub sink". I do think it is a bad sign when the designer has to lay down a "warning" as rule #1: "do not be afraid of this graph ... trust us ... there is useful data in here". I think the Inselberg paper did a great job of going step by step through the use of the tool, BUT the limitations of printed paper did not do it justice. The interactive demo in class of the car-component parallel coordinates plots was way more effective at convincing me that this is an actual tool I might want to try, versus some crazy academic invention I read about in a static paper...

Reading the Polaris paper was good for getting a sense of the design rationale behind Tableau. It's funny to see how the "Bertin data vis ideology" made its way into this tool.

I am wondering why we haven't seen more Tableau-like features in programs like Excel...

aliptsey wrote:

The idea of using parallel coordinates is interesting exactly because, as Inselberg notes, "every variable is treated uniformly". This means no external encoding that people might interpret differently. (If anyone sees how this might not be the case, please note!). @jnriggs, I think this is what Inselberg was getting at when he notes that they are "perceptually optimal". I do think that organizing it in series is potentially less than optimal, because users might perceive that a data point 'upstream' has an influence on a data point 'downstream', when in reality correlation does not mean causation.

Inselberg notes that it is a tool for skilled users, which is an important distinction to note when we make assertions about how useful this multidimensional visualization is. This method will not be used to generate a graphic that will be published in a newspaper; rather it is a tool for experts in any field to analyze their data in a way that is very, very difficult to misunderstand because of the parallel coordinate system. I think this is brilliant, but thought so only after seeing it queried in class. It seems interactivity (or at least multiple static queries) are critical to interpreting the graph for even the most skilled users.

ardakara wrote:

Reading the Polaris paper, I realized that even though Tableau's Show Me defaults may not always make the most handsome visualizations, there is a strong body of theory behind those decisions. So it's probably worth taking a second to think about why certain chart forms might be the ones suggested before dismissing them. Even though the exact suggested form may not be useful, there might be insights in the theory behind it.

The parallel graphs that Inselberg talks about really shine if they can be interacted with, or if it is possible to isolate a subset of the data. If all the data has to be present, or if no interaction is possible, one good way of utilizing them would be to keep trends linear (e.g. engine displacement and acceleration would be adjusted to have a straight-line trend) and highlight/label outliers that might be interesting to look at (e.g. small engine displacement, but quick acceleration).

I want to finally add that scatterplot matrices feel overcrowded and hard to digest. Once each frame is carefully observed, they offer a great depth of information; however, they seem to violate the "vary the data, not the design" principle, which makes it hard to grasp the meanings of the trends in different frames, let alone compare them.

yangyh wrote:

I found the Polaris paper really interesting to read, as it not only introduces the tool but also presents many concepts of data visualization, which I thought were really helpful. However, I would say many of them are not easy to understand and put into practice, and that's why a tool such as Polaris, or even today's Tableau, is of great importance: it can definitely help people achieve a higher level of data visualization, in both an effective and an expressive way.

After today's demo, I was also impressed by how much Tableau can do for me. I can't wait to start my assignment 2 and get to explore the true essence of data visualization!

Last thing: as jsadler said, I'm also wondering why we don't really see more Tableau-like features in the currently prevalent chart tools, such as Microsoft Excel or Apple Keynote. Maybe it's because it's way too complicated for normal users? Though, judging from the interface, it's not that hard for users to learn how to make use of it; it's probably more difficult to teach people how to use it in a correct way. I really don't know the answer, but I hope I can learn as much as I can from assignment 2, and hopefully become a great "visualizer" in the future!

zgalant wrote:

I thought the most interesting part of the Tableau demo was the scatterplot matrix. It was so quick and easy to generate, and the matrix is a really great way to understand what is actually happening with the data. It's kind of a meta-visualization, since it helps you visualize the data in order to create a visualization.

I think it does a really good job of showing all different relationships, so you can pick one that actually differentiates something and tells a story.
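
That quick scatterplot-matrix step is easy to reproduce outside Tableau too; pandas has a built-in helper. The column names and random data below are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up sales-style data with three quantitative columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price":  rng.normal(20, 5, 100),
    "profit": rng.normal(3, 1, 100),
    "units":  rng.integers(1, 50, 100).astype(float),
})

# One pane per pair of variables; histograms on the diagonal.
axes = scatter_matrix(df, diagonal="hist", figsize=(6, 6))
plt.savefig("scatter_matrix.png")
```

Scanning the panes for the pair with the strongest visible structure is exactly the "meta-visualization" step: you use the matrix to pick which single scatterplot tells the story.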

zhenghao wrote:

It is interesting how much emphasis the authors of the Polaris paper placed on being able to rapidly iterate through different "views" to explore different facets of the data.

As mlchu (and others) pointed out, tools like Polaris and Tableau really shine at enabling the rapid generation of high level visualizations of the data in order to give an understanding of the areas of potential interest. This ability for rapid iteration and hypothesis generation is perhaps much more important for us when dealing with extremely high dimensional data where even feature selection -- identifying the dimensions that we care about -- is a difficult task!

This issue seems particularly interesting to me since I happened to be at an info session by Tableau today where I had the opportunity to ask Dr Mackinlay a few questions about Tableau and automatic visualization. In particular, he mentioned that there was so much emphasis on making the visualization process almost instantaneous because they had realized that people tend to be more likely to find interesting things during data exploration if they are allowed to generate ideas without having to pause in between. This perspective might help explain the tendency for "over-simplification" that we observe in the design choices for Tableau : )

awpharr wrote:

@jnriggs: You definitely point out a questionable part of parallel coordinates. They seem highly dependent on orientation (i.e. top-to-bottom or left-to-right, as you mentioned, but the position of the variables next to each other seems just as important), which makes them difficult to get right. Inselberg talks a small amount about this in his paper. I had to use them for a class once before, where we were optimizing a truss system for the lowest weight that still met the constraints of the truss members. The final visualization was a parallel coordinates chart where the constraints and truss member sizes were grouped together as the categories, and each line between them was colored based on its weight. By horizontally grouping the constraints and truss members together, the general trends within both categories became clearer as to the range of values that minimized the weight of the truss.

Finding the best orientation for a given data set's parallel coordinates visualization can take a long time, but once this point is reached, the information gleaned from it can be very obvious and important. Like I have mentioned in prior posts, visualizations work best when they are the products of a high-level synthesis of computer and man. These visualizations are exactly that.

jessyue wrote:

I also thought the paper, “Polaris: A system for Query, Analysis and Visualization of Multi-dimensional Relational Databases” was a very interesting read in finding out more about Tableau. It is always fascinating to go back in time and see how the original idea evolved into the product it is today. I thought the paper should have mentioned more about VizQL, the database visualization language behind Tableau, which is one of the reasons the software is so powerful. I am also interested in finding out how Polaris evolved from handling relational databases to other, less structured forms of data not in a database format, and the challenges faced there. I did enjoy reading about the examples and the practical applications given in the paper. In addition, I was impressed by the simplicity of the Tableau interface, and how fast the program responds and generates graphs given how much data it has to analyze.

insunj wrote:

I always thought of data visualization not so much as a scientific method for proof, but more as an art. Some may argue, but data visualization and art are very similar in that both create a visual argument and have to be effective. Art speaks the artist's voice, and visualization speaks the data's voice. In the Polaris paper, I found it interesting how they categorized and formalized the data visualization process from step a to z. There were some choices in the process (for example, summation, grouping, etc.), and I wondered if there could be more, and whether this standardized process may limit users' creativity in creating data visualizations.

Overall, Inselberg's paper was interesting and encouraged me to look beyond pictures/lines. However, I found his reliance on 'visual cues' sometimes problematic, for the reasons Tufte raises under data integrity. Data visualization, when misguidedly or mistakenly depicted, can lead to perceptions different from what the numbers represent. Then we can no longer rely on visual cues, and sometimes we shouldn't trust our visual perception 100%.

arvind30 wrote:

I found both papers to be very interesting reads. I wish I'd read the Polaris paper before I initially started playing around with Tableau. Whereas previously I was just throwing fields onto shelves to see what Tableau created, understanding the algebra behind the table and visual specification helped me understand how to better create multivariate graphs. Besides this, there were two things that jumped out at me from this paper. I was surprised by the claim that the interactive experience "does not need to be real-time in order to maintain a feeling of exploration: the query can even take several tens of seconds." This seems counter to what I would expect (for example, Google and Amazon have found that even milliseconds of delay in page loads cause significant drops in conversions: http://www.svennerberg.com/2008/12/page-load-times-vs-conversion-rates/). Why are users willing to tolerate a greater lag during data exploration? I wonder if this is a result of users having some domain knowledge about their datasets and understanding that it takes a while to grok and visualize a large dataset? As tools such as Tableau begin to reach a wider audience, I wonder if the average wait time a user is willing to tolerate will fall?

I'm also curious whether the use of computer-generated visualizations as a tool for storytelling is a more recent phenomenon - I was a little surprised that the examples in the paper were of "mundane" data analysis tasks, as I typically find storytelling examples (such as the one of Napoleon's march and retreat) more compelling. Clearly this is a major focus for Tableau now - their welcome screen showcases a number of storytelling visualizations (from tech IPOs to oil spill statistics), and at yesterday's tech talk, Mackinlay showcased Tableau Public, which allows you to publish and share your visualizations online. Perhaps, with the explosion of the social web/graph, the stars are now aligned to better address this.

ifc wrote:

Inselberg's multidimensional detective construct is one of the best ways that I have seen to explore general multidimensional datasets. It has its issues with metric sorting and interval ranges, but overall it gives people a very powerful tool to explore data. His running example of chip production is a perfect way to demonstrate its potential usefulness.

Personally, I think the overwhelming amount of data shown for large datasets can be among its largest limitations. This overload prevents people from extracting useful information and is just generally a visual hindrance. Heavily filtering metrics and line opacity can cut down on this problem, but the issue remains. A simple solution might be adding basic statistics, like the mean and a confidence interval, to the bottom of each metric. I realize these could be misleading, but I think more often than not they would communicate more information than what is essentially a 1D plot. You could even build in a way to see how changing one interval affects the statistics of another metric (i.e. car cylinders between 4 and 6 -> average mpg goes up 20% to 25mpg).
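The interval-linking idea above can be sketched in a few lines: filter ("brush") one axis of a parallel coordinates display and report how the mean of another metric shifts. This is a hypothetical sketch in Python; the function names and the tiny car dataset are invented for illustration.

```python
# Sketch: per-axis summary statistics for a parallel coordinates display,
# and the effect of constraining one axis's interval on another metric.
# The car records below are made up for illustration.

def axis_mean(records, metric):
    """Mean of one metric across a set of records."""
    values = [r[metric] for r in records]
    return sum(values) / len(values)

def constrain(records, metric, low, high):
    """Keep only records whose metric falls in [low, high] (an axis 'brush')."""
    return [r for r in records if low <= r[metric] <= high]

cars = [
    {"cylinders": 4, "mpg": 30}, {"cylinders": 4, "mpg": 28},
    {"cylinders": 6, "mpg": 22}, {"cylinders": 8, "mpg": 14},
    {"cylinders": 8, "mpg": 12},
]

before = axis_mean(cars, "mpg")
after = axis_mean(constrain(cars, "cylinders", 4, 6), "mpg")
print(f"mean mpg: {before:.1f} -> {after:.1f} after brushing cylinders to [4, 6]")
```

A real tool would recompute such statistics live as the user drags the brush, which is exactly the interaction that makes the 1D axes informative.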

pcish wrote:

In response to arvind30's comment on response time, I think the mentality of users using these different tools is the key. The verb used is also telling: when users are "exploring" data sets, they have a willingness to play with the data, to investigate it in different ways, to derive insights and ideas from what they see, instead of merely demanding a quick answer to their question.

On the Inselberg paper, I am actually semi-surprised that it qualifies as an academic paper. Its content and wording seem much more appropriate for a handbook on data analysis techniques or some other less formal venue. Nevertheless, it was an interesting read and a refreshing change of pace. The in-class demo of the car characteristics data, using a tool similar to the one described in the paper, was also very helpful in understanding the potential of such a tool, which might not be as apparent from just the paper.

wtperl wrote:

Although Jeremy may have missed the point of the paper, I think he does get at something important-- the visualization tools we've seen all have fairly complicated and cluttered interfaces that seem to *need* video/live tutorials to show how they work. This seems odd to me because data visualization is obviously such a designed medium. Although interface design is a fairly unique kind of design, the interfaces of Polaris and Tableau are quite visually unappealing. Maybe this is just an inherent problem in data vis software; maybe (good) tools just have too many features to present more cleanly.

It's not as though the interfaces show no thought... we see in the paper that a lot of thought went into Polaris' design. In addition, we saw Jeff use Tableau in class very quickly and efficiently; clearly it's a solid interface for experts. But in my own tinkering with the program, I found it to have a steep learning curve because of the many controls and options on the screen. I would love to see a data vis tool built with the beginner in mind, with graduated levels of complexity and an interface you could discover just by playing with it. I think a tool like that would be a lot more fun and easier to use for visualizers of all technical abilities, though of course it would be no easy task (especially if you wanted to maintain all the features of something like Tableau). But still a challenge worth exploring!

tpurtell wrote:

The parallel coordinates paper really did not convey the sense that this particular type of visualization was the best way to analyze the data. It's more a description of a specific exploration in human driven data segmentation that suggests the following method:

Step 1: Make a cluttered plot.
Step 2: Select the parts of the data that are hard to read as a focal window.
Step 3: If that de-clutters things just right, report a new segment boundary.
Step 4: Think about why that boundary is there, and try looking at different subsets of the data.

For datasets with complex relationships, it is necessary to look at windows of data to get a picture of the substructure of the data. It would be nice to automatically cluster the bundles of data so it's possible to easily jump into different explorations. Regardless, I'm more of a fan of a SPLOM approach because it makes it much easier to quickly look at each pair of variables and visually identify clusters. The sensitivity of the presentation to the variable ordering really puts a dent in its usefulness as an exploratory tool. The scatter plot view will be even more useful if automatic segmentation and sub-selection highlighting are included.

mkanne wrote:

Like pcish, I also really liked the Inselberg reading more as a guide to exploring data well than as an academic research paper. I found the guidelines very helpful, as they made me reconsider the tools I am using to explore data. As netj mentioned, Assignment 1 could have benefited from the application of better exploratory data analysis techniques. I also used Excel to create simple relational line graphs and scatterplots that gave me some indication of the relationships between variables and led me to the ultimate story that I told with my visualization. If I had followed Inselberg's advice and more closely scrutinized the picture, I would have understood the inverse relationship between automobile deaths and motorcycle deaths. Rather than simply considering the binary direction of deaths for both variables, a closer examination of those simple Excel graphs would have shown that the trends are almost inverse, meaning that they may be due to extenuating circumstances. This is also a case where testing the "I am sure of..."s would have led to similar and possibly more useful and accurate conclusions about the data.

dsmith2 wrote:

Multivariable datasets are certainly a reality in this day and age, and two of the ways proposed to deal with so many variables were particularly interesting to me: the Scatterplot Matrix and Inselberg's "Parallel" Coordinates.

Scatterplot matrices seem to be an excellent way to quickly examine moments where two particular variables really differentiate the data. In class the example of corporate vs labor funding between democrats and republicans was an interesting place where the data was split very effectively.

Typically, when examining such large data sets, one's own questions and biases are what generally drive one to examine/plot particular variables against each other, but allowing the visualization of so many variables plotted against each other potentially unleashes all kinds of unexpected visualizations.

On the other hand it really still gives rise to the problem of, as Inselberg says: orthogonality using up the plane very quickly. Scatterplot matrices attempt to overcome this problem by simply displaying more axes, but this is hardly a solution.

Inselberg's Parallel Coordinates allow one to visualize ALL of the data in comparison to ALL of the other data but not in a simple variableA vs variableB setting. The advantage of this is quite interesting because it allows one to see how more than 2 variables relate - which has obvious benefits.

Furthermore, it allows filtering by individual and multiple variables to reduce the data deluge of the initial image (as Inselberg says: do not let the picture intimidate you). As seen in class, it allows for incredible interactive possibilities (with the car example).

As I see it, the parallel coordinates style of visualization is not a be-all, end-all visualization; rather, it facilitates the ability to examine the variables in relation to each other, and therefore could lead to the design of some fantastic, more specific visualizations. On their own, the parallel coordinates are something of a data deluge.

jhlau wrote:

The Polaris paper was very interesting, but one thing that jumped out at me was when the paper asserted that a moderate amount of lag (>10s) was acceptable for data transformation and querying. This goes strongly against my intuition, because Tableau is also based on quickness and ease of use. The more cognitive work you have to do between modifications in your visualization, the harder it is (I feel) to stay focused and achieve a strong end product. A product that lagged between modifications seems to be extremely hard to use. While I haven't dealt with this in making visualizations specifically, I do know that when I'm coding, the longer it takes to compile, the harder it is for me to pick up where I left off. I imagine much of the same effect exists when creating visualizations.

More on Tableau: the technology amazes me. The tool itself is extremely useful, but the piece I see as most interesting isn't necessarily the human interaction, but Tableau's ability to identify suitable visual encodings for variables. The thing is, Tableau wasn't brilliant or anything. It wasn't always spot on, and most of the time it chose an option that I would've chosen anyways. What's important is not having to spend cognitive cycles on that part of the job, because it makes it so much easier to focus on creating the visualization as a whole.

Along those lines, I wonder if Tableau could craft whole visualizations based on the general look. It seems to me that interesting visualizations always show data with some kind of dynamic difference, e.g. bunching up is much more exciting than equal distribution in most cases. Is there a way to use computer vision to try to generate visualizations that tend to be more dynamic?

One other quick comment.. I noticed that my initial approach to a dataset is much like scatter gather. I'll first look at the dataset as a whole, then zoom into the parts I'm interested in and explore those. Once I've found the smaller parts that I'm interested in, I'll isolate those and explore those by themselves. Just thought it was interesting that the algorithm holds for my intuitive thought process.

netj wrote:

Interactivity seems to be one of the most important factors that determine the effectiveness of visualizations for exploratory data analyses.

As @aliptsey and several others have commented, I also discovered Inselberg's Parallel Coordinates technique very brilliant, but his presentation hindered the smooth perception of what it actually was. I have never seen this technique before, so only after reading nearly half of the paper could I finally start to understand the power of it. With the monochrome printing medium given, I see he did his best by showing multiple figures in series, but we can contrast this to the interactive version we've seen in class. I was shocked by how intuitive and engaging the technique became by adding interactivity to it. As Jeff brushed some range and moved it, we were able to instantly understand and start to see the obscured patterns. Even if I had seen this interactive demo first, I would have understood most of its power without reading any of his words.

I recall my experience with Assignment1, trying to understand the data using the primitive pivot table and chart feature in Excel. It was much better than writing relational queries in SQL myself, but still the delay and distracting tasks I had to perform between the time a hypothesis came up in my mind and when the actual data was drawn on the screen, were significantly slowing down my exploration. I was very excited to see Polaris extending this basic feature and Tableau taking it to the extreme with great interactivity. Some like @bbunge argue that Tableau's emphasis on drag&drop and ease-of-use is somewhat misleading, but I think those features that simplify and enable interactivity are exactly what make the product so effective for exploring multidimensional data.

sakshia wrote:

After seeing the demo of Tableau in class, I was very impressed by the range of functionality that the software has, and how easy it looks to use. As referenced in the Polaris paper, it reminded me of pivot tables; the only place I'd seen those prior to this is Excel. Tableau felt to me like a much more powerful version of Excel.

As I was reading the Polaris paper, however, I started to wonder about who the user is for this software. The paper refers to other systems which "have taken the approach of providing a set of predefined visualizations," describes their features, and then goes on to critique that "this approach is much more limiting than providing the user with a set of building blocks that can be used to interactively construct and refine a wide range of displays to suit an analysis task." At this point, I stopped and questioned - because I wondered how easy it is for a regular user to utilise this interface in a way that's meaningful. Being faced with too many options might be overwhelming for the novice user, and it is important to have defaults that can serve two purposes - a) to help the user navigate the interface, and b) to serve as a learning aid. I noticed, however, that the target users for this application seem to be 'data analysts,' in which case I wonder what adaptations would need to be made to the interface for a novice.

While both of the papers contain interesting models, I felt something was lacking. I would like to see empirical validation of the models, by means of experiments with subjects or other methods. With Tableau, perhaps its success (or user feedback) can serve as testimony to what works and what needs to be improved. Inselberg's model, on the other hand, was grounded in examples of analysis, but I would be curious to see whether a viewer can actually interpret this and do the mental work themselves. Furthermore, it'd be interesting to see the interaction of users with varying levels of 'expertise' (or 'statistical background'), and also to measure what the learning curves are for these models.

kchen12 wrote:

When reading through the specs and capabilities of Polaris, I was definitely (mentally) taking notes on how I should plan to explore my data for the next assignment. Reading about how they successfully created an interface for exploration naturally indicates the type of exploration we amateur data scientists can use to jumpstart our own data exploration. Many of the types of displays and aggregation/derivation capabilities Polaris supports were discussed in class: multivariate displays, comparing small displays, scatterplots and trellis displays, aggregation, binning, partitioning, grouping, etc.

The two scenarios presented in the Polaris paper, as well as the Multidimensional Detective paper, show how powerful and useful these tools are, further echoing the idea that the analysis process is often an unpredictable exploration of the data -- that data exploration isn't just to produce a pretty and readable visualization but to provide insights and solutions in response to a problem, be it coffee sales or the root of system failures.

One thing I would really like to experiment with is animations of the same data sets over time, something Jeff showed us in the beginning of lecture with lifespans. It was mentioned in the Polaris paper as a "to be continued" feature and I'm wondering if Tableau or any other tool we have access to supports it, or whether one must statically generate the images first and put them in another program to animate.

stojanik wrote:

One of the most impressive aspects of using parallel coordinates for multivariate data set visualization-exploration is that it is highly scalable and efficient; efficient in the sense that you can effectively establish patterns and relationships in "multivariate/multidimensional problems without loss of information" using only a few visual encodings, e.g. shape, color, and orientation. I can see how the original image from the article is "intimidating", but it is the interaction with the model, through a series of sub-problem partitioning/hypersurfacing steps, that may yield that one "digestible" story-image, or collection of partitioned images in the form of small multiples, that makes sense of the problem(s) being explored. One area that I wish Inselberg had explored more (and maybe he has in other papers) is the role of the queries - the "cutting tools" - in making this method of exploratory data visualization really work. I would have liked to see some examples of what he means by "combining atomic queries to form complex queries" and how those queries generated intricate cuts of the dataset.

Speaking to the comments others have made about the Tableau software, I agree with @bbunge when he/she (sorry) states that the use of Tableau for data visualization simplifies statistical analysis and data visualization, and that the visualization produced by the tool is not the absolute end result of data exploration. It appears very useful for prototyping and rapid visual exploration. But I also agree that offline, and perhaps collaborative exploration (offline/online) of data visualizations may contribute to the final iteration of a graphic or interaction tool to explore the data domain further. Martin Wattenberg and Fernanda Viegas speak to this in the ACM Queue article (p.4) when they discuss the importance of showing/talking to people about the visualization.

kpoppen wrote:

I like that both of the papers highlighted the profound usefulness of an automated system like Tableau for prototyping visualizations and investigating data. In many ways, I think this focus is more in line with computers' strengths -- much in the same way that software libraries obviate the need for programmers to repeat certain kinds of overhead for every project, using Tableau to see the story of the data very efficiently automates the process that people would otherwise have to go through to investigate data. Additionally, this use case is nice because it de-emphasizes the weakest point of computer-designed visualizations, which is that they can be less polished / perfect / display-ready than their human-designed counterparts. Focusing on interactivity maximizes the overall productivity of the entire process, from data collection to processing to investigation and visualization. I also agree with @jkeeshin that the most poignant demonstration of the power of the approach in the Polaris paper was the actual in-class demo. To be honest, I would have been OK with them just straight-up giving me a taxonomy of which kinds of visualization are good in which situations, and then describing how they convert data into an appropriate form so as to use these various kinds of visualization.

bsee wrote:

Both papers presented very interesting models and visualizations that I have never seen before, prior to this class. I found the Multidimensional Detective paper more intriguing, maybe because it is less technical.

@olliver: I have to disagree that the parallel coordinates visualization was poorly executed. Even though it is daunting at first, the paper does a really good job of guiding you through the visualization. You mentioned color, and asked why color wasn't used in the program. That point was actually addressed in the paper, which states that color would definitely help in understanding the visualization. I'm guessing that color was not included in the paper because most prints are still in black and white. Also, color might not be helpful, especially for the x1 coordinate, because all the colors would be cluttered together.

Also, I think that non-linear shapes (e.g. a spline interpolation between the points) will not work here. I believe that the vectors between the points represent some sort of mapping, and not data interpolation. However, I do agree that this can be improved. For example, we might be able to group certain data, and have the thickness of the line represent the number of batches that have properties x(n) and x(n+1).
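The line-thickness idea above amounts to counting, between two adjacent axes, how many records share the same (binned) pair of values. A minimal sketch in Python; the function names, bin size, and batch data are all invented for illustration.

```python
# Sketch: weight each line segment between adjacent parallel-coordinate
# axes by how many records share the same binned (x_n, x_n+1) pair,
# so thickness could encode batch count. Data are made up.

from collections import Counter

def segment_weights(records, axis_a, axis_b, bin_size=1.0):
    """Count records per (binned axis_a value, binned axis_b value) pair."""
    def bucket(v):
        return int(v // bin_size)
    return Counter((bucket(r[axis_a]), bucket(r[axis_b])) for r in records)

batches = [
    {"x1": 2.1, "x2": 5.3}, {"x1": 2.4, "x2": 5.9},
    {"x1": 2.7, "x2": 5.1}, {"x1": 7.0, "x2": 1.2},
]

weights = segment_weights(batches, "x1", "x2")
# Three batches fall into the (2, 5) bucket pair, so that segment
# would be drawn thickest; the (7, 1) segment gets weight 1.
print(weights)
```

A renderer would then map each weight to a stroke width, turning the cluttered bundle of individual lines into a few aggregate strokes.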

That said, however, I believe that interactive design, allowing users to choose and color batches they want to trace, would definitely improve the quality of the visualization. When you can reduce the set you are tracing, then colors would definitely make sense.

Correct me if I'm wrong, but I personally think that the point of Multidimensional Detective is not to propose the use of multidimensional plots, or parallel coordinates, but to teach readers how to break down a seemingly crazy plot. I found it amusing how the paper states "do not let the picture intimidate you", because I was immediately intimidated when I first saw the crazy plot. After being guided through the paper, I feel more equipped to dissect crazy plots.

Furthermore, the paper showed that raw data with a very primitive visualization can lead to questions. With these questions, we can then come up with new visualizations that lead to more insights. That, to me, was really cool.

chamals wrote:

I think the Polaris paper's comments regarding lag are due to the time period the paper was written in. Zhenghao's meeting with Tableau shows that they have a completely different belief now: "In particular, he mentioned that there was so much emphasis on making the visualization process almost instantaneous because they had realized that people tend to be more likely to find interesting things during data exploration if they are allowed to generate ideas without having to pause in between." I also found it interesting that some people think this software oversimplifies data viz, yet others think it is hard to use. It seems to me that the product could still be easier to use for people who want to explore data sets with little background in the area. But as wtperl said, for experts, it seems like a very quick and powerful tool. Perhaps Tableau could autogenerate many visualizations and a user could narrow them down with a set of constraints. This would be a very noob-friendly feature and would allow people to go through dozens of visualizations with little work by the user.

elopez1 wrote:

Perhaps the biggest takeaway I got from the Polaris paper was just how efficiently it (and Tableau) allows the user to explore different aspects of a data set. Stolte et al. make the point that "the analysis process is often an unpredictable exploration of the data. Analysts must be able to rapidly change what data they are viewing and how they are viewing the data." I could not agree more and, based on the Tableau demo in class, I can see the power in a tool like this. It's hard to spot trends in most real-world data sets, especially very large ones, and a tool like Tableau allows us to rapidly flip through different visualizations of the data, exploring and hoping to come upon interesting trends.

Additionally, it's clear that data can have a lot of dimensions, as we've seen in several examples in class. Having a tool that allows us to quickly set different dimensions to observe is also immensely powerful. The point is this: you don't need to know what point you're trying to make in a data visualization when you set out to create one. Explore.

bgeorges wrote:

I started using Tableau a few days before reading the Polaris paper, and it was very interesting to see the evolution of the product. I particularly found the discussion of the "table algebra", which is used to reason about how combinations of dimension types are translated into specific visualizations, to be enlightening. A question I'd had during a lot of the lectures and readings was whether there is a precise way of describing visualizations, and it was interesting to see that this was a problem that had to be addressed to create something like Tableau.

mbarrien wrote:

I actually find this lecture+set of papers not to necessarily be focused on multivariate display, but instead a continuation of the discussion about iterating through different sets of visualizations for exploratory purposes. The parallel axes paper was focused on the "let's spot something interesting, now filter, now find something else interesting, filter more" experience. (In fact, I argue that parallel axes is completely useless without the filtering and iteration.) The Polaris paper pointed out the need to have human insight to spotting a trend in the visualization.

So why not have the software try to point out the interesting features of the visualization, to help the viewer? In the parallel axes paper, the tool could use color to highlight the spikes in X6, or highlight the gaps in the clusters in X15 when the analyst gets a certain view. Pointing out a small subset of these anomalies in the data seems straightforward (all data clustered together, data split apart, data deviating from majority, data directly/inversely correlated, etc). Or from the in class demo, have Tableau highlight those clusters that ended up with a larger preponderance of one glyph than the other. (Or those gaps indicating non-randomness from plot #14.)

You could argue that a human would spot them anyways, but in cases where a human is exploring, he wouldn't necessarily know to look for those patterns, whereas a computer could spot them easily, especially when there are so many variables to look at. The tools presented do a good job of presenting all these variables in succinct ways, but even so it can be overwhelming.
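One way a tool might surface the anomalies mbarrien describes (e.g. gaps in the clusters on an axis like X15) is simple gap detection along a single variable, flagging intervals for the analyst to inspect. This is a minimal, hypothetical sketch; the threshold rule and the data are made up for illustration.

```python
# Sketch: scan one axis for conspicuously large gaps between sorted
# values, which a visualization tool could highlight automatically.

def find_gaps(values, factor=3.0):
    """Return (low, high) intervals where the gap between consecutive
    sorted values exceeds `factor` times the median gap."""
    xs = sorted(values)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    median_gap = sorted(gaps)[len(gaps) // 2]
    return [(a, b) for a, b in zip(xs, xs[1:]) if b - a > factor * median_gap]

# Two clusters with a conspicuous gap between them.
x15 = [1.0, 1.2, 1.1, 1.3, 9.0, 9.2, 9.1]
print(find_gaps(x15))
```

The same scan run per axis (or per pair of axes for correlation checks) would give the "computer spots it, human interprets it" division of labor suggested above.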

junjie87 wrote:

I liked the example in the Polaris paper which described how they found system errors with the help of data visualization. One thing to note, though, is that Polaris helped with the data viz, but not the data collection. It would be cool if the paper described how long it took to develop the data collection procedures, like how they managed to measure cache performance and lock contention, and how easy/hard it was to sift through the data (with a nice UI) in Polaris.

I also wonder why such nice, beautiful visualizations are not in Excel or similar software yet. The two reasons I can think of would be: 1) some of the visualizations are rather new (e.g. I hadn't heard of parallel coordinates until this class), so the demand isn't there yet to include them in mainstream software. It is worth noting, though, that Google Docs allows you to do a map plot, whereas Excel (as of version 2007) does not. So I guess the more useful but newer visualizations will catch up over time. 2) Polaris or Tableau might be powerful, but they also have a learning curve that users already familiar with Excel might not want to deal with. So the demand might not be there yet to include such features in these software products. That said, I can definitely see how using data visualizations to guide some business decisions would give a strategic edge to a company.

blouie wrote:

I would tend to agree with the idea that Tableau in and of itself isn't sufficient to really reach the final iteration of a visualization. In fact, no tool should really be viewed that way, for full exploration should be an exhaustive process of examining different tools and methodologies. Tableau, then, should really only be one ingredient in determining what the final result of exploration will be.

So in criticizing the merits of the Polaris paper, I think we need to take it with a grain of salt, so to speak. Instead of thinking of it holistically as a great study, we need to think more narrowly about its impact in more specialized (or smaller-scale) domains.

schneibe wrote:

As other people have pointed out, I found Tableau extremely useful for exploring datasets. I tried it on the data I gathered for a previous experiment and was surprised to discover new trends just by playing with the interface. However, as @blouie said, that doesn't mean you can skip steps when exploring your data. For instance, formulating clear hypotheses before loading the data is still a crucial step, and I don't think that Tableau is going to provide a story for you. But all in all, this is a very useful tool that allows users to easily explore big datasets.

jofo wrote:

I like the rich reporting of the analysis process in Inselberg, as well as the case studies in the Polaris paper.

On the statement "When encoding a quantitative variable, it is important to vary only one psychophysical variable, such as hue or value," I got curious and read the cited Rogowitz and Treinish (1995) article, and was stunned by its discussion of colormaps, such as: "A colormap which only varies in luminance (e.g., a grayscale image) cannot adequately communicate information about gradual changes in the spatial structure of the data." This ought to have a huge impact on medical imaging (which is also their example domain), yet doctors are still stuck with grayscale. But is the statement about encoding only to hue or value the right interpretation of Rogowitz & Treinish? They say you should vary both in a colormap. Perhaps they just mean you can have one variable represented by hue and another by value. Anyhow, I am looking forward to the color lecture!
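One part of their argument is easy to check with a little arithmetic. A minimal sketch (my own illustration using the standard Rec. 709 luminance weights, not code or data from the paper): a grayscale ramp is monotone in luminance, while a simplified red-to-green-to-blue "rainbow" ramp is not.

```python
def luminance(r, g, b):
    """Relative luminance of an RGB color in [0, 1] (Rec. 709 weights)."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def monotone(vals):
    """True if the sequence never changes direction."""
    return (all(a <= b for a, b in zip(vals, vals[1:])) or
            all(a >= b for a, b in zip(vals, vals[1:])))

# Grayscale ramp: luminance increases steadily from black to white.
gray = [luminance(t, t, t) for t in (i / 10 for i in range(11))]

# Simplified rainbow ramp (red -> green -> blue): luminance rises, then falls.
rainbow = ([luminance(1 - t, t, 0) for t in (i / 5 for i in range(6))] +
           [luminance(0, 1 - t, t) for t in (i / 5 for i in range(1, 6))])

print(monotone(gray))     # True
print(monotone(rainbow))  # False
```

So a hue-based ramp can silently re-order luminance along the way, which is one reason varying a single psychophysical channel is the safer default for a single quantitative variable.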

daniel89 wrote:

Reading the Polaris paper and playing around with Tableau, I saw both simply as tools to rapidly and easily prototype and explore the design space for visualizations (as tools like Gephi are for graphs). Such tools have limited (or "opinionated") toolsets, and I think much of the criticism levelled at them has been that they tend to oversimplify matters.

Exploration, especially automated exploration, seems to be heavily limited to the wholly "visual" sense. Polaris's main aspect was automation, while Tableau's "suggest visualization" feature was also interesting. Both papers seem to focus greatly on visualization. I would have instead guessed that statistical techniques also play a part; for example, a learning algorithm might recognize the importance of a particular set of variables. Using statistics to inform the visualizations (and possibly filter the automated results) could be very useful for narrowing the design space.

Another topic that has been brought up a lot is the parallel-axes model (especially the interactive plot). Random idea: it'd be a great way to find a car; think hipmunk.com for cars.

babchick wrote:

With so much interactivity and utility for the task of exploring data sets, I wondered why Tableau does not have more collaborative and/or social features baked in, such that others can share their collective insights on data sets. Then I found Tableau Public: http://www.tableausoftware.com/public. Good one, Tableau!

It's amazing to me how useful Undo and Redo can be in playing with data and exploring the 'trail' of visualisations we leave as we experiment with different plots. The paragraph explaining how trivial building that feature was, thanks to the formal visual specifications that fuel Tableau, illustrates a power in the theoretical study of visualisations that I did not appreciate until reading this paper. This ability, paired with brushing in particular, seems to let people explore questions at a much quicker rate than any tool before Polaris.

Finally, I was not the biggest fan of the Multidimensional Detective paper; my main issue is that it makes no effort to distinguish the contextually important variables and leaves that cognitive process of focusing attention on interesting variables to the viewer. I wonder if it would be effective to space the self-identified 'important' variables further apart from lesser-weighted ones.

rc8138 wrote:

In statistics, there are quite a number of dimension reduction techniques (PCA, multidimensional scaling, locally linear embedding, and Isomap, to name a few). One of the challenges is to interpret the data in the lower-dimensional space and to avoid drawing erroneous inferences.

I found similar themes when looking at visualization. While it is useful to compress data and visualize it in a low-dimensional space, exploratory data analysis can be difficult because the compressed data is not natural to us (e.g., what does it mean to take a linear combination of variables X and Y with specific weights?).
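To make that concrete, here is a toy sketch (my own illustration with NumPy, not from either paper) of exactly that interpretation step: PCA's first component is a weighted linear combination of the original variables, and the weights are all you get to interpret.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: Y is strongly correlated with X.
x = rng.normal(size=200)
y = 0.8 * x + 0.2 * rng.normal(size=200)
data = np.column_stack([x, y])
data -= data.mean(axis=0)  # center before PCA

# PCA via SVD: the rows of vt are the principal directions.
_, _, vt = np.linalg.svd(data, full_matrices=False)
w_x, w_y = vt[0]

# The first principal component is the combination w_x*X + w_y*Y.
# Reading domain meaning off the pair (w_x, w_y) is the hard part.
print(round(abs(w_x), 2), round(abs(w_y), 2))
```

Here the weights come out comparable because X and Y are correlated by construction, but with real variables in different units the combination has no obvious real-world reading, which is the comment's point.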

For this reason, I prefer using the principle of "small multiples". As Tufte mentions, the "small multiples" principle directly enforces comparisons. Presenting a series of similarly structured visualizations as a function of one underlying (preferably nominal) variable allows us to pick up changes very quickly.
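The mechanical core of small multiples is just a partition by the nominal variable: one identically structured panel per category. A minimal sketch with made-up sales records (my own illustration):

```python
from collections import defaultdict

# Hypothetical records: (region, month, sales); region is the nominal variable.
records = [
    ("East", 1, 10), ("East", 2, 14), ("East", 3, 9),
    ("West", 1, 7),  ("West", 2, 8),  ("West", 3, 12),
]

# One panel per region, each holding the same (month, sales) structure,
# so the panels can be drawn identically and compared directly.
panels = defaultdict(list)
for region, month, sales in records:
    panels[region].append((month, sales))

for region in sorted(panels):
    print(region, panels[region])
```

Because every panel keeps the same axes and layout, the eye reads differences between panels as differences in the data, not in the chart design.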

pmpai wrote:

While going through Inselberg, I couldn't help but think that interactivity would have made parallel coordinates so much more useful at the time. With dense data sets, discerning between different data points is almost impossible without redrawing the plot with just the data points in question. In class, with selection and greying-out, looking at individual lines and groups is so much more comfortable.

This led me to think about how much time would have been spent on multidimensional exploratory analysis. Each question or hypothesis would have required re-plotting all those data points in different combinations. Even inverting axes would have been a pain. With tools like Tableau, the visualization is reduced to a few clicks. I think that's a real boon to the process of exploratory analysis. With the actual task of plotting and changing visualizations reduced to a trivial one (drag, drop, plot!), there is so much more time to spend on asking the pertinent questions.
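The re-plotting work described above is mostly axis bookkeeping. A minimal sketch of the core preprocessing step (my own illustration, not Inselberg's or Tableau's actual implementation): rescale every dimension to [0, 1] so all parallel axes share one vertical extent, turning each data point into a polyline across the axes.

```python
# Illustrative parallel-coordinates preprocessing: min-max rescale each
# column so every axis spans [0, 1]; each row becomes one polyline.
def rescale_columns(rows):
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.5
                  for v, l, h in zip(row, lo, hi))
            for row in rows]

# Hypothetical cars: (horsepower, mpg, weight) -- three parallel axes.
cars = [(150, 30, 2000), (300, 12, 3500), (220, 18, 2700)]
for polyline in rescale_columns(cars):
    print(polyline)  # the vertices of one polyline across the axes
```

Inverting an axis is then just v -> 1 - v on one column, which is why an interactive tool can do it instantly while re-plotting by hand was painful.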
