(Click on the image below to open.) (Mirror)
Caption: Deaths in England and Wales in 2007 broken down by age, sex and cause (ICD10 code). The area of each rectangle is proportional to the number of deaths. Rectangles are grouped into larger rectangles based on their ICD10 group.
Source code notes:
- Source view is enabled, so if you just want to browse the code you can right-click on the application and select View Source.
- I modified and debugged several Flare libraries, so I am including my version of the Flare source code as well.
The compiler used was Flex 4-Beta 2 Build 220.127.116.1185. You can download it here.
Designing the visualization
Having looked at the British National Death Register (2007) for A2-OgievetskyVadim, I decided to stick with this, arguably morbid, topic for this assignment as well. The bulk of the yearly Death Register report is dedicated to a detailed breakdown of all deaths classified by age, sex and cause of death (ICD10 code). This data domain appealed to me because a well-designed representation of these statistics would not only be interesting to the casual observer but could also be of great use to doctors and policy makers.
It should be noted that I have previously come across and attempted to explore this data set when trying to compare the risks of smoking against motorcycle riding.
Choosing a visualization type
The decision to use a tree map to visualize this data was very straightforward. To understand why, some background on the ICD coding system is needed.
The ICD (International Statistical Classification of Diseases and Related Health Problems) coding system is a tree of codes. When a death occurs it is assigned a code that is a leaf in the ICD tree, and these codes are repeatedly bundled together into groups that code for the general unifying condition.
For example: G12.2 codes for "Motor neuron disease" and is a specific instance of G12 - "Spinal muscular atrophy and related syndromes", which in turn is part of G10-G13 - "Systemic atrophies primarily affecting the central nervous system", which belongs to the basic group G00-G99, a.k.a. Chapter VI - "Diseases of the nervous system".
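The nesting in this example can be sketched in a few lines of Python. This is only an illustration, not the code used in the application: the GROUPS table is a hand-entered slice of the ICD10 tree, and the helper names (category, ordinal, ancestors) are made up for this sketch.

```python
# Hand-entered slice of the ICD10 tree (illustrative only): (first, last, title).
GROUPS = [
    ("G10", "G13", "Systemic atrophies primarily affecting the central nervous system"),
    ("G00", "G99", "Diseases of the nervous system"),
]

def category(code):
    """Three-character category of a code, e.g. 'G12.2' -> 'G12'."""
    return code.split(".")[0]

def ordinal(cat):
    """Map a category such as 'G12' to an integer so ranges can be compared."""
    return (ord(cat[0]) - ord("A")) * 100 + int(cat[1:3])

def ancestors(code):
    """Path from the broadest enclosing group down to the code itself."""
    cat = category(code)
    enclosing = [(s, e) for s, e, _ in GROUPS
                 if ordinal(s) <= ordinal(cat) <= ordinal(e)]
    # Widest range first, so the path reads from root to leaf.
    enclosing.sort(key=lambda g: ordinal(g[1]) - ordinal(g[0]), reverse=True)
    path = [f"{s}-{e}" for s, e in enclosing]
    path.append(cat)
    if code != cat:
        path.append(code)
    return path

print(ancestors("G12.2"))
```

Each element of the returned path corresponds to one nesting level of rectangles in a tree map.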
Only a limited number of common visualizations can display the data while preserving the ICD10 tree structure: the tree map layout, the icicle layout and the sunburst diagram.
The icicle layout does not use space efficiently, and the sunburst makes it difficult to compare different layers because of radial distortion. Furthermore, the tree map layout is the prettiest.
Hover over a rectangle in the layout to see which ICD10 code it corresponds to and how many people have died from that specific cause. This is essential, as without it the rectangles are meaningless.
Age selection/Gender selection
Limit the data display to a specific age group or groups. Infants, teenagers and the elderly die, on average, of entirely different causes; even though cumulative death statistics are interesting, narrowing the scope to a specific age group would be no less interesting. For some conditions (in particular cancer) male and female fatalities will differ significantly.
‘Zoom in’ to a sub-cause, having it occupy the whole screen to give more detail on the underlying fatalities. The breakdown of deaths within a specific code group is very important and will allow a viewer with a specific interest (e.g. infectious diseases) to explore that data subset without distraction.
Search for deaths by specific cause (e.g. search for all deaths that involve a motorcycle) and search for deaths by exact code or code range (e.g. deaths with a code in V10-V80) should be available to the user. The search should dim the non-matching results, and if several search terms are entered, the degree to which each result matches should be indicated by shades of gray.
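One possible reading of this search behaviour, sketched in Python. Scoring a result by the fraction of search terms it contains, and the particular gray values, are assumptions made for this sketch; the plan above does not say how relevance should be computed.

```python
def code_in_range(code, start, end):
    """True if the code's three-character category lies in [start, end]."""
    cat = code.split(".")[0]
    return start <= cat <= end

def relevance(description, terms):
    """Fraction of search terms found (case-insensitively) in the description."""
    if not terms:
        return 1.0
    text = description.lower()
    hits = sum(1 for term in terms if term.lower() in text)
    return hits / len(terms)

def gray_level(score, dimmest=64):
    """Map a relevance score in [0, 1] to an 8-bit brightness value.

    A score of 0 gives the dimmest value, a score of 1 gives full brightness.
    """
    return round(dimmest + (255 - dimmest) * score)

label = "Motorcycle rider injured in collision with car"
print(code_in_range("V23.4", "V10", "V80"))
print(gray_level(relevance(label, ["motorcycle", "bus"])))
```

With this scheme a rectangle that matches every term keeps its full brightness, one that matches none is dimmed the most, and partial matches fall in between.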
Interface plan (storyboard)
The main goal of this visualization is to allow the user to make immediate comparisons between the relative numbers of fatalities from different conditions. The tree map layout is perfect for such an application, which makes the basic interface pretty clear.
Both of which possess the basic interaction techniques that I want.
One idea that I had that is not seen in the examples above is to add a tree ‘zoom in’ interaction. This will allow the user to focus on a specific condition group (for example: Infectious Diseases) instead of looking at all deaths as a whole.
I want my interface to look like this:
The age selector is represented as a series of check boxes, each corresponding to a specific age range. Unchecking a box excludes its range from the data set.
A big part of the implementation of this visualization was getting the data in the correct format.
The data came as twenty separate Excel spreadsheets, one for each of the twenty basic mortality chapters (Chapter I, A00-B99, Certain infectious and parasitic diseases; Chapter II, C00-D48, Neoplasms; Chapter III, D50-D89, Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism; etc.). My first task, a most tedious and lengthy one, was to remove all the irrelevant aggregated data and footnotes from all the tables and save them as CSV files. This then allowed me to concatenate them all into one huge CSV file.
The data itself had evidently been formatted by someone with a very limited understanding of spreadsheets, so I spent a lot of time correcting typos and extracting the data into a usable form. One clear example of a silly typo was that some of the I codes (e.g. I13) were entered as 113, with a 1 (one) instead of an I (capital i).
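A clean-up rule for this particular typo could look something like the Python sketch below. It is not the procedure actually used on the data; the rule (a code made of a leading 1 followed by two digits is really an I code) and the name fix_code are assumptions for illustration.

```python
import re

# A leading digit one followed by two digits (and an optional decimal part)
# cannot be a valid ICD10 code, so it is assumed to be a mistyped I code.
MISTYPED_I = re.compile(r"1(\d{2}(?:\.\d)?)")

def fix_code(raw):
    """Normalise a code cell and repair the '113' -> 'I13' style typo."""
    code = raw.strip().upper()
    match = MISTYPED_I.fullmatch(code)
    if match:
        return "I" + match.group(1)
    return code

print(fix_code("113"))
print(fix_code(" g12.2 "))
```

Because ICD10 codes always start with a letter, the rule cannot damage a correctly entered code.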
I decided to not bother with loading the data from an online location and just bake the data into the application.
Implementing the code was a delight. I have fairly extensive familiarity with Flash/Flex and I really enjoyed studying the code for Flare. I started with the Package Map application and toyed with it until I got the desired results (it should be noted that over time every line of code in the original Package Map had been replaced, so all the submitted code is my own). I did find several shortcomings in the Flare implementation. Mostly I could get around them by writing my own classes that extended the originals (for example, I extended the Visualization class to a class that adds the nodes to the Flash display tree in BFS traversal order), but sometimes I had to change the actual Flare code (there was a silly divide-by-zero bug in the TreeMapLayout component).
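For readers unfamiliar with the term, a breadth-first (BFS) traversal visits a tree level by level. The sketch below shows the ordering in Python rather than ActionScript; the children dictionary and the function name are hypothetical and are not taken from Flare. Since display objects added later to a Flash display list are drawn on top of earlier ones, a plausible motivation (an assumption here) is that this order draws every group rectangle before the rectangles nested inside it.

```python
from collections import deque

def bfs_order(root, children):
    """Return the nodes of a tree level by level, parents before children."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        # Children go to the back of the queue, so a whole level is
        # emitted before any node of the next level.
        queue.extend(children.get(node, []))
    return order

# Tiny illustrative tree: one chapter, two groups, one leaf category.
children = {
    "G00-G99": ["G10-G13", "G20-G26"],
    "G10-G13": ["G12"],
}
print(bfs_order("G00-G99", children))
```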
I had to spend some time optimizing the tree building algorithm, which, due to the large amount of data, initially took 20.5 seconds. Through a clever caching technique, and by adding some meta-data to the stored data, I managed to get it down to 0.45 seconds.
Having made a working version of the visualization, I did a fairly extensive amount of user testing, both playing with it myself and letting friends and passers-by explore British deaths. From the feedback I got I made these refinements:
- One of the biggest things that I noticed was that a lot of the time people were interested in a specific age range. They patiently unchecked every box but one. This was very inefficient, as every uncheck caused a lag while the visualization updated itself. The check-box version also allowed users to uncheck all the boxes, which was a special condition that needed to be checked. I decided instead to implement the age selector as a single slider with two thumbs. While this reduces the user's expressiveness by not allowing them to select disjoint age ranges (e.g. 20-24 and 95+), it improved usability as a whole by making age selection so much faster.
- In an attempt to squeeze ever more data onto the canvas, I changed the highlighter of the rectangles to one that shows a two-color separation of the male/female ratio (pictured above).
- In order to aid navigation I added labels to the base classes. They were requested by a lot of people, and even though they do help, I believe that by obscuring the data they also hinder a little bit. Therefore I included a check-box to turn them off.
- For very small rectangles the labels would fill the whole space and then get clipped. To eliminate this unnecessary clutter I defined a minimum width and height that a rectangle needs to satisfy to receive a label.
- I played around with the easing function/timing for the animation and settled on the one that looks best.
- I experimented with fade-in as opposed to no animation (a.k.a. pop) for the reappearance of squares and found that the fade-in added nothing visually and made the transition longer and more cumbersome.
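Two of the refinements above, the two-thumb age slider and the minimum label size, are simple enough to sketch. The Python below is an illustration only: the list of age bands is a shortened, made-up subset, and the pixel thresholds are placeholder values, not the ones used in the application.

```python
# Illustrative subset of age bands, in slider order.
AGE_BANDS = ["<1", "1-4", "5-9", "10-14", "15-19", "20-24"]

def select_bands(thumb_a, thumb_b):
    """Age bands between the two slider thumbs, inclusive.

    The thumbs are band indices and may be given in either order.
    """
    low, high = sorted((thumb_a, thumb_b))
    return AGE_BANDS[low:high + 1]

def should_label(width, height, min_width=40, min_height=12):
    """Only rectangles at least min_width by min_height pixels get a label."""
    return width >= min_width and height >= min_height

print(select_bands(3, 1))
print(should_label(120, 30), should_label(25, 30))
```

A side effect of the slider is that the selection always contains at least one band, so the "all boxes unchecked" special case from the check-box version disappears.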
While writing the code I thought of a few ideas that would improve the visualization. Sadly, I did not have time to implement them.
- Use the #anchor to save the states of the visualization, enabling users to share links to specific views.
- Add simpler descriptions/tags to certain ICD10 codes. When showing the visualization to my sister she tried to discover how many people die from AIDS; she entered "AIDS" into the Filter box and got no results. What she was looking for was: "HIV disease resulting in infectious and parasitic diseases" [B20], "HIV disease resulting in malignant neoplasms" [B21], etc. Since none of these ICD10 code descriptions contained the string "AIDS", she did not get any results. One possible way to implement this would be to use the power of mturk to add more layman-friendly descriptions.
- Add an alternative color scheme that would aid the color blind.
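The first idea in this list, saving the state in the #anchor, could work roughly as in the Python sketch below. None of this was implemented; the state fields (ages, zoom, filter) and the function names are hypothetical and only serve to show the round trip between a view and a shareable link fragment.

```python
from urllib.parse import urlencode, parse_qs

def state_to_anchor(state):
    """Encode a dict of view settings as a URL fragment."""
    # Sorting keeps the fragment stable for the same state.
    return "#" + urlencode(sorted(state.items()))

def anchor_to_state(anchor):
    """Decode a fragment produced by state_to_anchor back into a dict."""
    parsed = parse_qs(anchor.lstrip("#"))
    # parse_qs returns a list per key; each key appears once here.
    return {key: values[0] for key, values in parsed.items()}

view = {"ages": "20-24,95+", "zoom": "G00-G99", "filter": "motor cycle"}
anchor = state_to_anchor(view)
print(anchor)
print(anchor_to_state(anchor) == view)
```

Note that parse_qs drops keys with empty values by default, so an empty filter would simply be absent from the restored state.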
It is difficult for me to estimate the total number of hours put into this project (time flies by when you are having fun). If pressed I would guess 40-50 hours.
- Data clean up and formatting: 10-15 hours
- Initial design and plan: 2 hours
- Initial (prototype) implementation: 10-15 hours
- Adding polish + flare debug: 10-15 hours
- Partial rewrite and refinements: 4-6 hours (I rewrote some of my code when, through a deeper understanding of Flare, I saw a better and more robust way to do something.)