Using Treemaps to Visualize Data
In a recent blog post, my friend Andy Kriebel talks about Makeover Monday and discusses his views on treemaps and bubble charts. A few people asked on Twitter when these charts might be used. A few people, including Andy, responded "never". I decided to write this post to provide some background and a few scenarios where I think these charts work, especially the treemap.
What are treemaps?
The treemap was invented by Ben Shneiderman. Dr. Shneiderman created the treemap to visualize hierarchical data. He wanted "a compact visualization of directory tree structures", but other, more common visualization methods did not work well for this. For example, the folder structure on a computer is a tree structure. The folder "My Documents" might contain "Pictures", "Videos", and "Documents". These folders might also contain sub-folders, and so on. The treemap was a way to visualize this large amount of data, many folders and subfolders, in an efficient way. He writes, "Tree structured node-link diagrams grew too large to be useful, so I explored ways to show a tree in a space-constrained layout." For more information about the history of treemaps, see his article Treemaps for space-constrained visualization of hierarchies.
Martin Wattenberg created a slight variation on this, which Dr. Shneiderman called a clustered treemap. This design is what most people today commonly refer to as a treemap.
The treemap displays the segments in order from largest to smallest, encoding the data using the size of each rectangle. Color is often used to encode additional data or as double encoding.
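To make the size encoding concrete, here is a minimal Python sketch of a slice-and-dice layout, the alternating-direction scheme used by the original treemaps, with siblings sorted largest to smallest so that each gets a strip whose area is proportional to its value. The nested-dict input format and the function names are my own assumptions for illustration. This is not the clustered or squarified layout many current tools use, which works harder to keep rectangles close to square.

```python
def node_value(node):
    """Value of a leaf (a number) or total of a subtree (a dict of children)."""
    if isinstance(node, dict):
        return sum(node_value(child) for child in node.values())
    return node


def slice_and_dice(tree, x, y, w, h, depth=0, prefix=()):
    """Return {path: (x, y, width, height)} for every node in `tree`.

    Siblings are placed largest to smallest, each taking a strip whose area
    is proportional to its value. Even depths slice left-to-right, odd depths
    slice top-to-bottom, so the direction alternates at each level.
    """
    layout = {}
    total = node_value(tree)
    offset = 0.0
    children = sorted(tree.items(), key=lambda kv: node_value(kv[1]), reverse=True)
    for name, child in children:
        share = node_value(child) / total
        if depth % 2 == 0:
            rect = (x + offset, y, w * share, h)
            offset += w * share
        else:
            rect = (x, y + offset, w, h * share)
            offset += h * share
        path = prefix + (name,)
        layout[path] = rect
        if isinstance(child, dict):
            # Recurse into the child's own rectangle, flipping the slice direction.
            layout.update(slice_and_dice(child, *rect, depth + 1, path))
    return layout


# Hypothetical data: two parents with children, plus one leaf at the top level.
tree = {"A": {"a1": 6, "a2": 2}, "B": 4, "C": {"c1": 3, "c2": 1}}
for path, (rx, ry, rw, rh) in slice_and_dice(tree, 0, 0, 16, 10).items():
    print("/".join(path), "area:", round(rw * rh, 2))
```

On a 16 × 10 canvas the hypothetical values total 16, so every rectangle's area comes out at exactly 10 times its value; for example, a1 (value 6) gets an 8 × 7.5 box.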
The problems with treemaps
The biggest problem with treemaps (and bubble charts), as Andy points out in his blog post, is that using size to encode the data makes it impossible to make precise quantitative comparisons, compared with using the length or height of a bar or the position of a dot or line. In other words, bar charts, dot plots and line charts offer a much better way of encoding data for precise comparisons. In addition, now that treemaps are common chart types in many business intelligence tools, they are often used to show simple categorical comparisons that would be better visualized as a bar chart.
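A quick calculation shows part of why size is harder to compare than length. If a value is encoded as the area of a square, each side grows only with the square root of the value, so the difference visible along any one edge is smaller than the difference in the data. The helper below is a hypothetical sketch that assumes square marks; real treemap rectangles vary in aspect ratio.

```python
import math


def relative_difference(a, b):
    """How much larger b is than a, as a fraction of a."""
    return (b - a) / a


def bar_vs_square(a, b):
    """Compare two values under two encodings.

    Bar: length is proportional to the value.
    Square: area is proportional to the value, so each side is sqrt(value).
    Returns (difference in bar length, difference in square side).
    """
    bar = relative_difference(a, b)
    side = relative_difference(math.sqrt(a), math.sqrt(b))
    return bar, side


for a, b in [(100, 110), (1, 2)]:
    bar, side = bar_vs_square(a, b)
    print(f"{a} vs {b}: bar length +{bar:.1%}, square side +{side:.1%}")
```

A 10% difference in value gives a bar that is 10% longer but a square whose side is only about 4.9% longer; doubling a value lengthens the side by only about 41%.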
Let's look at an example. The treemap below is from a Makeover Monday viz.
Now it's quiz time. See if you can answer the following questions quickly.
What are the top 5 countries?
What are the bottom 5 countries?
What is the spending amount for Croatia?
Can you see the small difference between Sweden and Romania?
I think you will find some of these easier to answer than others. You probably had trouble answering the bottom 5 because, in the layout of this treemap, you can't see the names and spending amounts on those labels. Croatia isn't labeled because there wasn't enough room for the label, and Sweden and Romania were hard to compare because they appear to be about the same size rectangle. So you had to rely on their order and their labels to make that comparison.
Now look at the same data in a standard bar chart and ask the same questions.
These questions are much easier to answer with the bar chart. Even the small difference in the data for Sweden and Romania can now be seen. So, in this example, I completely agree with Andy: the bar chart is a much better way to show this comparison.
As noted above, the original intent of the treemap was to visualize hierarchical data. Sometimes there is a need to show comparisons at multiple levels. In this next treemap example, see if you can answer the following questions.
Which region has a larger population, Africa or the Americas?
Which region has the smallest population?
Which country in Africa has the largest population?
What are the top 3 most populated countries in the Americas?
What is the third most populated country in Asia?
There are several things to point out. First, notice that you are answering questions at two different levels in the data. You were able to answer questions comparing regions, but also countries within regions. This is an important distinction. If you created bar charts to replace this image, you would need multiple charts: a bar chart comparing regions, as well as 7 other bar charts comparing countries within regions, to answer these same questions.
In addition, there is a part-to-whole comparison that you can now make, albeit in an estimated way, that you cannot make with bar charts. For example, even without labels on every country, you can see that India, with 1.2 billion people, is slightly larger than the entire region of Africa (all of the blue rectangles), which has 1.1 billion. This type of comparison is built into the treemap, but would require special handling if you created bar charts to compare across these different levels.
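A treemap answers both levels of questions, and the India-versus-Africa comparison, from a single hierarchy. Here is a minimal sketch of that roll-up, assuming a two-level dict of region → country → population in millions. Only India's 1.2 billion and Africa's 1.1 billion total come from this post; the other entries are hypothetical placeholders, and the function names are my own.

```python
# Populations in millions. Only India (1,200) and the Africa total (1,100)
# come from the post; the individual entries below are hypothetical.
world = {
    "Asia": {"India": 1200, "Rest of Asia (hypothetical)": 2000},
    "Africa": {"Country A (hypothetical)": 600, "Country B (hypothetical)": 500},
}


def region_totals(tree):
    """Roll leaf values (countries) up to their parents (regions)."""
    return {region: sum(countries.values()) for region, countries in tree.items()}


def largest_in(tree, region, n=1):
    """Top-n countries within one region (a lower-level comparison)."""
    countries = tree[region]
    return sorted(countries, key=countries.get, reverse=True)[:n]


totals = region_totals(world)
india = world["Asia"]["India"]
print("Region totals:", totals)
print("Largest in Africa:", largest_in(world, "Africa"))
# Leaf-versus-region comparison: one country against a whole region.
print("India larger than all of Africa?", india > totals["Africa"])
print(f"India vs Africa: {india / totals['Africa']:.2f}x")
```

With the figures from the post, India (1,200 million) exceeds Africa's total (1,100 million) by about 9%, which is why India's rectangle looks slightly larger than all of the blue ones combined.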
Let's take another quiz and see if we can answer these questions with the same treemap, this time in the style of Hans Rosling.
In each pair below, which country has the larger population?
Nigeria or Vietnam?
Bangladesh or Germany?
Japan or Italy?
South Africa or Nepal?
As Hans Rosling did in his surveys, I picked these pairs so that the population of one country is twice the size of the other. I picked South Africa and Nepal to show that even without the data label it's still possible to answer the question.
Although it's possible to answer these questions, you probably noticed that it wasn't easy to find these countries. Your eyes had to search for each of them and then make the comparison. In addition, all of them had the country name listed in the box; I didn't ask about Malaysia, which isn't labeled. This highlights another issue with treemaps: sometimes there can't be a label for every box. This means you have to rely on interactive features of the visualization, like tooltips or highlight functions, to make some of the comparisons, so the chart becomes less useful when printed or displayed in static form.
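Whether a box gets a label usually comes down to a size check along the lines sketched below. This is a rough heuristic, not how any particular tool decides: it assumes a single-line label, a 12 px font, 4 px of padding, and an average character width of about 0.6 × the font size, all of which are my own assumptions.

```python
def label_fits(text, box_w, box_h, font_px=12, char_w_ratio=0.6, padding_px=4):
    """Rough check of whether a one-line label fits inside a treemap box.

    Assumes each character is about `char_w_ratio * font_px` wide, an
    approximation for typical sans-serif fonts, not a real text measurement.
    """
    text_w = len(text) * char_w_ratio * font_px
    fits_width = text_w + 2 * padding_px <= box_w
    fits_height = font_px + 2 * padding_px <= box_h
    return fits_width and fits_height


# Hypothetical box sizes in pixels.
print(label_fits("India", 120, 80))    # a large box
print(label_fits("Malaysia", 30, 12))  # a small box: the label is dropped
```

Any box that fails a check like this ends up unlabeled, and its value is only reachable through a tooltip or highlight.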
Large numbers of categorical comparisons
There will be times when you need to make categorical comparisons across a large number of categories, for example data on 50 states, 196 countries, 3,100+ counties in the US, or other really large sets of categories. A standard bar chart is not useful in this situation. One solution is to show the top n or bottom n from the data. However, this is only a small subset of the full data.
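Selecting the top n or bottom n is just a sort and a slice. This minimal sketch uses made-up values for 50 categories and also reports what fraction of the categories such a chart actually shows; the function names are hypothetical.

```python
def top_n(values, n):
    """The n largest categories, largest first."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:n]


def bottom_n(values, n):
    """The n smallest categories, smallest first."""
    return sorted(values.items(), key=lambda kv: kv[1])[:n]


def fraction_shown(values, n):
    """Share of the categories that a top-n (or bottom-n) chart displays."""
    return min(n, len(values)) / len(values)


# Hypothetical data: 50 categories with made-up values 1..50.
states = {f"State {i}": i for i in range(1, 51)}
print(top_n(states, 3))
print(bottom_n(states, 3))
print(f"A top 10 chart shows {fraction_shown(states, 10):.0%} of the categories")
```

With 50 categories, a top 10 chart displays only 20% of them, which is the trade-off described above.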
One good example of a treemap with 50 states is the electoral college. This particular dataset can be very difficult to visualize. It's often shown on a choropleth (shaded) map, which distorts the comparison because of the differences in land area between states. Others have tried visualizing it with cartograms, which distort the shape of the country, making it barely recognizable (see these variations by the Financial Times). My co-author for our upcoming book The Big Book of Dashboards, Steve Wexler, wrote a great blog post on this very topic. In his post, In Praise of Treemaps, he used a treemap to compare the electoral votes for the political parties in the 2012 election. It's very easy to see which party has over 50% of the electoral votes. Knowing that locating states in the treemap might be difficult, Steve added a list of the states on the right, making it easy for a user to find any state or set of states and highlight them.
Notice that the blue is more than half of the entire treemap. This is easy to see, but he also supplemented it with a bar chart below so the reader could make a precise comparison.
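The comparison that Steve's treemap makes visible, and his bar chart makes precise, is each party's share of the 538 electoral votes. A minimal sketch using the 2012 totals (332 for the Democratic ticket, 206 for the Republican ticket); the function and variable names are my own.

```python
TOTAL_ELECTORAL_VOTES = 538


def share_and_majority(votes, total=TOTAL_ELECTORAL_VOTES):
    """Return (share of all electoral votes, True if strictly more than half)."""
    return votes / total, votes > total / 2


# 2012 electoral vote totals.
results_2012 = {"Democratic": 332, "Republican": 206}
for party, votes in results_2012.items():
    share, majority = share_and_majority(votes)
    print(f"{party}: {votes} votes = {share:.1%}, majority: {majority}")
```

The Democratic share works out to about 61.7%, comfortably past the halfway point that the blue area in the treemap makes obvious at a glance.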
Even when the data isn't hierarchical, a treemap can still be useful. In the next example, I show 607 companies in the Consumer Financial Protection Bureau's complaint database. The treemap highlights that just a few companies make up over half of the complaints. I included a separate bar chart for the top 10 in this visualization (not shown), but the treemap and stacked bar chart highlight a few things. First, the top 6 companies represent 60% of the complaints in the database. A precise comparison isn't needed to see that Citibank has more complaints than Capital One. The data is actually quadruple encoded, so even if the reader can't easily compare the sizes of the rectangles, the order, label and color can help. The interactive version has a tooltip so the data can be seen for the boxes that aren't labeled. An additional annotation, "584 companies = 14.7%", was added to highlight the number of small companies in the bottom right corner of the treemap.
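Statements like "the top 6 companies represent 60% of the complaints" and "584 companies = 14.7%" are cumulative shares of the counts sorted from largest to smallest. Here is a minimal sketch; the short list of counts is hypothetical, not the CFPB data, and the function names are my own.

```python
def companies_needed(counts, target_share):
    """Smallest number of top companies whose complaints reach target_share."""
    total = sum(counts)
    running = 0
    for k, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        if running / total >= target_share:
            return k
    return len(counts)


def tail_share(counts, top_k):
    """(number of companies outside the top_k, their share of all complaints)."""
    ordered = sorted(counts, reverse=True)
    tail = ordered[top_k:]
    return len(tail), sum(tail) / sum(counts)


# Hypothetical complaint counts for nine companies (total = 100).
counts = [32, 20, 10, 8, 6, 6, 6, 6, 6]
print("Companies needed for 60%:", companies_needed(counts, 0.6))
n_tail, share = tail_share(counts, 3)
print(f"{n_tail} companies = {share:.1%}")
```

For this made-up list, the top 3 companies cover 62% of the complaints, and the 6 companies outside the top 3 account for the remaining 38%.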
I'm not suggesting that this is the only solution. In fact, a bar chart could also work well in this situation.
Notice that in the bar chart solution the 584 companies are aggregated into an "Other" category. This may or may not be the best way to visualize this information, depending on the purpose of the visualization. For example, it would be impossible to highlight "USAA Bank" because it's aggregated in the "Other" category. This could be very important, especially if you work for USAA Bank and want to see how you rank against the other banks.
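Grouping the tail into "Other" is a small transformation. The sketch below keeps the top n categories and sums the rest; the optional `keep` parameter is my own addition, one possible way to keep a specific company (the USAA Bank case above) visible instead of folding it into "Other". All names and counts are hypothetical.

```python
def top_n_with_other(counts, n, keep=(), other_label="Other"):
    """Keep the n largest categories, plus any named in `keep`, and sum the
    rest into a single `other_label` bucket."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    shown = dict(ranked[:n])
    for name in keep:
        shown[name] = counts[name]  # a kept category stays out of "Other"
    other = sum(v for k, v in counts.items() if k not in shown)
    if other:
        shown[other_label] = other
    return shown


# Hypothetical complaint counts.
counts = {"Bank A": 50, "Bank B": 30, "Bank C": 10, "Bank X": 6, "Bank D": 4}
print(top_n_with_other(counts, 2))
print(top_n_with_other(counts, 2, keep=("Bank X",)))
```

Either way the totals still add up; the difference is whether the reader can find a particular company or only the size of the "Other" bucket.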
A dot plot might be another alternative, depending on the data. It can be very useful for showing how one data point compares to all of the others. However, this approach has its downsides too. For example, in the case of the CFPB complaints, there are so many small companies that even when applying jitter to the dot plot, it's difficult to see all of the companies near zero.
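Jitter simply adds a random offset on the axis that carries no data, so overlapping dots spread out. Here is a minimal sketch, seeded for reproducibility, with hypothetical counts that mimic the situation described above: a few large companies and hundreds of tiny ones. The band count illustrates why jitter alone does not fix the crowding near zero. The function names and the 0.5 jitter width are my own assumptions.

```python
import random


def jitter(values, width=0.5, seed=42):
    """Pair each value with a random vertical offset in [-width/2, width/2].

    The x position carries the data; the y offset is noise whose only job is
    to spread out dots that would otherwise sit on top of each other.
    """
    rng = random.Random(seed)  # seeded so the layout is reproducible
    return [(v, rng.uniform(-width / 2, width / 2)) for v in values]


def crowded_near(points, x_max):
    """How many dots fall in the band from 0 to x_max (e.g. very small companies)."""
    return sum(1 for x, _ in points if 0 <= x <= x_max)


# Hypothetical complaint counts: a few large companies and many tiny ones.
counts = [900, 750, 400] + [1, 2, 3] * 100
points = jitter(counts)
print(crowded_near(points, 5), "of", len(points), "dots share the band x <= 5")
```

Here 300 of the 303 dots land in a sliver of the x axis, so even spread vertically they still pile up, which is the problem with the CFPB data.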
It's important to understand the strengths and weaknesses of a treemap if you are going to use one.
Strengths:
1. Originally built for hierarchical data.
2. Allows for non-precise comparisons between top-level categories as well as comparisons within categories at a lower level.
3. Can encode a large number of categories, hierarchical or not.
4. Allows for easy secondary encoding with color.
5. Allows for a part-to-whole comparison.
6. Shows all of the data (non-aggregated), enabling highlighting and tooltips when interactive.
7. Can be used as a filter, highlight or navigation tool.
Weaknesses:
1. Cannot make precise comparisons because it encodes data using the size (and color) of rectangles.
2. Smaller boxes cannot be labeled, making them hard to find and compare.
3. Because unlabeled boxes rely on tooltips, the chart can be difficult or impossible to read, especially in printed form.
Consider the Alternatives:
Bar Chart – if showing a small number of categories, then a bar chart is almost always going to be better.
Bar Chart with "Other" Category – if there are too many categories for an effective bar chart and you can aggregate the data, then grouping categories into an "Other" category can be very effective.
Bar Chart with Top N – if there are too many categories for an effective bar chart and only the top N are important, then showing only the top N can be an effective solution.
Dot Plot – a dot plot, or a dot plot with jitter (aka jitter plot), can be a great way to show a large number of categorical comparisons without aggregating the data.
Small Multiples – multiple charts can be used to allow for different comparison levels in the data, for example, a series of bar charts.
Icicle Charts – Adam McCann pointed out that another alternative might be an icicle chart. For the right number of categories this could be useful, but as the number of nodes increases it can be hard to visualize in the same space.
Other experts in the field of data visualization have written about treemaps. For example, Ben Shneiderman's article, Discovering Business Intelligence Using Treemap Visualizations, is featured as a guest article on Stephen Few's PerceptualEdge.com.
Stephen Few discussed treemaps in his article, Tableau Veers from the Path. He writes, "Ben Shneiderman created treemaps to display large numbers of values that exceed the number that could be displayed more simply and effectively using a bar graph." He also shows an example of a treemap from his book Now You See It (page 46).
As usual, Stephen describes the treemap perfectly:
"When typical graphs, such as bar graphs, can't be used because there are too many items to represent as bars in a single graph or even a series of graphs on a single screen, treemaps solve the problem by making optimal use of screen space. Because they rely on pre-attentive attributes to encode values (area and color) that we can't compare precisely, we reserve such techniques for situations when other more precise visualizations can't be used, or precision isn't necessary." – from Now You See It, page 46, Stephen Few, 2009.
"Treemaps can display a great deal of information quite powerfully, but for a limited set of purposes. That is, treemaps weren't designed to support precise quantitative comparisons, which we can't make based on relative size and color." – from Now You See It, page 90, Stephen Few, 2009.
Update: Thanks to Rob Radburn, who reminded me that there has been some research in this area. Reference below.
Perceptual Guidelines for Creating Rectangular Treemaps by Nicholas Kong, Jeffrey Heer, and Maneesh Agrawala, 2010. This paper outlines many of the issues discussed in this post, especially that treemaps are useful when comparing across nodes or when there are very large numbers of categories to compare.
These are the key findings from this paper:
Leaf/Leaf comparisons – "Bar charts are more accurate than treemaps up to a density of 2,048 leaves, after which treemaps become equally accurate. At 4,096 leaves, treemaps become faster than bar charts, up to 5 seconds faster at 8,000 leaves."
Leaf/Non-Leaf comparisons – "Treemaps are more accurate than bar charts at all densities, but no faster."
Non-Leaf/Non-Leaf comparisons – "Treemaps are more accurate, but exhibit similar estimation times."
I'm hopeful that more research will be done in the future to help us better understand these chart types and how they can be used effectively. Like many other chart types (Sankey diagrams, chord diagrams, node-link diagrams, etc.), treemaps have significant limitations. However, when used for the right purpose, they can be effective charts for visualizing data.
I hope you find this information helpful. If you have any questions, feel free to email me at Jeff@DataPlusScience.com
Jeffrey A. Shaffer
Follow me on Twitter @HighVizAbility