From the category archives:

Chart Types

If you are a market researcher, and you want to make sure that you get more reliable results for a subgroup in a survey, what do you do? You must increase the overall sample size (and spend a lot of money), right?

Actually, you don’t.

You can oversample that group only, and then weight it down to its known proportion in the population. For example, you may want to increase the number of managers and decrease the number of housewives (because the former are usually more heterogeneous than the latter). Oversampling is a common research method, and a very cost-effective way to get precise estimates for a subgroup.
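Here is a minimal sketch of the weighting step. All the numbers (population shares, sample sizes, group results) are made up for illustration, and the function names are my own, not part of any survey package.

```python
# Hypothetical figures, for illustration only.
population_share = {"managers": 0.10, "housewives": 0.30, "others": 0.60}
sample_size = {"managers": 300, "housewives": 100, "others": 600}  # managers oversampled
group_mean = {"managers": 0.80, "housewives": 0.40, "others": 0.50}  # e.g. share answering "yes"


def design_weights(population_share, sample_size):
    """Weight per respondent: population share divided by sample share."""
    total = sum(sample_size.values())
    return {g: population_share[g] / (sample_size[g] / total) for g in sample_size}


def weighted_mean(group_mean, sample_size, weights):
    """Overall estimate after weighting each group back to its population share."""
    num = sum(group_mean[g] * sample_size[g] * weights[g] for g in group_mean)
    den = sum(sample_size[g] * weights[g] for g in group_mean)
    return num / den


weights = design_weights(population_share, sample_size)

# Without weights the oversampled managers pull the overall figure up.
unweighted = sum(group_mean[g] * sample_size[g] for g in group_mean) / sum(sample_size.values())
weighted = weighted_mean(group_mean, sample_size, weights)

print(weights)
print(round(unweighted, 3), round(weighted, 3))
```

The managers still contribute 300 interviews to their own estimate, which is where the extra precision comes from, but each of those interviews counts for only a third of a "normal" one in the overall result.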

This is a real-world solution, and if we have finite resources to solve a real-world problem, resource allocation must be part of the equation. Higher variability usually demands more resources.

Why is this relevant in a blog about charts and information visualization? Glad you asked.

The Great Irregular Interval Debate

Let me give you an example. A while back, Jon Peltier wrote in his blog:

I don’t understand the obsession with an equal date interval. A line chart need not show the trend of only evenly-spaced data. Suppose I am observing temperatures, and I decide for simplicity that where the temperature hasn’t changed, or where it has been changing steadily, I do not need to record every value. Overnight after the temperature has dropped, I can characterize my temperature profile with one point per hour. As the sun rises, I may need more frequent recordings to capture the morning warm up. Then the clouds blow over, it starts to rain, then it clears up again; I may need minute-by-minute data points to track this. When I make my plot, is it any less relevant because the spacing of the data ranges from minutes to hours?

This is oversampling, and a wise resource allocation, too. In a survey, you weight the subgroup down to its right proportion, and that’s also what you do in a chart, when irregular date intervals are displayed proportionally.

Stephen Few disagrees:

Using a line to connect values along unequal intervals of time or to connect intervals that are not adjacent in time is misleading.

Furthermore:

How could we trust graphical representations of time series or frequency distributions if their shapes could have been altered by inconsistently manipulating the sizes of intervals along the scale, either arbitrarily or intentionally to deceive? We can derive meaning from patterns and trends that these graphs display only if the intervals are consistent.

[Image: wrong-line-chart]

He exemplifies his argument with these two charts (actually, there are three, but we can safely disregard the third one).

The first chart displays the correct annual sales. The second one displays arbitrarily grouped annual sales and, obviously, its pattern is quite different.

Now, the second chart is plain wrong, so I am not sure if you can use it to argue against unequal intervals.

[Image: corrected-line-chart]

Let’s use a fairer example with the same dataset and the same arbitrary grouping.

Compare the orange line with Few’s first chart. I actually don’t see much difference. Sure, you lose a lot of detail, but the basic pattern is there. Instead of sums, I am using averages (you can’t compare a single year with the total sales of three or four years).

The other two lines show the difference between equal and unequal intervals. The brown line displays the data points unequally spaced while the gray one uses equal intervals (Few’s second chart). I had to make some assumptions regarding the reference date, so this is not the best example, but it is good enough to show the potential risk of plotting unequal intervals of time at equal spacing.
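To make the sums-versus-averages point concrete, here is a rough sketch with invented annual sales (not Few’s dataset). Placing each group at the midpoint of its interval is my own assumption about the reference date. Other choices, such as the end of the interval, are just as defensible.

```python
# Hypothetical annual sales, not the dataset used in the charts above.
sales = {2000: 10, 2001: 12, 2002: 11, 2003: 15, 2004: 18,
         2005: 17, 2006: 20, 2007: 24, 2008: 22, 2009: 26}

# Arbitrary, unequal groupings given as (first year, last year).
groups = [(2000, 2000), (2001, 2003), (2004, 2004), (2005, 2008), (2009, 2009)]


def summarize(sales, groups):
    """For each group return its sum, its average and two candidate x positions."""
    rows = []
    for index, (start, end) in enumerate(groups):
        values = [sales[year] for year in range(start, end + 1)]
        rows.append({
            "label": str(start) if start == end else f"{start}-{end}",
            "sum": sum(values),                # grows with the width of the interval
            "mean": sum(values) / len(values), # comparable across intervals
            "x_time": (start + end) / 2,       # proportional spacing (interval midpoint)
            "x_equal": index,                  # equal spacing, ignores interval width
        })
    return rows


rows = summarize(sales, groups)
for row in rows:
    print(row)
```

Plot `mean` against `x_time` and you get something close to the original trend. Plot `sum` against `x_equal` and the wide intervals show up as spikes that are not in the data.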

Bottom line, oversampling is a useful method for better resource allocation. We can view irregular time series as some sort of oversampling, provided there are no missing values and irregular intervals in the chart are consistent with intervals in the time series.

Grouping data points is always a tricky issue, and Stephen Few shows it clearly, but we shouldn’t infer that “line graphs and irregular intervals is an incompatible partnership.”

(When using time series in Excel, make sure that category axis labels are recognized as dates. Alternatively, use a scatter plot with connected data points.)

{ 8 comments }

[Image: poverty-ratios-skyscraper]

Textures. 3D. Pie charts. Primary colors. Trends hidden behind labels. Backgrounds. Pie charts again.

Clear signs of a bad chart, right? Right. It is so easy to spot a badly designed chart that you can use a computer to do it. Don’t waste your time.

Let’s stop discussing the obviously wrong and start discussing the useless right. Like this chart here. (I’ve borrowed the dataset Nathan used in one of his visualization challenges – some interesting entries and great discussion there, by the way).

There may not be anything really, really wrong with this chart, but it reflects a bureaucratic way of thinking about data and data presentation where every single data point must be clearly shown and labeled. Just like a table.

Listen, unless you work for a statistics office, you should never create a chart like this. I know, it’s irresistible to check how well my state ranks, but identifying each and every data point in a virtually limitless bar chart makes no sense in most cases.

Do you read the labels between the top five and the bottom five? Charts like this encourage the lookup of individual data points, and for that a table is probably a better option. If anything, a skyscraper bar chart is a clear sign of loss aversion.

A Flexible Bar Chart: Introducing the Accordion Bar Graph

How do you graph a categorical variable with more than, say, 20 data points without creating a skyscraper? This is what I have in mind:

  • You must retain the overall pattern, so you can’t remove data from the chart;
  • Create one or more focus areas (top five and bottom five, for example);
  • Gaps between bars should be larger in these focus areas, so that labels can easily be added;
  • Minimize the height of the remaining bars and remove the labels.

The chart should look like this:

[Image: focus-context-bar-chart]
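The four rules above can be sketched as a small layout function. This is only one possible reading of the idea: the function name, the default sizes and the sample data are all hypothetical, and the output is just a list of positions that any charting tool could draw.

```python
def accordion_layout(values, focus=5, focus_height=12.0, focus_gap=6.0,
                     context_height=2.0, context_gap=0.0):
    """Lay out every bar, giving the top and bottom `focus` bars more room and a label."""
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    count = len(ordered)
    layout = []
    y = 0.0
    for rank, (name, value) in enumerate(ordered):
        in_focus = rank < focus or rank >= count - focus
        height = focus_height if in_focus else context_height
        gap = focus_gap if in_focus else context_gap
        layout.append({
            "name": name,
            "value": value,
            "y": y,                            # top edge of the bar
            "height": height,                  # thin bars in the context area
            "label": name if in_focus else "", # labels only in the focus areas
        })
        y += height + gap
    return layout


# Hypothetical data: 20 categories with decreasing values.
values = {"State %02d" % i: 30.0 - i for i in range(20)}
layout = accordion_layout(values)
for bar in layout:
    print(bar)
```

No data point is dropped, so the overall pattern survives, but with these defaults the ten context bars take about as much vertical space as a single labeled bar and its gap.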

 

I like the accordion metaphor and I’m playing with it. An interactive version could use a simple event to create a focus inside the context area, so when the user moves the mouse the bar is enlarged and the label is shown.

What do you think? Do you agree that skyscraper bar charts are (almost) useless or should we focus on reducing the number of data points instead? How would you improve this design? Please share your comments and charts below.

Update

Well, if you want to know how to do this in Excel and read a great discussion about it, Jon wrote Accordion Chart for Jorge. He not only discusses some of the options but also shares the Excel file with us. Thanks Jon! And Dick, over at Daily Dose of Excel, wants to make sure that your state is automatically highlighted (Ego Charts). Nice “quarter step”!

{ 22 comments }

You can add silly 3D effects to a pie chart, you can explode all the slices, you can compare multiple pie charts, you can use a legend instead of labeling the slices directly. This will probably render your graph useless, and make you look kind of dumb, but it is not the end of the world-as-we-know-it. But when making a pie chart there is something that you should never ever do, a capital sin that will make you burn in the hell of information visualization: using more than one variable in a single graph.

Well, since we are witnessing the end of the world-as-we-know-it, computer scientists at the University of Utah decided to give a little push, visualization-wise. They are designing a computer application “they hope eventually will allow news reporters and citizens to easily, interactively and visually [analyze] election results, political opinion polls or other surveys”. They boldly state that they “have developed new techniques for exposing complex relationships that are not obvious by usual methods of statistical analysis” (press release). And what are those new techniques? A doughnut chart:

The outer ring labels the series and the inner ring displays the data. Apparently you may add as many series as you wish and you can filter the results by socio-demographic characteristics. There is a video demonstration here [via].

This is the kind of joke that I would expect to be related to April Fools’ Day, but they seem to be serious about it. No one told them that showing part-of-a-whole is one of the few strengths of circular charts, that when people see 52.7% they see a pie cut in half, not a quarter, that “whole” means 100%, not 200% or 300%.

Regular readers know that I rarely utter such harsh comments on visualization ideas and applications (I even tried to create a dashboard using Crystal Xcelsius), but this is the stupidest idea of the year. They should know better (here are some tips).

By the way, I found this through a post by Sarah Perez at ReadWriteWeb. She writes: “unfortunately, the poll-analysis software isn’t quite ready for prime time. What a tease!” Fortunately, it is not! And judging from other posts, they could use an information visualization consultant. 

Well, perhaps I’m missing something. Am I?

{ 9 comments }

This article goes against much of the conventional wisdom about pie charts (and doughnut charts) by answering these two simple questions:

  • Can we use a large number of categories in pie charts? (Yes, we can.)
  • Can we make a productive use of the apparently useless doughnut chart? (Yes, we can.)

Disclaimer (Sort of…)

Let me start by declaring this: I believe that the analysis of simple proportions is, by its very nature, very limited. It only scratches the surface of the data and it is useless for serious, decision-making processes.

A circular chart is poor because the underlying message is poor. If you can run a business using pie graphs to make sense of your data please let me know what market you are in, because I want to be there too (well, not really…).

Pie charts belong to the media and to some simple presentations. Leave them there. And don’t make the charts you see in the media your role model.

The part-of-whole issue

That said, one must recognize that proportions are so pervasive and hard-wired into our brain that escaping them is almost impossible.

A circular chart conveys perfectly the idea of part-of-whole relationship. You can’t use a bar chart to show this relationship because the whole just isn’t there! Yes, you can use percentage scales, yes you can say it in the title, but it isn’t the same thing, is it?

As I wrote in my previous post on loss aversion, each chart answers a question from a different perspective. Charts are not interchangeable.

Often pie charts are used just because they may look better (this is, of course, in the eyes of the beholder) but what the user really wants/needs to know would be better answered by a bar chart. This is a problem of graphic literacy and information management. It has nothing to do with the intrinsic qualities of pie charts.

The limit of 4 to 6 categories in pie charts

There is a widespread belief that you should not use more than four to six categories in a pie chart.

That is wrong or, at the very least, very incomplete.

In fact, you can use as many categories as you want, and still get meaningful insights from the chart. Problem is, you must know what to do with your data (graphic literacy and information management, again), and a large number of bad charts come from this simple fact: people don’t know what to do. Garbage in, garbage out.

“The Secret Strength of Pies”

Here comes the fun part. In an article published back in 1991 by Ian Spence and Stephan Lewandowsky, titled “Displaying Proportions and Percentages”, the authors write:

“the pie chart outperforms the bar chart for complicated comparisons, suggesting that the perceptual addition and comparison of components is inherently easier with the pie chart than the bar chart.” (emphasis added)

(By the way, the authors also say that this advantage will be lost if you “explode” the slices.)

Stephen Few, in his “Save the Pies for Dessert”, cites this article and writes about “the secret strength of pies”:

It is not difficult to believe that it is somewhat easier to sum the areas of slices in a pie than it is to imagine the combined heights of bars stacked on one another.(…) Regardless, the fact remains that a comparison of two sets of summed parts is rare in the real world. But, by all means, should you ever need to display data for this purpose, a pie chart would serve you well.

Please note that Stephen Few, in his highly regarded book “Show Me the Numbers”, says:

“I don’t use pie charts, and I strongly recommend that you abandon them as well.”

Few acknowledges that pie charts “would serve you well” in a very limited set of circumstances (“a comparison of two sets of summed parts is rare in the real world”).

Is it really rare? It may be, but that’s because people don’t know what to do with their data (again). Let’s see.

You have 10 or even 20 categories and you want to use them all (your loss aversion tendency?). Because 20 ungroupable categories are rare in the real world, you should be able to visually group them, using a color (hue) for each group and a different saturation for each category. By doing this, you are adding layers of detail, and the reader will be able to select the level of detail that suits his/her needs. This works best when using an interactive chart because you don’t have to label everything (just use your mouse to identify on-demand the more relevant detail categories) but even a static chart can be used (in this case, label only the relevant details).
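If you want to try the hue-per-group, saturation-per-category idea outside Excel, here is a rough sketch using Python’s standard `colorsys` module. The function name, the saturation range and the category lists are assumptions of mine (only Housing and Transportation come from the chart discussed in this post).

```python
import colorsys


def group_palette(groups, value=0.85, min_sat=0.35, max_sat=0.95):
    """One hue per group, one saturation step per category within the group."""
    palette = {}
    group_count = len(groups)
    for group_index, (group, categories) in enumerate(groups.items()):
        hue = group_index / group_count  # hues spread evenly around the color wheel
        steps = len(categories)
        for cat_index, category in enumerate(categories):
            if steps == 1:
                saturation = max_sat
            else:
                # First category gets the strongest color, the last one the palest.
                saturation = max_sat - cat_index * (max_sat - min_sat) / (steps - 1)
            red, green, blue = colorsys.hsv_to_rgb(hue, saturation, value)
            palette[(group, category)] = "#{:02x}{:02x}{:02x}".format(
                round(red * 255), round(green * 255), round(blue * 255))
    return palette


# Hypothetical grouping for illustration.
groups = {
    "Living": ["Housing", "Transportation", "Food"],
    "Discretionary": ["Entertainment", "Apparel"],
}
palette = group_palette(groups)
for key, color in palette.items():
    print(key, color)
```

Evenly spaced hues are the simplest choice, not necessarily the prettiest. In practice you would probably pick the group hues by hand and let the code handle only the saturation steps.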

The Consumer Expenditure Chart

I used this methodology to design the consumer expenditure chart above, with living expenditure (on the right) and discretionary expenditure (on the left). As you can see, living expenditure accounts for almost 60% of the total. That’s something you can’t easily see with a bar chart.

Then, there is a second level of detail, where you have categories like Housing (more than half of living expenditure) or Transportation. And finally, you could use your mouse to identify those detailed categories in the outer gray ring.

I’ve added some arcs to compare the profile of total consumer units to consumer units with five or more persons. Each arc always starts at the same degree of the corresponding slice. Different proportions lead to gaps or overlaps. Please note that this is not a core feature of this chart. Just wanted to play a little with comparisons (an obvious issue: since the first arcs are closer to the center, a gap between them is different than a gap between the last arcs).
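For the curious, the arc logic can be sketched like this. The shares below are invented, not the actual consumer expenditure figures, and the function name is hypothetical. Each comparison arc starts at the same angle as the reference slice, and the difference in sweep shows up as a gap (positive) or an overlap (negative).

```python
def comparison_arcs(reference, comparison):
    """Return start/end angles (degrees) of comparison arcs anchored to reference slices."""
    arcs = []
    start = 0.0
    for category, ref_share in reference.items():
        ref_sweep = ref_share * 360.0
        cmp_sweep = comparison[category] * 360.0
        arcs.append({
            "category": category,
            "start": start,              # same starting angle as the reference slice
            "end": start + cmp_sweep,    # arc length follows the comparison share
            "gap": ref_sweep - cmp_sweep,  # > 0 leaves a gap, < 0 overlaps the next slice
        })
        start += ref_sweep
    return arcs


# Hypothetical shares: all consumer units vs. units with five or more persons.
reference = {"Housing": 0.34, "Transportation": 0.18, "Food": 0.13, "Other": 0.35}
comparison = {"Housing": 0.30, "Transportation": 0.20, "Food": 0.17, "Other": 0.33}

arcs = comparison_arcs(reference, comparison)
for arc in arcs:
    print(arc)
```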

The Secret Strength of Doughnut Charts

As we saw above, pie charts are better than bar charts when comparing combined proportions. But, as soon as you add a second pie chart you are trying to compare proportion A1 with proportion A2, not proportions A and B of the same pie. There is a shift in the analysis and the pies become useless (use bar charts instead).

Just because you can merge both pie charts in a single doughnut chart it doesn’t mean that you gain efficiency, because the essential problems remain in place.

For many, a doughnut chart is a bad mutation of a bad chart. But what if, just if, two bads make one good? Could a doughnut, if correctly used, become a kind of pie chart on steroids?

Let me emphasize this: never use a doughnut chart to compare series. I don’t, and I strongly recommend that you should avoid it as well… Always use a doughnut chart to add detail to a series. That’s the secret strength of doughnut charts.

And please, please, could someone write an article on doughnut charts for the English Wikipedia?

I made this chart in Excel

In case you are wondering, you can make the Consumer Expenditure chart in Excel, 2003 or 2007. Instead of the default theme colors, I used some of the colors that will be available in Chart Tamer (thanks, Andreas!).

Conclusion

Pie charts do not deserve their bad reputation. They seem to be more efficient than bar charts in some very specific tasks, like comparing combined proportions. We should take advantage of that by adding multiple levels of detail. We shouldn’t be afraid of using a large number of categories, provided that those levels of detail are clear and meaningful.

The doughnut chart is the most misunderstood chart in our toolbox. It is seen as completely useless because two series should not be compared using circular charts, but that’s not what doughnut charts should be used for. They should be used to extend the power of pie charts, managing efficiently the level of detail that we need to add to create more insightful charts.

Is this a good way to use pie and doughnut charts? Please share your thoughts in the comments.

[Update: If you want to know how to create this chart (with a bonus hole-remover...) Jon has a detailed explanation here.]

{ 12 comments }

In what seems to be a post-vacation syndrome, I am in the mood for pie charts. I see them everywhere, even in car logos.

Actually, I am more in the mood to defy current “crowd wisdom” about pie charts.

Search the web for “pie chart” and you’ll get more than one million results, and a depressing picture of human knowledge. Browse the first 100 and what do you get? Some educational(?) sites (poor kids), tutorials (Excel, php, java, Illustrator), humor (here, here, here), bad (here, here, here, here, here or here) or just plain stupid examples. You’ll also find them in court or fighting government (who could ever imagine that?). I’ll leave for another post what the Wikipedia and the pie chart thread in Tufte’s Ask E.T. say about pie charts (Stephen Few’s Save the Pies for Dessert is not listed within the first 100 results).

An old litany

Some of these sites discuss the use of pie graphs, but they usually recite the same old litany: our perception is bad at judging angles, you should use no more than five or six categories, don’t use them to compare series, Cleveland’s findings, etc. (there also is at least one unfair comparison between pie and bar graphs and one very aggressive rant against them).

If there is something that I would like to have written about pie graphs, it is these Expert Notes at ManyEyes:

Pie charts have a mixed reputation. They are popular in business and the media but many information designers have criticized the technique. Some claim that the pie slice shape communicates numbers less exactly than other possibilities such as line length. But this remains unclear in the context of proportions: for example, we have seen no studies that looked at the task of judging whether an item is more or less than 50%. It’s also unclear whether exact communication of numeric values is the only evaluation criterion; at least one study indicates that use of a pie chart for analyzing a problem as opposed to a bar chart changes the way people think about the problem.

This is clearly more constructive than saying that “they are as professional as a pair of assless chaps” (less funny though).

Not all charts are born equal

Current wisdom presumes that bar graphs and pie graphs are equivalent. For that reason, bar graphs should be used, always. After all, they are more efficient, right? But what if they are not equivalent, as the above quote suggests? Take a time series, for example. If you want to see trends, you’ll choose a line graph; if you want to compare data points you’ll use a column graph. They are very similar, but by choosing one or the other, the designer is making a choice of how he/she’ll look at the problem. Bar graphs and pie graphs are very different, so shouldn’t we think twice before selecting a bar graph because of its presumed superior efficiency?

This disdain for pie charts has its roots in Cleveland’s work and in Tufte’s and Few’s writings. Their positivist view towards information visualization may be as relevant as the classic economic theory and its presumption that consumers always make the rational decision, but are we not all predictably irrational? I agree with Robert at EagerEyes when he says:

There is no doubt that we need to be careful about the choice of visual representation, and that we need to encourage the use of good charts and criticize the bad ones. But that doesn’t mean we can get lazy and squeeze everything into a few standard charts types we’ve been using for decades. That is especially true if we want people to actually care about what we’re trying to show – and not bore them to tears.

We should probably try to be more rational and circumspect in a decision-making environment and not use the media as our role model, otherwise business visualization may become useless. However, ruling pie charts out is not the wisest decision.

Simple rules are made for beginners. Let’s break some. How about this one: “you should use no more than five or six categories in a pie chart”. Are you sure?

(Before that, we must re-read what Cleveland said and what others said about Cleveland. That’s the next post.)

{ 10 comments }

Best Pie Chart Award
(clean and balanced. Your perception may not be great at comparing angles, but who cares?)

 

2nd Place
(also nice, but too many slices, and I don’t like the title around the pie)

 

Lateral Pie-Thinking Award
(well, perhaps someone just messed up the template)

 

Designer’s Pie Charts Award
(data? what data?)

 

Seth Godin’s Pie Chart Award
(“makes an obvious point, no nuances“)

 

Consensus Pie Charts: The Venn Pie

 

Consensus Pie Charts: The Line Pie

 

Consensus Pie Charts: The Bar Pie

 

Flash Gordon Pie

 

We Try Harder Award

 

{ 13 comments }

It is time for scatter plots in the 10 x 10 charting tips series:

  1. A scatter plot is square by definition (I forget that sometimes…);
  2. In some cases, it makes more sense to use a scatter plot than two column charts: for example, instead of having a column chart to display product market share and another chart to display product growth, consider merging both into a scatter plot (market share on the x axis and growth on the y axis);
  3. If you are plotting several data series, color code them instead of using different markers…
  4. … but consider using several charts;
  5. In scatter plots, use empty circles as markers to let the reader see the overlapping points;
  6. Use a scatter plot matrix to analyze pairwise relationships between series;
  7. Use a scatter plot as an alternative to horizontal bar charts, like in a population pyramid;
  8. If needed, use a scatter plot instead of a line chart if you have an unevenly-spaced time series;
  9. You can use a scatter plot to create a basic map;
  10. An outlier can ruin your scatter plot. If possible, remove it and explain it.

As you can see, you can use a scatter plot in Excel to create many other charts. Just use your imagination and share it in the comments.

{ 0 comments }

[Image: geo_scatterplot]

This is an Excel scatter plot. Each point is one of the 4200 Portuguese civil parishes. The green point shows the active parish and the red ones some parishes that may have a similar profile. Of course, if you select a different parish the red set also changes.

I like this idea of displaying geographic coordinates in a scatter plot and thereby being able to see some (very basic) geographic patterns. Just by plotting the coordinates you get an idea of how the territory is structured and you can start asking questions (“why is the north so different from the south?”). By providing some more data (color coding the data points) we can add complexity to our questions.
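One practical detail if you plot raw latitude and longitude: a degree of longitude is shorter than a degree of latitude away from the equator, so the map looks stretched sideways. A simple workaround (my own suggestion, not something the chart above necessarily does) is to scale longitude by the cosine of the average latitude before plotting. The coordinates below are approximate and only there to make the sketch runnable.

```python
import math


def to_scatter_xy(points):
    """Convert (latitude, longitude) pairs in degrees to roughly proportional (x, y) pairs."""
    mean_lat = sum(lat for lat, _ in points) / len(points)
    scale = math.cos(math.radians(mean_lat))  # shrink longitude to match latitude degrees
    return [(lon * scale, lat) for lat, lon in points]


# Approximate coordinates (latitude, longitude) of three Portuguese cities.
points = [
    (38.72, -9.14),  # Lisbon
    (41.15, -8.61),  # Porto
    (37.02, -7.93),  # Faro
]

xy = to_scatter_xy(points)
for (lat, lon), (x, y) in zip(points, xy):
    print(lat, lon, round(x, 3), round(y, 3))
```

For a country the size of Portugal this is good enough for spotting patterns. It is not a substitute for a proper map projection.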

If you think there may be a spatial pattern in your data and you (or the users) can’t have access to GIS software, or you just don’t want to learn another application, this technique could come in handy.

{ 5 comments }