From the monthly archives:

February 2009

If you are a market researcher, and you want to make sure that you get more reliable results for a subgroup in a survey, what do you do? You must increase the overall sample size (and spend a lot of money), right?

Actually, you don’t.

You can oversample that group only, and then weight it down to its known proportion in the population. For example, you may want to increase the number of managers and decrease the number of housewives (because the former are usually more heterogeneous than the latter). Oversampling is a common research method, and a very cost-effective way to get precise estimates for a subgroup.
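Here is a minimal sketch of the "weight it down" step, in Python. All the numbers are invented for illustration: the population shares, the group names as dictionary keys, and the survey answers are assumptions, not real data. It only shows that the oversampled group is averaged on its own and then counted at its known population share.

```python
from statistics import mean

# Hypothetical figures, invented for illustration only.
# Known share of each group in the population.
population_share = {"managers": 0.10, "housewives": 0.90}

# Survey answers on some 0-10 scale. Managers are oversampled:
# 6 of the 10 interviews, although they are 10% of the population.
responses = {
    "managers": [9, 3, 10, 2, 8, 4],
    "housewives": [5, 5, 4, 6],
}

def naive_mean(responses):
    """Average all answers as if the sample mirrored the population."""
    all_values = [v for values in responses.values() for v in values]
    return mean(all_values)

def weighted_mean(responses, population_share):
    """Average each group separately, then weight by its population share."""
    return sum(
        population_share[group] * mean(values)
        for group, values in responses.items()
    )

print(naive_mean(responses))                       # biased toward managers
print(weighted_mean(responses, population_share))  # managers weighted down
```

The naive average lets the oversampled managers dominate the result. The weighted version keeps the extra precision for that subgroup without distorting the overall estimate.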

This is a real-world solution, and if we have finite resources to solve a real-world problem, resource allocation must be part of the equation. Higher variability usually demands more resources.

Why is this relevant in a blog about charts and information visualization? Glad you asked.

The Great Irregular Interval Debate

Let me give you an example. A while back, Jon Peltier wrote in his blog:

I don’t understand the obsession with an equal date interval. A line chart need not show the trend of only evenly-spaced data. Suppose I am observing temperatures, and I decide for simplicity that where the temperature hasn’t changed, or where it has been changing steadily, I do not need to record every value. Overnight after the temperature has dropped, I can characterize my temperature profile with one point per hour. As the sun rises, I may need more frequent recordings to capture the morning warm up. Then the clouds blow over, it starts to rain, then it clears up again; I may need minute-by-minute data points to track this. When I make my plot, is it any less relevant because the spacing of the data ranges from minutes to hours?

This is oversampling, and a wise resource allocation, too. In a survey, you weight the subgroup down to its right proportion, and that’s also what you do in a chart, when irregular date intervals are displayed proportionally.

Stephen Few disagrees:

Using a line to connect values along unequal intervals of time or to connect intervals that are not adjacent in time is misleading.

Furthermore:

How could we trust graphical representations of time series or frequency distributions if their shapes could have been altered by inconsistently manipulating the sizes of intervals along the scale, either arbitrarily or intentionally to deceive? We can derive meaning from patterns and trends that these graphs display only if the intervals are consistent.

wrong-line-chart

He exemplifies his argument with these two charts (actually, there are three, but we can safely disregard the third one).

The first chart displays the correct annual sales. The second one displays arbitrarily grouped annual sales and, obviously, its pattern is quite different.

Now, the second chart is plain wrong, so I am not sure if you can use it to argue against unequal intervals.

corrected-line-chart

Let’s use a fairer example with the same dataset and the same arbitrary grouping.

Compare the orange line with Few’s first chart. I actually don’t see much difference. Sure, you lose a lot of detail, but the basic pattern is there. Instead of sums, I am using averages (you can’t compare a single year with the total sales of three or four years).

The other two lines show the difference between equal and unequal intervals. The brown line displays the data points unequally spaced, while the gray one uses equal intervals (as in Few’s second chart). I had to make some assumptions regarding the reference date, so this is not the best example, but it is good enough to show the potential risk of plotting equally spaced points for unequal intervals of time.
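A rough sketch of that comparison in Python, under stated assumptions: the yearly figures and the group sizes are made up (this is not Few's dataset), and I take the midpoint of each group as its reference date, which is just one possible choice. Each group is summarized by its average rather than its sum, and then given either an equally spaced position or a position proportional to time.

```python
# Hypothetical yearly sales, invented for illustration only.
years = list(range(2000, 2010))
sales = [10, 12, 14, 13, 15, 18, 20, 19, 22, 25]

# Arbitrary, unequal grouping: 1 year, then 3, then 4, then 2.
group_sizes = [1, 3, 4, 2]

def group_points(years, values, group_sizes):
    """Return (reference_year, average) for each consecutive group.

    The reference year is the midpoint of the group (an assumption).
    """
    points = []
    start = 0
    for size in group_sizes:
        group_years = years[start:start + size]
        group_values = values[start:start + size]
        midpoint = (group_years[0] + group_years[-1]) / 2
        average = sum(group_values) / len(group_values)
        points.append((midpoint, average))
        start += size
    return points

points = group_points(years, sales, group_sizes)

# Equal intervals: every group gets the same horizontal step.
equal_x = list(range(len(points)))

# Unequal intervals: horizontal position follows the reference year.
proportional_x = [x for x, _ in points]
gaps = [b - a for a, b in zip(proportional_x, proportional_x[1:])]

print(points)
print(equal_x)
print(gaps)  # distance in years between consecutive points
```

With equal spacing, every step between points looks the same width even though the steps cover different spans of time, so the slopes are distorted. Plotting the averages against `proportional_x` (for instance in a scatter plot with connected points) keeps the slopes comparable.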

Bottom line, oversampling is a useful method for better resource allocation. We can view irregular time series as some sort of oversampling, provided there are no missing values and irregular intervals in the chart are consistent with intervals in the time series.

Grouping data points is always a tricky issue, and Stephen Few shows it clearly, but we shouldn’t infer that “line graphs and irregular intervals is an incompatible partnership.”

(When using time series in Excel, make sure that category axis labels are recognized as dates. Alternatively, use a scatter plot with connected data points.)


poverty-ratios-skyscraper

Textures. 3D. Pie charts. Primary colors. Trends hidden behind labels. Backgrounds. Pie charts again.

Clear signs of a bad chart, right? Right. It is so easy to spot a badly designed chart that you can use a computer to do it. Don’t waste your time.

Let’s stop discussing the obviously wrong and start discussing the useless right. Like this chart here. (I’ve borrowed the dataset Nathan used in one of his visualization challenges – some interesting entries and great discussion there, by the way).

There may not be anything really, really wrong with this chart, but it reflects a bureaucratic way of thinking about data and data presentation where every single data point must be clearly shown and labeled. Just like a table.

Listen, unless you work for a statistics office, you should never create a chart like this. I know, it’s irresistible to check how well my state ranks, but identifying each and every data point in a virtually limitless bar chart makes no sense in most cases.

Do you read the labels between the top five and the bottom five? Charts like this encourage lookup of individual data points, and for that a table is probably a better option. If anything, a skyscraper bar chart is a clear sign of loss aversion.

A Flexible Bar Chart: Introducing the Accordion Bar Graph

How do you graph a categorical variable with more than, say, 20 data points without creating a skyscraper? This is what I have in mind:

  • You must retain the overall pattern, so you can’t remove data from the chart;
  • Create one or more focus areas (top five and bottom five, for example);
  • Gaps between bars should be larger in these focus areas, so that labels can easily be added;
  • Minimize the height of the remaining bars and remove the labels.
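The rules above can be sketched as a small layout computation in Python. This is only one possible reading of the idea, not how the chart in this post was built. The function name `accordion_layout`, its parameters, and the default heights and gaps are all my own assumptions. It keeps every bar, gives the top and bottom bars full height, wide gaps, and labels, and squeezes the bars in between.

```python
def accordion_layout(labels, values, n_top=5, n_bottom=5,
                     focus_height=1.0, context_height=0.2,
                     focus_gap=0.6, context_gap=0.05):
    """Compute position, height and label for every bar.

    Bars are sorted from largest to smallest value. The first n_top
    and last n_bottom bars form the focus areas; the rest is context.
    All sizes are in arbitrary chart units (hypothetical defaults).
    """
    ranked = sorted(zip(labels, values), key=lambda pair: pair[1], reverse=True)
    n = len(ranked)
    bars = []
    y = 0.0
    for i, (label, value) in enumerate(ranked):
        in_focus = i < n_top or i >= n - n_bottom
        height = focus_height if in_focus else context_height
        gap = focus_gap if in_focus else context_gap
        bars.append({
            "label": label if in_focus else "",  # context bars lose their label
            "value": value,                      # but no data is removed
            "y": y,
            "height": height,
        })
        y += height + gap
    return bars

# Made-up example: eight categories, two bars in each focus area.
labels = ["A", "B", "C", "D", "E", "F", "G", "H"]
values = [12, 30, 7, 25, 18, 3, 21, 9]
bars = accordion_layout(labels, values, n_top=2, n_bottom=2)

for bar in bars:
    print(bar)
```

The resulting positions and heights can be handed to any tool that draws horizontal bars with a custom height per bar. Changing which indices count as "in focus" is the only thing an interactive version would need to do.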

The chart should look like this:

focus-context-bar-chart

I like the accordion metaphor and I’m playing with it. An interactive version could use a simple event to create a focus inside the context area, so when the user moves the mouse the bar is enlarged and the label is shown.

What do you think? Do you agree that skyscraper bar charts are (almost) useless or should we focus on reducing the number of data points instead? How would you improve this design? Please share your comments and charts below.

Update

Well, if you want to know how to do this in Excel and read a great discussion about it, Jon wrote Accordion Chart for Jorge. He not only discusses some of the options but also shares the Excel file with us. Thanks Jon! And Dick, over at Daily Dose of Excel, wants to make sure that your state is automatically highlighted (Ego Charts). Nice “quarter step”!
