From the category archives:

Data

If you are a market researcher, and you want to make sure that you get more reliable results for a subgroup in a survey, what do you do? You must increase the overall sample size (and spend a lot of money), right?

Actually, you don’t.

You can oversample that group only, and then weight it down to its known proportion in the population. For example, you may want to increase the number of managers and decrease the number of housewives (because the former are usually more heterogeneous than the latter). Oversampling is a common research method, and a very cost-effective way to get precise estimates for a subgroup.
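Here is a short Python sketch of the weighting step. It is a minimal illustration, not a survey package. The group names, population shares and responses are hypothetical. Each respondent gets a design weight equal to the group's population share divided by its share of the sample.

```python
# Minimal sketch of weighting an oversampled subgroup back to its
# known population share. All figures below are hypothetical.

def design_weights(sample_counts, population_shares):
    """Weight per group = population share / sample share."""
    n = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }

def weighted_mean(responses, weights):
    """responses: list of (group, value) pairs; weights: group -> weight."""
    total_weight = sum(weights[group] for group, _ in responses)
    return sum(weights[group] * value for group, value in responses) / total_weight

# Managers are 10% of the (hypothetical) population but half of the sample.
sample_counts = {"managers": 200, "others": 200}
population_shares = {"managers": 0.10, "others": 0.90}
weights = design_weights(sample_counts, population_shares)  # managers 0.2, others 1.8

# Hypothetical answers: every manager scores 8, everyone else scores 5.
responses = [("managers", 8.0)] * 200 + [("others", 5.0)] * 200
estimate = weighted_mean(responses, weights)  # 0.10 * 8 + 0.90 * 5 = 5.3 (up to rounding)
```

The unweighted mean of this sample would be 6.5, which overstates the managers' influence. The weights pull the estimate back to the population mix, and the 200 manager interviews still give a precise estimate for that subgroup.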

This is a real-world solution, and if we have finite resources to solve a real-world problem, resource allocation must be part of the equation. Higher variability usually demands more resources.

Why is this relevant in a blog about charts and information visualization? Glad you asked.

The Great Irregular Interval Debate

Let me give you an example. A while back, Jon Peltier wrote in his blog:

I don’t understand the obsession with an equal date interval. A line chart need not show the trend of only evenly-spaced data. Suppose I am observing temperatures, and I decide for simplicity that where the temperature hasn’t changed, or where it has been changing steadily, I do not need to record every value. Overnight after the temperature has dropped, I can characterize my temperature profile with one point per hour. As the sun rises, I may need more frequent recordings to capture the morning warm up. Then the clouds blow over, it starts to rain, then it clears up again; I may need minute-by-minute data points to track this. When I make my plot, is it any less relevant because the spacing of the data ranges from minutes to hours?

This is oversampling, and a wise resource allocation, too. In a survey, you weight the subgroup down to its right proportion, and that’s also what you do in a chart, when irregular date intervals are displayed proportionally.
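In chart terms, displaying irregular intervals proportionally means placing each point by its actual timestamp, not by its position in the list. Below is a minimal Python sketch that uses a hypothetical temperature log in the spirit of Jon's example. The dates and readings are invented for illustration.

```python
from datetime import datetime

# Hypothetical temperature log: hourly overnight, then denser sampling
# once the temperature starts changing quickly.
readings = [
    (datetime(2000, 6, 1, 3, 0), 14.0),
    (datetime(2000, 6, 1, 4, 0), 13.5),
    (datetime(2000, 6, 1, 5, 0), 13.2),
    (datetime(2000, 6, 1, 6, 0), 13.4),
    (datetime(2000, 6, 1, 6, 15), 14.1),
    (datetime(2000, 6, 1, 6, 30), 15.0),
    (datetime(2000, 6, 1, 6, 31), 14.2),
    (datetime(2000, 6, 1, 6, 32), 13.8),
]
times = [t for t, _ in readings]

def time_axis_x(times):
    """Hours elapsed since the first reading: gaps on the axis match gaps in time."""
    start = times[0]
    return [(t - start).total_seconds() / 3600 for t in times]

def category_axis_x(times):
    """Reading index: every reading gets the same width, whatever the time gap."""
    return list(range(len(times)))

xs_time = time_axis_x(times)          # 0.0, 1.0, 2.0, 3.0, 3.25, 3.5, ...
xs_category = category_axis_x(times)  # 0, 1, 2, 3, 4, 5, 6, 7
```

Plotting the values against `xs_time` (for example, with an XY/scatter chart) keeps the overnight readings spread out and the dense morning readings close together. Plotting against `xs_category` makes the last two minutes take as much width as two overnight hours.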

Stephen Few disagrees:

Using a line to connect values along unequal intervals of time or to connect intervals that are not adjacent in time is misleading.

Furthermore:

How could we trust graphical representations of time series or frequency distributions if their shapes could have been altered by inconsistently manipulating the sizes of intervals along the scale, either arbitrarily or intentionally to deceive? We can derive meaning from patterns and trends that these graphs display only if the intervals are consistent.

He illustrates his argument with these two charts (actually, there are three, but we can safely disregard the third one).

[Chart: wrong line chart]

The first chart displays the correct annual sales. The second one displays arbitrarily grouped annual sales and, obviously, its pattern is quite different.

Now, the second chart is plain wrong, so I am not sure if you can use it to argue against unequal intervals.

[Chart: corrected line chart]

Let’s use a fairer example with the same dataset and the same arbitrary grouping.

Compare the orange line with Few’s first chart. I actually don’t see much difference. Sure, you lose a lot of detail, but the basic pattern is there. Instead of sums, I am using averages (you can’t compare a single year with the total sales of three or four years).
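Why averages rather than sums? A quick sketch with hypothetical numbers shows the effect; Few's actual figures are not reproduced here. When groups span different numbers of years, their totals inflate the longer groups. Averages stay on the annual scale.

```python
# Hypothetical annual sales; Few's actual figures are not reproduced here.
annual_sales = {2000: 100, 2001: 110, 2002: 120, 2003: 130,
                2004: 125, 2005: 140, 2006: 150, 2007: 155}

# Hypothetical arbitrary grouping into intervals of 1, 3 and 4 years.
groups = [(2000,), (2001, 2002, 2003), (2004, 2005, 2006, 2007)]

def group_totals_and_averages(groups, sales):
    """Return (years, total, average) for each group of years."""
    result = []
    for years in groups:
        values = [sales[year] for year in years]
        result.append((years, sum(values), sum(values) / len(values)))
    return result

rows = group_totals_and_averages(groups, annual_sales)
for years, total, average in rows:
    print(f"{years[0]}-{years[-1]}: total={total}, average={average}")
```

The totals (100, 360, 570) mostly measure how many years each group contains. The averages (100, 120, 142.5) stay comparable with a single year's sales.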

The other two lines show the difference between equal and unequal intervals. The brown line places the data points at unequal spacing, while the gray one uses equal intervals (Few’s second chart). I had to make some assumptions regarding the reference date, so this is not the best example, but it is good enough to show the potential risk of plotting unequal intervals of time at equal spacing.
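Comparing segment slopes makes the spacing risk concrete. In this sketch the grouped averages are hypothetical. Each group is placed at the midpoint of its years, which is an assumption: one possible reference date, the same kind of choice described above.

```python
# Hypothetical grouped averages: (first year, last year, average sales).
grouped = [(2000, 2000, 100.0), (2001, 2003, 120.0), (2004, 2007, 142.5)]

def x_positions(grouped, proportional=True):
    """x coordinate for each group, either proportional to time or equally spaced."""
    if proportional:
        # Assumption: each group is plotted at the midpoint of its years.
        return [(first + last) / 2 for first, last, _ in grouped]
    # Equal spacing: one step per group, however many years it spans.
    return [float(i) for i in range(len(grouped))]

def segment_slopes(xs, ys):
    """Slope of each line segment between consecutive points."""
    return [(y1 - y0) / (x1 - x0) for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]

ys = [average for _, _, average in grouped]
proportional_slopes = segment_slopes(x_positions(grouped, proportional=True), ys)
equal_slopes = segment_slopes(x_positions(grouped, proportional=False), ys)
```

With proportional spacing, growth slows from 10 per year to about 6.4 per year. With equal spacing, the same three points seem to accelerate, rising 20 and then 22.5 per step. Same data, opposite story.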

Bottom line, oversampling is a useful method for better resource allocation. We can view irregular time series as a form of oversampling, provided there are no missing values and the irregular intervals in the chart are consistent with the intervals in the time series.

Grouping data points is always a tricky issue, and Stephen Few shows it clearly, but we shouldn’t infer that “line graphs and irregular intervals is an incompatible partnership.”

(When using time series in Excel, make sure that category axis labels are recognized as dates. Alternatively, use a scatter plot with connected data points.)


Textures. 3D. Pie charts. Primary colors. Trends hidden behind labels. Backgrounds. Pie charts again.

Clear signs of a bad chart, right? Right. It is so easy to spot a badly designed chart that you can use a computer to do it. Don’t waste your time.

Let’s stop discussing the obviously wrong and start discussing the useless right. Like this chart here. (I’ve borrowed the dataset Nathan used in one of his visualization challenges – some interesting entries and great discussion there, by the way).

[Chart: poverty ratios, skyscraper bar chart]

There may not be anything really, really wrong with this chart, but it reflects a bureaucratic way of thinking about data and data presentation where every single data point must be clearly shown and labeled. Just like a table.

Listen, unless you work for a statistics office, you should never create a chart like this. I know, it’s irresistible to check how well my state ranks, but identifying each and every data point in a virtually limitless bar chart makes no sense in most cases.

Do you read the labels between the top five and the bottom five? Charts like this encourage looking up individual data points, and for that a table is probably a better option. If anything, a skyscraper bar chart is a clear sign of loss aversion.

A Flexible Bar Chart: Introducing the Accordion Bar Graph

How do you graph a categorical variable with more than, say, 20 data points without creating a skyscraper? This is what I have in mind:

  • You must retain the overall pattern, so you can’t remove data from the chart;
  • Create one or more focus areas (top five and bottom five, for example);
  • Make the gaps between bars larger in these focus areas, so that labels can easily be added;
  • Minimize the height of the remaining bars and remove their labels.
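The layout rules listed above can be sketched as a small Python function. It computes bar positions and does not draw anything. The parameter names and default sizes are hypothetical, chosen for illustration. Any plotting library could then draw horizontal bars at the returned offsets.

```python
def accordion_layout(data, n_focus=5, focus_height=1.0, focus_gap=0.6,
                     context_height=0.2, context_gap=0.05):
    """Lay out a sorted bar chart as an accordion (hypothetical sizes).

    data: list of (label, value) pairs.
    Returns one dict per bar with label, value, y (offset from the top),
    height and show_label. The top and bottom n_focus bars are focus bars:
    thicker, with wider gaps and a label. The rest are thin, unlabelled
    context bars. Every data point is kept, so the overall pattern survives.
    """
    ordered = sorted(data, key=lambda item: item[1], reverse=True)
    n = len(ordered)
    bars, y = [], 0.0
    for i, (label, value) in enumerate(ordered):
        in_focus = i < n_focus or i >= n - n_focus
        height = focus_height if in_focus else context_height
        gap = focus_gap if in_focus else context_gap
        bars.append({"label": label, "value": value, "y": y,
                     "height": height, "show_label": in_focus})
        y += height + gap
    return bars

# Usage with 50 hypothetical regions.
data = [(f"Region {i:02d}", float(i)) for i in range(1, 51)]
bars = accordion_layout(data)
labelled = [bar["label"] for bar in bars if bar["show_label"]]
```

With these defaults, 50 bars take roughly 26 height units: 10 focus bars at 1.6 each plus 40 context bars at 0.25 each. Giving every bar the focus treatment would take 80 units, and in both cases no data point is dropped.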

The chart should look like this:

[Chart: focus + context bar chart]

I like the accordion metaphor and I’m playing with it. An interactive version could use a simple mouse event to create a focus inside the context area: when the user moves the mouse over a bar, the bar is enlarged and its label is shown.
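The interactive idea boils down to a hit test plus a re-layout. Here is a minimal, framework-agnostic sketch. The function names and sizes are hypothetical. Connecting `on_mouse_move` to a real mouse-move event is left to whatever toolkit draws the chart.

```python
def slot_layout(n_bars, focus, big=1.0, small=0.2, big_gap=0.6, small_gap=0.05):
    """Return (start, height, slot_end) for each bar; bars in `focus` are enlarged."""
    bars, y = [], 0.0
    for i in range(n_bars):
        height, gap = (big, big_gap) if i in focus else (small, small_gap)
        bars.append((y, height, y + height + gap))
        y += height + gap
    return bars

def bar_under_mouse(bars, mouse_y):
    """Index of the bar whose slot (bar plus the gap after it) contains mouse_y."""
    for i, (start, _, slot_end) in enumerate(bars):
        if start <= mouse_y < slot_end:
            return i
    return None

def on_mouse_move(n_bars, static_focus, mouse_y):
    """Hypothetical handler: also enlarge (and label) the bar under the mouse."""
    hovered = bar_under_mouse(slot_layout(n_bars, static_focus), mouse_y)
    focus = set(static_focus)
    if hovered is not None:
        focus.add(hovered)
    return hovered, slot_layout(n_bars, focus)

# Usage: 20 bars, with the top five and bottom five as static focus areas.
static_focus = set(range(5)) | set(range(15, 20))
hovered, new_layout = on_mouse_move(20, static_focus, mouse_y=8.1)
# The cursor sits on bar 5, the first context bar, which is now drawn full size.
```

One design caveat: enlarging a bar pushes the bars below it down. A real implementation would probably need to keep the hovered bar anchored under the cursor.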

What do you think? Do you agree that skyscraper bar charts are (almost) useless or should we focus on reducing the number of data points instead? How would you improve this design? Please share your comments and charts below.

Update

Well, if you want to know how to do this in Excel and read a great discussion about it, Jon wrote Accordion Chart for Jorge. He not only discusses some of the options but also shares the Excel file with us. Thanks, Jon! And Dick, over at the Daily Dose of Excel, wants to make sure that your state is automatically highlighted (Ego Charts). Nice “quarter step”!


A chart is always an answer to an underlying question. If you don’t know the question, be prepared for random answers (300-slide PowerPoint presentations, anyone?).

Do yourself a favor and write down the questions that define your project. Group them meaningfully and use them as chart titles. Each chart may prove irrelevant or raise new questions. Write them down. Repeat the process.

Jacques Bertin tells us that a chart should be able to answer elementary (“how much did we sell in March?”), intermediate (“what happened in the North district?”) and global (“how does our product compare with the market?”) questions. If it doesn’t, then it is an inefficient construction and should be redesigned or removed. This is also a simple way to identify redundant charts.

Don’t replace information overload with chart overload. Similar questions may require a single answer. Create a single, interactive chart and let the users find their own answers.

Embrace the questions, delegate the answers.


More data = better decisions, right? Not always. When you get more information than you can process within a given time period, information overload starts creeping in. Confusion, stress, anxiety and low motivation usually follow. Can we prevent that?

In general, the more information you have, the more accurate your decisions will be. But at some point, the trend reverses, and the more information you have, the less accurate your decisions are (recommended reading: this paper for causes and consequences of information overload and the – poor – Wikipedia article).

Too often you can trace the root causes of information overload to poor information and report design and to poor information management skills. Let me give an example. Can you memorize this sequence?

1123581321345589

It is not easy. Let’s try again:

1123-581-321-345-589

Better, but not good enough. Let’s try this one:

1+1=2+3=5+8=13+21=34+55=89

You’ll probably recognize these as the Fibonacci numbers, a sequence of numbers where each is the sum of the two preceding numbers.

So, you’ve tried to memorize a string of 16 digits. Then five strings of three or four digits. Then a word, “Fibonacci”. Which was the easiest?
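The third version is easier because the whole 16-digit string collapses into one rule. A few lines of Python regenerate every digit of the first sequence from that rule:

```python
def fibonacci(limit):
    """Fibonacci numbers up to `limit`, starting 1, 1; each is the sum of the two before it."""
    numbers = []
    a, b = 1, 1
    while a <= limit:
        numbers.append(a)
        a, b = b, a + b
    return numbers

# Concatenating the numbers reproduces the first sequence, digit for digit.
digits = "".join(str(n) for n in fibonacci(89))
# digits == "1123581321345589"
```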

Small scale information overload: working memory management

Let’s assume for the sake of discussion that information overload takes place when the information you try to manage exceeds the capacity of your working memory (it goes well beyond that, of course). Let’s also assume that there are five slots of working memory that you can use to store chunks of data.

As you can see, there is no room in working memory for the first sequence, the second barely fits and the third uses only one slot, for exactly the same data.
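The slot arithmetic can be spelled out as a toy model. It only counts chunks against the five slots assumed above. It is not a model of how memory actually works.

```python
WORKING_MEMORY_SLOTS = 5  # the simplifying assumption made above

def fits_in_memory(chunks, slots=WORKING_MEMORY_SLOTS):
    """True if the number of chunks does not exceed the available slots."""
    return len(chunks) <= slots

encodings = {
    "raw digits": list("1123581321345589"),             # one chunk per digit
    "digit groups": "1123-581-321-345-589".split("-"),  # five chunks
    "rule": ["Fibonacci"],                               # one chunk
}

for name, chunks in encodings.items():
    print(f"{name}: {len(chunks)} chunk(s), fits: {fits_in_memory(chunks)}")
```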

While you can’t do much to add more slots to your working memory, you can play an active role in designing those chunks and, by doing so, greatly improve the way you handle data and reduce the risk of information overload.

There’s a thread in Edward Tufte’s forums where he discusses Miller’s classic paper, “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information”. Tufte rightly argues that:

… the deep point of Miller’s paper is to suggest strategies, such as placing information within a context, that extend the reach of memory beyond tiny clumps of data.

By providing context or some sort of link between “tiny clumps of data”, you can create a single chunk of data. This is a basic strategy for information management and visualization.

Just be aware of the limits of our working memory and understand what you can do to maximize its capacity. This is a great starting point to design better charts.

Simple tip: avoid a back-and-forth movement

Minimizing the need for a back-and-forth movement is a practical application of these principles.

We decode a chart with multiple series by reading the legend and storing the meaning of each series in our working memory. If there are more series than available memory slots, a pendular eye movement between the legend and the plot area occurs. Try to prevent that by labeling the series directly (especially in line and pie charts), or make sure that you really need all those series. If you do, a panel chart could be a better option.
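Direct labeling usually means placing each series name next to that series' last data point. Here is a minimal sketch. It assumes the labels sit in one column at the right edge and must be at least `min_gap` apart, and it uses a simple greedy rule, one of many possible.

```python
def direct_label_positions(last_values, min_gap):
    """Return {series: label_y}, starting from each series' last value.

    Labels are processed from lowest to highest and pushed up just enough
    to stay at least min_gap above the previous label (greedy, upward only).
    """
    positions = {}
    previous_y = None
    for name, y in sorted(last_values.items(), key=lambda item: item[1]):
        label_y = y if previous_y is None else max(y, previous_y + min_gap)
        positions[name] = label_y
        previous_y = label_y
    return positions

# Hypothetical last values of three regional sales series.
labels = direct_label_positions({"North": 10, "South": 10.5, "East": 20}, min_gap=2)
```

Here "South" is nudged from 10.5 up to 12 so it does not collide with "North", while "East" keeps its own position. The reader never has to leave the plot area to identify a line.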

When you have two related charts on two different slides in a presentation, your audience will probably want to compare them, and a back-and-forth movement, this time between slides, happens again. Try to change the presentation design so that both charts are placed on a single slide, making comparisons easier.

(Why the sunflower?)

Photo credit: catd mitchell


For many of us this is a provocative question. Haven’t Tufte, Few, Cleveland and many others proved it beyond reasonable doubt? Isn’t there a prosperous industry based on the obvious usefulness of charts and information visualization? Is everyone wrong?

Let me play devil’s advocate here. A large majority of the charts you’ll find in the corporate sector are irrelevant, if not misleading (check some annual reports for a grim picture). Corporate presentations are nothing more than futile rituals of impression management where the presenter gets his “wow factor” not from how insightful his presentation is, but from how cool his 3D flying charts look. Serious managers will never use charts as decision support tools (“just show me the numbers”). Finally, there is scientific evidence that charts do not improve decision quality (“naive superiority hypothesis”, according to this article).

So, give me a table report with some well chosen key indicators and leave charts to lazy marketers.

How would you respond to this?
