Posts tagged as:

chart redesign

Annotating your chart helps your audience to understand the reasons behind some patterns or outliers. But, please, please, don’t bury the data under boxes and arrows and busy grid lines, like this one on the right does (from WTRG Economics).

How can you improve a chart like this? First of all, the series must be visible… :) Don’t create a competition for the readers’ attention. Notes should be in the background, like the axes, labels and other chart objects.

A note is not a marker. Add a note to explain a specific behaviour (“this happened because…”). If you have a time series, create a timeline outside the chart and add markers for relevant contextual events (there are several other design options).

It is easy to come up with something cleaner:

There were enough notes about quotas to create a new chart, with new insights. All the other notes are there, but they only murmur. I don’t feel comfortable using two axes, but that’s a topic for another post.

By the way: there is (at least) a perceptual error in my chart. Can you find it?

{ 1 comment }

We are so busy creating sexy charts to illustrate some random data that we often forget to check if our chart really answers the question. Heck, most of the time we don’t even have one. Chart first, ask questions later.

One of the major differences between tables and charts is this: a table says “here is your data, now go find the answers (they must be here, somewhere)”, while a good chart says “here is your answer”.

The more precise and clear our question is, the easier it is to select the right data and the right chart. Let me give you a recent example. Robert Kosara, at EagerEyes, discusses the “swing states”. Several readers contributed great alternative displays but, as I commented, there is a fundamental issue: if we want to see “the swing”, that’s what should be displayed, not the election outcomes. This:

 

 

is different from this:

 

 

In the first chart, we can see that the Republican candidate won in Alaska in 1968. In the second chart, we know that, in 1968, in Alaska, there was a different outcome – a “swing”.

Sure we can infer the swing from the first chart, and our answer about the “swing states” is there somewhere, but only the second chart can provide a clear and concise answer.
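To make the difference concrete, here is a minimal sketch of how the “swing” series could be derived from the outcomes series before charting it. The function name `swings` and the hand-typed Alaska input are assumptions for illustration only; they are not taken from the charts above.

```python
def swings(results):
    """Turn election outcomes into swing flags.

    results: list of (year, winner) tuples in chronological order.
    Returns a list of (year, swung) tuples, where swung is True when
    the winner differs from the winner of the previous election.
    The first election has no predecessor, so it is skipped.
    """
    flagged = []
    for (_, previous_winner), (year, winner) in zip(results, results[1:]):
        flagged.append((year, winner != previous_winner))
    return flagged


# Illustrative input, typed by hand: one state, four elections.
alaska = [(1960, "R"), (1964, "D"), (1968, "R"), (1972, "R")]

print(swings(alaska))
# [(1964, True), (1968, True), (1972, False)]
```

The second chart plots the output of a step like this one; the first chart plots its input.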

It is prudent to keep all the data (who knows what the future will bring, right?), but we should always be aware of our loss aversion tendency, and make sure that our chart is displaying what it was designed for. Edit your chart without mercy, and let redundant or plainly useless data go. That’s the only way to highlight the patterns we are looking for.

Now, you may want to reevaluate your question to allow for a broader answer. That’s ok, but do it carefully. Add detail without breaking the pattern. For instance, we may want to know more about the direction of the swing:

 

 

Bottom line, make sure that what your chart says is aligned with what you asked. If you can use your question in the chart title that’s a good sign that you are on the right track.

{ 2 comments }

Loss aversion – wrong chart example

Money Income - wrong chart

JunkCharts writes an interesting post on how loss aversion can happen in chart-making. The general concept of loss aversion tells us that “people strongly prefer avoiding losses than acquiring gains”. Translated to chart-making, it means that there is a “tendency to avoid losing data at any cost”.

“To clarify, add detail” says Tufte. Corollary: you should make data-dense charts and maximize the data-ink ratio. Problem is, this fits too well into the loss aversion tendency. Take the above chart, for instance: does it make any sense to add those nine series to a single chart? What insight do you get from it? Only one: the designer doesn’t know how to handle a larger number of data series.

Remove irrelevant data series and you risk a mutiny on the Bounty, even if relevant trends are easier to detect. It is absurd, but very human.

So, how can you give the users all the data they expect while keeping the chart clean and readable? Well, to clarify, add detail to existing patterns (that’s what I just did to Tufte’s sentence…).

Tufte talks about “data layers”; Ben Shneiderman has his Visual Information-Seeking Mantra (“overview first, zoom and filter, then details-on-demand”); and then there is the focus+context technique. They all convey a simple idea: prioritize your data. Know what is relevant and what is nice to have. Don’t give the user a final product. Make an interactive chart and let her discover what’s inside.
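As a rough illustration of “prioritize your data”, the sketch below assigns a plotting style to each series depending on whether it is relevant or merely nice to have. The helper `style_series` and its parameters are hypothetical; the returned dictionaries use keyword names that matplotlib’s `plot` accepts (`color`, `linewidth`, `label`, `zorder`), and a label starting with an underscore is left out of a matplotlib legend.

```python
def style_series(names, need_to_have,
                 highlight_colors=("tab:blue", "tab:orange", "tab:green")):
    """Return a style dict per series name.

    Series listed in need_to_have get a strong color and a legend entry.
    All other series are kept as gray context and hidden from the legend.
    """
    styles = {}
    color_index = 0
    for name in names:
        if name in need_to_have:
            styles[name] = {
                "color": highlight_colors[color_index % len(highlight_colors)],
                "linewidth": 2.0,
                "label": name,
                "zorder": 3,  # draw on top of the context series
            }
            color_index += 1
        else:
            styles[name] = {
                "color": "lightgray",
                "linewidth": 1.0,
                "label": "_nolegend_",  # matplotlib skips underscore labels
                "zorder": 1,
            }
    return styles
```

With matplotlib, each series could then be drawn with something like `ax.plot(x, y, **styles[name])`, so all nine series stay on the chart but only the one or two that matter compete for attention.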

I see this loss aversion tendency at work every day at the office. Do you too? How do you handle it?

{ 10 comments }

I’m a consensus kind of guy, I can’t help it. I always try to find the best parts of not-so-good things (a Curate’s egg syndrome?). Let me give you an example.

One of the reasons why people like pie charts is their strong and familiar metaphor – it is part of our daily life.

Another good metaphor is the analog clock. You don’t need a legend to know the time. So, why don’t you use it to display hourly data?

Take a look at the radar chart on the left (the Roman numerals – neat, huh?). It displays pageviews per hour by hour of the day. There are two series, daytime and nighttime. As you can see, the nighttime pageviews are much lower (I wonder why…).

If you want to compare daytime and nighttime data, do everyone a favor: forget about day and night. Don’t assume that those 24 data points should be split into midnight to midday and midday to midnight. Or that, just because you rise early, the split should be 6:00 a.m. to 6:00 p.m. and 6:00 p.m. to 6:00 a.m. Look at the data and do what it tells you to do. A good split creates two series that maximize the variability between them (and each series becomes more internally consistent). In this case, the split was at 8:00 a.m./p.m.
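One way to let the data choose the split is sketched below. It assumes the two series are contiguous 12-hour blocks and tries every possible start hour, keeping the one with the largest gap between the two block averages. With two equal-sized groups, maximizing that gap is the same as maximizing the between-group variability, and therefore minimizing the variability inside each group. The function name and the pageview numbers are made up for illustration; this is not necessarily how the chart above was built.

```python
def best_half_day_split(hourly):
    """Find the 12-hour block that best separates 24 hourly values.

    hourly: list of 24 numbers, index 0 = midnight to 1:00 a.m.
    Returns (start_hour, gap), where start_hour is in 0..11 and gap is
    the absolute difference between the averages of the two blocks.
    """
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    best_start, best_gap = 0, -1.0
    for start in range(12):
        block = [hourly[(start + i) % 24] for i in range(12)]
        rest = [hourly[(start + 12 + i) % 24] for i in range(12)]
        gap = abs(sum(block) / 12 - sum(rest) / 12)
        if gap > best_gap:
            best_start, best_gap = start, gap
    return best_start, best_gap


# Hypothetical pageviews: busy from 8:00 a.m. to 8:00 p.m., quiet otherwise.
pageviews = [10] * 8 + [100] * 12 + [10] * 4

print(best_half_day_split(pageviews))
# (8, 90.0)
```

Only start hours 0 to 11 need checking, because starting at hour 12 or later just swaps the two blocks.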

Yes, but what about the Curate’s egg? Glad you asked.

Chandoo, over at PointyHairedDilbert, had “an interesting charting idea to show the data around the clock”:

Jon Peltier doesn’t really like the idea and suggests a much more conservative approach:

Now, shake both charts (shaken, not stirred…) and what do you get? My radar chart, of course! And what a fine mix of both it is!

Ok, where was I? Ah, yes, my soft boiled egg…

{ 11 comments }

Abortion ratios 1980-2003 by race, marital status and age

Source: U.S. Census Bureau (original Excel file). The abortion ratio is defined by the number of abortions per 1,000 abortions and live births.

(Click to enlarge)

Notes:

1. We know that information visualization is all about pattern detection. But often our design choices hide relevant patterns behind the obvious one(s). Take this panel, for instance. Everyone can see the downward pattern, but what about the U-shaped pattern across age groups? You can see it, right? Well, follow the usual path and you’ll miss it.

2. A ratio (or a growth rate) is something that should always be put in the context of actual volumes or proportions. There is no “best answer” to link both dimensions, but the panel displays a reasonable solution. As you can see, the abortion ratio among women less than 15 years old is very high, but their share of the total number of abortions is almost residual. On the other hand, the ratio in the 15-19 age group may be lower, but it is still much higher than the average ratio, and this group accounts for around 17% of the total abortions. Whenever possible, you should keep these two measures close together.

3. There are seven age groups in this data set. Put them all together in a single line chart and you’ll miss the pattern across groups, as discussed above, but you’ll also have a hard time disentangling the whole thing. Before creating the chart always ask: what is my specific question? This will help you to prioritize and create a focus-context display. If a series answers your question (need-to-have), add it to the chart and color-code it. If a series is interesting but doesn’t directly answer your question (nice-to-have), you may add it to the chart to provide context, but gray it out and delete it from the legend, if you have one.
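For note 2, the two measures that should stay close together can be computed side by side. The sketch below follows the definition given in the source line (abortions per 1,000 abortions and live births) and adds each group’s share of all abortions. The function names and the counts in the example are hypothetical, not Census figures.

```python
def abortion_ratio(abortions, live_births):
    """Abortions per 1,000 abortions and live births."""
    return 1000 * abortions / (abortions + live_births)


def share_of_total(abortions_by_group):
    """Each group's fraction of the total number of abortions."""
    total = sum(abortions_by_group.values())
    return {group: n / total for group, n in abortions_by_group.items()}


# Made-up counts for two groups, only to show the contrast:
# a small group with a high ratio and a large group with a lower one.
abortions = {"small group": 1, "large group": 3}
births = {"small group": 1, "large group": 9}

for group in abortions:
    ratio = abortion_ratio(abortions[group], births[group])
    share = share_of_total(abortions)[group]
    print(group, ratio, share)
```

In this toy input the small group has the higher ratio (500 against 250) but only a quarter of the total, which is the kind of contrast the panel tries to keep visible.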

Please let me know what you think and suggest ways to improve the panel.

[Update: Jon discusses the process of pattern discovery in RE: Abortion Ratios 1980-2003. Andreas adds several good suggestions and shows how to display the data in a more consistent small multiples chart.]

{ 20 comments }

Nathan asks us Can You Improve this Mediocre Statistical Graphic?

Since there are only two series (two parties) with an obvious mirror effect, I would say it doesn’t make sense (from a chart economy point of view) to display both series. And since the 50% mark is relevant in election results, why shouldn’t we just look at the trend of one of those parties around that mark? It would help to tell a more interesting story.

So, this is my radical suggestion, with Bonavista’s sparklines:

“The percentage of counties in California that have a Democrat majority of registered voters in Presidential election years dropped sharply in the last four elections and now stays well below the 50% mark (Microcharts). Lorem ipsum….” (I have to work on a better integration of sparklines and the blog template, but you get the idea…)

It would be nicer to have more data points, but this small footprint chart conveys the essential message. Of course you can follow the standard approaches (a line chart with both parties or a stacked bar chart). As always, it all depends on what you want to say and how you want to say it.
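The one-series idea can be sketched in a few lines: keep a single party, express it as a distance from the 50% mark, and draw a tiny in-text chart. This is only a rough text-based stand-in, not how Bonavista’s MicroCharts works, and the percentages are invented for the example.

```python
BARS = "▁▂▃▄▅▆▇█"


def margin_vs_half(share_pct):
    """Distance of each percentage from the 50% mark, in points."""
    return [round(s - 50.0, 1) for s in share_pct]


def sparkline(values):
    """Map values to block characters, lowest value = shortest bar."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero on a flat series
    return "".join(
        BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values
    )


# Hypothetical shares for one party over four elections.
shares = [62.0, 55.0, 48.0, 41.0]

print(margin_vs_half(shares))  # [12.0, 5.0, -2.0, -9.0]
print(sparkline(shares))
```

Because the two parties mirror each other, nothing is lost by dropping the second series; the sign of the margin already says which side of the 50% mark we are on.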

{ 3 comments }