1. Tufte, the Father of Eye-Candy Charts

Tufte’s The Visual Display of Quantitative Information, published in 1983, is probably the most influential book in the history of data visualization, and it is likely to remain so for some more time.

In his book, Tufte outlines for the first time a consistent theory of what a chart should look like and why. His guidelines are easy to understand and very quotable, not buried under six feet of abstractions. Think of well-known concepts like “data-ink ratio”, “data-density” or “chartjunk”: they all come from The Visual Display.

However, too often these principles are taken as self-evident, somehow “discovered”, not invented. A fundamental clarification must be made: these are aesthetic principles that Tufte transposes (brilliantly) from Ludwig Mies van der Rohe’s minimalism to the field of data visualization. These are not universal principles backed up by scientific evidence. Some studies find them helpful, some studies say they are irrelevant, but their effectiveness is hard to measure and they should not be taken as indisputable laws (I call this the “what-would-tufte-say syndrome”).

Unlike other authors (Jacques Bertin, Tukey, William Cleveland), Tufte recognizes that only an aesthetic framework can structure the image (color management, the role of non-data objects, how to emphasize/de-emphasize elements in a chart…). This is clearly the realm of graphic design.

Using aesthetics to improve function is probably the major contribution of Edward Tufte to the display of quantitative information. Unfortunately, this idea that a chart can be an aesthetically pleasing object (“Beautiful Evidence”, the title of his latest book, says it all) went astray and gave birth to a whole industry of eye-candy visualization tools.

From Tufte’s positivist point of view, a chart is defined by how well it makes a pattern stand out. It may be boring but, if that is the case, then “you’ve got the wrong numbers”. His faith in human rationality is both charming and frightening…

2. Patterns, patterns, patterns. And something else.

There are so many misconceptions to discuss about data visualization that we often forget to emphasize this simple truth: data visualization is about pattern discovery, finding useful, actionable visual patterns hidden in the data and making them stand out. Let me repeat: it’s all about visual patterns.

Tufte would agree, but here is the fun part: there is nothing wrong with using 3D effects, textures, and all the decoration in the world. Use them! It is your good taste against Tufte’s. You don’t have to like minimalism. Add color, clipart, anything that you think can engage your audience.

I am not kidding. It’s you, not Tufte, who defines your aesthetic program. Almost anything goes. But, whatever you do:

  • Don’t design technically incorrect charts: do not distort a circle, do not use more than one series in a pie chart, do not make an object vary in two dimensions when you are using a single series, etc. Just common sense, really. And, of course, if you want to break the rules, know them first.
  • Don’t hide the patterns: find the patterns and make them visible. Remove everything except the series themselves. Now start embellishing your chart. But remember: every little thing you add multiplies the clutter and makes the patterns harder to see. You’ll have to find that point where the impact of eye-catching decoration on pattern visualization goes beyond an acceptable threshold.

Please note that minimalism was not randomly chosen. Not only does it make pattern discovery much simpler, it also creates a framework to evaluate what belongs in the chart and what doesn’t. You can reject it, but if you don’t have a different framework you must decide on an ad hoc basis. Unless you are an accomplished graphic designer (and even then), a minimalist approach is a good start and it should help you to find your own style.

3. Emotions, Emotions, Emotions

Let’s face it: you don’t have much choice. If you do not want to sacrifice patterns, the amount of decoration that you can actually use is very limited.

So, what do you do with that limited amount of decoration? Essentially you’ll try to create the right emotional response. This is not what you would expect from the over-positivist chart that you end up with by choosing the minimalist path.

Refusing to acknowledge the role of emotions in data visualization is a bizarre thing, considering that you can’t remove aesthetics from the equation, and we all have an emotional sense of Beauty. What many hardcore Tufte fans may consider chartjunk can actually keep the audience from turning the page.

4. Edward Tufte and Excel

Throughout his books, Tufte often refers to the higher resolution of paper, and how it outperforms current screen resolutions. His sparklines are meant to be printed, because only then can the fine details be observed.

In Edward Tufte’s vision, each chart is unique, and deserves the attention of a work of art. He despises PowerPoint and hardly mentions Excel. His charting tool is Adobe Illustrator, where he is in full control of each small detail. He admonishes against patronizing the readers, but he never really discusses the audience as something that should be taken into account when designing a chart.

5. Knowledge Is Built by the User

[Image: matrixpermutator]

Much has changed in the last 27 years and you may think that Tufte’s The Visual Display… emphasizes the use of paper just because the extraordinary changes in information technology were still in their infancy back in 1983.

Thing is, that’s not the reason. The real reason is that Tufte always thought of a chart as a final product to be printed and handed to the audience, not something that could be manipulated by the audience.

There is a striking difference between Edward Tufte and Jacques Bertin. Bertin’s “reorderable matrix” is dynamic by definition, and one of my preferred quotes summarizes his views perfectly:

“It is the internal mobility of the image which characterizes modern graphics. A graphic is no longer ‘drawn’ once and for all; it is ‘constructed’ and reconstructed (manipulated) until all the relationships which lie within it have been perceived.”

This was written in 1967, long before the PC was even imagined. Edward Tufte wants to design an efficient but elegant chart, Bertin wants to solve a business problem. There is no contradiction, one is not better than the other. They just serve different masters. (The image above is from Bertin’s Graphic Semiology and shows how a “dynamic chart” looked in 1967…)

Forty years have passed, but a vast majority of data users have no access to dynamic charts, either because they don’t have access to the right charting tools or they are unable to create those charts using their current tools (it is not that easy for a beginner to create a dynamic chart in Excel).

6. The Life Span of a Business Chart

In his essay “The Cognitive Style of PowerPoint” Edward Tufte argues that the tool itself is intrinsically flawed. I agree with him. Tools are not neutral. They can be forced to do things against their will, but that’s never easy. You can create a dynamic chart in Excel, but it is difficult. You can even force Excel to work like Tableau, but that’s like reinventing the wheel. You can create a good chart in Crystal Xcelsius, but that’s against its nature.

The point is, you can apply Edward Tufte’s principles by the book, but that means spending hours perfecting a chart in Illustrator and then printing it. I’d love to. Unfortunately, that’s not exactly how things work in a business environment. The life span of a business chart is short and the time to create it, even shorter. We cannot use Illustrator to create business charts.

7. Take-Away Points

Break away from Edward Tufte, but make sure you know why. Add emotion to your charts (rationally). Decide if the level of eye-candy your audience needs goes beyond what you are willing to add. Other things being equal, an interactive chart should need less eye-candy than a static one. Above all, show the patterns (but make sure your audience wants to see them).


I am a moderately advanced Excel user. This means “a dangerous person” for the IT department, but I like this daily fight, and Excel dashboards are among my preferred weapons. Let’s see how they can be used.

Excel is the best tool for executive dashboard prototyping, because of its flexibility and low development costs. Creating a fully functional prototype is not hard and it should be available for user feedback in a matter of days. So, make sure that, every time you spot a dashboard project, a prototype in Excel is included.

Since most business intelligence applications are notorious for their lack of basic chart formatting options, it shouldn’t be hard for you to create a simple set of charts that IT is unable to implement. If needed, use some advanced Excel charting techniques (including dummy series), but make sure they add real value to the user experience. Interactive features like visual what-if analysis are always cool and the users love them.

When presenting your project, do your best to convince your audience that you are technology-agnostic and all you care about is creating the best answer to the users’ needs.

IT will try to change your project, naturally. Try to avoid the “security bomb” (their favorite). You know how poor their expensive BI toys are, and you should know what they can and can’t do with them. Minor concessions can earn you some points. When they tell you they can’t implement your core ideas be prepared to fake genuine surprise, compare costs (again) and emphatically say that their options clearly don’t meet the organization’s needs.

Pissing off the IT department is one of the most enjoyable games in corporate life, but be a gentleman and don’t make them look stupid. They don’t usually have a good sense of humour and take their quest to conquer the world very seriously. If you really want to implement the dashboard, don’t make it an island if you can avoid it (connect it to the tables in the IT infrastructure, instead of copy/pasting data). 

Seriously: Excel is a great tool for dashboard prototyping. You can easily create multiple alternative user interfaces, get feedback from users or find design flaws. The end result should be much better than trying to capture some ill-defined requirements and sending them to IT, where user interface design usually ranks very low on the list of priorities.


Annotating your chart helps your audience to understand the reasons behind some patterns or outliers. But, please, please, don’t bury the data under boxes and arrows and busy grid lines, like this one on the right does (from WTRG Economics).

How can you improve a chart like this? First of all, the series must be visible… :) Don’t create a competition for the readers’ attention. Notes should be in the background, like the axis, labels and other chart objects.

A note is not a marker. Add a note to explain a specific behaviour (“this happened because…”). If you have a time series, create a timeline outside the chart and add markers for relevant contextual events (there are several other design options).

It is easy to come up with something cleaner:

There were enough notes about quotas to create a new chart, with new insights. All the other notes are there, but they only murmur. I don’t feel comfortable using two axes, but that’s a topic for another post.

By the way: there is (at least) a perceptual error in my chart. Can you find it?


We are so busy creating sexy charts to illustrate some random data that we often forget to check if our chart really answers the question. Heck, most of the time we don’t even have one. Chart first, ask questions later.

One of the major differences between tables and charts is this: a table says “here is your data, now go find the answers (they must be here, somewhere)”, while a good chart says “here is your answer”.

The more precise and clear our question is, the easier it is to select the right data and the right chart. Let me give you a recent example. Robert Kosara, at EagerEyes, discusses the “swing states”. Several readers contributed great alternative displays but, as I commented, there is a fundamental issue: if we want to see “the swing”, that’s what should be displayed, not the election outcomes. This:

 

 

is different from this:

 

 

In the first chart, we can see that the Republican candidate won in Alaska in 1968. In the second chart, we know that, in 1968, in Alaska, there was a different outcome – a “swing”.

Sure we can infer the swing from the first chart, and our answer about the “swing states” is there somewhere, but only the second chart can provide a clear and concise answer.
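The step from the first display to the second is really a small data transformation, and it can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `swings`, the party codes and the sample outcomes are hypothetical, and they are not the data behind the charts above.

```python
def swings(winners):
    """Given {election year: winning party}, return {year: bool}, where
    True means the winner differs from the previous election's winner.
    The first election has nothing to compare with, so it is skipped."""
    years = sorted(winners)
    return {
        year: winners[year] != winners[prev]
        for prev, year in zip(years, years[1:])
    }


# Made-up outcomes for an imaginary state (not real election results).
outcomes = {1960: "D", 1964: "D", 1968: "R", 1972: "R", 1976: "D"}

for year, swung in swings(outcomes).items():
    print(year, "swing" if swung else "-")
```

Charting the result of `swings` rather than `outcomes` is what turns “here is your data” into “here is your answer”.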

It is prudent to keep all the data (who knows what the future will bring, right?), but we should always be aware of our loss aversion tendency, and make sure that our chart is displaying what it was designed for. Edit your chart without mercy, and let redundant or plainly useless data go. That’s the only way to highlight the patterns we are looking for.

Now, you may want to reevaluate your question to allow for a broader answer. That’s ok, but do it carefully. Add detail without breaking the pattern. For instance, we may want to know more about the direction of the swing:

 

 

Bottom line, make sure that what your chart says is aligned with what you asked. If you can use your question in the chart title that’s a good sign that you are on the right track.


Can a picture of a nude person improve your decision-making processes? (Please don’t say “yeaaaaah”.) Probably not, but if you need a good attention grabber a picture of a naked body is your best bet. Make sure you add one to your next sales report.

Because, if you are using those glossy 3D pie charts from Crystal Xcelsius (or Dundas, or…), you are applying the same principle, safe-for-work version. Your underlying message to your audience is “you are so dumb that you don’t even understand a simple chart with a clear message. I have to use charts that obfuscate the message, but they grab your attention and that’s all that matters. Let me take my shirt off too.”

I’m a business analyst. I usually try to create charts that can support the decision-making process. I am not a graphic designer, trying to illustrate a story and get the reader’s attention.

When you are in a corporate environment you can enjoy the attention of your audience (the organization is paying for it…). Also, information is shared among people with similar professional profiles, who at least know what the basic concepts are.

On the contrary, in a magazine, your readers don’t know or may not care about your subject. How do you grab their attention? Your best option is to add a photo of a naked male/female. Can you justifiably use it to illustrate the story? Do it. You can’t? OK, try other attention-grabbing devices, like a nice, glossy pie chart (not as satisfying, though).

These are different needs, but we, the so-called “visualization experts”, often fail to acknowledge that.

Eye-Catching Charts vs. Decision-Support Charts

Eye-catching charts are used to get the reader’s attention by providing some sort of light entertainment. Their primary focus is on the format. They use many colours and large textured surfaces. Because of that, their data density is low and context is almost absent. A 3D pie chart is the typical eye-catching chart.

Decision-support charts focus on the data and should be “invisible” (the audience sees the patterns, not the chart). There are no textured surfaces and colors are used to highlight specific details. The display real estate can be filled with context data, maximizing data density. The typical decision-support chart is, obviously, the scatterplot.

Charts for Analysis and Charts for Communication – Not Anymore?

This is the traditional split. After the analysis stage, the analyst should prepare his/her findings for the communication stage. But vendors like Microsoft and Business Objects have been short-circuiting this process, selling the idea that all you need is form, not content, and whatever stage you are in, you must have a nicely textured 3D chart.

These charts are sold as “professional-looking” and let’s accept that for a moment. They are professional-looking from a graphic designer’s perspective, but they are completely useless in a corporate environment where you have massive amounts of data to deal with. I’m sorry to say, but the more textured charts you have the dumber you look.

Pin-Up Charts Don’t Belong Here

I don’t really care about pin-up charts (charts that the media pin up on their pages…).  Sometimes they are amusing (not sexy, unfortunately) but they just don’t belong in a corporate environment. If you need attention, make better use of your data to find its inner beauty or use a photo of a proper pin-up.


According to Stephen Few, the founders of Tableau Software made some assumptions about visual analytics’ adoption that we can summarize in a single sentence: analysts want to find hidden insights in large and complex data sets using new visual paradigms. Later on, they discovered that these assumptions were somewhat flawed, and that what people really want is to save time in their daily routine when analyzing small and simple data sets, using familiar formats. Reality check, anyone?

We all make some wrong and costly assumptions. I wrote a blog on data visualization in Portuguese for about a year and then I had to give up, because no one seemed to care. I’m selling a tutorial on how to create Excel dashboards that I am proud of, but I should have started with a simpler version that delivers similar results (I’m working on that, by the way…).

Many of these assumptions are powered by what Chip and Dan Heath in Made to Stick call the “curse of knowledge” (“the better we get at generating great ideas—new insights and novel solutions—in our field of expertise, the more unnatural it becomes for us to communicate those ideas clearly”). Our wishful thinking makes us believe that the knowledge gap is narrower than it really is, and that some basic notions we take for granted are shared by everyone, when they are not.

I’d love to write a blog on data visualization using higher-end tools like Tableau or Spotfire, but you can’t tell people “ditch Excel, use these great tools instead”. They have their (growing) market, but an overwhelming proportion of business charts are made in Excel because that’s the only tool people have access to. Excel is good enough to teach sound visualization principles, so visualization experts should start by saying “you can do it in Excel; here is how”. At some point the newly acquired knowledge would make people move up, if needed. In information visualization, we have the (graphic literacy-wise) rich and the poor. Now we need a solid middle class. Accessible learning tools are one of the answers.

(This is what I am trying to do with pie charts: instead of banning them, I’m trying to show how to create acceptable pie charts. At some point people will realize that they will need something better. I may be wrong, but the other options don’t seem to be working, either.)

If we fail to communicate this simple message (“you can do it in Excel; here is how”) do you know what we’ll get? A new Dundas/Crystal Xcelsius user.


Misconception #1: A Better Chart Starts With… the Chart

Wrong. It starts by asking yourself if you really need one. Perhaps a statistical measure of some sort is good enough, perhaps you should use a table. If your job is to find patterns in a data set and build shared knowledge about it, what really matters is how efficiently the message is sent, and how efficiently it is received by the audience (two different things).

Misconception #2: You Should Master the (Technological) Tools of the Trade

No, that is not enough. Just because you know how to create a chart in Excel doesn’t mean that you know how to create a chart. If you use Microsoft Excel as your charting software then yes, you should learn more Excel (to spend more time with the kids). But you must go beyond technology, or else you will end up creating some very stupid charts. Please note that the vast majority of Excel training courses will not teach you what they should (best practices). They will only tell you how to make “cool” graphs, like a 3D exploded pie chart.

Misconception #3: Defaults Are Good Enough

They aren’t. Each chart must be tailored to the specific data set, audience and message. For instance, try to create a graph that clearly displays a large number of series and you’ll fail if you use the defaults (but you can do it with clever color coding). And if you use recognizable defaults, like the Excel 2003 charts, you’ll look very, very lazy (at best).

Misconception #4: Vendors Obviously Implement the Very Best Templates

(I’ve heard this one recently, and I found it so incredibly naive that I had to write about it.) They don’t. About 90% of the Excel 2003 chart gallery is junk, and you must heavily reformat the remaining 10% to get something useful. Select other tools, like Crystal Xcelsius, and the scenario is even worse. And I am unable to create in Cognos something that remotely resembles a chart (people tell me that version 8.4 is a little better).

Misconception #5: Better Charts Are Just “Prettier” Charts

I hear this all the time. A good chart may look “prettier”, but that’s just an unintended consequence of a design that communicates better. In information visualization, prettiness must be a by-product of function. The very concept of a “better communicator” is sometimes difficult to comprehend, and trying to explain it is a waste of time, because people need to see it in action. You must take the user by the hand and guide him/her. You must force comparisons: “what can you learn about x using this chart?” and “what can you learn about x using that chart?” “how long did it take you to learn x using this and using that?”.

Misconception #6: It’s All About the Wow Factor

It is not. Many marketers and graphic designers fail to understand this. Marketers are hopeless in their relentless search for the wow factor and the eye-catching, “professional-looking” graphs, and graphic designers should know better, but they prefer to sacrifice data on the altar of Beauty (form is everything, data is a nuisance).

The dominant view among visualization experts (namely Tufte and Few) is that “form follows function”: every ornament in a graph should be eliminated, every object must serve a clear purpose, efficiency should be maximized (labeling series instead of using a legend, for instance). Given the extremely low graphic literacy levels among the general population, this may not always be the best approach.

Misconception #7: A Good Chart Displays the Actual Values

No. If you label each data point you get a useless table over a useless chart. Labels are not only a distraction but often actually hide patterns in the data. Short labels and annotations can, and should, be used to identify or explain outliers or other interesting data points and circumstances. If your audience expects to see the underlying data then add a link to the table.
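One way to decide which points deserve a label is to let a simple rule pick the outliers for you. The sketch below is an assumption on my part, not a recommendation from any of the authors mentioned here: it uses Tukey’s fences (1.5 times the interquartile range), and the function name `points_to_label` and the sample figures are made up for illustration.

```python
import statistics


def points_to_label(values, k=1.5):
    """Return the indices of the values that fall outside Tukey's fences
    (Q1 - k*IQR, Q3 + k*IQR). Only these points would get a data label;
    everything else stays unlabeled."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Made-up monthly figures with one obvious spike in the sixth month.
sales = [10, 12, 11, 13, 12, 40, 11, 12, 13, 11, 12, 10]
print(points_to_label(sales))
```

Any other rule would do (a fixed threshold, the top and bottom values, a hand-picked event); the point is that labels become the exception, not the default.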

Misconception #8: Good Charts Should Be Read at a Glance

Not necessarily. The more complex the chart, the longer it takes. It really doesn’t matter if it takes a second or an hour. What matters is how efficiently the graph communicates. If a chart takes forever to read, look for bottlenecks: the series are not easily identifiable, patterns are hidden, demands on the working memory are high, etc.

Misconception #9: The More Detail the Better

What we see as detail can be seen by someone else as clutter. Clutter is the natural child of loss aversion and is very difficult to remove. If you have 12 competitors your audience will want to see the market share for each of them, even if it doesn’t make any sense. Tufte says “to clarify, add detail”, and yes, 12 competitors in a line chart can be made clear and useful, but you must know how to categorize them and provide a framework to help the user (you can use a large number of categories in a pie chart, for instance).

Misconception #10: It’s All About Selling Your Point, No Nuances

In The three laws of great graphs Seth Godin says that “there is no room for nuance [in a presentation]” and your charts should reflect that. Maybe it is just me, but I hate it when I am not allowed to draw my own conclusions because the data made available by the presenter is too biased towards his/her own points of view. Depending on the situation, a clear path that is supported by a lot of details is much better than a yes/no pie chart.

Misconception #11: You Have to Have Color, Lots of Color

Wrong. Color is a very difficult subject. Large surfaces of primary colors like we often see in presentations should be avoided because they are hard on the eyes and, because everything stands out, nothing stands out. A good option is to use grays for non-data elements like grid lines, and pale colors for color-coding. As a rule of thumb, color should always carry some meaning. Use primary colors to highlight a data point or some other small detail.
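The rule of thumb above (gray for non-data elements, pale colors for coding, one strong color for emphasis) is easy to turn into a tiny helper. Take this as a sketch only: the hex values, the constant names and the function `assign_colors` are my own assumptions, and any charting tool that accepts color codes could consume the result.

```python
GRID_GRAY = "#cccccc"  # non-data elements: grid lines, axes, borders
PALE = ["#a6cee3", "#b2df8a", "#fdbf6f", "#cab2d6"]  # muted series colors
HIGHLIGHT = "#e31a1c"  # one saturated color, reserved for emphasis


def assign_colors(series_names, highlight=None):
    """Map each series to a pale color, except the one to emphasize,
    which gets the single saturated highlight color."""
    colors = {}
    pale_index = 0
    for name in series_names:
        if name == highlight:
            colors[name] = HIGHLIGHT
        else:
            # Cycle through the pale palette if there are many series.
            colors[name] = PALE[pale_index % len(PALE)]
            pale_index += 1
    return colors


print(assign_colors(["North", "South", "East"], highlight="South"))
```

Because only one series can ever be red, the color carries meaning instead of decoration.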

Misconception #12: A Single Chart is Enough

It is not. We live in an increasingly complex world, and traditional charts are very simple tools. While we wait for a new set of charts to be invented, we can use interaction (see below) and multiple charts to create a richer picture. That’s why scatter plot matrices, small multiples or trellis displays, and especially those multiple variations of executive dashboards are much more powerful than a simple chart.

Misconception #13: Charts Are Interchangeable

They aren’t. You can use a column chart or a line chart to display a time series, but while a line chart performs better than a column chart when reading trends, it is easier to compare data points using a column chart. Most visualization experts will tell you that you should use a bar chart instead of a pie chart (also because it is easier to compare data points), but a pie chart gives you the perception of a whole that is absent in a bar chart. Every graph has its own strengths, and you should select the one that suits your needs.

Misconception #14: Create It and Forget It

Don’t. Making sense of your data is a process of exploration and discovery. A pattern in a subset may be hidden by a noisy background. Different measures may lead to more complex insights. Creating a chart that the user can interact with should always be your primary goal. Unfortunately, that’s beyond the skills of an intermediate Excel user (if you want to learn about interactive charts my Excel dashboards may be a good starting point).

*

This post lists 14 widespread misconceptions about charts, but it is probably a very incomplete list and you may not agree with all of them. What misconceptions would you add/remove?

*

[Update: Jon has been writing extensively about Excel 2003 and Excel 2007 (by the way, it's a great resource that helps us to see through the marketing noise). I said in the comments below that I prefer to use Excel 2007 charts to post images in this blog. He doesn't agree and he tries to prove in his last post that charts in Excel 2003 are actually better. He uses good examples to prove his point but I still believe that this (Excel 2007):

looks better than this (Excel 2003):

Yes, probably there is an "overaggressive anti-aliasing", but the line in Excel 2003 is too "crispy" for my taste. Again, it is just a matter of creating images for a blog, not exactly for serious work...]


You can add silly 3D effects to a pie chart, you can explode all the slices, you can compare multiple pie charts, you can use a legend instead of labeling the slices directly. This will probably render your graph useless, and make you look kind of dumb, but it is not the end of the world-as-we-know-it. But when making a pie chart there is something that you should never ever do, a capital sin that will make you burn in the hell of information visualization: using more than one variable in a single graph.

Well, since we are witnessing the end of the world-as-we-know-it, computer scientists at the University of Utah decided to give a little push, visualization-wise. They are designing a computer application “they hope eventually will allow news reporters and citizens to easily, interactively and visually [analyze] election results, political opinion polls or other surveys”. They boldly state that they “have developed new techniques for exposing complex relationships that are not obvious by usual methods of statistical analysis” (press release). And what are those new techniques? A doughnut chart:

The outer ring labels the series and the inner ring displays the data. Apparently you may add as many series as you wish and you can filter the results by socio-demographic characteristics. There is a video demonstration here [via].

This is the kind of joke that I would expect to be related to April Fool’s Day, but they seem to be serious about it. No one told them that showing part-of-a-whole is one of the few strengths of circular charts, that when people see 52.7% they see a pie cut in half, not a quarter, that “whole” means 100%, not 200% or 300%.
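The part-of-a-whole rule is simple enough to check before anything gets drawn. Here is a minimal sketch, with a hypothetical function name (`pie_shares`) and made-up survey counts: it accepts a single series and converts it into shares that add up to exactly one whole.

```python
def pie_shares(values):
    """Convert ONE series of non-negative values into percentages of the
    whole. Raises ValueError for data that cannot be read as parts of a
    single whole."""
    if not values:
        raise ValueError("a pie needs at least one slice")
    if any(v < 0 for v in values.values()):
        raise ValueError("negative values are not parts of a whole")
    total = sum(values.values())
    if total == 0:
        raise ValueError("the whole is zero")
    return {label: 100 * v / total for label, v in values.items()}


# Made-up answers to a single survey question.
answers = {"Yes": 527, "No": 473}
print(pie_shares(answers))
```

Two questions mean two calls and two wholes, which is exactly why they do not belong in the same circle.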

Regular readers know that I rarely utter such harsh comments on visualization ideas and applications (I even tried to create a dashboard using Crystal Xcelsius), but this is the stupidest idea of the year. They should know better (here are some tips).

By the way, I found this through a post by Sarah Perez at ReadWriteWeb. She writes: “unfortunately, the poll-analysis software isn’t quite ready for prime time. What a tease!” Fortunately, it is not! And judging from other posts, they could use an information visualization consultant. 

Well, perhaps I’m missing something. Am I?


I really dislike stacked bar charts… Let’s see a new bad example.

Steve Rubel shares with us how he spent his time online over the last two weeks. He uses the Firefox add-on PageAddict to monitor the time. He writes:

As you can see almost all of my time online is work related. Still I can see that I need to shrink my social network usage a little bit and increase my time with documents, web applications while also keeping RSS contained. I also need to go through the undefined section to see if there are big groups of sites that can be tagged.

I would say that it is hard to see what Steve Rubel wants us to see. It is not his fault, of course, he is just pasting a chart from the application. I do see something interesting: since he spends “the vast majority of [his] computing time” using Firefox (let’s say 80%) he’s using his computer around 4.5 hours a day only. That’s nice… :)

If I wanted to visually track my time online (I should…) these are some of the options I’d like to have:

  • Color-code work/non-work related categories;
  • Label the x axis with dates, not “days ago”;
  • Remove non-working days;
  • Use small-multiples to track each category;
  • Use weeks instead of days;
  • Annotate outliers;
  • Show planned vs. actual time spending;
  • Minimize the “undefined” category;
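Two of the options above (weeks instead of days, no non-working days) are plain data preparation, so they can be sketched without any charting tool. This is a rough illustration under my own assumptions: I have not checked what PageAddict can export, so the `(date, category, minutes)` log format, the function name `weekly_minutes` and the sample entries are hypothetical, and “non-working days” is taken to mean Saturday and Sunday.

```python
from collections import defaultdict
from datetime import date


def weekly_minutes(daily_log):
    """daily_log: iterable of (date, category, minutes) tuples.
    Returns {(iso_year, iso_week): {category: minutes}}, skipping
    Saturdays and Sundays."""
    totals = defaultdict(lambda: defaultdict(int))
    for day, category, minutes in daily_log:
        if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            continue
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week)][category] += minutes
    return {week: dict(cats) for week, cats in totals.items()}


# Made-up log entries.
log = [
    (date(2024, 1, 1), "work", 60),    # Monday
    (date(2024, 1, 2), "work", 30),    # Tuesday
    (date(2024, 1, 6), "social", 45),  # Saturday, dropped
    (date(2024, 1, 8), "social", 20),  # Monday of the following week
]
print(weekly_minutes(log))
```

Each category in the result could then feed one panel of a small-multiples display.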

I’m installing the add-on. Hope I can have an interesting dataset to share by the end of October.

Do you use these tools? Do you like their reporting functionalities?
