Posts tagged as:

Stephen Few

Stephen Few left a comment on my post “Is Data Visualization Useful? You’ll Have to Prove it”. We all have much to learn from Steve, so instead of leaving the discussion buried in an old post, I thought it would be interesting to make it more visible. Please read the comment, then come here and join the discussion. Here is my answer.

Steve, sorry if I sound provocative, that’s not my intention. You are the leading expert in data visualization for business, and you are doing remarkable work with your books, with your blog, with your forum, with your patience in answering posts like mine. I have to be thankful for that. And I do agree with 95% of what you write. But you don’t want to be surrounded by people who fully agree with you, do you?

The Effectiveness of Data Visualization

You say “the effectiveness of data visualization is well established by a large body of empirical evidence”. I want to believe that too. However in this study Jarvenpaa writes:

“Graphical charts are generally thought to be a superior reporting technique compared to more traditional tabular representations in organizational decision making. The experimental literature, however, demonstrates only partial support for this hypothesis.”

And J.-A. Mayer adds:

“This study refutes the general superiority of visual information in improving the decision quality (‘naive superiority hypothesis’). The choice and design of visual presentation is determined by information structure, decision environment, the decision-maker and the task decision. (…) The successful use of visual information depends substantially on its acceptance by the manager and the environment.”

What do these authors tell us? First, we cannot be 100% sure about the effectiveness of data visualization. Second, there are many other variables at play. And third, managers must accept it. This is a critical factor. Managers love impression management, and making a good impression using the dreaded “professional-looking charts” is the path of least resistance.

Data Visualization Success Stories

I have no doubts that you could share with us many success stories. When I write about an “admission of impotence” I am not questioning your ability to create/lead/mentor successful data visualization projects. But if you want to use those projects to inspire the average person I think you’ll fail most of the time, unfortunately.

Let me tell you what the layman looks like in my part of the world. He makes charts like this:

He believes that a 3D pie chart “looks more precise” and he doesn’t know that Excel chart defaults can be changed (more advanced laymen are able to switch to more “impactful” colors like reds, yellows and bright greens). In my part of the world, a layman doesn’t even know what “data visualization” is about (and he doesn’t even care). (Here are some more profiles.)

If you are preaching to the choir your conversion rate may be high. But the layman is not easily impressed. You must convert one at a time, and that’s something many of us can’t afford. Can you? He’ll keep making those pie charts because that’s what his manager requires him to do, because he doesn’t know better, because he’s lazy, or because you fail to convince him of a causal link between better charts and better results.

The Layman Must Like Your Charts

In a business environment, charts don’t have to be memorable, only results do. But if you want to change behaviors, your audience must like the new behavior and accept the unavoidable pain. Likable charts help conversion.

You say “I do not discount people’s emotions”. I don’t see it, I’m sorry. The way I see it, you sacrifice everything to the altar of “chart effectiveness”. I don’t find a single one of your charts where the use of color is not purely functional. You say “you should support your claim with concrete examples”. I do have lots of examples: all your charts!

Let me reemphasize this: I agree with you. Chart effectiveness is what we should aim at. But I’m part of the choir. I’m not the layman. I don’t use pie charts.

Pie Charts Again

Unlike most people, I don’t think pie chart addiction is a disease. It is a symptom of a much more serious problem: low numeracy and poor data management skills. Address this problem and pie charts will virtually disappear.

How do you address this problem? “I don’t use pie charts, and I strongly recommend that you abandon them as well.” Researchers like Ian Spence and Stephen Kosslyn don’t think pie charts are as bad as you paint them. Even if they are, it’s very hard to talk people out of an addiction with purely rational arguments.

Perhaps this is my European soul speaking, but I do prefer a gradual approach (“this is acceptable, for the time being”) whereby people (hopefully) start to develop a sensibility to the perceptual issues.

By the way, how come we keep telling people that charts are about trends and patterns, not about the precise figures and then we argue that pie charts are bad because we can’t tell the difference between a 13% slice and a 14% slice? It doesn’t make sense (I’m exaggerating).

We must find more compelling arguments. I dislike pie charts simply because they are a waste of space (low data density) and can only answer very basic questions, better answered using a table. These arguments are good enough for me. I don’t care if we humans are bad at calculating areas and angles. That’s an academic argument that is irrelevant in the real world (I’m being provocative now…).

To Sum Up

You have a very consistent approach to data visualization and you practice what you preach. You believe that you can convince people using rational arguments.

Mine is a much more comfortable place. I know that eye-candy is a can of worms that shouldn’t be opened. I know that we should protect the layman from himself. I know that simple rules with no exceptions work better than complex rules no one bothers to learn or understand.

But I like the gray areas. I like to protect the poor and the oppressed pies and I try to find their small role in the world of data visualization. The same with eye-candy. The same with emotions. The right amount can get your foot in the door. What is “the right amount”? I don’t know. I’m still searching.

{ 9 comments }

Great data visualization is hard to measure: you can’t prove you have a good chart. Unless you can convince your employer to deploy at least two different formats/layouts and are able to compare results, you can say “this is a good chart” but that’s an act of faith, not an act of science.

It’s True Because It Rhymes

Information visualization experts like to evaluate a chart based on its compliance with some more or less accepted standards (Tufte’s data-ink ratio, for example). That’s like saying “it must be true because it rhymes”: the truth is defined by the language itself, not by the real world. Now, please close the curtains of our ivory tower…

I know, it’s not easy to assess the efficiency and effectiveness of good displays. They look natural and obvious, undeserving of praise and, probably, boring and uninspiring. Compare these charts:

Bubble charts

This is a true story: users wanted to evaluate sales territories, one at a time. Color-coding each bubble (Example A) was pointless, while Example B provided context without distractions. Guess what chart they would choose if they were allowed to… (happy ending: they reluctantly accepted Example B). (A word of advice: if you are looking for a promotion, a kindergarten chart variety always outperforms a “serious” chart.)

If your chart is doing a good job of helping people, no one will actually be aware of the chart’s role in making sense of the data. That’s why it is so hard to find good examples of data visualization using standard charts. If people actually like them, they like them because of their usability and/or interactive features.

When Stephen Few asks readers for “true stories about the benefits of data visualization”, that’s almost an admission of impotence. He should have hundreds if not thousands of good examples to share with us, right? Well, I know there are many examples out there, but I can give you none, sorry. Is data visualization some kind of astrology? I know it works. Why? Because I have faith. (On second thought, he is not asking for good data visualization examples. It really doesn’t matter if you use Tableau or Xcelsius, and that’s a relief.)

Opening Pandora’s Box

Ultimately, what makes a good chart is how it resonates with your audience. Assuming that you are not unethically distorting the data, a chart that forces people to act is better than another one that only makes people aware of the subject.

If a single chart can save the world, it will not be a chart that is 100% compliant with Few or Tufte. It will be a glossy Xcelsius pie chart.

(Wow, that’s depressing…)

If you read this blog that’s a clear sign of intelligence and sophistication :) . Unfortunately, you are not representative of the typical data visualization user and/or producer. The real world loves pie charts and doesn’t understand scatter plots.

Here is my Pandora’s box: give the audience what it expects and understands, even if that hurts your data visualization soul (OK, give it 90% of what it expects and use the remaining 10% to educate it).

Cultural Relativism? Not So Fast.

Please don’t misrepresent these arguments. I’m not saying that all charts are born equal. There is a reference point and some misconceptions should be avoided. A chart that maximizes insights, removes clutter, uses color wisely and clearly shows the patterns hidden in vast amounts of data, that’s probably a good chart and that’s what you should aim for. And yes, you should avoid pie charts.

If you present some sophisticated charts to your unsophisticated audience you’ll lose it. Relax. Draw a line but don’t forget the candies. You can take a horse to the water, but you can’t make him drink, unless you give him some sugar cubes…

{ 13 comments }

I have a confession to make: my past is paved with chart-making sins, including some capital ones (yes, 3D pie charts, too). But years ago I saw the light in Edward Tufte’s The Visual Display of Quantitative Information and since then I’ve been avoiding eye-candy temptations. Now I do my best to pursue the path of data visualization virtue.

Every God Has His Moses: Edward Tufte and Stephen Few

Some time after that first revelation, I stumbled on Stephen Few’s Show Me the Numbers and I thought: “wow, Tufte for business!”. As a father of twins, I know that good things come in pairs, and now I had two great role models to help my recent conversion.

Or should I say one and a half?

Edward Tufte and Stephen Few are often cited together, as if they were a single entity. For many of us, simple mortals, Stephen Few is some kind of translator of God’s voice. Given Few’s background, that wouldn’t be completely inappropriate…

For some time that’s how I looked at Few’s work on charts and data visualization. But I was wrong. They do share similar views about basic data visualization principles. And they seem to share the same level of stubbornness, too. But there is a major difference.

Tufte, the Artist vs. Few, the Engineer

Tufte is an artist. His data visualization principles derive from Ludwig Mies van der Rohe’s minimalism, and in that sense, he approaches charts from an aesthetic point of view. His charts are as beautiful as a chart can be, if you happen to like the aesthetic minimalism.

I don’t know how and when Few became aware of the need for better data visualization. But he embraced Tufte’s principles not because he is an aesthete like Tufte, but because he values efficiency and those principles happen to improve it.

Stephen Few would never title a book “Beautiful Evidence”. He doesn’t mind using Excel to create his chart examples, while Tufte needs full control of details like kerning (and he uses a designer’s tool, Adobe Illustrator).

On the other hand, Tufte would never write a book about dashboards (Beautiful Dashboards? brrrr…). From an actionable, business visualization point of view, Tufte is The Visual Display… Almost everything else is beautiful, yes, and perfect for the coffee table.

And while Tufte escaped Flatland for good, Few still keeps both feet firmly on the ground, discussing BI tools, pie charts or irregular time series (and I don’t think his new book changes that).

The Need for a New Business Visualization Model: the Emotional Link

Both approaches are very consistent and they give you a set of guidelines that you can apply to all your charts and adopt as a general framework.

What I am not comfortable with is their positivist attitude, especially in Few. Because Tufte’s charts are aesthetically pleasing, we can derive some emotion from that. In Few’s case, his charts are purely functional.

I still don’t know where to draw the line between purely rational/functional visualizations and the eye-candy. Let’s see this pattern:

Boy meets girl, boy gets girl, boy loses girl, boy gets girl back.

Do you feel emotionally overwhelmed? No? Do you even care about the story? Do you even care about the boy and the girl? Let’s try again:

John fell in love with Anna the moment she spilled coffee on his shirt.

This sounds much more interesting. Add three more sentences and you’ll complete the boy-meets-girl pattern. Both versions share the same pattern, but the second one adds some (perhaps irrelevant) detail and creates an emotional link between the audience and the characters.

You need that in data visualization, too. You don’t have to cry because your chart shows a market share drop in Alaska, but you must connect with the reality behind the chart and the data.

The Need for a New Business Visualization Model: Interaction

Jacques Bertin says that knowledge is built by the user when interacting with the chart. Why interaction (and animation) is absent from Tufte’s and Few’s books is something I don’t really understand.

Although I respect Tufte and Few, I feel that there are pieces missing in their theories. We can borrow some pieces from Bertin’s work (and Tukey’s?) and that will surely help, but the real issue here is to find the balance between the need to correctly (bureaucratically?) display the data and the emotional response that helps to keep the audience interested.

Back to you, a very simple question: what are Tufte and/or Few missing? What pieces do we need for 21st-century visualization?

Photo credits: ~L. and David Zellaby.

{ 19 comments }

If you are a market researcher, and you want to make sure that you get more reliable results for a subgroup in a survey, what do you do? You must increase the overall sample size (and spend a lot of money), right?

Actually, you don’t.

You can oversample that group only, and then weight it down to its known proportion in the population. For example, you may want to increase the number of managers and decrease the number of housewives (because the former are usually more heterogeneous than the latter). Oversampling is a common research method, and a very cost-effective way to get precise estimates for a subgroup.

This is a real-world solution, and if we have finite resources to solve a real-world problem, resource allocation must be part of the equation. Higher variability usually demands more resources.

Why is this relevant in a blog about charts and information visualization? Glad you asked.
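The weighting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population shares, sample sizes and subgroup means are made-up numbers, and the function names are hypothetical, not taken from any survey package.

```python
# Hypothetical survey: managers are deliberately oversampled
# (30% of the sample versus an assumed 10% of the population).
population_share = {"managers": 0.10, "housewives": 0.30, "others": 0.60}
sample_size = {"managers": 300, "housewives": 100, "others": 600}
sample_mean = {"managers": 7.0, "housewives": 4.0, "others": 5.0}


def design_weights(population_share, sample_size):
    """Weight per respondent: population share divided by sample share."""
    total = sum(sample_size.values())
    return {
        group: population_share[group] / (sample_size[group] / total)
        for group in sample_size
    }


def unweighted_mean(sample_mean, sample_size):
    """Naive estimate that ignores the oversampling."""
    total = sum(sample_size.values())
    return sum(sample_mean[g] * sample_size[g] for g in sample_mean) / total


def weighted_mean(sample_mean, sample_size, weights):
    """Estimate with each group weighted back to its population share."""
    numerator = sum(sample_mean[g] * sample_size[g] * weights[g] for g in sample_mean)
    denominator = sum(sample_size[g] * weights[g] for g in sample_mean)
    return numerator / denominator


weights = design_weights(population_share, sample_size)
naive = unweighted_mean(sample_mean, sample_size)
adjusted = weighted_mean(sample_mean, sample_size, weights)
print(f"weights: {weights}")
print(f"unweighted mean: {naive:.2f}, weighted mean: {adjusted:.2f}")
```

In this made-up case the oversampled managers pull the naive estimate upwards, while the weighted estimate restores each group to its assumed share of the population. The subgroup estimate for managers still benefits from the 300 interviews.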

The Great Irregular Interval Debate

Let me give you an example. A while back, Jon Peltier wrote in his blog:

I don’t understand the obsession with an equal date interval. A line chart need not show the trend of only evenly-spaced data. Suppose I am observing temperatures, and I decide for simplicity that where the temperature hasn’t changed, or where it has been changing steadily, I do not need to record every value. Overnight after the temperature has dropped, I can characterize my temperature profile with one point per hour. As the sun rises, I may need more frequent recordings to capture the morning warm up. Then the clouds blow over, it starts to rain, then it clears up again; I may need minute-by-minute data points to track this. When I make my plot, is it any less relevant because the spacing of the data ranges from minutes to hours?

This is oversampling, and a wise resource allocation, too. In a survey, you weight the subgroup down to its right proportion, and that’s also what you do in a chart, when irregular date intervals are displayed proportionally.

Stephen Few disagrees:

Using a line to connect values along unequal intervals of time or to connect intervals that are not adjacent in time is misleading.

Furthermore:

How could we trust graphical representations of time series or frequency distributions if their shapes could have been altered by inconsistently manipulating the sizes of intervals along the scale, either arbitrarily or intentionally to deceive? We can derive meaning from patterns and trends that these graphs display only if the intervals are consistent.

He exemplifies his argument with these two charts (actually, there are three, but we can safely disregard the third one).

The first chart displays the correct annual sales. The second one displays arbitrarily grouped annual sales and, obviously, its pattern is quite different.

Now, the second chart is plain wrong, so I am not sure if you can use it to argue against unequal intervals.

corrected-line-chart

Let’s use a fairer example with the same dataset and the same arbitrary grouping.

Compare the orange line with Few’s first chart. I actually don’t see much difference. Sure you lose a lot of detail, but the basic pattern is there. Instead of sums, I am using averages (you can’t compare a single year with the total sales of three or four years).
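To see why averages rather than sums are the comparable quantity, here is a minimal Python sketch. The annual figures and the grouping are invented for illustration; they are not Few’s dataset.

```python
# Hypothetical annual sales (made-up values, not Few's data).
annual_sales = {
    2000: 10, 2001: 12, 2002: 14, 2003: 13,
    2004: 15, 2005: 18, 2006: 20, 2007: 19,
}

# Arbitrary, unequal grouping of years (inclusive ranges).
groups = [(2000, 2000), (2001, 2003), (2004, 2007)]


def summarize(annual_sales, groups):
    """Return the sum and the per-year average for each group of years."""
    rows = []
    for start, end in groups:
        values = [annual_sales[year] for year in range(start, end + 1)]
        total = sum(values)
        rows.append({
            "start": start,
            "end": end,
            "years": len(values),
            "sum": total,
            "average": total / len(values),
        })
    return rows


summary = summarize(annual_sales, groups)
for row in summary:
    print(row)
```

With these numbers the sums jump from 10 to 39 to 72 mostly because the groups get longer, while the per-year averages (10, 13, 18) stay on the same scale as a single year.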

The other two lines show the difference between equal and unequal intervals. The brown line displays the data points unequally spaced while the gray one uses equal intervals (Few’s second chart). I had to make some assumptions regarding the reference date, so this is not the best example, but it is good enough to show the potential risk of spacing data points equally when they cover unequal intervals of time.
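The effect of spacing can also be checked numerically. The sketch below uses invented values and, as an assumption of mine, places each grouped point at the midpoint of its interval (a different reference date would shift the numbers). It compares the slopes of the line segments under equal spacing and under time-proportional spacing.

```python
# Hypothetical grouped averages (made-up values); each covers an inclusive range of years.
points = [((2000, 2000), 10.0), ((2001, 2003), 13.0), ((2004, 2007), 18.0)]


def x_positions(points, proportional=True):
    """Midpoint of each interval (an assumed reference date), or a plain index."""
    if proportional:
        return [(start + end) / 2 for (start, end), _ in points]
    return [float(i) for i in range(len(points))]


def segment_slopes(xs, ys):
    """Slope of each line segment joining consecutive points."""
    pairs = list(zip(xs, ys))
    return [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in zip(pairs, pairs[1:])]


ys = [value for _, value in points]
equal_slopes = segment_slopes(x_positions(points, proportional=False), ys)
proportional_slopes = segment_slopes(x_positions(points, proportional=True), ys)
print("equal spacing:", equal_slopes)
print("proportional spacing:", proportional_slopes)
```

With equal spacing the second segment looks much steeper than the first, which suggests accelerating growth. With proportional spacing the two slopes are nearly the same, so the apparent acceleration comes from the axis, not from the data.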

Bottom line, oversampling is a useful method for better resource allocation. We can view irregular time series as some sort of oversampling, provided there are no missing values and irregular intervals in the chart are consistent with intervals in the time series.

Grouping data points is always a tricky issue, and Stephen Few shows it clearly, but we shouldn’t infer that “line graphs and irregular intervals is an incompatible partnership.”

(When using time series in Excel, make sure that category axis labels are recognized as dates. Alternatively, use a scatter plot with connected data points.)

{ 8 comments }

Stephen Few shares with us his capstone presentation that he delivered at InfoVis 2007. If you follow his newsletter or his blog (you should) there is nothing really new but, if you don’t, this is a good summary of his views regarding information visualization.

I’d like to comment on a few points.

Knowing how to use Excel or some other software that can be used to analyze data is not the same as knowing how to make sense of data. (p. 9)

I strongly agree with this. It is not about charts (it never is): it is about data analysis and communication/presentation skills. If you don’t know how to do something in Excel, just ask. But you must know what to do with the data. This means higher literacy (numeracy/graphicacy), something that you shouldn’t expect to get from the (mass market) software industry. Some months ago I contributed to a discussion about this problem in Few’s discussion board.

Despite the primitive nature of Excel’s visual analysis and charting functionality, it is used more than any other product to make sense of data and, in combination with PowerPoint, to present data to others. Almost everyone who takes my table and graph design course wants to know, more than anything else, how to apply the data presentation principles that I teach to Excel. (p. 37)

Excel is the de facto standard in information visualization for the masses. We should start from here, showing how to apply general principles and best practices in Excel, because only a very small fraction of office users will ever be able to use another, more sophisticated, charting tool. They need to see how these principles impact their lives. They want to sit at their desks and apply these principles immediately.

I’ve found consistently in my work that, when people are shown effective alternatives to the bad visualizations that are common and familiar, they easily recognize the difference. We need to combat the bad visualizations that dominate the market by exposing people to visualizations that really work. (p. 42)

This is the only way. Principles and best practices don’t matter if people can’t see how/why they work. If someone tells me how much “prettier” my charts are (than the standard charts they see…) it means that I failed somewhere and I have to start all over again. Charts, and data analysis in general, are about insights, about return on investment, about more bang for the buck. Beauty is a by-product of function.

Not all displays, however, require the high resolution of the printed page. And something you can’t do with the printed page is interact with the data, which is critical to data exploration and analysis. (p. 61)

Jacques Bertin wrote something like this 40 years ago. A chart must allow some level of interaction because knowledge is constructed by the chart user, not by the chart designer.

Tufte has locked himself out of much of the fine work that has been done in our field because of his uncompromising prejudices, which has caused his relevance to our work to decline. (p. 61)

I understand this, but probably his views still hold value because of his “uncompromising prejudices” and, let me tell you, I like them. Tufte is the best starting point to understand information display. At some point you must leave him because he has no answers to our real-world questions, but there are some good sources around to fill in the gaps. What I really, really don’t like is the “Tufte would be proud” / “what would Tufte say” syndrome.

There are some points that I view differently. I wouldn’t remove pie charts from our tool set. I think they have a minor role in information visualization, they are more a “design device” than a real chart but I like to use them in very specific situations. Also, I am not sure if B&W dashboards can be an alternative to all those bad examples that Few loves to hate. I believe we can have a functional use of color that can be attractive to the non-initiated (we’ll discuss this one of these days).

We need a coherent and real world problem solving perspective of information visualization in the corporate sector. Do you think Few has the answer?

{ 0 comments }

Well, I must say I am a bit disappointed with the September issue of Stephen Few’s Visual Business Intelligence Newsletter. It discusses an important but much neglected topic, visualizing change through animation. Few’s paper was written for SAS Institute, and uses JMP, a statistical analysis product from them. From the screenshots, I wouldn’t say I am overly impressed. Animation requires interaction, and the only available interaction is the same that you get from any player (go, stop, step…). There seems to be no interaction with the data. [Update: this post discusses the paper, not the software, but you can take a look at the online demos here.]

Actually, Few discusses the patterns of change through time and how magnitude, shape, velocity and direction contribute to those patterns and how that can be displayed in traditional charts like line charts, but if you read it carefully there is not much discussion of animation itself. I guess he doesn’t really like to see “the state of Florida bounc[ing] around the bubble plot as time passes”. And who can blame him? The “trails” feature gives him the opportunity to return to a more stable ground in the form of “a single image”. It plots the entire time series in a single image, like the traditional line charts do, using a sequential range of color where lighter shades refer to the beginning of the series. I would say that Few tested the waters of animation and was not convinced.

I exemplified animation in a previous post on visualization of demographic information. As I wrote in that post, animation is useful only if a global pattern and perhaps sub-patterns emerge from the trend of multiple data points/series, a pattern that would be harder to spot if we had to browse over multiple charts. A global trend and some meaningful outliers can be seen for example in Hans Rosling’s presentation. From my point of view, it doesn’t really matter if “the state of Florida bounce[s] around the bubble plot as time passes”. What matters is: how does Florida contribute to the overall pattern (assuming there is one)? Is it a well-behaved state? Is it an outlier? We don’t need animation to see how Florida changes, just like we don’t need a static chart to plot a single data point.

Let me quote the last paragraph:

“The stories that time-series data have to tell are often rich and important. They are much too important to remain unknown simply because we lack tools that can bring them to light.”

We do lack interactive tools. But for simple animations we can use Excel, as you can see in this draft example, and I am sure the Excel virtuoso Jon Peltier could come up with a great add-in…

{ 5 comments }