August 13, 2016 by jerome on d3, data visualization, javascript, react, tips

Beyond rendering

This is the 5th post in my Visualization with React series. Previous post: React components

The lifecycle functions

I’m not going to go into great details on this, but a talk on React without mentioning the lifecycle functions would not be complete.
React components come with several functions which are fired when certain events occur, such as when the component is first created (‘mounts’), when it’s updated or when it’s removed (‘unmounts’).
Pure functional components, which we’ve been mostly using, don’t have lifecycle functions.
But components with a state can have them.

Some examples of usage of those lifecycle functions include:

Before the component is rendered, you can load data. that’s a job for ‘componentWillMount’.
After a component is rendered, you can animate it, or add an event listener. Use ‘componentDidMount’.
Prevent a component from rendering under certain circumstances, even if it receives new properties or its state changes. Use ‘shouldComponentUpdate’.
After a component receives new props or new state, you can trigger another function before the component updates (‘componentWillUpdate’) or right after (‘componentDidUpdate’).
When a component is going to be removed, you can do some cleanups, like deleting event listeners. Use ‘componentWillUnmount’.

Oftentimes, you can simply get by by using the default behavior of React components, which re-render only when they receive different properties or when their state changes. But it can be really convenient to have that extra degree of control.

Here is an example of using these lifecycle functions in context.

See the Pen lifecycle functions example by Jerome Cukier (@jckr) on CodePen.

What is going on there?
We’re adapting our earlier scatterplot example, only this time, we are not using pure functional components (which don’t have those lifecycle functions), but creating classes.
We’re going to have three classes: Chart, at the highest level; Scatterplot, a child of Chart; and Points.
Chart passes data to Scatterplot. What it passes depends on whether the button is clicked. That button changes the state of Chart (which causes a rerendering of the Scatterplot and the Point elements).
Chart also has a private variable that holds a message we can display on top. We can still use callback functions to change this variable, just like we change the state, but the difference between changing the state and changing a private variable is that changing a private variable doesn’t cause the children to re-render.
When we first create the scatterplot element, the componentDidMount function is called, and the message is changed to reflect that.
Then, each time we click the button, a different data property is passed to the Scatterplot element. Also, the componentDidUpdate method is triggered, which changes the message.
(changing the state of the parent from a componentDidUpdate method can cause an endless re-rendering loop, this is why I used private variables instead of the state, and there are ways to address this but for the sake of brevity this is the easiest way to deal with that problem).
Now, when a full dataset is passed to the Scatterplot element, many Point elements will be created. I’ve also added a lifecycle method to these Points: when they are first created, they receive a small animation. To that end, I have also used the componentDidMount method, but this time at the Point level.
Exit animations are also possible, but – full disclosure – they are less easy to implement in React than entry animations, or than in D3. So again in the interest of concision I’ll skip these for now.

React and D3

We just saw that with React, we can create a DOM element, then immediately after, call a function to do whatever we want, such as manipulating that element. That function would have access to all the properties and state of that React element.
So what prevents us from combining React and D3? Nothing!
We can create components that are, essentially, an SVG element, then use componentDidMount to perform D3 magic on that element.
Here’s an example:

See the Pen mixing react and d3 by Jerome Cukier (@jckr) on CodePen.

In that example, I have used a bona fide bl.ocks (https://bl.ocks.org/mbostock/7881887) and wrapped it inside a React component. So I can create one, or in the case of that example, several such elements by just passing properties. Those components can perfectly function as black boxes: we give them properties, they give us visualizations that correspond to these parameters. And it doesn’t have to be D3 – once a React element has been created, we can use componentDidMount to do all kinds of operations on it.

In the last 2 articles I will present actual data visualization web apps made with React.
In the next post, we’ll see how to set up a simple web app and we’ll create our first example app.

August 10, 2016 by jerome on d3, data visualization, javascript, react, tips

Coding with React

This article is part of my series Visualization with React. Previous article: An ES6 primer

Setting things up in Codepen

In the last two articles of the series, I’ll cover how to get a real React coding environment going, but to start dabbling, playgrounds like jsFiddle or codepen are great. I’m partial to codepen. When you create a new pen, you still have a couple of options to set up before you can start creating React code:

In the Settings / Javascript / quick-add section (the drop-down at the bottom left) please choose React, then React DOM.

All of the code examples of articles 1, 3, 4, and 5 can be found in this codepen collection.

Creating elements

See the Pen React simple scatterplot by Jerome Cukier (@jckr) on CodePen.

In this first React example, we’re going to create a couple of very simple elements and render them. Let’s start by the end: ReactDOM.render(myDiv, document.querySelector('#root')); We use ReactDOM.render to, well, render something we have created (myDiv) somewhere in our document (on top of what corresponds to the ‘#root’ selection. We conveniently have a div with the id “root” in the HTML part of the pen). That’s it! we’ve output something using React. While the syntax can appear a bit daunting, it really does one simple thing: take what you’ve made and put it where it should be. But what’s that myDiv? To find out, let’s look at the first 2 lines of our code. const mySpan = React.createElement('span', {style: {fontFamily: 'sans-serif'}}, 'hello React world'); const myDiv = React.createElement('div', {className: 'my-div'}, mySpan); Oh, so before there was a myDiv, there was a mySpan. MySpan is a React element, the building brick of the React eco system. To create it, we use React.createElement which is the workhorse of React. React.createElement takes three arguments: the type of React element we are creating, its properties, and its content. The type of element can be any HTML or SVG element, and we’ll see later that we can also make our own. The second argument is the properties. It’s an object. In the d3 world, the properties could be what goes in the attr method. So when you create an SVG element like a rect, its properties could include things like x, y, width and height. In d3, style is treated slightly differently. This is also the case in React. When using React.createElement with an HTML or SVG element, that could be styled using CSS, you can use a style property to pass a style object. That style object contains all CSS properties you want to apply to the object, but instead of hyphenating them, they are written in camel case (so font-family, for instance, becomes fontFamily). The third argument is content: it can either be a string, a single React element, or an array of React elements. In the first line (mySpan) we’ve used a string. So, this first line created a React element which is a span, which contains “hello React world”, and which has a simple style applied to it. In the second line, we create a second React element. Again, React.createElement takes three arguments: type of element (now it’s an HTML div), properties and content. Instead of providing a string, we can pass another React element, such as mySpan that we created above. And that’s it! we’ve rendered something using react.

Creating elements from data

In the example above, we’ve used React.createElement with two kind of content: a string and another React element. But I mentioned that there was a third possibility: an array of React elements. If you’re familiar with d3, you might think: in d3, I could do that from an array of data. The idea in React is pretty similar, only, instead of using select / selectAll / data / enter / append, we can just create our array.

See the Pen React simple scatterplot by Jerome Cukier (@jckr) on CodePen.

One simple way to do that is to use map: myArray.map(d => React.createElement(...)). This is exactly what we are doing in the snippet of code above. This uses the birthdeathrates dataset, one of the example files provided with R (number of births and deaths per thousand people per year in various countries.) The interestingness happens here:

birthdeathrates.map(
  (d, i) => React.createElement('div', {
    'key': i,
    'style': {
       background: '#222',
       borderRadius: 5,
       height: 10,
       left: 5 * d.birth,
       top: 300 - 5 * d.death,
       position: 'absolute',
       width: 10,
       opacity: .2}
  })
)

In the properties that I pass, some depend on the underlying data. This is a mapping so it’s going to return something for each item of the array. Each item of the array is represented by d, and has the birth, death and country properties. In left and top, we use a calculation based on these properties. And for each item, we get a React element created with these calculations. This isn’t unlike what we’d had in d3 if we had written:

 ...
 .selectAll('div')
 .data(birthdeathrates)
 .enter()
 .append('div')
 .style({ ..., left: d => 5 * d.birth, top: d => 300 - 5 * d.death, ...})

Take note of the key property in the react code. This is necessary when you create many elements using map (well not really necessary but strongly recommended, you’d get a warning if you don’t use it). This is used so that if for some reason your parent element has to re-render, each child element will only be re-rendered if needed. If you’ve followed this far, you are now capable of doing things in React the critical part of what you were doing with d3: creating elements out of data. You might wonder: but does it work for svg? yes, and the logic is exactly the same:

See the Pen React simple scatterplot by Jerome Cukier (@jckr) on CodePen.

Introducing JSX

See the Pen React simple scatterplot JSX by Jerome Cukier (@jckr) on CodePen.

At this point you might think: all of this is great, but typing React.createElement all the time is kind of cumbersome. Many people do too, and there are a number of ways to not do that. The most popular, and the one in use at Facebook, is JSX. I personally use r-dom most of the time, but since JSX is definitely the most common way to write React, now that you have a feel for what React.createElement does, it’s not unreasonable to continue with JSX.

The main idea of JSX is that you’ll write tags in your javascript. Instead of writing:

React.createElement('type', {property1: value1, property2: value2, ...}, content)

you would write:

<type property1={value1} property2={value2} ...>
content
</type>

So in our previous example, we create an SVG element that has a width and a height property.
We’d write:

const svg = <svg height={300} width={300}>
//... content ...
</svg>

Inside the opening tag, we list all the properties and we give them a value. This isn’t unlike what you’d see when you watch the source of an HTML file.
The value of the properties go inside curly braces, unless they are a string. For length-type values (ie height, width, top, left, font-size…) if a number is provided, it’s assumed it’s in px.
If the element we create has other elements inside of it, there has to be an opening and a closing tag (ie <element> and </element>). But if there isn’t, there can be a single tag (ie <element />)
Inside our svg, we have a bunch of circle elements. They are of the form:

<circle 
  cx={5 * d.birth}
  cy={300 - 5 * d.death}
  key={i}
  r={5}
  style={{
    fill: '#222',
    opacity: .2
  }}
/>

inside our curly braces, we can have calculations. And since there are no elements inside our circles, we don’t have to have both a <cirlce> and a </circle> tag, although we could; we can have a single <circle /> tag.

That’s it in a nutshell, more details can be found here.
Now you might think: if I type that in my text editor, I’m almost certain I’ll get an error! Indeed, behind the scenes, there has to be some transformation for your browser to understand JSX. This magical operation is called transpilation. Transpilation is just what you’d hope it is, it turns code written how you like into code that the browser can interpret flawlessly. The flip side is that… well, you have to transpile your code.
If you work into environments like codepen or JsFiddle, they can take care of this for you. The most common transpiler is Babel. Babel turns your code into ES2015 compliant javascript (that is, before ES6). So the added bonus is that we can use any ES6 feature without wondering whether the user browser supports it or not. Most ES6 features are supported by the latest versions of many browswers, but there’s no guarantee that your users will run your favorite browser, let alone its latest version.
If you want to do this on your own, you’ll have to set up a build environment, we’ll cover this in the last two articles.

August 9, 2016 by jerome on data visualization, javascript

An ES6 primer

This article is part of my series Visualization with React. Previous article: React and D3 – similarities and differences

In this guide, I will use ES6 notation whenever I would normally do it in my day job. There are many, many differences in ES6 compared to previous versions of javascript and that can be daunting. But in this time and age, a lot of code you can run into is going to be written in ES6, so a cursory knowledge of some of its features can go a long way.
Here are the features I use most often:

Fat arrows functions

ES6 introduces a new, more concise way to write functions. Instead of writing:

function add10(d) {return d + 10;}

one can write:

var add10 = d => d + 10

Technically, those two forms are not completely equivalent. The main difference is if you need to use the context keyword this. But in most cases, fat arrows are a much more concise, and therefore more legible syntax.

To do a function which takes more than 1 argument, wrap them in parentheses like so:

(a, b) => a + b

For a function with several statements, you must use curly braces like a regular function, and use return if needs be:

d => {
  console.log(d);
  return d;
}

For a function that returns an object, wrap that object in parentheses to avoid confusing it with the previous form:

d => ({key: d})

const and let

Before ES6, to declare variables, you could only use the var keyword.

var a;
var b = 10;

In ES6, you can declare variables using const or let. When you use const, the variable cannot be reassigned:

 const a; // error - a must be assigned a value

 const b = 1;
b = 2; // error - b can no longer be reassigned

Variables declared with let can be reassigned.

 let a; // ok
a = 10; // ok
a = 20; // still no error

const and let are more expressive than var, because they specify whether we plan to change the value of a variable or not. They have another advantage: they are scoped. They can be used inside of an if branch, or a for loop, and not be accessible outside of it. Conversely, var are not scoped – they are available everywhere within the function where they are declared, or globally if declared outside of a function.
So, there’s no reason to ever use the var keyword in ES6.

Spread operator

In ES6,if you have an array inside another array, you can replace that array with its items in sequence using the spread operator. An example would make this clearer:

const a = [1, ...[2, 3, 4]]; // [1, 2, 3, 4].

That may seem like not much but it’s super versatile and useful. For instance, instead of concatenating two arrays, you can write:

const a = [1, 2];
const b = [3, 4]
const c = [...a, ...b]; // [1, 2, 3, 4]

Interestingly, it works with empty arrays:

 
const a = [1, 2];
const b = [];
const c = [...a, ...b]; // [1, 2]

But not with primitive literals (numbers, etc)

const a = [1, 2];
const b = 3;
const c = [...a, ...b]; // error

Instead of pushing an item to the end of an array, you can also use the spread operator:

const a = [1, 2];
const b = 3;
const c = [...a, b]; // added benefit - creates a new array instead of changing it in place.

Note the difference between these 2.
[…[1,2], 3] will work;
[…[1, 2], …3] will not.

Composing arrays like this can be conditional!

const a = [1, 2];
const b = 3;
const condition = false;
const c = [...a, ...(condition ? [b] : [])] // [1, 2];

What just happened? just like above, but we applied the spread operator to the expression which is in parentheses. If the condition were true, that expression would be equivalent to [b]. But since it’s false, it is equivalent to an empty array.

An experimental feature of ES7 extends the spread operator to objects.

const obj = {key: 'a'};
const newObj = {...obj, value: 1}; // {key: 'a', value: 1}

Destructuring assignment

You can do more stuff with objects in ES6. Destructuring assignment covers a number of nifty features but here are the two I use the most.

const obj = {key: 'a', value: 123};
// you can read several properties from an object at once and assign them to variables:
const {key, value} = obj;
console.log(key, value); // 'a', 123

Before ES6, you had to write a function like this:

function div(num, den) {return num / den;}
div(6, 2); // 3;
div(2, 6); // 0.333333 order of the parameters matters!
div(3); // NaN - can't forget an argument!

In ES6, we can expect the function as argument to take an object and access its properties:

function div({num, den = 1}) {return num / den;}
div({num: 6, den: 2}); // 3
div({den: 2, num: 6}); // still 3
div({num: 3}); // still 3 - den has a default value

Using an object, we don’t have to remember in what order the function parameters have to be provided.

Wrapping it all together

what’s another form for the last function I wrote?

const div = ({num, den = 1}) => num / den;

In the next article in the series, Coding with React, we will finally start to write some React code.

August 9, 2016 by jerome on d3, data visualization, javascript, react, tips

D3 and React – similarities and differences

Previous article: Visualization with React

d3, meet cousin React

Let’s not sugarcoat it: working with React is more complicated than with plain vanilla javascript. There are a few additional notions which may sound complex. But the good news is that if you come from the d3 world, there are many concepts which are similar. Visualization on the web is, at its core, transforming a dataset into a representation. That dataset is probably going to be an array, and at some point the framework will loop through the array and create something for each element of the array. For instance, if you want to make a bar chart, you probably will have at some point an array with one item that corresponds to each bar, which has a value that corresponds to its length. Looping through the array, each item is turned into a rectangle whose height depends on that value. That logic, which is at the core of d3 with selections, is also present in React. In d3, visualizations are essentially hierarchical. We start from an SVG element (probably), add to it elements like groups, which can hold other elements. To continue with our bar chart example, our chart will have several data marks attached to it (the bars), and there’s a hierarchical relation between them: the chart contains those marks. Our chart could be part of a dashboard that has several charts, or part of a web page that has other information. At any rate, in the d3 world, everything we create is added to a root element or to elements we create and arranged in a tree-like hierarchy. This is also the case in React, with a difference: the world of React is mostly structured into components. When that of d3 is mostly made of DOM elements like HTML or SVG elements, that of React is made of components. A React component is a part of your end result. It can be as small as a DOM element, or as big as the whole application. And it can have component children. So, components are logical ways to package your application. Components are created with modularity and reusability in mind. For instance, I can create a bar chart component in React, and the next time I have a bar chart to make, I can just reuse the exact same component. I can also create components to make axes or the individual bars in the chart, which my bar chart component will use. But the next time I need a bar chart, I wouldn’t have to think about it. In contrast, if I want to create another bar chart in d3, I probably have to recreate it from scratch and decide manually what happens to all the SVG or HTML elements that constitute the bar chart.

You say dAYta, I say da-tah

One fundamental way in which d3 and React differ is by how they treat data. If you read about React, you may find fancy terms like one-way data binding. In how many ways the data is bound in d3 is not something which is on top of our head when we create visualization so that might sound confusing. In d3, an element can modify the data which is associated to it. I’ve used it so many times.

See the Pen React simple scatterplot by Jerome Cukier (@jckr) on CodePen.

These bars have been made with a very simple dataset shown below them. If you click on the bars, the underlying data will change and the representation will change as well. In the world of React, when properties are passed to a component, they cannot be changed. That’s what one-way binding means. Data in React flows unidirectionally. Once it’s defined, it determines the way all components should be displayed. In certain events, a parent component can pass different properties to its children component and they may be re-rendered (we’ll see this in just a moment). But the important part is that properties passed to a component can never be changed. You may wonder: why is that even a good thing? React makes it impossible to do things! yes, but in the process it also makes things much safer. If properties can’t be changed, it also means that they can’t be altered by something we hadn’t thought of. Also, it makes it much easier to work on components independently: one team mate could make a bar chart component while her colleague makes a line chart. They don’t really need to know the inner working of the other component, and they know that nothing unexpected will disturb the way their component function.

The state of affairs

Components of React have a really powerful feature which doesn’t explicitly exist in d3: the state. The state represents the current status of a component. Here’s a simple switch component:

See the Pen React simple scatterplot by Jerome Cukier (@jckr) on CodePen.

We want to record the fact that the switch can be turned on or off. Here’s how it should work: First we create our switch component, and it’s supposed to be off. Then, if the user clicks it, it becomes on if it were off and vice-versa. And it should also be redrawn, to reflect that it’s been turned on (or off). The state does all that. It records the current status of the component. Technically, it’s an object with properties, so it can save a lot of information, not just a simple true or false value. The other great thing about the state is that if it changes, then the component is re-rendered. Rendering a component could require rendering children component, and they may be re-rendered as well. So with the state and properties, we have a powerful framework to handle events and whatever may happen to our app. If there’s an event we care about, it can change the state of a component, then, all of the children components may be re-rendered. And there’s no other way to redraw them. This is in contrast with plain d3 where anything goes – anything can change anything, data can be used to render elements, the underlying data can be changed without changing the element, the element can be transformed without changing the underlying data.

State is an important notion in computer science. In d3, it’s just doesn’t have a more formal representation than all the variables in your code.

In the next article in the series, an ES6 Primer, we will look in some useful ES6 features for React.

August 9, 2016 by jerome on d3, data visualization, javascript, react, tips

Visualization with React

Some back story…

In what seemed a lifetime ago now, I wrote tutorials to help people get started with Protovis as I learned it myself (it is indeed a lifetime away since Protovis is dead and buried for all intents and purposes). And a few years ago, I did the same for d3 as this started to become the most powerful visualization framework and for which documentation was still scarce. I wrote these tutorials to help me learn, but since then I have met many people who found them useful which blew my mind. By documenting my learning, I got noticed by Facebook and travelled 9000 miles to a new life.

At Facebook, I had some loosely-defined data visualization explorer role. While I joined during the infancy of React, I didn’t feel super comfortable using it then – I felt more effective writing one-off d3 applications. Eventually, I moved on and am now working at Uber as a fully-fledged data visualization engineer. I work almost exclusively with dashboards, which have a pretty elaborate UI.

Like in many other companies, at Uber, we use React for our web applications, including our visualizations, dashboards and maps. Increasingly, React is becoming the lingua franca of visualization: more than a tool that allows one to draw data, a mindset that informs how one should think a visualization. React is no longer a young library – the initial public release dates back from May 2013, and its very first application at Facebook was visualization (my first Facebook project, pages insights). There’s already many, many learning resources and tutorials for React. What I’ll try to do here is to show how React can be used for visualization: hopefully, this will be useful both for people who come from d3 and who’ve never worked with a web framework before, and for people who are familiar with React but who don’t know visualization well. That won’t be a complete and exhaustive guide, more a way to get started with references on how to go further.

The articles

I’ve structured this guide in 7 parts, and I’ll publish one per day:

React vs D3, where we’ll explore similarities and differences between these two frameworks.
An ES6 primer. I have written all the examples in good, sensible, modern ES6 javascript, because as of 2016 this is probably the most common syntax. Without going too deep into the details, I’ll explain what parts of the language I used for the examples, and how they differ from ES5.
Gentle introduction to coding with React, where we’ll explore the high-level concepts of the framework and see how we can create visual elements from data. We’ll end on a presentation of JSX, a flavor of Javascript used to write React applications, which I’ll also use for most of the examples for the same reasons as ES6 – because it’s the most widespread way of writing React code today.
React components, the most important concept in React and the building blocks of React applications.
Beyond rendering. We’ll look at the React concept of lifecycle methods, and also how we can use d3 within React components.
Creating a React visualization web app – using what we’ve seen, and two libraries – Facebook’s create-react-app and Uber’s React-vis, we’ll create a small standalone React visualization that can be deployed on its own website.
The big leagues – in that last part, we’ll write together a more complex visualization with live data and several components interacting with one another.

The code

The examples of parts 1-5 can be found on codepen. I’ll add link to examples of part 6 and 7 when they go live.

February 7, 2015 by jerome on charts, dashboards, data visualization, tips

Charts, assemble!

From the past posts, you would have gathered that dashboards are tools to solve specific problems. They are also formed from individual charts and data elements.

Selecting information

That dashboards are so specific is great, because the problem that they are designed to solve will help choosing the information that we need and also prioritizing it – two essential tasks in dashboard creation. Again, we don’t want to shove every data point we have.

Another great tool to help us do those two tasks is user research. As a designer, we may think we chose the right metrics, but they have to make sense to real users and resonate with them. The bias that we may have is that we would favor data which is easy to obtain or that makes sense to us, compared to data which can be more elaborate, more sophisticated or more expensive to collect or compute, even if that makes more sense to the user.

Here’s an illustration of that.

When I was working at Facebook on this product, Audience Insights, we designed this page to help marketers understand how a group of users they could be interested in used Facebook. (The link / screenshot showcases fans of the Golden State Warriors). One of the main ways we classified users at Facebook, for internal purposes, is by counting how many days of the last four weeks they have been on Facebook. It’s a metric called L28 and one of the high-level things Facebook knows about everyone. So, we integrated it in the first version of this page. But, even though it’s not a concept unique to Facebook, it wasn’t that useful to our users, and it was taking space from a more relevant indicator.

Instead, we have included indicators which are more relevant to the task at hand (ie getting a sense of who those users are through the prism of their activity). For instance we can see that very few Warriors fans only use Facebook on their computer, compared to the general population of US Facebook users. They tend to skew more towards Android and mobile web (going to www.facebook.com from their phone, versus using an app.) They tend to be more active in terms of likes, comments and shares.

Information hierarchy

Once information is chosen and you get a sense of what is more important than the rest, it’s time to represent that visually.

Here are some of the choices you can make.

Show some metrics on top or bigger than others.

That’s probably the first thing that comes to mind when thinking hierarchy and prioritization. And it needs to be done! Typically, you should get one to three variables that really represent the most important thing you want your users to read or remember. If you come up with more than 3, you should refine your question/task and possibly split it in two.

The rest of the variables will support these very high level metrics. Again, in a typical situation, you could come up with up to three levels of data (with more than three being a good indication to rethink your scope). Some metrics can support the high-level metrics (i.e. show them with a different angle, or explain them) and some metrics could in turn support them.

Present some metrics together.

Stephen Few argues that dashboards should fit on one page or one screen because their virtue is to present information together. With the flexibility offered by the modern web, and the size constraints of mobile, this is a requirement that shouldn’t be absolute. But it’s relevant to remember that some variables add value when seen along other variables. With that in mind, you can have part of your dashboard as a fixed element (always visible on screen) while the rest can scroll away, for instance.

Push some metrics to secondary cards (such mouseovers, pop-ups or drill-down views)

Hierarchizing information is not just about promoting important information. It’s also about demoting information which, while is useful in its own right, doesn’t deserve to steal the show from the higher level metric. The great thing about interactive dashboards is that there are many mechanisms for that. Some information can be kept as “details on demand” and only shown when needed.

Figure out what form to give to the data

So you have data. It probably changes over time, too (and you have that history as well!). And a sense of how important it is.

You can represent it as a static number (and, further, to adjust the precision of that number) or as a time series (i.e. line graph, area graph, bar graph etc.), or both.

The key question to answer is whether the history and how the metric moved over time is relevant and important, versus the latest figures. If you think that the history is always important or that it doesn’t hurt to have it for context anyway, consider that it’s yet another visual element to digest, another thing that can be misinterpreted, and that unless its importance is clearly demonstrated, you’d rather not include it. Yes – even as a tiny sparkline.

Here is another example from my work at Facebook of a page where proper hierarchy has been applied.

Page Insights, to use a parallel with a better known product, is like google analytics, only for Facebook Pages instead of web sites. Unsurprisingly, the metric we put to the top left is the Page Likes, which is the number of people who like a page. The whole point of the system is to let people understand what affects that number and how to grow it. Two other high-level metrics are shown on the same row in the two cards on the right: the Post Reach for the week (number of people who have seen content from this page this week, whether they like the Page or not) and Engagement (number of people who acted on the content – actions could be liking, commenting, sharing, clicking, etc.)

The number of new Page Likes of the past week, which is represented as a both a line chart and a number in the left card, is an example of a level two metric. It supports the top metric – total likes. The number of Page Likes of the past week, which is represented as a line chart only, is a level three metric. It’s here just as a comparison to the number of the current week – here, it helps us figuring out that last week has been a better week.

Connecting the dots

Ultimately, a dashboard is more than a collection of charts. It’s an ensemble: charts and data are meant to be consumed as a whole, with an order and a structure. Charts that belong together should be seen together. The information gained like so will be much more useful than from looking at them in sequence.

Linking, for instance, is the concept of highlighting an element in a given chart with repercussions on other charts, depending on the element highlighted. A common use case is to look at a data for one given time point, and see the value for that time point highlighted in related charts. Here is an example:

In this specific case, the fact that both charts share the same x-axis makes comparing the shape of both charts easier even without linking.

Each variable doesn’t have to be on its own chart. Your variables can have an implicit relation between one another. Bringing them together might make that relation explicit. Here are some interesting relationship between variables or properties of variables that can be made apparent through the right chart choice.

One variable could be always greater than another one, because the second is a subset of the first. Here are some examples:

The number of visits on a website last week will always be greater or equal than the number of unique visitors that week, which will always be greater than the number of visitors last day.
The number of visitors will always be greater to the number of first-time visitors.
The cumulative number of orders over a period of time will always be greater than the number of daily orders over that same period.
The time that users spend with a website in an active window of their browser will always be greater than the time they spend actively interacting with the site.

What’s interesting here is that these relations are not just true because of experience, they are true by definition. It’s also metrics that are expressed in the same units, and, in most cases, with the same order of magnitude, so they can be displayed on the same chart. When applicable, showing them together can show how they, indeed, move together or not.

One variable could be the sum of two other, less important variables.

In the example below we go even one step further and we show that one variable is the sum of two variables minus a fourth one.
Here, we look at the net likes of a Facebook Page, that is, the difference between the number of people who like a page on a given day and the day before.
Two factors can make more people like a page: paid likes (a user sees an ad, is interested, and from it, likes the page) or organic likes (a user visits a page, or somehow see content from that page, and likes it, without advertisement involved). Finally, people may also decide to stop liking the page (“unlikes”).
Here, net likes = organic likes + paid likes – unlikes. The reason why we have decomposed Likes between organic and paid is because we wanted to show that ads can amplify the effect of good content. So, visually, we chose to represent that as a layer on top of the rest. (important remark: your dashboard doesn’t have to be neutral. If it can show that your product, company, team etc. is delivering, and you have an occasion to demonstrate it, don’t hesitate a moment). By showing the unlikes as a negative number, as opposed to a positive variable, going up, possibly above the likes (which would be unpredictable) we can keep the visual legible and uncluttered. A user can do the visual combination of all these variables. This chart, by the way, shows the typical dynamic of a Page : new content will generate peaks of new users, but also will cause some users to stop liking the page.

One variable could be always growing. Or always positive.

When that is the case this can be used to make choices to represent the chart. If a variable is always growing by nature (i.e. cumulative revenue) you may want to consider representing a growth rate rather than the raw numbers. A reason to consider that is that your axis scale will have to change over time (i.e. if you plot a product that sells for around $1m per day, having an axis that goes from 0 to $10m would be enough for a week, but not for a month let alone for a year, whereas with a growth rate you can represent a long period of time consistently). And if a variable is always positive (ie stock price), your y axis can start at 0, or even at an arbitrary positive value, as opposed to allocate space for negative values.

Conversely, if a variable doesn’t change over time, it doesn’t mean that it’s not interesting to plot. That absence of change could be a sign of health of the system (which is the kind of task that dashboards can be useful for). So the absence of change doesn’t mean that there’s an absence of message.

February 7, 2015 by jerome on charts, dashboards, data visualization, tips

Dashboards as products

In the past few articles I’ve exposed what dashboards are not:

an exercise in visual design,
an exercise in data visualization technique.

Another way to put this is that “let’s do this just because we can” is a poor mantra when it comes to designing dashboards, or visualizations in the broader sense by the way.

Do it for the users

Now saying that dashboards should be products is a bit tautological. Products, in product design, refer to the result of a holistic process that solves problems of users – a process that includes research, conception, exploration, implementation and testing.

Most importantly, it’s about putting the needs of your users first. And your users first. Interestingly, treating your dashboard as a product means that the dashboard – your product – doesn’t come first.

Creating an awesome dashboard is a paradox. Googling for that phrase yields results such as: 20+ Awesome Dashboard Designs That Will Inspire You, 25 Innovative Dashboard Concepts and Designs, 24 beautifully-designed web dashboards that data geeks or 25 Visually Stunning App Dashboard Design Concepts. This is NOT dashboard product design (though it’s a good source of inspiration for visual design of individual charts).

Eventually, no one cares for your dashboard. When designing a dashboard, it’s nice to think that somebody out there will now spend one hour everyday looking at all this information nicely collected and beautifully arranged, but who would want to do that? Who would want to add to their already busy day an extra task, just to look at information the way you decided to organize it? This point of view is a delusion. We must not work accordingly.

Instead, let’s focus on the task at hand. What is something that your users would try to accomplish that could be supported by data and insights?

What is the task at hand?

If you start to think “show something at the weekly meeting” or “make a high-level dashboard” I invite you to go deeper. Show what? a dashboard for what? not for its own sake.

Trickier – how about: “to showcase the data that we have”? That is still not good enough. You shouldn’t start from your data to create your dashboard, and for several reasons. Doing so would limit yourself to the data that you have or which is readily available for you. But maybe that this data, in its raw form, is not going to be relevant or useful to your users. Conversely, you would be tempted to include all the data that you have, but each additional information that you bring to your dashboard would make it harder to digest and eventually detrimental to the process. Most importantly, if you don’t have an idea of what the user would want to accomplish with your data, you cannot prioritize and organize it, which is the whole point of dashboard design.

Finally – “to discover insights” is not a task either. Dashboards are a curated way to present data for a certain purpose. They are not unspecified, multi-purpose analytical exploration tools. In other words: dashboards will answer a specific, already formulated question. And they will answer in the best possible way, if they are designed as such. For exploration, ad-hoc analysis is more efficient, and is probably best left to analysts or data scientists than end users.

Here are some example of tasks:

check that things are going ok – that there is no preventable disaster going on somewhere. For instance: website is up – visits follow a predictable pattern.
Specifically, check that a process had completed in an expected way. For instance: all payments have been cleared.
If something goes wrong, troubleshoot it – find the likely cause. For instance: sales were down for this shop… because we ran out of an important product. Order more to fix the problem, make sure to stock accordingly next time.
Support a tactical decision. For instance: here are the sales of the new product, here are the costs. Should we keep on selling it or stop?
Decide where to allocate resources. For instance: we launched three variations of a product, one is greatly outperforming the other two, let’s run an ad campaign to promote the winner.
Try to better understand a complex system. For instance: user flow between pages can show where users are dropping out or where efficiency gains lie.

This list is by no means limitative. But it’s really useful to start from the problem at hand than just try to create a visual repository for data.

Next, we’ll see how to implement these in the last article: charts assemble!

February 7, 2015 by jerome on charts, dashboards, data visualization, tips

Dashboards versus data visualization

Dashboards are extreme data visualizations

In the recent Information is Beautiful 2014 awards, I found interesting that there is an infographics and a data visualization categories. My interpretation is that the entries in the infographics section are static and illustrated, while those in the data visualization are generated and data-driven. However, all the featured data visualization projects are about a one-off dataset. So aesthetical choices of the visualization depend on the characteristics of this particular dataset. By contrast, the dashboards I have worked with are about a live, real-time datastream. They have to look good (or at least – to function) whatever the shape and size of the data that they show. The google quote and news chart that we saw earlier must work for super volatile shares, for more stable ones, for indices, currencies, etc. So, if the distinction between infographics and data visualization makes sense to you, imagine that dashboards sit further in that continuum than data visualization. Not only are dashboards generated from data, like data visualizations, but they are also real-time and should function with datasets of many shapes and sizes.

But dashboards problems are not data visualization problems

Data visualization provides superior tools and techniques to present or analyze data. With libraries and languages dedicated to making visualizations, there is little that can’t be done. In many successful visualizations, the author will create an entirely new form, or at least control the form very finely to match their data and their angle. Even without inventing a new form, there are many which have been created for a specific use, and which are relatively easy to make on the web (as opposed to say, in Excel): treemaps, force-directed graphs and other node-link diagrams, chord diagrams, trees, bubble charts and the like. And even good old geographic maps.

In most cases, it is not a good idea to be too clever and have a more advanced form.

Up until mid November 2014, Google Analytics allowed users to view their data using motion charts.

This was really an example of having a hammer and considering all problems as nails. Fortunately, this function disappeared from the latest redesign.

Likewise, on twitter followers dashboard, the treemap might be a bit over the top:

and possibly confusing and not immediately legible to some users. On the other hand, it is economical in terms of space and would probably work in almost every case which are two things that dashboards should be good at. So while I wouldn’t have used it myself I can understand why this decision has been made.

Dashboards are not an exercise in visual design either

A dashboard such as this:

(for which I can’t find the source. I found it on pinterest and was able to trace it to this post but not prior) is well designed visually, it makes proper use of space, colors and type, its charts are simple.

But what good is it? what do I learn, what can I take away from it, what actions can I perform?

Most of the dashboards examples I find on sites like dribbble or beyance (see my Pinterest board) fall into that category: inspiring visual design, probably not real data, no flow, no obvious use.

Dashboards are problems of their own

What makes a dashboard, or any other information-based design successful, is neither the design execution nor the clever infovis technique. Dashboards, eventually, are meant to be useful and to solve a specific problem.

How so? We’ll see in the next article: dashboards as products.

February 7, 2015 by jerome on charts, dashboards, data visualization, tips

Charts in the age of the web

In 2008, when I was working at OECD, my job description was that of an editor. That implied I was mostly working on books. I was designing charts, but they were seen as components of books. And this was typical of the era.

So we would create charts like this one:

And it was awesome! (kind of). I mean, we respected all the rules. Look at that nicely labelled y-axis! and all the categories are on the x-axis! the bars are ordered in size, it’s easy to see which has the biggest or smallest value! And with those awesome gridlines, we can lookup values – at least get an order of magnitude.

What we really did though was apply styling to an excel chart (literally).

Print charts vs interactive charts

Origin of rules for print charts

Rules that govern traditional charts (which are many: ask Tufte, Few) make a certain number of assumptions which are interesting to question today.

One is that charts should be designed so that values can be easily looked up (even approximately) from the chart. This is why having labeled axes and gridlines is so useful. This is also why ordering bar charts in value order is nice. With that in mind, it also makes sense that charts like bar charts or area charts, which compare surfaces, be drawn on axes that start at 0.

The other assumption is that a chart will represent the entirety of a dataset that can be shown at a time. We have to come up with ways to make sure that every data point can be represented and remains legible. The chart author has to decide, once and for all, which is the dataset that will be represented, knowing that there will be “no backsies”.

In the same order of thought, the author must decide the form of his chart. If she wants to compare categories, she may go for a bar chart. If she wants to show an evolution over time, for a line chart. And if she wants the user to have exact values, she will choose a table.

And so, when everything else than a table is chosen, we typically don’t show values with all the data points, because adding data labels would burden the chart and make its overall shape harder to make out.

In this framework, it makes sense to think in term of data-ink (the cornerstone of Tuftean concepts): make sure that out of all the ink needed to print the chart (you can tell it’s a print concept already…), as much should go to encode the data as possible, versus anything else.

How about now

However, there is not a single of these reasons which is valid today in the world of web or mobile charts. Data-ink only made sense on paper.

Web charts have many mechanisms to let the user get extra information on a given data point. That can be information that updates on mouseover, callouts and tooltips… This might be less true of mobile in general where the distinction between hovering and clicking is less distinct. But it is definitely possible to obtain more than what is originally displayed. If I want to have an exact value, I shouldn’t have to simply deduce that from the shape of the chart. There can be mechanisms that can deliver that to me on demand.

An example: Google Finance Quote & News

The Google Finance Quote and News chart is a very representative example of a web-native chart. Around since 2006, they provide the price of a given security, along with news for context. While its visual design has probably been topped by other dashboards, what makes it a great example is that it’s publicly available, which is uncommon for business data.

While this chart has gridlines and labelled axes, that is not enough to lookup precise values. However, moving the mouse over the chart allows the user to read a precise value at a given point in time. A blue point appears and the precise value can be read in the top left corner.

One very common data filter in chart is controls that affect the time range: date pickers. By selecting a different time range, we make the chart represent a different slice of the dataset – we effectively filter the dataset so that only the relevant dates are shown. This is in contrast with the traditional printed charts, again, where all of the dataset is shown at once. For instance, we can click on “6m” and we’ll be treated with data from the last 6 months:

Comparing the selected security with others will make the chart show the data in a different mode. This is the same data (plus added series), in the same screen and the same context, but the chart is visually very different:

As to the other two characteristics of web charts I mentioned, data exports and drill downs, they are also featured (but less graphical to show, so I haven’t captured a screenshot for those). There is a link on a left-side column to get the equivalent data (so it is always possible to go beyond what is shown on screen). The little flags with letters in the 3 first screenshots are clickable, and represent relevant news. Clicking them will highlight that article in a right-side column. So it is always possible to get more information.

What does that change?

Everything.

Rules or best practices based on the assumption that data is hard to lookup or to compare are less important. The chart itself has to be legible though. So, for instance, it’s ok to have pie charts or donut charts, as long as the number of categories doesn’t go totally overboard.

Web charts, and dashboards even more so, should focus on only showing relevant data first, then showing it in the most useful and legible way. Again, a noted difference with the print philosophy where as much data as possible should be shown.

How this play out is what we’ll cover in the next articles of the series: Dashboards versus data visualizations.

February 7, 2015 by jerome on dashboards, data visualization, tips

Dashboards ahoy!

During my time at Facebook, I worked almost exclusively on one problem: dashboards. More specifically, how to present frequently-updated data in the most efficient way to business users. And so today, I am starting a series of blog posts / tutorials about dashboards.

Why talk about dashboards?

Legit questions. Dashboards are so uncool and boring! I could make more advanced tutorials on d3 or canvas or processing (which is also… in the plans). Or update new cool visualizations.

But the interest of discussing dashboards is precisely because they are not cool. In the first part to his Information Dashboard Design, Stephen Few presents a lengthy gallery of terrible dashboards he collected over years. Most of these dashboards exhibit a serious and obvious production flaws: they are often gaudy, using 3d columns or pie charts, when not taking the dashboard metaphor too literally with replicas of gauges and meters. Here’s a typical dashboard from the early 2000s:

In the past 5 years though, these problems have largely been solved. The overall “Graph Design IQ”, to borrow another Stephen Few concept, has greatly increased. People who make charts are increasingly aware that there are some best practices to build them and that there are a variety of forms beyond the core “Excel” chart types such as bar charts, line charts, pie charts and scatterplots. Besides, anyone who had to code a chart from scratch realized that, as opposed to Excel or similar software where users can rely on defaults, every detail of a chart needs decisions: not only how to encode the data (ie bars, lines etc.) but also whether to have gridlines or not and if so how, how to format axes, how to present legends, and so on and so forth. Oftentimes, having to make these decisions implies taking the time to think about these choices which makes the overall chart quality stronger. Also, in products like Tableau (and to be honest in every version of Excel) the default choices are much more robust than they used to be.

Down with the old, up with the new

While these old problems are as good as solved, dashboards are still not awesome because they are plagued with a set of new problems.

First, the rules and best practices for charts that we keep perpetuating were thought for an old world of printed or otherwise static charts, not the interactive environments such as web or mobile. As such, some recommendations of the 90s have become myths that need to be busted (I’m looking squarely at you, data-ink ratio).

Second, dashboard design is neither a data visualization problem, nor a visual design problem. By this I mean that thinking strictly as a designer or as a data visualization specialist might provide a textbook answer to some well-identified problems that arise with dashboards, but neither of these approaches is optimal.

The not-so-secret secret to dashboards is to apply product thinking. How will people use the dashboard? That should guide what you try to accomplish.

Finally, it’s really critical to realize that dashboards are not collection of individual charts, but an ensemble. Components of a dashboard should not be thought individually but as pieces that fit with one another.

Each of these themes will be the subject of an individual article!

Follow me on Pinterest

On Pinterest, I maintain two dashboard-related boards you may find interesting.

The first is called “Dashboards” (duh) and is examples of complete dashboards, with no judgment on quality, most often found in the wild.

The second, data vis / dashboard UI elements, is centered around lower-level problems such as charts, parts of dashboards and their visual design. Virtually every dashboard example found on a visual design platform like dribbble or beyance is not so much a true dashboard than a collection of individual charts, not that it’s not interesting.