For a few years now, Canadian-born Jer Thorp has been working at the forefront of data visualisation, creating ever new ways to interact with vast data sets. Operating under the title of “data artist”, Jer has recently teamed up with Mark Hansen and Ben Rubin to form the Office of Creative Research.
During the summer I had the chance to sit down with Jer and talk about their latest project for Scientific American, about the role of narrative in visualisation and about how to be careful with your data…
Jer, what is the latest project that you have worked on here at the Office of Creative Research?
We have been working on a data visualization for Scientific American on fruit and vegetable imports into the US. Once a year, Scientific American publishes a single-topic issue, and this year’s issue is about food. Up to now, most of their data visualization has been paired with stories, whereas for this issue we wanted to create a stand-alone data visualization project.
We worked with a data set tracked by the US Department of Agriculture (USDA). They track every fruit and vegetable import into America, and they have been doing so since 1998. At first it seems like a relatively bland topic, but there are trends inside of it that are really interesting: you can see political changes, technology changes and taste changes.
I am always excited about data that involves people and is human. The food data was interesting because it’s about the stuff that we consume and how that is changing over time.
So did you take the dataset without really knowing what to expect, or did you have a claim beforehand?
We have a research-based practice which can be summed up in two words: “Data first”. Let’s look at the data and let it tell us where we are going. We went through probably a few hundred iterations of the graphic and tried to find something that we wanted to tell. The central point is that our balance between import and export is changing drastically.
We used to import a lot of vegetables during the winter and export some of what we grew during the summer. But this seasonality is disappearing from the way that we eat, and imports have generally increased. You can see that for instance with asparagus, which in 1998 was only available seasonally and produced in small quantities, and it was all grown inside the US. Now it is available all year round and mostly grown outside of America.
There are also taste changes underlying this – kale for example was not particularly popular 15 years ago, and now it is. And there are a million of these things. We’re currently working on the interactive version so that people can look at the variety of these stories and explore them, for any time range and any group of commodities.
Within data visualization there has been much talk about storytelling lately. How do you consider that in your work? Does creating a narrative precede the data plotting?
I think creating a narrative is a natural response when we see a system. I can draw a square and a circle beside it and all of us will come up with a narrative of what that square is beside that circle. Sometimes you want to assemble a fairly clear narrative about things. Especially when time is involved it is impossible not to think of a story. That is how we understand time – it is essentially narrative. If I say this happened in 2001, this happened in 2002 and this happened in 2003 – that is a story.
In our work we are more interested in post-modern forms of storytelling – something that is not so cleanly structured. In the more formal design work that we do, we try to create exploratory tools which allow people to draw their own stories out of deep data sets. For example, in the online version of the fruit and vegetable project people will be able to make their own thing. Maybe they run a potato farm and want to know how potato farming has changed over the last three years; then they can make that story.
Now there are all these scientific papers being published about the value of narrative in data visualization – as if this were something really exciting and new. A few years back, people often used to argue that a narrative style of data visualization obfuscates the data. Now academia is catching up and saying “Right, maybe a narrative does help…”
I would say: “We can do both!” I use the phrase “oh!-ah! principle” to describe what I like to do with data visualization. At first, you need to get people in with some kind of “Oh!” moment by creating a piece that is visually arresting. And then you can go on and explain something and then they go like “Ah!”. It’s this “Oh!” moment that has raised a lot of hackles, because for many people science is averse to aesthetics.
So when you work with a data set, how do you go about analysing its contents?
At least we try to “let the data speak to us”. Describing the history of a project after it is done always has a revisionist aspect – you always describe it in the smoothest possible way. In reality it is usually a bit more jumbled than that. One way to go is taking the data and visualizing it in all possible ways, doing some statistical analysis on it and so on.
But another important part of it is working backwards and asking: “Where did the data come from? What was the methodology that was used to gather it? What were the human systems that were involved in its production? How can you understand those systems better?” This also helps you analyze the data. Data visualization has become so easy that sometimes I see students getting a data set, and without really even thinking about it they start visualizing it and drawing conclusions from it.
I had a really interesting conversation the other day about the domain expert and their role in this process, which is a really interesting and precarious thing: you don’t want to get too deep with the domain expert because then you will absorb their biases and maybe you are not going to do as good a job. But on the other hand you don’t want to be so naive that either you are going to make a colossal error or you are just going to show something which is so obvious to them…
So what we try to do instead is sit down with the experts and ask: “Tell me about this data. What do you think I should do with it? Where did it come from, what was the question you were trying to answer? Why did you do it, and what are some things that might be there that are interesting…”
There is often a danger of taking a data set (like the fruit and vegetable data) and saying “I am gonna do a piece about all the fruit and vegetables in North America…” We were careful with our language when talking about this piece, because the only thing you can say is that it is about all the fruit and vegetable imports that were measured by the USDA. It is actually a visualization of this system of measurement.
This difference is sometimes hard for people to grapple with – you are not visualizing the system, you are visualizing measurements of the system. No measurement is ever fully accurate.
I think data visualization is much easier if you can understand that it is not the pure form that it is sometimes believed to be. You are always going to be biased, you are always going to carry your error, you are always going to have uncertainty. All of these things will always be in the picture, so you have to ask yourself: “How can I be honest about them?” Once you understand that, it is much easier.
A big thanks to Jer. For more information, check out the Office of Creative Research’s website.