Four things I learned about data visualisation during my PhD

Getting started in the world of data visualisation can be challenging. You may have some fantastic new results, or a bold new take on existing data, but be unsure how to make them really jump off the page. Phil Lamb (@lamb_ecology), who is undertaking a NERC internship here at infohackit, shares some things he learned about data visualisation during his PhD.

Phil getting to grips with a crab during field work.

Colour really matters

Scientific figures have been shaped by a legacy of physical media: colour printing was expensive, and the use of colour was avoided to keep the costs of publication down. However, colour is a key component of effective data visualisation and defaulting to black and white images will damage your ability to effectively present your data.

As highlighted in Lisa Rost’s excellent article, colour can serve two key purposes. First, it can help you set the mood or theme of your figure. Second, it allows you to draw attention to particular aspects of your visualisation.

You can use colour to create associations in the mind of the viewer: for instance, if presenting marine data, the use of blue or green colours reinforces the subject matter. You should also consider how you use colours together. Colours that neighbour each other on the colour wheel suggest harmony, whereas colours on opposite sides imply contrast. You can use these associations to help explain data or data sources.
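As a rough illustration of the colour-wheel idea, here is a minimal Python sketch using only the standard library. It rotates the hue of a ‘core’ colour to find its neighbours and its opposite. The function names are hypothetical, and the sketch assumes the RGB/HLS hue wheel that computers use, which is not quite the same as a painter’s colour wheel, so treat the results as a starting point rather than a finished palette.

```python
import colorsys


def hex_to_rgb(hex_colour):
    """Convert '#rrggbb' to three floats between 0 and 1."""
    hex_colour = hex_colour.lstrip("#")
    return tuple(int(hex_colour[i:i + 2], 16) / 255 for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert three floats between 0 and 1 back to '#rrggbb'."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


def rotate_hue(hex_colour, degrees):
    """Move a colour around the hue wheel, keeping lightness and saturation."""
    r, g, b = hex_to_rgb(hex_colour)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    h = (h + degrees / 360) % 1.0
    return rgb_to_hex(colorsys.hls_to_rgb(h, l, s))


def analogous(hex_colour, step=30):
    """Neighbouring colours on the wheel (step is an illustrative default)."""
    return [rotate_hue(hex_colour, -step), hex_colour, rotate_hue(hex_colour, step)]


def complementary(hex_colour):
    """The colour on the opposite side of the wheel."""
    return rotate_hue(hex_colour, 180)


if __name__ == "__main__":
    core = "#1f77b4"  # an arbitrary blue, chosen only for the example
    print("harmony: ", analogous(core))
    print("contrast:", complementary(core))
```

Neighbouring hues could be used for related data series, and the opposite hue for the one series you want to set apart.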

Once you’ve decided on the overall impression you’re going for, consider using online tools (covered in-depth in the Rost article) to make sure your visualisation is both attractive and easy to understand.

Such tools can help build a colour palette from a ‘core’ colour of the user’s choosing, and some also include a handy colour-blind simulation feature.

The other consideration is drawing attention to particular elements of a figure. My biggest revelation in this regard has been the use of grey. Grey works well with all colours and doesn’t draw focus. It is therefore perfect for displaying information which is not at the crux of your visualisation. Want to show uncertainty? Grey. Want to show Monte Carlo simulations? Grey. Want to differentiate other data sets from your own? Grey. You get the idea.

A further consideration when choosing colour schemes is colour blindness. About 8% of men are colour blind. I myself am colour blind. However, this does not mean I cannot see colour! Colour is still a valuable tool, but be aware that poor colour selection can obscure the message behind your data. There are plenty of resources in this regard: consider using a colour-blind friendly palette, or double-checking with an online colour-blindness simulation tool before finalising your colour selection.
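One quick sanity check, sketched below in Python with only the standard library, is to ask whether the colours in a palette differ in lightness as well as hue. Colours that differ clearly in lightness tend to stay distinguishable even when their hues are confused. This is not a colour-blindness simulation and does not replace one. The function names and the 1.5 threshold are assumptions made for illustration. The luminance and contrast formulas are the widely used sRGB relative luminance and WCAG-style contrast ratio.

```python
from itertools import combinations


def relative_luminance(hex_colour):
    """Perceived lightness of '#rrggbb' on a 0 (black) to 1 (white) scale."""
    hex_colour = hex_colour.lstrip("#")
    channels = [int(hex_colour[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve so the channels are linear light values.
    r, g, b = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(colour_a, colour_b):
    """Lightness contrast between two colours, from 1 (none) to 21 (black vs white)."""
    la, lb = relative_luminance(colour_a), relative_luminance(colour_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def low_contrast_pairs(palette, min_ratio=1.5):
    """Pairs that rely mostly on hue to be told apart (threshold is illustrative)."""
    return [
        (a, b)
        for a, b in combinations(palette, 2)
        if contrast_ratio(a, b) < min_ratio
    ]


if __name__ == "__main__":
    palette = ["#ff0000", "#008000", "#ffffff"]  # red, dark green, white
    for a, b in low_contrast_pairs(palette):
        print(f"{a} and {b} differ little in lightness")
```

With the example palette, only the red and dark green pair is flagged, which matches the familiar advice to avoid relying on red versus green alone.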

R is the king

Entering my PhD I had used R a bit, but didn’t understand just how much you can do with it. Moving away from a menu-based approach (like SPSS or Excel) to constructing figures using code can be daunting, but the reward is great.

The biggest advantage is customisation. R has an array of packages, each with a specific use. These are maintained by some truly wonderful individuals, who give a lot of time and support to make these tools freely available. If you can dream it, there is probably a package to do it in R. Want to animate your plot? Want to use cats for data points? Want to design a watch? It is all possible in R.

Speed and reproducibility are further advantages. Once you’ve made your data visualisation you can save the code. Creating a new figure from the code takes mere seconds. Better yet, if you see a visualisation you like on the internet, you may be able to visualise your own data in the same way immediately: people are often very generous, and provide the code so you can create a similar graph.

The other major area of improvement is accuracy. We’re all human; we all make mistakes. If you’re working hands-on with a lot of data, inevitably, you will start to introduce errors at some point. Spotting mistakes is really tricky! Generating figures with code reduces the risk of human error (computers never tire or get distracted). Additionally, using code keeps a record of everything you did during figure creation, which makes it easy to review and identify any mistakes.

If you’re convinced, I’d start by downloading RStudio (this provides a great environment to use R in), and installing the ggplot2 package. Guides for getting started can be found here and here, but there is almost always a helpful soul available on RStudio Community or Stack Overflow if you get stuck.

Make it attractive

Don’t underestimate the importance of making your data visualisations beautiful and interesting to look at. It is not frivolous: it serves an important purpose. Simply put, people are more likely to engage with the data if it is interesting to look at. Furthermore, much like colour choice, including diagrams and annotations in the visualisation can help the viewer keep the subject of the data in mind.

Really well-designed, attractive data visualisations can actually make the data easier to understand.

Consider the example from my PhD below: both diagrams contain the same amount of information. However, diagram a. not only looks better, but keeps the subject (marine species interactions) firmly rooted in the viewer’s mind.

Achieving more beautiful graphs needn’t be difficult. Saving your figures created in R as PDF files allows them to be edited using tools such as Affinity Designer or Inkscape. From here it is simple to add vector images from sites like PhyloPic (biological organisms) or Flaticon to help illustrate your visualisation.

Don’t overestimate your audience

My advice so far has focused on what you can do to make your visualisations more engaging. However, it is worth noting that data visualisation is only one tool in the science communication kit. A picture may say a thousand words, but equally it may not say anything at all: research by Pew suggests that 37% of American adults could not correctly interpret a simple scatter plot. More complex visualisations may cause more uncertainty: two simple visualisations may very well be more effective than a single one that conveys the same information (see this article by Elijah Meeks for a more nuanced take). If you are communicating with the general public (although scientists misinterpret figures too!) remember that data visualisation alone may not communicate your point (Michael Correll’s article on visualization literacy addresses this topic in much greater detail).

Have a look at a video infographic I made at an infohackit event. How effectively do I communicate my point in this medium compared to the figure shown above, or the paper I wrote on the same topic? Sometimes figures are not enough: providing different types of media (text / video / audio) on top of your core data visualisation will help to communicate your message effectively.
