… Let’s start with Gadfly
If you have done any kind of data analysis, you know that good visualisation is key: it shows you whether your data are tidy enough and helps you decide which models to use. It sits between data tidying and data modelling.
What’s the best strategy for plotting graphs and visualisations? Leland Wilkinson wrote what is, IMHO, the definitive textbook on the topic, “The Grammar of Graphics”, in which he laid out the visual elements that make up visualisations. Read the book here:
The Grammar of Graphics
Presents a unique foundation for producing almost every quantitative graphic found in scientific journals, newspapers…
Hadley Wickham ported the concepts to the R statistical system with ggplot2, and wrote a summary paper you can read here. I strongly recommend that you read the paper. In my opinion, it is perhaps the most authoritative introduction to the concept from a programming perspective. Do not miss it; here is the link:
In Julia, you can use the Gadfly and AlgebraOfGraphics packages to put the concepts of the Grammar of Graphics into practice. I find Gadfly to be the more intuitive of the two, and if you are just starting out with the Grammar of Graphics, I recommend that you use Gadfly to test out these ideas.
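To get a feel for Gadfly before going through the concepts, here is a minimal sketch. It assumes Gadfly is already installed, and the data are made up purely for illustration. The variable names (`points_layer`, `line_layer`, `p`) are my own choices, not anything Gadfly requires.

```julia
using Gadfly  # assumes the package is installed: import Pkg; Pkg.add("Gadfly")

# Made-up illustrative data, not taken from any real dataset
x = collect(1:10)
y = 2 .* x .+ 1

# First layer: the data drawn as points
points_layer = layer(x=x, y=y, Geom.point)

# Second layer: the same data drawn as a line on top of the points
line_layer = layer(x=x, y=y, Geom.line)

# Combine both layers into one plot and label the axes
p = plot(points_layer, line_layer, Guide.xlabel("x"), Guide.ylabel("y"))
```

In a notebook or the REPL, evaluating `p` should display the plot. Swapping `Geom.point` for `Geom.bar` is an easy way to start experimenting.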
So, what are the concepts?
The Grammar of Graphics states that you build your visualisations layer by layer. However complex the final plot might be, start with one layer and build another layer on top of it. Each layer, in turn, has the following five components:
- Aesthetics: this refers to the data you use to build the plot, for example a data frame or an array of points, and to how you decide the x and y positions from it.
- Geometry: this is where you decide the shape that will represent what you want to show. For example, will it be points, lines, or bars? A scatter plot of x and y, for instance, is built from points displayed on two coordinates, while a regression line would be a “line” geometry.