How to use Gadfly.jl for data visualisations
--
Layering, sizing, and colouring of points
Gadfly is a Julia package for data visualisation. You can of course use several other packages, such as Plots, StatsPlots, CairoMakie (or another Makie backend), and AlgebraOfGraphics (based on Makie), and each of them is powerful and feature rich. I like Gadfly because it closely resembles ggplot, which follows the Grammar of Graphics described in Leland Wilkinson’s book of the same name. In this post, I am going to dive into Gadfly for data graphics. Assuming that you have already installed Julia, and within Julia you have installed Gadfly (the examples below also use the Distributions package), we will work through some basic preliminary ideas in code.
Let’s start by generating a set of normally distributed random numbers and then plotting a simple histogram of them, like so:
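If the packages are not installed yet, a minimal setup sketch looks like the following. It assumes you are working in the default Julia environment and have network access; Distributions is included only because the examples in this post call Normal from it.

```julia
# One-time setup, run from the Julia REPL.
# Assumption: installing into the default environment is acceptable.
using Pkg
Pkg.add(["Gadfly", "Distributions"])  # Random ships with Julia's standard library
```

After this, `using Gadfly, Distributions, Random` should load everything the snippets in this post rely on.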
using Gadfly, Distributions, Random
# fix the seed so the sample is reproducible
Random.seed!(123)
# generate a sequence x with 500 data points drawn from Normal(mean = 10, sd = 2)
x = rand(Normal(10, 2), 500)
# now draw a histogram of the sample
p = plot(x = x,
    Geom.histogram,
    Guide.title("Histogram"),
    Guide.xlabel("Value of X"),
    Guide.ylabel("Counts")
)
The code above produces a histogram of the sample.
The plot is constructed as an object “p” and is a simple histogram of the distribution of X. Note that the X axis shows the values of X, and the Y axis shows the “counts”, that is, how many data points fall into each bin of X.
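Because the plot is an ordinary Julia object, you can keep it in a variable and render it later. The sketch below rebuilds the same histogram and writes it to an SVG file with Gadfly's draw function. The file name "histogram.svg" and the 6 by 4 inch size are arbitrary choices for illustration, not something the rest of this post depends on.

```julia
using Gadfly, Distributions, Random

Random.seed!(123)
x = rand(Normal(10, 2), 500)

# Build the plot object; nothing is written to disk at this point.
p = plot(x = x,
    Geom.histogram,
    Guide.title("Histogram"),
    Guide.xlabel("Value of X"),
    Guide.ylabel("Counts")
)

# p is a Gadfly.Plot value that can be passed around like any other object.
println(typeof(p))

# Render it to an SVG file (file name and size are illustrative assumptions).
outfile = "histogram.svg"
draw(SVG(outfile, 6inch, 4inch), p)
```

In the REPL or a notebook, simply evaluating p displays the plot; draw is only needed when you want a file.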
If you wanted to draw a histogram of X divided into 30 bins and superimpose a density plot on top of it, you’d do as follows:
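plot(x = x,
    # histogram with 30 bins, normalised instead of showing raw counts
    Geom.histogram(bincount = 30, density = true),
    # density curve drawn as a separate layer, in red
    layer(Geom.density,
        color = [colorant"red"]),
    Guide.title("Histogram with density plot"),
    Guide.xlabel("Value of X"),
    Guide.ylabel("Proportion"),
    Theme(bar_spacing = 0.5mm)
)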
This produces a histogram with a red density curve drawn over it.
The difference this time is on the Y axis: because of density = true, the bars are normalised rather than shown as raw counts (hence the “Proportion” label). The X axis still shows the values of X, as before.
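The same figure can also be written with one explicit layer per geometry, which makes the layering easier to see. This is a sketch of an alternative spelling, not a claim about how the plot has to be built; the names hist_layer, dens_layer and p2 are made up for this example.

```julia
using Gadfly, Distributions, Random

Random.seed!(123)
x = rand(Normal(10, 2), 500)

# One layer per geometry; each layer carries its own data binding.
hist_layer = layer(x = x, Geom.histogram(bincount = 30, density = true))
dens_layer = layer(x = x, Geom.density, color = [colorant"red"])

# Guides and the theme apply to the plot as a whole, not to a single layer.
p2 = plot(dens_layer, hist_layer,
    Guide.title("Histogram with density plot"),
    Guide.xlabel("Value of X"),
    Guide.ylabel("Proportion"),
    Theme(bar_spacing = 0.5mm)
)
```

Writing the layers out this way is handy once each layer needs its own data or its own colour, since every layer call is self-contained.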
Let’s use these two snippets to illustrate some basic properties of drawing graphs with Gadfly in Julia and how they correspond to concepts…