Label line ends in time series with ggplot2  

@drsimonj here with a quick share on making great use of the secondary y axis with ggplot2 – super helpful if you’re plotting groups of time series!

Here’s an example of what I want to show you how to create (pay attention to the numbers of the right):

init-example-1.png

 Setup

To setup we’ll need the tidyverse package and the Orange data set that comes with R. This tracks the circumference growth of five orange trees over time.

library(tidyverse)

d <- Orange

head(d)
#> Grouped Data: circumference ~ age | Tree
#>   Tree  age circumference
#> 1    1  118            30
#> 2    1  484            58
#> 3    1  664            87
#> 4    1 1004           115
#> 5    1 1231           120
#> 6    1 1372           142

 Template code

To create the basic case where the numbers appear at the end of your time series lines, your code might look something like this:

# You have a data set with:
# - GROUP colum
# - X colum (say time)
# - Y column (the values of interest)
DATA_SET

# Create a vector of the last (furthest right) y-axis values for each group
DATA_SET_ENDS <- DATA_SET %>% 
  group_by(GROUP) %>% 
  top_n(1, X) %>% 
  pull(Y)

# Create plot with `sec.axis`
ggplot(DATA_SET, aes(X, Y, color = GROUP)) +
    geom_line() +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(sec.axis = sec_axis(~ ., breaks = DATA_SET_ENDS))

 Let’s see it!

Let’s break it down a bit. We already have our data set where the group colum is Tree, the X value is age, and the Y value is circumference.

So first get a vector of the last (furthest right) values for each group:

d_ends <- d %>% 
  group_by(Tree) %>% 
  top_n(1, age) %>% 
  pull(circumference)

d_ends
#> [1] 145 203 140 214 177

Next, let’s set up the basic plot without the numbers to see how each layer adds up.

ggplot(d, aes(age, circumference, color = Tree)) +
      geom_line()

unnamed-chunk-5-1.png

Now we can use scale_y_*, with the argument sec.axis to create a second axis on the right, with numbers to be displayed at breaks, defined by our vector of line ends:

ggplot(d, aes(age, circumference, color = Tree)) +
      geom_line() +
      scale_y_continuous(sec.axis = sec_axis(~ ., breaks = d_ends))

unnamed-chunk-6-1.png

This is a great start, The only major addition I suggest is expanding the margins of the x-axis so the gap disappears. You do this with scale_x_* and the expand argument:

ggplot(d, aes(age, circumference, color = Tree)) +
      geom_line() +
      scale_y_continuous(sec.axis = sec_axis(~ ., breaks = d_ends)) +
      scale_x_continuous(expand = c(0, 0))

unnamed-chunk-7-1.png

 Polishing it up

Like it? Here’s the code to recreate the first polished plot:

library(tidyverse)

d <- Orange %>% 
  as_tibble()

d_ends <- d %>% 
  group_by(Tree) %>% 
  top_n(1, age) %>% 
  pull(circumference)

d %>% 
  ggplot(aes(age, circumference, color = Tree)) +
    geom_line(size = 2, alpha = .8) +
    theme_minimal() +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(sec.axis = sec_axis(~ ., breaks = d_ends)) +
    ggtitle("Orange trees getting bigger with age",
            subtitle = "Based on the Orange data set in R") +
    labs(x = "Days old", y = "Circumference (mm)", caption = "Plot by @drsimonj")

init-example-8.png

 Sign off

Thanks for reading and I hope this was useful for you.

For updates of recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.

 
57
Kudos
 
57
Kudos

Now read this

Five tips to improve your R code

@drsimonj here with five simple tricks I find myself sharing all the time with fellow R users to improve their code! This post was originally published on DataCamp’s community as one of their top 10 articles in 2017 1. More fun to... Continue →