Label line ends in time series with ggplot2  

@drsimonj here with a quick share on making great use of the secondary y axis with ggplot2 – super helpful if you’re plotting groups of time series!

Here’s an example of what I want to show you how to create (pay attention to the numbers on the right):

init-example-1.png

Setup

To set up, we’ll need the tidyverse package and the Orange data set that comes with R. This tracks the circumference growth of five orange trees over time.

library(tidyverse)

d <- Orange

head(d)
#> Grouped Data: circumference ~ age | Tree
#>   Tree  age circumference
#> 1    1  118            30
#> 2    1  484            58
#> 3    1  664            87
#> 4    1 1004           115
#> 5    1 1231           120
#> 6    1 1372           142
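Before plotting, it can help to confirm the shape of the data. Here is a minimal base R sketch, assuming the standard Orange data set that ships with R's datasets package:

```r
d <- Orange

nlevels(d$Tree)   # number of distinct trees
table(d$Tree)     # number of observations per tree
range(d$age)      # first and last measurement age (in days)
```

Every tree is measured at the same ages, so all five lines end at the same x position. That is what lets the numbers on the right line up with the line ends.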

Template code

To create the basic case where the numbers appear at the end of your time series lines, your code might look something like this:

# You have a data set with:
# - GROUP column
# - X column (say time)
# - Y column (the values of interest)
DATA_SET

# Create a vector of the last (furthest right) y-axis values for each group
DATA_SET_ENDS <- DATA_SET %>% 
  group_by(GROUP) %>% 
  top_n(1, X) %>% 
  pull(Y)

# Create plot with `sec.axis`
ggplot(DATA_SET, aes(X, Y, color = GROUP)) +
    geom_line() +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(sec.axis = sec_axis(~ ., breaks = DATA_SET_ENDS))
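If you use this template often, one option is to wrap it in a function. The sketch below is only an illustration: plot_line_ends() is a hypothetical helper name (not part of ggplot2 or the original post), and it assumes reasonably recent versions of dplyr and ggplot2 that support the {{ }} embrace operator.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: wraps the template above for any data set.
# `x`, `y` and `group` are unquoted column names.
plot_line_ends <- function(data, x, y, group) {
  # Last (furthest right) y value for each group
  ends <- data %>%
    group_by({{ group }}) %>%
    top_n(1, {{ x }}) %>%
    pull({{ y }})

  ggplot(data, aes({{ x }}, {{ y }}, color = {{ group }})) +
    geom_line() +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(sec.axis = sec_axis(~ ., breaks = ends))
}

# Example call with the Orange data
p <- plot_line_ends(as_tibble(Orange), age, circumference, Tree)
p
```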

Let’s see it!

Let’s break it down a bit. We already have our data set where the group column is Tree, the X value is age, and the Y value is circumference.

So first get a vector of the last (furthest right) values for each group:

d_ends <- d %>% 
  group_by(Tree) %>% 
  top_n(1, age) %>% 
  pull(circumference)

d_ends
#> [1] 145 203 140 214 177
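In dplyr 1.0.0 and later, top_n() is marked as superseded in favour of slice_max(). Here is a sketch of the same step with slice_max(), assuming you have dplyr 1.0.0 or newer installed (d_ends_alt is just a name chosen here for illustration):

```r
library(dplyr)

d_ends_alt <- as_tibble(Orange) %>%
  group_by(Tree) %>%
  # Keep the single row with the largest age in each group
  slice_max(age, n = 1, with_ties = FALSE) %>%
  pull(circumference)

# Sort so the result does not depend on the order the groups come back in
sort(d_ends_alt)
```

The values may come back in a different order than with top_n(), but that doesn't matter here because they are only used as axis breaks.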

Next, let’s set up the basic plot without the numbers to see how each layer adds up.

ggplot(d, aes(age, circumference, color = Tree)) +
      geom_line()

unnamed-chunk-5-1.png

Now we can use scale_y_* with the sec.axis argument to create a second axis on the right. The formula ~ . is an identity transformation, so the second axis shares the scale of the first, and the numbers are displayed at breaks defined by our vector of line ends:

ggplot(d, aes(age, circumference, color = Tree)) +
      geom_line() +
      scale_y_continuous(sec.axis = sec_axis(~ ., breaks = d_ends))

unnamed-chunk-6-1.png
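By default the secondary axis has no title. If you want one, sec_axis() also accepts a name argument. A minimal sketch, where the title text "Final circumference" is my own choice for illustration:

```r
library(dplyr)
library(ggplot2)

d <- as_tibble(Orange)

d_ends <- d %>%
  group_by(Tree) %>%
  top_n(1, age) %>%
  pull(circumference)

p <- ggplot(d, aes(age, circumference, color = Tree)) +
  geom_line() +
  # Same identity transformation (~ .), plus a title for the right-hand axis
  scale_y_continuous(
    sec.axis = sec_axis(~ ., breaks = d_ends, name = "Final circumference")
  )
p
```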

This is a great start. The only major addition I suggest is removing the padding on the x-axis so the gap between the line ends and the numbers disappears. You do this with scale_x_* and the expand argument:

ggplot(d, aes(age, circumference, color = Tree)) +
      geom_line() +
      scale_y_continuous(sec.axis = sec_axis(~ ., breaks = d_ends)) +
      scale_x_continuous(expand = c(0, 0))

unnamed-chunk-7-1.png
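Setting expand = c(0, 0) removes the padding on both sides of the x-axis. If you would rather keep the gap on the left and only close it on the right, ggplot2's expansion() helper lets you set each side separately. A sketch, assuming ggplot2 3.3.0 or newer (where expansion() is available):

```r
library(dplyr)
library(ggplot2)

d <- as_tibble(Orange)

d_ends <- d %>%
  group_by(Tree) %>%
  top_n(1, age) %>%
  pull(circumference)

p <- ggplot(d, aes(age, circumference, color = Tree)) +
  geom_line() +
  scale_y_continuous(sec.axis = sec_axis(~ ., breaks = d_ends)) +
  # mult = c(left, right): keep the default 5% padding on the left,
  # and remove it on the right so the lines touch the secondary axis
  scale_x_continuous(expand = expansion(mult = c(0.05, 0)))
p
```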

Polishing it up

Like it? Here’s the code to recreate the first polished plot:

library(tidyverse)

d <- Orange %>% 
  as_tibble()

d_ends <- d %>% 
  group_by(Tree) %>% 
  top_n(1, age) %>% 
  pull(circumference)

d %>% 
  ggplot(aes(age, circumference, color = Tree)) +
    geom_line(size = 2, alpha = .8) +  # on ggplot2 >= 3.4.0, use linewidth = 2 instead of size = 2
    theme_minimal() +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(sec.axis = sec_axis(~ ., breaks = d_ends)) +
    ggtitle("Orange trees getting bigger with age",
            subtitle = "Based on the Orange data set in R") +
    labs(x = "Days old", y = "Circumference (mm)", caption = "Plot by @drsimonj")

init-example-8.png

Sign off

Thanks for reading and I hope this was useful for you.

For updates of recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.

 