October 11, 2016

corrr 0.2.1 now on CRAN

@drsimonj here to discuss the latest CRAN release of corrr (0.2.1), a package for exploring correlations in a tidy R framework. This post will describe corrr features added since version 0.1.0.

You can install or update to this latest version directly from CRAN by running:

install.packages("corrr")

Let’s load corrr into our workspace and create a correlation data frame of the mtcars data set to work with:

library(corrr)
rdf <- correlate(mtcars)
rdf
#> # A tibble: 11 × 12
#>    rowname        mpg        cyl       disp         hp        drat
#>      <chr>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
#> 1      mpg         NA -0.8521620 -0.8475514 -0.7761684  0.68117191
#> 2      cyl -0.8521620         NA  0.9020329  0.8324475 -0.69993811
#> 3     disp -0.8475514  0.9020329         NA  0.7909486 -0.71021393
#> 4       hp -0.7761684  0.8324475  0.7909486         NA -0.44875912
#> 5     drat  0.6811719 -0.6999381 -0.7102139 -0.4487591          NA
#> 6       wt -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065
#> 7     qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476
#> 8       vs  0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846
#> 9       am  0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113
#> 10    gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013
#> 11    carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980
#> # ... with 6 more variables: wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>,
#> #   gear <dbl>, carb <dbl>

Plotting functions #

The significant changes involve the rplot() and new network_plot() functions that support the visualisation of your correlations.

rplot() #

rplot() produces a shape plot of the correlations. More visible dots correspond to stronger correlations, and blue and red respectively to positive and negative. The default plot looks like this:

rplot(rdf)

There are now four arguments that allow you to make adjustments to this plot:

legend Boolean indicating whether a legend mapping the colours to the correlations should be displayed.
shape geom_point aesthetic. A number corresponding to the shape of each point. See http://sape.inf.usi.ch/quick-reference/ggplot2/shape
colours or colors Vector of colours to use for n-colour gradient. See http://sape.inf.usi.ch/quick-reference/ggplot2/colour
print_cor Boolean indicating whether the correlations should be printed over the shapes.

Here are some examples that change these values:

rplot(rdf, legend = TRUE, shape = 1)

rplot(rdf, legend = TRUE, colours = c("firebrick1", "black", "darkcyan"))

rplot(rdf, print_cor = TRUE)

And don’t forget that you can rearrange() your correlations first:

rdf %>% rearrange(absolute = FALSE) %>% rplot(shape = 0, print_cor = TRUE)

network_plot() #

network_plot() produces a network that lays out and connects variables based on the strength of their correlations:

network_plot(rdf)

For a good intro to network_plot(), see my previous blogR post. Three arguments allow you to adjust this plot:

min_cor Number from 0 to 1 indicating the minimum value of correlations (in absolute terms) to plot.
legend same as rplot()
colours or colors same as rplot()

Some examples:

network_plot(rdf, legend = TRUE, colours = c("slategrey", "palegreen"))


network_plot(rdf, legend = TRUE, min_cor = .7)

Other features #

fashion() #

fashion() will now try to work on almost any object (not just correlation data frames). It also provides arguments to adjust the number of decimals, whether to display leading_zeros, and how to print missing values (na_print):

fashion(rdf)
#>    rowname  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb
#> 1      mpg      -.85 -.85 -.78  .68 -.87  .42  .66  .60  .48 -.55
#> 2      cyl -.85       .90  .83 -.70  .78 -.59 -.81 -.52 -.49  .53
#> 3     disp -.85  .90       .79 -.71  .89 -.43 -.71 -.59 -.56  .39
#> 4       hp -.78  .83  .79      -.45  .66 -.71 -.72 -.24 -.13  .75
#> 5     drat  .68 -.70 -.71 -.45      -.71  .09  .44  .71  .70 -.09
#> 6       wt -.87  .78  .89  .66 -.71      -.17 -.55 -.69 -.58  .43
#> 7     qsec  .42 -.59 -.43 -.71  .09 -.17       .74 -.23 -.21 -.66
#> 8       vs  .66 -.81 -.71 -.72  .44 -.55  .74       .17  .21 -.57
#> 9       am  .60 -.52 -.59 -.24  .71 -.69 -.23  .17       .79  .06
#> 10    gear  .48 -.49 -.56 -.13  .70 -.58 -.21  .21  .79       .27
#> 11    carb -.55  .53  .39  .75 -.09  .43 -.66 -.57  .06  .27

fashion(mtcars) %>% head()
#>     mpg  cyl   disp     hp drat   wt  qsec   vs   am gear carb
#> 1 21.00 6.00 160.00 110.00 3.90 2.62 16.46  .00 1.00 4.00 4.00
#> 2 21.00 6.00 160.00 110.00 3.90 2.88 17.02  .00 1.00 4.00 4.00
#> 3 22.80 4.00 108.00  93.00 3.85 2.32 18.61 1.00 1.00 4.00 1.00
#> 4 21.40 6.00 258.00 110.00 3.08 3.21 19.44 1.00  .00 3.00 1.00
#> 5 18.70 8.00 360.00 175.00 3.15 3.44 17.02  .00  .00 3.00 2.00
#> 6 18.10 6.00 225.00 105.00 2.76 3.46 20.22 1.00  .00 3.00 1.00

fashion(c(0.340823, NA, -10.000032), decimals = 3, na_print = "MISSING")
#> [1]    .341 MISSING -10.000

fashion(c(0.340823, NA, -10.000032), leading_zeros = TRUE)
#> [1]   0.34        -10.00

focus() #

A standard evaluation version of focus() is now available, focus_(), to programatically focus on specific correlations:

vars <- c("mpg", "disp")
focus_(rdf, "hp", .dots = vars)
#> # A tibble: 8 × 4
#>   rowname         hp        mpg       disp
#>     <chr>      <dbl>      <dbl>      <dbl>
#> 1     cyl  0.8324475 -0.8521620  0.9020329
#> 2    drat -0.4487591  0.6811719 -0.7102139
#> 3      wt  0.6587479 -0.8676594  0.8879799
#> 4    qsec -0.7082234  0.4186840 -0.4336979
#> 5      vs -0.7230967  0.6640389 -0.7104159
#> 6      am -0.2432043  0.5998324 -0.5912270
#> 7    gear -0.1257043  0.4802848 -0.5555692
#> 8    carb  0.7498125 -0.5509251  0.3949769

Bugs and stuff #

Other than these, there have been fixes to various bugs and minor improvements made to existing functions. Please don’t forget to open an issue on GitHub or email me if you spot an issue or would like a new feature when using corrr.

Acknowledgements #

Many thanks to the community who have already been using corrr and made suggestions along the way. Your help is invaluable for improving corrr!

Sign off #

Thanks for reading and I hope this was useful for you.

For updates of recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.

Kudos