August 3, 2016

fashion() output with corrr

Tired of fighting to get your data to print the way you want, or of formatting it by hand in a program like Excel? Try fashion() from the corrr package:

d <- data.frame(
  gender = factor(c("Male", "Female", NA)),
  age    = c(NA, 28.1111111, 74.3),
  height = c(188, NA, 168.78906),
  fte    = c(NA, .78273, .9)
)
d
#>   gender      age   height     fte
#> 1   Male       NA 188.0000      NA
#> 2 Female 28.11111       NA 0.78273
#> 3   <NA> 74.30000 168.7891 0.90000

library(corrr)
fashion(d)
#>   gender   age height  fte
#> 1   Male       188.00     
#> 2 Female 28.11         .78
#> 3        74.30 168.79  .90

But how does it work and what does it do?

The inspiration: correlations and decimals

The inspiration for fashion() came from my unending frustration at getting a correlation matrix to print exactly how I wanted. For example, printed correlations typically look something like:

mtcars %>% correlate()
#> # A tibble: 11 x 12
#>    rowname        mpg        cyl       disp         hp        drat
#>      <chr>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
#> 1      mpg         NA -0.8521620 -0.8475514 -0.7761684  0.68117191
#> 2      cyl -0.8521620         NA  0.9020329  0.8324475 -0.69993811
#> 3     disp -0.8475514  0.9020329         NA  0.7909486 -0.71021393
#> 4       hp -0.7761684  0.8324475  0.7909486         NA -0.44875912
#> 5     drat  0.6811719 -0.6999381 -0.7102139 -0.4487591          NA
#> 6       wt -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065
#> 7     qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476
#> 8       vs  0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846
#> 9       am  0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113
#> 10    gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013
#> 11    carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980
#> # ... with 6 more variables: wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>,
#> #   gear <dbl>, carb <dbl>

But this is just plain ugly. Personally, I wanted:

Decimal places rounded to the same length (usually 2).
Leading zeros removed, with the decimal points still aligned whether or not a number has a minus sign.
Missing values (NA) shown as empty strings ("").

This is exactly what fashion() does:

mtcars %>% correlate() %>% fashion()
#>    rowname  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb
#> 1      mpg      -.85 -.85 -.78  .68 -.87  .42  .66  .60  .48 -.55
#> 2      cyl -.85       .90  .83 -.70  .78 -.59 -.81 -.52 -.49  .53
#> 3     disp -.85  .90       .79 -.71  .89 -.43 -.71 -.59 -.56  .39
#> 4       hp -.78  .83  .79      -.45  .66 -.71 -.72 -.24 -.13  .75
#> 5     drat  .68 -.70 -.71 -.45      -.71  .09  .44  .71  .70 -.09
#> 6       wt -.87  .78  .89  .66 -.71      -.17 -.55 -.69 -.58  .43
#> 7     qsec  .42 -.59 -.43 -.71  .09 -.17       .74 -.23 -.21 -.66
#> 8       vs  .66 -.81 -.71 -.72  .44 -.55  .74       .17  .21 -.57
#> 9       am  .60 -.52 -.59 -.24  .71 -.69 -.23  .17       .79  .06
#> 10    gear  .48 -.49 -.56 -.13  .70 -.58 -.21  .21  .79       .27
#> 11    carb -.55  .53  .39  .75 -.09  .43 -.66 -.57  .06  .27
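To make those three steps concrete, here is a rough base R sketch of the same idea for a single numeric vector. It is not how corrr implements fashion(); fashion_sketch() is a hypothetical helper written only for illustration, and its defaults simply mirror fashion()'s decimals and na_print arguments.

```r
# Hypothetical helper (not part of corrr): formats one numeric vector.
fashion_sketch <- function(x, decimals = 2, na_print = "") {
  out <- sprintf(paste0("%.", decimals, "f"), x)  # same number of decimals everywhere
  out <- sub("^(-?)0\\.", "\\1.", out)            # drop the leading zero, keep any minus sign
  out[is.na(x)] <- na_print                       # swap missing values for the placeholder
  noquote(format(out, justify = "right"))         # pad to a common width, print without quotes
}

fashion_sketch(c(-0.8521620, NA, 0.9020329))
```

Right-justifying the strings is what keeps the decimal points lined up once the leading zeros are gone.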

And if I want to change the number of decimal places (decimals) and use a different placeholder for NA values (na_print):

mtcars %>% correlate() %>% fashion(decimals = 1, na_print = "x")
#>    rowname mpg cyl disp  hp drat  wt qsec  vs  am gear carb
#> 1      mpg   x -.9  -.8 -.8   .7 -.9   .4  .7  .6   .5  -.6
#> 2      cyl -.9   x   .9  .8  -.7  .8  -.6 -.8 -.5  -.5   .5
#> 3     disp -.8  .9    x  .8  -.7  .9  -.4 -.7 -.6  -.6   .4
#> 4       hp -.8  .8   .8   x  -.4  .7  -.7 -.7 -.2  -.1   .7
#> 5     drat  .7 -.7  -.7 -.4    x -.7   .1  .4  .7   .7  -.1
#> 6       wt -.9  .8   .9  .7  -.7   x  -.2 -.6 -.7  -.6   .4
#> 7     qsec  .4 -.6  -.4 -.7   .1 -.2    x  .7 -.2  -.2  -.7
#> 8       vs  .7 -.8  -.7 -.7   .4 -.6   .7   x  .2   .2  -.6
#> 9       am  .6 -.5  -.6 -.2   .7 -.7  -.2  .2   x   .8   .1
#> 10    gear  .5 -.5  -.6 -.1   .7 -.6  -.2  .2  .8    x   .3
#> 11    carb -.6  .5   .4  .7  -.1  .4  -.7 -.6  .1   .3    x

Look but don’t touch

There’s a little bit of magic going on here, but the key point is that fashion() returns a noquote version of the original structure:

mtcars %>% correlate() %>% fashion() %>% class()
#> [1] "data.frame" "noquote"

That means that numbers are no longer numbers.

mtcars %>% correlate() %>% sapply(is.numeric)
#> rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs 
#>   FALSE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE 
#>      am    gear    carb 
#>    TRUE    TRUE    TRUE

mtcars %>% correlate() %>% fashion() %>% sapply(is.numeric)
#> rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs 
#>   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE 
#>      am    gear    carb 
#>   FALSE   FALSE   FALSE

Similarly, missing values are no longer missing values.

mtcars %>% correlate() %>% sapply(function(i) sum(is.na(i)))
#> rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs 
#>       0       1       1       1       1       1       1       1       1 
#>      am    gear    carb 
#>       1       1       1

mtcars %>% correlate() %>% fashion() %>% sapply(function(i) sum(is.na(i)))
#> rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs 
#>       0       0       0       0       0       0       0       0       0 
#>      am    gear    carb 
#>       0       0       0

So fashion() is for looking at output, not for continuing to work with it.
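In practice that suggests a simple workflow: keep the numeric result in its own object, do any calculations on it, and call fashion() only on what you want to look at. The sketch below assumes corrr is installed; the object names rs, strongest and pretty are made up for this example.

```r
library(corrr)

rs <- correlate(mtcars)  # numeric correlations, with NA on the diagonal

# Do the arithmetic on the numeric version ...
strongest <- max(abs(rs$mpg), na.rm = TRUE)

# ... and fashion a separate copy purely for display.
pretty <- fashion(rs)
pretty
```

Because pretty holds text rather than numbers, the same max() call on pretty$mpg would not give a meaningful answer.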

What to use it on

fashion() can be used on most standard R structures such as scalars, vectors, matrices, data frames, etc:

fashion(10.277)
#> [1] 10.28
fashion(c(10.3785, NA, 87))
#> [1] 10.38       87.00
fashion(matrix(1:4, nrow = 2))
#>     V1   V2
#> 1 1.00 3.00
#> 2 2.00 4.00

You can also use it on non-numeric data. In this case, all fashion() will do is convert the data to characters, and then alter missing values:

fashion("Hello")
#> [1] Hello
fashion(c("Hello", NA), na_print = "World")
#> [1] Hello World

Now is a good time to take a look back at the opening example to see that it works on a data frame and with a factor column.

Exporting

Don’t forget that it’s easy to export your fashioned output with something like:

my_data %>% fashion() %>% write.csv("fashioned_file.csv")
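A slightly fuller version of that export might look like the sketch below. It assumes corrr is installed, uses a temporary file as a stand-in for your real destination, and adds two optional choices of mine: row.names = FALSE to leave out the row-number column, and colClasses = "character" when reading back so that a value like .85 stays as text.

```r
library(corrr)

# Assumption: a temporary file stands in for wherever you want the CSV to go.
out_file <- file.path(tempdir(), "fashioned_file.csv")

fashioned <- fashion(correlate(mtcars))
write.csv(fashioned, out_file, row.names = FALSE)  # omit the row-number column

# Read everything back as text so the fashioned values are not re-parsed as numbers.
check <- read.csv(out_file, colClasses = "character")
```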

So what are you waiting for? Go forth and fashion()!

Sign off

Thanks for reading and I hope this was useful for you.

For updates of recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog post, check out the blogR GitHub repository.
