fashion() output with corrr
Tired of trying to get your data to print right, or of formatting it in a program like Excel? Try out fashion() from the corrr package:
d <- data.frame(
gender = factor(c("Male", "Female", NA)),
age = c(NA, 28.1111111, 74.3),
height = c(188, NA, 168.78906),
fte = c(NA, .78273, .9)
)
d
#> gender age height fte
#> 1 Male NA 188.0000 NA
#> 2 Female 28.11111 NA 0.78273
#> 3 <NA> 74.30000 168.7891 0.90000
library(corrr)
fashion(d)
#> gender age height fte
#> 1 Male 188.00
#> 2 Female 28.11 .78
#> 3 74.30 168.79 .90
But how does it work and what does it do?
The inspiration: correlations and decimals
The inspiration for fashion() came from my unending frustration at getting a correlation matrix to print out exactly how I wanted. For example, printed correlations typically look something like this:
mtcars %>% correlate()
#> # A tibble: 11 x 12
#> rowname mpg cyl disp hp drat
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 mpg NA -0.8521620 -0.8475514 -0.7761684 0.68117191
#> 2 cyl -0.8521620 NA 0.9020329 0.8324475 -0.69993811
#> 3 disp -0.8475514 0.9020329 NA 0.7909486 -0.71021393
#> 4 hp -0.7761684 0.8324475 0.7909486 NA -0.44875912
#> 5 drat 0.6811719 -0.6999381 -0.7102139 -0.4487591 NA
#> 6 wt -0.8676594 0.7824958 0.8879799 0.6587479 -0.71244065
#> 7 qsec 0.4186840 -0.5912421 -0.4336979 -0.7082234 0.09120476
#> 8 vs 0.6640389 -0.8108118 -0.7104159 -0.7230967 0.44027846
#> 9 am 0.5998324 -0.5226070 -0.5912270 -0.2432043 0.71271113
#> 10 gear 0.4802848 -0.4926866 -0.5555692 -0.1257043 0.69961013
#> 11 carb -0.5509251 0.5269883 0.3949769 0.7498125 -0.09078980
#> # ... with 6 more variables: wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>,
#> # gear <dbl>, carb <dbl>
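If you want the same numbers without corrr, a rough base-R approximation is sketched below. It assumes Pearson correlations and relies on mtcars having no missing values. It only mimics the shape of the output above (a rowname column plus NA on the diagonal) and is not how correlate() is implemented.

```r
# Base-R approximation for illustration only; correlate() may differ in details
r <- cor(mtcars)   # 11 x 11 Pearson correlation matrix (mtcars has no NAs)
diag(r) <- NA      # blank out the self-correlations, as in the output above

# Move the row names into their own column, like correlate()'s rowname column
r_df <- data.frame(rowname = rownames(r), r, row.names = NULL,
                   stringsAsFactors = FALSE)
r_df[1:3, 1:4]
```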
But this is just plain ugly. Personally, I wanted:
- Decimal places rounded to the same length (usually 2).
- All leading zeros removed, while keeping the decimal points aligned for numbers with and without a - sign.
- Missing values (NA) to appear empty ("").
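Those three rules are simple enough to sketch for a single numeric vector in base R. The fashion_num() helper below is hypothetical and written only to illustrate the rules; it is not corrr's implementation. It uses base formatC() for the fixed decimals, sub() to drop the leading zero, and format() to pad every value to a common width so the decimal points line up.

```r
# Hypothetical helper for illustration only; NOT corrr's implementation.
fashion_num <- function(x, decimals = 2, na_print = "") {
  # Fixed number of decimals: 0.6811 -> "0.68", -0.8522 -> "-0.85"
  out <- formatC(x, format = "f", digits = decimals)
  # Drop the leading zero but keep any minus sign: "0.68" -> ".68", "-0.85" -> "-.85"
  out <- sub("^(-?)0\\.", "\\1.", out)
  # Show missing values as the placeholder instead of "NA"
  out[is.na(x)] <- na_print
  # Right-justify to a common width so the decimal points line up
  format(out, justify = "right")
}

r <- c(0.68117191, -0.8521620, NA)
fashion_num(r)
fashion_num(r, decimals = 1, na_print = "x")
```

Right-justifying works here because every value has the same number of decimals, so the decimal points end up in the same column.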
This is exactly what fashion() does:
mtcars %>% correlate() %>% fashion()
#> rowname mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 mpg -.85 -.85 -.78 .68 -.87 .42 .66 .60 .48 -.55
#> 2 cyl -.85 .90 .83 -.70 .78 -.59 -.81 -.52 -.49 .53
#> 3 disp -.85 .90 .79 -.71 .89 -.43 -.71 -.59 -.56 .39
#> 4 hp -.78 .83 .79 -.45 .66 -.71 -.72 -.24 -.13 .75
#> 5 drat .68 -.70 -.71 -.45 -.71 .09 .44 .71 .70 -.09
#> 6 wt -.87 .78 .89 .66 -.71 -.17 -.55 -.69 -.58 .43
#> 7 qsec .42 -.59 -.43 -.71 .09 -.17 .74 -.23 -.21 -.66
#> 8 vs .66 -.81 -.71 -.72 .44 -.55 .74 .17 .21 -.57
#> 9 am .60 -.52 -.59 -.24 .71 -.69 -.23 .17 .79 .06
#> 10 gear .48 -.49 -.56 -.13 .70 -.58 -.21 .21 .79 .27
#> 11 carb -.55 .53 .39 .75 -.09 .43 -.66 -.57 .06 .27
And if I want to change the number of decimal places (decimals) and use a different placeholder for NA values (na_print):
mtcars %>% correlate() %>% fashion(decimals = 1, na_print = "x")
#> rowname mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 mpg x -.9 -.8 -.8 .7 -.9 .4 .7 .6 .5 -.6
#> 2 cyl -.9 x .9 .8 -.7 .8 -.6 -.8 -.5 -.5 .5
#> 3 disp -.8 .9 x .8 -.7 .9 -.4 -.7 -.6 -.6 .4
#> 4 hp -.8 .8 .8 x -.4 .7 -.7 -.7 -.2 -.1 .7
#> 5 drat .7 -.7 -.7 -.4 x -.7 .1 .4 .7 .7 -.1
#> 6 wt -.9 .8 .9 .7 -.7 x -.2 -.6 -.7 -.6 .4
#> 7 qsec .4 -.6 -.4 -.7 .1 -.2 x .7 -.2 -.2 -.7
#> 8 vs .7 -.8 -.7 -.7 .4 -.6 .7 x .2 .2 -.6
#> 9 am .6 -.5 -.6 -.2 .7 -.7 -.2 .2 x .8 .1
#> 10 gear .5 -.5 -.6 -.1 .7 -.6 -.2 .2 .8 x .3
#> 11 carb -.6 .5 .4 .7 -.1 .4 -.7 -.6 .1 .3 x
Look but don’t touch
There’s a little bit of magic going on here, but the key point is that fashion() returns a noquote version of the original structure:
mtcars %>% correlate() %>% fashion() %>% class()
#> [1] "data.frame" "noquote"
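noquote() itself is plain base R: it appends "noquote" to whatever class the object already had, which is why the data frame above keeps "data.frame" as its first class. The small base-R example below (values made up for illustration) shows the class change, the effect on printing a character vector, and that the underlying values are still character strings.

```r
v <- c(".85", "-.12")

# noquote() appends "noquote" to the existing class
nq_df <- noquote(data.frame(x = v, stringsAsFactors = FALSE))
class(nq_df)
#> [1] "data.frame" "noquote"

# For a character vector, the extra class only changes how it prints
print(v)            # strings shown with quotation marks
print(noquote(v))   # the same strings shown without quotation marks

# Underneath, the values are still plain character strings
is.character(unclass(noquote(v)))
```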
That means that numbers are no longer numbers.
mtcars %>% correlate() %>% sapply(is.numeric)
#> rowname mpg cyl disp hp drat wt qsec vs
#> FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> am gear carb
#> TRUE TRUE TRUE
mtcars %>% correlate() %>% fashion() %>% sapply(is.numeric)
#> rowname mpg cyl disp hp drat wt qsec vs
#> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> am gear carb
#> FALSE FALSE FALSE
Similarly, missing values are no longer missing values.
mtcars %>% correlate() %>% sapply(function(i) sum(is.na(i)))
#> rowname mpg cyl disp hp drat wt qsec vs
#> 0 1 1 1 1 1 1 1 1
#> am gear carb
#> 1 1 1
mtcars %>% correlate() %>% fashion() %>% sapply(function(i) sum(is.na(i)))
#> rowname mpg cyl disp hp drat wt qsec vs
#> 0 0 0 0 0 0 0 0 0
#> am gear carb
#> 0 0 0
So fashion() is for looking at output, not for continuing to work with it.
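To make that concrete without loading corrr, here is a base-R illustration on hypothetical values shaped like one fashioned column. It shows why numeric checks and is.na() stop working, and that converting back is lossy, which is a good reason to keep the numeric result for analysis and fashion it only for display.

```r
# Hypothetical values shaped like one fashioned column
formatted <- c(".68", "", "-.85")

is.numeric(formatted)     # FALSE: these are character strings
sum(is.na(formatted))     # 0: the blank entry is "", not NA
sum(formatted == "")      # 1: blanks have to be matched by value

# Converting back is possible but lossy: only two decimals survive,
# and "" becomes NA
as.numeric(formatted)
```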
What to use it on
fashion() can be used on most standard R structures, such as scalars, vectors, matrices and data frames:
fashion(10.277)
#> [1] 10.28
fashion(c(10.3785, NA, 87))
#> [1] 10.38 87.00
fashion(matrix(1:4, nrow = 2))
#> V1 V2
#> 1 1.00 3.00
#> 2 2.00 4.00
You can also use it on non-numeric data. In this case, all fashion() does is convert the data to character strings and replace missing values:
fashion("Hello")
#> [1] Hello
fashion(c("Hello", NA), na_print = "World")
#> [1] Hello World
Now is a good time to look back at the opening example, which shows fashion() working on a data frame that includes a factor column (gender).
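One way a single function can cover numbers, strings, factors, matrices and data frames is S3 dispatch. The sketch below is a rough base-R illustration of that idea: the fashion_sketch() generic and all of its methods are hypothetical, written to mimic the behaviour shown above, and are not corrr's source code. Matrices are converted to data frames first, which matches the V1/V2 columns in the matrix example, and the data frame used here is a trimmed copy of the opening one.

```r
# Hypothetical S3 generic for illustration; NOT corrr's implementation.
fashion_sketch <- function(x, decimals = 2, na_print = "") {
  UseMethod("fashion_sketch")
}

# Vectors (numeric, character, factor): format, replace NAs, align
fashion_sketch.default <- function(x, decimals = 2, na_print = "") {
  if (is.numeric(x)) {
    out <- formatC(as.double(x), format = "f", digits = decimals)  # fixed decimals
    out <- sub("^(-?)0\\.", "\\1.", out)                           # drop leading zero
  } else {
    out <- as.character(x)  # non-numeric data is only converted to text
  }
  out[is.na(x)] <- na_print
  noquote(format(out, justify = "right"))
}

# Data frames: fashion each column and keep the data frame shape
fashion_sketch.data.frame <- function(x, decimals = 2, na_print = "") {
  cols <- lapply(x, function(col) unclass(fashion_sketch(col, decimals, na_print)))
  noquote(as.data.frame(cols, stringsAsFactors = FALSE))
}

# Matrices: treat as a data frame (columns become V1, V2, ...)
fashion_sketch.matrix <- function(x, decimals = 2, na_print = "") {
  fashion_sketch(as.data.frame(x), decimals, na_print)
}

d <- data.frame(
  gender = factor(c("Male", "Female", NA)),
  fte = c(NA, .78273, .9)
)
fashion_sketch(d)
fashion_sketch(matrix(1:4, nrow = 2))
fashion_sketch(c("Hello", NA), na_print = "World")
```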
Exporting
Don’t forget that it’s easy to export your fashioned output with something like:
my_data %>% fashion() %>% write.csv("fashioned_file.csv")
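Two base-R write.csv() details are worth knowing here: row.names = FALSE drops the extra index column that write.csv() adds by default, and reading the file back with colClasses = "character" keeps every value as text, so you can check that blanks survived. The sketch below uses a small hand-made data frame with hypothetical values standing in for fashioned output, and a temporary file rather than a real path.

```r
# Hypothetical stand-in for fashioned output (all columns are text)
fashioned <- data.frame(
  rowname = c("mpg", "cyl"),
  mpg     = c("", "-.85"),
  cyl     = c("-.85", ""),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
# row.names = FALSE avoids an extra unnamed index column in the file
write.csv(fashioned, path, row.names = FALSE)

# Read everything back as character to inspect exactly what was written
back <- read.csv(path, colClasses = "character")
back
```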
So what are you waiting for? Go forth and fashion()!
Sign off
Thanks for reading and I hope this was useful for you.
For updates of recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.
If you’d like the code that produced this blog, check out the blogR GitHub repository.