# Easy leave-one-out cross validation with pipelearner

@drsimonj here to show you how to do leave-one-out cross validation using pipelearner.

## Leave-one-out cross validation

Leave-one-out is a type of cross validation whereby the following is done for each observation in the data:

- Run model on all other observations
- Use model to predict value for observation

This means that a model is fitted, and a prediction is made, *n* times, where *n* is the number of observations in your data.
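To make the two steps concrete, here is a minimal sketch in base R (not pipelearner). It assumes the built-in `mtcars` data and a linear model predicting `hp` from all other variables; the names `n`, `predicted` and `fit` are just for illustration.

```r
# Manual leave-one-out cross validation using base R only
n <- nrow(mtcars)
predicted <- numeric(n)

for (i in seq_len(n)) {
  # Run the model on all observations except the i-th
  fit <- lm(hp ~ ., data = mtcars[-i, ])
  # Use that model to predict the value for the held-out observation
  predicted[i] <- predict(fit, newdata = mtcars[i, , drop = FALSE])
}

# One prediction per observation
length(predicted)
```

The loop fits the model *n* times, which is exactly the bookkeeping that the rest of this post hands over to pipelearner.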

## Leave-one-out in pipelearner

pipelearner is a package for streamlining machine learning pipelines, including cross validation. If you’re new to it, check out blogR for other relevant posts.

To demonstrate, let’s use regression to predict horsepower (`hp`) with all other variables in the `mtcars` data set. Set this up in pipelearner as follows:

```
library(pipelearner)
pl <- pipelearner(mtcars, lm, hp ~ .)
```

How cross validation is done is handled by `learn_cvpairs()`. For leave-one-out, specify *k = number of rows*:

```
pl <- learn_cvpairs(pl, k = nrow(mtcars))
```
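To see why setting *k* to the number of rows gives leave-one-out, here is a small base R sketch of a generic k-fold assignment. It illustrates the general idea only and is not a description of pipelearner's internals; `fold_id` and `fold_sizes` are hypothetical names.

```r
# Generic k-fold assignment: give every row one of k fold labels
n <- nrow(mtcars)
k <- n  # leave-one-out: as many folds as rows

set.seed(1)  # only so the shuffle is reproducible
fold_id <- sample(rep(seq_len(k), length.out = n))

# With k = n, every fold contains exactly one observation
fold_sizes <- table(fold_id)
all(fold_sizes == 1)
#> [1] TRUE
```

With a smaller `k` (say 5), the same code would produce folds of six or seven rows each, which is ordinary k-fold cross validation.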

Finally, `learn()` the model on all folds:

```
pl <- learn(pl)
```

This can all be written in a pipeline:

```
pl <- pipelearner(mtcars, lm, hp ~ .) %>%
  learn_cvpairs(k = nrow(mtcars)) %>%
  learn()
pl
#> # A tibble: 32 × 9
#>    models.id cv_pairs.id train_p      fit target model     params
#>        <chr>       <chr>   <dbl>   <list>  <chr> <chr>     <list>
#>  1         1          01       1 <S3: lm>     hp    lm <list [1]>
#>  2         1          02       1 <S3: lm>     hp    lm <list [1]>
#>  3         1          03       1 <S3: lm>     hp    lm <list [1]>
#>  4         1          04       1 <S3: lm>     hp    lm <list [1]>
#>  5         1          05       1 <S3: lm>     hp    lm <list [1]>
#>  6         1          06       1 <S3: lm>     hp    lm <list [1]>
#>  7         1          07       1 <S3: lm>     hp    lm <list [1]>
#>  8         1          08       1 <S3: lm>     hp    lm <list [1]>
#>  9         1          09       1 <S3: lm>     hp    lm <list [1]>
#> 10         1          10       1 <S3: lm>     hp    lm <list [1]>
#> # ... with 22 more rows, and 2 more variables: train <list>, test <list>
```

## Evaluating performance

Performance can be evaluated in many ways depending on your model. We will calculate R²:

```
library(tidyverse)
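# Extract true and predicted values of hp for each observation.
# Each test set holds a single row, so both calls return one number per fold.
pl <- pl %>%
  mutate(true      = map2_dbl(test, target, ~as.data.frame(.x)[[.y]]),
         predicted = map2_dbl(fit, test, predict))
# Summarise results.
# Note: sst is computed around zero (sum(true^2)), not around mean(true),
# so r_squared below is an uncentered R-squared.
results <- pl %>%
  summarise(
    sse = sum((predicted - true)^2),
    sst = sum(true^2)
  ) %>%
  mutate(r_squared = 1 - sse / sst)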
results
#> # A tibble: 1 × 3
#>        sse    sst r_squared
#>      <dbl>  <dbl>     <dbl>
#> 1 41145.56 834278 0.9506812
```

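Using leave-one-out cross validation, the regression model obtains an R² of 0.95 when generalizing to predict horsepower in new data. Note that the total sum of squares here is `sum(true^2)`, taken around zero rather than around the mean of `hp`, so this is an uncentered R² and will be at least as high as the conventional mean-centered version.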
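If you want to compare the two versions of R², here is a hedged sketch in base R. It does not use pipelearner: it recomputes leave-one-out predictions of `hp` with a plain loop over `mtcars`, and the names `sst_uncentered`, `sst_centered`, `r2_uncentered` and `r2_centered` are made up for this example.

```r
# Leave-one-out predictions of hp using base R (not pipelearner)
true <- mtcars$hp
predicted <- sapply(seq_len(nrow(mtcars)), function(i) {
  fit <- lm(hp ~ ., data = mtcars[-i, ])
  unname(predict(fit, newdata = mtcars[i, , drop = FALSE]))
})

sse <- sum((predicted - true)^2)

# Total sum of squares around zero, matching sum(true^2) in the summary above
sst_uncentered <- sum(true^2)
# Conventional total sum of squares around the mean
sst_centered <- sum((true - mean(true))^2)

r2_uncentered <- 1 - sse / sst_uncentered
r2_centered   <- 1 - sse / sst_centered

c(uncentered = r2_uncentered, centered = r2_centered)
```

Because the centered total sum of squares can never exceed the uncentered one, the centered R² is always the more conservative of the two.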

We’ll conclude with a plot of each true data point and its predicted value:

```
pl %>%
  ggplot(aes(true, predicted)) +
    geom_point(size = 2) +
    # Dashed identity line: points on it are predicted perfectly
    geom_abline(intercept = 0, slope = 1, linetype = 2) +
    theme_minimal() +
    labs(x = "True value", y = "Predicted value") +
    ggtitle("True against predicted values based\non leave-one-out cross validation")
```

## Sign off

Thanks for reading and I hope this was useful for you.

For updates of recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.