Five ways to calculate internal consistency  

Let’s get psychometric and learn a range of ways to compute the internal consistency of a test or questionnaire in R. We’ll be covering:

If you’re unfamiliar with any of these, here are some resources to get you up to speed:

The data #

For this post, we’ll be using data on a Big 5 measure of personality that is freely available from Personality Tests. You can download the data yourself HERE, or running the following code will handle the downloading and save the data as an object called d:

temp <- tempfile()
download.file("http://personality-testing.info/_rawdata/BIG5.zip", temp, mode="wb")
d <- read.table(unz(temp, "BIG5/data.csv"), header = TRUE, sep="\t")
unlink(temp); rm(temp)

At the time this post was written, this data set contained data for 19719 people, starting with some demographic information and then their responses on 50 items: 10 for each Big 5 dimension. This is a bit much, so let’s cut it down to work on the first 500 participants and the Extraversion items (E1 to E10):

d <- d[1:500, paste0("E", 1:10)]
str(d)
#> 'data.frame':    500 obs. of  10 variables:
#>  $ E1 : int  4 2 5 2 3 1 5 4 3 1 ...
#>  $ E2 : int  2 2 1 5 1 5 1 3 1 4 ...
#>  $ E3 : int  5 3 1 2 3 2 5 5 5 2 ...
#>  $ E4 : int  2 3 4 4 3 4 1 3 1 5 ...
#>  $ E5 : int  5 3 5 3 3 1 5 5 5 2 ...
#>  $ E6 : int  1 3 1 4 1 3 1 1 1 4 ...
#>  $ E7 : int  4 1 1 3 3 2 5 4 5 1 ...
#>  $ E8 : int  3 5 5 4 1 4 4 3 2 4 ...
#>  $ E9 : int  5 1 5 4 3 1 4 4 5 1 ...
#>  $ E10: int  1 5 1 5 5 5 1 3 3 5 ...

Here is a list of the extraversion items that people are rating from 1 = Disagree to 5 = Agree:

You can see that there are five items that need to be reverse scored (E2, E4, E6, E8, E10). Because ratings range from 1 to 5, we can do the following:

d[, paste0("E", c(2, 4, 6, 8, 10))] <- 6 - d[, paste0("E", c(2, 4, 6, 8, 10))]

We’ve now got a data frame of responses with each column being an item (scored in the correct direction) and each row being a participant. Let’s get started!

Average inter-item correlation #

The average inter-item correlation is any easy place to start. To calculate this statistic, we need the correlations between all items, and then to average them. Let’s use my corrr package to get these correlations as follows (no bias here!):

library(corrr)
d %>% correlate()
#> # A tibble: 10 x 11
#>    rowname        E1        E2        E3        E4        E5        E6
#>      <chr>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#> 1       E1        NA 0.4528892 0.5002331 0.5237516 0.5378563 0.3657830
#> 2       E2 0.4528892        NA 0.4792026 0.5549109 0.5916995 0.5694593
#> 3       E3 0.5002331 0.4792026        NA 0.4930294 0.6161848 0.3296186
#> 4       E4 0.5237516 0.5549109 0.4930294        NA 0.5123502 0.4714739
#> 5       E5 0.5378563 0.5916995 0.6161848 0.5123502        NA 0.4996636
#> 6       E6 0.3657830 0.5694593 0.3296186 0.4714739 0.4996636        NA
#> 7       E7 0.6360617 0.4731673 0.5684515 0.4999339 0.6205428 0.3725773
#> 8       E8 0.4498123 0.3791802 0.4177098 0.4505763 0.3850780 0.3310247
#> 9       E9 0.5280366 0.3958574 0.4753085 0.4631383 0.4851778 0.3280024
#> 10     E10 0.4908861 0.4487868 0.5000737 0.5234228 0.5525188 0.4137737
#> # ... with 4 more variables: E7 <dbl>, E8 <dbl>, E9 <dbl>, E10 <dbl>

Because the diagonal is already set to NA, we can obtain the average correlation of each item with all others by computing the means for each column (excluding the rowname column):

inter_item <- d %>% correlate() %>% select(-rowname) %>% colMeans(na.rm = TRUE)
inter_item
#>        E1        E2        E3        E4        E5        E6        E7 
#> 0.4983678 0.4827948 0.4866458 0.4991764 0.5334524 0.4090418 0.5133091 
#>        E8        E9       E10 
#> 0.4270190 0.4731896 0.4814496

Aside, note that select() comes from the dplyr package, which is imported when you use corrr.

We can see that E5 and E7 are more strongly correlated with the other items on average than E8. However, most items correlate with the others in a reasonably restricted range around .4 to .5.

To obtain the overall average inter-item correlation, we calculate the mean() of these values:

mean(inter_item)
#> [1] 0.4804446

However, with these values, we can explore a range of attributes about the relationships between the items. For example, we can visualise them in a histogram and highlight the mean as follows:

library(ggplot2)

data.frame(inter_item) %>% 
  ggplot(aes(x = inter_item)) +
    geom_histogram(bins = 10, alpha = .5) +
    geom_vline(xintercept = mean(inter_item), color = "red") +
    xlab("Mean inter-item correlation") +
    theme_bw()

unnamed-chunk-8-1.png

Average item-total correlation #

We can investigate the average item-total correlation in a similar way to the inter-item correlations. The first thing we need to do is calculate the total score. Let’s say that a person’s score is the mean of their responses to all ten items:

d$score <- rowMeans(d)
head(d)
#>   E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 score
#> 1  4  4  5  4  5  5  4  3  5   5   4.4
#> 2  2  4  3  3  3  3  1  1  1   1   2.2
#> 3  5  5  1  2  5  5  1  1  5   5   3.5
#> 4  2  1  2  2  3  2  3  2  4   1   2.2
#> 5  3  5  3  3  3  5  3  5  3   1   3.4
#> 6  1  1  2  2  1  3  2  2  1   1   1.6

Now, we’ll correlate() everything again, but this time focus() on the correlations of the score with the items:

item_total <- d %>% correlate() %>% focus(score)
item_total
#> # A tibble: 10 x 2
#>    rowname     score
#>      <chr>     <dbl>
#> 1       E1 0.7520849
#> 2       E2 0.7297506
#> 3       E3 0.7350644
#> 4       E4 0.7485092
#> 5       E5 0.7945943
#> 6       E6 0.6367799
#> 7       E7 0.7768180
#> 8       E8 0.6640914
#> 9       E9 0.7273987
#> 10     E10 0.7306038

Again, we can calculate their mean as:

mean(item_total$score)
#> [1] 0.7295695

And we can plot the results:

item_total %>% 
  ggplot(aes(x = score)) +
    geom_histogram(bins = 10, alpha = .5) +
    geom_vline(xintercept = mean(item_total$score), color = "red") +
    xlab("Mean item-total correlation") +
    theme_bw()

unnamed-chunk-12-1.png

Cronbach’s alpha #

Cronbach’s alpha is one of the most widely reported measures of internal consistency. Although it’s possible to implement the maths behind it, I’m lazy and like to use the alpha() function from the psych package. This function takes a data frame or matrix of data in the structure that we’re using: each column is a test/questionnaire item, each row is a person. Let’s test it out below. Note that alpha() is also a function from the ggplot2 package, and this creates a conflict. To specify that we want alpha() from the psych package, we will use psych::alpha()

d$score <- NULL  # delete the score column we made earlier

psych::alpha(d)
#> 
#> Reliability analysis   
#> Call: psych::alpha(x = d)
#> 
#>   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd
#>        0.9       0.9     0.9      0.48 9.2 0.0065    3 0.98
#> 
#>  lower alpha upper     95% confidence boundaries
#> 0.89 0.9 0.91 
#> 
#>  Reliability if an item is dropped:
#>     raw_alpha std.alpha G6(smc) average_r S/N alpha se
#> E1       0.89      0.89    0.89      0.48 8.2   0.0073
#> E2       0.89      0.89    0.89      0.48 8.3   0.0072
#> E3       0.89      0.89    0.89      0.48 8.3   0.0073
#> E4       0.89      0.89    0.89      0.48 8.2   0.0073
#> E5       0.89      0.89    0.89      0.47 7.9   0.0075
#> E6       0.90      0.90    0.90      0.50 8.9   0.0068
#> E7       0.89      0.89    0.89      0.47 8.1   0.0075
#> E8       0.90      0.90    0.90      0.49 8.8   0.0069
#> E9       0.89      0.89    0.89      0.48 8.4   0.0071
#> E10      0.89      0.89    0.89      0.48 8.3   0.0072
#> 
#>  Item statistics 
#>       n raw.r std.r r.cor r.drop mean  sd
#> E1  500  0.75  0.75  0.72   0.68  2.6 1.3
#> E2  500  0.73  0.73  0.70   0.66  3.2 1.3
#> E3  500  0.74  0.74  0.70   0.67  3.3 1.3
#> E4  500  0.75  0.75  0.72   0.68  2.8 1.3
#> E5  500  0.79  0.80  0.78   0.74  3.3 1.3
#> E6  500  0.64  0.64  0.59   0.55  3.6 1.3
#> E7  500  0.78  0.77  0.75   0.70  2.8 1.5
#> E8  500  0.66  0.66  0.61   0.58  2.7 1.3
#> E9  500  0.73  0.72  0.68   0.64  3.1 1.5
#> E10 500  0.73  0.73  0.69   0.66  2.3 1.3
#> 
#> Non missing response frequency for each item
#>        1    2    3    4    5 miss
#> E1  0.28 0.23 0.23 0.16 0.10    0
#> E2  0.12 0.21 0.22 0.21 0.23    0
#> E3  0.09 0.20 0.25 0.22 0.24    0
#> E4  0.19 0.26 0.23 0.21 0.12    0
#> E5  0.11 0.21 0.20 0.23 0.25    0
#> E6  0.09 0.15 0.17 0.28 0.30    0
#> E7  0.28 0.20 0.17 0.16 0.20    0
#> E8  0.23 0.27 0.19 0.19 0.12    0
#> E9  0.19 0.21 0.17 0.19 0.25    0
#> E10 0.36 0.27 0.12 0.15 0.09    0

This function provides a range of output, and generally what we’re interested in is std.alpha, which is “the standardised alpha based upon the correlations”. Also note that we get “the average interitem correlation”, average_r, and various versions of “the correlation of each item with the total score” such as raw.r, whose values match our earlier calculations.

If you’d like to access the alpha value itself, you can do the following:

psych::alpha(d)$total$std.alpha
#> [1] 0.9024126

Split-half reliability (adjusted using the Spearman–Brown prophecy formula) #

There are times when we can’t calculate internal consistency using item responses. For example, I often work with a decision-making variable called recklessness. This variable is calculated after people answer questions (e.g., “What is the longest river is Asia”), and then decide whether or not to bet on their answer being correct. Recklessness is calculated as the proportion of incorrect answers that a person bets on.

If you think about it, it’s not possible to calculate internal consistency for this variable using any of the above measures. The reason for this is that the items that contribute to two people’s recklessness scores could be completely different. One person could give incorrect answers on questions 1 to 5 (thus these questions go into calculating their score), while another person might incorrectly respond to questions 6 to 10. Thus, calculating recklessness for many individuals isn’t as simple as summing across items. Instead, we need an item pool from which to pull different combinations of questions for each person.

To overcome this sort of issue, an appropriate method for calculating internal consistency is to use a split-half reliability. This entails splitting your test items in half (e.g., into odd and even) and calculating your variable for each person with each half. For example, I typically calculate recklessness for each participant from odd items and then from even items. These scores are then correlated and adjusted using the Spearman-Brown prophecy/prediction formula (for examples, see some of my publications such as this or this). Similar to Cronbach’s alpha, a value closer to 1 and further from zero indicates greater internal consistency.

We can still calculate split-half reliability for variables that do not have this problem! So let’s do this with our extraversion data as follows:

# Calculating total score...
score_e <- rowMeans(d[, c(TRUE, FALSE)])  # with even items
score_o <- rowMeans(d[, c(FALSE, TRUE)])  # with odd items

# Correlating scores from even and odd items
r <- cor(score_e, score_o)
r
#> [1] 0.7681034

# Adjusting with the Spearman-Brown prophecy formula
(2 * r) / (1 + r)
#> [1] 0.8688445

Thus, in this case, the split-half reliability approach yields an internal consistency estimate of .87.

Composite reliability #

The final method for calculating internal consistency that we’ll cover is composite reliability. Where possible, my personal preference is to use this approach. Although it’s not perfect, it takes care of many inappropriate assumptions that measures like Cronbach’s alpha make. If the specificities interest you, I suggest reading this post.

Composite reliability is based on the factor loadings in a confirmatory factor analysis (CFA). In the case of a unidimensional scale (like extraversion here), we define a one-factor CFA, and then use the factor loadings to compute our internal consistency estimate. I won’t go into the detail, but we can interpret a composite reliability score similarly to any of the other metrics covered here (closer to one indicates better internal consistency). We’ll fit our CFA model using the lavaan package as follows:

library(lavaan)

# Define the model
items <- paste(names(d), collapse = "+")
model <- paste("extraversion", items, sep = "=~")
model
#> [1] "extraversion=~E1+E2+E3+E4+E5+E6+E7+E8+E9+E10"

# Fit the model
fit <- cfa(model, data = d)

There are various ways to get to the composite reliability from this model. After receiving a great suggestion from Gaming Dude, a nice approach is to use reliability() from the semTools package as follows:

library(semTools)
reliability(fit)
#>        extraversion     total
#> alpha     0.9022248 0.9022248
#> omega     0.9032530 0.9032530
#> omega2    0.9032530 0.9032530
#> omega3    0.9023035 0.9023035
#> avevar    0.4859231 0.4859231

You can see that this function returns a matrix with five reliability estimates for our factor (including Cronbach’s alpha). In this case, we’re interested in omega, but looking across the range is always a good idea. A nice advantage to this function is that it will return the reliability estimates for all latent factors in a more complex model!

Below is the original method I had posted, involving a “by-hand” extraction of the factor loadings and computation of the omega composite reliability.

sl <- standardizedSolution(fit)
sl <- sl$est.std[sl$op == "=~"]
names(sl) <- names(d)
sl  # These are the standardized factor loadings for each item
#>        E1        E2        E3        E4        E5        E6        E7 
#> 0.7266721 0.6903172 0.7154705 0.7118762 0.7841427 0.5791526 0.7589250 
#>        E8        E9       E10 
#> 0.5962855 0.6726232 0.6940120

# Compute residual variance of each item
re <- 1 - sl^2

# Compute composite reliability
sum(sl)^2 / (sum(sl)^2 + sum(re))
#> [1] 0.9029523

There you have it. The reason for me mentioning this approach is that it will give you an idea of how to extract the factor loadings if you want to visualise more information like we did earlier with the correlations. I’ll leave this part up to you!

Sign off #

Thanks for reading and I hope this was useful for you.

For updates of recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.

 
80
Kudos
 
80
Kudos

Now read this

k-fold cross validation with modelr and broom

@drsimonj here to discuss how to conduct k-fold cross validation, with an emphasis on evaluating models supported by David Robinson’s broom package. Full credit also goes to David, as this is a slightly more detailed version of his past... Continue →