THE SPEARMAN'S RANK-ORDER CORRELATION COEFFICIENT

In the previous post, we learned how to test a relationship between two categorical/nominal variables. This post discusses how to test a relationship between two variables, each of which has an ordinal or higher level of measurement. The correlation coefficient that can be used for this level of data is Spearman’s rho (or Spearman’s Rank-Order Correlation Coefficient), denoted by $\rho_{s}$. This coefficient was introduced by Charles Edward Spearman (1863-1945), a British psychologist who contributed greatly to the development of statistics. He was a pioneer of factor analysis, a technique that is widely used in quantitative research in the social sciences.

Testing the relationship using Spearman’s rho is a nonparametric statistical test. Thus, the population from which the sample was drawn need not follow any particular theoretical probability distribution. Because it does not require this assumption, Spearman’s rho is commonly used as an alternative when the population is not normally distributed. (If the population is normally distributed, Pearson’s Product-Moment Correlation can be used to test the relationship.) Spearman’s correlation coefficient is calculated from the ranks of the values of the first variable and the ranks of the values of the second variable. (See the explanation below.)

Spearman’s rho computed from the sample is $r_s = 1 - \frac{6 \Sigma_{i=1}^{n} {d_i}^2}{n^3 - n}$ , where n is the sample size, and for i = 1, 2, …, n, $d_i$ is the difference between the rank of the i-th observation on the first variable and the rank of the i-th observation on the second variable. (This formula assumes that no values are tied within either variable.) For example, suppose there are two variables X and Y, and the sample shows the values of X and Y as follows.

Table 1

First, we assign a rank to each value for each variable. For example, for variable X, we assign a rank of 1 for the smallest value in X, which is 15. The next larger number after 15 is 19; therefore, we assign a rank of 2 to 19, and so on, until the largest value of X is 50, to which we assign a rank of 10. Do the same for the values of the variable Y. The complete ranks are shown in the following table.

Table 2
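The ranking step can be sketched in Python as follows. This is only a minimal illustration, not the procedure used by SPSS. The function name rank_values and the data are hypothetical (the values merely echo the description above, with 15 as the smallest value, 19 as the next, and 50 as the largest), and the sketch assumes that no two values are tied.

```python
def rank_values(values):
    """Rank distinct values from smallest (rank 1) to largest (rank n).

    Assumption: no tied values, as in the example described in the text.
    """
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


# Hypothetical values for X (not the data of Table 1)
x = [32, 15, 47, 19, 50, 28, 41, 23, 36, 44]
rank_x = rank_values(x)
print(rank_x)  # [5, 1, 9, 2, 10, 4, 7, 3, 6, 8]
```

The same function would be applied to the values of Y to obtain rank(Y).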

In Table 2, column rank(X) includes the ranks assigned to the values of variable X, and rank(Y) contains the ranks assigned to the values of variable Y. Then, add two more columns, namely, column $d_i$ , which contains the difference between rank(X) and rank(Y), and column ${d_i}^2$ . See Table 3.

Table 3

In this example, the sum of the ${d_i}^2$ column is 138 and n = 10, so Spearman’s rho is: $r_s = 1 - \frac{6 \Sigma_{i=1}^{n} {d_i}^2}{n^3 - n}=1- \frac{6 \cdot 138}{10^3 - 10} \approx 0.164$ . The following is an SPSS output, based on the sample.

Table 4

The table displays the Spearman’s correlation coefficient from the samples, which is 0.164, as produced by the previous calculation. By only considering this value, we cannot determine whether there is a relationship between X and Y. To determine whether there is a relationship between the two, we need to conduct a hypothesis test regarding ρ_s. This will be discussed later in this article.
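The hand calculation of r_s can be reproduced with a short Python sketch. The function name spearman_rho and the five pairs of ranks are hypothetical and serve only as an illustration. The last line repeats the arithmetic with the figures quoted in this post (a sum of squared differences of 138 and n = 10). The sketch assumes that there are no tied ranks.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two lists of ranks (assumes no tied ranks)."""
    n = len(rank_x)
    d = [rx - ry for rx, ry in zip(rank_x, rank_y)]  # column d_i
    d_squared = [di ** 2 for di in d]                # column d_i^2
    return 1 - 6 * sum(d_squared) / (n ** 3 - n)


# Hypothetical ranks for five observations (not the article's data)
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]
print(spearman_rho(rank_x, rank_y))

# Arithmetic check with the figures quoted above: sum of d_i^2 = 138, n = 10
print(round(1 - 6 * 138 / (10 ** 3 - 10), 3))  # 0.164
```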

The following are some important notes related to the correlation test using Spearman’s rho:

Spearman’s rho ranges from -1 to 1. A ρ_s value close to zero indicates a weak relationship. A ρ_s value close to 1 or -1 indicates a strong relationship.
If ρ_s > 0, then the two variables move in the same direction. This means that large values of X tend to occur with large values of Y, and small values of X tend to occur with small values of Y.
If ρ_s < 0, then the two variables move in opposite directions. This means that large values of X tend to occur with small values of Y, and small values of X tend to occur with large values of Y.
If ρ_s = 0, it cannot be concluded that the two variables are unrelated, because the relationship may not be monotonic.
The condition ρ_s ≠ 0 does not automatically allow us to conclude that there is a causal relationship between the two, even though the test is significant. Causal relationships cannot be determined by relying only on statistical significance tests.
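The notes above can be illustrated with a small Python sketch. The helper rho_from_ranks and all rank patterns are hypothetical, and the sketch assumes that there are no tied ranks. Identical ranks give a coefficient of 1, fully reversed ranks give -1, and a pattern that rises sharply and then falls steadily gives 0 even though the ranks are clearly not arranged at random.

```python
def rho_from_ranks(rank_x, rank_y):
    """Spearman's rho from two lists of ranks (assumes no tied ranks)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


rank_x = [1, 2, 3, 4, 5]

same_direction = rho_from_ranks(rank_x, [1, 2, 3, 4, 5])      # identical ranks
opposite_direction = rho_from_ranks(rank_x, [5, 4, 3, 2, 1])  # reversed ranks
up_then_down = rho_from_ranks(rank_x, [1, 5, 4, 3, 2])        # non-monotonic pattern

print(same_direction, opposite_direction, up_then_down)  # 1.0 -1.0 0.0
```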

SIGNIFICANCE OF SPEARMAN’S RHO TEST

To test whether there is a relationship between two variables with ordinal data levels or higher, three possible alternative hypotheses (H₁) can be proposed.

Possibility I: ρ_s ≠ 0 (two-tailed test)

Possibility II: ρ_s > 0 (one-tailed test, right-tailed test)

Possibility III: ρ_s < 0 (one-tailed test, left-tailed test)

The null hypothesis in this test is: “There is no relationship between the two variables” (ρ_s = 0).

In deciding whether to reject the null hypothesis, we can use the table of critical values for the Spearman correlation coefficient, or, for sample size n > 100, the following t statistic: $t = r_s \sqrt{\frac{n-2}{1-{r_s}^2}}$ with degrees of freedom ν = n-2.

The use of critical value tables

The critical value used depends on the sample size (n) and the level of significance (α). In the table, the row headings indicate the sample size, and the column headings indicate the level of significance. If the test is one-tailed, the column heading is indicated by α(1). If the test is two-tailed, the column heading is indicated by α(2). For example, if the sample size is 15 and the test is one-tailed with a significance level of 0.05, then based on the table, the critical value of r_s is 0.446. If the sample size is 20 and the test is two-tailed with a significance level of 0.01, the critical value of r_s is 0.570.

The following table is used to decide whether the null hypothesis is rejected.

Table 5
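As a rough sketch, the decision rule can be written in Python as follows. The function name reject_h0, the labels for the three alternative hypotheses, and the r_s values are hypothetical. The one-tailed rules are the usual ones and are an assumption here. The two-tailed rule matches the form "reject H₀ if r_s < -c or r_s > c", where c is the critical value. The critical values 0.446 and 0.570 are the ones read from the table in the examples above.

```python
def reject_h0(r_s, critical_value, alternative="two-sided"):
    """Return True if H0 is rejected for the given alternative hypothesis."""
    if alternative == "two-sided":  # H1: rho_s != 0
        return r_s < -critical_value or r_s > critical_value
    if alternative == "greater":    # H1: rho_s > 0 (right-tailed)
        return r_s > critical_value
    if alternative == "less":       # H1: rho_s < 0 (left-tailed)
        return r_s < -critical_value
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


# n = 15, one-tailed test, alpha = 0.05: critical value 0.446 (from the table)
print(reject_h0(0.52, 0.446, alternative="greater"))     # hypothetical r_s; True

# n = 20, two-tailed test, alpha = 0.01: critical value 0.570 (from the table)
print(reject_h0(-0.40, 0.570, alternative="two-sided"))  # hypothetical r_s; False
```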

The use of the t statistic

The Spearman correlation critical value table does not provide a critical value if the sample size is greater than 100. In that case, to decide whether to reject the null hypothesis, we use the statistic $t = r_s \sqrt{\frac{n-2}{1-{r_s}^2}}$ with degrees of freedom ν = n-2. Use the following table to determine the H₀ rejection criteria.

Table 6
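The t statistic itself is straightforward to compute. The following Python sketch uses a hypothetical function name (spearman_t) and hypothetical inputs (r_s = 0.25 from a sample of n = 120). The resulting value would then be compared with a t critical value for ν = n - 2 degrees of freedom.

```python
import math


def spearman_t(r_s, n):
    """t statistic for Spearman's rho, intended for large samples (n > 100)."""
    return r_s * math.sqrt((n - 2) / (1 - r_s ** 2))


# Hypothetical example: r_s = 0.25 computed from n = 120 observations
t_value = spearman_t(0.25, 120)
degrees_of_freedom = 120 - 2
print(round(t_value, 3), degrees_of_freedom)
```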

To determine the critical value for a one-tailed test, we can use a table of t critical values or Microsoft Excel. In Excel, use the formula =ABS(T.INV()). For example, if α = 0.05 and the degrees of freedom are ν = 13, enter =ABS(T.INV(0.05;13)) in a worksheet cell, and Excel will return 1.771. To determine the critical value for a two-tailed test, we can use the table of t critical values or the formula =T.INV.2T(). For example, the t critical value for a 0.05 significance level and 18 degrees of freedom is obtained by entering =T.INV.2T(0.05;18), which returns 2.101. (The argument separator depends on Excel's regional settings; some installations use a comma instead of a semicolon.)
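For readers without Excel, the same critical values can be approximated using only the Python standard library. The sketch below is an illustration under stated assumptions, not a replacement for a statistics package. The function names t_pdf, t_cdf, and t_critical are hypothetical. The cumulative probability is obtained by numerically integrating the t density with Simpson's rule, and the critical value is found by bisection.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha, df, tails=1):
    """Positive critical value with upper-tail area alpha / tails, found by bisection."""
    target = 1 - alpha / tails
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.05, 13, tails=1), 3))  # compare with =ABS(T.INV(0.05;13))
print(round(t_critical(0.05, 18, tails=2), 3))  # compare with =T.INV.2T(0.05;18)
```

The two printed values should agree with the Excel results quoted above (1.771 and 2.101) to three decimal places.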

CASE STUDY

An example of research in the field of communication science that applies the Spearman correlation coefficient is the study by Adeyeye et al. (2019) entitled “Data on New Media Use for Agricultural Training and Research at Agricultural Services and Training Center”. Adeyeye et al. hypothesized that there was a relationship between the use of the internet as a source of farmer training materials and the use of new media in training farmers. To test the hypothesis, the researchers gathered a sample of size 47. The steps of the test are as follows.

Step 1 (Formulating the hypotheses)

H₀: There is no relationship between the use of the internet as a source of farmer training materials and the use of new media in training farmers.

H₁: ρ_s ≠ 0 [There is a relationship between the use of the internet as a source of farmer training materials and the use of new media in training farmers.]

Step 2 (Selecting the level of significance)

α = 0.05

Step 3 (Determining the test statistic and the rejection criterion)

The appropriate test statistic is $r_s = 1 - \frac{6 \Sigma_{i=1}^{n} {d_i}^2}{n^3 - n}$ . To determine the rejection criterion, we use the Spearman correlation coefficient critical values table. For n = 47 and a two-tailed test at α = 0.05, the table gives a critical value of 0.288. Looking at Table 5, we obtain the following H₀ rejection criterion: Reject H₀ if r_s < -0.288 or r_s > 0.288.

Step 4 (Finding the computed test statistic)

The sample data were recorded in a .sav file, which is available for download. In the file, there are two variables, namely, internet and newmedia. The internet variable represents the use of the internet as a source of farmer training materials, and the newmedia variable represents the use of new media in training farmers. Below is the SPSS output.

Table 7

From the table above, r_s = -0.087. Because -0.288 ≤ -0.087 ≤ 0.288, the r_s value does not meet the H₀ rejection criterion from the previous step. Therefore, we do not reject H₀.

Step 5 (Conclusion)

There is no significant relationship between the use of the internet as a source of farmer training materials and the use of new media in training farmers.

PROBLEM 1

A researcher hypothesizes that there is a relationship between the grades of the Statistics course and the grades of the undergraduate thesis. For this purpose, a sample of 10 graduates was randomly selected, and the results are summarized in the following table.

At a significance level of 0.05, can it be concluded that there is such a relationship?

PROBLEM 2

Do husbands and wives like the same TV shows? A recent study by Nielsen Media Research asked young married couples to rank shows from best to worst. A rank of 1 indicated the most liked show, and a rank of 14 indicated the least liked show. The following is the sample result from one married couple.

At the 0.05 level of significance, is it reasonable to conclude that there is a positive association between the two ratings? Source: Lind, D. A., Marchal, W. G., & Wathen, S. A. (2012). Statistical techniques in business & economics (15th ed.). McGraw-Hill Irwin.

The presentation file can be downloaded here.

The Table of Critical Values of Spearman’s Ranked Correlation Coefficient can be downloaded here.

On 16/05/2025
