Statistics can only process quantitative data? BIG MISTAKE! Statistics can process all types of data; the only difference is the method of processing. In this article, we will learn how to test for a relationship between two variables whose values are qualitative.
The following example is inspired by a communication study by Zhao and Gantz, who analyzed the use of disruptive and cooperative interruptions by male and female television characters in prime-time fiction. (Their journal article can be downloaded here.) However, for educational purposes, this article simplifies the actual hypothesis they proposed. In this case, the hypothesis is that there is a relationship between the gender of the interrupter and the type of interruption. The gender of the interrupter is a nominal variable that can take the values “MALE” or “FEMALE.” The type of interruption is also a nominal variable that can take the values “DISRUPTIVE” or “COOPERATIVE.”
To check for a relationship between two nominal variables, we can use Cramer’s V formula as follows: $V = \sqrt{\dfrac{\chi^2}{n(L-1)}}$. In this formula, $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \dfrac{(O_{ij}-E_{ij})^2}{E_{ij}}$, n is the sample size, and L is the smaller of r and c. (The meaning of the other symbols is given in the description below.) Cramer’s V measures the strength of the relationship between two variables at the nominal level. The range of V values is 0 ≤ V ≤ 1. Therefore, the lowest value of V is 0 (which indicates no relationship at all) and the highest value is 1 (which indicates a perfect relationship).
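To make the range of V concrete, here is a minimal Python sketch that computes Cramer’s V from a table of counts and evaluates it at the two extremes. The function names and both tables are hypothetical, invented purely for illustration; they are not data from the study.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * (L - 1))), with L = min(r, c)."""
    n = sum(sum(row) for row in table)
    smaller_dim = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (smaller_dim - 1)))

# Hypothetical tables (assumptions, not study data).
independent = [[20, 30], [40, 60]]   # same proportions in both rows
perfect = [[50, 0], [0, 50]]         # each row maps to exactly one column

v_independent = cramers_v(independent)
v_perfect = cramers_v(perfect)
print(v_independent, v_perfect)
```

The first table has identical column proportions in both rows, so V sits at the bottom of the range; in the second, knowing the row determines the column, so V sits at the top.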
To check for the existence of such a relationship, we perform the usual steps of statistical hypothesis testing. Here is an example of how it is performed. (See the STEP-BY-STEP EXPLANATION at the end of this article.)
Step 1: Formulate the hypotheses
H0: V = 0 [There is no relationship between the gender of the interrupter and the type of interruption.]
H1: V ≠ 0 [There is a relationship between the gender of the interrupter and the type of interruption.]
Step 2: Determine the level of significance
α = 0.05
Step 3: Determine the test statistic and critical region
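Test statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \dfrac{(O_{ij}-E_{ij})^2}{E_{ij}}$
with degrees of freedom ν = (r-1)(c-1).
The critical value is 3.841. The critical/rejection region is $\chi^2 > 3.841$.
Rejection criterion: reject H0 if $\chi^2 > 3.841$.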
Step 4: Calculate the value of the test statistic from the samples
The sampling result data can be downloaded via the following link: (click here)
Using SPSS Version 21, we have the following output.
Table 1
The value of the test statistic from the samples can be obtained from Table 1, namely, χ² = 1.007. This value is less than the critical value (3.841). Therefore, we cannot reject H0.
Step 5: Draw conclusions
The samples do not provide sufficient evidence that there is a relationship between the gender of the interrupter and the type of interruption.
STEP-BY-STEP EXPLANATION
Explanation for Step 1
Formulating hypotheses is always the first step in any statistical hypothesis test. To test the relationship between two nominal variables, the null hypothesis always states that there is no relationship (V=0), and the alternative hypothesis states that there is a relationship (V≠0). Please note that the test using Cramer’s V is always non-directional (2-tailed): because the variables are nominal, it makes no sense to state that the value of one variable increases or decreases as the other increases or decreases. The test only checks whether a relationship exists.
Explanation for Step 2
In actual research practice, researchers can choose their own level of significance. In communication science research, however, the significance level is commonly set at 0.05.
Explanation for Step 3
The test statistic for testing the relationship between two nominal variables is $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \dfrac{(O_{ij}-E_{ij})^2}{E_{ij}}$ with degrees of freedom ν = (r-1)(c-1). The meaning of the symbols in this formula is given in the Step 4 Explanation below. The critical value can be obtained by typing =CHISQ.INV.RT(0.05;1) in a cell of an Excel worksheet, which displays the value 3.841. (Depending on your regional settings, Excel may expect a comma instead of a semicolon between the arguments.) The CHISQ.INV.RT function has two parameters, namely the level of significance (α) and the degrees of freedom (ν). In this case, α = 0.05 has been set. Here, r = 2 and c = 2 (see Step 4 Explanation below), so that ν = (2-1)×(2-1) = 1. Accordingly, the parameters of the function are 0.05 and 1. Next, the H0 rejection region is shown in the following figure.
Figure 1
In the figure above, the area of the red shaded region is the significance level, i.e., 0.05. Thus, if the value of the test statistic from the samples falls in the region, then H0 is rejected.
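For readers without Excel, the same critical value can be sketched in Python using only the standard library. This relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so the shortcut below is an assumption that holds only for ν = 1; for other degrees of freedom a statistics library would be needed.

```python
from statistics import NormalDist

alpha = 0.05

# For nu = 1 ONLY: a chi-square variable is the square of a standard normal
# variable, so the upper-alpha critical value equals z_(1 - alpha/2) squared.
z = NormalDist().inv_cdf(1 - alpha / 2)
critical_value = z ** 2

print(round(critical_value, 3))  # prints 3.841
```

The result agrees with the value returned by CHISQ.INV.RT for α = 0.05 and ν = 1.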
Explanation for Step 4
The statistic used is $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \dfrac{(O_{ij}-E_{ij})^2}{E_{ij}}$ with degrees of freedom ν = (r-1)(c-1). In the formula, r is the number of categories in one variable, and c is the number of categories in the other variable. In this case, there are 2 possible values for gender, namely “FEMALE” and “MALE”, so r = 2. Likewise, there are 2 possible values for the type of interruption, namely “DISRUPTIVE” and “COOPERATIVE”, so c = 2. The symbol Oij indicates the number of samples that belong to category i in the first variable and category j in the second variable.
Table 2
Suppose i = 1 represents the category “FEMALE”, i = 2 represents “MALE”, j = 1 represents “DISRUPTIVE”, and j = 2 represents “COOPERATIVE”. Table 2 above is a summary of the data in this case. From Table 2, we get O11 = 82 (the number of FEMALE samples whose interruptions are DISRUPTIVE is 82), O12 = 41 (the number of FEMALE samples whose interruptions are COOPERATIVE is 41), O21 = 80 (the number of MALE samples whose interruptions are DISRUPTIVE is 80), and O22 = 30 (the number of MALE samples whose interruptions are COOPERATIVE is 30).
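The tallying behind these counts can be sketched in Python as follows. The raw data file is not reproduced in this article, so as an assumption the individual records are rebuilt from the four cell counts quoted above; with the real file, `records` would simply be the list of (gender, interruption type) pairs read from it.

```python
from collections import Counter

# Assumption: records are reconstructed from the cell counts in the text,
# because the raw data file itself is not available here.
cell_counts = {
    ("FEMALE", "DISRUPTIVE"): 82,
    ("FEMALE", "COOPERATIVE"): 41,
    ("MALE", "DISRUPTIVE"): 80,
    ("MALE", "COOPERATIVE"): 30,
}
records = [pair for pair, count in cell_counts.items() for _ in range(count)]

# Tally one (gender, interruption type) pair per interruption.
observed = Counter(records)
row_totals = Counter(gender for gender, _ in records)
col_totals = Counter(kind for _, kind in records)
n = len(records)

print(observed[("FEMALE", "DISRUPTIVE")])      # O11
print(dict(row_totals), dict(col_totals), n)   # marginal totals and sample size
```

Besides the four cell counts, the tally also yields the totals per gender, the totals per interruption type, and the overall sample size.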
In the formula above, $E_{ij} = \dfrac{R_i C_j}{n}$ is the expected frequency for the cell in category i of the first variable and category j of the second variable. Ri represents the total number of samples that belong to the i-th category in the first variable, Cj is the total number of samples that belong to the j-th category in the second variable, and n is the total number of samples. In this example, R1 = the number of “FEMALE” samples, R2 = the number of “MALE” samples, C1 = the number of “DISRUPTIVE” samples, and C2 = the number of “COOPERATIVE” samples. As can be seen in Table 2, R1 = 123, R2 = 110, C1 = 162, C2 = 71, and n = 233. Thus, E11, E12, E21, and E22 are calculated as follows.
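Worked out from the totals quoted above, and assuming the usual expected-frequency formula (row total times column total, divided by n), the four values come to roughly the following, rounded to three decimals:

```latex
\begin{align*}
E_{11} &= \frac{R_1 C_1}{n} = \frac{123 \times 162}{233} \approx 85.519 \\
E_{12} &= \frac{R_1 C_2}{n} = \frac{123 \times 71}{233} \approx 37.481 \\
E_{21} &= \frac{R_2 C_1}{n} = \frac{110 \times 162}{233} \approx 76.481 \\
E_{22} &= \frac{R_2 C_2}{n} = \frac{110 \times 71}{233} \approx 33.519
\end{align*}
```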
The results are summarized in the following table.
Table 3
After determining Eij for each cell, apply $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \dfrac{(O_{ij}-E_{ij})^2}{E_{ij}}$. To facilitate the calculation, combine Table 2 and Table 3 as follows.
Table 4
Next, χ² is calculated as follows.
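As a sketch of the arithmetic, using the observed counts from the text and expected frequencies of about 85.519, 37.481, 76.481, and 33.519 (each computed as row total times column total divided by n), the sum over the four cells is approximately:

```latex
\begin{align*}
\chi^2 &= \frac{(82 - 85.519)^2}{85.519} + \frac{(41 - 37.481)^2}{37.481}
        + \frac{(80 - 76.481)^2}{76.481} + \frac{(30 - 33.519)^2}{33.519} \\
       &\approx 0.145 + 0.330 + 0.162 + 0.370 \\
       &\approx 1.007
\end{align*}
```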
(See Pearson Chi-Square values in Table 1.)
Figure 2
See Figure 2 above. The figure shows that the value of the test statistic from the samples does not fall in the H0 rejection region. Since χ² = 1.007 < 3.841, we cannot reject H0.
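The whole of Step 4 can be cross-checked with a short Python sketch. It uses only the four observed counts given in the text and the critical value 3.841; the variable names are illustrative, and the calculation is the plain Pearson chi-square without any continuity correction.

```python
# Observed counts from the text: rows = FEMALE, MALE;
# columns = DISRUPTIVE, COOPERATIVE.
observed = [[82, 41], [80, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: E_ij = R_i * C_j / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (O - E)^2 / E over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

critical_value = 3.841  # chi-square critical value for alpha = 0.05, nu = 1
reject_h0 = chi2 > critical_value

print(round(chi2, 3), reject_h0)  # prints 1.007 False
```

The computed statistic matches the Pearson Chi-Square value reported in the article, and the comparison with the critical value leads to the same decision.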
Explanation for Step 5
Since we failed to reject H0, the test is not significant. The samples do not show a statistically significant relationship between the gender of the interrupter and the type of interruption.
CALCULATING CRAMER’S V
To measure the strength of the relationship between two nominal variables, we use the formula $V = \sqrt{\dfrac{\chi^2}{n(L-1)}}$, where L is the smaller of r and c.
In this case, L = 2 because r = 2 and c = 2. As a result, $V = \sqrt{\dfrac{1.007}{233 \times (2-1)}} \approx 0.066$. The V value can also be obtained from the following table.
Table 5
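As a quick check of the arithmetic, here is a minimal Python sketch that plugs the chi-square value, sample size, and category counts from this example into the Cramer’s V formula. The variable names are illustrative only.

```python
from math import sqrt

chi2 = 1.007   # Pearson chi-square reported in the text
n = 233        # total sample size
r, c = 2, 2    # categories of gender and of interruption type

smaller_dim = min(r, c)                    # L in the formula
v = sqrt(chi2 / (n * (smaller_dim - 1)))

print(round(v, 3))  # prints 0.066
```

A value this close to 0 is consistent with the test result above, where H0 could not be rejected.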
PROBLEM
The advertising director of the Carolina Sun Times studied the relationship between the type of community in which the subscribers reside and the section of the newspaper they read first. The sampling result is summarized in the following table.
At the 0.05 level of significance, can we conclude that there is a relationship between the type of community in which the subscribers reside and the section of the newspaper they read first?
Here is the data file: (click here)