Like simple linear regression, multiple linear regression is widely used in communication science research to test the influence of one variable on another. Here are two examples of research in the field of communication science that applied multiple linear regression: 1) Communication Channels: The Effects of Frequency, Duration, and Function on Gratification Obtained (2014), 2) Applications of Multiple Linear Regression in Social Media Related Marketing (2024).
What is the difference between simple linear regression and multiple linear regression? The main difference is the number of explanatory variables. Simple linear regression has one explanatory variable and one dependent variable, while multiple linear regression has more than one explanatory variable and one dependent variable. Multiple linear regression also rests on more assumptions than simple linear regression. These assumptions are listed in the closing notes of this article.
The population regression function in multiple linear regression with (k – 1) explanatory variables is as follows: E(Y|X2, X3, …, Xk) = β1 + β2X2 + β3X3 + … + βkXk. In this equation, X2, X3, …, Xk are explanatory variables, and E(Y|X2, X3, …, Xk) is the average value of the dependent variable (i.e., Y) when the values of X2, X3, …, Xk are given. The function can also be expressed in stochastic form as follows:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i$$
Here, ui is a stochastic disturbance or stochastic error term. The values of β1, β2, β3, …, βk are never known. We can only estimate them from the sampling results. From the sample data, we can determine the sample regression function in stochastic form as follows:
$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \dots + \hat{\beta}_k X_{ki} + \hat{u}_i$$
where $\hat{\beta}_j$ is the sample estimate of βj and $\hat{u}_i$ is the residual, the sample counterpart of ui.
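The step from the population function to the sample function can be illustrated with a small sketch. It assumes NumPy is available and uses simulated data with made-up true coefficients (1, 2, and -3); the function name `fit_ols` is hypothetical and none of the numbers come from the studies cited above.

```python
import numpy as np

def fit_ols(X, y):
    """Estimate beta_1..beta_k by ordinary least squares.

    X holds the explanatory variables X2..Xk in its columns (no intercept
    column); a column of ones is added here for beta_1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta_hat  # sample counterpart of u_i
    return beta_hat, residuals

# Simulated population: Y = 1 + 2*X2 - 3*X3 + u (illustrative values only)
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 10, size=(n, 2))
u = rng.normal(0, 0.1, size=n)
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + u

beta_hat, residuals = fit_ols(X, y)
print("estimated coefficients:", beta_hat)
```

The estimates should land close to, but not exactly on, the true values, which is the sense in which the β's are "never known" and only estimated.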
SIGNIFICANCE TEST OF INFLUENCES
In multiple linear regression, there are two types of significance tests, namely the test of the influence of each (individual) explanatory variable and the test of joint (or simultaneous) influence. To test whether the variable Xj affects the dependent variable Y, the hypothesis test is carried out with the following steps.
H0: βj = 0
H1: βj ≠ 0
Determine the level of significance (α)
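Test statistic: $t = \hat{\beta}_j / \operatorname{se}(\hat{\beta}_j)$ with degrees of freedom ν = n – k, where $\operatorname{se}(\hat{\beta}_j)$ = standard error of $\hat{\beta}_j$.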
Rejection criterion for H0: Reject H0 if the p-value < α.
Compare the p-value with α and conclude.
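The steps above can be sketched in code. This is a minimal illustration, assuming NumPy and SciPy are available; the function name `coefficient_t_tests` and the simulated data are hypothetical, and the standard errors are computed from the usual OLS formula under the assumptions listed later in this article.

```python
import numpy as np
from scipy import stats

def coefficient_t_tests(X, y):
    """Two-sided t tests of H0: beta_j = 0 for every coefficient."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # intercept + X2..Xk
    k = design.shape[1]                        # number of parameters
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta_hat
    df = n - k                                 # degrees of freedom
    sigma2_hat = residuals @ residuals / df    # estimate of var(u_i)
    cov = sigma2_hat * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov))                 # standard errors
    t_stats = beta_hat / se
    p_values = 2 * stats.t.sf(np.abs(t_stats), df)
    return beta_hat, se, t_stats, p_values, df

# Simulated example: X2 has a real effect, X3 has none (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 5.0 + 2.0 * X[:, 0] + rng.normal(size=50)

beta_hat, se, t_stats, p_values, df = coefficient_t_tests(X, y)
alpha = 0.05
for j, p in enumerate(p_values[1:], start=2):
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"beta_{j}: p-value = {p:.4f} -> {decision}")
```

Statistical packages such as SPSS report the same quantities (coefficient, standard error, t, and Sig.) in their coefficients table.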
In multiple linear regression, an individual explanatory variable may show no significant effect on the dependent variable even though the explanatory variables jointly do affect it. To test this joint (simultaneous) effect, we apply the overall significance test. The following are the steps to perform the test.
H0: β2 = β3 = β4 = … = βk = 0
H1: At least one of β2, β3, β4, …, βk is not zero.
Determine the level of significance (α).
Test statistic: $F = \dfrac{R^2/(k-1)}{(1-R^2)/(n-k)}$ with the degrees of freedom of the numerator = k – 1 and the degrees of freedom of the denominator = n – k.
Rejection criterion for H0: Reject H0 if the p-value < α.
Compare the p-value with α and conclude.
In the context of regression analysis, the above test is often called the F test. Because the F statistic is an increasing function of R2, this test is equivalent to testing whether the population R2 is zero. Thus, to test the overall significance, we can write H0 and H1 as follows without changing the meaning.
H0: R2 = 0
H1: R2 > 0
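The link between the F test and R2 can be checked numerically. The sketch below, assuming NumPy and SciPy and using simulated data, computes F once from R2 and once from the explained and residual sums of squares; the function name `overall_f_test` is hypothetical.

```python
import numpy as np
from scipy import stats

def overall_f_test(X, y):
    """Overall significance test of H0: beta_2 = ... = beta_k = 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta_hat) ** 2)  # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)           # total sum of squares
    r2 = 1.0 - rss / tss
    df_num, df_den = k - 1, n - k
    f_from_r2 = (r2 / df_num) / ((1.0 - r2) / df_den)
    f_from_ss = ((tss - rss) / df_num) / (rss / df_den)
    p_value = stats.f.sf(f_from_r2, df_num, df_den)
    return {"r2": r2, "f_from_r2": f_from_r2, "f_from_ss": f_from_ss,
            "p_value": p_value, "df": (df_num, df_den)}

# Simulated data with three explanatory variables and n = 50 (illustrative)
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
y = 10.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=50)

result = overall_f_test(X, y)
print(result)
```

Both versions of F agree up to floating-point error, which is why testing the overall significance and testing R2 = 0 lead to the same decision.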
CLOSING NOTES
Every model rests on assumptions that must hold for the model to be applied validly. The following are the assumptions that must be met in applying multiple linear regression.
- The model is linear in parameters.
- The values of the explanatory variables are fixed in repeated samples, or if the explanatory variables are stochastic, then the explanatory variables must be independent of the disturbance. So, cov(Xi,ui) = 0.
- Disturbance (ui) is normally distributed with a mean of 0 for each i.
- ui is homoscedastic: var(ui) = σ2 for every i. This means that the variance of ui is constant, independent of i.
- There is no autocorrelation between disturbances. That is, for every i ≠ j cov(ui,uj|Xi, Xj) = 0 or, if the explanatory variables are nonstochastic, cov(ui,uj) = 0.
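- n > k, where n = sample size and k = total number of variables in the regression model (one dependent variable plus k – 1 explanatory variables), which equals the number of parameters β1, …, βk to be estimated.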
- There is variation in the values of the explanatory variables, and there are no outliers.
- There is no perfect multicollinearity among the explanatory variables.
- The model is correctly specified; that is, there is no specification bias.
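Two of the assumptions above, n > k and the absence of perfect multicollinearity, can be checked mechanically before fitting. The following is a minimal sketch assuming NumPy; the function name `check_design` is hypothetical, and it does not address the remaining assumptions (normality, homoscedasticity, autocorrelation, and so on).

```python
import numpy as np

def check_design(X):
    """Check n > k and that no column is an exact linear combination of others."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # intercept + explanatory variables
    k = design.shape[1]
    rank = np.linalg.matrix_rank(design)
    return {
        "n_exceeds_k": bool(n > k),
        # Full column rank means no perfect multicollinearity
        "no_perfect_multicollinearity": bool(rank == k),
    }

rng = np.random.default_rng(7)
X_ok = rng.normal(size=(30, 3))
# Third column is exactly twice the first: perfect multicollinearity
X_bad = np.column_stack([X_ok[:, 0], X_ok[:, 1], 2.0 * X_ok[:, 0]])

print(check_design(X_ok))
print(check_design(X_bad))
```

A rank check only detects exact linear dependence; strong but imperfect multicollinearity would pass it.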
A detailed discussion of these assumptions is beyond the scope of this article.
SAMPLE PROBLEM
A researcher in communication science is conducting a study that aims to examine the influence of three variables on communication effectiveness (Y). The three variables that are suspected of influencing communication effectiveness (Y) are the use of social media (X2), interpersonal communication skills (X3), and involvement in active verbal communication (X4). The following is a brief explanation of the variables above.
Y = perceived communication effectiveness, using a scale of 0–100
X2 = daily use of social media (in hours)
X3 = interpersonal communication skills, measured using a scale of 0–10
X4 = involvement in active verbal communication, measured by the number of group discussions attended in a week
Questions
- Write the population regression function in stochastic form, for this case.
- Conduct a hypothesis test that social media use influences communication effectiveness.
- Conduct a hypothesis test that interpersonal communication skills influence communication effectiveness.
- Conduct a hypothesis test that active verbal communication influences communication effectiveness.
- Conduct a hypothesis test that the three variables together influence communication effectiveness.
- What is the value of the coefficient of determination R2? Interpret it.
- Write the sample regression function in stochastic form.
Please download the data related to this example question via the following link: (click here)
Answer
The population regression function for this case is as follows.
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$, with:
Yi = perceived communication effectiveness
X2i = daily social media usage (in hours)
X3i = interpersonal communication skills
X4i = involvement in active verbal communication
ui = stochastic disturbance
To answer questions no. 2, 3, and 4, the hypotheses are formulated as follows.
No. 2
H0: β2 = 0 [Social media usage does not affect communication effectiveness.]
H1: β2 ≠ 0 [Social media usage affects communication effectiveness.]
No. 3
H0: β3 = 0 [Interpersonal communication skills do not affect communication effectiveness.]
H1: β3 ≠ 0 [Interpersonal communication skills affect communication effectiveness.]
No. 4
H0: β4 = 0 [Active verbal communication does not affect communication effectiveness.]
H1: β4 ≠ 0 [Active verbal communication affects communication effectiveness.]
The selected level of significance is α = 0.05.
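The test statistic used: $t = \hat{\beta}_j / \operatorname{se}(\hat{\beta}_j)$ for j = 2, 3, 4, with degrees of freedom ν = n – k, where $\operatorname{se}(\hat{\beta}_j)$ = standard error of $\hat{\beta}_j$.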
H0 rejection criteria: Reject H0 if p-value < α.
The following are the results of data processing using SPSS.
Table 1
The p-values for X2, X3, and X4 are shown in the column titled Sig. in Table 1. All p-values are less than 0.05. The decisions made on the three variables are summarized in Table 2 below.
Table 2
Conclusion:
- The use of social media significantly influences communication effectiveness.
- Interpersonal communication skills significantly influence communication effectiveness.
- Active verbal communication significantly influences communication effectiveness.
No. 5
H0: β2 = β3 = β4 = 0
H1: At least one of β2, β3, β4 is not zero.
Significance level: α = 0.05.
Test statistic: $F = \dfrac{R^2/(k-1)}{(1-R^2)/(n-k)}$ with degrees of freedom of the numerator = k – 1 = 4 – 1 = 3 and degrees of freedom of the denominator = n – k = 50 – 4 = 46.
Criteria for rejecting H0: Reject H0 if the p-value < α.
The SPSS output for this simultaneous influence test is shown in Table 3 below.
Table 3
From Table 3, the value of the F statistic computed from the sample is 19.694, and the p-value is reported as 0.000 in the Sig. column, which means p < 0.001. This value is less than 0.05, so H0 is rejected and the test is significant. Therefore, the three variables simultaneously influence the effectiveness of communication.
No. 6
The value of the coefficient of determination R2 can be seen in Table 4 below.
Table 4
From the table above, we have R2 = 0.562 = 56.2%. The three explanatory variables together explain 56.2% of the variability in communication effectiveness. The remaining 43.8% is attributable to other factors that were not examined in the study.
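As a rough consistency check, the F statistic reported above can be recomputed from R2 with the formula F = [R2/(k – 1)] / [(1 – R2)/(n – k)]. The short sketch below uses only the values quoted in this example (R2 = 0.562, n = 50, k = 4); the variable names are arbitrary.

```python
# Values quoted in the sample problem
r2 = 0.562   # coefficient of determination (rounded to three decimals)
n = 50       # sample size
k = 4        # number of parameters: intercept + three slopes

f_from_r2 = (r2 / (k - 1)) / ((1 - r2) / (n - k))
print(round(f_from_r2, 2))  # about 19.67
```

The result is close to the reported F of 19.694; the small gap is what one would expect from R2 having been rounded to three decimals.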
No. 7
From Table 1 above, we get the following values.
From these values, the sample regression function in its stochastic form is as follows.
where
Yi = communication effectiveness
X2i = daily social media usage
X3i = interpersonal communication skills
X4i = engagement in active verbal communication
$\hat{u}_i$ = residual (the sample counterpart of the stochastic disturbance)
Problem 1
A communication science researcher proposes two hypotheses about factors that influence communication competence. There are two factors that he thinks influence communication competence, namely the intensity of communicating in public and the level of anxiety in communicating. The researcher uses a scale of 0-100 to measure communication competence. To measure the intensity of communicating in public, he measures the number of public speaking activities each month. In measuring the level of anxiety in communicating, he uses a scale of 0-10.
The following is an .xlsx file that contains a sample of 71 observations: (click here). In the file, Y = communication competence, X2 = intensity of communicating in public, and X3 = level of anxiety in communicating. The .sav file for this case can be downloaded via the following link: (click here)
The researcher hypothesizes that:
- The intensity of communicating in public has a “positive” effect on communication competence; the more often someone communicates in public, the better their communication competence.
- The level of anxiety in communicating has a “negative” effect on communication competence; the higher the level of anxiety, the worse their communication competence.
Use multiple linear regression to test the hypotheses.
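Because both hypotheses state a direction, a one-sided test is a natural choice here, whereas the Sig. column in typical regression output is two-sided. The sketch below, assuming SciPy, converts a t statistic into a one-sided p-value; the function name `one_sided_p_value` and the t value of 2.1 are hypothetical and are not results from the data file. With n = 71 and k = 3, the degrees of freedom are 68.

```python
from scipy import stats

def one_sided_p_value(t_stat, df, direction):
    """One-sided p-value for H1: beta_j > 0 ('positive') or beta_j < 0 ('negative')."""
    if direction == "positive":
        return float(stats.t.sf(t_stat, df))   # P(T >= t_stat)
    if direction == "negative":
        return float(stats.t.cdf(t_stat, df))  # P(T <= t_stat)
    raise ValueError("direction must be 'positive' or 'negative'")

df = 71 - 3          # n - k for Problem 1
t_example = 2.1      # made-up t statistic, for illustration only

p_pos = one_sided_p_value(t_example, df, "positive")
p_neg = one_sided_p_value(t_example, df, "negative")
print(f"H1: beta > 0 -> p = {p_pos:.4f}")
print(f"H1: beta < 0 -> p = {p_neg:.4f}")
```

When the sign of the estimated coefficient matches the hypothesized direction, the one-sided p-value is half the two-sided value; when the sign is opposite, the hypothesis is not supported regardless of the two-sided result.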
Questions
- Are the researcher’s hypotheses supported by the sampling results?
- Do both variables simultaneously affect communication competence?
- What is the value of the coefficient of determination R2? Interpret the value.
Problem 2
In The Effects of Frequency, Duration, and Function on Gratification Obtained (2014), assume that the research dataset (in .xlsx) is the one in the following link: (click here). In the file, Y, X1, X2, and X3 were the variables used in the research, namely Y = gratification obtained, X1 = frequency, X2 = duration, and X3 = function.
Questions
- Do frequency, duration, and function (individually, not simultaneously) affect the gratification obtained?
- Do frequency, duration, and function simultaneously affect the gratification obtained?
- What is the value of the coefficient of determination R2? Interpret the value.