MRC CBU Wiki
When a researcher wishes to include a categorical variable with more than two levels in a multiple regression prediction model, additional steps are needed to ensure that the results are interpretable. These steps include recoding the categorical variable into a number of separate, dichotomous variables.
This recoding is called "dummy coding." Multiple regression is a linear transformation of the X variables such that the sum of squared deviations between the observed and predicted Y is minimized.
The prediction of Y is accomplished by the following equation: Y' = b0 + b1X1 + b2X2 + ... + bkXk. The "b" values are called regression weights and are computed in a way that minimizes the sum of squared deviations. Categorical variables with two levels may be directly entered as predictor or predicted variables in a multiple regression model.
Their use in multiple regression is a straightforward extension of their use in simple linear regression. When entered as predictor variables, interpretation of regression weights depends upon how the variable is coded.
If the dichotomous variable is coded as 0 and 1, the regression weight is added to or subtracted from the predicted value of Y for the group coded as 1, depending upon whether the weight is positive or negative. If the dichotomous variable is coded as -1 and 1, then if the regression weight is positive, it is subtracted from the predicted value for the group coded as -1 and added for the group coded as 1. If the regression weight is negative, then addition and subtraction are reversed. Dichotomous variables can be included in hypothesis tests for R² change like any other variable.
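As a minimal sketch of how the coding changes the regression weight, the following Python fragment fits a simple regression to made-up salary figures under both codings. The data, the group labels, and the helper `simple_regression` are illustrative assumptions and are not taken from the example data set.

```python
from statistics import mean

def simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical salaries (in thousands) for two groups of three people each.
group = ["a", "a", "a", "b", "b", "b"]
salary = [40, 42, 44, 50, 52, 54]

# The same dichotomous variable under the two codings discussed in the text.
x01 = [0 if g == "a" else 1 for g in group]   # coded 0 and 1
xpm = [-1 if g == "a" else 1 for g in group]  # coded -1 and 1

b0_01, b1_01 = simple_regression(x01, salary)
b0_pm, b1_pm = simple_regression(xpm, salary)

print("0/1 coding:  intercept", b0_01, "weight", b1_01)
print("-1/1 coding: intercept", b0_pm, "weight", b1_pm)
```

With 0/1 coding the intercept is the mean of the group coded 0 and the weight is the full difference between the two group means. With -1/1 coding and equal group sizes the intercept is the grand mean and the weight is half that difference, subtracted for one group and added for the other.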
A block of variables can simultaneously be entered into a hierarchical regression analysis and tested as to whether as a whole they significantly increase R², given the variables already entered into the regression equation. The degrees of freedom for the R² change test correspond to the number of variables entered in the block of variables.
Adding variables to a linear regression model will never decrease, and will almost always increase, the unadjusted R² value. If the additional predictor variables are correlated with the predictor variables already in the model, then the combined results are difficult to predict. In some cases, the combined result will provide only a slightly better prediction, while in other cases, a much better prediction than expected will be the outcome of combining two correlated variables.
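The R² change test for a block can be sketched with the standard F ratio for nested regression models. The function name `r2_change_f` and the numbers in the usage lines are hypothetical; only the formula itself is standard.

```python
def r2_change_f(r2_reduced, r2_full, n_added, n_cases, n_predictors_full):
    """F ratio and degrees of freedom for the increase in R² due to a block.

    r2_reduced        -- R² before the block is entered
    r2_full           -- R² after the block is entered
    n_added           -- number of variables in the block (numerator df)
    n_cases           -- number of observations
    n_predictors_full -- number of predictors in the model after entry
    """
    df1 = n_added
    df2 = n_cases - n_predictors_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f, df1, df2

# Hypothetical values: a block of 2 variables raises R² from .40 to .50
# in a model that then has 4 predictors and 23 cases.
f_value, df1, df2 = r2_change_f(0.40, 0.50, n_added=2, n_cases=23, n_predictors_full=4)
print(f_value, df1, df2)
```

The numerator degrees of freedom equal the number of variables in the block, as stated above. The resulting F would be compared against an F distribution with `df1` and `df2` degrees of freedom.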
When the added variable is uncorrelated with the predictor variables already in the model, the result is easy to predict: the R² change will be equal to the squared correlation coefficient between the added variable and the predicted variable. In this case it makes no difference what order the predictor variables are entered into the prediction model. The value for R² change for X2 given X1 was in the model would be the same as the value for R² change for X2 given no variable was in the model.
It would make no difference at what stage X2 was entered into the model; the value for R² change would always be the same.
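The behaviour of R² change with uncorrelated predictors can be checked numerically. The sketch below uses the standard formula for R² with two predictors, expressed in terms of the three pairwise correlations. The data values and function names are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r2_two_predictors(r_y1, r_y2, r_12):
    """R² for Y predicted from X1 and X2, from the pairwise correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

# Hypothetical data in which X1 and X2 are exactly uncorrelated.
x1 = [-1, -1, 1, 1]
x2 = [-1, 1, -1, 1]
y = [1, 2, 4, 7]

r_y1, r_y2, r_12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
r2_both = r2_two_predictors(r_y1, r_y2, r_12)

change_x2_after_x1 = r2_both - r_y1**2  # X2 entered second
change_x2_alone = r_y2**2               # X2 entered first
change_x1_after_x2 = r2_both - r_y2**2  # X1 entered second

print(change_x2_after_x1, change_x2_alone)
```

Because the correlation between X1 and X2 is zero, the two R² change values for X2 agree, and the same holds for X1.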
Similarly, the R² change value for X1 would always be the same. Because of this relationship, uncorrelated predictor variables will be preferred, when possible. It is fairly clear that Gender could be directly entered into a regression model predicting Salary, because it is dichotomous. The problem is how to deal with the two categorical predictor variables with more than two levels, Rank and Dept.
In general, a categorical variable with k levels will be transformed into k-1 variables each with two levels. For example, if a categorical variable had six levels, then five dichotomous variables could be constructed that would contain the same information as the single categorical variable. Dichotomous variables have the advantage that they can be directly entered into the regression model.
The process of creating dichotomous variables from categorical variables is called dummy coding. Depending upon how the dichotomous variables are constructed, additional information can be gleaned from the analysis. In addition, careful construction will result in uncorrelated dichotomous variables.
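A minimal sketch of turning a categorical variable with k levels into k-1 dichotomous variables follows. The function `dummy_code`, the choice of reference level, and the department labels are assumptions made for illustration.

```python
def dummy_code(values, reference):
    """Return a dict of 0/1 variables, one per level except the reference level."""
    levels = sorted(set(values))
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }

# Hypothetical department labels for six people.
dept = ["Business", "FamilyS", "Biology", "FamilyS", "Business", "Biology"]

coded = dummy_code(dept, reference="Business")
for name, column in coded.items():
    print(name, column)
```

Three levels yield two dummy variables. The reference level is identified by a 0 on every dummy variable, so no information is lost by omitting a variable for it.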
As discussed earlier, these variables have the advantage of simplicity of interpretation and are preferred to correlated predictor variables. The simplest case of dummy coding is when the categorical variable has three levels and is converted to two dichotomous variables.
This variable could be dummy coded into two variables, one called FamilyS and one called Biology. The dummy coding is represented below. The Dept variable is the "Numeric Variable" that is going to be transformed.
In this case the FamilyS variable is going to be created. The window on the screen should appear as follows. Clicking on the Change button and then on the Old and New Values... button opens the window in which the recodings are specified. The Old Value is the level of the categorical variable to be changed; the New Value is the value on the transformed variable.
In the example window above, a value of 3 on the Dept variable will be coded as a 0 on the FamilyS variable. The Add button must be pressed to add the recoding to the list.
When all the recodings have been added, click on the Continue button and then the OK button. The recoding of the Biology variable is accomplished in the same manner. A listing of the data is presented below. Two things should be observed in the correlation matrix. The first is that the correlation between FamilyS and Biology is not zero. The second is that the correlation between the Salary variable and the two dummy variables is different from zero. The correlation between FamilyS and Salary is significantly different from zero.
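The recoding step and the non-zero correlation between the two dummy variables can be mimicked outside SPSS. In the sketch below the numeric codes for Dept (1 = Family Studies, 2 = Biology, 3 = Business) and the six cases are assumptions. The text only states that a Dept value of 3 becomes 0 on FamilyS.

```python
from math import sqrt

# Assumed Old Value -> New Value lists (hypothetical department codes).
FAMILYS_RECODE = {1: 1, 2: 0, 3: 0}
BIOLOGY_RECODE = {1: 0, 2: 1, 3: 0}

def recode(values, mapping):
    """Apply an Old Value -> New Value mapping to every case."""
    return [mapping[v] for v in values]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: two people in each of the three departments.
dept = [1, 1, 2, 2, 3, 3]
familys = recode(dept, FAMILYS_RECODE)
biology = recode(dept, BIOLOGY_RECODE)

r_dummies = pearson_r(familys, biology)
print(familys, biology, r_dummies)
```

With three equally sized groups the two 0/1 dummy variables correlate at -0.5, because a case that scores 1 on one of them must score 0 on the other. The value in the actual data set depends on the group sizes there.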
The results of predicting Salary from FamilyS and Biology using a multiple regression procedure are presented below. The first table enters FamilyS in the first block and Biology in the second.
The second table reverses the order in which the variables are entered into the regression equation. The model summary tables are presented below. In the first table above both FamilyS and Biology are significant. In the second, only FamilyS is statistically significant. Note that both orderings end up with the same value for multiple R. It makes a difference what order the variables are entered into the regression equation in the hierarchical analysis.
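The effect of entry order with correlated dummy variables can be sketched as follows. The salaries are invented, and the helper relies on a stated assumption: each model here has one parameter per distinct pattern of predictor values, so its predicted values are simply the pattern means.

```python
from collections import defaultdict

def r_squared_from_dummies(predictor_columns, y):
    """R² for y regressed on 0/1 dummy variables.

    Assumes one fitted parameter per distinct predictor pattern, so the
    least-squares predictions equal the mean of y within each pattern.
    """
    groups = defaultdict(list)
    for pattern, value in zip(zip(*predictor_columns), y):
        groups[pattern].append(value)
    grand_mean = sum(y) / len(y)
    ss_total = sum((v - grand_mean) ** 2 for v in y)
    ss_residual = sum(
        (v - sum(vals) / len(vals)) ** 2 for vals in groups.values() for v in vals
    )
    return 1 - ss_residual / ss_total

# Hypothetical data: two people per department (Family Studies, Biology, Business).
familys = [1, 1, 0, 0, 0, 0]
biology = [0, 0, 1, 1, 0, 0]
salary = [40, 42, 48, 50, 60, 62]

r2_fs = r_squared_from_dummies([familys], salary)
r2_bio = r_squared_from_dummies([biology], salary)
r2_full = r_squared_from_dummies([familys, biology], salary)

change_bio_first = r2_bio            # Biology entered in block 1
change_bio_second = r2_full - r2_fs  # Biology entered after FamilyS
change_fs_second = r2_full - r2_bio  # FamilyS entered after Biology

print(r2_full, change_bio_first, change_bio_second)
```

Both orderings arrive at the same final R², but the R² change credited to Biology is much larger when it is entered after FamilyS than when it is entered first in these made-up data.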
In the next tables, both FamilyS and Biology have been entered in the first block. The Coefficients table can be interpreted as Biology making 8.
Note that the "Sig. ANOVA is a special case of linear regression when the variables have been dummy coded. The second notable comparison of the tables involves the regression weights and the actual differences between the means. Note that the regression weight for FamilyS in the regression procedure is Selection of an appropriate set of dummy codes will result in new variables that are uncorrelated or independent of each other.
In the case when the categorical variable has three levels this can be accomplished by creating a new variable where one level of the categorical variable is assigned the value of -2 and the other levels are assigned the value of 1. The signs are arbitrary and may be reversed; that is, values of 2 and -1 would work equally well. The second variable created as a dummy code will have the level of the categorical variable coded as -2 given the value of 0 and the other values recoded as 1 and -1. In all cases the sum of the dummy coded variable will be zero.
Trust me, this is actually much easier than it sounds. Each of the new dummy coded variables, called a contrast, compares levels coded with a positive number to levels coded with a negative number. Levels coded with a zero are not included in the interpretation. This variable could be dummy coded into two variables, one called Business comparing the Business Department with the other two departments and one called FSvsBio for Family Studies versus Biology.
The Business contrast would create a variable where all members of the Business Department would be given a value of -2 and all members of the other two departments would be given a value of 1. The FSvsBio contrast would assign a value of 0 to members of the Business Department, 1 divided by the number of members of the Family Studies Department to members of the Family Studies Department, and -1 divided by the number of members of the Biology Department to members of the Biology Department.
The FSvsBio variable could be coded as 1 and -1 for Family Studies and Biology respectively, but the recoded variable would no longer be uncorrelated with the first dummy coded variable Business. In most practical applications, it makes little difference whether the contrasts are correlated or not, so the simpler 1 and -1 coding is generally preferred. The contrasts are summarized in the following table. Orthogonal dummy coded Variables. Note that the correlation coefficient between the two contrasts is zero.
The correlation between the Business contrast and Salary is. This correlation coefficient has a significance level of. The correlation coefficient between the FSvsBio contrast and Salary is. In this case entering Business or FSvsBio first makes no difference in the results of the regression analysis. Entering both contrasts simultaneously into the regression equation produces the following ANOVA table.
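The difference between the weighted FSvsBio coding and the simpler 1 and -1 coding can be verified with exact arithmetic. The department sizes below (4, 3, and 5) are hypothetical and chosen to be unequal; the helper names are likewise illustrative.

```python
from fractions import Fraction

def covariance(x, y):
    """Exact population covariance of two equal-length lists."""
    n = len(x)
    mx = Fraction(sum(x)) / n
    my = Fraction(sum(y)) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

# Hypothetical, deliberately unequal department sizes.
n_business, n_familys, n_biology = 4, 3, 5
dept = ["Business"] * n_business + ["FamilyS"] * n_familys + ["Biology"] * n_biology

# Business contrast: -2 for Business, 1 for the other two departments.
business = [-2 if d == "Business" else 1 for d in dept]

# FSvsBio weighted by group size, as described in the text.
weights = {
    "Business": Fraction(0),
    "FamilyS": Fraction(1, n_familys),
    "Biology": Fraction(-1, n_biology),
}
fs_vs_bio_weighted = [weights[d] for d in dept]

# The simpler 1 and -1 coding.
simple = {"Business": 0, "FamilyS": 1, "Biology": -1}
fs_vs_bio_simple = [simple[d] for d in dept]

cov_weighted = covariance(business, fs_vs_bio_weighted)
cov_simple = covariance(business, fs_vs_bio_simple)
print(cov_weighted, cov_simple)
```

A covariance of zero implies a correlation of zero, so the weighted coding is uncorrelated with the Business contrast regardless of group sizes. The simpler coding is uncorrelated only when Family Studies and Biology have the same number of members.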
It may be concluded that it does not make a difference which set of contrasts is selected when only the overall test of significance is desired. It does make a difference how contrasts are selected, however, if it is desired to make a meaningful interpretation of each contrast. The coefficient table for the simultaneous entry of both contrasts is presented below.
In this case the Business contrast was significant and the FSvsBio contrast was not. The interpretation of these results would be that the Business Department was paid significantly more than the Family Studies and Biology Departments, but that no significant differences in salary were found between the Family Studies and Biology Departments. By carefully selecting the set of contrasts to be used in the regression with categorical variables, it is possible to construct tests of specific hypotheses.
The hypotheses to be tested are generated by the theory used when designing the study. If a categorical variable had six levels, five dummy coded contrasts would be necessary to use the categorical variable in a regression analysis. For example, suppose that a researcher at a headache care center did a study with six groups of four patients each (N is being deliberately kept small).
The dependent measure is subjective experience of pain. The six groups consisted of six different treatment conditions. The six treatment conditions of the second example. An independent contrast is a contrast that is not a linear combination of any other set of contrasts.
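One possible set of five contrasts for six equally sized groups is a Helmert-style set, in which each contrast compares one group with all the groups that follow it. The sketch below generates such a set and checks that it is orthogonal. This is an illustrative choice, not necessarily the set of contrasts used for the headache study, and the function name is hypothetical.

```python
from itertools import combinations

def helmert_contrasts(k):
    """Return k-1 contrasts; contrast j compares level j with all later levels."""
    contrasts = []
    for j in range(k - 1):
        later = k - 1 - j
        # Earlier levels are excluded (0), level j gets a positive weight,
        # and every later level gets -1, so the weights sum to zero.
        contrasts.append([0] * j + [later] + [-1] * later)
    return contrasts

contrasts = helmert_contrasts(6)
for row in contrasts:
    print(row)

# With equal group sizes, zero sums and zero cross-products mean that the
# resulting dummy coded variables are uncorrelated.
sums = [sum(row) for row in contrasts]
cross_products = [
    sum(a * b for a, b in zip(r1, r2)) for r1, r2 in combinations(contrasts, 2)
]
```

Because no contrast in an orthogonal set can be written as a linear combination of the others, every contrast in this set is also independent in the sense defined above.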