How can I create multiple dummy (indicator) variables in Stata?


Researchers often need to create multiple indicator variables from a single, often categorical, variable. For example, consider the variable region, where 1 indicates Southeast Asia, 2 indicates Eastern Europe, etc. You could use the generate and replace commands twelve times, once for each indicator variable, but repeating this code twelve times is tedious and could lead to mistakes. An alternative is the tabulate command with the generate() option, which creates all twelve indicator variables based on the variable region in a single command.
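As a minimal sketch of both approaches, the following do-file builds a small toy dataset (the 24 observations and the name manual1 are hypothetical, chosen only so the example runs on its own) and then creates the indicators with the stub dregion:

```stata
* Toy data: 24 observations with region cycling through 1..12 (hypothetical values)
clear
set obs 24
generate region = mod(_n - 1, 12) + 1

* Manual approach, shown for one category only; it would be repeated twelve times
generate byte manual1 = 0
replace manual1 = 1 if region == 1

* tabulate with generate() creates dregion1 ... dregion12 in one step
tabulate region, generate(dregion)

* The manual indicator and the generated one agree on these data
assert manual1 == dregion1

* dregion10 is 1 only where region equals 10
tabulate region dregion10
```

One difference worth keeping in mind: with generate and replace as written, observations with a missing region get 0, whereas tabulate with generate() leaves the indicators missing for those observations.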

This single command will generate twelve indicator variables dregion1, dregion2, etc. For example, dregion10 takes the value of 1 when region equals 10, and is 0 otherwise.

If you have questions about using statistical and mathematical software at Indiana University, contact Research Analytics. Research Analytics is located on the IU Bloomington campus at Woodburn Hall; staff are available for consultation Monday-Friday 9am-noon and by appointment.

This is document bajq in the Knowledge Base.


This page makes use of the program xi3, which is no longer being maintained and has been removed from our archives. References to xi3 will be left on this page because they illustrate specific principles of coding categorical variables.

In the previous two chapters, we have focused on regression analyses using continuous variables. However, it is possible to include categorical predictors in a regression analysis, but it requires some extra work in performing the analysis and extra work in properly interpreting the results.

This chapter will illustrate how you can use Stata for including categorical predictors in your analysis and describe how to interpret the results of such analyses. Stata has some great tools that really ease the process of including categorical variables in your regression analysis, and we will emphasize the use of these timesaving tools. This chapter will use the elemapi2 data that you have seen in the prior chapters.

The variable api00 is a measure of the performance of the schools; the codebook command reports its summary information. The variable meals is the percentage of students who are receiving state-sponsored free meals and can be used as an indicator of poverty.

The variable meals was broken into 3 categories (to make equally sized groups), creating the variable mealcat; its codebook information lists the values for the three categories. We can include a dummy variable, that is, a 0/1 variable, as a predictor in a regression analysis. This may seem odd at first, but it is a legitimate analysis.
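A minimal sketch of these first steps follows. It assumes that elemapi2.dta sits in the current working directory and that the year-round indicator is named yr_rnd; neither detail is shown in the text, so adjust both to match your copy of the data.

```stata
* Assumes elemapi2.dta is in the current working directory
use elemapi2, clear

* Summary information for the outcome and the three-category poverty variable
codebook api00 mealcat

* Regression of api00 on a single 0/1 predictor
* (yr_rnd is an assumed name for the year-round indicator)
regress api00 yr_rnd
```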

But what does this mean? Filling in the estimated values gives the fitted regression equation. If a school is not a year-round school (i.e., the dummy variable is 0), the equation reduces to the intercept; if it is a year-round school, the predicted value is the intercept plus the coefficient. We can graph the observed values and the predicted values using the scatter command. As you see, the regression equation predicts that the value of api00 will be the mean value for the group, depending on whether a school is a year-round school or a non-year-round school.
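The link between the fitted equation and the two group means can be checked with a short sketch like the one below. It again assumes elemapi2.dta in the working directory and yr_rnd as the name of the dummy; yhat is a hypothetical name for the new variable of predicted values.

```stata
* Assumes elemapi2.dta is in the current working directory
use elemapi2, clear
regress api00 yr_rnd

* Predicted values from the fitted equation (yhat is a hypothetical name)
predict yhat

* Observed scores and predicted values plotted against the dummy
scatter api00 yhat yr_rnd

* The two possible predictions: intercept alone, and intercept plus coefficient
display _b[_cons]
display _b[_cons] + _b[yr_rnd]

* Group means of api00, to compare with the two predictions above
tabulate yr_rnd, summarize(api00)
```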

For the non-year-round schools, the mean is the same as the intercept. It may be surprising to note that this regression analysis with a single dummy variable is the same as doing a t-test comparing the mean api00 for the year-round schools with that for the non-year-round schools.
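One way to see the equivalence is to run the regression, the t-test, and a one-way ANOVA on the same two variables and compare the output. This is a sketch under the same assumptions as before (elemapi2.dta in the working directory, dummy named yr_rnd):

```stata
* Assumes elemapi2.dta is in the current working directory
use elemapi2, clear

* Regression on the dummy: note the t statistic for yr_rnd
regress api00 yr_rnd

* Two-sample t-test (equal variances by default): same t, up to sign
ttest api00, by(yr_rnd)

* One-way ANOVA: the F statistic is the square of that t
anova api00 yr_rnd
```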

Since a t-test on two groups is the same as doing an anova, we can get the same results using the anova command as well. In summary, these results indicate that the api00 scores are significantly different for the schools depending on the type of school, year-round vs. non-year-round. Non-year-round schools have significantly higher API scores than year-round schools, and the regression coefficient gives the size of that difference.

Say that we would like to examine the relationship between the amount of poverty and api scores.

Recall from the codebook information that mealcat takes three values, one for each category. You might be tempted to include mealcat directly as a predictor, but that looks only at the linear effect of mealcat on api00, and mealcat is not an interval variable. Instead, you will want to code the variable so that all the information concerning the three levels is accounted for.

You can dummy code mealcat, for example with the tabulate command and its generate() option. This creates mealcat1, which is 1 if mealcat is 1 and 0 otherwise. Likewise, mealcat2 is 1 if mealcat is 2 and 0 otherwise, and mealcat3 is created in the same way. We can now use two of these dummy variables, mealcat2 and mealcat3, in the regression analysis.
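A sketch of the dummy coding and the regression, assuming elemapi2.dta is in the working directory and using mealcat as the stub so that the new variables are named mealcat1 to mealcat3:

```stata
* Assumes elemapi2.dta is in the current working directory
use elemapi2, clear

* Create mealcat1, mealcat2, mealcat3 from the three categories
tabulate mealcat, generate(mealcat)

* Inspect a few rows to confirm the coding
list mealcat mealcat1 mealcat2 mealcat3 in 1/10

* Group 1 is the omitted (reference) group
regress api00 mealcat2 mealcat3
```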

We can test the overall differences among the three groups by using the test command. This shows that the overall differences among the three groups are significant. The interpretation of the coefficients is much like that for the binary variables. The coefficient for mealcat2 is the mean for group 2 minus the mean of the omitted group (group 1).

And the coefficient for mealcat3 is the mean of group 3 minus the mean of group 1. You can verify this by comparing the coefficients with the means of the groups. Based on these results, we can say that the three groups differ in their api00 scores, and that in particular group 2 is significantly different from group 1 (because mealcat2 was significant) and group 3 is significantly different from group 1 (because mealcat3 was significant).
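The overall test and the comparison of coefficients with group means described above might look like this, under the same assumptions about the data file and the mealcat stub:

```stata
* Assumes elemapi2.dta is in the current working directory
use elemapi2, clear
tabulate mealcat, generate(mealcat)
quietly regress api00 mealcat2 mealcat3

* Joint test that both coefficients are zero (overall group differences)
test mealcat2 mealcat3

* Group means: mean(group 2) - mean(group 1) should match _b[mealcat2],
* and mean(group 3) - mean(group 1) should match _b[mealcat3]
tabulate mealcat, summarize(api00)
display _b[mealcat2]
display _b[mealcat3]
```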

We can use the xi command to do the work for us, creating the indicator variables and running the regression all in one command. When we use xi and include the term i.mealcat, Stata creates the indicator variables for mealcat itself and omits the first group by default. The results are the same as in the prior analysis. If we want to test the overall effect of mealcat, we use the test command, which also gives us the same results as we found using the dummy variables mealcat2 and mealcat3.
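A sketch of the xi version follows. The names _Imealcat_2 and _Imealcat_3 follow xi's usual naming in Stata 7 and later; treat them as an assumption and confirm them against the variables xi reports creating.

```stata
* Assumes elemapi2.dta is in the current working directory
use elemapi2, clear

* xi expands i.mealcat into indicator variables and runs the regression
xi: regress api00 i.mealcat

* Overall effect of mealcat (variable names as typically created by xi)
test _Imealcat_2 _Imealcat_3
```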

One of the improvements in Stata 7 is that variable names can be longer than 8 characters, so the names of the variables created by the xi command are easier to understand than in version 6. From this point forward, we will use the variable names that would be created in version 7.

What if we wanted a different group to be the reference group? Because we created the dummy variables ourselves, we can simply choose which one to leave out; here we include mealcat1 and mealcat2 and omit mealcat3. With group 3 omitted, the constant is now the mean of group 3, mealcat1 is group 1 minus group 3, and mealcat2 is group 2 minus group 3. We see that both of these coefficients are significant, indicating that group 1 is significantly different from group 3 and group 2 is significantly different from group 3.
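Changing the reference group with hand-made dummies only requires changing which two enter the model, as in this sketch (same assumptions about the data file and the mealcat stub):

```stata
* Assumes elemapi2.dta is in the current working directory
use elemapi2, clear
tabulate mealcat, generate(mealcat)

* Group 3 is now the omitted (reference) group
regress api00 mealcat1 mealcat2

* The constant should match the mean of api00 in group 3
tabulate mealcat, summarize(api00)
```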

When we use the xi command, how can we choose which group is the omitted group? By default, the first group is omitted, but say we want group 3 to be omitted.

We can use the char command to tell Stata that we want the third group to be the omitted group for the variable mealcat. If you save the data file, Stata will remember this for future Stata sessions. You can compare and see that these results are identical to those found using mealcat1 and mealcat2 as predictors.

We can also do this analysis using the anova command. The benefit of the anova command is that it gives us the test of the overall effect of mealcat without needing to subsequently use the test command as we did with the regress command.
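The two steps just described could be sketched as follows, assuming elemapi2.dta is in the working directory. The omit characteristic is what xi consults when choosing the reference category.

```stata
* Assumes elemapi2.dta is in the current working directory
use elemapi2, clear

* Ask xi to omit category 3 of mealcat instead of the first category
char mealcat[omit] 3
xi: regress api00 i.mealcat

* One-way ANOVA: reports the overall test of mealcat directly
anova api00 mealcat
```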

We can see the anova test of the effect of mealcat is the same as the test command from the regress command. We can even follow this with the anova, regress command and compare the parameter estimates with those we performed previously. While you can control which category is the omitted category when you use the regress command, the anova, regress command always drops the last category.

It is generally very convenient to use dummy coding but that is not the only kind of coding that can be used. As you have seen, when you use dummy coding one of the groups becomes the reference group and all of the other groups are compared to that group. This may not be the most interesting set of comparisons. Say you want to compare group 1 with groups 2 and 3, and for a second comparison compare group 2 with group 3.

You need to generate a coding scheme that forms these 2 comparisons. We will illustrate this using a Stata program, xi3, an enhanced version of xi that will create the variables you would need for such comparisons as well as a variety of other common comparisons. The comparisons that we have described (comparing group 1 with groups 2 and 3, and then comparing group 2 with group 3) correspond to Helmert comparisons (see Chapter 5 for more details).

We use the h. prefix (instead of i.) to indicate that we want Helmert comparisons on the variable mealcat. Otherwise, xi3 works much like the xi command. Both of these comparisons are significant, indicating that group 1 differs significantly from groups 2 and 3 combined, and group 2 differs significantly from group 3.
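Because xi3 is no longer available, the sketch below swaps in hand-built contrast variables rather than the xi3 syntax itself. The names h1 and h2 and the particular codes are illustrative assumptions: with codes (2/3, -1/3, -1/3) and (0, 1/2, -1/2) for groups 1 to 3, the two coefficients equal the two Helmert-style comparisons described above.

```stata
* Assumes elemapi2.dta is in the current working directory
use elemapi2, clear

* h1: group 1 versus the average of groups 2 and 3 (hypothetical name)
generate h1 = cond(mealcat == 1, 2/3, -1/3) if !missing(mealcat)

* h2: group 2 versus group 3 (hypothetical name)
generate h2 = cond(mealcat == 2, 1/2, cond(mealcat == 3, -1/2, 0)) if !missing(mealcat)

* _b[h1] = mean(group 1) - (mean(group 2) + mean(group 3))/2
* _b[h2] = mean(group 2) - mean(group 3)
regress api00 h1 h2

* Group means for checking the two comparisons by hand
tabulate mealcat, summarize(api00)
```

The check works because a model with two contrast variables and a constant reproduces the three group means exactly, so each coefficient can be solved for from those means.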

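Using the coding scheme provided by xi3, we were able to form perhaps more interesting tests than those provided by dummy coding. The xi3 program can create variables according to other coding schemes, as well as custom coding schemes that you create; see help xi3 and Chapter 5 for more information.

Now consider a model that includes both the year-round indicator and mealcat as main effects. The two predictors define six cells: cells 1 to 3 are the three mealcat groups for non-year-round schools, and cells 4 to 6 are the same three groups for year-round schools. The non-year-round category and group 3 of mealcat are omitted. As a result, cell3 is the reference cell.

The constant is the predicted value for this cell. The coefficient for the year-round indicator is the predicted difference between cell3 and cell6. Since this model has only main effects, it is also the difference between cell2 and cell5, or between cell1 and cell4.

The coefficient for the first mealcat indicator is the predicted difference between cell1 and cell3. Since this model has only main effects, it is also the predicted difference between cell4 and cell6. We should note that if you computed the predicted values for each cell, they would not exactly match the means in the 6 cells. The predicted means would be close to the observed means in the cells, but not exactly the same.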

This is because our model has only main effects and assumes that the difference between cell1 and cell4 is exactly the same as the difference between cells 2 and 5, which is the same as the difference between cells 3 and 6. We can also run this main-effects analysis with the anova command. Note that we get the same information that we do from the xi: regress command followed by the test command, because the anova command automatically provides the information provided by the test command. If we like, we can also request the parameter estimates afterwards with the anova, regress command.

However, the anova command is rigid in its determination of which group will be the omitted group and the last group is dropped. Since this differs from the coding we used in the regression commands above, the parameter estimates from this anova command will differ from the regress command above.
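The main-effects analysis discussed above might be run along these lines. As before, elemapi2.dta in the working directory, the name yr_rnd, the xi-generated names _Imealcat_1 and _Imealcat_2, and the new variable name yhat_main are assumptions.

```stata
* Assumes elemapi2.dta is in the current working directory
use elemapi2, clear

* Omit group 3 of mealcat so that cell3 is the reference cell
char mealcat[omit] 3
xi: regress api00 i.yr_rnd i.mealcat

* Overall effect of mealcat in the two-predictor model
test _Imealcat_1 _Imealcat_2

* Predicted cell values versus observed cell means (close, not identical)
predict yhat_main
tabulate yr_rnd mealcat, summarize(api00) means
tabulate yr_rnd mealcat, summarize(yhat_main) means

* Same tests from anova, without a separate test command
anova api00 yr_rnd mealcat
```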

In summary, these results indicate that the difference between year-round and non-year-round schools is significant, and the differences among the three mealcat groups are significant.

When using xi, it is easy to include an interaction term. We can test the overall interaction with the test command. This interaction effect is not significant. It is important to note how the meaning of the coefficients changes in the presence of these interaction terms.
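A sketch of the interaction model with xi follows. The wildcard _ImeaX* relies on xi's convention of naming interaction indicators from the first letters of each variable; that pattern is an assumption here, so check the describe output before relying on it.

```stata
* Assumes elemapi2.dta is in the current working directory
use elemapi2, clear

* Main effects plus the mealcat-by-yr_rnd interaction
xi: regress api00 i.mealcat*i.yr_rnd

* List the indicator variables xi created, including the interaction terms
describe _I*

* Joint test of the interaction terms (name pattern assumed; see above)
testparm _ImeaX*
```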

The presence of an interaction would imply that the difference between year-round and non-year-round schools depends on the level of mealcat. The predicted value for each of the six cells can be written in terms of the coefficients in the model. It can be very tricky to interpret these interaction terms if you wish to form specific comparisons. Constructing these interactions can be somewhat easier when using the anova command.

The anova command gives us the test of the overall main effects and interactions without the need to perform subsequent test commands. It is easy to perform tests of simple main effects using the sme command. You can download sme from within Stata by typing search sme (see How can I use the search command to search for programs and get additional help?).
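For the anova version of the interaction model, a minimal sketch is shown below, written in the factor-variable notation of Stata 11 and later; older releases, including the versions this chapter was written for, expressed the interaction as yr_rnd*mealcat instead. The data file location and the name yr_rnd remain assumptions.

```stata
* Assumes elemapi2.dta is in the current working directory
use elemapi2, clear

* Main effects and interaction; the ANOVA table reports a test for each term
anova api00 yr_rnd mealcat yr_rnd#mealcat

* Observed cell means for the six cells, for reference
tabulate yr_rnd mealcat, summarize(api00) means
```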

Although this section has focused on how to handle analyses involving interactions, these particular results show no indication of interaction. We could decide to omit interaction terms from future analyses having found the interactions to be non-significant.