Another answer here has done a good job of discussing displays of residuals when you have two groups. Let me address some of the explicit questions and implicit assumptions that lie behind this thread.

The question asks, "how do you test assumptions of linear regression such as homoscedasticity when an independent variable is binary?" You have a multiple regression model. A (multiple) regression model assumes there is only one error term, whose variance is constant everywhere. It isn't terribly meaningful to check for heteroscedasticity for each predictor individually, and you don't have to. This is why, when we have a multiple regression model, we diagnose heteroscedasticity from plots of the residuals vs. the fitted (predicted) values. Probably the most helpful plot for this purpose is a scale-location plot (also called 'spread-level'), which is a plot of the square root of the absolute value of the residuals vs. the fitted values. To see examples, look at the bottom row of my answer here: What does having "constant variance" in a linear regression model mean?

Likewise, you don't have to check the residuals for each predictor for normality. (I honestly don't even know how that would work.)

What you can do with plots of residuals against individual predictors is check whether the functional form is properly specified. For example, if the residuals form a parabola, there is some curvature in the data that you have missed. To see an example, look at the second plot in the answer here: Checking model quality in linear regression. However, these issues don't apply with a binary predictor.

For what it's worth, if you only have categorical predictors, you can test for heteroscedasticity with Levene's test. I discuss it here: Why Levene's test of equality of variances rather than F ratio? In R you use ?leveneTest from the car package.

Edit: To better illustrate the point that looking at a plot of the residuals vs. an individual predictor variable does not help when you have a multiple regression model, consider this example:

    set.seed(8603)                        # this makes the example exactly reproducible
    x1 = sort(runif(48, min=0, max=50))   # here is the (continuous) x1 variable
    x2 = rep(c(1,0,0,1), each=12)         # here is the (dichotomous) x2 variable
    y  = 5 + 1*x1 + 2*x2 + rnorm(48)      # the true data generating process, there is
                                          #  no heteroscedasticity
    mod = lm(y~x1+x2)                     # fit the model (assumed form: y on both predictors)

You can see from the data generating process that there is no heteroscedasticity. Let's examine the relevant plots of the model to see if they imply problematic heteroscedasticity:

![]()

However, let's look at the plot of the residuals vs. the individual binary predictor variable to see if it looks like there is heteroscedasticity there:

![]()

Uh oh, it does look like there may be a problem. We know from the data generating process that there isn't any heteroscedasticity, and the primary plots for exploring this didn't show any either, so what is happening here? Maybe these plots will help:

![]() ![]()
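For readers who want to try these diagnostics themselves, here is a minimal sketch in R. It reuses the simulated data from the example and assumes the model is fit with `lm()` on both predictors. The names `mod`, `fit`, `sl_val`, `grp`, and `z` are illustrative choices, not part of the original example. The last part uses made-up, categorical-only data and runs only if the car package happens to be installed.

```r
# Simulated data as in the example (homoscedastic by construction)
set.seed(8603)
x1 <- sort(runif(48, min = 0, max = 50))   # continuous predictor
x2 <- rep(c(1, 0, 0, 1), each = 12)        # binary predictor
y  <- 5 + 1 * x1 + 2 * x2 + rnorm(48)

# Assumption: the model is an ordinary lm() fit on both predictors
mod <- lm(y ~ x1 + x2)

# Scale-location quantities computed by hand:
# square root of the absolute standardized residuals vs. the fitted values
fit    <- fitted(mod)
sl_val <- sqrt(abs(rstandard(mod)))

op <- par(mfrow = c(1, 3))
plot(mod, which = 1)                       # residuals vs. fitted values
plot(fit, sl_val,
     xlab = "Fitted values",
     ylab = "sqrt(|standardized residuals|)",
     main = "Scale-location (by hand)")
lines(lowess(fit, sl_val), col = "red")    # smooth trend; roughly flat suggests constant spread
boxplot(resid(mod) ~ factor(x2),           # residuals vs. the binary predictor alone
        xlab = "x2", ylab = "Residuals",
        main = "Residuals by x2")
par(op)

# Categorical-only case (hypothetical data): Levene's test from the car package.
# leveneTest() centers on the group medians by default.
grp <- factor(rep(c("a", "b", "c"), each = 16))
z   <- rnorm(48)
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(z ~ grp))
}
```

`plot(mod, which = 3)` draws the built-in scale-location plot directly; the by-hand version is only there to make explicit what is on each axis.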