FAQ: How do I interpret the sign of the quadratic term in a polynomial regression?
Consider the six graphs of nonlinear (curvilinear) relationships depicted below. [Figure: six panels, y1 through y6, each showing a different curvilinear relationship.]
Note that the regression line for predicting productivity from creativity becomes steeper, and the error of prediction is reduced (r increases), as cognitive ability increases. Such an interaction would be symmetric: for people with little creativity, there would be little or no correlation between intelligence and productivity; for people with high creativity, there would be a strong correlation between intelligence and productivity.
Curvilinear regression - Handbook of Biological Statistics
We could create three new graphs to show these relations. All we would have to do is take the graphs we have already made and substitute the terms "creativity" and "cognitive ability." In regression terms, an interaction means that the level of one variable influences the slope of the other variable.
We model interaction terms by computing a product vector (that is, we multiply the two IVs together to get a third variable) and then including this variable along with the other two in the regression equation. A graph of the hypothesized response surface: note how the regression line of Y on X2 becomes steeper as we move up values of X1.
Also note the curved contour lines on the floor of the figure. This means that the regression surface is curved. Here we can clearly see how the slopes become steeper as we move up values of both X variables.
When we model an interaction with 2 or more IVs with regression, the test we conduct is essentially for this shape.

Multiple Linear Regression: Curvilinear (quadratic & cubic) and Interaction
There are many other shapes that we might think of as representing the idea of interaction (one variable influences the importance of the other), but these other shapes are not tested by the product term in regression (things are different for categorical variables and product terms; there we can support many different shapes).
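As a minimal sketch of the product-vector approach, here is a simulated example using NumPy. The variable names, sample size, and coefficient values are illustrative assumptions, not from the text:

```python
import numpy as np

# Simulated data; names (x1, x2) and coefficients are illustrative assumptions.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)   # e.g. creativity
x2 = rng.normal(size=n)   # e.g. cognitive ability
y = 1.0 + 0.5 * x1 + 0.5 * x2 + 0.8 * x1 * x2 + rng.normal(scale=0.5, size=n)

# Product vector: multiply the two IVs to get a third variable, then
# include it alongside the original IVs in the regression.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # intercept, b1, b2, and the interaction weight (near 0.8 here)
```

With the product vector included, the fourth b weight estimates how much the slope of one IV changes per unit change in the other.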
Pedhazur's Views of the Interaction

In Pedhazur's view, it only makes sense to speak of interactions when (1) the IVs are orthogonal, and (2) the IVs are manipulated, so that one cannot influence the other. In other words, Pedhazur only wants to talk about interactions in the context of highly controlled research, essentially when data are collected in an ANOVA design.
He acknowledges that we can have interactions in nonexperimental research, but he wants to call them something else, like multiplicative effects. Nobody else seems to take this view. The effect is modeled identically both mathematically and statistically in experimental and nonexperimental research. True, they often mean something different, but that is true of experimental and nonexperimental designs generally.
If we follow his reasoning for independent variables that do not interact, we might as well adopt the term 'main effect' for experimental designs and 'additive effect' for nonexperimental designs. I don't understand his point about not having interactions when the IVs are correlated. Clearly we lose power to detect interactions when the IVs are correlated, but in my view, if we find them, they are interpreted just the same as when the IVs are orthogonal.
But I may have missed something important here.

Conducting Significance Tests for Interactions

The product term is created by multiplying together the two vectors that contain the two IVs. The product terms tend to be highly correlated with the original IVs. Most people recommend that we subtract the mean of the IV from the IV before we form the cross-product. This will reduce the size of the correlation between the IV and the cross-product term, but leave the test for the increase in R-square intact.
It will, however, affect the b weights. When you find a significant interaction, you must include the original variables and the interaction as a block, regardless of whether some of the IV terms are nonsignificant (unless all three are uncorrelated, an unlikely event).
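A quick demonstration of why mean-centering helps, under the assumption of two independent, normally distributed IVs with non-zero means (simulated values, not data from the text):

```python
import numpy as np

# Two independent IVs with non-zero means (simulated).
rng = np.random.default_rng(1)
x1 = rng.normal(loc=5.0, size=500)
x2 = rng.normal(loc=5.0, size=500)

raw_prod = x1 * x2
c1, c2 = x1 - x1.mean(), x2 - x2.mean()
centered_prod = c1 * c2

r_raw = np.corrcoef(x1, raw_prod)[0, 1]
r_centered = np.corrcoef(c1, centered_prod)[0, 1]
print(r_raw, r_centered)  # the correlation with the product drops toward zero
```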
The usual procedure is:

1. Compute the product term from the two IVs.
2. Regress Y onto X1 and X2.
3. Regress Y onto X1, X2, and the product term.
4. Test whether the difference in R-square from steps 2 and 3 is significant.

Alternatively, skip step 2 and check whether the b weight for the product term is significant in step 3, that is, in a simultaneous regression with Type III sums of squares.
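The hierarchical test just described can be sketched as follows. The data are simulated, and the F statistic tests the increase in R-square from adding the product term (1 numerator df, n - 4 denominator df):

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit (X includes the intercept column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(2)
n = 150
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = x1 + x2 + 0.6 * x1 * x2 + rng.normal(scale=1.0, size=n)

X_main = np.column_stack([np.ones(n), x1, x2])       # step 2 model
X_full = np.column_stack([X_main, x1 * x2])          # step 3 model

r2_main, r2_full = r_squared(X_main, y), r_squared(X_full, y)
# F test for the increase in R^2 (one extra term, n - 4 residual df)
F = (r2_full - r2_main) / ((1 - r2_full) / (n - 4))
print(r2_full - r2_main, F)
```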
If the b weight for the product term is significant, you have an interaction. Now you need to graph your regression equation to see how to interpret it. You may have to split your data to understand the interaction. If the b weight for the product term is not significant, you do not have an interaction (bearing in mind the sorts of errors we make in statistical work).
Drop the product term, go back to step 2, and interpret your b weights for the independent variables as you ordinarily would.

Moderators and Mediators

Some people talk about moderators and moderated regression.
The moderator variable is one whose values influence the importance of another variable.

Sometimes, though, the relationship between two measurement variables is curved rather than linear. In that case, the linear regression line will not be very good for describing and predicting the relationship, and the P value may not be an accurate test of the null hypothesis that the variables are not associated. You have three choices in this situation. If you only want to know whether there is an association between the two variables, and you're not interested in the line that fits the points, you can use the P value from linear regression and correlation.
This could be acceptable if the line is just slightly curved and your biological question is simply "Does more X cause more Y?" However, it will look strange if you use linear regression and correlation on a relationship that is strongly curved, and some curved relationships, such as a U-shape, can give a non-significant P value even when the fit to a U-shaped curve is quite good.
And if you want to use the regression equation for prediction, or you're interested in the strength of the relationship (r2), you should definitely not use linear regression and correlation when the relationship is curved. A second option is to do a data transformation of one or both of the measurement variables, then do a linear regression and correlation of the transformed data. There are an infinite number of possible transformations, but the common ones (log, square root, square) will make a lot of curved relationships fit a straight line pretty well.
This is a simple and straightforward solution, and if people in your field commonly use a particular transformation for your kind of data, you should probably go ahead and use it. If you're using the regression equation for prediction, be aware that fitting a straight line to transformed data will give different results than fitting a curved line to the untransformed data.
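As an illustration of the transformation option: for a power-shaped relationship, taking logs of both variables makes the relationship linear, and an ordinary straight-line fit recovers the exponent. The constants and noise level below are invented for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 50)
# Curved (power) relationship with a little multiplicative noise.
y = 2.0 * x ** 1.5 * rng.lognormal(sigma=0.05, size=50)

# Straight line on the log-log scale: log y = log a + b * log x
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print(slope, np.exp(intercept))  # roughly recovers the exponent and constant
```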
Your third option is curvilinear regression: there are a lot of equations that will produce curved lines, including exponential (involving b^X, where b is a constant), power (involving X^b), logarithmic (involving log X), and trigonometric (involving sine, cosine, or other trigonometric functions). For any particular form of equation involving such terms, you can find the equation for the curved line that best fits the data points, and compare the fit of the more complicated equation to that of a simpler equation, such as the equation for a straight line.
Here I will use polynomial regression as one example of curvilinear regression, then briefly mention a few other equations that are commonly used in biology. A polynomial equation is any equation that has X raised to integer powers, such as X2 and X3. A quadratic equation, which includes an X2 term, produces a parabola. You can fit higher-order polynomial equations, but it is very unlikely that you would want to use anything more than the cubic in biology.

Null hypotheses

One null hypothesis you can test when doing curvilinear regression is that there is no relationship between the X and Y variables; in other words, that knowing the value of X would not help you predict the value of Y.
This is analogous to testing the null hypothesis that the slope is 0 in a linear regression. You measure the fit of an equation to the data with R2, analogous to the r2 of linear regression. A quadratic equation will always have a higher R2 than the linear equation, a cubic higher than the quadratic, and so on. The second null hypothesis of curvilinear regression is therefore that the increase in R2 is only as large as you would expect by chance.
Assumptions

If you are testing the null hypothesis that there is no association between the two measurement variables, curvilinear regression assumes that the Y variable is normally distributed and homoscedastic for each value of X. Since linear regression is robust to violations of these assumptions (violating them doesn't increase your chance of a false positive very much), I'm guessing that curvilinear regression may not be sensitive to violations of normality or homoscedasticity either.
I'm not aware of any simulation studies on this, however. Curvilinear regression also assumes that the data points are independent, just as linear regression does.
You shouldn't test the null hypothesis of no association for non-independent data, such as many time series. However, there are many experiments where you already know there's an association between the X and Y variables, and your goal is not hypothesis testing, but estimating the equation that fits the line. For example, a common practice in microbiology is to grow bacteria in a medium with abundant resources, measure the abundance of the bacteria at different times, and fit an exponential equation to the growth curve.
The amount of bacteria after 30 minutes is not independent of the amount of bacteria after 20 minutes; if there are more at 20 minutes, there are bound to be more at 30 minutes.
However, the goal of such an experiment would not be to see whether bacteria increase in abundance over time (duh, of course they do); the goal would be to estimate how fast they grow, by fitting an exponential equation to the data. For this purpose, it doesn't matter that the data points are not independent.

Just as linear regression assumes that the relationship you are fitting a straight line to is linear, curvilinear regression assumes that you are fitting the appropriate kind of curve to your data.
If you are fitting a quadratic equation, the assumption is that your data are quadratic; if you are fitting an exponential curve, the assumption is that your data are exponential. Violating this assumption—fitting a quadratic equation to an exponential curve, for example—can give you an equation that doesn't fit your data very well.
In some cases, you can pick the kind of equation to use based on a theoretical understanding of the biology of your experiment. If you are growing bacteria for a short period of time with abundant resources, you expect their growth to follow an exponential curve; if they grow for long enough that resources start to limit their growth, you expect the growth to fit a logistic curve.
Other times, there may not be a clear theoretical reason for a particular equation, but other people in your field have found one that fits your kind of data well. And in other cases, you just need to try a variety of equations until you find one that works well for your data.
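A sketch of the bacterial-growth scenario above, fitting an exponential equation by regressing log(counts) on time. The counts, growth rate, and noise values are invented for illustration:

```python
import numpy as np

# Simulated counts; in a real experiment these would be measured abundances.
t = np.array([0., 10., 20., 30., 40., 50.])          # minutes
n0, rate = 100.0, 0.05                               # assumed "true" values
counts = n0 * np.exp(rate * t) * np.array([1.0, 1.02, 0.97, 1.01, 0.99, 1.03])

# Exponential fit via a straight line on log(counts): log N = log N0 + r*t
r_hat, log_n0 = np.polyfit(t, np.log(counts), 1)
doubling_time = np.log(2) / r_hat
print(r_hat, doubling_time)  # estimated growth rate per minute, doubling time
```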
How the test works

In polynomial regression, you add different powers of the X variable (X, X2, X3, and so on) to an equation to see whether they increase the R2 significantly.
The R2 will always increase when you add a higher-order term, but the question is whether the increase in R2 is significantly greater than expected due to chance. You can keep doing this until adding another term does not increase R2 significantly, although in most cases it is hard to imagine a biological meaning for exponents greater than 3. Even though the usual procedure is to test the linear regression first, then the quadratic, then the cubic, you don't need to stop if one of these is not significant.
For example, if the graph looks U-shaped, the linear regression may not be significant, but the quadratic could be.

Examples

Fernandez-Juricic et al. counted breeding sparrows per hectare in 18 parks in Madrid, Spain, and also counted the number of people per minute walking through each park (both measurement variables). [Figure: sparrow abundance vs. number of people walking per minute.]
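The U-shaped case mentioned earlier (linear term not significant, but the quadratic is) can be sketched with polynomial regression on simulated data; the degrees tested, sample size, and noise level are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 120)
# U-shaped data: no linear trend, strong quadratic trend, plus noise.
y = 1 + 1.5 * x ** 2 + rng.normal(scale=1.0, size=120)

def r2_for_degree(d):
    """R^2 of a polynomial fit of degree d."""
    coeffs = np.polyfit(x, y, d)
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

n = len(x)
prev = r2_for_degree(1)
for d in (2, 3):
    cur = r2_for_degree(d)
    # F test: does adding the X^d term increase R^2 more than chance?
    F = (cur - prev) / ((1 - cur) / (n - d - 1))
    print(f"degree {d}: R2 = {cur:.3f}, F for added term = {F:.1f}")
    prev = cur
```

Here the linear fit explains almost nothing, the quadratic term adds a large, significant chunk of R2, and the cubic term adds essentially nothing, so you would stop at the quadratic.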