## Note August 2017: This website will not be updated. Due to other research commitments, I will not have time to engage in pure methodological research or discussions in the foreseeable future.

Here I share an unsubmitted manuscript from April 2017 that I had planned to do more work on. Because I will not have the time to do so, I am instead publishing it online so that anyone interested can read it. Please note that I will not have time to respond to questions about the manuscript. Feel free to develop the insights in the manuscript, as long as you cite the source.

If you have come here to find my 2010 ESR article, it is available for free below. Beware, however, that logistic regression is often misused in several other ways, discussed further below.

**Logistic regression: Why we cannot do what we think we can do and what we can do about it.**

*European Sociological Review* 2010 26(1): 67-82. DOI: 10.1093/esr/jcp006

(Advance Access published on March 9, 2009)

DOWNLOAD full text (no subscription required): Logistic regression

Reprinted in Babones S (Ed.) (2013) *Applied Statistical Modeling*. SAGE Publications: SAGE Benchmarks in Social Research Methods.

This article discusses some problems with common uses of logistic regression. Logistic regression estimates do not behave like linear regression estimates in one important respect: They are affected by omitted variables, even when these variables are unrelated to the independent variables in the model. This fact has important implications that have gone largely unnoticed by sociologists. Importantly, we cannot straightforwardly interpret log-odds ratios or odds ratios as effect measures, because they also reflect the degree of unobserved heterogeneity in the model. In addition, we cannot compare log-odds ratios or odds ratios for similar models across groups, samples, or time points, or across models with different independent variables in a sample. This article discusses these problems and possible ways of overcoming them.

**Other common problems**

**1. Misinterpretation of odds ratios as relative risks**

People think that odds ratios (OR) are meaningful because they make them think of relative risks (RR). Sometimes there is a genuine misunderstanding, and sometimes people know that the two are not the same but vaguely think of them as something similar.

Here is an example showing the difference between OR and RR: If 30 percent of girls and 60 percent of boys fail a test, then the RR is 0.6/0.3 = 2. That is, a boy's risk of failing the test is 2 times a girl's risk, or: a boy is 2 times more likely to fail the test than a girl. The OR, on the other hand, is (0.6/0.4)/(0.3/0.7) = 1.5/0.43 = 3.5, that is: the odds of a boy failing the test are 3.5 times the odds of a girl failing it. However, many sociologists report an odds ratio of 3.5 as saying that the boy is 3.5 times more likely to fail the test than the girl. This is misleading, since most readers will take "3.5 times more likely" to describe a ratio of probabilities (i.e., RR), not a ratio of odds (i.e., OR). If the outcome in question is not very rare, OR and RR will not be similar, and the OR will overstate the RR. There are thus two problems here: First, many researchers themselves mistake OR for RR, or think that the difference between them is unimportant. Second, researchers report OR in ways that make many readers intuitively think of them as RR.

**2. Lack of meaningful effect estimates**

Logistic regression produces effect estimates in terms of OR or its logarithm, the log odds ratio (LnOR). Before publishing such estimates, one should think about what they say and how this relates to the question of interest. OR and LnOR inform us about the direction of an effect, but they are less informative about its substantive size. When the outcome is categorical, the unit of interest from an effect point of view is almost always percentage units. Although odds may be understandable, ratios of odds are not as intuitive as ratios of probabilities, as they are ratios of ratios (e.g., (0.6/0.4)/(0.3/0.7)) instead of just ratios (e.g., 0.6/0.3).

The correct interpretation of the OR of 3.5 (see above) is that for every boy who does not fail, 3.5 times as many boys fail as the number of girls failing for every girl who does not fail (cf. Osborne 2006). That is as intuitive as it gets. In our case, boys had a 60 percent risk of failing versus the girls' 30 percent risk. However, an OR of (roughly) 3.5 could also represent risks of, for example, 84 versus 60 percent, 10 versus 3, or even 99.89 versus 99.60. Although the difference between these is probably important for most research questions, the OR in itself cannot distinguish between them.
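This is easy to verify directly. A minimal sketch (function names are my own) computing the OR and RR for the risk pairs mentioned above:

```python
def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)

def odds_ratio(p1, p0):
    """Ratio of odds: a ratio of ratios."""
    return odds(p1) / odds(p0)

def relative_risk(p1, p0):
    """Ratio of probabilities."""
    return p1 / p0

# Risk pairs from the text: boys vs girls (0.60 vs 0.30), then 0.84 vs 0.60,
# 0.10 vs 0.03, and 0.9989 vs 0.9960.
for p1, p0 in [(0.60, 0.30), (0.84, 0.60), (0.10, 0.03), (0.9989, 0.9960)]:
    print(f"risks {p1} vs {p0}: OR = {odds_ratio(p1, p0):.2f}, "
          f"RR = {relative_risk(p1, p0):.2f}")
```

The ORs all come out near 3.5, while the RRs range from about 1.0 to 3.3: the OR alone cannot tell these very different situations apart.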

And log-odds ratios? An OR of 3.5 corresponds to a LnOR of 1.25. With a popular latent-variable interpretation, this means that the latent propensity for the outcome increases by 1.25. The only problem is that you have no scale for this latent propensity, so you cannot judge the size of this effect. The scale varies between models, samples, and groups, so 1.25 can be a small effect in one case but a large effect in another. If you have a binary outcome, only two outcomes exist and can be observed, and if you want to say something about the increase in the risk of having one outcome rather than the other, the number 1.25 tells you nothing.
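Effects expressed on the probability scale, by contrast, have a fixed and interpretable metric. As a minimal sketch (the coefficients and covariate values here are hypothetical, purely for illustration), average marginal effects from a logit model can be computed by averaging probability changes over the observations:

```python
import math

def p(xb):
    """Inverse logit: probability implied by the linear predictor xb."""
    return 1.0 / (1.0 + math.exp(-xb))

b0, b_x, b_d = -1.0, 0.8, 0.5      # hypothetical logit coefficients
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical observed values of x

# AME of continuous x: average of the derivative p*(1-p)*b_x over the sample.
ame_x = sum(p(b0 + b_x * x) * (1 - p(b0 + b_x * x)) * b_x for x in xs) / len(xs)

# Discrete AME of a dummy d: average change in probability when d goes 0 -> 1.
ame_d = sum(p(b0 + b_x * x + b_d) - p(b0 + b_x * x) for x in xs) / len(xs)

print(f"AME of x: {ame_x:.3f}, discrete AME of d: {ame_d:.3f}")
```

In practice one would of course take the coefficients from a fitted model; Stata's margins command and statsmodels' get_margeff report AME directly.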

To make results substantively interpretable, baseline probabilities should always be reported. Average marginal effects (AME), i.e., the average effect of a one-unit change in a variable (or of a discrete change for dummy variables), are understandable (expressed in percentage units), policy relevant (they give the average effect across all observations), and comparable across groups and models. AME or their discrete counterparts should always be reported from logit (or similar models such as probit). One can complement these with marginal effects at different points in the probability distribution, which give percentage-unit effects for those with different probabilities, explicitly acknowledging the non-linearity of the relation.

**3. Misinterpreting interactions**

Logistic regression is intrinsically non-linear, so interaction effects in these models tell us about non-linear deviations from a non-linear functional form. If we are interested in the substantive outcomes, i.e., the risk of having a certain outcome, the results will differ across groups even without interaction effects in the model, as long as the groups have different baseline risks. This is often overlooked, and interactions are instead interpreted as if they were non-linear deviations from a linear effect.
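A small sketch of this point, with hypothetical coefficients: a logit model with a common effect of x and no interaction term still implies different percentage-point effects for groups with different baseline risks:

```python
import math

def p(xb):
    """Inverse logit: probability implied by the linear predictor xb."""
    return 1.0 / (1.0 + math.exp(-xb))

b_x = 1.0  # common logit effect of x; note: no interaction term anywhere
group_intercepts = {"group A": -2.0, "group B": 0.0}  # different baselines

for name, b0 in group_intercepts.items():
    effect = p(b0 + b_x) - p(b0)  # change in risk when x goes from 0 to 1
    print(f"{name}: risk {p(b0):.2f} -> {p(b0 + b_x):.2f} "
          f"(+{100 * effect:.1f} percentage points)")
```

Here group A's risk rises by about 15 percentage points and group B's by about 23, even though the log-odds effect is identical; on the probability scale the "interaction" is built into the functional form.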

Ai and Norton (2003) and Norton, Wang, and Ai (2004) are excellent articles about interactions in logit and probit. They show that the issue is far too complex to understand intuitively: "Some interaction effects are positive, and some are negative, no matter what the sign of the coefficient on the interaction term" (2004: 167).

**What about the linear probability model?**

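As a minimal sketch of the equivalence at stake in this section (the proportions are hypothetical, chosen to match the 30/60 percent failure example above): with a single binary regressor, the LPM (OLS) coefficient is simply the difference in outcome proportions, and the discrete probability change implied by a saturated logit is exactly the same number:

```python
import math

# Hypothetical proportions matching the earlier example: 30 percent of girls
# and 60 percent of boys fail the test.
p_girls = 30 / 100
p_boys = 60 / 100

# LPM: regressing failure on a "boy" dummy gives slope = difference in proportions.
lpm_coef = p_boys - p_girls

# A saturated logit reproduces the group proportions exactly, so the discrete
# change in probability it implies is the same number.
logit = lambda q: math.log(q / (1 - q))
inv_logit = lambda z: 1.0 / (1.0 + math.exp(-z))
b0 = logit(p_girls)                    # logit intercept
b1 = logit(p_boys) - logit(p_girls)    # logit coefficient on the "boy" dummy
logit_change = inv_logit(b0 + b1) - inv_logit(b0)

print(f"LPM coefficient: {lpm_coef:.2f}, logit discrete change: {logit_change:.2f}")
```

With additional regressors the two no longer coincide exactly, but LPM coefficients typically remain very close to the logit AME.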
For the braver among us, there is the taboo-breaking option of using the LPM, i.e., OLS for binary outcomes. Many economists do it, there are far fewer mistakes to make (coefficients can be compared across models and groups, and interactions are intuitive), and the coefficients are almost always very close to the AME (or to their discrete counterparts in the case of dummy variables). If all sociologists switched from logit to OLS, fewer errors would be made and the results would be more reliable and interpretable. In economics, Angrist and Pischke (2009, Chapter 3) show that the LPM is a good option for different kinds of limited dependent variables. Hellevik (2009) also makes a compelling case for choosing the LPM over logit.

My experience is that, however well argued, using or recommending the LPM raises strong negative feelings in many camps. This appears to spring from the argument that the logistic curve normally describes effects on binary outcomes more correctly than the straight line does. This is of course correct, but although a good model fit may feel nice, we must be guided by our research questions and the best ways to answer them. A bad model fit is bad for our purposes only if it distorts the estimates that we are interested in.

If we are interested in the average effect of some variable on some outcome, which we often are, the LPM normally works just fine. If we are interested in how the effect varies over the probability distribution, the LPM is not as straightforward. However, keep in mind that a logit or probit model will not per se give us any information about the non-linearity of the effects either, as the coefficients tell us nothing interpretable about this. To say something about variation in effect sizes from such models, we need to report a range of marginal effects at different relevant points in the probability distribution.

**References**

Norton EC, Wang H, Ai C. 2004. Computing interaction effects and standard errors in logit and probit models. *The Stata Journal* 4: 154-167.

Ai C, Norton EC. 2003. Interaction terms in logit and probit models. *Economics Letters* 80: 123-129.

Angrist JD, Pischke J-S. 2009. *Mostly Harmless Econometrics: An Empiricist's Companion*. Princeton University Press.

Hellevik O. 2009. Linear versus logistic regression when the dependent variable is a dichotomy. *Quality and Quantity* 43: 59-74.

Osborne JW. 2006. Bringing balance and technical accuracy to reporting odds ratios and the results of logistic regression analyses. *Practical Assessment, Research & Evaluation* 11(7): 1-6.
