9 Logistic Regression
Learning Objectives
In this chapter you will learn:
• How to use logistic regression to describe the relationship between an interval-level
independent variable and a dichotomous dependent variable
• How logistic regression is similar to—and different from—ordinary least squares
regression
• How maximum likelihood estimation works
• How to use logistic regression with multiple independent variables
• How to use probabilities to interpret logistic regression results
Political analysis is not unlike using a toolbox. The researcher looks at the substantive problem at
hand, selects the methodological tool most appropriate for analyzing the relationship, and then
proceeds with the analysis. Selection of the correct tool is determined largely by the levels of
measurement of the variables of interest. If both the independent and the dependent variables are
measured by nominal or ordinal categories—a common situation, particularly in survey
research—the researcher would most likely select cross-tabulation analysis. If both the independent
and the dependent variables are interval level, then ordinary least squares (OLS) regression would be
applied. Finally, if the researcher wanted to analyze the relationship between an interval-level
dependent variable and a categorical independent variable, then he or she might use mean
comparison analysis or, alternatively, the researcher could specify and test a linear regression model
using dummy variables. These techniques, all of which have been discussed in earlier chapters, add up
to a well-stocked toolbox. Even so, one set of tools is missing.
Logistic regression is part of a family of techniques designed to analyze the relationship between an
interval-level independent variable and a categorical dependent variable, a dependent variable
measured by nominal or ordinal values. The dependent variable might have any number of
categories—from several, to a few, to two. In this chapter we discuss how to use and interpret logistic
regression in the simplest of these situations, when the dependent variable takes on only two values.
For example, suppose we are using survey data to investigate the relationship between education and
voter turnout. We think that a positive relationship exists here: As education increases, so does the
likelihood of voting. The independent variable, education, is measured in precise 1-year increments,
from 0 (no formal education) to 20 (20 years of schooling). The dependent variable, however, takes on
only two values—respondents either voted or they did not vote. In this situation, we have a binary
dependent variable. A binary variable is a dichotomous variable, one that can assume only two
values. Binary variables are identical to dummy variables, discussed in Chapter 8.
Thus, voted/did not vote, smoker/nonsmoker, approve/do not approve, and married/unmarried are
all examples of dummy variables or binary variables.¹
In some ways, logistic regression is similar to OLS regression. Like OLS, logistic regression gauges the
effect of the independent variable by estimating an intercept and a slope, both familiar fixtures of
linear regression. Plus logistic regression provides a standard error for the slope, which allows the
researcher to test hypotheses about the effect of the independent variable on the dependent variable.
And like OLS, logistic regression is remarkably flexible, permitting the use of multiple independent
variables, including dummy independent variables.
In one fundamental way, however, logistic regression is a different breed of cat. When we perform OLS
regression, we can reasonably assume a linear relationship between an independent variable (x) and a
dependent variable (y). For example, for the relationship between years of schooling (x) and income in
dollars (y), we can use a linear model to estimate the average dollar-change in income for each 1-year
increase in education. OLS would give us an idea of how closely the relationship between x and y fits
this linear pattern. However, when we have a binary dependent variable, we must assume that it bears
a nonlinear relationship to x. So as education (x) increases from 8 years to 9 years to 10 years, we
most plausibly assume that the likelihood of voting (y) is low and increases slightly for each of these
1-year increments. But as education increases from 11 years to 12 years to 13 years, we would expect
voter turnout to show large increases for each 1-year increment in this range of x. In the higher values
of education—say, beyond 13 years—we would assume that turnout is already high and that each
additional year of schooling would have a weaker effect on voting. A logistic regression analysis would
give us an idea of how closely the relationship between x and y fits this nonlinear pattern.
This chapter is divided into four sections. In the first section we use both hypothetical and real-world
data to illustrate the logic behind logistic regression. Here you will be introduced to some unfamiliar
terms—such as odds and logged odds—that define the workings of the technique, and you will learn
what to look for in your own analyses and how to describe and interpret your findings. In the second
section we take a closer look at maximum likelihood estimation, the method logistic regression uses to
estimate the effect of the independent variable (or variables) on the dependent variable. Here you will
see how logistic regression is similar to other techniques and statistics we discussed previously,
particularly chi-square. In the third section we demonstrate how the logistic regression model, much
like multiple linear regression, can be extended to accommodate several independent variables.
Finally, we consider some additional ways to present and interpret logistic regression results. By the
end of this chapter you will have added another powerful technique to your toolbox of political
research methods.
9.1 The Logistic Regression Approach
We begin with a hypothetical example. Suppose we are investigating whether education (x) affects
voter turnout (y) among a random sample of respondents (n = 500). For purposes of illustration, we
will assume that the independent variable, education, is an interval-level variable that varies from 0
(low) to 4 (high), and that voter turnout is a binary dependent variable, coded 1 if the individual voted
and 0 if he or she did not vote. Table 9-1 shows the results from a cross-tabulation analysis of the hypothetical sample data. Although column percentages have not been supplied in Table 9-1,
they are easy to figure out because each value of education contains exactly 100 cases. For example, of
the 100 people in the low-education category, 6 voted—a percentage equal to 6 or a proportion equal
to .06. Twenty percent (.20) of the 100 middle-low education individuals voted, 50 percent (.50) of the
middle group voted, and so on. The bottom row of Table 9-1 presents the proportion of voters for each value of education, but it uses the label “Probability of
voting.” Why use probability instead of proportion? The two terms are synonymous. Think of it this
way: If you were to randomly select one individual from the group of 100 low-education respondents,
what is the probability that this randomly selected person voted? Because random selection
guarantees that each case has an equal chance of being picked, there are 6 chances in 100—a
probability of .06—of selecting a voter from this group. Similarly, you could say that there is a random
probability of voting equal to .06 for any individual in the low-education category, a probability of .20
for any respondent in the middle-low group, and so on. It is important to shift your thinking from
proportions to probabilities, because logistic regression is aimed at determining how well an
independent variable (or set of independent variables) predicts the probability of an occurrence, such
as the probability of voting.
Consider the Table 9-1 probabilities and make some substantive observations. Clearly, a positive
relationship exists between education and the probability of voting: As education (x) goes up, so does
the probability of voting (y). Now examine this pattern more closely and apply the logic of linear
regression. Does a one-unit increase in the independent variable produce a consistent increase in the
probability of voting? Starting with the interval between low and middle-low, the probability goes
from .06 to .20—an increase of .14. So by increasing the independent variable by 1 in this interval, we
see a .14 increase in the probability of voting. Between middle-low and middle, however, this effect
increases substantially, from .20 to .50—a jump of .30. The next increment, from middle to middle-high, produces another .30 increase in the probability of voting, from .50 to .80. But this effect levels
off again between the two highest values of the independent variable. Moving from middle-high to
high education occasions a more modest increase of .14 in the probability of voting. Thus the linear
logic does not work very well. A unit change in education produces a change in the probability of
voting of either .14 or .30, depending on the range of the independent variable examined. Put another
way, the probability of voting (y) has a nonlinear relationship to education (x).
Rest assured that there are very good statistical reasons the researcher should not use OLS regression
to estimate the effect of an interval-level independent variable on a binary dependent variable.²
Perhaps as important, there are compelling substantive reasons you would not expect a linear model
to fit a relationship such as the one depicted in Table 9-1.
Think about this for a moment. Suppose you made $10,000 a year and were faced with the decision of
whether to make a major purchase, such as buying a home. There is a good chance that you would
decide not to make the purchase. Now suppose that your income rose to $20,000, a $10,000 increase.
To be sure, this rise in income might affect your reasoning, but most likely it would not push your
decision over the purchasing threshold, from a decision not to buy to a decision to buy. Similarly, if
your initial income were $95,000, you would probably decide to buy the house, and an incremental
$10,000 change, to $105,000, would have a weak effect on this decision—you were very likely to make
the purchase in the first place. But suppose that you made $45,000. At this income level, you might
look at your decision a bit differently: “If I made more money, I could afford a house.” Thus that
$10,000 pay raise would push you over the threshold. Going from $45,000 to $55,000 greatly
enhances the probability that you would make the move from “do not buy” to “buy.” So at low and high
initial levels of income, an incremental change in the causal variable has a weaker effect on your
dichotomous decision (do not buy/buy) than does the same incremental change in the middle range of
income.
Although fabricated, the probabilities in Table 9-1
show a plausible pattern. Less educated individuals are unlikely to vote, and you would not expect a
small increment in education to make a huge difference in the probability of voting. The same idea
applies to people in the upper education range. Individuals in the middle-high to high category are
already quite likely to vote. It would be unreasonable to suggest that, for highly educated people, a
one-unit increase in the independent variable would have a big effect on the likelihood of voting. It is
in the middle intervals of the independent variable—from middle-low to middle-high—where you
might predict that education would have its strongest effect on the dependent variable. As people in
this range gain more of the resource (education) theoretically linked to voting, a marginal change in
the independent variable is most likely to switch their dichotomous decision from “do not vote” to
“vote.” Logistic regression allows us to specify a model that takes into account this nonlinear
relationship between education and the probability of voting.
As we have seen, the first step in understanding logistic regression is to think in terms of the
probability of an outcome. The next step is to get into the habit of thinking in terms of the odds of an
outcome. This transition really is not too difficult, because odds are an alternative way of expressing
probabilities. Whereas probabilities are based on the number of occurrences of one outcome (such as
voting) divided by the total number of outcomes (voting plus nonvoting), odds are based on the
number of occurrences of one outcome (voting) divided by the number of occurrences of the other
outcome (nonvoting). According to Table 9-1
(http://content.thuzelearning.com/books/Pollock.2826.18.1/sections/navpoint-57#s9781483396095.i1883) ,
for example, among the 100 people in the middle-high education group, there were 80 voters—a
probability of voting equal to 80/100 or .80. What are the odds of voting for this group? Using the raw
numbers of voters and nonvoters, the odds would be 80 to 20, or, to use a more conventional way of
verbalizing odds, 4 to 1, four voters to every nonvoter. In describing odds, we ordinarily drop the “ ...
to 1” part of the verbalization and say that the odds of voting are 4. Thus, for the middle-high
education group, the probability of voting is .80 and the odds of voting are 4. In figuring odds, you can
use the raw numbers of cases, as we have just done, or you can use probabilities to compute odds. The
formula for converting probabilities to odds is as follows:
Odds = Probability / ( 1 − Probability )
Apply this conversion to the example just discussed. For middle-high education respondents, the odds
would be .80 divided by (1 minus .80), which is equal to .80/.20, or 4. The “Odds of voting” column in Table 9-2 shows this conversion for the five education groups.
Consider the numbers in the “Odds of voting” column and note some further properties of odds. Note
that probabilities of less than .50 produce odds of less than 1 and probabilities of greater than .50
convert to odds of greater than 1. The probabilities for low and middle-low education respondents
(.06 and .20, respectively) convert to odds of .06 and .25, and the probabilities among the highest
education groups translate into odds of 4 and 16. If an event is as likely to occur as not to occur, as
among the middle education people, then the probability is .50 and the odds are equal to 1 (.50/.50 =
1).
Now examine the “Odds of voting” column in Table 9-2
more closely. Can you discern a systematic pattern in these numbers, as you proceed down the column
from low education to high education? Indeed, you may have noticed that the odds of voting for the
middle-low education group are (very nearly) four times the odds of voting for the low education category, since 4 times .06 is about equal to .25. And the odds of voting for the middle education group are four times the odds for the middle-low group, since 4 times .25 equals 1. Each additional move, from
middle to middle-high (from an odds of 1 to an odds of 4) and from middle-high to high (from 4 to 16),
occasions another fourfold increase in the odds. So, as we proceed from lower to higher values of the
independent variable, the odds of voting for any education group are four times the odds for the next-lower group. In the language of logistic regression, the ratio of the odds at one value of the independent variable to the odds at the next-lower value of the independent variable is called the odds ratio. Using this terminology to describe the “Odds of voting” column of Table 9-2, we would say that the odds ratio is 4 for each one-unit increase in education.
The pattern of odds shown in Table 9-2
may be described in another way. Instead of figuring out the odds ratio for each change in education,
we could calculate the percentage change in the odds of voting for each unit change in education.
This would be accomplished by seeing how much the odds increase and then converting this number
to a percentage. Between low and middle-low, for example, the odds of voting go from .06 to .25, an
increase of .19. The percentage change in the odds would be .19 divided by .06, which is equal to
3.17—a bit more than a 300-percent increase in the odds of voting. For the move from middle-low to
middle we would have (1 − .25)/.25 = 3.00, another 300-percent increase in the odds of voting. In fact,
the odds of voting increase by 300 percent for each additional unit increase in education: from
middle to middle-high ([4 − 1]/1 = 3.00) and from middle-high to high ([16 − 4]/4 = 3.00). Using this
method to describe the Table 9-2
data, we could conclude that the odds of voting increase by 300 percent for each one-unit increase in
education.
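The same percentage-change arithmetic can be sketched in Python (the odds values are taken from the Table 9-2 column; the first step is 317 percent rather than exactly 300 because the odds of .06 are themselves rounded):

```python
# Percentage change in the odds for each one-unit step up in education
odds = [0.06, 0.25, 1.0, 4.0, 16.0]
pct_change = [round(100 * (odds[i + 1] - odds[i]) / odds[i])
              for i in range(len(odds) - 1)]
print(pct_change)  # [317, 300, 300, 300]
```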
Briefly review what we have found so far. When we looked at the relationship between education and
the probability of voting, we saw that an increase in the independent variable does not produce a
consistent change in the dependent variable. In examining the relationship between education and the
odds of voting, however, we saw that a unit change in education does produce a constant odds ratio of voting—equal to 4 for each unit change in x. Alternatively, each change in the
independent variable produces a consistent percentage increase in the odds of voting, a change equal
to 300 percent for each unit change in x. What sort of model would summarize this consistent pattern?
The answer to this question lies at the heart of logistic regression. Logistic regression does not
estimate the change in the probability of y for each unit change in x. Rather, it estimates the change in
the log of the odds of y for each unit change in x. Consider the third column of numbers in Table 9-2.
This column reports an additional conversion, labeled “Logged odds of voting.” For low education, this
number is equal to −2.8; for middle-low education, it is equal to −1.4; for middle education, 0; for the
middle-high group, +1.4; and for high education, +2.8. Where did these numbers originate?
A logarithm, or log for short, expresses a number as an exponent of some constant or base. If we chose a base of 10, for example, the number 100 would be expressed as 2, since 100 equals the base of 10 raised to the power of 2 (10^2 = 100). We would say, “The base-10 log of 100 equals 2.” Base-10 logs are called common logarithms, and they are used widely in electronics and the experimental sciences. Statisticians generally work with a different base, denoted as e. The base e is approximately equal to 2.72. Base-e logs are called natural logarithms and are abbreviated ln. Using the base e, we would express the number 100 as 4.61, since 100 equals the base e raised to the power of 4.61 (e^4.61 ≈ 100, or ln(100) ≈ 4.61). We would say, “The natural log of 100 equals 4.61.” The five numbers in the Table 9-2 column “Logged odds of voting” are simply the natural logs of .06 (e^−2.8 = .06), .25 (e^−1.4 = .25), 1 (e^0 = 1), 4 (e^1.4 = 4), and 16 (e^2.8 = 16). Using conventional notation: ln(.06) = −2.8, ln(.25) = −1.4, ln(1) = 0, ln(4) = 1.4, and ln(16) = 2.8.
These five numbers illustrate some general features of logarithmic transformations. Any number less
than 1 has a negatively signed log. So to express .25 as a natural log, we would raise the base e to a
negative power, −1.4. Any number greater than 1 has a positively signed log. To convert 4 to a natural
log, we would raise e to the power of 1.4. And 1 has a log of 0, since e raised to the power of 0 equals 1.
Natural log transformations of odds are often called logit transformations or logits for short. So the logit (“lowjit”) of 4 is 1.4.³
You are probably unaccustomed to thinking in terms of odds instead of probabilities. And it is a safe
bet that you are really unaccustomed to thinking in terms of the logarithmic transformations of odds.
But stay focused on the “Logged odds of voting” column. Again apply the linear regression logic. Does a
unit change in the independent variable, education, produce a consistent change in the log of the odds
of voting? We can see that moving from low education to middle-low education, the logged odds
increases from −2.8 to −1.4, an increase of 1.4. And moving from middle-low to middle, the logged
odds again increases by 1.4 (0 minus a negative 1.4 equals 1.4). From middle to middle-high and from
middle-high to high, each one-unit increase in education produces an increase of 1.4 in the logged
odds of voting.
Herein lies the odd beauty of logistic regression: although we may not use a linear model to estimate
the effect of an independent variable on the probability of a binary dependent variable, we may use a
linear model to estimate the effect of an independent variable on the logged odds of a binary
dependent variable. Consider this plain-vanilla regression model:
Logged odds (y) = a + b(x)
As you know, the regression coefficient, b, estimates the change in the dependent variable for each unit change in the independent variable. And the intercept, a, estimates the value of the dependent variable when x is equal to 0. Using the numbers in the “Logged odds of voting” column to identify the values for a and b, we would have:
Logged odds (voting) = −2.8 + 1.4 (education)
Review how this model fits the data. For the low-education group (coded 0 on education), the logged
odds of voting would be −2.8 + 1.4(0), which is equal to −2.8. For the middle-low education group
(coded 1 on education): −2.8 + 1.4(1), equal to −1.4. For the middle group (coded 2): −2.8 + 1.4(2),
equal to 0. And so on, for each additional one-unit increase in education. This linear model nicely
summarizes the relationship between education and the logged odds of voting.
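That fit can be confirmed directly (a sketch; the education codes 0 through 4 and the coefficients −2.8 and 1.4 are from the hypothetical example):

```python
def logged_odds(education):
    """The hypothetical model: Logged odds (voting) = -2.8 + 1.4(education)."""
    return -2.8 + 1.4 * education

# One prediction per education group, low (0) through high (4)
fitted = [round(logged_odds(x), 1) for x in range(5)]
print(fitted)  # [-2.8, -1.4, 0.0, 1.4, 2.8]
```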
Now if someone were to ask, “What, exactly, is the effect of education on the likelihood of voting?” we
could reply, “For each unit increase in education there is an increase of 1.4 in the logged odds of
voting.” Although correct, this interpretation is not terribly intuitive. Therefore, we can use the
regression coefficient, 1.4, to retrieve a more understandable number: the change in the odds ratio of
voting for each unit change in education. How might this be accomplished? Remember that, as in any
regression, all the coefficients on the right-hand side are expressed in units of the dependent variable.
Thus the intercept, a, is the logged odds of voting when x is 0. And the slope, b, estimates the change in
the logged odds of voting for each unit change in education. Because logged odds are exponents of e,
we can get from logged odds back to odds by raising e to the power of any coefficient in which we are
interested. Accordingly, to convert the slope, 1.4, we would raise e to the power of 1.4. This exponentiation procedure, abbreviated Exp(b), looks like this:
Exp(b) = Exp(1.4) = e^1.4 ≈ 4
Now we have a somewhat more interpretable reply: “For each unit change in education, the odds ratio
increases by 4. Members of each education group are four times more likely to vote than are members
of the next-lower education group.”
Even more conveniently, we can translate the coefficient, 1.4, into a percentage change in the odds of
voting. Here is the general formula:
Percentage change in the odds of y = 100 × (Exp (b) − 1)
For our example:
Percentage change in the odds of voting = 100 × (Exp(1.4) − 1) = 100 × (4 − 1) = 300
Thus we now can say, “Each unit increase in education increases the odds of voting by 300 percent.”
Note further that, armed with the logistic regression equation, “Logged odds (voting) = −2.8 + 1.4
(education),” we can estimate the odds of voting—and therefore the probability of voting—for each
value of the independent variable. For the middle-low education group, for example, the logistic
regression tells us that the logged odds of voting is −2.8 plus 1.4(1), equal to −1.4. Again, because −1.4
is the exponent of e, the odds of voting for this group would be Exp(−1.4), equal to .25. If the odds of
voting is equal to .25, what is the probability of voting? We have already seen that:
Odds = Probability / ( 1 − Probability )
Following a little algebra:
Probability = Odds / ( 1 + Odds )
So for the middle-low group, the probability of voting would be:
.25 / ( 1 + .25 ) = .25 / 1.25 = .20
By performing these reverse translations for each value of x—from logged odds to odds, and from
odds to probabilities—we can retrieve the numbers in the “Probability of voting” column of Table 9-2. If we were to plot these retrieved probabilities of y for each value of x, we would end up with Figure 9-1.
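The full reverse translation—logged odds to odds to probability—can be written as a small Python function (an illustration using the hypothetical coefficients; the recovered values match the Table 9-2 probabilities to two decimals):

```python
import math

def predicted_probability(education):
    """Translate the model's logged odds into a probability of voting."""
    logged_odds = -2.8 + 1.4 * education   # linear model in logit form
    odds = math.exp(logged_odds)           # logged odds -> odds
    return odds / (1 + odds)               # odds -> probability

probs = [round(predicted_probability(x), 2) for x in range(5)]
print(probs)  # [0.06, 0.2, 0.5, 0.8, 0.94]
```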
In the beginning we realized that the linear regression logic could not be accurately or appropriately
applied to the nonlinear relationship between education and the probability of voting. But after
transforming the dependent variable into logged odds, we could apply a linear model. So x bears a
nonlinear relationship to the probability of y, but x bears a linear relationship to the logged odds of y.
Furthermore, because the logged odds of y bears a nonlinear relationship to the probability of y, the
logistic regression model permits us to estimate the probability of an occurrence for any value of x.
An S-shaped relationship, such as that depicted in Figure 9-1,
is the visual signature of logistic regression. Just as OLS regression will tell us how well our data fit a
linear relationship between an independent variable and the values of an interval-level dependent
variable, logistic regression will tell us how well our data fit an S-shaped relationship between an
independent variable and the probability of a binary dependent variable.⁴
In the hypothetical education−voting example, the relationship worked out perfectly. The logistic
regression returned the exact probabilities of voting for each value of education. By eyeballing the
“Logged odds of voting” column of Table 9-2,
we could easily identify the intercept and regression coefficient of the logistic regression equation. No
prediction error. In the practical world of political research, of course, relationships are never this
neat.
Figure 9-1 Plotted Probabilities of Voting (y), by Education (x)
Note: Hypothetical data.
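To see how an estimation routine would recover the coefficients from raw counts, here is a sketch of a maximum likelihood fit in pure Python. The counts come from the hypothetical Table 9-1; the Newton-Raphson routine is our own illustration, not the book's procedure, and because the table's probabilities are rounded it recovers approximately −2.76 and 1.38 rather than exactly −2.8 and 1.4:

```python
import math

# (education code x, respondents n, voters k) from the hypothetical Table 9-1
data = [(0, 100, 6), (1, 100, 20), (2, 100, 50), (3, 100, 80), (4, 100, 94)]

def fit_logistic(data, iterations=25):
    """Maximize the log-likelihood of logged-odds(y) = a + b*x by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        grad_a = grad_b = 0.0      # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0   # observed information (negative Hessian)
        for x, n, k in data:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))  # predicted probability
            grad_a += k - n * p
            grad_b += x * (k - n * p)
            w = n * p * (1.0 - p)
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * grad_a - h_ab * grad_b) / det    # Newton step for a
        b += (h_aa * grad_b - h_ab * grad_a) / det    # Newton step for b
    return a, b

a, b = fit_logistic(data)
print(round(a, 2), round(b, 2))  # -2.76 1.38
```

In practice you would hand this job to a statistics package (Stata, SPSS, R, or Python's statsmodels), but the sketch shows the machinery: repeatedly adjust the intercept and slope until the observed counts are as likely as possible under the model.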
To apply what you have learned thus far—and to discuss some further properties of logistic
regression—we enlist here a real-world dataset, the 2012 General Social Survey (GSS), and reexamine
the relationship between education and voting. We estimate the logistic regression equation as follows:
Logged odds (voting) = a + b(education)
The dependent variable is reported turnout in the 2008 presidential election. As in the hypothetical
example, voters are coded 1 and nonvoters are coded 0. Unlike the hypothetical example, the
independent variable, education, is measured in years of formal schooling. This is a more realistic
interval-level variable, with values that range from 0 for no formal education to 20 for 20 years of
education. Table 9-3 reports the results obtained from a logistic regression analysis using Stata.
First consider the numbers in the column labeled “Coefficient estimate.” Plugging these values into our
equation, we would have:
Logged odds (voting) = − 2.068 + .226 (education)
The coefficients tell us that, for individuals with no formal schooling, the estimated logged odds of
voting is equal to −2.068, and each 1-year increment in education increases the estimated logged odds
by .226. The value in the right-most column, “Exp(b),” translates logged odds back into odds and thus
provides a more accessible interpretation. Every unit increase in education increases the odds ratio by
1.254. Individuals at any given level of education are about 1.25 times more likely to vote than are
individuals at the next-lower level of education. That is, as we move from one value of education to the
next-higher value, we would multiply the odds of voting by about 1.25. Perhaps the most intuitively
appealing way to characterize the relationship is to estimate the percentage change in the odds of
voting for each 1-year increase in education. As we have seen, this is accomplished by subtracting 1
from the exponent, 1.254, and multiplying by 100. Performing this calculation: 100 × (1.254 − 1) =
25.4. Thus the odds of voting increase by about 25 percent for each 1-year increase in the independent
variable.
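The arithmetic for the Table 9-3 coefficient can be reproduced in a couple of lines (a sketch; .226 is the estimate reported in the text):

```python
import math

b = 0.226                            # logistic regression coefficient for education
odds_ratio = math.exp(b)             # Exp(b): multiplicative change in the odds
pct_change = 100 * (odds_ratio - 1)  # percentage change in the odds
print(round(odds_ratio, 3), round(pct_change, 1))  # 1.254 25.4
```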
What would the null hypothesis have to say about these results? As always, the null hypothesis claims
that, in the population from which the sample was drawn, no relationship exists between the
independent and dependent variables, that individuals’ levels of education play no role in determining
whether they vote. Framed in terms of the logistic regression coefficient, the null hypothesis says that,
in the population, the true value of the coefficient is equal to 0, that a one-unit increase in the
independent variable produces no change in the logged odds of voting. The null hypothesis also can be
framed in terms of the odds ratio, Exp(b). As we have seen, the odds ratio tells us by how much to
multiply the odds of the dependent variable for each one-unit increase in the independent variable. An
odds ratio of less than 1 means that the odds decline as the independent variable goes up (a negative
relationship). (For a discussion of negative relationships in logistic regression, see Box 9-1.) An odds ratio of greater than 1 says that the odds increase as the independent variable goes up (a
positive relationship). But an odds ratio of 1 means that the odds do not change as the independent
variable increases. Thus an odds ratio equal to 1 would be good news for the null hypothesis, because
it would mean that individuals at any level of education are no more likely to vote than are individuals
at the next-lower level of education. So if the logistic regression coefficient were equal to 0 or Exp(b)
were equal to 1, then we would have to say that the independent variable has no effect on the
dependent variable.⁵ As it is, however, we obtained an estimated coefficient equal to .226, which
is greater than 0, and an odds ratio equal to 1.254, which is greater than 1. But how can we tell if these
numbers are statistically significant?
Notice that, just as in OLS regression, logistic regression has provided a standard error for the
estimated coefficient, b. And, again like OLS, the standard error tells us how much prediction error is
contained in the estimated coefficient.⁶ Thus, according to Table 9-3, each 1-year increase in
education produces a .226 increase in the logged odds of voting, give or take .030 or so. OLS
regression computes a test statistic based on the Student’s t-distribution. In Stata, logistic regression
returns z, based on the normal distribution. SPSS computes the Wald statistic, which follows a
chi-square distribution.⁷
All computer programs provide a P-value for the test statistic. Like any P-value, this number tells you
the probability of obtaining the observed results, under the assumption that the null hypothesis is
correct. A P-value equal to .000 says that, if the null hypothesis is correct, then the probability of
obtaining a regression coefficient of .226 is highly remote—clearly beyond the .05 standard.
Therefore, we can safely reject the null hypothesis and conclude that education has a statistically
significant effect on the likelihood of voting.
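The test statistic itself can be recovered from the reported coefficient and standard error. The sketch below forms the z statistic (and its square, the Wald statistic) along with a normal-based two-tailed P-value; it is an illustrative construction, not the book's own code:

```python
import math

b, se = 0.226, 0.030  # coefficient and standard error from Table 9-3

z = b / se       # z statistic as reported by Stata
wald = z ** 2    # Wald statistic as reported by SPSS (chi-square with 1 df)

# Two-tailed P-value under the standard normal distribution
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 2))          # 7.53
print(p_two_tailed < 0.05)  # True -> reject the null hypothesis
```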
As you can see, in many ways logistic regression bears a kinship to OLS regression. In running OLS, we
obtain an estimate for the linear regression coefficient that minimizes prediction errors. That is, OLS
provides the best fit between the predicted values of the dependent variable and the actual, observed
values of the dependent variable. OLS also reports a standard error for the regression coefficient,
which tells us how much prediction error is contained in the regression coefficient. This information
permits us to determine whether x has a significant effect on y. Similarly, logistic regression minimizes
prediction errors by finding an estimate for the logistic regression coefficient that yields the maximum
fit between the predicted probabilities of y and the observed probabilities of y. Plus it reports a
standard error for this estimated effect.
However, a valuable statistic is missing from the analogy between OLS and logistic regression: R-square. As you know, R-square tells the researcher how completely the independent variable (or, in
multiple regression, all the independent variables) explains the dependent variable. In our current
example, it certainly would be nice to know how completely the independent variable, education,
accounts for the likelihood of voting. Does logistic regression provide an analogous statistic to R-square? Strictly speaking, the answer is no.⁸
Even so, methodologists have proposed R-square-like measures that give an overall reading of the
strength of association between the independent variables and the dependent variable. To understand
these measures, we need to take a closer look at maximum likelihood estimation, the technique
logistic regression uses to arrive at the best fit between the predicted probabilities of y and the
observed probabilities of y.
Box 9-1 How to Interpret a Negative Relationship in Logistic Regression
As we expected, Table 9-3 reveals a positive relationship between education and the
likelihood of voting. Each 1-year increase in schooling increases the logged odds of voting
by .226. Alternatively, each increment in education multiplies the odds by about 1.25—a 25
percent increase in the odds for each increment in educational attainment. However, in your own or in
others’ research you will often encounter negative relationships, situations in which a unit
increase in the independent variable is associated with a decrease in the logged odds of the
dependent variable. Negative relationships can be a bit trickier in logistic regression than
in OLS regression. Suppose we were to investigate the relationship between the likelihood
of voting and the number of hours respondents spend watching television per day. In this
situation, we might expect to find a negative relationship: The more television that people
watch, the less likely they are to vote. In fact, we would obtain these estimates:
Logged odds (voting) = 1.110 − .107 (TV hours)
Thus, each 1-hour increase in daily television watching occasions a decrease of .107 in the
logged odds of voting. Obtaining the odds ratio, we would have: Exp(−.107) = e^(−.107) = .898.
Positive relationships produce odds ratios of greater than 1, and negative relationships
produce odds ratios of less than 1. How would you interpret an odds ratio of .898? Like
this: Individuals watching any given number of hours of television per day are only about .9
times as likely to vote as are individuals who watch the next-lower number of hours. For
example, people who watch 4 hours per day are .9 times as likely to vote as are people who
watch 3 hours per day. Following the formula for percentage change in the odds: 100 ×
(.898 − 1) = −10.2. Each additional hour spent in front of the television depresses the odds
of voting by about 10 percent.ᵃ
ᵃ Data for this analysis are from the 2008 General Social Survey (n = 1,163). The
independent variable, number of hours spent watching television, is based on the following
question: “On the average day, about how many hours do you personally watch television?”
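The negative-relationship arithmetic in Box 9-1 can be checked the same way as the positive case; this Python sketch is illustrative only, and the small discrepancies with the box's figures come from its truncation of .8985 to .898:

```python
import math

b_tv = -0.107  # Box 9-1 coefficient: daily hours of TV watching

odds_ratio = math.exp(b_tv)          # below 1, signaling a negative relationship
pct_change = (odds_ratio - 1) * 100  # negative percentage change in the odds

print(round(odds_ratio, 3))  # 0.899 (the box truncates this to .898)
print(round(pct_change, 1))  # -10.1 (the box's -10.2 starts from the truncated .898)
```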
9.2 Finding the Best Fit: Maximum Likelihood Estimation
By way of introducing maximum likelihood estimation, it is helpful to recall the logic behind
proportional reduction in error (PRE) measures of association, such as lambda or R-square. You will
remember that a PRE measure first determines how well we can predict the values of the dependent
variable without knowledge of the independent variable. It then compares this result with how well
we can predict the dependent variable with knowledge of the independent variable. PRE uses the
overall mean of the dependent variable to “guess” the dependent variable for each value of the
independent variable. This guessing strategy produces a certain number of errors. PRE then figures
out how many errors occur when the independent variable is taken into account. By comparing these
two numbers—the number of errors without knowledge of the independent variable and the number
of errors with knowledge of the independent variable—PRE determines how much predictive
leverage the independent variable provides.
Maximum likelihood estimation (MLE) employs the same approach. MLE takes the sample-wide
probability of observing a specific value of a binary dependent variable and sees how well this
probability predicts that outcome for each individual case in the sample. At least initially, MLE ignores
the independent variable. As in PRE, this initial strategy produces a number of prediction errors. MLE
then takes the independent variable into account and determines if, by knowing the independent
variable, these prediction errors can be reduced.
Consider a highly simplified illustration, which again uses education (x) to predict whether an
individual voted (coded 1 on the dependent variable, y) or did not vote (coded 0 on y). MLE first would
ask, “How well can we predict whether or not an individual voted without using education as a
predictor?” For the sake of simplicity, suppose our sample consists of four individuals, as shown in
Table 9-4. As you can see, two individuals voted (coded 1) and two did not (coded 0).
Based only on the distribution of the dependent variable, what is the predicted probability of voting
for each individual? MLE would answer this question by figuring out the sample-wide probability of
voting and applying this prediction to each case. Since half the sample voted and half did not, MLE’s
initial predicted probability (labeled P) would be equal to .5 for each individual. Why .5? Because there
is a .5 chance that any individual in the sample voted and a .5 chance that he or she did not vote. We
will label the model that gave rise to the initial predictions Model 1. Table 9-4 shows the predicted
probabilities, plus some additional information, for Model 1.
How well, overall, does Model 1 predict the real values of y? MLE answers this question by computing
a likelihood function, a number that summarizes how well a model’s predictions fit the observed
data. In computing this function, MLE first determines a likelihood for each individual case. An
individual likelihood tells us how closely the model comes to predicting the observed outcome for that
case. MLE then computes the likelihood function by calculating the product of the individual
likelihoods, that is, by multiplying them together. The likelihood function can take on any value
between 0 (meaning the model’s predictions do not fit the observed data at all) and 1 (meaning the
model’s predictions fit the observed data perfectly).
Stated formally, the likelihood function is not beautiful to behold.⁹
Applied in practice to a small set of data, however, the function is not difficult to compute. If a case has
an observed value of y equal to 1 (the individual voted), then the likelihood for that case is equal to P.
So individuals A and B, with predicted probabilities equal to .5, have likelihoods equal to P, which is .5.
If a case has an observed value of y equal to 0 (the individual did not vote), then the likelihood for that
case is equal to 1 − P. Thus individuals C and D, who have predicted probabilities of .5, have
likelihoods equal to 1 − P, or 1 − .5, also equal to .5. The likelihoods for each individual are displayed in
the right-most column of Table 9-4.
The likelihood for Model 1 is determined by multiplying all the individual likelihoods together:
Model 1 likelihood = .5 × .5 × .5 × .5 = .0625
MLE would use this number, .0625, as a baseline summary of how well we can predict voting without
knowledge of the independent variable, education. The baseline model is sometimes called the
reduced model, because its predictions are generated without using the independent variable.
Informally, we could also call it the “know-nothing model” because it does not take into account
knowledge of the independent variable.
In its next step, MLE would bring the independent variable into its calculations by specifying a logistic
regression coefficient for education, recomputing the probabilities and likelihoods, and seeing how
closely the new estimates conform to the observed data. Again, for the sake of illustration, suppose
that these new estimates, labeled Model 2, yield the predicted probabilities displayed in Table 9-5.
Model 2, which takes into account the independent variable, does a better job than Model 1 in
predicting the observed values of y. By using education to predict voting, Model 2 estimates
probabilities equal to .9 and .8 for individuals A and B (who, in fact, voted), but probabilities of only .3
and .1 for individuals C and D (who, in fact, did not vote). Just as in the Model 1 procedure, the
individual likelihoods for each case are equal to P for each of the voters (for whom y = 1) and are equal
to 1 − P for each of the nonvoters (for whom y = 0). The individual likelihoods appear in the right-most
column of Table 9-5. As before, the likelihood function for Model 2 is computed by multiplying
the individual likelihoods together:
Model 2 likelihood = .9 × .8 × .7 × .9 = .4536
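Both likelihood calculations can be reproduced with a small helper function; the outcome codes and predicted probabilities below come from Tables 9-4 and 9-5, and Python is used only for illustration:

```python
# Observed outcomes for the four hypothetical cases (1 = voted, 0 = did not vote)
y = [1, 1, 0, 0]

def likelihood(y_obs, p_pred):
    """Product of individual likelihoods: P when y = 1, (1 - P) when y = 0."""
    out = 1.0
    for yi, pi in zip(y_obs, p_pred):
        out *= pi if yi == 1 else (1 - pi)
    return out

# Model 1 ("know-nothing"): the sample-wide probability .5 for every case
model1 = likelihood(y, [0.5, 0.5, 0.5, 0.5])

# Model 2: predicted probabilities after using education (Table 9-5)
model2 = likelihood(y, [0.9, 0.8, 0.3, 0.1])

print(model1)            # 0.0625
print(round(model2, 4))  # 0.4536
```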
How much better is Model 2 than Model 1? Does using education as a predictor provide significantly
improved estimates of the probability of voting? Now, MLE does not work directly with differences in
model likelihoods. Rather it deals with the natural log of the likelihood, or logged likelihood (LL) of
each model. Thus MLE would calculate the natural log of the Model 1 likelihood, calculate the natural
log of the Model 2 likelihood, and then determine the difference between the two numbers. Table 9-6
shows these conversions, plus some additional calculations, for Model 1 and Model 2.
Examine the Table 9-6 calculations. As we found earlier, Model 2’s likelihood (.4536) is greater
than Model 1’s likelihood (.0625). This increase is also reflected in the LLs of both models: The LL
increases from −2.78 for Model 1 to −0.79 for Model 2. MLE makes the comparison between models by
starting with Model 1’s LL and subtracting Model 2’s LL: −2.78 − (−0.79) = −1.99. Notice that if Model
2 did about as well as Model 1 in predicting y, then the two LLs would be similar, and the calculated
difference would be close to 0.¹⁰ As it is, MLE found a difference equal to −1.99.
Does the number −1.99 help us decide whether Model 2 is significantly better than Model 1? Yes, it
does. With one additional calculation, the difference between two LLs follows a chi-square
distribution. The additional calculation is achieved by multiplying the difference in LLs by −2. Doing
so, of course, doubles the difference and reverses the sign: −2 (−1.99) = 3.98. This calculation, often
labeled in computer output as “Chi-square,” is a chi-square test statistic, and MLE uses it to test the
null hypothesis that the true difference between Model 1 and Model 2 is equal to 0. There is nothing
mystical here. It is plain old hypothesis testing using chi-square. If the calculated value of the change
in −2LL, equal to 3.98, could have occurred more frequently than 5 times out of 100, by chance, then
we would not reject the null hypothesis. We would have to conclude that the education−voting
relationship is not significant. However, if the chances of observing a chi-square value of 3.98 are less
than or equal to .05, then we would reject the null hypothesis and infer that Model 2 is significantly
better than Model 1. Using the appropriate degrees of freedom and applying a chi-square test, MLE
would report a P-value of .046 for a test statistic of 3.98.¹¹
The P-value is less than .05, so we can reject the null hypothesis and conclude that education is a
statistically significant predictor of the probability of voting.
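The model-comparison test can be sketched in a few lines of Python. The P-value formula below uses the closed form for a chi-square distribution with 1 degree of freedom (one predictor separates Model 2 from Model 1); note that working from the unrounded log likelihoods gives a chi-square of about 3.96 rather than the 3.98 obtained from the rounded values in the text:

```python
import math

model1_likelihood = 0.0625  # know-nothing model
model2_likelihood = 0.4536  # model using education

ll1 = math.log(model1_likelihood)  # about -2.77
ll2 = math.log(model2_likelihood)  # about -0.79

chi_square = -2 * (ll1 - ll2)  # the change in -2LL

# P-value for a chi-square statistic with 1 degree of freedom
p_value = math.erfc(math.sqrt(chi_square / 2))

print(round(chi_square, 2))  # 3.96
print(round(p_value, 3))     # 0.046
```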
MLE proceeds much in the way illustrated by this example. It first obtains a set of predictions and
likelihoods based on a reduced or know-nothing model, that is, a model using only the sample-wide
probability of y to predict the observed values of y for each case in the data. It then “tries out” a
coefficient for the independent variable in the logistic regression model. MLE usually obtains the first
try-out coefficient by running a version of least squares regression using x to predict y. It enlists this
coefficient to compute a likelihood, which it then compares with the likelihood of the reduced model.
It then proceeds in an iterative fashion, using a complex mathematical algorithm to fine-tune the
coefficient, computing another likelihood, and then another and another—until it achieves the best
possible fit between the model’s predictions and the observed values of the dependent variable.
MLE is the heart and soul of logistic regression. This estimation technique generates all the coefficient
estimates and other useful statistics that help the analyst draw inferences about the relationship
between the independent and dependent variables. Return now to the GSS data and consider some of
these additional statistics, as reported in Table 9-7.
To enhance the comparison between the real-world data and the hypothetical example just discussed,
the reduced model—the model estimated without taking into account education—is called Model 1.
Model 2 refers to the results obtained after the independent variable, education, is used to predict the
likelihood of voting. Note the difference between the LLs of the models: When education is used to
predict voting, the LL increases from −1,032.78 to −958.82. Is this a significant improvement? Yes,
according to the “Model comparison” numbers in the table. Subtracting Model 2’s LL from Model 1’s
LL yields a difference of −73.96. Multiplying the difference by −2, labeled “Chi-square,” gives us a chi-square test statistic of 147.92, which has a P-value well beyond the realm of the null hypothesis. Thus,
compared with how well we can predict the dependent variable without knowledge of the
independent variable, knowledge of respondents’ education significantly improves our ability to
predict the likelihood of voting. If Model 2 had several predictors of voting—education, age, and race,
for example—then the change in −2LL, the “Chi-square” statistic, would provide a chi-square test for
the null hypothesis that none of these variables is significantly related to the likelihood of voting.
Logistic regression enlists the change in the likelihood function in yet another way—as the basis for R-square-type measures of association, three of which are reported in the Table 9-7
“Model 2 summary.” These statistics are grounded on the intuitive PRE logic. Model 1’s LL represents
prediction error without knowing the independent variable. The difference between Model 1’s LL and
Model 2’s LL represents the predictive leverage gained by knowing the independent variable. In
conceptual terms, then, we could express the difference between the two models as a proportion of
Model 1’s LL:
R-square = (Model 1 LL − Model 2 LL) / (Model 1 LL)
If Model 2 did about as well as Model 1 in predicting voting—if the two models’ LLs were
similar—then R-square would be close to 0. If, by contrast, Model 2’s LL was a lot higher than Model
1’s LL, then R-square would approach 1.¹²
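Plugging the Table 9-7 log likelihoods into this conceptual formula is straightforward; the short Python sketch below reproduces the model-comparison chi-square (147.92) and a McFadden-style R-square of about .07 for the education-only model:

```python
# Log likelihoods reported in Table 9-7
ll_model1 = -1032.78  # reduced ("know-nothing") model
ll_model2 = -958.82   # model using education

chi_square = -2 * (ll_model1 - ll_model2)       # model-comparison chi-square
mcfadden = (ll_model1 - ll_model2) / ll_model1  # the formula above

print(round(chi_square, 2))  # 147.92
print(round(mcfadden, 3))    # 0.072
```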
The various R-square measures build on this conceptual framework. McFadden R-square, familiar to
Stata users, is derived from the above formula. Cox−Snell R-square makes an adjustment based on
sample size. The Cox−Snell R-square is somewhat conservative, however, because it can have a
maximum value of less than 1. The Nagelkerke statistic adjusts the Cox−Snell number, yielding a
measure that is usually higher. By and large, though, these measures, and several others that you may
encounter, give readings of strength that are pretty close to each other.¹³
So what are we to make of an R-square in the .07 to .12 range? Again, unlike least squares regression,
MLE is not in the business of explaining variance in the dependent variable. So we cannot say
something like, “Education explains about 7 percent of the variation in voter turnout.” However, we
know that R-square can assume values between 0 and 1, with 0 denoting a very weak relationship and
1 denoting a strong relationship. Thus we can say that education, while significantly related to the
likelihood of voting, is not by itself a particularly strong predictive tool. From a substantive
standpoint, this is not too surprising. You can probably think of several additional variables that might
improve the predictive power of the logistic regression model. Age, race, political efficacy, strength of
partisanship—all these variables come to mind as other possible causes of voting. If we were running
OLS, we could specify a multiple regression model and estimate the effect of each of these variables on
the dependent variable. Logistic regression also accommodates multiple predictors. We turn now to a
discussion of logistic regression using more than one independent variable.
9.3 Logistic Regression with Multiple Independent Variables
Thus far we have covered a fair amount of ground. You now understand the meaning of a logistic
regression coefficient. You know how to interpret coefficients in terms of changes in the odds ratio, as
well as the percentage change in the odds. You know how to evaluate the statistical significance of a
logistic regression coefficient. Plus you have a basic understanding of MLE, and you can appreciate its
central role in providing useful statistics, such as the change in −2LL, as well as R-square-type
measures of association. So far, however, our substantive examples have been of a simple variety, with
one independent variable. Yet political researchers are often interested in assessing the effects of
several independent variables on a dependent variable. We often want to know whether an
independent variable affects a dependent variable, controlling for other possible causal influences. In
this section we show that the logistic regression model, much like the linear regression model, can be
extended to accommodate multiple independent variables. We also illustrate how logistic regression
models can be used to obtain and analyze the predicted probabilities of a binary variable.
To keep things consistent with the previous examples—but to add an interesting wrinkle—we
introduce a dummy independent variable into the education-voting model:

Logged odds (voting) = â + b̂1(education) + b̂2(partisan)
Education, as before, is measured in years of schooling, from 0 to 20. “Partisan” is a dummy variable
that gauges strength of party identification: Strong Democrats and strong Republicans are coded 1 on
this dummy, and all others (weak identifiers, Independents, and Independent leaners) are coded 0.
From an empirical standpoint, we know that strongly partisan people, regardless of their party
affiliation, are more likely to vote than are people whose partisan attachments are weaker. So we
would expect a positive relationship between strength of partisanship and the likelihood of voting.
The coefficients in this model—â, b̂1, and b̂2—are directly analogous to coefficients in multiple linear
regression. The coefficient b̂1 will estimate the change in the logged odds of voting for each 1-year
change in education, controlling for the effect of partisan strength. Similarly, b̂2 will tell us by how
much to adjust the estimated logged odds for strong partisans, controlling for the effect of education.
To the extent that education and partisan strength are themselves related, the logistic regression
procedure will control for this, and it will estimate the partial effect of each variable on the logged
odds of voting. And the intercept, â, will report the logged odds of voting when both independent
variables are equal to 0, for respondents with no schooling (for whom education = 0) and who are not
strong party identifiers (partisan = 0). This point bears emphasizing: The logistic regression model
specified above is a linear-additive model, and it is just like a garden-variety multiple regression
model. The partial effect of education on the logged odds of voting is assumed to be the same for
strong partisans and nonstrong partisans alike. And the partial effect of partisan strength on the
logged odds of voting is assumed to be the same at all values of education. (This point becomes
important in a moment, when we return to a discussion of probabilities.)
Table 9-8 reports the results of the analysis, using the GSS data. Plugging the coefficient values into
the logistic regression model, we find:

Logged odds (voting) = −2.354 + .224(education) + 1.522(partisan)
Interpretation of these coefficients is by now a familiar task. When we control for partisan strength,
each 1-year increase in education increases the logged odds of voting by .224. And, after we take into
account the effect of education, being a strong partisan increases the logged odds of voting by 1.522.
Turning to the odds ratios, reported in the “Exp(b)” column of Table 9-8,
we can see that a unit increase in education multiplies the odds by about 1.25. And, when “partisan” is
switched from 0 to 1, the odds are multiplied by about 4.6. In other words, when we control for education,
strong partisans are nearly five times as likely to vote as are weak partisans or Independents.
Framing the relationships in terms of percentage change in the odds: The odds of voting increase by
about 25 percent for each incremental change in education and by 358 percent for the comparison
between nonstrong partisans and strong partisans. Finally, according to the z statistics (and accompanying P-values), each independent variable is significantly related to the logged odds of voting.
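The odds ratios and percentage changes follow directly from the Table 9-8 coefficients; this Python sketch (for illustration only) recovers both, with 4.58 rounding to the 4.6 quoted above:

```python
import math

# Coefficient estimates from Table 9-8
b_education = 0.224  # per year of schooling
b_partisan = 1.522   # strong partisan (1) versus all others (0)

for name, b in [("education", b_education), ("partisan", b_partisan)]:
    odds_ratio = math.exp(b)          # factor applied to the odds per unit change
    pct = (odds_ratio - 1) * 100      # percentage change in the odds
    print(name, round(odds_ratio, 2), round(pct))

# prints: education 1.25 25
#         partisan 4.58 358
```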
Overall, the model performs fairly well. The chi-square statistic (255.84, P-value = .000) says that
including both independent variables in the estimation procedure provides significant predictive
improvement over the baseline know-nothing model.¹⁴
The McFadden (.124), Cox−Snell (.133), and Nagelkerke (.195) R-square values, while not
spellbinding, suggest that education and partisanship together do a decent job of predicting voting,
especially when compared with our earlier analysis (see Table 9-7), in which education was used as
the sole predictor.
These results add up to a reasonably complete analysis of the relationships. Certainly it is good to
know the size and significance of the partial effects of education and partisan strength on the logged
odds of voting, and it is convenient to express these effects in the language of odds ratios and the
percentage change in odds. Often, however, the researcher wishes to understand his or her findings in
the most intuitively meaningful terms: probabilities. We might ask, “What are the effects of the
independent variables on the probability of voting? Although education and partisan attachments
clearly enhance the odds of voting, by how much do these variables affect the probability that people
will turn out?” These questions are perfectly reasonable, but they pose two challenges. First, in any
logistic regression model—including the simple model with one independent variable—a linear
relationship exists between x and the logged odds of y, but a nonlinear relationship exists between x
and the probability of y. This signature feature of logistic regression was discussed earlier. The
marginal effect of x on the probability of y will not be the same for all values of x. Thus the effect of,
say, a 1-year increase in education on the probability of voting will depend on where you “start” along
the education variable. Second, in a logistic regression model with more than one independent
variable, such as the model we just discussed, the independent variables have a linear-additive
relationship with the logged odds of y, but they may show interaction effects on the probability of y.
For example, logistic regression will permit the relationship between education and the probability of
voting to vary, depending on respondents’ levels of partisan strength. The technique might find that
the marginal effects of education on the probability of voting are different—perhaps weaker, perhaps
stronger—depending on partisan strength. Or, it could find that the marginal effects of education are
the same or very similar, regardless of partisan strength.¹⁵
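This nonlinearity can be demonstrated numerically. The Python sketch below uses the chapter's coefficient estimates for education and partisan strength; the intercept (−2.354) is inferred from the worked examples later in the chapter, so treat it as an approximation. It shows that a 1-year increase in education shifts the probability of voting by different amounts at different starting points:

```python
import math

def prob_vote(education, partisan):
    """Predicted probability of voting from the two-variable model.
    The intercept (-2.354) is inferred from the chapter's worked examples."""
    logged_odds = -2.354 + 0.224 * education + 1.522 * partisan
    odds = math.exp(logged_odds)
    return odds / (1 + odds)

# Marginal effect of one extra year of education on the PROBABILITY of voting,
# evaluated at different starting points and partisan values:
for partisan in (0, 1):
    for educ in (4, 12, 16):
        effect = prob_vote(educ + 1, partisan) - prob_vote(educ, partisan)
        print(partisan, educ, round(effect, 3))
```

The effect is largest where the predicted probability is near .5 and smaller toward either extreme, which is exactly the S-shaped behavior described above.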
Odd as it may sound, these challenges define some rather attractive features of logistic regression.
Properly applied, the technique allows the researcher to work with probabilities instead of odds or
logged odds and, in the bargain, to gain revealing substantive insights into the relationships being
studied.
9.4 Working with Probabilities: MEMs and MERs
Return to the logistic regression model we just estimated and consider how best to represent and
interpret these relationships in terms of probabilities. The model will, of course, yield the predicted
logged odds of voting for any combination of the independent variables. Just plug in values for
education and the partisan dummy, do the math, and obtain an estimated logged odds of voting for
that combination of values. As we saw earlier, logged odds can be converted back into odds and, in
turn, odds can be translated into probabilities. These conversions—from logged odds to odds, and
from odds to probabilities—form the basis of two commonly used methods for representing complex
relationships in terms of probabilities.¹⁶
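In code, the two-step conversion is just a couple of lines; this illustrative Python helper is applied to the logged odds of −1.933 and 2.547 that are worked out by hand later in this section:

```python
import math

def logged_odds_to_probability(logged_odds):
    """Convert logged odds to odds, then odds to a probability."""
    odds = math.exp(logged_odds)  # odds = Exp(logged odds)
    return odds / (1 + odds)      # probability = odds / (1 + odds)

print(round(logged_odds_to_probability(-1.933), 2))  # 0.13
print(round(logged_odds_to_probability(2.547), 2))   # 0.93
```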
First, we could examine the effect of one independent variable on the probability of the dependent
variable, while holding the other independent variables constant at their sample averages. Thus, one
retrieves “marginal effects at the means,” or MEMs. In the current example, we could estimate the
probability of voting at each value of education, from 0 to 20, while holding partisan constant at its
mean. This would allow us to answer the question, “For individuals of ‘average’ partisan strength, how
does the probability of voting change as education increases?” A second approach is to report changes
in the probability of the dependent variable across the range of an interesting independent
variable—and to do so separately, for discrete categories of another independent variable. Thus, one
retrieves “marginal effects at representative values,” or MERs. For example, we might estimate the
probability of voting at each value of education, from 0 to 20, separately for weak partisans and for
strong partisans. This would enable us to answer these questions: “In what ways does education affect
the probability of voting for individuals who are weakly tied to parties? How do these effects differ
from education’s effect for people who are more strongly partisan?” We consider each of these
approaches, beginning with MEMs.
Suppose that we wanted to see what happens to the probability of voting as education increases from
0 to 20 years, holding partisan constant at its mean. We would enlist the logistic regression
coefficients and use them to calculate—or, better, use a computer to calculate—twenty-one separate
probabilities, one for each value of education, from education = 0 through education = 20. For
convenience, here is the logistic regression equation we obtained earlier:

Logged odds (voting) = −2.354 + .224(education) + 1.522(partisan)
The first group, people with 0 years of schooling, has a value of 0 on the education variable and a value
of .2768 on partisan, its sample-wide mean. (The mean of any dummy is defined as the proportion of
the sample scoring 1 on the dummy. Because 27.68 percent of the GSS sample are strong partisans,
then partisan has a mean equal to .2768. 17
)
Using the logistic regression model to estimate the logged odds of voting for this group:

Logged odds (voting) = −2.354 + .224(0) + 1.522(.2768) = −1.933
So, the estimated logged odds of voting are equal to −1.933. What are the odds of voting for this
group? Odds can be reclaimed by taking the exponent of the logged odds: Exp(−1.933), which is equal
to .145. Next, convert the odds to a probability using the formula Probability = Odds / (1 + Odds):
.145/1.145 ≈ .13. Thus, the
estimated probability of voting for individuals with no formal schooling and with average partisan
strength is equal to .13. There is a very weak probability—barely better than one chance in ten—that
these individuals voted. What is the turnout probability of average partisans with 20 years of
schooling? We would have:

Logged odds (voting) = −2.354 + .224(20) + 1.522(.2768) = 2.547
If the logged odds are 2.547, then the odds would be Exp(2.547), equal to 12.769. Finally, the
estimated probability is: 12.769/13.769 ≈ .93. There are over nine chances in ten that these people
voted. Table 9-9 and Figure 9-2
present the estimated probabilities of voting at each value of education, calculated while holding
partisan constant at its sample mean.
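These conversions are straightforward to automate. The sketch below (in Python, using the coefficient values reported for this model: intercept −2.354, education .224, partisan 1.522) reproduces the MEM probabilities at each value of education:

```python
import math

# Coefficients from the chapter's model:
# logged odds(vote) = -2.354 + 0.224*education + 1.522*partisan
A, B_EDUC, B_PARTISAN = -2.354, 0.224, 1.522
PARTISAN_MEAN = 0.2768  # sample mean of the partisan dummy

def prob_vote(education, partisan):
    """Convert logged odds to odds, then odds to a probability."""
    logged_odds = A + B_EDUC * education + B_PARTISAN * partisan
    odds = math.exp(logged_odds)
    return odds / (1 + odds)

# Marginal effects at the means: vary education, hold partisan at its mean.
for educ in range(21):
    print(f"education = {educ:2d}: P(vote) = {prob_vote(educ, PARTISAN_MEAN):.3f}")
```

Each probability comes from the same three-step routine the text describes: compute the logged odds, exponentiate to obtain odds, then divide the odds by one plus the odds.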
Logistic regression’s nonlinear signature is clearly evident in Table 9-9
and Figure 9-2. At low levels of education, the probabilities are lackluster, and they
increase at a slow rate of about .03 or .04 for each 1-year increment in education. Higher values of
education show the flip side of the same pattern: high probabilities and small marginal
changes. How else, besides these signature features, might we describe the relationship between the
independent variable and the probability of the dependent variable? We could cite two additional
facets: the switchover point and the full effect. The switchover point is defined by the interval of the
independent variable in which the probability of the dependent variable changes from less than .5 to
greater than .5. (In negative relationships, it changes from greater than .5 to less than .5.) In the
current example, the switchover occurs between 8 and 9 years of education. It is here that the
probability of voting increases from .465 to .521, an increase of .056, which is the largest marginal
increase in the data. In fact, the largest marginal change in probabilities, sometimes called the
“instantaneous effect,” always occurs at the switchover point. 18
The full effect of the independent variable is determined by subtracting the probability associated
with the lowest value of the independent variable from the probability associated with the highest
value of the independent variable. Applied to the current example: .928 − .126 = .802. Thus, across its
full range of values, and holding partisan strength constant at its sample mean, education changes the
probability of voting by a healthy .802.
Source: 2012 General Social Survey.
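The switchover point and the full effect can both be recovered mechanically from the twenty-one predicted probabilities. A sketch, again assuming the chapter's coefficient values:

```python
import math

# Chapter's model: logged odds(vote) = -2.354 + 0.224*education + 1.522*partisan
A, B_EDUC, B_PARTISAN = -2.354, 0.224, 1.522
PARTISAN_MEAN = 0.2768

def prob_vote(educ, partisan=PARTISAN_MEAN):
    odds = math.exp(A + B_EDUC * educ + B_PARTISAN * partisan)
    return odds / (1 + odds)

probs = [prob_vote(e) for e in range(21)]

# Full effect: probability at the highest value of the independent variable
# minus the probability at the lowest value.
full_effect = probs[-1] - probs[0]

# Switchover: the interval where the probability crosses .5 (positive relationship).
switchover = next(e for e in range(20) if probs[e] < 0.5 <= probs[e + 1])

print(f"full effect = {full_effect:.3f}; "
      f"switchover between {switchover} and {switchover + 1} years of education")
```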
Switchover points and full effects become especially valuable interpretive tools in comparing marginal
effects at representative values (MERs). To illustrate MERs, we will estimate the probability of voting,
by years of education—and we will do so separately for weak and strong partisans. Does the
switchover occur at different points for the two partisan types? Does the full effect of education vary
by strength of partisanship? These questions require the calculation of forty-two probabilities:
twenty-one for weak partisans and twenty-one for strong partisans. Here again is the logistic
regression model:

Logged odds (voting) = −2.354 + .224(education) + 1.522(partisan)
What is the estimated logged odds of voting for individuals with no formal schooling (education = 0)
and weak partisan ties (partisan = 0)? It is −2.354 + .224(0) + 1.522(0) = −2.354. The odds of voting are
equal to Exp(−2.354) = .095; and the probability = .095/1.095 = .087. Thus, for individuals who lack
both participatory resources, education and partisan strength, the probability of voting is
extraordinarily low. How do these individuals compare with their strongly partisan counterparts,
those for whom education = 0 and partisan = 1?
Figure 9-2 Predicted Probabilities of Voting, by Years of Education (graphic)
Source: 2012 General Social Survey.
Note: Partisan strength held constant at .2768.
The logged odds of voting, −2.354 + .224(0) + 1.522(1) = − .832. The odds of voting, Exp(− .832)
= .435; and the probability, .435/1.435 = .303. These two probabilities alone permit a remarkable
comparison: At the lowest level of education, switching partisan from 0 to 1 boosts the probability of
voting by .303 − .087 = .216. This is indeed a sizeable effect.
Table 9-10 and Figure 9-3
present the predicted probabilities for all combinations of education and partisan strength. We can
plainly see the different effects of education: Education plays a bigger role for weak partisans than for
strong partisans. For weak partisans, the full effect of education is equal to .894 − .087 = .807. For
strong partisans, .975 − .303 = .672. And notice the different switchover points. For individuals more
weakly attached to parties, the probabilities build at a leisurely pace, not switching over until
education reaches the interval between 10 years and 11 years of schooling. Strong partisans, by
contrast, cross the threshold between 3 and 4 years of education. Indeed, a strong partisan with an
associate’s degree (education = 14) is as likely to turn out as a weak partisan with a graduate
education (education = 20): .909, compared with .894.
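The MER comparison can be scripted the same way, computing the two probability profiles and then the switchover point and full effect for each partisan type. A sketch using the chapter's coefficients:

```python
import math

A, B_EDUC, B_PARTISAN = -2.354, 0.224, 1.522  # coefficients from the chapter's model

def prob_vote(educ, partisan):
    odds = math.exp(A + B_EDUC * educ + B_PARTISAN * partisan)
    return odds / (1 + odds)

# Marginal effects at representative values: one profile per partisan type.
weak   = [prob_vote(e, 0) for e in range(21)]  # weak partisans (partisan = 0)
strong = [prob_vote(e, 1) for e in range(21)]  # strong partisans (partisan = 1)

def switchover(probs):
    """Interval of education where P(vote) crosses .5."""
    return next(e for e in range(20) if probs[e] < 0.5 <= probs[e + 1])

print("weak:   full effect", round(weak[-1] - weak[0], 3),
      " switchover after", switchover(weak), "years")
print("strong: full effect", round(strong[-1] - strong[0], 3),
      " switchover after", switchover(strong), "years")
```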
It is worth pointing out that MEMs and MERs are not mutually exclusive strategies for analyzing
probabilities. Suppose we added another predictor, income, to the current turnout model. We could
define a MEMs−MERs hybrid—for example, looking at the effect of income on the predicted
probability of voting, separately for weak and strong partisans (MERs), holding education constant at
its sample mean (MEMs). MEMs and MERs are two basic approaches, but there are alternative ways of
working with probabilities in logistic regression. (For a discussion of one alternative approach,
average marginal effects or AMEs, see Box 9-2.)
Source: 2012 General Social Survey.
Figure 9-3 Predicted Probabilities of Voting, by Education and Partisan Strength (graphic)
Source: 2012 General Social Survey.
Box 9-2 Average Marginal Effects
As an alternative to MEMs, some researchers recommend retrieving average marginal
effects (AMEs) for variables of interest. Consider an example used by methodologist
Richard Williams. Suppose we wish to estimate the probability of being diabetic, separately
for “average” whites and blacks. In the MEMs approach, we would reclaim two
probabilities, one for whites and one for blacks, while holding all other predictors constant
at their sample means. In the AMEs approach, we first would calculate a probability for
each individual in the sample, under the assumption that each individual is white, and we
would allow all other predictors to assume their observed values for each case. Second, we
would recalculate the probabilities, this time under the assumption that each individual is
black, and again we would allow all other predictors to assume observed values. For any
individual, the difference between the two probabilities is the marginal effect (ME) of race
for that case. The average of the MEs across all cases is the AME of race on the probability
of diabetes. According to Williams, “With AMEs, you are in effect comparing two
hypothetical populations—one all white, one all black—that have the exact same values on
the other independent variables in the model. The logic is similar to that of a matching
study, where subjects have identical values on every independent variable except one.
Because the only difference between these two populations is their races, race must be the
cause of the difference in their probabilities of having diabetes.” 19
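Williams's AME procedure can be sketched in a few lines. The coefficients and the six ages below are hypothetical, invented purely to illustrate the mechanics (they are not estimates from his diabetes example): compute each case's probability as if black, then as if white, take the difference, and average:

```python
import math

# Hypothetical fitted model (illustrative coefficients only):
# logged odds(diabetic) = a + b_black*black + b_age*age
A, B_BLACK, B_AGE = -4.0, 0.7, 0.04

def prob(black, age):
    odds = math.exp(A + B_BLACK * black + B_AGE * age)
    return odds / (1 + odds)

# A tiny hypothetical sample: each case keeps its own observed age.
ages = [25, 40, 55, 70, 62, 33]

# Marginal effect of race for each case, other predictors at observed values.
marginal_effects = [prob(1, age) - prob(0, age) for age in ages]

# The AME averages these case-level effects across the whole sample.
ame = sum(marginal_effects) / len(marginal_effects)
print(f"AME of race on P(diabetic) = {ame:.3f}")
```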
Summary
A political researcher wants to explain why some people approve of same-sex marriage, whereas
others disapprove. Thinking that age plays a causal role, she hypothesizes that as age increases, the
likelihood of disapproval will go up, that older people will be more likely than younger people to
disapprove of same-sex marriage. Recognizing that women may be more liberal than men on this
question, the researcher wants to isolate the effect of age, controlling for gender. Consulting her
survey dataset, the researcher finds a binary variable that will serve as the dependent variable
(respondents who approve of same-sex marriage are coded 0 on this variable and those who
disapprove are coded 1). She also finds an interval-level independent variable, age measured in years,
from 18 to 89. She has a dummy variable labeled “female,” coded 0 for men and 1 for women. So she
has the hypothesis, the data, and the variables. Now what? Which analytic technique is best suited to
this research problem? If this researcher is someone other than you, she may need to test her idea by
collapsing age into three or four categories, retrieving the tool labeled cross-tabulation from her
methods toolbox, and comparing the percentages of disapprovers across the collapsed categories of
the independent variable for each value of the control. That might work okay. But what if she decides
to control for the effects of several other variables that may shape individuals’ approval or disapproval
of same-sex marriage—such as education, ideology, and partisanship? Cross-tabulation would become
cumbersome to work with, and she may need to settle for an incomplete analysis of the relationships.
The larger point, of course, is that this researcher’s ability to answer an interesting substantive
question is limited by the tools at her disposal.
If this researcher is you, however, you now know a far better approach to the problem. Reach into
your toolbox of techniques, select the tool labeled logistic regression, and estimate this model: Logged
odds (disapproval) = a + b1(age) + b2(female). The logistic regression coefficient, b1, will tell you
how much the logged odds of disapproval increase for each 1-year change in age controlling for
gender. Of course, logged odds are not easily grasped. But by entering the value of b1 into your handheld calculator and tapping the e^x key—or, better still, by examining the Exp(b) values in the
computer output—you can find the odds ratio, the change in the odds of disapproving as age increases
by 1 year. You can convert Exp(b) into a percentage change in the odds of disapproval. You can test
the null hypothesis that b1 is equal to 0 by consulting the P-value of the z statistic or Wald statistic.
You can see how well the model performs by examining changes in the magnitude of −2LL and
reviewing the accompanying chi-square test. Several R-square-like measures, such as McFadden,
Cox-Snell, and Nagelkerke, will give you a general idea of how completely age and gender account for
the likelihood of disapproving of same-sex marriage. You can calculate and examine the predicted
probabilities of disapproval at each value of age, holding female constant at its sample mean. Perhaps
more meaningfully, you could see whether age works differently for women and men when it comes to
disapproval of same-sex marriage. If you are challenged by a skeptic who thinks you should have
controlled for education and partisanship, you can reanalyze your model, controlling for these
variables—and any other independent variables that might affect your results. By adding logistic
regression to your arsenal of research techniques, you are now well prepared to handle any research
question that interests you.
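To make the summary concrete, here is a sketch that simulates survey-style data from assumed "true" coefficients (the values −2.0, .04, and −.5 are invented for illustration) and then recovers estimates by maximizing the likelihood. Plain gradient ascent stands in for the Newton-Raphson routine that statistical packages actually use, and no standard errors are computed:

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed "true" model: logged odds(disapprove) = a + b1*age + b2*female
TRUE_A, TRUE_B_AGE, TRUE_B_FEMALE = -2.0, 0.04, -0.5

data = []
for _ in range(500):
    age = random.randint(18, 89)
    female = random.randint(0, 1)
    p = sigmoid(TRUE_A + TRUE_B_AGE * age + TRUE_B_FEMALE * female)
    data.append((age, female, 1 if random.random() < p else 0))

def logged_likelihood(a, b1, b2):
    """Sum of ln(P) for disapprovers and ln(1 - P) for approvers."""
    total = 0.0
    for age, female, y in data:
        p = sigmoid(a + b1 * age + b2 * female)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Maximum likelihood estimation by plain gradient ascent (a minimal sketch).
a = b1 = b2 = 0.0
rate = 0.001
for _ in range(2000):
    ga = g1 = g2 = 0.0
    for age, female, y in data:
        error = y - sigmoid(a + b1 * age + b2 * female)
        ga += error
        g1 += error * age
        g2 += error * female
    n = len(data)
    a += rate * ga / n
    b1 += rate * g1 / n
    b2 += rate * g2 / n

print("fitted b1 (age):", round(b1, 3), " odds ratio Exp(b1):", round(math.exp(b1), 3))
```

Each ascent step nudges the coefficients toward values that make the observed pattern of 0s and 1s more likely, which is exactly the logic of maximum likelihood estimation described earlier in the chapter.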
Take a closer look.
edge.sagepub.com/pollock
Key Terms
average marginal effects (p. 238)
binary variable (p. 216)
common logarithms (p. 220)
likelihood function (p. 227)
logged likelihood (p. 228)
logits (p. 220)
marginal effects at representative values (p. 233)
marginal effects at the means (p. 233)
maximum likelihood estimation (p. 226)
natural logarithms (p. 220)
odds (p. 218)
odds ratio (p. 219)
percentage change in the odds (p. 220)
Exercises 20
1. In this exercise, you will interpret the results of a logistic regression analysis of the
relationship between gun control opinions and party identification. The binary
dependent variable is coded 1 for pro-control opinions and 0 for anti-control opinions.
The independent variable, party identification, is a 7-point scale, ranging from 0
(strong Republicans) to 6 (strong Democrats).
A. The logistic regression coefficient tells us that, for each one-unit increase in the
party identification scale, the logged odds of a pro-control opinion increases
by .506. Turn your attention to the odds ratio, Exp(b). Recall that higher values
of party identification are more Democratic and lower values are less
Democratic. This coefficient says that an individual at one value of party
identification is ______________ times as likely to be pro-control as an individual
at the next-lower value of party identification.
B. Use the value of Exp(b) to compute a percentage change in the odds. According
to your calculations, each unit increase in party identification, from less
Democratic to more Democratic, increases the odds of being pro-control by
how much?
C. (i) State the null hypothesis for this relationship. (ii) What is your inferential
decision—reject the null hypothesis or do not reject the null hypothesis? (iii)
Explain your answer.
2. Here is an extension of the analysis in Exercise 1. The following analysis adds gender as
a second independent variable. The model includes “female,” a dummy coded 1 for
females and 0 for males.
Parts A−D present interpretations based on these results. For each part, (i) state
whether the interpretation is correct or incorrect, and (ii) explain why the
interpretation is correct or incorrect. For incorrect interpretations, be sure that your
response in (ii) includes the correct interpretation.
A. Interpretation One: If we control for gender, each one-unit increase in the
party identification scale increases the likelihood of pro-control opinion by
50.3 percent.
B. Interpretation Two: If we control for party identification, females are almost
twice as likely to be pro-control than are men.
C. Interpretation Three: Compared with how well the model performs without
including measures of party identification and gender, inclusion of both of
these independent variables provides a statistically significant improvement.
D. Interpretation Four: Party identification and gender together explain between
15.1 percent and 25.3 percent of the variance in the likelihood of a pro-control
opinion.
3. The following table reports the predicted probabilities of a pro-control opinion, by
party identification, for females and males.
A. (i) For females, the switchover point occurs between which two values of party
identification? (ii) For males, the switchover point occurs between which two
values of party identification?
Parts B and C present interpretations based on these results. For each part, (i)
state whether the interpretation is correct or incorrect, and (ii) explain why
the interpretation is correct or incorrect. For incorrect interpretations, be sure
that your response in (ii) includes the correct interpretation.
B. Interpretation One: For both women and men, the full effect of partisanship on
the probability of a pro-control opinion is greater than .6.
C. Interpretation Two: Strong Republicans have a larger “gender gap” on gun
control opinions than do Strong Democrats.
Notes
1. Methodologists have developed several techniques that may be used to analyze binary dependent
variables. One popular technique, probit analysis, is based on somewhat different assumptions than
logistic regression, but it produces similar results. Logistic regression, also called logit analysis or logit
regression, is computationally more tractable than probit analysis and thus is the sole focus of this
chapter. For a lucid discussion of the general family of techniques to which logistic regression and
probit analysis belong, see Tim Futing Liao, Interpreting Probability Models: Logit, Probit, and Other
Generalized Linear Models (Thousand Oaks, Calif.: SAGE Publications, 1994).
2. There are two statistically based problems with using OLS on a binary dependent variable, both of
which arise from having only two possible values for the dependent variable. OLS regression assumes
that its prediction errors, the differences between the predicted values of y and the actual values of y,
follow a normal distribution. The prediction errors for a binary variable, however, follow a binomial
distribution. More seriously, OLS also assumes homoscedasticity of these errors, that is, that the
variance of the prediction errors is the same for all values of x. With a binary dependent variable, this assumption
does not hold up. An accessible discussion of these problems may be found in Fred C. Pampel, Logistic
Regression: A Primer (Thousand Oaks, Calif.: SAGE Publications, 2000), 3–10.
3. Because of this natural log transformation of the dependent variable, many researchers use the
terms logit regression or logit analysis instead of logistic regression. Others make a distinction
between logit analysis (used to describe a situation in which the independent variables are not
continuous but categorical) and logistic regression (used to describe a situation in which the
independent variables are continuous or a mix of continuous and categorical). To avoid confusion, we
use logistic regression to describe any situation in which the dependent variable is the natural log of
the odds of a binary variable.
4. Logistic regression will fit an S-shaped curve to the relationship between an interval-level
independent variable and the probability of a dependent variable, but it need not be the same
S-shaped pattern shown in Figure 9-1. For example, the technique may produce estimates that trace a
“lazy S,” with probabilities rising in a slow, nearly linear pattern across values of the independent
variable. Or perhaps the relationship is closer to an “upright S,” with probabilities changing little
across the high and low ranges of the independent variable but increasing rapidly in the middle
ranges.
5. If the logistic regression coefficient, b, were equal to 0, then the odds ratio, Exp(b), would be Exp(0),
or e^0, which is equal to 1.
6. Computer output also will report a standard error and test of significance for the intercept, â. This
would permit the researcher to test the hypothesis that the intercept is significantly different from 0.
So if we wanted to test the null hypothesis that the logged odds of voting for individuals with no
formal education (who have a value of 0 on the independent variable) was equal to 0—that is, that the
odds of voting for this group was equal to 1—we would use the standard error of the intercept. Much
of the time such a test has no practical meaning, and so these statistics have been omitted from Table
9-3.
7. The Wald statistic (named for statistician Abraham Wald) divides the regression coefficient by its
standard error and then squares the result. The value of Wald follows a chi-square distribution with
degrees of freedom equal to 1.
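In code, the Wald computation is one line. The coefficient below is the education coefficient from this chapter's model; the standard error of .045 is hypothetical, supplied only to illustrate the arithmetic:

```python
# Wald statistic: (coefficient / standard error) squared,
# compared against a chi-square distribution with 1 degree of freedom.
b, se = 0.224, 0.045  # se is a hypothetical value for illustration
wald = (b / se) ** 2

# Critical chi-square value at the .05 level with 1 df (see note 11).
CRITICAL_05_DF1 = 3.84
print(f"Wald = {wald:.2f}; significant at .05: {wald > CRITICAL_05_DF1}")
```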
8. The estimation procedure used by logistic regression is not aimed at minimizing the sum of the
squared deviations between the estimated values of y and the observed values of y. So the
conventional interpretation of R-square, the percentage of the variation in the dependent variable that
is explained by the independent variable(s), does not apply when the dependent variable is binary.
9. The likelihood function = Π {P_i^(y_i) × (1 − P_i)^(1 − y_i)}. The expression inside the brackets says, for each
individual case, to raise the model's predicted probability (P) to the power of y, and to multiply that
number by the quantity 1 minus the predicted probability raised to the power of 1 − y. The symbol Π tells
us to multiply all these individual results together. The formula is not as intimidating as it looks. When
y equals 1, the formula simplifies to (P × 1), since P raised to y equals P, and (1 − P) raised to 1 − y
equals 1. Similarly, when y equals 0, the formula simplifies to 1 − P.
10. A likelihood model that uses the independent variable(s) to generate predicted probabilities is
called the full model or complete model. In making statistical comparisons between models, some
computer programs work with the log of the likelihood ratio, denoted ln(L1/L2), in which L1 is the
likelihood of Model 1 (reduced model) and L2 is the likelihood of Model 2 (complete model). Taking
the log of the likelihood ratio is equivalent to subtracting the logged likelihood of Model 2 from the
logged likelihood of Model 1: ln(L1/L2) = ln(L1) − ln(L2).
11. Degrees of freedom is equal to the difference between the number of independent variables
included in the models being compared. Since Model 2 has one independent variable and Model 1 has
no independent variables, degrees of freedom is equal to 1 for this example. We can, of course, test the
null hypothesis the old-fashioned way, by consulting a chi-square table. The critical value of chi-square, at the .05 level with 1 degree of freedom, is equal to 3.84. Since the change in −2LL, which is
equal to 3.98, exceeds the critical value, we can reject the null hypothesis.
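The likelihood-ratio arithmetic from notes 10 and 11 can be checked in a few lines. The two likelihood values below are hypothetical, chosen so that the change in −2LL works out to the 3.98 used in this note:

```python
import math

# Change in -2LL = -2 * ln(L1 / L2), where L1 is the reduced model's
# likelihood and L2 is the complete model's likelihood.
L1, L2 = 0.010, 0.0733  # hypothetical likelihoods for illustration
change_in_neg2ll = -2 * math.log(L1 / L2)

# Critical chi-square value at the .05 level with 1 degree of freedom.
CRITICAL_05_DF1 = 3.84
print(round(change_in_neg2ll, 2), "reject null:", change_in_neg2ll > CRITICAL_05_DF1)
```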
12. Logged likelihoods can be confusing. Remember that likelihoods vary between 0 (the model’s
predictions do not fit the data at all) and 1 (the model’s predictions fit the data perfectly). This means
that the logs of likelihoods can range from very large negative numbers (any likelihood of less than 1
has a negatively signed log) to 0 (any likelihood equal to 1 has a log equal to 0). So if Model 2 had a
likelihood of 1—that is, it perfectly predicted voter turnout—then it would have a logged likelihood of
0. In this case, the conceptual formula for R-square would return a value of 1.0.
13. Cox-Snell R-square and Nagelkerke’s R-square are included in SPSS logistic regression output.
Another measure, popular among political researchers, is Aldrich and Nelson’s pseudo R-square:
(Change in −2LL) / (Change in −2LL + N), in which N is the sample size. Menard has proposed yet
another measure, based on the correlation between the logistic regression’s predicted probabilities of
y and the actual values of y. See John H. Aldrich and Forrest D. Nelson, Linear Probability, Logit, and
Probit Models (Thousand Oaks, Calif.: SAGE Publications, 1984), 54–58; and Scott Menard, Applied
Logistic Regression Analysis, 2nd ed. (Thousand Oaks, Calif.: SAGE Publications, 2002), 24–27. For a
comparison of the various pseudo-R-square measures, see Thomas J. Smith and Cornelius M. McKenna,
“A Comparison of Logistic Regression Pseudo R² Indices,” Multiple Linear Regression Viewpoints 39,
no. 2 (2013): 17–26.
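Aldrich and Nelson's measure is simple enough to compute by hand; a sketch with a hypothetical change in −2LL and sample size:

```python
# Aldrich and Nelson's pseudo R-square:
# (change in -2LL) / (change in -2LL + N), where N is the sample size.
change_in_neg2ll = 3.98  # hypothetical model-improvement chi-square
n = 1500                 # hypothetical sample size
pseudo_r2 = change_in_neg2ll / (change_in_neg2ll + n)
print(round(pseudo_r2, 4))
```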
14. The logged likelihoods for the reduced model and the complete model are not shown in Table 9-8.
Rather, only the chi-square test statistic of interest, the change in −2LL, is reported. Note that, because
no independent variables are included in the baseline model and two independent variables are
included in the complete model, there are 2 degrees of freedom for the chi-square test.
15. On the complexities of interaction effects in logistic regression, see UCLA: Statistical Consulting
Group, “Deciphering Interactions in Logistic Regression,” accessed April 1, 2015,
http://www.ats.ucla.edu/stat/stata/seminars/interaction_sem/interaction_sem.htm. The UCLA authors
offer this pithy yet vaguely discouraging quote of the day: “Departures from additivity imply the
presence of interaction types, but additivity does not imply the absence of interaction types.” This
quote is attributed to Kenneth J. Rothman and Sander Greenland, Modern Epidemiology, 2nd ed.
(Philadelphia: Lippincott Williams and Wilkins, 1998).
16. This section adopts the terminology used by Richard Williams, in “Using the Margins Command to
Estimate and Interpret Adjusted Predictions and Marginal Effects,” Stata Journal 12, no. 2 (2012): 308
–331.
17. It may seem odd to consider a 0−1 variable as having a mean or average in the conventional sense
of the term. After all, respondents are coded either 0 or 1. There is no respondent who is coded
“.2768” on partisan strength. However, means can also be thought of as random probabilities, termed
expected values by statisticians. If you were to pick any respondent at random from the General Social
Survey dataset, what is the probability that the case you chose would be a strong partisan? The
answer is .2768, the expected value, or mean value, of this variable.
18. See Pampel, Logistic Regression: A Primer, 24–26.
19. See Williams, “Using the Margins Command to Estimate and Interpret Adjusted Predictions and
Marginal Effects,” Stata Journal, 326.
20. Data used in these exercises are from the 2014 Pew Political Typology/Polarization Survey (N =
10,003), conducted by the Pew Research Center, 1615 L Street, NW, Suite 700, Washington, D.C.
20036. See Michael Dimock, Carroll Doherty, Jocelyn Kiley, and Russ Oates, “Political Polarization in
the American Public: How Increasing Ideological Uniformity and Partisan Antipathy Affect Politics,
Compromise and Everyday Life,” published June 12, 2014, http://www.people-press.org/2014/06/12/political-polarization-in-the-american-public/. The data are publicly available
from http://www.people-press.org/category/datasets/?download=20057011.
10 Thinking Empirically, Thinking Probabilistically
This book has covered only the basics—the essential skills you need to understand political research
and to perform your own analysis. Even so, we have discussed a wide range of topics and
methodological issues. The first five chapters dealt with the foundations of political analysis: defining
and measuring concepts, describing variables, framing hypotheses and making comparisons,
designing research, and controlling for rival explanations. In the last four chapters we considered the
role of statistics: making inferences, gauging the strength of relationships, performing linear
regression analysis, and interpreting logistic regression. As you read research articles in political
science, discuss and debate political topics, or evaluate the finer points of someone’s research
procedure, the basic knowledge imparted in this book will serve you well.
This book has also tried to convey a larger vision of the enterprise of political analysis. Political
scientists seek to establish new facts about the world, to provide rich descriptions and accurate
measurements of political phenomena. Political scientists also wish to explain political events and
relationships. In pursuit of these goals, researchers learn to adopt a scientific mindset toward their
work, a scientific approach to the twin challenges of describing and explaining political variables. As
you perform your own political analysis, you too are encouraged to adopt this way of thinking. Here
are two recommendations. First, in describing new facts, try to think empirically. Try to visualize how
you would measure the phenomenon you are discussing and describing. Be open to new ideas, but
insist on empirical rigor. Political science, like all science, is based on empirical evidence. This
evidence must be described and measured in such a way that others can do what you did and obtain
the same results. Second, in proposing and testing explanations, try to think probabilistically. You are
well aware of one reason that political researchers must rely on probabilities: Random samples are a
fact of life for much political science. Another reason is that political science deals with human behavior
and human events, and so it is an inexact science. Let’s briefly illustrate why it is important to think
empirically. Let’s also look at the reasons political scientists must think probabilistically.
10.1 Thinking Empirically
The main projects of political science are to describe concepts and to analyze the relationships
between them. But potentially interesting relationships are often obscured by vague, conceptual
language. During a class meeting of a voting and elections course, for example, students were
discussing the electoral dynamics of ballot initiatives, law-making vehicles used frequently in several
states. Controversial proposals, such as denying state benefits to undocumented immigrants or
banning same-sex marriage, may appear on the ballot to be decided directly by voters. Near the end of
the discussion, one student observed: “It appears to me that most ballot initiatives target minorities.
Most ballot initiatives, if they pass, decrease equality. Very few seem designed with egalitarian
principles in mind.” Now, this is an interesting, imaginative statement. Is it true? Without conceptual
clarification, there is no way to tell.
Upon hearing statements such as the one made by this student, you have learned to insist that
conceptual terms, like equality or egalitarian principles, be described in concrete language. How would
one distinguish an egalitarian ballot initiative, an initiative that increases equality, from one that is not
egalitarian, an initiative that decreases equality? Pressed to clarify her conceptual terms, the student
settled on one defining characteristic of the degree of egalitarianism in a ballot initiative. The concept
of egalitarianism, she said, is defined as the extent to which a ballot initiative would confer new legal
rights on an identifiable group. So some initiatives, such as those that would confer protections on
commercial farm animals, could be classified as more egalitarian, whereas others, like those making
English the state’s official language, could be classified as less egalitarian. With a workable
measurement strategy in hand, this student could then turn to an empirical assessment of her in-class
claim. Using clearly defined terms and reproducible findings, this student’s research would enhance
our understanding of these vehicles of direct democracy. 1
An openness that is tempered by skepticism nurtures knowledge of the political world. This means
that political scientists must sometimes revisit relationships and rethink established explanations. The
search for truth is an ongoing process. Consider another example. For years, scholars of U.S. electoral
behavior measured voter turnout in presidential elections by dividing the number of people who
voted by the size of the voting-age population. So, if one wanted to describe trends in turnout, one
would calculate the percentage of the voting-age population who voted in each year, and then track
the numbers over time. Indeed, measured in this way, turnout declined steadily after 1960, the high-water mark of the twentieth century. From the 1970s through the 1990s, the relatively low turnout in presidential elections became one of the most heavily researched phenomena in U.S. politics, fostering a search for explanatory factors. Some explanations have linked declining turnout to
attitudinal variables, such as a weakened attachment to the political parties or an erosion of trust in
government.
In 2001, research by Michael P. McDonald and Samuel Popkin pointed to a potentially serious
measurement problem with using the voting-age population as the basis for calculating turnout. 2
They showed that, by using the voting-age population, researchers had been including large groups of
ineligible people, such as felons or noncitizens. What is more, the percentage of noncitizens among the
voting-age population went from 2 percent in 1966 to over 8 percent in 2014. Once McDonald and
Popkin adjusted for these and other measurement errors, they showed that although turnout dropped
after 1972 (when the eligible electorate was expanded to include eighteen-year-olds), there was no
downward trend. With this more precise measurement strategy in hand, political scientists have
turned their attention toward the explanation of dramatic increases in electoral turnout in the new
millennium. 3
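The measurement difference McDonald and Popkin identified can be illustrated with a short calculation. The sketch below contrasts turnout computed against the voting-age population (VAP) with turnout computed against the voting-eligible population (VEP), which excludes ineligible groups such as noncitizens and felons. All figures are hypothetical round numbers chosen for illustration, not actual election data.

```python
# Illustrative sketch: VAP-based versus VEP-based turnout.
# All numbers below are hypothetical, not actual election data.

ballots_cast = 120_000_000            # votes counted (hypothetical)
voting_age_population = 240_000_000   # all residents aged 18+ (hypothetical)
noncitizens = 19_000_000              # ineligible noncitizens (hypothetical)
ineligible_felons = 3_000_000         # ineligible felons (hypothetical)

# Traditional measure: divide voters by everyone of voting age.
vap_turnout = ballots_cast / voting_age_population

# McDonald-Popkin-style correction: subtract ineligible groups first.
voting_eligible_population = (
    voting_age_population - noncitizens - ineligible_felons
)
vep_turnout = ballots_cast / voting_eligible_population

print(f"VAP turnout: {vap_turnout:.1%}")  # 50.0%
print(f"VEP turnout: {vep_turnout:.1%}")  # 55.0%
```

Because the denominator shrinks when ineligible people are removed, the VEP rate is always at least as high as the VAP rate; and as the ineligible share of the voting-age population grows over time, the VAP measure increasingly understates turnout among those actually able to vote.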
10.2 Thinking Probabilistically
Statistics occupies a position of central importance in the research process. Random error has come up in several contexts in this book: measurement, sampling, hypothesis testing, and statistical significance. We have seen that the accuracy of measurement...