Logistic Regression

Description

Write: In your initial post of at least 550 words, using Exercise 1 on pages 239-240, complete the following:

  • First, fill in the blank: The logistic regression coefficient tells us that, for each one-unit increase in the party identification scale, the logged odds of a pro-control opinion increases by .506. Turn your attention to the odds scale, Exp(b). Recall that higher values of party identification are more Democratic and lower values are less Democratic. This coefficient says that an individual at one value of party identification is __________ times as likely to be pro-control as an individual at the next-lower level of party identification.
  • Then, use the value of Exp(b) to compute a percentage change in the odds. According to your calculations, each unit increase in party identification, from less Democratic to more Democratic, increases the odds of being pro-control by how much? (A short computational sketch follows this list.)
  • State the null hypothesis for this relationship.
  • Finally, explain what the inferential decision is: reject the null hypothesis or do not reject the null hypothesis.
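For reference, here is a minimal Python sketch (not part of the exercise itself) of the two conversions the prompt asks for, using the coefficient b = .506 quoted above:

```python
import math

b = 0.506  # logistic regression coefficient from Exercise 1

# Exp(b): how many times as likely a respondent at one value of party
# identification is to be pro-control, compared with the next-lower value
odds_ratio = math.exp(b)

# Percentage change in the odds for each one-unit increase in the scale
pct_change = 100 * (odds_ratio - 1)

print(f"Exp(b) = {odds_ratio:.3f}")
print(f"Percentage change in odds = {pct_change:.1f}%")
```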

Attachment Preview (textbook excerpt)

9 Logistic Regression

Learning Objectives. In this chapter you will learn:

  • How to use logistic regression to describe the relationship between an interval-level independent variable and a dichotomous dependent variable
  • How logistic regression is similar to—and different from—ordinary least squares regression
  • How maximum likelihood estimation works
  • How to use logistic regression with multiple independent variables
  • How to use probabilities to interpret logistic regression results

Political analysis is not unlike using a toolbox. The researcher looks at the substantive problem at hand, selects the methodological tool most appropriate for analyzing the relationship, and then proceeds with the analysis. Selection of the correct tool is determined largely by the levels of measurement of the variables of interest. If both the independent and the dependent variables are measured by nominal or ordinal categories—a common situation, particularly in survey research—the researcher would most likely select cross-tabulation analysis. If both the independent and the dependent variables are interval level, then ordinary least squares (OLS) regression would be applied. Finally, if the researcher wanted to analyze the relationship between an interval-level dependent variable and a categorical independent variable, then he or she might use mean comparison analysis or, alternatively, specify and test a linear regression model using dummy variables. These techniques, all of which have been discussed in earlier chapters, add up to a well-stocked toolbox.

Even so, one set of tools is missing. Logistic regression is part of a family of techniques designed to analyze the relationship between an interval-level independent variable and a categorical dependent variable, a dependent variable measured by nominal or ordinal values. The dependent variable might have any number of categories—from several, to a few, to two. In this chapter we discuss how to use and interpret logistic regression in the simplest of these situations, when the dependent variable takes on only two values. For example, suppose we are using survey data to investigate the relationship between education and voter turnout. We think that a positive relationship exists here: As education increases, so does the likelihood of voting. The independent variable, education, is measured in precise 1-year increments, from 0 (no formal education) to 20 (20 years of schooling). The dependent variable, however, takes on only two values—respondents either voted or they did not vote. In this situation, we have a binary dependent variable. A binary variable is a dichotomous variable, one that can assume only two values. Binary variables are identical to dummy variables, discussed in Chapter 8. Thus, voted/did not vote, smoker/nonsmoker, approve/do not approve, and married/unmarried are all examples of dummy variables or binary variables.

In some ways, logistic regression is similar to OLS regression. Like OLS, logistic regression gauges the effect of the independent variable by estimating an intercept and a slope, both familiar fixtures of linear regression.
Plus logistic regression provides a standard error for the slope, which allows the researcher to test hypotheses about the effect of the independent variable on the dependent variable. And like OLS, logistic regression is remarkably flexible, permitting the use of multiple independent variables, including dummy independent variables. In one fundamental way, however, logistic regression is a different breed of cat. When we perform OLS regression, we can reasonably assume a linear relationship between an independent variable (x) and a dependent variable (y). For example, for the relationship between years of schooling (x) and income in dollars (y), we can use a linear model to estimate the average dollar-change in income for each 1-year increase in education. OLS would give us an idea of how closely the relationship between x and y fits this linear pattern. However, when we have a binary dependent variable, we must assume that it bears a nonlinear relationship to x. So as education (x) increases from 8 years to 9 years to 10 years, we most plausibly assume that the likelihood of voting (y) is low and increases slightly for each of these 1-year increments. But as education increases from 11 years to 12 years to 13 years, we would expect voter turnout to show large increases for each 1-year increment in this range of x. In the higher values of education—say, beyond 13 years—we would assume that turnout is already high and that each additional year of schooling would have a weaker effect on voting. A logistic regression analysis would give us an idea of how closely the relationship between x and y fits this nonlinear pattern.

This chapter is divided into four sections. In the first section we use both hypothetical and real-world data to illustrate the logic behind logistic regression. Here you will be introduced to some unfamiliar terms—such as odds and logged odds—that define the workings of the technique, and you will learn what to look for in your own analyses and how to describe and interpret your findings. In the second section we take a closer look at maximum likelihood estimation, the method logistic regression uses to estimate the effect of the independent variable (or variables) on the dependent variable. Here you will see how logistic regression is similar to other techniques and statistics we discussed previously, particularly chi-square. In the third section we demonstrate how the logistic regression model, much like multiple linear regression, can be extended to accommodate several independent variables. Finally, we consider some additional ways to present and interpret logistic regression results. By the end of this chapter you will have added another powerful technique to your toolbox of political research methods.

9.1 The Logistic Regression Approach

We begin with a hypothetical example. Suppose we are investigating whether education (x) affects voter turnout (y) among a random sample of respondents (n = 500). For purposes of illustration, we will assume that the independent variable, education, is an interval-level variable that varies from 0 (low) to 4 (high), and that voter turnout is a binary dependent variable, coded 1 if the individual voted and 0 if he or she did not vote.
Table 9-1 shows the results from a cross-tabulation analysis of the hypothetical sample data. Although column percentages have not been supplied in Table 9-1, they are easy to figure out because each value of education contains exactly 100 cases. For example, of the 100 people in the low-education category, 6 voted—a percentage equal to 6 or a proportion equal to .06. Twenty percent (.20) of the 100 middle-low education individuals voted, 50 percent (.50) of the middle group voted, and so on. The bottom row of Table 9-1 presents the proportion of voters for each value of education, but it uses the label "Probability of voting." Why use probability instead of proportion? The two terms are synonymous. Think of it this way: If you were to randomly select one individual from the group of 100 low-education respondents, what is the probability that this randomly selected person voted? Because random selection guarantees that each case has an equal chance of being picked, there are 6 chances in 100—a probability of .06—of selecting a voter from this group. Similarly, you could say that there is a random probability of voting equal to .06 for any individual in the low-education category, a probability of .20 for any respondent in the middle-low group, and so on. It is important to shift your thinking from proportions to probabilities, because logistic regression is aimed at determining how well an independent variable (or set of independent variables) predicts the probability of an occurrence, such as the probability of voting.

Consider the Table 9-1 probabilities and make some substantive observations. Clearly, a positive relationship exists between education and the probability of voting: As education (x) goes up, so does the probability of voting (y). Now examine this pattern more closely and apply the logic of linear regression. Does a one-unit increase in the independent variable produce a consistent increase in the probability of voting? Starting with the interval between low and middle-low, the probability goes from .06 to .20—an increase of .14. So by increasing the independent variable by 1 in this interval, we see a .14 increase in the probability of voting. Between middle-low and middle, however, this effect increases substantially, from .20 to .50—a jump of .30. The next increment, from middle to middle-high, produces another .30 increase in the probability of voting, from .50 to .80. But this effect levels off again between the two highest values of the independent variable. Moving from middle-high to high education occasions a more modest increase of .14 in the probability of voting. Thus the linear logic does not work very well. A unit change in education produces a change in the probability of voting of either .14 or .30, depending on the range of the independent variable examined. Put another way, the probability of voting (y) has a nonlinear relationship to education (x). Rest assured that there are very good statistical reasons the researcher should not use OLS regression to estimate the effect of an interval-level independent variable on a binary dependent variable.
Perhaps as important, there are compelling substantive reasons you would not expect a linear model to fit a relationship such as the one depicted in Table 9-1. Think about this for a moment. Suppose you made $10,000 a year and were faced with the decision of whether to make a major purchase, such as buying a home. There is a good chance that you would decide not to make the purchase. Now suppose that your income rose to $20,000, a $10,000 increase. To be sure, this rise in income might affect your reasoning, but most likely it would not push your decision over the purchasing threshold, from a decision not to buy to a decision to buy. Similarly, if your initial income were $95,000, you would probably decide to buy the house, and an incremental $10,000 change, to $105,000, would have a weak effect on this decision—you were very likely to make the purchase in the first place. But suppose that you made $45,000. At this income level, you might look at your decision a bit differently: "If I made more money, I could afford a house." Thus that $10,000 pay raise would push you over the threshold. Going from $45,000 to $55,000 greatly enhances the probability that you would make the move from "do not buy" to "buy." So at low and high initial levels of income, an incremental change in the causal variable has a weaker effect on your dichotomous decision (do not buy/buy) than does the same incremental change in the middle range of income.

Although fabricated, the probabilities in Table 9-1 show a plausible pattern. Less educated individuals are unlikely to vote, and you would not expect a small increment in education to make a huge difference in the probability of voting. The same idea applies to people in the upper education range. Individuals in the middle-high to high category are already quite likely to vote. It would be unreasonable to suggest that, for highly educated people, a one-unit increase in the independent variable would have a big effect on the likelihood of voting. It is in the middle intervals of the independent variable—from middle-low to middle-high—where you might predict that education would have its strongest effect on the dependent variable. As people in this range gain more of the resource (education) theoretically linked to voting, a marginal change in the independent variable is most likely to switch their dichotomous decision from "do not vote" to "vote." Logistic regression allows us to specify a model that takes into account this nonlinear relationship between education and the probability of voting.

As we have seen, the first step in understanding logistic regression is to think in terms of the probability of an outcome. The next step is to get into the habit of thinking in terms of the odds of an outcome. This transition really is not too difficult, because odds are an alternative way of expressing probabilities. Whereas probabilities are based on the number of occurrences of one outcome (such as voting) divided by the total number of outcomes (voting plus nonvoting), odds are based on the number of occurrences of one outcome (voting) divided by the number of occurrences of the other outcome (nonvoting).
According to Table 9-1, for example, among the 100 people in the middle-high education group, there were 80 voters—a probability of voting equal to 80/100 or .80. What are the odds of voting for this group? Using the raw numbers of voters and nonvoters, the odds would be 80 to 20 or, to use a more conventional way of verbalizing odds, 4 to 1, four voters to every nonvoter. In describing odds, we ordinarily drop the "... to 1" part of the verbalization and say that the odds of voting are 4. Thus, for the middle-high education group, the probability of voting is .80 and the odds of voting are 4. In figuring odds, you can use the raw numbers of cases, as we have just done, or you can use probabilities to compute odds. The formula for converting probabilities to odds is as follows:

Odds = Probability / (1 − Probability)

Apply this conversion to the example just discussed. For middle-high education respondents, the odds would be .80 divided by (1 minus .80), which is equal to .80/.20, or 4. The "Odds of voting" column in Table 9-2 shows this conversion for the five education groups. Consider the numbers in the "Odds of voting" column and note some further properties of odds. Note that probabilities of less than .50 produce odds of less than 1, and probabilities of greater than .50 convert to odds of greater than 1. The probabilities for low and middle-low education respondents (.06 and .20, respectively) convert to odds of .06 and .25, and the probabilities among the highest education groups translate into odds of 4 and 16. If an event is as likely to occur as not to occur, as among the middle education people, then the probability is .50 and the odds are equal to 1 (.50/.50 = 1).

Now examine the "Odds of voting" column in Table 9-2 more closely. Can you discern a systematic pattern in these numbers as you proceed down the column from low education to high education? Indeed, you may have noticed that the odds of voting for the middle-low education group is (very nearly) four times the odds of voting for the low education category, since 4 times .06 is about equal to .25. And the odds of voting for the middle education group is four times the odds for the middle-low group, since 4 times .25 equals 1. Each additional move, from middle to middle-high (from an odds of 1 to an odds of 4) and from middle-high to high (from 4 to 16), occasions another fourfold increase in the odds. So, as we proceed from lower to higher values of the independent variable, the odds of voting for any education group is four times the odds for the next-lower group. In the language of logistic regression, the relationship between the odds at one value of the independent variable compared with the odds at the next-lower value of the independent variable is called the odds ratio. Using this terminology to describe the "Odds of voting" column of Table 9-2, we would say that the odds ratio increases by 4 for each one-unit increase in education.
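The conversion from probabilities to odds, and the fourfold pattern in the odds ratios, can be verified with a short Python sketch; the probabilities below are the Table 9-2 values quoted in the text (the .94 figure for the high group is implied by its odds of 16).

```python
# Probabilities of voting by education level, from Table 9-2 (hypothetical data)
probs = {"low": 0.06, "middle-low": 0.20, "middle": 0.50,
         "middle-high": 0.80, "high": 0.94}

def to_odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)

odds = {level: to_odds(p) for level, p in probs.items()}
# odds is approximately {low: .06, middle-low: .25, middle: 1,
#                        middle-high: 4, high: 16}

# Odds ratio: odds at one education level divided by odds at the next-lower level
levels = list(probs)
for lower, upper in zip(levels, levels[1:]):
    print(f"{lower} -> {upper}: odds ratio = {odds[upper] / odds[lower]:.2f}")
# each ratio comes out very nearly 4
```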
The pattern of odds shown in Table 9-2 may be described in another way. Instead of figuring out the odds ratio for each change in education, we could calculate the percentage change in the odds of voting for each unit change in education. This would be accomplished by seeing how much the odds increase and then converting this number to a percentage. Between low and middle-low, for example, the odds of voting go from .06 to .25, an increase of .19. The percentage change in the odds would be .19 divided by .06, which is equal to 3.17—a bit more than a 300-percent increase in the odds of voting. For the move from middle-low to middle we would have (1 − .25)/.25 = 3.00, another 300-percent increase in the odds of voting. In fact, the odds of voting increases by 300 percent for each additional unit increase in education: from middle to middle-high ([4 − 1]/1 = 3.00) and from middle-high to high ([16 − 4]/4 = 3.00). Using this method to describe the Table 9-2 data, we could conclude that the odds of voting increase by 300 percent for each one-unit increase in education.

Briefly review what we have found so far. When we looked at the relationship between education and the probability of voting, we saw that an increase in the independent variable does not produce a consistent change in the dependent variable. In examining the relationship between education and the odds of voting, however, we saw that a unit change in education does produce a constant change in the odds ratio of voting—equal to 4 for each unit change in x. Alternatively, each change in the independent variable produces a consistent percentage increase in the odds of voting, a change equal to 300 percent for each unit change in x. What sort of model would summarize this consistent pattern? The answer to this question lies at the heart of logistic regression. Logistic regression does not estimate the change in the probability of y for each unit change in x. Rather, it estimates the change in the log of the odds of y for each unit change in x.

Consider the third column of numbers in Table 9-2. This column reports an additional conversion, labeled "Logged odds of voting." For low education, this number is equal to −2.8; for middle-low education, it is equal to −1.4; for middle education, 0; for the middle-high group, +1.4; and for high education, +2.8. Where did these numbers originate? A logarithm, or log for short, expresses a number as an exponent of some constant or base. If we chose a base of 10, for example, the number 100 would be expressed as 2, since 100 equals the base of 10 raised to the power of 2 (10^2 = 100). We would say, "The base-10 log of 100 equals 2." Base-10 logs are called common logarithms, and they are used widely in electronics and the experimental sciences. Statisticians generally work with a different base, denoted as e. The base e is approximately equal to 2.72. Base-e logs are called natural logarithms and are abbreviated ln. Using the base e, we would express the number 100 as 4.61, since 100 equals the base e raised to the power of 4.61 (e^4.61 ≈ 100, or ln(100) ≈ 4.61).
We would say, "The natural log of 100 equals 4.61." The five numbers in the Table 9-2 column "Logged odds of voting" are simply the natural logs of .06 (e^−2.8 = .06), .25 (e^−1.4 = .25), 1 (e^0 = 1), 4 (e^1.4 = 4), and 16 (e^2.8 = 16). Using conventional notation: ln(.06) = −2.8, ln(.25) = −1.4, ln(1) = 0, ln(4) = 1.4, and ln(16) = 2.8. These five numbers illustrate some general features of logarithmic transformations. Any number less than 1 has a negatively signed log. So to express .25 as a natural log, we would raise the base e to a negative power, −1.4. Any number greater than 1 has a positively signed log. To convert 4 to a natural log, we would raise e to the power of 1.4. And 1 has a log of 0, since e raised to the power of 0 equals 1. Natural log transformations of odds are often called logit transformations, or logits for short. So the logit (pronounced "low-jit") of 4 is 1.4.

You are probably unaccustomed to thinking in terms of odds instead of probabilities. And it is a safe bet that you are really unaccustomed to thinking in terms of the logarithmic transformations of odds. But stay focused on the "Logged odds of voting" column. Again apply the linear regression logic. Does a unit change in the independent variable, education, produce a consistent change in the log of the odds of voting? We can see that moving from low education to middle-low education, the logged odds increases from −2.8 to −1.4, an increase of 1.4. And moving from middle-low to middle, the logged odds again increases by 1.4 (0 minus a negative 1.4 equals 1.4). From middle to middle-high and from middle-high to high, each one-unit increase in education produces an increase of 1.4 in the logged odds of voting.

Now there is the odd beauty of logistic regression. Although we may not use a linear model to estimate the effect of an independent variable on the probability of a binary dependent variable, we may use a linear model to estimate the effect of an independent variable on the logged odds of a binary dependent variable. Consider this plain-vanilla regression model:

Logged odds (y) = a + b(x)

As you know, the regression coefficient, b, estimates the change in the dependent variable for each unit change in the independent variable. And the intercept, a, estimates the value of the dependent variable when x is equal to 0. Using the numbers in the "Logged odds of voting" column to identify the values for a and b, we would have:

Logged odds (voting) = −2.8 + 1.4(education)

Review how this model fits the data. For the low-education group (coded 0 on education), the logged odds of voting would be −2.8 + 1.4(0), which is equal to −2.8. For the middle-low education group (coded 1 on education): −2.8 + 1.4(1), equal to −1.4. For the middle group (coded 2): −2.8 + 1.4(2), equal to 0. And so on, for each additional one-unit increase in education. This linear model nicely summarizes the relationship between education and the logged odds of voting. Now if someone were to ask, "What, exactly, is the effect of education on the likelihood of voting?" we could reply, "For each unit increase in education there is an increase of 1.4 in the logged odds of voting." Although correct, this interpretation is not terribly intuitive.
Therefore, we can use the regression coefficient, 1.4, to retrieve a more understandable number: the change in the odds ratio of voting for each unit change in education. How might this be accomplished? Remember that, as in any regression, all the coefficients on the right-hand side are expressed in units of the dependent variable. Thus the intercept, a, is the logged odds of voting when x is 0. And the slope, b, estimates the change in the logged odds of voting for each unit change in education. Because logged odds are exponents of e, we can get from logged odds back to odds by raising e to the power of any coefficient in which we are interested. Accordingly, to convert the slope, 1.4, we would raise e to the power of 1.4. This exponentiation procedure, abbreviated Exp(b), looks like this:

Exp(b) = Exp(1.4) = e^1.4 ≈ 4

Now we have a somewhat more interpretable reply: "For each unit change in education, the odds ratio increases by 4. Members of each education group are four times more likely to vote than are members of the next-lower education group." Even more conveniently, we can translate the coefficient, 1.4, into a percentage change in the odds of voting. Here is the general formula:

Percentage change in the odds of y = 100 × (Exp(b) − 1)

For our example:

100 × (Exp(1.4) − 1) = 100 × (4 − 1) = 300

Thus we now can say, "Each unit increase in education increases the odds of voting by 300 percent." Note further that, armed with the logistic regression equation, Logged odds (voting) = −2.8 + 1.4(education), we can estimate the odds of voting—and therefore the probability of voting—for each value of the independent variable. For the middle-low education group, for example, the logistic regression tells us that the logged odds of voting is −2.8 plus 1.4(1), equal to −1.4. Again, because −1.4 is the exponent of e, the odds of voting for this group would be Exp(−1.4), equal to .25. If the odds of voting is equal to .25, what is the probability of voting? We have already seen that:

Odds = Probability / (1 − Probability)

Following a little algebra:

Probability = Odds / (1 + Odds)

So for the middle-low group, the probability of voting would be:

.25 / (1 + .25) = .25/1.25 = .20

By performing these reverse translations for each value of x—from logged odds to odds, and from odds to probabilities—we can retrieve the numbers in the "Probability of voting" column of Table 9-2. If we were to plot these retrieved probabilities of y for each value of x, we would end up with Figure 9-1. In the beginning we realized that the linear regression logic could not be accurately or appropriately applied to the nonlinear relationship between education and the probability of voting. But after transforming the dependent variable into logged odds, we could apply a linear model. So x bears a nonlinear relationship to the probability of y, but x bears a linear relationship to the logged odds of y. Furthermore, because the logged odds of y bears a nonlinear relationship to the probability of y, the logistic regression model permits us to estimate the probability of an occurrence for any value of x. An S-shaped relationship, such as that depicted in Figure 9-1, is the visual signature of logistic regression.
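These reverse translations are easy to automate. Here is a minimal Python sketch (hypothetical, mirroring the worked example) that starts from the fitted equation, logged odds = −2.8 + 1.4(education), and recovers the odds and probability for every education code:

```python
import math

a, b = -2.8, 1.4  # intercept and slope from the hypothetical model

for educ in range(5):                # education coded 0 (low) through 4 (high)
    logged_odds = a + b * educ
    odds = math.exp(logged_odds)     # logged odds -> odds
    prob = odds / (1 + odds)         # odds -> probability
    print(f"education = {educ}: logged odds = {logged_odds:+.1f}, "
          f"odds = {odds:.2f}, probability = {prob:.2f}")
# Probabilities come out near .06, .20, .50, .80, .94, reproducing Table 9-2
# and tracing the S-shaped curve of Figure 9-1
```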
Just as OLS regression will tell us how well our data fit a linear relationship between an independent variable and the values of an interval-level dependent variable, logistic regression will tell us how well our data fit an S-shaped relationship between an independent variable and the probability of a binary dependent variable. In the hypothetical education−voting example, the relationship worked out perfectly. The logistic regression returned the exact probabilities of voting for each value of education. By eyeballing the "Logged odds of voting" column of Table 9-2, we could easily identify the intercept and regression coefficient of the logistic regression equation. No prediction error. In the practical world of political research, of course, relationships are never this neat.

[Figure 9-1: Plotted Probabilities of Voting (y), by Education (x). Note: Hypothetical data.]

To apply what you have learned thus far—and to discuss some further properties of logistic regression—we enlist here a real-world dataset, the 2012 General Social Survey (GSS), and reexamine the relationship between education and voting. We estimate the logistic regression equation as follows:

Logged odds (voting) = a + b(education)

The dependent variable is reported turnout in the 2008 presidential election. As in the hypothetical example, voters are coded 1 and nonvoters are coded 0. Unlike the hypothetical example, the independent variable, education, is measured in years of formal schooling. This is a more realistic interval-level variable, with values that range from 0 for no formal education to 20 for 20 years of education. Table 9-3 reports the results obtained from a logistic regression analysis using Stata. First consider the numbers in the column labeled "Coefficient estimate." Plugging these values into our equation, we would have:

Logged odds (voting) = −2.068 + .226(education)

The coefficients tell us that, for individuals with no formal schooling, the estimated logged odds of voting is equal to −2.068, and each 1-year increment in education increases the estimated logged odds by .226. The value in the right-most column, "Exp(b)," translates logged odds back into odds and thus provides a more accessible interpretation. Every unit increase in education increases the odds ratio by 1.254. Individuals at any given level of education are about 1.25 times more likely to vote than are individuals at the next-lower level of education. That is, as we move from one value of education to the next-higher value, we would multiply the odds of voting by about 1.25. Perhaps the most intuitively appealing way to characterize the relationship is to estimate the percentage change in the odds of voting for each 1-year increase in education. As we have seen, this is accomplished by subtracting 1 from the exponent, 1.254, and multiplying by 100. Performing this calculation: 100 × (1.254 − 1) = 25.4. Thus the odds of voting increase by about 25 percent for each 1-year increase in the independent variable. What would the null hypothesis have to say about these results?
As always, the null hypothesis claims that, in the population from which the sample was drawn, no relationship exists between the independent and dependent variables, that individuals' levels of education play no role in determining whether they vote. Framed in terms of the logistic regression coefficient, the null hypothesis says that, in the population, the true value of the coefficient is equal to 0, that a one-unit increase in the independent variable produces no change in the logged odds of voting. The null hypothesis also can be framed in terms of the odds ratio, Exp(b). As we have seen, the odds ratio tells us by how much to multiply the odds of the dependent variable for each one-unit increase in the independent variable. An odds ratio of less than 1 means that the odds decline as the independent variable goes up (a negative relationship). (For a discussion of negative relationships in logistic regression, see Box 9-1.) An odds ratio of greater than 1 says that the odds increase as the independent variable goes up (a positive relationship). But an odds ratio of 1 means that the odds do not change as the independent variable increases. Thus an odds ratio equal to 1 would be good news for the null hypothesis, because it would mean that individuals at any level of education are no more likely to vote than are individuals at the next-lower level of education. So if the logistic regression coefficient were equal to 0 or Exp(b) were equal to 1, then we would have to say that the independent variable has no effect on the dependent variable.

As it is, however, we obtained an estimated coefficient equal to .226, which is greater than 0, and an odds ratio equal to 1.254, which is greater than 1. But how can we tell if these numbers are statistically significant? Notice that, just as in OLS regression, logistic regression has provided a standard error for the estimated coefficient, b. And, again like OLS, the standard error tells us how much prediction error is contained in the estimated coefficient. Thus, according to Table 9-3, each 1-year increase in education produces a .226 increase in the logged odds of voting, give or take .030 or so. OLS regression computes a test statistic based on the Student's t-distribution. In Stata, logistic regression returns z, based on the normal distribution. SPSS computes the Wald statistic, which follows a chi-square distribution. All computer programs provide a P-value for the test statistic. Like any P-value, this number tells you the probability of obtaining the observed results, under the assumption that the null hypothesis is correct. A P-value equal to .000 says that, if the null hypothesis is correct, then the probability of obtaining a regression coefficient of .226 is highly remote—clearly beyond the .05 standard. Therefore, we can safely reject the null hypothesis and conclude that education has a statistically significant effect on the likelihood of voting.
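In practice, estimates like those in Table 9-3 come straight from software. As an illustration only, here is how a comparable model could be fit in Python with statsmodels; the data file and column names are hypothetical stand-ins, not the chapter's actual GSS extract.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical extract: 'voted' coded 1/0, 'educ' in years of schooling (0-20)
df = pd.read_csv("gss_subset.csv")

X = sm.add_constant(df["educ"])          # adds the intercept term
result = sm.Logit(df["voted"], X).fit()  # maximum likelihood estimation

print(result.summary())       # coefficient, standard error, z, P-value
print(np.exp(result.params))  # Exp(b): the odds ratios
```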
As you can see, in many ways logistic regression bears a kinship to OLS regression. In running OLS, we obtain an estimate for the linear regression coefficient that minimizes prediction errors. That is, OLS provides the best fit between the predicted values of the dependent variable and the actual, observed values of the dependent variable. OLS also reports a standard error for the regression coefficient, which tells us how much prediction error is contained in the regression coefficient. This information permits us to determine whether x has a significant effect on y. Similarly, logistic regression minimizes prediction errors by finding an estimate for the logistic regression coefficient that yields the maximum fit between the predicted probabilities of y and the observed probabilities of y. Plus it reports a standard error for this estimated effect. However, a valuable statistic is missing from the analogy between OLS and logistic regression: R-square. As you know, R-square tells the researcher how completely the independent variable (or, in multiple regression, all the independent variables) explains the dependent variable. In our current example, it certainly would be nice to know how completely the independent variable, education, accounts for the likelihood of voting. Does logistic regression provide an analogous statistic to R-square? Strictly speaking, the answer is no. Even so, methodologists have proposed R-square-like measures that give an overall reading of the strength of association between the independent variables and the dependent variable. To understand these measures, we need to take a closer look at maximum likelihood estimation, the technique logistic regression uses to arrive at the best fit between the predicted probabilities of y and the observed probabilities of y.

Box 9-1: How to Interpret a Negative Relationship in Logistic Regression

As we expected, Table 9-3 reveals a positive relationship between education and the likelihood of voting. Each 1-year increase in schooling increases the logged odds of voting by .226. Alternatively, each increment in education boosts the odds ratio by 1.25—a 25 percent increase for each increment in educational attainment. However, in your own or in others' research you will often encounter negative relationships, situations in which a unit increase in the independent variable is associated with a decrease in the logged odds of the dependent variable. Negative relationships can be a bit trickier in logistic regression than in OLS regression. Suppose we were to investigate the relationship between the likelihood of voting and the number of hours respondents spend watching television per day. In this situation, we might expect to find a negative relationship: The more television that people watch, the less likely they are to vote. In fact, we would obtain these estimates:

Logged odds (voting) = 1.110 − .107(TV hours)

Thus, each 1-hour increase in daily television watching occasions a decrease of .107 in the logged odds of voting. Obtaining the odds ratio, we would have: Exp(−.107) = e^−.107 = .898. Positive relationships produce odds ratios of greater than 1, and negative relationships produce odds ratios of less than 1. How would you interpret an odds ratio of .898?
Like this: Individuals watching any given number of hours of television per day are only about .9 times as likely to vote as are individuals who watch the next-lower number of hours. For example, people who watch 4 hours per day are .9 times as likely to vote as are people who watch 3 hours per day. Following the formula for percentage change in the odds: 100 × (.898 − 1) = −10.2. Each additional hour spent in front of the television depresses the odds of voting by about 10 percent. (Data for this analysis are from the 2008 General Social Survey, n = 1,163. The independent variable, number of hours spent watching television, is based on the following question: "On the average day, about how many hours do you personally watch television?")

9.2 Finding the Best Fit: Maximum Likelihood Estimation

By way of introducing maximum likelihood estimation, it is helpful to recall the logic behind proportional reduction in error (PRE) measures of association, such as lambda or R-square. You will remember that a PRE measure first determines how well we can predict the values of the dependent variable without knowledge of the independent variable. It then compares this result with how well we can predict the dependent variable with knowledge of the independent variable. PRE uses the overall mean of the dependent variable to "guess" the dependent variable for each value of the independent variable. This guessing strategy produces a certain number of errors. PRE then figures out how many errors occur when the independent variable is taken into account. By comparing these two numbers—the number of errors without knowledge of the independent variable and the number of errors with knowledge of the independent variable—PRE determines how much predictive leverage the independent variable provides.

Maximum likelihood estimation (MLE) employs the same approach. MLE takes the sample-wide probability of observing a specific value of a binary dependent variable and sees how well this probability predicts that outcome for each individual case in the sample. At least initially, MLE ignores the independent variable. As in PRE, this initial strategy produces a number of prediction errors. MLE then takes the independent variable into account and determines if, by knowing the independent variable, these prediction errors can be reduced.

Consider a highly simplified illustration, which again uses education (x) to predict whether an individual voted (coded 1 on the dependent variable, y) or did not vote (coded 0 on y). MLE first would ask, "How well can we predict whether or not an individual voted without using education as a predictor?" For the sake of simplicity, suppose our sample consists of four individuals, as shown in Table 9-4. As you can see, two individuals voted (coded 1) and two did not (coded 0). Based only on the distribution of the dependent variable, what is the predicted probability of voting for each individual? MLE would answer this question by figuring out the sample-wide probability of voting and applying this prediction to each case. Since half the sample voted and half did not, MLE's initial predicted probability (labeled P) would be equal to .5 for each individual. Why .5? Because there is a .5 chance that any individual in the sample voted and a .5 chance that he or she did not vote.
We will label the model that gave rise to the initial predictions Model 1. Table 9-4 shows the predicted probabilities, plus some additional information, for Model 1. How well, overall, does Model 1 predict the real values of y? MLE answers this question by computing a likelihood function, a number that summarizes how well a model's predictions fit the observed data. In computing this function, MLE first determines a likelihood for each individual case. An individual likelihood tells us how closely the model comes to predicting the observed outcome for that case. MLE then computes the likelihood function by calculating the product of the individual likelihoods, that is, by multiplying them together. The likelihood function can take on any value between 0 (meaning the model's predictions do not fit the observed data at all) and 1 (meaning the model's predictions fit the observed data perfectly). Stated formally, the likelihood function is not beautiful to behold. Applied in practice to a small set of data, however, the function is not difficult to compute. If a case has an observed value of y equal to 1 (the individual voted), then the likelihood for that case is equal to P. So individuals A and B, with predicted probabilities equal to .5, have likelihoods equal to P, which is .5. If a case has an observed value of y equal to 0 (the individual did not vote), then the likelihood for that case is equal to 1 − P. Thus individuals C and D, who have predicted probabilities of .5, have likelihoods equal to 1 − P, or 1 − .5, also equal to .5. The likelihoods for each individual are displayed in the right-most column of Table 9-4. The likelihood for Model 1 is determined by multiplying all the individual likelihoods together:

Model 1 likelihood = .5 × .5 × .5 × .5 = .0625

MLE would use this number, .0625, as a baseline summary of how well we can predict voting without knowledge of the independent variable, education. The baseline model is sometimes called the reduced model, because its predictions are generated without using the independent variable. Informally, we could also call it the "know-nothing model" because it does not take into account knowledge of the independent variable. In its next step, MLE would bring the independent variable into its calculations by specifying a logistic regression coefficient for education, recomputing the probabilities and likelihoods, and seeing how closely the new estimates conform to the observed data. Again, for the sake of illustration, suppose that these new estimates, labeled Model 2, yield the predicted probabilities displayed in Table 9-5. Model 2, which takes into account the independent variable, does a better job than Model 1 in predicting the observed values of y. By using education to predict voting, Model 2 estimates probabilities equal to .9 and .8 for individuals A and B (who, in fact, voted), but probabilities of only .3 and .1 for individuals C and D (who, in fact, did not vote).
Just as in the Model 1 procedure, the individual likelihoods for each case are equal to P for each of the voters (for whom y = 1) and are equal to 1 − P for each of the nonvoters (for whom y = 0). The individual likelihoods appear in the right-most column of Table 9-5. As before, the likelihood function for Model 2 is computed by multiplying the individual likelihoods together:

Model 2 likelihood = .9 × .8 × .7 × .9 = .4536

How much better is Model 2 than Model 1? Does using education as a predictor provide significantly improved estimates of the probability of voting? Now, MLE does not work directly with differences in model likelihoods. Rather, it deals with the natural log of the likelihood, or logged likelihood (LL), of each model. Thus MLE would calculate the natural log of the Model 1 likelihood, calculate the natural log of the Model 2 likelihood, and then determine the difference between the two numbers. Table 9-6 shows these conversions, plus some additional calculations, for Model 1 and Model 2.

Examine the Table 9-6 calculations. As we found earlier, Model 2's likelihood (.4536) is greater than Model 1's likelihood (.0625). This increase is also reflected in the LLs of both models: The LL increases from −2.78 for Model 1 to −0.79 for Model 2. MLE makes the comparison between models by starting with Model 1's LL and subtracting Model 2's LL: −2.78 − (−0.79) = −1.99. Notice that if Model 2 did about as well as Model 1 in predicting y, then the two LLs would be similar, and the calculated difference would be close to 0. As it is, MLE found a difference equal to −1.99. Does the number −1.99 help us decide whether Model 2 is significantly better than Model 1? Yes, it does. With one additional calculation, the difference between two LLs follows a chi-square distribution. The additional calculation is achieved by multiplying the difference in LLs by −2. Doing so, of course, doubles the difference and reverses the sign: −2(−1.99) = 3.98. This calculation, often labeled in computer output as "Chi-square," is a chi-square test statistic, and MLE uses it to test the null hypothesis that the true difference between Model 1 and Model 2 is equal to 0. There is nothing mystical here. It is plain old hypothesis testing using chi-square. If the calculated value of the change in −2LL, equal to 3.98, could have occurred more frequently than 5 times out of 100 by chance, then we would not reject the null hypothesis. We would have to conclude that the education−voting relationship is not significant. However, if the chances of observing a chi-square value of 3.98 are less than or equal to .05, then we would reject the null hypothesis and infer that Model 2 is significantly better than Model 1. Using the appropriate degrees of freedom and applying a chi-square test, MLE would report a P-value of .046 for a test statistic of 3.98. The P-value is less than .05, so we can reject the null hypothesis and conclude that education is a statistically significant predictor of the probability of voting.
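The entire worked example can be reproduced in a few lines of Python; this sketch uses the chapter's four hypothetical cases and scipy for the chi-square P-value.

```python
import math
from scipy.stats import chi2

y = [1, 1, 0, 0]           # observed outcomes: A and B voted, C and D did not

p1 = [0.5, 0.5, 0.5, 0.5]  # Model 1: sample-wide probability only
p2 = [0.9, 0.8, 0.3, 0.1]  # Model 2: predictions using education

def log_likelihood(y, p):
    # Each case contributes P if y = 1 and (1 - P) if y = 0;
    # summing the logs equals logging the product of individual likelihoods
    return sum(math.log(pi if yi == 1 else 1 - pi) for yi, pi in zip(y, p))

ll1 = log_likelihood(y, p1)  # ln(.0625), about -2.78
ll2 = log_likelihood(y, p2)  # ln(.4536), about -0.79

chi_sq = -2 * (ll1 - ll2)    # change in -2LL, about 3.96 (the text rounds to 3.98)
p_value = chi2.sf(chi_sq, df=1)
print(f"chi-square = {chi_sq:.2f}, P-value = {p_value:.3f}")  # P about .046
```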
MLE proceeds much in the way illustrated by this example. It first obtains a set of predictions and likelihoods based on a reduced or know-nothing model, that is, a model using only the sample-wide probability of y to predict the observed values of y for each case in the data. It then "tries out" a coefficient for the independent variable in the logistic regression model. MLE usually obtains the first try-out coefficient by running a version of least squares regression using x to predict y. It enlists this coefficient to compute a likelihood, which it then compares with the likelihood of the reduced model. It then proceeds in an iterative fashion, using a complex mathematical algorithm to fine-tune the coefficient, computing another likelihood, and then another and another—until it achieves the best possible fit between the model's predictions and the observed values of the dependent variable.

MLE is the heart and soul of logistic regression. This estimation technique generates all the coefficient estimates and other useful statistics that help the analyst draw inferences about the relationship between the independent and dependent variables. Return now to the GSS data and consider some of these additional statistics, as reported in Table 9-7. To enhance the comparison between the real-world data and the hypothetical example just discussed, the reduced model—the model estimated without taking into account education—is called Model 1. Model 2 refers to the results obtained after the independent variable, education, is used to predict the likelihood of voting. Note the difference between the LLs of the models: When education is used to predict voting, the LL increases from −1,032.78 to −958.82. Is this a significant improvement? Yes, according to the "Model comparison" numbers in the table. Subtracting Model 2's LL from Model 1's LL yields a difference of −73.96. Multiplying the difference by −2, labeled "Chi-square," gives us a chi-square test statistic of 147.92, which has a P-value well beyond the realm of the null hypothesis. Thus, compared with how well we can predict the dependent variable without knowledge of the independent variable, knowledge of respondents' education significantly improves our ability to predict the likelihood of voting. If Model 2 had several predictors of voting—education, age, and race, for example—then the change in −2LL, the "Chi-square" statistic, would provide a chi-square test for the null hypothesis that none of these variables is significantly related to the likelihood of voting.

Logistic regression enlists the change in the likelihood function in yet another way—as the basis for R-square-type measures of association, three of which are reported in the Table 9-7 "Model 2 summary." These statistics are grounded on the intuitive PRE logic. Model 1's LL represents prediction error without knowing the independent variable. The difference between Model 1's LL and Model 2's LL represents the predictive leverage gained by knowing the independent variable. In conceptual terms, then, we could express the difference between the two models as a proportion of Model 1's LL:

R-square = (Model 1 LL − Model 2 LL) / (Model 1 LL)

If Model 2 did about as well as Model 1 in predicting voting—if the two models' LLs were similar—then R-square would be close to 0. If, by contrast, Model 2's LL was a lot higher than Model 1's LL, then R-square would approach 1.
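Using the Table 9-7 logged likelihoods quoted above, this conceptual R-square is simple to compute; it is the quantity reported as McFadden R-square. The sketch below also shows the standard Cox-Snell and Nagelkerke formulas, but note that the sample size n is a placeholder assumption, since this excerpt does not reproduce Table 9-7's n.

```python
import math

ll1 = -1032.78  # Model 1 (know-nothing) logged likelihood, from Table 9-7
ll2 = -958.82   # Model 2 (education included) logged likelihood

# Conceptual PRE-style measure; identical to McFadden R-square
mcfadden = (ll1 - ll2) / ll1
print(f"McFadden R-square = {mcfadden:.3f}")  # about .072

# Cox-Snell and Nagelkerke adjust for sample size; n below is a placeholder,
# NOT the actual Table 9-7 sample size, which this excerpt does not report
n = 1900
cox_snell = 1 - math.exp(2 * (ll1 - ll2) / n)
nagelkerke = cox_snell / (1 - math.exp(2 * ll1 / n))
print(f"Cox-Snell = {cox_snell:.3f}, Nagelkerke = {nagelkerke:.3f}")
```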
The various R-square measures build on this conceptual framework. McFadden R-square, familiar to Stata users, is derived from the above formula. Cox−Snell R-square makes an adjustment based on sample size. The Cox−Snell R-square is somewhat conservative, however, because it can have a maximum value of less than 1. The Nagelkerke statistic adjusts the Cox−Snell number, yielding a measure that is usually higher. By and large, though, these measures, and several others that you may encounter, give readings of strength that are pretty close to each other. So what are we to make of an R-square in the .07 to .12 range? Again, unlike least squares regression, MLE is not in the business of explaining variance in the dependent variable. So we cannot say something like, "Education explains about 7 percent of the variation in voter turnout." However, we know that R-square can assume values between 0 and 1, with 0 denoting a very weak relationship and 1 denoting a strong relationship. Thus we can say that education, while significantly related to the likelihood of voting, is not by itself a particularly strong predictive tool. From a substantive standpoint, this is not too surprising. You can probably think of several additional variables that might improve the predictive power of the logistic regression model. Age, race, political efficacy, strength of partisanship—all these variables come to mind as other possible causes of voting. If we were running OLS, we could specify a multiple regression model and estimate the effect of each of these variables on the dependent variable. Logistic regression also accommodates multiple predictors. We turn now to a discussion of logistic regression using more than one independent variable.

9.3 Logistic Regression with Multiple Independent Variables

Thus far we have covered a fair amount of ground. You now understand the meaning of a logistic regression coefficient. You know how to interpret coefficients in terms of changes in the odds ratio, as well as the percentage change in the odds. You know how to evaluate the statistical significance of a logistic regression coefficient. Plus you have a basic understanding of MLE, and you can appreciate its central role in providing useful statistics, such as the change in −2LL, as well as R-square-type measures of association. So far, however, our substantive examples have been of a simple variety, with one independent variable. Yet political researchers are often interested in assessing the effects of several independent variables on a dependent variable. We often want to know whether an independent variable affects a dependent variable, controlling for other possible causal influences. In this section we show that the logistic regression model, much like the linear regression model, can be extended to accommodate multiple independent variables. We also illustrate how logistic regression models can be used to obtain and analyze the predicted probabilities of a binary variable. To keep things consistent with the previous examples—but to add an interesting wrinkle—we introduce a dummy independent variable into the education−voting model:

Logged odds (voting) = a + b1(education) + b2(partisan)

Education, as before, is measured in years of schooling, from 0 to 20.
"Partisan" is a dummy variable that gauges strength of party identification: Strong Democrats and strong Republicans are coded 1 on this dummy, and all others (weak identifiers, Independents, and Independent leaners) are coded 0. From an empirical standpoint, we know that strongly partisan people, regardless of their party affiliation, are more likely to vote than are people whose partisan attachments are weaker. So we would expect a positive relationship between strength of partisanship and the likelihood of voting.

The coefficients in this model—a, b1, and b2—are directly analogous to coefficients in multiple linear regression. The coefficient b1 will estimate the change in the logged odds of voting for each 1-year change in education, controlling for the effect of partisan strength. Similarly, b2 will tell us by how much to adjust the estimated logged odds for strong partisans, controlling for the effect of education. To the extent that education and partisan strength are themselves related, the logistic regression procedure will control for this, and it will estimate the partial effect of each variable on the logged odds of voting. And the intercept, a, will report the logged odds of voting when both independent variables are equal to 0, for respondents with no schooling (for whom education = 0) and who are not strong party identifiers (partisan = 0). This point bears emphasizing: The logistic regression model specified above is a linear-additive model, and it is just like a garden-variety multiple regression model. The partial effect of education on the logged odds of voting is assumed to be the same for strong partisans and nonstrong partisans alike. And the partial effect of partisan strength on the logged odds of voting is assumed to be the same at all values of education. (This point becomes important in a moment, when we return to a discussion of probabilities.)

Table 9-8 reports the results of the analysis, using the GSS data. Plugging the coefficient values into the logistic regression model, we find:

Logged odds (voting) = −2.354 + .224(education) + 1.522(partisan)

Interpretation of these coefficients is by now a familiar task. When we control for partisan strength, each 1-year increase in education increases the logged odds of voting by .224. And, after we take into account the effect of education, being a strong partisan increases the logged odds of voting by 1.522. Turning to the odds ratios, reported in the "Exp(b)" column of Table 9-8, we can see that a unit increase in education multiplies the odds by about 1.25. And, when partisan is switched from 0 to 1, the odds ratio jumps by 4.6. In other words, when we control for education, strong partisans are nearly five times more likely to vote than are weak partisans or Independents. Framing the relationships in terms of percentage change in the odds: The odds of voting increase by about 25 percent for each incremental change in education and by 358 percent for the comparison between nonstrong partisans and strong partisans. Finally, according to the z statistics (and accompanying P-values), each independent variable is significantly related to the logged odds of voting.
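A quick Python check of these translations, using the two coefficients quoted from Table 9-8:

```python
import math

coefficients = {"education": 0.224, "partisan": 1.522}  # logged-odds scale

for name, b in coefficients.items():
    odds_ratio = math.exp(b)
    pct_change = 100 * (odds_ratio - 1)
    print(f"{name}: Exp(b) = {odds_ratio:.2f}, "
          f"percentage change in odds = {pct_change:.0f}%")
# education: Exp(b) about 1.25, a 25 percent increase per year of schooling
# partisan:  Exp(b) about 4.58, a 358 percent increase for strong partisans
```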
Overall, the model performs fairly well. The chi-square statistic (255.84, P-value = .000) says that including both independent variables in the estimation procedure provides significant predictive improvement over the baseline know-nothing model.[14] The McFadden (.124), Cox−Snell (.133), and Nagelkerke (.195) R-square values, while not spellbinding, suggest that education and partisanship together do a decent job of predicting voting, especially when compared with our earlier analysis (see Table 9-7), in which education was used as the sole predictor.

These results add up to a reasonably complete analysis of the relationships. Certainly it is good to know the size and significance of the partial effects of education and partisan strength on the logged odds of voting, and it is convenient to express these effects in the language of odds ratios and the percentage change in odds. Often, however, the researcher wishes to understand his or her findings in the most intuitively meaningful terms: probabilities. We might ask, “What are the effects of the independent variables on the probability of voting? Although education and partisan attachments clearly enhance the odds of voting, by how much do these variables affect the probability that people will turn out?”

These questions are perfectly reasonable, but they pose two challenges. First, in any logistic regression model—including the simple model with one independent variable—a linear relationship exists between x and the logged odds of y, but a nonlinear relationship exists between x and the probability of y. This signature feature of logistic regression was discussed earlier. The marginal effect of x on the probability of y will not be the same for all values of x. Thus the effect of, say, a 1-year increase in education on the probability of voting will depend on where you “start” along the education variable. Second, in a logistic regression model with more than one independent variable, such as the model we just discussed, the independent variables have a linear-additive relationship with the logged odds of y, but they may show interaction effects on the probability of y. For example, logistic regression will permit the relationship between education and the probability of voting to vary, depending on respondents’ levels of partisan strength. The technique might find that the marginal effects of education on the probability of voting are different—perhaps weaker, perhaps stronger—depending on partisan strength. Or it could find that the marginal effects of education are the same or very similar, regardless of partisan strength.[15]

Odd as it may sound, these challenges define some rather attractive features of logistic regression. Properly applied, the technique allows the researcher to work with probabilities instead of odds or logged odds and, in the bargain, to gain revealing substantive insights into the relationships being studied.

9.4 Working with Probabilities: MEMs and MERs

Return to the logistic regression model we just estimated and consider how best to represent and interpret these relationships in terms of probabilities. The model will, of course, yield the predicted logged odds of voting for any combination of the independent variables.
Just plug in values for education and the partisan dummy, do the math, and obtain an estimated logged odds of voting for that combination of values. As we saw earlier, logged odds can be converted back into odds and, in turn, odds can be translated into probabilities. These conversions—from logged odds to odds, and from odds to probabilities—form the basis of two commonly used methods for representing complex relationships in terms of probabilities.[16]

First, we could examine the effect of one independent variable on the probability of the dependent variable, while holding the other independent variables constant at their sample averages. Thus, one retrieves “marginal effects at the means,” or MEMs. In the current example, we could estimate the probability of voting at each value of education, from 0 to 20, while holding partisan constant at its mean. This would allow us to answer the question, “For individuals of ‘average’ partisan strength, how does the probability of voting change as education increases?”

A second approach is to report changes in the probability of the dependent variable across the range of an interesting independent variable—and to do so separately, for discrete categories of another independent variable. Thus, one retrieves “marginal effects at representative values,” or MERs. For example, we might estimate the probability of voting at each value of education, from 0 to 20, separately for weak partisans and for strong partisans. This would enable us to answer these questions: “In what ways does education affect the probability of voting for individuals who are weakly tied to parties? How do these effects differ from education’s effect for people who are more strongly partisan?” We consider each of these approaches, beginning with MEMs.

Suppose that we wanted to see what happens to the probability of voting as education increases from 0 to 20 years, holding partisan constant at its mean. We would enlist the logistic regression coefficients and use them to calculate—or, better, use a computer to calculate—twenty-one separate probabilities, one for each value of education, from education = 0 through education = 20. For convenience, here is the logistic regression equation we obtained earlier:

Logged odds (voting) = −2.354 + .224(education) + 1.522(partisan)

The first group, people with 0 years of schooling, has a value of 0 on the education variable and a value of .2768 on partisan, its sample-wide mean. (The mean of any dummy is defined as the proportion of the sample scoring 1 on the dummy. Because 27.68 percent of the GSS sample are strong partisans, partisan has a mean equal to .2768.[17]) Using the logistic regression model to estimate the logged odds of voting for this group:

Logged odds (voting) = −2.354 + .224(0) + 1.522(.2768) = −1.933

So, the estimated logged odds of voting are equal to −1.933. What are the odds of voting for this group? Odds can be reclaimed by taking the exponent of the logged odds: Exp(−1.933), which is equal to .145. Thus the odds are .145. Next, convert .145 to a probability using the formula: Probability = Odds / (1 + Odds). Converting .145 to a probability of voting, we have .145/1.145 ≈ .13. Thus, the estimated probability of voting for individuals with no formal schooling and with average partisan strength is equal to .13. There is a very weak probability—barely better than one chance in ten—that these individuals voted.
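If you would rather let a computer do the twenty-one calculations, the short Python sketch below traces the same steps: logged odds to odds, odds to probability. It is a minimal illustration that assumes only the coefficient estimates reported above; the helper function is our own.

```python
import math

# Estimates from Table 9-8: intercept, education, partisan
a, b_educ, b_part = -2.354, 0.224, 1.522
partisan_mean = 0.2768   # proportion of strong partisans in the GSS sample

def probability(logged_odds):
    """Convert logged odds to odds, then odds to a probability."""
    odds = math.exp(logged_odds)
    return odds / (1 + odds)

# MEMs: probability of voting at each value of education,
# holding partisan constant at its sample mean.
for educ in range(21):
    p = probability(a + b_educ * educ + b_part * partisan_mean)
    print(educ, round(p, 3))
```

Up to rounding in the text, the loop reproduces the probabilities in Table 9-9, including the endpoints (.126 and .928) used in the full-effect calculation.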
What is the turnout probability of average partisans with 20 years of schooling? We would have:

Logged odds (voting) = −2.354 + .224(20) + 1.522(.2768) = 2.547

If the logged odds are 2.547, then the odds would be Exp(2.547), equal to 12.769. Finally, the estimated probability is: 12.769/13.769 ≈ .93. There are over nine chances in ten that these people voted.

Table 9-9 and Figure 9-2 present the estimated probabilities of voting at each value of education, calculated while holding partisan constant at its sample mean. Logistic regression’s nonlinear signature is clearly evident in Table 9-9 and Figure 9-2. At low levels of education, the probabilities are lackluster, and they increase at a slow rate of about .03 or .04 for each 1-year increment in education. Higher values of education show a version of the same thing—in this case, high probabilities and small marginal changes.

How else, besides these signature features, might we describe the relationship between the independent variable and the probability of the dependent variable? We could cite two additional facets: the switchover point and the full effect. The switchover point is defined by the interval of the independent variable in which the probability of the dependent variable changes from less than .5 to greater than .5. (In negative relationships, it changes from greater than .5 to less than .5.) In the current example, the switchover occurs between 8 and 9 years of education. It is here that the probability of voting increases from .465 to .521, an increase of .056, which is the largest marginal increase in the data. In fact, the largest marginal change in probabilities, sometimes called the “instantaneous effect,” always occurs at the switchover point.[18]

The full effect of the independent variable is determined by subtracting the probability associated with the lowest value of the independent variable from the probability associated with the highest value of the independent variable. Applied to the current example: .928 − .126 = .802. Thus, across its full range of values, and holding partisan strength constant at its sample mean, education changes the probability of voting by a healthy .802.

[Table 9-9 source: 2012 General Social Survey.]

Switchover points and full effects become especially valuable interpretive tools in comparing marginal effects at representative values (MERs). To illustrate MERs, we will estimate the probability of voting, by years of education—and we will do so separately for weak and strong partisans. Does the switchover occur at different points for the two partisan types? Does the full effect of education vary by strength of partisanship? These questions require the calculation of forty-two probabilities: twenty-one for weak partisans and twenty-one for strong partisans. Here again is the logistic regression model:

Logged odds (voting) = −2.354 + .224(education) + 1.522(partisan)

What is the estimated logged odds of voting for individuals with no formal schooling (education = 0) and weak partisan ties (partisan = 0)? It is −2.354 + .224(0) + 1.522(0) = −2.354.
The odds of voting equal Exp(−2.354) = .095, and the probability = .095/1.095 = .087. Thus, for individuals who lack both participatory resources, education and partisan strength, the probability of voting is extraordinarily low. How do these individuals compare with their strongly partisan counterparts, those for whom education = 0 and partisan = 1?

[Figure 9-2. Predicted Probabilities of Voting, by Years of Education. Source: 2012 General Social Survey. Note: Partisan strength held constant at .2768.]

The logged odds of voting are −2.354 + .224(0) + 1.522(1) = −.832. The odds of voting equal Exp(−.832) = .435, and the probability = .435/1.435 = .303. These two probabilities alone permit a remarkable comparison: At the lowest level of education, switching partisan from 0 to 1 boosts the probability of voting by .303 − .087 = .216. This is indeed a sizeable effect.

Table 9-10 and Figure 9-3 present the predicted probabilities for all combinations of education and partisan strength. (A short sketch below reproduces these calculations in code.) We can plainly see the different effects of education: Education plays a bigger role for weak partisans than for strong partisans. For weak partisans, the full effect of education is equal to .894 − .087 = .807. For strong partisans, .975 − .303 = .672. And notice the different switchover points. For individuals more weakly attached to parties, the probabilities build at a leisurely pace, not switching over until education reaches the interval between 10 years and 11 years of schooling. Strong partisans, by contrast, cross the threshold between 3 and 4 years of education. Indeed, a strong partisan with an associate’s degree (education = 14) is as likely to turn out as a weak partisan with a graduate education (education = 20): .909, compared with .894.

It is worth pointing out that MEMs and MERs are not mutually exclusive strategies for analyzing probabilities. Suppose we added another predictor, income, to the current turnout model. We could define a MEMs−MERs hybrid—for example, looking at the effect of income on the predicted probability of voting, separately for weak and strong partisans (MERs), holding education constant at its sample mean (MEMs). MEMs and MERs are two basic approaches; there are alternative ways of working with probabilities in logistic regression. (For a discussion of one alternative approach, average marginal effects, or AMEs, see Box 9-2.)

[Table 9-10 source: 2012 General Social Survey.]

[Figure 9-3. Predicted Probabilities of Voting, by Education and Partisan Strength. Source: 2012 General Social Survey.]
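Here is a companion sketch for the MERs analysis, again assuming only the reported coefficient estimates. It computes all forty-two probabilities and then extracts the full effect and the switchover interval for each partisan type; the variable names are ours.

```python
import math

a, b_educ, b_part = -2.354, 0.224, 1.522   # estimates from Table 9-8

def probability(logged_odds):
    odds = math.exp(logged_odds)
    return odds / (1 + odds)

# MERs: twenty-one probabilities for nonstrong partisans (partisan = 0)
# and twenty-one for strong partisans (partisan = 1).
for partisan in (0, 1):
    probs = [probability(a + b_educ * e + b_part * partisan) for e in range(21)]
    full_effect = probs[20] - probs[0]
    # Switchover: the first year of education at which the probability reaches .5.
    crossing = next(e for e in range(21) if probs[e] >= 0.5)
    print(f"partisan={partisan}: full effect={full_effect:.3f}, "
          f"switchover between {crossing - 1} and {crossing} years")
```

Up to rounding, the output matches the full effects (.807 and .672) and the switchover intervals (between 10 and 11 years for nonstrong partisans, between 3 and 4 years for strong partisans) discussed above.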
Box 9-2 Average Marginal Effects

As an alternative to MEMs, some researchers recommend retrieving average marginal effects (AMEs) for variables of interest. Consider an example used by methodologist Richard Williams. Suppose we wish to estimate the probability of being diabetic, separately for “average” whites and blacks. In the MEMs approach, we would reclaim two probabilities, one for whites and one for blacks, while holding all other predictors constant at their sample means. In the AMEs approach, we first would calculate a probability for each individual in the sample, under the assumption that each individual is white, and we would allow all other predictors to assume their observed values for each case. Second, we would recalculate the probabilities, this time under the assumption that each individual is black, and again we would allow all other predictors to assume observed values. For any individual, the difference between the two probabilities is the marginal effect (ME) of race for that case. The average of the MEs across all cases is the AME of race on the probability of diabetes. According to Williams, “With AMEs, you are in effect comparing two hypothetical populations—one all white, one all black—that have the exact same values on the other independent variables in the model. The logic is similar to that of a matching study, where subjects have identical values on every independent variable except one. Because the only difference between these two populations is their races, race must be the cause of the difference in their probabilities of having diabetes.”[19]
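Box 9-2’s recipe is easy to express in code. The sketch below is purely illustrative: the coefficients, the model, and the four cases are invented for the example and do not come from Williams’s diabetes analysis. What it shows is the AME logic itself, two counterfactual predictions per case, then an average of the case-by-case differences.

```python
import math

def probability(logged_odds):
    odds = math.exp(logged_odds)
    return odds / (1 + odds)

# Invented coefficients, purely to illustrate the recipe:
# logged odds (diabetic) = a + b_black*(black) + b_age*(age)
a, b_black, b_age = -4.0, 0.6, 0.05
ages = [25, 40, 55, 70]   # observed ages for four invented cases

# Step 1: predict for every case as if black = 1, keeping observed ages.
p_if_black = [probability(a + b_black + b_age * age) for age in ages]
# Step 2: predict for every case as if black = 0, keeping observed ages.
p_if_white = [probability(a + b_age * age) for age in ages]

# The AME of race is the average of the case-by-case differences.
ame = sum(b - w for b, w in zip(p_if_black, p_if_white)) / len(ages)
print(round(ame, 3))
```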
Summary

A political researcher wants to explain why some people approve of same-sex marriage, whereas others disapprove. Thinking that age plays a causal role, she hypothesizes that as age increases, the likelihood of disapproval will go up, that older people will be more likely than younger people to disapprove of same-sex marriage. Recognizing that women may be more liberal than men on this question, the researcher wants to isolate the effect of age, controlling for gender. Consulting her survey dataset, the researcher finds a binary variable that will serve as the dependent variable (respondents who approve of same-sex marriage are coded 0 on this variable, and those who disapprove are coded 1). She also finds an interval-level independent variable, age measured in years, from 18 to 89. She has a dummy variable labeled “female,” coded 0 for men and 1 for women. So she has the hypothesis, the data, and the variables. Now what? Which analytic technique is best suited to this research problem?

If this researcher is someone other than you, she may need to test her idea by collapsing age into three or four categories, retrieving the tool labeled cross-tabulation from her methods toolbox, and comparing the percentages of disapprovers across the collapsed categories of the independent variable for each value of the control. That might work okay. But what if she decides to control for the effects of several other variables that may shape individuals’ approval or disapproval of same-sex marriage—such as education, ideology, and partisanship? Cross-tabulation would become cumbersome to work with, and she may need to settle for an incomplete analysis of the relationships. The larger point, of course, is that this researcher’s ability to answer an interesting substantive question is limited by the tools at her disposal.

If this researcher is you, however, you now know a far better approach to the problem. Reach into your toolbox of techniques, select the tool labeled logistic regression, and estimate this model:

Logged odds (disapproval) = a + b1(age) + b2(female)

The logistic regression coefficient, b1, will tell you how much the logged odds of disapproval increase for each 1-year change in age, controlling for gender. Of course, logged odds are not easily grasped. But by entering the value of b1 into your handheld calculator and tapping the e^x key—or, better still, by examining the Exp(b) values in the computer output—you can find the odds ratio, the change in the odds of disapproving as age increases by 1 year. You can convert Exp(b) into a percentage change in the odds of disapproval. You can test the null hypothesis that b1 is equal to 0 by consulting the P-value of the z statistic or Wald statistic. You can see how well the model performs by examining changes in the magnitude of −2LL and reviewing the accompanying chi-square test. Several R-square−like measures, such as McFadden, Cox−Snell, and Nagelkerke, will give you a general idea of how completely age and gender account for the likelihood of disapproving of same-sex marriage. You can calculate and examine the predicted probabilities of disapproval at each value of age, holding female constant at its sample mean. Perhaps more meaningfully, you could see whether age works differently for women and men when it comes to disapproval of same-sex marriage. If you are challenged by a skeptic who thinks you should have controlled for education and partisanship, you can reanalyze your model, controlling for these variables—and any other independent variables that might affect your results. By adding logistic regression to your arsenal of research techniques, you are now well prepared to handle any research question that interests you.

Take a closer look: edge.sagepub.com/pollock

Key Terms

average marginal effects (p. 238)
binary variable (p. 216)
common logarithms (p. 220)
likelihood function (p. 227)
logged likelihood (p. 228)
logits (p. 220)
marginal effects at representative values (p. 233)
marginal effects at the means (p. 233)
maximum likelihood estimation (p. 226)
natural logarithms (p. 220)
odds (p. 218)
odds ratio (p. 219)
percentage change in the odds (p. 220)

Exercises[20]

1. In this exercise, you will interpret the results of a logistic regression analysis of the relationship between gun control opinions and party identification. The binary dependent variable is coded 1 for pro-control opinions and 0 for anti-control opinions. The independent variable, party identification, is a 7-point scale, ranging from 0 (strong Republicans) to 6 (strong Democrats).

A. The logistic regression coefficient tells us that, for each one-unit increase in the party identification scale, the logged odds of a pro-control opinion increases by .506. Turn your attention to the odds ratio, Exp(b). Recall that higher values of party identification are more Democratic and lower values are less Democratic. This coefficient says that an individual at one value of party identification is ______________ times as likely to be pro-control as an individual at the next-lower value of party identification.

B. Use the value of Exp(b) to compute a percentage change in the odds. According to your calculations, each unit increase in party identification, from less Democratic to more Democratic, increases the odds of being pro-control by how much?

C. (i) State the null hypothesis for this relationship. (ii) What is your inferential decision—reject the null hypothesis or do not reject the null hypothesis? (iii) Explain your answer.

2. Here is an extension of the analysis in Exercise 1. The following analysis adds gender as a second independent variable.
The model includes “female,” a dummy coded 1 for females and 0 for males. Parts A−D present interpretations based on these results. For each part, (i) state whether the interpretation is correct or incorrect, and (ii) explain why the interpretation is correct or incorrect. For incorrect interpretations, be sure that your response in (ii) includes the correct interpretation.

A. Interpretation One: If we control for gender, each one-unit increase in the party identification scale increases the likelihood of pro-control opinion by 50.3 percent.

B. Interpretation Two: If we control for party identification, females are almost twice as likely to be pro-control as are men.

C. Interpretation Three: Compared with how well the model performs without including measures of party identification and gender, inclusion of both of these independent variables provides a statistically significant improvement.

D. Interpretation Four: Party identification and gender together explain between 15.1 percent and 25.3 percent of the variance in the likelihood of a pro-control opinion.

3. The following table reports the predicted probabilities of a pro-control opinion, by party identification, for females and males.

A. (i) For females, the switchover point occurs between which two values of party identification? (ii) For males, the switchover point occurs between which two values of party identification?

Parts B and C present interpretations based on these results. For each part, (i) state whether the interpretation is correct or incorrect, and (ii) explain why the interpretation is correct or incorrect. For incorrect interpretations, be sure that your response in (ii) includes the correct interpretation.

B. Interpretation One: For both women and men, the full effect of partisanship on the probability of a pro-control opinion is greater than .6.

C. Interpretation Two: Strong Republicans have a larger “gender gap” on gun control opinions than do Strong Democrats.

Notes

1. Methodologists have developed several techniques that may be used to analyze binary dependent variables. One popular technique, probit analysis, is based on somewhat different assumptions than logistic regression, but it produces similar results. Logistic regression, also called logit analysis or logit regression, is computationally more tractable than probit analysis and thus is the sole focus of this chapter. For a lucid discussion of the general family of techniques to which logistic regression and probit analysis belong, see Tim Futing Liao, Interpreting Probability Models: Logit, Probit, and Other Generalized Linear Models (Thousand Oaks, Calif.: SAGE Publications, 1994).

2. There are two statistically based problems with using OLS on a binary dependent variable, both of which arise from having only two possible values for the dependent variable. OLS regression assumes that its prediction errors, the differences between the predicted values of y and the actual values of y, follow a normal distribution. The prediction errors for a binary variable, however, follow a binomial distribution. More seriously, OLS also assumes homoscedasticity of these errors, that is, that the prediction errors are the same for all values of x. With a binary dependent variable, this assumption does not hold up. An accessible discussion of these problems may be found in Fred C. Pampel, Logistic Regression: A Primer (Thousand Oaks, Calif.: SAGE Publications, 2000), 3–10.
3. Because of this natural log transformation of the dependent variable, many researchers use the terms logit regression or logit analysis instead of logistic regression. Others make a distinction between logit analysis (used to describe a situation in which the independent variables are not continuous but categorical) and logistic regression (used to describe a situation in which the independent variables are continuous or a mix of continuous and categorical). To avoid confusion, we use logistic regression to describe any situation in which the dependent variable is the natural log of the odds of a binary variable.

4. Logistic regression will fit an S-shaped curve to the relationship between an interval-level independent variable and the probability of a dependent variable, but it need not be the same S-shaped pattern shown in Figure 9-1. For example, the technique may produce estimates that trace a “lazy S,” with probabilities rising in a slow, nearly linear pattern across values of the independent variable. Or perhaps the relationship is closer to an “upright S,” with probabilities changing little across the high and low ranges of the independent variable but increasing rapidly in the middle ranges.

5. If the logistic regression coefficient, b̂, were equal to 0, then the odds ratio, Exp(b̂), would be Exp(0), or e^0, which is equal to 1.

6. Computer output also will report a standard error and test of significance for the intercept, â. This would permit the researcher to test the hypothesis that the intercept is significantly different from 0. So if we wanted to test the null hypothesis that the logged odds of voting for individuals with no formal education (who have a value of 0 on the independent variable) were equal to 0—that is, that the odds of voting for this group were equal to 1—we would use the standard error of the intercept. Much of the time such a test has no practical meaning, and so these statistics have been omitted from Table 9-3.

7. The Wald statistic (named for statistician Abraham Wald) divides the regression coefficient by its standard error and then squares the result. The value of Wald follows a chi-square distribution with degrees of freedom equal to 1.

8. The estimation procedure used by logistic regression is not aimed at minimizing the sum of the squared deviations between the estimated values of y and the observed values of y. So the conventional interpretation of R-square, the percentage of the variation in the dependent variable that is explained by the independent variable(s), does not apply when the dependent variable is binary.

9. The likelihood function = Π [Pi^yi × (1 − Pi)^(1 − yi)]. The expression inside the brackets says, for each individual case, to raise the model’s predicted probability (P) to the power of y, and to multiply that number by the quantity 1 minus the predicted probability raised to the power of 1 − y. The symbol Π tells us to multiply all these individual results together. The formula is not as intimidating as it looks. When y equals 1, the formula simplifies to (P × 1), since P raised to y equals P, and (1 − P) raised to 1 − y equals 1. Similarly, when y equals 0, the formula simplifies to 1 − P.
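Note 9’s formula can be checked with a few lines of Python. The predicted probabilities and outcomes below are invented solely for illustration:

```python
import math

def likelihood(probs, ys):
    """Note 9's formula: the product of P**y * (1 - P)**(1 - y) across cases."""
    result = 1.0
    for p, y in zip(probs, ys):
        result *= (p ** y) * ((1 - p) ** (1 - y))
    return result

# Invented predicted probabilities and observed outcomes for five cases:
probs = [0.9, 0.8, 0.3, 0.7, 0.2]
ys = [1, 1, 0, 1, 0]

L = likelihood(probs, ys)
print(round(L, 4))                 # the likelihood
print(round(math.log(L), 4))       # the logged likelihood, LL
print(round(-2 * math.log(L), 4))  # -2LL, the quantity used in the chi-square test
```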
10. A likelihood model that uses the independent variable(s) to generate predicted probabilities is called the full model or complete model. In making statistical comparisons between models, some computer programs work with the log of the likelihood ratio, denoted ln(L1/L2), in which L1 is the likelihood of Model 1 (the reduced model) and L2 is the likelihood of Model 2 (the complete model). Taking the log of the likelihood ratio is equivalent to subtracting the logged likelihood of Model 2 from the logged likelihood of Model 1: ln(L1/L2) = ln(L1) − ln(L2).

11. Degrees of freedom is equal to the difference between the number of independent variables included in the models being compared. Since Model 2 has one independent variable and Model 1 has no independent variables, degrees of freedom is equal to 1 for this example. We can, of course, test the null hypothesis the old-fashioned way, by consulting a chi-square table. The critical value of chi-square, at the .05 level with 1 degree of freedom, is equal to 3.84. Since the change in −2LL, which is equal to 3.98, exceeds the critical value, we can reject the null hypothesis.

12. Logged likelihoods can be confusing. Remember that likelihoods vary between 0 (the model’s predictions do not fit the data at all) and 1 (the model’s predictions fit the data perfectly). This means that the logs of likelihoods can range from very large negative numbers (any likelihood of less than 1 has a negatively signed log) to 0 (any likelihood equal to 1 has a log equal to 0). So if Model 2 had a likelihood of 1—that is, if it perfectly predicted voter turnout—then it would have a logged likelihood of 0. In this case, the conceptual formula for R-square would return a value of 1.0.

13. Cox−Snell R-square and Nagelkerke’s R-square are included in SPSS logistic regression output. Another measure, popular among political researchers, is Aldrich and Nelson’s pseudo R-square: (Change in −2LL) / (Change in −2LL + N), in which N is the sample size. Menard has proposed yet another measure, based on the correlation between the logistic regression’s predicted probabilities of y and the actual values of y. See John H. Aldrich and Forrest D. Nelson, Linear Probability, Logit, and Probit Models (Thousand Oaks, Calif.: SAGE Publications, 1984), 54–58; and Scott Menard, Applied Logistic Regression Analysis, 2nd ed. (Thousand Oaks, Calif.: SAGE Publications, 2002), 24–27. For a comparison of the various pseudo-R-square measures, see Thomas J. Smith and Cornelius M. McKenna, “A Comparison of Logistic Regression Pseudo R² Indices,” Multiple Linear Regression Viewpoints 39, no. 2 (2013): 17–26.

14. The logged likelihoods for the reduced model and the complete model are not shown in Table 9-8. Rather, only the chi-square test statistic of interest, the change in −2LL, is reported. Note that, because no independent variables are included in the baseline model and two independent variables are included in the complete model, there are 2 degrees of freedom for the chi-square test.
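For readers who want notes 12 and 13 in one place, here is a sketch of the common pseudo-R-square formulas (McFadden, Cox−Snell, Nagelkerke, and Aldrich−Nelson), computed from the two logged likelihoods. The function is our own, and the example values are invented.

```python
import math

def pseudo_r_squares(ll_null, ll_full, n):
    """Pseudo-R-square measures built from two logged likelihoods.

    ll_null: LL of the know-nothing (intercept-only) model
    ll_full: LL of the complete model
    n: sample size
    """
    mcfadden = 1 - (ll_full / ll_null)
    cox_snell = 1 - math.exp((2 / n) * (ll_null - ll_full))
    cox_snell_ceiling = 1 - math.exp((2 / n) * ll_null)  # its maximum, < 1
    nagelkerke = cox_snell / cox_snell_ceiling
    change_in_neg2ll = 2 * (ll_full - ll_null)
    aldrich_nelson = change_in_neg2ll / (change_in_neg2ll + n)
    return mcfadden, cox_snell, nagelkerke, aldrich_nelson

# Invented logged likelihoods, for illustration only:
print(pseudo_r_squares(ll_null=-1000.0, ll_full=-870.0, n=1500))
```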
15. On the complexities of interaction effects in logistic regression, see UCLA: Statistical Consulting Group, “Deciphering Interactions in Logistic Regression,” accessed April 1, 2015, http://www.ats.ucla.edu/stat/stata/seminars/interaction_sem/interaction_sem.htm. The UCLA authors offer this pithy yet vaguely discouraging quote of the day: “Departures from additivity imply the presence of interaction types, but additivity does not imply the absence of interaction types.” The quote is attributed to Kenneth J. Rothman and Sander Greenland, Modern Epidemiology, 2nd ed. (Philadelphia: Lippincott Williams and Wilkins, 1998).

16. This section adopts the terminology used by Richard Williams in “Using the Margins Command to Estimate and Interpret Adjusted Predictions and Marginal Effects,” Stata Journal 12, no. 2 (2012): 308–331.

17. It may seem odd to consider a 0−1 variable as having a mean or average in the conventional sense of the term. After all, respondents are coded either 0 or 1. There is no respondent who is coded “.2768” on partisan strength. However, means can also be thought of as random probabilities, termed expected values by statisticians. If you were to pick any respondent at random from the General Social Survey dataset, what is the probability that the case you chose would be a strong partisan? The answer is .2768, the expected value, or mean value, of this variable.

18. See Pampel, Logistic Regression: A Primer, 24–26.

19. See Williams, “Using the Margins Command to Estimate and Interpret Adjusted Predictions and Marginal Effects,” Stata Journal, 326.

20. Data used in these exercises are from the 2014 Pew Political Typology/Polarization Survey (N = 10,003), conducted by the Pew Research Center, 1615 L Street, NW, Suite 700, Washington, D.C. 20036. See Michael Dimock, Carroll Doherty, Jocelyn Kiley, and Russ Oates, “Political Polarization in the American Public: How Increasing Ideological Uniformity and Partisan Antipathy Affect Politics, Compromise and Everyday Life,” published June 12, 2014, http://www.people-press.org/2014/06/12/political-polarization-in-the-american-public/. The data are publicly available from http://www.people-press.org/category/datasets/?download=20057011.

10 Thinking Empirically, Thinking Probabilistically

This book has covered only the basics—the essential skills you need to understand political research and to perform your own analysis. Even so, we have discussed a wide range of topics and methodological issues. The first five chapters dealt with the foundations of political analysis: defining and measuring concepts, describing variables, framing hypotheses and making comparisons, designing research, and controlling for rival explanations. In the last four chapters we considered the role of statistics: making inferences, gauging the strength of relationships, performing linear regression analysis, and interpreting logistic regression. As you read research articles in political science, discuss and debate political topics, or evaluate the finer points of someone’s research procedure, the basic knowledge imparted in this book will serve you well.

This book has also tried to convey a larger vision of the enterprise of political analysis. Political scientists seek to establish new facts about the world, to provide rich descriptions and accurate measurements of political phenomena. Political scientists also wish to explain political events and relationships. In pursuit of these goals, researchers learn to adopt a scientific mindset toward their work, a scientific approach to the twin challenges of describing and explaining political variables.
As you perform your own political analysis, you too are encouraged to adopt this way of thinking. Here are two recommendations. First, in describing new facts, try to think empirically. Try to visualize how you would measure the phenomenon you are discussing and describing. Be open to new ideas, but insist on empirical rigor. Political science, like all science, is based on empirical evidence. This evidence must be described and measured in such a way that others can do what you did and obtain the same results. Second, in proposing and testing explanations, try to think probabilistically. You are well aware of one reason that political researchers must rely on probabilities: Random samples are a fact of life for much political science. Another reason is that political science deals with human behavior and human events, and so it is an inexact science. Let’s briefly illustrate why it is important to think empirically. Let’s also look at the reasons political scientists must think probabilistically.

10.1 Thinking Empirically

The main projects of political science are to describe concepts and to analyze the relationships between them. But potentially interesting relationships are often obscured by vague, conceptual language. During a class meeting of a voting and elections course, for example, students were discussing the electoral dynamics of ballot initiatives, law-making vehicles used frequently in several states. Controversial proposals, such as denying state benefits to undocumented immigrants or banning same-sex marriage, may appear on the ballot to be decided directly by voters. Near the end of the discussion, one student observed: “It appears to me that most ballot initiatives target minorities. Most ballot initiatives, if they pass, decrease equality. Very few seem designed with egalitarian principles in mind.” Now, this is an interesting, imaginative statement. Is it true? Without conceptual clarification, there is no way to tell.

Upon hearing statements such as the one made by this student, you have learned to insist that conceptual terms, like equality or egalitarian principles, be described in concrete language. How would one distinguish an egalitarian ballot initiative, an initiative that increases equality, from one that is not egalitarian, an initiative that decreases equality? Pressed to clarify her conceptual terms, the student settled on one defining characteristic of the degree of egalitarianism in a ballot initiative. The concept of egalitarianism, she said, is defined as the extent to which a ballot initiative would confer new legal rights on an identifiable group. So some initiatives, such as those that would confer protections on commercial farm animals, could be classified as more egalitarian, whereas others, like those making English the state’s official language, could be classified as less egalitarian. With a workable measurement strategy in hand, this student could then turn to an empirical assessment of her in-class claim. Using clearly defined terms and reproducible findings, this student’s research would enhance our understanding of these vehicles of direct democracy.[1]

An openness that is tempered by skepticism nurtures knowledge of the political world. This means that political scientists must sometimes revisit relationships and rethink established explanations. The search for truth is an ongoing process. Consider another example. For years, scholars of U.S.
electoral behavior measured voter turnout in presidential elections by dividing the number of people who voted by the size of the voting-age population. So, if one wanted to describe trends in turnout, one would calculate the percentage of the voting-age population who voted in each year and then track the numbers over time. Indeed, measured in this way, turnout declined steadily after 1960, a high-water mark of the twentieth century. From the 1970s through the 1990s, the relatively low turnout in presidential elections became one of the most heavily researched phenomena in U.S. politics, and it fostered a search for explanatory factors. Some explanations have linked declining turnout to attitudinal variables, such as a weakened attachment to the political parties or an erosion of trust in government.

In 2001, research by Michael P. McDonald and Samuel Popkin pointed to a potentially serious measurement problem with using the voting-age population as the basis for calculating turnout.[2] They showed that, by using the voting-age population, researchers had been including large groups of ineligible people, such as felons or noncitizens. What is more, the percentage of noncitizens among the voting-age population went from 2 percent in 1966 to over 8 percent in 2014. Once McDonald and Popkin adjusted for these and other measurement errors, they showed that although turnout dropped after 1972 (when the eligible electorate was expanded to include eighteen-year-olds), there was no downward trend. With this more precise measurement strategy in hand, political scientists have turned their attention toward the explanation of dramatic increases in electoral turnout in the new millennium.[3]

10.2 Thinking Probabilistically

Statistics occupies a position of central importance in the research process. A discussion of random error has come up in several topics in this book—measurement, sampling, hypothesis testing, and statistical significance. We have seen that the accuracy of measurement...