Binary Logistic Regression: Week 5
Program Transcript
MATTHEW JONES: This week, we're going to continue with logistic regression,
except now we're going to add more predictor variables. That is, we're going to
be building a multiple logistic regression model. Last week, we looked at the effect
of gender on fear of asking for help. This week, we're going to add one more
variable-- age. That is, we're going to examine the effect of gender on fear of
asking for help while controlling for age, and vice versa. Let's go to SPSS to see
how we do this.
To conduct our binary logistic regression, we're going to go up here to Analyze
and Regression, just as you're familiar with in fitting ordinary least squares
models, except now we're going to move over to Binary Logistic Regression. So
our dependent variable is fear of asking for help. So remember, we have two
variables in here about fear of asking for help.
We have the original variable, fear of asking for help at time point 1-- and
actually, there is a time point 2 and a time point 3 as well-- but also, this fear of asking
for help dichotomous variable. So remember, in binary logistic regression, the
outcome variable is binary.
So we move that over into our dependent variable. And then our covariates are
going to be gender and student age. Now, you might remember from ordinary
least squares regression, whenever you have a categorical variable-- and gender
is a categorical variable here-- you have to dummy code that. It was already
dummy coded in this data set as 0 and 1, but there's a handy little feature in
SPSS that helps you with dummy coding just in case you have a variable that
has more than two attributes to it, maybe something like political party affiliation
or race or ethnicity.
But we can go up here to Categorical. And we move gender into there. The
interesting thing about the categorical covariate box in logistic regression is it lets
you build reference categories. So it will go ahead and take care of much of that
for you. So you just have to tell it whether you want to reference first or last as
the reference category. So here by default, it's listed as last.
So my variable of gender is coded as 0 for male and 1 for female. So it's going to
say, all right, the reference category is the last-- that is, the highest value, 1. So 1
is higher than 0, and 1 is coded as female. So when I get a coefficient, that is
always going to be compared to females. Click Continue.
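The reference-category idea can be sketched outside SPSS. Here is a minimal, hypothetical illustration using a made-up gender vector with the same coding described above (0 = male, 1 = female):

```python
# Hypothetical coding mirroring the data set: 0 = male, 1 = female.
gender = [0, 1, 1, 0]

# Choosing the LAST category (female = 1) as the reference leaves an
# indicator column for male; the fitted coefficient then compares
# males against females, which is what SPSS reports here.
male_indicator = [1 if g == 0 else 0 for g in gender]
print(male_indicator)  # [1, 0, 0, 1]
```

Flipping the reference to first (male = 0) would instead leave an indicator for female, and the coefficient's sign would flip accordingly.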
And one thing I do want to mention under Options is you have this Hosmer-Lemeshow
goodness-of-fit test. This was a pretty popular test that's been
used to assess how well your logistic regression model fits. Currently, Hosmer no
longer recommends the use of this test, so I'm not going to go over it here. But
you might want to talk to your instructor if he or she has a preference about that
test.
So we're going to click OK, and that will give us our output. And we get quite a bit
of output in the logistic regression model. Here we see we have some case
processing summary. It's quite handy that we have no missing cases here.
Again, this is a simulated data set.
But a couple of things that I really want to pay attention to before I start
interpreting output are the codings-- so first of all, my dependent variable coding.
I see here that yes-- that is, yes, I'm afraid of asking for help-- is coded as 1, and
no is coded as 0. So yes equals 1, so the model is predicting fear of asking for
help.
Also, here's that categorical variable coding. So it gives you some parameter
coding. So it's telling me male equals 1. So just as I mentioned in the categorical
variable box, whenever we see that coefficient and we're interpreting the values
for the gender variable, it's going to be for males, again, compared to females.
So the first block we get here is the beginning block, block 0. This is sometimes
called the null model or the intercept only model. So it gives me a little bit of
information. Some people just go past it rather quickly. But depending on what
you want to find out, you might want to spend a little bit more time with it.
Let me quickly show you where some of these numbers come from. So you see
this percentage correct here, 57%. That comes from taking 191 and dividing it by
335, which is the total sample: 191 plus 144 equals 335. So 57% had no fear,
and 43% had fear.
So with no other information-- so again, the null model, no other information-- the
best strategy is to predict students will have no fear. Indeed, we can see just
looking at raw counts, those counts are indeed higher.
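Although the analysis here is done in SPSS, the arithmetic behind that baseline figure is easy to check by hand. A quick sketch using the counts read off this output:

```python
# Where the null model's "percentage correct" comes from: with no
# predictors, predict the modal category for everyone. Counts are the
# ones read off the transcript's output (191 "no fear", 144 "fear").
no_fear, fear = 191, 144
n = no_fear + fear                        # 335, the total sample
baseline_accuracy = no_fear / n
print(round(baseline_accuracy * 100, 1))  # 57.0
```

Any model with predictors should beat this baseline to be worth keeping.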
Here we have the variables in the equation, again, for the null model. This little
exp(B) refers to the odds ratio. So it's predicting the odds of having fear.
Sometimes this is interpreted. Sometimes it's not. It just depends on what you
are interested in here.
The next piece of output gives us the variables not in the equation. So this is
gender and age. These are the variables we're interested in. And already, we can
see there's going to be some significance there. So let's go ahead and scroll
down to block 1, which is going to be our full model.
So I'm looking here at model summary. So in an ordinary least squares
regression, you're probably used to interpreting the r square and the adjusted r
square. You really don't have that in a logistic regression model. It gives you a
model summary, and SPSS defaults to this thing called the -2 log likelihood,
along with the Cox and Snell r square and the Nagelkerke r square. So I'm just
going to focus on those two that say r square.
They're technically not r square, and so we often refer to them as pseudo r
square. So sometimes you'll see that in the literature when you're reading
results: the pseudo r square. I find the easiest one to interpret is the Nagelkerke,
simply because it has a possible range of values from 0 to 1, where Cox and
Snell tops out below 1-- around 0.75 here.
So it's easier, I think, for most people to interpret 5 out of 100 rather than 3 out
of 75. So I can treat this like my adjusted r square in ordinary least squares
regression, where I'm turning this into a percentage and saying, OK, my
predictor variables account for 5.1% of what I'm observing in my outcome
variable. So nothing too exciting there-- it's not a large pseudo r square. But
nonetheless, there is something there. There is some predictive value.
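If you want to see how those two pseudo r squares are built, here is a hedged sketch. The log-likelihood values below are hypothetical-- they are not shown in this output-- and were picked only so the results echo the roughly 0.038 Cox and Snell and 0.051 Nagelkerke discussed here; the formulas themselves are the standard ones:

```python
import math

n = 335
ll_null, ll_model = -229.0, -222.5  # hypothetical log-likelihoods

# Cox and Snell r square compares null and fitted likelihoods...
cox_snell = 1 - math.exp((2 / n) * (ll_null - ll_model))
# ...but it cannot reach 1; its ceiling depends only on the null model.
ceiling = 1 - math.exp((2 / n) * ll_null)
# Nagelkerke rescales Cox and Snell so the range runs from 0 to 1.
nagelkerke = cox_snell / ceiling
print(round(cox_snell, 3), round(nagelkerke, 3))  # 0.038 0.051
```

This is why Nagelkerke is always a bit larger than Cox and Snell for the same model.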
So now I'm moving down to my second classification table. Now, we saw the
classification table above, but this is including our predictor variables. So now we
see the percentage correct is a little higher, 61.2%. So we get a little bit more
information.
So here's the meat and potatoes, if you will, of a logistic regression analysis, the
variables in the equation. So we have a beta, just like we would in an ordinary
least squares regression. But that's interpreted in a rather different fashion.
Because in logistic regression, you're talking about log odds. So it's not as easy
to say as you would in ordinary least squares regression, for every one-unit
increase in x, y is going to increase this much, because you have to talk in units
of log odds, which the vast majority of people probably find quite confusing.
So we tend to focus on the exponent of that, which is an odds ratio. And that's
what we talked about a little bit last week in our Excel template. So here, when
we're looking at gender, the first thing we explore is whether it's statistically
significant or not. At 0.03, it is below the conventional threshold of 0.05, so we
can judge it statistically significant.
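The log-odds-to-odds-ratio conversion is just exponentiation. In this sketch the beta is a hypothetical value, chosen so it lands on the 0.599 odds ratio for males that comes up in this output:

```python
import math

# exp(B) is the coefficient exponentiated off the log-odds scale.
beta = -0.5125               # log odds: hard to interpret directly
odds_ratio = math.exp(beta)  # odds ratio: the exp(B) column in SPSS
print(round(odds_ratio, 3))  # 0.599
```

A negative beta always produces an odds ratio below 1, and a positive beta one above 1, which is the pattern described in the next paragraphs.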
So what we can say about men-- because remember, we're talking about men
here; they're parameter coded as 1. Females were originally coded as 1 in the
data, but now females are the reference category.
So the odds of being fearful of asking for help are lower for men. And here's a
really important part in how you interpret it-- compared to females. Because
remember, it's that ratio.
Again, as we discussed last week, when you have a negative beta here, your
exponent-- your odds ratio-- is going to be below 1.0, with 1.0 meaning the odds
are equal. Sometimes it's easier to invert that, so we might want to take the
reciprocal: 1 divided by 0.599 gives us 1.67. So we might want to interpret that in
terms of the females.
What this means is the odds of females being fearful of asking for help are 1.67
times higher compared to men controlling for age. Or we could just simply keep it
and interpret it as the 0.599 for males. Again, it's up to you what makes most
sense to you and your audience.
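Taking that reciprocal is simple arithmetic-- it re-expresses the same comparison from the other group's point of view:

```python
# An odds ratio below 1 can be inverted to describe the other group:
# 0.599 (men vs. women) becomes about 1.67 (women vs. men).
or_males = 0.599
or_females = 1 / or_males
print(round(or_females, 2))  # 1.67
```

Both numbers describe one and the same effect; only the direction of the comparison changes.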
You could always go back through the procedure and change the parameter
coding in your categorical box and use males as a reference category. And you
would get that 1.67 odds ratio in your output instead. So looking at our age, you
see that is also statistically significant, 0.003, well below the 0.05 threshold. We
have a positive beta here. Our odds ratio is really not that exciting. You see it's
pretty close to 1, but nonetheless slightly higher than 1 and statistically
significant.
So what we can say about that is for every one-unit increase in age compared to
the previous age-- so remember how we're interpreting ratios-- the odds of fear of
asking for help are multiplied by 1.038, controlling for gender. So again, remember,
we are still doing a type of multiple regression, where we're controlling for the
combined effects of our independent variables.
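Because that odds ratio applies per one-year increase, it compounds multiplicatively across larger age gaps. A quick illustration:

```python
# The 1.038 odds ratio for age is per one-year increase; across a
# k-year gap the odds multiplier is 1.038 raised to the k-th power.
or_per_year = 1.038
or_ten_years = or_per_year ** 10  # odds multiplier across a 10-year gap
print(round(or_ten_years, 2))  # 1.45
```

So an effect that looks "pretty close to 1" per year can still add up to a noticeable difference between, say, a 25-year-old and a 35-year-old.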
So here for the odds ratio for gender, again, if we're having some difficulty
interpreting this odds ratio below 1, or we really wanted to interpret females
compared to males, we could just simply go back and change our parameter
coding. So we go back in to Analyze, Regression, Binary Logistic. And everything
is still in there. So we just go back to Categorical.
And what we do here is highlight this. So again, remember, by default, it said
last, so a reference category last. This is the highest coded value or attribute for
that variable. And coded in the data, 1 equals female, so it's saying, OK, female
is going to be your reference.
Maybe we want it to be male. So then we just simply go to first. Important step
here is don't forget to hit Change. Because it will let you hit Continue, and it will
actually never change it until you hit Change. So don't forget that. And it tells you
right here up in categorical covariates.
OK, now First is here in parentheses. Click Continue, and let's click OK. Now we
should get an odds ratio that is 1 divided by 0.599. And indeed, we do: there
we have gender, 1.67. So maybe that might be a little easier to interpret.
Additional Content Attribution
FOOTAGE:
GettyLicense_160463144
simonkr/Creatas Video/Getty Images
GettyLicense_626791754
AzmanL/Creatas Video/Getty Images
GettyLicense_114759820
David Baumber/Vetta/Getty Images
© 2017 Laureate Education, Inc.