
I just need edits.

Chapter 7

My professor gave me some points to fix. He said:

I reviewed your most recent draft of Chapter 7 Exercises (multiple regression).


For question 2a, check the “id” variable to be sure you have the correct case numbers that need to be removed. There are three cases that should be removed that you did not list. Also, 18 and 1129 will automatically be removed since there is missing data for these cases (no MAH_1 value).


Some of your values are different than what I have. Be sure you run the multiple regression with the profile-b data set and only with cases that are MAH_1 ≤ 22.458.


For question 2h, you should not include the values for the variables that are not statistically significant. The regression equation should only include those variables that are statistically significant.


For 2i, also mention that these two variables are not statistically significant and provide the p (sig) value.


1. The following output was generated from conducting a forward multiple
regression to identify which IVs (urban, birthrat, lnphone, and lnradio) predict
lngdp. The data analyzed were from the SPSS country-a.sav data file.
a. Evaluate the tolerance statistics. Is multicollinearity a problem?
To evaluate the presence of multicollinearity, we can examine the tolerance statistics, calculated as 1 − R², where R² comes from regressing each independent variable on the remaining independent variables. A small tolerance indicates that the variable is almost a perfect linear combination of the other independent variables already in the equation; a value of .1 usually serves as the cutoff point. Looking at the table, we can see that multicollinearity is not a problem because the tolerance statistics are greater than .1 for all the independent variables in both specifications.
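As an illustrative cross-check of what SPSS's tolerance column reports, the sketch below computes tolerance = 1 − R² directly by regressing each IV on the remaining IVs. The data are synthetic (not the country-a.sav variables), so the numbers are for illustration only:

```python
import numpy as np

def tolerances(X):
    """For each column j of X, regress it on the remaining columns
    (with an intercept) and return tolerance_j = 1 - R^2_j."""
    n, k = X.shape
    tol = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # design matrix with intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid.var() / y.var()
        tol.append(1 - r2)
    return tol

rng = np.random.default_rng(0)
# Synthetic IVs: x2 is strongly correlated with x1, x3 is independent.
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

for name, t in zip(["x1", "x2", "x3"], tolerances(X)):
    print(f"{name}: tolerance = {t:.3f}")  # values below .1 would flag multicollinearity
```

The independent variable x3 keeps a tolerance near 1, while the correlated pair x1/x2 show lower (but still acceptable) tolerances, mirroring how the SPSS cutoff is applied.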
b. What variables create the model to predict lngdp? What statistics support your
response?
The model summary output indicates that the variables retained by the forward multiple regression are lnphone (in the first step) and lnphone + birthrat (in the second step).
Looking at the p-values, both coefficients are statistically significant in explaining the variation in lngdp: lnphone at the 1% significance level and birthrat at the 5% level. However, despite its significance, the coefficient of birthrat is small in magnitude, and the R Square change between the regression with only lnphone and the one that adds birthrat is only .004. This suggests that the explanatory power of birthrat is modest.
c. Is the model significant in predicting lngdp? Explain.
Regression results indicate that the overall model with two predictors (lnphone and birthrat) significantly predicts lngdp: R Square = .890 and Adjusted R Square = .888.
d. What percentage of variance in lngdp is explained by the model?
The model accounted for 89% of the variance in lngdp, as given by the R Square value.
e. Write the regression equation for lngdp.
lngdp = 6.878 + .663*(lnphone) - .013*(birthrat)
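As a sketch of how the fitted equation is used, the snippet below plugs hypothetical values into the coefficients from the output (the example country and its input values are invented for illustration):

```python
import math

def predict_lngdp(lnphone, birthrat):
    """Fitted forward-regression equation from 1e (coefficients from the output)."""
    return 6.878 + 0.663 * lnphone - 0.013 * birthrat

# Hypothetical country: 150 phones per 1,000 people, birth rate 20 per 1,000.
lnphone = math.log(150)
lngdp_hat = predict_lngdp(lnphone, 20.0)
print(f"predicted lngdp = {lngdp_hat:.3f}, predicted GDP = {math.exp(lngdp_hat):,.0f}")
```

Because the DV is log-transformed, the prediction must be exponentiated to recover GDP on its original scale.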
2. This question utilizes the data sets profile-a.sav and profile-b.sav,
You are interested in examining whether the variables shown here in brackets
[years of age (age), hours worked per week (hrs 1), years of education (educ), years
of education for mother (maeduc), and years of education for father (paeduc)] are
predictors of individual income (rincmdol). Complete the following steps to conduct
this analysis.
a. Using profile-a.sav, conduct a preliminary regression to calculate Mahalanobis
distance. Identify the critical value for chi-square. Conduct Explore to identify outliers.
Which cases should be removed from further analysis?
In order to calculate Mahalanobis distance, I conducted a preliminary regression with all of the IVs entered.
Model Summary(b)
Model 1: R = .580(a), R Square = .336, Adjusted R Square = .331, Std. Error of the Estimate = 4.345
a. Predictors: (Constant), Highest Year of School Completed, Father; Number of Hours Worked Last Week; Age of Respondent; Highest Year of School Completed; Highest Year of School Completed, Mother
b. Dependent Variable: RESPONDENTS INCOME
The model summary reports the overall statistics of the regression in which all of the IVs were included in the model.
ANOVA(a)
              Sum of Squares    df    Mean Square       F      Sig.
Regression         6136.473      5      1227.295    64.994    .000(b)
Residual          12123.027    642        18.883
Total             18259.500    647
a. Dependent Variable: RESPONDENTS INCOME
b. Predictors: (Constant), Highest Year of School Completed, Father; Number of Hours Worked Last Week; Age of Respondent; Highest Year of School Completed; Highest Year of School Completed, Mother
The ANOVA table indicates that the model significantly predicts the dependent variable rincmdol; the F test for overall significance tells us that at least one of the predictors is statistically significant, F(5, 642) = 64.994, p < .001.
Coefficients(a)
                                               B     Std. Error    Beta       t      Sig.
(Constant)                                  -5.487      1.302              -4.215    .000
Age of Respondent                             .133       .016      .291     8.585    .000
Highest Year of School Completed              .507       .071      .256     7.145    .000
Number of Hours Worked Last Week              .142       .012      .385    11.788    .000
Highest Year of School Completed, Mother      .005       .074      .003      .066    .948
Highest Year of School Completed, Father      .041       .055      .030      .733    .464
a. Dependent Variable: RESPONDENTS INCOME
The coefficient table reports the coefficients used to construct the regression equation.
Residuals Statistics(a)
                               Minimum    Maximum     Mean    Std. Deviation     N
Predicted Value                   4.16      24.16     13.64        3.080        648
Std. Predicted Value            -3.077      3.415      .000        1.000        648
Standard Error of Pred. Value     .176      1.005      .398         .128        648
Adjusted Predicted Value          4.23      24.48     13.64        3.083        648
Residual                       -15.499     13.759      .000        4.329        648
Std. Residual                   -3.567      3.166      .000         .996        648
Stud. Residual                  -3.583      3.188      .000        1.001        648
Deleted Residual               -15.637     13.945     -.003        4.375        648
Stud. Deleted Residual          -3.616      3.211     -.001        1.003        648
Mahal. Distance                   .059     33.575     4.992        4.223        648
Cook's Distance                   .000       .042      .002         .004        648
Centered Leverage Value           .000       .052      .008         .007        648
a. Dependent Variable: RESPONDENTS INCOME
Case Processing Summary
                           Valid           Missing           Total
                         N   Percent      N   Percent      N    Percent
Mahalanobis Distance    677   45.1%      823   54.9%     1500   100.0%
The sample consisted of 1,500 cases, of which 677 were valid and 823 had missing values.
Descriptives: Mahalanobis Distance
                                      Statistic     Std. Error
Mean                                  4.9925522     .16121086
95% CI for Mean, Lower Bound          4.6760180
95% CI for Mean, Upper Bound          5.3090864
5% Trimmed Mean                       4.5041201
Median                                3.8432691
Variance                             17.595
Std. Deviation                        4.19458144
Minimum                                .05899
Maximum                              33.57526
Range                                33.51627
Interquartile Range                   4.15036
Skewness                              2.310          .094
Kurtosis                              7.910          .188
The skewness statistic has a z-score of 2.310/.094 = 24.574. Based on this, we can
conclude that the skewness is substantial and the distribution is non-normal.
The kurtosis is consistent with this: 7.910/.188 = 42.074, which is also far beyond
conventional significance cutoffs, confirming non-normality.
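The z-score arithmetic above can be cross-checked with a short Python snippet (the statistics and standard errors are taken from the Explore output; the |z| > 3.29 cutoff is a common convention for p < .001, not part of the SPSS output):

```python
# z-scores for skewness and kurtosis: statistic / standard error,
# using the values from the Descriptives table above.
skew, skew_se = 2.310, 0.094
kurt, kurt_se = 7.910, 0.188

z_skew = skew / skew_se
z_kurt = kurt / kurt_se
print(f"z_skew = {z_skew:.3f}, z_kurt = {z_kurt:.3f}")
# Both far exceed the conventional |z| > 3.29 cutoff (p < .001), so the
# Mahalanobis distances are markedly skewed and heavy-tailed.
```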
Using a chi-square table, the critical value for chi-square is 22.458. Any case with a Mahalanobis distance greater than 22.458 should be eliminated from the regression analysis; following this reasoning, cases 406, 508, 18, 1129, and 351 were removed. The box plot confirms the distribution is not normal, with outliers at the high end.
For all subsequent analyses, use profile-b.sav. Make sure that only cases where
MAH_1 ≤ 22.458 are selected.
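The cutoff and the filtering logic can be reproduced outside SPSS as a quick sketch (the MAH_1 values below are hypothetical; df = 6 is an assumption chosen because it matches the 22.458 critical value quoted in the text):

```python
import numpy as np
from scipy.stats import chi2

# Chi-square critical value at alpha = .001 with df = 6
# (assumed df, chosen to reproduce the 22.458 cutoff in the exercise).
critical = chi2.ppf(1 - 0.001, df=6)
print(f"critical value = {critical:.3f}")

# Illustrative filtering step with hypothetical MAH_1 values; in SPSS this
# is done with Select Cases on the profile-b data set.
mah_1 = np.array([3.2, 8.9, 25.1, 12.4, 33.6, 5.0])
keep = mah_1 <= critical
print(f"cases kept: {keep.sum()} of {len(mah_1)}")
```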
b. Create a scatterplot matrix. Can you assume linearity and normality?
The scatterplot matrix of the transformed variables displays elliptical shapes,
suggesting that the relationships are linear and the variables are approximately normally distributed.
Tests of Normality
                                            Kolmogorov-Smirnov(a)        Shapiro-Wilk
                                           Statistic    df    Sig.    Statistic    df    Sig.
Age of Respondent                             .057     609    .000      .975      609    .000
Highest Year of School Completed              .151     609    .000      .944      609    .000
Number of Hours Worked Last Week              .184     609    .000      .960      609    .000
Highest Year of School Completed, Mother      .270     609    .000      .891      609    .000
Highest Year of School Completed, Father      .180     609    .000      .964      609    .000
RESPONDENTS INCOME                            .115     609    .000      .952      609    .000
a. Lilliefors Significance Correction
The Shapiro-Wilk test assesses normality; its null hypothesis is that the variable is normally distributed. From the results, we can reject the null hypothesis for all the variables, concluding that none of them is normally distributed.
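For readers replicating this outside SPSS, `scipy.stats.shapiro` runs the same test. The sketch below uses synthetic samples (not the profile-b variables; the distribution parameters are invented) to show how the p-value drives the decision:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
normal_sample = rng.normal(loc=14, scale=2.6, size=609)   # roughly educ-like, hypothetical
skewed_sample = rng.exponential(scale=4.0, size=609)      # clearly non-normal

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    stat, p = shapiro(sample)
    verdict = "reject normality" if p < 0.05 else "cannot reject normality"
    print(f"{name}: W = {stat:.3f}, p = {p:.4f} -> {verdict}")
```

A small p-value (as for every variable in the table above) means the normality hypothesis is rejected.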
From the residual plot, the residuals are distributed fairly evenly above and below the horizontal reference line, though they do not cluster tightly around it. This suggests at most a moderate violation of linearity and homoscedasticity, which should not invalidate the analysis.
d. Conduct multiple regression using the Enter method. Evaluate the tolerance
statistics. Is multicollinearity a problem?
Multicollinearity is not a problem because all tolerance statistics are greater than .1.
Descriptive Statistics
                                            Mean    Std. Deviation     N
RESPONDENTS INCOME                          13.25        5.058        609
Age of Respondent                           39.45       11.547        609
Highest Year of School Completed            14.25        2.587        609
Number of Hours Worked Last Week            42.88       14.059        609
Highest Year of School Completed, Mother    11.81        2.802        609
Highest Year of School Completed, Father    11.65        3.862        609
Correlations (N = 609 for all pairs)

Pearson Correlation
             Income     Age     Educ    Hrs1   Maeduc  Paeduc
Income        1.000    .270    .335    .522     .036    .050
Age            .270   1.000   -.017    .053    -.305   -.275
Educ           .335   -.017   1.000    .145     .321    .370
Hrs1           .522    .053    .145   1.000     .037    .049
Maeduc         .036   -.305    .321    .037    1.000    .578
Paeduc         .050   -.275    .370    .049     .578   1.000

Sig. (1-tailed)
             Income     Age     Educ    Hrs1   Maeduc  Paeduc
Income           .     .000    .000    .000     .185    .109
Age            .000      .     .337    .097     .000    .000
Educ           .000    .337      .     .000     .000    .000
Hrs1           .000    .097    .000      .      .180    .112
Maeduc         .185    .000    .000    .180       .     .000
Paeduc         .109    .000    .000    .112     .000      .

(Income = RESPONDENTS INCOME; Age = Age of Respondent; Educ = Highest Year of School Completed; Hrs1 = Number of Hours Worked Last Week; Maeduc/Paeduc = Highest Year of School Completed, Mother/Father.)
The correlation table indicates that number of hours worked has the highest correlation with income (.522), with highest year of school completed (.335) the second highest. It also indicates that mother's (.036) and father's (.050) education have the lowest correlations.
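The kind of correlation SPSS reports here can be illustrated with `scipy.stats.pearsonr` on synthetic data (the variable names and effect size below are invented; SPSS reports one-tailed significance in this table, hence the halved p for an effect in the hypothesized direction):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
hrs = rng.normal(43, 14, size=609)                 # hours worked (hypothetical)
income = 0.17 * hrs + rng.normal(0, 3.9, size=609) # income partly driven by hours

r, p_two = pearsonr(hrs, income)
p_one = p_two / 2  # one-tailed p, valid when the effect is in the hypothesized direction
print(f"r = {r:.3f}, one-tailed p = {p_one:.4f}")
```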
All variables were entered using the enter method.
Model Summary(b)
Model 1: R = .635(a), R Square = .404, Adjusted R Square = .399, Std. Error of the Estimate = 3.922
a. Predictors: (Constant), Highest Year of School Completed, Father; Number of Hours Worked Last Week; Age of Respondent; Highest Year of School Completed; Highest Year of School Completed, Mother
b. Dependent Variable: RESPONDENTS INCOME
ANOVA(a)
              Sum of Squares    df    Mean Square       F      Sig.
Regression         6280.935      5      1256.187    81.677    .000(b)
Residual           9274.119    603        15.380
Total             15555.054    608
a. Dependent Variable: RESPONDENTS INCOME
b. Predictors: (Constant), Highest Year of School Completed, Father; Number of Hours Worked Last Week; Age of Respondent; Highest Year of School Completed; Highest Year of School Completed, Mother
The ANOVA table indicates that the model significantly predicts the dependent variable income; the F test for overall significance tells us that at least one variable is useful in predicting income, F(5, 603) = 81.677, p < .001.
The coefficient table reports the coefficients used to construct the regression equation.
Collinearity Diagnostics(a)
                                           Variance Proportions
Dim   Eigenvalue   Condition   (Constant)   Age    Educ   Hrs1   Maeduc   Paeduc
                     Index
1        5.716       1.000        .00       .00    .00    .00     .00      .00
2         .131       6.611        .00       .22    .00    .06     .03      .17
3         .082       8.342        .00       .19    .00    .85     .00      .00
4         .034      12.888        .03       .20    .07    .04     .26      .80
5         .024      15.366        .00       .13    .63    .02     .49      .00
6         .012      21.772        .96       .26    .29    .03     .22      .01
a. Dependent Variable: RESPONDENTS INCOME
(Age = Age of Respondent; Educ = Highest Year of School Completed; Hrs1 = Number of Hours Worked Last Week; Maeduc/Paeduc = Highest Year of School Completed, Mother/Father.)

Residuals Statistics(a)
                        Minimum    Maximum     Mean    Std. Deviation     N
Predicted Value            3.26      23.15    13.25        3.214        609
Residual                -15.487      8.673     .000        3.906        609
Std. Predicted Value     -3.106      3.082     .000        1.000        609
Std. Residual            -3.949      2.211     .000         .996        609
a. Dependent Variable: RESPONDENTS INCOME
e. Does the model significantly predict rincmdol? Explain.
The results indicate the model significantly predicts rincmdol.
Its explanatory power is moderate: R Square = .404, Adjusted R Square = .399, F(5, 603) = 81.677, p < .001.
f. Which variables significantly predict rincmdol? Which variable is the best
predictor of the DV?
The variables age (B = .110, Beta = .252, t = 7.485, p < .001), educ (B = .531, Beta = .271,
t = 7.818, p < .001), and hrs1 (B = .169, Beta = .469, t = 14.741, p < .001) significantly predict
the DV. The variable hrs1 is the best predictor of rincmdol, as indicated by its beta
weight and the corresponding t and p values.
g. What percentage of variance in rincmdol is explained by the model?
The model accounted for 40.4% of the variance in rincmdol.
h. Write the regression equation for the standardized variables.
Z(income) = .252 * Z(age) + .271 * Z(educ) + .469 * Z(hrs1)
(The standardized equation has no intercept and, following the feedback on 2h, includes only the statistically significant predictors; maeduc and paeduc are excluded.)
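As a minimal sketch of how a standardized (beta-weight) equation is applied, using only the significant predictors with the beta weights reported in 2f (all inputs and outputs are z-scores):

```python
def predict_z_income(z_age, z_educ, z_hrs):
    """Standardized regression equation with only the significant
    predictors (beta weights from the coefficient output)."""
    return 0.252 * z_age + 0.271 * z_educ + 0.469 * z_hrs

# A respondent one standard deviation above the mean on all three predictors:
z = predict_z_income(1.0, 1.0, 1.0)
print(f"predicted income z-score = {z:.3f}")  # 0.252 + 0.271 + 0.469
```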
i. Explain why the variables of mother’s and father’s education are not significant
predictors of rincmdol.
These two variables are not statistically significant predictors of rincmdol: their p (sig) values are well above .05 (in the preliminary regression, p = .948 for maeduc and p = .464 for paeduc). Their bivariate correlations with the DV are also very low and non-significant (r = .036, p = .185 for maeduc; r = .050, p = .109 for paeduc), so there is little evidence that they help explain the DV once the other predictors are in the model.
Source textbook: Mertler, C. A., & Vannatta Reinhart, R. (2017). Advanced and Multivariate Statistical Methods: Practical Application and Interpretation (6th ed.). New York: Routledge. ISBN 978-1-138-28973-4 (pbk).
