# Statistics help needed

Statistics

Asked: Feb 6th, 2016

**Question description**

I need help with statistics. The book I use is

*Applied Regression Analysis and Generalized Linear Models*, Second Edition, by John Fox (McMaster University, Hamilton, Ontario, Canada).

The exercises I need answered are copied below. Can you offer that assistance in say 1 day and at what cost? Thanks.

**Exercise 5.2.** ∗ Suppose that the means and standard deviations of $Y$ and $X$ are the same: $\bar{Y} = \bar{X}$ and $S_Y = S_X$.

(a) Show that, under these circumstances,

$$B_{Y|X} = B_{X|Y} = r_{XY}$$

where $B_{Y|X}$ is the least-squares slope for the simple regression of $Y$ on $X$; $B_{X|Y}$ is the least-squares slope for the simple regression of $X$ on $Y$; and $r_{XY}$ is the correlation between the two variables. Show that the intercepts are also the same, $A_{Y|X} = A_{X|Y}$.
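To make clear what part (a) is claiming, here is a quick numerical check rather than a proof. It is only a sketch on made-up data: the helper `slope_intercept`, the seed, and the common mean of 10 are my own assumptions, not anything from the book.

```python
import numpy as np

def slope_intercept(x, y):
    """Least-squares slope and intercept for the simple regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    return b, a

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
x_raw = rng.normal(size=200)
y_raw = 0.6 * x_raw + rng.normal(size=200)

# Force equal means and equal standard deviations, as the exercise assumes:
# standardize both variables, then shift them to a common mean of 10.
x = (x_raw - x_raw.mean()) / x_raw.std(ddof=1) + 10.0
y = (y_raw - y_raw.mean()) / y_raw.std(ddof=1) + 10.0

b_yx, a_yx = slope_intercept(x, y)   # regression of Y on X
b_xy, a_xy = slope_intercept(y, x)   # regression of X on Y
r = np.corrcoef(x, y)[0, 1]

print(b_yx, b_xy, r)   # the two slopes and the correlation should agree
print(a_yx, a_xy)      # the two intercepts should agree
```

The agreement follows from $B_{Y|X} = r_{XY} S_Y / S_X$, which reduces to $r_{XY}$ when the two standard deviations are equal; the exercise still asks for the algebraic argument.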

(b) Why, if $A_{Y|X} = A_{X|Y}$ and $B_{Y|X} = B_{X|Y}$, is the least-squares line for the regression of $Y$ on $X$ different from the line for the regression of $X$ on $Y$ (as long as $r^2 < 1$)?

(c) “Regression toward the mean” (the original sense of the term “regression”): Imagine that $X$ is father’s height and $Y$ is son’s height for a sample of father-son pairs. Suppose that $S_Y = S_X$, that $\bar{Y} = \bar{X}$, and that the regression of sons’ heights on fathers’ heights is linear. Finally, suppose that $0 < r_{XY} < 1$ (i.e., fathers’ and sons’ heights are positively correlated, but not perfectly so). Show that the expected height of a son whose father is shorter than average is also less than average, but to a smaller extent; likewise, the expected height of a son whose father is taller than average is also greater than average, but to a smaller extent. Does this result imply a contradiction: that the standard deviation of son’s height is in fact less than that of father’s height?
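For reference, the fitted line that part (c) relies on can be written in deviation form. This is a sketch using the standard least-squares formulas $B_{Y|X} = r_{XY} S_Y / S_X$ and $A_{Y|X} = \bar{Y} - B_{Y|X}\bar{X}$; the symbol $\hat{Y}$ for the fitted (expected) son’s height is my notation, not copied from the exercise.

```latex
\hat{Y} = A_{Y|X} + B_{Y|X}\,x
        = \bar{Y} + r_{XY}\,\frac{S_Y}{S_X}\,(x - \bar{X})
\quad\Longrightarrow\quad
\hat{Y} - \bar{Y} = r_{XY}\,(x - \bar{X}) \quad \text{when } S_Y = S_X .
```

With $0 < r_{XY} < 1$, the deviation $\hat{Y} - \bar{Y}$ has the same sign as $x - \bar{X}$ but is smaller in magnitude.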

(d) What is the expected height for a father whose son is shorter than average? Of a father whose son is taller than average?

(e) Regression effects in research design: Imagine that educational researchers wish to assess the efficacy of a new program to improve the reading performance of children. To test the program, they recruit a group of children who are reading substantially below grade level; after a year in the program, the researchers observe that the children, on average, have improved their reading performance. Why is this a weak research design? How could it be improved?

**Exercise 5.7.** Consider the general multiple-regression equation

$$Y = A + B_1 X_1 + B_2 X_2 + \cdots + B_k X_k + E$$

An alternative procedure for calculating the least-squares coefficient $B_1$ is as follows:

1. Regress $Y$ on $X_2$ through $X_k$, obtaining residuals $E_{Y|2 \ldots k}$.

2. Regress $X_1$ on $X_2$ through $X_k$, obtaining residuals $E_{1|2 \ldots k}$.

3. Regress the residuals $E_{Y|2 \ldots k}$ on the residuals $E_{1|2 \ldots k}$. The slope for this simple regression is the multiple-regression slope for $X_1$, that is, $B_1$.
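The three steps can be sketched in code as follows. I do not have the Canadian occupational prestige data to hand, so this uses synthetic data with three predictors; the variable names, coefficients, and seed are all assumptions for illustration, and the helper `ols` is hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y on the columns of X (X includes a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic stand-in data (not the prestige data from the book).
rng = np.random.default_rng(1)
n = 300
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)
x1 = 0.4 * x2 - 0.3 * x3 + rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.7 * x2 + 0.2 * x3 + rng.normal(size=n)

ones = np.ones(n)

# Full multiple regression of Y on X1, X2, X3; b_full[1] is B1.
b_full = ols(np.column_stack([ones, x1, x2, x3]), y)

# Step 1: regress Y on X2..Xk and keep the residuals.
X_others = np.column_stack([ones, x2, x3])
e_y = y - X_others @ ols(X_others, y)

# Step 2: regress X1 on X2..Xk and keep the residuals.
e_1 = x1 - X_others @ ols(X_others, x1)

# Step 3: simple regression of the Y residuals on the X1 residuals.
step3 = ols(np.column_stack([ones, e_1]), e_y)

print("B1 from full regression:", b_full[1])
print("slope from step 3:      ", step3[1])
print("intercept from step 3:  ", step3[0])
```

On real data the same pattern should apply with the prestige, education, income, and percentage-of-women columns substituted for `y`, `x1`, `x2`, and `x3`.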

(a) Apply this procedure to the multiple regression of prestige on education, income, and percentage of women in the Canadian occupational prestige data, confirming that the coefficient for education is properly recovered.

(b) Note that the intercept for the simple regression in Step 3 is 0. Why is this the case?

(c) In light of this procedure, is it reasonable to describe $B_1$ as the “effect of $X_1$ on $Y$ when the influence of $X_2, \ldots, X_k$ is removed from both $X_1$ and $Y$”?

(d) The procedure in this problem reduces the multiple regression to a series of simple regressions (in Step 3). Can you see any practical application for this procedure? (See the discussion of added-variable plots in Section 11.6.1.)

**Exercise 6.7.** Consider the regression model $Y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$. How can the incremental sum-of-squares approach be used to test the hypothesis that the two population slopes are equal to each other, $H_0$: $\beta_1 = \beta_2$? [*Hint*: Under $H_0$, the model becomes $Y = \alpha + \beta x_1 + \beta x_2 + \varepsilon = \alpha + \beta(x_1 + x_2) + \varepsilon$, where $\beta$ is the common value of $\beta_1$ and $\beta_2$.] Under what circumstances would a hypothesis of this form be meaningful? (*Hint*: Consider the units of measurement of $x_1$ and $x_2$.) Now, test the hypothesis that the “population” regression coefficients for education and income in Duncan’s occupational prestige regression are equal to each other. Is this test sensible?
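To show the kind of computation the hint points at, here is a minimal sketch of an incremental F-statistic comparing the full model with the constrained model that uses $x_1 + x_2$ as a single regressor. It runs on synthetic data, not Duncan’s data; the helper `rss`, the sample size, and the coefficients are assumptions for illustration.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from the least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Synthetic data generated with equal slopes (an assumption, not Duncan's data).
rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

ones = np.ones(n)

# Full model: separate slopes for x1 and x2.
rss_full = rss(np.column_stack([ones, x1, x2]), y)

# Null model under H0: one common slope on (x1 + x2).
rss_null = rss(np.column_stack([ones, x1 + x2]), y)

q = 1               # number of restrictions imposed by H0
df_resid = n - 3    # residual degrees of freedom of the full model

# Incremental F-statistic: increase in RSS per restriction,
# relative to the full model's residual mean square.
f0 = ((rss_null - rss_full) / q) / (rss_full / df_resid)

print("F0 =", f0, "on", q, "and", df_resid, "degrees of freedom")
```

The statistic would then be referred to an F-distribution with 1 and $n - 3$ degrees of freedom; whether the comparison is sensible for education and income is the part the exercise asks about.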