STAT 415/615
Miller
Assignment #10
Due: Wednesday April 15, 2020 by 5 PM EST
Guidelines for your submission:
• Your responses must be submitted as a single PDF (Portable Document Format) file.
• Include your name at the top.
• Please copy and paste any R (or similar) output or graphics that you create into your assignment document. Work by hand can also be included. All responses must be easy to read and labeled with the appropriate problem number.
• Please save your work regularly (both your work in R and the solutions to the assignment questions).
• Submit the .pdf file with your responses via Blackboard.
Part 1: Working with Dummy Variables
Instructions: You must show all work and/or provide a full explanation for the following problems. You should
use R or other software for plots.
1. (based on text p. 337: 8.13) Consider a regression model 𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + 𝜀, where
𝑋1 is a numerical variable and 𝑋2 is a dummy variable. Plot the response functions (the graphs
of 𝐸(𝑌) as a function of 𝑋1 for different values of 𝑋2), if 𝛽0 = 25, 𝛽1 = 0.2, and 𝛽2 = −12.
2. Continue the previous exercise. Sketch the response curves for the model with interaction,
𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + 𝛽3 𝑋1 𝑋2 + 𝜀, given that 𝛽3 = −0.2.
3. (based on text p.340: 8.34) In a regression study, three types of banks were involved, namely, (1)
commercial, (2) mutual savings, and (3) savings and loan. Consider the following dummy
variables for the type of bank:
a) Develop the first-order linear regression model (with no interactions) for relating last year’s
profit or loss (𝑌 ) to the size of the bank (𝑋1) and type of bank (𝑋2 , 𝑋3).
b) State the response function for the three types of banks.
c) Interpret each of the following quantities: (1) 𝛽2, (2) 𝛽3 , (3) 𝛽2 − 𝛽3.
Part 2: Intro to Model-Building
Instructions: Please read sections 9.1 & 9.2 in the text (pp. 343-353) before answering the following questions.
You must provide a full explanation for each.
4. (based on text p. 376: 9.1) A speaker stated: “In well-designed experiments involving quantitative
explanatory variables, a procedure for reducing the number of explanatory variables after the data are
obtained is not necessary.” Do you agree? Discuss.
5. (based on text p. 376: 9.2) The dean of a graduate school wishes to predict the GPA in graduate work
for recent applicants. List eight variables that might be useful explanatory variables here.
Part 3: Mini-Project
Instructions: Use statistical software to answer the following questions. For each, please provide any relevant
output and your answer to the question.
6. (based on text pp. 337-8: 8.16, 8.20) This problem returns to our old GPA data. Previously the
GPA of the graduate students was predicted based on their ACT score. An assistant to the director of
admissions conjectured that the predictive power of the model could be improved by adding
information on whether the student had chosen a major field of concentration at the time the
application was submitted. The data set for this problem is GPA-1.19-8.16.csv (on Blackboard).
a) Fit the regression model 𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + 𝜀, where 𝑋1 is the ACT score and 𝑋2 = 1 if a
student has indicated a major at the time of application and 𝑋2 = 0 if the major was undecided. State
the estimated regression function.
b) Test whether 𝑋2 can be dropped from the model, using 𝛼 = 0.05.
c) Fit the regression model 𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + 𝛽3 𝑋1 𝑋2 + 𝜀 and state the estimated
regression function.
d) Interpret 𝛽3.
e) Test whether the interaction term can be dropped from the model, using 𝛼 = 0.05.
REGRESSION MODELS FOR QUANTITATIVE AND QUALITATIVE PREDICTORS (8.2-8.5)
INTRO TO THE MODEL-BUILDING PROCESS (CHAPTER 9)

INTERACTION REGRESSION MODELS
REVIEW: GENERAL LINEAR REGRESSION MODEL
• The general linear regression model, with Normal error terms, in terms of predictor variables 𝑿 is:
  Yᵢ = β0 + β1 Xᵢ1 + β2 Xᵢ2 + ⋯ + βp−1 Xᵢ,p−1 + εᵢ,  for each i = 1, …, n
  β0, β1, …, βp−1: parameters
  Xᵢ1, Xᵢ2, …, Xᵢ,p−1: predictor variables in the i-th trial (known constants)
  εᵢ: independent N(0, σ²) random variables
• The regression function for this model is:
  E(Y) = β0 + β1 X1 + β2 X2 + ⋯ + βp−1 Xp−1
∴ The general linear model with Normal error terms implies that the responses Yᵢ are independent Normal random variables with:
• Mean: E(Yᵢ) = β0 + β1 Xᵢ1 + β2 Xᵢ2 + ⋯ + βp−1 Xᵢ,p−1
• (Constant) Variance: σ²
REVIEW: GENERAL LINEAR REGRESSION MODEL
• 𝑿𝟏 , 𝑿𝟐 , … , 𝑿𝒑−𝟏 can be raised to higher powers (i.e., the predictors can be
squared or higher-order terms) and be nonadditive (have an interacting
effect). 𝒀𝒊 can be transformed.
(Note: “Linear” refers to the fact that the regression function is a linear
combination of the parameters 𝜷𝟎 , 𝜷𝟏 , … , 𝜷𝒑−𝟏 .)
Examples:
1. Polynomial regression models are general linear regression models. (They yield a curvilinear response function.)
   a) Yᵢ = β0 + β1 Xᵢ² + εᵢ
   b) Yᵢ = β0 + β1 Xᵢ + β2 Xᵢ² + εᵢ
[Plot: a curvilinear (polynomial) response function, Y versus Xᵢ]
ADDITIVE EFFECTS VS INTERACTION EFFECTS
A regression model with 𝒑 − 𝟏 predictors contains additive effects if the response
(regression) function can be written as:
𝐸 𝑌 = 𝑓1 𝑋1 + 𝑓2 𝑋2 + ⋯ + 𝑓𝑝−1 𝑋𝑝−1
where 𝑓1 𝑋1 , 𝑓2 𝑋2 , … , 𝑓𝑝−1 𝑋𝑝−1 are functions of the predictors (can be simple or
complicated).
Examples (adapted from examples in the Chapter 6 notes):
a) E(Y) = β0 + β1 X1²  (Additive: the X1 part is a single function f1(X1))
b) E(Y) = β0 + β1 X1² + β2 X1  (Additive: β1 X1² + β2 X1 together form f1(X1))
c) E(Y) = β0 + β1 ln X1 + β2 X2  (Additive: f1(X1) = β1 ln X1 and f2(X2) = β2 X2)
d) E(Y) = β0 + β1 X1² + β2 X1 X2  (NOT additive/contains an interaction effect: β2 X1 X2 is an interaction term, a function f2(X1, X2) of both predictors)
• The cross-product term is an interaction term that may be called
linear-by-linear or bilinear.
• When we have interaction terms, we must change our interpretation
of the regression coefficients.
REVIEW: FIRST-ORDER (LINEAR) REGRESSION MODEL WITH TWO PREDICTORS, X1 AND X2
  Yᵢ = β0 + β1 Xᵢ1 + β2 Xᵢ2 + εᵢ
• β0 is the Y-intercept parameter; β1 and β2 are the slope coefficients; Xᵢ1 and Xᵢ2 are the values of the predictor variables on the i-th trial; εᵢ is the random error term; Yᵢ is the response in the i-th trial.
• As in the one-predictor model Yᵢ = β0 + β1 Xᵢ + εᵢ, the model splits into a constant/linear component (β0 + β1 Xᵢ1 + β2 Xᵢ2) and a random error component (εᵢ).
REVIEW: FIRST-ORDER (LINEAR) REGRESSION MODEL WITH TWO PREDICTORS: Yᵢ
• 𝑌𝑖 is a random variable
• Assuming that 𝐸 𝜀𝑖 = 0, the regression function for this model is:
𝐸 𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2
• 𝐸 𝑌 = 𝛽0 + 𝛽1 𝑋1 is represented as a line. 𝐸 𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2
has an additional variable and must be represented by a plane, also called a
regression surface or response surface, (2-dimensional and one dimension higher
than a 1-dimensional line!).
OLD EXAMPLE:
E(Y) = 5 + 11 X1 + 3 X2
REVIEW: REGRESSION COEFFICIENTS
β0 — the Y-intercept of the regression plane (β0 = 5 for our example).
  • If the scope of the model includes X1 = 0 and X2 = 0, it represents the mean response E(Y | X1 = 0, X2 = 0).
  • Otherwise, it has no practical meaning.
β1 — the change in the mean response per unit increase in X1 (note: X2 is held constant).
  e.g., the mean response increases by 11 with a 1-unit increase in X1 when X2 is held constant.
  If X2 = 1: E(Y | X2 = 1) = 5 + 11 X1 + 3(1) = 8 + 11 X1.
  • This is a straight line with slope 11.
  • The y-intercept changes for each X2 level.
REVIEW: REGRESSION COEFFICIENTS
β2 — the change in the mean response per unit increase in X2 (note: X1 is held constant).
  e.g., the mean response increases by 3 with a 1-unit increase in X2 when X1 is held constant.
  If X1 = 4: E(Y | X1 = 4) = 5 + 11(4) + 3 X2 = 49 + 3 X2.
  • This is a straight line with slope 3.
  • The y-intercept changes for each X1 level.
ALTERED INTERPRETATIONS OF
REGRESSION COEFFICIENTS
Consider a regression model for two quantitative predictors with linear effects on 𝑌 and
interacting effects of 𝑿𝟏 and 𝑿𝟐 on 𝑌 represented by a cross-product term:
Yi = β0 + β1 Xi1 + β2 X𝑖2 + 𝛽3 𝑋𝑖1 𝑋𝑖2 + εi
The regression function is:
𝐸(𝑌) = β0 + β1 X1 + β2 X2 + 𝛽3 𝑋1 𝑋2
• β1 & β2 will now have new interpretations, and what each previously represented is now shown by a different term.
ALTERED INTERPRETATIONS OF
REGRESSION COEFFICIENTS
β1 + β3 X2 now represents the change in the mean response with a 1-unit increase in X1 when X2 is held constant.
β2 + β3 X1 now represents the change in the mean response with a 1-unit increase in X2 when X1 is held constant.
Why?
  ∂E(Y)/∂X1 = 0 + β1 + 0 + β3 X2 = β1 + β3 X2
  ∂E(Y)/∂X2 = 0 + 0 + β2 + β3 X1 = β2 + β3 X1
Note: Both changes depend on the level of the other predictor.
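These derivatives can be checked numerically. A short sketch (in Python, though the course uses R; the coefficient values are the ones used in the Chapter 6 example that follows):

```python
# Interaction model E(Y) = b0 + b1*X1 + b2*X2 + b3*X1*X2.
# Verify that a 1-unit increase in X1 changes the mean response by
# b1 + b3*X2 (not b1 alone), matching the partial derivative above.
b0, b1, b2, b3 = 5, 11, 3, 13  # example coefficients from these notes

def EY(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

for x2 in (0, 1, 2):
    change = EY(4, x2) - EY(3, x2)  # one extra unit of X1 at this X2 level
    print(x2, change, b1 + b3 * x2)  # the last two columns agree
```

The change depends on the X2 level, which is exactly what "interaction" means.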
EXAMPLE
In Chapter 6, we looked at E(Y) = 5 + 11 X1 + 3 X2.
Recall that the regression surface is a plane. At any given X1 (or X2) level, E(Y) will be a line with the same slope (β2 or β1, respectively); i.e., the lines are parallel, with different intercepts.
EXAMPLE
Now consider adding a cross-product term:
  E(Y) = 5 + 11 X1 + 3 X2 + 13 X1 X2
Here the lines representing E(Y) at different X1 (or X2) levels will have different slopes (i.e., they are not parallel) and still have different intercepts. The response surface is not a plane.
  E(Y | X1 = 1) = 5 + 11(1) + 3 X2 + 13(1) X2 = (5 + 11) + (3 + 13·1) X2 = 16 + 16 X2
  E(Y | X1 = 2) = 5 + 11(2) + 3 X2 + 13(2) X2 = (5 + 22) + (3 + 13·2) X2 = 27 + 29 X2
  E(Y | X2 = 1) = 5 + 11 X1 + 3(1) + 13 X1(1) = (5 + 3) + (11 + 13·1) X1 = 8 + 24 X1
  E(Y | X2 = 2) = 5 + 11 X1 + 3(2) + 13 X1(2) = (5 + 6) + (11 + 13·2) X1 = 11 + 37 X1
EXAMPLE
  E(Y | X1 = 1) = 5 + 11(1) + 3 X2 + 13(1) X2 = (5 + 11) + (3 + 13·1) X2 = 16 + 16 X2
  E(Y | X1 = 2) = 5 + 11(2) + 3 X2 + 13(2) X2 = (5 + 22) + (3 + 13·2) X2 = 27 + 29 X2
o Note that the slopes increase as X1 (or X2) increases. A 1-unit increase in X2 (or X1) has a larger effect on the response when X1 (or X2) is at a higher level.
  ▪ This interaction effect between X1 and X2 is said to be synergistic (or of reinforcement type) and occurs when β1, β2 & β3 are all positive.
[Plot: E(Y) versus X2 at several X1 levels; the lines are nonparallel, with slopes increasing in X1]
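The conditional lines for this model can be recovered with a short sketch (Python; the same works in R): fixing X1 collects the constant terms into the intercept and leaves 3 + 13·X1 as the slope on X2.

```python
# E(Y) = 5 + 11*X1 + 3*X2 + 13*X1*X2, with X1 held fixed.
def line_at_x1(x1):
    intercept = 5 + 11 * x1  # terms that are constant once X1 is fixed
    slope = 3 + 13 * x1      # coefficient remaining on X2
    return intercept, slope

print(line_at_x1(1))  # (16, 16): E(Y | X1 = 1) = 16 + 16*X2
print(line_at_x1(2))  # (27, 29): E(Y | X1 = 2) = 27 + 29*X2
```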
EXAMPLE
E(Y) = 5 + 11 X1 + 3 X2 − 3 X1 X2
  E(Y | X1 = 1) = 5 + 11(1) + 3 X2 − 3(1) X2 = (5 + 11) + (3 − 3·1) X2 = 16
  E(Y | X1 = 2) = 5 + 11(2) + 3 X2 − 3(2) X2 = (5 + 22) + (3 − 3·2) X2 = 27 − 3 X2
If β1 & β2 are positive but β3 is negative, the interaction effect is said to be antagonistic (or of interference type).
  ▪ This means that the slope (the mean change in response per unit increase in a predictor, X2 in this case) decreases at higher levels of the other predictor (X1 in this case).
[Plot: E(Y) versus X2 at X1 = 1 (flat line at 16) and X1 = 2 (line 27 − 3 X2), illustrating the antagonistic interaction]
QUALITATIVE PREDICTORS: DEFINING DUMMY VARIABLES
INDICATOR/DUMMY VARIABLES
We can incorporate qualitative variables, such as demographics and Yes/No formats, into
regression models.
o A common way to do this is by introducing indicator (or dummy) variables that take on a
value of 1 or 0.
**A qualitative variable with 𝒄 classes will be represented by 𝒄 − 𝟏 indicator/dummy
variables.
▪ Why 𝑐 − 1? Why not 𝑐?
Let’s say we have two predictors: one quantitative and one qualitative. If the qualitative variable has two classes and we define two indicator/dummy variables based on it, we run into problems.
• The two dummy columns are linearly dependent with the intercept column: when they are added as columns to the 𝑿 matrix, we get columns that are linearly dependent.
EXAMPLE
This references the “Copier Maintenance” example from our text. Tri-City Office Equipment Corporation sells franchised copiers and performs preventative maintenance and repairs. Data were collected for 45 service calls. Here the response (𝒀), the total number of minutes spent on a service call, is predicted by the number of copiers to be serviced (𝑿₁) and the size of the copiers to be serviced (small or large).
Note: 17 calls involve small copiers.

If Xᵢ₂ = 1 if the copier is small (0 otherwise) and Xᵢ₃ = 1 if the copier is large (0 otherwise), then

        ⎡ 1   X₁₁    1   0 ⎤
        ⎢ 1   X₂₁    0   1 ⎥
  𝑿 =   ⎢ 1   X₃₁    0   1 ⎥
        ⎢ ⋮    ⋮     ⋮   ⋮ ⎥
        ⎢ 1   X₄₄,₁  0   1 ⎥
        ⎣ 1   X₄₅,₁  0   1 ⎦

Notice that the sum of the 3rd and 4th columns equals the first column. This means that the columns are linearly dependent.
Recall: If a matrix has linearly dependent columns, it is singular, and thus has determinant=0 and is not
invertible.
EXAMPLE
Since 𝑿 has linearly dependent columns, 𝑿ᵀ𝑿 also has linearly dependent columns:

          ⎡ 45           Σᵢ Xᵢ₁        17            28           ⎤
  𝑿ᵀ𝑿 =   ⎢ Σᵢ Xᵢ₁       Σᵢ Xᵢ₁²       Σ_small Xᵢ₁   Σ_large Xᵢ₁  ⎥
          ⎢ 17           Σ_small Xᵢ₁   17            0            ⎥
          ⎣ 28           Σ_large Xᵢ₁   0             28           ⎦

where Σ_small and Σ_large sum over the service calls involving small and large copiers, respectively (17 small + 28 large = 45 calls).

Once again, the 1st column is the sum of the 3rd and 4th. Therefore, 𝑿ᵀ𝑿 is singular and not invertible. Recall that 𝒃 = (𝑿ᵀ𝑿)⁻¹𝑿ᵀ𝒀. If (𝑿ᵀ𝑿)⁻¹ does not exist, we are unable to find estimators for the regression coefficients of this model.
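A small numeric illustration of this singularity (Python/NumPy sketch; the copier counts below are made up, not the text's 45-call data set):

```python
import numpy as np

x1 = np.array([2.0, 4.0, 3.0, 5.0, 1.0])     # hypothetical copier counts
small = np.array([1.0, 0.0, 0.0, 1.0, 0.0])  # X2 = 1 if the copier is small
large = 1.0 - small                          # X3 = 1 if the copier is large
X = np.column_stack([np.ones(5), x1, small, large])

# The intercept column equals the sum of the two dummy columns...
print(np.allclose(X[:, 0], X[:, 2] + X[:, 3]))  # True
# ...so X'X is rank-deficient (rank 3 out of 4) with determinant 0.
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX), np.linalg.det(XtX))
```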
EXAMPLE
Solution? Drop one of the dummy variables and use the regression model
  Yᵢ = β0 + β1 Xᵢ1 + β2 Xᵢ2 + εᵢ
where Xᵢ1 is the quantitative variable (number of copiers in this case) and Xᵢ2 is the dummy variable, in this case
  Xᵢ2 = 1 if the copier is small, 0 if the copier is large,
and the regression function is E(Y) = β0 + β1 X1 + β2 X2.
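With one dummy dropped, X has full column rank and least squares goes through. A simulated sketch (Python/NumPy; the "true" coefficients and the data below are invented for illustration, not the text's copier data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 45
x1 = rng.integers(1, 10, size=n).astype(float)  # number of copiers
x2 = (rng.random(n) < 0.4).astype(float)        # 1 = small copier, 0 = large
# Simulate responses from assumed values beta = (-1, 15, 0.8), error sd = 9.
y = -1.0 + 15.0 * x1 + 0.8 * x2 + rng.normal(0.0, 9.0, size=n)

X = np.column_stack([np.ones(n), x1, x2])       # intercept + X1 + one dummy
b, *_ = np.linalg.lstsq(X, y, rcond=None)       # now (X'X)^{-1} exists
print(b)  # estimates of (beta0, beta1, beta2)
```

In R this is simply `lm(y ~ x1 + x2)`; R drops one level of a factor automatically for the same reason.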
QUALITATIVE PREDICTORS: INTERPRETING THE REGRESSION COEFFICIENTS
DUMMY VARIABLES: INTERPRETING
REGRESSION COEFFICIENTS
Consider our model with one quantitative predictor and one dummy variable.
If X2 = 1 (“copier is small” in our example): E(Y | X2 = 1) = β0 + β1 X1 + β2(1) = (β0 + β2) + β1 X1.
If X2 = 0 (“copier is large” in our example): E(Y | X2 = 0) = β0 + β1 X1 + β2(0) = β0 + β1 X1.
• In either case, we have a straight line with slope=𝛃𝟏 .
Example: The mean minutes spent on a service call is a linear function of
the number of copiers to be serviced with slope 𝛃𝟏 . 𝛃𝟏 represents the
average change in service time per 1 unit increase in number of copiers (𝐗 𝟏 ),
given that the size of the copier (𝐗 𝟐 ) is held constant.
DUMMY VARIABLES: INTERPRETING REGRESSION COEFFICIENTS
• The intercepts, β0 + β2 versus β0, differ by β2.
Example: β2 indicates how much longer (or shorter) the mean service time is for calls involving small copiers (the class coded 1) as compared to large copiers, for any given number of copiers to be repaired.
In general, 𝛃𝟐 indicates how much higher (or
lower) the mean response line is for the class
coded 1 than the line for the class coded 0, for
any given level of 𝐗 𝟏 .
EXAMPLE
• The plot below shows the least-squares regression line for predicting the service time based on the number of copiers to be serviced.
• The estimated regression function is Ŷ = −0.5802 + 15.0352 X1.

Call:
lm(formula = time ~ number)

Coefficients:
(Intercept)       number
    -0.5802      15.0352
EXAMPLE
• The new plot below shows the estimated regression functions for small (in red) and large (in black) copiers separately.
• Both regression functions are based on the estimated regression function:
  Ŷ = −0.9225 + 15.0461 X1 + 0.7587 X2

Call:
lm(formula = time ~ number + size)

Coefficients:
(Intercept)       number         size
    -0.9225      15.0461       0.7587
EXAMPLE
                 2.5 %     97.5 %
(Intercept) -7.177891   5.332945
number      14.057283  16.035004
size        -4.851254   6.368698

• Above we see the 95% confidence intervals for the regression parameters/coefficients.
• The 95% confidence interval for β2 is: −4.851 ≤ β2 ≤ 6.369.
• We are 95% confident that, for any given number of copiers, the average difference in service time lies somewhere between calls involving small copiers taking almost 5 minutes less and taking over 6 minutes more. Generally, we do not expect service times to be drastically different for calls involving small copiers and calls involving large copiers.
• Note: 0 falls within this interval.
EXAMPLE
For testing H0: β2 = 0 vs. HA: β2 ≠ 0 we can use a t test or F test (as seen in the previous chapters).

Call:
lm(formula = time ~ number + size)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.9225     3.0997  -0.298    0.767
number       15.0461     0.4900  30.706   <2e-16

Analysis of Variance Table

  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     43 3416.4
2     42 3410.3  1    6.0488 0.0745 0.7862

• The p-value = 0.786. We fail to reject that β2 = 0.
• We may consider dropping X2, as the size of the copier does not seem to have a large effect on service time.
• This coincides with the plots shown earlier (the original regression model with just number of copiers as a predictor fits well overall, and the separate regression functions for small and large copiers based on the multiple regression model are similar).
• We also saw that 0 DID fall within the 95% confidence interval for β2.
REASON FOR USING A MODEL WITH A
DUMMY VARIABLE
Why use a regression function with a dummy variable instead of two separate regression functions for small and large copiers?
• Our model assumes equal slopes and the same constant error term variance for X2 = 1 and X2 = 0.
• The common slope can best be estimated by pooling the small and large copiers.
• Inferences can be made more precisely.
The model we just discussed can be adjusted to include more dummy variables:
  Yᵢ = β0 + β1 Xᵢ1 + β2 Xᵢ2 + ⋯ + βc Xᵢc + εᵢ
for c − 1 dummy variables to represent a qualitative variable with c classes.
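For a qualitative predictor with c = 3 classes, like the bank types in Part 1, the c − 1 = 2 indicator columns can be built by picking one class as the baseline. A dependency-free sketch (Python; the choice of baseline is arbitrary):

```python
classes = ["commercial", "mutual savings", "savings and loan"]  # c = 3 classes
baseline = classes[0]  # the baseline class is coded with all zeros

def dummy_codes(label):
    # One indicator per non-baseline class: c - 1 = 2 columns total.
    return [1 if label == cls else 0 for cls in classes[1:]]

print(dummy_codes("commercial"))        # [0, 0]  (baseline)
print(dummy_codes("mutual savings"))    # [1, 0]
print(dummy_codes("savings and loan"))  # [0, 1]
```

Each coefficient on a dummy then measures that class's mean difference from the baseline class, at any fixed level of the quantitative predictor.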
QUALITATIVE PREDICTORS: CONSIDERATIONS
ALTERNATIVES FOR CODING INDICATOR VARIABLES
1. Using 1 & −1 rather than 1 & 0.
Example: Xᵢ2 = 1 if the copier is small, −1 if the copier is large.
If “copier is small”: E(Y | X2 = 1) = β0 + β1 X1 + β2(1) = (β0 + β2) + β1 X1.
If “copier is large”: E(Y | X2 = −1) = β0 + β1 X1 + β2(−1) = (β0 − β2) + β1 X1.
• In either case, we still have a straight line with slope β1.
• The intercepts, β0 + β2 versus β0 − β2, have an average of β0.
Example: β2 indicates how much the intercepts for small and large copiers differ from the average intercept of β0.
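A one-line check of that intercept claim (Python sketch; the β0 and β2 values are assumed for illustration):

```python
b0, b2 = 2.0, 0.5  # assumed parameter values
small_intercept = b0 + b2 * (+1)  # X2 = +1: copier is small
large_intercept = b0 + b2 * (-1)  # X2 = -1: copier is large
# The two intercepts differ by 2*b2 and average back to b0.
print((small_intercept + large_intercept) / 2)  # 2.0, i.e. b0
```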
ALTERNATIVES FOR CODING INDICATOR VARIABLES
2. Use c dummy variables for a qualitative variable with c classes and drop the intercept.
Our model with one quantitative variable and one qualitative predictor becomes:
  Yᵢ = β1 Xᵢ1 + β2 Xᵢ2 + ⋯ + βc+1 Xᵢ,c+1 + εᵢ
Example: If the qualitative predictor has two classes, such as in our Copier Maintenance example, with
  Xᵢ2 = 1 if the copier is small (0 otherwise) and Xᵢ3 = 1 if the copier is large (0 otherwise),
the response functions are:
If “copier is small”: E(Y | X2 = 1, X3 = 0) = β1 X1 + β2(1) + β3(0) = β2 + β1 X1.
If “copier is large”: E(Y | X2 = 0, X3 = 1) = β1 X1 + β2(0) + β3(1) = β3 + β1 X1.
ALTERNATIVES FOR CODING INDICATOR VARIABLES
**Note: Other types of coding involve:
• Using allocated codes, such as a satisfaction scale.
• Coding a quantitative variable.
MODELING INTERACTIONS BETWEEN QUANTITATIVE & QUALITATIVE PREDICTORS
INTERACTIVE MODEL FOR ONE QUANTITATIVE VARIABLE AND ONE DUMMY VARIABLE
To consider the possibility of interaction effects between a quantitative variable Xᵢ1 and a qualitative variable with two classes:
  Yᵢ = β0 + β1 Xᵢ1 + β2 Xᵢ2 + β3 Xᵢ1 Xᵢ2 + εᵢ
and the regression function is:
  E(Y) = β0 + β1 X1 + β2 X2 + β3 X1 X2
Example: Xᵢ1 is the quantitative variable (number of copiers in this case) and Xᵢ2 is the dummy variable, in this case Xᵢ2 = 1 if the copier is small, 0 if the copier is large.
If “copier is small”: E(Y | X2 = 1) = β0 + β1 X1 + β2(1) + β3 X1(1) = (β0 + β2) + (β1 + β3) X1.
If “copier is large”: E(Y | X2 = 0) = β0 + β1 X1 + β2(0) + β3 X1(0) = β0 + β1 X1.
• Now β2 shows how much larger (or smaller) the y-intercept of the response function (service time in minutes in this case) is for the class coded 1 (small copiers here) than for the class coded 0 (large copiers here).
  o This difference applies only at the y-intercept, and no longer at any given X1 level, since the slopes differ.
  o The effect of the size of the copier under this regression model depends on the number of copiers to be serviced: interaction effects are present.
Possible scenario: for a smaller number of copiers, small copiers take longer to service, but for a larger number of copiers, large copiers take longer to service. When the response functions cross within the scope of the model like this, the interaction is called disordinal.
Alternatively, large copiers might always tend to take less time to service, with the effect much smaller for a larger number of copiers. In this case, when the nonparallel response functions (with one quantitative and one qualitative variable) do not intersect within the scope of the model, we say that the interaction is ordinal.
EXAMPLE
• Now we have the estimated regression function based on the multiple regression model with an added interaction (cross-product) term:
  Ŷ = 2.813 + 14.339 X1 − 8.141 X2 + 1.777 X1 X2

Call:
lm(formula = time ~ number * size)

Coefficients:
(Intercept)       number         size  number:size
      2.813       14.339       -8.141        1.777
EXAMPLE
For testing H0: β3 = 0 vs. HA: β3 ≠ 0 we can, again, use a t test or F test (as seen in the previous chapters).

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.8131     3.6468   0.771   0.4449
number       14.3394     0.6146  23.333   <2e-16

Analysis of Variance Table

  Res.Df    RSS Df Sum of Sq     F  Pr(>F)
1     42 3410.3
2     41 3154.4  1    255.89 3.326 0.07549

• The p-value = 0.0755. We can only reject β3 = 0 at significance levels of 0.0755 and above.
• We do not have enough evidence to support that there is an interaction effect and may keep our model with no interaction term.
• In fact, from all of our analysis, we may just use “number” as a predictor of the time of the service call.
MODEL SELECTION & VALIDATION: INTRODUCTION TO THE MODEL-BUILDING PROCESS
GENERAL MODEL-BUILDING PROCESS
1. Data collection and preparation
2. Reduction of predictor variables (for exploratory observational studies only)
3. Model refinement and selection
4. Model validation