problem set 6


Description

I have uploaded two files; please do all the questions correctly.







ECON 3100 PROBLEM SET 6

Textbook questions: Chapter 12: 27, 31, 33, 35, 61, 63

Statistical investigation:
This exercise draws on the dataset “companies,” available in JMP under “sample data setsBusiness and Demographic” and on the course website. The goal is for you to learn to incorporate qualitative variables into multiple regression analysis and to interpret the results appropriately. We will analyze the profits of a set of companies in relationship to those companies’ sales, number of employees, and industry.
1. Run a regression with profits as the dependent variable, sales and the number of employees as the independent variables. Call this Model A, and include the regression output in your problem set.
2. Conduct an F-test at a 5% significance level as to whether there is a useful relationship between the dependent variable and the independent variables of Model A.
3. What are the r-squared and the adjusted r-squared of Model A? Interpret each of these values.
4. Create a dummy variable that takes the value 1 if the company is in the computer industry, 0 if the company is in the pharmaceutical industry.
5. Run a regression with profits as the dependent variable, sales, the number of employees, and the dummy variable for “computer industry” as the independent variables. Call this Model B, and include the regression output in your problem set.
6. Adjusted for the number of variables in the model, which model explains the greater share of variation in profits, Model A or Model B?
7. Interpret the coefficient on the dummy variable for computer industry in Model B.
8. Does industry have a statistically significant relationship to a company’s profits? Conduct a t-test at the 5% significance level as to whether the coefficient on the dummy variable differs from 0.
9. What is the predicted level of profits for a company in the computer industry with median sales and a median number of employees for that industry?
10. What is the predicted level of profits for a company in the pharmaceutical industry with median sales and a median number of employees for that industry?
11. Are there any outliers in the data set? Calculate the standardized (“studentized”) residual of each observation and identify outliers as any observations with a studentized residual greater than 3 in absolute value. Which observation has the largest standardized residual (in absolute value)?
12. Do any of the observations have unusual influence on the results (high leverage)? Use Cook’s distance measure to identify observations with strong leverage.

Everyday statistics
The New Yorker article “Measure for Measure” that follows the problem set tells the story of Francis Galton, the eccentric 19th century scholar who gave us the terms “correlation” and “regression.”
1. Draw a rough sketch of Galton’s graph of children’s height (y) “regressed” on parental height (x). What did Galton find to be the slope of the relationship between the two?
2. Interpret the slope coefficient in question one.
3. Explain carefully why a baseball player who had an extremely good year (batting better than .300, for example) is likely to do worse the following year.
4. What is Galton’s fallacy? Does the author of the article believe that Galton himself committed Galton’s fallacy?
5. Based on the concept of regression to the mean, suppose a student gets an extremely high score on a midterm exam. Should he/she expect to do worse, the same, or better on the final exam? Explain your answer.

Textbook questions

27. Below is a table showing some of the results from the multiple regression analysis described in Exercise 15. Here we were attempting to link winning percentage (y) to daily practice time (x1), average speed of first serve (x2), and first serve percentage (x3) using data from a group of 12 professional tennis players who were monitored over the past year. Build a 95% confidence interval estimate of each of the regression coefficients β1, β2 and β3.

31. The following output is from a study attempting to link monthly entertainment expenditures (y) for women aged 25 to 50 to two independent variables: age (x1) and monthly income (x2). Fill in the missing values.

33. Milton-Maxwell Foods is attempting to use regression analysis to identify factors which might explain the behavior of sales revenue (measured in $ millions) for its franchise restaurants. Dummy variable x5 has been included in the regression model to indicate whether a particular restaurant has curb service. This variable will be assigned a value of 0 if the restaurant has no curb service, and a value of 1 if the restaurant has curb service. Suppose the estimated regression coefficient for x5 turns out to be .136. Interpret this result.

35. Gibson Products is using multiple regression analysis to try to relate a set of independent variables to the number of daily customer inquiries the company receives on its website. Gibson wants to include “season of the year” as one of the variables in the model. Since “season of the year” has four categories (summer, fall, winter, and spring), Gibson has defined three dummy variables, x5, x6, and x7, and assigned values as follows:
Computer output for the analysis provides the following statistically significant coefficients for the three variables: b5 = 217, b6 = −335 and b7 = 564. Using these coefficients and assuming that all the other variables in the model are held constant,
1. what predicted difference in the number of inquiries would you expect for fall days versus summer days?
2. what predicted difference in the number of inquiries would you expect for winter days versus summer days?
3. what predicted difference in the number of inquiries would you expect between spring days and winter days?

61. You are overseeing a study of cigarette smoking in the countries of Southeast Asia. As part of the study you plan to conduct a multiple linear regression analysis to try to explain the variation in smoking rates among the countries, using as the dependent variable, y, the current smoking rate for adult men over the age of 40. The independent variables are x1, the average age at which smokers start smoking, and x2, per-capita government spending on anti-smoking campaigns during the past 5 years. The following data are available:
The analysis produced the estimated regression equation below. (Note: the coefficients have been rounded slightly.)
a. Interpret the coefficients.
b. Calculate the missing values in the following tables:
c. Does the sample evidence show a useful linear relationship (at the 5% significance level) between the dependent variable, y, and the independent variables x1 and x2? Explain.

63. You have just completed a regression study in which you are attempting to link weekly sales of SeaFarer's new sun block spray to three factors: price, advertising and use of a special point-of-purchase promotion. You used a dummy variable, x3, to represent the promotion factor, assigning x3 a value of 0 if no in-store promotion was used and a value of 1 if the promotion was used. Output for the study is given below:

Companies.jmp
Columns (8/0): Type, Size Co, Sales ($M), Profits ($M), # Employ, profit/emp, Assets, %profit/sales
Rows: All rows 32, Selected 0, Excluded 0, Hidden 0, Labelled 0

Row   Type             Size Co
1     Computer         small
2     Pharmaceutical   big
3     Computer         small
4     Pharmaceutical   big
5     Computer         small
6     Pharmaceutical   big
7     Computer         small
8     Computer         small
9     Computer         small
10    Computer         small
11    Computer         small
12    Pharmaceutical   medium
13    Computer         big
14    Computer         small
15    Pharmaceutical   big
16    Pharmaceutical   small
17    Pharmaceutical   medium
18    Computer         big
19    Pharmaceutical   big
20    Computer         medium
21    Pharmaceutical   small
22    Computer         small
23    Pharmaceutical   medium
24    Computer         small
25    Computer         small
26    Computer         small
27    Computer         small
28    Computer         big
29    Pharmaceutical   medium
30    Computer         medium
31    Pharmaceutical   medium
32    Computer         big

Row 1 values: Sales ($M) 855.1, Profits ($M) 31.0, # Employ 7523, profit/emp 4120.70, Assets 615.2, %profit/sales 3.63
[Numeric values for rows 2 to 32 are scrambled in the attachment preview and cannot be matched to rows.]

Explanation & Answer

Attached.

Running head: ECON 3100

Econ 3100

Problem set 6

Statistical investigation:
This exercise draws on the dataset “companies,” available in JMP under “sample data setsBusiness and Demographic” and on the course website. The goal is for you to learn to
incorporate qualitative variables into multiple regression analysis and to interpret the results
appropriately. We will analyze the profits of a set of companies in relationship to those
companies’ sales, number of employees, and industry.
1. Run a regression with profits as the dependent variable, sales and the number of
employees as the independent variables. Call this Model A, and include the
regression output in your problem set.
Coefficients (a)

Model 1       Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)    111.283            78.031                           1.426    .165
sales         .104               .027         1.469               3.906    .001
employees     -.007              .004         -.611               -1.624   .115

a. Dependent Variable: profits

2. Conduct an F-test at a 5% significance level as to whether there is a useful
relationship between the dependent variable and the independent variables of
Model A.
ANOVA (a)

Model 1      Sum of Squares   df   Mean Square    F        Sig.
Regression   14805831.318     2    7402915.659    52.768   .000 (b)
Residual     4068454.211      29   140291.525
Total        18874285.529     31

a. Dependent Variable: profits
b. Predictors: (Constant), employees, sales
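Fcalculated = 52.768
Fcritical = F(k, n-k-1, α), where k = 2 predictors and n = 32 observations
Fcritical = F(2, 32-2-1, 5%)
Fcritical = F(2, 29, 5%)
Fcritical ≈ 3.33
Inference
Fcalc > Fcritical
52.768 > 3.33
Since the calculated F is greater than the critical F, the null hypothesis that all slope
coefficients are zero (that is, that the model is not significant) is rejected. The conclusion is
that the overall model is statistically significant at the 5% level, consistent with the reported
Sig. value of .000.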
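The F-test arithmetic can be checked with a short script. This is a minimal sketch, assuming the mean squares are read from the ANOVA output above. The helper name `f_critical_df1_2` is hypothetical; it uses the closed form that holds only when the numerator degrees of freedom equal 2, so no statistics library is needed.

```python
def f_critical_df1_2(alpha: float, df2: int) -> float:
    """Upper-alpha critical value of an F(2, df2) distribution.

    For numerator df = 2 the tail probability has a closed form,
    P(F > f) = (1 + 2*f/df2) ** (-df2/2), which is inverted here.
    """
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


# Mean squares as read from the Model A ANOVA output.
ms_regression = 7402915.659
ms_residual = 140291.525

n_obs = 32          # companies in the data set
n_predictors = 2    # sales and employees
df_residual = n_obs - n_predictors - 1

f_stat = ms_regression / ms_residual
f_crit = f_critical_df1_2(0.05, df_residual)
reject_null = f_stat > f_crit

print(f"F = {f_stat:.3f}, critical F(2, {df_residual}) = {f_crit:.2f}, "
      f"reject H0: {reject_null}")
```

For other numerator degrees of freedom the closed form does not apply, and a statistical table or library would be needed instead.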
3. What are the r-squared and the adjusted r-squared of Model A? Interpret each of these
values.
Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .886 (a)   .784       .770                374.555

a. Predictors: (Constant), employees, sales
b. Dependent Variable: profits
From the output, the r-squared is 0.784 and the adjusted r-squared is 0.770. The r-squared, also
known as the coefficient of determination, is a measure of goodness of fit: it gives the
proportion of the variation in the dependent variable that the model explains. The higher the
r-squared, the better the model fits the data (Darlington & Hayes, 2016). Here, an r-squared of
0.784 implies that sales and the number of employees together explain 78.4% of the variation in
profits around its mean. The adjusted r-squared, on the other hand, is computed by dividing the
residual mean square by the total mean square and subtracting the result from 1. Because it is
based on mean squares, it adjusts for the degrees of freedom and penalizes predictors that add
little explanatory power, which makes it suitable for comparing models with different numbers of
predictors (Darlington & Hayes, 2016). Here, the adjusted r-squared of 0.770 implies that the
model explains about 77% of the variation in profits after adjusting for the degrees of freedom.
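Both statistics can be recomputed from the sums of squares. The following is a minimal sketch, assuming the values are read from the Model A ANOVA output; the variable names are illustrative only.

```python
# Sums of squares as read from the Model A ANOVA output.
ss_regression = 14805831.318
ss_residual = 4068454.211
ss_total = ss_regression + ss_residual

n_obs = 32          # companies in the data set
n_predictors = 2    # sales and employees

df_residual = n_obs - n_predictors - 1   # 29
df_total = n_obs - 1                     # 31

# R-squared: share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R-squared: 1 minus (residual mean square / total mean square).
adj_r_squared = 1.0 - (ss_residual / df_residual) / (ss_total / df_total)

print(f"R-squared = {r_squared:.3f}, adjusted R-squared = {adj_r_squared:.3f}")
```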


4. Create a dummy variable that takes the value 1 if the company is in the computer
industry, 0 if the company is in the pharmaceutical industry.
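The coding rule in this question can be sketched in a few lines. This is only an illustration, not the procedure used to produce the output in this answer; the function name `computer_dummy` is hypothetical, and the sample labels are the Type values of the first four rows of the dataset.

```python
def computer_dummy(industry: str) -> int:
    """Return 1 for the computer industry, 0 for pharmaceutical."""
    label = industry.strip().lower()
    if label == "computer":
        return 1
    if label == "pharmaceutical":
        return 0
    # Any other label is unexpected for this dataset.
    raise ValueError(f"unexpected industry label: {industry!r}")


# Type values for rows 1 to 4 of the companies dataset.
industries = ["Computer", "Pharmaceutical", "Computer", "Pharmaceutical"]
dummy = [computer_dummy(name) for name in industries]
print(dummy)
```

Raising an error on unknown labels, rather than defaulting to 0, avoids silently coding a mistyped industry as pharmaceutical.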
Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .935 (a)   .874       .860                291.612

a. Predictors: (Constant), dummy, sales, employees
b. Dependent Variable: profits
5. Run a regression with profits as the dependent variable, sales, the number of employees,
and the dummy variable for “computer industry” as the independent variables. Call this
Model B, and include the regression output in your problem set.
Model B
Coefficients (a)

Model 1       Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)    400.713            88.951                           4.505    .000
sales         .100               .021         1.409               4.806    .000
employees     -.006              .003         -.541               -1.845   .076
dummy         -475.169           106.671      -.300               -4.455   .000

a. Dependent Variable: profits
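Each t value in the coefficients output is the coefficient divided by its standard error. As a minimal sketch for the dummy variable, the ratio can be recomputed and compared with a critical value. The figure 2.048 is the standard two-tailed 5% table value for 28 degrees of freedom (n - k - 1 = 32 - 3 - 1); it is hard-coded here as an assumption rather than computed.

```python
# Coefficient and standard error of the dummy, from the Model B output.
coef_dummy = -475.169
se_dummy = 106.671

n_obs = 32          # companies in the data set
n_predictors = 3    # sales, employees, dummy
df_residual = n_obs - n_predictors - 1

# Assumed table value: two-tailed 5% critical t for 28 degrees of freedom.
T_CRITICAL_5PCT_DF28 = 2.048

t_stat = coef_dummy / se_dummy
significant = abs(t_stat) > T_CRITICAL_5PCT_DF28

print(f"t = {t_stat:.3f} on {df_residual} df, "
      f"significant at 5%: {significant}")
```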
6. Adjusted for the number of variables in the model, which model explains the greater
share of variation in profits, Model A or Model B?
Model A
Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .886 (a)   .784       .770                374.555

a. Predictors: (Constant), employees, sales
b. Dependent Variable: profits

Model B
Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .935 (a)   .874       .860                291.612

a. Predictors: (Constant), dummy, sales, employees
b. Dependent Variable: profits
Model B has the higher adjusted r-squared (0.860 versus 0.770 for Model A). Adjusted for the
number of variables in the model, Model B therefore explains the greater share of variation in
profits.
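The comparison can also be reproduced from the reported r-squared values alone. This is a rough sketch, assuming n = 32 and the rounded r-squared figures from the two model summaries; because the inputs are rounded, the results can differ from the reported adjusted values in the third decimal place. The function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared from R-squared, sample size and predictor count."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_OBS = 32  # companies in the data set

adj_a = adjusted_r2(0.784, N_OBS, 2)   # Model A: sales, employees
adj_b = adjusted_r2(0.874, N_OBS, 3)   # Model B: sales, employees, dummy

better = "Model B" if adj_b > adj_a else "Model A"
print(f"Model A: {adj_a:.3f}, Model B: {adj_b:.3f}, preferred: {better}")
```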
7. Interpret the coefficient on the dummy variable for computer industry in model B.
Y = a + b1x1 + b2x2 + b3x3
Profits = a + b1 Sales ($M) + b2 #Employ + b3 Type (computer = 1, pharmaceutical = 0)
Profits = 400.713 + 0.100 Sales ($M) - 0.006 #Employ - 475.169 Type
For the computer industry, Type = 1:
Profits = 400.713 + 0.100 Sales ($M) - 0.006 #Employ - 475.169(1)
Profits = 400.713 + 0.100 Sales ($M) - 0.006 #Employ - 475.169
Profits = -74.456 + 0.100 Sales ($M) - 0.006 #Employ
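The fitted Model B equation can be wrapped in a small function to show how the dummy shifts the intercept. This is a sketch under the rounded coefficients reported in the output; the function name `predict_profit` is hypothetical, and the inputs are the sales and employee figures of the first row of the dataset, used purely for illustration.

```python
# Rounded Model B coefficients from the regression output.
INTERCEPT = 400.713
B_SALES = 0.100         # per $M of sales
B_EMPLOYEES = -0.006    # per employee
B_COMPUTER = -475.169   # shift when Type = 1 (computer industry)


def predict_profit(sales_m: float, employees: float, is_computer: bool) -> float:
    """Predicted profits ($M) from the fitted Model B equation."""
    dummy = 1 if is_computer else 0
    return (INTERCEPT + B_SALES * sales_m
            + B_EMPLOYEES * employees + B_COMPUTER * dummy)


# Same sales and employees, differing only in industry.
sales_m, employees = 855.1, 7523
computer = predict_profit(sales_m, employees, is_computer=True)
pharma = predict_profit(sales_m, employees, is_computer=False)

# The gap equals the dummy coefficient, whatever the other inputs are.
gap = computer - pharma
print(f"computer: {computer:.3f}, pharmaceutical: {pharma:.3f}, gap: {gap:.3f}")
```

Holding sales and employees fixed, the two predictions differ by exactly the dummy coefficient, which is the sense in which the dummy shifts the intercept from 400.713 to -74.456.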
The coefficient on...

