### Unformatted Attachment Preview

Data Analytics Research Project – Executive Compensation Model
Status: Required
Due Date: May 1, 2020
Project Goals
The main goal of this project is to help students to apply quantitative analysis techniques including
statistical methods particularly predictive analytics to an empirical analyze and predict the
relationship between the executive compensation and a set of covariates that affect such
compensation. Each student is expected to analyze and predict the relationship, draw conclusions,
make policy recommendations and compose a written report.
Learning objectives
Upon completing this research project, the student will be able to:
•
Describe the relationship between executive compensation and factors that affect it;
•
Develop a regression model to analyze the relationship between executive compensation
and a set of covariates;
•
Collect relevant data and apply data analytics to describe empirically the relationship;
•
Use predetermined values of the covariates to predict the executive compensation in the
future;
•
Draw informed conclusions.
Problem Statement
The data used to analyze this data analytics project is obtained from the Warton Research Data
Services (WRDS). The Compustat Execucomp database provided executive compensation data
collected directly from each company’s annual proxy (DEF14A SEC form). Detailed information
on salary, bonus, options and stock awards, non-equity incentive plans, pensions and other
compensation items are available on annual basis since 1992 (see AnnComp – Summary
compensation data + other). The data in each table contains additional header information on
company IDs and individual identification.
The data to be used is provided in the attached excel format.
Resources
The following are resources you can use to improve knowledge about regression analysis.
•
You can obtain additional knowledge on statistical analysis and the use of the SPSS
software by completing the Statistics 101 on IBM Cognitive class using the following link
https://cognitiveclass.ai/courses/statistics-101/
•
Simply create an account using your BSU email and complete the course. You will learn,
obtain a certificate. Your certificate of completion must be attached to your report.
•
https://www.youtube.com/watch?reload=9&v=fO7g0pnWaRA
•
https://www.tableau.com/learn/training
Assignment
Task 1 – Data Cleaning
The data is not provided in a suitable state. It is necessary to get the data into a proper form that
supports your analysis, that is, you are expected to make it ready for the analysis by cleaning and
reconciling it using Excel or Tableau. Each student is expected to use data of a single state.
Task 2 – Data Visualization
You are expected to create various visualizations using Excel or Tableau to detect the
relationship between the executive compensation and each of its covariate.
Task 3 – Predictive Analytics
Develop and estimate a multiple regression model to determine the relationship between
executive compensation and its covariates. The equation to be estimated is specified below.
𝐓𝐎𝐓𝐀𝐋_𝐂𝐔𝐑𝐑 = 𝜷𝟎 + 𝜷𝟏 𝑨𝑮𝑬 + 𝜷𝟐 𝑮𝑬𝑵𝑫𝑬𝑹 + 𝜷𝟑 𝐒𝐇𝐑𝐎𝐖𝐍𝐓𝐎𝐓 + 𝜷𝟒 𝐄𝐗𝐄𝐂𝐑𝐀𝐍𝐊 + 𝛆
where:
TOTAL_CURR is Total Current Compensation (Salary + Bonus);
AGE is executive's age;
GENDER is gender;
EXECRANK is current rank by salary + bonus; and
𝜀 is the random error.
Assess the quality of your results of estimation in term of the fitness of the regression model (i.e.
R-squared and standard deviation) and the hypothesis test on each estimated coefficient (using Fstat or p-value or t-stat). Also, interpret the results of estimation.
Task 4 – Report
Write a paper to present the results of your analysis and make policy recommendations for the
determination of the salaries and bonuses of the executive personnel.
Each project must include at least 5 pages excluding title page, cover page and references. It will
be written using the following guidelines and contents:
•
Title page (Include project title and student name) (5%)
•
Introduction: Problem of the proposed study, purpose and justification of the study (15%)
•
Data analytics – various calculations and estimations (45%)
•
Interpretation of results (15%)
•
Findings and conclusion. (10%)
•
Appendices: Tables and Figures. (5%)
•
References (5%)
The project will be written using the APA style. (https://apastyle.apa.org/index)
Citation instructions
To cite this data, use the following format:
Wharton Research Data Services. " Compustat Execucomp data (Compustat report)"
wrds.wharton.upenn.edu, accessed 04/08/2020.
Variable Name
Data Type
Variable Description
AGE
NUM
AGE -- Executive's Age
EXECRANK
NUM
Current Rank by Salary + Bonus
EXECRANKANN NUM
EXECRANKANN -- Executive Rank by Salary + Bonus
GENDER
CHAR
Gender
GVKEY
CHAR
Company ID Number
JOINED_CO
NUM
Date Joined Company
LEFTCO
NUM
Date Left Company
PAGE
NUM
Present Age
SALARY
NUM
SALARY -- Salary ($)
SHROWN_TOT
NUM
SHROWN_TOT -- Shares Owned - As Reported
STATE
CHAR
State
TOTAL_CURR
NUM
TOTAL_CURR -- Total Current Compensation (Salary +
Bonus)
YEAR
NUM
YEAR -- Fiscal Year
1
Quantitative Analysis Project – Data Analytics
2
INTRODUCTION
There is a new real estate company in Bowie Maryland known as ABC Reality that was
founded by Mary Clay. Her goal is to determine what the relationship is between the price of
houses that are currently listed and the number of stories, bedrooms, bathrooms in the house, and
the age of the house. The data that Mary has acquired to begin this research is a huge set of data
that was collected in 2017. This source of data was obtained from a 2017 American House
Survey. To do this research, Mary will need to develop a hedonic pricing model with the help of
SPSS. Visualizations will also be created from her data through Tableau.
PART I: Data Visualizations (Tableau)
The first graph above shows that as the number of stories within a house increase, the
average market value increases as well. So, the more stories that are in a house the higher the
price will be to buy it. The second graph shows that as the age of the house increase, the average
market value decreases. So, the older the house gets the lower the price is to buy it. Also, this
second graph shows that basically after about ten years there is a huge drop in price for the
3
houses. This could possibly be that it is outdated after that amount of years, so it loses its value
and not a lot of buyers are willing to buy a house that is outdated. An article states that “A few
changes, such as replacing the fixtures, choosing modern paint colors, and installing newer
appliances, are often enough to generate interest at a better sales price. Otherwise, buyers will
generally not agree to pay top-of-market for an outdated home, even if it is newer and clean
(Weintraub, 2019).”
The first graph above shows that as the number of bedrooms increase, the average market
value increases as well. So basically, the more bedrooms there are in a house the higher it will be
in price. The second graph above shows that as the number of bathrooms increase, the average
market value increases as well. This is the same concept as the bedrooms because the more
bathrooms that are in a house the higher it will be in price. This second graph also shows that
there is a decrease in price after 6 bathrooms up until about 9 and then it starts to increase again.
This could be because the average maximum for bathrooms within a decent size house could be
six or the maximum amount a person would want. But once you go pass that, that many
4
bathrooms are not needed unless you’re living in a mansion or a really huge house. An article
from the year 2012 states that “Real estate brokers who cater to the moneyed say their clients
typically want homes that have at least two bathrooms for every bedroom (Beale, 2012).” Also,
“For those who can afford it, an abundance of bathrooms provides convenience and privacy for
both guest and residents (Beale, 2012).” The graph above is data collected from the year 2017 so
that could also be why there is that increase again from 9 up until 13 bathrooms.
PART II: Predictive Analysis – Multiple Regression Model (SPSS)
Coefficientsa
Standardized
Unstandardized Coefficients
Model
1
B
(Constant)
Std. Error
491428.123
11019.667
Stories
13107.885
2155.972
Age
-3591.484
Bedrooms
Bathrooms
Coefficients
Beta
t
Sig.
44.596
.000
.031
6.080
.000
109.236
-.175
-32.878
.000
-9881.423
3152.644
-.020
-3.134
.002
9621.542
2334.918
.028
4.121
.000
a. Dependent Variable: MarketVal
The estimated equation for this specific regression would be as follows:
̂
𝑀𝑎𝑟𝑘𝑒𝑡𝑉𝑎𝑙
= 491428.123 + 13107.885𝑆𝑡𝑜𝑟𝑖𝑒𝑠 − 3591.484𝐴𝑔𝑒 − 9881.423𝐵𝑒𝑑𝑟𝑜𝑜𝑚𝑠
+ 9621.542𝐵𝑎𝑡ℎ𝑟𝑜𝑜𝑚𝑠
From the regression results shown above, it basically explains that all the including
variables such as stories, age, bedrooms, and bathrooms are significant determinants of the
market value because all of the p-values are less than .05. Also, all of the variables have the
expected sign except for the number of bedrooms. It should be positive for bedrooms because the
higher the market value or the price of the house, the more bedrooms there are within that house.
5
It is possible that the data may not be good because it is showing that bedrooms have the wrong
sign. Because of this we will try running another regression without bedrooms included.
Model Summary
Model
R
.185a
1
R Square
Adjusted R
Std. Error of the
Square
Estimate
.034
.034
520122.46263
a. Predictors: (Constant), Bathrooms, Stories, Age, Bedrooms
Above is the Model Summary which is a test for the Goodness of Fit. It is shown that R
Square is equal to .034 which is also 3.4%. So, 3.4% of the variability in the Market Value is
explained by the regression equation based on stories, age, bedrooms, and bathrooms. This
percentage is actually bad because a lot of the variables should have been included, which is why
it is low.
ANOVAa
Model
1
Sum of Squares
Regression
Residual
df
Mean Square
3721494325257
4 9303735813143
38.000
4.500
1048888742731
38772
1086103685984
343.911
Sig.
.000b
270527376130.
9086.000
Total
F
173
38776
4824.000
a. Dependent Variable: MarketVal
b. Predictors: (Constant), Bathrooms, Stories, Age, Bedrooms
The ANOVA chart above is a test for the overall significance of the model. Because the
estimated equation for the regression was obtained, it is required that the estimated model is
tested to see if it is significant or not. This is done by testing the null hypothesis against the
alternative as shown below:
6
𝐻0 : 𝛽1 = 𝛽2 = 𝛽3 = 𝛽4 = 0
𝐻𝐴 : 𝛽1 ≠ 𝛽2 ≠ 𝛽3 ≠ 𝛽4 ≠ 0
The chart above also reports the p-value which is shown as ‘Sig’. Since, 0.000 < 0.05 this
means the overall model is strongly significant. So, all the independent variables (stories, age,
bedrooms, bathrooms) jointly determine the dependent variable (MarketVal).
Below is another regression that was ran without the variable bedrooms.
Model Summary
Model
R
1
.184a
Adjusted R
Std. Error of
Square
the Estimate
R Square
.034
.034 520181.64438
a. Predictors: (Constant), Bathrooms, Stories, Age
ANOVAa
Sum of
Model
1
Squares
df
Regression 36949176699
3
Mean Square
F
1231639223
455.170
8792.000
Residual
.000b
32930.670
10491545092
38773
2705889431
846032.000
Total
Sig.
52.349
10861036859
38776
844824.000
a. Dependent Variable: MarketVal
b. Predictors: (Constant), Bathrooms, Stories, Age
Coefficientsa
Model
1
Unstandardized
Standardized
Coefficients
Coefficients
B
(Constant) 474911.988
Std. Error
9679.228
Beta
t
49.065
Sig.
.000
7
Stories
14595.384
2103.329
.035
6.939
.000
Age
-3626.263
108.683
-.177
-33.365
.000
5157.643
1850.483
.015
2.787
.005
Bathrooms
a. Dependent Variable: MarketVal
As you can see, R Squared provided in the Model Summary is still equal to 3.4%. So, the
percentage is still bad due to all the variables not being included. Then there is the ANOVA chart
which still shows that the p-value is 0.000 < 0.05. So, the model is still strongly significant
overall. Lastly, there is the coefficients chart where the estimated equation for this specific
regression was formulated. It is basically the same thing as the last one, but the numbers are
slightly different and all the signs are as expected. After running this second regression and
seeing that it is basically the same as the last one, this just means that the data given is not good.
CONCLUSION
This research was done so that Mary Clay can determine the relationship between the
listing price for houses and the number of stories, bedrooms, bathrooms in the house, and the age
of the house. What she found using the hedonic pricing model and her data is that the more
stories, bedrooms, and bathrooms a house has, the higher the price is to buy it. Also, the older the
house is the lower the price is to buy it. With that knowledge Mary will now know how to make
a sound business decision and what most of her clients will be looking for when it comes to her
new real estate company.
8
References
Beale, L. (2012, March 2). Wealthy Home Buyers Demand Bathrooms. Retrieved from LA
Times: https://www.latimes.com/home/la-xpm-2012-mar-02-la-fi-many-bathrooms20120303-story.html
Weintraub, E. (2019, July 14). Tips for Buying a House that Needs Work. Retrieved from The
Balance: https://www.thebalance.com/buying-a-house-that-needs-work-1798264
...