Unformatted Attachment Preview
Data Analytics Research Project – Executive Compensation Model
Due Date: May 1, 2020
The main goal of this project is to help students to apply quantitative analysis techniques including
statistical methods particularly predictive analytics to an empirical analyze and predict the
relationship between the executive compensation and a set of covariates that affect such
compensation. Each student is expected to analyze and predict the relationship, draw conclusions,
make policy recommendations and compose a written report.
Upon completing this research project, the student will be able to:
Describe the relationship between executive compensation and factors that affect it;
Develop a regression model to analyze the relationship between executive compensation
and a set of covariates;
Collect relevant data and apply data analytics to describe empirically the relationship;
Use predetermined values of the covariates to predict the executive compensation in the
Draw informed conclusions.
The data used to analyze this data analytics project is obtained from the Warton Research Data
Services (WRDS). The Compustat Execucomp database provided executive compensation data
collected directly from each company’s annual proxy (DEF14A SEC form). Detailed information
on salary, bonus, options and stock awards, non-equity incentive plans, pensions and other
compensation items are available on annual basis since 1992 (see AnnComp – Summary
compensation data + other). The data in each table contains additional header information on
company IDs and individual identification.
The data to be used is provided in the attached excel format.
The following are resources you can use to improve knowledge about regression analysis.
You can obtain additional knowledge on statistical analysis and the use of the SPSS
software by completing the Statistics 101 on IBM Cognitive class using the following link
Simply create an account using your BSU email and complete the course. You will learn,
obtain a certificate. Your certificate of completion must be attached to your report.
Task 1 – Data Cleaning
The data is not provided in a suitable state. It is necessary to get the data into a proper form that
supports your analysis, that is, you are expected to make it ready for the analysis by cleaning and
reconciling it using Excel or Tableau. Each student is expected to use data of a single state.
Task 2 – Data Visualization
You are expected to create various visualizations using Excel or Tableau to detect the
relationship between the executive compensation and each of its covariate.
Task 3 – Predictive Analytics
Develop and estimate a multiple regression model to determine the relationship between
executive compensation and its covariates. The equation to be estimated is specified below.
𝐓𝐎𝐓𝐀𝐋_𝐂𝐔𝐑𝐑 = 𝜷𝟎 + 𝜷𝟏 𝑨𝑮𝑬 + 𝜷𝟐 𝑮𝑬𝑵𝑫𝑬𝑹 + 𝜷𝟑 𝐒𝐇𝐑𝐎𝐖𝐍𝐓𝐎𝐓 + 𝜷𝟒 𝐄𝐗𝐄𝐂𝐑𝐀𝐍𝐊 + 𝛆
TOTAL_CURR is Total Current Compensation (Salary + Bonus);
AGE is executive's age;
GENDER is gender;
EXECRANK is current rank by salary + bonus; and
𝜀 is the random error.
Assess the quality of your results of estimation in term of the fitness of the regression model (i.e.
R-squared and standard deviation) and the hypothesis test on each estimated coefficient (using Fstat or p-value or t-stat). Also, interpret the results of estimation.
Task 4 – Report
Write a paper to present the results of your analysis and make policy recommendations for the
determination of the salaries and bonuses of the executive personnel.
Each project must include at least 5 pages excluding title page, cover page and references. It will
be written using the following guidelines and contents:
Title page (Include project title and student name) (5%)
Introduction: Problem of the proposed study, purpose and justification of the study (15%)
Data analytics – various calculations and estimations (45%)
Interpretation of results (15%)
Findings and conclusion. (10%)
Appendices: Tables and Figures. (5%)
The project will be written using the APA style. (https://apastyle.apa.org/index)
To cite this data, use the following format:
Wharton Research Data Services. " Compustat Execucomp data (Compustat report)"
wrds.wharton.upenn.edu, accessed 04/08/2020.
AGE -- Executive's Age
Current Rank by Salary + Bonus
EXECRANKANN -- Executive Rank by Salary + Bonus
Company ID Number
Date Joined Company
Date Left Company
SALARY -- Salary ($)
SHROWN_TOT -- Shares Owned - As Reported
TOTAL_CURR -- Total Current Compensation (Salary +
YEAR -- Fiscal Year
Quantitative Analysis Project – Data Analytics
There is a new real estate company in Bowie Maryland known as ABC Reality that was
founded by Mary Clay. Her goal is to determine what the relationship is between the price of
houses that are currently listed and the number of stories, bedrooms, bathrooms in the house, and
the age of the house. The data that Mary has acquired to begin this research is a huge set of data
that was collected in 2017. This source of data was obtained from a 2017 American House
Survey. To do this research, Mary will need to develop a hedonic pricing model with the help of
SPSS. Visualizations will also be created from her data through Tableau.
PART I: Data Visualizations (Tableau)
The first graph above shows that as the number of stories within a house increase, the
average market value increases as well. So, the more stories that are in a house the higher the
price will be to buy it. The second graph shows that as the age of the house increase, the average
market value decreases. So, the older the house gets the lower the price is to buy it. Also, this
second graph shows that basically after about ten years there is a huge drop in price for the
houses. This could possibly be that it is outdated after that amount of years, so it loses its value
and not a lot of buyers are willing to buy a house that is outdated. An article states that “A few
changes, such as replacing the fixtures, choosing modern paint colors, and installing newer
appliances, are often enough to generate interest at a better sales price. Otherwise, buyers will
generally not agree to pay top-of-market for an outdated home, even if it is newer and clean
The first graph above shows that as the number of bedrooms increase, the average market
value increases as well. So basically, the more bedrooms there are in a house the higher it will be
in price. The second graph above shows that as the number of bathrooms increase, the average
market value increases as well. This is the same concept as the bedrooms because the more
bathrooms that are in a house the higher it will be in price. This second graph also shows that
there is a decrease in price after 6 bathrooms up until about 9 and then it starts to increase again.
This could be because the average maximum for bathrooms within a decent size house could be
six or the maximum amount a person would want. But once you go pass that, that many
bathrooms are not needed unless you’re living in a mansion or a really huge house. An article
from the year 2012 states that “Real estate brokers who cater to the moneyed say their clients
typically want homes that have at least two bathrooms for every bedroom (Beale, 2012).” Also,
“For those who can afford it, an abundance of bathrooms provides convenience and privacy for
both guest and residents (Beale, 2012).” The graph above is data collected from the year 2017 so
that could also be why there is that increase again from 9 up until 13 bathrooms.
PART II: Predictive Analysis – Multiple Regression Model (SPSS)
a. Dependent Variable: MarketVal
The estimated equation for this specific regression would be as follows:
= 491428.123 + 13107.885𝑆𝑡𝑜𝑟𝑖𝑒𝑠 − 3591.484𝐴𝑔𝑒 − 9881.423𝐵𝑒𝑑𝑟𝑜𝑜𝑚𝑠
From the regression results shown above, it basically explains that all the including
variables such as stories, age, bedrooms, and bathrooms are significant determinants of the
market value because all of the p-values are less than .05. Also, all of the variables have the
expected sign except for the number of bedrooms. It should be positive for bedrooms because the
higher the market value or the price of the house, the more bedrooms there are within that house.
It is possible that the data may not be good because it is showing that bedrooms have the wrong
sign. Because of this we will try running another regression without bedrooms included.
Std. Error of the
a. Predictors: (Constant), Bathrooms, Stories, Age, Bedrooms
Above is the Model Summary which is a test for the Goodness of Fit. It is shown that R
Square is equal to .034 which is also 3.4%. So, 3.4% of the variability in the Market Value is
explained by the regression equation based on stories, age, bedrooms, and bathrooms. This
percentage is actually bad because a lot of the variables should have been included, which is why
it is low.
Sum of Squares
a. Dependent Variable: MarketVal
b. Predictors: (Constant), Bathrooms, Stories, Age, Bedrooms
The ANOVA chart above is a test for the overall significance of the model. Because the
estimated equation for the regression was obtained, it is required that the estimated model is
tested to see if it is significant or not. This is done by testing the null hypothesis against the
alternative as shown below:
𝐻0 : 𝛽1 = 𝛽2 = 𝛽3 = 𝛽4 = 0
𝐻𝐴 : 𝛽1 ≠ 𝛽2 ≠ 𝛽3 ≠ 𝛽4 ≠ 0
The chart above also reports the p-value which is shown as ‘Sig’. Since, 0.000 < 0.05 this
means the overall model is strongly significant. So, all the independent variables (stories, age,
bedrooms, bathrooms) jointly determine the dependent variable (MarketVal).
Below is another regression that was ran without the variable bedrooms.
Std. Error of
a. Predictors: (Constant), Bathrooms, Stories, Age
a. Dependent Variable: MarketVal
b. Predictors: (Constant), Bathrooms, Stories, Age
a. Dependent Variable: MarketVal
As you can see, R Squared provided in the Model Summary is still equal to 3.4%. So, the
percentage is still bad due to all the variables not being included. Then there is the ANOVA chart
which still shows that the p-value is 0.000 < 0.05. So, the model is still strongly significant
overall. Lastly, there is the coefficients chart where the estimated equation for this specific
regression was formulated. It is basically the same thing as the last one, but the numbers are
slightly different and all the signs are as expected. After running this second regression and
seeing that it is basically the same as the last one, this just means that the data given is not good.
This research was done so that Mary Clay can determine the relationship between the
listing price for houses and the number of stories, bedrooms, bathrooms in the house, and the age
of the house. What she found using the hedonic pricing model and her data is that the more
stories, bedrooms, and bathrooms a house has, the higher the price is to buy it. Also, the older the
house is the lower the price is to buy it. With that knowledge Mary will now know how to make
a sound business decision and what most of her clients will be looking for when it comes to her
new real estate company.
Beale, L. (2012, March 2). Wealthy Home Buyers Demand Bathrooms. Retrieved from LA
Weintraub, E. (2019, July 14). Tips for Buying a House that Needs Work. Retrieved from The