
The data in Used Cars represent characteristics of cars that are currently part of the inventory of a used car dealership. The variables included are car, year, age, price ($), mileage, power (hp), fuel (mpg), region of origin (manufactured in the USA or in a foreign country), and single ownership (Yes = owned by one owner, No = owned by more than one owner). The Excel file for this problem is stored under the module called “Used Cars”. You want to describe each of these variables, and you would like to predict the price of the used cars. Make sure to take appropriate steps to analyze this data set and write a mini report for the car dealer. Also, do you think that the model is missing some important variables? If so, what are those missing variables? Please explain.

Report on Determinants of Used Car Prices

The determinants of used car prices are numerous. This report therefore discusses the most significant factors, as identified by statistical analysis. To reach a reliable conclusion, a sample of 500 used cars at a dealership was selected and data on the sample were collected. Both qualitative and quantitative data were collected. The quantitative data included the price of the car, the year of manufacture (and hence the age of the car), the mileage covered so far, power (HP) and fuel economy (MPG). The qualitative data recorded whether the car was manufactured in the United States or in a foreign country, and whether it had a single owner or more than one owner.

In the data collected, the y-variable is the price of the used car, because it is the dependent variable. The remaining variables were treated as x-variables because each may, in one way or another, influence the price. To understand the significance of the relationship between the dependent variable and the independent variables, a regression analysis was performed. Before running the regression, the data were summarized using descriptive statistics.

As mentioned above, the data collected included both quantitative and qualitative variables. To make the qualitative data quantitative, they were coded using the integers 1 and 0 (McNeill & Chapman, 2005; Carlberg, 2014). For origin, cars manufactured in the US were coded 1, whereas cars manufactured in a foreign country were coded 0. Similarly, cars that had a single owner were coded 1, while those with more than one owner were coded 0. Coding makes it possible to analyze qualitative data quantitatively.
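The 0/1 coding described above can be sketched in a few lines of Python. This is only an illustration: the raw labels ("USA", "Foreign", "Yes", "No") and the function name `encode_car` are assumptions, since the exact spellings used in the Excel file are not shown in this report.

```python
# Hypothetical raw labels; the actual spellings in the Excel file may differ.
ORIGIN_CODES = {"USA": 1, "Foreign": 0}
OWNER_CODES = {"Yes": 1, "No": 0}


def encode_car(origin, single_owner):
    """Return (origin_code, owner_code) as 0/1 indicator values."""
    return ORIGIN_CODES[origin], OWNER_CODES[single_owner]


# Three made-up cars, used only to show the coding.
sample = [("USA", "Yes"), ("Foreign", "No"), ("USA", "No")]
coded = [encode_car(origin, owner) for origin, owner in sample]
print(coded)  # [(1, 1), (0, 0), (1, 0)]

# Summing a 0/1 column counts the cars in the category coded 1.
us_cars = sum(origin_code for origin_code, _ in coded)
print(us_cars)  # 2
```

Because each category is coded 1 or 0, the sum of a coded column is a count and its mean is a proportion, which is how the descriptive statistics for origin and ownership can be read.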

USED CAR PROJECT


Descriptive Data

Statistic              Price ($)       Age       Mileage  Power (HP)  Fuel (MPG)  Region of origin  Single owner
Mean                     9057.80      8.36      92492.00      174.78       25.47              0.71          0.67
Standard Error             93.44      0.21       2303.83        1.34        0.42              0.02          0.02
Median                   9100.00      8.00      91000.00      180.00       27.00              1.00          1.00
Mode                    10600.00      3.00      40000.00      130.00       12.00              1.00          1.00
Standard Deviation       2089.47      4.79      51515.22       29.92        9.44              0.46          0.47
Minimum                  4800.00      1.00       5000.00      130.00       12.00              0.00          0.00
Maximum                 13400.00     16.00     184000.00      220.00       42.00              1.00          1.00
Sum                   4528900.00   4181.00   46246000.00    87390.00    12735.00            353.00        334.00
Count                     500.00    500.00        500.00      500.00      500.00            500.00        500.00

From the descriptive statistics above, it is apparent that the sample consists of 353 cars manufactured in the US and 147 from foreign countries. The sample is also made up of 334 single-owner cars and 166 cars with more than one owner. The statistics show the average price of a used car to be $9,057.80. Based on the sample, the vehicles at the dealership are between 1 and 16 years old, with 3 years being the most common age. Furthermore, the largest mileage covered by a car in the sample is 184,000 miles, the smallest is 5,000 miles, and the most common mileage is 40,000 miles. The descriptive statistics give only an overview of the sample data. Therefore, to analyze the data comprehensively and use the analysis to make inferences about the entire population, a regression analysis was needed.
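Several of the descriptive figures can be checked against one another. The sketch below uses only numbers reported in the descriptive statistics (sum, count and standard deviation of price, and the sums of the two coded variables); the variable names are illustrative.

```python
import math

# Figures reported for price in the descriptive statistics.
n = 500
price_sum = 4_528_900.00
price_sd = 2089.47

# Mean = sum / count; standard error of the mean = SD / sqrt(count).
mean_price = price_sum / n
se_price = price_sd / math.sqrt(n)
print(round(mean_price, 2))  # 9057.8
print(round(se_price, 2))    # 93.44

# For the 0/1 coded variables, the sum is the number of cars coded 1,
# so the remainder of the sample is the group coded 0.
us_cars = 353
single_owner_cars = 334
foreign_cars = n - us_cars
multi_owner_cars = n - single_owner_cars
print(foreign_cars, multi_owner_cars)  # 147 166
```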


Regression Analysis

Regression analysis is a statistical technique used to determine the relationship between variables (Cowan, 2004). Regression output normally has three parts, with the main metrics being the intercept, the coefficients, the R-squared, the adjusted R-squared and the p-values. Before using the results to form a regression model, it was necessary to determine whether all the variables were statistically significant (Cowan, 2004). A variable is considered significant if its p-value is less than the significance level used in the regression analysis (Wang & Jain, 2003). The results below show part three of the regression output, which was used to determine the significance of each variable.

Variable            Coefficients   Standard Error         t Stat       P-value
Intercept            12953.32793      768.6667128    16.85168319   1.18357E-50
Year                           0                0          65535             0
Age                 -285.1267654      46.32344455   -6.155128752             0
Mileage             -0.012553909      0.004304894   -2.916194718   0.003704752
Power (HP)          -8.370190934      8.111615837   -1.031877138   0.302635614
Fuel (MPG)            42.6670949       25.7252244    1.658570368   0.097838437
Region of origin    -28.59946237      52.29839736   -0.546851602   0.584728116
Single owner         69.22228311      50.44309571    1.372284594   0.170598887
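As a rough check on the results above, each t statistic is the coefficient divided by its standard error, and a variable is flagged as significant when its p-value is below the chosen level. The sketch below assumes a significance level of 0.05 and leaves out the intercept and the year row (whose coefficient is reported as 0); the dictionary layout and names are illustrative, not part of the Excel output.

```python
# (coefficient, standard error, p-value) as reported in the results above.
full_model = {
    "Age": (-285.1267654, 46.32344455, 0.0),
    "Mileage": (-0.012553909, 0.004304894, 0.003704752),
    "Power (HP)": (-8.370190934, 8.111615837, 0.302635614),
    "Fuel (MPG)": (42.6670949, 25.7252244, 0.097838437),
    "Region of origin": (-28.59946237, 52.29839736, 0.584728116),
    "Single owner": (69.22228311, 50.44309571, 0.170598887),
}
ALPHA = 0.05  # assumed significance level

# t statistic = coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se, _) in full_model.items()}

# Variables whose p-value is below the significance level.
significant = [name for name, (_, _, p) in full_model.items() if p < ALPHA]

# The variable with the largest p-value is the least significant one.
least_significant = max(full_model, key=lambda name: full_model[name][2])

print(significant)        # ['Age', 'Mileage']
print(least_significant)  # Region of origin
```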

Based on the results above, only age and mileage are significant at the 0.05 level. Year is intentionally left out of this discussion because the year of manufacture determines how old a car is, so it carries the same information as age. The rest of the variables have p-values above the significance level of 0.05. However, it would not be prudent to eliminate all the variables with a p-value greater than 0.05 at once. We therefore eliminated the variables one at a time, starting with the one that had the highest p-value. As the results above show, region of origin is the least significant variable, so it was eliminated first and the regression was run again. This step was repeated until every remaining variable had a p-value below 0.05. The final regression results are shown below.

Part 1

Regression Statistics
Multiple R             0.96742
R Square               0.93589
Adjusted R Square      0.93349
Standard Error         530.636
Observations               500

Part 2

ANOVA
               df           SS          MS          F   Significance F
Regression      4   2038918613   509729653   2413.711                0
Residual      496    139660967   281574.53
Total         500   2178579580

Part 3

Variable     Coefficients   Standard Error        t Stat   P-value   Lower 95%   Upper 95%
Intercept         12192.3      81.31208179    149.943992         0     12032.5    12352.02
Year                    0                0         65535         0           0           0
Age               -292.91      46.08750159    -6.3555889         0    -383.464    -202.362
Mileage           -0.0118      0.004284727     -2.764776    0.0059    -0.02026    -0.00343
Fuel (MPG)        16.1198      2.532104958    6.36616152    0.0000    11.14481    21.09476

Part 1 of the regression output above consists of five metrics: the multiple R, the R-squared, the adjusted R-squared, the standard error and the number of observations. The multiple R measures the strength of the linear relationship between the variables (Wang & Jain, 2003). For our sample, the multiple R is 0.9674, showing a very strong correlation between the age, mileage and fuel economy of a used car and its price. The second metric is the R-squared, which shows what percentage of the variation in the y-variable is explained by the x-variables (McNeill & Chapman, 2005). From the analysis above, 93.59% of the variation in used car prices is explained by variation in the age of the car, the mileage covered and fuel economy. The R-squared is mostly relied upon when the regression has a single independent variable. Where the regression involves multiple x-variables, the adjusted R-squared is preferred (McNeill & Chapman, 2005). The sample adjusted R-squared is 0.9335, meaning that 93.35% of the variation in used car price is explained by variation in the car's age, mileage and fuel economy. The other two metrics in this section are the standard error and the observations. Observations shows the sample size (500), while the standard error (530.64) measures the typical distance between the observed prices and the prices predicted by the model.

The second part of the regression output is the ANOVA table. This section is made up of the degrees of freedom, the sums of squares, the mean squares and the F-test. This section is less commonly used for interpretation, and it is not relied upon in this project either.
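Although the ANOVA section is not used for interpretation here, it ties directly to the regression statistics. As a minimal sketch using only the reported sums of squares and residual degrees of freedom, the R-squared is the regression sum of squares divided by the total, and the standard error of the regression is the square root of the residual mean square. Variable names are illustrative.

```python
import math

# Values reported in the ANOVA section of the final regression output.
ss_regression = 2_038_918_613
ss_residual = 139_660_967
df_residual = 496

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total          # share of variation explained
ms_residual = ss_residual / df_residual       # residual mean square
standard_error = math.sqrt(ms_residual)       # standard error of the regression

print(ss_total)                  # 2178579580
print(round(r_squared, 5))       # 0.93589
print(round(ms_residual, 2))     # 281574.53
print(round(standard_error, 3))  # 530.636
```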

The third and last part of the regression output is the intercept and coefficient section. It is made up of the intercept and the x-variable coefficients, together with their standard errors, t-statistics, p-values and lower and upper 95% limits (Carlberg, 2014). The p-value has been discussed above and shows the statistical significance of a variable. The lower and upper limits give the 95% confidence interval for each coefficient. The most important metric in this section is the coefficient. A variable's coefficient shows the average change in the car price for a one-unit change in that x-variable while holding the other x-variables constant (Carlberg, 2014). A positive coefficient indicates a direct relationship, while a negative coefficient indicates an inverse relationship. The intercept, on the other hand, indicates the average price when all the x-variables are zero. The regression model for used car price determination is therefore y = 12,192.30 - 292.91x1 - 0.0118x2 + 16.12x3, where x1 represents age (years), x2 represents mileage (miles) and x3 represents fuel economy (MPG).
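The fitted model can be written as a small price calculator. The sketch below uses the coefficients from Part 3 of the final regression output; the function name `predict_price` and the example car (5 years old, 60,000 miles, 30 MPG) are made up purely for illustration.

```python
# Coefficients from Part 3 of the final regression output.
INTERCEPT = 12192.3
B_AGE = -292.91      # change in price per additional year of age
B_MILEAGE = -0.0118  # change in price per additional mile
B_FUEL = 16.1198     # change in price per additional mile per gallon


def predict_price(age_years, mileage_miles, fuel_mpg):
    """Predicted used car price in dollars from the fitted linear model."""
    return (
        INTERCEPT
        + B_AGE * age_years
        + B_MILEAGE * mileage_miles
        + B_FUEL * fuel_mpg
    )


# Hypothetical car: 5 years old, 60,000 miles, 30 MPG.
example_price = predict_price(5, 60_000, 30)
print(round(example_price, 2))  # 10503.34
```

Holding mileage and fuel economy fixed, each extra year of age lowers the predicted price by about $292.91, which is the interpretation of the age coefficient given above.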

In conclusion, the older a car is, the cheaper it is, and the same applies to mileage covered. A car with better fuel economy (more miles per gallon), however, tends to cost more. The regression model indicates a very strong relationship, as shown by an R-squared of 93.59%. However, this value is not 100%, meaning that roughly 6% of the variation in used car prices remains unexplained by the model. One such missing factor would be the car brand. Different new cars have different prices based on their brands, and this does not change just because the car is second-hand. Brand is therefore likely to be a significant determinant of a used car's price.


References

Carlberg, C. G. (2014). Statistical analysis: Microsoft Excel 2013. Indianapolis, IN: Que.

Cowan, G. (2004). Statistical data analysis. Oxford: Clarendon Press.

McNeill, P., & Chapman, S. (2005). Research methods. London: Routledge.

Wang, G. C., & Jain, C. L. (2003). Regression analysis: Modeling & forecasting. Flushing, NY: Graceway Pub.

