FACULTY OF LAW & BUSINESS
Peter Faber Business School
Semester 2, 2018
DATA201: Data Analytics and Decision Making
Case Study: Don’t Get Kicked
One of the biggest challenges of an auto dealership purchasing a used car at an auto
auction is the risk that the vehicle might have serious issues that prevent it from being
sold to customers. The auto community calls these unfortunate purchases "kicks".
Kicked cars often result when there are tampered odometers, mechanical issues the
dealer is not able to address, issues with getting the vehicle title from the seller, or some
other unforeseen problem. Kicked cars can be very costly to dealers after transportation
cost, throw-away repair work, and market losses in reselling the vehicle.
Data analysts who can figure out which cars have a higher risk of being kick can provide
real value to dealerships trying to provide the best inventory selection possible to their
customers.
The challenge of this case study is to predict if a car purchased at an Auction is a Kick
(bad buy).
The data dictionary, Carvana_Data_Dictionary.txt, and the data files can be downloaded
from LEO under Assessment tab. The data dictionary describes the 34 attributes: RefId,
in the first column, contains the ID number for each record. IsBadBuy, in the second
column, is the binary dependent variable, where a 1 (one) means “is Kick” and 0 (zero)
“is not Kick”. The remaining columns (3 through 30) are independent variables. The
dataset contains records for 72,561 vehicles, of which 12.3% are Kick.
(Adapted from Kaggle competition)
Case Assignment
The overall objective of this assignment is two-fold. First, determine if a classification
tool can be constructed that can effectively assist the buyer in identification of cars likely
to be Kicks. Second, if successful in creating such a tool, describe the tool’s
functionality with respect to input contributions to the Kick classification.
To accomplish this, do the following:
Step 1 (for Assessment Task 1): Data pre-processing
Explore and prepare the dataset for mining as follows:
1. Evaluate and determine what to do with missing values (including blanks and
dummy values such as NULL)
2. Eliminate columns, which contain highly correlated values or values that are
otherwise of no value to the analysis process.
3. Search the dataset and remove obvious outliers.
Use the statistical methods you have learned in weeks 1-5 to explore and pre-process
data. Describe what data pre-processing you did and explain why. Also describe how
you engage with stakeholders to elicit requirements, data and relevant information.
Submit your Report (Word) and Excel file on ePortfolio. Submit your Report that
contains the secret link to your ePortfolio on LEO Turnitin.
See the Unit Outline for information about what to include in your Assignment and how
to submit your it. More information and help are available on LEO / Assessment.
Step 2 (for Assessment Task 2): Constructing the
classification tool
1. Experiment different configurations of the decision tree tool in Knime to find the best
one you can. (NB. The error rate should be less than 15%).
2. It is expected that while exploring this tool, you may need to keep coming back to
explore the dataset to find the best set of inputs for your classification problem.
3. When you are happy with your classification tool, create a dashboard in Tableau or
Excel to present these inputs and how they affect IsBadBuy (Kicks). Be mindful to
choose appropriate visuals for your dashboard.
Use the data analytic methods you have learnt in the whole semester. Explain your
analysis in the experiment. Evaluate your classification tool and explain how it may
assist the buyer to reduce the Kicks rate.
Submit your Report (Word), data file (Excel), Knime file (of your decision tree), and
Dashboard (Tableau or Excel) on ePortfolio. Submit your Report that contains the
secret link to your ePortfolio on LEO Turnitin.
See the Unit Outline for information about what to include in your Assignment and how
to submit your it. More information and help are available on LEO / Assessment.
Field Name
Definition
RefID
Unique (sequential) number assigned to vehicles
IsBadBuy
Identifies if the kicked vehicle was an avoidable
purchase
PurchDate
The Date the vehicle was Purchased at Auction
Auction
Auction provider at which the vehicle was
purchased
VehYear
The manufacturer's year of the vehicle
VehicleAge
The Years elapsed since the manufacturer's year
Make
Vehicle Manufacturer
Model
Vehicle Model
Trim
Vehicle Trim Level
SubModel
Vehicle Submodel
Color
Vehicle Color
Transmission
Vehicles transmission type (Automatic,
Manual)
WheelTypeID
The type id of the vehicle wheel
WheelType
The vehicle wheel type description (Alloy, Covers)
VehOdo
The vehicles odometer reading
Nationality
The Manufacturer's country
Size
The size category of the vehicle (Compact, SUV,
etc.)
TopThreeAmericanName
Identifies if the manufacturer is one of the
top three American manufacturers
MMRAcquisitionAuctionAveragePrice Acquisition price for this vehicle in
average condition at time of purchase
MMRAcquisitionAuctionCleanPrice
Acquisition price for this vehicle in
the above Average condition at time of purchase
MMRAcquisitionRetailAveragePrice
Acquisition price for this vehicle in the
retail market in average condition at time of purchase
MMRAcquisitonRetailCleanPrice
Acquisition price for this vehicle in the
retail market in above average condition at time of purchase
MMRCurrentAuctionAveragePrice
Acquisition price for this vehicle in
average condition as of current day
MMRCurrentAuctionCleanPrice
Acquisition price for this vehicle in the
above condition as of current day
MMRCurrentRetailAveragePrice
Acquisition price for this vehicle in the
retail market in average condition as of current day
MMRCurrentRetailCleanPrice
Acquisition price for this vehicle in the
retail market in above average condition as of current day
PRIMEUNIT
Identifies if the vehicle would have a higher
demand than a standard purchase
AcquisitionType
Identifies how the vehicle was aquired
(Auction buy, trade in, etc)
AUCGUART
The level guarntee provided by auction for the
vehicle (Green light - Guaranteed/arbitratable, Yellow Light - caution/issue,
red light - sold as is)
KickDate
Date the vehicle was kicked back to the auction
BYRNO
Unique number assigned to the buyer that purchased
the vehicle
VNZIP
Zipcode where the car was purchased
VNST
State where the the car was purchased
VehBCost
Acquisition cost paid for the vehicle at time of
purchase
IsOnlineSale
Identifies if the vehicle was originally
purchased online
WarrantyCost
Warranty price (term=36month and
millage=36K)
Purchase answer to see full
attachment