Description
Guidelines
- Write your response as a research analysis with explanations, in APA format
- Share the code and the plots
- Include your name and ID number
- Upload the Word document and the .ipynb file from Google Colab
HW02 Cover Sheet – Analyze the following dataset
https://archive.ics.uci.edu/ml/datasets/Automobile
The research paper should include
- Introduction
- Dataset attributes
- Dataset clean-up
- Exploratory Data Analysis
- Univariate analysis (individual variables)
- Bivariate analysis (relationships)
- Heat Maps
- Bar charts
- Identification of important features
- Perform a Regression to predict the car prices
Do not copy
References for analysis
Explanation & Answer
There you go buddy, check it out and let me know in case of anything
Running head: DATASET ANALYSIS.
Dataset Analysis.
Student Name.
Institution Affiliation.
Date.
Dataset Analysis.
Introduction.
The attributes of the dataset under analysis are listed below.
1. symboling: -3, -2, -1, 0, 1, 2, 3.
2. normalized-losses: continuous from 65 to 256.
3. make: alfa-romero, audi, bmw, chevrolet, dodge, honda,
isuzu, jaguar, mazda, mercedes-benz, mercury,
mitsubishi, nissan, peugot, plymouth, porsche,
renault, saab, subaru, toyota, volkswagen, volvo
4. fuel-type: diesel, gas.
5. aspiration: std, turbo.
6. num-of-doors: four, two.
7. body-style: hardtop, wagon, sedan, hatchback, convertible.
8. drive-wheels: 4wd, fwd, rwd.
9. engine-location: front, rear.
10. wheel-base: continuous from 86.6 to 120.9.
11. length: continuous from 141.1 to 208.1.
12. width: continuous from 60.3 to 72.3.
13. height: continuous from 47.8 to 59.8.
14. curb-weight: continuous from 1488 to 4066.
15. engine-type: dohc, dohcv, l, ohc, ohcf, ohcv, rotor.
16. num-of-cylinders: eight, five, four, six, three, twelve, two.
17. engine-size: continuous from 61 to 326.
18. fuel-system: 1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi.
19. bore: continuous from 2.54 to 3.94.
20. stroke: continuous from 2.07 to 4.17.
21. compression-ratio: continuous from 7 to 23.
22. horsepower: continuous from 48 to 288.
23. peak-rpm: continuous from 4150 to 6600.
24. city-mpg: continuous from 13 to 49.
25. highway-mpg: continuous from 16 to 54.
26. price: continuous from 5118 to 45400.
To perform data cleaning, we need to run the code below.
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory
import warnings
warnings.filterwarnings("ignore")
import os
print(os.listdir("../input"))
# Any results you write to the current directory are saved as output.
['Automobile_data.csv']
Data Loading
In [2]:
df_automobile = pd.read_csv("../input/Automobile_data.csv")
Data Cleaning
• The data contains "?" placeholders; replace them with NaN.
In [3]:
df_data = df_automobile.replace('?', np.nan)  # treat "?" placeholders as missing values
df_data.isnull().sum()
Out[3]:
The output looks like the one below.
[Output partially lost in extraction: per-column counts of missing values. Legible entries: fuel-type 0, aspiration 0.]
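The replacement step above can be illustrated on a small toy frame. This is only a sketch: the frame `toy` and its values are made up for illustration and are not rows from Automobile_data.csv. It also shows one point that is easy to miss: after "?" is replaced with NaN, columns that held numbers as text still need to be converted to a numeric type before a mean can be computed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a few Automobile columns.
toy = pd.DataFrame({
    "normalized-losses": ["164", "?", "158", "?"],
    "num-of-doors": ["four", "two", "?", "four"],
    "price": ["13950", "17450", "?", "15250"],
})

# Replace the "?" placeholder with NaN and count missing values per column.
cleaned = toy.replace("?", np.nan)
missing_counts = cleaned.isnull().sum()
print(missing_counts)

# The numeric columns were read as text because of "?"; convert them.
numeric_cols = ["normalized-losses", "price"]
cleaned[numeric_cols] = cleaned[numeric_cols].apply(pd.to_numeric)
print(cleaned.dtypes)
```

As an alternative, `pd.read_csv` accepts an `na_values="?"` argument, which marks the placeholder as missing at load time so that numeric columns are parsed as numbers directly.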
Missing Data
• Fill the missing values of normalized-losses, price, horsepower, peak-rpm, bore, and stroke with the respective column mean.
• Fill the missing values of the categorical column num-of-doors with the column mode, i.e. four.
In [4]:
df_temp = df_automobile[df_automobile['normalized-losses']!='?']
normalised_mean = df_temp['normalized-losses'].a...
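The two imputation rules listed under Missing Data can be sketched on a toy frame as follows. This is not the original notebook cell: the frame `toy`, its values, and the names `mean_cols` and `filled` are assumptions made for illustration, and the numeric column is assumed to have already been converted from text to numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are illustrative only.
toy = pd.DataFrame({
    "normalized-losses": [164.0, np.nan, 158.0, np.nan],
    "num-of-doors": ["four", "four", np.nan, "two"],
})

filled = toy.copy()

# Rule 1: fill numeric columns with their own column mean.
mean_cols = ["normalized-losses"]
filled[mean_cols] = filled[mean_cols].fillna(filled[mean_cols].mean())

# Rule 2: fill the categorical column with its most frequent value (mode).
door_mode = filled["num-of-doors"].mode()[0]
filled["num-of-doors"] = filled["num-of-doors"].fillna(door_mode)

print(filled)
```

On the real dataset, `mean_cols` would list all six numeric columns named above. Mean imputation keeps every row but shrinks the variance of the filled columns, which is worth noting when interpreting the later regression.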