University of Kentucky Dataset Analysis Project

Description

Guidelines

  • Write your response as a research analysis with explanations, in APA format
  • Share the code and the plots
  • Include your name and ID number
  • Upload a Word document and the .ipynb file from Google Colab

HW02 Cover Sheet – Analyze the following dataset

https://archive.ics.uci.edu/ml/datasets/Automobile

The research paper should include

  • Introduction
    • Dataset attributes
    • Dataset clean-up
  • Exploratory Data Analysis
    • Univariate analysis (individual variables)
    • Bivariate analysis (relationships)
    • Heat Maps
    • Bar charts
    • Identification of important features
  • Perform a Regression to predict the car prices

Do not copy

References for analysis


Explanation & Answer

There you go buddy, check it out and let me know in case of anything

Running head: DATASET ANALYSIS.

Dataset Analysis.
Student Name.
Institution Affiliation.
Date.

DATASET ANALYSIS.

Dataset Analysis.
Introduction.
The dataset being analyzed has the following attributes.
1. symboling: -3, -2, -1, 0, 1, 2, 3.
2. normalized-losses: continuous from 65 to 256.
3. make: alfa-romero, audi, bmw, chevrolet, dodge, honda,
isuzu, jaguar, mazda, mercedes-benz, mercury,
mitsubishi, nissan, peugot, plymouth, porsche,
renault, saab, subaru, toyota, volkswagen, volvo
4. fuel-type: diesel, gas.
5. aspiration: std, turbo.
6. num-of-doors: four, two.
7. body-style: hardtop, wagon, sedan, hatchback, convertible.
8. drive-wheels: 4wd, fwd, rwd.
9. engine-location: front, rear.
10. wheel-base: continuous from 86.6 to 120.9.
11. length: continuous from 141.1 to 208.1.
12. width: continuous from 60.3 to 72.3.
13. height: continuous from 47.8 to 59.8.
14. curb-weight: continuous from 1488 to 4066.
15. engine-type: dohc, dohcv, l, ohc, ohcf, ohcv, rotor.
16. num-of-cylinders: eight, five, four, six, three, twelve, two.
17. engine-size: continuous from 61 to 326.

18. fuel-system: 1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi.
19. bore: continuous from 2.54 to 3.94.
20. stroke: continuous from 2.07 to 4.17.
21. compression-ratio: continuous from 7 to 23.
22. horsepower: continuous from 48 to 288.
23. peak-rpm: continuous from 4150 to 6600.
24. city-mpg: continuous from 13 to 49.
25. highway-mpg: continuous from 16 to 54.
26. price: continuous from 5118 to 45400.
To perform data cleaning, we need to run the code below.
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory
import warnings
warnings.filterwarnings("ignore")

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.
['Automobile_data.csv']
Data Loading
In [2]:
df_automobile = pd.read_csv("../input/Automobile_data.csv")
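The cell above reads a CSV that already carries a header row. The raw file in the UCI repository has no header, so the 26 attribute names from the introduction would have to be supplied when reading it directly. A minimal sketch under that assumption follows; the helper name `load_automobile`, the `COLUMNS` list, and the two sample rows are illustrative and not part of the original notebook.

```python
import io
import pandas as pd

# Attribute names in the order given in the introduction.
COLUMNS = [
    "symboling", "normalized-losses", "make", "fuel-type", "aspiration",
    "num-of-doors", "body-style", "drive-wheels", "engine-location",
    "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
    "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
    "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg",
    "price",
]

def load_automobile(source):
    """Read a headerless automobile file (path or file-like object).

    Hypothetical helper: "?" entries are parsed as NaN at read time.
    """
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")

# Two illustrative rows in the raw comma-separated format (demo only).
sample = io.StringIO(
    "3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.60,168.80,64.10,"
    "48.80,2548,dohc,four,130,mpfi,3.47,2.68,9.00,111,5000,21,27,13495\n"
    "1,164,audi,gas,std,four,sedan,fwd,front,99.80,176.60,66.20,"
    "54.30,2337,ohc,four,109,mpfi,3.19,3.40,10.00,102,5500,24,30,13950\n"
)
df_sample = load_automobile(sample)
print(df_sample.shape)
print(df_sample["normalized-losses"].isnull().sum())
```

Passing `na_values="?"` handles the placeholder while the file is parsed, so affected numeric columns arrive as floats rather than text.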
Data Cleaning


The data uses "?" to mark missing values; replace it with NaN.

In [3]:
df_data = df_automobile.replace('?', np.nan)  # np.NAN was removed in NumPy 2.0
df_data.isnull().sum()
Out[3]:
The output looks like the one below.

[Output: count of missing values per column. Only fragments survive in the source: fuel-type 0, aspiration 0.]

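Replacing "?" with NaN does not change a column's dtype: a column that contained "?" was read as text and stays text until it is converted. A minimal sketch of that conversion follows, assuming the affected columns hold only numbers and "?"; the small `df_demo` frame and the `NUMERIC_WITH_GAPS` name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset: numbers stored as strings, "?" for gaps.
df_demo = pd.DataFrame({
    "normalized-losses": ["?", "164", "158"],
    "horsepower": ["111", "?", "110"],
    "price": ["13495", "16500", "?"],
})

# Columns assumed to contain "?" in the real data (a subset is shown here).
NUMERIC_WITH_GAPS = ["normalized-losses", "horsepower", "price"]

df_clean = df_demo.replace("?", np.nan)
# pd.to_numeric raises on any unexpected non-numeric token (default errors="raise").
df_clean[NUMERIC_WITH_GAPS] = df_clean[NUMERIC_WITH_GAPS].apply(pd.to_numeric)

print(df_clean.dtypes)
print(df_clean.isnull().sum())
```

After the conversion, `mean()` and other numeric operations work on these columns directly.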
Missing Data


  • Fill missing values in normalized-losses, price, horsepower, peak-rpm, bore, and stroke with the respective column mean.
  • Fill missing values in the categorical column num-of-doors with the column mode, i.e. four.

In [4]:
df_temp = df_automobile[df_automobile['normalized-losses']!='?']
normalised_mean = df_temp['normalized-losses'].a...
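The imputation rule stated under Missing Data (column mean for the numeric columns, column mode for num-of-doors) can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the `impute` helper and the `df_demo` frame are hypothetical, and "?" is assumed to have been replaced with NaN already.

```python
import numpy as np
import pandas as pd

def impute(df, mean_cols, mode_cols):
    """Return a copy with NaNs filled by column mean (numeric) or mode (categorical)."""
    out = df.copy()
    for col in mean_cols:
        out[col] = pd.to_numeric(out[col])            # strings -> floats, NaN kept
        out[col] = out[col].fillna(out[col].mean())   # mean() skips NaN
    for col in mode_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])  # most frequent value
    return out

# Hypothetical miniature of the dataset after "?" was replaced with NaN.
df_demo = pd.DataFrame({
    "horsepower": ["100", np.nan, "120"],
    "num-of-doors": ["four", "four", np.nan],
})
df_filled = impute(df_demo, mean_cols=["horsepower"], mode_cols=["num-of-doors"])
print(df_filled)
```

On the full dataset the call would presumably take the six numeric columns listed above as `mean_cols` and `["num-of-doors"]` as `mode_cols`.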

