Data Analytics Exercise

1. We have begun assessing whether the dataset is appropriate for analysis. The first
step was to check whether it contains continuous numeric variables, since these allow us
to run the appropriate summary statistics. Table 1 defines the summary statistics we
used. We obtained the descriptive statistics by running Characterize Data in SAS
(Table 2), Summary Statistics in SAS (Table 3), and Descriptive Statistics in Excel
(Table 4).
Table 1: Definitions of summary statistics

Statistic            Definition
Mean                 The arithmetic average of a set of numbers.
Standard Error       A measure of the precision of an estimate, equal to the standard
                     deviation of the sampling distribution of that estimate.
Median               The value separating the higher half from the lower half of a
                     data sample.
Mode                 The value that appears most often in a set of data values.
Standard Deviation   A quantity expressing how much the members of a group differ from
                     the mean value for the group; the square root of the variance.
Sample Variance      The sum of the squared differences from the mean, divided by the
                     count minus one.
Kurtosis             A measure of how heavy the tails of a frequency distribution are,
                     often described as the sharpness of its peak.
Skewness             A measure of the asymmetry of a distribution about its mean.
Range                The difference between the highest and lowest values.
Minimum              The smallest value in the data.
Maximum              The largest value in the data.
Sum                  The total obtained by adding all the values.
Count                The number of non-missing values.

Table 2: Descriptive Statistics for Numeric Variables

Variable     Label               N  N Miss      Minimum         Mean       Median      Maximum
MSRP                           428       0     10280.00     32774.86     27635.00    192465.00
Invoice                        428       0      9875.00     30014.70     25294.50    173560.00
EngineSize   Engine Size (L)   428       0    1.3000000    3.1967290    3.0000000    8.3000000
Cylinders                      426       2    3.0000000    5.8075117    6.0000000   12.0000000
Horsepower                     428       0   73.0000000  215.8855140  210.0000000  500.0000000
MPG_City     MPG (City)        428       0   10.0000000   20.0607477   19.0000000   60.0000000
MPG_Highway  MPG (Highway)     428       0   12.0000000   26.8434579   26.0000000   66.0000000
Weight       Weight (LBS)      428       0      1850.00      3577.95      3474.50      7190.00
Wheelbase    Wheelbase (IN)    428       0   89.0000000  108.1542056  107.0000000  144.0000000
Length       Length (IN)       428       0  143.0000000  186.3621495  187.0000000  238.0000000
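
The Table 1 definitions can be illustrated with a short sketch. This is not how SAS or Excel produced the tables; it is a minimal Python example using only the standard library, and the function name `describe`, the sample values, and the use of `None` for a missing value are all assumptions made for illustration. Skewness and kurtosis are left out here.

```python
import math
import statistics


def describe(values):
    """Compute the Table 1 statistics (except skewness and kurtosis).

    Missing values are assumed to be stored as None and are skipped,
    so Count is the number of non-missing values.
    """
    data = [v for v in values if v is not None]
    n = len(data)
    std_dev = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return {
        "Mean": statistics.mean(data),
        "Standard Error": std_dev / math.sqrt(n),  # standard error of the mean
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),
        "Standard Deviation": std_dev,
        "Sample Variance": statistics.variance(data),  # divides by n - 1
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


# Hypothetical sample with one missing value.
sample = [2, 4, 4, 4, 5, 5, 7, 9, None]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because the missing value is dropped before anything is computed, Count is 8 rather than 9, which mirrors how N and N Miss are reported separately in the SAS output.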

Table 3:

Variable     Label                     Mean      Std Dev      Minimum      Maximum    N  N Miss    Skewness    Kurtosis
MSRP                               32774.86     19431.72     10280.00    192465.00  428       0   2.7980993  13.8792055
Invoice                            30014.70     17642.12      9875.00    173560.00  428       0   2.8347404  13.9461638
EngineSize   Engine Size (L)      3.1967290    1.1085947    1.3000000    8.3000000  428       0   0.7081520   0.5419435
Cylinders                         5.8075117    1.5584426    3.0000000   12.0000000  426       2   0.5927852   0.4403783
Horsepower                      215.8855140   71.8360316   73.0000000  500.0000000  428       0   0.9303307   1.5521586
MPG_City     MPG (City)          20.0607477    5.2382176   10.0000000   60.0000000  428       0   2.7820718  15.7911473
MPG_Highway  MPG (Highway)       26.8434579    5.7412007   12.0000000   66.0000000  428       0   1.2523953   6.0456107
Weight       Weight (LBS)           3577.95  758.9832146      1850.00      7190.00  428       0   0.8918242   1.6887885
Wheelbase    Wheelbase (IN)     108.1542056    8.3118130   89.0000000  144.0000000  428       0   0.9622870   2.1336492
Length       Length (IN)        186.3621495   14.3579913  143.0000000  238.0000000  428       0   0.1819770   0.6147245
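
Table 3 adds skewness and kurtosis to the output. As a rough sketch of how such values can be computed, the Python functions below implement the bias-corrected sample formulas that Excel documents for its SKEW and KURT functions. That SAS used the same formulas here is an assumption, although the SAS and Excel figures in this exercise agree. The function names and test data are hypothetical.

```python
import statistics


def sample_skewness(data):
    """Bias-corrected sample skewness; needs at least 3 values."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    cubed = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_excess_kurtosis(data):
    """Bias-corrected sample excess kurtosis; needs at least 4 values.

    "Excess" means 3 has been subtracted, so a normal distribution
    scores approximately 0.
    """
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    fourth = sum(((x - mean) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


symmetric = [1, 2, 3, 4, 5]        # hypothetical, perfectly symmetric
right_skewed = [1, 1, 1, 2, 10]    # hypothetical, long right tail

print(sample_skewness(symmetric))
print(sample_skewness(right_skewed))
print(sample_excess_kurtosis(symmetric))
```

Read this way, the positive skewness for every variable in Table 3 indicates a longer right tail, and the large kurtosis values for MSRP, Invoice, and MPG_City point to a few extreme values in those columns.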
Table 4:

Statistic                   MSRP     Invoice  EngineSize   Cylinders  Horsepower    MPG_City MPG_Highway      Weight   Wheelbase      Length
Mean                    32774.86     30014.7    3.196729    5.807512    215.8855    20.06075    26.84346    3577.953    108.1542    186.3621
Standard Error          939.2675    852.7639    0.053586    0.075507    3.472326    0.253199    0.277511    36.68684    0.401767     0.69402
Median                     27635     25294.5           3           6         210          19          26      3474.5         107         187
Mode                       35940       14207           3           6         200          18          26        3285         107         178
Standard Deviation      19431.72    17642.12    1.108595    1.558443    71.83603    5.238218    5.741201    758.9832    8.311813    14.35799
Sample Variance         3.78E+08    3.11E+08    1.228982    2.428743    5160.415    27.43892    32.96139    576055.5    69.08624    206.1519
Kurtosis                13.87921    13.94616    0.541944    0.440378    1.552159    15.79115    6.045611    1.688789    2.133649    0.614725
Skewness                2.798099     2.83474    0.708152    0.592785    0.930331    2.782072    1.252395    0.891824    0.962287    0.181977
Range                     182185      163685           7           9         427          50          54        5340          55          95
Minimum                    10280        9875         1.3           3          73          10          12        1850          89         143
Maximum                   192465      173560         8.3          12         500          60          66        7190         144         238
Sum                     14027638    12846292      1368.2        2474       92399        8586       11489     1531364       46290       79763
Count                        428         428         428         426         428         428         428         428         428         428

2. There are 10 numeric attributes in the data. The only attribute with missing values is
Cylinders, which is missing two records. This is shown in Table 2 and Table 3
(descriptive statistics for numeric variables), where N Miss for Cylinders is 2. Missing
data can reduce statistical power and, if the values are not missing at random, can lead
to biased estimates and invalid conclusions.
3. The most common type in Figure 1 is Sedan, which has the highest frequency in the
distribution. Sedans have a frequency of more than 250, so the type would still qualify
for evaluating the data if only 100 records were used. However, the reliability of the
results decreases when less data is used.
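
The effect of the two missing Cylinders values noted in item 2 can be checked from the figures already reported in Tables 2 to 4. The sketch below is a simple arithmetic check in Python, not part of the original SAS or Excel workflow, and the variable names are our own. It assumes that both tools compute each statistic over the non-missing values only.

```python
import math

# Figures reported for Cylinders in Tables 2 to 4.
TOTAL_RECORDS = 428           # N for every other numeric variable
CYLINDERS_COUNT = 426         # non-missing Cylinders values
CYLINDERS_SUM = 2474
CYLINDERS_STD_DEV = 1.558443

# Number of missing records.
n_miss = TOTAL_RECORDS - CYLINDERS_COUNT

# Mean over the non-missing rows, and what it would be if the sum
# were (incorrectly) divided by all rows.
mean_non_missing = CYLINDERS_SUM / CYLINDERS_COUNT
mean_if_all_rows = CYLINDERS_SUM / TOTAL_RECORDS

# Standard error of the mean uses the non-missing count as well.
std_error = CYLINDERS_STD_DEV / math.sqrt(CYLINDERS_COUNT)

print(f"N Miss: {n_miss}")
print(f"Mean over non-missing rows: {mean_non_missing:.6f}")
print(f"Mean if divided by all rows: {mean_if_all_rows:.6f}")
print(f"Standard error: {std_error:.6f}")
```

The mean over the 426 non-missing rows matches the 5.807512 reported in Table 4, and the standard error agrees with the reported 0.075507 to within rounding, which is consistent with the missing records being excluded rather than counted as zero.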