Intro to data science


Computer Science

Tarleton State University

Description

Exam starts at 2:40 pm and ends at 3:50 pm (a live exam lasting about an hour). I will send questions that need instant answers.


Explanation & Answer

Final Exam answers

1. What is the difference between model validation and calibration?
Validation is the process of comparing the model and its behavior to the real system and its behavior, to decide whether the model represents the system accurately enough.
Calibration is the iterative process of comparing the model with the real system, revising the model if necessary, and comparing again, until the model is accepted.
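To make the distinction concrete, here is a minimal sketch with a hypothetical one-parameter model y = k·x and made-up observations. `calibrate` repeats the compare, revise, compare-again loop described above until the mismatch is negligible; `validate` then compares the calibrated model with observations that were not used during calibration. The update rule (a gradient step on squared error) and the 5% tolerance are illustrative assumptions, not part of the definitions.

```python
"""Sketch: calibration vs. validation for a hypothetical one-parameter model."""


def model(x, k):
    """Hypothetical model: output proportional to input, y = k * x."""
    return k * x


def calibrate(xs, observed, k=1.0, rate=0.01, tol=1e-6, max_iter=10_000):
    """Compare the model with the real system, revise k, compare again."""
    for _ in range(max_iter):
        # Half the gradient of the mean squared error with respect to k.
        grad = sum((model(x, k) - y) * x for x, y in zip(xs, observed)) / len(xs)
        if abs(grad) < tol:
            break  # mismatch is negligible: accept the model
        k -= rate * grad  # revise the model, then compare again
    return k


def validate(xs, observed, k, max_rel_error=0.05):
    """Compare the calibrated model with data NOT used during calibration."""
    rel_errors = [abs(model(x, k) - y) / abs(y) for x, y in zip(xs, observed)]
    return max(rel_errors) <= max_rel_error


if __name__ == "__main__":
    # Made-up observations of the "real system".
    k = calibrate([1, 2, 3, 4], [2.1, 3.9, 6.0, 8.1])
    print(f"calibrated k = {k:.4f}")
    print("passes validation:", validate([5, 6], [10.2, 11.9], k))
```

Keeping the validation observations separate from the calibration observations is the point: a model tuned to match some data will naturally look good on that same data.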

2. Best visualization coloring.
Consider the following color palettes. Which one is more appropriate for effective
visualization communications with humans?

The first palette is better for visualization because all of its colors are distinct and clearly visible. One color in the second palette is hard to see, which makes that palette less appropriate for visualization.
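The answer above judges visibility by eye. One common way to put a number on it, added here as an illustration rather than as part of the original answer, is the WCAG 2.x contrast ratio between a color and its background, which ranges from 1 (no contrast) to 21 (black on white). The exam palettes are not reproduced in this document, so the colors below are arbitrary examples.

```python
"""Sketch: WCAG 2.x contrast ratio as a numeric check of color visibility."""


def _linear(channel):
    """Convert an 8-bit sRGB channel (0-255) to linear light, per WCAG 2.x."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Weighted sum of the linearized red, green, and blue channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """From 1 (no contrast) to 21 (black on white); argument order does not matter."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    white = (255, 255, 255)
    examples = {"black": (0, 0, 0), "navy": (0, 0, 128), "yellow": (255, 255, 0)}
    for name, rgb in examples.items():
        print(f"{name} on white: {contrast_ratio(rgb, white):.2f}")
```

A pure yellow on a white background scores barely above 1, which is the kind of "not clearly visible" color the answer describes.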
3. Visualization color scales.
Which classes of color scales do the following color mappings belong to?
a) Qualitative color schemes
b) Sequential color schemes
c) Diverging color schemes
d) Diverging color schemes
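The three classes differ in what the color encodes. A minimal sketch, using arbitrary example RGB values (hypothetical, not the colors in the exam figures): a qualitative scheme assigns unrelated hues to unordered categories, a sequential scheme runs from light to dark as values increase, and a diverging scheme runs from a neutral color at a meaningful midpoint toward two different hues at the extremes.

```python
"""Sketch: qualitative, sequential, and diverging color scales as value -> RGB maps."""

# Arbitrary example colors (hypothetical, not the ones in the exam figures).
CATEGORY_COLORS = [(228, 26, 28), (55, 126, 184), (77, 175, 74), (152, 78, 163)]
WHITE, DARK_BLUE = (255, 255, 255), (8, 48, 107)
BLUE, NEUTRAL, RED = (33, 102, 172), (247, 247, 247), (178, 24, 43)


def _blend(c1, c2, t):
    """Linear interpolation between two RGB colors, with t clamped to [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))


def qualitative(category, categories):
    """Unordered categories: distinct hues that imply no order."""
    return CATEGORY_COLORS[categories.index(category) % len(CATEGORY_COLORS)]


def sequential(value, vmin, vmax):
    """Ordered values from low to high: light to dark in a single hue."""
    return _blend(WHITE, DARK_BLUE, (value - vmin) / (vmax - vmin))


def diverging(value, vmin, vmid, vmax):
    """Values around a meaningful midpoint: neutral center, two hues at the ends."""
    if value <= vmid:
        return _blend(BLUE, NEUTRAL, (value - vmin) / (vmid - vmin))
    return _blend(NEUTRAL, RED, (value - vmid) / (vmax - vmid))


if __name__ == "__main__":
    print(qualitative("apples", ["apples", "pears", "plums"]))
    print([sequential(v, 0, 10) for v in (0, 5, 10)])
    print([diverging(v, -1, 0, 1) for v in (-1, 0, 1)])
```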

4. Name the three pillars of science.
-Theory
-Experiment
-Computing
5. Is logical implication the same as physical causation? Explain with an example.
They are not the same. Logical implication is a relationship between two propositions in which the second is a logical consequence of the first. For example: "John is 13 years old; therefore, he is a teenager." The conclusion follows from the meaning of the words alone. Physical causation, by contrast, is a cause-and-effect relationship between events in the physical world that depends on a physical mechanism. For example, pressing hard on the gas pedal causes a car to accelerate.
6. When data is scarce, which school of probability theory is the most useful for scientific
inference?
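The Bayesian school, because it combines prior knowledge with the few available observations instead of relying on long-run frequencies that scarce data cannot provide.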
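A small sketch of why this helps, assuming a success/failure experiment and a uniform Beta(1, 1) prior (both choices are illustrative). After 3 successes in 3 trials, the frequency estimate k/n is 1.0, which claims certainty from only three observations; the Beta posterior, Beta(1 + k, 1 + n − k), has mean 4/5 = 0.8 because the prior tempers the small sample.

```python
"""Sketch: Bayesian updating with scarce data, using a Beta-Binomial model."""


def update(alpha, beta, successes, trials):
    """Beta(alpha, beta) prior + binomial data -> Beta posterior parameters."""
    return alpha + successes, beta + (trials - successes)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


if __name__ == "__main__":
    k, n = 3, 3  # hypothetical: 3 successes in only 3 trials
    print("frequency estimate k/n:", k / n)
    a, b = update(1, 1, k, n)  # uniform Beta(1, 1) prior
    print("posterior mean:", beta_mean(a, b))
```

As more trials arrive, the data dominate the prior and the two estimates converge.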

7. Which school of thought in Probability Theory cannot define or discuss the probability of
existence of God? Why?
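Frequentist probability. It defines probability as the long-run relative frequency of an event over repeated trials, not as a degree of belief or opinion. The existence of God is not a repeatable experiment, so it has no frequentist probability.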
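The frequentist definition needs a repeatable experiment: probability is the value the relative frequency approaches over many trials. The sketch simulates a repeatable event (a fair coin, p = 0.5, with a fixed seed chosen for reproducibility) to show that frequency settling down; there is no analogous series of trials for the existence of God.

```python
"""Sketch: frequentist probability as the long-run relative frequency of a repeatable event."""
import random


def relative_frequency(trials, p=0.5, seed=0):
    """Fraction of successes in `trials` simulated repetitions with probability p."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    successes = sum(rng.random() < p for _ in range(trials))
    return successes / trials


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(f"{n:>7} trials: {relative_frequency(n):.4f}")
```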

8. What is wrong in the following histogram?

The histogram does not show the distinct frequencies of each of the classes.
9. Why is everything represented by integers in computers?
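Computers store and process everything in binary form: every value is a fixed-length pattern of bits, and any such pattern can be read as an integer. Characters, images, and even floating-point numbers are therefore encoded as integers, and binary machine language forms the basis of all computer programming.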
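A short illustration, using only the Python standard library, that even non-integer data end up as integers: the `struct` module can reinterpret the eight bytes of a 64-bit IEEE 754 float as an unsigned integer, and encoding a string yields its byte values. For example, 1.0 is stored as the bit pattern 0x3FF0000000000000, and "Hi" encodes to the integers 72 and 105.

```python
"""Sketch: text and floating-point numbers are stored as integer bit patterns."""
import struct


def float_bits(x):
    """Reinterpret the 8 bytes of an IEEE 754 double as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def text_codes(s):
    """The UTF-8 bytes of a string, as a list of integers 0-255."""
    return list(s.encode("utf-8"))


if __name__ == "__main__":
    print(hex(float_bits(1.0)))
    print(text_codes("Hi"))
```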
10. Name the first high-level programming language that is still used today.
Fortran
11. What kind of data set can be best visualized in polar coordinates?
Polar coordinates are two-dimensional, so they can be used only where the points lie on a single two-dimensional plane. They are most appropriate for datasets that are inherently tied to direction (angle) and distance from a center point, such as wind speed and direction, or cyclic data such as time of day.
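Plotting libraries usually handle this for you (in matplotlib, axes created with `projection="polar"` accept angle and radius directly), but the underlying mapping is x = r cos θ, y = r sin θ. The sketch assumes angles in degrees measured counterclockwise from the positive x-axis; wind direction is conventionally measured clockwise from north, so real wind data would need that convention converted first. The observations in the usage example are hypothetical.

```python
"""Sketch: mapping (angle, radius) observations onto Cartesian x, y."""
import math


def polar_to_cartesian(r, theta_deg):
    """Angle in degrees, counterclockwise from the positive x-axis."""
    theta = math.radians(theta_deg)
    return r * math.cos(theta), r * math.sin(theta)


if __name__ == "__main__":
    # Hypothetical observations: (speed, direction in degrees).
    for speed, direction in [(5.0, 0.0), (3.0, 90.0), (4.0, 225.0)]:
        x, y = polar_to_cartesian(speed, direction)
        print(f"{speed} at {direction:.0f} deg -> ({x:.2f}, {y:.2f})")
```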


QUIZ
Q1.

But in making such a discovery, I repeatedly and subconsciously throw away any data point that does not seem to fit my hypothesis and keep only those data points that seem to confirm it. Of course, these exclusions are not made in bad faith: I have reasons that appear legitimate, at least on the surface, for excluding those outlier (red) data points from my discovery.
What kind of cognitive bias is affecting my discovery?
Confirmation bias
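Confirmation bias is the tendency to seek out, notice, and keep information that supports our existing beliefs while discounting information that contradicts them, which is exactly what discarding the non-fitting data points does.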
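The effect can be reproduced numerically. The sketch below uses made-up data that follow y = 2x except for two points that disagree; discarding the two points the first fit handles worst raises R² from about 0.51 to essentially 1, even though nothing new has been learned about the relationship.

```python
"""Sketch: discarding the points that fit worst inflates R^2 (confirmation bias)."""


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs, ys, slope, intercept):
    """1 - SS_res / SS_tot for the line y = slope * x + intercept."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def drop_worst(xs, ys, k):
    """The biased step: remove the k points with the largest residuals."""
    slope, intercept = fit_line(xs, ys)
    by_fit = sorted(range(len(xs)), key=lambda i: abs(ys[i] - (slope * xs[i] + intercept)))
    keep = sorted(by_fit[: len(xs) - k])
    return [xs[i] for i in keep], [ys[i] for i in keep]


if __name__ == "__main__":
    xs = list(range(10))
    ys = [0, 2, 4, 16, 8, 10, 12, 4, 16, 18]  # made up: y = 2x except at x = 3 and x = 7
    print("R^2, all points:    ", round(r_squared(xs, ys, *fit_line(xs, ys)), 3))
    kept_x, kept_y = drop_worst(xs, ys, 2)
    print("R^2, after dropping:", round(r_squared(kept_x, kept_y, *fit_line(kept_x, kept_y)), 3))
```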
Q2.

Suppose we have observed a dataset of events with two attributes, x and y, as in this file: data.xlsx.
1. Plot this data in Microsoft Excel.

[Figure: Relationship between X and Y. Scatter plot of y against x with exponential trendline $y = 0.9295e^{1.0159x}$, $R^2 = 0.9942$.]

2. Form a hypothesis about the relationship between x and y
Null hypothesis: There is no significant exponential relationship between the variables.

Alternative hypothesis: There is a significant exponential relationship between the variables.

3. Use Excel’s Trendline toolbox to fit your hypothesized model to this data.

[Figure: Relationship between X and Y. Scatter plot of y against x with exponential trendline $y = 0.9295e^{1.0159x}$, $R^2 = 0.9942$.]
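Outside Excel, the same kind of exponential fit can be sketched by taking logarithms: if y = a·e^(bx), then ln y = ln a + bx, a straight line that ordinary least squares can fit. We believe Excel's exponential trendline works this way, but treat that as an assumption. The usage example fits only the first six observations from the data excerpt in this document, so its coefficients will not match the trendline above, which was fitted to the full dataset.

```python
"""Sketch: fit y = a * exp(b * x) by least squares on ln(y)."""
import math


def fit_exponential(xs, ys):
    """Return (a, b); every y must be positive because ln(y) is used."""
    if any(y <= 0 for y in ys):
        raise ValueError("an exponential fit needs positive y values")
    ln_ys = [math.log(y) for y in ys]
    n = len(xs)
    mx, ml = sum(xs) / n, sum(ln_ys) / n
    # Straight-line fit of ln(y) = ln(a) + b * x.
    b = sum((x - mx) * (v - ml) for x, v in zip(xs, ln_ys)) / sum((x - mx) ** 2 for x in xs)
    return math.exp(ml - b * mx), b


if __name__ == "__main__":
    # First six observations from the data excerpt in this document.
    xs = [1.648878, 4.518921, 2.330583, 2.541003, 3.973645, 0.675502]
    ys = [5.7798, 93.20637, 7.826325, 12.79112, 50.21983, 1.071444]
    a, b = fit_exponential(xs, ys)
    print(f"y = {a:.4f} * e^({b:.4f} x)")
```

Because the fit minimizes errors in ln(y), large y values have less influence than they would in a fit on the original scale.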

4. Is it a good fit to the data?
Yes. The R² value (0.9942) is close to 1, so the exponential trendline follows the data closely.
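R² measures the share of the variance in the data that a model explains: $R^2 = 1 - SS_{res}/SS_{tot}$. One caveat, based on our understanding of Excel (please verify against its documentation): for an exponential trendline Excel reports R² for the straight-line fit to ln(y), so 0.9942 may describe the fit on the log scale rather than to y itself. The sketch computes R² on either scale; the six (x, y) rows are the first observations from the data excerpt in this document, used only for illustration.

```python
"""Sketch: R^2 = 1 - SS_res / SS_tot on the original and on the log scale."""
import math


def r_squared(actual, predicted):
    """Fraction of the variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


def exp_trendline(x):
    """Exponential trendline reported by Excel: y = 0.9295 * e^(1.0159 x)."""
    return 0.9295 * math.exp(1.0159 * x)


if __name__ == "__main__":
    xs = [1.648878, 4.518921, 2.330583, 2.541003, 3.973645, 0.675502]
    ys = [5.7798, 93.20637, 7.826325, 12.79112, 50.21983, 1.071444]
    preds = [exp_trendline(x) for x in xs]
    print("R^2 on y:    ", round(r_squared(ys, preds), 4))
    print("R^2 on ln(y):", round(r_squared([math.log(y) for y in ys],
                                           [math.log(p) for p in preds]), 4))
```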
5. Try at least one other hypothesis for this dataset and fit the corresponding model to the
observed trend in data.
Null hypothesis: There is no significant quadratic relationship between the variables.
Alternative hypothesis: There is a significant quadratic relationship between the variables.
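Excel's polynomial (order 2) trendline is an ordinary least-squares fit of y on x and x². A minimal sketch without external libraries solves the 3 × 3 normal equations directly; in practice a library routine such as `numpy.polyfit(x, y, 2)` would be the more robust choice. The usage example again uses only the first six observations from the data excerpt in this document.

```python
"""Sketch: least-squares quadratic fit y = c2*x^2 + c1*x + c0 via the normal equations."""


def fit_quadratic(xs, ys):
    """Return (c2, c1, c0). Solves (X^T X) c = X^T y for the columns [1, x, x^2]."""
    s = [sum(x ** p for x in xs) for p in range(5)]                  # sums of x^0 .. x^4
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]  # sums of y*x^0 .. y*x^2
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]    # augmented 3x4 matrix
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution for coef = [c0, c1, c2].
    coef = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        coef[i] = (m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))) / m[i][i]
    return coef[2], coef[1], coef[0]


if __name__ == "__main__":
    xs = [1.648878, 4.518921, 2.330583, 2.541003, 3.973645, 0.675502]
    ys = [5.7798, 93.20637, 7.826325, 12.79112, 50.21983, 1.071444]
    c2, c1, c0 = fit_quadratic(xs, ys)
    print(f"y = {c2:.4f}x^2 + {c1:.4f}x + {c0:.4f}")
```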

6. Which hypothesis is a better fit to your data? The original or your alternative hypothesis?
The original (exponential) hypothesis fits better, because its R² (0.9942) is higher than that of the quadratic model (0.9562).
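Because the two R² values may not be computed on the same scale, a like-for-like check is the sum of squared errors (SSE) of each reported equation on the original y values. The sketch evaluates both trendline equations on the first six observations from the data excerpt in this document; a full comparison would use the whole dataset.

```python
"""Sketch: compare the two reported trendlines by SSE on the original y scale."""
import math


def exponential(x):
    """Reported exponential trendline: y = 0.9295 * e^(1.0159 x)."""
    return 0.9295 * math.exp(1.0159 * x)


def quadratic(x):
    """Reported quadratic trendline: y = 9.2074 x^2 - 24.212 x + 13.093."""
    return 9.2074 * x ** 2 - 24.212 * x + 13.093


def sse(model, xs, ys):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


# First six observations from the data excerpt in this document.
XS = [1.648878, 4.518921, 2.330583, 2.541003, 3.973645, 0.675502]
YS = [5.7798, 93.20637, 7.826325, 12.79112, 50.21983, 1.071444]

if __name__ == "__main__":
    print("SSE, exponential:", round(sse(exponential, XS, YS), 2))
    print("SSE, quadratic:  ", round(sse(quadratic, XS, YS), 2))
```

On these six rows the exponential equation has the much smaller SSE, consistent with the answer above, though six rows are only a spot check.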
7. Use the Excel Trendline again to obtain the equation for the model that seems to be a
better fit to data.

[Figure: Relationship between X and Y. Scatter plot of y against x with quadratic trendline $y = 9.2074x^2 - 24.212x + 13.093$, $R^2 = 0.9562$.]

8. Using this equation, compute the model-predicted y values for the corresponding x values in the dataset.
9. Subtract the model-predicted y values from the actual y values in the dataset. We call these differences the fitting residuals.
10. Make a histogram of the fitting residuals in Excel. Does the histogram of residuals look significantly asymmetric at all?
(Hint: If you have chosen a good model for your data, then this histogram should look
fairly symmetric.)
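Steps 8 to 10 can be sketched outside Excel as follows, assuming the exponential trendline reported in this answer is the chosen model. The bin count and the use of skewness (near 0 for a symmetric distribution) as a numeric symmetry check are our additions; the six rows are the first observations from the data excerpt in this document.

```python
"""Sketch of steps 8-10: predictions, residuals, histogram, symmetry check."""
import math


def model(x):
    """Exponential trendline reported in this answer: y = 0.9295 * e^(1.0159 x)."""
    return 0.9295 * math.exp(1.0159 * x)


def residuals(predict, xs, ys):
    """Step 9: actual y minus the model-predicted y from step 8."""
    return [y - predict(x) for x, y in zip(xs, ys)]


def histogram(values, bins=5):
    """Step 10: equal-width bin counts; returns (lower edge, bin width, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against all-equal values
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # max value goes in last bin
    return lo, width, counts


def skewness(values):
    """Population skewness: about 0 for a symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 else 0.0


if __name__ == "__main__":
    xs = [1.648878, 4.518921, 2.330583, 2.541003, 3.973645, 0.675502]
    ys = [5.7798, 93.20637, 7.826325, 12.79112, 50.21983, 1.071444]
    res = residuals(model, xs, ys)
    lo, width, counts = histogram(res, bins=3)
    for i, c in enumerate(counts):
        print(f"{lo + i * width:8.2f} | {'#' * c}")
    print("skewness:", round(skewness(res), 3))
```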


SUMMARY OUTPUT

Regression Statistics
Multiple R           0.851368
R Square             0.724827
Adjusted R Square    0.724552
Standard Error       19.07657
Observations         1000

ANOVA
             df    SS          MS          F           Significance F
Regression   1     956666.3    956666.3    2628.814    6.8E-282
Residual     998   363187.7    363.9155
Total        999   1319854

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept    -23.891        1.193034         -20.0255   2.97E-75   -26.2322    -21.5499    -26.2322      -21.5499
x            21.33693       0.416152         51.27196   6.8E-282   20.5203     22.15356    20.5203       22.15356
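As a consistency check, the Regression Statistics above can be recomputed from the ANOVA sums of squares: R² = SS_regression / SS_total, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), the standard error is √MS_residual, and F = MS_regression / MS_residual. Each coefficient's t Stat is likewise the coefficient divided by its standard error. The sketch does the arithmetic with the printed values (n = 1000 observations, k = 1 predictor).

```python
"""Recompute the Regression Statistics from the ANOVA table of the output above."""
import math

SS_REG, SS_RES = 956666.3, 363187.7  # Regression SS and Residual SS
N, K = 1000, 1                       # observations, number of predictors

ss_tot = SS_REG + SS_RES                       # Total SS
r2 = SS_REG / ss_tot                           # R Square
multiple_r = math.sqrt(r2)                     # Multiple R
adj_r2 = 1 - (1 - r2) * (N - 1) / (N - K - 1)  # Adjusted R Square
ms_res = SS_RES / (N - K - 1)                  # Residual MS
std_error = math.sqrt(ms_res)                  # Standard Error
f_stat = (SS_REG / K) / ms_res                 # F
t_x = 21.33693 / 0.416152                      # t Stat for x: coefficient / standard error

if __name__ == "__main__":
    for name, value in [("R Square", r2), ("Multiple R", multiple_r),
                        ("Adjusted R Square", adj_r2), ("Standard Error", std_error),
                        ("F", f_stat), ("t Stat (x)", t_x)]:
        print(f"{name:<18} {value:.6g}")
```

Each recomputed value agrees with the Excel output to the printed precision.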

x, y
1.648878, 5.7798
4.518921, 93.20637
2.330583, 7.826325
2.541003, 12.79112
3.973645, 50.21983
0.675502, 1.071444
1.024979, 3.628977
0.950881, 3.826931
0.009456, 0.996638
4.288875, 73.98215
0.050046, 0.959431
0.043423, 1.075716
4.202426, 67.12277
3.850216, 47.50258
4.285799, 74.25101
4.66726, 108.5997
0.075365, 1.143787
0.949221, 2.944744
3.47052, 24.39221
4.886488, 129.5123
2.876303, 15.67953
2.811851, 16.84199
4.708872, 117.4992
3.431778, 31.41899
1.655788, 6.13349
1.700282, 2.459243
4.671857, 110.7189
1.537479, 2.891897
0.193813, 1.348964
2.8389, 19.90782
3.254247, 25.67105
3.693456, 48.0882
4.781866, 119.6664
1.530281, 3.431451
3.795519, 42.28548
1.406636, 1.877812
4.509547, 88.84781
3.352941, 32.43134
1.457745, 2.691484
1.617424, 5.871715
3.583859, 30.66369
0.921144, 2.103122
2.201639, 7.931607
1.822583, 6.810705
1.067535, 3.035929
2.667733, 14.52326

[Figure: Exponential Relationship between X and Y, with trendline $y = 0.9295e^{1.0159x}$, $R^2 = 0.9942$.]
[Figure: Quadratic Relationship between X and Y, with trendline $y = 9.2074x^2 - 24.212x + 13.093$, $R^2 = 0.9562$.]

x, y
4.161357, 55.39369
2.363975, 14.25976
3.161784, 19.17001
0.985005, 3.306404
4.987746, 155.8761
0.011688, 1.00074
3.076565, 23.78397
4.122343, 65.41764
2.070329, 10.14368
4.481583, 91.07105
2.560064, 15.18812
4.808426, 129.0439
3.039346, 20.05275
3.330471, 21.73629
4.607273, 97.89827
2.243444, 9.603582
3.16649, 28.12463
1.057575, 2.991433
2.801717, 21.70421
0.518046, 0.862635
1.461644, 5.168361
2.720085, 18.42293
0.904358, 4.249532
1.380377, 1.888901
1.162985, 2.462542
2.744089, 12.64908
2.785244, 20.29885
3.102563, 20.52827
0.003285, 1.00743
0.938918, 1.433343
2.381347, 12.11145
0.090784, 0.990042
2.37454, 11.62775
0.382953, 1.231818
1.253457, 0.819087
2.905908, 17.74342
0.755068, 1.606164
2.051053, 8.144837
4.465179, 83.54298
4.272169, 73.68443
2.713777, 13.60238
2.302291, 6.758659
0.746974, 3.130564
2.444521, 4.26824
2.845604, 17.47481
1.7073, 2.3257
2.089614, 6.311315
0.883861, 2.883044
4.173884, 66.77009
3.378974, 30.71139
0.927215, 2.595922
3.055664, 25.99555
3.525758, 35.05514
0.141595, 1.155686
2.604347, 12.85412
1.295602, 3.83067
2.54108, 12.6044
4.866205, 130.4813
4.091158, 65.45301
4.293807, 76.82538
4.483086, 72.5646
1.497523, 5.027515
4.400125, 75.90524
1.276066, 3.956798
0.960741, 3.034999
0.933092, 3.475982
4.305538, 74.79082
2.222995, 6.524697
1.933028, 8.086067
4.16003, 68.47196
1.032461, 4.193184
4.489014, 88.18193
2.297515, 9.97659
0.81725, 2.608835
1.343749, 5.103995
2.603504, 16.30688
3.86184, 52.28103
1.936955, 4.599653
0.222126, 1.476889
4.934238, 140.6528
2.615118, 15.33591
4.218416, 57.34552
4.459286, 88.94081
3.427946, 31.66928
2.589221, 12.75953
2.209781, 7.977641
1.900705, 6.391857
0.159219, 1.221343
1.079411, 2.014816
0.426519, 0.961824
3.984383, 59.18523
2.009028, 5.917151
0.248969, 1.116761
1.582443, 2.989636
4.356099, 77.98369
4.65703, 104.0511
2.128671, 11.71595
4.330776, 74.23988
2.403215, 9.285609
4.541712, 85.26298
1.25267, 3.320056
2.633221, 15.62036
1.224156, 4.206473
4.192197, 65.84248
2.628617, 12.73397
1.749019, 6.224072
1.285658, 3.94049
0.84414, 1.531334
2.996174, 18.17501
4.054959, 58.996
4.244596, 67.05477
4.223862, 66.49685
1.175015, 2.864008
2.541427, 13.08458
3.346774, 30.21005
0.453707, 1.650538
2.714993, 18.98006
1.489957, 6.115654
1.909216, 9.52471
2.669433, 12.81326
2.132628, 11.77451
1.160181, 4.644285
0.142095, 1.009507
0.130141, 1.017776
1.371506, 3.895182
3.472403, 27.8274
1.649989, 6.724628
3.752445, 40.04083
3.978261, 46.64458
0.268709, 1.226319
0.04612, 1.050008
3.640623, 31.70219
3.32196, 28.1285
2.926633, 19.84216
4.595189, 93.8921
0.836686, 2.208047
4.85982, 128.9826
1.098035, 3.324752
0.191243, 1.524732
1.144807, 3.925811
3.538829, 34.21492
1.76311, 3.916395
3.547439, 32.32874
2.943463, 20.11771
0.54378, 2.101055
3.234117, 29.38512
3.655574, 37.26733
4.316901, 75.76642
0.113238, 1.22765
0.756006, 2.928009
2.62597, 15.88228
1.132823, 4.253935
4.208317, 66.09762
2.150165, 10.03225
1.496952, 4.840509
2.885845, 15.61711
1.723762, 5.110614
4.285056, 76.71749
1.432844, 1.95877
2.204768, 7.957096
3.831956, 46.78508
1.915072, 5.033057
1.19402, 2.858683
1.969561, 7.178325
0.979167, 1.928694
2.203507, 10.83322
3.092003, 23.08813
1.159277, 1.210453
4.357166, 83.29067
3.6571, 44.51686
1.599744, 6.606169
4.058883, 59.16819
2.294279, 17.47448
3.379845, 27.6661
1.438925, 5.121412
3.652599, 40.40857
3.671107, 42.84325
3.539646, 37.34481
1.047327, 3.45216
0.855793, 1.732432
2.668704, 15.22126
3.54431, 32.47801
0.005064, 1.009962
0.954915, 1.396841
3.680069, 37.94645
4.977723, 141.8007
0.985647, 2.286329
1.164452, 2.92619
1.675954, 5.133576
0.695151, 1.881291
1.682591, 7.899756
2.830731, 17.99587
2.304967, 12.80677
4.457726, 83.98451
0.586993, 1.953985
0.5309, 1.619601
1.959452, 8.880333
3.483716, 35.15638
4.442635, 87.78502
0.23102, 1.10933
1.460888, 1.751076
4.768742, 114.5894
3.804848, 48.64289
0.278853, 1.877703
0.142122, 1.010107
4.635355, 106.1461
1.852951, 10.06047
0.079169, 0.901384
2.518181, 13.42095
4.74577, 115.8372
4.662191, 99.03103
0.268533, 1.518291
4.572761, 98.45964
2.518768, 11.98667
0.9577, 2.337901
3.094173, 19.44493
4.376614, 77.16724
2.09939, 10.09976
4.65216, 104.6709
2.651794, 13.3653
4.221987, 65.21686
3.802525, 45.78114
3.24474, 30.56447
4.646449, 101.7738
3.492898, 29.14591
1.141544, 3.40605
3.517037, 33.97551
1.190162, 2.664565
2.70168, 10.43689
1.886306, 6.19698
3.164412, 24.42106
4.937993, 132.6419
3.884651, 48.95685
3.584409, 40.79141
4.213101, 69.54476
1.309571, 4.682828
2.306417, 12.32334
1.814609, 7.770747
3.084972, 24.70827
1.547248, 2.800553
3.163283, 26.23383
3.895265, 57.17446
4.440191, 89.35129
3.39419, 27.98393
1.071976, 3.917122
0.98371, 1.734682
2.053039, 11.07063
3.834908, 39.08337
3.933497, 52.31046
2.997978, 17.38708
0.800305, 2.654814
0.938055, 2.390042
4.295674, 72.11599
2.576512, 9.420639
2.344928, 9.835014
0.026379, 1.060276
3.788006, 47.02121
2.568425, 12.38368
2.209521, 8.619857
4.872266, 133.6575
1.751626, 7.210103
0.9885, 3.586725
2.392965, 10.1785
4.444466, 77.22621
2.799132, 20.06914
1.245591, 3.4056...
1.324825,
2.816901,
3.965781,
0.105891,
3.684577,
2.005691,
3.550329,
3.205505,
3.238965,
4.79363,
2.578343,
0.064231,
4.14297,
3.132676,
3.041934,
3.529977,

