Description
Hi,
All the questions are in the file Exam 2 Canvas.doc. The Excel and JMP files contain the data he provides for some of the questions and the work that goes with them. The data in the .jmp file is attached as a compressed file. I am looking to pass the exam with a good grade.
Thanks
Explanation & Answer
Hi, I am back ;). Here is your assignment and the Excel file ;). You can invite me if you need more help ;)
DS 101 Summer 2020
EXAM #2
Points possible 100
NAME: ____________________
Directions: Neatly write your answers in the space provided.
This exam is due on July 3 at 5:00 PM. You are to submit your exam by emailing it to
STaylorDS101@gmail.com. Keep a copy of your submission for your records. In the
subject line put your name and “EXAM 2 SUMMER 2020”. For example, Stan Taylor
EXAM 2 SUMMER 2020
1. a) (5 Pts) Logistic regression is a classification technique. Explain why this is the case.
Answer:
Logistic regression is a classification technique because it models the probability that an
observation falls into one of two categories and then assigns the observation to a category based
on that probability. It is most useful when the dependent variable is binary (dichotomous), so
the outcome is a classification such as Yes or No. Logistic regression also describes the
relationship between one binary dependent variable and one or more nominal, ordinal, interval,
or ratio-level independent variables.
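To make the link between probability and classification concrete, here is a minimal Python sketch. The coefficients B0 and B1 and the predictor values are made up for illustration and do not come from the exam data; the cutoff of 0.5 mirrors the default rule quoted in part b).

```python
import math

def logistic(z):
    """Map any real-valued score z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, b0, b1, cutoff=0.5):
    """Return (probability of success, predicted class) for one predictor value."""
    p = logistic(b0 + b1 * x)
    return p, (1 if p > cutoff else 0)

# Hypothetical coefficients, chosen only for illustration.
B0, B1 = -4.0, 0.8
xs = (2, 4, 8)
results = [classify(x, B0, B1) for x in xs]

for x, (p, label) in zip(xs, results):
    print(f"x={x}: P(success)={p:.3f} -> class {label}")
```

The model output is a probability, but the final answer for each observation is a class label (0 or 1), which is what makes the method a classifier.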
b) (5 Pts) In logistic regression JMP uses a default rule of
If Pi > 0.5 then the observation is classified as a success (ie 1)
If Pi < 0.5 then the observation is classified as a failure(ie 0)
Explain why one might want to use a different value than 0.5? (Think confusion matrix)
Answer:
Using a cutoff other than 0.5 changes the balance between sensitivity and specificity. Raising
the cutoff above 0.5 increases specificity, and lowering it below 0.5 increases sensitivity. In
short, as specificity increases, sensitivity decreases, and vice versa. One might therefore pick
a different cutoff when one type of misclassification (false positive or false negative) is more
costly than the other. In terms of the confusion matrix, sensitivity and specificity are defined
as follows:
TP = True positive
TN = True negative
FN = False negative
FP = False positive
Formula:
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
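The trade-off can be checked with a short Python sketch. The actual labels and predicted probabilities below are invented for illustration only; the function names are also just for this example.

```python
def confusion_counts(actual, probs, cutoff):
    """Count TP, FP, TN, FN when an observation is classified 1 if prob > cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(actual, probs):
        pred = 1 if p > cutoff else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 0 and pred == 1:
            fp += 1
        elif y == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def rates(actual, probs, cutoff):
    """Return (sensitivity, specificity) at the given cutoff."""
    tp, fp, tn, fn = confusion_counts(actual, probs, cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up data: four actual successes followed by four actual failures.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.45, 0.35, 0.6, 0.4, 0.2, 0.1]

for cutoff in (0.3, 0.5, 0.7):
    sens, spec = rates(actual, probs, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data, moving the cutoff from 0.3 to 0.7 lowers sensitivity and raises specificity, which is the pattern described in the answer.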
c) (5 Pts) The confusion matrix for a logistic regression model is shown below. Interpret
the values included in the confusion matrix.
TP and FP are both cases predicted as 1: TP means the actual value is 1 and the predicted value
is 1, while FP means the actual value is 0 but the predicted value is 1. In the given matrix, the
values under predicted 1 are 8 and 4. The 8 is TP, since both actual and predicted are 1, and the
4 is FP, since the actual is 0 and the predicted is 1.
Now for FN and TN, which are both cases predicted as 0: TN means the actual value is 0 and the
predicted value is 0, while FN means the actual value is 1 but the predicted value is 0. In the
given matrix, the 4 is FN, because the actual is 1 and the predicted is 0, and the 12 is TN, since
the actual is 0 and the predicted is 0.
Specificity: TN / (TN + FP) = 12 / (12 + 4) = 3/4
Sensitivity: TP / (TP + FN) = 8 / (8 + 4) = 2/3
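The arithmetic can be verified with exact fractions in Python. The four counts are the ones read from the confusion matrix in the answer; the overall accuracy line is an extra check that the exam answer itself does not compute.

```python
from fractions import Fraction

# Counts taken from the confusion matrix discussed in the answer.
TP, FP, FN, TN = 8, 4, 4, 12

sensitivity = Fraction(TP, TP + FN)   # share of actual 1s predicted as 1
specificity = Fraction(TN, TN + FP)   # share of actual 0s predicted as 0
accuracy = Fraction(TP + TN, TP + FP + FN + TN)  # extra: share classified correctly

print("Sensitivity:", sensitivity)
print("Specificity:", specificity)
print("Accuracy:", accuracy)
```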
2. For the regression model Sales_t = β0 + β1 Advertising_(t-3) + β2 Sales_(t-12) + ε_t
a) (5 Pts) Explain what the term εt represents
Answer:
The term εt represents the random error at time t, that is, the part of Sales at time t that the
predictors in the model do not explain. It is estimated by the residual of the fitted model. The
residuals should be approximately normally distributed with mean 0 and constant variance, and
residual plots are used to check this. Residuals must be random. A nonrandom pattern in the
residuals indicates that the deterministic portion (predictor variables) of the model is not
capturing some explanatory information that is “leaking” into the residuals. A residual plot can
show in several ways that the model is not explaining all that is possible.
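As a rough illustration of what a residual is, the Python sketch below fits a simplified one-predictor version of the model (Sales at time t on Advertising at time t-3) by least squares and prints the residuals. The two series are invented, the Sales t-12 term is left out to keep the example short, and the variable names are assumptions made for this sketch.

```python
# Made-up monthly series, for illustration only.
advertising = [10, 12, 9, 14, 11, 13, 15, 10]
sales = [50, 52, 51, 55, 58, 53, 62, 56]

LAG = 3
# Pair each Sales value at time t with Advertising at time t - LAG.
x = advertising[:-LAG]
y = sales[LAG:]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for a single predictor.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual = observed value minus fitted value; it estimates the error term.
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("residuals:", [round(r, 3) for r in residuals])
print("sum of residuals:", round(sum(residuals), 10))
```

With an intercept in the model, the least-squares residuals sum to zero (up to rounding), which matches the mean-0 property described above. Whether they also look random has to be judged from a residual plot.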
b) (5 Pts) explain the technique JMP uses to estimate the parameters of the model
Answer:
For a regression on lagged terms such as this one, the parameters are estimated by least
squares: the β values are chosen to minimize the sum of squared residuals. To decide which lags
belong in a time series model, JMP also reports both the ACF and the PACF. Identifying the
moving-average part of an ARIMA model is done with the ACF rather than the PACF, because for a
model with a moving-average component the theoretical PACF does not shut off but instead tapers
toward 0. The PACF is useful for detecting the order of an autoregressive model. That is, the
PACF for an autoregressive series of order 1 is non-zero only up to lag 1. The partial
autocorrelation function (PACF) gives the partial correlation of a time series with its own
lagged values, controlling for the values of the time series at all shorter lags. It contrasts
with the autocorrelation function (ACF), which does not control for other lags. For the fitted
model, check the ACF, PACF, and residual plots to decide whether an alternative model ought to
be fit. JMP gives a Model Comparison report, which shows how well any new model fits the data.
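For reference, the sample autocorrelation at a given lag can be computed by hand as in the Python sketch below. This is the usual textbook formula, not a reproduction of JMP's own code, and the alternating series is a made-up example.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag of a list of numbers."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    acf = []
    for k in range(max_lag + 1):
        # Sum of products of deviations that are k time steps apart.
        num = sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k))
        acf.append(num / denom)
    return acf

# Made-up series that flips sign every period.
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
acf_values = sample_acf(alternating, 3)

for lag, value in enumerate(acf_values):
    print(f"lag {lag}: {value:.3f}")
```

The signs alternate with the lag because each value is the opposite of its neighbour, and the magnitudes shrink because fewer pairs are available at longer lags.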
c) (5 Pts) If the ...