DS 101 Logistic Regression Questions

User Generated

Enpury_B

Description

Hi,

All the questions are in the file Exam 2 Canvas.doc. The Excel and JMP files contain the data he includes for some of the questions and their work. The data in the .jmp file is attached as a compressed file. I am looking to pass the exam with a good grade.

Thanks

Unformatted Attachment Preview

Month    SALES      Print    Mail    BB       Total
Jul-14   577.563    1565,00  336,45  5488,75  7390,20
Aug-14   542.948    2092,75  352,72  3634,50  6079,97
Sep-14   499.066    1590,75  290,59  3608,75  5490,09
Oct-14   485.132    2229,25  267,92  4923,50  7420,67
Nov-14   511.742    1458,00  304,23  4525,50  6287,73
Dec-14   631.313    1370,50  270,26  6218,00  7858,76
Jan-15   529.692    2257,00  384,13  3451,00  6092,13
Feb-15   637.138    1384,25  277,97  6231,75  7893,97
Mar-15   481.165    1302,00  370,73  3106,00  4778,73
Apr-15   580.945    2066,50  265,48  5647,00  7978,98
May-15   891.906    1466,00  326,28  5426,25  7218,53
Jun-15   685.383    1558,75  356,62  5714,00  7629,37
Jul-15   611.582    1460,50  331,90  4719,50  6511,90
Aug-15   613.348    2302,25  312,57  5768,00  8382,82
Sep-15   483.268    1883,75  368,41  2756,00  5008,16
Oct-15   466.770    1318,50  227,59  3757,25  5303,34
Nov-15   402.431    1275,25  185,93  3514,00  4975,18
Dec-15   401.063    1899,75  269,74  3461,00  5630,49
Jan-16   459.923    1884,00  208,35  4377,75  6470,10
Feb-16   449.089    1972,25  302,45  3599,25  5873,95
Mar-16   527.236    2101,25  237,71  4840,50  7179,46
Apr-16   607.349    1462,00  310,25  6173,25  7945,50
May-16   1.019.440  1622,75  382,90  5827,50  7833,15
Jun-16   692.305    2194,75  361,40  5928,50  8484,65
Jul-16   634.111    1849,00  388,14  4992,25  7229,39
Aug-16   663.151    2448,75  380,32  6131,00  8960,07
Sep-16   535.825    1751,25  388,75  3414,00  5554,00
Oct-16   458.804    1672,75  324,81  3371,25  5368,81
Nov-16   415.963    1905,25  250,31  3565,50  5721,06
Dec-16   416.112    1952,50  253,94  3924,00  6130,44
Jan-17   518.759    2392,25  261,97  5210,75  7864,97
Feb-17   506.471    2095,25  298,46  3971,75  6365,46
Mar-17   562.210    1581,75  299,79  5609,50  7491,04
Apr-17   629.802    2303,50  311,84  5965,75  8581,09
May-17   1.002.376  2001,25  370,22  5697,50  8068,97
Jun-17   692.552    2465,50  407,25  5984,50  8857,25
Jul-17   626.601    1453,50  425,41  4289,75  6168,66
Aug-17   517.671    1873,25  304,05  3055,00  5232,30
Sep-17   555.945    2165,75  233,85  5968,00  8367,60
Oct-17   459.841    1724,25  341,83  3857,25  5923,33
Nov-17   575.057    1768,50  333,82  5766,75  7869,07
Dec-17   314.564    2419,00  281,04  3639,50  6339,54
Jan-18   308.211    2424,25  293,47  4544,25  7261,97
Feb-18   324.154    2397,25  265,65  5392,00  8054,90
Mar-18   395.687    1824,25  344,61  5340,00  7508,86
Apr-18   591.948    2281,25  322,29  3931,75  6535,29
May-18   961.455    1895,75  318,26  5671,50  7885,51
Jun-18   606.309    1482,75  361,22  5345,75  7189,72
Jul-18   551.793    1277,25  379,22  3969,50  5625,97
Aug-18   500.598    2359,75  326,13  3801,25  6487,13
Sep-18   517.344    1495,75  262,42  5073,25  6831,42
Oct-18   468.414    2135,00  320,52  2921,75  5377,27
Nov-18   397.045    1845,25  257,48  3208,25  5310,98
Dec-18   449.809    2062,75  261,89  5032,75  7357,39
Jan-19   572.024    2099,25  274,75  5374,75  7748,75
Feb-19   507.434    1282,50  359,94  2796,75  4439,19
Mar-19   447.931    1802,75  248,67  3388,50  5439,92
Apr-19   477.764    1276,00  221,05  5217,75  6714,80
May-19   860.729    1721,75  338,66  4119,75  6180,16
Jun-19   642.444    1879,50  271,87  5999,00  8150,37
Jul-19   571.933    2488,50  358,47  4290,00  7136,97
Aug-19   476.864    1633,00  320,58  2773,50  4727,08
Sep-19   535.468    1268,50  239,17  5687,00  7194,67
Oct-19   471.865    2325,75  315,12  3747,00  6387,87
Nov-19   607.926    1566,00  307,19  5858,75  7731,94
Dec-19   621.473    1855,50  335,55  5997,25  8188,30
Jan-20   636.665    1918,75  405,20  4208,25  6532,20
Feb-20   615.563    1301,50  331,18  5448,25  7080,93
Mar-20   469.326    2438,00  355,10  2735,25  5528,35
Apr-20   528.467    1642,50  273,02  5150,50  7066,02
May-20   821.140    2360,00  287,62  4226,25  6873,87
Jun-20   467.031    1816,25  308,22  3757,75  5882,22

DS 101 Summer 2020
EXAM #2
Points possible 100

NAME: ____________________

Directions: Neatly write your answers in the space provided.

This exam is due on July 3 at 5:00 PM. You are to submit your exam by emailing it to STaylorDS101@gmail.com. Keep a copy of you submission for your records. In the subject line put your name and “EXAM 2 SUMMER 2020”. For example, Stan Taylor EXAM 2 SUMMER 2020

1. a) (5 Pts) Logistic regression is a classification technique. Explain why this is the case.
b) (5 Pts) In logistic regression JMP uses a default rule of
If Pi > 0.5 then the observation is classified as a success (ie 1)
If Pi < 0.5 then the observation is classified as a failure (ie 0)
Explain why one might want to use a different value than 0.5? (Think confusion matrix)

c) (5 Pts) The confusion matrix for a logistic regression model is shown below. Interpret the values included in the confusion matrix.

2. For the regression model Sales_t = β_0 + β_1 Advertising_{t-3} + β_2 Sales_{t-12} + ε_t
a) (5 Pts) Explain what the term ε_t represents
b) (5 Pts) Explain the technique JMP uses to estimate the parameters of the model
c) (5 Pts) If the estimated model is Sales_t = 215.2 + 2.24 Advertising_{t-3} + 1.06 Sales_{t-12}, provide an economic interpretation for the estimated parameters (assume they are statistically significant)

3. (20 points) The quality control director for a clothing manufacturer wanted to study the effect of operators and machines on the breaking strength (in pounds) of wool serge material. A batch of material was cut into square yard pieces and these were randomly assigned, two each, to all twelve combinations of four operators and three machines chosen specifically for the equipment. The results were as follows:

Operator: A B C D
Machine: I II III
110 117 109 114 113 122 108 114 102 106 118 114 107 105 104 101 102 108 111 110 107 109 115 124

At the .05 level of significance:
(a) Is there an effect due to operator? If so, describe the effect.
(b) Is there an effect due to machine? If so, describe the effect.

4. (20 points) An Audi car dealership wants to assess the likelihood that an applicant for a new car loan will pay off the loan in full. You are to construct a logistic regression model based upon data (AudiCarLoan.JMP) from the dealer using the two variables status and credit score. Status is defined as either 1 if the loan was paid in full or 0 if the loan defaulted (not paid in full). The population for the model is individuals who had a score between 650 and 690, which is known as the “Problem Credit Score” category.
What is the estimated logistic model?
Interpret the estimate for credit score.
According to the estimated model, what would be the estimated probability that an applicant with a score of 663 will pay off the loan in full?
According to the estimated model, what would be the estimated probability that an applicant with a score of 682 will pay off the loan in full?
Describe the confusion matrix for the model created, using the data provided.

5. (30 pts) The marketing department for BER Inc. has been struggling with how they will spend their advertising budget in the coming year. You have been requested to make a recommendation. You have been provided the monthly values for net sales (SALES), and advertising expenditures for mailings (MAIL), print (PRINT), and billboards (BB) along with their sum (TOTAL = MAIL + PRINT + BB), in the file BER.JMP, covering the months from July 2014 through June 2020. Your assignment is to evaluate the effectiveness of each of the advertising components (mailings, print and billboards) with regards to net sales.
a. Determine an appropriate forecasting equation using regression analysis. (Use the following page to show your work.)
b. Final estimated equation (State equation in the form Y_t = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + ...)
c. What is your recommendation regarding where (what media) to spend additional advertising dollars? Justify your answer.
d. Make a forecast for July 2020. Assume that the amounts to be spent on the three types of advertising (MAIL, PRINT and BB) for July 2020 are identical to what was spent in June 2020.

Model Building Problem 5

Explanation & Answer

Hi, I am back. Here is your assignment and the Excel file. You can invite me if you need more help.

1. a) (5 Pts) Logistic regression is a classification technique. Explain why this is the case.
Answer:
Logistic regression is a classification technique because it models the probability that an
observation belongs to one of two classes and then assigns the observation to a class based on
that probability. It is most useful when the dependent variable is binary (dichotomous), for
example Yes or No. Logistic regression also explains the relationship between one binary
dependent variable and one or more nominal, ordinal, interval, or ratio-level independent
variables.
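
To illustrate why the model acts as a classifier, here is a minimal Python sketch: the logistic function turns a linear score into a probability, and a cutoff turns that probability into a class. The coefficients and function names are made up for illustration and do not come from the exam data.

```python
import math


def predict_probability(x, intercept, slope):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(intercept + slope * x)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def classify(probability, threshold=0.5):
    """Assign class 1 when the probability exceeds the cutoff, otherwise class 0."""
    return 1 if probability > threshold else 0


# Hypothetical coefficients, chosen only to show the mechanics.
intercept, slope = -4.0, 2.0

for x in (1.0, 2.0, 3.0):
    p = predict_probability(x, intercept, slope)
    print(f"x = {x}: probability = {p:.3f}, predicted class = {classify(p)}")
```

The output of the model is always a probability between 0 and 1; the classification comes from comparing that probability with the cutoff.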

b) (5 Pts) In logistic regression JMP uses a default rule of
If Pi > 0.5 then the observation is classified as a success (ie 1)
If Pi < 0.5 then the observation is classified as a failure(ie 0)
Explain why one might want to use a different value than 0.5? (Think confusion matrix)
Answer:
Using a cutoff other than 0.5 changes the sensitivity and specificity of the classifier. Raising
the cutoff above 0.5 increases specificity, and lowering it below 0.5 increases sensitivity. In
summary, as specificity increases, sensitivity decreases, and vice versa. This matters when the
two kinds of misclassification (false positives and false negatives) do not have the same cost.
Using the entries of the confusion matrix, sensitivity and specificity are defined as follows:

TP = True positive
TN = True negative
FN = False negative
FP = False positive
Formula:
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
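
The trade-off between sensitivity and specificity can be seen in a small Python sketch that rebuilds the confusion matrix at several cutoffs. The actual classes, the predicted probabilities, and the function names are hypothetical; they are not taken from the exam's data.

```python
def confusion_counts(actual, probabilities, threshold):
    """Return (TP, FP, TN, FN) when class 1 is predicted for probability > threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(actual, probabilities):
        predicted = 1 if p > threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Hypothetical observations: four actual successes followed by four actual failures.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
probabilities = [0.9, 0.7, 0.6, 0.4, 0.65, 0.45, 0.3, 0.1]

for threshold in (0.35, 0.5, 0.68):
    tp, fp, tn, fn = confusion_counts(actual, probabilities, threshold)
    print(threshold, sensitivity(tp, fn), specificity(tn, fp))
```

With these made-up numbers, the lowest cutoff catches every actual success but misclassifies half of the failures, and the highest cutoff does the reverse.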

c) (5 Pts) The confusion matrix for a logistic regression model is shown below. Interpret
the values included in the confusion matrix.

TP and FP are both cases predicted as 1: TP means the actual value is 1 and the predicted value
is 1, while FP means the actual value is 0 but the predicted value is 1. In the given matrix, the
values for predicted 1 are 8 and 4. The value 8 is TP, since both actual and predicted are 1, and
the value 4 is FP, since actual is 0 and predicted is 1.
TN and FN are both cases predicted as 0: TN means the actual value is 0 and the predicted value
is 0, while FN is the reverse of FP, with actual 1 and predicted 0. In the given matrix, 4 is FN,
because actual is 1 and predicted is 0, and 12 is TN, since actual is 0 and predicted is 0.
Specificity = TN / (TN + FP) = 12 / (12 + 4) = 3/4
Sensitivity = TP / (TP + FN) = 8 / (8 + 4) = 2/3
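
The arithmetic can be checked with exact fractions in a few lines of Python. The four counts are the ones read from the matrix in this answer (TP = 8, FP = 4, FN = 4, TN = 12); the accuracy line is an extra summary that the answer itself does not compute.

```python
from fractions import Fraction

# Counts read from the confusion matrix in this answer.
tp, fp, fn, tn = 8, 4, 4, 12

sensitivity = Fraction(tp, tp + fn)              # share of actual 1s predicted as 1
specificity = Fraction(tn, tn + fp)              # share of actual 0s predicted as 0
accuracy = Fraction(tp + tn, tp + fp + fn + tn)  # share of all cases classified correctly

print("Sensitivity:", sensitivity)
print("Specificity:", specificity)
print("Accuracy:", accuracy)
```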

2. For the regression model Sales_t = β_0 + β_1 Advertising_{t-3} + β_2 Sales_{t-12} + ε_t

a) (5 Pts) Explain what the term εt represents
Answer:
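The term ε_t represents the random error (disturbance) in period t: the part of Sales_t that the
predictors Advertising_{t-3} and Sales_{t-12} do not explain. It is not observed directly; the
residuals of the fitted model serve as its estimates. The errors are assumed to be normally
distributed with mean 0 and constant variance, and the residuals are checked against these
assumptions in residual plots. Residuals must look random. A nonrandom pattern in the residuals
indicates that the deterministic portion (predictor variables) of the model is not capturing some
explanatory information that is “leaking” into the residuals. The residual plot can reveal
several ways in which the model is not explaining all that is possible.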
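
As a rough illustration of what a residual is, the Python sketch below computes one as observed sales minus fitted sales, using the estimated coefficients quoted in question 2 c) (215.2, 2.24 and 1.06). The advertising and sales series and the function names are hypothetical, made up only so that the lags t-3 and t-12 exist.

```python
def fitted_sales(advertising, sales, t, b0=215.2, b1=2.24, b2=1.06):
    """Fitted value for period t from the estimated model quoted in question 2 c)."""
    return b0 + b1 * advertising[t - 3] + b2 * sales[t - 12]


def residual(advertising, sales, t):
    """Residual = observed sales minus fitted sales; it estimates the error term."""
    return sales[t] - fitted_sales(advertising, sales, t)


# Hypothetical monthly series (13 periods, indexed 0 to 12), for illustration only.
advertising = [20.0] * 13
sales = [500.0] * 12 + [800.0]

t = 12  # first period for which both lags (t - 3 and t - 12) are available
print("fitted:", round(fitted_sales(advertising, sales, t), 2))
print("residual:", round(residual(advertising, sales, t), 2))
```

Repeating this for every period with available lags gives the series of residuals that would be inspected in a residual plot.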

b) (5 Pts) Explain the technique JMP uses to estimate the parameters of the model
Answer:
For a regression model of this form, JMP estimates the parameters by ordinary least squares: it
chooses the values of β_0, β_1 and β_2 that minimize the sum of squared residuals. To choose
which lags to include, JMP provides both the ACF and the PACF. The partial autocorrelation
function (PACF) gives the partial correlation of a time series with its own lagged values,
controlling for the values of the time series at all shorter lags. It contrasts with the
autocorrelation function (ACF), which does not control for other lags. The PACF is useful for
detecting the order of an autoregressive model: for an autoregressive series of order 1, the
theoretical PACF is non-zero only at lag 1. For a moving-average model, the theoretical PACF
does not shut off, but instead tapers toward 0, so the order of a moving-average model is
identified with the ACF rather than the PACF. For the fitted model, check the ACF, PACF and
residual plots to decide whether an alternative model ought to be fit. JMP gives a Model
Comparison report, which shows how well any new model fits the data.
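
For readers who want to see what the ACF measures, here is a minimal Python sketch of the sample autocorrelation at lag k, r_k = Σ (y_t − ȳ)(y_{t+k} − ȳ) / Σ (y_t − ȳ)². This is the standard textbook estimator and not necessarily the exact computation JMP performs; the series and the function name are made up for illustration.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series (textbook estimator)."""
    n = len(series)
    mean = sum(series) / n
    deviations = [y - mean for y in series]
    denominator = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t + k] for t in range(n - k)) / denominator
        for k in range(max_lag + 1)
    ]


# Hypothetical series, used only to show the calculation.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_acf(series, 2))
```

The lag-0 value is always 1, since a series is perfectly correlated with itself.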

c) (5 Pts) If the ...

