Question 2: Machine Learning

a) Which of unsupervised or supervised machine learning is best suited to assessing causation? Explain your choice.

b) Your analytics team presents you with two sets of results that have improved the organization’s ability to predict customer defections. The first method uses deep learning and has a precision of 85%. The second method uses decision trees and has a precision of 70%. The previous approach had a precision of 40%.

i) Make a case for using the results of the deep learning method.

ii) Make a case for using the decision tree method.

c) An analytics team used two different models to predict the likelihood of an outcome. The results from two different analysts are below:

Don’s Analysis

| | Actual Positive | Actual Negative |
|---|---|---|
| Predicted Positive | 220 | 100 |
| Predicted Negative | 30 | 650 |

Katie’s Analysis

| | Actual Positive | Actual Negative |
|---|---|---|
| Predicted Positive | 170 | 10 |
| Predicted Negative | 80 | 740 |

i) Use the Confusion Matrix and Index Calculation tables below to calculate the model performance measures.

Confusion Matrix

| | Actual Positive | Actual Negative |
|---|---|---|
| Predicted Positive | TP | FP |
| Predicted Negative | FN | TN |

Index Calculation

| Measure | Formula | Don Calculation | Katie Calculation |
|---|---|---|---|
| Accuracy (completed as an example) | (TP + TN) / (TP + TN + FP + FN) | (220 + 650) / (220 + 650 + 100 + 30) = 0.87 | (170 + 740) / (170 + 740 + 10 + 80) = 0.91 |
| Precision | TP / (TP + FP) | | |
| Error rate | (FP + FN) / (TP + TN + FP + FN) | | |
| Recall | TP / (TP + FN) | | |
| Specificity | TN / (TN + FP) | | |
| False positive rate | FP / (TN + FP) | | |
| F-score | 2 × (Precision × Recall) / (Precision + Recall) | | |
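The formulas in the table can be applied mechanically to each confusion matrix. The following is a minimal Python sketch, not part of the original assignment; the function name `confusion_metrics` and the dictionary keys are illustrative assumptions. It reads TP, FP, FN and TN from Don's and Katie's matrices and reproduces the worked accuracy values of 0.87 and 0.91.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Apply the Index Calculation formulas to one confusion matrix.

    The function name and the returned keys are hypothetical; only the
    formulas themselves come from the assignment table.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "error_rate": (fp + fn) / total,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (tn + fp),
        "f_score": 2 * (precision * recall) / (precision + recall),
    }


# Counts read from the two confusion matrices in part c).
don = confusion_metrics(tp=220, fp=100, fn=30, tn=650)
katie = confusion_metrics(tp=170, fp=10, fn=80, tn=740)

for name, metrics in (("Don", don), ("Katie", katie)):
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

Comparing the two printed rows side by side makes the trade-off behind parts ii) and iii) visible: one model has the higher recall, the other the higher precision and specificity.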

ii) Describe a medical or business context where you would prefer to use Don’s model. Why do you prefer Don’s model?

iii) Describe a medical or business context where you would prefer to use Katie’s model. Why do you prefer Katie’s model?

Ian is an intern with the team who claims he made a breakthrough with a model that outperforms both Don’s and Katie’s. The confusion matrix for his model is below:

Ian’s Analysis

| | Actual Positive | Actual Negative |
|---|---|---|
| Predicted Positive | 249 | 2 |
| Predicted Negative | 1 | 748 |

iv) What could possibly have gone wrong that would result in his results being invalid? How could this be solved? (15 marks)

Question 3: Experiments

Jennifer was given the results of an experiment that was designed to determine if a 10% reduction in price on an online shopping portal would lead to an increase in purchases. Control and treatment groups were created. These groups are described below:

| | Control Group | Treatment Group |
|---|---|---|
| Number of males | 25 | 25 |
| Number of females | 25 | 25 |
| Average age | 47 years | 37 years |
| Average spend per visit in the month BEFORE the experiment | \$25.00 | \$25.00 |
| Average spend per visit in the month AFTER the experiment | \$25.00 | \$29.00 |
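One way to read the spend figures is as a simple difference-in-differences: the change in the treatment group minus the change in the control group. The sketch below is an illustration added here, not part of the original assignment, and the variable and function names are assumptions. It only does the arithmetic on the group averages; it does not account for the 10-year age difference between the groups, so it cannot by itself attribute the change to the price reduction.

```python
# Average spend per visit (in dollars), taken from the group description table.
control = {"before": 25.00, "after": 25.00}
treatment = {"before": 25.00, "after": 29.00}


def change(group):
    """Return the before-to-after change in average spend for one group."""
    return group["after"] - group["before"]


# Difference-in-differences: treatment change minus control change.
did = change(treatment) - change(control)
print(f"Difference-in-differences: {did:.2f} dollars per visit")
```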

a) Were the control and treatment groups effectively randomized? Why or why not?

b) What are the two most likely explanations for the treatment group showing a higher average spend than the control group?

c) What type of analysis could be used to remove one of the possible explanations for the difference in average spend?

d) Experiments are useful in helping determine if people have responded due to a stimulus or if they would have responded even without the stimulus. Design an experiment that could demonstrate what proportion of people have responded to a stimulus. These people could be customers or employees within a company. Examples could be an advertising campaign to customers, or a policy of flexible work hours for employees. Requirements:

i) How would you pick the treatment and control groups? Fill in the table below to indicate the number of people and 3 important characteristics that describe each group.

| | Control Group | Treatment Group |
|---|---|---|
| People | | |
| Characteristic 1: | | |
| Characteristic 2: | | |
| Characteristic 3: | | |
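A common way to pick the two groups is random assignment, so that characteristics such as age or gender are balanced on average. The following Python sketch is illustrative only: the participant IDs, the pool size of 100, and the fixed seed are all hypothetical choices, not values from the assignment.

```python
import random


def assign_groups(participant_ids, seed=0):
    """Shuffle the participants and split them into two equal halves.

    The seed is fixed only so the example is repeatable; a real experiment
    would record whichever seed or randomization procedure was used.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical pool of 100 participants (customers or employees).
participants = [f"P{i:03d}" for i in range(100)]
control, treatment = assign_groups(participants)
print(len(control), len(treatment))
```

After assignment, the characteristics listed in the table can be compared across the two groups to check that the split is reasonably balanced before the stimulus is applied.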

ii) Predict the results and state the managerial conclusion you could make from this result. Use the table below to indicate the change in behavior you expect to observe.

| | Control Group | Treatment Group |
|---|---|---|
| Observed behavior before treatment: | | |
| Observed behavior after treatment: | | |

iii) State the managerial action you could take from the results of your experiment. Briefly describe a useful follow-up experiment that would further deepen understanding of why people behaved in the manner observed.

Question 2: Machine Learning
a) Which of unsupervised or supervised machine learning is best suited to assessing
causation? Explain your choice.
An unsupervised learning technique is best suited to assessing causation. Unsupervised learning
techniques rely on latent variables to assess causation. With unsupervised learning, it is
possible to learn larger and more complex models than with supervised learning. This is because
supervised learning tries to find the connection between two sets of observations. The
causal structure of a supervised learning technique assumes that you have inputs at the start of the
model and outputs at the end. The difficulty of the learning task increases exponentially with the
number of steps between the two sets, which is why supervised learning cannot, in practice,
learn models with deep hierarchies.
b) Your analytics team presents you with two sets of results that have improved the
organization’s ability to predict customer defections. The first method uses deep learning
and has a precision of 85%. The second method uses decision trees and has a precision of
70%. The previous approach had a precision of 40%.
i) Make a case for using the results of the deep learning method.
Deep learning methods perform best when the data is unstructured (audio, images, text, video).
Given such a data set, I would use the deep learning method to obtain better results; its
precision of 85% is also the highest of the three approaches.
ii) Make a case for using the decision tree method.
Decision trees are the building blocks of random forest ensemble methods. Decision trees work best in
situations of binary classification. Random forests are good in classification and prediction
scenarios where the number of variables is greater than the number of observations (high-dimensional
data sets). Therefore, when dealing with binary data sets that are high-dimensional in
nature...
