DAT105 University of California Berkeley Database Statistics Questions

Anonymous

Question Description

Please read the questions, then answer each one in a few short sentences.

Unformatted Attachment Preview

1: Your CEO has heard that one of the advantages of a BigData database platform is that it can handle a lot of work very quickly due to the feature known as M-P-P, but he asks you to explain that feature to him in simple terms. How would you respond?

2: A marketing test has just been completed and you are asked to lead the analysis to differentiate those that responded vs. those that did not respond. You propose to split the results file into a Test set and a Validation set. The marketing manager asks you what the difference is, and then points out that if you do that you won't be analyzing all of the data, and you should just analyze the one file of all results. Good idea or not? How would you respond?

3: Your CMO wants to test the response to a new product that is designed to appeal to a subset of consumers: those that have lived at their current address for less than a year. She asks you to run a test and measure response based on a sample of 500 people. She suggests selecting the 500 consumers from the database in the following way: For each record, flip a coin. If the coin comes up Heads, then the consumer is included in the test. If Tails, then the consumer is not included. Is this a good way to assemble a sample to test response to the product? Why or why not?

4: You work for an airline and the customer database has information on Relationship Status as well as amount paid for recent travel tickets. Relationship status is coded as: 0: Single, 1: Married, 2: Divorced, 3: Widowed, 4: Committed, 5: It's Complicated. Amount spent is in the database as actual dollars paid. Your Chief Marketing Officer wants to determine whether there is a correlation between amount spent and the customer's relationship status. She asks you to calculate the correlation between relationship status and amount spent. Is this a good idea, or not? Explain.

5: You are in a meeting with the CMO of a major retailer discussing a simple regression model to predict Amount_Spent (Y) from Age (X). The model has a high r-square value, and everything about the final model is statistically significant. The coefficient for AGE is 0.00000001234. The CMO points out that the coefficient for AGE is almost zero, and anything multiplied by zero is zero; so, AGE really doesn't matter in the model. Agree? Disagree? How would you respond?

6: Using SAS and the EMMA dataset, generate a correlation matrix (table) that outputs the correlations between TENURE, AGE, ORDERS. From the SAS output, which pair of variables has the lowest correlation? What is the correlation value for this pair of variables? And what is the probability of getting this correlation simply by chance? How many records did SAS use in this calculation?

Imagine that a dataset has a correlation of r=0.80 between Salary in dollars (Y-axis) and GPA from 0 to 4.0 on the X-axis. Which of the following statements are true?
• A. A simple regression model to predict Salary from GPA will have greater explained variation than unexplained variation.
• B. "Flipping" the equation to model GPA (Y) based on Salary (X) will have the same amount of explained variation as a model predicting Salary from GPA.
• C. The simple regression model to predict one variable from the other will have a regression line with a positive slope.
• D. All of these
• E. None of these
...
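As a rough numerical check on the last question (r = 0.80 between Salary and GPA), the sketch below fits a simple regression in both directions on synthetic data. The data, the seed, and the helper names `pearson_r` and `fit_line` are assumptions made up for illustration; nothing here comes from the EMMA dataset or the course material. In simple regression the share of explained variation equals r squared, so r = 0.80 would correspond to 0.64 explained and 0.36 unexplained.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def fit_line(xs, ys):
    """Least-squares fit of ys on xs; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / syy

# Hypothetical data: the noise level is chosen so r lands near 0.8.
random.seed(0)
gpa = [random.uniform(0.0, 4.0) for _ in range(5000)]
salary = [30000 + 10000 * g + random.gauss(0, 8660) for g in gpa]

r = pearson_r(gpa, salary)
slope_sg, _, r2_salary_from_gpa = fit_line(gpa, salary)   # Salary (Y) from GPA (X)
slope_gs, _, r2_gpa_from_salary = fit_line(salary, gpa)   # GPA (Y) from Salary (X)

print("r:", round(r, 3), "r squared:", round(r * r, 3))
print("explained variation, Salary from GPA:", round(r2_salary_from_gpa, 3))
print("explained variation, GPA from Salary:", round(r2_gpa_from_salary, 3))
print("slopes:", slope_sg, slope_gs)
```

Both fits report the same explained variation because each equals r squared, and both slopes share the sign of r, even though the slope values differ because the units differ.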

Tutor Answer

psumanrec
School: University of Maryland

Please find the answers below. Thank you.

1: Your CEO has heard that one of the advantages of a BigData database platform is that it can
handle a lot of work very quickly due to the feature known as M-P-P, but he asks you to explain that
feature to him in simple terms. How would you respond?
Answer
M-P-P stands for massively parallel processing, in which a large number of processors perform a
set of coordinated computations in parallel. An MPP system consists of many homogeneous
processing nodes interconnected through a high-speed network. A single job is split into pieces,
and the nodes work on those pieces at the same time. Each processor has its own memory and
operating system and handles a different part of the job, and the partial results are then
combined into one answer. In simple terms: instead of one person counting a huge pile of
receipts, many people each count a small pile at the same time and then add up their totals.
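The split, work in parallel, then combine pattern described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: worker threads stand in for MPP nodes (real nodes are separate machines with their own memory), and the names `partition`, `node_total`, and `mpp_total` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(records, n_nodes):
    """Deal the records out round-robin into n_nodes slices."""
    return [records[i::n_nodes] for i in range(n_nodes)]

def node_total(node_slice):
    """Work one node does on its own slice, independent of the other nodes."""
    return sum(node_slice)

def mpp_total(records, n_nodes=4):
    """Scatter the data, let each worker compute a partial sum, then combine."""
    slices = partition(records, n_nodes)
    # Threads are a stand-in for separate nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(node_total, slices))
    return sum(partials)

# Hypothetical "amount spent" values, one per customer record.
amounts = list(range(1, 1001))
print(mpp_total(amounts))  # same total as sum(amounts), computed in pieces
```

The final answer does not depend on how many workers are used; adding nodes only changes how much of the data each one has to process.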
2:A marketing test had just been completed, and you are asked to lead the analysis to differentiate
those that responded vs. those that did not respond. You propose to split the results file into a Test
set and Validation set. The marketing manager asks you what is the difference? And then points out
that if you do that you won't be analyzing all of the data, and you should just analyze the one file of all
results. Good ...

