Exploratory Data Analysis Project

Anonymous
Asked: Apr 9th, 2017

Question description

Please, I need help completing this project. Parts A and B are due tomorrow; Part C is due next week.

Preparation

• Open the files for the Course Project and the data set.
• For each of the five variables, process, organize, present and summarize the data. Analyze each variable by itself using graphical and numerical techniques of summarization. Use Excel as much as possible, explaining what the results reveal. Some of the following graphs may be helpful: stem-leaf diagram, frequency/relative frequency table, histogram, boxplot, dotplot, pie chart, bar graph. Caution: not all of these are appropriate for each of these variables, nor are they all necessary. More is not necessarily better. In addition be sure to find the appropriate measures of central tendency, the measures of dispersion, and the shapes of the distributions (for the quantitative variables) for the above data. Where appropriate, use the five number summary (the Min, Q1, Median, Q3, Max). Once again, use Excel as appropriate, and explain what the results mean.
• Analyze the connections or relationships between the variables. There are ten (10) possible pairings of two (2) variables. Use graphical as well as numerical summary measures. Explain the results of the analysis. Be sure to consider all 10 pairings. Some variables show clear relationships, whereas others do not.

Report Requirements

• From the variable analysis above, provide the analysis and interpretation for three individual variables. This would include no more than one graph for each, one or two measures of central tendency and variability (as appropriate), the shapes of the distributions for quantitative variables, and two or three sentences of interpretation.
• For the 10 pairings, identify and report only on three of the pairings, again using graphical and numerical summary (as appropriate), with interpretations. Please note that at least one pairing must include a qualitative variable, and at least one pairing must not include a qualitative variable.
• Prepare the report in Microsoft Word, integrating graphs and tables with text explanations and interpretations. Be sure to include graphical and numerical back up for the explanations and interpretations. Be selective in what is included in the report to meet the requirements of the report without extraneous information.
• All DeVry University policies are in effect, including the plagiarism policy.
• Project Part A report is due by the end of Week 2.
• Project Part A is worth 100 total points. See the grading rubric below.

Submission: The report, including all relevant graphs and numerical analysis along with interpretations

Format for report:
A. Brief Introduction
B. Discuss the first individual variable, using graphical, numerical summary and interpretation.
C. Discuss the second individual variable, using graphical, numerical summary and interpretation.
D. Discuss the third individual variable, using graphical, numerical summary and interpretation.
E. Discuss the first pairing of variables, using graphical, numerical summary and interpretation.
F. Discuss the second pairing of variables, using graphical, numerical summary and interpretation.
G. Discuss the third pairing of variables, using graphical, numerical summary and interpretation.
H. Conclusion

Part A: Grading Rubric

Category | Points | % | Description
Three individual variables (12 points each) | 36 | 36 | Graphical analysis, numerical analysis (when appropriate), and interpretation
Three relationships (15 points each) | 45 | 45 | Graphical analysis, numerical analysis (when appropriate), and interpretation
Communication skills | 19 | 19 | Writing, grammar, clarity, logic, cohesiveness, adherence to the above format
Total | 100 | 100 | A quality paper will meet or exceed all of the above requirements.

Part B: Hypothesis Testing and Confidence Intervals

The data file includes four hypotheses labeled a. - d.

a. Mean sales per week exceeds 41.5 per salesperson
b. Proportion receiving online training is less than 55%
c. Mean calls made among those with no training is less than 145
d. Mean time per call is greater than 15 minutes

1. Using the same data set from Part A, perform the hypothesis test for each speculation in order to see if there is evidence to support the manager's belief. Use the Seven Elements of a Test of Hypothesis from Section 7.1 of your textbook, as well as the p-value calculation from Section 7.3, and explain your conclusion in simple terms.
2. Compute confidence intervals (the required confidence level is included with the speculations) for each of the variables described in a. - d., and interpret these intervals.
3. Write a report about the results, distilling down the results in a way that would be understandable to someone who does not know statistics. Clear explanations and interpretations are critical.
4. All DeVry University policies are in effect, including the plagiarism policy.
5. Project Part B report is due by the end of Week 6.
6. Project Part B is worth 100 total points. See grading rubric below.

Format for report:
A. Summary Report (about one paragraph on each of the speculations a. - d.)
B. Appendix with the calculations of the Seven Elements of a Test of Hypothesis, the p-values, and the confidence intervals; include the Excel formulas used in the calculations.

Part B: Grading Rubric

Category | Points | % | Description
Addressing each speculation (20 points each) | 80 | 80 | Hypothesis test, interpretation, confidence interval, and interpretation
Summary report clarity | 20 | 20 | One paragraph on each of the speculations
Total | 100 | 100 | A quality paper will meet or exceed all of the above requirements.

Part C: Regression and Correlation Analysis

Use the dependent variable (labeled Y) and the independent variables (labeled X1, X2, and X3) in the data file. Use Excel to perform the regression and correlation analysis to answer the following.

1. Generate a scatterplot for the specified dependent variable (Y) and the X1 independent variable, including the graph of the "best fit" line. Interpret.
2. Determine the equation of the "best fit" line, which describes the relationship between the dependent variable and the selected independent variable.
3. Determine the coefficient of correlation. Interpret.
4. Determine the coefficient of determination. Interpret.
5. Test the utility of this regression model. Interpret results, including the p-value.
6. Based on the findings in Steps 1–5, analyze the ability of the independent variable to predict the designated dependent variable.
7. Compute the confidence interval for β1 (the population slope) using a 95% confidence level. Interpret this interval.
8. Using an interval, estimate the average for the dependent variable for a selected value of the independent variable. Interpret this interval.
9. Using an interval, predict the particular value of the dependent variable for a selected value of the independent variable. Interpret this interval.
10. What can be said about the value of the dependent variable for values of the independent variable that are outside the range of the sample values? Explain.

In an attempt to improve the model, use a multiple regression model to predict the dependent variable, Y, based on all of the independent variables, X1, X2, and X3.

11. Using Excel, run the multiple regression analysis using the designated dependent and three independent variables. State the equation for this multiple regression model.
12. Perform the Global Test for Utility (F-Test). Explain the conclusion.
13. Perform the t-test on each independent variable. Explain the conclusions and clearly state how the analysis should proceed. In particular, which independent variables should be kept and which should be discarded. If any independent variables are to be discarded, re-run the multiple regression, including only the significant independent variables, and summarize results with discussion of analysis.
14. Is this multiple regression model better than the linear model generated in parts 1–10? Explain.
15. All DeVry University policies are in effect, including the plagiarism policy.
16. Part C report is due by the end of Week 7.
17. Part C is worth 100 total points. See grading rubric below.

Summarize your results from Steps 1–14 in a three-page report. The report should explain and interpret the results in ways that are understandable to someone who does not know statistics.

Submission: The summary report and all of the work done in 1–14 (Excel output and interpretations) as an appendix

Format for report:
A. Summary Report
B. Points 1–14 should be addressed with appropriate output, graphs, and interpretations. Be sure to number each point 1–14.

Part C: Grading Rubric

Category | Points | % | Description
Steps 1–12 and step 14, worth 5 points each | 65 | 65 | Addressed with appropriate output, graphs, and interpretations
Step 13 | 15 | 15 | Addressed with appropriate output, graphs, and interpretations
Communication skills | 20 | 20 | Writing, grammar, clarity, logic, and cohesiveness
Total | 100 | 100 | A quality paper will meet or exceed all of the above requirements.
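The assignment asks for Excel, but steps 2–4 of Part C (best-fit line, coefficient of correlation, coefficient of determination) can be cross-checked with a short Python sketch. The function name `simple_linear_regression` is hypothetical, and the input here is only the first ten Calls (X1) and Sales (Y) values from the data set in this post, so the printed numbers are illustrative and will differ from the full 100-row result.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x.

    Returns (b0, b1, r, r_squared): intercept, slope,
    coefficient of correlation, coefficient of determination.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means.
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx            # slope
    b0 = y_bar - b1 * x_bar       # intercept
    r = ss_xy / sqrt(ss_xx * ss_yy)
    return b0, b1, r, r * r


# Assumption: first ten rows only, to keep the sketch short.
calls = [171, 139, 165, 186, 180, 184, 120, 172, 161, 143]
sales = [48, 32, 44, 47, 41, 46, 32, 46, 42, 33]

b0, b1, r, r2 = simple_linear_regression(calls, sales)
print(f"Sales = {b0:.3f} + {b1:.3f} * Calls")
print(f"r = {r:.3f}, r^2 = {r2:.3f}")
```

With all 100 rows, the slope and intercept should agree with what Excel's SLOPE and INTERCEPT functions (or the Regression tool) report, which makes this a quick sanity check before writing the interpretation.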
Sales (Y)  Calls (X1)  Time (X2)  Years (X3)  Type
48         171         13.0       5           ONLINE
32         139         16.9       4           NONE
44         165         15.7       3           ONLINE
47         186         13.5       3           ONLINE
41         180         14.0       2           ONLINE
46         184         12.7       5           ONLINE
32         120         19.9       3           NONE
46         172         14.7       3           GROUP
42         161         13.2       1           GROUP
33         143         15.4       3           NONE
42         181         11.5       4           ONLINE
50         148         16.0       0           NONE
42         140         17.5       2           GROUP
41         198         13.2       2           ONLINE
41         149         17.3       0           ONLINE
44         168         11.0       5           ONLINE
36         121         18.0       2           NONE
47         149         15.8       1           GROUP
38         135         18.5       1           GROUP
21         185         18.9       2           ONLINE
67         155         17.9       1           NONE
45         149         13.5       1           ONLINE
52         193         13.7       5           ONLINE
37         159         18.1       0           NONE
33         152         15.0       3           GROUP
31         170         14.3       4           GROUP
44         192         16.7       1           GROUP
44         165         12.4       3           ONLINE
39         150         15.3       3           GROUP
43         174         12.7       2           ONLINE
42         168         16.4       0           ONLINE
49         178         15.1       3           ONLINE
41         164         17.8       3           GROUP
40         191         19.0       5           ONLINE
37         132         10.0       0           NONE
36         140         15.7       1           NONE
46         171         14.9       5           ONLINE
41         170         12.3       0           ONLINE
49         153         19.0       3           GROUP
42         154         14.3       2           GROUP
37         142         13.9       3           NONE
37         130         16.9       2           NONE
21         177         17.0       0           ONLINE
39         160         14.3       4           NONE
44         134         19.4       5           GROUP
49         131         14.6       1           GROUP
35         130         19.4       4           NONE
46         183         15.4       4           ONLINE
43         169         14.0       5           GROUP
41         155         16.0       2           ONLINE
48         182         13.0       2           ONLINE
39         140         12.4       1           NONE
40         157         15.4       1           ONLINE
48         167         14.8       3           ONLINE
50         144         15.8       2           NONE
44         168         12.4       2           GROUP
43         175         13.6       5           GROUP
33         150         14.9       2           GROUP
32         155         17.9       1           GROUP
46         163         16.6       2           ONLINE
48         162         14.5       4           GROUP
56         189         15.0       3           ONLINE
44         153         15.3       2           ONLINE
34         158         14.2       3           ONLINE
43         160         10.9       4           ONLINE
33         173         17.5       1           ONLINE
49         178         18.3       2           GROUP
50         189         14.3       1           ONLINE
52         184         11.4       4           ONLINE
45         174         13.6       2           ONLINE
48         188         13.6       0           ONLINE
35         149         15.6       1           GROUP
44         159         14.6       2           GROUP
44         160         14.8       2           ONLINE
67         166         18.9       1           GROUP
51         178         16.5       1           ONLINE
41         178         13.4       2           ONLINE
40         176         12.6       1           ONLINE
45         138         15.3       2           NONE
41         159         18.8       2           ONLINE
40         145         14.7       2           NONE
47         151         16.6       2           GROUP
48         186         14.2       1           ONLINE
42         194         13.6       2           ONLINE
41         152         14.5       4           GROUP
29         145         19.0       2           NONE
48         188         11.3       2           ONLINE
33         139         19.3       3           GROUP
48         201         12.5       1           ONLINE
45         156         13.2       3           GROUP
36         131         18.5       2           NONE
43         161         17.3       3           ONLINE
42         152         14.6       1           ONLINE
49         178         16.4       2           ONLINE
50         157         15.9       3           GROUP
42         154         15.3       1           GROUP
44         156         20.0       0           ONLINE
45         170         14.2       1           ONLINE
48         170         17.4       5           ONLINE
39         144         17.7       3           NONE
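As a starting point for Part A and for hypothesis a. in Part B, here is a minimal Python sketch that computes the five number summary of Sales (Y) and then runs a large-sample z test of "mean sales exceeds 41.5". Several things are assumptions rather than part of the assignment: the helper names are hypothetical, the quartiles use the "inclusive" interpolation method (which should correspond to Excel's QUARTILE.INC, though that is worth checking against your own Excel output), the normal distribution is used in place of the t distribution because n = 100, and the 95% confidence level is a placeholder since the required level is stated in the data file and not in this post.

```python
from math import sqrt
from statistics import NormalDist, mean, quantiles, stdev


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def z_test_mean(values, mu0, tail):
    """Large-sample z test of H0: mu = mu0.

    tail is "greater", "less", or "two-sided". Returns (z, p_value).
    """
    se = stdev(values) / sqrt(len(values))   # standard error of the mean
    z = (mean(values) - mu0) / se
    cdf = NormalDist().cdf(z)
    if tail == "greater":
        return z, 1 - cdf
    if tail == "less":
        return z, cdf
    if tail == "two-sided":
        return z, 2 * min(cdf, 1 - cdf)
    raise ValueError("tail must be 'greater', 'less', or 'two-sided'")


def mean_confidence_interval(values, level):
    """Large-sample confidence interval for the mean (normal approximation)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_crit * stdev(values) / sqrt(len(values))
    return mean(values) - half_width, mean(values) + half_width


# Sales (Y) column, all 100 values, in the order given in the post.
sales = [
    48, 32, 44, 47, 41, 46, 32, 46, 42, 33, 42, 50, 42, 41, 41, 44, 36, 47, 38, 21,
    67, 45, 52, 37, 33, 31, 44, 44, 39, 43, 42, 49, 41, 40, 37, 36, 46, 41, 49, 42,
    37, 37, 21, 39, 44, 49, 35, 46, 43, 41, 48, 39, 40, 48, 50, 44, 43, 33, 32, 46,
    48, 56, 44, 34, 43, 33, 49, 50, 52, 45, 48, 35, 44, 44, 67, 51, 41, 40, 45, 41,
    40, 47, 48, 42, 41, 29, 48, 33, 48, 45, 36, 43, 42, 49, 50, 42, 44, 45, 48, 39,
]

print("Five number summary:", five_number_summary(sales))
z, p = z_test_mean(sales, 41.5, "greater")   # hypothesis a.
print(f"z = {z:.3f}, p-value = {p:.4f}")
# Assumption: 0.95 is a placeholder; use the level given in the data file.
print("CI for mean sales:", mean_confidence_interval(sales, 0.95))
```

The same two functions cover hypotheses c. and d. after filtering or swapping the column (for example, Calls for rows where Type is NONE). Hypothesis b. concerns a proportion and needs the standard error sqrt(p0 * (1 - p0) / n) instead, and the subset in c. is small enough that the t distribution is probably the safer choice there.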
