Data Analytics Question

User Generated

Wwalz

Business Finance

MIS 272

University of Sydney

MIS

Unformatted Attachment Preview

MIS272 – Predictive Analytics T2 2022
Assignment 2 – Group [type your group number here]

Student name: Type your name here … …
Student number: Type your student number here … …

Executive summary (1 page)
Use this section to include an executive summary written for a senior business manager or similar non-technical reader. Include your key recommendations to the business here. When presenting your recommendations, ensure you add references to specific analyses and results in the subsequent sections of your report (e.g., by means of page numbers, figure references).
Rubric reference: Executive summary and recommendations

Data exploration, pattern discovery, and preparation (2 pages)
Use this section to include all your work on finding meaningful patterns in the data set as relevant to the case study using RapidMiner. This may include:
- your exploration of relevant data attributes as predictors
- your selection of an attribute as label
- your approach to dealing with missing values, errors in the dataset, etc.
- transformations you have done on the data (including any modifications on the data, any numeric normalizations, or any type conversions such as nominal to numeric and the similar)
- your data distribution analysis (e.g., using histograms or scatterplots)
- your correlation analyses using correlation matrices
- your cluster analysis
- your detection of (and discussion around) outliers
Make sure your visualizations are accompanied by relevant discussions of the insights the analyses and visualizations will/should lead to.
BRING IN SCREENSHOTS OF YOUR RAPIDMINER PROCESSES AND EXPLAIN THE MAJOR FUNCTIONALITIES IF YOU DO SO…
Rubric reference: Explore, discover, and prepare data for predictive analysis

Predictive modelling (2 pages)
Use this section to show and discuss the RapidMiner predictive process/processes you have created as relevant to the case study. This may include:
- your estimation analysis workings and models
- your justification of why you chose to carry out the specific analysis with the specific data attributes
BRING IN SCREENSHOTS OF YOUR RAPIDMINER PROCESSES AND EXPLAIN THE MAJOR FUNCTIONALITIES IF YOU DO SO…
Rubric reference: Analyse data (estimation and association analysis)

Association analysis (1 page)
Use this section to report your findings in terms of association analysis. This may include:
- your frequent item set discovery process
- your frequent item set discovery results (evaluation) and discussion
BRING IN SCREENSHOTS OF YOUR RAPIDMINER PROCESSES AND EXPLAIN THE MAJOR FUNCTIONALITIES IF YOU DO SO…
Rubric reference: Analyse data (estimation and association analysis)

Model evaluation and improvement (1 page)
Use this section to report your evaluation procedures and results in RapidMiner. Also, report any steps you have taken to improve the performance of your model/s. This may include:
- your evaluation procedure (e.g., hold-out or resampling) for any of the analysis cases you have included in the previous section
- your comparative analysis on the evaluations of different predictive models or different evaluation procedures you have developed (e.g., any performance improvements you may have achieved by trying different pre-processing, outlier removal, validation, or feature selection tasks)
BRING IN SCREENSHOTS OF YOUR RAPIDMINER PROCESSES AND EXPLAIN THE MAJOR FUNCTIONALITIES IF YOU DO SO…
Rubric reference: Evaluate and improve analytics solutions

MIS272 – Predictive Analytics - Trimester 2 2022
Assignment 2 – Group Assignment

DUE DATE AND TIME: 8 pm, 22 September
PERCENTAGE OF FINAL GRADE: 40%
WORD COUNT: 3,000 words (written report)

Description
The purpose of this assignment is to develop your ability to construct a predictive model (using regression and association analysis) to solve a problem based on understanding a specified business context.
The business context for this assignment relates to decision-making based on consumer complaints. Many countries around the world set up regulatory bodies that receive and act on complaints lodged by consumers against companies under their jurisdiction. The frequency and severity of received complaints informs targeted decision-making of the regulator (such as further investigations, regulatory directives, and in extreme cases, legal action against companies). The specific dataset that you are given is from the regulatory body in a particular country. The regulator receives large volumes of complaints lodged against various companies that provide several types of services to customers. Every time a complaint is lodged against a company by a customer or by another business, the regulator assesses the company’s ongoing ‘fitness’ score in the business domain (based on several external factors). This aggregate fitness score is calculated for the company according to the assessment by the regulator as part of responding to the complaint (the better the company’s score, the fitter the company is in terms of their responsibility to address complaints). These scores are used by the regulator body to inform their decision-making processes during and after responding to complaints. The dataset contains a large number of complaints lodged against specific companies (each complaint includes details such as the specific company ID and complaint code). The description of what particular complaint codes refer to has been removed for reasons of confidentiality. At the time when a complaint is lodged, additional company-related data fields are also collected (from the complainant and external data sources) and recorded in the data set by the regulator. You are asked to explore and analyse this data set. 
Specific tasks are:
Task 1: Use appropriate visualizations, explorative analytics, and cluster analysis to demonstrate a thorough understanding of the data and extract informative data patterns for use by decision makers at the regulator.
Task 2: Develop a predictive model to estimate the fitness score given to the company based on relevant data attributes. You must consider the significance of a variety of applicable attributes, and in particular also sector and location of the company involved.
Task 3: The regulator plans to develop policy aimed at addressing systemic co-occurring complaints across all companies. For this you are asked to identify the top 10 frequently co-occurring complaints.

Definitions of the data attributes are given in a separate data dictionary file in the assignment folder. It is recommended that you read the data definitions to better understand and consider the quality of the data prior to developing your analytics solutions. You will need to join the files (companies and complaints) to be able to address the tasks above. Your solutions should only draw on learning in the lectures and seminars of the unit.

Specific Requirements
- This is a group assignment (2-3 students maximum). Every group member must submit a declaration of their individual contribution as part of the submission.
- Students in each group will need to work together regularly on their assignment and submit their work as a group. Groups should NOT discuss their work or collude with students of other groups. Please refer to the Deakin policy on Academic misconduct in this regard.
- You must use the submission template for the assignment provided on CloudDeakin for your report. Your final report must adhere strictly to the page limits in the template as only pages within the limits will be marked.
- It is essential that the executive summary section of your report is targeted at a non-technical reader (e.g., a senior manager at the Regulator) and that the remaining parts of the report target a data/business analyst.
- Your final deliverables must include:
  i) the final report according to the submission template as a PDF file
  ii) all RapidMiner process files, combined into a single ZIP file
  iii) declaration of each student’s individual contribution to the submission (use provided form)
- All submissions will need to be lodged via the CloudDeakin dropbox before the deadline. You should submit a partial submission of your work prior to the deadline.
- You should select and include relevant tables, charts, analysis processes, analysis results, models and the evaluation in your report.
- You must use RapidMiner for this assignment. The use of Excel or any other analytics tool is therefore not permitted.
- You must include appropriate documentary notes within your RapidMiner process files to make it easier to understand the logic. Use sub-processes as appropriate to consolidate related analytical steps, and to improve the readability of your processes.
- The consistency of your RapidMiner file(s) will be checked against the results in your report.
- You must NOT modify the data file provided for this assignment before importing it into RapidMiner.

Marking and feedback
The marking rubric for this assignment is available on the CloudDeakin unit site - in the Assessment folder (under Assessment Resources). You should familiarise yourself with the criteria before completing any part of the assessment. Criteria act as a boundary around the task and help identify what assessors are looking for specifically in your submission. The criteria are drawn from the unit’s learning outcomes ensuring they align with appropriate graduate attribute/s.
Identifying the standard you aim to achieve is also a useful strategy for success and to that end, familiarising yourself with the descriptor for that standard is recommended. Students who submit their work by the due date will receive their marks and feedback on CloudDeakin 15 working days after the submission date.

Extensions
There will be no extensions granted unless there are exceptional and most unusual circumstances outside the student’s control. Partial submissions will be considered as evidence of the group's progress in this regard. Students who require a time extension should submit a written request to the Unit Chair, supported with documentation (e.g., a medical certificate). Such requests should be e-mailed to the Unit Chair. Requests for extensions will NOT be considered within three days prior to submission.

Late submission
The following marking penalties will apply if you submit an assessment task after the due date without an approved extension: 5% will be deducted from available marks for each day up to five days, and work that is submitted more than five days after the due date will not be marked and will receive 0% for the task. 'Day' means working day for paper submissions and calendar day for electronic submissions. The Unit Chair may refuse to accept a late submission where it is unreasonable or impracticable to assess the task after the due date.

Calculation of the late penalty is as follows (based on the assignment being due on a Thursday):
• 1 day late: submitted after 8pm Thursday but before 8pm Friday – 5% penalty.
• 2 days late: submitted after 8pm Friday but before 8pm Saturday – 10% penalty.
• 3 days late: submitted after 8pm Saturday but before 8pm Sunday – 15% penalty.
• 4 days late: submitted after 8pm Sunday but before 8pm Monday – 20% penalty.
• 5 days late: submitted after 8pm Monday but before 8pm Tuesday – 25% penalty.
Support
The Division of Student Life (see link below) provides all students with editing assistance. Students who wish to take advantage of this service must be organized and plan ahead and contact the Division of Student Life in order to schedule a booking, well in advance of the due date of this assignment.
http://www.deakin.edu.au/about-deakin/administrative-divisions/student-life

Referencing
Any material used in this assignment that is not your original work must be acknowledged as such and appropriately referenced. You can find information about plagiarism and other study support resources at the following website:
http://www.deakin.edu.au/students/study-support

Academic misconduct
For information about academic misconduct, special consideration, extensions, and assessment feedback, please refer to the document Your rights and responsibilities as a student in this Unit in the first folder next to the Unit Guide in the Resources area of the CloudDeakin unit site.

MIS272 – Predictive Analytics T2 2022 Assignment 2 – Rubric

Assessment Items (Global Learning Outcomes)
Standards: YET TO ACHIEVE MINIMUM STANDARDS; MEETS EXPECTATIONS; EXCEEDS EXPECTATIONS
Grade bands:
- Fail (N) / 0-49: Poor/unacceptable/not attempted/requires further development/needs improvement
- Pass (P) / 50-59: Acceptable/satisfactory
- Credit (C) / 60-69: Good/well done
- Distinction (D) / 70-79: Very good/exceeds expectations
- High Distinction (HD) / 80-100: Excellent and exemplary/exceeding high standards

Executive summary of problem and solution (GLO1)
- Default: 3, Range: 0 – 5.9. Provides no or incomprehensible summary of problem and solution. Solutions are not justified. No references are given to the rest of the report.
- Default: 8, Range: 6 – 9.9. Provides insignificant summary of problem and solution. Solutions are not fully justified. Limited references to the rest of the report.
- Default: 11, Range: 10 – 11.9. Provides a satisfactory summary of problem and solution. Some solutions are justified. Some minor references to the rest of the report.
- Default: 13, Range: 12 – 13.9. Provides a good summary of problem and solution. Most of the solution is justified. Adequate references to the rest of the report.
- Default: 15, Range: 14 – 15.9. Provides a very good summary of both the problem and solution. Solution well justified. Good reference to the rest of the report.
- Default: 20, Range: 16 – 20. Provides an excellent summary of both the problem and solution at an appropriate level of specificity. Solutions are fully comprehensible and fully justified. Excellent referencing to the rest of the report.

Explore, discover, and prepare data for predictive analysis
- Default: 1.5, Range: 0 – 2.9. Provides no evidence of initiative in exploration and no viable approaches to preparing data. Finds no meaningful patterns.
- Default: 4, Range: 3 – 4.9. Provides little evidence of initiative in exploration and a few viable approaches to preparing data. Finds a few meaningful patterns.
- Default: 5.5, Range: 5 – 5.9. Demonstrates satisfactory initiative in exploration and identifies multiple basic approaches to preparing data. Finds some meaningful patterns.
- Default: 6.5, Range: 6 – 6.9. Demonstrates good initiative in exploration and identifies multiple adequate approaches to preparing data. Finds adequate meaningful patterns.
- Default: 7.5, Range: 7 – 7.9. Demonstrates very good initiative in exploration and multiple effective approaches to preparing data. Finds relevant and meaningful patterns that inform predictive modelling.
- Default: 10, Range: 8 – 10. Demonstrates exemplary initiative in exploration and multiple advanced approaches for preparing data. Finds significant meaningful patterns that inform predictive modelling.

Analyse data with predictive models (GLO4)
- Default: 4.5, Range: 0 – 8.9. Summarizes data rather than predictive modelling; does not address the problem and questions.
- Default: 12, Range: 9 – 14.9. Summarizes data with very little predictive modelling; rarely addresses the problem and questions.
- Default: 16.5, Range: 15 – 17.9. Reveals some proficiency in predictive modelling; addresses the problem and questions satisfactorily.
- Default: 19.5, Range: 18 – 20.9. Reveals generally sound proficiency in predictive modelling; addresses the problem and questions well.
- Default: 22.5, Range: 21 – 23.9. Reveals very good proficiency in predictive modelling; insightful response to the problem and questions.
- Default: 30, Range: 24 – 30. Reveals excellent proficiency in predictive modelling; insightful response to the problem and questions.

Evaluate and improve analytic solutions (GLO4)
- Default: 6, Range: 0 – 11.9. Evaluation of solutions is superficial lacking any consideration for the models developed, with little or no logical judgments of the pros and cons of the models or improvement.
- Default: 16, Range: 12 – 19.9. Evaluation of solutions is partial lacking consideration for the models developed, with little or no logical judgments of the pros and cons of the models or improvement.
- Default: 22, Range: 20 – 23.9. Evaluation of solutions satisfactorily considers the models developed, logically judges the pros and cons of the models. Reveals satisfactory efforts towards model improvement.
- Default: 26, Range: 24 – 27.9. Evaluation of solutions reasonably considers the models developed, logically judges the pros and cons of the models. Reveals generally sound efforts towards model improvement.
- Default: 30, Range: 28 – 31.9. Evaluation of solutions thoroughly considers models developed, logically judges the pros and cons of the models. Reveals very good efforts towards model improvement.
- Default: 40, Range: 32 – 40. Evaluation of solutions contains consistently thorough and insightful consideration of the models developed, logically judges the pros and cons of the models. Reveals excellent efforts towards model improvement.

Explanation & Answer

View attached explanation and answer. Let me know if you have any questions.

Data Exploration, Pattern Discovery, and Preparatio 1

DATA EXPLORATION, PATTERN DISCOVERY, AND PREPARATION

by [Name]

Course
Professor's Name
Institution
Location of Institution
Date

Data Exploration, Pattern Discovery, and Preparation
This section of the report covers data exploration, data preparation, and pattern
discovery, which together support a comprehensive statistical analysis of the dataset and
allow actionable inferences to be drawn.
The first data preparation step was to join the complaints and companies datasets so
that each complaint, and its related information, was associated with the company against
which it was lodged. Both datasets were imported into a single folder in the local
repository, and a Set Role operator was applied to each dataset to assign the Id role to its
Company_ID attribute. With Company_ID identified as the key in both datasets, a Join
operator merged them into one dataset, which simplified the subsequent analysis of
relationships between companies and complaints.
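
The join itself was carried out in RapidMiner, as the assignment requires. Purely as an illustration of what a key-based inner join does, the following minimal Python sketch merges two tiny made-up tables on Company_ID. Every attribute name other than Company_ID (Sector, Complaint_ID, Complaint_Code) and all values are hypothetical and are not taken from the actual dataset; the join type used in the original process is not stated, so the inner join here is an assumption.

```python
# Hypothetical miniature tables; the real attributes are defined in the data dictionary.
companies = [
    {"Company_ID": 1, "Sector": "Banking"},
    {"Company_ID": 2, "Sector": "Telecom"},
]
complaints = [
    {"Complaint_ID": 10, "Company_ID": 1, "Complaint_Code": "A"},
    {"Complaint_ID": 11, "Company_ID": 1, "Complaint_Code": "B"},
    {"Complaint_ID": 12, "Company_ID": 3, "Complaint_Code": "A"},
]


def inner_join(left, right, key):
    """Return one merged row for every pair of left/right rows sharing the key value."""
    # Index the right-hand table by key so each left row is matched in one lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        # Left rows without a matching key are dropped (inner-join behaviour).
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


joined = inner_join(complaints, companies, "Company_ID")
for row in joined:
    print(row)
```

In this sketch the complaint lodged against Company_ID 3 is dropped because no such company exists in the companies table, so checking row counts before and after the join is a quick way to see whether any complaints were lost.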
The Select Attributes operator was then used to choose the relevant data attributes as
predictors. On inspection, the complaint description attribute had no values, so it could
contribute nothing to the subsequent analysis. It was dropped by selecting it in the Select
Attributes operator with the selection inverted, which left only the relevant data
attributes as predictors. The Replace Missing Values operator was then used to replace
all missing values with average values. This approach is p...
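
The two preparation steps described above (dropping the empty attribute and replacing missing values with the average) were done with RapidMiner operators. As a rough illustration only, the same logic can be sketched in plain Python. The attribute names Complaint_Description and Employees and all values below are hypothetical stand-ins, and the sketch assumes the attribute being imputed is numeric, since an average is only defined for numeric attributes.

```python
from statistics import mean

# Hypothetical joined rows; None marks a missing value.
rows = [
    {"Company_ID": 1, "Complaint_Description": None, "Employees": 120.0},
    {"Company_ID": 2, "Complaint_Description": None, "Employees": None},
    {"Company_ID": 3, "Complaint_Description": None, "Employees": 80.0},
]


def drop_attribute(rows, name):
    """Return copies of the rows without the named attribute."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


def impute_mean(rows, name):
    """Replace missing values of a numeric attribute with the mean of the observed values."""
    observed = [row[name] for row in rows if row[name] is not None]
    fill = mean(observed)
    return [{**row, name: fill if row[name] is None else row[name]} for row in rows]


# Drop the all-empty attribute first, then fill the gaps in the numeric attribute.
prepared = impute_mean(drop_attribute(rows, "Complaint_Description"), "Employees")
for row in prepared:
    print(row)
```

Mean imputation keeps every row but pulls the imputed values toward the centre of the distribution, which can understate the attribute's variance when many values are missing.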

