MIS272 – Predictive Analytics
T2 2022
Assignment 2 – Group type your group number here
Student name
Type your name here
…
…
Student number
Type your student number here
…
…
Executive summary
(1 page)
Use this section to include an executive summary written for a senior business manager or similar non-technical reader.
Include your key recommendations to the business here. When presenting your recommendations, ensure you add references to
specific analyses and results in the subsequent sections of your report (e.g., by means of page numbers, figure references).
Rubric reference: Executive summary and recommendations
Data exploration, pattern discovery, and preparation
(2 pages)
Use this section to include all your work on finding meaningful patterns in the data set as relevant to the case study using
RapidMiner. This may include:
- your exploration of relevant data attributes as predictors
- your selection of an attribute as label
- your approach to dealing with missing values, errors in the dataset, etc.
- transformations you have done on the data (including any modifications on the data, any numeric normalizations, or any type conversions such as nominal to numeric and the like)
- your data distribution analysis (e.g., using histograms or scatterplots)
- your correlation analyses using correlation matrices
- your cluster analysis
- your detection of (and discussion around) outliers
Make sure your visualizations are accompanied by relevant discussions of the insights the analyses and visualizations will/should
lead to. BRING IN SCREENSHOTS OF YOUR RAPIDMINER PROCESSES AND EXPLAIN THE MAJOR FUNCTIONALITIES IF YOU DO SO…
Rubric reference: Explore, discover, and prepare data for predictive analysis
Predictive modelling
(2 pages)
Use this section to show and discuss the RapidMiner predictive process/processes you have created as relevant to the case study.
This may include:
- your estimation analysis workings and models
- your justification of why you chose to carry out the specific analysis with the specific data attributes
BRING IN SCREENSHOTS OF YOUR RAPIDMINER PROCESSES AND EXPLAIN THE MAJOR FUNCTIONALITIES IF YOU DO SO…
Rubric reference: Analyse data (estimation and association analysis)
Association analysis
(1 page)
Use this section to report your findings in terms of association analysis. This may include:
- your frequent item set discovery process
- your frequent item set discovery results (evaluation) and discussion
BRING IN SCREENSHOTS OF YOUR RAPIDMINER PROCESSES AND EXPLAIN THE MAJOR FUNCTIONALITIES IF YOU DO SO…
Rubric reference: Analyse data (estimation and association analysis)
Model evaluation and improvement
(1 page)
Use this section to report your evaluation procedures and results in RapidMiner. Also, report any steps you have taken to improve
the performance of your model/s. This may include:
- your evaluation procedure (e.g., hold-out or resampling) for any of the analysis cases you have included in the previous section
- your comparative analysis on the evaluations of different predictive models or different evaluation procedures you have developed (e.g., any performance improvements you may have achieved by trying different pre-processing, outlier removal, validation, or feature selection tasks)
BRING IN SCREENSHOTS OF YOUR RAPIDMINER PROCESSES AND EXPLAIN THE MAJOR FUNCTIONALITIES IF YOU DO SO…
Rubric reference: Evaluate and improve analytics solutions
MIS272 – Predictive Analytics - Trimester 2 2022
Assignment 2 – Group Assignment
DUE DATE AND TIME: 8 pm, 22 September
PERCENTAGE OF FINAL GRADE: 40%
WORD COUNT: 3,000 words (written report)
Description
The purpose of this assignment is to develop your ability to construct a predictive model (using regression
and association analysis) to solve a problem based on understanding a specified business context.
The business context for this assignment relates to decision-making based on consumer complaints. Many
countries around the world set up regulatory bodies that receive and act on complaints lodged by
consumers against companies under their jurisdiction. The frequency and severity of received complaints
inform targeted decision-making by the regulator (such as further investigations, regulatory directives, and
in extreme cases, legal action against companies).
The specific dataset that you are given is from the regulatory body in a particular country. The regulator
receives large volumes of complaints lodged against various companies that provide several types of services
to customers. Every time a complaint is lodged against a company by a customer or by another business, the
regulator assesses the company’s ongoing ‘fitness’ score in the business domain (based on several external
factors). This aggregate fitness score is calculated for the company according to the assessment by the
regulator as part of responding to the complaint (the better the company’s score, the fitter the company is
in terms of their responsibility to address complaints). These scores are used by the regulator to inform
their decision-making processes during and after responding to complaints.
The dataset contains a large number of complaints lodged against specific companies (each complaint
includes details such as the specific company ID and complaint code). The description of what particular
complaint codes refer to has been removed for reasons of confidentiality. At the time when a complaint is
lodged, additional company-related data fields are also collected (from the complainant and external data
sources) and recorded in the data set by the regulator.
You are asked to explore and analyse this data set. Specific tasks are:
Task 1: Use appropriate visualizations, explorative analytics, and cluster analysis to demonstrate a
thorough understanding of the data and extract informative data patterns for use by decision makers at
the regulator.
Task 2: Develop a predictive model to estimate the fitness score given to the company based on relevant
data attributes. You must consider the significance of a variety of applicable attributes, in particular
the sector and location of the company involved.
Task 3: The regulator plans to develop policy aimed at addressing systemic co-occurring complaints
across all companies. For this you are asked to identify the top 10 frequently co-occurring complaints.
Definitions of the data attributes are given in a separate data dictionary file in the assignment folder. It is
recommended that you read the data definitions to better understand and consider the quality of the data
prior to developing your analytics solutions. You will need to join the files (companies and complaints) to be
able to address the tasks above. Your solutions should only draw on learning in the lectures and seminars of
the unit.
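To make the idea of "frequently co-occurring complaints" in Task 3 concrete, the following is a minimal Python sketch that counts how often pairs of complaint codes appear together. It is for intuition only: the submitted analysis must still be built in RapidMiner. The toy data, the grouping of complaints by company, and the names `complaints_by_company` and `top_cooccurring_pairs` are all hypothetical and are not taken from the assignment dataset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (NOT from the assignment dataset):
# company ID -> set of complaint codes lodged against that company.
complaints_by_company = {
    "C1": {"A", "B", "C"},
    "C2": {"A", "B"},
    "C3": {"B", "C"},
    "C4": {"A", "B", "C"},
}


def top_cooccurring_pairs(baskets, k=10):
    """Count how many baskets contain each pair of codes; return the top k pairs."""
    counts = Counter()
    for codes in baskets.values():
        # Sort so that each pair is always recorded in the same order.
        counts.update(combinations(sorted(codes), 2))
    return counts.most_common(k)


for pair, count in top_cooccurring_pairs(complaints_by_company):
    print(pair, count)
```

Treating each company as one "basket" is an assumption made for this sketch; whether complaints should be grouped by company or by some other unit depends on the data dictionary.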
Specific Requirements
This is a group assignment (2-3 students maximum). Every group member must submit a declaration of
their individual contribution as part of the submission. Students in each group will need to work together
regularly on their assignment and submit their work as a group.
Groups should NOT discuss their work or collude with students of other groups. Please refer to the Deakin
policy on Academic misconduct in this regard.
You must use the submission template for the assignment provided on CloudDeakin for your report. Your final
report must adhere strictly to the page limits in the template as only pages within the limits will be marked. It
is essential that the executive summary section of your report is targeted at a non-technical reader (e.g., a
senior manager at the Regulator) and that the remaining parts of the report target a data/business analyst.
Your final deliverables must include:
i) the final report according to the submission template as a PDF file
ii) all RapidMiner process files, combined into a single ZIP file
iii) declaration of each student’s individual contribution to the submission (use provided form)
All submissions will need to be lodged via the CloudDeakin dropbox before the deadline. You should submit a
partial submission of your work prior to the deadline. You should select and include relevant tables, charts,
analysis processes, analysis results, models and the evaluation in your report.
You must use RapidMiner for this assignment. The use of Excel or any other analytics tool is therefore not
permitted. You must include appropriate documentary notes within your RapidMiner process files to make it
easier to understand the logic. Use sub-processes as appropriate to consolidate related analytical steps, and
to improve the readability of your processes.
The consistency of your RapidMiner file(s) will be checked against the results in your report. You must NOT
modify the data file provided for this assignment before importing it into RapidMiner.
Marking and feedback
The marking rubric for this assignment is available on the CloudDeakin unit site - in the Assessment folder
(under Assessment Resources).
You should familiarise yourself with the criteria before completing any part of the assessment. Criteria
act as a boundary around the task and help identify what assessors are looking for specifically in your
submission. The criteria are drawn from the unit’s learning outcomes ensuring they align with
appropriate graduate attribute/s.
Identifying the standard you aim to achieve is also a useful strategy for success and to that end,
familiarising yourself with the descriptor for that standard is recommended.
Students who submit their work by the due date will receive their marks and feedback on CloudDeakin
15 working days after the submission date.
Extensions
There will be no extensions granted unless there are exceptional and most unusual circumstances outside
the student’s control. Partial submissions will be considered as evidence of the group’s progress in this regard.
Students who require a time extension should submit a written request to the Unit Chair, supported with
documentation (e.g., a medical certificate). Such requests should be e-mailed to the Unit Chair. Requests for
extensions will NOT be considered in the three days prior to the submission deadline.
Late submission
The following marking penalties will apply if you submit an assessment task after the due date
without an approved extension: 5% will be deducted from available marks for each day up to five
days, and work that is submitted more than five days after the due date will not be marked and will
receive 0% for the task.
'Day' means working day for paper submissions and calendar day for electronic submissions. The
Unit Chair may refuse to accept a late submission where it is unreasonable or impracticable to
assess the task after the due date.
The late penalty is calculated as follows, based on the assignment being due at 8pm on a Thursday:
• 1 day late: submitted after 8pm Thursday but before 8pm Friday – 5% penalty.
• 2 days late: submitted after 8pm Friday but before 8pm Saturday – 10% penalty.
• 3 days late: submitted after 8pm Saturday but before 8pm Sunday – 15% penalty.
• 4 days late: submitted after 8pm Sunday but before 8pm Monday – 20% penalty.
• 5 days late: submitted after 8pm Monday but before 8pm Tuesday – 25% penalty.
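The schedule above can be expressed as a short calculation. The Python sketch below is illustrative only: the function name `late_penalty` is hypothetical, the year 2022 is taken from the trimester stated in this document, and counting a submission made exactly on a 24-hour boundary as belonging to the earlier day is an assumption, since the schedule does not say how that case is treated.

```python
import math
from datetime import datetime, timedelta

# Due date from this brief: 8 pm, Thursday 22 September (T2 2022).
DUE = datetime(2022, 9, 22, 20, 0)


def late_penalty(submitted, due=DUE):
    """Return (days_late, penalty_percent).

    penalty_percent is None when the work is more than five days late,
    meaning it is not marked and receives 0% for the task.
    """
    if submitted <= due:
        return 0, 0
    # Each started 24-hour period after the deadline counts as one day late.
    days_late = math.ceil((submitted - due) / timedelta(days=1))
    if days_late > 5:
        return days_late, None
    return days_late, 5 * days_late


print(late_penalty(datetime(2022, 9, 23, 10, 0)))  # Friday morning
print(late_penalty(datetime(2022, 9, 26, 21, 0)))  # Monday, after 8 pm
```

The penalty is a percentage of the marks available for the task, so it is subtracted from the mark awarded, not applied as a proportion of it.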
Support
The Division of Student Life (see link below) provides all students with editing assistance. Students who wish
to take advantage of this service must be organized and plan ahead and contact the Division of Student Life
in order to schedule a booking, well in advance of the due date of this assignment.
http://www.deakin.edu.au/about-deakin/administrative-divisions/student-life
Referencing
Any material used in this assignment that is not your original work must be acknowledged as such and
appropriately referenced. You can find information about plagiarism and other study support resources at
the following website: http://www.deakin.edu.au/students/study-support
Academic misconduct
For information about academic misconduct, special consideration, extensions, and assessment
feedback, please refer to the document Your rights and responsibilities as a student in this Unit in
the first folder next to the Unit Guide in the Resources area of the CloudDeakin unit site.
MIS272 – Predictive Analytics
T2 2022
Assignment 2 – Rubric
Assessment Items (Global Learning Outcomes)
YET TO ACHIEVE MINIMUM STANDARDS
Fail (N) / 0-49: Poor/unacceptable/not attempted/requires further development/needs improvement
MEETS EXPECTATIONS
Pass (P) / 50-59: Acceptable/satisfactory
Credit (C) / 60-69: Good/well done
EXCEEDS EXPECTATIONS
Distinction (D) / 70-79: Very good/exceeds expectations
High Distinction (HD) / 80-100: Excellent and exemplary/exceeding high standards
Executive summary of problem and solution (GLO1)
Fail (N). Default: 3, Range: 0 – 5.9. Provides no or incomprehensible summary of problem and solution. Solutions are not justified. No references are given to the rest of the report.
Fail (N). Default: 8, Range: 6 – 9.9. Provides insignificant summary of problem and solution. Solutions are not fully justified. Limited references to the rest of the report.
Pass (P). Default: 11, Range: 10 – 11.9. Provides a satisfactory summary of problem and solution. Some solutions are justified. Some minor references to the rest of the report.
Credit (C). Default: 13, Range: 12 – 13.9. Provides a good summary of problem and solution. Most of the solution is justified. Adequate references to the rest of the report.
Distinction (D). Default: 15, Range: 14 – 15.9. Provides a very good summary of both the problem and solution. Solution well justified. Good reference to the rest of the report.
High Distinction (HD). Default: 20, Range: 16 – 20. Provides an excellent summary of both the problem and solution at an appropriate level of specificity. Solutions are fully comprehensible and fully justified. Excellent referencing to the rest of the report.
Explore, discover, and prepare data for predictive analysis
Fail (N). Default: 1.5, Range: 0 – 2.9. Provides no evidence of initiative in exploration and no viable approaches to preparing data. Finds no meaningful patterns.
Fail (N). Default: 4, Range: 3 – 4.9. Provides little evidence of initiative in exploration and a few viable approaches to preparing data. Finds a few meaningful patterns.
Pass (P). Default: 5.5, Range: 5 – 5.9. Demonstrates satisfactory initiative in exploration and identifies multiple basic approaches to preparing data. Finds some meaningful patterns.
Credit (C). Default: 6.5, Range: 6 – 6.9. Demonstrates good initiative in exploration and identifies multiple adequate approaches to preparing data. Finds adequate meaningful patterns.
Distinction (D). Default: 7.5, Range: 7 – 7.9. Demonstrates very good initiative in exploration and multiple effective approaches to preparing data. Finds relevant and meaningful patterns that inform predictive modelling.
High Distinction (HD). Default: 10, Range: 8 – 10. Demonstrates exemplary initiative in exploration and multiple advanced approaches for preparing data. Finds significant meaningful patterns that inform predictive modelling.
Analyse data with predictive models (GLO4)
Fail (N). Default: 4.5, Range: 0 – 8.9. Summarizes data rather than predictive modelling; does not address the problem and questions.
Fail (N). Default: 12, Range: 9 – 14.9. Summarizes data with very little predictive modelling; rarely addresses the problem and questions.
Pass (P). Default: 16.5, Range: 15 – 17.9. Reveals some proficiency in predictive modelling; addresses the problem and questions satisfactorily.
Credit (C). Default: 19.5, Range: 18 – 20.9. Reveals generally sound proficiency in predictive modelling; addresses the problem and questions well.
Distinction (D). Default: 22.5, Range: 21 – 23.9. Reveals very good proficiency in predictive modelling; insightful response to the problem and questions.
High Distinction (HD). Default: 30, Range: 24 – 30. Reveals excellent proficiency in predictive modelling; insightful response to the problem and questions.
Evaluate and improve analytic solutions (GLO4)
Fail (N). Default: 6, Range: 0 – 11.9. Evaluation of solutions is superficial, lacking any consideration for the models developed, with little or no logical judgments of the pros and cons of the models or improvement.
Fail (N). Default: 16, Range: 12 – 19.9. Evaluation of solutions is partial, lacking consideration for the models developed, with little or no logical judgments of the pros and cons of the models or improvement.
Pass (P). Default: 22, Range: 20 – 23.9. Evaluation of solutions satisfactorily considers the models developed, logically judges the pros and cons of the models. Reveals satisfactory efforts towards model improvement.
Credit (C). Default: 26, Range: 24 – 27.9. Evaluation of solutions reasonably considers the models developed, logically judges the pros and cons of the models. Reveals generally sound efforts towards model improvement.
Distinction (D). Default: 30, Range: 28 – 31.9. Evaluation of solutions thoroughly considers models developed, logically judges the pros and cons of the models. Reveals very good efforts towards model improvement.
High Distinction (HD). Default: 40, Range: 32 – 40. Evaluation of solutions contains consistently thorough and insightful consideration of the models developed, logically judges the pros and cons of the models. Reveals excellent efforts towards model improvement.