Description
Analyze percentiles, quartiles, and the Five-Number Summary.
Length: 300 words, with an APA-style reference.
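As a rough illustration of the terms in the question, the sketch below computes a five-number summary (minimum, first quartile, median, third quartile, maximum), the interquartile range, and a percentile. It is only a sketch under stated assumptions: the sample values and the function names `five_number_summary` and `percentile` are hypothetical and do not come from the assignment. Quartile conventions differ between textbooks and software. This version uses the "inclusive" linear-interpolation method from Python's standard `statistics` module, so a textbook "median of each half" rule can give slightly different quartiles on some samples.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a numeric sample.

    Quartiles use the "inclusive" linear-interpolation convention;
    other conventions may give slightly different Q1 and Q3.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def percentile(values, p):
    """Return the p-th percentile for an integer p between 1 and 99."""
    if not 1 <= p <= 99:
        raise ValueError("p must be an integer between 1 and 99")
    # quantiles(..., n=100) returns the 99 cut points P1..P99 in order.
    return quantiles(sorted(values), n=100, method="inclusive")[p - 1]


# Hypothetical sample, chosen only to keep the arithmetic easy to follow.
sample = [7, 1, 9, 3, 5, 2, 8, 4, 6]

summary = five_number_summary(sample)
iqr = summary[3] - summary[1]  # interquartile range: Q3 - Q1

print("Five-number summary:", summary)  # (1, 3.0, 5.0, 7.0, 9)
print("Interquartile range:", iqr)      # 4.0
print("90th percentile:", percentile(sample, 90))
```

The median is the 50th percentile, and Q1 and Q3 are the 25th and 75th percentiles, so the five-number summary is a compact selection of percentiles plus the two extremes.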
Most Popular Content
R Studio, decision tree, tree interpretation
Assignment Instructions

Scenario: You work for an insurance company that has many policy holders, and many agents who sell insurance to new customers every day. You have been asked to use historical data about past and current policy holders to build a decision tree that will be used by sales agents to determine the insurability of potential new clients. You will use two data sets to do this.

The Policy Holders data set contains information about current and past auto insurance customers, such as whether or not they have had a claim or ticket in the past 12 months, an accident in the past 36 months, how they pay for their policy, their gender and marital status, and the level of activity associated with their insurance account (this is Low, Moderate, or High based on frequency of changes to the policy, frequency of late or partial payments, and other similar account activity). Note that the only variable in the Policy Holders data set that is not also in the Policy Buyers data set is the Insurance Category variable. This is the dependent variable that you will predict using a decision tree model.

For the Policy Holders, you have the benefit of hindsight, since your company did sell auto insurance policies to all of the people in this data set. Looking back on their activity as policy holders, each has been assigned one of four Insurance Category values: Insure-Best Terms, Insure-Risk Terms, Insure-High Premium, or Do Not Insure.

The "Best Terms" customers are those who have paid their premiums and had no or few claims that have cost your company money. They are the lowest-risk customers.

The "Risk Terms" customers have been good for your company, but have had a few claims or incidents that have cost the company money. They are still a good risk for the company, but may have slightly higher premiums or lower coverage amounts to account for the higher risk to the company.

The "High Premium" customers are those who have had a number of claims or other problems that have cost the company money (e.g., maybe they have not always paid their premiums on time or in full), but have still been worth insuring as long as they paid higher premiums than most of the other customers. They represent a higher risk for the company, and therefore must be sold policies at higher premiums and lower coverage.

The "Do Not Insure" customers are those who have filed too many claims and/or claims that have cost more than what they have paid in premiums, or who have been so unreliable in paying their premiums that they cost the company more money than they pay in, and are therefore not a good risk for the company. They may have had their policies cancelled by the company due to excessive risk that the company cannot bear.

Complete the following steps:

1. Download the PolicyHolders.csv and PolicyBuyers.csv files from Course Documents. In a Word document, create a cover page for your Assignment, then provide evidence that you have imported both of these data sets into R with appropriate names.
2. Use the rpart function in R to create a decision tree model for the Insurance Category dependent variable. Do not forget to load library(rpart). Provide evidence in Word that you have created the model.
3. Using summary(<yourtreename>), identify the three most important independent variables used to predict Insurance Category. In Word, show evidence of the three top independent variables. Write a short explanation of your findings.
4. Use Tools > Install Packages in the R Studio application menu to install the rpart.plot package. Once installed, load this package using library(rpart.plot). Then, use the prp function to visualize your decision tree. You may need to resize the Plots window in the lower right part of your R Studio application to make the tree large enough to read. In your prp function, include the following parameters: extra=4, faclen=0, varlen=0, cex=.75. The extra parameter displays the predicted class probabilities (the confidence values) in each leaf of your tree; faclen=0 causes the factor level names to be spelled out in full; varlen=0 causes the variable names to be spelled out in full; and cex sets the font size (you can experiment with this if you would like). In your Word document, include a screen capture of your visualized decision tree. Write a short explanation of how the percentages in each tree leaf would be interpreted.
5. Make predictions for each of the policy buyers by applying your decision tree model to the Policy Buyers data set. When using the predict function in R, be sure to include the parameter type="class" so that you will generate an Insurance Category for each policy buyer. Using the Filter feature in R Studio, report the number of policy buyers that you predict will fall into each of the four categories. Be sure to label these clearly in your Word document. If you have done this step correctly, the numbers predicted for each category should total 473, which is the number of records in the Policy Buyers data set.
6. Conduct research about how the insurance industry uses analytics to manage risk as they insure their customers. Write a brief summary of your research (1–2 paragraphs) discussing how the insurance industry uses analytics. Be sure to include discussion of both legal and ethical ramifications for the industry in their use of analytics. Cite your sources both in the text and in a references page.

Screenshots need to be included for each step in R Studio.
8 pages
Organization Analysis 1
❑ Descriptive statistics are shifts, total sales, and participant ID ❑ I completed my analysis by evaluating, cleaning, and summarizing ❑ We ...
MATU203 Mickey & Minnie Home Sales Statistics And Probability Research Paper
See details in the attached instructions and rubrics for all parts. Must be in APA format.
Similar Content
UC Merced Rayleigh Distribution Advanced Probability Statistics Worksheet
Suppose X1, ..., Xn is an independent and identically distributed sample from a Rayleigh distribution with parameter θ > ...
Prisms are polyhedra with two parallel congruent __ called bases, math homework help
vertices, edges, faces, polyhedrons...
Use Mathematica to solve a differential equation
Please see the attached file and answer all the questions (4 questions) using Mathematica. Please use the given equation. t...
Homework question Need help
Automobile Production The number N of cars produced at a certain factory in 1 day after t hours of
operation is given by ...
Need math help with the probability of an event
An ordinary (fair) die is a cube with the numbers 1 through 6 on the sides (represented by painted spo...
Paragraph responding to the following post
Describe the error in the conclusion. Given: There is a linear
correlation between the number of cigarettes smoked and t...
College Algebra Solutions
Given in the opposite problem, we need to find the solution bounded by the following 1) First, to graph any inequality, we...
Answers 2
(31) If you could save 5 cents ($0.05) every 30 seconds, approximately how many years would it take you to save a million ...
Assignment 1
Demand for patient surgery at the General Hospital has increased steadily in the past few years, as shown in the following...