Description
User generated content is uploaded by users for the purposes of learning and should be used following Studypool's honor code & terms of service.
Explanation & Answer
y = -x
Most Popular Content
7 pages
Part 2 E Commerce Retail Sales
Time Series Plot of Quarterly E-commerce Retail Sales a CMA Plot of Quarterly E-commerce Retail Sales as Percent Retail Sales as Percent of Total U.S. ...
MAT 144 Grand Canyon University Modified Version of Technology Worksheet
Hello Class, and Happy Monday!

USE THE TEMPLATE ATTACHED TO THIS POST RATHER THAN THE TEMPLATE IN ALEKS. THE TEMPLATE INCLUDES INSTRUCTIONS FOR COMPLETING THE DQ.

For this DQ, you’ll complete a modified version of Technology Assignment 1 in Lesson 3-1 of the textbook in ALEKS, in which you’ll compare the empirical and theoretical probabilities of rolling a given number on a multi-sided die. Download the attached template (“QR 3-1 Tech Template MODIFIED.xlsx”) and follow the instructions in its text box to complete the Excel workbook. In this case, the ALEKS video may not provide much help. Here is a video I made to help!

Here are some additional hints for completing the DQ:

Hint #1: The RANDBETWEEN() function in cell B2 is recalculated each time you make a change in the worksheet, so your Frequency and Empirical Probability entries in E2:F16 will keep updating as you edit. This is normal.

Hint #2: Use these steps to copy and paste the contents of cells E2:F16 by value:
1. Select the range of cells E2:F16, then press Copy.
2. Right-click the top-left cell that you’re copying to (either D21 or F21), then select the “By Value” option (the clipboard with “123” underneath it) from the Paste Options section of the right-click menu.
Once you have pasted the cell contents by value, the values in the destination cells should no longer change. Here’s a link to a video that may also help with this: https://www.youtube.com/watch?v=PbRQBse3Ob0.

Hint #3: In your response comparing the empirical probabilities from 1,000 rolls with those from 4,000 rolls, compare each set of empirical probabilities against the theoretical probabilities in cells G2:G16. Which group of empirical probabilities is, on average, closer to the theoretical probabilities? (How would you quantify “closer” in this case?) How might the number of rolls be contributing to this result?

I hope this helps; feel free to let me know if you have any questions.
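One possible way to quantify “closer” in Hint #3 is the mean absolute difference between the empirical and theoretical probabilities. The R sketch below assumes a fair die with faces numbered 1 to `n_faces`; the default of 15 faces is only a guess based on the template's rows 2 to 16, so adjust it to match your worksheet. `sample()` plays the role of RANDBETWEEN(), and `empirical_gap` is a hypothetical helper name, not part of the template.

```r
# Hypothetical helper: average distance between empirical and theoretical probabilities.
# Assumption: a fair die with faces 1..n_faces, each with probability 1 / n_faces.
empirical_gap <- function(n_rolls, n_faces = 15) {
  rolls <- sample(1:n_faces, n_rolls, replace = TRUE)  # plays the role of RANDBETWEEN(1, n_faces)
  freq <- tabulate(rolls, nbins = n_faces)             # Frequency column: count of each face
  empirical <- freq / n_rolls                          # Empirical Probability column
  theoretical <- rep(1 / n_faces, n_faces)             # theoretical probabilities
  mean(abs(empirical - theoretical))                   # mean absolute difference ("closeness")
}

set.seed(1)
gap_1000 <- empirical_gap(1000)
gap_4000 <- empirical_gap(4000)
c(gap_1000 = gap_1000, gap_4000 = gap_4000)

# Averaging over many repetitions smooths out the luck of a single run
c(avg_1000 = mean(replicate(200, empirical_gap(1000))),
  avg_4000 = mean(replicate(200, empirical_gap(4000))))
```

A single run can occasionally favor 1,000 rolls by chance. The averaged gap for 4,000 rolls, however, should come out at roughly half the 1,000-roll value, because the sampling error of each empirical probability shrinks in proportion to 1/sqrt(number of rolls).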
2 pages
Answers Week 2 Quiz
1) In the CH 4 video entitled "Example 6: Cost of a Car", how much would Jorge save
2) In the CH 4 video entitled "Example 5: Monthly Compounding ...
Harvard Stats Homework Using R Lab 5 Help
Stats homework using R

Homework - Lab 5:
# The central limit theorem (CLT) states that as the sample size gets sufficiently large, the distribution of the sample means will be approximately normal.
# In addition, the CLT has been used to justify relying on the mean (not the median or trimmed mean) of our samples for many of our statistics.
# There are a few problems with the CLT:
# 1) How large of a sample is needed?
# 2) It seems that our experiments with the contaminated normal may contradict this.
# In this homework assignment you will investigate the CLT further.

# PART 1 - The Central Limit Theorem under Normality
# 1.1) Simulate a standard normal population of 1 million people called pop1.
# 1.2) Draw 5000 samples of size 20 and put these in sam20. Draw 5000 samples of size 50 and put these in sam50.
# 1.3) Create variables called sam20means and sam50means that contain the means of the samples. Use a density plot to show the sampling distributions of sam20means and sam50means together.
# 1.4) Compare the Standard Error (SE) of the sampling distributions. Which sample size creates better estimates of the population mean (i.e., has the lowest SE)?

# PART 2 - The Central Limit Theorem under Non-Normality
# 2.1) Using cnorm(), simulate a contaminated normal population of 1 million people called pop2, where 30% of the data (epsilon=0.3) have an SD of 30 (k=30).
# 2.2) Draw 5000 samples of size 30 and put these in sam30. Draw 5000 samples of size 100 and put these in sam100.
# 2.3) Create variables called sam30means, sam30tmeans, sam100means, and sam100tmeans that hold the means AND trimmed means of the samples.
# 2.4) Use a density plot to show the sampling distributions of the means and trimmed means for these variables.
# 2.5) Compare the Standard Error (SE) of the sampling distributions.
# 2.6) Which would be better here: a larger sample size using the mean as the location estimator OR a smaller sample using the trimmed mean?
# 2.7) Which location estimator performs the best, regardless of sample size?

-------------------------------------------------------------------------------------
Lab 5 lecture notes:

# Lab 5 - Contents
#  1. Sampling Distribution of the Mean, Median, and Trimmed Mean under Normality
#  2. Sampling Distribution of the Mean, Median, and Trimmed Mean under Non-Normality
#  3. The Central Limit Theorem

# Last week we saw that when we had a Normal or Uniform population,
# the means of random samples taken from that population were (approximately) normally distributed.
# Today we are going to investigate the distributions of the mean,
# median, and trimmed mean of samples coming from normal
# and non-normal populations.

#---------------------------------------------------------------------------------
# 1. Sampling Distribution of the Mean, Median, and Trimmed Mean under Normality
#---------------------------------------------------------------------------------

# Let's start by generating a standard normal distribution (mean=0, SD=1) for 1 million subjects.
# We will use this as our population from a normal distribution.
pop1 = rnorm(1000000, mean=0, sd=1)

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
# EXERCISE 1-1:
# A) Find the mean, median, trimmed mean (using tmean()), and sd of pop1
# B) Draw a density plot of pop1
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

# A)
mean(pop1); median(pop1); tmean(pop1); sd(pop1)

# B)
plot(density(pop1))

# Like we did last week, we are going to take random samples
# from our population, compute a measure of central tendency
# (e.g., mean, median, trimmed mean) for each sample, and examine
# the distribution of this measure.
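Exercise 1-1 relies on tmean(), which is not part of base R. Like cnorm(), it appears to come from Dr. Wilcox's collection of functions, where (to our knowledge) it computes a 20% trimmed mean by default. If those functions are not loaded, base R's mean() with its trim argument gives the same kind of estimator. The small sketch below, with made-up data, also shows why trimming makes the estimator resistant to outliers.

```r
# A 20% trimmed mean in base R: sort, drop the lowest and highest 20% of values, average the rest.
# Assumption: this matches tmean(x) with its default amount of trimming.
x <- c(1:9, 100)   # ten values, one of them an extreme outlier

plain_mean   <- mean(x)              # 145 / 10 = 14.5, pulled upward by the outlier
trimmed_mean <- mean(x, trim = 0.2)  # drops 1, 2 and 9, 100; averages 3..8 = 5.5
med          <- median(x)            # (5 + 6) / 2 = 5.5

c(mean = plain_mean, trimmed = trimmed_mean, median = med)
```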
# We are going to take 5000 samples of 20 subjects each.
sam1 = matrix(NA, nrow=20, ncol=5000)   # one column per sample

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
# EXERCISE 1-2: Use a loop to draw 5000 samples of size 20 from pop1
# and place the samples in sam1
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
for (ii in 1:5000) {
  sam1[ , ii] = sample(pop1, 20, replace=TRUE)
}

# Now that we have a matrix containing all 5000 samples (i.e., sam1),
# we can create variables for each of our location measures.
# I'll start us off with the mean:
sam1means = apply(sam1, 2, mean)   # MARGIN = 2: apply the function to each column rather than each row

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
# EXERCISE 1-3: Use the apply function to generate
# the variables sam1meds (medians) and sam1tmeans (trimmed means)
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
sam1meds = apply(sam1, 2, median)
sam1tmeans = apply(sam1, 2, tmean)

# Let's look at the distributions of each of these location estimators
plot(density(sam1means))
lines(density(sam1meds), col="red")
lines(density(sam1tmeans), col="blue")
abline(v = mean(pop1), lty=2)   # add a dashed line at the pop1 mean

#??????????????????????????????????????????????????????????????#
# Thought Question 1: Which location estimator performs the best
# for data coming from a normal population? Why?
#??????????????????????????????????????????????????????????????#

# One way to determine which location estimator performs the best
# is to look at the standard deviation of the estimator across all the samples.
# The estimator with the lowest SD has the least variability across the samples.
# The standard deviation of a location estimator across samples is more commonly
# called its Standard Error (SE).

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
# EXERCISE 1-4: Find the Standard Error of the sample means,
# medians, and trimmed means. Based upon the SE, which
# location estimator is the best for samples coming from
# a normal population?
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
sd(sam1means); sd(sam1meds); sd(sam1tmeans)
# The mean performs the best (it has the smallest SE).

# In real life, we generally cannot go out and collect multiple samples
# from a population, so we estimate the Standard Error of the mean with a formula:
# SE = sd(sample) / sqrt(n), where n is the sample size

#---------------------------------------------------------------------------------
# 2. Sampling Distribution of the Mean, Median, and Trimmed Mean under Non-Normality
#---------------------------------------------------------------------------------

# Normal distributions generally have very few outliers.
# However, when outliers begin to occur more frequently, some of the
# basic assumptions about normal distributions no longer hold
# (as we are about to see).
# A distribution that resembles a normal distribution but has more outliers
# is called a mixed or contaminated normal distribution; it results from
# two populations mixing together.
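One example of a normal-distribution property that stops holding: for a normal population, about 68% of values lie within 1 SD of the mean and only about 0.3% lie beyond 3 SDs. The base R sketch below compares a contaminated sample with a single normal sample that has the same mean and SD. The mixture puts 10% of values at SD 10, the same settings as cnorm(n, epsilon=0.1, k=10), but it is built by hand. `tail_share` is a hypothetical helper.

```r
# Hypothetical helper: share of values more than m SDs away from the mean
tail_share <- function(x, m) mean(abs(x - mean(x)) > m * sd(x))

set.seed(11)
n <- 100000
# Contaminated normal: each value is multiplied by 10 with probability 0.1
mixed <- rnorm(n) * ifelse(runif(n) < 0.1, 10, 1)
# One normal population with the same mean and SD as the mixture
single <- rnorm(n, mean(mixed), sd(mixed))

summary_table <- rbind(
  mixed  = c(within_1sd = 1 - tail_share(mixed, 1),  beyond_3sd = tail_share(mixed, 3)),
  single = c(within_1sd = 1 - tail_share(single, 1), beyond_3sd = tail_share(single, 3))
)
round(summary_table, 4)
```

In theory the mixture puts roughly 92% of its values within 1 SD and about 3% beyond 3 SDs, against 68% and 0.3% for the single normal. The matching mean and SD hide a sharper peak and much heavier tails.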
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
# EXAMPLE 1: "a" will be a mix of TWO populations: 1) with SD=1 and 2) with SD=2
a = c(rnorm(5000, 0, 1), rnorm(5000, 0, 2))

# Let's compare this to b, which comes from ONE normal population with the same mean and SD as a
b = rnorm(10000, mean(a), sd(a))
plot(density(a))
lines(density(b), col="red")
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#??????????????????????????????????????????????????????????????#
# Thought Question 2: How are a and b from Example 1 different?
#??????????????????????????????????????????????????????????????#

# Thankfully, rather than creating contaminated normal distributions the hard way,
# we can use a function provided by Dr. Wilcox called cnorm().

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#
# Contaminated/Mixed Normal Distribution: cnorm(n, epsilon=0.1, k=10)
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

# Let's look at the options for the contaminated normal distribution.
# cnorm() combines two normal distributions:
# 1) A standard normal (mean=0, sd=1) for a proportion 1 - epsilon of the data
# 2) A normal with mean=0 and sd=k for a proportion epsilon of the data

# To re-create the variable a from Example 1, we would use:
z = cnorm(10000, epsilon=0.5, k=2)
plot(density(a))
lines(density(z), col="blue")
# The result looks very similar to a!
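If Dr. Wilcox's functions are not loaded, a contaminated normal that matches the description above can be sketched in base R. `rcnorm_sketch` is a hypothetical name. It draws each observation from N(0, 1) with probability 1 - epsilon and from a normal with SD k otherwise, and it is not guaranteed to reproduce cnorm() draw for draw.

```r
# Hypothetical stand-in for cnorm(n, epsilon, k): a two-component normal mixture.
rcnorm_sketch <- function(n, epsilon = 0.1, k = 10) {
  stopifnot(epsilon >= 0, epsilon <= 1, k > 0)
  x <- rnorm(n)                           # standard normal draws (mean 0, SD 1)
  contaminated <- runif(n) < epsilon      # TRUE with probability epsilon
  x[contaminated] <- k * x[contaminated]  # these values now have SD k
  x
}

set.seed(7)
z <- rcnorm_sketch(100000, epsilon = 0.1, k = 10)
sd(z)     # theory: sqrt(0.9 * 1 + 0.1 * 10^2) = sqrt(10.9), about 3.30

z_a <- rcnorm_sketch(10000, epsilon = 0.5, k = 2)   # settings used above to mimic a
sd(z_a)   # theory: sqrt(0.5 * 1 + 0.5 * 2^2) = sqrt(2.5), about 1.58
```

The SD formula in the comments holds because both components have mean 0. As a result, the mixture's variance is simply the weighted average of the component variances.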
# Let's create a second population called pop2 from a contaminated normal distribution
pop2 = cnorm(1000000, epsilon=0.1, k=10)

# Its mean, SD, and density plot:
mean(pop2); sd(pop2); plot(density(pop2))

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
# EXERCISE 2:
# A) Create an empty matrix called sam2 to contain 5000 samples
#    of 20 observations each
# B) Populate sam2 with 5000 random samples of size 20 from pop2
# C) Compute the mean (sam2means), median (sam2meds),
#    and trimmed mean (sam2tmeans) for each sample
# D) Create an overlaid density plot of the three location estimators
#    WITH the pop2 mean as a vertical line
# E) Find the SE of each location estimator
# F) Based upon the SE, which location estimator is the best
#    for samples coming from a contaminated normal distribution?
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

# A)

# B)

# C)

# D)

# E)

# F)

#---------------------------------------------------------------------------------
# 3. The Central Limit Theorem
#---------------------------------------------------------------------------------

# We've discovered a few things today:
# 1) When a population comes from a normal distribution,
#    the mean is the best location estimator for the samples.
# 2) When a population comes from a mixed/contaminated normal distribution,
#    the trimmed mean is the best location estimator.
# These observations are related to the Central Limit Theorem (CLT),
# which is discussed in Section 5.3 of the book (page 85).
# The CLT states that as the sample size gets sufficiently large,
# the distribution of the sample means will be approximately normal.
# We saw a demonstration of this last week when we looked at the means
# of samples from the uniform distribution.
# The CLT has been used to justify relying on the mean
# (not the median or trimmed mean) of our samples for many of our statistics.
# There are a few problems with the CLT:
# 1) How large of a sample do we need?
# 2) It seems that our experiments with the contaminated normal may contradict this.
# In the homework you will investigate this further.
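The two findings at the start of Section 3 can be checked with a small helper that repeats the sampling workflow of Exercises 1 and 2 for any population. This is a sketch in base R. `compare_se` is a hypothetical name, mean(x, trim = 0.2) stands in for tmean(), and the contaminated population is built by hand with the same settings as cnorm(n, epsilon=0.1, k=10) instead of calling cnorm().

```r
# Hypothetical helper: Standard Errors of three location estimators,
# estimated from n_samples random samples of size n drawn from pop.
compare_se <- function(pop, n, n_samples = 5000) {
  # Each column of sams is one sample of size n (drawn with replacement, as in the lab)
  sams <- matrix(sample(pop, n * n_samples, replace = TRUE), nrow = n)
  c(mean    = sd(colMeans(sams)),
    median  = sd(apply(sams, 2, median)),
    trimmed = sd(apply(sams, 2, mean, trim = 0.2)))  # stand-in for tmean()
}

set.seed(2024)
pop_size <- 100000   # smaller than the lab's 1 million to keep the sketch fast
normal_pop <- rnorm(pop_size)
# Contaminated normal: each value has SD 10 with probability 0.1, otherwise SD 1
mixed_pop <- rnorm(pop_size) * ifelse(runif(pop_size) < 0.1, 10, 1)

se_normal <- compare_se(normal_pop, n = 20)
se_mixed  <- compare_se(mixed_pop, n = 20)
round(rbind(normal = se_normal, contaminated = se_mixed), 3)
```

Under the normal population, the mean should show the smallest SE (close to 1/sqrt(20), about 0.22). Under the contaminated population, the mean's SE should come out well above that of the trimmed mean, even though the CLT still applies to the mean. Calling compare_se() with other values of n is one way to explore how sample size changes the picture.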
Similar Content
Find the line of symmetry and the vertex for the equation
y = x^2 + 5x + 4...
I want partial derivatives
I want to prove the partial derivatives in the attached file, mentioning the rule used and explaining the answer. Thank you...
MATH 180 Grossmont Cuyamaca Community Basic Calculus Exam
Please solve this exam and make it clear with good handwriting. Show all steps. ...
Calculus question: local max
Give an example of a continuous function that is defined for all x > 0 and such that x=1 is a local maximum but not a g...
find the length of the lower base and the perimeter of the trapezoid
The length of the median in isosceles trapezoid TPRA is 25 meters. TA is 16, AR is 8.5 m. I am looking for PR and TP...
Boston University Economics Negative Externalities Discussion
Externalities have always been an interesting debate topic in economics. Check this article for example: https://www.forbes....