Sept 3 2018

User Generated

cerggloynpx2005

Business Finance

Description

1.

Show the outcomes, the number of combinations, and the probabilities for the discrete random variable for the number of tails in three successive flips of a coin. Also calculate the mean and the variance.
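One way to check the outcome table for this problem is to enumerate every flip sequence. The sketch below assumes a fair coin (the problem does not say so explicitly), and the function name `tails_distribution` is made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def tails_distribution(flips=3):
    """Return {number of tails: probability} for a fair coin (assumed)."""
    outcomes = list(product("HT", repeat=flips))       # all 2**flips sequences
    counts = Counter(seq.count("T") for seq in outcomes)
    total = len(outcomes)
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


dist = tails_distribution()
# Mean and variance of a discrete random variable: sum of x*P(x) and (x-mean)^2*P(x)
mean = sum(k * p for k, p in dist.items())
variance = sum((k - mean) ** 2 * p for k, p in dist.items())

for k, p in dist.items():
    print(f"tails={k}  P={p}")
print("mean =", mean, " variance =", variance)
```

Exact fractions are used so the probabilities can be compared directly against a hand-built table.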

2.

Baseball’s World Series follows a best-of-seven series. The series winner must win four games to be crowned World Champion. However, teams do not always play seven games. As soon as one team wins four games, the series is over. The following table lists the number of games played in every World Series from 1905 to 2011, along with the probability of that happening. For example, 20 times (19.4%) the series was over in four games. The probability table is given below.

  • Calculate the expected number of games, the variance, and the standard deviation for the number of games played in a world series.
  • Networks bid for World Series broadcasting rights, yet they cannot know with certainty how many games they will broadcast. Assume they don’t make a profit unless at least six games are played in the series. What is the probability they will make a profit in any given year?

3.

The binomial tables used by the author are based on a cumulative distribution. Given a sample size (n) and an overall probability of success (p), you can solve probability problems for the binomial distribution using the tables. It is important to learn to use the tables because they help solve some problems much faster than using the binomial formula, which may require many calculations. The key to understanding the tables is that the table is based on a cumulative distribution. Sometimes you need to perform addition and subtraction to get the correct answer. Answer the following questions using the Binomial Table for n = 25 and p = 0.4.

  • Solve the probability for x = 8 using the binomial formula.
  • Next, get the same answer from the binomial tables. To use the table you have to subtract the cumulative probability for x = 7 from the cumulative probability for x = 8. This leaves the exact probability of 8. Confirm that the table calculation equals the formula calculation (to at least four decimal places).
  • What is the probability of more than 8, P(x > 8)? To solve this, you need to subtract the table value for 8 from 1.0000.
  • What is the probability of 4 or less, P(x < 5)?
  • What is the probability of between 6 and 9, P(x > 5 and x < 10)?
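The add-and-subtract logic of a cumulative table can be sketched in a few lines. This is only an illustration of the relationships described above, not the author's table; the helper names `binom_pmf` and `binom_cdf` are assumptions.

```python
from math import comb


def binom_pmf(x, n, p):
    """Binomial formula: probability of exactly x successes."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """Cumulative probability P(X <= x), the quantity a cumulative table lists."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


n, p = 25, 0.4
exact_8 = binom_pmf(8, n, p)                              # formula for x = 8
from_table_8 = binom_cdf(8, n, p) - binom_cdf(7, n, p)    # table: P(X<=8) - P(X<=7)
more_than_8 = 1 - binom_cdf(8, n, p)                      # P(x > 8)
four_or_less = binom_cdf(4, n, p)                         # P(x < 5)
six_to_nine = binom_cdf(9, n, p) - binom_cdf(5, n, p)     # P(x > 5 and x < 10)

print(round(exact_8, 4), round(from_table_8, 4))
```

A printed table is rounded to four decimals, so a table-based answer may differ from the formula in the last digit.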

4.

Suppose an estimated 30% of high school students smoke cigarettes. You randomly select 25 high school students and survey them about their attitudes on smoking, and you ask them if they smoke. Look at the probabilities associated with the number of smokers out of 25. This is a binomial probability distribution problem with n = 25 and p = 0.30.


  • What is the probability that at least 8 students smoke? P(x ≥ 8)?
  • What is the probability that exactly 5 students smoke? P(x = 5)?
  • Calculate the mean, variance, and standard deviation for this problem.

5.

The standard normal table used by the author is based on μ = 0 and σ = 1. The table shows the probability from the center of the distribution out so many standard deviations in the right side of the curve. The right side of the distribution has a total probability of 0.5 (as does the left side). Given any mean and standard deviation from a normal distribution, computing a z-score converts the distribution to a standard normal. Since the normal distribution is symmetrical, the left side is a mirror image of the right side, and you can use the absolute value of a negative z-score to find probabilities on the left side of the curve. It is important to learn to use the standard normal table to solve normal distribution problems.

To get started, find the probabilities based on a normal distribution with a mean of 100 and a standard deviation of 10; that is, μ = 100 and σ = 10.

  • The probability between 100 and 115
  • The probability greater than 115
  • The probability between 120 and 125
  • The probability less than 175
  • The probability greater than 181
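The table lookups described above can be cross-checked numerically. The sketch below uses the error function from Python's standard library instead of a printed table; `z_score` and `normal_cdf` are illustrative names, not something from the text.

```python
from math import erf, sqrt


def z_score(x, mu, sigma):
    """Convert x to standard-normal units."""
    return (x - mu) / sigma


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + erf(z_score(x, mu, sigma) / sqrt(2)))


mu, sigma = 100, 10
# Area from the center out to z = 1.5 (what the author's table reports directly)
p_100_to_115 = normal_cdf(115, mu, sigma) - normal_cdf(100, mu, sigma)
# Right tail: 0.5 minus the table value, or 1 minus the cumulative value
p_above_115 = 1 - normal_cdf(115, mu, sigma)
# Area between two z-scores on the same side: subtract the smaller area
p_120_to_125 = normal_cdf(125, mu, sigma) - normal_cdf(120, mu, sigma)

print(round(p_100_to_115, 4), round(p_above_115, 4), round(p_120_to_125, 4))
```

The same two functions cover the later height, pregnancy, and IQ problems by changing `mu` and `sigma`.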

6.

The height of females aged 20 to 29 follows a normal distribution with μ = 64.1 and σ = 2.8.

  • What is the probability that a female aged 20 to 29 will be under 60 inches tall?
  • What is the probability that a female aged 20 to 29 is taller than 66 inches?
  • What is the probability that a female aged 20 to 29 will be between 61 and 68 inches tall?
  • What is the probability that a female aged 20 to 29 will be shorter than 68 inches?
  • Modeling agencies like models 5’10” or taller. What is the probability a female aged 20 to 29 is greater or equal to 70 inches? Is this a rare event? Explain your answer.

7.

The length of pregnancies follows a normal distribution, with μ = 268 days and σ = 15 days.

  • What is the probability that a pregnancy will last less than 280 days?
  • What is the probability that a pregnancy will last less than 250 days?
  • Pregnancy is considered “at term” when gestation attains 37 complete weeks (259 days) but is less than 42 complete weeks (294 days). What proportion of pregnancies fall within this period?
  • What is the number of days at the 50th percentile?
  • New medical technology and procedures put viability as low as 24 weeks (168 days), but a child born that early only has a 50% chance of survival. Chances of survival are much better at 32 weeks. What is the probability that a pregnancy will last less than 32 weeks (224 days)? Is this a rare event? Explain your answer.

8.

The intelligence quotient (IQ) test can be scored based on a normal distribution with a mean of 100 and a standard deviation of 15: IQ ~ N(100, 15).

  • What proportion of the population has an IQ that falls between 90 and 120?
  • What proportion of the population has an IQ below 95?
  • What is the IQ value at the 85th percentile?
  • The definition of genius is a difficult one. One suggestion is a score higher than 136 on an IQ test, while another is the top 0.1% of the distribution. Calculate the probability of a value of 136 or higher. Compare that to the alternative definition of the top 0.1% (this is a probability of 0.001).

9.

In a sampling distribution for a mean, if we know the population mean and standard deviation (μ and σ), then the distribution of the sample means follows a normal distribution, with a mean equal to μ and the standard deviation equal to σ/SQRT(n). Note that σ/SQRT(n) is called the standard error. Based on this information, if we took a sample of size n (n is given in the problem as some number), what is the probability that the sample mean is greater than (or less than) some value? This is simply a z-score and normal distribution problem, similar to what we did earlier. However, there is one important change with these problems. Now the denominator of the z-score is the standard error and not σ. That said, answer the following questions.

A manufacturing process produces a product that contains an average of 3.5 liters of liquid with a standard deviation of 0.25 liters (i.e., μ = 3.5 and σ = 0.25). If the plant manager takes a sample of 64 observations, what is the probability that:

  • The sample mean is greater than 3.55
  • The sample mean is less than 3.6
  • The sample mean is between 3.45 and 3.5
  • The sample mean is exactly equal to 3.5
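The one change from the earlier normal problems, dividing by the standard error rather than σ, can be sketched as follows. The function names are illustrative assumptions, and the normal CDF is computed with the standard library's error function rather than a table.

```python
from math import erf, sqrt


def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / sqrt(n)


def prob_sample_mean_below(x_bar, mu, sigma, n):
    """P(sample mean < x_bar), using z = (x_bar - mu) / (sigma / sqrt(n))."""
    z = (x_bar - mu) / standard_error(sigma, n)
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma, n = 3.5, 0.25, 64
se = standard_error(sigma, n)
p_greater_355 = 1 - prob_sample_mean_below(3.55, mu, sigma, n)
p_less_36 = prob_sample_mean_below(3.6, mu, sigma, n)
p_345_to_35 = (prob_sample_mean_below(3.5, mu, sigma, n)
               - prob_sample_mean_below(3.45, mu, sigma, n))
# For a continuous distribution, the probability of any single exact value is 0.
p_exactly_35 = 0.0

print(se, round(p_greater_355, 4), round(p_less_36, 4), round(p_345_to_35, 4))
```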

10.

In a sampling distribution for a mean, if we know the population mean and standard deviation (μ and σ), then the distribution of the sample means follows a normal distribution, with a mean equal to μ and the standard deviation equal to σ/SQRT(n). Note that σ/SQRT(n) is called the standard error. Based on this information, if we took a sample of size n (n is given in the problem as some number), what is the probability that the sample mean is greater than (or less than) some value? This is simply a z-score and normal distribution problem, similar to what we did earlier. However, there is one important change with these problems. Now the denominator of the z-score is the standard error and not σ. That said, answer the following questions.

Assume the systolic blood pressure of young adults in the U.S. aged 20 to 30 years follows a normal distribution, with μ = 113.7 and σ = 11.7. If we take a random sample of 150 young adults, what is the probability that:

  • The sample mean is between 113 and 115
  • The sample mean is less than 111
  • The sample mean is greater than 111
  • The sample mean is greater than 116.5

11.

Suppose we work for a company that makes a popcorn product containing 1.2 ounces of unpopped kernels in a microwavable bag. However, no manufacturing process is perfect, and there is variability from bag to bag, which the factory manager seeks to keep to a minimum. Bags that are under-filled can lead to consumer complaints and lawsuits, while bags that are overfilled can result in lost profits and affect the quality of the popping process. Based on previous experience, the weight of the popcorn bags is normally distributed with a mean of 1.21 oz. and a standard deviation of 0.22, bag ~ N(1.21, 0.22).

  • Suppose the manager takes a sample of 16 bags and observes a sample mean of 1.20 with a standard deviation of 0.20. One of the bags in the sample weighs 1.3 ounces. Calculate a z-score for the value of 1.3 and interpret its meaning.
  • The manager asks the following question: “If the mean and standard deviation of the population are true as given (i.e., μ = 1.21 and σ = 0.22) and I took a random sample of 16 bags, what is the probability that the sample mean would be greater than 1.3 oz.?” Note: This problem is a sampling distribution problem. Use the standard error in calculating a z-score.
  • The manager asks the same question as in the previous problem, but in reference to 49 bags. He chooses a larger sample size. Compare the probabilities from using a sample of 16 bags and a sample of 49 bags. Explain why the answers are different.

12.

The data below are means taken from random samples from a population that is distributed as a uniform continuous distribution with parameters 0 and 10. A uniform continuous distribution would generate a histogram that looks like a rectangle. With parameters 0 and 10 the mean of this distribution is 5 and the standard deviation is 2.89 (think of these as μ = 5 and σ = 2.89). A uniform distribution is decidedly not normal or bell-shaped and, as such, provides a good illustration of whether the sampling distribution resembles a normal distribution. Samples of 36 observations were randomly drawn, and the mean was calculated for each sample. This was done in Excel using the RAND function. To demonstrate sampling distributions, 36 different sample means were examined (below). This is a small sample of the sample means, but nonetheless, it gives us insight into sampling distribution theory.

Unformatted Attachment Preview

Chapter 9: Sampling Distribution for a Mean or Proportion

When we take a sample from a population we often do so with the notion that our sample will provide a good estimate of a population parameter. Thus the mean of the sample is expected to be about the same as the mean of the population. In other words, our sample will represent the population well. However, rarely will the sample estimate match the mean of the population exactly, and if we take different samples, even if they are taken randomly with the same sample size, we would likely get slightly different estimates of the population mean or proportion. In fact, we would expect that samples will vary from one to another and we can demonstrate this mathematically and empirically.

On November 4, 2008, Barack Obama was elected as the 44th president of the United States. Usually presidents start out their terms with high approval ratings, but the business of being president can be difficult, and most experience a decline in approval ratings within the first year. Figure 9.1 shows the approval rating for Barack Obama for his first year in office. The data come from repeated samples of approximately 1,600 adults in the United States from the Gallup Poll (source: www.pollingreport.com).

[Figure 9.1: Daily Estimate of President Obama’s Approval Ratings from January 21, 2010 to March 8, 2010 by the Gallup Poll]

K11352_Ilvento_CH09.indd 169 7/12/13 2:43 PM

The variability in favorability estimates in Figure 9.1 reflects changes that take place over time. A single company, the Gallup organization, takes each value from a different sample of approximately 1,600 adults in the United States. However, many polling companies working for different news organizations also take polls.
If we look at approval ratings across several polling organizations at roughly the same time period (February 3, 2010 to March 3, 2010), we see that the estimates are similar, but there are differences. Figure 9.2 shows the approval ratings estimated by seven different polling organizations. Each poll used slightly different methodologies and sample sizes, but all would be considered reasonable estimates of the approval rating for President Obama in roughly the same time period. These estimates vary from a low of 46% to a high of 53%. These differences are not large, but politically it makes a difference whether the approval rating is above or below 50%, and these estimates are not consistent. The results in Figure 9.2 are somewhat like a small sampling distribution for the proportion of U.S. citizens who support President Obama in the period February 3, 2010 to March 3, 2010. We expect estimates from a population to vary from sample to sample. This sampling variation of proportions, just like that for means, tends to follow a normal distribution. The estimates will center around the true population proportion, but there will be variability from sample to sample.

This is perhaps one of the most important chapters in the book. Sampling distributions are key to the logic of inference and we need to have a good sense of how they work in order to move forward to topics such as confidence intervals and hypothesis tests. A sampling distribution is based on the notion of taking many, many samples from a population, all of the same size n, and making an estimate from each sample. The distribution of these estimates will follow. In a research setting we typically only take one sample, but we can think of our sample as one of many possible samples that can be taken from the population, and our estimate as one of many estimates that could have been made.
In this chapter we will learn that most sampling distributions tend to follow a specific probability distribution and we can use the distribution of known probabilities to make inferences. One such distribution is the normal distribution, and that will be our focus in this chapter as we look at the sampling distribution for the mean.

[Figure 9.2: Seven Estimates of President Obama’s Approval Ratings from February 3, 2010 to March 3, 2010]

The variability in the sample estimates above is called sampling error, and is an expected outcome of taking a sample to represent the population. The sample estimate of the mean or proportion will not exactly match the population mean or proportion. We will expect some variability from sample to sample. The key is whether we know what the variability will look like and how it behaves. It turns out that statisticians do know how it behaves and the way it is distributed. This leads us to the sampling distribution of the mean: the distribution of means generated from many samples of the same size n.

As we move forward in the discussion of sampling distributions, it is important to remember that we noted that a parameter is a numerical descriptive measure of the population, such as the mean or the variance. We use Greek terms to represent population parameters. These parameters are hardly ever known; you are generally doing the research to gain an estimate or understanding of the population parameter. In contrast, a sample statistic is a numerical descriptive measure from a sample, i.e., based on the observations in the sample. We will want the sample to be derived from a random process in order to feel our sample represents the population. Inferential statistics requires the sample be drawn in a random fashion.
A Sampling Distribution Experiment: Rolling a Die Three Times

I want you to be involved in a simple sampling experiment. All you need is a single die, which you can get from several board games in your home. Here is what I want you to do. A table is provided to record your rolls:

  1. Toss a die three times
  2. Each time you toss the die, note and record the face value: either 1, 2, 3, 4, 5, or 6
  3. Calculate the mean and median for the three rolls (the median will always be the middle value)
  4. Repeat this experiment ten times

Table 9.1 is given below for you to fill out. For example, if I roll the sequence 5 1 6, the mean is (5 + 1 + 6)/3 = 12/3 = 4 and the median is 5, the middle value in an ordered sequence.

Table 9.1: Recording Table for an Experiment of Rolling a Die Three Times and Calculating the Mean and Median

Sample   Roll 1   Roll 2   Roll 3   Mean   Median
1
2
3
4
5
6
7
8
9
10

We are going to think of each three-roll sequence as a different sample of three rolls. Each sample will generate a mean, which we will think of as our sample estimate of the population mean. Each three-roll sequence is a different sample and each is likely to generate a different sample estimate of the mean. Using many samples brings us into the realm of a sampling distribution: many samples of the same size n. We will also compute the median as a way to compare the mean versus the median as a way to estimate the population mean. I know it may seem silly to use the median from the sample to estimate the mean of the population, but think of it as an alternative estimate of the central tendency of the population. Our experiment uses a very simple approach: multiple samples of size three (the sample size is 3, n = 3).
I chose this approach because, using probability theory from past modules, I can work out exactly what the mean and variance are for the roll of a die. I can also work out all the possible combinations of this sampling distribution, and I can do it using an a priori probability of 1/6 for each face of the die. If you look at your own simple experiment, the values of the mean can range from 1 (rolling three ones) to 6 (rolling three sixes). Remember, each three-roll sequence will be thought of as a sample.

Using probability theory we can note the possible outcomes of this experiment as a discrete random variable. We start with the mean and variance of rolling a single die, represented in the following table (Table 9.2). A priori we have the following expectation:

\[ \text{Mean} = E(X) = 1(.1667) + 2(.1667) + 3(.1667) + 4(.1667) + 5(.1667) + 6(.1667) = 3.500 \]

\[ \text{Variance} = E(X - \mu)^2 = (1 - 3.5)^2(.1667) + (2 - 3.5)^2(.1667) + (3 - 3.5)^2(.1667) + (4 - 3.5)^2(.1667) + (5 - 3.5)^2(.1667) + (6 - 3.5)^2(.1667) = 2.916667 \]

\[ \text{Standard Deviation} = 1.7078 \]

There are 6*6*6 = 216 different combinations of outcomes of rolling a die three times. In essence, there are 216 different sample combinations that can result from rolling a die three times. Rather than work with many random samples, I can work out all the possible combinations of each roll and the mean and median of each of those combinations. I can take the mean and median of each possible outcome and examine the summary statistics using JMP.
Table 9.2: Recording Table for an Experiment of Rolling a Die Three Times and Calculating the Mean and Median

X       1       2       3       4       5       6
P(X)    .1667   .1667   .1667   .1667   .1667   .1667

[Figure 9.3: JMP Output for the Descriptive Statistics for the Mean of the 216 Different Outcomes of Rolling a Die Three Times]

Notice several things from Figure 9.3, which is the sampling distribution of rolling a die three times and calculating the mean.

  • There are 216 observations in the count, the number of possible outcomes of rolling a die three times.
  • The minimum value is 1 (a mean from rolling three ones) and the maximum is 6 (the mean of rolling three sixes).
  • The mean of the sample means is in fact the population mean of 3.5. Thus the mean of the sampling distribution will equal the mean of the population.
  • The median of the sampling distribution is also 3.5, which agrees exactly with the estimate using the mean.
  • The histogram shows a symmetrical, mound-shaped distribution with the center at 3.5.
  • The standard deviation for this variable is .989.

We noted that the standard deviation of the population was 1.7078, which is considerably larger than the standard deviation of the means of each sample. However, if we divide this figure by the square root of the sample size for our experiment (n = 3, so the square root of 3 = 1.7321), we get the following result: 1.7078/1.7321 = .986, or rounded off to .99, which is very close to the value of .989 in the table. The standard deviation of a sampling distribution is called the standard error. I will return in a bit to the standard error and why we used the square root of the sample size to make this calculation. For now we can note that the standard deviation of our sampling distribution for the mean is smaller than the population value and it can be expressed as a function of the sample size.
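The 216-outcome enumeration can be reproduced without JMP. The sketch below is only an illustration using Python's standard library; it reports both the population-style standard deviation (dividing by 216) and the sample-style one (dividing by 215). Which of the two the JMP output reflects is an assumption on my part, not something the text states.

```python
from itertools import product
from math import sqrt
from statistics import mean, median, pstdev, stdev

faces = [1, 2, 3, 4, 5, 6]
samples = list(product(faces, repeat=3))        # all 6*6*6 three-roll sequences

sample_means = [sum(s) / 3 for s in samples]
sample_medians = [median(s) for s in samples]   # middle value of the three rolls

pop_sd = pstdev(faces)                          # standard deviation of one die roll
se_theory = pop_sd / sqrt(3)                    # sigma / sqrt(n) with n = 3

print("samples:", len(samples))
print("mean of means:", round(mean(sample_means), 4))
print("sd of means (divide by N):  ", round(pstdev(sample_means), 4))
print("sd of means (divide by N-1):", round(stdev(sample_means), 4))
print("sigma / sqrt(3):            ", round(se_theory, 4))
print("sd of medians (divide by N-1):", round(stdev(sample_medians), 4))
```

Because every possible sample is listed exactly once, the divide-by-N standard deviation of the means matches σ/√3 exactly, while the medians show a visibly larger spread.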
As we move forward we are going to think of our sample estimate, i.e., the mean, as a reasonable estimator of the population value, μ. However, there may be more than one estimator available to us. And the question arises, which estimator is the best one? In anticipation of this question, let us allow that the sample median might be a better estimate of the population parameter than the mean. In other words, we will consider whether the sample median might be a preferred way to estimate the mean of a population.

[Figure 9.4: JMP Output for the Descriptive Statistics for the Median of the 216 Different Outcomes of Rolling a Die Three Times]

Figure 9.4 shows the JMP output for the median of each three-roll sample of a die. The average of the medians from each of the outcomes is also 3.5, the population value. The histogram also shows a symmetrical, mound-shaped distribution for the median. However, the standard deviation of the 216 median values is 1.374, considerably higher than the value for the distribution of the means. This shows that the sampling distribution of the median has a larger variance than that of the mean. We will say that the mean has minimum variance in comparison to the median as an estimator of the population value, μ.

The Sampling Distribution of the Mean

Sample statistics are random variables. By this we mean that the sample statistic, in this case the mean, that is estimated from each sample will vary from sample to sample. Sample statistics have a probability distribution based on repeating the sampling experiment many times. We will get slightly different sample statistics each time, even though we would consider each as a reasonable estimate of the population mean. Repeating the experiment (obtaining a sample and calculating the mean of the sample) many times results in a sampling distribution.
The sampling distribution of a sample statistic calculated from a sample of n measurements results in a probability distribution of the statistic. In order to better know what the probability distribution is, we need to know how the sampling distribution is distributed. I know that is a mouthful, but I hope to make it clear shortly.

I should start with the notion of an estimator. It is not an easy concept for some students to grasp because our conclusion will seem so obvious. The estimator is the strategy we use to estimate the population parameter. It could be as simple as a random guess or a combination of guesses from an expert panel. For example, if we want an estimate of the average systolic blood pressure of all adult males in Delaware in March of 2010, I could simply guess that it is 125. My guess may not be particularly good, but it is an estimate. I might ask a panel of doctors and nurses in Delaware to help me estimate the average, and they might end up saying that a better estimate is 131. It would not be too much of a stretch to think that their estimate is better than mine, but that still would not say this alternative strategy gave a good estimate. After all, doctors and nurses tend to see a disproportionate number of people who are sick or need assistance. Their estimate might not represent the true average. An even better way to estimate the population parameter is to take a random sample of adult males, take their blood pressure, and then take the sample average as the estimate of the population value. This makes sense: use the sample mean to represent the population mean. In most cases, our estimator will be the sample statistic. As I said, this seems obvious. We will use the sample mean to make an estimate of the population parameter μ.
But statisticians often need something more than saying “it is obvious” as evidence that an estimator is a good one. One set of criteria is that we want our estimators to be BLUE. BLUE stands for best linear unbiased estimator. Without going into too much detail, here are the basics for a BLUE estimator.

  • Best: Best refers to a criterion that our estimator has minimum variance. This means that if we took repeated measures of the sample estimates, the variability around the true value would be less than any other estimator.
  • Linear: Linear refers to a class of estimators that tend to be simple and straightforward. All things being equal, a simple estimate is thought to be better.
  • Unbiased: Unbiased refers to a property where our estimates from sample to sample tend to cluster around the true value without missing the target in one direction or another.
  • Estimator: This reflects the notion that there could be many different ways to estimate a population parameter, and we will pick the one that is best, linear, and unbiased.

One of the best analogies for an estimator that I have heard is to think of it as hitting a bull’s eye by shooting a rifle (Kmenta, Jan, Elements of Econometrics, Macmillan Publishing Co., New York, 1971). The bull’s eye on the target is the population parameter. The rifle is an estimator. We prefer to pick a rifle that is straightforward and simple rather than one that is complicated, such as a machine gun. This makes the rifle linear. While we would like to hit the bull’s eye every time, we know that will not be possible. However, we want our shots to center around the bull’s eye and not miss systematically, such as above right or below left. This would make our rifle unbiased. And finally, we want a rifle that tends to have a tight pattern around the bull’s eye rather than a wider dispersion. We want the rifle to have minimum variance around the target.

There is one last component to our analogy with target shooting that is important to our discussion of estimators and sampling distributions. The further away one is from the target, the less accurate the shots will be. Think of the distance from the target as being inversely related to the sample size. A large sample size implies you are close to the target and therefore better able to have a tight pattern around the bull’s eye. A small sample would imply being further away. All things being equal, we will tend to make good estimates if our estimator uses a larger sample, has minimum variance, is unbiased, and is linear (think of this as being simple).

We will use the sample mean, taken from a random sample, as our estimator of the population mean. Here is what we would like to see in our sampling distribution of the mean. If the sample mean is a good estimator of the population mean, we would expect the values of the sample means taken from many samples to cluster around the true population mean. We would not want them to tend to cluster at a point above or below the true value, or else we would consider them to be a biased estimate. And, we might say our estimator is “good” if the cluster of the sample means around the population mean is tighter than the sampling distribution of some other possible estimator. This property is called minimum variance. In relation to the sample mean as being a good estimator of the population parameter μ, we already showed with the die example that the variance/standard deviation of the sampling distribution of the mean is smaller than the sampling distribution of the median. Let us use the following example to set up our discussion of a sampling distribution.
Suppose we are looking at the blood pressure for the population of adult males (ages 18 to 85) in Delaware in 2010. We believe there is an average blood pressure of this population, designated as μ. However, because it would be expensive and extremely difficult (perhaps even impossible) to get information on all males, we want to take a sample to estimate μ.

If we take a sample of 300 adult males on a random basis, we can use the sample mean (the sum of all 300 values divided by 300) as our estimator of the population mean. Likewise, we can use the sample variance (using n − 1 as the denominator) as an unbiased estimator of the variance of the population.

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \quad \text{and} \quad s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \quad \text{with} \quad s = \sqrt{s^2} \]

The standard deviation represents the average deviation around the sample mean. But we only took one sample out of an infinite number of possible samples. A reasonable question would be: what is the spread of our estimator (i.e., the sample mean)? In other words, if we took many, many samples of 300 and recorded the mean of each sample, what would the distribution of these sample means look like?

It turns out that the sampling distribution is a normal distribution with μ = 150 and a standard deviation equal to \( \sigma/\sqrt{n} \), a function of the sample size (more shown later). Sampling theory tells us that the mean of the sampling distribution (many samples of the same sample size n) will equal the population mean. However, we have to remember that if we could take an infinite number of samples, each sample would yield a different sample mean. Yet, each one would be expressed as a reasonable estimate of the true population mean. So, if we were able to take repeated samples, each of sample size n, what would be the standard deviation of the sample estimates? The variance of the sampling distribution will equal the variance of the population divided by the sample size.
Taking the square root converts the variance of the sampling distribution to a standard deviation, which equals sigma divided by the square root of n.

$$\frac{\sum_{i=1}^{\infty} \bar{X}_i}{\infty} = \mu \quad \text{and} \quad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

The latter value is called the standard error of the mean. The standard error of the mean is the standard deviation of a sampling distribution of a mean from a population with parameters equal to μ and σ (mu and sigma). If we do not know sigma, we use the sample estimate s to estimate the sampling variance of the mean.

$$\frac{\sum_{i=1}^{\infty} \bar{X}_i}{\infty} = \mu \quad \text{and} \quad s_{\bar{X}}^2 = \frac{s^2}{n} \quad \text{and} \quad s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

These relationships are expressed in the equations above. How these formulas are derived is beyond the scope of this course. We have to accept it on faith that the statisticians who worked out these formulas knew what they were doing. We benefit from their work. We will be able to demonstrate this with simulations of taking many, many sample means from a population of a known μ and σ.

A Simulated Example of a Sampling Distribution

The notion of taking repeated samples seems strange to some people. It is important to remember that we would rarely take more than a single sample in a research project. What is important is that we can think of taking more than one sample, or that our sample is one of many samples that can be taken randomly from the population. Statisticians can tell us what the sampling distribution would look like for some estimators, such as the mean.

Let us look at an example of taking many samples from a population that is distributed normally with μ = 150 and σ = 30 (X ~ N(150, 30)). I used Excel and JMP to help with this example, and I took 1,000 samples of sample size 49 (n = 49). A sample size of 49 is considered “large” in classic statistical theory, though it might seem moderate or even small in modern research. For each sample I calculated the mean, median, and standard deviation.

First, let us look at just 10 samples from this exercise. Table 9.3 shows the results for these 10 samples. I will use this to make a point. The sample mean for each of the 10 samples tends to be close to the population parameter of 150, but none of the samples equals 150 exactly. The sample estimates range from a low of 145.527 to a high of 154.929. The fact that some estimates are lower and some are higher is to be expected, especially if the estimator is to be considered unbiased. The estimates for the standard deviation also center around the population parameter of 30.

Table 9.3 Descriptive Statistics of 10 Samples from a Population ~N(150, 30) with Sample Size n = 49

Column   N    Mean      Std Dev   Minimum    Maximum
S1       49   146.351   29.147    91.7885    205.789
S2       49   153.872   28.115    109.944    224.931
S3       49   146.783   30.342    81.7468    215.869
S4       49   154.929   27.867    89.9620    209.204
S5       49   149.236   32.380    57.6809    222.263
S6       49   152.505   28.552    92.7285    206.216
S7       49   154.126   35.044    73.6521    246.100
S8       49   145.527   33.043    60.0667    213.805
S9       49   152.183   25.252    80.3653    203.197
S10      49   152.078   30.652    91.0217    225.725

Figure 9.5 shows the sampling distribution of the 1,000 samples of size 49. This is a large number of samples, but it is not quite the same as the theoretical sampling distribution. However, for our purposes, using 1,000 samples of size 49 taken from a population that is normally distributed will work very well. If we look at the histogram we can see that the distribution very much resembles a normal distribution. In fact, I superimposed a normal distribution over the histogram and it fits very well. There are a few extreme values among our sample estimates, which range from a low of 135.697 to a high of 161.954. However, most estimates fit very well.
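The repeated-sampling exercise described above can be sketched in a few lines of code. The author used Excel and JMP; the version below swaps in Python's standard library, and the function name `simulate_sample_means` and the fixed seed are assumptions made purely for illustration. Because the draws are random, the results will resemble, but not reproduce, the figures reported in the text.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples random samples of size n from N(mu, sigma)
    and return the list of their sample means."""
    rng = random.Random(seed)  # seed is an assumption, used only for repeatability
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]


# Population and sample sizes taken from the text: N(150, 30), n = 49, 1,000 samples.
means = simulate_sample_means(mu=150, sigma=30, n=49, num_samples=1000)

mean_of_means = statistics.fmean(means)  # should land close to 150
sd_of_means = statistics.stdev(means)    # should land close to 30 / sqrt(49)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f}")
print(f"lowest / highest:     {min(means):.3f} / {max(means):.3f}")
```

Changing `n` to 16 in the call repeats the experiment for the smaller sample size discussed later in this section.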
The mean of the sample means is 149.942, which is very close to the population parameter of 150. The center of our sampling distribution very much reflects the population value. The spread of our sampling distribution has a standard deviation of 4.202. This is considerably smaller than the population value of 30.

However, we stated earlier that the sampling distribution will have a standard deviation equal to σ divided by the square root of n. In our case this value would be:

$$\text{Standard Error} = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{49}} = \frac{30}{7} = 4.286$$

Our example value of 4.202 comes pretty close to the expected value of 4.286. Our simulation of 1,000 samples of size 49 from a population ~N(150, 30) shows the following.

• The sampling distribution follows a normal distribution.
• The center of the distribution (i.e., the mean of the means) is very close to the population value of 150.
• The standard deviation of the sample estimates, referred to as a standard error, is less than the standard deviation of the population by a factor of the square root of the sample size.

Figure 9.5 Sampling Distribution Statistics of 1,000 Samples from a Population ~N(150, 30) with Sample Size n = 49

I also calculated a sampling distribution based on 1,000 samples from the same population, but with a sample size of 16. This would be considered a small sample. This new sampling distribution also resembles a normal distribution and the center is very close to the population parameter of 150. However, because the sample size of each estimate is smaller, i.e., 16, there is more spread in this sampling distribution. The standard error is 7.349 (read this as the standard deviation of the sampling distribution). We expected it to be:

$$\text{Standard Error} = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{16}} = \frac{30}{4} = 7.500$$

One of the sample estimates is as low as 123.727 and one is as high as 173.565.
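The two expected standard errors worked out above follow from the same formula, σ divided by the square root of n. A minimal sketch of that arithmetic follows; the helper name `standard_error` is hypothetical and chosen only for this illustration.

```python
import math


def standard_error(sigma, n):
    """Theoretical standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Population standard deviation from the text (sigma = 30) with both sample sizes.
se_49 = standard_error(30, 49)
se_16 = standard_error(30, 16)

print(f"n = 49: {se_49:.3f}")
print(f"n = 16: {se_16:.3f}")
```

Holding σ fixed, the smaller sample size gives the larger standard error, which is the pattern the two simulations show.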
With a smaller sample size, even when drawing from the same population, we expect more variability from sample to sample. Sample size is a key factor in how well we can make estimates from a population.

It is important to note that the standard error is smaller than the standard deviation of the population. We expect that the standard deviation of a sampling distribution of the estimator (in this case the mean) will be smaller than that of the population or the samples themselves. This is because we expect some variability across samples, but not as much as we would find in the population. Thus the standard error is smaller than the standard deviation for the population.

The size of the standard error depends upon two things.

1. The size of n (as n gets larger the standard error gets smaller).
2. The variance of the population variable itself. We can think of this as the homogeneity of the population.

The larger the sample size, and the more homogeneous the population, the smaller the standard error will be for our estimator.

I have used the following table to help students remember that there are three distinct things we need to keep in mind when dealing with sampling distributions (Table 9.4).

1. The population variable we are interested in, which often is unobserved but which we can think of as existing.
2. Our sample, which we collect on a random basis and which we can observe.
3.
The sampling distribution, which is theoretical (we do not observe it), but we know what it will look like (distributed normally) and its mean and standard deviation based on statistical theory.

Figure 9.6 Sampling Distribution Statistics of 1,000 Samples from a Population ~N(150, 30) with Sample Size n = 16

Table 9.4 Comparison of the Population, Sample, and Sampling Distribution

Referred to as
  Population: Parameters
  Sample: Statistics
  Sampling Distribution: Statistics

How it is viewed
  Population: Real but not observed
  Sample: Observed from our sample
  Sampling Distribution: Theoretical from sampling theory

Mean
  Population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
  Sample: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
  Sampling Distribution: $\frac{\sum_{i=1}^{\infty} \bar{X}_i}{\infty} = \mu$

Variance
  Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
  Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
  Sampling Distribution: $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$

Standard deviation
  Population: $\sigma$
  Sample: $s$
  Sampling Distribution: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

Two Theorems about the Sampling Distribution

In the case of the sampling distribution of the mean, we use two theorems that help us understand the distribution of the sample means and thus ultimately help in making inferences from a sample to a population. The first one depends upon the population variable being distributed normally, or at least approximately normally. The second theorem, the central limit theorem, relaxes this assumption as long as the sample size is sufficiently large.

Theorem 1: Concerning a Variable That Is Normally Distributed. If repeated samples of size n are drawn from a variable X that is distributed normally with mean μ and variance σ², the sampling distribution of the mean will also be a normal distribution with mean and variance equal to:

$$\frac{\sum_{i=1}^{\infty} \bar{X}_i}{\infty} = \mu \quad \text{and} \quad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$$

As long as the variable of interest is distributed normally in the population, the sampling distribution will also be normally distributed. This applies to small samples and large samples, and thus we can use the normal distribution to find probabilities associated with our estimate.
Adjustments will be made depending upon whether σ² is known or whether we use the sample estimate s², but Theorem 1 provides a basis for inference for small sample problems via a confidence interval (Chapter 10) and hypothesis test (Chapter 11). As long as the variable is approximately normally distributed, as in a symmetric, mound-shaped distribution, the sampling distribution of the mean will also be normally distributed. However, if the variable is not normally distributed, the small sample test is not valid.

Theorem 2: The Central Limit Theorem. The Central Limit Theorem says that even if the variable in the population is not normally distributed, the sampling distribution for the mean will be normally distributed as long as the sample size is sufficiently large. And the larger the sample size, the more normal the sampling distribution gets.

This offers a tremendous advantage to us because many variables of interest are not normally distributed. While we have a limitation for small sample sizes, large samples will still be applicable for making inferences under the central limit theorem. The key question is: what is a large sample? As it turns out, as we approach a sample size of 30, the central limit theorem starts to take effect. A sample of 30 or more is not really that great a burden, so the central limit theorem has very important uses for making inferences in statistics. No matter how our variable is distributed in the population, the sampling distribution of the mean will become more and more like a normal distribution as the sample size increases, and almost always it will be acceptable to use the normal distribution for making inferences with a sample size greater than 30.

How to Use the Sampling Distribution to Make an Inference

Up till now we have not really explained how we make an inference.
In the last section of this chapter I will present the basic strategy that we will use in making an inference, but only in a very basic form. In Chapters 10 and 11 I will provide details on confidence intervals and hypothesis tests that will use the approach outlined here. All of the ideas presented here depend upon the sample being drawn in a random fashion. This means that every subject has an equal or near equal chance of being selected and there are no biases that would lead to one subject or a group of subjects having a greater chance of being selected. The most basic random sample is called a simple random sample, but there are other types, which are for the most part acceptable randomly based samples. However, if the sample is based on a convenience sample of available subjects, or some other means that is not random, the probabilities of our inference are not well known and the inference will not hold.

The following will be our strategy to make inferences from a sample to a population.

1. We draw a random sample.
2. We think of our sample as one of many possible samples of size n from a population with parameters µ and σ.
3. We use knowledge of the probabilities associated with the theoretical sampling distribution to make a probability statement.
   a. If the variable is distributed normally in the population (or we have strong reason to believe it is so), we can assume the sampling distribution of the mean is distributed normally to make inferences from the sample to the population, even if the sample size is small (n < 30).
   b. If the variable is not distributed normally, but our sample size (n) is large enough (n ≥ 30), we can assume the sampling distribution of the mean is distributed normally under the central limit theorem.

Then we can use this information to make an inference. Most inferences will be made using a rare event approach.
In this strategy, we see how rare it was to take a sample and come up with the observed estimate (either a mean, proportion, or some other estimate) when compared to some other hypothesized value. We use our sample as the basis for an inference by comparing it to a hypothesized mean from a population. We will test to see how close or far away our sample estimate is from the hypothesized mean in the context of a sampling distribution. Both the confidence interval and hypothesis approaches will be explored in detail in later chapters.

Let us look at an example to illustrate the basic logic of inference using a rare event approach. LCD televisions have a backlight that is expected to last between 30,000 and 60,000 hours. Let us say within the population the mean is 45,000 hours with a standard deviation of 7,500 hours. This would mean that the backlight of an LCD television should last for 12.3 years if it were used 10 hours a day, 365 days a year. For the sake of argument, we will assume that the population distribution is approximately normal: LCD Life ~N(45,000, 7,500).

Let us say we are involved in a consumer study of Brand X LCD televisions. We have a process that allows us to simulate television use on a random sample of 40 televisions so that we have a good indication of how long the Brand X TVs last. The results of our sample experiment yield a mean of 38,000 with a standard deviation of 6,000. We want to know if our sample is unusually low from the perspective of a sampling distribution of LCD television backlight lifespan. And we know the sampling distribution will follow a normal distribution with expected values for μ and σ based on the population and the sample size. A key point is that we have to think of our sample of 40 TVs as one of many possible samples that could have been drawn from this population.
The question we are asking is:

What is the chance (or probability) that we drew a random sample of 40 TVs which resulted in a mean backlight life of 38,000 hours, if the true value of the population is distributed normally with μ = 45,000 and σ = 7,500?

Expressed this way, the problem becomes a sampling distribution problem. We did not ask for the probability of any one TV, but instead for the probability that the sample mean would be 38,000. We already know that if we took repeated samples from this population, we would have some sample means that are greater than 45,000 and some that are less. And we could use a z-score and the standard normal table to find the probability associated with a value for any sample mean. In this problem we want to know the probability associated with a sample mean of 38,000 from a sample of 40. The z-score for this problem is the following.

$$Z = \frac{38{,}000 - 45{,}000}{7{,}500 / \sqrt{40}} = \frac{-7{,}000}{1{,}185.85} = -5.90$$

Notice that I used the standard error in the denominator of the z-score since it is the standard deviation of the sampling distribution. I am thinking of my sample as one of many possible samples from the sampling distribution with μ = 45,000 and σ = 7,500/SQRT(40). If I had thought of 38,000 as one value in my sample, the z-score would have been very different.

Z = (38,000 − 45,000)/6,000 = −1.167.

This is not such an unusual value for a sample observation if the mean were 45,000. However, we are looking at a sample mean of 38,000, not an individual value. This makes it a sampling distribution problem and we use the standard error instead of the standard deviation. It turns out the value of 38,000 is 5.90 standard deviations below the mean in the sampling distribution. This is a very large absolute value of a z-score! In fact, looking at the standard normal table, our value is off the chart.
Based on the furthest reaches of our table, we note that the probability of this event is less than .0007. I arrived at this figure because our table goes to a z-score of 3.19 with a probability of .4993. We want the area in the left-hand tail, so I subtract this value from .5 (.5 − .4993 = .0007). We figure the probability in these problems as the area in one or both tails of the distribution. In this case we want the left-hand tail below a value of 38,000. Using a program like Excel, I can calculate the exact probability out into the left tail as .0000000018. It is an extremely rare event to take a sample of 40 and observe a sample mean of 38,000 or less if the true population values are really μ = 45,000 and σ = 7,500.

This event is so rare as to cast doubt upon the population values. It is beyond belief to take a sample of 40 and observe a sample mean of 38,000 if the true population mean is 45,000. There must be something different about my sample. Perhaps my sample comes from a different population and Brand X does not have nearly the LCD backlight life of other LCD televisions. Based on the results of our test, we would conclude that Brand X has a lower backlight life than other LCD televisions.

In this example we conducted a hypothesis test. We will cover this in detail in Chapter 11. A hypothesis test is based on a point estimate from a sample; in this case the sample mean is 38,000. We test our sample estimate against a null value, in this case 45,000, which is the stated mean for all LCD televisions. We found that it would be a very rare event to draw a sample of 40 from a population where μ = 45,000 and σ = 7,500 and get a sample mean of 38,000. In fact, it is so rare that we can reject the notion that it came from a population with μ = 45,000. As a result, we might conclude that Brand X LCD backlights last much less time.
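The z-score and tail probability for the Brand X example can be checked with a short calculation. The sketch below is an illustration only: the text obtained its exact probability from Excel, whereas this version uses Python's `math.erfc`, and the helper name `normal_cdf` is hypothetical. The numeric inputs are the ones stated in the example.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Values stated in the LCD backlight example.
mu, sigma, n = 45_000, 7_500, 40
sample_mean, sample_sd = 38_000, 6_000

# Sampling distribution view: the denominator is the standard error.
standard_error = sigma / math.sqrt(n)
z_mean = (sample_mean - mu) / standard_error
p_left_tail = normal_cdf(z_mean)

# Treating 38,000 as a single observation instead, as the text does for contrast.
z_single = (sample_mean - mu) / sample_sd

print(f"standard error:        {standard_error:.2f}")
print(f"z for the sample mean: {z_mean:.2f}")
print(f"left-tail probability: {p_left_tail:.2e}")
print(f"z for a single value:  {z_single:.3f}")
```

The contrast between the two z-scores is the point of the example: the same 7,000-hour gap is unremarkable for one television but extremely rare for the mean of 40.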
An alternative to a point estimate and hypothesis test is an interval estimate using a confidence interval. In this approach we put a bound of error around our sample estimate based on sampling theory. To do this we take the sample mean and put a plus and minus interval around it based on so many standard errors. The number of standard errors is based on a probability level that we have confidence in, e.g., 95% confidence. Conversely, 1 minus the confidence level (1 − .95 = .05) is the chance of being wrong in our conclusion. For example, if we want a 95% confidence interval around our estimate we use a value of 1.96 standard errors added and subtracted from the mean. The 1.96 comes from the normal distribution and reflects 95% of the values in the distribution. What we are saying, in so many words, is that we have a 95% chance that the population mean is within this interval. The precise definition of the confidence interval will come in Chapter 10. With a 95% confidence interval we have a 5% chance of being wrong.

$$38{,}000 \pm 1.96 \times \frac{6{,}000}{\sqrt{40}} = 38{,}000 \pm 1.96 \times 948.683 = 38{,}000 \pm 1{,}859.419$$

For Brand X TVs, our confidence interval says that we are 95% sure that the true population value would lie between 36,140.58 hours and 39,859.42 hours. We do not know exactly where the population mean lies, but we think it is in this interval. But we could be wrong. In any case, 45,000 is nowhere near this interval and we can safely conclude that Brand X backlights last much less time than the industry norm.

Summary

Sampling distributions and the logic underlying them form the basis for inference in statistics. I refer to this as the “logic of inference” when I talk with students. It is the notion that we need to think of our observed sample as being one of many possible samples from the population.
And in that I framework, an estimate from my sample may be thought of as a reasonC parameter, but rarely will it exactly equal the able estimate of the population population value. The difference is referred to as sampling error. We expect A that the sampling error will be small, and it will vary from sample to sample. We can show this by doingRa simulation of repeated samples from a known population. The sampling error, D or standard error, will be a function of the population parameter σ and the sample size n. , If we know the distribution of the sampling distribution, we can use that ­information to make an inference. It turns out that the sampling distribution A distribution. We can use that information to of the mean follows a normal place a bound of error around D our estimate in a confidence interval, or we can conduct a hypothesis test for our estimate. In either case the inference is placed within a probabilityRframework. We will seek to put the probabilities vastly in our favor, as in being I 95% sure, but there is always a small chance we will be wrong in our conclusions. Chapter 10 will deal with confidence E intervals and Chapter 11 with hypothesis tests. N For the rest of the book, we will not deal with certainties. Our estimates will N be made in a probability framework. Inferential statistics from a sample are not about certainty. We canE always be wrong in our inferences. If you want to be certain, you need to get the population rather than a sample. However, we will keep the probability of being wrong rather small. And for the most part, 2 of error. we can live with a small chance 4 7 Sampling Distribution Problems 9 1. In a sampling distribution for a mean, if we know the population mean and T σ), then the distribution of the sample means folstandard deviation (μ and lows a normal distribution, S with a mean equal to μ and the standard deviation equal to σ/SQRT(n). Note that σ/SQRT(n) is called the standard error. 
Based on this information, if we took a sample of size n (n is given in the problem as some number), what is the probability that the sample mean is greater than (or less than) some value? This is simply a z-score and normal distribution problem, similar to what we did in Chapter 8. However, there is one important change with these problems. Now the denominator of the z-score is the standard error and not σ. That said, answer the following questions.

A manufacturing process produces a product weighing an average of 150 grams, with a standard deviation of 12 grams (i.e., μ = 150 and σ = 12). If the plant manager takes a sample of 36 observations, what is the probability that:

a. The sample mean is less than 148?
b. The sample mean is greater than 155?
c. The sample mean is between 147 and 153?
d. Suppose the sample size is now 100. How does the standard error change? Recalculate the probability in part b and show how it changes when the sample size is increased.

2. In a sampling distribution for a mean, if we know the population mean and standard deviation (μ and σ), then the distribution of the sample means follows a normal distribution, with a mean equal to μ and the standard deviation equal to σ/SQRT(n). Note that σ/SQRT(n) is called the standard error. Based on this information, if we took a sample of size n (n is given in the problem as some number), what is the probability that the sample mean is greater than (or less than) some value? This is simply a z-score and normal distribution problem, similar to what we did in Chapter 8. However, there is one important change with these problems. Now the denominator of the z-score is the standard error and not σ. That said, answer the following questions.
A manufacturing process produces a product that contains an average of 3.5 liters of liquid with a standard deviation of 0.25 liters (i.e., μ = 3.5 and σ = 0.25). If the plant manager takes a sample of 64 observations, what is the probability that:

a. The sample mean is greater than 3.55?
b. The sample mean is less than 3.6?
c. The sample mean is between 3.45 and 3.5?
d. The sample mean is exactly equal to 3.5?

3. In a sampling distribution for a mean, if we know the population mean and standard deviation (μ and σ), then the distribution of the sample means follows a normal distribution, with a mean equal to μ and the standard deviation equal to σ/SQRT(n). Note that σ/SQRT(n) is called the standard error. Based on this information, if we took a sample of size n (n is given in the problem as some number), what is the probability that the sample mean is greater than (or less than) some value? This is simply a z-score and normal distribution problem, similar to what we did in Chapter 8. However, there is one important change with these problems. Now the denominator of the z-score is the standard error and not σ. That said, answer the following questions.

A manufacturing process produces a product that contains an average of 3.5 liters of liquid with a standard deviation of 0.25 liters (i.e., μ = 3.5 and σ = 0.25). If the plant manager takes a sample of 64 observations, what is the probability that:

a. The sample mean is less than 3.55?
b. The sample mean is greater than 3.6?
c. The sample mean is between 3.45 and 3.75?
d. The sample mean is exactly equal to 3.75?

4. In a sampling distribution for a mean, if we know the population mean and standard deviation (μ and σ), then the distribution of the sample means follows a normal distribution, with a mean equal to μ and the standard deviation equal to σ/SQRT(n).
Note that σ/SQRT(n) is called the standard error. Based on this information, if we took a sample of size n (n is given in the problem as some number), what is the probability that the sample mean is greater than (or less than) some value? This is simply a z-score and normal distribution problem, similar to what we did in Chapter 8. However, there is one important change with these problems. Now the denominator of the z-score is the standard error and not σ. That said, answer the following questions.

Assume the systolic blood pressure of young adults in the U.S. aged 20 to 30 years follows a normal distribution, with μ = 113.7 and σ = 11.7. If we take a random sample of 150 young adults, what is the probability that:

a. The sample mean is between 113 and 115?
b. The sample mean is less than 111?
c. The sample mean is greater than 111?
d. The sample mean is greater than 116.5?

5. In a sampling distribution for a mean, if we know the population mean and standard deviation (μ and σ), then the distribution of the sample means follows a normal distribution with a mean equal to μ and the standard deviation equal to σ/SQRT(n). Note that σ/SQRT(n) is called the standard error. Based on this information, if we took a sample of size n (n is given in the problem as some number), what is the probability that the sample mean is greater than (or less than) some value? This is simply a z-score and normal distribution problem, similar to what we did in Chapter 8. However, there is one important change with these problems. Now the denominator of the z-score is the standard error and not σ. That said, answer the following questions.

Suppose the lifespan of the top-of-the-line car battery follows a normal distribution with μ = 48.1 months and σ = 4.4 months based on regular use and maintenance. If we take a random sample of 40 batteries, what is the probability that:

a. The sample mean is between 45 and 51?
b. The sample mean is less than 46?
c. The sample mean is greater than 50?
d. The sample mean is greater than 46?

6. Suppose we work for a company that makes a popcorn product containing 1.2 ounces of unpopped kernels in a microwavable bag. However, no manufacturing process is perfect, and there is variability from bag to bag, which the factory manager seeks to keep to a minimum. Bags that are underfilled can lead to consumer complaints and lawsuits, while bags that are overfilled can result in lost profits and affect the quality of the popping process. Based on previous experience, the weight of the popcorn bags is distributed normally with a mean of 1.21 oz. and a standard deviation of 0.22 oz.: bag ~N(1.21, 0.22).

a. Suppose the manager takes a sample of 16 bags and observes a sample mean of 1.20 with a standard deviation of 0.20. One of the bags in the sample weighs 1.3 ounces. Calculate a z-score for the value of 1.3 and interpret its meaning.
b. The manager asks the following question: “If the mean and standard deviation of the population are true as given (i.e., μ = 1.21 and σ = 0.22), and I took a random sample of 16 bags, what is the probability that the sample mean would be greater than 1.3 oz.?” Note: This problem is a sampling distribution problem. Use the standard error in calculating a z-score.
c. The manager asks the same question as in part b, but in reference to 49 bags. He chooses a larger sample size. Compare the probabilities from using a sample of 16 bags with those from a sample of 49 bags. Explain why the answers are different.

7. The data below are means taken from random samples from a population that is distributed normally with μ = 75 and σ = 8. Samples of 30 observations were randomly drawn, and the mean was calculated for each sample. This was done in Excel using the NORMINV and the RAND functions.
To demonstrate sampling distributions, 30 sample means were examined (below). This is a small sample of the sample means, but nonetheless, it gives us insight into sampling distribution theory.

Sample Means

72.9   74.3   75.6
72.3   74.3   75.6
72.5   74.7   75.6
73.5   74.8   75.8
73.5   74.8   76.0
73.7   74.9   76.3
73.7   75.0   76.6
73.8   75.3   76.7
74.1   75.3   77.0
74.2   75.5   77.4

Count   30
Sum     2245.70
SumSq   168154.93
Min     72.30
Max     77.40
Q1      73.88
Q3      75.60

a. Construct a stem-and-leaf plot of the 30 sample means.
b. Calculate the mean, median, and mode of the sample means.
c. Calculate the variance and the standard deviation of the sample means.
d. Describe in words the distribution of the sample means using your stem-and-leaf plot and the descriptive statistics. What is your conclusion regarding this small example of a sampling distribution from a variable that is distributed normally?
e. Sampling theory tells us that the standard deviation of the sampling distribution should be σ/SQRT(n). For this data, that would be 8/SQRT(30) = 1.461. Compare the standard deviation you calculated for the 30 means with this figure.

8. The data below are means taken from random samples from a population that is distributed normally with μ = 75 and σ = 8. Samples of 60 observations were randomly drawn, and the mean was calculated for each sample. This was done in Excel using the NORMINV and the RAND functions. To demonstrate sampling distributions, 30 sample means were examined (below). This is a small sample of the sample means, but nonetheless, it gives us insight into sampling distribution theory.
Sample Means, Samples of 60

73.2   74.6   75.9
73.6   74.9   75.9
73.8   75.1   75.9
73.8   75.3   76.0
74.1   75.4   76.0
74.2   75.4   76.0
74.3   75.5   76.1
74.4   75.6   76.1
74.4   75.8   76.6
74.6   75.9   77.0

Count   30
Sum     2255.40
SumSq   169587.96
Min     73.20
Max     77.00
Q1      74.40
Q3      75.90

a. Construct a stem-and-leaf plot of the 30 sample means.
b. Calculate the mean, median, and mode of the sample means.
c. Calculate the variance and the standard deviation of the sample means.
d. Describe in words the distribution of the sample means using your stem-and-leaf plot and the descriptive statistics. What is your conclusion regarding this small example of a sampling distribution from a variable that is distributed normally?
e. Sampling theory tells us that the standard deviation of the sampling distribution should be σ/SQRT(n). For this data, that would be 8/SQRT(60) = 1.033. Compare the standard deviation you calculated for the 30 means with this figure.

9. The data below are means taken from random samples from a population that is distributed as a uniform continuous distribution with parameters 0 and 10. A uniform continuous distribution would generate a histogram that looks like a rectangle. With parameters 0 and 10 the mean of this distribution is 5 and the standard deviation is 2.89 (think of these as μ = 5 and σ = 2.89). A uniform distribution is decidedly not normal or bell shaped and, as such, provides a good illustration of whether the sampling distribution resembles a normal distribution. Samples of 36 observations were randomly drawn, and the mean was calculated for each sample. This was done in Excel using the RAND function. To demonstrate sampling distributions, 36 different sample means were examined (below). This is a small sample of the sample means, but nonetheless, it gives us insight into sampling distribution theory.
Sample Means, Samples of 36
4.1   4.8   5.2
4.2   4.9   5.2
4.2   4.9   5.2
4.4   4.9   5.3
4.4   4.9   5.3
4.5   4.9   5.3
4.5   4.9   5.5
4.6   5.0   5.5
4.7   5.0   5.5
4.8   5.1   5.6
4.8   5.1   5.6
4.8   5.1   6.2

Count   36.00
Sum     178.90
SumSq   896.11
Min     4.10
Max     6.20
Q2      4.78
Q3      5.23

a. Construct a stem-and-leaf plot of the 36 sample means.
b. Calculate the mean, median, and mode of the sample means.
c. Calculate the variance and the standard deviation of the sample means.
d. Describe in words the distribution of the sample means using your stem-and-leaf plot and the descriptive statistics. What is your conclusion regarding this small example of a sampling distribution from a variable that is distributed normally?
e. Sampling theory tells us that the standard deviation of the sampling distribution should be σ/SQRT(n). For this data, that would be 2.89/SQRT(36) = 0.482. Compare the standard deviation you calculated for the 36 means with this figure.

10. The data below are means taken from random samples from a population that is distributed as a uniform continuous distribution with parameters 0 and 10. A uniform continuous distribution would generate a histogram that looks like a rectangle. With parameters 0 and 10, the mean of this distribution is 5, and the standard deviation is 2.89 (think of these as μ = 5 and σ = 2.89). A uniform distribution is decidedly not normal or bell shaped and, as such, provides a good illustration of whether the sampling distribution resembles a normal distribution. Samples of 20 observations were randomly drawn and the mean was calculated for each sample. (Note: Samples of 20 should not generate normal distributions based on the Central Limit Theorem, but they should still be close.) This was done in Excel using the RAND function.
To demonstrate sampling distributions, 36 different sample means were examined (below). This is a small sample of the sample means, but nonetheless, it gives us insight into sampling distribution theory.

Sample Means, Samples of 20
3.8   4.9   5.4
3.9   4.9   5.5
4.3   4.9   5.5
4.5   4.9   5.5
4.5   5.0   5.6
4.5   5.0   5.6
4.5   5.0   5.8
4.6   5.1   5.8
4.6   5.2   5.8
4.7   5.3   5.9
4.8   5.3   6.1
4.8   5.3   6.4

Count   36.0
Sum     183.2
SumSq   944.5
Min     3.8
Max     6.4
Q2      4.7
Q3      5.5

a. Construct a stem-and-leaf plot of the 36 sample means.
b. Calculate the mean, median, and mode of the sample means.
c. Calculate the variance and the standard deviation of the sample means.
d. Describe in words the distribution of the sample means using your stem-and-leaf plot and the descriptive statistics. What is your conclusion regarding this small example of a sampling distribution from a variable that is distributed normally?
e. Sampling theory tells us that the standard deviation of the sampling distribution should be σ/SQRT(n). For this data, with samples of 20, that would be 2.89/SQRT(20) = 0.646. Compare the standard deviation you calculated for the 36 means with this figure.

Continuous Random Variables and the Normal Distribution

Unlike discrete random variables, continuous random variables take on any point in the interval. Thus the probability distribution is continuous and is referred to as a probability density function or PDF (also represented as f(x)). In contrast to a discrete random variable, it is not particularly useful to think of a probability for a particular value of a continuous random variable. In fact, with a continuous random variable the probability that X equals any value is zero: P(X = k) = 0. Instead, we will tend to think of the probability that X falls between two values, is greater than a value, or is less than a value. We can do this by finding areas under the probability density function curve.
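The idea of probability as an area under the density curve can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it borrows the uniform distribution on 0 to 10 from the exercises above because its density is a flat line, the function names `uniform_pdf` and `area_under_pdf` are made up for this example, and the midpoint rule is just one convenient way to approximate an area.

```python
def uniform_pdf(x, low=0.0, high=10.0):
    """Density of a continuous uniform variable on [low, high] (illustrative helper)."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def area_under_pdf(pdf, a, b, steps=10_000):
    """Approximate P(a <= X <= b) as the area under pdf, using the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width


# Probability that X falls between two values.
p_between = area_under_pdf(uniform_pdf, 2.0, 5.0)
# Probability that X equals a single value: an interval of zero width has zero area.
p_point = area_under_pdf(uniform_pdf, 4.0, 4.0)
# Probability that X is greater than a value (7 up to the top of the range).
p_above = area_under_pdf(uniform_pdf, 7.0, 10.0)

print(round(p_between, 4), p_point, round(p_above, 4))
```

The zero-width case is the point of the exercise: however tall the density is at a single value, the area above one point is zero, which is why we work with ranges of values.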
In this chapter (chapter 8) we will focus exclusively on one continuous random variable, the normal distribution. There are other continuous random variables, such as the uniform distribution and the exponential distribution. However, the normal distribution has important significance in statistics, especially in inferential statistics. For much of the remainder of this course we will be working with some aspect of the normal distribution in order to make an inference from a sample to the population.

Normal Distribution

The normal distribution is one particular bell-shaped, symmetrical distribution. The normal distribution is defined by a mathematical formula (see below) which is specified by two key parameters, the mean and the standard deviation. The formula for the normal distribution is given as:

$$P(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(X-\mu)^2/(2\sigma^2)}$$

While the formula looks daunting, the key point is that the only things that vary in this formula are the mean (mu) and the standard deviation (sigma). For every combination of a mean and a standard deviation there is a different normal curve. Thus, there are an infinite number of normal curves. If X is a random variable and is distributed as a normal variable, then it is designated as: X ~ N(mean, std dev). Figures 8.1 and 8.2 show two normal distributions for μ = 100 and σ = 10 and 5. Changing σ from 10 to 5 changes the shape of the distribution, but both are normal distributions.
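To see how the two parameters drive the formula, here is a minimal Python sketch that evaluates the height of the curve for the two distributions just described (μ = 100 with σ = 10 and with σ = 5). The function name `normal_pdf` is an assumption made for this example; only the standard `math` module is used.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Height at the center of each curve (x equal to the mean).
peak_sd10 = normal_pdf(100, mu=100, sigma=10)
peak_sd5 = normal_pdf(100, mu=100, sigma=5)
# Height one standard deviation above the mean for the sigma = 10 curve.
one_sd_out = normal_pdf(110, mu=100, sigma=10)

print(round(peak_sd10, 4), round(peak_sd5, 4), round(one_sd_out, 4))
```

Halving σ doubles the height at the center: the curve becomes taller and narrower, but the total area under it stays at 1.0.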
Figure 8.1. Probability Density Function of a Normal Distribution with Mean 100 and Standard Deviation 10

Figure 8.2. Probability Density Function of a Normal Distribution with Mean 100 and Standard Deviation 5

Since there are an infinite number of normal distributions depending upon the combination of μ and σ, calculating probabilities for any one distribution would be tedious. However, it is possible to transform any variable into a z-score. Remember, a z-score is a transformation for a particular value in a variable where we subtract the mean and divide by the standard deviation (see Chapter 4).

$$z = \frac{x_i - \bar{X}}{s}$$

Any variable where all values are transformed into z-scores will have a mean equal to zero and a standard deviation equal to 1. This transformation will allow us to work with one normal distribution with μ = 0 and σ = 1. This is referred to as the standard normal distribution and we will use the designation X ~ N(0, 1).

Properties of the Normal Distribution

The normal distribution reflects a probability, and as such the total area under the curve is equal to 1.0. This distribution is a symmetrical, bell-shaped curve. This means that the area to the right of the center is equal to .5 and is identical to the area on the left hand side of the curve. The normal distribution is a continuous distribution, so it will not be easy to reflect values and outcomes in a frequency table. There are an infinite number of values in the distribution. Instead we will think of solving for areas under the curve between two values, or greater than or less than particular values. There will in fact be a normal distribution table that we will learn about and work from.
This table will be tied to the standard normal distribution with a mean of zero and a standard deviation of 1.0. And because the curve is symmetrical, we often only tabulate probabilities for one half of the curve.

The normal distribution is expressed as a mathematical formula which is specified by the mean and standard deviation. We tend to think that many variables can be approximated by the normal distribution, but in reality, perhaps few things are exactly normally distributed. However, there are many continuous variables that reflect a mound-shaped symmetrical distribution with a center that is close to the mean and a shape that reflects the standard deviation. And the normal distribution can reasonably reflect the shape of those distributions. These can be the height of adult males, the amount of soda in a bottle from a manufacturing process, or the distribution of intelligence in a population.

Here are some basic properties of the normal distribution.

• The area under a probability density function (PDF) of the normal distribution is equal to 1.0.
• The normal distribution is defined by the mean (μ) and the standard deviation (σ). We will think of these as parameter values, and thus use μ and σ, but at times I will simply refer to them as the mean and the standard deviation of the distribution.
• The center of the distribution is specified by the mean.
• The mean equals the median which also equals the mode. All measures of central tendency that we have discussed are equal in the normal distribution.
• Most of the values in a variable that is normally distributed are found close to the center of the distribution. A little more than 2/3 of the values are within plus or minus one standard deviation from the mean. This gives the distribution its unique "bell" shape.
• The normal distribution has an infinite range of values, but the values out in the tails of the distribution are increasingly rare.
  Most values fall within two standard deviations of the mean.
• There are other known dimensions to the normal distribution which relate to other measures of center and spread of the data. For example, the mean is located at the 50th percentile and the IQR (interquartile range) is 1.349 standard deviations wide (.6745 below or .6745 above the mean/median).
• With the normal distribution, we can now be much more precise about the empirical rule and the areas associated with one, two, or three standard deviations around the mean.
  °° Plus or minus 1 standard deviation around the mean accounts for a probability of .6826; 68.26% of the values should fall within 1 standard deviation of the mean.
  °° Plus or minus 2 standard deviations around the mean accounts for a probability of .9544; 95.44% of the values should fall within 2 standard deviations of the mean.
  °° Plus or minus 3 standard deviations around the mean accounts for a probability of .9974; 99.74% of the values should fall within 3 standard deviations of the mean. In other words, virtually all the values will be within three standard deviations.
• In a normal distribution, it is a rare event for a value to be more than two standard deviations away from the mean. It is an extremely rare event for a value to be more than three standard deviations from the mean.

Since its properties are defined by a formula, we can define a priori probabilities associated with the curve. We can work out the probability of finding a value of 110 or higher if the mean is 100 and the standard deviation is 10, but this probability will be different if the mean is 100 and the standard deviation is 5. However, there is an easy way around this problem. If we convert our normally distributed variable to a z-score, we make it possible to use one table of probabilities for the normal distribution.
This is called the standard normal distribution. Remember, if we convert all the values in our variable to z-scores, the result is a new variable with a mean equal to zero and a standard deviation equal to one. This will allow us to use one table, the standard normal table, to solve probability problems with variables that are distributed normally.

Standard Normal Table

In order to solve probability problems, you need to be able to understand and work with the standard normal table. A full copy of the standard normal table calculated from Excel is given in the Appendix, but we will start to learn how to use the table by looking at a partial table, in Table 8.1. The table is organized to reflect probabilities from the center of the distribution, where the mean equals zero, out into the right hand tail of the distribution. This can be seen in Figure 8.3, which shows 1/2 of the standard normal distribution with a mean of 0 and a standard deviation of 1.0. The right hand side of the distribution, out into the right hand tail, reflects .5 probability and is identical in size and shape to the left hand side of the distribution. You should be aware that there are other ways to organize the standard normal table which would reflect probabilities from the left hand side up to the center, or from the left tail to the right hand tail. We will use the format found in Table 8.1.

The rows in Table 8.1 reflect the whole number and the first decimal place of a z-score. The columns reflect the second decimal place. For example, the shaded cell in Table 8.1 reflects the row 0.3 and the column .03 for a z-score of 0.33. The number .1293 in the table reflects the probability from the center of the distribution, where μ = 0, out .33 standard deviations to the right. The area under the standard normal curve for a z-score of .33 is .1293.
We can read this as: the probability of the area from the center out .33 standard deviations to the right is .1293.

Figure 8.3. A Graph Showing the Right Hand Side of the Standard Normal Distribution

Table 8.1. A Partial Standard Normal Table
Standard Normal Curve Probability Distribution
The table is based on the upper right 1/2 of the normal distribution; total area shown is .5
The z-score values are represented by the column value + row value, up to two decimal places
The probabilities up to the z-score are in the cells

Z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817

You can solve for other probabilities. For example, find the probability associated with the following z-scores. In most cases I am expressing the probability of the z-value as a range between two values, which is the way we tend to think of these problems. Some of the probabilities will be a little tricky, but try to find an answer and I will give the correct answer and an explanation. All of these can be solved by looking at Table 8.1.

a. P(0 ≤ Z ≤ 1.0) =
b. P(0 ≤ Z ≤ .89) =
c. P(0 ≤ Z ≤ 1.62) =
d. P(−1 ≤ Z ≤ 0) =
e. P(Z = 1) =
f. P(Z ≥ 1) =

The answers are given below, along with a brief explanation. Problems d, e, and f were a little tricky. I did this to push you a little to see if you could handle them, and to lead into the next discussion.

a. P(0 ≤ Z ≤ 1.00) = .3413
   This comes directly from Table 8.1. The probability associated with the area from the center of the distribution out to 1.00 standard deviation is .3413.
b. P(0 ≤ Z ≤ .89) = .3133
   This comes directly from Table 8.1. The probability associated with the area from the center of the distribution out to .89 standard deviations is .3133.
c. P(0 ≤ Z ≤ 1.62) = .4474
   This comes directly from Table 8.1. The probability associated with the area from the center of the distribution out to 1.62 standard deviations is .4474.
d. P(−1.00 ≤ Z ≤ 0.00) = .3413
   This one is a little tricky, but it also comes from Table 8.1 and knowledge that the left hand side of the distribution, where we would deal with negative z-scores, is identical to the right hand side.
e. P(Z = 1.00) = .0000
   In a continuous distribution, the probability of any single value is thought to be zero. This definition was given in the first paragraph of this chapter.
f. P(Z ≥ 1.00) = .1587
   We can solve this problem from Table 8.1 and the knowledge that the total area to the right of the center is .5. From Table 8.1 we can find that the probability from the center to 1 standard deviation above the mean is .3413. The area after this, that a z-value is greater than 1.00, is equal to .5 − .3413 = .1587.

The answer to problem d shows that the left hand side of the distribution is equal to the right hand side. I do not need a table with negative z-values in order to solve for these probabilities as long as I remember that the left hand side of the distribution is a mirror image of the right hand side. Problem f is a reminder that sometimes I want a different probability than what is in the table. In the case of problem f, I wanted the rest of the probability out in the right hand tail. To calculate this probability, I need to subtract the table probability from .5. I call these types of calculations normal table gymnastics.

A Closer Look at the Standard Normal Table

Table 8.2 shows the full probabilities for the standard normal distribution in the upper right hand side of the distribution. The table provides the probabilities from the center of the distribution, the mean, out to a particular z-score. The table allows for up to two decimal places for a z-score, and four decimal places of a probability. In some cases we may prefer more precision and we will have to estimate between values or interpolate, but the precision of the table will work well for most of our applications.

Table 8.2. Probabilities Found under One-Half of the Standard Normal Distribution
Standard Normal Curve Probability Distribution
The table is based on the upper right half of the normal distribution; total area shown is .5
The z-score values are represented by the column value + row value, up to two decimal places
The probabilities up to the z-score are in the cells

Z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4987  0.4987  0.4987  0.4988  0.4988  0.4988  0.4988
3.1   0.4990  0.4991  0.4991  0.4991  0.4992  0.4992  0.4992  0.4992  0.4993  0.4993

Probabilities computed using Microsoft Excel

Let us begin by solving for the precise probabilities associated with one, two, or three standard deviations about the mean in the normal distribution. We noted in Chapter 4 that the empirical rule states that in a symmetrical, mound-shaped distribution, approximately 68% of the values should be within plus or minus 1 standard deviation from the mean, 95% should be within 2 standard deviations, and 99.7% should be within 3 standard deviations. Now we can precisely solve these values for the normal distribution, which is a particular symmetrical, mound-shaped distribution.

From Table 8.2, a z-value of 1.00 is associated with a probability of .3413. This is the area of the distribution from the center out to one standard deviation above the mean. Since the standard normal distribution is a symmetric distribution, the area 1 standard deviation below the mean is also .3413.
Thus the total area plus or minus 1 standard deviation around the mean is:

P(X ± 1σ) = .3413 + .3413 = .6826

This is very similar to what we said in the empirical rule; approximately 68% of the area is within 1 standard deviation about the mean. Figure 8.4 shows the area within plus or minus one standard deviation about the mean (in white). The area out in the tails (shaded in black) is equal to:

P(X ≤ −1σ or X ≥ 1σ) = (.5 − .3413)*2 = .1587*2 = .3174

We could have also solved for the areas in the tails by subtracting .6826 from 1.0.

Figure 8.4. Probabilities Associated with Plus or Minus 1.0 Standard Deviation about the Mean in the Standard Normal Table

Similarly, we can solve for plus or minus two standard deviations or plus or minus three standard deviations by the following calculations (see below). These calculations result in similar probabilities to those noted from the empirical rule. Plus or minus 2 standard deviations captures about 95% of the observations, while plus or minus 3 standard deviations captures 99.7% of the values in a normally distributed variable. Because these values are so important for inferential statistics, I have also included the graphs for each interval (Figures 8.5 and 8.6). The area in the tails for more than three standard deviations can hardly be seen in the graph because it is so small. Any value that far out in the distribution is an extremely rare event. The areas out in the tails of the distribution, beyond two or three standard deviations from the mean, will be very important as we shift to inference in upcoming chapters.

P(X ± 2σ) = .4772 + .4772 = .9544
P(X ≤ −2σ or X ≥ 2σ) = (.5 − .4772)*2 = .0228*2 = .0456

P(X ± 3σ) = .4987 + .4987 = .9974
P(X ≤ −3σ or X ≥ 3σ) = (.5 − .4987)*2 = .0013*2 = .0026

Figure 8.5. Probabilities Associated with Plus or Minus 2.0 Standard Deviations about the Mean in the Standard Normal Table

Figure 8.6. Probabilities Associated with Plus or Minus 3.0 Standard Deviations about the Mean in the Standard Normal Table

Normal Table Gymnastics

The organization of a standard normal table with probabilities in half of the probability distribution is a choice that makes some problems easy to solve while other ones require a few calculations. This approach is similar to the discussion for the cumulative binomial probability tables; it will require some normal table gymnastics. This section will go through some typical problems that we may face with the standard normal distribution and how we might solve for probabilities. In general, I tell students to use the following strategy when working with a problem involving the standard normal table. If you follow this approach you will have success in solving these problems.

1. Draw out the problem using the normal distribution
2. Calculate the relevant z-scores
3. Look up probabilities in the standard normal table
4. Do any necessary calculations (the gymnastics)

Normal Probability Problem 1. Suppose X ~ N(100, 10). What is the probability that X is greater than 115?

In this problem we are looking for the area in the right hand tail that is greater than 115. If we were drawing this problem out, the curve would reflect the area shaded in Figure 8.7.
This area requires us to calculate a z-score, find the probability in the standard normal table, and subtract that probability from .5 in order to find the remaining probability in the right hand side of the distribution. This is because our standard normal table provides probabilities from the center of the distribution out into the right hand tail. The answer shown in Figure 8.7 is .0668; that is, 6.68% of the values should fall above 115 for a variable distributed normally with a mean of 100 and a standard deviation of 10.

1. Draw out the problem using the normal distribution
2. Calculate the relevant z-scores
   Z = (115 − 100)/10 = 1.50
3. Look up probabilities in the standard normal table
   1.50 in the standard normal table is associated with a probability of .4332.
4. Do any necessary calculations (the gymnastics)
   We want the probability out into the right hand tail. This requires us to subtract .4332 from .5.
   P(X > 115) = .5 − .4332 = .0668

Figure 8.7. Solution for the Probability X > 115 for a Variable ~N(100, 10)

Normal Probability Problem 2. Suppose X ~ N(100, 10). What is the probability that X is less than 115?

In this problem we are looking for the area left of 115. If we were drawing this problem out, the drawing would reflect the area shaded in Figure 8.8. This area requires us to calculate a z-score, find the probability in the standard normal table, and add that probability to .5 in order to find the total probability to the left of 115. This is because our standard normal table provides probabilities from the center of the distribution out into the right hand tail and we want to add the whole left hand side of the distribution. The answer shown in Figure 8.8 is .9332; that is, 93.32% of the values should fall below 115 for a variable distributed normally with a mean of 100 and a standard deviation of 10.

1. Draw out the problem using the normal distribution
2. Calculate the relevant z-scores
   Z = (115 − 100)/10 = 1.50
3. Look up probabilities in the standard normal table
   1.50 in the standard normal table is associated with a probability of .4332.
4. Do any necessary calculations (the gymnastics)
   We want the probability to the left of 115. This requires us to add .4332 to .5.
   P(X < 115) = .5 + .4332 = .9332

Figure 8.8. Solution for the Probability X < 115 for a Variable ~N(100, 10)

Normal Probability Problem 3. Suppose X ~ N(100, 10). What is the probability that X is less than 80?

In this problem we are looking for the area in the left hand tail that is less than 80. Because the left hand side of the distribution is a mirror image of the right hand side, we look up the positive z-score and subtract the table probability from .5.

1. Draw out the problem using the normal distribution
2. Calculate the relevant z-scores
   Z = (80 − 100)/10 = −2.00
3. Look up probabilities in the standard normal table
   −2.00 (converted to 2.00) in the standard normal table is associated with a probability of .4772.
4. Do any necessary calculations (the gymnastics)
   We want the probability out into the left hand tail. This requires us to subtract .4772 from .5.
   P(X < 80) = .5 − .4772 = .0228

Figure 8.9. Solution for the Probability X < 80 for a Variable ~N(100, 10)

Normal Probability Problem 4. Suppose X ~ N(100, 10). What is the probability that X is between 80 and 115?

In this problem we are looking for the area between two values that are on opposite sides of the mean (100). If we were drawing this problem out, the drawing would reflect the area shaded in Figure 8.10. This area requires us to calculate two z-scores, find the probability in the standard normal table associated with each of them, and add them together. The answer shown in Figure 8.10 is .9104; that is, 91.04% of the values should fall between 80 and 115 for a variable distributed normally with a mean of 100 and a standard deviation of 10.

1. Draw out the problem using the normal distribution
2. Calculate the relevant z-scores
   Z1 = (80 − 100)/10 = −2.00
   Z2 = (115 − 100)/10 = 1.50
3. Look up probabilities in the standard normal table
   −2.00 (converted to 2.00) in the standard normal table is associated with a probability of .4772.
   1.50 in the standard normal table is associated with a probability of .4332.
4. Do any necessary calculations (the gymnastics)
   We want the probability between the two values. This requires us to add .4772 and .4332.
   P(80 < X < 115) = .4772 + .4332 = .9104

Figure 8.10. Solution for the Probability X > 80 and X < 115 for a Variable ~N(100, 10)

Normal Probability Problem 5. Suppose X ~ N(100, 10). What is the probability that X is between 110 and 125?

In this problem we are looking for the area between two values that are on the same side of the mean (100). If we were drawing this problem out, the curve would reflect the area shaded in Figure 8.11.
This area requires us to calculate two z-scores, find the probability in the standard normal table associated with each of them, and then subtract the smaller value from the larger value to get the probability between them. The answer shown in Figure 8.11 is .1525; that is, 15.25% of the values should fall between 110 and 125 for a variable distributed normally with a mean of 100 and a standard deviation of 10.

1. Draw out the problem using the normal distribution.
2. Calculate the relevant z-scores.
   Z1 = (110 − 100)/10 = 1.00
   Z2 = (125 − 100)/10 = 2.50
3. Look up probabilities in the standard normal table.
   1.00 in the standard normal table is associated with a probability of .3413. 2.50 in the standard normal table is associated with a probability of .4938.
4. Do any necessary calculations (the gymnastics).
   We want the probability between the two values. This requires us to subtract .3413 from .4938.
   P(110 < X < 125) = .4938 − .3413 = .1525

Figure 8.11. Solution for the Probability X > 110 and X < 125 for a Variable ~N(100, 10)

Calculating a Value of X from a Given Percentile

Another problem we might solve with the standard normal table is calculating the value of X at a given percentile. Suppose X ~ N(50, 5) and we want to find the value of X at the 75th percentile. We can use the standard normal table to help solve this problem, but we will use it in a slightly different way. In these problems we will search for a probability in the table and read the z-value associated with it. Then we use the z-value to calculate the value of X. A partial standard normal table is provided below to help in these problems (Table 8.3). The shaded areas reflect probabilities for the problems.

Here are the basic steps to solving a value of X at a given percentile.

1. Draw out the desired area.
2. Find the probability in the table associated with the desired percentile.
3. Read the z-score based on the percentile, being careful to keep the right sign.
4. Solve for the value of X based on the z-score.

The formulas we need are:

Z = (X − μ)/σ and X = (σ ⋅ Z) + μ

Percentile Problem 1. The height of adult males is distributed approximately normally with a mean of 176 cm and a standard deviation of 7.8, Male height ~N(176, 7.8). What is the height at the 80th percentile?

Table 8.3. Partial Probabilities Found under One-Half of the Standard Normal Distribution for Solving Percentile Problems

Standard Normal Curve Probability Distribution
The table is based on the upper right half of the normal distribution; total area shown is .5
The z-score values are represented by the column value + row value, up to two decimal places
The probabilities up to the z-score are in the cells

Z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830

Probabilities computed using Microsoft Excel

To solve this we look for a probability of .3000 inside the standard normal table. Why do we look for .3000? The 80th percentile represents the .50 area of the left hand side of the distribution along with an additional .30 probability of the right hand side. While there is not an exact probability of .30 in Table 8.3, we can see a value of .2995 in the table (highlighted in Table 8.3) which is very close. This will do for our purposes. The z-value associated with this probability is .84 (I read from the probability out to the row and column margins). We use this value to solve for the value of X. The answer is found in Figure 8.12. A value of 182.55 cm is the height of males at the 80th percentile.

1. Draw out the desired area.
2. Find the probability in the table associated with the desired percentile.
   We are looking for a probability close to .3000 in the standard normal distribution. A probability of .2995 can be found, which is very close.
3. Read the z-score based on the percentile, being careful to keep the right sign.
   The z-value associated with .2995 is .84.
4. Solve for the value of X based on the z-score.
   X = 7.8*.84 + 176 = 182.55

Figure 8.12. Solution for Solving the 80th Percentile for the Average Height of Males (in Centimeters) That Is ~N(176, 7.8)

Percentile Problem 2. The height of adult males is distributed approximately normally with a mean of 176 cm and a standard deviation of 7.8, Male height ~N(176, 7.8). What is the height at the 30th percentile?

To solve this we look for a probability of .2000 inside the standard normal table. Why do we look for .2000? The 30th percentile represents a .30 area from the left hand side of the distribution. But the table works from the center out to the tails.
So a probability of .2000 is the area from the center and .3000 is the remaining area out in the left hand tail of the distribution. While there is not an exact probability of .2000 in Table 8.3, we can see a value of .1985 and a second value of .2019 in the table (highlighted in Table 8.3). Roughly in the middle of these two values would be .2000, so I will use a z-value of −.525 (the middle of −.52 and −.53). You should note that the z-score is negative for this problem because we are dealing with a value below the mean. We use this value to solve for the value of X. The answer is found in Figure 8.13. A value of 171.91 cm is the height of males at the 30th percentile.

1. Draw out the desired area.
2. Find the probability in the table associated with the desired percentile.
   We are looking for a probability close to .2000 in the standard normal distribution. Probabilities of .1985 and .2019 can be found which are very close, and we will use a z-value in the middle of these two probabilities.
3. Read the z-score based on the percentile, being careful to keep the right sign.
   The z-value associated with .2000 is −.525.
4. Solve for the value of X based on the z-score.
   X = 7.8*(−.525) + 176 = 171.91

Figure 8.13. Solution for Solving the 30th Percentile for the Average Height of Males (in Centimeters) That Is ~N(176, 7.8)

The Normal Approximation to the Binomial Distribution

In Chapter 7 we noted that proportions for yes/no or other dichotomous variables can be thought of as binomial random variables. Remember for a binomial the focus is on N, the sample size; p, the probability of a success in a random trial; and q, the probability of a failure (1 − p). It turns out that when N is large relative to the size of p, the binomial distribution appears like a normal distribution.
Even though the binomial is a discrete distribution, the continuous normal distribution provides a reasonable approximation and can be far easier to calculate when dealing with probability problems and later in inference.

The relationship between N and p and how they affect how normal the distribution appears to be is easy to show. Figure 8.14 shows the binomial distribution for p = .2 when N = 5 and N = 25. As N gets larger, the distribution looks more like a symmetrical, mound-shaped distribution. And the larger N gets, for any value of p, the more the distribution approximates a normal distribution. The graph on the right looks more like a symmetrical distribution. If we showed the same plot for N = 100 it would appear even more like the normal distribution.

Figure 8.14. The Binomial Distribution for p = .20 and N = 5 and N = 25

The general rule of thumb for using the normal distribution to approximate probabilities for proportions is that either p*N or q*N must equal at least 5.0.

We require either N*p ≥ 5 or N*q ≥ 5

Note that in the scenario depicted in Figure 8.14, when N = 5 the calculation of N*p equals only 1. However, in the second calculation, with N = 25, the calculation of 25*.2 equals 5. Our rule of thumb focuses on p or q, whichever reflects the smaller proportion. As a result, rare events (i.e., with p or q less than .05) will require large sample sizes in order to use the normal approximation. My own feeling is that when we use this approach we should err on the safe side and be very careful when dealing with rare events. I prefer a calculation of 10 or more in order to use the normal approximation. Small sample problems should use the binomial distribution to calculate problems.
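The rule of thumb above reduces to a one-line check. The following is a minimal Python sketch; the function name `normal_approx_ok` and the small floating-point tolerance are illustrative assumptions, not part of the text.

```python
def normal_approx_ok(n, p, threshold=5.0):
    """Return True when N times the smaller of p and q reaches the threshold."""
    q = 1.0 - p
    # The rule of thumb focuses on whichever of p or q is smaller.
    smaller = min(p, q)
    # Tiny tolerance (an assumption of this sketch) guards against round-off.
    return n * smaller >= threshold - 1e-9


if __name__ == "__main__":
    # The two panels of Figure 8.14 (p = .2), plus N = 100 for comparison.
    for n in (5, 25, 100):
        lenient = normal_approx_ok(n, 0.2)                   # threshold of 5
        strict = normal_approx_ok(n, 0.2, threshold=10.0)    # stricter threshold of 10
        print(n, lenient, strict)
```

Passing `threshold=10.0` applies the more conservative preference stated in the text for rare events.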
Table 8.4 provides calculations for different combinations of N and p to demonstrate what type of sample is needed for the normal approximation to work. The probabilities only go to .5 because we only need to work with the lower of p or q to generate the needed sample size. The shaded area represents all the combinations of N and p that generate a calculation of 5.0 or higher and the darkest shading shows the combinations that generate a value of 10 or higher. It is clear from this table that the closer p or q is to .5, the smaller the sample size needed for the approximation.

Let us look at a proportion problem and see how the normal approximation to the binomial distribution would work in practice. The proportion answering yes to a public policy question is believed to be .60. If we randomly sampled 50 people, what is the probability that more than 35 people would support this measure? This proportion can be thought of as a binomial random variable with p = .6 and N = 50. The expected value and variance for this distribution are given below.

Expected Value: E(X) = 50*.6 = 30.0
Variance: E[(X − 30)^2] = 50*.6*.4 = 12.0
Standard Deviation: SQRT(12.0) = 3.4641

Table 8.4. Combinations of p and N to Generate the Minimum Requirement for ...
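The expected value, variance, and standard deviation of the survey example can be reproduced directly from N and p, and the question it poses can be approached both exactly and with the normal curve. The sketch below uses only the Python standard library. The function names are hypothetical, and the continuity correction (evaluating the normal curve at 35.5 rather than 35) is an assumption of this sketch, not something the excerpt specifies.

```python
import math


def binomial_moments(n, p):
    """Return (mean, variance, standard deviation) of a Binomial(n, p) variable."""
    mean = n * p
    variance = n * p * (1.0 - p)
    return mean, variance, math.sqrt(variance)


def binomial_upper_tail(n, p, k):
    """Exact P(X > k) for X ~ Binomial(n, p), summing the binomial formula."""
    return sum(
        math.comb(n, x) * p**x * (1.0 - p) ** (n - x)
        for x in range(k + 1, n + 1)
    )


def normal_upper_tail(x, mean, sd):
    """P(X > x) for X ~ N(mean, sd), using the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    n, p = 50, 0.6
    mean, variance, sd = binomial_moments(n, p)
    print("mean, variance, sd:", mean, variance, round(sd, 4))
    # Exact binomial probability of more than 35 "yes" answers.
    print("exact P(X > 35):", binomial_upper_tail(n, p, 35))
    # Normal approximation with an assumed continuity correction at 35.5.
    print("normal approximation:", normal_upper_tail(35.5, mean, sd))
```

Comparing the two printed probabilities gives a feel for how close the approximation is when N*q = 20, well above the rule-of-thumb minimum.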

Explanation & Answer

Hello, I'm done with your assignment, all areas are well detailed as per your instructions.

Running head: PROBABILITY AND STATISTICS

Probability and Statistics
Institution Affiliation
Name of Professor
Name of the Student
Date of Submission

Question 1
Solution

The outcomes, the number of combinations, and the probabilities (thr...

