Multivariate Data Analytics, assignment help

Description

Multivariate Data Analytics. Please review the attached document titled "Homework – Statistical Inference Review". Due May 30th.

Unformatted Attachment Preview

BIA652 Homework – Statistical Inference Review – 01
(Please show all work; copy/paste as needed from any computer output.)

1. Television viewing reached a new high when the Nielsen Company reported a mean daily viewing time of 8.35 hours per household (USA Today, November 11, 2009). Use a normal probability distribution with a standard deviation of 2.5 hours to answer the following questions about daily television viewing per household.
   a. What is the probability that a household views television between 5 and 10 hours a day?
   b. How many hours of television viewing must a household have in order to be in the top 3% of all television viewing households?
   c. What is the probability that a household views television more than 3 hours a day?

2. According to the Sleep Foundation, the average night's sleep is 6.8 hours (Fortune, March 20, 2006). Assume the standard deviation is .6 hours and that the probability distribution is normal.
   a. What is the probability that a randomly selected person sleeps more than 8 hours?
   b. What is the probability that a randomly selected person sleeps 6 hours or less?
   c. Doctors suggest getting between 7 and 9 hours of sleep each night. What percentage of the population gets this much sleep?

3. The mean preparation fee H&R Block charged retail customers last year was $183 (The Wall Street Journal, March 7, 2012). Use this price as the population mean and assume the population standard deviation of preparation fees is $50.
   a. What is the probability that the mean price for a sample of 30 H&R Block retail customers is within $8 of the population mean?
   b. What is the probability that the mean price for a sample of 50 H&R Block retail customers is within $8 of the population mean?
   c. What is the probability that the mean price for a sample of 100 H&R Block retail customers is within $8 of the population mean?
   d. Which, if any, of the sample sizes in parts (a), (b), and (c) would you recommend to have at least a .95 probability that the sample mean is within $8 of the population mean?

4. The latest available data showed health expenditures were $8086 per person in the United States or 17.6% of gross domestic product (Centers for Medicare & Medicaid Services website, April 1, 2012). Use $8086 as the population mean and suppose a survey research firm will take a sample of 100 people to investigate the nature of their health expenditures. Assume the population standard deviation is $2500.
   a. Show the sampling distribution of the mean amount of health care expenditures for a sample of 100 people.
   b. What is the probability the sample mean will be within ± $200 of the population mean?
   c. What is the probability the sample mean will be greater than $9000? If the survey research firm reports a sample mean greater than $9000, would you question whether the firm followed correct sampling procedures? Why or why not?

Continuous Probability Distributions

• Uniform Probability Distribution
• Normal Probability Distribution
• Normal Approximation of Binomial Probabilities
• Exponential Probability Distribution

[Figure: density curves f(x) for the uniform, normal, and exponential distributions]

• A continuous random variable can assume any value in an interval on the real line or in a collection of intervals.
• It is not possible to talk about the probability of the random variable assuming a particular value.
• Instead, we talk about the probability of the random variable assuming a value within a given interval.
• The probability of the random variable assuming a value within some given interval from x1 to x2 is defined to be the area under the graph of the probability density function between x1 and x2.
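The area definition above can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the density f(x) = 2x on [0, 1] is a hypothetical example that does not appear in the slides, and the trapezoidal rule is one of several ways to approximate an area.

```python
def density(x: float) -> float:
    """Hypothetical density f(x) = 2x on [0, 1], zero elsewhere (not from the slides)."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area(f, x1: float, x2: float, steps: int = 10_000) -> float:
    """Approximate the area under f between x1 and x2 with the trapezoidal rule."""
    width = (x2 - x1) / steps
    total = 0.5 * (f(x1) + f(x2))
    for i in range(1, steps):
        total += f(x1 + i * width)
    return total * width


p_interval = area(density, 0.2, 0.5)  # P(0.2 < x < 0.5) as an area
p_total = area(density, 0.0, 1.0)     # area under the whole density
print(round(p_interval, 4), round(p_total, 4))
```

For this density the interval probability can also be found by hand as 0.5^2 - 0.2^2 = 0.21, and the total area is 1, as it must be for any probability density function.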
[Figure: area under f(x) between x1 and x2 for the uniform, normal, and exponential distributions]

Uniform Probability Distribution

• A random variable is uniformly distributed whenever the probability is proportional to the interval's length.
• The uniform probability density function is:
    f(x) = 1/(b – a)   for a < x < b
         = 0           elsewhere
  where: a = smallest value the variable can assume
         b = largest value the variable can assume
• Expected Value of x: E(x) = (a + b)/2
• Variance of x: Var(x) = (b – a)^2/12

Example: Slater's Buffet
Slater customers are charged for the amount of salad they take. Sampling suggests that the amount of salad taken is uniformly distributed between 5 ounces and 15 ounces.

• Uniform Probability Density Function
    f(x) = 1/10   for 5 < x < 15
         = 0      elsewhere
  where: x = salad plate filling weight
• Expected Value of x: E(x) = (a + b)/2 = (5 + 15)/2 = 10
• Variance of x: Var(x) = (b – a)^2/12 = (15 – 5)^2/12 = 8.33

[Figure: uniform probability distribution for salad plate filling weight; f(x) = 1/10 between 5 and 15 on the Salad Weight (oz.) axis]

What is the probability that a customer will take between 12 and 15 ounces of salad?
    P(12 < x < 15) = (1/10)(3) = .3

[Figure: the same uniform density with the area between 12 and 15 oz. shaded]

Area as a Measure of Probability

• The area under the graph of f(x) and probability are identical.
• This is valid for all continuous random variables.
• The probability that x takes on a value between some lower value x1 and some higher value x2 can be found by computing the area under the graph of f(x) over the interval from x1 to x2.

Normal Probability Distribution

• The normal probability distribution is the most important distribution for describing a continuous random variable.
• It is widely used in statistical inference.
• It has been used in a wide variety of applications including:
    • Heights of people
    • Rainfall amounts
    • Test scores
    • Scientific measurements
• Abraham de Moivre, a French mathematician and the author of The Doctrine of Chances, derived the normal distribution in 1733.

Normal Probability Density Function

    $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$

where: μ = mean
       σ = standard deviation
       π = 3.14159
       e = 2.71828

Characteristics

• The distribution is symmetric; its skewness measure is zero.
• The entire family of normal probability distributions is defined by its mean μ and its standard deviation σ.
• The highest point on the normal curve is at the mean, which is also the median and mode.
• The mean can be any numerical value: negative, zero, or positive. [Figure: normal curves centered at -10, 0, and 25]
• The standard deviation determines the width of the curve: larger values result in wider, flatter curves. [Figure: normal curves with σ = 15 and σ = 25]
• Probabilities for the normal random variable are given by areas under the curve. The total area under the curve is 1 (.5 to the left of the mean and .5 to the right).

Characteristics (basis for the empirical rule)

• 68.26% of values of a normal random variable are within +/- 1 standard deviation of its mean.
• 95.44% of values of a normal random variable are within +/- 2 standard deviations of its mean.
• 99.72% of values of a normal random variable are within +/- 3 standard deviations of its mean.
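The three empirical-rule percentages can be recomputed directly from the standard normal cumulative distribution. The sketch below is a minimal check using only Python's standard library; the function names are illustrative and not part of the slides. The slide figures appear to come from a four-decimal cumulative table (for example 2 × .8413 − 1 = .6826), so values computed this way may differ from them in the second decimal place.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal variable, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def within_k_sd(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)


for k in (1, 2, 3):
    print(f"within +/- {k} sd: {100 * within_k_sd(k):.2f}%")
```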
[Figure: normal curve showing 68.26% of values within μ ± 1σ, 95.44% within μ ± 2σ, and 99.72% within μ ± 3σ]

Standard Normal Probability Distribution

• A random variable having a normal distribution with a mean of 0 and a standard deviation of 1 is said to have a standard normal probability distribution.
• The letter z is used to designate the standard normal random variable. [Figure: standard normal curve centered at z = 0 with σ = 1]
• Converting to the Standard Normal Distribution:

    $z = \frac{x - \mu}{\sigma}$

  We can think of z as a measure of the number of standard deviations x is from μ.

Example: Pep Zone
Pep Zone sells auto parts and supplies including a popular multi-grade motor oil. When the stock of this oil drops to 20 gallons, a replenishment order is placed. The store manager is concerned that sales are being lost due to stockouts while waiting for a replenishment order.

It has been determined that demand during replenishment lead-time is normally distributed with a mean of 15 gallons and a standard deviation of 6 gallons. The manager would like to know the probability of a stockout during replenishment lead-time. In other words, what is the probability that demand during lead-time will exceed 20 gallons?

    P(x > 20) = ?

Solving for the Stockout Probability

Step 1: Convert x to the standard normal distribution.
    z = (x - μ)/σ = (20 - 15)/6 = .83
Step 2: Find the area under the standard normal curve to the left of z = .83.

Cumulative Probability Table for the Standard Normal Distribution

    z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
    ...
    .5  .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
    .6  .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
    .7  .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
    .8  .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
    .9  .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
    ...

    P(z < .83) = .7967

Step 3: Compute the area under the standard normal curve to the right of z = .83.
    P(z > .83) = 1 – P(z < .83) = 1 - .7967 = .2033
This is the probability of a stockout, P(x > 20).

[Figure: standard normal curve with area .7967 to the left of z = .83 and area 1 - .7967 = .2033 to the right]

If the manager of Pep Zone wants the probability of a stockout during replenishment lead-time to be no more than .05, what should the reorder point be?
(Hint: Given a probability, we can use the standard normal table in an inverse fashion to find the corresponding z value.)

Solving for the Reorder Point

[Figure: standard normal curve with area .9500 to the left of z.05 and area .0500 to the right]

Step 1: Find the z-value that cuts off an area of .05 in the right tail of the standard normal distribution. We look up the complement of the tail area (1 - .05 = .95).

    z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
    ...
    1.5 .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
    1.6 .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
    1.7 .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
    1.8 .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
    1.9 .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
    ...

Step 2: Convert z.05 to the corresponding value of x.
    x = μ + z.05 σ = 15 + 1.645(6) = 24.87 or 25
A reorder point of 25 gallons will place the probability of a stockout during lead-time at (slightly less than) .05.

[Figure: normal curve for lead-time demand with mean 15; probability of no stockout during replenishment lead-time = .95 to the left of x = 24.87, probability of a stockout = .05 to the right]

By raising the reorder point from 20 gallons to 25 gallons on hand, the probability of a stockout decreases from about .20 to .05. This is a significant decrease in the chance that Pep Zone will be out of stock and unable to meet a customer's desire to make a purchase.

[Excel function screenshots: NORMDIST, NORMINV, NORMSDIST, NORMSINV, STANDARDIZE]

Normal Approximation of Binomial Probabilities

When the number of trials, n, becomes large, evaluating the binomial probability function by hand or with a calculator is difficult. The normal probability distribution provides an easy-to-use approximation of binomial probabilities where np > 5 and n(1 - p) > 5. In the definition of the normal curve, set

    $\mu = np$  and  $\sigma = \sqrt{np(1-p)}$

Add and subtract a continuity correction factor because a continuous distribution is being used to approximate a discrete distribution. For example, P(x = 12) for the discrete binomial probability distribution is approximated by P(11.5 < x < 12.5) for the continuous normal distribution.

Example
Suppose that a company has a history of making errors in 10% of its invoices. A sample of 100 invoices has been taken, and we want to compute the probability that 12 invoices contain errors. In this case, we want to find the binomial probability of 12 successes in 100 trials. So, we set:

    $\mu = np = 100(.1) = 10$
    $\sigma = \sqrt{np(1-p)} = \sqrt{100(.1)(.9)} = 3$

Normal Approximation to a Binomial Probability Distribution with n = 100 and p = .1

[Figure: normal curve with μ = 10 and σ = 3; the area between 11.5 and 12.5 is the probability of 12 errors]

    P(x < 12.5) = .7967
    P(x < 11.5) = .6915

The normal approximation to the probability of 12 successes in 100 trials is .1052:

    P(x = 12) = .7967 - .6915 = .1052

Exponential Probability Distribution

• The exponential probability distribution is useful in describing the time it takes to complete a task.
• The exponential random variables can be used to describe:
    • Time between vehicle arrivals at a toll booth
    • Time required to complete a questionnaire
    • Distance between major defects in a highway
• In waiting line applications, the exponential distribution is often used for service times.
• A property of the exponential distribution is that the mean and standard deviation are equal.
• The exponential distribution is skewed to the right. Its skewness measure is 2.
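The invoice example from the normal-approximation slides (n = 100, p = .1, 12 errors) can be checked numerically. The sketch below is an illustration rather than the slides' own method: it uses Python's standard library, the helper name `normal_cdf` is an assumption, and the exact binomial probability is included only for comparison. Because the code does not round z to two decimals before the lookup, the approximation it prints can differ slightly from the table-based .1052.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal variable with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


n, p, k = 100, 0.10, 12
mu = n * p                          # mean of the approximating normal curve
sigma = math.sqrt(n * p * (1 - p))  # its standard deviation

# Continuity correction: P(x = 12) is approximated by P(11.5 < x < 12.5)
approx = normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)

# Exact binomial probability of 12 successes in 100 trials, for comparison
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"mu={mu:.1f} sigma={sigma:.1f} approx={approx:.4f} exact={exact:.4f}")
```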
Density Function

    $f(x) = \frac{1}{\mu}\, e^{-x/\mu}$   for x > 0

where: μ = expected value or mean
       e = 2.71828

Cumulative Probabilities

    $P(x \le x_0) = 1 - e^{-x_0/\mu}$

where: x0 = some specific value of x

Example: Al's Full-Service Pump
The time between arrivals of cars at Al's full-service gas pump follows an exponential probability distribution with a mean time between arrivals of 3 minutes. Al would like to know the probability that the time between two successive arrivals will be 2 minutes or less.

    P(x ≤ 2) = 1 - 2.71828^(-2/3) = 1 - .5134 = .4866

[Figure: exponential density f(x) with the area to the left of x = 2 shaded; horizontal axis: Time Between Successive Arrivals (mins.)]

[Excel function screenshot: EXPONDIST]

Relationship between the Poisson and Exponential Distributions

• The Poisson distribution provides an appropriate description of the number of occurrences per interval.
• The exponential distribution provides an appropriate description of the length of the interval between occurrences.

End

Sampling and Sampling Distributions

• Selecting a Sample
• Point Estimation
• Introduction to Sampling Distributions
• Sampling Distribution of x̄
• Sampling Distribution of p̄
• Properties of Point Estimators
• Other Sampling Methods

Introduction

• An element is the entity on which data are collected.
• A population is a collection of all the elements of interest.
• A sample is a subset of the population.
• The sampled population is the population from which the sample is drawn.
• A frame is a list of the elements that the sample will be selected from.

The reason we select a sample is to collect data to answer a research question about a population. The sample results provide only estimates of the values of the population characteristics. The reason is simply that the sample contains only a portion of the population. With proper sampling methods, the sample results can provide "good" estimates of the population characteristics.

Selecting a Sample

• Sampling from a Finite Population
• Sampling from an Infinite Population

Sampling from a Finite Population

• Finite populations are often defined by lists such as:
    • Organization membership roster
    • Credit card account numbers
    • Inventory product numbers
• A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
• Replacing each sampled element before selecting subsequent elements is called sampling with replacement.
• Sampling without replacement is the procedure used most often.
• In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.

Example: St. Andrew's College
St. Andrew's College received 900 applications for admission in the upcoming year from prospective students. The applicants were numbered, from 1 to 900, as their applications arrived. The Director of Admissions would like to select a simple random sample of 30 applicants.

Step 1: Assign a random number to each of the 900 applicants. The random numbers generated by Excel's RAND function follow a uniform probability distribution between 0 and 1.
Step 2: Select the 30 applicants corresponding to the 30 smallest random numbers.

Sampling from an Infinite Population

• Sometimes we want to select a sample, but find it is not possible to obtain a list of all elements in the population.
• As a result, we cannot construct a frame for the population.
• Hence, we cannot use the random number selection procedure.
• Most often this situation occurs in infinite population cases.
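The two-step selection procedure from the St. Andrew's College example (a random number per applicant, then the 30 smallest) can be sketched in code. This is a minimal illustration, assuming Python's `random` module in place of Excel's RAND; the function name and the seed are hypothetical choices made here so the run is repeatable.

```python
import random


def simple_random_sample(population_size: int, sample_size: int, seed=None) -> list:
    """Select applicant numbers by attaching a uniform random number to each applicant."""
    rng = random.Random(seed)
    # Step 1: assign a uniform random number in [0, 1) to each applicant 1..N
    keyed = [(rng.random(), applicant) for applicant in range(1, population_size + 1)]
    # Step 2: keep the applicants with the smallest random numbers
    keyed.sort()
    return [applicant for _, applicant in keyed[:sample_size]]


sample = simple_random_sample(900, 30, seed=1)
print(sorted(sample))
```

Because every applicant receives an independent random number, each possible set of 30 applicants is equally likely, and no applicant can be chosen twice (sampling without replacement). The standard-library call `random.sample(range(1, 901), 30)` is a shorter way to draw the same kind of sample.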
• Populations are often generated by an ongoing process where there is no upper limit on the number of units that can be generated.
• Some examples of ongoing processes, with infinite populations, are:
    • parts being manufactured on a production line
    • transactions occurring at a bank
    • telephone calls arriving at a technical help desk
    • customers entering a store
• In the case of an infinite population, we must select a random sample in order to make valid statistical inferences about the population from which the sample is taken.
• A random sample from an infinite population is a sample selected such that the following conditions are satisfied:
    • Each element selected comes from the population of interest.
    • Each element is selected independently.

Point Estimation

Point estimation is a form of statistical inference. In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.

• We refer to x̄ as the point estimator of the population mean μ.
• s is the point estimator of the population standard deviation σ.
• p̄ is the point estimator of the population proportion p.

Example: St. Andrew's College
Recall that St. Andrew's College received 900 applications from prospective students. The application form contains a variety of information including the individual's Scholastic Aptitude Test (SAT) score and whether or not the individual desires on-campus housing.

At a meeting in a few hours, the Director of Admissions would like to announce the average SAT score and the proportion of applicants that want to live on campus, for the population of 900 applicants. However, the necessary data on the applicants have not yet been entered in the college's computerized database. So, the Director decides to estimate the values of the population parameters of interest based on sample statistics. The sample of 30 applicants is selected using computer-generated random numbers.

• x̄ as Point Estimator of μ

    $\bar{x} = \frac{\sum x_i}{30} = \frac{50{,}520}{30} = 1684$

• s as Point Estimator of σ

    $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\frac{210{,}512}{29}} = 85.2$

• p̄ as Point Estimator of p

    $\bar{p} = \frac{20}{30} = .67$

Note: Different random numbers would have identified a different sample which would have resulted in different point estimates.

Once all the data for the 900 applicants were entered in the college's database, the values of the population parameters of interest were calculated.

• Population Mean SAT Score

    $\mu = \frac{\sum x_i}{900} = 1697$

• Population Standard Deviation for SAT Score

    $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 87.4$

• Population Proportion Wanting On-Campus Housing

    $p = \frac{648}{900} = .72$

Summary of Point Estimates Obtained from a Simple Random Sample

    Population Parameter                               Parameter Value   Point Estimator                                  Point Estimate
    μ = Population mean SAT score                      1697              x̄ = Sample mean SAT score                        1684
    σ = Population std. deviation for SAT score        87.4              s = Sample standard deviation for SAT score      85.2
    p = Population proportion wanting campus housing   .72               p̄ = Sample proportion wanting campus housing     .67

Practical Advice

• The target population is the population we want to make inferences about.
• The sampled population is the population from which the sample is actually taken.
• Whenever a sample is used to make inferences about a population, we should make sure that the targeted population and the sampled population are in close agreement.

Sampling Distribution of x̄

Process of Statistical Inference

1. Population with mean μ = ?
2. A simple random sample of n elements is selected from the population.
3. The sample data provide a value for the sample mean x̄.
4. The value of x̄ is used to make inferences about the value of μ.
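The three point estimators from the St. Andrew's College example can be computed with a few lines of code. The sketch below is illustrative only: the six-applicant data set is hypothetical (the slides do not list the 30 sampled applicants), and Python's `statistics` module stands in for whatever software the course uses. The last three lines simply redo the slide arithmetic from the reported sums.

```python
import statistics

# Hypothetical mini-sample (NOT the St. Andrew's data): SAT score and housing preference
sat_scores = [1650, 1710, 1580, 1800, 1690, 1740]
wants_housing = [True, False, True, True, False, True]

x_bar = statistics.mean(sat_scores)               # point estimate of the population mean
s = statistics.stdev(sat_scores)                  # sample standard deviation (divides by n - 1)
p_bar = sum(wants_housing) / len(wants_housing)   # point estimate of the population proportion
print(x_bar, round(s, 1), round(p_bar, 2))

# Slide arithmetic for the real sample of 30, using the sums reported there
slide_mean = 50_520 / 30
slide_s = (210_512 / 29) ** 0.5
slide_p = 20 / 30
print(round(slide_mean), round(slide_s, 1), round(slide_p, 2))
```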
The sampling distribution of x̄ is the probability distribution of all possible values of the sample mean x̄.

• Expected Value of x̄

    E(x̄) = μ

  where: μ = the population mean

  When the expected value of the point estimator equals the population parameter, we say the point estimator is unbiased.

• Standard Deviation of x̄

  We will use the following notation to define the standard deviation of the sampling distribution of x̄.

    σ_x̄ = the standard deviation of x̄
    σ = the standard deviation of the population
    n = the sample size
    N = the population size

  Finite Population:

    $\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}}\left(\frac{\sigma}{\sqrt{n}}\right)$

  Infinite Population:

    $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

    • A finite population is treated as being infinite if n/N < .05.
    • $\sqrt{(N-n)/(N-1)}$ is the finite population correction factor.
    • σ_x̄ is referred to as the standard error of the mean.

• Form of the Sampling Distribution of x̄

  When the population has a normal distribution, the sampling distribution of x̄ is normally distributed for any sample size. In most applications, the sampling distribution of x̄ can be approximated by a normal distribution whenever the sample size is 30 or more. In cases where the population is highly skewed or outliers are present, samples of size 50 may be needed.

  The sampling distribution of x̄ can be used to provide probability information about how close the sample mean x̄ is to the population mean μ.

The Distribution of the Sample Mean: The Case of a Normal Population Distribution

[Figure: a normal population distribution and the sampling distributions of X̄]
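The standard error formulas and the n/N < .05 rule above translate directly into code. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the numbers σ = 87.4 and N = 900 are the St. Andrew's population values given earlier, and the normal form of the sampling distribution is taken for granted. Hand calculations that round z to two decimals for a table lookup can differ slightly from the probabilities printed here.

```python
import math


def standard_error_of_mean(sigma: float, n: int, N=None) -> float:
    """Standard error of the sample mean; applies the finite population
    correction only when the sample is at least 5% of the population."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se


def prob_within(margin: float, se: float) -> float:
    """P(|x_bar - mu| <= margin), assuming x_bar is normally distributed."""
    z = margin / se
    return math.erf(z / math.sqrt(2.0))  # equals Phi(z) - Phi(-z)


se_30 = standard_error_of_mean(87.4, 30, N=900)    # n/N = .033, treated as infinite
se_100 = standard_error_of_mean(87.4, 100, N=900)  # n/N = .111, correction applied
print(round(se_30, 2), round(se_100, 2))
print(round(prob_within(10, se_30), 4), round(prob_within(10, se_100), 4))
```

The larger sample gives the smaller standard error and therefore a higher probability that the sample mean lands within a fixed margin of the population mean.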
[Figures: Simulation Example: Platelet Sizes]

Central Limit Theorem

When the population from which we are selecting a random sample does not have a normal distribution, the central limit theorem is helpful in identifying the shape of the sampling distribution of x̄.

CENTRAL LIMIT THEOREM
In selecting random samples of size n from a population, the sampling distribution of the sample mean x̄ can be approximated by a normal distribution as the sample size becomes large.

[Figure: the Central Limit Theorem illustrated]

[Figures: Simulation Example: Electronic Control Lifetimes]

Sampling Distribution of x̄ for SAT Scores

Example: St. Andrew's College

    $E(\bar{x}) = 1697$
    $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{87.4}{\sqrt{30}} = 15.96$

What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within +/- 10 of the actual population mean μ? In other words, what is the probability that x̄ will be between 1687 and 1707?

Step 1: Calculate the z-value at the upper endpoint of the interval.
    z = (1707 - 1697)/15.96 = .63
Step 2: Find the area under the curve to the left of the upper endpoint.
    P(z < .63) = .7357

[Figure: sampling distribution of x̄ with σ_x̄ = 15.96; area = .7357 to the left of 1707]

Step 3: Calculate the z-value at the lower endpoint of the interval.
    z = (1687 - 1697)/15.96 = -.63
Step 4: Find the area under the curve to the left of the lower endpoint.
    P(z < -.63) = .2643

[Figure: sampling distribution of x̄ with σ_x̄ = 15.96; area = .2643 to the left of 1687]

Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval.
    P(-.63 < z < .63) = P(z < .63) - P(z < -.63) = .7357 - .2643 = .4714

The probability that the sample mean SAT score will be between 1687 and 1707 is:
    P(1687 < x̄ < 1707) = .4714

[Figure: sampling distribution of x̄ with σ_x̄ = 15.96; area = .4714 between 1687 and 1707]

Relationship Between the Sample Size and the Sampling Distribution of x̄

Example: St. Andrew's College

• Suppose we select a simple random sample of 100 applicants instead of the 30 originally considered.
• E(x̄) = μ regardless of the sample size. In our example, E(x̄) remains at 1697.
• Whenever the sample size is increased, the standard error of the mean σ_x̄ is decreased. With the increase in the sample size to n = 100, the standard error of the mean is decreased from 15.96 to:

    $\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}}\left(\frac{\sigma}{\sqrt{n}}\right) = \sqrt{\frac{900-100}{900-1}}\left(\frac{87.4}{\sqrt{100}}\right) = .94333(8.74) = 8.2$

[Figure: sampling distributions of x̄ centered at E(x̄) = 1697, with σ_x̄ = 8.2 for n = 100 and σ_x̄ = 15.96 for n = 30]

• Recall that when n = 30, P(1687 < x̄ < 1707) = .4714.
• We follow the same steps to solve for P(1687 < x̄ < 1707) when n = 100 as we showed earlier when n = 30.
• Now, with n = 100, P(1687 < x̄ < 1707) = .7776.
• Because the sampling distribution with n = 100 has a smaller standard error, the values of x̄ have less variability and tend to be closer to the population mean than the values of x̄ with n = 30.

[Figure: sampling distribution of x̄ with σ_x̄ = 8.2; area = .7776 between 1687 and 1707]

Sampling Distribution of p̄

Making Inferences about a Population Proportion

1. Population with proportion p = ?
2. A simple random sample of n elements is selected from the population.
3. The sample data provide a value for the sample proportion p̄.
4. The value of p̄ is used to make inferences about the value of p.

The sampling distribution of p̄ is the probability distribution of all possible values of the sample proportion p̄.

• Expected Value of p̄

    E(p̄) = p

  where: p = the population proportion

• Standard Deviation of p̄

  Finite Population:

    $\sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}}\sqrt{\frac{p(1-p)}{n}}$

  Infinite Population:

    $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$

    • σ_p̄ is referred to as the standard error of the proportion.
    • $\sqrt{(N-n)/(N-1)}$ is the finite population correction factor.

Form of the Sampling Distribution of p̄

The sampling distribution of p̄ can be approximated by a normal distribution whenever the sample size is large enough to satisfy the two conditions:

    np > 5 and n(1 – p) > 5

because when these conditions are satisfied, the probability distribution of x in the sample proportion, p̄ = x/n, can be approximated by a normal distribution (and because n is a constant).

Example: St. Andrew's College
Recall that 72% of the prospective students applying to St. Andrew's College desire on-campus housing.
What is the probability that a simple random sample of 30 applicants will provide an estimate of the population proportion of applicants desiring on-campus housing that is within plus or minus .05 of the actual population proportion?

For our example, with n = 30 and p = .72, the normal distribution is an acceptable approximation because:
np = 30(.72) = 21.6 > 5 and n(1 - p) = 30(.28) = 8.4 > 5

The sampling distribution of p̄ has \( E(\bar{p}) = .72 \) and \( \sigma_{\bar{p}} = \sqrt{\frac{.72(1-.72)}{30}} = .082 \).

Step 1: Calculate the z-value at the upper endpoint of the interval. z = (.77 - .72)/.082 = .61
Step 2: Find the area under the curve to the left of the upper endpoint. P(z < .61) = .7291
Step 3: Calculate the z-value at the lower endpoint of the interval. z = (.67 - .72)/.082 = -.61
Step 4: Find the area under the curve to the left of the lower endpoint. P(z < -.61) = .2709
Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval. P(-.61 < z < .61) = P(z < .61) - P(z < -.61) = .7291 - .2709 = .4582

The probability that the sample proportion of applicants wanting on-campus housing will be within ±.05 of the actual population proportion is P(.67 < p̄ < .77) = .4582.
[Figure: sampling distribution of p̄ with σ_p̄ = .082; the area between .67 and .77, centered on .72, is .4582.]

Properties of Point Estimators
Before using a sample statistic as a point estimator, statisticians check to see whether the sample statistic has the following properties associated with good point estimators:
• Unbiased
• Efficiency
• Consistency

Unbiased: If the expected value of the sample statistic is equal to the population parameter being estimated, the sample statistic is said to be an unbiased estimator of the population parameter.

Efficiency: Given the choice of two unbiased estimators of the same population parameter, we would prefer to use the point estimator with the smaller standard deviation, since it tends to provide estimates closer to the population parameter. The point estimator with the smaller standard deviation is said to have greater relative efficiency than the other.

Consistency: A point estimator is consistent if the values of the point estimator tend to become closer to the population parameter as the sample size becomes larger.

Other Sampling Methods
• Stratified Random Sampling
• Cluster Sampling
• Systematic Sampling
• Convenience Sampling
• Judgment Sampling

Stratified Random Sampling
The population is first divided into groups of elements called strata. Each element in the population belongs to one and only one stratum. Best results are obtained when the elements within each stratum are as much alike as possible (i.e. a homogeneous group). A simple random sample is taken from each stratum. Formulas are available for combining the stratum sample results into one population parameter estimate.
Advantage: If strata are homogeneous, this method is as “precise” as simple random sampling but with a smaller total sample size.
Example: The basis for forming the strata might be department, location, age, industry type, and so on.

Cluster Sampling
The population is first divided into separate groups of elements called clusters. Ideally, each cluster is a representative small-scale version of the population (i.e. heterogeneous group). A simple random sample of the clusters is then taken. All elements within each sampled (chosen) cluster form the sample.
Example: A primary application is area sampling, where clusters are city blocks or other well-defined areas.
Advantage: The close proximity of elements can be cost effective (i.e. many sample observations can be obtained in a short time).
Disadvantage: This method generally requires a larger total sample size than simple or stratified random sampling.

Systematic Sampling
If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population. We randomly select one of the first N/n elements from the population list. We then select every (N/n)th element that follows in the population list. This method has the properties of a simple random sample, especially if the list of the population elements is a random ordering.
Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used.
Example: Selecting every 100th listing in a telephone book after the first randomly selected listing.

Convenience Sampling
It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected. The sample is identified primarily by convenience.
Example: A professor conducting research might use student volunteers to constitute a sample.
Advantage: Sample selection and data collection are relatively easy.
Disadvantage: It is impossible to determine how representative of the population the sample is.
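The systematic sampling procedure described above (pick a random start among the first N/n elements, then take every (N/n)th element) can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample` and the 5,000-item list of numbered listings are assumptions made for the example, and the interval is rounded down when N is not a multiple of n.

```python
import random

def systematic_sample(population, n, rng=random):
    """Pick one of the first k = N // n elements at random, then every k-th element after it."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n                        # sampling interval: N/n, rounded down
    start = rng.randrange(k)          # random position among the first k elements
    return population[start::k][:n]   # trim in case N is not a multiple of n

# Hypothetical population list: 5,000 numbered directory listings.
listings = list(range(1, 5001))
sample = systematic_sample(listings, n=50, rng=random.Random(0))
print(len(sample), sample[:3])
```

With N = 5,000 and n = 50 the interval is 100, which matches the "every 100th listing" example.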
Judgment Sampling
The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population. It is a nonprobability sampling technique.
Example: A reporter might sample three or four senators, judging them as reflecting the general opinion of the senate.
Advantage: It is a relatively easy way of selecting a sample.
Disadvantage: The quality of the sample results depends on the judgment of the person selecting the sample.

Recommendation
It is recommended that probability sampling methods (simple random, stratified, cluster, or systematic) be used. For these methods, formulas are available for evaluating the “goodness” of the sample results in terms of the closeness of the results to the population parameters being estimated. An evaluation of the goodness cannot be made with non-probability (convenience or judgment) sampling methods.

Statistical Inference – Part I: Interval Estimation
• Population Mean: σ Known
• Population Mean: σ Unknown
• Determining the Sample Size
• Population Proportion

Margin of Error and the Interval Estimate
A point estimator cannot be expected to provide the exact value of the population parameter. An interval estimate can be computed by adding and subtracting a margin of error to the point estimate:
Point Estimate ± Margin of Error
The purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter.
The general form of an interval estimate of a population mean is x̄ ± Margin of Error.

Interval Estimate of a Population Mean: σ Known
In order to develop an interval estimate of a population mean, the margin of error must be computed using either:
• the population standard deviation σ, or
• the sample standard deviation s.
σ is rarely known exactly, but often a good estimate can be obtained based on historical data or other information. We refer to such cases as the σ known case.

There is a 1 - α probability that the value of a sample mean will provide a margin of error of \( z_{\alpha/2}\,\sigma_{\bar{x}} \) or less.
[Figure: sampling distribution of x̄; 1 - α of all x̄ values lie within \( z_{\alpha/2}\,\sigma_{\bar{x}} \) of μ, with area α/2 in each tail.]
[Figure: intervals built around x̄ values in the central 1 - α region include μ; an interval built around an x̄ value in either tail does not include μ.]

Interval Estimate of μ:
\( \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \)
where:
x̄ is the sample mean
1 - α is the confidence coefficient
z_{α/2} is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution
σ is the population standard deviation
n is the sample size

Values of z_{α/2} for the Most Commonly Used Confidence Levels
Confidence Level   α     α/2    Table Look-up Area   z_{α/2}
90%                .10   .05    .9500                1.645
95%                .05   .025   .9750                1.960
99%                .01   .005   .9950                2.576

Meaning of Confidence
Because 90% of all the intervals constructed using x̄ ± 1.645σ_x̄ will contain the population mean, we say we are 90% confident that the interval x̄ ± 1.645σ_x̄ includes the population mean μ. We say that this interval has been established at the 90% confidence level.
The value .90 is referred to as the confidence coefficient.

Interval Estimate of a Population Mean: σ Known
Example: Discount Sounds
Discount Sounds has 260 retail outlets throughout the United States. The firm is evaluating a potential location for a new outlet, based in part on the mean annual income of the individuals in the marketing area of the new location. A sample of size n = 36 was taken; the sample mean income is $41,100. The population is not believed to be highly skewed. The population standard deviation is estimated to be $4,500, and the confidence coefficient to be used in the interval estimate is .95.

95% of the sample means that can be observed are within ± 1.96σ_x̄ of the population mean μ. The margin of error is:
\( z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 1.96\left(\frac{4{,}500}{\sqrt{36}}\right) = 1{,}470 \)
Thus, at 95% confidence, the margin of error is $1,470.

Interval estimate of μ is: $41,100 ± $1,470, or $39,630 to $42,570. We are 95% confident that the interval contains the population mean.

Confidence Level   Margin of Error   Interval Estimate
90%                1,233.75          39,866.25 to 42,333.75
95%                1,470.00          39,630.00 to 42,570.00
99%                1,932.00          39,168.00 to 43,032.00
In order to have a higher degree of confidence, the margin of error and thus the width of the confidence interval must be larger.

Adequate Sample Size
In most applications, a sample size of n = 30 is adequate. If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended. If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice.
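The Discount Sounds margins of error and interval estimates can be reproduced with Python's standard library. This is a minimal sketch, not part of the original slides: the helper name `z_interval` is made up for the illustration, and the z value is rounded to three decimals so that it matches the table values (1.645, 1.960, 2.576) used in the example.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Return (z, margin, lower, upper) for a sigma-known interval estimate of the mean."""
    alpha = 1 - confidence
    # z value with area alpha/2 in the upper tail, rounded like the printed table
    z = round(NormalDist().inv_cdf(1 - alpha / 2), 3)
    margin = z * sigma / sqrt(n)
    return z, margin, x_bar - margin, x_bar + margin

# Discount Sounds: x_bar = 41,100, sigma = 4,500, n = 36
for level in (0.90, 0.95, 0.99):
    z, margin, lower, upper = z_interval(41_100, 4_500, 36, level)
    print(f"{level:.0%}: z = {z:.3f}, margin = {margin:,.2f}, "
          f"interval = {lower:,.2f} to {upper:,.2f}")
```

Raising the confidence level increases z and therefore widens the interval, as the table above shows.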
If the population is believed to be at least approximately normal, a sample size of less than 15 can be used.

Interval Estimate of a Population Mean: σ Unknown
If an estimate of the population standard deviation σ cannot be developed prior to sampling, we use the sample standard deviation s to estimate σ. This is the σ unknown case. In this case, the interval estimate for μ is based on the t distribution. (We’ll assume for now that the population is normally distributed.)

t Distribution
William Gosset, writing under the name “Student”, is the founder of the t distribution. Gosset was an Oxford graduate in mathematics and worked for the Guinness Brewery in Dublin. He developed the t distribution while working on small-scale materials and temperature experiments.

The t distribution is a family of similar probability distributions. A specific t distribution depends on a parameter known as the degrees of freedom. Degrees of freedom refer to the number of independent pieces of information that go into the computation of s.

A t distribution with more degrees of freedom has less dispersion. As the degrees of freedom increase, the difference between the t distribution and the standard normal probability distribution becomes smaller and smaller.
[Figure: the standard normal distribution compared with t distributions with 20 and 10 degrees of freedom, all centered at 0.]

For more than 100 degrees of freedom, the standard normal z value provides a good approximation to the t value. The standard normal z values can be found in the infinite degrees of freedom (∞) row of the t distribution table.
[Table: t distribution table; the ∞ row gives the standard normal z values.]

Interval Estimate of a Population Mean: σ Unknown
Interval Estimate:
\( \bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} \)
where:
1 - α = the confidence coefficient
t_{α/2} = the t value providing an area of α/2 in the upper tail of a t distribution with n - 1 degrees of freedom
s = the sample standard deviation

Example: Apartment Rents
A reporter for a student newspaper is writing an article on the cost of off-campus housing. A sample of 16 one-bedroom apartments within a half-mile of campus resulted in a sample mean of $750 per month and a sample standard deviation of $55. Let us provide a 95% confidence interval estimate of the mean rent per month for the population of one-bedroom efficiency apartments within a half-mile of campus. We will assume this population to be normally distributed.

At 95% confidence, α = .05, and α/2 = .025. t.025 is based on n - 1 = 16 - 1 = 15 degrees of freedom. In the t distribution table we see that t.025 = 2.131.

Interval estimate, with the margin of error as the second term:
\( \bar{x} \pm t_{.025}\,\frac{s}{\sqrt{n}} = 750 \pm 2.131\,\frac{55}{\sqrt{16}} = 750 \pm 29.30 \)
We are 95% confident that the mean rent per month for the population of one-bedroom apartments within a half-mile of campus is between $720.70 and $779.30.
[Screenshots: Excel TDIST and TINV functions.]

Adequate Sample Size
In most applications, a sample size of n = 30 is adequate when using the expression \( \bar{x} \pm t_{\alpha/2}\, s/\sqrt{n} \) to develop an interval estimate of a population mean. If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended. If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice.
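The apartment-rent interval above can be checked with a few lines of Python. This is a small sketch under one stated assumption: the critical value t.025 = 2.131 is taken from the t table as in the example rather than computed, since the standard library has no t distribution. The helper name `t_interval` is invented for the illustration.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Interval estimate of the mean when sigma is unknown.

    t_crit is the table value t_{alpha/2} for n - 1 degrees of freedom.
    """
    margin = t_crit * s / sqrt(n)
    return margin, x_bar - margin, x_bar + margin

# Apartment rents: n = 16, x_bar = 750, s = 55, t_.025 = 2.131 (15 df, from the table)
margin, lower, upper = t_interval(750, 55, 16, 2.131)
print(f"margin = {margin:.2f}, interval = {lower:.2f} to {upper:.2f}")
```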
If the population is believed to be at least approximately normal, a sample size of less than 15 can be used.

Summary of Interval Estimation Procedures for a Population Mean
Can the population standard deviation σ be assumed known?
• Yes (σ known case): use \( \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \)
• No (σ unknown case): use the sample standard deviation s to estimate σ, and use \( \bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} \)

Sample Size for an Interval Estimate of a Population Mean
Let E = the desired margin of error. E is the amount added to and subtracted from the point estimate to obtain an interval estimate. If a desired margin of error is selected prior to sampling, the sample size necessary to satisfy the margin of error can be determined.

Margin of Error: \( E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \)
Necessary Sample Size: \( n = \frac{(z_{\alpha/2})^2\,\sigma^2}{E^2} \)

The Necessary Sample Size equation requires a value for the population standard deviation σ. If σ is unknown, a preliminary or planning value for σ can be used in the equation.
1. Use the estimate of the population standard deviation computed in a previous study.
2. Use a pilot study to select a preliminary sample and use the sample standard deviation from that sample.
3. Use judgment or a “best guess” for the value of σ.

Example: Discount Sounds
Recall that Discount Sounds is evaluating a potential location for a new retail outlet, based in part on the mean annual income of the individuals in the marketing area of the new location. Suppose that Discount Sounds’ management team wants an estimate of the population mean such that there is a .95 probability that the sampling error is $500 or less. How large a sample size is needed to meet the required precision?

\( z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 500 \)
At 95% confidence, z.025 = 1.96. Recall that σ = 4,500.
\( n = \frac{(1.96)^2 (4{,}500)^2}{(500)^2} = 311.17 \), which is rounded up to 312.
A sample of size 312 is needed to reach a desired precision of ± $500 at 95% confidence.

Interval Estimate of a Population Proportion
The general form of an interval estimate of a population proportion is p̄ ± Margin of Error.

The sampling distribution of p̄ plays a key role in computing the margin of error for this interval estimate. The sampling distribution of p̄ can be approximated by a normal distribution whenever np > 5 and n(1 – p) > 5.

Normal Approximation of Sampling Distribution of p̄
[Figure: sampling distribution of p̄ with \( \sigma_{\bar{p}} = \sqrt{p(1-p)/n} \); 1 - α of all p̄ values lie within \( z_{\alpha/2}\,\sigma_{\bar{p}} \) of p, with area α/2 in each tail.]

Interval Estimate:
\( \bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \)
where:
1 - α is the confidence coefficient
z_{α/2} is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution
p̄ is the sample proportion

Example: Political Science, Inc.
Political Science, Inc. (PSI) specializes in voter polls and surveys designed to keep political office seekers informed of their position in a race. Using telephone surveys, PSI interviewers ask registered voters who they would vote for if the election were held that day.

In a current election campaign, PSI has just found that 220 registered voters, out of 500 contacted, favor a particular candidate. PSI wants to develop a 95% confidence interval estimate for the proportion of the population of registered voters that favor the candidate.
\( \bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \), where n = 500, p̄ = 220/500 = .44, and z_{α/2} = 1.96:
\( .44 \pm 1.96\sqrt{\frac{.44(1-.44)}{500}} = .44 \pm .0435 \)
PSI is 95% confident that the proportion of all voters that favor the candidate is between .3965 and .4835.

Sample Size for an Interval Estimate of a Population Proportion
Margin of Error: \( E = z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \)
Solving for the necessary sample size, we get
\( n = \frac{(z_{\alpha/2})^2\,\bar{p}(1-\bar{p})}{E^2} \)
However, p̄ will not be known until after we have selected the sample. We will use the planning value p* for p̄.

Necessary Sample Size: \( n = \frac{(z_{\alpha/2})^2\,p^*(1-p^*)}{E^2} \)
The planning value p* can be chosen by:
1. Using the sample proportion from a previous sample of the same or similar units.
2. Selecting a preliminary sample and using the sample proportion from this sample.
3. Using judgment or a “best guess” for the value of p*.
4. Otherwise, using .50 as the p* value.

Example: Political Science, Inc.
Suppose that PSI would like a .99 probability that the sample proportion is within ± .03 of the population proportion. How large a sample size is needed to meet the required precision? (A previous sample of similar units yielded .44 for the sample proportion.)

\( z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = .03 \)
At 99% confidence, z.005 = 2.576. Recall that p = .44.
\( n = \frac{(z_{\alpha/2})^2\,p(1-p)}{E^2} = \frac{(2.576)^2(.44)(.56)}{(.03)^2} \approx 1817 \)
A sample of size 1817 is needed to reach a desired precision of ± .03 at 99% confidence.

Note: We used .44 as the best estimate of p in the preceding expression. If no information is available about p, then .5 is often assumed because it provides the highest possible sample size. If we had used p = .5, the recommended n would have been about 1,843 (1,843.3 before rounding, so 1,844 if rounded up as in the earlier examples).
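The proportion interval and the two sample-size formulas from this part can be checked numerically. The following is a minimal sketch using only the standard library; the function names are made up for the illustration, the z values are the table values quoted in the examples, and sample sizes are always rounded up to the next whole number.

```python
from math import ceil, sqrt

def proportion_interval(p_bar, n, z):
    """Return (margin, lower, upper) for an interval estimate of a proportion."""
    margin = z * sqrt(p_bar * (1 - p_bar) / n)
    return margin, p_bar - margin, p_bar + margin

def sample_size_mean(z, sigma, E):
    """Smallest n with z * sigma / sqrt(n) <= E."""
    return ceil(z ** 2 * sigma ** 2 / E ** 2)

def sample_size_proportion(z, p_star, E):
    """Smallest n with z * sqrt(p*(1 - p*)/n) <= E, for planning value p*."""
    return ceil(z ** 2 * p_star * (1 - p_star) / E ** 2)

# PSI: 220 of 500 voters favor the candidate, 95% confidence
margin, lower, upper = proportion_interval(220 / 500, 500, 1.96)
print(f"{margin:.4f}  {lower:.4f} to {upper:.4f}")

# Discount Sounds: E = 500, sigma = 4,500, 95% confidence
n_mean = sample_size_mean(1.96, 4_500, 500)

# PSI: E = .03, 99% confidence, planning values .44 and .50
n_prop_44 = sample_size_proportion(2.576, 0.44, 0.03)
n_prop_50 = sample_size_proportion(2.576, 0.50, 0.03)
print(n_mean, n_prop_44, n_prop_50)
```

With p* = .5 the unrounded value is about 1843.3, so rounding up gives 1844; p* = .5 always yields the largest (most conservative) sample size because p*(1 - p*) peaks there.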
Statistical Inference – Part II: Hypothesis Testing
• Developing Null and Alternative Hypotheses
• Type I and Type II Errors
• Population Mean: σ Known
• Population Mean: σ Unknown
• Population Proportion
• Hypothesis Testing and Decision Making
• Calculating the Probability of Type II Errors
• Determining the Sample Size for a Hypothesis Test About a Population Mean

Hypothesis Testing
• Hypothesis testing can be used to determine whether a statement about the value of a population parameter should or should not be rejected.
• The null hypothesis, denoted by H0, is a tentative assumption about a population parameter.
• The alternative hypothesis, denoted by Ha, is the opposite of what is stated in the null hypothesis.
• The hypothesis testing procedure uses data from a sample to test the two competing statements indicated by H0 and Ha.

Developing Null and Alternative Hypotheses
• It is not always obvious how the null and alternative hypotheses should be formulated.
• Care must be taken to structure the hypotheses appropriately so that the test conclusion provides the information the researcher wants.
• The context of the situation is very important in determining how the hypotheses should be stated.
• In some cases it is easier to identify the alternative hypothesis first. In other cases the null is easier.
• Correct hypothesis formulation will take practice.

Alternative Hypothesis as a Research Hypothesis
• Many applications of hypothesis testing involve an attempt to gather evidence in support of a research hypothesis.
• In such cases, it is often best to begin with the alternative hypothesis and make it the conclusion that the researcher hopes to support.
• The conclusion that the research hypothesis is true is made if the sample data provide sufficient evidence to show that the null hypothesis can be rejected.
• Example: A new teaching method is developed that is believed to be better than the current method.
• Alternative Hypothesis: The new teaching method is better.
• Null Hypothesis: The new method is no better than the old method.

• Example: A new sales force bonus plan is developed in an attempt to increase sales.
• Alternative Hypothesis: The new bonus plan increases sales.
• Null Hypothesis: The new bonus plan does not increase sales.

• Example: A new drug is developed with the goal of lowering blood pressure more than the existing drug.
• Alternative Hypothesis: The new drug lowers blood pressure more than the existing drug.
• Null Hypothesis: The new drug does not lower blood pressure more than the existing drug.

Null Hypothesis as an Assumption to be Challenged
• We might begin with a belief or assumption that a statement about the value of a population parameter is true.
• We then use a hypothesis test to challenge the assumption and determine if there is statistical evidence to conclude that the assumption is incorrect.
• In these situations, it is helpful to develop the null hypothesis first.

• Example: The label on a soft drink bottle states that it contains 67.6 fluid ounces.
• Null Hypothesis: The label is correct. μ ≥ 67.6 ounces.
• Alternative Hypothesis: The label is incorrect. μ < 67.6 ounces.

Summary of Forms for Null and Alternative Hypotheses about a Population Mean
The equality part of the hypotheses always appears in the null hypothesis.
In general, a hypothesis test about the value of a population mean μ must take one of the following three forms (where μ0 is the hypothesized value of the population mean).
• One-tailed (lower-tail): H0: μ ≥ μ0, Ha: μ < μ0
• One-tailed (upper-tail): H0: μ ≤ μ0, Ha: μ > μ0
• Two-tailed: H0: μ = μ0, Ha: μ ≠ μ0

Null and Alternative Hypotheses
Example: Metro EMS
A major west coast city provides one of the most comprehensive emergency medical services in the world. Operating in a multiple hospital system with approximately 20 mobile medical units, the service goal is to respond to medical emergencies with a mean time of 12 minutes or less. The director of medical services wants to formulate a hypothesis test that could use a sample of emergency response times to determine whether or not the service goal of 12 minutes or less is being achieved.

H0: μ ≤ 12. The emergency service is meeting the response goal; no follow-up action is necessary.
Ha: μ > 12. The emergency service is not meeting the response goal; appropriate follow-up action is necessary.
where: μ = mean response time for the population of medical emergency requests

Type I Error
• Because hypothesis tests are based on sample data, we must allow for the possibility of errors.
• A Type I error is rejecting H0 when it is true.
• The probability of making a Type I error when the null hypothesis is true as an equality is called the level of significance.
• Applications of hypothesis testing that only control the Type I error are often called significance tests.

Type II Error
• A Type II error is accepting H0 when it is false.
• It is difficult to control for the probability of making a Type II error.
• Statisticians avoid the risk of making a Type II error by using “do not reject H0” and not “accept H0”.
Type I and Type II Errors
                               Population Condition
Conclusion                     H0 True (μ ≤ 12)     H0 False (μ > 12)
Accept H0 (Conclude μ ≤ 12)    Correct Decision     Type II Error
Reject H0 (Conclude μ > 12)    Type I Error         Correct Decision

p-Value Approach to One-Tailed Hypothesis Testing
• The p-value is the probability, computed using the test statistic, that measures the support (or lack of support) provided by the sample for the null hypothesis.
• If the p-value is less than or equal to the level of significance α, the value of the test statistic is in the rejection region.
• Reject H0 if the p-value ≤ α.

Suggested Guidelines for Interpreting p-Values
• Less than .01: Overwhelming evidence to conclude Ha is true.
• Between .01 and .05: Strong evidence to conclude Ha is true.
• Between .05 and .10: Weak evidence to conclude Ha is true.
• Greater than .10: Insufficient evidence to conclude Ha is true.

Lower-Tailed Test About a Population Mean: σ Known (p-Value Approach)
[Figure: sampling distribution of \( z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \) with α = .10; the critical value is -z_α = -1.28 and the test statistic is z = -1.46; the p-value is the area to the left of z = -1.46.]
The p-value is less than α, so reject H0.

Upper-Tailed Test About a Population Mean: σ Known (p-Value Approach)
[Figure: sampling distribution of \( z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \) with α = .04; the critical value is z_α = 1.75 and the test statistic is z = 2.29; the p-value is the area to the right of z = 2.29.]
The p-value is less than α, so reject H0.

Critical Value Approach to One-Tailed Hypothesis Testing
• The test statistic z has a standard normal probability distribution.
• We can use the standard normal probability distribution table to find the z-value with an area of α in the lower (or upper) tail of the distribution.
• The value of the test statistic that establishes the boundary of the rejection region is called the critical value for the test.
• The rejection rule is:
  Lower tail: Reject H0 if z < -z_α
  Upper tail: Reject H0 if z > z_α

Lower-Tailed Test About a Population Mean: σ Known (Critical Value Approach)
[Figure: sampling distribution of \( z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \); reject H0 to the left of -z_α = -1.28 (area α), do not reject H0 otherwise.]

Upper-Tailed Test About a Population Mean: σ Known (Critical Value Approach)
[Figure: sampling distribution of \( z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \); reject H0 to the right of z_α = 1.645 (area α), do not reject H0 otherwise.]

Steps of Hypothesis Testing
Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the value of the test statistic.
p-Value Approach
Step 4. Use the value of the test statistic to compute the p-value.
Step 5. Reject H0 if p-value ≤ α.
Critical Value Approach
Step 4. Use the level of significance to determine the critical value and the rejection rule.
Step 5. Use the value of the test statistic and the rejection rule to determine whether to reject H0.

One-Tailed Tests About a Population Mean: σ Known
Example: Metro EMS
The response times for a random sample of 40 medical emergencies were tabulated. The sample mean is 13.25 minutes. The population standard deviation is believed to be 3.2 minutes. The EMS director wants to perform a hypothesis test, with a .05 level of significance, to determine whether the service goal of 12 minutes or less is being achieved.

p-Value and Critical Value Approaches
1. Develop the hypotheses. H0: μ ≤ 12, Ha: μ > 12
2. Specify the level of significance. α = .05
3. Compute the value of the test statistic.
\( z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{13.25 - 12}{3.2/\sqrt{40}} = 2.47 \)

p-Value Approach
4. Compute the p-value. For z = 2.47, cumulative probability = .9932. p-value = 1 - .9932 = .0068
5. Determine whether to reject H0. Because p-value = .0068 < α = .05, we reject H0.
There is sufficient statistical evidence to infer that Metro EMS is not meeting the response goal of 12 minutes.
[Figure: sampling distribution of \( z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \) with α = .05; the critical value is z_α = 1.645 and the test statistic is z = 2.47; the p-value is the area to the right of z = 2.47.]

Critical Value Approach
4. Determine the critical value and rejection rule. For α = .05, z.05 = 1.645. Reject H0 if z > 1.645.
5. Determine whether to reject H0. Because 2.47 > 1.645, we reject H0. There is sufficient statistical evidence to infer that Metro EMS is not meeting the response goal of 12 minutes.

p-Value Approach to Two-Tailed Hypothesis Testing
Compute the p-value using the following three steps:
1. Compute the value of the test statistic z.
2. If z is in the upper tail (z > 0), compute the probability that z is greater than or equal to the value of the test statistic. If z is in the lower tail (z < 0), compute the probability that z is less than or equal to the value of the test statistic.
3. Double the tail area obtained in step 2 to obtain the p-value.
The rejection rule: Reject H0 if the p-value ≤ α.

Critical Value Approach to Two-Tailed Hypothesis Testing
• The critical values will occur in both the lower and upper tails of the standard normal curve.
• Use the standard normal probability distribution table to find z_{α/2} (the z-value with an area of α/2 in the upper tail of the distribution).
• The rejection rule is: Reject H0 if z < -z_{α/2} or z > z_{α/2}.

Two-Tailed Tests About a Population Mean: σ Known
Example: Glow Toothpaste
The production line for Glow toothpaste is designed to fill tubes with a mean weight of 6 oz. Periodically, a sample of 30 tubes will be selected in order to check the filling process.
Quality assurance procedures call for the continuation of the filling process if the sample results are consistent with the assumption that the mean filling weight for the population of toothpaste tubes is 6 oz.; otherwise the process will be adjusted.

Assume that a sample of 30 toothpaste tubes provides a sample mean of 6.1 oz. The population standard deviation is believed to be 0.2 oz. Perform a hypothesis test, at the .03 level of significance, to help determine whether the filling process should continue operating or be stopped and corrected.

p-Value and Critical Value Approaches
1. Determine the hypotheses. H0: μ = 6, Ha: μ ≠ 6
2. Specify the level of significance. α = .03
3. Compute the value of the test statistic.
\( z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{6.1 - 6}{.2/\sqrt{30}} = 2.74 \)

p-Value Approach
4. Compute the p-value. For z = 2.74, cumulative probability = .9969. p-value = 2(1 - .9969) = .0062
5. Determine whether to reject H0. Because p-value = .0062 < α = .03, we reject H0. There is sufficient statistical evidence to infer that the alternative hypothesis is true (i.e. the mean filling weight is not 6 ounces).
[Figure: two-tailed p-value; half of the p-value (.0031) lies beyond z = -2.74 and half beyond z = 2.74, while α/2 = .015 lies beyond each of the critical values -z_{α/2} = -2.17 and z_{α/2} = 2.17.]

Critical Value Approach
4. Determine the critical value and rejection rule. For α/2 = .03/2 = .015, z.015 = 2.17. Reject H0 if z < -2.17 or z > 2.17.
5. Determine whether to reject H0. Because 2.74 > 2.17, we reject H0. There is sufficient statistical evidence to infer that the alternative hypothesis is true (i.e. the mean filling weight is not 6 ounces).
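The p-value calculations in the Metro EMS (upper-tailed) and Glow toothpaste (two-tailed) examples can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch only: the helper name `z_test` and its `tail` argument are assumptions made for the illustration. Because the software keeps z unrounded, the p-values differ slightly in the fourth decimal from the table-based values .0068 and .0062.

```python
from math import sqrt
from statistics import NormalDist

def z_test(x_bar, mu0, sigma, n, tail):
    """Return (z, p_value) for a sigma-known test; tail is 'lower', 'upper', or 'two'."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)               # area to the left of z
    if tail == "lower":
        p_value = cdf
    elif tail == "upper":
        p_value = 1 - cdf
    elif tail == "two":
        p_value = 2 * min(cdf, 1 - cdf)     # double the smaller tail area
    else:
        raise ValueError("tail must be 'lower', 'upper', or 'two'")
    return z, p_value

# Metro EMS: H0: mu <= 12, alpha = .05
z_ems, p_ems = z_test(13.25, 12, 3.2, 40, "upper")
# Glow toothpaste: H0: mu = 6, alpha = .03
z_glow, p_glow = z_test(6.1, 6, 0.2, 30, "two")
# Two-tailed critical value z_{alpha/2} for alpha = .03
z_crit = NormalDist().inv_cdf(1 - 0.03 / 2)

print(f"EMS:  z = {z_ems:.2f}, p = {p_ems:.4f}, reject H0: {p_ems <= 0.05}")
print(f"Glow: z = {z_glow:.2f}, p = {p_glow:.4f}, reject H0: {p_glow <= 0.03}")
print(f"z.015 = {z_crit:.2f}")
```

Both approaches lead to the same decision: the p-value is at most α exactly when the test statistic falls in the rejection region.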
Two-Tailed Tests About a Population Mean: σ Known
Critical Value Approach

[Figure: sampling distribution of $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$. Reject H0 in each tail of area α/2 = .015, below -2.17 or above 2.17; do not reject H0 between the two critical values.]

Confidence Interval Approach to Two-Tailed Tests About a Population Mean

• Select a simple random sample from the population and use the value of the sample mean $\bar{x}$ to develop the confidence interval for the population mean μ. (Confidence intervals are covered in Chapter 8.)
• If the confidence interval contains the hypothesized value μ0, do not reject H0. Otherwise, reject H0. (Actually, H0 should be rejected if μ0 happens to be equal to one of the end points of the confidence interval.)

Confidence Interval Approach to Two-Tailed Tests About a Population Mean

The 97% confidence interval for μ is

   $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}} = 6.1 \pm 2.17(.2/\sqrt{30}) = 6.1 \pm .07924$

or 6.02076 to 6.17924.

Because the hypothesized value for the population mean, μ0 = 6, is not in this interval, the hypothesis-testing conclusion is that the null hypothesis, H0: μ = 6, can be rejected.

Tests About a Population Mean: σ Unknown

• Test Statistic

   $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$

This test statistic has a t distribution with n - 1 degrees of freedom.

Tests About a Population Mean: σ Unknown

• Rejection Rule: p-Value Approach
   Reject H0 if p-value < α
• Rejection Rule: Critical Value Approach
   H0: μ ≥ μ0   Reject H0 if t < -t_α
   H0: μ ≤ μ0   Reject H0 if t > t_α
   H0: μ = μ0   Reject H0 if t < -t_{α/2} or t > t_{α/2}

p-Values and the t Distribution

• The format of the t distribution table provided in most statistics textbooks does not have sufficient detail to determine the exact p-value for a hypothesis test.
• However, we can still use the t distribution table to identify a range for the p-value.
• An advantage of computer software packages is that the computer output will provide the p-value for the t distribution.
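
To illustrate the last point, here is a rough sketch of how software can produce an exact upper-tail p-value for a t statistic instead of a table range. It integrates the t density numerically with Simpson's rule using only the Python standard library; the helper names `t_pdf` and `t_upper_tail` are assumptions made for this example, and a statistics package would normally supply this function directly. The usage lines borrow the numbers from the Highway Patrol example that follows (sample mean 66.2, s = 4.2, n = 64, μ0 = 65).

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t_stat, df, steps=2000):
    """P(T > t_stat), via Simpson's rule on [0, |t_stat|] (steps must be even)."""
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # area between 0 and |t_stat|
    tail = 0.5 - area             # area beyond |t_stat| in one tail
    return tail if t_stat >= 0 else 1 - tail


# Highway Patrol numbers: upper-tail test with n - 1 = 63 degrees of freedom
t_stat = (66.2 - 65) / (4.2 / sqrt(64))
p_value = t_upper_tail(t_stat, df=63)
print(f"t = {t_stat:.3f}, upper-tail p-value = {p_value:.4f}")
```

The computed p-value should fall inside the range .01 < p-value < .025 that the t table gives for this statistic.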
Example: Highway Patrol
One-Tailed Test About a Population Mean: σ Unknown

A State Highway Patrol periodically samples vehicle speeds at various locations on a particular roadway. The sample of vehicle speeds is used to test the hypothesis H0: μ ≤ 65. The locations where H0 is rejected are deemed the best locations for radar traps. At Location F, a sample of 64 vehicles shows a mean speed of 66.2 mph with a standard deviation of 4.2 mph. Use α = .05 to test the hypothesis.

One-Tailed Test About a Population Mean: σ Unknown
p-Value and Critical Value Approaches

1. Determine the hypotheses.
   H0: μ ≤ 65
   Ha: μ > 65
2. Specify the level of significance.
   α = .05
3. Compute the value of the test statistic.
   $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{66.2 - 65}{4.2/\sqrt{64}} = 2.286$

One-Tailed Test About a Population Mean: σ Unknown
p-Value Approach

4. Compute the p-value.
   For t = 2.286, the p-value must be less than .025 (for t = 1.998) and greater than .01 (for t = 2.387).
   .01 < p-value < .025
5. Determine whether to reject H0.
   Because p-value < α = .05, we reject H0. We are at least 95% confident that the mean speed of vehicles at Location F is greater than 65 mph.

One-Tailed Test About a Population Mean: σ Unknown
Critical Value Approach

4. Determine the critical value and rejection rule.
   For α = .05 and d.f. = 64 - 1 = 63, t.05 = 1.669. Reject H0 if t > 1.669.
5. Determine whether to reject H0.
   Because 2.286 > 1.669, we reject H0. We are at least 95% confident that the mean speed of vehicles at Location F is greater than 65 mph. Location F is a good candidate for a radar trap.

[Figure: t distribution for the one-tailed test. Do not reject H0 to the left of t_α = 1.669; reject H0 in the upper tail of area α to its right.]

A Summary of Forms for Null and Alternative Hypotheses About a Population Proportion

• The equality part of the hypotheses always appears in the null hypothesis.
• In general, a hypothesis test about the value of a population proportion p must take one of the following three forms (where p0 is the hypothesized value of the population proportion).

   H0: p ≥ p0      H0: p ≤ p0      H0: p = p0
   Ha: p < p0      Ha: p > p0      Ha: p ≠ p0
   One-tailed      One-tailed      Two-tailed
   (lower tail)    (upper tail)

Tests About a Population Proportion

• Test Statistic

   $z = \dfrac{\bar{p} - p_0}{\sigma_{\bar{p}}}$   where   $\sigma_{\bar{p}} = \sqrt{\dfrac{p_0(1 - p_0)}{n}}$

   assuming np > 5 and n(1 - p) > 5

Tests About a Population Proportion

• Rejection Rule: p-Value Approach
   Reject H0 if p-value < α
• Rejection Rule: Critical Value Approach
   H0: p ≤ p0   Reject H0 if z > z_α
   H0: p ≥ p0   Reject H0 if z < -z_α
   H0: p = p0   Reject H0 if z < -z_{α/2} or z > z_{α/2}

Two-Tailed Test About a Population Proportion
Example: National Safety Council (NSC)

For a Christmas and New Year's week, the National Safety Council estimated that 500 people would be killed and 25,000 injured on the nation's roads. The NSC claimed that 50% of the accidents would be caused by drunk driving. A sample of 120 accidents showed that 67 were caused by drunk driving. Use these data to test the NSC's claim with α = .05.

Two-Tailed Test About a Population Proportion
p-Value and Critical Value Approaches

1. Determine the hypotheses.
   H0: p = .5
   Ha: p ≠ .5
2. Specify the level of significance.
   α = .05
3. Compute the value of the test statistic.
   $\sigma_{\bar{p}} = \sqrt{\dfrac{p_0(1 - p_0)}{n}} = \sqrt{\dfrac{.5(1 - .5)}{120}} = .045644$
   (A common error is using the sample proportion $\bar{p}$ in place of $p_0$ in this formula.)
   $z = \dfrac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \dfrac{(67/120) - .5}{.045644} = 1.28$

Two-Tailed Test About a Population Proportion
p-Value Approach

4. Compute the p-value.
   For z = 1.28, cumulative probability = .8997
   p-value = 2(1 - .8997) = .2006
5. Determine whether to reject H0.
   Because p-value = .2006 > α = .05, we cannot reject H0.

Two-Tailed Test About a Population Proportion
Critical Value Approach

4. Determine the critical values and rejection rule.
   For α/2 = .05/2 = .025, z.025 = 1.96. Reject H0 if z < -1.96 or z > 1.96.
5.
Determine whether to reject H0.
   Because -1.96 < 1.278 < 1.96, we cannot reject H0.

Type I and Type II Errors

                                    Population Condition
   Conclusion                       H0 True (μ ≤ 12)      H0 False (μ > 12)
   Accept H0 (Conclude μ ≤ 12)      Correct Decision      Type II Error
   Reject H0 (Conclude μ > 12)      Type I Error          Correct Decision

Hypothesis Testing and Decision Making

• In many decision-making situations the decision maker may want, and in some cases may be forced, to take action with both the conclusion do not reject H0 and the conclusion reject H0.
• In such situations, it is recommended that the hypothesis-testing procedure be extended to include consideration of making a Type II error.

Calculating the Probability of a Type II Error in Hypothesis Tests About a Population Mean

1. Formulate the null and alternative hypotheses.
2. Using the critical value approach, use the level of significance α to determine the critical value and the rejection rule for the test.
3. Using the rejection rule, solve for the value of the sample mean corresponding to the critical value of the test statistic.
4. Use the results from step 3 to state the values of the sample mean that lead to the acceptance of H0; this defines the acceptance region.
5. Using the sampling distribution of $\bar{x}$ for a value of μ satisfying the alternative hypothesis, and the acceptance region from step 4, compute the probability that the sample mean will be in the acceptance region. (This is the probability of making a Type II error at the chosen level of μ.)

Calculating the Probability of a Type II Error
Example: Metro EMS (revisited)

Recall that the response times for a random sample of 40 medical emergencies were tabulated. The sample mean is 13.25 minutes. The population standard deviation is believed to be 3.2 minutes.
The EMS director wants to perform a hypothesis test, with a .05 level of significance, to determine whether or not the service goal of 12 minutes or less is being achieved.

Calculating the Probability of a Type II Error

1. Hypotheses are: H0: μ ≤ 12 and Ha: μ > 12
2. Rejection rule is: Reject H0 if z > 1.645
3. Value of the sample mean that identifies the rejection region:
   $z = \dfrac{\bar{x} - 12}{3.2/\sqrt{40}} = 1.645$
   $\bar{x} = 12 + 1.645\left(\dfrac{3.2}{\sqrt{40}}\right) = 12.8323$
4. We will accept H0 when $\bar{x} \le 12.8323$
5. Probabilities that the sample mean will be in the acceptance region:

   Values of μ    z = (12.8323 - μ)/(3.2/√40)    β        1 - β
   14.0           -2.31                          .0104    .9896
   13.6           -1.52                          .0643    .9357
   13.2           -0.73                          .2327    .7673
   12.8323         0.00                          .5000    .5000
   12.8            0.06                          .5239    .4761
   12.4            0.85                          .8023    .1977
   12.0001         1.645                         .9500    .0500

Calculating the Probability of a Type II Error

Observations about the preceding table:
• When the true population mean μ is close to the null hypothesis value of 12, there is a high probability that we will make a Type II error. Example: μ = 12.0001, β = .9500
• When the true population mean μ is far above the null hypothesis value of 12, there is a low probability that we will make a Type II error. Example: μ = 14.0, β = .0104

Power of the Test

• The probability of correctly rejecting H0 when it is false is called the power of the test.
• For any particular value of μ, the power is 1 - β.
• We can show graphically the power associated with each value of μ; such a graph is called a power curve. (See next slide.)
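
The Type II error table for Metro EMS can be reproduced with a few lines of code. This is a minimal sketch, assuming an upper-tail z test with σ known; the function name `type_ii_error` is a hypothetical helper, not something from the slides. It follows steps 3 to 5 above: find the cutoff for the sample mean, then find the probability that the sample mean falls at or below it when the true mean is μ.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu_true, mu0, sigma, n, alpha):
    """beta for an upper-tail z test of H0: mu <= mu0 when the true mean is mu_true."""
    std_err = sigma / sqrt(n)
    # Largest sample mean that still leads to not rejecting H0 (step 3).
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * std_err
    # Probability that the sample mean lands in the acceptance region (step 5).
    return NormalDist(mu_true, std_err).cdf(cutoff)


# Metro EMS: mu0 = 12, sigma = 3.2, n = 40, alpha = .05
for mu in (14.0, 13.6, 13.2, 12.8, 12.4, 12.0001):
    beta = type_ii_error(mu, 12, 3.2, 40, 0.05)
    print(f"mu = {mu:<8} beta = {beta:.4f}  power = {1 - beta:.4f}")
```

The printed values can differ slightly from the table in the third or fourth decimal place, because the table rounds z to two decimals before looking up the probability. Plotting 1 - β against μ would give the power curve.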
Power Curve

[Figure: power curve. Vertical axis: probability of correctly rejecting the null hypothesis, from 0.00 to 1.00; horizontal axis: μ, from 11.5 to 14.5; the curve applies where H0 is false.]

Determining the Sample Size for a Hypothesis Test About a Population Mean

• The specified level of significance determines the probability of making a Type I error.
• By controlling the sample size, the probability of making a Type II error is controlled.

Determining the Sample Size for a Hypothesis Test About a Population Mean

[Figure: two sampling distributions of $\bar{x}$ for H0: μ ≤ μ0, Ha: μ > μ0. When H0 is true and μ = μ0, the rejection region of area α lies to the right of the critical value c. When H0 is false and μa > μ0, the area β lies to the left of c.]

Determining the Sample Size for a Hypothesis Test About a Population Mean

   $n = \dfrac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_0 - \mu_a)^2}$

where
   z_α = z value providing an area of α in the tail
   z_β = z value providing an area of β in the tail
   σ = population standard deviation
   μ0 = value of the population mean in H0
   μa = value of the population mean used for the Type II error

Note: In a two-tailed hypothesis test, use z_{α/2}, not z_α.

Determining the Sample Size for a Hypothesis Test About a Population Mean

• Let's assume that the director of medical services makes the following statements about the allowable probabilities for the Type I and Type II errors:
   • If the mean response time is μ = 12 minutes, I am willing to risk an α = .05 probability of rejecting H0.
   • If the mean response time is 0.75 minutes over the specification (μ = 12.75), I am willing to risk a β = .10 probability of not rejecting H0.

Determining the Sample Size for a Hypothesis Test About a Population Mean

   α = .05, β = .10
   z_α = 1.645, z_β = 1.28
   μ0 = 12, μa = 12.75
   σ = 3.2

   $n = \dfrac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_0 - \mu_a)^2} = \dfrac{(1.645 + 1.28)^2 (3.2)^2}{(12 - 12.75)^2} = 155.75 \approx 156$

Relationship Among α, β, and n

• Once two of the three values are known, the other can be computed.
• For a given level of significance α, increasing the sample size n will reduce β.
• For a given sample size n, decreasing α will increase β, whereas increasing α will decrease β.

End of Statistical Inference – Part II Hypothesis Testing
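
The sample size formula and the relationship among α, β, and n can be explored with a short sketch. The function name `sample_size` and its `two_tailed` flag are assumptions made for this example. It uses unrounded z values from the standard normal distribution rather than the table values 1.645 and 1.28, so the intermediate result differs slightly from 155.75, but it rounds up to the same n.

```python
from math import ceil
from statistics import NormalDist


def sample_size(mu0, mu_a, sigma, alpha, beta, two_tailed=False):
    """Smallest n meeting the stated Type I and Type II error probabilities."""
    std_normal = NormalDist()
    # For a two-tailed test, use z_{alpha/2} instead of z_alpha.
    tail = alpha / 2 if two_tailed else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)
    z_beta = std_normal.inv_cdf(1 - beta)
    n = (z_alpha + z_beta) ** 2 * sigma ** 2 / (mu0 - mu_a) ** 2
    return ceil(n)  # round up so the error limits are not exceeded


# Metro EMS: mu0 = 12, mu_a = 12.75, sigma = 3.2, alpha = .05, beta = .10
n = sample_size(12, 12.75, 3.2, alpha=0.05, beta=0.10)
print(n)

# Asking for a smaller beta at the same alpha requires a larger sample.
print(sample_size(12, 12.75, 3.2, alpha=0.05, beta=0.05))
```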

Explanation & Answer

Heya, all done! :)

1. Television viewing reached a new high when the Nielsen Company reported a mean daily
viewing time of 8.35 hours per household (USA Today, November 11, 2009). Use a normal
probability distribution with a standard deviation of 2.5 hours to answer the following
questions about daily t...

