Recall from Chapter 5 that sample statistics (such as the sample mean and sample standard deviation) vary from sample to sample. For example, the sample mean number of plain M&M candies in a 1.69 oz bag will be different for different bags (samples). Also recall that one of the goals of sample statistics is to estimate the true population parameters (true population mean, etc.).
Confidence intervals help you judge how precise your estimated value is. A confidence interval is like a net. You don't know what the TRUE population value is, so you throw a net (find a confidence interval based upon a sample, such as a survey). The confidence level indicates the percentage of times your net would "catch" the true population value if you repeated the sampling many times. Alpha (which is 1 minus the confidence level) then indicates the percentage of times the net would NOT "catch" the true population value. When you create a 90% confidence interval, you are saying that the method you used produces intervals that contain the true parameter value about 90% of the time. Likewise, for a 95% confidence interval, about 95% of the intervals produced this way will contain the true parameter value.
There is a relationship between the width of the interval and the confidence level chosen. For the same sample, the higher the confidence level, the wider the interval will be, because a wider net catches the true value more of the time.
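To make the width relationship concrete, here is a minimal Python sketch. It assumes a z-based interval for a mean with a known population standard deviation, which is only one of several ways to build a confidence interval. The function name `z_interval_width` and the values `sigma=10` and `n=25` are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def z_interval_width(confidence, sigma, n):
    """Full width of a z-based confidence interval for a mean.

    Assumes the population standard deviation (sigma) is known.
    """
    alpha = 1 - confidence
    # Critical value that leaves alpha/2 in each tail of the standard normal
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin_of_error = z_star * sigma / n ** 0.5
    return 2 * margin_of_error

# Hypothetical values: sigma = 10, sample size n = 25
for level in (0.90, 0.95, 0.99):
    width = z_interval_width(level, sigma=10, n=25)
    print(f"{level:.0%} confidence interval width: {width:.2f}")
```

With the sample size and standard deviation held fixed, the printed widths grow as the confidence level rises, because the critical value grows.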
Go to the following website and check out the applet there: http://www.ruf.rice.edu/~lane/stat_sim/conf_interval/index.html. This applet lets you see 95% and 99% confidence intervals as they are simulated. It also keeps track of the number of intervals that did not contain the mean, so you can easily compare it to the confidence level. The orange lines are 95% CIs that include the true mean. The blue tips indicate the width of the 99% CIs that include the true mean. Red lines are 95% CIs that did NOT include the true mean, and white lines are 99% CIs that did NOT include the true mean.
In this applet, the TRUE population value is known, and the applet counts how many of the confidence intervals based on randomly generated samples "catch" the true value.
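The same kind of counting can be sketched in a few lines of Python. This is an illustration only, not the applet's actual method: it assumes a normal population, a known population standard deviation, and z-based intervals. The function name `coverage` and all of its default values (true mean 50, standard deviation 10, sample size 10) are hypothetical.

```python
import random
from statistics import NormalDist, mean

def coverage(confidence, true_mean=50.0, sigma=10.0, n=10, trials=5000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean.

    Assumes a normal population with known sigma (hypothetical values).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    alpha = 1 - confidence
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one random sample and build its interval around the sample mean
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1  # the net "caught" the true value
    return hits / trials

for level in (0.95, 0.99):
    caught = coverage(level)
    print(f"{level:.0%} CIs: caught {caught:.1%}, missed {1 - caught:.1%}")
```

Under these assumptions, the proportion of misses should land near alpha for each confidence level, and it tends to settle closer as the number of trials increases.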
Simulate at least 5000 intervals and note the proportion that contained the mean at each confidence level. Compare these proportions with the confidence levels. You should observe that the percentage of intervals that do not contain the mean is VERY close to the alpha value (about 5% for the 95% CIs and about 1% for the 99% CIs).
- Report your findings.
- Explain how this helped broaden your understanding of confidence levels.
- Identify what was easy and what was challenging about this applet.