Biostatistics homework involving SAS, parameters, statistical significance and p-values.

PH7017 Week 7 - Homework 6 Summer 2020

1. Below are the SAS results from an investigator who is interested in whether self-reported problems sleeping (SLQ050 – Yes/No) is associated with the number of hours a person sleeps on average each night (SLD010H), using data from the National Health and Nutrition Examination Survey 2009-10. Based on this SAS output, answer the following questions:
a. What are the appropriate hypotheses for this research question using the appropriate population parameter(s) for SAS output for a t-test?
b. Based on the output, which p-value tests these hypotheses if you want to use a two-sample independent pooled t-test? Provide the specific p-value and briefly interpret the p-value in the context of the question.
c. From the output, what is the estimate of the pooled standard deviation?
d. From the output, what is the estimated mean difference of hours slept between those who report trouble sleeping compared to those who do not?
e. From the output, what is the standard error of the average difference between the two groups (assume the population variances are equal)?
f. From the output, what is a 95% confidence interval for the population mean difference of hours slept between those who report trouble sleeping compared to those who do not (assume the population variances are equal for this calculation)?
g. Calculate a 99% confidence interval for the population mean difference of hours slept between those who report trouble sleeping compared to those who do not (assume the population variances are equal for this calculation).
h. From the output, what are the standard error and 95% confidence interval of the average number of hours actually slept by those who report having trouble sleeping?

2. Eight people participated in a weight loss program that lasted 3 months. The descriptive statistics of the data are provided below. Perform the hypothesis test to determine whether the program had any effect on weight at a 0.01 significance level.
(“any” means in either direction, weight gain or weight loss)

Variable                          Before   After   Difference (Before – After)
n                                 8        8       8
Sample mean (lbs)                 214.4    199.1   15.3
Sample standard deviation (lbs)   43.5     36.5    10.9

3. Complete this table identifying the correct critical value from the Chi-Square table.

degrees of freedom   α=.05   α=.025   α=.01   α=.005
1
10
30
40

Use 10 degrees of freedom and the critical values you looked up to draw a rough picture of the chi-square distribution, and annotate the upper right tail with the critical values you looked up.

4. A study performed in 1985 investigated the relationship between the duration of IUD use and infertility. A group of 89 infertile IUD users and a group of 640 control IUD users were identified and then further categorized (or subdivided) by the duration of IUD use. The data are presented in the table below.

Duration of IUD use   Infertile IUD users (cases)   Fertile IUD users (controls)
< 3 months            10                            53
3-18 months           23                            200
18-36 months          20                            168
> 36 months           36                            219

a. What is the appropriate name of the statistical test you will use to carry out this research question?
b. What are the null and alternative hypotheses?
c. Calculate the expected counts under the null hypothesis.
d. Calculate the test statistic.
e. What is your conclusion at the 0.01 significance level? State it in words in the context of the problem.
f. Write a SAS program to carry out this hypothesis test and check your answers (include the program in your homework).

5. The World Health Organization (WHO) recommends HPV vaccines as part of routine vaccinations in countries that can afford them. HPV vaccine can prevent most genital warts and most cases of cervical cancer. In 2006, the FDA approved the first HPV vaccine to protect females between the ages of 9 and 26. In an effort to gauge the acceptance of the HPV vaccine in the US, data were evaluated and summarized below.
These are the results of 381 females aged 19-26 years who responded to questions about receiving at least 1 dose of the HPV vaccine and about demographic and sexual characteristics.
a. List the characteristics that are statistically significant at the 0.10 significance level.
b. Calculate a 95% confidence interval for the difference in the percent of females 19-26 years in the U.S. population who have received at least 1 dose of the HPV vaccine between those who have private health insurance and those who have none.
c. Using Table 2, complete this table with the observed counts (round to whole numbers).

Race/Ethnicity        Received at least 1 dose    Did NOT receive at least 1 dose
                      of the HPV vaccine          of the HPV vaccine
White, non-Hispanic
Black, non-Hispanic
Mexican American

d. State the null and alternative hypotheses for the table in part c.
e. What is the name of the test you would use to test the null hypothesis?
f. What are the expected counts for the table in part c?
g. Calculate the test statistic.
h. Calculate the p-value for the test statistic and compare it to the reported p-value in the table (they should be different). This is because these data come from NHANES, which uses weights and other sampling design elements to carry out the hypothesis tests. The statistics we select must match the sampling design from which the data were sampled. The methods we have learned in this class pertain ONLY to simple random samples from infinitely large populations (practically speaking, sample size < 10% of the population size).

Question 1:
(a) Two-sample t-test
Mean number of hours a person with sleep disturbance (i.e., self-reported problems
sleeping) sleeps on average each night = $\mu_{ys}$
Mean number of hours a person without sleep disturbance (i.e., no self-reported problems
sleeping) sleeps on average each night = $\mu_{ns}$
Null Hypothesis
$H_0: \mu_{ys} = \mu_{ns}$; the average number of hours a person sleeps is the same
with or without sleep disturbance.
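Alternate Hypothesis
$H_1: \mu_{ys} \neq \mu_{ns}$; the average number of hours a person sleeps with sleep
disturbance differs from the average for a person without sleep disturbance (two-sided,
because the research question asks only whether the two are associated).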
(b) For the two-sample independent pooled t-test, p < 0.0001.
This test is known as a two-sample (or unpaired) t-test. It produces a “p-value”, which
can be used to decide whether there is evidence of a difference between the two
population means. The p-value is the probability that the difference between the
sample means is at least as large as what has been observed, under the assumption
that the population means are equal. The smaller the p-value, the more surprised we
would be by the observed difference in sample means if there really was no
difference between the population means. Therefore, the smaller the p-value, the
stronger the evidence is that the two populations have different means. In this context,
a p-value this small is strong evidence that the mean number of hours slept per night
differs between people who report trouble sleeping and people who do not.
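
As a rough illustration of how such a p-value arises, the sketch below recomputes the test statistic from the estimate and standard error reported in parts (d) and (e). It is not the SAS calculation: it assumes the degrees of freedom are large enough that the standard normal distribution can stand in for the t distribution, and the variable names are chosen here for illustration.

```python
import math

# Values reported in parts (d) and (e) of this answer.
mean_diff = -0.7578   # estimated mean difference in hours slept
se_diff = 0.0403      # standard error of the difference (pooled)

# Test statistic: how many standard errors the estimate lies from zero.
t_stat = mean_diff / se_diff

# Assumption: with large degrees of freedom the t distribution is close
# to the standard normal, so the two-sided p-value is approximately
# P(|Z| >= |t|) = erfc(|t| / sqrt(2)).
p_two_sided = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t = {t_stat:.2f}")
print(f"two-sided p (normal approximation) = {p_two_sided:.3g}")
```

The statistic is nearly 19 standard errors from zero, so the approximate p-value is far smaller than 0.0001.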
(c) Estimate of the pooled standard deviation = 1.4111
(d) Estimated mean difference of hours slept between those who report trouble
sleeping compared to those who do not = −0.7578
(e) Standard error of the average difference between two groups = 0.0403
(f) 95% confidence interval for the population mean difference of hours slept between
those who report trouble sleeping compared to those who do not is (−0.8368, −0.6787).
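
As a quick consistency check on parts (d) to (f), the 95% interval can be rebuilt from the estimate and its standard error. This sketch assumes a critical value of 1.96, which is the large-sample value and not necessarily the exact t quantile SAS used.

```python
# Values reported in parts (d) and (e) of this answer.
mean_diff = -0.7578
se_diff = 0.0403

# Assumption: large-sample critical value for a 95% two-sided interval.
crit_95 = 1.96

lower_95 = mean_diff - crit_95 * se_diff
upper_95 = mean_diff + crit_95 * se_diff

print(f"95% CI: ({lower_95:.4f}, {upper_95:.4f})")
```

The result agrees with the interval in part (f) up to rounding in the last decimal place.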

(g) 99% confidence interval for the population mean difference of hours slept between
those who report trouble sleeping compared to those who do not is:

$$\left( (\bar{x}_{ys} - \bar{x}_{ns}) - t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_{ys}} + \frac{1}{n_{ns}}},\;\; (\bar{x}_{ys} - \bar{x}_{ns}) + t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_{ys}} + \frac{1}{n_{ns}}} \right)$$

...
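
The 99% interval in part (g) can be sketched numerically from the values already reported. The standard error from part (e) is the pooled standard deviation times the square-root term in the interval formula. The group sizes are not shown in this excerpt, so the sketch assumes they are large (the standard error is small relative to the pooled standard deviation of 1.4111) and uses the normal critical value 2.576 in place of the exact t quantile. The helper name `mean_difference_ci` is hypothetical.

```python
def mean_difference_ci(mean_diff, se_diff, critical_value):
    """Return (lower, upper) for mean_diff +/- critical_value * se_diff."""
    margin = critical_value * se_diff
    return mean_diff - margin, mean_diff + margin


# Values reported in parts (d) and (e) of this answer.
mean_diff = -0.7578
se_diff = 0.0403  # pooled SD times sqrt(1/n_ys + 1/n_ns), as reported by SAS

# Assumption: large degrees of freedom, so the 0.995 normal quantile
# approximates the t critical value for a 99% two-sided interval.
crit_99 = 2.576

lower_99, upper_99 = mean_difference_ci(mean_diff, se_diff, crit_99)
print(f"99% CI (approximate): ({lower_99:.4f}, {upper_99:.4f})")
```

Because the critical value is larger than the one used for 95% confidence, the interval is wider than the one in part (f), and it still excludes zero.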

