# Final Project

Statistics

Asked: Mar 4th, 2016

**Question description**

In this final assignment, we will revisit datasets that we have utilized in previous assignments, but with new objectives.

- In the Week One assignment, you looked at mortality in your particular state using two different metrics: the first was number of deaths, and the second was years of life lost. For this question, return to the original dataset, but this time first pool all *cancer* causes of death together, so that cancer constitutes a single category for cause of death. Then repeat your analyses from Week One. How do your conclusions change?

- In the Week Two assignment, you looked at sex ratios for births in your state.
  - Take the data you assembled for the second part of your Week Two assignment, namely, the numbers of first-born boy and girl births in your state between 2007 and 2012, separately by racial group (i.e., American Indians, Asians, Blacks, and Whites). Form a two-by-four contingency table from these data: the two row categories are female (girl) and male (boy), and the four column categories are the four racial groups. Calculate the chi-square statistic from this contingency table, and interpret the result.
  - Return to the CDC Wonder website and obtain the numbers of births in your state between 2007 and 2012, by month. (Disregard gender, race, and birth order; you want all births.) Calculate a chi-square statistic to assess whether there is any seasonality to births. (Your null hypothesis is that births are equally likely to occur in any of the 12 months. We are ignoring the varying lengths of the months to simplify calculations.) How would you interpret your findings? Explain in 500 words in APA format, supported by scholarly sources.
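
The two chi-square calculations described above could be sketched roughly as follows. This is a minimal illustration, not the course's prescribed method: the function names are hypothetical, and every count shown is a placeholder that must be replaced with the figures you obtain from CDC Wonder.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column categories.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    return statistic, len(counts) - 1


# PLACEHOLDER counts only (not real data).
# Rows: female, male. Columns: American Indian, Asian, Black, White.
first_births = [
    [100, 200, 300, 400],
    [110, 205, 310, 420],
]
# PLACEHOLDER monthly birth totals, January through December.
births_by_month = [50, 48, 52, 51, 53, 55, 58, 59, 56, 52, 49, 50]

stat_table, df_table = chi_square_independence(first_births)
stat_month, df_month = chi_square_uniform(births_by_month)
print(f"sex by race: chi-square = {stat_table:.3f}, df = {df_table}")
print(f"seasonality: chi-square = {stat_month:.3f}, df = {df_month}")
```

At the 0.05 level, the critical values are 7.815 for 3 degrees of freedom (the two-by-four table) and 19.675 for 11 degrees of freedom (the 12 months); a statistic above the critical value leads to rejecting the null hypothesis.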

**BONUS:** Give a graphical representation of your findings for this portion, highlighting what you consider significant.

- In the Week Three assignment, you were given levels of tumor-associated antigens in a sample of 90 normal (non-cancer) individuals and 160 hepatocellular carcinoma (HCC) patients. Here is a proposed diagnostic test for HCC:
  - For each individual, calculate a numerical score:
    - score = -3.95 + 10.7 * HCC1 - 4.14 * P16 + 13.95 * P53 + 28.92 * P90 + 6.48 * survivin
    - (This equation was derived from logistic regression.)
  - If this score is positive (i.e., > 0), diagnose this individual as an HCC patient; if this score is negative (i.e., < 0), diagnose this individual as normal (i.e., non-cancer).
  - Apply this rule to the entire cohort of 250 individuals. Report the sensitivity of this rule, the specificity, the false positive rate, the false negative rate, and the overall accuracy. Do you think the score function provides a good diagnostic test for HCC? Explain.
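
One way the scoring rule and the requested error rates could be computed is sketched below. The function names and the four example scores are hypothetical placeholders, not values from the Week Three dataset. The rule as stated does not cover a score of exactly 0; this sketch assumes such a case is classified as normal.

```python
def hcc_score(hcc1, p16, p53, p90, survivin):
    """Score from the assignment's logistic-regression equation."""
    return (-3.95 + 10.7 * hcc1 - 4.14 * p16 + 13.95 * p53
            + 28.92 * p90 + 6.48 * survivin)


def diagnostic_metrics(scores, has_hcc):
    """Tally the confusion matrix for the 'score > 0 means HCC' rule."""
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, has_hcc):
        predicted_hcc = score > 0  # assumption: a score of exactly 0 counts as normal
        if predicted_hcc and truth:
            tp += 1
        elif predicted_hcc and not truth:
            fp += 1
        elif not predicted_hcc and truth:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),          # true positives among HCC patients
        "specificity": tn / (tn + fp),          # true negatives among normals
        "false_positive_rate": fp / (fp + tn),  # 1 - specificity
        "false_negative_rate": fn / (fn + tp),  # 1 - sensitivity
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# PLACEHOLDER scores and true statuses (not real data); in practice, compute
# hcc_score(...) for each of the 250 individuals from their antigen levels.
example_scores = [2.1, -0.7, 0.4, -3.0]
example_status = [True, True, False, False]
print(diagnostic_metrics(example_scores, example_status))
```

The false positive rate here is taken as the fraction of normal individuals flagged as HCC, and the false negative rate as the fraction of HCC patients missed; if the course defines these rates differently, the denominators would need to change.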

- In the Week Four assignment, we considered a simple two-by-two crossover trial of a new experimental treatment for interstitial cystitis. We calculated t tests for carryover and treatment effects, but we have not yet considered period effects. It is unlikely that there are any period effects in this trial, but we may want to test this formally. If there were a period effect, then patient responses under either treatment would likely be systematically higher in one period than the other. (Here's an analogy: Think of taking the same test twice. You would likely perform better on the test the second time, since you have learned from your experience of taking the first test.) Explain how you would devise a t test for assessing a period effect in this trial. (Hint: look at the explanation of the t test for treatment effects given in the Week Four assignment. There, we based the test on the random variable X - Y. Suppose we look instead at X + Y?)
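
A possible reading of the hint is sketched below. The Week Four definitions of X and Y are not reproduced here, so this sketch assumes that X and Y are the mean within-patient differences (period 1 response minus period 2 response) in the two treatment-sequence groups. Under that assumption, the treatment effect cancels in X + Y while any period effect is doubled, so a pooled-variance t statistic based on X + Y tests for a period effect. The function name and data are hypothetical.

```python
import math
from statistics import mean, variance


def period_effect_t(diffs_seq1, diffs_seq2):
    """t statistic for a period effect in a two-by-two crossover trial.

    Each argument holds within-patient differences (period 1 minus period 2)
    for one treatment sequence. Assumes no carryover effect and equal
    variances in the two sequence groups.
    """
    n1, n2 = len(diffs_seq1), len(diffs_seq2)
    pooled_var = ((n1 - 1) * variance(diffs_seq1)
                  + (n2 - 1) * variance(diffs_seq2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Sum of the group means: treatment effects cancel, period effects add.
    t_stat = (mean(diffs_seq1) + mean(diffs_seq2)) / standard_error
    return t_stat, n1 + n2 - 2


# PLACEHOLDER differences (not real data), one list per treatment sequence.
sequence_1 = [1.0, 2.0, 3.0]
sequence_2 = [-1.5, -2.5, -2.0]
t_stat, df = period_effect_t(sequence_1, sequence_2)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The resulting statistic would be compared with a t distribution on n1 + n2 - 2 degrees of freedom, the same reference distribution used for a pooled two-sample test.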

- In the Week Five assignment, you investigated measures of brain size and intelligence in a sample of 20 youths. A potential shortcoming of your prior analyses is that you did not take into account all available information in the dataset, in particular, gender. Answer the following questions and explain your answers:
  - Do any of the physiologic variables CCSA, HC, TOTSA, TOTVOL, and WEIGHT differ significantly between males and females?
  - Do IQs differ significantly by gender?
  - Undertake a paired analysis of IQs, in order to assess whether firstborns have higher IQs than non-firstborns. In this regard, there are 10 pairs of related youths, as denoted by the variable PAIR.
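
The group comparisons and the paired analysis above might be approached along these lines. This is a sketch under stated assumptions: Welch's unequal-variance t test is one reasonable choice for the male versus female comparisons (the assignment does not name a specific test), the function names are hypothetical, and all numbers are placeholders rather than values from the Week Five dataset.

```python
import math
from statistics import mean, stdev, variance


def welch_t(group_a, group_b):
    """Welch two-sample t statistic and approximate degrees of freedom."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1)
                           + vb ** 2 / (len(group_b) - 1))
    return t_stat, df


def paired_t(first, second):
    """Paired t statistic on within-pair differences (first minus second)."""
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# PLACEHOLDER values (not real data): one physiologic variable by gender.
male_values = [57.0, 58.5, 56.0, 59.0]
female_values = [54.0, 55.5, 56.5, 53.0]
print(welch_t(male_values, female_values))

# PLACEHOLDER IQs aligned by PAIR: index i in both lists is the same pair.
firstborn_iq = [100, 105, 98, 110]
later_born_iq = [97, 104, 99, 103]
print(paired_t(firstborn_iq, later_born_iq))
```

Because the question about firstborns is directional (higher IQs), a one-sided comparison of the paired t statistic against a t distribution on 9 degrees of freedom (10 pairs minus 1) would match the hypothesis as stated.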

**Completing the Final Project**

The Final Project:

- Must include a title page with the following:
  - Title of paper
  - Student’s name
  - Course name and number
  - Instructor’s name
  - Date submitted

- Must begin with an introductory paragraph that has a succinct thesis statement.
- Must address the topic of the paper with critical thought.
- Must end with a conclusion that reaffirms your thesis.
- Must use at least three scholarly, peer-reviewed sources published within the last five years (not including the course text) or those applicable to the data sets.
- Must document all sources in APA style.
- Must include a separate reference page, formatted according to APA style. The number of pages must be applicable to the specific data sets outlined in the Final Project assignment.