# Regression analysis of these data

**Question description**

Kong et al. (2012) studied mutation rates in Icelandic children, specifically looking at mutations in single nucleotide polymorphisms (SNPs). As part of this study, they also recorded the age (in years) of the mother and the father. Part of their data are given below:

| Father's Age (years) | # of SNP Mutations |
|---|---|
| 16 | 39 |
| 18 | 41 |
| 20 | 39 |
| 19 | 49 |
| 22 | 50 |
| 24 | 54 |
| 24 | 55 |
| 24 | 61 |
| 25 | 57 |
| 28 | 52 |
| 29 | 54 |
| 30 | 57 |
| 32 | 61 |
| 37 | 67 |
| 36 | 70 |
| 34 | 77 |
| 30 | 83 |
| 29 | 67 |
| 33 | 68 |
| 26 | 54 |
| 33 | 65 |

A) If you were to do a regression analysis of these data, which would be the dependent variable and which would be the independent variable? Explain your reasoning!

B) What is the regression equation for the relationship between these two variables? For this part, I want you to show me the steps you’d take to calculate the slope and the Y-intercept. You can check your results using Excel or some other program, but you will not get full credit if you simply give me the equation as generated by Excel or another program. (This is all a fancy way of saying you need to calculate this part by hand!)
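The hand calculation in part B can be mirrored step by step in code. The following is a minimal sketch, not a substitute for the written work. It assumes father's age is treated as X and the number of SNP mutations as Y (this is one reading of part A, and it is an assumption here). It uses the standard least-squares formulas, slope = SP_xy / SS_x and intercept = mean(Y) - slope * mean(X). All variable names are illustrative.

```python
# Paired observations from the table: father's age (X) and SNP mutations (Y).
# Treating age as X is an assumption made for this sketch.
ages = [16, 18, 20, 19, 22, 24, 24, 24, 25, 28, 29,
        30, 32, 37, 36, 34, 30, 29, 33, 26, 33]
mutations = [39, 41, 39, 49, 50, 54, 55, 61, 57, 52, 54,
             57, 61, 67, 70, 77, 83, 67, 68, 54, 65]

n = len(ages)

# Step 1: the raw sums you would tabulate by hand.
sum_x = sum(ages)
sum_y = sum(mutations)
sum_x2 = sum(x * x for x in ages)
sum_xy = sum(x * y for x, y in zip(ages, mutations))

# Step 2: corrected sum of squares for X and corrected sum of cross products.
ss_x = sum_x2 - sum_x ** 2 / n
sp_xy = sum_xy - sum_x * sum_y / n

# Step 3: slope and Y-intercept of the least-squares line.
slope = sp_xy / ss_x
intercept = sum_y / n - slope * (sum_x / n)

print(f"n = {n}, sum X = {sum_x}, sum Y = {sum_y}")
print(f"sum X^2 = {sum_x2}, sum XY = {sum_xy}")
print(f"SS_x = {ss_x:.4f}, SP_xy = {sp_xy:.4f}")
print(f"Y = {intercept:.4f} + {slope:.4f} * X")
```

Printing the intermediate sums makes it easy to compare each line of a hand calculation against the program, which is the kind of check the question allows.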

C) Determine whether the slope of this equation is significantly different from zero, using a regression ANOVA approach.
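The regression ANOVA in part C partitions the total variation in Y into a regression component and a residual component, then compares their mean squares with an F-ratio. The sketch below assumes the same X/Y assignment as above (father's age as X), a significance level of 0.05, and a critical value of F(1, 19) of about 4.38 taken from a standard F table. The significance level and the variable names are assumptions, not part of the question.

```python
# Same data as in the table; father's age as X is an assumption of this sketch.
ages = [16, 18, 20, 19, 22, 24, 24, 24, 25, 28, 29,
        30, 32, 37, 36, 34, 30, 29, 33, 26, 33]
mutations = [39, 41, 39, 49, 50, 54, 55, 61, 57, 52, 54,
             57, 61, 67, 70, 77, 83, 67, 68, 54, 65]

n = len(ages)
sum_x = sum(ages)
sum_y = sum(mutations)

# Corrected sums of squares and cross products.
ss_x = sum(x * x for x in ages) - sum_x ** 2 / n
ss_total = sum(y * y for y in mutations) - sum_y ** 2 / n
sp_xy = sum(x * y for x, y in zip(ages, mutations)) - sum_x * sum_y / n

# Partition the total variation in Y.
ss_regression = sp_xy ** 2 / ss_x
ss_residual = ss_total - ss_regression

# Degrees of freedom: 1 for the regression, n - 2 for the residual.
df_regression = 1
df_residual = n - 2

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Critical value for F(1, 19) at alpha = 0.05, read from an F table (assumed alpha).
F_CRITICAL = 4.38

print(f"{'Source':<12}{'SS':>12}{'df':>6}{'MS':>12}{'F':>10}")
print(f"{'Regression':<12}{ss_regression:>12.3f}{df_regression:>6}"
      f"{ms_regression:>12.3f}{f_stat:>10.3f}")
print(f"{'Residual':<12}{ss_residual:>12.3f}{df_residual:>6}{ms_residual:>12.3f}")
print(f"{'Total':<12}{ss_total:>12.3f}{n - 1:>6}")
print("Reject H0 (slope = 0)?", f_stat > F_CRITICAL)
```

Comparing against a tabulated critical value keeps the sketch free of external libraries. A statistics package could return an exact p-value instead.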

