R.H.
Probability and Statistics: Lab Assignment 4 @ UCU
Spring 2018
Probability and Statistics
Lab assignment 4: Hypothesis testing and Linear regression
General comments:
• A complete solution is worth 5 points (out of 100 total). Submission deadline: June 03 at 18:00.
• The preferred (and strongly advised) language is R (https://www.r-project.org/). It can be installed from
the official site; RStudio (https://www.rstudio.com/) is a convenient GUI for it.
• You will need just a few basic R commands to complete the task. As a quick reference guide, use the official
manual https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf or the help section of RStudio.
• The assignment must be prepared as a Jupyter notebook and submitted to cms. To use R within Jupyter, you
will have to install the R kernel (available at https://irkernel.github.io/).
• For each task, include the corresponding R code (usually just a couple of lines), the statistics obtained
(such as the sample mean or anything else you use to complete the task), and your conclusion on whether to
reject the null hypothesis.
• The id number referred to in tasks is your ordinal number in the student list on cms (see the attached file).
Observe that the answers do depend on this id number!
Part I: Hypothesis testing
For problems 1–4, generate the data as follows: set
$$a_k := \{ k \ln(k^2 n + \pi) \}, \qquad k \ge 1,$$
where $\{x\} := x - [x]$ is the fractional part of a number $x$ and $n$ is your id number. Sample realizations
$X_1, \dots, X_{100}$ and $Y_1, \dots, Y_{50}$ from the hypothetical normal distributions $N(\mu_1, \sigma_1^2)$
and $N(\mu_2, \sigma_2^2)$ respectively are obtained as
$$x_k = \Phi^{-1}(a_k), \quad k = 1, \dots, 100, \qquad y_l = \Phi^{-1}(a_{l+100}), \quad l = 1, \dots, 50,$$
where $\Phi$ is the cumulative distribution function of $N(0, 1)$ and $\Phi^{-1}$ is its inverse.
In R, you can define a function f calculating $a_k$ from $k$, then apply f to the whole list of $k$'s to get the
list a.data of $a_k$, and finally get $x_k$ and $y_l$ by running qnorm on a.data.
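The recipe above can be sketched as follows. This is only a minimal illustration: the id value n <- 7 and the helper name frac are assumptions made for the example, not part of the assignment.

```r
# Hypothetical id number; replace it with your own ordinal number from the list.
n <- 7

# Fractional part {x} = x - [x] (helper name chosen for this sketch).
frac <- function(x) x - floor(x)

# a_k = { k * ln(k^2 * n + pi) }
f <- function(k) frac(k * log(k^2 * n + pi))

a.data <- sapply(1:150, f)    # a_1, ..., a_150
x <- qnorm(a.data[1:100])     # x_k = Phi^{-1}(a_k),       k = 1, ..., 100
y <- qnorm(a.data[101:150])   # y_l = Phi^{-1}(a_{l+100}), l = 1, ..., 50
```

Since f uses only vectorized operations, f(1:150) would give the same list as the sapply call.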
Instructions: In problems 1–4, test $H_0$ vs $H_1$. To this end,
• point out what standard test you use and why;
• indicate the general form of the rejection region of the test $H_0$ vs $H_1$ of level 0.05;
• find out if $H_0$ should be rejected at the significance level 0.05;
• indicate the p-value of the test and comment on whether you would reject $H_0$ for that value of p and why.
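As an illustration of these steps, here is a hedged sketch for a left-sided test of a mean with unknown variance (a one-sample t-test). The id value n <- 7 and all variable names are assumptions of the example. The rejection region has the form {T < t_{0.05, 99}}, where T is the usual t statistic.

```r
n <- 7                                    # hypothetical id number
frac <- function(x) x - floor(x)          # fractional part {x}
k <- 1:100
x <- qnorm(frac(k * log(k^2 * n + pi)))   # sample x_1, ..., x_100

alpha <- 0.05
m <- length(x)

# Test statistic T = (mean - mu0) / (s / sqrt(m)) with mu0 = 0.
t.stat <- (mean(x) - 0) / (sd(x) / sqrt(m))

# Left-sided rejection region: reject H0 if t.stat < t.crit.
t.crit <- qt(alpha, df = m - 1)
reject <- t.stat < t.crit

# p-value: probability under H0 of a statistic at least this small.
p.value <- pt(t.stat, df = m - 1)

# The built-in test reports the same statistic and p-value.
res <- t.test(x, mu = 0, alternative = "less")
```

Comparing t.stat with t.crit and comparing p.value with alpha lead to the same decision; reporting both covers the last three bullets.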
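Problem 1. $H_0: \mu_1 = 0$ vs. $H_1: \mu_1 < 0$; $\sigma_1^2$ is unknown.
Problem 2. $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \ne \mu_2$; $\sigma_1^2 = \sigma_2^2 = 2$.
Problem 3. $H_0: \sigma_1^2 = 1$ vs. $H_1: \sigma_1^2 \ne 1$; $\mu_1 = 0$.
Problem 4. $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 > \sigma_2^2$; $\mu_1$ and $\mu_2$ are unknown.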
Hint: this is the F-test; read the details in Ross, pp. 321–323.
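The two-sample and variance tests can be computed by hand with the quantile and distribution functions of base R. The sketch below rests on several assumptions: the id value n <- 7 is hypothetical, the side conditions are read as "both variances known and equal to 2" for Problem 2, "mean known and equal to 0" for Problem 3, and "both means unknown" for Problem 4, and the two-sided chi-square p-value uses the equal-tailed convention.

```r
n <- 7                                   # hypothetical id number
frac <- function(x) x - floor(x)         # fractional part {x}
k <- 1:150
a.data <- frac(k * log(k^2 * n + pi))
x <- qnorm(a.data[1:100])
y <- qnorm(a.data[101:150])
alpha <- 0.05

# Problem 2: two-sample z-test, variances assumed known and equal to 2.
z <- (mean(x) - mean(y)) / sqrt(2 / 100 + 2 / 50)
reject2 <- abs(z) > qnorm(1 - alpha / 2)
p2 <- 2 * pnorm(-abs(z))

# Problem 3: chi-square test for the variance, mean assumed known (mu1 = 0).
# Under H0 the statistic sum((x - 0)^2) / 1 is chi-square with 100 df.
chi <- sum((x - 0)^2) / 1
reject3 <- chi < qchisq(alpha / 2, df = 100) ||
  chi > qchisq(1 - alpha / 2, df = 100)
p3 <- 2 * min(pchisq(chi, df = 100),
              pchisq(chi, df = 100, lower.tail = FALSE))

# Problem 4: right-sided F-test, means unknown.
# Under H0 the ratio of sample variances is F with (99, 49) df.
F.stat <- var(x) / var(y)
reject4 <- F.stat > qf(1 - alpha, df1 = 99, df2 = 49)
p4 <- pf(F.stat, df1 = 99, df2 = 49, lower.tail = FALSE)

# Base R's var.test performs the same F-test.
vt <- var.test(x, y, alternative = "greater")
```

Base R has no built-in z-test or known-mean chi-square variance test, which is why those two are written out explicitly here.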
Part II: Simple linear regression
Consider the simple linear regression model
$$Y_k = a + b x_k + \varepsilon_k,$$
in which $\varepsilon_1, \dots, \varepsilon_{50}$ are i.i.d. random variables with normal distribution $N(0, \sigma^2)$.
Generate the data $(x_k, y_k)$, $k = 1, \dots, 50$, as follows:
$$x_k := 10 \cdot \bigl(1 + \cos(kn)\bigr),$$
$$y_k := \sin n + \cos(k^2) + \bigl(\cos n + \sin(k^2)/k\bigr) \cdot \bigl(1 + \sin(k^2)/k\bigr) \cdot x_k,$$
where n is your id number.
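A minimal sketch of the data generation in R follows. It assumes the grouping x_k = 10 (1 + cos(kn)) and y_k = sin n + cos(k^2) + (cos n + sin(k^2)/k)(1 + sin(k^2)/k) x_k, which is one reading of the formulas; check it against the original handout. The id value n <- 7 is hypothetical.

```r
n <- 7        # hypothetical id number
k <- 1:50

# x_k = 10 * (1 + cos(k * n)); values lie in [0, 20].
x <- 10 * (1 + cos(k * n))

# y_k under the assumed parenthesization:
# intercept-like part sin(n) + cos(k^2), slope-like part multiplying x_k.
y <- sin(n) + cos(k^2) + (cos(n) + sin(k^2) / k) * (1 + sin(k^2) / k) * x
```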
Problem 5. (a) Find the estimates $\hat{a}$, $\hat{b}$, $\hat{\sigma}^2$ of the parameters $a$, $b$ and $\sigma^2$;
(b) test the hypothesis $H_0: b = 0$ vs the general alternative;
(c) find the coefficient of determination $r^2$ and comment on whether the linear model is adequate;
(d) find the confidence interval for $Y$ at $x = 0$ and $x = 20$.
Hint: all of this can be done with the single command lm.
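One possible way to read the quantities of Problem 5 off an lm fit is sketched below. The id value n <- 7, the grouping of the data formulas, and the 95% level are assumptions of this example (the task does not state a level). Because "confidence interval for Y" may mean either the mean response or a new observation, both intervals are shown.

```r
n <- 7                                   # hypothetical id number
k <- 1:50
x <- 10 * (1 + cos(k * n))
# Assumed grouping of the y_k formula; verify against the handout.
y <- sin(n) + cos(k^2) + (cos(n) + sin(k^2) / k) * (1 + sin(k^2) / k) * x

fit <- lm(y ~ x)
s <- summary(fit)

# (a) Parameter estimates; sigma2.hat is RSS / (50 - 2).
a.hat <- coef(fit)[["(Intercept)"]]
b.hat <- coef(fit)[["x"]]
sigma2.hat <- s$sigma^2

# (b) t statistic and two-sided p-value for H0: b = 0.
b.test <- s$coefficients["x", c("t value", "Pr(>|t|)")]

# (c) Coefficient of determination.
r2 <- s$r.squared

# (d) Intervals at x = 0 and x = 20 (95% level assumed).
new <- data.frame(x = c(0, 20))
ci.mean <- predict(fit, new, interval = "confidence", level = 0.95)
ci.pred <- predict(fit, new, interval = "prediction", level = 0.95)
```

Printing summary(fit) shows items (a) to (c) in one table; the intervals in (d) come from predict applied to the same fitted object.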