questions about basic linear statistics

Mathematics

sta 106

UC Davis

Description

Please help me with questions 3 to 8. Please answer these questions in as much detail as possible and show all work.

Attachment Preview

STA 106 Winter 2018 Homework 1 - Due Friday, Jan 19th

Book Homework (does not require R)

Note: This may be handwritten or typed. Answers should be clearly marked. Please put your name in the upper right corner.

1. For the following problems, identify all possible combinations of the two categorical variables.
(a) Grade (A, B, C, D, F) and employment status (Unemployed (U), Employed at least part time (E)).
(b) Cancer status (has cancer, or does not), and age group (Young, Middle, Old).
(c) Smoking status (Smoker, Non-smoker), and illegal drug activity (Recently used, Used in the past, Never used).
(d) Intelligence of dog (high, medium, low), and type of dog (Border Collie, German Shepherd, Dachshund).

2. Assume that $Y$ is a random variable with mean $\mu_Y = 4$ and standard deviation $\sigma_Y = 8$. Find the mean and standard deviation of each of the following random variables:
(a) $U_1 = 3 + 4Y$
(b) $U_2 = -10 + 2Y$
(c) $U_3 = \frac{1}{4} - Y$
(d) $U_4 = \frac{3}{4} - \frac{1}{4}Y$

3. Assume $Y_1, Y_2, \ldots, Y_{10}$ denotes an independent random sample of size 10 from a population with mean 20 and standard deviation 5.
(a) Find the mean and variance of $\bar{Y}$.
(b) Find the mean and variance of $\sum_{i=1}^{10} Y_i$.
(c) Find the mean and variance of $Y^* = a + b\bar{Y}$, where $a$ and $b$ are constants.
(d) Find the mean and variance of $Y^* = 5 - 2\sum_{i=1}^{10} Y_i$.

4. Suppose we take three independent random samples of size 100 from three independent populations. Let population $i$ be normally distributed with mean $\mu_i$ and standard deviation $\sigma_i$, $i = 1, 2, 3$. Identify the distribution of each of the following quantities, being as specific as you can (name the distribution if possible, find the mean, find the standard deviation).
(a) $\bar{Y}_1 - \bar{Y}_2$
(b) $\bar{Y}_1 + \bar{Y}_2$
(c) $\bar{Y}_1 + \frac{\bar{Y}_2 - \bar{Y}_3}{2}$
(d) $\bar{Y}_1 + \bar{Y}_3 - 2\bar{Y}_2$

5. In a random sample of 500 subjects, systolic blood pressure and smoking status (smoker or not) were recorded. The goal is to evaluate whether average systolic blood pressure differs by smoking status. Summary sample statistics for the dataset follow:

Group        Sample Mean   Sample Standard Deviation   Sample Size
Smokers      150.03        27.49                       266
Nonsmokers   139.18        27.49                       234

Assume that you use the pooled variance formula.
Source: Data are part of a larger case study for the 2003 Annual Meeting of the Statistical Society of Canada.
(a) State the appropriate null and alternative hypotheses.
(b) Find the test statistic and interpret the value.
(c) Find the approximate p-value using the t-table and interpret the value in terms of the problem.
(d) State your decision and conclusion in terms of the problem if α = 0.05.

6. Continue with problem 5.
(a) Interpret a Type I error in terms of the problem.
(b) Interpret a Type II error in terms of the problem.
(c) Calculate a 99% confidence interval for the true difference in average systolic blood pressure.
(d) Interpret the interval from (c) in terms of the problem, being as specific as you can.
(e) What is the largest difference between the two groups you would expect with 99% confidence? Be sure to specify the direction of the difference.

7. Three high schools participated in a study to evaluate the effectiveness of a new computer-based mathematics curriculum. In each school, four 24-student sections of freshman algebra were available for the study. The two types of instruction (standard, computer-based) were randomly assigned to the four sections in each of the three schools. At the end of the term, a standardized mathematics test was given to the 24 students in each section.
(a) Is this an experimental, observational, or mixed study? Explain.
(b) What is the primary variable of interest (the response variable)?
(c) What are the explanatory variables? Identify all levels of the explanatory variables (factor levels) if appropriate.
(d) Identify all combinations of the explanatory variables.

8. A rehabilitation center researcher was interested in examining the relationship between the physical fitness, prior to surgery, of persons undergoing corrective knee surgery and the time required in physical therapy until successful rehabilitation. Data were collected on the number of days required for successful completion of physical therapy, the prior physical fitness status (below average, average, above average), and the doctor each patient was randomly paired with (out of 3 possible doctors).
(a) Is this an experimental, observational, or mixed study? Explain.
(b) What is the primary variable of interest (the response variable)?
(c) What are the explanatory variables? Identify all levels of the explanatory variables (factor levels) if appropriate.
(d) Identify all combinations of the explanatory variables.

R Homework (requires some use of R)

Note: You do not have to use R Markdown to turn in the homework, but the homework must be turned in in a reasonable format. The answers to the questions should be in the body of the homework, and the code used to obtain those answers should be in an appendix. There should be no code in the body of the homework. You can accomplish this in R, Word, LaTeX, Google Docs, etc.

I. Online you will find the file "GSK.csv". The csv file has the following columns:
Column 1. sysbp: The systolic blood pressure of the subject (mmHg).
Column 2. gender: The gender, with levels F and M.
Column 3. married: Y if the subject was married, N if not.
Column 4. exercise: With levels L = low, M = medium, H = high.
Column 5. age: The age of the subject in years.
Column 6. stress: With levels LS = low, MS = medium, HS = high.
Column 7. educatn: With levels LE = low, ME = medium, HE = high.
Use this dataset in problems I, II, III, IV.
Source: Data are part of a larger case study for the 2003 Annual Meeting of the Statistical Society of Canada.
(a) Find the average systolic blood pressure by stress level. Which group had the highest average?
(b) Find the standard deviation of systolic blood pressure by stress level. Does it appear the standard deviations are approximately equal?
(c) Find the average age by exercise level. Which group has the lowest average age?
(d) Find the standard deviation of age by exercise level. Which group seems to differ the most from its group mean?

II. Continue with the "GSK.csv" dataset.
(a) Create a boxplot of systolic blood pressure by education level. Does there appear to be a trend? Explain your answer.
(b) Create a histogram of systolic blood pressure by marriage category. Does one group tend to vary more than the other? Explain your answer.
(c) Create a mosaic plot of exercise and stress level. Which exercise group had the highest proportion of subjects with high stress?
(d) Create a mosaic plot of marriage and stress level. Which marriage group had the highest proportion of low-exercise subjects?

III. Continue with the "GSK.csv" dataset. For the following problems, you must show results from either a plot, a table, or an aggregate command to back up your answers.
(a) Which exercise group had the most subjects?
(b) Which stress group had the most highly educated subjects?
(c) Which stress group had the highest average age?
(d) Which gender group had the lowest average systolic blood pressure?

IV. Continue with the "GSK.csv" dataset. Using R, and assuming equal variance by group, assume we want to test whether the average systolic blood pressure for married vs. non-married subjects is equal.
(a) Find the test statistic.
(b) Find the exact p-value.
(c) Find the 95% confidence interval for the true difference.
(d) Interpret the confidence interval in terms of the problem.
(e) What is your conclusion about how systolic blood pressure may differ by marriage category? Explain in detail.
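The linear-transformation rules behind problems 2 and 3 can be checked numerically. This is a minimal sketch in Python rather than R, and the function names are hypothetical helpers introduced for illustration, not part of the course materials. It uses E[a + bY] = a + bE[Y] and SD(a + bY) = |b| SD(Y). For n independent draws with mean μ and SD σ, it also uses the facts that the sum has mean nμ and variance nσ², and the sample mean has mean μ and variance σ²/n.

```python
def linear_transform(a, b, mean_y, sd_y):
    """Mean and SD of U = a + b*Y: E[U] = a + b*E[Y], SD(U) = |b| * SD(Y)."""
    return a + b * mean_y, abs(b) * sd_y

def sample_mean_moments(n, mean, sd):
    """Mean and VARIANCE of the sample mean of n independent draws: mu, sigma^2 / n."""
    return mean, sd ** 2 / n

def sum_moments(n, mean, sd):
    """Mean and VARIANCE of Y_1 + ... + Y_n for n independent draws: n*mu, n*sigma^2."""
    return n * mean, n * sd ** 2

if __name__ == "__main__":
    print(linear_transform(3, 4, 4, 8))    # problem 2(a): U1 = 3 + 4Y, returns (mean, SD)
    print(sample_mean_moments(10, 20, 5))  # problem 3(a): Ybar, returns (mean, variance)
    print(sum_moments(10, 20, 5))          # problem 3(b): sum of Y_i, returns (mean, variance)
```

Parts 3(c) and 3(d) chain the two steps: apply `linear_transform` to the moments of the sample mean or of the sum. Because `linear_transform` expects a standard deviation, pass the square root of the variance and square the result afterward.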
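Problem 4 rests on one fact: a linear combination of independent normal random variables is itself normal. Each population is normal and each sample has size 100, so each sample mean is exactly normal. For coefficients $c_1, c_2, c_3$, the general result is written below. The second argument of $N$ is the variance, so take its square root for the standard deviation. Part (a) is worked as an illustration; the other parts follow from choosing the $c_i$.

```latex
\begin{aligned}
\bar{Y}_i &\sim N\!\left(\mu_i,\ \frac{\sigma_i^2}{100}\right), \quad i = 1, 2, 3 \text{ (independent)} \\
\sum_{i=1}^{3} c_i \bar{Y}_i &\sim N\!\left(\sum_{i=1}^{3} c_i \mu_i,\ \frac{1}{100}\sum_{i=1}^{3} c_i^2 \sigma_i^2\right) \\
\text{(a) } c = (1, -1, 0):\quad \bar{Y}_1 - \bar{Y}_2 &\sim N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2 + \sigma_2^2}{100}\right)
\end{aligned}
```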
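For problems 5 and 6, the pooled two-sample t statistic can be computed directly from the summary table. This Python sketch follows the pooled-variance formula that the problem asks for. The function name is a hypothetical helper. One assumption needs flagging: with df = 498, the t distribution is very close to the standard normal. The sketch therefore uses `NormalDist` from Python's `statistics` module in place of exact t probabilities, so the p-value and the 99% interval are approximations. The t-table, or R's `pt`/`qt`, gives the exact values.

```python
import math
from statistics import NormalDist

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = 0, assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of Ybar1 - Ybar2
    t = (mean1 - mean2) / se
    return t, df, se

# Problem 5 summary statistics: smokers (group 1) vs. nonsmokers (group 2).
mean_s, sd_s, n_s = 150.03, 27.49, 266
mean_n, sd_n, n_n = 139.18, 27.49, 234
t, df, se = pooled_t_from_summary(mean_s, sd_s, n_s, mean_n, sd_n, n_n)
diff = mean_s - mean_n

# Approximation: with df = 498 the t distribution is very close to N(0, 1),
# so the standard normal stands in for exact t probabilities and quantiles.
std_normal = NormalDist()
p_two_sided = 2 * (1 - std_normal.cdf(abs(t)))
crit_99 = std_normal.inv_cdf(0.995)
ci_99 = (diff - crit_99 * se, diff + crit_99 * se)

print(f"t = {t:.3f} on {df} df, approximate two-sided p = {p_two_sided:.2g}")
print(f"approximate 99% CI for mu_smokers - mu_nonsmokers: ({ci_99[0]:.2f}, {ci_99[1]:.2f})")
```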
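The grouped summaries in problems I and III play the role of R's `aggregate` command in the assignment. Below is a stdlib-only Python sketch of the same idea. It assumes that GSK.csv has a header row with the column names listed in problem I. The helper names `summarize_by` and `summarize_csv` are hypothetical. The standard deviation is the sample (n − 1) version, which matches R's `sd`.

```python
import csv
import statistics
from collections import defaultdict

def summarize_by(rows, value_col, group_col):
    """Return {level: (n, mean, sample SD)} for value_col within each level of group_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[value_col]))
    summary = {}
    for level, values in sorted(groups.items()):
        # statistics.stdev needs at least two values; report NaN otherwise.
        sd = statistics.stdev(values) if len(values) > 1 else float("nan")
        summary[level] = (len(values), statistics.mean(values), sd)
    return summary

def summarize_csv(path, value_col, group_col):
    """Read a CSV with a header row and summarize value_col by group_col."""
    with open(path, newline="") as f:
        return summarize_by(csv.DictReader(f), value_col, group_col)
```

For example, `summarize_csv("GSK.csv", "sysbp", "stress")` would return the per-stress-level counts, means, and SDs for I(a) and I(b). Swapping in `"age"` and `"exercise"` covers I(c) and I(d).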