Research Methods HW#7
Professor Shea
POLS 6380
1. (2 points) What happens to our OLS inference if we multiply X, our explanatory variable of
interest, by some constant?
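A small simulation can make question 1 concrete. The sketch below rests on stated assumptions: the data are simulated (not from any course file) and the constant `c_scale` is arbitrary. It fits OLS on x and on c_scale * x and compares the slope, its standard error, the t statistic, and R².

```r
# Simulated data (hypothetical): y depends linearly on x plus noise
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)

c_scale <- 10                          # arbitrary rescaling constant (assumption)
fit_orig   <- lm(y ~ x)
fit_scaled <- lm(y ~ I(c_scale * x))   # same model with the regressor multiplied by c_scale

coef_orig   <- summary(fit_orig)$coefficients
coef_scaled <- summary(fit_scaled)$coefficients

# Ratios of the slope and of its standard error (original / rescaled)
slope_ratio <- coef_orig[2, "Estimate"] / coef_scaled[2, "Estimate"]
se_ratio    <- coef_orig[2, "Std. Error"] / coef_scaled[2, "Std. Error"]

# Inference quantities for the slope and overall fit
t_orig    <- coef_orig[2, "t value"]
t_scaled  <- coef_scaled[2, "t value"]
r2_orig   <- summary(fit_orig)$r.squared
r2_scaled <- summary(fit_scaled)$r.squared

print(c(slope_ratio = slope_ratio, se_ratio = se_ratio,
        t_orig = t_orig, t_scaled = t_scaled,
        r2_orig = r2_orig, r2_scaled = r2_scaled))
```

In this setup the slope and its standard error both shrink by a factor of c_scale, so their ratio (the t statistic), the p-value, and R² are unchanged.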
2. This question relies on the following file: anorexia.csv. These data examine the effectiveness
of different treatments for anorexia.
(a) (3 points) Treat the data as paired data (as they are). I’m interested in whether treatment “f” is
effective in adding weight to the patients. Derive a test statistic to determine whether there is a difference.
What is your decision regarding the null hypothesis based on the test?
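For a paired design, the test statistic is $t = \bar{d} / (s_d/\sqrt{n})$ with $n-1$ degrees of freedom, where $d_i$ is each patient's after-minus-before weight change. The sketch below computes it by hand and checks it against `t.test(..., paired = TRUE)`. Assumptions: the weights are simulated stand-ins, not the values in anorexia.csv, and the one-sided alternative (weight gain) and α = 0.05 are choices made for illustration.

```r
# Simulated before/after weights for one treatment group (hypothetical)
set.seed(42)
n      <- 15
before <- rnorm(n, mean = 83, sd = 5)
after  <- before + rnorm(n, mean = 6, sd = 7)

# Paired statistic by hand: t = mean(d) / (sd(d) / sqrt(n)), df = n - 1
d      <- after - before
t_hand <- mean(d) / (sd(d) / sqrt(n))
dof    <- n - 1
p_hand <- pt(t_hand, dof, lower.tail = FALSE)   # one-sided, H1: mean(d) > 0

# The same test with R's built-in paired t test
res <- t.test(after, before, paired = TRUE, alternative = "greater")

# Decision rule at an assumed alpha of 0.05
decision <- if (p_hand < 0.05) "reject H0" else "fail to reject H0"
print(c(t_hand = t_hand, t_builtin = unname(res$statistic),
        p_hand = p_hand, p_builtin = res$p.value))
print(decision)
```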
(b) (2 points) Why is it particularly difficult to infer a causal effect using a “matched pair”
test?
(c) (2 points) Repeat the same test procedure for treatment “c”. Derive a test statistic to
determine whether there is a difference. What is your decision regarding the null hypothesis based on the test?
(d) (2 points) Repeat the same procedure for treatment “b”. Derive a test statistic to determine whether there is a difference. What is your decision regarding the null hypothesis based on the test?
(e) (2 points) We just engaged in some “multiple testing” behavior. What is the potential
problem of multiple testing? Propose a strategy to address this “multiple testing” problem
(you don’t have to carry this out; just discuss it). Is there a tradeoff with your
strategy?
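One common strategy is to adjust the p-values for the number of tests performed. The sketch below uses R's `p.adjust` with the Bonferroni and Holm methods. The three p-values are made-up placeholders labeled f, c, and b, not results from anorexia.csv.

```r
# Hypothetical one-sided p-values from three separate paired tests
p_raw <- c(f = 0.001, c = 0.030, b = 0.200)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
# Holm: step-down refinement of Bonferroni; never less powerful
p_holm <- p.adjust(p_raw, method = "holm")

print(rbind(raw = p_raw, bonferroni = p_bonf, holm = p_holm))
```

Here the placeholder p-value for c (0.03) is below 0.05 before adjustment but not after either correction. That is the tradeoff: fewer false positives at the cost of lower power (more false negatives).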
(f) (2 points) Now suppose we failed to recognize that the data were paired. Derive a difference in means test statistic for the “b” groups (assume unequal variance).
(g) (2 points) Was unequal variance a justifiable assumption for the previous test? Use an
F-test to support your case.
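Parts (f) and (g) can be sketched on simulated data. Ignoring the pairing, the Welch statistic is $t = (\bar{x}_a - \bar{x}_b)/\sqrt{s_a^2/n_a + s_b^2/n_b}$ with Welch-Satterthwaite degrees of freedom, and the F statistic is the variance ratio $s_a^2/s_b^2$ with $(n_a-1, n_b-1)$ degrees of freedom. The weights below are hypothetical, not anorexia.csv. Note that `var.test` assumes independent samples from normal populations, which before/after weights on the same patients do not satisfy.

```r
# Simulated before/after weights for one group (hypothetical)
set.seed(7)
n      <- 18
before <- rnorm(n, mean = 82, sd = 5)
after  <- before + rnorm(n, mean = 4, sd = 6)

# (f) Welch statistic, treating the two columns as independent samples
va <- var(after) / length(after)
vb <- var(before) / length(before)
t_welch  <- (mean(after) - mean(before)) / sqrt(va + vb)
# Welch-Satterthwaite degrees of freedom
df_welch <- (va + vb)^2 /
  (va^2 / (length(after) - 1) + vb^2 / (length(before) - 1))
welch <- t.test(after, before, var.equal = FALSE)

# (g) F statistic: ratio of sample variances, two-sided p-value
F_stat <- var(after) / var(before)
df1 <- length(after) - 1
df2 <- length(before) - 1
p_F <- 2 * min(pf(F_stat, df1, df2), pf(F_stat, df1, df2, lower.tail = FALSE))
ftest <- var.test(after, before)

print(c(t_welch = t_welch, df_welch = df_welch, F = F_stat, p_F = p_F))
```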
3. This next section will require you to use the following dataset: “gerber_green_larimer.Rdata”
(this is the social pressure/civic duty data from the slides, weeks 6 and 7).
(a) (2 points) Present a descriptive graph that compares voting turnout for each treatment
group.
(b) (1 point) For now, assume that variable “voted” is normally distributed. Run a one-way
ANOVA with “treatment” as the main explanatory variable. What is the null hypothesis
of this test?
(c) (1 point) What is the sampling distribution of this test? Please identify the degrees of
freedom and where the sampling distribution is centered.
(d) (1 point) What are the explained and unexplained variances in your ANOVA test?
(e) (1 point) What’s the F-statistic derived from the ANOVA test you ran? What does it tell
us from a statistical significance standpoint?
(f) (2 points) Reconsider the outcome variable (voted). Is an ANOVA appropriate with these
data?
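A one-way ANOVA splits the total sum of squares into a between-group (explained) part and a within-group (unexplained) part. Under the null that all group means are equal, $F = [SS_B/(k-1)] / [SS_W/(N-k)]$ follows an $F(k-1, N-k)$ distribution, whose mean is $(N-k)/(N-k-2)$, close to 1 in large samples. The sketch below simulates a 0/1 outcome for three hypothetical groups (not the gerber_green_larimer data; the labels and turnout rates are invented) and checks a hand computation against `aov`.

```r
# Simulated binary turnout for three hypothetical groups
set.seed(3)
group  <- factor(rep(c("control", "treat_a", "treat_b"), each = 200))
p_vote <- c(control = 0.30, treat_a = 0.32, treat_b = 0.38)
voted  <- rbinom(length(group), size = 1, prob = p_vote[as.character(group)])

fit <- aov(voted ~ group)
tab <- summary(fit)[[1]]        # rows: group, Residuals
print(tab)

# Same decomposition by hand
grand       <- mean(voted)
group_means <- tapply(voted, group, mean)
group_sizes <- tapply(voted, group, length)
ss_between  <- sum(group_sizes * (group_means - grand)^2)          # explained
ss_within   <- sum((voted - group_means[as.character(group)])^2)   # unexplained
k <- nlevels(group)
N <- length(voted)
F_hand <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_hand <- pf(F_hand, k - 1, N - k, lower.tail = FALSE)
print(c(F_hand = F_hand, p_hand = p_hand))
```

Because the simulated outcome is binary, the residuals cannot be normally distributed, which bears on part (f).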
4. This next section will require you to use the following dataset: midtermvoteloss.csv.
(a) (2 points) Present a scatter plot of Midterm Vote Loss (the dependent variable) and Change
in Income (the independent variable). Show this scatter plot with the best-fitting line.
(b) (2 points) Estimate the linear model associated with this scatter plot. Present the results
and interpret all the important information.
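A minimal sketch of parts (a) and (b), assuming simulated data: `income_change` and `vote_loss` are hypothetical names, and the actual column names and values in midtermvoteloss.csv are not assumed. It draws the scatter plot with the OLS line and checks that the slope equals cov(x, y)/var(x).

```r
# Simulated stand-in data (hypothetical; coefficients are arbitrary)
set.seed(11)
income_change <- runif(25, min = -3, max = 5)
vote_loss     <- 10 - 1.2 * income_change + rnorm(25, sd = 3)
dat <- data.frame(income_change, vote_loss)

fit <- lm(vote_loss ~ income_change, data = dat)

# Scatter plot with the fitted OLS line overlaid
plot(dat$income_change, dat$vote_loss,
     xlab = "Change in Income", ylab = "Midterm Vote Loss")
abline(fit)

print(summary(fit))   # coefficients, standard errors, t values, R^2

# OLS estimates by hand: slope = cov(x, y) / var(x)
b_hand <- cov(dat$income_change, dat$vote_loss) / var(dat$income_change)
a_hand <- mean(dat$vote_loss) - b_hand * mean(dat$income_change)
```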
(c) (2 points) Provide two point predictions with the linear model (e.g., at the minimum value
of Change in Income and at the maximum value of Change in Income).
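Point predictions are $\hat{y} = \hat{\alpha} + \hat{\beta}x$ evaluated at chosen values of x. The sketch below uses the same hypothetical simulated data as a stand-in for midtermvoteloss.csv and predicts at the minimum and maximum of `income_change` with `predict`; `interval = "prediction"` additionally returns a 95% prediction interval.

```r
# Simulated stand-in data (hypothetical)
set.seed(11)
income_change <- runif(25, min = -3, max = 5)
vote_loss     <- 10 - 1.2 * income_change + rnorm(25, sd = 3)
dat <- data.frame(income_change, vote_loss)
fit <- lm(vote_loss ~ income_change, data = dat)

# Predict at the minimum and maximum observed change in income
new_x <- data.frame(income_change = range(dat$income_change))
pred  <- predict(fit, newdata = new_x, interval = "prediction")
print(cbind(new_x, pred))

# Same point predictions by hand: alpha_hat + beta_hat * x
pred_hand <- coef(fit)[1] + coef(fit)[2] * new_x$income_change
```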
5. This next section will require you to use the following dataset: hw7.csv.
(a) (2 points) Ignoring what the variables are for a moment, estimate the following linear
model using OLS and report the results:
$y_1 = \alpha + \beta x_1 + \varepsilon$
(b) (2 points) Estimate the following linear model using OLS and report the results:
$y_2 = \alpha + \beta x_2 + \varepsilon$
(c) (2 points) Estimate the following linear model using OLS and report the results:
$y_3 = \alpha + \beta x_3 + \varepsilon$
(d) (2 points) Estimate the following linear model using OLS and report the results:
$y_4 = \alpha + \beta x_4 + \varepsilon$
(e) (3 points) Given what you have learned in class, how would you compare the linear models you just estimated?
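One way to compare the four models is to tabulate their estimates side by side and then plot each pair of variables. The sketch below assumes the data have columns x1 to x4 and y1 to y4; R's built-in `anscombe` data frame uses the same names and stands in here. Whether hw7.csv contains these exact values is an assumption.

```r
# Built-in stand-in data with columns x1..x4, y1..y4 (assumption: same layout as hw7.csv)
data(anscombe)

# Fit y_i ~ x_i for i = 1..4
fits <- lapply(1:4, function(i) {
  lm(as.formula(paste0("y", i, " ~ x", i)), data = anscombe)
})

# Intercept, slope, and R^2 for each model
summary_table <- t(sapply(fits, function(f) {
  c(intercept = unname(coef(f)[1]),
    slope     = unname(coef(f)[2]),
    r_squared = summary(f)$r.squared)
}))
rownames(summary_table) <- paste0("model ", 1:4)
print(round(summary_table, 3))

# Nearly identical summaries can hide very different data: plot each pair
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(fits[[i]])
}
par(op)
```

With the `anscombe` data, all four fits give an intercept near 3, a slope near 0.5, and R² near 0.67, yet the scatter plots differ sharply, so the regression output alone is not enough to compare the models.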
https://github.com/Tommysd123/tommy.git
The data files for this assignment are available at the link above:
gerber_green_larimer.Rdata
hw7.csv
anorexia.csv
midtermvoteloss.csv
POLS 6480, Fall 2017
Lab assistant: Philip Waggoner
Lab Assignment 08
I. Objectives: The primary objective is to test hypotheses regarding two populations. The secondary
objective is to understand research design issues involving panel studies and
paired/related samples.
II. Datasets:
“cereal.csv” and “anorexia.csv”
III. Packages: none
IV. Preparation
1) Open RStudio by double-clicking the icon or selecting RStudio from the Windows Start menu.
2) Clear any data in memory:
> rm(list=ls())
3) Download datasets “cereal.csv” and “anorexia.csv” and place them in your working directory.
4) Download R script “POLS 6480 Lab 08.R” and place it in your working directory.
5) Open the R script by typing Ctrl+O or by clicking on File in the upper-left corner, using the
dropdown menu, and navigating to the script in your working directory.
V. Instructions for Lab 08
The first dataset you will use is a sample of 24 breakfast cereals – 12 cereals intended for children
and 12 intended for adults – which you used last week. The five variables are the number of
grams per serving, the number of calories per serving, the milligrams of sodium per serving, the
grams of fiber per serving, and the grams of refined sugar per serving.
The second dataset you will use is from a well-designed experiment on treating eating disorders.
Subjects’ weights were measured before and after treatment, allowing us to calculate the change
in weight, and the study also included a control group.
A. Cereals
1. To load the first dataset, type the following lines, changing the directory if needed:
> cereal children adults m1
> t.test(treatment.f$after, control$after, alt="greater")
Notice that I shortened the word alternative to alt. Answer the following three questions:
Did the t statistic change? Did the confidence interval change? Did the p value change?
8. While it is good to know that subjects in the family treatment group had higher weight after
treatment than those in the control group, it is impossible to conclude that the treatment had a causal
effect, because it is conceivable that subjects in the treatment group already weighed more
before treatment. The research design utilized for questions 6 and 7 is called the “post-test only
with non-equivalent control groups” design. An alternative is the “pre-test and post-test with no
control group” design, which simply examines whether weight changed from before treatment to
after treatment. You can carry out this comparison using a two-sample test:
> t.test(treatment.f$after, treatment.f$before, alt="greater")
Is there a statistically significant difference between pre- and post-treatment weights?
Compare these results to a one-sample t test using R’s built-in t.test command.
> treatment.f$delta <- treatment.f$after - treatment.f$before
> t.test(treatment.f$delta, mu=0, alt="greater")
Lab written by Scott Basinger, sjbasinger@uh.edu
Is the mean difference (reported as “mean of x” after the One Sample test) equal to the difference
of means (“mean of x” minus “mean of y”, as reported after the Two Sample test)?
Is the t statistic for the One Sample test equal to the t statistic for the Two Sample test? Why not?
Data from an experiment with the same subjects having the response variable measured two times
(before and after treatment) are an example of paired data. To calculate the correct standard error
of the difference of means, you need to take into account that the values of the response variable
(in this case, the subject’s weight) are correlated! Patients who start heavier typically also finish
heavier. Find the correlation between pre- and post-treatment weights by typing:
> cor(treatment.f$after, treatment.f$before)
The paired-samples difference of means test requires adding a statement to the code:
> t.test(treatment.f$after, treatment.f$before, alt="greater",
+        paired = TRUE)
Check again: is the t statistic for the Paired Sample test equal to the t statistic for the One Sample
test that you carried out earlier?
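The comparisons in step 8 can be reproduced on simulated data to see why the t statistics differ. In this sketch (simulated weights, not anorexia.csv), the two-sample test, the one-sample test on `delta`, and the paired test share the same mean difference. Only the last two use the standard error of the differences, $\sqrt{(s_a^2 + s_b^2 - 2\,\mathrm{cov}(a,b))/n}$, which is smaller than the two-sample standard error whenever pre- and post-treatment weights are positively correlated.

```r
# Simulated before/after weights for one group (hypothetical)
set.seed(2017)
n      <- 16
before <- rnorm(n, mean = 83, sd = 5)
after  <- before + rnorm(n, mean = 5, sd = 6)   # post weight tracks pre weight
delta  <- after - before

two_sample <- t.test(after, before, alternative = "greater")   # ignores pairing (Welch)
one_sample <- t.test(delta, mu = 0, alternative = "greater")
paired     <- t.test(after, before, alternative = "greater", paired = TRUE)

# Standard errors behind the two kinds of test
se_welch  <- sqrt(var(after) / n + var(before) / n)
se_paired <- sd(delta) / sqrt(n)

print(c(r            = cor(after, before),
        t_two_sample = unname(two_sample$statistic),
        t_one_sample = unname(one_sample$statistic),
        t_paired     = unname(paired$statistic)))
```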
9. The final task will be to perform what is called a difference-in-differences test. Because we
have measures of pre- and post-treatment weight for the control group also, the most persuasive
test is to compare the changes in weights for subjects in the treatment group against the changes
in weights for subjects in the control group, which must be computed before the t test:
> control$delta <- control$after - control$before
> t.test(treatment.f$delta, control$delta, alt="greater")
Is there a statistically significant difference between the average weight change in the treatment
group (= _____ ) and the average weight change in the control group (= _____ ) ?
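The difference-in-differences estimate is $(\bar{y}^{T}_{post} - \bar{y}^{T}_{pre}) - (\bar{y}^{C}_{post} - \bar{y}^{C}_{pre})$, which equals the difference between the two groups' mean weight changes. The sketch below checks that identity on simulated treatment and control groups (hypothetical sizes and effects, not anorexia.csv) and runs the same Welch test on the changes as the lab command.

```r
# Simulated treatment and control groups (hypothetical)
set.seed(99)
treat_before <- rnorm(15, mean = 83, sd = 5)
treat_after  <- treat_before + rnorm(15, mean = 7, sd = 7)
ctrl_before  <- rnorm(20, mean = 81, sd = 5)
ctrl_after   <- ctrl_before + rnorm(20, mean = 0, sd = 8)

# Within-subject weight changes
treat_delta <- treat_after - treat_before
ctrl_delta  <- ctrl_after - ctrl_before

# DiD: difference of mean changes equals (post - pre)_treat - (post - pre)_control
did <- mean(treat_delta) - mean(ctrl_delta)
did_check <- (mean(treat_after) - mean(treat_before)) -
  (mean(ctrl_after) - mean(ctrl_before))

# Welch two-sample test on the changes (t.test default, as in the lab)
res <- t.test(treat_delta, ctrl_delta, alternative = "greater")
print(c(treat_mean_change   = mean(treat_delta),
        control_mean_change = mean(ctrl_delta),
        did = did, t = unname(res$statistic), p = res$p.value))
```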
10. On your own, repeat 6–9 using the cognitive behavioral therapy data. Answer the following:
Is the average post-treatment weight for subjects receiving cognitive behavioral therapy
higher than the average post-treatment weight for subjects in the control group? Is the difference
large enough to attain statistical significance?
Did cognitive behavioral therapy increase the average weight of subjects? Is the difference large
enough to attain statistical significance?
Is the average weight gain of subjects receiving cognitive behavioral therapy
larger than the average weight gain of subjects in the control group? Is the difference large
enough to attain statistical significance?
Next week, we will use all three groups to perform one-way ANOVA.
11. To clear the Environment, type rm(list=ls()) or click on the broom icon.
To clear the Console window, press Ctrl+L.