UCB Aggregated Data and High Dimension Anomalies Questions Response


0224trzzn

Mathematics

University of California Berkeley

Description

Please refer to the attached file.

Although the question does not explicitly ask for R code, the answer is expected to show some R code and the values produced by that code.


Unformatted Attachment Preview

1 Homework 7: Aggregated Data and High Dimension Anomalies

1.1 Purpose

Homework 7 is meant to give you some practice with WLS and to play around with high-dimensional estimation problems. Any line starting with "side note" is something for you to think about, but there is no need to answer it.

1.2 Weighted Least Squares

Let's create data at the individual level according to the linear regression model $Y = X\beta + \epsilon$ with the following specifications:

• Let $\epsilon_i \overset{ind}{\sim} \text{Normal}(0, 20)$.
• Let the sample size, $n$, be 1000.
• Let $\beta = (0, 1.2)^T$ (the first parameter is for the constant feature).
• Assign each sample to 20 exclusive groups according to the multinomial distribution $P(i \in \text{Group } k) \propto \frac{1}{\sqrt{k}}$, where $k \in \{1, \dots, 20\}$.
• Define the non-constant feature $X_i \overset{ind}{\sim} \text{Binomial}(n = 100, p = 0.5)$ for all $i$.

Create the aggregated data by simply averaging the data points within each group for the X and Y values. Call these aggregated values $\bar{X}_k$ and $\bar{Y}_k$. Let the vectorized version of the data be denoted $\bar{Y}$, and let

$$\bar{X} = \begin{pmatrix} 1 & \bar{X}_1 \\ \vdots & \vdots \\ 1 & \bar{X}_{20} \end{pmatrix}.$$

(A simulation sketch of this setup appears after Q10 below.)

1.2.1 Q0

TRUE/FALSE: the aggregated data will always have a higher correlation between X and Y than the correlation at the individual level. Note: ignore the constant feature.

1.2.2 Q1

TRUE/FALSE: $\bar{Y} = \bar{X}\beta + \gamma$, where $\gamma$ only depends on $\epsilon$.

1.2.3 Q2

Let $\gamma = \bar{Y} - \bar{X}\beta$. TRUE/FALSE: $E(\gamma \mid \bar{X}) = 0$.

1.2.4 Q3

What is the analytical expression for $\text{Var}(\gamma_k \mid X)$ and $\text{Cov}(\gamma_k, \gamma_m \mid X)$, where $k \neq m$? Please express the solution in terms of $\text{Var}(\epsilon) = \sigma^2$ and the sample sizes of the different groups. You should assume the group assignments are given.

1.2.5 Q4

TRUE/FALSE: using OLS on the aggregated data will produce unbiased estimates for $\beta$.

1.2.6 Q5

TRUE/FALSE: using OLS on the aggregated data vs. using OLS on the individual-level data will produce the exact same estimates for $\beta$.

1.2.7 Q6

If we only had access to the aggregate data, please produce the point-wise 95% confidence interval for $\beta$ if we used OLS (i.e. pretending the variances are constant) and compare that to the interval created using WLS (i.e. the correct calculation). (See the sketch after Q10.)

1.2.8 Q7

Continuing Q6, which one would you recommend using?

1.2.9 Q8

Compute the point-wise 95% confidence interval for $\beta$ using the individual-level data with OLS. Side note: you should wonder whether using the individual data is always preferable despite the calculation from Q3.

For the following problems, let's change the data generation process slightly: let $X_i \overset{ind}{\sim} \text{Binomial}(n = 100, p = \frac{k - 10}{200} + 0.5)$, i.e. group 1 is distributed according to $p = \frac{-9}{200} + 0.5$, group 2 has $p = \frac{-8}{200} + 0.5$, etc. There are still 20 groups. Side note: you can imagine the groups are different neighborhoods, X is your parents' income when you were born, and Y is the base salary of your first job (all in weird units).

1.2.10 Q9

Compare the point-wise 95% confidence interval for $\beta_1$ using OLS at the individual level vs. the method chosen in Q7 with the aggregate data. Which one would you recommend?

1.2.11 Q10

Using the individual-level data and OLS, please write the code that produces the point-wise 95% prediction interval for new Y values at each hypothetical X value, $0, 1, \dots, 100$. Please make the interval centered at the regression line. No need to report numbers; the code alone is sufficient. Again, the prediction interval is the interval that will capture 95% of the cases when predicting new data points.
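The questions above all refer to one simulated data set. The following is a minimal R sketch of the data generation and aggregation described in section 1.2; it is our illustration rather than part of the assignment, the object names (grp_probs, agg, nk) are our own, and we read "Normal(0, 20)" as variance 20.

set.seed(1)
n <- 1000
beta <- c(0, 1.2)

# Individual-level data: eps ~ Normal(0, 20), X ~ Binomial(100, 0.5)
eps <- rnorm(n, mean = 0, sd = sqrt(20))  # assumes 20 is the variance
x   <- rbinom(n, size = 100, prob = 0.5)
y   <- beta[1] + beta[2] * x + eps

# Group assignment: P(i in group k) proportional to 1/sqrt(k), k = 1, ..., 20
k         <- 1:20
grp_probs <- (1 / sqrt(k)) / sum(1 / sqrt(k))
grp       <- sample(k, size = n, replace = TRUE, prob = grp_probs)

# Aggregate by averaging X and Y within each group; keep the group sizes
agg <- data.frame(
  xbar = tapply(x, grp, mean),
  ybar = tapply(y, grp, mean),
  nk   = as.vector(table(grp))
)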
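For Q6, both intervals can come from lm() on the aggregated data, with and without weights: if, as Q3 suggests, the variance of the aggregated error for group k scales like 1/n_k, the natural WLS weights are the group sizes. A hedged sketch, continuing from the agg data frame above:

# OLS on the aggregated data: pretends Var(gamma_k) is constant across groups
fit_ols <- lm(ybar ~ xbar, data = agg)
confint(fit_ols, level = 0.95)

# WLS on the aggregated data: weight each group by its size n_k,
# which is appropriate when Var(gamma_k | X) is proportional to 1/n_k
fit_wls <- lm(ybar ~ xbar, data = agg, weights = nk)
confint(fit_wls, level = 0.95)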
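Q10 asks only for code. One standard route to point-wise 95% prediction intervals centered at the regression line is predict() with interval = "prediction"; a sketch, assuming the individual-level vectors x and y from the first code block:

# OLS fit on the individual-level data
fit_ind <- lm(y ~ x)

# Point-wise 95% prediction intervals at the hypothetical X values 0, 1, ..., 100;
# the "fit" column is the regression line, so the intervals are centered on it
new_x <- data.frame(x = 0:100)
pred  <- predict(fit_ind, newdata = new_x, interval = "prediction", level = 0.95)
head(pred)  # columns: fit, lwr, upr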
1.2.12 Q11

For this problem, assume you only have access to the aggregate data. Side note: if you were to create a prediction interval based on the aggregate data, you would need $\bar{X}_{new}$ AND its corresponding group size (notice how WLS assumes the weights are known). When you apply these intervals to individuals, this is how ecological-correlation mistakes are made. Instead of creating an interval for $\bar{Y}_{new}$, let's create an interval for $Y_{new} \mid \{X_{new}, \bar{X}\}$ by computing an interval that uses $\text{Var}(Y_{new} - X_{new}\hat{\beta}_{wls} \mid \bar{X}, X_{new})$, estimates $\hat{\sigma}^2$ under our WLS setting, and is centered at $X_{new}\hat{\beta}_{wls}$. Please create a plot that compares this interval to the interval implied by your code from Q10. Side note: you should think about what is specific about this setup that allows us to do this. Is this calculation true for all WLS settings?

1.3 NOT-James-Stein's Estimator

Let's define the MSE in estimating a high-dimensional vector $\beta$ with an estimate $\hat{\beta}$ as $E(\|\beta - \hat{\beta}\|^2)$.

1.3.1 Q12

What is the theoretical MSE if we estimated an arbitrary $\beta$ with the vector of 0's? Side note: do not overthink; this is just to show anything CAN be an estimate for anything.

1.3.2 Q13

Under the usual regression settings, create the biased estimate $\hat{\beta}_\gamma = \gamma \hat{\beta}_{OLS}$, where $\hat{\beta}_{OLS}$ is the coefficient estimate from the regression. Calculate the theoretical mean squared error for $\gamma \hat{\beta}_{OLS}$. Express the result in terms of $\gamma$, $\beta$, $X$, and $\sigma^2$, and simplify as much as possible. Side note: you should know why this isn't very useful in practice: $\beta$ and $\sigma^2$ are unknown.

1.3.3 Q14

Let $Y = X\beta + \epsilon$, where $\beta$ is the 0 vector. Let $\epsilon \sim N(0, 10)$ and $n = 1000$, and create 99 random features, all from a uniform random variable (between 0 and 1), plus 1 constant feature for $X$. Let $\hat{\beta}_{OLS}$ be the usual regression estimate. Using the result above, with your simulated X values, write the code AND report the smallest value of $\gamma$ before the MSE starts to increase again. Side note: this is intentionally similar to the simultaneous inference case.

1.3.4 Q15

To shrink a vector $Z$ to the origin (i.e. the vector of all 0's), we can multiply $Z$ by $\gamma \in [0, 1)$. However, we can also shrink $Z$ toward an arbitrary vector $\mu$ by calculating $\gamma(Z - \mu) + \mu$. Same as Q14: let $Y = X\beta + \epsilon$ where $\beta$ is the 0 vector, let $\epsilon \sim N(0, 10)$ and $n = 1000$, and create 99 random features from a uniform random variable (between 0 and 1) plus 1 constant feature for $X$. Let $\hat{\beta}_{OLS}$ be the usual regression estimate. Shrink $\hat{\beta}_{OLS}$ toward $\mu = 2$, i.e. a vector containing all 2's, with $\gamma = 0.99$. Numerically approximate the MSE over 100 simulations for the shrinkage estimator and the OLS estimator. Report which estimator you would prefer if you are optimizing the MSE for estimating $\beta$.
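For Q14 and Q15, the Monte Carlo comparison can be organized around a helper that draws one data set and returns both squared-error losses; averaging over replications approximates the two MSEs. A minimal sketch under Q15's stated setup (we read N(0, 10) as variance 10, and the helper name one_sim is our own):

set.seed(2)
n <- 1000
p <- 100                 # 1 constant feature + 99 uniform features
beta  <- rep(0, p)       # true coefficient vector is all 0's
mu    <- rep(2, p)       # shrinkage target: a vector of 2's
gamma <- 0.99

one_sim <- function() {
  X <- cbind(1, matrix(runif(n * (p - 1)), nrow = n))
  y <- X %*% beta + rnorm(n, sd = sqrt(10))  # assumes 10 is the variance
  b_ols    <- coef(lm(y ~ X - 1))            # X already has the constant column
  b_shrink <- gamma * (b_ols - mu) + mu      # shrink toward mu
  c(ols = sum((b_ols - beta)^2), shrink = sum((b_shrink - beta)^2))
}

losses <- replicate(100, one_sim())
rowMeans(losses)  # Monte Carlo approximation of the two MSEs

The same skeleton handles Q14 by shrinking toward the origin instead and sweeping gamma over a grid, watching where the Monte Carlo MSE turns back upward.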

Explanation & Answer

Attached.


Homework 7: Aggregated data and High Dimension
Anomalies

Name
Institution
Date


1.2.1
set.seed(34903490)
N = 1000
# non-constant feature as specified: X_i ~ Binomial(size = 100, p = 0.5)
x = rbinom(N, size = 100, prob = 0.5)

