Instructions:
ECON 178 WI 2023: Homework 1
Due: Feb 7, 2023 (by 2:00pm PT)
• The homework has a total of 40 points. The TAs will randomly pick one problem to grade; this problem is worth 30 points (you will receive the 30 points if your answers are correct or almost correct). The remaining 10 points will be awarded for completing the assignment.
• There will be two separate submissions: one for your R code and one for your writeup. Please submit both on Gradescope (more details for the submission of the R part are given in “Applied questions”).
• Please follow the policy stated in the syllabus about academic integrity.
• You must read, understand, agree to, and sign the integrity pledge (https://academicintegrity.ucsd.edu/take-action/promote-integrity/faculty/excel-with-integrity-pledge.pdf) before completing any assignment for ECON 178. Please include your signed pledge in the submission of your assignment on Gradescope.
Conceptual questions
The following questions are review exercises on expectations, conditional expectations, bias and variance, and basic properties of the Normal (also called Gaussian) distribution.
Question 1
Suppose that we have a model $y_i = \beta x_i + \varepsilon_i$ ($i = 1, \dots, n$), where $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i = 0$, $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i = 0$, and $\varepsilon_i$ is distributed normally with mean 0 and variance $\sigma^2$; that is, $\varepsilon_i \sim N(0, \sigma^2)$. Furthermore, $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$ are independently distributed, and the $x_i$s ($i = 1, \dots, n$) are non-random.
(a) The OLS estimator for $\beta$ minimizes the Sum of Squared Residuals:
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^n (y_i - \beta x_i)^2.$$
Take the first-order condition to show that
$$\hat{\beta} = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}.$$
(b) Assume $E[\varepsilon_i \mid \beta] = 0$ for all $i = 1, \dots, n$. Show that
$$\hat{\beta} = \beta + \frac{\sum_{i=1}^n x_i \varepsilon_i}{\sum_{i=1}^n x_i^2}.$$
What is $E[\hat{\beta} \mid \beta]$ and $\mathrm{Var}(\hat{\beta} \mid \beta)$? Use this to show that, conditional on $\beta$, $\hat{\beta}$ has the following distribution:
$$\hat{\beta} \mid \beta \sim N\!\left(\beta, \frac{\sigma^2}{\sum_{i=1}^n x_i^2}\right).$$
(c) Suppose we believe that $\beta$ is distributed normally with mean 0 and variance $\frac{\sigma^2}{\lambda}$; that is, $\beta \sim N\!\left(0, \frac{\sigma^2}{\lambda}\right)$. Additionally, assume that $\beta$ is independent of $\varepsilon_i$ for all $i = 1, \dots, n$. Compute the mean and variance of $\hat{\beta}$. That is, what is $E[\hat{\beta}]$ and $\mathrm{Var}(\hat{\beta})$?
(Hint you might find useful: $E[w_1] = E\big[E[w_1 \mid w_2]\big]$ and $\mathrm{Var}(w_1) = E\big[\mathrm{Var}(w_1 \mid w_2)\big] + \mathrm{Var}\big(E[w_1 \mid w_2]\big)$ for any random variables $w_1$ and $w_2$.)
(d) Since everything is normally distributed, it turns out that
$$E[\beta \mid \hat{\beta}] = E[\beta] + \frac{\mathrm{Cov}(\beta, \hat{\beta})}{\mathrm{Var}(\hat{\beta})} \cdot \big(\hat{\beta} - E[\hat{\beta}]\big).$$
Let $\hat{\beta}^{RR} = E[\beta \mid \hat{\beta}]$. Compute $\mathrm{Cov}(\beta, \hat{\beta})$ and use the value of $E[\beta]$, along with the values of $E[\hat{\beta}]$, $\mathrm{Cov}(\beta, \hat{\beta})$, and $\mathrm{Var}(\hat{\beta})$ you have computed, to show that
$$\hat{\beta}^{RR} = E[\beta \mid \hat{\beta}] = \frac{\sum_{i=1}^n x_i^2}{\sum_{i=1}^n x_i^2 + \lambda} \cdot \hat{\beta}.$$
(Hint: $\mathrm{Cov}(w_1, w_2) = E\big[(w_1 - E[w_1])(w_2 - E[w_2])\big]$ and $E[w_1 w_2] = E\big[w_1 E[w_2 \mid w_1]\big]$ for any random variables $w_1$ and $w_2$.)
(e) Does $\hat{\beta}^{RR}$ increase or decrease as $\lambda$ increases? How does this relate to $\beta$ being distributed $N\!\left(0, \frac{\sigma^2}{\lambda}\right)$?
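If you would like to sanity-check your derivations for parts (a)-(d), one option (not required for the homework) is a quick simulation in R. The sketch below uses illustrative values for $n$, $\sigma^2$, and $\lambda$ that are not part of the assignment; it draws $\beta$ from the prior in part (c), computes the OLS estimator from part (a), and applies the shrinkage formula from part (d).

# Optional numerical sanity check for Question 1 (illustrative values only; not required)
set.seed(178)
n      <- 100
sigma2 <- 4                                                 # error variance (assumed value)
lambda <- 10                                                # shrinkage parameter (assumed value)
x      <- rnorm(n)
x      <- x - mean(x)                                       # make the x's mean zero, as in the setup
beta   <- rnorm(1, mean = 0, sd = sqrt(sigma2 / lambda))    # draw beta from the prior in (c)
eps    <- rnorm(n, mean = 0, sd = sqrt(sigma2))
y      <- beta * x + eps
beta_ols <- sum(x * y) / sum(x^2)                           # OLS estimator from (a)
beta_rr  <- sum(x^2) / (sum(x^2) + lambda) * beta_ols       # shrinkage formula from (d)
c(beta = beta, beta_ols = beta_ols, beta_rr = beta_rr)      # compare the three values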
Question 2
Let us consider the linear regression model $y_i = \beta_0 + \beta_1 x_i + u_i$ ($i = 1, \dots, n$), which satisfies Assumptions MLR.1 through MLR.5 (see Slide 7 in “Linear_regression_review” under “Modules” on Canvas)¹. The $x_i$s ($i = 1, \dots, n$) and $\beta_0$ and $\beta_1$ are nonrandom. The randomness comes from the $u_i$s ($i = 1, \dots, n$), where $\mathrm{Var}(u_i) = \sigma^2$. Let $\hat{\beta}_0$ and $\hat{\beta}_1$ be the usual OLS estimators (which are unbiased for $\beta_0$ and $\beta_1$, respectively), obtained from running a regression of $(y_1, y_2, \dots, y_{n-1}, y_n)'$ on $(1, 1, \dots, 1, 1)'$ (the intercept column) and $(x_1, x_2, \dots, x_{n-1}, x_n)'$. Suppose you also run a regression of $(y_1, y_2, \dots, y_{n-1}, y_n)'$ on $(x_1, x_2, \dots, x_{n-1}, x_n)'$ (excluding the intercept column) to obtain another estimator $\tilde{\beta}_1$ of $\beta_1$. Solving this question does not require knowledge of matrix operations.

¹The model is a simple special case of the general multiple regression model in “Linear_regression_review”.
a) Give the expression of $\tilde{\beta}_1$ as a function of the $y_i$s and $x_i$s ($i = 1, \dots, n$).
b) Derive $E[\tilde{\beta}_1]$ in terms of $\beta_0$, $\beta_1$, and the $x_i$s. Show that $\tilde{\beta}_1$ is unbiased for $\beta_1$ when $\beta_0 = 0$. If $\beta_0 \neq 0$, when will $\tilde{\beta}_1$ be unbiased for $\beta_1$?
c) Derive $\mathrm{Var}(\tilde{\beta}_1)$, the variance of $\tilde{\beta}_1$, in terms of $\sigma^2$ and the $x_i$s ($i = 1, \dots, n$).
d) Show that $\mathrm{Var}(\tilde{\beta}_1)$ is no greater than $\mathrm{Var}(\hat{\beta}_1)$; that is, $\mathrm{Var}(\tilde{\beta}_1) \le \mathrm{Var}(\hat{\beta}_1)$. When do you have $\mathrm{Var}(\tilde{\beta}_1) = \mathrm{Var}(\hat{\beta}_1)$? (Hint you might find useful: use $\sum_{i=1}^n x_i^2 \ge \sum_{i=1}^n (x_i - \bar{x})^2$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$.)
e) Choosing between $\hat{\beta}_1$ and $\tilde{\beta}_1$ leads to a tradeoff between bias and variance. Comment on this tradeoff.
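Although parts a)-e) are analytical, you can check your answers numerically in R: lm(y ~ x) fits the regression with an intercept and gives $\hat{\beta}_1$, while lm(y ~ 0 + x) excludes the intercept column and gives $\tilde{\beta}_1$. The sketch below uses hypothetical simulated data with illustrative parameter values that are not part of the assignment.

# Optional numerical check for Question 2 (simulated data with illustrative values; not required)
set.seed(178)
n      <- 200
beta0  <- 1                              # try beta0 <- 0 as well
beta1  <- 2
x      <- rnorm(n, mean = 3, sd = 1)
u      <- rnorm(n, mean = 0, sd = 1)
y      <- beta0 + beta1 * x + u
fit_with_intercept    <- lm(y ~ x)       # usual OLS regression with an intercept
fit_without_intercept <- lm(y ~ 0 + x)   # regression excluding the intercept column
coef(fit_with_intercept)["x"]            # beta1-hat
coef(fit_without_intercept)["x"]         # beta1-tilde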
Question 3
Let $\hat{v}$ be an estimator of the truth $v$. Show that $E\big[(\hat{v} - v)^2\big] = \mathrm{Var}(\hat{v}) + [\mathrm{Bias}(\hat{v})]^2$, where $\mathrm{Bias}(\hat{v}) = E[\hat{v}] - v$. (Hint: The randomness comes from $\hat{v}$ only, and $v$ is nonrandom.)
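A Monte Carlo experiment in R can illustrate (though not prove) this identity. The sketch below uses an arbitrary biased estimator and illustrative values that are not part of the assignment.

# Optional Monte Carlo illustration of E[(v_hat - v)^2] = Var(v_hat) + Bias(v_hat)^2
set.seed(178)
v     <- 5                                       # nonrandom truth (assumed value)
R     <- 100000                                  # number of Monte Carlo draws
v_hat <- 0.8 * v + rnorm(R, mean = 0, sd = 1)    # a deliberately biased, noisy estimator
mse            <- mean((v_hat - v)^2)
var_plus_bias2 <- var(v_hat) + (mean(v_hat) - v)^2
c(mse = mse, var_plus_bias2 = var_plus_bias2)    # the two numbers should be close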
Applied questions (with the use of R)
For this question, you will need to write and run R code.
Installation
• To install R, please see https://www.r-project.org/.
• Once you have installed R, please also install RStudio: https://rstudio.com/products/rstudio/download/.
• You will need to use RStudio to solve the problem set.
• Please download data_ps1.csv and template_ps1.R from Canvas ⇒ Assignments.
Submission
• Open the template_ps1.R file that we provided on Canvas ⇒ Assignments.
• All your solutions and code need to be saved in a single file named template_ps1_YOURFIRSTANDLASTNAME.R. Please use the template_ps1.R provided on Canvas to structure your answers.
• Any file that is not a .R file will not be accepted, and the grade for this exercise will be zero.
• Please submit your code on Gradescope.
• Please follow the policy stated in the syllabus about academic integrity.
Useful readings
In addition to the lectures provided by the instructor and the TAs, you might find the following readings useful:
• Chapters 2.3 and 3.6 in the textbook “An Introduction to Statistical Learning with Applications in R”.
Question 4
This exercise helps you get familiar with basic commands in R by working with the Forest Fires data set (Dua, D. and Graff, C., 2019, UCI Machine Learning Repository;2 see: https://archive.ics.uci.edu/ml/datasets/Forest+Fires). This data set is available on Canvas ⇒ Assignments.
1. Download the dataset from Canvas and open it using the command “read.csv”.
2. Open the data and report how many columns and rows the dataset has;
3. See the names of the variables (see online the command “names”);
4. Run a linear regression with “area” as a function of “temp” using the command “lm”;
5. Report the summary of your results (see online the command “summary”);
7. Make a scatter plot of the regression (Hint: use abline() to draw the regression line);
8. Write down the interpretation of the coefficients as a comment in your .R script (Hint: see template file).
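As a rough guide, the following sketch shows how the commands mentioned in the steps above fit together; it is not a substitute for the template. It assumes data_ps1.csv is in your working directory and contains the variables area and temp (adjust the path or variable names if yours differ).

# Rough sketch of the Question 4 workflow (assumes data_ps1.csv is in the working directory)
fires <- read.csv("data_ps1.csv")        # load the data
dim(fires)                               # number of rows and columns
names(fires)                             # variable names
fit <- lm(area ~ temp, data = fires)     # regress area on temp
summary(fit)                             # regression summary
plot(fires$temp, fires$area,             # scatter plot of area against temp
     xlab = "temp", ylab = "area")
abline(fit)                              # add the fitted regression line
# Remember to write the interpretation of the coefficients as comments in your .R script.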
Please write all your answers and code in the template_ps1.R file and submit that file on Gradescope as described in the “Submission” section.
2Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.