EC338: Assignment 1
General Information
The assignment has 5 sections. Each section carries a different weight, totalling 100.
Do your best to answer all questions in each section.
Submit two files: (1) a pdf containing your answers; (2) a do-file (or R-script) containing your code. I am happy for R users to include a single R-markdown file that contains both answers and code.
Mathematical proofs may be hand-written, but must be legible. If you write the proof by hand, take a photo and paste it into the document with your solutions. Of course, typed proofs are more likely to be legible.
When asked to produce a figure or table of results, please include the output in the same document as your answers.
For parts C, D, and E, I strongly recommend using either STATA or R.
DUE: Thursday, 17 November 2022, 2pm
Section A (15 marks) – Proof of Variance
In the lectures we discussed how the simple difference-in-means estimator ($\hat{\tau}$) is an unbiased estimator of $\tau_{fs}$ and $\tau_{ATE}$ if treatment is assigned according to a completely randomized experiment. In addition, we demonstrated that the OLS estimator from a simple linear regression (of the outcome on treatment status) gives the same estimator and is therefore also unbiased,

$$\hat{\beta}^{OLS} = \bar{Y}^{obs}_t - \bar{Y}^{obs}_c = \hat{\tau}$$
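Assuming homogeneous treatment effects, we said that

$$\hat{V}^{const} = s^2 \cdot \left( \frac{1}{N_c} + \frac{1}{N_t} \right)$$

was the most precise estimate of $Var(\hat{\tau})$. In addition, we know that under homoskedasticity the estimator for $Var(\hat{\beta}^{OLS})$ is given by

$$\hat{V}^{homosk} = \frac{\frac{1}{N-2}\sum_{i=1}^{N} \hat{\varepsilon}_i^2}{\sum_{i=1}^{N} (W_i - \bar{W})^2}$$

QUESTIONS:
1. Show that
$$\hat{V}^{const} = \hat{V}^{homosk}$$
2. Next, show that
$$\hat{V}^{neyman} \approxeq \hat{V}^{hetero}$$
That is, show that
$$\frac{\sum_{i=1}^{N} \hat{\varepsilon}_i^2 (W_i - \bar{W})^2}{\left(\sum_{i=1}^{N} (W_i - \bar{W})^2\right)^2} = \frac{\tilde{s}_c^2}{N_c} + \frac{\tilde{s}_t^2}{N_t}$$
where
$$\tilde{s}_c^2 = \frac{1}{N_c}\sum_{i:W_i=0} \left(Y_i^{obs} - \bar{Y}_c^{obs}\right)^2 \quad \text{and} \quad \tilde{s}_t^2 = \frac{1}{N_t}\sum_{i:W_i=1} \left(Y_i^{obs} - \bar{Y}_t^{obs}\right)^2$$
3. Discuss formally the relationship between homoskedasticity (heteroskedasticity) and homogeneous (heterogeneous) treatment effects.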
Note: For part 1, you should use the definition of $s^2$ in the lecture notes. For part 3, a formal discussion should refer to the definition of the error term in the linear regression model, with and without heterogeneous treatment effects.
Hint: First show the statement from the lecture notes,
$$\sum_{i=1}^{N} (W_i - \bar{W})^2 = N \bar{W} (1 - \bar{W})$$
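As a numerical sanity check (not a proof), the sketch below simulates one hypothetical sample and compares both sides of the hint identity and of the equalities in parts 1 and 2. It assumes that $s^2$ in the lecture notes is the pooled within-arm variance with an $N-2$ denominator; the data generating process and the function name `section_a_check` are hypothetical and not part of the assignment.

```python
import random

def section_a_check(n=200, p=0.4, seed=1):
    """Compare both sides of the Section A identities on one simulated sample.

    Hypothetical DGP: W_i ~ Bernoulli(p), Y_i = 1 + 0.5 * W_i + noise whose
    standard deviation differs by arm (so errors are heteroskedastic).
    """
    rng = random.Random(seed)
    w = [1 if rng.random() < p else 0 for _ in range(n)]
    w[0], w[1] = 1, 0                      # guarantee both arms are non-empty
    y = [1 + 0.5 * wi + rng.gauss(0, 1 + wi) for wi in w]

    n_t = sum(w)
    n_c = n - n_t
    w_bar = n_t / n
    y_bar_t = sum(yi for yi, wi in zip(y, w) if wi == 1) / n_t
    y_bar_c = sum(yi for yi, wi in zip(y, w) if wi == 0) / n_c

    # OLS of Y on (1, W): fitted values are the arm means, so the residuals
    # are deviations from the arm mean.
    e = [yi - (y_bar_t if wi == 1 else y_bar_c) for yi, wi in zip(y, w)]
    sxx = sum((wi - w_bar) ** 2 for wi in w)

    s2 = sum(ei ** 2 for ei in e) / (n - 2)       # pooled s^2, N - 2 denominator
    v_const = s2 * (1 / n_c + 1 / n_t)
    v_homosk = s2 / sxx
    v_hetero = sum(ei ** 2 * (wi - w_bar) ** 2 for ei, wi in zip(e, w)) / sxx ** 2
    s2_t = sum(ei ** 2 for ei, wi in zip(e, w) if wi == 1) / n_t
    s2_c = sum(ei ** 2 for ei, wi in zip(e, w) if wi == 0) / n_c

    return {
        "hint": (sxx, n * w_bar * (1 - w_bar)),
        "part1": (v_const, v_homosk),
        "part2": (v_hetero, s2_c / n_c + s2_t / n_t),
    }

checks = section_a_check()
for name, (lhs, rhs) in checks.items():
    print(f"{name}: {lhs:.10f} vs {rhs:.10f}")
```

Agreement to floating-point precision suggests the algebra is right; the written proof still has to show why.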
Section B (10 marks) – Power Calculation
The following quote comes from the NBER paper by Gennetian et al. (2022, p.8) included with the assignment.
“Striking a balance between statistical power and project costs, 40% of the recruited sample within each site was randomized to receive $333 monthly cash gifts and 60% to receive $20 monthly gifts. With an enrolled sample of n=1,000 mother-infant dyads, and accounting for a predicted 20% attrition over longer-term follow-ups, the anticipated sample size of 800 dyads during subsequent waves of data collection is estimated to provide 80% power to detect a .207 standard deviation impact at p<.05 in a two-tailed test on cognitive functioning and family process outcomes.”
1. Demonstrate how the authors arrived at this power calculation: “80% power to detect a .207 standard deviation impact at p<.05”. That is, demonstrate that the power of the relevant test is 80% under the alternative hypothesis that the treatment effect is 0.207 of a standard deviation.
Note: Please treat this question like a proof where you show your working. You should be able to show that power is ≊82% using either a standard normal or t-distribution approximation of the test-statistic distribution. This can be done without any additional information from the paper. Demonstrating this is sufficient for full marks. You need to assume that attrition is independent of treatment status. The relevant null hypothesis is one of no effect; i.e. H0 : τ = 0.
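A minimal sketch of the normal-approximation power formula, offered as a way to check the arithmetic in a written answer. It assumes the outcome is standardised (variance 1, equal in both arms) and that attrition is independent of treatment, so the 800 retained dyads keep the 40/60 split (320 treated, 480 control). The function name `power_two_sample` is hypothetical.

```python
from statistics import NormalDist

def power_two_sample(effect_sd, n, share_treated, alpha=0.05):
    """Approximate power of a two-sided z-test of H0: tau = 0.

    effect_sd is the effect size in outcome standard deviations; the outcome
    SD is normalised to 1 and assumed equal in both arms.
    """
    n_t = share_treated * n
    n_c = (1 - share_treated) * n
    se = (1 / n_t + 1 / n_c) ** 0.5           # SE of the difference in means (SD units)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z = effect_sd / se
    # probability of rejecting in either tail when the true effect is effect_sd
    return NormalDist().cdf(z - z_crit) + NormalDist().cdf(-z - z_crit)

power = power_two_sample(0.207, 800, 0.4)
print(round(power, 3))
```

Under these assumptions the standard error is about 0.072, so the test statistic is centred near 2.87, and power comes out at roughly 82%, in line with the note above. A t-distribution version would differ only slightly at this sample size.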
Section C (25 marks) - Simulation Exercise
We have discussed various approaches to estimating treatment effects under the Conditional Independence Assumption using observational data. In this section you will design and execute a simulation that will allow you to compare the mean and variance of five potential estimators.
You should be able to borrow from Seminar 1’s STATA do-file (or R-script). However, the set-up has been adapted in a number of ways. Please follow the steps carefully and report back any ambiguities.
Set the number of observations to 1000, as observational datasets tend to be larger.
Generate the data according to the data generating process,
$$Y_i(0) = \gamma_0 + \gamma_1 age_i + \gamma_2 female_i + \gamma_3 age_i \cdot female_i + \varepsilon_i$$
where $(\gamma_0, \gamma_1, \gamma_2, \gamma_3) = (1.2, 0.015, -0.02, -0.01)$. To do so:
– Generate $\varepsilon_i \sim N(0, 0.55^2)$.
– Generate $age_i$ as a random integer that is uniformly distributed on $[20, 65]$.
– Generate $female_i$ as a dummy variable, where (female) labour force participation declines rapidly during the years of childbirth,
$$female_i \sim B(1, \rho(age_i)) \quad \text{where} \quad \rho(age_i) = 0.5 - 0.25 \cdot \frac{\ln(age_i - 19)}{\ln(46)}$$
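Allow for heterogeneous treatment effects by age,
$$\tau_i(age_i) \sim N(\mu(age_i), 0.01) \quad \text{where} \quad \mu(age_i) = 0.02 + 0.06 \cdot 1\{age_i > 43\}$$
Since age is uniformly distributed, $\tau_{ATE} \approxeq 0.05$: 22 of the 46 possible ages exceed 43, so $\tau_{ATE} \approx 0.02 + 0.06 \cdot 22/46 \approx 0.049$.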
Simulate unconfoundedness conditional on age alone. Assign treatment status in such a way that the average level of treatment increases with age, but is independent of $Y_i(0)$ conditional on age. Use the binomial distribution, where the probability of success depends on age in the following way,
$$W_i \sim B(1, \rho(age_i)) \quad \text{where} \quad \rho(age_i) = 0.25 + 0.5 \cdot \frac{\ln(age_i - 19)}{\ln(46)}$$
The probability of treatment should lie in $[0.25, 0.75]$. (Treatment status is denoted $D_i$ in the models below.)
Estimate the following five models 1,000 times. Report the mean and standard deviation of the simulated samples of βˆ1. Provide a plot of the kernel density distribution of each estimator.
– Model 1: Estimate a linear regression model without covariates,
$$Y_i^{obs} = \beta_{01} + \beta_{11} D_i + \upsilon_i$$
– Model 2: Estimate a linear regression model that matches the CEF of $Y_i^{obs}$,
$$Y_i^{obs} = \beta_{02} + \beta_{12} D_i + \gamma_{12} age_i + \gamma_{22} female_i + \gamma_{32} age_i \cdot female_i + \varepsilon_i$$
– Model 3: Estimate a saturated linear regression model,
$$Y_i^{obs} = \beta_{13} D_i + \sum_{j=20}^{65} \gamma_{j3} 1\{age_i = j\} + \varepsilon_i$$
– Model 4: Estimate Model 1, applying inverse probability weights,
$$Y_i^{obs} = \beta_{04} + \beta_{14} D_i + \upsilon_i$$
where each observation is weighted by the inverse of the estimated probability of its observed treatment status,
$$\left[\hat{e}(X_i)^{D_i} \cdot (1 - \hat{e}(X_i))^{1-D_i}\right]^{-1}$$
with the propensity score derived from a logit model,
$$e(X_i) = Pr(D_i = 1 | age_i) = \Lambda(\psi_{04} + \psi_{14} age_i)$$
– Model 5: Repeat Model 4, but use a saturated logit model,
$$e(X_i) = Pr(D_i = 1 | age_i) = \Lambda\left(\sum_{j=20}^{65} \psi_{j5} 1\{age_i = j\}\right)$$

DISCUSSION:
1. In addition to reporting on the distributions of these five estimators, provide a discussion of the simulation results. In particular, discuss how important model specification appears to be relative to omitted variable bias (or selection on unobservables).
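The sketch below is one possible starting point for the simulation, written in Python rather than STATA/R and covering only Models 1 and 5. Several choices are assumptions, not instructions from the assignment: the 0.01 in $\tau_i \sim N(\mu, 0.01)$ is read as a variance (SD 0.1, matching how $0.55^2$ is written for $\varepsilon_i$); $Y_i^{obs} = Y_i(0) + D_i \tau_i$; and age cells in which everyone (or no one) is treated are dropped, since the saturated logit is not identified there. Model 5 exploits the fact that the MLE of a logit with one dummy per age is the treated share in each age cell. All function names are hypothetical.

```python
import math
import random
from statistics import mean, stdev

GAMMA = (1.2, 0.015, -0.02, -0.01)      # (gamma_0, gamma_1, gamma_2, gamma_3)
LN46 = math.log(46)

def draw_sample(rng, n=1000):
    """One simulated dataset as a list of (age, female, D, Y_obs) tuples."""
    data = []
    for _ in range(n):
        age = rng.randint(20, 65)                      # uniform integer on [20, 65]
        s = math.log(age - 19) / LN46                  # 0 at age 20, 1 at age 65
        female = 1 if rng.random() < 0.5 - 0.25 * s else 0
        d = 1 if rng.random() < 0.25 + 0.5 * s else 0  # depends on age only
        y0 = (GAMMA[0] + GAMMA[1] * age + GAMMA[2] * female
              + GAMMA[3] * age * female + rng.gauss(0, 0.55))
        mu = 0.02 + (0.06 if age > 43 else 0.0)
        tau = rng.gauss(mu, 0.1)                       # assumes 0.01 is a variance
        data.append((age, female, d, y0 + d * tau))    # Y_obs = Y(0) + D * tau
    return data

def model1_diff_in_means(data):
    """Model 1: OLS of Y_obs on a constant and D is a difference in means."""
    treated = [y for _, _, d, y in data if d == 1]
    control = [y for _, _, d, y in data if d == 0]
    return mean(treated) - mean(control)

def model5_ipw_saturated(data):
    """Model 5 sketch: IPW regression of Y_obs on D, saturated propensity score."""
    counts = {}
    for age, _, d, _ in data:
        n_j, t_j = counts.get(age, (0, 0))
        counts[age] = (n_j + 1, t_j + d)
    sum_wy = {0: 0.0, 1: 0.0}
    sum_w = {0: 0.0, 1: 0.0}
    for age, _, d, y in data:
        n_j, t_j = counts[age]
        if t_j == 0 or t_j == n_j:                     # no overlap in this cell: drop
            continue
        e_hat = t_j / n_j                              # saturated-logit MLE
        weight = 1 / e_hat if d == 1 else 1 / (1 - e_hat)
        sum_wy[d] += weight * y
        sum_w[d] += weight
    # WLS on a constant and a dummy = difference in weighted group means
    return sum_wy[1] / sum_w[1] - sum_wy[0] / sum_w[0]

def simulate(reps=1000, n=1000, seed=338):
    """Return {model: (mean, sd)} of the estimates across replications."""
    rng = random.Random(seed)
    draws = {"model1": [], "model5": []}
    for _ in range(reps):
        data = draw_sample(rng, n)
        draws["model1"].append(model1_diff_in_means(data))
        draws["model5"].append(model5_ipw_saturated(data))
    return {name: (mean(b), stdev(b)) for name, b in draws.items()}

summary = simulate(reps=100)    # the assignment asks for 1,000 replications
for name, (m, sd) in summary.items():
    print(f"{name}: mean = {m:.4f}, sd = {sd:.4f}")
```

Because treated units are older on average and $E[Y_i(0) | age_i]$ rises with age, one would expect Model 1 to be biased upwards while Model 5 centres near $\tau_{ATE}$; Models 2 to 4 can be added in the same loop.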
Section D (40 marks) – Differences-in-differences
In this section you will evaluate a 2018 change in the Ontario minimum wage using a difference-in-differences model. Figure 1 plots the nominal minimum wage for Ontario alongside its two neighbouring provinces, Manitoba and Quebec. Canadian provincial minimum wages tend to be pegged to inflation, with annual increases at regular intervals.¹
In January 2018 the provincial government enacted a non-standard increase in the minimum wage from $11.60 to $14, just 3 months after its annual inflation adjustment in October 2017.² This is an increase of $2.40, or 20%, during a period of low inflation.³
Figure 1: Provincial Nominal Minimum Wage (Canadian $’s)
For this question I have provided you with a 20% sample of the publicly available Canadian Labour Force Survey from 2010 to 2019. The survey is a repeated cross-section and contains a select set of variables. The sample includes all working-aged adults. It is a large file and contains some variables which are not observed in all years.⁴ As with all survey data, there may be missing data for certain variables.
¹ You can examine the minimum wages in other provinces around this time using this link.
² The policy appears to have been announced in June 2017 (see link). One might interpret the change in the minimum wage as an attempt by the incumbent Liberal government of Ontario to appease voters, given that provincial elections were to take place in June 2018. The Liberals lost the election to the Progressive Conservative Party, which is possibly why the nominal minimum wage remained fixed for almost three years after this unprecedented increase.
³ Ontario has a different minimum wage for those under 18 and for liquor servers. These were also increased, by $2.25 and $2.10 respectively.
⁴ Beware: this may affect your sample if such variables are included in your estimating equation.
QUESTIONS:
1. Discuss why this policy change might make for a good or bad ‘natural’ experiment.
2. Create two time-series graphs showing the employment rate (proportion of the population employed) of individuals aged 15-24 for the period 2010 to 2019 in Ontario, Manitoba, and Quebec. The first should depict the annual employment rate, while the second should have an x-axis denominated in calendar months, not years. Comment on the trend and seasonality of youth employment. Does it ‘look’ like the parallel trends assumption holds between Ontario and its neighbouring provinces?
3. Estimate a 2-period-2-group difference-in-differences model using the 2017 and 2018 data.
$$Y_{itc} = \alpha + \psi D_c + \delta T_t + \beta D_c \cdot T_t + \varepsilon_{itc} \qquad (1)$$
where the outcome $Y_{itc}$ is a dummy variable indicating that individual $i$ is employed. In this application, the assignment group (denoted by $c$) is the individual’s province of residence: $D_c = 1\{Ontario\}$. You need to justify your choice of control group and may choose a province other than Manitoba or Quebec. You may also use more than one province as a control group, provided there is reason to do so.
In a second specification, include month fixed effects ($\lambda_m$) to account for seasonality.
$$Y_{itc} = \alpha + \psi D_c + \delta T_t + \beta D_c \cdot T_t + \lambda_m + \nu_{itc} \qquad (2)$$
In a third, include a set of good covariates of your own choosing. Justify your choice.
$$Y_{itc} = \alpha + \psi D_c + \delta T_t + \beta D_c \cdot T_t + \lambda_m + X'_{itc}\gamma + \upsilon_{itc} \qquad (3)$$
Present the results in a single table and comment on any important differences across the specifications.
4. Estimate a dynamic difference-in-differences specification that will allow you to test the parallel trends assumption in the pre-treatment period. You should include data from 2014 to 2019 in the model and normalize the results relative to 2017, the year before treatment. Try to present the estimates of $\hat{\beta}_j$ in a graph, along with 95% confidence intervals.
$$Y_{itc} = \alpha + \psi D_c + \delta_t + \sum_{j \neq 2017} \beta_j 1\{t = j\} \cdot D_c + \lambda_m + \varepsilon_{itc} \qquad (4)$$
Comment on whether you find support for the parallel trends assumption. In addition, comment on the dynamics of the treatment effect in the post-treatment period. Are the results consistent with your expectations of the policy impact?
5. As we have monthly data, we could estimate a dynamic model with monthly treatment effects. What additional assumptions would we need to make with respect to the CEF of $Y_i(0)$?
6. The policy was announced in June 2017. As such, we may expect pre-emptive adjustments to the demand for labour. Propose and execute a way of checking for pre-emptive behaviour using this sample.
7. The estimated treatment effect may be explained by differential trends in the aggregate labour markets of Ontario. Demonstrate how we might use the remaining sample to test this hypothesis.
8. Are you convinced by your estimates? By and large, the literature on minimum wages finds that a reasonable increase in the minimum wage has no effect on employment levels (see the review by Manning, 2021).
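For question 3, it can help to remember that model (1) is saturated in the four group-by-period cells, so its OLS (or, with survey weights, WLS) estimate of $\beta$ equals the double difference of the four cell means. The Python sketch below computes that double difference from individual records as a cross-check on a regression. The record layout `(province, year, employed, weight)`, the province labels, and the function name `did_2x2` are hypothetical; in the LFS extract, `employed` would be built as in the STATA code in Section E (`empstat<=2`), and pooling all non-Ontario provinces into one control group is an assumption you would need to justify.

```python
from collections import defaultdict

def did_2x2(records, treated_province, pre_year=2017, post_year=2018):
    """Weighted 2x2 difference-in-differences from individual records.

    records: iterable of (province, year, employed, weight) tuples, with
    employed coded 0/1. Every province other than treated_province is pooled
    into the control group.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for province, year, employed, weight in records:
        if year not in (pre_year, post_year):
            continue
        cell = (province == treated_province, year == post_year)
        num[cell] += weight * employed
        den[cell] += weight
    rate = {cell: num[cell] / den[cell] for cell in den}   # weighted employment rates
    change_treated = rate[(True, True)] - rate[(True, False)]
    change_control = rate[(False, True)] - rate[(False, False)]
    return change_treated - change_control

def toy_cell(province, year, n_employed, n_total):
    """Hypothetical equally weighted records for one province-year cell."""
    return [(province, year, 1 if i < n_employed else 0, 1.0) for i in range(n_total)]

records = (toy_cell("ON", 2017, 60, 100) + toy_cell("ON", 2018, 55, 100)
           + toy_cell("MB", 2017, 50, 100) + toy_cell("MB", 2018, 52, 100))
beta_hat = did_2x2(records, treated_province="ON")
print(round(beta_hat, 3))   # (0.55 - 0.60) - (0.52 - 0.50) = -0.07
```

Specifications (2) and (3) add month effects and covariates, so they are no longer saturated and need a regression; the double difference is then only a benchmark.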
Section E (10 marks) – Synthetic Control
In this section you will apply a synthetic control estimator to examine the impact of the above minimum wage policy. I have provided the code needed to execute this estimator in STATA. R users should be able to use the very similar Synth package developed by Abadie, Diamond, and Hainmueller (see link), who also developed the corresponding STATA package.
ssc install synth
** Look at the syntax
help synth
use "lfs_2010_2019_ages1564_20per.dta", clear
keep if agegrp<=2
gen employed = empstat<=2 if empstat!=.
gen male = sex==1 if sex!=.
tab edugrp if edugrp!=., gen(edu)
recode efamtype (1 = 1) (2/4 = 2) (14 16 17 = 3) (5/10 = 4) (11/13 15 = 5) (18 = 6), gen(fam)
tab efamtype fam, m
tab fam if fam!=., gen(fam)
collapse (mean) employed male edu1-edu3 fam1-fam6 [w=wgt], by(province year)
tsset province year
** 1. Match over the outcome in the pre-period
synth employed , tru(6) trp(2018) fig
** 2. Match over the outcome in the periods 2014-2016
synth employed employed, tru(6) trp(2018) fig mspeperiod(2014(1)2016)
** 3. Match over covariates in the pre-period
synth employed male edu2 edu3 fam2 fam3 fam4 fam5 fam6, tru(6) trp(2018) fig
** 4. Match over covariates in the periods 2014-2016
synth employed male edu2 edu3 fam2 fam3 fam4 fam5 fam6, tru(6) trp(2018) fig mspeperiod(2014(1)2016)
DISCUSSION:
1. Comment on the appropriateness of the weights assigned to each of the control provinces in each instance. As a researcher, discuss whether you think it is more appropriate to choose a control group (as in Section D) or construct one using a method such as synthetic control.
Note: There is no one right answer to this question.
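To build intuition for the weights reported by `synth`, the Python sketch below solves a simplified version of the problem: choose non-negative donor weights that sum to one and minimise the squared gap between the treated unit's pre-treatment outcome path and the weighted donor average. It uses Frank-Wolfe with an exact line search. This is an assumption-laden simplification, not what the STATA/R packages do: to my understanding they also choose a predictor-importance matrix (V) through a nested optimisation, whereas here every pre-treatment year gets equal importance and only outcomes are matched. The function name `synth_weights` and the toy donor series are hypothetical.

```python
def synth_weights(treated, donors, max_iter=1000, tol=1e-12):
    """Simplified synthetic-control weights via Frank-Wolfe on the simplex.

    Minimises sum_t (treated[t] - sum_j w[j] * donors[j][t])**2
    subject to w[j] >= 0 and sum(w) == 1.
    """
    n_donors, n_periods = len(donors), len(treated)
    w = [1.0 / n_donors] * n_donors                    # start at the simplex centre
    for _ in range(max_iter):
        fit = [sum(w[j] * donors[j][t] for j in range(n_donors))
               for t in range(n_periods)]
        resid = [treated[t] - fit[t] for t in range(n_periods)]
        # gradient of the squared error with respect to each weight
        grad = [-2 * sum(resid[t] * donors[j][t] for t in range(n_periods))
                for j in range(n_donors)]
        k = min(range(n_donors), key=lambda j: grad[j])  # best simplex vertex
        step_fit = [donors[k][t] - fit[t] for t in range(n_periods)]
        denom = sum(s * s for s in step_fit)
        if denom == 0:
            break
        # exact line search along w + gamma * (e_k - w), clipped to [0, 1]
        gamma = sum(r * s for r, s in zip(resid, step_fit)) / denom
        gamma = max(0.0, min(1.0, gamma))
        if gamma <= tol:
            break                                      # no descent direction: optimum
        w = [(1 - gamma) * wj for wj in w]
        w[k] += gamma
    return w

# Toy check: the treated path is exactly 0.3 * donor_a + 0.7 * donor_b.
donor_a = [1.0, 2.0, 3.0, 4.0]
donor_b = [2.0, 1.0, 4.0, 3.0]
treated = [0.3 * a + 0.7 * b for a, b in zip(donor_a, donor_b)]
weights = synth_weights(treated, [donor_a, donor_b])
print([round(x, 3) for x in weights])   # [0.3, 0.7]
```

Because the weights are confined to the simplex, a province can only receive weight if combining it with others helps reproduce Ontario's pre-period path, which is one way to think about whether the weights in the four specifications above are sensible.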