6-6 DiD and Synthetic Control
April 03, 2024
# Install packages
if (!require("pacman")) install.packages("pacman")

# We are using a package (augsynth) that was not originally on CRAN. R packages on CRAN have to
# pass some formal tests, so always proceed with caution if a package is not on CRAN. When the
# package was GitHub-only, we needed to download and install it directly from GitHub. Always use
# the CRAN version if there is one because it is the most stable. However, if you need something
# that is currently in development, you might want to download from GitHub. I've commented out
# the workflow since I already have it on my computer:

# workflow to install a package from GitHub
# ----------------------------------------------------------------
# 1. install `devtools` if you don't already have it. Note that you might need to update it.
# ----------
# install.packages("devtools")   # download developer tools package
# library(devtools)              # load library
# 2. install the package ("augsynth"). you can find this path in the GitHub instructions
# ----------
# devtools::install_github("ebenmichael/augsynth")

# install libraries -- install "augsynth" here since it is now on CRAN
pacman::p_load(tidyverse,  # Tidyverse packages including dplyr and ggplot2
               ggthemes,   # provides theme_fivethirtyeight(), used in the plots below
               augsynth)   # synthetic control estimation

# chunk options
# ----------------------------------------------------------------
knitr::opts_chunk$set(
  warning = FALSE  # prevents warnings from appearing after code chunks
)

# set seed
set.seed(44)
Introduction
In this lab we will explore difference-in-differences estimates and a newer extension, synthetic control. The basic idea behind both of these methods is simple – assuming two units are similar in a pre-treatment period and one undergoes treatment while the other stays in control, we can estimate a causal effect by taking three differences. First we take the difference between the two in the pre-treatment period, then take another difference in the post-treatment period. Then we take a difference between these two differences (hence the name difference in differences). Let’s see how this works in practice!
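To make the arithmetic concrete, here is a minimal sketch with made-up numbers (the values are purely illustrative):

# toy example of the three differences (illustrative numbers only)
treat_pre  <- 10;  treat_post <- 14    # treated unit's outcome before/after
ctrl_pre   <- 9;   ctrl_post  <- 11    # control unit's outcome before/after

pre_diff  <- treat_pre - ctrl_pre      # difference in the pre-treatment period
post_diff <- treat_post - ctrl_post    # difference in the post-treatment period
did       <- post_diff - pre_diff      # difference-in-differences estimate
did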
We’ll use the kansas dataset that comes from the augsynth package. Our goal here is to estimate the effect of the 2012 Kansas tax cuts on state GDP. Let’s take a look at our dataset:
# load data
data(kansas)

# summary statistics of kansas
summary(kansas)

## [summary() output abridged: 5,250 state-quarter observations (50 states: fips 1-56, state,
##  abb) covering 1990-2016, quarters 1-4. Variables include GDP (gdp, gdpcapita, lngdp,
##  lngdpcapita), revenue (revenuepop, rev_state_total, rev_local_total, revstatecapita,
##  revlocalcapita), population (popestimate), establishments and employment/wage levels
##  (qtrly_estabs_count, month1-3_emplvl, total_qtrly_wages, taxable_qtrly_wages, avg_wkly_wage)
##  along with their per-capita versions, a treated indicator, and year_qtr. Several of the
##  quarterly employment and wage series have 2,250-2,850 NAs.]
We have a lot of information here! We have quarterly state GDP from 1990 to 2016 for each U.S. state, as well as some other covariates. Let's begin by adding a treatment indicator for Kansas from year_qtr 2012.5 onward (just after the tax cuts were signed in May 2012).
# create a treatment indicator
# ----------
kansas <- kansas %>%
  # select subset of variables
  select(year, qtr, year_qtr, state, treated, gdp, lngdpcapita, fips) %>%
  # create a new treatment flag (note this adds treatment to the data so we can use it later)
  mutate(treatment = case_when(state == "Kansas" & year_qtr >= 2012.5 ~ 1,
                               TRUE ~ 0))

# view head
head(kansas)

## # A tibble: 6 x 9
##    year   qtr year_qtr state   treated   gdp lngdpcapita  fips treatment
## [first six rows: Alabama, 1990 Q1 through 1991 Q2, with lngdpcapita rising from 9.78 to 9.84,
##  fips = 1, and treatment = 0]
One approach might be to compare Kansas to itself pre- and post-treatment. If we plot state GDP over time we get something like this:
# visualize intervention in Kansas
# ----------
kansas %>%
  # processing
  # ----------
  filter(state == 'Kansas') %>%
  # plot
  # ----------
  ggplot() +
  # geometries
  geom_point(aes(x = year_qtr, y = lngdpcapita)) +
  geom_vline(xintercept = 2012.5, color = "maroon") +   # maroon vertical line at treatment
  # themes
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  labs(x = "Year-Quarter",                              # x-axis label
       y = "State GDP Per Capita \n(in thousands)",     # y-axis label
       title = "Kansas State GDP Per Capita Over Time") # title
[Figure: Kansas State GDP Per Capita Over Time. Points show log GDP per capita by year-quarter from 1990 to 2016, with a maroon vertical line at 2012.5.]
QUESTION: Looks like GDP went up after the tax cut! What is the problem with this inference?
ANSWER: It looks like GDP went up after the tax cut, but we have no way of telling whether it went up because of the tax cut or whether it would have gone up anyway. In short, we need to compare the treated Kansas to a counterfactual Kansas in which taxes were not cut.
Ideally, we would like to compare treated Kansas to control Kansas. Because of the fundamental problem of causal inference, we will never observe both of these conditions though. The core idea behind DiD is that we could instead use the fact that our treated unit was similar to a control unit, and then measure the differences between them. Perhaps we could choose neighboring Colorado:
# visualize Kansas and Colorado around the intervention
# ----------
kansas %>%
  # processing
  # ----------
  filter(state %in% c("Kansas", "Colorado")) %>%        # use %in% to filter values in a vector
  filter(year_qtr >= 2012.5 & year_qtr <= 2012.75) %>%
  # filter(between(year_qtr, 2012.5, 2012.75)) %>%      # same filtering but using between() instead
  # plot
  # ----------
  ggplot() +
  # add in point layer
  geom_point(aes(x = year_qtr,
                 y = lngdpcapita,
                 color = state)) +                      # color by state
  # add in line layer
  geom_line(aes(x = year_qtr,
                y = lngdpcapita,
                color = state)) +
  # themes
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  # labels - prefer to use labs() so that it is all in one argument
  ggtitle('Colorado and Kansas GDP \n before/after Kansas tax cut') +
  xlab('Year-Quarter') +
  ylab('State GDP Per Capita \n(in thousands)')
[Figure: Colorado and Kansas GDP before/after Kansas tax cut. Log GDP per capita for year-quarters 2012.50 and 2012.75, colored by state (Colorado, Kansas).]
This is basically what Card and Krueger (1994) did when they compared employment at fast-food restaurants in New Jersey and Pennsylvania after New Jersey raised its minimum wage.
Challenge: Try writing a simple DiD estimate using dplyr/tidyr (use subtraction instead of a regression):
# DiD for Kansas vs. Colorado
# ----------------------------------------------------------------
# create a dataset for Kansas and Colorado
# (saved to an intermediate object so we can reuse it below; the object name is ours)
kansas_colorado <- kansas %>%
  filter(state %in% c("Kansas", "Colorado")) %>%
  filter(year_qtr >= 2012.5 & year_qtr <= 2012.75)

# pre-treatment difference
# ----------
pre_diff <- kansas_colorado %>%
  # filter out only the quarter we want
  filter(year_qtr == 2012.5) %>%
  # subset to select only the vars we want
  select(state, lngdpcapita) %>%
  # make the data wide
  pivot_wider(names_from = state, values_from = lngdpcapita) %>%
  # subtract to make the calculation
  summarise(Colorado - Kansas)

# post-treatment difference
# ----------
post_diff <- kansas_colorado %>%
  # filter out only the quarter we want
  filter(year_qtr == 2012.75) %>%
  # subset to select only the vars we want
  select(state, lngdpcapita) %>%
  # make the data wide
  pivot_wider(names_from = state, values_from = lngdpcapita) %>%
  # subtract to make the calculation
  summarise(Colorado - Kansas)

# diff-in-diffs
# ----------
diff_in_diffs <- post_diff - pre_diff
diff_in_diffs
## Colorado - Kansas
## 1 0.003193447
Looks like our treatment effect is about 0.003 (on the log GDP per capita scale). Again, this is the basic idea behind Card and Krueger's approach.
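The same number also comes out of the canonical DiD regression, where the interaction coefficient is the difference-in-differences. A minimal sketch using the kansas_colorado subset created above (the indicator names post and kansas are ours):

# DiD as a regression: the interaction coefficient is the difference-in-differences
kansas_colorado %>%
  mutate(post   = as.numeric(year_qtr == 2012.75),    # post-treatment quarter
         kansas = as.numeric(state == "Kansas")) %>%  # treated unit
  lm(lngdpcapita ~ kansas * post, data = .) %>%
  coef()
# the kansas:post coefficient is about -0.003, the Kansas-minus-Colorado version of the
# estimate above (we computed Colorado minus Kansas, hence the sign flip)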
QUESTION: Why might there still be a problem with this estimate?
ANSWER: We just assumed that Colorado was similar to Kansas because they are neighbors - we don't really have evidence for this idea.
Parallel Trends Assumption
One of the core assumptions for difference-in-differences estimation is the "parallel trends" or "constant trends" assumption. Essentially, this assumption requires that the difference between our treatment and control units is constant in the pre-treatment period, so that the control's trend is a credible stand-in for what the treated unit would have done without treatment. Let's see how Kansas and Colorado do on this assumption, first with a quick numeric check (sketched below) and then with a plot:
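As a rough numeric check (a sketch; the gap variable is ours), we can regress the pre-treatment Colorado-Kansas gap on time. A slope near zero is consistent with, though does not prove, parallel trends:

# rough pre-trend check: does the Colorado-Kansas gap drift over the pre-period?
kansas %>%
  filter(state %in% c("Kansas", "Colorado"), year_qtr < 2012.5) %>%
  select(year_qtr, state, lngdpcapita) %>%
  pivot_wider(names_from = state, values_from = lngdpcapita) %>%
  mutate(gap = Colorado - Kansas) %>%
  lm(gap ~ year_qtr, data = .) %>%
  summary()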
# parallel trends
# ----------------------------------------------------------------
kansas %>%
  # processing
  # ----------
  filter(state %in% c("Kansas", "Colorado")) %>%
  # plotting all of the time periods - not filtering out any of them
  # plot
  # ----------
  ggplot() +
  # add in point layer
  geom_point(aes(x = year_qtr,
                 y = lngdpcapita,
                 color = state)) +
  # add in line layer
  geom_line(aes(x = year_qtr,
                y = lngdpcapita,
                color = state)) +
  # add a vertical line at the start of treatment
  geom_vline(aes(xintercept = 2012.5)) +
  # themes
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  ggtitle('Colorado and Kansas GDP \n before/after Kansas tax cut') +
  xlab('Year-Quarter') +
  ylab('State GDP Per Capita \n(in thousands)')
[Figure: Colorado and Kansas GDP before/after Kansas tax cut. Log GDP per capita for both states over 1990-2016, with a vertical line at 2012.5.]
The two lines somewhat move together, but the gap does grow and shrink at various points over time. The most concerning part here is that the gap quickly shrinks right before treatment. What do we do if we do not trust the parallel trends assumption? Perhaps we pick a different state.
Challenge: Choose another state that you think would be good to try out, and plot it alongside Kansas and Colorado.
# parallel trends: add a third state
# ----------------------------------------------------------------
kansas %>%
  # processing
  # ----------
  filter(state %in% c("Kansas",
                      "Colorado",
                      "Missouri")) %>%
  # plot
  # ----------
  ggplot() +
  geom_point(aes(x = year_qtr,
                 y = lngdpcapita,
                 color = state)) +
  geom_line(aes(x = year_qtr,
                y = lngdpcapita,
                color = state)) +
  geom_vline(aes(xintercept = 2012.5)) +
  # themes
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  ggtitle('Colorado, Kansas, and Missouri GDP \n before/after Kansas tax cut') +
  xlab('Year-Quarter') +
  ylab('State GDP Per Capita \n(in thousands)')
[Figure: Colorado, Kansas, and Missouri GDP before/after Kansas tax cut. Log GDP per capita for the three states over 1990-2016, with a vertical line at 2012.5.]
QUESTION: Which would be the more plausible control unit in this case, Colorado or your choice? Why?
ANSWER: There is a good argument for both of them (Missouri in this case). However, the gap between Colorado and Kansas closes quickly right before the treatment period, and the gap between Kansas and Missouri grows at the same point.
Selecting comparative units this way can be hard to justify theoretically, and sometimes we do not have a good candidate. What can we do then? This is where synthetic control comes in.
Synthetic Control
Synthetic control is motivated by the problem of choosing comparison units for comparative case studies. It aims to create a “synthetic” version of the treatment unit by combining and weighting covariates from other units (“donors”). In this case, we would construct a synthetic Kansas by creating a weighted average of the other 49 U.S. states. Ideally, the synthetic unit would match the treatment unit in the pre-treatment periods.
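To fix ideas, synthetic Kansas in quarter t is just a weighted average of the donor outcomes, sum_j w_j * Y_jt, with nonnegative weights that sum to one. A minimal sketch using uniform placeholder weights (purely illustrative; augsynth estimates the actual weights below):

# donor matrix: one column of lngdpcapita per non-Kansas state, one row per quarter
donor_mat <- kansas %>%
  filter(state != "Kansas") %>%
  select(year_qtr, state, lngdpcapita) %>%
  pivot_wider(names_from = state, values_from = lngdpcapita)

# uniform weights as a stand-in for the estimated weights
w <- rep(1 / (ncol(donor_mat) - 1), ncol(donor_mat) - 1)

# one "synthetic" value per quarter (a poor synthetic Kansas, since these weights are not optimized)
uniform_synthetic <- as.matrix(donor_mat[, -1]) %*% w
head(uniform_synthetic)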
For constructing a synthetic control, we are going to primarily rely on the augsynth library, since you can use the same library for augmented synthetic controls. The basic syntax for this library is:
augsynth(outcome ~ trt, unit, time, t_int, data)
augsynth library
This is a very flexible package that can handle both synthetic controls as well as augmentation and staggered adoption. It’s a bit more clunky but will handle the heavy lifting of estimation. Here is a tutorial for simultaneous adoption.
Note that the ATT here varies slightly from the tutorial because we have specified 2012.5 as the first treatment quarter, whereas the tutorial specifies 2012.25 (the quarter in which the law was passed (May)).
# NOTE: when t_int (the time when the intervention took place) is not specified, the code will
# automatically infer it from the treatment variable. It doesn't seem to run when we try to
# specify t_int anyway.

# synthetic control
# ----------
syn <-                                  # save object
  augsynth(lngdpcapita ~ treatment,     # use treatment rather than treated because the latter codes 2012.25 as treated
           state,                       # unit
           year_qtr,                    # time
           data = kansas,               # data
           progfunc = "None",           # plain synthetic control (no augmentation)
           scm = T)

## One outcome and one treatment time found. Running single_augsynth.

summary(syn)

## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time),
##     t_int = t_int, data = data, progfunc = "None", scm = ..2)
##
## Average ATT Estimate (p Value for Joint Null): -0.0242
## L2 Imbalance: 0.084
## Percent improvement from uniform weights: 79.1%
##
## Avg Estimated Bias: NA
##
## Inference type: Conformal inference
##
##    Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
## 2012.50   -0.035             -0.059             -0.004
## 2012.75   -0.027             -0.054              0.004
## 2013.00   -0.014             -0.036              0.015
## 2013.25   -0.024             -0.047              0.005
## 2013.50   -0.041             -0.065             -0.012
## 2013.75   -0.027             -0.050             -0.005
## 2014.00   -0.039             -0.064             -0.015
## 2014.25   -0.037             -0.063             -0.008
## 2014.50   -0.023             -0.050              0.008
## 2014.75   -0.012             -0.043              0.019
## 2015.00   -0.023             -0.058              0.010
## 2015.25   -0.013             -0.044              0.016
## 2015.50   -0.015             -0.048              0.013
## 2015.75   -0.012             -0.047              0.019
## 2016.00   -0.021             -0.065              0.014
We can use the built-in plot function to see how Kansas did relative to synthetic Kansas. The confidence intervals reported above come from conformal inference, as noted in the summary; jackknife-style procedures (leave one unit out, re-estimate, and cycle through all units) are another inference option the package offers.
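The call itself is just the plot method on the fitted augsynth object (the figure is not shown here):

# plot the estimated effect (Kansas minus synthetic Kansas) over time, with confidence bands
plot(syn)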
We can see which donors contributed the most to the synthetic Kansas:
# view each state's contribution
# ----------
data.frame(syn$weights) %>%   # coerce to data frame since it's in vector form
  # change index to a column
  tibble::rownames_to_column('State') %>%   # move the index from row names to a column (similar to an index in pandas)
  # plot
  # ----------
  ggplot() +
  # stat = 'identity' takes the literal value instead of a count for geom_bar()
  geom_bar(aes(x = State,
               y = syn.weights),
           stat = 'identity') +   # overrides count(), the default of geom_bar(); could also use geom_col()
  coord_flip() +                  # flip to make it more readable
  # themes
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  ggtitle('Synthetic Control Weights') +
  xlab('State') +
  ylab('Weight')
[Figure: Synthetic Control Weights. Horizontal bar chart of the weight assigned to each state in the donor pool; most weights are zero.]
Surprisingly, only a few units ended up contributing! Let’s take a closer look at the ones that did:
# view each state's contribution, keeping only weights greater than 0
# ----------
data.frame(syn$weights) %>%
  # processing
  # ----------
  tibble::rownames_to_column('State') %>%
  filter(syn.weights > 0) %>%   # keep only states with positive weights
  # plot
  # ----------
  ggplot() +
  geom_bar(aes(x = State,
               y = syn.weights),
           stat = 'identity') +
  coord_flip() +   # flip to make it more readable
  # themes
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  ggtitle('Synthetic Control Weights') +
  xlab('State') +
  ylab('Weight')
[Figure: Synthetic Control Weights. Horizontal bar chart restricted to the donor states with positive weights.]
tidysynth library
Before we move on, I want to talk about the tidysynth library, which is a newer, tidyverse-friendly implementation of the original synth package. As you will see, it makes it easy to visualize parallel trends, but it cannot handle the augmentation functions we might want to implement and it does not have as much support for estimation as augsynth. So you should be aware of it and use it for visualization, but perhaps use augsynth for estimation and augmentation. There is a helpful tutorial by the package author as well as another implementation that might be useful.
# specifying a synthetic control using tidysynth
# ----------------------------------------------------------------
# install package
# install.packages('tidysynth')

# load library
library(tidysynth)

# specify synthetic control
kansas_out <- kansas %>%
  # initialize the synthetic control object
  synthetic_control(outcome = lngdpcapita,     # outcome
                    unit = state,              # unit index in the panel data
                    time = year_qtr,           # time index in the panel data
                    i_unit = "Kansas",         # unit where the intervention occurred (treatment unit)
                    i_time = 2012.25,          # time period when the intervention occurred (t_int in augsynth)
                    generate_placebos = T      # generate placebo synthetic controls (for inference)
                    ) %>%
  # GDP covariate
  generate_predictor(gdp = gdp) %>%
  # generate the fitted weights for the synthetic control
  generate_weights(optimization_window = 1990.00:2012.25,  # time window to use in the optimization task
                   margin_ipop = .02,
                   sigf_ipop = 7,
                   bound_ipop = 6) %>%                     # optimizer options
  # generate the synthetic control
  generate_control()
Now we can manually calculate a treatment effect (ATT) that approximates what we obtained using augsynth but is not exactly the same. For this reason, I might use augsynth for estimation.
# calculate the treatment effect manually
# ----------------------------------------------------------------
kansas_out %>%
  grab_synthetic_control(placebo = T) %>%   # specify placebo = T so we can filter on the .id variable
  filter(.id == "Kansas") %>%
  filter(time_unit >= 2012.5) %>%           # post-treatment time periods
  # difference between synthetic and observed Kansas in each period
  mutate(estimate = synth_y - real_y) %>%
  # sum all of the post-treatment effects
  summarize(ATT = sum(estimate)) %>%
  glimpse()

Now let's plot the trends. The key here is that synthetic Kansas tracks observed Kansas more closely than Missouri did in our DiD.
# plot parallel trends for synthetic Kansas vs observed Kansas
# ----------------------------------------------------------------
kansas_out %>% plot_trends()
[Figure: Time series of the synthetic and observed lngdpcapita for Kansas; the dashed line denotes the time of the intervention.]

View the differences between Kansas and synthetic Kansas:
# plot observed differences between synthetic Kansas vs observed Kansas
kansas_out %>% plot_differences()
[Figure: Difference between the synthetic control and observed Kansas (lngdpcapita) over time.]

Next, plot the differences for each state in the donor pool relative to Kansas; this shows how much each placebo state varies from Kansas:
# plot differences in trends for all other states that contribute to synthetic Kansas vs observed Kansas
kansas_out %>% plot_placebos()
[Figure: Difference of each 'state' in the donor pool (lngdpcapita), with Kansas highlighted against the control units. Placebo cases with a pre-period RMSPE exceeding two times the treated unit's pre-period RMSPE are pruned.]
# plot control weights of each state
kansas_out %>% plot_weights()
[Figure: Control Unit Weights (W) for each donor state and Variable Weights (V), with weights ranging from 0 to roughly 0.20.]
Synthetic Control Augmentation
The main advantage of the augsynth package is that it allows for "augmented synthetic control". One of the main problems with synthetic control is that if the pre-treatment balance between treatment and control outcomes is poor, the estimate is not valid. Specifically, the authors advocate measuring this fit with the L2 imbalance, which you may have first encountered as the penalty that ridge regression uses: the L2 penalty uses the squared magnitude of a coefficient to penalize any particular feature.
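As a toy illustration of what the L2 imbalance measures (the gap values below are made up), it is just the Euclidean norm of the pre-treatment gaps between the treated unit and its synthetic control:

# hypothetical pre-treatment gaps between Kansas and synthetic Kansas
pre_gaps <- c(0.010, -0.020, 0.005, 0.000)
# L2 imbalance: square, sum, and take the square root ("squared magnitude", as in ridge regression)
l2_imbalance <- sqrt(sum(pre_gaps^2))
l2_imbalance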
Parallel Trends
# plot parallel trends for synthetic Kansas vs observed Kansas (manually)
# ----------------------------------------------------------------
# Aniket's method for getting the underlying data
# ----------
syn_sum <- summary(syn)

# create synthetic Kansas
# ----------
kansas_synkansas <- kansas %>%
  # filter just Kansas
  filter(state == "Kansas") %>%
  # bind columns
  bind_cols(difference = syn_sum$att$Estimate) %>%      # add in the estimated gap
  # calculate synthetic Kansas
  mutate(synthetic_kansas = lngdpcapita + difference)   # add the estimate to observed Kansas to create the synthetic series

# plot
# ----------
kansas_synkansas %>%
  ggplot() +
  # observed Kansas
  geom_line(aes(x = year_qtr,
                y = lngdpcapita,
                color = 'Kansas')) +
  # synthetic Kansas
  geom_line(aes(x = year_qtr,
                y = synthetic_kansas,
                color = 'Synthetic Kansas')) +
  scale_color_manual(values = c('Kansas' = 'red', 'Synthetic Kansas' = 'blue')) +
  geom_vline(aes(xintercept = 2012.5)) +
  # themes
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  ggtitle('Kansas vs Synthetic Kansas') +
  xlab('Year-Quarter') +
  ylab('State GDP Per Capita')
[Figure: Kansas vs Synthetic Kansas. Log state GDP per capita by year-quarter, 1990-2016, for observed Kansas and synthetic Kansas, with a vertical line at 2012.5.]
QUESTION: How does the pre-treatment matching between Kansas and synthetic Kansas look here?
ANSWER: Pretty good! We may not need to augment this synthetic control, though let's try anyway.
Augmentation
Let’s play a bit with the augmentation parameters that will adjust the weights to see if we can find better fits to create a synthetic control.
# recalculate with the ridge progfunc, which penalizes very large weights
# ----------------------------------------------------------------
ridge_syn <-
  augsynth(lngdpcapita ~ treatment,
           state,                      # unit
           year_qtr,                   # time
           data = kansas,              # data
           progfunc = "ridge",         # specify ridge augmentation
           scm = T)

## One outcome and one treatment time found. Running single_augsynth.

summary(ridge_syn)   # the lower the L2 imbalance, the better - now 0.07 compared to ~0.08 before

## single_augsynth(form = form, unit = !!enquo(unit), time = !!enquo(time),
##     t_int = t_int, data = data, progfunc = "ridge", scm = ..2)
##
## Average ATT Estimate (p Value for Joint Null): -0.0298
## L2 Imbalance: 0.070
## Percent improvement from uniform weights: 82.7%
##
## Avg Estimated Bias: 0.006
##
## Inference type: Conformal inference
##
##    Time Estimate 95% CI Lower Bound 95% CI Upper Bound p Value
## 2012.50   -0.038             -0.065             -0.013   0.023
## 2012.75   -0.031             -0.058             -0.004   0.036
## 2013.00   -0.019             -0.041              0.002   0.066
## 2013.25   -0.031             -0.055             -0.009   0.011
## 2013.50   -0.048             -0.075             -0.023   0.028
## 2013.75   -0.034             -0.058             -0.012   0.022
## 2014.00   -0.046             -0.073             -0.022   0.020
## 2014.25   -0.043             -0.072             -0.016   0.026
## 2014.50   -0.029             -0.061              0.000   0.055
## 2014.75   -0.017             -0.052              0.012   0.122
## 2015.00   -0.028             -0.065              0.004   0.055
## 2015.25   -0.019             -0.053              0.011   0.076
## 2015.50   -0.021             -0.055              0.009   0.099
## 2015.75   -0.017             -0.057              0.018   0.112
## 2016.00   -0.026             -0.069              0.006   0.053

Let's look at the weights:
# view weights - now we have negative weights as a result of ridge augmentation
# ----------------------------------------------------------------
data.frame(ridge_syn$weights) %>%
  tibble::rownames_to_column('State') %>%
  ggplot() +
  geom_bar(aes(x = State, y = ridge_syn.weights), stat = 'identity') +
  coord_flip() +   # flip to make it more readable
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  ggtitle('Synthetic Control Weights') +
  xlab('State') +
  ylab('Weight')
[Figure: Synthetic Control Weights. Horizontal bar chart of ridge-augmented weights by state, ranging from below zero up to about 0.3.]
Notice how with the ridge augmentation, some weights are allowed to be negative now.
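A quick check of how many donor states received negative weight under ridge augmentation:

# count donor states with negative weight after ridge augmentation
sum(ridge_syn$weights < 0)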