First of all, thank you for this wonderful resource!
I am confused by the Stata event study code, and I think it might not be entirely correct. For reference, here it is:
```stata
use "https://raw.githubusercontent.com/LOST-STATS/LOST-STATS.github.io/master/Model_Estimation/Data/Event_Study_DiD/bacon_example.dta", clear
* create the lag/lead for treated states
* fill in control obs with 0
* This allows for the interaction between `treat` and `time_to_treat` to occur for each state.
* Otherwise, there may be some NAs and the estimations will be off.
g time_to_treat = year - _nfd
replace time_to_treat = 0 if missing(_nfd)
* this will determine the difference
* btw controls and treated states
g treat = !missing(_nfd)
* Stata won't allow factors with negative values, so let's shift
* time-to-treat to start at 0, keeping track of where the true -1 is
summ time_to_treat
g shifted_ttt = time_to_treat - r(min)
summ shifted_ttt if time_to_treat == -1
local true_neg1 = r(mean)
* Regress on our interaction terms with FEs for group and year,
* clustering at the group (state) level
* use ib# to specify our reference group
reghdfe asmrs ib`true_neg1'.shifted_ttt pcinc asmrh cases, a(stfips year) vce(cluster stfips)
```
My problem stems from the line
```stata
replace time_to_treat = 0 if missing(_nfd)
```
This means that states that are never treated are assigned `time_to_treat = 0`, which codes them as if they were in their treatment year. This gives the following distribution:
```
time_to_tre |
         at |      Freq.     Percent        Cum.
------------+-----------------------------------
        -21 |          1        0.06        0.06
        -20 |          2        0.12        0.19
        -19 |          2        0.12        0.31
        -18 |          2        0.12        0.43
        -17 |          2        0.12        0.56
        -16 |          3        0.19        0.74
        -15 |          3        0.19        0.93
        -14 |          3        0.19        1.11
        -13 |          6        0.37        1.48
        -12 |          7        0.43        1.92
        -11 |          9        0.56        2.47
        -10 |         12        0.74        3.22
         -9 |         22        1.36        4.58
         -8 |         25        1.55        6.12
         -7 |         32        1.98        8.10
         -6 |         34        2.10       10.20
         -5 |         36        2.23       12.43
         -4 |         36        2.23       14.66
         -3 |         36        2.23       16.88
         -2 |         36        2.23       19.11
         -1 |         36        2.23       21.34
          0 |        465       28.76       50.09
          1 |         36        2.23       52.32
          2 |         36        2.23       54.55
          3 |         36        2.23       56.77
          4 |         36        2.23       59.00
          5 |         36        2.23       61.22
          6 |         36        2.23       63.45
          7 |         36        2.23       65.68
          8 |         36        2.23       67.90
          9 |         36        2.23       70.13
         10 |         36        2.23       72.36
         11 |         36        2.23       74.58
         12 |         35        2.16       76.75
         13 |         34        2.10       78.85
         14 |         34        2.10       80.95
         15 |         34        2.10       83.06
         16 |         34        2.10       85.16
         17 |         33        2.04       87.20
         18 |         33        2.04       89.24
         19 |         33        2.04       91.28
         20 |         30        1.86       93.14
         21 |         29        1.79       94.93
         22 |         27        1.67       96.60
         23 |         24        1.48       98.08
         24 |         14        0.87       98.95
         25 |         11        0.68       99.63
         26 |          4        0.25       99.88
         27 |          2        0.12      100.00
------------+-----------------------------------
      Total |      1,617      100.00
```
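A quick way to see which observations land in the `0` row is to cross-tabulate `time_to_treat` against `treat`. The sketch below rebuilds only the variables needed from the code above; if the recoding works as described, every never-treated observation (`treat == 0`) should fall in the `0` row, pooled with the treated states' actual treatment-year observations.

```stata
use "https://raw.githubusercontent.com/LOST-STATS/LOST-STATS.github.io/master/Model_Estimation/Data/Event_Study_DiD/bacon_example.dta", clear
* Same recoding as in the original code
g time_to_treat = year - _nfd
replace time_to_treat = 0 if missing(_nfd)
g treat = !missing(_nfd)
* Split the event-time distribution by treatment status:
* the treat == 0 column should be empty everywhere except the 0 row
tab time_to_treat treat
```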
It's possible that, because `time_to_treat` does not vary across years within control units, the state (`stfips`) fixed effects "take care" of this. But I can't reason intuitively about what is really happening when `0` stands both for untreated states and for treated states in their treatment year (event year 0).
I would recommend setting `time_to_treat` for untreated states to `100`, or to the maximum plus `100`, to avoid this confusion. The exact value doesn't matter, since the values are only used to define indicator (fixed-effect) categories anyway.
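A minimal sketch of this recommendation, assuming the rest of the original code is kept unchanged. The placeholder value (largest observed event time plus 100) is the only new choice here; like the original, it requires the user-written `reghdfe` package.

```stata
use "https://raw.githubusercontent.com/LOST-STATS/LOST-STATS.github.io/master/Model_Estimation/Data/Event_Study_DiD/bacon_example.dta", clear
g treat = !missing(_nfd)
g time_to_treat = year - _nfd
* Placeholder for never-treated states: largest observed event time plus 100.
* summarize skips missing values, so r(max) comes from treated states only.
summ time_to_treat
replace time_to_treat = r(max) + 100 if missing(_nfd)
* Shift to non-negative values for factor notation, as in the original code
summ time_to_treat
g shifted_ttt = time_to_treat - r(min)
summ shifted_ttt if time_to_treat == -1
local true_neg1 = r(mean)
reghdfe asmrs ib`true_neg1'.shifted_ttt pcinc asmrh cases, a(stfips year) vce(cluster stfips)
```

Because the placeholder is constant within each never-treated state, its indicator should be collinear with the state fixed effects, so `reghdfe` is expected to omit it rather than estimate a coefficient for it.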