Machine Learning in Empirical Economics

A Parametric Alternative to the Synthetic Control Method with Many Covariates

Joint work with Marianne Blehaut, Xavier D'Haultfœuille and Alexandre Tsybakov.

Summary

The synthetic control method developed by Abadie et al. (2010) is an econometric tool to evaluate causal effects when a few units are treated. While initially aimed at evaluating the effect of large-scale macroeconomic changes with very few available control units, it has increasingly been used in place of more well-known microeconometric tools in a broad range of applications, but its properties in this context are unknown. This paper proposes a parametric generalization of the synthetic control, which is developed both in the usual asymptotic framework and in the high-dimensional one. The proposed estimator is doubly robust, consistent and asymptotically normal uniformly over a large class of data-generating processes. It is also immunized against first-step selection mistakes. We illustrate these properties using Monte Carlo simulations and applications to both standard and potentially high-dimensional settings, and offer a comparison with the synthetic control method.

Introduction

The original synthetic control method developed by Abadie and Gardeazabal (2003) and Abadie et al. (2010, 2015) is an econometric tool to quantify the effects of a policy change that affects one or very few aggregate units, using aggregate-level data. The idea is to construct a counterfactual treated unit using a convex combination of non-treated units, the "synthetic control unit", that closely recreates the characteristics of the treated. The weights given to each control unit are computed by minimizing the discrepancy between the treated and the synthetic unit in the mean of predictors of the outcome of interest. The synthetic control method has been used to evaluate causal impacts in a wide range of applications such as terrorism, civil wars and social unrest (Acemoglu et al., 2016), political and monetary unions (Abadie et al., 2015; Wassmann, 2015), minimum wage (Dube and Zipperer, 2015; Addison et al., 2014), health (Bilgel and Galle, 2015), fiscal policies (Dietrichson and Ellegard, 2015), geographical and regional policies (Gobillon and Magnac, 2016), immigration policy (Bohn et al., 2014b), international trade (Nannicini and Billmeier, 2011) and many more. While initially aimed at evaluating the effect of large-scale macroeconomic changes with very few available units of comparison, these units typically being states or regions, the synthetic control method has increasingly been used in place of more well-known microeconometric tools. Contrasting with these standard approaches, the theory behind the synthetic control estimator has not been fully built yet, especially when the number of control units tends to infinity.
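To fix ideas, here is a minimal sketch of the weight computation described above: non-negative weights summing to one are chosen so that a weighted average of the control units' predictors matches the treated unit's predictors as closely as possible. The data and variable names are illustrative only; this is not the implementation of Abadie et al. (2010).

```python
# Illustrative sketch of synthetic control weights: minimize the distance
# between the treated unit's predictors and a convex combination of the
# control units' predictors. Simulated data, hypothetical variable names.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X0 = rng.normal(size=(20, 5))   # predictors of the 20 control units (units x predictors)
x1 = rng.normal(size=5)         # predictors of the single treated unit

def discrepancy(w):
    # squared distance between the treated unit and the synthetic unit X0'w
    return np.sum((x1 - X0.T @ w) ** 2)

n0 = X0.shape[0]
res = minimize(
    discrepancy,
    x0=np.full(n0, 1.0 / n0),                                      # start from uniform weights
    bounds=[(0.0, 1.0)] * n0,                                      # weights are non-negative
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # and sum to one
    method="SLSQP",
)
w_synth = res.x  # weights defining the synthetic control unit
```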
This paper proposes an alternative to the synthetic control method by using a parametric form for the weight given to each control unit. In the small-dimensional case where the number of observations is much larger than the number of covariates, our approach amounts to a two-step GMM estimator, where the parameters governing the synthetic weights are computed in a first step so that the reweighted control group matches some features of the treated. A key result of the paper is the double robustness of the estimator, as defined by Bang and Robins (2005). Under that property, misspecifications in the synthetic control weights do not prevent valid inference if the outcome regression function is linear for the control group. This approach is also extended to the high-dimensional case where the number of covariates is proportional to or larger than the number of observations, and to cases where variable selection is performed. This extension makes the proposed estimator suitable for comparative case studies and macroeconomic applications. Here, the double robustness property helps construct an estimator which is immunized against first-step selection mistakes in the sense defined by Chernozhukov et al. (2015, 2018a). In both cases, it is consistent and asymptotically normal uniformly over a large class of data-generating processes. Consequently, we develop inference based on asymptotic approximation, linking the synthetic control method with more standard microeconometric tools.
The present paper builds mainly on two strands of the treatment effect literature. The first one is the literature related to propensity score weighting and covariate balancing propensity scores. Several recent efforts have been made to include balance between covariates as an explicit objective for estimation, with or without relation to the propensity score (e.g. Hainmueller (2012); Graham et al. (2012)). Recently, Imai and Ratkovic (2014) integrated propensity score estimation and covariate balancing in the same framework. Their covariate balancing propensity score method is estimated with GMM and yields more robust estimates than standard propensity score-related methods. Indeed, they show that this method is less impacted by potential misspecifications and retains the theoretical properties of GMM estimators. Our Theorem 2.1 gives a theoretical basis to support these empirical findings. It is to be noted that the covariate balancing idea is related to the calibration on margins method used in survey sampling, see for example Deville et al. (1993).
It also partakes in the econometric literature that addresses variable selection, and more generally the use of machine learning tools, when estimating a treatment effect, especially but not exclusively in a high-dimensional framework. The lack of uniformity for inference after a selection step has been raised in a series of papers by Leeb and Pötscher (2005, 2008a,b), echoing earlier work by Leamer (1983), who questioned the credibility of many empirical policy evaluation results. One recent innovative solution proposed to circumvent this post-selection conundrum is the use of double-selection procedures (Belloni and Chernozhukov, 2013; Farrell, 2015; Chernozhukov et al., 2015, 2018a). For example, Belloni et al. (2014a,b) highlight the dangers of selecting controls based on either their relation with the outcome or with the treatment variable, but rarely both, as is usually done; they propose a three-step procedure that selects more controls and guards against omitted variable biases much better than a simple "post-single-selection" estimator. Farrell (2015) extends this approach by allowing for heterogeneous treatment effects, proposing an estimator that is robust to model selection mistakes in either the propensity score or the outcome regression. In addition, he deals explicitly with a discrete treatment, which is a more common setting in the policy evaluation literature. Chernozhukov et al. (2015, 2018a) have theorized this approach by showing how using moments that are first-order-insensitive to the selection step helps immunize the inference against selection mistakes, or more generally against first-step estimators that are not $\sqrt{n}$-consistent. A different path to deal with the problem of propensity score specification has been followed by Kitagawa and Muris (2016) using the Focused Information Criterion (FIC) of Claeskens and Hjort (2003), but it does not explicitly accommodate a high-dimensional nuisance parameter and assumes that the researcher knows the true model.
The paper is organized as follows. Section 2 introduces our estimator and states its properties in a standard low-dimensional setting. Section 3 extends the previous section to the high-dimensional case and studies its asymptotic properties. Section 4 illustrates the good inference properties of the estimator in a Monte Carlo experiment. Section 5 revisits the dataset of LaLonde (1986) to compare our procedure with other high-dimensional econometric tools, and the large-scale tobacco control program of Abadie et al. (2010) for a comparison with the synthetic control method. The appendix gathers the proofs.

A Parametric Alternative to Synthetic Control

Covariate Balancing Weights and Double Robustness

We are interested in the effect of a binary treatment, coded by $D = 1$ for the treated and $D = 0$ for the non-treated. We let $Y_0$ and $Y_1$ denote the potential outcomes under no treatment and under the treatment, respectively. The observed outcome is then $Y = D Y_1 + (1-D) Y_0$. We also observe a random vector $X \in \mathbb{R}^p$ of pre-treatment characteristics. The quantity of interest is the Average Treatment Effect on the Treated (ATET), defined as: $\theta_0 = \mathbb{E}[Y_1 - Y_0 \mid D = 1]$.
Since no individual is observed in both treatment states, identification of the counterfactual $\mathbb{E}[Y_0 \mid D = 1]$ is achieved through the following two ubiquitous conditions.
Assumption 2.1 (Nested Support) $\mathbb{P}[D = 1 \mid X] < 1$ almost surely and $\pi := \mathbb{P}[D = 1] \in (0, 1)$.
Assumption 2.2 (Mean Independence) $\mathbb{E}[Y_0 \mid X, D = 1] = \mathbb{E}[Y_0 \mid X, D = 0]$.
Assumption 2.1, a version of the usual common support condition, requires that there exist control units for any possible value of the covariates in the population. Since the ATET is the parameter of interest, we never need to reconstruct a counterfactual for control units, so $\mathbb{P}[D = 1 \mid X] > 0$ is not required. Assumption 2.2 states that, conditional on a set of observed confounding factors, the expected potential outcome under no treatment is the same for treated and control individuals. This assumption is a weaker form of the classical conditional independence assumption: $(Y_0, Y_1) \perp\!\!\!\perp D \mid X$.
As in most policy evaluation settings, the counterfactual is identified and estimated as a weighted average of non-treated unit outcomes: $\theta_0 = \mathbb{E}[Y_1 \mid D = 1] - \mathbb{E}[W Y_0 \mid D = 0]$, (2.1)
where $W$ is a random variable. Popular choices for the weights are the following (an illustrative code sketch of the second choice follows the list):
1. Linear regression: $W = \mathbb{E}[DX'] \, \mathbb{E}[(1-D)XX']^{-1} X$, also referred to as the Oaxaca-Blinder estimator (Kline, 2011),
2. Propensity score: $W = \mathbb{P}[D = 1 \mid X] / (1 - \mathbb{P}[D = 1 \mid X])$,
3. Matching: see Smith and Todd (2005) for more details,
4. Synthetic controls: see Abadie et al. (2010).
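As an illustration of the second choice above, the following is a hedged sketch of the normalized sample counterpart of (2.1) with propensity-score weights, on simulated data with a plain logistic model; it is not the estimator proposed in this paper.

```python
# Illustration of weight choice 2: estimate the propensity score with a
# logistic model, form W = p/(1-p) for control units, and plug into a
# normalized sample analogue of (2.1). Simulated data, illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 1000, 4
X = rng.normal(size=(n, p))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))                 # treatment depends on X
Y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + 2.0 * D + rng.normal(size=n)

ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]      # estimated P[D=1|X]
W = ps / (1.0 - ps)                                             # propensity-score weight

treated, control = D == 1, D == 0
theta_hat = Y[treated].mean() - np.sum(W[control] * Y[control]) / np.sum(W[control])
print(theta_hat)  # roughly 2.0, the true ATET in this simulated design
```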
This paper proposes another choice of weight $W$ which can be seen as a particular solution of the synthetic control problem. Formally, we look for weights $W$ that (i) satisfy a balancing condition as in the synthetic control method, (ii) are positive and (iii) are functions of the covariates. The first condition writes:
$\mathbb{E}[DX] = \mathbb{E}[W(1-D)X]$. (2.2)
Up to a proportionality constant, this is equivalent to $\mathbb{E}[X \mid D = 1] = \mathbb{E}[W X \mid D = 0]$. This condition means that $W$ balances the first moment of the observed covariates between the treated and the control group. The definition of the observable covariates $X$ is left to the econometrician and can include transformations of the original covariates so as to match more features of their distribution. Such weights rely on the idea of "covariate balancing" as in e.g. Imai and Ratkovic (2014). The following lemma shows that under Assumption 2.1, weights satisfying the balancing condition always exist.
Lemma 2.1 (Balancing Weights) If Assumption 2.1 holds, the propensity score weight $W_0 := \mathbb{P}[D = 1 \mid X] / (1 - \mathbb{P}[D = 1 \mid X])$ satisfies the balancing condition (2.2).
The lemma is straightforward to verify by plugging this expression into equation (2.2) and using the law of iterated expectations. Note that the linear regression weight $W = \mathbb{E}[DX'] \, \mathbb{E}[(1-D)XX']^{-1} X$ also verifies the balancing condition but can be negative. The lemma suggests estimating a binary choice model to obtain $\mathbb{P}[D = 1 \mid X]$ and estimate the weights $W_0$ in a first step, and plugging them in to estimate $\theta_0$ in a second step. However, an inconsistent estimate of the propensity score leads to an inconsistent estimator of $\theta_0$ and does not guarantee that the implied weights will achieve covariate balancing. Finally, estimation of a propensity score can be problematic when there are very few treated units. For these reasons, we consider instead an estimation directly based on the balancing equations: $\mathbb{E}[(D - (1-D)W_0) X] = 0$. (2.3)
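In practice, the sample analogue of the balancing equations (2.3) can be checked directly for any candidate weights. A small hedged sketch (hypothetical variable names, simulated data):

```python
# Sample analogue of the balancing equations (2.3): for candidate weights W on
# the control units, the moment E_n[(D - (1-D)W) X] should be close to zero.
import numpy as np

def balance_gap(X, D, W):
    """Return the vector E_n[(D - (1-D)W) X]; it is ~0 when (2.3) holds in-sample."""
    signed_weight = D - (1 - D) * W
    return (signed_weight[:, None] * X).mean(axis=0)

# Example with uniform control weights (generally NOT balancing):
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
W = np.ones(len(D))              # placeholder weights
print(balance_gap(X, D, W))      # far from zero unless the weights balance X
```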
An important advantage of this approach over the usual one based on propensity score estimation through maximum likelihood is its double robustness (for a definition, see, e.g., Bang and Robins, 2005). Indeed, let $W_1$ denote the weights identified by (2.3) under a misspecified model for the propensity score. Because the balancing equations (2.3) still hold for $W_1$, the estimated treatment effect will still be consistent provided that $\mathbb{E}[Y_0 \mid X]$ is linear in $X$. The formal result is provided in Theorem 2.1 below.
We consider a parametric estimator of $W_0$. Suppose that $\mathbb{P}[D = 1 \mid X] = G(X'\beta_0)$ for some unknown $\beta_0 \in \mathbb{R}^p$ and some known, strictly increasing cumulative distribution function $G$. Then $W_0 = h(X'\beta_0)$ with $h = G/(1-G)$, and $\beta_0$ is identified by (2.3). $h$ is a positive increasing function, meaning that its primitive $H$ is convex and its derivative (when it exists) is positive. A classical example of $h$ would be $h = \exp$, corresponding to a logistic distribution for $G$; in that example, $h = h' = H$. In any case, the convexity of $H$ implies that $\beta_0$ is the solution of the strictly convex program: $\beta_0 = \arg\min_{\beta} \mathbb{E}\left[(1-D)H(X'\beta) - D X'\beta\right]$. (2.4)
Note that this program is well-defined whether or not $\mathbb{P}[D = 1 \mid X] = G(X'\beta_0)$.
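As an illustration, with the logistic specification $h = \exp$ (so $H = \exp$ as noted above), the sample counterpart of program (2.4) is a smooth convex minimization. A minimal sketch with made-up data follows; it is not the paper's implementation, which also covers the high-dimensional case.

```python
# Minimal sketch of the sample analogue of program (2.4) with h = H = exp,
# i.e. beta_hat = argmin_b (1/n) sum_i [ (1 - D_i) exp(X_i'b) - D_i X_i'b ].
# Simulated data; an intercept column is included in X for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # include an intercept
D = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 1] - 1.0))))

def objective(b):
    xb = X @ b
    return np.mean((1 - D) * np.exp(xb) - D * xb)

beta_hat = minimize(objective, x0=np.zeros(X.shape[1]), method="BFGS").x
W_hat = np.exp(X @ beta_hat)          # balancing weights h(X'beta_hat)

# By the first-order condition of (2.4), these weights balance X in-sample:
print(((D - (1 - D) * W_hat)[:, None] * X).mean(axis=0))   # ~ 0 component-wise
```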
We are now ready to state the main identification theorem that justifies the use of the ATET estimand of equation (2.1):
Theorem 2.1 (Double Robustness) Suppose that Assumptions 2.1-2.2 hold and let $\beta_0$ be defined by equation (2.4) for some positive, strictly increasing convex function $H$. Then, for any $\mu \in \mathbb{R}^p$, $\theta_0$ satisfies
$\theta_0 = \dfrac{1}{\mathbb{E}[D]} \, \mathbb{E}\left[\left(D - (1-D)h(X'\beta_0)\right)\left(Y - X'\mu\right)\right]$, (2.5)
in two cases:
1. the regression function under no treatment is linear, i.e. there exists $\mu_0 \in \mathbb{R}^p$ such that $\mathbb{E}[Y_0 \mid X] = X'\mu_0$, or
2. the propensity score is given by $\mathbb{P}[D = 1 \mid X] = G(X'\beta_0)$, with $G = h/(1+h)$.
Theorem 2.1 highlights the double-robustness property of using an estimate of the propensity score based on the balancing approach. This result is similar to the one obtained by Kline (2011) for the Oaxaca-Blinder estimator, but his requires the propensity score to specifically follow a log-logistic model in the case where the propensity score is well specified, so Theorem 2.1 is more general. At this stage, $\mu$ in equation (2.5) does not play any role and could be set to zero. However, we will see below that choosing $\mu$ carefully is important in the high-dimensional case to obtain an "immunized" estimator of $\theta_0$.
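For concreteness, the following is a hedged sketch of the sample counterpart of (2.5), taking $\mu$ as the OLS coefficient of $Y$ on $X$ in the control group, which is one natural choice when $\mathbb{E}[Y_0 \mid X]$ is linear. Data and variable names are illustrative only.

```python
# Sample counterpart of (2.5):
# theta_hat = (1/mean(D)) * mean[ (D - (1-D) h(X'beta_hat)) * (Y - X'mu_hat) ],
# with h = exp, beta_hat from the sample analogue of (2.4), and mu_hat from an
# OLS regression of Y on X in the control group. Illustrative sketch only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 1])))
Y = X @ np.array([1.0, 2.0, -1.0]) + 1.5 * D + rng.normal(size=n)   # true ATET = 1.5

# First step: balancing weights via the sample analogue of (2.4) with h = H = exp.
beta_hat = minimize(lambda b: np.mean((1 - D) * np.exp(X @ b) - D * (X @ b)),
                    x0=np.zeros(X.shape[1]), method="BFGS").x

# Second step: mu_hat from the control-group regression, then plug into (2.5).
mu_hat, *_ = np.linalg.lstsq(X[D == 0], Y[D == 0], rcond=None)
theta_hat = np.mean((D - (1 - D) * np.exp(X @ beta_hat)) * (Y - X @ mu_hat)) / D.mean()
print(theta_hat)   # roughly 1.5 in this simulated design
```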


Table of contents:

Introduction 
1 High-Dimension, Variable Selection and Immunization
1.1 The Post-Selection Inference Problem
1.2 State of the Art
1.3 Contribution
2 Machine Learning in Empirical Economics
2.1 State of the Art
2.2 Contribution
2.3 Perspectives
3 Synthetic Control, High-Dimension and Selection of the Control Group
3.1 State of the Art
3.2 Contribution
3.3 Perspectives
2 A Parametric Alternative to the Synthetic Control Method with Many Covariates
1 Introduction
2 A Parametric Alternative to Synthetic Control
2.1 Covariate Balancing Weights and Double Robustness
2.2 Asymptotic Properties in Low-Dimension
3 High-Dimensional Covariates and Post-Selection Inference
3.1 Regularized Estimation
3.2 Immunized Estimation
3.3 Asymptotic Properties
4 Simulations
5 Empirical Applications
5.1 Job Training Program, LaLonde (1986)
5.2 California Tobacco Control Program, Abadie et al. (2010)
6 Conclusion
7 Appendix A: Algorithm for Feasible Penalty Loadings
8 Appendix B: Proofs
3 A Penalized Synthetic Control Estimator for Disaggregated Data
1 Introduction
2 Penalized Synthetic Control
2.1 Synthetic Control for Disaggregated Data
2.2 Penalized Synthetic Control
2.3 Bias-Corrected Synthetic Control
3 Large Sample Properties
3.1 Bias
3.2 Consistency
3.3 Asymptotic Normality
3.4 Asymptotic Behavior of S()
4 Permutation Inference
4.1 Inference on Aggregate Effects
4.2 Inference Based on the Sum of Rank Statistics of Unit-Level Treatment Effects Estimates
5 Penalty Choice
5.1 Leave-One-Out Cross-Validation of Post-Intervention Outcomes for the Untreated
5.2 Pre-Intervention Holdout Validation on the Outcomes of the Treated
6 Simulations
7 Empirical Applications
7.1 The Value of Connections in Turbulent Times, Acemoglu et al. (2016)
7.2 The Impact of Election Day Registration on Voter Turnout, Xu (2017)
8 Conclusion
9 Appendix: Proofs
4 Using Generic Machine Learning to Analyze Treatment Heterogeneity: An Application to Provision of Job Counseling 
1 Introduction
2 Machine Learning in Empirical Economics
3 Data and Experimental Design
3.1 Design of the Experiment
3.2 Data
4 Empirical Strategy
4.1 An Economic Model of Treatment Allocation
4.2 Methodological Aspects
5 Results
5.1 Detection of Heterogeneity
5.2 Dimension of Heterogeneity (CLAN)
5.3 Selection into the Treatment
6 Conclusion
7 Appendix: Descriptive Statistics
8 Appendix: Adaptation of Th. 2.1 in Chernozhukov et al. (2018b)
Bibliography
