Using Generic Machine Learning to Analyze Treatment Heterogeneity: An Application to Provision of Job Counseling


Job Training Program, LaLonde (1986)

We revisit LaLonde (1986). This dataset was first built to assess the impact of the National Supported Work (NSW) program. The NSW is a transitional, subsidized work experience program targeted towards people with longstanding employment problems: ex-offenders, former drug addicts, women who were long-term recipients of welfare benefits, and school dropouts. Here, the quantity of interest is the average treatment effect on the treated (ATET), defined as the impact of participation in the program on 1978 yearly earnings in dollars. The treated group gathers people who were randomly assigned to this program from the population at risk (n1 = 185). Two control groups are available. The first one is experimental: it is directly comparable to the treated group as it was generated by a randomized controlled trial (sample size n0 = 260). The second one comes from the Panel Study of Income Dynamics (PSID) (sample size n0 = 2490). The presence of the experimental sample provides a benchmark for the ATET estimated with observational data. We use these datasets to compare our estimator with its competitors and defer discussion of the NSW program and the controversy regarding econometric estimates of nonexperimental causal effects to the paper by LaLonde (1986) and subsequent contributions by Dehejia and Wahba (2002) and Smith and Todd (2005).
To allow for a flexible specification, we consider the setting of Farrell (2015) and take the raw covariates of the dataset (age, education, black, hispanic, married, no degree, income in 1974, income in 1975, no earnings in 1974, no earnings in 1975), two-by-two interactions between the four continuous variables and the dummies, two-by-two interactions between the dummies, and polynomial transformations of the continuous variables up to degree 5. Continuous variables are linearly rescaled to [0, 1]. All in all, we end up with 172 variables to select from. The experimental benchmark for the ATET estimate is $1,794 (671). We compare several estimators: the naive plug-in estimator, the immunized plug-in estimator, the doubly-robust estimator of Farrell (2015), the double-post-selection linear estimator of Belloni et al. (2014b), and a simple OLS estimator where all the covariates are included.
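As an illustration only, the following Python sketch builds a dictionary of transformed covariates along the lines described above: rescaling of the continuous variables to [0, 1], continuous-by-dummy interactions, dummy-by-dummy interactions, and powers up to degree 5. The column names (`re74`, `u74`, and so on), the helper names and the three toy records are assumptions made for the example. This simplified construction yields 65 columns, so it does not reproduce the 172 variables of the specification used here, which follows Farrell (2015).

```python
from itertools import combinations

# Hypothetical column names; the actual dataset may label them differently.
CONTINUOUS = ["age", "education", "re74", "re75"]
DUMMIES = ["black", "hispanic", "married", "nodegree", "u74", "u75"]


def rescale_unit_interval(values):
    """Linearly rescale a list of numbers to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def build_dictionary(rows, max_degree=5):
    """Return (names, matrix) for a list of records keyed by column name."""
    cols = {c: rescale_unit_interval([float(r[c]) for r in rows]) for c in CONTINUOUS}
    cols.update({d: [float(r[d]) for r in rows] for d in DUMMIES})

    features = dict(cols)  # raw covariates come first
    # Continuous-by-dummy interactions.
    for c in CONTINUOUS:
        for d in DUMMIES:
            features[f"{c}*{d}"] = [x * y for x, y in zip(cols[c], cols[d])]
    # Dummy-by-dummy interactions.
    for d1, d2 in combinations(DUMMIES, 2):
        features[f"{d1}*{d2}"] = [x * y for x, y in zip(cols[d1], cols[d2])]
    # Powers 2..max_degree of the rescaled continuous variables.
    for c in CONTINUOUS:
        for k in range(2, max_degree + 1):
            features[f"{c}^{k}"] = [x ** k for x in cols[c]]

    names = list(features)
    matrix = [[features[n][i] for n in names] for i in range(len(rows))]
    return names, matrix


# Toy records, made up for the example (not taken from the NSW or PSID samples).
rows = [
    {"age": 23, "education": 10, "re74": 0.0, "re75": 0.0,
     "black": 1, "hispanic": 0, "married": 0, "nodegree": 1, "u74": 1, "u75": 1},
    {"age": 35, "education": 12, "re74": 5200.0, "re75": 4100.0,
     "black": 0, "hispanic": 1, "married": 1, "nodegree": 0, "u74": 0, "u75": 0},
    {"age": 41, "education": 8, "re74": 900.0, "re75": 0.0,
     "black": 0, "hispanic": 0, "married": 1, "nodegree": 1, "u74": 0, "u75": 1},
]
names, matrix = build_dictionary(rows)
print(len(names), "columns for", len(matrix), "observations")
```

Because every base column lies in [0, 1] after rescaling, all interactions and powers also lie in [0, 1], which keeps the columns on a comparable scale before a penalized selection step.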
Table 2.5 displays the results. Columns (3)-(5) show estimators that give a credible value for the ATET with respect to the experimental benchmark. However, they differ in their variances. Farrell (2015) in its Lasso version and the immunized estimator achieve the lowest standard errors. Notably, they are the only two of the six estimators that display a significant, positive impact, as the experimental benchmark does. The immunized estimator offers a large improvement in bias and standard error over the naive plug-in estimator, which adds to the evidence given by the Monte Carlo experiment. The estimate obtained using Farrell (2015) shown in the table differs from the one displayed in the original paper because we have not automatically included the variables education, 1974 income and nodegree in the set of theory pre-selected covariates, as is done in the original paper. When doing so, the results are slightly better but not qualitatively different for this estimator; we chose not to do so because it would bias the comparison, given that the other estimators do not include a set of pre-selected variables. For the estimators in columns (2) to (6), the penalty parameters can potentially be tuned to obtain a better bias-variance trade-off. The OLS estimator in column (7) provides a benchmark from a very simple model that does not use any selection at all.

California Tobacco Control Program, Abadie et al. (2010)

Proposition 99 is one of the first and most ambitious large-scale tobacco control programs, implemented in 1989 in California. It includes a vast array of measures, including an increase in cigarette taxation of 25 cents per pack and a significant effort in prevention and education. In particular, the tax revenues generated by Proposition 99 were used to fund anti-smoking campaigns. Abadie et al. (2010) analyze the impact of the law on tobacco consumption in California. Since this program was only enforced in California, it is a classic example where the synthetic control method applies and more standard public policy evaluation tools cannot be used. It is possible to construct a synthetic California by reweighting other states so as to imitate California's behavior. For this purpose, Abadie et al. (2010) consider the following covariates: retail price of cigarettes, state log income per capita, percentage of the population aged 15-24, and per capita beer consumption (all 1980-1988 averages). Cigarette consumption in 1970 to 1975, 1980 and 1988 is also included. Using the same variables, we conduct the same analysis with our estimator. Figure 2.2 displays the estimated effect of Proposition 99 using the immunized estimator.
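To make the reweighting idea concrete, here is a minimal Python sketch of the standard synthetic control step: it looks for nonnegative weights summing to one such that the weighted average of the donor units' covariates is as close as possible to the treated unit's covariates. It is not the immunized estimator used in this section. The function name, the equal weighting of all covariates, the Frank-Wolfe solver and the toy numbers are assumptions made for the example; Abadie et al. (2010) additionally choose covariate importance weights.

```python
def synthetic_control_weights(treated, donors, n_iter=2000):
    """Simplex-constrained least squares via Frank-Wolfe with exact line search.

    treated: list of p covariate values for the treated unit.
    donors:  list of n lists, each holding the p covariate values of one donor.
    Returns n weights that are nonnegative and sum to one.
    """
    n, p = len(donors), len(treated)
    w = [1.0 / n] * n  # start from uniform weights
    for _ in range(n_iter):
        # Residual of the current synthetic unit against the treated unit.
        resid = [sum(w[j] * donors[j][k] for j in range(n)) - treated[k]
                 for k in range(p)]
        # Gradient of the squared error with respect to each weight.
        grad = [2.0 * sum(resid[k] * donors[j][k] for k in range(p))
                for j in range(n)]
        j_star = min(range(n), key=grad.__getitem__)
        # Move toward the vertex that puts all weight on donor j_star.
        direction = [(1.0 if j == j_star else 0.0) - w[j] for j in range(n)]
        x_dir = [sum(direction[j] * donors[j][k] for j in range(n))
                 for k in range(p)]
        denom = sum(v * v for v in x_dir)
        if denom == 0.0:
            break
        step = -sum(r * v for r, v in zip(resid, x_dir)) / denom
        step = max(0.0, min(1.0, step))  # stay inside the simplex
        if step == 0.0:  # no descent direction left: current weights are optimal
            break
        w = [wj + step * dj for wj, dj in zip(w, direction)]
    return w


# Toy example with made-up numbers: two covariates, three donor units.
treated = [0.5, 0.5]
donors = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
weights = synthetic_control_weights(treated, donors)
synthetic = [sum(wj * d[k] for wj, d in zip(weights, donors)) for k in range(2)]
print("weights:", [round(x, 3) for x in weights])
print("synthetic unit:", [round(x, 3) for x in synthetic])
```

Frank-Wolfe is used here only because it keeps the weights on the simplex without a projection step and needs no external solver; a quadratic programming routine would serve equally well.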


Inference Based on the Sum of Rank Statistics of Unit-Level Treatment Effects Estimates

Similar to Dube and Zipperer (2015), we propose a test based on the rank statistics of the unit-level treatment effects. Unlike the test in Dube and Zipperer (2015), we calculate the permutation distribution directly from the data. The test we employ is based on the sum of ranks of individual treatment effects in the ordered sample combining the n1(B + 1) unit-level treatment effects for the actual assignment and B random permutations. Individual treatment effects, T̂i, may be based on differences in outcomes between treated and synthetic controls,

Table of contents:

0 Substantial Summary in French
1 Introduction 
1 High-Dimension, Variable Selection and Immunization
1.1 The Post-Selection Inference Problem
1.2 State of the Art
1.3 Contribution
2 Machine Learning in Empirical Economics
2.1 State of the Art
2.2 Contribution
2.3 Perspectives
3 Synthetic Control, High-Dimension and Selection of the Control Group
3.1 State of the Art
3.2 Contribution
3.3 Perspectives
2 A Parametric Alternative to the Synthetic Control Method with Many
1 Introduction
2 A Parametric Alternative to Synthetic Control
2.1 Covariate Balancing Weights and Double Robustness
2.2 Asymptotic Properties in Low-Dimension
3 High-Dimensional Covariates and Post-Selection Inference
3.1 Regularized Estimation
3.2 Immunized Estimation
3.3 Asymptotic Properties
4 Simulations
5 Empirical Applications
5.1 Job Training Program, LaLonde (1986)
5.2 California Tobacco Control Program, Abadie et al. (2010)
6 Conclusion
7 Appendix A: Algorithm for Feasible Penalty Loadings
8 Appendix B: Proofs
3 A Penalized Synthetic Control Estimator for Disaggregated Data 
1 Introduction
2 Penalized Synthetic Control
2.1 Synthetic Control for Disaggregated Data
2.2 Penalized Synthetic Control
2.3 Bias-Corrected Synthetic Control
3 Large Sample Properties
3.1 Bias
3.2 Consistency
3.3 Asymptotic Normality
3.4 Asymptotic Behavior of S()
4 Permutation Inference
4.1 Inference on Aggregate Effects
4.2 Inference Based on the Sum of Rank Statistics of Unit-Level Treatment Effects Estimates
5 Penalty Choice
5.1 Leave-One-Out Cross-Validation of Post-Intervention Outcomes for the Untreated
5.2 Pre-Intervention Holdout Validation on the Outcomes of the Treated
6 Simulations
7 Empirical Applications
7.1 The Value of Connections in Turbulent Times, Acemoglu et al. (2016)
7.2 The Impact of Election Day Registration on Voter Turnout, Xu (2017)
8 Conclusion
9 Appendix: Proofs
4 Using Generic Machine Learning to Analyze Treatment Heterogeneity: An Application to Provision of Job Counseling 
1 Introduction
2 Machine Learning in Empirical Economics
3 Data and Experimental Design
3.1 Design of the Experiment
3.2 Data
4 Empirical Strategy
4.1 An Economic Model of Treatment Allocation
4.2 Methodological Aspects
5 Results
5.1 Detection of Heterogeneity
5.2 Dimension of Heterogeneity (CLAN)
5.3 Selection into the Treatment
6 Conclusion
7 Appendix: Descriptive Statistics
8 Appendix: Adaptation of Th. 2.1 in Chernozhukov et al. (2018b)
Bibliography
