Di Francesco Riccardo
CEIS Research Paper
In this paper, I propose a data-driven approach to discover heterogeneous subpopulations in a selection-on-observables framework that avoids the risk of data snooping and the drawbacks of pre-analysis plans. The approach constructs partitions of the population in a completely nonparametric fashion and can handle covariate spaces of arbitrary dimension and arbitrary patterns of interaction among covariates. I exploit estimated unit-level treatment effects to grow and prune an “aggregation tree” that aggregates observations into groups. This approach formalizes the trade-off between parsimony and granularity implicit in the aggregation process. Varying the key parameter of the assumed cost-complexity criterion generates a sequence of “optimal” partitions, one for each level of granularity. The resulting sequence is nested, as previous groupings are never undone when moving to coarser levels. I illustrate the use of the proposed methodology through an empirical exercise that revisits the effects of maternal smoking on infants’ weight.
Keywords: Causality, conditional average treatment effects, recursive partitioning, subgroup discovery, subgroup analysis
JEL codes: C29, C45, C55
Date: Thursday, December 15, 2022
Revision Date: Thursday, December 15, 2022
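
The grow-and-prune idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single covariate, uses made-up numbers in place of estimated unit-level effects, and applies the textbook CART-style weakest-link rule as a stand-in for the cost-complexity criterion. All names (`Node`, `grow`, `prune_sequence`, `x`, `tau`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    idx: List[int]                    # units that fall into this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(tau, idx):
    """Within-group sum of squared deviations of the effects."""
    mean = sum(tau[i] for i in idx) / len(idx)
    return sum((tau[i] - mean) ** 2 for i in idx)


def grow(x, tau, idx, min_leaf=1):
    """Recursively split on x wherever a split lowers the within-group SSE."""
    node = Node(idx)
    order = sorted(idx, key=lambda i: x[i])
    best = None
    for k in range(min_leaf, len(order) - min_leaf + 1):
        if x[order[k - 1]] == x[order[k]]:
            continue                  # cannot split between equal covariate values
        left, right = order[:k], order[k:]
        cost = sse(tau, left) + sse(tau, right)
        if best is None or cost < best[0]:
            best = (cost, left, right)
    if best is not None and best[0] < sse(tau, idx) - 1e-12:
        node.left = grow(x, tau, best[1], min_leaf)
        node.right = grow(x, tau, best[2], min_leaf)
    return node


def leaves(node):
    if node.left is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


def internal(node):
    if node.left is None:
        return []
    return [node] + internal(node.left) + internal(node.right)


def prune_sequence(root, tau):
    """Collapse the weakest internal node repeatedly; record each partition."""
    def link_strength(t):
        # SSE increase per leaf removed if the subtree rooted at t is collapsed
        lv = leaves(t)
        return (sse(tau, t.idx) - sum(sse(tau, l.idx) for l in lv)) / (len(lv) - 1)

    seq = [(0.0, [sorted(l.idx) for l in leaves(root)])]
    while root.left is not None:
        weakest = min(internal(root), key=link_strength)
        alpha = link_strength(weakest)
        weakest.left = weakest.right = None   # merge the subtree into one group
        seq.append((alpha, [sorted(l.idx) for l in leaves(root)]))
    return seq


# Hypothetical inputs: one covariate and made-up unit-level effects.
x = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
tau = [1.0, 1.1, 0.9, 3.0, 3.2, 5.0]
seq = prune_sequence(grow(x, tau, list(range(len(x)))), tau)
for alpha, groups in seq:
    print(f"penalty {alpha:.3f}: {len(groups)} groups {groups}")
```

Because each step only merges leaves that sit under a common node, every coarser partition is a union of groups from the finer one, which is the nesting property described in the abstract.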