Improving control over unobservables with network data. Link
Unobserved variables often threaten the causal interpretation of empirical estimates. An opportunity to alleviate this concern lies in network datasets, which provide a rich source of information about individual characteristics insofar as they influence network formation. This paper develops the idea of controlling for unobserved confounders by leveraging network structures that exhibit homophily, a frequently observed tendency to form edges with similar nodes. This is formally accomplished under two main frameworks. First, I introduce a concept of asymptotic homophily, according to which individuals’ selectivity is at scale with the size of the potential connection pool. This contributes to the network formation literature with a model that can accommodate common features of empirical networks such as homophily, sparsity, and clustering, and allows me to show that an estimator that considers neighbors as a comparison group is consistent for the Conditional Average Treatment Effect (CATE). I then consider a setting without asymptotic homophily and show how selecting connected individuals whose observed characteristics made such a connection less likely delivers an estimator with similar properties. Overall, the method allows for nonparametric treatment effect inference for both CATE and Average Treatment Effect (ATE) under a version of unconfoundedness that conditions on unobservables, which is often more credible than selection on observables alone. In an application, I recover an estimate of the effect of parental involvement on students’ test scores that is greater than that of OLS, arguably due to the estimator’s ability to account for unobserved ability and motivation.
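A minimal simulation sketch of the neighbor-comparison idea, under strong simplifying assumptions that are not taken from the paper: a single unobserved confounder `u`, a constant treatment effect `tau`, and edges that form only between individuals whose `u` values differ by less than a hypothetical `bandwidth`. It is a toy illustration of why homophily on unobservables helps, not the paper's estimator.

```python
import numpy as np

def simulate(n=1500, tau=1.0, bandwidth=0.02, seed=0):
    """Toy data: u confounds treatment and outcome; edges link similar u (homophily)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)                   # unobserved confounder
    d = rng.uniform(size=n) < u               # treatment is more likely when u is high
    y = tau * d + 2.0 * u + rng.normal(size=n)
    adj = np.abs(u[:, None] - u[None, :]) < bandwidth
    np.fill_diagonal(adj, False)              # no self-links
    return y, d, adj

def naive_difference(y, d):
    """Difference in means, ignoring the confounder."""
    return y[d].mean() - y[~d].mean()

def neighbor_comparison(y, d, adj):
    """Compare each treated node with the average outcome of its untreated neighbors."""
    links = adj[d][:, ~d].astype(float)       # rows: treated nodes, columns: untreated nodes
    counts = links.sum(axis=1)
    has_control = counts > 0                  # drop treated nodes with no untreated neighbor
    neighbor_mean = links[has_control] @ y[~d] / counts[has_control]
    return float((y[d][has_control] - neighbor_mean).mean())

y, d, adj = simulate()
naive = naive_difference(y, d)
neighbor = neighbor_comparison(y, d, adj)
print(f"true effect 1.0 | naive {naive:.2f} | neighbor comparison {neighbor:.2f}")
```

In this design the naive contrast absorbs the difference in `u` between treated and untreated individuals, while neighbors share nearly the same `u` by construction, so the within-neighborhood contrast is much closer to `tau`.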
Using spatial modeling to address covariate measurement error, with Susanne M. Schennach; revised and resubmitted to the Journal of Econometrics. Link
We propose a new estimation methodology to address the presence of covariate measurement error by exploiting the availability of spatial data. The approach uses neighboring observations as repeated measurements, after suitably controlling for the random distance between the observations in a way that allows the use of operator diagonalization methods to establish identification. The method is applicable to general nonlinear models with potentially nonclassical errors and does not rely on a priori distributional assumptions regarding any of the variables. The method's implementation combines a sieve semiparametric maximum likelihood with a first-step kernel conditional density estimator and simulation methods. The method's effectiveness is illustrated through both controlled simulations and an application to the assessment of the effect of pre-colonial political structure on current economic development in Africa.
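To give a sense of why a neighboring observation can serve as a repeated measurement, here is a sketch of the simplest linear, classical-error case. It swaps in the textbook instrumental-variable correction rather than the sieve maximum likelihood procedure described above. It also assumes, purely for illustration, that the neighbor shares the same latent covariate and has an independent error, and it ignores the distance between observations. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x_true = rng.normal(size=n)                  # latent covariate (never observed)
x_own = x_true + rng.normal(size=n)          # own mismeasured covariate
x_neighbor = x_true + rng.normal(size=n)     # neighbor's reading, independent error
y = 2.0 * x_true + rng.normal(size=n)        # true slope is 2

def slope_ols(y, x):
    """OLS slope of y on x; attenuated toward zero when x is mismeasured."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def slope_iv(y, x, z):
    """IV slope using the second measurement z as an instrument for x."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

ols = slope_ols(y, x_own)
iv = slope_iv(y, x_own, x_neighbor)
print(f"true slope 2.0 | OLS {ols:.2f} | IV with neighbor measurement {iv:.2f}")
```

With equal signal and noise variances, OLS is attenuated by roughly one half, while the second measurement restores the slope. The nonlinear, nonclassical setting of the paper requires the more general identification argument.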
Definition and Estimation of Peer Effects through Latent Processes. Link
I propose a framework to analyze peer effects by modeling a latent sequence of decisions in continuous time. The method avoids linear-in-means regression or regression on conditional expectations -- and thus reflection-type problems (Manski, 1993) -- by modeling the (unobserved) direction of causality, whose probability can be identified. I propose a parsimonious parametric specification to incorporate covariates and define a peer effect parameter meant to capture the causal peer influence of first-movers. The parameters are shown to be consistently estimated by maximum likelihood, and the framework lends itself to standard inference under repeated-network asymptotics.
Estimation of Independent Component Analysis Systems. Link
Although approaches to Independent Component Analysis (ICA) based on the characteristic function are usually deemed theoretically elegant, they are known to suffer from severe implementation challenges because of numerical integration steps or the selection of tuning parameters. Leveraging results from the continuum Generalized Method of Moments of Carrasco and Florens (2000), I derive an optimally weighted objective function that can take a tractable form and thus bypass these concerns. The method shares advantages with characteristic-function approaches: it does not require the existence of higher-order moments or parametric restrictions, and it can achieve asymptotic efficiency. The results are adapted to handle a possible first step that delivers estimated sensors. Finally, the method delivers a specification test, which is valuable in many ICA applications. The method's effectiveness is illustrated through simulations, where the estimator outperforms efficient GMM and fastICA, and through an application to the estimation of Structural Vector Autoregressions (SVAR), a popular model in the econometric time series literature.
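For readers unfamiliar with characteristic-function criteria, the sketch below shows a naive grid-based version for a two-dimensional rotation problem. It is not the optimally weighted objective derived in the paper: the evaluation grid, the unweighted sum, and the function names are illustrative assumptions. The grid is exactly the kind of tuning choice the paper's approach is designed to avoid.

```python
import numpy as np

def rotation(theta):
    """2-D rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def cf_dependence(z, grid):
    """Sum over the grid of |joint ECF - product of marginal ECFs|^2 for n x 2 data z."""
    e1 = np.exp(1j * np.outer(z[:, 0], grid))            # n x m
    e2 = np.exp(1j * np.outer(z[:, 1], grid))            # n x m
    joint = e1.T @ e2 / len(z)                           # m x m joint empirical CF
    product = np.outer(e1.mean(axis=0), e2.mean(axis=0)) # m x m product of marginals
    return float(np.sum(np.abs(joint - product) ** 2))

rng = np.random.default_rng(0)
n, theta0 = 4000, 0.5
sources = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 2))  # independent, unit variance
mixed = sources @ rotation(theta0).T                          # each row is R(theta0) @ s

grid = np.linspace(-2.0, 2.0, 9)                              # hypothetical tuning choice
at_truth = cf_dependence(mixed @ rotation(theta0), grid)      # rows become R(theta0).T @ x
at_zero = cf_dependence(mixed, grid)                          # no unmixing applied

angles = np.linspace(0.0, np.pi / 2, 90, endpoint=False)
best = angles[np.argmin([cf_dependence(mixed @ rotation(a), grid) for a in angles])]
print(f"criterion at true angle {at_truth:.4f}, at zero {at_zero:.4f}, grid minimizer {best:.2f}")
```

The criterion is close to zero only when the recovered components are independent, so it is smaller at the true unmixing angle than when no unmixing is applied. The search is restricted to a quarter turn because sign and ordering of the components are not identified.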
Optimally-Transported Generalized Method of Moments, with Susanne M. Schennach; revise and resubmit at Econometrica. Link
We propose a novel optimal transport-based version of the Generalized Method of Moments (GMM). Instead of handling overidentified models by reweighting the data until all moment conditions are satisfied (as in Generalized Empirical Likelihood methods), this method proceeds by introducing measurement error of the smallest mean-square magnitude necessary to simultaneously satisfy all moment conditions. This approach, based on the notion of optimal transport, aims to address the problem of assigning a logical interpretation to GMM results even when overidentification tests reject the null, a situation that cannot always be avoided in applications.
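A toy worked case may help fix ideas. It is an illustrative assumption rather than an example from the paper: a single parameter $\theta$ with two mean restrictions, where the smallest mean-square displacement of the data can be found by hand.

```latex
% Toy overidentified model: observations (x_i, z_i), i = 1, ..., n,
% with moment conditions E[x - \theta] = 0 and E[z - \theta] = 0.
% The displacements a_i, b_i play the role of the introduced measurement error.
\begin{align*}
\min_{\theta,\,\{a_i,\, b_i\}} \ & \frac{1}{n}\sum_{i=1}^{n}\left(a_i^2 + b_i^2\right)
\quad \text{s.t.} \quad
\frac{1}{n}\sum_{i=1}^{n}(x_i + a_i) = \theta, \qquad
\frac{1}{n}\sum_{i=1}^{n}(z_i + b_i) = \theta. \\
\intertext{For fixed $\theta$, the cheapest displacement is a common shift of each variable,}
a_i &= \theta - \bar{x}, \qquad b_i = \theta - \bar{z}, \\
\intertext{so the objective reduces to $(\theta - \bar{x})^2 + (\theta - \bar{z})^2$, which is minimized at}
\hat{\theta} &= \frac{\bar{x} + \bar{z}}{2}.
\end{align*}
```

In this linear toy case the answer coincides with identity-weighted GMM. The point of the example is only the interpretation: the data are moved as little as possible until both restrictions hold, even when $\bar{x} \neq \bar{z}$ and the restrictions are jointly rejected.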