Job Market Paper

Improving control over unobservables with network data.

Unobserved variables often threaten the causal interpretation of empirical estimates. An opportunity to alleviate this concern lies in network datasets, which provide a rich source of information about individual characteristics insofar as they influence network formation. This paper develops the idea of controlling for unobserved confounders by leveraging network structures that exhibit homophily, a frequently observed tendency to form edges with similar nodes. This is formally accomplished under two main frameworks. First, I introduce a concept of asymptotic homophily, according to which individuals’ selectivity scales with the size of the potential connection pool. This contributes to the network formation literature with a model that can accommodate common features of empirical networks such as homophily, sparsity, and clustering, and allows me to show that an estimator that considers neighbors as a comparison group is consistent for the Conditional Average Treatment Effect (CATE). I then consider a setting without asymptotic homophily and show how selecting connected individuals whose observed characteristics made such a connection less likely delivers an estimator with similar properties. Overall, the method allows for nonparametric treatment effect inference for both CATE and Average Treatment Effect (ATE) under a version of unconfoundedness that conditions on unobservables, which is often more credible than selection on observables alone. In an application, I recover an estimate of the effect of parental involvement on students’ test scores that is greater than that of OLS, arguably due to the estimator’s ability to account for unobserved ability and motivation.
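As a rough illustration of what using neighbors as a comparison group can look like, the following Python sketch contrasts each treated node's outcome with the mean outcome of its untreated neighbors and averages those contrasts. It is a toy under stated assumptions, not the estimator developed in the paper: the function name `neighbor_contrast`, the dictionary-based network representation, and the data are all hypothetical, and the sketch ignores the weighting, conditioning, and asymptotic arguments that the paper relies on.

```python
from statistics import mean


def neighbor_contrast(adjacency, treated, outcome):
    """Average, over treated nodes, of own outcome minus the mean
    outcome of untreated neighbors (hypothetical illustration only)."""
    contrasts = []
    for node, neighbors in adjacency.items():
        if not treated[node]:
            continue  # only treated nodes are matched to their neighbors
        controls = [outcome[j] for j in neighbors if not treated[j]]
        if controls:  # skip treated nodes with no untreated neighbor
            contrasts.append(outcome[node] - mean(controls))
    if not contrasts:
        raise ValueError("no treated node has an untreated neighbor")
    return mean(contrasts)


# Made-up undirected network: edges a-b, a-c, c-d.
adjacency = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
treated = {"a": True, "b": False, "c": False, "d": True}
outcome = {"a": 5.0, "b": 2.0, "c": 4.0, "d": 6.0}

estimate = neighbor_contrast(adjacency, treated, outcome)
print(estimate)
```

The intuition being illustrated is that, under homophily, neighbors tend to share unobserved characteristics, so a within-neighborhood contrast can difference out part of the confounding that a raw treated-versus-untreated comparison would retain.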

Research Papers

Using spatial modeling to address covariate measurement error, with Susanne M. Schennach; revised and resubmitted to the Journal of Econometrics.

We propose a new estimation methodology to address the presence of covariate measurement error by exploiting the availability of spatial data. The approach uses neighboring observations as repeated measurements, after suitably controlling for the random distance between the observations in a way that allows the use of operator diagonalization methods to establish identification. The method is applicable to general nonlinear models with potentially nonclassical errors and does not rely on a priori distributional assumptions regarding any of the variables. The method's implementation combines a sieve semiparametric maximum likelihood estimator with a first-step kernel conditional density estimator and simulation methods. The method's effectiveness is illustrated through both controlled simulations and an application to the assessment of the effect of pre-colonial political structure on current economic development in Africa.

Definition and Estimation of Peer Effects through Latent Processes.

I propose a framework to analyze peer effects in continuous time using latent exponential stochastic processes. The method avoids 'outcomes on means' regression and thus reflection-type problems (Manski, 1993) by constructing a likelihood function that recognizes the temporal ordering in causality and accounts for every possible causal sequence of events. I define a peer effect parameter at the individual level, which is meant to capture causal peer influence relationships. The parameter (and possibly the covariates' coefficients) is shown to be consistently estimated by maximum likelihood methods and lends itself to standard inference.

Estimation of Independent Component Analysis Systems.

I propose an approach to Independent Component Analysis (ICA) with a square mixing matrix that does not require the existence of higher-order moments or parametric restrictions, handles estimated sensors explicitly, and can achieve asymptotic efficiency. The estimator is shown to be consistent and asymptotically normal, with an asymptotic variance that can be consistently estimated. The approach is an application of the continuum Generalized Method of Moments of Carrasco and Florens (2000) and also delivers a global specification test, which is valuable in many ICA applications. The method's effectiveness is illustrated through simulations, where the estimator outperforms efficient GMM and fastICA, and an application to the estimation of Structural Vector Autoregressions (SVAR), a popular model in the econometric time series literature.

Optimally-Transported Generalized Method of Moments, with Susanne M. Schennach.

We propose a novel optimal transport-based version of the Generalized Method of Moments (GMM). Instead of handling overidentified models by reweighting the data until all moment conditions are satisfied (as in Generalized Empirical Likelihood methods), this method proceeds by introducing measurement error of the least mean square magnitude necessary to simultaneously satisfy all moment conditions. This approach, based on the notion of optimal transport, aims to address the problem of assigning a logical interpretation to GMM results even when overidentification tests reject the null, a situation that cannot always be avoided in applications.
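One way to read the idea of introducing the smallest mean-square measurement error that satisfies every moment condition is as a constrained minimization. The display below is only a sketch in assumed notation, none of which appears in the abstract: $x_i$ denotes the observed data, $z_i$ the corresponding error-corrected points, $g$ the moment function, and $\theta$ the parameter. The paper's exact formulation may differ.

```latex
\min_{\theta,\; z_1,\dots,z_n} \; \frac{1}{n}\sum_{i=1}^{n} \lVert z_i - x_i \rVert^{2}
\quad \text{subject to} \quad
\frac{1}{n}\sum_{i=1}^{n} g(z_i,\theta) = 0 .
```

In this reading, reweighting approaches keep the points $x_i$ fixed and change their weights, whereas here the weights stay equal and the points themselves are moved as little as possible.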