Research Papers

Improving control over unobservables with network data.

Abstract

Unobserved variables often threaten the causal interpretation of empirical estimates. An opportunity to alleviate this concern lies in network datasets, which provide a rich source of information about individual characteristics insofar as they influence network formation. This paper develops the idea of controlling for unobserved confounders by leveraging network structures that exhibit homophily, a frequently observed tendency to form edges with similar nodes. This is formally accomplished under two main frameworks. First, I introduce a concept of asymptotic homophily, according to which individuals’ selectivity is at scale with the size of the potential connection pool. This contributes to the network formation literature with a model that can accommodate common features of empirical networks such as homophily, sparsity, and clustering, and allows me to show that an estimator that considers neighbors as a comparison group is consistent for the Conditional Average Treatment Effect (CATE). I then consider a setting without asymptotic homophily and show how selecting connected individuals whose observed characteristics made such a connection less likely delivers an estimator with similar properties. Overall, the method allows for nonparametric treatment effect inference for both CATE and Average Treatment Effect (ATE) under a version of unconfoundedness that conditions on unobservables, which is often more credible than selection on observables alone. In an application, I recover an estimate of the effect of parental involvement on students’ test scores that is greater than that of OLS, arguably due to the estimator’s ability to account for unobserved ability and motivation.
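
As a rough illustration of the neighbors-as-comparison-group idea, the Python sketch below contrasts each treated node's outcome with the mean outcome of its untreated neighbors and averages those contrasts. It is a toy under stated assumptions, not the paper's estimator: the function name `neighbor_contrast`, the binary treatment, the symmetric 0/1 adjacency matrix, and the unweighted averaging are all hypothetical choices, and no conditioning on covariates is attempted.

```python
import numpy as np


def neighbor_contrast(adj, treated, y):
    """Average of (treated outcome - mean outcome of untreated neighbors).

    Hypothetical helper for illustration only.
    adj:     (n, n) symmetric 0/1 adjacency matrix
    treated: (n,) 0/1 treatment indicator
    y:       (n,) outcomes
    """
    adj = np.asarray(adj, dtype=bool)
    treated = np.asarray(treated, dtype=bool)
    y = np.asarray(y, dtype=float)

    contrasts = []
    for i in np.flatnonzero(treated):
        # Untreated neighbors of node i serve as its comparison group.
        controls = adj[i] & ~treated
        if controls.any():
            contrasts.append(y[i] - y[controls].mean())

    if not contrasts:
        raise ValueError("no treated node has an untreated neighbor")
    return float(np.mean(contrasts))


# Tiny made-up network: edges 0-1 and 2-3, nodes 0 and 2 treated.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])
treated = np.array([1, 0, 1, 0])
y = np.array([5.0, 3.0, 7.0, 4.0])
estimate = neighbor_contrast(adj, treated, y)
```

The comparison is only meaningful to the extent that homophily makes neighbors similar on the unobserved confounders, which is the premise the abstract formalizes.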

Using spatial modeling to address covariate measurement error, with Susanne M. Schennach; revisions requested (3rd round) at the Journal of Econometrics.

Abstract

We propose a new estimation methodology to address the presence of covariate measurement error by exploiting the availability of spatial data. The approach uses neighboring observations as repeated measurements, after suitably controlling for the random distance between the observations in a way that allows the use of operator diagonalization methods to establish identification. The method is applicable to general nonlinear models with potentially nonclassical errors and does not rely on a priori distributional assumptions regarding any of the variables. The method's implementation combines a sieve semiparametric maximum likelihood with a first-step kernel conditional density estimator and simulation methods. The method's effectiveness is illustrated through both controlled simulations and an application to the assessment of the effect of pre-colonial political structure on current economic development in Africa.

Peer effect analysis with latent processes.

Abstract

I study peer effects that arise from irreversible decisions in the absence of a standard social equilibrium. I model a latent sequence of decisions in continuous time and obtain a closed-form expression for the likelihood, which allows estimation of the proposed causal estimands. The method avoids regression on conditional expectations or linear-in-means regression – and thus reflection-type problems (Manski, 1993) or simultaneity issues – by modeling the (unobserved) realized direction of causality, whose probability is identified. Under a parsimonious parametric specification, I introduce a peer effect parameter meant to capture the causal influence of first-movers on their peers. Various forms of peer effect heterogeneity can be accommodated. Parameters are shown to be consistently estimated by maximum likelihood methods and lend themselves to standard inference.

Estimation of Independent Component Analysis Systems.

Abstract

Although approaches to Independent Component Analysis (ICA) based on characteristic functions seem theoretically elegant, they may suffer from implementation challenges because of numerical integration steps or the selection of tuning parameters. Extending previously considered objective functions and leveraging results from the continuum Generalized Method of Moments of Carrasco and Florens (2000), I derive an optimal estimator that can take a tractable form and thus bypass these concerns. The method shares advantages with characteristic function approaches -- it does not require the existence of higher-order moments or parametric restrictions -- while retaining computational feasibility and asymptotic efficiency. The results are adapted to handle a possible first step that delivers estimated sensors. Finally, a by-product of the approach is a specification test that is valuable in many ICA applications. The method's effectiveness is illustrated through simulations, where the estimator outperforms efficient GMM, JADE, and FastICA, and an application to the estimation of Structural Vector Autoregressions (SVAR), a workhorse of the macroeconometric time series literature.
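
To make the characteristic-function idea concrete, the sketch below computes a generic independence contrast for two candidate components: the average squared gap between the empirical joint characteristic function and the product of the empirical marginals over a grid of frequencies. This is the textbook-style objective, not the optimal estimator derived in the paper. The function name `cf_dependence`, the grid, the uniform sources, and the 45-degree mixing are all assumptions made for the example. The grid itself is exactly the sort of tuning choice the abstract describes as a practical concern.

```python
import numpy as np


def cf_dependence(s, grid):
    """Mean squared gap between joint and product-of-marginals empirical CFs.

    Hypothetical helper for illustration only.
    s:    (n, 2) array of candidate components
    grid: 1-D array of frequencies used for both coordinates
    """
    s = np.asarray(s, dtype=float)
    t1, t2 = np.meshgrid(grid, grid, indexing="ij")
    t = np.stack([t1.ravel(), t2.ravel()], axis=1)       # (m, 2) frequency pairs

    joint = np.exp(1j * (s @ t.T)).mean(axis=0)           # empirical joint CF
    marg1 = np.exp(1j * np.outer(s[:, 0], t[:, 0])).mean(axis=0)
    marg2 = np.exp(1j * np.outer(s[:, 1], t[:, 1])).mean(axis=0)

    # Under independence the joint CF factorizes, so the gap should be near zero.
    return float(np.mean(np.abs(joint - marg1 * marg2) ** 2))


rng = np.random.default_rng(0)
n = 5000
# Independent, unit-variance, non-Gaussian sources (an assumption of this demo).
sources = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 2))

# Mix the sources with a 45-degree rotation; the mixtures are uncorrelated but dependent.
angle = np.pi / 4
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
mixed = sources @ rotation.T

grid = np.array([-1.5, -0.75, 0.75, 1.5])
gap_sources = cf_dependence(sources, grid)
gap_mixed = cf_dependence(mixed, grid)
```

In this demo the contrast should be much smaller for the true sources than for the rotated mixtures, so minimizing it over candidate unmixing matrices would point back toward the sources.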

Optimally-Transported Generalized Method of Moments, with Susanne M. Schennach; conditionally accepted at Econometrica.

Abstract

We propose a novel optimal transport-based version of the Generalized Method of Moments (GMM). Instead of handling overidentification by reweighting the data to satisfy the moment conditions (as in Generalized Empirical Likelihood methods), this method proceeds by allowing for errors in the variables of the least mean-square magnitude necessary to simultaneously satisfy all moment conditions. This approach, based on the notions of optimal transport and Wasserstein metric, aims to address the problem of assigning a logical interpretation to GMM results even when overidentification tests reject the null, a situation that cannot always be avoided in applications. We illustrate the method by revisiting Duranton, Morrow and Turner's (2014) study of the relationship between a city's exports and the extent of its transportation infrastructure. Our results corroborate theirs under weaker assumptions and provide insight into the error structure of the variables.
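
A toy case may help fix ideas. Suppose k columns of data are each meant to measure the same mean θ, giving the overidentified moment conditions E[z_j − θ] = 0 for j = 1, ..., k. The sketch below finds the θ and the perturbed data z that satisfy all sample moments while keeping the mean-square perturbation of the data as small as possible. For these linear moments the smallest perturbation shifts each column by a constant, so the problem has a closed form; that closed form is worked out for this toy only and is an assumption of the sketch, not a result taken from the paper. The function name `common_mean_ot` is hypothetical.

```python
import numpy as np


def common_mean_ot(x):
    """Least mean-square data perturbation making all column means equal.

    Hypothetical helper for illustration only.
    x: (n, k) array, each column a noisy measurement of the same mean.
    Returns (theta, z, cost), where z is the perturbed data and cost is the
    mean over observations of the squared perturbation norm.
    """
    x = np.asarray(x, dtype=float)
    col_means = x.mean(axis=0)

    # For a fixed theta, the cheapest way to move column j's mean to theta is a
    # constant shift, costing (col_mean_j - theta)^2 per observation.
    # Summing over columns and minimizing over theta gives the average of the means.
    theta = float(col_means.mean())
    shift = col_means - theta

    z = x - shift                       # perturbed data satisfying every moment
    cost = float(np.sum(shift ** 2))    # mean-square size of the perturbation
    return theta, z, cost


x = np.array([[1.0, 3.0],
              [3.0, 5.0]])
theta, z, cost = common_mean_ot(x)
```

The size of `cost` indicates how far the data must be moved to reconcile the moment conditions, which is loosely the kind of information about the error structure that the abstract refers to.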