I am a Postdoctoral Fellow at Stanford Data Science, where I work with Guido Imbens.
I am on the 2025-2026 academic job market.
I work on machine learning methods for causal inference, with broad applications in economics. My recent research has focused on debiased machine learning, including for instrumental variables regression and reinforcement learning. I also apply these causal machine learning methods to a range of economic questions, including household responses to income shocks and demand estimation.
Previously, I completed my PhD in Computer Science at UC Berkeley, advised by Avi Feller and Emi Nakamura. In addition to Computer Science, I completed the core PhD courses in Economics as a Berkeley Opportunity Lab Labor Science Fellow. I worked with Alex D'Amour at Google DeepMind from 2023 to 2024. My email is causal@stanford.edu.
David Bruns-Smith
The growing access to large administrative datasets with rich covariates presents an opportunity to revisit classic two-stage least squares (2SLS) applications with machine learning (ML). We develop Two-Stage Machine Learning, a simple and efficient estimator for nonparametric instrumental variables (NPIV) regression. Our method uses ML models to flexibly estimate nonparametric treatment effects while avoiding the computational complexity and statistical instability of existing machine learning NPIV approaches. Our procedure has two steps: first, we predict the outcomes given instruments and covariates (the reduced form) and extract a basis from this predictor; second, we predict the outcomes using the treatment and covariates, but where the predictions are projected onto the learned basis of instruments. We prove that under a testable condition, our estimation error depends entirely on the reduced-form prediction task, where ML methods excel. We also develop a bias correction procedure that provides valid confidence intervals for scalar summaries like average derivatives. In an empirical application to California supermarket data featuring bunching at 99-ending price points, we find our machine learning approach is crucial for modeling discontinuities in demand at the dollar boundary: we reduce NPIV estimation error nearly seven-fold compared to previous estimators and estimate a price elasticity that is 2.5-6 times larger than prior estimates.
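A minimal sketch of the two-step procedure in Python (the synthetic data, the neural-net reduced form, and the treatment basis are illustrative assumptions, not the paper's implementation):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy data with an endogenous treatment (all choices illustrative).
rng = np.random.default_rng(0)
n = 2000
Z = rng.normal(size=(n, 1))                       # instrument
X = rng.normal(size=(n, 1))                       # covariate
U = rng.normal(size=n)                            # unobserved confounder
D = 0.8 * Z[:, 0] + 0.5 * U + rng.normal(size=n)  # treatment
Y = np.sin(D) + X[:, 0] + U + rng.normal(size=n)  # outcome

# Step 1 (reduced form): predict Y from (Z, X) with an ML model and
# extract a basis from the predictor -- here, the hidden-layer
# activations of the fitted network.
ZX = np.column_stack([Z, X])
net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000,
                   random_state=0).fit(ZX, Y)
B = np.maximum(ZX @ net.coefs_[0] + net.intercepts_[0], 0.0)

# Step 2: regress Y on treatment features Phi(D, X), with the fit
# projected onto the learned instrument basis B. Algebraically this
# is 2SLS with B as the instrument basis: regress Y on P_B @ Phi.
Phi = np.column_stack([D, D**2, D**3, X])         # hypothetical basis
Phi_hat = B @ np.linalg.lstsq(B, Phi, rcond=None)[0]
beta = np.linalg.lstsq(Phi_hat, Y, rcond=None)[0]
```

Per the abstract, a testable condition then guarantees that the estimation error of this second stage is governed by the quality of the step-1 reduced-form fit, which is exactly the kind of prediction task ML models handle well.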
A note on Computer Science publications for non-Computer Scientists: machine learning research is primarily published at selective refereed conferences such as NeurIPS, ICML, AISTATS, and FAccT. The "top-tier" conferences are NeurIPS, ICML, and ICLR.
Journal of the Royal Statistical Society Series B: Statistical Methodology, 2025
David Bruns-Smith, Oliver Dukes, Avi Feller, and Elizabeth L. Ogburn
* Royal Statistical Society Discussion Paper (~1-2 JRSSB papers are selected for discussion each year)
* Presented as a keynote at the RSS Annual Conference
We provide a novel characterization of augmented balancing weights, also known as automatic debiased machine learning. These popular doubly robust estimators combine outcome modelling with balancing weights—weights that achieve covariate balance directly instead of estimating and inverting the propensity score. When the outcome and weighting models are both linear in some (possibly infinite) basis, we show that the augmented estimator is equivalent to a single linear model with coefficients that combine those of the original outcome model with those from unpenalized ordinary least-squares (OLS). Under certain choices of regularization parameters, the augmented estimator in fact collapses to the OLS estimator alone. We then extend these results to specific outcome and weighting models. We first show that the augmented estimator that uses (kernel) ridge regression for both outcome and weighting models is equivalent to a single, undersmoothed (kernel) ridge regression—implying a novel analysis of undersmoothing. When the weighting model is instead lasso-penalized, we demonstrate a familiar ‘double selection’ property. Our framework opens the black box on this increasingly popular class of estimators, bridges the gap between existing results on the semiparametric efficiency of undersmoothed and doubly robust estimators, and provides new insights into the performance of augmented balancing weights.
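The core equivalence is easy to verify numerically. Below is a minimal sketch for a linear outcome model with ridge-penalized balancing weights; the normalization and penalty conventions are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 8
Xs = rng.normal(size=(n, d))                    # source covariates
y = Xs @ rng.normal(size=d) + rng.normal(size=n)
xt = rng.normal(size=d)                         # target covariate mean
lam_m, lam_w = 1.0, 0.05                        # ridge penalties

def ridge(X, y, lam):
    A = X.T @ X / len(X) + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y / len(X))

beta = ridge(Xs, y, lam_m)                      # outcome model
gamma = np.linalg.solve(Xs.T @ Xs / n + lam_w * np.eye(d), xt)
w = Xs @ gamma / n                              # balancing weights

# Augmented estimator: plug-in prediction plus weighted residuals ...
aug = xt @ beta + w @ (y - Xs @ beta)

# ... equals a single linear model whose coefficients add a ridge
# regression of the outcome-model residuals to the original ones.
delta = ridge(Xs, y - Xs @ beta, lam_m * 0 + lam_w)
assert np.isclose(aug, xt @ (beta + delta))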
To appear at NeurIPS 2025
David Bruns-Smith, Zhongming Xie, and Avi Feller
* Spotlight paper, 3% acceptance rate
Multiaccuracy (Kim et al., 2019), developed in fair machine learning, provides a framework for reducing predictive bias uniformly over subpopulations defined by an auditing class. Recent work shows that multiaccurate estimators trained only on source data can remain low-bias under unknown covariate shifts—a property known as "Universal Adaptability" (Kim et al., 2022). Building on this, we show that when the auditing class is a Hilbert space, a simple multiaccurate ridge-boosting estimator is numerically equivalent to an "automatic debiased machine learning" estimator. As a result, it inherits semiparametric efficiency and optimal asymptotic variance. In this setting, the multiaccurate estimator is not only robust under unknown shifts—it is also efficient, even without access to data from the target distribution.
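A stylized version of ridge boosting, with a finite-dimensional linear auditing class standing in for the Hilbert space (data and parameter choices are illustrative):

```python
import numpy as np

def ridge_boost(X, y, lam, steps=50):
    """Multiaccuracy by boosting: repeatedly fit a ridge 'auditor' to
    the current residuals over the auditing features and add the
    fitted auditor into the predictor."""
    n, d = X.shape
    coef = np.zeros(d)
    A = X.T @ X / n + lam * np.eye(d)
    for _ in range(steps):
        resid = y - X @ coef
        coef += np.linalg.solve(A, X.T @ resid / n)
    return coef

# Train on source data only ...
rng = np.random.default_rng(0)
Xs = rng.normal(size=(400, 6))
ys = Xs @ rng.normal(size=6) + rng.normal(size=400)
coef = ridge_boost(Xs, ys, lam=0.1)

# ... then estimate the target mean under an unknown covariate shift
# by averaging predictions over the target covariates.
Xt = rng.normal(size=(400, 6)) + 0.3            # shifted target sample
estimate = np.mean(Xt @ coef)
```

The paper's point is that, in the Hilbert-space case, this boosting procedure coincides numerically with an automatic debiased ML estimator, so robustness to unknown shifts comes packaged with semiparametric efficiency.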
Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2023
David Bruns-Smith, Avi Feller, and Emi Nakamura
Household responses to income shocks are important drivers of financial fragility, the evolution of wealth inequality, and the effectiveness of fiscal and monetary policy. Traditional approaches to measuring the size and persistence of income shocks are based on restrictive econometric models that impose strong homogeneity across households and over time. In this paper, we propose a more flexible, machine learning framework for estimating income shocks that allows for variation across all observable features and time horizons. First, we propose non-parametric estimands for shocks and shock persistence. We then show how to estimate these quantities by using off-the-shelf supervised learning tools to approximate the conditional expectation of future income given present information. We solve this income prediction problem in a large Icelandic administrative dataset, and then use the estimated shocks to document several features of labor income risk in Iceland that are not captured by standard economic income models.
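In sketch form, the estimator is a supervised learner for the conditional expectation of future income, with the shock defined as the realized residual. The data and feature choices below are hypothetical:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical household-year panel: features are everything observed
# at time t (lagged income, age, industry, ...); the label is income
# at horizon t + h.
rng = np.random.default_rng(0)
n = 5000
features_t = rng.normal(size=(n, 10))
income_th = features_t[:, 0] + 0.5 * features_t[:, 1] ** 2 \
    + rng.normal(size=n)

# Conditional expectation of future income given present information.
# (In practice one would cross-fit rather than predict in-sample.)
model = GradientBoostingRegressor(random_state=0)
model.fit(features_t, income_th)

# Nonparametric income shock at horizon h: realization minus forecast.
shock = income_th - model.predict(features_t)
```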
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
David Bruns-Smith and Avi Feller
We study balancing weight estimators, which reweight outcomes from a source population to estimate missing outcomes in a target population. These estimators minimize the worst-case error by making an assumption about the outcome model. In this paper, we show that this outcome assumption has two immediate implications. First, we can replace the minimax optimization problem for balancing weights with a simple convex loss over the assumed outcome function class. Second, we can replace the commonly-made overlap assumption with a more appropriate quantitative measure, the minimum worst-case bias. Finally, we show conditions under which the weights remain robust when our assumptions on the outcomes are wrong.
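For a linear outcome class with bounded coefficient norm, the resulting convex problem can be sketched as follows; the scaling and penalty conventions here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
Xs = rng.normal(size=(100, 5))      # source covariates
xt = rng.normal(size=5)             # target covariate mean to match
sigma2 = 0.5                        # bias-variance tradeoff parameter

# Convex loss replacing the minimax problem:
#   minimize ||Xs' w - xt||^2 + sigma2 * ||w||^2   (closed form)
w = np.linalg.solve(Xs @ Xs.T + sigma2 * np.eye(len(Xs)), Xs @ xt)

# The residual imbalance that no weights can remove quantifies
# overlap for this outcome class: the minimum worst-case bias.
bias = np.linalg.norm(Xs.T @ w - xt)
```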
International Conference on Machine Learning (ICML), 2021
David Bruns-Smith
When decision-makers can directly intervene, policy evaluation algorithms give valid causal estimates. In off-policy evaluation (OPE), however, there may exist unobserved variables that both impact the dynamics and are used by the unknown behavior policy. These "confounders" introduce spurious correlations, and naive estimates for a new policy will be biased. We develop worst-case bounds to assess sensitivity to these unobserved confounders in finite horizons when confounders are drawn i.i.d. each period. We demonstrate that a model-based approach with robust MDPs gives sharper lower bounds by exploiting domain knowledge about the dynamics. Finally, we show that when unobserved confounders are persistent over time, OPE is far more difficult and existing techniques produce extremely conservative bounds.
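For intuition, here is a one-period worst-case bound under a Rosenbaum-style sensitivity model, an illustrative stand-in for the paper's finite-horizon analysis: an unobserved confounder may scale each nominal importance weight by a factor in [1/gamma, gamma], and the adversary picks the scaling that most depresses the estimate:

```python
import numpy as np

def worst_case_lower_bound(rewards, weights, gamma):
    """Lower bound on a reweighted value estimate when each nominal
    importance weight may be off by a factor in [1/gamma, gamma]."""
    # Adversary shrinks positive contributions, inflates negative ones.
    scale = np.where(rewards >= 0, 1.0 / gamma, gamma)
    return np.mean(scale * weights * rewards)

# Example: nominal OPE estimate vs. its bound at gamma = 2.
rng = np.random.default_rng(0)
r = rng.normal(1.0, 1.0, size=1000)       # per-episode rewards
w = rng.lognormal(0.0, 0.5, size=1000)    # nominal importance weights
print(np.mean(w * r), worst_case_lower_bound(r, w, gamma=2.0))
```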
This version was presented at INFORMS 2023. An updated draft is in submission at Management Science.
David Bruns-Smith and Angela Zhou
* Short version accepted at NeurIPS 2025 ML×OR Workshop.
Offline reinforcement learning is important in domains such as medicine, economics, and e-commerce where online experimentation is costly, dangerous, or unethical, and where the true model is unknown. However, most methods assume that all covariates used in the behavior policy's action decisions are observed. This assumption, sequential ignorability/unconfoundedness, likely does not hold in observational data; still, most of the variables that account for selection into treatment may be observed, motivating sensitivity analysis. We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders under a sensitivity model. We propose and analyze orthogonalized robust fitted-Q-iteration, which uses closed-form solutions of the robust Bellman operator to derive a loss minimization problem for the robust Q function and adds a bias correction to quantile estimation. Our algorithm enjoys the computational ease of fitted-Q-iteration and the statistical improvements (reduced dependence on quantile estimation error) of orthogonalization. We provide sample complexity bounds and insights, and show effectiveness both in simulations and on real-world longitudinal healthcare data on treating sepsis. In particular, our model of sequential unobserved confounders yields an online Markov decision process rather than a partially observed Markov decision process: we illustrate how this can enable warm-starting optimistic reinforcement learning algorithms with valid robust bounds from observational data.
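The robust Bellman step admits a closed form that can be sketched as follows. Under a sensitivity model where likelihood ratios are bounded in [1/lam, lam] and average to one, the worst-case conditional mean is attained by a two-level weighting around a quantile. This is a stylized, tabular illustration, not the paper's orthogonalized estimator (which fits conditional quantiles by regression and adds a bias correction):

```python
import numpy as np

def robust_mean(v, lam):
    """Closed-form worst case of a mean over likelihood ratios in
    [1/lam, lam] constrained to average one (illustrative version)."""
    a, b = 1.0 / lam, lam
    tau = (1.0 - a) / (b - a)         # mass that gets the high weight
    q = np.quantile(v, tau)
    w = np.where(v <= q, b, a)        # up-weight the worst outcomes
    return np.mean(w * v / w.mean())  # renormalize empirical weights

# One robust fitted-Q-iteration backup on grouped transitions: for
# each (s, a) cell, the regression target is the mean reward plus the
# discounted robust mean of sampled next-state values.
def robust_backup(rewards_by_sa, v_next_by_sa, discount, lam):
    return {sa: np.mean(r) + discount * robust_mean(v_next_by_sa[sa], lam)
            for sa, r in rewards_by_sa.items()}

rewards = {("s0", "a0"): np.array([1.0, 0.5])}
v_next = {("s0", "a0"): np.array([0.2, 0.9, 0.4])}
print(robust_backup(rewards, v_next, discount=0.9, lam=2.0))
```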
New Draft Available!
David Bruns-Smith, Emi Nakamura, Jón Steinsson
A canonical finding from earlier research is that the cross-sectional variance of income increases sharply with age (Deaton and Paxson, 1994). However, the trend in this age profile is not separately identified from time and cohort trends. Conventional methods solve this identification problem by ruling out “time effects.” This strong assumption is rejected by the data. We propose a new proxy variable machine learning approach to disentangle age, time and cohort effects. Using this method, we estimate a significantly smaller slope of the age profile of income variance for the US than conventional methods, as well as less erratic slopes for 11 other countries.
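The identification problem comes from the accounting identity period = cohort + age. A one-line worked equation shows why linear trends cannot be attributed to any one of the three effects:

```latex
% Because t = c + a holds identically, a linear trend can be shuffled
% across the age, time, and cohort effects: for any constant theta,
\[
  f(a) + g(t) + h(c)
  \;=\;
  \bigl(f(a) + \theta a\bigr)
  + \bigl(g(t) - \theta t\bigr)
  + \bigl(h(c) + \theta c\bigr),
  \qquad \text{since } \theta a - \theta t + \theta c = 0 .
\]
```

Any estimator must therefore bring in outside information to pin down the trend; the proxy variable approach supplies that information without simply assuming time effects away.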
In Preparation
Joonhyuk Lee, David Bruns-Smith, and Guido Imbens
Panel data are ubiquitous as a basis for analysis and decision-making in social scientific, policy, and industrial settings. Methods for studying panel data, such as two-way fixed effects and synthetic control, are equally numerous and can yield sharply divergent empirical predictions. This ‘curse of abundance’ raises the following question: given a particular panel and goal, can we learn the best estimator to use? We provide a simple and statistically principled method for doing so. At a high level, our method generates synthetic panels that are plausible in light of the observed data, and then uses these as a basis for inference. The validity of our method rests on a novel connection to the selective inference literature in statistics and requires minimal assumptions. Importantly, our method can be applied to the realistic and economically important case in which both cross-unit and temporal correlation may exist, such as panels of state- or country-level economic data.
IEEE Micro 41(3), 2021
Tae Jun Ham, Yejin Lee, Seong Hoon Seo, U Gyeong Song, Jae W Lee, David Bruns-Smith, Brendan Sweeney, Krste Asanovic, Young H Oh, and Lisa Wu Wills
This article presents a framework, Genesis (genome analysis), to efficiently and flexibly accelerate generic data manipulation operations that have become performance bottlenecks in the genomic data processing pipeline utilizing FPGAs-as-a-service. Genesis conceptualizes genomic data as a very large relational database and uses extended SQL as a domain-specific language to construct data manipulation queries. To accelerate the queries, we designed a Genesis hardware library of efficient coarse-grained primitives that can be composed into a specialized dataflow architecture. This approach explores a systematic and scalable methodology to expedite domain-specific end-to-end accelerated system development and deployment.
2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020
Tae Jun Ham, David Bruns-Smith, Brendan Sweeney, Yejin Lee, Seong Hoon Seo, U Gyeong Song, Young H Oh, Krste Asanovic, Jae W Lee, and Lisa Wu Wills
* IEEE Micro Top Pick (top 12 papers of the year in computer architecture)
* ISCA@50 Retrospective (98/1077 papers were selected from across the last 25 years)
In this paper, we describe our vision to accelerate algorithms in the domain of genomic data analysis by proposing a framework called Genesis (genome analysis) that contains an interface and an implementation of a system that processes genomic data efficiently. This framework can be deployed in the cloud and exploit the FPGAs-as-a-service paradigm to provide cost-efficient secondary DNA analysis. We propose conceptualizing genomic reads and associated read attributes as a very large relational database and using extended SQL as a domain-specific language to construct queries that form various data manipulation operations. To accelerate such queries, we design a Genesis hardware library which consists of primitive hardware modules that can be composed to construct a dataflow architecture specialized for those queries. As a proof of concept for the Genesis framework, we present the architecture and the hardware implementation of several genomic analysis stages in the secondary analysis pipeline corresponding to the best-known software analysis toolkit, the GATK4 workflow proposed by the Broad Institute. We walk through the construction of genomic data analysis operations using a sequence of SQL-style queries and show how Genesis hardware library modules can be utilized to construct the hardware pipelines designed to accelerate such queries. We exploit parallelism and data reuse through a dataflow architecture, on-chip scratchpads, and non-blocking APIs to manage the accelerators, allowing concurrent execution of the accelerator and the host. Our accelerated system deployed on the cloud FPGA performs up to 19.3× better than GATK4 running on a commodity multi-core Xeon server and achieves up to 15× cost savings. We believe that if a software algorithm can be mapped onto a hardware library to utilize the underlying accelerator(s) using an already-standardized software interface such as SQL, while allowing the efficient mapping of such an interface to primitive hardware modules as we have demonstrated here, it will expedite the acceleration of domain-specific algorithms and allow the easy adaptation of algorithm changes.
Future Generation Computer Systems 96, 2019
Muthu M. Baskaran, Thomas Henretty, James Ezick, Richard Lethin, and David Bruns-Smith
The increasing size, variety, rate of growth and change, and complexity of network data have warranted advanced network analysis and services. Tools that provide automated analysis through traditional or advanced signature-based systems or machine learning classifiers suffer from practical difficulties: they fail to provide comprehensive and contextual insights into the network when put to practical use in operational cyber security. In this paper, we present an effective tool for network security and traffic analysis that uses high-performance data analytics based on a class of unsupervised learning algorithms called tensor decompositions. The tool aims to provide scalable analysis of network traffic data, to reduce the cognitive load of network analysts, and to be network-expert-friendly by presenting clear and actionable insights into the network. We demonstrate the successful use of the tool in two very different operational cyber security environments, namely, (1) the security operations center (SOC) for the SCinet network at the Supercomputing (SC) Conference in 2016 and 2017, and (2) Reservoir Labs' local area network (LAN). In each of these environments, we produce actionable results for cyber security specialists, including (but not limited to) (1) finding malicious network traffic involving internal and external attackers using port scans, SSH brute forcing, and NTP amplification attacks; (2) uncovering obfuscated network threats such as data exfiltration over the DNS port and via ICMP traffic; and (3) finding network misconfiguration and performance degradation patterns.
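As a minimal illustration of the approach (not the ENSIGN toolchain itself; the library, tensor layout, rank, and planted pattern are all assumptions), a nonnegative CP decomposition separates a traffic-count tensor into interpretable rank-one patterns:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

# Hypothetical traffic tensor: counts over (source, dest port, hour).
rng = np.random.default_rng(0)
counts = rng.poisson(0.1, size=(60, 40, 24)).astype(float)
counts[:8, 5, :3] += 40.0          # planted scan-like burst

weights, factors = non_negative_parafac(
    tl.tensor(counts), rank=4, n_iter_max=200)
src, port, hour = factors

# Each rank-one component couples a set of sources, a set of ports,
# and a temporal signature. A scan-like component concentrates its
# mass on a few ports in a short time window, which an analyst can
# read off directly from the factor matrices.
for k in range(4):
    print(k, np.argmax(port[:, k]), np.argmax(hour[:, k]))
```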
2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019
Lisa Wu, David Bruns-Smith, Frank A. Nothaft, Qijing Huang, Sagar Karandikar, Johnny Le, Andrew Lin, Howard Mao, Brendan Sweeney, Krste Asanović, David A. Patterson, and Anthony D. Joseph
The amount of data being generated in genomics is predicted to be between 2 and 40 exabytes per year for the next decade, making genomic analysis the new frontier and the new challenge for precision medicine. This paper explores targeted deployment of hardware accelerators in the cloud to improve the runtime and throughput of immense-scale genomic data analyses. In particular, INDEL (INsertion/DELetion) realignment is a critical operation that enables diagnostic testing of cancer through error correction prior to variant calling. It is the slowest part of the somatic (cancer) genomic analysis pipeline, the alignment refinement pipeline, and represents roughly one-third of the execution time of time-sensitive diagnostics for acute cancer patients. To accelerate genomic analysis, this paper describes a hardware accelerator for INDEL realignment (IR), and a hardware-software framework leveraging FPGAs-as-a-service in the cloud. We chose to implement genomics analytics on FPGAs because genomic algorithms are still rapidly evolving (e.g. the de facto standard "GATK Best Practices" has had five releases since January of this year). We chose to deploy genomics accelerators in the cloud to reduce capital expenditure and to provide a more quantitative performance and cost analysis. We built and deployed a sea of IR accelerators using our hardware-software accelerator development framework on AWS EC2 F1 instances. We show that our IR accelerator system performed 81x better than multi-threaded genomic analysis software while being 32x more cost efficient.
2017 IEEE High Performance Extreme Computing Conference (HPEC), 2017
Tom Henretty, Muthu Baskaran, James Ezick, David Bruns-Smith, and Tyler A. Simon
With the recent explosion of systems capable of generating and storing large quantities of GPS data, there is an opportunity to develop novel techniques for analyzing and gaining meaningful insights into this spatiotemporal data. In this paper we examine the application of tensor decompositions, a high-dimensional data analysis technique, to georeferenced data sets. Guidance is provided on fitting spatiotemporal data into the tensor model and analyzing the results. We find that tensor decompositions provide insight and that future research into spatiotemporal tensor decompositions for pattern detection, clustering, and anomaly detection is warranted.
2017 IEEE High Performance Extreme Computing Conference (HPEC), 2017
* Best Paper Award
Muthu Baskaran, Tom Henretty, Benoit Pradelle, M Harper Langston, David Bruns-Smith, James Ezick, and Richard Lethin
Tensor decompositions are a powerful technique for enabling comprehensive and complete analysis of real-world data. Data analysis through tensor decompositions involves intensive computations over large-scale irregular sparse data. Optimizing the execution of such data intensive computations is key to reducing the time-to-solution (or response time) in real-world data analysis applications. As high-performance computing (HPC) systems are increasingly used for data analysis applications, it is becoming increasingly important to optimize sparse tensor computations and execute them efficiently on modern and advanced HPC systems. In addition to utilizing the large processing capability of HPC systems, it is crucial to improve memory performance (memory usage, communication, synchronization, memory reuse, and data locality) in HPC systems. In this paper, we present multiple optimizations that are targeted towards faster and memory-efficient execution of large-scale tensor analysis on HPC systems. We demonstrate that our techniques achieve reduction in memory usage and execution time of tensor decomposition methods when they are applied on multiple datasets of varied size and structure from different application domains. We achieve up to 11× reduction in memory usage and up to 7× improvement in performance. More importantly, we enable the application of large tensor decompositions on some important datasets on a multi-core system that would not have been feasible without our optimizations.
2016 IEEE High Performance Extreme Computing Conference (HPEC), 2016
Muthu Baskaran, M Harper Langston, Tahina Ramananandro, David Bruns-Smith, Tom Henretty, James Ezick, and Richard Lethin
Tensor analysis (through tensor decompositions) is becoming increasingly popular as a powerful technique for enabling comprehensive and complete analysis of real-world data. In many critical modern applications, large-scale tensor data arrives continuously (in streams) or changes dynamically over time. Tensor decompositions over static snapshots of tensor data become prohibitively expensive due to space and computational bottlenecks, and severely limit the use of tensor analysis in applications that require quick response. Effective and rapid streaming (or non-stationary) tensor decompositions are critical for enabling large-scale real-time analysis. We present new algorithms for streaming tensor decompositions that effectively use the low-rank structure of data updates to dynamically and rapidly perform tensor decompositions of continuously evolving data. Our contributions presented here are integral for enabling tensor decompositions to become a viable analysis tool for large-scale time-critical applications. Further, we present our newly-implemented parallelized versions of these algorithms, which will enable more effective deployment of these algorithms in real-world applications. We present the effectiveness of our approach in terms of faster execution of streaming tensor decompositions that directly translates to short response time during analysis.
2016 Cybersecurity Symposium (CYBERSEC), 2016
David Bruns-Smith, Muthu M. Baskaran, James Ezick, Tom Henretty, and Richard Lethin
Traditional machine learning approaches are plagued with problems for practical use in operational cyber security. The class of unsupervised learning algorithms called tensor decompositions provides a new approach for analyzing network traffic data that avoids these traditional problems. Tensors are a natural representation for multidimensional data as an array with arbitrary dimensions. Tensor decompositions factor the data into components, each of which represents a different pattern of activity from within the original data. We use ENSIGN, a tensor decomposition toolbox developed by Reservoir Labs, in the security operations center for the SCinet network at SC15 - The International Conference for High Performance Computing, Networking, Storage and Analysis. ENSIGN integrates naturally into an operational cyber security framework by extracting anomalous patterns of network traffic. In this paper, we present two case studies highlighting specific actionable results: one, discovering an external attacker and tracing the evolution of the attack over time, and the other, extracting an example of data exfiltration that the actor disguised as DNS activity and cleanly separating it from normal DNS activity. Through proof-of-concept experiments at SC15, we successfully demonstrate concrete and practical use of ENSIGN and take a critical step toward delivering an integrated tensor analysis engine for network security.