Search Results
Journal Article
Artificial Intelligence and Inflation Forecasts
We explore the ability of large language models (LLMs) to produce in-sample conditional inflation forecasts during the 2019–23 period. We use a leading LLM (Google AI’s PaLM) to produce distributions of conditional forecasts at different horizons and compare these forecasts to those of a leading source, the Survey of Professional Forecasters (SPF). We find that LLM forecasts generate lower mean-squared errors overall in most years and at almost all horizons. LLM forecasts exhibit slower reversion to the 2 percent inflation anchor.
Working Paper
Explaining Machine Learning by Bootstrapping Partial Marginal Effects and Shapley Values
Machine learning and artificial intelligence are often described as “black boxes.” Traditional linear regression is interpreted through its marginal relationships as captured by regression coefficients. We show that the same marginal relationship can be described rigorously for any machine learning model by calculating the slope of the partial dependence functions, which we call the partial marginal effect (PME). We prove that the PME of OLS is analytically equivalent to the OLS regression coefficient. Bootstrapping provides standard errors and confidence intervals around the point ...
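The core claim, that the slope of a model's partial dependence function recovers the OLS coefficient when the model is OLS, can be illustrated in a few lines. This is a minimal sketch under simulated data, not the paper's implementation; the finite-difference approximation and all variable names are illustrative assumptions.

```python
import numpy as np

# Simulated data with known linear structure (illustrative, not the paper's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=500)

# Fit OLS with an intercept via least squares.
Xc = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
predict = lambda M: np.column_stack([np.ones(len(M)), M]) @ beta

def pme(predict, X, j, h=1e-4):
    """Partial marginal effect of feature j: the slope of the partial
    dependence function, approximated here by a central finite
    difference averaged over the observed data."""
    Xp, Xm = X.copy(), X.copy()
    Xp[:, j] += h
    Xm[:, j] -= h
    return float(np.mean((predict(Xp) - predict(Xm)) / (2 * h)))

# For an OLS model, the PME matches the regression coefficient.
print(pme(predict, X, 0), beta[1])
```

Because `predict` is any callable, the same `pme` function applies unchanged to a nonlinear black-box model, where the slope varies over the data and the average is reported; bootstrapping over resampled rows would then yield standard errors.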
Working Paper
The Anatomy of Out-of-Sample Forecasting Accuracy
We develop metrics based on Shapley values for interpreting time-series forecasting models, including "black-box" models from machine learning. Our metrics are model agnostic, so they are applicable to any model (linear or nonlinear, parametric or nonparametric). Two of the metrics, iShapley-VI and oShapley-VI, measure the importance of individual predictors in fitted models for explaining the in-sample and out-of-sample predicted target values, respectively. The third metric is the performance-based Shapley value (PBSV), our main methodological contribution. PBSV measures the ...
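For intuition on Shapley-value attribution of predicted values, the linear-model special case has a closed form: under a feature-independence assumption, feature j's contribution to one prediction is its coefficient times the feature's deviation from its mean. The sketch below verifies the efficiency property (contributions sum to the prediction minus the average prediction); it is a textbook illustration, not the paper's PBSV metric.

```python
import numpy as np

# Illustrative linear model; coefficients and data are assumptions.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
beta = np.array([2.0, -1.0, 0.5])
f = lambda M: M @ beta  # linear model without intercept, for simplicity

# Closed-form Shapley values for one observation under feature independence:
#   phi_j = beta_j * (x_j - mean(X_j))
x = X[0]
phi = beta * (x - X.mean(axis=0))

# Efficiency: attributions sum to f(x) minus the average prediction.
print(phi.sum(), f(x) - f(X).mean())
```

For nonlinear models no closed form exists, and the attributions must be estimated, which is where model-agnostic machinery like the metrics described above comes in.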
Working Paper
How Centralized is U.S. Metropolitan Employment?
Centralized employment remains a benchmark stylization of metropolitan land use. To address its empirical relevance, we delineate "central employment zones" (CEZs), central business districts together with nearby concentrated employment, for 183 metropolitan areas in 2000. To do so, we first subjectively classify which census tracts in a training sample of metros belong to their metro's CEZ and then use a learning algorithm to construct a function that predicts our judgment. Applying this prediction function to the full cross section of metros estimates the probability we would judge ...
Working Paper
Generative AI at the Crossroads: Light Bulb, Dynamo, or Microscope?
With the advent of generative AI (genAI), the potential scope of artificial intelligence has increased dramatically, but the future effect of genAI on productivity remains uncertain. The effect of the technology on the innovation process is a crucial open question. Some inventions, such as the light bulb, temporarily raise productivity growth as adoption spreads, but the effect fades when the market is saturated; that is, the level of output per hour is permanently higher but the growth rate is not. In contrast, two types of technologies stand out as having longer-lived effects on ...
Working Paper
Machine Learning, the Treasury Yield Curve and Recession Forecasting
We use machine learning methods to examine the power of Treasury term spreads and other financial market and macroeconomic variables to forecast US recessions, vis-à-vis probit regression. In particular, we propose a novel strategy for conducting cross-validation on classifiers trained with macro/financial panel data of low frequency and compare the results to those obtained from standard k-folds cross-validation. Consistent with the existing literature, we find that, in the time series setting, forecast accuracy estimates derived from k-folds are biased optimistically, and cross-validation ...
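The paper's specific cross-validation strategy is not reproduced here, but the reason standard k-folds is optimistic on time series, test folds can precede training observations, so the classifier "sees the future", can be illustrated with a generic expanding-window splitter in which training data always precedes test data. Function and parameter names are illustrative.

```python
import numpy as np

def expanding_window_splits(n_obs, n_splits, min_train):
    """Yield (train_idx, test_idx) pairs where the training window always
    precedes the test window, avoiding the look-ahead leakage that makes
    standard k-fold accuracy estimates optimistic on time series."""
    fold = (n_obs - min_train) // n_splits
    for k in range(n_splits):
        end_train = min_train + k * fold
        yield np.arange(end_train), np.arange(end_train, end_train + fold)

# Example: 100 quarterly observations, 4 folds, at least 60 training points.
for tr, te in expanding_window_splits(100, 4, 60):
    print(f"train 0..{tr[-1]}, test {te[0]}..{te[-1]}")
```

Each fold re-estimates the model on all data up to the split point, mimicking how a forecaster would actually have used the data in real time.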
Working Paper
Evaluating Local Language Models: An Application to Bank Earnings Calls
This study evaluates the performance of local large language models (LLMs) in interpreting financial texts, compared with closed-source, cloud-based models. We first introduce new benchmarking tasks for assessing LLM performance in analyzing financial and economic texts and explore the refinements needed to improve their performance. Our benchmarking results suggest local LLMs are a viable tool for general natural language processing analysis of these texts. We then leverage local LLMs to analyze the tone and substance of bank earnings calls in the post-pandemic era, including calls conducted ...