Working Paper
On the Testability of the Anchor-Words Assumption in Topic Models
Abstract: What does the Fed talk about in its monetary policy discussions? We introduce a new statistical methodology to analyze text documents, and we use that methodology to recover the topics discussed during FOMC meetings. Topic models are a simple and popular tool for the statistical analysis of textual data. Their identification and estimation are typically enabled by assuming the existence of anchor words; that is, words that are exclusive to specific topics. In this paper we show that the existence of anchor words is statistically testable: there exists a hypothesis test with correct size that has nontrivial power. This means that the anchor-words assumption cannot be viewed simply as a convenient normalization. Central to our results is a simple characterization of when a column-stochastic matrix with known nonnegative rank admits a separable factorization. We test for the existence of anchor words in two different datasets derived from monetary policy discussions in the Federal Reserve and, in one of them, reject the null hypothesis that anchor words exist.
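To make the anchor-words definition in the abstract concrete, the following is a minimal sketch that checks whether a known word-by-topic probability matrix has, for every topic, at least one word exclusive to it. It is an illustration of the definition only, not the hypothesis test developed in the paper, which must work from observed text rather than a known matrix. The function names, the tolerance parameter, and both example matrices are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: the helpers and matrices below are hypothetical
# and are not taken from the paper.

def find_anchor_words(B, tol=0.0):
    """Return {topic index: [word indices]} of words exclusive to each topic.

    B is a list of rows, where B[w][k] is the probability of word w under
    topic k (each column sums to one). Word w is an anchor for topic k if
    B[w][k] > tol and every other entry in row w is <= tol.
    """
    n_topics = len(B[0])
    anchors = {k: [] for k in range(n_topics)}
    for w, row in enumerate(B):
        support = [k for k, p in enumerate(row) if p > tol]
        if len(support) == 1:
            anchors[support[0]].append(w)
    return anchors


def satisfies_anchor_words(B, tol=0.0):
    """True if every topic has at least one anchor word in B."""
    return all(find_anchor_words(B, tol).values())


# Four words, two topics; columns sum to one.
# Word 0 appears only under topic 0 and word 1 only under topic 1.
B_sep = [
    [0.5, 0.0],
    [0.0, 0.4],
    [0.3, 0.3],
    [0.2, 0.3],
]

# Same shape, but every word has positive probability under both topics.
B_no = [
    [0.5, 0.1],
    [0.1, 0.4],
    [0.2, 0.2],
    [0.2, 0.3],
]

print(find_anchor_words(B_sep))
print(satisfies_anchor_words(B_sep), satisfies_anchor_words(B_no))
```

In this toy setting the check is a simple scan of the rows. The statistical question the abstract raises is harder: whether some factorization with this structure is compatible with the observed data, given sampling noise.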
https://doi.org/10.21799/frbp.wp.2025.14
Access Documents
File(s): https://www.philadelphiafed.org/-/media/FRBP/Assets/working-papers/2025/wp25-14.pdf (file format: application/pdf)
Bibliographic Information
Provider: Federal Reserve Bank of Philadelphia
Part of Series: Working Papers
Publication Date: 2025-03-19
Number: 25-14