Working Paper
Validating Large Language Model Annotations
Abstract: This paper proposes a validation framework for LLM-generated measurements when reliable benchmarks are unavailable. Validity is established by testing whether an LLM can reconstruct passages from their annotated labels while maintaining semantic consistency with the original text. The framework avoids circular reasoning by specifying testable prerequisite properties that must hold for a validation to count as successful. Application to news article data demonstrates that the framework serves as a practical alternative to human benchmarking, offering advantages in objectivity, scalability, and cost-effectiveness while identifying cases where LLMs capture economic meaning that human evaluators miss.
JEL Classification: C18; C45; C80
https://doi.org/10.17016/FEDS.2026.020
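To make the reconstruction test in the abstract concrete, here is a minimal sketch of one round of reconstruction-based validation. It assumes a generic completion callable `complete(prompt) -> str` standing in for whatever LLM produced the annotations, an off-the-shelf sentence embedding model as the semantic-consistency measure, and an arbitrary threshold of 0.7; all three are illustrative assumptions, not the paper's implementation.

```python
# Sketch of reconstruction-based validation for an LLM annotation.
# Assumptions (not from the paper): `complete` is a hypothetical LLM
# client, the embedding model and the 0.7 threshold are placeholders.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def validate_annotation(passage: str, label: str, complete,
                        threshold: float = 0.7) -> bool:
    """Ask the LLM to reconstruct a passage from its label, then test
    whether the reconstruction is semantically consistent with the
    original passage."""
    reconstruction = complete(
        f"Write a short news passage that would receive the label: {label}"
    )
    # Embed both texts and compare them with cosine similarity.
    emb = encoder.encode([passage, reconstruction])
    similarity = util.cos_sim(emb[0], emb[1]).item()
    # The label passes this check when the reconstruction preserves the
    # passage's meaning above the chosen consistency threshold.
    return similarity >= threshold
```

In practice, `complete` would wrap the same model used for annotation, and the prerequisite properties the paper describes would be checked before treating a passing similarity score as evidence of validity.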
Full text (PDF): https://www.federalreserve.gov/econres/feds/files/2026020pap.pdf
Bibliographic Information
Provider: Board of Governors of the Federal Reserve System (U.S.)
Part of Series: Finance and Economics Discussion Series
Publication Date: 2026-03-30
Number: 2026-020