Working Paper
Validating Large Language Model Annotations
Abstract: This paper proposes a validation framework for LLM-generated measurements when reliable benchmarks are unavailable. Validity is established by testing whether an LLM can reconstruct passages from their annotated labels while maintaining semantic consistency with the original text. The framework avoids circular reasoning by specifying testable prerequisite properties that must hold for a validation to count as successful. Application to news article data demonstrates that the framework serves as a practical alternative to human benchmarking, offering advantages in objectivity, scalability, and cost-effectiveness while identifying cases where LLMs capture economic meaning that human evaluators miss.
JEL Classification: C18; C45; C80
https://doi.org/10.17016/FEDS.2026.020
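To make the reconstruction test in the abstract concrete, here is a minimal sketch of one round of reconstruction-based validation. It assumes a generic completion callable `complete(prompt) -> str` standing in for whatever LLM produced the annotations, an off-the-shelf sentence embedding model as the semantic-consistency measure, and an arbitrary threshold of 0.7; all three are illustrative assumptions, not the paper's implementation.

```python
# Sketch of reconstruction-based validation for an LLM annotation.
# Assumptions (not from the paper): `complete` is a hypothetical LLM
# client, the embedding model and the 0.7 threshold are placeholders.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def validate_annotation(passage: str, label: str, complete,
                        threshold: float = 0.7) -> bool:
    """Ask the LLM to reconstruct a passage from its label, then test
    whether the reconstruction is semantically consistent with the
    original passage."""
    reconstruction = complete(
        f"Write a short news passage that would receive the label: {label}"
    )
    # Embed both texts and compare them with cosine similarity.
    emb = encoder.encode([passage, reconstruction])
    similarity = util.cos_sim(emb[0], emb[1]).item()
    # The label passes this check when the reconstruction preserves the
    # passage's meaning above the chosen consistency threshold.
    return similarity >= threshold
```

In practice, `complete` would wrap the same model used for annotation, and the prerequisite properties the paper describes would be checked before treating a passing similarity score as evidence of validity.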
Full text (PDF): https://www.federalreserve.gov/econres/feds/files/2026020pap.pdf
Bibliographic Information
Provider: Board of Governors of the Federal Reserve System (U.S.)
Part of Series: Finance and Economics Discussion Series
Publication Date: 2026-03-30
Number: 2026-020