Working Paper
LLM on a Budget: Active Knowledge Distillation for Efficient Classification of Large Text Corpora
Abstract: Large Language Models (LLMs) are highly accurate in classification tasks, but substantial computational and financial costs hinder their large-scale deployment in dynamic environments. Knowledge Distillation (KD), in which an LLM "teacher" trains a smaller, more efficient "student" model, offers a promising solution to this problem. However, the distillation process itself often remains costly for large datasets, since it requires the teacher to label a vast number of samples, incurring significant token consumption. To alleviate this challenge, we explore active learning (AL) as a way to create efficient student models at a fraction of the cost while preserving the LLM's performance. In particular, we introduce M-RARU (Multi-class Randomized Accept/Reject Uncertainty Sampling), a novel AL algorithm that significantly reduces training costs. M-RARU combines uncertainty with a randomized accept/reject mechanism to select only the most informative data points for the LLM teacher. This focused approach minimizes the required API calls and data processing time. We evaluate M-RARU against random sampling across five diverse student models (SVM, LDA, RF, GBDT, and DistilBERT) on multiple benchmark datasets. Experiments demonstrate that our method achieves up to an 80% reduction in sample requirements compared to random sampling, substantially improving classification accuracy while reducing financial costs and overall training time.
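The abstract describes M-RARU only at a high level, so the following is a minimal Python sketch of what a multi-class randomized accept/reject uncertainty-sampling loop could look like. It is not the authors' implementation: the teacher_label oracle (standing in for an LLM API call), the SVC student, and the seed-set and budget sizes are all illustrative assumptions.

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

def teacher_label(x, oracle_y, idx):
    # Hypothetical stand-in for one LLM "teacher" query; here it just returns
    # the known label of a synthetic example.
    return oracle_y[idx]

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=5, random_state=0)

# Seed the student with a small labeled set; the rest of the data is the candidate pool.
labeled = list(range(30))
pool = list(range(30, len(X)))
student = SVC(probability=True).fit(X[labeled], y[labeled])

budget = 200   # maximum number of teacher (LLM) queries
queried = 0
for idx in rng.permutation(pool):
    if queried >= budget:
        break
    proba = student.predict_proba(X[idx:idx + 1])[0]
    # Multi-class uncertainty: one minus the confidence of the top predicted class.
    uncertainty = 1.0 - proba.max()
    # Randomized accept/reject: accept with probability proportional to uncertainty,
    # so confident points are usually skipped and never sent to the teacher.
    if rng.random() < uncertainty:
        labeled.append(idx)
        _ = teacher_label(X[idx], y, idx)      # one teacher API call
        queried += 1
        student.fit(X[labeled], y[labeled])    # retrain/update the student

Under this assumed scheme, the number of teacher calls is bounded by the query budget, and low-uncertainty points are filtered out before any API cost is incurred, which is the cost-saving mechanism the abstract emphasizes.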
JEL Classification: C38; C45; C55
https://doi.org/10.17016/FEDS.2025.108
Access Documents
Full text (PDF): https://www.federalreserve.gov/econres/feds/files/2025108pap.pdf
Authors
Bibliographic Information
Provider: Board of Governors of the Federal Reserve System (U.S.)
Part of Series: Finance and Economics Discussion Series
Publication Date: 2025-12-15
Number: 2025-108