News

Students and Faculty from the Key Laboratory of AI Safety, Chinese Academy of Sciences, Win EMNLP 2024 Best Paper Award

Date: Nov 26, 2024

Recently, Weichao Zhang, a Ph.D. student at the Key Laboratory of AI Safety, Chinese Academy of Sciences (supervised by Professor Jiafeng Guo), received the Best Paper Award at EMNLP 2024 for the paper "Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method", of which he is the first author.



The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024) is a premier international conference in computational linguistics and natural language processing. It is ranked as a Class B conference in the CCF (China Computer Federation) recommendation list and enjoys a strong academic reputation in related fields. The conference took place from November 12 to 16, 2024, in Miami, Florida, USA. EMNLP 2024 received 6,105 submissions and accepted 2,978 papers, including 1,269 main conference papers. The award-winning papers were selected by the Best Paper Award Committee from 114 candidates nominated by area chairs and senior area chairs, with a total of 5 papers receiving the Best Paper Award.

Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method

Authors: Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

Paper link: https://arxiv.org/abs/2409.14781

Code link: https://github.com/zhang-wei-chao/DC-PDD

Abstract: As the scale of training corpora for large language models (LLMs) grows, model developers become increasingly reluctant to disclose details on their data. This lack of transparency poses challenges to scientific evaluation and ethical deployment. Recently, pretraining data detection approaches, which infer whether a given text was part of an LLM's training data through black-box access, have been explored. The Min-K% Prob method, which has achieved state-of-the-art results, assumes that a non-training example tends to contain a few outlier words with low token probabilities. However, its effectiveness may be limited, as it tends to misclassify non-training texts that contain many common words with high probabilities predicted by LLMs. To address this issue, we introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection. We compute the cross-entropy (i.e., the divergence) between the token probability distribution and the token frequency distribution to derive a detection score. We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text. Experimental results on English-language benchmarks and PatentMIA demonstrate that our proposed method significantly outperforms existing methods.
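To make the calibration idea in the abstract more concrete, the sketch below computes a cross-entropy between a candidate text's model-assigned token probabilities and token frequencies drawn from a reference corpus. This is a minimal, illustrative reading of the abstract only: the function name, the inputs, and the way the score would be thresholded are assumptions for illustration, not the authors' exact formulation (see the linked code repository for the real implementation).

```python
import numpy as np

def detection_score(token_probs, token_freqs, eps=1e-12):
    """Illustrative divergence-based calibration score (sketch, not DC-PDD itself).

    token_probs: model-assigned probabilities for the tokens of the candidate
                 text (e.g., obtained via black-box next-token scores).
    token_freqs: reference-corpus counts for those same tokens, used to
                 calibrate away the effect of very common words.
    """
    p = np.asarray(token_probs, dtype=float)
    q = np.asarray(token_freqs, dtype=float)
    q = q / q.sum()  # normalize raw counts into a frequency distribution
    # Cross-entropy between the frequency distribution and the model's token
    # probabilities: tokens that are common in the reference corpus dominate
    # the sum, so the score reflects how the model treats ordinary words
    # rather than a few rare outliers.
    return float(np.sum(-q * np.log(p + eps)))

# Hypothetical usage with made-up numbers: a threshold on this score
# (calibrated on held-out data) would separate member from non-member texts.
probs = [0.40, 0.05, 0.30, 0.10]   # model token probabilities (assumed)
freqs = [120, 3, 80, 15]           # reference-corpus counts (assumed)
print(f"detection score: {detection_score(probs, freqs):.4f}")
```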

