Academic Journal Article

Multitask Fine-Tuning for Passage Re-Ranking Using BM25 and Pseudo Relevance Feedback
Document Type
Periodical
Author
Source
IEEE Access, vol. 10, pp. 54254-54262, 2022
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Task analysis
Training
Computational modeling
Information retrieval
Data models
Semantics
Neural networks
passage ranking
pre-trained language model
self-supervised learning
Language
English
ISSN
2169-3536
Abstract
Passage re-ranking is a machine learning task that estimates relevance scores between a given query and candidate passages. Keyword features based on lexical similarity between queries and passages have traditionally been used in passage re-ranking models. However, such approaches have a limitation: they cannot capture semantic and contextual features beyond word-matching information. Recently, several studies based on neural pre-trained language models such as BERT have overcome the limitations of traditional keyword-based models and shown significant performance improvements. Such ranking models capture the contextual features of queries and documents better than traditional keyword-based methods. However, these deep learning-based models require large amounts of training data, which is usually labeled manually at high cost, so using it efficiently is an important issue. This paper proposes a fine-tuning method for efficient training of a neural re-ranking model. The proposed model uses data augmentation by learning the ranking and masked language modeling (MLM) tasks simultaneously during fine-tuning. For the MLM task, different parts of a passage are masked at each training epoch, so even when only one query-passage pair is given, the model is exposed to diverse cases derived from dynamically masked versions of that passage. In addition, the model is trained on a probability distribution of term importance. We calculate term importance weights with two novel methods, one based on BM25 and one based on pseudo relevance feedback; terms are sampled and masked according to these weights, so the ranking model learns representations that reflect the term weight distribution while executing the MLM task. The pseudo-relevance-feedback method, in particular, enables the neural ranking model to form representations according to feedback from an initial retrieval stage. The proposed model is trained with data from the MS MARCO re-ranking leaderboard. Our model achieves the state-of-the-art MRR@10 score on the leaderboard among non-ensemble methods. In addition, it performs strongly on three evaluation metrics: MRR@10, Mean Rank, and Hit@(5,10,20,50).
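
The central mechanism the abstract describes, masking tokens in proportion to a term importance distribution instead of uniformly at random, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: all function names are hypothetical, whitespace tokenization stands in for BERT subword tokenization, and the BM25 parameters (k1, b) and mask ratio are conventional defaults rather than values taken from the paper.

import math
import random
from collections import Counter

def bm25_term_weights(passage_tokens, doc_freq, num_docs, avg_len, k1=1.2, b=0.75):
    # BM25-style importance weight for each token in the passage.
    # doc_freq maps a term to its document frequency in the collection;
    # num_docs and avg_len describe the collection (illustrative inputs).
    tf = Counter(passage_tokens)
    dl = len(passage_tokens)
    weights = []
    for tok in passage_tokens:
        df = doc_freq.get(tok, 0)
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        score = idf * tf[tok] * (k1 + 1) / (tf[tok] + k1 * (1 - b + b * dl / avg_len))
        weights.append(max(score, 1e-6))
    total = sum(weights)
    return [w / total for w in weights]

def prf_term_weights(passage_tokens, feedback_passages):
    # Pseudo-relevance-feedback weighting: terms that appear often in the
    # top-ranked passages of an initial search are treated as more important.
    feedback_tf = Counter(tok for p in feedback_passages for tok in p)
    weights = [feedback_tf.get(tok, 0) + 1e-6 for tok in passage_tokens]
    total = sum(weights)
    return [w / total for w in weights]

def mask_by_importance(tokens, probs, mask_ratio=0.15, mask_token="[MASK]"):
    # Sample positions according to the importance distribution, so
    # informative terms are masked more often than uniform masking would.
    # Resampling each epoch yields a different masked view of the passage.
    n_mask = max(1, int(len(tokens) * mask_ratio))
    positions = random.choices(range(len(tokens)), weights=probs, k=n_mask)
    masked = list(tokens)
    for pos in set(positions):
        masked[pos] = mask_token
    return masked

For example, calling mask_by_importance(tokens, bm25_term_weights(tokens, doc_freq, num_docs, avg_len)) once per epoch produces a fresh masked passage for the MLM objective, which is the data-augmentation effect the abstract attributes to dynamic masking; in the actual model this step would feed a BERT-style encoder trained jointly on the ranking loss.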