Papers
arxiv:2410.10739

Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs

Published on Oct 14, 2024
Authors:
,
,
,

Abstract

The study investigates the impact of continuous pre-training on the instruction-following abilities of LLMs and seeks compute-efficient methods to update both base and instruction-refined models without additional fine-tuning data.

AI-generated summary

Large Language Models (LLMs) for public use require continuous pre-training to remain up-to-date with the latest data. The models also need to be fine-tuned with specific instructions to maintain their ability to follow instructions accurately. Typically, LLMs are released in two versions: the Base LLM, pre-trained on diverse data, and the instruction-refined LLM, additionally trained with specific instructions for better instruction following. The question arises as to which model should undergo continuous pre-training to maintain its instruction-following abilities while also staying current with the latest data. In this study, we delve into the intricate relationship between continuous pre-training and instruction fine-tuning of the LLMs and investigate the impact of continuous pre-training on the instruction following abilities of both the base and its instruction finetuned model. Further, the instruction fine-tuning process is computationally intense and requires a substantial number of hand-annotated examples for the model to learn effectively. This study aims to find the most compute-efficient strategy to gain up-to-date knowledge and instruction-following capabilities without requiring any instruction data and fine-tuning. We empirically prove our findings on the LLaMa 3, 3.1 and Qwen 2, 2.5 family of base and instruction models, providing a comprehensive exploration of our hypotheses across varying sizes of pre-training data corpus and different LLMs settings.

Community

Pretty interesting idea, which transfers the instruction-following capability through the delta weights of the instruction model and base model, making continuous pretraining more practical.

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2410.10739
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 4

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2410.10739 in a dataset README.md to link it from this page.

Spaces citing this paper 1

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.