arxiv:2401.03129

Examining Forgetting in Continual Pre-training of Aligned Large Language Models

Published on Jan 6, 2024

Abstract

Continual pre-training of fine-tuned Large Language Models causes catastrophic forgetting that affects output format, knowledge, and reliability; the repetition issue is especially difficult to address.

Large Language Models (LLMs) have recently shown remarkable proficiency across a wide range of tasks. Because LLMs have powerful applications in numerous fields, LLM development has surged. A common practice in developing LLMs is to continue pre-training a model that has already been fine-tuned. However, this can lead to catastrophic forgetting. In our work, we investigate the forgetting that occurs during continual pre-training of an existing fine-tuned LLM. We evaluate the impact of continual pre-training on the fine-tuned LLM across several dimensions, including output format, knowledge, and reliability. Experimental results show that addressing catastrophic forgetting during continual pre-training is a non-trivial challenge, with the repetition issue being especially hard to resolve.
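The abstract singles out repetition as the hardest form of forgetting to address, but it does not state how repetition is measured. As an illustration only, and not necessarily the metric used in the paper, one common way to quantify degenerate repetition in generated text is the repeated n-gram ratio (often called rep-n): one minus the number of unique n-grams divided by the total number of n-grams. The function name `repeated_ngram_ratio`, the whitespace tokenization, and the default `n=4` are assumptions for this sketch.

```python
from typing import List


def repeated_ngram_ratio(tokens: List[str], n: int = 4) -> float:
    """Fraction of n-grams in `tokens` that repeat an n-gram already seen.

    Illustrative sketch (hypothetical helper, not from the paper):
    computes 1 - (#unique n-grams / #n-grams). A value near 0 means
    little repetition; a value near 1 means the output is looping.
    Returns 0.0 when the sequence has fewer than n tokens.
    """
    # Slide a window of width n over the token list to collect n-grams.
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


if __name__ == "__main__":
    # Whitespace tokenization is a simplifying assumption; a real
    # evaluation would use the model's own tokenizer.
    healthy = "the model answers the question and then stops".split()
    looping = "I am sorry I am sorry I am sorry I am sorry".split()
    print(repeated_ngram_ratio(healthy, n=2))
    print(repeated_ngram_ratio(looping, n=2))
```

Comparing this score on outputs from the fine-tuned model before and after continual pre-training gives one rough, hedged signal of whether the repetition issue has appeared.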
