arxiv:2402.07754

Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models

Published on Feb 12, 2024

Abstract

Diffusion-of-Thought (DoT) integrates Chain-of-Thought reasoning into diffusion language models, showing effectiveness on mathematical problems and promising self-correction abilities.

Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models. In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language models. In contrast to autoregressive language models that make decisions in a left-to-right, token-by-token manner, DoT allows reasoning steps to diffuse over time through a diffusion language model and offers greater flexibility in trading off computation for reasoning performance. Our experimental results demonstrate the effectiveness of DoT in multi-digit multiplication, boolean logic, and grade school math problems, with a small diffusion model outperforming a much larger autoregressive model in both efficiency and accuracy. In addition, DoT showcases promising self-correction abilities and benefits from existing reasoning-enhancing techniques like self-consistency decoding. Our findings contribute to the understanding and development of reasoning with diffusion language models.
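Self-consistency decoding, mentioned in the abstract, samples several reasoning chains for the same question and keeps the most frequent final answer. The Python sketch below shows only that voting step, under stated assumptions: `sample_chain` is a hypothetical callable standing in for one DoT generation (for a diffusion model, starting from a different noise seed is one natural way to obtain a different chain), and the final answer is assumed to be the last line of each chain. It is an illustration, not the paper's implementation.

```python
from collections import Counter
from typing import Callable, List, Optional


def self_consistency(
    sample_chain: Callable[[int], str],
    num_samples: int = 5,
) -> Optional[str]:
    """Majority vote over the final answers of independently sampled chains.

    `sample_chain(seed)` is a hypothetical stand-in for one decoding run;
    the last non-empty line of each returned chain is treated as its answer
    (an assumption about the output format).
    """
    answers: List[str] = []
    for seed in range(num_samples):
        lines = sample_chain(seed).strip().splitlines()
        if lines:  # skip empty generations
            answers.append(lines[-1].strip())
    if not answers:
        return None
    # Counter.most_common orders equal counts by first occurrence,
    # so ties resolve to the earliest-sampled answer.
    return Counter(answers).most_common(1)[0][0]


# Usage with hard-coded fake chains in place of a real model:
fake_chains = [
    "3*4=12\n12+5=17\n17",
    "3*4=12\n12+5=18\n18",  # one chain makes an arithmetic slip
    "3*4=12\n12+5=17\n17",
]
print(self_consistency(lambda seed: fake_chains[seed], num_samples=3))  # 17
```

Voting over final answers rather than whole chains lets different but equally valid reasoning paths agree, which is why more samples (more computation) can improve accuracy.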
