fix chat template to avoid empty historical `<think>` blocks

#48

by latent-variable - opened 3 days ago

base: refs/heads/main ← from: refs/pr/48


fix chat template to avoid empty historical `<think>` blocks cc5442c0

latent-variable • 3 days ago • edited about 12 hours ago

This fixes a chat template issue where historical assistant turns emit an empty `<think>...</think>` block whenever `reasoning_content` is empty.

That matters because these empty historical `<think>` blocks change the serialized prompt without adding any useful information.

The fix is a one-line change in the template:

from:

{%- if loop.index0 > ns.last_query_index %}

to:

{%- if loop.index0 > ns.last_query_index and reasoning_content %}
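To make the effect of the extra condition concrete, here is a rough Python sketch of the branch. It is a toy stand-in, not the real `chat_template.jinja`: the function name, the `guard` flag, and the exact strings are assumptions for illustration, with the wrapper layout taken from the example output shown in this PR.

```python
def render_assistant_turn(content, reasoning_content, index, last_query_index, guard):
    """Toy stand-in for the template's assistant branch (hypothetical helper)."""
    after_last_query = index > last_query_index
    # guard=False mirrors the old condition;
    # guard=True mirrors the new one, which also requires reasoning_content to be non-empty.
    emit_think = after_last_query and (bool(reasoning_content) or not guard)
    if emit_think:
        return f"assistant\n<think>\n{reasoning_content}\n</think>\n\n{content}"
    return f"assistant\n{content}"


# An assistant tool-call turn after the last user query, with no reasoning content.
before = render_assistant_turn("<tool_call>...", "", index=3, last_query_index=1, guard=False)
after = render_assistant_turn("<tool_call>...", "", index=3, last_query_index=1, guard=True)

print(repr(before))  # empty <think> wrapper is emitted
print(repr(after))   # tool call is rendered directly
```

When `reasoning_content` is non-empty, both conditions take the same branch, so turns that carry real reasoning render the same as before.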

Why this is important:

- it reduces unnecessary prompt drift
- it improves prefix-cache reuse
- it helps avoid unnecessary cache misses
- it reduces extra token processing caused by equivalent histories rendering differently

In practice, this means less wasted compute and better cache stability, especially in longer multi-turn or tool-using conversations.
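A small sketch of why the wrapper hurts prefix caching, under simplifying assumptions: characters stand in for tokens, the conversation strings are made up, and `shared_prefix_len` is a hypothetical helper rather than anything from an inference server. A prefix cache can only reuse work up to the first position where two prompts differ.

```python
def shared_prefix_len(a, b):
    """Length of the common prefix of two sequences (characters stand in for tokens)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Made-up conversation fragments, for illustration only.
earlier_turns = "user\nWhat's the weather in Paris?\n"
later_turns = "\ntool\n18C, cloudy\n"

# The same assistant turn rendered with and without the empty wrapper.
with_wrapper = earlier_turns + "assistant\n<think>\n\n</think>\n\n<tool_call>..." + later_turns
without_wrapper = earlier_turns + "assistant\n<tool_call>..." + later_turns

shared = shared_prefix_len(with_wrapper, without_wrapper)
print(shared, "of", len(without_wrapper), "positions are reusable")
```

The two prompts diverge inside the assistant turn, so everything after that point, including the unchanged later turns, would have to be processed again.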

The change is intentionally minimal:

- keep the historical `<think>` wrapper when `reasoning_content` is actually present
- do not emit an empty `<think>` block when there is no reasoning content

Without this guard, the template can produce prior turns like:

assistant
<think>

</think>

<tool_call>...

instead of rendering just the assistant content or tool call directly.

So this change preserves real reasoning content while avoiding empty reasoning scaffolding that can hurt caching behavior.

Edit: made a video explaining the bug
https://www.youtube.com/watch?v=3g70-ToSgr0

latent-variable changed pull request title from fix chat template to avoid empty historical `<think>` blocks to fix historical assistant turn rendering in chat_template.jinja about 12 hours ago

align historical assistant rendering with docs dd2757ff

latent-variable • about 12 hours ago • edited about 9 hours ago

small update after more testing: i tried the stricter version that removes historical <think> blocks entirely, but i think that one is too aggressive.

it seems better for cache reuse, but it may affect reasoning behavior / separation in some cases.

so i’m reverting these prs back to the safer minimal fix:

{%- if loop.index0 > ns.last_query_index and reasoning_content %}

that still fixes the empty historical wrapper issue without changing historical turns as aggressively.
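for comparison, a rough sketch of the three behaviors discussed in this thread, for a turn that is already past the last user query. the policy names and the `render` helper are assumptions for illustration, and "strict" is modeled simply as never emitting the wrapper, which may not match the reverted commit exactly.

```python
def render(content, reasoning_content, policy):
    """Toy renderer for an assistant turn after the last user query (hypothetical helper)."""
    if policy == "original":
        emit_think = True                      # wrapper always emitted
    elif policy == "strict":
        emit_think = False                     # assumed: wrapper never emitted
    elif policy == "guarded":
        emit_think = bool(reasoning_content)   # wrapper only when there is reasoning
    else:
        raise ValueError(f"unknown policy: {policy}")
    if emit_think:
        return f"assistant\n<think>\n{reasoning_content}\n</think>\n\n{content}"
    return f"assistant\n{content}"


for policy in ("original", "strict", "guarded"):
    empty = render("<tool_call>...", "", policy)
    full = render("<tool_call>...", "check the forecast first", policy)
    print(policy, "| empty reasoning keeps wrapper:", "<think>" in empty,
          "| real reasoning keeps wrapper:", "<think>" in full)
```

only the guarded policy drops the empty wrapper while still keeping real reasoning in place.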

latent-variable changed pull request title from fix historical assistant turn rendering in chat_template.jinja to fix chat template to avoid empty historical `<think>` blocks about 9 hours ago

revert to safer historical think guard 28a1d554


Ready to merge

This branch is ready to get merged automatically.
