fix chat template to avoid empty historical `<think>` blocks

#48

This fixes a chat template issue where historical assistant turns can emit empty <think>...</think> blocks even when reasoning_content is empty.

That matters because these empty historical <think> blocks change the serialized prompt without adding any useful information.

The fix is a really simple one-line change in the template:

from:

{%- if loop.index0 > ns.last_query_index %}

to:

{%- if loop.index0 > ns.last_query_index and reasoning_content %}

Why this is important:

  • it reduces unnecessary prompt drift
  • it improves prefix-cache reuse
  • it helps avoid avoidable cache misses
  • it reduces extra token processing caused by equivalent histories rendering differently

In practice, this means less wasted compute and better cache stability, especially in longer multi-turn or tool-using conversations.

The change is intentionally minimal:

  • keep the historical <think> wrapper when reasoning_content is actually present
  • do not emit an empty <think> block when there is no reasoning content

Without this guard, the template can produce prior turns like:

assistant
<think>

</think>

<tool_call>...

instead of rendering just the assistant content or tool call directly.

So this change preserves real reasoning content while avoiding empty reasoning scaffolding that can hurt caching behavior.

Edit: made a video explaining the bug
https://www.youtube.com/watch?v=3g70-ToSgr0

latent-variable changed pull request title from fix chat template to avoid empty historical `<think>` blocks to fix historical assistant turn rendering in chat_template.jinja

small update after more testing: i tried the stricter version that removes historical <think> blocks entirely, but i think that one is too aggressive.

it seems better for cache reuse, but it may affect reasoning behavior / separation in some cases.

so i’m reverting these prs back to the safer minimal fix:

{%- if loop.index0 > ns.last_query_index and reasoning_content %}

that still fixes the empty historical wrapper issue without changing historical turns as aggressively.

latent-variable changed pull request title from fix historical assistant turn rendering in chat_template.jinja to fix chat template to avoid empty historical `<think>` blocks
Ready to merge
This branch is ready to get merged automatically.

Sign up or log in to comment