fix chat template to avoid empty historical `<think>` blocks
This fixes a chat template issue where historical assistant turns emit empty `<think>...</think>` blocks when `reasoning_content` is empty.
That matters because these empty historical `<think>` blocks change the serialized prompt without adding any useful information.
The fix is a one-line change in the template:
from:
{%- if loop.index0 > ns.last_query_index %}
to:
{%- if loop.index0 > ns.last_query_index and reasoning_content %}
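To make the effect of the extra condition concrete, here is a minimal Python sketch of the branch. It is a simplified model, not the actual template: the function name `render_assistant_turn`, the `guard` flag, and the exact whitespace around the wrapper are assumptions for illustration only.

```python
def render_assistant_turn(index, last_query_index, reasoning_content, content, guard):
    """Simplified stand-in for the template branch (not the real Jinja)."""
    # Original condition: wrap every assistant turn after the last user query.
    wrap = index > last_query_index
    if guard:
        # Patched condition: additionally require non-empty reasoning_content.
        wrap = wrap and bool(reasoning_content)
    if wrap:
        return f"assistant\n<think>\n{reasoning_content}\n</think>\n\n{content}"
    return f"assistant\n{content}"


# Hypothetical tool-calling turn with no reasoning attached.
turn = dict(index=2, last_query_index=0, reasoning_content="", content="<tool_call>...")
before = render_assistant_turn(guard=False, **turn)  # contains an empty <think> wrapper
after = render_assistant_turn(guard=True, **turn)    # content only
```

In this sketch a turn with real reasoning still gets the wrapper under both conditions, so only the empty case changes.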
Why this is important:
- it reduces unnecessary prompt drift
- it improves prefix-cache reuse
- it avoids cache misses that the empty wrappers would otherwise cause
- it reduces extra token processing caused by equivalent histories rendering differently
In practice, this means less wasted compute and better cache stability, especially in longer multi-turn or tool-using conversations.
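The caching argument can be illustrated with a rough sketch. It assumes a cache that can only reuse the part of a prompt that matches a previously seen prompt from the start, and it uses lines as a stand-in for tokens. The prompts and the helper `shared_prefix_segments` are made up for this example.

```python
def shared_prefix_segments(a, b):
    """Count the leading segments (here: lines) that two prompts have in common."""
    count = 0
    for x, y in zip(a.splitlines(keepends=True), b.splitlines(keepends=True)):
        if x != y:
            break
        count += 1
    return count


# Two renderings of the same (hypothetical) conversation history.
history = "user\nWhat is the weather?\n"
tail = "<tool_call>...\ntool\nsunny\n"
without_wrapper = history + "assistant\n" + tail
with_empty_wrapper = history + "assistant\n<think>\n\n</think>\n\n" + tail

reused = shared_prefix_segments(without_wrapper, with_empty_wrapper)
total = len(without_wrapper.splitlines())
print(f"reusable prefix: {reused} of {total} segments")
```

Under these assumptions, everything after the point where the empty wrapper appears has to be processed again, even though the two histories carry the same information.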
The change is intentionally minimal:
- keep the historical `<think>` wrapper when `reasoning_content` is actually present
- do not emit an empty `<think>` block when there is no reasoning content
Without this guard, the template can produce prior turns like:
assistant
<think>
</think>
<tool_call>...
instead of rendering just the assistant content or tool call directly.
So this change preserves real reasoning content while avoiding empty reasoning scaffolding that can hurt caching behavior.
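One way to check a rendered prompt for this pattern is a small regex scan. This is only a sketch: `count_empty_think_blocks` is a hypothetical helper, and a template may emit an empty block on purpose elsewhere in the prompt, so a hit is something to inspect and not proof of a bug.

```python
import re

# Matches a <think> block that contains nothing but whitespace.
EMPTY_THINK = re.compile(r"<think>\s*</think>")


def count_empty_think_blocks(rendered_prompt):
    """Return how many whitespace-only <think> blocks appear in the prompt."""
    return len(EMPTY_THINK.findall(rendered_prompt))


sample = "assistant\n<think>\n\n</think>\n\n<tool_call>..."
print(count_empty_think_blocks(sample))
```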
Edit: made a video explaining the bug
https://www.youtube.com/watch?v=3g70-ToSgr0
Small update after more testing: I tried the stricter version that removes historical <think> blocks entirely, but I think that one is too aggressive.
It seems better for cache reuse, but it may affect reasoning behavior / separation in some cases.
So I’m reverting these PRs back to the safer minimal fix:
{%- if loop.index0 > ns.last_query_index and reasoning_content %}
That still fixes the empty historical wrapper issue without changing historical turns as aggressively.