model.generate() crashes: AttributeError 'AttentionInterface' has no attribute 'get_interface' (transformers==5.0.0)

#11
by Bias92 - opened

Follow-up: _tied_weights_keys fix confirmed ✅ — but new AttentionInterface error in model.generate()

Thanks for the update @nuxlear! I pulled the latest snapshot with force_download=True and confirmed:

  • _tied_weights_keys is now a dict ({"lm_head.weight": "transformer.wte.weight"}) — the original 'list' object has no attribute 'keys' error during post_init is fixed.
  • ✅ Model loads successfully (weights 100% materialized).
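
The list-vs-dict distinction above is easy to see in plain Python: v5's post_init iterates `_tied_weights_keys` as a mapping, so the old list-valued format fails exactly as quoted. A minimal sketch (the two values are taken from this thread; the surrounding code is illustrative, not the actual Transformers internals):

```python
# Old (v4-era) format: a plain list of tied parameter names.
tied_as_list = ["lm_head.weight"]

# New (v5) format: a dict mapping the tied parameter to its source weight,
# as now shipped in the fixed checkpoint.
tied_as_dict = {"lm_head.weight": "transformer.wte.weight"}

# Code that expects the dict form breaks on the list form:
try:
    tied_as_list.keys()
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'keys'

# The dict form supports the lookup the loader performs:
print(list(tied_as_dict.keys()))  # ['lm_head.weight']
```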

However, model.generate() now crashes with a new error:

AttributeError: 'AttentionInterface' object has no attribute 'get_interface'

Tested:

  • attn_implementation="eager" → same error
  • use_cache=False → same error

So it doesn't look KV-cache/SDPA-specific; the failure seems to happen earlier, during attention dispatch.

Possible root cause (hypothesis):
modeling_exaone.py appears to use the older Transformers v4-style attention selection (class-based dispatch like ExaoneSelfAttention / ExaoneFlashAttention / ExaoneSdpaAttention based on config). In Transformers v5, attention dispatch is handled via AttentionInterface / ALL_ATTENTION_FUNCTIONS. This mismatch may be why generate() hits AttentionInterface.get_interface and fails.

Repro:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"

tok = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    trust_remote_code=True,
    attn_implementation="eager",
)

x = tok("hi\n", return_tensors="pt")
x = {k: v.to(model.device) for k, v in x.items()}

y = model.generate(**x, max_new_tokens=16, do_sample=False, use_cache=True)
# -> AttributeError: 'AttentionInterface' object has no attribute 'get_interface'

Environment:

  • transformers==5.0.0
  • torch==2.10.0
  • Python 3.13.2, Mac / M4 Pro (Apple Silicon)

Happy to help test changes or submit a PR once the attention code is updated.

LG AI Research org

Could you upgrade your transformers version to 5.1.0?
The new modeling code is generated with the latest version, which is also compatible with EXAONE-MoE.

Please refer to this PR for more details: https://github.com/huggingface/transformers/pull/43622

Confirmed that the issue is resolved with transformers==5.1.0.

  • Model loads and model.generate() works without the AttentionInterface error.
  • Tested on Python 3.13.2, Mac / M4 Pro (Apple Silicon)

Thanks @nuxlear for the quick response!

nuxlear changed discussion status to closed
