model.generate() crashes: AttributeError 'AttentionInterface' has no attribute 'get_interface' (transformers==5.0.0)

#11
by Bias92 - opened

Follow-up: _tied_weights_keys fix confirmed ✅ — but new AttentionInterface error in model.generate()

Thanks for the update @nuxlear! I pulled the latest snapshot with force_download=True and confirmed:

  • _tied_weights_keys is now a dict ({"lm_head.weight": "transformer.wte.weight"}) — the original 'list' object has no attribute 'keys' error during post_init is fixed.
  • ✅ Model loads successfully (weights 100% materialized).
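
The list-vs-dict distinction above is easy to see in plain Python: v5's post_init iterates `_tied_weights_keys` as a mapping, so the old list-valued format fails exactly as quoted. A minimal sketch (the two values are taken from this thread; the surrounding code is illustrative, not the actual Transformers internals):

```python
# Old (v4-era) format: a plain list of tied parameter names.
tied_as_list = ["lm_head.weight"]

# New (v5) format: a dict mapping the tied parameter to its source weight,
# as now shipped in the fixed checkpoint.
tied_as_dict = {"lm_head.weight": "transformer.wte.weight"}

# Code that expects the dict form breaks on the list form:
try:
    tied_as_list.keys()
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'keys'

# The dict form supports the lookup the loader performs:
print(list(tied_as_dict.keys()))  # ['lm_head.weight']
```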

However, model.generate() now crashes with a new error:

AttributeError: 'AttentionInterface' object has no attribute 'get_interface'

Tested:

  • attn_implementation="eager" → same error
  • use_cache=False → same error

So it doesn't look KV-cache/SDPA-specific; the failure seems to happen earlier, during attention dispatch.

Possible root cause (hypothesis):
modeling_exaone.py appears to use the older Transformers v4-style attention selection (class-based dispatch like ExaoneSelfAttention / ExaoneFlashAttention / ExaoneSdpaAttention based on config). In Transformers v5, attention dispatch is handled via AttentionInterface / ALL_ATTENTION_FUNCTIONS. This mismatch may be why generate() hits AttentionInterface.get_interface and fails.

Repro:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"

tok = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    trust_remote_code=True,
    attn_implementation="eager",
)

x = tok("hi\n", return_tensors="pt")
x = {k: v.to(model.device) for k, v in x.items()}

y = model.generate(**x, max_new_tokens=16, do_sample=False, use_cache=True)
# -> AttributeError: 'AttentionInterface' object has no attribute 'get_interface'

Environment:

  • transformers==5.0.0
  • torch==2.10.0
  • Python 3.13.2, Mac / M4 Pro (Apple Silicon)

Happy to help test changes or submit a PR once the attention code is updated.

LG AI Research org

Could you upgrade your transformers version to 5.1.0?
The new modeling code is generated with the latest version, which is also compatible with EXAONE-MoE.

Please refer to this PR for more details: https://github.com/huggingface/transformers/pull/43622

Confirmed that the issue is resolved with transformers==5.1.0.

  • Model loads and model.generate() works without the AttentionInterface error.
  • Tested on Python 3.13.2, Mac / M4 Pro (Apple Silicon)

Thanks @nuxlear for the quick response!

nuxlear changed discussion status to closed
