Instructions to use rpDungeon/Gemma-4-E6B-IT-raw with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use rpDungeon/Gemma-4-E6B-IT-raw with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="rpDungeon/Gemma-4-E6B-IT-raw")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("rpDungeon/Gemma-4-E6B-IT-raw")
model = AutoModelForImageTextToText.from_pretrained("rpDungeon/Gemma-4-E6B-IT-raw")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use rpDungeon/Gemma-4-E6B-IT-raw with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rpDungeon/Gemma-4-E6B-IT-raw"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rpDungeon/Gemma-4-E6B-IT-raw",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/rpDungeon/Gemma-4-E6B-IT-raw

SGLang

How to use rpDungeon/Gemma-4-E6B-IT-raw with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "rpDungeon/Gemma-4-E6B-IT-raw" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rpDungeon/Gemma-4-E6B-IT-raw",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "rpDungeon/Gemma-4-E6B-IT-raw" \
        --host 0.0.0.0 \
        --port 30000


Gemma-4-E6B-IT-raw

⚠️ RAW / UNHEALED depth-upscale. This model degenerates (loops) under normal decoding and is not usable as-is. It is published as a base artifact for a healing train and for reproducibility of the E4B depth-upscale research. Do not deploy it raw.

What it is

A depth-upscaled (42 → 66 transformer layers, ~11.9B params) passthrough frankenmerge of google/gemma-4-E4B-it onto itself ("IT + IT"). It is the "E6B" depth target — a deeper E4B — in its pre-heal state. No other model is mixed in; every layer comes from gemma-4-E4B-it.

How it was made

Gemma-4 E4B uses Per-Layer Embeddings (PLE) — each layer is injected with a depth-specific, token-derived signal (embed_tokens_per_layer + per_layer_model_projection). Generic passthrough mergers leave those two global tensors at their original 42-layer width, which breaks a stacked model, so this was assembled with a custom PLE-aware stacker.

The 42 IT layers were re-sequenced into 66 output slots via four overlapping slices (the all-IT version of the project's "e6b_v1" topology — same slice structure, every slice sourced from IT):

output slots  0–17 : IT layers  0–17     (forward)
output slots 18–33 : IT layers 10–25     (rewind/overlap to L10)
output slots 34–49 : IT layers 18–33     (rewind/overlap to L18)
output slots 50–65 : IT layers 26–41     (rewind/overlap to L26)
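The slice table can be expanded into an explicit slot-to-source map, which makes the layer count and the seams easy to check. This is a minimal sketch; `SLICES`, `build_slot_map`, and `slot_to_source` are illustrative names, not identifiers from the project's stacker.

```python
# Slice plan from the table above: (first output slot, first IT layer, last IT layer).
SLICES = [
    (0, 0, 17),
    (18, 10, 25),
    (34, 18, 33),
    (50, 26, 41),
]


def build_slot_map(slices):
    """Return a list where entry i is the IT source layer for output slot i."""
    slot_to_source = []
    for first_slot, src_start, src_end in slices:
        # Each slice must begin exactly where the previous one ended.
        assert first_slot == len(slot_to_source), "slices must be contiguous"
        slot_to_source.extend(range(src_start, src_end + 1))
    return slot_to_source


slot_to_source = build_slot_map(SLICES)
print(len(slot_to_source))    # 66 output layers
print(slot_to_source[17:19])  # [17, 10]: the first seam rewinds from layer 17 to layer 10
```

Every one of the 42 IT layers appears at least once, and layers 10 through 33 appear twice, which accounts for the 24 added layers.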

For each output layer i (sourced from IT layer j) the stacker:

copies all model.language_model.layers.{j}.* weights (including the per-layer PLE riders);
remaps the global PLE tensors embed_tokens_per_layer[:, j·256:(j+1)·256] → [:, i·256:(i+1)·256] and per_layer_model_projection[j·256:(j+1)·256, :] → [i·256:(i+1)·256, :], widening the PLE table from 42×256 to 66×256;
copies the trunk (token embeddings, final norm, vision/audio towers) verbatim from IT;
sets num_hidden_layers = 66, num_kv_shared_layers = 26 (trailing shared region recomputed so each shared layer borrows KV from the last non-shared layer of its attention type), and rebuilds layer_types to keep E4B's 5-sliding : 1-full rhythm.
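The PLE remap in the second step is a block copy: column block j of embed_tokens_per_layer (and row block j of per_layer_model_projection) is written to block i. The sketch below shows that indexing on plain nested lists with a toy width of 2 rather than 256. The function names are hypothetical, and the actual stacker presumably operates on tensors loaded from the checkpoint.

```python
PLE_WIDTH = 256  # per-layer embedding width, as stated in the card


def remap_ple_columns(table, slot_to_source, width=PLE_WIDTH):
    """Widen a [vocab, layers*width] table: output block i copies source block j."""
    widened = []
    for row in table:
        new_row = []
        for j in slot_to_source:
            new_row.extend(row[j * width:(j + 1) * width])
        widened.append(new_row)
    return widened


def remap_ple_rows(projection, slot_to_source, width=PLE_WIDTH):
    """Widen a [layers*width, hidden] matrix: output row block i copies source block j."""
    widened = []
    for j in slot_to_source:
        widened.extend(list(r) for r in projection[j * width:(j + 1) * width])
    return widened


# Toy demo: 3 source layers of width 2, stacked as [0, 1, 1, 2] (layer 1 repeated).
toy_map = [0, 1, 1, 2]
toy_table = [[0, 1, 10, 11, 20, 21]]           # 1 token x (3 layers * 2)
toy_proj = [[0], [1], [10], [11], [20], [21]]  # (3 layers * 2) x 1 hidden unit

print(remap_ple_columns(toy_table, toy_map, width=2))
print(remap_ple_rows(toy_proj, toy_map, width=2))
```

With the real 66-entry map and width 256, the same indexing takes the table from 42×256 to 66×256 columns.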

Parameter breakdown (~11.9B total): transformer layers 6.65B (56%), Per-Layer-Embedding table 4.43B (37%), token embeddings 0.67B (6%), the rest PLE projection + vision/audio towers. Note that >40% of the mass sits outside the transformer layers. The token embeddings and towers scale with vocabulary/modality, not depth, whereas the PLE table scales with both vocabulary and layer count (66×256 columns instead of 42×256), so the depth-upscale's headline size comes from both the 24 added layers and the widened PLE table.
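The breakdown can be sanity-checked with a little arithmetic. The sketch below uses only the figures quoted above, plus one stated assumption: that the layer stack and the PLE table both grow linearly with layer count, so 24/66 of each is attributable to the upscale. Treat the results as rough estimates, not measured values.

```python
TOTAL_B = 11.9      # total parameters in billions (from the card)
LAYERS_B = 6.65     # transformer layers
PLE_TABLE_B = 4.43  # Per-Layer-Embedding table
TOKEN_EMB_B = 0.67  # token embeddings
SRC_LAYERS, OUT_LAYERS = 42, 66

# Remainder: PLE projection plus vision/audio towers.
rest_b = TOTAL_B - (LAYERS_B + PLE_TABLE_B + TOKEN_EMB_B)

# Share of parameters that sit outside the transformer layers.
non_layer_share = (TOTAL_B - LAYERS_B) / TOTAL_B

# Assumption: linear growth with layer count for both components.
added_fraction = (OUT_LAYERS - SRC_LAYERS) / OUT_LAYERS
added_layers_b = LAYERS_B * added_fraction
added_ple_b = PLE_TABLE_B * added_fraction

print(f"rest (PLE projection + towers): {rest_b:.2f}B")
print(f"share outside transformer layers: {non_layer_share:.1%}")
print(f"added by 24 extra layers: {added_layers_b:.2f}B")
print(f"added by the wider PLE table: {added_ple_b:.2f}B")
```

Under that assumption, roughly 2.4B of the growth comes from the extra layers and roughly 1.6B from the wider PLE table.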

Why it is "raw" (what's wrong with it)

E4B's per-layer embeddings are welded to each layer's weights, and re-running layers at a different depth (the three rewind/overlap seams) pushes the residual stream off-distribution. The result is grammatical but loops under greedy decoding. A broad set of no-train fixes (PLE re-indexing by destination, zeroing duplicated PLE, residual-write scaling, brand-new interpolated layers, donor swaps) were all tested, and none recovered coherence. The break is structural, not something a weight-shuffle can fix; new depth has to be learned. (This 66-layer build has more rewind seams than a minimal single-seam stack, so expect it to need a somewhat heavier heal.)
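When evaluating a heal, it helps to flag looping generations automatically. The helper below is a hypothetical sketch, not part of the project: it reports whether a sequence of token ids ends in the same block repeated several times in a row, which is one simple signature of the degeneration described above. The thresholds are arbitrary defaults.

```python
def has_repeating_tail(token_ids, min_period=1, max_period=32, min_repeats=3):
    """Return True if token_ids ends in one block repeated min_repeats times."""
    n = len(token_ids)
    for period in range(min_period, max_period + 1):
        span = period * min_repeats
        if span > n:
            break  # longer periods cannot fit either
        tail = token_ids[n - span:]
        block = tail[:period]
        # The tail loops if every position matches the block at the same offset.
        if all(tail[k] == block[k % period] for k in range(span)):
            return True
    return False


print(has_repeating_tail([1, 2, 3, 4, 5, 4, 5, 4, 5]))  # True: "4 5" repeated 3 times
print(has_repeating_tail([1, 2, 3, 4, 5, 6, 7]))        # False
```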

How to make it usable

A heal (LoRA SFT/CPT) re-knits it into a coherent deeper model. A Fisher/subspace-protected heal — protecting the high-importance instruction directions while re-knitting the off-distribution layers — is the intended next step, to recover coherence without sacrificing instruction-following.
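The card does not say how the Fisher protection would be implemented. One well-known way to protect important directions during fine-tuning is an elastic-weight-consolidation style penalty, which adds a Fisher-weighted quadratic cost for moving parameters away from their original values. The sketch below illustrates that idea on plain Python lists; the function name, the dictionary layout, and the `strength` parameter are assumptions for illustration only.

```python
def fisher_penalty(params, anchor, fisher, strength=1.0):
    """EWC-style penalty: 0.5 * strength * sum_i F_i * (theta_i - theta_anchor_i)^2."""
    total = 0.0
    for name, values in params.items():
        for p, a, f in zip(values, anchor[name], fisher[name]):
            total += f * (p - a) ** 2
    return 0.5 * strength * total


# Toy example: the first weight is "important" (high Fisher), the second is not.
params = {"w": [1.0, 2.0]}
anchor = {"w": [0.0, 0.0]}
fisher = {"w": [10.0, 0.1]}
print(fisher_penalty(params, anchor, fisher))
```

Added to the training loss, a term like this makes high-Fisher parameters expensive to move while leaving low-Fisher ones free to adapt.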

Intended use

Base for a healing train (the primary purpose).
Reproducibility / study of Gemma-4 depth-upscaling and its Per-Layer-Embedding constraints.

Built 2026-06-01 as part of the E4B → E6B depth-upscale research.

Safetensors
Model size: 12B params
Tensor type: BF16

Model tree for rpDungeon/Gemma-4-E6B-IT-raw

Base model: google/gemma-4-E4B
Finetuned: google/gemma-4-E4B-it
Finetuned: this model