Gemma-4-E6B-IT-raw
⚠️ RAW / UNHEALED depth-upscale. This model degenerates (loops) under normal decoding and is not usable as-is. It is published as a base artifact for a healing train and for reproducibility of the E4B depth-upscale research. Do not deploy it raw.
What it is
A depth-upscaled (42 → 66 transformer layers, ~11.9B params) passthrough frankenmerge of
google/gemma-4-E4B-it onto itself ("IT + IT").
It is the "E6B" depth target — a deeper E4B — in its pre-heal state. No other model is mixed in;
every layer comes from gemma-4-E4B-it.
How it was made
Gemma-4 E4B uses Per-Layer Embeddings (PLE) — each layer is injected with a depth-specific,
token-derived signal (embed_tokens_per_layer + per_layer_model_projection). Generic passthrough
mergers leave those two global tensors at their original 42-layer width, which breaks a stacked model,
so this was assembled with a custom PLE-aware stacker.
The 42 IT layers were re-sequenced into 66 output slots via four overlapping slices (the all-IT version of the project's "e6b_v1" topology — same slice structure, every slice sourced from IT):
output slots 0–17 : IT layers 0–17 (forward)
output slots 18–33 : IT layers 10–25 (rewind/overlap to L10)
output slots 34–49 : IT layers 18–33 (rewind/overlap to L18)
output slots 50–65 : IT layers 26–41 (rewind/overlap to L26)
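The slice table above can be expanded into an explicit slot-to-source map. This is a minimal sketch built only from the four ranges listed; the names SLICES, build_slot_map, and slot_map are illustrative and are not taken from the project's stacker.

```python
# Slice table from the text: (first output slot, first IT source layer, slice length).
SLICES = [
    (0, 0, 18),    # output slots 0-17  <- IT layers 0-17
    (18, 10, 16),  # output slots 18-33 <- IT layers 10-25
    (34, 18, 16),  # output slots 34-49 <- IT layers 18-33
    (50, 26, 16),  # output slots 50-65 <- IT layers 26-41
]

def build_slot_map(slices):
    """Return a list where slot_map[i] is the IT source layer j for output slot i."""
    slot_map = []
    for first_slot, first_src, length in slices:
        # Each slice must start exactly where the previous one ended in output space.
        assert first_slot == len(slot_map), "gap or overlap in output slots"
        slot_map.extend(range(first_src, first_src + length))
    return slot_map

slot_map = build_slot_map(SLICES)

# How many times each of the 42 source layers appears in the stacked model.
reuse = {j: slot_map.count(j) for j in range(42)}
duplicated = [j for j, count in reuse.items() if count == 2]

print(len(slot_map))                   # total output layers
print(slot_map[17], slot_map[18])      # source layers on either side of the first seam
print(duplicated[0], duplicated[-1])   # first and last source layer used twice
```

Under this table every IT layer is used at least once, and the 24 extra layers are second copies of IT layers 10 through 33.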
For each output layer i (sourced from IT layer j) the stacker:
- copies all model.language_model.layers.{j}.* weights (including the per-layer PLE riders);
- remaps the global PLE tensors embed_tokens_per_layer[:, j·256:(j+1)·256] → [:, i·256:(i+1)·256] and per_layer_model_projection[j·256:(j+1)·256, :] → [i·256:(i+1)·256, :], widening the PLE table from 42×256 to 66×256;
- copies the trunk (token embeddings, final norm, vision/audio towers) verbatim from IT;
- sets num_hidden_layers = 66, num_kv_shared_layers = 26 (trailing shared region recomputed so each shared layer borrows KV from the last non-shared layer of its attention type), and rebuilds layer_types to keep E4B's 5-sliding : 1-full rhythm.
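The PLE remap and the config rebuild in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's stacker: the helper names are hypothetical, the table is a toy list of lists instead of a real tensor, the position of the full-attention layer inside each group of six is assumed, and the type strings "sliding_attention" and "full_attention" are assumed labels.

```python
PLE_DIM = 256  # per-layer block width stated in the text

def ple_block(layer, width=PLE_DIM):
    """Half-open index range owned by one layer in the global PLE tensors."""
    return layer * width, (layer + 1) * width

def remap_columns(table, slot_map, width=PLE_DIM):
    """Widen a [rows x n_src*width] table to [rows x n_out*width]: the block of
    source layer j is copied into the block of every output slot i mapped to j.
    The same idea applies along rows for per_layer_model_projection."""
    widened = []
    for row in table:
        new_row = []
        for j in slot_map:
            lo, hi = ple_block(j, width)
            new_row.extend(row[lo:hi])
        widened.append(new_row)
    return widened

def rebuild_layer_types(n_layers, period=6):
    """5 sliding : 1 full. ASSUMPTION: the full layer closes each group of six."""
    return ["full_attention" if (i + 1) % period == 0 else "sliding_attention"
            for i in range(n_layers)]

def kv_donors(layer_types, num_kv_shared):
    """For each attention type, find the last non-shared layer of that type;
    the trailing shared layers of the same type borrow its KV."""
    first_shared = len(layer_types) - num_kv_shared
    donors = {}
    for kind in sorted(set(layer_types)):
        donors[kind] = max(i for i in range(first_shared) if layer_types[i] == kind)
    return first_shared, donors

# Toy check of the remap: 3 source layers, block width 2, layer 1 used twice.
toy_table = [[0, 1, 2, 3, 4, 5]]
toy_widened = remap_columns(toy_table, slot_map=[0, 1, 1, 2], width=2)
print(toy_widened)

# Config rebuild at the sizes given in the text.
layer_types = rebuild_layer_types(66)
first_shared, donors = kv_donors(layer_types, num_kv_shared=26)
print(layer_types.count("full_attention"), first_shared, donors)
```

If the real model places the full-attention layer elsewhere in the group, only rebuild_layer_types changes; the donor lookup is unaffected.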
Parameter breakdown (~11.9B total): transformer layers 6.65B (56%), Per-Layer-Embedding table 4.43B (37%), token embeddings 0.67B (6%), the rest PLE projection + vision/audio towers. Note that >40% of the mass sits outside the transformer layers. The token embeddings and towers scale with vocabulary/modality, not depth, but the PLE table scales with vocabulary × depth, so the depth-upscale's headline size comes from both the 24 added layers and the widened PLE table.
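The breakdown can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above and makes two simplifying assumptions: every transformer layer has the same parameter count, and the PLE table grows linearly with layer count (42×256 to 66×256 columns). The derived numbers are therefore estimates, not measured values.

```python
# Figures quoted in the text, in billions of parameters.
TOTAL_B = 11.9
PARTS_B = {"layers": 6.65, "ple_table": 4.43, "token_embeddings": 0.67}
SRC_LAYERS, OUT_LAYERS = 42, 66

# Percentage share of each component, rounded as in the text.
shares = {name: round(100 * value / TOTAL_B) for name, value in PARTS_B.items()}

# Whatever is left is the PLE projection plus the vision/audio towers.
rest_b = TOTAL_B - sum(PARTS_B.values())

# ASSUMPTION: uniform layer size and a PLE table linear in depth.
added_layers = OUT_LAYERS - SRC_LAYERS
added_layer_b = PARTS_B["layers"] * added_layers / OUT_LAYERS
added_ple_b = PARTS_B["ple_table"] * added_layers / OUT_LAYERS

print(shares)
print(f"rest: {rest_b:.2f}B")
print(f"added by {added_layers} layers: {added_layer_b:.2f}B, by wider PLE table: {added_ple_b:.2f}B")
```

Under these assumptions the widened PLE table accounts for roughly two fifths of the growth over the 42-layer model, with the added layers making up the rest.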
Why it is "raw" (what's wrong with it)
E4B's per-layer embeddings are welded to each layer's weights, and re-running layers at a different depth (the three rewind/overlap seams) pushes the residual stream off-distribution. The result is grammatical but loops under greedy decoding. A broad set of no-train fixes (PLE re-indexing by destination, zeroing duplicated PLE, residual-write scaling, brand-new interpolated layers, donor swaps) were all tested, and none recovered coherence. The break is structural, not something a weight-shuffle can fix; new depth has to be learned. (This 66-layer build has more rewind seams than a minimal single-seam stack, so expect it to need a somewhat heavier heal.)
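One simple way to flag the looping failure mode in generated output is to check whether the tail of a token sequence is a short block repeated several times. The function below is a hypothetical diagnostic written for illustration; it is not the evaluation used in this research, and its thresholds are arbitrary defaults.

```python
def find_loop(tokens, max_period=32, min_repeats=3):
    """Return the shortest period p such that the last p * min_repeats tokens
    are one p-token block repeated, or None if no such loop is found."""
    n = len(tokens)
    for period in range(1, max_period + 1):
        span = period * min_repeats
        if span > n:
            break  # not enough tokens left to see min_repeats full cycles
        tail = tokens[n - span:]
        # The tail loops if every token equals the one at the same offset in the first block.
        if all(tail[k] == tail[k % period] for k in range(span)):
            return period
    return None

looping = "the cat sat on the mat the mat the mat the mat".split()
healthy = "the cat sat on the mat".split()
print(find_loop(looping), find_loop(healthy))
```

The same check works on token ids instead of words, which avoids depending on how the tokenizer splits whitespace.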
How to make it usable
A heal (LoRA SFT/CPT) re-knits it into a coherent deeper model. A Fisher/subspace-protected heal — protecting the high-importance instruction directions while re-knitting the off-distribution layers — is the intended next step, to recover coherence without sacrificing instruction-following.
Intended use
- Base for a healing train (the primary purpose).
- Reproducibility / study of Gemma-4 depth-upscaling and its Per-Layer-Embedding constraints.
Built 2026-06-01 as part of the E4B → E6B depth-upscale research.