tiny-30m-byte: LocalAgent (28.32M params)
A from-scratch, byte-level tool-calling agent model from LocalAgent. Pure PyTorch, 28.32M params, trained on CPU. It pairs a tiny decoder (GQA + RoPE + SwiGLU) with a dual head (tool-selection classifier + pointer/copy argument head) and prompt-grounded constrained decoding for reliable tool calls across 21 tools (general assistant, the Claude Code / Codex coding surface, and computer-use / productivity tools), including parallel two-call turns.
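As a rough illustration of the byte-level vocabulary and the pointer/copy argument head, here is a minimal sketch. It is not the LocalAgent implementation. It assumes the pointer head outputs a `(start, end)` byte span over the prompt; `encode_bytes`, `copy_span`, and the way the span is found are hypothetical stand-ins.

```python
def encode_bytes(text: str) -> list[int]:
    """Byte-level tokenization: each UTF-8 byte is one token id in [0, 255]."""
    return list(text.encode("utf-8"))


def copy_span(token_ids: list[int], start: int, end: int) -> str:
    """Copy prompt bytes [start, end) verbatim and decode them as the argument."""
    return bytes(token_ids[start:end]).decode("utf-8")


prompt = "What's the weather in Cusco?"
ids = encode_bytes(prompt)

# Hypothetical pointer-head output: the span is located by byte search here,
# whereas a trained head would predict these two indices.
start = prompt.encode("utf-8").find(b"Cusco")
end = start + len(b"Cusco")

city = copy_span(ids, start, end)
call = {"tool": "get_weather", "args": {"city": city}}
print(call)
```

Under this assumption, a copied argument is always a verbatim slice of the prompt bytes, which is one way an argument can stay grounded in what the user actually typed.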
Architecture
- vocab 256 (byte-level), d_model 512, layers 10, heads 8/2 (GQA), ffn 1408
- factorized embeddings: False
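The listed dimensions can be checked against the stated parameter count with a short calculation. This is a sketch under several assumptions that the card does not state: "8/2" means 8 query heads and 2 key/value heads, linear layers have no biases, each layer has two norm weight vectors (as with RMSNorm), and the output head shares weights with the input embedding. `count_params` is a hypothetical helper, not part of LocalAgent.

```python
def count_params(vocab=256, d_model=512, n_layers=10,
                 n_heads=8, n_kv_heads=2, ffn=1408):
    head_dim = d_model // n_heads          # 64
    kv_dim = n_kv_heads * head_dim         # 128 (GQA: fewer K/V heads)

    attn = d_model * d_model               # Q projection
    attn += 2 * d_model * kv_dim           # K and V projections
    attn += d_model * d_model              # output projection

    mlp = 3 * d_model * ffn                # SwiGLU: gate, up, down
    norms = 2 * d_model                    # assumed: two norm vectors per layer

    per_layer = attn + mlp + norms
    embed = vocab * d_model                # assumed tied with the output head
    final_norm = d_model
    return n_layers * per_layer + embed + final_norm


total = count_params()
print(f"{total:,} params (~{total / 1e6:.2f}M)")
# prints "28,322,304 params (~28.32M)"
```

The result matches the 28.32M figure, which is consistent with these assumptions but does not confirm them; `config.json` is the authoritative source.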
Files
- config.json: ModelConfig
- model.safetensors / pytorch_model.bin: decoder weights
- agent_heads.bin: trained tool-selection + pointer heads (optional)
What it can do (use cases)
One byte-level model that turns a natural-language turn into a grounded tool call, across an assistant, a coding agent, computer-use/productivity apps, and parallel two-call turns:
| you say | it calls |
|---|---|
| "What's the weather in Cusco?" | get_weather(city="Cusco") |
| "What is 19 * 19 * 5?" | calculator(expression="19*19*5") |
| "Open the file bin/run.sh." | read_file(path="bin/run.sh") |
| "Grep for 'TODO'." | grep_search(pattern="TODO") |
| "Run the tests." | run_tests() |
| "Commit with message 'fix bug'." | git_commit(message="fix bug") |
| "Send an email to Greta." | send_email(recipient="Greta") |
| "Go to figma.com." | open_url(url="figma.com") |
| "Send a Slack message saying 'ship it'." | slack_send(message="ship it") |
| "Create a Jira ticket titled 'broken link'." | jira_issue(summary="broken link") |
| "Compose an email to Judy and search for how tall is Everest." | send_email(recipient="Judy") + web_search(query="how tall is Everest") |
Multi-turn coding (grounds a follow-up arg from a tool response):
read_file(tests/test_api.py) → result → run_tests() → "FAILED…" → fix.
At catalog scale (hundreds to thousands of tools), selection is done by top-k retrieval instead of a
fixed classifier head. See the LocalAgent repo.
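The general shape of top-k tool retrieval can be sketched as follows. This is an illustration only: the toy character-trigram `embed` function stands in for whatever encoder LocalAgent actually uses, and the catalog descriptions, `cosine`, and `top_k_tools` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: counts of lowercase character trigrams."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) \
        * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_tools(query: str, catalog: dict[str, str], k: int = 2) -> list[str]:
    """Rank tools by similarity of their description to the query; keep k."""
    q = embed(query)
    ranked = sorted(catalog,
                    key=lambda name: cosine(q, embed(catalog[name])),
                    reverse=True)
    return ranked[:k]


# Hypothetical tool descriptions, written for this example.
catalog = {
    "get_weather": "get the weather forecast for a city",
    "calculator": "evaluate an arithmetic expression",
    "read_file": "open and read a file at a path",
    "send_email": "send an email to a recipient",
}

candidates = top_k_tools("What's the weather in Cusco?", catalog, k=2)
print(candidates)
```

The design point is that the candidate set shrinks to k tools before any per-tool decision is made, so the cost of selection no longer depends on a fixed-size output layer.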
Load (pure PyTorch, no transformers)
import json
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from localagent.model import LocalAgentLM, ModelConfig

REPO_ID = "danelcsb/localagent-tiny-30m-byte"

# Download config.json and keep only the keys that ModelConfig declares.
with open(hf_hub_download(REPO_ID, "config.json")) as f:
    cfg_d = json.load(f)
cfg = ModelConfig(**{k: v for k, v in cfg_d.items() if k in ModelConfig.__dataclass_fields__})

# Build the decoder, load its weights, and switch to inference mode.
model = LocalAgentLM(cfg)
model.load_state_dict(load_file(hf_hub_download(REPO_ID, "model.safetensors")))
model.eval()
See the LocalAgent repo for the grounded decoder / agent runtime (tool head, pointer head, retrieval, parallel-call decode).