Nepalaya-R

Nepalaya-R is a large language model project providing full source code, configuration files, and deployment tooling for local inference and Hugging Face Hub usage.

About This Model

This repository contains the Nepalaya-R model implementation with:

  • ✅ Full source code and inference implementations
  • ✅ Tokenizer configuration adapted for Nepalaya-R
  • ✅ Easy-to-use inference scripts
  • ✅ Documentation and setup guides

Quick Start

Installation

pip install -r requirements.txt

Download & Setup

Option 1: Download from Hugging Face

export HF_TOKEN=your_token
python download_model.py --model-id your-username/Nepalaya-R --local-dir ./model_weights

Option 2: Run Quick Inference

python quick_inference.py --prompt "Your prompt here"

Mirror Setup

To create your own Nepalaya-R repo mirror:

export HF_TOKEN=your_token
python mirror_to_hf.py \
  --source source-org/source-model \
  --dest your-username/Nepalaya-R

Documentation

Model Architecture

Nepalaya-R architecture summary:

  • Parameters: 671B
  • Context Length: Extended via sparse attention
  • Training: Sparse-attention-based training pipeline
  • Architecture: Optimized transformer with mixture-of-experts
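
The mixture-of-experts design above can be illustrated with a minimal top-k routing sketch in plain Python. This is a simplified illustration, not the actual Nepalaya-R implementation; real MoE layers operate on tensors with learned gating weights, and the function name here is ours:

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their
    softmax probabilities so the selected weights sum to 1."""
    # Softmax over all expert logits (numerically stabilized).
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the top-k experts and renormalize their weights.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# A token whose gate favors experts 1 and 3 out of 4:
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
```

Each token's hidden state is then sent only to the selected experts, and their outputs are combined with these weights, which is what makes inference cheaper than a dense forward pass through all 671B parameters.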

Key Features

  • Multi-expert routing for efficient inference
  • Sparse attention for long-context processing
  • Chat template support
  • Distributed inference capabilities
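
Sparse attention for long contexts can be sketched with a sliding-window mask, one common sparse pattern; the exact pattern Nepalaya-R uses is not specified here, so treat this as an assumed example:

```python
def sliding_window_mask(seq_len, window):
    """Boolean causal mask where each query position attends only to
    itself and the previous `window - 1` positions."""
    return [
        [q - window < k <= q for k in range(seq_len)]
        for q in range(seq_len)
    ]

mask = sliding_window_mask(6, window=3)
```

With a fixed window, attention cost grows linearly in sequence length rather than quadratically, which is how sparse patterns make extended context lengths practical.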

System Requirements

  • GPU Memory: 48GB+ VRAM recommended
  • RAM: 64GB+ system memory
  • Storage: ~300GB for full model weights
  • SSD: Fast storage recommended
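
Given the ~300GB weight footprint listed above, it can be worth verifying free disk space before starting a download. A small stdlib-only helper (the function name is ours, not part of this repository's scripts):

```python
import shutil

def has_free_space(path, required_gb):
    """Return True if the filesystem containing `path` has at least
    `required_gb` gigabytes free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3

# e.g. before running download_model.py into the current directory:
enough = has_free_space(".", 300)
```

A similar preflight check for GPU memory could use `torch.cuda.mem_get_info`, assuming PyTorch with CUDA is installed.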

Usage Examples

Basic Generation

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "your-username/Nepalaya-R",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("your-username/Nepalaya-R")

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Chat Mode

messages = [
    {"role": "user", "content": "What is machine learning?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Repository Structure

Nepalaya-R/
├── README.md                          # This file
├── SETUP.md                           # Setup guide
├── GITHUB_DEPLOY.md                   # Deployment guide
├── requirements.txt                   # Python dependencies
├── config.json                        # Model configuration
├── tokenizer.json                     # Tokenizer
├── quick_inference.py                 # Quick inference script
├── download_model.py                  # Model downloader
├── mirror_to_hf.py                    # HF mirroring tool
├── inference/                         # Inference code
│   ├── generate.py                    # Generation script
│   ├── model.py                       # Model implementation
│   ├── convert.py                     # Weight converter
│   └── config_671B_nepalaya.json      # Inference config
└── assets/                            # Chat templates

Files Included

  • Source Code: Full inference implementation
  • Configuration: Model and generation configs
  • Tokenizer: Complete tokenizer setup
  • Documentation: Setup and usage guides
  • Utilities: Download and mirror scripts

License

MIT License. See the LICENSE file for details.

Support

For setup documentation, see SETUP.md. For deployment instructions, see GITHUB_DEPLOY.md.


Nepalaya-R model card and repository maintained by the Nepalaya-R project.
