Part of the Hello Neural World learning project.

About This Model

TinyNet ReLU v2 Regularized - Final production-ready model with multiple improvements.

Key improvements:

  • ReLU on hidden layer, Sigmoid on output
  • Mixed batches: 20% clean + 80% noisy data (learns confidence + generalization)
  • Adam optimizer: Adaptive learning rate (lr=0.01)
  • Weight decay (L2 regularization): 0.01 to prevent overfitting
  • Extended training: 300 epochs (regularization allows longer training)
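The improvements above can be sketched as a minimal training loop. The hyperparameters (lr=0.01, weight_decay=0.01, 300 epochs, 20/80 clean/noisy mix) come from the list above; the batch size of 20, the noise scale of 0.2, and the MSE loss are assumptions for illustration, not confirmed details of the original training script.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same shape as the model card's architecture: 4 -> 3 (ReLU) -> 2 (Sigmoid)
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2), nn.Sigmoid())

# Adam with L2 regularization via weight_decay, as described above
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.01)
loss_fn = nn.MSELoss()  # assumed loss; the card does not name one

clean_x = torch.tensor([[1., 1., 0., 0.],   # horizontal top
                        [0., 0., 1., 1.],   # horizontal bottom
                        [1., 0., 1., 0.],   # vertical left
                        [0., 1., 0., 1.]])  # vertical right
labels = torch.tensor([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])  # [H, V]

for epoch in range(300):
    # Mixed batch: roughly 20% clean and 80% noisy copies of the base patterns
    idx = torch.randint(0, 4, (20,))
    x, y = clean_x[idx].clone(), labels[idx]
    noisy = torch.rand(20) < 0.8
    x[noisy] += torch.randn(int(noisy.sum()), 4) * 0.2  # assumed noise scale
    x = x.clamp(0.0, 1.0)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

With regularization in place, the longer 300-epoch schedule improves fit without the memorization that plagued earlier unregularized runs.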

Performance:

  • Final loss: ~0.057 (77% better than Sigmoid baseline!)
  • High confidence (>95%) on clear patterns
  • Appropriate uncertainty (~50%) on ambiguous cases
  • Production-ready generalization

From the blog post: This model demonstrates real ML iteration - each weakness identified, each improvement targeted. The result: a model that knows what it knows.

Architecture

Input Layer:  4 neurons (2×2 pixel grid)
             ↓
Hidden Layer: 3 neurons (ReLU or Sigmoid)
             ↓
Output Layer: 2 neurons (Horizontal vs Vertical probabilities)

Total parameters: 23 (4×3 weights + 3 biases + 3×2 weights + 2 biases)
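The parameter count is easy to verify with a throwaway copy of the architecture (built here with nn.Sequential purely for brevity):

```python
import torch.nn as nn

# Minimal stand-in for the TinyNet architecture: 4 -> 3 -> 2
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2), nn.Sigmoid())

# Sum every weight and bias element
total = sum(p.numel() for p in model.parameters())
print(total)  # 23
```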

Training Data

Trained on thousands of noisy examples generated from 4 base patterns:

  • Horizontal top: [1,1,0,0]
  • Horizontal bottom: [0,0,1,1]
  • Vertical left: [1,0,1,0]
  • Vertical right: [0,1,0,1]

Each pattern is augmented with random noise so the network learns the underlying pattern instead of memorizing exact inputs.
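One way to generate such noisy examples is sketched below. The base patterns are from the list above; the Gaussian noise and the noise_std=0.2 scale are assumptions, since the card only says "random noise".

```python
import torch

# The four base patterns listed above
BASE_PATTERNS = {
    "horizontal_top":    [1.0, 1.0, 0.0, 0.0],
    "horizontal_bottom": [0.0, 0.0, 1.0, 1.0],
    "vertical_left":     [1.0, 0.0, 1.0, 0.0],
    "vertical_right":    [0.0, 1.0, 0.0, 1.0],
}

def noisy_example(pattern, noise_std=0.2):
    # Perturb each pixel with Gaussian noise, then clamp back to [0, 1]
    x = torch.tensor(pattern) + torch.randn(4) * noise_std
    return x.clamp(0.0, 1.0)

sample = noisy_example(BASE_PATTERNS["horizontal_top"])
```

Generating thousands of these on the fly means the network almost never sees the same input twice, which is what forces generalization.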

Usage

import torch
import torch.nn as nn
from safetensors.torch import load_file

# Define the architecture
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(4, 3)
        self.layer2 = nn.Linear(3, 2)
        self.relu = nn.ReLU()  # swap in nn.Sigmoid() to reproduce the baseline
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.layer1(x))
        x = self.sigmoid(self.layer2(x))
        return x

# Load weights
model = TinyNet()
model.load_state_dict(load_file("model.safetensors"))
model.eval()

# Run inference
test_input = torch.tensor([[1.0, 1.0, 0.0, 0.0]])  # Perfect horizontal top
with torch.no_grad():
    output = model(test_input)
print(f"Horizontal: {output[0][0]:.2%}, Vertical: {output[0][1]:.2%}")

Intended Use

Educational purposes - demonstrates:

  • Backpropagation mechanics
  • Effect of activation functions
  • Overfitting vs generalization
  • Impact of data augmentation (noise)
  • Iterative ML development process

Limitations

  • Toy dataset (2×2 grids only)
  • Binary classification (horizontal vs vertical)
  • Not for production use
  • Designed for learning, not performance

License

MIT
