What is this?

These models are Bidirectional LSTMs with attention.
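As a rough illustration, "attention" here means scoring each BiLSTM timestep and pooling the hidden states into a single context vector. Below is a minimal NumPy sketch of that idea; the function and variable names are illustrative only and are not taken from the actual IAMAM models.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_pool(hidden_states, w):
    """Score each BiLSTM timestep, then return the weighted sum.

    hidden_states: (timesteps, 2 * lstm_units) -- forward+backward concat
    w:             (2 * lstm_units,)           -- learned scoring vector
    """
    scores = hidden_states @ w       # one scalar score per timestep
    weights = softmax(scores)        # normalize scores into attention weights
    return weights @ hidden_states   # context vector, shape (2 * lstm_units,)

# Toy example: 5 timesteps, a BiLSTM with 4 units per direction (8 total).
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))
w = rng.normal(size=(8,))
context = attention_pool(h, w)
print(context.shape)  # (8,)
```

In a real Keras model the scoring vector `w` would be a trained weight, and the context vector would feed the decoder or output layer.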

How to run?

Each model version's folder includes a script to run. The script also has training capabilities; for inference only, set 'RETRAIN' and 'continueTrain' to False. 'continueTrain' resumes training from the last checkpoint; 'RETRAIN' trains the model from scratch and overwrites any saved model with the same file name.

Directory layout:

    folderWhereTheIAMAMscriptIs/
        Script
        model/
            tokenizer.pkl
            chatbot.keras

Note: Ensure tokenizer.pkl is in the same directory as the script.
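A minimal sketch of how the two flags could interact, assuming RETRAIN takes priority over continueTrain as described above; `choose_mode` and the mode names are hypothetical, not the script's actual internals.

```python
# Hypothetical sketch of the RETRAIN / continueTrain flag logic;
# the real script's internals may differ.
RETRAIN = False        # True: train from scratch, overwriting the saved model
continueTrain = False  # True: resume training from the last checkpoint

def choose_mode(retrain, continue_train):
    # Assumed priority: a from-scratch run ignores any checkpoint.
    if retrain:
        return "train_from_scratch"   # overwrites models with the same file name
    if continue_train:
        return "continue_training"    # loads the last checkpoint first
    return "inference"                # both False: just chat with the model

print(choose_mode(RETRAIN, continueTrain))  # inference
```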

How was this trained?

Each model version's folder contains a rough guide describing how that model was trained.

This is cool, but what's the best one here and which should I use?

There are some you should use and some you shouldn't:

  • version 1.0.x (2M params): a great base model; consider using this!
  • version 1.1.x (40M params): DO NOT USE; it performs the same as, or only marginally better than, the 2M-parameter v1.0.x model.
  • version 1.2.x (7M params): more details coming soon!

Why are some of these spitting out junk? (limitations)

These models may produce incoherent or incorrect outputs because:

  • They are not fine-tuned using reinforcement learning (e.g., PPO)
  • They do not use RLHF
  • They are trained primarily on supervised data only
  • No system prompts or instruction tuning are applied
  • Some models are trained from scratch without leveraging external pretrained base models*

*: Some models will be trained from other IAMAM models, but they will never use base models that are not part of the IAMAM project!
