Getting Started

Welcome to the Model Train Protocol (MTP) documentation. This section will help you understand the system architecture and get started with the API.

What is MTP?

The Model Train Protocol (MTP) is an open-source framework for creating and training custom language models on Databiomes. MTP provides a structured way to define the data, patterns, and behaviors that your model will learn.

System Architecture

The MTP system is built on a hierarchical structure of five main components that work together to create comprehensive training protocols for language models.

Core Components

Context - Background information and domain knowledge for the model
Tokens - The fundamental building blocks
TokenSets - Combinations of tokens that define input patterns
Instructions - Training patterns that inform the model what to do
Guardrails - Safety mechanisms for handling bad user prompts

Component Hierarchy

Context

Context provides the foundational background information and domain knowledge that the model needs to understand the training data and respond appropriately.

Context establishes the domain, setting, and background information
It tells the model how tokens, instructions, and responses should be interpreted within that domain
Context is added to the protocol using the add_context() method
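
As a rough illustration of how context lines might accumulate on a protocol, here is a minimal sketch. ProtocolSketch is a hypothetical stand-in written for this page, not the real MTP class, and the add_context() signature (one string per call) is an assumption; check the API Reference for the actual parameters.

```python
from dataclasses import dataclass, field


@dataclass
class ProtocolSketch:
    """Hypothetical stand-in for an MTP protocol (not the real MTP class)."""
    name: str
    context: list[str] = field(default_factory=list)

    def add_context(self, line: str) -> None:
        """Store one line of background text (assumed signature)."""
        cleaned = line.strip()
        if not cleaned:
            raise ValueError("context line must not be empty")
        self.context.append(cleaned)


protocol = ProtocolSketch(name="cat_story")
protocol.add_context("Alice meets the Cheshire Cat in a forest.")
protocol.add_context("The Cat answers questions with riddles.")
```

Keeping each context line as a separate entry makes it easy to review or extend the background knowledge before any tokens are defined.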

Tokens

Tokens are the base building blocks of the MTP system. They represent words, symbols, concepts, or actions that the model will understand and use.

Basic Token: Standard tokens for concepts, actions, or entities
NumToken: Tokens associated with numerical values
NumListToken: Tokens for lists of numerical values
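
The three token types can be pictured with the sketch below. The class names mirror the ones listed above, but the fields (min_value, max_value, length) and the accepts helpers are assumptions made for illustration, not the library's confirmed constructors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Basic token: a named concept, action, or entity."""
    value: str


@dataclass(frozen=True)
class NumToken(Token):
    """Token that carries one number within an assumed range."""
    min_value: float = 0.0
    max_value: float = 1.0

    def accepts(self, number: float) -> bool:
        return self.min_value <= number <= self.max_value


@dataclass(frozen=True)
class NumListToken(NumToken):
    """Token that carries a fixed-length list of numbers."""
    length: int = 1

    def accepts_list(self, numbers: list[float]) -> bool:
        return len(numbers) == self.length and all(self.accepts(n) for n in numbers)


speaker = Token("Alice")
mood = NumToken("Mood", min_value=1, max_value=10)
position = NumListToken("Position", min_value=-100, max_value=100, length=3)
```

In this sketch a basic token carries only a name, while the numeric variants add the constraints that their attached values would have to satisfy.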

TokenSets

TokenSets group multiple Tokens together to define specific input patterns. They represent the structure of data that will be fed to the model.

TokenSets are the basic building blocks of instructions
They can contain any combination of token types
Snippets are created from TokenSets to provide training examples
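
The relationship between Tokens, TokenSets, and Snippets might look like the following sketch. All classes here are hypothetical stand-ins, and both the key property and the create_snippet() method are assumed names used only to show how a training example stays tied to the TokenSet it belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Minimal stand-in for a basic token."""
    value: str


@dataclass(frozen=True)
class Snippet:
    """One training example attached to a TokenSet."""
    token_set_key: str
    text: str


@dataclass(frozen=True)
class TokenSet:
    """An ordered group of tokens that defines one input pattern."""
    tokens: tuple[Token, ...]

    @property
    def key(self) -> str:
        # Join token names so the pattern has a readable identifier.
        return "_".join(token.value for token in self.tokens)

    def create_snippet(self, text: str) -> Snippet:
        # Assumed helper: tag the example text with this TokenSet's key.
        return Snippet(token_set_key=self.key, text=text)


alice_talk = TokenSet(tokens=(Token("Alice"), Token("Talk")))
example = alice_talk.create_snippet("Would you tell me which way I ought to go?")
```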

Instructions

Instructions define how the model should respond to different input patterns. Instructions are composed of two main components:

InstructionInput: Defines the structure for the model's input, including TokenSets
InstructionOutput: Defines the structure and format of the model's response, including the response TokenSet and final tokens
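
To make the input/output split concrete, here is a small sketch. InstructionInput and InstructionOutput are named in the text above, but the fields shown (token_sets, response_token_set, final_tokens), the wrapping Instruction class, and the describe() helper are assumptions for illustration rather than the library's confirmed API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionInput:
    """Ordered TokenSet keys that make up the model's input (assumed shape)."""
    token_sets: tuple[str, ...]


@dataclass(frozen=True)
class InstructionOutput:
    """Response TokenSet key plus the tokens that end the response (assumed shape)."""
    response_token_set: str
    final_tokens: tuple[str, ...]


@dataclass(frozen=True)
class Instruction:
    """Hypothetical pairing of one input pattern with one output format."""
    input_spec: InstructionInput
    output_spec: InstructionOutput

    def describe(self) -> str:
        inputs = " + ".join(self.input_spec.token_sets)
        finals = ", ".join(self.output_spec.final_tokens)
        return f"{inputs} -> {self.output_spec.response_token_set} [{finals}]"


instruction = Instruction(
    input_spec=InstructionInput(token_sets=("Alice_Talk",)),
    output_spec=InstructionOutput(response_token_set="Cat_Talk", final_tokens=("Continue",)),
)
summary = instruction.describe()
```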

Guardrails

Guardrails provide safety mechanisms for user interactions by defining what constitutes good vs. bad user prompts and how the model should respond to inappropriate inputs.
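
One way to picture a guardrail is as a description of acceptable and unacceptable prompts, a fixed refusal, and a set of sample bad prompts to train on. The sketch below uses a hypothetical Guardrail stand-in; the field names and the add_sample() and training_pairs() methods are assumptions, not the library's confirmed interface.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrail:
    """Hypothetical stand-in for an MTP guardrail definition."""
    good_prompt: str   # description of prompts the model should answer
    bad_prompt: str    # description of prompts the model should refuse
    bad_output: str    # response to give when a prompt is inappropriate
    samples: list[str] = field(default_factory=list)

    def add_sample(self, prompt: str) -> None:
        """Record one example of an inappropriate prompt."""
        self.samples.append(prompt)

    def training_pairs(self) -> list[tuple[str, str]]:
        """Pair every bad sample with the refusal the model should learn."""
        return [(sample, self.bad_output) for sample in self.samples]


guardrail = Guardrail(
    good_prompt="Questions about the story and its characters",
    bad_prompt="Requests unrelated to the story",
    bad_output="I can only talk about the story.",
)
guardrail.add_sample("What is the weather today?")
guardrail.add_sample("Write my homework for me.")
pairs = guardrail.training_pairs()
```

The point of the sketch is that a guardrail supplies training examples like any other part of the protocol; it is not presented here as a runtime filter.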

Data Flow

1. Context Establishment: Add background information and domain knowledge
2. Token Creation: Define the basic building blocks
3. TokenSet Assembly: Combine tokens into meaningful patterns
4. Snippet Generation: Create training examples from TokenSets
5. Instruction Definition: Specify how the model should respond to TokenSet patterns
6. Guardrail Application: Add safety mechanisms
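
The stages above can be sketched as a simple checklist over plain Python data. Nothing here uses the MTP API: the stage names, the dictionary layout, and the missing_stages() helper are all hypothetical, chosen only to show the order in which a protocol is filled in and how an incomplete one might be detected.

```python
# Stage names in the order described by the data flow (hypothetical labels).
DATA_FLOW = ["context", "tokens", "token_sets", "snippets", "instructions", "guardrails"]


def missing_stages(protocol: dict[str, list]) -> list[str]:
    """Return the data-flow stages that are still empty, in order."""
    return [stage for stage in DATA_FLOW if not protocol.get(stage)]


protocol = {stage: [] for stage in DATA_FLOW}

# Context establishment
protocol["context"].append("Alice meets the Cheshire Cat in a forest.")
# Token creation
protocol["tokens"] += ["Alice", "Cat", "Talk"]
# TokenSet assembly
protocol["token_sets"] += [("Alice", "Talk"), ("Cat", "Talk")]
# Snippet generation
protocol["snippets"].append((("Alice", "Talk"), "Which way ought I to go?"))
# Instruction definition
protocol["instructions"].append({"input": [("Alice", "Talk")], "output": ("Cat", "Talk")})

# Guardrails have not been added yet, so that stage is reported as missing.
remaining = missing_stages(protocol)
```

A check like this is a cheap way to catch a protocol that skipped a stage, such as a user-facing model with no guardrails.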

Best Practices

Start with a clear understanding of your model's purpose
Establish comprehensive context to provide domain knowledge and background information
Define tokens that represent the core concepts in your domain
Create TokenSets that capture meaningful input patterns
Use instructions to teach the model appropriate responses
Always include guardrails for user-facing applications
Test your protocol with various examples before deployment

Next Steps

Learn about the System Architecture in detail
Explore the API Reference for implementation details
Start building with Instructions to understand the core training components
