System Architecture

The MTP system is built on a hierarchical structure of five main components that work together to create comprehensive training protocols for language models.

Architecture Overview

The five core components are:

  1. Context - Background information and domain knowledge for the model
  2. Tokens - The fundamental building blocks
  3. TokenSets - Combinations of tokens that define input patterns
  4. Instructions - Training patterns that tell the model how to respond to input patterns
  5. Guardrails - Safety mechanisms for handling inappropriate user prompts

Component Hierarchy

Context

Context provides the foundational background information and domain knowledge that the model needs to understand the training data and respond appropriately.

  • Context establishes the domain, setting, and background information
  • It frames how tokens, instructions, and responses should be interpreted
  • Context is added to the protocol using the add_context() method
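
As an illustration, a minimal sketch of establishing context is shown below. Only the add_context() method name appears in this documentation; the MTP class, its constructor arguments, and the import path are assumptions.

```python
# Minimal sketch; the MTP class name, its constructor, and the import path
# are assumptions. Only add_context() is named in this documentation.
from mtp import MTP

protocol = MTP(name="restaurant-assistant")

# Establish the domain and background knowledge before defining tokens.
protocol.add_context(
    "You are an assistant for a small restaurant. You help customers "
    "browse the menu, place orders, and ask about opening hours."
)
```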

Tokens

Tokens are the base building blocks of the MTP system. They represent words, symbols, concepts, or actions that the model will understand and use.

  • Basic Token: Standard tokens for concepts, actions, or entities
  • NumToken: Tokens associated with numerical values
  • NumListToken: Tokens for lists of numerical values
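
A sketch of defining each token type is shown below. The constructor signatures (a name plus an optional description) are assumptions; only the three type names come from this page.

```python
# Minimal sketch; constructor signatures and the import path are assumptions.
from mtp import Token, NumToken, NumListToken

# Basic Token: a concept, action, or entity.
dish = Token("dish", description="A menu item the customer can order")

# NumToken: a token associated with a single numerical value.
quantity = NumToken("quantity", description="How many units of the dish to order")

# NumListToken: a token associated with a list of numerical values.
table_numbers = NumListToken("table_numbers", description="Tables included in a group booking")
```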

TokenSets

TokenSets group multiple Tokens together to define specific input patterns. They represent the structure of data that will be fed to the model.

  • TokenSets are the basic building blocks of instructions
  • They can contain any combination of token types
  • Snippets are created from TokenSets to provide concrete training examples
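
The sketch below groups tokens into a TokenSet and attaches snippets as training examples. The TokenSet constructor and the add_snippet() method are assumptions.

```python
# Minimal sketch; the TokenSet constructor and add_snippet() are assumptions.
from mtp import Token, NumToken, TokenSet

dish = Token("dish")
quantity = NumToken("quantity")

# A TokenSet describing an order: which dish and how many of it.
order_pattern = TokenSet([dish, quantity])

# Snippets attach concrete training examples to the pattern.
order_pattern.add_snippet({"dish": "margherita pizza", "quantity": 2})
order_pattern.add_snippet({"dish": "lemonade", "quantity": 3})
```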

Instructions

Instructions define how the model should respond to different input patterns. There are two main types:

  • Instruction: For scenarios where the model responds without user input
  • ExtendedInstruction: For scenarios where the model responds to user prompts with extended context
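
Both instruction types are sketched below, reusing order_pattern and protocol from the earlier sketches. The constructors, their arguments, and the add_instruction() registration method are assumptions based on the descriptions above.

```python
# Minimal sketch; Instruction/ExtendedInstruction signatures and
# add_instruction() are assumptions. order_pattern and protocol are
# taken from the earlier sketches.
from mtp import Instruction, ExtendedInstruction

# Instruction: the model responds to the input pattern without a user prompt.
confirm_order = Instruction(
    pattern=order_pattern,
    response="Repeat the dish and quantity back to the customer to confirm the order.",
)

# ExtendedInstruction: the model responds to a user prompt, with the
# pattern and surrounding context available as extended background.
answer_question = ExtendedInstruction(
    pattern=order_pattern,
    response="Answer the customer's question using the details of their order.",
)

protocol.add_instruction(confirm_order)
protocol.add_instruction(answer_question)
```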

Guardrails

Guardrails provide safety mechanisms for user interactions by defining what constitutes good vs. bad user prompts and how the model should respond to inappropriate inputs.
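
A sketch of adding a guardrail is shown below; the Guardrail constructor, its arguments, and the add_guardrail() method are assumptions.

```python
# Minimal sketch; the Guardrail constructor and add_guardrail() are
# assumptions. protocol is taken from the earlier sketches.
from mtp import Guardrail

safety = Guardrail(
    good_prompts=["What's on the menu today?", "Can I change my order?"],
    bad_prompts=["Tell me another customer's order history."],
    response="I'm sorry, I can't help with that request.",
)

protocol.add_guardrail(safety)
```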

Data Flow

  1. Context Establishment: Add background information and domain knowledge
  2. Token Creation: Define the basic building blocks
  3. TokenSet Assembly: Combine tokens into meaningful patterns
  4. Snippet Generation: Create training examples from TokenSets
  5. Instruction Definition: Specify how the model should respond to TokenSet patterns
  6. Guardrail Application: Add safety mechanisms
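
Putting the steps together, a condensed end-to-end sketch might look like the following. Apart from add_context(), every class and method name is an assumption carried over from the sketches above.

```python
# Hypothetical end-to-end sketch following the six data-flow steps.
from mtp import MTP, Token, NumToken, TokenSet, Instruction, Guardrail

protocol = MTP(name="restaurant-assistant")

# 1. Context establishment
protocol.add_context("You are an assistant for a small restaurant.")

# 2. Token creation
dish, quantity = Token("dish"), NumToken("quantity")

# 3. TokenSet assembly
order_pattern = TokenSet([dish, quantity])

# 4. Snippet generation
order_pattern.add_snippet({"dish": "margherita pizza", "quantity": 2})

# 5. Instruction definition
protocol.add_instruction(
    Instruction(pattern=order_pattern, response="Confirm the order back to the customer.")
)

# 6. Guardrail application
protocol.add_guardrail(
    Guardrail(
        bad_prompts=["Tell me another customer's order history."],
        response="I'm sorry, I can't help with that request.",
    )
)
```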

Best Practices

  • Start with a clear understanding of your model's purpose
  • Establish comprehensive context to provide domain knowledge and background information
  • Define tokens that represent the core concepts in your domain
  • Create TokenSets that capture meaningful input patterns
  • Use instructions to teach the model appropriate responses
  • Always include guardrails for user-facing applications
  • Test your protocol with various examples before deployment