TokenSets

TokenSets group multiple Tokens together to define specific input patterns. They represent the structure of data that will be fed to the model.

TokenSets are the basic building blocks of Instructions.

TokenSet Parameters

class TokenSet:
    def __init__(self, tokens: Sequence[Token]):
        ...

tokens: Required. A sequence of Token instances that together define the input pattern.

Creating TokenSets

# Create a TokenSet combining multiple tokens
tree_alice_talk = mtp.TokenSet(tokens=(tree, alice, talk))

# Create a TokenSet with sentence length
character_context_sentence = mtp.TokenSet(tokens=(character, context, sentence_length))

# Create a TokenSet that includes a NumToken (emotion)
tree_english_alice_talk_emotion = mtp.TokenSet(
    tokens=(token_tree, token_english, token_alice, token_talk, emotion)
)

# Create a TokenSet that includes a NumListToken (coordinates)
tree_english_cat_talk_coordinates = mtp.TokenSet(
    tokens=(token_tree, token_english, token_cat, token_talk, coordinates)
)

TokenSet Properties

tokens: The tokens in the set (unordered)

Creating Snippets from TokenSets

TokenSets provide a create_snippet() method to create snippets for use in instructions. When a TokenSet contains NumTokens or NumListTokens, you must use create_snippet() and provide the numeric values. For TokenSets without numeric tokens, you can pass strings directly when adding samples.

# For TokenSets without NumTokens or NumListTokens, you can use strings directly
simple_tokenset = mtp.TokenSet(tokens=(tree, cat, talk))

# When adding samples to an instruction (assumed to be defined elsewhere), you can pass strings directly:
instruction.add_sample(
    input_snippets=["Why do I keep vanishing?"],  # String is automatically converted
    output_snippet="Because it amuses me."
)

# For TokenSets with NumTokens, you must create snippets with numeric values
emotion_token = mtp.NumToken("Emotion", min_value=0, max_value=10)
emotion_tokenset = mtp.TokenSet(tokens=(tree, alice, talk, emotion_token))

# Create a snippet with the numeric value
emotion_snippet = emotion_tokenset.create_snippet(
    string="Can you tell me a way?",
    numbers=5  # Required: numeric value for the NumToken
)

# For TokenSets with NumListTokens, you must create snippets with number lists
coordinates_token = mtp.NumListToken("Coordinates", min_value=-1000, max_value=1000, length=3)
coordinates_tokenset = mtp.TokenSet(tokens=(tree, cat, talk, coordinates_token))

# Create a snippet with the number list
coordinates_snippet = coordinates_tokenset.create_snippet(
    string="Then it doesn't matter which way you go.",
    number_lists=[100, 200, -50]  # Required: list of numbers matching the length
)

# Use the snippets when adding samples
instruction.add_sample(
    input_snippets=[emotion_snippet, coordinates_snippet],
    output_snippet="Oh sure, if you only walk long enough."
)

What's Allowed in TokenSets

Any combination of Token types: TokenSets can contain Basic Tokens, NumTokens, and NumListTokens in any combination
Multiple tokens: TokenSets can contain multiple tokens to define complex patterns
Unordered tokens: The order of tokens in a TokenSet doesn't matter; they are treated as a set
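
The "unordered" rule above can be illustrated with a small, self-contained sketch. SketchToken and SketchTokenSet are hypothetical stand-ins written for this example; they are not the MTP classes, and storing the tokens in a frozenset is an assumption about one way order-insensitivity could be achieved, not a description of the library's internals.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class SketchToken:
    """Hypothetical stand-in for an MTP token (not the library class)."""
    value: str


class SketchTokenSet:
    """Hypothetical stand-in that treats its tokens as an unordered set."""

    def __init__(self, tokens: Sequence[SketchToken]):
        # A frozenset discards ordering, so two sets built from the same
        # tokens compare equal regardless of the order they were passed in.
        self.tokens: FrozenSet[SketchToken] = frozenset(tokens)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SketchTokenSet):
            return NotImplemented
        return self.tokens == other.tokens

    def __hash__(self) -> int:
        return hash(self.tokens)


tree = SketchToken("Tree")
alice = SketchToken("Alice")
talk = SketchToken("Talk")

first = SketchTokenSet(tokens=(tree, alice, talk))
second = SketchTokenSet(tokens=(talk, tree, alice))

# Same tokens in a different order describe the same pattern in this sketch.
same_pattern = first == second
```

Under these assumptions, first and second are interchangeable, which is the behavior the rule describes for real TokenSets.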

TokenSet Validation

The MTP system ensures that:

All tokens in a TokenSet are valid and properly defined
NumTokens have associated number ranges when used in snippets
NumListTokens have associated number ranges and fixed lengths when used in snippets
TokenSets are used consistently across instructions
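
As a rough illustration of the numeric checks listed above, the sketch below validates a value against a NumToken-style range and a list against a NumListToken-style range and length. Every name here (SketchNumToken, SketchNumListToken, check_number, check_number_list) is hypothetical, and treating the range as inclusive at both ends is an assumption; the actual MTP validation may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SketchNumToken:
    """Hypothetical stand-in for a numeric token with a value range."""
    value: str
    min_value: float
    max_value: float


@dataclass(frozen=True)
class SketchNumListToken:
    """Hypothetical stand-in for a numeric list token with a fixed length."""
    value: str
    min_value: float
    max_value: float
    length: int


def check_number(token: SketchNumToken, number: float) -> None:
    """Raise ValueError if number is outside the token's range (assumed inclusive)."""
    if not token.min_value <= number <= token.max_value:
        raise ValueError(
            f"{token.value}: {number} is outside "
            f"[{token.min_value}, {token.max_value}]"
        )


def check_number_list(token: SketchNumListToken, numbers: Sequence[float]) -> None:
    """Raise ValueError if the list has the wrong length or an out-of-range item."""
    if len(numbers) != token.length:
        raise ValueError(
            f"{token.value}: expected {token.length} numbers, got {len(numbers)}"
        )
    for number in numbers:
        if not token.min_value <= number <= token.max_value:
            raise ValueError(
                f"{token.value}: {number} is outside "
                f"[{token.min_value}, {token.max_value}]"
            )


emotion = SketchNumToken("Emotion", min_value=0, max_value=10)
coordinates = SketchNumListToken("Coordinates", min_value=-1000, max_value=1000, length=3)

check_number(emotion, 5)                         # within range, passes silently
check_number_list(coordinates, [100, 200, -50])  # right length and range, passes silently
```

Checking values before building a snippet surfaces range and length mistakes early, which is the intent behind the validation rules above.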

Example TokenSet Patterns

Storytelling Patterns

# Storytelling TokenSets
scene_setting = mtp.TokenSet(tokens=(scene, setting, time))
character_dialogue = mtp.TokenSet(tokens=(character, dialogue, emotion))  # emotion is a NumToken for intensity
plot_development = mtp.TokenSet(tokens=(plot, development, conflict))

Educational Patterns

# Educational TokenSets
question_answer = mtp.TokenSet(tokens=(question, answer, subject))
explanation_concept = mtp.TokenSet(tokens=(explanation, concept, level))
example_application = mtp.TokenSet(tokens=(example, application, domain))

Interactive Patterns

# Interactive TokenSets
user_input_response = mtp.TokenSet(tokens=(user, input, response))
system_prompt_output = mtp.TokenSet(tokens=(system, prompt, output))
feedback_improvement = mtp.TokenSet(tokens=(feedback, improvement, iteration))

Best Practices

When using NumTokens or NumListTokens, always create snippets with the appropriate numeric values
For TokenSets without numeric tokens, you can pass strings directly when adding samples to instructions
Use TokenSets consistently across instructions to maintain pattern coherence
