TokenSets
TokenSets group multiple Tokens together to define specific input patterns. They represent the structure of data that will be fed to the model.
TokenSets are the basic building blocks of Instructions.
TokenSet Parameters
class TokenSet:
    def __init__(self, tokens: Sequence[Token]):
- tokens: Required sequence of Token instances that define the input pattern
Creating TokenSets
# Create a TokenSet combining multiple tokens
tree_alice_talk = mtp.TokenSet(tokens=(tree, alice, talk))
# Create a TokenSet with sentence length
character_context_sentence = mtp.TokenSet(tokens=(character, context, sentence_length))
# Create a TokenSet with NumToken
tree_english_alice_talk_emotion = mtp.TokenSet(
    tokens=(token_tree, token_english, token_alice, token_talk, emotion)
)
# Create a TokenSet with NumListToken
tree_english_cat_talk_coordinates = mtp.TokenSet(
    tokens=(token_tree, token_english, token_cat, token_talk, coordinates)
)
TokenSet Properties
- tokens: The tokens in the set (unordered)
Creating Snippets from TokenSets
TokenSets provide a create_snippet() method to create snippets for use in instructions. When a TokenSet contains NumTokens or NumListTokens, you must use create_snippet() and provide the numeric values. For TokenSets without numeric tokens, you can pass strings directly when adding samples.
# For TokenSets without NumTokens or NumListTokens, you can use strings directly
simple_tokenset = mtp.TokenSet(tokens=(tree, cat, talk))
# When adding samples, you can pass strings directly:
instruction.add_sample(
    input_snippets=["Why do I keep vanishing?"],  # String is automatically converted
    output_snippet="Because it amuses me."
)
# For TokenSets with NumTokens, you must create snippets with numeric values
emotion_token = mtp.NumToken("Emotion", min_value=0, max_value=10)
emotion_tokenset = mtp.TokenSet(tokens=(tree, alice, talk, emotion_token))
# Create a snippet with the numeric value
emotion_snippet = emotion_tokenset.create_snippet(
    string="Can you tell me a way?",
    numbers=5  # Required: numeric value for the NumToken
)
# For TokenSets with NumListTokens, you must create snippets with number lists
coordinates_token = mtp.NumListToken("Coordinates", min_value=-1000, max_value=1000, length=3)
coordinates_tokenset = mtp.TokenSet(tokens=(tree, cat, talk, coordinates_token))
# Create a snippet with the number list
coordinates_snippet = coordinates_tokenset.create_snippet(
    string="Then it doesn't matter which way you go.",
    number_lists=[100, 200, -50]  # Required: list of numbers matching the length
)
# Use the snippets when adding samples
instruction.add_sample(
    input_snippets=[emotion_snippet, coordinates_snippet],
    output_snippet="Oh sure, if you only walk long enough."
)
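The rule for when create_snippet() is needed can be pictured with a small stand-alone sketch. The Token, NumToken, and NumListToken classes and the needs_create_snippet() helper below are hypothetical stand-ins written for this illustration; they are not part of mtp and only mirror the rule described in this section.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a basic token (not the mtp class)."""
    value: str


@dataclass(frozen=True)
class NumToken(Token):
    """Hypothetical stand-in for a token that carries one number."""
    min_value: float = 0
    max_value: float = 1


@dataclass(frozen=True)
class NumListToken(NumToken):
    """Hypothetical stand-in for a token that carries a fixed-length list."""
    length: int = 1


def needs_create_snippet(tokens: Sequence[Token]) -> bool:
    """Return True when any token in the pattern is numeric."""
    # NumListToken subclasses NumToken here, so one isinstance check covers both.
    return any(isinstance(t, NumToken) for t in tokens)


tree, cat = Token("Tree"), Token("Cat")
emotion = NumToken("Emotion", min_value=0, max_value=10)

plain = needs_create_snippet((tree, cat))        # plain strings would suffice
numeric = needs_create_snippet((tree, emotion))  # numeric values are required
```

In this sketch a single numeric token anywhere in the pattern is enough to require an explicit snippet, which matches the rule as stated above.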
What's Allowed in TokenSets
- Any combination of Token types: TokenSets can contain Basic Tokens, NumTokens, and NumListTokens in any combination
- Multiple tokens: TokenSets can contain multiple tokens to define complex patterns
- Unordered tokens: The order of tokens in a TokenSet doesn't matter; they are treated as a set
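As a rough illustration of "treated as a set", the sketch below uses hypothetical stand-in classes (not the mtp Token and TokenSet) that store tokens in a frozenset. It assumes that two TokenSets built from the same tokens in a different order describe the same pattern; how mtp implements this internally is not shown here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a basic token (not the mtp class)."""
    value: str


class TokenSet:
    """Hypothetical stand-in that ignores the order of its tokens."""

    def __init__(self, tokens: Sequence[Token]):
        # A frozenset discards ordering, so only membership matters.
        self.tokens: FrozenSet[Token] = frozenset(tokens)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, TokenSet) and self.tokens == other.tokens

    def __hash__(self) -> int:
        return hash(self.tokens)


tree, alice, talk = Token("Tree"), Token("Alice"), Token("Talk")

a = TokenSet(tokens=(tree, alice, talk))
b = TokenSet(tokens=(talk, tree, alice))  # same tokens, different order

same_pattern = a == b  # True under this sketch's assumption
```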
TokenSet Validation
The MTP system ensures that:
- All tokens in a TokenSet are valid and properly defined
- NumTokens have associated number ranges when used in snippets
- NumListTokens have associated number ranges and fixed lengths when used in snippets
- TokenSets are used consistently across instructions
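The range and length checks listed above might look roughly like the following. This is a minimal sketch, not the validation code used by mtp: the function names are hypothetical, and treating min_value and max_value as inclusive bounds is an assumption.

```python
from typing import Sequence


def check_number(value: float, min_value: float, max_value: float) -> None:
    """Raise ValueError if value falls outside [min_value, max_value]."""
    # Assumption: both bounds are inclusive.
    if not (min_value <= value <= max_value):
        raise ValueError(f"{value} is outside [{min_value}, {max_value}]")


def check_number_list(
    values: Sequence[float], min_value: float, max_value: float, length: int
) -> None:
    """Raise ValueError on a wrong length or any out-of-range entry."""
    if len(values) != length:
        raise ValueError(f"expected {length} numbers, got {len(values)}")
    for value in values:
        check_number(value, min_value, max_value)


# Values taken from the examples in this section; both calls pass silently.
check_number(5, min_value=0, max_value=10)
check_number_list([100, 200, -50], min_value=-1000, max_value=1000, length=3)
```

Checking values when a snippet is created, rather than at training time, would surface a bad sample at the line that produced it.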
Example TokenSet Patterns
Storytelling Patterns
# Storytelling TokenSets
scene_setting = mtp.TokenSet(tokens=(scene, setting, time))
character_dialogue = mtp.TokenSet(tokens=(character, dialogue, emotion)) # emotion is a NumToken for intensity
plot_development = mtp.TokenSet(tokens=(plot, development, conflict))
Educational Patterns
# Educational TokenSets
question_answer = mtp.TokenSet(tokens=(question, answer, subject))
explanation_concept = mtp.TokenSet(tokens=(explanation, concept, level))
example_application = mtp.TokenSet(tokens=(example, application, domain))
Interactive Patterns
# Interactive TokenSets
user_input_response = mtp.TokenSet(tokens=(user, input, response))
system_prompt_output = mtp.TokenSet(tokens=(system, prompt, output))
feedback_improvement = mtp.TokenSet(tokens=(feedback, improvement, iteration))
Best Practices
- When using NumTokens or NumListTokens, always create snippets with the appropriate numeric values
- For TokenSets without numeric tokens, you can pass strings directly when adding samples to instructions
- Use TokenSets consistently across instructions to maintain pattern coherence