bmel:tokenCount
Category: LLM & AI Observability · Returns: bmel:integer
bmel:tokenCount(content: bmel:any, tokenizer: bmel:string)
Description
Calculates the number of tokens needed to represent the given content with the specified tokenizer. Useful for cost estimation, context-window budget tracking, and chunk sizing in RAG pipelines. Supported tokenizer identifiers: 'cl100k_base' (GPT-4, GPT-3.5-turbo, text-embedding-ada-002), 'o200k_base' (GPT-4o, GPT-4o-mini), 'p50k_base' (GPT-3, Codex), 'voyage' (Anthropic / Voyage AI models), 'gemini' (Gemini 1.0/1.5 SentencePiece tokenizer), 'llama3' (Meta LLaMA 3.x tiktoken-based tokenizer), 'mistral' (Mistral / Mixtral SentencePiece tokenizer). If an unrecognised identifier is passed, the function falls back to 'cl100k_base' and logs a warning.
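
The fallback rule can be illustrated with a minimal Python sketch. This is not the BMEL implementation: `resolve_tokenizer`, `SUPPORTED_TOKENIZERS`, and the logger name are hypothetical, and exact, case-sensitive matching of identifiers is an assumption.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("token_count_sketch")

# Identifiers listed in the description above.
SUPPORTED_TOKENIZERS = frozenset({
    "cl100k_base", "o200k_base", "p50k_base",
    "voyage", "gemini", "llama3", "mistral",
})
FALLBACK_TOKENIZER = "cl100k_base"


def resolve_tokenizer(identifier: str) -> str:
    """Return the identifier if it is supported, else the documented fallback."""
    # Assumption: matching is exact and case-sensitive.
    if identifier in SUPPORTED_TOKENIZERS:
        return identifier
    logger.warning(
        "Unrecognised tokenizer %r; falling back to %r",
        identifier,
        FALLBACK_TOKENIZER,
    )
    return FALLBACK_TOKENIZER
```

Because the fallback only logs a warning and does not raise an error, a mistyped identifier would still yield a count, just one computed with 'cl100k_base'. If counts look off for a non-OpenAI model, checking the logs for this warning is a reasonable first step.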
Arguments
| Parameter | Type | Required | Description |
|---|---|---|---|
| content | bmel:any | ✅ | The content to tokenize. Strings are tokenized directly; other types are serialized to JSON first. |
| tokenizer | bmel:string | ✅ | Tokenizer identifier. Supported values: 'cl100k_base' (GPT-4 / GPT-3.5 / Ada), 'o200k_base' (GPT-4o), 'p50k_base' (GPT-3 / Codex), 'voyage' (Claude / Anthropic), 'gemini', 'llama3', 'mistral'. |
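
The handling of the content argument can be sketched in Python as follows. The helper name `to_tokenizer_input` is hypothetical, and the use of Python's default `json.dumps` formatting is an assumption; the serializer BMEL actually uses may differ in whitespace or key order.

```python
import json
from typing import Any


def to_tokenizer_input(content: Any) -> str:
    """Mirror the documented rule: strings pass through, other types become JSON."""
    if isinstance(content, str):
        # Strings are tokenized directly, with no added quoting.
        return content
    # Assumption: default json.dumps formatting stands in for the real serializer.
    return json.dumps(content)
```

One consequence of this rule is that an object or array is counted with its JSON punctuation and key names included, so its token count is typically higher than that of the text values alone. To count only a prompt string, pass the string field itself, as the example in this page does with `$.prompt`.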
Example
bmel:tokenCount({getCompletion:Request Payload}.$.prompt, 'cl100k_base')