Skip to content

bmel:tokenCount

Category: LLM & AI Observability · Returns: bmel:integer

bmel:tokenCount(content: bmel:any, tokenizer: bmel:string)

Description

Calculates the number of tokens required to represent the given content using the specified tokenizer. Useful for cost estimation, context-window budget tracking, and chunk sizing in RAG pipelines. Supported tokenizer identifiers: 'cl100k_base' (GPT-4, GPT-3.5-turbo, text-embedding-ada-002), 'o200k_base' (GPT-4o, GPT-4o-mini), 'p50k_base' (GPT-3, Codex), 'voyage' (Anthropic / Voyage AI models), 'gemini' (Gemini 1.0/1.5 SentencePiece tokenizer), 'llama3' (Meta LLaMA 3.x tiktoken-based tokenizer), 'mistral' (Mistral / Mixtral SentencePiece tokenizer). If an unrecognised identifier is passed, falls back to 'cl100k_base' and logs a warning.

Arguments

Parameter Type Required Description
content bmel:any The content to tokenize. Strings are tokenized directly; other types are serialized to JSON first.
tokenizer bmel:string Tokenizer identifier. Supported values: 'cl100k_base' (GPT-4 / GPT-3.5 / Ada), 'o200k_base' (GPT-4o), 'p50k_base' (GPT-3 / Codex), 'voyage' (Claude / Anthropic), 'gemini', 'llama3', 'mistral'.

Example

bmel:tokenCount({getCompletion:Request Payload}.$.prompt, 'cl100k_base')

Back to BMEL Reference