In Depth

Modern LLMs use sub-word tokenization schemes such as Byte-Pair Encoding (BPE) or SentencePiece, which trade off vocabulary size against sequence length. The number of tokens in a prompt and response drives computational cost, and the total must fit within the model's context window limit. A rough rule of thumb is that 1,000 tokens ≈ 750 English words, though this varies by language and tokenizer.
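To make the word-to-token ratio concrete, the sketch below counts tokens with the tiktoken library (an assumption here; the same idea applies to any BPE tokenizer) using its `cl100k_base` encoding. It is a minimal illustration, not a prescription for any particular model.

```python
# Minimal sketch: counting BPE tokens with tiktoken (pip install tiktoken).
# The encoding name "cl100k_base" is an assumed choice for illustration.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization splits text into sub-word units before the model sees it."
tokens = enc.encode(text)

# Compare token count to word count to see the ~1,000 tokens ≈ 750 words rule of thumb.
print(f"words:  {len(text.split())}")
print(f"tokens: {len(tokens)}")

# Decoding the token IDs recovers the original string.
assert enc.decode(tokens) == text
```

Running a check like this on your own prompts is the most reliable way to estimate cost and context usage, since the actual ratio depends on the tokenizer and the language of the text.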