Tokens
Tokens are the basic units of text that AI language models process, typically representing words, parts of words, or individual characters.
Tokens are the fundamental units that large language models use to read and generate text. A tokenizer breaks input text into these units before the model processes them. In English, one token corresponds to roughly 3/4 of a word on average, so 100 tokens is approximately 75 words. However, common words may be a single token, while uncommon or technical words may be split into multiple tokens.
Tokenization varies between different AI models. For example, the word "unhappiness" might be split into "un", "happiness" or "un", "happi", "ness" depending on the tokenizer. Code, numbers, and non-English languages often require more tokens per character of text. Understanding tokenization is important because AI models have a maximum context window measured in tokens, and pricing is typically based on the number of input and output tokens processed.
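To see this in practice, the sketch below uses OpenAI's tiktoken library (`pip install tiktoken`) to tokenize the same word under two encodings. The exact splits depend on your tiktoken version and the chosen encoding; tokenizers from other providers will differ.

```python
# A minimal sketch using tiktoken; actual splits vary by encoding and version.
import tiktoken

for name in ("cl100k_base", "o200k_base"):  # GPT-4-era and GPT-4o-era encodings
    enc = tiktoken.get_encoding(name)
    tokens = enc.encode("unhappiness")
    pieces = [enc.decode([t]) for t in tokens]  # text of each individual token
    print(f"{name}: {len(tokens)} tokens -> {pieces}")
```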
Tokens directly impact both the cost and capability of AI applications. A larger context window lets the model reference more information, but each token adds to processing time and cost. Efficient prompt engineering minimizes unnecessary tokens while maximizing the useful context provided to the model. Modern models support context windows ranging from 8,000 to over 1,000,000 tokens.
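Because billing is per token, a rough cost estimate is simple arithmetic. The prices in this sketch are placeholders rather than current rates; check your provider's pricing page for real numbers.

```python
# Hypothetical per-million-token prices (substitute your provider's actual rates).
PRICE_IN_PER_M = 2.50    # $ per 1M input tokens (placeholder)
PRICE_OUT_PER_M = 10.00  # $ per 1M output tokens (placeholder)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of a single request."""
    return (input_tokens / 1_000_000) * PRICE_IN_PER_M + \
           (output_tokens / 1_000_000) * PRICE_OUT_PER_M

# e.g. a 10,000-token prompt that produces a 1,000-token reply
print(f"${estimate_cost(10_000, 1_000):.4f}")  # -> $0.0350
```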
Real-World Examples
- The sentence 'Hello, world!' is typically 4 tokens: 'Hello', ',', ' world', '!' (see the snippet after this list)
- OpenAI's GPT-4o charges per million input and output tokens processed
- Claude's context window of 200K tokens can hold approximately 150,000 words
- A typical novel of 80,000 words uses roughly 100,000 tokens
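The first example is easy to check yourself. This assumes tiktoken's cl100k_base encoding; other tokenizers may count the same sentence differently.

```python
# Count the tokens in 'Hello, world!'; typically 4 under cl100k_base.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Hello, world!")
print(len(tokens), [enc.decode([t]) for t in tokens])
```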