Context Window
The context window is the maximum amount of text (measured in tokens) that an AI model can process in a single interaction, including both the input and the output.
The context window defines the total amount of information an AI model can consider at once. It encompasses everything the model sees: the system prompt, conversation history, any documents or data provided as context, the user's current message, and the model's own response. Once the context window is full, the model cannot reference earlier information unless it is explicitly re-included.
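One common way to handle a full window is to drop the oldest conversation turns while always keeping the system prompt. Below is a minimal sketch of that idea; the `estimate_tokens` heuristic (roughly 4 characters per token) is an assumption for illustration, where a real system would use the model's actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # A real tokenizer would give exact counts.
    return max(1, len(text) // 4)

def trim_history(system_prompt: str, history: list[str], limit: int) -> list[str]:
    """Drop the oldest messages until the prompt fits in `limit` tokens.

    The system prompt is always kept; only conversation turns are dropped.
    """
    budget = limit - estimate_tokens(system_prompt)
    kept: list[str] = []
    used = 0
    # Walk backwards so the most recent turns are kept.
    for msg in reversed(history):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break  # older messages "scroll out" of the window
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```

This is the behavior behind the chatbot example below: once older turns no longer fit, they are silently dropped and the model can no longer see them.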
Context window sizes have expanded dramatically as models have improved. Early GPT models had 2,048-token windows. GPT-3.5 expanded to 4,096 and then 16,384 tokens. Claude 3 supports 200,000 tokens (roughly 150,000 words or a full novel). Google's Gemini 1.5 Pro supports up to 1,000,000 tokens. Larger context windows enable use cases like analyzing entire codebases, processing long legal documents, and maintaining extended conversations without losing context.
However, context window size alone does not determine the quality of information processing. Research shows that models can struggle with information in the middle of very long contexts (the "lost in the middle" problem), and retrieval accuracy can degrade as context length increases. Effective use of large context windows involves structuring information clearly, placing the most important content at the beginning and end, and using techniques like RAG to ensure the most relevant information is included rather than dumping everything into the context.
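The RAG-style selection described above can be sketched as a simple budget fill: score each document for relevance, then include the best ones until the token budget is spent. The `overlap_score` function here is a placeholder stand-in for real embedding similarity, and the 4-characters-per-token estimate is an assumption:

```python
def overlap_score(query: str, doc: str) -> int:
    # Placeholder relevance: count of shared words between query and doc.
    # A real RAG system would use embedding similarity instead.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def select_context(query: str, docs: list[str], budget_tokens: int) -> list[str]:
    """Fill the context budget with the most relevant documents first."""
    ranked = sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)
    chosen: list[str] = []
    used = 0
    for doc in ranked:
        cost = max(1, len(doc) // 4)  # rough ~4 chars/token estimate
        if used + cost <= budget_tokens:
            chosen.append(doc)
            used += cost
    return chosen
```

The design choice is that relevance, not recency or order of arrival, decides what enters the window, so the budget is spent on documents the model is most likely to need.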
Real-World Examples
- Claude's 200K-token context window allowing analysis of an entire codebase in one conversation
- A chatbot losing track of early conversation details because they scrolled out of the context window
- Using RAG to fill the context window with only the most relevant documents instead of everything
- Gemini 1.5 Pro processing an entire hour-long video transcript within its million-token context