Top-p (Nucleus Sampling)
Top-p is a text generation parameter that limits the model's token selection to the smallest set of tokens whose cumulative probability exceeds a threshold p.
Top-p, also known as nucleus sampling, is an alternative to temperature for controlling the randomness of AI model outputs. Instead of reshaping the entire probability distribution the way temperature does, top-p dynamically selects a subset of tokens to sample from. It sorts tokens by probability, then includes tokens from highest to lowest probability until the cumulative probability reaches the threshold p. Only tokens within this nucleus are considered for selection, and their probabilities are renormalized before sampling.
With top-p set to 0.9, the model considers only the tokens that make up the top 90% of the probability mass. If the model is very confident about the next word, this might include only 1-2 tokens. If the model is uncertain, it might include dozens of tokens. This dynamic behavior is what makes top-p different from top-k sampling (which always considers exactly k tokens) and gives it an advantage in adapting to different contexts within the same generation.
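To make the mechanism concrete, here is a minimal sketch of nucleus sampling over a toy five-token distribution, using NumPy. The function name and the example probabilities are illustrative only and are not taken from any particular library.

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample one token index from the smallest set of tokens whose
    cumulative probability reaches the threshold p."""
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(probs)[::-1]              # token indices, most to least likely
    sorted_probs = probs[order]
    cumulative = np.cumsum(sorted_probs)
    cutoff = np.searchsorted(cumulative, p) + 1  # first position where the mass reaches p
    nucleus = order[:cutoff]                     # the "nucleus" of candidate tokens
    nucleus_probs = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()  # renormalize
    return rng.choice(nucleus, p=nucleus_probs), cutoff

# A confident distribution yields a tiny nucleus; a flat one keeps many tokens.
confident = np.array([0.85, 0.10, 0.03, 0.01, 0.01])
uncertain = np.array([0.25, 0.22, 0.20, 0.18, 0.15])
print(nucleus_sample(confident)[1])  # nucleus size 2
print(nucleus_sample(uncertain)[1])  # nucleus size 5 (all tokens)
```

Notice that the same p = 0.9 keeps only 2 tokens for the confident distribution but all 5 for the flat one, which is exactly the adaptive behavior described above.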
In practice, top-p and temperature are sometimes used together, though this can be confusing and many practitioners recommend adjusting only one at a time. Lower top-p values (0.1-0.5) produce more focused outputs, while higher values (0.9-1.0) allow more variety. Most API defaults set top-p to 1.0 (consider all tokens) and let temperature alone control randomness. When building applications, it is best to start with reasonable defaults and adjust based on the specific quality characteristics you need.
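As an illustration of setting the parameter on a hosted model, here is a minimal sketch using the OpenAI Python SDK; the model name and prompt are placeholders, and the Anthropic API accepts a similar top_p argument.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize nucleus sampling in one sentence."}],
    top_p=0.9,            # sample only from the top 90% of probability mass
    temperature=1.0,      # leave temperature at its default; adjust one knob at a time
)
print(response.choices[0].message.content)
```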
Real-World Examples
- Setting top-p to 0.1 for highly deterministic factual answers in a knowledge base chatbot
- Using top-p 0.9 for general conversation to allow natural variation in responses
- Combining temperature 0.5 with top-p 0.95 for balanced creative writing output (these three presets are sketched after this list)
- OpenAI and Anthropic APIs exposing top-p as a configurable parameter for fine-grained control over generation
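The first three scenarios can be captured as a small table of presets. This is a hypothetical helper, not a recommendation from any provider; the task names and exact values are illustrative.

```python
# Map illustrative task types to sampling settings (values from the examples above).
SAMPLING_PRESETS = {
    "knowledge_base_qa": {"top_p": 0.1},                        # highly deterministic answers
    "general_chat":      {"top_p": 0.9},                        # natural variation
    "creative_writing":  {"top_p": 0.95, "temperature": 0.5},   # balanced creativity
}

def sampling_params(task: str) -> dict:
    """Return keyword arguments (e.g. for an API call) for a given task."""
    return SAMPLING_PRESETS.get(task, {"top_p": 1.0})  # default: consider all tokens

print(sampling_params("knowledge_base_qa"))  # {'top_p': 0.1}
```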