Temperature
Learn what Temperature means in AI and machine learning, with examples and related concepts.
Definition
Temperature is a parameter that controls how random or deterministic an LLM’s output is. It scales the probability distribution over tokens before the model picks the next one.
- Temperature = 0 — The model always picks the most likely token. Output is deterministic and focused.
- Temperature = 1 — The model samples from the full probability distribution as-is. Output is varied and natural.
- Temperature > 1 — Probabilities get flattened, making unlikely tokens more probable. Output becomes more creative (or chaotic).
The name comes from thermodynamics — higher temperature means more molecular randomness. Same idea: higher temperature means more randomness in token selection.
How It Works
Temperature modifies the softmax function that converts raw model scores (logits) into probabilities:
Original logits for next token:
"Paris": 5.0
"Lyon": 2.0
"Berlin": 1.0
Temperature = 0.2 (focused):
"Paris": ~100.0% ← almost certain
"Lyon": ~0.0%
"Berlin": ~0.0%
Temperature = 1.0 (balanced):
"Paris": 93.6%
"Lyon": 4.7%
"Berlin": 1.7%
Temperature = 2.0 (creative):
"Paris": 73.6% ← much less certain
"Lyon": 16.4%
"Berlin": 10.0%
Mathematically: P(token) = exp(logit / T) / sum(exp(all_logits / T)), where T is the temperature. At T = 0 this formula is undefined (division by zero), so in practice implementations treat it as greedy decoding: always pick the highest-logit token.
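The scaled softmax above can be reproduced in a few lines of plain Python. This is a minimal sketch for intuition; the function name is illustrative, not part of any library API:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T, then apply softmax.

    Lower T sharpens the distribution; higher T flattens it.
    """
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical
    # stability; this does not change the resulting probabilities.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 2.0, 1.0]  # "Paris", "Lyon", "Berlin"
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.1%}" for p in probs))
# T=0.2: 100.0%, 0.0%, 0.0%
# T=1.0: 93.6%, 4.7%, 1.7%
# T=2.0: 73.6%, 16.4%, 10.0%
```

Running it reproduces the percentages in the worked example: dividing logits by a small T stretches the gaps between them before softmax, so the top token dominates; dividing by a large T shrinks the gaps, spreading probability onto the tail.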
Why It Matters
Choosing the right temperature is one of the easiest ways to improve LLM output quality:
| Use Case | Recommended Temperature | Why |
|---|---|---|
| Code generation | 0 - 0.2 | Code needs to be correct, not creative |
| Data extraction / JSON | 0 | Deterministic output for parsing |
| General Q&A | 0.3 - 0.7 | Balance accuracy with natural language |
| Creative writing | 0.8 - 1.0 | Varied, interesting prose |
| Brainstorming | 1.0 - 1.5 | Diverse, unexpected ideas |
Setting temperature too high for factual tasks increases the risk of hallucinations; setting it too low for creative tasks produces repetitive, boring text.
Example
import json

from anthropic import Anthropic

client = Anthropic()

prompt = "Write a one-sentence product tagline for an AI code editor."

# Low temperature — consistent, safe output
for i in range(3):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=50,
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"T=0 run {i+1}: {response.content[0].text}")
# → All 3 outputs will be identical or near-identical

# Higher temperature — varied, creative output
for i in range(3):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=50,
        temperature=1.0,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"T=1 run {i+1}: {response.content[0].text}")
# → Each output will be different

# Practical pattern: use low temperature for structured output
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,
    temperature=0,  # deterministic for reliable JSON
    messages=[{
        "role": "user",
        "content": "Extract {name, price, category} as JSON from: 'The Sony WH-1000XM5 headphones cost $349 in the audio category.'"
    }],
)
data = json.loads(response.content[0].text)
# → {"name": "Sony WH-1000XM5", "price": 349, "category": "audio"}
Temperature vs Top-p
Temperature and top-p both control randomness, but differently:
- Temperature scales all probabilities — changes the shape of the entire distribution
- Top-p cuts off the tail — only considers tokens whose cumulative probability reaches p
Most APIs let you set both, but it’s best to adjust one at a time. Anthropic’s API supports both temperature and top_p parameters.
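To make the "cuts off the tail" behavior concrete, here is a small sketch of the top-p candidate-set step, using a toy three-token distribution. The helper name `top_p_filter` is illustrative, not a real API; actual samplers then draw a token from the renormalized set:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize. This is the candidate set used by
    nucleus (top-p) sampling."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break  # the tail beyond this point is discarded entirely
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}

# Toy distribution (roughly the T=1.0 example from above)
probs = {"Paris": 0.936, "Lyon": 0.047, "Berlin": 0.017}
print(top_p_filter(probs, 0.9))   # only "Paris" survives: 0.936 >= 0.9
print(top_p_filter(probs, 0.95))  # "Paris" + "Lyon" cover 0.983
```

Note the qualitative difference: temperature reshapes every token's probability, while top-p leaves the surviving tokens' relative odds unchanged and simply drops the unlikely tail.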
Key Takeaways
- Temperature controls randomness: 0 = deterministic, 1 = natural sampling, >1 = more random
- Use low temperature (0-0.3) for code, data extraction, and factual tasks
- Use higher temperature (0.7-1.0) for creative writing and brainstorming
- Temperature is often the most impactful generation parameter — adjust it before anything else
- When using both temperature and top-p, change one at a time to understand the effect
Part of the DeepRaft Glossary — AI and ML terms explained for developers.