Softmax Temperature

How temperature affects probability distributions in language models

Example context for next-word prediction, with logits produced by the model for candidate completions:

"I read it yesterday, it is a very interesting..."

Softmax with Temperature Formula

$$P(\text{word}_i) = \frac{e^{\text{logit}_i / T}}{\sum_j e^{\text{logit}_j / T}}$$

Here T is the temperature parameter: T = 1 recovers the standard softmax, lower T makes the distribution sharper (concentrating probability on the highest-logit words), and higher T pushes it toward uniform.
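The formula above can be sketched directly in Python. The logit values and candidate words below are illustrative assumptions, not outputs of any particular model:

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """P(word_i) = exp(logit_i / T) / sum_j exp(logit_j / T)."""
    # Subtract the max logit for numerical stability; this does not
    # change the result because it cancels in the ratio.
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for candidate next words after the example context,
# e.g. "book", "story", "film", "idea".
logits = [5.0, 3.0, 2.0, 1.0]

print(softmax_with_temperature(logits, T=1.0))  # standard softmax
print(softmax_with_temperature(logits, T=0.1))  # sharp: mass on top logit
print(softmax_with_temperature(logits, T=6.0))  # near-uniform
```

At T = 0.1 almost all probability lands on the highest logit, while at T = 6.0 the four probabilities are nearly equal, matching the sharp/uniform behavior described above.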

Temperature Control

The temperature can be varied from 0.1 (sharp, near-deterministic) to 6.0 (near-uniform).

Probability Distributions

The resulting distribution can be visualized as a bar chart or as a treemap, in which each word's area is proportional to its probability.

Sample a Word

Sampling draws a word at random according to the current temperature-scaled probabilities, completing the sentence:

"I read it yesterday, it is a very interesting..."