Clamp logit_scale to prevent numerical instability#530

Open
Mr-Neutr0n wants to merge 1 commit into openai:main from Mr-Neutr0n:fix/clamp-logit-scale

Conversation

@Mr-Neutr0n

Summary

  • Adds torch.clamp(logit_scale, max=100) after exponentiation in CLIP.forward() to prevent the learned temperature parameter from causing numerical overflow during training/fine-tuning.

Problem

The logit_scale parameter is learned during training with no upper bound enforced. As training progresses, self.logit_scale can grow large enough that self.logit_scale.exp() overflows, producing Inf/NaN in the cosine similarity logits. This causes the contrastive loss to become NaN and training to diverge entirely.

This is a known failure mode when fine-tuning CLIP, and the fix is consistent with:

  • The original CLIP training procedure described in the paper (Section 2.4: "clipped to prevent scaling the logits by more than 100")
  • OpenCLIP's implementation (open_clip/model.py)
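The overflow described above can be reproduced directly: `exp()` overflows float32 once its argument exceeds roughly 88.7, so a runaway `logit_scale` turns every logit into `inf`. A minimal demonstration (the value 90.0 is a hypothetical runaway parameter, not one taken from a real training run):

```python
import torch

# exp() overflows float32 once the raw parameter exceeds ~88.7,
# turning every downstream logit into inf/nan.
logit_scale = torch.nn.Parameter(torch.tensor(90.0))  # hypothetical runaway value
scale = logit_scale.exp()
print(torch.isinf(scale).item())  # True

# Clamping after exponentiation keeps the scale finite:
safe_scale = torch.clamp(logit_scale.exp(), max=100)
print(safe_scale.item())  # 100.0
```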

Fix

One-line addition in clip/model.py:

```python
logit_scale = self.logit_scale.exp()
logit_scale = torch.clamp(logit_scale, max=100)  # added
```

The max value of 100 corresponds to a minimum temperature of 0.01, which produces an extremely sharp softmax distribution and is well beyond any practical operating point.
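For context, a minimal sketch of how the clamp sits in the similarity computation. This is an illustrative stand-alone function, not the actual `CLIP.forward()` body; `image_features`, `text_features`, and `logit_scale_param` stand in for the encoder outputs and the learned parameter:

```python
import torch
import torch.nn.functional as F

def similarity_logits(image_features, text_features, logit_scale_param):
    # L2-normalize so the matrix product below yields cosine similarities.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    logit_scale = logit_scale_param.exp()
    logit_scale = torch.clamp(logit_scale, max=100)  # the proposed fix

    # Scaled cosine-similarity logits; stays finite even if the raw
    # parameter has grown large enough that exp() alone would overflow.
    logits_per_image = logit_scale * image_features @ text_features.t()
    return logits_per_image, logits_per_image.t()
```

Even with a runaway parameter (e.g. 90.0), the clamp caps the scale at 100 and the returned logits remain finite.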

Test plan

  • Verified the change is a single-line addition with no side effects on inference
  • Consistent with the CLIP paper's described training procedure
  • Matches the approach used in OpenCLIP and other widely-used reimplementations
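The "no side effects on inference" claim can be checked numerically: at the initialization value ln(1/0.07), the exponentiated scale is about 14.29, far below 100, so the clamp is a no-op for any normally-trained checkpoint:

```python
import math

import torch

# At the initialization ln(1/0.07), exp() recovers 1/0.07 ≈ 14.29 < 100,
# so clamping at 100 leaves inference behavior unchanged.
init = torch.tensor(math.log(1 / 0.07))
scale = init.exp()
clamped = torch.clamp(scale, max=100)
print(torch.equal(scale, clamped))  # True
```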

The logit_scale parameter (initialized to ln(1/0.07) ≈ 2.66) can grow
unbounded during training since there is no upper bound enforced on it.
When logit_scale becomes too large, the exponentiated value overflows
and produces NaN/Inf in the similarity logits, causing training to
diverge.

This adds torch.clamp(logit_scale, max=100) after exponentiation,
consistent with the original CLIP training procedure and other
reference implementations (e.g., OpenCLIP). The cap of 100 corresponds
to a temperature of 0.01, which is already an extremely sharp
distribution and well beyond any practical operating point.
