Use fake_score features for DMD2 discriminator losses #17
wlaud1001 wants to merge 1 commit into NVlabs:main
Conversation
Thanks a lot for the MR! I agree that the original and other DMD2 implementations use the fake score features as input to the discriminator. However, we observed better and more stable performance when using the (fixed) teacher network as the feature extractor. This choice is also used in f-distill (see Appendix B) and is similar to the setting in LADD. Nevertheless, the MR will still be interesting for people who want to test the original DMD2 in FastGen.
Thanks for clarifying that this was an intentional design choice. It is very interesting to hear that using the teacher network as the discriminator feature extractor gave better and more stable performance in practice. I also appreciate the references to f-distill and LADD. My main goal with this MR was to support a setup closer to the original DMD2 implementation, since some users may want to reproduce or directly compare with that formulation. So if you think it would be useful, I would be happy to keep this MR open as an optional alternative, while keeping the current teacher-feature version as the default. Thanks again for the review.
I really appreciate the amazing work of bringing together multiple distillation methods into a single unified framework.
While looking through the DMD2 implementation in FastGen, I noticed what seems to be a slight discrepancy from the original paper and reference implementation, so I wanted to submit this PR.
This PR updates DMD2 so that GAN losses use fake_score features instead of teacher features.
Changes:
- Use `fake_score` features for the generator GAN loss.
- Use `fake_score` features for the discriminator real/fake inputs.
- Use `fake_score` features in the R1 perturbation path.
- Discriminator head on the `fake_score` backbone.
- The `teacher` remains responsible for producing the x0 target used by VSD.

This change is motivated by the DMD2 paper and its reference implementations.
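The wiring change above can be sketched as follows. This is a hedged, minimal PyTorch illustration, not FastGen's actual code: `FeatureNet`, `DiscHead`, and the loss functions are hypothetical stand-ins that only show where the features come from (the `fake_score` network rather than the frozen `teacher`).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureNet(nn.Module):
    """Stand-in for the fake_score backbone; exposes bottleneck features.

    Hypothetical: FastGen's real fake_score network is a diffusion model,
    not a single linear layer."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.backbone = nn.Linear(dim, dim)

    def features(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.backbone(x))

class DiscHead(nn.Module):
    """Small discriminator head on top of backbone features."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.out = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.out(feats)

def discriminator_loss(fake_score: FeatureNet, head: DiscHead,
                       x_real: torch.Tensor, x_fake: torch.Tensor) -> torch.Tensor:
    # Key point of the PR: real/fake features come from fake_score,
    # not from the frozen teacher.
    logits_real = head(fake_score.features(x_real))
    logits_fake = head(fake_score.features(x_fake.detach()))
    # Non-saturating GAN objective (one common choice; the exact loss
    # in FastGen may differ).
    return (F.softplus(-logits_real) + F.softplus(logits_fake)).mean()

def generator_gan_loss(fake_score: FeatureNet, head: DiscHead,
                       x_fake: torch.Tensor) -> torch.Tensor:
    # Generator GAN loss is routed through fake_score features as well.
    return F.softplus(-head(fake_score.features(x_fake))).mean()
```

Under this change, swapping back to the teacher-feature default would amount to passing the frozen `teacher` instead of `fake_score` as the feature extractor; the `teacher` still produces the x0 target used by VSD either way.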
In Section 4.3 of Improved Distribution Matching Distillation for Fast Image Synthesis (NeurIPS 2024), the authors describe building the GAN classifier on features from the fake score network.
This is also consistent with existing implementations:
- The original DMD2 implementation uses `fake_unet` bottleneck features: https://github.com/tianweiy/DMD2/blob/8d8fa55633d47cfb81bbc7a892e7248f9518763f/main/edm/edm_guidance.py#L185
- cosmos-predict2.5 uses `fake_score` intermediate features: https://github.com/nvidia-cosmos/cosmos-predict2.5/blob/7e5ffc83fefb2ae1c105c1185cdeb239efb1325c/cosmos_predict2/_src/predict2/distill/models/video2world_model_distill_dmd2.py#L244
If the current use of `teacher` features was intentional, I'd appreciate clarification on the intended design.