
Use fake_score features for DMD2 discriminator losses #17

Open

wlaud1001 wants to merge 1 commit into NVlabs:main from
wlaud1001:dmd2-use-fake-score-features

Conversation

@wlaud1001

I really appreciate the amazing work of bringing together multiple distillation methods into a single unified framework.
While looking through the DMD2 implementation in FastGen, I noticed what seems to be a slight discrepancy from the original paper and reference implementation, so I wanted to submit this PR.

This PR updates DMD2 so that GAN losses use fake_score features instead of teacher features.

Changes:

  • use fake_score features for generator GAN loss
  • use fake_score features for discriminator real/fake inputs
  • use fake_score features in the R1 perturbation path
  • allow discriminator-side adversarial gradients to also update the fake_score backbone

The teacher remains responsible for producing the x0 target used by VSD.
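A minimal sketch of the change described above, in PyTorch. All names here (`FeatureBackbone`, `DiscriminatorHead`, `discriminator_loss`, `generator_gan_loss`) are hypothetical stand-ins and do not reflect FastGen's actual API; the point is only to show the GAN losses reading features from the fake_score network rather than the teacher, with adversarial gradients allowed to reach the fake_score backbone:

```python
# Hedged sketch: GAN losses built on fake_score features instead of
# teacher features. Class and function names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureBackbone(nn.Module):
    """Stand-in for the bottleneck features of a diffusion denoiser."""
    def __init__(self, dim=8):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, x):
        return self.net(x)

class DiscriminatorHead(nn.Module):
    """Classification branch on top of backbone features (cf. DMD2 Sec. 4.3)."""
    def __init__(self, dim=8):
        super().__init__()
        self.cls = nn.Linear(dim, 1)

    def forward(self, feats):
        return self.cls(feats)

def discriminator_loss(fake_score, head, x_real, x_fake):
    # Before this PR the features came from the (frozen) teacher; here they
    # come from the fake_score backbone. Because fake_score is not frozen,
    # backward() also propagates adversarial gradients into its parameters.
    logits_real = head(fake_score(x_real))
    logits_fake = head(fake_score(x_fake.detach()))
    return (F.softplus(-logits_real) + F.softplus(logits_fake)).mean()

def generator_gan_loss(fake_score, head, x_fake):
    # The generator-side GAN loss likewise reads fake_score features.
    return F.softplus(-head(fake_score(x_fake))).mean()
```

The same substitution applies to the R1 perturbation path: the perturbed real inputs would be passed through the fake_score backbone rather than the teacher before the gradient penalty is computed.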

This change is motivated by the DMD2 paper and its reference implementations.
In Section 4.3 of Improved Distribution Matching Distillation for Fast Image Synthesis (NeurIPS 2024), the authors write:

"Our integration of a GAN classifier into DMD follows a minimalist design: we add a classification branch on top of the bottleneck of the fake diffusion denoiser (see Fig. 3)."

This is also consistent with the existing reference implementations of DMD2.

If the current use of teacher features was intentional, I’d appreciate clarification on the intended design.

@juliusberner
Collaborator

Thanks a lot for the MR!

I agree that the original and other DMD2 implementations use the fake score features as input to the discriminator. However, we observed better and more stable performance when using the (fixed) teacher network as feature extractor. This choice is also used in f-distill (see Appendix B) and is similar to the setting in LADD.

Nevertheless, the MR will still be interesting for people who want to test the original DMD2 formulation in FastGen.

@wlaud1001
Author

Thanks for clarifying that this was an intentional design choice.

It is very interesting to hear that using the teacher network as the discriminator feature extractor gave better and more stable performance in practice. I also appreciate the references to f-distill and LADD.

My main goal with this MR was to support a setup closer to the original DMD2 implementation, since some users may want to reproduce or directly compare with that formulation. So if you think it would be useful, I would be happy to keep this MR open as an optional alternative, while keeping the current teacher-feature version as the default.

Thanks again for the review.
