Use fake_score features for DMD2 discriminator losses #17
wlaud1001 wants to merge 1 commit into NVlabs:main
Conversation
Thanks a lot for the MR! I agree that the original and other DMD2 implementations use the fake score features as input to the discriminator. However, we observed better and more stable performance when using the (fixed) teacher network as the feature extractor. This choice is also used in f-distill (see Appendix B) and is similar to the setting in LADD. Nevertheless, the MR will still be interesting for people who want to test the original DMD2 in FastGen.
Thanks for clarifying that this was an intentional design choice. It is very interesting to hear that using the teacher network as the discriminator feature extractor gave better and more stable performance in practice. I also appreciate the references to f-distill and LADD. My main goal with this MR was to support a setup closer to the original DMD2 implementation, since some users may want to reproduce or directly compare with that formulation. So if you think it would be useful, I would be happy to keep this MR open as an optional alternative, while keeping the current teacher-feature version as the default. Thanks again for the review.
I really appreciate the amazing work of bringing together multiple distillation methods into a single unified framework.
While looking through the DMD2 implementation in FastGen, I noticed what seems to be a slight discrepancy from the original paper and reference implementation, so I wanted to submit this PR.
This PR updates DMD2 so that GAN losses use fake_score features instead of teacher features.
Changes:
- Use `fake_score` features for the generator GAN loss.
- Use `fake_score` features for the discriminator real/fake inputs.
- Use `fake_score` features in the R1 perturbation path.
- Discriminator head on the `fake_score` backbone.
- The `teacher` remains responsible for producing the x0 target used by VSD.

This change is motivated by the DMD2 paper and its reference implementations.
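The wiring change above can be sketched as follows. This is a hedged, minimal PyTorch illustration, not FastGen's actual code: `FeatureNet`, `DiscHead`, and the loss functions are hypothetical stand-ins that only show where the features come from (the `fake_score` network rather than the frozen `teacher`).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureNet(nn.Module):
    """Stand-in for the fake_score backbone; exposes bottleneck features.

    Hypothetical: FastGen's real fake_score network is a diffusion model,
    not a single linear layer."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.backbone = nn.Linear(dim, dim)

    def features(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.backbone(x))

class DiscHead(nn.Module):
    """Small discriminator head on top of backbone features."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.out = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.out(feats)

def discriminator_loss(fake_score: FeatureNet, head: DiscHead,
                       x_real: torch.Tensor, x_fake: torch.Tensor) -> torch.Tensor:
    # Key point of the PR: real/fake features come from fake_score,
    # not from the frozen teacher.
    logits_real = head(fake_score.features(x_real))
    logits_fake = head(fake_score.features(x_fake.detach()))
    # Non-saturating GAN objective (one common choice; the exact loss
    # in FastGen may differ).
    return (F.softplus(-logits_real) + F.softplus(logits_fake)).mean()

def generator_gan_loss(fake_score: FeatureNet, head: DiscHead,
                       x_fake: torch.Tensor) -> torch.Tensor:
    # Generator GAN loss is routed through fake_score features as well.
    return F.softplus(-head(fake_score.features(x_fake))).mean()
```

Under this change, swapping back to the teacher-feature default would amount to passing the frozen `teacher` instead of `fake_score` as the feature extractor; the `teacher` still produces the x0 target used by VSD either way.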
In Section 4.3 of Improved Distribution Matching Distillation for Fast Image Synthesis (NeurIPS 2024), the authors describe building the GAN classifier on features from the fake score network.
This is also consistent with existing implementations:
- The original DMD2 implementation uses `fake_unet` bottleneck features: https://github.com/tianweiy/DMD2/blob/8d8fa55633d47cfb81bbc7a892e7248f9518763f/main/edm/edm_guidance.py#L185
- cosmos-predict2.5 uses `fake_score` intermediate features: https://github.com/nvidia-cosmos/cosmos-predict2.5/blob/7e5ffc83fefb2ae1c105c1185cdeb239efb1325c/cosmos_predict2/_src/predict2/distill/models/video2world_model_distill_dmd2.py#L244
If the current use of `teacher` features was intentional, I'd appreciate clarification on the intended design.