Fix AutoTP custom patterns: respect use_default_specs #7827
Conversation
```python
# Only use fused-QKV heuristics when no partition_config is provided.
elif self.partition_config is None and require_tp_fused_qkvw(name, self.mp_size):
    # Check and handle fused qkv for TP
    return fused_LinearLayer(module, self.mp_group, fused_module=self.module)
```
Are these fixes exercised by a test? i.e., a model with a conv linear layer or fused QKV weights.
Great catch! I added a test to validate that we use the layers for the new custom patterns when a partition config is given.
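For readers skimming the thread, here is a minimal sketch of the precedence the hunk quoted above establishes. This is an illustrative restatement, not the actual DeepSpeed code: `choose_policy` and its string return values are hypothetical placeholders, and `is_fused_qkv` stands in for the `require_tp_fused_qkvw(name, mp_size)` check.

```python
# Illustrative restatement of the dispatch order in the quoted hunk (not DeepSpeed code):
# a user-supplied partition_config takes precedence, and the fused-QKV heuristic is
# consulted only when no partition_config is provided.
def choose_policy(partition_config, is_fused_qkv: bool) -> str:
    if partition_config is not None:
        return "custom_pattern"    # handled by the user-defined layer specs
    if is_fused_qkv:               # stands in for require_tp_fused_qkvw(name, mp_size)
        return "fused_qkv"         # handled by fused_LinearLayer
    return "default_linear"        # traditional AutoTP injection path
```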
Hi @tohtana, I reviewed this PR. I have one extended question. I saw that the current partition_config has a field called "use_default_specs", which allows the user to specify default behavior (for example, as used in your test). Is this equivalent to defining a default fallback pattern and partition type in layer_specs? I feel this might make the config a more unified format. It might just be personal preference.
@delock Thank you for your review!
The document explains: "You can also set use_default_specs to true to merge your custom patterns on top of the preset (when preset_model is provided)." I think we could clarify this further by explaining when each setting is appropriate. Can you share your thoughts?
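To make the two modes being discussed concrete, here is a hypothetical sketch of a partition_config using only the field names mentioned in this thread (`preset_model`, `use_default_specs`, `layer_specs`). The exact schema, key nesting, and values such as the pattern string and partition type are assumptions for illustration, not the documented format.

```python
# Hypothetical partition_config sketch; field names come from this thread, but the
# exact schema and values are assumptions, not the documented AutoTP format.
partition_config = {
    "preset_model": "llama",        # assumed preset identifier
    "use_default_specs": True,      # merge the custom patterns below on top of the preset
    "layer_specs": [
        {"pattern": r".*my_custom_proj", "partition": "row"},  # hypothetical custom spec
    ],
}
# With "use_default_specs": False, only the layer_specs above would apply and the
# preset/traditional injection path would be skipped (the behavior this PR fixes).
```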
@tohtana Thanks for the explanation! I don't have further comments.
The current code has the following issues:

- `use_default_specs: false` doesn't work
- Injection by the traditional pattern runs even when custom patterns are set
- `mpu` needs to be passed to `deepspeed.initialize` (HF integration doesn't pass mpu)

This PR fixes AutoTP setup to respect `use_default_specs: false` and disable the traditional injection path when custom patterns are enabled. Also, when `mpu` is not passed, we create a TP group in the initialization process.

With these changes, the [related tests](https://github.com/deepspeedai/DeepSpeed/tree/master/tests/unit/model_parallelism) pass and [all AutoTP examples](https://github.com/tohtana/DeepSpeedExamples/tree/tohtana/custom_auto_tp/training/tensor_parallel) in DeepSpeedExamples work now ([PR](deepspeedai/DeepSpeedExamples#998)).

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Kento Sugama <kentosugama@protonmail.ch>
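For reference, a minimal sketch of the initialization path the description above refers to, assuming an AutoTP-style `tensor_parallel` section in the DeepSpeed config. The key names in `ds_config` are assumptions for illustration; `deepspeed.initialize` and its `mpu` argument are the real API.

```python
import torch
import deepspeed

# Placeholder model; any torch.nn.Module works for this sketch.
model = torch.nn.Linear(1024, 1024)

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "tensor_parallel": {"autotp_size": 2},  # assumed AutoTP config keys
}

# Before this PR, a model-parallel unit had to be supplied explicitly:
#   engine, *_ = deepspeed.initialize(model=model, config=ds_config, mpu=my_mpu)
# After this PR, mpu can be omitted (as in the HF integration) and DeepSpeed
# creates the TP group itself during initialization.
engine, optimizer, _, _ = deepspeed.initialize(model=model, config=ds_config)
```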