fix: fix rank argument in ddp multi-node training #111

Open
Chamberlain0w0 wants to merge 1 commit into master from fix/ddp_multinode
@Chamberlain0w0
Contributor

Modify the DDP constructor to accept a Rank argument.
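
For context, here is a minimal sketch of why a multi-node DDP wrapper benefits from an explicit rank argument: on a single node the local device index and the global rank coincide, but across nodes they differ. The `Rank` class, its fields, and `DDPSketch` are hypothetical names for illustration only, not this repository's API. The 2 x 8 layout mirrors the two-node, 16-GPU runs reported in this thread, assuming 8 GPUs per node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rank:
    """Hypothetical rank descriptor (not the project's actual type)."""
    node_rank: int       # index of the machine in the job
    local_rank: int      # index of the process/GPU within that machine
    nproc_per_node: int  # processes (GPUs) per machine
    nnodes: int          # number of machines

    @property
    def global_rank(self) -> int:
        # Unique id across all machines.
        return self.node_rank * self.nproc_per_node + self.local_rank

    @property
    def world_size(self) -> int:
        return self.nnodes * self.nproc_per_node


class DDPSketch:
    """Hypothetical wrapper that takes the rank explicitly at construction."""

    def __init__(self, module, rank: Rank):
        self.module = module
        self.rank = rank
        # The device is chosen by the local rank, while collectives
        # (e.g. gradient all-reduce) must be addressed by the global rank.
        self.device_index = rank.local_rank
        self.comm_rank = rank.global_rank


# Assumed layout: 2 nodes x 8 GPUs; the 4th GPU on the second node.
rank = Rank(node_rank=1, local_rank=3, nproc_per_node=8, nnodes=2)
ddp = DDPSketch(module=None, rank=rank)
print(ddp.device_index, ddp.comm_rank, rank.world_size)
```

Deriving the communication rank from the device index alone would give two processes on different nodes the same id, which is the kind of mismatch an explicit rank argument avoids.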

@Chamberlain0w0
Contributor Author

Llama, FP32 precision
Two nodes, 16 GPUs, training with ddp+tp+sp+pp parallelism enabled:
image
Single node, same scale:
image

@Chamberlain0w0
Contributor Author

Llama, BF16 precision
Two nodes, 16 GPUs, training with ddp+tp+sp+pp parallelism enabled:
image
Single node, same scale:
image

@Chamberlain0w0
Contributor Author

GPT2, FP32 precision
Two nodes, 16 GPUs, training with ddp+tp+sp+pp parallelism enabled:
image
Single node, same scale:
image

@Chamberlain0w0
Contributor Author

GPT2, BF16 precision
Two nodes, 16 GPUs, training with ddp+tp+sp+pp parallelism enabled:
image
Single node, same scale:
image
