Fix ROCm BF16 conversion intrinsics in inference v2 (#7843)#7846
tohtana wants to merge 1 commit into deepspeedai:master
Conversation
Add `__HIP_PLATFORM_AMD__` fallbacks for BF16 conversions in the inference v2 conversion helpers to avoid HIP intrinsics that are unsupported on ROCm toolchains (e.g. `__ll2bfloat16_rn`, `__int2bfloat16_rn`, `__bfloat162uint_rn`). The fallback logic mirrors the csrc conversion helpers, keeping CUDA behavior unchanged while enabling ROCm builds.

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f8a8f2be62
```cpp
DS_D_INLINE __nv_bfloat16 to(int32_t val)
{
#ifdef __HIP_PLATFORM_AMD__
    return __float2bfloat16(__int2float_rn(val));
```
Use exact intermediate for int32/uint32 BF16 casts on HIP
On AMD HIP, this new path converts 32-bit integers through float (__int2float_rn/__uint2float_rn) before __float2bfloat16, which can double-round for |val| > 2^24 and pick a different bf16 than a direct integer→bf16 round (the CUDA branch uses __int2bfloat16_rn/__uint2bfloat16_rn). This means large integer inputs can now produce numerically different bf16 values on ROCm; converting via double for 32-bit ints would keep the conversion exact before the final bf16 rounding.
This conversion is aligned with the existing code. If we want to change the behavior, that should be a separate PR.
Fixes #7843
On HIP/ROCm (the AMD path), several CUDA-style BF16 intrinsics used in the code are not provided, e.g.:
- `__ll2bfloat16_rn`
- `__int2bfloat16_rn`
- `__short2bfloat16_rn`
- `__bfloat162uint_rn`

This causes compilation errors on HIP platforms.
This PR introduces fallback paths using functions available on the HIP platform, mirroring the conversion utilities in csrc. The conversion paths are: