Redesign intrinsic-test to use simple comparison #2063

Draft
sayantn wants to merge 5 commits into rust-lang:main from sayantn:intrinsic-test

sayantn (Contributor) commented Mar 16, 2026

Currently, intrinsic-test prints the outputs and then compares them manually. This PR takes a different approach: generate C wrappers for the intrinsics, link to them from Rust, and then compare the outputs with simple Rust tests.
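As a rough illustration of the "link to C from Rust and compare in a test" idea, here is a minimal sketch. It is not the code generated by this PR: libc's `abs` stands in for a generated C wrapper, and `rust_abs`, `c_abs`, and `INPUTS` are hypothetical names chosen for the example.

```rust
use std::os::raw::c_int;

// Assumption: libc's `abs` plays the role of a generated C wrapper.
// A real wrapper would come from a C file compiled and linked by the tool.
unsafe extern "C" {
    fn abs(x: c_int) -> c_int;
}

/// Rust-side implementation under test (stand-in for a Rust intrinsic).
pub fn rust_abs(x: i32) -> i32 {
    x.abs()
}

/// C-side result, obtained through the FFI declaration above.
pub fn c_abs(x: i32) -> i32 {
    // C's `abs` is undefined for the minimum value, so reject it up front.
    assert!(x != i32::MIN, "abs(INT_MIN) is undefined in C");
    // SAFETY: `abs` has no other preconditions.
    unsafe { abs(x) }
}

/// Hypothetical fixed input set shared by both sides.
pub const INPUTS: [i32; 6] = [-5, -1, 0, 1, 42, i32::MAX];

#[test]
fn test_abs_matches_c() {
    for x in INPUTS {
        assert_eq!(rust_abs(x), c_abs(x), "mismatch for input {x}");
    }
}
```

The point of the shape is that a mismatch surfaces as an ordinary `assert_eq!` failure naming the test, rather than as a difference between two printed outputs.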

sayantn force-pushed the intrinsic-test branch 3 times, most recently from fc52b8d to feb1dcd (March 16, 2026 00:27)
sayantn (Contributor, Author) commented Mar 16, 2026

---- test_vdupq_n_f16 stdout ----

thread 'test_vdupq_n_f16' (2187) panicked at mod_0/src/lib.rs:13773:17:
assertion `left == right` failed: 
  left: [NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0)]
 right: [NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(1.43e-5), NiceF16(0.0), NiceF16(-50430.0), NiceF16(1.79e-5)]

This seems weird. Left is the Rust output, right is the C one, and NiceF16 is a wrapper that implements PartialEq as a == b || (a.is_nan() && b.is_nan()). This looks like an ABI-related issue. For reference, the declaration looks like:

unsafe extern "C" {
    fn vdup_n_f16_wrapper(value: f16) -> float16x4_t;
}

In fact, most f16 tests fail on ARMv7. @folkertdev, can you help?
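The NaN-tolerant comparison mentioned in this comment can be sketched as follows. This is an assumption-laden illustration, not the PR's actual type: `NiceF32` is a hypothetical `f32` analogue of `NiceF16`, used here because `f16` is not available on stable Rust.

```rust
/// Hypothetical `f32` analogue of the `NiceF16` wrapper described above.
#[derive(Clone, Copy, Debug)]
pub struct NiceF32(pub f32);

impl PartialEq for NiceF32 {
    fn eq(&self, other: &Self) -> bool {
        let (a, b) = (self.0, other.0);
        // Ordinary float equality, plus "NaN equals NaN" so that lanes
        // where both sides produce NaN do not fail the comparison.
        a == b || (a.is_nan() && b.is_nan())
    }
}

#[test]
fn nan_lanes_compare_equal() {
    let left = [NiceF32(0.0), NiceF32(f32::NAN)];
    let right = [NiceF32(0.0), NiceF32(f32::NAN)];
    assert_eq!(left, right);
    // A NaN on only one side is still a mismatch.
    assert_ne!(NiceF32(1.0), NiceF32(f32::NAN));
}
```

Note that this rule ignores NaN payloads and treats `0.0` and `-0.0` as equal, since it falls back to `==` for non-NaN values.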

Edit:

To work around this issue, I have modified the tool to communicate with C via pointers (e.g. the C wrapper for _mm_add_ps looks like void _mm_add_ps_wrapper(__m128 *dst, const __m128 *a, const __m128 *b)). This fixed the AArch64 and ARMv7 problems, but now the AArch64BE tests are failing, because apparently C and Rust have different pointer load semantics for matrix-like vectors (e.g. uint64x2x2_t): https://godbolt.org/z/j1d16z1P9
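A minimal sketch of the pointer-based calling shape, seen from the Rust side, might look like the following. Everything here is an assumption for illustration: `F32x4` is a hypothetical stand-in for a 128-bit vector type, and `add_f32x4_wrapper` is written in Rust only so the example builds without a C compiler; in the real setup that function would be C. The sketch does not reproduce the big-endian lane-ordering issue.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a 128-bit vector type such as `__m128`.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

/// Stand-in for the C side, mirroring the shape
/// `void wrapper(T *dst, const T *a, const T *b)`.
///
/// # Safety
/// `a` and `b` must point to valid values; `dst` must be valid for writes.
pub unsafe extern "C" fn add_f32x4_wrapper(
    dst: *mut F32x4,
    a: *const F32x4,
    b: *const F32x4,
) {
    unsafe {
        let (a, b) = (*a, *b);
        // Lane-wise addition, written through the out-pointer.
        dst.write(F32x4(std::array::from_fn(|i| a.0[i] + b.0[i])));
    }
}

/// Safe helper: no vector crosses the FFI boundary by value, only pointers.
pub fn add_via_pointers(a: F32x4, b: F32x4) -> F32x4 {
    let mut dst = MaybeUninit::<F32x4>::uninit();
    // SAFETY: all pointers come from live locals, and the wrapper fully
    // initializes `dst` before `assume_init` is called.
    unsafe {
        add_f32x4_wrapper(dst.as_mut_ptr(), &a, &b);
        dst.assume_init()
    }
}

#[test]
fn test_add_via_pointers() {
    let a = F32x4([1.0, 2.0, 3.0, 4.0]);
    let b = F32x4([0.5; 4]);
    assert_eq!(add_via_pointers(a, b), F32x4([1.5, 2.5, 3.5, 4.5]));
}
```

Passing pointers sidesteps disagreements about how vector values are passed and returned, at the cost of depending on both sides agreeing on the in-memory layout of the pointee, which is where the big-endian mismatch described above comes in.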

sayantn force-pushed the intrinsic-test branch 3 times, most recently from 53fa987 to e2346ff (March 16, 2026 06:05)
sayantn (Contributor, Author) commented Mar 16, 2026

Btw, the time gains are significant: this reduces the Arm and AArch64 times to 2-3 minutes, and the full x86 run (we previously ran only 20%) to around 12 minutes for release and 17 minutes for dev.
