Refactor Part 2: Integration of mlir-aie Features #88
… score calculation in llama
…ueError Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… design Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r map lookup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eck in design Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e count

The previous implementation counted available artifacts before/after a rule fires. GenerateMLIRFromPythonCompilationRule works by replacing a PythonGeneratedMLIRArtifact node with an available SourceArtifact, which caused the available count to stay the same in some graph configurations, triggering a false positive.

The correct check is whether the set of unavailable artifact filenames changed. If the same set remains unavailable after a rule fires, the rule made no genuine progress.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
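The difference between the two checks can be sketched as follows. This is only an illustration: `Artifact`, `unavailable_filenames`, `made_progress`, and the file names are hypothetical stand-ins, not the repository's classes, and the before/after lists are a contrived configuration in which the available count happens to stay the same.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    """Hypothetical stand-in for a node in the compilation graph."""
    filename: str
    available: bool = False


def unavailable_filenames(artifacts):
    """Return the set of filenames that still have to be produced."""
    return {a.filename for a in artifacts if not a.available}


def made_progress(before, after):
    """A rule made progress only if the set of unavailable filenames changed."""
    return unavailable_filenames(before) != unavailable_filenames(after)


# Contrived graph states (assumption): one node becomes available while
# another available node drops out, so the available count is 1 both times.
before = [Artifact("design.mlir", False), Artifact("kernel.o", True)]
after = [Artifact("design.mlir", True), Artifact("design.xclbin", False)]

count_before = sum(a.available for a in before)
count_after = sum(a.available for a in after)
```

Here a count-based check sees no change, while the set-based check reports progress because `design.mlir` left the unavailable set.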
The AIERepeat parameter was renamed from num_repeats to repeat during the simplifying refactor, but llama_npu.py was not updated.

Fixes the CI failure:
TypeError: AIERepeat.__init__() got an unexpected keyword argument 'num_repeats'

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- llama_inference_harness.py: remove stray if __name__ == '__main__' guard
(no main() function exists in this file; it is a library, not a script)
- llama_inference_harness.py: initialize state.num_preceding_tokens = 0 in
reset_kv_cache() so the NPU decode path never hits AttributeError
- llama_npu.py: move scratch_buffer.to("cpu") outside the per-layer loop
(it was being called n_layers times redundantly per prefill pass)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
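As a rough illustration of the reset_kv_cache() fix, the sketch below uses a hypothetical `State` class and `decode_step` function (neither is taken from llama_inference_harness.py). It shows why initializing the counter inside the reset keeps a later read from raising AttributeError.

```python
class State:
    """Hypothetical stand-in for the harness state object."""

    def reset_kv_cache(self):
        # Initialise the counter together with the cache so that any code
        # path running after a reset can read it safely.
        self.kv_cache = {}
        self.num_preceding_tokens = 0


def decode_step(state):
    """Toy decode step: read the current position, then advance it."""
    position = state.num_preceding_tokens
    state.num_preceding_tokens += 1
    return position


state = State()
state.reset_kv_cache()
first = decode_step(state)
second = decode_step(state)
```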
📊 Test Results for Small Benchmark/Test Suite
9580ba2 (2026_03_25_21_04_17) IRONCLAD
Tested on
📈 Trends (vs main branch) for Small Benchmark/Test Suite
9580ba2 (2026_03_25_21_04_17) IRONCLAD
Trends
M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128
M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0
M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0
M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0
M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0
M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048
M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024
M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512
M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256
M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8
M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8
M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0
M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024
M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0
axpy_1_cols_2_channels_2048_tile_2048_3.0
axpy_1_cols_2_channels_2048_tile_2048_3.0_0
axpy_2_cols_2_channels_2048_tile_1024_3.0
axpy_2_cols_2_channels_2048_tile_1024_3.0_0
axpy_4_cols_2_channels_2048_tile_512_3.0
axpy_4_cols_2_channels_2048_tile_512_3.0_0
axpy_8_cols_2_channels_2048_tile_256_3.0
axpy_8_cols_2_channels_2048_tile_256_3.0_0
dequant_1_cols_1_channels_2048_tile_2048
dequant_1_cols_1_channels_2048_tile_2048_0
dequant_1_cols_2_channels_2048_tile_1024
dequant_1_cols_2_channels_2048_tile_1024_0
dequant_2_cols_1_channels_2048_tile_1024
dequant_2_cols_1_channels_2048_tile_1024_0
dequant_2_cols_2_channels_2048_tile_512
dequant_2_cols_2_channels_2048_tile_512_0
dequant_4_cols_1_channels_2048_tile_512
dequant_4_cols_1_channels_2048_tile_512_0
dequant_4_cols_2_channels_2048_tile_256
dequant_4_cols_2_channels_2048_tile_256_0
dequant_8_cols_1_channels_2048_tile_256
dequant_8_cols_1_channels_2048_tile_256_0
dequant_8_cols_2_channels_2048_tile_128
dequant_8_cols_2_channels_2048_tile_128_0
eltwise_add_1_cols_2_channels_2048_tile_2048
eltwise_add_2_cols_2_channels_2048_tile_1024
eltwise_add_4_cols_2_channels_2048_tile_512
eltwise_add_8_cols_2_channels_2048_tile_256
eltwise_mul_1_cols_2_channels_2048_tile_2048
eltwise_mul_2_cols_2_channels_2048_tile_1024
eltwise_mul_4_cols_2_channels_2048_tile_512
eltwise_mul_8_cols_2_channels_2048_tile_256
embedding_dim_2048-hidden_dim_2048: No metrics available.
gelu_1_cols_1_channels_2048_tile_2048
gelu_1_cols_2_channels_2048_tile_1024
gelu_2_cols_1_channels_2048_tile_1024
gelu_2_cols_2_channels_2048_tile_512
gelu_4_cols_1_channels_2048_tile_512
gelu_4_cols_2_channels_2048_tile_256
gelu_8_cols_1_channels_2048_tile_256
gelu_8_cols_2_channels_2048_tile_128
gemm_1792x896x1152_64x32x48_8cols_ccolmaj
gemm_192x384x64_48x96x16_4cols
gemm_192x384x64_48x96x16_4cols_bcolmaj_ccolmaj
gemm_2048x2048x2048_64x64x32_8_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x32_8_cols_0_bcolmaj_1_ccolmaj_0
gemm_2048x2048x2048_64x64x32_8_cols_1_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_1cols
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_0_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_1_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_1_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_2_cols_1_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_1_bcolmaj_0_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_2cols_bcolmaj
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_1_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_8_cols_1_bcolmaj_0_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_8cols_bcolmaj_ccolmaj
gemm_384x1536x1792_32x48x64_4cols_bcolmaj
gemm_896x1792x640_32x64x80_8cols_ccolmaj
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_2048-scalar_factor_3.0
input_length_2048-num_aie_columns_1-tile_size_2048
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_1024-scalar_factor_3.0
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False
input_length_2048-num_aie_columns_2-tile_size_1024
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_512-scalar_factor_3.0
input_length_2048-num_aie_columns_4-tile_size_512
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_256-scalar_factor_3.0
input_length_2048-num_aie_columns_8-tile_size_256
input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048
input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128
input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024
input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024
input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512
input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512
input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256
input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512
input_length_32768-num_aie_columns_4-num_channels_4-tile_size_2048
layer_norm_1_cols_1_channels_2048_tile_2048
layer_norm_1_cols_2_channels_2048_tile_1024
layer_norm_2_cols_1_channels_2048_tile_1024
layer_norm_2_cols_2_channels_2048_tile_512
layer_norm_4_cols_1_channels_2048_tile_512
layer_norm_4_cols_2_channels_2048_tile_256
layer_norm_8_cols_1_channels_2048_tile_256
layer_norm_8_cols_2_channels_2048_tile_128
matrix_vector_mul_128x128_32_1col
matrix_vector_mul_128x128_32_1col0
matrix_vector_mul_128x128_32tsi_128tso_1col
matrix_vector_mul_128x128_32tsi_128tso_1col0
matrix_vector_mul_2048x8192_1_1col
matrix_vector_mul_2048x8192_1_1col0
matrix_vector_mul_2048x8192_1_2col
matrix_vector_mul_2048x8192_1_2col0
matrix_vector_mul_2048x8192_1_4col
matrix_vector_mul_2048x8192_1_4col0
matrix_vector_mul_2048x8192_1_8col
matrix_vector_mul_2048x8192_1_8col0
matrix_vector_mul_2048x8192_1tsi_1024tso_2col
matrix_vector_mul_2048x8192_1tsi_1024tso_2col0
matrix_vector_mul_2048x8192_1tsi_2048tso_1col
matrix_vector_mul_2048x8192_1tsi_2048tso_1col0
matrix_vector_mul_2048x8192_1tsi_256tso_8col
matrix_vector_mul_2048x8192_1tsi_256tso_8col0
matrix_vector_mul_2048x8192_1tsi_512tso_4col
matrix_vector_mul_2048x8192_1tsi_512tso_4col0
matrix_vector_mul_8192x2048_4_1col
matrix_vector_mul_8192x2048_4_1col0
matrix_vector_mul_8192x2048_4_2col
matrix_vector_mul_8192x2048_4_2col0
matrix_vector_mul_8192x2048_4_4col
matrix_vector_mul_8192x2048_4_4col0
matrix_vector_mul_8192x2048_4_8col
matrix_vector_mul_8192x2048_4_8col0
matrix_vector_mul_8192x2048_4tsi_1024tso_1col
matrix_vector_mul_8192x2048_4tsi_1024tso_1col0
matrix_vector_mul_8192x2048_4tsi_1024tso_2col
matrix_vector_mul_8192x2048_4tsi_1024tso_2col0
matrix_vector_mul_8192x2048_4tsi_1024tso_4col
matrix_vector_mul_8192x2048_4tsi_1024tso_4col0
matrix_vector_mul_8192x2048_4tsi_1024tso_8col
matrix_vector_mul_8192x2048_4tsi_1024tso_8col0
mem_copy_16_cores_2_chans_2048_tile_128_False
mem_copy_16_cores_2_chans_2048_tile_128_False0
mem_copy_1_cols_1_channels_2048_tile_2048
mem_copy_1_cols_2_channels_2048_tile_1024
mem_copy_1_cores_1_chans_2048_tile_2048_False
mem_copy_1_cores_1_chans_2048_tile_2048_False0
mem_copy_2_cols_1_channels_2048_tile_1024
mem_copy_2_cols_2_channels_2048_tile_512
mem_copy_2_cores_1_chans_2048_tile_1024_False
mem_copy_2_cores_1_chans_2048_tile_1024_False0
mem_copy_2_cores_2_chans_2048_tile_1024_False
mem_copy_2_cores_2_chans_2048_tile_1024_False0
mem_copy_4_cols_1_channels_2048_tile_512
mem_copy_4_cols_2_channels_2048_tile_256
mem_copy_4_cores_1_chans_2048_tile_512_False
mem_copy_4_cores_1_chans_2048_tile_512_False0
mem_copy_4_cores_2_chans_2048_tile_512_False
mem_copy_4_cores_2_chans_2048_tile_512_False0
mem_copy_8_cols_1_channels_2048_tile_256
mem_copy_8_cols_2_channels_2048_tile_128
mem_copy_8_cores_1_chans_2048_tile_256_False
mem_copy_8_cores_1_chans_2048_tile_256_False0
mem_copy_8_cores_2_chans_2048_tile_256_False
mem_copy_8_cores_2_chans_2048_tile_256_False0
mha
mha0
mha_16384_64_1_8_0_0
relu_1_cols_1_channels_2048_tile_2048
relu_2_cols_1_channels_2048_tile_1024
relu_4_cols_1_channels_2048_tile_512
relu_8_cols_1_channels_2048_tile_256
rms_norm_1_cols_1_channels_2048_tile_2048
rms_norm_1_cols_2_channels_2048_tile_1024
rms_norm_2_cols_1_channels_2048_tile_1024
rms_norm_2_cols_2_channels_2048_tile_512
rms_norm_4_cols_1_channels_2048_tile_512
rms_norm_4_cols_2_channels_2048_tile_256
rms_norm_8_cols_1_channels_2048_tile_256
rms_norm_8_cols_2_channels_2048_tile_128
rope_1_cols_2_channels_4096_tile_4096_0
rope_1c_32rows_512cols_32arows_0m
rope_1c_32rows_512cols_8arows_0m
rope_2_cols_2_channels_4096_tile_2048_0
rope_2c_32rows_512cols_32arows_0m
rope_2c_32rows_512cols_8arows_0m
rope_4_cols_2_channels_4096_tile_1024_0
rope_8_cols_2_channels_4096_tile_512_0
rope_8c_32rows_512cols_32arows_0m
rope_8c_32rows_512cols_8arows_0m
rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0
rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0
rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0
seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0: No metrics available.
seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False: No metrics available.
sigmoid_1_cols_1_channels_2048_tile_2048
sigmoid_2_cols_1_channels_2048_tile_1024
sigmoid_4_cols_1_channels_2048_tile_512
sigmoid_8_cols_1_channels_2048_tile_256
silu_1_cols_1_channels_2048_tile_2048
silu_2_cols_1_channels_2048_tile_1024
silu_4_cols_1_channels_2048_tile_512
silu_8_cols_1_channels_2048_tile_256
softmax_1_cols_2_channels_4096_tile_2048
softmax_2_cols_2_channels_32768_tile_1024
softmax_2_cols_2_channels_32768_tile_512
softmax_2_cols_2_channels_4096_tile_1024
softmax_2_cols_2_channels_4096_tile_512
softmax_4_cols_4_channels_32768_tile_2048
swiglu: No metrics available.
swiglu_decode_1x2048x2048: No metrics available.
swiglu_decode_1x2048x2048_0
swiglu_prefill_256x2048x2048: No metrics available.
tanh_1_cols_1_channels_2048_tile_2048
tanh_2_cols_1_channels_2048_tile_1024
tanh_4_cols_1_channels_2048_tile_512
tanh_8_cols_1_channels_2048_tile_256
transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s0
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s0
weighted_rms_norm_1_cols_2_channels_2048_weights_2048
weighted_rms_norm_2_cols_2_channels_2048_weights_1024
weighted_rms_norm_4_cols_2_channels_2048_weights_512
weighted_rms_norm_8_cols_2_channels_2048_weights_256
…ink_with

mlir-aie 1.3.1 attaches link_with to each func.func declaration (Kernel), and the aie-assign-core-link-files pass aggregates all referenced .o files into a link_files ArrayAttr on each CoreOp. aiecc resolves these paths relative to its working directory, making .a archives unnecessary.

Remove KernelArchiveArtifact, ArchiveCompilationRule, and all -L/-l: archive flags from the compilation infrastructure. Each Kernel() now names its own .o file directly; multiple kernels on the same core simply reference different .o files and the pass deduplicates automatically.

- compilation/base.py: remove KernelArchiveArtifact and ArchiveCompilationRule
- compilation/context.py: remove ArchiveCompilationRule from rules list
- common/base.py: remove self.kernel_archive; pass KernelObjectArtifacts directly as XclbinArtifact dependencies
- common/__init__.py: remove KernelArchiveArtifact export
- common/fusion.py: remove self.kernel_archive; pass kernel objects directly as FullElfArtifact dependencies; remove kernel_archive from callback_kwargs
- all design.py files: remove kernel_archive param; hardcode .o name in each Kernel() call (parameterised names passed via callback_kwargs for gemv and gemm where the filename depends on op-level config)
- gemm/op.py: extract _kernel_flags_suffix property to share the flags encoding between get_mlir_artifact() and get_kernel_artifacts()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
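The aggregation described in this commit message can be modelled in a few lines. This is a toy model, not the pass's implementation: `assign_core_link_files`, the core names, and the kernel and object names below are all hypothetical and chosen only to show the per-core deduplication.

```python
def assign_core_link_files(core_calls, link_with):
    """Toy model of collecting link_with objects per core.

    core_calls: core name -> kernel function names that core calls
    link_with:  kernel function name -> .o file named in its declaration
    """
    link_files = {}
    for core, callees in core_calls.items():
        objects = []
        for fn in callees:
            obj = link_with.get(fn)
            # Keep first-seen order and skip objects already listed.
            if obj is not None and obj not in objects:
                objects.append(obj)
        link_files[core] = objects
    return link_files


# Hypothetical kernels: two of them live in the same object file.
link_with = {"zero_bf16": "mm.o", "matmul_bf16": "mm.o", "silu_bf16": "silu.o"}
core_calls = {
    "core_0_2": ["zero_bf16", "matmul_bf16", "silu_bf16"],
    "core_1_2": ["silu_bf16"],
}
result = assign_core_link_files(core_calls, link_with)
```

In this model `core_0_2` ends up with `mm.o` listed once even though two of its kernels name it.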
…ions

The previous commit incorrectly assigned three mha.cc-defined functions to wrong .o files in their Kernel() declarations:

- matmul_bf16_bf16_wrapper: defined in mha.cc → mha_mha.o, not mha_mm.o
- partial_softmax: defined in mha.cc → mha_mha.o, not mha_softmax.o
- init_scale_buffer: defined in mha.cc → mha_mha.o, not mha_softmax.o

The aie-assign-core-link-files pass uses these link_with attributes to populate link_files on each CoreOp; referencing a .o that does not contain the symbol causes the linker to fail with an undefined symbol error when compiling the core ELF.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
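A mismatch of this kind could be caught before link time with a small consistency check. The sketch below is an assumption about how one might do that, not something in the PR: `find_misassigned` is a hypothetical helper, and the `defined_in` table is written by hand here from the three cases listed above (in practice it would have to come from inspecting the object files).

```python
def find_misassigned(declared, defined_in):
    """Return {symbol: (declared_obj, actual_obj)} for every mismatch."""
    mismatches = {}
    for symbol, declared_obj in declared.items():
        actual_obj = defined_in.get(symbol)  # None if the symbol is unknown
        if actual_obj != declared_obj:
            mismatches[symbol] = (declared_obj, actual_obj)
    return mismatches


# The three declarations as they stood before this fix.
declared = {
    "matmul_bf16_bf16_wrapper": "mha_mm.o",
    "partial_softmax": "mha_softmax.o",
    "init_scale_buffer": "mha_softmax.o",
}
# Where the symbols are actually defined (all compiled from mha.cc).
defined_in = {
    "matmul_bf16_bf16_wrapper": "mha_mha.o",
    "partial_softmax": "mha_mha.o",
    "init_scale_buffer": "mha_mha.o",
}
mismatches = find_misassigned(declared, defined_in)
```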
📊 Test Results for Test Example Applications
7d76781 (2026_03_25_21_47_06) IRONCLAD
Tested on
📈 Trends (vs main branch) for Test Example Applications
7d76781 (2026_03_25_21_47_06) IRONCLAD
Trends
llama_3.2_1b
llama_3.2_1b_prompt_1024_tokens_1
llama_3.2_1b_prompt_1024_tokens_40
llama_3.2_1b_prompt_13_tokens_1
llama_3.2_1b_prompt_13_tokens_40
llama_3.2_1b_prompt_2048_tokens_1
llama_3.2_1b_prompt_2048_tokens_40
📊 Test Results for Small Benchmark/Test Suite
7d76781 (2026_03_25_21_59_28) IRONCLAD
Tested on
📈 Trends (vs main branch) for Small Benchmark/Test Suite
7d76781 (2026_03_25_21_59_28) IRONCLAD
Trends: per-benchmark trend charts for the same benchmark list as the 9580ba2 run above, with the same entries reporting "No metrics available."
Two bugs introduced by the .o-linking refactor:

1. PeanoCompilationRule._prefix_symbols/_rename_symbols hardcoded peano_dir/bin/llvm-objcopy, which doesn't exist in all CI environments (e.g. when llama_npu.py runs as a subprocess with a different peano_dir resolution). Add _find_tool() that searches peano_dir/bin, mlir_aie_dir/bin, and system PATH (both bare name and name-18 variant, matching the devel branch convention).

2. mha/op.py compiled softmax.cc into mha_softmax.o, but no Kernel() declaration in design.py references mha_softmax.o. The aie-assign-core-link-files pass only traces direct func.call edges, so mha_softmax.o was never added to any core's link_files, causing the linker to fail with an undefined symbol for partial_softmax_bf16 (called by partial_softmax in mha_mha.o). Fix: remove the separate mha_softmax.o artifact and instead pass -include softmax.cc when compiling mha.cc, so partial_softmax_bf16 is compiled directly into mha_mha.o.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
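The search order described for _find_tool() might look roughly like the sketch below. It is not the PR's code: the function name `find_tool`, its parameters, and the choice to try the `-18` suffix inside the install directories as well as on PATH are assumptions made for illustration.

```python
import os
import shutil


def find_tool(name, install_dirs=(), versions=("18",)):
    """Locate an executable, preferring <install_dir>/bin over system PATH.

    install_dirs: roots such as a peano or mlir-aie install (assumption)
    versions:     suffixes to try in addition to the bare name, e.g. name-18
    """
    candidates = [name] + [f"{name}-{v}" for v in versions]

    # 1. Look in <root>/bin of each install directory, in the order given.
    for root in install_dirs:
        for candidate in candidates:
            path = os.path.join(root, "bin", candidate)
            if os.path.isfile(path) and os.access(path, os.X_OK):
                return path

    # 2. Fall back to the system PATH.
    for candidate in candidates:
        found = shutil.which(candidate)
        if found:
            return found

    raise FileNotFoundError(
        f"could not find any of {candidates} in {list(install_dirs)} or PATH"
    )
```

A caller would pass the two install roots in priority order, for example `find_tool("llvm-objcopy", [peano_dir, mlir_aie_dir])`, where both variables are placeholders for however the surrounding code resolves those directories.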
📊 Test Results for Test Example Applications: dd8281d (2026_03_26_01_30_35), IRONCLAD
📈 Trends (vs main branch) for Test Example Applications: dd8281d (2026_03_26_01_30_35), IRONCLAD
[Trend charts, one panel per benchmark: llama_3.2_1b, llama_3.2_1b_prompt_1024_tokens_1, llama_3.2_1b_prompt_1024_tokens_40, llama_3.2_1b_prompt_13_tokens_1, llama_3.2_1b_prompt_13_tokens_40, llama_3.2_1b_prompt_2048_tokens_1, llama_3.2_1b_prompt_2048_tokens_40]
📊 Test Results for Small Benchmark/Test Suite: dd8281d (2026_03_26_01_39_48), IRONCLAD
📈 Trends (vs main branch) for Small Benchmark/Test Suite: dd8281d (2026_03_26_01_39_48), IRONCLAD
[Trend charts, one panel per benchmark: axpy, dequant, eltwise_add, eltwise_mul, gelu, gemm, layer_norm, matrix_vector_mul, mem_copy, mha, relu, rms_norm, rope, sigmoid, silu, softmax, swiglu, tanh, transpose, and weighted_rms_norm configurations, plus parameter-keyed entries (M_*, input_length_*, rows_*, seq_len_*, embedding_dim_*).]
No metrics available for: embedding_dim_2048-hidden_dim_2048, seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0, seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False, swiglu, swiglu_decode_1x2048x2048, swiglu_prefill_256x2048x2048.
Two bugs:
1. mha_mha.o: clang crashes when softmax.cc is passed via -include because
__AIE_ARCH__ is not yet defined at preamble processing time. Fix by
adding #include "softmax.cc" directly in mha.cc and removing the -include
flag from op.py (softmax.cc added as a proper source dependency instead).
2. Fused operator link_with mismatch: FusedMLIROperator prefixes kernel .o
filenames with op{idx}_ but operator design files passed the un-prefixed
name as the link_with attribute in Kernel(), causing aiecc to look for
files that don't exist. Fix by applying func_prefix to the .o filename in
Kernel() calls across all affected operators (rms_norm, silu, elementwise_add,
elementwise_mul, softmax, transpose, rope, gemv).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
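The naming mismatch in item 2 comes down to applying the same prefix to the object filename that the fused operator applies when it emits the file. A minimal sketch, assuming a hypothetical helper `prefixed_object_name` and illustrative filenames such as `rms_norm.o` (neither is taken from the PR's code):

```python
from pathlib import PurePosixPath


def prefixed_object_name(object_path: str, func_prefix: str = "") -> str:
    """Prepend `func_prefix` (e.g. "op2_") to the basename of a kernel object file.

    With an empty prefix the path is returned unchanged, so standalone
    (non-fused) operators keep their original object names.
    """
    path = PurePosixPath(object_path)
    return str(path.with_name(f"{func_prefix}{path.name}"))


# Hypothetical usage: the name handed to the kernel declaration must match
# the file the fused operator actually wrote to disk.
print(prefixed_object_name("rms_norm.o", "op2_"))
print(prefixed_object_name("rms_norm.o"))
```

Routing every operator's object filename through one helper like this keeps the prefix logic in a single place, instead of repeating string concatenation in each design file.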
📊 Test Results for Test Example Applications: 8c138a8 (2026_03_26_14_51_15), IRONCLAD
📈 Trends (vs main branch) for Test Example Applications: 8c138a8 (2026_03_26_14_51_15), IRONCLAD
[Trend charts, one panel per benchmark: llama_3.2_1b, llama_3.2_1b_prompt_1024_tokens_1, llama_3.2_1b_prompt_1024_tokens_40, llama_3.2_1b_prompt_13_tokens_1, llama_3.2_1b_prompt_13_tokens_40, llama_3.2_1b_prompt_2048_tokens_1, llama_3.2_1b_prompt_2048_tokens_40]
📊 Test Results for Small Benchmark/Test Suite: 8c138a8 (2026_03_26_15_00_25), IRONCLAD
📈 Trends (vs main branch) for Small Benchmark/Test Suite: 8c138a8 (2026_03_26_15_00_25), IRONCLAD
[Trend charts, one panel per benchmark: axpy, dequant, eltwise_add, eltwise_mul, gelu, gemm, layer_norm, matrix_vector_mul, mem_copy, mha, relu, rms_norm, rope, sigmoid, silu, softmax, swiglu, tanh, transpose, and weighted_rms_norm configurations, plus parameter-keyed entries (M_*, input_length_*, rows_*, seq_len_*, embedding_dim_*).]
No metrics available for: embedding_dim_2048-hidden_dim_2048, seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0, seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False, swiglu, swiglu_decode_1x2048x2048, swiglu_prefill_256x2048x2048.
📊 Test Results for Test Example Applications: 14eee74 (2026_03_26_15_11_20), IRONCLAD
📈 Trends (vs main branch) for Test Example Applications: 14eee74 (2026_03_26_15_11_20), IRONCLAD
[Trend charts, one panel per benchmark: llama_3.2_1b, llama_3.2_1b_prompt_1024_tokens_1, llama_3.2_1b_prompt_1024_tokens_40, llama_3.2_1b_prompt_13_tokens_1, llama_3.2_1b_prompt_13_tokens_40, llama_3.2_1b_prompt_2048_tokens_1, llama_3.2_1b_prompt_2048_tokens_40]
📊 Test Results for Small Benchmark/Test Suite: 14eee74 (2026_03_26_15_37_43), IRONCLAD
📈 Trends (vs main branch) for Small Benchmark/Test Suite: 14eee74 (2026_03_26_15_37_43), IRONCLAD
M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128
M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0
M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0
M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0
M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0
M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048
M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024
M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512
M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256
M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8
M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8
M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0
M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024
M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0
axpy_1_cols_2_channels_2048_tile_2048_3.0
axpy_1_cols_2_channels_2048_tile_2048_3.0_0
axpy_2_cols_2_channels_2048_tile_1024_3.0
axpy_2_cols_2_channels_2048_tile_1024_3.0_0
axpy_4_cols_2_channels_2048_tile_512_3.0
axpy_4_cols_2_channels_2048_tile_512_3.0_0
axpy_8_cols_2_channels_2048_tile_256_3.0
axpy_8_cols_2_channels_2048_tile_256_3.0_0
dequant_1_cols_1_channels_2048_tile_2048
dequant_1_cols_1_channels_2048_tile_2048_0
dequant_1_cols_2_channels_2048_tile_1024
dequant_1_cols_2_channels_2048_tile_1024_0
dequant_2_cols_1_channels_2048_tile_1024
dequant_2_cols_1_channels_2048_tile_1024_0
dequant_2_cols_2_channels_2048_tile_512
dequant_2_cols_2_channels_2048_tile_512_0
dequant_4_cols_1_channels_2048_tile_512
dequant_4_cols_1_channels_2048_tile_512_0
dequant_4_cols_2_channels_2048_tile_256
dequant_4_cols_2_channels_2048_tile_256_0
dequant_8_cols_1_channels_2048_tile_256
dequant_8_cols_1_channels_2048_tile_256_0
dequant_8_cols_2_channels_2048_tile_128
dequant_8_cols_2_channels_2048_tile_128_0
eltwise_add_1_cols_2_channels_2048_tile_2048
eltwise_add_2_cols_2_channels_2048_tile_1024
eltwise_add_4_cols_2_channels_2048_tile_512
eltwise_add_8_cols_2_channels_2048_tile_256
eltwise_mul_1_cols_2_channels_2048_tile_2048
eltwise_mul_2_cols_2_channels_2048_tile_1024
eltwise_mul_4_cols_2_channels_2048_tile_512
eltwise_mul_8_cols_2_channels_2048_tile_256
embedding_dim_2048-hidden_dim_2048: No metrics available.
gelu_1_cols_1_channels_2048_tile_2048
gelu_1_cols_2_channels_2048_tile_1024
gelu_2_cols_1_channels_2048_tile_1024
gelu_2_cols_2_channels_2048_tile_512
gelu_4_cols_1_channels_2048_tile_512
gelu_4_cols_2_channels_2048_tile_256
gelu_8_cols_1_channels_2048_tile_256
gelu_8_cols_2_channels_2048_tile_128
gemm_1792x896x1152_64x32x48_8cols_ccolmaj
gemm_192x384x64_48x96x16_4cols
gemm_192x384x64_48x96x16_4cols_bcolmaj_ccolmaj
gemm_2048x2048x2048_64x64x32_8_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x32_8_cols_0_bcolmaj_1_ccolmaj_0
gemm_2048x2048x2048_64x64x32_8_cols_1_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_1cols
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_0_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_1_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_1_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_2_cols_1_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_1_bcolmaj_0_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_2cols_bcolmaj
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_1_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_8_cols_1_bcolmaj_0_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_8cols_bcolmaj_ccolmaj
gemm_384x1536x1792_32x48x64_4cols_bcolmaj
gemm_896x1792x640_32x64x80_8cols_ccolmaj
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_2048-scalar_factor_3.0
input_length_2048-num_aie_columns_1-tile_size_2048
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_1024-scalar_factor_3.0
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False
input_length_2048-num_aie_columns_2-tile_size_1024
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_512-scalar_factor_3.0
input_length_2048-num_aie_columns_4-tile_size_512
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_256-scalar_factor_3.0
input_length_2048-num_aie_columns_8-tile_size_256
input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048
input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128
input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024
input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024
input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512
input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512
input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256
input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512
input_length_32768-num_aie_columns_4-num_channels_4-tile_size_2048
layer_norm_1_cols_1_channels_2048_tile_2048
layer_norm_1_cols_2_channels_2048_tile_1024
layer_norm_2_cols_1_channels_2048_tile_1024
layer_norm_2_cols_2_channels_2048_tile_512
layer_norm_4_cols_1_channels_2048_tile_512
layer_norm_4_cols_2_channels_2048_tile_256
layer_norm_8_cols_1_channels_2048_tile_256
layer_norm_8_cols_2_channels_2048_tile_128
matrix_vector_mul_128x128_32_1col
matrix_vector_mul_128x128_32_1col0
matrix_vector_mul_128x128_32tsi_128tso_1col
matrix_vector_mul_128x128_32tsi_128tso_1col0
matrix_vector_mul_2048x8192_1_1col
matrix_vector_mul_2048x8192_1_1col0
matrix_vector_mul_2048x8192_1_2col
matrix_vector_mul_2048x8192_1_2col0
matrix_vector_mul_2048x8192_1_4col
matrix_vector_mul_2048x8192_1_4col0
matrix_vector_mul_2048x8192_1_8col
matrix_vector_mul_2048x8192_1_8col0
matrix_vector_mul_2048x8192_1tsi_1024tso_2col
matrix_vector_mul_2048x8192_1tsi_1024tso_2col0
matrix_vector_mul_2048x8192_1tsi_2048tso_1col
matrix_vector_mul_2048x8192_1tsi_2048tso_1col0
matrix_vector_mul_2048x8192_1tsi_256tso_8col
matrix_vector_mul_2048x8192_1tsi_256tso_8col0
matrix_vector_mul_2048x8192_1tsi_512tso_4col
matrix_vector_mul_2048x8192_1tsi_512tso_4col0
matrix_vector_mul_8192x2048_4_1col
matrix_vector_mul_8192x2048_4_1col0
matrix_vector_mul_8192x2048_4_2col
matrix_vector_mul_8192x2048_4_2col0
matrix_vector_mul_8192x2048_4_4col
matrix_vector_mul_8192x2048_4_4col0
matrix_vector_mul_8192x2048_4_8col
matrix_vector_mul_8192x2048_4_8col0
matrix_vector_mul_8192x2048_4tsi_1024tso_1col
matrix_vector_mul_8192x2048_4tsi_1024tso_1col0
matrix_vector_mul_8192x2048_4tsi_1024tso_2col
matrix_vector_mul_8192x2048_4tsi_1024tso_2col0
matrix_vector_mul_8192x2048_4tsi_1024tso_4col
matrix_vector_mul_8192x2048_4tsi_1024tso_4col0
matrix_vector_mul_8192x2048_4tsi_1024tso_8col
matrix_vector_mul_8192x2048_4tsi_1024tso_8col0
mem_copy_16_cores_2_chans_2048_tile_128_False
mem_copy_16_cores_2_chans_2048_tile_128_False0
mem_copy_1_cols_1_channels_2048_tile_2048
mem_copy_1_cols_2_channels_2048_tile_1024
mem_copy_1_cores_1_chans_2048_tile_2048_False
mem_copy_1_cores_1_chans_2048_tile_2048_False0
mem_copy_2_cols_1_channels_2048_tile_1024
mem_copy_2_cols_2_channels_2048_tile_512
mem_copy_2_cores_1_chans_2048_tile_1024_False
mem_copy_2_cores_1_chans_2048_tile_1024_False0
mem_copy_2_cores_2_chans_2048_tile_1024_False
mem_copy_2_cores_2_chans_2048_tile_1024_False0
mem_copy_4_cols_1_channels_2048_tile_512
mem_copy_4_cols_2_channels_2048_tile_256
mem_copy_4_cores_1_chans_2048_tile_512_False
mem_copy_4_cores_1_chans_2048_tile_512_False0
mem_copy_4_cores_2_chans_2048_tile_512_False
mem_copy_4_cores_2_chans_2048_tile_512_False0
mem_copy_8_cols_1_channels_2048_tile_256
mem_copy_8_cols_2_channels_2048_tile_128
mem_copy_8_cores_1_chans_2048_tile_256_False
mem_copy_8_cores_1_chans_2048_tile_256_False0
mem_copy_8_cores_2_chans_2048_tile_256_False
mem_copy_8_cores_2_chans_2048_tile_256_False0
mha
mha0
mha_16384_64_1_8_0_0
relu_1_cols_1_channels_2048_tile_2048
relu_2_cols_1_channels_2048_tile_1024
relu_4_cols_1_channels_2048_tile_512
relu_8_cols_1_channels_2048_tile_256
rms_norm_1_cols_1_channels_2048_tile_2048
rms_norm_1_cols_2_channels_2048_tile_1024
rms_norm_2_cols_1_channels_2048_tile_1024
rms_norm_2_cols_2_channels_2048_tile_512
rms_norm_4_cols_1_channels_2048_tile_512
rms_norm_4_cols_2_channels_2048_tile_256
rms_norm_8_cols_1_channels_2048_tile_256
rms_norm_8_cols_2_channels_2048_tile_128
rope_1_cols_2_channels_4096_tile_4096_0
rope_1c_32rows_512cols_32arows_0m
rope_1c_32rows_512cols_8arows_0m
rope_2_cols_2_channels_4096_tile_2048_0
rope_2c_32rows_512cols_32arows_0m
rope_2c_32rows_512cols_8arows_0m
rope_4_cols_2_channels_4096_tile_1024_0
rope_8_cols_2_channels_4096_tile_512_0
rope_8c_32rows_512cols_32arows_0m
rope_8c_32rows_512cols_8arows_0m
rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0
rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0
rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0
seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0: No metrics available.
seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False: No metrics available.
sigmoid_1_cols_1_channels_2048_tile_2048
sigmoid_2_cols_1_channels_2048_tile_1024
sigmoid_4_cols_1_channels_2048_tile_512
sigmoid_8_cols_1_channels_2048_tile_256
silu_1_cols_1_channels_2048_tile_2048
silu_2_cols_1_channels_2048_tile_1024
silu_4_cols_1_channels_2048_tile_512
silu_8_cols_1_channels_2048_tile_256
softmax_1_cols_2_channels_4096_tile_2048
softmax_2_cols_2_channels_32768_tile_1024
softmax_2_cols_2_channels_32768_tile_512
softmax_2_cols_2_channels_4096_tile_1024
softmax_2_cols_2_channels_4096_tile_512
softmax_4_cols_4_channels_32768_tile_2048
swiglu: No metrics available.
swiglu_decode_1x2048x2048: No metrics available.
swiglu_decode_1x2048x2048_0
swiglu_prefill_256x2048x2048: No metrics available.
tanh_1_cols_1_channels_2048_tile_2048
tanh_2_cols_1_channels_2048_tile_1024
tanh_4_cols_1_channels_2048_tile_512
tanh_8_cols_1_channels_2048_tile_256
transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s0
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s0
weighted_rms_norm_1_cols_2_channels_2048_weights_2048
weighted_rms_norm_2_cols_2_channels_2048_weights_1024
weighted_rms_norm_4_cols_2_channels_2048_weights_512
weighted_rms_norm_8_cols_2_channels_2048_weights_256
- mha.cc: include mm.cc once (col-major via -DB_COL_MAJ flag) and define zero_bf16_rowmaj and matmul_bf16_bf16_rowmaj directly in the same TU using the matmul_vectorized_2x2_mmul template with b_row_maj=true, eliminating transitive cross-object symbol references
- op.py: replace mha_mm.o, mha_mm_rowmaj.o, mha_mha.o with single mha.o compiled with mm col-major flags and all three source dependencies
- design.py: update all Kernel references from mha_mm.o/mha_mha.o to mha.o
- compilation/base.py: relax PeanoCompilationRule to allow multiple source dependencies per KernelObjectArtifact (>= 1 instead of == 1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
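For readers skimming the commit, the compilation/base.py change can be pictured roughly as below. This is only a sketch under assumptions: the two dataclasses and the rule_matches helper are hypothetical stand-ins rather than the real classes in the repository, and only the two source files named in the commit message are listed (the third dependency is not named there, so it is left out).

```python
from dataclasses import dataclass, field


@dataclass
class SourceArtifact:
    """Hypothetical stand-in for a source file node in the build graph."""
    filename: str


@dataclass
class KernelObjectArtifact:
    """Hypothetical stand-in for a compiled kernel object and its inputs."""
    filename: str
    dependencies: list = field(default_factory=list)


def rule_matches(artifact: KernelObjectArtifact) -> bool:
    """Return True when the artifact has at least one source dependency.

    The stricter check described in the commit would be
    len(sources) == 1; the relaxed check is len(sources) >= 1.
    """
    sources = [d for d in artifact.dependencies if isinstance(d, SourceArtifact)]
    return len(sources) >= 1


# A single object built from several sources now satisfies the rule.
mha = KernelObjectArtifact(
    "mha.o",
    [SourceArtifact("mha.cc"), SourceArtifact("mm.cc")],
)
no_sources = KernelObjectArtifact("empty.o")
```

With the relaxed check, rule_matches(mha) accepts the multi-source object, while an artifact with no source dependencies is still rejected.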
📊 Test Results for Small Benchmark/Test Suite
8728b32 (2026_03_26_21_31_35) IRONCLAD
Tested on
📈 Trends (vs main branch) for Small Benchmark/Test Suite
8728b32 (2026_03_26_21_31_35) IRONCLAD Trends
M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128
M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0
M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0
M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0
M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0
M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048
M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024
M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512
M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256
M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8
M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8
M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0
M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024
M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0
axpy_1_cols_2_channels_2048_tile_2048_3.0
axpy_1_cols_2_channels_2048_tile_2048_3.0_0
axpy_2_cols_2_channels_2048_tile_1024_3.0
axpy_2_cols_2_channels_2048_tile_1024_3.0_0
axpy_4_cols_2_channels_2048_tile_512_3.0
axpy_4_cols_2_channels_2048_tile_512_3.0_0
axpy_8_cols_2_channels_2048_tile_256_3.0
axpy_8_cols_2_channels_2048_tile_256_3.0_0
dequant_1_cols_1_channels_2048_tile_2048
dequant_1_cols_1_channels_2048_tile_2048_0
dequant_1_cols_2_channels_2048_tile_1024
dequant_1_cols_2_channels_2048_tile_1024_0
dequant_2_cols_1_channels_2048_tile_1024
dequant_2_cols_1_channels_2048_tile_1024_0
dequant_2_cols_2_channels_2048_tile_512
dequant_2_cols_2_channels_2048_tile_512_0
dequant_4_cols_1_channels_2048_tile_512
dequant_4_cols_1_channels_2048_tile_512_0
dequant_4_cols_2_channels_2048_tile_256
dequant_4_cols_2_channels_2048_tile_256_0
dequant_8_cols_1_channels_2048_tile_256
dequant_8_cols_1_channels_2048_tile_256_0
dequant_8_cols_2_channels_2048_tile_128
dequant_8_cols_2_channels_2048_tile_128_0
eltwise_add_1_cols_2_channels_2048_tile_2048
eltwise_add_2_cols_2_channels_2048_tile_1024
eltwise_add_4_cols_2_channels_2048_tile_512
eltwise_add_8_cols_2_channels_2048_tile_256
eltwise_mul_1_cols_2_channels_2048_tile_2048
eltwise_mul_2_cols_2_channels_2048_tile_1024
eltwise_mul_4_cols_2_channels_2048_tile_512
eltwise_mul_8_cols_2_channels_2048_tile_256
embedding_dim_2048-hidden_dim_2048: No metrics available.
gelu_1_cols_1_channels_2048_tile_2048
gelu_1_cols_2_channels_2048_tile_1024
gelu_2_cols_1_channels_2048_tile_1024
gelu_2_cols_2_channels_2048_tile_512
gelu_4_cols_1_channels_2048_tile_512
gelu_4_cols_2_channels_2048_tile_256
gelu_8_cols_1_channels_2048_tile_256
gelu_8_cols_2_channels_2048_tile_128
gemm_1792x896x1152_64x32x48_8cols_ccolmaj
gemm_192x384x64_48x96x16_4cols
gemm_192x384x64_48x96x16_4cols_bcolmaj_ccolmaj
gemm_2048x2048x2048_64x64x32_8_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x32_8_cols_0_bcolmaj_1_ccolmaj_0
gemm_2048x2048x2048_64x64x32_8_cols_1_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_1cols
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_0_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_1_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_1_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_2_cols_1_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_1_bcolmaj_0_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_2cols_bcolmaj
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_1_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_8_cols_1_bcolmaj_0_ccolmaj_0_0
gemm_2048x2048x2048_64x64x64_8cols_bcolmaj_ccolmaj
gemm_384x1536x1792_32x48x64_4cols_bcolmaj
gemm_896x1792x640_32x64x80_8cols_ccolmaj
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_2048-scalar_factor_3.0
input_length_2048-num_aie_columns_1-tile_size_2048
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_1024-scalar_factor_3.0
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False
input_length_2048-num_aie_columns_2-tile_size_1024
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_512-scalar_factor_3.0
input_length_2048-num_aie_columns_4-tile_size_512
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_256-scalar_factor_3.0
input_length_2048-num_aie_columns_8-tile_size_256
input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048
input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128
input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024
input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024
input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512
input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512
input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256
input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512
input_length_32768-num_aie_columns_4-num_channels_4-tile_size_2048
layer_norm_1_cols_1_channels_2048_tile_2048
layer_norm_1_cols_2_channels_2048_tile_1024
layer_norm_2_cols_1_channels_2048_tile_1024
layer_norm_2_cols_2_channels_2048_tile_512
layer_norm_4_cols_1_channels_2048_tile_512
layer_norm_4_cols_2_channels_2048_tile_256
layer_norm_8_cols_1_channels_2048_tile_256
layer_norm_8_cols_2_channels_2048_tile_128
matrix_vector_mul_128x128_32_1col
matrix_vector_mul_128x128_32_1col0
matrix_vector_mul_128x128_32tsi_128tso_1col
matrix_vector_mul_128x128_32tsi_128tso_1col0
matrix_vector_mul_2048x8192_1_1col
matrix_vector_mul_2048x8192_1_1col0
matrix_vector_mul_2048x8192_1_2col
matrix_vector_mul_2048x8192_1_2col0
matrix_vector_mul_2048x8192_1_4col
matrix_vector_mul_2048x8192_1_4col0
matrix_vector_mul_2048x8192_1_8col
matrix_vector_mul_2048x8192_1_8col0
matrix_vector_mul_2048x8192_1tsi_1024tso_2col
matrix_vector_mul_2048x8192_1tsi_1024tso_2col0
matrix_vector_mul_2048x8192_1tsi_2048tso_1col
matrix_vector_mul_2048x8192_1tsi_2048tso_1col0
matrix_vector_mul_2048x8192_1tsi_256tso_8col
matrix_vector_mul_2048x8192_1tsi_256tso_8col0
matrix_vector_mul_2048x8192_1tsi_512tso_4col
matrix_vector_mul_2048x8192_1tsi_512tso_4col0
matrix_vector_mul_8192x2048_4_1col
matrix_vector_mul_8192x2048_4_1col0
matrix_vector_mul_8192x2048_4_2col
matrix_vector_mul_8192x2048_4_2col0
matrix_vector_mul_8192x2048_4_4col
matrix_vector_mul_8192x2048_4_4col0
matrix_vector_mul_8192x2048_4_8col
matrix_vector_mul_8192x2048_4_8col0
matrix_vector_mul_8192x2048_4tsi_1024tso_1col
matrix_vector_mul_8192x2048_4tsi_1024tso_1col0
matrix_vector_mul_8192x2048_4tsi_1024tso_2col
matrix_vector_mul_8192x2048_4tsi_1024tso_2col0
matrix_vector_mul_8192x2048_4tsi_1024tso_4col
matrix_vector_mul_8192x2048_4tsi_1024tso_4col0
matrix_vector_mul_8192x2048_4tsi_1024tso_8col
matrix_vector_mul_8192x2048_4tsi_1024tso_8col0
mem_copy_16_cores_2_chans_2048_tile_128_False
mem_copy_16_cores_2_chans_2048_tile_128_False0
mem_copy_1_cols_1_channels_2048_tile_2048
mem_copy_1_cols_2_channels_2048_tile_1024
mem_copy_1_cores_1_chans_2048_tile_2048_False
mem_copy_1_cores_1_chans_2048_tile_2048_False0
mem_copy_2_cols_1_channels_2048_tile_1024
mem_copy_2_cols_2_channels_2048_tile_512
mem_copy_2_cores_1_chans_2048_tile_1024_False
mem_copy_2_cores_1_chans_2048_tile_1024_False0
mem_copy_2_cores_2_chans_2048_tile_1024_False
mem_copy_2_cores_2_chans_2048_tile_1024_False0
mem_copy_4_cols_1_channels_2048_tile_512
mem_copy_4_cols_2_channels_2048_tile_256
mem_copy_4_cores_1_chans_2048_tile_512_False
mem_copy_4_cores_1_chans_2048_tile_512_False0
mem_copy_4_cores_2_chans_2048_tile_512_False
mem_copy_4_cores_2_chans_2048_tile_512_False0
mem_copy_8_cols_1_channels_2048_tile_256
mem_copy_8_cols_2_channels_2048_tile_128
mem_copy_8_cores_1_chans_2048_tile_256_False
mem_copy_8_cores_1_chans_2048_tile_256_False0
mem_copy_8_cores_2_chans_2048_tile_256_False
mem_copy_8_cores_2_chans_2048_tile_256_False0
mha
mha0
mha_16384_64_1_8_0_0
relu_1_cols_1_channels_2048_tile_2048
relu_2_cols_1_channels_2048_tile_1024
relu_4_cols_1_channels_2048_tile_512
relu_8_cols_1_channels_2048_tile_256
rms_norm_1_cols_1_channels_2048_tile_2048
rms_norm_1_cols_2_channels_2048_tile_1024
rms_norm_2_cols_1_channels_2048_tile_1024
rms_norm_2_cols_2_channels_2048_tile_512
rms_norm_4_cols_1_channels_2048_tile_512
rms_norm_4_cols_2_channels_2048_tile_256
rms_norm_8_cols_1_channels_2048_tile_256
rms_norm_8_cols_2_channels_2048_tile_128
rope_1_cols_2_channels_4096_tile_4096_0
rope_1c_32rows_512cols_32arows_0m
rope_1c_32rows_512cols_8arows_0m
rope_2_cols_2_channels_4096_tile_2048_0
rope_2c_32rows_512cols_32arows_0m
rope_2c_32rows_512cols_8arows_0m
rope_4_cols_2_channels_4096_tile_1024_0
rope_8_cols_2_channels_4096_tile_512_0
rope_8c_32rows_512cols_32arows_0m
rope_8c_32rows_512cols_8arows_0m
rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0
rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0
rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0
seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0
seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False: No metrics available.
sigmoid_1_cols_1_channels_2048_tile_2048
sigmoid_2_cols_1_channels_2048_tile_1024
sigmoid_4_cols_1_channels_2048_tile_512
sigmoid_8_cols_1_channels_2048_tile_256
silu_1_cols_1_channels_2048_tile_2048
silu_2_cols_1_channels_2048_tile_1024
silu_4_cols_1_channels_2048_tile_512
silu_8_cols_1_channels_2048_tile_256
softmax_1_cols_2_channels_4096_tile_2048
softmax_2_cols_2_channels_32768_tile_1024
softmax_2_cols_2_channels_32768_tile_512
softmax_2_cols_2_channels_4096_tile_1024
softmax_2_cols_2_channels_4096_tile_512
softmax_4_cols_4_channels_32768_tile_2048
swiglu: No metrics available.
swiglu_decode_1x2048x2048: No metrics available.
swiglu_decode_1x2048x2048_0
swiglu_prefill_256x2048x2048: No metrics available.
tanh_1_cols_1_channels_2048_tile_2048
tanh_2_cols_1_channels_2048_tile_1024
tanh_4_cols_1_channels_2048_tile_512
tanh_8_cols_1_channels_2048_tile_256
transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s0
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s0
weighted_rms_norm_1_cols_2_channels_2048_weights_2048
weighted_rms_norm_2_cols_2_channels_2048_weights_1024
weighted_rms_norm_4_cols_2_channels_2048_weights_512
weighted_rms_norm_8_cols_2_channels_2048_weights_256
📊 Test Results for Test Example Applications
8728b32 (2026_03_26_21_36_45) IRONCLAD
Tested on
📈 Trends (vs main branch) for Test Example Applications
8728b32 (2026_03_26_21_36_45) IRONCLAD Trends
llama_3.2_1b
llama_3.2_1b_prompt_1024_tokens_1
llama_3.2_1b_prompt_1024_tokens_40
llama_3.2_1b_prompt_13_tokens_1
llama_3.2_1b_prompt_13_tokens_40
llama_3.2_1b_prompt_2048_tokens_1
llama_3.2_1b_prompt_2048_tokens_40
Integration of iron.tensors, mlir-aie runtime helpers, and the new form of link_with, mostly. A bunch of small code quality fixes too.
PR Merge Checklist
devel commit and pointing to devel.