Add bf16 Lhs/Rhs packed gemm kernels and packing functions (!133)

Gunes Bayir requested to merge gunes/bf16_interleaved_gemm into main Sep 30, 2024

This commit adds:

- A bf16 x bf16 = fp32 matmul microkernel with an 8x12 output block size
- Lhs/Rhs packing functions that pack the inputs and convert them from fp32 to bf16
- Corresponding tests, along with modifications to the testing framework and the reference implementation
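
For readers unfamiliar with the numerics, the following is a minimal sketch of what "fp32 inputs converted to bf16, accumulated in fp32" means. It is not the code from this merge request: the function names are hypothetical, the round-to-nearest-even conversion is an assumption about the rounding mode, and the 8x12 blocking and packed layout of the microkernel are deliberately not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a KleidiAI function): convert fp32 to bf16 by
 * keeping the top 16 bits, using round-to-nearest-even (an assumption). */
static uint16_t f32_to_bf16(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: truncate and force a quiet-NaN mantissa bit. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }
    /* Add 0x7fff plus the lowest kept bit so that ties round to even. */
    uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding) >> 16);
}

/* Hypothetical helper: widen bf16 back to fp32 (exact, no rounding). */
static float bf16_to_f32(uint16_t value) {
    uint32_t bits = (uint32_t)value << 16;
    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}

/* Hypothetical reference: dst[m x n] = lhs[m x k] * rhs[k x n], row-major.
 * Each input is rounded to bf16 first; products accumulate in fp32. */
static void ref_matmul_bf16_f32(size_t m, size_t n, size_t k,
                                const float *lhs, const float *rhs,
                                float *dst) {
    assert(lhs != NULL && rhs != NULL && dst != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                float a = bf16_to_f32(f32_to_bf16(lhs[i * k + p]));
                float b = bf16_to_f32(f32_to_bf16(rhs[p * n + j]));
                acc += a * b;
            }
            dst[i * n + j] = acc;
        }
    }
}
```

A reference of this shape is mainly useful for testing: an optimized kernel's output can be compared against it within a tolerance, since a real kernel may sum the products in a different order.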

Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>

Edited Oct 10, 2024 by Gunes Bayir