Optimize F32 <- QSI8D32 (LHS) x QSI4C32 (RHS) for SME (!184) · Merge requests · Kleidi / KleidiAI

GEMM and GEMV micro-kernels that compute the matrix multiplication of a dynamically quantized, symmetric, signed 8-bit integer LHS matrix with per-block quantization (QSI8D32) and a quantized, symmetric, signed 4-bit integer RHS matrix with per-block quantization (QSI4C32), accumulating the result into a single-precision (F32) output. The micro-kernels are optimized for SME2.
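To make the arithmetic concrete, here is a minimal scalar C sketch of what such micro-kernels compute. It is not the SME2 implementation and not a KleidiAI API. The block length of 32 (inferred from the "d32"/"c32" suffixes), the quantized ranges [-127, 127] and [-7, 7], the F32 scales, the unpacked one-value-per-byte RHS layout, and the names `quant_sym_blocks` and `matmul_f32_qsi8d32_qsi4c32_ref` are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: block length of 32 along K, inferred from the "d32"/"c32" suffixes. */
#define BLOCK_LEN 32

/* Hypothetical helper: symmetric per-block quantization of one row of length k
 * (a multiple of BLOCK_LEN). qmax = 127 gives QSI8D32-style LHS values and
 * qmax = 7 gives QSI4C32-style RHS values. One 4-bit value is stored per int8_t
 * here for clarity; packed layouts usually store two values per byte. */
static void quant_sym_blocks(const float *src, size_t k, int qmax,
                             int8_t *dst, float *scales) {
    for (size_t b = 0; b < k / BLOCK_LEN; ++b) {
        const float *blk = src + b * BLOCK_LEN;
        float amax = 0.0f;
        for (size_t i = 0; i < BLOCK_LEN; ++i) {
            const float a = blk[i] < 0.0f ? -blk[i] : blk[i];
            if (a > amax) amax = a;
        }
        /* Symmetric: zero point is 0, [-amax, amax] maps onto [-qmax, qmax]. */
        const float scale = amax / (float)qmax;
        const float inv = scale != 0.0f ? 1.0f / scale : 0.0f;
        for (size_t i = 0; i < BLOCK_LEN; ++i) {
            const float x = blk[i] * inv;
            /* Round half away from zero, then clamp against float round-off. */
            int v = (int)(x >= 0.0f ? x + 0.5f : x - 0.5f);
            if (v > qmax) v = qmax;
            if (v < -qmax) v = -qmax;
            dst[b * BLOCK_LEN + i] = (int8_t)v;
        }
        scales[b] = scale;
    }
}

/* Hypothetical scalar reference for F32 <- QSI8D32 (LHS, M x K) x QSI4C32
 * (RHS, stored transposed as N x K). Each block's integer dot product is
 * multiplied by the LHS and RHS block scales and accumulated in F32.
 * GEMV is the m == 1 case. */
static void matmul_f32_qsi8d32_qsi4c32_ref(size_t m, size_t n, size_t k,
                                           const int8_t *lhs_q, const float *lhs_scales,
                                           const int8_t *rhs_q, const float *rhs_scales,
                                           float *dst) {
    const size_t nb = k / BLOCK_LEN;
    for (size_t row = 0; row < m; ++row) {
        for (size_t col = 0; col < n; ++col) {
            float acc = 0.0f;
            for (size_t b = 0; b < nb; ++b) {
                int32_t iacc = 0; /* integer accumulation inside one block */
                for (size_t i = 0; i < BLOCK_LEN; ++i) {
                    const size_t kk = b * BLOCK_LEN + i;
                    iacc += (int32_t)lhs_q[row * k + kk] * (int32_t)rhs_q[col * k + kk];
                }
                acc += (float)iacc * lhs_scales[row * nb + b] * rhs_scales[col * nb + b];
            }
            dst[row * n + col] = acc;
        }
    }
}
```

Keeping the accumulation in int32 within each block and applying the two scales only once per block confines the inner loop to 8-bit by 4-bit integer multiply-accumulates, which is the part an optimized SME2 kernel would be expected to vectorize.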

Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
Signed-off-by: Anitha Raj <anitha.raj@arm.com>

Edited Nov 26, 2024 by Anitha Raj