Optimize F32 <- QAI8DXP (LHS) x QSI4CXP (RHS) for SME (!153) · Kleidi / KleidiAI

Anitha Raj requested to merge qsi4cx_sme into main Oct 21, 2024

GEMM and GEMV micro-kernels that compute the matrix multiplication of a dynamically quantized, asymmetric 8-bit integer (QAI8DX) LHS matrix and a quantized, symmetric 4-bit integer (QSI4CX) RHS matrix, and accumulate the result into a single-precision (F32) output. The micro-kernels are optimized for SME2.
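To make the arithmetic concrete, here is a minimal scalar reference sketch of F32 <- QAI8DX x QSI4CX. It is not the SME2 micro-kernel code and not the KleidiAI API. The function names (`quantize_row_qai8`, `pack_s4`, `unpack_s4`, `matmul_f32_qai8dx_qsi4cx`) are hypothetical. The sketch also assumes a zero-point convention of x ≈ scale * (q - zero_point) for the LHS, one F32 scale per RHS output channel, and a low-nibble-first int4 layout. It does not model the packed (QAI8DXP / QSI4CXP) layouts named in the title. GEMV is the M = 1 case.

```python
def quantize_row_qai8(row):
    """Dynamically quantize one F32 row to asymmetric int8: x ~= scale * (q - zero_point)."""
    lo, hi = min(min(row), 0.0), max(max(row), 0.0)  # widen range so 0.0 is representable
    scale = (hi - lo) / 255.0 or 1.0                  # fall back to 1.0 for an all-zero row
    zero_point = max(-128, min(127, round(-128 - lo / scale)))
    q = [max(-128, min(127, round(x / scale) + zero_point)) for x in row]
    return q, scale, zero_point


def pack_s4(values):
    """Pack signed 4-bit values two per byte, low nibble first (hypothetical layout)."""
    if any(not -8 <= v <= 7 for v in values):
        raise ValueError("int4 values must lie in [-8, 7]")
    vals = list(values) + [0] * (len(values) % 2)     # pad odd K with a zero nibble
    return bytes((vals[i] & 0x0F) | ((vals[i + 1] & 0x0F) << 4)
                 for i in range(0, len(vals), 2))


def unpack_s4(packed, k):
    """Inverse of pack_s4: sign-extend each nibble and return the first k values."""
    out = []
    for byte in packed:
        for nib in (byte & 0x0F, byte >> 4):
            out.append(nib - 16 if nib & 0x08 else nib)
    return out[:k]


def matmul_f32_qai8dx_qsi4cx(lhs_f32, rhs_packed, rhs_scales, k):
    """Scalar reference: F32 (M x N) <- dynamic QAI8 LHS (M x K) x QSI4 RHS (N rows of K)."""
    rhs = [unpack_s4(r, k) for r in rhs_packed]
    rhs_sums = [sum(w) for w in rhs]                  # per-channel sums, reused for every row
    dst = []
    for row in lhs_f32:
        q, lhs_scale, zp = quantize_row_qai8(row)     # quantization params computed at run time
        out_row = []
        for w, w_sum, rhs_scale in zip(rhs, rhs_sums, rhs_scales):
            dot = sum(a * b for a, b in zip(q, w))    # integer-only dot product
            acc = dot - zp * w_sum                    # equals sum((q_i - zp) * w_i)
            out_row.append(lhs_scale * rhs_scale * acc)
        dst.append(out_row)
    return dst


# GEMV case (M = 1) with N = 2 output channels and K = 4.
lhs = [[1.0, -2.0, 0.5, 3.0]]
rhs_packed = [pack_s4([1, -1, 2, 0]), pack_s4([-8, 7, 3, -2])]
print(matmul_f32_qai8dx_qsi4cx(lhs, rhs_packed, [0.5, 0.25], k=4))
```

The unquantized product for this input is [[2.0, -6.625]]. The sketch's result differs from it only by int8 rounding error. Subtracting `zp * w_sum` after the dot product keeps the inner loop integer-only. This is a common way to pair asymmetric activations with symmetric weights, though whether the SME2 kernels organize the correction this way is not stated here.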

Signed-off-by: Mohamad Najem <mohamad.najem@arm.com>

Signed-off-by: Anitha Raj <anitha.raj@arm.com>

Signed-off-by: Michael Kozlov <michael.kozlov@arm.com>

Signed-off-by: Thomas Bamelis <thomas.bamelis@arm.com>

Edited Nov 13, 2024 by Thomas Bamelis