GEMM and LHS packing kernel <- f16 LHS x QSI4c32p RHS (!487) · Kleidi / KleidiAI

Micro-kernels to compute the matrix multiplication of a packed f16 LHS matrix with an RHS matrix of symmetric 4-bit integers with per-channel quantisation, accumulating into a single-precision matrix. This MR also includes the kernel that scales and packs the LHS matrix (f32 -> f16).
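For readers unfamiliar with this kind of mixed-precision GEMM, the following is a minimal reference sketch of the arithmetic, not the KleidiAI API or its packed memory layout. It assumes that the RHS stores signed 4-bit values in [-8, 7] with one scale per block of 32 values along K (one reading of the "c32" in QSI4c32p), and it uses Python floats as a stand-in for the single-precision accumulator. All function names (`to_f16`, `quantize_rhs_column`, `matmul_f16_qsi4`) are hypothetical.

```python
import struct

BLOCK = 32  # assumed block length along K; an assumption, not taken from the MR


def to_f16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_rhs_column(col):
    """Symmetric 4-bit quantisation of one RHS column, one scale per block."""
    q, scales = [], []
    for start in range(0, len(col), BLOCK):
        block = col[start:start + BLOCK]
        amax = max(abs(v) for v in block)
        scale = amax / 7.0 if amax > 0 else 1.0  # map the largest value to +/-7
        scales.append(scale)
        q.extend(max(-8, min(7, round(v / scale))) for v in block)
    return q, scales


def matmul_f16_qsi4(lhs, rhs_q, rhs_scales):
    """lhs: M x K floats; rhs_q / rhs_scales: one entry per output column."""
    out = []
    for row in lhs:
        row16 = [to_f16(v) for v in row]  # LHS is converted to f16 first
        out_row = []
        for q, scales in zip(rhs_q, rhs_scales):
            acc = 0.0  # wide accumulator (stand-in for f32)
            for b, scale in enumerate(scales):
                lo = b * BLOCK
                hi = min(lo + BLOCK, len(q))
                partial = sum(row16[k] * q[k] for k in range(lo, hi))
                acc += partial * scale  # apply the block scale once per block
            out_row.append(acc)
        out.append(out_row)
    return out
```

A small usage example: quantise a single 64-element RHS column with `q, s = quantize_rhs_column(col)` and call `matmul_f16_qsi4(lhs, [q], [s])`. An optimised kernel would compute the same quantities on packed data with vector or matrix instructions rather than Python loops.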

The GEMM kernel has been optimised for SME2.

Signed-off-by: Hugo OKeeffe <hugo.okeeffe@arm.com>
