GEMM and LHS packing kernel <- f16 LHS x QSI4c32p RHS

Hugo OKeeffe requested to merge hugoke01/int4_sme into main

Micro-kernels that multiply a packed f16 LHS matrix by an RHS matrix of symmetric 4-bit integers with per-channel quantisation, accumulating into a single-precision matrix. This MR also includes the kernel that scales and packs the LHS matrix (f32 -> f16).
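
For reviewers who want the arithmetic spelled out, here is a minimal scalar sketch in Python. It is not the kernel code in this MR. The function names (`to_f16`, `quantize_rhs_per_channel`, `pack_int4`, `unpack_int4`, `gemm_f16_int4`), the low-nibble-first byte layout, and the max-abs / 7 scale choice are all assumptions made for illustration. The LHS scaling step is left out because the description does not specify it.

```python
import struct


def to_f16(x):
    """Round a float to the nearest IEEE half-precision value (stdlib 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_rhs_per_channel(rhs):
    """Symmetric int4 quantisation with one scale per output channel (column).

    rhs is a K x N list of floats. Returns (q, scales), where q holds integers
    in [-8, 7] and the zero point is implicitly 0 (symmetric).
    Assumption: scale = max|column| / 7; the real packing code may differ.
    """
    k, n = len(rhs), len(rhs[0])
    q = [[0] * n for _ in range(k)]
    scales = []
    for j in range(n):
        max_abs = max(abs(rhs[i][j]) for i in range(k))
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for i in range(k):
            q[i][j] = max(-8, min(7, round(rhs[i][j] / scale)))
    return q, scales


def pack_int4(values):
    """Pack signed 4-bit integers two per byte, low nibble first (hypothetical layout)."""
    padded = list(values) + [0] * (len(values) % 2)
    return bytes(
        (padded[i] & 0xF) | ((padded[i + 1] & 0xF) << 4)
        for i in range(0, len(padded), 2)
    )


def unpack_int4(data, count):
    """Inverse of pack_int4: sign-extend each nibble back to [-8, 7]."""
    out = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:count]


def gemm_f16_int4(lhs, q, scales):
    """Reference GEMM: f16 LHS (M x K) times int4 RHS (K x N), scaled per column.

    Python floats are double precision; a real kernel would accumulate in f32.
    """
    m, k, n = len(lhs), len(q), len(scales)
    lhs16 = [[to_f16(v) for v in row] for row in lhs]
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += lhs16[i][p] * q[p][j]
            # The per-channel scale is applied once, after the integer-weighted sum.
            out[i][j] = acc * scales[j]
    return out
```

Because the scale is constant along each output column, it can be factored out of the inner loop and applied once per accumulator, which is what the sketch does.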

The GEMM kernel has been optimised for SME2.

Signed-off-by: Hugo OKeeffe <hugo.okeeffe@arm.com>
