Matmul int4 micro-kernels for QA8DX (LHS) x QS4CX (RHS) -> F32 (!2)

Gian Marco Iodice requested to merge qs4cx_matmul into main Apr 09, 2024

The LHS matrix is quantized (Q) Asymmetric (A) 8-bit (8) with per-row (DX) quantization parameters
The RHS matrix is quantized (Q) Symmetric (S) 4-bit (4) with per-channel (CX) quantization parameters
The destination is F32
Implement int4 matmul micro-kernels with intrinsics, using the dotprod and i8mm extensions
Implement a micro-kernel to pack the RHS matrix
Implement two micro-kernels to dynamically quantize and pack the LHS matrix
Add README.md
No tests are added in this PR. Tests will be added in a separate PR
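
For readers unfamiliar with the naming, the scheme described above can be sketched as a scalar C reference. This is only an illustration under stated assumptions, not the code in this PR: the function names (ref_quant_lhs_row, ref_quant_rhs_channel, ref_matmul_qa8dx_qs4cx_f32) are hypothetical, the RHS is assumed to be stored as N rows of K weights, the int4 values are assumed to be clamped to [-7, 7] and packed two per byte with the even index in the low nibble, and no dotprod or i8mm intrinsics are used. The packed layouts consumed by the actual micro-kernels may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round to nearest, ties away from zero (avoids a libm dependency). */
static long ref_round(float x) { return (long)(x >= 0.0f ? x + 0.5f : x - 0.5f); }

/* Hypothetical helper: dynamically quantize one F32 LHS row of length k to
 * asymmetric int8 (QA8DX), so that real ~= scale * (q - zero_point). */
static void ref_quant_lhs_row(const float *src, size_t k, int8_t *dst,
                              float *scale, int32_t *zero_point) {
    float mn = 0.0f, mx = 0.0f; /* range includes 0 so 0.0f is representable */
    for (size_t i = 0; i < k; ++i) {
        if (src[i] < mn) mn = src[i];
        if (src[i] > mx) mx = src[i];
    }
    float s = (mx - mn) / 255.0f;
    if (s == 0.0f) s = 1.0f; /* all-zero row: any non-zero scale works */
    long zp = ref_round(-128.0f - mn / s);
    if (zp < -128) zp = -128;
    if (zp > 127) zp = 127;
    for (size_t i = 0; i < k; ++i) {
        long q = ref_round(src[i] / s) + zp;
        if (q < -128) q = -128;
        if (q > 127) q = 127;
        dst[i] = (int8_t)q;
    }
    *scale = s;
    *zero_point = (int32_t)zp;
}

/* Hypothetical helper: quantize one F32 RHS channel (k weights) to symmetric
 * int4 (QS4CX) and pack two values per byte. The nibble order is an
 * assumption made for this sketch. */
static void ref_quant_rhs_channel(const float *src, size_t k, uint8_t *dst,
                                  float *scale) {
    assert(k % 2 == 0);
    float amax = 0.0f;
    for (size_t i = 0; i < k; ++i) {
        float a = src[i] < 0.0f ? -src[i] : src[i];
        if (a > amax) amax = a;
    }
    float s = amax / 7.0f;
    if (s == 0.0f) s = 1.0f;
    for (size_t i = 0; i < k; i += 2) {
        uint8_t lo = (uint8_t)ref_round(src[i] / s) & 0x0F;
        uint8_t hi = (uint8_t)ref_round(src[i + 1] / s) & 0x0F;
        dst[i / 2] = (uint8_t)(lo | (hi << 4));
    }
    *scale = s;
}

/* Sign-extend a 4-bit two's-complement value stored in the low nibble. */
static int32_t ref_nibble_to_int(uint8_t v) { return (int32_t)((v ^ 8) - 8); }

/* Reference matmul: int32 accumulation, then one F32 rescale per output. */
static void ref_matmul_qa8dx_qs4cx_f32(size_t m, size_t n, size_t k,
                                       const int8_t *lhs_q, const float *lhs_scale,
                                       const int32_t *lhs_zp, const uint8_t *rhs_q,
                                       const float *rhs_scale, float *dst) {
    for (size_t r = 0; r < m; ++r) {
        for (size_t c = 0; c < n; ++c) {
            int32_t acc = 0;
            for (size_t i = 0; i < k; ++i) {
                uint8_t byte = rhs_q[c * (k / 2) + i / 2];
                uint8_t nib = (i % 2 == 0) ? (uint8_t)(byte & 0x0F) : (uint8_t)(byte >> 4);
                acc += ((int32_t)lhs_q[r * k + i] - lhs_zp[r]) * ref_nibble_to_int(nib);
            }
            dst[r * n + c] = (float)acc * lhs_scale[r] * rhs_scale[c];
        }
    }
}
```

Because the RHS is symmetric, only the LHS zero point has to be subtracted inside the accumulation, and the two scales can be applied once per output element after the integer dot product.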

Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>
