Integrate ASM matmul micro-kernels for F32 <- QSI8D32 x QSI4C32
- Integrate ASM matmul micro-kernels for the GeMV and GeMM variants
- Refactor the LHS and RHS packing functions to load the scale from the beginning of the block
- Add a timer to the example for profiling the micro-kernels
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>