Add wider variants of Advanced SIMD FP16 and FP32 MatMul
Add a 6x32 block size variant of the Advanced SIMD FP16 MatMul kernel, widened from the original 6x16 variant.
Add a 6x16 block size variant of the Advanced SIMD FP32 MatMul kernel, widened from the original 6x8 variant.
These are the maximum viable block sizes for these kernels.
Add a variant of the kernel optimized for the Arm® Cortex®-A55 processor.
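
The "maximum viable block sizes" statement can be read as a register budget. The sketch below is an illustration only, under assumptions this message does not state: that the limiting factor is the 32 128-bit Advanced SIMD registers of AArch64, and that a kernel keeps the whole accumulator tile, one row of the right-hand matrix, and at least one left-hand register live at once. All helper names are hypothetical and are not part of the kernels.

```c
/* Hypothetical helpers for illustration; not part of the kernels in this change. */

/* AArch64 Advanced SIMD has 32 vector registers, each 128 bits wide. */
enum { VREG_COUNT = 32, VREG_BITS = 128 };

/* Vector registers needed to hold one row of nr elements. */
static int row_regs(int nr, int elem_bits) {
    int lanes = VREG_BITS / elem_bits; /* 8 lanes for FP16, 4 for FP32 */
    return (nr + lanes - 1) / lanes;
}

/* Vector registers needed for an mr x nr accumulator tile. */
static int accumulator_regs(int mr, int nr, int elem_bits) {
    return mr * row_regs(nr, elem_bits);
}

/* Assumed budget: accumulators, plus one row of the right-hand matrix,
 * plus at least one register for left-hand values. Returns 1 if it fits. */
static int tile_fits(int mr, int nr, int elem_bits) {
    int needed = accumulator_regs(mr, nr, elem_bits)
               + row_regs(nr, elem_bits) + 1;
    return needed <= VREG_COUNT;
}
```

Under these assumptions, `accumulator_regs(6, 32, 16)` and `accumulator_regs(6, 16, 32)` both come to 24 registers and `tile_fits` accepts both tiles, while doubling the width again (`tile_fits(6, 64, 16)` or `tile_fits(6, 32, 32)`) needs 48 accumulator registers and is rejected.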
Signed-off-by: Jakub Sujak <jakub.sujak@arm.com>