Add bf16 Lhs/Rhs packed gemm kernels and packing functions
This commit adds:
- A bf16 x bf16 = fp32 matmul microkernel with an 8x12 output block size
- Lhs/Rhs packing functions that pack the inputs and convert them from fp32 to bf16
- Corresponding tests, plus the required changes to the testing framework and the reference implementation
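
For reviewers unfamiliar with the data flow, the following is a minimal sketch of what "convert fp32 to bf16, multiply, accumulate in fp32" means numerically. It is not the code added by this commit: the function names (fp32_to_bf16, bf16_to_fp32, ref_matmul_bf16) are hypothetical, the round-to-nearest-even rounding mode is an assumption, and the sketch ignores the packed layout and the 8x12 blocking entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fp32 -> bf16, assuming round-to-nearest-even. */
static uint16_t fp32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: truncate and force a quiet-NaN mantissa bit. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }
    /* Add 0x7fff plus the lowest kept bit so that ties round to even. */
    uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding_bias) >> 16);
}

/* Hypothetical helper: bf16 -> fp32 is exact (append 16 zero bits). */
static float bf16_to_fp32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical scalar reference: dst[m x n] = lhs[m x k] * rhs[k x n].
 * Inputs are fp32, rounded to bf16 before use; accumulation is in fp32.
 * All matrices are row-major and unpacked. */
static void ref_matmul_bf16(size_t m, size_t n, size_t k,
                            const float *lhs, const float *rhs, float *dst) {
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                const float a = bf16_to_fp32(fp32_to_bf16(lhs[i * k + p]));
                const float b = bf16_to_fp32(fp32_to_bf16(rhs[p * n + j]));
                acc += a * b;
            }
            dst[i * n + j] = acc;
        }
    }
}
```

A scalar loop like this is only useful as a numerical baseline for comparing a kernel's output; the summation order inside an optimized kernel may differ, so comparisons generally need a tolerance.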
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>