Add fp16 in/out bf16 Gemm kernel and relevant packing functions
This commit adds:
- A bf16 x bf16 = fp16 matmul microkernel with an 8x12 output block size
- Lhs/Rhs packing functions that pack the inputs and convert them from fp16 to bf16
- Corresponding tests, along with modifications to the testing framework and the reference implementation
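
For reviewers, the numeric behaviour described above (fp16 inputs, conversion to bf16 during packing, fp16 output) can be sketched as a scalar reference. This is only an illustrative model, not the code added by this commit: the names `to_fp16`, `to_bf16` and `matmul_ref` are hypothetical, round-to-nearest-even is assumed for the fp16 to bf16 conversion, and the accumulator is a Python float, which is an assumption because the kernel's accumulator width is not stated here.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a value to the nearest IEEE fp16 number (returned as a Python float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Round a finite value to bf16 via its fp32 bit pattern.

    Assumption: round-to-nearest, ties-to-even. NaN inputs are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias; ties go to the even value
    bits &= 0xFFFF0000                   # keep only the upper 16 bits (the bf16 part)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def matmul_ref(lhs, rhs):
    """Hypothetical scalar model: fp16 in, bf16 multiply operands, fp16 out."""
    m, k, n = len(lhs), len(rhs), len(rhs[0])
    # Model the packing step: inputs arrive as fp16 and are converted to bf16.
    lhs_bf16 = [[to_bf16(to_fp16(v)) for v in row] for row in lhs]
    rhs_bf16 = [[to_bf16(to_fp16(v)) for v in row] for row in rhs]
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0  # assumption: accumulator is wider than bf16
            for p in range(k):
                acc += lhs_bf16[i][p] * rhs_bf16[p][j]
            out[i][j] = to_fp16(acc)  # result is stored as fp16
    return out
```

In this model, an fp16 input such as 1.00390625 (1 + 2^-8) loses its last mantissa bit when converted to bf16, so `matmul_ref([[1.00390625]], [[1.0]])` yields `[[1.0]]`. This kind of precision loss is what a test tolerance for such a kernel would need to account for.
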
Signed-off-by: Gunes Bayir <gunes.bayir@arm.com>
Edited by Gunes Bayir