Optimize scalar RHS packing function NxK F32 <- QAI8DXP x QSU4C32
- Optimize the generic NxK RHS packing function. The performance improvement is about 1.5x.
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>