Optimize the NxK RHS packing function for qsu4c32s1s0
- Added a specialization for nr=4, kr=16, sr=2
- Improved the RHS packing function performance by 55%
Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>