Refactor RHS packing function for F32 <- QAI8DXP x QSU4C32

Gian Marco Iodice requested to merge opt_rhs_pack_bf16 into main
  • Rename the packing function to include the bf16 scale factor
  • Optimize the scalar variant. The new implementation is ~1.5x faster than the previous one
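
For readers unfamiliar with bf16 scale factors, the following is a minimal sketch of how an F32 scale could be narrowed to bf16 before being stored alongside the packed RHS data. The helper names `f32_to_bf16` and `bf16_to_f32` are hypothetical and are not taken from this merge request, and the round-to-nearest-even policy is an assumption; the actual packing function may round or truncate differently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from this merge request): narrow an F32 value
 * to bf16 by keeping its upper 16 bits, rounding to nearest even. */
static uint16_t f32_to_bf16(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof(bits)); /* bit copy without aliasing issues */

    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        /* NaN: keep the sign and exponent, force a quiet mantissa bit */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }

    /* Add 0x7FFF plus the lowest kept bit so that ties round to even */
    uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding_bias) >> 16);
}

/* Hypothetical helper: widen bf16 back to F32 by zero-filling the low bits. */
static float bf16_to_f32(uint16_t value) {
    uint32_t bits = (uint32_t)value << 16;
    float out;
    memcpy(&out, &bits, sizeof(out));
    return out;
}
```

Storing the scale as bf16 halves its footprint relative to F32 while keeping the full 8-bit exponent range, at the cost of mantissa precision (7 explicit bits instead of 23).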

Signed-off-by: Gian Marco Iodice <gianmarco.iodice@arm.com>