Draft: Add SME2 3x3 depthwise int8 kernel (stride=1, per-ch sym weight quant, float scale requant)

Declan Cox requested to merge deccox01/dwconv_qai8_qsi8_qai8cxp_float_requant into main
  • Implements a planar depthwise convolution kernel: 3x3, stride=1, int8.
  • Inputs and outputs use per-tensor asymmetric quantization.
  • Weights use per-channel symmetric quantization (asymmetric support can be added if required).
  • Accumulates in int32; the epilogue applies per-channel float-scale requantization with saturation to int8.
  • Includes the required weight packing kernel, which packs weights in VL-sized (vector-length) blocks for SME2 execution.
  • Adds initial sources and headers for the kernel and the weight packing kernel, plus unit tests.
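
For reviewers, the arithmetic described above can be sketched as a scalar reference. This is not the SME2 kernel or its API: the function names (`requantize_ref`, `dwconv3x3_ref`), the unpacked CHW layout, the absence of padding, the int32 bias term, and the round-half-away-from-zero rounding are all assumptions made for illustration. The actual kernel operates on packed weights and may round differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: requantize one int32 accumulator to int8.
 * Scales in float, adds the output zero point, saturates to [-128, 127],
 * then rounds half away from zero (rounding mode is an assumption). */
static int8_t requantize_ref(int32_t acc, float scale, int32_t out_zp) {
    float v = (float)acc * scale + (float)out_zp;
    if (v < -128.0f) v = -128.0f; /* saturate before the integer cast */
    if (v > 127.0f) v = 127.0f;
    return (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Hypothetical scalar reference: planar (CHW) 3x3 depthwise convolution,
 * stride 1, no padding. Input/output are per-tensor asymmetric int8,
 * weights are per-channel symmetric int8 (weight zero point is 0),
 * so only the input zero point is subtracted. */
static void dwconv3x3_ref(const int8_t* in, const int8_t* weights,
                          const int32_t* bias, const float* scale,
                          int32_t in_zp, int32_t out_zp,
                          int channels, int height, int width,
                          int8_t* out) {
    assert(height >= 3 && width >= 3);
    const int out_h = height - 2;
    const int out_w = width - 2;
    for (int c = 0; c < channels; ++c) {
        for (int y = 0; y < out_h; ++y) {
            for (int x = 0; x < out_w; ++x) {
                int32_t acc = bias[c]; /* int32 accumulation */
                for (int ky = 0; ky < 3; ++ky) {
                    for (int kx = 0; kx < 3; ++kx) {
                        int32_t a = in[(c * height + y + ky) * width + x + kx] - in_zp;
                        acc += a * weights[c * 9 + ky * 3 + kx];
                    }
                }
                /* Per-channel float scale, saturating to int8. */
                out[(c * out_h + y) * out_w + x] = requantize_ref(acc, scale[c], out_zp);
            }
        }
    }
}
```

A reference of this shape is typically what unit tests compare an optimized kernel against. Here `scale[c]` stands in for the combined per-channel requantization scale, which is one common convention and not confirmed by this MR.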

Signed-off-by: Declan Cox <declan.cox@arm.com>