Are there any matmul kernels designed for MxK lhs and NxK rhs?
I’m just getting started with kernel optimization and have two questions:
- Does KleidiAI currently support matmul with an MxK LHS and an NxK RHS (i.e., a transposed RHS)?
- I noticed that most kernels require packing the RHS. Are they designed specifically for linear layers, where the packed weights can be reused across calls? For a general matrix multiplication with an MxK LHS and a KxN RHS, wouldn't packing the RHS on every call be inefficient?