Support MatMul primitives with missing bias
Hello developers!
In our project we use MatMul primitives from the KleidiAI library and noticed that all (or almost all?) of them require a pointer to a bias. However, not all MatMuls in DL models have biases. For example, MatMuls appear in patterns such as ScaledDotProductAttention and GatedMLP, which are common in LLMs.
To execute MatMuls in such subgraphs, we have to allocate a separate memory blob and zero it; the blob is large because the N dimension is big. This can cause extra memory allocation in the application or performance degradation due to the extra zeroing.
So could you please tell me whether there is any possibility of extending the existing functionality to support a missing bias, so that no extra zero-filled memory allocation is needed?
For example, the rhs-repacking primitives could contain two asm paths: one for a non-null bias pointer (as today) and one for a null bias pointer, where the asm code manually stores 0 into the bias zones of the repacked output at the appropriate strides.
Just FYI, we currently use these Neon-specific primitives:
- `kai_run_rhs_pack_kxn_f32p8x1biasf32_f32_f32_neon`
- `kai_run_rhs_pack_kxn_f16p16x1biasf16_f16_f16_neon`
- `kai_run_rhs_pack_kxn_qsi8cxp_qsi8cx_neon`
Thank you in advance!
