Implement Neon multi-buffer SNOW3G UEA2 & UIA2
The following changes have been made in the UEA2 patch:
1. Add a job manager for SNOW3G UEA2. This manager can schedule
4 buffers with non-uniform lengths and different keys.
2. Add a 4-buffer implementation of the underlying SNOW3G
primitives, such as the S1 box, S2 box, multiply-by-alpha and
divide-by-alpha.
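For reference, the bit-serial definitions of MULalpha and DIValpha from the SNOW 3G specification can be sketched in C as below, together with a hypothetical `lfsr_feedback_x4` helper that computes the keystream-mode LFSR feedback word for 4 independent lanes. This only illustrates the 4-lane data layout (one state word per lane); it is not the Neon code in this patch, which would use table lookups in vector registers rather than bit-serial loops.

```c
#include <stdint.h>

/* MULx from the SNOW 3G specification: multiply byte v by x in GF(2^8),
 * XORing in the constant c when the top bit is shifted out. */
static uint8_t mulx(uint8_t v, uint8_t c)
{
    return (v & 0x80) ? (uint8_t)((v << 1) ^ c) : (uint8_t)(v << 1);
}

/* MULxPOW: apply MULx i times. */
static uint8_t mulx_pow(uint8_t v, unsigned i, uint8_t c)
{
    while (i--)
        v = mulx(v, c);
    return v;
}

/* MULalpha: map one byte to a 32-bit word, most significant byte first. */
static uint32_t mul_alpha(uint8_t c)
{
    return ((uint32_t)mulx_pow(c, 23, 0xA9) << 24) |
           ((uint32_t)mulx_pow(c, 245, 0xA9) << 16) |
           ((uint32_t)mulx_pow(c, 48, 0xA9) << 8) |
           (uint32_t)mulx_pow(c, 239, 0xA9);
}

/* DIValpha: same structure with different exponents. */
static uint32_t div_alpha(uint8_t c)
{
    return ((uint32_t)mulx_pow(c, 16, 0xA9) << 24) |
           ((uint32_t)mulx_pow(c, 39, 0xA9) << 16) |
           ((uint32_t)mulx_pow(c, 6, 0xA9) << 8) |
           (uint32_t)mulx_pow(c, 64, 0xA9);
}

/* Hypothetical 4-lane helper: keystream-mode LFSR feedback
 * v = (s0 << 8) ^ MULalpha(s0 >> 24) ^ s2 ^ (s11 >> 8) ^ DIValpha(s11 & 0xff)
 * computed independently for each of the 4 buffers. */
static void lfsr_feedback_x4(const uint32_t s0[4], const uint32_t s2[4],
                             const uint32_t s11[4], uint32_t v[4])
{
    for (int lane = 0; lane < 4; lane++)
        v[lane] = (s0[lane] << 8) ^ mul_alpha((uint8_t)(s0[lane] >> 24)) ^
                  s2[lane] ^ (s11[lane] >> 8) ^
                  div_alpha((uint8_t)(s11[lane] & 0xff));
}
```

Because MULx is linear over GF(2), MULalpha and DIValpha are linear as well, which is what makes them expressible as byte-indexed lookup tables in optimized code.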
Compared with the single-buffer implementation, the performance
improvement on different microarchitectures is as follows:
Cortex-A57 : x2.20 @ 2048 bytes
Cortex-A72 : x2.19 @ 2048 bytes
Neoverse N1 : x2.32 @ 2048 bytes
Neoverse V1 : x2.92 @ 2048 bytes
We used the results of "./perf/ipsec_perf --cipher-algo snow3g-uea2"
to calculate the performance improvement.
The following changes have been made in the UIA2 patch:
1. Add a job manager for SNOW3G UIA2.
2. Use a 4-buffer implementation during the keystream generation
stage.
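One common way a job manager handles 4 buffers of non-uniform lengths is a "minimum length" scheme: jobs are parked in free lanes, and once all 4 lanes are occupied (or on flush) the 4-buffer kernel runs for the shortest remaining length, after which the finished job is returned. The sketch below assumes this scheme; `job_t`, `mgr_t`, `submit`, `flush` and `process_lanes` are hypothetical names, and the per-lane key/IV state (which is what lets each lane use a different key) is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_LANES 4

/* Hypothetical job descriptor: only what the scheduler needs. */
typedef struct {
    int id;
    size_t len; /* bytes still to process */
} job_t;

typedef struct {
    job_t *lane[NUM_LANES]; /* NULL when the lane is free */
    int occupied;
} mgr_t;

/* Advance all occupied lanes in lock-step by the minimum remaining length
 * and return one job that has finished. */
static job_t *process_lanes(mgr_t *m)
{
    size_t min_len = SIZE_MAX;
    for (int i = 0; i < NUM_LANES; i++)
        if (m->lane[i] && m->lane[i]->len < min_len)
            min_len = m->lane[i]->len;

    /* A real manager would run the 4-buffer kernel for min_len bytes here. */
    for (int i = 0; i < NUM_LANES; i++)
        if (m->lane[i])
            m->lane[i]->len -= min_len;

    for (int i = 0; i < NUM_LANES; i++) {
        if (m->lane[i] && m->lane[i]->len == 0) {
            job_t *done = m->lane[i];
            m->lane[i] = NULL;
            m->occupied--;
            return done;
        }
    }
    return NULL; /* only reached when no lane is occupied */
}

/* Park the job in a free lane; run the kernel only when all lanes are full. */
static job_t *submit(mgr_t *m, job_t *job)
{
    for (int i = 0; i < NUM_LANES; i++) {
        if (!m->lane[i]) {
            m->lane[i] = job;
            m->occupied++;
            break;
        }
    }
    return (m->occupied < NUM_LANES) ? NULL : process_lanes(m);
}

/* Drain remaining jobs even when some lanes are empty. */
static job_t *flush(mgr_t *m)
{
    return m->occupied ? process_lanes(m) : NULL;
}
```

Returning one completed job per call keeps the interface simple: if two lanes finish together, the second is returned on the next call, because the minimum remaining length is then 0.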
Compared with the 1buf-4blk implementation (1 buffer during the
keystream generation stage, 4 blocks during the polynomial
multiplication stage), the performance improvement of the 4buf-4blk
implementation (4 buffers during the keystream generation stage,
4 blocks during the polynomial multiplication stage) on different
microarchitectures is as follows:
Cortex-A57 : x1.48 @ 2048 bytes
Cortex-A72 : x1.48 @ 2048 bytes
Neoverse N1 : x1.36 @ 2048 bytes
We used the results of "./perf/ipsec_perf --hash-algo snow3g-uia2"
to calculate the performance improvement.
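The 4-block polynomial stage can be illustrated with the UIA2 evaluation over GF(2^64), using the reduction constant 0x1b from the UIA2 specification. Assuming precomputed powers P^2, P^3 and P^4, four message blocks fold into one step, (EVAL ⊕ M_i)·P^4 ⊕ M_{i+1}·P^3 ⊕ M_{i+2}·P^2 ⊕ M_{i+3}·P, which equals four sequential Horner steps but exposes four independent multiplications. This is a hypothetical sketch (`gf64_mul`, `eval_1blk` and `eval_4blk` are illustrative names) that omits the final UIA2 steps involving the length block and Q.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-serial GF(2^64) multiply, following MUL64/MUL64x from the UIA2
 * specification with c = 0x1b. */
static uint64_t gf64_mul(uint64_t v, uint64_t p)
{
    const uint64_t c = 0x1b;
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        if ((p >> i) & 1)
            r ^= v;                                   /* add V * x^i */
        v = (v >> 63) ? ((v << 1) ^ c) : (v << 1);    /* V = MUL64x(V) */
    }
    return r;
}

/* 1-block Horner evaluation: one dependent multiply per block. */
static uint64_t eval_1blk(const uint64_t *m, size_t n, uint64_t p)
{
    uint64_t e = 0;
    for (size_t i = 0; i < n; i++)
        e = gf64_mul(e ^ m[i], p);
    return e;
}

/* 4-block evaluation: four independent multiplies per group of 4 blocks,
 * then plain Horner for any tail blocks. */
static uint64_t eval_4blk(const uint64_t *m, size_t n, uint64_t p)
{
    uint64_t p2 = gf64_mul(p, p), p3 = gf64_mul(p2, p), p4 = gf64_mul(p3, p);
    uint64_t e = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        e = gf64_mul(e ^ m[i], p4) ^ gf64_mul(m[i + 1], p3) ^
            gf64_mul(m[i + 2], p2) ^ gf64_mul(m[i + 3], p);
    for (; i < n; i++)
        e = gf64_mul(e ^ m[i], p);
    return e;
}
```

The two functions return the same value for any input; the 4-block form only changes the dependency chain, which is what allows the multiplications to be issued in parallel.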
Edited by Fisher Yu