
Implement Neon multi-buffer SNOW3G UEA2 & UIA2

Fisher Yu requested to merge snow3g-neon into main

The following changes have been made in the UEA2 patch.

1. Add a job manager for snow3g uea2. The manager can schedule
   4 buffers with non-uniform lengths and different keys.
2. Add 4-buffer implementations of the underlying snow3g primitives,
   such as the S1 box, S2 box, multiply-by-alpha and divide-by-alpha.
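
As an illustration of how a manager can drive 4 buffers of non-uniform length in lock-step, here is a minimal sketch. It is not the code from this patch: `struct lane`, `toy_keystream`, `min_remaining`, `process_lanes` and `flush_all` are hypothetical names, and the keystream is a toy stand-in for the real per-key SNOW3G state. The idea shown is only the scheduling: advance every active lane by the shortest remaining length, then retire the lanes that finished.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_LANES 4

/* Hypothetical per-lane state; not the structures used by the patch. */
struct lane {
    const uint8_t *src;
    uint8_t *dst;
    size_t remaining; /* bytes left; 0 means the lane is idle */
    uint8_t ks_state; /* toy stand-in for the per-key cipher state */
};

/* Toy keystream generator, NOT SNOW3G: one byte per call. */
static uint8_t toy_keystream(struct lane *l)
{
    l->ks_state = (uint8_t)(l->ks_state * 5u + 1u);
    return l->ks_state;
}

/* Shortest remaining length among active lanes; 0 if all are idle. */
static size_t min_remaining(const struct lane lanes[NUM_LANES])
{
    size_t min = 0;
    for (int i = 0; i < NUM_LANES; i++) {
        size_t r = lanes[i].remaining;
        if (r != 0 && (min == 0 || r < min))
            min = r;
    }
    return min;
}

/* Advance every active lane by n bytes, one byte per lane per step.
 * A vectorised version would handle the four lanes in one register. */
static void process_lanes(struct lane lanes[NUM_LANES], size_t n)
{
    int active[NUM_LANES];
    for (int i = 0; i < NUM_LANES; i++)
        active[i] = lanes[i].remaining != 0;

    for (size_t b = 0; b < n; b++)
        for (int i = 0; i < NUM_LANES; i++)
            if (active[i])
                *lanes[i].dst++ = *lanes[i].src++ ^ toy_keystream(&lanes[i]);

    for (int i = 0; i < NUM_LANES; i++)
        if (active[i])
            lanes[i].remaining -= n;
}

/* Drain all lanes; returns the number of lock-step rounds needed. */
static int flush_all(struct lane lanes[NUM_LANES])
{
    int rounds = 0;
    size_t n;
    while ((n = min_remaining(lanes)) != 0) {
        process_lanes(lanes, n);
        rounds++;
    }
    return rounds;
}
```

With lane lengths 3, 8, 8 and 5 bytes, this sketch needs three rounds (3, 2 and 3 bytes). A real manager would typically refill a finished lane with a newly submitted job instead of leaving it idle.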

Compared with the single-buffer implementation, the performance
improvement on different microarchitectures is listed below.
  Cortex-A57  : x2.20 @ 2048 bytes
  Cortex-A72  : x2.19 @ 2048 bytes
  Neoverse N1 : x2.32 @ 2048 bytes
  Neoverse V1 : x2.92 @ 2048 bytes

The performance improvement is calculated from the output of
"./perf/ipsec_perf --cipher-algo snow3g-uea2".

The following changes have been made in the UIA2 patch.

1. Add a job manager for snow3g uia2.
2. Implement 4-buffer processing in the keystream generation stage.

Compared with the 1buf-4blk implementation (1 buffer during the
keystream generation stage, 4 blocks during the polynomial
multiplication stage), the performance improvement of the 4buf-4blk
implementation (4 buffers during the keystream generation stage,
4 blocks during the polynomial multiplication stage) on different
microarchitectures is listed below.
  Cortex-A57  : x1.48 @ 2048 bytes
  Cortex-A72  : x1.48 @ 2048 bytes
  Neoverse N1 : x1.36 @ 2048 bytes
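
For readers unfamiliar with the "4-block" part of the polynomial multiplication stage, the sketch below shows one common way to fold four 64-bit message blocks per step. That the patch uses exactly this folding is an assumption. `mul64x`, `mul64xpow` and `mul64` follow the bit-by-bit definitions in the UIA2 specification with the reduction constant 0x1b; `eval_serial` and `eval_4blk` are hypothetical helper names. A real Neon implementation would use vector instructions instead of these bit loops.

```c
#include <stddef.h>
#include <stdint.h>

#define UIA2_C 0x1bULL /* reduction constant from the UIA2 specification */

/* Multiply by x in GF(2^64). */
static uint64_t mul64x(uint64_t v, uint64_t c)
{
    return (v & 0x8000000000000000ULL) ? (v << 1) ^ c : (v << 1);
}

/* Multiply by x^i in GF(2^64). */
static uint64_t mul64xpow(uint64_t v, unsigned i, uint64_t c)
{
    while (i--)
        v = mul64x(v, c);
    return v;
}

/* Full GF(2^64) multiply v * p, bit by bit (slow reference form). */
static uint64_t mul64(uint64_t v, uint64_t p, uint64_t c)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < 64; i++)
        if ((p >> i) & 1)
            r ^= mul64xpow(v, i, c);
    return r;
}

/* One block per step: eval = (eval ^ m[i]) * p. */
static uint64_t eval_serial(const uint64_t *m, size_t n, uint64_t p)
{
    uint64_t eval = 0;
    for (size_t i = 0; i < n; i++)
        eval = mul64(eval ^ m[i], p, UIA2_C);
    return eval;
}

/* Four blocks per step using precomputed p^2, p^3, p^4.
 * n must be a multiple of 4 in this sketch. The four products in each
 * step are independent, so they can be computed in parallel. */
static uint64_t eval_4blk(const uint64_t *m, size_t n, uint64_t p)
{
    uint64_t p2 = mul64(p, p, UIA2_C);
    uint64_t p3 = mul64(p2, p, UIA2_C);
    uint64_t p4 = mul64(p3, p, UIA2_C);
    uint64_t eval = 0;

    for (size_t i = 0; i + 4 <= n; i += 4)
        eval = mul64(eval ^ m[i], p4, UIA2_C) ^
               mul64(m[i + 1], p3, UIA2_C) ^
               mul64(m[i + 2], p2, UIA2_C) ^
               mul64(m[i + 3], p, UIA2_C);
    return eval;
}
```

Both functions return the same value for any input whose block count is a multiple of 4, because multiplication in GF(2^64) distributes over XOR. The 4-block form replaces a chain of four dependent multiplications with four independent ones.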

The performance improvement is calculated from the output of
"./perf/ipsec_perf --hash-algo snow3g-uia2".
Edited by Fisher Yu
