Improve SME2 FFTs with FMOPA instructions
Improve SME2 code for FFT
Description
This patch modifies the SME2 c2c plan for 256x256 FFTs in two ways. It introduces new twiddle factor matrices so that twiddle factors load faster, and it reorders the outer product instructions to make full use of the compute capability of the C1-SME2 unit.
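For reviewers less familiar with the approach: a small DFT applied to a batch of columns is a matrix product, and that product can be accumulated as a sum of rank-1 (outer product) updates, which is the operation FMOPA performs on an SME tile. The scalar C sketch below only illustrates that idea and is not the code in this patch. The names `make_twiddles`, `outer_acc` and `dft_outer`, the sizes `N` and `B`, and the contiguous-column twiddle layout are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 4, B = 2 }; /* DFT length and batch width: illustrative values only */

/* Build split real/imaginary twiddles, stored so that column k of the DFT
 * matrix is contiguous: wt[k * N + j] = exp(-2*pi*i*j*k/N).
 * For N = 4 every entry is a power of -i, so a small table is exact. */
static void make_twiddles(float *wt_re, float *wt_im) {
    static const float cos4[4] = {1.0f, 0.0f, -1.0f, 0.0f};
    static const float sin4[4] = {0.0f, -1.0f, 0.0f, 1.0f};
    for (size_t k = 0; k < N; ++k) {
        for (size_t j = 0; j < N; ++j) {
            wt_re[k * N + j] = cos4[(j * k) % 4];
            wt_im[k * N + j] = sin4[(j * k) % 4];
        }
    }
}

/* acc[j][b] += sign * u[j] * v[b]: one rank-1 update, the scalar analogue
 * of a single outer-product-accumulate into a tile. */
static void outer_acc(float *acc, const float *u, const float *v, float sign) {
    for (size_t j = 0; j < N; ++j) {
        for (size_t b = 0; b < B; ++b) {
            acc[j * B + b] += sign * u[j] * v[b];
        }
    }
}

/* Y = W * X for B columns at once, as four rank-1 updates per input row k.
 * x and y are row-major N x B, with split real and imaginary planes. */
static void dft_outer(const float *wt_re, const float *wt_im,
                      const float *x_re, const float *x_im,
                      float *y_re, float *y_im) {
    assert(wt_re && wt_im && x_re && x_im && y_re && y_im);
    for (size_t i = 0; i < (size_t)N * B; ++i) {
        y_re[i] = 0.0f;
        y_im[i] = 0.0f;
    }
    for (size_t k = 0; k < N; ++k) {
        const float *wr = wt_re + k * N; /* column k of W, real part */
        const float *wi = wt_im + k * N; /* column k of W, imaginary part */
        const float *xr = x_re + k * B;  /* row k of X, real part */
        const float *xi = x_im + k * B;  /* row k of X, imaginary part */
        outer_acc(y_re, wr, xr, +1.0f);  /* Re += Wr * Xr */
        outer_acc(y_re, wi, xi, -1.0f);  /* Re -= Wi * Xi */
        outer_acc(y_im, wr, xi, +1.0f);  /* Im += Wr * Xi */
        outer_acc(y_im, wi, xr, +1.0f);  /* Im += Wi * Xr */
    }
}
```

In this sketch the four updates per `k` write to two independent accumulators, so their order can be changed freely. Laying the twiddles out in the order the loop consumes them means each column is a single contiguous load.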
Checklist
- [v] Contribution meets RAL's license terms
- [v] New functions adhere to RAL's naming scheme
- [v] Contribution conforms to RAL's directory structure
- [v] Documentation updated
- [v] "Unreleased" section of the Changelog updated
- [v] `clang-format` and `clang-tidy` run and changes included (C/C++ code)
- [v] `flake8` run and changes included (Python code)
- [v] `cmake-format` run and changes included (CMake code)
- [v] Tests added or updated
- [v] Tests pass when run with AddressSanitizer
- [v] Benchmarks added or updated