Cinematic Mode Perf Improvement
- Adding a fast ACLE (Arm C Language Extensions) function to convert UINT8 data to symmetric INT8 (the default convertTo/add approach is much slower)
- Adding the OpenCV approximate algorithm hint (cv::ALGO_HINT_APPROX) to enable the KleidiCV Gaussian Blur backend
- Modifying the simpleperf record function to capture call graphs via DWARF unwinding (more useful for tracing the cycles spent in each function)
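A minimal sketch of what a byte-level UINT8 to symmetric INT8 conversion with ACLE intrinsics can look like. This is not the function added in this change. It assumes "symmetric INT8" means re-centering around a fixed zero point of 128 (so [0, 255] maps onto [-128, 127]), and the name `u8ToS8Centered` is hypothetical. Subtracting 128 modulo 256 is the same as flipping the top bit, so each NEON instruction converts 16 pixels with no widening and no saturation step. On targets without NEON the scalar loop handles everything.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not the PR's actual function).
// Assumption: "symmetric INT8" = x - 128, mapping [0, 255] onto [-128, 127].
// x - 128 has the same bit pattern as (x ^ 0x80) reinterpreted as int8.
void u8ToS8Centered(const std::uint8_t* src, std::int8_t* dst, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    const uint8x16_t signBit = vdupq_n_u8(0x80);
    for (; i + 16 <= n; i += 16) {
        const uint8x16_t v = vld1q_u8(src + i);           // load 16 pixels
        const uint8x16_t flipped = veorq_u8(v, signBit);  // x ^ 0x80 == x - 128 (mod 256)
        vst1q_s8(dst + i, vreinterpretq_s8_u8(flipped));  // reinterpret bits as int8 and store
    }
#endif
    // Scalar tail, or the whole buffer on targets without NEON.
    for (; i < n; ++i) {
        dst[i] = static_cast<std::int8_t>(static_cast<int>(src[i]) - 128);
    }
}
```

Because the mapping is exact for every input byte, the vector and scalar paths produce identical results, which makes it easy to check the NEON path against the scalar one in a unit test.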