AArch64 Android builds and publish/test CI jobs
Summary of changes:
- Making the XNNPACK delegate the default TFLite backend
- Switching to the TensorFlow r2.16 tag due to a performance regression in the latest release tags
- Removing profiling headers, which are not relevant for public benchmarks
- Separating Docker builds for AArch64 and x86_64 hosts
- Adding AArch64 Android builds, cross-compiled from x86_64 with the Android NDK
- Adding a test job to compare PSNR between Python and C++ outputs
- Adding a publish job to upload the latest job artifacts for AArch64 Linux and Android builds on main to the package registry
- Updating the README with build/execute instructions and direct quick-start links to the latest job artifacts
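
For reviewers unfamiliar with the PSNR check, the comparison in the test job can be sketched roughly as below. This is a minimal illustration, not the code added in this change: the function names `psnr` and `outputs_match`, the 8-bit peak value of 255, and the 40 dB threshold are all assumptions made for the example.

```python
import math


def psnr(reference, candidate, peak=255.0):
    """Return the PSNR in dB between two equal-length sequences of pixel values.

    `peak` is the maximum possible pixel value (255 assumed for 8-bit data).
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    if len(reference) == 0:
        raise ValueError("outputs must not be empty")

    # Mean squared error over all elements.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical outputs

    return 10.0 * math.log10(peak ** 2 / mse)


def outputs_match(reference, candidate, min_psnr_db=40.0, peak=255.0):
    """Hypothetical pass/fail rule: PSNR must reach an assumed threshold."""
    return psnr(reference, candidate, peak) >= min_psnr_db
```

In this sketch, `reference` would hold the flattened Python output and `candidate` the flattened C++ output; the job would fail when `outputs_match` returns `False`.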