Fix bug in ChaCha20 assembly code (was writing one byte too many).
Fix the assembly code to have APPLE format.
Change Poly1305 inline assembly as requested by compiler.
Initialize variables that will be set anyway - compiler complaint.
Change to use the assembly code files for Curve25519 and SHA-512.
Ed25519 not suported with ARM assembly.
1. Remove the file async.c from the iOS Benchmark project.
2. Update the organization name in the Benchmark project to "wolfSSL Inc".
3. In the workspace project, change the path to the wolfSSL test to be a local relative path rather than an absolute path.
4. In the workspace project, remove the benchmark project and re-add it. It becomes a local relative reference with the correct name.
1. Fix iOS Benchmark reference to the async.c file.
2. Fix iOS Benchmark reference to the sp.c file. Changed to spr_c64.c.
3. Removed misc.c from iOS Benchmark as it is using inlined misc.h.
4. Added define of HAVE___UINT128_T to the user_settings.h so the
benchmark would build.
5. Wrapped the benchmark usage strings in NO_MAIN_DRIVER.
* Fixed some `tls_bench` build issues with various configure options.
* Moved the `WOLFSSL_PACK` and `WC_NORETURN` macros into types.h.
* Added support for `__builtin_bswap32` and `__builtin_bswap64`. Since the performance of the builtins varries by platform its off by default, but can be enabled by customer using `WOLF_ALLOW_BUILTIN`. Quick check on x86 showed the 32-bit swap performance matched, but 64-bit swap was slower.