1. Totally understandable. You cannot use ARC to its full potential, so you are only paying its price.
2. I assume it was Linux vs Windows.
It would be interesting to see how it compares with ARC removed. Comparing x86 with ARM is comparing apples with oranges, ARC or no ARC. AFAIK, the LLVM backend has some influence on performance, too. There are some reports of inefficient code generation that have nothing to do with ARC.
Performance mostly suffers because of existing code that was not written with ARC in mind - unnecessary reference counting triggers are real performance killers. It is not the fault of ARC per se.
3. LOL - it is the same memory management model (if you take DisposeOf out of the picture), so the pitfalls are exactly the same. Developing for ARC just requires a slightly different mindset.
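
To make the "unnecessary reference counting triggers" remark in point 2 a bit more concrete, here is a rough sketch. It is C++ with std::shared_ptr, not Delphi, so treat it as an analogy only: Node, count_by_value and count_by_ref are made-up names for illustration. The idea carries over, though. Passing a counted reference by value bumps the count on every call, while passing it by const reference does not. In Delphi ARC code the comparable habit is declaring object parameters as const.

```cpp
#include <memory>

// Hypothetical payload type, only here so there is something to count.
struct Node {
    int value = 0;
};

// By value: the parameter is a copy of the shared_ptr, so the reference
// count is incremented on entry and decremented again on exit.
long count_by_value(std::shared_ptr<Node> n) {
    return n.use_count();
}

// By const reference: no copy is made, so the count is left untouched.
long count_by_ref(const std::shared_ptr<Node>& n) {
    return n.use_count();
}
```

With a single owner (`auto node = std::make_shared<Node>();`), the by-value call sees a count of 2 for the duration of the call and the by-reference call sees 1. One such increment/decrement pair is cheap, but in a hot loop, or when the count is updated atomically, they add up.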