The 'original' ARM C compiler (Norcroft) certainly did. As other commenters have pointed out, it was extremely powerful and a big performance boost in the early days of ARM processors with shallow, in-order pipelines and small (or no) caches.
Yes, it's basically a function prologue/epilogue in a single instruction. In practice any disassembly of an AArch32 program will have loads of these. Compared with a run of individual LDR/STR instructions, it saves cycles in most cases, and code size as well.
That wasn't the actual design goal, though: Acorn was trying to get away with not shipping a DMA controller, which was quite an expensive piece of kit at the time they were shipping the Archimedes. Having an instruction pair that let you use the registers as a DMA buffer gave them similar performance and saved a lot of per-unit cost.
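To make the "registers as a DMA buffer" idea concrete, here is a rough Python model of a copy loop built from an LDMIA/STMIA pair. It is only a sketch: `ldm_stm_copy`, the dict-based memory and the choice of 8 registers per burst are all made up for illustration, not anything Acorn shipped.

```python
def ldm_stm_copy(mem, src, dst, nwords, regs_per_burst=8):
    """Copy nwords 32-bit words from src to dst, one register-sized burst at a time.

    mem is a toy memory: a dict mapping word-aligned addresses to values.
    Returns the number of LDM/STM pairs that were "executed".
    """
    bursts = 0
    while nwords > 0:
        # Real code would use a shorter register list for the tail;
        # the model just shrinks the burst.
        n = min(regs_per_burst, nwords)
        # LDMIA src!, {...}: load n consecutive words into the "registers".
        staged = [mem[src + 4 * i] for i in range(n)]
        # STMIA dst!, {...}: store the same registers to consecutive words.
        for i, value in enumerate(staged):
            mem[dst + 4 * i] = value
        # Base-register writeback (the "!") advances both pointers.
        src += 4 * n
        dst += 4 * n
        nwords -= n
        bursts += 1
    return bursts


if __name__ == "__main__":
    memory = {0x1000 + 4 * i: i for i in range(20)}
    print(ldm_stm_copy(memory, 0x1000, 0x2000, 20))
```

The point of the model is that each pass moves a whole burst with two instructions instead of one load and one store per word.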
Like, all the time. Bear in mind that PUSH and POP are aliases for STM- and LDM-type instructions, so at the very least they're emitted on function entry and return (exceptions apply).
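For anyone who wants to see the aliasing spelled out: in A32, `PUSH {regs}` is `STMDB sp!, {regs}` and `POP {regs}` is `LDMIA sp!, {regs}`. Below is a toy Python model of those two forms. The function names, the dict-based register file and the example values are my own invention for illustration, and it ignores details like alignment and processor modes.

```python
# Register numbering order: r0..r12, then sp, lr, pc.
REG_ORDER = ["r%d" % i for i in range(13)] + ["sp", "lr", "pc"]


def stmdb_sp(regs, mem, reglist):
    """Model of STMDB sp!, {reglist}, i.e. PUSH {reglist}."""
    # The lowest-numbered register goes to the lowest address.
    ordered = sorted(reglist, key=REG_ORDER.index)
    base = regs["sp"] - 4 * len(ordered)  # decrement before storing
    for i, name in enumerate(ordered):
        mem[base + 4 * i] = regs[name]
    regs["sp"] = base  # writeback


def ldmia_sp(regs, mem, reglist):
    """Model of LDMIA sp!, {reglist}, i.e. POP {reglist}."""
    ordered = sorted(reglist, key=REG_ORDER.index)
    base = regs["sp"]
    for i, name in enumerate(ordered):
        regs[name] = mem[base + 4 * i]
    regs["sp"] = base + 4 * len(ordered)  # increment after loading


def demo():
    """Typical prologue/epilogue pair: push {r4, r5, lr} ... pop {r4, r5, pc}."""
    regs = {"r4": 4, "r5": 5, "sp": 0x8000, "lr": 0x1234, "pc": 0}
    mem = {}
    stmdb_sp(regs, mem, ["r4", "r5", "lr"])  # function entry
    regs["r4"], regs["r5"] = 99, 98          # body clobbers callee-saved regs
    ldmia_sp(regs, mem, ["r4", "r5", "pc"])  # restore and return in one go
    return regs


if __name__ == "__main__":
    print(demo())
```

Note how the epilogue pops the saved `lr` value straight into `pc`, so restoring the registers and returning is a single instruction.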