Don't expect anything magic from mORMot's MoveFast(): perhaps a few percent faster or slower.
On Win32, which is your target, IIRC the Delphi RTL Move() uses x87 registers. On this platform, MoveFast() uses SSE2 registers for small sizes, so it is likely to be slightly faster, and it will leverage ERMSB (Enhanced REP MOVSB, i.e. rep movsb) on newer CPUs which support it.
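To illustrate the size-based strategy, here is a minimal C sketch. It is not mORMot's actual code, which is hand-written asm. Copies of up to 32 bytes use a few fixed-size, possibly overlapping loads and stores, which compilers typically lower to SSE2 moves on x86, and anything larger is handed to the C library memmove(). The names small_move() and fast_move() and the 32-byte threshold are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies n <= 32 bytes, overlap-safe like Delphi Move(): every load happens
   before any store, and the head/tail chunks may overlap instead of looping. */
static void small_move(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(n <= 32);
    if (n >= 16) {
        /* a fixed 16-byte memcpy is typically compiled to one SSE2 load/store */
        unsigned char head[16], tail[16];
        memcpy(head, s, 16);
        memcpy(tail, s + n - 16, 16);
        memcpy(d, head, 16);
        memcpy(d + n - 16, tail, 16);
    } else if (n >= 8) {
        uint64_t head, tail;
        memcpy(&head, s, 8);
        memcpy(&tail, s + n - 8, 8);
        memcpy(d, &head, 8);
        memcpy(d + n - 8, &tail, 8);
    } else if (n >= 4) {
        uint32_t head, tail;
        memcpy(&head, s, 4);
        memcpy(&tail, s + n - 4, 4);
        memcpy(d, &head, 4);
        memcpy(d + n - 4, &tail, 4);
    } else if (n > 0) {
        /* 1..3 bytes: first, middle and last byte cover every position */
        unsigned char a = s[0], b = s[n / 2], c = s[n - 1];
        d[0] = a;
        d[n / 2] = b;
        d[n - 1] = c;
    }
}

/* Hypothetical dispatcher: small sizes take the branchy path above, larger
   sizes defer to the C library memmove(), which on x86 may use vector loops
   or "rep movsb" (ERMSB) depending on the CPU and the library. */
static void fast_move(void *dst, const void *src, size_t n)
{
    if (n <= 32)
        small_move(dst, src, n);
    else
        memmove(dst, src, n);
}
```

The point of the small-size path is to avoid any loop or per-byte branching: two overlapping chunks cover every length in a range, which is the kind of trick that makes SSE2 registers pay off for short moves.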
To be fair, mORMot's asm is more optimized for x86_64 than for i386, because x86_64 is the main target platform for server-side code, which is where optimization matters most.
But I would just try all the FastCode variants: some are very verbose, but they "may" be faster.
What I would do in your case is try not to move any data at all.
Isn't it possible to pre-allocate a set of buffers, then just consume them in a circular way, passing them as pointers from the acquisition methods to the processing methods, with no copy?
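A minimal C sketch of that idea, assuming exactly one acquisition thread and one processing thread. A fixed pool of buffers is allocated once, and only pointers travel through a single-producer/single-consumer ring. The names buffer_ring, ring_begin_write() and so on, and the POOL_COUNT/BUF_SIZE values, are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_COUNT 4    /* hypothetical: number of pre-allocated buffers */
#define BUF_SIZE   4096 /* hypothetical: bytes per acquisition block */

typedef struct {
    unsigned char *slots[POOL_COUNT]; /* allocated once, never copied */
    _Atomic size_t head; /* total buffers published (written by acquisition only) */
    _Atomic size_t tail; /* total buffers released (written by processing only) */
} buffer_ring;

static bool ring_init(buffer_ring *r)
{
    for (size_t i = 0; i < POOL_COUNT; i++) {
        r->slots[i] = malloc(BUF_SIZE);
        if (!r->slots[i]) {
            while (i--)
                free(r->slots[i]); /* undo the allocations that succeeded */
            return false;
        }
    }
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
    return true;
}

static void ring_free(buffer_ring *r)
{
    for (size_t i = 0; i < POOL_COUNT; i++)
        free(r->slots[i]);
}

/* Acquisition side: a free buffer to fill, or NULL if all are still in use. */
static unsigned char *ring_begin_write(buffer_ring *r)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == POOL_COUNT)
        return NULL;
    return r->slots[head % POOL_COUNT];
}

/* Publish the filled buffer: the release store makes its contents visible. */
static void ring_end_write(buffer_ring *r)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
}

/* Processing side: the oldest filled buffer, or NULL if none is ready. */
static unsigned char *ring_begin_read(buffer_ring *r)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head)
        return NULL;
    return r->slots[tail % POOL_COUNT];
}

/* Hand the processed buffer back to the acquisition side for reuse. */
static void ring_end_read(buffer_ring *r)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
}
```

The acquisition code calls ring_begin_write(), fills the returned buffer, then ring_end_write(); the processing code calls ring_begin_read(), works on the very same memory in place, then ring_end_read(). A NULL result only means the other side is lagging behind. A real implementation would probably also store the filled length per slot and decide whether to wait or drop data.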
The fastest move() is ... when there is no move... 🙂