You seem highly concerned about performance implications - so measure the performance. Simulate a million (or whatever constitutes a significant amount) receipts of randomized sample data and measure your few possible implementations. If you find very little difference, then pick the one that is easiest for your team to extend and maintain.
If your original source is TBytes, and perhaps your output source is TBytes, then introducing any conversions seems wasteful, but gut feelings should only act as a general guide. You either measure or you keep guessing. If you keep guessing, you may accidently determine if it's (currently) providing acceptable performance.