It does complicate things, but yes, as you suggested, you could have a structure that contains both a TStream (for persistence) and a TBitmap (for display). The TBitmap would be created and loaded from the TStream once, on demand, the first time the image needs to be displayed. This is also the technique used internally by many VCL image containers (e.g. TGIFImage, TPNGImage, TJPEGImage).
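Here is a minimal, language-neutral sketch of that pattern. It is written in Python rather than Delphi, and the class and member names are hypothetical. The compressed `bytes` play the role of the TStream, the decoded buffer plays the role of the TBitmap, and `zlib` stands in for the real image codec (GIF/PNG/JPEG).

```python
import zlib
from typing import Optional


class LazyImage:
    """Hypothetical stand-in for a TStream + TBitmap pair: the compressed
    data is always kept, the decoded pixels are built only when needed."""

    def __init__(self, compressed: bytes) -> None:
        self.compressed = compressed          # like the TStream: kept for persistence
        self._pixels: Optional[bytes] = None  # like the TBitmap: created on demand
        self.decode_count = 0                 # how many times decoding has actually run

    @property
    def pixels(self) -> bytes:
        # Decode once, on first access, then reuse the cached buffer.
        if self._pixels is None:
            self._pixels = zlib.decompress(self.compressed)
            self.decode_count += 1
        return self._pixels

    def release_pixels(self) -> None:
        # Drop the decoded buffer; it can be rebuilt from the compressed data later.
        self._pixels = None

    def save(self) -> bytes:
        # Persistence only needs the compressed data, never the decoded pixels.
        return self.compressed
```

Images that are never displayed never pay for a decoded buffer; `release_pixels` is an optional extra (an assumption, not part of the suggestion above) for freeing bitmaps that scroll out of view.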
At least that would mean you only "wasted" memory on the bitmaps that were actually displayed.
Not really wasted. The data in the stream is compressed and probably much smaller than the decompressed image data, so keeping the stream alongside the bitmap adds relatively little.
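To put rough numbers on that: an uncompressed bitmap needs width × height × bytes-per-pixel, so a 1920 × 1080 image at 32 bits per pixel takes 1920 × 1080 × 4 = 8,294,400 bytes (about 7.9 MiB) regardless of content. The sketch below compares that with a compressed copy. It uses a hypothetical synthetic gradient and `zlib` rather than a real GIF/PNG/JPEG file, so the actual ratio for your images will depend on their content and format.

```python
import zlib


def bitmap_size(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Bytes needed for an uncompressed bitmap of the given dimensions."""
    return width * height * bytes_per_pixel


def synthetic_gradient(width: int, height: int) -> bytes:
    """Hypothetical test image: a horizontal gray gradient, 4 bytes per pixel (BGRA)."""
    row = bytearray()
    for x in range(width):
        v = (x * 255) // max(width - 1, 1)
        row += bytes((v, v, v, 255))
    return bytes(row) * height  # every row is identical


w, h = 256, 256
raw = synthetic_gradient(w, h)   # what the decoded "bitmap" would occupy
packed = zlib.compress(raw, 9)   # what the "stream" would hold
print(f"decoded: {len(raw)} bytes, compressed: {len(packed)} bytes")
```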