Tommi Prami

32bit RGBA TBitmap to RGB byte stream.


Hello,

One piece of code we use (a 3rd-party component) takes bitmap data and writes the RGB values to a stream.

On a large bitmap this takes quite a long time, due to the sheer number of pixels to go through.

The original code was even slower because it used the Pixels[] property.

Now it uses ScanLine.

There are now two different buffering strategies (collect the data into a byte array and write that to the stream once in a while), but changes to the buffering can only go so far.

I was just pondering: could there be any weird bit-fiddling tricks etc. to get that RGBA -> RGB byte triplet faster? That is the most common operation anyhow. I mean this:

 

      LLine := ASrcmap.Scanline[LY];
      for LX := 0 to xdim - 1 do
      begin
        bbuff[BP] := LLine[LX].R; // RGBColor^.red;
        bbuff[BP + 1] := LLine[LX].G; // RGBColor^.green;
        bbuff[BP + 2] := LLine[LX].B; // RGBColor^.blue;
        Inc(BP, 3);
      end;


Any ideas?


One thing that could make this significantly faster would be to first resample the image to a smaller size with some fast but good-enough-quality algorithm. That might or might not be possible, depending on how large a change it would be, and the resampling would certainly have to be super fast.

But if it is possible without changing the original bitmap, that would be cool.

 

-Tee-


Before doing any deep optimizations, run a benchmark to make sure that the serialization really is the source of the slowdown. Otherwise you could spend hours and achieve nothing in the end.
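A minimal sketch of such a measurement, assuming Delphi's TStopwatch from System.Diagnostics is available. ConvertBitmapToStream is a hypothetical stand-in for the real conversion call, not part of the component:

```pascal
program BenchSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

// Hypothetical placeholder: replace the body with the real conversion call.
procedure ConvertBitmapToStream;
begin
  Sleep(10); // simulated work so the sketch runs on its own
end;

var
  SW: TStopwatch;
begin
  SW := TStopwatch.StartNew;   // starts timing immediately
  ConvertBitmapToStream;
  SW.Stop;
  Writeln('Conversion took ', SW.ElapsedMilliseconds, ' ms');
end.
```

Timing the whole call first, and then individual parts of it, shows whether the pixel loop or something around it dominates.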


I can't see how resampling would make it any faster unless your streaming implementation really sucks. Resampling would mean that you'd have to read all the pixel data, juggle it around, store it in a new buffer and then read from that buffer instead. Considerably more expensive than whatever solution you can come up with that just reads the data via Scanline.

 

You haven't shown how your RGBA and RGB types are declared, but assuming the R-G-B ordering is the same and A is the last (i.e. high) byte, just read 4 bytes (that's a DWORD) from the source and write 3 bytes. Rinse, repeat.

 

If the source is ABGR and the destination is RGB (e.g. TColor) then you can rearrange the bits like this:

function ABGR2RGB(ABGR: DWORD): TColor;
begin
  Result := ((ABGR and $00FF0000) shr 16) or (ABGR and $0000FF00) or ((ABGR and $000000FF) shl 16);
end;

or in assembler:

function ABGR2RGB(ABGR: DWORD): TColor;
asm
  mov EAX, ECX // Remove this for 32-bit

  rol EAX, 8
  xor AL, AL
  bswap EAX
end;
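To make the bit shuffling concrete, here is a small worked example, assuming the source DWORD carries alpha in its top byte. The name SwapRedBlueDropAlpha and the sample value are made up for illustration; the expression itself is the same as in the Pascal version above, only typed as Cardinal so that it needs no VCL units:

```pascal
program SwapDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Illustrative name; swaps byte 0 and byte 2 and clears byte 3.
function SwapRedBlueDropAlpha(Value: Cardinal): Cardinal;
begin
  Result := ((Value and $00FF0000) shr 16)   // byte 2 moves down to byte 0
         or (Value and $0000FF00)            // byte 1 stays in place
         or ((Value and $000000FF) shl 16);  // byte 0 moves up to byte 2
  // Byte 3 (alpha) is masked out by all three terms, so it ends up zero.
end;

begin
  // $AA is alpha; $11, $22 and $33 are the three colour bytes.
  Writeln(IntToHex(SwapRedBlueDropAlpha($AA112233), 8)); // prints 00332211
end.
```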

 

18 hours ago, Fr0sT.Brutal said:

Before doing any deep optimizations, run a benchmark to make sure that the serialization really is the source of the slowdown. Otherwise you could spend hours and achieve nothing in the end.

On a large bitmap this takes minutes, so I am pretty sure this is a place where all speedups are welcome.

13 hours ago, Anders Melander said:

I can't see how resampling would make it any faster unless your streaming implementation really sucks. Resampling would mean that you'd have to read all the pixel data, juggle it around, store it in a new buffer and then read from that buffer instead. Considerably more expensive than whatever solution you can come up with that just reads the data via Scanline.

 

You haven't shown how your RGBA and RGB types are declared, but assuming the R-G-B ordering is the same and A is the last (i.e. high) byte, just read 4 bytes (that's a DWORD) from the source and write 3 bytes. Rinse, repeat.

 

If the source is ABGR and the destination is RGB (e.g. TColor) then you can rearrange the bits like this:


function ABGR2RGB(ABGR: DWORD): TColor;
begin
  Result := ((ABGR and $00FF0000) shr 16) or (ABGR and $0000FF00) or ((ABGR and $000000FF) shl 16);
end;

or in assembler:


function ABGR2RGB(ABGR: DWORD): TColor;
asm
  mov EAX, ECX // Remove this for 32-bit

  rol EAX, 8
  xor AL, AL
  bswap EAX
end;

 

Would that still be 4 bytes? Right? 

Ah, should learn how to read first 🙂


The input is laid out as usual for a TBitmap with 32-bit pixels:


  TRGB32 = packed record
    B, G, R, A: Byte;
  end;

 

and the output should be a stream of RGB bytes, in that order.
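With that layout the conversion has to reorder the bytes rather than just drop the fourth one, because the bitmap stores B, G, R, A while the stream wants R, G, B. A minimal sketch using pointer increments instead of array indexes; BGRAToRGB and PRGB32 are names invented for the example, and the caller is assumed to provide a destination buffer of at least Count * 3 bytes:

```pascal
program BGRAToRGBSketch;
{$APPTYPE CONSOLE}

type
  TRGB32 = packed record
    B, G, R, A: Byte;
  end;
  PRGB32 = ^TRGB32;

// Hypothetical helper: copies Count pixels from Src (B,G,R,A in memory)
// to Dst as R,G,B triplets. Dst must have room for Count * 3 bytes.
procedure BGRAToRGB(Src: PRGB32; Dst: PByte; Count: Integer);
var
  I: Integer;
begin
  for I := 1 to Count do
  begin
    Dst^ := Src^.R; Inc(Dst);
    Dst^ := Src^.G; Inc(Dst);
    Dst^ := Src^.B; Inc(Dst);
    Inc(Src); // advances by SizeOf(TRGB32), i.e. 4 bytes; alpha is skipped
  end;
end;

var
  Pixels: array[0..1] of TRGB32;
  Output: array[0..5] of Byte;
begin
  Pixels[0].B := 1; Pixels[0].G := 2; Pixels[0].R := 3; Pixels[0].A := 255;
  Pixels[1].B := 4; Pixels[1].G := 5; Pixels[1].R := 6; Pixels[1].A := 255;
  BGRAToRGB(@Pixels[0], @Output[0], Length(Pixels));
  Writeln(Output[0], ' ', Output[1], ' ', Output[2], ' ',
          Output[3], ' ', Output[4], ' ', Output[5]); // prints 3 2 1 6 5 4
end.
```

Filling a whole row (or more) of output this way and writing it to the stream with a single call keeps the number of stream writes low.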

 

.tee.


Thanks everyone, so far. I'll have to check on this later.

 

I'll stress that this is part of a 3rd-party component, which we can't totally rewrite. This process sometimes takes too much time, so we would like to speed it up a bit if we can.

I was pondering whether I could define a 4-byte array and use the absolute trick to map that array onto the result of the method shown by Anders above.

I am still pretty much asleep, so every idea I get for implementing this seems to need too much code. I bet there is an elegant solution, possibly using pointers, which I am not too good at. But I'll have to try later.
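A rough sketch of that absolute idea, assuming a little-endian CPU (x86/x64). The variable names are invented for the example; the byte array simply overlays the same memory as the Cardinal, so no copying is involved:

```pascal
program AbsoluteSketch;
{$APPTYPE CONSOLE}

var
  Value: Cardinal;
  Bytes: array[0..3] of Byte absolute Value; // shares its memory with Value
begin
  // Pretend this is a pixel already rearranged to $00BBGGRR,
  // here with B = 1, G = 2, R = 3.
  Value := $00010203;
  // On little-endian machines the lowest byte of Value is Bytes[0].
  Writeln(Bytes[0], ' ', Bytes[1], ' ', Bytes[2]); // prints 3 2 1
end.
```

The first three bytes are then R, G, B in stream order, and the fourth (zeroed alpha) byte is simply not written.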

 

-Tee-

38 minutes ago, Tommi Prami said:

On a large bitmap this takes minutes

Please define what a large bitmap is for you.

1 hour ago, Tommi Prami said:

On a large bitmap this takes minutes, so I am pretty sure this is a place where all speedups are welcome.

Are you absolutely sure? What happens if you comment out the copying and leave only the ScanLine call?


How often do you access the ScanLine property? I found that it is much faster to get the address of the first line, calculate the offset between lines and add (or subtract) the offset to get the other lines.

Also, pointer incrementation is much faster than using an array with indexes.

On top of that, make sure to disable range checking in the release code.

There is some code that does it in u_dzGraphicsUtils in my dzlib. If I remember correctly I blogged about it too.

 

Edit: Yes I did: https://blog.dummzeuch.de/2019/12/12/accessing-bitmap-pixels-with-less-scanline-calls-in-delphi/
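A minimal sketch of the approach described above, assuming a pf32bit bitmap with the usual B, G, R, A byte order. WriteRGB, TRGB32 and PRGB32 are names invented for the example, and this is not the code from dzlib: ScanLine is called at most twice, the distance between rows is taken from the first two row addresses, and each converted row goes to the stream in one write.

```pascal
program ScanLineOnceSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, Vcl.Graphics;

type
  TRGB32 = packed record
    B, G, R, A: Byte;
  end;
  PRGB32 = ^TRGB32;

// Hypothetical helper: writes a pf32bit bitmap to AStream as R, G, B bytes.
procedure WriteRGB(ABitmap: TBitmap; AStream: TStream);
var
  Row: PByte;
  Src: PRGB32;
  Dst: PByte;
  Stride: NativeInt;
  RowBuf: TBytes;
  X, Y: Integer;
begin
  if (ABitmap.PixelFormat <> pf32bit) or (ABitmap.Width = 0) or
     (ABitmap.Height = 0) then
    Exit;
  Row := ABitmap.ScanLine[0];
  // Byte distance from one row to the next; negative for bottom-up DIBs.
  Stride := 0;
  if ABitmap.Height > 1 then
    Stride := NativeInt(ABitmap.ScanLine[1]) - NativeInt(Row);
  SetLength(RowBuf, ABitmap.Width * 3); // one output row, written in one go
  for Y := 0 to ABitmap.Height - 1 do
  begin
    Src := PRGB32(Row);
    Dst := @RowBuf[0];
    for X := 0 to ABitmap.Width - 1 do
    begin
      Dst^ := Src^.R; Inc(Dst);
      Dst^ := Src^.G; Inc(Dst);
      Dst^ := Src^.B; Inc(Dst);
      Inc(Src); // next 4-byte source pixel
    end;
    AStream.WriteBuffer(RowBuf[0], Length(RowBuf));
    Inc(Row, Stride); // step to the next row without calling ScanLine again
  end;
end;

var
  Bmp: TBitmap;
  Stream: TMemoryStream;
begin
  Bmp := TBitmap.Create;
  Stream := TMemoryStream.Create;
  try
    Bmp.PixelFormat := pf32bit;
    Bmp.SetSize(2, 2);
    WriteRGB(Bmp, Stream);
    Writeln(Stream.Size); // 2 * 2 pixels * 3 bytes = 12
  finally
    Stream.Free;
    Bmp.Free;
  end;
end.
```

The row pointers stay valid only as long as the bitmap is not modified or resized in between, so the whole conversion should happen in one go.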

2 hours ago, FPiette said:

Please define what a large bitmap is for you.

The customer had one bigger than 5000x3000, which is way, way too big, but that just brought this piece of code to my attention.

46 minutes ago, dummzeuch said:

How often do you access the ScanLine property? I found that it is much faster to get the address of the first line, calculate the offset between lines and add (or subtract) the offset to get the other lines.

Also, pointer incrementation is much faster than using an array with indexes.

On top of that, make sure to disable range checking in the release code.

There is some code that does it in u_dzGraphicsUtils in my dzlib. If I remember correctly I blogged about it too.

 

Edit: Yes I did: https://blog.dummzeuch.de/2019/12/12/accessing-bitmap-pixels-with-less-scanline-calls-in-delphi/

Thanks, I'll have a look...

 

7 minutes ago, Tommi Prami said:

The customer had one bigger than 5000x3000, which is way, way too big, but that just brought this piece of code to my attention.

So we have 15 million pixels. Now let's assume simple, naive assembly handling this pixel by pixel in a loop (the loop itself should also be assembly), and I agree with Thomas on how this should be done (I do the same): one ScanLine call per bitmap. Naive assembly with the general instruction set can take 3 cycles per pixel at most, including the loop. That does not consider the memory bottleneck, and there will be one: cache hits and misses, and fetching. Anyway, 45 million cycles might be achievable, which means that converting on a 3 GHz CPU will take less than a second, even after adding the memory access overhead. The same memory overhead will be there with MMX or SIMD, but with these you can do many pixels per cycle (maybe 32 or more).
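Spelled out, and taking the 3 cycles per pixel and the 3 GHz clock as given while ignoring memory stalls, the estimate works out as:

\[
5000 \times 3000 = 1.5 \times 10^{7}\ \text{pixels}, \qquad
1.5 \times 10^{7} \times 3 = 4.5 \times 10^{7}\ \text{cycles}, \qquad
\frac{4.5 \times 10^{7}\ \text{cycles}}{3 \times 10^{9}\ \text{cycles/s}} = 0.015\ \text{s}
\]

So the pure compute cost is on the order of milliseconds, which leaves a wide margin for memory overhead before reaching a full second.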

 

If you want us to have fun, then please post some small code that really pinpoints the bottleneck, with a test for its correctness, and let us have our fun! Of course, only if assembly is on the table.

 

PS: I don't quite understand the goal: is it to convert 32-bit RGBA to 24-bit RGB, or just to store the image (or send it over the network) and save space?

12 minutes ago, Tommi Prami said:
3 hours ago, FPiette said:

Please define what a large bitmap is for you.

The customer had one bigger than 5000x3000

This is not what I call a large bitmap. 15 megapixels is a normal size for a picture. Most of today's cameras produce much larger images. My Sony A7III, which is a mid-range camera, produces 6000x4000 pixel images, while a Sony A7RIV produces 9504x6336 pixel images. I developed radiography software where images can be much larger still.


I did a quick & dumb test which showed that 100 ScanLine calls on a 5000*5000 bitmap take 5 seconds (!) because the bitmap is recreated on every call. So this is the real handbrake.

Looking at TBitmap.GetScanLine, you can extract the necessary parts, provided you have the pointer to the 1st row from an initial ScanLine call.

The BytesPerScanline helper function is public, so this won't even be a hack.
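A minimal sketch of that calculation, assuming a bottom-up DIB (the usual case for a TBitmap), where row 0 has the highest address and every following row lies one stride lower in memory. RowPointer is a hypothetical helper, not a VCL function; the loop cross-checks it against ScanLine so that a top-down bitmap would be noticed:

```pascal
program RowAddressSketch;
{$APPTYPE CONSOLE}

uses
  Vcl.Graphics;

// Hypothetical helper: address of row Y, computed from the pointer that a
// single ScanLine[0] call returned. Assumes a bottom-up DIB.
function RowPointer(FirstRow: PByte; Width, BitsPerPixel, Y: Integer): PByte;
var
  Stride: NativeInt;
begin
  // Rows are padded to a 32-bit boundary, hence the alignment of 32.
  Stride := BytesPerScanline(Width, BitsPerPixel, 32);
  Result := FirstRow;
  Dec(Result, Y * Stride);
end;

var
  Bmp: TBitmap;
  First: PByte;
  Y: Integer;
begin
  Bmp := TBitmap.Create;
  try
    Bmp.PixelFormat := pf32bit;
    Bmp.SetSize(5, 4);
    First := Bmp.ScanLine[0];
    // Cross-check the computed addresses against ScanLine itself.
    for Y := 0 to Bmp.Height - 1 do
      if RowPointer(First, Bmp.Width, 32, Y) <> Bmp.ScanLine[Y] then
        Writeln('Row ', Y, ' differs; the bitmap is probably top-down');
  finally
    Bmp.Free;
  end;
end.
```

For 32-bit pixels the stride is simply Width * 4; the alignment only matters for formats such as 24-bit, where rows can carry padding bytes.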

 

On 1/11/2021 at 10:55 AM, Tommi Prami said:

One piece of code we use (a 3rd-party component) takes bitmap data and writes the RGB values to a stream.

What's the purpose of copying the bitmap to a (linear) stream?
This sounds as if it's for saving to disk.

 


 

1 hour ago, Kas Ob. said:

If you want us to have fun, then please post some small code that really pinpoints the bottleneck, with a test for its correctness, and let us have our fun! Of course, only if assembly is on the table.

A bit premature, wouldn't you say? If you consider the rest of the pipeline, then using optimized assembly for this will not make any significant difference.

 

1 hour ago, Fr0sT.Brutal said:

I did a quick & dumb test which showed that 100 ScanLine calls on a 5000*5000 bitmap take 5 seconds (!) because the bitmap is recreated on every call. So this is the real handbrake.

Looking at TBitmap.GetScanLine, you can extract the necessary parts, provided you have the pointer to the 1st row from an initial ScanLine call.

The BytesPerScanline helper function is public, so this won't even be a hack.

Good point.

I think I would just not use TBitmap for this and either create a DIB directly or use a TBitmap32 from Graphics32 (with a memory backend).

19 hours ago, FPiette said:

This is not what I call a large bitmap. 15 megapixels is a normal size for a picture. Most of today's cameras produce much larger images. My Sony A7III, which is a mid-range camera, produces 6000x4000 pixel images, while a Sony A7RIV produces 9504x6336 pixel images. I developed radiography software where images can be much larger still.

Not huge, but large. But this is for a 7x3 cm logo on a print, so it is overkill for that.

But that makes this piece of code even worse 🙂

18 hours ago, Rollo62 said:

What's the purpose of copying the bitmap to a (linear) stream?
This sounds as if it's for saving to disk.

 

Yes, it is saved to file...

19 hours ago, Fr0sT.Brutal said:

I did a quick & dumb test which showed that 100 ScanLine calls on a 5000*5000 bitmap take 5 seconds (!) because the bitmap is recreated on every call. So this is the real handbrake.

Should always study the code one is calling 🙂 I had always thought that it would just return a pointer to the data, offset depending on the line you access. Good to learn new things.
 

1 hour ago, Tommi Prami said:

Should always study the code one is calling 🙂

That's true but not always possible. A more essential lesson is that when one encounters a slowdown, it's wise to track down what exactly the cause is. One doesn't even need timers and so on; it's enough to just comment out fragments and see what changes.

1 hour ago, Tommi Prami said:
20 hours ago, Rollo62 said:

What's the purpose of copying the bitmap to a (linear) stream?
This sounds as if it's for saving to disk.

Yes, it is saved to file...

Saved in which file format? Why are you not telling us the full story? It looks to me like an XY problem.

1 hour ago, FPiette said:

Saved in which file format? Why are you not telling us the full story? It looks to me like an XY problem.

I think this is pretty clear from the caption, or at least I think it is pretty self-explanatory: a stream of bytes in RGB order. What happens after that is another story altogether.
