Renate Schaaf

Parallel Resampling of (VCL-) Bitmaps


55 minutes ago, Anders Melander said:

Please verify that the comments I've added in the source are correct

Correct and very clear.

 

I like the introduction of the MappingTablePrecision... constants.


I might have introduced a bug in GR32_Resamplers: as it stands, the left bound of the source rectangle is ignored. The fix is simple:

 

Line 1778 needs to be

 

SourceColor := @Src.Bits[ClusterY[0].Pos * Src.Width+SrcRect.Left];  //+SrcRect.Left was missing!

and line 1806:

        SourceColor := @Src.Bits[ClusterY[Y].Pos * Src.Width+SrcRect.Left];//+SrcRect.Left was missing!

Hope you read this, Anders. If I don't hear from you, I'll create an issue on GitHub.

 

Edit: I definitely introduced it by changing the order of the loops; I checked against an old version. Instead of

+SrcRect.Left

one should probably use

+MapXLoPos

 

Renate

Edited by Renate Schaaf

Hi Anders,

Just tried the new version of Graphics32 and found that the downscaling with Box looks as cr***y as before we changed the radius to 0.5, which IS the logically correct value, since the box function has support [-0.5, 0.5]. I can't see right now what goes wrong with the upscaling; it must be something different.
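For reference, a support of [-0.5, 0.5] means the box kernel spans exactly one source pixel, which is why radius 0.5 is the logically correct value: upscaling degenerates to nearest neighbour and downscaling to plain averaging. A sketch of how the contributor weights fall out of that (hypothetical helper, not the GR32/uScale code):

```python
import math

def box(x):
    # box kernel with support [-0.5, 0.5) (half-open to avoid double counting)
    return 1.0 if -0.5 <= x < 0.5 else 0.0

def box_weights(dst_x, scale):
    # normalized weights of the source pixels contributing to one
    # destination sample; the kernel widens by 1/scale when downscaling
    center = (dst_x + 0.5) / scale - 0.5
    r = 0.5 / min(scale, 1.0)
    lo, hi = math.floor(center - r), math.ceil(center + r)
    w = {i: box((i - center) / (2.0 * r)) for i in range(lo, hi + 1)}
    total = sum(w.values())
    return {i: v / total for i, v in w.items() if v > 0.0}
```

Downscaling by 2 gives each destination pixel equal weights of 0.5 on two source pixels; upscaling gives a single weight of 1.0, i.e. nearest neighbour.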

Anyway, I don't see any problems with upscaling in my code; I just tried it with a factor of 20.

 

You did a lot of work on Graphics32; I will have a closer look.

 

Renate

6 hours ago, FreeDelphiPascal said:

Hi. Do you have something similar but for parallel jpeg decoding?

Sorry, no, but it sounds like a good idea. Naively, that is; I have no idea how parallelizable JPEG decoding is 🙂

1 hour ago, Renate Schaaf said:

I have no idea how parallelizable JPEG decoding is

Most modern JPEGs require sequential decompression due to the compression algorithms used (decompression of a block is based on the result of the previous block), so there's not much to parallelize.

 

JPEGs with lots of restart markers in the compression stream (a restart marker means that the result of the previous blocks isn't needed) would benefit from parallelization, but it is my understanding that those have become very rare, as the problem they were meant to solve (data corruption during download via modem) no longer exists.
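For the curious: restart markers are the two-byte sequences FF D0 through FF D7 (RST0..RST7). A naive scan can show whether a file contains any (a sketch, not a real JPEG parser; it can misfire on stray marker-like bytes):

```python
def has_restart_markers(data: bytes) -> bool:
    # look for FF D0 .. FF D7 (RST0..RST7) byte pairs anywhere in the stream
    return any(data[i] == 0xFF and 0xD0 <= data[i + 1] <= 0xD7
               for i in range(len(data) - 1))
```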


Hi Anders,

Thanks for explaining. I had a feeling that the compression is too "global" for parallelization. But ...

From what I have read in the meantime, it seems that parts of the decompression could be done in parallel.

This link is about compression, but couldn't it apply to decompression too? (Not that I know anything about it 🙂)

https://stackoverflow.com/questions/61850421/how-to-perform-jpeg-encoding-of-a-big-rgb-image-in-parallel

Anyway, there are research papers which claim that they got a speedup from doing the decoding partly in parallel.

3 minutes ago, Renate Schaaf said:

This link is about compression, but couldn't it apply to decompression too? (Not that I know anything about it 🙂)

https://stackoverflow.com/questions/61850421/how-to-perform-jpeg-encoding-of-a-big-rgb-image-in-parallel

Yes, there will of course always be some parts that can be parallelized, but the problem is that the expensive part, the Huffman decoding, cannot be.

 

6 minutes ago, Renate Schaaf said:

Anyway, there are research papers which claim that they got a speedup from doing the decoding partly in parallel.

I'm guessing they used "cooked" jpegs because there's really not much magic that can be done here.

 

I think the effort is better spent on using SSE, AVX, or the GPU to decode - which is also what I believe most high-performance decoders do.

9 minutes ago, Anders Melander said:

I'm guessing they used "cooked" jpegs because there's really not much magic that can be done here.

OK, I'll stop thinking about it. Time to get some sleep 🙂


Sorry, my question was maybe not very clear. I am talking about decoding multiple JPEG files in parallel,
maybe in a pool of threads equal to the number of cores...

39 minutes ago, FreeDelphiPascal said:

My question was maybe not very clear.

Oh, you think? :classic_dry:

 

39 minutes ago, FreeDelphiPascal said:

I am talking about decoding multiple JPG files in parallel.
Maybe in a pool of threads equal to the number of cores... 

Yes, of course you can do that.

You don't need a special library to decode a jpeg in a thread.
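Indeed, that is plain data parallelism: each file is independent, so a worker pool scales with the core count. A minimal sketch of the shape (Python here for brevity; decode_file is a placeholder for whatever decoder you actually call — with a native decoder that releases the GIL, threads are fine, otherwise use processes):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def decode_file(path):
    # placeholder: call your actual JPEG decoder here and return the pixels
    with open(path, "rb") as f:
        return f.read()

def decode_all(paths):
    # one worker per core; each file is decoded independently
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        return list(pool.map(decode_file, paths))
```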

22 hours ago, chmichael said:

Just curious, has anyone tried Skia for resampling?

I did a quick test with the demo of the FMX version of my resampler, just doing "Enable Skia" on the project.

In the demo I compare my results to TCanvas.DrawBitmap with HighSpeed set to false.

I see that the Skia canvas is being used, and that HighSpeed=False results in the Skia resampling being set to

SkSamplingOptionsHigh  : TSkSamplingOptions = (UseCubic: True; Cubic: (B: 1 / 3; C: 1 / 3); Filter: TSkFilterMode.Nearest; Mipmap: TSkMipmapMode.None);

So, some form of cubic resampling, if I read that right.

 

Result:

Timing is slightly slower than native FMX drawing, but still a lot faster than my parallel resampling.

I see no improvement in quality over plain FMX, which supposedly uses bilinear resampling with this setting.

Here are two results. (How do you make this browser use the original pixel size? This is scaled!)

[Images: SkiaCubic.png, SkiaCubic2.png]

This doesn't look very cubic to me. As a comparison, here are the results of my resampler using the bicubic filter:

[Images: uScaleFMXBicubic.png, uScaleFMXBicubic2.png]

 

I might not have used Skia to its best advantage.

 

Renate


I just uploaded a new version to https://github.com/rmesch/Parallel-Bitmap-Resampler

 

Newest addition: a parallel unsharp mask using Gaussian blur. It can be used to sharpen or blur images.

A dedicated VCL demo, "Sharpen.dproj", is included. For FMX the effect can be seen in the thumbnail-viewer demo (ThreadsInThreadsFMX.dproj).
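For anyone unfamiliar with the technique, the unsharp-mask principle is: blur the image, then push each pixel away from its blurred value. A toy 1-D sketch (Python, illustrative only, not the library's code):

```python
import math

def gauss_kernel(radius, sigma):
    # sampled Gaussian, normalized so the weights sum to 1
    w = [math.exp(-(k * k) / (2.0 * sigma * sigma))
         for k in range(-radius, radius + 1)]
    s = sum(w)
    return [x / s for x in w]

def blur(row, kernel):
    r = len(kernel) // 2
    out = []
    for i in range(len(row)):
        # clamp indices at the edges (replicate border pixels)
        acc = sum(kernel[k + r] * row[min(max(i + k, 0), len(row) - 1)]
                  for k in range(-r, r + 1))
        out.append(acc)
    return out

def unsharp(row, radius=2, sigma=1.0, amount=1.0):
    # push each pixel away from its blurred value, then clamp to 0..255
    blurred = blur(row, gauss_kernel(radius, sigma))
    return [min(255.0, max(0.0, p + amount * (p - b)))
            for p, b in zip(row, blurred)]
```

Flat areas are untouched (pixel equals its blurred value), while edges get pushed apart, which is what reads as sharpening.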

 

This is for the "modern" version, 10.4 and up.

 

I haven't ported the unsharp mask to the legacy version (Delphi 2006 and up) yet; it requires more work, but I plan on doing it.

 

Renate

43 minutes ago, Renate Schaaf said:

Newest addition: a parallel unsharp-mask using Gaussian blur. Can be used to sharpen or blur images.

Have you benchmarked this against some of the existing Gaussian blur implementations?

 

It's a bit difficult to decode the algorithm you use due to the lack of comments in the source, but it appears you are just applying a Gaussian kernel (with some additional logic), and that approach is usually quite slow.

 

I have a benchmark suite that compares the performance and fidelity of 8 different implementations. I'll try to find time to integrate your implementation into it.

 

With regard to the ratio between Radius and Sigma, it's my understanding that:

Ratio = 1 / FWHM (Full Width at Half Maximum)
      = 1 / (2 * Sqrt(2 * Ln(2)))
      = 0.424660891294479

But you have a ratio of 0.5

Have I misunderstood something?

43 minutes ago, Anders Melander said:

But you have a ratio of 0.5

I took sigma = 0.2*Radius, but it's easy to change that to something more common. I just took a value for which the integral is very close to 1. With respect to other implementations, I'm ready to learn. I just implemented it as accurately as I could think of without being overly slow. Performance is quite satisfying to me, but I bet with your input it'll get faster 🙂

9 hours ago, Anders Melander said:

I have a benchmark suite that compares the performance and fidelity of 8 different implementations. I'll try to find time to integrate your implementation into it.

Hi Anders,

It's great that you're thinking of it, but hold off on that for a bit. I noticed that I compute the weights in a horrendously stupid way. The weights are mostly identical; it's not like when you resample, dumb me. Taking care of that reduces memory usage by a lot, and the subsequent application of the weights becomes much faster.

I've also changed the sigma-to-radius ratio a bit according to your suggestion. I find it hard to make results look nice with the cutoff at half the max value, so I changed it to 10^-2 times the max value. But this still allows for smaller radii, and it becomes again a bit faster.

So, before you do anything, I would like to finish these changes and also comment the code a bit more. (Forces me to really understand what I'm doing 🙂)
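The point about the weights can be made concrete: for a pure blur, the weight of a source pixel depends only on its offset from the destination pixel, so one kernel can be computed once and reused for every pixel, whereas in resampling each destination pixel has its own sub-pixel phase and therefore its own weight set. A sketch (hypothetical names, not the repo's code):

```python
import math

def blur_weight(j, i, sigma):
    # weight of source pixel i in the blurred value of pixel j
    return math.exp(-((i - j) ** 2) / (2.0 * sigma * sigma))

# the weight depends only on the offset i - j,
# so a single precomputed kernel serves every pixel of the image
kernel = [blur_weight(0, k, 1.5) for k in range(-3, 4)]
```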

Edited by Renate Schaaf


New version at https://github.com/rmesch/Parallel-Bitmap-Resampler:

 

It has more efficient code for the unsharp mask, and I added more comments in the code to explain what I'm doing.

 

Procedures with explaining comments:

uScaleCommon.Gauss

uScaleCommon.MakeGaussContributors

uScaleCommon.ProcessRowUnsharp

 

and see type TUnsharpParameters in uScale.pas.

 

Would it be a good idea to overload the UnsharpMask procedure to take sigma instead of radius as a parameter? It might be easier for comparison with other implementations.

On 10/3/2023 at 1:23 AM, Anders Melander said:

Have you benchmarked this against some of the existing Gaussian blur implementations?

OK, I plugged my unsharp mask into the Blurs example of GR32. Doing so made me aware of the need to do gamma correction when you mix colors. So I implemented that, but see below.

Also, I finally included options to properly handle the alpha-channel for the sharpen/blur.

The repo at GitHub has been updated with these changes.

 

Results:

  Quality: My results seem a tad brighter; otherwise I could see no difference between Gaussian and Unsharp.

  Performance:
    Unthreaded routine: for radii up to 8, Unsharp is on par with FastGaussian; after that, FastGaussian is the clear winner.
    Threaded routine: always the fastest.

 

If anybody is interested, I am attaching the test project. It of course requires GR32 to be installed. It also requires Delphi 10.3 or higher, I guess.

 

Gamma-correction:

  I did it via an 8-bit table, the same as GR32. This seems very imprecise to me, but I wouldn't know how to make it any more precise other than operating with floats, no thanks.
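The precision problem with an 8-bit table can be made concrete with a small sketch (Python, plain power gamma; hypothetical values, not the GR32 tables):

```python
GAMMA = 2.2  # plain power gamma for illustration; GR32 uses its own tables

# 8-bit lookup tables: gamma-encoded value -> linear and back
to_linear = [round(255 * (i / 255) ** GAMMA) for i in range(256)]
from_linear = [round(255 * (i / 255) ** (1.0 / GAMMA)) for i in range(256)]

# round-tripping through 8 bits collapses many dark values onto the
# same table entry, which is where banding can creep in
roundtrip = [from_linear[to_linear[i]] for i in range(256)]
```

With this particular gamma, the inputs 0..14 all land on linear 0, so distinct dark shades become indistinguishable once the blur mixes them in linear space.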

  Sadly, this can produce visible banding in some images, no matter which blur is used. Here is an example (for uploading, all images have been compressed, but the effect is about the same):

[Image: OriginalR.jpg]

Original, a cutout from a picture taken with my digital camera.

[Image: GaussianR_40.jpg]

Result of Gaussian with Radius = 40 and Gamma = 1.6

 

When gamma-correction is used for sharpening, bright edge-artifacts are reduced, but dark edge-artifacts are enhanced. My conclusion right now would be to not use gamma-correction.

But if anybody has an idea for how to implement it better, I'm all ears.

 

Thanks,

Renate

BlurTest.zip

10 minutes ago, Renate Schaaf said:

Performance: Unthreaded routine: For radii up to 8 Unsharp is on par with FastGaussian, after that FastGaussian is the clear winner.

By "FastGaussian" I guess you mean the FastBlur routine?

FastBlur is actually a box blur and not a true Gaussian blur. This is just fine for some setups, but not so great for others.

 

Also, performance is, as you've discovered, not the only important metric when comparing blurs. Fidelity can also be important. It completely depends on what the blur is used for. Some algorithms are fast but suffer from signal loss or produce artifacts. Some are precise but slow. And then there are some that do it all well 🙂

 

The parameters below are [Width, Height, Radius]:

[Images: four benchmark charts]

Case in point, BoxBlur32 above is consistently the fastest but also has the worst quality and doesn't handle Alpha at all.

 

45 minutes ago, Renate Schaaf said:

But if anybody has an idea for how to implement it better, I'm all ears.

Use floats and implement it with SSE. That's what I did 🙂

