Renate Schaaf 64 Posted April 11, 2022
55 minutes ago, Anders Melander said: Please verify that the comments I've added in the source are correct
Correct and very clear. I like the introduction of the MappingTablePrecicion.. constants.
Renate Schaaf 64 Posted April 18, 2022 (edited)
I might have introduced a bug in GR32_Resamplers: as it stands, the left bound of the source rectangle is ignored. The fix is simple.
Line 1778 needs to be
SourceColor := @Src.Bits[ClusterY[0].Pos * Src.Width + SrcRect.Left]; // + SrcRect.Left was missing!
and line 1806:
SourceColor := @Src.Bits[ClusterY[Y].Pos * Src.Width + SrcRect.Left]; // + SrcRect.Left was missing!
I hope you read this, Anders. If I don't hear from you, I'll create an issue on GitHub.
Edit: I definitely introduced it by changing the order of the loops; I checked against an old version. Instead of + SrcRect.Left one should probably use + MapXLoPos.
Renate
Edited April 18, 2022 by Renate Schaaf
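The effect of the missing offset can be seen in a minimal sketch of row-by-row pixel addressing. It assumes the bitmap is stored as one flat array, as the Src.Bits expressions in the post suggest; RowStartIndex is a hypothetical helper for illustration, not a GR32 routine.

```pascal
program SrcRectOffsetDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper, not a GR32 routine: index of the first pixel to read
// in a given row when the bitmap is stored row by row in one flat array.
function RowStartIndex(Row, BitmapWidth, RectLeft: Integer): Integer;
begin
  Result := Row * BitmapWidth + RectLeft;
end;

begin
  // Row 2 of an 8 pixel wide bitmap, source rectangle starting at column 3.
  // Without the offset the scan starts at column 0 of the row:
  Writeln('Without offset: ', RowStartIndex(2, 8, 0));
  // With the offset it starts at the left edge of the source rectangle:
  Writeln('With offset:    ', RowStartIndex(2, 8, 3));
end.
```

With these numbers the two start indices are 16 and 19, so a resampler that omits the offset reads three pixels too far to the left in every row.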
johnnydp 20 Posted January 8, 2023
@Renate Schaaf Interesting stuff, can you post this with all the latest fixes? Do you have your own repo with your projects?
Tom F 83 Posted January 9, 2023
It's nice to see GR32 getting some TLC! Thanks, all.
Renate Schaaf 64 Posted April 9, 2023
@johnnydp Sorry for the late reply, I took a break from Delphi. I have now created a repository on GitHub which contains the latest version of the resampler plus 2 demos.
https://github.com/rmesch/Repository-R.Schaaf
See you, Renate
Anders Melander 1783 Posted April 9, 2023
11 minutes ago, Renate Schaaf said: https://github.com/rmesch/Repository-R.Schaaf
Note that in Graphics32 I had to revert the change of the Box filter radius from 0.5 back to the original radius of 1. See: https://github.com/graphics32/graphics32/issues/209
Since you're using a radius of 0.5 you might have the same issue.
Renate Schaaf 64 Posted April 9, 2023
Hi Anders,
Just tried the new version of Graphics32, and found that the downscaling with Box looks as cr***y as before we changed the radius to 0.5, which IS the logically correct value, since the box function has a support of [-0.5, 0.5]. I can't see right now what goes wrong with the upscaling; it must be something different. Anyway, I don't see any problems with upscaling in my code, I just tried it with a factor of 20.
You did a lot of work on Graphics32, I will have a closer look.
Renate
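The support argument can be made concrete with a small sketch. It assumes that on downscaling the kernel is stretched by the scale factor, which is a common way to build filter-based resamplers but is not taken from the GR32 source; CoveredPixels is a hypothetical helper.

```pascal
program BoxSupportDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Counts the source pixel centers that get a non-zero weight for the first
// output pixel when downscaling by Scale, for a box kernel with support
// [-Radius, Radius] stretched by the scale factor.
function CoveredPixels(Scale: Integer; Radius: Double): Integer;
var
  I: Integer;
  Center, X: Double;
begin
  Result := 0;
  Center := Scale / 2; // center of output pixel 0 in source coordinates
  for I := -2 * Scale to 3 * Scale do
  begin
    X := (I + 0.5 - Center) / Scale; // distance in output pixel units
    if Abs(X) <= Radius then
      Inc(Result);
  end;
end;

begin
  Writeln('Radius 0.5: ', CoveredPixels(4, 0.5), ' source pixels');
  Writeln('Radius 1.0: ', CoveredPixels(4, 1.0), ' source pixels');
end.
```

For a 4:1 reduction, a radius of 0.5 averages exactly the 4 source pixels that the output pixel covers, while a radius of 1 averages 8, reaching halfway into each neighbouring output pixel. This only illustrates the downscaling side of the argument and says nothing about the upscaling problem reported in the linked issue.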
FreeDelphiPascal 19 Posted September 27, 2023 (edited)
@Renate Schaaf Hi. Do you have something similar but for parallel jpeg decoding?
Edited September 27, 2023 by FreeDelphiPascal
Renate Schaaf 64 Posted September 27, 2023
6 hours ago, FreeDelphiPascal said: Hi. Do you have something similar but for parallel jpeg decoding?
Sorry, no, but it sounds like a good idea. Naively. I have no idea how parallelizable jpeg-decoding is 🙂
Anders Melander 1783 Posted September 28, 2023
1 hour ago, Renate Schaaf said: I have no idea how parallelizable jpeg-decoding is
Most modern jpegs require sequential decompression due to the compression algorithms used (decompression of a block is based on the result of the previous block); there's nothing much to parallelize.
Jpegs with lots of restart markers (a restart marker means that the result of the previous blocks isn't needed) in the compression stream would benefit from parallelization, but it is my understanding that those have become very rare, as the problem they were meant to solve (data corruption during download via modem) no longer exists.
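Whether a given file contains restart markers at all can be checked with a rough scan. In the JPEG format the restart markers RST0..RST7 are the byte pairs $FF $D0 to $FF $D7, and a literal $FF inside the entropy-coded data is followed by a stuffed $00. The sketch below is a naive illustration on made-up bytes: CountRestartMarkers is a hypothetical helper, and a scan like this over a whole file can also count matching byte pairs inside metadata segments such as embedded thumbnails.

```pascal
program RestartMarkerScan;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Counts the byte pairs $FF $D0..$D7 (RST0..RST7) in a buffer.
function CountRestartMarkers(const Data: array of Byte): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to High(Data) - 1 do
    if (Data[I] = $FF) and (Data[I + 1] >= $D0) and (Data[I + 1] <= $D7) then
      Inc(Result);
end;

const
  // Made-up bytes: SOI, data with a stuffed $FF $00, one RST0, EOI
  Sample: array[0..10] of Byte =
    ($FF, $D8, $12, $FF, $00, $34, $FF, $D0, $56, $FF, $D9);

begin
  Writeln('Restart markers found: ', CountRestartMarkers(Sample));
end.
```

A count of zero on a real file would mean the entropy-coded data has no independent entry points, which is the case described in the post above.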
Renate Schaaf 64 Posted September 28, 2023
Hi Anders,
Thanks for explaining. I had a feeling that the compression is too "global" for parallelizing. But .. From what I have meanwhile read, it seems that parts of the decompression could be done in parallel. This link is about compression, but couldn't it apply to decompression too? (not that I know anything about it 🙂)
https://stackoverflow.com/questions/61850421/how-to-perform-jpeg-encoding-of-a-big-rgb-image-in-parallel
Anyway, there are research papers which claim that they got a speedup from doing the decoding partly in parallel.
Anders Melander 1783 Posted September 28, 2023
3 minutes ago, Renate Schaaf said: This link is about compression, but couldn't it apply to decompression too? (not that I know anything about it 🙂) https://stackoverflow.com/questions/61850421/how-to-perform-jpeg-encoding-of-a-big-rgb-image-in-parallel
Yes, there will of course always be some parts that can be parallelized, but the problem is that the expensive part, the Huffman decoding, cannot.
6 minutes ago, Renate Schaaf said: Anyway, there are research papers which claim that they got a speedup from doing the decoding partly in parallel.
I'm guessing they used "cooked" jpegs because there's really not much magic that can be done here.
I think the effort is better spent on using SSE, AVX, or the GPU to decode, which is also what I believe most high-performance decoders do.
Renate Schaaf 64 Posted September 28, 2023
9 minutes ago, Anders Melander said: I'm guessing they used "cooked" jpegs because there's really not much magic that can be done here.
OK, I'll stop thinking about it. Time to get some sleep :)
chmichael 12 Posted September 28, 2023
Just curious, has anyone tried Skia for resampling?
FreeDelphiPascal 19 Posted September 28, 2023
Sorry. My question was maybe not very clear. I am talking about decoding multiple JPG files in parallel. Maybe in a pool of threads equal to the number of cores...
Anders Melander 1783 Posted September 28, 2023
39 minutes ago, FreeDelphiPascal said: My question was maybe not very clear.
Oh, you think?
39 minutes ago, FreeDelphiPascal said: I am talking about decoding multiple JPG files in parallel. Maybe in a pool of threads equal to the number of cores...
Yes, of course you can do that. You don't need a special library to decode a jpeg in a thread.
Renate Schaaf 64 Posted September 29, 2023
22 hours ago, chmichael said: Just curious, anyone tried Skia for resampling ?
I did a quick test with the demo of the FMX version of my resampler, just doing "Enable Skia" on the project. In the demo I compare my results to TCanvas.DrawBitmap with HighSpeed set to false. I see that the Skia canvas is being used, and that HighSpeed=False results in Skia resampling set to
SkSamplingOptionsHigh : TSkSamplingOptions = (UseCubic: True; Cubic: (B: 1 / 3; C: 1 / 3); Filter: TSkFilterMode.Nearest; Mipmap: TSkMipmapMode.None);
So, some form of cubic resampling, if I see that right.
Result: Timing is slightly slower than native FMX drawing, but still a lot faster than my parallel resampling. I see no improvement in quality over plain FMX, which supposedly uses bilinear resampling with this setting.
Here are two results: (How do you make this browser use the original pixel size, this is scaled!)
This doesn't look very cubic to me. As a comparison, here are the results of my resampler using the bicubic filter:
I might not have used Skia to its best advantage.
Renate
Renate Schaaf 64 Posted October 2, 2023
I just uploaded a new version to https://github.com/rmesch/Parallel-Bitmap-Resampler
Newest addition: a parallel unsharp-mask using Gaussian blur. Can be used to sharpen or blur images. Dedicated VCL demo "Sharpen.dproj" included. For FMX the effect can be seen in the thumbnail-viewer demo (ThreadsInThreadsFMX.dproj).
This is for the "modern" version, 10.4 and up.
I haven't ported the unsharp-mask to the legacy version (Delphi 2006 and up) yet; it requires more work, but I plan on doing so.
Renate
Anders Melander 1783 Posted October 2, 2023
43 minutes ago, Renate Schaaf said: Newest addition: a parallel unsharp-mask using Gaussian blur. Can be used to sharpen or blur images.
Have you benchmarked this against some of the existing Gaussian blur implementations?
It's a bit difficult to decode the algorithm you use due to the lack of comments in the source, but it appears you are just applying a Gaussian kernel (with some additional logic) and that approach is usually quite slow.
I have a benchmark suite that compares the performance and fidelity of 8 different implementations. I'll try to find time to integrate your implementation into it.
With regard to the ratio between Radius and Sigma, it's my understanding that:
Ratio = 1 / FWHM (Full Width at Half Maximum) = 1 / (2 * Sqrt(2 * Ln(2))) = 0.424660891294479
But you have a ratio of 0.5
Have I misunderstood something?
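The quoted constant follows from the shape of the Gaussian. For G(x) = exp(-x^2 / (2 sigma^2)), the value drops to one half at x = sigma * sqrt(2 ln 2), so the full width at half maximum is 2 * sqrt(2 ln 2) * sigma, roughly 2.3548 * sigma. A minimal sketch of that conversion follows; SigmaFromFWHM is a hypothetical helper, and treating the blur radius as the FWHM is the assumption made in the post above.

```pascal
program SigmaFwhmDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// For G(x) = exp(-x^2 / (2 sigma^2)), G drops to 1/2 at
// x = sigma * sqrt(2 ln 2), so FWHM = 2 * sqrt(2 ln 2) * sigma.
function SigmaFromFWHM(FWHM: Double): Double;
begin
  Result := FWHM / (2.0 * Sqrt(2.0 * Ln(2.0)));
end;

begin
  // Ratio sigma / FWHM, the constant quoted in the post above
  Writeln('sigma / FWHM = ', SigmaFromFWHM(1.0):0:6);
end.
```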
Renate Schaaf 64 Posted October 3, 2023
43 minutes ago, Anders Melander said: But you have a ratio of 0.5
I took sigma = 0.2*Radius, but it's easy to change that to something more common. I just took a value for which the integral is very close to 1. With respect to other implementations, I'm ready to learn. I just implemented it as accurately as I could think of without being overly slow. Performance is quite satisfying to me, but I bet with your input it'll get faster 🙂
Anders Melander 1783 Posted October 3, 2023
8 minutes ago, Renate Schaaf said: I took sigma = 0.2*Radius
Yes, you did. My bad. Time for bed 🙂
Renate Schaaf 64 Posted October 3, 2023 (edited)
9 hours ago, Anders Melander said: I have a benchmark suite that compares the performance and fidelity of 8 different implementations. I'll try to find time to integrate your implementation into it.
Hi Anders,
It's great that you think of it, but hold off on that for a bit. I noticed that I compute the weights in a horrendously stupid way. The weights are mostly identical, it's not like when you resample, dumb me. So taking care of that reduces memory usage by a lot, and the subsequent application of the weights becomes much faster.
I've also changed the sigma-to-radius ratio a bit according to your suggestion. I find it hard to make results look nice with the cutoff at half the max value, so I changed it to 10^-2 times the max value. But this still allows for smaller radii, and it becomes a bit faster again.
So, before you do anything I would like to finish these changes, and also comment the code a bit more. (Forces me to really understand what I'm doing 🙂)
Edited October 3, 2023 by Renate Schaaf
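Both ideas in this post can be sketched together: one weight table shared by all pixels, and a sigma chosen so that the Gaussian has dropped to a given fraction of its peak at the radius. This is an illustration under those assumptions only, not the code in uScaleCommon; SigmaForCutoff and MakeGaussWeights are hypothetical names.

```pascal
program GaussWeightsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TWeights = array of Double;

// Sigma chosen so that exp(-Radius^2 / (2 sigma^2)) = Cutoff,
// i.e. the Gaussian has dropped to Cutoff times its peak at Radius.
function SigmaForCutoff(Radius, Cutoff: Double): Double;
begin
  Result := Radius / Sqrt(-2.0 * Ln(Cutoff));
end;

// One normalized weight table for the offsets -R..R. Away from the image
// borders the same table applies to every pixel, so it is built only once.
function MakeGaussWeights(R: Integer; Sigma: Double): TWeights;
var
  I: Integer;
  Sum: Double;
begin
  SetLength(Result, 2 * R + 1);
  Sum := 0.0;
  for I := -R to R do
  begin
    Result[I + R] := Exp(-(I * I) / (2.0 * Sigma * Sigma));
    Sum := Sum + Result[I + R];
  end;
  for I := 0 to 2 * R do
    Result[I] := Result[I] / Sum; // weights add up to 1
end;

var
  W: TWeights;
  I: Integer;
begin
  W := MakeGaussWeights(5, SigmaForCutoff(5, 0.01));
  for I := 0 to High(W) do
    Writeln(I - 5:3, ' ', W[I]:0:5);
end.
```

With a cutoff of 10^-2 the radius works out to sqrt(2 ln 100), about 3.03, times sigma, compared with about 1.18 times sigma for a cutoff at half the maximum.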
Renate Schaaf 64 Posted October 3, 2023
New version at https://github.com/rmesch/Parallel-Bitmap-Resampler:
It has more efficient code for the unsharp-mask, and I added more comments in the code to explain what I'm doing.
Procedures with explanatory comments:
uScaleCommon.Gauss
uScaleCommon.MakeGaussContributors
uScaleCommon.ProcessRowUnsharp
and see type TUnsharpParameters in uScale.pas.
Would it be a good idea to overload the UnsharpMask procedure to take sigma instead of radius as a parameter? Might be easier for comparison to other implementations.
Renate Schaaf 64 Posted October 10, 2023
On 10/3/2023 at 1:23 AM, Anders Melander said: Have you benchmarked this against some of the existing Gaussian blur implementations?
OK, I plugged my unsharp-mask into the Blurs example of GR32. Doing so made me aware of the need to do gamma correction when you mix colors. So I implemented that, but see below. Also, I finally included options to properly handle the alpha channel for the sharpen/blur. The repo at GitHub has been updated with these changes.
Results:
Quality: My results seem a tad brighter, otherwise I could see no difference between Gaussian and Unsharp.
Performance:
Unthreaded routine: For radii up to 8, Unsharp is on par with FastGaussian; after that FastGaussian is the clear winner.
Threaded routine: Always fastest.
If anybody is interested, I am attaching the test project. It of course requires GR32 to be installed. It also requires Delphi 10.3 or higher, I guess.
Gamma correction:
I did it via an 8-bit table, same as GR32. This seems very imprecise to me, but I wouldn't know how to get it any more precise other than operating with floats, no thanks.
Sadly, this can produce visible banding in some images, no matter which blur is used. Here is an example (for uploading all images have been compressed, but the effect is about the same):
Original, a cutout from a picture taken with my digital camera.
Result of Gaussian with Radius = 40 and Gamma = 1.6
When gamma correction is used for sharpening, bright edge artifacts are reduced, but dark edge artifacts are enhanced. My conclusion right now would be to not use gamma correction. But if anybody has an idea for how to implement it better, I'm all ears.
Thanks, Renate
BlurTest.zip
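The precision loss of an 8-bit table can be illustrated by pushing all 256 levels through a power curve and back, rounding to 8 bits each time. This is a sketch of the quantization effect only: it assumes a plain power law with the exponent Gamma for decoding and 1/Gamma for encoding, which may differ from the tables GR32 or the resampler actually use, and PowerCurve8 and DistinctAfterRoundTrip are hypothetical names.

```pascal
program GammaTableDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Maps an 8 bit value through the power curve V -> 255 * (V / 255)^Exponent
// and rounds the result back to 8 bits.
function PowerCurve8(V: Integer; Exponent: Double): Integer;
begin
  if V <= 0 then
    Result := 0
  else
    Result := Round(255.0 * Exp(Exponent * Ln(V / 255.0)));
end;

// Number of distinct 8 bit levels left after decoding with Gamma and
// encoding again with 1 / Gamma, both steps quantized to 8 bits.
function DistinctAfterRoundTrip(Gamma: Double): Integer;
var
  Seen: array[0..255] of Boolean;
  V: Integer;
begin
  for V := 0 to 255 do
    Seen[V] := False;
  for V := 0 to 255 do
    Seen[PowerCurve8(PowerCurve8(V, Gamma), 1.0 / Gamma)] := True;
  Result := 0;
  for V := 0 to 255 do
    if Seen[V] then
      Inc(Result);
end;

begin
  Writeln('Distinct levels after 8 bit round trip, gamma 1.6: ',
    DistinctAfterRoundTrip(1.6), ' of 256');
end.
```

Several dark input levels collapse onto the same table entry (levels 0 and 1 both decode to 0 with a gamma of 1.6), so fewer than 256 levels come back out. Missing levels of this kind are a plausible source of the banding described above, and they are what a float or 16-bit intermediate would avoid.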
Anders Melander 1783 Posted October 10, 2023
10 minutes ago, Renate Schaaf said: Performance: Unthreaded routine: For radii up to 8 Unsharp is on par with FastGaussian, after that FastGaussian is the clear winner.
By "FastGaussian" I guess you mean the FastBlur routine? FastBlur is actually a box blur and not a true Gaussian blur. This is just fine for some setups, but not so great for others.
Also, performance is, as you've discovered, not the only important metric when comparing blurs. Fidelity can also be important. It completely depends on what the blur is used for. Some algorithms are fast but suffer from signal loss or produce artifacts. Some are precise but slow. And then there are some that do it all well 🙂
The parameters below are [Width, Height, Radius]:
Case in point, BoxBlur32 above is consistently the fastest but also has the worst quality and doesn't handle Alpha at all.
45 minutes ago, Renate Schaaf said: But if anybody has an idea for how to implement it better, I'm all ears.
Use floats and implement it with SSE. That's what I did 🙂