Renate Schaaf, posted March 31, 2022

I have now made my parallel resampling of bitmaps as fast as I can get it. I would be interested to know how the algorithm performs on other systems, and it would be super to get suggestions for improvement. The procedures can be found in the unit uScale under Algorithms in the attached zip.

I have tested against Windows StretchBlt (HALFTONE mode), WICImage and Graphics32. On my PC (AMD Ryzen 7 1700, 8 cores) I see a substantial improvement in speed. The threads are based on TThread rather than TTask or TParallel, because I had failures using the latter two, whereas the old-fashioned threads have never failed me in gazillions of runs.

If you want to compile and run the comparison project, you need to include the source folder of Graphics32 (latest version) in your search path. For convenience it is included in the zip under Algorithms; I couldn't find a way to separate out only the units needed for resampling. The test against Graphics32 might be slightly unfair, because the bitmaps are assigned to TBitmap32 before the processing.

Right now, the procedure itself cannot be run in concurrent threads, because the array of TThreads is a global variable. I need to change the design (ideas welcome). There might still be room for improvement by minimizing cache misses, but my head can't handle it at the moment.

Hope you find it useful.

Renate

Attachment: Bitmap Scaling.zip
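Regarding the global thread array: one common way to make such a procedure safe to call concurrently is to compute the work split per call and hand each worker its own band of target rows, so two calls share no state. The Python sketch below shows only that partitioning arithmetic, under this assumption; `split_rows` is a hypothetical helper, not part of uScale, and the thread handling itself (TThread, TTask or otherwise) is left out.

```python
from typing import List, Tuple


def split_rows(height: int, workers: int) -> List[Tuple[int, int]]:
    """Split `height` target rows into contiguous half-open bands [first, last).

    Band sizes differ by at most one row and no band is empty. The bands are
    returned to the caller instead of being kept in a global, so two
    concurrent resampling calls would not share any state.
    """
    workers = min(workers, height)
    if workers <= 0:
        return []
    base, extra = divmod(height, workers)  # the first `extra` bands get one more row
    bands = []
    row = 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        bands.append((row, row + size))
        row += size
    return bands


if __name__ == "__main__":
    for i, (first, last) in enumerate(split_rows(1080, 16)):
        print(f"worker {i:2d}: rows {first:4d}..{last - 1:4d}")
```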
Anders Melander, posted April 1, 2022 (edited)

Very nice.

As a Graphics32 contributor I would have preferred, though, that you'd spent the (probably considerable) effort improving Graphics32 🙂

A few suggestions:

- Use TStopWatch to measure elapsed time and add the stopwatch as a parameter to TScaleProcedure. This way the procedure can pause the stopwatch when it's doing something which shouldn't be included in the benchmark, for example assigning back and forth between TBitmap and TBitmap32.
- Use TTask instead of TThread. This solves your problem of how to manage the threads and avoids the overhead of creating the threads. I understand that you had problems with TTask, but IMO the solution to that is to locate and fix those problems rather than avoiding TTask.
- For Graphics32, using the memory backend instead of the GDI backend is better when you don't need GDI features. See below. In this case it doesn't make much of a difference, though.
- Your code seems to handle RGBA, but as far as I can tell you're doing it wrong. You cannot process each RGBA channel independently; instead you need to operate on premultiplied RGBA values. For example, you wouldn't want an RGBA value of $00FFFFFF (fully transparent white) to have any influence on neighboring pixels. So: 1) premultiply, 2) resample, 3) unpremultiply. Unfortunately this will probably have a very negative impact on both performance and quality. Have cake/eat cake: choose one.

FWIW, Graphics32 does have a multi-threaded rasterizer, but since it uses TThread it doesn't make sense to use it in your benchmark: the overhead of thread setup/teardown for a single call would kill performance.
    procedure LanczosGR32(const Source, Target: TBitmap; parallel: boolean);
    var
      Source32, Target32: TBitmap32;
      Resampler: TKernelResampler;
    begin
      Source32 := TBitmap32.Create(TMemoryBackend);
      Target32 := TBitmap32.Create(TMemoryBackend);
      try
        Source32.Assign(Source);
        Target32.SetSize(Target.Width, Target.Height);
        Resampler := TKernelResampler.Create;
        try
          Resampler.KernelClassName := 'TLanczosKernel';
          Resampler.Resample(Target32, Target32.BoundsRect, Target32.BoundsRect,
            Source32, Source32.BoundsRect, dmOpaque, nil);
        finally
          Resampler.Free;
        end;
        Target.Assign(Target32);
      finally
        Source32.Free;
        Target32.Free;
      end;
    end;

Edited April 1, 2022 by Anders Melander
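To make the $00FFFFFF example concrete, here is a small Python sketch (float arithmetic, no fixed point, not the Graphics32 code) that averages an opaque black pixel with a fully transparent white one, once with independent channels and once with premultiply, resample, unpremultiply. Pixels are assumed to be (r, g, b, a) tuples with components 0..255; the function names are made up for the illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]  # (r, g, b, a), each 0..255


def _clamp(v: float) -> int:
    return max(0, min(255, round(v)))


def blend_independent(pixels: List[Pixel], weights: List[float]) -> Pixel:
    """Weighted sum with every channel, alpha included, filtered on its own."""
    return tuple(_clamp(sum(w * p[c] for p, w in zip(pixels, weights)))
                 for c in range(4))


def blend_premultiplied(pixels: List[Pixel], weights: List[float]) -> Pixel:
    """1) premultiply, 2) resample, 3) unpremultiply."""
    acc = [0.0, 0.0, 0.0, 0.0]
    for (r, g, b, a), w in zip(pixels, weights):
        cover = a / 255.0
        acc[0] += w * r * cover  # colour scaled by its own opacity
        acc[1] += w * g * cover
        acc[2] += w * b * cover
        acc[3] += w * a
    alpha = acc[3]
    if alpha <= 0:
        return (0, 0, 0, 0)  # fully transparent result: no colour to recover
    scale = 255.0 / alpha
    return (_clamp(acc[0] * scale), _clamp(acc[1] * scale),
            _clamp(acc[2] * scale), _clamp(alpha))


if __name__ == "__main__":
    black = (0, 0, 0, 255)
    clear_white = (255, 255, 255, 0)  # $00FFFFFF
    print(blend_independent([black, clear_white], [0.5, 0.5]))    # (128, 128, 128, 128)
    print(blend_premultiplied([black, clear_white], [0.5, 0.5]))  # (0, 0, 0, 128)
```

With independent channels the invisible white bleeds into the result as grey; with premultiplication the result stays black and only its coverage drops.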
Renate Schaaf, posted April 1, 2022

4 hours ago, Anders Melander said:
> As a Graphics32 contributor I would have preferred though that you'd spent the (probably considerable) effort improving Graphics32 🙂

I can still do that if it's wanted and still makes sense (see below).

4 hours ago, Anders Melander said:
> You cannot process each RGBA channel independently. Instead you need to operate on premultiplied RGBA values. For example you wouldn't want an RGBA value of $00FFFFFF to have any influence on neighboring pixels. So: 1) Premultiply, 2) Resample, 3) Unpremultiply.

Oh, I need to fix that ASAP. And include the better GR32 routine.

Thanks for your feedback,

Renate
Anders Melander, posted April 1, 2022

1 hour ago, Renate Schaaf said:
> I can still do that if it's wanted and still makes sense (see below).

Of course it's wanted. Improvements are always welcome, and you seem to have a really good understanding of the subject.

1 hour ago, Renate Schaaf said:
> Oh, I need to fix that ASAP.

You might opt to have two versions instead: one for 32-bit opaque RGB and one for 32-bit RGBA.

The premultiply/unpremultiply really messes with the quality since you're operating on 8-bit values. One way to avoid this is to premultiply to floating or fixed point, resample the floating/fixed point values and then unpremultiply back to 8-bit. As you can imagine, this doesn't exactly improve the performance or memory usage. The memory usage can be lessened by processing one channel at a time.
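A quick way to see the quality loss described above: premultiply a color channel by a small alpha, store the result either as an 8-bit integer or in fixed point, then unpremultiply again. The Python sketch below assumes round-half-up integer division and an arbitrary choice of 16 fractional bits; it is an illustration, not code from either library.

```python
def roundtrip_8bit(c: int, a: int) -> int:
    """Premultiply channel value c by alpha a into an 8-bit integer, then undo it."""
    if a == 0:
        return 0                         # nothing to recover at zero alpha
    pre = (c * a + 127) // 255           # premultiplied value, rounded to 8 bits
    return min(255, (pre * 255 + a // 2) // a)


def roundtrip_fixed(c: int, a: int, frac_bits: int = 16) -> int:
    """Same round trip, but the premultiplied value keeps `frac_bits` fractional bits."""
    if a == 0:
        return 0
    one = 1 << frac_bits
    pre = (c * a * one + 127) // 255     # fixed-point c * a / 255
    den = a * one
    return min(255, (pre * 255 + den // 2) // den)


if __name__ == "__main__":
    print(roundtrip_8bit(200, 3), roundtrip_fixed(200, 3))  # 170 200
```

At alpha 3 the 8-bit path turns a channel value of 200 into 170, while the fixed-point path recovers every value exactly for all alphas from 1 to 255.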
Renate Schaaf, posted April 1, 2022

50 minutes ago, Anders Melander said:
> The premultiply/unpremultiply really messes with the quality since you're operating on 8-bit values

My cache arrays are floats scaled to integers, so I can do the premultiplication while filling the array, and the unmultiplication on the integers. Let's see.

52 minutes ago, Anders Melander said:
> The memory usage can be lessened by processing one channel at a time.

That's a great idea.

53 minutes ago, Anders Melander said:
> Of course it's wanted. Improvements are always welcome

I'll see whether I can plug it in, but the source of Graphics32 is a bit intimidating. But then, being retired, I have all day to understand it.
Renate Schaaf, posted April 3, 2022

Now I have the premultiplication in place, but before I do an update, I have a question for Anders:

The premultiplication completely erases any RGB info stored at transparent pixels, so those can't be unmultiplied. I can see that this is desirable if the image contains a mask and needs to be rescaled. The RGB part of the scaled image looks something like this, and by the way, the result for WICImage looks the same. What puzzles me, though, is that Graphics32 still keeps the transparent parts of the RGB in place, so it must have some magic way to compute 0/0, which I would like to learn...

Further, looking at the source code, I cannot see any premultiplication being done:

    C := Src.Bits[X + ClusterY[Y].Pos * Src.Width];
    ClustYW := ClusterY[Y].Weight;
    Inc(Ca, Integer(C shr 24) * ClustYW);
    Inc(Cr, Integer(C and $00FF0000) shr 16 * ClustYW);
    Inc(Cg, Integer(C and $0000FF00) shr 8 * ClustYW);
    Inc(Cb, Integer(C and $000000FF) * ClustYW);

The bytes of the channels are just lumped into one cardinal; the bytes are extracted and multiplied by the weights. Or am I just dense, or has this already been done to Src.Bits?

Have a nice Sunday,
Renate
Anders Melander, posted April 3, 2022

20 minutes ago, Renate Schaaf said:
> The pre-multiplication completely erases any RGB-Info stored at transparent pixels. So those can't be unmultiplied.

That's correct. A fully transparent pixel has no color.

31 minutes ago, Renate Schaaf said:
> The RGB-part of the scaled image looks something like this, and btw. the result for WICImage looks the same.

I'm not sure what it is you're showing here.

33 minutes ago, Renate Schaaf said:
> Further looking at the source code, I cannot see any pre-multiplication done:

I contributed code to handle that in 2011; see TKernelResampler.GetSampleFloat. That said, there does seem to be a problem, because when I resample in my own bitmap editor the alpha isn't handled correctly. I'll look into it.
Anders Melander, posted April 3, 2022

7 hours ago, Anders Melander said:
> I contributed code to handle that in 2011. See TKernelResampler.GetSampleFloat. That said there does seem to be a problem because when I resample in my own bitmap editor the alpha isn't handled correctly. I'll look into it.

Apparently the GetSampleX methods are only used when sampling via a rasterizer or a transformation (e.g. rotate/skew/etc.).
Renate Schaaf, posted April 6, 2022 (edited)

Here is a new version with lots of changes (thanks, Anders):

For all filters except box, I have changed the algorithm from continuous-space to discrete-space convolution, because the results didn't look as good as expected (maybe rounding errors; further investigation needed). But I just noticed that for my videos I need to go back to continuous space, because that gives much smoother zooms and pans. Anyway, the algorithm itself is now more or less identical to Graphics32's, so you have a completely fair comparison. I also followed Anders' idea of including a stopwatch in the signature of the test procedures that can be turned on only when it matters.

More changes:

Alpha: I've put some alpha shenanigans into the loaded bitmaps, so you can see how the alpha channel is handled. There is now a TAlphaCombineMode = (amIndependent, amPreMultiply, amIgnore) doing the following:

- amIndependent: The behavior as before; all channels are sampled independently of one another. This is the behavior of GR32 for draw mode dmOpaque.
- amPreMultiply: RGB is multiplied by alpha prior to resampling; after that, pixels with nonzero alpha are "unmultiplied". I had to sacrifice some precision for the weights and the alpha multiplication in order to stay within integer range. This is the behavior of WICImage with Source.AlphaFormat = afDefined (it seems to have made the same sacrifices). GR32 with draw mode dmBlend does premultiply, but doesn't unmultiply, or rather, it "post-multiplies".
- amIgnore: RGB is resampled, alpha is set to 255. Faster for images not needing an alpha channel. This is the behavior of WICImage with Source.AlphaFormat = afIgnored.

To prevent apples from being compared to oranges, I have included a notice when a certain algorithm does not fully support the chosen mode. To avoid code repetition I had to introduce a number of procedural variables, which slow things down a little bit, but it's still nice and fast.
Threads: The (at most 16) threads hardly consume any CPU time while waiting, so for the time being I want to keep them. There is a TTask version included in the source; it has worked so far, but TTask behaves somewhat erratically timing-wise, and I don't understand what it's doing.

I'll try to write a TKernelResampler descendant (unthreaded) for Graphics32; maybe I can speed it up. Let's see.

Renate

Attachment: Bitmap Scaling-New.zip

Edited April 6, 2022 by Renate Schaaf
Renate Schaaf, posted April 7, 2022

I have been able to make the GR32 resampling as fast as my unthreaded version by making some simple changes to the procedure GR32_Resamplers.Resample (in the implementation part):

- changing the order of the X and Y loops when filling the horizontal buffer, avoiding jumps in the bitmap memory,
- using pointers to walk along the arrays,
- turning on compiler optimization for the procedure (the biggest improvement).

If you want to see for yourself, the attachment contains 3 changed .pas files that need to overwrite the corresponding ones in the Algorithms folder under Bitmap Scaling.

Renate

Attachment: Bitmap Scaling-Diff.zip
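The first change in that list is about memory order. TBitmap32 stores pixels row by row (pixel (X, Y) at index X + Y * Width, as in the Src.Bits expression quoted earlier in the thread), so the loop order decides whether consecutive reads touch neighboring addresses. The Python sketch below only lists the visited offsets for both loop orders; it shows the access pattern, not the cache timing itself, and the helper names are hypothetical.

```python
from typing import List


def visit_offsets(width: int, height: int, x_outer: bool) -> List[int]:
    """Pixel offsets (X + Y * width) in the order a double loop visits them."""
    if x_outer:
        # for X ... for Y ...: consecutive reads are a whole scanline apart,
        # plus a big jump back whenever the next column starts
        return [x + y * width for x in range(width) for y in range(height)]
    # for Y ... for X ...: every read is the next pixel in memory
    return [x + y * width for y in range(height) for x in range(width)]


def largest_jump(offsets: List[int]) -> int:
    """Largest distance between two consecutive accesses."""
    return max(abs(b - a) for a, b in zip(offsets, offsets[1:]))


if __name__ == "__main__":
    for x_outer in (False, True):
        offs = visit_offsets(640, 480, x_outer)
        print("X outer" if x_outer else "Y outer", largest_jump(offs))
```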
Anders Melander, posted April 7, 2022

That's excellent. I'm a bit surprised about the presence of the XY iteration-order problem, as that's a bit of a rookie mistake. It probably wasn't caught because the outer loop does iterate Y (and then iterates XY in the inner loop). Anyway, good catch.

Would you mind either creating a pull request or posting a patched version of GR32_Resamplers.pas (without the reformat, as that makes it hard to spot the actual changes; not that I disagree about the need to reformat)?

Your algorithm seems to be consistently 5-15% faster than GR32 on my system (tested with your latest changes). The quality is almost identical: for some kernels yours is marginally better, for some GR32 is marginally better. For the box filter GR32 wins in quality of details, but loses on fidelity (artifacts). WIC wins on both quality and speed. I think these results are as expected; GR32 is paying the price of the overhead of a generalized framework. The quality could probably be improved to match WIC, but the question is whether it's worth the trouble or if the current quality is "good enough"...?

I haven't had time to investigate why the Resample function doesn't use the alpha-aware methods. I'm knee-deep in a quantization/dithering extension for GR32.

P.S. "const StopWatch: TStopWatch" really should be "var StopWatch: TStopWatch". The only reason it works with const is that Delphi doesn't know/care that the methods are modifying the TStopWatch record.
Renate Schaaf, posted April 7, 2022 (edited)

I'm too stupid to create a pull request on GitHub; when I read the help for it, I don't understand the first thing about it. Somehow I can post changes to my own repository using GitHub Desktop, but I don't really understand what it's doing :). So here is the changed GR32_Resamplers.pas.

7 hours ago, Anders Melander said:
> The quality could probably be improved to match WIC but the question is if it's worth the trouble or if the current quality is "good enough"...?

I personally think the quality is good enough. On ultra-high DPI I can hardly see the difference between a low-quality and a high-quality filter for "normal" images; glyphs are another story. For me the choice of filter matters when animating pictures with zooms and pans; that's when the artifacts really show up.

Theoretically the quality could be improved by a higher precision of the weights, which currently run from -256 to 256. Up to $800 should be possible, which I have done for the non-premultiply modes. But again, I have a hard time seeing a difference.

Also, again theoretically, the algorithm using antiderivatives of the filters should yield better results (except there isn't one in closed form for the Lanczos). But though I see fewer artifacts, detail is reduced, as you have seen. I've probably made some mistake in the implementation. It could be the same kind of mistake you can make when computing numerical derivatives: small divided by small.

Time to hit the sack.

Renate

Attachment: GR32_Resamplers.zip

Edited April 7, 2022 by Renate Schaaf
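How much the weight precision matters per weight can be estimated directly: rounding a weight w to an integer at precision P gives an error of at most 0.5/P. The Python sketch below quantizes normalized Lanczos-3 weights at $100 and $800 and reports the worst error over a range of sub-pixel offsets. The kernel formula and tap layout are standard textbook choices assumed here, not copied from uScale or Graphics32, and the helper names are hypothetical.

```python
import math
from typing import List


def lanczos(x: float, a: int = 3) -> float:
    """Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def exact_weights(offset: float, a: int = 3) -> List[float]:
    """Normalized weights of the 2a taps around a sample at `offset` in [0, 1)."""
    raw = [lanczos(t - offset, a) for t in range(-a + 1, a + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def worst_quantization_error(prec: int, steps: int = 64) -> float:
    """Largest |round(prec * w) / prec - w| over `steps` sub-pixel offsets."""
    worst = 0.0
    for k in range(steps):
        for w in exact_weights(k / steps):
            worst = max(worst, abs(round(prec * w) / prec - w))
    return worst


if __name__ == "__main__":
    for prec in (0x100, 0x800):
        print(hex(prec), worst_quantization_error(prec))
```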
Anders Melander, posted April 8, 2022

I've now merged your changes into the main Graphics32 branch.

In addition I also included the following (very minor) optimizations of the Resample() function:

- Replaced for loops with while loops where possible.
- The channels are processed in their physical order: B, G, R, A instead of R, G, B, A.
- Replaced the RGB bit-twiddling with regular record field access (32-bit compiler only).

For example, instead of:

    with BufferEntry^ do
    begin
      B := Integer(SourceColor.ARGB and $000000FF) * ClusterWeight;
      G := Integer(SourceColor.ARGB and $0000FF00) shr 8 * ClusterWeight;
      R := Integer(SourceColor.ARGB and $00FF0000) shr 16 * ClusterWeight;
      A := Integer(SourceColor.ARGB shr 24) * ClusterWeight;
    end;

I now do this:

    BufferEntry.B := SourceColor.B * ClusterWeight;
    BufferEntry.G := SourceColor.G * ClusterWeight;
    BufferEntry.R := SourceColor.R * ClusterWeight;
    BufferEntry.A := SourceColor.A * ClusterWeight;

I'm pretty sure the old method used to be faster, but apparently not anymore. Maybe the compiler has gotten better, or maybe it's the hardware (I have a pretty old CPU, so I doubt it's that). I have only profiled the changes with optimization enabled, so it's possible that I've made the unoptimized code slower.

With optimization enabled, the performance of Resample() now matches, and in some cases even exceeds, your algorithm. However, since yours is also alpha-aware, I still consider yours faster.

There are a few additional things that could be done to make the code faster, but I'm not really inclined to go there, as it would also make the code pretty unreadable.

The performance of Resample() with the 64-bit compiler is horrible. For some reason your code is much faster there. I've not done anything to improve that.
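The B, G, R, A field order follows from TColor32 being a 32-bit ARGB value stored little-endian: the lowest byte (blue) comes first in memory. The Python sketch below checks that mask-and-shift extraction and byte-wise, field-like access give the same channels; it assumes little-endian storage and says nothing about which variant the Delphi compiler turns into faster code. The function names are hypothetical.

```python
from typing import Tuple

Channels = Tuple[int, int, int, int]  # (B, G, R, A)


def channels_by_mask(argb: int) -> Channels:
    """Old style: isolate each channel with an AND mask and a shift."""
    return (argb & 0x000000FF,
            (argb & 0x0000FF00) >> 8,
            (argb & 0x00FF0000) >> 16,
            argb >> 24)


def channels_by_field(argb: int) -> Channels:
    """New style: read the four bytes in memory order of a little-endian value."""
    b, g, r, a = argb.to_bytes(4, "little")
    return (b, g, r, a)


if __name__ == "__main__":
    color = 0x80FF4020  # A=$80, R=$FF, G=$40, B=$20
    print(channels_by_mask(color), channels_by_field(color))
```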
dummzeuch, posted April 8, 2022

4 hours ago, Anders Melander said:
> Replaced for loops with while loops where possible

What is the advantage of doing that?
Renate Schaaf, posted April 8, 2022

4 hours ago, Anders Melander said:
> With optimization enabled, the performance of Resample() now matches, and in some cases even exceeds, your algorithm

Confirmed. Nice.

4 hours ago, Anders Melander said:
> The performance of Resample() with the 64-bit compiler is horrible.

I've always been disappointed by the performance of 64-bit code. No idea why mine is faster.

Meanwhile I have come up with an alternative way to compute the weights, which seems to decrease artifacts while keeping the brilliance. So far I could not translate it into Graphics32: the filters there all live on intervals of different sizes, whereas mine all live on [-1,1], and at this time of the day my aging brain can't deal with the math. Maybe tomorrow.
Anders Melander, posted April 8, 2022 (edited)

17 minutes ago, dummzeuch said:
> What is the advantage of doing that?

I was surprised that this was necessary at all, as I really thought that the compiler already made this optimization.

Anyway, basically a "for" loop like this (the loop variable isn't used inside the loop):

    for i := LowValue to HighValue do
    begin
      ..stuff here
    end;

is compiled to this pseudo code:

    var i := LowValue;
    var test := HighValue - i;
    if (test < 0) then goto end_loop;
    :start_loop
      ..stuff here
      Inc(i);
      test := HighValue - i;
      if (test >= 0) then goto start_loop;
    :end_loop

while this equivalent while loop:

    i := HighValue - LowValue;
    while (i >= 0) do
    begin
      ..stuff here
      Dec(i);
    end;

is compiled to:

    var i := HighValue - LowValue;
    if (i < 0) then goto end_loop;
    :start_loop
      ..stuff here
      Dec(i);
      if (i >= 0) then goto start_loop;
    :end_loop

Exact same code if iterating from zero, in case you wondered. Similarly, a for loop where the loop variable is used inside the loop is also slightly faster when implemented as a while loop.

Edited April 8, 2022 by Anders Melander
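The rewrite is only valid if the count-down loop runs exactly as often as the original for loop, including the empty case where LowValue > HighValue. The Python sketch below checks just that iteration count; it models the loop structure, not the machine code Delphi generates, and the function names are hypothetical.

```python
def for_loop_iterations(low: int, high: int) -> int:
    """Iterations of `for i := low to high do`: count up, compare with high."""
    n = 0
    i = low
    while i <= high:
        n += 1
        i += 1
    return n


def countdown_iterations(low: int, high: int) -> int:
    """Iterations of `i := high - low; while i >= 0 do ... Dec(i)`."""
    n = 0
    i = high - low
    while i >= 0:  # only a decrement and a sign test per pass
        n += 1
        i -= 1
    return n


if __name__ == "__main__":
    for low, high in [(0, 9), (5, 5), (3, 2), (-4, 4)]:
        print(low, high, for_loop_iterations(low, high), countdown_iterations(low, high))
```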
Anders Melander, posted April 8, 2022

5 minutes ago, Renate Schaaf said:
> at this time of the day my aging brain can't deal with the math. Maybe tomorrow.

Take a bath. Always works for me 🙂
Renate Schaaf, posted April 9, 2022

I've managed to translate the alternative computation of weights into Graphics32. It was actually quite easy :). The idea is to compute the integral for the convolution with the filter via the midpoint rule. Before, I used the exact antiderivatives, which led to consistent underestimation of peaks and valleys in the bitmap function, and thus to a loss of detail. Now pixels not lying entirely within the support of the filter get their weight reduced, which leads to fewer artifacts, but the peaks are estimated better, so contrast and detail are better preserved (the code spells out the math for readability):

    // Precision of weights.
    // Totals Cb, Cg, Cr, Ca in Resample need to be unscaled by Prec * Prec
    const
      Prec = $800;

    function BuildMappingTableNew(DstLo, DstHi: Integer; ClipLo, ClipHi: Integer;
      SrcLo, SrcHi: Integer; Kernel: TCustomKernel): TMappingTable;
    var
      ...
    begin
      ...
      else if Scale < 1 then
      begin
        OldScale := Scale;
        Scale := 1 / Scale;
        FilterWidth := FilterWidth * Scale;
        for I := 0 to ClipW - 1 do
        begin
          if FullEdge then
            Center := SrcLo - 0.5 + (I - DstLo + ClipLo + 0.5) * Scale
          else
            Center := SrcLo + (I - DstLo + ClipLo) * Scale;
          Left := Floor(Center - FilterWidth);
          Right := Ceil(Center + FilterWidth);
          Count := -Prec;
          for J := Left to Right do
          begin
            // changed part
            x0 := J - Center; // old weight: Filter(x0*Oldscale)*Oldscale
            x1 := max(x0 - 0.5, -FilterWidth);
            x2 := min(x0 + 0.5, FilterWidth);
            // intersect symmetric interval of length 1 about x0 with support of scaled filter
            x3 := 0.5 * (x2 + x1); // new center
            Weight := Round(Prec * Filter(x3 * OldScale) * OldScale * (x2 - x1));
            // intersection with support entered into the weight
            if Weight <> 0 then
            begin
              Inc(Count, Weight);
              K := Length(Result[I]);
              SetLength(Result[I], K + 1);
              Result[I][K].Pos := Constrain(J, SrcLo, SrcHi - 1);
              Result[I][K].Weight := Weight;
            end;
          end;
          ...

At first the results were getting too dark and the contrast was increased.
By increasing the accuracy of the weights and using my own way of rounding the averaged result into bytes, this no longer seems to be the case:

    if RangeCheck then
    begin
      C.B := min((max(Cb, 0) + $1FFFFF) shr 22, 255); // unscale and round
      C.G := min((max(Cg, 0) + $1FFFFF) shr 22, 255);
      C.R := min((max(Cr, 0) + $1FFFFF) shr 22, 255);
      C.A := min((max(Ca, 0) + $1FFFFF) shr 22, 255);
    end
    else
    begin
      C.B := (Cb + $1FFFFF) shr 22;
      C.G := (Cg + $1FFFFF) shr 22;
      C.R := (Cr + $1FFFFF) shr 22;
      C.A := (Ca + $1FFFFF) shr 22;
    end;
    // Combine it with the background
    case CombineOp of
      dmOpaque: DstLine[I] := C.ARGB;
    ...

The changed file uScalingProcsGR32.pas is attached. If you are interested in a test, here is a short video; the zooms and pans have been done with the new Lanczos. The second picture is one of the most notorious in my collection.

Attachment: uScalingProcsGR32.zip
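The constants in the rounding step follow from the weight precision: horizontal and vertical weights are each scaled by Prec = $800 = 2^11, so an accumulated total carries a factor of 2^22, and adding $1FFFFF (one less than 2^21, i.e. just under one half) before shifting right by 22 rounds to the nearest integer, with exact halves rounding down. The Python sketch below mirrors the RangeCheck branch; it is an illustration, not the Graphics32 code, and `unscale_to_byte` is a hypothetical name.

```python
SHIFT = 22                  # Prec * Prec = $800 * $800 = 1 shl 22
HALF_MINUS_ONE = 0x1FFFFF   # (1 << 21) - 1


def unscale_to_byte(total: int) -> int:
    """Turn an accumulated channel total (scaled by 2**22) into a byte 0..255."""
    v = (max(total, 0) + HALF_MINUS_ONE) >> SHIFT  # negative lobes clamp to 0
    return min(v, 255)                             # overshoot clamps to 255


if __name__ == "__main__":
    print(unscale_to_byte(200 << 22), unscale_to_byte(-12345), unscale_to_byte(300 << 22))
```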
Anders Melander, posted April 9, 2022

Very, very nice. The performance has suffered a bit, but the result is much better. It actually seems that the quality of the Graphics32 resampler now surpasses your original algorithm for most kernels (there are still artifacts with the GR32 box filter).

I've integrated your changes locally and will commit once I've done a bit more testing.

One thing that almost gave me brain cancer was the abysmal ASM generated by this code (not your fault):

    C.B := min((max(Cb, 0) + $1FFFFF) shr 22, 255)

It goes something like this:

        cmp dword ptr [Cb],$00
        jle A
        mov eax,[Cb]
        jmp B
    A:  xor eax,eax
    B:  lea edx,[eax+$001fffff]
        shr edx,$16
        cmp edx,$000000ff
        jnl C
        add eax,$001fffff
        shr eax,$16
        jmp D
    C:  mov eax,$000000ff
    D:  mov [C.B],al

Luckily the path with that code isn't taken in your example.
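One way to avoid the duplicated add-and-shift in that listing is to compute the shifted value once and clamp it with sign masks instead of comparisons. The Python sketch below shows that trick under stated assumptions: it relies on an arithmetic (sign-extending) right shift, which Python's >> provides for negative integers, whereas Delphi's shr does not sign-extend, so a Delphi port would need another way to obtain the sign mask. Whether this is faster than what the compiler emits has not been measured; the function names are hypothetical.

```python
def clamp_byte(v: int) -> int:
    """Clamp a signed 32-bit integer to 0..255 without comparisons.

    Assumes -2**31 <= v < 2**31 and an arithmetic (sign-extending) right shift.
    """
    v &= ~(v >> 31)          # v < 0: the mask is 0, so v becomes 0
    v |= (255 - v) >> 31     # v > 255: OR with -1 sets every bit
    return v & 0xFF          # low byte: v itself, or 255 after saturation


def unscale_to_byte_branchless(total: int) -> int:
    """Rounding shift done once, then clamped."""
    return clamp_byte((total + 0x1FFFFF) >> 22)
```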
Renate Schaaf, posted April 10, 2022 (edited)

14 hours ago, Anders Melander said:
> One thing that almost gave me brain cancer was the abysmal ASM generated by this code (not your fault):

I was hoping for you to untwiddle this 🙂

Meanwhile I found the reason for the box kernel not being up to par; it's here:

    function TBoxKernel.GetWidth: TFloat;
    begin
      Result := 1; // must be 0.5!
    end;

I also spotted a mistake in my code. The interval [x0-0.5, x0+0.5] could lie completely outside the support of the filter. In that case a false nonzero weight would be generated, so a check for x2 > x1 needs to be added:

    ...
    for J := Left to Right do
    begin
      x0 := J - Center; // previous weight: Filter(x0*Oldscale)*Oldscale
      x1 := max(x0 - 0.5, -FilterWidth);
      x2 := min(x0 + 0.5, FilterWidth);
      // intersect symmetric interval of length 1 about x0 with support of scaled filter
      if (x2 > x1) then
      begin
        x3 := 0.5 * (x2 + x1); // new center
        Weight := Round(Prec * Filter(x3 * OldScale) * OldScale * (x2 - x1));
        // intersection with support entered into the weight
        if Weight <> 0 then
        begin
          Inc(Count, Weight);
          K := Length(Result[I]);
          SetLength(Result[I], K + 1);
          Result[I][K].Pos := Constrain(J, SrcLo, SrcHi - 1);
          Result[I][K].Weight := Weight;
        end;
      end;
    end;

The same applies at the analogous place for the case Scale > 1. The code for the two cases could be unified, but it's easier to understand as it is.

Edited April 10, 2022 by Renate Schaaf
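For readers following along, here is the downsampling branch of the weight computation transcribed into a Python sketch for a single destination pixel, including the x2 > x1 check. The tent kernel is only a convenient test filter, edge clamping (Constrain) is left out, and the names are hypothetical. The raw weights need not add up exactly to Prec; the posted code tracks the difference in Count, and how Count is used lies in the elided part of the listing.

```python
import math
from typing import Callable, List, Tuple

PREC = 0x800  # weight precision from the post


def tent(x: float) -> float:
    """Triangle filter with support [-1, 1] (test kernel only)."""
    return max(0.0, 1.0 - abs(x))


def midpoint_weights(center: float, old_scale: float,
                     kernel: Callable[[float], float],
                     kernel_width: float) -> List[Tuple[int, int]]:
    """(source index, integer weight) pairs for one destination pixel.

    old_scale < 1 is destination size / source size; center is in source pixels.
    Source pixel j covers [j - 0.5, j + 0.5]; that interval is clipped to the
    scaled filter support and integrated with the midpoint rule.
    """
    filter_width = kernel_width / old_scale          # support in source pixels
    result = []
    for j in range(math.floor(center - filter_width),
                   math.ceil(center + filter_width) + 1):
        x0 = j - center
        x1 = max(x0 - 0.5, -filter_width)
        x2 = min(x0 + 0.5, filter_width)
        if x2 <= x1:
            continue                                 # pixel entirely outside the support
        x3 = 0.5 * (x1 + x2)                         # midpoint of the clipped interval
        weight = round(PREC * kernel(x3 * old_scale) * old_scale * (x2 - x1))
        if weight != 0:
            result.append((j, weight))
    return result


if __name__ == "__main__":
    print(midpoint_weights(10.0, 0.5, tent, 1.0))
```

For a halving (old_scale 0.5) centered on pixel 10, the two outermost pixels are only half inside the support, so their interval is clipped and their weight reduced, which is the effect described in the post.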
Anders Melander, posted April 10, 2022

9 hours ago, Renate Schaaf said:
> Result := 1; //must be 0.5!

Duh! Yes, of course. I've just checked, and the value was correct in my original resampler source, so it appears to be a copy/paste bug introduced when the code was integrated into GR32. As far as I can tell the bug has been there from the start 😕

Regarding this:

    x0 := J - Center; // old weight: Filter(x0*Oldscale)*Oldscale
    x1 := max(x0 - 0.5, -FilterWidth);
    x2 := min(x0 + 0.5, FilterWidth);
    // intersect symmetric interval of length 1 about x0 with support of scaled filter
    x3 := 0.5 * (x2 + x1); // new center
    Weight := Round(Prec * Filter(x3 * OldScale) * OldScale * (x2 - x1));
    // intersection with support entered into the weight

You say that you're "computing the integral for the convolution with the filter using the midpoint rule", so I would have expected to see the averaging of two filter values in order to find the midpoint, but I can't really match that with the above. Can you explain what's going on, please? Once I understand it I'll comment the code so it's maintainable.

Also, am I correct in assuming that you're doing calculations in [21:11] fixed precision and storing the weights in [10:22] instead of the old [24:8] and [16:16]?
Renate Schaaf, posted April 10, 2022

2 hours ago, Anders Melander said:
> You say that you're "computing the integral for the convolution with the filter using the midpoint-rule" so I would have expected to see the averaging of two filter values in order to find the midpoint but I can't really match that with the above. Can you explain what's going on, please?

No, that would be the trapezoidal rule, and that is just as bad as using the antiderivatives. Midpoint rule: the integral from x1 to x2 of f(x) dx is approximately f(0.5*(x1+x2))*(x2-x1). The multiplications by OldScale transform this from the scale of the source (pixel width 1) to the scale of the destination ("pixel width" NewWidth/OldWidth). If you want to know why using the integral is a good way of thinking about the algorithm, there's a little article in the doc folder; I couldn't explain it any better here.

2 hours ago, Anders Melander said:
> Also, am I correct in assuming that you're doing calculation in [21:11] fixed precision and storing the weights in [10:22] instead of the old [24:8] and [16:16]?

I'm not sure what you're asking; could you do some explaining back? For the premultiply mode I'm using precision $100, for the others $800. Is that what you are asking?

BTW, the check x2 > x1 isn't necessary: Filter(x3) would be zero if it weren't true.
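Written out, with s = OldScale, W = FilterWidth (already expressed in source pixels), P = Prec, c = Center and f = Filter, the weight for source pixel j in the posted code is a midpoint-rule estimate of the integral of the scaled filter over the part of the pixel that lies inside the support. This is a transcription of the code in the thread; the trapezoidal rule is shown only for contrast:

$$
x_0 = j - c,\qquad x_1 = \max\left(x_0 - \tfrac12,\ -W\right),\qquad x_2 = \min\left(x_0 + \tfrac12,\ W\right)
$$

$$
w_j = \operatorname{round}\!\left(P\, s\, f\!\left(s\,\tfrac{x_1 + x_2}{2}\right)(x_2 - x_1)\right) \approx P \int_{x_1}^{x_2} s\, f(s\,x)\, dx
$$

$$
\text{trapezoidal rule instead: } \int_{x_1}^{x_2} g(x)\,dx \approx \frac{g(x_1) + g(x_2)}{2}\,(x_2 - x_1)
$$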
Anders Melander, posted April 10, 2022

4 hours ago, Renate Schaaf said:
> No, that would be the trapezoidal rule, and that is just as bad as using the antiderivatives. Midpoint rule: integral from x1 to x2 f(x) dx is approximately f(0.5*(x1+x2))*(x2-x1).

Yes, you're right. I understand the midpoint rule, but I just couldn't make your code fit it. Let me have a look again...

Ah, now I get it. It was your comments that confused me. For example:

    // old weight: Filter(x0*Oldscale)*Oldscale

If I just read the code and ignore the comments, it makes much more sense. That's a new one 🙂

4 hours ago, Renate Schaaf said:
> For the pre-mult I'm using precision $100, for the others $800. Is that what you are asking?

I mean that when you operate on 32-bit integer values which have been multiplied by $800 (= 1 shl 11), the upper 21 bits contain the integer part and the lower 11 bits the fractional part. In other words, [21:11] fixed precision. When applying the weight values you do a shr 22 (i.e. a division by $800 * $800 = 1 shl 22) to convert from [10:22] fixed precision back to integer. The [10:22] must mean that you have multiplied two [21:11] values at some point, but I can't really spot where that happens.

FWIW, much of Graphics32 supports the TFixed [16:16] fixed precision type. For example, many methods allow you to specify coordinates in either integer, TFixed or floating point format.

4 hours ago, Renate Schaaf said:
> BTW, the check x2>x1 isn't necessary, Filter(x3) would be zero if not true.

I don't understand what this refers to.
Renate Schaaf, posted April 11, 2022 (edited)

5 hours ago, Anders Melander said:
> The [10:22] must mean that you have multiplied two [21:11] values at some point but I can't really spot where that happens.

One weight in the x-direction times another one in the y-direction. Thanks for the explanation; now I understand the notation.

5 hours ago, Anders Melander said:
> I don't understand what this refers to

That my posted "correction" can be safely ignored :).

5 hours ago, Anders Melander said:
> Ah, now I get it. It was your comments that confused me. For example:

Sometimes I make it too complicated, sorry.

Edited April 11, 2022 by Renate Schaaf
Anders Melander, posted April 11, 2022

11 hours ago, Renate Schaaf said:
> One weight in x-direction times another one in y-direction.

I had to reread the source three times before I spotted the place where it occurs. "with" strikes again:

    ClusterWeight := ClusterX[X].Weight;
    with HorzBuffer[ClusterX[X].Pos - MapXLoPos] do
    begin
      Inc(Cb, B * ClusterWeight); // Note: Fixed precision multiplication done here
      Inc(Cg, G * ClusterWeight);
      Inc(Cr, R * ClusterWeight);
      Inc(Ca, A * ClusterWeight);
    end;

As you can see, I have now added a comment to make it obvious 🙂

Anyway, your changes have now been committed. Please verify that the comments I've added in the source are correct.

I will continue working on correcting the alpha handling (i.e. doing premultiplication).
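The fixed-point bookkeeping discussed above can be checked with a few lines of Python. Each pass multiplies by a weight scaled by $800 = 2^11 (11 fractional bits), so after the horizontal and the vertical pass a channel value carries 11 + 11 = 22 fractional bits, and shifting right by 22 recovers an ordinary integer. This is a sketch of the arithmetic only; `to_fixed` and `weighted` are hypothetical helpers, not Graphics32 functions.

```python
FRAC_BITS = 11  # Prec = $800 = 1 << 11


def to_fixed(weight: float) -> int:
    """A real weight rounded to fixed point with 11 fractional bits."""
    return round(weight * (1 << FRAC_BITS))


def weighted(channel: int, wx: float, wy: float) -> int:
    """One 8-bit channel after the horizontal (wx) and vertical (wy) passes.

    The two 11-bit fractions multiply into 22 fractional bits.
    """
    horizontal = channel * to_fixed(wx)   # value as stored in the horizontal buffer
    return horizontal * to_fixed(wy)      # the multiplication marked in the snippet


if __name__ == "__main__":
    pixels = [10, 20, 30, 40]             # 2x2 box, every weight 0.5
    total = sum(weighted(c, 0.5, 0.5) for c in pixels)
    print(total >> 22)                    # 25
```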