hsauro, posted November 23, 2021:

If anyone is interested in comparing skia (using skia4delphi) versus VCL for drawing points/pixels, here is a quick demo that draws the Mandelbrot set. The VCL drawing code doesn't use scanlines, which I know would be faster; it just uses Pixels[i, j], which we know is notoriously slow. The skia code uses DrawPoint, although I'm not sure whether that is equivalent to Pixels. The code also illustrates how you can draw to a skia canvas and then copy the result to a TImage. The code could certainly be better organized, but I didn't have the time.

Code at: https://github.com/hsauro/Mandelbrot

There is a binary release on GitHub: https://github.com/hsauro/Mandelbrot/releases/tag/1.0

I couldn't include the binaries here because the skia DLL exceeds the size limit for attachments.

Spoiler alert: skia was faster.
Anders Melander, posted November 23, 2021:

> 1 hour ago, hsauro said:
> If anyone is interested in comparing skia (using skia4delphi) versus VCL for drawing points/pixels ....

I just did:

TCanvas.Pixels: 320 ms
skia4delphi: 130 ms
TBitmap.ScanLine: 60 ms
hsauro, posted November 23, 2021:

That is interesting to know; I didn't have the time to try ScanLine.
M.Joos, posted November 26, 2021:

> On 11/23/2021 at 9:15 PM, Anders Melander said:
> I just did: TCanvas.Pixels: 320 ms, skia4delphi: 130 ms, TBitmap.ScanLine: 60 ms

My guess: Graphics32 would even outperform this.
Anders Melander, posted November 26, 2021:

> 54 minutes ago, M.Joos said:
> My guess: Graphics32 would even outperform this.

No doubt, but not by that much. I would guess it would be about 25% faster than TBitmap.ScanLine, i.e. around 45 ms on my system.

My point was more that it's not really fair to benchmark anything against TCanvas.Pixels since it's known to be dead slow. It's more of a convenience feature.

Without having profiled the code, I think the TBitmap.ScanLine implementation mostly suffers from the overhead of the call to GdiFlush inside TBitmap.GetScanLine. Since the original implementation writes the bitmap in X-Y order, I duplicated that in order to be fair, but this also means that the TBitmap.ScanLine implementation performs Width*Height calls to GetScanLine and thus GdiFlush. I circumvented this by caching the scanline row pointers in an array (without this it would have been slower than using TCanvas.Pixels). The X-Y order also means that each pixel write trashes the CPU cache. It should be significantly faster in Y-X order.
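A minimal sketch of the row-pointer caching and Y-X ordering described in the post above, assuming a 32-bit bitmap. The helper name `FillWithCachedRows` and the placeholder color are hypothetical and are not taken from the thread's repository; the cached pointers should only be relied on while the bitmap is not resized or converted to another pixel format.

```pascal
program ScanlineCacheSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, Vcl.Graphics;

// Hypothetical helper: fills a bitmap using row pointers that are fetched
// once, so ScanLine (and the GdiFlush behind it) runs once per row rather
// than once per pixel.
procedure FillWithCachedRows(Bmp: TBitmap);
var
  Rows: TArray<PCardinal>; // one cached pointer per scanline
  Row: PCardinal;
  X, Y: Integer;
begin
  Bmp.PixelFormat := pf32bit; // 4 bytes per pixel, so PCardinal steps by one pixel

  SetLength(Rows, Bmp.Height);
  for Y := 0 to Bmp.Height - 1 do
    Rows[Y] := Bmp.ScanLine[Y];

  // Y-X order: the inner loop writes consecutive addresses within one row.
  for Y := 0 to Bmp.Height - 1 do
  begin
    Row := Rows[Y];
    for X := 0 to Bmp.Width - 1 do
    begin
      Row^ := Cardinal((X xor Y) and $FF); // placeholder value, not a Mandelbrot color
      Inc(Row);
    end;
  end;
end;

var
  Bmp: TBitmap;
begin
  Bmp := TBitmap.Create;
  try
    Bmp.SetSize(640, 480);
    FillWithCachedRows(Bmp);
    Writeln('Filled ', Bmp.Width, 'x', Bmp.Height, ' pixels');
  finally
    Bmp.Free;
  end;
end.
```

With a Y-X loop a single ScanLine call per row would be enough on its own; the cache matters mainly when the loop has to stay in X-Y order, as in the benchmark discussed here.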
hsauro, posted November 27, 2021:

I also experimented with caching the scanline pointers, and it seemed to work. I played around a little bit with both image32 and graphics32 but didn't get very far. The advantage of skia over these other two is that it's completely cross-platform using the same API, and it seems to generate very smooth antialiased curves.
angusj, posted November 27, 2021 (edited):

> 3 hours ago, hsauro said:
> The advantage of skia over these other two is that it's completely cross-platform

Hi hsauro. Image32 should be completely cross-platform, so I'd be very interested if you've encountered problems in that regard.

> 6 hours ago, M.Joos said:
> My guess: Graphics32 would even outperform this.

I also doubt this, since the drawing is done by directly addressing (i.e. coloring) every individual pixel, not by using a polygon renderer. There'd be little to no benefit in using another graphics library here. You could perhaps marginally improve pixel addressing by converting the temporary bitmap into a pf32bit pixel format, getting the base image address (from bitmap.Scanline[bitmap.Height -1]) and efficiently offsetting that pointer to color everything (as per below).

```pascal
procedure DrawMandelbrotVCL2(bmp: TBitmap; X, Y, au, bu: Double; X2, Y2: Integer);
var
  c1, c2, z1, z2, tmp: Double;
  i, j, Count, rgb: Integer;
  hue, saturation, value : double;
  fr, fg, fb : single;
  ACanvas: TCanvas;
  p, currPixel: PColor;
  bytesPerLine: integer;
begin
  bmp.PixelFormat := pf32bit;
  //nb: should probably test here for the occasional non-inverted image
  p := bmp.ScanLine[0];
  // despite the name this is pixels per row: PColor arithmetic steps in 4-byte units
  bytesPerLine := bmp.Width;
  c2 := bu;
  for i := 10 to X2 - 1 do
  begin
    c1 := au;
    currPixel := p;
    inc(currPixel, i);
    for j := 0 to Y2 - 1 do
    begin
      z1 := 0;
      z2 := 0;
      Count := 0;
      // count is deep of iteration of the mandelbrot set
      // if |z| >=2 then z is not a member of a mandelset
      while (((z1 * z1 + z2 * z2 < 4) and (Count <= 50))) do
      begin
        tmp := z1;
        z1 := z1 * z1 - z2 * z2 + c1;
        z2 := 2 * tmp * z2 + c2;
        Inc(Count);
      end;
      // The color depends on the number of iterations
      hue := count / 50;
      saturation := 0.6;
      value := 0.5;
      currPixel^ := TColor(HSLtoRGB(hue, saturation, value));
      c1 := c1 + X;
      // next row: in an inverted (bottom-up) image, rows descend in memory
      dec(currPixel, bytesPerLine);
    end;
    c2 := c2 + Y;
  end;
end;
```

Edited November 27, 2021 by angusj
vfbb, posted November 27, 2021 (edited):

Hello. I adapted your test with Skia4Delphi to directly access the pixels instead of painting the pixel. The result was 45 ms on win64.

```pascal
procedure TfrmMain.Button3Click(Sender: TObject);

  procedure DrawMandelbrotPixmap(APixmap: ISkPixmap; X, Y, au, bu: Double; X2, Y2: Integer);
  var
    c1, c2, z1, z2, tmp: Double;
    i, j, Count, rgb: Integer;
    hue, saturation, value: Double;
  begin
    c2 := bu;
    for i := 10 to X2 - 1 do
    begin
      c1 := au;
      for j := 0 to Y2 - 1 do
      begin
        z1 := 0;
        z2 := 0;
        Count := 0;
        // count is deep of iteration of the mandelbrot set
        // if |z| >=2 then z is not a member of a mandelset
        while (((z1 * z1 + z2 * z2 < 4) and (Count <= 50))) do
        begin
          tmp := z1;
          z1 := z1 * z1 - z2 * z2 + c1;
          z2 := 2 * tmp * z2 + c2;
          Inc(Count);
        end;
        // The color depends on the number of iterations
        hue := count / 50;
        saturation := 0.6;
        value := 0.5;
        // write the color straight into the pixmap's memory
        PCardinal(APixmap.PixelAddr[i, j])^ := HSLtoRGB(hue, saturation, value);
        c1 := c1 + X;
      end;
      c2 := c2 + Y;
    end;
  end;

var
  au, ao: Double;
  dX, dY, bo, bu: Double;
  LWidth: Integer;
  LHeight: Integer;
  LTimer: TStopwatch;
  LBitmap: TBitmap;
  LSurface: ISkSurface;
begin
  LTimer := TStopwatch.StartNew;
  LWidth := Image1.Width;
  LHeight := Image1.Height;
  LSurface := TSkSurface.MakeRaster(LWidth, LHeight);
  LBitmap := TBitmap.Create(LWidth, LHeight);
  try
    ao := 1;
    au := -2;
    bo := 1.5;
    bu := -1.5;
    // direct scaling cause of speed
    dX := (ao - au) / (LWidth);
    dY := (bo - bu) / (LHeight);
    DrawMandelbrotPixmap(LSurface.PeekPixels, dX, dY, au, bu, LWidth, LHeight);
    LBitmap.SkiaDraw(
      procedure (const ACanvas: ISkCanvas)
      begin
        ACanvas.DrawImage(LSurface.MakeImageSnapshot, 0, 0);
      end);
    Image1.Picture.Assign(LBitmap);
  finally
    LBitmap.Free;
  end;
  Showmessage(LTimer.Elapsed.TotalMilliseconds.ToString+' ms');
end;
```

However, your benchmark isn't representative: you're basically changing the image pixel by pixel, which is not a good way to measure drawing library performance.
Also, tasks that change an image pixel by pixel almost always perform better when written as a shader that runs on the GPU. This is another advantage of Skia4Delphi, as it allows you to create shaders at runtime through the Skia Shading Language (SkSL, based on GLSL). Even now I'm preparing a VCL sample of an animated shader, see the performance:

27.11.2021_01.05.45_REC.mp4

Edited November 27, 2021 by vfbb
hsauro, posted November 27, 2021 (edited):

> 15 hours ago, angusj said:
> Hi hsauro. Image32 should be completely cross-platform, so I'd be very interested if you've encountered problems in that regard.
> I also doubt this since the drawing is done by directly addressing (ie coloring) every individual pixel (not by using a polygon renderer). There'd be little to no benefit in using another graphics library here. You could perhaps marginally improve pixel addressing by converting the temporary bitmap into a pf32bit pixelformat, getting the base image address (from bitmap.Scanline[bitmap.Height -1]) and efficiently offsetting that pointer to color everything (as per below).

For some reason I didn't realize image32 was portable, sorry about that. Thanks for the code however, because I wanted to play around a bit more with pixel manipulation with image32.

Edited November 27, 2021 by hsauro
hsauro, posted November 27, 2021:

> 14 hours ago, vfbb said:
> Hello. I adapted your test with Skia4Delphi to directly access the pixels instead of painting the pixel. The result was 45 ms on win64.
> 27.11.2021_01.05.45_REC.mp4

Thanks for the example! I was wondering if there was a way to get direct access to the pixels from skia4delphi.
hsauro, posted November 29, 2021:

> On 11/27/2021 at 10:47 AM, hsauro said:
> For some reason I didn't realize image32 was portable, sorry about that. Thanks for the code however, because I wanted to play around a bit more with pixel manipulation with image32.

@angusj I finally got round to looking more seriously at Image32, and it's working well so far. In fact, I think I'll use this instead of GDI+ in the future. Plus, as you mentioned earlier, it's cross-platform.
hsauro, posted November 30, 2021:

Someone asked that I include the timings in the code, and while I was at it I folded in the code provided by contributors here. I'll probably add TImage32 at some point as well.

https://github.com/hsauro/Mandelbrot

There is a new binary release on GitHub: https://github.com/hsauro/Mandelbrot/releases/tag/1.1
Anders Melander, posted December 1, 2021:

> 12 hours ago, hsauro said:
> Someone asked that I include the timings in the code

If you're focusing on performance then you should perform the pixel loop in Y-X order instead of X-Y order, and replace all the constant divisions with reciprocal multiplications, e.g. x/50 = x * (1/50).
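A small sketch of the reciprocal-multiplication suggestion above, applied to the `hue := count / 50` line from the posted loops. The names `MaxIter`, `InvMaxIter` and `CompareHue` are assumptions made for illustration; whether the change is measurable depends on the compiler and CPU, so treat it as something to profile rather than a guaranteed gain.

```pascal
program ReciprocalSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  MaxIter = 50;              // iteration limit used in the thread's loops
  InvMaxIter = 1 / MaxIter;  // reciprocal, evaluated once as a constant

// Hypothetical check: compares the per-pixel division with the
// multiplication by the precomputed reciprocal for every possible count.
procedure CompareHue;
var
  Count: Integer;
  HueDiv, HueMul: Double;
begin
  for Count := 0 to MaxIter do
  begin
    HueDiv := Count / MaxIter;     // division, as in the original code
    HueMul := Count * InvMaxIter;  // multiplication by the reciprocal
    // The results may differ by rounding in the last bits, which is
    // irrelevant when the value only selects a color.
    Assert(Abs(HueDiv - HueMul) < 1e-12);
  end;
end;

begin
  CompareHue;
  Writeln('division and reciprocal multiplication agree for 0..', MaxIter);
end.
```

In the drawing loops the change would amount to replacing `hue := count / 50` with a multiplication by a constant declared outside the loop.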
angusj, posted December 1, 2021:

> 1 hour ago, Anders Melander said:
> If you're focusing on performance

I got the impression from the OP that he was primarily interested in relative performance (i.e. Skia vs VCL) rather than perfectly optimised performance. I think we've demonstrated that third-party libraries offer no performance benefit when we're only considering coloring individual pixels.
Anders Melander, posted December 1, 2021:

> 3 minutes ago, angusj said:
> I got the impression from the OP that he was primarily interested in relative performance (ie Skia vs VCL) rather than perfectly optimised performance. I think we've demonstrated that third-party libraries offer no performance benefit when we're only considering coloring individual pixels.

Yes, as long as all implementations set pixels via direct memory access they should perform the same, and then benchmarking is pointless. I haven't looked at the new revision to see if that is the case.