Skia versus VCL for plotting points

hsauro · November 23, 2021

If anyone is interested in comparing Skia (via Skia4Delphi) with VCL for drawing points/pixels, here is a quick demo that draws the Mandelbrot set. The VCL drawing code doesn't use ScanLine, which I know would be faster; it just uses Pixels[i,j], which is notoriously slow. The Skia code uses DrawPoint, although I'm not sure whether that is the equivalent of Pixels.

The code also illustrates how you can draw to a Skia canvas and then copy the result to a TImage. The code could certainly be reorganized, but I didn't have the time.

Code at:

https://github.com/hsauro/Mandelbrot

There is a binary release on GitHub:

https://github.com/hsauro/Mandelbrot/releases/tag/1.0

I couldn't attach the binaries here because the Skia DLL exceeds the attachment size limit.

Spoiler alert: Skia was faster.

Anders Melander · November 23, 2021

1 hour ago, hsauro said:

If anyone is interested in comparing skia (using skia4delphi) versus VCL for drawing points/pixels ....

I just did:

TCanvas.Pixels: 320 ms
skia4delphi: 130 ms
TBitmap.ScanLine: 60 ms

hsauro · November 23, 2021

That's interesting to know; I didn't have time to try ScanLine.

M.Joos · November 26, 2021

On 11/23/2021 at 9:15 PM, Anders Melander said:

I just did:

TCanvas.Pixels: 320 ms
skia4delphi: 130 ms
TBitmap.ScanLine: 60 ms

My guess: Graphics32 would even outperform this.

Anders Melander · November 26, 2021

54 minutes ago, M.Joos said:

My guess: Graphics32 would even outperform this.

No doubt, but not by that much. I would guess it would be about 25% faster than TBitmap.ScanLine, i.e. around 45 ms on my system.

My point was more that it's not really fair to benchmark anything against TCanvas.Pixels since it's known to be dead slow. It's more of a convenience feature.

Without having profiled the code, I think the TBitmap.ScanLine implementation mostly suffers from the overhead of the call to GdiFlush inside TBitmap.GetScanLine. Since the original implementation writes the bitmap in X-Y order (outer loop over columns, inner loop over rows), I duplicated that to be fair, but this also means that the TBitmap.ScanLine implementation performs Width*Height calls to GetScanLine and thus to GdiFlush. I circumvented this by caching the scanline row pointers in an array (without this it would have been slower than using TCanvas.Pixels). The X-Y order also means that each pixel write trashes the CPU cache; it should be significantly faster in Y-X order.
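A minimal sketch of the two fixes described above: one cached pointer per row, and filling in Y-X order (outer loop over rows, inner loop over columns) so that consecutive writes land on adjacent addresses. Assumptions: a plain array of Cardinal stands in for a pf32bit TBitmap so the program runs without VCL; in real VCL code each Rows[Y] would come from Bmp.ScanLine[Y], fetched once per row. EscapeCount and FillYX are hypothetical names, and the buffer stores raw iteration counts rather than colors because the demo's HSLtoRGB helper isn't shown in the thread.

```pascal
program YXOrderSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

// Sketch only: a dynamic array of Cardinal stands in for a pf32bit TBitmap.
// In VCL code, Rows[Y] would be Bmp.ScanLine[Y], fetched once per row.

const
  MaxIter = 50;

type
  TRowPointers = array of PCardinal;

// Escape-time count, same loop as the thread's DrawMandelbrot code
// (note that Count <= MaxIter lets points inside the set reach MaxIter + 1).
function EscapeCount(CRe, CIm: Double): Integer;
var
  ZRe, ZIm, Tmp: Double;
begin
  ZRe := 0;
  ZIm := 0;
  Result := 0;
  while (ZRe * ZRe + ZIm * ZIm < 4) and (Result <= MaxIter) do
  begin
    Tmp := ZRe;
    ZRe := ZRe * ZRe - ZIm * ZIm + CRe;
    ZIm := 2 * Tmp * ZIm + CIm;
    Inc(Result);
  end;
end;

// Outer loop over rows (Y), inner loop over columns (X):
// consecutive writes go to adjacent addresses within one row.
procedure FillYX(const Rows: TRowPointers; Width: Integer;
  ReMin, ImMin, dRe, dIm: Double);
var
  X, Y: Integer;
  P: PCardinal;
  CIm: Double;
begin
  for Y := 0 to High(Rows) do
  begin
    CIm := ImMin + Y * dIm;
    P := Rows[Y];                  // cached row pointer, no per-pixel lookup
    for X := 0 to Width - 1 do
    begin
      P^ := Cardinal(EscapeCount(ReMin + X * dRe, CIm));
      Inc(P);                      // next pixel in the same row
    end;
  end;
end;

var
  Pixels: array of Cardinal;
  Rows: TRowPointers;
  W, H, Y: Integer;
begin
  W := 4;
  H := 3;
  SetLength(Pixels, W * H);
  SetLength(Rows, H);
  for Y := 0 to H - 1 do
    Rows[Y] := @Pixels[Y * W];     // build the row-pointer cache once
  FillYX(Rows, W, -2.0, -1.5, 3.0 / W, 3.0 / H);
  WriteLn('Top-left count: ', Pixels[0]);
  WriteLn('Count at c = 0: ', EscapeCount(0, 0));
end.
```

The row cache costs one pointer per row and is built once, which is why it removes the per-pixel ScanLine overhead without changing the drawing logic.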

hsauro · November 27, 2021

I also experimented with caching the scanline pointers, and it seemed to work. I played around a little with both Image32 and Graphics32 but didn't get very far. The advantage of Skia over these other two is that it's completely cross-platform, using the same API, and it seems to generate very smooth antialiased curves.

angusj · November 27, 2021

3 hours ago, hsauro said:

The advantage of skia over these other two is that it’s completely cross-platform

Hi hsauro. Image32 should be completely cross-platform, so I'd be very interested if you've encountered problems in that regard.

6 hours ago, M.Joos said:

My guess: Graphics32 would even outperform this.

I also doubt this, since the drawing is done by directly addressing (i.e. coloring) every individual pixel, not by using a polygon renderer. There'd be little to no benefit in using another graphics library here. You could perhaps marginally improve pixel addressing by converting the temporary bitmap to the pf32bit pixel format, getting the base image address (from bitmap.ScanLine[bitmap.Height - 1]) and efficiently offsetting that pointer to color everything (as per below).

procedure DrawMandelbrotVCL2(bmp: TBitmap; X, Y, au, bu: Double; X2, Y2: Integer);
var
  c1, c2, z1, z2, tmp: Double;
  i, j, Count: Integer;
  hue, saturation, value: Double;
  p, currPixel: PColor;
  pixelsPerLine: Integer;
begin
  bmp.PixelFormat := pf32bit;
  // Assumes a bottom-up DIB, where ScanLine[0] (the top row) has the highest
  // address; a top-down bitmap would need Inc instead of Dec below.
  p := bmp.ScanLine[0];
  // Row stride in pixels: each PColor step is 4 bytes and 32-bit rows are never padded
  pixelsPerLine := bmp.Width;
  c2 := bu;
  for i := 0 to X2 - 1 do // start at 0 so the first columns are painted too
  begin
    c1 := au;
    currPixel := p;
    Inc(currPixel, i); // column i of the top row
    for j := 0 to Y2 - 1 do
    begin
      z1 := 0;
      z2 := 0;
      Count := 0;
      // Count is the iteration depth; once |z| >= 2 the point
      // is known not to belong to the Mandelbrot set
      while (z1 * z1 + z2 * z2 < 4) and (Count <= 50) do
      begin
        tmp := z1;
        z1 := z1 * z1 - z2 * z2 + c1;
        z2 := 2 * tmp * z2 + c2;
        Inc(Count);
      end;
      // The color depends on the number of iterations
      // (HSLtoRGB is the helper from the demo project)
      hue := Count / 50;
      saturation := 0.6;
      value := 0.5;
      currPixel^ := TColor(HSLtoRGB(hue, saturation, value));
      c1 := c1 + X;
      Dec(currPixel, pixelsPerLine); // move one row down the image
    end;
    c2 := c2 + Y;
  end;
end;
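The pointer arithmetic in angusj's code above relies on the bottom-up layout of a 32 bpp DIB. As a hedged sketch (assuming a bottom-up DIB and 4 bytes per pixel, so rows need no padding), the index of pixel (X, Y) relative to the lowest address, Bmp.ScanLine[Height - 1], is (Height - 1 - Y) * Width + X, so moving one row down the image means stepping back Width pixels. BottomUpIndex is a hypothetical helper, and a plain array stands in for the bitmap so the program runs without VCL.

```pascal
program BottomUpSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: index of pixel (X, Y), counted in pixels from the
// lowest address of a bottom-up 32 bpp image (Bmp.ScanLine[Height - 1] in VCL).
// Y = 0 is the top row of the image, which is stored last in memory.
function BottomUpIndex(X, Y, Width, Height: Integer): Integer;
begin
  Result := (Height - 1 - Y) * Width + X;
end;

var
  Buf: array of Cardinal;   // stands in for the bitmap's pixel memory
  Base, P: PCardinal;
  W, H: Integer;
begin
  W := 3;
  H := 2;
  SetLength(Buf, W * H);
  Base := @Buf[0];          // in VCL: Bmp.ScanLine[H - 1]
  P := Base;
  Inc(P, BottomUpIndex(2, 0, W, H));  // top-right pixel of the image
  // A 32 bpp DIB stores bytes as B, G, R, A; on little-endian x86 this is red
  P^ := $00FF0000;
  // Going one row down the image means stepping back W pixels in memory
  WriteLn(BottomUpIndex(0, 0, W, H) - BottomUpIndex(0, 1, W, H));
  WriteLn(Buf[5] = $00FF0000);
end.
```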

Edited November 27, 2021 by angusj

vfbb · November 27, 2021

Hello. I adapted your test with Skia4Delphi to directly access the pixels instead of painting the pixel. The result was 45 ms on win64.

procedure TfrmMain.Button3Click(Sender: TObject);

  procedure DrawMandelbrotPixmap(APixmap: ISkPixmap; X, Y, au, bu: Double; X2, Y2: Integer);
  var
    c1, c2, z1, z2, tmp: Double;
    i, j, Count: Integer;
    hue, saturation, value: Double;
  begin
    c2 := bu;
    for i := 0 to X2 - 1 do // start at 0 so the first columns are painted too
    begin
      c1 := au;
      for j := 0 to Y2 - 1 do
      begin
        z1 := 0;
        z2 := 0;
        Count := 0;
        // Count is the iteration depth; once |z| >= 2 the point
        // is known not to belong to the Mandelbrot set
        while (((z1 * z1 + z2 * z2 < 4) and (Count <= 50))) do
        begin
          tmp := z1;
          z1 := z1 * z1 - z2 * z2 + c1;
          z2 := 2 * tmp * z2 + c2;
          Inc(Count);
        end;
        // The color depends on the number of iterations
        hue := count / 50;
        saturation := 0.6;
        value := 0.5;

        PCardinal(APixmap.PixelAddr[i, j])^ := HSLtoRGB(hue, saturation, value);
        c1 := c1 + X;
      end;
      c2 := c2 + Y;
    end;
  end;

var
  au, ao: Double;
  dX, dY, bo, bu: Double;
  LWidth: Integer;
  LHeight: Integer;
  LTimer: TStopwatch;
  LBitmap: TBitmap;
  LSurface: ISkSurface;
begin
  LTimer := TStopwatch.StartNew;
  LWidth := Image1.Width;
  LHeight := Image1.Height;
  LSurface := TSkSurface.MakeRaster(LWidth, LHeight);

  LBitmap := TBitmap.Create(LWidth, LHeight);
  try
    ao := 1;
    au := -2;
    bo := 1.5;
    bu := -1.5;
    // Compute the per-pixel step along each axis directly (for speed)
    dX := (ao - au) / (LWidth);
    dY := (bo - bu) / (LHeight);
    DrawMandelbrotPixmap(LSurface.PeekPixels, dX, dY, au, bu, LWidth, LHeight);
    LBitmap.SkiaDraw(
      procedure (const ACanvas: ISkCanvas)
      begin
        ACanvas.DrawImage(LSurface.MakeImageSnapshot, 0, 0);
      end);
    Image1.Picture.Assign(LBitmap);
  finally
    LBitmap.Free;
  end;

  ShowMessage(LTimer.Elapsed.TotalMilliseconds.ToString + ' ms');
end;

However, your benchmark isn't representative: you're basically changing pixels one at a time, which is not a good way to measure the performance of a drawing library.

Also, tasks that change an image pixel by pixel almost always perform better as a shader running on the GPU. This is another advantage of Skia4Delphi: it lets you create shaders at runtime in the Skia Shading Language (SkSL, based on GLSL). I'm currently preparing a VCL sample of an animated shader; see the performance:

Edited November 27, 2021 by vfbb

hsauro · November 27, 2021

15 hours ago, angusj said:

Hi hsauro. Image32 should be completely cross-platform, so I'd be very interested if you've encountered problems in that regard.

I also doubt this since the drawing is done by directly addressing (ie coloring) every individual pixel (not by using a polygon renderer). There'd be little to no benefit in using another graphics library here. You could perhaps marginally improve pixel addressing by converting the temporary bitmap into a pf32bit pixelformat, getting the base image address (from bitmap.Scanline[bitmap.Height -1]) and efficiently offsetting that pointer to color everything (as per below).

For some reason I didn't realize image32 was portable, sorry about that. Thanks for the code however, because I wanted to play around a bit more with pixel manipulation with image32.

Edited November 27, 2021 by hsauro

hsauro · November 27, 2021

14 hours ago, vfbb said:

Hello. I adapted your test with Skia4Delphi to directly access the pixels instead of painting the pixel. The result was 45 ms on win64.

Thanks for the example! I was wondering whether Skia4Delphi offered a way to access the pixels directly.

hsauro · November 29, 2021

On 11/27/2021 at 10:47 AM, hsauro said:

For some reason I didn't realize image32 was portable, sorry about that. Thanks for the code however, because I wanted to play around a bit more with pixel manipulation with image32.

@angusj I finally got round to looking at Image32 more seriously, and it's working well so far. In fact, I think I'll use it instead of GDI+ in the future. Plus, as you mentioned earlier, it's cross-platform.

hsauro · November 30, 2021

Someone asked that I include the timings in the code and while I was at it I folded in the code provided by contributors here. I'll probably add TImage32 at some point as well.

https://github.com/hsauro/Mandelbrot

There is a new binary release on GitHub:

https://github.com/hsauro/Mandelbrot/releases/tag/1.1

Anders Melander · December 1, 2021

12 hours ago, hsauro said:

Someone asked that I include the timings in the code

If you're focusing on performance then you should perform the pixel loop in Y-X order instead of X-Y order.

- and replace all constant divisions with reciprocal multiplications, e.g. x/50 becomes x * (1/50).
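A minimal sketch of the reciprocal trick, applied to the hue calculation used in the thread's code (hue := Count / 50). The constant 1 / MaxIter is a constant expression evaluated by the compiler, so each pixel should cost a multiply instead of a divide. One caveat: 1/50 is not exactly representable in binary floating point, so the two forms can differ in the last bit, which doesn't matter for a color ramp. HueByDivision and HueByReciprocal are hypothetical names.

```pascal
program ReciprocalSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

const
  MaxIter = 50;
  InvMaxIter = 1 / MaxIter;  // constant expression, evaluated by the compiler

// Original form: one floating-point division per pixel
function HueByDivision(Count: Integer): Double;
begin
  Result := Count / MaxIter;
end;

// Reciprocal form: one multiplication per pixel
function HueByReciprocal(Count: Integer): Double;
begin
  Result := Count * InvMaxIter;
end;

var
  Count: Integer;
  Diff, MaxDiff: Double;
begin
  MaxDiff := 0;
  // The escape loop in the thread can produce counts up to MaxIter + 1
  for Count := 0 to MaxIter + 1 do
  begin
    Diff := Abs(HueByDivision(Count) - HueByReciprocal(Count));
    if Diff > MaxDiff then
      MaxDiff := Diff;
  end;
  WriteLn('Largest difference: ', MaxDiff);
end.
```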

angusj · December 1, 2021

1 hour ago, Anders Melander said:

If you're focusing on performance

I got the impression from the OP that he was primarily interested in relative performance (ie Skia vs VCL) rather than perfectly optimised performance.

I think we've demonstrated that third-party libraries offer no performance benefit when we're only considering coloring individual pixels.

Anders Melander · December 1, 2021

3 minutes ago, angusj said:

I got the impression from the OP that he was primarily interested in relative performance (ie Skia vs VCL) rather than perfectly optimised performance.

I think we've demonstrated that third-party libraries offer no performance benefit when we're only considering coloring individual pixels.

Yes, as long as all implementations set pixels via direct memory access then they should perform the same and then benchmarking is pointless. I haven't looked at the new revision to see if that is the case.
