hsauro

Skia versus VCL for plotting points

If anyone is interested in comparing skia (using skia4delphi) versus VCL for drawing points/pixels, here is a quick demo that draws the Mandelbrot set. The VCL drawing code doesn't use scanlines, which I know would be faster, but just uses Pixels[i,j], which is notoriously slow. The skia code uses DrawPoint, although I'm not sure whether that is the equivalent of Pixels.

 

The code also illustrates how you can draw to a skia canvas then copy the result to a TImage. The code could certainly be updated to improve its organization but I didn't have the time. 

 

Code at: 

 

https://github.com/hsauro/Mandelbrot

 

There is a binary release on GitHub:

 

https://github.com/hsauro/Mandelbrot/releases/tag/1.0

 

I couldn't include the binaries here because the skia dll exceeds the size limit for attachments. 

 

Spoiler Alert: skia was faster.

1 hour ago, hsauro said:

If anyone is interested in comparing skia (using skia4delphi) versus VCL for drawing points/pixels ....

I just did:

  • TCanvas.Pixels: 320 ms
  • skia4delphi: 130 ms
  • TBitmap.ScanLine: 60 ms
On 11/23/2021 at 9:15 PM, Anders Melander said:

I just did:

  • TCanvas.Pixels: 320 ms
  • skia4delphi: 130 ms
  • TBitmap.ScanLine: 60 ms

My guess: Graphics32 would even outperform this.

54 minutes ago, M.Joos said:

My guess: Graphics32 would even outperform this.

No doubt, but not by that much. I would guess it would be about 25% faster than TBitmap.ScanLine, i.e. around 45 ms on my system.

My point was more that it's not really fair to benchmark anything against TCanvas.Pixels since it's known to be dead slow. It's more of a convenience feature.

 

Without having profiled the code, I think the TBitmap.ScanLine implementation mostly suffers from the overhead of the call to GdiFlush inside TBitmap.GetScanLine. Since the original implementation writes the bitmap in X-Y order, I duplicated that in order to be fair, but this also means that the TBitmap.ScanLine implementation performs Width*Height calls to GetScanLine and thus GdiFlush. I circumvented this by caching the scanline row pointers in an array (without this it would have been slower than using TCanvas.Pixels). The X-Y order also means that each pixel write thrashes the CPU cache. It should be significantly faster in Y-X order.
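To illustrate the Y-X ordering described above, here is a minimal sketch. It is not the benchmark code from this thread: PixelValue, FillByRows and ReadPixel are hypothetical names made up for the example, and the pixel function is a placeholder rather than the Mandelbrot iteration. With the row as the outer loop, ScanLine is read only once per row and the writes go through consecutive memory, so no row-pointer cache is needed.

```pascal
program RowOrderFill;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, Vcl.Graphics;

// Placeholder pixel function (an assumption for this sketch, not from the
// thread): packs the low bytes of X and Y into a 32-bit value.
function PixelValue(X, Y: Integer): Cardinal;
begin
  Result := (Cardinal(Y and $FF) shl 8) or Cardinal(X and $FF);
end;

// Fills the bitmap in Y-X order: ScanLine is read once per row and the
// pixels of that row are written left to right through adjacent memory.
procedure FillByRows(Bmp: TBitmap);
var
  X, Y: Integer;
  Row: PCardinal;
begin
  Bmp.PixelFormat := pf32bit;
  for Y := 0 to Bmp.Height - 1 do
  begin
    Row := Bmp.ScanLine[Y];
    for X := 0 to Bmp.Width - 1 do
    begin
      Row^ := PixelValue(X, Y);
      Inc(Row); // typed pointer: advances by one 4-byte pixel
    end;
  end;
end;

// Reads one 32-bit pixel back; only used to check the result.
function ReadPixel(Bmp: TBitmap; X, Y: Integer): Cardinal;
var
  Row: PCardinal;
begin
  Row := Bmp.ScanLine[Y];
  Inc(Row, X);
  Result := Row^;
end;

var
  Bmp: TBitmap;
begin
  Bmp := TBitmap.Create;
  try
    Bmp.SetSize(64, 48);
    FillByRows(Bmp);
    Writeln(IntToHex(ReadPixel(Bmp, 3, 2), 8));
  finally
    Bmp.Free;
  end;
end.
```

Whether this beats the cached-pointer X-Y version, and by how much, would need to be measured on the actual Mandelbrot loop.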


I also experimented with caching the scan line pointers, and it seemed to work. I played around a little bit with both Image32 and Graphics32 but didn’t get very far. The advantage of skia over these other two is that it’s completely cross-platform using the same API and seems to generate very smooth antialiased curves.

3 hours ago, hsauro said:

The advantage of skia over these other two is that it’s completely cross-platform

Hi hsauro. Image32 should be completely cross-platform, so I'd be very interested if you've encountered problems in that regard.

 

6 hours ago, M.Joos said:

My guess: Graphics32 would even outperform this.

I also doubt this since the drawing is done by directly addressing (ie coloring) every individual pixel (not by using a polygon renderer). There'd be little to no benefit in using another graphics library here. You could perhaps marginally improve pixel addressing by converting the temporary bitmap into a pf32bit pixelformat, getting the base image address (from bitmap.Scanline[bitmap.Height -1]) and efficiently offsetting that pointer to color everything (as per below).

 

procedure DrawMandelbrotVCL2(bmp: TBitmap; X, Y, au, bu: Double; X2, Y2: Integer);
var
  c1, c2, z1, z2, tmp: Double;
  i, j, Count: Integer;
  hue, saturation, value: Double;
  p, currPixel: PColor;
  pixelsPerLine: Integer;
begin
  bmp.PixelFormat := pf32bit;
  // nb: assumes the usual bottom-up bitmap, where ScanLine[0] has the highest
  // address; should probably test here for the occasional non-inverted image
  p := bmp.ScanLine[0];
  // PColor arithmetic counts in 4-byte pixels, so the row stride is Width
  pixelsPerLine := bmp.Width;
  c2 := bu;
  for i := 10 to X2 - 1 do
  begin
    c1 := au;
    // start at column i of the first row
    currPixel := p;
    inc(currPixel, i);
    for j := 0 to Y2 - 1 do
    begin
      z1 := 0;
      z2 := 0;
      Count := 0;
      // Count is the iteration depth for this point;
      // if |z| >= 2 then z is not a member of the Mandelbrot set
      while (((z1 * z1 + z2 * z2 < 4) and (Count <= 50))) do
      begin
        tmp := z1;
        z1 := z1 * z1 - z2 * z2 + c1;
        z2 := 2 * tmp * z2 + c2;
        Inc(Count);
      end;
      // The color depends on the number of iterations
      hue := count / 50;
      saturation := 0.6;
      value := 0.5;
      currPixel^ := TColor(HSLtoRGB(hue, saturation, value));
      c1 := c1 + X;
      // step to the same column in the next row
      dec(currPixel, pixelsPerLine);
    end;
    c2 := c2 + Y;
  end;
end;

 

Edited by angusj


Hello. I adapted your test with Skia4Delphi to directly access the pixels instead of painting the pixel. The result was 45 ms on win64.

 

procedure TfrmMain.Button3Click(Sender: TObject);

  procedure DrawMandelbrotPixmap(APixmap: ISkPixmap; X, Y, au, bu: Double; X2, Y2: Integer);
  var
    c1, c2, z1, z2, tmp: Double;
    i, j, Count: Integer;
    hue, saturation, value: Double;
  begin
    c2 := bu;
    for i := 10 to X2 - 1 do
    begin
      c1 := au;
      for j := 0 to Y2 - 1 do
      begin
        z1 := 0;
        z2 := 0;
        Count := 0;
        // Count is the iteration depth for this point;
        // if |z| >= 2 then z is not a member of the Mandelbrot set
        while (((z1 * z1 + z2 * z2 < 4) and (Count <= 50))) do
        begin
          tmp := z1;
          z1 := z1 * z1 - z2 * z2 + c1;
          z2 := 2 * tmp * z2 + c2;
          Inc(Count);
        end;
        // The color depends on the number of iterations
        hue := count / 50;
        saturation := 0.6;
        value := 0.5;

        PCardinal(APixmap.PixelAddr[i, j])^ := HSLtoRGB(hue, saturation, value);
        c1 := c1 + X;
      end;
      c2 := c2 + Y;
    end;
  end;

var
  au, ao: Double;
  dX, dY, bo, bu: Double;
  LWidth: Integer;
  LHeight: Integer;
  LTimer: TStopwatch;
  LBitmap: TBitmap;
  LSurface: ISkSurface;
begin
  LTimer := TStopwatch.StartNew;
  LWidth := Image1.Width;
  LHeight := Image1.Height;
  LSurface := TSkSurface.MakeRaster(LWidth, LHeight);

  LBitmap := TBitmap.Create(LWidth, LHeight);
  try
    ao := 1;
    au := -2;
    bo := 1.5;
    bu := -1.5;
    // compute the per-pixel step once, for speed
    dX := (ao - au) / (LWidth);
    dY := (bo - bu) / (LHeight);
    DrawMandelbrotPixmap(LSurface.PeekPixels, dX, dY, au, bu, LWidth, LHeight);
    LBitmap.SkiaDraw(
      procedure (const ACanvas: ISkCanvas)
      begin
        ACanvas.DrawImage(LSurface.MakeImageSnapshot, 0, 0);
      end);
    Image1.Picture.Assign(LBitmap);
  finally
    LBitmap.Free;
  end;

  Showmessage(LTimer.Elapsed.TotalMilliseconds.ToString+' ms');
end;

 

However, your benchmark isn't accurate: you're basically changing the image pixel by pixel, which is not a good way to measure drawing library performance.

 

Also, tasks that change an image pixel by pixel almost always perform better when written as a shader that runs on the GPU. This is another advantage of Skia4Delphi: it lets you create shaders at runtime in SkSL, the Skia Shading Language (based on GLSL). I'm currently preparing a VCL sample of an animated shader; see the performance:

 

 

 

Edited by vfbb
15 hours ago, angusj said:

Hi hsauro. Image32 should be completely cross-platform, so I'd be very interested if you've encountered problems in that regard.

 

I also doubt this since the drawing is done by directly addressing (ie coloring) every individual pixel (not by using a polygon renderer). There'd be little to no benefit in using another graphics library here. You could perhaps marginally improve pixel addressing by converting the temporary bitmap into a pf32bit pixelformat, getting the base image address (from bitmap.Scanline[bitmap.Height -1]) and efficiently offsetting that pointer to color everything (as per below).

 

 

For some reason I didn't realize image32 was portable, sorry about that. Thanks for the code however, because I wanted to play around a bit more with pixel manipulation with image32. 

Edited by hsauro

14 hours ago, vfbb said:

Hello. I adapted your test with Skia4Delphi to directly access the pixels instead of painting the pixel. The result was 45 ms on win64.

 

Thanks for the example! I was wondering if there was a way to get direct access to the pixels from skia4delphi.

On 11/27/2021 at 10:47 AM, hsauro said:

For some reason I didn't realize image32 was portable, sorry about that. Thanks for the code however, because I wanted to play around a bit more with pixel manipulation with image32. 

@angusj I finally got round to looking more seriously at Image32, and it's working well so far. In fact, I think I'll use this instead of GDI+ in the future. Plus, as you mentioned earlier, it's cross-platform.

12 hours ago, hsauro said:

Someone asked that I include the timings in the code

If you're focusing on performance then you should perform the pixel loop in Y-X order instead of X-Y order.

- and replace all the constant divisions with reciprocal multiplications. E.g. x/50 = x * (1/50).
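As a small sketch of the reciprocal suggestion: the constant 50 is the iteration cap used in the loops in this thread, while MaxIter, InvMaxIter and HueFromCount are hypothetical names introduced only for this example. The reciprocal is a constant expression, so the per-pixel work becomes one multiplication. The result can differ from a true division in the last bit, which should not matter for a hue value.

```pascal
program ReciprocalHue;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  MaxIter = 50;                // iteration cap used in the thread's loops
  InvMaxIter = 1.0 / MaxIter;  // constant expression, evaluated by the compiler

// Hypothetical helper: maps an iteration count to a hue using a
// multiplication by the precomputed reciprocal instead of Count / 50.
function HueFromCount(Count: Integer): Double;
begin
  Result := Count * InvMaxIter;
end;

begin
  Writeln(FloatToStr(HueFromCount(25)));
end.
```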

1 hour ago, Anders Melander said:

If you're focusing on performance

I got the impression from the OP that he was primarily interested in relative performance (ie Skia vs VCL) rather than perfectly optimised performance.

I think we've demonstrated that third-party libraries offer no performance benefit when we're only considering coloring individual pixels.

3 minutes ago, angusj said:

I got the impression from the OP that he was primarily interested in relative performance (ie Skia vs VCL) rather than perfectly optimised performance.

I think we've demonstrated that third-party libraries offer no performance benefit when we're only considering coloring individual pixels.

Yes, as long as all implementations set pixels via direct memory access then they should perform the same and then benchmarking is pointless. I haven't looked at the new revision to see if that is the case.

