

Renate Schaaf
-
Content Count
136 -
Joined
-
Last visited
-
Days Won
5
Posts posted by Renate Schaaf
-
-
1 hour ago, Fr0sT.Brutal said:Do you mean these issues exist inside the DLL itself?
No. The functions have just not been used not quite correctly in the example procedure that encodes a file. I should have made that more clear in my post.
-
29 minutes ago, David Heffernan said:Did you submit a PR for these changes?
No. I'd like to find out if this really works in all cases, first. Then I'll have to teach myself how to do PRs :).
-
I spent some time to have Delphi interface correctly with the Lame-encoder-DLL, so I thought it a good idea to share the result, since I also could not find any good Delphi-code for this on the net.
The Lame-source comes with a rudimentary Delphi-header-file, but this has several issues, which I have tried to fix:
- The file references unnecessary stuff, preventing compilation, easy to fix.
- The encoding starts at the beginning of the wave-file, thereby encoding the header. This gives a noise at the beginning and can switch the stereo-channels. Fix: Offset the source into the data-section of the wave-file. Since this offset can vary, I have used the utility functions by Kambiz R. Khojasteh (http://www.delphiarea.com) to retrieve the necessary info using WinApi.MMSystem.
- Lame suggests writing a VBR-Header to the file, even though it's CBR, I have changed the routine accordingly. This way devices can e.g. figure out the duration of the mp3-audio more easily.
- Instead of file-handles I'm using TFileStream, that seems to speed up encoding considerably.
Usage: EncodeWavToMP3(WaveFile, MP3File, Bitrate)
WaveFile needs to be 16-bit Stereo, but that could be adjusted.
Bitrate is a constant bitrate, for example 128. Support for VBR could be added.
If you use it and find something wrong, I'd like to know 🙂
Here is the unit:
unit MP3ExportLame; interface Uses System.SysUtils, WinApi.Windows, System.Classes; type // type definitions PHBE_STREAM = ^THBE_STREAM; THBE_STREAM = LongWord; BE_ERR = LongWord; const // encoding formats BE_CONFIG_MP3 = 0; BE_CONFIG_LAME = 256; // error codes BE_ERR_SUCCESSFUL: LongWord = 0; BE_ERR_INVALID_FORMAT: LongWord = 1; BE_ERR_INVALID_FORMAT_PARAMETERS: LongWord = 2; BE_ERR_NO_MORE_HANDLES: LongWord = 3; BE_ERR_INVALID_HANDLE: LongWord = 4; // format specific variables BE_MP3_MODE_STEREO = 0; BE_MP3_MODE_DUALCHANNEL = 2; BE_MP3_MODE_MONO = 3; // other constants BE_MAX_HOMEPAGE = 256; type TMP3 = packed record dwSampleRate: LongWord; byMode: Byte; wBitRate: Word; bPrivate: LongWord; bCRC: LongWord; bCopyright: LongWord; bOriginal: LongWord; end; TLHV1 = packed record // STRUCTURE INFORMATION dwStructVersion: DWORD; dwStructSize: DWORD; // BASIC ENCODER SETTINGS dwSampleRate: DWORD; // ALLOWED SAMPLERATE VALUES DEPENDS ON dwMPEGVersion dwReSampleRate: DWORD; // DOWNSAMPLERATE, 0=ENCODER DECIDES nMode: Integer; // BE_MP3_MODE_STEREO, BE_MP3_MODE_DUALCHANNEL, BE_MP3_MODE_MONO dwBitrate: DWORD; // CBR bitrate, VBR min bitrate dwMaxBitrate: DWORD; // CBR ignored, VBR Max bitrate nQuality: Integer; // Quality setting (NORMAL,HIGH,LOW,VOICE) dwMpegVersion: DWORD; // MPEG-1 OR MPEG-2 dwPsyModel: DWORD; // FUTURE USE, SET TO 0 dwEmphasis: DWORD; // FUTURE USE, SET TO 0 // BIT STREAM SETTINGS bPrivate: LONGBOOL; // Set Private Bit (TRUE/FALSE) bCRC: LONGBOOL; // Insert CRC (TRUE/FALSE) bCopyright: LONGBOOL; // Set Copyright Bit (TRUE/FALSE) bOriginal: LONGBOOL; // Set Original Bit (TRUE/FALSE_ // VBR STUFF bWriteVBRHeader: LONGBOOL; // WRITE XING VBR HEADER (TRUE/FALSE) bEnableVBR: LONGBOOL; // USE VBR ENCODING (TRUE/FALSE) nVBRQuality: Integer; // VBR QUALITY 0..9 btReserved: array [0 .. 255] of Byte; // FUTURE USE, SET TO 0 end; TAAC = packed record dwSampleRate: LongWord; byMode: Byte; wBitRate: Word; byEncodingMethod: Byte; end; TFormat = packed record case Byte of 1: (mp3: TMP3); 2: (lhv1: TLHV1); 3: (aac: TAAC); end; TBE_Config = packed record dwConfig: LongWord; format: TFormat; end; PBE_Config = ^TBE_Config; TBE_Version = record byDLLMajorVersion: Byte; byDLLMinorVersion: Byte; byMajorVersion: Byte; byMinorVersion: Byte; byDay: Byte; byMonth: Byte; wYear: Word; zHomePage: Array [0 .. BE_MAX_HOMEPAGE + 1] of Char; end; PBE_Version = ^TBE_Version; //Headers for Lame_enc.dll (ver. 3.100) Function beInitStream(var pbeConfig: TBE_Config; var dwSample: LongWord; var dwBufferSize: LongWord; var phbeStream: THBE_STREAM): BE_ERR; cdecl; external 'Lame_enc.dll'; Function beEncodeChunk(hbeStream: THBE_STREAM; nSamples: LongWord; var pSample; var pOutput; var pdwOutput: LongWord): BE_ERR; cdecl; external 'Lame_enc.dll'; Function beDeinitStream(hbeStream: THBE_STREAM; var pOutput; var pdwOutput: LongWord): BE_ERR; cdecl; external 'Lame_enc.dll'; Function beCloseStream(hbeStream: THBE_STREAM): BE_ERR; cdecl; external 'Lame_enc.dll'; Procedure beVersion(var pbeVersion: TBE_Version); cdecl; external 'Lame_enc.dll'; // Added header for beWriteVBRHeader Procedure beWriteVBRHeader(MP3FileName: pAnsiChar); cdecl; external 'Lame_enc.dll'; Procedure EncodeWavToMP3(WaveFile, MP3File: string; BitRate: Integer); // BitRate 128 192 256 etc. implementation uses WinApi.MMSystem; { ---------------------------------------- } { The following functions retrieve the necessary info from the input-wave-file. } { Source: } { WaveUtils - Utility functions and data types } { by Kambiz R. Khojasteh } { } { kambiz@delphiarea.com } { http://www.delphiarea.com } function mmioStreamProc(lpmmIOInfo: PMMIOInfo; uMsg, lParam1, lParam2: DWORD) : LRESULT; stdcall; var Stream: TStream; begin if Assigned(lpmmIOInfo) and (lpmmIOInfo^.adwInfo[0] <> 0) then begin Stream := TStream(lpmmIOInfo^.adwInfo[0]); case uMsg of MMIOM_OPEN: begin if TObject(lpmmIOInfo^.adwInfo[0]) is TStream then begin Stream.Seek(0, SEEK_SET); lpmmIOInfo^.lDiskOffset := 0; Result := MMSYSERR_NOERROR; end else Result := -1; end; MMIOM_CLOSE: Result := MMSYSERR_NOERROR; MMIOM_SEEK: try if lParam2 = SEEK_CUR then Stream.Seek(lpmmIOInfo^.lDiskOffset, SEEK_SET); Result := Stream.Seek(lParam1, lParam2); lpmmIOInfo^.lDiskOffset := Result; except Result := -1; end; MMIOM_READ: try Stream.Seek(lpmmIOInfo^.lDiskOffset, SEEK_SET); Result := Stream.Read(Pointer(lParam1)^, lParam2); lpmmIOInfo^.lDiskOffset := Stream.Seek(0, SEEK_CUR); except Result := -1; end; MMIOM_WRITE, MMIOM_WRITEFLUSH: try Stream.Seek(lpmmIOInfo^.lDiskOffset, SEEK_SET); Result := Stream.Write(Pointer(lParam1)^, lParam2); lpmmIOInfo^.lDiskOffset := Stream.Seek(0, SEEK_CUR); except Result := -1; end else Result := MMSYSERR_NOERROR; end; end else Result := -1; end; function OpenStreamWaveAudio(Stream: TStream): HMMIO; var mmIOInfo: TMMIOINFO; begin FillChar(mmIOInfo, SizeOf(mmIOInfo), 0); mmIOInfo.pIOProc := @mmioStreamProc; mmIOInfo.adwInfo[0] := DWORD(Stream); Result := mmioOpen(nil, @mmIOInfo, MMIO_READWRITE); end; function GetWaveAudioInfo(mmIO: HMMIO; var pWaveFormat: PWaveFormatEx; var DataSize, DataOffset: DWORD): Boolean; function GetWaveFormat(const ckRIFF: TMMCKInfo): Boolean; var ckFormat: TMMCKInfo; begin Result := False; ckFormat.ckid := mmioStringToFOURCC('fmt', 0); if (mmioDescend(mmIO, @ckFormat, @ckRIFF, MMIO_FINDCHUNK) = MMSYSERR_NOERROR) and (ckFormat.cksize >= SizeOf(TWaveFormat)) then begin if ckFormat.cksize < SizeOf(TWaveFormatEx) then begin GetMem(pWaveFormat, SizeOf(TWaveFormatEx)); FillChar(pWaveFormat^, SizeOf(TWaveFormatEx), 0); end else GetMem(pWaveFormat, ckFormat.cksize); Result := (mmioRead(mmIO, pAnsiChar(pWaveFormat), ckFormat.cksize) = Integer(ckFormat.cksize)); end; end; function GetWaveData(const ckRIFF: TMMCKInfo): Boolean; var ckData: TMMCKInfo; begin Result := False; ckData.ckid := mmioStringToFOURCC('data', 0); if (mmioDescend(mmIO, @ckData, @ckRIFF, MMIO_FINDCHUNK) = MMSYSERR_NOERROR) then begin DataSize := ckData.cksize; DataOffset := ckData.dwDataOffset; Result := True; end; end; var ckRIFF: TMMCKInfo; OrgPos: Integer; begin Result := False; OrgPos := mmioSeek(mmIO, 0, SEEK_CUR); try mmioSeek(mmIO, 0, SEEK_SET); ckRIFF.fccType := mmioStringToFOURCC('WAVE', 0); if (mmioDescend(mmIO, @ckRIFF, nil, MMIO_FINDRIFF) = MMSYSERR_NOERROR) then begin pWaveFormat := nil; if GetWaveFormat(ckRIFF) and GetWaveData(ckRIFF) then Result := True else if Assigned(pWaveFormat) then ReallocMem(pWaveFormat, 0); end finally mmioSeek(mmIO, OrgPos, SEEK_SET); end; end; function GetStreamWaveAudioInfo(Stream: TStream; var pWaveFormat: PWaveFormatEx; var DataSize, DataOffset: DWORD): Boolean; var mmIO: HMMIO; begin Result := False; if Stream.Size <> 0 then begin mmIO := OpenStreamWaveAudio(Stream); if mmIO <> 0 then try Result := GetWaveAudioInfo(mmIO, pWaveFormat, DataSize, DataOffset); finally mmioClose(mmIO, MMIO_FHOPEN); end; end; end; Procedure EncodeWavToMP3(WaveFile, MP3File: string; BitRate: Integer); var beConfig: TBE_Config; dwSamples, dwSamplesMP3: LongWord; hbeStream: THBE_STREAM; error: BE_ERR; pBuffer: PSmallInt; pMP3Buffer: PByte; done: LongWord; dwWrite: LongWord; ToRead: LongWord; ToWrite: LongWord; // changed from THandle to TFileStream fs, ft: TFileStream; TotalSize: DWORD; // variables to hold the wave info necessary for encoding pWaveFormat: PWaveFormatEx; DataOffset, DataSize, InputSampleRate: DWORD; begin beConfig.dwConfig := BE_CONFIG_LAME; fs := TFileStream.Create(WaveFile, fmOpenRead or fmShareDenyWrite); ft := TFileStream.Create(MP3File, fmCreate or fmShareDenyWrite); try TotalSize := fs.Size; // obtain info from source wave file try if not GetStreamWaveAudioInfo(fs, pWaveFormat, DataSize, DataOffset) then raise Exception.Create ('Unable to obtain necessary info from wave file.'); if (pWaveFormat.nChannels <> 2) or (pWaveFormat.wBitsPerSample <> 16) then raise Exception.Create('Wave format must be 16bit Stereo.'); InputSampleRate := pWaveFormat.nSamplesPerSec; finally FreeMem(pWaveFormat); end; // Structure information beConfig.format.lhv1.dwStructVersion := 1; beConfig.format.lhv1.dwStructSize := SizeOf(beConfig); // Basic encoder setting beConfig.format.lhv1.dwSampleRate := InputSampleRate; beConfig.format.lhv1.dwReSampleRate := InputSampleRate; beConfig.format.lhv1.nMode := BE_MP3_MODE_STEREO; beConfig.format.lhv1.dwBitrate := BitRate; beConfig.format.lhv1.dwMaxBitrate := BitRate; beConfig.format.lhv1.nQuality := 4; beConfig.format.lhv1.dwMpegVersion := 1; // MPEG1 beConfig.format.lhv1.dwPsyModel := 0; beConfig.format.lhv1.dwEmphasis := 0; // Bit Stream Settings beConfig.format.lhv1.bPrivate := False; beConfig.format.lhv1.bCRC := True; beConfig.format.lhv1.bCopyright := True; beConfig.format.lhv1.bOriginal := True; // VBR Stuff // Have it write a VBRHeader, as recommended by Lame, even though it's CBR beConfig.format.lhv1.bWriteVBRHeader := True; beConfig.format.lhv1.bEnableVBR := False; beConfig.format.lhv1.nVBRQuality := 0; error := beInitStream(beConfig, dwSamples, dwSamplesMP3, hbeStream); if error = BE_ERR_SUCCESSFUL then begin pBuffer := AllocMem(dwSamples * 2); pMP3Buffer := AllocMem(dwSamplesMP3); try // Position the source file stream at the beginning of the PCM-data: done := DataOffset; fs.Seek(DataOffset, soFromBeginning); While (done < TotalSize) do begin if (done + dwSamples * 2 < TotalSize) then ToRead := dwSamples * 2 else begin ToRead := TotalSize - done; FillChar(pBuffer^, dwSamples * 2, 0); end; fs.Read(pBuffer^, ToRead); error := beEncodeChunk(hbeStream, ToRead div 2, pBuffer^, pMP3Buffer^, ToWrite); if error <> BE_ERR_SUCCESSFUL then begin beCloseStream(hbeStream); raise Exception.Create('Encoding Error'); end; ft.Write(pMP3Buffer^, ToWrite); done := done + ToRead; end; error := beDeinitStream(hbeStream, pMP3Buffer^, dwWrite); if error <> BE_ERR_SUCCESSFUL then begin beCloseStream(hbeStream); raise Exception.Create('Close Error'); end; if dwWrite <> 0 then begin ft.Write(pMP3Buffer^, dwWrite); end; error := beCloseStream(hbeStream); if error <> BE_ERR_SUCCESSFUL then begin raise Exception.Create('Close Error'); end; finally FreeMem(pBuffer); FreeMem(pMP3Buffer); end; end else begin Raise Exception.Create('InitStream failure'); end; finally fs.free; ft.free; end; beWriteVBRHeader(pAnsiChar(AnsiString(MP3File))); end; end.
-
1
-
1
-
1 hour ago, Remy Lebeau said:I generally prefer to use a local accessor class instead:
What's the advantage? When do you use a class helper and when do you use a local accessor class, to me they both seem to serve the same purpose.
-
I might have introduced a bug in GR32_Resamplers, as it is, the left bound of the source rectangle is ignored. The fix is simple:
Line 1778 needs to be
SourceColor := @Src.Bits[ClusterY[0].Pos * Src.Width+SrcRect.Left]; //+SrcRect.Left was missing!
and line 1806:
SourceColor := @Src.Bits[ClusterY[Y].Pos * Src.Width+SrcRect.Left];//+SrcRect.Left was missing!
Hope, you read this, Anders. If I don't hear from you, I'll create an issue on GitHub.
Edit: I definitely intoduced it by changing the order of the loops, I checked against an old version. Instead of
+SrcRect.Left
one should probably use
+MapXLoPos
Renate
-
1
-
-
55 minutes ago, Anders Melander said:Please verify that the comments I've added in the source are correct
Correct and very clear.
I like the introduction of the MappingTablePrecicion.. constants.
-
5 hours ago, Anders Melander said:The [10:22] must mean that you have multiplied two [21:11] values at some point but I can't really spot where that happens.
One weight in x-direction times another one in y-direction. Thanks for the explanation, now I understand the notation.
5 hours ago, Anders Melander said:I don't understand what this refers to
That my posted "correction" can be safely ignored :).
5 hours ago, Anders Melander said:Ah, now I get it. It was your comments that confused me. For example:
Sometimes I'm making it too complicated, sorry.
-
2 hours ago, Anders Melander said:You say that you're "computing the integral for the convolution with the filter using the midpoint-rule" so I would have expected to see the averaging of two filter values in order to find the midpoint but I can't really match that with the above. Can you explain what's going on, please?
No, that would be the trapezoidal rule, and that is just as bad as using the antiderivatives.
Midpoint rule:
integral from x1 to x2 f(x) dx is appoximately f(0.5*(x1+x2))*(x2-x1).
The multiplications by oldscale transform this from the scale of the source (pixelwidth 1) to the scale of the destination ("pixelwidth" NewWidth/OldWidth).
If you want to know why using the integral is a good way of thinking about the algorithm, there's a little article in the doc folder. I couldn't explain it any better here.
2 hours ago, Anders Melander said:Also, am I correct in assuming that you're doing calculation in [21:11] fixed precision and storing the weights in [10:22] instead of the old [24:8] and [16:16]?
I'm not sure what you're asking, could you do some explaining back? For the pre-mult I'm using precision $100, for the others $800. Is that what you are asking?
BTW, the check x2>x1 isn't necessary, Filter(x3) would be zero if not true.
-
14 hours ago, Anders Melander said:One thing that almost gave me brain cancer was the abysmal ASM generated by this code (not your fault):
I was hoping for you to untwiddle this 🙂
Meanwhile I found the reason for the box-kernel not being up to par, it's here:
function TBoxKernel.GetWidth: TFloat; begin Result := 1; //must be 0.5! end;
I also spotted a mistake in my code. It could be that the interval [x0-0.5,x0+0.5] is completely outside of the support of the filter. In this case a false non-zero weight would be generated. So a check of x2>x1 needs to be added:
... for J := Left to Right do begin x0 := J - Center; // previous weight: Filter(x0*Oldscale)*Oldscale x1 := max(x0 - 0.5, -FilterWidth); x2 := min(x0 + 0.5, FilterWidth); // intersect symmetric interval of length 1 about x0 with support of scaled filter if (x2 > x1) then begin x3 := 0.5 * (x2 + x1); // new center Weight := Round(Prec * Filter(x3 * OldScale) * OldScale * (x2 - x1)); // intersection with support entered into the weight if Weight <> 0 then begin Inc(Count, Weight); K := Length(Result[I]); SetLength(Result[I], K + 1); Result[I][K].Pos := Constrain(J, SrcLo, SrcHi - 1); Result[I][K].Weight := Weight; end; end; end;
Also, at the analogous place for the case scale>1. The code for the 2 cases could be unified, but it's better to understand as it is.
-
I've managed to translate the alternative computation of weights into Graphics32. It was actually quite easy :). The idea is, to compute the intergral for the convolution with the filter via the midpoint-rule. Before I've used the exact antiderivatives, leading to constant underestimations of peaks and valleys in the bitmap function, and thus to a loss of detail. Now pixels not lying totally within the support of the filter get their weight reduced, leading to less artefacts, but the peaks are better estimated, so contrast and detail is better preserved (the math is for readability):
//Precision of weights, //Totals Cb,Cg,Cr,Ca in Resample need to be unscaled by Prec * Prec const Prec = $800; function BuildMappingTableNew(DstLo, DstHi: Integer; ClipLo, ClipHi: Integer; SrcLo, SrcHi: Integer; Kernel: TCustomKernel): TMappingTable; var ... begin ... else if Scale < 1 then begin OldScale := Scale; Scale := 1 / Scale; FilterWidth := FilterWidth * Scale; for I := 0 to ClipW - 1 do begin if FullEdge then Center := SrcLo - 0.5 + (I - DstLo + ClipLo + 0.5) * Scale else Center := SrcLo + (I - DstLo + ClipLo) * Scale; Left := Floor(Center - FilterWidth); Right := Ceil(Center + FilterWidth); Count := -Prec; for J := Left to Right do begin //changed part x0 := J - Center; // old weight: Filter(x0*Oldscale)*Oldscale x1 := max(x0 - 0.5, -FilterWidth); x2 := min(x0 + 0.5, FilterWidth); // intersect symmetric interval of length 1 about x0 with support of scaled filter x3 := 0.5 * (x2 + x1); // new center Weight := Round(Prec * Filter(x3 * OldScale) * OldScale * (x2 - x1)); // intersection with support entered into the weight if Weight <> 0 then begin Inc(Count, Weight); K := Length(Result[I]); SetLength(Result[I], K + 1); Result[I][K].Pos := Constrain(J, SrcLo, SrcHi - 1); Result[I][K].Weight := Weight; end; end; ...
At first the results were getting too dark and contrast was increased. By increasing the accuracy of the weights and using my own way of rounding the averaged result into bytes, this seems no longer the case:
If RangeCheck then begin C.B := min((max(Cb, 0) + $1FFFFF) shr 22, 255); //unscale and round C.G := min((max(Cg, 0) + $1FFFFF) shr 22, 255); C.R := min((max(Cr, 0) + $1FFFFF) shr 22, 255); C.A := min((max(Ca, 0) + $1FFFFF) shr 22, 255); end else begin C.B := (Cb + $1FFFFF) shr 22; C.G := (Cg + $1FFFFF) shr 22; C.R := (Cr + $1FFFFF) shr 22; C.A := (Ca + $1FFFFF) shr 22; end; // Combine it with the background case CombineOp of dmOpaque: DstLine[I] := C.ARGB; ...
The changed file uScalingProcsGR32.pas is attached.
If you are interested in a test, here is a short video, zooms and pans have been done with the new Lanczos. The second picture is one of the most notorious in my collection.
-
1
-
-
4 hours ago, Anders Melander said:With optimization enabled, the performance of Resample() now matches, and in some cases even exceed, your algorithm
Confirmed. Nice.
4 hours ago, Anders Melander said:The performance of Resample() with the 64-bit compiler is horrible.
I've always been disappointed in the performance of 64-bit code. No idea, why mine is faster.
Meanwhile I have come up with an alternative way to compute the weights, which seems to decrease artefacts while keeping the brilliance. So far I could not translate it into Graphics32, the filters there all live on different sized intervals, wheras mine all live on [-1,1], and at this time of the day my aging brain can't deal with the math. Maybe tomorrow.
-
I'm too stupid to create a pull-request on GitHub, if I read the help for it, I don't understand the first thing about it.
Somehow I can post changes to my own repository using GitHub-Desktop, but I don't really understand what it's doing :).
So here is the changed GR32_Resamplers.pas.
7 hours ago, Anders Melander said:The quality could probably be improved to match WIC but the question is if it's worth the trouble or if the current quality is "good enough"...?
I personally think the quality is good enough, on Ultra-High DPI I can hardly see the difference between a low-quality and high-quality filter for "normal" images, glyphs are another story.
For me the filters kick in when animating pictures by zooms and pans, the artifacts then really show up.
Theoretically the quality could be improved by a higher precision of the weights, which currently run from -256 to 256. Up to $800 should be possible, which I have done for the none-premultiply-modes. But again, I have a hard time seeing a difference.
Also, again theoretically, the algorithm using antiderivates of filters should yield better results (except there isn't any in closed form for the Lanczos). But though I can see less artifacts, they decrease details, as you have seen. I've probably made some mistake in the implementation. It could be the same kind of mistake you can make by computing numerical derivatives, Small divided by Small.
Time to hit the sack.
Renate
-
1
-
-
I have been able to make the GR32-resampling as fast as mine unthreaded, by making some simple changes to the procedure GR32_Resamplers.Resample (in Implementation-part):
changing the order of X- and Y- loop in the filling of the horizontal buffer, avoiding jumps in the bitmap-memory,
using pointers to walk along the arrays,
turning on compiler-optimization for the procedure (biggest improvement)
If you want to see for yourself, in the attachment are 3 changed .pas-files that need to overwrite the corresponding ones in the Algorithms-folder under Bitmap Scaling.
Renate
-
1
-
-
Here is a new version with lots of changes (thanks, Anders):
I have changed the algorithm for the filters except box from continous- to discrete-space convolution, because the result didn't seem to look as good as expected, maybe rounding errors, further investigation needed. But I just noticed, that for my videos I need to go back to continuous space, because that gives much smoother zooms and pans.
Anyway, now the algorithm itself is more or less identical to Graphics32, so you have a completely fair comparison. I also followed Anders' idea of including a stopwatch in the signature of the test procedures that can be turned on only if it matters.More changes:
Alpha:
I've put some alpha-shenanigans into the loaded bitmaps, so you can see how the alpha-channel is handled.There now is a TAlphaCombineMode=(amIndependent, amPreMultiply, amIgnore) doing the following:
amIndependent:
The behaviour as before, all channels are sampled independently of one another.
This is the behavior of GR32 for drawmode dmOpaque.amPreMultiply:
RGB is mulitplied by alpha prior to resampling, after that nonzero alpha is "unmultiplied".
I had to sacrifice some precision for the weights and the alpha-multiplication in order to stay within integer-range.
This is the behavior of WICImage with Source.AlphaFormat=afDefined (it seems to have made the same sacrifices).
GR32 with drawmode dmBlend does pre-multiply, but doesn't unmultiply, or, rather it "post-multiplies".amIgnore:
RGB is resampled, alpha is set to 255. Faster for images not needing an alpha-channel.
This is the behavior of WICImage with Source.AlphaFormat=afIgnored.To prevent apples from being compared to oranges I have included a notice when a certain algorithm does not fully support the chosen mode.
To avoid code repetition I had to introduce a number of procedural variables, which slow things down a little bit, but it's still nice and fast.
Threads:
The <= 16 threads hardly consume any CPU-time while waiting, for the time being I want to keep them.
There is a TTask-version included in the source, it has worked so far, but TTask behaves somewhat erratically timing-wise, and I don't understand what it's doing.I'll try to write a TKernelResample-descendent (unthreaded) for Graphics32, maybe I can speed it up, let's see.
Renate
-
Now I got the pre-multiplication in place, but before I do an update, I have a question for Anders:
The pre-multiplication completely erases any RGB-Info stored at transparent pixels. So those can't be unmultiplied. I can see that this is desirable if the image contains a mask and needs to be rescaled.
The RGB-part of the scaled image looks something like this, and btw. the result for WICImage looks the same.
What puzzles me though is the fact, that Graphics32 still keeps the transparent parts of the RGB in place, so it must have some magic way to compute 0/0, which I would like to learn ...
Further looking at the source code, I cannot see any pre-multiplication done:
C := Src.Bits[X + ClusterY[Y].Pos * Src.Width]; ClustYW := ClusterY[Y].Weight; Inc(Ca, Integer(C shr 24) * ClustYW); Inc(Cr, Integer(C and $00FF0000) shr 16 * ClustYW); Inc(Cg, Integer(C and $0000FF00) shr 8 * ClustYW); Inc(Cb, Integer(C and $000000FF) * ClustYW);
The Bytes of the channels are just lumped into one cardinal, the bytes are extracted and multiplied by the weights, or am I just dense, or has this already been done to Src.Bits?
Have a nice Sunday,
Renate
-
50 minutes ago, Anders Melander said:The premultiply/unpremultiply really messes with the quality since you're operating on 8-bit values
My cache arrays are floats scaled to integer, so I can do the multiply while filling the array, and the unmultiply on the integers. Let's see.
52 minutes ago, Anders Melander said:The memory usage can be lessened by processing one channel at a time.
That's a great idea.
53 minutes ago, Anders Melander said:Of course it's wanted. Improvements are always welcome
I'll see, whether I can plug it in, but the source of Graphics32 is a bit intimidating. But then, being retired, I have all day to understand it.
-
4 hours ago, Anders Melander said:As a Graphics32 contributor I would have preferred though that you'd spent the (probably considerable) effort improving Graphics32 🙂
I can still do that it it's wanted and still makes sense (see below).
4 hours ago, Anders Melander said:You cannot process each RGBA channel independently. Instead you need to operate on premultiplied RGBA values. For example you wouldn't want an RGBA value of $00FFFFFF to have any influence on neighboring pixels. So: 1) Premultiply, 2) Resample, 3) Unpremultiply.
Oh, I need to fix that ASA
. And include the better GR32-routine.
Thanks for your feedback,
Renate
-
I have made my parallel resampling of bitmaps now as fast as I can get it. Now I would find it interesting to know, how the algorithm performs on other systems, and it would be super to get suggestions for improvement.
The procedures can be found in the unit uScale under Algorithms in the attached zip.
I have tested against Windows StretchBlt-Half_Tone, WICImage and Graphics32. On my PC (AMD Ryzen 7 1700, 8-core) I see a substantial improvement in speed.
The threads are based on TThread rather than TTask or TParallel, because I had failures using the latter two, whereas the oldfashioned threads haven't failed me ever in gazillions of runs.
If you want to compile and run the comparison project, you need to include the source folder of Graphics32 (latest version) in your search path. For convenience it is included in the zip under Algorithms. I couldn't find a way to divvy out only the units needed for resampling.
The test against Graphics32 might be slightly unfair, because of bitmaps being assigned to TBitmap32 before the processing.
Right now, the procedure itself cannot be run in concurrent threads, because the array of TThreads is a global variable, I need to change the design (ideas welcome).
There might be still room for improvement by minimizing the cache misses, but my head can't handle it at the moment.
Hope you find it useful.
Renate
-
3
-
-
57 minutes ago, Stefan Glienke said:setting freed memory to a certain pattern
Thanks, I didn't know that.
Otherwise my question (starting with why) has, as far as I am concerned, been answered by Attila, and I wouldn't want to use anymore bandwidth for it.
-
1 minute ago, David Heffernan said:No I don't think so. Follow the actual rules.
I'm doing that, unless I'm too dim to notice. I just wanted to understand why it doesn't blow up in my face .. oh well.
-
30 minutes ago, Attila Kovacs said:After the "Free" the reference is a "dangling pointer".
That's why I don't use it :).
33 minutes ago, Attila Kovacs said:Try to use explicit Free's and you don't have to keep the implementation details in your head.
Good advice.
34 minutes ago, Attila Kovacs said:Or keep it there and follow the rules.
So you are saying it is Ok (other than maintainance headaches) to keep it there on condition that one follows the rules? I'm saying this, because I remember reactions like "Shriek" when somebody posted code like this.
-
type TDoSomething = class Danger: string; constructor create; procedure DoSomething; end; implementation ... constructor TDoSomething.create; begin Danger := 'Danger!'; end; procedure TDoSomething.DoSomething; begin ShowMessage('Doing something'); Sleep(1000); ShowMessage('Done doing something'); Free; //<----- end; procedure TForm2.Button1Click(Sender: TObject); begin var DoSomething: TDoSomething := TDoSomething.create; DoSomething.DoSomething; // ShowMessage(DoSomething.Danger); end; initialization ReportMemoryLeaksOnShutDown := true; end.
I'm asking because I just noticed that I was doing something similar in my code
, only it was a bit more hidden. It has worked without complaint for quite a while, so I was wondering why. My guess is, it's because the 'Free' is the very last thing I ask the class to do. In the destructor the class is still alive, after that the memory is freed, and as long as I don't do anything like the commented ShowMessage, that should be safe or no?
I am cleaning this up, of course, but I didn't want to miss the chance of learning something, I don't understand what's going on in the Delphi-source at this point all that well.
Renate
-
Hi, Anders,
Thanks a lot for yet another great tool. Totally straightforward, fast and hasslefree to use.
I have 3 little issues with it, which I can work around, but maybe you'd like to know (newest compiled version 1.3.8055.21506):
1. On my 4K-monitor scaled 200% the ribbon text is barely readable. Minimizing the ribbon displays the text correctly when folded out.
2. Translating a rather long string using the text-editor inserts line breaks at the end of every displayed line.
Translation manager then rightly complains about mismatching line breaks.
I tried to find a setting which turns this off, but couldn't.
Using notepad for the translation and pasting the translated string into the in-place-editor cuts off the string.
3. The font of the text-editor is rather large on my display, adding to the problem in 2.
Thanks again,
Renate
-
Confirmed: Works with uProf now. Great Job.
I get the same info as with VTune: hotspot timings, processor use, stack graph.
TSaveDialog is not resizable with Styles
in VCL
Posted
You can add the following line e.g. in the OnCreate of the main form:
TStyleManager.SystemHooks := TStyleManager.SystemHooks - [shDialogs];
VCL.Themes needs to be in the uses list. The drawback is, that the style is also removed from all message dialogs. But I find it easier to make your own message boxes than your own file dialogs.