Everything posted by Mike Torrettinni

  1. Thanks, I have the full version, but the report is for the whole project and not limited in scope to the main form. Great tool, but hard to pinpoint specifics when the results are for the full project. I was hoping for a more specific tool/way to achieve this.
  2. I was expecting that iterating the array once would be 3x faster than iterating it three times and counting items. I was refactoring the counting of items in an array and created a function that counts items of one type; this function is then called multiple times. Then I needed a case where I count all items, so I called this new function once per item type. Looking at the code, I assumed that multiple calls should be multiple times slower than a single pass. So, count each type:

```pascal
aQuotes := CountItems(aDocuments, dtQuote);
aOrders := CountItems(aDocuments, dtOrder);
aInvoices := CountItems(aDocuments, dtInvoice);

// CountItems function:
for i := Low(aDocuments) to High(aDocuments) do
  if aDocuments[i].DocType = aDocType then
    Inc(Result);
```

vs

```pascal
for i := Low(aDocuments) to High(aDocuments) do
  case aDocuments[i].DocType of
    dtQuote: Inc(aQuotes);
    dtOrder: Inc(aOrders);
    dtInvoice: Inc(aInvoices);
  end;
```

But timing it, the multiple calls are not even 2x slower than the single iteration, instead of 3x slower. So I looked into the generated asm, and there is not a lot of difference between the If and Case comparisons. Comparing the work done, for 10 million items:

- 3 calls: 30 million iterations + 30 million comparisons
- 1 iteration: 10 million iterations + 10-30 million comparisons (depends which item it is)

I guess this is one of those 'branch prediction, caching...' things I don't understand. Still, if anybody has an easy explanation why iterating 3x is not 3x slower, please share. Thanks!
```pascal
program Project1;

{$APPTYPE CONSOLE}
{$R *.res}

uses
  System.SysUtils, System.Diagnostics;

type
  TDocType = (dtQuote, dtOrder, dtInvoice);
  TDocLine = record
    ID: integer;
    DocType: TDocType;
  end;
  TDocuments = TArray<TDocLine>;

const
  cDocs: integer = 10000000;

function CountItems(const aDocuments: TDocuments; aDocType: TDocType): integer;
var
  i: Integer;
begin
  Result := 0;
  for i := Low(aDocuments) to High(aDocuments) do
    if aDocuments[i].DocType = aDocType then
      Inc(Result);
end;

procedure CountAllItems(aDocuments: TDocuments; var aQuotes, aOrders, aInvoices: integer);
begin
  aQuotes := CountItems(aDocuments, dtQuote);
  aOrders := CountItems(aDocuments, dtOrder);
  aInvoices := CountItems(aDocuments, dtInvoice);
end;

procedure CountAllItems2(aDocuments: TDocuments; var aQuotes, aOrders, aInvoices: integer);
var
  i: integer;
begin
  aQuotes := 0;
  aOrders := 0;
  aInvoices := 0;
  for i := Low(aDocuments) to High(aDocuments) do
    case aDocuments[i].DocType of
      dtQuote: Inc(aQuotes);
      dtOrder: Inc(aOrders);
      dtInvoice: Inc(aInvoices);
    end;
end;

procedure CountAllItems3(aDocuments: TDocuments; var aQuotes, aOrders, aInvoices: integer);
var
  i: integer;
begin
  aQuotes := 0;
  aOrders := 0;
  aInvoices := 0;
  for i := Low(aDocuments) to High(aDocuments) do
  begin
    if aDocuments[i].DocType = dtQuote then
      Inc(aQuotes)
    else if aDocuments[i].DocType = dtOrder then
      Inc(aOrders)
    else if aDocuments[i].DocType = dtInvoice then
      Inc(aInvoices);
  end;
end;

procedure PrepData(var aDocuments: TDocuments);
var
  i: Integer;
begin
  aDocuments := nil;
  SetLength(aDocuments, cDocs);
  for i := 0 to cDocs - 1 do
  begin
    aDocuments[i].ID := i + 1;
    aDocuments[i].DocType := TDocType(Random(3));
  end;
end;

procedure RunTests;
var
  vDocuments: TDocuments;
  vSW: TStopWatch;
  q, o, i: integer;
begin
  PrepData(vDocuments);

  vSW := TStopwatch.StartNew;
  CountAllItems(vDocuments, q, o, i);
  Writeln(Format('3x iterate: %d ms (%d %d %d)', [vSW.ElapsedMilliseconds, q, o, i]));

  vSW := TStopwatch.StartNew;
  CountAllItems2(vDocuments, q, o, i);
  Writeln(Format('1x iterate (case): %d ms (%d %d %d)', [vSW.ElapsedMilliseconds, q, o, i]));

  vSW := TStopwatch.StartNew;
  CountAllItems3(vDocuments, q, o, i);
  Writeln(Format('1x iterate (if): %d ms (%d %d %d)', [vSW.ElapsedMilliseconds, q, o, i]));
end;

begin
  Randomize;
  RunTests;
  Readln;
end.
```
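One way to probe the 'branch prediction, caching' hypothesis is to time a pass that reads every record but does almost no comparison work. If it takes nearly as long as one CountItems call, the loops are memory-bound, and the extra comparisons of the three-call version are largely hidden behind memory reads. A hypothetical sketch (IterateOnly is my name for illustration), reusing the types from the program above:

```pascal
// Hypothetical baseline: touch every record with minimal per-item work.
// If this runs in roughly the same time as one CountItems pass, the
// traversal cost (memory reads) dominates and the comparisons are cheap.
procedure IterateOnly(const aDocuments: TDocuments; var aSum: integer);
var
  i: integer;
begin
  aSum := 0;
  for i := Low(aDocuments) to High(aDocuments) do
    Inc(aSum, Ord(aDocuments[i].DocType)); // forces the read, no branching
end;
```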
  3. Thanks, I didn't know I could use Default for arrays. I use it for simple types and records, but didn't know about arrays. Good!
  4. Interesting. I implemented it like this:

```pascal
type
  TCounts = array[TDocType] of Cardinal;

procedure CountAllItems6(const aDocuments: TDocuments; var counts: TCounts);
var
  i: integer;
  dt: TDocType;
begin
  for dt := Low(TDocType) to High(TDocType) do
    counts[dt] := 0;
  for i := Low(aDocuments) to High(aDocuments) do
    Inc(counts[aDocuments[i].DocType]);
end;
```

And the results (changed to 100 million doc lines):
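For reference, a minimal way to call this enum-indexed version, assuming vDocuments was prepared as in the earlier test program:

```pascal
var
  vCounts: TCounts;
begin
  CountAllItems6(vDocuments, vCounts);
  // the enum values index the counts array directly
  Writeln(Format('quotes=%d orders=%d invoices=%d',
    [vCounts[dtQuote], vCounts[dtOrder], vCounts[dtInvoice]]));
end;
```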
  5. Very clever, thanks! I can have this fast count run right after data import and then just return the result based on the requested data type. No more counting on demand!
  6. Is there a way to use TStringList to ignore duplicates without requiring the list to be sorted? I thought this would be pretty easy, but it seems it is not doable, unless of course I manage the duplicate checking myself. For example, it would be nice to have:

```pascal
vSL := TStringList.Create;
vSL.Duplicates := dupIgnore;
vSL.Sorted := false;
vSL.Add('ProjectC');
vSL.Add('ProjectC');
vSL.Add('ProjectA');
vSL.Add('ProjectB');
vSL.Add('ProjectA');
vSL.Add('ProjectC');
```

and the result would be no duplicates, in the order of first occurrence:

ProjectC
ProjectA
ProjectB

Is there a way to use TStringList for this without me writing custom duplicate-checking code?
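One common workaround (a sketch, not a built-in TStringList feature) is to pair the list with a TDictionary<string, Integer>: the dictionary makes the duplicate check a hash lookup instead of a linear scan, while the TStringList itself keeps first-occurrence order:

```pascal
uses
  System.Classes, System.Generics.Collections;

procedure Demo;
const
  cNames: array[0..5] of string =
    ('ProjectC', 'ProjectC', 'ProjectA', 'ProjectB', 'ProjectA', 'ProjectC');
var
  vSL: TStringList;
  vSeen: TDictionary<string, Integer>;
  s: string;
begin
  vSL := TStringList.Create;
  vSeen := TDictionary<string, Integer>.Create;
  try
    for s in cNames do
      if not vSeen.ContainsKey(s) then // O(1) average duplicate check
        vSeen.Add(s, vSL.Add(s));      // list keeps first-occurrence order
    Writeln(vSL.Text);                 // ProjectC, ProjectA, ProjectB
  finally
    vSeen.Free;
    vSL.Free;
  end;
end;
```

The dictionary value stores the list index of the first occurrence, which can be handy if you later need to look it up.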
  7. Yes, better. So, like this:

```pascal
function TUniqueStringList.Add(const aValue: string): Integer;
begin
  if Duplicates = dupIgnore then
    Result := IndexOf(aValue)
  else
    Result := -1;
  if (Duplicates <> dupIgnore) or (Result = -1) then
    Result := inherited Add(aValue);
end;
```
  8. Thanks, but why do you suggest I re-implement AddObjects? Doesn't a simple Add like in my example work?
  9. So, Delphi was hinting:

W1035 Return value of function 'TUniqueStringList.Add' might be undefined

And I changed ValueExists (ex StrExists) to return the index of the existing value, and all is good:

```pascal
function TUniqueStringList.ValueExists(const aValue: string; out aIndexOfExisting: integer): boolean;
var
  i: Integer;
begin
  Result := false;
  for i := 0 to Pred(Count) do
    if Self[i] = aValue then
    begin
      aIndexOfExisting := i;
      Exit(True);
    end;
end;

function TUniqueStringList.Add(const aValue: string): Integer;
begin
  if (Duplicates <> dupIgnore) or
     ((Duplicates = dupIgnore) and (not ValueExists(aValue, Result))) then
    Result := inherited Add(aValue);
end;
```

Pretty impressed that Delphi recognizes that Result is updated in the call.
  10. I have a simple search for duplicated integers in an array, and I was doing a simple benchmark exercise which I thought would show that a simple one-time iteration is faster than binary search (because binary search needs to sort the data first). But it seems that for large arrays, sort + binary search is faster than simple iteration.

The first number is iteration, the second number is binary search (the execution loop is adjusted to not run for hours on large arrays, so don't compare timings between different record counts; it's not comparable, just iteration vs binary search per count).

Is there any obvious mistake in my benchmark, or is sort + binary search really that much faster than a simple one-time iteration?

In most cases, the data is like this:
1. The array's initial state is numbers from 1..n
2. Then some changes occur and new numbers replace a few old ones at random places (in my test I add 10% of numbers above n)
3. At some point a few duplicates can occur

*To add: the binary search version has to copy the array first, because I need to keep the original array in its original order. So, another reason why I thought using binary search would be slower.
```pascal
program Project1;

{$APPTYPE CONSOLE}
{$R *.res}

uses
  System.SysUtils, System.StrUtils, System.Diagnostics,
  System.Generics.Defaults, System.Generics.Collections;

type
  TDataRec = record
    IntId: integer;
    Duplicate: boolean;
  end;

  TCustomComparer = class(TInterfacedObject, IComparer<TDataRec>)
  public
    function Compare(const Left, Right: TDataRec): Integer;
  end;

const
  cCounts = 6;
  cLengths: array[1..cCounts] of integer = (10, 100, 1000, 10000, 100000, 1000000);
  cLoops: array[1..cCounts] of integer = (100000, 10000, 1000, 100, 1, 1);

var
  xComparer: TCustomComparer;

function TCustomComparer.Compare(const Left, Right: TDataRec): Integer;
begin
  Result := Left.IntId - Right.IntId;
end;

// Get 2 random positions: a - where the duplicate will be, b - a position where an original number is
procedure Get2RandomPositions(const aData: TArray<TDataRec>; out a, b: integer);
begin
  // making sure they don't point to the same position
  repeat
    a := Random(Length(aData));
    b := Random(Length(aData));
  until a <> b;
end;

procedure FillData(var aData: TArray<TDataRec>; aIdx: integer);
var
  i, a, b: Integer;
begin
  aData := nil;
  SetLength(aData, cLengths[aIdx]);
  // fill sequential integers
  for i := Low(aData) to High(aData) do
    aData[i].IntId := i + 1;
  // now add some (10%) random numbers above the last number, at random positions
  for i := 1 to cLengths[aIdx] div 10 do
    aData[Random(Length(aData))].IntId := cLengths[aIdx] + Random(MaxInt);
  // add 2 duplicates per each div 10, at random positions
  for i := 1 to cLengths[aIdx] div 10 do
  begin
    Get2RandomPositions(aData, a, b);
    aData[a].IntId := aData[b].IntId;
  end;
end;

// search 1 by 1
procedure FindDuplicates(const aData: TArray<TDataRec>);
var
  i, j: Integer;
begin
  for i := Low(aData) to High(aData) - 1 do
    for j := i + 1 to High(aData) do
      if aData[i].IntId = aData[j].IntId then
      begin
        aData[i].Duplicate := True;
        aData[j].Duplicate := True;
      end;
end;

function ResultsAreOK(const aData1, aData2: TArray<TDataRec>): boolean;
var
  i, j: Integer;
begin
  for i := Low(aData1) to High(aData1) do
  begin
    for j := Low(aData2) to High(aData2) do
      if (aData2[j].IntId = aData1[i].IntId) and (aData2[j].Duplicate <> aData1[i].Duplicate) then
        Exit(false);
  end;
  Result := True;
end;

// Duplicate data to benchmark the identical data
procedure CopyData(const aDataIn: TArray<TDataRec>; out aDataOut: TArray<TDataRec>);
var
  i: integer;
begin
  aDataOut := nil;
  SetLength(aDataOut, Length(aDataIn));
  for i := Low(aDataIn) to High(aDataIn) do
    aDataOut[i].IntId := aDataIn[i].IntId;
end;

function Time1(aIdx: integer): Int64;
var
  vData: TArray<TDataRec>;
  vSW: TStopWatch;
  i: Integer;
begin
  FillData(vData, aIdx);
  vSW := TStopwatch.StartNew;
  for i := 1 to cLoops[aIdx] do
    FindDuplicates(vData);
  Result := vSW.ElapsedMilliseconds;
end;

procedure SortData(var aData: TArray<TDataRec>);
begin
  TArray.Sort<TDataRec>(aData, xComparer);
end;

procedure FindDuplicates_BinSearch(var aData: TArray<TDataRec>);
var
  i, j: Integer;
begin
  for i := Low(aData) to High(aData) do
  begin
    if TArray.BinarySearch<TDataRec>(aData, aData[i], j, xComparer) then
      if i <> j then
      begin
        aData[i].Duplicate := True;
        aData[j].Duplicate := True;
      end;
  end;
end;

procedure RunTests;
var
  vData1, vData2: TArray<TDataRec>;
  c, i, t1, t2: int64;
  vSW: TStopWatch;
begin
  xComparer := TCustomComparer.Create;
  for c := 1 to cCounts do
  begin
    FillData(vData1, c);
    Write(cLengths[c].ToString + ' records: ');

    // loop array
    vSW := TStopwatch.StartNew;
    for i := 1 to cLoops[c] do
      FindDuplicates(vData1);
    t1 := vSW.ElapsedMilliseconds;

    // binary search
    vSW := TStopwatch.StartNew;
    for i := 1 to cLoops[c] do
    begin
      CopyData(vData1, vData2);
      SortData(vData2);
      FindDuplicates_BinSearch(vData2);
    end;
    t2 := vSW.ElapsedMilliseconds;

    if not ResultsAreOK(vData1, vData2) then
      raise Exception.Create('issue in data!');
    Writeln(Format('%10d vs %d', [t1, t2]));
  end;
end;

begin
  Randomize;
  RunTests;
  Readln;
end.
```
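A side note, and only a sketch: once the copy is sorted, duplicates sit next to each other, so the binary search is not strictly needed; a single linear pass over the sorted copy can mark them all. FindDuplicates_SortedScan is my name for illustration, and it assumes the array was already sorted with the same comparer:

```pascal
// After SortData, equal IntIds are adjacent, so one pass finds them all.
procedure FindDuplicates_SortedScan(var aData: TArray<TDataRec>);
var
  i: Integer;
begin
  for i := Low(aData) to High(aData) - 1 do
    if aData[i].IntId = aData[i + 1].IntId then
    begin
      aData[i].Duplicate := True;
      aData[i + 1].Duplicate := True;
    end;
end;
```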
  11. That's exactly why I thought binary search would be much slower: it needs to copy the data (as it would in a real case, because I need to preserve the order of the original array) and then sort it. But the copy and sort are not the big performance hog for 100+ items.
  12. I thought that my iteration through the array is similar to what sort does: it has to iterate and compare items across the full array. I already improved the iteration to not iterate the full array for each item, but only from that item forward to the end.
  13. Good point! The default of false is OK for my needs today, but I will implement it.
  14. I had a very simple implementation, I guess enough for my needs:

```pascal
// String list, no duplicates, preserves order (not sorted)
type
  TUniqueStringList = class(TStringList)
  private
    function StrExists(const aString: string): boolean;
  public
    function Add(const aString: string): Integer; override;
  end;

function TUniqueStringList.StrExists(const aString: string): boolean;
var
  i: Integer;
begin
  Result := false;
  for i := 0 to Pred(Count) do
    if Self[i] = aString then
      Exit(True);
end;

function TUniqueStringList.Add(const aString: string): Integer;
begin
  if (Duplicates <> dupIgnore) or
     ((Duplicates = dupIgnore) and (not StrExists(aString))) then
    Result := inherited Add(aString);
end;
```

and it works:

```pascal
var
  vSL: TUniqueStringList;
begin
  vSL := TUniqueStringList.Create;
  vSL.Duplicates := dupIgnore;
  vSL.Sorted := false;
  vSL.Add('ProjectC');
  vSL.Add('ProjectC');
  vSL.Add('ProjectA');
  vSL.Add('ProjectB');
  vSL.Add('ProjectA');
  vSL.Add('ProjectC');
```

result:
  15. Yes, but I thought perhaps there still is a way and it just wasn't documented. Or perhaps a newer version (10.3, 10.4) has such an implementation. I was actually almost done with my TList<string>-based class that does this, but I just wanted to be sure I'm not implementing something that is readily available.
  16. I have some reports that use a For loop with a condition that a value is between 2 numbers, like this:

```pascal
for i := 1 to n do
  if (Data[i].ProjId >= X) and (Data[i].ProjId <= Y) then
```

So, to refactor and simplify the code, I looked into the InRange function in the System.Math unit, but why is it designed this way?

```pascal
function InRange(const AValue, AMin, AMax: Integer): Boolean;
var
  A, B: Boolean;
begin
  A := (AValue >= AMin);
  B := (AValue <= AMax);
  Result := B and A;
end;
```

Why does it need 2 variables, if a simple

```pascal
Result := (AValue >= AMin) and (AValue <= AMax);
```

is all that is needed? So, I measured the performance against a version without the variables:

```pascal
function IsInRange(const AValue, AMin, AMax: Integer): Boolean; inline;
begin
  Result := (AValue >= AMin) and (AValue <= AMax);
end;
```

And it is a little faster. But both are still 100% slower than just the If statement. 32-bit:

and once again we see how 64-bit is so much worse than 32-bit, while the single-line function is almost 20% faster than Math.InRange:

Why would they design the function with 2 variables, if they are useless? Is it just readability, or am I missing something obvious?
and the code:

```pascal
program Project1;

{$APPTYPE CONSOLE}
{$R *.res}

uses
  System.SysUtils, System.Math, System.Diagnostics;

const
  // Note: the originally posted value 10000000000 does not fit in an
  // Integer and will not compile; reduced here so the program builds.
  cLoop: integer = 1000000000;

var
  vSW: TStopWatch;

// Single-line function replacing System.Math.InRange
function IsInRange(const AValue, AMin, AMax: Integer): Boolean; inline;
begin
  Result := (AValue >= AMin) and (AValue <= AMax);
end;

procedure Test1;
var
  i, a, b: integer;
begin
  a := 999;
  b := 999999;
  vSW := TStopwatch.StartNew;
  for i := 1 to cLoop do
    if System.Math.InRange(i, a, b) then;
  writeln('Math.InRange: ' + vSW.ElapsedMilliseconds.ToString);
end;

procedure Test2;
var
  i, a, b: integer;
begin
  a := 999;
  b := 999999;
  vSW := TStopwatch.StartNew;
  for i := 1 to cLoop do
    if IsInRange(i, a, b) then;
  writeln('IsInRange: ' + vSW.ElapsedMilliseconds.ToString);
end;

procedure Test3;
var
  i, a, b: integer;
begin
  a := 999;
  b := 999999;
  vSW := TStopwatch.StartNew;
  for i := 1 to cLoop do
    if (i >= a) and (i <= b) then;
  writeln('If: ' + vSW.ElapsedMilliseconds.ToString);
end;

begin
  Test1;
  Test2;
  Test3;
  Readln;
end.
```
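As an aside, a well-known 'math trick' for range checks (a sketch; IsInRangeU is my name, and it assumes AMin <= AMax and overflow checking off) folds both comparisons into one unsigned compare: if AValue < AMin, the subtraction wraps around to a huge Cardinal, which fails the single test:

```pascal
// Branch-reduced range check: one unsigned comparison instead of two
// signed ones. Requires AMin <= AMax; the unsigned wrap-around of a
// negative difference handles the AValue < AMin case.
function IsInRangeU(const AValue, AMin, AMax: Integer): Boolean; inline;
begin
  Result := Cardinal(AValue - AMin) <= Cardinal(AMax - AMin);
end;
```

For example, with AMin = 10 and AMax = 20: AValue = 5 gives Cardinal(-5), an enormous unsigned number, so the test fails; AValue = 15 gives 5 <= 10, so it passes.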
  17. Micro optimization: Math.InRange

      Interesting that all these calculations (- and * and comparison) are as fast as a simple 2-value comparison.
  18. Micro optimization: Math.InRange

      Thanks, but these are some math tricks I can't figure out 😉
  19. Micro optimization: Math.InRange

      It's an almost 3-year-old report and no comments from the support team. I assume this will not be improved for 10.5. Too bad.
  20. As we can see in the Range Check Error ERangeError topic (https://en.delphipraxis.net/topic/4825-range-check-error-erangeerror/), a RegEx script could find the issue. So, I wanted to share my 3 simple scripts that I run on my code every now and then to make sure I don't make mistakes:

1. To find all array increments by 1, like SetLength(array, Length(array)+1):

SetLength.*\(.*Length\(.*\).*\+.*1

2. And when using High() - this is an error anyway, so good to find it!

SetLength.*\(.*High\(.*\).*\+.*1

3. If I forgot to add -1 when iterating an array (although I use High most of the time, I still use Length() - 1 sometimes):

0 to Length\(.*\).do

If anybody wants to share any scripts they use, please do!
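These patterns can also be run outside the IDE search. A small sketch using System.RegularExpressions and System.IOUtils (the folder path is a placeholder, and ScanSources is my name for illustration), checking the first pattern against every unit in a folder tree:

```pascal
uses
  System.SysUtils, System.IOUtils, System.RegularExpressions;

procedure ScanSources(const aFolder: string);
var
  vFile: string;
  vMatch: TMatch;
begin
  // look for SetLength(..., Length(...) + 1) style array growth in each unit
  for vFile in TDirectory.GetFiles(aFolder, '*.pas', TSearchOption.soAllDirectories) do
    for vMatch in TRegEx.Matches(TFile.ReadAllText(vFile),
                                 'SetLength.*\(.*Length\(.*\).*\+.*1') do
      Writeln(vFile + ': ' + vMatch.Value);
end;
```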
  21. List of usable RegEx for source code

      If you search for the regex uppercase\(.*lowercase\( there was one that used these functions like this:

```pascal
if UpperCase(LowerCase(s)) = UpperCase(s) then
```

      which I assume has something to do with UTF-8 checking, but it looks very odd. There are plenty of examples in open source projects where this is used to capitalize the first character, like:

```pascal
Value := UpperCase(Copy(Value, 1, 1)) + LowerCase(Copy(Value, 2, Length(Value)));
```

      The reversed lowercase\(.*uppercase\( was not found. Pretty good, but it will still make the list, probably! 🙂
  22. Micro optimization: Math.InRange

      Yes, that is how my data is: in a data array with indexes 1..1000, a sub-range is always continuous, 10..50, 300..400, sometimes the full range 1..1000. What you are suggesting is that I would need to pre-process the data and remember the From and To indexes for each sub-range, and then I could loop over just the sub-range. Right?
  23. Micro optimization: Math.InRange

      Why would the comparison be different, 1-based or 0-based? If 0 is already in range, then the starting point is -1. No?
  24. Micro optimization: Math.InRange

      I see. It's kind of hard to measure the performance when adding additional conditions. I tried this:

```pascal
for i := 1 to cLoop do
  if i mod 3 = 0 then
    if System.Math.InRange(i, i + 1, b) then
      asm nop end // every 3rd will not meet the criteria
    else if System.Math.InRange(i, a, b) then
      asm nop end;
```

      And the results are pretty much similar in percentage to each other; of course each test takes much longer now.
  25. Micro optimization: Math.InRange

      What would be nop for 64-bit, since asm is not available?