Jump to content
dummzeuch

When sorting a “StringList” is very costly

Recommended Posts

The following code looks innocuous but slows down a program significantly:

type
  TJCHListSortCompare = function(Item1, Item2: Integer): Integer of object;
  TCheckListBoxWithHints = class(TCheckListBox)
  private
    procedure QuickSort(L, R: Integer; SCompare: TJCHListSortCompare);

// [...]
procedure TCheckListBoxWithHints.QuickSort(L, R: Integer; SCompare: TJCHListSortCompare);
var
  I, J, P: Integer;
  tmpObj: TObject;
  tmpStr: string;
  tmpChecked: Boolean;
begin
  repeat
    I := L;
    J := R;
    P := (L + R) shr 1;
    repeat
      while SCompare(I, P) < 0 do Inc(I);
      while SCompare(J, P) > 0 do Dec(J);
      if I <= J then
      begin
        // exchange I and J
        tmpStr           := Items[I];
        tmpObj           := Items.Objects[I];
        tmpChecked       := Self.Checked[I];

        Items[I]         := Items[J];
        Items.Objects[I] := Items.Objects[J];
        Self.Checked[I]  := Self.Checked[J];

        Items[J]         := tmpStr;
        Items.Objects[J] := tmpObj;
        Self.Checked[J]  := tmpChecked;
        if P = I then
          P := J
        else if P = J then
          P := I;

        Inc(I);
        Dec(J);
      end;
    until I > J;
    if L < J then QuickSort(L, J, SCompare);
    L := I;
  until I >= R;
end;

Yes it’s Quicksort and it sorts strings in a TCheckListBox’s Items property, swapping not only the strings but also the objects and the Checked values.

Now, run this with, lets say 100 entries. That shouldn’t be any problem for Quicksort, should it? But it takes about 2 seconds on my computer which is muuuuuuch longer than I expected. The same code running on a simple TStringList takes less than 1/10 of a second. Why is that?

 

read on in the blog post

 

 

Share this post


Link to post

If the TStrings.Objects isn't used for anything then, instead of separating the strings and checked states into two list and then sort both simultaneously, you could just stuff the checked state into TStrings.Objects and then use the standard TStrings.Sort.

Or you could sort a TList<integer> of indices, using TList<>.Sort and then reassemble the sorted string list based on the result.

Share this post


Link to post
Posted (edited)

Yes @Anders Melander, you are right in all points. Unfortunately none of the options you mention was possible in this case (Delphi 6 -> no Generics, Objects already used).

 

I could also have used TStringList.Objects to store the old index and use Sort, just as you describe for TList<integer>, but that wouldn't have helped much in this particular case. The point of this post was that everything else is better than directly using the Items property and that it is not obvious why.

 

A virtual CheckListBox would have been even better, but that code is broken since like forever.

Edited by dummzeuch

Share this post


Link to post
Posted (edited)
8 minutes ago, Attila Kovacs said:

I'd suspect that you call onclick every time you set checked.

Hm, good point, I'll have to check that too.

 

Edit: No, doesn't happen.

Edited by dummzeuch

Share this post


Link to post
Posted (edited)

Oh I see, this component is just a wrapper for the windows component, so you are causing a message storm. There is not even a beginupdate/endupdate.

Edited by Attila Kovacs

Share this post


Link to post
27 minutes ago, David Heffernan said:

Paying the price for using a GUI control to hold your data, rather than to view your data. 

As I said: A virtual ChecklistBox would have been ideal, but unfortunately that mode has been broken since forever. So it's either live with that, or use a different control (ListView?). I'm still considering the latter, but I'm not sure it's worth the effort since it's fast enough now and newer IDEs already come with a similar functionality.

Share this post


Link to post
7 minutes ago, dummzeuch said:

A virtual ChecklistBox

I doubt that will be better, on contrary i think it will be slower, because almost all operations with ListBox are messages, so making it virtual will add additional message to retrieve the data.

It will be faster many folds if you read all the data then cleared the listbox then sorted after that added the data back to the list box. 

Switch to ListView or VirtualTreeView, or don't read the data from the list box, keep a copy in the memory to work with, then clear and add after sorting your copy.

Share this post


Link to post

A ListView would also require lots of messages, but from experience i know that it can easily handle thousands of lines and still be reasonable fast (we're talking about less than 200 here). A VirtualTreeview would be an option, but meaning to add yet another component which i am rather reluctant to do in this case. Another option would be a (virtual) StringGrid, I actually started down that road but decided to try and improve that sorting first. Now I've got a solution that's fast enough and works, so I'll probably leave it as is.

Share this post


Link to post

You don't need a new control. You just need not to use it to store your data. Store your data in a non visual control. 

Share this post


Link to post
9 hours ago, David Heffernan said:

You don't need a new control. You just need not to use it to store your data. Store your data in a non visual control. 

Dear David, I think you are right for the sorting issue.


Since I like to use the "Tag" or "Object" property for having an simple integer ID to each entry, beside the display function,
I'm curious how you would handle this.

Do you also use an external storage, if you just need the "Tag" for identification of the entry, but not the sorting ?

When would you use a "Tag" locally, and when would you use it externally ?

I think if the "Tag" only would exists in the ListView then it should be better hold externally, to have the data separated,
but most of the time it comes already from external sources, which were not so convenient to access or need more processing.

 

Share this post


Link to post

Quick sort is known to suffer performance-wise on already sorted lists, but with only 100 to 200 elements, that should not be the problem.

Other than that, keeping your working data outside UI controls is sound advice.

Share this post


Link to post
34 minutes ago, Lars Fosdal said:

Quick sort is known to suffer performance-wise on already sorted lists, but with only 100 to 200 elements, that should not be the problem.

Other than that, keeping your working data outside UI controls is sound advice.

These days I use Timsort for this reason. 

Share this post


Link to post
22 minutes ago, David Heffernan said:

These days I use Timsort for this reason. 

I still use Shell's Sort - but with a bit of extra work to keep the original ordering intact for identical elements.

Is there a good implementation of Timsort for Delphi to be found?

 

Share this post


Link to post
Posted (edited)
3 minutes ago, Lars Fosdal said:

Is there a good implementation of Timsort for Delphi to be found?

We've talked about this before I think. Stefan and I developed one for spring4d. You'll find it in the develop branch. 

 

As I'm sure you know, Timsort is stable. 

Edited by David Heffernan
  • Thanks 1

Share this post


Link to post

Ah, yeah... It is a bit embarrassing, but I still have not even looked at spring4d.  I really should.

I did follow it on GitHub, but never got around to actually delving into it.

Share this post


Link to post

When we all will use quantum computers - and when Windows 12 will run on quantum computers - then this kind of speed problems will be a thing of the past. 🔮

Share this post


Link to post
9 minutes ago, PeterPanettone said:

When we all will use quantum computers - and when Windows 12 will run on quantum computers - then this kind of speed problems will be a thing of the past. 🔮

Nicklaus Wirth disagrees: https://en.wikipedia.org/wiki/Wirth's_law

 

Bonus quote:

What Intel giveth, Microsoft taketh away: https://en.wikipedia.org/wiki/Andy_and_Bill's_law

  • Like 2

Share this post


Link to post
Posted (edited)
19 hours ago, dummzeuch said:

Don't know whether it's any good, but here is one for free pascal:

 

https://github.com/avk959/LGenerics

 

Unit lgMiscUtils

Impressive:

                               sorted     reverse        half      random
      QuickSort(10000000):    0.69550     0.72872    not done     1.89622
 QuickSort+(12)(10000000):    0.63999     0.65609     3.61170     1.70513
 QuickSort+(15)(10000000):    0.63134     0.66484     3.65941     1.72658
 QuickSort+(17)(10000000):    0.63200     0.65742     3.58655     1.73312
 QuickSort+(20)(10000000):    0.59678     0.62114     3.60640     1.75100
        TimSort(10000000):    0.02052     0.03282     0.11939     1.88033

     QuickSort(100000000):    8.13450     8.50642    not done    21.52633
QuickSort+(12)(100000000):    7.16910     7.47369    45.95549    19.66967
QuickSort+(15)(100000000):    7.25634     7.49350    45.59182    19.57720
QuickSort+(17)(100000000):    7.15293     7.43410    46.06242    19.69688
QuickSort+(20)(100000000):    7.26109     7.46288    46.09772    20.61769
       TimSort(100000000):    0.20487     0.33015     1.12082    23.05309

This is the result of a test sorting an array of 10 Million and 100 milliion integers on my computer. The values are the time in seconds.


QuickSort is my regular quicksort implementation

QuickSort+ is my optimized quicksort which:

  • choses a pivot by selecting the median between elements Left, Right and (Left + Right) / 2
  • uses InsertionSort for small sets, if Right - Left < Cutoff (The value in parenthesis is the cutoff.)

TimSort is the code from the link above, adjusted so it compiles with Delphi 2007 (no generics).

 

Edit: Here is the source code. Note that It currently can only sort an array of integer.

Edited by dummzeuch
  • Like 1

Share this post


Link to post
1 hour ago, dummzeuch said:

I'm not sure whether the license allows me to share that implementation

It does. You just need to keep the attribution, copyright and license notice in the source.

  • Thanks 1

Share this post


Link to post
Posted (edited)
14 hours ago, Anders Melander said:

It does. You just need to keep the attribution, copyright and license notice in the source.

I just read the Apache License (never did before) and have now added a link to the source code above.

 

Note that the current code can only sort an array of integer and is pretty rough. I'll clean it up a bit today and add more tests.

 

I also started a new thread on this: TimSort for Delphi without Generics.

Edited by dummzeuch
  • Like 1

Share this post


Link to post

I appreciate your interest.

But it still seems to me that your QuicksortPlus is not in a hurry.
Anyway, my own benchmark looks like this:

                        Sorted   	Reverse  	OrganPipe	Random
QuickSort         	0,0073   	0,0135   	0,2448   	0,7248
IntroSort         	0,0064   	0,0134   	0,2122   	0,7448
DualPivotQuickSort	0,0064   	0,0134   	0,27     	0,7356
PDQSort           	0,0064   	0,0136   	0,549    	0,388
MergeSort         	0,0066   	0,0125   	0,0362   	0,9725
RadixSort         	0,006    	0,013    	0,2692   	0,2554
TimSort           	0,0064   	0,0126   	0,044    	1,255
QuicksortPlus     	0,4123   	0,4246   	2,1564   	1,5676

for an array of 10 M integers

and 

                        Sorted   	Reverse  	OrganPipe	Random
QuickSort         	0,0643   	0,1363   	2,6703   	8,4403
IntroSort         	0,0793   	0,1557   	2,3547   	8,529
DualPivotQuickSort	0,0643   	0,1363   	2,9583   	8,4367
PDQSort           	0,0643   	0,1363   	6,2103   	4,2567
MergeSort         	0,0647   	0,1263   	0,3347   	11,114
RadixSort         	0,0597   	0,1313   	3,268    	2,4947
TimSort           	0,065    	0,126    	0,4107   	15,0483
QuicksortPlus     	5,1873   	5,3153   	30,6993  	18,7067

for an array of 100 M integers

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×