Jump to content
dummzeuch

System.Generics.Collections.TList.BinarySearch if there are duplicates

Recommended Posts

From the OLH of  System.Generics.Collections.TList.BinarySearch:

Quote

If there is more than one element in the list equal to Item, the index of the first match is returned in Index. This is the index of any of the matching items, not necessarily the first.

Is there another method / function in the RTL which does binary search on TList<T> but is guaranteed to return the match with the lowest index? (IndexOf returns it but does a linear search). Or do I really have to implement that myself?

 

(I'm trying to be more specific with my question this time, but just in case: I'm not looking for code that does this but an existing function / method or maybe a different class / generic in the RTL which I may have overlooked.)

Share this post


Link to post

I doubt, this is by definition how Binary search works. To get the lowest index you just to iterate from the matched  item and find the lowest index.

  • Like 2

Share this post


Link to post
1 minute ago, David Heffernan said:

If there is no existing function then what? You just give up. Why is it not possible to write code? 

I have tried to ask the question as precisely as possible to avoid people posting code and making suggestions on how to implement that functionality when all I want to know is if I was missing something that's already in the RTL.

 

3 minutes ago, Lajos Juhász said:

I doubt, this is by definition how Binary search works. To get the lowest index you just to iterate from the matched  item and find the lowest index.

Yes, I know that. I was asking for a function / method that "uses binary search" in contrast to e.g. linear search, not "that only uses binary search".

Share this post


Link to post

Reading the current code of Delphi 11.2 I assume the documentation is wrong. TList<T.BinarySearch just calls TArray.BinarySearch, which implements the search loop like this:

  while L <= H do
  begin
    mid := L + (H - L) shr 1;
    cmp := Comparer.Compare(Values[mid], Item);
    if cmp < 0 then
      L := mid + 1
    else if cmp > 0 then
      H := mid - 1
    else
    begin
      repeat
        Dec(mid);
      until (mid < Index) or (Comparer.Compare(Values[mid], Item) <> 0);
      FoundIndex := mid + 1;
      Exit(True);
    end;
  end;

The repeat inside the equals case searches for the lowest matching index before returning it and exit.

 

AFAIK, this was introduced with 11.0 and nobody thought of updating the docs for both methods.

Share this post


Link to post
18 minutes ago, Uwe Raabe said:

The repeat inside the equals case searches for the lowest matching index before returning it and exit.

 

AFAIK, this was introduced with 11.0 and nobody thought of updating the docs for both methods.

OK, so it depends on the Delphi version. 😞

I can confirm that the change was introduced sometime after Delphi 10.2 because there the repeat loop is missing from TArray<T>.BinarySearch.

I wonder why they changed BinarySearch rather than introducing an additional method. That change impacts performance after all, even if only marginally.

Share this post


Link to post
12 minutes ago, dummzeuch said:

That change impacts performance after all, even if only marginally.

Much more important, it changes behavior.

Share this post


Link to post
1 minute ago, Fr0sT.Brutal said:

Much more important, it changes behavior.

Which was undefined before (to be fair).

Share this post


Link to post
11 minutes ago, Fr0sT.Brutal said:

Much more important, it changes behavior.

No, it doesn't. The function still returns the index of an item that matches the search criteria.

Which one it actually returns is just an implementation detail.

Edited by dummzeuch

Share this post


Link to post
2 hours ago, Uwe Raabe said:

The repeat inside the equals case searches for the lowest matching index before returning it and exit.

 

AFAIK, this was introduced with 11.0 and nobody thought of updating the docs for both methods.

That's some weird implementation of a lower bound binary search tbh

2 hours ago, Lajos Juhász said:

I doubt, this is by definition how Binary search works. To get the lowest index you just to iterate from the matched  item and find the lowest index.

Not true - in a lower bound binary search you keep on binary searching until your range got down to 1 element - with a loop you turn O(log n) into O(n)

Edited by Stefan Glienke

Share this post


Link to post

Wait. If you have a list with two million items that all have the same value, and you search for that value, the code will do 1 million compares to find the first item? 

  • Haha 1

Share this post


Link to post
3 minutes ago, David Heffernan said:

Wait. If you have a list with two million items that all have the same value, and you search for that value, the code will do 1 million compares to find the first item? 

Yes, that seems to be what that repeat loop does.

Share this post


Link to post
25 minutes ago, David Heffernan said:

the code will do 1 million compares to find the first item

The famous Quick Fix™ technique - Sponsored by Intel and your local power company.

  • Haha 6

Share this post


Link to post
3 hours ago, Stefan Glienke said:

Not true - in a lower bound binary search you keep on binary searching until your range got down to 1 element - with a loop you turn O(log n) into O(n)

This is the definition of the binary search: https://en.wikipedia.org/wiki/Binary_search_algorithm

 

"The procedure may return any index whose element is equal to the target value, even if there are duplicate elements in the array. For example, if the array to be searched was {\displaystyle [1,2,3,4,4,5,6,7]}{\displaystyle [1,2,3,4,4,5,6,7]} and the target was {\displaystyle 4}4, then it would be correct for the algorithm to either return the 4th (index 3) or 5th (index 4) element. The regular procedure would return the 4th element (index 3) in this case. It does not always return the first duplicate (consider {\displaystyle [1,2,4,4,4,5,6,7]}{\displaystyle [1,2,4,4,4,5,6,7]} which still returns the 4th element). "

 

Share this post


Link to post

Interesting: I just googled for

"lower bound binary search" pascal

and the first hit is Stefan's message in this topic.

 

image.thumb.png.988d8e16e87c0623d64d06d2f6dc293b.png

 

Now that's fast indexing.

Share this post


Link to post

Strangely enough, TStringList.Find uses (and has been using for decades) a modified binary search algorithm that always returns the first matching item if Duplicates = dupAccept and which does not have such a repeat loop.

But the documentation does not mention that.

Share this post


Link to post

Nobody should create such a worst case list in the first place.

 

Honestly, I don't understand such a statement targeting an extreme corner case. Although I would vote for a better implementation, my lists are usually without any duplicates and therefore don't care about that, but perhaps I am special.

  • Like 1

Share this post


Link to post
4 minutes ago, Uwe Raabe said:

Nobody should create such a worst case list in the first place.

And where is this written?

Whoever made the change obviously did it to handle lists with duplicates, so it's pointless to argue that lists are usually without duplicates.

It's fine that there are corner cases that performs worse than optimal, like with quick sort, but the worst case here is a bit extreme considering that it could have been avoided with a bit of effort - or they could have just Googled the solution. There's just no excuse for the current implementation.

  • Like 4

Share this post


Link to post
16 hours ago, David Heffernan said:

Stefan is talking about a different algorithm, modified binary search algorithms know as lower or upper bound binary search. Kind of the entire point of this topic.

My point was with that Wikipedia link that modified algorithms have their names. Like in the link the variation to find the elemest with lowest index is called binary_search_leftmost and not binary_search.

 

16 hours ago, Anders Melander said:

Wikipedia (even though it references the almighty Knuth) isn't the definition". It's a definition.

That's true, but most developers expect that if an implementation reference to an algorithm it will implement that algorithm and not something else. That was the whole point to name algorithms in matemathics and computer science. That no matter from where you learned an algorithm you know how it works.

  • Like 1

Share this post


Link to post
12 hours ago, Anders Melander said:

And where is this written?

I admit the wording was chosen as a counterpart to Davids claim and should not be taken literally. In addition I also mentioned that I would vote for a better implementation.

 

It is just my personal opinion that if everybody is going to avoid the standard routines in favor of writing its own, we may never see any improvement in that area.

  • Like 2

Share this post


Link to post
3 minutes ago, Uwe Raabe said:

I admit the wording was chosen as a counterpart to Davids claim and should not be taken literally. In addition I also mentioned that I would vote for a better implementation.

 

It is just my personal opinion that if everybody is going to avoid the standard routines in favor of writing its own, we may never see any improvement in that area.

Has someone created it on QP, so that we can vote? Probably with link to https://en.wikipedia.org/wiki/Binary_search_algorithm#Procedure_for_finding_the_leftmost_element

 

Everyone usually needs better version immediatelly, while waiting for the fix can take... ehm... few years?

  • Like 1

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×