dummzeuch 1505 Posted December 8, 2022 From the OLH of System.Generics.Collections.TList.BinarySearch: Quote If there is more than one element in the list equal to Item, the index of the first match is returned in Index. This is the index of any of the matching items, not necessarily the first. Is there another method / function in the RTL which does binary search on TList<T> but is guaranteed to return the match with the lowest index? (IndexOf returns it but does a linear search). Or do I really have to implement that myself? (I'm trying to be more specific with my question this time, but just in case: I'm not looking for code that does this but an existing function / method or maybe a different class / generic in the RTL which I may have overlooked.) Share this post Link to post
David Heffernan 2345 Posted December 8, 2022 If there is no existing function then what? You just give up. Why is it not possible to write code? Share this post Link to post
Lajos Juhász 293 Posted December 8, 2022 I doubt, this is by definition how Binary search works. To get the lowest index you just to iterate from the matched item and find the lowest index. 2 Share this post Link to post
dummzeuch 1505 Posted December 8, 2022 1 minute ago, David Heffernan said: If there is no existing function then what? You just give up. Why is it not possible to write code? I have tried to ask the question as precisely as possible to avoid people posting code and making suggestions on how to implement that functionality when all I want to know is if I was missing something that's already in the RTL. 3 minutes ago, Lajos Juhász said: I doubt, this is by definition how Binary search works. To get the lowest index you just to iterate from the matched item and find the lowest index. Yes, I know that. I was asking for a function / method that "uses binary search" in contrast to e.g. linear search, not "that only uses binary search". Share this post Link to post
Uwe Raabe 2057 Posted December 8, 2022 Reading the current code of Delphi 11.2 I assume the documentation is wrong. TList<T.BinarySearch just calls TArray.BinarySearch, which implements the search loop like this: while L <= H do begin mid := L + (H - L) shr 1; cmp := Comparer.Compare(Values[mid], Item); if cmp < 0 then L := mid + 1 else if cmp > 0 then H := mid - 1 else begin repeat Dec(mid); until (mid < Index) or (Comparer.Compare(Values[mid], Item) <> 0); FoundIndex := mid + 1; Exit(True); end; end; The repeat inside the equals case searches for the lowest matching index before returning it and exit. AFAIK, this was introduced with 11.0 and nobody thought of updating the docs for both methods. Share this post Link to post
dummzeuch 1505 Posted December 8, 2022 18 minutes ago, Uwe Raabe said: The repeat inside the equals case searches for the lowest matching index before returning it and exit. AFAIK, this was introduced with 11.0 and nobody thought of updating the docs for both methods. OK, so it depends on the Delphi version. 😞 I can confirm that the change was introduced sometime after Delphi 10.2 because there the repeat loop is missing from TArray<T>.BinarySearch. I wonder why they changed BinarySearch rather than introducing an additional method. That change impacts performance after all, even if only marginally. Share this post Link to post
Fr0sT.Brutal 900 Posted December 8, 2022 12 minutes ago, dummzeuch said: That change impacts performance after all, even if only marginally. Much more important, it changes behavior. Share this post Link to post
Uwe Raabe 2057 Posted December 8, 2022 1 minute ago, Fr0sT.Brutal said: Much more important, it changes behavior. Which was undefined before (to be fair). Share this post Link to post
dummzeuch 1505 Posted December 8, 2022 (edited) 11 minutes ago, Fr0sT.Brutal said: Much more important, it changes behavior. No, it doesn't. The function still returns the index of an item that matches the search criteria. Which one it actually returns is just an implementation detail. Edited December 8, 2022 by dummzeuch Share this post Link to post
Stefan Glienke 2002 Posted December 8, 2022 (edited) 2 hours ago, Uwe Raabe said: The repeat inside the equals case searches for the lowest matching index before returning it and exit. AFAIK, this was introduced with 11.0 and nobody thought of updating the docs for both methods. That's some weird implementation of a lower bound binary search tbh 2 hours ago, Lajos Juhász said: I doubt, this is by definition how Binary search works. To get the lowest index you just to iterate from the matched item and find the lowest index. Not true - in a lower bound binary search you keep on binary searching until your range got down to 1 element - with a loop you turn O(log n) into O(n) Edited December 8, 2022 by Stefan Glienke Share this post Link to post
David Heffernan 2345 Posted December 8, 2022 Wait. If you have a list with two million items that all have the same value, and you search for that value, the code will do 1 million compares to find the first item? 1 Share this post Link to post
dummzeuch 1505 Posted December 8, 2022 3 minutes ago, David Heffernan said: Wait. If you have a list with two million items that all have the same value, and you search for that value, the code will do 1 million compares to find the first item? Yes, that seems to be what that repeat loop does. Share this post Link to post
Anders Melander 1782 Posted December 8, 2022 25 minutes ago, David Heffernan said: the code will do 1 million compares to find the first item The famous Quick Fix™ technique - Sponsored by Intel and your local power company. 6 Share this post Link to post
Lajos Juhász 293 Posted December 8, 2022 3 hours ago, Stefan Glienke said: Not true - in a lower bound binary search you keep on binary searching until your range got down to 1 element - with a loop you turn O(log n) into O(n) This is the definition of the binary search: https://en.wikipedia.org/wiki/Binary_search_algorithm "The procedure may return any index whose element is equal to the target value, even if there are duplicate elements in the array. For example, if the array to be searched was {\displaystyle [1,2,3,4,4,5,6,7]}{\displaystyle [1,2,3,4,4,5,6,7]} and the target was {\displaystyle 4}4, then it would be correct for the algorithm to either return the 4th (index 3) or 5th (index 4) element. The regular procedure would return the 4th element (index 3) in this case. It does not always return the first duplicate (consider {\displaystyle [1,2,4,4,4,5,6,7]}{\displaystyle [1,2,4,4,4,5,6,7]} which still returns the 4th element). " Share this post Link to post
David Heffernan 2345 Posted December 8, 2022 6 minutes ago, Lajos Juhász said: This is the definition of the binary search: https://en.wikipedia.org/wiki/Binary_search_algorithm Stefan is talking about a different algorithm, modified binary search algorithms know as lower or upper bound binary search. Kind of the entire point of this topic. Share this post Link to post
Anders Melander 1782 Posted December 8, 2022 9 minutes ago, Lajos Juhász said: This is the definition of the binary search: https://en.wikipedia.org/wiki/Binary_search_algorithm Wikipedia (even though it references the almighty Knuth) isn't the definition". It's a definition. Share this post Link to post
dummzeuch 1505 Posted December 8, 2022 Interesting: I just googled for "lower bound binary search" pascal and the first hit is Stefan's message in this topic. Now that's fast indexing. Share this post Link to post
dummzeuch 1505 Posted December 8, 2022 Strangely enough, TStringList.Find uses (and has been using for decades) a modified binary search algorithm that always returns the first matching item if Duplicates = dupAccept and which does not have such a repeat loop. But the documentation does not mention that. Share this post Link to post
David Heffernan 2345 Posted December 8, 2022 Nobody should be calling that function that makes binary search O(n) in worst case. Share this post Link to post
Uwe Raabe 2057 Posted December 8, 2022 Nobody should create such a worst case list in the first place. Honestly, I don't understand such a statement targeting an extreme corner case. Although I would vote for a better implementation, my lists are usually without any duplicates and therefore don't care about that, but perhaps I am special. 1 Share this post Link to post
Anders Melander 1782 Posted December 8, 2022 4 minutes ago, Uwe Raabe said: Nobody should create such a worst case list in the first place. And where is this written? Whoever made the change obviously did it to handle lists with duplicates, so it's pointless to argue that lists are usually without duplicates. It's fine that there are corner cases that performs worse than optimal, like with quick sort, but the worst case here is a bit extreme considering that it could have been avoided with a bit of effort - or they could have just Googled the solution. There's just no excuse for the current implementation. 4 Share this post Link to post
David Heffernan 2345 Posted December 8, 2022 This is why I avoid Embarcadero library code and write my own Share this post Link to post
Lajos Juhász 293 Posted December 9, 2022 16 hours ago, David Heffernan said: Stefan is talking about a different algorithm, modified binary search algorithms know as lower or upper bound binary search. Kind of the entire point of this topic. My point was with that Wikipedia link that modified algorithms have their names. Like in the link the variation to find the elemest with lowest index is called binary_search_leftmost and not binary_search. 16 hours ago, Anders Melander said: Wikipedia (even though it references the almighty Knuth) isn't the definition". It's a definition. That's true, but most developers expect that if an implementation reference to an algorithm it will implement that algorithm and not something else. That was the whole point to name algorithms in matemathics and computer science. That no matter from where you learned an algorithm you know how it works. 1 Share this post Link to post
Uwe Raabe 2057 Posted December 9, 2022 12 hours ago, Anders Melander said: And where is this written? I admit the wording was chosen as a counterpart to Davids claim and should not be taken literally. In addition I also mentioned that I would vote for a better implementation. It is just my personal opinion that if everybody is going to avoid the standard routines in favor of writing its own, we may never see any improvement in that area. 2 Share this post Link to post
Vandrovnik 214 Posted December 9, 2022 3 minutes ago, Uwe Raabe said: I admit the wording was chosen as a counterpart to Davids claim and should not be taken literally. In addition I also mentioned that I would vote for a better implementation. It is just my personal opinion that if everybody is going to avoid the standard routines in favor of writing its own, we may never see any improvement in that area. Has someone created it on QP, so that we can vote? Probably with link to https://en.wikipedia.org/wiki/Binary_search_algorithm#Procedure_for_finding_the_leftmost_element Everyone usually needs better version immediatelly, while waiting for the fix can take... ehm... few years? 1 Share this post Link to post