Attila Kovacs 629 Posted February 25, 2020 (edited) 14 minutes ago, Anders Melander said: better to add unsorted and then sort at the end The "Sorted" property changes the behavior how stringlist works. If it's not set, it won't yield on duplicate values (looking up would take too much time), and Thomas obviously needs this function as he tries to mimic a TDictionary. Edited February 25, 2020 by Attila Kovacs Share this post Link to post
dummzeuch 1505 Posted February 25, 2020 14 minutes ago, Anders Melander said: Okay, but setting Sorted=True and then adding to the list will keep the list sorted; It's using binary search to find the insertion point so it's not that bad, but it might be better to add unsorted and then sort at the end. It depends on the dataset. Obviously, as you pointed out yourself, I am adding duplicates, which the string list should ignore. (That actually is based on my "real world" usage of the list which I want to replace.) I know of no way to achieve that otherwise, or is there one that I don't know about? Share this post Link to post
Anders Melander 1782 Posted February 25, 2020 3 minutes ago, Attila Kovacs said: Thomas obviously needs IMO it's not that clear what he needs. dupError is set which means that duplicates should cause an exception, but the code explicitly produce duplicates - and prevents them from being added. Anyhow, I don't think I have anything to contribute on the topic that a quick google search couldn't have provided in the first place. Share this post Link to post
Anders Melander 1782 Posted February 25, 2020 7 minutes ago, dummzeuch said: I know of no way to achieve that otherwise, or is there one that I don't know about? Probably not, but either use Duplicates=dupIgnore or do the test manually. Don't do both. Share this post Link to post
Attila Kovacs 629 Posted February 25, 2020 7 minutes ago, Anders Melander said: IMO it's not that clear what he needs I must have misunderstood the topic title. My apologies. 1 Share this post Link to post
dummzeuch 1505 Posted February 25, 2020 50 minutes ago, Anders Melander said: Probably not, but either use Duplicates=dupIgnore or do the test manually. Don't do both. Good point, I added the manual test later to make THashedStringList work with the same code as a sorted TStringList. It shouldn't hurt performance though since the expensive search code is executed anyway to find the place for inserting the new string. btw: I appreciate your feedback on the testing code, even if it might not have sounded like I did. Share this post Link to post
Edwin Yip 154 Posted February 25, 2020 5 hours ago, Fr0sT.Brutal said: I believe mORMot has the fastest implementation of everything :)) There also big libs like Griijy , TheUnknownOnes etc. Check https://github.com/Fr0sT-Brutal/awesome-pascal Agreed. Check TSynDictionary from SynCommons.pas of the mORMot framework. SynCommons.pas is self-contained without any dependency. 1 Share this post Link to post
Thijs van Dien 9 Posted February 26, 2020 (edited) I was surprised by @dummzeuch's benchmark results, so ran my own. In short, I created 44350 strings by taking 8870 English words and appending 1 to 5 to them, added half of them (index odd) to the data structure, to finally check for all of them if they were present. The experiment was repeated 100 times, and the listed results are the average total time for each action: TStringList - Add: 143.44 ms - TryGet: 123.09 ms TStringHash (Size = 256) - Add: 23.58 ms - TryGet: 62.56 ms TStringHash (Size = 512) - Add: 17.09 ms - TryGet: 40.54 ms TStringHash (Size = 1024) - Add: 13.47 ms - TryGet: 29.3 ms TStringHash (Size = 2048) - Add: 11.45 ms - TryGet: 21.68 ms TStringHash (Size = 4096) - Add: 10.2 ms - TryGet: 18.41 msTStringHash (Size = 8192) - Add: 9.49 ms - TryGet: 16.28 msTStringHash (Size = 16384) - Add: 9.17 ms - TryGet: 15.21 msTStringHash (Size = 32768) - Add: 8.73 ms - TryGet: 14.24 msTStringHash (Size = 65536) - Add: 8.68 ms - TryGet: 14.03 msTDictionary - Add: 10.2 ms - TryGet: 16.09 ms TSynDictionary - Add: 9.02 ms - TryGet: 14.99 ms If you're really interested, I can send you the code, but I think these results make sense. As long as the number of buckets is somewhat appropriate, a TStringHash outperforms TDictionary even. Possibly the former has a more specialized hash function, or simply does less other work (e.g. no accounting for reallocation). Note that I did make it fair by checking for duplicates in TStringHash too. If you don't need that, it becomes even faster. Edited February 26, 2020 by Thijs van Dien 1 Share this post Link to post
timfrost 78 Posted February 26, 2020 I got similar results for loading (6.25ms) and finding (9.06ms) in a ternary search tree. And the StringHash(65536) load time for my list on my machine was 9.4ms. Looking at the StringHash source, the memory usage of both that and TST is the same order of magnitude, but the StringHash makes much more use of strings, of course. The TST meets my needs, and I have no plans to switch, mainly because my function builds in more load helpers and matching features, which fit with how I use it. 1 Share this post Link to post
Georgge Bakh 29 Posted March 1, 2020 My collections library based on pseudo-templates has a HashMap implementation. Besides of key and value types it can be parametrized with a hash function of the default one is not suitable for some reason. https://github.com/casteng/tacl 1 Share this post Link to post
Andrea Raimondi 13 Posted March 14, 2020 Yo! Use an interface and then create two different implementing classes, one for DXE+ with a TDictionary and another one using some minimal custom list. Then you compile it accordingly. Share this post Link to post