pre-generic dictionary class

dummzeuch · February 23, 2020

I think that some functionality in GExperts (and some of my other tools) could profit from switching from sorted TStringLists to a dictionary. Unfortunately I can't just use TDictionary<string, [whatever]> because I want to continue supporting Delphi versions that didn't have generics (in particular Delphi 2007). Does anybody know of a free dictionary implementation that stores pointers indexed by strings? I know of THashedStringList but it has too many gotchas to simply use it (e.g. the hashes get recalculated every time a new item was inserted and IndexOf is called, which is exactly what I would need to do). Yes, I could write one myself, but why reinvent the wheel if there already is one?

Edited February 23, 2020 by dummzeuch

David Heffernan · February 23, 2020

Expand a generic dictionary implementation manually. Just replace the TKey and TValue textually.

It would also be very valid to stop developing GExperts for pre generic delphi.

Edited February 23, 2020 by David Heffernan

Alexander Sviridenkov · February 23, 2020

http://guildalfa.ru/alsha/node/32

Article is in russian, but at the bottom there is link to implementation file,

dummzeuch · February 23, 2020

2 hours ago, David Heffernan said:

Expand a generic dictionary implementation manually. Just replace the TKey and TValue textually.

Even if that was as easy as it sounds, I doubt that this meets the definition of "free implementation".

2 hours ago, David Heffernan said:

It would also be very valid to stop developing GExperts for pre generic delphi.

Yes, this is what many people tell me. I might even drop some of them, but since I do most of my professional work with Delphi 2007 (the last pre Unicode version), dropping support for it in GExperts would not really be smart.

Edited February 23, 2020 by dummzeuch

dummzeuch · February 23, 2020

1 hour ago, Alexander Sviridenkov said:

http://guildalfa.ru/alsha/node/32

Article is in russian, but at the bottom there is link to implementation file,

What's the license of this code?

Alexander Sviridenkov · February 23, 2020

@dummzeuch written in a unit header - "Free for any use"

David Heffernan · February 23, 2020

37 minutes ago, dummzeuch said:

Even if that was as easy as it sounds, I doubt that this meets the definition of "free implementation".

It is easy. And there are open source dictionaries out there. As for whether they are compatible with the GExperts license, I've no idea.

Dave Novo · February 24, 2020

I would suggest having a look at

https://www.amazon.com/Tomes-Delphi-Algorithms-Data-Structures-ebook/dp/B007FKB0EI

It has all those basic data structures for pre generic delphi.

IIRC the code is all open source. If you can find an old version with the CD you might not even have to type anything in. I will see if I have a copy of the code in the office and will attach it here if I find it.

Thijs van Dien · February 24, 2020

Did you consider IniFiles.TStringHash? It has a fixed number of buckets, but you could compose it and implement your own reallocation logic if desired. Also, you could inherit from it to use a better hash should the default not be good enough.

Edited February 24, 2020 by Thijs van Dien

dummzeuch · February 24, 2020

2 hours ago, Thijs van Dien said:

Did you consider IniFiles.TStringHash?

Yes:

12 hours ago, dummzeuch said:

I know of THashedStringList but it has too many gotchas to simply use it

I thought of improving it too, but it probably would be more work than writing my own. Just the fact that it's based on TStrings adds overhead that I would like to avoid.

(I guess you meant THashedStringlist, because if I remember correctly, TStringHash isn't even exported.)

Edited February 24, 2020 by dummzeuch

dummzeuch · February 24, 2020

3 hours ago, Dave Novo said:

I would suggest having a look at

https://www.amazon.com/Tomes-Delphi-Algorithms-Data-Structures-ebook/dp/B007FKB0EI

It has all those basic data structures for pre generic delphi.

IIRC the code is all open source.

If I remember correctly (I bought Tomes of Delphi years ago), the source is not freely available. But it's a good idea, I'll have to check.

Uwe Raabe · February 24, 2020

4 hours ago, Dave Novo said:

I would suggest having a look at

https://www.amazon.com/Tomes-Delphi-Algorithms-Data-Structures-ebook/dp/B007FKB0EI

It has all those basic data structures for pre generic delphi.

Interestingly that was my first thought, too. Unfortunately I couldn't find any directory implementation described there.

luebbe · February 24, 2020

A github search revealed these:

- https://github.com/coolsoftware/VHashedStringList

- https://github.com/bodo-hugo-barwich/hash-lists

I didn't take a closer look. Just dumping the links here.

Edited February 24, 2020 by luebbe

timfrost · February 24, 2020

I have never liked the idea of a hash, and I use an implementation of a ternary seach tree as a dictionary. I based it on an article in Dr Dobb's Journal in 1998, and it has stood the test of time!

I have versions for C and Pascal; the latter for AnsiChar, WidwChar 32-bit and WideChar 64-bit keys. It is incredibly fast and easy to use, and has reasonable memory consumption, unless of course you load a million random 100-byte keys (which takes only a second or so). It has various ways to load and look up keys, and to traverse the set; values can be integer, object or string. I use it it in many different ways, both to match a full key and to find the longest initial partial key. On updating a value, the space used by the old value is not recovered, but that suits most of my uses as a dictionary; and the unit does include a function to rebuild the data store for a dynamic application, if you can spare a second to do so. In its simplest form:

tree := tTSTtree.Create;
tree.insert(key, value);
q := tree.match(key);
if (q <> nil) then s := string(pchar(q));

The source is reasonably well commented, and I would be happy let you have a copy, but I have never got around to properly documenting it and uploading it somewhere.

David Heffernan · February 24, 2020

TST, like binary tree, requires there to be an ordering of the key values?

dummzeuch · February 24, 2020

4 hours ago, luebbe said:

A github search revealed these:

- https://github.com/coolsoftware/VHashedStringList

- https://github.com/bodo-hugo-barwich/hash-lists

I didn't take a closer look. Just dumping the links here.

Both look interesting, thanks. Now, I just need to find the time to run timing tests. 😉

dummzeuch · February 24, 2020

2 hours ago, timfrost said:

I have never liked the idea of a hash, and I use an implementation of a ternary seach tree as a dictionary. I based it on an article in Dr Dobb's Journal in 1998, and it has stood the test of time!

The source is reasonably well commented, and I would be happy let you have a copy, but I have never got around to properly documenting it and uploading it somewhere.

Would you please be so kind and send it to me via private message? (Unless of course you prefer to attach it publicly to this thread.)

Angus Robertson · February 24, 2020

ICS includes a unit OverbyteIcsAvlTrees.pas written by Arno Garrels, from the unit:

Implements a fast cache-like data storage based on two linked AVL-Trees for primary and secondary indexing. Primary key of type string has to be unique, secondary key of type TDateTime may have duplicates. AVL or balanced binary trees are extremely efficient data structures for searching data. Finding an element in 65536 requires at most 16 compares. Uses an AVL-Tree as it is described in the book "Algorithms & Data Structures", Prof. Niklaus Wirth.

No real dependencies on other ICS units.

Angus

Thijs van Dien · February 24, 2020

10 hours ago, dummzeuch said:

Yes:

I thought of improving it too, but it probably would be more work than writing my own. Just the fact that it's based on TStrings adds overhead that I would like to avoid.

(I guess you meant THashedStringlist, because if I remember correctly, TStringHash isn't even exported.)

TStringHash is a completely different class than THashedStringList and gives you exactly what you asked for: string to Integer (Pointer). I've been happily using it in D6 and it's still publicly available in D2010 (the latest version I have at hand). Sure it might not be the absolute optimal solution, but it will be a noticeable improvement over the status quo, is very understandable, and doesn't need any third party code.

Edited February 24, 2020 by Thijs van Dien

dummzeuch · February 24, 2020

36 minutes ago, Thijs van Dien said:

TStringHash is a completely different class than THashedStringList and gives you exactly what you asked for: string to Integer (Pointer). I've been happily using it in D6 and it's still publicly available in D2010 (the latest version I have at hand). Sure it might not be the absolute optimal solution, but it will be a noticeable improvement over the status quo, is very understandable, and doesn't need any third party code.

Hm, I must be doing something wrong. I can't see any improvement of a TStringHash over a sorted TStringList. Adding the same entries to a TStringHash took 3.12 seconds while it took 3.07 for a sorted TStringList. The same took 0.32 seconds for a TDictionary.

Looking them up with ValueOf took approximately the same amount of time for TStringHash and sorted TStringList, while for TDictionary it took 0.32 seconds.

I tried sizes of 1024 and 10240.

This is the timing code:

procedure TForm1.DoTiming(sl: TStringHash);
const
  CYCLES = 100;
var
  IdxCycles: Integer;
  Idx1: Integer;
  Idx2: Integer;
  Idx3: Integer;
  sw: TStopwatch;
  s: string;
  Idx: Integer;
begin
  sw := TStopwatch.StartNew;
  for IdxCycles := 1 to CYCLES do begin
    for Idx1 := Ord('A') to Ord('Z') do begin
      for Idx2 := Ord('A') to Ord('Z') do begin
        for Idx3 := Ord('A') to Ord('Z') do begin
          s := chr(Idx1) + chr(Idx2) + chr(Idx3);
          if sl.ValueOf(s) = -1 then
            sl.Add(s, Idx1 * 10000 + Idx2 * 100 + Idx3);
        end;
      end;
    end;
  end;
  sw.Stop;
  m_Output.Lines.Add('Add: ' + sw.Elapsed.ToString);

  sw.Reset;
  sw.Start;
  for IdxCycles := 1 to CYCLES do begin
    for Idx1 := Ord('A') to Ord('Z') do begin
      for Idx2 := Ord('A') to Ord('Z') do begin
        for Idx3 := Ord('A') to Ord('Z') do begin
          s := chr(Idx1) + chr(Idx2) + chr(Idx3);
          Idx := sl.ValueOf(s);
          Assert(Idx = Idx1 * 10000 + Idx2 * 100 + Idx3);
        end;
      end;
    end;
  end;
  m_Output.Lines.Add('IndexOf: ' + sw.Elapsed.ToString);
end;

The code for TStringList is very similar. It uses AddObject instead of Add and IndexOf instead of ValueOf.

procedure Tf_HashedStringListTest.DoTiming(sl: TStringList);
const
  CYCLES = 100;
var
  IdxCycles: Integer;
  Idx1: Integer;
  Idx2: Integer;
  Idx3: Integer;
  sw: TStopwatch;
  s: string;
  Idx: Integer;
begin
  sl.Sorted := True;
  sl.CaseSensitive := True;
  sl.Duplicates := dupError;
  sw := TStopwatch.StartNew;
  sl.BeginUpdate;
  for IdxCycles := 1 to CYCLES do begin
    for Idx1 := Ord('A') to Ord('Z') do begin
      for Idx2 := Ord('A') to Ord('Z') do begin
        for Idx3 := Ord('A') to Ord('Z') do begin
          s := chr(Idx1) + chr(Idx2) + chr(Idx3);
          if sl.IndexOf(s) = -1 then
            sl.AddObject(s, Pointer(Idx1 * 10000 + Idx2 * 100 + Idx3));
        end;
      end;
    end;
  end;
  sl.EndUpdate;
  sw.Stop;
  m_Output.Lines.Add(sl.Count.ToString + ': Add: ' + sw.Elapsed.ToString);

  sw.Reset;
  sw.Start;
  for IdxCycles := 1 to CYCLES do begin
    for Idx1 := Ord('A') to Ord('Z') do begin
      for Idx2 := Ord('A') to Ord('Z') do begin
        for Idx3 := Ord('A') to Ord('Z') do begin
          s := chr(Idx1) + chr(Idx2) + chr(Idx3);
          sl.IndexOf(s);
        end;
      end;
    end;
  end;
  m_Output.Lines.Add(sl.Count.ToString + ': IndexOf: ' + sw.Elapsed.ToString);

And the one for the TDictionary just calls TryGetValue instead of IndexOf and Add instead of AddObject.

I guess it has something to do with the limited length of the strings. Or I am overlooking something else entirely (it has been a long day and I'm not going to test the other solutions right now).

Anders Melander · February 24, 2020

1 hour ago, dummzeuch said:

Or I am overlooking something else entirely

It's a bit hard to see what exactly it is you're benchmarking with that code.

Your first loop is supposedly meant to measure Add performance, but it also contains a Lookup. And after the first iteration of the outer loop it just produces duplicates, but then skips all those duplicates.

Also your test is too synthetic. Use realistic data instead. I'm guessing your actual data isn't all possible three character strings with perfect distribution...

And is sorting the stringlist after each insert really a realistic scenario?

If you're really that interested in the performance of your dictionary I suggest you profile with real data instead of this guesswork.

Fr0sT.Brutal · February 25, 2020

I believe mORMot has the fastest implementation of everything :))

There also big libs like Griijy , TheUnknownOnes etc. Check https://github.com/Fr0sT-Brutal/awesome-pascal

Edited February 25, 2020 by Fr0sT.Brutal

dummzeuch · February 25, 2020

18 minutes ago, Fr0sT.Brutal said:

I believe mORMot has the fastest implementation of everything :))

There also big libs like Griijy , TheUnknownOnes etc. Check https://github.com/Fr0sT-Brutal/awesome-pascal

Hey, I just noted that my dzlib is missing from that list. 😉

Thanks for the hints.

dummzeuch · February 25, 2020

15 hours ago, Anders Melander said:

Your first loop is supposedly meant to measure Add performance, but it also contains a Lookup. And after the first iteration of the outer loop it just produces duplicates, but then skips all those duplicates.

Also your test is too synthetic. Use realistic data instead. I'm guessing your actual data isn't all possible three character strings with perfect distribution...

You are right about that this is not a good test. My goal was not to create a real performance test for it, just something easy to write that gives me an idea which possible solution to investigate further. But yes, inserting strings in ascending order is definitely going to create some bias.

15 hours ago, Anders Melander said:

And is sorting the stringlist after each insert really a realistic scenario?

Where am I sorting the string list? It's set to be sorted in the beginning and not changed. ~~Actually I didn't post the code for the string list at all, just for the hash and the dictionary.~~ (OK, i did)

Edited February 25, 2020 by dummzeuch

Anders Melander · February 25, 2020

1 hour ago, dummzeuch said:

Where am I sorting the string list?

Here:

17 hours ago, dummzeuch said:


procedure Tf_HashedStringListTest.DoTiming(sl: TStringList);
...
begin
  sl.Sorted := True;
...

1 hour ago, dummzeuch said:

It's set to be sorted in the beginning and not changed. Actually I didn't post the code for the string list at all, just for the hash and the dictionary.

Okay, but setting Sorted=True and then adding to the list will keep the list sorted; It's using binary search to find the insertion point so it's not that bad, but it might be better to add unsorted and then sort at the end. It depends on the dataset.

Sign In

pre-generic dictionary class

Recommended Posts

dummzeuch 1717

Share this post

Link to post

David Heffernan 2482

Share this post

Link to post

Alexander Sviridenkov 363

Share this post

Link to post

dummzeuch 1717

Share this post

Link to post

dummzeuch 1717

Share this post

Link to post

Alexander Sviridenkov 363

Share this post

Link to post

David Heffernan 2482

Share this post

Link to post

Dave Novo 62

Share this post

Link to post

Thijs van Dien 10

Share this post

Link to post

dummzeuch 1717

Share this post

Link to post

dummzeuch 1717

Share this post

Link to post

Uwe Raabe 2201

Share this post

Link to post

luebbe 26

Share this post

Link to post

timfrost 81

Share this post

Link to post

David Heffernan 2482

Share this post

Link to post

dummzeuch 1717

Share this post

Link to post

dummzeuch 1717

Share this post

Link to post

Angus Robertson 677

Share this post

Link to post

Thijs van Dien 10

Share this post

Link to post

dummzeuch 1717

Share this post

Link to post

Anders Melander 2114

Share this post

Link to post

Fr0sT.Brutal 904

Share this post

Link to post

dummzeuch 1717

Share this post

Link to post

dummzeuch 1717

Share this post

Link to post

Anders Melander 2114

Share this post

Link to post

Create an account or sign in to comment

Create an account