chkaufmann 17 Posted October 18, 2024 I have a question regarding memory usage for strings. When I load a big number of persons, there are many hundred with the same name. Is this handled automatically so that there will be only one copy of the string for a certain name or should I write my own code for this? Or does it depend how the strings are loaded and to what kind of variable these are assigned? I didn't find a good source to read about this topic. Any hints for that? Thanks Christian Share this post Link to post
PeterBelow 240 Posted October 18, 2024 Depends on how you store the loaded names. If these are records from a database, for instance, and you access them via a query or table component each result row will have its own memory, regardless of what it contains. With a query you have the option of removing duplicates in the SQL statement used, but only if the row does not have any other columns with non-duplicate content. If you load the names into a TStringlist you can set the list's Duplicates property to dupIgnore, it will then discard values already in the list when you try to add them. Share this post Link to post
Brandon Staggs 305 Posted October 18, 2024 Strings are copy-on-write, but the compiler doesn't do any magic to automatically de-duplicate strings. The only way you get COW efficiency in memory is when you assign one string to another. Some classes have code for deduplication, like TStringList, but you would know if you were using it and it probably doesn't apply to your situation. 1 Share this post Link to post
darnocian 93 Posted October 19, 2024 I think the official term being referenced is 'string interning'.https://en.wikipedia.org/wiki/String_interning Scenarios where it may be useful is with JSON, results from dbs, or when custom collections containing strings from file or network. Depending on how the data is pulled in, you may have a spike in memory if all f the raw data is loaded, followed by the interning process which may normalise the structures. If the data is loaded incrementally, memory utilisation should correlate to the levels of duplication you know exists in the data. There is some overhead to the interning process in terms of constantly trying to deduplicate, so it depends on your scenario. If you keep the data in memory for a longish period of time, I think it can be useful to do this, especially if you are a ware that memory utilisation is a concern in your problem space. C# has a .Intern() method on string https://learn.microsoft.com/en-us/dotnet/api/system.string.intern?view=net-8.0. Similar in Java https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/String.html#intern() As Delphi strings are reference counted, you could say that having multiple variables assigned to the same string would be a form of interning. e.g. var str := 'hello world'; var str2 := str; var str3 := str; All of the above will reference the same data. Under the hood there should be a reference count (3). A non thread-safe approach to illustrate leveraging the above property by managing a TDictionary<string,string>: var GInternedStrings : TDictionary<string,string>; function Intern(const AString:string) : string; begin if not GInternedStrings.TryGetValue(AString, Result) then begin GInternedStrings.Add(AString, AString); Result := AString; end; end; initialization GInternedStrings := TDictionary<string,string>.Create; finalization GInternedStrings.Free; Here we see a lookup being done on the pool, and result being populated with a value from the pool if it exists. If no value exists, we simply add the value too the pool, and return the original. This could be used like: for var rec in records do begin rec.str := Intern(rec.str); end; Obviously, if you decide to go multi threaded, you would need to introduce some synchronisation, which will add some performance overhead due to locking and unlocking to keep the dictionary consistent. If you don't do anything multithreaded, you can get away with not having a sync object and could even just allocate a local pool in your loading procedure to localise the lifetime of the pool. Having the global pool will mean that you will have that memory 'permanently' allocated... so however you decide to manage the pool depends on your use case. Ideally, the functionality that loads data, such as TJSONValue.ParseJSONValue() or db query libraries would offer an option to do this. However, in many cases, I suspect it may be seen as an overhead in itself if the data is loaded into memory, processed and discarded. So the lifetime of the data in the app and how it is used is relevant to the effort of doing this all over. Share this post Link to post
DelphiUdIT 202 Posted October 19, 2024 (edited) EDIT: The answer is very "basic", I don't think I fully understand what you are asking ... I think it refers to differences from other languages where an "array" is automatically created in certain cases. In Pascal every object, variable, type or other must be known and defined before use. So in your case, knowing that you will have more "strings" to use you must predict their use using for example these solutions (there can be a thousand others that can be created, but these are some of those that the language and the predefined types handle natively, except for database that is another kind of container): Here same ref: https://docwiki.embarcadero.com/RADStudio/Athens/en/Structured_Types_(Delphi)#Arrays 1) static arrays; 2) dynamic arrays; 3) Lists; 4) Collections; 5) Dictionary; 6) Generics array; 7) .... 1) Static arrays are arrays with a predefined and immutable length, they are defined at compiler time and are fine when you know how long your list is at most. //This array contains 1000 strings (i.e. for example a thousand names) var myarray: array [0..999] of string; i: integer; myarray[0] := ' Mario'; myarray[1] := ' Maria'; //loop through the entire array for i:= Low(myarray) to High(myarray) do ShowMessage(myarray[i]); 2) Dynamic arrays: they are defined at compiler time (with the number of elements predefined at 0) and can be changed at runtime. var myarray: array of string; s: string; //Creates space for a thousand strings //I can change this number whenever I need it during the program, either increasing or decreasing it SetLength(myarray, 1000); myarray[0] := ' Mario'; myarray[1] := ' Maria'; //loop through the entire array, you can also use the same syntax with for //as for the static array. for s in myarray do ShowMessage(s); ........ ........ //At the end of the use, the space must be returned by setting the length to zero SetLength(myarray, 0); 3) and 4) and 5) They are specialized "containers": https://docwiki.embarcadero.com/Libraries/Athens/en/System.Generics.Collections 6) Generics can contain any "thing", in this example I show you the use with strings: var myarray: TArray<string>; s: string; //Create space for a thousand strings //I can vary this number whenever I need it during the program, both increasing and decreasing it SetLength(myarray, 1000); myarray[0] := ' Mario'; myarray[1] := ' Maria'; //loop through the entire array, you can also use the same syntax with for //as for the static array. for s in myarray do ShowMessage(s); ....... ....... //At the end of the use, the space must be returned by setting the length to zero SetLength(myarray, 0); 7) ... you can create your own array, for example with adavanced record: type //Really simple example TAnag = record private fName: string; fAge: integer; public property Age: integer read fAge write fAge; property Name: string read fName write fName; end; var Anag: TArray<TAnag>; s: TAnag; SetLength(Anag, 1000); Anag[0].Name := 'Mario'; Anag[0].Age := 21; Anag[1].Name := 'Maria'; Anag[1].Age := 18; //loop through the entire array, you can also use the same syntax with for //as for the static array. for s in Anag do ShowMessage(s.Name); ....... ....... //At the end of the use, the space must be returned by setting the length to zero SetLength(Anag, 0); A lot of functions in standard library return an array of object, like strings, integers of doubles. In this case you can assign to an empty generic or dynamic array the result ... automatically the array will be resized. Remeber to free it after used, example; var Data: TArray<string>; Combo: TComboBox; .... .... Data := Combo.Items.ToStringArray; ..... ..... SetLength(Data, 0); Edited October 19, 2024 by DelphiUdIT Share this post Link to post
Rollo62 542 Posted October 21, 2024 (edited) What about // Key Value TDictionary<Integer,string> with Integer Key as Hash from the original string? This will allow you to handle and operate with key's and only retrieve the real, original strings whenever necessary. Edited October 21, 2024 by Rollo62 Share this post Link to post
Anders Melander 1840 Posted October 21, 2024 FWIW, I once, as an experiment, implemented string interning using a dictionary in an application that contained hundred of thousands of strings with about 75% duplicates. It was a 32-bit application which was hitting the 4Gb limit and running out of memory. Sure, it saved a few hundred megabytes but the overhead of the dictionary lookup completely killed the performance. With 64-bit and virtual memory I can't see any reason at all to do this exerciser. Share this post Link to post
Stefan Glienke 2026 Posted October 21, 2024 1 hour ago, Anders Melander said: overhead of the dictionary lookup completely killed the performance. If you were using the RTL dictionary with its abysmal performance, that does not surprise me at all. Mostly because of its poor hash function, replacing that with a better one speeds it up significantly. However, string interning does not require a dictionary with key/value but just a hashtable for strings. DelphiAST has the option to use one for its parsing - we needed that when using it in the context of the IDE to avoid spikes in memory consumption. 5 Share this post Link to post
pyscripter 703 Posted October 21, 2024 (edited) 38 minutes ago, Stefan Glienke said: DelphiAST has the option to use one for its parsing This is the unit @Stefan Glienke was referring to. 100 lines and it uses no other unit! Edited October 21, 2024 by pyscripter 3 Share this post Link to post
Anders Melander 1840 Posted October 21, 2024 1 hour ago, pyscripter said: This is the unit @Stefan Glienke was referring to. Nice and simple. Lovely! Share this post Link to post
Lars Fosdal 1798 Posted October 21, 2024 I sort of assumed that 16777619 in that code was a prime, but wanted to check... ChatGPT 4o mini said "No". So, I asked CoPilot in Norwegian - and it said "No" (Nei) On a whim, I then asked it in English LLMs - Lovely Lying Machines. 3 Share this post Link to post
Anders Melander 1840 Posted October 21, 2024 4 hours ago, Lars Fosdal said: LLMs - Lovely Lying Machines. And yet you asked it. 2 Share this post Link to post
Lars Fosdal 1798 Posted October 21, 2024 It supposedly was a trivial question. Share this post Link to post
Dalija Prasnikar 1408 Posted October 22, 2024 10 hours ago, Lars Fosdal said: It supposedly was a trivial question. AI can't even count properly. Why would it be able to know whether number is prime or not? Share this post Link to post
Lars Fosdal 1798 Posted October 22, 2024 Well, the English CoPilot knew. But, the Norwegian one did not. Share this post Link to post
Uwe Raabe 2080 Posted October 22, 2024 2 hours ago, Lars Fosdal said: Well, the English CoPilot knew. But, the Norwegian one did not. That sounds more like Quantum Mechanics than Mathematics. 4 Share this post Link to post
Sherlock 668 Posted October 22, 2024 17 hours ago, Lars Fosdal said: It supposedly was a trivial question. There are no trivial questions for LLMs...or perhaps I should rephrase that: All questions are considered equally trivial by an LLM. Unless your specific number was covered in a publication it has scanned, it will "know" nothing about that number. Hell, it might even believe it to be your grandmas name, if you should claim so. 1 Share this post Link to post