JohnLM, posted November 20, 2024

I want to create or calculate a unique 8-digit ID (checksum/hash/etc) for variable-sized texts. I am not sure whether the ID has to be a certain length depending on the length of the actual text, but I would like the ID to be as small as possible. I definitely don't want a GUID.

EXAMPLE:

    Name    comment
    ------  --------------------
    car     neon.
    sumvac  is not my favorite any more but i do like the cooler autom
    car     neon.

RESULT:

    ID        Name    comment
    --------  ------  ------------
    AT1UW72Z  car     neon.
    0B1UR7PY  sumvac  is not my favorite any more but i do like the cooler autom
    AT1UW72Z  car     neon.

I want to add RESULT to a database table. Duplicates are okay, since I can use a query to remove them from a query/report run. Or I can remove them completely (at a later time) and just have unique table entries. But in general, duplicate entries are acceptable.

Is there anything already included in Delphi that supports this? I will be working in the XE7 and/or 12.2 IDEs.

Thanks in advance.

FPiette, posted November 20, 2024

One possibility is to apply a CRC32 to the text and then convert the resulting UInt32 to its hex-ASCII representation. See:

CRC32: https://docwiki.embarcadero.com/Libraries/Sydney/en/System.ZLib.crc32
IntToHex: https://docwiki.embarcadero.com/Libraries/Sydney/en/System.SysUtils.IntToHex

PeterBelow, posted November 20, 2024

7 hours ago, JohnLM said:
    I want to create or calculate a unique 8-digit ID (checksum/hash/etc) for variable-sized texts. [...] Is there anything already included in Delphi that supports this?

Look at the System.Hash unit; this kind of problem is usually the domain of hashing.

JohnLM, posted November 20, 2024

Progress update on this endeavour. . .

I've been researching hashes and how to hash a string. I have it sort of working, but there are some issues.

The way I understand it, when a hash is created it is a longword, cardinal, or other value, depending on the implementation of the function doing the hashing. So, take for instance the following:

    // link: https://stackoverflow.com/questions/3690608/simple-string-hashing-function
    function StrHash(const st: string): cardinal;
    var
      i: integer;
    begin
      result := 0;
      for i := 1 to length(st) do
        result := result * $20844 xor byte(st[i]);
    end;

This function returns a Cardinal value.

Then I wrote the following to help me see what is going on and how it is working for my needs. I do not need any security. It is not that level.

    procedure TForm1.btnGenClick(Sender: TObject);
    var
      s: ansistring;
      c: cardinal;
      i: integer;
      hashLen: integer;
    begin
      // c := strhash(edit1.Text); m1.Lines.Add(c.ToString()); // works; // debugging purposes
      c := StrHash(edit1.Text);        // get/calculate the hash and store it in 'c'
      i := c;                          // convert cardinal into integer value so i can intTostr(i)
      hashLen := length(intTostr(i));  // get the length and store it. (But I want an even number length for the Hex output,
                                       // which is the next question, "how do I do that?") ***
      s := inttohex(i, hashLen);       // convert the hash value, i, into a string
      m1.Lines.Add(s);                 // and display it--this will be the ID, the hash of the string, any length, say up to 255-chars
    end;

With regard to the final hash ID, I would like to keep it at a specific length so that the output is clean and not crooked/jagged when I display a bunch in a report/output. So I'm thinking that I need to calculate "even" numbers from the hashLen variable. If hashLen is 9, I want to add 1 and make it 10. If hashLen is 8, there is nothing to do.

This is what I have so far, and although it does seem to work, it has some problems/drawbacks.

JohnLM, posted November 20, 2024

PS: To clarify: the string (edit1.text) can be any length, from 1 char up to 255 chars. But I want the hash value, once converted to a hex string, to be as short as possible, not 255 chars. I want it to be around 8 to 12 chars if possible. If it comes out shorter than that, I want to at least "pad" it with zeros.

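On the padding question: the second argument of IntToHex is a minimum digit count, and a Cardinal never needs more than 8 hex digits, so asking for 8 digits should always give exactly 8 characters, zero-padded on the left. A minimal sketch, assuming a console project (the program and variable names are only illustrative):

```pascal
program FixedWidthHexDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Small, Large: Cardinal;
begin
  Small := 255;
  Large := $1C291CA3;
  // The digit count of 8 pads short values with leading zeros.
  WriteLn(IntToHex(Small, 8));  // 000000FF
  WriteLn(IntToHex(Large, 8));  // 1C291CA3
end.
```

With this, there is no need to measure the decimal length of the hash first; the width is fixed by the 32-bit size of the value.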
Anders Melander, posted November 21, 2024

6 hours ago, JohnLM said:
    I want to be as short as possible

"As short as possible" is not a usable specification. The shortest possible key is zero characters long. The longer the key, the fewer key collisions. You need to specify the exact length you want your key to be.

Also, are you really sure that you want to hex encode the key? With hex encoding you are wasting half the bits by using 8 bits (i.e. an AnsiChar) to represent a 4-bit value (i.e. a hex digit). A more efficient encoding would be something like Base32 (5 bits per byte) or Base64 (6 bits per byte).

I believe Delphi has implementations of both. Search the source (or wait for someone here to write what they are, as I'm sure they will do).

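To illustrate the encoding point: a 32-bit hash takes 8 characters in hex but only 7 in Base32, since 7 x 5 = 35 bits covers all 32. The sketch below is hand-rolled rather than an RTL routine; the function name UInt32ToBase32 and the choice of the RFC 4648 alphabet are assumptions made for the example, and it assumes 1-based string indexing as on the desktop compilers.

```pascal
program Base32IdDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // Assumed alphabet: RFC 4648 Base32 (A-Z, then 2-7).
  Base32Alphabet: string = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

// Hypothetical helper: encodes a 32-bit value as exactly 7 Base32 characters.
// The value is treated as a 35-bit number whose top 3 bits are zero.
function UInt32ToBase32(Value: UInt32): string;
var
  I: Integer;
begin
  SetLength(Result, 7);
  for I := 7 downto 1 do
  begin
    // Take the lowest 5 bits, map them to a character, then shift them out.
    Result[I] := Base32Alphabet[(Value and 31) + 1];
    Value := Value shr 5;
  end;
end;

begin
  WriteLn(UInt32ToBase32(0));          // AAAAAAA
  WriteLn(UInt32ToBase32($FFFFFFFF));  // D777777
  WriteLn(UInt32ToBase32($1C291CA3));  // a sample 32-bit value
end.
```

The saving over hex is only one character for a 32-bit key; the difference grows with longer hashes.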
FPiette, posted November 21, 2024

8 hours ago, JohnLM said:
    I would like to keep it in a specific length so that the output is clean and not crooked/jagged when I display a bunch in a report/output.

Why don't you use what I proposed? You will always get an 8-character key and use only Delphi RTL functions. Please comment on this.

I proposed that you use CRC32, which is a kind of hash.

CRC32: https://docwiki.embarcadero.com/Libraries/Sydney/en/System.ZLib.crc32
IntToHex: https://docwiki.embarcadero.com/Libraries/Sydney/en/System.SysUtils.IntToHex

JohnLM, posted November 21, 2024

@FPiette I did consider your suggestion, but I could not understand it. I tried a second time after your last post and I still could not understand it, how to use it, etc. I think that "buf" is throwing me off. I'm not that good at pointers. I thought buf was an array meant to hold some bytes, so I tried to move the string into it and, well, I just don't have the skills for this. I'm an amateur programmer, this is my hobby, and I'm not some whiz-kid know-it-all. So I'm giving up on your suggestion per your link. It's 5:32am, I work the night shift (truck unloading work), and I'm exhausted.

However, I found another resource for crc32 and tried that, and it seems to be working. But again, I'm exhausted. I'll play around with it much later. I need sleep.

FPiette, posted November 21, 2024

3 hours ago, JohnLM said:
    I did consider your suggestion, but I could not understand it. I tried a second time after your last post and I still could not understand it, how to use it, etc.

Here is an example:

    program Crcr32Demo;

    {$APPTYPE CONSOLE}

    {$R *.res}

    uses
      System.SysUtils, System.ZLib;

    var
      Buf1 : AnsiString;
      Buf2 : String;
      Crc  : UInt32;
    begin
      // 8-bit characters: the CRC covers Length(Buf1) bytes
      Buf1 := 'Hello World!';
      Crc  := System.ZLib.Crc32(0, PByte(Buf1), Length(Buf1) * Sizeof(Buf1[1]));
      WriteLn('AnsiString ID=', IntToHex(Crc, 8));

      // 16-bit characters: the CRC covers twice as many bytes
      Buf2 := 'Hello World!';
      Crc  := System.ZLib.Crc32(0, PByte(Buf2), Length(Buf2) * Sizeof(Buf2[1]));
      WriteLn('UnicodeString ID=', IntToHex(Crc, 8));

      ReadLn;
    end.

The output is:

    AnsiString ID=1C291CA3
    UnicodeString ID=E2106423

AnsiString and UnicodeString don't produce the same result because the character codes are different (8 bits and 16 bits per character) and CRC32 works at the byte level.

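Because the ID depends on the bytes fed to CRC32, one way to get the same ID regardless of the string type is to convert the text to a fixed byte encoding first. The sketch below wraps the example above in a function and assumes UTF-8 as that encoding; TextToId is a hypothetical helper name, not an RTL function, and the crc32 call follows the usage shown in the example.

```pascal
program TextIdDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.ZLib;

// Hypothetical helper: returns an 8-character hex ID for any text.
// Assumption: the text is hashed as UTF-8 bytes so the result does not
// depend on whether the caller holds an AnsiString or a UnicodeString.
function TextToId(const Text: string): string;
var
  Bytes: TBytes;
  Crc: UInt32;
begin
  Bytes := TEncoding.UTF8.GetBytes(Text);
  Crc := 0;
  // Skip the call for empty input so no pointer into an empty array is taken.
  if Length(Bytes) > 0 then
    Crc := System.ZLib.crc32(0, @Bytes[0], Length(Bytes));
  Result := IntToHex(Crc, 8);
end;

begin
  WriteLn(TextToId('Hello World!'));
  WriteLn(TextToId('car neon.'));
  WriteLn(TextToId('car neon.'));  // same text, same ID
  ReadLn;
end.
```

For plain ASCII text such as 'Hello World!' the UTF-8 bytes are the same as the AnsiString bytes, so the first line should match the AnsiString ID reported above. Note that a 32-bit ID can collide for different texts, so it identifies duplicates reliably but does not guarantee uniqueness.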
JohnLM, posted November 22, 2024

Solved. . .

Thanks @FPiette. Yours confirmed what I was getting from another version I was testing: different values, but the same outcome.