karl Jonson 0 Posted November 18, 2020 Hi, What's the best method to convert hexadecimal (e.g. F0) to binary (e.g. 11110000) ? Thank you.
David Heffernan 2345 Posted November 18, 2020 Define a simple map from the 16 hex digits to the 4 character binary strings. Iterate over each hex digit and concatenate.
Mike Torrettinni 198 Posted November 19, 2020

2 hours ago, karl Jonson said: Hi, What's the best method to convert hexadecimal (e.g. F0) to binary (e.g. 11110000) ? Thank you.

This is what I use:

    // aHex is expected hex string of chars: 0..9, A..F
    function Hex2Bin(const aHex: string): string;
    const
      // Array of [hex, binary] pairs
      cBinArray: Array[0..15, 0..1] of string = (
        ('0', '0000'), ('1', '0001'), ('2', '0010'), ('3', '0011'),
        ('4', '0100'), ('5', '0101'), ('6', '0110'), ('7', '0111'),
        ('8', '1000'), ('9', '1001'), ('A', '1010'), ('B', '1011'),
        ('C', '1100'), ('D', '1101'), ('E', '1110'), ('F', '1111'));
    var
      i: integer;
      x: string;
    begin
      Result:='';
      // Iterate hex string
      for x in aHex do
        // For each hex char find binary result in cBinArray
        for i := Low(cBinArray) to High(cBinArray) do
          if cBinArray[i, 0] = x then
          begin
            // Concatenate binary results
            Result := Result + cBinArray[i, 1];
            Break;
          end;
    end;

Note: it expects a valid hex string as input (0..9 and A..F chars), so if you need to validate the input or convert it to upper case (a..f -> A..F), add the necessary checks.
Guest Posted November 19, 2020 (edited)

NOTE: RAD Studio 10.3.3 already has this function in the "System.Classes.pas" unit:

    function HexToBin(Text: PWideChar; Buffer: PAnsiChar; BufSize: Integer): Integer; overload;
    function HexToBin(Text: PAnsiChar; Buffer: PAnsiChar; BufSize: Integer): Integer; overload;
    function HexToBin(Text: PWideChar; var Buffer; BufSize: Integer): Integer; overload; inline;
    function HexToBin(Text: PAnsiChar; var Buffer; BufSize: Integer): Integer; overload; inline;
    function HexToBin(Text: PWideChar; Buffer: Pointer; BufSize: Integer): Integer; overload; inline;
    function HexToBin(Text: PAnsiChar; Buffer: Pointer; BufSize: Integer): Integer; overload; inline;

Maybe something like this, using Mike's concept!

    function fncMyHexToBin(const lHexValue: string): string;
    const
      lHexChars: array [0 .. 15] of char = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F');
      lBinValues: array [0 .. 15] of Ansistring = ('0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111',
        '1000', '1001', '1010', '1011', '1100', '1101', '1110', '1111');
    var
      lEachHexChar: char;
    begin
      Result := '';
      //
      for lEachHexChar in lHexValue do
        try
          Result := Result + lBinValues[Pos(UpperCase(lEachHexChar), lHexChars) - 1];
        except
          // case the "char" is not found, we have a "AV"! then.... doesnt matter for us!
        end;
    end;

    procedure TForm1.Button1Click(Sender: TObject);
    begin
      Memo1.Lines.Add('Hex2Binxxxxxx = ' + Hex2Bin('zFF0r0ABu11')); // chars that not allow to Hex values, will be = ''
      Memo1.Lines.Add('fncMyHexToBin = ' + fncMyHexToBin('zFF0r0ABu11'));
      //
      Memo1.Lines.Add('Hex2Binxxxxxx = ' + Hex2Bin('z'));
      Memo1.Lines.Add('fncMyHexToBin = ' + fncMyHexToBin('z'));
    end;

hug

Edited November 19, 2020 by Guest
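One point worth illustrating about the RTL routine listed above: judging by its signature, and by its source quoted later in the thread, it packs two hex digits into one raw byte rather than producing a string of '0' and '1' characters. A minimal sketch, assuming the `var Buffer` overload shown above is the one the compiler picks; `HexToBytesViaRtl` is a hypothetical name:

```pascal
uses
  System.SysUtils, System.Classes;

// Hypothetical wrapper: converts hex text to raw bytes with the RTL routine.
// 'F0' becomes the single byte $F0, not the text '11110000'.
function HexToBytesViaRtl(const HexValue: string): TBytes;
var
  Converted: Integer;
begin
  // Two hex digits make one byte; a trailing odd digit is ignored here.
  SetLength(Result, Length(HexValue) div 2);
  if Length(Result) = 0 then
    Exit;
  // HexToBin returns the number of bytes it actually wrote and stops
  // at the first character that is not a hex digit.
  Converted := HexToBin(PChar(HexValue), Result[0], Length(Result));
  SetLength(Result, Converted);
end;
```

So the RTL function answers a slightly different question from the one asked: its output would still need one of the digit-to-text conversions in this thread to become a binary string.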
David Heffernan 2345 Posted November 19, 2020

3 hours ago, emailx45 said:

    for lEachHexChar in lHexValue do
      try
        Result := Result + lBinValues[Pos(UpperCase(lEachHexChar), lHexChars) - 1];
      except
        // case the "char" is not found, we have a "AV"! then.... doesnt matter for us!
      end;

Ugh. You can't rely on getting an AV. Don't ever write code like this.
David Heffernan 2345 Posted November 19, 2020

I'd probably write it something like this:

    function HexToBin(const HexValue: string): string;
    const
      BinaryValues: array [0..15] of string = (
        '0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111',
        '1000', '1001', '1010', '1011', '1100', '1101', '1110', '1111'
      );
    var
      HexDigit: Char;
      HexDigitValue: Integer;
      Ptr: PChar;
    begin
      SetLength(Result, Length(HexValue) * 4);
      Ptr := Pointer(Result);
      for HexDigit in HexValue do
      begin
        case HexDigit of
          '0'..'9':
            HexDigitValue := Ord(HexDigit) - Ord('0');
          'a'..'f':
            HexDigitValue := 10 + Ord(HexDigit) - Ord('a');
          'A'..'F':
            HexDigitValue := 10 + Ord(HexDigit) - Ord('A');
        else
          raise EConvertError.CreateFmt('Invalid hex digit ''%s'' found in ''%s''', [HexDigit, HexValue]);
        end;
        Move(Pointer(BinaryValues[HexDigitValue])^, Ptr^, 4 * SizeOf(Char));
        Inc(Ptr, 4);
      end;
    end;

Some notes:

A case statement makes this quite readable in my view.

You really don't want to be wasting time using Pos to search within a string. You can get the value directly with arithmetic.

I prefer to perform just a single allocation, rather than use repeated allocations with concatenation.

You might want to consider how to treat leading zeros. For instance how should you treat 0F, should that be 00001111 or 1111? I'd expect that both would be desirable in different situations, so an option in an extra argument to the function would be needed.
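The leading-zero question in the last note could be handled in more than one way. The sketch below does not add the extra argument the post mentions; it swaps in a separate post-processing step, so it works with the output of any converter in this thread. `StripLeadingZeros` is a hypothetical name, and keeping a single '0' for an all-zero input is an assumption about the desired behaviour:

```pascal
// Hypothetical helper: removes leading '0' characters from binary text,
// always keeping at least one digit so that '0000' becomes '0'.
function StripLeadingZeros(const BinText: string): string;
var
  First: Integer;
begin
  First := 1;
  // Stop one character before the end so an all-zero input still yields '0'.
  while (First < Length(BinText)) and (BinText[First] = '0') do
    Inc(First);
  Result := Copy(BinText, First, Length(BinText) - First + 1);
end;
```

A caller that wants the padded form simply skips the helper, which keeps the converter itself free of formatting options.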
Mike Torrettinni 198 Posted November 19, 2020 57 minutes ago, David Heffernan said: Move(Pointer(BinaryValues[HexDigitValue])^, Ptr^, 4 * SizeOf(Char)); @David Heffernan What does this do?
Alexander Elagin 143 Posted November 19, 2020 5 minutes ago, Mike Torrettinni said: @David Heffernan What does this do? Copies four characters from the BinaryValues constant array item at index HexDigitValue to the location pointed by Ptr.
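To make the explanation above concrete, here is a minimal standalone sketch of the same Move call outside the conversion loop. `OverwriteNibble` is a hypothetical name; the sketch assumes Src holds at least 4 characters and Dest at least (NibbleIndex + 1) * 4, and it checks neither:

```pascal
// Hypothetical demo: returns a copy of Dest in which the four characters at
// nibble position NibbleIndex (0-based) are overwritten with the first four
// characters of Src.
function OverwriteNibble(const Dest, Src: string; NibbleIndex: Integer): string;
var
  Ptr: PChar;
begin
  Result := Dest;
  // Make sure Result owns its own buffer before writing through a raw pointer.
  UniqueString(Result);
  Ptr := PChar(Pointer(Result)) + NibbleIndex * 4;
  // Move counts bytes, not characters, hence the SizeOf(Char) factor
  // (2 bytes per Char in Unicode Delphi).
  Move(Pointer(Src)^, Ptr^, 4 * SizeOf(Char));
end;
```

For example, `OverwriteNibble('00000000', '1111', 1)` gives '00001111'. The SizeOf(Char) factor is the detail that is easiest to get wrong when the same code is moved between Ansi and Unicode strings.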
Mike Torrettinni 198 Posted November 19, 2020 3 minutes ago, Alexander Elagin said: Copies four characters from the BinaryValues constant array item at index HexDigitValue to the location pointed by Ptr. Aha, pretty neat trick. Thanks!
Guest Posted November 19, 2020

You can also optimize David's Move by replacing it with this:

    PUInt64(Ptr)^ := PUInt64(BinaryValues[HexDigitValue])^; // move 4 chars in Unicode
Mahdi Safsafi 225 Posted November 19, 2020

@David Heffernan A few remarks about your code, if you don't mind:

1- It's pointless to use string when the characters are fixed in size ... Simply use a static array of X char.

2- It's also pointless to calculate the index when you have already used a case ... Simply declare your array using a char range. In your case, the compiler generated additional instructions to compute the index.

    function HexToBin2(const HexValue: string): string;
    type
      TChar4 = array [0 .. 3] of Char;
      PChar4 = ^TChar4;
    const
      Table1: array ['0' .. '9'] of TChar4 = ('0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111', '1000', '1001');
      Table2: array ['a' .. 'f'] of TChar4 = ('1010', '1011', '1100', '1101', '1110', '1111');
    var
      HexDigit: Char;
      P: PChar4;
    begin
      SetLength(Result, Length(HexValue) * 4);
      P := PChar4(Result);
      for HexDigit in HexValue do
      begin
        case HexDigit of
          '0' .. '9':
            P^ := Table1[HexDigit];
          'a' .. 'f':
            P^ := Table2[HexDigit];
          'A' .. 'F':
            P^ := Table2[Chr(Ord(HexDigit) xor $20)];
        else
          raise EConvertError.CreateFmt('Invalid hex digit ''%s'' found in ''%s''', [HexDigit, HexValue]);
        end;
        Inc(P);
      end;
    end;
David Heffernan 2345 Posted November 19, 2020 Holding the nibble binary text in a fixed length array is rather nice, I approve of that. Good stuff.
Guest Posted November 19, 2020 (edited)

9 hours ago, David Heffernan said: You might want to consider how to treat leading zeros. For instance how should you treat 0F, should that be 00001111 or 1111?

Where is the problem with "0F" or just "F" in my function or in Mike's?

9 hours ago, David Heffernan said: You really don't want to be wasting time using Pos to search within a string. You can get the value directly with arithmetic.

Can you measure the time lost? Look at the size (in code) of Embarcadero's function in RAD 10.3.3 Arch! Is this readable?

    function HexToBin(Text: PWideChar; Buffer: PAnsiChar; BufSize: Integer): Integer;
    var
      I: Integer;
      b1, b2: Byte;
    begin
      I := BufSize;
      while I > 0 do
      begin
        if (Ord(Text[0]) > 255) or (Ord(Text[1]) > 255) then
          Break;
        b1 := H2BConvert[Ord(Text[0])];
        b2 := H2BConvert[Ord(Text[1])];
        if (b1 = $FF) or (b2 = $FF) then
          Break;
        Buffer[0] := AnsiChar((b1 shl 4) + b2);
        Inc(Buffer);
        Inc(Text, 2);
        Dec(I);
      end;
      Result := BufSize - I;
    end;

Edited November 19, 2020 by Guest
Guest Posted November 19, 2020 (edited)

9 hours ago, David Heffernan said: Don't ever write code like this.

Sorry! Don't ever answer like this! The world is round, which is why the sun lights only one side at a time! That lets others see how beautiful the moonlight is!

Exception treated:

    function fncMyHexToBin(const lHexValue: string): string;
    const
      lHexChars: array [0 .. 15] of char = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F');
      lBinValues: array [0 .. 15] of Ansistring = ('0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111',
        '1000', '1001', '1010', '1011', '1100', '1101', '1110', '1111');
    var
      lEachHexChar: char;
    begin
      Result := '';
      //
      for lEachHexChar in lHexValue do
        try
          Result := Result + lBinValues[Pos(UpperCase(lEachHexChar), lHexChars) - 1];
        except
          // case the "char" is not found, we have a "AV"! then.... doesnt matter for us!
          // If Embarcadero use... I can too!
        end;
    end;

    procedure TForm1.Button1Click(Sender: TObject);
    begin
      try
        Memo1.Lines.Add('Hex2Binxxxxxx = ' + Hex2Bin('zFF0r0ABu11')); // chars that not allow to Hex values, will be = ''
        Memo1.Lines.Add('fncMyHexToBin = ' + fncMyHexToBin('zFF0r0ABu11'));
        //
        Memo1.Lines.Add('Hex2Binxxxxxx = ' + Hex2Bin('0F'));
        Memo1.Lines.Add('fncMyHexToBin = ' + fncMyHexToBin('0F'));
        //
        Memo1.Lines.Add('Hex2Binxxxxxx = ' + Hex2Bin('F'));
        Memo1.Lines.Add('fncMyHexToBin = ' + fncMyHexToBin('F'));
      except
        on E: Exception do
          showMessage('Exception dont treated: ' + sLineBreak + E.ClassName + sLineBreak + E.Message)
      end;
    end;

Exception dont treated:

hug

Edited November 19, 2020 by Guest
David Heffernan 2345 Posted November 19, 2020 No guarantee that an out of bounds array access leads to an exception. You have just been unlucky that you've seen one every time you ran your code. Once again, nobody should ever write code like that.
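One way to keep the Pos-based lookup without depending on an out-of-bounds access is to test the index before using it. The following is only a sketch under that assumption; `HexToBinChecked` is a hypothetical name, and raising EConvertError for a bad digit follows the convention used in the arithmetic version earlier in the thread:

```pascal
uses
  System.SysUtils;

// Hypothetical variant of the Pos-based lookup that validates the index itself.
function HexToBinChecked(const HexValue: string): string;
const
  HexChars = '0123456789ABCDEF';
  // Indexed from 1 so that the result of Pos can be used directly.
  BinValues: array [1 .. 16] of string = (
    '0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111',
    '1000', '1001', '1010', '1011', '1100', '1101', '1110', '1111');
var
  C: Char;
  Index: Integer;
begin
  Result := '';
  for C in HexValue do
  begin
    // Pos returns 0 when the character is not one of the 16 hex digits.
    Index := Pos(UpCase(C), HexChars);
    if Index = 0 then
      raise EConvertError.CreateFmt('Invalid hex digit ''%s'' found in ''%s''', [C, HexValue]);
    Result := Result + BinValues[Index];
  end;
end;
```

This keeps the slower search and concatenation, so it is about predictable behaviour on bad input, not speed.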
Guest Posted November 19, 2020

@emailx45 I am a fan of "do whatever you like", so here is a better version of yours, without try..except, and it is safe:

    lBinValues: array[0..16] of Ansistring = ('', '0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111',
      '1000', '1001', '1010', '1011', '1100', '1101', '1110', '1111');

    for lEachHexChar in lHexValue do
      Result := Result + lBinValues[Pos(UpperCase(lEachHexChar), lHexChars)];

And you can also remove UpperCase by adding the lower case letters to the table.
Mahdi Safsafi 225 Posted November 19, 2020

@Kas Ob. @emailx45 Relying on AV is potentially dangerous !

    Result := Result + lBinValues[Pos(UpperCase(lEachHexChar), lHexChars) - 1];
    {
      Result = Result + Content
      Content = Address^
      Address = @lBinValues[Pos(UpperCase(lEachHexChar), lHexChars) - 1]
      If pos fails => Address = lBinValues - 1
      Address^ => if Address points to a valid location that has a read access then no AV ! Otherwise an AV.
      Result + Content => An exception may occur if content does not point to a valid location / invalid AnsiString ... otherwise no exception (HAZARD) !
    }

So far ... you just have been lucky because the location (lBinValues - 1) does not point to a valid Location/AnsiString. Why ? because you used an array of char before lBinValues. But remember, compilers in general can optimize/insert/remove/align/reorder things ! Here is what happens when I just simulate what I explained :

    const
      Boom: AnsiString = 'Boooom!!!'; // lBinValues - 1

    function fncMyHexToBin(const lHexValue: string): string;
    // I just reordered constants
    const
      lBinValues: array [0 .. 15] of AnsiString = ('0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111',
        '1000', '1001', '1010', '1011', '1100', '1101', '1110', '1111');
      lHexChars: array [0 .. 15] of char = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F');
    var
      lEachHexChar: char;
    begin
      Result := '';
      for lEachHexChar in lHexValue do
        try
          Result := Result + lBinValues[Pos(UpperCase(lEachHexChar), lHexChars) - 1];
        except
          // case the "char" is not found, we have a "AV"! then.... doesnt matter for us!
          // If Embarcadero use... I can too!
        end;
    end;

    procedure test;
    var
      s: string;
    begin
      Writeln(Boom); // Just to prevent compiler from omitting Boom.
      s := fncMyHexToBin('123x2');
      Writeln(s); // <------ Booooooommmmmm
    end;

    begin
      test();
      readln;
    end.
Guest Posted November 19, 2020 Thank you Mahdi, and I can't agree more about letting exceptions loose and their dangerous and insecure behaviour. That is why I fixed it (for him!), and I think you missed that I removed the "-1" and added an empty string '' for the failed Pos (=0), hence made it safe.
Mahdi Safsafi 225 Posted November 19, 2020 5 minutes ago, Kas Ob. said: Thank you Mahdi, and I can't agree more about letting exceptions loose and their dangerous and insecure behaviour. That is why I fixed it (for him!), and I think you missed that I removed the "-1" and added an empty string '' for the failed Pos (=0), hence made it safe. Yep I missed that ... my bad 🙂 But it still doesn't handle invalid chars.
Guest Posted November 19, 2020

1 hour ago, Mahdi Safsafi said:

    procedure test;
    var
      s: string;
    begin
      Writeln(Boom); // Just to prevent compiler from omitting Boom.
      s := fncMyHexToBin('123x2');
      Writeln(s); // <------ Booooooommmmmm
    end;

    begin
      test();
      readln;
    end.

    const
      Boom: Ansistring = 'Boooom!!!'; // lBinValues - 1

    procedure test;
    var
      s: string;
    begin
      Form1.Memo1.Lines.Add('Boom: ' + Boom); // Just to prevent compiler from omitting Boom.
      s := fncMyHexToBin('123x2');
      Form1.Memo1.Lines.Add('s: ' + s); // <------ Booooooommmmmm
    end;

    procedure TForm1.btnTestMahdiClick(Sender: TObject);
    begin
      test;
    end;

    initialization
      ReportMemoryLeaksOnShutdown := true;

    finalization

    end.
Mike Torrettinni 198 Posted November 20, 2020 (edited)

@Mahdi Safsafi you are the winner! 🙂

I benchmarked 3 methods and results are like this (time in ms):

    fncMyHexToBin (Emailx45) = 15722 = 100%
    Hex2Bin (Mike)           =  6170 = 39%
    HexToBin2 (Mahdi)        =   925 = 5%

I assume using Pos, For loop and string concatenation kills our performance, @emailx45, while Mahdi's doesn't use any of it. Of course, credit goes also to @David Heffernan because Mahdi's function is evolution of David's example. Thanks! 🙂

Edited November 20, 2020 by Mike Torrettinni
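The post does not say how the timings were taken. For anyone wanting to run a similar comparison, here is a rough sketch of one possible harness built on TStopwatch from System.Diagnostics. The names `THexConverter`, `TimeConverter` and `NoOpConverter` are hypothetical, and the input string and iteration count are left to the caller because the original values are not given:

```pascal
uses
  System.SysUtils, System.Diagnostics;

type
  // Signature shared by the string-to-string converters in this thread.
  THexConverter = function(const HexValue: string): string;

// Placeholder converter so the harness can be tried on its own;
// substitute Hex2Bin, fncMyHexToBin or HexToBin2 when measuring.
function NoOpConverter(const HexValue: string): string;
begin
  Result := HexValue;
end;

// Hypothetical harness: calls Converter on Input the given number of times
// and returns the elapsed wall-clock time in milliseconds.
function TimeConverter(Converter: THexConverter; const Input: string;
  Iterations: Integer): Int64;
var
  Watch: TStopwatch;
  I: Integer;
  Output: string;
begin
  Watch := TStopwatch.StartNew;
  for I := 1 to Iterations do
    Output := Converter(Input);
  Watch.Stop;
  Result := Watch.ElapsedMilliseconds;
end;
```

A call such as `TimeConverter(NoOpConverter, 'F0', 1000000)` returns the elapsed milliseconds. Absolute numbers depend heavily on input length, compiler settings and hardware, so only the ratios between converters measured in the same run are meaningful.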
Lars Fosdal 1791 Posted November 20, 2020 I wonder how it would look in assembly if you filled the out buffer with zeros, then swapped out the 1's by going shr/shl on a 64bit register. I guess the potential gain would be eaten by the time required for stuffing the hex data into the register.
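The idea above can be approximated in plain Pascal before reaching for assembly. The sketch below is a simplification, not the 64-bit register variant being wondered about: it pre-fills the output with '0' characters and then sets a '1' wherever a bit of the nibble is set, using no lookup table. `HexToBinShift` is a hypothetical name:

```pascal
uses
  System.SysUtils;

// Hypothetical table-free converter: fill with '0', then flip set bits to '1'.
function HexToBinShift(const HexValue: string): string;
var
  I, Bit, Nibble: Integer;
  C: Char;
begin
  Result := StringOfChar('0', Length(HexValue) * 4);
  for I := 1 to Length(HexValue) do
  begin
    C := HexValue[I];
    case C of
      '0'..'9': Nibble := Ord(C) - Ord('0');
      'a'..'f': Nibble := 10 + Ord(C) - Ord('a');
      'A'..'F': Nibble := 10 + Ord(C) - Ord('A');
    else
      raise EConvertError.CreateFmt('Invalid hex digit ''%s'' found in ''%s''', [C, HexValue]);
    end;
    for Bit := 0 to 3 do
      // The most significant bit (value 8) lands in the first of the four chars.
      if (Nibble shr (3 - Bit)) and 1 = 1 then
        Result[(I - 1) * 4 + Bit + 1] := '1';
  end;
end;
```

Each nibble costs up to four tests and writes here, compared with one four-character copy in the table-based versions, which is in line with the doubt expressed in the post about any gain.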
Guest Posted November 21, 2020

@Lars Fosdal Special for you!

I did it differently from what you suggested. The following handles one hex char, to show the assembly, and, as you wanted, uses no lookup table. The code I used to get the decimal value from the hex char is the only trick I have in mind, by old habit I think, but searching the internet showed many tricks that can be utilized; a few of them are branch free too.

    // Convert one Hex char into 4 chars representing 4 bit Binary string of HexChar
    // HexBuffer must point to 4 Char (8 bytes) allocated space
    procedure CharToBin_ASM32(HexChar: Char; HexBuffer: PChar);
    asm
      push edi
      mov edi,edx

      // Get the decimal value of one Hex Char (= half byte)
      movzx eax, HexChar
      mov ecx, 57
      sub ecx, eax
      sar ecx, 31
      and ecx, 39
      neg ecx
      add eax, ecx
      add eax, - 48

      // Produce 4 Chars presenting 4 bits of HexChar
      xor ecx,ecx
      mov dx,$1
      test al,4
      cmovne cx,dx
      shl ecx,16
      test al,8
      cmovne cx,dx
      add ecx,$00300030
      mov [edi],ecx
      xor ecx,ecx
      test al,1
      cmovne cx,dx
      shl ecx,16
      test al,2
      cmovne cx,dx
      add ecx,$00300030
      mov [edi+4],ecx

      pop edi
    end;

Branch free! I also tried an MMX instruction approach:

    // Convert one Hex char into 4 chars representing 4 bit Binary string of HexChar
    // HexBuffer must point to 4 Char (8 bytes) allocated space
    procedure CharToBin_MMX(HexChar: Char; HexBuffer: PChar);
    const
      DEC_TO_BIN_WORD_MASK: array[0..3] of UInt16 = ($01, $02, $04, $08);
      DEC_TO_BIN_FF_TO_CHARONE_DISTANCE: array[0..3] of UInt16 = ($FFCF, $FFCF, $FFCF, $FFCF);
    asm
      // Get the decimal value of one Hex Char (= half byte)
      movzx eax, HexChar
      mov ecx, 57
      sub ecx, eax
      sar ecx, 31
      and ecx, 39
      neg ecx
      add eax, ecx
      add eax, - 48

      // Produce 4 Chars presenting 4 bits of HexChar
      movd mm0, eax
      pxor mm1, mm1
      punpckldq mm0, mm0
      packssdw mm0, mm0
      pand mm0, qword ptr[DEC_TO_BIN_WORD_MASK]
      pcmpeqw mm0, mm1
      psubw mm0, qword ptr[DEC_TO_BIN_FF_TO_CHARONE_DISTANCE]
      pshufw mm0, mm0, $1B // reverse the result
      movq qword ptr[HexBuffer], mm0
      emms
    end;

Now that is a beauty. Only "emms" will kill a big part of the performance, but that loss can be partly recovered by delaying it until a full string has been processed, which means paying its price once.

The advantage of the MMX instructions is that they can easily be modified to convert two hex chars at the same speed, while XMM will double that to 4 hex chars, and YMM the same..., and since we have plenty of registers we can also process two bytes in parallel at the same time.

Another thing is that the consts in the MMX version can be loaded into mm2 and mm3, which means any loop will be a little faster.
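The arithmetic at the top of both routines (subtract from 57, arithmetic shift right by 31, mask with 39, negate, add, subtract 48) can be written out in Pascal to see what it computes. This is a hypothetical rendering for illustration, with the name `LowerHexDigitValue` invented here; whether the compiler emits branch-free code for it is not guaranteed:

```pascal
// Hypothetical Pascal rendering of the digit-value arithmetic in the assembly.
// Assumes C is '0'..'9' or lowercase 'a'..'f'; nothing is validated.
function LowerHexDigitValue(C: Char): Integer;
begin
  // Ord(Boolean) is 0 or 1, so 39 is subtracted only for characters above '9'.
  // For 'a' (97): 97 - 39 - 48 = 10. For '9' (57): 57 - 0 - 48 = 9.
  Result := Ord(C) - 48 - 39 * Ord(Ord(C) > 57);
end;
```

Working the same arithmetic through for an uppercase digit shows a limitation: 'A' (65) gives 65 - 39 - 48 = -22, so the trick as written handles only lowercase a..f, and input would need to be lower-cased first.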
Lars Fosdal 1791 Posted November 21, 2020 Nice! My ASM knowledge predates MMX, so this was a learning experience 🙂 How does it measure up speedwise to the others?
David Heffernan 2345 Posted November 21, 2020 16 minutes ago, Lars Fosdal said: How does it measure up speedwise to the others? How do you think it will compare against a lookup table? How would you expect computing an answer at runtime to compare to computing the answer before compile time?