TNetHTTPRequest Unicode Mapping Issue

egnew · September 15, 2024

I am converting all my internet support code to use the native TNetHTTPClient and TNetHTTPRequest components. I am getting the exception "No mapping for the Unicode character exists in the target multi-byte code page" for some web pages I am downloading using the function shown below.

Here is an instance where the exception occurs:

GetText('https://www.google.com/index.html');

I assume this is an encoding issue. What is the best way to handle this to get my string result?

function TIndigoHttp.GetText(const p_Url: String): String;
var
  v_Response: IHTTPResponse;
begin
  f_Error := ''; // Used by OnRequestError
  try
    v_Response := f_NetHTTPRequest.Get(p_Url);
    f_StatusCode := v_Response.StatusCode;
    f_StatusText := v_Response.StatusText;
    Result := v_Response.ContentAsString;
  except
    on E: Exception do
    begin
      Result := '';
      // v_Response is still nil if Get itself raised, so guard before reading it.
      if Assigned(v_Response) then
      begin
        f_StatusCode := -1 * v_Response.StatusCode;
        f_StatusText := E.Message + ' [' + v_Response.StatusText + ']';
      end
      else
      begin
        f_StatusCode := -1; // No response, so there is no status code to negate
        f_StatusText := E.Message;
      end;
    end;
  end;
end;

ertank · September 15, 2024

Hi,

I do not see any problem that may raise such an error in the shared code.

You might want to check the other events assigned to f_NetHTTPRequest. The exception may be raised in one of them.

If you are sure that TIndigoHttp.GetText() is where the error occurs, then which line is it?

What is the code page of the computer you are testing on?

BTW, your request might complete without an exception, yet the response received might still be an error.

I would check whether "f_StatusCode" is in the successful response range. In my own code I check it to be ">= 200" and "<= 299".

egnew · September 15, 2024

The status is 200 - OK as there is not a problem fetching the webpage. The exception occurs during the call to ContentAsString when TEncoding.GetString is executed.

function TEncoding.GetString(const Bytes: TBytes; ByteIndex, ByteCount: Integer): string;
var
  Len: Integer;
begin
  if (Length(Bytes) = 0) and (ByteCount <> 0) then
    raise EEncodingError.CreateRes(@SInvalidSourceArray);
  if ByteIndex < 0 then
    raise EEncodingError.CreateResFmt(@SByteIndexOutOfBounds, [ByteIndex]);
  if ByteCount < 0 then
    raise EEncodingError.CreateResFmt(@SInvalidCharCount, [ByteCount]);
  if (Length(Bytes) - ByteIndex) < ByteCount then
    raise EEncodingError.CreateResFmt(@SInvalidCharCount, [ByteCount]);

  Len := GetCharCount(Bytes, ByteIndex, ByteCount);
  if (ByteCount > 0) and (Len = 0) then
    raise EEncodingError.CreateRes(@SNoMappingForUnicodeCharacter);
  SetLength(Result, Len);
  GetChars(@Bytes[ByteIndex], ByteCount, PChar(Result), Len);
end;

The value of Len is zero, which causes the EEncodingError exception. As originally stated, I suspect the problem is related to encoding. The question is how to resolve the issue with the native HTTP components. I have no problem using Indy, as it seems to handle the necessary details on its own.

Thanks, Sidney

ertank · September 15, 2024

The code below works for me without any exception. I see the "All good" message, and debugging shows the data is actually in the LResult variable.

uses
  System.Net.HttpClient,
  System.Net.HttpClientComponent;

procedure TForm1.Button1Click(Sender: TObject);
var
  LHttp: TNetHTTPClient;
  LResponse: IHTTPResponse;
  LResult: string;
begin
  LHttp := TNetHTTPClient.Create(Self);
  try
    try
      LResponse := LHttp.Get('https://www.google.com/index.html');
    except
      on E: Exception do
      begin
        ShowMessage('Cannot communicate' + sLineBreak + E.Message);
        Exit();
      end;
    end;

    if (LResponse.StatusCode < 200) or (LResponse.StatusCode > 299) then
    begin
      ShowMessage('Error status received');
      Exit();
    end;

    LResult := LResponse.ContentAsString();
    ShowMessage('All good');
  finally
    LHttp.Free();
  end;
end;

You may want to test this code in a new project.

If you do not get the exception for Google but do get it for some other URL, you need to be sure that you are not downloading something binary.

There are binary contents that can be retrieved using GET and these cannot be simply read as string.

For example, I download my application update setup executables using GET into a TStream.
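
As a rough sketch of that approach (not ertank's actual code), a GET can write straight into a file stream instead of going through ContentAsString. The routine name DownloadToFile and its parameters are hypothetical, and the fragment is meant to live inside a unit:

```pascal
uses
  System.SysUtils,
  System.Classes,
  System.Net.HttpClient,
  System.Net.HttpClientComponent;

// Hypothetical helper: saves the raw response body to AFileName
// without ever converting it to a string.
procedure DownloadToFile(const AUrl, AFileName: string);
var
  LHttp: TNetHTTPClient;
  LStream: TFileStream;
  LResponse: IHTTPResponse;
begin
  LHttp := TNetHTTPClient.Create(nil);
  try
    LStream := TFileStream.Create(AFileName, fmCreate);
    try
      // The response body is written into LStream as-is.
      LResponse := LHttp.Get(AUrl, LStream);
      if (LResponse.StatusCode < 200) or (LResponse.StatusCode > 299) then
        raise Exception.CreateFmt('HTTP %d %s',
          [LResponse.StatusCode, LResponse.StatusText]);
      // Release the response before freeing the stream it wrote to.
      LResponse := nil;
    finally
      LStream.Free;
    end;
  finally
    LHttp.Free;
  end;
end;
```

Because the bytes are never decoded, no encoding error can occur here, whatever the content type is.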

egnew · September 15, 2024

Thanks -- I copied your code into my program and it worked. I traced the problem to the constructor of my TIndigoHttp type. I had recently added custom headers to match those Chrome was sending when I was having an issue logging into a website. After resolving the logon issue, I forgot to remove the custom headers.

The "No mapping for the Unicode character exists in the target multi-byte code page" error occurs when the two custom headers shown below are added to the request. The error does not occur when I comment out either Add. I am not sure why there is a conflict. Google's encoding is "br". If you want to observe the issue, copy the custom header code below to immediately after you create LHttp. Comment out either Add and the mapping error will not occur.

Do you have an idea why the headers cause the error?

Thanks for your help, Sidney

  lhttp.CustHeaders.Clear;
  lHttp.CustHeaders.Add('Accept-Encoding','gzip, deflate, br, zstd');
  lHttp.CustHeaders.Add('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36');

Remy Lebeau · September 16, 2024

5 hours ago, egnew said:

Do you have an idea why the headers cause the error?

Because you are explicitly giving the server permission to send a compressed response, even though by default IHTTPResponse DOES NOT support compressed responses. So, you are likely getting a compressed response in binary format, but IHTTPResponse does not decompress it, and then you try to convert the compressed data into a String, which fails.

You need to use the TNetHTTPClient.AutomaticDecompression property to enable handling of "gzip" and "deflate" compressions.

In general, DO NOT manipulate the "Accept-Encoding" header manually, unless you are prepared to decode the response manually (i.e., by receiving it as a TStream and decompressing it yourself). Just because a BROWSER sends that header (and browsers do support compression) does not mean YOU should send it.

TNetHTTPClient will manage the "Accept-Encoding" header for you. It will allow "gzip" and "deflate" compression if the AutomaticDecompression property enables them.

Similarly, Indy's TIdHTTP does the same thing. It supports "gzip" and "deflate" compressions, and will set the "Accept-Encoding" accordingly, if you have a Compressor assigned to it.
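
A minimal sketch of the AutomaticDecompression approach, assuming a Delphi version whose TNetHTTPClient exposes that property. The function name FetchText is hypothetical, and the browser User-Agent is set through the UserAgent property rather than a custom header:

```pascal
uses
  System.SysUtils,
  System.Net.HttpClient,
  System.Net.HttpClientComponent;

// Hypothetical helper: fetches a page as text, letting the client
// negotiate and undo gzip/deflate compression itself.
function FetchText(const AUrl: string): string;
var
  LHttp: TNetHTTPClient;
  LResponse: IHTTPResponse;
begin
  LHttp := TNetHTTPClient.Create(nil);
  try
    // No manual "Accept-Encoding" header: the client manages it
    // based on the methods enabled here.
    LHttp.AutomaticDecompression :=
      [THTTPCompressionMethod.Deflate, THTTPCompressionMethod.GZip];
    LHttp.UserAgent :=
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36';
    LResponse := LHttp.Get(AUrl);
    // The body has already been decompressed, so decoding to a string is safe.
    Result := LResponse.ContentAsString;
  finally
    LHttp.Free;
  end;
end;
```

Calling FetchText('https://www.google.com/index.html') should then return the page text instead of raising the mapping error, provided the server only uses a compression method that was enabled.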
