chkaufmann

ExtractHeaderFields with special characters


Hi,

 

I use ExtractHeaderFields() from Web.HttpApp when I parse a POST upload.

 

With the following value for "Content" (containing special German characters) this function fails:

'form-data; name="File1"; filename="Test1MitäÄ-Umlaut.pdf"'

 

I get this error:

System.SysUtils    33477 TEncoding.GetString
System.NetEncoding  1007 TURLEncoding.Decode
Web.HTTPApp         2108 ExtractHeaderFields

Now I'm not sure if the input is wrong or if I have to use a different function to parse the content of this header (Content-Disposition:).

 

Thanks for any help.

 

Christian

5 hours ago, chkaufmann said:

With the following value for "Content" (containing special German characters) this function fails:


'form-data; name="File1"; filename="Test1MitäÄ-Umlaut.pdf"'

That header is malformed.  HTTP headers simply can't have un-encoded non-ASCII characters like that.  There are competing standards for how they need to be encoded, though.  There is RFC 2183, RFC 2047, RFC 7578, RFC 8187, HTML5, etc.

5 hours ago, chkaufmann said:

I get this error:


System.SysUtils    33477 TEncoding.GetString
System.NetEncoding  1007 TURLEncoding.Decode
Web.HTTPApp         2108 ExtractHeaderFields

TURLEncoding decodes %HH sequences into bytes, and then charset-decodes those bytes into Unicode.  IIRC, it expects the bytes to be UTF-8 encoded by default.
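To illustrate that behaviour, here is a minimal console sketch, assuming a Delphi version that ships System.NetEncoding. The program name is made up for illustration. The first call decodes a percent-encoded UTF-8 filename; the second feeds in the raw, un-encoded filename from the question, and whether that raises an exception or passes the text through depends on the RTL version, so the sketch only reports what happens.

```pascal
program UrlDecodeSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.NetEncoding;
var
  Decoded: string;
begin
  // %C3%A4 and %C3%84 are the UTF-8 byte sequences for "ä" and "Ä".
  Decoded := TNetEncoding.URL.Decode('Test1Mit%C3%A4%C3%84-Umlaut.pdf');
  if Decoded = 'Test1Mit'#$00E4#$00C4'-Umlaut.pdf' then
    Writeln('percent-encoded input decoded as expected')
  else
    Writeln('unexpected result for percent-encoded input');

  // Raw non-ASCII input, as in the malformed header.
  // Assumption: the outcome is version-dependent, so just report it.
  try
    TNetEncoding.URL.Decode('Test1Mit'#$00E4#$00C4'-Umlaut.pdf');
    Writeln('raw input was accepted');
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
```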


Ok, the request that comes with this malformed Content-Disposition is created by another Delphi application where I use Indy components. The code looks like this:

mPartStream    := TIdMultiPartFormDataStream.Create;
postDataStream := mPartStream;
FHttp.Request.ContentType := mPartStream.RequestContentType;
for ix := 0 to FPostNames.Count -1 do begin
  if FPostFiles[ix].IsNull
    then mPartStream.AddFormField(FPostNames[ix], FPostValues[ix], 'UTF-8').ContentTransfer := '8bit'
    else mPartStream.AddFile(FPostNames[ix], FPostFiles[ix].PathName, FPostContentTypes[ix]);
end;

FPostFiles[ix].PathName is the Windows path of a file. Should I encode it on my side? Or do I have to set another parameter to ensure correct encoding?

 

Christian


Which version of Delphi is that other app written in, and what version of Indy is it using?

 

TIdMultipartFormDataStream encodes non-ASCII characters in all field names and filenames according to RFC 2047, which your earlier example is NOT encoded as, so I doubt the example is coming from Indy, unless maybe it is a really old version.

 

On Windows, depending on what the OS system language is set to, TIdMultipartFormDataStream uses either UTF-8 or the OS language as the charset to encode characters to bytes.  And then depending on the charset used, it uses either Quoted-Printable or Base64 to encode those bytes in the Content-Disposition header.  These values are reflected in the HeaderCharSet and HeaderEncoding properties of each TIdFormDataField object that the TIdMultipartFormDataStream.Add(...) methods create.  Double-check what these values are actually being set to on your system, but you can also set them yourself as needed.  I would suggest using HeaderCharSet='utf-8' and HeaderEncoding='B', e.g.:

var
  mPartStream: TIdMultiPartFormDataStream;
  field: TIdFormDataField;

...

mPartStream := TIdMultiPartFormDataStream.Create;
FHttp.Request.ContentType := mPartStream.RequestContentType;
for ix := 0 to FPostNames.Count -1 do begin
  if FPostFiles[ix].IsNull then begin
    field := mPartStream.AddFormField(FPostNames[ix], FPostValues[ix], 'UTF-8');
    field.ContentTransfer := '8bit';
  end
  else begin
    field := mPartStream.AddFile(FPostNames[ix], FPostFiles[ix].PathName, FPostContentTypes[ix]);
  end;
  field.HeaderCharSet := 'UTF-8';
  field.HeaderEncoding := 'B';
end;

Though, I suspect even this will not give you the end result you are looking for with ExtractHeaderFields(), if it is really trying to url-decode fields that are not url-encoded to begin with.  HTTP does not use url-encoding in header content, so what you are experiencing really sounds like a logic bug in the HttpApp framework.  But at least this should give your code access to the stream's encoded Base64 data, which you can then decode manually to a Unicode string, such as with Indy's DecodeHeader() function in the IdCoderHeader unit.
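As a rough sketch of that manual decoding step, the console program below passes an RFC 2047 encoded-word to DecodeHeader() from IdCoderHeader. The program name and the sample value are illustrative assumptions: the Base64 payload is the UTF-8 encoding of the filename from the first post, not data captured from a real request.

```pascal
program DecodeHeaderSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}
uses
  System.SysUtils, IdCoderHeader;
var
  Encoded, Decoded: string;
begin
  // Encoded-word layout: =?charset?encoding?payload?=
  // Here: charset UTF-8, encoding B (Base64).
  Encoded := '=?UTF-8?B?VGVzdDFNaXTDpMOELVVtbGF1dC5wZGY=?=';

  // DecodeHeader turns encoded-words back into a Unicode string.
  Decoded := DecodeHeader(Encoded);

  // Compare against character codes so the check does not depend
  // on the encoding of this source file.
  if Decoded = 'Test1Mit'#$00E4#$00C4'-Umlaut.pdf' then
    Writeln('filename decoded as expected')
  else
    Writeln('unexpected result: ', Decoded);
end.
```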

Edited by Remy Lebeau


I use Delphi 10.4.2 and the Indy library that ships with the default installation.

 

I found that I can set the "Decode" parameter of ExtractHeaderFields() to False, then it works fine.
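For reference, a minimal sketch of such a call. The program name, the separator and whitespace sets, and the StripQuotes value are assumptions chosen for illustration, not taken from the original application. With Decode set to False the URL-decoding step is skipped, so the raw field values end up in the string list unchanged.

```pascal
program ExtractFieldsSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes, Web.HTTPApp;
var
  Header: string;
  Fields: TStringList;
begin
  // The Content-Disposition value from the first post; the umlauts are
  // written as character codes to stay independent of the file encoding.
  Header := 'form-data; name="File1"; filename="Test1Mit'#$00E4#$00C4'-Umlaut.pdf"';

  Fields := TStringList.Create;
  try
    // Separators = [';'], WhiteSpace = [' '] (assumed values).
    // Decode = False: do not URL-decode the values.
    // StripQuotes = True: drop the surrounding double quotes.
    ExtractHeaderFields([';'], [' '], PChar(Header), Fields, False, True);

    // Fields are stored as name=value pairs, so Values[] can look them up.
    Writeln('name:     ', Fields.Values['name']);
    Writeln('filename: ', Fields.Values['filename']);
  finally
    Fields.Free;
  end;
end.
```

If the sender is later changed to emit RFC 2047 encoded-words, the value read this way would still need a separate decoding pass.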

 

Christian
