Mark Williams

TIDMessage extract RTF body

I can't find any examples of how to do this online. Plenty of examples of how to send RTF, but not how to read it. I can't seem to find any specific unit that deals with it.

What MIME types do I need to check for: just 'text/rtf', or also 'text/enriched' and 'text/richtext'?

How do I extract the raw rtf?

And how are inline images handled? Are they accessible via the attachments list or in some other way?

Thanks.

32 minutes ago, Mark Williams said:

I can't find any examples of how to do this online.

Probably because receiving RTF emails is not common.  Formatted emails typically use HTML instead.

32 minutes ago, Mark Williams said:

Plenty of examples of how to send RTF, but not how to read it.

Assuming the RTF is in the email body itself, and not in a separate TNEF attachment (winmail.dat, etc.), it is really no different from handling HTML emails.  If the top-level TIdMessage.ContentType is RTF then read the TIdMessage.Body property, otherwise search the TIdMessage.MessageParts collection, in MIME order, looking for a TIdText object that has an RTF media type and then read its Body property.
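The same lookup can be sketched outside of Indy. The example below uses Python's standard `email` package as a stand-in for TIdMessage (it is not Indy's API). A non-multipart message is its own body, which corresponds to checking TIdMessage.ContentType and reading Body. Otherwise the parts are walked in MIME order, skipping multipart containers and anything marked as an attachment. The helper names `find_text_part` and `decoded_text` are hypothetical.

```python
from email import message_from_string
from email.message import Message
from typing import Optional


def find_text_part(msg: Message, media_type: str) -> Optional[Message]:
    """Return the part holding `media_type` text, or None if there is none."""
    # Non-multipart message: the top-level body is the only candidate
    # (the analogue of checking TIdMessage.ContentType and reading Body).
    if not msg.is_multipart():
        return msg if msg.get_content_type() == media_type else None
    # Multipart message: walk the parts depth-first, in MIME order
    # (the analogue of searching TIdMessage.MessageParts).
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart/* containers carry no text of their own
        if part.get_content_disposition() == "attachment":
            continue  # an attached .rtf file is not the message body
        if part.get_content_type() == media_type:
            return part
    return None


def decoded_text(part: Message) -> str:
    """Undo the Content-Transfer-Encoding and charset to get the raw markup."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "us-ascii"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset name in the header
        return payload.decode("latin-1")
```

For a plain `Content-Type: text/rtf` message, `find_text_part(msg, "text/rtf")` returns `msg` itself. For a multipart/alternative message, it returns the `text/rtf` part. In both cases, `decoded_text()` then yields the raw RTF markup.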

32 minutes ago, Mark Williams said:

I can't seem to find any specific unit that deals with it.

There is no *unit* specifically for handling RTF.  The exception is TNEF: in that case you would have to use the TIdCoderTNEF class to parse the TNEF attachment and extract its inner email into another TIdMessage, which you can then process as needed.

32 minutes ago, Mark Williams said:

What MIME types do I need to check for: just 'text/rtf', or also 'text/enriched' and 'text/richtext'?

RTF and "enriched text" ('text/richtext' is the predecessor to 'text/enriched') are separate formats.  It is up to you whether you want to handle them all or not.  'text/rtf' is likely to be the more common format you encounter, if any.

32 minutes ago, Mark Williams said:

How do I extract the raw rtf?

See above.

32 minutes ago, Mark Williams said:

And how are inline images handled? Are they accessible via the attachments list or in some other way?

That, I can't really answer.  I don't know how RTF emails encode images.  I think they are embedded directly inside the RTF markup itself, rather than referenced as separate attachments like in HTML emails.  But I'm not sure.

On 11/30/2020 at 5:16 PM, Remy Lebeau said:

If the top-level TIdMessage.ContentType is RTF then read the TIdMessage.Body property

I have mistakenly thought that the Body would contain plain text only. Presumably, it's also therefore possible for an html body to be contained in the body itself if the ContentType is text/html? I have yet to come across any examples of this although I can now see it would be possible.

I would prefer to deal with an html body if it exists, if not then a rtf body and finally plain text. With that in mind can you see any problems with the following approach:

  1. Check the IDMessage ContentType. If it is text/???? then that is the body type I have to work with. 
  2. If the ContentType is multipart/???? iterate through the attachments and find the one that best suits my purpose.

Is it possible for the ContentType to be multipart and the body to be stored in TIDMessage body rather than in the message parts?

On 11/30/2020 at 5:16 PM, Remy Lebeau said:

RTF and "enriched text" ('text/richtext' is the predecessor to 'text/enriched') are separate formats

I thought they were just alternate names for RTF. Having looked at the specification(s), I see not. What a pain! I have not come across either of these formats, and from what I can see online they're not used that much, but I would still like to be able to handle them if my app comes across them. I suppose the options are to display the body as plain text, tags and all, or to write a converter to HTML and handle the result the same way as I handle an HTML body. I have checked online to see if anyone has already written anything to handle the conversion and drew a blank. However, it doesn't look like it would be too difficult to write some code to convert the existing tags to their HTML equivalents, at least the most common ones.

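As a starting point for such a converter, here is a minimal sketch of a `text/enriched` to HTML conversion. It is written in Python purely for illustration, not Delphi or Indy. It follows three rules from RFC 1896 as I understand them:

- `<<` stands for a literal `<`.
- A single line break is rendered as a space, while a run of n line breaks becomes n-1 breaks.
- Text inside `<param>...</param>` is not displayed.

Only four commands (`bold`, `italic`, `underline`, `fixed`) are mapped. The `TAG_MAP` table and the function name are hypothetical. Other commands, including `<nofill>` and the justification commands, are dropped while their text is kept.

```python
import html
import re

# Hypothetical mapping of a few common text/enriched commands to HTML tags.
# Commands not listed here are dropped, but the text they enclose is kept.
TAG_MAP = {"bold": "b", "italic": "i", "underline": "u", "fixed": "code"}

# Tokens: escaped "<<", a command tag (1-60 letters, digits or hyphens),
# a line break, a run of plain text, or a stray "<".
TOKEN = re.compile(r"<<|<(/?)([A-Za-z0-9-]{1,60})>|\r?\n|[^<\r\n]+|<")


def enriched_to_html(src: str) -> str:
    out = []
    newlines = 0     # length of the current run of line breaks
    param_depth = 0  # > 0 while inside <param>...</param>

    def flush_newlines() -> None:
        nonlocal newlines
        # One line break is just a space; n line breaks in a row mean n-1 breaks.
        if newlines == 1:
            out.append(" ")
        elif newlines > 1:
            out.append("<br>" * (newlines - 1))
        newlines = 0

    for m in TOKEN.finditer(src):
        tok = m.group(0)
        if tok in ("\n", "\r\n"):
            if not param_depth:
                newlines += 1
            continue
        name = m.group(2)
        if name is not None:  # a command such as <bold> or </bold>
            name = name.lower()
            if name == "param":
                param_depth = max(param_depth + (-1 if m.group(1) else 1), 0)
                continue
            if param_depth:
                continue
            flush_newlines()
            if name in TAG_MAP:
                out.append(f"<{m.group(1)}{TAG_MAP[name]}>")
            continue
        if param_depth:
            continue  # parameter text is not displayed
        flush_newlines()
        # "<<" is an escaped literal "<"; a stray "<" is kept as text too.
        out.append(html.escape("<" if tok in ("<<", "<") else tok))
    # Trailing line breaks at the end of the body are dropped.
    return "".join(out)
```

As far as I know, `text/richtext` uses different conventions (for example `<lt>` and `<nl>`), so it would need its own rules rather than reusing this function unchanged.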
On 11/30/2020 at 5:16 PM, Remy Lebeau said:

I don't know how RTF emails encode images. 

AFAIK RTF usually inserts the raw image data into the RTF document (at least that is what WordPad does).

I have created RTF emails with Outlook for test purposes. However, when I send these to a Google account they are getting converted to HTML along the way. I don't know if this is done at the Outlook or Google end. Outlook retains the original email as RTF, and when I used Outlook's MailItem object to read it, the body is provided in RTF format and any images are handled as OLE attachments and incorporated by reference in the RTF. But this may just be how Outlook handles them internally. Without any examples to play with, I will work on the basis that the image data is embedded in the raw RTF.

28 minutes ago, Mark Williams said:

I have mistakenly thought that the Body would contain plain text only.

The TIdMessage.Body can contain any textual format.  RTF is just plain text with markup.  If there are no attachments, and the RTF is not wrapped in MIME, then it could very well end up in the top-level email body.  That is why you have to check the TIdMessage.ContentType to determine whether to look for the RTF in the TIdMessage.Body or in a TIdText.Body within the TIdMessage.MessageParts.

28 minutes ago, Mark Williams said:

Presumably, it's also therefore possible for an html body to be contained in the body itself if the ContentType is text/html?

Yes.

28 minutes ago, Mark Williams said:

I have yet to come across any examples of this although I can now see it would be possible.

See https://www.indyproject.org/2005/08/17/html-messages/ which is written for HTML, but similar logic would apply for RTF, too.

28 minutes ago, Mark Williams said:

I would prefer to deal with an html body if it exists, if not then a rtf body and finally plain text.

That is perfectly fine.  That is what MIME is good at: providing multiple representations of the same data.  You can certainly scan an email for HTML first, then scan it again for RTF, then scan it again for plain-text.  Or, just scan it once, from the last part to the first, and handle whichever supported format you encounter first (which is how MIME is meant to be handled, as in multipart/alternative the representations are supposed to be ordered from least-complex to most-complex).

28 minutes ago, Mark Williams said:

With that in mind can you see any problems with the following approach:

  1. Check the IDMessage ContentType. If it is text/???? then that is the body type I have to work with. 
  2. If the ContentType is multipart/???? iterate through the attachments and find the one that best suits my purpose.

Yes, that is what you should do.
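The two steps above, combined with the HTML, then RTF, then plain-text preference, can be sketched as follows. This again uses Python's standard `email` package as a stand-in for TIdMessage and its MessageParts, not Indy itself. The names `choose_body` and `PREFERENCE` are hypothetical. Rather than relying on part order, the sketch ranks candidates by an explicit preference list, so it picks the same part however the alternatives happen to be ordered.

```python
from email import message_from_string
from email.message import Message
from typing import Optional, Sequence

# Assumed preference order taken from the thread: HTML, then RTF, then plain text.
PREFERENCE = ("text/html", "text/rtf", "text/plain")


def choose_body(msg: Message,
                preference: Sequence[str] = PREFERENCE) -> Optional[Message]:
    # Step 1: a text/* top-level Content-Type means the message *is* the body.
    if not msg.is_multipart():
        return msg if msg.get_content_maintype() == "text" else None
    # Step 2: multipart/* means searching the parts for the best match.
    best, best_rank = None, len(preference)
    for part in msg.walk():
        if part.is_multipart() or part.get_content_disposition() == "attachment":
            continue  # skip containers and real file attachments
        ctype = part.get_content_type()
        if ctype not in preference:
            continue
        rank = preference.index(ctype)
        # Only a strictly better rank wins, so ties go to the earliest part in
        # MIME order (normally the main body in a multipart/mixed message).
        if rank < best_rank:
            best, best_rank = part, rank
    return best
```

A single-part `text/enriched` message is returned as-is by step 1, matching "that is the body type I have to work with". `part.get_payload(decode=True)` then gives the bytes of whichever body was chosen.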

28 minutes ago, Mark Williams said:

Is it possible for the ContentType to be multipart and the body to be stored in TIDMessage body rather than in the message parts?

No.

28 minutes ago, Mark Williams said:

I thought they were just alternate names for RTF.

No, they are actually distinct formats for different kinds of rich text.

28 minutes ago, Mark Williams said:

AFAIK RTF usually inserts the raw image data into the RTF document (at least that is what WordPad does).

Then you would have to actually parse the RTF to extract the image data as needed.
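Here is a rough sketch of that parsing. It assumes the common layout where each picture is a `{\pict ...}` group whose image bytes are written as hex digits; the `\binN` binary form is not handled. It is in Python for illustration only. The function name and the `PICT_TYPES` table are hypothetical, and the table covers only a few picture-type control words from the RTF specification. Nested groups inside `\pict` (such as `{\*\blipuid ...}`) are skipped so that their hex is not mistaken for image data. Word-generated RTF may also store the same picture twice (a `\shppict` copy and a `\nonshppict` fallback), so duplicates are possible.

```python
import re
from typing import List, Tuple

# Hypothetical mapping of a few RTF picture-type control words to formats.
PICT_TYPES = {"pngblip": "png", "jpegblip": "jpeg", "emfblip": "emf",
              "wmetafile": "wmf", "dibitmap": "dib", "wbitmap": "bmp"}

PICT_START = re.compile(r"\{\\pict(?![A-Za-z])")
# A control word: backslash, letters, optional signed number, optional space.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)(-?\d+)? ?")
HEX = frozenset("0123456789abcdefABCDEF")


def extract_pict_images(rtf: str) -> List[Tuple[str, bytes]]:
    """Return (format, raw bytes) for each hex-encoded picture group."""
    images = []
    for match in PICT_START.finditer(rtf):
        i, depth = match.start() + 1, 1  # just inside the opening brace
        kind, digits = "unknown", []
        while i < len(rtf) and depth > 0:
            ch = rtf[i]
            if ch == "{":
                depth += 1
                i += 1
            elif ch == "}":
                depth -= 1
                i += 1
            elif ch == "\\":
                word = CONTROL_WORD.match(rtf, i)
                if word:
                    if depth == 1 and word.group(1) in PICT_TYPES:
                        kind = PICT_TYPES[word.group(1)]
                    i = word.end()
                else:
                    i += 2  # control symbol such as \* or \{
            else:
                if depth == 1 and ch in HEX:
                    digits.append(ch)  # whitespace between hex digits is ignored
                i += 1
        hex_text = "".join(digits)
        hex_text = hex_text[: len(hex_text) // 2 * 2]  # drop a dangling nibble
        images.append((kind, bytes.fromhex(hex_text)))
    return images
```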

28 minutes ago, Mark Williams said:

I have created RTF emails with Outlook for test purposes. However, when I send these to a Google account they are getting converted to HTML along the way. I don't know if this is done at the Outlook or Google end.

I can't answer that.  I suspect it happens on the Google end, since Google has a web front-end and can't be sure that browsers can handle RTF.

28 minutes ago, Mark Williams said:

Outlook retains the original email as RTF, and when I used Outlook's MailItem object to read it, the body is provided in RTF format and any images are handled as OLE attachments and incorporated by reference in the RTF.

On the other hand, Outlook is known for storing RTF and related attachments in a TNEF file, and then sending that in an email as a single attachment.  In that case, the TIdMessage will contain an attachment of type 'application/ms-tnef', with a filename of 'winmail.dat' or 'attXXXXX.dat'.  You can then extract that attachment and parse it using TIdCoderTNEF.

14 hours ago, Remy Lebeau said:

In that case, the TIdMessage will contain an attachment of type 'application/ms-tnef', with a filename of 'winmail.dat' or 'attXXXXX.dat'.  You can then extract that attachment and parse it using TIdCoderTNEF.

I have had a look at the functions in IDCoderTNef.

I will be working with a TIDAttachment object.

A couple of questions:

  1. With a TNef attachment does the entire email get wrapped in the .dat file (including any attachments) or just the rtf? In other words, is it safe to parse out the TNef attachment into a new IDMessage object and just ignore the remainder of the original IDMessage?
  2. I can see there is a function, IsFileNameTNef, which basically checks whether the filename is in the form you mentioned above.  But if the content type is 'application/ms-tnef', is it possible that the filename could be in some other form and, if so, what?

7 hours ago, Mark Williams said:

With a TNef attachment does the entire email get wrapped in the .dat file (including any attachments) or just the rtf?

The entire email.  That is why a TNEF is parsed into a new TIdMessage.

7 hours ago, Mark Williams said:

In other words, is it safe to parse out the TNef attachment into a new IDMessage object and just ignore the remainder of the original IDMessage?

Yes.

7 hours ago, Mark Williams said:

I can see there is a function, IsFileNameTNef, which basically checks whether the filename is in the form you mentioned above.  But if the content type is 'application/ms-tnef', is it possible that the filename could be in some other form and, if so, what?

Typically no, those are the filename forms that Microsoft uses for TNEF.  But the filename is technically under the control of the sender and COULD be different if a non-Microsoft client is sending a TNEF attachment.  So you should pay more attention to the attachment's Content-Type rather than its filename.
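Following that advice, a sketch could trust the Content-Type and fall back to the filename only for generic `application/octet-stream` parts. It uses Python's standard `email` package for illustration, not Indy. `find_tnef_parts` and the `TNEF_NAME` pattern are hypothetical, with the pattern loosely modelled on the 'winmail.dat' / 'attXXXXX.dat' names mentioned above. 'application/vnd.ms-tnef' is included as the registered spelling of the TNEF media type. The decoded bytes of each match are what you would hand to a TNEF parser such as Indy's TIdCoderTNEF; the Python standard library has no TNEF parser of its own.

```python
import re
from email import message_from_string
from email.message import Message
from typing import List

# Media types used for TNEF: Outlook's "application/ms-tnef" and the
# registered "application/vnd.ms-tnef".
TNEF_TYPES = {"application/ms-tnef", "application/vnd.ms-tnef"}
# Hypothetical fallback pattern for the usual Microsoft filenames.
TNEF_NAME = re.compile(r"^(winmail|att.*)\.dat$", re.IGNORECASE)


def find_tnef_parts(msg: Message) -> List[Message]:
    """Return the parts that look like TNEF attachments, in MIME order."""
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        if ctype in TNEF_TYPES:
            found.append(part)  # the Content-Type is the reliable signal
        elif (ctype == "application/octet-stream"
              and TNEF_NAME.match(part.get_filename() or "")):
            found.append(part)  # generic type: fall back to the filename
    return found
```

`message_from_string` is imported only so that callers can parse a raw message before calling `find_tnef_parts`. `part.get_payload(decode=True)` returns the undecoded TNEF stream, with the transfer encoding already removed.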
