Angus Robertson

Internationalized Domain Names (IDN)


Does anyone have a requirement to support IDNs in ICS, or have any Punycode conversion functions to share with ICS?

 

Someone just filled in a form on my web site with an email domain containing an accented character, the first time I've seen that.  My ICS email client can handle the MIME encoded email header for display, but is unable to reply to the email because it has no IDN handling.  I guess Punycode conversion would fix that, but I'm not sure which email header fields need which encoding.

 

Or indeed whether we should try and handle IDN at a lower level in ICS. 

 

Guess I should register an accented domain name for testing.

 

Angus

14 hours ago, Angus Robertson said:

Does anyone have a requirement to support IDNs in ICS, or have any Punycode conversion functions to share with ICS?

Indy has some very limited support for IDN/Punycode, but it is at the socket layer when resolving a hostname to an IP, not at the SMTP/MIME layer.


Do you ever get questions about IDN support in Indy?  I don't recall it ever being mentioned in the ICS mailing list.  No point adding IDN support if no-one is going to use it, except the one Delphi developer that emailed me, and not about ICS.

 

Angus

 

12 hours ago, Angus Robertson said:

Do you ever get questions about IDN support in Indy

No.  But that does not mean IDN is not important to support.  Granted, the majority of the Internet does not use IDNs, but some parts do, and you never know what your users are going to want to access.

 


IDN is pretty common in Norway, due to our beloved æ ø and å characters.


Thanks everyone, I'll add IDN support to TWSocket.DnsLookup and TWSocket.ReverseDnsLookup, which fortunately have string arguments.  I'll set up a sub-domain with accents for testing.  Then think about email.

 

If anyone has any real IDN URLs  they would be useful for testing. 

 

Angus

 

1 hour ago, Angus Robertson said:

If anyone has any real IDN URLs  they would be useful for testing. 

www.háčkyčárky.cz by CZ Domain Registry.


Thanks for the URLs everyone.  I've also set up www.éxàmplê.ftptest.co.uk although it's not working yet, and will make sure it works with SSL as a server as well.

 

Angus


I have looked into this for our SMTP (server and client) applications, but have done nothing about it yet.  It seems to me that implementing IDN is the easier task, because Microsoft provides conversion functions in all the OS versions we need to support (https://docs.microsoft.com/en-us/windows/win32/intl/handling-internationalized-domain-names--idns).  What is much harder is getting the SMTP headers correct, as you mentioned at the start of this thread, which requires clients, servers and MTAs to support the necessary SMTP extensions.  I found a useful brief summary, with links to all the many RFCs, at https://en.wikipedia.org/wiki/International_email but I suspect there is a lot of work needed to get it all working.

 

We have email server and client users in Japan, the Middle East, and other places where this capability could be useful, but nobody has yet asked for it.


ICS already has the inline MIME processing used for email headers, since international names are far more common than international domains, but this needs to be done at application level; the SMTP and POP3 components don't do it internally.
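To illustrate the split between the two encodings, here is a minimal sketch in Python rather than ICS Pascal. It assumes the display name goes into a MIME encoded-word while the domain of the address is converted to Punycode; `ascii_mailbox` is a name made up for this example, and Python's built-in `idna` codec follows the older IDNA 2003 rules. A non-ASCII local part (left of the @) is not handled here; that would need the SMTPUTF8 extension.

```python
from email.header import decode_header
from email.utils import formataddr, parseaddr


def ascii_mailbox(display_name: str, address: str) -> str:
    """Build an ASCII-only mailbox value for a From:/To: style header.

    The display name is MIME encoded by formataddr() when it is not ASCII;
    the domain is converted to A-Labels. The local part must already be ASCII.
    """
    local, _, domain = address.rpartition("@")
    ascii_domain = domain.encode("idna").decode("ascii")
    return formataddr((display_name, f"{local}@{ascii_domain}"))


header_value = ascii_mailbox("Jörg Müller", "info@мособлеирц.рф")
print(header_value)

# Reading it back for display: split name and address, then undo both encodings.
name, addr = parseaddr(header_value)
raw, charset = decode_header(name)[0]
print(raw.decode(charset), addr.rpartition("@")[2].encode("ascii").decode("idna"))
```

The point of the sketch is only that the name and the domain need different treatment; which header fields an SMTP component should convert automatically is still an open question here.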

 

Rather than use the Windows IDN APIs, I'm trying some Pascal code first, so it is multi-platform.

 

Angus


I took a quick look at the IDN format and wonder what drugs its creators were taking.

Quote

Not a single ASCII character in that name, converts to xn--90aijkdmaud0d.xn--p1ai with no single hyphens, although conversion back to Unicode does not like it. 

 

IDN is now working with forward DNS lookups using the DnsLookup function.  Now I need to look at other places that don't use that function, like ping.

 

Angus


Testing IDN with Windows 2019 DNS Server is proving problematic.  I've been able to set up A records for scrúdú and xn--scrd-srab.  I would have expected DNS Manager to convert an accented domain to Punycode ASCII, but it actually stores scr\303\272d\303\272 in the file.  It also stores 16-bit characters.  I had to convert the Punycode version manually.

 

Old versions of ICS are actually able to look up scrúdú.ftptest.co.uk without any Punycode, so it seems Windows uses the full 8 bits for DNS queries.  Internally, we convert Unicode to ANSI before the query, so this will only work for code pages that match the DNS server.
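A small sketch in Python (not ICS code) of why the 8-bit route depends on the code page. It assumes the octal escapes seen in the zone file are simply the UTF-8 bytes of the name, and uses Windows code pages 1252 and 1251 purely as examples of a Western and a Cyrillic ANSI code page.

```python
name = "scrúdú"

# UTF-8 bytes of the name, written with octal escapes for anything above 7 bits.
utf8_bytes = name.encode("utf-8")
zone_file_form = "".join(
    chr(b) if b < 0x80 else f"\\{b:03o}" for b in utf8_bytes
)

# The same name as 8-bit ANSI in a Western European code page.
ansi_bytes = name.encode("cp1252")

# The same ANSI bytes interpreted on a machine using a Cyrillic code page.
misread = ansi_bytes.decode("cp1251")

print(zone_file_form)  # octal-escaped UTF-8 form
print(ansi_bytes)      # one byte per accented character
print(misread)         # a different name entirely
```

The octal form matches what DNS Manager wrote to the file, while the ANSI bytes are a different byte sequence that only means scrúdú under one code page, which is the argument for converting to Punycode ASCII instead.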

 

The real question is whether this DNS behaviour is by design or ignorance; perhaps internal networks are allowed full 8-bit local names?  I've Googled a lot, but can not find any design recommendations for IDN and Windows DNS Server, indeed no mentions at all.

 

I was planning on changing ICS to automatically convert IDNs in Unicode to Punycode ASCII, but this would break existing internal applications using 8-bit ANSI, so perhaps it needs to be optional, although I'd prefer ASCII to become the default.

 

Also, I think the HTTP client and server need changing, because the Host: header needs to be Punycode ASCII, and probably Location: as well, but I'm not sure about subdirectories.  Arguments are always character converted, but does that apply to directories as well?

 

Angus

 

 


IMHO all ANSI uses should be discarded, even if they are convenient in some cases. Too many compatibility issues in a world that speaks with letters other than A-Z.


It seems those using non-English domains hedge their bets on their sites:

 

Handshake done, error #0 - SSL Connected OK with TLSv1.2, cipher ECDHE-RSA-AES128-GCM-SHA256, key auth RSA, key exchange ECDH, encryption AESGCM(128), message auth AEAD
! VerifyResult: ok, Peer domain: мособлеирц.рф
3 Certificate(s) in the verify chain.
#3 Issued to (CN): mosobleirc.ru
Alt Domains (SAN): mosobleirc.ru, www.mosobleirc.ru, www.мособлеирц.рф, www.новый.мособлеирц.рф, мособлеирц.рф, новый.мособлеирц.рф
Issued by (CN): Let's Encrypt Authority X3, (O): Let's Encrypt
Expires: 11/05/2020 18:43:06, Signature: sha256WithRSAEncryption
 

Does anyone have any working Far East web sites with IDNs (Chinese, Japanese, etc.)?  Those I've tried are all dead.

 

Angus

Quote

That works OK, but redirects to an Alibaba site using western domains. 

 

I did try to register an accented test domain earlier with 123-Reg.  My attempt for co.uk was declined, but they registered a com OK, except it is missing all the accented characters; they won't accept xn-- names, only Unicode.  I'll try an eu instead, they must support accents.

 

Angus


SVN and the overnight zip have been updated with a lot of changes so ICS supports Internationalized Domain Names in Applications (IDNA), i.e. using accents and Unicode characters in domain names.

Host names in DNS can only contain ASCII letters (not case sensitive), digits and hyphens, so Unicode U-Labels (nodes in a domain) must be converted to A-Labels (Punycode ASCII) with an ACE (ASCII Compatible Encoding) prefix, xn--.  So www.mâgsÿstést.eu becomes www.xn--mgsstst-pwa1e4l.eu and мособлеирц.рф becomes xn--90aijkdmaud0d.xn--p1ai.
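As a rough illustration of that conversion, here is a minimal sketch in Python rather than the Pascal used in ICS. It relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules, so results could differ from ICS for some characters; the function names `to_a_label` and `to_u_label` are made up for this example.

```python
def to_a_label(domain: str) -> str:
    """Convert each Unicode label of a domain to its xn-- A-Label form.

    Labels that are already plain ASCII pass through unchanged.
    """
    return domain.encode("idna").decode("ascii")


def to_u_label(domain: str) -> str:
    """Convert any xn-- A-Labels in an ASCII domain back to Unicode."""
    return domain.encode("ascii").decode("idna")


for name in ("www.mâgsÿstést.eu", "мособлеирц.рф", "www.example.com"):
    ascii_name = to_a_label(name)
    print(name, "->", ascii_name, "->", to_u_label(ascii_name))
```

Note the conversion is done per label, so in a mixed name only the labels that contain non-ASCII characters gain the xn-- prefix.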

ICS mostly does the Unicode to A-Label conversion just before looking up an IP address for a domain name (in DnsLookup) and converts back from A-Label to Unicode when doing a reverse lookup (in ReverseDnsLookup).  HTTP headers also contain A-Labels for the Host: header and the host part of URLs for proxy or relocation, but Unicode paths should be UrlEncoded by the application, as they are now.  I've not looked at SMTP yet.
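The host versus path split for URLs can be sketched like this, again in Python as an illustration and not the ICS implementation. `to_ascii_url` is a hypothetical helper; it assumes the URL has a host, ignores any user name or password in the URL, leaves the query untouched, and uses Python's IDNA 2003 `idna` codec for the host.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return an ASCII-only form of a URL.

    The host is converted to A-Labels (Punycode); the path is percent-encoded
    as UTF-8. Existing %XX escapes in the path are left alone.
    """
    parts = urlsplit(url)
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


url = to_ascii_url("https://мособлеирц.рф:8443/päth/ü?x=1")
print(url)
print("Host:", urlsplit(url).netloc)
```

The two halves use different encodings because they are read by different things: the host goes to DNS, which wants A-Labels, while the path goes to the web server, which expects percent-encoding.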

The HTTP client and server, Ping, ICMP and DNS Query components all now support Unicode domain names, generally without application changes unless you want to display the A-Label name looked-up (PunycodeHost property).  DNS Query does require application changes due to all methods and properties previously being AnsiString, now String.

SSL/TLS now fully supports Unicode domain names, including displaying the Unicode version of the domain name (except for Subject and Issuer lines), and X509 automatic certificate ordering from Let's Encrypt fully supports Unicode domain names.  Certificate files are saved with Unicode names, not A-Labels.
 

For server testing I registered an eu domain which is live on one of my web sites at https://www.mâgsÿstést.eu/  and https://scrúdú.mâgsÿstést.eu/ which have ICS ordered SSL certificates.  I do have DNS for Cyrillic and Far East domains, but this web server is built with Delphi 2007 so no full Unicode. 

 

Angus

 

 

 
