Angus Robertson 574 Posted February 29, 2020 Has anyone had a requirement to support IDNs in ICS, or does anyone have any punycode conversion functions to share with ICS? Someone just filled in a form on my web site with an email domain containing an accented character, the first time I've seen that. My ICS email client can handle the MIME encoded email header for display, but is unable to reply to the email because it has no IDN handling. I guess punycode conversion would fix that, but I'm not sure which email header fields need which encoding, or indeed whether we should try to handle IDN at a lower level in ICS. Guess I should register an accented domain name for testing. Angus
Vandrovnik 214 Posted February 29, 2020 On https://www.xn--hkyrky-ptac70bc.cz/ there is an e-mail address to which you can send test e-mails, and it should send you a reply if the test message was OK. (You can switch the site to English.)
Remy Lebeau 1393 Posted March 1, 2020 14 hours ago, Angus Robertson said: Has anyone have a requirement to support IDNs in ICS, or have any punycode conversion functions to share with ICS? Indy has some very limited support for IDN/Punycode, but it is at the socket layer when resolving a hostname to an IP, not at the SMTP/MIME layer.
Angus Robertson 574 Posted March 1, 2020 Do you ever get questions about IDN support in Indy? I don't recall it ever being mentioned in the ICS mailing list. There is no point adding IDN support if no-one is going to use it, except the one Delphi developer who emailed me, and that was not about ICS. Angus
Remy Lebeau 1393 Posted March 1, 2020 12 hours ago, Angus Robertson said: Do you ever get questions about IDN support in Indy No. But that does not mean IDN is not important to support. Granted, the majority of the Internet does not use IDNs, but some parts do, and you never know what your users are going to want to access.
Lars Fosdal 1792 Posted March 2, 2020 IDN is pretty common in Norway, due to our beloved æ ø and å characters.
Angus Robertson 574 Posted March 2, 2020 Thanks everyone, I'll add IDN support to TWSocket.DnsLookup and TWSocket.ReverseDnsLookup, which fortunately have string arguments. I'll set up a sub-domain with accents for testing, then think about email. If anyone has any real IDN URLs, they would be useful for testing. Angus
Vandrovnik 214 Posted March 2, 2020 1 hour ago, Angus Robertson said: If anyone has any real IDN URLs they would be useful for testing. www.háčkyčárky.cz by CZ Domain Registry.
Fr0sT.Brutal 900 Posted March 2, 2020 2 hours ago, Angus Robertson said: If anyone has any real IDN URLs they would be useful for testing. https://мособлеирц.рф/ RU Cyrillic domain
Lars Fosdal 1792 Posted March 2, 2020 https://strøm.no/ i.e. https://xn--strm-ira.no/
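The strøm.no pair above can be reproduced with Python's built-in `idna` codec, which follows the older IDNA 2003 rules. This is only an illustrative sketch, not ICS code, and the helper names `to_a_label` and `to_u_label` are made up for the example.

```python
def to_a_label(host: str) -> str:
    """Convert a Unicode host name to its ASCII (punycode) form."""
    # The built-in "idna" codec handles each dot-separated label in turn
    # and adds the "xn--" prefix to labels containing non-ASCII characters.
    return host.encode("idna").decode("ascii")


def to_u_label(host: str) -> str:
    """Convert an ASCII (punycode) host name back to Unicode."""
    return host.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_a_label("strøm.no"))         # xn--strm-ira.no
    print(to_u_label("xn--strm-ira.no"))  # strøm.no
```

Plain ASCII names pass through both helpers unchanged, so the conversion can be applied to every host name without checking first.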
Angus Robertson 574 Posted March 2, 2020 Thanks for the URLs everyone. I've also set up www.éxàmplê.ftptest.co.uk, although it's not working yet, and will make sure it works with SSL as a server as well. Angus
Fr0sT.Brutal 900 Posted March 2, 2020 Here are some other links: https://en.wikipedia.org/wiki/IDN_Test_TLDs https://www.w3.org/International/tests/test-incubator/oldtests/test-idn-display-0
timfrost 78 Posted March 2, 2020 I have looked into this for our SMTP (server and client) applications, but have done nothing about it yet. It seems to me that implementing IDN is the easier task, because Microsoft provides functions to do the conversions in all the OS versions which we need to support (https://docs.microsoft.com/en-us/windows/win32/intl/handling-internationalized-domain-names--idns). What is much harder is getting the SMTP headers correct, as you mentioned at the start of this thread, which requires clients, servers and MTAs to support the necessary SMTP extensions. I found a useful brief summary, with links to all the many RFCs, at https://en.wikipedia.org/wiki/International_email but I suspect there is a lot of work needed to get it all working. We have email server and client users in Japan, the Middle East, and other locations where this capability could matter, but nobody has yet asked for it.
Angus Robertson 574 Posted March 3, 2020 ICS already has the inline MIME processing used for email headers, since international names are far more common than international domains, but this needs to be done at application level; the SMTP and POP3 components don't do it internally. Rather than use the Windows IDN APIs, I'm trying some Pascal code first, so it is multi-platform. Angus
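As a rough illustration of the split discussed above, the display name in a header can be MIME encoded while only the domain part of the address is converted to punycode. The sketch below uses Python's standard `email.utils.formataddr`, not ICS; the helper `ascii_address`, the name and the address `info@bücher.example` are all hypothetical. It assumes the local part is already ASCII, since a non-ASCII local part cannot be fixed by punycode and needs the SMTP extensions mentioned earlier.

```python
from email.utils import formataddr


def ascii_address(address: str) -> str:
    """Convert only the domain part of an address to its ASCII form."""
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("address has no @ sign")
    # The local part is left untouched; punycode applies to domains only.
    return local + "@" + domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    address = ascii_address("info@bücher.example")
    # formataddr MIME encodes a non-ASCII display name and requires
    # the address itself to be pure ASCII.
    print(formataddr(("Jörg Müller", address)))
```

Passing the unconverted address straight to `formataddr` would raise an error, which mirrors the reply failure described at the start of the thread.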
Fr0sT.Brutal 900 Posted March 3, 2020 I took a quick look at the IDN format and wonder what drugs its creators were taking.
Angus Robertson 574 Posted March 3, 2020 It is a very strange format, no real reason for re-ordering the letters. Angus
Angus Robertson 574 Posted March 5, 2020 Quote https://мособлеирц.рф/ There is not a single ASCII character in that name; it converts to xn--90aijkdmaud0d.xn--p1ai with no single hyphens, although the conversion back to Unicode does not like it. IDN is now working with forward DNS lookups using the DnsLookup function; now I need to look at other places that don't use that function, like ping. Angus
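The missing hyphen is expected: punycode only writes the single hyphen delimiter when a label contains at least one ASCII character, so a decoder has to treat "no hyphen" as "no ASCII part". A minimal per-label sketch using Python's raw `punycode` codec follows. The function names are made up for illustration, and unlike full IDNA it skips normalisation and case folding, so it assumes lowercase input.

```python
ACE_PREFIX = "xn--"


def encode_label(label: str) -> str:
    """Encode one label; pure-ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def decode_label(label: str) -> str:
    """Decode one label if it carries the ACE prefix."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def encode_host(host: str) -> str:
    return ".".join(encode_label(part) for part in host.split("."))


def decode_host(host: str) -> str:
    return ".".join(decode_label(part) for part in host.split("."))


if __name__ == "__main__":
    print(encode_label("strøm"))  # ASCII part, then hyphen, then deltas
    print(encode_label("рф"))     # no ASCII part, so no delimiter hyphen
    ascii_name = encode_host("мособлеирц.рф")
    print(ascii_name, decode_host(ascii_name))
```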
Angus Robertson 574 Posted March 6, 2020 Testing IDN with Windows 2019 DNS Server is proving problematic. I've been able to set up A records for scrúdú and xn--scrd-srab. I would have expected DNS Manager to convert an accented domain to punycode ASCII, but it actually stores scr\303\272d\303\272 in the file. It also stores 16-bit characters. I had to convert the punycode version manually.

Old versions of ICS are actually able to look up scrúdú.ftptest.co.uk without any punycode, so it seems Windows uses the full 8 bits for DNS queries. Internally, we convert Unicode to ANSI before the query, so this will only work for code pages that match the DNS server. The real question is whether this DNS behaviour is by design or ignorance; perhaps internal networks are allowed full 8-bit local names? I've Googled a lot, but cannot find any design recommendations for IDN and Windows DNS Server, indeed no mentions at all.

I was planning on changing ICS to automatically convert IDNs in Unicode to punycode ASCII, but this would break existing internal applications using 8-bit ANSI, so perhaps it needs to be optional; I'd prefer ASCII to become the default. I also think the HTTP client and server need changing, because the Host: header needs to be punycode ASCII, and probably Location: as well, but I'm not sure about sub directories. Arguments are always character converted, but does that apply to directories as well? Angus
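For what it is worth, the octal escapes \303\272 are the two UTF-8 bytes of ú (0xC3 0xBA), so the stored record looks like raw UTF-8 rather than an ANSI code page. The sketch below, in Python with a made-up helper name, shows how the same label turns into different bytes depending on the encoding, which is why an ANSI query only works when the code pages happen to agree.

```python
def octal_escape(raw: bytes) -> str:
    """Show bytes as text: ASCII as-is, anything else as a 3-digit octal escape."""
    return "".join(chr(b) if b < 0x80 else f"\\{b:03o}" for b in raw)


name = "scrúdú"

if __name__ == "__main__":
    # UTF-8 uses two bytes per accented letter here.
    print(octal_escape(name.encode("utf-8")))
    # A Western European ANSI code page uses one byte per accented letter.
    print(octal_escape(name.encode("cp1252")))
    # A Cyrillic ANSI code page cannot represent the name at all.
    try:
        name.encode("cp1251")
    except UnicodeEncodeError as err:
        print("cp1251 cannot encode it:", err.reason)
    # Punycode is plain ASCII, so it does not depend on any code page.
    print("xn--" + name.encode("punycode").decode("ascii"))
```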
Fr0sT.Brutal 900 Posted March 10, 2020 IMHO all ANSI uses should be discarded even if they are convenient in some cases. There are too many compatibility issues in a world that speaks with letters other than A-Z.
Angus Robertson 574 Posted March 10, 2020 It seems those using non-English domains hedge their bets on their sites:
Handshake done, error #0 - SSL Connected OK with TLSv1.2, cipher ECDHE-RSA-AES128-GCM-SHA256, key auth RSA, key exchange ECDH, encryption AESGCM(128), message auth AEAD
! VerifyResult: ok, Peer domain: мособлеирц.рф
3 Certificate(s) in the verify chain.
#3 Issued to (CN): mosobleirc.ru
Alt Domains (SAN): mosobleirc.ru, www.mosobleirc.ru, www.мособлеирц.рф, www.новый.мособлеирц.рф, мособлеирц.рф, новый.мособлеирц.рф
Issued by (CN): Let's Encrypt Authority X3, (O): Let's Encrypt
Expires: 11/05/2020 18:43:06, Signature: sha256WithRSAEncryption
Does anyone have any working Far East web sites with IDNs, Chinese, Japanese, etc? Those I've tried are all dead. Angus
Lars Fosdal 1792 Posted March 10, 2020 I see many Norwegian sites simply do a redirect on the IDN URL to a non-IDN version of it. As for Chinese - here is a slew: https://www.101domain.com/chinese-simplified_idn_domains.htm Edit: Err... where you can register a slew
Lars Fosdal 1792 Posted March 10, 2020 This one redirects: http://welcometothe.中国
Angus Robertson 574 Posted March 10, 2020 (edited) Quote http://welcometothe.中国 That works OK, but redirects to an Alibaba site using western domains. I did try to register an accented test domain earlier with 123-Reg. My attempt for co.uk was declined, but they registered a com OK, except it is missing all the accented characters; it won't accept xn-- names, only Unicode. I'll try an eu instead, they must support accents. Angus Edited March 10, 2020 by Angus Robertson
Angus Robertson 574 Posted March 12, 2020 SVN and the overnight zip have been updated with a lot of changes so ICS supports Internationalized Domain Names in Applications (IDNA), i.e. using accents and Unicode characters in domain names. Host names in DNS can only contain ASCII letters, digits and the hyphen, so Unicode U-Labels (nodes in a domain) must be converted to A-Labels (punycode ASCII) with an ACE (ASCII Compatible Encoding) prefix. So www.mâgsÿstést.eu becomes www.xn--mgsstst-pwa1e4l.eu and мособлеирц.рф becomes xn--90aijkdmaud0d.xn--p1ai.

ICS mostly does the Unicode to A-Label conversion just before looking up an IP address for a domain name (in DnsLookup) and converts back from A-Label to Unicode when doing a reverse lookup (in ReverseDnsLookup). HTTP headers also contain A-Labels for the Host: header and the host part of URLs for proxy or relocation, but Unicode paths should be UrlEncoded by the application, as now. I've not looked at SMTP yet.

The HTTP client and server, Ping, ICMP and DNS Query components all now support Unicode domain names, generally without application changes unless you want to display the A-Label name that was looked up (PunycodeHost property). DNS Query does require application changes because all methods and properties were previously AnsiString and are now String.

SSL/TLS now fully supports Unicode domain names, including displaying the Unicode version of the domain name (except for Subject and Issuer lines), and X509 automatic certificate ordering from Let's Encrypt fully supports Unicode domain names. Certificate files are saved with Unicode names, not A-Labels.

For server testing I registered an eu domain which is live on one of my web sites at https://www.mâgsÿstést.eu/ and https://scrúdú.mâgsÿstést.eu/ which have ICS ordered SSL certificates. I do have DNS for Cyrillic and Far East domains, but this web server is built with Delphi 2007 so has no full Unicode support. Angus
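The split described above, an A-Label for the host and URL encoding for the path, can be sketched outside ICS with Python's standard library. The function name `ascii_url` and the path `/café/menu` are hypothetical, the built-in `idna` codec follows the older IDNA 2003 rules, and this simplified version drops any user name or password in the URL and leaves the query string untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def ascii_url(url: str) -> str:
    """Return the URL with a punycode host and a percent-encoded path."""
    parts = urlsplit(url)
    # Host names go through IDNA, giving the value for the Host: header.
    host = (parts.hostname or "").encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Paths are not punycode: non-ASCII characters become UTF-8 percent
    # escapes. "%" is kept safe so existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    url = ascii_url("https://www.mâgsÿstést.eu/café/menu")
    print(url)
    print("Host:", urlsplit(url).hostname)
```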