Google bites IDNs

Poor Google is a bit buggy. Its coders have already run into character encoding issues before, and now they have problems with domains containing international (non-ASCII) characters. The source of the bug I found recently is that, thanks to some JavaScript magic, Google doesn’t really forward you directly to the given search result. It handles the hit itself first, for search analysis and user tracking, and only then redirects you to the real target. Let’s search for gábor.20y.hu. The result link Google returns looks something like this:

http://www.google.com/url?sa=t&ct=res&cd=1&url=http%3A//g%E1bor.20y.hu/&ei=TXA1RMPxB7viwQHc-6WuAw&sig2=5dhrtGyojR_GPShMOKCdjg

Of course the guys at Google are smart, so they encoded the url parameter. Did they get it right? Not exactly. In internationalized domain names, special characters are not resolved the way URL parameters are. They have their own encoding scheme (Punycode), e.g. gábor becomes xn--gbor-5na for nameservers. So when you try to reach a URL like the one above, Firefox will kindly inform you that “Firefox can’t find the server at g%c3%a1bor.20y.hu.” And it has a point: that server really doesn’t exist. So folks, remember to use URL encoding carefully: do not percent-encode domain names, only the GET parameters.
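To make the difference concrete, here is a minimal Python sketch of how a redirector could handle this. It is not how Google does it, just an illustration: the function names and the redirector.example address are made up, and Python's built-in idna codec implements the older IDNA 2003 rules, which is enough for this example. The idea is to convert the host name to its xn-- form first, and only then percent-encode the whole target as a GET parameter.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Return the URL with its host converted to the ASCII (xn--) form.

    Only the host is touched; path, query and fragment are left alone.
    Simplification for this sketch: any user:password@ part is dropped.
    """
    parts = urlsplit(url)
    # The idna codec encodes each label separately (IDNA 2003 rules).
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


def tracking_link(target):
    """Build a redirect link for a hypothetical tracker at redirector.example."""
    # Percent-encoding is fine here, because the target is now just
    # the value of a GET parameter, and its host is already plain ASCII.
    return "http://redirector.example/url?url=" + quote(to_ascii_url(target), safe="")


if __name__ == "__main__":
    print(to_ascii_url("http://gábor.20y.hu/"))
    print(tracking_link("http://gábor.20y.hu/"))
    # What you get if you percent-encode the host label instead:
    print(quote("gábor"))
```

The first line printed contains the xn--gbor-5na form that nameservers understand; the last one shows the g%C3%A1bor kind of host name that no DNS server will ever resolve.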