From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anton Altaparmakov
Subject: Re: [PATCH] Full NLS support for HFS (classic) filesystem
Date: Tue, 31 May 2005 15:49:18 +0100
Message-ID: <1117550958.8073.30.camel@imp.csi.cam.ac.uk>
References: <429B1E35.2040905@rambler.ru> <429C68A0.20003@rambler.ru> <429CBC75.2030605@rambler.ru> <429CD545.1070308@rambler.ru>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: Pavel Fedin , linux-fsdevel@vger.kernel.org
Return-path:
Received: from ppsw-0.csi.cam.ac.uk ([131.111.8.130]:18622 "EHLO ppsw-0.csi.cam.ac.uk") by vger.kernel.org with ESMTP id S261504AbVEaOt3 (ORCPT ); Tue, 31 May 2005 10:49:29 -0400
To: Roman Zippel
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Hi,

On Tue, 2005-05-31 at 15:59 +0200, Roman Zippel wrote:
> On Tue, 31 May 2005, Pavel Fedin wrote:
> > > If the names were translated correctly, HFS would have found them. You need
> > > to give me an example, which should have worked, but failed.
> >
> > I can't produce exact russian string (don't remember), but it was about 50%
> > of all russian names.
>
> Without an example I can't reproduce, what you're trying to say here (at
> least the "HFS doesn't find the file, even though it's correctly
> translated" part).

There are lots of characters that cannot be translated. I have this
problem with NTFS, too. My solution is to just ignore file names that
cannot be translated. If a user complains they cannot see some
filenames, I tell them to use utf8 for their encoding, which always
works for translation. NLS is fundamentally broken so there is no point
in trying to use clever dynamic tables to do it. Just ignoring it is
IMO the correct way.

btw. not having mappings is not even the biggest problem. It gets much
worse and even Pavel's dynamic mappings are not actually going to work.
For example there are some characters in Asian languages which, when
translated, will give a character, but when you then reverse translate
this character you end up with something that is not the same as the
starting character. This is a fundamental flaw with the whole NLS
codepages approach because there are symbols in Unicode which have
identical meaning but two different Unicode values. You get this for
the "exact" ideographs and the "simplified" ideographs (e.g. CJK and
compatibility ideographs - see the Unicode standard and various NLS
pages for details).

If you don't believe me I can dig out the old emails which have
concrete examples of how you convert a CJK ideograph to some codepage
and then back and you end up with a compatibility CJK ideograph instead
of the original one. Of course if you start with the compatibility CJK
ideograph and do the translation + reverse translation you end up with
the same compatibility ideograph, but that doesn't help you when you use
the "real" ideograph, as for example Windows seems to do, as a lot of
Asian people have complained to me about ntfs when used with codepages.
All of them went away happy when I told them to tell the ntfs driver to
use utf8...

So unless you use UTF8, any other conversion using NLS/code pages will
always have failure cases... (btw. I first suspected bugs in the code
pages but I verified them on the MS website and they were correct...)

> > > Create the tables in a nls module and you can do whatever you want in the
> > > uni2char/char2uni functions.
> >
> > Huh...
> > The problem is: when using 8-bit iocharset and 8-bit codepage char2uni from
> > codepage always gives the result but AFTER THIS uni2char to iocharset does NOT
> > necessarily gives the result. There are characters in codepage which have no
> > equivalents in iocharset. They will be lost, you suggest to turn them into
> > '?'. But how to reverse this in order to supply to hfs_strcmp()?
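
The two failure modes discussed above (a reverse translation that lands
on a different code point, and a character with no mapping at all) can
be sketched roughly as follows. This is only an illustration under
stated assumptions: the one-entry table and its byte value are made up
and are not a real NLS codepage, and the Python functions uni2char and
char2uni merely borrow the names used in this thread; they are not the
kernel's interfaces. U+8C48 and U+F900 are used because Unicode defines
the latter as a compatibility ideograph canonically equivalent to the
former.

```python
import unicodedata

UNIFIED = "\u8c48"  # CJK unified ideograph
COMPAT = "\uf900"   # CJK compatibility ideograph, canonically equivalent to U+8C48

# Hypothetical one-entry codepage, NOT a real NLS table.
# Both code points encode to the same bytes, but decoding can only
# return one of them; here it returns the compatibility ideograph.
TOY_BYTES = b"\xa1\xa1"
TOY_UNI2CHAR = {UNIFIED: TOY_BYTES, COMPAT: TOY_BYTES}
TOY_CHAR2UNI = {TOY_BYTES: COMPAT}


def uni2char(ch):
    """Return the toy codepage bytes for ch, or None if there is no mapping."""
    return TOY_UNI2CHAR.get(ch)


def char2uni(raw):
    """Return the Unicode character for raw, or None if there is no mapping."""
    return TOY_CHAR2UNI.get(raw)


def round_trip(ch):
    """Translate ch to the toy codepage and back; None if it cannot be encoded."""
    raw = uni2char(ch)
    return None if raw is None else char2uni(raw)


def utf8_round_trip(s):
    """UTF-8 encodes every code point used here distinctly, so this is lossless."""
    return s.encode("utf-8").decode("utf-8")


if __name__ == "__main__":
    for ch in (UNIFIED, COMPAT, "\u0416"):
        back = round_trip(ch)
        print(f"U+{ord(ch):04X} ->",
              "no mapping" if back is None else f"U+{ord(back):04X}")
```

With such a table, a name stored with the unified ideograph comes back
as a different code point, so a byte-wise comparison against the
on-disk name would fail even though nothing was reported as
untranslatable. The UTF-8 path has no table and keeps the two code
points distinct.
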
>
> So create two functions uni2char/char2uni, which provide perfect reverse
> mapping. Sorry, but I don't understand what your problem is here.
> It seems you're making it more complex than it really is.

That is impossible due to the problems with compatibility characters I
explained above, which is why I would agree with you that such magic
conversions should never happen. Just put "use utf8 if you have
problems" in the mount man page...

Best regards,

        Anton
-- 
Anton Altaparmakov (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/