From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pavel Fedin
Subject: Re: [PATCH] Full NLS support for HFS (classic) filesystem
Date: Tue, 31 May 2005 17:21:09 -0400
Message-ID: <429CD545.1070308@rambler.ru>
References: <429B1E35.2040905@rambler.ru> <429C68A0.20003@rambler.ru> <429CBC75.2030605@rambler.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel@vger.kernel.org
Return-path:
Received: from mxb.rambler.ru ([81.19.66.30]:16398 "EHLO mxb.rambler.ru") by vger.kernel.org with ESMTP id S261887AbVEaNUK (ORCPT ); Tue, 31 May 2005 09:20:10 -0400
To: Roman Zippel
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Roman Zippel wrote:

>>>Use different mount points. This is not a reason to create a lot of
>>>different options all doing the same.
>>
>> Why??? It's incomfortable for the user!
>
>
> Because it's a user space problem, if you want to use different codepages
> for different cd's tell it to mount. The kernel only provides the
> functionality, giving the same functionality several different names is
> not an option.

 How is that possible (other than by using different mount points)?
Besides, different mount points are problematic with automounters: they
take only the first fstab entry into account. A second mount point will
not be mounted automatically by, for example, magicdev.
 Well, if you really don't care about users' opinions and listen only to
yourself, we can find a compromise. iso9660 currently does not use the
"codepage" option, and udf will never use it because it stores names in
Unicode. So we can rename this option to "codepage". There seem to be no
ISO discs with national 8-bit names, so iso9660 will probably never need
it.

>> Yes, it may find wrong file. But i mean here that HFS does not use string
>>comparison at all for finding a file but some specific hashing (i tried to
>>understand how it actually works and was unable to, too difficult).
>>So it just will not find many files at all, even if names are translated
>>correctly. As i say, i tried this, it completely failed.
>
>
> If the names were translated correctly, HFS would have found them. You
> need to give me an example, which should have worked, but failed.

 I can't reproduce the exact Russian string (I don't remember it), but it
affected about 50% of all Russian names.

>> Don't understand about dynamic NLS module. What code should it contain?
>
>
> Create the tables in a nls module and you can do whatever you want in the
> uni2char/char2uni functions.

 Huh... The problem is this: with an 8-bit iocharset and an 8-bit
codepage, char2uni from the codepage always gives a result, but AFTER THIS
uni2char to the iocharset does NOT necessarily give a result. There are
characters in the codepage which have no equivalents in the iocharset.
They will be lost; you suggest turning them into '?'. But how do I reverse
this in order to pass the name to hfs_strcmp()? If I leave it unreversed
and convert the second hfs_strcmp() argument to the iocharset too, then
hfs_strcmp() gives wrong results. It can return 1 in cases where it would
return -1 if the names were supplied in the original Mac encoding. Look,
for example, at the KOI8-R and CP10007 (Mac Cyrillic) pages. Note that the
order of the characters is completely different. For example, 'X'>'Y' in
the Mac encoding and 'X'<'Y' in koi8 (here 'X' and 'Y' are just
placeholders for some two Russian characters). There's no way to get
around this. B-tree scanning in the __hfs_brec_find() routine simply works
wrong.

> Why don't you believe me there is a better solution? :)

 I just know that the usual algorithm used, for example, in FAT will not
work. I totally broke my brain trying to find a solution.
 Well, the whole B-tree scanning routine can be rewritten to check every
name in the B-tree against the requested name, without using that '<' and
'>' magic. But I guess it will be much slower. In addition, I am simply
unable to rewrite this.
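
 The ordering problem between KOI8-R and CP10007 described above can be
sketched roughly as follows. This is only an illustration under stated
assumptions: byte_cmp() is a made-up stand-in that compares raw bytes,
whereas the real hfs_strcmp() goes through the order table, and the helper
and array names are invented for the example. The byte values are those of
the Cyrillic capitals VE (U+0412) and TSE (U+0426) in each encoding.

```c
#include <string.h>

/* Hypothetical stand-in for hfs_strcmp(): plain unsigned byte order.
 * The real function uses a case-folding order table instead. */
static int byte_cmp(const unsigned char *a, const unsigned char *b, size_t len)
{
	int r = memcmp(a, b, len);

	return (r > 0) - (r < 0);	/* normalize to -1, 0 or 1 */
}

/* Cyrillic capital VE (U+0412) and TSE (U+0426) in both encodings. */
static const unsigned char mac_ve[]  = { 0x82 };	/* CP10007 (Mac Cyrillic) */
static const unsigned char mac_tse[] = { 0x96 };
static const unsigned char koi_ve[]  = { 0xF7 };	/* KOI8-R */
static const unsigned char koi_tse[] = { 0xE3 };

/* Returns 1 if the two encodings disagree about the order of the names. */
static int order_flips(void)
{
	int on_disk = byte_cmp(mac_ve, mac_tse, 1);	/* VE sorts before TSE */
	int in_koi8 = byte_cmp(koi_ve, koi_tse, 1);	/* VE sorts after TSE */

	return on_disk != in_koi8;
}
```

 Calling order_flips() from a small main() shows the disagreement: a
B-tree ordered by the first comparison cannot be searched reliably with
the second one, because the search would descend into the wrong branch.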
 HFS is a total mess and I do not have enough spare time to put into
understanding it. I wouldn't even try to rewrite such deep things; I'm
afraid of breaking something. I don't understand the HFS structure at all.
 Or, the lexical order table in hfs/string.h can be regenerated
dynamically to match the iocharset in use. But this will break the
hfs_hash_dentry() function, which relies on this particular table too.
Probably two tables can be used: the original one and a dynamically
created one. But this requires time I don't have. In addition, I guess you
wouldn't like it either.

-- 
Kind regards, Pavel Fedin