From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pavel Fedin
Subject: Re: [PATCH] Full NLS support for HFS (classic) filesystem
Date: Tue, 31 May 2005 09:37:36 -0400
Message-ID: <429C68A0.20003@rambler.ru>
References: <429B1E35.2040905@rambler.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel@vger.kernel.org
Return-path: Received: from mxb.rambler.ru ([81.19.66.30]:49161 "EHLO mxb.rambler.ru") by vger.kernel.org with ESMTP id S261836AbVEaFgp (ORCPT ); Tue, 31 May 2005 01:36:45 -0400
To: Roman Zippel
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Roman Zippel wrote:

>> Codepage option is called "hfscodepage", not "codepage" because in
>> future "codepage" option might be added to iso9660 filesystem in order
>> to enable translation of 8-bit names and in many countries ISO codepage
>> differs from HFS codepage.
>
> If the codepage differs, you simply use different arguments for that
> option. Is there a _technical_ reason why "hfscodepage" and "codepage"
> might behave differently? Otherwise I'd prefer to use the same name for
> the option.

Yes, there is one reason. It is my personal idea, but I guess many people
would agree. There are ISO CDs and HFS CDs. Using two different names
makes it possible to use a single line in /etc/fstab for all types of CDs.
For example:

/dev/cdrom /mnt/cdrom iso9660,udf,hfsplus,hfs user,noauto,iocharset=koi8-r,hfscodepage=10007 0 0

This is my line. If a "codepage" option is added in the future, the line
would look like:

/dev/cdrom /mnt/cdrom iso9660,udf,hfsplus,hfs user,noauto,iocharset=koi8-r,codepage=866,hfscodepage=10007 0 0

You see, no one recompiles the kernel in order to specify the codepages
used. Instead, mount options are used.

> Why do you build the extra translation tables?
> I'm not really convinced this is a kernel problem at all, but doing it
> more like fat would be more acceptable (maybe just with some more sane
> defaults).

I tried this at the beginning of my work and failed. As far as I can see,
a special, tricky algorithm is used to find a file by name, and it depends
on the actual character code values (there are "<" and ">" comparisons).
It does not use string comparison at all. This algorithm requires the
filename to be translated back from the UNIX encoding to the Mac encoding.
I simply cannot apply a different algorithm such as "list all names in the
directory, convert every name to the UNIX encoding and compare"; HFS does
not behave this way.

When using a non-utf8 iocharset, not all characters in "codepage" have
equivalents in "iocharset"; some characters remain unmapped. In other
implementations they are all replaced by "?". That will not work here,
because "?" cannot be translated back to the original character. The
extra tables built on the fly ensure that every character in "codepage"
has its own unique equivalent in "iocharset": equivalents are "invented"
by finding all unused characters in "iocharset" (those to which none of
"codepage"'s characters is mapped) and then mapping each unmapped
"codepage" character to one of those unused characters. This works
perfectly. I used this approach in my first attempt, which introduced
koi8-r only and was rejected by you several months ago.

> (BTW please try to inline the patch otherwise it's rather difficult to
> quote from it.)

Ok, next time.

>> +extern void hfs_triv2mac(struct hfs_name *, struct qstr *, unsigned char *, struct nls_table *);
>
> If you add a new argument, use "struct superblock *sb" as the first
> argument.

Instead of two arguments (unsigned char *table, struct nls_table *nls)?
Maybe struct hfs_sb_info *sbi instead, in order to avoid the additional
HFS_SB() macro?

--
Kind regards,
Pavel Fedin
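P.S. To make the table-building idea concrete, here is a minimal userspace
sketch of the construction described above: unmapped codepage characters
borrow iocharset codes that nothing else maps to, so translation stays
reversible. All names and the "0 means unmapped" convention are my own for
illustration; this is not the code from the patch.

```c
#include <string.h>

/* Build reversible 8-bit translation tables: every Mac (codepage) code
 * gets a unique UNIX (iocharset) equivalent, so filenames can be
 * translated back losslessly.  Unmapped Mac codes borrow UNIX codes
 * that no Mac code maps to.  A 0 entry means "unmapped". */
static void build_tables(unsigned char to_unix[256], unsigned char to_mac[256])
{
    int used[256] = { 0 };
    int i, spare = 1;            /* never hand out code 0 */

    /* mark every UNIX code already taken by the real mapping */
    for (i = 0; i < 256; i++)
        if (to_unix[i])
            used[to_unix[i]] = 1;

    /* "invent" equivalents for unmapped Mac codes from the unused pool */
    for (i = 1; i < 256; i++) {
        if (to_unix[i])
            continue;
        while (spare < 256 && used[spare])
            spare++;
        if (spare == 256)
            break;               /* no unused UNIX codes left */
        to_unix[i] = (unsigned char)spare;
        used[spare] = 1;
    }

    /* inverse table for translating names back to the Mac encoding */
    memset(to_mac, 0, 256);
    for (i = 1; i < 256; i++)
        if (to_unix[i])
            to_mac[to_unix[i]] = (unsigned char)i;
}
```

Because every assigned equivalent is unique, to_mac[to_unix[c]] == c holds
for every mapped Mac code c, which is exactly what the "<"/">" name-lookup
comparisons need.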