From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pavel Fedin
Subject: Re: [PATCH] Full NLS support for HFS (classic) filesystem
Date: Tue, 31 May 2005 09:37:36 -0400
Message-ID: <429C68A0.20003@rambler.ru>
References: <429B1E35.2040905@rambler.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel@vger.kernel.org
Return-path: Received: from mxb.rambler.ru ([81.19.66.30]:49161 "EHLO mxb.rambler.ru") by vger.kernel.org with ESMTP id S261836AbVEaFgp (ORCPT ); Tue, 31 May 2005 01:36:45 -0400
To: Roman Zippel
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Roman Zippel wrote:

>> Codepage option is called "hfscodepage", not "codepage" because in
>> future "codepage" option might be added to iso9660 filesystem in order
>> to enable translation of 8-bit names and in many countries ISO codepage
>> differs from HFS codepage.
>
> If the codepage differs, you simply use different arguments for that
> option. Is there a _technical_ reason why "hfscodepage" and "codepage"
> might behave differently? Otherwise I'd prefer to use the same name for
> the option.

Yes, there is one reason. It is my personal idea, but I guess many people
would agree. There are ISO CDs and HFS CDs. Using two different names
makes it possible to use a single line in /etc/fstab for all types of CDs.
For example:

/dev/cdrom /mnt/cdrom iso9660,udf,hfsplus,hfs user,noauto,iocharset=koi8-r,hfscodepage=10007 0 0

This is my line. If a "codepage" option is added in the future, the line
would look like:

/dev/cdrom /mnt/cdrom iso9660,udf,hfsplus,hfs user,noauto,iocharset=koi8-r,codepage=866,hfscodepage=10007 0 0

You see, no one recompiles the kernel in order to specify the codepages
used. Instead, mount options are used.

> Why do you build the extra translation tables?
> I'm not really convinced this is a kernel problem at all, but doing it
> more like fat would be more acceptable (maybe just with some more sane
> defaults).

I tried this at the beginning of my work and failed. As far as I can see,
a special, tricky algorithm is used to find a file by name, and it depends
on the actual character code values (there are "<" and ">" comparisons).
It does not use string comparison at all. This algorithm requires the
filename to be translated back from the UNIX encoding to the Mac encoding.
I simply cannot apply a different algorithm such as "list all names in the
directory, convert every name to the UNIX encoding and compare"; HFS does
not behave this way.

When using a non-utf8 iocharset, not all characters in "codepage" have
equivalents in "iocharset"; some characters remain unmapped. In other
implementations they are all replaced by "?". That will not work here,
because "?" cannot be translated back to the original character. The
extra tables built on the fly ensure that every character in "codepage"
has its own unique equivalent in "iocharset": equivalents are "invented"
by finding all unused characters in "iocharset" (those to which none of
"codepage"'s characters is mapped) and then mapping each unmapped
"codepage" character to one of those unused characters. This works
perfectly. I used this approach in my first attempt, which introduced
koi8-r only and was rejected by you several months ago.

> (BTW please try to inline the patch otherwise it's rather difficult to
> quote from it.)

Ok, next time.

>> +extern void hfs_triv2mac(struct hfs_name *, struct qstr *, unsigned char *, struct nls_table *);
>
> If you add a new argument, use "struct superblock *sb" as the first
> argument.

Instead of two arguments (unsigned char *table, struct nls_table *nls)?
Maybe struct hfs_sb_info *sbi instead, in order to avoid the additional
HFS_SB() macro?

--
Kind regards,
Pavel Fedin
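P.S. To make the table-building idea concrete, here is a minimal userspace
sketch of the construction described above: unmapped codepage characters
borrow iocharset codes that nothing else maps to, so translation stays
reversible. All names and the "0 means unmapped" convention are my own for
illustration; this is not the code from the patch.

```c
#include <string.h>

/* Build reversible 8-bit translation tables: every Mac (codepage) code
 * gets a unique UNIX (iocharset) equivalent, so filenames can be
 * translated back losslessly.  Unmapped Mac codes borrow UNIX codes
 * that no Mac code maps to.  A 0 entry means "unmapped". */
static void build_tables(unsigned char to_unix[256], unsigned char to_mac[256])
{
    int used[256] = { 0 };
    int i, spare = 1;            /* never hand out code 0 */

    /* mark every UNIX code already taken by the real mapping */
    for (i = 0; i < 256; i++)
        if (to_unix[i])
            used[to_unix[i]] = 1;

    /* "invent" equivalents for unmapped Mac codes from the unused pool */
    for (i = 1; i < 256; i++) {
        if (to_unix[i])
            continue;
        while (spare < 256 && used[spare])
            spare++;
        if (spare == 256)
            break;               /* no unused UNIX codes left */
        to_unix[i] = (unsigned char)spare;
        used[spare] = 1;
    }

    /* inverse table for translating names back to the Mac encoding */
    memset(to_mac, 0, 256);
    for (i = 1; i < 256; i++)
        if (to_unix[i])
            to_mac[to_unix[i]] = (unsigned char)i;
}
```

Because every assigned equivalent is unique, to_mac[to_unix[c]] == c holds
for every mapped Mac code c, which is exactly what the "<"/">" name-lookup
comparisons need.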