From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nicholas Miell
Subject: Re: RFC: Case-insensitive support for XFS
Date: Sun, 07 Oct 2007 22:44:48 -0700
Message-ID: <1191822288.2694.10.camel@entropy>
References: <20071005154442.GA6432@infradead.org> <1191610338.2695.8.camel@entropy>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Christoph Hellwig, "xfs@oss.sgi.com", linux-fsdevel@vger.kernel.org, urban@svenskatest.se
To: Barry Naujok
Return-path:
Received: from sccrmhc14.comcast.net ([63.240.77.84]:50463 "EHLO sccrmhc14.comcast.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750875AbXJHFwY (ORCPT ); Mon, 8 Oct 2007 01:52:24 -0400
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Mon, 2007-10-08 at 15:07 +1000, Barry Naujok wrote:
> On Sat, 06 Oct 2007 04:52:18 +1000, Nicholas Miell
> wrote:
>
> > On Fri, 2007-10-05 at 16:44 +0100, Christoph Hellwig wrote:
> >> [Adding -fsdevel because some of the things touched here might be of
> >> broader interest and Urban because his name is on nls_utf8.c]
> >>
> >> On Fri, Oct 05, 2007 at 11:57:54AM +1000, Barry Naujok wrote:
> >> >
> >> > On its own, linux only provides case conversion for old-style
> >> > character sets - 8 bit sequences only. A lot of distros are
> >> > now defaulting to UTF-8 and Linux NLS stuff does not support
> >> > case conversion for any unicode sets.
> >>
> >> The lack of case tables in nls_utf8.c definitely seems odd to me.
> >> Urban, is there a reason for that? The only thing that comes to
> >> mind is that these tables might be quite large.
> >>
> >
> > Case conversion in Unicode is locale dependent. The legacy 8-bit
> > character encodings don't code for enough characters to run into the
> > ambiguities, so they can get away with fixed case conversion tables.
> > Unicode can't.
>
> Based on http://www.unicode.org/reports/tr21/tr21-5.html and
> http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
>
> Doing case comparison using that table should cater for most
> circumstances except a few exceptions. It should be enough
> to satisfy a locale independent case-insensitive filesystem
> (ie. the C + F case folding option).
>
> Is normalization required after case-folding? What I read
> implies it is not necessary for this purpose (and would
> slow things down and bloat the code more).
>
> Now I suppose, it's just a question of a fixed table in the
> kernel driver (HFS+ style), or data stored in a special
> inode on-disk (NTFS style, shared refcounted in memory
> when the same). With the on-disk, the table can be generated
> from mkfs.xfs.

You also have to decide whether to screw over people who speak Turkic
languages and expect an 'I' to 'ı' mapping or everybody else who expects
an 'I' to 'i' mapping.

Although, if you're content to ignore the kernel's native NLS case
mapping tables (which expect a locale-independent 1-to-1 mapping), you
could just uppercase everything and map both 'i' and 'ı' to 'I'.

Then you have to decide whether things like 'ê' map to 'E' or 'Ê', which
is also locale dependent.

-- 
Nicholas Miell
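
For illustration only (this is not code from anyone in the thread): a
minimal Python sketch of the trade-offs discussed above. It assumes
Python's built-in str.casefold(), which applies Unicode full case
folding with no locale tailoring, as a stand-in for the C + F entries
of CaseFolding.txt. The helper names fold_key, fold_nfc_key, upper_key
and same_name are hypothetical, and nothing here reflects how XFS or
the kernel NLS code would actually do it.

```python
import unicodedata


def fold_key(name: str) -> str:
    """Locale-independent comparison key using Unicode full case folding.

    Assumption: str.casefold() stands in for the C + F folding table.
    """
    return name.casefold()


def fold_nfc_key(name: str) -> str:
    """Hypothetical stricter key: case-fold, then NFC-normalize.

    This is only a sketch of the "normalize after folding" option, not
    the full canonical caseless matching defined by the Unicode Standard.
    """
    return unicodedata.normalize("NFC", name.casefold())


def upper_key(name: str) -> str:
    """Key built by uppercasing everything, as suggested in the reply."""
    return name.upper()


def same_name(a: str, b: str, key=fold_key) -> bool:
    """Return True if two names collide under the given key function."""
    return key(a) == key(b)


if __name__ == "__main__":
    # Plain ASCII behaves as expected.
    print(same_name("README", "readme"))

    # Default folding maps 'I' to 'i', so Turkic dotless 'ı' (U+0131)
    # is treated as a different letter from 'I'.
    print(same_name("I", "i"), same_name("I", "\u0131"))

    # Uppercasing instead collapses both 'i' and 'ı' onto 'I'.
    print(same_name("i", "\u0131", key=upper_key))

    # Without normalization, precomposed 'é' and 'e' + combining acute
    # are different names; with NFC after folding they collide.
    print(same_name("\u00e9", "e\u0301"))
    print(same_name("\u00e9", "e\u0301", key=fold_nfc_key))
```

Which key is the right one is exactly the policy question raised in the
thread; the sketch only makes the observable differences concrete.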