linux-fsdevel.vger.kernel.org archive mirror
From: Nicholas Miell <nmiell@comcast.net>
To: Barry Naujok <bnaujok@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	"xfs@oss.sgi.com" <xfs@oss.sgi.com>,
	linux-fsdevel@vger.kernel.org, urban@svenskatest.se
Subject: Re: RFC: Case-insensitive support for XFS
Date: Sun, 07 Oct 2007 22:44:48 -0700
Message-ID: <1191822288.2694.10.camel@entropy>
In-Reply-To: <op.tzu4jagf3jf8g2@pc-bnaujok.melbourne.sgi.com>

On Mon, 2007-10-08 at 15:07 +1000, Barry Naujok wrote:
> On Sat, 06 Oct 2007 04:52:18 +1000, Nicholas Miell <nmiell@comcast.net> wrote:
> 
> > On Fri, 2007-10-05 at 16:44 +0100, Christoph Hellwig wrote:
> >> [Adding -fsdevel because some of the things touched here might be of
> >>  broader interest and Urban because his name is on nls_utf8.c]
> >>
> >> On Fri, Oct 05, 2007 at 11:57:54AM +1000, Barry Naujok wrote:
> >> >
> >> > On its own, Linux only provides case conversion for old-style
> >> > character sets - 8 bit sequences only. A lot of distros are
> >> > now defaulting to UTF-8 and Linux NLS stuff does not support
> >> > case conversion for any unicode sets.
> >>
> >> The lack of case tables in nls_utf8.c definitely seems odd to me.
> >> Urban, is there a reason for that?  The only thing that comes to
> >> mind is that these tables might be quite large.
> >>
> >
> > Case conversion in Unicode is locale dependent. The legacy 8-bit
> > character encodings don't code for enough characters to run into the
> > ambiguities, so they can get away with fixed case conversion tables.
> > Unicode can't.
> 
> Based on http://www.unicode.org/reports/tr21/tr21-5.html and
> http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
> 
> Doing case comparison using that table should cater for most
> circumstances except a few exceptions. It should be enough
> to satisfy a locale-independent case-insensitive filesystem
> (i.e. the C + F case folding option).
> 
> Is normalization required after case-folding? What I read
> implies it is not necessary for this purpose (and would
> slow things down and bloat the code more).
> 
> Now, I suppose it's just a question of a fixed table in the
> kernel driver (HFS+ style), or data stored in a special
> inode on-disk (NTFS style, shared and refcounted in memory
> when identical). With the on-disk approach, the table can be
> generated by mkfs.xfs.
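A minimal sketch of what a C + F folding comparison might look like, assuming names are already decoded to UTF-32 code points. The table is a tiny hand-picked subset of CaseFolding.txt rows (a real one would be generated from the whole file, in the kernel or by mkfs), and `fold_cp`, `fold_str`, `ci_equal` and `FOLD_BUF` are hypothetical names, not existing kernel or XFS functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Room for 255 code points expanding up to 3x (arbitrary for this sketch). */
#define FOLD_BUF 768

/*
 * Fold one code point using a small subset of the status C (common) and
 * F (full) rows of CaseFolding.txt. Writes up to 3 code points to out and
 * returns how many were written.
 */
static size_t fold_cp(uint32_t cp, uint32_t out[3])
{
    /* ASCII A-Z and Latin-1 capitals U+00C0..U+00DE (except U+00D7): C */
    if ((cp >= 0x41 && cp <= 0x5A) ||
        (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7)) {
        out[0] = cp + 0x20;
        return 1;
    }
    switch (cp) {
    case 0x00B5: out[0] = 0x03BC; return 1;  /* MICRO SIGN -> mu, C */
    case 0x017F: out[0] = 0x0073; return 1;  /* LONG S -> s, C */
    case 0x212A: out[0] = 0x006B; return 1;  /* KELVIN SIGN -> k, C */
    case 0x00DF:                             /* SHARP S -> "ss", F */
    case 0x1E9E:                             /* CAPITAL SHARP S -> "ss", F */
        out[0] = 0x0073; out[1] = 0x0073; return 2;
    case 0x0130:                             /* I WITH DOT ABOVE -> i + U+0307, F */
        out[0] = 0x0069; out[1] = 0x0307; return 2;
    default:
        out[0] = cp;                         /* no row: folds to itself */
        return 1;
    }
}

/* Fold a whole name; returns folded length, or (size_t)-1 if out is too small. */
static size_t fold_str(const uint32_t *s, size_t n, uint32_t *out, size_t cap)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t tmp[3];
        size_t k = fold_cp(s[i], tmp);
        if (len + k > cap)
            return (size_t)-1;
        memcpy(out + len, tmp, k * sizeof(uint32_t));
        len += k;
    }
    return len;
}

/* Case-insensitive equality: fold both names, then compare code points. */
static int ci_equal(const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
    uint32_t fa[FOLD_BUF], fb[FOLD_BUF];
    size_t la = fold_str(a, na, fa, FOLD_BUF);
    size_t lb = fold_str(b, nb, fb, FOLD_BUF);
    if (la == (size_t)-1 || lb == (size_t)-1)
        return 0;
    return la == lb && memcmp(fa, fb, la * sizeof(uint32_t)) == 0;
}
```

Under this sketch "STRAßE" and "strasse" compare equal and the Kelvin sign matches 'k', while 'I' and dotless 'ı' do not match, and precomposed 'ê' (U+00EA) does not match 'e' followed by U+0302, because no normalization is performed.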

You also have to decide whether to screw over people who speak Turkic
languages and expect an 'I' to 'ı' mapping or everybody else who expect
an 'I' to 'i' mapping.
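To make the Turkic trade-off concrete, here is a sketch of simple (one-to-one) folding with an optional switch to the Turkic "T" rows of CaseFolding.txt. The `turkic` flag is a hypothetical per-filesystem setting, not an existing XFS or mkfs.xfs option, and `fold_simple` is an illustrative name covering only ASCII letters and the two Turkic rows.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simple folding with an optional Turkic mode. The turkic flag is a
 * hypothetical per-filesystem setting used only for illustration.
 */
static uint32_t fold_simple(uint32_t cp, bool turkic)
{
    if (turkic) {
        if (cp == 0x0049)
            return 0x0131;  /* I -> dotless i (T row) */
        if (cp == 0x0130)
            return 0x0069;  /* I with dot above -> i (T row) */
    }
    if (cp >= 0x41 && cp <= 0x5A)
        return cp + 0x20;   /* default C rows: A-Z -> a-z, so I -> i */
    return cp;              /* U+0130 has no C/S row, so it stays as-is */
}
```

With the flag set, "FILE" and "file" are different names (I folds to ı) while "İ" and "i" collide; with it clear, the opposite holds. Because the flag changes which names collide, it would have to be fixed for the lifetime of a filesystem.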

Although, if you're content to ignore the kernel's native NLS case
mapping tables (which expect a locale-independent 1-to-1 mapping), you
could just uppercase everything and map both 'i' and 'ı' to 'I'.
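A sketch of the "uppercase everything" approach, assuming UTF-32 input and handling only ASCII letters plus the i variants. `upper_fold` is a hypothetical name; also mapping dotted 'İ' (U+0130) to 'I' is an assumption beyond the message, added so the Turkic pair İ/i still matches.

```c
#include <stdint.h>

/*
 * Fold by uppercasing and collapse the dotted/dotless i family onto 'I'.
 * Only ASCII letters and the i variants are handled in this sketch.
 */
static uint32_t upper_fold(uint32_t cp)
{
    if (cp >= 0x61 && cp <= 0x7A)
        return cp - 0x20;   /* a-z -> A-Z, including i -> I */
    if (cp == 0x0131 || cp == 0x0130)
        return 0x0049;      /* dotless i and dotted capital I -> I */
    return cp;
}
```

This is a many-to-one mapping ('i', 'ı', 'İ' and 'I' all land on 'I'), so it cannot be expressed as the 1-to-1 tables the NLS layer expects. Everyone's expected matches work, at the cost of Turkish speakers losing the distinction between two different letters.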

Then you have to decide whether things like 'ê' map to 'E' or 'Ê', which
is also locale dependent.
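The two choices for 'ê' can be sketched as a policy switch over Latin-1 code points. `enum fold_policy` and `fold_latin1` are hypothetical names; note that stripping accents is not case folding in the Unicode sense (CaseFolding.txt maps Ê to ê, never to e), so the second policy is a separate, locale-flavoured decision.

```c
#include <stdint.h>

enum fold_policy {
    FOLD_CASE_ONLY,        /* ê -> Ê */
    FOLD_CASE_AND_ACCENTS  /* ê -> E (hypothetical accent-stripping policy) */
};

/* Uppercase ASCII and Latin-1 letters, then optionally strip accents from E. */
static uint32_t fold_latin1(uint32_t cp, enum fold_policy p)
{
    if (cp >= 0x61 && cp <= 0x7A)
        cp -= 0x20;                     /* a-z -> A-Z */
    else if (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7)
        cp -= 0x20;                     /* U+00E0..U+00FE -> U+00C0..U+00DE, skipping the division sign */
    if (p == FOLD_CASE_AND_ACCENTS && cp >= 0xC8 && cp <= 0xCB)
        cp = 0x45;                      /* È É Ê Ë -> E */
    return cp;
}
```

With FOLD_CASE_ONLY, "tête" and "TÊTE" match but "TETE" does not; with FOLD_CASE_AND_ACCENTS all three match.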

-- 
Nicholas Miell <nmiell@comcast.net>


Thread overview: 12+ messages
2007-10-05 15:44   ` RFC: Case-insensitive support for XFS Christoph Hellwig
2007-10-05 18:52     ` Nicholas Miell
2007-10-08  5:07       ` Barry Naujok
2007-10-08  5:44         ` Nicholas Miell [this message]
2007-10-08  6:17           ` Barry Naujok
2007-10-08  7:00           ` Barry Naujok
2007-10-10  2:27           ` Barry Naujok
2007-10-05 19:10     ` Anton Altaparmakov
2007-10-06  6:37       ` Brad Boyer
2007-10-06 13:00         ` Anton Altaparmakov
2007-10-08  0:43       ` Barry Naujok
2007-10-08  0:33     ` Barry Naujok
