public inbox for linux-kernel@vger.kernel.org
From: kernel@mikebell.org
To: linux-kernel@vger.kernel.org
Subject: Re: JFS default behavior / UTF-8 filenames
Date: Thu, 19 Feb 2004 02:59:13 -0800
Message-ID: <20040219105913.GE432@tinyvaio.nome.ca>
In-Reply-To: <1076886183.18571.14.camel@m222.net81-64-248.noos.fr>

So then, just about everyone agrees that if you've got a filename with
non-ASCII characters, you should pass it to creat() as UTF-8. You have
to pass it as something: individual encodings like BIG5 and EUC-JP
are unacceptable, and UCS-4's benefits over UTF-8 (simplicity and, in
VERY rare cases, storage size reductions) aren't worth the stuff it
breaks. Correct?

As I see it, there's no way for the kernel to deal with all the legacy
filenames out there. There's no way the kernel can magically fix them.

So the only thing the kernel could do for those who want to see valid
Unicode is offer an option to make UTF-8-only filesystems. Best would be
if it were set at mkfs time and always enforced from then on, so that
a non-UTF8 filename can never be created. Because if you want the kernel
to not pass non-UTF8 filenames back to userspace, the ONLY clean way to
do that is to make sure they're not there in the first place. You could
maybe try it with a mount-time utf8only flag, but the only thing that
could do would be to make the files with invalid filenames "disappear".
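
To make "reject non-UTF8 filenames" concrete, here is a minimal sketch of
the kind of check a create path would need. It is not taken from any
kernel source: the function name name_is_valid_utf8() is hypothetical, and
the sketch assumes the stricter RFC 3629 definition of UTF-8 (at most four
bytes per character, no overlong forms, no surrogates).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if the len bytes at s form
 * well-formed UTF-8 (assuming RFC 3629 rules), false otherwise.
 */
bool name_is_valid_utf8(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;       /* continuation bytes expected after c */
        unsigned long cp;  /* code point being decoded */
        unsigned long min; /* smallest code point legal at this length */

        if (c < 0x80) {    /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;  /* stray continuation or invalid lead byte */
        }

        if (len - i - 1 < need)
            return false;  /* sequence truncated at end of name */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;  /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return false;  /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;  /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return false;  /* beyond the Unicode range */

        i += need + 1;
    }
    return true;
}
```

With a check like this, the BIG5 bytes A4 A4 are rejected because A4 is
not a legal lead byte, while the UTF-8 bytes E4 B8 AD pass. Whether the
caller then fails the create or hides the entry is the policy question
discussed above.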

For filesystems like JFS and NTFS, I think this is the best way in the
long run: have the kernel output UTF-8 by default, assume UTF-8
input, and reject non-UTF8 filenames, because those filesystems can't
really store arbitrary strings of bytes anyway.

For others which can, maybe leave it up to the filesystem creator
whether to reject non-UTF8 filenames or to accept invalid ones as well?
Either way, a well-written userspace app shouldn't barf on receiving
invalid UTF-8 from the kernel. We'll have legacy filenames around for a
good long time yet, and tolerating them is the only way to be portable to
older Linuxes and other UNIXes, where you definitely would not be
guaranteed valid UTF-8 no matter what new Linux kernels decide.
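
As a rough illustration of "not barfing", a userspace program can build a
display-only copy of each name and keep the original bytes for open() and
unlink(). The sketch below is one possible approach, not an established
API: utf8_seq_len() and display_name() are hypothetical names, and the
choice of '?' as the replacement character is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length of the well-formed UTF-8 sequence at s
 * (1 to 4 bytes, assuming RFC 3629 rules), or 0 if s does not start one.
 */
static size_t utf8_seq_len(const unsigned char *s, size_t len)
{
    if (len == 0)
        return 0;

    unsigned char c = s[0];
    if (c < 0x80)
        return 1;

    size_t n;
    unsigned char lo = 0x80, hi = 0xBF;  /* legal range for the 2nd byte */

    if (c >= 0xC2 && c <= 0xDF) {
        n = 2;
    } else if (c >= 0xE0 && c <= 0xEF) {
        n = 3;
        if (c == 0xE0) lo = 0xA0;        /* excludes overlong forms */
        if (c == 0xED) hi = 0x9F;        /* excludes surrogates */
    } else if (c >= 0xF0 && c <= 0xF4) {
        n = 4;
        if (c == 0xF0) lo = 0x90;        /* excludes overlong forms */
        if (c == 0xF4) hi = 0x8F;        /* excludes values > U+10FFFF */
    } else {
        return 0;
    }

    if (len < n || s[1] < lo || s[1] > hi)
        return 0;
    for (size_t k = 2; k < n; k++)
        if ((s[k] & 0xC0) != 0x80)
            return 0;
    return n;
}

/*
 * Hypothetical helper: copy name into out (capacity outsz), replacing
 * every byte that is not part of a well-formed sequence with '?'.
 * The output is truncated if it does not fit, never in mid-sequence.
 * Returns the number of bytes replaced.
 */
size_t display_name(const char *name, char *out, size_t outsz)
{
    assert(name != NULL && out != NULL);
    if (outsz == 0)
        return 0;

    const unsigned char *s = (const unsigned char *)name;
    size_t len = strlen(name), i = 0, o = 0, bad = 0;

    while (i < len) {
        size_t n = utf8_seq_len(s + i, len - i);
        if (n == 0) {
            if (o + 1 >= outsz)
                break;
            out[o++] = '?';
            i++;
            bad++;
        } else {
            if (o + n >= outsz)
                break;
            memcpy(out + o, s + i, n);
            o += n;
            i += n;
        }
    }
    out[o] = '\0';
    return bad;
}
```

The sanitized copy is only for showing to the user. Any later system call
on the file still has to use the exact bytes that readdir() returned.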

In any case, the important part is to make sure userspace stops writing
filenames in BIG5 as soon as possible. I don't know if this can be done
nicely in libc, with libc automagically transforming the BIG5 filename
in open() to UTF-8 and the UTF-8 in readdir() to BIG5 based on the
locale, or if we have to rely on every userspace app to store filenames
in UTF-8 by itself. But that's a decision for the glibc guys. It
doesn't change the fact that filenames need to start being written to the
filesystem in UTF-8 rather than other encodings, and that the only
decision the kernel has to make is whether or not to reject attempts to
create filenames which are invalid UTF-8.
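
Whichever layer ends up doing the locale-to-UTF-8 conversion, the core of
it could look roughly like the sketch below, which uses the standard
iconv(3) interface. The wrapper name name_to_utf8() and its calling
convention are hypothetical, and this is not a description of anything
glibc actually does inside open() or readdir().

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert the NUL-terminated name from the charset
 * named by `from` (for example "BIG5") into UTF-8, writing at most outsz
 * bytes including the terminating NUL.
 * Returns 0 on success, -1 on failure with errno set by iconv_open()
 * or iconv().
 */
int name_to_utf8(const char *from, const char *name, char *out, size_t outsz)
{
    assert(from != NULL && name != NULL && out != NULL);
    if (outsz == 0) {
        errno = E2BIG;
        return -1;
    }

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;             /* conversion not supported */

    /* iconv() takes char ** for the input but does not modify the bytes. */
    char *in = (char *)name;
    size_t inleft = strlen(name);
    char *dst = out;
    size_t outleft = outsz - 1;    /* reserve room for the NUL */

    size_t rc = iconv(cd, &in, &inleft, &dst, &outleft);
    int saved = errno;
    iconv_close(cd);

    if (rc == (size_t)-1) {
        errno = saved;         /* EILSEQ, EINVAL or E2BIG */
        return -1;
    }
    *dst = '\0';
    return 0;
}
```

A caller would typically take the source charset from the locale, for
instance via nl_langinfo(CODESET), and skip the conversion entirely when
that already reports UTF-8. Which charsets are available depends on the
iconv implementation installed on the system.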


Thread overview: 18+ messages
2004-02-15 23:03 JFS default behavior Nicolas Mailhot
2004-02-16  3:45 ` Jan Knutar
2004-02-16  8:30   ` Nicolas Mailhot
2004-02-16  8:54     ` Valdis.Kletnieks
2004-02-16  6:21 ` jw schultz
2004-02-16 15:55   ` Jamie Lokier
2004-02-17  6:47     ` jw schultz
2004-02-17 21:37       ` Jamie Lokier
2004-02-17 22:12         ` Linus Torvalds
2004-02-18  9:59           ` Jamie Lokier
2004-02-18 15:54             ` Linus Torvalds
2004-02-18 23:58               ` Jamie Lokier
2004-02-19 10:59 ` kernel [this message]
2004-02-19 14:05   ` JFS default behavior / UTF-8 filenames Dave Kleikamp
2004-02-19 23:47     ` kernel
2004-02-20 15:00       ` Dave Kleikamp
2004-02-22 19:22         ` kernel
2004-02-24 14:44           ` Dave Kleikamp
