From: "Norman Diamond" <ndiamond@wta.att.ne.jp>
To: <linux-kernel@vger.kernel.org>
Subject: Re: UTF-8 filenames
Date: Sun, 22 Feb 2004 21:30:50 +0900
Message-ID: <18de01c3f93f$dc6d91d0$b5ee4ca5@DIAMONDLX60>

kernel@mikebell.org wrote:

> So then, just about everyone agrees that if you've got a filename with
> non-ASCII characters, you should pass it to creat() as UTF-8. You have
> to pass it as something, individual encodings like BIG5 and EUC-JP
> are unacceptable, and UCS-4's benefits over UTF-8 (simplicity and in
> VERY rare cases storage size reductions) aren't worth the stuff it
> breaks. Correct?

Correct except for the following cases.  Unix users for more than 20 years
have been creating filenames encoded in EUC-JP or SJIS (yes sadly some Unix
systems used SJIS).  I don't know how long BIG5 and Korean filenames have
been supported in Unix but it's probably not much different.  Consider
converting all your ASCII filenames to UTF-16.  Let everyone share the
short-term pain for the long-term gain.  When you get everyone to agree on
UTF-16, it will be ugly, but it will be equal for everyone.
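
[Editorial illustration, not part of the original message.]  The size and
compatibility trade-offs discussed above can be seen by encoding one name
several ways.  This is only a sketch: the filename is a made-up example,
and the codec names are those of Python's standard library.

```python
# Illustrative only: this filename is a hypothetical example, not taken
# from the thread.
name = "日本語.txt"

# Encode the same name in two legacy Japanese encodings and two Unicode ones.
encodings = ["euc_jp", "shift_jis", "utf-8", "utf-16-le"]
encoded = {enc: name.encode(enc) for enc in encodings}

for enc, raw in encoded.items():
    # The byte sequences differ, so a name stored in one encoding
    # does not match the same name looked up in another.
    print(f"{enc:10} {len(raw):2d} bytes  {raw.hex()}")

# An all-ASCII name keeps its bytes in UTF-8 but doubles in size in UTF-16.
ascii_name = "README"
utf8_len = len(ascii_name.encode("utf-8"))
utf16_len = len(ascii_name.encode("utf-16-le"))
print(utf8_len, utf16_len)
```

The legacy encodings and UTF-8 leave the ASCII part of the name untouched,
which is why existing ASCII filenames need no conversion under UTF-8 but
would under UTF-16.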

By the way, another subthread mentioned that stty puts some stuff in the
kernel that could be done in user space.  In Unix systems the same is true
for IMEs: stty options specify the encoding of the output of an IME (e.g.
EUC-JP or SJIS, which then gets forwarded as input to shells, applications,
etc.) and whether a single backspace (or whatever the character-deletion
key is) deletes an entire input character instead of just a single byte.
I keep forgetting to check whether Linux has the same stty options.  I
haven't needed to set them with stty because when I need a different
locale I just open a new terminal emulator window using that locale.
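
[Editorial illustration, not part of the original message.]  The difference
between erasing a byte and erasing a whole input character can be sketched
as follows.  This assumes the input buffer is UTF-8; EUC-JP or SJIS would
need a different boundary rule.  The function names are hypothetical and do
not correspond to any kernel or stty interface.

```python
def erase_byte(buf: bytes) -> bytes:
    """Naive erase: drop the last byte, whatever it belongs to."""
    return buf[:-1]


def erase_utf8_char(buf: bytes) -> bytes:
    """Erase one whole character, assuming buf holds valid UTF-8.

    UTF-8 continuation bytes have the bit pattern 10xxxxxx, so step back
    over them until the byte that starts the character is reached.
    """
    i = len(buf) - 1
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return buf[:max(i, 0)]


line = "aあ".encode("utf-8")      # b'a' followed by a three-byte character
print(erase_byte(line))          # leaves a truncated, invalid sequence
print(erase_utf8_char(line))     # leaves just b'a'
```

With the naive version, one backspace leaves two stray bytes of the
multibyte character in the buffer, which is the behaviour the stty options
described above are meant to avoid.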

I don't have time even to follow all of this thread, so if anyone has
questions then CC me personally.  I don't know if I'll have time to answer
either, but I'll try.

> As I see it, there's no way for the kernel to deal with all the legacy
> filenames out there. There's no way the kernel can magically fix them.

That's true.  Some options of mount and some options of stty can be moved to
user space, but they will always need to be available.

By the way, in Windows 98 it's really neat to share a disk folder across the
network and let clients with different code pages create files.  The host
where the folder is stored can't even delete some of the files that get
created.

