public inbox for linux-kernel@vger.kernel.org
* Re: UTF-8 filenames
@ 2004-02-22 12:30 Norman Diamond
  2004-02-22 20:45 ` Jamie Lokier
  0 siblings, 1 reply; 9+ messages in thread
From: Norman Diamond @ 2004-02-22 12:30 UTC (permalink / raw)
  To: linux-kernel

kernel@mikebell.org wrote:

> So then, just about everyone agrees that if you've got a filename with
> non-ASCII characters, you should pass it to creat() as UTF-8. You have
> to pass it as something, individual encodings like BIG5 and EUC-JP
> are unacceptable, and UCS-4's benefits over UTF-8 (simplicity and in
> VERY rare cases storage size reductions) aren't worth the stuff it
> breaks. Correct?

Correct except for the following cases.  Unix users for more than 20 years
have been creating filenames encoded in EUC-JP or SJIS (yes sadly some Unix
systems used SJIS).  I don't know how long BIG5 and Korean filenames have
been supported in Unix but it's probably not much different.  Consider
converting all your ASCII filenames to UTF-16.  Let everyone share the
short-term pain for the long-term gain.  When you get everyone to agree on
UTF-16, it will be ugly, but it will be equal for everyone.
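To make the encoding point concrete, here is a minimal Python sketch (not part of the original message) showing that one and the same name becomes a different byte string under each encoding. The sample name and the helper `encoded_forms` are hypothetical, chosen only for illustration.

```python
# Hypothetical example name; any non-ASCII name would do.
name = "日本語.txt"

def encoded_forms(text):
    """Return the byte string each encoding would produce for the name."""
    encodings = ("utf-8", "euc_jp", "shift_jis", "utf-16-le")
    return {enc: text.encode(enc) for enc in encodings}

forms = encoded_forms(name)
for enc, raw in forms.items():
    # Print the encoding, the byte count, and the raw bytes in hex.
    print(f"{enc:<10} {len(raw):>2} bytes  {raw.hex()}")

# An all-ASCII name encoded as UTF-16 contains zero bytes, which a
# NUL-terminated path string cannot carry.
ascii_utf16 = "README".encode("utf-16-le")
print(ascii_utf16)
```

The EUC-JP and SJIS forms have the same length but different bytes, so a name written under one locale shows up as garbage under the other unless something converts it.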

By the way, another subthread mentioned that stty puts some stuff in the
kernel that could be done in user space.  In Unix systems the same is true
for IMEs: stty options specify the encoding of the output of an IME (e.g.
EUC-JP or SJIS, which then gets forwarded as input to shells, applications,
etc.), whether a single backspace (or whatever the character-deletion
character is) deletes an entire input character instead of just a
single byte, and so on.  I keep forgetting to check whether Linux has the
same stty options.  I haven't needed to set them with stty because if I need to use a
different locale then I just open a new terminal emulator window using that
locale.
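The difference between deleting a byte and deleting a character can be sketched as follows. This is an illustration only, not the code of any real terminal driver. It assumes UTF-8, because UTF-8 continuation bytes are easy to recognise; EUC-JP or SJIS would need encoding-specific logic. Both function names are hypothetical.

```python
def erase_last_byte(buf: bytes) -> bytes:
    """Naive erase: drop exactly one byte from the input buffer."""
    return buf[:-1]

def erase_last_utf8_char(buf: bytes) -> bytes:
    """Character-aware erase: drop the last whole UTF-8 character."""
    i = len(buf)
    while i > 0:
        i -= 1
        # Continuation bytes look like 10xxxxxx; stop at the first
        # byte that is not one, which starts the character.
        if (buf[i] & 0xC0) != 0x80:
            break
    return buf[:i]

line = "aあ".encode("utf-8")            # b'a\xe3\x81\x82'
print(erase_last_byte(line))            # leaves a truncated character
print(erase_last_utf8_char(line))       # leaves just the leading 'a'
```

After the naive erase the buffer no longer decodes as UTF-8, which is the kind of breakage a per-character erase setting is meant to avoid.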

I don't have time even to follow all of this thread, so if anyone has
questions then CC me personally.  I don't know if I'll have time to answer
either, but I'll try.

> As I see it, there's no way for the kernel to deal with all the legacy
> filenames out there. There's no way the kernel can magically fix them.

That's true.  Some options of mount and some options of stty can be moved to
user space, but they will always need to be available.

By the way, in Windows 98 it's really neat to share a disk folder across the
network and let clients with different code pages create files.  The host
where the folder is stored can't even delete some of the files that get
created.




Thread overview: 9+ messages
2004-02-22 12:30 UTF-8 filenames Norman Diamond
2004-02-22 20:45 ` Jamie Lokier
2004-02-22 23:35   ` Norman Diamond
2004-02-23  6:10     ` Robin Rosenberg
2004-02-23 11:34       ` Norman Diamond
2004-02-23 12:15         ` Robin Rosenberg
2004-02-23 12:00     ` Jamie Lokier
2004-02-23 23:42       ` Norman Diamond
2004-02-24  0:02         ` Jamie Lokier
