From: hpa@zytor.com (H. Peter Anvin)
To: linux-kernel@vger.kernel.org
Subject: Re: UTF-8 and case-insensitivity
Date: Wed, 18 Feb 2004 02:37:22 +0000 (UTC)
Message-ID: <c0uj52$3mg$1@terminus.zytor.com>
In-Reply-To: <16434.41376.453823.260362@samba.org>

Followup to:  <16434.41376.453823.260362@samba.org>
By author:    tridge@samba.org
In newsgroup: linux.dev.kernel
> 
>  > I don't know how Windows does it, so maybe this thing is hardcoded, and 
>  > you don't even want "true" case insensitivity. 
> 
> NTFS has a 128k table on disk, created at mkfs time and indexed by the
> UCS2 character.

So you're hosed if anyone uses characters outside the UCS-2 character
set...
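
To make the limitation concrete, here is a minimal Python sketch.  It
assumes the table holds one 16-bit uppercase mapping per UCS-2 code
unit, which is what gives 65536 * 2 bytes = 128k; that layout is an
assumption for illustration, not something read from a real volume.
The names build_upcase_table and upcase are hypothetical.

```python
import array

BMP_SIZE = 0x10000  # number of distinct 16-bit UCS-2 code units


def build_upcase_table():
    """Build a stand-in table: entry i holds the uppercase of code unit i.

    Assumed layout only. Python's own case data is used to fill it, so
    the contents will not match any particular on-disk table.
    """
    table = array.array("H", range(BMP_SIZE))  # identity mapping by default
    for cp in range(BMP_SIZE):
        up = chr(cp).upper()
        # Keep only simple one-to-one mappings that still fit in 16 bits.
        if len(up) == 1 and ord(up) < BMP_SIZE:
            table[cp] = ord(up)
    return table


def upcase(table, ch):
    """Return the table's uppercase for ch, or None if ch cannot index it."""
    cp = ord(ch)
    if cp >= BMP_SIZE:
        return None  # outside UCS-2: the table has no slot for it
    return chr(table[cp])


table = build_upcase_table()
table_bytes = len(table) * table.itemsize
```

A character such as U+10428 has a perfectly good uppercase form
(U+10400), but a table indexed by a 16-bit value has nowhere to put it,
so upcase() can only report that it has no answer.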

> The interesting thing about this table is that it doesn't seem to
> vary between different locales as one might expect. I have checked 3
> locales so far (Swedish, Japanese and English) and all have the same
> 128k table. I should check a few more locales to see if it really is
> the same everywhere. Contact me off-list if you have an NTFS
> filesystem created in a different locale and would be willing to run
> a test program against it to see if the table is different from the
> one we have in Samba.

There is a "standard" table, which is published by the Unicode
Consortium.  However, the "standard" table isn't what you want in
certain locales, e.g. Turkish, where "i" pairs with dotted "İ" and
dotless "ı" pairs with "I".
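
The Turkish case can be sketched in a few lines of Python.  The
default_upper helper just uses Python's locale-independent mapping; the
turkish_upper and turkish_lower helpers are hypothetical and hand-code
only the dotted/dotless i rule, so this is an illustration of the
mismatch, not a complete locale implementation.

```python
DOTTED_CAPITAL_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I


def default_upper(s):
    """Locale-independent uppercase, as in the default Unicode mapping."""
    return s.upper()


def turkish_upper(s):
    """Uppercase with the Turkish rule: i -> dotted capital, dotless -> I."""
    s = s.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return s.upper()


def turkish_lower(s):
    """Lowercase with the Turkish rule: I -> dotless small, dotted -> i."""
    s = s.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i")
    return s.lower()


name = "title"
same_under_default = default_upper(name) == "TITLE"
same_under_turkish = turkish_upper(name) == "TITLE"
```

Under the default mapping "title" and "TITLE" are the same name; under
the Turkish rule they are not, so a single table fixed at mkfs time
gives the wrong answer for somebody.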

> There is stuff in the charset handling of every locale that does vary
> in Windows, but it isn't the case table, it's the "valid characters"
> map used to determine what characters are allowed when converting
> strings into legacy multi-byte encodings. Even I don't think that the
> kernel will ever have to deal with that crap unless someone is foolish
> enough to port Samba into the kernel (several people have actually
> done that despite the insanity of the idea, but they all did an
> absolutely terrible job of it and certainly didn't take care to get
> all the charset handling right).
> 
> > How "correct" is Windows?
> 
> From my rather limited point of view I always have to assume that
> Windows is "correct", unless I can show that its behaviour leads to
> data loss, a security hole or something equally extreme.

Well, we don't want to support a bunch of hacks to make it behave like
Windows if what Windows does doesn't make sense.  In that case you
should use a metalayer that canonicalizes the filenames: don't store
"Makefile" on the disk; store "makefile" and keep the "real" filename
stashed elsewhere, perhaps in an EA (extended attribute).
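
A rough sketch of such a metalayer, in Python and entirely in memory:
the directory is keyed by the canonical name, and the name the user
typed is kept alongside as if it were an EA.  The class and function
names are hypothetical, and str.lower() stands in for whatever fixed
folding table the layer would actually choose.

```python
def canonical(name):
    """Fold a filename to its canonical stored form (assumed: lowercase)."""
    return name.lower()


class FoldingDirectory:
    """Toy directory that stores canonical names and remembers real ones."""

    def __init__(self):
        # canonical name -> {"real_name": ..., "data": ...}
        # "real_name" plays the role of the extended attribute.
        self._entries = {}

    def create(self, name, data=b""):
        key = canonical(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = {"real_name": name, "data": data}

    def lookup(self, name):
        """Find an entry by any spelling that folds to the same key."""
        return self._entries[canonical(name)]

    def stored_names(self):
        """Names as they would appear on disk."""
        return sorted(self._entries)

    def listdir(self):
        """Names as the user should see them, taken from the stashed copy."""
        return sorted(e["real_name"] for e in self._entries.values())


d = FoldingDirectory()
d.create("Makefile", b"all:\n")
```

With this, d.lookup("MAKEFILE") finds the entry, d.stored_names()
shows "makefile", and d.listdir() still reports "Makefile".  The
underlying store only ever sees exact, case-sensitive names.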

	-hpa

