public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: tridge@samba.org
To: "Robert White" <rwhite@casabyte.com>
Cc: <linux-kernel@vger.kernel.org>,
	"'Linus Torvalds'" <torvalds@osdl.org>,
	"'Al Viro'" <viro@parcelfarce.linux.theplanet.co.uk>,
	"'Neil Brown'" <neilb@cse.unsw.edu.au>
Subject: RE: UTF-8 and case-insensitivity
Date: Wed, 18 Feb 2004 13:48:56 +1100	[thread overview]
Message-ID: <16434.53912.10089.216436@samba.org> (raw)
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA2ZSI4XW+fk25FhAf9BqjtMKAAAAQAAAABSIByOpdrUqd8yc+ZmEhqQEAAAAA@casabyte.com>

Robert,

Just about everything in your posting is either years out of date or
just totally wrong. 

 > OK, so I wrote the below, but then in the summary I realized that there was
 > a significant factor that doesn't fit in with the rest of the post.  Case
 > insensitivity, and more generally locale equivalence rules, is a security
 > nightmare.  Consider the number of different file names that "su" could map
 > to if you apply case insensitivity (4) and/or worse yet the various accents
 > and umlats (?,etc) that sort-equivalent for "u" in some locales.  The user
 > types "su" and runs "S(u-umlat)" etc. 

This is no different from the "stupid admin puts . in $PATH"
problem. Simple solutions:

 1) don't mount your root filesystem with case insensitive naming
 2) use a sane $PATH
 3) don't allow untrusted users to create files in your $PATH
 4) don't run bash in case insensitive mode if you can't for some
    you can't do (1) or (2) or (3)

any of (1), (2) or (3) solves this. 


 > In point of fact the entire windows application space has a
 > singular active locale at any one time and there is a well-defined
 > but horrible layer of indirection where "long names" like "My
 > Documents" become "real names" like "MYDOCU~1".  Essentially every
 > windows file name is subject to a double-indirect file name
 > translation.  The first pass is the strcasecmp() locale-dependent
 > traversal of the "long name" list.  The second is the strcasecmp()
 > frozen-locale-spec-dependent traversal of (US Latin?) 8.3 file
 > naming standard list of media elements (files/directories).

this is just total crap. That might have been true for msdos and even
possibly win9x, but its totally untrue for NTFS. There are enough
stupidities in windows without having to invent more.

NTFS is case insensitive at the filesystem level. In fact, its
selectable whether its case sensitive or case insensitive per-process
(a process can switch between the two models). The case mapping table
is built into the filesystem itself. That mapping has absolutely
*zero* to do with US Latin or any other legacy multi-byte encoding.

What you have done is the equivalent of stating that Linux can only do
14 character filenames, because once upon a time Linux had a
filesystem called minix. We've moved beyond that and so has windows. 

 > In point of fact, Windows is *not* "properly" case insensitive at the file
 > system level.  Use "dir /x" more often on your windows box to relive the
 > experience.  The "real" file names are mangled to good old 8.3 uppercase
 > internally(1).  You don't usually have to think about this, but if you have
 > ever lost the long-to-short file name mapping on a drive you know the hell
 > that ensues.  (see also iso9660.)

again, this is just complete crap. NTFS has had the ability to
completely disable 8.3 "alternative name" support for ages. Microsoft
is even starting to use this switch in their published benchmark
results, and I suspect it will become the default in a couple of
years. 

We've been through the same transition in Samba:

  - Samba 0.x only supported 8.3
  - Samba 1.x was oriented towards 8.3, but also supported long names
  - Samba 2.x and 3.x is oriented towards long names, and can disable 8.3
    names to some extent
  
by the time Samba 4.x comes out (I am working on it now) we may see a
significant number of sites disabling 8.3 completely. 

 > The thing is, to match this ersatz "functionality" on a system where more
 > than one locale may be used at the same time, you end up with a kind of
 > Cartesian product of user locales and filesystem native locales.  The cost
 > could get extreme and can only really be amortized if Linux were to declare
 > our own 8.3 style pronouncement for the character classes used for the
 > "real" file name storage (etc).

you are *way* out of date here. All recent windows apps use the UCS-2
interfaces which provides a single charset encoding across all
locales. I've heard that they may be redefining this as UCS-16 to
allow for an even larger range of characters, although I haven't seen
this popping up on the wire yet (then again, I just might not have
noticed). I wish they had chosen UTF-8 instead of UCS-2, but at least
they chose something and got it into every part of the OS years ago.

 > Late stage case insensitivity isn't that hard to put in a linux application,
 > just crack open your file selection dialog boxes and have them use
 > strcasecmp() in all their select/sort logic.  Also then replace open() with
 > CaseOpen() which does a find/search operation before daring to
 > creat().

Have you read *any* of what I've been saying about how expensive this is??

 > That is, in every practical way, how Windows handles these problems.  It
 > just happens in some fairly interesting and hard-to-predict places depending
 > on context.

No, that is *not* how current versions of windows do things. 

 > So what was I saying... Oh yea...
 > 
 > -- Single Locale storage standard required to prevent multiplicative cost.

windows has this. Linux doesn't.

 > -- Not that hard to fake case insensitivity "when necessary".

ditto

 > -- Cheaper in CPU/Space to mix case.

ditto

 > -- Native file names in native locales simplifies administration and
 > expectations. (not elaborated above, but true.)

?? single locale storage makes this just a no-op

 > -- Case insensitivity and locale equivalence leads to uncertainties about
 > what/which file may be intended in a given context, which could often lead
 > to exploitable error.

and that is just a complete load of crap. Windows has had exploitable
bugs due to case insensitivity, but the cause was things like leaving
directories in the search path writeable by unprivileged users. It was
*not* due to anything fundamentally insecure about case-insensitive
names in filesystems. 

 > (1) The actual truth is a tad uglier than this, the media can have the 8.3
 > names stored in interesting ways, but essentially a "toupper()" is done on
 > every file name as it is retrieved and processed.  This cuts out a lot of
 > possibilities and leads to a lot of "tildes of shame" in even some of the
 > more harmless seeming name conflicts.

oh i get it, you're just a troll ....

Cheers, Tridge

  parent reply	other threads:[~2004-02-18  2:49 UTC|newest]

Thread overview: 135+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-02-17  4:12 UTF-8 and case-insensitivity tridge
2004-02-17  5:11 ` Linus Torvalds
2004-02-17  6:54   ` tridge
2004-02-17  8:33     ` Neil Brown
2004-02-17 22:48       ` tridge
2004-02-18  0:06         ` Neil Brown
2004-02-18  9:47           ` Helge Hafting
2004-02-17 15:13     ` Linus Torvalds
2004-02-17 16:57       ` Linus Torvalds
2004-02-17 19:44         ` viro
2004-02-17 20:10           ` Linus Torvalds
2004-02-17 20:17             ` viro
2004-02-17 20:23               ` Linus Torvalds
2004-02-17 21:08         ` Robin Rosenberg
2004-02-17 21:17           ` Linus Torvalds
2004-02-17 22:27             ` Robin Rosenberg
2004-02-18  3:02               ` tridge
2004-02-17 23:57         ` tridge
2004-02-17 23:20       ` tridge
2004-02-17 23:43         ` Linus Torvalds
2004-02-18  3:26           ` tridge
2004-02-18  5:33             ` H. Peter Anvin
2004-02-18  7:54             ` Marc Lehmann
2004-02-18  2:37         ` H. Peter Anvin
2004-02-18  3:03           ` Linus Torvalds
2004-02-18  3:14             ` H. Peter Anvin
2004-02-18  3:27               ` Linus Torvalds
2004-02-18 21:31                 ` tridge
2004-02-18 22:23                   ` Linus Torvalds
2004-02-18 22:28                     ` Linus Torvalds
2004-02-18 22:50                       ` tridge
2004-02-18 22:59                         ` Linus Torvalds
2004-02-18 23:09                           ` tridge
2004-02-18 23:16                             ` Linus Torvalds
2004-02-19  8:10                               ` Jamie Lokier
2004-02-19 16:09                                 ` Linus Torvalds
2004-02-19 16:38                                   ` Jamie Lokier
2004-02-19 16:54                                     ` Linus Torvalds
2004-02-19 18:29                                       ` Jamie Lokier
2004-02-19 19:48                                         ` Eureka! (was Re: UTF-8 and case-insensitivity) Linus Torvalds
2004-02-19 19:51                                           ` Linus Torvalds
2004-02-19 19:48                                             ` H. Peter Anvin
2004-02-19 20:04                                               ` Linus Torvalds
2004-02-19 20:05                                           ` viro
2004-02-19 20:23                                             ` Linus Torvalds
2004-02-19 20:32                                               ` Linus Torvalds
2004-02-19 20:45                                                 ` viro
2004-02-19 21:26                                                   ` Linus Torvalds
2004-02-19 21:38                                                     ` Linus Torvalds
2004-02-19 21:45                                                     ` Linus Torvalds
2004-02-19 21:43                                                       ` viro
2004-02-19 21:53                                                         ` Linus Torvalds
2004-02-19 22:21                                                           ` David Lang
2004-02-19 20:48                                                 ` Jamie Lokier
2004-02-19 21:30                                                   ` Linus Torvalds
2004-02-20  0:00                                                     ` Jamie Lokier
2004-02-20  0:17                                                       ` Linus Torvalds
2004-02-20  0:24                                                         ` Linus Torvalds
2004-02-20  0:30                                                           ` Trond Myklebust
2004-02-20  0:54                                                           ` Jamie Lokier
2004-02-20  0:57                                                           ` tridge
2004-02-20  1:07                                                           ` Paul Wagland
2004-02-20 13:31                                                           ` Chris Wedgwood
2004-02-20  0:46                                                         ` Jamie Lokier
2004-02-23 10:13                                                           ` Tim Connors
2004-02-20  1:39                                                     ` Junio C Hamano
2004-02-20 12:54                                                       ` Jamie Lokier
2004-02-19 23:37                                           ` tridge
2004-02-20  0:02                                             ` Linus Torvalds
2004-02-20  0:16                                               ` tridge
2004-02-20  0:37                                                 ` Linus Torvalds
2004-02-20  1:26                                                   ` tridge
2004-02-20  1:07                                               ` H. Peter Anvin
2004-02-20  2:30                                           ` Theodore Ts'o
2004-02-20 12:04                                           ` explicit dcache <-> user-space cache coherency, sys_mark_dir_clean(), O_CLEAN Ingo Molnar
2004-02-20 13:19                                             ` Jamie Lokier
2004-02-20 13:37                                               ` Ingo Molnar
2004-02-20 14:00                                                 ` Ingo Molnar
2004-02-20 16:31                                                 ` Jamie Lokier
2004-02-20 13:23                                             ` [patch] " Ingo Molnar
2004-02-20 18:00                                               ` viro
2004-02-20 15:41                                             ` Linus Torvalds
2004-02-20 17:04                                               ` Ingo Molnar
2004-02-20 17:19                                                 ` Linus Torvalds
2004-02-20 18:48                                                   ` Ingo Molnar
2004-02-21  1:44                                                     ` Jamie Lokier
2004-02-21  7:58                                                     ` Ingo Molnar
2004-02-21  8:04                                                       ` viro
2004-02-21 17:46                                                         ` Ingo Molnar
2004-02-21 18:15                                                         ` Linus Torvalds
2004-02-21  8:26                                                       ` Keith Owens
2004-02-23 10:59                                                       ` Pavel Machek
2004-02-23 13:55                                                         ` Jamie Lokier
2004-02-23 16:45                                                           ` Ingo Molnar
2004-02-23 17:32                                                             ` Jamie Lokier
2004-02-20 23:00                                                   ` tridge
2004-02-20 17:33                                               ` Jamie Lokier
2004-02-20 18:22                                                 ` Linus Torvalds
2004-02-21  0:38                                                   ` Jamie Lokier
2004-02-21  1:10                                                     ` Linus Torvalds
2004-02-21  3:01                                                       ` Jamie Lokier
2004-02-20 17:47                                               ` Jamie Lokier
2004-02-20 20:38                                             ` Christer Weinigel
2004-02-22 15:07                                               ` Jamie Lokier
2004-02-22 16:55                                                 ` Miquel van Smoorenburg
2004-02-19 19:08                                       ` UTF-8 and case-insensitivity Helge Hafting
2004-02-18  4:08           ` tridge
2004-02-18 10:05             ` Robin Rosenberg
2004-02-18 11:43               ` tridge
2004-02-18 12:31                 ` Robin Rosenberg
2004-02-18 16:48                   ` H. Peter Anvin
2004-02-18 20:00                     ` H. Peter Anvin
2004-02-19  2:53   ` Daniel Newby
2004-02-17  5:25 ` Tim Connors
2004-02-17  7:43 ` H. Peter Anvin
2004-02-17  8:05   ` H. Peter Anvin
2004-02-17 14:25 ` Dave Kleikamp
2004-02-18  0:16 ` Robert White
2004-02-18  0:20   ` Linus Torvalds
2004-02-18  1:03     ` Robert White
2004-02-18 21:48     ` Ville Herva
2004-02-18  2:48   ` tridge [this message]
2004-02-18 20:56     ` Robert White
     [not found] <fa.epf5o9k.1rkudgo@ifi.uio.no>
     [not found] ` <fa.idvvhjl.1jge92d@ifi.uio.no>
2004-02-18  1:09   ` Andy Lutomirski
     [not found] <1q4Si-658-5@gated-at.bofh.it>
     [not found] ` <1q7no-8ss-7@gated-at.bofh.it>
     [not found]   ` <1qfb7-7s5-19@gated-at.bofh.it>
     [not found]     ` <1qmPm-6Gl-11@gated-at.bofh.it>
     [not found]       ` <1qpWI-1Sa-1@gated-at.bofh.it>
     [not found]         ` <1qqpO-2lx-3@gated-at.bofh.it>
     [not found]           ` <1qqzv-2tr-3@gated-at.bofh.it>
     [not found]             ` <1qqJc-2A2-5@gated-at.bofh.it>
     [not found]               ` <1qHAR-2Wm-49@gated-at.bofh.it>
     [not found]                 ` <1qIwr-5GB-11@gated-at.bofh.it>
     [not found]                   ` <1qIwr-5GB-9@gated-at.bofh.it>
     [not found]                     ` <1qIQ1-5WR-27@gated-at.bofh.it>
     [not found]                       ` <1qIZt-6b9-11@gated-at.bofh.it>
     [not found]                         ` <1qJsF-6Be-45@gated-at.bofh.it>
2004-02-19  0:06                           ` Pascal Schmidt
2004-02-19  1:01                             ` tridge
2004-02-19  1:08                               ` Hua Zhong
2004-02-19  1:46                                 ` tridge
2004-02-19  2:44                               ` Theodore Ts'o
2004-02-19  3:20                                 ` tridge
2004-02-19 10:18                                   ` Helge Hafting
2004-02-19 12:11                                   ` Paulo Marques
2004-02-19 19:04                                     ` Helge Hafting
2004-02-19 14:08                                   ` Theodore Ts'o
2004-02-19 20:12                                   ` Robert White

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=16434.53912.10089.216436@samba.org \
    --to=tridge@samba.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@cse.unsw.edu.au \
    --cc=rwhite@casabyte.com \
    --cc=torvalds@osdl.org \
    --cc=viro@parcelfarce.linux.theplanet.co.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox