Re: UTF-8 and case-insensitivity

public inbox for linux-kernel@vger.kernel.org

* Re: UTF-8 and case-insensitivity
       [not found]                         ` <1qJsF-6Be-45@gated-at.bofh.it>
@ 2004-02-19  0:06                           ` Pascal Schmidt
  2004-02-19  1:01                             ` tridge
  0 siblings, 1 reply; 69+ messages in thread
From: Pascal Schmidt @ 2004-02-19  0:06 UTC (permalink / raw)
  To: tridge; +Cc: linux-kernel

On Thu, 19 Feb 2004 00:40:21 +0100, you wrote in linux.kernel:

> Because a large number of file operations are on filenames that don't
> exist. I have to *prove* they don't exist. That includes:

Evil question: do you need to be case-preserving? 'Cause if not, you
could simply squash all incoming filenames from case-insensitive clients
to some canonical form (say, all lower-case) and use that.

-- 
Ciao,
Pascal


* Re: UTF-8 and case-insensitivity
  2004-02-19  0:06                           ` UTF-8 and case-insensitivity Pascal Schmidt
@ 2004-02-19  1:01                             ` tridge
  2004-02-19  1:08                               ` Hua Zhong
  2004-02-19  2:44                               ` Theodore Ts'o
  0 siblings, 2 replies; 69+ messages in thread
From: tridge @ 2004-02-19  1:01 UTC (permalink / raw)
  To: Pascal Schmidt; +Cc: linux-kernel

Pascal,

 > Evil question: do you need to be case-preserving? 'Cause if not, you
 > could simply squash all incoming filenames from case-insensitive clients
 > to some canonical form (say, all lower-case) and use that.

yes, we have to be case preserving, but that's not the problem. Keeping
some name mapping in user space or xattrs is tedious but conceptually
easy and potentially quite efficient.

The problem is that Samba isn't the only program to be accessing these
directories. Multi-protocol file servers and file servers where users
also have local access are common. That means we can't assume that
some other filesystem user hasn't created a file which matches in a
case-insensitive manner. That means we need to do an awful lot of
directory scans.

I also understand the decision Linus has made that we won't be doing
anything fundamental at the filesystem level to fix this, so we will
just have to live with it. 

Cheers, Tridge


* RE: UTF-8 and case-insensitivity
  2004-02-19  1:01                             ` tridge
@ 2004-02-19  1:08                               ` Hua Zhong
  2004-02-19  1:46                                 ` tridge
  2004-02-19  2:44                               ` Theodore Ts'o
  1 sibling, 1 reply; 69+ messages in thread
From: Hua Zhong @ 2004-02-19  1:08 UTC (permalink / raw)
  To: tridge, 'Pascal Schmidt'; +Cc: linux-kernel

> The problem is that Samba isn't the only program to be accessing these
> directories. Multi-protocol file servers and file servers where users
> also have local access are common. That means we can't assume that
> some other filesystem user hasn't created a file which matches in a
> case-insensitive manner. That means we need to do an awful lot of
> directory scans.

Do you also require NFSD or other file daemons to do the same
case-insensitivity check? Say you create a foo, how do you prevent NFSD
from creating FOO? What could you do about that?




* RE: UTF-8 and case-insensitivity
  2004-02-19  1:08                               ` Hua Zhong
@ 2004-02-19  1:46                                 ` tridge
  0 siblings, 0 replies; 69+ messages in thread
From: tridge @ 2004-02-19  1:46 UTC (permalink / raw)
  To: hzhong; +Cc: 'Pascal Schmidt', linux-kernel

Hua,

 > Do you also require NFSD or other file daemons to do the same
 > case-insensitivity check?

no. That's the point of the per-process check. Only Samba needs to pay
the price.

 > Say you create a foo, how do you prevent NFSD from creating FOO?
 > What could you do about that?

You don't need to do anything in particular about it. I did explain
this earlier in this thread, but here goes again:

 * samba always tries the name exactly as given by the client. If that
   succeeds then we are done. 

 * if it doesn't find an exact match then it does a directory scan. It
   uses the first case-insensitive matching name it finds, or if it
   reaches the end of the directory then it concludes that the file
   doesn't exist.

So if FOO and foo both exist in the filesystem, and someone asks for
FoO then it's pretty much random which one they get (ok, not exactly
random, but close enough for this argument). The thing is that just
making an arbitrary choice is a perfectly fine set of semantics. You
can't deal with this situation any more sanely, so don't even try.
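
The two steps above can be sketched in C roughly as follows. This is an
illustration only, not Samba's actual code: the helper name ci_lookup() is
hypothetical, and strcasecmp() only folds case for ASCII in the default
locale, so a real implementation would need a proper UTF-8 aware compare.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sys/stat.h>

/* Hypothetical helper, not Samba's real implementation.
 * Returns 1 and copies the on-disk name into `found` if `name` exists in
 * `dir` (exactly, or failing that case-insensitively), 0 if the whole
 * directory was scanned without a match, and -1 on error. */
int ci_lookup(const char *dir, const char *name, char *found, size_t len)
{
    char path[4096];
    struct stat st;
    struct dirent *de;
    DIR *d;
    int rc = 0;

    /* Step 1: try the name exactly as the client gave it. */
    if (snprintf(path, sizeof(path), "%s/%s", dir, name) >= (int)sizeof(path))
        return -1;
    if (lstat(path, &st) == 0) {
        snprintf(found, len, "%s", name);
        return 1;
    }

    /* Step 2: scan the directory and take the first case-insensitive match. */
    d = opendir(dir);
    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (strcasecmp(de->d_name, name) == 0) {
            snprintf(found, len, "%s", de->d_name);
            rc = 1;
            break;
        }
    }
    closedir(d);
    return rc; /* 0 here means "proved absent", at the cost of a full scan */
}
```

Note that the expensive path is the negative one: a return of 0 is only
reached after readdir() has walked every entry in the directory.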

well, actually, there is something you could do that we don't do. We
could have some special marker that distinguishes files created by
windows clients and files created by unix clients, and preferentially
return the one created by windows clients. I just don't think this is
worth doing. Nobody has even complained (within earshot of me anyway)
of the current "pick one" method.

Cheers, Tridge


* Re: UTF-8 and case-insensitivity
  2004-02-19  1:01                             ` tridge
  2004-02-19  1:08                               ` Hua Zhong
@ 2004-02-19  2:44                               ` Theodore Ts'o
  2004-02-19  3:20                                 ` tridge
  1 sibling, 1 reply; 69+ messages in thread
From: Theodore Ts'o @ 2004-02-19  2:44 UTC (permalink / raw)
  To: tridge; +Cc: Pascal Schmidt, linux-kernel

On Thu, Feb 19, 2004 at 12:01:53PM +1100, tridge@samba.org wrote:
> The problem is that Samba isn't the only program to be accessing these
> directories. Multi-protocol file servers and file servers where users
> also have local access are common. That means we can't assume that
> some other filesystem user hasn't created a file which matches in a
> case-insensitive manner. That means we need to do an awful lot of
> directory scans.

Actually, not necessarily.  What if Samba gets notifications of all
filename renames and creates in the directory, so that after the
initial directory scan, it can keep track of what filenames are
present in the directory?  It can then "prove the negative", as you
put it, without having to continuously do directory scans.

Yeah, there can be some race conditions, but Samba already has to deal
with the race condition where it tries to create "MaKeFiLe" either
just before or just after a Posix process creates "Makefile".  

						- Ted


* Re: UTF-8 and case-insensitivity
  2004-02-19  2:44                               ` Theodore Ts'o
@ 2004-02-19  3:20                                 ` tridge
  2004-02-19 10:18                                   ` Helge Hafting
                                                     ` (3 more replies)
  0 siblings, 4 replies; 69+ messages in thread
From: tridge @ 2004-02-19  3:20 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Pascal Schmidt, linux-kernel

Ted,

 > Actually, not necessarily.  What if Samba gets notifications of all
 > filename renames and creates in the directory, so that after the
 > initial directory scan, it can keep track of what filenames are
 > present in the directory?  It can then "prove the negative", as you
 > put it, without having to continuously do directory scans.

Currently dnotify doesn't give you the filename that is being
added/deleted/renamed. It just tells you that something has happened,
but not enough to actually maintain a name cache in user space.

That could be changed, so that on a dnotify event you do a fcntl() to
ask for the name of the file. Or perhaps we could cram it into the
structure the signal handler gets passed? I doubt that would make
sense, but maybe some signal guru can tell me otherwise. Maybe we
could even invent a new dnotify system where you do a read on a file
descriptor to get details on what event happened, and give some
"everything has changed" error when you run out of buffers.

If that happened then we could build our own dcache in user space, but
it will be a very second rate dcache, with a racy and slow update
mechanism that will in itself chew cpu. Maybe that's the best we can
do, or maybe I should be asking distro vendors if they would consider
a case-insensitive patch, especially the vendors aiming for
"enterprise" scalability which might include serving windows clients.

 > Yeah, there can be some race conditions, but Samba already has to deal
 > with the race condition where it tries to create "MaKeFiLe" either
 > just before or just after a Posix process creates "Makefile".  

yes, that's true.

The races aren't my primary concern really. I've spent the last week
doing profiling of a large Samba install, and after fixing a
horrendous scalability problem to do with fcntl locking (more on that
later) the next thing on the profile is stat() and directory
scans. That's why the efficiency of this stuff is a hot topic for me
right now.

It's not all as bleak as perhaps I make it seem though. I suspect
there is still quite a bit of improvement that can be made in Samba
just because our code is so messy that sometimes we do a stat() call
or a directory scan when perhaps we can prove that we don't need
to. The Samba4 code is much cleaner, and maybe we have room to keep
improving things for a couple of years by finding those inefficiencies
and fixing them. We will eventually hit a wall, but it could be a fair
way off. Maybe windows will be dead by then.

Cheers, Tridge


* Re: UTF-8 and case-insensitivity
  2004-02-19  3:20                                 ` tridge
@ 2004-02-19 10:18                                   ` Helge Hafting
  2004-02-19 12:11                                   ` Paulo Marques
                                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 69+ messages in thread
From: Helge Hafting @ 2004-02-19 10:18 UTC (permalink / raw)
  To: tridge; +Cc: Theodore Ts'o, Pascal Schmidt, linux-kernel

tridge@samba.org wrote:
> Ted,
> 
>  > Actually, not necessarily.  What if Samba gets notifications of all
>  > filename renames and creates in the directory, so that after the
>  > initial directory scan, it can keep track of what filenames are
>  > present in the directory?  It can then "prove the negative", as you
>  > put it, without having to continuously do directory scans.
> 
> Currently dnotify doesn't give you the filename that is being
> added/deleted/renamed. It just tells you that something has happened,
> but not enough to actually maintain a name cache in user space.
> 
You can still keep per-directory caches that you simply invalidate on each dnotify,
and rebuild when necessary.  At least it would help the "repeated
lookup of nonexistent filenames" case.  
Path searches for executables usually happen on directories that don't 
see much writing.

Helge Hafting



* Re: UTF-8 and case-insensitivity
  2004-02-19  3:20                                 ` tridge
  2004-02-19 10:18                                   ` Helge Hafting
@ 2004-02-19 12:11                                   ` Paulo Marques
  2004-02-19 19:04                                     ` Helge Hafting
  2004-02-19 14:08                                   ` Theodore Ts'o
  2004-02-19 20:12                                   ` Robert White
  3 siblings, 1 reply; 69+ messages in thread
From: Paulo Marques @ 2004-02-19 12:11 UTC (permalink / raw)
  To: tridge; +Cc: Theodore Ts'o, Pascal Schmidt, linux-kernel

tridge@samba.org wrote:

> Currently dnotify doesn't give you the filename that is being
> added/deleted/renamed. It just tells you that something has happened,
> but not enough to actually maintain a name cache in user space.

This might be a crazy / stupid idea, so flame at will :)

Wouldn't it be possible to do a samba "super-server" mode, in which samba would 
assume that it controlled the directories it is exporting?

In this mode a "corporate" Samba server, serving Windows clients, could improve 
performance by assuming that its cache was always up-to-date.

If we wanted to access the directory locally we could always mount locally 
using samba, and access the files anyway, albeit a lot slower and without linux 
permissions, etc.

What we would gain was the ability to say "I want to give priority to my samba 
server" (and set it to "super-server" mode) or "my priority is to the linux 
native filesystem, and just want to share my files with windows users anyway" 
(and keep using samba as always).

-- 
Paulo Marques - www.grupopie.com

"In a world without walls and fences who needs windows and gates?"


* Re: UTF-8 and case-insensitivity
  2004-02-19 12:11                                   ` Paulo Marques
@ 2004-02-19 19:04                                     ` Helge Hafting
  0 siblings, 0 replies; 69+ messages in thread
From: Helge Hafting @ 2004-02-19 19:04 UTC (permalink / raw)
  To: Paulo Marques; +Cc: tridge, Theodore Ts'o, Pascal Schmidt, linux-kernel

On Thu, Feb 19, 2004 at 12:11:32PM +0000, Paulo Marques wrote:
> tridge@samba.org wrote:
> 
> >Currently dnotify doesn't give you the filename that is being
> >added/deleted/renamed. It just tells you that something has happened,
> >but not enough to actually maintain a name cache in user space.
> 
> This might be a crazy / stupid idea, so flame at will :)
> 
> Wouldn't it be possible to do a samba "super-server" mode, in which samba 
> would assume that it controlled the directories it is exporting?
> 
> In this mode a "corporate" Samba server, serving Windows clients, could 
> improve performance by assuming that its cache was always up-to-date.
> 
> If we wanted to access the directory locally we could always mount 
> locally using samba, and access the files anyway, albeit a lot slower and 
> without linux permissions, etc.
> 
You don't really need to go to such extremes.  Samba can use dnotify,
and run with caching and great performance as long as nobody touches
the files in other ways.  There is no need to _enforce_ it though,
samba can cope by invalidating the cache on those rare occasions
the files are accessed in other ways. It won't happen often, because:

1. Linux/nfs people have no business in a directory full of
   windows .dll's and .exe's
2. On a corporate server you simply tell people to stay out.
   nfs may export another set of homedirs for the unix people.

> What we would gain was the ability to say "I want to give priority to my 
> samba server" (and set it to "super-server" mode) or "my priority is to the 
> linux native filesystem, and just want to share my files with windows users 
> anyway" (and keep using samba as always).
>
Thanks to dnotify even the "linux priority" setup will be able to benefit
from a cache.  Particularly if we can get a dnotify that doesn't trip
when samba is the one making changes.  


Helge Hafting


* Re: UTF-8 and case-insensitivity
  2004-02-19  3:20                                 ` tridge
  2004-02-19 10:18                                   ` Helge Hafting
  2004-02-19 12:11                                   ` Paulo Marques
@ 2004-02-19 14:08                                   ` Theodore Ts'o
  2004-02-19 20:12                                   ` Robert White
  3 siblings, 0 replies; 69+ messages in thread
From: Theodore Ts'o @ 2004-02-19 14:08 UTC (permalink / raw)
  To: tridge; +Cc: Pascal Schmidt, linux-kernel

On Thu, Feb 19, 2004 at 02:20:44PM +1100, tridge@samba.org wrote:
> Currently dnotify doesn't give you the filename that is being
> added/deleted/renamed. It just tells you that something has happened,
> but not enough to actually maintain a name cache in user space.
> 
> That could be changed, so that on a dnotify event you do a fcntl() to
> ask for the name of the file. Or perhaps we could cram it into the
> structure the signal handler gets passed? I doubt that would make
> sense, but maybe some signal guru can tell me otherwise. Maybe we
> could even invent a new dnotify system where you do a read on a file
> descriptor to get details on what event happened, and give some
> "everything has changed" error when you run out of buffers.

Yes, that's what I was suggesting.  One advantage of such a scheme is
that it's not just for Windows compatibility.  A more rich directory
change notification scheme would also be useful for graphical file
managers, automatic indexing tools, and many, many other applications.

No, it's not everything you were requesting, but it may very well
represent three-quarters of a loaf, instead of nothing.

> If that happened then we could build our own dcache in user space, but
> it will be a very second rate dcache, with a racy and slow update
> mechanism that will in itself chew cpu. Maybe that's the best we can
> do, or maybe I should be asking distro vendors if they would consider
> a case-insensitive patch, especially the vendors aiming for
> "enterprise" scalability which might include serving windows clients.

I don't know that the update mechanism has to seriously chew that much
CPU.  It can certainly be designed to minimize the amount of CPU
that is consumed, especially if it is read via a file descriptor so
that multiple updates can be sent via a single read() system call,
instead of sending a signal every single time a directory entry is
created, renamed, or deleted.

The problem with a case-insensitive patch is that for most modern
filesystems (i.e., any filesystem that does better than O(n) directory
searches), it will have to involve a format change, since the case
insensitivity has to be built into the hash function or the tree
comparison function, or both.  At this point, the filesystem author has
to make the choice of whether to try to solve the Windows-specific
problem, in which case the fundamental filesystem format would have to
be tailored to the Windows case mapping table, or try to solve the
more general I18N case mapping problem.  (Lots of luck!  It's
constantly changing over time as new character sets are added or
modified...)  Yes, a few such filesystems might have this support
already, but I doubt distributions would be willing to accept patches
that make filesystem format-incompatible changes just for the sake of
accelerating Samba operations.
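
To make the hash-function point concrete, here is a minimal sketch. It
assumes an FNV-1a style hash and ASCII-only folding via tolower(); both are
illustrative choices, not what any particular filesystem uses, and the
function names are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>

/* Case-sensitive FNV-1a hash of a filename. */
uint32_t cs_hash(const char *name)
{
    uint32_t h = 2166136261u;

    for (; *name != '\0'; name++) {
        h ^= (uint32_t)(unsigned char)*name;
        h *= 16777619u;
    }
    return h;
}

/* Same hash, but every byte is folded to lower case first, so names that
 * differ only in ASCII case land in the same bucket. */
uint32_t ci_hash(const char *name)
{
    uint32_t h = 2166136261u;

    for (; *name != '\0'; name++) {
        h ^= (uint32_t)tolower((unsigned char)*name);
        h *= 16777619u;
    }
    return h;
}
```

Because the folding is baked into the value stored on disk, a directory
indexed with cs_hash() cannot be searched with ci_hash(), and changing the
case mapping table later changes the hashes as well. That is the format
dependency described above.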

I don't know if the distributions would be willing to accept a
case-insensitive patch, but my suspicion is that it would be
difficult, and I would argue that it might be more efficient to get a
richer directory change notification system, for the reasons I argued
above.

						- Ted


* RE: UTF-8 and case-insensitivity
  2004-02-19  3:20                                 ` tridge
                                                     ` (2 preceding siblings ...)
  2004-02-19 14:08                                   ` Theodore Ts'o
@ 2004-02-19 20:12                                   ` Robert White
  3 siblings, 0 replies; 69+ messages in thread
From: Robert White @ 2004-02-19 20:12 UTC (permalink / raw)
  To: tridge, 'Theodore Ts'o'
  Cc: 'Pascal Schmidt', linux-kernel

(I may, of course, be overly naive... but a thought occurs... 8-)

It would seem that there is a moment of opportunity at the
dentry_operations invocation point to harvest all the information you would
need to maintain a specialized dcache in a separate module.  Unfortunately,
since the individual file systems get to tweak their own pointer(s) to
this/these struct-of-calls it could get hard to hijack things at that level.

With two changes to core Linux behavior, which could easily be implemented
as a configurable kernel option, you could create an advisory hook.

1) add a usually-null pointer(*) to dentry_operations structure to the
superblock data structure in vfs (and, of course, an install/remove
structure call pair) as a look-aside mechanism, and

2) if-not-null "parallel" invocations of these "advisory" calls are then
added to the fixed vfs invocation points along side the normal dentry
notices...

You could then add any imaginable advisory behavior to any file system.  A
well crafted module could then attach to file systems, flag directories (+),
and get low-level advisory service at core dentry action time.

A module so attached could answer all your negative enquiries quickly and
yet remain nicely segregated.  You could probably create the magic_open
dream logic of your choice and net near, if not absolute, race elimination.

You still might have to readdir a whole directory from time to time just to
clean up a partly aged cache, but there would be no need for the stepwise
transfer of this information into the user context.

100% of the native function of each file system is preserved and there are
probably other applications for this look-aside feature like low-level
security auditing or semantic mirroring (a-la real-time rdist). 

But, you know, just a thought...

Rob.

(*) this should, if enabled, be arranged as a linked list of structures so
that multiple modules could be installed for different purposes.

(+) flagging and un-flagging directories of interest ad-hoc is needed to
prevent saturation of resources.


[parent not found: <fa.epf5o9k.1rkudgo@ifi.uio.no>]

[parent not found: <fa.idvvhjl.1jge92d@ifi.uio.no>]

* Re: UTF-8 and case-insensitivity
       [not found] ` <fa.idvvhjl.1jge92d@ifi.uio.no>
@ 2004-02-18  1:09   ` Andy Lutomirski
  0 siblings, 0 replies; 69+ messages in thread
From: Andy Lutomirski @ 2004-02-18  1:09 UTC (permalink / raw)
  To: Kernel Mailing List; +Cc: Andrew Tridgell, Linus Torvalds, Al Viro

Linus Torvalds wrote:
> 	int magic_open(
> 		/* Input arguments */
> 		const char *pathname,
> 		unsigned long flags,
> 		mode_t mode,
> 
> 		/* output arguments */
> 		int *fd,
> 		struct stat *st,
> 		int *successful_path_length);
> 
> ie the system call would:
> 
>  - look up as far into the pathname (using _exact_ lookup) as possible
>  - return the error code of the last failure
>  - the "flags" could be extended so that you can specify that you mustn't 
>    traverse ".." or symlinks (ie those would count as failures)
> 
> but also:
> 
>  - fill in the "struct stat" information for the last _successful_ 
>    pathname component.
>  - fill in the "fd" with a fd of the last _successful_ pathname component.
>  - tell how much of the pathname it could traverse.

Aside from just case-insensitivity, I imagine this could give lots of other 
benefits:

  - file servers that don't want to follow symlinks can do it quickly.
  - Apache could serve things like http://www.foo.com/a/b/c/d.php/e/f/g a lot 
faster.
  - a flag to avoid traversing mountpoints could help someone
  - a flag for root to see _through_ mountpoints would make it possible to clean 
up initramfs and such that got mounted over, or to do other useful and currently 
  impossible tasks.  (e.g. I could see what's under my devfs mount...)

It would be nice to see this added even if it's not the perfect solution for samba :)

BTW, here's a thought for solving samba's negative lookup problem:

int ugly_stat(char *pattern, struct stat *st, char *match_out)

Pattern would be some description of what the filename should look like. 
Something like:

- pattern is an array of slash-delimited groups of characters separated by nulls 
and terminated by two nulls.  For example, ugly_stat("F/f\0O/o\0O/o\0\0", ...) 
finds a file called foo, case-insensitively in English, while 
ugly_stat("F\0i\0l\0e\0" "11/22/33\0", ...) finds "File" followed by either 11, 22, or 33.
- the dcache problem is easy: don't use it.  All Andrew wants (I think) is proof 
that there is no such file or the name if there is one.  Samba can cache it 
itself; I don't think the kernel should involve itself in trying to cache this.
- ugly_stat does not traverse directories -- that's why the slash trick is safe.
- st gets the stat data, and match_out gets the filename if any
- if there are multiple matches, one is arbitrarily selected.

If the file-system doesn't have specific support for this, then either VFS or 
the caller could emulate it (probably VFS -- it would avoid lots of syscalls).
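
A user-space emulation of the matching step might look like the sketch
below. The function name ugly_match() is hypothetical, and the greedy
first-alternative rule (no backtracking) is an assumption, since the
description above leaves that open.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher for the pattern format described above: a run of
 * NUL-terminated groups, each holding '/'-separated alternatives, ended by
 * an empty group. Returns 1 if `name` matches the whole pattern, else 0.
 * Greedy: the first alternative that fits is taken, with no backtracking. */
int ugly_match(const char *pattern, const char *name)
{
    const char *group = pattern;

    while (*group != '\0') {                /* an empty group ends the pattern */
        const char *end = group + strlen(group);
        const char *alt = group;
        int matched = 0;

        while (!matched) {
            const char *slash = memchr(alt, '/', (size_t)(end - alt));
            size_t alt_len = (size_t)((slash ? slash : end) - alt);

            if (alt_len > 0 && strncmp(name, alt, alt_len) == 0) {
                name += alt_len;            /* consume the matched text */
                matched = 1;
            }
            if (slash == NULL)
                break;                      /* no more alternatives */
            alt = slash + 1;
        }
        if (!matched)
            return 0;
        group = end + 1;                    /* step over this group's NUL */
    }
    return *name == '\0';                   /* the name must be fully consumed */
}
```

An emulated ugly_stat() would call this on each readdir() entry and stat
the first name that matches.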

Would ugly_stat + magic_open be sufficient?

--Andy


* UTF-8 and case-insensitivity
@ 2004-02-17  4:12 tridge
  2004-02-17  5:11 ` Linus Torvalds
                   ` (4 more replies)
  0 siblings, 5 replies; 69+ messages in thread
From: tridge @ 2004-02-17  4:12 UTC (permalink / raw)
  To: linux-kernel

Given how much pain the "kernel is agnostic to charset encoding"
attitude has cost me in terms of programming pain, I thought I should
de-cloak from lurk mode and put my 2c into the UTF-8 issue.

Personally I think that eventually the Linux kernel will have to
embrace the interpretation of the byte streams that applications have
given it, despite the fact that this will be very painful and
potentially quite complex. The reason is that I think that eventually
the Linux kernel will need to efficiently support a userspace policy
of case-insensitivity and the only way to do case-insensitive filename
operations is to interpret those byte streams as a particular
encoding.

Personally I much prefer the systems I use to be case-sensitive, but
there are important applications that require case-insensitivity for
interoperability. Right now it is not possible to write a case
insensitive application on Linux in an efficient manner. With the
current "encoding agnostic" APIs a simple open() or stat() call
becomes a horrendously expensive operation and one that is fraught
with race conditions. Providing the same functionality in the kernel
is dirt cheap by comparison (not cheap in terms of code complexity,
but cheap in terms of runtime efficiency).

Cheers, Tridge


* Re: UTF-8 and case-insensitivity
  2004-02-17  4:12 tridge
@ 2004-02-17  5:11 ` Linus Torvalds
  2004-02-17  6:54   ` tridge
  2004-02-19  2:53   ` Daniel Newby
  2004-02-17  5:25 ` Tim Connors
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 69+ messages in thread
From: Linus Torvalds @ 2004-02-17  5:11 UTC (permalink / raw)
  To: Andrew Tridgell; +Cc: Kernel Mailing List, Al Viro

[ Al cc'd, because while I'm pretty certain that he agrees with me 100% on 
  the craziness of case-insensitive name lookups, he may have some input
  on the "samba helper" function approach. That input may well boil down 
  to "Linus is crazy", of course. Wouldn't be the first time ;)

  Andrew - you really should assume that case insensitivity is a hell of a 
  lot more costly than you think it is, and forget that particular idea. 
  Let's see if there are acceptable half-measures. ]

On Tue, 17 Feb 2004 tridge@samba.org wrote:
>
> Given how much pain the "kernel is agnostic to charset encoding"
> attitude has cost me in terms of programming pain, I thought I should
> de-cloak from lurk mode and put my 2c into the UTF-8 issue.
> 
> Personally I think that eventually the Linux kernel will have to
> embrace the interpretation of the byte streams that applications have
> given it, despite the fact that this will be very painful and
> potentially quite complex.

I seriously doubt it. There just isn't any point.

>		 The reason is that I think that eventually
> the Linux kernel will need to efficiently support a userspace policy
> of case-insensitivity and the only way to do case-insensitive filename
> operations is to interpret those byte streams as a particular
> encoding.

The thing is, if you want to do efficient user-space case-insensitive 
lookups, that is a _completely_ different matter from having the kernel do 
case-insensitivity.

Kernel-level case insensitivity is a total disaster, and your "very
painful and potentially quite complex" assertion is the understatement of
the year. The thing is, you can't sanely do dentry caching, since the case
insensitivity has to be per-open or at least per-process (you MUST NOT be
case-insensitive in a POSIX process).

So the only way to do case-insensitive names is to do all lookups very 
slowly. I'm willing to bet that WNT opens files a hell of a lot slower 
than Linux does, and one big portion of that is exactly the fact that 
Linux can do a really good job with the dentry cache.

And that _depends_ on a well-defined and unique filename setup (by
changing the hashing function and compare function, a filesystem can do a
limited kind of case-insensitivity right now in Linux, but then it will
have to be not only fairly slow, but also case-insensitive for _everybody_
which is unacceptable in a mixed POSIX/samba environment).

In other words, just forget the whole notion. The only set of people who have
any reason at _all_ to want it is the samba team, and we can solve the 
samba-specific problems other ways.

Just take that as a simple fact - case insensitivity in the kernel is such 
a horribly bad idea, that you really shouldn't go there.

With that destructive criticism out of the way, let's look at somewhat 
more constructive approaches, ie some way to allow certain processes that 
need it better help in their quest for case insensitivity.

Let's start with some assumptions:

 - MOST name lookups are likely results of some kind of "readdir()" 
   lookup, and tend to have the case right in the first place. So that 
   should go fast. Maybe Tridge has some statistics on this one?

 - samba probably has certain pretty well-defined special patterns for 
   what it wants to do with a filename, so you probably don't need a 
   generic "everything that takes a filename should be case-insensitive", 
   and it would be acceptable to have a few _very_ specific system calls.

With those assumptions out of the way, we could think of an interface that
exports some partial functionality of the "lookup_path()" code in the kernel
as a special system call. In particular, something that takes an input
pathname, and is able to stop at any point of the name when a lookup
fails.

So some variation of the interface

	int magic_open(
		/* Input arguments */
		const char *pathname,
		unsigned long flags,
		mode_t mode,

		/* output arguments */
		int *fd,
		struct stat *st,
		int *successful_path_length);

ie the system call would:

 - look up as far into the pathname (using _exact_ lookup) as possible
 - return the error code of the last failure
 - the "flags" could be extended so that you can specify that you mustn't 
   traverse ".." or symlinks (ie those would count as failures)

but also:

 - fill in the "struct stat" information for the last _successful_ 
   pathname component.
 - fill in the "fd" with a fd of the last _successful_ pathname component.
 - tell how much of the pathname it could traverse.

so that the user can do a "readdir" and try to "fix up" the problem
without having to restart the whole thing. For the (hopefully common) case  
where the cases match, this would just boil down to an "open with stat
information" thing.
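
For comparison, the lookup half of this can be approximated in user space
today, at the cost of one stat() per path component and with no atomicity.
The sketch below is only that approximation, not the proposed system call;
the name exact_prefix_len() is hypothetical and it does not return an fd.

```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical user-space approximation of the lookup half of magic_open().
 * Walks `path` one component at a time using exact lookups. Returns the
 * length of the longest prefix that exists and fills `st` for that prefix
 * (`st` is untouched if nothing matched). `*err` is set to 0, or to the
 * errno of the first component that failed. */
size_t exact_prefix_len(const char *path, struct stat *st, int *err)
{
    char buf[4096];
    size_t len = strlen(path);
    size_t good = 0;
    size_t i = 0;

    *err = 0;
    if (len >= sizeof(buf)) {
        *err = ENAMETOOLONG;
        return 0;
    }
    while (i < len) {
        struct stat tmp;

        /* Skip separators, then advance to the end of the next component. */
        while (i < len && path[i] == '/')
            i++;
        while (i < len && path[i] != '/')
            i++;
        memcpy(buf, path, i);
        buf[i] = '\0';
        if (stat(buf, &tmp) != 0) {
            *err = errno;
            break;
        }
        *st = tmp;
        good = i;
    }
    return good;
}
```

The caller would then readdir() the directory named by the first `good`
bytes of the path and try to fix up the failing component, which is the
retry loop the proposed call is meant to make cheap.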

We'd need something more interesting to guarantee unique filename on file
create, possibly even including letting a trusted process maintain some
locks in the VFS layer. The point being that the kernel can _help_ some 
specific usage, but making case-insensitive names be part of the VFS layer 
proper is not acceptable.

I suspect we can do case-insensitive names faster than WNT even with a 
fairly complex user-mode interface. Just because _not_ having them in the 
kernel allows us to have much faster default behaviour.

			Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17  5:11 ` Linus Torvalds
@ 2004-02-17  6:54   ` tridge
  2004-02-17  8:33     ` Neil Brown
  2004-02-17 15:13     ` Linus Torvalds
  2004-02-19  2:53   ` Daniel Newby
  1 sibling, 2 replies; 69+ messages in thread
From: tridge @ 2004-02-17  6:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List, Al Viro

Linus,

 > Kernel-level case insensitivity is a total disaster, and your "very
 > painful and potentially quite complex" assertion is the understatement of
 > the year. The thing is, you can't sanely do dentry caching, since the case
 > insensitivity has to be per-open or at least per-process (you MUST NOT be
 > case-insensitive in a POSIX process).

right, and the patches to add this support to Linux that I have been
involved with in the past have been per-process. You are right that it
is messy, but it is not *horribly* messy. In fact I'd say it is no
worse than many of the other things we already have in the kernel,
although it certainly is much harder than sticking to the "bag of
bytes" interpretation of filenames. I just think that in this case the
simple solution is also wrong.

 > So the only way to do case-insensitive names is to do all lookups very 
 > slowly.

I don't agree with this at all. I agree that the worst-case will get
worse, but I see absolutely no reason why the average case will get
significantly worse and I think that the worst case will be rare.

In fact, John Bonesio did a patch to the 2.4 kernel with XFS that
implemented per-process case-insensitivity. It's been a long time since
I played with that patch, but I certainly don't recall any significant
slowdowns. The patch was messy, but it wasn't grossly
inefficient. (that patch was just a proof of concept, and just used
strcasecmp() instead of doing a proper UTF-8 case-insensitive compare,
so there will be some amount of additional cost to adding that).

From memory, the patch added new classes of dentries to the current
"+ve" and "-ve" dentries. It added concepts like a "-ve
case-insensitive" dentry and a "-ve case-sensitive" dentry. It
certainly adds more code in trying to deal with these variants, but I
see no reason why it should be significantly computationally less
efficient.

 > I'm willing to bet that WNT opens files a hell of a lot slower 
 > than Linux does, and one big portion of that is exactly the fact that 
 > Linux can do a really good job with the dentry cache.

Anyone have any lmbench filesystem numbers for w2k3? The only windows
boxes I use are in vmware sessions, so running performance tests
myself is pretty pointless.

 > And that _depends_ on a well-defined and unique filename setup (by
 > changing the hashing function and compare function, a filesystem can do a
 > limited kind of case-insensitivity right now in Linux, but then it will
 > have to be not only fairly slow, but also case-insensitive for _everybody_
 > which is unacceptable in a mixed POSIX/samba environment).

right, and that's why bones made it per-process in his patch. It was
set using a process personality bit, which really wasn't ideal (that
was one of my contributions to the patch) but it did work.

 > In other words, just forget the whole notion. The only set people who have
 > any reason at _all_ to want it is the samba team, and we can solve the 
 > samba-specific problems other ways.

Nope, it's not just Samba, though perhaps Samba is the app that cares
the most about the actual performance. The other obvious people who
care are wine and anyone porting an application from windows. Also,
the problem isn't just one of performance, it's also hard to make it
raceless from userspace.

I also think that if the choice were given then some linux distros
(the likes of Lindows comes to mind) would choose to run all processes
case-insensitive. These sorts of distros are aiming at the sorts of
users that would want everything to be case-insensitive.

 > Just take that as a simple fact - case insensitivity in the kernel is such 
 > a horribly bad idea, that you really shouldn't go there.

I'm yet to be convinced :)

 >  - MOST name lookups are likely results of some kind of "readdir()" 
 >    lookup, and tend to have the case right in the first place. So that 
 >    should go fast. Maybe Tridge has some statistics on this one?

ok, the first thing you need to understand about case-insensitivity on
a case-sensitive system is that the hardest thing to do is prove that
a file doesn't exist. File operations on non-existent files are *very*
common. If you can come up with a solution that allows me to prove
that a file doesn't exist in any case combination then we will be most
of the way there.

That immediately throws out most of the "why don't you just use a
cache" arguments that everyone seems to come up with. We *do* use a
cache that primes the "most likely" filename code, it's just that a
cache is almost useless when you are trying to prove that a file
definitely doesn't exist.

 >  - samba probably has certain pretty well-defined special patterns for 
 >    what it wants to do with a filename, so you probably don't need a
 >    generic "everything that takes a filename should be case-insensitive", 
 >    and it would be acceptable to have a few _very_ specific system calls.

yes, if we had a single function that took a pathname and gave us
either -1/ENOENT or the pathname of a file that matches
case-insensitively then that would be great. Then again, if we had
such a function then it would be really easy to use that function in
the VFS to make Linux case-insensitive on a per-process basis.

So let's imagine we have such a function like this:

  int ci_normalize(char *path);

Let's assume it takes a pathname and returns either -1/ENOENT or
modifies the pathname in place (totally ignoring the fact that the
length of the pathname could change, and that the "char *" is really a
"const char *" - pedants go home).

now let's build a ci_unlink() on top of that:

   int ci_unlink(char *path)
   {
	if (task_is_case_sensitive(current)) {
		return unlink(path);
	}
	if (ci_normalize(path) == -1) {
		return -1;
	}
	return unlink(path);
   }

The problem is the negative dentries. If you do the above then
case-sensitive processes will be fast, but case-insensitive processes
will effectively be running without the negative dcache, so unlink()
on paths that don't exist will be slow each and every time. That's why
doing this with any sort of decent efficiency needs dcache changes.
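To make the cost concrete, here is roughly what the lookup has to do
when it lives entirely in user space. This is a sketch under
assumptions, not Samba's actual code: ci_lookup() is a made-up name, it
handles a single directory rather than a full path, and it uses
strcasecmp() (ASCII-only in the C locale) as a stand-in for a proper
UTF-8 case-insensitive compare. An exact hit costs one stat, but
ENOENT can only be returned after every entry has been read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <strings.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: find the entry in directory `dirfd` whose name
 * matches `want` ignoring (ASCII) case and copy the on-disk name to `out`.
 *
 * Returns 0 on a match, ENOENT once the whole directory has been scanned
 * without a match, or another errno value on failure.
 */
int ci_lookup(int dirfd, const char *want, char *out, size_t outsz)
{
	struct stat st;
	struct dirent *de;
	DIR *d;
	int fd, ret = ENOENT;

	assert(want && out && outsz > 0);
	if (strlen(want) >= outsz)
		return ENAMETOOLONG;

	/* Fast path: the name exists with exactly the case we were given. */
	if (fstatat(dirfd, want, &st, AT_SYMLINK_NOFOLLOW) == 0) {
		strcpy(out, want);
		return 0;
	}

	/* Slow path: open a private handle so dirfd's offset is untouched. */
	fd = openat(dirfd, ".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (fd < 0)
		return errno;
	d = fdopendir(fd);
	if (!d) {
		ret = errno;
		close(fd);
		return ret;
	}

	errno = 0;
	while ((de = readdir(d)) != NULL) {
		if (strcasecmp(de->d_name, want) != 0)
			continue;
		if (strlen(de->d_name) >= outsz) {
			ret = ENAMETOOLONG;
		} else {
			strcpy(out, de->d_name);
			ret = 0;
		}
		break;
	}
	/* readdir() returns NULL both at end of directory and on error. */
	if (ret == ENOENT && errno != 0)
		ret = errno;

	closedir(d);	/* also closes fd */
	return ret;
}
```

Nothing remembers the miss: the next lookup of the same missing name
repeats the full scan, which is the negative dcache problem in a
nutshell.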

btw, I already know that Al is completely and utterly opposed to
putting any case-insensitivity in the dcache (I think the phrase "over
my dead body" was mentioned), so I know that I'm fighting an uphill
battle here, but I like trying every now and again to see if I can
make any progress.

 > With those assumptions out of the way, we could think of an interface that
 > exports some partial functionality of the "lookup_path()" code in the kernel
 > as a special system call. In particular, something that takes an input
 > pathname, and is able to stop at any point of the name when a lookup
 > fails.
 > So some variation of the interface
 > 
 > 	int magic_open(
....

how would this interact with the negative dcache entries? That is the
key.

 > I suspect we can do case-insensitive names faster than WNT even with a 
 > fairly complex user-mode interface. Just because _not_ having them in the 
 > kernel allows us to have much faster default behaviour.

on this I completely disagree. Any solution that doesn't cope with
case insensitive properties of negative dentries is just going to
start filling the dcache with lots of useless entries (case
combinations) or effectively not end up using the dcache at
all. Either way it's a big loss compared to making the dcache know
about case insensitivity properly.

Cheers, Tridge

PS: ahh, what timing, someone just posted a request to the rsync list
asking for case-insensitivity in rsync.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17  6:54   ` tridge
@ 2004-02-17  8:33     ` Neil Brown
  2004-02-17 22:48       ` tridge
  2004-02-17 15:13     ` Linus Torvalds
  1 sibling, 1 reply; 69+ messages in thread
From: Neil Brown @ 2004-02-17  8:33 UTC (permalink / raw)
  To: tridge; +Cc: Linus Torvalds, Kernel Mailing List, Al Viro

On Tuesday February 17, tridge@samba.org wrote:
> 
> I also think that if the choice were given then some linux distros
> (the likes of Lindows comes to mind) would choose to run all processes
> case-insensitive. These sorts of distros are aiming at the sorts of
> users that would want everything to be case-insensitive.

This is the bit I don't understand.

Surely the value of case-insensitivity is that you can type in a
filename from memory and not worry about what case you used when you
created the file.

Yet with Lindows / MS-Windows style interfaces, you virtually never
type the name of a pre-existing file.  So case-insensitivity doesn't
seem to be a win to the user.

I thought the value of case-insensitive filenames was for
legacy applications which have been written to the WIN32 API and took
lots of liberties with "pretty-casing" filenames between readdir and
open. 

NeilBrown

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17  8:33     ` Neil Brown
@ 2004-02-17 22:48       ` tridge
  2004-02-18  0:06         ` Neil Brown
  0 siblings, 1 reply; 69+ messages in thread
From: tridge @ 2004-02-17 22:48 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linus Torvalds, Kernel Mailing List, Al Viro

Neil,

 > I thought the value of a case-insensitive filenames was for
 > legacy applications which have been written to the WIN32 API and took
 > lots of liberties with "pretty-casing" filenames between readdir and
 > open. 

No, that's a common misconception. It does happen (the "pretty-casing")
but it's relatively rare these days. The real problem is *proving* that
a file doesn't exist. If a file does exist then there are all sorts of
heuristic and cache mechanisms that can be used to get the real
filename quickly on average, but if you have to prove absolutely that
a file does not exist then all of that stuff is pretty much useless.

Samba (and any other system that wants case-insensitive semantics on
Linux) can't make do with "oh, it probably doesn't exist". That way
leads to data loss. You have to know with 100% certainty that the file
doesn't exist in any case combination.

Unfortunately, that is also the hardest thing to do.

Cheers, Tridge

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17 22:48       ` tridge
@ 2004-02-18  0:06         ` Neil Brown
  2004-02-18  9:47           ` Helge Hafting
  0 siblings, 1 reply; 69+ messages in thread
From: Neil Brown @ 2004-02-18  0:06 UTC (permalink / raw)
  To: tridge; +Cc: Linus Torvalds, Kernel Mailing List, Al Viro

On Wednesday February 18, tridge@samba.org wrote:
> 
> Samba (and any other system that wants case-insensitive semantics on
> Linux) can't make do with "oh, it probably doesn't exist". That way
> leads to data loss. You have to know with 100% certainty that the file
> doesn't exist in any case combination.
> 
> Unfortunately, that is also the hardest thing to do.

Hi Tridge,

Maybe if it is so hard, we should just define it to be easy.... just
change the universe a bit.....

I'm sure you've thought about this a lot more than I have or will, so
I must be missing something, but there seems to be a solution that is
efficient, predictable, and should be acceptable.

The first observation is that POSIX applications and WIN32 applications
cannot both get exactly the file system semantics they expect in the
same directory. The example:
    POSIX:
       create "Makefile"
       create "makefile"
    WIN32:
       unlink "MakeFile"
seems to show that.

So decide up front that a WIN32 application will see something
different, and decide what the best thing for it to see would be
(i.e. change the universe).

First cut:
   An application that wants case-insensitive filenames only
   sees those filenames that are in a case-insensitive-canonical-form.
   So the interface maps all file names in requests to a canonical
   form, and the readdir equivalent discards all non-canonical names. 

   Thus in the above example, the WIN32 app would unlink "makefile"
   and never notice that "Makefile" exists.

   This has (to me) two problems.
    1/ case gets lost, so if I save "My File", I will find "my file"
    has been created (unless the application pretty-cases things, in
    which case I can expect case to change anyway).

    2/ Files created by posix apps might be invisible.

    To answer 2/, I'd say "tough".  If you want posix files to be
    visible to WIN32 apps, choose appropriate names.  However I would
    allow there to be a process, either once-off or periodic, which
    creates symlinks from canonical names to non-canonical filenames.
    This would allow you to access pre-existing files where there was
    no ambiguity.

    To answer 1/ I would suggest a second cut at the problem...

Second cut:
    As above, but readdir tries to be clever.  If it sees two (or
    more) names which have the same canonical form, it chooses just
    one of them (predictably), preferring a non-canonical name which is
    a symlink to the canonical name.

    Then when creating an object, you create it with the canonical
    name and (if that succeeds) subsequently create a symlink from the
    requested name to the canonical name (if that is possible, don't
    worry if it isn't).
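    The create step could look roughly like the sketch below. It makes
    assumptions that are not part of the proposal: canon_create() and
    canonical_name() are made-up names, and the canonical form is plain
    ASCII lower-casing via tolower(), where a real implementation would
    need a proper Unicode case fold.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* ASCII-only stand-in for a real canonical form. Returns -1 if too long. */
static int canonical_name(const char *name, char *out, size_t outsz)
{
	size_t n = strlen(name);

	if (n >= outsz)
		return -1;
	for (size_t i = 0; i <= n; i++)		/* copies the NUL as well */
		out[i] = (char)tolower((unsigned char)name[i]);
	return 0;
}

/*
 * Hypothetical helper: create the file under its canonical name, then make
 * a best-effort symlink from the requested spelling to the canonical one
 * so that the original case can be shown by a clever readdir.
 *
 * Returns an open fd, or -1 with errno set (EEXIST if any case variant
 * already owns the canonical name).
 */
int canon_create(int dirfd, const char *requested, mode_t mode)
{
	char canon[NAME_MAX + 1];
	int fd;

	assert(requested);
	if (canonical_name(requested, canon, sizeof canon) < 0) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/* O_EXCL on the canonical name is what guarantees uniqueness. */
	fd = openat(dirfd, canon, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, mode);
	if (fd < 0)
		return -1;

	/* "Don't worry if it isn't possible": the result is ignored. */
	if (strcmp(requested, canon) != 0)
		(void)symlinkat(canon, dirfd, requested);

	return fd;
}
```

    Because uniqueness hangs off a single exact-name O_EXCL create, the
    kernel never needs to compare names case-insensitively.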

Given this approach:

  If only case-insensitive apps use a linux filesystem, they will see
  exactly the semantics they expect, with minimal performance impact.

  If case-sensitive and case-insensitive apps use a linux filesystem,
  they will each see a consistent view and though they may not see the
  same view, there will be well-defined mechanisms which can work at a
  user-space level to resolve or highlight any issues.

The biggest cost I see with this is with large directories.  The
"readdir" equivalent would need to read the whole directory before it
could reliably return any of it.
However  dropping the "guarantee to preserve case" semantic on really
large directories probably isn't an enormous cost (and could be
configurable).

NeilBrown

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18  0:06         ` Neil Brown
@ 2004-02-18  9:47           ` Helge Hafting
  0 siblings, 0 replies; 69+ messages in thread
From: Helge Hafting @ 2004-02-18  9:47 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-kernel

Neil Brown wrote:

>     1/ case gets lost, so if I save "My File", I will find "my file"
>     has been created (unless the application pretty-cases things, in
>     which case I can expect case to change anyway).
> 
>     2/ Files created by posix apps might be invisible.
> 
> 
>     To answer 2/, I'd say "tough".  If you want posix files to be

This is a bit worse than just "tough".
win32: rmdir foo
       directory not empty!
win32: there are _no_ files there?

Helge Hafting


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17  6:54   ` tridge
  2004-02-17  8:33     ` Neil Brown
@ 2004-02-17 15:13     ` Linus Torvalds
  2004-02-17 16:57       ` Linus Torvalds
  2004-02-17 23:20       ` tridge
  1 sibling, 2 replies; 69+ messages in thread
From: Linus Torvalds @ 2004-02-17 15:13 UTC (permalink / raw)
  To: tridge; +Cc: Kernel Mailing List, Al Viro

On Tue, 17 Feb 2004 tridge@samba.org wrote:
> 
> From memory, the patch added new classes of dentries to the current
> "+ve" and "-ve" dentries. It added concepts like a "-ve
> case-insensitive" dentry and a "-ve case-sensitive" dentry. It
> certainly adds more code in trying to deal with these variants, but I
> see no reason why it should be significantly computationally less
> efficient.

Yes, we could add context sensitivity to the dcache with a context 
bitmask.

However, it's _not_ correct.

It assumes that there is only one way to do lower/upper case, which just 
isn't true. What about different locales that have different case rules? 
Your "one bit per dentry" becomes "one bit per locale per dentry". That's 
just horribly hard to do.

I don't know how Windows does it, so maybe this thing is hardcoded, and 
you don't even want "true" case insensitivity. How "correct" is Windows?

(And don't even bother telling me about the translation table in NTFS 
volumes - I'm not interested. This would have to work on a sane filesystem 
to be useful, even for samba.)

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17 15:13     ` Linus Torvalds
@ 2004-02-17 16:57       ` Linus Torvalds
  2004-02-17 19:44         ` viro
                           ` (2 more replies)
  2004-02-17 23:20       ` tridge
  1 sibling, 3 replies; 69+ messages in thread
From: Linus Torvalds @ 2004-02-17 16:57 UTC (permalink / raw)
  To: tridge; +Cc: Kernel Mailing List, Al Viro

On Tue, 17 Feb 2004, Linus Torvalds wrote:
> 
> It assumes that there is only one way to do lower/upper case, which just 
> isn't true. What about different locales that have different case rules? 
> Your "one bit per dentry" becomes "one bit per locale per dentry". That's 
> just horribly hard to do.

It's also hard to know what to do when there are two filenames that
literally _are_ the same when not comparing cases. Which can obviously
happen under Linux - you'd have a case-sensitive app that creates both
"makefile" and "Makefile", and now you have a case-insensitive app that
looks it up (or worse, removes it), and what the *heck* is the dcache now
supposed to really do?

This is why I'd hate for the generic Linux dcache to know about case
sensitivity, and I'd be a lot happier having a separate path (which isn't
as speed-critical) that can be used to help implement helper functions for
doing case-insensitive things.

That way the bugs and strange behaviour would all be limited to the
case-insensitive special code, and not pollute the "sane" side.

For example, I fundamentally can't easily do an atomic exclusive
case-insensitive "create" or "rename", but we _could_ expose things like
directory generation counts to the special interfaces, and thus allow at
least "local-atomic" operations (but they would _not_ be atomic over a
network, to give you an idea of the kinds of _fundamental_ limitations
there are here).

That's why I'd advocate having a few very special system calls for doing
the operations that samba (and I'll throw wine into the pot too) wants to
do. So you could literally do an atomic create with something like

 - regular atomic create of random case-_sensitive_ name using something 
   tempnam()-like (use a prefix that is invalid on windows or something: 
   make the first character be 0xff or whatever).
 - "read directory local sequence count"
 - readdir to make sure that the new name is still unique even in the
   case-insensitive sense
 - "atomic move conditionally on the local sequence count still being X"

The thing is, we can do hacks like the above, and yes, we could do them all
inside the kernel, and give user space a reasonably nice interface with 
"pseudo-atomic" behaviour (ie it will _not_ be atomic if multiple clients 
do this over NFS, but I doubt you care).
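For illustration, the closest user space can get to that sequence today
looks something like the sketch below. It is explicitly NOT atomic:
there is no directory sequence count to read, so the directory's
st_mtim is used as a coarse stand-in (an assumption, and timestamp
granularity can hide changes), and the final rename is not conditional
on anything. That remaining window is exactly what the special kernel
path would close. The names ci_create_excl() and ci_clash() and the
".ci-tmp-" prefix are made up, and strcasecmp() stands in for the real
comparison.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sys/stat.h>
#include <unistd.h>

/* 1 if an entry other than `skip` matches `name` ignoring ASCII case,
 * 0 if none does, -1 (errno set) if the directory cannot be opened. */
static int ci_clash(int dirfd, const char *name, const char *skip)
{
	struct dirent *de;
	DIR *d;
	int found = 0;
	int fd = openat(dirfd, ".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);

	if (fd < 0)
		return -1;
	d = fdopendir(fd);
	if (!d) {
		int e = errno;
		close(fd);
		errno = e;
		return -1;
	}
	while ((de = readdir(d)) != NULL) {
		if (strcmp(de->d_name, skip) != 0 &&
		    strcasecmp(de->d_name, name) == 0) {
			found = 1;
			break;
		}
	}
	closedir(d);
	return found;
}

/* Returns an open fd for the new file, or -1 with errno set. */
int ci_create_excl(int dirfd, const char *name, mode_t mode)
{
	struct stat before, after;
	char tmp[64];
	int fd, err = EAGAIN;	/* reported if we run out of retries */

	assert(name);
	/* Step 1: ordinary exclusive create under a throwaway name. */
	snprintf(tmp, sizeof tmp, ".ci-tmp-%ld", (long)getpid());
	fd = openat(dirfd, tmp, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, mode);
	if (fd < 0)
		return -1;

	for (int tries = 0; tries < 8; tries++) {
		int clash;

		/* Step 2: stand-in for "read directory sequence count". */
		if (fstat(dirfd, &before) < 0) { err = errno; break; }
		/* Step 3: make sure no case variant of the name exists. */
		clash = ci_clash(dirfd, name, tmp);
		if (clash < 0) { err = errno; break; }
		if (clash) { err = EEXIST; break; }
		if (fstat(dirfd, &after) < 0) { err = errno; break; }
		if (before.st_mtim.tv_sec != after.st_mtim.tv_sec ||
		    before.st_mtim.tv_nsec != after.st_mtim.tv_nsec)
			continue;	/* directory changed under us: rescan */
		/* Step 4: unconditional move; this is the racy part. */
		if (renameat(dirfd, tmp, dirfd, name) == 0)
			return fd;
		err = errno;
		break;
	}

	unlinkat(dirfd, tmp, 0);
	close(fd);
	errno = err;
	return -1;
}
```
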

But it wouldn't be "open()" and "rename()". It would be a totally separate
kernel path. It would be in the "case-insensitivity-module". It would be 
_outside_ the regular VFS layer, although it would have some visibility 
into it (ie it could follow dentries on its own, and know about the RCU 
etc locking rules).

We can even allow that case-insensitive module to set some flags in the 
dentries (so that you can create negative dentries that have a flag set 
"this is negative for all cases").

Trust me, this is much less intrusive, and a lot easier to debug too. It 
won't be as fast as the regular path operations, but depending on what the 
common cases are (hopefully "look up name that is exact"), it would likely 
not be horrible either. And it could probably be debugged as a real 
module, without impacting any existing code, which would make it a lot 
easier to create.

See where I'm going? Would this be acceptable to you? Are there any samba 
people who are knowledgeable about the VFS-layer and have the time/energy 
to try something like this?

Al? What do you think?

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17 16:57       ` Linus Torvalds
@ 2004-02-17 19:44         ` viro
  2004-02-17 20:10           ` Linus Torvalds
  2004-02-17 21:08         ` Robin Rosenberg
  2004-02-17 23:57         ` tridge
  2 siblings, 1 reply; 69+ messages in thread
From: viro @ 2004-02-17 19:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: tridge, Kernel Mailing List

On Tue, Feb 17, 2004 at 08:57:40AM -0800, Linus Torvalds wrote:
> Trust me, this is much less intrusive, and a lot easier to debug too. It 
> won't be as fast as the regular path operations, but depending on what the 
> common cases are (hopefully "look up name that is exact"), it would likely 
> not be horrible either. And it could probably be debugged as a real 
> module, without impacting any existing code, which would make it a lot 
> easier to create.
> 
> See where I'm going? Would this be acceptable to you? Are there any samba 
> people who are knowledgeable about the VFS-layer and have the time/energy 
> to try something like this?
> 
> Al? What do you think?

What will protect your generation counts during the operation itself?
->i_sem?

If anything, I'd suggest doing it as
	cretinous_rename(dir_fd, name1, name2)
with the following semantics:

	* if directory had been changed since open() that gave us dir_fd -
-EFOAD
	* otherwise, rename name1 to name2 (no cross-directory renames here).

No need to expose generation counts to userland - we can just compare the
count at open() time with that at operation time.  The rest can be done
in userland (including creation of files).

We _definitely_ don't want to put "UTF-8 case-insensitive comparison" anywhere
near the kernel - it's insane.  If samba wants it, they get to pay the price,
both in performance and keeping butt-ugly code (after all, the goal of project
is to imitate butt-ugly system for butt-ugly clients).  The same goes for Wine.

And we really don't want to encourage those who port Windows userland in
not fixing the idiotic semantics.  As for Lindows... let's just say that
I can't find any way to describe what I really think of those clowns, their
intellect and their morals that wouldn't lead to a lawsuit from them.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17 19:44         ` viro
@ 2004-02-17 20:10           ` Linus Torvalds
  2004-02-17 20:17             ` viro
  0 siblings, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2004-02-17 20:10 UTC (permalink / raw)
  To: viro; +Cc: tridge, Kernel Mailing List

On Tue, 17 Feb 2004 viro@parcelfarce.linux.theplanet.co.uk wrote:
> 
> What will protect your generation counts during the operation itself?
> ->i_sem?

Yes. You have to take it anyway, so why not?

> If anything, I'd suggest doing it as
> 	cretinous_rename(dir_fd, name1, name2)
> with the following semantics:
> 
> 	* if directory had been changed since open() that gave us dir_fd -
>   -EFOAD
> 	* otherwise, rename name1 to name2 (no cross-directory renames here).

Sure, that works.

> No need to expose generation counts to userland - we can just compare the
> count at open() time with that at operation time.  The rest can be done
> in userland (including creation of files).

Note that I'm not sure we would expose generation counts at all to user 
space: we might keep all of this inside the "crapola windows behaviour" 
module, and user space could actually see some easier highlevel interface. 
Something like yours, but I suspect we'd want to see what the whole 
user-level loop would look like to know what the architecture should be 
like.

I do believe we'd need to have some way to "refresh" the fd in your
example, without restarting the whole lookup. So that when the user gets 
EFOAD, it can do

	refresh(fd);
	readdir(fd);
	/* Check that nothing clashes */
	goto try_again;

or similar. So the generation count _semantics_ would be exposed, even if 
the numbers themselves would be hidden inside the kernel.

> We _definitely_ don't want to put "UTF-8 case-insensitive comparison" anywhere
> near the kernel - it's insane.  If samba wants it, they get to pay the price,
> both in performance and keeping butt-ugly code (after all, the goal of project
> is to imitate butt-ugly system for butt-ugly clients).  The same goes for Wine.

I agree. We'd need to let user space do the equality comparisons, I just 
don't see how to sanely do it in kernel land.

> And we really don't want to encourage those who port Windows userland in
> not fixing the idiotic semantics.  As for Lindows... let's just say that
> I can't find any way to describe what I really think of those clowns, their
> intellect and their morals that wouldn't lead to a lawsuit from them.

Heh.

I suspect most people don't care that much, but I also suspect that 
projects like samba have to have a "anal mode" where they really act like 
Windows, even when it's "wrong". People can then choose to say "screw that 
idiocy", but by just _having_ a very compatible mode you deflect a lot of 
criticism. Regardless of whether people want the anal mode or not in real 
life.

Backwards compatibility is King. It's _hugely_ important. It's one of the
most important things to me in the kernel, and by the same logic I do see
that it is important to others as well - even when the backwards
compatibility ends up being inherited from a broken Windows setup. So
while I hate case-insensitive names, I do understand that people want to
have some way to emulate the braindamage for some _really_ "ass-backwards"
compatibility reasons.

So I think it's worth some pain, as long as we keep that compatibility 
from starting to encrust the _good_ stuff.

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17 20:10           ` Linus Torvalds
@ 2004-02-17 20:17             ` viro
  2004-02-17 20:23               ` Linus Torvalds
  0 siblings, 1 reply; 69+ messages in thread
From: viro @ 2004-02-17 20:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: tridge, Kernel Mailing List

On Tue, Feb 17, 2004 at 12:10:23PM -0800, Linus Torvalds wrote:
> I do believe we'd need to have some way to "refresh" the fd in your
> example, without restarting the whole lookup. So that when the user gets 
> EFOAD, it can do
> 
> 	refresh(fd);

lseek(fd, 0, 0);

> > And we really don't want to encourage those who port Windows userland in
> > not fixing the idiotic semantics.  As for Lindows... let's just say that
> > I can't find any way to describe what I really think of those clowns, their
> > intellect and their morals that wouldn't lead to a lawsuit from them.
> 
> Heh.
> 
> I suspect most people don't care that much, but I also suspect that 
> projects like samba have to have a "anal mode" where they really act like 
> Windows, even when it's "wrong". People can then choose to say "screw that 
> idiocy", but by just _having_ a very compatible mode you deflect a lot of 
> criticism. Regardless of whether people want the anal mode or not in real 
> life.

Umm...  Samba deals with Windows clients.  Windows software allegedly being
ported to Linux is a different story and in that case there's no excuse for
demanding case-insensitive operations.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17 20:17             ` viro
@ 2004-02-17 20:23               ` Linus Torvalds
  0 siblings, 0 replies; 69+ messages in thread
From: Linus Torvalds @ 2004-02-17 20:23 UTC (permalink / raw)
  To: viro; +Cc: tridge, Kernel Mailing List

On Tue, 17 Feb 2004 viro@parcelfarce.linux.theplanet.co.uk wrote:
>
> > 	refresh(fd);
> 
> lseek(fd, 0, 0);

Yes. We can make that implicitly refresh, I'm certainly ok with that.

> > I suspect most people don't care that much, but I also suspect that 
> > projects like samba have to have a "anal mode" where they really act like 
> > Windows, even when it's "wrong". People can then choose to say "screw that 
> > idiocy", but by just _having_ a very compatible mode you deflect a lot of 
> > criticism. Regardless of whether people want the anal mode or not in real 
> > life.
> 
> Umm...  Samba deals with Windows clients.  Windows software allegedly being
> ported to Linux is a different story and in that case there's no excuse for
> demanding case-insensitive operations.

"wine". It's not porting, it's emulation.

But yes, I agree, I don't see any other cases where we want it. 

We basically want to support broken clients - whether they be on the other 
side of the network, or the other side of an emulation interface. That is 
the only valid reason to do this crap.

It's a fairly sizeable reason, though. On another front ("World
Domination, Fast!") we'll try to fix the problem another way, but there's 
nothing wrong with fighting on multiple fronts if you have the man-power.

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17 16:57       ` Linus Torvalds
  2004-02-17 19:44         ` viro
@ 2004-02-17 21:08         ` Robin Rosenberg
  2004-02-17 21:17           ` Linus Torvalds
  2004-02-17 23:57         ` tridge
  2 siblings, 1 reply; 69+ messages in thread
From: Robin Rosenberg @ 2004-02-17 21:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: tridge, Kernel Mailing List, Al Viro

On Tuesday 17 February 2004 17.57, Linus Torvalds wrote:
[case-insanesititvity proposal ///]
> See where I'm going? Would this be acceptable to you? Are there any samba 
> people who are knowledgeable about the VFS-layer and have the time/energy 
> to try something like this?

So the same guy that strongly insists that a file is a string of bytes and nothing else,
now thinks it is sane to even think of "case" of a byte. That's impossible unless you
actually DO believe it's a bunch of characters.  What is it?

-- robin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17 21:08         ` Robin Rosenberg
@ 2004-02-17 21:17           ` Linus Torvalds
  2004-02-17 22:27             ` Robin Rosenberg
  0 siblings, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2004-02-17 21:17 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: tridge, Kernel Mailing List, Al Viro

On Tue, 17 Feb 2004, Robin Rosenberg wrote:
>
> On Tuesday 17 February 2004 17.57, Linus Torvalds wrote:
> [case-insanesititvity proposal ///]
> > See where I'm going? Would this be acceptable to you? Are there any samba 
> > people who are knowledgeable about the VFS-layer and have the time/energy 
> > to try something like this?
> 
> So the same guy that strongly insist that a file is a string of bytes and nothing else,
> now thinks it is sane to even think of "case" of a byte. That's impossible unless you
> actually DO believe its a bunch of characters.  What is it?

Which part of my argument don't you understand?

The kernel proper thinks it's just a stream of bytes, and all the existing 
interfaces do likewise.

But we'd have a kernel helper module to let samba do what it already does 
now, except help it do so more efficiently?

The fact that _I_ think pathnames are just a nice stream of bytes sadly 
doesn't make Windows clients do the same. Some day when I'm King Of The 
World, and I can outlaw windows clients, we'll finally get rid of the 
braindamage, but until then I'm pragmatic enough to say "let's help out 
the poor samba people who have to deal with the crap day in and day out".

What's your problem with that?

		Linus


* Re: UTF-8 and case-insensitivity
  2004-02-17 21:17           ` Linus Torvalds
@ 2004-02-17 22:27             ` Robin Rosenberg
  2004-02-18  3:02               ` tridge
  0 siblings, 1 reply; 69+ messages in thread
From: Robin Rosenberg @ 2004-02-17 22:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: tridge, Kernel Mailing List, Al Viro

On Tuesday 17 February 2004 22.17, Linus Torvalds wrote:
> The fact that _I_ think pathnames are just a nice stream of bytes sadly 
> doesn't make Windows clients do the same. Some day when I'm King Of The 
> World, and I can outlaw windows clients, we'll finally get rid of the 
LPA = Linus' Patriot Act. 

> braindamage, but until then I'm pragmatic enough to say "let's help out 
> the poor samba people who have to deal with the crap day in and day out".
> 
> What's your problem with that?

Nothing wrong with helping people. 

Having to put up with the existence of Windows day in and out is the reason I'm still on
an eight-bit encoding.  Sorry for not explaining the REAL problem, but only a partial
problem. I need to support all kinds of clients on Windows with protocols that convey no
character set info. With samba that's no problem. Having to put up with a Unix world running 
ISO-8859-1 (or ISO-8859-15) is another. Of course that means Linux machines also add
to the disturbance by not storing things as unicode. The real obstacle is file names,
everything else including content of files, I can handle (I think). Maybe I'll find a solution
for the filenames too, but usually some hot discussions are needed for the brain to kick
into the right gear. 

I want to switch to UTF-8 to work better with the outside world, but as things are people will 
start to take notice of what OS is running in the shadows when they see the filename problems, and 
start demanding Windows, and ...  You see; I'm not mean; I don't want to do that to them (or myself).

-- robin


* Re: UTF-8 and case-insensitivity
  2004-02-17 22:27             ` Robin Rosenberg
@ 2004-02-18  3:02               ` tridge
  0 siblings, 0 replies; 69+ messages in thread
From: tridge @ 2004-02-18  3:02 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: Linus Torvalds, Kernel Mailing List, Al Viro

Robin,

 > Having to put up with the existence of Windows day in and out is
 > the reason I'm still on an eight-bit encoding.  Sorry for not
 > explaining the REAL problem, but only a partial problem. I need to
 > support all kinds of clients on Windows with protocols that convey
 > no character set info. With samba that's no problem. Having to put
 > up with a Unix world running ISO-8859-1 (or ISO-8859-15) is
 > another. Of course that means Linux machines also add to the
 > disturbance by not storing things as unicode. The real obstacle is
 > file names, everything else including content of files, I can
 > handle (I think). Maybe I'll find a solution for the filenames too,
 > but usually some hot discussions are needed for the brain to kick
 > into the right gear.

I suspect you are running Samba 2.x, which negotiated all that
multi-byte stuff on the wire. Samba 3.x does the same as windows
servers have done for years and negotiates UCS-2, which means that
every windows box that connects to it no matter what locale it is in
uses the same charset encoding as every other windows box.

There are still some legacy interfaces on the wire that use the old
encodings, but they are rare and getting rarer. To support these,
Samba3 juggles 4 character set encodings internally:

  * the unix-charset, which it uses to talk to the OS, and defaults to
    UTF-8

  * the windows wire charset, which is always UCS-2

  * the dos-charset for legacy parts of the protocol, which you have
    to configure in the samba config if you care about these legacy
    parts of the protocol (for example if you have older apps). It
    defaults to either CP850 or ASCII depending on what autoconf
    discovers. 

  * the display-charset which is used to put stuff on an admin's
    terminal for utilities like smbclient. The default depends on your
    LOCALE setting, or if nothing is set it uses ASCII.

Internally Samba3 only ever stores stuff in the "unix-charset"
encoding, which is usually UTF-8. It converts to the others as needed
when talking on the wire or to terminals.
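
A rough sketch of that conversion pattern, in Python for brevity. This is
not Samba code: the file name and variable names are made up, and the
charset choices (UTF-8, CP850, ASCII) are just the defaults listed above.
UTF-16-LE stands in for UCS-2, which is the same thing for characters in
the basic plane.

```python
# Hypothetical illustration only; this is not Samba code.
name_unix = "Måndag.txt".encode("utf-8")   # as stored on disk (unix-charset)

# Everything is kept in the unix-charset and converted only at the edges.
text = name_unix.decode("utf-8")
on_wire = text.encode("utf-16-le")                # what Windows clients see
legacy = text.encode("cp850", errors="replace")   # dos-charset for old calls
display = text.encode("ascii", errors="replace")  # display-charset fallback

print(len(name_unix), len(on_wire), len(legacy), display)
```

The point is that only one encoding is ever stored; the other three are
derived views of it.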

 > I want to switch to UTF-8 to work better with the outside world,
 > but as things are people will start to take notice of what OS is
 > running in the shadows when they see the filename problems, and
 > start demanding Windows, and ...  You see; I'm not mean; I don't
 > want to do that to them (or myself),

If you use Samba3 then they will not notice what charset you are using
on your Linux filesystems. The windows clients will just see UCS-2.

Cheers, Tridge


* Re: UTF-8 and case-insensitivity
  2004-02-17 16:57       ` Linus Torvalds
  2004-02-17 19:44         ` viro
  2004-02-17 21:08         ` Robin Rosenberg
@ 2004-02-17 23:57         ` tridge
  2 siblings, 0 replies; 69+ messages in thread
From: tridge @ 2004-02-17 23:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List, Al Viro

Linus,

 > It's also hard to know what to do when there are two filenames that
 > literally _are_ the same when not comparing cases. Which can obviously
 > happen under Linux - you'd have a case-sensitive app that creates both
 > "makefile" and "Makefile", and now you have a case-insensitive app that
 > looks it up (or worse, removes it), and what the *heck* is the dcache now
 > supposed to really do?

This is really not as bad as it first seems. Just think what the
absolutely obvious thing to do is and do that. It's like all those
things in POSIX where it says "if you do XXX then the behaviour is
undefined" and the implementations end up doing whatever the heck they
find easiest to do. It's the same here.

In the example you give then you just give whatever file you come
across first or happen to have in the dcache. You can't do better than
that, as the problem is fundamentally insoluble in a sane fashion, so
just don't try. We've been doing exactly that in Samba for 12 years
(picking the first file we come across) and I can't recall a *single*
complaint about that behaviour. Users *expect* the server to just pick
one, and have no pre-conceived idea of which one it will pick.
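
As a minimal sketch of the "pick the first one" rule (Python, with a
hypothetical ci_lookup helper and a plain list standing in for readdir
output; simple lower-casing is an assumption, not the real case table):

```python
def ci_lookup(entries, wanted):
    """Return the first entry equal to `wanted` ignoring case, else None."""
    folded = wanted.lower()
    for name in entries:        # order is whatever the directory scan returns
        if name.lower() == folded:
            return name
    return None

entries = ["README", "makefile", "Makefile", "main.c"]
print(ci_lookup(entries, "MAKEFILE"))    # whichever match comes first
print(ci_lookup(entries, "missing.h"))   # None, after scanning every entry
```

Note that the miss is the expensive case: every entry has to be examined
before the answer can be "no such file".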

Of course, some samba-tuned filesystem could have a mount option to
refuse to allow the creation of filenames that conflict in this way,
but don't even try to enforce this in the kernel core.

 > This is why I'd hate for the generic Linux dcache to know about case
 > sensitivity, and I'd be a lot happier having a separate path (which isn't
 > as speed-critical) that can be used to help implement helper functions for
 > doing case-insensitive things.

The problem is that if that separate path doesn't go via the dcache
then we won't get the invalidation of our negative dentries so we
won't be able to do any better than scanning the whole directory every
time to prove files don't exist. The dcache has to know about this as
it's the only place where all the information that is needed comes
together (I'm sure you'll correct me if I'm wrong about this).

 > That way the bugs and strange behaviour would be all be limited to the 
 > case-insensitive special code, and not pollute the "sane" side.

except when something like a file create happens on the "sane" side of
things and we then have no way of knowing that our name space has just
changed. I suppose we could create a completely new dcache in parallel
with the current one and have some sort of notify between the "sane"
and "insane" worlds, but I suspect the glue code between them would be
worse than just adding that context bit to the main dcache.

 > For example, I fundamentally can't easily do an atomic exclusive
 > case-insensitive "create" or "rename", but we _could_ expose things like
 > directory generation counts to the special interfaces, and thus allow at
 > least "local-atomic" operations (but they would _not_ be atomic over a
 > network, to give you an idea of the kinds of _fundamental_ limitations
 > there are here).

yes, doing atomic network file operations sucks, but please don't let
that stop us doing it in a reasonable fashion for local filesystems.

Doing a nice atomic case-insensitive create or rename is really *no*
different from what we do now in Linux, it just means that we need to
have case-insensitive dentries that mean "this is a negative dentry
that covers all possible case combinations of the name it
contains". It is up to the filesystem to provide you with that -ve
dentry (just like the filesystem provides the case-sensitive -ve
dentries now) and the dcache just has to use it in the same way that
it uses the existing ones.

If you really don't want to do this then fine, in which case I'll ask
again in a year or two's time and see if I can convince you then. I
know this would make the code messier, and making code messier for the
sake of interoperability with windows is perhaps reason enough not to
do it. But please don't tell me it *can't* be done or that it is just
too hard. That's just not true.

 >  - regular atomic create of random case-_sensitive_ name using something 
 >    tempnam()-like (use a prefix that is invalid on windows or something: 
 >    make the first character be 0xff or whatever).
 >  - "read directory local sequence count"
 >  - readdir to make sure that the new name is still unique even in the
 >    case-insensitive sense
 >  - "atomic move conditionally on the local sequence count still being X"

that could make things atomic, but it won't make it fast. Think about
the fact that modern filesystems are now using better than linear
lists for directories. So in most cases lookups in large directories
can be done in much better than O(n) time (for reasonable values of
n). The above solution means Samba will never be better than O(n), so
for large directories we will always suck performance-wise. It doesn't
have to be that way.
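
To make the cost concrete, here is a toy model of the quoted four-step
scheme in Python. Everything in it is hypothetical (ToyDir, move_if_seq,
the 0xff temp prefix); no such kernel interface exists, it only mirrors
the steps listed above.

```python
import itertools

class ToyDir:
    """Toy directory with a sequence count bumped on every change."""
    def __init__(self):
        self.names = set()
        self.seq = 0
        self._tmp = itertools.count()

    def create_temp(self):
        # Step 1: atomic create of a case-sensitive temporary name.
        name = "\xfftmp%d" % next(self._tmp)
        self.names.add(name)
        self.seq += 1
        return name

    def unlink(self, name):
        self.names.remove(name)
        self.seq += 1

    def move_if_seq(self, old, new, expected_seq):
        # Step 4: rename only if nothing changed since `expected_seq`.
        if self.seq != expected_seq:
            return False
        self.names.remove(old)
        self.names.add(new)
        self.seq += 1
        return True

def ci_create(d, name):
    tmp = d.create_temp()
    while True:
        seq = d.seq                                   # step 2
        # Step 3: full scan of the directory, O(n) in its size.
        if any(n.lower() == name.lower() for n in d.names):
            d.unlink(tmp)
            return False
        if d.move_if_seq(tmp, name, seq):
            return True                               # else retry

d = ToyDir()
print(ci_create(d, "Makefile"), ci_create(d, "MAKEFILE"), sorted(d.names))
```

The scheme is atomic in this model, but the scan in step 3 runs on every
create, which is the O(n) behaviour complained about above.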

 > We can even allow that case-insensitive module to set some flags in the 
 > dentries (so that you can create negative dentries that have a flag set 
 > "this is negative for all cases").

ahh! yipee!

yes, if we have that dentry bit then we have a hope. Without that I
think it won't help much.

 > See where I'm going? Would this be acceptable to you? Are there any samba 
 > people who are knowledgeable about the VFS-layer and have the time/energy 
 > to try something like this?

I'll discuss this with some of the people here in OzLabs and see if we
can come up with a plan. I suspect most of OzLabs will be avoiding me
for a day or two in an attempt to not be the one to do this :-)

Cheers, Tridge


* Re: UTF-8 and case-insensitivity
  2004-02-17 15:13     ` Linus Torvalds
  2004-02-17 16:57       ` Linus Torvalds
@ 2004-02-17 23:20       ` tridge
  2004-02-17 23:43         ` Linus Torvalds
  2004-02-18  2:37         ` H. Peter Anvin
  1 sibling, 2 replies; 69+ messages in thread
From: tridge @ 2004-02-17 23:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List, Al Viro

Linus,

 > Yes, we could add context sensitivity to the dcache with a context 
 > bitmask.
 > 
 > However, it's _not_ correct.
 > 
 > It assumes that there is only one way to do lower/upper case, which just 
 > isn't true. What about different locales that have different case rules? 
 > Your "one bit per dentry" becomes "one bit per locale per dentry". That's 
 > just horribly hard to do.

I think you're making it sound much harder than it really is.

We just add a VFS hook in the filesystems. The filesystem chooses the
encoding specific comparison function. If the filesystem doesn't
provide one then don't do case insensitivity. If the filesystem does
provide one (for example NTFS, JFS) then use it. Then all I need to do
is convince one of the filesystem maintainers to add a mount time
option to specify the case table (for example by specifying the name
of a file in the filesystem that holds it).

So, all the really ugly stuff is then in the per-filesystem code, and
all the VFS and dcache has to do is know about a single context bit
per dentry. 

 > I don't know how Windows does it, so maybe this thing is hardcoded, and 
 > you don't even want "true" case insensitivity. 

NTFS has a 128k table on disk, created at mkfs time and indexed by the
UCS2 character. The interesting thing about this table is that it
doesn't seem to vary between different locales as one might expect. I
have checked 3 locales so far (Swedish, Japanese and English) and all
have the same 128k table. I should check a few more locales to see if
it really is the same everywhere. Contact me off-list if you have a
NTFS filesystem created in a different locale and would be willing to
run a test program against it to see if the table is different from
the one we have in Samba.

There is stuff in the charset handling of every locale that does vary
in windows, but it isn't the case table, it's the "valid characters"
map used to determine what characters are allowed when converting
strings into legacy multi-byte encodings. Even I don't think that the
kernel will ever have to deal with that crap unless someone is foolish
enough to port Samba into the kernel (several people have actually
done that despite the insanity of the idea, but they all did an
absolutely terrible job of it and certainly didn't take care to get
all the charset handling right).

> How "correct" is Windows?

from my rather limited point of view I always have to assume that
windows is "correct", unless I can show that its behaviour leads to
data loss, a security hole or something equally extreme.

Cheers, Tridge


* Re: UTF-8 and case-insensitivity
  2004-02-17 23:20       ` tridge
@ 2004-02-17 23:43         ` Linus Torvalds
  2004-02-18  3:26           ` tridge
  2004-02-18  2:37         ` H. Peter Anvin
  1 sibling, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2004-02-17 23:43 UTC (permalink / raw)
  To: tridge; +Cc: Kernel Mailing List, Al Viro

On Wed, 18 Feb 2004 tridge@samba.org wrote:
> 
> I think you're making it sound much harder than it really is.

I think I'm just making the mistake of assuming that anybody would care to 
do it "right", while everybody really only cares to get it to be compatible 
with Windows.

For example, if you only want to be compatible with Windows, you don't 
have to worry about UCS-4, you only have the UCS-2 part, which means that 
you can do a silly array-lookup based thing or something.

> We just add a VFS hook in the filesystems. The filesystem chooses the
> encoding specific comparison function. If the filesystem doesn't
> provide one then don't do case insensitivity. If the filesystem does
> provide one (for example NTFS, JFS) then use it. Then all I need to do
> is convince one of the filesystem maintainers to add a mount time
> option to specify the case table (for example by specifying the name
> of a file in the filesystem that holds it).

Ugh. What a horrible kludge, and it won't work without "preparing" the 
filesystem at mount-time. I'd much rather leave the translation table in 
user space, and just give it as an argument to the "look up case 
insensitive" special thing.

That would mean that we can hold the directory semaphore over the whole 
thing, which would simplify _my_ kludge, since there would be no need to 
worry about user space having separate stages.

The hard part would be negative dentries. We'd have to invalidate all
"case-insensitive" negative dentries when creating any new file in a
directory, and that would be something the generic VFS layer would have to 
know about, and that might be unacceptable to Al.

		Linus


* Re: UTF-8 and case-insensitivity
  2004-02-17 23:43         ` Linus Torvalds
@ 2004-02-18  3:26           ` tridge
  2004-02-18  5:33             ` H. Peter Anvin
  2004-02-18  7:54             ` Marc Lehmann
  0 siblings, 2 replies; 69+ messages in thread
From: tridge @ 2004-02-18  3:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List, Al Viro

Linus,

 > For example, if you only want to be compatible with Windows, you don't 
 > have to worry about UCS-4, you only have the UCS-2 part, which means that 
 > you can do a silly array-lookup based thing or something.

Even within UCS-2 land the case-mapping table is sparse as only some
characters have an upper/lower mapping. In fact, there are just 636
characters out of 64k that have an upper/lower case mapping that isn't
the identity. That is across *all* languages that windows uses for
UCS-2.

In Samba that's not sparse enough that it's worth saving the single
mmap of 128k to encode it sparsely in memory, but in UCS-4 land you
would obviously use a sparse mapping, and that mapping table would
probably be just a few k in size. If you allow for extents then I
expect you could encode it in a couple of hundred bytes.
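
The dense-versus-sparse trade-off can be sketched like this (Python; the
table is built from Python's own Unicode data, which is an assumption and
is NOT the NTFS table, so the count it prints will differ from 636):

```python
from array import array

# Dense table: one 16-bit entry per UCS-2 code unit, identity by default.
dense = array("H", range(0x10000))
# Sparse table: only the code points whose upper-case form differs.
sparse = {}

for cp in range(0x10000):
    if 0xD800 <= cp <= 0xDFFF:
        continue                      # surrogates are not characters
    up = chr(cp).upper()
    # Keep only simple one-to-one mappings that stay inside UCS-2.
    if len(up) == 1 and ord(up) != cp and ord(up) < 0x10000:
        dense[cp] = ord(up)
        sparse[cp] = ord(up)

def upcase(cp):
    """Sparse lookup: anything not listed maps to itself."""
    return sparse.get(cp, cp)

print(len(dense) * dense.itemsize)    # dense size in bytes (2 bytes per entry assumed)
print(len(sparse))                    # number of non-identity mappings
```

With 2-byte entries the dense table is 64k * 2 = 128k, matching the size
of the mmap'ed table; the sparse form trades that memory for a slower
lookup per character.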

(I experimented with using a sparse mapping in Samba, and it was a
slight loss on the machine I was testing on compared to just doing the
mmap, so I went with the mmap. Maybe someone else can do a better
sparse encoding than I did and actually get a win due to better cache
behaviour.)

 > Ugh. What a horrible kludge, and it won't work without "preparing" the 
 > filesystem at mount-time. I'd much rather leave the translation table in 
 > user space, and just give it as an argument to the "look up case 
 > insensitive" special thing.

The case mapping table must remain the same for the lifetime of the
mounted filesystem, otherwise you'd get chaos.  That's why tying it to
the filesystem (ie. hanging it off the superblock) makes sense.

 > The hard part would be negative dentries. We'd have to invalidate all
 > "case-insensitive" negative dentries when creating any new file in a
 > directory, and that would be something the generic VFS layer would have to 
 > know about

Right, the handling of negative dentries is the key. I don't think it's
quite as bad as you say though, as you can do this:

1) use a filesystem provided case-insensitive hash in the dcache. If
   the filesystem provided hash isn't case-insensitive then don't try
   to do case-insensitive lookups on this filesystem.

2) you only need to potentially invalidate entries in the same hash
   bucket as the name you are creating. 

3) Even better, you don't need to invalidate entries that don't have
   the same hash value (presuming your hash values are larger than
   your truncated hash keys).
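
A toy model of points 1 to 3, in Python. The names (ToyDcache, ci_hash,
NBUCKETS) are invented for illustration, CRC32 over the lower-cased name
is an arbitrary stand-in for a filesystem-provided case-insensitive hash,
and none of this reflects the real dcache code.

```python
import zlib
from collections import defaultdict

NBUCKETS = 8   # deliberately tiny so that buckets are shared

def ci_hash(name):
    """Case-insensitive hash: names differing only in case hash the same."""
    return zlib.crc32(name.lower().encode("utf-8"))

class ToyDcache:
    def __init__(self):
        # bucket index -> list of (full hash, name) negative entries
        self.negative = defaultdict(list)

    def add_negative(self, name):
        h = ci_hash(name)
        self.negative[h % NBUCKETS].append((h, name))

    def is_negative(self, name):
        h = ci_hash(name)
        return any(eh == h and en.lower() == name.lower()
                   for eh, en in self.negative[h % NBUCKETS])

    def on_create(self, name):
        """Drop only the negative entries whose full hash matches `name`."""
        h = ci_hash(name)
        bucket = self.negative[h % NBUCKETS]
        kept = [(eh, en) for eh, en in bucket if eh != h]
        self.negative[h % NBUCKETS] = kept
        return len(bucket) - len(kept)

dc = ToyDcache()
dc.add_negative("Makefile")
dc.add_negative("notes.txt")
dropped = dc.on_create("MAKEFILE")
print(dropped, dc.is_negative("makefile"))
```

Only one bucket is touched per create, and within it only entries with
the same full hash value are discarded.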

> and that might be unacceptable to Al.

yes, and I'm quite sympathetic to that point of view. I just want to
make sure that if we don't do this then we use honest reasons for not
doing it, not "that's impossible" reasons which are bogus when you
examine them.

Cheers, Tridge


* Re: UTF-8 and case-insensitivity
  2004-02-18  3:26           ` tridge
@ 2004-02-18  5:33             ` H. Peter Anvin
  2004-02-18  7:54             ` Marc Lehmann
  1 sibling, 0 replies; 69+ messages in thread
From: H. Peter Anvin @ 2004-02-18  5:33 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <16434.56190.639555.554525@samba.org>
By author:    tridge@samba.org
In newsgroup: linux.dev.kernel
> 
> In Samba that's not sparse enough that it's worth saving the single
> mmap of 128k to encode it sparsely in memory, but in UCS-4 land you
> would obviously use a sparse mapping, and that mapping table would
> probably be just a few k in size. If you allow for extents then I
> expect you could encode it in a couple of hundred bytes.
> 

If all you care about is the UTF-16-compatible range, you only need
1088K entries in your table; small enough that it can be reasonably
had in userspace.
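
The arithmetic behind that figure, as a quick check (Python; the variable
names are just for illustration):

```python
PLANES = 17            # the BMP plus the 16 planes reachable via surrogates
PER_PLANE = 0x10000    # 65536 code points per plane

entries = PLANES * PER_PLANE
print(entries, entries // 1024)   # 1114112 1088
```

That is every code point from 0 to 0x10FFFF, i.e. 17 * 64K = 1088K.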

> (I experimented with using a sparse mapping in Samba, and it was a
> slight loss on the machine I was testing on compared to just doing the
> mmap, so I went with the mmap. Maybe someone else can do a better
> sparse encoding than I did and actually get a win due to better cache
> behaviour.)

The thing is, you're probably only touching small parts of your table,
so the kernel and the CPU cache work quite well on the large table as
it is.

Wouldn't work in kernel space, though.

	-hpa


* Re: UTF-8 and case-insensitivity
  2004-02-18  3:26           ` tridge
  2004-02-18  5:33             ` H. Peter Anvin
@ 2004-02-18  7:54             ` Marc Lehmann
  1 sibling, 0 replies; 69+ messages in thread
From: Marc Lehmann @ 2004-02-18  7:54 UTC (permalink / raw)
  To: linux-kernel

On Wed, Feb 18, 2004 at 02:26:54PM +1100, tridge@samba.org wrote:
> Even within UCS-2 land the case-mapping table is sparse as only some
> characters have an upper/lower mapping. In fact, there are just 636
> characters out of 64k that have an upper/lower case mapping that isn't
> the identity. That is across *all* languages that windows uses for
> UCS-2.

This is because scripts differentiating between upper and lower case are
rare exceptions in the world.

Unfortunately, commonly used exceptions, and still locale dependent.

Having a samba-helper kernel module that would contain this table (I am
confident that it's only a single table in existing versions of windows,
but maybe they improve that in future versions) could solve this problem.

I still wonder whether it can ever be made efficient, though.

-- 
      -----==-                                             |
      ----==-- _                                           |
      ---==---(_)__  __ ____  __       Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /       pcg@goof.com      |e|
      -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
    The choice of a GNU generation                       |
                                                         |


* Re: UTF-8 and case-insensitivity
  2004-02-17 23:20       ` tridge
  2004-02-17 23:43         ` Linus Torvalds
@ 2004-02-18  2:37         ` H. Peter Anvin
  2004-02-18  3:03           ` Linus Torvalds
  2004-02-18  4:08           ` tridge
  1 sibling, 2 replies; 69+ messages in thread
From: H. Peter Anvin @ 2004-02-18  2:37 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <16434.41376.453823.260362@samba.org>
By author:    tridge@samba.org
In newsgroup: linux.dev.kernel
> 
>  > I don't know how Windows does it, so maybe this thing is hardcoded, and 
>  > you don't even want "true" case insensitivity. 
> 
> NTFS has a 128k table on disk, created at mkfs time and indexed by the
> UCS2 character.

So you're hosed if anyone uses characters outside the UCS-2 character
set...

> The interesting thing about this table is that it doesn't seem to
> vary between different locales as one might expect. I have checked 3
> locales so far (Swedish, Japanese and English) and all have the same
> 128k table. I should check a few more locales to see if it really is
> the same everywhere. Contact me off-list if you have a NTFS
> filesystem created in a different locale and would be willing to run
> a test program against it to see if the table is different from the
> one we have in Samba.

There is a "standard" table, which is published by the Unicode
consortium.  However, the "standard" table isn't what you want in
certain locales, e.g. Turkish.
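
A small illustration of the Turkish case (Python; str.lower() applies the
locale-independent default mapping, and turkish_lower is a hypothetical
helper written only for this example):

```python
def turkish_lower(s):
    """Lower-case with Turkish rules: I -> dotless i, dotted capital I -> i."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()

name = "TITLE.TXT"
default = name.lower()          # standard table: "title.txt"
turkish = turkish_lower(name)   # Turkish rules: "t\u0131tle.txt"

print(default == turkish)       # False: the two tables disagree
```

So whether "TITLE.TXT" and "title.txt" are the same name depends on which
table is in force.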

> There is stuff in the charset handling of every locale that does vary
> in windows, but it isn't the case table, it's the "valid characters"
> map used to determine what characters are allowed when converting
> strings into legacy multi-byte encodings. Even I don't think that the
> kernel will ever have to deal with that crap unless someone is foolish
> enough to port Samba into the kernel (several people have actually
> done that despite the insanity of the idea, but they all did an
> absolutely terrible job of it and certainly didn't take care to get
> all the charset handling right).
> 
> > How "correct" is Windows?
> 
> from my rather limited point of view I always have to assume that
> windows is "correct", unless I can show that its behaviour leads to
> data loss, a security hole or something equally extreme.

Well, we don't want to support a bunch of hacks to make it behave like
Windows if what Windows does doesn't make sense.  If so you should use
a metalayer where you canonicalize the filenames and don't store
"Makefile" on the disk; store "makefile" and keep the "real" filename
stashed elsewhere, perhaps an EA.

	-hpa



* Re: UTF-8 and case-insensitivity
  2004-02-18  2:37         ` H. Peter Anvin
@ 2004-02-18  3:03           ` Linus Torvalds
  2004-02-18  3:14             ` H. Peter Anvin
  2004-02-18  4:08           ` tridge
  1 sibling, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2004-02-18  3:03 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

On Wed, 18 Feb 2004, H. Peter Anvin wrote:
> 
> Well, we don't want to support a bunch of hacks to make it behave like
> Windows if what Windows does doesn't make sense.

I'd disagree, for a very simple reason: case-insensitivity itself simply 
does not make sense, so the _only_ reason for having a bunch of hacks is 
literally to support windows file exports and nothing else.

I obviously agree with the fact that we should _not_ put those hacks into 
the VFS layer proper - we should keep them as a separate thing, and we 
should make it clear that it makes no sense _except_ for Windows 
compatibility.

Think of it as nothing more than a binary compatibility layer, the same 
way we have hooks to support "lcall 7,0" for binary compatibility with 
some silly (and much less interesting) x86 OSes through external modules.

		Linus


* Re: UTF-8 and case-insensitivity
  2004-02-18  3:03           ` Linus Torvalds
@ 2004-02-18  3:14             ` H. Peter Anvin
  2004-02-18  3:27               ` Linus Torvalds
  0 siblings, 1 reply; 69+ messages in thread
From: H. Peter Anvin @ 2004-02-18  3:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:
> 
> On Wed, 18 Feb 2004, H. Peter Anvin wrote:
> 
>>Well, we don't want to support a bunch of hacks to make it behave like
>>Windows if what Windows does doesn't make sense.
> 
> 
> I'd disagree, for a very simple reason: case-insensitivity itself simply 
> does not make sense, so the _only_ reason for having a bunch of hacks is 
> literally to support windows file exports and nothing else.
> 
> I obviously agree with the fact that we should _not_ put those hacks into 
> the VFS layer proper - we should keep them as a separate thing, and we 
> should make it clear that it makes no sense _except_ for Windows 
> compatibility.
> 
> Think of it as nothing more than a binary compatibility layer, the same 
> way we have hooks to support "lcall 7,0" for binary compatibility with 
> some silly (and much less interesting) x86 OSes through external modules.
> 

Well, this is also true :)  I still say it belongs in userspace.

For 100% bug-compatibility with Windows, though, it is probably
worthwhile to have the filename in the native filesystem be not what a
Windows user would see, but rather the normalized filename.  That makes
a userspace implementation much easier.

	-hpa



* Re: UTF-8 and case-insensitivity
  2004-02-18  3:14             ` H. Peter Anvin
@ 2004-02-18  3:27               ` Linus Torvalds
  2004-02-18 21:31                 ` tridge
  0 siblings, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2004-02-18  3:27 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

On Tue, 17 Feb 2004, H. Peter Anvin wrote:
> 
> Well, this is also true :)  I still say it belongs in userspace.

The thing is, I do agree with Tridge on one simple fact: it's very hard 
indeed to do atomic file operations from user space.

That's not necessarily a problem if samba is the only process accessing
the directories in question, since then samba could do all locking
internally and make sure that it never does anything inconsistent.

However, clearly people who run samba on a machine want to potentially 
_also_ export that same filesystem as a NFS volume, as a way to have both 
Windows and UNIX clients access the same data. And that pretty much means 
that other people _will_ access the directories, and that samba can't do 
its internal locking in that kind of environment.

This is why I am sympathetic to the need to add _some_ kind of support 
for this. And the only common place ends up being the kernel.

> For 100% bug-compatibility with Windows, though, it is probably
> worthwhile to have the filename in the native filesystem be not what a
> Windows user would see, but rather the normalized filename.  That makes
> a userspace implementation much easier.

Oh, absolutely. But that's something that samba can easily do internally: 
it can choose to just entirely ignore filenames that aren't normalized, or 
it can export it on the wire (obviously in the normalized UCS-2 format), 
and just consider non-normalized names to be another "case". In fact, 
that's what the naive implementation would do anyway, so that's not any 
added complexity.

(And samba clearly _cannot_ show the client a non-normalized name per se, 
since the smb protocol ends up using UCS-2).

		Linus


* Re: UTF-8 and case-insensitivity
  2004-02-18  3:27               ` Linus Torvalds
@ 2004-02-18 21:31                 ` tridge
  2004-02-18 22:23                   ` Linus Torvalds
  0 siblings, 1 reply; 69+ messages in thread
From: tridge @ 2004-02-18 21:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, linux-kernel

Linus,

 > The thing is, I do agree with Tridge on one simple fact: it's very hard 
 > indeed to do atomic file operations from user space.

I'm glad I'm making progress :)

The second basic fact that I think is relevant is that it's not
possible to do case-insensitive filesystem operations efficiently
without the filesystem having knowledge of the fact that you want a
case-insensitive lookup.

The reason for this is that modern filesystems do much better than an
O(n) linear scan for lookups in directories. They use a hash, or a
tree or whatever you like to take advantage of an ordering function on
the names in the directory. The days of linear scans in directories
are fast dwindling.

The only way you are going to avoid the linear scan for a
case-insensitive lookup is to make that ordering function
case-insensitive. The question really is whether we are willing to pay
the price in terms of complexity for doing that. I've tried to make
the claim in this thread that the code complexity cost of doing this
isn't really all that high, but it is definitely non-zero.
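
To illustrate what a case-insensitive ordering function buys, here is a
toy directory indexed by a folded key (Python; FoldedDir is hypothetical,
a dict stands in for the on-disk hash or tree, and lower() stands in for
the real case table):

```python
class FoldedDir:
    """Toy directory indexed by a case-folded key, case-preserving."""
    def __init__(self):
        self.by_key = {}              # folded name -> name as created

    def create(self, name):
        key = name.lower()
        if key in self.by_key:
            raise FileExistsError(self.by_key[key])
        self.by_key[key] = name

    def lookup(self, name):
        # One indexed probe, hit or miss; no scan of the directory.
        return self.by_key.get(name.lower())

d = FoldedDir()
d.create("Makefile")
print(d.lookup("MAKEFILE"), d.lookup("missing.h"))
```

Proving that "missing.h" does not exist costs a single probe here, where
a case-sensitive index forces a scan of every entry.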

So your magic_open() proposal would probably be a help, and would
certainly reduce the amount of code we would need in userspace, but it
doesn't change the fundamental linear scan of directories problem at
all. 

That doesn't mean I won't take you up on the magic_open() proposal,
it's just that I'd need to try it to see if it's a sufficient win to
justify using it given the limitations.

Cheers, Tridge

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18 21:31                 ` tridge
@ 2004-02-18 22:23                   ` Linus Torvalds
  2004-02-18 22:28                     ` Linus Torvalds
  0 siblings, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2004-02-18 22:23 UTC (permalink / raw)
  To: tridge; +Cc: H. Peter Anvin, linux-kernel



On Thu, 19 Feb 2004 tridge@samba.org wrote:
> 
> The second basic fact that I think is relevant is that it's not
> possible to do case-insensitive filesystem operations efficiently
> without the filesystem having knowledge of the fact that you want a
> case-insensitive lookup.

That's not my problem. That is _your_ problem, and I don't care. I 
disagree violently with the notion that we would push this down to a 
filesystem level.

Sorry, but there are limits to how much we care about broken operating 
systems.

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18 22:23                   ` Linus Torvalds
@ 2004-02-18 22:28                     ` Linus Torvalds
  2004-02-18 22:50                       ` tridge
  0 siblings, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2004-02-18 22:28 UTC (permalink / raw)
  To: tridge; +Cc: H. Peter Anvin, linux-kernel



On Wed, 18 Feb 2004, Linus Torvalds wrote:
> 
> That's not my problem. That is _your_ problem, and I don't care. I 
> disagree violently with the notion that we would push this down to a 
> filesystem level.
> 
> Sorry, but there are limits to how much we care about broken operating 
> systems.

Side note: this only matters for cold cache entries anyway, so I doubt 
you'll see any performance improvement on a file server from passing the 
brain damage down to the lower levels. 

And I bet the performance advantages of _not_ doing native case 
insensitivity are likely to dominate hugely.

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18 22:28                     ` Linus Torvalds
@ 2004-02-18 22:50                       ` tridge
  2004-02-18 22:59                         ` Linus Torvalds
  0 siblings, 1 reply; 69+ messages in thread
From: tridge @ 2004-02-18 22:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, linux-kernel

Linus,

 > And I bet the performance advantages of _not_ doing native case 
 > insensitivity are likely to dominate hugely.

This part I just don't understand at all. The proposed changes would
be extremely cheap performance wise as you are just replacing one hash
with another, and dealing with one extra context bit in the
dcache. There is no way that this could come anywhere near the cost of
doing linear directory scans.

The hash function would be slightly more expensive (when enabled), but
not much, especially when you put in the obvious optimisation for 7
bit characters. The string comparison function in a couple of places
would also become more expensive, but once again it would only be
expensive for case-insensitive processes and benefits from the 7 bit
optimisation so that the average case will only be very slightly more
expensive than the current function.

Fair enough that you don't want to do this for code complexity
reasons, but please don't tell me it would be slower than what we have
to do now. 

Try an strace of Samba trying to unlink() a non-existent file in a
large directory. It's enough to make you want to curl up and die :)

Cheers, Tridge

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18 22:50                       ` tridge
@ 2004-02-18 22:59                         ` Linus Torvalds
  2004-02-18 23:09                           ` tridge
  0 siblings, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2004-02-18 22:59 UTC (permalink / raw)
  To: tridge; +Cc: H. Peter Anvin, linux-kernel

On Thu, 19 Feb 2004 tridge@samba.org wrote:
> 
>  > And I bet the performance advantages of _not_ doing native case 
>  > insensitivity are likely to dominate hugely.
> 
> This part I just don't understand at all. The proposed changes would
> be extremely cheap performance wise as you are just replacing one hash
> with another, and dealing with one extra context bit in the
> dcache. There is no way that this could come anywhere near the cost of
> doing linear directory scans.

Why do you focus on linear directory scans?

They simply do not happen under any reasonable IO patterns. You look up 
names under the same name that they are on the disk. So the _only_ thing 
that should matter is the exact match.

The inexact matches should be a case of "make them correct". Screw 
performance. And tell people that they are slower.

Sure, I can imagine that MS would make some benchmark to show that case, 
but at that point I just don't care. 

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18 22:59                         ` Linus Torvalds
@ 2004-02-18 23:09                           ` tridge
  2004-02-18 23:16                             ` Linus Torvalds
  0 siblings, 1 reply; 69+ messages in thread
From: tridge @ 2004-02-18 23:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, linux-kernel

Linus,

 > Why do you focus on linear directory scans?

Because a large number of file operations are on filenames that don't
exist. I have to *prove* they don't exist. That includes:

 * every file create. I have to prove there wasn't an existing file
   under a different case combination.

 * every rename. Again, I have to prove that the destination name
   doesn't exist.

 * every open of a non-existent name (*very* common, it's what MS
   office does all the time).

 etc etc.

If I had a single function that could quickly tell me that a file does
not exist in any case combination then I would be much better off.

 > They simply do not happen under any reasonable IO patterns. You look up 
 > names under the same name that they are on the disk. So the _only_ thing 
 > that should matter is the exact match.

nope, see above. The most common pattern of accesses involves doing a
full directory scan on every access.

 > Sure, I can imagine that MS would make some benchmark to show that case, 
 > but at that point I just don't care. 

It's not just "some benchmark". It's the normal use case.

Cheers, Tridge

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18 23:09                           ` tridge
@ 2004-02-18 23:16                             ` Linus Torvalds
  2004-02-19  8:10                               ` Jamie Lokier
  0 siblings, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2004-02-18 23:16 UTC (permalink / raw)
  To: tridge; +Cc: H. Peter Anvin, linux-kernel

On Thu, 19 Feb 2004 tridge@samba.org wrote:
> 
>  > Why do you focus on linear directory scans?
> 
> Because a large number of file operations are on filenames that don't
> exist. I have to *prove* they don't exist.

And you only need to do that ONCE per name.

There is zero reason to do it over and over again, and there is zero 
reason to push case insensitivity deep into the filesystem.

Have you checked how many filesystems we have? Hint: 

	ls -l fs/ | grep '^d' | wc

The thing is, you have to realize that Windows-compatibility is very very 
much second-class. If you refuse to realize that, you can't argue 
effectively, because you are arguing for things that simply WILL NOT 
happen.

So instead of having this crazy windows-centric idea, I would suggest you 
try to come up with ways to make it easier for you. I can tell you already 
that it won't be everything you want or need, but quite frankly, your 
choice is between _nada_ and something reasonable.

So give it up. We're not making the same STUPID mistakes that Microsoft 
has done. 

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18 23:16                             ` Linus Torvalds
@ 2004-02-19  8:10                               ` Jamie Lokier
  2004-02-19 16:09                                 ` Linus Torvalds
  0 siblings, 1 reply; 69+ messages in thread
From: Jamie Lokier @ 2004-02-19  8:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: tridge, H. Peter Anvin, linux-kernel

Linus Torvalds wrote:
> >  > Why do you focus on linear directory scans?
> > 
> > Because a large number of file operations are on filenames that don't
> > exist. I have to *prove* they don't exist.
> 
> And you only need to do that ONCE per name.
> 
> There is zero reason to do it over and over again, and there is zero 
> reason to push case insensitivity deep into the filesystem.

Linus, while I agree with you wholeheartedly on everything else in
this thread - how can Samba only do that lookup ONCE per name if a
client is issuing many requests for non-existent opens or stats?

Example: A client has a search path for executables or libraries.

Each time SomeThing.DLL is looked up by the client, it will issue an
open() for each entry in the path, until it finds the file it wants.

For each request, Samba must readdir() every directory in the path
until the file is found.

If a directory doesn't change between requests, Samba can use dnotify
to cache the negative lookups.

However, if any change occurs in a directory, or if the directory is
not dnotify-capable, Samba is not allowed to cache these negative
results: It has to do the readdir() for _every_ request.
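
The caching rule above can be sketched as a small negative-lookup cache that
is thrown away on any change to the directory. This is only an illustration:
struct neg_cache and its functions are invented names, matching is ASCII-only,
and how the change notification arrives (for example a dnotify signal) is
left outside the sketch.

```c
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

#define NEG_MAX      64
#define NEG_NAME_MAX 256

/* Names that a full directory scan has proven absent. */
struct neg_cache {
    char names[NEG_MAX][NEG_NAME_MAX];
    int count;
};

/* Call when the directory changed in any way: forget everything. */
void neg_cache_flush(struct neg_cache *c)
{
    c->count = 0;
}

/* Returns 1 if name was already proven absent (ignoring ASCII case). */
int neg_cache_hit(const struct neg_cache *c, const char *name)
{
    for (int i = 0; i < c->count; i++)
        if (strcasecmp(c->names[i], name) == 0)
            return 1;
    return 0;
}

/* Record that a full scan found no match for name. */
void neg_cache_add(struct neg_cache *c, const char *name)
{
    if (c->count == NEG_MAX || strlen(name) >= NEG_NAME_MAX)
        return;         /* cache full or name too long: just skip it */
    strcpy(c->names[c->count++], name);
}
```

A caller would consult neg_cache_hit() before scanning, call neg_cache_add()
after a scan that found nothing, and call neg_cache_flush() whenever it learns
that the directory changed. Without such a notification the cache cannot be
trusted at all, which is the case described above.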

-- Jamie

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-19  8:10                               ` Jamie Lokier
@ 2004-02-19 16:09                                 ` Linus Torvalds
  2004-02-19 16:38                                   ` Jamie Lokier
  0 siblings, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2004-02-19 16:09 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: tridge, H. Peter Anvin, linux-kernel

On Thu, 19 Feb 2004, Jamie Lokier wrote:
> 
> Linus, while I agree with you wholeheartedly on everything else in
> this thread - how can Samba only do that lookup ONCE per name if a
> client is issuing many requests for non-existent opens or stats?

While I'm not willing to push case insensitivity deep into the
filesystems, I _am_ willing to entertain the notion of an extra flag to a
dcache entry that the regular VFS operations ignore (apart from clearing
it when they change anything and having to flush them under some
circumstances), which would basically be "this dentry has been judged
unique in a case-insensitive environment".

So assuming nobody else is touching the directory, the case-insensitive 
special module could create these kinds of dentries to its heart's content 
when it does a lookup.

> Example: A client has a search path for executables or libraries.
> 
> Each time SomeThing.DLL is looked up by the client, it will issue an
> open() for each entry in the path, until it finds the file it wants.
> 
> For each request, Samba must readdir() every directory in the path
> until the file is found.
> 
> If a directory doesn't change between requests, Samba can use dnotify
> to cache the negative lookups.
> 
> However, if any change occurs in a directory, or if the directory is
> not dnotify-capable, Samba is not allowed to cache these negative
> results: It has to do the readdir() for _every_ request.

But this is exactly what I _am_ willing to entertain: have some limited 
special logic inside the kernel (but outside the VFS layer proper), that 
allows samba to use special interfaces that avoid this.

For example, the rule can be that _any_ regular dentry create will 
invalidate all the "case-insensitive" dentries. Just to be simple about 
it. But if samba is the only thing that accesses a certain directory (or 
the directory is not written to, like / and /usr etc usually behave), the 
"windows hack" interface will be able to populate it with its fake 
dentries all it wants.

Or something like this. Basically, I'm convinced that the problem _can_ be 
solved without going deep into the VFS layer. Maybe I'm wrong. But I'd 
better not be, because we're definitely not going to screw up the VFS 
layer for Windows.

			Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-19 16:09                                 ` Linus Torvalds
@ 2004-02-19 16:38                                   ` Jamie Lokier
  2004-02-19 16:54                                     ` Linus Torvalds
  0 siblings, 1 reply; 69+ messages in thread
From: Jamie Lokier @ 2004-02-19 16:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: tridge, H. Peter Anvin, linux-kernel

Linus Torvalds wrote:
> For example, the rule can be that _any_ regular dentry create will 
> invalidate all the "case-insensitive" dentries. Just to be simple about 
> it.

If that's the rule, then with exactly the same algorithmic efficiency,
readdir+dnotify can be used to maintain the cache in userspace
instead.  There is nothing gained by using the helper module in that case.

It follows that a helper module is only useful if readdir+dnotify
isn't fast enough, and the invalidation rule has to be more selective.

(Although, maybe there are atomicity concerns I haven't thought of).

-- Jamie

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-19 16:38                                   ` Jamie Lokier
@ 2004-02-19 16:54                                     ` Linus Torvalds
  2004-02-19 18:29                                       ` Jamie Lokier
  2004-02-19 19:08                                       ` Helge Hafting
  0 siblings, 2 replies; 69+ messages in thread
From: Linus Torvalds @ 2004-02-19 16:54 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: tridge, H. Peter Anvin, linux-kernel



On Thu, 19 Feb 2004, Jamie Lokier wrote:
> Linus Torvalds wrote:
> > For example, the rule can be that _any_ regular dentry create will 
> > invalidate all the "case-insensitive" dentries. Just to be simple about 
> > it.
> 
> If that's the rule, then with exactly the same algorithmic efficiency,
> readdir+dnotify can be used to maintain the cache in userspace
> instead.  There is nothing gained by using the helper module in that case.

Wrong.

Because the dnotify would trigger EVEN FOR SAMBA OPERATIONS.

Think about it. Think about samba doing a "rename()" within the directory.

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-19 16:54                                     ` Linus Torvalds
@ 2004-02-19 18:29                                       ` Jamie Lokier
  2004-02-19 19:08                                       ` Helge Hafting
  1 sibling, 0 replies; 69+ messages in thread
From: Jamie Lokier @ 2004-02-19 18:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: tridge, H. Peter Anvin, linux-kernel

Linus Torvalds wrote:
> > > For example, the rule can be that _any_ regular dentry create will 
> > > invalidate all the "case-insensitive" dentries. Just to be simple about 
> > > it.
> > 
> > If that's the rule, then with exactly the same algorithmic efficiency,
> > readdir+dnotify can be used to maintain the cache in userspace
> > instead.  There is nothing gained by using the helper module in that case.
> 
> Wrong.
> Because the dnotify would trigger EVEN FOR SAMBA OPERATIONS.

Ah, I didn't know you meant "_any_ regular dentry create (except for
Samba operations)".

To apply that rule, you either need alternate versions of rename() and
other file syscalls, or something akin to a process-specific flag (set
by the helper module) saying that this is a Samba process and dentry
creation _by this process_ shouldn't invalidate case-insensitive
dentries.

And if you have either of those, the bit of code which says "don't
invalidate case-insensitive dentries because this is a Samba process"
can just as easily say "don't send dnotify events to the current
process".

And once you've done that, it's easier just to add a DN_IGNORE_SELF
flag to dnotify meaning to ignore events caused by the current
process, and forget about the helper module.  That'd be useful for
other programs, too.

-- Jamie

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-19 16:54                                     ` Linus Torvalds
  2004-02-19 18:29                                       ` Jamie Lokier
@ 2004-02-19 19:08                                       ` Helge Hafting
  1 sibling, 0 replies; 69+ messages in thread
From: Helge Hafting @ 2004-02-19 19:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jamie Lokier, tridge, H. Peter Anvin, linux-kernel

On Thu, Feb 19, 2004 at 08:54:51AM -0800, Linus Torvalds wrote:
> 
> 
> On Thu, 19 Feb 2004, Jamie Lokier wrote:
> > Linus Torvalds wrote:
> > > For example, the rule can be that _any_ regular dentry create will 
> > > invalidate all the "case-insensitive" dentries. Just to be simple about 
> > > it.
> > 
> > If that's the rule, then with exactly the same algorithmic efficiency,
> > readdir+dnotify can be used to maintain the cache in userspace
> > instead.  There is nothing gained by using the helper module in that case.
> 
> Wrong.
> 
> Because the dnotify would trigger EVEN FOR SAMBA OPERATIONS.
> 
> Think about it. Think about samba doing a "rename()" within the directory.

Avoiding its own operations is a nice one.  Could dnotify pass
some information, such as the inode number involved to samba?
samba could then look up the filename in its cache and take a
closer look at that file only.  That would avoid losing the cache,
even in case of other processes intruding.

Helge Hafting

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18  2:37         ` H. Peter Anvin
  2004-02-18  3:03           ` Linus Torvalds
@ 2004-02-18  4:08           ` tridge
  2004-02-18 10:05             ` Robin Rosenberg
  1 sibling, 1 reply; 69+ messages in thread
From: tridge @ 2004-02-18  4:08 UTC (permalink / raw)
  To: hpa; +Cc: Kernel Mailing List

Hpa,

> So you're hosed if anyone uses characters outside the UCS-2 character
> set...

I've heard they are re-defining all those 16 bit numbers to be UCS-16
instead of UCS-2 for exactly that reason. This is rather similar to
the move in the Unix community to start using UTF-8.

Note that I am not at all proposing that we use UCS-2 in the Linux
kernel (except in places where you have to, like the NTFS
filesystem). I am proposing that the filesystems be able to offer a
case-insensitive hash function to the dcache, and I would expect that
this function would be based on UTF-8. 

The function might operate internally by converting UTF-8 to UCS-2, or
it might use a sparse mapping table. It would almost certainly have a
fast-path that looked first to see if there are any bytes with the top
bit set, and if there are none then it can do a really easy 7 bit
table based hash which would make this really fast for most users.
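
As a rough illustration of the fast path described above, the sketch below
hashes case-folded code points with FNV-1a. Everything here is an assumption
made for the example: the name ci_hash, the choice of FNV-1a, and a folding
"table" that covers only ASCII and U+00C0..U+00DE. A real function would need
a complete case table and strict UTF-8 validation.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 2166136261u
#define FNV_PRIME  16777619u

/* Stand-in for a real case table: ASCII and U+00C0..U+00DE only. */
static uint32_t fold_cp(uint32_t cp)
{
    if (cp >= 'A' && cp <= 'Z')
        return cp + 0x20;
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7)
        return cp + 0x20;
    return cp;
}

uint32_t ci_hash(const char *name, size_t len)
{
    const unsigned char *s = (const unsigned char *)name;
    uint32_t h = FNV_OFFSET;
    int seven_bit = 1;
    size_t i;

    /* Look first for any byte with the top bit set. */
    for (i = 0; i < len; i++) {
        if (s[i] & 0x80) {
            seven_bit = 0;
            break;
        }
    }

    if (seven_bit) {
        /* Fast path: every byte is one ASCII character. */
        for (i = 0; i < len; i++) {
            h ^= fold_cp(s[i]);
            h *= FNV_PRIME;
        }
        return h;
    }

    /* Slow path: decode two-byte UTF-8 sequences and fold them. */
    for (i = 0; i < len; ) {
        uint32_t cp = s[i];

        if (cp < 0x80) {
            cp = fold_cp(cp);
            i += 1;
        } else if (cp >= 0xC2 && cp <= 0xDF && i + 1 < len &&
                   (s[i + 1] & 0xC0) == 0x80) {
            cp = fold_cp(((cp & 0x1F) << 6) | (s[i + 1] & 0x3F));
            i += 2;
        } else {
            i += 1;     /* anything else is hashed byte by byte, unfolded */
        }
        h ^= cp;
        h *= FNV_PRIME;
    }
    return h;
}
```

Both paths hash folded code points the same way, so a name gets the same hash
whichever path handles it. Names that differ only in case then land in the
same bucket, and the caller still has to confirm the match with a
case-insensitive comparison.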

The point is that the kernel proper (the VFS and dcache in particular)
won't have to care how this hash works. They're just consumers of it. 

> There is a "standard" table, which is published by the Unicode
> consortium. 

The table used in windows is not exactly the same as the one on
unicode.org. Which is "correct" I will leave up to the pedants to
discuss, as all that Samba cares about is that it uses the same table
as w2k.

> However, the "standard" table isn't what you want in certain
> locales, e.g. Turkish.

I'd really like someone to confirm this for me by volunteering to run
a tool I provide on a Turkish NTFS filesystem or sending me a
compressed empty Turkish NTFS volume (please ask first by email - I
only need one of these). Up to now I have only ever seen the one 128k
table used across all windows locales. If this table really *is*
different in some locales then I need to know.

Cheers, Tridge

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18  4:08           ` tridge
@ 2004-02-18 10:05             ` Robin Rosenberg
  2004-02-18 11:43               ` tridge
  0 siblings, 1 reply; 69+ messages in thread
From: Robin Rosenberg @ 2004-02-18 10:05 UTC (permalink / raw)
  To: tridge; +Cc: hpa, Kernel Mailing List

On Wednesday 18 February 2004 05.08, tridge@samba.org wrote:
> Hpa,
> 
> > So you're hosed if anyone uses characters outside the UCS-2 character
> > set...
> 
> I've heard they are re-defining all those 16 bit numbers to be UCS-16
> instead of UCS-2 for exactly that reason. This is rather similar to
> the move in the Unix community to start using UTF-8.

I've read it also: http://www.microsoft.com/globaldev/getwr/steps/wrg_unicode.mspx
"The fundamental representation of text in Windows NT-based operating systems is UTF-16"

-- robin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18 10:05             ` Robin Rosenberg
@ 2004-02-18 11:43               ` tridge
  2004-02-18 12:31                 ` Robin Rosenberg
  0 siblings, 1 reply; 69+ messages in thread
From: tridge @ 2004-02-18 11:43 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: hpa, Kernel Mailing List

Robin,

 > I've read it also:
 > http://www.microsoft.com/globaldev/getwr/steps/wrg_unicode.mspx
 > "The fundamental representation of text in Windows NT-based
 > operating systems is UTF-16"

yep, in this thread I've been mistakenly using the term UCS-16 when I
should have said UTF-16 (ie. the variable length, 2 byte encoding).

Samba currently treats the bytes on the wire from windows as UCS-2 (a
2 byte fixed width encoding), whereas perhaps it should be treating
them as UTF-16. I should write a smbtorture test to detect the
difference and see what different versions of windows actually use.
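
For reference, the practical difference between the two readings is the
surrogate range. A minimal decoding sketch, assuming the 16-bit units are
already in host byte order (the function name utf16_decode is made up for
this example):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one code point from n UTF-16 code units.
 * Stores the code point in *cp and returns the number of units consumed
 * (1 or 2), or 0 on empty input or an unpaired surrogate.
 */
size_t utf16_decode(const uint16_t *u, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;

    /* Outside U+D800..U+DFFF one unit is one code point, as in UCS-2. */
    if (u[0] < 0xD800 || u[0] > 0xDFFF) {
        *cp = u[0];
        return 1;
    }

    /* A high surrogate followed by a low surrogate encodes U+10000 and up. */
    if (u[0] <= 0xDBFF && n >= 2 && u[1] >= 0xDC00 && u[1] <= 0xDFFF) {
        *cp = 0x10000 + (((uint32_t)(u[0] - 0xD800) << 10) |
                         (uint32_t)(u[1] - 0xDC00));
        return 2;
    }

    return 0;
}
```

A UCS-2 reader would treat the units 0xD834 0xDD1E as two separate 16-bit
values, while a UTF-16 reader decodes them as the single code point U+1D11E.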

luckily the new charset handling stuff in samba3 and samba4 will make
this easy to fix :-)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18 11:43               ` tridge
@ 2004-02-18 12:31                 ` Robin Rosenberg
  2004-02-18 16:48                   ` H. Peter Anvin
  0 siblings, 1 reply; 69+ messages in thread
From: Robin Rosenberg @ 2004-02-18 12:31 UTC (permalink / raw)
  To: tridge; +Cc: hpa, Kernel Mailing List

On Wednesday 18 February 2004 12.43, tridge@samba.org wrote:
> Robin,
>  > I've read it also:
>  > http://www.microsoft.com/globaldev/getwr/steps/wrg_unicode.mspx
>  > "The fundamental representation of text in Windows NT-based
>  > operating systems is UTF-16"

I believe (please correct me if this is wrong) that Windows never actually
supported any of the UCS-2 codes that were in conflict with UTF-16. The cost
of this operation was that some of the "private" code blocks of unicode 2.0, i.e. 
U+D800..U+DFFF were redefined as "surrogates" in Unicode 3.0 making the 
UTF-16 encoding more or less backwards compatible with UCS-2. And it's 
UTF-16LE and UCS-2LE, but I suspect you knew that :-)

> yep, in this thread I've been mistakenly using the term UCS-16 when I
> should have said UTF-16 (ie. the variable length, 2 byte encoding).
> 
> Samba currently treats the bytes on the wire from windows as UCS-2 (a
> 2 byte fixed width encoding), whereas perhaps it should be treating
> them as UTF-16. I should write a smbtorture test to detect the
> difference and see what different versions of windows actually use.
See above, and most importantly the definition in Amendment 1 of the unicode 
3.0 standard.

> luckily the new charset handling stuff in samba3 and samba4 will make
> this easy to fix :-)
Happy man!

-- robin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18 12:31                 ` Robin Rosenberg
@ 2004-02-18 16:48                   ` H. Peter Anvin
  2004-02-18 20:00                     ` H. Peter Anvin
  0 siblings, 1 reply; 69+ messages in thread
From: H. Peter Anvin @ 2004-02-18 16:48 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: tridge, Kernel Mailing List

Robin Rosenberg wrote:
> 
> I believe (please correct me if this is wrong) that Windows never actually
> supported any of the UCS-2 codes that were in conflict with UTF-16. The cost
> of this operation was that some of the "private" code blocks of unicode 2.0, i.e. 
> U+D800..U+DFFF were redefined as "surrogates" in Unicode 3.0 making the 
> UTF-16 encoding more or less backwards compatible with UCS-2. And it's 
> UTF-16LE and UCS-2LE, but I suspect you knew that :-)
> 

Make that Unicode 1.0 and 1.1, and you're correct.

	-hpa

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18 16:48                   ` H. Peter Anvin
@ 2004-02-18 20:00                     ` H. Peter Anvin
  0 siblings, 0 replies; 69+ messages in thread
From: H. Peter Anvin @ 2004-02-18 20:00 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <4033974F.4090706@zytor.com>
By author:    "H. Peter Anvin" <hpa@zytor.com>
In newsgroup: linux.dev.kernel
>
> Robin Rosenberg wrote:
> > 
> > I believe (please correct me if this is wrong) that Windows never actually
> > supported any of the UCS-2 codes that were in conflict with UTF-16. The cost
> > of this operation was that some of the "private" code blocks of unicode 2.0, i.e. 
> > U+D800..U+DFFF were redefined as "surrogates" in Unicode 3.0 making the 
> > UTF-16 encoding more or less backwards compatible with UCS-2. And it's 
> > UTF-16LE and UCS-2LE, but I suspect you knew that :-)
> > 
> 
> Make that Unicode 1.0 and 1.1, and you're correct.
> 

Err, that was supposed to be 1.1 and 2.0.

Unicode 1.1 reshuffled the private use range from Unicode 1.0, in
order to make room for surrogates in Unicode 2.0.

UTF-16, what a horrible ugly hack.

	-hpa

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17  5:11 ` Linus Torvalds
  2004-02-17  6:54   ` tridge
@ 2004-02-19  2:53   ` Daniel Newby
  1 sibling, 0 replies; 69+ messages in thread
From: Daniel Newby @ 2004-02-19  2:53 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Tridgell, Kernel Mailing List

Linus Torvalds wrote:
> So some variation of the interface
> 
> 	int magic_open(
> 		/* Input arguments */
> 		const char *pathname,
> 		unsigned long flags,
> 		mode_t mode,

What about making the pathname hold the alternative cases for each 
character, not just an exact string?  If Samba wanted to open
"A File.txt", it would do

     magic_open( "[a|A][ ][f|F][i|I][l|L][e|E][.][t|T][x|X][t|T]", ... )

The syntax shown is conceptual; the actual code would use binary 
packing.  Characters would be variable length to support UTF-8 and 
the like.

Userland would be responsible for making a useful pathname.  If it 
tried something like "[aL|P|#][m|m]", the kernel would cheerfully 
use it.  The only sanity checking would be that special characters 
like "/" and ":" cannot have alternatives.
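
How userland might build such a pathname can be sketched as follows, using
the conceptual text syntax from the example rather than a binary packing.
This assumes 7-bit names in the C locale; ci_pattern is a hypothetical helper,
and magic_open() itself remains only a proposal in this thread.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Writes the conceptual "[x|X]" pattern for an ASCII name into out.
 * Returns 0 on success, -1 if out is too small or name is not 7-bit.
 */
int ci_pattern(const char *name, char *out, size_t outsz)
{
    size_t o = 0;

    for (; *name; name++) {
        unsigned char c = (unsigned char)*name;
        char lo, up;

        if (c & 0x80)
            return -1;          /* non-ASCII needs a real case table */
        lo = (char)tolower(c);
        up = (char)toupper(c);
        if (o + 6 > outsz)      /* conservatively: "[x|X]" plus NUL */
            return -1;
        out[o++] = '[';
        out[o++] = lo;
        if (lo != up) {         /* only letters get an alternative */
            out[o++] = '|';
            out[o++] = up;
        }
        out[o++] = ']';
    }
    if (o >= outsz)
        return -1;
    out[o] = '\0';
    return 0;
}
```

Characters without a case pair, such as " ", "." and "/", come out with a
single alternative, which matches the rule that special characters cannot
have alternatives.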

Pros:

1.  Filesystem names are looked up in kernel mode, where it might be 
efficient.  (Less grossly slow at least.)

2.  But the kernel doesn't care about encodings and character sets.

3.  No new kernel infrastructure needed.  (I hope?)  The case-sensitive
system calls don't take a performance hit.

4.  The kernel can detect name collisions and decide what to do 
based on a flag.

5.  Lookup tables are totally in userland and outside locks.  Each 
app can use the table it finds appropriate.

6.  A naughty app can't deadlock the filesystem.

7.  Case-insensitive calls can be atomic, if you're willing to pay 
the performance price.  It's straightforward for magic_creat() to 
refuse to create collisions.

Cons:

1.  Looking up multiple alternatives is hairy.  (Not that the other 
approaches are much prettier.)

2.  Massive filenames would get turned into something *really* 
massive (five times as many bytes for a simple packing).  Does this 
break anything?

     -- Daniel Newby

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17  4:12 tridge
  2004-02-17  5:11 ` Linus Torvalds
@ 2004-02-17  5:25 ` Tim Connors
  2004-02-17  7:43 ` H. Peter Anvin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 69+ messages in thread
From: Tim Connors @ 2004-02-17  5:25 UTC (permalink / raw)
  To: linux-kernel

tridge@samba.org said on Tue, 17 Feb 2004 15:12:06 +1100:
> Given how much pain the "kernel is agnostic to charset encoding"
> attitude has cost me in terms of programming pain, I thought I should
> de-cloak from lurk mode and put my 2c into the UTF-8 issue.
> 
> Personally I think that eventually the Linux kernel will have to
> embrace the interpretation of the byte streams that applications have
> given it,

What applications?

> despite the fact that this will be very painful and
> potentially quite complex. The reason is that I think that eventually
> the Linux kernel will need to efficiently support a userspace policy
> of case-insensitivity and the only way to do case-insensitive filename
> operations is to interpret those byte streams as a particular
> encoding.
> 
> Personally I much prefer the systems I use to be case-sensitive, but
> there are important applications that require case-insensitivity for
> interoperability. 

Why? Sounds pretty idiotic to me.

If you don't like it, use some microshit filesystem like vfat. I'll
keep using ext3 etc, thanks.

-- 
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
Conclusion to my thesis -- "It is trivial to show that it is 
clearly obvious that this is not woofly."

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17  4:12 tridge
  2004-02-17  5:11 ` Linus Torvalds
  2004-02-17  5:25 ` Tim Connors
@ 2004-02-17  7:43 ` H. Peter Anvin
  2004-02-17  8:05   ` H. Peter Anvin
  2004-02-17 14:25 ` Dave Kleikamp
  2004-02-18  0:16 ` Robert White
  4 siblings, 1 reply; 69+ messages in thread
From: H. Peter Anvin @ 2004-02-17  7:43 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <16433.38038.881005.468116@samba.org>
By author:    tridge@samba.org
In newsgroup: linux.dev.kernel
>
> Given how much pain the "kernel is agnostic to charset encoding"
> attitude has cost me in terms of programming pain, I thought I should
> de-cloak from lurk mode and put my 2c into the UTF-8 issue.
> 
> Personally I think that eventually the Linux kernel will have to
> embrace the interpretation of the byte streams that applications have
> given it, despite the fact that this will be very painful and
> potentially quite complex. The reason is that I think that eventually
> the Linux kernel will need to efficiently support a userspace policy
> of case-insensitivity and the only way to do case-insensitive filename
> operations is to interpret those byte streams as a particular
> encoding.
> 

Realistically, the only sane way to do this is to put our foot down
and say: UTF-8 is *the* encoding.  A good step in that direction would
be to set utf-8 to be the default NLS in the kernel, but as long as
people keep the whole sick idea that we can continue to use
locale-dependent encoding we're in for a world of hurt.

That's really the long and short of it.  Until people are willing to
say "we support UTF-8, anything else and it's anyone's guess what
happens" then nothing is going to happen.

	-hpa
-- 
PGP public key available - finger hpa@zytor.com
Key fingerprint: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD  1E DF FE 69 EE 35 BD 74
"The earth is but one country, and mankind its citizens."  --  Bahá'u'lláh
Just Say No to Morden * The Shadows were defeated -- Babylon 5 is renewed!!

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17  7:43 ` H. Peter Anvin
@ 2004-02-17  8:05   ` H. Peter Anvin
  0 siblings, 0 replies; 69+ messages in thread
From: H. Peter Anvin @ 2004-02-17  8:05 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <c0sgnc$ngo$1@terminus.zytor.com>
By author:    hpa@zytor.com (H. Peter Anvin)
In newsgroup: linux.dev.kernel
> 
> Realistically, the only sane way to do this is to put our foot down
> and say: UTF-8 is *the* encoding.  A good step in that direction would
> be to set utf-8 to be the default NLS in the kernel, but as long as
> people keep the whole sick idea that we can continue to use
> locale-dependent encoding we're in for a world of hurt.
> 
> That's really the long and short of it.  Until people are willing to
> say "we support UTF-8, anything else and it's anyone's guess what
> happens" then nothing is going to happen.
> 

Oh yes, on top of that, if you want case insensitivity, then you also
need to start thinking about a whole lot of other things, including
what normalization form(s) you care about.  Keeping normalization (as
well as case-conversion) data for the entire Unicode space in the
kernel is a boatload of memory.

Then, you have to deal with your filesystem going sour on you when two
files suddenly alias, because there is a new revision of the mapping
tables.
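
To make the normalization point concrete, here is a minimal sketch (the
helper name same_name_bytes and the two sample names are made up for
illustration).  The precomposed and the decomposed spelling of a-umlaut
are different byte strings in UTF-8, so an encoding-agnostic lookup that
compares names byte for byte treats them as two distinct files:

```c
#include <assert.h>
#include <string.h>

/* "vas" with a-umlaut as one precomposed code point, U+00E4 (NFC). */
const char NAME_NFC[] = "v\xc3\xa4s";
/* The same word as 'a' plus combining diaeresis, U+0308 (NFD). */
const char NAME_NFD[] = "va\xcc\x88s";

/* Hypothetical helper: byte-wise comparison, which is all a lookup
 * can do when it does not interpret the encoding. */
int same_name_bytes(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

Any scheme that wants these two to alias has to pick a normalization
form and ship the data for it, which is the memory cost described above.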

Case seemed simple when we were dealing with the "let's teach them all
English" world, but even when you're dealing with languages like
German (ß) or Dutch (Ĳ) things get fuzzy... what's worse, in
Turkish the uppercase equivalent of "i" (U+0069) isn't "I" (U+0049),
it's "İ" (U+0130)!  There is no table which can tell you that, since
it's context-dependent.  Thus, you may now need to consider larger
equivalence classes, but is the other user expecting the same thing?
You can't just use the same base letter being equivalent everywhere,
or a Swedish user would beat the sh*t out of you for confusing the
words "vas" and "väs".  On the other hand, the Swedish user would be
perfectly happy having "ä" equivalent with "æ" and "ü" equivalent
with "y"!
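
A minimal sketch of the Turkish case, assuming a made-up helper upper_cp
that knows only ASCII letters plus the dotted i.  The point is that the
function needs a second argument: the code point alone does not
determine the answer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: uppercase one code point.  `turkic` is an
 * illustrative flag standing in for "the caller's language is Turkish";
 * nothing in a plain byte-string file name carries that information. */
uint32_t upper_cp(uint32_t cp, int turkic)
{
    assert(cp <= 0x10FFFF);
    if (turkic && cp == 0x0069)     /* 'i' -> U+0130, capital I with dot */
        return 0x0130;
    if (cp >= 'a' && cp <= 'z')     /* plain ASCII letters */
        return cp - ('a' - 'A');
    return cp;                      /* everything else left untouched */
}
```

A single fixed table can encode only one of the two answers for U+0069,
so whichever one it picks is wrong for some users.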

Therein lies madness.

	-hpa

-- 
PGP public key available - finger hpa@zytor.com
Key fingerprint: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD  1E DF FE 69 EE 35 BD 74
"The earth is but one country, and mankind its citizens."  --  Bahá'u'lláh
Just Say No to Morden * The Shadows were defeated -- Babylon 5 is renewed!!

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-17  4:12 tridge
                   ` (2 preceding siblings ...)
  2004-02-17  7:43 ` H. Peter Anvin
@ 2004-02-17 14:25 ` Dave Kleikamp
  2004-02-18  0:16 ` Robert White
  4 siblings, 0 replies; 69+ messages in thread
From: Dave Kleikamp @ 2004-02-17 14:25 UTC (permalink / raw)
  To: tridge; +Cc: linux-kernel

On Mon, 2004-02-16 at 22:12, tridge@samba.org wrote:
> Given how much pain the "kernel is agnostic to charset encoding"
> attitude has cost me in terms of programming pain, I thought I should
> de-cloak from lurk mode and put my 2c into the UTF-8 issue.
> 
> Personally I think that eventually the Linux kernel will have to
> embrace the interpretation of the byte streams that applications have
> given it, despite the fact that this will be very painful and
> potentially quite complex. The reason is that I think that eventually
> the Linux kernel will need to efficiently support a userspace policy
> of case-insensitivity and the only way to do case-insensitive filename
> operations is to interpret those byte streams as a particular
> encoding.
> 
> Personally I much prefer the systems I use to be case-sensitive, but
> there are important applications that require case-insensitivity for
> interoperability. Right now it is not possible to write a case
> insensitive application on Linux in an efficient manner. With the
> current "encoding agnostic" APIs a simple open() or stat() call
> becomes a horrendously expensive operation and one that is fraught
> with race conditions. Providing the same functionality in the kernel
> is dirt cheap by comparison (not cheap in terms of code complexity,
> but cheap in terms of runtime efficiency).

This would be easy to do in JFS due to the baggage we carried over to be
compatible with OS/2-formatted volumes.  In OS/2, the directories were
ordered in a case-insensitive fashion.  This would have to be a mkfs
option, and would not be a per-process option.  The directories must be
created either case-sensitive or not.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: UTF-8 and case-insensitivity
  2004-02-17  4:12 tridge
                   ` (3 preceding siblings ...)
  2004-02-17 14:25 ` Dave Kleikamp
@ 2004-02-18  0:16 ` Robert White
  2004-02-18  0:20   ` Linus Torvalds
  2004-02-18  2:48   ` tridge
  4 siblings, 2 replies; 69+ messages in thread
From: Robert White @ 2004-02-18  0:16 UTC (permalink / raw)
  To: tridge, linux-kernel
  Cc: 'Linus Torvalds', 'Kernel Mailing List',
	'Al Viro', 'Neil Brown'

OK, so I wrote the below, but then in the summary I realized that there was
a significant factor that doesn't fit in with the rest of the post.  Case
insensitivity, and more generally locale equivalence rules, is a security
nightmare.  Consider the number of different file names that "su" could map
to if you apply case insensitivity (4) and/or worse yet the various accents
and umlauts (?,etc) that are sort-equivalent for "u" in some locales.  The user
types "su" and runs "S(u-umlaut)" etc.

====

In point of fact (ok in point of "technically abstract truth"), it is a "bad
thing" that Windows (and seemingly only Windows these days) is case
insensitive.  It is sometimes said that windows is really an application and
not an OS.  If you ignore the occasionally snide *way* it is said you can
find some technical truth to the matter.

In point of fact the entire windows application space has a singular active
locale at any one time and there is a well-defined but horrible layer of
indirection where "long names" like "My Documents" become "real names" like
"MYDOCU~1".  Essentially every windows file name is subject to a
double-indirect file name translation.  The first pass is the strcasecmp()
locale-dependent traversal of the "long name" list.  The second is the
strcasecmp() frozen-locale-spec-dependent traversal of (US Latin?) 8.3 file
naming standard list of media elements (files/directories).

In point of fact, Windows is *not* "properly" case insensitive at the file
system level.  Use "dir /x" more often on your windows box to relive the
experience.  The "real" file names are mangled to good old 8.3 uppercase
internally(1).  You don't usually have to think about this, but if you have
ever lost the long-to-short file name mapping on a drive you know the hell
that ensues.  (see also iso9660.)

So the application file naming interface wedge thingy (in windows) creates
and maintains the mixed case names as an illusion.  It just happens to be an
illusion planted so deeply in the application space that it appears to be
coming up from the "operating system level".

OK, as time has moved on, some later versions of later file systems *may* (I
honestly don't know) have modified the double-indirection model, but if they
have, they must have done so in a guaranteed-to-look-the-same way.  Either
way it ends up being quite costly.

Further, the model only really works because a DOS (and therefore windows)
based program invariably and individually takes responsibility for doing all
sorts of tasks like wildcard expansions (etc) in the application space
(often "free" through comctl32.dll).  [This tends to be foreign to Linux
(UNIX) programmers where shells and such do the expansion.]

The line is then blurred further by the subsequent steady creep of
wildcarding and file selection back into common DLLs.  (more comctl32.dll
and friends.)

The thing is, to match this ersatz "functionality" on a system where more
than one locale may be used at the same time, you end up with a kind of
Cartesian product of user locales and filesystem native locales.  The cost
could get extreme and can only really be amortized if Linux were to declare
our own 8.3 style pronouncement for the character classes used for the
"real" file name storage (etc).

Late stage case insensitivity isn't that hard to put in a linux application,
just crack open your file selection dialog boxes and have them use
strcasecmp() in all their select/sort logic.  Also then replace open() with
CaseOpen() which does a find/search operation before daring to creat().
That is, in every practical way, how Windows handles these problems.  It
just happens in some fairly interesting and hard-to-predict places depending
on context.
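
A rough sketch of what such a CaseOpen() could look like in user space.
The name case_open, its signature, and the restriction to existing files
are assumptions made for illustration; it ignores ASCII case only (via
strcasecmp() in the default C locale) and does not attempt the creat()
half.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <strings.h>
#include <unistd.h>

/* Hypothetical CaseOpen(): open an existing entry `name` in `dirpath`,
 * ignoring ASCII case.  `flags` must not contain O_CREAT. */
int case_open(const char *dirpath, const char *name, int flags)
{
    assert(dirpath != NULL && name != NULL);

    int dfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    /* Fast path: the exact spelling exists, no scan needed. */
    int fd = openat(dfd, name, flags);
    if (fd >= 0 || errno != ENOENT) {
        int saved = errno;
        close(dfd);
        errno = saved;
        return fd;
    }

    /* Slow path: walk the whole directory, one dirent at a time. */
    DIR *d = fdopendir(dfd);        /* takes ownership of dfd on success */
    if (d == NULL) {
        int saved = errno;
        close(dfd);
        errno = saved;
        return -1;
    }

    int found = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (strcasecmp(de->d_name, name) == 0) {
            found = 1;
            /* Race window: the entry may be renamed or replaced
             * between readdir() and this openat(). */
            fd = openat(dirfd(d), de->d_name, flags);
            break;
        }
    }

    /* A readdir() failure is reported as ENOENT here for brevity. */
    int saved = found ? errno : ENOENT;
    closedir(d);                    /* also closes dfd */
    if (fd < 0)
        errno = saved;
    return fd;
}
```

Note that proving a name does not exist always takes the slow path: the
whole directory has to be read before ENOENT can be returned.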

It is easier, IMHO, to bring the users into the 20th century (let alone the
21st 8-) by making them mean what they say (if they deign to step out from
behind their GUIs).

So what was I saying... Oh yea...

-- Single Locale storage standard required to prevent multiplicative cost.
-- Not that hard to fake case insensitivity "when necessary".
-- Cheaper in CPU/Space to mix case.
-- Native file names in native locales simplifies administration and
expectations. (not elaborated above, but true.)
-- Case insensitivity and locale equivalence leads to uncertainties about
what/which file may be intended in a given context, which could often lead
to exploitable error.

Rob.

(1) The actual truth is a tad uglier than this, the media can have the 8.3
names stored in interesting ways, but essentially a "toupper()" is done on
every file name as it is retrieved and processed.  This cuts out a lot of
possibilities and leads to a lot of "tildes of shame" in even some of the
more harmless seeming name conflicts.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: UTF-8 and case-insensitivity
  2004-02-18  0:16 ` Robert White
@ 2004-02-18  0:20   ` Linus Torvalds
  2004-02-18  1:03     ` Robert White
  2004-02-18 21:48     ` Ville Herva
  2004-02-18  2:48   ` tridge
  1 sibling, 2 replies; 69+ messages in thread
From: Linus Torvalds @ 2004-02-18  0:20 UTC (permalink / raw)
  To: Robert White
  Cc: tridge, 'Kernel Mailing List', 'Al Viro',
	'Neil Brown'



On Tue, 17 Feb 2004, Robert White wrote:
>
> OK, so I wrote the below, but then in the summary I realized that there was
> a significant factor that doesn't fit in with the rest of the post.  Case
> insensitivity, and more generally locale equivalence rules, is a security
> nightmare.  Consider the number of different file names that "su" could map
> to if you apply case insensitivity (4) and/or worse yet the various accents
> and umlauts (?,etc) that are sort-equivalent for "u" in some locales.  The user
> types "su" and runs "S(u-umlaut)" etc.

This is but one reason why I will _refuse_ to make case insensitivity
magically start happening on regular "open()" etc calls.

You'd literally have to use a _different_ system call to do a 
case-insensitive file open. Exactly because anything else would be very 
confusing to existing apps (and thus be potential security holes).

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: UTF-8 and case-insensitivity
  2004-02-18  0:20   ` Linus Torvalds
@ 2004-02-18  1:03     ` Robert White
  2004-02-18 21:48     ` Ville Herva
  1 sibling, 0 replies; 69+ messages in thread
From: Robert White @ 2004-02-18  1:03 UTC (permalink / raw)
  To: tridge
  Cc: 'Kernel Mailing List', 'Al Viro',
	'Neil Brown', 'Linus Torvalds'

P.S. Given that the GUI libraries (almost invariably) already deal with
displaying things in a case insensitive way, the "best place to cut" to add
case insensitivity to the user command-line experience would be adding a
flag to file name completion in bash.  Bash is already doing file name finds
and lookups when you press tab; and the user is actively looking at the
correctness and singularity/duality of the results.

So the proverbial "vi makef{tab}" would, if the flag was set, show you
makefile, Makefile, and MakeFile (etc) as existent or just switch makef to
"Makefile" if the name were unique.

It doesn't make lives easier for the API level project programmer people
(c.f. samba), but it could uber-happy the incoming newbies, and people like
me who have to interoperate within a vast wasteland of directories full of
inconsistently named files created by windows programmers (like SOCKET.C,
Socket.H, constants.h, and ss_switch.c all in one directory tree with
hundreds of their friends. 8-)
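
The completion flag described above amounts to a case-blind prefix match
over the directory listing.  A minimal sketch, with a made-up function
name and ASCII-only matching; it is not how bash itself is implemented:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: copy into `out` (at most `out_cap` entries) the
 * names that start with `prefix`, ignoring ASCII case.  Returns the
 * number of matches stored. */
size_t complete_nocase(const char *prefix,
                       const char *const *names, size_t n_names,
                       const char **out, size_t out_cap)
{
    assert(prefix != NULL && names != NULL && out != NULL);
    size_t plen = strlen(prefix);
    size_t found = 0;
    for (size_t i = 0; i < n_names && found < out_cap; i++)
        if (strncasecmp(names[i], prefix, plen) == 0)
            out[found++] = names[i];
    return found;
}
```

With "makefile", "Makefile" and "MakeFile" in the directory, the prefix
"makef" yields three candidates for the user to look at; with only
"Makefile" present it yields one, which the shell could substitute.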

I would however, be forced to throttle myself with my own intestine if
the kernel started doing this magic mapping "for me", especially "in some
calls/contexts but not in others".  (Not that I want to provide my possible
death as a strong motivation for adding the feature. 8-)

Rob.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: UTF-8 and case-insensitivity
  2004-02-18  0:20   ` Linus Torvalds
  2004-02-18  1:03     ` Robert White
@ 2004-02-18 21:48     ` Ville Herva
  1 sibling, 0 replies; 69+ messages in thread
From: Ville Herva @ 2004-02-18 21:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Robert White, tridge, 'Kernel Mailing List',
	'Al Viro', 'Neil Brown'

On Tue, Feb 17, 2004 at 04:20:26PM -0800, you [Linus Torvalds] wrote:
> 
> This is but one reason why I will _refuse_ to make case insensitivity
> magically start happening on regular "open()" etc calls.
> 
> You'd literally have to use a _different_ system call to do a 
> case-insensitive file open. 

Tongue-in-cheek:

  int Open(const char *pathname, int flags); ?




-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: UTF-8 and case-insensitivity
  2004-02-18  0:16 ` Robert White
  2004-02-18  0:20   ` Linus Torvalds
@ 2004-02-18  2:48   ` tridge
  2004-02-18 20:56     ` Robert White
  1 sibling, 1 reply; 69+ messages in thread
From: tridge @ 2004-02-18  2:48 UTC (permalink / raw)
  To: Robert White
  Cc: linux-kernel, 'Linus Torvalds', 'Al Viro',
	'Neil Brown'

Robert,

Just about everything in your posting is either years out of date or
just totally wrong. 

 > OK, so I wrote the below, but then in the summary I realized that there was
 > a significant factor that doesn't fit in with the rest of the post.  Case
 > insensitivity, and more generally locale equivalence rules, is a security
 > nightmare.  Consider the number of different file names that "su" could map
 > to if you apply case insensitivity (4) and/or worse yet the various accents
 > and umlauts (?,etc) that are sort-equivalent for "u" in some locales.  The user
 > types "su" and runs "S(u-umlaut)" etc.

This is no different from the "stupid admin puts . in $PATH"
problem. Simple solutions:

 1) don't mount your root filesystem with case insensitive naming
 2) use a sane $PATH
 3) don't allow untrusted users to create files in your $PATH
 4) don't run bash in case insensitive mode if for some reason
    you can't do (1) or (2) or (3)

any of (1), (2) or (3) solves this. 

 > In point of fact the entire windows application space has a
 > singular active locale at any one time and there is a well-defined
 > but horrible layer of indirection where "long names" like "My
 > Documents" become "real names" like "MYDOCU~1".  Essentially every
 > windows file name is subject to a double-indirect file name
 > translation.  The first pass is the strcasecmp() locale-dependent
 > traversal of the "long name" list.  The second is the strcasecmp()
 > frozen-locale-spec-dependent traversal of (US Latin?) 8.3 file
 > naming standard list of media elements (files/directories).

this is just total crap. That might have been true for msdos and even
possibly win9x, but it's totally untrue for NTFS. There are enough
stupidities in windows without having to invent more.

NTFS is case insensitive at the filesystem level. In fact, it's
selectable whether it's case sensitive or case insensitive per-process
(a process can switch between the two models). The case mapping table
is built into the filesystem itself. That mapping has absolutely
*zero* to do with US Latin or any other legacy multi-byte encoding.
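
A sketch of what a filesystem-resident mapping table buys you.  The
layout used here (one 16-bit uppercase value per 16-bit code unit) and
both function names are assumptions for illustration, and the demo table
only knows ASCII; the point is that the comparison is a table lookup per
unit with no locale involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UPCASE_ENTRIES 65536u

/* Fill a stand-in table: identity everywhere except ASCII a-z.
 * A real table would cover far more of the code space. */
void build_demo_upcase(uint16_t *table)
{
    assert(table != NULL);
    for (uint32_t i = 0; i < UPCASE_ENTRIES; i++)
        table[i] = (uint16_t)i;
    for (uint16_t c = 'a'; c <= 'z'; c++)
        table[c] = (uint16_t)(c - ('a' - 'A'));
}

/* Hypothetical helper: compare two names made of 16-bit code units by
 * mapping every unit through the table first. */
int names_equal_upcase(const uint16_t *a, size_t alen,
                       const uint16_t *b, size_t blen,
                       const uint16_t *table)
{
    assert(a != NULL && b != NULL && table != NULL);
    if (alen != blen)
        return 0;
    for (size_t i = 0; i < alen; i++)
        if (table[a[i]] != table[b[i]])
            return 0;
    return 1;
}
```

Because the table travels with the volume, two machines reading the same
disk agree on which names alias, whatever their local settings are.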

What you have done is the equivalent of stating that Linux can only do
14 character filenames, because once upon a time Linux had a
filesystem called minix. We've moved beyond that and so has windows. 

 > In point of fact, Windows is *not* "properly" case insensitive at the file
 > system level.  Use "dir /x" more often on your windows box to relive the
 > experience.  The "real" file names are mangled to good old 8.3 uppercase
 > internally(1).  You don't usually have to think about this, but if you have
 > ever lost the long-to-short file name mapping on a drive you know the hell
 > that ensues.  (see also iso9660.)

again, this is just complete crap. NTFS has had the ability to
completely disable 8.3 "alternative name" support for ages. Microsoft
is even starting to use this switch in their published benchmark
results, and I suspect it will become the default in a couple of
years. 

We've been through the same transition in Samba:

  - Samba 0.x only supported 8.3
  - Samba 1.x was oriented towards 8.3, but also supported long names
  - Samba 2.x and 3.x are oriented towards long names, and can disable 8.3
    names to some extent

by the time Samba 4.x comes out (I am working on it now) we may see a
significant number of sites disabling 8.3 completely. 

 > The thing is, to match this ersatz "functionality" on a system where more
 > than one locale may be used at the same time, you end up with a kind of
 > Cartesian product of user locales and filesystem native locales.  The cost
 > could get extreme and can only really be amortized if Linux were to declare
 > our own 8.3 style pronouncement for the character classes used for the
 > "real" file name storage (etc).

you are *way* out of date here. All recent windows apps use the UCS-2
interfaces, which provide a single charset encoding across all
locales. I've heard that they may be redefining this as UTF-16 to
allow for an even larger range of characters, although I haven't seen
this popping up on the wire yet (then again, I just might not have
noticed). I wish they had chosen UTF-8 instead of UCS-2, but at least
they chose something and got it into every part of the OS years ago.

 > Late stage case insensitivity isn't that hard to put in a linux application,
 > just crack open your file selection dialog boxes and have them use
 > strcasecmp() in all their select/sort logic.  Also then replace open() with
 > CaseOpen() which does a find/search operation before daring to
 > creat().

Have you read *any* of what I've been saying about how expensive this is??

 > That is, in every practical way, how Windows handles these problems.  It
 > just happens in some fairly interesting and hard-to-predict places depending
 > on context.

No, that is *not* how current versions of windows do things. 

 > So what was I saying... Oh yea...
 > 
 > -- Single Locale storage standard required to prevent multiplicative cost.

windows has this. Linux doesn't.

 > -- Not that hard to fake case insensitivity "when necessary".

ditto

 > -- Cheaper in CPU/Space to mix case.

ditto

 > -- Native file names in native locales simplifies administration and
 > expectations. (not elaborated above, but true.)

?? single locale storage makes this just a no-op

 > -- Case insensitivity and locale equivalence leads to uncertainties about
 > what/which file may be intended in a given context, which could often lead
 > to exploitable error.

and that is just a complete load of crap. Windows has had exploitable
bugs due to case insensitivity, but the cause was things like leaving
directories in the search path writeable by unprivileged users. It was
*not* due to anything fundamentally insecure about case-insensitive
names in filesystems. 

 > (1) The actual truth is a tad uglier than this, the media can have the 8.3
 > names stored in interesting ways, but essentially a "toupper()" is done on
 > every file name as it is retrieved and processed.  This cuts out a lot of
 > possibilities and leads to a lot of "tildes of shame" in even some of the
 > more harmless seeming name conflicts.

oh i get it, you're just a troll ....

Cheers, Tridge

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: UTF-8 and case-insensitivity
  2004-02-18  2:48   ` tridge
@ 2004-02-18 20:56     ` Robert White
  0 siblings, 0 replies; 69+ messages in thread
From: Robert White @ 2004-02-18 20:56 UTC (permalink / raw)
  To: tridge
  Cc: linux-kernel, 'Linus Torvalds', 'Al Viro',
	'Neil Brown'

I guess I don't get it...

tridge@samba.org [mailto:tridge@samba.org] said:

> NTFS is case insensitive at the filesystem level. In fact, it's
> selectable whether it's case sensitive or case insensitive per-process
> (a process can switch between the two models). The case mapping table
> is built into the filesystem itself. That mapping has absolutely
> *zero* to do with US Latin or any other legacy multi-byte encoding.

If the process selects whether it wants to be case insensitive or not how is
NTFS case insensitive "at the file-system level"?  Let me guess, they have
two complete paths through the logic?  Lots of DLLs?  Redundant conflicting
access semantics^Wfeatures?

> you are *way* out of date here. All recent windows apps use the UCS-2
> interfaces, which provide a single charset encoding across all locales.

Which kind of directly supports where I said that to amortize the expense
Linux would have to set up its *own* canon about all file systems using the
same encoding.  The fact that I kept bringing up 8.3 was out of date.  Point
to you.  The point that picking an arbitrary encoding will lead to Linux
getting out of date, or at least require a catastrophic realignment of every
program that deigns to open() any file anywhere, remains germane.

> Have you read *any* of what I've been saying about how expensive this is??

Yes, I understand the expense.  I have *paid* that expense in excruciating
detail on several occasions.  You want to have the kernel pay that expense
(in place of the application) as a fixed (amortized) cost or you want to
codify the file names with a standard encoding which would penalize the
entire system uniformly by raising the base cost to localize.

I appreciate the unbounded regex-like expense of iteratively applying
case/encoding insensitivity to a list of files.  I really don't want to pay
that cost in every application when I only need it at the front end.  Sue
me.

I also understand the pain of having to load any/each entire directory into
memory one blasted dirent at a time, and appreciate that since the kernel is
bulk loading them at the filesystem interface it seems (is) wasteful to have
to spoon them across the kernel/user-space interface.  I really do
understand.  (ASIDE: a bulk-fetch-directory-into-buffer call might be nice,
I haven't looked lately, but I presume none such exists.)

Your proposed "single locale storage" would penalize all us embedded systems
types with our space sensitive embedded file systems and low-powered CPUs so
that the larger systems that _can_ afford to pay the cost only when necessary
don't have to.  Two-bytes for one in every file name isn't a good trade off
when you are dealing with a 32k file system image.
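
The "two bytes for one" figure applies to a fixed-width 16-bit encoding
such as UCS-2.  A small sketch comparing it with UTF-8 for the same
names (function names are made up; input is limited to 16-bit code
points, and surrogates are not handled):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store `n` 16-bit code points as fixed-width UCS-2. */
size_t ucs2_bytes(size_t n)
{
    return 2 * n;
}

/* Bytes needed to store the same code points as UTF-8. */
size_t utf8_bytes(const uint16_t *cps, size_t n)
{
    assert(cps != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (cps[i] < 0x80)
            total += 1;     /* ASCII stays one byte */
        else if (cps[i] < 0x800)
            total += 2;     /* e.g. Latin letters with diacritics */
        else
            total += 3;     /* rest of the 16-bit range */
    }
    return total;
}
```

An all-ASCII name like "Makefile" costs 8 bytes in UTF-8 and 16 in
UCS-2, so the doubling is a property of the encoding chosen, not of
single-encoding storage as such.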

I kind of tried (and apparently utterly failed) to make the points about how
the Windows model worked and what it would cost by describing the basis for
the model, not the current implementation.  That is kind of why I *started*
the message with "(ok in point of "technically abstract truth")" and
mentioned later that what I was saying may have changed, but if so, it
changed in a way consistent with the model as described.

Windows has been digging themselves steadily out of the deep hole of
case-insensitive file name handling for years; which does nothing to entice
me to jump in and join them.  So bully for windows that they have, iteration
after iteration, managed to reduce the cost of their mistake.

Even *with* a standardized file name character set/encoding case
insensitivity would still be very bad-off in some important areas.  Consider
a simple security log.  "[date] user command xx satisfied with executive
Xx." etc.  I can think of *lots* of times when I would have to open a file
and then have to ask what the real name of the file I opened actually was.
"I asked for 'Bob', what did I get?" isn't a fun question to have to answer
*after* an open.  Yes, all this *can* be addressed by scrubbing paths, but
history suggests that this doesn't happen and the more the system does for
you, the more likely you are to miss something.

At the application level, since I have to sort file names for a picklist
anyway, I'd rather pay the case insensitivity cost while I was sorting.
It's actually cleaner and I am already paying to sort.

I used to write SMB based applications (yes, I'm still way out of date) and
I appreciate the painful tit-for-tat non-streaming ugliness.  I feel your
pain at having to read a whole directory and doing the sort/search.  I
understand the race condition that occurs between the directory read and the
actual open where the file could be renamed or replaced.  I really do.

But "fixing" Linux so that it can share Window's pain doesn't seem wise.

I can imagine a mod/module that would graft a localized and/or
case-insensitive companion hash onto the dirent(s) as the central facility
was doing its work.  I can imagine an alternate open that traversed this
alternate tree.  Creating sort of a giant look-aside into the current file
information tree. But I can't imagine any winning scenario that came from
making that alternate hash the normal access method.  Too many people and
projects would suddenly break.

{And I try not to troll, but I apparently have a knack for getting people's
dander in a bunch when I write.  I think it is because I write as I speak,
and the loss of tone and inflection in writing makes my turn-of-phrase come
off very priggish.  I'm not sure how to fix that.  /sigh 8-)}

Rob.

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2004-02-19 20:13 UTC | newest]

Thread overview: 69+ messages
     [not found] <1q4Si-658-5@gated-at.bofh.it>
     [not found] ` <1q7no-8ss-7@gated-at.bofh.it>
     [not found]   ` <1qfb7-7s5-19@gated-at.bofh.it>
     [not found]     ` <1qmPm-6Gl-11@gated-at.bofh.it>
     [not found]       ` <1qpWI-1Sa-1@gated-at.bofh.it>
     [not found]         ` <1qqpO-2lx-3@gated-at.bofh.it>
     [not found]           ` <1qqzv-2tr-3@gated-at.bofh.it>
     [not found]             ` <1qqJc-2A2-5@gated-at.bofh.it>
     [not found]               ` <1qHAR-2Wm-49@gated-at.bofh.it>
     [not found]                 ` <1qIwr-5GB-11@gated-at.bofh.it>
     [not found]                   ` <1qIwr-5GB-9@gated-at.bofh.it>
     [not found]                     ` <1qIQ1-5WR-27@gated-at.bofh.it>
     [not found]                       ` <1qIZt-6b9-11@gated-at.bofh.it>
     [not found]                         ` <1qJsF-6Be-45@gated-at.bofh.it>
2004-02-19  0:06                           ` UTF-8 and case-insensitivity Pascal Schmidt
2004-02-19  1:01                             ` tridge
2004-02-19  1:08                               ` Hua Zhong
2004-02-19  1:46                                 ` tridge
2004-02-19  2:44                               ` Theodore Ts'o
2004-02-19  3:20                                 ` tridge
2004-02-19 10:18                                   ` Helge Hafting
2004-02-19 12:11                                   ` Paulo Marques
2004-02-19 19:04                                     ` Helge Hafting
2004-02-19 14:08                                   ` Theodore Ts'o
2004-02-19 20:12                                   ` Robert White
     [not found] <fa.epf5o9k.1rkudgo@ifi.uio.no>
     [not found] ` <fa.idvvhjl.1jge92d@ifi.uio.no>
2004-02-18  1:09   ` Andy Lutomirski
2004-02-17  4:12 tridge
2004-02-17  5:11 ` Linus Torvalds
2004-02-17  6:54   ` tridge
2004-02-17  8:33     ` Neil Brown
2004-02-17 22:48       ` tridge
2004-02-18  0:06         ` Neil Brown
2004-02-18  9:47           ` Helge Hafting
2004-02-17 15:13     ` Linus Torvalds
2004-02-17 16:57       ` Linus Torvalds
2004-02-17 19:44         ` viro
2004-02-17 20:10           ` Linus Torvalds
2004-02-17 20:17             ` viro
2004-02-17 20:23               ` Linus Torvalds
2004-02-17 21:08         ` Robin Rosenberg
2004-02-17 21:17           ` Linus Torvalds
2004-02-17 22:27             ` Robin Rosenberg
2004-02-18  3:02               ` tridge
2004-02-17 23:57         ` tridge
2004-02-17 23:20       ` tridge
2004-02-17 23:43         ` Linus Torvalds
2004-02-18  3:26           ` tridge
2004-02-18  5:33             ` H. Peter Anvin
2004-02-18  7:54             ` Marc Lehmann
2004-02-18  2:37         ` H. Peter Anvin
2004-02-18  3:03           ` Linus Torvalds
2004-02-18  3:14             ` H. Peter Anvin
2004-02-18  3:27               ` Linus Torvalds
2004-02-18 21:31                 ` tridge
2004-02-18 22:23                   ` Linus Torvalds
2004-02-18 22:28                     ` Linus Torvalds
2004-02-18 22:50                       ` tridge
2004-02-18 22:59                         ` Linus Torvalds
2004-02-18 23:09                           ` tridge
2004-02-18 23:16                             ` Linus Torvalds
2004-02-19  8:10                               ` Jamie Lokier
2004-02-19 16:09                                 ` Linus Torvalds
2004-02-19 16:38                                   ` Jamie Lokier
2004-02-19 16:54                                     ` Linus Torvalds
2004-02-19 18:29                                       ` Jamie Lokier
2004-02-19 19:08                                       ` Helge Hafting
2004-02-18  4:08           ` tridge
2004-02-18 10:05             ` Robin Rosenberg
2004-02-18 11:43               ` tridge
2004-02-18 12:31                 ` Robin Rosenberg
2004-02-18 16:48                   ` H. Peter Anvin
2004-02-18 20:00                     ` H. Peter Anvin
2004-02-19  2:53   ` Daniel Newby
2004-02-17  5:25 ` Tim Connors
2004-02-17  7:43 ` H. Peter Anvin
2004-02-17  8:05   ` H. Peter Anvin
2004-02-17 14:25 ` Dave Kleikamp
2004-02-18  0:16 ` Robert White
2004-02-18  0:20   ` Linus Torvalds
2004-02-18  1:03     ` Robert White
2004-02-18 21:48     ` Ville Herva
2004-02-18  2:48   ` tridge
2004-02-18 20:56     ` Robert White
