[RFC] A proposal for adding case insensitive lookups to ext4

public inbox for linux-ext4@vger.kernel.org

* [RFC] A proposal for adding case insensitive lookups to ext4
@ 2016-11-03 17:28 Theodore Ts'o
  2016-11-04 16:14 ` Andreas Dilger
       [not found] ` <CAEuANoLdZ0STmhp+1voS9K6+Ndkb6zfSWEcc0_thUtDjS8NPwg@mail.gmail.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Theodore Ts'o @ 2016-11-03 17:28 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: guy, jra, drosen

HTML version (which will get updated in response to comments):

     https://thunk.org/tytso/casei-fs.html

# A proposal for adding case insensitive lookups to ext4

## Theodore Ts'o
## Version 0.10

### Introduction

Over the years there has been a desire to add case insensitive lookups
to ext2/3/4.  The reason this hasn't happened is that doing it right
is hard.  Unfortunately, the workarounds that people have been using
in the absence of first class support for case insensitive lookups are
slow, and they evade all of the issues that make this problem hard
anyway.

Hence, I think it's worthwhile to outline what could be done to allow
ext4 to support case insensitive lookups in a way that would be more
efficient and less hacky than some other solutions (e.g., slow FUSE
file systems and hacky wrapfs-based solutions that are subject to
crashes when run under fsstress).

This proposal does not make any on-disk format changes, but rather
adds a mount option which causes lookups to be case insensitive, while
case would be preserved in the directory entries when they are created
and returned via readdir().

### Changes to be made to ext4

1.  If case-insensitivity is enabled, override the default dcache hash
and compare operations to ones that are case insensitive in ext4's
dcache_operations structure.

2.  In ext4_lookup(), if case insensitivity is enabled, and the
directory lookup does not succeed, fall back to a linear search of the
directory using a case insensitive compare.  (This is slow, but
it's faster than doing this in userspace).
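
To make the order of operations in step 2 concrete, here is a minimal
userspace sketch.  It is not ext4 code: the directory is modelled as a
plain array of names, and `ascii_fold()`, `ascii_ci_equal()` and
`dir_lookup()` are hypothetical helpers invented for this illustration.
The exact-match pass stands in for the normal (htree) lookup, and the
second pass is the linear, ASCII-only fallback.

```c
#include <stddef.h>
#include <string.h>

/* Fold one byte to lower case, ASCII only; other bytes pass through. */
static unsigned char ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Return 1 if the two names are equal after ASCII case folding. */
int ascii_ci_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (ascii_fold((unsigned char)*a) != ascii_fold((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* both must have reached the terminator */
}

/*
 * Hypothetical directory lookup over an array of names.
 * Returns the index of the matching entry, or -1 if there is none.
 */
int dir_lookup(const char *const *entries, size_t n, const char *name,
               int case_insensitive)
{
    size_t i;

    /* Pass 1: exact match (stands in for the indexed lookup). */
    for (i = 0; i < n; i++)
        if (strcmp(entries[i], name) == 0)
            return (int)i;

    if (!case_insensitive)
        return -1;

    /* Pass 2: linear scan with case folding; the first hit wins. */
    for (i = 0; i < n; i++)
        if (ascii_ci_equal(entries[i], name))
            return (int)i;

    return -1;
}
```

With entries { "Readme", "README" }, looking up "README" returns the
exact match, while "ReAdMe" returns whichever entry the scan reaches
first.  A miss always pays for both passes.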

### Limitations

1.  Like all of the FUSE and in-kernel searches, case insensitivity
will be implemented using strcasecmp() and tolower().  This implies that
only ASCII case folding will be supported.  One of the problems of
using Unicode is that it is not a fixed target.  The case folding
algorithm is changing as new scripts are added; if someone wants to
add support for Unicode case folding, it should be added to the
kernel, with someone assigned the headache of updating the case
folding algorithm when new versions of Unicode are issued.

2.  If the lookup is done using the filename exactly as it is stored in
the directory, lookups will be O(1) if the dir_index (htree) ext4 file
system feature is enabled (which is the default).  It might be
possible to use a case insensitive hash for the htree feature.
However, if we do this, then the hash could be broken by changes to
use Unicode, and if we do implement this with Unicode 8.0.0 support,
the on-disk format could be broken by future Unicode updates.  So
adding support for O(1) lookups when the filename provided by the
user does not match the stored case is highly unlikely.  (I will note that
none of the kludges commonly in use support Unicode anyway, so this
proposal is no worse than those kludges, and my personal approach is
to exert a very strong Somebody Else's Problem field and hope someone
else comes up with a solution for us.)

3.  Some of the hacky alternatives are also trying to support
Android's "unique" file permissions scheme.  Support for this is out
of scope for this proposal, although I do acknowledge we will need to
come up with a clean way of implementing those permissions-related
requirements before we have a complete, clean, bug-free, upstreamable
solution for Android.

* Re: [RFC] A proposal for adding case insensitive lookups to ext4
  2016-11-03 17:28 [RFC] A proposal for adding case insensitive lookups to ext4 Theodore Ts'o
@ 2016-11-04 16:14 ` Andreas Dilger
  2016-11-04 21:51   ` Theodore Ts'o
       [not found] ` <CAEuANoLdZ0STmhp+1voS9K6+Ndkb6zfSWEcc0_thUtDjS8NPwg@mail.gmail.com>
  1 sibling, 1 reply; 9+ messages in thread
From: Andreas Dilger @ 2016-11-04 16:14 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Ext4 Developers List, guy, jra, drosen

On Nov 3, 2016, at 11:28 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> 
> HTML version (which will get updated in response to comments):
> 
>     https://thunk.org/tytso/casei-fs.html
> 
> 
> # A proposal for adding case insensitive lookups to ext4
> 
> ## Theodore Ts'o
> ## Version 0.10
> 
> ### Introduction
> 
> Over the years there has been a desire to add case insensitive lookups
> to ext 2/3/4.  The reason why this hasn't happened is doing it right
> is hard.  Unfortunately, the workarounds that people have been using
> in the absense of first class support for case insensitive lookups are
> slow, and evade all of the problems that make this problem hard
> anyway.
> 
> Hence, I think it's worthwhile to outline what could be done to allow
> ext4 to support case insensitive that would be more efficient and less
> hacky than some other solutions (e.g., slow FUSE file systems and
> hacky wrapfs-based solutions that are subject to crashes when run
> under fsstress).
> 
> This proposal does not make any on-disk format changes, but rather
> adds a mount option which causes lookups to be case insensitive, while
> case would be preserved in the directory entries when they are created
> and returned via readdir().
> 
> ### Changes to be made to ext4
> 
> 1.  If case-insensitivity is enabled, override the default dcache hash
> and compare operations to ones that are case insensitive in ext4's
> dcache_operations structure.
> 
> 2.  In ext4_lookup(), if case insensitivity is enabled, and the
> directory lookup does not succeed, fall back to a linear search of the
> directory using using a case insensitive compare.  (This is slow, but
> it's faster compared to doing this in userspace).

Does it make sense to flag directories with whether entries are inserted
with the case-insensitive hash?  That allows the common case of having
case insensitivity always enabled or disabled working optimally.  Falling
back to linear search for every negative lookup would be prohibitive for
large directories.

Depending on the filename length and the size of the directory, it may
still be faster to do 2^name_length lookups with the different case
combinations than a full linear search for each lookup (i.e. when
(1 << name_length < dir_blocks) this repeated lookup is more efficient).
Avoiding linear searching is most important for large directories, and
this becomes worthwhile for increasingly long filenames.
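
As a rough illustration of that comparison, here is a userspace sketch.
The helper names are hypothetical, and it refines the estimate slightly:
only ASCII letters have two case variants, so 2^name_length is an upper
bound and 2^(number of letters) is the actual count of spellings to
probe.  It also assumes one indexed lookup costs about the same as
reading one directory block, which is a simplification.

```c
#include <stddef.h>

/* Count the ASCII letters in a name; only these have two case variants. */
size_t ascii_letter_count(const char *name)
{
    size_t n = 0;

    for (; *name; name++)
        if ((*name >= 'a' && *name <= 'z') || (*name >= 'A' && *name <= 'Z'))
            n++;
    return n;
}

/*
 * Return 1 if probing every case variant (2^letters indexed lookups)
 * takes fewer probes than linearly scanning dir_blocks blocks.
 */
int repeated_lookup_is_cheaper(const char *name, unsigned long long dir_blocks)
{
    size_t letters = ascii_letter_count(name);

    if (letters >= 63)    /* 2^letters would overflow; the scan wins */
        return 0;
    return (1ULL << letters) < dir_blocks;
}
```

For example, "a.c" has two letters and so four variants, which beats
scanning a 10-block directory, whereas "Makefile" has 256 variants and
only pays off once the directory exceeds 256 blocks.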

> ### Limitations
> 
> 1.  Like all of the FUSE and in-kernel searches, case insensitivity
> will be implemented using strcasecmp and tolower().  This implies that
> only ASCII case folding will be accepted.  One of the problems of
> using Unicode is that it is not a fixed target.  The case folding
> algorithm is changing as new scripts are added; if someone wants to
> add support for Unicode case folding, it should be added to the
> kernel, with someone assigned with the headache of updating the case
> folding algorithm when new versions of Unicode are issued.

What happens if filenames that collide after case folding already
exist in the filesystem (e.g. include/uapi/linux/netfilter/xt_*.h
and net/netfilter/xt_*.c are examples of this, and it is a bit sad this
was exposed as part of uapi instead of giving the files useful names)?
Presumably that would prevent one of the filenames from being looked
up, or would it still be possible to find both if the correct case was
used?
> 2.  If the lookup is done using the filename as it is stored in the
> directory, lookups will be O(1) if the dir_index (htree) ext4 file
> system feature is enabled (which is the default).  It might be
> possible to use a case insensitive hash for the htree feature.
> However, if we do this, then the hash could be broken by changes to
> use Unicode, and if we do implement this with Unicode 8.0.0 support,
> the on-disk format could be broken by future Unicode updates.  So
> adding support for case O(1) lookups when case is not preserved by the
> filename provided by the user is highly unlikely.  (I will note that
> none of the kludges commonly in use support Unicode anyway, so this
> proposal is no worse than those kludges, and my personal approach is
> to exert a very strong Somebody Else's Problem field and hope someone
> else comes up with a solution for us.)

Is this conflating the htree ASCII case folding problem with Unicode?
It would still be possible to insert names into the htree using the hash
of the ASCII-folded names, regardless of what is done for Unicode folding.
Changing the folding method would make the filesystem slow with large
directories (possibly unusable for very large directories), but that could
be fixed by running "e2fsck -fD" on the filesystem to reindex directories.

The filesystem would remain functional even if the folding method changed,
and even without running "e2fsck -fD" it would eventually get better as
new files are added with the right hash and old files are deleted.
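
To show what "the hash of the ASCII-folded names" could look like, here
is a small userspace sketch.  It uses FNV-1a purely as a stand-in (ext4's
htree uses its own hash functions such as half-md4), and
`ascii_fold_hash()` is a hypothetical name.  The point is only that the
fold happens before hashing, so every ASCII case variant lands in the
same bucket while non-ASCII bytes are hashed unchanged.

```c
#include <stdint.h>

/* Hash a NUL-terminated name after folding ASCII upper case to lower. */
uint32_t ascii_fold_hash(const char *name)
{
    uint32_t h = 2166136261u;           /* FNV-1a 32-bit offset basis */

    for (; *name; name++) {
        unsigned char c = (unsigned char)*name;

        if (c >= 'A' && c <= 'Z')       /* ASCII-only case folding */
            c = (unsigned char)(c - 'A' + 'a');
        h ^= c;
        h *= 16777619u;                 /* FNV-1a 32-bit prime */
    }
    return h;
}
```

"Makefile" and "makefile" hash identically here, but the UTF-8 encodings
of "É" and "é" do not, which is exactly the ASCII-only limitation being
discussed.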

Cheers, Andreas

> 3.  Some of the hacky alternatives are also trying to support
> Android's "unique" file permissions scheme.  Support for this is out
> of scope for this proposal, although I do acknowledge we will need to
> come up with a clean way of implementing those permissions-related
> requirements before we have a complete, clean, bug-free, upstreamable
> solution for Android.


Cheers, Andreas

* Re: [RFC] A proposal for adding case insensitive lookups to ext4
  2016-11-04 16:14 ` Andreas Dilger
@ 2016-11-04 21:51   ` Theodore Ts'o
  2016-11-04 23:12     ` Andreas Dilger
  2016-11-06 23:57     ` Dave Chinner
  0 siblings, 2 replies; 9+ messages in thread
From: Theodore Ts'o @ 2016-11-04 21:51 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Ext4 Developers List, guy, jra, drosen

On Fri, Nov 04, 2016 at 10:14:03AM -0600, Andreas Dilger wrote:
> > 2.  In ext4_lookup(), if case insensitivity is enabled, and the
> > directory lookup does not succeed, fall back to a linear search of the
> > directory using using a case insensitive compare.  (This is slow, but
> > it's faster compared to doing this in userspace).
> 
> Does it make sense to flag directories with whether entries are inserted
> with the case-insensitive hash?  That allows the common case of having
> case insensitivity always enabled or disabled working optimally.  Falling
> back to linear search for every negative lookup would be prohibitive for
> large directories.

I'm proposing that we not make any on-disk format changes for now.
It's true that this means that we need to degrade to a O(N) brute
force search, and that it is undefined if there are two files that are
the same when case folding is enabled (e.g., if there is both a
Makefile and makefile in the directory).

However, the horrible hacks that people have been using have these
problems *already*.  Doing it in the kernel has a number of
advantages: (1) it's faster since the FUSE hack or the userspace hack
doesn't have to transfer the contents of the directory to userspace to
do the case insensitive search, and (2) the O(N) search only happens
in the cold cache case since we can rely on the dcache to cache the
case-folded filename.  So it's far better than especially the FUSE and
Samba implementations of case-folded lookups that I've seen.

> What happens if filenames that collide after case folding are already
> existing in the filesystem

As in the current schemes, it's undefined which file you get.

In practice it doesn't seem to be an issue since very often the
directory starts empty and all of the file creates would be done in a
case insensitive fashion.

> Is this conflating the htree ASCII case folding problem with Unicode?
> It would still be possible to insert names into the htree using the hash
> of the ASCII-folded names, regardless of what is done for Unicode folding.
> Changing the folding method would make the filesystem slow with large
> directories (possibly unusable for very large directories), but that could
> be fixed by running "e2fsck -fD" on the filesystem to reindex directories.

Well, the issue is that I assume the ASCII case folding is not going
to be acceptable in the long term.  So sooner or later someone is going
to want to try to insert a Unicode 8 case folding system into the kernel.
I just don't want to have to deal with that mess.  (I don't get paid
enough to deal with I18N, so this is going to be a situation of
'patches gratefully accepted').  So committing to an on-disk format
when eventually people will want to add Unicode seems like more work
than it's worth.

Eventually, if we do want to use a case insensitive hash for the
hash tree, we'd have to add a new read-only feature, store the case
folding algorithm used in the superblock, and then handle the
conversion cases with tune2fs and/or e2fsck -fD, etc.  That's all a
huge amount of work, and see my previous comment about not getting paid
enough to deal with I18N.  So it's very likely that we wouldn't
support converting from ASCII to Unicode 8 (again, unless someone
wants to send me patches), or deal with what happens some number of
years from now when the Unicode consortium publishes Unicode 9 (e.g.,
how quickly do we need to support Unicode 9, etc.)

It's basically a question of trading off developer time against fast
lookups when case insensitivity is turned on and the case is coming
from the user (as opposed to coming from readdir) and the case is
incorrect.  In the past, we've let the perfect be the enemy of the
good.  And getting "perfect" is a massive pain in the tuckus.

So a very explicit goal in this proposal is to do something very low
effort, and not paint ourselves into a corner.  Which is why
doing something which does not have any on-disk format changes was a
key part of the design.

If someone wants to do something "right", which means e2fsprogs and
kernel changes, getting the Unicode translation code into the kernel
(and dealing with the bikeshedding that will probably happen when we
try to get generic Unicode support into the kernel), and that someone
is a reasonably experienced ext4 developer so I'm not forced to
reimplement prototype code, I'm certainly willing to entertain the
discussion.  But the main reason why we haven't had this for decades
is that (a) at least initially the people who wanted ext4 to support
case folding wanted us to support multiple codepages instead of just
Unicode/UTF-8, and (b) most of the ext4 developers aren't paid enough
to deal with I18N.  :-)

     	    	 	    	   	       - Ted

* Re: [RFC] A proposal for adding case insensitive lookups to ext4
  2016-11-04 21:51   ` Theodore Ts'o
@ 2016-11-04 23:12     ` Andreas Dilger
  2016-11-06 23:57     ` Dave Chinner
  1 sibling, 0 replies; 9+ messages in thread
From: Andreas Dilger @ 2016-11-04 23:12 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Ext4 Developers List, guy, jra, drosen

On Nov 4, 2016, at 3:51 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> 
> On Fri, Nov 04, 2016 at 10:14:03AM -0600, Andreas Dilger wrote:
>>> 2.  In ext4_lookup(), if case insensitivity is enabled, and the
>>> directory lookup does not succeed, fall back to a linear search of the
>>> directory using using a case insensitive compare.  (This is slow, but
>>> it's faster compared to doing this in userspace).
>> 
>> Does it make sense to flag directories with whether entries are inserted
>> with the case-insensitive hash?  That allows the common case of having
>> case insensitivity always enabled or disabled working optimally.  Falling
>> back to linear search for every negative lookup would be prohibitive for
>> large directories.
> 
> I'm proposing that we not make any on-disk format changes for now.
> It's true that this means that we need to degrade to a O(N) brute
> force search, and that it is undefined if there are two files that are
> the same when case folding is enabled (e.g., if there is both a
> Makefile and makefile in the directory).
> 
> However, the horrible hacks that people have been using have these
> problems *already*.  Doing it in the kernel has a number of
> advantages: (1) it's faster since the FUSE hack or the userspace hack
> doesn't have to transfer the contents of the directory to userspace to
> do the case insensitive search, and (2) the O(N) search only happens
> in the cold cache case since we can rely on the dcache to cache the
> case-folded filename.  So it's far better than especially the FUSE and
> Samba implementations of case-folded lookups that I've seen.
> 
> 
>> What happens if filenames that collide after case folding are already
>> existing in the filesystem
> 
> As in the current schemes, it's undefined which file you get.
> 
> In practice it doesn't seem to be an issue since very often the
> directory starts empty and all of the file creates would be done in a
> case insensitive fashion.
> 
>> Is this conflating the htree ASCII case folding problem with Unicode?
>> It would still be possible to insert names into the htree using the hash
>> of the ASCII-folded names, regardless of what is done for Unicode folding.
>> Changing the folding method would make the filesystem slow with large
>> directories (possibly unusable for very large directories), but that could
>> be fixed by running "e2fsck -fD" on the filesystem to reindex directories.
> 
> Well, the issue is that I assume the ASCII case folding is not going
> to be long-term acceptible.  So sooner or later someone is going to
> want to try to insert a Unicode-8 case folding system into the kernel.
> I just don't want to have to deal with that mess.  (I don't get paid
> enough to deal with I18N, so this is going to be a situation of
> 'patches gratefully accepted').  So committing to an on-disk format
> when eventually people will want to add Unicode seems like more work
> than it's worth.
> 
> Eventually if we do want to use a case insensitive hash for the
> hash_tree, we'd have to add a new read-only feature, store the case
> folding algorithm used in the superblock, and then handle the
> conversion cases with tune2fs and/or e2fsck -fD, etc.  That's all a
> huge amount of work, and see previous comments of I'm not getting paid
> enough to deal with I18N.  So it's very likely that we wouldn't
> support converting from ASCII to Unicode 8 (again, unless someone
> wants to send me patches), or deal with what happens some number of
> years from now when the Unicode consortium publishes Unicode 9 (e.g.,
> how quickly do we need to support Unicode 9, etc.)
> 
> It's basically a question of tradeing off developer time with fast
> lookups when case insensitivity is turned on and the case is coming
> from the user (as opposed to coming from readdir) and the case is
> incorrect.  In the past, we've let the perfect be the enemy of the
> good.   And getting "perfect" is a massive pain in the tuckus.
> 
> So a very explicit goal in this proposal is to do something very low
> effort, and not painting ourselves into the corner.  Which is why
> doing something which does not have any on-disk format changes was a
> key part of the design.
> 
> If someone wants to do something "right", which means e2fsprogs and
> kernel changes, getting the Unicode translation code into the kernel
> (and dealing with the bikeshedding that will probably happen when we
> try to get generic Unicode support into the kernel), and that someone
> is a reasonably experienced ext4 developer so I'm not forced to
> reimplement prototype code, I'm certainly willing to entertain the
> discussion.  But the main reason why we havne't had this for decades
> was because (a) at least initially the people who ext4 to support case
> folding wanted us to support mutliple codepages instead of just
> Unicode/UTF-8, and (b) most of the ext4 developers aren't paid enough
> to deal with I18N.  :-)

The importance of "getting it right" depends on what you consider a
disk format change, and on how often people are doing lookups with the
wrong case.  Presumably you have some motivation to implement this
functionality in the first place, and hopefully have some data on how
it is being used?

If the "linear search" case is rare, because people almost always look
up filenames using the right case, then that is half of the problem.

However, you are ignoring the negative lookup case, where the filename
doesn't exist at all.  If the kernel is falling back to a full linear
search for every negative lookup (e.g. binaries or libraries in a path)
then this could be punishing if the directory is large and/or there are
lots of path components (which happens at sites I've seen).  The negative
lookup case also doesn't even have the benefit of early exit when the
entry is found, but will always search the whole directory each time.

Hence my thought that if someone wants to use case-insensitive lookups,
they probably want that to be true for a long time, and using the
case-insensitive hash to both find and insert the entry is best, even if
the folding method changes in the future.  In the worst case, it is no
worse than your linear-search fallback, but in the common case it will
have O(1) lookup and insertion.

I don't think that adding a new hash type for ASCII-folded half-md4 is
too much to ask, so that e2fsck knows how to handle it.  The kernel could
set this on new directories, and e2fsck can use it when running with
e2fsck -fD.  I don't think this requires a new read-only feature, since
the kernel will already handle unknown hash functions with linear search.
We still don't need to handle Unicode at this point (that would be a
different hash type), but this avoids a partial solution with bad failure
modes.

Cheers, Andreas

* Re: [RFC] A proposal for adding case insensitive lookups to ext4
  2016-11-04 21:51   ` Theodore Ts'o
  2016-11-04 23:12     ` Andreas Dilger
@ 2016-11-06 23:57     ` Dave Chinner
  2016-11-07  0:14       ` Theodore Ts'o
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2016-11-06 23:57 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Andreas Dilger, Ext4 Developers List, guy, jra, drosen

On Fri, Nov 04, 2016 at 05:51:05PM -0400, Theodore Ts'o wrote:
> On Fri, Nov 04, 2016 at 10:14:03AM -0600, Andreas Dilger wrote:
> > > 2.  In ext4_lookup(), if case insensitivity is enabled, and the
> > > directory lookup does not succeed, fall back to a linear search of the
> > > directory using using a case insensitive compare.  (This is slow, but
> > > it's faster compared to doing this in userspace).
> > 
> > Does it make sense to flag directories with whether entries are inserted
> > with the case-insensitive hash?  That allows the common case of having
> > case insensitivity always enabled or disabled working optimally.  Falling
> > back to linear search for every negative lookup would be prohibitive for
> > large directories.
> 
> I'm proposing that we not make any on-disk format changes for now.
> It's true that this means that we need to degrade to a O(N) brute
> force search, and that it is undefined if there are two files that are
> the same when case folding is enabled (e.g., if there is both a
> Makefile and makefile in the directory).

FYI, avoiding having to degrade to brute-force searches is why XFS
added a mkfs option for ascii-ci support.  It is there to indicate
that the directory name hashes are lower-case, case-insensitive
hashes on disk. This means that all case versions of the filename
hash to the same value and collisions can be resolved without
changing any of the existing search code.

We did this with a simple abstraction:

static struct xfs_nameops xfs_ascii_ci_nameops = {
        .hashname       = xfs_ascii_ci_hashname,
        .compname       = xfs_ascii_ci_compname,
};

Where ->hashname() calculates the hash, and ->compname() compares
the hash on disk for a match during lookup.

Otherwise, the only difference is that the lookup path instantiates the
dentry differently depending on whether it was an exact match or a CI
match (see xfs_vn_ci_lookup()).
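
For readers who have not seen this pattern, here is a self-contained
userspace sketch of such a name-operations table.  It is an assumption
made for illustration, not the XFS code: `struct name_ops`, the
`exact_*` and `ascii_ci_*` functions and `names_match()` are all
hypothetical, the hash is FNV-1a as a stand-in, and in this sketch
compname compares the names themselves once the hashes agree.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-filesystem table of name hashing/comparison hooks. */
struct name_ops {
    uint32_t (*hashname)(const char *name, size_t len);
    int (*compname)(const char *a, size_t alen, const char *b, size_t blen);
};

static unsigned char fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* FNV-1a over len bytes, optionally folding ASCII case first. */
static uint32_t hash_bytes(const char *name, size_t len, int do_fold)
{
    uint32_t h = 2166136261u;
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];

        h ^= do_fold ? fold(c) : c;
        h *= 16777619u;
    }
    return h;
}

uint32_t exact_hashname(const char *name, size_t len)
{
    return hash_bytes(name, len, 0);
}

uint32_t ascii_ci_hashname(const char *name, size_t len)
{
    return hash_bytes(name, len, 1);
}

/* Both compname variants return 1 on a match and 0 otherwise. */
int exact_compname(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}

int ascii_ci_compname(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t i;

    if (alen != blen)
        return 0;
    for (i = 0; i < alen; i++)
        if (fold((unsigned char)a[i]) != fold((unsigned char)b[i]))
            return 0;
    return 1;
}

const struct name_ops exact_name_ops = {
    .hashname = exact_hashname,
    .compname = exact_compname,
};

const struct name_ops ascii_ci_name_ops = {
    .hashname = ascii_ci_hashname,
    .compname = ascii_ci_compname,
};

/* Generic search step: hash first, then resolve collisions by compare. */
int names_match(const struct name_ops *ops, const char *a, const char *b)
{
    size_t alen = strlen(a), blen = strlen(b);

    if (ops->hashname(a, alen) != ops->hashname(b, blen))
        return 0;
    return ops->compname(a, alen, b, blen);
}
```

The search code in names_match() never needs to know which table it was
handed, which is the property that lets the existing lookup path stay
unchanged.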

As on-disk changes go, this one should be relatively simple as
there is no actual structural change. :P

> If someone wants to do something "right", which means e2fsprogs and
> kernel changes, getting the Unicode translation code into the kernel
> (and dealing with the bikeshedding that will probably happen when we
> try to get generic Unicode support into the kernel), and that someone

That already happened once, with an attempt to get Unicode case folding
into XFS. Unfortunately, SGI disappeared before review was completed
and so it never got finalised and merged. However, the code is still
available, so we have pretty much a full implementation of Unicode
case folding to start from. The v3 RFC (which contains links back
to the previous two versions and discussions) can be found here:

http://oss.sgi.com/archives/xfs/2014-10/msg00067.html

That's the place to start if people want to pick this up - I'd
suggest a generic interface similar to what has been done with the
fs encryption code is the way to proceed with this....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [RFC] A proposal for adding case insensitive lookups to ext4
  2016-11-06 23:57     ` Dave Chinner
@ 2016-11-07  0:14       ` Theodore Ts'o
  2016-11-07  4:30         ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Theodore Ts'o @ 2016-11-07  0:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Andreas Dilger, Ext4 Developers List, guy, jra, drosen, hch

I talked to Christoph at the Plumbers Closing party, and he suggested
that we get something simple in first which (a) assumes no on-disk
format changes, (b) does everything in the VFS layer, by using an
MS_CASE_FOLD mount flag, uses a case-insensitive dentry hash, and
degrades to a brute force search in the VFS by using readdir interfaces
if the direct lookup does not succeed, and (c) at least initially
assumes only ASCII.

This could be extended by individual file systems that are willing to
make on-disk format changes.

It could be further extended later to support Unicode, and worse,
different versions of Unicode.  (The XFS patches you referenced
support Unicode 7.0.0, and so they are already obsolete.  Now we're up
to Unicode 8.0.0.  Fun.)  The basic issue here is neither Christoph
nor I are paid enough to worry about Unicode, and all of the hacks out
there don't support Unicode anyway.  If someone wants to pay Collabora
$$$ to deal with the Unicode nightmare, life is simple if we let it
degrade to brute force search, and they can have that work done under
contract.  :-)

If we want to handle on-disk format changes, then the file system
superblock would have to specify whether it's using ASCII, Unicode v7,
Unicode v8, etc., and the kernel would have to provide helper routines
to deal with all Unicode versions that we've ever supported before.  I
agree that if we go down that path, we should have generic helper
functions, à la how we handled encryption.  But the idea is to get
something basic in first, and then add other support later,
incrementally.

Christoph also suggested at the party that we should look at whether
or not Android's weird permissions system could be handled using an LSM.
(It would have to be a stackable LSM, layered on top of SELinux.)
That was definitely an intriguing idea, and much more likely to be
sane than trying to use a wrapfs-based hack.  The problem is I don't
understand the weird Android permissions model well enough to know
whether or not this is doable, but it's something I may try to take a
look at if I can find enough round tuits.

						- Ted

* Re: [RFC] A proposal for adding case insensitive lookups to ext4
  2016-11-07  0:14       ` Theodore Ts'o
@ 2016-11-07  4:30         ` Dave Chinner
  2016-11-07  5:42           ` Theodore Ts'o
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2016-11-07  4:30 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Andreas Dilger, Ext4 Developers List, guy, jra, drosen, hch

On Sun, Nov 06, 2016 at 07:14:56PM -0500, Theodore Ts'o wrote:
> I talked to Christoph at the Plumbers Closing party, and he suggested
> that we get something simple in first which (a) assumes no on-disk
> format changes, (b) does everything in the VFS layer, by using a
> MS_CASE_FOLD, uses a case-insensitive dentry hash, and which degrades
> to a brute force search in the VFS by using readdir interfaces if the
> direct lookup does not succeed, and (c) at least initially assumes
> only ASCII.
> 
> This could be extended by individual file systems who are willing to
> make on-disk format changes.

OK, as long as people are OK with things going wonky when the
filesystem is mounted without those mount options, I'm not really
fussed...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [RFC] A proposal for adding case insensitive lookups to ext4
  2016-11-07  4:30         ` Dave Chinner
@ 2016-11-07  5:42           ` Theodore Ts'o
  0 siblings, 0 replies; 9+ messages in thread
From: Theodore Ts'o @ 2016-11-07  5:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Andreas Dilger, Ext4 Developers List, guy, jra, drosen, hch

On Mon, Nov 07, 2016 at 03:30:35PM +1100, Dave Chinner wrote:
> 
> OK, as long people are ok with things going wonky when the
> filesystem is mounted without those mount options, I'm not really
> fussed...

There are apparently NAS boxes that export the same directory with
case insensitive Samba and case sensitive NFS exports at the same
time, which would have all of these problems, and people apparently
don't complain much.

I agree we need to warn people in the documentation that if the file
system is initially mounted w/o case folding, and they create files
such as Readme and README in the same directory, and then mount the
file system with case folding, which file they get will be a bit wonky
--- specifically, "Readme" and "README" will each get the file which is
an exact match, but it is undefined which file "ReAdMe" will return.

Personally, I don't plan to lose any sleep over the issue.  :-)

	      	    	    	     	   - Ted

[parent not found: <CAEuANoLdZ0STmhp+1voS9K6+Ndkb6zfSWEcc0_thUtDjS8NPwg@mail.gmail.com>]

* Re: [RFC] A proposal for adding case insensitive lookups to ext4
       [not found] ` <CAEuANoLdZ0STmhp+1voS9K6+Ndkb6zfSWEcc0_thUtDjS8NPwg@mail.gmail.com>
@ 2016-11-05  0:00   ` Theodore Ts'o
  0 siblings, 0 replies; 9+ messages in thread
From: Theodore Ts'o @ 2016-11-05  0:00 UTC (permalink / raw)
  To: Jeremy Allison; +Cc: Ext4 Developers List, Daniel Rosenberg, guy

On Fri, Nov 04, 2016 at 04:28:05PM -0700, Jeremy Allison wrote:
> I don't suppose ext4 has a negative cache for lookups ? That would
> certainly help the linear search case on lookup miss.

The dcache caches negative results, so as long as the lookup miss is
for the same non-existing file name, that's not a problem.

The issue will be if someone is using make with default rules,
say, and make is checking for foo.y, foo.l, foo.C, etc. for each
object file; there could be a fairly large number of failed lookups,
each of which might require an O(n) search.  Eventually all of these
will be cached, yes, but it could be a large number of negative dentries.

However, I'm not sure I care; no self-respecting programmer should be
using a case-insensitive file system, so in practice, I'm not sure it
matters all that much....

					- Ted
