From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>,
	 Al Viro <viro@zeniv.linux.org.uk>,
	ebiggers@kernel.org,  linux-fsdevel@vger.kernel.org,
	 jaegeuk@kernel.org
Subject: Re: [PATCH] libfs: Attempt exact-match comparison first during casefold lookup
Date: Thu, 18 Jan 2024 12:06:11 -0300
Message-ID: <87a5p2fzq4.fsf@mailhost.krisman.be>
In-Reply-To: <CAHk-=wjkFcF4HKDhSf_fpsLNmDGMkD-ozaNdEhpEQ4JH=MsnNg@mail.gmail.com> (Linus Torvalds's message of "Wed, 17 Jan 2024 18:33:40 -0800")

Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Wed, 17 Jan 2024 at 18:06, Theodore Ts'o <tytso@mit.edu> wrote:
>> So we don't need to worry about the user not being able to fix it,
>> because they won't have been able to create the file in the first
>> place.
>
> Yeah, that's a fine argument, until you have a bug or subtle bit flip
> data corruption, and now instead of having something you can recover,
> the system actively says "Nope".

I know this is not your point, but I should add that, in case of a
bug or bit flip, we support "fixing" the "bad utf8" string through fsck.

>> I admit that when I discovered that MacOS errored out on illegal utf-8
>> characters it was mildly annoying,
>
> We may have to be able to interoperate with shit, but let's call it what it is.
>
> Nobody pretends FAT is a great filesystem that made great design
> decisions. That doesn't mean that we can't interoperate with it just
> fine.
>
> But we don't need to take those idiotic and bad design decisions to
> heart, and we don't need to hide the fact that they are horrendous
> design mistakes.

There is a correctness issue with accepting the creation of invalid
utf-8 names that justifies the existence of strict mode.  Code-points
that are currently undefined can become a casefold match to some other
file in a later unicode version.  When you decide to update your unicode
version, or even copy the file to a volume with a different version, the
lookup might yield a different file, making one of them inaccessible or
causing the wrong file to be overwritten.
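
To make that concrete, here is a toy sketch in Python.  It is not the
kernel's utf8 code: the two fold tables are made up, and U+0378/U+0379
are arbitrary placeholder code-points standing in for "undefined today,
assigned with a case mapping later".

```python
# Toy fold tables standing in for two unicode versions (hypothetical data).
FOLD_V1 = {"A": "a"}                      # U+0378 undefined: folds to itself
FOLD_V2 = {"A": "a", "\u0378": "\u0379"}  # assumed later version: U+0378 folds to U+0379


def casefold(name, table):
    """Fold each character through the table; unknown characters are kept as-is."""
    return "".join(table.get(ch, ch) for ch in name)


def lookup(directory, name, table):
    """Return the first entry whose casefolded form matches the casefolded name."""
    key = casefold(name, table)
    for entry in directory:
        if casefold(entry, table) == key:
            return entry
    return None


# Two distinct files created while U+0378 was still undefined.
directory = ["file\u0378", "file\u0379"]

found_v1 = lookup(directory, "file\u0379", FOLD_V1)  # finds "file\u0379"
found_v2 = lookup(directory, "file\u0379", FOLD_V2)  # now finds "file\u0378"
```

The same lookup name resolves to a different directory entry once the
table changes, and the second file can no longer be reached by name.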

Obviously, not all corruptions would yield a "valid" undefined
code-point.  But those are possible.

We currently don't care much, since mkfs will create the volume with a
fixed, never-changed unicode version. That is, unless the user goes out
of their way to shoot themselves in the foot.

Strict mode is an easy way to prevent this class of issues (aside from
corruptions).

> So "strict" mode should mean that you can't *create* a misformed UTF-8
> filename.
>
> It's that same "be conservative in what you do".
>
> But *dammit*, if "strict" mode means that you can't even read other
> peoples mistakes because your "->lookup()" function refuses to even
> look at it, then "strict" mode is GARBAGE.
>
> That's the "be liberal in what you accept" part. Do it, or be damned.

Yes, we could be more liberal in the lookup while restricting the
creation of invalid utf8 sequences.  But the only case where it would
matter is for corrupted volumes, where a filename suddenly changed to
something invalid.  Considering ext4 and f2fs, the on-disk direntry
hash (which is hash(casefolded(filename))) would not have been corrupted
in exactly the matching way, so looking up the invalid name by exact
match might still fail.

This would create even more inconsistent semantics, where small,
non-hashed directories can find these files, but larger, hashed
directories might not.  And that is even more confusing to users,
since it exposes internal filesystem details.
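
A rough sketch of that inconsistency, in Python rather than kernel code.
Everything here is an assumption for illustration: crc32 stands in for
the real directory hash, a dict stands in for a direntry, and the
"liberal" lookup falls back to a raw byte comparison when the name
cannot be casefolded.

```python
import zlib


def fold(name):
    """Casefold a valid utf-8 name; return invalid names unchanged (liberal fallback)."""
    try:
        return name.decode("utf-8").casefold().encode("utf-8")
    except UnicodeDecodeError:
        return name


def name_hash(name):
    """Stand-in for hash(casefolded(filename)); crc32 is NOT the real dirhash."""
    return zlib.crc32(fold(name))


def names_match(a, b):
    """Exact byte match first, then casefolded match."""
    return a == b or fold(a) == fold(b)


def linear_lookup(entries, name):
    """Small directory: scan every entry and compare names."""
    for e in entries:
        if names_match(e["name"], name):
            return e
    return None


def hashed_lookup(entries, name):
    """Large directory: only entries whose stored hash matches are compared."""
    h = name_hash(name)
    for e in entries:
        if e["hash"] == h and names_match(e["name"], name):
            return e
    return None


# Entry created with a valid name; the hash is stored at creation time.
entry = {"hash": name_hash(b"report.txt"), "name": b"report.txt"}

# A bit flip later corrupts the name on disk (0x6f 'o' becomes 0xef),
# which is no longer valid utf-8.  The stored hash is left untouched.
corrupted = b"rep\xefrt.txt"
entry["name"] = corrupted

small_dir_result = linear_lookup([entry], corrupted)  # found by exact match
large_dir_result = hashed_lookup([entry], corrupted)  # missed: hash mismatch
```

The same corrupted name is reachable through the linear scan but not
through the hashed path, so the outcome depends on the directory size.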

I get the point about how annoying the current semantics are.  But I
still think this is the sanest approach to a fundamentally insane
feature.

-- 
Gabriel Krisman Bertazi