From: "Theodore Ts'o" <tytso@mit.edu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Gabriel Krisman Bertazi <krisman@suse.de>,
	Al Viro <viro@zeniv.linux.org.uk>,
	ebiggers@kernel.org, linux-fsdevel@vger.kernel.org,
	jaegeuk@kernel.org
Subject: Re: [PATCH] libfs: Attempt exact-match comparison first during casefold lookup
Date: Wed, 17 Jan 2024 21:05:54 -0500
Message-ID: <20240118020554.GA1353741@mit.edu>
In-Reply-To: <CAHk-=wjd_uD4aHWEVZ735EKRcEU6FjUo8_aMXSxRA7AD8DapZA@mail.gmail.com>

On Wed, Jan 17, 2024 at 04:40:17PM -0800, Linus Torvalds wrote:
> Note that the whole "malformed utf-8 is an error" is actually wrong anyway.
> 
> Yes, if you *output* utf-8, and your output is malformed, then that's
> an error that needs fixing.
> 
> But honestly, "malformed utf-8" on input is almost always just "oh, it
> wasn't utf-8 to begin with, and somebody is still using Latin-1 or
> Shift-JIS or whatever".
> 
> And then treating that as some kind of hard error is actually really
> really wrong and annoying, and may end up meaning that the user cannot
> *fix* it, because they can't access the data at all.

A file system that supports casefolding can optionally run in "strict"
mode (not the default), in which any attempt to create, hard-link, or
rename a file to a name containing invalid UTF-8 characters is
rejected with an error before the operation takes effect.

This is what MacOS does, by the way.  Suppose a file on a Linux box
was created by unpacking a Windows Zip file, which in turn was created
by downloading a directory hierarchy from Microsoft SharePoint.  If
you then try to scp or rsync that file over to MacOS, MacOS will
refuse to create it if its name contains invalid UTF-8 characters, and
rsync or scp will report an error.  I just ran into this earlier
today...

So we don't need to worry about the user not being able to fix it,
because they won't have been able to create the file in the first
place.  Strict mode is not the default, since we know there are users
who might be creating files using (for example) the unofficial
"Klingon" characters, which are not officially part of Unicode:
Unicode only admits characters used by human languages, and Klingon
doesn't qualify.  I believe, though, that Android has elected to
enable casefolding in strict mode, which is fine as far as I'm
concerned.

> I find libraries that just error out on "malformed utf-8" to be
> actively harmful.

I admit that when I discovered that MacOS errored out on illegal UTF-8
characters it was mildly annoying, but it wasn't hard to fix the file
name on the Linux side and retry the rsync.  It also turned out that
when I unpacked the zip file on MacOS, the file name was created
without the illegal UTF-8 characters, so something funky may have been
going on with the userspace zip program on Linux.  I haven't cared
enough to try to debug it...

- Ted

Thread overview: 7+ messages
2024-01-17 22:28 [PATCH] libfs: Attempt exact-match comparison first during casefold lookup Gabriel Krisman Bertazi
2024-01-17 22:38 ` Al Viro
2024-01-18  0:02   ` Gabriel Krisman Bertazi
2024-01-18  0:40     ` Linus Torvalds
2024-01-18  2:05       ` Theodore Ts'o [this message]
2024-01-18  2:33         ` Linus Torvalds
2024-01-18 15:06           ` Gabriel Krisman Bertazi
