All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Gabriel Krisman Bertazi <krisman@suse.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"hanqi@vivo.com" <hanqi@vivo.com>, Theodore Ts'o <tytso@mit.edu>
Subject: Re: Unicode conversion issue
Date: Wed, 11 Dec 2024 21:13:44 +0000	[thread overview]
Message-ID: <Z1oAiAAKzAmV5M2h@google.com> (raw)
In-Reply-To: <CAHk-=wiC3evUXq8QTcOBFTMu1wsUR_dYiS8eGxy0Hh7VbL55yA@mail.gmail.com>

On 12/11, Linus Torvalds wrote:
> On Wed, 11 Dec 2024 at 11:58, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > The problem is that all the filesystems basically do some variation of
> >
> >         if (IS_CASEFOLDED(dir) ..) {
> >
> >                 len = utf8_casefold(sb->s_encoding, orig_name,
> >                         new_name, MAXLEN);
> >
> > and then they use that "new_name" for both hashing and for comparisons.
> 
> Oh, actually, f2fs does pass in the original name to
> generic_ci_match(), so I think this is solvable.
> 
> The solution involves just telling f2fs to ignore the hash if it has
> seen odd characters.

But, the hash is not just used when matching the dentry, but gives a block
location withiin multi-level hash tables for faster lookup as well. If the
filename length is also changed by the unicode patch, utf8_strncasecmp_folded()
will also give an error?

> 
> So I think f2fs could actually do something like this:
> 
>   --- a/fs/f2fs/dir.c
>   +++ b/fs/f2fs/dir.c
>   @@ -67,6 +67,7 @@ int f2fs_init_casefolded_name(const struct inode *dir,
>                         /* fall back to treating name as opaque byte sequence */
>                         return 0;
>                 }
>   +             fname->ignore_hash = utf8_oddname(fname->usr_fname);
>                 fname->cf_name.name = buf;
>                 fname->cf_name.len = len;
>         }
>   @@ -231,7 +232,7 @@ struct f2fs_dir_entry
> *f2fs_find_target_dentry(const struct f2fs_dentry_ptr *d,
>                         continue;
>                 }
> 
>   -             if (de->hash_code == fname->hash) {
>   +             if (fname->ignore_hash || de->hash_code == fname->hash) {
>                         res = f2fs_match_name(d->inode, fname,
>                                               d->filename[bit_pos],
>                                               le16_to_cpu(de->name_len));
>   --- a/fs/f2fs/f2fs.h
>   +++ b/fs/f2fs/f2fs.h
>   @@ -521,6 +521,7 @@ struct f2fs_filename {
> 
>         /* The dirhash of this filename */
>         f2fs_hash_t hash;
>   +     bool ignore_hash;
> 
>    #ifdef CONFIG_FS_ENCRYPTION
>         /*
> 
> where that "utf8_oddname()" is the one that goes "this filename
> contains unhashable characters".
> 
> I didn't look very closely at what ext4 does, but it seems to already
> have a pattern for "don't even look at the hash because it's not
> reliable", so I think ext4 can do something similar.
> 
> So then all you actually need is that utf8_oddname() that recognizes
> those ignored code-points.
> 
> So I take it all back: option (1) actually doesn't look that bad, and
> would make reverting commit 5c26d2f1d3f5 ("unicode: Don't special case
> ignorable code points") unnecessary.
> 
>                 Linus

  parent reply	other threads:[~2024-12-11 21:13 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-11 15:46 Unicode conversion issue Jaegeuk Kim
2024-12-11 16:08 ` Gabriel Krisman Bertazi
2024-12-11 17:08   ` Jaegeuk Kim
2024-12-11 19:45     ` Gabriel Krisman Bertazi
2024-12-11 19:58       ` Linus Torvalds
2024-12-11 20:18         ` Linus Torvalds
2024-12-11 21:10           ` Gabriel Krisman Bertazi
2024-12-11 21:25             ` Linus Torvalds
2024-12-11 21:53               ` Jaegeuk Kim
2024-12-11 21:56                 ` Linus Torvalds
2024-12-11 22:01                   ` Jaegeuk Kim
2024-12-11 22:09                     ` Linus Torvalds
2024-12-11 21:13           ` Jaegeuk Kim [this message]
2024-12-11 19:22   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z1oAiAAKzAmV5M2h@google.com \
    --to=jaegeuk@kernel.org \
    --cc=hanqi@vivo.com \
    --cc=krisman@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.