From: bugzilla-daemon--- via Linux-f2fs-devel <linux-f2fs-devel@lists.sourceforge.net>
To: linux-f2fs-devel@lists.sourceforge.net
Subject: [f2fs-dev] [Bug 219586] New: Unable to find file after unicode change
Date: Tue, 10 Dec 2024 06:58:44 +0000
Message-ID: <bug-219586-202145@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=219586

            Bug ID: 219586
           Summary: Unable to find file after unicode change
           Product: File System
           Version: 2.5
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: blocking
          Priority: P3
         Component: f2fs
          Assignee: filesystem_f2fs@kernel-bugs.kernel.org
          Reporter: hanqi@vivo.com
        Regression: No

Hi everybody,
The f2fs filesystem can no longer look up some files whose names contain
special characters, such as ❤️, after the kernel is updated with the following patch:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=18b5f47e7da46d3a0d7331e48befcaf151ed2ddf

The problem can be reproduced with the following steps:
1. First, roll back the unicode-related change above and create a file or
folder whose name contains the special character:
./tools/mkfs.f2fs -f -O casefold -C utf8 f2fs.img
mount f2fs.img f2fs_dir/
mkdir f2fs_dir/Picture
./f2fs_io setflags casefold f2fs_dir/Picture
cd f2fs_dir/Picture
touch ❤️

2. Then apply the unicode patch above. After the filesystem is mounted,
the file with the special character can no longer be found:
mount f2fs.img f2fs_dir/
cd f2fs_dir/Picture
ls -alh
ls: cannot access '❤️': No such file or directory
total 8
drwxr-xr-x 2 root root 3488 Dec 10 06:11 .
drwxr-xr-x 3 root root 4096 Dec  9 10:21 ..
-????????? ? ?    ?       ?            ? ❤️

Here are the conclusions of my preliminary analysis.
On casefold-enabled f2fs filesystems, the utf8_casefold function casefolds
a file name (roughly, converts it to lowercase) when the file is looked up,
and the hash is then calculated from the casefolded name. The hash calculated
the same way at creation time is what is stored on disk. The call path is:
f2fs_lookup
    f2fs_prepare_lookup
        __f2fs_setup_filename
            f2fs_init_casefolded_name
                utf8_casefold
            f2fs_hash_filename
    __f2fs_find_entry
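
The lookup flow above can be modelled with a small Python sketch. It is not the kernel code: `fold`, `name_hash`, and `ToyDir` are hypothetical names, Python's `str.casefold()` plus NFD normalization only approximates utf8_casefold, and SHA-256 stands in for f2fs_hash_filename, which uses a different algorithm.

```python
import hashlib
import unicodedata


def fold(name: str) -> bytes:
    """Stand-in for utf8_casefold: full case folding followed by NFD."""
    return unicodedata.normalize("NFD", name.casefold()).encode("utf-8")


def name_hash(folded: bytes) -> str:
    """Stand-in for f2fs_hash_filename (the real hash is a different algorithm)."""
    return hashlib.sha256(folded).hexdigest()


class ToyDir:
    """Toy directory that indexes entries by the hash of the folded name."""

    def __init__(self):
        self.entries = {}  # hash of folded name -> name as typed at creation

    def create(self, name: str) -> None:
        # The hash is computed once, at creation, and "stored on disk".
        self.entries[name_hash(fold(name))] = name

    def lookup(self, name: str):
        # Lookup recomputes the hash from the folded query name.
        return self.entries.get(name_hash(fold(name)))


d = ToyDir()
d.create("Holiday.JPG")
print(d.lookup("holiday.jpg"))  # Holiday.JPG
print(d.lookup("HOLIDAY.jpg"))  # Holiday.JPG
print(d.lookup("other.jpg"))    # None
```

In this model a lookup succeeds only if folding the query name yields exactly the bytes that were hashed at creation time, so any change in the folding function affects entries that already exist.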

For some file names that contain special characters, such as ❤️, we found that
the length of the utf8_casefold output differs before and after the patch,
which in turn changes the calculated hash. As a result, files created before
the patch cannot be looked up once the patch is applied.
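
As an illustration of how a length change in the folded name alters the hash, here is a Python sketch under several assumptions that the report itself does not state: that the file name is U+2764 followed by the variation selector U+FE0F, that the folding before the patch dropped the variation selector while the folding after the patch keeps it, and that SHA-256 can stand in for the f2fs name hash. The function names are hypothetical.

```python
import hashlib
import unicodedata

# Assumption: the heart emoji file name is U+2764 followed by U+FE0F
# (variation selector-16).
NAME = "\u2764\ufe0f"


def is_variation_selector(ch: str) -> bool:
    return 0xFE00 <= ord(ch) <= 0xFE0F


def fold_dropping(name: str) -> bytes:
    """Hypothetical 'before' folding: case fold, NFD, drop variation selectors."""
    folded = unicodedata.normalize("NFD", name.casefold())
    return "".join(c for c in folded if not is_variation_selector(c)).encode("utf-8")


def fold_keeping(name: str) -> bytes:
    """Hypothetical 'after' folding: case fold and NFD, nothing dropped."""
    return unicodedata.normalize("NFD", name.casefold()).encode("utf-8")


def toy_hash(folded: bytes) -> str:
    """Stand-in for f2fs_hash_filename; not the algorithm f2fs uses."""
    return hashlib.sha256(folded).hexdigest()


old = fold_dropping(NAME)
new = fold_keeping(NAME)
print(len(old), old.hex())             # 3 e29da4
print(len(new), new.hex())             # 6 e29da4efb88f
print(toy_hash(old) == toy_hash(new))  # False
```

Under these assumptions the folded name grows from 3 to 6 bytes, so a hash stored at creation time no longer matches the hash recomputed at lookup time.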

I think f2fs needs to be modified to stay compatible with the unicode-related
change.

