* [f2fs-dev] [Bug 219586] New: Unable to find file after unicode change
@ 2024-12-10  6:58 bugzilla-daemon--- via Linux-f2fs-devel
From: bugzilla-daemon--- via Linux-f2fs-devel @ 2024-12-10  6:58 UTC
  To: linux-f2fs-devel

https://bugzilla.kernel.org/show_bug.cgi?id=219586

            Bug ID: 219586
           Summary: Unable to find file after unicode change
           Product: File System
           Version: 2.5
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: blocking
          Priority: P3
         Component: f2fs
          Assignee: filesystem_f2fs@kernel-bugs.kernel.org
          Reporter: hanqi@vivo.com
        Regression: No

Hi everybody,
After the kernel was updated with the following patch, f2fs can no longer
find some files whose names contain special characters, such as ❤️:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=18b5f47e7da46d3a0d7331e48befcaf151ed2ddf

We can reproduce this with the following steps:
1. First, we revert the unicode-related change above and create a file or
directory whose name contains the special character:
./tools/mkfs.f2fs -f -O casefold -C utf8 f2fs.img
mount f2fs.img f2fs_dir/
mkdir f2fs_dir/Picture
./f2fs_io setflags casefold f2fs_dir/Picture
cd f2fs_dir/Picture
touch ❤️

2. Then we apply the unicode patch above and, running the patched kernel,
mount the filesystem again. Listing the directory reports that the
special-character file cannot be found:
mount f2fs.img f2fs_dir/
cd f2fs_dir/Picture
ls -alh
ls: cannot access '❤️': No such file or directory
total 8
drwxr-xr-x 2 root root 3488 Dec 10 06:11 .
drwxr-xr-x 3 root root 4096 Dec  9 10:21 ..
-????????? ? ?    ?       ?            ? ❤️

Here are the conclusions of my preliminary analysis.
In casefold-enabled f2fs filesystems, when a file is looked up, its name is
first casefolded (converted to lowercase) by the utf8_casefold function, and
the hash is then calculated from the casefolded name. The same hash was
calculated and stored on disk when the file was created, and the lookup uses
it to locate and match the on-disk entry. The call path is:
f2fs_lookup
    f2fs_prepare_lookup
        __f2fs_setup_filename
            f2fs_init_casefolded_name
                utf8_casefold
            f2fs_hash_filename
    __f2fs_find_entry
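The lookup path above can be modelled with a small Python sketch. It is not f2fs code: `fold()` is a hypothetical stand-in for utf8_casefold (Python's casefold plus NFD, which only approximates the kernel's utf8 tables), `name_hash()` uses CRC32 instead of the real f2fs dentry hash, and `Directory` is an illustrative in-memory model. What it is meant to show is the property that matters for this bug: the hash is computed once at create time and persisted, and a lookup only compares names of entries whose stored hash matches the freshly computed one.

```python
import unicodedata
import zlib


def fold(name: str) -> bytes:
    # Hypothetical stand-in for utf8_casefold(): casefold, then NFD, as UTF-8.
    return unicodedata.normalize("NFD", name.casefold()).encode("utf-8")


def name_hash(folded: bytes) -> int:
    # Hypothetical stand-in for f2fs_hash_filename(); f2fs uses its own
    # dentry hash, CRC32 is used here only for illustration.
    return zlib.crc32(folded)


class Directory:
    def __init__(self):
        # (stored_hash, name) pairs, as if persisted in the on-disk dentries.
        self.entries = []

    def create(self, name: str) -> None:
        # The hash is computed once, at creation time, and stored.
        self.entries.append((name_hash(fold(name)), name))

    def lookup(self, name: str):
        folded = fold(name)
        h = name_hash(folded)
        for stored_hash, stored_name in self.entries:
            if stored_hash != h:
                continue  # hash mismatch: the entry is never compared by name
            if fold(stored_name) == folded:
                return stored_name
        return None


d = Directory()
d.create("Picture.JPG")
print(d.lookup("picture.jpg"))  # Picture.JPG
print(d.lookup("other"))        # None
```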

For some names containing special characters, such as ❤️, we found that the
length of the string output by utf8_casefold differs before and after the
patch, which ultimately changes the calculated hash. As a result, files
created before the patch cannot be found after the patch is applied.
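The length change for ❤️ can be illustrated with the same kind of model. ❤️ is the code point U+2764 (HEAVY BLACK HEART) followed by U+FE0F (VARIATION SELECTOR-16). As an assumption for illustration only, the hypothetical `fold_old()` models the pre-patch behaviour as dropping variation selectors U+FE00..U+FE0F, while `fold_new()` keeps them; neither function reflects the exact kernel tables.

```python
import unicodedata

HEART = "\u2764\ufe0f"  # ❤️: U+2764 HEAVY BLACK HEART + U+FE0F VARIATION SELECTOR-16


def fold_new(name: str) -> bytes:
    # Assumed post-patch behaviour: casefold + NFD, U+FE0F is kept.
    return unicodedata.normalize("NFD", name.casefold()).encode("utf-8")


def fold_old(name: str) -> bytes:
    # Hypothetical model of pre-patch behaviour: variation selectors
    # U+FE00..U+FE0F fold to nothing and are dropped.
    kept = "".join(c for c in name if not "\ufe00" <= c <= "\ufe0f")
    return fold_new(kept)


old, new = fold_old(HEART), fold_new(HEART)
print(len(old), len(new))    # 3 6
print(old.hex(), new.hex())  # e29da4 e29da4efb88f
print(old == new)            # False: the name hash is computed over different bytes
```

Under this model, an entry created before the patch carries the hash of the 3-byte form, while a post-patch lookup hashes the 6-byte form, so the entry is skipped. That matches the `No such file or directory` seen in step 2.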

I think f2fs needs to be modified to stay compatible with this unicode-related
change, so that names hashed before the patch can still be found.
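One possible direction, sketched here purely as a hypothetical and not as what f2fs implements, is to retry a failed lookup using the legacy folding rules. The sketch reuses the assumed model from above (`fold_legacy` drops U+FE00..U+FE0F) and CRC32 as a stand-in for the real f2fs dentry hash.

```python
import unicodedata
import zlib


def fold_new(name: str) -> bytes:
    # Assumed post-patch folding: casefold + NFD, ignorable code points kept.
    return unicodedata.normalize("NFD", name.casefold()).encode("utf-8")


def fold_legacy(name: str) -> bytes:
    # Hypothetical model of pre-patch folding: variation selectors dropped.
    kept = "".join(c for c in name if not "\ufe00" <= c <= "\ufe0f")
    return fold_new(kept)


def lookup(entries, name):
    """Return the stored name matching `name`, or None.

    `entries` is a list of (stored_hash, stored_name) pairs that may have been
    written under either set of folding rules.
    """
    for fold in (fold_new, fold_legacy):  # current rules first, then legacy
        folded = fold(name)
        want = zlib.crc32(folded)  # stand-in for the real f2fs dentry hash
        for stored_hash, stored_name in entries:
            if stored_hash == want and fold(stored_name) == folded:
                return stored_name
    return None


heart = "\u2764\ufe0f"
old_entries = [(zlib.crc32(fold_legacy(heart)), heart)]  # created pre-patch
print(lookup(old_entries, heart) == heart)  # True
```

The trade-off is that a lookup for a name that truly does not exist pays for two passes, and the legacy rules must be kept for as long as old on-disk entries may exist.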

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
