From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"hanqi@vivo.com" <hanqi@vivo.com>,
"Theodore Ts'o" <tytso@mit.edu>
Subject: Re: Unicode conversion issue
Date: Wed, 11 Dec 2024 14:45:51 -0500
Message-ID: <87cyhyuhow.fsf@mailhost.krisman.be>
In-Reply-To: <Z1nG-PSEe6tPOZIG@google.com> (Jaegeuk Kim's message of "Wed, 11 Dec 2024 17:08:08 +0000")
Jaegeuk Kim <jaegeuk@kernel.org> writes:
> On 12/11, Gabriel Krisman Bertazi wrote:
>> Jaegeuk Kim <jaegeuk@kernel.org> writes:
>>
>> > Hi Linus/Gabriel,
>> >
>> > Once Android applied the below patch [1], some special characters started to be
>> > converted differently resulting in different length, so that f2fs cannot find
>> > the filename correctly which was created when the kernel didn't have [1].
>> >
>> > There is one bug report in [2] where describes more details. In order to avoid
>> > this, could you please consider reverting [1] asap? Or, is there any other
>> > way to keep the conversion while addressing CVE? It's very hard for f2fs to
>> > distinguish two valid converted lengths before/after [1].
>>
>> I got this report yesterday. I'm looking into it.
>>
>> It seems commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable
>> code points") has affected more than ignorable code points, because that
>> U+2764 is not marked as Ignorable in the unicode database.
>>
>> I still think the solution to the original issue is eliminating
>> ignorable code points, and that should be fine. Let me look at why this
>> block of characters is mishandled.

I was struggling to reproduce it until I copy-pasted the character
directly from the bugzilla.

The character the user has is ❤️, which is different from just ❤. It
is a combination of:

  U+2764 + U+FE0F (Heavy Black Heart + Variation Selector-16)

Variation Selector-16 is an ignorable character with zero length,
exactly what we wanted to ignore with that patch. What I didn't
consider in the original submission was that, unlike other ignorable
code points, this block might be used intentionally in a filename.

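
To make the composition concrete, here is a minimal userspace sketch in
Python (not kernel code). It only inspects the two code points named
above; the helper names describe, utf8_len and drop_vs16 are made up for
illustration.

```python
import unicodedata

# The character from the report: U+2764 followed by U+FE0F.
heart_emoji = "\u2764\ufe0f"
# The bare heart, without the variation selector.
plain_heart = "\u2764"


def describe(s):
    """Return (code point, Unicode name) pairs for each character in s."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in s]


def utf8_len(s):
    """Length of s in bytes once encoded as UTF-8."""
    return len(s.encode("utf-8"))


def drop_vs16(s):
    """Remove Variation Selector-16, mimicking a fold that ignores it."""
    return s.replace("\ufe0f", "")


if __name__ == "__main__":
    print(describe(heart_emoji))
    print(utf8_len(heart_emoji), utf8_len(drop_vs16(heart_emoji)))
```

The two strings look alike in most terminals, but one encodes to 6 bytes
and the other to 3, which is the kind of length difference the report
describes.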
> Thank you so much. If it takes some time to find the root cause, may I
> propose the revert first to unblock production? The problem is quite severe
> as users cannot access their files.
We have 3 ways forward.

1) The first is to revert the patch and fix the original issue in a
   different way. That is, we would restore the original database and
   treat Ignorable codepoints as folding to themselves only when doing
   string comparisons, but not when calculating hashes. This way, the
   hash will be the same, but filenames with Ignorable codepoints will
   be handled as byte sequences.

2) We keep the original patch and add support in fsck to update the
   hashes in volumes like the above.

3) We regenerate the database to ignore codepoints in the code block
   FE00..FE0F. That would be the simplest solution, but there might be
   more cases that need fixing later.

At this point, I'm leaning towards 1 or 3. Both of them can be done
after reverting my original patch, so I'm fine with that. Thoughts?
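
A rough sketch of one reading of option 1, in Python rather than kernel
C. Everything here is an assumption for illustration: the ignorable set
is limited to U+FE00..U+FE0F, str.casefold() stands in for the kernel's
UTF-8 casefolding, and zlib.crc32 stands in for the filesystem's
directory hash. The function names are hypothetical.

```python
import zlib

# Assumption: only the variation selectors U+FE00..U+FE0F are ignorable.
IGNORABLE = {chr(cp) for cp in range(0xFE00, 0xFE10)}


def fold_for_hash(name):
    """Fold as the original database would: ignorables are dropped."""
    return "".join(c for c in name if c not in IGNORABLE).casefold()


def fold_for_compare(name):
    """Fold for comparison: ignorables fold to themselves (are kept)."""
    return name.casefold()


def toy_hash(name):
    """Stand-in directory hash over the hash-folded name."""
    return zlib.crc32(fold_for_hash(name).encode("utf-8"))


def names_match(a, b):
    """Two names match only if their comparison folds are identical."""
    return fold_for_compare(a) == fold_for_compare(b)


if __name__ == "__main__":
    with_vs = "\u2764\ufe0f.txt"
    without_vs = "\u2764.txt"
    # Same hash, so entries hashed under the old rules are still found,
    # but the two names are not treated as the same file.
    print(toy_hash(with_vs) == toy_hash(without_vs))
    print(names_match(with_vs, without_vs))
```

The point of the split is that the hash stays compatible with what is
already on disk, while the comparison no longer lets an ignorable code
point disappear from a name.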
> Thank you so much. If it takes some time to find the root cause, may I
> propose the revert first to unblock production? The problem is quite
> severe as users cannot access their files.
I don't oppose this, considering the case at hand. I'll base the new patch
on top of the revert.
--
Gabriel Krisman Bertazi