From: "Darrick J. Wong" <djwong@kernel.org>
To: Andrey Albershteyn <aalbersh@redhat.com>
Cc: xfs <linux-xfs@vger.kernel.org>, hch@lst.de
Subject: [RFC PATCH] generic/45[34]: add colored emoji variants to unicode tests
Date: Thu, 20 Feb 2025 14:10:27 -0800
Message-ID: <20250220221027.GU21808@frogsfrogsfrogs>
In-Reply-To: <20250220220758.GT21808@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
Ted told me this morning about a recent problem with kernel Unicode name
casefolding vs. emoji -- initially, someone decided that zero-width
joiners should be stripped out of filenames during comparisons, which
led to malicious git pulls of branches containing "<zwj>.git/config"
files overwriting git repo config files. A quick fix was to stop
ignoring the "ignorable" code points, but that broke emoji in filenames,
because emoji use zero-width joiners to combine simpler emoji into more
complex ones, or alter skin tones, or colors, etc. Reportedly the
casefolding code will also fold a red heart into a black one.
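
To make the collision concrete, here is a minimal sketch in portable shell.
It is not the kernel's casefolding code: strip_zwj and same_when_stripped
are hypothetical helpers that only mimic a comparison which ignores the
zero-width joiner. Octal escapes are used for portability; \342\200\215 is
the same joiner written as \xe2\x80\x8d in the test names.

```shell
# \342\200\215 is the zero-width joiner (U+200D), i.e. \xe2\x80\x8d.
zwj=$(printf '\342\200\215')

# Hypothetical helper, NOT kernel code: drop every ZWJ from a name to
# mimic a comparison that treats the joiner as ignorable.
strip_zwj() {
	printf '%s\n' "$1" | LC_ALL=C sed "s/${zwj}//g"
}

# True if two names compare equal once joiners are ignored.
same_when_stripped() {
	[ "$(strip_zwj "$1")" = "$(strip_zwj "$2")" ]
}

# A name that only looks like ".git" once the joiner is ignored.
evil="${zwj}.git"
# Mending heart: red heart, variation selector 16, ZWJ, adhesive bandage.
mending=$(printf '\342\235\244\357\270\217\342\200\215\360\237\251\271')
# The same two emoji with no joiner between them.
unjoined=$(printf '\342\235\244\357\270\217\360\237\251\271')

if same_when_stripped "$evil" ".git"; then
	echo "ignoring ZWJ: <zwj>.git matches .git"
fi
if [ "$mending" != "$unjoined" ]; then
	echo "bytewise: mending heart differs from heart + bandage"
fi
if same_when_stripped "$mending" "$unjoined"; then
	echo "ignoring ZWJ: mending heart matches heart + bandage"
fi
```

The same stripping that lets "<zwj>.git" pass for ".git" also erases the
difference between a joined emoji sequence and its separate parts, which is
why the new test names keep the joiners in place.
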
So. To our filename support test, let's add various colors of heart
emoji and various skin tones of heart-hands; and compound emoji
consisting of multiple emoji glued together with zero width joiners.
This actually caused a buffer overflow in the string-escaping functions
of xfs_scrub phase 5 because I hadn't anticipated that we'd end up with
a filename consisting *entirely* of nonprinting bytes.
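
As a rough illustration of why an all-nonprinting name is the worst case
for an escaping buffer, the sketch below rewrites every byte of the mending
heart name as a four-character \xNN sequence and compares lengths. The \xNN
format and the escape_all_bytes and byte_len helpers are assumptions made
for this example; they are not taken from xfs_scrub.

```shell
# Mending heart name from the test, written with octal escapes:
# \xe2\x9d\xa4 \xef\xb8\x8f \xe2\x80\x8d \xf0\x9f\xa9\xb9
name=$(printf '\342\235\244\357\270\217\342\200\215\360\237\251\271')

# Hypothetical worst-case escaper (NOT the xfs_scrub code): rewrite
# every byte of the argument as a four-character \xNN sequence.
escape_all_bytes() {
	{ printf '%s' "$1" | LC_ALL=C od -An -v -tx1 | tr -d '[:space:]'; echo; } |
		sed 's/../\\x&/g'
}

# Byte length of the argument; $(( )) drops any padding printed by wc.
byte_len() {
	echo $(( $(printf '%s' "$1" | wc -c) ))
}

raw_len=$(byte_len "$name")
escaped=$(escape_all_bytes "$name")
escaped_len=$(byte_len "$escaped")

printf 'raw bytes:     %s\n' "$raw_len"
printf 'escaped bytes: %s\n' "$escaped_len"
printf 'escaped name:  %s\n' "$escaped"
```

Under this assumed format, the escaped form is four times the raw length, so
an output buffer sized for a name with only a few nonprinting bytes is too
small once every byte needs escaping.
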
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
tests/generic/453 | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++
tests/generic/454 | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 110 insertions(+)
diff --git a/tests/generic/453 b/tests/generic/453
index 04945ad1085b2d..bd5ce8b2bb11d9 100755
--- a/tests/generic/453
+++ b/tests/generic/453
@@ -203,6 +203,36 @@ setf "job offer.pdf" "actual period"
setf "llamapirate\xf3\xa0\x80\x81\xf3\xa0\x81\x94\xf3\xa0\x81\xa8\xf3\xa0\x81\xa5\xf3\xa0\x80\xa0\xf3\xa0\x81\xb3\xf3\xa0\x81\xa1\xf3\xa0\x81\xac\xf3\xa0\x81\xa5\xf3\xa0\x81\xb3\xf3\xa0\x80\xa0\xf3\xa0\x81\xa6\xf3\xa0\x81\xaf\xf3\xa0\x81\xb2\xf3\xa0\x80\xa0\xf3\xa0\x81\x93\xf3\xa0\x81\xa5\xf3\xa0\x81\xa1\xf3\xa0\x81\xb4\xf3\xa0\x81\xb4\xf3\xa0\x81\xac\xf3\xa0\x81\xa5\xf3\xa0\x80\xa0\xf3\xa0\x81\xb7\xf3\xa0\x81\xa5\xf3\xa0\x81\xb2\xf3\xa0\x81\xa5\xf3\xa0\x80\xa0\xf3\xa0\x81\x95\xf3\xa0\x81\x93\xf3\xa0\x81\x84\xf3\xa0\x80\xa0\xf3\xa0\x80\xb1\xf3\xa0\x80\xb2\xf3\xa0\x80\xb0\xf3\xa0\x80\xb0\xf3\xa0\x80\xb0\xf3\xa0\x80\xb0\xf3\xa0\x81\xbf"
setf "llamapirate"
+# colored heart emoji to check if casefolding whacks emoji
+setf "\xf0\x9f\x92\x9c" "purple"
+setf "\xf0\x9f\x92\x99" "blue"
+setf "\xf0\x9f\x92\x9a" "green"
+setf "\xf0\x9f\x92\x9b" "yellow"
+setf "\xf0\x9f\xab\x80" "heart"
+setf "\xe2\x9d\xa4\xef\xb8\x8f" "red"
+setf "\xf0\x9f\xa4\x8e" "brown"
+setf "\xf0\x9f\xa4\x8d" "white"
+setf "\xf0\x9f\x96\xa4" "black"
+setf "\xf0\x9f\xa7\xa1" "orange"
+setf "\xe2\x99\xa5\xef\xb8\x8f" "red suit"
+
+# zero width joiners in the middle of emoji sequences aren't supposed
+# to be normalized to nothing, but apparently this caused issues with
+# casefolding on ext4; also the mending heart caused a crash in xfs_scrub
+setf "\xf0\x9f\x92\x94" "broken heart"
+setf "\xe2\x9d\xa4\xef\xb8\x8f\xe2\x80\x8d\xf0\x9f\xa9\xb9" "mending heart"
+setf "\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbb\xe2\x80\x8d\xe2\x9d\xa4\xef\xb8
+\x8f\xe2\x80\x8d\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbc" "couple with heart"
+setf "\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbb\xe2\x80\x8d\xe2\x9d\xa4\xef\xb8\x8f\xe2\x80\x8d\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbf" "couple with heart, light and dark skin tone"
+
+# emoji heart hands with skin tone variations
+setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbf" "dark"
+setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbe" "medium dark"
+setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbd" "medium"
+setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbc" "medium light"
+setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbb" "light"
+setf "\xf0\x9f\xab\xb6" "neutral"
+
ls -laR $testdir >> $seqres.full
echo "Test files"
@@ -276,6 +306,31 @@ testf "job offer.pdf" "actual period"
testf "llamapirate\xf3\xa0\x80\x81\xf3\xa0\x81\x94\xf3\xa0\x81\xa8\xf3\xa0\x81\xa5\xf3\xa0\x80\xa0\xf3\xa0\x81\xb3\xf3\xa0\x81\xa1\xf3\xa0\x81\xac\xf3\xa0\x81\xa5\xf3\xa0\x81\xb3\xf3\xa0\x80\xa0\xf3\xa0\x81\xa6\xf3\xa0\x81\xaf\xf3\xa0\x81\xb2\xf3\xa0\x80\xa0\xf3\xa0\x81\x93\xf3\xa0\x81\xa5\xf3\xa0\x81\xa1\xf3\xa0\x81\xb4\xf3\xa0\x81\xb4\xf3\xa0\x81\xac\xf3\xa0\x81\xa5\xf3\xa0\x80\xa0\xf3\xa0\x81\xb7\xf3\xa0\x81\xa5\xf3\xa0\x81\xb2\xf3\xa0\x81\xa5\xf3\xa0\x80\xa0\xf3\xa0\x81\x95\xf3\xa0\x81\x93\xf3\xa0\x81\x84\xf3\xa0\x80\xa0\xf3\xa0\x80\xb1\xf3\xa0\x80\xb2\xf3\xa0\x80\xb0\xf3\xa0\x80\xb0\xf3\xa0\x80\xb0\xf3\xa0\x80\xb0\xf3\xa0\x81\xbf"
testf "llamapirate"
+testf "\xf0\x9f\x92\x9c" "purple"
+testf "\xf0\x9f\x92\x99" "blue"
+testf "\xf0\x9f\x92\x9a" "green"
+testf "\xf0\x9f\x92\x9b" "yellow"
+testf "\xf0\x9f\xab\x80" "heart"
+testf "\xe2\x9d\xa4\xef\xb8\x8f" "red"
+testf "\xf0\x9f\xa4\x8e" "brown"
+testf "\xf0\x9f\xa4\x8d" "white"
+testf "\xf0\x9f\x96\xa4" "black"
+testf "\xf0\x9f\xa7\xa1" "orange"
+testf "\xe2\x99\xa5\xef\xb8\x8f" "red suit"
+
+testf "\xf0\x9f\x92\x94" "broken heart"
+testf "\xe2\x9d\xa4\xef\xb8\x8f\xe2\x80\x8d\xf0\x9f\xa9\xb9" "mending heart"
+testf "\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbb\xe2\x80\x8d\xe2\x9d\xa4\xef\xb8
+\x8f\xe2\x80\x8d\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbc" "couple with heart"
+testf "\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbb\xe2\x80\x8d\xe2\x9d\xa4\xef\xb8\x8f\xe2\x80\x8d\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbf" "couple with heart, light and dark skin tone"
+
+testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbf" "dark"
+testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbe" "medium dark"
+testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbd" "medium"
+testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbc" "medium light"
+testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbb" "light"
+testf "\xf0\x9f\xab\xb6" "neutral"
+
echo "Uniqueness of inodes?"
stat -c '%i' "${testdir}/"* | sort | uniq -c | while read nr inum; do
if [ "${nr}" -gt 1 ]; then
diff --git a/tests/generic/454 b/tests/generic/454
index aec8beb8b43ca0..9f6ddb4a0e48b2 100755
--- a/tests/generic/454
+++ b/tests/generic/454
@@ -124,6 +124,36 @@ setf "combmark_\xe1\x80\x9c\xe1\x80\xaf\xe1\x80\xad.txt" "combining marks"
setf "llamapirate\xf3\xa0\x80\x81\xf3\xa0\x81\x94\xf3\xa0\x81\xa8\xf3\xa0\x81\xa5\xf3\xa0\x80\xa0\xf3\xa0\x81\xb3\xf3\xa0\x81\xa1\xf3\xa0\x81\xac\xf3\xa0\x81\xa5\xf3\xa0\x81\xb3\xf3\xa0\x80\xa0\xf3\xa0\x81\xa6\xf3\xa0\x81\xaf\xf3\xa0\x81\xb2\xf3\xa0\x80\xa0\xf3\xa0\x81\x93\xf3\xa0\x81\xa5\xf3\xa0\x81\xa1\xf3\xa0\x81\xb4\xf3\xa0\x81\xb4\xf3\xa0\x81\xac\xf3\xa0\x81\xa5\xf3\xa0\x80\xa0\xf3\xa0\x81\xb7\xf3\xa0\x81\xa5\xf3\xa0\x81\xb2\xf3\xa0\x81\xa5\xf3\xa0\x80\xa0\xf3\xa0\x81\x95\xf3\xa0\x81\x93\xf3\xa0\x81\x84\xf3\xa0\x80\xa0\xf3\xa0\x80\xb1\xf3\xa0\x80\xb2\xf3\xa0\x80\xb0\xf3\xa0\x80\xb0\xf3\xa0\x80\xb0\xf3\xa0\x80\xb0\xf3\xa0\x81\xbf" "secret instructions"
setf "llamapirate" "no secret instructions"
+# colored heart emoji to check if casefolding whacks emoji
+setf "\xf0\x9f\x92\x9c" "purple"
+setf "\xf0\x9f\x92\x99" "blue"
+setf "\xf0\x9f\x92\x9a" "green"
+setf "\xf0\x9f\x92\x9b" "yellow"
+setf "\xf0\x9f\xab\x80" "heart"
+setf "\xe2\x9d\xa4\xef\xb8\x8f" "red"
+setf "\xf0\x9f\xa4\x8e" "brown"
+setf "\xf0\x9f\xa4\x8d" "white"
+setf "\xf0\x9f\x96\xa4" "black"
+setf "\xf0\x9f\xa7\xa1" "orange"
+setf "\xe2\x99\xa5\xef\xb8\x8f" "red suit"
+
+# zero width joiners in the middle of emoji sequences aren't supposed
+# to be normalized to nothing, but apparently this caused issues with
+# casefolding on ext4; also the mending heart caused a crash in xfs_scrub
+setf "\xf0\x9f\x92\x94" "broken heart"
+setf "\xe2\x9d\xa4\xef\xb8\x8f\xe2\x80\x8d\xf0\x9f\xa9\xb9" "mending heart"
+setf "\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbb\xe2\x80\x8d\xe2\x9d\xa4\xef\xb8
+\x8f\xe2\x80\x8d\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbc" "couple with heart"
+setf "\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbb\xe2\x80\x8d\xe2\x9d\xa4\xef\xb8\x8f\xe2\x80\x8d\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbf" "couple with heart, light and dark skin tone"
+
+# emoji heart hands with skin tone variations
+setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbf" "dark"
+setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbe" "medium dark"
+setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbd" "medium"
+setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbc" "medium light"
+setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbb" "light"
+setf "\xf0\x9f\xab\xb6" "neutral"
+
_getfattr --absolute-names -d "${testfile}" >> $seqres.full
echo "Test files"
@@ -174,6 +204,31 @@ testf "combmark_\xe1\x80\x9c\xe1\x80\xaf\xe1\x80\xad.txt" "combining marks"
testf "llamapirate\xf3\xa0\x80\x81\xf3\xa0\x81\x94\xf3\xa0\x81\xa8\xf3\xa0\x81\xa5\xf3\xa0\x80\xa0\xf3\xa0\x81\xb3\xf3\xa0\x81\xa1\xf3\xa0\x81\xac\xf3\xa0\x81\xa5\xf3\xa0\x81\xb3\xf3\xa0\x80\xa0\xf3\xa0\x81\xa6\xf3\xa0\x81\xaf\xf3\xa0\x81\xb2\xf3\xa0\x80\xa0\xf3\xa0\x81\x93\xf3\xa0\x81\xa5\xf3\xa0\x81\xa1\xf3\xa0\x81\xb4\xf3\xa0\x81\xb4\xf3\xa0\x81\xac\xf3\xa0\x81\xa5\xf3\xa0\x80\xa0\xf3\xa0\x81\xb7\xf3\xa0\x81\xa5\xf3\xa0\x81\xb2\xf3\xa0\x81\xa5\xf3\xa0\x80\xa0\xf3\xa0\x81\x95\xf3\xa0\x81\x93\xf3\xa0\x81\x84\xf3\xa0\x80\xa0\xf3\xa0\x80\xb1\xf3\xa0\x80\xb2\xf3\xa0\x80\xb0\xf3\xa0\x80\xb0\xf3\xa0\x80\xb0\xf3\xa0\x80\xb0\xf3\xa0\x81\xbf" "secret instructions"
testf "llamapirate" "no secret instructions"
+testf "\xf0\x9f\x92\x9c" "purple"
+testf "\xf0\x9f\x92\x99" "blue"
+testf "\xf0\x9f\x92\x9a" "green"
+testf "\xf0\x9f\x92\x9b" "yellow"
+testf "\xf0\x9f\xab\x80" "heart"
+testf "\xe2\x9d\xa4\xef\xb8\x8f" "red"
+testf "\xf0\x9f\xa4\x8e" "brown"
+testf "\xf0\x9f\xa4\x8d" "white"
+testf "\xf0\x9f\x96\xa4" "black"
+testf "\xf0\x9f\xa7\xa1" "orange"
+testf "\xe2\x99\xa5\xef\xb8\x8f" "red suit"
+
+testf "\xf0\x9f\x92\x94" "broken heart"
+testf "\xe2\x9d\xa4\xef\xb8\x8f\xe2\x80\x8d\xf0\x9f\xa9\xb9" "mending heart"
+testf "\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbb\xe2\x80\x8d\xe2\x9d\xa4\xef\xb8
+\x8f\xe2\x80\x8d\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbc" "couple with heart"
+testf "\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbb\xe2\x80\x8d\xe2\x9d\xa4\xef\xb8\x8f\xe2\x80\x8d\xf0\x9f\xa7\x91\xf0\x9f\x8f\xbf" "couple with heart, light and dark skin tone"
+
+testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbf" "dark"
+testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbe" "medium dark"
+testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbd" "medium"
+testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbc" "medium light"
+testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbb" "light"
+testf "\xf0\x9f\xab\xb6" "neutral"
+
echo "Uniqueness of keys?"
crazy_keys="$(_getfattr --absolute-names -d "${testfile}" | grep -E -c '(french_|chinese_|greek_|arabic_|urk)')"
expected_keys=11