From: Zorro Lang <zlang@kernel.org>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH 1/2] generic/45[34]: add detection of confusable variation sequences
Date: Fri, 8 May 2026 15:07:13 +0800
Message-ID: <af2J30oIlwumD7wV@zlang-mailbox>
In-Reply-To: <177819254775.3505531.17842420789857268686.stgit@frogsfrogsfrogs>
On Thu, May 07, 2026 at 03:23:19PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Ars Technica recently wrote about a GitHub supply chain attack wherein
> non-rendering Unicode sequences were embedded in JavaScript files to
> hide payloads that could be decrypted trivially later. While these are
> unlikely to appear in file and attribute names, xfs_scrub will warn about
> this sort of steganography, so let's make sure it works.
>
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
Makes sense to me. I saw your xfsprogs patch:

    commit 95329f9fa13040962c5a2a5e91a29ba215eb341f
    Author: Darrick J. Wong <djwong@kernel.org>
    Date:   Mon Apr 13 07:57:00 2026 -0700

        xfs_scrub: warn about unicode variation selectors in names

Maybe we could mention that fix in the test case or in the commit log?

Reviewed-by: Zorro Lang <zlang@kernel.org>
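
For readers decoding the fully hidden "(Hi)" name in the patch: each
printable ASCII character c has an invisible twin in the Unicode tag
block at U+E0000 + c, and that code point's 4-byte UTF-8 form is
F3 A0 (80 | c >> 6) (80 | c & 3F). The sketch below regenerates the
printf-style escapes the test passes to setf/testf. The function name
ascii_to_tag_escapes is hypothetical and not part of fstests; it only
handles printable ASCII input.

```shell
#!/bin/bash
# Hypothetical helper, not part of fstests: print the printf-style escape
# string for the Unicode tag-character twin of a printable ASCII string.
# Each byte c becomes U+E0000 + c, encoded as F3 A0 (80|c>>6) (80|c&3F).
ascii_to_tag_escapes() {
	local s="$1" out="" esc c i
	for ((i = 0; i < ${#s}; i++)); do
		# A leading quote makes printf emit the character's numeric value.
		printf -v c '%d' "'${s:i:1}"
		# '\\' in a printf format yields one literal backslash.
		printf -v esc '\\xf3\\xa0\\x%02x\\x%02x' \
			$((0x80 | (c >> 6))) $((0x80 | (c & 0x3f)))
		out+=$esc
	done
	printf '%s\n' "$out"
}

# Same escape string as the "(Hi)" entry added to generic/453 and 454.
ascii_to_tag_escapes "(Hi)"
```

Likewise, `ascii_to_tag_escapes m` prints `\xf3\xa0\x81\xad`, the byte
run spliced into the u1 name "tags_m...oocow.txt".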
> tests/generic/453 | 35 +++++++++++++++++++++++++++++++++++
> tests/generic/454 | 36 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 71 insertions(+)
>
>
> diff --git a/tests/generic/453 b/tests/generic/453
> index bd5ce8b2bb11d9..0193b010306c48 100755
> --- a/tests/generic/453
> +++ b/tests/generic/453
> @@ -233,6 +233,20 @@ setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbc" "medium light"
> setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbb" "light"
> setf "\xf0\x9f\xab\xb6" "neutral"
>
> +# confusion with variation selectors
> +setf "variations.txt" v0
> +setf "varia\xef\xb8\x80tions.txt" v1
> +setf "\xef\xb8\x80variations.txt" v2
> +setf "vari\xef\xb8\x80\xef\xb8\x81ations.txt" v3
> +setf "varia\xf3\xa0\x87\xa4tions.txt" v4
> +
> +# deprecated tags are considered control characters
> +setf "tags_moocow.txt" u0
> +setf "tags_m\xf3\xa0\x81\xadoocow.txt" u1
> +
> +# totally hidden name? "(Hi)" is the file name
> +setf "\xf3\xa0\x80\xa8\xf3\xa0\x81\x88\xf3\xa0\x81\xa9\xf3\xa0\x80\xa9" "(Hi)"
> +
> ls -laR $testdir >> $seqres.full
>
> echo "Test files"
> @@ -331,6 +345,20 @@ testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbc" "medium light"
> testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbb" "light"
> testf "\xf0\x9f\xab\xb6" "neutral"
>
> +# confusion with variation selectors
> +testf "variations.txt" v0
> +testf "varia\xef\xb8\x80tions.txt" v1
> +testf "\xef\xb8\x80variations.txt" v2
> +testf "vari\xef\xb8\x80\xef\xb8\x81ations.txt" v3
> +testf "varia\xf3\xa0\x87\xa4tions.txt" v4
> +
> +# deprecated tags are considered control characters
> +testf "tags_moocow.txt" u0
> +testf "tags_m\xf3\xa0\x81\xadoocow.txt" u1
> +
> +# totally hidden name? "(Hi)" is the file name
> +testf "\xf3\xa0\x80\xa8\xf3\xa0\x81\x88\xf3\xa0\x81\xa9\xf3\xa0\x80\xa9" "(Hi)"
> +
> echo "Uniqueness of inodes?"
> stat -c '%i' "${testdir}/"* | sort | uniq -c | while read nr inum; do
> if [ "${nr}" -gt 1 ]; then
> @@ -368,6 +396,13 @@ if _check_xfs_scrub_does_unicode "$SCRATCH_MNT" "$SCRATCH_DEV"; then
> grep -q "llamapirate" $tmp.scrub || echo "No complaints about hidden llm instructions in filenames?"
> fi
>
> + if grep -q "variations" $tmp.scrub; then
> + grep -q 'varia.xef.xb8' $tmp.scrub || echo "No complaints about variation sequence confusion?"
> + grep -q 'varia.xf3.xa0' $tmp.scrub || echo "No complaints about extended variation sequence confusion?"
> + grep -q 'x80variations' $tmp.scrub || echo "No complaints about variations starting a name?"
> + grep -q 'tags_m.xf3.xa0.x81' $tmp.scrub || echo "No complaints about deprecated unicode tags in a name?"
> + fi
> +
> echo "Actual xfs_scrub output:" >> $seqres.full
> cat $tmp.scrub >> $seqres.full
> fi
> diff --git a/tests/generic/454 b/tests/generic/454
> index 9f6ddb4a0e48b2..3454cae5d5ea6c 100755
> --- a/tests/generic/454
> +++ b/tests/generic/454
> @@ -154,6 +154,20 @@ setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbc" "medium light"
> setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbb" "light"
> setf "\xf0\x9f\xab\xb6" "neutral"
>
> +# confusion with variation selectors
> +setf "variations.txt" v0
> +setf "varia\xef\xb8\x80tions.txt" v1
> +setf "\xef\xb8\x80variations.txt" v2
> +setf "vari\xef\xb8\x80\xef\xb8\x81ations.txt" v3
> +setf "varia\xf3\xa0\x87\xa4tions.txt" v4
> +
> +# deprecated tags are considered control characters
> +setf "tags_moocow.txt" u0
> +setf "tags_m\xf3\xa0\x81\xadoocow.txt" u1
> +
> +# totally hidden name? "(Hi)" is the file name
> +setf "\xf3\xa0\x80\xa8\xf3\xa0\x81\x88\xf3\xa0\x81\xa9\xf3\xa0\x80\xa9" "(Hi)"
> +
> _getfattr --absolute-names -d "${testfile}" >> $seqres.full
>
> echo "Test files"
> @@ -229,6 +243,20 @@ testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbc" "medium light"
> testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbb" "light"
> testf "\xf0\x9f\xab\xb6" "neutral"
>
> +# confusion with variation selectors
> +testf "variations.txt" v0
> +testf "varia\xef\xb8\x80tions.txt" v1
> +testf "\xef\xb8\x80variations.txt" v2
> +testf "vari\xef\xb8\x80\xef\xb8\x81ations.txt" v3
> +testf "varia\xf3\xa0\x87\xa4tions.txt" v4
> +
> +# deprecated tags are considered control characters
> +testf "tags_moocow.txt" u0
> +testf "tags_m\xf3\xa0\x81\xadoocow.txt" u1
> +
> +# totally hidden name? "(Hi)" is the file name
> +testf "\xf3\xa0\x80\xa8\xf3\xa0\x81\x88\xf3\xa0\x81\xa9\xf3\xa0\x80\xa9" "(Hi)"
> +
> echo "Uniqueness of keys?"
> crazy_keys="$(_getfattr --absolute-names -d "${testfile}" | grep -E -c '(french_|chinese_|greek_|arabic_|urk)')"
> expected_keys=11
> @@ -249,6 +277,14 @@ if _check_xfs_scrub_does_unicode "$SCRATCH_MNT" "$SCRATCH_DEV"; then
> grep -q "prohibition_" $tmp.scrub || echo "No complaints about prohibited sequence confusables?"
> grep -q "zerojoin_" $tmp.scrub || echo "No complaints about zero-width join confusables?"
> grep -q "llamapirate" $tmp.scrub || echo "No complaints about hidden llm instructions in filenames?"
> +
> + if grep -q "variations" $tmp.scrub; then
> + grep -q 'varia.xef.xb8' $tmp.scrub || echo "No complaints about variation sequence confusion?"
> + grep -q 'varia.xf3.xa0' $tmp.scrub || echo "No complaints about extended variation sequence confusion?"
> + grep -q 'x80variations' $tmp.scrub || echo "No complaints about variations starting a name?"
> + grep -q 'tags_m.xf3.xa0.x81' $tmp.scrub || echo "No complaints about deprecated unicode tags in a name?"
> + fi
> +
> echo "Actual xfs_scrub output:" >> $seqres.full
> echo "${output}" >> $seqres.full
> fi
>
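
To double-check which invisible code point each new name carries, the
escapes can be decoded back into code points and matched against the
relevant Unicode ranges: U+FE00-FE0F (variation selectors VS1-VS16),
U+E0100-E01EF (variation selectors supplement, VS17-VS256) and
U+E0000-E007F (tag characters). This is a sketch only; report_invisible
is a hypothetical name, not something in fstests or xfs_scrub, and it
assumes well-formed UTF-8, which holds for every name in this patch.

```shell
#!/bin/bash
# Hypothetical helper, not part of fstests or xfs_scrub: decode a
# printf-style escaped name (as passed to setf/testf) into code points
# and report the invisible ones the new test cases exercise.
report_invisible() {
	local -a b
	local i=0 cp n
	# One unsigned decimal value per byte of the raw name.
	b=($(printf "$1" | od -An -v -tu1))
	while ((i < ${#b[@]})); do
		if ((b[i] < 0x80)); then
			cp=${b[i]}; n=1
		elif ((b[i] >= 0xf0)); then
			cp=$(( (b[i] & 0x07) << 18 | (b[i+1] & 0x3f) << 12 | (b[i+2] & 0x3f) << 6 | (b[i+3] & 0x3f) )); n=4
		elif ((b[i] >= 0xe0)); then
			cp=$(( (b[i] & 0x0f) << 12 | (b[i+1] & 0x3f) << 6 | (b[i+2] & 0x3f) )); n=3
		else
			cp=$(( (b[i] & 0x1f) << 6 | (b[i+1] & 0x3f) )); n=2
		fi
		if ((cp >= 0xfe00 && cp <= 0xfe0f)); then
			printf 'U+%04X variation selector\n' "$cp"
		elif ((cp >= 0xe0100 && cp <= 0xe01ef)); then
			printf 'U+%04X variation selector supplement\n' "$cp"
		elif ((cp >= 0xe0000 && cp <= 0xe007f)); then
			printf 'U+%04X tag character\n' "$cp"
		fi
		((i += n))
	done
}

report_invisible 'varia\xef\xb8\x80tions.txt'
report_invisible 'varia\xf3\xa0\x87\xa4tions.txt'
report_invisible 'tags_m\xf3\xa0\x81\xadoocow.txt'
```

Run on those three names, this reports U+FE00 (VS1) for v1, U+E01E4
(VS229) for v4 and U+E006D (tag "m") for u1; the plain v0 and u0 names
produce no output.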
Thread overview: 5+ messages
2026-05-07 22:23 [PATCHSET] fstests: catch up with xfsprogs 7.0 Darrick J. Wong
2026-05-07 22:23 ` [PATCH 1/2] generic/45[34]: add detection of confusable variation sequences Darrick J. Wong
2026-05-08 7:07 ` Zorro Lang [this message]
2026-05-07 22:23 ` [PATCH 2/2] generic/45[34]: don't warn on mixed bidirectional characters Darrick J. Wong
2026-05-08 7:22 ` Zorro Lang