Linux XFS filesystem development
From: Zorro Lang <zlang@kernel.org>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH 1/2] generic/45[34]: add detection of confusable variation sequences
Date: Fri, 8 May 2026 15:07:13 +0800
Message-ID: <af2J30oIlwumD7wV@zlang-mailbox>
In-Reply-To: <177819254775.3505531.17842420789857268686.stgit@frogsfrogsfrogs>

On Thu, May 07, 2026 at 03:23:19PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> ArsTechnica recently wrote about a GitHub supply chain attack wherein
> non-rendering unicode sequences were embedded in javascript files to
> hide payloads that could be decrypted trivially later.  While these are
> unlikely to appear in file and attribute names, xfs_scrub will warn about
> this sort of steganography, so let's make sure it works.
> 
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---

Makes sense to me. I saw your patch:

  commit 95329f9fa13040962c5a2a5e91a29ba215eb341f
  Author: Darrick J. Wong <djwong@kernel.org>
  Date:   Mon Apr 13 07:57:00 2026 -0700

      xfs_scrub: warn about unicode variation selectors in names

Maybe we can mention the fix in the test case or in the commit log?

Reviewed-by: Zorro Lang <zlang@kernel.org>

>  tests/generic/453 |   35 +++++++++++++++++++++++++++++++++++
>  tests/generic/454 |   36 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 71 insertions(+)
> 
> 
> diff --git a/tests/generic/453 b/tests/generic/453
> index bd5ce8b2bb11d9..0193b010306c48 100755
> --- a/tests/generic/453
> +++ b/tests/generic/453
> @@ -233,6 +233,20 @@ setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbc" "medium light"
>  setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbb" "light"
>  setf "\xf0\x9f\xab\xb6" "neutral"
>  
> +# confusion with variation selectors
> +setf "variations.txt" v0
> +setf "varia\xef\xb8\x80tions.txt" v1
> +setf "\xef\xb8\x80variations.txt" v2
> +setf "vari\xef\xb8\x80\xef\xb8\x81ations.txt" v3
> +setf "varia\xf3\xa0\x87\xa4tions.txt" v4
> +
> +# deprecated tags are considered control characters
> +setf "tags_moocow.txt" u0
> +setf "tags_m\xf3\xa0\x81\xadoocow.txt" u1
> +
> +# totally hidden name? "(Hi)" is the file name
> +setf "\xf3\xa0\x80\xa8\xf3\xa0\x81\x88\xf3\xa0\x81\xa9\xf3\xa0\x80\xa9" "(Hi)"
> +
>  ls -laR $testdir >> $seqres.full
>  
>  echo "Test files"
> @@ -331,6 +345,20 @@ testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbc" "medium light"
>  testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbb" "light"
>  testf "\xf0\x9f\xab\xb6" "neutral"
>  
> +# confusion with variation selectors
> +testf "variations.txt" v0
> +testf "varia\xef\xb8\x80tions.txt" v1
> +testf "\xef\xb8\x80variations.txt" v2
> +testf "vari\xef\xb8\x80\xef\xb8\x81ations.txt" v3
> +testf "varia\xf3\xa0\x87\xa4tions.txt" v4
> +
> +# deprecated tags are considered control characters
> +testf "tags_moocow.txt" u0
> +testf "tags_m\xf3\xa0\x81\xadoocow.txt" u1
> +
> +# totally hidden name? "(Hi)" is the file name
> +testf "\xf3\xa0\x80\xa8\xf3\xa0\x81\x88\xf3\xa0\x81\xa9\xf3\xa0\x80\xa9" "(Hi)"
> +
>  echo "Uniqueness of inodes?"
>  stat -c '%i' "${testdir}/"* | sort | uniq -c | while read nr inum; do
>  	if [ "${nr}" -gt 1 ]; then
> @@ -368,6 +396,13 @@ if _check_xfs_scrub_does_unicode "$SCRATCH_MNT" "$SCRATCH_DEV"; then
>  		grep -q "llamapirate" $tmp.scrub || echo "No complaints about hidden llm instructions in filenames?"
>  	fi
>  
> +	if grep -q "variations" $tmp.scrub; then
> +		grep -q 'varia.xef.xb8' $tmp.scrub || echo "No complaints about variation sequence confusion?"
> +		grep -q 'varia.xf3.xa0' $tmp.scrub || echo "No complaints about extended variation sequence confusion?"
> +		grep -q 'x80variations' $tmp.scrub || echo "No complaints about variations starting a name?"
> +		grep -q 'tags_m.xf3.xa0.x81' $tmp.scrub || echo "No complaints about deprecated unicode tags in a name?"
> +	fi
> +
>  	echo "Actual xfs_scrub output:" >> $seqres.full
>  	cat $tmp.scrub >> $seqres.full
>  fi
> diff --git a/tests/generic/454 b/tests/generic/454
> index 9f6ddb4a0e48b2..3454cae5d5ea6c 100755
> --- a/tests/generic/454
> +++ b/tests/generic/454
> @@ -154,6 +154,20 @@ setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbc" "medium light"
>  setf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbb" "light"
>  setf "\xf0\x9f\xab\xb6" "neutral"
>  
> +# confusion with variation selectors
> +setf "variations.txt" v0
> +setf "varia\xef\xb8\x80tions.txt" v1
> +setf "\xef\xb8\x80variations.txt" v2
> +setf "vari\xef\xb8\x80\xef\xb8\x81ations.txt" v3
> +setf "varia\xf3\xa0\x87\xa4tions.txt" v4
> +
> +# deprecated tags are considered control characters
> +setf "tags_moocow.txt" u0
> +setf "tags_m\xf3\xa0\x81\xadoocow.txt" u1
> +
> +# totally hidden name? "(Hi)" is the file name
> +setf "\xf3\xa0\x80\xa8\xf3\xa0\x81\x88\xf3\xa0\x81\xa9\xf3\xa0\x80\xa9" "(Hi)"
> +
>  _getfattr --absolute-names -d "${testfile}" >> $seqres.full
>  
>  echo "Test files"
> @@ -229,6 +243,20 @@ testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbc" "medium light"
>  testf "\xf0\x9f\xab\xb6\xf0\x9f\x8f\xbb" "light"
>  testf "\xf0\x9f\xab\xb6" "neutral"
>  
> +# confusion with variation selectors
> +testf "variations.txt" v0
> +testf "varia\xef\xb8\x80tions.txt" v1
> +testf "\xef\xb8\x80variations.txt" v2
> +testf "vari\xef\xb8\x80\xef\xb8\x81ations.txt" v3
> +testf "varia\xf3\xa0\x87\xa4tions.txt" v4
> +
> +# deprecated tags are considered control characters
> +testf "tags_moocow.txt" u0
> +testf "tags_m\xf3\xa0\x81\xadoocow.txt" u1
> +
> +# totally hidden name? "(Hi)" is the file name
> +testf "\xf3\xa0\x80\xa8\xf3\xa0\x81\x88\xf3\xa0\x81\xa9\xf3\xa0\x80\xa9" "(Hi)"
> +
>  echo "Uniqueness of keys?"
>  crazy_keys="$(_getfattr --absolute-names -d "${testfile}" | grep -E -c '(french_|chinese_|greek_|arabic_|urk)')"
>  expected_keys=11
> @@ -249,6 +277,14 @@ if _check_xfs_scrub_does_unicode "$SCRATCH_MNT" "$SCRATCH_DEV"; then
>  	grep -q "prohibition_" $tmp.scrub || echo "No complaints about prohibited sequence confusables?"
>  	grep -q "zerojoin_" $tmp.scrub || echo "No complaints about zero-width join confusables?"
>  	grep -q "llamapirate" $tmp.scrub || echo "No complaints about hidden llm instructions in filenames?"
> +
> +	if grep -q "variations" $tmp.scrub; then
> +		grep -q 'varia.xef.xb8' $tmp.scrub || echo "No complaints about variation sequence confusion?"
> +		grep -q 'varia.xf3.xa0' $tmp.scrub || echo "No complaints about extended variation sequence confusion?"
> +		grep -q 'x80variations' $tmp.scrub || echo "No complaints about variations starting a name?"
> +		grep -q 'tags_m.xf3.xa0.x81' $tmp.scrub || echo "No complaints about deprecated unicode tags in a name?"
> +	fi
> +
>  	echo "Actual xfs_scrub output:" >> $seqres.full
>  	echo "${output}" >> $seqres.full
>  fi
> 
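For anyone reading the byte escapes in the new setf/testf lines, here is a
minimal sketch of how they decode. The helpers nonascii_codepoints() and
untag() are hypothetical names made up for this note; they are not part of
fstests or xfs_scrub, and the decoder assumes the input is valid UTF-8.

```shell
#!/bin/bash
# Print the non-ASCII code points found in a printf-style escaped name.
# Hypothetical helper for illustration; assumes well-formed UTF-8 input.
nonascii_codepoints() {
	local b cp=0 need=0 out=""
	# Expand the \xNN escapes into raw bytes, then dump them as hex.
	for b in $(printf '%b' "$1" | od -An -v -tx1); do
		b=$((16#$b))
		if (( need > 0 )); then
			# Continuation byte: append its low six bits.
			cp=$(( (cp << 6) | (b & 0x3f) ))
			need=$((need - 1))
		elif (( b < 0x80 )); then
			cp=$b
		elif (( b >= 0xf0 )); then
			cp=$((b & 0x07)); need=3
		elif (( b >= 0xe0 )); then
			cp=$((b & 0x0f)); need=2
		else
			cp=$((b & 0x1f)); need=1
		fi
		if (( need == 0 && cp >= 0x80 )); then
			out+="${out:+ }$(printf 'U+%04X' "$cp")"
		fi
	done
	echo "$out"
}

# Map Unicode tag characters (U+E0020..U+E007E) back to the ASCII
# characters they shadow, by subtracting 0xE0000.
untag() {
	local cp out=""
	for cp in $(nonascii_codepoints "$1"); do
		cp=$((16#${cp#U+}))
		if (( cp >= 0xE0020 && cp <= 0xE007E )); then
			out+=$(printf '%b' "\\x$(printf '%02x' $((cp - 0xE0000)))")
		fi
	done
	echo "$out"
}

nonascii_codepoints 'varia\xef\xb8\x80tions.txt'      # U+FE00
nonascii_codepoints 'varia\xf3\xa0\x87\xa4tions.txt'  # U+E01E4
nonascii_codepoints 'tags_m\xf3\xa0\x81\xadoocow.txt' # U+E006D
untag '\xf3\xa0\x80\xa8\xf3\xa0\x81\x88\xf3\xa0\x81\xa9\xf3\xa0\x80\xa9'  # (Hi)
```

U+FE00 falls in the variation selector block, U+E01E4 in the variation
selectors supplement, and U+E006D in the tags block, which is why the last
name is made up entirely of non-rendering characters that spell "(Hi)".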

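A side note on the grep patterns such as 'varia.xef.xb8': they appear to
rely on the scrub output containing the name with its non-printing bytes
written out as literal "\xNN" text, with "." standing in for the backslash.
That is an assumption drawn from the patterns themselves; the sample string
below is made up for illustration and is not real xfs_scrub output.

```shell
#!/bin/bash
# Made-up sample containing a literal backslash-escaped name.
# This is NOT actual xfs_scrub output; the real message format may differ.
sample='"varia\xef\xb8\x80tions.txt"'

# "." matches any single character, so it also matches the literal backslash.
if printf '%s\n' "$sample" | grep -q 'varia.xef.xb8'; then
	echo "dot pattern matched"
fi

# A fixed-string search (-F) matches the backslashes exactly instead.
if printf '%s\n' "$sample" | grep -qF 'varia\xef\xb8'; then
	echo "fixed-string pattern matched"
fi

# The dot form is looser: it would also accept some other character there.
if printf '%s\n' '"variaZxefZxb8"' | grep -q 'varia.xef.xb8'; then
	echo "dot pattern also matched a non-backslash character"
fi
```

The looser dot form is probably fine for these tests, since no other file
name in the test set comes close to matching.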
Thread overview: 5+ messages
2026-05-07 22:23 [PATCHSET] fstests: catch up with xfsprogs 7.0 Darrick J. Wong
2026-05-07 22:23 ` [PATCH 1/2] generic/45[34]: add detection of confusable variation sequences Darrick J. Wong
2026-05-08  7:07   ` Zorro Lang [this message]
2026-05-07 22:23 ` [PATCH 2/2] generic/45[34]: don't warn on mixed bidirectional characters Darrick J. Wong
2026-05-08  7:22   ` Zorro Lang
