public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET] xfsprogs: handle emoji filenames better
@ 2025-02-24 19:07 Darrick J. Wong
  2025-02-24 19:07 ` [PATCH 1/3] xfs_scrub: fix buffer overflow in string_escape Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Darrick J. Wong @ 2025-02-24 19:07 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: linux-xfs, hch, linux-xfs

Hi all,

Ted told me about some bugs that the ext4 Unicode casefolding code has
suffered over the past year -- they tried stripping out zero width
joiner (ZWJ) codepoints to try to eliminate casefolded lookup comparison
issues, but doing so corrupts compound emoji handling in filenames.

XFS of course persists names with byte accuracy (aka it doesn't do
casefolding or normalization) so it's not affected by those problems.
However, xfs_scrub has the ability to warn about confusing names and
other utf8 shenanigans so I decided to expand fstests.

I wired up Ted's confusing names into generic/453 in fstests and it
promptly crashed when trying to warn about filenames that consist
entirely of compound emoji (e.g. heart + zwj + bandaid render as a heart
with a bandaid over it).  So there's a patch to fix that buffer
overflow.  There's a second patch to avoid complaining about ZWJ unless
it results in confusing names in the same namespace.  The third patch
fixes a minor reporting problem when parent pointers are enabled.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-emoji-fixes
---
Commits in this patchset:
 * xfs_scrub: fix buffer overflow in string_escape
 * xfs_scrub: don't warn about zero width joiner control characters
 * xfs_scrub: use the display mountpoint for reporting file corruptions
---
 scrub/common.c   |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 scrub/unicrash.c |   10 ++++++++--
 2 files changed, 58 insertions(+), 4 deletions(-)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] xfs_scrub: fix buffer overflow in string_escape
  2025-02-24 19:07 [PATCHSET] xfsprogs: handle emoji filenames better Darrick J. Wong
@ 2025-02-24 19:07 ` Darrick J. Wong
  2025-02-24 22:28   ` Christoph Hellwig
  2025-02-24 19:07 ` [PATCH 2/3] xfs_scrub: don't warn about zero width joiner control characters Darrick J. Wong
  2025-02-24 19:07 ` [PATCH 3/3] xfs_scrub: use the display mountpoint for reporting file corruptions Darrick J. Wong
  2 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2025-02-24 19:07 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: linux-xfs, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Need to allocate one more byte for the null terminator, just in case the
/entire/ input string consists of non-printable bytes e.g. emoji.

Cc: <linux-xfs@vger.kernel.org> # v4.15.0
Fixes: 396cd0223598bb ("xfs_scrub: warn about suspicious characters in directory/xattr names")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
---
 scrub/common.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)


diff --git a/scrub/common.c b/scrub/common.c
index 6eb3c026dc5ac9..2b2d4a67bc47a2 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -320,7 +320,11 @@ string_escape(
 	char			*q;
 	int			x;
 
-	str = malloc(strlen(in) * 4);
+	/*
+	 * Each non-printing byte renders as a four-byte escape sequence, so
+	 * allocate 4x the input length, plus a byte for the null terminator.
+	 */
+	str = malloc(strlen(in) * 4 + 1);
 	if (!str)
 		return NULL;
 	for (p = in, q = str; *p != '\0'; p++) {


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] xfs_scrub: don't warn about zero width joiner control characters
  2025-02-24 19:07 [PATCHSET] xfsprogs: handle emoji filenames better Darrick J. Wong
  2025-02-24 19:07 ` [PATCH 1/3] xfs_scrub: fix buffer overflow in string_escape Darrick J. Wong
@ 2025-02-24 19:07 ` Darrick J. Wong
  2025-02-24 22:28   ` Christoph Hellwig
  2025-02-24 19:07 ` [PATCH 3/3] xfs_scrub: use the display mountpoint for reporting file corruptions Darrick J. Wong
  2 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2025-02-24 19:07 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: linux-xfs, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The Unicode code point for "zero width joiners" (aka 0x200D) is used to
hint to renderers that a sequence of simple code points should be
combined into a more complex rendering.  This is how compound emoji such
as "wounded heart" are composed out of "heart" and "bandaid"; and how
complex glyphs are rendered in Malayam.

Emoji in filenames are a supported usecase, so stop warning about the
mere existence of ZWJ.  We already warn about ZWJ that are used to
produce confusingly rendered names in a single namespace, so we're not
losing any robustness here.

Cc: <linux-xfs@vger.kernel.org> # v6.10.0
Fixes: d43362c78e3e37 ("xfs_scrub: store bad flags with the name entry")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/unicrash.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)


diff --git a/scrub/unicrash.c b/scrub/unicrash.c
index 143060b569f27c..b83bef644b6dce 100644
--- a/scrub/unicrash.c
+++ b/scrub/unicrash.c
@@ -508,8 +508,14 @@ name_entry_examine(
 		if (is_nonrendering(uchr))
 			ret |= UNICRASH_INVISIBLE;
 
-		/* control characters */
-		if (u_iscntrl(uchr))
+		/*
+		 * Warn about control characters in filenames except for zero
+		 * width joiners because those are used to construct compound
+		 * emoji and glyphs in various languages.  ZWJ is already
+		 * covered by UNICRASH_INVISIBLE, so we can detect its use in
+		 * confusing names.
+		 */
+		if (uchr != 0x200D && u_iscntrl(uchr))
 			ret |= UNICRASH_CONTROL_CHAR;
 
 		switch (u_charDirection(uchr)) {


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] xfs_scrub: use the display mountpoint for reporting file corruptions
  2025-02-24 19:07 [PATCHSET] xfsprogs: handle emoji filenames better Darrick J. Wong
  2025-02-24 19:07 ` [PATCH 1/3] xfs_scrub: fix buffer overflow in string_escape Darrick J. Wong
  2025-02-24 19:07 ` [PATCH 2/3] xfs_scrub: don't warn about zero width joiner control characters Darrick J. Wong
@ 2025-02-24 19:07 ` Darrick J. Wong
  2025-02-24 22:29   ` Christoph Hellwig
  2 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2025-02-24 19:07 UTC (permalink / raw)
  To: djwong, aalbersh; +Cc: linux-xfs, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In systemd service mode, we make systemd bind-mount the target
mountpoint onto /tmp/scrub (/tmp is private to the service) so that
updates to the global mountpoint in the shared mount namespace don't
propagate into our service container and vice versa, and pass the path
to the bind mount to xfs_scrub via -M.  This solves races such as
unmounting of the target mount point after service container creation
but before process invocation that result in the wrong filesystem being
scanned.

IOWs, to scrub /usr, systemd runs "xfs_scrub -M /tmp/scrub /usr".
Pretend that /usr is a separate filesystem.

However, when xfs_scrub snapshots the handle of /tmp/scrub, libhandle
remembers that /tmp/scrub the beginning of the path, not the pathname
that we want to use for reporting (/usr).  This means that
handle_to_path returns /tmp/scrub and not /usr as well, with the
unfortunate result that file corrupts are reported with the pathnames in
the xfs_scrub@ service container, not the global ones.

Put another way, xfs_scrub should complain that /usr/bin/X is corrupt,
not /tmp/scrub/bin/X.

Therefore, modify scrub_render_ino_descr to manipulate the path buffer
during error reporting so that the user always gets the mountpoint
passed in, even if someone tells us to use another path for the actual
open() call in phase 1.

Cc: <linux-xfs@vger.kernel.org> # v6.10.0
Fixes: 9a8b09762f9a52 ("xfs_scrub: use parent pointers when possible to report file operations")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 scrub/common.c |   46 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)


diff --git a/scrub/common.c b/scrub/common.c
index 2b2d4a67bc47a2..14cd677ba5e9c4 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -418,15 +418,59 @@ scrub_render_ino_descr(
 
 	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_PARENT) {
 		struct xfs_handle handle;
+		char		*pathbuf = buf;
+		size_t		used = 0;
 
 		handle_from_fshandle(&handle, ctx->fshandle, ctx->fshandle_len);
 		handle_from_inogen(&handle, ino, gen);
 
+		/*
+		 * @actual_mntpoint is the path we used to open the filesystem,
+		 * and @mntpoint is the path we use for display purposes.  If
+		 * these aren't the same string, then for reporting purposes
+		 * we must fix the start of the path string.  Start by copying
+		 * the display mountpoint into buf, except for trailing
+		 * slashes.  At this point buf will not be null-terminated.
+		 */
+		if (ctx->actual_mntpoint != ctx->mntpoint) {
+			used = strlen(ctx->mntpoint);
+			while (used && ctx->mntpoint[used - 1] == '/')
+				used--;
+
+			/* If it doesn't fit, report the handle instead. */
+			if (used >= buflen) {
+				used = 0;
+				goto report_inum;
+			}
+
+			memcpy(buf, ctx->mntpoint, used);
+			pathbuf += used;
+		}
+
 		ret = handle_to_path(&handle, sizeof(struct xfs_handle), 4096,
-				buf, buflen);
+				pathbuf, buflen - used);
 		if (ret)
 			goto report_inum;
 
+		/*
+		 * Now that handle_to_path formatted the full path (including
+		 * the actual mount point, stripped of any trailing slashes)
+		 * into the rest of pathbuf, slide down the contents by the
+		 * length of the actual mount point.  Don't count any trailing
+		 * slashes because handle_to_path uses libhandle, which strips
+		 * trailing slashes.  Copy one more byte to ensure we get the
+		 * terminating null.
+		 */
+		if (ctx->actual_mntpoint != ctx->mntpoint) {
+			size_t	len = strlen(ctx->actual_mntpoint);
+
+			while (len && ctx->actual_mntpoint[len - 1] == '/')
+				len--;
+
+			pathlen = strlen(pathbuf);
+			memmove(pathbuf, pathbuf + len, pathlen - len + 1);
+		}
+
 		/*
 		 * Leave at least 16 bytes for the description of what went
 		 * wrong.  If we can't do that, we'll use the inode number.


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] xfs_scrub: fix buffer overflow in string_escape
  2025-02-24 19:07 ` [PATCH 1/3] xfs_scrub: fix buffer overflow in string_escape Darrick J. Wong
@ 2025-02-24 22:28   ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2025-02-24 22:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs, hch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/3] xfs_scrub: don't warn about zero width joiner control characters
  2025-02-24 19:07 ` [PATCH 2/3] xfs_scrub: don't warn about zero width joiner control characters Darrick J. Wong
@ 2025-02-24 22:28   ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2025-02-24 22:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] xfs_scrub: use the display mountpoint for reporting file corruptions
  2025-02-24 19:07 ` [PATCH 3/3] xfs_scrub: use the display mountpoint for reporting file corruptions Darrick J. Wong
@ 2025-02-24 22:29   ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2025-02-24 22:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: aalbersh, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-02-24 22:29 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-24 19:07 [PATCHSET] xfsprogs: handle emoji filenames better Darrick J. Wong
2025-02-24 19:07 ` [PATCH 1/3] xfs_scrub: fix buffer overflow in string_escape Darrick J. Wong
2025-02-24 22:28   ` Christoph Hellwig
2025-02-24 19:07 ` [PATCH 2/3] xfs_scrub: don't warn about zero width joiner control characters Darrick J. Wong
2025-02-24 22:28   ` Christoph Hellwig
2025-02-24 19:07 ` [PATCH 3/3] xfs_scrub: use the display mountpoint for reporting file corruptions Darrick J. Wong
2025-02-24 22:29   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox