From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: eguan@redhat.com, linux-xfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH 4/5] generic/45[34]: force UTF-8 codeset to enable utf-8 namer checks in xfs_scrub
Date: Fri, 20 Oct 2017 10:56:48 -0700
Message-ID: <20171020175648.GA4741@magnolia>
In-Reply-To: <20171019071842.GA28970@infradead.org>
On Thu, Oct 19, 2017 at 12:18:42AM -0700, Christoph Hellwig wrote:
> On Wed, Oct 18, 2017 at 04:37:55PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > The upcoming xfs_scrub tool will have the ability to warn about
> > suspicious UTF-8 normalization collisions. We want generic/45[34] to be
> > able to test this functionality, but to do that we have to forcibly set
> > the codeset to UTF-8 via LC_ALL since the rest of xfstests only uses
> > LC_ALL=C.
>
> Wait. Where do you want to validate UTF-8 normalization? There is
> absolutely no guarantee that someone uses UTF-8, so any reliance on
> the character set in the file system is bogus.

I'll start by summarizing the problem statement[1]. In XFS (and nearly
all other filesystems), neither the on-disk format nor the kernel
driver cares about the contents of file names or attribute names; both
treat them as arbitrary byte sequences. Userspace can set whatever
localization and encoding parameters it wants, and the kernel doesn't
care except for '\0' and '/'. That doesn't change.

In modern Linux userspace, however, we /do/ care about being able to
encode Unicode codepoints into byte streams, so we encode them in UTF-8.
Because Unicode has more than one normalization form (NFC and NFD, for
example), this leads to the funny situation where two distinct filename
byte sequences can render the same but point to totally different files:

$ echo NFC > "$(echo -e "french_caf\xc3\xa9.txt")"
$ echo NFD > "$(echo -e "french_cafe\xcc\x81.txt")"
$ ls -lai
133 -rw-r--r-- 1 root root 4 Oct 20 10:40 french_café.txt
132 -rw-r--r-- 1 root root 4 Oct 20 10:40 french_café.txt
$ echo $LANG
en_US.UTF-8

At least on my computer, the two filenames render identically yet point
to different inodes. This could be used to mislead people into opening
a malicious file whose name appears identical to a legitimate file.
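
To make the byte-level difference visible, here is a minimal POSIX sh sketch. It assumes the scratch directory lives on a filesystem that stores names as raw bytes (XFS, ext4 or tmpfs on Linux, for instance); the variable names and the use of mktemp are only illustrative.

```shell
#!/bin/sh
# Sketch: the same rendered name spelled two ways (octal escapes for portability).
nfc=$(printf 'french_caf\303\251.txt')    # U+00E9, precomposed e-acute
nfd=$(printf 'french_cafe\314\201.txt')   # 'e' followed by U+0301 combining acute

# Count the bytes in each name; tr strips padding that some wc builds emit.
nfc_len=$(printf '%s' "$nfc" | wc -c | tr -d ' ')
nfd_len=$(printf '%s' "$nfd" | wc -c | tr -d ' ')
echo "NFC name: $nfc_len bytes, NFD name: $nfd_len bytes"

# Create both files in a scratch directory and count the directory entries.
# Assumption: the filesystem does not normalize names on its own.
dir=$(mktemp -d) || exit 1
echo NFC > "$dir/$nfc"
echo NFD > "$dir/$nfd"
entries=$(ls -A "$dir" | wc -l | tr -d ' ')
echo "directory entries: $entries"
rm -rf "$dir"
```

On a byte-transparent filesystem this should report two directory entries; a filesystem that normalizes names itself may report only one.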

xfs_scrub is the (proposed) userspace component of XFS online fsck. The
first four phases simply call the in-kernel fsck code and pass status
back, but the fifth phase walks the directory tree looking for problems.
If xfs_scrub was built with libunistring and the LC_MESSAGES string
contains "UTF-8", phase 5 will warn if it finds multiple filenames in a
directory that normalize to the same string but point to different
inodes. Similarly, it will warn about colliding attribute names.
Warnings in xfs_scrub are for situations that warrant administrative
review but are not filesystem corruptions.
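
The locale condition can be approximated in shell to show why a test suite running under LC_ALL=C never reaches the UTF-8 check. This is only a sketch: the helper name utf8_namer_status is made up, and it models the condition as a plain substring test on the effective LC_MESSAGES value, resolved with the usual LC_ALL, LC_MESSAGES, LANG precedence. What xfs_scrub does internally may differ.

```shell
#!/bin/sh
# Hypothetical helper: given LC_ALL, LC_MESSAGES and LANG values, report
# whether a "locale string contains UTF-8" gate would pass.
utf8_namer_status() {
    # $1 = LC_ALL, $2 = LC_MESSAGES, $3 = LANG
    # Standard precedence: LC_ALL overrides LC_MESSAGES, which overrides LANG.
    effective=${1:-${2:-${3:-C}}}
    case "$effective" in
        *UTF-8*) echo enabled ;;
        *)       echo disabled ;;
    esac
}

# LC_ALL=C hides a UTF-8 LANG, so the check would stay off.
utf8_namer_status "C" "" "en_US.UTF-8"

# Forcing a UTF-8 codeset through LC_ALL turns it back on.
utf8_namer_status "en_US.UTF-8" "" ""

# Status for the current environment.
utf8_namer_status "${LC_ALL:-}" "${LC_MESSAGES:-}" "${LANG:-}"
```

The first two calls print "disabled" and "enabled" respectively, which is the motivation for overriding LC_ALL in generic/45[34].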

IOW, if userspace is configured for UTF-8, the userspace part of online
fsck will flag suspicious-looking uses of Unicode for admin review. The
kernel remains uninvolved.

--D

[1] https://eclecticlight.co/2017/04/06/apfs-is-currently-unusable-with-most-non-english-languages/