From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
linux-xfs@vger.kernel.org, david@fromorbit.com
Subject: Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
Date: Wed, 5 Apr 2023 08:40:22 -0700
Message-ID: <20230405154022.GF303486@frogsfrogsfrogs>
In-Reply-To: <ZC0RaOeTFtCxFfhx@infradead.org>
On Tue, Apr 04, 2023 at 11:12:56PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 04, 2023 at 11:32:14AM -0700, Darrick J. Wong wrote:
> > Yeah, I get that. Fifteen years ago, Barry Naujok and Christoph merged
> > this weird ascii-ci feature for XFS that purportedly does ... something.
> > It clearly only works properly if you force userspace to use latin1,
> > which is totally nuts in 2023 given that the distros default to UTF8
> > and likely don't test anything else. It probably wasn't even a good
> > idea in *2008*, but it went in anyway. Nobody tested this feature,
> > metadump breaks with this feature enabled, but as maintainer I get to
> > maintain these poorly designed half baked projects.
>
> IIRC the idea was that it should do 7-bit ASCII only, so even accepting
> Latin 1 characters seems like a bug compared to what it was documented
> to do.
>
> > I wouldn't ever enable this feature on any computer I use, and I think
> > the unicode case-insensitive stuff that's been put in to ext4 and f2fs
> > lately are not a tarpit that I ever want to visit in XFS. Directory
> > names should be sequences of bytes that don't include nulls or slashes,
> > end of story.
>
> That works fine if all you care about is classic Linux / Unix users. And
> while I'd prefer if all the world was like that, the utf8 based CI
> has real use cases. Initially mostly for Samba file serving, but
> apparently Wine also really benefits from it, so some people have CI
> directories for that. XFS ignoring this means we are missing out on
> those users.
<shrug> Welllll... if someone presents a strong case for adopting the
utf8 casefolding feature that f2fs and ext4 added some ways back, I
could be persuaded to import that, bugs and all, into XFS. However,
given all the weird problems I've uncovered with "ascii"-ci, I'm going
to be very hardnosed about adding test cases and making sure /all/ the
tooling works properly.

I wasn't thrilled at all the "Handle invalid UTF8 sequence as either an
error or as an opaque byte sequence." that went into the ext4 code.
While I concede that it's the least-legacy-code-regressive solution to
people demanding to create non-utf8 filenames on a "utf8-casefold"
filesystem, it's just ... compromised.

Really it's "utf8 casefolded lookups if all the names you create are
valid utf8 byte sequences, and if you fail at that then we fall back to
memcmp(); also there's a strict-utf8 creat mode but you can't enable it".

Gross.
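
To make the fallback concrete, here is a rough userspace sketch of the
behavior described above. It is not the ext4 or f2fs code, and every
sketch_* name is made up for illustration. The fold step is a
deliberate stand-in that only lowercases ASCII A-Z; real Unicode
casefolding uses full fold tables and can change the length of a name,
which this sketch does not attempt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if name[0..len) is well-formed UTF-8. */
static bool sketch_utf8_valid(const char *name, size_t len)
{
	const unsigned char *s = (const unsigned char *)name;
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		unsigned int cp, min;
		size_t n, k;	/* n = continuation bytes expected */

		if (c < 0x80) {
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			n = 1; cp = c & 0x1F; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			n = 2; cp = c & 0x0F; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			n = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return false;	/* stray continuation or bad lead byte */
		}

		if (len - i <= n)
			return false;	/* sequence truncated */

		for (k = 1; k <= n; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}

		/* Reject overlong forms, surrogates, and out-of-range values. */
		if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return false;
		i += n + 1;
	}
	return true;
}

/* Stand-in fold: ASCII letters only. NOT real Unicode casefolding. */
static unsigned char sketch_fold(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/*
 * Case-insensitive match only when both names are valid UTF-8;
 * otherwise treat them as opaque bytes and compare exactly.
 */
static bool sketch_names_match(const char *a, size_t alen,
			       const char *b, size_t blen)
{
	size_t i;

	/* Safe only because the ASCII stand-in never changes lengths. */
	if (alen != blen)
		return false;

	if (!sketch_utf8_valid(a, alen) || !sketch_utf8_valid(b, blen))
		return memcmp(a, b, alen) == 0;

	for (i = 0; i < alen; i++)
		if (sketch_fold((unsigned char)a[i]) !=
		    sketch_fold((unsigned char)b[i]))
			return false;
	return true;
}
```

With this, "README" and "readme" match, but "A\xff" and "a\xff" do not,
because the invalid 0xff byte drops the whole comparison back to
memcmp(). That split personality is the part I find compromised.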
> The irony is all the utf8 infrastructure was developed for XFS use
> by SGI, never made it upstream back then and got picked up for ext4.
> And while it is objectively horrible, plugging into this actually
> working infrastructure would be the right thing for XFS instead
> of the whacky ASCII only mode only done as a stepping stone while
> the utf8 infrastructure got finished.
fsdevel, the gift that keeps on giving...
--D