From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
Date: Wed, 5 Apr 2023 09:30:32 +1000
Message-ID: <20230404233032.GL3223426@dread.disaster.area>
In-Reply-To: <20230404183214.GG109974@frogsfrogsfrogs>
On Tue, Apr 04, 2023 at 11:32:14AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 04, 2023 at 10:54:27AM -0700, Linus Torvalds wrote:
> > On Tue, Apr 4, 2023 at 10:07 AM Darrick J. Wong <djwong@kernel.org> wrote:
> > >
> > > + if (c >= 0xc0 && c <= 0xd6) /* latin A-O with accents */
> > > + return true;
> > > + if (c >= 0xd8 && c <= 0xde) /* latin O-Y with accents */
> > > + return true;
> >
> > Please don't do this.
> >
> > We're not in the dark ages any more. We don't do crazy locale-specific
> > crud. There is no such thing as "latin1" any more in any valid model.
> >
> > For example, it is true that 0xC4 is 'Ä' in Latin1, and that the
> > lower-case version is 'ä', and you can do the lower-casing exactly the
> > same way as you do for US-ASCII: you just set bit 5 (or "add 32" or
> > "subtract 0xE0" - the latter is what you seem to do, crazy as it is).
> >
> > So the above was fine back in the 80s, and questionably correct in the
> > 90s, but it is COMPLETE GARBAGE to do this in the year 2023.
>
> Yeah, I get that. Fifteen years ago, Barry Naujok and Christoph merged
> this weird ascii-ci feature for XFS that purportedly does ... something.
> It clearly only works properly if you force userspace to use latin1,
> which is totally nuts in 2023 given that the distros default to UTF8
> and likely don't test anything else. It probably wasn't even a good
> idea in *2008*, but it went in anyway. Nobody tested this feature,
> metadump breaks with this feature enabled, but as maintainer I get to
> maintain these poorly designed half baked projects.
It was written specifically for an NFS/CIFS fileserver appliance
product and Samba needed filesystem-side CI to be able to perform
even vaguely well on industry-standard fileserver benchmarketing
workloads that were all the rage at the time.
Because of the inherent problems with CI and UTF-8 encoding, etc, it
was decided that Samba would be configured to export latin1
encodings as that covered >90% of the target markets for the
product. Hence the "ascii-ci" casefolding code could be encoded into
the XFS directory operations and remove all the overhead of
casefolding from Samba.
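
For readers unfamiliar with what filesystem-side folding of latin1 names
involves, here is a minimal sketch. It is not the XFS implementation:
the names latin1_tolower(), rol32() and ci_name_hash() are hypothetical,
the byte ranges are taken from the patch hunk quoted above, and the
rotate-and-xor hash is only an assumed stand-in for a directory name
hash.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fold one latin1 byte to lower case.
 * In ISO-8859-1 the upper/lower pairs differ only in bit 5 (0x20).
 */
static unsigned char latin1_tolower(unsigned char c)
{
	if (c >= 'A' && c <= 'Z')
		return c | 0x20;
	if (c >= 0xc0 && c <= 0xd6)	/* 0xd7 (multiplication sign) is skipped */
		return c | 0x20;
	if (c >= 0xd8 && c <= 0xde)	/* 0xdf (sharp s) has no upper case */
		return c | 0x20;
	return c;
}

/* Local helper: rotate a 32-bit value left by s bits (0 < s < 32). */
static uint32_t rol32(uint32_t v, unsigned int s)
{
	return (v << s) | (v >> (32 - s));
}

/*
 * Hypothetical case-insensitive name hash: every byte is folded
 * before it is mixed in, so names differing only in case collide
 * on purpose and land in the same hash bucket.
 */
static uint32_t ci_name_hash(const unsigned char *name, size_t len)
{
	uint32_t hash = 0;
	size_t i;

	for (i = 0; i < len; i++)
		hash = latin1_tolower(name[i]) ^ rol32(hash, 7);
	return hash;
}
```

The point of the sketch is that the fold is a fixed byte-to-byte table,
so the hash written to disk only stays valid if every reader and writer
agrees on that table. The same bytes read as UTF-8 are not folded
correctly by this function.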
In various "important" directory benchmarketing workloads, ascii-ci
resulted in speedups of 100-1000x. These were competitive results
compared to the netapp/bluearc/etc appliances of the time in the
same cost bracket. Essentially, it was a special case solution to
meet a specific product requirement and was never intended to be
used outside that specific controlled environment.
Realistically, this is the one major downside of the "upstream first"
development principle, i.e. when the vendor product that required
a specific feature is long gone, upstream still has to support that
functionality even though there may be no users of it remaining
and/or no good reason for it continuing to exist.
I'd suggest that after this is fixed we deprecate ascii-ci and make
it go away at the same time v4 filesystems go away. It was, after
all, a feature written for v4 filesystems....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com