From: "Darrick J. Wong" <djwong@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH 2/3] xfs: test the ascii case-insensitive hash
Date: Tue, 4 Apr 2023 13:51:36 -0700 [thread overview]
Message-ID: <20230404205136.GA110000@frogsfrogsfrogs> (raw)
In-Reply-To: <CAHk-=wi-W-zJkW-URTQoLcLnRuwzmWj4MRqV6SHXmjKDV2zXFg@mail.gmail.com>
[now adding hch because his name is on the original patches from 2008]
On Tue, Apr 04, 2023 at 11:06:27AM -0700, Linus Torvalds wrote:
> On Tue, Apr 4, 2023 at 10:07 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> \> Now that we've made kernel and userspace use the same tolower code for
> > computing directory index hashes, add that to the selftest code.
>
> Please just delete this test. It's really really fundamentally wrong.
>
> The fact that you even *think* that you use the same tolower() as user
> space does shows just that you don't even understand how user space
> works.
Wrong. I'm well aware that userspace tolower and kernel tolower are
*NOT* the same thing. I'm trying to **STOP USING** tolower in XFS.
I'm **replacing** tolower with a new function that substitutes specific
bytes with other bytes, and I'm redefining the ondisk format to say that
names remain arbitrary sequences of bytes that do not include nulls or
slashes but that hash indexing and lookups will apply the above byte
transformation. The rules for that transformation may or may not
coincide with what anyone thinks is "upper case in ASCII", but that's
irrelevant. The rules will be the same in all places, and they will not
break any existing filesystems.
Maybe I should've named it xfs_ItB8n_ci_o43jM28() to make it clearer
that I don't care what ascii is, nor does it matter here. Perhaps I
should have written "Now that we've made kernel and userspace perform
the same mathematical transformation on dirent name byte arrays before
hashing, add that to the selftest code as well."
Christoph and Barry Naujok should have defined specifically the exact
transformation and the permitted ranges of name inputs when they wrote
the feature. I wasn't around when this feature was invented...
> Really. The only thing this series shows is that you do not understand
> the complexities.
...and I don't think they understood the complexities when the code was
written.
> Lookie here: compile and run this program:
>
> #include <stdio.h>
> #include <ctype.h>
> #include <locale.h>
>
> int main(int argc, char **argv)
> {
> printf("tolower(0xc4)=%#x\n", tolower(0xc4));
> setlocale(LC_ALL, "C");
> printf("tolower(0xc4)=%#x\n", tolower(0xc4));
> setlocale(LC_ALL, "sv_SE.iso88591");
> printf("tolower(0xc4)=%#x\n", tolower(0xc4));
> }
>
> and on my machine, I get this:
>
> tolower(0xc4)=0xc4
> tolower(0xc4)=0xc4
> tolower(0xc4)=0xe4
>
> and the important thing to note is that "on my machine". The first
> line could be *different* on some other machine (and the last line
> could be too: there's no guarantee that the sv_SE locale even exists).
>
> So this whole "kernel and userspace use the same tolower code"
> sentence is simply completely and utterly wrong. It's not even "wrong"
> in the sense fo "that's not true". It's "wrong" in the sense "that
> shows that you didn't understand the problem at all".
>
> Put another way: saying "5+8=10" is wrong. But saying "5+8=tiger" is
> nonsensical.
>
> Your patches are nonsensical.
I disagree. I'm saying that 5 💩 8 = tiger because that's what the 💩
operator does. 💩 is not +, even if 4 💩 8 = 12.
You claim to understand the complexities; how would /you/ fix this?
I'll send along the test cases that reproduce the problems.
--D
> Linus
next prev parent reply other threads:[~2023-04-04 20:51 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-04 17:07 [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace Darrick J. Wong
2023-04-04 17:07 ` [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation Darrick J. Wong
2023-04-04 17:54 ` Linus Torvalds
2023-04-04 18:32 ` Darrick J. Wong
2023-04-04 18:58 ` Linus Torvalds
2023-04-04 23:30 ` Dave Chinner
2023-04-05 0:17 ` Linus Torvalds
2023-04-05 6:12 ` Christoph Hellwig
2023-04-05 15:40 ` Darrick J. Wong
2023-04-05 15:42 ` Christoph Hellwig
2023-04-05 17:10 ` Darrick J. Wong
2023-04-05 10:48 ` Christoph Hellwig
2023-04-05 15:30 ` Darrick J. Wong
2023-04-05 15:45 ` Linus Torvalds
2023-04-04 17:07 ` [PATCH 2/3] xfs: test the ascii case-insensitive hash Darrick J. Wong
2023-04-04 18:06 ` Linus Torvalds
2023-04-04 20:51 ` Darrick J. Wong [this message]
2023-04-04 21:21 ` Linus Torvalds
2023-04-05 6:15 ` Christoph Hellwig
2023-04-04 17:07 ` [PATCH 3/3] xfs: use the directory name hash function for dir scrubbing Darrick J. Wong
2023-04-04 17:17 ` [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace Darrick J. Wong
2023-04-04 18:19 ` Linus Torvalds
2023-04-04 20:21 ` Linus Torvalds
2023-04-04 21:00 ` Darrick J. Wong
2023-04-04 21:50 ` Linus Torvalds
2023-04-04 21:09 ` [PATCH] xfstests: add a couple more tests for ascii-ci problems Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230404205136.GA110000@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=linux-xfs@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox