From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@infradead.org>, david@fromorbit.com
Cc: torvalds@linux-foundation.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
Date: Wed, 5 Apr 2023 08:30:02 -0700
Message-ID: <20230405153002.GE303486@frogsfrogsfrogs>
In-Reply-To: <ZC1R4IRx7ZiBeeLJ@infradead.org>

On Wed, Apr 05, 2023 at 03:48:00AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 04, 2023 at 10:07:06AM -0700, Darrick J. Wong wrote:
> > Which means that the kernel and userspace do not agree on the hash value
> > for a directory filename that contains those higher values. The hash
> > values are written into the leaf index block of directories that are
> > larger than two blocks in size, which means that xfs_repair will flag
> > these directories as having corrupted hash indexes and rewrite the index
> > with hash values that the kernel now will not recognize.
> >
> > Because the ascii-ci feature is not frequently enabled and the kernel
> > touches filesystems far more frequently than xfs_repair does, fix this
> > by encoding the kernel's toupper predicate and tolower functions into
> > libxfs. This makes userspace's behavior consistent with the kernel.
>
> I agree with making the userspace behavior consistent with the actual
> kernel behavior. Sadly the documented behavior differs from both
> of them, so I think we need to also document the actual tables used
> in the mkfs.xfs manpage, as it isn't actually just ASCII.

Agreed. Given that kernel tolower() behavior has been stable since 1996
(and remaps the ISO 8859-1 accented letters), the "ASCII CI" feature
most closely maps to "ISO 8859-1 CI". But at this point there's not
even a shared understanding (Dave said latin1, you said 7-bit ascii,
IDGAF) so I agree that documenting the exact transformations in the
manpage is the only sane way forward.

I propose changing the mkfs.xfs manpage wording from:

"The version=ci option enables ASCII only case-insensitive filename
lookup and version 2 directories. Filenames are case-preserving, that
is, the names are stored in directories using the case they were
created with."

into:

"If the version=ci option is specified, the kernel will transform
certain bytes in filenames before performing lookup-related operations.
The byte sequence given to create a directory entry is persisted without
alterations. The lookup transformations are defined as follows:

    0x41 - 0x5a -> 0x61 - 0x7a
    0xc0 - 0xd6 -> 0xe0 - 0xf6
    0xd8 - 0xde -> 0xf8 - 0xfe

This transformation roughly corresponds to case insensitivity in ISO
8859-1 and may cause problems with other encodings (e.g. UTF-8). The
feature will be disabled by default in September 2025, and removed from
the kernel in September 2030."
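
The three ranges in the proposed wording can be sketched in C roughly as
follows. This is only an illustration of the table, not the code from the
patch: the names ascii_ci_xfrm and ascii_ci_names_equal are made up for
this example. Note that 0xd7 and 0xf7 (the multiplication and division
signs in ISO 8859-1) fall outside the ranges and pass through unchanged,
as do 0xdf and 0xff.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (name invented for this sketch): apply the lookup
 * transformation from the proposed manpage text to a single byte.
 * Bytes outside the three listed ranges are returned unchanged.
 */
static inline unsigned char ascii_ci_xfrm(unsigned char c)
{
	if (c >= 0x41 && c <= 0x5a)	/* 0x41 - 0x5a -> 0x61 - 0x7a */
		return c + 0x20;
	if (c >= 0xc0 && c <= 0xd6)	/* 0xc0 - 0xd6 -> 0xe0 - 0xf6 */
		return c + 0x20;
	if (c >= 0xd8 && c <= 0xde)	/* 0xd8 - 0xde -> 0xf8 - 0xfe */
		return c + 0x20;
	return c;
}

/*
 * Hypothetical helper: compare two names byte by byte after applying
 * the transformation.  Returns 1 if they match, 0 otherwise.  The
 * names are treated as raw byte sequences with explicit lengths; no
 * encoding is assumed.
 */
static inline int ascii_ci_names_equal(const unsigned char *a, size_t alen,
				       const unsigned char *b, size_t blen)
{
	size_t i;

	assert(a && b);
	if (alen != blen)
		return 0;
	for (i = 0; i < alen; i++) {
		if (ascii_ci_xfrm(a[i]) != ascii_ci_xfrm(b[i]))
			return 0;
	}
	return 1;
}
```

Because the function works on single bytes, a multi-byte UTF-8 sequence
containing a byte in 0xc0 - 0xde would have that byte remapped on its own,
which is the kind of problem with other encodings the proposed text warns
about.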

> Does the kernel tolower behavior map to an actual documented charset?
> In that case we can just point to it, which would be way easier than
> documenting all the details.

It *seems* to operate on ISO 8859-1 (aka latin1), but Linus implied that
the history of lib/ctype.c is lost to the ages. Or at least 1996-era
mailing list archives.
--D