[PATCHSET 0/3] xfs: fix ascii-ci problems with userspace

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace
@ 2023-04-04 17:07 Darrick J. Wong
  2023-04-04 17:07 ` [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation Darrick J. Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-04 17:07 UTC (permalink / raw)
  To: torvalds, djwong; +Cc: linux-xfs, david

Hi all,

Last week, I was fiddling around with the metadump name obfuscation code
while writing a debugger command to generate directories full of names
that all have the same hash name.  I had a few questions about how well
all that worked with ascii-ci mode, and discovered a nasty discrepancy
between the kernel and glibc's implementations of the tolower()
function.

I discovered that I could create a directory that is large enough to
require separate leaf index blocks.  The hashes stored in the dabtree
use the ascii-ci specific hash function, which uses a library function
to convert the name to lowercase before hashing.  If the kernel and C
library's versions of tolower do not behave exactly identically,
xfs_ascii_ci_hashname will not produce the same results for the same
inputs.  xfs_repair will deem the leaf information corrupt and rebuild
the directory.  After that, lookups in the kernel will fail because the
hash index doesn't work.

The kernel's tolower function will convert extended ascii uppercase
letters (e.g. A-with-umlaut) to extended ascii lowercase letters (e.g.
a-with-umlaut), whereas glibc's will only do that if you force LANG to
ascii.  Tiny embedded libc implementations just plain won't do it at
all, and the result is a mess.  Stabilize the behavior of the hash
function by encoding the kernel's tolower function in libxfs, add it to
the selftest, and fix xfs_scrub not handling this correctly.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fix-asciici-tolower-6.3
---
 fs/xfs/libxfs/xfs_dir2.c |    4 -
 fs/xfs/libxfs/xfs_dir2.h |   20 ++++
 fs/xfs/scrub/dir.c       |    7 +-
 fs/xfs/xfs_dahash_test.c |  211 ++++++++++++++++++++++++----------------------
 4 files changed, 139 insertions(+), 103 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-04 17:07 [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace Darrick J. Wong
@ 2023-04-04 17:07 ` Darrick J. Wong
  2023-04-04 17:54   ` Linus Torvalds
  2023-04-05 10:48   ` Christoph Hellwig
  2023-04-04 17:07 ` [PATCH 2/3] xfs: test the ascii case-insensitive hash Darrick J. Wong
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-04 17:07 UTC (permalink / raw)
  To: torvalds, djwong; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

There's a major discrepancy in the function that computes directory entry
hashes for filesystems that have ASCII case-insensitive lookups enabled.
The root of this is that the kernel and glibc's tolower implementations
have differing behavior for extended ASCII accented characters.  I wrote
a program to spit out characters for which the tolower() return value is
different from the input:

glibc tolower:
65:A 66:B 67:C 68:D 69:E 70:F 71:G 72:H 73:I 74:J 75:K 76:L 77:M 78:N
79:O 80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W 88:X 89:Y 90:Z

kernel tolower:
65:A 66:B 67:C 68:D 69:E 70:F 71:G 72:H 73:I 74:J 75:K 76:L 77:M 78:N
79:O 80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W 88:X 89:Y 90:Z 192:À 193:Á
194:Â 195:Ã 196:Ä 197:Å 198:Æ 199:Ç 200:È 201:É 202:Ê 203:Ë 204:Ì 205:Í
206:Î 207:Ï 208:Ð 209:Ñ 210:Ò 211:Ó 212:Ô 213:Õ 214:Ö 215:× 216:Ø 217:Ù
218:Ú 219:Û 220:Ü 221:Ý 222:Þ

Which means that the kernel and userspace do not agree on the hash value
for a directory filename that contains those higher values.  The hash
values are written into the leaf index block of directories that are
larger than two blocks in size, which means that xfs_repair will flag
these directories as having corrupted hash indexes and rewrite the index
with hash values that the kernel now will not recognize.

Because the ascii-ci feature is not frequently enabled and the kernel
touches filesystems far more frequently than xfs_repair does, fix this
by encoding the kernel's toupper predicate and tolower functions into
libxfs.  This makes userspace's behavior consistent with the kernel.

Found by auditing obfuscate_name in xfs_metadump as part of working on
parent pointers, wondering how it could possibly work correctly with ci
filesystems, writing a test tool to create a directory with
hash-colliding names, and watching xfs_repair flag it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_dir2.c |    4 ++--
 fs/xfs/libxfs/xfs_dir2.h |   20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 92bac3373f1f..eb3187ee977d 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -64,7 +64,7 @@ xfs_ascii_ci_hashname(
 	int			i;

 	for (i = 0, hash = 0; i < name->len; i++)
-		hash = tolower(name->name[i]) ^ rol32(hash, 7);
+		hash = xfs_ascii_ci_tolower(name->name[i]) ^ rol32(hash, 7);

 	return hash;
 }
@@ -85,7 +85,7 @@ xfs_ascii_ci_compname(
 	for (i = 0; i < len; i++) {
 		if (args->name[i] == name[i])
 			continue;
-		if (tolower(args->name[i]) != tolower(name[i]))
+		if (xfs_ascii_ci_tolower(args->name[i]) != xfs_ascii_ci_tolower(name[i]))
 			return XFS_CMP_DIFFERENT;
 		result = XFS_CMP_CASE;
 	}
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index dd39f17dd9a9..2772fd53279c 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -248,4 +248,24 @@ unsigned int xfs_dir3_data_end_offset(struct xfs_da_geometry *geo,
 		struct xfs_dir2_data_hdr *hdr);
 bool xfs_dir2_namecheck(const void *name, size_t length);

+static inline bool
+xfs_ascii_ci_isupper(unsigned char c)
+{
+	if (c >= 0x41 && c <= 0x5a)	/* A-Z */
+		return true;
+	if (c >= 0xc0 && c <= 0xd6)	/* latin A-O with accents */
+		return true;
+	if (c >= 0xd8 && c <= 0xde)	/* latin O-Y with accents */
+		return true;
+	return false;
+}
+
+static inline unsigned char
+xfs_ascii_ci_tolower(unsigned char c)
+{
+	if (xfs_ascii_ci_isupper(c))
+		c -= 'A' - 'a';
+	return c;
+}
+
 #endif	/* __XFS_DIR2_H__ */

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/3] xfs: test the ascii case-insensitive hash
  2023-04-04 17:07 [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace Darrick J. Wong
  2023-04-04 17:07 ` [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation Darrick J. Wong
@ 2023-04-04 17:07 ` Darrick J. Wong
  2023-04-04 18:06   ` Linus Torvalds
  2023-04-04 17:07 ` [PATCH 3/3] xfs: use the directory name hash function for dir scrubbing Darrick J. Wong
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-04 17:07 UTC (permalink / raw)
  To: torvalds, djwong; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Now that we've made kernel and userspace use the same tolower code for
computing directory index hashes, add that to the selftest code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_dahash_test.c |  211 ++++++++++++++++++++++++----------------------
 1 file changed, 111 insertions(+), 100 deletions(-)


diff --git a/fs/xfs/xfs_dahash_test.c b/fs/xfs/xfs_dahash_test.c
index 230651ab5ce4..e4efb82fc2f5 100644
--- a/fs/xfs/xfs_dahash_test.c
+++ b/fs/xfs/xfs_dahash_test.c
@@ -9,6 +9,9 @@
 #include "xfs_format.h"
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_dir2_priv.h"
 #include "xfs_dahash_test.h"
 
 /* 4096 random bytes */
@@ -533,108 +536,109 @@ static struct dahash_test {
 	uint16_t	start;	/* random 12 bit offset in buf */
 	uint16_t	length;	/* random 8 bit length of test */
 	xfs_dahash_t	dahash;	/* expected dahash result */
+	xfs_dahash_t	ascii_ci_hash;	/* expected ascii-ci hash result */
 } test[] __initdata =
 {
-	{0x0567, 0x0097, 0x96951389},
-	{0x0869, 0x0055, 0x6455ab4f},
-	{0x0c51, 0x00be, 0x8663afde},
-	{0x044a, 0x00fc, 0x98fbe432},
-	{0x0f29, 0x0079, 0x42371997},
-	{0x08ba, 0x0052, 0x942be4f7},
-	{0x01f2, 0x0013, 0x5262687e},
-	{0x09e3, 0x00e2, 0x8ffb0908},
-	{0x007c, 0x0051, 0xb3158491},
-	{0x0854, 0x001f, 0x83bb20d9},
-	{0x031b, 0x0008, 0x98970bdf},
-	{0x0de7, 0x0027, 0xbfbf6f6c},
-	{0x0f76, 0x0005, 0x906a7105},
-	{0x092e, 0x00d0, 0x86631850},
-	{0x0233, 0x0082, 0xdbdd914e},
-	{0x04c9, 0x0075, 0x5a400a9e},
-	{0x0b66, 0x0099, 0xae128b45},
-	{0x000d, 0x00ed, 0xe61c216a},
-	{0x0a31, 0x003d, 0xf69663b9},
-	{0x00a3, 0x0052, 0x643c39ae},
-	{0x0125, 0x00d5, 0x7c310b0d},
-	{0x0105, 0x004a, 0x06a77e74},
-	{0x0858, 0x008e, 0x265bc739},
-	{0x045e, 0x0095, 0x13d6b192},
-	{0x0dab, 0x003c, 0xc4498704},
-	{0x00cd, 0x00b5, 0x802a4e2d},
-	{0x069b, 0x008c, 0x5df60f71},
-	{0x0454, 0x006c, 0x5f03d8bb},
-	{0x040e, 0x0032, 0x0ce513b5},
-	{0x0874, 0x00e2, 0x6a811fb3},
-	{0x0521, 0x00b4, 0x93296833},
-	{0x0ddc, 0x00cf, 0xf9305338},
-	{0x0a70, 0x0023, 0x239549ea},
-	{0x083e, 0x0027, 0x2d88ba97},
-	{0x0241, 0x00a7, 0xfe0b32e1},
-	{0x0dfc, 0x0096, 0x1a11e815},
-	{0x023e, 0x001e, 0xebc9a1f3},
-	{0x067e, 0x0066, 0xb1067f81},
-	{0x09ea, 0x000e, 0x46fd7247},
-	{0x036b, 0x008c, 0x1a39acdf},
-	{0x078f, 0x0030, 0x964042ab},
-	{0x085c, 0x008f, 0x1829edab},
-	{0x02ec, 0x009f, 0x6aefa72d},
-	{0x043b, 0x00ce, 0x65642ff5},
-	{0x0a32, 0x00b8, 0xbd82759e},
-	{0x0d3c, 0x0087, 0xf4d66d54},
-	{0x09ec, 0x008a, 0x06bfa1ff},
-	{0x0902, 0x0015, 0x755025d2},
-	{0x08fe, 0x000e, 0xf690ce2d},
-	{0x00fb, 0x00dc, 0xe55f1528},
-	{0x0eaa, 0x003a, 0x0fe0a8d7},
-	{0x05fb, 0x0006, 0x86281cfb},
-	{0x0dd1, 0x00a7, 0x60ab51b4},
-	{0x0005, 0x001b, 0xf51d969b},
-	{0x077c, 0x00dd, 0xc2fed268},
-	{0x0575, 0x00f5, 0x432c0b1a},
-	{0x05be, 0x0088, 0x78baa04b},
-	{0x0c89, 0x0068, 0xeda9e428},
-	{0x0f5c, 0x0068, 0xec143c76},
-	{0x06a8, 0x0009, 0xd72651ce},
-	{0x060f, 0x008e, 0x765426cd},
-	{0x07b1, 0x0047, 0x2cfcfa0c},
-	{0x04f1, 0x0041, 0x55b172f9},
-	{0x0e05, 0x00ac, 0x61efde93},
-	{0x0bf7, 0x0097, 0x05b83eee},
-	{0x04e9, 0x00f3, 0x9928223a},
-	{0x023a, 0x0005, 0xdfada9bc},
-	{0x0acb, 0x000e, 0x2217cecd},
-	{0x0148, 0x0060, 0xbc3f7405},
-	{0x0764, 0x0059, 0xcbc201b1},
-	{0x021f, 0x0059, 0x5d6b2256},
-	{0x0f1e, 0x006c, 0xdefeeb45},
-	{0x071c, 0x00b9, 0xb9b59309},
-	{0x0564, 0x0063, 0xae064271},
-	{0x0b14, 0x0044, 0xdb867d9b},
-	{0x0e5a, 0x0055, 0xff06b685},
-	{0x015e, 0x00ba, 0x1115ccbc},
-	{0x0379, 0x00e6, 0x5f4e58dd},
-	{0x013b, 0x0067, 0x4897427e},
-	{0x0e64, 0x0071, 0x7af2b7a4},
-	{0x0a11, 0x0050, 0x92105726},
-	{0x0109, 0x0055, 0xd0d000f9},
-	{0x00aa, 0x0022, 0x815d229d},
-	{0x09ac, 0x004f, 0x02f9d985},
-	{0x0e1b, 0x00ce, 0x5cf92ab4},
-	{0x08af, 0x00d8, 0x17ca72d1},
-	{0x0e33, 0x000a, 0xda2dba6b},
-	{0x0ee3, 0x006a, 0xb00048e5},
-	{0x0648, 0x001a, 0x2364b8cb},
-	{0x0315, 0x0085, 0x0596fd0d},
-	{0x0fbb, 0x003e, 0x298230ca},
-	{0x0422, 0x006a, 0x78ada4ab},
-	{0x04ba, 0x0073, 0xced1fbc2},
-	{0x007d, 0x0061, 0x4b7ff236},
-	{0x070b, 0x00d0, 0x261cf0ae},
-	{0x0c1a, 0x0035, 0x8be92ee2},
-	{0x0af8, 0x0063, 0x824dcf03},
-	{0x08f8, 0x006d, 0xd289710c},
-	{0x021b, 0x00ee, 0x6ac1c41d},
-	{0x05b5, 0x00da, 0x8e52f0e2},
+	{0x0567, 0x0097, 0x96951389, 0xc153aa0d},
+	{0x0869, 0x0055, 0x6455ab4f, 0xd07f69bf},
+	{0x0c51, 0x00be, 0x8663afde, 0xf9add90c},
+	{0x044a, 0x00fc, 0x98fbe432, 0xbf2abb76},
+	{0x0f29, 0x0079, 0x42371997, 0x282588b3},
+	{0x08ba, 0x0052, 0x942be4f7, 0x2e023547},
+	{0x01f2, 0x0013, 0x5262687e, 0x5266287e},
+	{0x09e3, 0x00e2, 0x8ffb0908, 0x1da892f3},
+	{0x007c, 0x0051, 0xb3158491, 0xb67f9e63},
+	{0x0854, 0x001f, 0x83bb20d9, 0x22bb21db},
+	{0x031b, 0x0008, 0x98970bdf, 0x9cd70adf},
+	{0x0de7, 0x0027, 0xbfbf6f6c, 0xae3f296c},
+	{0x0f76, 0x0005, 0x906a7105, 0x906a7105},
+	{0x092e, 0x00d0, 0x86631850, 0xa3f6ac04},
+	{0x0233, 0x0082, 0xdbdd914e, 0x5d8c7aac},
+	{0x04c9, 0x0075, 0x5a400a9e, 0x12f60711},
+	{0x0b66, 0x0099, 0xae128b45, 0x7551310d},
+	{0x000d, 0x00ed, 0xe61c216a, 0xc22d3c4c},
+	{0x0a31, 0x003d, 0xf69663b9, 0x51960bf8},
+	{0x00a3, 0x0052, 0x643c39ae, 0xa93c73a8},
+	{0x0125, 0x00d5, 0x7c310b0d, 0xf221cbb3},
+	{0x0105, 0x004a, 0x06a77e74, 0xa4ef4561},
+	{0x0858, 0x008e, 0x265bc739, 0xd6c36d9b},
+	{0x045e, 0x0095, 0x13d6b192, 0x5f5c1d62},
+	{0x0dab, 0x003c, 0xc4498704, 0x10414654},
+	{0x00cd, 0x00b5, 0x802a4e2d, 0xfbd17c9d},
+	{0x069b, 0x008c, 0x5df60f71, 0x91ddca5f},
+	{0x0454, 0x006c, 0x5f03d8bb, 0x5c59fce0},
+	{0x040e, 0x0032, 0x0ce513b5, 0xa8cd99b1},
+	{0x0874, 0x00e2, 0x6a811fb3, 0xca028316},
+	{0x0521, 0x00b4, 0x93296833, 0x2c4d4880},
+	{0x0ddc, 0x00cf, 0xf9305338, 0x2c94210d},
+	{0x0a70, 0x0023, 0x239549ea, 0x22b561aa},
+	{0x083e, 0x0027, 0x2d88ba97, 0x5cd8bb9d},
+	{0x0241, 0x00a7, 0xfe0b32e1, 0x17b506b8},
+	{0x0dfc, 0x0096, 0x1a11e815, 0xee4141bd},
+	{0x023e, 0x001e, 0xebc9a1f3, 0x5689a1f3},
+	{0x067e, 0x0066, 0xb1067f81, 0xd9952571},
+	{0x09ea, 0x000e, 0x46fd7247, 0x42b57245},
+	{0x036b, 0x008c, 0x1a39acdf, 0x58bf1586},
+	{0x078f, 0x0030, 0x964042ab, 0xb04218b9},
+	{0x085c, 0x008f, 0x1829edab, 0x9ceca89c},
+	{0x02ec, 0x009f, 0x6aefa72d, 0x634cc2a7},
+	{0x043b, 0x00ce, 0x65642ff5, 0x6c8a584e},
+	{0x0a32, 0x00b8, 0xbd82759e, 0x0f96a34f},
+	{0x0d3c, 0x0087, 0xf4d66d54, 0xb71ba5f4},
+	{0x09ec, 0x008a, 0x06bfa1ff, 0x576ca80f},
+	{0x0902, 0x0015, 0x755025d2, 0x517225c2},
+	{0x08fe, 0x000e, 0xf690ce2d, 0xf690cf3d},
+	{0x00fb, 0x00dc, 0xe55f1528, 0x707d7d92},
+	{0x0eaa, 0x003a, 0x0fe0a8d7, 0x87638cc5},
+	{0x05fb, 0x0006, 0x86281cfb, 0x86281cf9},
+	{0x0dd1, 0x00a7, 0x60ab51b4, 0xe28ef00c},
+	{0x0005, 0x001b, 0xf51d969b, 0xe71dd6d3},
+	{0x077c, 0x00dd, 0xc2fed268, 0xdc30c555},
+	{0x0575, 0x00f5, 0x432c0b1a, 0x81dd7d16},
+	{0x05be, 0x0088, 0x78baa04b, 0xd69b433e},
+	{0x0c89, 0x0068, 0xeda9e428, 0xe9b4fa0a},
+	{0x0f5c, 0x0068, 0xec143c76, 0x9947067a},
+	{0x06a8, 0x0009, 0xd72651ce, 0xd72651ee},
+	{0x060f, 0x008e, 0x765426cd, 0x2099626f},
+	{0x07b1, 0x0047, 0x2cfcfa0c, 0x1a4baa07},
+	{0x04f1, 0x0041, 0x55b172f9, 0x15331a79},
+	{0x0e05, 0x00ac, 0x61efde93, 0x320568cc},
+	{0x0bf7, 0x0097, 0x05b83eee, 0xc72fb7a3},
+	{0x04e9, 0x00f3, 0x9928223a, 0xe8c77de2},
+	{0x023a, 0x0005, 0xdfada9bc, 0xdfadb9be},
+	{0x0acb, 0x000e, 0x2217cecd, 0x0017d6cd},
+	{0x0148, 0x0060, 0xbc3f7405, 0xf5fd6615},
+	{0x0764, 0x0059, 0xcbc201b1, 0xbb089bf4},
+	{0x021f, 0x0059, 0x5d6b2256, 0xa16a0a59},
+	{0x0f1e, 0x006c, 0xdefeeb45, 0xfc34f9d6},
+	{0x071c, 0x00b9, 0xb9b59309, 0xb645eae2},
+	{0x0564, 0x0063, 0xae064271, 0x954dc6d1},
+	{0x0b14, 0x0044, 0xdb867d9b, 0xdf432309},
+	{0x0e5a, 0x0055, 0xff06b685, 0xa65ff257},
+	{0x015e, 0x00ba, 0x1115ccbc, 0x11c365f4},
+	{0x0379, 0x00e6, 0x5f4e58dd, 0x2d176d31},
+	{0x013b, 0x0067, 0x4897427e, 0xc40532fe},
+	{0x0e64, 0x0071, 0x7af2b7a4, 0x1fb7bf43},
+	{0x0a11, 0x0050, 0x92105726, 0xb1185e51},
+	{0x0109, 0x0055, 0xd0d000f9, 0x60a60bfd},
+	{0x00aa, 0x0022, 0x815d229d, 0x215d379c},
+	{0x09ac, 0x004f, 0x02f9d985, 0x10b90b20},
+	{0x0e1b, 0x00ce, 0x5cf92ab4, 0x6a477573},
+	{0x08af, 0x00d8, 0x17ca72d1, 0x385af156},
+	{0x0e33, 0x000a, 0xda2dba6b, 0xda2dbb69},
+	{0x0ee3, 0x006a, 0xb00048e5, 0xa9a2decc},
+	{0x0648, 0x001a, 0x2364b8cb, 0x3364b1cb},
+	{0x0315, 0x0085, 0x0596fd0d, 0xa651740f},
+	{0x0fbb, 0x003e, 0x298230ca, 0x7fc617c7},
+	{0x0422, 0x006a, 0x78ada4ab, 0xc576ae2a},
+	{0x04ba, 0x0073, 0xced1fbc2, 0xaac8455b},
+	{0x007d, 0x0061, 0x4b7ff236, 0x347d5739},
+	{0x070b, 0x00d0, 0x261cf0ae, 0xc7fb1c10},
+	{0x0c1a, 0x0035, 0x8be92ee2, 0x8be9b4e1},
+	{0x0af8, 0x0063, 0x824dcf03, 0x53010388},
+	{0x08f8, 0x006d, 0xd289710c, 0x30418edd},
+	{0x021b, 0x00ee, 0x6ac1c41d, 0x2557e9a3},
+	{0x05b5, 0x00da, 0x8e52f0e2, 0x98531012},
 };
 
 int __init
@@ -644,12 +648,19 @@ xfs_dahash_test(void)
 	unsigned int	errors = 0;
 
 	for (i = 0; i < ARRAY_SIZE(test); i++) {
+		struct xfs_name	xname = { };
 		xfs_dahash_t	hash;
 
 		hash = xfs_da_hashname(test_buf + test[i].start,
 				test[i].length);
 		if (hash != test[i].dahash)
 			errors++;
+
+		xname.name = test_buf + test[i].start;
+		xname.len = test[i].length;
+		hash = xfs_ascii_ci_hashname(&xname);
+		if (hash != test[i].ascii_ci_hash)
+			errors++;
 	}
 
 	if (errors) {


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/3] xfs: use the directory name hash function for dir scrubbing
  2023-04-04 17:07 [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace Darrick J. Wong
  2023-04-04 17:07 ` [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation Darrick J. Wong
  2023-04-04 17:07 ` [PATCH 2/3] xfs: test the ascii case-insensitive hash Darrick J. Wong
@ 2023-04-04 17:07 ` Darrick J. Wong
  2023-04-04 17:17 ` [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace Darrick J. Wong
  2023-04-04 21:09 ` [PATCH] xfstests: add a couple more tests for ascii-ci problems Darrick J. Wong
  4 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-04 17:07 UTC (permalink / raw)
  To: torvalds, djwong; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

The directory code has a directory-specific hash computation function
that includes a modified hash function for case-insensitive lookups.
Hence we must use that function (and not the raw da_hashname) when
checking the dabtree structure.

Found by accidentally breaking xfs/188 to create an abnormally huge
case-insensitive directory and watching scrub break.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/dir.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index d1b0f23c2c59..aeb815a483ff 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -201,6 +201,7 @@ xchk_dir_rec(
 	struct xchk_da_btree		*ds,
 	int				level)
 {
+	struct xfs_name			dname = { };
 	struct xfs_da_state_blk		*blk = &ds->state->path.blk[level];
 	struct xfs_mount		*mp = ds->state->mp;
 	struct xfs_inode		*dp = ds->dargs.dp;
@@ -297,7 +298,11 @@ xchk_dir_rec(
 		xchk_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
 		goto out_relse;
 	}
-	calc_hash = xfs_da_hashname(dent->name, dent->namelen);
+
+	/* Does the directory hash match? */
+	dname.name = dent->name;
+	dname.len = dent->namelen;
+	calc_hash = xfs_dir2_hashname(mp, &dname);
 	if (calc_hash != hash)
 		xchk_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
 


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace
  2023-04-04 17:07 [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace Darrick J. Wong
                   ` (2 preceding siblings ...)
  2023-04-04 17:07 ` [PATCH 3/3] xfs: use the directory name hash function for dir scrubbing Darrick J. Wong
@ 2023-04-04 17:17 ` Darrick J. Wong
  2023-04-04 18:19   ` Linus Torvalds
  2023-04-04 21:09 ` [PATCH] xfstests: add a couple more tests for ascii-ci problems Darrick J. Wong
  4 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-04 17:17 UTC (permalink / raw)
  To: torvalds; +Cc: david, xfs

Hi Linus,

My finger slipped and I accidentally added you to the To: list on this
new series.  This series needs to go through review on linux-xfs; when
this is ready to go I (or Dave) will send you a pull request.

Sorry about the noise.

--D

On Tue, Apr 04, 2023 at 10:07:00AM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> Last week, I was fiddling around with the metadump name obfuscation code
> while writing a debugger command to generate directories full of names
> that all have the same hash name.  I had a few questions about how well
> all that worked with ascii-ci mode, and discovered a nasty discrepancy
> between the kernel and glibc's implementations of the tolower()
> function.
> 
> I discovered that I could create a directory that is large enough to
> require separate leaf index blocks.  The hashes stored in the dabtree
> use the ascii-ci specific hash function, which uses a library function
> to convert the name to lowercase before hashing.  If the kernel and C
> library's versions of tolower do not behave exactly identically,
> xfs_ascii_ci_hashname will not produce the same results for the same
> inputs.  xfs_repair will deem the leaf information corrupt and rebuild
> the directory.  After that, lookups in the kernel will fail because the
> hash index doesn't work.
> 
> The kernel's tolower function will convert extended ascii uppercase
> letters (e.g. A-with-umlaut) to extended ascii lowercase letters (e.g.
> a-with-umlaut), whereas glibc's will only do that if you force LANG to
> ascii.  Tiny embedded libc implementations just plain won't do it at
> all, and the result is a mess.  Stabilize the behavior of the hash
> function by encoding the kernel's tolower function in libxfs, add it to
> the selftest, and fix xfs_scrub not handling this correctly.
> 
> If you're going to start using this mess, you probably ought to just
> pull from my git trees, which are linked below.
> 
> This is an extraordinary way to destroy everything.  Enjoy!
> Comments and questions are, as always, welcome.
> 
> --D
> 
> kernel git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fix-asciici-tolower-6.3
> ---
>  fs/xfs/libxfs/xfs_dir2.c |    4 -
>  fs/xfs/libxfs/xfs_dir2.h |   20 ++++
>  fs/xfs/scrub/dir.c       |    7 +-
>  fs/xfs/xfs_dahash_test.c |  211 ++++++++++++++++++++++++----------------------
>  4 files changed, 139 insertions(+), 103 deletions(-)
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-04 17:07 ` [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation Darrick J. Wong
@ 2023-04-04 17:54   ` Linus Torvalds
  2023-04-04 18:32     ` Darrick J. Wong
  2023-04-05 10:48   ` Christoph Hellwig
  1 sibling, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2023-04-04 17:54 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david

On Tue, Apr 4, 2023 at 10:07 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> +       if (c >= 0xc0 && c <= 0xd6)     /* latin A-O with accents */
> +               return true;
> +       if (c >= 0xd8 && c <= 0xde)     /* latin O-Y with accents */
> +               return true;

Please don't do this.

We're not in the dark ages any more. We don't do crazy locale-specific
crud. There is no such thing as "latin1" any more in any valid model.

For example, it is true that 0xC4 is 'Ä' in Latin1, and that the
lower-case version is 'ä', and you can do the lower-casing exactly the
same way as you do for US-ASCII: you just set bit 5 (or "add 32" or
"subtract 0xE0" - the latter is what you seem to do, crazy as it is).

So the above was fine back in the 80s, and questionably correct in the
90s, but it is COMPLETE GARBAGE to do this in the year 2023.

Because 'Ä' today is *not* 0xC4. It is "0xC3 0x84" (in the sanest
simplest form), and your crazy "tolower" will turn that into "0xE3
0x84", and that not only is no longer 'ä', it's not even valid UTF-8
any  more.

I realize that filesystem people really don't get this, but
case-insensitivity is pure and utter CRAP. Really. You *cannot* do
case sensitivity well. It's impossible. It's either locale-dependent,
or you have to have translation models for Unicode characters that are
horrifically slow and even then you *will* get it wrong, because you
will start asking questions about normalization forms, and the end
result is an UNMITIGATED DISASTER.

I wish filesystem people just finally understood this.  FAT was not a
good filesystem.  HFS+ is garbage. And any network filesystem that
does this needs to pass locale information around and do it per-mount,
not on disk.

Because you *will* get it wrong. It's that simple. The only sane model
these days is Unicode, and the only sane encoding for Unicode is
UTF-8, but even given those objectively true facts, you have

 (a) people who are going to use some internal locale, because THEY
HAVE TO. Maybe they have various legacy things, and they use Shift-JIS
or Latin1, and they really treat filenames that way.

 (b) you will have people who disagree about normal forms. NFC is the
only sane case, but you *will* have people who use other forms. OS X
got this completely wrong, and it causes real issues.

 (c) you'll find that "lower-case" isn't even well-defined for various
characters (the typical example is German 'ß', but there are lots of
them)

 (d) and then you'll hit the truly crazy cases with "what about
compatibility equivalence". You'll find that even in English with NBSP
vs regular SPACE, but it gets crazy.

End result: the only well-defined area is US-ASCII. Nothing else is
even *remotely* clear. Don't touch it. You *will* screw up.

Now, if you *only* use this for hashing, maybe you will feel like "you
will screw up" is not such a big deal.

But people will wonder why the file 'Björn' does not compare equal to
the file 'BJÖRN' when in a sane locale, but then *does* compare equal
if they happen to use a legacy Latin1 one.

So no. Latin1 isn't that special, and if you special-case them, you
*will* screw up other locales.

The *only* situation where 'tolower()' and 'toupper()' are valid is
for US-ASCII.

And when you compare to glibc, you only compare to "some random locale
that happens to be active rigth n ow". Something that the kernel
itself cannot and MUST NOT do.

                Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] xfs: test the ascii case-insensitive hash
  2023-04-04 17:07 ` [PATCH 2/3] xfs: test the ascii case-insensitive hash Darrick J. Wong
@ 2023-04-04 18:06   ` Linus Torvalds
  2023-04-04 20:51     ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2023-04-04 18:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david

On Tue, Apr 4, 2023 at 10:07 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
\> Now that we've made kernel and userspace use the same tolower code for
> computing directory index hashes, add that to the selftest code.

Please just delete this test. It's really really fundamentally wrong.

The fact that you even *think* that you use the same tolower() as user
space does shows just that you don't even understand how user space
works.

Really. The only thing this series shows is that you do not understand
the complexities.

Lookie here: compile and run this program:

    #include <stdio.h>
    #include <ctype.h>
    #include <locale.h>

    int main(int argc, char **argv)
    {
        printf("tolower(0xc4)=%#x\n", tolower(0xc4));
        setlocale(LC_ALL, "C");
        printf("tolower(0xc4)=%#x\n", tolower(0xc4));
        setlocale(LC_ALL, "sv_SE.iso88591");
        printf("tolower(0xc4)=%#x\n", tolower(0xc4));
    }

and on my machine, I get this:

    tolower(0xc4)=0xc4
    tolower(0xc4)=0xc4
    tolower(0xc4)=0xe4

and the important thing to note is that "on my machine". The first
line could be *different* on some other machine (and the last line
could be too: there's no guarantee that the sv_SE locale even exists).

So this whole "kernel and userspace use the same tolower code"
sentence is simply completely and utterly wrong. It's not even "wrong"
in the sense fo "that's not true". It's "wrong" in the sense "that
shows that you didn't understand the problem at all".

Put another way: saying "5+8=10" is wrong. But saying "5+8=tiger" is
nonsensical.

Your patches are nonsensical.

               Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace
  2023-04-04 17:17 ` [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace Darrick J. Wong
@ 2023-04-04 18:19   ` Linus Torvalds
  2023-04-04 20:21     ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2023-04-04 18:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, xfs

On Tue, Apr 4, 2023 at 10:17 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> My finger slipped and I accidentally added you to the To: list on this
> new series.  This series needs to go through review on linux-xfs; when
> this is ready to go I (or Dave) will send you a pull request.
>
> Sorry about the noise.

No, it's good. See my rant emails.

Case insensitivity is a subject near and dear to my heart.

And when I say "near and dear to my heart", I obviously mean "I
despise it, and I have spent _decades_ ranting against it and trying
to explain why it's a problem".

I've ranted against it before, but it's been probably a decade since I
needed to. I was hoping this was long behind us.

I'm surprised people still think it's even acceptable.

FAT isn't known for being a great filesystem. HFS+ was garbage, but
even Apple eventually got the message and tried to make APFS better.
I'm not sure Apple succeeded (maybe the case insensitivity is still
the default on OS X, but apparently at least iOS got the message).

Every single case-insensitive filesystem in the world HAS BEEN CRAP.
Please. Learn from literally decades of reality, and stop doing it.

And if you have to do it for some legacy reason, at least understand
the problems with it. Understand that you DO NOT KNOW what the user
space locale is.

Limiting yourself to US-ASCII is at least technically valid. Because
EBCDIC isn't worth worrying about.  But when the high bit is set, you
had better not touch it, or you need to limit it spectacularly.

                   Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-04 17:54   ` Linus Torvalds
@ 2023-04-04 18:32     ` Darrick J. Wong
  2023-04-04 18:58       ` Linus Torvalds
                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-04 18:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-xfs, david

On Tue, Apr 04, 2023 at 10:54:27AM -0700, Linus Torvalds wrote:
> On Tue, Apr 4, 2023 at 10:07 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > +       if (c >= 0xc0 && c <= 0xd6)     /* latin A-O with accents */
> > +               return true;
> > +       if (c >= 0xd8 && c <= 0xde)     /* latin O-Y with accents */
> > +               return true;
> 
> Please don't do this.
> 
> We're not in the dark ages any more. We don't do crazy locale-specific
> crud. There is no such thing as "latin1" any more in any valid model.
> 
> For example, it is true that 0xC4 is 'Ä' in Latin1, and that the
> lower-case version is 'ä', and you can do the lower-casing exactly the
> same way as you do for US-ASCII: you just set bit 5 (or "add 32" or
> "subtract 0xE0" - the latter is what you seem to do, crazy as it is).
> 
> So the above was fine back in the 80s, and questionably correct in the
> 90s, but it is COMPLETE GARBAGE to do this in the year 2023.

Yeah, I get that.  Fifteen years ago, Barry Naujok and Christoph merged
this weird ascii-ci feature for XFS that purportedly does ... something.
It clearly only works properly if you force userspace to use latin1,
which is totally nuts in 2023 given that the distros default to UTF8
and likely don't test anything else.  It probably wasn't even a good
idea in *2008*, but it went in anyway.  Nobody tested this feature,
metadump breaks with this feature enabled, but as maintainer I get to
maintain these poorly designed half baked projects.

I wouldn't ever enable this feature on any computer I use, and I think
the unicode case-insensitive stuff that's been put in to ext4 and f2fs
lately are not a tarpit that I ever want to visit in XFS.  Directory
names should be sequences of bytes that don't include nulls or slashes,
end of story.

Frankly, I'd rather just drop the entire ascii-ci feature from XFS.  I
doubt anyone's really using it given that xfs_repair will corrupt the
filesystem.  This patch encodes the isupper/tolower code from the
kernel's lib/ctype.c into libxfs so that repair will stop corrupting
these filesystems, nothing more.

We're not introducing any new functionality, just making stupid code
less broken.  I wasn't around let alone maintainer when this feature was
committed, and I wouldn't let it in now.  I am, however, the stuckee who
has to clean up all this shit in the least impactful way I can devise.

Can we instead simply drop ascii-ci support from Linux 6.3i and abruptly
break all those filesystems?  Even though we're past -rc5 now?  That
would make all of this baloney go away, at a cost of breaking userspace.

> Because 'Ä' today is *not* 0xC4. It is "0xC3 0x84" (in the sanest
> simplest form), and your crazy "tolower" will turn that into "0xE3
> 0x84", and that not only is no longer 'ä', it's not even valid UTF-8
> any  more.
> 
> I realize that filesystem people really don't get this, but
> case-insensitivity is pure and utter CRAP. Really. You *cannot* do
> case sensitivity well. It's impossible. It's either locale-dependent,
> or you have to have translation models for Unicode characters that are
> horrifically slow and even then you *will* get it wrong, because you
> will start asking questions about normalization forms, and the end
> result is an UNMITIGATED DISASTER.

Agreed!

> I wish filesystem people just finally understood this.  FAT was not a
> good filesystem.  HFS+ is garbage. And any network filesystem that
> does this needs to pass locale information around and do it per-mount,
> not on disk.
> 
> Because you *will* get it wrong. It's that simple. The only sane model
> these days is Unicode, and the only sane encoding for Unicode is
> UTF-8, but even given those objectively true facts, you have
> 
>  (a) people who are going to use some internal locale, because THEY
> HAVE TO. Maybe they have various legacy things, and they use Shift-JIS
> or Latin1, and they really treat filenames that way.
> 
>  (b) you will have people who disagree about normal forms. NFC is the
> only sane case, but you *will* have people who use other forms. OS X
> got this completely wrong, and it causes real issues.
> 
>  (c) you'll find that "lower-case" isn't even well-defined for various
> characters (the typical example is German 'ß', but there are lots of
> them)
> 
>  (d) and then you'll hit the truly crazy cases with "what about
> compatibility equivalence". You'll find that even in English with NBSP
> vs regular SPACE, but it gets crazy.
> 
> End result: the only well-defined area is US-ASCII. Nothing else is
> even *remotely* clear. Don't touch it. You *will* screw up.
> 
> Now, if you *only* use this for hashing, maybe you will feel like "you
> will screw up" is not such a big deal.
> 
> But people will wonder why the file 'Björn' does not compare equal to
> the file 'BJÖRN' when in a sane locale, but then *does* compare equal
> if they happen to use a legacy Latin1 one.
> 
> So no. Latin1 isn't that special, and if you special-case them, you
> *will* screw up other locales.
> 
> The *only* situation where 'tolower()' and 'toupper()' are valid is
> for US-ASCII.
> 
> And when you compare to glibc, you only compare to "some random locale
> that happens to be active rigth n ow". Something that the kernel
> itself cannot and MUST NOT do.

What then is the point of having tolower in the kernel at all?

--D

>                 Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-04 18:32     ` Darrick J. Wong
@ 2023-04-04 18:58       ` Linus Torvalds
  2023-04-04 23:30       ` Dave Chinner
  2023-04-05  6:12       ` Christoph Hellwig
  2 siblings, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2023-04-04 18:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david

On Tue, Apr 4, 2023 at 11:32 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> >
> > And when you compare to glibc, you only compare to "some random locale
> > that happens to be active rigth n ow". Something that the kernel
> > itself cannot and MUST NOT do.
>
> What then is the point of having tolower in the kernel at all?

It's perfectly fine for US-ASCII. So together with 'isascii()' is is just fine.

Now, if you ask me why the data itself isn't then just limited to
US-ASCII, I can only say "history and bad drugs".

The Linux tolower() goes back to Linux-0.01, and my original version
actually got this right, and left all the upper 128 characters as 0 in
the _ctype[] array.

But then at some point, we failed at life, and started filling in the
upper bit cases too.

Looking around, it was at Linux-2.0.1, back in 1996. It's way before
we had good changelogs, so I can't really say *why* we did that
change, but I do believe that bad taste was involved.

But at least it was *somewhat* reasonable to do a Latin1-based ctype
back in 1996:

  --- v2.0.0/linux/lib/ctype.c Mon Nov 27 15:53:48 1995
  +++ linux/lib/ctype.c Tue Jul  2 19:08:43 1996

I would not object to going back to the proper US-ASCII only version
today, but I fear that we might have a lot of subtle legacy use ;(

                   Linus

PS Heh, and now that I look at my original ctype.h, find the bug.
Clearly that wasn't *used*:

  #define tolower(c) (_ctmp=c,isupper(_ctmp)?_ctmp+('a'+'A'):_ctmp)
  #define toupper(c) (_ctmp=c,islower(_ctmp)?_ctmp+('A'-'a'):_ctmp)

and they weren't fixed until 0.11 - probably because nothing actually used them.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace
  2023-04-04 18:19   ` Linus Torvalds
@ 2023-04-04 20:21     ` Linus Torvalds
  2023-04-04 21:00       ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2023-04-04 20:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, xfs

On Tue, Apr 4, 2023 at 11:19 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Limiting yourself to US-ASCII is at least technically valid. Because
> EBCDIC isn't worth worrying about.  But when the high bit is set, you
> had better not touch it, or you need to limit it spectacularly.

Side note: if limiting it to US-ASCII is fine (and it had better be,
because as mentioned, anything else will result in unresolvable
problems), you might look at using this as the pre-hash function:

    unsigned char prehash(unsigned char c)
    {
        unsigned char mask = (~(c >> 1) & c & 64) >> 1;
        return c & ~mask;
    }

which does modify a few other characters too, but nothing that matters
for hashing.

The advantage of the above is that you can trivially vectorize it. You
can do it with just regular integer math (64 bits = 8 bytes in
parallel), no need to use *actual* vector hardware.

The actual comparison needs to do the careful thing (because '~' and
'^' may hash to the same value, but obviously aren't the same), but
even there you can do a cheap "are these 8 characters _possibly_ the
same) with a very simple single 64-bit comparison, and only go to the
careful path if things match, ie

    /* Cannot possibly be equal even case-insentivitely? */
    if ((word1 ^ word2) & ~0x2020202020202020ul)
        continue;
    /* Ok, same in all but the 5th bits, go be careful */
    ....

and the reason I mention this is because I have been idly thinking
about supporting case-insensitivity at the VFS layer for multiple
decades, but have always decided that it's *so* nasty that I really
was hoping it just is never an issue in practice.

Particularly since the low-level filesystems then inevitably decide
that they need to do things wrong and need a locale, and at that point
all hope is lost.

I was hoping xfs would be one of the sane filesystems.

               Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] xfs: test the ascii case-insensitive hash
  2023-04-04 18:06   ` Linus Torvalds
@ 2023-04-04 20:51     ` Darrick J. Wong
  2023-04-04 21:21       ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-04 20:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-xfs, david, Christoph Hellwig

[now adding hch because his name is on the original patches from 2008]

On Tue, Apr 04, 2023 at 11:06:27AM -0700, Linus Torvalds wrote:
> On Tue, Apr 4, 2023 at 10:07 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> \> Now that we've made kernel and userspace use the same tolower code for
> > computing directory index hashes, add that to the selftest code.
> 
> Please just delete this test. It's really really fundamentally wrong.
> 
> The fact that you even *think* that you use the same tolower() as user
> space does shows just that you don't even understand how user space
> works.

Wrong.  I'm well aware that userspace tolower and kernel tolower are
*NOT* the same thing.  I'm trying to **STOP USING** tolower in XFS.

I'm **replacing** tolower with a new function that substitutes specific
bytes with other bytes, and I'm redefining the ondisk format to say that
names remain arbitrary sequences of bytes that do not include nulls or
slashes but that hash indexing and lookups will apply the above byte
transformation.  The rules for that transformation may or may not
coincide with what anyone thinks is "upper case in ASCII", but that's
irrelevant.  The rules will be the same in all places, and they will not
break any existing filesystems.

Maybe I should've named it xfs_ItB8n_ci_o43jM28() to make it clearer
that I don't care what ascii is, nor does it matter here.  Perhaps I
should have written "Now that we've made kernel and userspace perform
the same mathematical transformation on dirent name byte arrays before
hashing, add that to the selftest code as well."

Christoph and Barry Naujok should have defined specifically the exact
transformation and the permitted ranges of name inputs when they wrote
the feature.  I wasn't around when this feature was invented...

> Really. The only thing this series shows is that you do not understand
> the complexities.

...and I don't think they understood the complexities when the code was
written.

> Lookie here: compile and run this program:
> 
>     #include <stdio.h>
>     #include <ctype.h>
>     #include <locale.h>
> 
>     int main(int argc, char **argv)
>     {
>         printf("tolower(0xc4)=%#x\n", tolower(0xc4));
>         setlocale(LC_ALL, "C");
>         printf("tolower(0xc4)=%#x\n", tolower(0xc4));
>         setlocale(LC_ALL, "sv_SE.iso88591");
>         printf("tolower(0xc4)=%#x\n", tolower(0xc4));
>     }
> 
> and on my machine, I get this:
> 
>     tolower(0xc4)=0xc4
>     tolower(0xc4)=0xc4
>     tolower(0xc4)=0xe4
> 
> and the important thing to note is that "on my machine". The first
> line could be *different* on some other machine (and the last line
> could be too: there's no guarantee that the sv_SE locale even exists).
> 
> So this whole "kernel and userspace use the same tolower code"
> sentence is simply completely and utterly wrong. It's not even "wrong"
> in the sense fo "that's not true". It's "wrong" in the sense "that
> shows that you didn't understand the problem at all".
> 
> Put another way: saying "5+8=10" is wrong. But saying "5+8=tiger" is
> nonsensical.
> 
> Your patches are nonsensical.

I disagree.  I'm saying that 5 💩 8 = tiger because that's what the 💩
operator does.  💩 is not +, even if 4 💩 8 = 12.

You claim to understand the complexities; how would /you/ fix this?
I'll send along the test cases that reproduce the problems.

--D

>                Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace
  2023-04-04 20:21     ` Linus Torvalds
@ 2023-04-04 21:00       ` Darrick J. Wong
  2023-04-04 21:50         ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-04 21:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: david, xfs

On Tue, Apr 04, 2023 at 01:21:25PM -0700, Linus Torvalds wrote:
> On Tue, Apr 4, 2023 at 11:19 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Limiting yourself to US-ASCII is at least technically valid. Because
> > EBCDIC isn't worth worrying about.  But when the high bit is set, you
> > had better not touch it, or you need to limit it spectacularly.
> 
> Side note: if limiting it to US-ASCII is fine (and it had better be,
> because as mentioned, anything else will result in unresolvable
> problems), you might look at using this as the pre-hash function:
> 
>     unsigned char prehash(unsigned char c)
>     {
>         unsigned char mask = (~(c >> 1) & c & 64) >> 1;
>         return c & ~mask;
>     }
> 
> which does modify a few other characters too, but nothing that matters
> for hashing.
> 
> The advantage of the above is that you can trivially vectorize it. You
> can do it with just regular integer math (64 bits = 8 bytes in
> parallel), no need to use *actual* vector hardware.
> 
> The actual comparison needs to do the careful thing (because '~' and
> '^' may hash to the same value, but obviously aren't the same), but
> even there you can do a cheap "are these 8 characters _possibly_ the
> same) with a very simple single 64-bit comparison, and only go to the
> careful path if things match, ie
> 
>     /* Cannot possibly be equal even case-insentivitely? */
>     if ((word1 ^ word2) & ~0x2020202020202020ul)
>         continue;
>     /* Ok, same in all but the 5th bits, go be careful */
>     ....
> 
> and the reason I mention this is because I have been idly thinking
> about supporting case-insensitivity at the VFS layer for multiple
> decades, but have always decided that it's *so* nasty that I really
> was hoping it just is never an issue in practice.

If it were up to me I'd invent a userspace shim fs that would perform
whatever normalizations are desired, and pass that (and ideally a lookup
hash) to the underlying kernel/fs.  Users can configure whatever
filtering they want and set LC_ALL as they please, and we kernel
developers never have to know, and the users never have to see what
actually gets written to disk.  If users want normalized ci lookups, the
shim can do that.

ext4 tried to do better than XFS by actually defining the mathematical
transformation that would be applied to incoming names and refusing
things that would devolve into brokenness, but then it turns out that it
was utf8_data.c.  Urgh.

I get it, shi* fses are not popular and are not fast, but if the Samba
benchmarks are still valid, multiple kernel<->fuserspace transitions are
still faster that their workaround.

> Particularly since the low-level filesystems then inevitably decide
> that they need to do things wrong and need a locale, and at that point
> all hope is lost.
> 
> I was hoping xfs would be one of the sane filesystems.

Hah, nope, I'm all out of sanity here. :(

--D

>                Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] xfstests: add a couple more tests for ascii-ci problems
  2023-04-04 17:07 [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace Darrick J. Wong
                   ` (3 preceding siblings ...)
  2023-04-04 17:17 ` [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace Darrick J. Wong
@ 2023-04-04 21:09 ` Darrick J. Wong
  4 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-04 21:09 UTC (permalink / raw)
  To: torvalds; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Add some tests to make sure that userspace and the kernel actually
agree on how to do ascii case-insensitive directory lookups, and that
metadump can actually obfuscate such filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/859     |   63 ++++++++++++++++++++++++++++++++++
 tests/xfs/859.out |   24 +++++++++++++
 tests/xfs/860     |   99 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/860.out |    9 +++++
 tests/xfs/861     |   90 ++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/861.out |    2 +
 6 files changed, 287 insertions(+)
 create mode 100755 tests/xfs/859
 create mode 100644 tests/xfs/859.out
 create mode 100755 tests/xfs/860
 create mode 100644 tests/xfs/860.out
 create mode 100755 tests/xfs/861
 create mode 100644 tests/xfs/861.out

diff --git a/tests/xfs/859 b/tests/xfs/859
new file mode 100755
index 0000000000..da6c70632c
--- /dev/null
+++ b/tests/xfs/859
@@ -0,0 +1,63 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2023 Oracle.  All Rights Reserved.
+#
+# FS QA Test 859
+#
+# Make sure that the kernel and userspace agree on which byte sequences are
+# ASCII uppercase letters, and how to convert them.
+#
+. ./common/preamble
+_begin_fstest auto ci dir
+
+# Override the default cleanup function.
+# _cleanup()
+# {
+# 	cd /
+# 	rm -r -f $tmp.*
+# }
+
+# Import common functions.
+. ./common/filter
+
+_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: stabilize the tolower function used for ascii-ci dir hash computation"
+_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: use the directory name hash function for dir scrubbing"
+
+_supported_fs xfs
+_require_scratch
+
+_scratch_mkfs -n version=ci > $seqres.full
+_scratch_mount
+
+# Create a two-block directory to force leaf format
+mkdir "$SCRATCH_MNT/lol"
+touch "$SCRATCH_MNT/lol/autoexec.bat"
+i=0
+dblksz=$(_xfs_get_dir_blocksize "$SCRATCH_MNT")
+nr_dirents=$((dblksz * 2 / 256))
+
+for ((i = 0; i < nr_dirents; i++)); do
+	name="$(printf "y%0254d" $i)"
+	ln "$SCRATCH_MNT/lol/autoexec.bat" "$SCRATCH_MNT/lol/$name"
+done
+
+dirsz=$(stat -c '%s' $SCRATCH_MNT/lol)
+test $dirsz -gt $dblksz || echo "dir size $dirsz, expected at least $dblksz?"
+stat $SCRATCH_MNT/lol >> $seqres.full
+
+# Create names with extended ascii characters in them to exploit the fact
+# that the Linux kernel will transform extended ASCII uppercase characters
+# but libc won't.  Need to force LANG=C here so that awk doesn't spit out utf8
+# sequences.
+old_lang="$LANG"
+LANG=C
+awk 'END { for (i = 192; i < 247; i++) printf("%c\n", i); }' < /dev/null | while read name; do
+	ln "$SCRATCH_MNT/lol/autoexec.bat" "$SCRATCH_MNT/lol/$name" 2>&1 | _filter_scratch
+done
+
+LANG=$old_lang
+
+# Now just let repair run
+
+status=0
+exit
diff --git a/tests/xfs/859.out b/tests/xfs/859.out
new file mode 100644
index 0000000000..a4939ba670
--- /dev/null
+++ b/tests/xfs/859.out
@@ -0,0 +1,24 @@
+QA output created by 859
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\340': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\341': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\342': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\343': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\344': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\345': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\346': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\347': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\350': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\351': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\352': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\353': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\354': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\355': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\356': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\357': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\360': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\361': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\362': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\363': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\364': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\365': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\366': File exists
diff --git a/tests/xfs/860 b/tests/xfs/860
new file mode 100755
index 0000000000..524d0cfa95
--- /dev/null
+++ b/tests/xfs/860
@@ -0,0 +1,99 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2023 Oracle.  All Rights Reserved.
+#
+# FS QA Test 860
+#
+# Make sure that metadump obfuscation works for filesystems with ascii-ci
+# enabled.
+#
+. ./common/preamble
+_begin_fstest auto dir ci
+
+_cleanup()
+{
+      cd /
+      rm -r -f $tmp.* $testdir
+}
+
+_fixed_by_git_commit xfsprogs XXXXXXXXXXX "xfs_db: fix metadump name obfuscation for ascii-ci filesystems"
+
+_supported_fs xfs
+_require_test
+_require_scratch
+
+_scratch_mkfs -n version=ci > $seqres.full
+_scratch_mount
+
+# Create a two-block directory to force leaf format
+mkdir "$SCRATCH_MNT/lol"
+touch "$SCRATCH_MNT/lol/autoexec.bat"
+i=0
+dblksz=$(_xfs_get_dir_blocksize "$SCRATCH_MNT")
+nr_dirents=$((dblksz * 2 / 256))
+
+for ((i = 0; i < nr_dirents; i++)); do
+	name="$(printf "y%0254d" $i)"
+	ln "$SCRATCH_MNT/lol/autoexec.bat" "$SCRATCH_MNT/lol/$name"
+done
+
+dirsz=$(stat -c '%s' $SCRATCH_MNT/lol)
+test $dirsz -gt $dblksz || echo "dir size $dirsz, expected at least $dblksz?"
+stat $SCRATCH_MNT/lol >> $seqres.full
+
+# Create a two-block attr to force leaf format
+i=0
+for ((i = 0; i < nr_dirents; i++)); do
+	name="$(printf "user.%0250d" $i)"
+	$SETFATTR_PROG -n "$name" -v 1 "$SCRATCH_MNT/lol/autoexec.bat"
+done
+stat $SCRATCH_MNT/lol/autoexec.bat >> $seqres.full
+
+_scratch_unmount
+
+testdir=$TEST_DIR/$seq.metadumps
+mkdir -p $testdir
+metadump_file=$testdir/scratch.md
+metadump_file_a=${metadump_file}.a
+metadump_file_o=${metadump_file}.o
+metadump_file_ao=${metadump_file}.ao
+
+echo metadump
+_scratch_xfs_metadump $metadump_file >> $seqres.full
+
+echo metadump a
+_scratch_xfs_metadump $metadump_file_a -a >> $seqres.full
+
+echo metadump o
+_scratch_xfs_metadump $metadump_file_o -o >> $seqres.full
+
+echo metadump ao
+_scratch_xfs_metadump $metadump_file_ao -a -o >> $seqres.full
+
+echo mdrestore
+_scratch_xfs_mdrestore $metadump_file
+_scratch_mount
+_check_scratch_fs
+_scratch_unmount
+
+echo mdrestore a
+_scratch_xfs_mdrestore $metadump_file_a
+_scratch_mount
+_check_scratch_fs
+_scratch_unmount
+
+echo mdrestore o
+_scratch_xfs_mdrestore $metadump_file_o
+_scratch_mount
+_check_scratch_fs
+_scratch_unmount
+
+echo mdrestore ao
+_scratch_xfs_mdrestore $metadump_file_ao
+_scratch_mount
+_check_scratch_fs
+_scratch_unmount
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/860.out b/tests/xfs/860.out
new file mode 100644
index 0000000000..136fc5f7d6
--- /dev/null
+++ b/tests/xfs/860.out
@@ -0,0 +1,9 @@
+QA output created by 860
+metadump
+metadump a
+metadump o
+metadump ao
+mdrestore
+mdrestore a
+mdrestore o
+mdrestore ao
diff --git a/tests/xfs/861 b/tests/xfs/861
new file mode 100755
index 0000000000..7b0a37a3f1
--- /dev/null
+++ b/tests/xfs/861
@@ -0,0 +1,90 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2023 Oracle.  All Rights Reserved.
+#
+# FS QA Test 861
+#
+# Make sure that the kernel and utilities can handle large numbers of dirhash
+# collisions in both the directory and extended attribute structures.
+#
+# This started as a regression test for the new 'hashcoll' function in xfs_db,
+# but became a regression test for an xfs_repair bug affecting hashval checks
+# applied to the second and higher node levels of a dabtree.
+#
+. ./common/preamble
+_begin_fstest auto dir
+
+_fixed_by_git_commit xfsprogs b7b81f336ac "xfs_repair: fix incorrect dabtree hashval comparison"
+
+_supported_fs xfs
+_require_xfs_db_command "hashcoll"
+_require_xfs_db_command "path"
+_require_scratch
+
+_scratch_mkfs > $seqres.full
+_scratch_mount
+
+crash_dir=$SCRATCH_MNT/lol/
+crash_attrs=$SCRATCH_MNT/hah
+
+mkdir -p "$crash_dir"
+touch "$crash_attrs"
+
+# Create enough dirents to fill two dabtree node blocks with names that all
+# hash to the same value.  Each dirent gets its own record in the dabtree,
+# so we must create enough dirents to get a dabtree of at least height 2.
+dblksz=$(_xfs_get_dir_blocksize "$SCRATCH_MNT")
+
+da_records_per_block=$((dblksz / 8))	# 32-bit hash and 32-bit before
+nr_dirents=$((da_records_per_block * 2))
+
+longname="$(mktemp --dry-run "$(perl -e 'print "X" x 255;')" | tr ' ' 'X')"
+echo "creating $nr_dirents dirents from '$longname'" >> $seqres.full
+_scratch_xfs_db -r -c "hashcoll -n $nr_dirents -p $crash_dir $longname"
+
+# Create enough xattrs to fill two dabtree nodes.  Each attribute leaf block
+# gets its own record in the dabtree, so we have to create enough attr blocks
+# (each full of attrs) to get a dabtree of at least height 2.
+blksz=$(_get_block_size "$SCRATCH_MNT")
+
+attr_records_per_block=$((blksz / 255))
+da_records_per_block=$((blksz / 8))	# 32-bit hash and 32-bit before
+nr_attrs=$((da_records_per_block * attr_records_per_block * 2))
+
+longname="$(mktemp --dry-run "$(perl -e 'print "X" x 249;')" | tr ' ' 'X')"
+echo "creating $nr_attrs attrs from '$longname'" >> $seqres.full
+_scratch_xfs_db -r -c "hashcoll -a -n $nr_attrs -p $crash_attrs $longname"
+
+_scratch_unmount
+
+# Make sure that there's one hash value dominating the dabtree block.
+# We don't require 100% because directories create dabtree records for dot
+# and dotdot.
+filter_hashvals() {
+	uniq -c | awk -v seqres_full="$seqres.full" \
+		'{print $0 >> seqres_full; tot += $1; if ($1 > biggest) biggest = $1;} END {if (biggest >= (tot - 2)) exit(0); exit(1);}'
+	test "${PIPESTATUS[1]}" -eq 0 || \
+		echo "Scattered dabtree hashes?  See seqres.full"
+}
+
+# Did we actually get a two-level dabtree for the directory?  Does it contain a
+# long run of hashes?
+echo "dir check" >> $seqres.full
+da_node_block_offset=$(( (2 ** 35) / blksz ))
+dir_db_args=(-c 'path /lol/' -c "dblock $da_node_block_offset" -c 'addr nbtree[0].before')
+dir_count="$(_scratch_xfs_db "${dir_db_args[@]}" -c 'print lhdr.count' | awk '{print $3}')"
+_scratch_xfs_db "${dir_db_args[@]}" -c "print lents[0-$((dir_count - 1))].hashval" | sed -e 's/lents\[[0-9]*\]/lents[NN]/g' | filter_hashvals
+
+# Did we actually get a two-level dabtree for the attrs?  Does it contain a
+# long run of hashes?
+echo "attr check" >> $seqres.full
+attr_db_args=(-c 'path /hah' -c "ablock 0" -c 'addr btree[0].before')
+attr_count="$(_scratch_xfs_db "${attr_db_args[@]}" -c 'print hdr.count' | awk '{print $3}')"
+_scratch_xfs_db "${attr_db_args[@]}" -c "print btree[0-$((attr_count - 1))].hashval" | sed -e 's/btree\[[0-9]*\]/btree[NN]/g' | filter_hashvals
+
+# Remount to get some coverage of xfs_scrub before seeing if xfs_repair
+# will trip over the large dabtrees.
+echo Silence is golden
+_scratch_mount
+status=0
+exit
diff --git a/tests/xfs/861.out b/tests/xfs/861.out
new file mode 100644
index 0000000000..d11b76c82e
--- /dev/null
+++ b/tests/xfs/861.out
@@ -0,0 +1,2 @@
+QA output created by 861
+Silence is golden

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] xfs: test the ascii case-insensitive hash
  2023-04-04 20:51     ` Darrick J. Wong
@ 2023-04-04 21:21       ` Linus Torvalds
  2023-04-05  6:15         ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2023-04-04 21:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, Christoph Hellwig

On Tue, Apr 4, 2023 at 1:51 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> Wrong.  I'm well aware that userspace tolower and kernel tolower are
> *NOT* the same thing.  I'm trying to **STOP USING** tolower in XFS.

Ok, so you're not actually talking about "userspace tolower()".

You are actually talking about just the same function in xfstools and
the kernel, and neither is really "tolower()", but is a
"xfs_hashprep()".

Fair enough. That works. I still think it should be made to be
US-ASCII only, in order to not have any strange oddities.

Do you really have to support that "pseudo-latin1" thing? If it's
literally just a "xfs_hashprep()" function, and you arbitrarily pick
one random function, why make it be a known-broken one?

            Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace
  2023-04-04 21:00       ` Darrick J. Wong
@ 2023-04-04 21:50         ` Linus Torvalds
  0 siblings, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2023-04-04 21:50 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: david, xfs

On Tue, Apr 4, 2023 at 2:00 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> If it were up to me I'd invent a userspace shim fs that would perform
> whatever normalizations are desired, and pass that (and ideally a lookup
> hash) to the underlying kernel/fs.

No, really.

It's been done.

It sucks.

Guess what Windows has been doing for *decades* because of their
ass-backwards idiotic filesystem interfaces? Pretty much exactly that.

They had this disgusting wide-char thing for the native filename
interface, historically using something like UCS-2LE (sometimes called
UTF-16), and then people convert that back and forth in random ways.
Mostly in libraries so that it's mostly invisible to you, but

Almost exactly like that "If it were up to you".

It's BS. It's incredibly bad. It's taken them decades to get away from
that mistake (I _think_ the UCS-2 interfaces are all considered
legacy)

What Windows used to do (again: I _think_ they've gotten over it), was
to have those special native wchat_t interfaces like _wfopen(), and
then when you do a regular "fopen()" call it converts the "normal"
strings to the whcar_t interface for the actual system call. IOW,
doing exactly that "shim fs" thing wrt filenames.

Sprinkle special byte-order markers in there for completeness (because
UCS-2 was a mindblowingly bad idea along with UCS-4), and add ways to
pick locale on a per-call basis, and you have *really* completed the
unholy mess.

No. Don't do there.

There is absolutely *one* right way, and one right way only: bytes are
bytes. No locale crap, and no case insensitivity.

It avoids *all* the problems, doesn't need any silly conversions, and
just *works*.

Absolutely nobody sane actually wants case insensitivity. A byte is a
byte is a byte, and your pathname is a pure byte stream.

Did people learn *nothing* from that fundamental Unix lesson?  We had
all kinds of crap filesystems before unix, with 8.3 filenames, or
record-based file contents, or all kinds of stupid ideas.

Do you want your file contents to have a "this is Japanese" marker and
special encoding? No? Then why would you want your pathnames to do
that?

Just making it a byte stream avoids all of that.

Yes, we have special characters (notably NUL and '/'), and we have a
couple of special byte stream entries ("." and ".."), but those are
all solidly unambiguous. We can take US-ASCII as a given.

And once it's a byte stream, you can use it any way you want.

Now, the *sane* way is to then use UTF-8 on top of that byte stream,
and avoid all locale issues, but *if* some user space wants to treat
those bytes as Latin1 or as Shift-JS, it still "works" for them.

This is also the reason why a filesystem *MUST NOT* think the byte
stream is UTF-8, and do some kind of unicode normalization, and reject
- or replace - byte sequences that aren't valid UTF-8.

Thinking names should have some record-based structure is as wrong as
thinking that file contents should be record-based.

And *no*, you should not have some kind of "standardized translation
layer" in user space - that's just completely unnecessary overhead for
any actual sane user. It's exactly the wrong thing to do.

Others have been there, done that. Learn from their mistakes.

           Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-04 18:32     ` Darrick J. Wong
  2023-04-04 18:58       ` Linus Torvalds
@ 2023-04-04 23:30       ` Dave Chinner
  2023-04-05  0:17         ` Linus Torvalds
  2023-04-05  6:12       ` Christoph Hellwig
  2 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2023-04-04 23:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Linus Torvalds, linux-xfs

On Tue, Apr 04, 2023 at 11:32:14AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 04, 2023 at 10:54:27AM -0700, Linus Torvalds wrote:
> > On Tue, Apr 4, 2023 at 10:07 AM Darrick J. Wong <djwong@kernel.org> wrote:
> > >
> > > +       if (c >= 0xc0 && c <= 0xd6)     /* latin A-O with accents */
> > > +               return true;
> > > +       if (c >= 0xd8 && c <= 0xde)     /* latin O-Y with accents */
> > > +               return true;
> > 
> > Please don't do this.
> > 
> > We're not in the dark ages any more. We don't do crazy locale-specific
> > crud. There is no such thing as "latin1" any more in any valid model.
> > 
> > For example, it is true that 0xC4 is 'Ä' in Latin1, and that the
> > lower-case version is 'ä', and you can do the lower-casing exactly the
> > same way as you do for US-ASCII: you just set bit 5 (or "add 32" or
> > "subtract 0xE0" - the latter is what you seem to do, crazy as it is).
> > 
> > So the above was fine back in the 80s, and questionably correct in the
> > 90s, but it is COMPLETE GARBAGE to do this in the year 2023.
> 
> Yeah, I get that.  Fifteen years ago, Barry Naujok and Christoph merged
> this weird ascii-ci feature for XFS that purportedly does ... something.
> It clearly only works properly if you force userspace to use latin1,
> which is totally nuts in 2023 given that the distros default to UTF8
> and likely don't test anything else.  It probably wasn't even a good
> idea in *2008*, but it went in anyway.  Nobody tested this feature,
> metadump breaks with this feature enabled, but as maintainer I get to
> maintain these poorly designed half baked projects.

It was written specifically for a NFS/CIFS fileserver appliance
product and Samba needed filesystem-side CI to be able to perform
even vaguely well on industry-standard fileserver benchmarketing
workloads that were all the rage at the time.

Because of the inherent problems with CI and UTF-8 encoding, etc, it
was decided that Samba would be configured to export latin1
encodings as that covered >90% of the target markets for the
product. Hence the "ascii-ci" casefolding code could be encoded into
the XFS directory operations and remove all the overhead of
casefolding from Samba.

In various "important" directory benchmarketing workloads, ascii-ci
resulted in speedups of 100-1000x. These were competitive results
comapred to the netapp/bluearc/etc appliances of the time in the
same cost bracket.  Essentially, it was a special case solution to
meet a specific product requirement and was never intended to be
used outside that specific controlled environment.

Realistically, this is the one major downside of "upstream first"
development principle.  i.e. when the vendor product that required
a specific feature is long gone, upstream still has to support that
functionality even though there may be no users of it remaining
and/or no good reason for it continuing to exist.

I'd suggest that after this is fixed we deprecate ascii-ci and make
it go away at the same time v4 filesystems go away. It was, after
all, a feature written for v4 filesystems....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-04 23:30       ` Dave Chinner
@ 2023-04-05  0:17         ` Linus Torvalds
  0 siblings, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2023-04-05  0:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs

On Tue, Apr 4, 2023 at 4:30 PM Dave Chinner <david@fromorbit.com> wrote:
>
> Because of the inherent problems with CI and UTF-8 encoding, etc, it
> was decided that Samba would be configured to export latin1
> encodings as that covered >90% of the target markets for the
> product. Hence the "ascii-ci" casefolding code could be encoded into
> the XFS directory operations and remove all the overhead of
> casefolding from Samba.

Ahh. Ugh. All right, that at least explains the code.

Maybe this is documented somewhere, but I'm not finding anything
talking about that latin1 choice.

It may well have been just so obvious back in the day that people
didn't even bother saying so, and it's presumably close to what the
in-kernel tolower() has mostly done (even if that is probably a bad
thing).

I *would* ask that the xfs function that does "tolower" really then be
named "latin1". Not "ascii".

I'm not loving it, but hey, compatibility is certainly always a good reason.

               Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-04 18:32     ` Darrick J. Wong
  2023-04-04 18:58       ` Linus Torvalds
  2023-04-04 23:30       ` Dave Chinner
@ 2023-04-05  6:12       ` Christoph Hellwig
  2023-04-05 15:40         ` Darrick J. Wong
  2 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2023-04-05  6:12 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Linus Torvalds, linux-xfs, david

On Tue, Apr 04, 2023 at 11:32:14AM -0700, Darrick J. Wong wrote:
> Yeah, I get that.  Fifteen years ago, Barry Naujok and Christoph merged
> this weird ascii-ci feature for XFS that purportedly does ... something.
> It clearly only works properly if you force userspace to use latin1,
> which is totally nuts in 2023 given that the distros default to UTF8
> and likely don't test anything else.  It probably wasn't even a good
> idea in *2008*, but it went in anyway.  Nobody tested this feature,
> metadump breaks with this feature enabled, but as maintainer I get to
> maintain these poorly designed half baked projects.

IIRC the idea was that it should do 7-bit ASCII only, so even accepting
Latin 1 characters seems like a bug compared to what it was documented
to do.

> I wouldn't ever enable this feature on any computer I use, and I think
> the unicode case-insensitive stuff that's been put in to ext4 and f2fs
> lately are not a tarpit that I ever want to visit in XFS.  Directory
> names should be sequences of bytes that don't include nulls or slashes,
> end of story.

That works fine if all you care is classic Linux / Unix users.  And
while I'd prefer if all the world was like that, the utf8 based CI
has real use cases.  Initially mostly for Samba file serving, but
apparently Wine also really benefits from it, so some people have CI
directories for that.  XFS ignoring this means we are missing out on
those usrers.

The irony is all the utf8 infrastruture was developed for XFS use
by SGI, never made it upstream back then and got picked up for ext4.
And while it is objectively horrible, plugging into this actually
working infrastructure would be the right thing for XFS instead
of the whacky ASCII only mode only done as a stepping stone while
the utf8 infrastructure got finished.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] xfs: test the ascii case-insensitive hash
  2023-04-04 21:21       ` Linus Torvalds
@ 2023-04-05  6:15         ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2023-04-05  6:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Darrick J. Wong, linux-xfs, david, Christoph Hellwig

On Tue, Apr 04, 2023 at 02:21:35PM -0700, Linus Torvalds wrote:
> Fair enough. That works. I still think it should be made to be
> US-ASCII only, in order to not have any strange oddities.
> 
> Do you really have to support that "pseudo-latin1" thing? If it's
> literally just a "xfs_hashprep()" function, and you arbitrarily pick
> one random function, why make it be a known-broken one?

Because that's the one that has been used for 15 years as no one would
do this for new code?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-04 17:07 ` [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation Darrick J. Wong
  2023-04-04 17:54   ` Linus Torvalds
@ 2023-04-05 10:48   ` Christoph Hellwig
  2023-04-05 15:30     ` Darrick J. Wong
  1 sibling, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2023-04-05 10:48 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: torvalds, linux-xfs, david

On Tue, Apr 04, 2023 at 10:07:06AM -0700, Darrick J. Wong wrote:
> Which means that the kernel and userspace do not agree on the hash value
> for a directory filename that contains those higher values.  The hash
> values are written into the leaf index block of directories that are
> larger than two blocks in size, which means that xfs_repair will flag
> these directories as having corrupted hash indexes and rewrite the index
> with hash values that the kernel now will not recognize.
> 
> Because the ascii-ci feature is not frequently enabled and the kernel
> touches filesystems far more frequently than xfs_repair does, fix this
> by encoding the kernel's toupper predicate and tolower functions into
> libxfs.  This makes userspace's behavior consistent with the kernel.

I agree with making the userspace behavior consistent with the actual
kernel behavior.  Sadly the documented behavior differs from both
of them, so I think we need to also document the actual tables used
in the mkfs.xfs manpage, as it isn't actually just ASCII.

Does the kernel twolower behavior map to an actual documented charset?
In that case we can just point to it, which would be way either than
documenting all the details.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-05 10:48   ` Christoph Hellwig
@ 2023-04-05 15:30     ` Darrick J. Wong
  2023-04-05 15:45       ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-05 15:30 UTC (permalink / raw)
  To: Christoph Hellwig, david; +Cc: torvalds, linux-xfs

On Wed, Apr 05, 2023 at 03:48:00AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 04, 2023 at 10:07:06AM -0700, Darrick J. Wong wrote:
> > Which means that the kernel and userspace do not agree on the hash value
> > for a directory filename that contains those higher values.  The hash
> > values are written into the leaf index block of directories that are
> > larger than two blocks in size, which means that xfs_repair will flag
> > these directories as having corrupted hash indexes and rewrite the index
> > with hash values that the kernel now will not recognize.
> > 
> > Because the ascii-ci feature is not frequently enabled and the kernel
> > touches filesystems far more frequently than xfs_repair does, fix this
> > by encoding the kernel's toupper predicate and tolower functions into
> > libxfs.  This makes userspace's behavior consistent with the kernel.
> 
> I agree with making the userspace behavior consistent with the actual
> kernel behavior.  Sadly the documented behavior differs from both
> of them, so I think we need to also document the actual tables used
> in the mkfs.xfs manpage, as it isn't actually just ASCII.

Agreed.  Given that kernel tolower() behavior has been stable since 1996
(and remaps the ISO 8859-1 accented letters), the "ASCII CI" feature
most closely maps to "ISO 8859-1 CI".  But at this point there's not
even a shared understanding (Dave said latin1, you said 7-bit ascii,
IDGAF) so I agree that documenting the exact transformations in the
manpage is the only sane way forward.

I propose the changing the mkfs.xfs manpage wording from:

"The version=ci  option  enables  ASCII  only case-insensitive filename
lookup and version 2 directories. Filenames  are  case-preserving, that
is, the names are stored in directories using  the  case  they  were
created with."

into:

"If the version=ci option is specified, the kernel will transform
certain bytes in filenames before performing lookup-related operations.
The byte sequence given to create a directory entry is persisted without
alterations.  The lookup transformations are defined as follows:

0x41 - 0x5a -> 0x61 - 0x7a
0xc0 - 0xd6 -> 0xe0 - 0xf6
0xd8 - 0xde -> 0xf8 - 0xfe

This transformation roughly corresponds to case insensitivity in ISO
8859-1 and may cause problems with other encodings (e.g. UTF8).  The
feature will be disabled by default in September 2025, and removed from
the kernel in September 2030."

> Does the kernel twolower behavior map to an actual documented charset?
> In that case we can just point to it, which would be way either than
> documenting all the details.

It *seems* to operate on ISO 8859-1 (aka latin1), but Linus implied that
the history of lib/ctype.c is lost to the ages.  Or at least 1996-era
mailing list archives.

--D

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-05  6:12       ` Christoph Hellwig
@ 2023-04-05 15:40         ` Darrick J. Wong
  2023-04-05 15:42           ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-05 15:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linus Torvalds, linux-xfs, david

On Tue, Apr 04, 2023 at 11:12:56PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 04, 2023 at 11:32:14AM -0700, Darrick J. Wong wrote:
> > Yeah, I get that.  Fifteen years ago, Barry Naujok and Christoph merged
> > this weird ascii-ci feature for XFS that purportedly does ... something.
> > It clearly only works properly if you force userspace to use latin1,
> > which is totally nuts in 2023 given that the distros default to UTF8
> > and likely don't test anything else.  It probably wasn't even a good
> > idea in *2008*, but it went in anyway.  Nobody tested this feature,
> > metadump breaks with this feature enabled, but as maintainer I get to
> > maintain these poorly designed half baked projects.
> 
> IIRC the idea was that it should do 7-bit ASCII only, so even accepting
> Latin 1 characters seems like a bug compared to what it was documented
> to do.
> 
> > I wouldn't ever enable this feature on any computer I use, and I think
> > the unicode case-insensitive stuff that's been put in to ext4 and f2fs
> > lately are not a tarpit that I ever want to visit in XFS.  Directory
> > names should be sequences of bytes that don't include nulls or slashes,
> > end of story.
> 
> That works fine if all you care is classic Linux / Unix users.  And
> while I'd prefer if all the world was like that, the utf8 based CI
> has real use cases.  Initially mostly for Samba file serving, but
> apparently Wine also really benefits from it, so some people have CI
> directories for that.  XFS ignoring this means we are missing out on
> those usrers.

<shrug> Welllll... if someone presents a strong case for adopting the
utf8 casefolding feature that f2fs and ext4 added some ways back, I
could be persuaded to import that, bugs and all, into XFS.  However,
given all the weird problems I've uncovered with "ascii"-ci, I'm going
to be very hardnosed about adding test cases and making sure /all/ the
tooling works properly.

I wasn't thrilled at all the "Handle invalid UTF8 sequence as either an
error or as an opaque byte sequence." that went into the ext4 code.
While I concede that it's the least-legacy-code-regressive solution to
people demanding to create non-utf8 filenames on a "utf8-casefold"
filesystem, it's just ... compromised.

Really it's "utf8 casefolded lookups if all the names you create are
valid utf8 byte sequences, and if you fail at that then we fall back to
memcmp(); also there's a strict-utf8 creat mode but you can't enable it".

Gross.

> The irony is all the utf8 infrastruture was developed for XFS use
> by SGI, never made it upstream back then and got picked up for ext4.
> And while it is objectively horrible, plugging into this actually
> working infrastructure would be the right thing for XFS instead
> of the whacky ASCII only mode only done as a stepping stone while
> the utf8 infrastructure got finished.

fsdevel, the gift that keeps on giving...

--D

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-05 15:40         ` Darrick J. Wong
@ 2023-04-05 15:42           ` Christoph Hellwig
  2023-04-05 17:10             ` Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2023-04-05 15:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, Linus Torvalds, linux-xfs, david

On Wed, Apr 05, 2023 at 08:40:22AM -0700, Darrick J. Wong wrote:
> <shrug> Welllll... if someone presents a strong case for adopting the
> utf8 casefolding feature that f2fs and ext4 added some ways back, I
> could be persuaded to import that, bugs and all, into XFS.  However,
> given all the weird problems I've uncovered with "ascii"-ci, I'm going
> to be very hardnosed about adding test cases and making sure /all/ the
> tooling works properly.

You'll love this paper:

https://www.usenix.org/conference/fast23/presentation/basu

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-05 15:30     ` Darrick J. Wong
@ 2023-04-05 15:45       ` Linus Torvalds
  0 siblings, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2023-04-05 15:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, david, linux-xfs

On Wed, Apr 5, 2023 at 8:30 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> It *seems* to operate on ISO 8859-1 (aka latin1), but Linus implied that
> the history of lib/ctype.c is lost to the ages.  Or at least 1996-era
> mailing list archives.

I'm pretty sure that the *intent* at the time was Latin1, so if the
table matches that and there aren't any bugs, I think we can just
document it as such.

I just don't know why we decided to actually do the conversion from
7-bit tables to 8-bit ones. That is shrouded in the mists of time, but
it's unquestionably true that we were fairly Latin1-centric back then.

We got better. Slowly.

               Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation
  2023-04-05 15:42           ` Christoph Hellwig
@ 2023-04-05 17:10             ` Darrick J. Wong
  0 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-05 17:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linus Torvalds, linux-xfs, david

On Wed, Apr 05, 2023 at 08:42:34AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 05, 2023 at 08:40:22AM -0700, Darrick J. Wong wrote:
> > <shrug> Welllll... if someone presents a strong case for adopting the
> > utf8 casefolding feature that f2fs and ext4 added some ways back, I
> > could be persuaded to import that, bugs and all, into XFS.  However,
> > given all the weird problems I've uncovered with "ascii"-ci, I'm going
> > to be very hardnosed about adding test cases and making sure /all/ the
> > tooling works properly.
> 
> You'll love this paper:
> 
> https://www.usenix.org/conference/fast23/presentation/basu

I know.

I stick to my earlier statements about "I wouldn't ever enable this
feature on any computer I use..." and "...not a tarpit that I ever want
to visit in XFS."

At one point I had wired up xfs_scrub to complain about filenames that
map to the same casefolded utf8 names to warn syadmins that this could
be used in some sort of unicode casefolding attack.  I pushed it back on
my patch stack and ran it against /home today and got a bunch of stuff
like this from the kernel source tree:

Warning: inode 4187068442 (31/26318874): Case-folded Unicode name "ip6t_hl.h" in directory could be confused with "ip6t_HL.h". (unicrash.c line 614)
Warning: inode 4032422946 (30/5891106): Case-folded Unicode name "ipt_ecn.h" in directory could be confused with "ipt_ECN.h". (unicrash.c line 614)
Warning: inode 4032422946 (30/5891106): Case-folded Unicode name "ipt_ttl.h" in directory could be confused with "ipt_TTL.h". (unicrash.c line 614)
Warning: inode 2285477942 (17/3776566): Case-folded Unicode name "xt_hl.c" in directory could be confused with "xt_HL.c". (unicrash.c line 614)
Warning: inode 2285477942 (17/3776566): Case-folded Unicode name "xt_tcpmss.c" in directory could be confused with "xt_TCPMSS.c". (unicrash.c line 614)
Warning: inode 3627924489 (27/4045833): Case-folded Unicode name "xt_tcpmss.h" in directory could be confused with "xt_TCPMSS.h". (unicrash.c line 614)
Warning: inode 3763353714 (28/5257330): Case-folded Unicode name "ip6t_hl.h" in directory could be confused with "ip6t_HL.h". (unicrash.c line 614)
Warning: inode 3763353714 (28/5257330): Case-folded Unicode name ".ip6t_hl.h.cmd" in directory could be confused with ".ip6t_HL.h.cmd". (unicrash.c line 614)
Warning: inode 7042717 (0/7042717): Case-folded Unicode name ".ipt_ecn.h.cmd" in directory could be confused with ".ipt_ECN.h.cmd". (unicrash.c line 614)
Warning: inode 7042717 (0/7042717): Case-folded Unicode name ".ipt_ttl.h.cmd" in directory could be confused with ".ipt_TTL.h.cmd". (unicrash.c line 614)
Warning: inode 7042718 (0/7042718): Case-folded Unicode name "ip6t_hl.h" in directory could be confused with "ip6t_HL.h". (unicrash.c line 614)
Warning: inode 7042718 (0/7042718): Case-folded Unicode name ".ip6t_hl.h.cmd" in directory could be confused with ".ip6t_HL.h.cmd". (unicrash.c line 614)
Warning: inode 406880264 (3/4227080): Case-folded Unicode name "Z6.0+pooncelock+pooncelock+pombonce.litmus" in directory could be confused with "Z6.0+pooncelock+poonceLock+pombonce.litmus". (unicrash.c line 614)

Yuck.

I never sent this patch to linux-xfs because XFS doesn't do casefolding
so who cares.  The xtables stuff is easy to spot, but that last one took
some staring before I even figured out what was different between the
two names -- lock vs Lock.

--D

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2023-04-05 17:10 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-04 17:07 [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace Darrick J. Wong
2023-04-04 17:07 ` [PATCH 1/3] xfs: stabilize the tolower function used for ascii-ci dir hash computation Darrick J. Wong
2023-04-04 17:54   ` Linus Torvalds
2023-04-04 18:32     ` Darrick J. Wong
2023-04-04 18:58       ` Linus Torvalds
2023-04-04 23:30       ` Dave Chinner
2023-04-05  0:17         ` Linus Torvalds
2023-04-05  6:12       ` Christoph Hellwig
2023-04-05 15:40         ` Darrick J. Wong
2023-04-05 15:42           ` Christoph Hellwig
2023-04-05 17:10             ` Darrick J. Wong
2023-04-05 10:48   ` Christoph Hellwig
2023-04-05 15:30     ` Darrick J. Wong
2023-04-05 15:45       ` Linus Torvalds
2023-04-04 17:07 ` [PATCH 2/3] xfs: test the ascii case-insensitive hash Darrick J. Wong
2023-04-04 18:06   ` Linus Torvalds
2023-04-04 20:51     ` Darrick J. Wong
2023-04-04 21:21       ` Linus Torvalds
2023-04-05  6:15         ` Christoph Hellwig
2023-04-04 17:07 ` [PATCH 3/3] xfs: use the directory name hash function for dir scrubbing Darrick J. Wong
2023-04-04 17:17 ` [PATCHSET 0/3] xfs: fix ascii-ci problems with userspace Darrick J. Wong
2023-04-04 18:19   ` Linus Torvalds
2023-04-04 20:21     ` Linus Torvalds
2023-04-04 21:00       ` Darrick J. Wong
2023-04-04 21:50         ` Linus Torvalds
2023-04-04 21:09 ` [PATCH] xfstests: add a couple more tests for ascii-ci problems Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox