public inbox for linux-fsdevel@vger.kernel.org
From: "Björn JACKE" <bjacke@SerNet.DE>
To: linux-fsdevel@vger.kernel.org
Subject: casefold is using unsuitable case mapping table
Date: Tue, 22 Apr 2025 14:31:41 +0200
Message-ID: <20250422123141.GD855798@sernet.de>

Hi,

I started to experiment with the casefold feature of ext4 and some other
filesystems. I was hoping to get significant performance gains for a Samba
server with large directories.

It turns out, though, that the case-insensitive feature is not usable, because
it does not match the case mapping tables that other operating systems use.
More specifically, the German letter "ß" is treated as a case equivalent of
"ss".
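To illustrate the difference, here is a minimal Python sketch (not the kernel
code). Python's str.casefold() implements Unicode full case folding, which
maps "ß" to "ss", matching the behaviour described above; plain lowercasing
keeps "ß" as it is. lookup_table() is a hypothetical stand-in for a directory
index keyed on the folded name, used only to show where two names collide.

```python
def casefold_key(name: str) -> str:
    # str.casefold() applies Unicode *full* case folding,
    # which maps U+00DF (ß) to the two letters "ss".
    return name.casefold()


def lower_key(name: str) -> str:
    # Plain lowercasing leaves ß alone: no ß/ss equivalence.
    return name.lower()


def lookup_table(names, key):
    # Hypothetical directory index: folded key -> list of on-disk names.
    table = {}
    for n in names:
        table.setdefault(key(n), []).append(n)
    return table


if __name__ == "__main__":
    names = ["Straße", "Strasse"]
    print(lookup_table(names, casefold_key))
    # {'strasse': ['Straße', 'Strasse']}  -> the two names collide
    print(lookup_table(names, lower_key))
    # {'straße': ['Straße'], 'strasse': ['Strasse']}  -> kept apart
```

With full folding, "Straße" and "Strasse" end up under the same key, so a
server relying on that lookup cannot keep both names apart.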

In some other contexts "ß" and "ss" are treated as equivalent; AD LDAP, for
example, treats them that way. Systems that require "lossless" case
conversion, however, must not treat ß and ss as equivalent, because folding
"ß" to "ss" cannot be undone. This is also why a filesystem should never, ever
do that.

Unicode also has a capital form of the "ß" letter (U+00DF): U+1E9E "ẞ" (LATIN
CAPITAL LETTER SHARP S), encoded since Unicode 5.1 and part of official German
orthography since 2017. It could eventually be used, but I haven't seen any
filesystem use it so far. It would be a possible, lossless case equivalent,
but that is really a separate discussion.
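A lossless alternative along these lines could be sketched as follows,
assuming a per-character, one-to-one lowercase mapping. It is approximated here
with Python's str.lower(), keeping any character whose lowercase form would be
longer than one code point. simple_lower() and lossless_key() are hypothetical
names; the point is only that "ẞ" and "ß" compare equal while "ß" and "ss"
stay distinct.

```python
def simple_lower(ch: str) -> str:
    # Approximate a one-to-one (simple) lowercase mapping: use str.lower()
    # only when it yields a single code point, otherwise keep the character.
    low = ch.lower()
    return low if len(low) == 1 else ch


def lossless_key(name: str) -> str:
    # Hypothetical comparison key built from per-character lowercasing.
    # U+1E9E (ẞ) lowers to U+00DF (ß), but ß is never expanded to "ss".
    return "".join(simple_lower(c) for c in name)


if __name__ == "__main__":
    print(lossless_key("STRAẞE") == lossless_key("Straße"))   # True
    print(lossless_key("Straße") == lossless_key("Strasse"))  # False
```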

The important point is to _not_ use the ß/ss case equivalence. Otherwise, the
casefold feature is largely useless.

Can this be changed without causing too much hassle?

Cheers
Björn

Thread overview: 4+ messages
2025-04-22 12:31 Björn JACKE [this message]
2025-04-24 19:53 ` casefold is using unsuitable case mapping table Gabriel Krisman Bertazi
2025-04-25 11:40   ` Björn JACKE
2025-06-09 18:12     ` Gabriel Krisman Bertazi
