Re: [v4 3/3] bcachefs: Implement casefolding

public inbox for linux-bcachefs@vger.kernel.org
 help / color / mirror / Atom feed

From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Joshua Ashton <joshua@froggi.es>
Cc: linux-bcachefs@vger.kernel.org, "André Almeida" <andrealmeid@igalia.com>
Subject: Re: [v4 3/3] bcachefs: Implement casefolding
Date: Mon, 14 Aug 2023 19:01:58 -0400	[thread overview]
Message-ID: <87350l5iu1.fsf@suse.de> (raw)
In-Reply-To: <8ffb410c-ebfc-490c-96d8-8ed72ef8eee2@froggi.es> (Joshua Ashton's message of "Mon, 14 Aug 2023 22:18:33 +0100")

Joshua Ashton <joshua@froggi.es> writes:

> On 8/14/23 17:34, Gabriel Krisman Bertazi wrote:
>>> +
>>> +Another option was to store without the two lengths, and just take the length of
>>> +the regular name and casefolded name contiguously / 2 as the length. This would
>>> +assume that the regular length == casefolded length, but that could potentially
>>> +not be true, if the uppercase unicode glyph had a different UTF-8 encoding than
>>> +the lowercase unicode glyph.
>> Why do you want to store the casefolded name?  In ext4, we just store
>> the case-exact name that was created, and do a code-point by code-point
>> comparison with the name under lookup through utf8_strncasecmp, which is
>> optimized to compare two not casefolded strings.  We just modify the
>> filename hash to be case-insensitive, so we know which dirent to look
>> for during lookup.  Finally, we use d_compare/d_hash to leverage the
>> dcache into finding negative dentries.
>
> Yeah, am aware of how ext4 does it, I just wanted it to have basically
> the exact same performance characteristics as as regular lookup.
>
> The utf8 cursor stuff in utf8_strncasecmp is still super expensive
> compared to a straight memcmp in some local benching.

More expensive than strncasecmp, for sure. But I'm surprised it is
visible in benchmarks.  Of course, unless you are searching different
files at every lookup, once it makes proper use of the dcache it won't
matter much.  There is also utf8_strncasecmp_folded that avoids half of
the trie searches by caching the name-under-lookup.

> I am happy to bung it in the vfs super_block for now though -- but I
> need to look at how that all works out with mounting and the multiple
> drives situation. Just be aware that at some point, it will likely be
> moved back again to our code.

Different drives as in different filesystem volumes? They each should
have one superblock anyway.  Also, someone called my attention to your
v1 commit message in v1 regarding selecting encoding per-directory.  I'm
not sure it is a good idea, since it is definitely problematic once you
start moving files around different subtrees.

Unicode casefolding is only stable for defined code points (of course),
but newer versions might add new code points that previously didn't have
any casefolding.  So, if you have a name that has an invalid casepoint
in an older version (but is legal utf8) and moves it to a new directory
with an encoding version where that folds into something else, you have
a situation where one file occludes the other and all sorts of
unpredictable behavior.

It can happen while copying across disks, of course. But we already
reject many cases of cross-filesystem copying.  But failing to move a
file within the same filesystem is not very user-friendly.

I think the reason you want to support multiple encoding versions is to
enable applications built for different Windows ntfs versions. If that
is the case, you should be safe using the latest unicode disk-wide,
since it is backward compatible.  Take a look also at ext4 and f2fs'
"strict mode", that rejects invalid unicode sequences too.

>> and ->lookup will be called again (perhaps creating another dentry?)
>> There are two ways around it. Rely on the libfs' generic handlers for
>> d_hash/d_compare and soon d_revalidate, or use d_add_ci to search for
>> case-insensitive aliases at lookup time, but then you don't benefit much
>> from the dcache if looking up for different cases.
>
> Thanks for the heads-up on this.
>
> It seems like I may have potentially hit a foot-gun there if we are
> making vfs dentry's for each lookup.
>
> I will investigate and validate that we are doing the right thing here
> for the next revision.
>
> I will not lie, I know veeery little about the vfs or dcache side, so
> huge thanks for pointing this out to me. :)
>
> I had a quick look at what f2fs does here, and it seems like it also
> just does d_splice_alias, but it has the following caveat.
>
> [f2fs code]
> #if IS_ENABLED(CONFIG_UNICODE)
> 	if (!inode && IS_CASEFOLDED(dir)) {
> 		/* Eventually we want to call d_add_ci(dentry, NULL)
> 		 * for negative dentries in the encoding case as
> 		 * well.  For now, prevent the negative dentry
> 		 * from being cached.
> 		 */
> 		trace_f2fs_lookup_end(dir, dentry, ino, err);
> 		return NULL;
> 	}
> #endif
> 	new = d_splice_alias(inode, dentry);
>
> I won't even pretend that I understand what a negative dentry is right
> now... :D

A negative dentry is simply a cache entry saying that a file doesn't
exist in the filesystem, so further lookups don't need to hit the disk
over and over.

The f2fs caveat you mentioned is entirely avoiding the creation of
negative dentries in IS_CASEFOLDED directories, since they aren't
supported currently. I explained a bit the reasoning for that in the
following series, that also remove this code.  It is (hopefully) going
upstream soon:

  https://www.spinics.net/lists/linux-ext4/msg90964.html

My suggestion is that you follow f2fs/ext4 closely.  Avoid d_add_ci, as
it was written for filesystems that deal with case-insensitive
directories in a slight different way, and don't make use good use of
the dcache, in my opinion.  A better approach would be to use
d_compare/d_hash and keep calling d_splice_alias.  Take a look at the
generic handlers in fs/libfs.c too.

-- 
Gabriel Krisman Bertazi

next prev parent reply	other threads:[~2023-08-14 23:02 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-14  1:24 [v4 0/3] bcachefs: Casefolding implementation Joshua Ashton
2023-08-14  1:24 ` [v4 1/3] bcachefs: Split out dirent alloc and name initialization Joshua Ashton
2023-08-14  1:24 ` [v4 2/3] bcachefs: Move dirent_val_u64s to dirent.c Joshua Ashton
2023-08-14  1:24 ` [v4 3/3] bcachefs: Implement casefolding Joshua Ashton
2023-08-14 16:34   ` Gabriel Krisman Bertazi
2023-08-14 21:18     ` Joshua Ashton
2023-08-14 21:33       ` Joshua Ashton
2023-08-14 23:01       ` Gabriel Krisman Bertazi [this message]
2023-08-15  0:58       ` Kent Overstreet
2023-08-15 18:00         ` Gabriel Krisman Bertazi
2023-08-23  9:37           ` Joshua Ashton
2023-08-15 12:36   ` Brian Foster
2023-08-24  5:43     ` Joshua Ashton
2023-08-16 19:40 ` [v4 0/3] bcachefs: Casefolding implementation Kent Overstreet
2023-08-23  9:34   ` Joshua Ashton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87350l5iu1.fsf@suse.de \
    --to=krisman@suse.de \
    --cc=andrealmeid@igalia.com \
    --cc=joshua@froggi.es \
    --cc=linux-bcachefs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox