All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Joshua Ashton <joshua@froggi.es>
Cc: linux-bcachefs@vger.kernel.org, "André Almeida" <andrealmeid@igalia.com>
Subject: Re: [v4 3/3] bcachefs: Implement casefolding
Date: Mon, 14 Aug 2023 12:34:46 -0400	[thread overview]
Message-ID: <87v8dh60rd.fsf@suse.de> (raw)
In-Reply-To: <20230814012434.627641-4-joshua@froggi.es> (Joshua Ashton's message of "Mon, 14 Aug 2023 02:24:27 +0100")

Joshua Ashton <joshua@froggi.es> writes:

> This patch implements support for case-insensitive file name lookups
> in bcachefs.
>
> The implementation uses the same UTF-8 lowering and normalization that
> ext4 and f2fs is using.
>
> More information is provided in Documentation/bcachefs/casefolding.rst
>
> Signed-off-by: Joshua Ashton <joshua@froggi.es>

Please, cc the entire thread as i don't think I even got the cover
letter.

Did you test this against fstests generic/556?  it checks all sorts of
common pitfalls and corner-cases in case-insensitive filesystems.

> --- /dev/null
> +++ b/Documentation/filesystems/bcachefs/casefolding.rst
> @@ -0,0 +1,73 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Casefolding
> +===========
> +
> +bcachefs has support for case-insensitive file and directory
> +lookups using the regular `chattr +F` (`S_CASEFOLD`, `FS_CASEFOLD_FL`)
> +casefolding attributes.
> +
> +The main usecase for casefolding is compatibility with software written
> +against other filesystems that rely on casefolded lookups
> +(eg. NTFS and Wine/Proton).
> +Taking advantage of file-system level casefolding can lead to great
> +loading time gains in many applications and games.
> +
> +Casefolding support requires a kernel with the `CONFIG_UNICODE` enabled.
> +Once a directory has been flagged for casefolding, a feature bit
> +is enabled on the superblock which marks the filesystem as using
> +casefolding.
> +When the feature bit for casefolding is enabled, it is no longer possible
> +to mount that filesystem on kernels without `CONFIG_UNICODE` enabled.
> +
> +On the lookup/query side: casefolding is implemented by allocating a new
> +string of `BCH_NAME_MAX` length using the `utf8_casefold` function to
> +casefold the query string.
> +
> +On the dirent side: casefolding is implemented by ensuring the `bkey`'s
> +hash is made from the casefolded string and storing the cached casefolded
> +name with the regular name in the dirent.
> +
> +The structure looks like this:
> +
> +Regular:    [dirent data][regular name][nul][nul]...
> +Casefolded: [dirent data][reg len][cf len][regular name][casefolded name][nul][nul]...
> +
> +(Do note, the number of `NUL`s here is merely for illustration, they count can vary
> + per-key, and they may not even be present if the key is aligned to `sizeof(u64)`.)
> +
> +This is efficient as it means that for all file lookups that require casefolding,
> +it has identical performance to a regular lookup:
> +a hash comparison and a `memcmp` of the name.
> +
> +Rationale
> +---------
> +
> +Several designs were considered for this system:
> +One was to introduce a dirent_v2, however that would be painful especially as
> +the hash system only has support for a single key type. This would also need
> +`BCH_NAME_MAX` to change between versions, and a new feature bit.
> +
> +Another option was to store without the two lengths, and just take the length of
> +the regular name and casefolded name contiguously / 2 as the length. This would
> +assume that the regular length == casefolded length, but that could potentially
> +not be true, if the uppercase unicode glyph had a different UTF-8 encoding than
> +the lowercase unicode glyph.

Why do you want to store the casefolded name?  In ext4, we just store
the case-exact name that was created, and do a code-point by code-point
comparison with the name under lookup through utf8_strncasecmp, which is
optimized to compare two not casefolded strings.  We just modify the
filename hash to be case-insensitive, so we know which dirent to look
for during lookup.  Finally, we use d_compare/d_hash to leverage the
dcache into finding negative dentries.

Sure, if you have both, dirent and dentry->d_name casefolded, then it is
just a regular memcmp, but at least you don't have to change the disk
format and your implementation becomes simpler.



> +It would be possible to disregard the casefold cache for those cases, but it was
> +decided to simply encode the two string lengths in the key to avoid random
> +performance issues if this edgecase was ever hit.
> +
> +The option settled on was to use a free-bit in d_type to mark a dirent as having
> +a casefold cache, and then treat the first 4 bytes the name block as lengths.
> +You can see this in the `d_cf_name_block` member of union in `bch_dirent`.
> +
> +The feature bit was used to allow casefolding support to be enabled for the majority
> +of users, but some allow users who have no need for the feature to still use bcachefs as
> +`CONFIG_UNICODE` can increase the kernel side a significant amount due to the tables used,
> +which may be decider between using bcachefs for eg. embedded platforms.
> +
> +Other filesystems like ext4 and f2fs have a super-block level option for casefolding
> +encoding, but bcachefs currently does not provide this. ext4 and f2fs do not expose
> +any encodings than a single UTF-8 version. When future encodings are desirable,
> +they will be added trivially using the opts mechanism.
> +
> diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
> index 30b3d7b9f9dc..0e1facdc7876 100644
> --- a/fs/bcachefs/bcachefs.h
> +++ b/fs/bcachefs/bcachefs.h
> @@ -202,6 +202,7 @@
>  #include <linux/types.h>
>  #include <linux/workqueue.h>
>  #include <linux/zstd.h>
> +#include <linux/unicode.h>
>  
>  #include "bcachefs_format.h"
>  #include "errcode.h"
> @@ -657,6 +658,8 @@ enum bch_write_ref {
>  	BCH_WRITE_REF_NR,
>  };
>  
> +#define BCH_FS_DEFAULT_UTF8_ENCODING UNICODE_AGE(12, 1, 0)
> +
>  struct bch_fs {
>  	struct closure		cl;
>  
> @@ -723,6 +726,9 @@ struct bch_fs {
>  		u64		compat;
>  	}			sb;
>  
> +#if IS_ENABLED(CONFIG_UNICODE)
> +	struct unicode_map 	*cf_encoding;
> +#endif

This goes in struct super_block.


> @@ -13,8 +13,39 @@
>  
>  #include <linux/dcache.h>
>  
> -static inline unsigned dirent_val_u64s(unsigned len)
> +static inline int bch2_casefold(struct btree_trans *trans, const struct bch_hash_info *info,
> +				const struct qstr *str, struct qstr *out_cf)
>  {
> +#if IS_ENABLED(CONFIG_UNICODE)
> +	unsigned char *buf = NULL;
> +	int ret;
> +
> +	*out_cf = (struct qstr) QSTR_INIT(NULL, 0);
> +
> +	buf = bch2_trans_kmalloc(trans, BCH_NAME_MAX + 1);
> +	ret = PTR_ERR_OR_ZERO(buf);
> +	if (ret)
> +		return ret;
> +
> +	ret = utf8_casefold(info->cf_encoding, str, buf, BCH_NAME_MAX + 1);
> +	if (ret <= 0)
> +		return ret;

The error path leaks buf.

> @@ -472,6 +600,7 @@ int __bch2_dirent_lookup_trans(struct btree_trans *trans,
>  			       const struct qstr *name, subvol_inum *inum,
>  			       unsigned flags)
>  {
> +	struct qstr lookup_name = *name;
>  	struct bkey_s_c k;
>  	struct bkey_s_c_dirent d;
>  	u32 snapshot;
> @@ -481,8 +610,14 @@ int __bch2_dirent_lookup_trans(struct btree_trans *trans,
>  	if (ret)
>  		return ret;
>  
> +	ret = hash_info->cf_encoding
> +		? bch2_casefold(trans, hash_info, name, &lookup_name)
> +		: 0;
> +	if (ret)
> +		return ret;
> +
>  	ret = bch2_hash_lookup(trans, iter, bch2_dirent_hash_desc,
> -			       hash_info, dir, name, flags);
> +			       hash_info, dir, &lookup_name, flags);
>  	if (ret)
>  		return ret;


(Skipping the bcachefs-specific bits that I'm not qualified to review)

I don't get your model for the dcache.  I don't see any handling
anywhere in the patch.  Also, I fetched the code and it seems bcachefs
will just do d_splice_alias at the end of the lookup. So, when someone
lookups FOO, ->lookup will be called and the dentry FOO will be created.
Then, when FOo is looked up, the dcache won't find the previous dentry
and ->lookup will be called again (perhaps creating another dentry?)

There are two ways around it. Rely on the libfs' generic handlers for
d_hash/d_compare and soon d_revalidate, or use d_add_ci to search for
case-insensitive aliases at lookup time, but then you don't benefit much
from the dcache if looking up for different cases.


> @@ -54,6 +54,28 @@ static int bch2_inode_flags_set(struct btree_trans *trans,
>  	    (newflags & (BCH_INODE_NODUMP|BCH_INODE_NOATIME)) != newflags)
>  		return -EINVAL;
>  
> +	if ((newflags ^ oldflags) & BCH_INODE_CASEFOLDED) {
> +#if IS_ENABLED(CONFIG_UNICODE)
> +		int ret = 0;
> +		/* Not supported on individual files. */
> +		if (!S_ISDIR(bi->bi_mode))
> +			return -EOPNOTSUPP;
> +
> +		/*
> +		 * Make sure the dir is empty, as otherwise we'd need to
> +		 * rehash everything and update the dirent keys.
> +		 */
> +		ret = bch2_empty_dir_trans(trans, inode_inum(inode));
> +		if (ret < 0)
> +			return ret;
> +
> +		bch2_check_set_feature(c, BCH_FEATURE_casefolding);

fwiw, we use _CASEFOLD in other filesystems, not _casefolding.

Thanks,

-- 
Gabriel Krisman Bertazi

  reply	other threads:[~2023-08-14 16:35 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-14  1:24 [v4 0/3] bcachefs: Casefolding implementation Joshua Ashton
2023-08-14  1:24 ` [v4 1/3] bcachefs: Split out dirent alloc and name initialization Joshua Ashton
2023-08-14  1:24 ` [v4 2/3] bcachefs: Move dirent_val_u64s to dirent.c Joshua Ashton
2023-08-14  1:24 ` [v4 3/3] bcachefs: Implement casefolding Joshua Ashton
2023-08-14 16:34   ` Gabriel Krisman Bertazi [this message]
2023-08-14 21:18     ` Joshua Ashton
2023-08-14 21:33       ` Joshua Ashton
2023-08-14 23:01       ` Gabriel Krisman Bertazi
2023-08-15  0:58       ` Kent Overstreet
2023-08-15 18:00         ` Gabriel Krisman Bertazi
2023-08-23  9:37           ` Joshua Ashton
2023-08-15 12:36   ` Brian Foster
2023-08-24  5:43     ` Joshua Ashton
2023-08-16 19:40 ` [v4 0/3] bcachefs: Casefolding implementation Kent Overstreet
2023-08-23  9:34   ` Joshua Ashton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v8dh60rd.fsf@suse.de \
    --to=krisman@suse.de \
    --cc=andrealmeid@igalia.com \
    --cc=joshua@froggi.es \
    --cc=linux-bcachefs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.