Re: [v4 3/3] bcachefs: Implement casefolding

public inbox for linux-bcachefs@vger.kernel.org
 help / color / mirror / Atom feed

From: Brian Foster <bfoster@redhat.com>
To: Joshua Ashton <joshua@froggi.es>
Cc: linux-bcachefs@vger.kernel.org,
	"André Almeida" <andrealmeid@igalia.com>,
	"Gabriel Krisman Bertazi" <krisman@suse.de>
Subject: Re: [v4 3/3] bcachefs: Implement casefolding
Date: Tue, 15 Aug 2023 08:36:03 -0400	[thread overview]
Message-ID: <ZNtxM447DrmJDA3W@bfoster> (raw)
In-Reply-To: <20230814012434.627641-4-joshua@froggi.es>

On Mon, Aug 14, 2023 at 02:24:27AM +0100, Joshua Ashton wrote:
> This patch implements support for case-insensitive file name lookups
> in bcachefs.
> 
> The implementation uses the same UTF-8 lowering and normalization that
> ext4 and f2fs is using.
> 
> More information is provided in Documentation/bcachefs/casefolding.rst
> 
> Signed-off-by: Joshua Ashton <joshua@froggi.es>
> 
> Cc: André Almeida <andrealmeid@igalia.com>
> Cc: Gabriel Krisman Bertazi <krisman@suse.de>
> ---

Hi Joshua,

I'm still grokking the implementation and I see Gabriel has already
provided much more substantive feedback, so just a few random comments...

>  .../filesystems/bcachefs/casefolding.rst      |  73 ++++++++
>  fs/bcachefs/bcachefs.h                        |   6 +
>  fs/bcachefs/bcachefs_format.h                 |  26 ++-
>  fs/bcachefs/dirent.c                          | 177 +++++++++++++++---
>  fs/bcachefs/fs-common.c                       |   4 +
>  fs/bcachefs/fs-ioctl.c                        |  22 +++
>  fs/bcachefs/fs-ioctl.h                        |  20 +-
>  fs/bcachefs/str_hash.h                        |   4 +
>  fs/bcachefs/super.c                           |  19 ++
>  9 files changed, 318 insertions(+), 33 deletions(-)
>  create mode 100644 Documentation/filesystems/bcachefs/casefolding.rst
> 
> diff --git a/Documentation/filesystems/bcachefs/casefolding.rst b/Documentation/filesystems/bcachefs/casefolding.rst
> new file mode 100644
> index 000000000000..c8580d892ae8
> --- /dev/null
> +++ b/Documentation/filesystems/bcachefs/casefolding.rst
> @@ -0,0 +1,73 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Casefolding
> +===========
> +
> +bcachefs has support for case-insensitive file and directory
> +lookups using the regular `chattr +F` (`S_CASEFOLD`, `FS_CASEFOLD_FL`)
> +casefolding attributes.
> +
> +The main usecase for casefolding is compatibility with software written
> +against other filesystems that rely on casefolded lookups
> +(eg. NTFS and Wine/Proton).
> +Taking advantage of file-system level casefolding can lead to great
> +loading time gains in many applications and games.
> +
> +Casefolding support requires a kernel with the `CONFIG_UNICODE` enabled.
> +Once a directory has been flagged for casefolding, a feature bit
> +is enabled on the superblock which marks the filesystem as using
> +casefolding.
> +When the feature bit for casefolding is enabled, it is no longer possible
> +to mount that filesystem on kernels without `CONFIG_UNICODE` enabled.
> +
> +On the lookup/query side: casefolding is implemented by allocating a new
> +string of `BCH_NAME_MAX` length using the `utf8_casefold` function to
> +casefold the query string.
> +
> +On the dirent side: casefolding is implemented by ensuring the `bkey`'s
> +hash is made from the casefolded string and storing the cached casefolded
> +name with the regular name in the dirent.
> +
> +The structure looks like this:
> +
> +Regular:    [dirent data][regular name][nul][nul]...
> +Casefolded: [dirent data][reg len][cf len][regular name][casefolded name][nul][nul]...
> +
> +(Do note, the number of `NUL`s here is merely for illustration, they count can vary
> + per-key, and they may not even be present if the key is aligned to `sizeof(u64)`.)
> +
> +This is efficient as it means that for all file lookups that require casefolding,
> +it has identical performance to a regular lookup:
> +a hash comparison and a `memcmp` of the name.
> +

Thanks for the writeup. This is very helpful.

> +Rationale
> +---------
> +
> +Several designs were considered for this system:
> +One was to introduce a dirent_v2, however that would be painful especially as
> +the hash system only has support for a single key type. This would also need
> +`BCH_NAME_MAX` to change between versions, and a new feature bit.
> +
> +Another option was to store without the two lengths, and just take the length of
> +the regular name and casefolded name contiguously / 2 as the length. This would
> +assume that the regular length == casefolded length, but that could potentially
> +not be true, if the uppercase unicode glyph had a different UTF-8 encoding than
> +the lowercase unicode glyph.
> +It would be possible to disregard the casefold cache for those cases, but it was
> +decided to simply encode the two string lengths in the key to avoid random
> +performance issues if this edgecase was ever hit.
> +
> +The option settled on was to use a free-bit in d_type to mark a dirent as having
> +a casefold cache, and then treat the first 4 bytes the name block as lengths.
> +You can see this in the `d_cf_name_block` member of union in `bch_dirent`.
> +
> +The feature bit was used to allow casefolding support to be enabled for the majority
> +of users, but some allow users who have no need for the feature to still use bcachefs as
> +`CONFIG_UNICODE` can increase the kernel side a significant amount due to the tables used,
> +which may be decider between using bcachefs for eg. embedded platforms.
> +
> +Other filesystems like ext4 and f2fs have a super-block level option for casefolding
> +encoding, but bcachefs currently does not provide this. ext4 and f2fs do not expose
> +any encodings than a single UTF-8 version. When future encodings are desirable,
> +they will be added trivially using the opts mechanism.
> +
...
> diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h
> index 23bae622309c..cc6797422977 100644
> --- a/fs/bcachefs/bcachefs_format.h
> +++ b/fs/bcachefs/bcachefs_format.h
...
> @@ -864,6 +866,7 @@ enum {
>  #define BCH_INODE_I_SECTORS_DIRTY (1 << __BCH_INODE_I_SECTORS_DIRTY)
>  #define BCH_INODE_UNLINKED	(1 << __BCH_INODE_UNLINKED)
>  #define BCH_INODE_BACKPTR_UNTRUSTED (1 << __BCH_INODE_BACKPTR_UNTRUSTED)
> +#define BCH_INODE_CASEFOLDED	(1 << __BCH_INODE_CASEFOLDED)
>  
>  LE32_BITMASK(INODE_STR_HASH,	struct bch_inode, bi_flags, 20, 24);
>  LE32_BITMASK(INODE_NR_FIELDS,	struct bch_inode, bi_flags, 24, 31);
> @@ -908,9 +911,25 @@ struct bch_dirent {
>  	 * Copy of mode bits 12-15 from the target inode - so userspace can get
>  	 * the filetype without having to do a stat()
>  	 */
> -	__u8			d_type;
> +#if defined(__LITTLE_ENDIAN_BITFIELD)
> +	__u8			d_type:5,
> +				d_unused:2,
> +				d_casefold:1;
> +#elif defined(__BIG_ENDIAN_BITFIELD)
> +	__u8			d_casefold:1,
> +				d_unused:2,
> +				d_type:5;
> +#endif

Interesting. I didn't realize that bitfield endianness matched general
endianness, so I presume this is required to ensure data is always le on
disk.

>  
> -	__u8			d_name[];
> +	union {
> +	struct {
> +		__u8		d_pad;
> +		__le16		d_name_len;
> +		__le16		d_cf_name_len;
> +		__u8		d_names[0];
> +	} d_cf_name_block __packed;
> +	__u8			d_name[0];
> +	} __packed;
>  } __packed __aligned(8);
>  
>  #define DT_SUBVOL	16
> @@ -1841,7 +1860,8 @@ static inline void SET_BCH_SB_BACKGROUND_COMPRESSION_TYPE(struct bch_sb *sb, __u
>  	x(new_varint,			15)	\
>  	x(journal_no_flush,		16)	\
>  	x(alloc_v2,			17)	\
> -	x(extents_across_btree_nodes,	18)
> +	x(extents_across_btree_nodes,	18)	\
> +	x(casefolding,			19)
>  
>  #define BCH_SB_FEATURES_ALWAYS				\
>  	((1ULL << BCH_FEATURE_new_extent_overwrite)|	\
> diff --git a/fs/bcachefs/dirent.c b/fs/bcachefs/dirent.c
> index 6bdcccabd732..6ac45d7ce6e9 100644
> --- a/fs/bcachefs/dirent.c
> +++ b/fs/bcachefs/dirent.c
> @@ -13,8 +13,39 @@
>  
>  #include <linux/dcache.h>
>  
> -static inline unsigned dirent_val_u64s(unsigned len)
> +static inline int bch2_casefold(struct btree_trans *trans, const struct bch_hash_info *info,
> +				const struct qstr *str, struct qstr *out_cf)
>  {
> +#if IS_ENABLED(CONFIG_UNICODE)
> +	unsigned char *buf = NULL;
> +	int ret;
> +
> +	*out_cf = (struct qstr) QSTR_INIT(NULL, 0);
> +
> +	buf = bch2_trans_kmalloc(trans, BCH_NAME_MAX + 1);
> +	ret = PTR_ERR_OR_ZERO(buf);
> +	if (ret)
> +		return ret;
> +
> +	ret = utf8_casefold(info->cf_encoding, str, buf, BCH_NAME_MAX + 1);

Looks like this helper really only needs the encoding pointer rather
than the full hash_info..?

> +	if (ret <= 0)
> +		return ret;
> +
> +	*out_cf = (struct qstr) QSTR_INIT(buf, ret);
> +	return 0;
> +#else
> +	*out_cf = (struct qstr) QSTR_INIT(NULL, 0);
> +
> +	return -EOPNOTSUPP;
> +#endif

It also looks like the #else could be completely avoided here. For
example, lift the local vars out of the ifdef, init out_cf using the
NULL buf, init ret to -EOPNOTSUPP and push 'return ret;' out of the end
of the ifdef. Hm?

> +}
> +
> +static inline unsigned dirent_val_u64s(unsigned len, unsigned cf_len)
> +{
> +	if (cf_len > 0) {
> +		return DIV_ROUND_UP(offsetof(struct bch_dirent, d_cf_name_block.d_names) + len + cf_len,
> +				    sizeof(u64));
> +	}
>  	return DIV_ROUND_UP(offsetof(struct bch_dirent, d_name) + len,
>  			    sizeof(u64));

FWIW, I think something along the lines of the pattern you use just
below makes this a bit more readable. For example, something like:

        unsigned name_offset;
        name_offset = (cf_len > 0) ?
                        offsetof(struct bch_dirent, d_cf_name_block.d_names) :
                        offsetof(struct bch_dirent, d_name);
        return DIV_ROUND_UP(name_offset + len + cf_len, sizeof(u64));

... but just a thought.

>  }
...
>  /* bcachefs inode flags -> FS_IOC_FSGETXATTR: */
> diff --git a/fs/bcachefs/str_hash.h b/fs/bcachefs/str_hash.h
> index ae21a8cca1b4..866cfc145889 100644
> --- a/fs/bcachefs/str_hash.h
> +++ b/fs/bcachefs/str_hash.h
> @@ -34,6 +34,7 @@ bch2_str_hash_opt_to_type(struct bch_fs *c, enum bch_str_hash_opts opt)
>  
>  struct bch_hash_info {
>  	u8			type;
> +	struct unicode_map 	*cf_encoding;
>  	/*
>  	 * For crc32 or crc64 string hashes the first key value of
>  	 * the siphash_key (k0) is used as the key.
> @@ -48,6 +49,9 @@ bch2_hash_info_init(struct bch_fs *c, const struct bch_inode_unpacked *bi)
>  	struct bch_hash_info info = {
>  		.type = (bi->bi_flags >> INODE_STR_HASH_OFFSET) &
>  			~(~0U << INODE_STR_HASH_BITS),
> +#if IS_ENABLED(CONFIG_UNICODE)
> +		.cf_encoding = !!(bi->bi_flags & BCH_INODE_CASEFOLDED) ? c->cf_encoding : NULL,
> +#endif

What's the purpose for propagating this pointer via the hash? Is this
necessary to (eventually) support per-directory encodings, or something
else? If purely the former and that is something to possibly come later,
it does seem a little premature to me in that the infrastructure part
should probably go along with the feature functionality that takes
advantage of it.

BTW on a similar train of thought.. I'm not familiar enough with the
feature to have an opinion on Gabriel's feedback wrt to storing the
casefolded name on-disk, but I do think there's value in breaking down
the feature into smaller logical components, particularly when there are
performance vs. complexity tradeoffs. So for example if you wanted to
continue to implement the on-disk format change approach, perhaps factor
out core functionality along the lines of what ext4 does into a patch 1,
then add the on-disk format change support in patch 2 as an (optional?)
optimization (then lift off the per-dir bits into later patches that
implement the granular approach, etc.).

That not only makes this easier to review, but IMO significantly easier
to maintain, test, diagnose, etc., such that it's at least worth
considering. Just my .02.

Brian

>  		.siphash_key = { .k0 = bi->bi_hash_seed }
>  	};
>  
> diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
> index 5c62fcf3afdb..78d19cfb966a 100644
> --- a/fs/bcachefs/super.c
> +++ b/fs/bcachefs/super.c
> @@ -757,6 +757,25 @@ static struct bch_fs *bch2_fs_alloc(struct bch_sb *sb, struct bch_opts opts)
>  	if (ret)
>  		goto err;
>  
> +#if IS_ENABLED(CONFIG_UNICODE)
> +	/* Default encoding until we can potentially have more as an option. */
> +	c->cf_encoding = utf8_load(BCH_FS_DEFAULT_UTF8_ENCODING);
> +	if (IS_ERR(c->cf_encoding)) {
> +		printk(KERN_ERR "Cannot load UTF-8 encoding for filesystem. Version: %u.%u.%u",
> +			unicode_major(BCH_FS_DEFAULT_UTF8_ENCODING),
> +			unicode_minor(BCH_FS_DEFAULT_UTF8_ENCODING),
> +			unicode_rev(BCH_FS_DEFAULT_UTF8_ENCODING));
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +#else
> +	if (c->sb.features & (1ULL << BCH_FEATURE_casefolding)) {
> +		printk(KERN_ERR "Cannot mount a filesystem with casefolding on a kernel without CONFIG_UNICODE\n");
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +#endif
> +
>  	pr_uuid(&name, c->sb.user_uuid.b);
>  	strscpy(c->name, name.buf, sizeof(c->name));
>  	printbuf_exit(&name);
> -- 
> 2.41.0
>

next prev parent reply	other threads:[~2023-08-15 12:35 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-14  1:24 [v4 0/3] bcachefs: Casefolding implementation Joshua Ashton
2023-08-14  1:24 ` [v4 1/3] bcachefs: Split out dirent alloc and name initialization Joshua Ashton
2023-08-14  1:24 ` [v4 2/3] bcachefs: Move dirent_val_u64s to dirent.c Joshua Ashton
2023-08-14  1:24 ` [v4 3/3] bcachefs: Implement casefolding Joshua Ashton
2023-08-14 16:34   ` Gabriel Krisman Bertazi
2023-08-14 21:18     ` Joshua Ashton
2023-08-14 21:33       ` Joshua Ashton
2023-08-14 23:01       ` Gabriel Krisman Bertazi
2023-08-15  0:58       ` Kent Overstreet
2023-08-15 18:00         ` Gabriel Krisman Bertazi
2023-08-23  9:37           ` Joshua Ashton
2023-08-15 12:36   ` Brian Foster [this message]
2023-08-24  5:43     ` Joshua Ashton
2023-08-16 19:40 ` [v4 0/3] bcachefs: Casefolding implementation Kent Overstreet
2023-08-23  9:34   ` Joshua Ashton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZNtxM447DrmJDA3W@bfoster \
    --to=bfoster@redhat.com \
    --cc=andrealmeid@igalia.com \
    --cc=joshua@froggi.es \
    --cc=krisman@suse.de \
    --cc=linux-bcachefs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox