From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Joshua Ashton <joshua@froggi.es>
Cc: linux-bcachefs@vger.kernel.org, "André Almeida" <andrealmeid@igalia.com>
Subject: Re: [v4 3/3] bcachefs: Implement casefolding
Date: Mon, 14 Aug 2023 12:34:46 -0400 [thread overview]
Message-ID: <87v8dh60rd.fsf@suse.de> (raw)
In-Reply-To: <20230814012434.627641-4-joshua@froggi.es> (Joshua Ashton's message of "Mon, 14 Aug 2023 02:24:27 +0100")
Joshua Ashton <joshua@froggi.es> writes:
> This patch implements support for case-insensitive file name lookups
> in bcachefs.
>
> The implementation uses the same UTF-8 lowering and normalization that
> ext4 and f2fs is using.
>
> More information is provided in Documentation/bcachefs/casefolding.rst
>
> Signed-off-by: Joshua Ashton <joshua@froggi.es>
Please, cc the entire thread as i don't think I even got the cover
letter.
Did you test this against fstests generic/556? it checks all sorts of
common pitfalls and corner-cases in case-insensitive filesystems.
> --- /dev/null
> +++ b/Documentation/filesystems/bcachefs/casefolding.rst
> @@ -0,0 +1,73 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Casefolding
> +===========
> +
> +bcachefs has support for case-insensitive file and directory
> +lookups using the regular `chattr +F` (`S_CASEFOLD`, `FS_CASEFOLD_FL`)
> +casefolding attributes.
> +
> +The main usecase for casefolding is compatibility with software written
> +against other filesystems that rely on casefolded lookups
> +(eg. NTFS and Wine/Proton).
> +Taking advantage of file-system level casefolding can lead to great
> +loading time gains in many applications and games.
> +
> +Casefolding support requires a kernel with the `CONFIG_UNICODE` enabled.
> +Once a directory has been flagged for casefolding, a feature bit
> +is enabled on the superblock which marks the filesystem as using
> +casefolding.
> +When the feature bit for casefolding is enabled, it is no longer possible
> +to mount that filesystem on kernels without `CONFIG_UNICODE` enabled.
> +
> +On the lookup/query side: casefolding is implemented by allocating a new
> +string of `BCH_NAME_MAX` length using the `utf8_casefold` function to
> +casefold the query string.
> +
> +On the dirent side: casefolding is implemented by ensuring the `bkey`'s
> +hash is made from the casefolded string and storing the cached casefolded
> +name with the regular name in the dirent.
> +
> +The structure looks like this:
> +
> +Regular: [dirent data][regular name][nul][nul]...
> +Casefolded: [dirent data][reg len][cf len][regular name][casefolded name][nul][nul]...
> +
> +(Do note, the number of `NUL`s here is merely for illustration, they count can vary
> + per-key, and they may not even be present if the key is aligned to `sizeof(u64)`.)
> +
> +This is efficient as it means that for all file lookups that require casefolding,
> +it has identical performance to a regular lookup:
> +a hash comparison and a `memcmp` of the name.
> +
> +Rationale
> +---------
> +
> +Several designs were considered for this system:
> +One was to introduce a dirent_v2, however that would be painful especially as
> +the hash system only has support for a single key type. This would also need
> +`BCH_NAME_MAX` to change between versions, and a new feature bit.
> +
> +Another option was to store without the two lengths, and just take the length of
> +the regular name and casefolded name contiguously / 2 as the length. This would
> +assume that the regular length == casefolded length, but that could potentially
> +not be true, if the uppercase unicode glyph had a different UTF-8 encoding than
> +the lowercase unicode glyph.
Why do you want to store the casefolded name? In ext4, we just store
the case-exact name that was created, and do a code-point by code-point
comparison with the name under lookup through utf8_strncasecmp, which is
optimized to compare two not casefolded strings. We just modify the
filename hash to be case-insensitive, so we know which dirent to look
for during lookup. Finally, we use d_compare/d_hash to leverage the
dcache into finding negative dentries.
Sure, if you have both, dirent and dentry->d_name casefolded, then it is
just a regular memcmp, but at least you don't have to change the disk
format and your implementation becomes simpler.
> +It would be possible to disregard the casefold cache for those cases, but it was
> +decided to simply encode the two string lengths in the key to avoid random
> +performance issues if this edgecase was ever hit.
> +
> +The option settled on was to use a free-bit in d_type to mark a dirent as having
> +a casefold cache, and then treat the first 4 bytes the name block as lengths.
> +You can see this in the `d_cf_name_block` member of union in `bch_dirent`.
> +
> +The feature bit was used to allow casefolding support to be enabled for the majority
> +of users, but some allow users who have no need for the feature to still use bcachefs as
> +`CONFIG_UNICODE` can increase the kernel side a significant amount due to the tables used,
> +which may be decider between using bcachefs for eg. embedded platforms.
> +
> +Other filesystems like ext4 and f2fs have a super-block level option for casefolding
> +encoding, but bcachefs currently does not provide this. ext4 and f2fs do not expose
> +any encodings than a single UTF-8 version. When future encodings are desirable,
> +they will be added trivially using the opts mechanism.
> +
> diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
> index 30b3d7b9f9dc..0e1facdc7876 100644
> --- a/fs/bcachefs/bcachefs.h
> +++ b/fs/bcachefs/bcachefs.h
> @@ -202,6 +202,7 @@
> #include <linux/types.h>
> #include <linux/workqueue.h>
> #include <linux/zstd.h>
> +#include <linux/unicode.h>
>
> #include "bcachefs_format.h"
> #include "errcode.h"
> @@ -657,6 +658,8 @@ enum bch_write_ref {
> BCH_WRITE_REF_NR,
> };
>
> +#define BCH_FS_DEFAULT_UTF8_ENCODING UNICODE_AGE(12, 1, 0)
> +
> struct bch_fs {
> struct closure cl;
>
> @@ -723,6 +726,9 @@ struct bch_fs {
> u64 compat;
> } sb;
>
> +#if IS_ENABLED(CONFIG_UNICODE)
> + struct unicode_map *cf_encoding;
> +#endif
This goes in struct super_block.
> @@ -13,8 +13,39 @@
>
> #include <linux/dcache.h>
>
> -static inline unsigned dirent_val_u64s(unsigned len)
> +static inline int bch2_casefold(struct btree_trans *trans, const struct bch_hash_info *info,
> + const struct qstr *str, struct qstr *out_cf)
> {
> +#if IS_ENABLED(CONFIG_UNICODE)
> + unsigned char *buf = NULL;
> + int ret;
> +
> + *out_cf = (struct qstr) QSTR_INIT(NULL, 0);
> +
> + buf = bch2_trans_kmalloc(trans, BCH_NAME_MAX + 1);
> + ret = PTR_ERR_OR_ZERO(buf);
> + if (ret)
> + return ret;
> +
> + ret = utf8_casefold(info->cf_encoding, str, buf, BCH_NAME_MAX + 1);
> + if (ret <= 0)
> + return ret;
The error path leaks buf.
> @@ -472,6 +600,7 @@ int __bch2_dirent_lookup_trans(struct btree_trans *trans,
> const struct qstr *name, subvol_inum *inum,
> unsigned flags)
> {
> + struct qstr lookup_name = *name;
> struct bkey_s_c k;
> struct bkey_s_c_dirent d;
> u32 snapshot;
> @@ -481,8 +610,14 @@ int __bch2_dirent_lookup_trans(struct btree_trans *trans,
> if (ret)
> return ret;
>
> + ret = hash_info->cf_encoding
> + ? bch2_casefold(trans, hash_info, name, &lookup_name)
> + : 0;
> + if (ret)
> + return ret;
> +
> ret = bch2_hash_lookup(trans, iter, bch2_dirent_hash_desc,
> - hash_info, dir, name, flags);
> + hash_info, dir, &lookup_name, flags);
> if (ret)
> return ret;
(Skipping the bcachefs-specific bits that I'm not qualified to review)
I don't get your model for the dcache. I don't see any handling
anywhere in the patch. Also, I fetched the code and it seems bcachefs
will just do d_splice_alias at the end of the lookup. So, when someone
lookups FOO, ->lookup will be called and the dentry FOO will be created.
Then, when FOo is looked up, the dcache won't find the previous dentry
and ->lookup will be called again (perhaps creating another dentry?)
There are two ways around it. Rely on the libfs' generic handlers for
d_hash/d_compare and soon d_revalidate, or use d_add_ci to search for
case-insensitive aliases at lookup time, but then you don't benefit much
from the dcache if looking up for different cases.
> @@ -54,6 +54,28 @@ static int bch2_inode_flags_set(struct btree_trans *trans,
> (newflags & (BCH_INODE_NODUMP|BCH_INODE_NOATIME)) != newflags)
> return -EINVAL;
>
> + if ((newflags ^ oldflags) & BCH_INODE_CASEFOLDED) {
> +#if IS_ENABLED(CONFIG_UNICODE)
> + int ret = 0;
> + /* Not supported on individual files. */
> + if (!S_ISDIR(bi->bi_mode))
> + return -EOPNOTSUPP;
> +
> + /*
> + * Make sure the dir is empty, as otherwise we'd need to
> + * rehash everything and update the dirent keys.
> + */
> + ret = bch2_empty_dir_trans(trans, inode_inum(inode));
> + if (ret < 0)
> + return ret;
> +
> + bch2_check_set_feature(c, BCH_FEATURE_casefolding);
fwiw, we use _CASEFOLD in other filesystems, not _casefolding.
Thanks,
--
Gabriel Krisman Bertazi
next prev parent reply other threads:[~2023-08-14 16:35 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-14 1:24 [v4 0/3] bcachefs: Casefolding implementation Joshua Ashton
2023-08-14 1:24 ` [v4 1/3] bcachefs: Split out dirent alloc and name initialization Joshua Ashton
2023-08-14 1:24 ` [v4 2/3] bcachefs: Move dirent_val_u64s to dirent.c Joshua Ashton
2023-08-14 1:24 ` [v4 3/3] bcachefs: Implement casefolding Joshua Ashton
2023-08-14 16:34 ` Gabriel Krisman Bertazi [this message]
2023-08-14 21:18 ` Joshua Ashton
2023-08-14 21:33 ` Joshua Ashton
2023-08-14 23:01 ` Gabriel Krisman Bertazi
2023-08-15 0:58 ` Kent Overstreet
2023-08-15 18:00 ` Gabriel Krisman Bertazi
2023-08-23 9:37 ` Joshua Ashton
2023-08-15 12:36 ` Brian Foster
2023-08-24 5:43 ` Joshua Ashton
2023-08-16 19:40 ` [v4 0/3] bcachefs: Casefolding implementation Kent Overstreet
2023-08-23 9:34 ` Joshua Ashton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87v8dh60rd.fsf@suse.de \
--to=krisman@suse.de \
--cc=andrealmeid@igalia.com \
--cc=joshua@froggi.es \
--cc=linux-bcachefs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox