From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4E1BC001B0 for ; Tue, 15 Aug 2023 12:35:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237119AbjHOMem (ORCPT ); Tue, 15 Aug 2023 08:34:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237271AbjHOMeV (ORCPT ); Tue, 15 Aug 2023 08:34:21 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C84AC19B4 for ; Tue, 15 Aug 2023 05:33:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1692102783; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kfiD0C27rHiSjCZ7pupbA2nLVMOp1AGD1WgQuwUPNPs=; b=Ulz58PU3JxgJyVpiO1rkPEPmEATe0Ubdl51a2FalpL3yMMEWRBLeAINTylEqIJGfFwhFCX thyeufi7ay6S/uPCzQqnancVmdU7Cm7EjE2H1L01CJKtzs2bHtVm3G6lvWwFppSCPMD2qv F85ieKSr1YjJfM6XMZhHfgu8boJOp04= Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-88-J_adAVNDOtCba3v5OgbJhw-1; Tue, 15 Aug 2023 08:33:01 -0400 X-MC-Unique: J_adAVNDOtCba3v5OgbJhw-1 Received: by mail-qk1-f198.google.com with SMTP id af79cd13be357-767564705f5so602871285a.1 for ; Tue, 15 Aug 2023 05:33:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692102781; x=1692707581; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=kfiD0C27rHiSjCZ7pupbA2nLVMOp1AGD1WgQuwUPNPs=; b=CkK6ovjkxHWF42cK4bR7IN2YpTwLyH3jYbJM6hK+I4RbqSu7b78pbvFwjCtxrED2G0 2UKfu5L2ob57rbAGTAq+SmL/G5iAGpnYk1I1polLA1BwblaiPZf/D7+fqfaZK9rg4XCE 6e50cCChvyc7T6QpxxyeDTq2n4oiHmZt5BOIhoxdlu+mtK/p5e4Si/mJ5wvxs2HHKuRM Z7VIdk0xpPuhFnpzQpGK1JXi5pshHCW17FB1cf49pV+HR8oCAimnOy1+vSY91QS/YAnJ 6Dls2bAmu39aEWYHjhT/qPCxov41u/la3WUAJ8YeST2nwerNQPYq4YJpbrDL8OLGcIsj V4wQ== X-Gm-Message-State: AOJu0YzCYS5fHm3qiIijEMyCxMG3KzrgBBuBwDPi/5fCRieqEHUgWAA6 AxVFADlvsux4mR9O0I2LQKuczi372qvsHucfvb4iVHqfs9n4mCgW/326caceszTDnhg/e5LPt/U hb4kQ8InI+g7UslE/jB8bYLC9+jk= X-Received: by 2002:a05:620a:2449:b0:767:954:a743 with SMTP id h9-20020a05620a244900b007670954a743mr15953118qkn.51.1692102781167; Tue, 15 Aug 2023 05:33:01 -0700 (PDT) X-Google-Smtp-Source: AGHT+IH5AXPUZ+m+G9InvzR3SJ7JT96UuoBJK9SbXniYv8UvJ113dRC7e1d59iCLH221K8x4EBf97Q== X-Received: by 2002:a05:620a:2449:b0:767:954:a743 with SMTP id h9-20020a05620a244900b007670954a743mr15953095qkn.51.1692102780805; Tue, 15 Aug 2023 05:33:00 -0700 (PDT) Received: from bfoster (c-24-60-61-41.hsd1.ma.comcast.net. [24.60.61.41]) by smtp.gmail.com with ESMTPSA id g3-20020a05620a13c300b0075cd80fde9esm3714172qkl.89.2023.08.15.05.32.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Aug 2023 05:33:00 -0700 (PDT) Date: Tue, 15 Aug 2023 08:36:03 -0400 From: Brian Foster To: Joshua Ashton Cc: linux-bcachefs@vger.kernel.org, =?iso-8859-1?Q?Andr=E9?= Almeida , Gabriel Krisman Bertazi Subject: Re: [v4 3/3] bcachefs: Implement casefolding Message-ID: References: <20230814012434.627641-1-joshua@froggi.es> <20230814012434.627641-4-joshua@froggi.es> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20230814012434.627641-4-joshua@froggi.es> Precedence: bulk List-ID: X-Mailing-List: linux-bcachefs@vger.kernel.org On Mon, Aug 14, 2023 at 02:24:27AM +0100, Joshua Ashton wrote: > This patch implements support for case-insensitive file name lookups > in bcachefs. > > The implementation uses the same UTF-8 lowering and normalization that > ext4 and f2fs is using. > > More information is provided in Documentation/bcachefs/casefolding.rst > > Signed-off-by: Joshua Ashton > > Cc: André Almeida > Cc: Gabriel Krisman Bertazi > --- Hi Joshua, I'm still grokking the implementation and I see Gabriel has already provided much more substantive feedback, so just a few random comments... > .../filesystems/bcachefs/casefolding.rst | 73 ++++++++ > fs/bcachefs/bcachefs.h | 6 + > fs/bcachefs/bcachefs_format.h | 26 ++- > fs/bcachefs/dirent.c | 177 +++++++++++++++--- > fs/bcachefs/fs-common.c | 4 + > fs/bcachefs/fs-ioctl.c | 22 +++ > fs/bcachefs/fs-ioctl.h | 20 +- > fs/bcachefs/str_hash.h | 4 + > fs/bcachefs/super.c | 19 ++ > 9 files changed, 318 insertions(+), 33 deletions(-) > create mode 100644 Documentation/filesystems/bcachefs/casefolding.rst > > diff --git a/Documentation/filesystems/bcachefs/casefolding.rst b/Documentation/filesystems/bcachefs/casefolding.rst > new file mode 100644 > index 000000000000..c8580d892ae8 > --- /dev/null > +++ b/Documentation/filesystems/bcachefs/casefolding.rst > @@ -0,0 +1,73 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +Casefolding > +=========== > + > +bcachefs has support for case-insensitive file and directory > +lookups using the regular `chattr +F` (`S_CASEFOLD`, `FS_CASEFOLD_FL`) > +casefolding attributes. > + > +The main usecase for casefolding is compatibility with software written > +against other filesystems that rely on casefolded lookups > +(eg. NTFS and Wine/Proton). > +Taking advantage of file-system level casefolding can lead to great > +loading time gains in many applications and games. > + > +Casefolding support requires a kernel with the `CONFIG_UNICODE` enabled. > +Once a directory has been flagged for casefolding, a feature bit > +is enabled on the superblock which marks the filesystem as using > +casefolding. > +When the feature bit for casefolding is enabled, it is no longer possible > +to mount that filesystem on kernels without `CONFIG_UNICODE` enabled. > + > +On the lookup/query side: casefolding is implemented by allocating a new > +string of `BCH_NAME_MAX` length using the `utf8_casefold` function to > +casefold the query string. > + > +On the dirent side: casefolding is implemented by ensuring the `bkey`'s > +hash is made from the casefolded string and storing the cached casefolded > +name with the regular name in the dirent. > + > +The structure looks like this: > + > +Regular: [dirent data][regular name][nul][nul]... > +Casefolded: [dirent data][reg len][cf len][regular name][casefolded name][nul][nul]... > + > +(Do note, the number of `NUL`s here is merely for illustration, they count can vary > + per-key, and they may not even be present if the key is aligned to `sizeof(u64)`.) > + > +This is efficient as it means that for all file lookups that require casefolding, > +it has identical performance to a regular lookup: > +a hash comparison and a `memcmp` of the name. > + Thanks for the writeup. This is very helpful. > +Rationale > +--------- > + > +Several designs were considered for this system: > +One was to introduce a dirent_v2, however that would be painful especially as > +the hash system only has support for a single key type. This would also need > +`BCH_NAME_MAX` to change between versions, and a new feature bit. > + > +Another option was to store without the two lengths, and just take the length of > +the regular name and casefolded name contiguously / 2 as the length. This would > +assume that the regular length == casefolded length, but that could potentially > +not be true, if the uppercase unicode glyph had a different UTF-8 encoding than > +the lowercase unicode glyph. > +It would be possible to disregard the casefold cache for those cases, but it was > +decided to simply encode the two string lengths in the key to avoid random > +performance issues if this edgecase was ever hit. > + > +The option settled on was to use a free-bit in d_type to mark a dirent as having > +a casefold cache, and then treat the first 4 bytes the name block as lengths. > +You can see this in the `d_cf_name_block` member of union in `bch_dirent`. > + > +The feature bit was used to allow casefolding support to be enabled for the majority > +of users, but some allow users who have no need for the feature to still use bcachefs as > +`CONFIG_UNICODE` can increase the kernel side a significant amount due to the tables used, > +which may be decider between using bcachefs for eg. embedded platforms. > + > +Other filesystems like ext4 and f2fs have a super-block level option for casefolding > +encoding, but bcachefs currently does not provide this. ext4 and f2fs do not expose > +any encodings than a single UTF-8 version. When future encodings are desirable, > +they will be added trivially using the opts mechanism. > + ... > diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h > index 23bae622309c..cc6797422977 100644 > --- a/fs/bcachefs/bcachefs_format.h > +++ b/fs/bcachefs/bcachefs_format.h ... > @@ -864,6 +866,7 @@ enum { > #define BCH_INODE_I_SECTORS_DIRTY (1 << __BCH_INODE_I_SECTORS_DIRTY) > #define BCH_INODE_UNLINKED (1 << __BCH_INODE_UNLINKED) > #define BCH_INODE_BACKPTR_UNTRUSTED (1 << __BCH_INODE_BACKPTR_UNTRUSTED) > +#define BCH_INODE_CASEFOLDED (1 << __BCH_INODE_CASEFOLDED) > > LE32_BITMASK(INODE_STR_HASH, struct bch_inode, bi_flags, 20, 24); > LE32_BITMASK(INODE_NR_FIELDS, struct bch_inode, bi_flags, 24, 31); > @@ -908,9 +911,25 @@ struct bch_dirent { > * Copy of mode bits 12-15 from the target inode - so userspace can get > * the filetype without having to do a stat() > */ > - __u8 d_type; > +#if defined(__LITTLE_ENDIAN_BITFIELD) > + __u8 d_type:5, > + d_unused:2, > + d_casefold:1; > +#elif defined(__BIG_ENDIAN_BITFIELD) > + __u8 d_casefold:1, > + d_unused:2, > + d_type:5; > +#endif Interesting. I didn't realize that bitfield endianness matched general endianness, so I presume this is required to ensure data is always le on disk. > > - __u8 d_name[]; > + union { > + struct { > + __u8 d_pad; > + __le16 d_name_len; > + __le16 d_cf_name_len; > + __u8 d_names[0]; > + } d_cf_name_block __packed; > + __u8 d_name[0]; > + } __packed; > } __packed __aligned(8); > > #define DT_SUBVOL 16 > @@ -1841,7 +1860,8 @@ static inline void SET_BCH_SB_BACKGROUND_COMPRESSION_TYPE(struct bch_sb *sb, __u > x(new_varint, 15) \ > x(journal_no_flush, 16) \ > x(alloc_v2, 17) \ > - x(extents_across_btree_nodes, 18) > + x(extents_across_btree_nodes, 18) \ > + x(casefolding, 19) > > #define BCH_SB_FEATURES_ALWAYS \ > ((1ULL << BCH_FEATURE_new_extent_overwrite)| \ > diff --git a/fs/bcachefs/dirent.c b/fs/bcachefs/dirent.c > index 6bdcccabd732..6ac45d7ce6e9 100644 > --- a/fs/bcachefs/dirent.c > +++ b/fs/bcachefs/dirent.c > @@ -13,8 +13,39 @@ > > #include > > -static inline unsigned dirent_val_u64s(unsigned len) > +static inline int bch2_casefold(struct btree_trans *trans, const struct bch_hash_info *info, > + const struct qstr *str, struct qstr *out_cf) > { > +#if IS_ENABLED(CONFIG_UNICODE) > + unsigned char *buf = NULL; > + int ret; > + > + *out_cf = (struct qstr) QSTR_INIT(NULL, 0); > + > + buf = bch2_trans_kmalloc(trans, BCH_NAME_MAX + 1); > + ret = PTR_ERR_OR_ZERO(buf); > + if (ret) > + return ret; > + > + ret = utf8_casefold(info->cf_encoding, str, buf, BCH_NAME_MAX + 1); Looks like this helper really only needs the encoding pointer rather than the full hash_info..? > + if (ret <= 0) > + return ret; > + > + *out_cf = (struct qstr) QSTR_INIT(buf, ret); > + return 0; > +#else > + *out_cf = (struct qstr) QSTR_INIT(NULL, 0); > + > + return -EOPNOTSUPP; > +#endif It also looks like the #else could be completely avoided here. For example, lift the local vars out of the ifdef, init out_cf using the NULL buf, init ret to -EOPNOTSUPP and push 'return ret;' out of the end of the ifdef. Hm? > +} > + > +static inline unsigned dirent_val_u64s(unsigned len, unsigned cf_len) > +{ > + if (cf_len > 0) { > + return DIV_ROUND_UP(offsetof(struct bch_dirent, d_cf_name_block.d_names) + len + cf_len, > + sizeof(u64)); > + } > return DIV_ROUND_UP(offsetof(struct bch_dirent, d_name) + len, > sizeof(u64)); FWIW, I think something along the lines of the pattern you use just below makes this a bit more readable. For example, something like: unsigned name_offset; name_offset = (cf_len > 0) ? offsetof(struct bch_dirent, d_cf_name_block.d_names) : offsetof(struct bch_dirent, d_name); return DIV_ROUND_UP(name_offset + len + cf_len, sizeof(u64)); ... but just a thought. > } ... > /* bcachefs inode flags -> FS_IOC_FSGETXATTR: */ > diff --git a/fs/bcachefs/str_hash.h b/fs/bcachefs/str_hash.h > index ae21a8cca1b4..866cfc145889 100644 > --- a/fs/bcachefs/str_hash.h > +++ b/fs/bcachefs/str_hash.h > @@ -34,6 +34,7 @@ bch2_str_hash_opt_to_type(struct bch_fs *c, enum bch_str_hash_opts opt) > > struct bch_hash_info { > u8 type; > + struct unicode_map *cf_encoding; > /* > * For crc32 or crc64 string hashes the first key value of > * the siphash_key (k0) is used as the key. > @@ -48,6 +49,9 @@ bch2_hash_info_init(struct bch_fs *c, const struct bch_inode_unpacked *bi) > struct bch_hash_info info = { > .type = (bi->bi_flags >> INODE_STR_HASH_OFFSET) & > ~(~0U << INODE_STR_HASH_BITS), > +#if IS_ENABLED(CONFIG_UNICODE) > + .cf_encoding = !!(bi->bi_flags & BCH_INODE_CASEFOLDED) ? c->cf_encoding : NULL, > +#endif What's the purpose for propagating this pointer via the hash? Is this necessary to (eventually) support per-directory encodings, or something else? If purely the former and that is something to possibly come later, it does seem a little premature to me in that the infrastructure part should probably go along with the feature functionality that takes advantage of it. BTW on a similar train of thought.. I'm not familiar enough with the feature to have an opinion on Gabriel's feedback wrt to storing the casefolded name on-disk, but I do think there's value in breaking down the feature into smaller logical components, particularly when there are performance vs. complexity tradeoffs. So for example if you wanted to continue to implement the on-disk format change approach, perhaps factor out core functionality along the lines of what ext4 does into a patch 1, then add the on-disk format change support in patch 2 as an (optional?) optimization (then lift off the per-dir bits into later patches that implement the granular approach, etc.). That not only makes this easier to review, but IMO significantly easier to maintain, test, diagnose, etc., such that it's at least worth considering. Just my .02. Brian > .siphash_key = { .k0 = bi->bi_hash_seed } > }; > > diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c > index 5c62fcf3afdb..78d19cfb966a 100644 > --- a/fs/bcachefs/super.c > +++ b/fs/bcachefs/super.c > @@ -757,6 +757,25 @@ static struct bch_fs *bch2_fs_alloc(struct bch_sb *sb, struct bch_opts opts) > if (ret) > goto err; > > +#if IS_ENABLED(CONFIG_UNICODE) > + /* Default encoding until we can potentially have more as an option. */ > + c->cf_encoding = utf8_load(BCH_FS_DEFAULT_UTF8_ENCODING); > + if (IS_ERR(c->cf_encoding)) { > + printk(KERN_ERR "Cannot load UTF-8 encoding for filesystem. Version: %u.%u.%u", > + unicode_major(BCH_FS_DEFAULT_UTF8_ENCODING), > + unicode_minor(BCH_FS_DEFAULT_UTF8_ENCODING), > + unicode_rev(BCH_FS_DEFAULT_UTF8_ENCODING)); > + ret = -EINVAL; > + goto err; > + } > +#else > + if (c->sb.features & (1ULL << BCH_FEATURE_casefolding)) { > + printk(KERN_ERR "Cannot mount a filesystem with casefolding on a kernel without CONFIG_UNICODE\n"); > + ret = -EINVAL; > + goto err; > + } > +#endif > + > pr_uuid(&name, c->sb.user_uuid.b); > strscpy(c->name, name.buf, sizeof(c->name)); > printbuf_exit(&name); > -- > 2.41.0 >