* Re: [f2fs-dev] [PATCH v3 1/2] libfs: reduce the number of memory allocations in generic_ci_match
@ 2025-07-08  6:52 ywen.chen
  0 siblings, 0 replies; 5+ messages in thread
From: ywen.chen @ 2025-07-08  6:52 UTC
  To: Christoph Hellwig, Eric Biggers
  Cc: brauner, tytso, linux-kernel, linux-f2fs-devel, Christoph Hellwig,
	adilger.kernel, viro, linux-fsdevel, jaegeuk, linux-ext4

> But I wonder why generic_ci_match is even called that often.  Both ext4
> and f2fs support hashed lookups, so you should usually only see it called
> for the main match, plus the occasional hash false positive, which should
> be rare if the hash works.

Even in the latest version of Linux, f2fs still uses linear search in
some scenarios. The linear search logic was introduced by commit
91b587ba79e1 ("f2fs: Introduce linear search for dentries"). That
commit was designed to solve the problem of inconsistent hashes before
and after the rollback of commit 5c26d2f1d3f5 ("unicode: Don't special
case ignorable code points"), which led to files being inaccessible.

To reduce the impact of linear search, relatively new versions also
introduced logic for turning it off. However, the conditions for
triggering this turn-off logic on f2fs are rather strict:

1. Use the latest version of the fsck.f2fs tool to correct the file
   system.
2. Use a relatively new version of the kernel. (For example, linear
   search cannot be turned off in v6.6.)

The performance gain of this commit is very obvious in scenarios where
linear search is not turned off. In scenarios where linear search is
turned off, it does not introduce any performance problems either.

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
* Re: [f2fs-dev] [PATCH] f2fs: improve the performance of f2fs_lookup
@ 2025-07-03  8:56 Christoph Hellwig
  2025-07-04  2:43 ` [f2fs-dev] [PATCH v3 1/2] libfs: reduce the number of memory allocations in generic_ci_match Yuwen Chen
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2025-07-03  8:56 UTC
  To: Yuwen Chen
  Cc: brauner, tytso, linux-kernel, linux-f2fs-devel, adilger.kernel,
	viro, linux-fsdevel, jaegeuk, linux-ext4

On Thu, Jul 03, 2025 at 04:21:30PM +0800, Yuwen Chen wrote:
> On the Android system, the file creation operation will call
> the f2fs_lookup function. When there are too many files in a
> directory, the generic_ci_match operation will be called
> repeatedly in large quantities. In extreme cases, the file
> creation speed will drop to three times per second.

This fails to explain what you are changing in detail, and why
(except for the very high-level problem statement here).

>
> Signed-off-by: Yuwen Chen <ywen.chen@foxmail.com>
> ---
>  fs/ext4/namei.c    |  2 +-
>  fs/f2fs/dir.c      | 24 +++++++++++++++++-------
>  fs/f2fs/f2fs.h     |  3 ++-
>  fs/f2fs/inline.c   |  3 ++-
>  fs/libfs.c         | 32 +++++++++++++++++++++++++++++---
>  include/linux/fs.h |  8 +++++++-

Also please split generic infrastructure changes from f2fs ones.
* [f2fs-dev] [PATCH v3 1/2] libfs: reduce the number of memory allocations in generic_ci_match
  2025-07-03  8:56 [f2fs-dev] [PATCH] f2fs: improve the performance of f2fs_lookup Christoph Hellwig
@ 2025-07-04  2:43 ` Yuwen Chen
  2025-07-04  6:02   ` Eric Biggers via Linux-f2fs-devel
  0 siblings, 1 reply; 5+ messages in thread
From: Yuwen Chen @ 2025-07-04  2:43 UTC
  To: hch
  Cc: brauner, ywen.chen, tytso, linux-kernel, linux-f2fs-devel,
	adilger.kernel, viro, linux-fsdevel, jaegeuk, linux-ext4

During path traversal, the generic_ci_match function may be called
multiple times. The number of memory allocations and releases
in it accounts for a relatively high proportion in the flamegraph.
This patch significantly reduces the number of memory allocations
in generic_ci_match through pre-allocation.

Signed-off-by: Yuwen Chen <ywen.chen@foxmail.com>
---
 fs/ext4/namei.c    |  2 +-
 fs/f2fs/dir.c      |  2 +-
 fs/libfs.c         | 33 ++++++++++++++++++++++++++++++---
 include/linux/fs.h |  8 +++++++-
 4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a178ac2294895..f235693bd71aa 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1443,7 +1443,7 @@ static bool ext4_match(struct inode *parent,

 		return generic_ci_match(parent, fname->usr_fname,
 					&fname->cf_name, de->name,
-					de->name_len) > 0;
+					de->name_len, NULL) > 0;
 	}
 #endif

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index c36b3b22bfffd..4c6611fbd9574 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -197,7 +197,7 @@ static inline int f2fs_match_name(const struct inode *dir,
 	if (fname->cf_name.name)
 		return generic_ci_match(dir, fname->usr_fname,
 					&fname->cf_name,
-					de_name, de_name_len);
+					de_name, de_name_len, NULL);

 #endif
 	f.usr_fname = fname->usr_fname;
diff --git a/fs/libfs.c b/fs/libfs.c
index 9ea0ecc325a81..d2a6b2a4fe11c 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1863,6 +1863,26 @@ static const struct dentry_operations generic_ci_dentry_ops = {
 #endif
 };

+#define DECRYPTED_NAME_PREALLOC_MIN_LEN 64
+static inline char *decrypted_name_prealloc_resize(
+		struct decrypted_name_prealloc *prealloc,
+		size_t wantlen)
+{
+	char *retbuf = NULL;
+
+	if (prealloc->name && wantlen >= prealloc->namelen)
+		return prealloc->name;
+
+	retbuf = kmalloc(wantlen + DECRYPTED_NAME_PREALLOC_MIN_LEN, GFP_KERNEL);
+	if (!retbuf)
+		return NULL;
+
+	kfree(prealloc->name);
+	prealloc->name = retbuf;
+	prealloc->namelen = wantlen + DECRYPTED_NAME_PREALLOC_MIN_LEN;
+	return retbuf;
+}
+
 /**
  * generic_ci_match() - Match a name (case-insensitively) with a dirent.
  * This is a filesystem helper for comparison with directory entries.
@@ -1873,6 +1893,7 @@ static const struct dentry_operations generic_ci_dentry_ops = {
  * @folded_name: Optional pre-folded name under lookup
  * @de_name: Dirent name.
  * @de_name_len: dirent name length.
+ * @prealloc: decrypted name memory buffer
  *
  * Test whether a case-insensitive directory entry matches the filename
  * being searched.  If @folded_name is provided, it is used instead of
@@ -1884,7 +1905,8 @@ static const struct dentry_operations generic_ci_dentry_ops = {
 int generic_ci_match(const struct inode *parent,
 		     const struct qstr *name,
 		     const struct qstr *folded_name,
-		     const u8 *de_name, u32 de_name_len)
+		     const u8 *de_name, u32 de_name_len,
+		     struct decrypted_name_prealloc *prealloc)
 {
 	const struct super_block *sb = parent->i_sb;
 	const struct unicode_map *um = sb->s_encoding;
@@ -1899,7 +1921,11 @@ int generic_ci_match(const struct inode *parent,
 		if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
 			return -EINVAL;

-		decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
+		if (!prealloc)
+			decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
+		else
+			decrypted_name.name = decrypted_name_prealloc_resize(
+					prealloc, de_name_len);
 		if (!decrypted_name.name)
 			return -ENOMEM;
 		res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
@@ -1928,7 +1954,8 @@ int generic_ci_match(const struct inode *parent,
 		res = utf8_strncasecmp(um, name, &dirent);

 out:
-	kfree(decrypted_name.name);
+	if (!prealloc)
+		kfree(decrypted_name.name);
 	if (res < 0 && sb_has_strict_encoding(sb)) {
 		pr_err_ratelimited("Directory contains filename that is invalid UTF-8");
 		return 0;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4ec77da65f144..65307c8c11485 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3651,10 +3651,16 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
 extern int generic_check_addressable(unsigned, u64);

 extern void generic_set_sb_d_ops(struct super_block *sb);
+
+struct decrypted_name_prealloc {
+	char *name;
+	size_t namelen;
+};
 extern int generic_ci_match(const struct inode *parent,
 			    const struct qstr *name,
 			    const struct qstr *folded_name,
-			    const u8 *de_name, u32 de_name_len);
+			    const u8 *de_name, u32 de_name_len,
+			    struct decrypted_name_prealloc *prealloc);

 #if IS_ENABLED(CONFIG_UNICODE)
 int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str);
--
2.34.1
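The preallocation idea in the patch above can be sketched in user space as a grow-only scratch buffer. This is only an illustration under stated assumptions: `struct name_scratch`, `name_scratch_get()` and `name_scratch_free()` are hypothetical names, and `malloc()`/`free()` stand in for `kmalloc()`/`kfree()`. The sketch reuses the cached buffer only when the requested length fits (`wantlen <= cap`); the helper in the posted patch writes that comparison the other way around (`wantlen >= prealloc->namelen`), which looks as if it would hand back a buffer that is too small.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the patch's struct decrypted_name_prealloc. */
struct name_scratch {
	char *buf;	/* reusable buffer, NULL until first use */
	size_t cap;	/* capacity of buf in bytes */
};

/* Extra headroom per allocation, mirroring the 64-byte pad in the patch. */
#define NAME_SCRATCH_SLACK 64

/*
 * Return a buffer of at least wantlen bytes, reusing the cached one when
 * it is already large enough. Returns NULL on allocation failure, in which
 * case the previously cached buffer is left untouched.
 */
static char *name_scratch_get(struct name_scratch *s, size_t wantlen)
{
	char *p;

	assert(s != NULL);

	if (s->buf && wantlen <= s->cap)
		return s->buf;

	p = malloc(wantlen + NAME_SCRATCH_SLACK);
	if (!p)
		return NULL;

	free(s->buf);
	s->buf = p;
	s->cap = wantlen + NAME_SCRATCH_SLACK;
	return p;
}

/* Release the cached buffer once the whole directory scan is finished. */
static void name_scratch_free(struct name_scratch *s)
{
	assert(s != NULL);

	free(s->buf);
	s->buf = NULL;
	s->cap = 0;
}
```

A caller would initialize the struct to `{ NULL, 0 }` once per lookup, call `name_scratch_get()` for every directory entry it compares, and call `name_scratch_free()` when the scan ends, so that a long scan performs only a handful of allocations.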
* Re: [f2fs-dev] [PATCH v3 1/2] libfs: reduce the number of memory allocations in generic_ci_match
  2025-07-04  2:43 ` [f2fs-dev] [PATCH v3 1/2] libfs: reduce the number of memory allocations in generic_ci_match Yuwen Chen
@ 2025-07-04  6:02   ` Eric Biggers via Linux-f2fs-devel
  2025-07-07  5:27     ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Biggers via Linux-f2fs-devel @ 2025-07-04  6:02 UTC
  To: Yuwen Chen
  Cc: brauner, tytso, linux-kernel, linux-f2fs-devel, hch,
	adilger.kernel, viro, linux-fsdevel, jaegeuk, linux-ext4

On Fri, Jul 04, 2025 at 10:43:57AM +0800, Yuwen Chen wrote:
> During path traversal, the generic_ci_match function may be called
> multiple times. The number of memory allocations and releases
> in it accounts for a relatively high proportion in the flamegraph.
> This patch significantly reduces the number of memory allocations
> in generic_ci_match through pre-allocation.
>
> Signed-off-by: Yuwen Chen <ywen.chen@foxmail.com>
> ---
>  fs/ext4/namei.c    |  2 +-
>  fs/f2fs/dir.c      |  2 +-
>  fs/libfs.c         | 33 ++++++++++++++++++++++++++++++---
>  include/linux/fs.h |  8 +++++++-
>  4 files changed, 39 insertions(+), 6 deletions(-)
>

The reason the allocation is needed at all is because
generic_ci_match() has to decrypt the encrypted on-disk filename from
the dentry that it's matching against.  It can't decrypt in-place,
since the source buffer is in the pagecache which must not be
modified.  Hence, a separate destination buffer is needed.

Filenames have a maximum length of NAME_MAX, i.e. 255, bytes.  It
would be *much* simpler to just allocate that on the stack.

And we almost can.  255 bytes is on the high end of what can be
acceptable to allocate on the stack in the kernel.  However, here it
would give a lot of benefit and would always occur close to the leaves
in the call graph.  So the size is not a barrier here, IMO.

The real problem is, once again, the legacy crypto_skcipher API, which
requires that the source/destination buffers be provided as
scatterlists.  In Linux, the kernel stack can be in the vmalloc area.
Thus, the buffers passed to crypto_skcipher cannot be stack buffers
unless the caller actually is aware of how to turn a vmalloc'ed buffer
into a scatterlist, which is hard to do.  (See verity_ahash_update()
in drivers/md/dm-verity-target.c for an example.)

Fortunately, I'm currently in the process of introducing library APIs
that will supersede these legacy crypto APIs.  They'll be simpler and
faster and won't have these silly limitations like not working on
virtual addresses...  I plan to make fscrypt use the library APIs
instead of the legacy crypto API.

It will take some time to land everything, though.  We can consider
this patchset as a workaround in the meantime.  But it's sad to see
the legacy crypto API continue to cause problems and more time be
wasted on these problems.

I do wonder if the "turn a vmalloc'ed buffer into a scatterlist" trick
that some code in the kernel uses is something that would be worth
adopting for now in fname_decrypt().  As I mentioned above, it's hard
to do (you have to go page by page), but it's possible.  That would
allow immediately moving generic_ci_match() to use a stack allocation,
which would avoid adding all the complexity of the preallocation that
you have in this patchset.

- Eric
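The "go page by page" step mentioned above comes down to cutting a virtually contiguous buffer into pieces that never cross a page boundary. The following is a minimal user-space sketch of only that arithmetic, not of the kernel code: `split_by_page()`, `struct page_chunk` and the fixed 4096-byte `SKETCH_PAGE_SIZE` are assumptions made for illustration. In the kernel, each resulting piece would presumably be resolved to its page (for example with `vmalloc_to_page()`) and added to the scatterlist (for example with `sg_set_page()`); those calls are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/* One piece of the buffer that lies entirely within a single page. */
struct page_chunk {
	size_t page_index;	/* page number, counted from the first page touched */
	size_t offset;		/* byte offset within that page */
	size_t len;		/* number of bytes in this piece */
};

/*
 * Split [addr, addr + len) into pieces that never cross a page boundary.
 * Fills at most max_chunks entries of out and returns how many pieces
 * the range needs in total.
 */
static size_t split_by_page(uintptr_t addr, size_t len,
			    struct page_chunk *out, size_t max_chunks)
{
	uintptr_t first_page = addr / SKETCH_PAGE_SIZE;
	size_t n = 0;

	assert(out != NULL || max_chunks == 0);

	while (len > 0) {
		size_t off = addr % SKETCH_PAGE_SIZE;
		size_t room = SKETCH_PAGE_SIZE - off;
		size_t take = len < room ? len : room;

		if (n < max_chunks) {
			out[n].page_index = addr / SKETCH_PAGE_SIZE - first_page;
			out[n].offset = off;
			out[n].len = take;
		}
		n++;
		addr += take;
		len -= take;
	}
	return n;
}
```

Because NAME_MAX (255) is smaller than a page, a filename buffer can straddle at most one page boundary, so a scatterlist for it would need no more than two entries.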
* Re: [f2fs-dev] [PATCH v3 1/2] libfs: reduce the number of memory allocations in generic_ci_match
  2025-07-04  6:02   ` Eric Biggers via Linux-f2fs-devel
@ 2025-07-07  5:27     ` Christoph Hellwig
  2025-07-08  7:11       ` Yuwen Chen
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2025-07-07  5:27 UTC
  To: Eric Biggers
  Cc: brauner, linux-ext4, tytso, linux-kernel, linux-f2fs-devel, hch,
	adilger.kernel, viro, linux-fsdevel, jaegeuk, Yuwen Chen

On Thu, Jul 03, 2025 at 11:02:59PM -0700, Eric Biggers wrote:

[Can you trim your replies to the usual 73 characters?  The long lines
make them quite hard to read without first reflowing them]

> The real problem is, once again, the legacy crypto_skcipher API, which requires
> that the source/destination buffers be provided as scatterlists.  In Linux, the
> kernel stack can be in the vmalloc area.  Thus, the buffers passed to
> crypto_skcipher cannot be stack buffers unless the caller actually is aware of
> how to turn a vmalloc'ed buffer into a scatterlist, which is hard to do.  (See
> verity_ahash_update() in drivers/md/dm-verity-target.c for an example.)

I don't think setting up a scatterlist for vmalloc data is hard.  But
it is extra boilerplate code that is rather annoying and adds
overhead.

> code in the kernel uses is something that would be worth adopting for
> now in fname_decrypt().  As I mentioned above, it's hard to do (you
> have to go page by page), but it's possible.  That would allow
> immediately moving generic_ci_match() to use a stack allocation, which
> would avoid adding all the complexity of the preallocation that you
> have in this patchset.

I suspect that all the overhead required for that gets close to that
of a memory allocation.

But I wonder why generic_ci_match is even called that often.  Both
ext4 and f2fs support hashed lookups, so you should usually only see
it called for the main match, plus the occasional hash false positive,
which should be rare if the hash works.

Yuwen, are you using f2fs in the mode where it does a linear scan on a
hash lookup miss?  That was added as a workaround for the utf8 code
point changes, but is a completely broken idea that defeats hashed
lookups and IIRC only was default for a very short time.

Note that even with this fixed, using an on-stack allocation would be
nice eventually when moving to the crypto library API, as it would
still avoid the allocation entirely.  But caching shouldn't be worth
it if the number of generic_ci_match calls per lookup is just slightly
above 1.
* Re: [f2fs-dev] [PATCH v3 1/2] libfs: reduce the number of memory allocations in generic_ci_match
  2025-07-07  5:27     ` Christoph Hellwig
@ 2025-07-08  7:11       ` Yuwen Chen
  0 siblings, 0 replies; 5+ messages in thread
From: Yuwen Chen @ 2025-07-08  7:11 UTC
  To: hch
  Cc: brauner, ywen.chen, tytso, linux-kernel, linux-f2fs-devel,
	ebiggers, adilger.kernel, viro, linux-fsdevel, jaegeuk, linux-ext4

On Sun, 6 Jul 2025 22:27:17 -0700 Christoph Hellwig wrote:
> But I wonder why generic_ci_match is even called that often.  Both ext4
> and f2fs support hashed lookups, so you should usually only see it called
> for the main match, plus the occasional hash false positive, which should
> be rare if the hash works.

Even in the latest version of Linux, f2fs still uses linear search in
some scenarios. The linear search logic was introduced by commit
91b587ba79e1 ("f2fs: Introduce linear search for dentries"). That
commit was designed to solve the problem of inconsistent hashes before
and after the rollback of commit 5c26d2f1d3f5 ("unicode: Don't special
case ignorable code points"), which led to files being inaccessible.

To reduce the impact of linear search, relatively new versions also
introduced logic for turning it off. However, the conditions for
triggering this turn-off logic on f2fs are rather strict:

1. Use the latest version of the fsck.f2fs tool to correct the file
   system.
2. Use a relatively new version of the kernel. (For example, linear
   search cannot be turned off in v6.6.)

The performance gain of this commit is very obvious in scenarios where
linear search is not turned off. In scenarios where linear search is
turned off, it does not introduce any performance problems either.
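The difference between a hashed lookup and a lookup with a linear fallback, as discussed in the two messages above, can be made concrete with a toy model that counts full name comparisons. Everything in it is an assumption for illustration: `struct toy_dirent`, `toy_ci_hash()` (a sum of lowercased bytes, far weaker than any real directory hash), `toy_ci_match()` and `toy_lookup()` are hypothetical and do not mirror f2fs data structures. The point it shows is that a lookup for a name that does not exist costs zero full comparisons with the hash filter alone, but one comparison per directory entry once the linear fallback is enabled, which would be consistent with the slowdown reported for file creation in large directories.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy directory entry: a name plus the hash stored when it was created. */
struct toy_dirent {
	const char *name;
	unsigned int stored_hash;
};

/* Toy case-insensitive hash: sum of lowercased bytes. Illustration only. */
static unsigned int toy_ci_hash(const char *s)
{
	unsigned int h = 0;

	for (; *s; s++)
		h += (unsigned int)tolower((unsigned char)*s);
	return h;
}

/* Stand-in for the expensive per-entry comparison; counts how often it runs. */
static bool toy_ci_match(const char *a, const char *b, size_t *calls)
{
	(*calls)++;
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return false;
	return *a == *b;
}

/*
 * Look up name in dir. The first pass compares only entries whose stored
 * hash equals the computed hash. If that misses and linear_fallback is
 * set, the second pass compares against every entry. Returns the entry
 * index or -1; *calls receives the number of full comparisons performed.
 */
static int toy_lookup(const struct toy_dirent *dir, size_t n, const char *name,
		      bool linear_fallback, size_t *calls)
{
	unsigned int h = toy_ci_hash(name);
	size_t i;

	assert(calls != NULL);
	*calls = 0;

	for (i = 0; i < n; i++)
		if (dir[i].stored_hash == h &&
		    toy_ci_match(dir[i].name, name, calls))
			return (int)i;

	if (!linear_fallback)
		return -1;

	for (i = 0; i < n; i++)
		if (toy_ci_match(dir[i].name, name, calls))
			return (int)i;
	return -1;
}
```

In this model the number of comparisons for a missing name grows linearly with the directory size whenever the fallback is active, so any per-comparison cost such as a buffer allocation is multiplied by the number of entries.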
end of thread, other threads:[~2025-07-08  7:23 UTC | newest]

Thread overview: 5+ messages
2025-07-08  6:52 [f2fs-dev] [PATCH v3 1/2] libfs: reduce the number of memory allocations in generic_ci_match ywen.chen
  -- strict thread matches above, loose matches on Subject: below --
2025-07-03  8:56 [f2fs-dev] [PATCH] f2fs: improve the performance of f2fs_lookup Christoph Hellwig
2025-07-04  2:43 ` [f2fs-dev] [PATCH v3 1/2] libfs: reduce the number of memory allocations in generic_ci_match Yuwen Chen
2025-07-04  6:02   ` Eric Biggers via Linux-f2fs-devel
2025-07-07  5:27     ` Christoph Hellwig
2025-07-08  7:11       ` Yuwen Chen