* [PATCH 0/2] speed up /proc/filesystems
@ 2025-03-29 19:28 Mateusz Guzik
2025-03-29 19:28 ` [PATCH 1/2] proc: add a helper for marking files as permanent by external consumers Mateusz Guzik
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Mateusz Guzik @ 2025-03-29 19:28 UTC (permalink / raw)
To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, Mateusz Guzik
I accidentally found out it is used a lot *and* is incredibly slow.
Part of it is procfs protecting the file from going away on each op,
other part is content generatin being dog slow.
Turns out procfs did not provide an interface to mark files as
permanent. I added easiest hack I could think of to remedy the problem,
I am not going to argue how to do it.
Mateusz Guzik (2):
proc: add a helper for marking files as permanent by external
consumers
fs: cache the string generated by reading /proc/filesystems
fs/filesystems.c | 148 +++++++++++++++++++++++++++++++++++++---
fs/proc/generic.c | 6 ++
include/linux/proc_fs.h | 1 +
3 files changed, 147 insertions(+), 8 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH 1/2] proc: add a helper for marking files as permanent by external consumers 2025-03-29 19:28 [PATCH 0/2] speed up /proc/filesystems Mateusz Guzik @ 2025-03-29 19:28 ` Mateusz Guzik 2025-03-29 19:28 ` [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems Mateusz Guzik 2025-04-01 10:34 ` [PATCH 0/2] speed up /proc/filesystems Christian Brauner 2 siblings, 0 replies; 10+ messages in thread From: Mateusz Guzik @ 2025-03-29 19:28 UTC (permalink / raw) To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, Mateusz Guzik This avoids refing trips on use, added with VFS in mind but should be helpful elsewhere as well. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> --- fs/proc/generic.c | 6 ++++++ include/linux/proc_fs.h | 1 + 2 files changed, 7 insertions(+) diff --git a/fs/proc/generic.c b/fs/proc/generic.c index a3e22803cddf..ae86554bb271 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -668,6 +668,12 @@ struct proc_dir_entry *proc_create_single_data(const char *name, umode_t mode, } EXPORT_SYMBOL(proc_create_single_data); +void proc_make_permanent(struct proc_dir_entry *de) +{ + pde_make_permanent(de); +} +EXPORT_SYMBOL(proc_make_permanent); + void proc_set_size(struct proc_dir_entry *de, loff_t size) { de->size = size; diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index ea62201c74c4..0b6fcef4099a 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -105,6 +105,7 @@ struct proc_dir_entry *proc_create_single_data(const char *name, umode_t mode, int (*show)(struct seq_file *, void *), void *data); #define proc_create_single(name, mode, parent, show) \ proc_create_single_data(name, mode, parent, show, NULL) +void proc_make_permanent(struct proc_dir_entry *); extern struct proc_dir_entry *proc_create_data(const char *, umode_t, struct proc_dir_entry *, -- 2.43.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems 2025-03-29 19:28 [PATCH 0/2] speed up /proc/filesystems Mateusz Guzik 2025-03-29 19:28 ` [PATCH 1/2] proc: add a helper for marking files as permanent by external consumers Mateusz Guzik @ 2025-03-29 19:28 ` Mateusz Guzik 2025-03-29 20:53 ` Andreas Schwab 2025-04-01 10:30 ` Christian Brauner 2025-04-01 10:34 ` [PATCH 0/2] speed up /proc/filesystems Christian Brauner 2 siblings, 2 replies; 10+ messages in thread From: Mateusz Guzik @ 2025-03-29 19:28 UTC (permalink / raw) To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, Mateusz Guzik It is being read surprisingly often (e.g., by mkdir, ls and even sed!). This is lock-protected pointer chasing over a linked list to pay for sprintf for every fs (32 on my boxen). Instead cache the result. While here mark the file as permanent to avoid atomic refs on each op (which can also degrade to taking a spinlock). open+read+close cycle single-threaded (ops/s): before: 450929 after: 982308 (+117%) Here the main bottleneck is memcg. open+read+close cycle with 20 processes (ops/s): before: 578654 after: 3163961 (+446%) The main bottleneck *before* is spinlock acquire in procfs eliminated by marking the file as permanent. The main bottleneck *after* is the spurious lockref trip on open. The file looks like a sterotypical C from the 90s, right down to an open-code and slightly obfuscated linked list. I intentionally did not clean up any of it -- I think the file will be best served by a Rust rewrite when the time comes. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> --- fs/filesystems.c | 148 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 140 insertions(+), 8 deletions(-) diff --git a/fs/filesystems.c b/fs/filesystems.c index 58b9067b2391..5d4649fd8788 100644 --- a/fs/filesystems.c +++ b/fs/filesystems.c @@ -34,6 +34,23 @@ static struct file_system_type *file_systems; static DEFINE_RWLOCK(file_systems_lock); +#ifdef CONFIG_PROC_FS +static unsigned long file_systems_gen; + +struct file_systems_string { + struct rcu_head rcufree; + unsigned long gen; + size_t len; + char string[]; +}; +static struct file_systems_string *file_systems_string; +static void invalidate_filesystems_string(void); +#else +static void invalidate_filesystems_string(void) +{ +} +#endif + /* WARNING: This can be used only if we _already_ own a reference */ struct file_system_type *get_filesystem(struct file_system_type *fs) { @@ -83,10 +100,12 @@ int register_filesystem(struct file_system_type * fs) return -EBUSY; write_lock(&file_systems_lock); p = find_filesystem(fs->name, strlen(fs->name)); - if (*p) + if (*p) { res = -EBUSY; - else + } else { *p = fs; + invalidate_filesystems_string(); + } write_unlock(&file_systems_lock); return res; } @@ -115,6 +134,7 @@ int unregister_filesystem(struct file_system_type * fs) if (fs == *tmp) { *tmp = fs->next; fs->next = NULL; + invalidate_filesystems_string(); write_unlock(&file_systems_lock); synchronize_rcu(); return 0; @@ -234,25 +254,137 @@ int __init list_bdev_fs_names(char *buf, size_t size) } #ifdef CONFIG_PROC_FS -static int filesystems_proc_show(struct seq_file *m, void *v) +/* + * The fs list gets queried a lot by userspace, including rather surprising + * programs (would you guess *sed* is on the list?). In order to reduce the + * overhead we cache the resulting string, which normally hangs around below + * 512 bytes in size. + * + * As the list almost never changes, its creation is not particularly optimized + * for simplicity. + * + * We sort it out on read in order to not introduce a failure point for fs + * registration (in principle we may be unable to alloc memory for the list). + */ +static void invalidate_filesystems_string(void) { - struct file_system_type * tmp; + struct file_systems_string *fss; - read_lock(&file_systems_lock); + lockdep_assert_held_write(&file_systems_lock); + file_systems_gen++; + fss = file_systems_string; + WRITE_ONCE(file_systems_string, NULL); + kfree_rcu(fss, rcufree); +} + +static noinline int regen_filesystems_string(void) +{ + struct file_system_type *tmp; + struct file_systems_string *old, *new; + size_t newlen, usedlen; + unsigned long gen; + +retry: + lockdep_assert_not_held(&file_systems_lock); + + newlen = 0; + write_lock(&file_systems_lock); + gen = file_systems_gen; + tmp = file_systems; + /* pre-calc space for "%s\t%s\n" for each fs */ + while (tmp) { + if (!(tmp->fs_flags & FS_REQUIRES_DEV)) + newlen += strlen("nodev"); + newlen += strlen("\t"); + newlen += strlen(tmp->name); + newlen += strlen("\n"); + tmp = tmp->next; + } + write_unlock(&file_systems_lock); + + new = kmalloc(offsetof(struct file_systems_string, string) + newlen + 1, + GFP_KERNEL); + if (!new) + return -ENOMEM; + + new->gen = gen; + new->len = newlen; + new->string[newlen] = '\0'; + write_lock(&file_systems_lock); + old = file_systems_string; + + /* + * Did someone beat us to it? + */ + if (old && old->gen == file_systems_gen) { + write_unlock(&file_systems_lock); + kfree(new); + return 0; + } + + /* + * Did the list change in the meantime? + */ + if (gen != file_systems_gen) { + write_unlock(&file_systems_lock); + kfree(new); + goto retry; + } + + /* + * Populated the string. + * + * We know we have just enough space because we calculated the right + * size the previous time we had the lock and confirmed the list has + * not changed after reacquiring it. + */ + usedlen = 0; tmp = file_systems; while (tmp) { - seq_printf(m, "%s\t%s\n", + usedlen += sprintf(&new->string[usedlen], "%s\t%s\n", (tmp->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev", tmp->name); tmp = tmp->next; } - read_unlock(&file_systems_lock); + BUG_ON(new->len != strlen(new->string)); + + /* + * Pairs with consume fence in READ_ONCE() in filesystems_proc_show(). + */ + smp_store_release(&file_systems_string, new); + write_unlock(&file_systems_lock); + kfree_rcu(old, rcufree); return 0; } +static int filesystems_proc_show(struct seq_file *m, void *v) +{ + struct file_systems_string *fss; + + for (;;) { + scoped_guard(rcu) { + /* + * Pairs with smp_store_release() in regen_filesystems_string(). + */ + fss = READ_ONCE(file_systems_string); + if (likely(fss)) { + seq_write(m, fss->string, fss->len); + return 0; + } + } + + int err = regen_filesystems_string(); + if (unlikely(err)) + return err; + } +} + static int __init proc_filesystems_init(void) { - proc_create_single("filesystems", 0, NULL, filesystems_proc_show); + struct proc_dir_entry *pde; + + pde = proc_create_single("filesystems", 0, NULL, filesystems_proc_show); + proc_make_permanent(pde); return 0; } module_init(proc_filesystems_init); -- 2.43.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems 2025-03-29 19:28 ` [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems Mateusz Guzik @ 2025-03-29 20:53 ` Andreas Schwab 2025-04-04 8:41 ` Christoph Hellwig 2025-04-01 10:30 ` Christian Brauner 1 sibling, 1 reply; 10+ messages in thread From: Andreas Schwab @ 2025-03-29 20:53 UTC (permalink / raw) To: Mateusz Guzik; +Cc: brauner, viro, jack, linux-kernel, linux-fsdevel On Mär 29 2025, Mateusz Guzik wrote: > It is being read surprisingly often (e.g., by mkdir, ls and even sed!). It is part of libselinux (selinuxfs_exits), called by its library initializer. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems 2025-03-29 20:53 ` Andreas Schwab @ 2025-04-04 8:41 ` Christoph Hellwig 2025-04-05 4:55 ` Mateusz Guzik 0 siblings, 1 reply; 10+ messages in thread From: Christoph Hellwig @ 2025-04-04 8:41 UTC (permalink / raw) To: Andreas Schwab Cc: Mateusz Guzik, brauner, viro, jack, linux-kernel, linux-fsdevel On Sat, Mar 29, 2025 at 09:53:16PM +0100, Andreas Schwab wrote: > On Mär 29 2025, Mateusz Guzik wrote: > > > It is being read surprisingly often (e.g., by mkdir, ls and even sed!). > > It is part of libselinux (selinuxfs_exits), called by its library > initializer. Can we please fix libselinux instead of working around this really broken behavior in the kernel? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems 2025-04-04 8:41 ` Christoph Hellwig @ 2025-04-05 4:55 ` Mateusz Guzik 2025-04-05 5:26 ` Mateusz Guzik 0 siblings, 1 reply; 10+ messages in thread From: Mateusz Guzik @ 2025-04-05 4:55 UTC (permalink / raw) To: Christoph Hellwig Cc: Andreas Schwab, brauner, viro, jack, linux-kernel, linux-fsdevel On Fri, Apr 4, 2025 at 10:41 AM Christoph Hellwig <hch@infradead.org> wrote: > > On Sat, Mar 29, 2025 at 09:53:16PM +0100, Andreas Schwab wrote: > > On Mär 29 2025, Mateusz Guzik wrote: > > > > > It is being read surprisingly often (e.g., by mkdir, ls and even sed!). > > > > It is part of libselinux (selinuxfs_exits), called by its library > > initializer. > > Can we please fix libselinux instead of working around this really > broken behavior in the kernel? > That's a fair point, I'm going to ask them about it. If they fix it then I'll probably self-NAK the patch. -- Mateusz Guzik <mjguzik gmail.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems 2025-04-05 4:55 ` Mateusz Guzik @ 2025-04-05 5:26 ` Mateusz Guzik 0 siblings, 0 replies; 10+ messages in thread From: Mateusz Guzik @ 2025-04-05 5:26 UTC (permalink / raw) To: Christoph Hellwig Cc: Andreas Schwab, brauner, viro, jack, linux-kernel, linux-fsdevel On Sat, Apr 5, 2025 at 6:55 AM Mateusz Guzik <mjguzik@gmail.com> wrote: > > On Fri, Apr 4, 2025 at 10:41 AM Christoph Hellwig <hch@infradead.org> wrote: > > > > On Sat, Mar 29, 2025 at 09:53:16PM +0100, Andreas Schwab wrote: > > > On Mär 29 2025, Mateusz Guzik wrote: > > > > > > > It is being read surprisingly often (e.g., by mkdir, ls and even sed!). > > > > > > It is part of libselinux (selinuxfs_exits), called by its library > > > initializer. > > > > Can we please fix libselinux instead of working around this really > > broken behavior in the kernel? > > > > That's a fair point, I'm going to ask them about it. If they fix it > then I'll probably self-NAK the patch. > for interested parties: https://github.com/SELinuxProject/selinux/issues/468 -- Mateusz Guzik <mjguzik gmail.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems 2025-03-29 19:28 ` [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems Mateusz Guzik 2025-03-29 20:53 ` Andreas Schwab @ 2025-04-01 10:30 ` Christian Brauner 2025-04-01 12:12 ` Mateusz Guzik 1 sibling, 1 reply; 10+ messages in thread From: Christian Brauner @ 2025-04-01 10:30 UTC (permalink / raw) To: Mateusz Guzik; +Cc: viro, jack, linux-kernel, linux-fsdevel On Sat, Mar 29, 2025 at 08:28:21PM +0100, Mateusz Guzik wrote: > It is being read surprisingly often (e.g., by mkdir, ls and even sed!). > > This is lock-protected pointer chasing over a linked list to pay for > sprintf for every fs (32 on my boxen). > > Instead cache the result. > > While here mark the file as permanent to avoid atomic refs on each op > (which can also degrade to taking a spinlock). > > open+read+close cycle single-threaded (ops/s): > before: 450929 > after: 982308 (+117%) > > Here the main bottleneck is memcg. > > open+read+close cycle with 20 processes (ops/s): > before: 578654 > after: 3163961 (+446%) > > The main bottleneck *before* is spinlock acquire in procfs eliminated by > marking the file as permanent. The main bottleneck *after* is the > spurious lockref trip on open. > > The file looks like a sterotypical C from the 90s, right down to an > open-code and slightly obfuscated linked list. I intentionally did not > clean up any of it -- I think the file will be best served by a Rust > rewrite when the time comes. > > Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> > --- > + pde = proc_create_single("filesystems", 0, NULL, filesystems_proc_show); > + proc_make_permanent(pde); > return 0; This all looks good enough for me especially if it's really that much of a bottleneck for some workloads. But the above part is broken because proc_create_single() may return NULL afair and that means proc_make_permanent()->pde_make_permanent() will NULL-deref. I'll fix that up locally. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems 2025-04-01 10:30 ` Christian Brauner @ 2025-04-01 12:12 ` Mateusz Guzik 0 siblings, 0 replies; 10+ messages in thread From: Mateusz Guzik @ 2025-04-01 12:12 UTC (permalink / raw) To: Christian Brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel On Tue, Apr 1, 2025 at 12:31 PM Christian Brauner <brauner@kernel.org> wrote: > > On Sat, Mar 29, 2025 at 08:28:21PM +0100, Mateusz Guzik wrote: > > It is being read surprisingly often (e.g., by mkdir, ls and even sed!). > > > > This is lock-protected pointer chasing over a linked list to pay for > > sprintf for every fs (32 on my boxen). > > > > Instead cache the result. > > > > While here mark the file as permanent to avoid atomic refs on each op > > (which can also degrade to taking a spinlock). > > > > open+read+close cycle single-threaded (ops/s): > > before: 450929 > > after: 982308 (+117%) > > > > Here the main bottleneck is memcg. > > > > open+read+close cycle with 20 processes (ops/s): > > before: 578654 > > after: 3163961 (+446%) > > > > The main bottleneck *before* is spinlock acquire in procfs eliminated by > > marking the file as permanent. The main bottleneck *after* is the > > spurious lockref trip on open. > > > > The file looks like a sterotypical C from the 90s, right down to an > > open-code and slightly obfuscated linked list. I intentionally did not > > clean up any of it -- I think the file will be best served by a Rust > > rewrite when the time comes. > > > > Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> > > --- > > > + pde = proc_create_single("filesystems", 0, NULL, filesystems_proc_show); > > + proc_make_permanent(pde); > > return 0; > > This all looks good enough for me especially if it's really that much of > a bottleneck for some workloads. But the above part is broken because > proc_create_single() may return NULL afair and that means > proc_make_permanent()->pde_make_permanent() will NULL-deref. > > I'll fix that up locally. oh huh, indeed my bad. But in that case it should perhaps complain? WARN_ON? I would argue if this fails then something really went wrong. -- Mateusz Guzik <mjguzik gmail.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] speed up /proc/filesystems 2025-03-29 19:28 [PATCH 0/2] speed up /proc/filesystems Mateusz Guzik 2025-03-29 19:28 ` [PATCH 1/2] proc: add a helper for marking files as permanent by external consumers Mateusz Guzik 2025-03-29 19:28 ` [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems Mateusz Guzik @ 2025-04-01 10:34 ` Christian Brauner 2 siblings, 0 replies; 10+ messages in thread From: Christian Brauner @ 2025-04-01 10:34 UTC (permalink / raw) To: Mateusz Guzik; +Cc: Christian Brauner, viro, jack, linux-kernel, linux-fsdevel On Sat, 29 Mar 2025 20:28:19 +0100, Mateusz Guzik wrote: > I accidentally found out it is used a lot *and* is incredibly slow. > > Part of it is procfs protecting the file from going away on each op, > other part is content generatin being dog slow. > > Turns out procfs did not provide an interface to mark files as > permanent. I added easiest hack I could think of to remedy the problem, > I am not going to argue how to do it. > > [...] Applied to the vfs-6.16.procfs branch of the vfs/vfs.git tree. Patches in the vfs-6.16.procfs branch should appear in linux-next soon. Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it. It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated. Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch. tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs-6.16.procfs [1/2] proc: add a helper for marking files as permanent by external consumers https://git.kernel.org/vfs/vfs/c/6040503a448b [2/2] fs: cache the string generated by reading /proc/filesystems https://git.kernel.org/vfs/vfs/c/9750cdeb327d ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-04-05 5:26 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-03-29 19:28 [PATCH 0/2] speed up /proc/filesystems Mateusz Guzik 2025-03-29 19:28 ` [PATCH 1/2] proc: add a helper for marking files as permanent by external consumers Mateusz Guzik 2025-03-29 19:28 ` [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems Mateusz Guzik 2025-03-29 20:53 ` Andreas Schwab 2025-04-04 8:41 ` Christoph Hellwig 2025-04-05 4:55 ` Mateusz Guzik 2025-04-05 5:26 ` Mateusz Guzik 2025-04-01 10:30 ` Christian Brauner 2025-04-01 12:12 ` Mateusz Guzik 2025-04-01 10:34 ` [PATCH 0/2] speed up /proc/filesystems Christian Brauner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox