* [PATCH 1/2] proc: add a helper for marking files as permanent by external consumers
2025-03-29 19:28 [PATCH 0/2] speed up /proc/filesystems Mateusz Guzik
@ 2025-03-29 19:28 ` Mateusz Guzik
2025-03-29 19:28 ` [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems Mateusz Guzik
2025-04-01 10:34 ` [PATCH 0/2] speed up /proc/filesystems Christian Brauner
2 siblings, 0 replies; 10+ messages in thread
From: Mateusz Guzik @ 2025-03-29 19:28 UTC (permalink / raw)
To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, Mateusz Guzik
This avoids refing trips on use, added with VFS in mind but should be
helpful elsewhere as well.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
fs/proc/generic.c | 6 ++++++
include/linux/proc_fs.h | 1 +
2 files changed, 7 insertions(+)
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index a3e22803cddf..ae86554bb271 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -668,6 +668,12 @@ struct proc_dir_entry *proc_create_single_data(const char *name, umode_t mode,
}
EXPORT_SYMBOL(proc_create_single_data);
+void proc_make_permanent(struct proc_dir_entry *de)
+{
+ pde_make_permanent(de);
+}
+EXPORT_SYMBOL(proc_make_permanent);
+
void proc_set_size(struct proc_dir_entry *de, loff_t size)
{
de->size = size;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index ea62201c74c4..0b6fcef4099a 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -105,6 +105,7 @@ struct proc_dir_entry *proc_create_single_data(const char *name, umode_t mode,
int (*show)(struct seq_file *, void *), void *data);
#define proc_create_single(name, mode, parent, show) \
proc_create_single_data(name, mode, parent, show, NULL)
+void proc_make_permanent(struct proc_dir_entry *);
extern struct proc_dir_entry *proc_create_data(const char *, umode_t,
struct proc_dir_entry *,
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems
2025-03-29 19:28 [PATCH 0/2] speed up /proc/filesystems Mateusz Guzik
2025-03-29 19:28 ` [PATCH 1/2] proc: add a helper for marking files as permanent by external consumers Mateusz Guzik
@ 2025-03-29 19:28 ` Mateusz Guzik
2025-03-29 20:53 ` Andreas Schwab
2025-04-01 10:30 ` Christian Brauner
2025-04-01 10:34 ` [PATCH 0/2] speed up /proc/filesystems Christian Brauner
2 siblings, 2 replies; 10+ messages in thread
From: Mateusz Guzik @ 2025-03-29 19:28 UTC (permalink / raw)
To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, Mateusz Guzik
It is being read surprisingly often (e.g., by mkdir, ls and even sed!).
This is lock-protected pointer chasing over a linked list to pay for
sprintf for every fs (32 on my boxen).
Instead cache the result.
While here mark the file as permanent to avoid atomic refs on each op
(which can also degrade to taking a spinlock).
open+read+close cycle single-threaded (ops/s):
before: 450929
after: 982308 (+117%)
Here the main bottleneck is memcg.
open+read+close cycle with 20 processes (ops/s):
before: 578654
after: 3163961 (+446%)
The main bottleneck *before* is spinlock acquire in procfs eliminated by
marking the file as permanent. The main bottleneck *after* is the
spurious lockref trip on open.
The file looks like a sterotypical C from the 90s, right down to an
open-code and slightly obfuscated linked list. I intentionally did not
clean up any of it -- I think the file will be best served by a Rust
rewrite when the time comes.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
fs/filesystems.c | 148 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 140 insertions(+), 8 deletions(-)
diff --git a/fs/filesystems.c b/fs/filesystems.c
index 58b9067b2391..5d4649fd8788 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -34,6 +34,23 @@
static struct file_system_type *file_systems;
static DEFINE_RWLOCK(file_systems_lock);
+#ifdef CONFIG_PROC_FS
+static unsigned long file_systems_gen;
+
+struct file_systems_string {
+ struct rcu_head rcufree;
+ unsigned long gen;
+ size_t len;
+ char string[];
+};
+static struct file_systems_string *file_systems_string;
+static void invalidate_filesystems_string(void);
+#else
+static void invalidate_filesystems_string(void)
+{
+}
+#endif
+
/* WARNING: This can be used only if we _already_ own a reference */
struct file_system_type *get_filesystem(struct file_system_type *fs)
{
@@ -83,10 +100,12 @@ int register_filesystem(struct file_system_type * fs)
return -EBUSY;
write_lock(&file_systems_lock);
p = find_filesystem(fs->name, strlen(fs->name));
- if (*p)
+ if (*p) {
res = -EBUSY;
- else
+ } else {
*p = fs;
+ invalidate_filesystems_string();
+ }
write_unlock(&file_systems_lock);
return res;
}
@@ -115,6 +134,7 @@ int unregister_filesystem(struct file_system_type * fs)
if (fs == *tmp) {
*tmp = fs->next;
fs->next = NULL;
+ invalidate_filesystems_string();
write_unlock(&file_systems_lock);
synchronize_rcu();
return 0;
@@ -234,25 +254,137 @@ int __init list_bdev_fs_names(char *buf, size_t size)
}
#ifdef CONFIG_PROC_FS
-static int filesystems_proc_show(struct seq_file *m, void *v)
+/*
+ * The fs list gets queried a lot by userspace, including rather surprising
+ * programs (would you guess *sed* is on the list?). In order to reduce the
+ * overhead we cache the resulting string, which normally hangs around below
+ * 512 bytes in size.
+ *
+ * As the list almost never changes, its creation is not particularly optimized
+ * for simplicity.
+ *
+ * We sort it out on read in order to not introduce a failure point for fs
+ * registration (in principle we may be unable to alloc memory for the list).
+ */
+static void invalidate_filesystems_string(void)
{
- struct file_system_type * tmp;
+ struct file_systems_string *fss;
- read_lock(&file_systems_lock);
+ lockdep_assert_held_write(&file_systems_lock);
+ file_systems_gen++;
+ fss = file_systems_string;
+ WRITE_ONCE(file_systems_string, NULL);
+ kfree_rcu(fss, rcufree);
+}
+
+static noinline int regen_filesystems_string(void)
+{
+ struct file_system_type *tmp;
+ struct file_systems_string *old, *new;
+ size_t newlen, usedlen;
+ unsigned long gen;
+
+retry:
+ lockdep_assert_not_held(&file_systems_lock);
+
+ newlen = 0;
+ write_lock(&file_systems_lock);
+ gen = file_systems_gen;
+ tmp = file_systems;
+ /* pre-calc space for "%s\t%s\n" for each fs */
+ while (tmp) {
+ if (!(tmp->fs_flags & FS_REQUIRES_DEV))
+ newlen += strlen("nodev");
+ newlen += strlen("\t");
+ newlen += strlen(tmp->name);
+ newlen += strlen("\n");
+ tmp = tmp->next;
+ }
+ write_unlock(&file_systems_lock);
+
+ new = kmalloc(offsetof(struct file_systems_string, string) + newlen + 1,
+ GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ new->gen = gen;
+ new->len = newlen;
+ new->string[newlen] = '\0';
+ write_lock(&file_systems_lock);
+ old = file_systems_string;
+
+ /*
+ * Did someone beat us to it?
+ */
+ if (old && old->gen == file_systems_gen) {
+ write_unlock(&file_systems_lock);
+ kfree(new);
+ return 0;
+ }
+
+ /*
+ * Did the list change in the meantime?
+ */
+ if (gen != file_systems_gen) {
+ write_unlock(&file_systems_lock);
+ kfree(new);
+ goto retry;
+ }
+
+ /*
+ * Populated the string.
+ *
+ * We know we have just enough space because we calculated the right
+ * size the previous time we had the lock and confirmed the list has
+ * not changed after reacquiring it.
+ */
+ usedlen = 0;
tmp = file_systems;
while (tmp) {
- seq_printf(m, "%s\t%s\n",
+ usedlen += sprintf(&new->string[usedlen], "%s\t%s\n",
(tmp->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev",
tmp->name);
tmp = tmp->next;
}
- read_unlock(&file_systems_lock);
+ BUG_ON(new->len != strlen(new->string));
+
+ /*
+ * Pairs with consume fence in READ_ONCE() in filesystems_proc_show().
+ */
+ smp_store_release(&file_systems_string, new);
+ write_unlock(&file_systems_lock);
+ kfree_rcu(old, rcufree);
return 0;
}
+static int filesystems_proc_show(struct seq_file *m, void *v)
+{
+ struct file_systems_string *fss;
+
+ for (;;) {
+ scoped_guard(rcu) {
+ /*
+ * Pairs with smp_store_release() in regen_filesystems_string().
+ */
+ fss = READ_ONCE(file_systems_string);
+ if (likely(fss)) {
+ seq_write(m, fss->string, fss->len);
+ return 0;
+ }
+ }
+
+ int err = regen_filesystems_string();
+ if (unlikely(err))
+ return err;
+ }
+}
+
static int __init proc_filesystems_init(void)
{
- proc_create_single("filesystems", 0, NULL, filesystems_proc_show);
+ struct proc_dir_entry *pde;
+
+ pde = proc_create_single("filesystems", 0, NULL, filesystems_proc_show);
+ proc_make_permanent(pde);
return 0;
}
module_init(proc_filesystems_init);
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH 0/2] speed up /proc/filesystems
2025-03-29 19:28 [PATCH 0/2] speed up /proc/filesystems Mateusz Guzik
2025-03-29 19:28 ` [PATCH 1/2] proc: add a helper for marking files as permanent by external consumers Mateusz Guzik
2025-03-29 19:28 ` [PATCH 2/2] fs: cache the string generated by reading /proc/filesystems Mateusz Guzik
@ 2025-04-01 10:34 ` Christian Brauner
2 siblings, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2025-04-01 10:34 UTC (permalink / raw)
To: Mateusz Guzik; +Cc: Christian Brauner, viro, jack, linux-kernel, linux-fsdevel
On Sat, 29 Mar 2025 20:28:19 +0100, Mateusz Guzik wrote:
> I accidentally found out it is used a lot *and* is incredibly slow.
>
> Part of it is procfs protecting the file from going away on each op,
> other part is content generatin being dog slow.
>
> Turns out procfs did not provide an interface to mark files as
> permanent. I added easiest hack I could think of to remedy the problem,
> I am not going to argue how to do it.
>
> [...]
Applied to the vfs-6.16.procfs branch of the vfs/vfs.git tree.
Patches in the vfs-6.16.procfs branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.16.procfs
[1/2] proc: add a helper for marking files as permanent by external consumers
https://git.kernel.org/vfs/vfs/c/6040503a448b
[2/2] fs: cache the string generated by reading /proc/filesystems
https://git.kernel.org/vfs/vfs/c/9750cdeb327d
^ permalink raw reply [flat|nested] 10+ messages in thread