* [PATCH v3 0/3] revamp fs/filesystems.c
@ 2026-04-25 22:08 Mateusz Guzik
2026-04-25 22:08 ` [PATCH v3 1/3] proc: allow to mark /proc files permanent outside of fs/proc/ Mateusz Guzik
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Mateusz Guzik @ 2026-04-25 22:08 UTC
To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, adobriyan, Mateusz Guzik
The file is a mess, with a hand-rolled linked list in desperate need of
a cleanup.
The code to emit /proc/filesystems runs frequently because libselinux
reads the file, and libselinux is linked into numerous frequently used
programs (even ones you would not suspect, like sed!). To combat that,
pre-generate the string instead of pointer-chasing and printf-ing the
entries one by one.
open+read+close cycle single-threaded (ops/s):
before: 442732
after: 1063462 (+140%)
Scalability also improves thanks to bypassing ref maintenance on
open/close.
open+read+close cycle with 20 processes (ops/s):
before: 606177
after: 3300576 (+444%)
The main bottleneck afterwards is the spurious lockref trip on open.
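For anyone wanting to collect numbers of this kind, below is a minimal single-threaded sketch of an open+read+close loop. It is not the benchmark that produced the figures above; the helper name bench_open_read_close() and its parameters are made up for illustration, and it measures wall-clock time only.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the series): run `iters` open+read+close
 * cycles on `path` and return the achieved rate in ops/s, or -1.0 on error.
 */
double bench_open_read_close(const char *path, long iters)
{
	char buf[4096];
	struct timespec start, end;

	if (iters <= 0)
		return -1.0;
	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1.0;

	for (long i = 0; i < iters; i++) {
		ssize_t n;
		int fd = open(path, O_RDONLY);

		if (fd < 0)
			return -1.0;
		/* Drain the file until EOF (read() returns 0) or an error. */
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			;
		close(fd);
		if (n < 0)
			return -1.0;
	}

	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1.0;

	double secs = (double)(end.tv_sec - start.tv_sec) +
		      (double)(end.tv_nsec - start.tv_nsec) / 1e9;

	return secs > 0.0 ? (double)iters / secs : -1.0;
}
```

Calling bench_open_read_close("/proc/filesystems", 1000000) from a small main() and printing the result gives a single-threaded rate; the multi-process case would need one such loop per process.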
Alexey Dobriyan (1):
proc: allow to mark /proc files permanent outside of fs/proc/
Christian Brauner (1):
fs: RCU-ify filesystems list
Mateusz Guzik (1):
fs: cache the string generated by reading /proc/filesystems
fs/filesystems.c | 332 +++++++++++++++++++++++++++-------------
fs/ocfs2/super.c | 1 -
fs/proc/generic.c | 12 ++
fs/proc/internal.h | 3 +
include/linux/fs.h | 2 +-
include/linux/proc_fs.h | 10 ++
6 files changed, 252 insertions(+), 108 deletions(-)
--
2.48.1
* [PATCH v3 1/3] proc: allow to mark /proc files permanent outside of fs/proc/
2026-04-25 22:08 [PATCH v3 0/3] revamp fs/filesystems.c Mateusz Guzik
@ 2026-04-25 22:08 ` Mateusz Guzik
2026-04-25 22:08 ` [PATCH v3 2/3] fs: RCU-ify filesystems list Mateusz Guzik
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Mateusz Guzik @ 2026-04-25 22:08 UTC
To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, adobriyan
From: Alexey Dobriyan <adobriyan@gmail.com>
Add a proc_make_permanent() function that marks a PDE as permanent to
speed up open/read/close (one fewer alloc/free and lock/unlock pair).
Enable it for built-in code and for compiled-in modules.
The function magically becomes a nop in modular code.
Note, note, note!
If built-in code creates and deletes PDEs dynamically (not in an init
hook), then proc_make_permanent() must not be used.
It is intended for simple code:
	static int __init xxx_module_init(void)
	{
		/* xxx_show is a placeholder seq_file show callback */
		g_pde = proc_create_single("xxx", 0, NULL, xxx_show);
		proc_make_permanent(g_pde);
		return 0;
	}
	static void __exit xxx_module_exit(void)
	{
		proc_remove(g_pde);
	}
If the module is built in, the exit hook is never executed and the PDE
is permanent, so it is OK to mark it as such.
If the module is built as a loadable module, rmmod will yank the PDE,
but proc_make_permanent() is a nop and the core /proc code will do
everything right.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---
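To illustrate the "becomes a nop in modular code" point from the commit message, here is a userspace sketch of the same preprocessor pattern. DEMO_MODULE, DEMO_ENTRY_PERMANENT and struct demo_entry are hypothetical stand-ins for MODULE, PROC_ENTRY_PERMANENT and struct proc_dir_entry, and the flag value is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the permanent flag; the value is illustrative. */
#define DEMO_ENTRY_PERMANENT 0x1u

/* Hypothetical stand-in for struct proc_dir_entry. */
struct demo_entry {
	unsigned int flags;
};

/*
 * The body is selected at compile time. When the translation unit is built
 * with -DDEMO_MODULE (standing in for modular code), the call compiles to
 * nothing; otherwise it sets the flag on a non-NULL entry.
 */
static inline void demo_make_permanent(struct demo_entry *e)
{
#if !defined(DEMO_MODULE)
	if (e)
		e->flags |= DEMO_ENTRY_PERMANENT;
#else
	(void)e;	/* modular build: deliberately a no-op */
#endif
}

static inline bool demo_is_permanent(const struct demo_entry *e)
{
	return (e->flags & DEMO_ENTRY_PERMANENT) != 0;
}
```

The caller looks the same in both builds, which is what lets the init hook in the example above call the helper unconditionally.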
fs/proc/generic.c | 12 ++++++++++++
fs/proc/internal.h | 3 +++
include/linux/proc_fs.h | 10 ++++++++++
3 files changed, 25 insertions(+)
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 3063080f3bb2..497561ee3848 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -845,3 +845,15 @@ ssize_t proc_simple_write(struct file *f, const char __user *ubuf, size_t size,
kfree(buf);
return ret == 0 ? size : ret;
}
+
+/*
+ * Not exported to modules:
+ * modules' /proc files aren't permanent because modules aren't permanent.
+ */
+void impl_proc_make_permanent(struct proc_dir_entry *pde);
+void impl_proc_make_permanent(struct proc_dir_entry *pde)
+{
+ if (pde) {
+ pde_make_permanent(pde);
+ }
+}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 64dc44832808..1edbabbdbc5d 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -79,8 +79,11 @@ static inline bool pde_is_permanent(const struct proc_dir_entry *pde)
return pde->flags & PROC_ENTRY_PERMANENT;
}
+/* This is for builtin code, not even for modules which are compiled in. */
static inline void pde_make_permanent(struct proc_dir_entry *pde)
{
+ /* Ensure magic flag does something. */
+ static_assert(PROC_ENTRY_PERMANENT != 0);
pde->flags |= PROC_ENTRY_PERMANENT;
}
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 19d1c5e5f335..dceccd27a234 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -248,4 +248,14 @@ static inline struct pid_namespace *proc_pid_ns(struct super_block *sb)
bool proc_ns_file(const struct file *file);
+static inline void proc_make_permanent(struct proc_dir_entry *pde)
+{
+ /* Don't give matches to modules. */
+#if defined CONFIG_PROC_FS && !defined MODULE
+ /* This mess is created by defining "struct proc_dir_entry" elsewhere. */
+ void impl_proc_make_permanent(struct proc_dir_entry *pde);
+ impl_proc_make_permanent(pde);
+#endif
+}
+
#endif /* _LINUX_PROC_FS_H */
--
2.48.1
* [PATCH v3 2/3] fs: RCU-ify filesystems list
2026-04-25 22:08 [PATCH v3 0/3] revamp fs/filesystems.c Mateusz Guzik
2026-04-25 22:08 ` [PATCH v3 1/3] proc: allow to mark /proc files permanent outside of fs/proc/ Mateusz Guzik
@ 2026-04-25 22:08 ` Mateusz Guzik
2026-04-25 22:08 ` [PATCH v3 3/3] fs: cache the string generated by reading /proc/filesystems Mateusz Guzik
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Mateusz Guzik @ 2026-04-25 22:08 UTC
To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, adobriyan
From: Christian Brauner <brauner@kernel.org>
The drivers list was protected by an rwlock; every mount, every open
of /proc/filesystems and the legacy sysfs(2) syscall walked a
hand-rolled singly-linked list under it. /proc/filesystems is
especially hot because libselinux causes programs as mundane as
mkdir, ls and sed to open and read it on every invocation.
Convert the list to an RCU-protected hlist and switch the writer side
to a plain spinlock. Writers keep their existing non-sleeping
section while readers walk under rcu_read_lock() with no lock traffic:
- register_filesystem()/unregister_filesystem() take
file_systems_lock, publish via hlist_{add_tail,del_init}_rcu()
and invalidate the cached /proc/filesystems string.
unregister_filesystem() keeps its synchronize_rcu() after
dropping the lock so in-flight readers are drained before the
module (and its embedded file_system_type) can go away.
- __get_fs_type(), list_bdev_fs_names() and the
fs_index()/fs_name()/fs_maxindex() helpers walk the list under
rcu_read_lock(). fs_name() continues to drop the read-side
lock after try_module_get() and accesses ->name outside the RCU
section; the module reference pins the embedded file_system_type
across the boundary.
struct file_system_type::next becomes struct hlist_node list; no
in-tree caller references the old ->next field outside
fs/filesystems.c.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
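As a rough illustration of the reader/writer split described in the commit message, here is a userspace sketch. It is only a model: it uses C11 atomics and a toy spinlock instead of the kernel's RCU hlist primitives, it covers publication ordering only, and it has no equivalent of synchronize_rcu(), so entries must never be freed. All demo_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct file_system_type. */
struct demo_fs {
	const char *name;
	_Atomic(struct demo_fs *) next;
};

static _Atomic(struct demo_fs *) demo_head;
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;	/* toy writer-side spinlock */

static void demo_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire))
		;
}

static void demo_lock_release(void)
{
	atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/* Writer: serialize on the lock, reject duplicate names, publish at the tail. */
int demo_register(struct demo_fs *fs)
{
	_Atomic(struct demo_fs *) *link = &demo_head;
	struct demo_fs *p;
	int ret = 0;

	demo_lock_acquire();
	while ((p = atomic_load_explicit(link, memory_order_relaxed)) != NULL) {
		if (strcmp(p->name, fs->name) == 0) {
			ret = -1;
			goto out;
		}
		link = &p->next;
	}
	atomic_store_explicit(&fs->next, NULL, memory_order_relaxed);
	/* Release store: a reader that sees fs also sees its initialized fields. */
	atomic_store_explicit(link, fs, memory_order_release);
out:
	demo_lock_release();
	return ret;
}

/*
 * Reader: takes no lock. Matches the first `len` bytes of `name`, like
 * find_filesystem(); `len` must not exceed strlen(name).
 */
struct demo_fs *demo_find(const char *name, size_t len)
{
	struct demo_fs *p = atomic_load_explicit(&demo_head, memory_order_acquire);

	while (p) {
		if (strncmp(p->name, name, len) == 0 && p->name[len] == '\0')
			return p;
		p = atomic_load_explicit(&p->next, memory_order_acquire);
	}
	return NULL;
}
```

What the model leaves out is the hard part the patch handles: removal. In the kernel, unregister_filesystem() waits in synchronize_rcu() so that readers still walking the list are done before the entry can go away.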
fs/filesystems.c | 179 +++++++++++++++++++--------------------------
fs/ocfs2/super.c | 1 -
include/linux/fs.h | 2 +-
3 files changed, 75 insertions(+), 107 deletions(-)
diff --git a/fs/filesystems.c b/fs/filesystems.c
index 0c7d2b7ac26c..7976366d4197 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -17,22 +17,19 @@
#include <linux/slab.h>
#include <linux/uaccess.h>
#include <linux/fs_parser.h>
+#include <linux/rculist.h>
/*
- * Handling of filesystem drivers list.
- * Rules:
- * Inclusion to/removals from/scanning of list are protected by spinlock.
- * During the unload module must call unregister_filesystem().
- * We can access the fields of list element if:
- * 1) spinlock is held or
- * 2) we hold the reference to the module.
- * The latter can be guaranteed by call of try_module_get(); if it
- * returned 0 we must skip the element, otherwise we got the reference.
- * Once the reference is obtained we can drop the spinlock.
+ * Read-mostly filesystem drivers list.
+ *
+ * Readers walk under rcu_read_lock(); writers take file_systems_lock
+ * and publish via _rcu hlist primitives. unregister_filesystem()
+ * synchronize_rcu()s after unlock so the embedded file_system_type
+ * can't go away under a reader. To keep using a filesystem after
+ * the RCU section ends, take a module reference via try_module_get().
*/
-
-static struct file_system_type *file_systems;
-static DEFINE_RWLOCK(file_systems_lock);
+static HLIST_HEAD(file_systems);
+static DEFINE_SPINLOCK(file_systems_lock);
/* WARNING: This can be used only if we _already_ own a reference */
struct file_system_type *get_filesystem(struct file_system_type *fs)
@@ -46,14 +43,15 @@ void put_filesystem(struct file_system_type *fs)
module_put(fs->owner);
}
-static struct file_system_type **find_filesystem(const char *name, unsigned len)
+static struct file_system_type *find_filesystem(const char *name, unsigned len)
{
- struct file_system_type **p;
- for (p = &file_systems; *p; p = &(*p)->next)
- if (strncmp((*p)->name, name, len) == 0 &&
- !(*p)->name[len])
- break;
- return p;
+ struct file_system_type *fs;
+
+ hlist_for_each_entry_rcu(fs, &file_systems, list,
+ lockdep_is_held(&file_systems_lock))
+ if (strncmp(fs->name, name, len) == 0 && !fs->name[len])
+ return fs;
+ return NULL;
}
/**
@@ -64,33 +62,26 @@ static struct file_system_type **find_filesystem(const char *name, unsigned len)
* is aware of for mount and other syscalls. Returns 0 on success,
* or a negative errno code on an error.
*
- * The &struct file_system_type that is passed is linked into the kernel
+ * The &struct file_system_type that is passed is linked into the kernel
* structures and must not be freed until the file system has been
* unregistered.
*/
-
-int register_filesystem(struct file_system_type * fs)
+int register_filesystem(struct file_system_type *fs)
{
- int res = 0;
- struct file_system_type ** p;
-
if (fs->parameters &&
!fs_validate_description(fs->name, fs->parameters))
return -EINVAL;
BUG_ON(strchr(fs->name, '.'));
- if (fs->next)
+ if (!hlist_unhashed_lockless(&fs->list))
return -EBUSY;
- write_lock(&file_systems_lock);
- p = find_filesystem(fs->name, strlen(fs->name));
- if (*p)
- res = -EBUSY;
- else
- *p = fs;
- write_unlock(&file_systems_lock);
- return res;
-}
+ guard(spinlock)(&file_systems_lock);
+ if (find_filesystem(fs->name, strlen(fs->name)))
+ return -EBUSY;
+ hlist_add_tail_rcu(&fs->list, &file_systems);
+ return 0;
+}
EXPORT_SYMBOL(register_filesystem);
/**
@@ -100,94 +91,78 @@ EXPORT_SYMBOL(register_filesystem);
* Remove a file system that was previously successfully registered
* with the kernel. An error is returned if the file system is not found.
* Zero is returned on a success.
- *
+ *
* Once this function has returned the &struct file_system_type structure
* may be freed or reused.
*/
-
-int unregister_filesystem(struct file_system_type * fs)
+int unregister_filesystem(struct file_system_type *fs)
{
- struct file_system_type ** tmp;
-
- write_lock(&file_systems_lock);
- tmp = &file_systems;
- while (*tmp) {
- if (fs == *tmp) {
- *tmp = fs->next;
- fs->next = NULL;
- write_unlock(&file_systems_lock);
- synchronize_rcu();
- return 0;
- }
- tmp = &(*tmp)->next;
+ scoped_guard(spinlock, &file_systems_lock) {
+ if (hlist_unhashed(&fs->list))
+ return -EINVAL;
+ hlist_del_init_rcu(&fs->list);
}
- write_unlock(&file_systems_lock);
-
- return -EINVAL;
+ synchronize_rcu();
+ return 0;
}
-
EXPORT_SYMBOL(unregister_filesystem);
#ifdef CONFIG_SYSFS_SYSCALL
-static int fs_index(const char __user * __name)
+static int fs_index(const char __user *__name)
{
- struct file_system_type * tmp;
+ struct file_system_type *p;
char *name __free(kfree) = strndup_user(__name, PATH_MAX);
- int err, index;
+ int index = 0;
if (IS_ERR(name))
return PTR_ERR(name);
- err = -EINVAL;
- read_lock(&file_systems_lock);
- for (tmp=file_systems, index=0 ; tmp ; tmp=tmp->next, index++) {
- if (strcmp(tmp->name, name) == 0) {
- err = index;
- break;
- }
+ guard(rcu)();
+ hlist_for_each_entry_rcu(p, &file_systems, list) {
+ if (strcmp(p->name, name) == 0)
+ return index;
+ index++;
}
- read_unlock(&file_systems_lock);
- return err;
+ return -EINVAL;
}
-static int fs_name(unsigned int index, char __user * buf)
+static int fs_name(unsigned int index, char __user *buf)
{
- struct file_system_type * tmp;
- int len, res = -EINVAL;
-
- read_lock(&file_systems_lock);
- for (tmp = file_systems; tmp; tmp = tmp->next, index--) {
- if (index == 0) {
- if (try_module_get(tmp->owner))
- res = 0;
+ struct file_system_type *p, *found = NULL;
+ int len, res;
+
+ scoped_guard(rcu) {
+ hlist_for_each_entry_rcu(p, &file_systems, list) {
+ if (index--)
+ continue;
+ if (try_module_get(p->owner))
+ found = p;
break;
}
}
- read_unlock(&file_systems_lock);
- if (res)
- return res;
+ if (!found)
+ return -EINVAL;
/* OK, we got the reference, so we can safely block */
- len = strlen(tmp->name) + 1;
- res = copy_to_user(buf, tmp->name, len) ? -EFAULT : 0;
- put_filesystem(tmp);
+ len = strlen(found->name) + 1;
+ res = copy_to_user(buf, found->name, len) ? -EFAULT : 0;
+ put_filesystem(found);
return res;
}
static int fs_maxindex(void)
{
- struct file_system_type * tmp;
- int index;
+ struct file_system_type *p;
+ int index = 0;
- read_lock(&file_systems_lock);
- for (tmp = file_systems, index = 0 ; tmp ; tmp = tmp->next, index++)
- ;
- read_unlock(&file_systems_lock);
+ guard(rcu)();
+ hlist_for_each_entry_rcu(p, &file_systems, list)
+ index++;
return index;
}
/*
- * Whee.. Weird sysv syscall.
+ * Whee.. Weird sysv syscall.
*/
SYSCALL_DEFINE3(sysfs, int, option, unsigned long, arg1, unsigned long, arg2)
{
@@ -216,8 +191,8 @@ int __init list_bdev_fs_names(char *buf, size_t size)
size_t len;
int count = 0;
- read_lock(&file_systems_lock);
- for (p = file_systems; p; p = p->next) {
+ guard(rcu)();
+ hlist_for_each_entry_rcu(p, &file_systems, list) {
if (!(p->fs_flags & FS_REQUIRES_DEV))
continue;
len = strlen(p->name) + 1;
@@ -230,24 +205,20 @@ int __init list_bdev_fs_names(char *buf, size_t size)
size -= len;
count++;
}
- read_unlock(&file_systems_lock);
return count;
}
#ifdef CONFIG_PROC_FS
static int filesystems_proc_show(struct seq_file *m, void *v)
{
- struct file_system_type * tmp;
+ struct file_system_type *p;
- read_lock(&file_systems_lock);
- tmp = file_systems;
- while (tmp) {
+ guard(rcu)();
+ hlist_for_each_entry_rcu(p, &file_systems, list) {
seq_printf(m, "%s\t%s\n",
- (tmp->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev",
- tmp->name);
- tmp = tmp->next;
+ (p->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev",
+ p->name);
}
- read_unlock(&file_systems_lock);
return 0;
}
@@ -263,11 +234,10 @@ static struct file_system_type *__get_fs_type(const char *name, int len)
{
struct file_system_type *fs;
- read_lock(&file_systems_lock);
- fs = *(find_filesystem(name, len));
+ guard(rcu)();
+ fs = find_filesystem(name, len);
if (fs && !try_module_get(fs->owner))
fs = NULL;
- read_unlock(&file_systems_lock);
return fs;
}
@@ -291,5 +261,4 @@ struct file_system_type *get_fs_type(const char *name)
}
return fs;
}
-
EXPORT_SYMBOL(get_fs_type);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index b875f01c9756..4870e680c4e5 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1224,7 +1224,6 @@ static struct file_system_type ocfs2_fs_type = {
.name = "ocfs2",
.kill_sb = kill_block_super,
.fs_flags = FS_REQUIRES_DEV|FS_RENAME_DOES_D_MOVE,
- .next = NULL,
.init_fs_context = ocfs2_init_fs_context,
.parameters = ocfs2_param_spec,
};
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 11559c513dfb..c37bb3c7de8b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2286,7 +2286,7 @@ struct file_system_type {
const struct fs_parameter_spec *parameters;
void (*kill_sb) (struct super_block *);
struct module *owner;
- struct file_system_type * next;
+ struct hlist_node list;
struct hlist_head fs_supers;
struct lock_class_key s_lock_key;
--
2.48.1
* [PATCH v3 3/3] fs: cache the string generated by reading /proc/filesystems
2026-04-25 22:08 [PATCH v3 0/3] revamp fs/filesystems.c Mateusz Guzik
2026-04-25 22:08 ` [PATCH v3 1/3] proc: allow to mark /proc files permanent outside of fs/proc/ Mateusz Guzik
2026-04-25 22:08 ` [PATCH v3 2/3] fs: RCU-ify filesystems list Mateusz Guzik
@ 2026-04-25 22:08 ` Mateusz Guzik
2026-04-27 14:53 ` [PATCH v3 0/3] revamp fs/filesystems.c Christian Brauner
2026-04-28 6:36 ` Why does GNU sed abuse /proc/filesystems? " Cedric Blancher
4 siblings, 0 replies; 7+ messages in thread
From: Mateusz Guzik @ 2026-04-25 22:08 UTC
To: brauner; +Cc: viro, jack, linux-kernel, linux-fsdevel, adobriyan, Mateusz Guzik
The file is read surprisingly often (e.g., by mkdir, ls and even sed!).
Each read does lock-protected pointer chasing over a linked list and
pays for an sprintf per filesystem (32 on my box).
Cache the result instead.
While here, mark the file permanent to avoid spurious ref trips in
procfs.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
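The caching idea from the commit message can be sketched in userspace as a two-pass build (size first, then fill) tagged with a generation number. This is a single-threaded model under stated assumptions: struct demo_fsent, struct demo_string, demo_build() and demo_is_current() are hypothetical names, and the locking and RCU publication done by the patch are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a registered filesystem. */
struct demo_fsent {
	const char *name;
	bool requires_dev;
};

/* Hypothetical stand-in for struct file_systems_string. */
struct demo_string {
	unsigned long gen;
	size_t len;
	char string[];
};

/*
 * Build the /proc/filesystems-style text for n entries in two passes.
 * Returns a malloc()ed object the caller must free(), or NULL on failure.
 */
struct demo_string *demo_build(const struct demo_fsent *fs, size_t n,
			       unsigned long gen)
{
	size_t newlen = 0, usedlen = 0;
	struct demo_string *s;

	/* Pass 1: each line is an optional "nodev", a tab, the name, a newline. */
	for (size_t i = 0; i < n; i++) {
		if (!fs[i].requires_dev)
			newlen += strlen("nodev");
		newlen += strlen("\t") + strlen(fs[i].name) + strlen("\n");
	}

	s = malloc(sizeof(*s) + newlen + 1);
	if (!s)
		return NULL;
	s->gen = gen;
	s->len = newlen;
	s->string[0] = '\0';

	/* Pass 2: fill the buffer that pass 1 sized. */
	for (size_t i = 0; i < n; i++)
		usedlen += (size_t)snprintf(&s->string[usedlen],
					    newlen + 1 - usedlen, "%s\t%s\n",
					    fs[i].requires_dev ? "" : "nodev",
					    fs[i].name);

	/* Mirrors the sanity check in the patch: both passes must agree. */
	assert(usedlen == newlen);
	return s;
}

/* A cached string is usable only if it was built for the current generation. */
bool demo_is_current(const struct demo_string *s, unsigned long cur_gen)
{
	return s != NULL && s->gen == cur_gen;
}
```

In this model a writer would bump the generation on every register/unregister, and a reader would rebuild whenever demo_is_current() returns false.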
fs/filesystems.c | 155 ++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 153 insertions(+), 2 deletions(-)
diff --git a/fs/filesystems.c b/fs/filesystems.c
index 7976366d4197..771fc31a69b8 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -31,6 +31,36 @@
static HLIST_HEAD(file_systems);
static DEFINE_SPINLOCK(file_systems_lock);
+#ifdef CONFIG_PROC_FS
+/*
+ * Cache a stringified version of the filesystem list.
+ *
+ * The fs list gets queried a lot by userspace because of libselinux, including
+ * rather surprising programs (would you guess *sed* is on the list?). In order
+ * to reduce the overhead we cache the resulting string, which normally hangs
+ * around below 512 bytes in size.
+ *
+ * As the list almost never changes, its creation is not particularly optimized
+ * to keep things simple.
+ *
+ * We sort it out on read in order to not introduce a failure point for fs
+ * registration (in principle we may be unable to alloc memory for the list).
+ */
+struct file_systems_string {
+ struct rcu_head rcu;
+ unsigned long gen;
+ size_t len;
+ char string[];
+};
+
+static unsigned long file_systems_gen;
+static struct file_systems_string __rcu *file_systems_string;
+
+static void invalidate_filesystems_string(void);
+#else
+static inline void invalidate_filesystems_string(void) { }
+#endif
+
/* WARNING: This can be used only if we _already_ own a reference */
struct file_system_type *get_filesystem(struct file_system_type *fs)
{
@@ -80,6 +110,7 @@ int register_filesystem(struct file_system_type *fs)
if (find_filesystem(fs->name, strlen(fs->name)))
return -EBUSY;
hlist_add_tail_rcu(&fs->list, &file_systems);
+ invalidate_filesystems_string();
return 0;
}
EXPORT_SYMBOL(register_filesystem);
@@ -101,6 +132,7 @@ int unregister_filesystem(struct file_system_type *fs)
if (hlist_unhashed(&fs->list))
return -EINVAL;
hlist_del_init_rcu(&fs->list);
+ invalidate_filesystems_string();
}
synchronize_rcu();
return 0;
@@ -209,7 +241,102 @@ int __init list_bdev_fs_names(char *buf, size_t size)
}
#ifdef CONFIG_PROC_FS
-static int filesystems_proc_show(struct seq_file *m, void *v)
+static void invalidate_filesystems_string(void)
+{
+ struct file_systems_string *old;
+
+ lockdep_assert_held(&file_systems_lock);
+ file_systems_gen++;
+ old = rcu_replace_pointer(file_systems_string, NULL,
+ lockdep_is_held(&file_systems_lock));
+ if (old)
+ kfree_rcu(old, rcu);
+}
+
+static __cold noinline int regen_filesystems_string(void)
+{
+ struct file_system_type *p;
+ struct file_systems_string *old, *new;
+ size_t newlen, usedlen;
+ unsigned long gen;
+
+retry:
+ newlen = 0;
+
+ /* pre-calc space for each fs */
+ spin_lock(&file_systems_lock);
+ gen = file_systems_gen;
+ hlist_for_each_entry_rcu(p, &file_systems, list) {
+ if (!(p->fs_flags & FS_REQUIRES_DEV))
+ newlen += strlen("nodev");
+ newlen += strlen("\t") + strlen(p->name) + strlen("\n");
+ }
+ spin_unlock(&file_systems_lock);
+
+ new = kmalloc(offsetof(struct file_systems_string, string) + newlen + 1,
+ GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ new->gen = gen;
+ new->len = newlen;
+ new->string[newlen] = '\0';
+
+ spin_lock(&file_systems_lock);
+ old = file_systems_string;
+
+ /* Did someone beat us to it? */
+ if (old && old->gen == file_systems_gen) {
+ spin_unlock(&file_systems_lock);
+ kfree(new);
+ return 0;
+ }
+
+ /*
+ * Did the list change in the meantime?
+ */
+ if (gen != file_systems_gen) {
+ spin_unlock(&file_systems_lock);
+ kfree(new);
+ goto retry;
+ }
+
+ /*
+ * Populate the string.
+ *
+ * We know we have just enough space because we calculated the right
+ * size the previous time we had the lock and confirmed the list has
+ * not changed after reacquiring it.
+ */
+ usedlen = 0;
+ hlist_for_each_entry_rcu(p, &file_systems, list) {
+ usedlen += sprintf(&new->string[usedlen], "%s\t%s\n",
+ (p->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev",
+ p->name);
+ }
+
+ if (WARN_ON_ONCE(new->len != strlen(new->string))) {
+ /*
+ * Should never happen of course, keep this in case someone changes string
+ * generation above and messes it up.
+ */
+ spin_unlock(&file_systems_lock);
+ if (old)
+ kfree_rcu(old, rcu);
+ return -EINVAL;
+ }
+
+ /*
+ * Paired with rcu_dereference() in filesystems_proc_show()
+ */
+ smp_store_release(&file_systems_string, new);
+ spin_unlock(&file_systems_lock);
+ if (old)
+ kfree_rcu(old, rcu);
+ return 0;
+}
+
+static __cold noinline int filesystems_proc_show_fallback(struct seq_file *m, void *v)
{
struct file_system_type *p;
@@ -222,9 +349,33 @@ static int filesystems_proc_show(struct seq_file *m, void *v)
return 0;
}
+static int filesystems_proc_show(struct seq_file *m, void *v)
+{
+ struct file_systems_string *fss;
+
+ for (;;) {
+ scoped_guard(rcu) {
+ fss = rcu_dereference(file_systems_string);
+ if (likely(fss)) {
+ seq_write(m, fss->string, fss->len);
+ return 0;
+ }
+ }
+
+ int err = regen_filesystems_string();
+ if (unlikely(err))
+ return filesystems_proc_show_fallback(m, v);
+ }
+}
+
static int __init proc_filesystems_init(void)
{
- proc_create_single("filesystems", 0, NULL, filesystems_proc_show);
+ struct proc_dir_entry *pde;
+
+ pde = proc_create_single("filesystems", 0, NULL, filesystems_proc_show);
+ if (!pde)
+ return -ENOMEM;
+ proc_make_permanent(pde);
return 0;
}
module_init(proc_filesystems_init);
--
2.48.1
* Re: [PATCH v3 0/3] revamp fs/filesystems.c
2026-04-25 22:08 [PATCH v3 0/3] revamp fs/filesystems.c Mateusz Guzik
` (2 preceding siblings ...)
2026-04-25 22:08 ` [PATCH v3 3/3] fs: cache the string generated by reading /proc/filesystems Mateusz Guzik
@ 2026-04-27 14:53 ` Christian Brauner
2026-04-28 6:36 ` Why does GNU sed abuse /proc/filesystems? " Cedric Blancher
4 siblings, 0 replies; 7+ messages in thread
From: Christian Brauner @ 2026-04-27 14:53 UTC
To: Mateusz Guzik
Cc: Christian Brauner, viro, jack, linux-kernel, linux-fsdevel,
adobriyan
On Sun, 26 Apr 2026 00:08:41 +0200, Mateusz Guzik wrote:
> The file is a mess with a hand-rolled linked list in a desperate need of
> a clean up.
>
> The code to emit /proc/filesystems is used frequently because libselinux
> reads the file, which in turn is linked into numerous frequently used
> programs (even ones you would not suspect, like sed!). In order to
> combat that pre-gen the string instead of pointer-chasing and printfing
> one by-one.
>
> [...]
Applied to the vfs-7.2.procfs branch of the vfs/vfs.git tree.
Patches in the vfs-7.2.procfs branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-7.2.procfs
[1/3] proc: allow to mark /proc files permanent outside of fs/proc/
https://git.kernel.org/vfs/vfs/c/b26811a2ab58
[2/3] fs: RCU-ify filesystems list
https://git.kernel.org/vfs/vfs/c/1fe9dc896f66
[3/3] fs: cache the string generated by reading /proc/filesystems
https://git.kernel.org/vfs/vfs/c/3bd2c4fa951a
* Why does GNU sed abuse /proc/filesystems? Re: [PATCH v3 0/3] revamp fs/filesystems.c
2026-04-25 22:08 [PATCH v3 0/3] revamp fs/filesystems.c Mateusz Guzik
` (3 preceding siblings ...)
2026-04-27 14:53 ` [PATCH v3 0/3] revamp fs/filesystems.c Christian Brauner
@ 2026-04-28 6:36 ` Cedric Blancher
2026-04-28 8:31 ` Mateusz Guzik
4 siblings, 1 reply; 7+ messages in thread
From: Cedric Blancher @ 2026-04-28 6:36 UTC
To: linux-fsdevel, Linux Kernel Mailing List
On Sun, 26 Apr 2026 at 00:09, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> The file is a mess with a hand-rolled linked list in a desperate need of
> a clean up.
>
> The code to emit /proc/filesystems is used frequently because libselinux
> reads the file, which in turn is linked into numerous frequently used
> programs (even ones you would not suspect, like sed!). In order to
> combat that pre-gen the string instead of pointer-chasing and printfing
> one by-one.
Why is GNU sed touching /proc/filesystems in the first place? This is
not really a stable API, and would actually be a thing which should
NOT be touched by a "simple" userland tool.
<rant>maybe rename /proc/filesystems to
/proc/filesystems_only_for_admin_purposes</rant>, or stick an ACL on
it to prevent abuse?
Ced
--
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur
* Re: Why does GNU sed abuse /proc/filesystems? Re: [PATCH v3 0/3] revamp fs/filesystems.c
2026-04-28 6:36 ` Why does GNU sed abuse /proc/filesystems? " Cedric Blancher
@ 2026-04-28 8:31 ` Mateusz Guzik
0 siblings, 0 replies; 7+ messages in thread
From: Mateusz Guzik @ 2026-04-28 8:31 UTC
To: Cedric Blancher; +Cc: linux-fsdevel, Linux Kernel Mailing List
On Tue, Apr 28, 2026 at 08:36:00AM +0200, Cedric Blancher wrote:
> On Sun, 26 Apr 2026 at 00:09, Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > The file is a mess with a hand-rolled linked list in a desperate need of
> > a clean up.
> >
> > The code to emit /proc/filesystems is used frequently because libselinux
> > reads the file, which in turn is linked into numerous frequently used
> > programs (even ones you would not suspect, like sed!). In order to
> > combat that pre-gen the string instead of pointer-chasing and printfing
> > one by-one.
>
> Why is GNU sed touching /proc/filesystems in the first place? This is
> not really a stable API, and would actually be a thing which should
> NOT be touched by a "simple" userland tool.
>
> <rant>maybe rename /proc/filesystems to
> /proc/filesystems_only_for_admin_purposes</rant>, or stick an ACL on
> it to prevent abuse?
>
It has support for file creation with the -i switch. For that reason it
links with libselinux which does the dirty on binary startup.
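For context, a startup check of this general kind can be sketched as below. This is not libselinux's actual code; fs_is_listed() is a hypothetical helper, and it assumes only the line format emitted by the kernel code in this series ("nodev", a tab, the name, a newline, with the first field empty for block-device filesystems).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: report whether `fsname` appears as the second,
 * tab-separated field of any line in the file at `path`.
 */
bool fs_is_listed(const char *path, const char *fsname)
{
	char line[256];
	bool found = false;
	FILE *f = fopen(path, "r");

	if (!f)
		return false;

	while (!found && fgets(line, sizeof(line), f)) {
		char *tab = strchr(line, '\t');
		char *name;

		if (!tab)
			continue;	/* not in the expected format, skip */
		name = tab + 1;
		name[strcspn(name, "\n")] = '\0';	/* drop the trailing newline */
		found = strcmp(name, fsname) == 0;
	}

	fclose(f);
	return found;
}
```

A program doing something like fs_is_listed("/proc/filesystems", "selinuxfs") at startup pays one open+read+close per invocation, which is the cycle this series speeds up.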