public inbox for linux-fsdevel@vger.kernel.org
* [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs
@ 2026-03-11 21:43 Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 01/26] fs: add switch_fs_struct() Christian Brauner
                   ` (25 more replies)
  0 siblings, 26 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Summary:

* all kthreads are isolated in a separate SB_KERNMOUNT of nullfs.
  -> no lookup of anything else, no mounting on top of it, completely
  isolated.
* init has a separate fs_struct from all kthreads
* scoped_with_init_fs() allows a kthread to temporarily assume init's
  fs_struct for filesystem operations.

So this is a bit of a crazy series. When the kernel is started it
roughly goes like this:

init_task
==> create pid 1 (systemd etc.)
==> pid 2 (kthreadd)

After this point all kthreads and PID 1 share the same filesystem state.
That obviously already came up when we discussed pivot_root() as this
allows pivot_root() to rewrite the fs_struct of all kthreads.

This rewriting is really weird and mostly done so kthreads can use init's
filesystem state when they would like to. But this really should be
discouraged. The rewriting should also stop completely. I worked a bit
to get rid of it in a more fundamental way. Is it crazy? Yes. Is it
likely broken? Yes. Does it at least boot? Yes.

Instead of sharing fs_struct between kernel threads and pid 1, pid 1
gets a completely separate fs_struct. All kthreads continue sharing
init_fs as before and pid 1's fs_struct is isolated from the kthreads'
filesystem state. IOW, userspace init cannot affect kthreads' filesystem
state anymore and kthreads cannot affect userspace's filesystem state
anymore - without explicit opt-in.

All kthreads are anchored in a kernel internal mount of nullfs that
cannot be mounted on and that cannot be used to follow other mounts.
It's a completely private mount that insulates kthreads.

This series makes performing mountains of filesystem work such as path
lookup and file opening and so on from kthreads hard - painfully so. I
think this is a benefit because it takes the idea of just offloading
_security sensitive_ operations in init's filesystem state, such as
running random binaries or opening and creating files, to kthreads
behind the shed... And imho it should.

The only remaining kernel tasks that actually share init's filesystem
state are usermodehelpers - as they execute random binaries in the root
filesystem. Another concept we should really show the back of the shed.

This gives much stronger guarantees than what we have now. This also
makes path lookup from kthreads fail by default. IOW, it won't be
possible anymore to just look up random stuff in init's filesystem state
without explicitly opting in to that.

The places that need to perform lookup in init's filesystem state may
use scoped_with_init_fs() which will temporarily override the caller's
fs_struct with init's fs_struct.
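
As a rough illustration of the opt-in model, here is a small userspace
sketch. It is not kernel code: struct fs_model, struct task_model,
model_lookup() and the "nullfs"/"rootfs" strings are hypothetical
stand-ins, and returning -ENOENT from a nullfs lookup is an assumption
made for the model. The only thing taken from the series is that a
kthread's default fs_struct is rooted in nullfs, where lookups fail,
and that the override temporarily swaps in init's fs_struct.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins, not the kernel's types. */
struct fs_model {
	const char *root;	/* name of the filesystem the root lives on */
};

struct task_model {
	struct fs_model *fs;	/* fs used for lookups right now */
};

static struct fs_model nullfs_fs = { .root = "nullfs" };
static struct fs_model userspace_init_fs_model = { .root = "rootfs" };

/* Lookups rooted in nullfs find nothing; anything else "succeeds". */
static int model_lookup(const struct task_model *t, const char *path)
{
	(void)path;
	if (strcmp(t->fs->root, "nullfs") == 0)
		return -ENOENT;	/* assumed error value for the model */
	return 0;
}

/* Explicit opt-in: swap in init's fs and hand back the previous one. */
static struct fs_model *model_override_init_fs(struct task_model *t)
{
	struct fs_model *old = t->fs;

	t->fs = &userspace_init_fs_model;
	return old;
}

static void model_revert_init_fs(struct task_model *t, struct fs_model *old)
{
	t->fs = old;
}

/* Records lookup results before, during and after the override. */
static void model_demo(int results[3])
{
	struct task_model kthread = { .fs = &nullfs_fs };
	struct fs_model *old;

	results[0] = model_lookup(&kthread, "/dev/sda");
	old = model_override_init_fs(&kthread);
	results[1] = model_lookup(&kthread, "/dev/sda");
	model_revert_init_fs(&kthread, old);
	results[2] = model_lookup(&kthread, "/dev/sda");
}
```

In this model only the middle lookup succeeds, which is the behaviour
the series is aiming for: failure by default, success only inside an
explicit override.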

We now also warn and notice when pid 1 simply stops sharing filesystem
state with us, i.e., abandons its userspace_init_fs.

On older kernels if PID 1 unshared its filesystem state with us the
kernel simply used the stale fs_struct state implicitly pinning
anything that PID 1 had last used. Even if PID 1 might've moved on to
some completely different fs_struct state and might've even unmounted
the old root.

This has hilarious consequences: Think continuing to dump coredump
state into an implicitly pinned directory somewhere. Calling random
binaries in the old rootfs via usermodehelpers.

Be aggressive about this: We simply reject operating on stale
fs_struct state by reverting userspace_init_fs to nullfs. Every kworker
that does lookups after this point will fail. Every usermodehelper call
will fail. This is a lot stronger but I wouldn't know what it means for
pid 1 to simply stop sharing its fs state with the kernel. Clearly it
wanted to separate so cut all ties.

I went through the kernel and looked at hopefully everything that
does path lookup from kthreads (workqueues, ...).

TL;DR:

==== PID 1 (systemd) ====

  root@localhost:~# stat --file-system /proc/1/root
    File: "/proc/1/root"
      ID: e3cb00dd533cd3d7 Namelen: 255     Type: ext2/ext3

  root@localhost:~# cat /proc/1/mountinfo | wc -l
  30

==== PID 2 (kthreadd) ====

  root@localhost:~# stat --file-system /proc/2/root
    File: "/proc/2/root"
      ID: 200000000 Namelen: 255     Type: nullfs

  root@localhost:~# cat /proc/2/mountinfo | wc -l
  0

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Changes in v3:
- Fix __override_init_fs() to save and return the original fs instead of
  the override, so __revert_init_fs() actually restores the caller's fs.
- Replace smp_store_release() with WRITE_ONCE() in fs override/revert.
- Move userspace_init_fs wiring commit before conversion patches to fix
  bisectability for user-process-context callers.
- Switch all remote procfs accesses (mounts_open_common, get_task_root,
  proc_cwd_link, task_state umask, kcmp) to use task_struct::real_fs
  instead of task_struct::fs.
- Move VFS_WARN_ON_ONCE in copy_fs() into else block so it doesn't fire
  for UMH threads.
- Fix sleeping under task_lock: validate_fs_switch() now runs outside
  task_lock() with might_sleep() annotation.
- Add pnfs/blocklayout scoped_with_init_fs() conversion.
- Wrap security_initramfs_populated() inside scoped_with_init_fs() in
  initramfs unpacking since IPE accesses current->fs->root.
- Fix stale comments: "two mounts" -> "three mounts", UMH comment,
  kthread_mntns() nullfs ambiguity.
- Fix commit message mismatches.
- Link to v2: https://patch.msgid.link/20260306-work-kthread-nullfs-v2-0-ad1b4bed7d3e@kernel.org

Changes in v2:
- Remove LOOKUP_IN_INIT in favor of scoped_with_init_fs().
- Link to v1: https://patch.msgid.link/20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org

---
Christian Brauner (26):
      fs: add switch_fs_struct()
      fs: notice when init abandons fs sharing
      fs: add scoped_with_init_fs()
      fs: add real_fs to track task's actual fs_struct
      fs: make userspace_init_fs a dynamically-initialized pointer
      rnbd: use scoped_with_init_fs() for block device open
      crypto: ccp: use scoped_with_init_fs() for SEV file access
      scsi: target: use scoped_with_init_fs() for ALUA metadata
      scsi: target: use scoped_with_init_fs() for APTPL metadata
      btrfs: use scoped_with_init_fs() for update_dev_time()
      coredump: use scoped_with_init_fs() for coredump path resolution
      fs: use scoped_with_init_fs() for kernel_read_file_from_path_initns()
      ksmbd: use scoped_with_init_fs() for share path resolution
      ksmbd: use scoped_with_init_fs() for filesystem info path lookup
      ksmbd: use scoped_with_init_fs() for VFS path operations
      pnfs/blocklayout: use scoped_with_init_fs() for SCSI device lookup
      initramfs: use scoped_with_init_fs() for rootfs unpacking
      af_unix: use scoped_with_init_fs() for coredump socket lookup
      fs: stop sharing fs_struct between init_task and pid 1
      fs: add umh argument to struct kernel_clone_args
      fs: add kthread_mntns()
      devtmpfs: create private mount namespace
      nullfs: make nullfs multi-instance
      fs: start all kthreads in nullfs
      fs: stop rewriting kthread fs structs
      fs: stop rewriting paths for PF_EXITING | PF_DUMPCORE

 drivers/base/devtmpfs.c           |   2 +-
 drivers/block/rnbd/rnbd-srv.c     |   4 +-
 drivers/crypto/ccp/sev-dev.c      |  12 ++---
 drivers/target/target_core_alua.c |   6 ++-
 drivers/target/target_core_pr.c   |   4 +-
 fs/btrfs/volumes.c                |  11 +++-
 fs/coredump.c                     |  11 ++--
 fs/fs_struct.c                    | 103 ++++++++++++++++++++++++++++++++++++--
 fs/kernel_read_file.c             |   9 +---
 fs/namespace.c                    |  46 ++++++++++++++---
 fs/nfs/blocklayout/dev.c          |  13 +++--
 fs/nullfs.c                       |  12 ++---
 fs/proc/array.c                   |   4 +-
 fs/proc/base.c                    |   8 +--
 fs/proc_namespace.c               |   4 +-
 fs/smb/server/mgmt/share_config.c |   4 +-
 fs/smb/server/smb2pdu.c           |   4 +-
 fs/smb/server/vfs.c               |  14 ++++--
 include/linux/fs_struct.h         |  34 +++++++++++++
 include/linux/init_task.h         |   1 +
 include/linux/mount.h             |   1 +
 include/linux/sched.h             |   1 +
 include/linux/sched/task.h        |   1 +
 init/init_task.c                  |   1 +
 init/initramfs.c                  |  14 ++++--
 init/main.c                       |  10 +++-
 kernel/fork.c                     |  53 ++++++++++++--------
 kernel/kcmp.c                     |   2 +-
 kernel/umh.c                      |   6 +--
 net/unix/af_unix.c                |  17 +++----
 30 files changed, 307 insertions(+), 105 deletions(-)
---
base-commit: c107785c7e8dbabd1c18301a1c362544b5786282
change-id: 20260303-work-kthread-nullfs-875a837f4198



* [PATCH RFC v3 01/26] fs: add switch_fs_struct()
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 02/26] fs: notice when init abandons fs sharing Christian Brauner
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Don't open-code the guts of replacing current's fs struct.
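
The return-value convention of the helper can be modelled in userspace
as below. This is a sketch under stated assumptions: struct fs_model,
struct task_model and model_switch_fs() are hypothetical names, and the
task_lock and seqlock taken by the real function are left out, so the
model is single-threaded only.

```c
#include <stddef.h>

/* Hypothetical userspace model; no locking, single-threaded only. */
struct fs_model {
	int users;	/* number of tasks sharing this fs */
};

struct task_model {
	struct fs_model *fs;
};

/*
 * Install new_fs on the task and drop the task's reference to the old
 * fs. Returns the old fs if that was the last reference (so the caller
 * can free it), or NULL if other tasks still share it.
 */
static struct fs_model *model_switch_fs(struct task_model *t,
					struct fs_model *new_fs)
{
	struct fs_model *old = t->fs;

	t->fs = new_fs;
	if (--old->users)
		return NULL;
	return old;
}
```

With two tasks sharing one fs, the first switch returns NULL and the
second returns the old fs, which then has no users left.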

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/fs_struct.c            | 18 ++++++++++++++++++
 include/linux/fs_struct.h |  2 ++
 kernel/fork.c             | 22 ++++++----------------
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index 394875d06fd6..c441586537e7 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -147,6 +147,24 @@ int unshare_fs_struct(void)
 }
 EXPORT_SYMBOL_GPL(unshare_fs_struct);
 
+struct fs_struct *switch_fs_struct(struct fs_struct *new_fs)
+{
+	struct fs_struct *fs;
+
+	scoped_guard(task_lock, current) {
+		fs = current->fs;
+		read_seqlock_excl(&fs->seq);
+		current->fs = new_fs;
+		if (--fs->users)
+			new_fs = NULL;
+		else
+			new_fs = fs;
+		read_sequnlock_excl(&fs->seq);
+	}
+
+	return new_fs;
+}
+
 /* to be mentioned only in INIT_TASK */
 struct fs_struct init_fs = {
 	.users		= 1,
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index 0070764b790a..ade459383f92 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -40,6 +40,8 @@ static inline void get_fs_pwd(struct fs_struct *fs, struct path *pwd)
 	read_sequnlock_excl(&fs->seq);
 }
 
+struct fs_struct *switch_fs_struct(struct fs_struct *new_fs);
+
 extern bool current_chrooted(void);
 
 static inline int current_umask(void)
diff --git a/kernel/fork.c b/kernel/fork.c
index 65113a304518..67e57ee44548 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -3123,7 +3123,7 @@ static int unshare_fd(unsigned long unshare_flags, struct files_struct **new_fdp
  */
 int ksys_unshare(unsigned long unshare_flags)
 {
-	struct fs_struct *fs, *new_fs = NULL;
+	struct fs_struct *new_fs = NULL;
 	struct files_struct *new_fd = NULL;
 	struct cred *new_cred = NULL;
 	struct nsproxy *new_nsproxy = NULL;
@@ -3198,23 +3198,13 @@ int ksys_unshare(unsigned long unshare_flags)
 		if (new_nsproxy)
 			switch_task_namespaces(current, new_nsproxy);
 
-		task_lock(current);
+		if (new_fs)
+			new_fs = switch_fs_struct(new_fs);
 
-		if (new_fs) {
-			fs = current->fs;
-			read_seqlock_excl(&fs->seq);
-			current->fs = new_fs;
-			if (--fs->users)
-				new_fs = NULL;
-			else
-				new_fs = fs;
-			read_sequnlock_excl(&fs->seq);
-		}
-
-		if (new_fd)
+		if (new_fd) {
+			guard(task_lock)(current);
 			swap(current->files, new_fd);
-
-		task_unlock(current);
+		}
 
 		if (new_cred) {
 			/* Install the new user namespace */

-- 
2.47.3



* [PATCH RFC v3 02/26] fs: notice when init abandons fs sharing
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 01/26] fs: add switch_fs_struct() Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 03/26] fs: add scoped_with_init_fs() Christian Brauner
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

PID 1 may choose to stop sharing fs_struct state with us. Either via
unshare(CLONE_FS) or unshare(CLONE_NEWNS). Of course, PID 1 could have
chosen to create arbitrary process trees that all share fs_struct state
via CLONE_FS. This is a strong statement: We only care about PID 1 aka
the thread-group leader so subthreads' fs_struct state doesn't matter.

PID 1 unsharing fs_struct state is a bug. PID 1 relies on various
kthreads to be able to perform work based on its fs_struct state.
Breaking that contract sucks for both sides. So just don't bother with
extra work for this. No sane init system should ever do this.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/fs_struct.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index c441586537e7..fcecf209f1a9 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -147,6 +147,30 @@ int unshare_fs_struct(void)
 }
 EXPORT_SYMBOL_GPL(unshare_fs_struct);
 
+/*
+ * PID 1 may choose to stop sharing fs_struct state with us.
+ * Either via unshare(CLONE_FS) or unshare(CLONE_NEWNS). Of
+ * course, PID 1 could have chosen to create arbitrary process
+ * trees that all share fs_struct state via CLONE_FS. This is a
+ * strong statement: We only care about PID 1 aka the thread-group
+ * leader so subthread's fs_struct state doesn't matter.
+ *
+ * PID 1 unsharing fs_struct state is a bug. PID 1 relies on
+ * various kthreads to be able to perform work based on its
+ * fs_struct state. Breaking that contract sucks for both sides.
+ * So just don't bother with extra work for this. No sane init
+ * system should ever do this.
+ */
+static inline void validate_fs_switch(struct fs_struct *old_fs)
+{
+	if (likely(current->pid != 1))
+		return;
+	/* @old_fs may be dangling but for comparison it's fine */
+	if (old_fs != &init_fs)
+		return;
+	pr_warn("VFS: Pid 1 stopped sharing filesystem state\n");
+}
+
 struct fs_struct *switch_fs_struct(struct fs_struct *new_fs)
 {
 	struct fs_struct *fs;
@@ -162,6 +186,7 @@ struct fs_struct *switch_fs_struct(struct fs_struct *new_fs)
 		read_sequnlock_excl(&fs->seq);
 	}
 
+	validate_fs_switch(fs);
 	return new_fs;
 }
 

-- 
2.47.3



* [PATCH RFC v3 03/26] fs: add scoped_with_init_fs()
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 01/26] fs: add switch_fs_struct() Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 02/26] fs: notice when init abandons fs sharing Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 04/26] fs: add real_fs to track task's actual fs_struct Christian Brauner
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Similar to scoped_with_kernel_creds() allow a temporary override of
current->fs to serve the few places where lookup is performed from
kthread context or needs init's filesystem state.
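
The scope-based pattern can be sketched in userspace with the
compiler's cleanup attribute, which is the GCC/Clang extension the
kernel's cleanup helpers are built on. Everything named model_* below,
along with struct fs_model, struct fs_scope and current_fs, is a
hypothetical stand-in; the macro is a simplified illustration and not
the kernel's DEFINE_CLASS()/scoped_class() implementation.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for fs_struct and current->fs. */
struct fs_model {
	const char *root;
};

static struct fs_model kthread_fs = { .root = "nullfs" };
static struct fs_model init_fs_model = { .root = "rootfs" };
static struct fs_model *current_fs = &kthread_fs;

/* Saved state for one override scope. */
struct fs_scope {
	struct fs_model *old_fs;
	int done;
};

static struct fs_scope model_override_init_fs(void)
{
	struct fs_scope scope = { .old_fs = current_fs, .done = 0 };

	current_fs = &init_fs_model;
	return scope;
}

/* Cleanup handler: receives a pointer to the variable leaving scope. */
static void model_revert_init_fs(struct fs_scope *scope)
{
	current_fs = scope->old_fs;
}

/*
 * Run the following statement exactly once with the override active.
 * The handler runs when the loop variable goes out of scope, even on
 * an early return from inside the statement.
 */
#define model_scoped_with_init_fs()					\
	for (struct fs_scope _scope					\
		     __attribute__((cleanup(model_revert_init_fs))) =	\
		     model_override_init_fs();				\
	     !_scope.done; _scope.done = 1)

/* Returns the fs that was current inside the scope. */
static struct fs_model *model_fs_inside_scope(void)
{
	model_scoped_with_init_fs()
		return current_fs;	/* revert still runs after this */
	return NULL;
}
```

The point of tying the revert to scope exit is that callers cannot
forget to restore the original fs on an error path.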

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 include/linux/fs_struct.h | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index ade459383f92..e11d0e57168f 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -6,6 +6,7 @@
 #include <linux/path.h>
 #include <linux/spinlock.h>
 #include <linux/seqlock.h>
+#include <linux/vfsdebug.h>
 
 struct fs_struct {
 	int users;
@@ -49,4 +50,34 @@ static inline int current_umask(void)
 	return current->fs->umask;
 }
 
+/*
+ * Temporarily use userspace_init_fs for path resolution in kthreads.
+ * Callers should use scoped_with_init_fs() which automatically
+ * restores the original fs_struct at scope exit.
+ */
+static inline struct fs_struct *__override_init_fs(void)
+{
+	struct fs_struct *fs;
+
+	fs = current->fs;
+	WRITE_ONCE(current->fs, fs);
+	return fs;
+}
+
+static inline void __revert_init_fs(struct fs_struct *revert_fs)
+{
+	VFS_WARN_ON_ONCE(current->fs != revert_fs);
+	WRITE_ONCE(current->fs, revert_fs);
+}
+
+DEFINE_CLASS(__override_init_fs,
+	     struct fs_struct *,
+	     __revert_init_fs(_T),
+	     __override_init_fs(), void)
+
+#define scoped_with_init_fs() \
+	scoped_class(__override_init_fs, __UNIQUE_ID(label))
+
+void __init init_userspace_fs(void);
+
 #endif /* _LINUX_FS_STRUCT_H */

-- 
2.47.3



* [PATCH RFC v3 04/26] fs: add real_fs to track task's actual fs_struct
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (2 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 03/26] fs: add scoped_with_init_fs() Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 05/26] fs: make userspace_init_fs a dynamically-initialized pointer Christian Brauner
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Add a real_fs field to task_struct that always mirrors the fs field.
This lays the groundwork for distinguishing between a task's permanent
fs_struct and one that is temporarily overridden via scoped_with_init_fs().

When a kthread temporarily overrides current->fs for path lookup, we
need to know the original fs_struct for operations like exit_fs() and
unshare_fs_struct() that must operate on the real, permanent fs.

For now real_fs is always equal to fs. It is maintained alongside fs in
all the relevant paths: exit_fs(), unshare_fs_struct(),
switch_fs_struct(), and copy_fs().
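
To make the intended split concrete, here is a userspace sketch of the
situation this patch prepares for. In this patch the two fields are
always equal; the sketch assumes a later temporary override that
changes only fs. All names (struct fs_model, struct task_model and the
model_* helpers) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's types. */
struct fs_model {
	int umask;
};

struct task_model {
	struct fs_model *real_fs;	/* permanent fs, read by observers */
	struct fs_model *fs;		/* fs used for lookups, may be overridden */
};

/* What a remote observer such as procfs would report. */
static int model_remote_umask(const struct task_model *t)
{
	return t->real_fs ? t->real_fs->umask : -1;
}

/* What the task itself would use while doing filesystem work. */
static int model_lookup_umask(const struct task_model *t)
{
	return t->fs->umask;
}

/* A temporary override touches only fs and leaves real_fs alone. */
static struct fs_model *model_override_fs(struct task_model *t,
					  struct fs_model *tmp)
{
	struct fs_model *old = t->fs;

	t->fs = tmp;
	return old;
}

static void model_revert_fs(struct task_model *t, struct fs_model *old)
{
	t->fs = old;
}
```

While the override is active the task works with the temporary fs, but
a remote reader that goes through real_fs still sees the task's own
state.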

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/fs_struct.c        | 11 ++++++++---
 fs/proc/array.c       |  4 ++--
 fs/proc/base.c        |  8 ++++----
 fs/proc_namespace.c   |  4 ++--
 include/linux/sched.h |  1 +
 init/init_task.c      |  1 +
 kernel/fork.c         |  8 +++++++-
 kernel/kcmp.c         |  2 +-
 8 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index fcecf209f1a9..c03a574ed65a 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -61,7 +61,7 @@ void chroot_fs_refs(const struct path *old_root, const struct path *new_root)
 	read_lock(&tasklist_lock);
 	for_each_process_thread(g, p) {
 		task_lock(p);
-		fs = p->fs;
+		fs = p->real_fs;
 		if (fs) {
 			int hits = 0;
 			write_seqlock(&fs->seq);
@@ -89,12 +89,13 @@ void free_fs_struct(struct fs_struct *fs)
 
 void exit_fs(struct task_struct *tsk)
 {
-	struct fs_struct *fs = tsk->fs;
+	struct fs_struct *fs = tsk->real_fs;
 
 	if (fs) {
 		int kill;
 		task_lock(tsk);
 		read_seqlock_excl(&fs->seq);
+		tsk->real_fs = NULL;
 		tsk->fs = NULL;
 		kill = !--fs->users;
 		read_sequnlock_excl(&fs->seq);
@@ -126,7 +127,7 @@ struct fs_struct *copy_fs_struct(struct fs_struct *old)
 
 int unshare_fs_struct(void)
 {
-	struct fs_struct *fs = current->fs;
+	struct fs_struct *fs = current->real_fs;
 	struct fs_struct *new_fs = copy_fs_struct(fs);
 	int kill;
 
@@ -135,8 +136,10 @@ int unshare_fs_struct(void)
 
 	task_lock(current);
 	read_seqlock_excl(&fs->seq);
+	VFS_WARN_ON_ONCE(fs != current->fs);
 	kill = !--fs->users;
 	current->fs = new_fs;
+	current->real_fs = new_fs;
 	read_sequnlock_excl(&fs->seq);
 	task_unlock(current);
 
@@ -177,8 +180,10 @@ struct fs_struct *switch_fs_struct(struct fs_struct *new_fs)
 
 	scoped_guard(task_lock, current) {
 		fs = current->fs;
+		VFS_WARN_ON_ONCE(fs != current->real_fs);
 		read_seqlock_excl(&fs->seq);
 		current->fs = new_fs;
+		current->real_fs = new_fs;
 		if (--fs->users)
 			new_fs = NULL;
 		else
diff --git a/fs/proc/array.c b/fs/proc/array.c
index f447e734612a..10d792b8f170 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -168,8 +168,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 	cred = get_task_cred(p);
 
 	task_lock(p);
-	if (p->fs)
-		umask = p->fs->umask;
+	if (p->real_fs)
+		umask = p->real_fs->umask;
 	if (p->files)
 		max_fds = files_fdtable(p->files)->max_fds;
 	task_unlock(p);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 4c863d17dfb4..28067e77b820 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -210,8 +210,8 @@ static int get_task_root(struct task_struct *task, struct path *root)
 	int result = -ENOENT;
 
 	task_lock(task);
-	if (task->fs) {
-		get_fs_root(task->fs, root);
+	if (task->real_fs) {
+		get_fs_root(task->real_fs, root);
 		result = 0;
 	}
 	task_unlock(task);
@@ -225,8 +225,8 @@ static int proc_cwd_link(struct dentry *dentry, struct path *path)
 
 	if (task) {
 		task_lock(task);
-		if (task->fs) {
-			get_fs_pwd(task->fs, path);
+		if (task->real_fs) {
+			get_fs_pwd(task->real_fs, path);
 			result = 0;
 		}
 		task_unlock(task);
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 5c555db68aa2..036356c0a55b 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -254,13 +254,13 @@ static int mounts_open_common(struct inode *inode, struct file *file,
 	}
 	ns = nsp->mnt_ns;
 	get_mnt_ns(ns);
-	if (!task->fs) {
+	if (!task->real_fs) {
 		task_unlock(task);
 		put_task_struct(task);
 		ret = -ENOENT;
 		goto err_put_ns;
 	}
-	get_fs_root(task->fs, &root);
+	get_fs_root(task->real_fs, &root);
 	task_unlock(task);
 	put_task_struct(task);
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a7b4a980eb2f..5c7b9df92ebb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1179,6 +1179,7 @@ struct task_struct {
 	unsigned long			last_switch_time;
 #endif
 	/* Filesystem information: */
+	struct fs_struct		*real_fs;
 	struct fs_struct		*fs;
 
 	/* Open file information: */
diff --git a/init/init_task.c b/init/init_task.c
index 5c838757fc10..7d0b4a5927eb 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -152,6 +152,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
 	RCU_POINTER_INITIALIZER(cred, &init_cred),
 	.comm		= INIT_TASK_COMM,
 	.thread		= INIT_THREAD,
+	.real_fs	= &init_fs,
 	.fs		= &init_fs,
 	.files		= &init_files,
 #ifdef CONFIG_IO_URING
diff --git a/kernel/fork.c b/kernel/fork.c
index 67e57ee44548..154703cf7d3d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1593,6 +1593,8 @@ static int copy_mm(u64 clone_flags, struct task_struct *tsk)
 static int copy_fs(u64 clone_flags, struct task_struct *tsk)
 {
 	struct fs_struct *fs = current->fs;
+
+	VFS_WARN_ON_ONCE(current->fs != current->real_fs);
 	if (clone_flags & CLONE_FS) {
 		/* tsk->fs is already what we want */
 		read_seqlock_excl(&fs->seq);
@@ -1605,7 +1607,7 @@ static int copy_fs(u64 clone_flags, struct task_struct *tsk)
 		read_sequnlock_excl(&fs->seq);
 		return 0;
 	}
-	tsk->fs = copy_fs_struct(fs);
+	tsk->real_fs = tsk->fs = copy_fs_struct(fs);
 	if (!tsk->fs)
 		return -ENOMEM;
 	return 0;
@@ -3152,6 +3154,10 @@ int ksys_unshare(unsigned long unshare_flags)
 	if (unshare_flags & CLONE_NEWNS)
 		unshare_flags |= CLONE_FS;
 
+	/* No unsharing with overridden fs state */
+	VFS_WARN_ON_ONCE(unshare_flags & (CLONE_NEWNS | CLONE_FS) &&
+			 current->fs != current->real_fs);
+
 	err = check_unshare_flags(unshare_flags);
 	if (err)
 		goto bad_unshare_out;
diff --git a/kernel/kcmp.c b/kernel/kcmp.c
index 7c1a65bd5f8d..76476aeee067 100644
--- a/kernel/kcmp.c
+++ b/kernel/kcmp.c
@@ -186,7 +186,7 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
 		ret = kcmp_ptr(task1->files, task2->files, KCMP_FILES);
 		break;
 	case KCMP_FS:
-		ret = kcmp_ptr(task1->fs, task2->fs, KCMP_FS);
+		ret = kcmp_ptr(task1->real_fs, task2->real_fs, KCMP_FS);
 		break;
 	case KCMP_SIGHAND:
 		ret = kcmp_ptr(task1->sighand, task2->sighand, KCMP_SIGHAND);

-- 
2.47.3



* [PATCH RFC v3 05/26] fs: make userspace_init_fs a dynamically-initialized pointer
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (3 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 04/26] fs: add real_fs to track task's actual fs_struct Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 06/26] rnbd: use scoped_with_init_fs() for block device open Christian Brauner
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Change userspace_init_fs from a declared-but-unused extern struct to
a dynamically initialized pointer. Add init_userspace_fs() which is
called early in kernel_init() (PID 1) to record PID 1's fs_struct
as the canonical userspace filesystem state.

Wire up __override_init_fs() and __revert_init_fs() to actually swap
current->fs to/from userspace_init_fs. Previously these were no-ops
that stored current->fs back to itself.

Fix validate_fs_switch() to compare against userspace_init_fs
instead of &init_fs. When PID 1 unshares its filesystem state, revert
userspace_init_fs to init_fs's root (nullfs) so that stale filesystem
state is not silently inherited by kworkers and usermodehelpers.

At this stage PID 1's fs still points to rootfs (set by
init_mount_tree), so userspace_init_fs points to rootfs and
scoped_with_init_fs() is functionally equivalent to its previous no-op
behavior.
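
The reset-on-unshare behaviour can be modelled in userspace as follows.
This is only a sketch: the model_* names and the string-valued
root/pwd fields are hypothetical, the pid is passed in explicitly
instead of being read from current, and the warning printed by the
kernel is replaced by a return value.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's types. */
struct fs_model {
	const char *root;
	const char *pwd;
};

static struct fs_model init_fs_model = { "nullfs", "nullfs" };
static struct fs_model pid1_fs = { "rootfs", "rootfs" };
static struct fs_model *userspace_init_fs_model = &pid1_fs;

/*
 * Called after a task replaced its fs. If the task is PID 1 and the fs
 * it walked away from is the one recorded as userspace_init_fs, point
 * that fs at nullfs so later overrides cannot use the stale state.
 * Returns 1 if the reset happened, 0 otherwise.
 */
static int model_validate_fs_switch(int pid, const struct fs_model *old_fs)
{
	if (pid != 1)
		return 0;
	/* old_fs is only compared, never dereferenced. */
	if (old_fs != userspace_init_fs_model)
		return 0;
	userspace_init_fs_model->root = init_fs_model.root;
	userspace_init_fs_model->pwd = init_fs_model.root;
	return 1;
}
```

Only the combination of PID 1 and the recorded fs triggers the reset;
any other task switching its fs, or PID 1 switching away from some
other fs, leaves userspace_init_fs untouched in this model.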

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/fs_struct.c            | 48 ++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/fs_struct.h | 15 ++++++++-------
 include/linux/init_task.h |  1 +
 init/main.c               |  3 +++
 4 files changed, 59 insertions(+), 8 deletions(-)

diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index c03a574ed65a..f44e43ce6d93 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -8,6 +8,7 @@
 #include <linux/fs_struct.h>
 #include <linux/init_task.h>
 #include "internal.h"
+#include "mount.h"
 
 /*
  * Replace the fs->{rootmnt,root} with {mnt,dentry}. Put the old values.
@@ -163,15 +164,34 @@ EXPORT_SYMBOL_GPL(unshare_fs_struct);
  * fs_struct state. Breaking that contract sucks for both sides.
  * So just don't bother with extra work for this. No sane init
  * system should ever do this.
+ *
+ * On older kernels if PID 1 unshared its filesystem state with us the
+ * kernel simply used the stale fs_struct state implicitly pinning
+ * anything that PID 1 had last used. Even if PID 1 might've moved on to
+ * some completely different fs_struct state and might've even unmounted
+ * the old root.
+ *
+ * This has hilarious consequences: Think continuing to dump coredump
+ * state into an implicitly pinned directory somewhere. Calling random
+ * binaries in the old rootfs via usermodehelpers.
+ *
+ * Be aggressive about this: We simply reject operating on stale
+ * fs_struct state by reverting to nullfs. Every kworker that does
+ * lookups after this point will fail. Every usermodehelper call will
+ * fail. Tough luck but let's be kind and emit a warning to userspace.
  */
 static inline void validate_fs_switch(struct fs_struct *old_fs)
 {
+	might_sleep();
+
 	if (likely(current->pid != 1))
 		return;
 	/* @old_fs may be dangling but for comparison it's fine */
-	if (old_fs != &init_fs)
+	if (old_fs != userspace_init_fs)
 		return;
 	pr_warn("VFS: Pid 1 stopped sharing filesystem state\n");
+	set_fs_root(userspace_init_fs, &init_fs.root);
+	set_fs_pwd(userspace_init_fs, &init_fs.root);
 }
 
 struct fs_struct *switch_fs_struct(struct fs_struct *new_fs)
@@ -201,3 +221,29 @@ struct fs_struct init_fs = {
 	.seq		= __SEQLOCK_UNLOCKED(init_fs.seq),
 	.umask		= 0022,
 };
+
+struct fs_struct *userspace_init_fs __ro_after_init;
+EXPORT_SYMBOL_GPL(userspace_init_fs);
+
+void __init init_userspace_fs(void)
+{
+	struct mount *m;
+	struct path root;
+
+	/* Move PID 1 from nullfs into the initramfs. */
+	m = topmost_overmount(current->nsproxy->mnt_ns->root);
+	root.mnt = &m->mnt;
+	root.dentry = root.mnt->mnt_root;
+
+	VFS_WARN_ON_ONCE(current->pid != 1);
+
+	set_fs_root(current->fs, &root);
+	set_fs_pwd(current->fs, &root);
+
+	/* Hold a reference for the global pointer. */
+	read_seqlock_excl(&current->fs->seq);
+	current->fs->users++;
+	read_sequnlock_excl(&current->fs->seq);
+
+	userspace_init_fs = current->fs;
+}
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index e11d0e57168f..97eef8d3863d 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -17,6 +17,7 @@ struct fs_struct {
 } __randomize_layout;
 
 extern struct kmem_cache *fs_cachep;
+extern struct fs_struct *userspace_init_fs;
 
 extern void exit_fs(struct task_struct *);
 extern void set_fs_root(struct fs_struct *, const struct path *);
@@ -57,17 +58,17 @@ static inline int current_umask(void)
  */
 static inline struct fs_struct *__override_init_fs(void)
 {
-	struct fs_struct *fs;
+	struct fs_struct *old_fs;
 
-	fs = current->fs;
-	WRITE_ONCE(current->fs, fs);
-	return fs;
+	old_fs = current->fs;
+	WRITE_ONCE(current->fs, userspace_init_fs);
+	return old_fs;
 }
 
-static inline void __revert_init_fs(struct fs_struct *revert_fs)
+static inline void __revert_init_fs(struct fs_struct *old_fs)
 {
-	VFS_WARN_ON_ONCE(current->fs != revert_fs);
-	WRITE_ONCE(current->fs, revert_fs);
+	VFS_WARN_ON_ONCE(current->fs != userspace_init_fs);
+	WRITE_ONCE(current->fs, old_fs);
 }
 
 DEFINE_CLASS(__override_init_fs,
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index a6cb241ea00c..61536be773f5 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -24,6 +24,7 @@
 
 extern struct files_struct init_files;
 extern struct fs_struct init_fs;
+extern struct fs_struct *userspace_init_fs;
 extern struct nsproxy init_nsproxy;
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
diff --git a/init/main.c b/init/main.c
index 1cb395dd94e4..5ccc642a5aa7 100644
--- a/init/main.c
+++ b/init/main.c
@@ -102,6 +102,7 @@
 #include <linux/stackdepot.h>
 #include <linux/randomize_kstack.h>
 #include <linux/pidfs.h>
+#include <linux/fs_struct.h>
 #include <linux/ptdump.h>
 #include <linux/time_namespace.h>
 #include <linux/unaligned.h>
@@ -1574,6 +1575,8 @@ static int __ref kernel_init(void *unused)
 {
 	int ret;
 
+	init_userspace_fs();
+
 	/*
 	 * Wait until kthreadd is all set-up.
 	 */

-- 
2.47.3



* [PATCH RFC v3 06/26] rnbd: use scoped_with_init_fs() for block device open
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (4 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 05/26] fs: make userspace_init_fs a dynamically-initialized pointer Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 07/26] crypto: ccp: use scoped_with_init_fs() for SEV file access Christian Brauner
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Use scoped_with_init_fs() to temporarily override current->fs for
the bdev_file_open_by_path() call so the path lookup happens in
init's filesystem context.

process_msg_open() ← rnbd_srv_rdma_ev() ← RDMA completion callback ←
ib_cq_poll_work() ← kworker (InfiniBand completion workqueue)

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 drivers/block/rnbd/rnbd-srv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
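
For readers unfamiliar with the helper, here is a rough userspace model
of the scoped override, assuming it saves current->fs, installs
userspace_init_fs, and restores the saved pointer when the scope ends.
Every name below (struct task, current_task, override_init_fs(),
scoped_with_init_fs_model(), ...) is a hypothetical stand-in rather
than the kernel definition, and the sketch relies on the GCC/Clang
cleanup attribute:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct fs_struct { const char *root; };
struct task { struct fs_struct *fs; };

static struct fs_struct nullfs_fs = { .root = "nullfs" };
static struct fs_struct init_userspace = { .root = "/" };
static struct fs_struct *userspace_init_fs = &init_userspace;
static struct task kthread_task = { .fs = &nullfs_fs };
static struct task *current_task = &kthread_task;

/* Install init's fs and hand back the previous one. */
static struct fs_struct *override_init_fs(void)
{
	struct fs_struct *old_fs = current_task->fs;

	current_task->fs = userspace_init_fs;
	return old_fs;
}

/* Cleanup handler: runs when the scope variable goes out of scope. */
static void revert_init_fs(struct fs_struct **old_fs)
{
	assert(current_task->fs == userspace_init_fs);
	current_task->fs = *old_fs;
}

/* One-shot for loop whose init variable carries the cleanup handler. */
#define scoped_with_init_fs_model()					\
	for (struct fs_struct *old_fs_					\
		     __attribute__((cleanup(revert_init_fs))) =	\
		     override_init_fs(), *once_ = NULL;			\
	     !once_; once_ = (void *)1)

/* Report which root a "lookup" would see inside the scope. */
static const char *root_seen_in_scope(void)
{
	const char *seen = NULL;

	scoped_with_init_fs_model()
		seen = current_task->fs->root;
	return seen;
}
```

In this model the statement inside the scope sees init's root, and the
task is back on its nullfs fs_struct once the scope is left, including
on an early return.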

diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
index 10e8c438bb43..79c9a5fb418f 100644
--- a/drivers/block/rnbd/rnbd-srv.c
+++ b/drivers/block/rnbd/rnbd-srv.c
@@ -11,6 +11,7 @@
 
 #include <linux/module.h>
 #include <linux/blkdev.h>
+#include <linux/fs_struct.h>
 
 #include "rnbd-srv.h"
 #include "rnbd-srv-trace.h"
@@ -734,7 +735,8 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
 		goto reject;
 	}
 
-	bdev_file = bdev_file_open_by_path(full_path, open_flags, NULL, NULL);
+	scoped_with_init_fs()
+		bdev_file = bdev_file_open_by_path(full_path, open_flags, NULL, NULL);
 	if (IS_ERR(bdev_file)) {
 		ret = PTR_ERR(bdev_file);
 		pr_err("Opening device '%s' on session %s failed, failed to open the block device, err: %pe\n",

-- 
2.47.3



* [PATCH RFC v3 07/26] crypto: ccp: use scoped_with_init_fs() for SEV file access
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (5 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 06/26] rnbd: use scoped_with_init_fs() for block device open Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 08/26] scsi: target: use scoped_with_init_fs() for ALUA metadata Christian Brauner
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Replace the manual init_task root retrieval with scoped_with_init_fs()
to temporarily override current->fs. This allows using the simpler
filp_open() instead of the init_root() + file_open_root() pattern.

open_file_as_root() ← sev_read_init_ex_file() / sev_write_init_ex_file()
← sev_platform_init() ← __sev_guest_init() ← KVM ioctl — user process context

Needs init's root because the SEV init_ex file path should resolve
against the real root, not a KVM user's chroot.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 drivers/crypto/ccp/sev-dev.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 096f993974d1..4320054da0f6 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -260,20 +260,16 @@ static int sev_cmd_buffer_len(int cmd)
 
 static struct file *open_file_as_root(const char *filename, int flags, umode_t mode)
 {
-	struct path root __free(path_put) = {};
-
-	task_lock(&init_task);
-	get_fs_root(init_task.fs, &root);
-	task_unlock(&init_task);
-
 	CLASS(prepare_creds, cred)();
 	if (!cred)
 		return ERR_PTR(-ENOMEM);
 
 	cred->fsuid = GLOBAL_ROOT_UID;
 
-	scoped_with_creds(cred)
-		return file_open_root(&root, filename, flags, mode);
+	scoped_with_init_fs() {
+		scoped_with_creds(cred)
+			return filp_open(filename, flags, mode);
+	}
 }
 
 static int sev_read_init_ex_file(void)

-- 
2.47.3



* [PATCH RFC v3 08/26] scsi: target: use scoped_with_init_fs() for ALUA metadata
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (6 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 07/26] crypto: ccp: use scoped_with_init_fs() for SEV file access Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 09/26] scsi: target: use scoped_with_init_fs() for APTPL metadata Christian Brauner
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Use scoped_with_init_fs() to temporarily override current->fs for
the filp_open() call in core_alua_write_tpg_metadata() so the
path lookup happens in init's filesystem context.

core_alua_write_tpg_metadata() ← core_alua_update_tpg_primary_metadata()
← core_alua_do_transition_tg_pt() ← target_queued_submit_work() ←
kworker (target submission workqueue)

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 drivers/target/target_core_alua.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/target/target_core_alua.c b/drivers/target/target_core_alua.c
index 10250aca5a81..fde88642a43a 100644
--- a/drivers/target/target_core_alua.c
+++ b/drivers/target/target_core_alua.c
@@ -18,6 +18,7 @@
 #include <linux/fcntl.h>
 #include <linux/file.h>
 #include <linux/fs.h>
+#include <linux/fs_struct.h>
 #include <scsi/scsi_proto.h>
 #include <linux/unaligned.h>
 
@@ -856,10 +857,13 @@ static int core_alua_write_tpg_metadata(
 	unsigned char *md_buf,
 	u32 md_buf_len)
 {
-	struct file *file = filp_open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
+	struct file *file;
 	loff_t pos = 0;
 	int ret;
 
+	scoped_with_init_fs()
+		file = filp_open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
+
 	if (IS_ERR(file)) {
 		pr_err("filp_open(%s) for ALUA metadata failed\n", path);
 		return -ENODEV;

-- 
2.47.3



* [PATCH RFC v3 09/26] scsi: target: use scoped_with_init_fs() for APTPL metadata
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (7 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 08/26] scsi: target: use scoped_with_init_fs() for ALUA metadata Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 10/26] btrfs: use scoped_with_init_fs() for update_dev_time() Christian Brauner
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Use scoped_with_init_fs() to temporarily override current->fs for
the filp_open() call in __core_scsi3_write_aptpl_to_file() so the
path lookup happens in init's filesystem context.

__core_scsi3_write_aptpl_to_file() ← core_scsi3_update_and_write_aptpl()
← PR command handlers ← target_queued_submit_work() ← kworker

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 drivers/target/target_core_pr.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_pr.c
index f88e63aefcd8..2a030f119b24 100644
--- a/drivers/target/target_core_pr.c
+++ b/drivers/target/target_core_pr.c
@@ -18,6 +18,7 @@
 #include <linux/file.h>
 #include <linux/fcntl.h>
 #include <linux/fs.h>
+#include <linux/fs_struct.h>
 #include <scsi/scsi_proto.h>
 #include <linux/unaligned.h>
 
@@ -1969,7 +1970,8 @@ static int __core_scsi3_write_aptpl_to_file(
 	if (!path)
 		return -ENOMEM;
 
-	file = filp_open(path, flags, 0600);
+	scoped_with_init_fs()
+		file = filp_open(path, flags, 0600);
 	if (IS_ERR(file)) {
 		pr_err("filp_open(%s) for APTPL metadata"
 			" failed\n", path);

-- 
2.47.3



* [PATCH RFC v3 10/26] btrfs: use scoped_with_init_fs() for update_dev_time()
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (8 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 09/26] scsi: target: use scoped_with_init_fs() for APTPL metadata Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 11/26] coredump: use scoped_with_init_fs() for coredump path resolution Christian Brauner
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

update_dev_time() can be called from both kthread and process context.
Use scoped_with_init_fs() to temporarily override current->fs for
the kern_path() call when running in kthread context so the path
lookup happens in init's filesystem context.

update_dev_time() ← btrfs_scratch_superblocks() ←
btrfs_dev_replace_finishing() ← btrfs_dev_replace_kthread()
← kthread (kthread_run)

Also called from ioctl (user process).

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/btrfs/volumes.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 648bb09fc416..b42e93c8e5b1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -12,6 +12,7 @@
 #include <linux/uuid.h>
 #include <linux/list_sort.h>
 #include <linux/namei.h>
+#include <linux/fs_struct.h>
 #include "misc.h"
 #include "disk-io.h"
 #include "extent-tree.h"
@@ -2119,8 +2120,16 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
 static void update_dev_time(const char *device_path)
 {
 	struct path path;
+	int err;
 
-	if (!kern_path(device_path, LOOKUP_FOLLOW, &path)) {
+	if (tsk_is_kthread(current)) {
+		scoped_with_init_fs()
+			err = kern_path(device_path, LOOKUP_FOLLOW, &path);
+	} else {
+		err = kern_path(device_path, LOOKUP_FOLLOW, &path);
+	}
+
+	if (!err) {
 		vfs_utimes(&path, NULL);
 		path_put(&path);
 	}

-- 
2.47.3



* [PATCH RFC v3 11/26] coredump: use scoped_with_init_fs() for coredump path resolution
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (9 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 10/26] btrfs: use scoped_with_init_fs() for update_dev_time() Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 12/26] fs: use scoped_with_init_fs() for kernel_read_file_from_path_initns() Christian Brauner
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Use scoped_with_init_fs() to temporarily override current->fs for
the filp_open() call so the coredump path lookup happens in init's
filesystem context. This replaces the init_root() + file_open_root()
pattern with the simpler scoped override.

coredump_file() ← do_coredump() ← vfs_coredump() ← get_signal() — runs
as the crashing userspace process

Uses init's root to prevent a chrooted/user-namespaced process from
controlling where suid coredumps land. Not a kthread, but intentionally
needs init's fs for security.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/coredump.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 29df8aa19e2e..7428349f10bf 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -919,15 +919,10 @@ static bool coredump_file(struct core_name *cn, struct coredump_params *cprm,
 		 * with a fully qualified path" rule is to control where
 		 * coredumps may be placed using root privileges,
 		 * current->fs->root must not be used. Instead, use the
-		 * root directory of init_task.
+		 * root directory of PID 1.
 		 */
-		struct path root;
-
-		task_lock(&init_task);
-		get_fs_root(init_task.fs, &root);
-		task_unlock(&init_task);
-		file = file_open_root(&root, cn->corename, open_flags, 0600);
-		path_put(&root);
+		scoped_with_init_fs()
+			file = filp_open(cn->corename, open_flags, 0600);
 	} else {
 		file = filp_open(cn->corename, open_flags, 0600);
 	}

-- 
2.47.3



* [PATCH RFC v3 12/26] fs: use scoped_with_init_fs() for kernel_read_file_from_path_initns()
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (10 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 11/26] coredump: use scoped_with_init_fs() for coredump path resolution Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 13/26] ksmbd: use scoped_with_init_fs() for share path resolution Christian Brauner
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Replace the manual init_task root retrieval with scoped_with_init_fs()
to temporarily override current->fs. This allows using the simpler
filp_open() instead of the init_root() + file_open_root() pattern.

kernel_read_file_from_path_initns() ← fw_get_filesystem_firmware() ←
_request_firmware() ← request_firmware_work_func() ← kworker (async
firmware loading)

Also called synchronously from request_firmware() which can be user or
kthread context.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/kernel_read_file.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c
index de32c95d823d..9c2ba9240083 100644
--- a/fs/kernel_read_file.c
+++ b/fs/kernel_read_file.c
@@ -150,18 +150,13 @@ ssize_t kernel_read_file_from_path_initns(const char *path, loff_t offset,
 					  enum kernel_read_file_id id)
 {
 	struct file *file;
-	struct path root;
 	ssize_t ret;
 
 	if (!path || !*path)
 		return -EINVAL;
 
-	task_lock(&init_task);
-	get_fs_root(init_task.fs, &root);
-	task_unlock(&init_task);
-
-	file = file_open_root(&root, path, O_RDONLY, 0);
-	path_put(&root);
+	scoped_with_init_fs()
+		file = filp_open(path, O_RDONLY, 0);
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 

-- 
2.47.3



* [PATCH RFC v3 13/26] ksmbd: use scoped_with_init_fs() for share path resolution
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (11 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 12/26] fs: use scoped_with_init_fs() for kernel_read_file_from_path_initns() Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 14/26] ksmbd: use scoped_with_init_fs() for filesystem info path lookup Christian Brauner
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Use scoped_with_init_fs() to temporarily override current->fs for
the kern_path() call in share_config_request() so the share path
lookup happens in init's filesystem context.

All ksmbd paths ← SMB command handlers ← handle_ksmbd_work() ← workqueue
← ksmbd_conn_handler_loop() ← kthread

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/smb/server/mgmt/share_config.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/smb/server/mgmt/share_config.c b/fs/smb/server/mgmt/share_config.c
index 53f44ff4d376..4535566abef2 100644
--- a/fs/smb/server/mgmt/share_config.c
+++ b/fs/smb/server/mgmt/share_config.c
@@ -9,6 +9,7 @@
 #include <linux/rwsem.h>
 #include <linux/parser.h>
 #include <linux/namei.h>
+#include <linux/fs_struct.h>
 #include <linux/sched.h>
 #include <linux/mm.h>
 
@@ -189,7 +190,8 @@ static struct ksmbd_share_config *share_config_request(struct ksmbd_work *work,
 				goto out;
 			}
 
-			ret = kern_path(share->path, 0, &share->vfs_path);
+			scoped_with_init_fs()
+				ret = kern_path(share->path, 0, &share->vfs_path);
 			ksmbd_revert_fsids(work);
 			if (ret) {
 				ksmbd_debug(SMB, "failed to access '%s'\n",

-- 
2.47.3



* [PATCH RFC v3 14/26] ksmbd: use scoped_with_init_fs() for filesystem info path lookup
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (12 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 13/26] ksmbd: use scoped_with_init_fs() for share path resolution Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 15/26] ksmbd: use scoped_with_init_fs() for VFS path operations Christian Brauner
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Use scoped_with_init_fs() to temporarily override current->fs for
the kern_path() call in smb2_get_info_filesystem() so the share
path lookup happens in init's filesystem context.

All ksmbd paths ← SMB command handlers ← handle_ksmbd_work() ← workqueue
← ksmbd_conn_handler_loop() ← kthread

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/smb/server/smb2pdu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c
index 743c629fe7ec..0667b0b663cd 100644
--- a/fs/smb/server/smb2pdu.c
+++ b/fs/smb/server/smb2pdu.c
@@ -9,6 +9,7 @@
 #include <net/addrconf.h>
 #include <linux/syscalls.h>
 #include <linux/namei.h>
+#include <linux/fs_struct.h>
 #include <linux/statfs.h>
 #include <linux/ethtool.h>
 #include <linux/falloc.h>
@@ -5463,7 +5464,8 @@ static int smb2_get_info_filesystem(struct ksmbd_work *work,
 	if (!share->path)
 		return -EIO;
 
-	rc = kern_path(share->path, LOOKUP_NO_SYMLINKS, &path);
+	scoped_with_init_fs()
+		rc = kern_path(share->path, LOOKUP_NO_SYMLINKS, &path);
 	if (rc) {
 		pr_err("cannot create vfs path\n");
 		return -EIO;

-- 
2.47.3



* [PATCH RFC v3 15/26] ksmbd: use scoped_with_init_fs() for VFS path operations
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (13 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 14/26] ksmbd: use scoped_with_init_fs() for filesystem info path lookup Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:43 ` [PATCH RFC v3 16/26] pnfs/blocklayout: use scoped_with_init_fs() for SCSI device lookup Christian Brauner
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Use scoped_with_init_fs() to temporarily override current->fs for
path lookups in ksmbd VFS helpers:
- ksmbd_vfs_path_lookup(): wrap vfs_path_parent_lookup()
- ksmbd_vfs_link(): wrap kern_path() for old path resolution
- ksmbd_vfs_kern_path_create(): wrap start_creating_path()

This ensures path lookups happen in init's filesystem context.

All ksmbd paths ← SMB command handlers ← handle_ksmbd_work() ← workqueue
← ksmbd_conn_handler_loop() ← kthread

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/smb/server/vfs.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index d08973b288e5..4b537e169160 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -7,6 +7,7 @@
 #include <crypto/sha2.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
+#include <linux/fs_struct.h>
 #include <linux/filelock.h>
 #include <linux/uaccess.h>
 #include <linux/backing-dev.h>
@@ -67,9 +68,10 @@ static int ksmbd_vfs_path_lookup(struct ksmbd_share_config *share_conf,
 	}
 
 	CLASS(filename_kernel, filename)(pathname);
-	err = vfs_path_parent_lookup(filename, flags,
-				     path, &last, &type,
-				     root_share_path);
+	scoped_with_init_fs()
+		err = vfs_path_parent_lookup(filename, flags,
+					     path, &last, &type,
+					     root_share_path);
 	if (err)
 		return err;
 
@@ -622,7 +624,8 @@ int ksmbd_vfs_link(struct ksmbd_work *work, const char *oldname,
 	if (ksmbd_override_fsids(work))
 		return -ENOMEM;
 
-	err = kern_path(oldname, LOOKUP_NO_SYMLINKS, &oldpath);
+	scoped_with_init_fs()
+		err = kern_path(oldname, LOOKUP_NO_SYMLINKS, &oldpath);
 	if (err) {
 		pr_err("cannot get linux path for %s, err = %d\n",
 		       oldname, err);
@@ -1258,7 +1261,8 @@ struct dentry *ksmbd_vfs_kern_path_create(struct ksmbd_work *work,
 	if (!abs_name)
 		return ERR_PTR(-ENOMEM);
 
-	dent = start_creating_path(AT_FDCWD, abs_name, path, flags);
+	scoped_with_init_fs()
+		dent = start_creating_path(AT_FDCWD, abs_name, path, flags);
 	kfree(abs_name);
 	return dent;
 }

-- 
2.47.3



* [PATCH RFC v3 16/26] pnfs/blocklayout: use scoped_with_init_fs() for SCSI device lookup
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (14 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 15/26] ksmbd: use scoped_with_init_fs() for VFS path operations Christian Brauner
@ 2026-03-11 21:43 ` Christian Brauner
  2026-03-11 21:44 ` [PATCH RFC v3 17/26] initramfs: use scoped_with_init_fs() for rootfs unpacking Christian Brauner
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

bl_open_path() resolves pNFS block device paths under /dev/disk/by-id/
via bdev_file_open_by_path() -> lookup_bdev() -> kern_path(). This
path resolution uses current->fs->root.

With kthreads now starting in nullfs, this fails when the call
originates from writeback kworker context because current->fs->root
points at the empty nullfs. The full callchain from kworker is:

  wb_workfn                              [kworker writeback callback]
    ...
      nfs_writepages                     [address_space_operations.writepages]
        nfs_do_writepage
          nfs_pageio_add_request
            ...
              bl_pg_init_write           [nfs_pageio_ops.pg_init]
                pnfs_generic_pg_init_write
                  pnfs_update_layout
                    nfs4_proc_layoutget  [synchronous RPC]
                      pnfs_layout_process
                        bl_alloc_lseg
                          bl_alloc_extent
                            bl_find_get_deviceid
                              bl_alloc_deviceid_node
                                bl_parse_deviceid
                                  bl_parse_scsi
                                    bl_open_path
                                      bdev_file_open_by_path
                                        lookup_bdev
                                          kern_path  <- current->fs->root

bl_open_path() can also be reached from userspace process context (e.g.
open, read, write syscalls via pnfs_update_layout). In that case
current->fs must not be overridden as the path should resolve against
the calling process's filesystem root.

Add a tsk_is_kthread() conditional in bl_open_path() to only apply
scoped_with_init_fs() in kthread context.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/nfs/blocklayout/dev.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index cc6327d97a91..eed960839608 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -4,6 +4,7 @@
  */
 #include <linux/sunrpc/svc.h>
 #include <linux/blkdev.h>
+#include <linux/fs_struct.h>
 #include <linux/nfs4.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_xdr.h>
@@ -363,21 +364,27 @@ static struct file *
 bl_open_path(struct pnfs_block_volume *v, const char *prefix)
 {
 	struct file *bdev_file;
-	const char *devname;
+	const char *devname __free(kfree) = NULL;
 
 	devname = kasprintf(GFP_KERNEL, "/dev/disk/by-id/%s%*phN",
 			prefix, v->scsi.designator_len, v->scsi.designator);
 	if (!devname)
 		return ERR_PTR(-ENOMEM);
 
-	bdev_file = bdev_file_open_by_path(devname, BLK_OPEN_READ | BLK_OPEN_WRITE,
+	if (tsk_is_kthread(current)) {
+		scoped_with_init_fs()
+			bdev_file = bdev_file_open_by_path(devname,
+					BLK_OPEN_READ | BLK_OPEN_WRITE,
 					NULL, NULL);
+	} else {
+		bdev_file = bdev_file_open_by_path(devname,
+				BLK_OPEN_READ | BLK_OPEN_WRITE, NULL, NULL);
+	}
 	if (IS_ERR(bdev_file)) {
 		dprintk("failed to open device %s (%ld)\n",
 			devname, PTR_ERR(bdev_file));
 	}
 
-	kfree(devname);
 	return bdev_file;
 }
 

-- 
2.47.3



* [PATCH RFC v3 17/26] initramfs: use scoped_with_init_fs() for rootfs unpacking
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (15 preceding siblings ...)
  2026-03-11 21:43 ` [PATCH RFC v3 16/26] pnfs/blocklayout: use scoped_with_init_fs() for SCSI device lookup Christian Brauner
@ 2026-03-11 21:44 ` Christian Brauner
  2026-03-11 21:44 ` [PATCH RFC v3 18/26] af_unix: use scoped_with_init_fs() for coredump socket lookup Christian Brauner
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:44 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Extract the initramfs unpacking code into a separate
unpack_initramfs() function and wrap its invocation from
do_populate_rootfs() with scoped_with_init_fs(). This ensures all
file operations during initramfs unpacking (including filp_open()
calls in do_name() and populate_initrd_image()) happen in init's
filesystem context.

Note that security_initramfs_populated() needs the scope as well since
it uses current->fs to derive the initramfs superblock.

do_populate_rootfs() ← async_schedule_domain() ← kworker (async
workqueue)

May also run synchronously from PID 1 if the async workqueue is
considered full. Overriding in that case is fine as well.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 init/initramfs.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/init/initramfs.c b/init/initramfs.c
index 139baed06589..3faa2045b9cf 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -3,6 +3,7 @@
 #include <linux/async.h>
 #include <linux/export.h>
 #include <linux/fs.h>
+#include <linux/fs_struct.h>
 #include <linux/slab.h>
 #include <linux/types.h>
 #include <linux/fcntl.h>
@@ -715,7 +716,7 @@ static void __init populate_initrd_image(char *err)
 }
 #endif /* CONFIG_BLK_DEV_RAM */
 
-static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
+static void __init unpack_initramfs(async_cookie_t cookie)
 {
 	/* Load the built in initramfs */
 	char *err = unpack_to_rootfs(__initramfs_start, __initramfs_size);
@@ -723,7 +724,7 @@ static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
 		panic_show_mem("%s", err); /* Failed to decompress INTERNAL initramfs */
 
 	if (!initrd_start || IS_ENABLED(CONFIG_INITRAMFS_FORCE))
-		goto done;
+		return;
 
 	if (IS_ENABLED(CONFIG_BLK_DEV_RAM))
 		printk(KERN_INFO "Trying to unpack rootfs image as initramfs...\n");
@@ -738,9 +739,14 @@ static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
 		printk(KERN_EMERG "Initramfs unpacking failed: %s\n", err);
 #endif
 	}
+}
 
-done:
-	security_initramfs_populated();
+static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
+{
+	scoped_with_init_fs() {
+		unpack_initramfs(cookie);
+		security_initramfs_populated();
+	}
 
 	/*
 	 * If the initrd region is overlapped with crashkernel reserved region,

-- 
2.47.3



* [PATCH RFC v3 18/26] af_unix: use scoped_with_init_fs() for coredump socket lookup
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (16 preceding siblings ...)
  2026-03-11 21:44 ` [PATCH RFC v3 17/26] initramfs: use scoped_with_init_fs() for rootfs unpacking Christian Brauner
@ 2026-03-11 21:44 ` Christian Brauner
  2026-03-11 21:44 ` [PATCH RFC v3 19/26] fs: stop sharing fs_struct between init_task and pid 1 Christian Brauner
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:44 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Use scoped_with_init_fs() to temporarily override current->fs for the
coredump unix socket path resolution. This replaces the init_root() +
vfs_path_lookup() pattern with scoped_with_init_fs() + kern_path().

The old code used LOOKUP_BENEATH to confine the lookup beneath init's
root. This is dropped because the coredump socket path is absolute and
resolved from root (where ".." is a no-op), and LOOKUP_NO_SYMLINKS
already blocks any symlink-based escape. LOOKUP_BENEATH was redundant
in this context.

unix_find_bsd(SOCK_COREDUMP) ← coredump_sock_connect() ← do_coredump() —
same crashing userspace process

Same security rationale as coredump.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 net/unix/af_unix.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)
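
To illustrate the ".." argument from the commit message, here is a toy
userspace walker that tracks only how far below the root a lookup ends
up. It is not the kernel's path walk; walk_depth_from_root() is a
hypothetical helper that ignores symlinks and mounts entirely and only
shows that ".." at the root stays at the root:

```c
#include <string.h>

/*
 * Toy model: return how many levels below the root an absolute path
 * ends up. The depth can never drop below 0.
 */
static int walk_depth_from_root(const char *path)
{
	const char *p = path;
	int depth = 0;

	while (*p) {
		size_t len;

		while (*p == '/')
			p++;
		len = strcspn(p, "/");
		if (len == 0)
			break;
		if (len == 2 && p[0] == '.' && p[1] == '.') {
			if (depth > 0)
				depth--;	/* ".." at the root stays at the root */
		} else if (!(len == 1 && p[0] == '.')) {
			depth++;	/* ordinary component: one level down */
		}
		p += len;
	}
	return depth;
}
```

Under this model a path like "/../../run/core.sock" resolves two levels
below the root, the same as "/run/core.sock".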

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 3756a93dc63a..64b56b3d0aee 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1198,17 +1198,12 @@ static struct sock *unix_find_bsd(struct sockaddr_un *sunaddr, int addr_len,
 	unix_mkname_bsd(sunaddr, addr_len);
 
 	if (flags & SOCK_COREDUMP) {
-		struct path root;
-
-		task_lock(&init_task);
-		get_fs_root(init_task.fs, &root);
-		task_unlock(&init_task);
-
-		scoped_with_kernel_creds()
-			err = vfs_path_lookup(root.dentry, root.mnt, sunaddr->sun_path,
-					      LOOKUP_BENEATH | LOOKUP_NO_SYMLINKS |
-					      LOOKUP_NO_MAGICLINKS, &path);
-		path_put(&root);
+		scoped_with_init_fs() {
+			scoped_with_kernel_creds()
+				err = kern_path(sunaddr->sun_path,
+						LOOKUP_NO_SYMLINKS |
+						LOOKUP_NO_MAGICLINKS, &path);
+		}
 		if (err)
 			goto fail;
 	} else {

-- 
2.47.3



* [PATCH RFC v3 19/26] fs: stop sharing fs_struct between init_task and pid 1
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (17 preceding siblings ...)
  2026-03-11 21:44 ` [PATCH RFC v3 18/26] af_unix: use scoped_with_init_fs() for coredump socket lookup Christian Brauner
@ 2026-03-11 21:44 ` Christian Brauner
  2026-03-11 21:44 ` [PATCH RFC v3 20/26] fs: add umh argument to struct kernel_clone_args Christian Brauner
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:44 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Spawn kernel_init (PID 1) via kernel_clone() directly instead of
user_mode_thread(), without CLONE_FS. This gives PID 1 its own private
copy of init_task's fs_struct rather than sharing it.

This is a prerequisite for isolating kthreads in nullfs: when
init_task's fs is later pointed at nullfs, PID 1 must not share it
or init_userspace_fs() would modify init_task's fs as well, defeating
the isolation.

At this stage PID 1 still gets rootfs (a private copy rather than a
shared reference), so there is no functional change.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
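Not part of the commit: a small userspace model of the difference
between sharing an fs_struct (CLONE_FS) and getting a private copy.
All names (fs_model, task_model, model_copy_fs(), model_set_root(),
MODEL_CLONE_FS) are hypothetical stand-ins, and the refcounting is
reduced to a plain counter.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_CLONE_FS 0x1UL	/* illustrative value, not the kernel's */

struct fs_model {
	int users;		/* number of tasks sharing this state */
	char root[16];		/* stand-in for the root path */
};

struct task_model {
	struct fs_model *fs;
};

/*
 * With MODEL_CLONE_FS the child shares the parent's fs_model. Without it
 * the child gets a private copy placed in caller-provided storage.
 */
void model_copy_fs(unsigned long flags, const struct task_model *parent,
		   struct task_model *child, struct fs_model *storage)
{
	assert(parent->fs && storage);

	if (flags & MODEL_CLONE_FS) {
		parent->fs->users++;
		child->fs = parent->fs;
		return;
	}

	*storage = *parent->fs;
	storage->users = 1;
	child->fs = storage;
}

/* Stand-in for any operation that changes a task's root. */
void model_set_root(struct task_model *t, const char *root)
{
	snprintf(t->fs->root, sizeof(t->fs->root), "%s", root);
}
```

With the private copy, a later model_set_root() on the child leaves the
parent's root untouched. That is the property the nullfs isolation in
the later patches depends on.
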
 init/main.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index 5ccc642a5aa7..6633d4bea52b 100644
--- a/init/main.c
+++ b/init/main.c
@@ -714,6 +714,11 @@ static __initdata DECLARE_COMPLETION(kthreadd_done);
 
 static noinline void __ref __noreturn rest_init(void)
 {
+	struct kernel_clone_args init_args = {
+		.flags		= (CLONE_VM | CLONE_UNTRACED),
+		.fn		= kernel_init,
+		.fn_arg		= NULL,
+	};
 	struct task_struct *tsk;
 	int pid;
 
@@ -723,7 +728,7 @@ static noinline void __ref __noreturn rest_init(void)
 	 * the init task will end up wanting to create kthreads, which, if
 	 * we schedule it before we create kthreadd, will OOPS.
 	 */
-	pid = user_mode_thread(kernel_init, NULL, CLONE_FS);
+	pid = kernel_clone(&init_args);
 	/*
 	 * Pin init on the boot CPU. Task migration is not properly working
 	 * until sched_init_smp() has been run. It will set the allowed

-- 
2.47.3



* [PATCH RFC v3 20/26] fs: add umh argument to struct kernel_clone_args
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (18 preceding siblings ...)
  2026-03-11 21:44 ` [PATCH RFC v3 19/26] fs: stop sharing fs_struct between init_task and pid 1 Christian Brauner
@ 2026-03-11 21:44 ` Christian Brauner
  2026-03-11 21:44 ` [PATCH RFC v3 21/26] fs: add kthread_mntns() Christian Brauner
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:44 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Add a umh field to struct kernel_clone_args. When set, copy_fs() copies
from pid 1's fs_struct instead of the kthread's fs_struct. This ensures
usermodehelper threads always get init's filesystem state regardless of
their parent's (kthreadd's) fs.

Usermodehelper threads are not allowed to create mount namespaces
(CLONE_NEWNS), share filesystem state (CLONE_FS), or be started from
a non-initial mount namespace. No usermodehelper currently does this so
we don't need to worry about this restriction.

Set .umh = 1 in user_mode_thread(). At this stage pid 1's fs points to
rootfs which is the same as kthreadd's fs, so this is functionally
equivalent.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
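Not part of the commit: a userspace sketch of the source selection that
the commit message describes for copy_fs(). It models only the decision
and the two -EINVAL checks, not the actual copying. fork_ctx,
model_pick_fs() and the MODEL_* flags are hypothetical names with
illustrative values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag values, not the kernel's. */
#define MODEL_CLONE_FS		0x1UL
#define MODEL_CLONE_NEWNS	0x2UL

struct fs_model {
	const char *root;
};

struct fork_ctx {
	const struct fs_model *current_fs;	  /* forking task's fs */
	const struct fs_model *userspace_init_fs; /* pid 1's fs */
	bool in_init_mnt_ns;	/* forking task is in the initial mntns */
};

/*
 * Pick the fs state a new task is copied from. Usermodehelper threads
 * always copy from pid 1's state and may neither share it nor create a
 * mount namespace. Everyone else copies from the forking task.
 */
int model_pick_fs(const struct fork_ctx *ctx, unsigned long flags, bool umh,
		  const struct fs_model **out)
{
	assert(ctx && out);

	if (umh) {
		if (flags & (MODEL_CLONE_NEWNS | MODEL_CLONE_FS))
			return -EINVAL;
		if (!ctx->in_init_mnt_ns)
			return -EINVAL;
		*out = ctx->userspace_init_fs;
	} else {
		*out = ctx->current_fs;
	}
	return 0;
}
```
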
 include/linux/sched/task.h |  1 +
 kernel/fork.c              | 25 +++++++++++++++++++++----
 kernel/umh.c               |  6 ++----
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 41ed884cffc9..e0c1ca8c6a18 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -31,6 +31,7 @@ struct kernel_clone_args {
 	u32 io_thread:1;
 	u32 user_worker:1;
 	u32 no_files:1;
+	u32 umh:1;
 	unsigned long stack;
 	unsigned long stack_size;
 	unsigned long tls;
diff --git a/kernel/fork.c b/kernel/fork.c
index 154703cf7d3d..f62b4c370f74 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1590,11 +1590,27 @@ static int copy_mm(u64 clone_flags, struct task_struct *tsk)
 	return 0;
 }
 
-static int copy_fs(u64 clone_flags, struct task_struct *tsk)
+static int copy_fs(u64 clone_flags, struct task_struct *tsk, bool umh)
 {
-	struct fs_struct *fs = current->fs;
+	struct fs_struct *fs;
+
+	/*
+	 * Usermodehelper may use userspace_init_fs filesystem state but
+	 * they don't get to create mount namespaces, share the
+	 * filesystem state, or be started from a non-initial mount
+	 * namespace.
+	 */
+	if (umh) {
+		if (clone_flags & (CLONE_NEWNS | CLONE_FS))
+			return -EINVAL;
+		if (current->nsproxy->mnt_ns != &init_mnt_ns)
+			return -EINVAL;
+		fs = userspace_init_fs;
+	} else {
+		fs = current->fs;
+		VFS_WARN_ON_ONCE(current->fs != current->real_fs);
+	}
 
-	VFS_WARN_ON_ONCE(current->fs != current->real_fs);
 	if (clone_flags & CLONE_FS) {
 		/* tsk->fs is already what we want */
 		read_seqlock_excl(&fs->seq);
@@ -2213,7 +2229,7 @@ __latent_entropy struct task_struct *copy_process(
 	retval = copy_files(clone_flags, p, args->no_files);
 	if (retval)
 		goto bad_fork_cleanup_semundo;
-	retval = copy_fs(clone_flags, p);
+	retval = copy_fs(clone_flags, p, args->umh);
 	if (retval)
 		goto bad_fork_cleanup_files;
 	retval = copy_sighand(clone_flags, p);
@@ -2727,6 +2743,7 @@ pid_t user_mode_thread(int (*fn)(void *), void *arg, unsigned long flags)
 		.exit_signal	= (flags & CSIGNAL),
 		.fn		= fn,
 		.fn_arg		= arg,
+		.umh		= 1,
 	};
 
 	return kernel_clone(&args);
diff --git a/kernel/umh.c b/kernel/umh.c
index cffda97d961c..d3f4b308b85d 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -71,10 +71,8 @@ static int call_usermodehelper_exec_async(void *data)
 	spin_unlock_irq(&current->sighand->siglock);
 
 	/*
-	 * Initial kernel threads share ther FS with init, in order to
-	 * get the init root directory. But we've now created a new
-	 * thread that is going to execve a user process and has its own
-	 * 'struct fs_struct'. Reset umask to the default.
+	 * Usermodehelper threads get a copy of userspace init's
+	 * fs_struct. Reset umask to the default.
 	 */
 	current->fs->umask = 0022;
 

-- 
2.47.3



* [PATCH RFC v3 21/26] fs: add kthread_mntns()
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (19 preceding siblings ...)
  2026-03-11 21:44 ` [PATCH RFC v3 20/26] fs: add umh argument to struct kernel_clone_args Christian Brauner
@ 2026-03-11 21:44 ` Christian Brauner
  2026-03-11 22:13   ` Thomas Weißschuh
  2026-03-11 21:44 ` [PATCH RFC v3 22/26] devtmpfs: create private mount namespace Christian Brauner
                   ` (4 subsequent siblings)
  25 siblings, 1 reply; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:44 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Allow kthreads to create a private mount namespace.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/namespace.c        | 30 ++++++++++++++++++++++++++++++
 include/linux/mount.h |  1 +
 2 files changed, 31 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 854f4fc66469..e23d2fa7e255 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6200,6 +6200,36 @@ static void __init init_mount_tree(void)
 	ns_tree_add(&init_mnt_ns);
 }
 
+/*
+ * Allow to give a specific kthread a private mount namespace anchored
+ * in the userspace nullfs (mount id 1) so it can mount.
+ */
+int __init kthread_mntns(void)
+{
+	struct mount *m;
+	struct path root;
+	int ret;
+
+	/* Only allowed for kthreads in the initial mount namespace. */
+	VFS_WARN_ON_ONCE(!(current->flags & PF_KTHREAD));
+	VFS_WARN_ON_ONCE(current->nsproxy->mnt_ns != &init_mnt_ns);
+
+	/*
+	 * TODO: switch to creating a completely empty mount namespace
+	 * once that series lands.
+	 */
+	ret = ksys_unshare(CLONE_NEWNS);
+	if (ret)
+		return ret;
+
+	m = current->nsproxy->mnt_ns->root;
+	root.mnt = &m->mnt;
+	root.dentry = root.mnt->mnt_root;
+	set_fs_pwd(current->fs, &root);
+	set_fs_root(current->fs, &root);
+	return 0;
+}
+
 void __init mnt_init(void)
 {
 	int err;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index acfe7ef86a1b..69d61f21b548 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -106,6 +106,7 @@ int do_mount(const char *, const char __user *,
 extern const struct path *collect_paths(const struct path *, struct path *, unsigned);
 extern void drop_collected_paths(const struct path *, const struct path *);
 extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
+int __init kthread_mntns(void);
 
 extern int cifs_root_data(char **dev, char **opts);
 

-- 
2.47.3



* [PATCH RFC v3 22/26] devtmpfs: create private mount namespace
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (20 preceding siblings ...)
  2026-03-11 21:44 ` [PATCH RFC v3 21/26] fs: add kthread_mntns() Christian Brauner
@ 2026-03-11 21:44 ` Christian Brauner
  2026-03-11 21:44 ` [PATCH RFC v3 23/26] nullfs: make nullfs multi-instance Christian Brauner
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:44 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Kernel threads are located in a completely isolated nullfs mount.
Make it possible for a kthread to create a private mount namespace so it
can mount private filesystem instances. This is only used by devtmpfs.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 drivers/base/devtmpfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index b1c4ceb65026..246ac0b331fe 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -413,7 +413,7 @@ static noinline int __init devtmpfs_setup(void *p)
 {
 	int err;
 
-	err = ksys_unshare(CLONE_NEWNS);
+	err = kthread_mntns();
 	if (err)
 		goto out;
 	err = init_mount("devtmpfs", "/", "devtmpfs", DEVTMPFS_MFLAGS, NULL);

-- 
2.47.3



* [PATCH RFC v3 23/26] nullfs: make nullfs multi-instance
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (21 preceding siblings ...)
  2026-03-11 21:44 ` [PATCH RFC v3 22/26] devtmpfs: create private mount namespace Christian Brauner
@ 2026-03-11 21:44 ` Christian Brauner
  2026-03-11 21:44 ` [PATCH RFC v3 24/26] fs: start all kthreads in nullfs Christian Brauner
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:44 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Allow multiple instances of nullfs to be created. Right now we're only
going to use it for kernel-internal purposes, but ultimately we can allow
userspace to use it too, e.g., to safely overmount stuff.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
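Not part of the commit: a toy userspace model of the behavioural
difference between a single-instance and a multi-instance filesystem,
which is what switching from get_tree_single() to get_tree_nodev()
amounts to. The sb_model type, the fixed-size pool, and the model_*
functions are hypothetical and only mimic "same superblock every time"
versus "new superblock per mount".

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_POOL_SIZE 4

struct sb_model {
	int id;
};

static struct sb_model model_pool[MODEL_POOL_SIZE];
static int model_pool_used;
static struct sb_model *model_single;

/* Hand out the next unused superblock, or NULL if the pool is empty. */
static struct sb_model *model_alloc_sb(void)
{
	struct sb_model *sb;

	assert(model_pool_used <= MODEL_POOL_SIZE);
	if (model_pool_used == MODEL_POOL_SIZE)
		return NULL;

	sb = &model_pool[model_pool_used];
	sb->id = ++model_pool_used;
	return sb;
}

/* Single-instance behaviour: every caller gets the same superblock. */
struct sb_model *model_get_tree_single(void)
{
	if (!model_single)
		model_single = model_alloc_sb();
	return model_single;
}

/* Multi-instance behaviour: every call creates a fresh superblock. */
struct sb_model *model_get_tree_nodev(void)
{
	return model_alloc_sb();
}
```
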
 fs/nullfs.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/fs/nullfs.c b/fs/nullfs.c
index fdbd3e5d3d71..c6f5b9493e26 100644
--- a/fs/nullfs.c
+++ b/fs/nullfs.c
@@ -40,14 +40,9 @@ static int nullfs_fs_fill_super(struct super_block *s, struct fs_context *fc)
 	return 0;
 }
 
-/*
- * For now this is a single global instance. If needed we can make it
- * mountable by userspace at which point we will need to make it
- * multi-instance.
- */
 static int nullfs_fs_get_tree(struct fs_context *fc)
 {
-	return get_tree_single(fc, nullfs_fs_fill_super);
+	return get_tree_nodev(fc, nullfs_fs_fill_super);
 }
 
 static const struct fs_context_operations nullfs_fs_context_ops = {
@@ -57,9 +52,8 @@ static const struct fs_context_operations nullfs_fs_context_ops = {
 static int nullfs_init_fs_context(struct fs_context *fc)
 {
 	fc->ops		= &nullfs_fs_context_ops;
-	fc->global	= true;
-	fc->sb_flags	= SB_NOUSER;
-	fc->s_iflags	= SB_I_NOEXEC | SB_I_NODEV;
+	fc->sb_flags	|= SB_NOUSER;
+	fc->s_iflags	|= SB_I_NOEXEC | SB_I_NODEV;
 	return 0;
 }
 

-- 
2.47.3



* [PATCH RFC v3 24/26] fs: start all kthreads in nullfs
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (22 preceding siblings ...)
  2026-03-11 21:44 ` [PATCH RFC v3 23/26] nullfs: make nullfs multi-instance Christian Brauner
@ 2026-03-11 21:44 ` Christian Brauner
  2026-03-11 21:44 ` [PATCH RFC v3 25/26] fs: stop rewriting kthread fs structs Christian Brauner
  2026-03-11 21:44 ` [PATCH RFC v3 26/26] fs: stop rewriting paths for PF_EXITING | PF_DUMPCORE Christian Brauner
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:44 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Point init_task's fs_struct (root and pwd) at a private nullfs instance
instead of the mutable rootfs. All kthreads now start isolated in nullfs
and must use scoped_with_init_fs() for any path resolution.

PID 1 is moved from nullfs into the initramfs by init_userspace_fs().
Usermodehelper threads use userspace_init_fs via the umh flag in
copy_fs(). All subsystems that need init's filesystem state for path
resolution already use scoped_with_init_fs() from earlier commits in
this series.

This isolates kthreads from userspace filesystem state and makes it
hard to perform filesystem operations from kthread context.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
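Not part of the commit: a userspace sketch of what a scoped override of
a task's fs state can look like. The real scoped_with_init_fs() is added
earlier in the series and is not shown in this message, so the macro
below is an assumption about the general pattern rather than its actual
implementation. It relies on the GCC/Clang cleanup attribute. All names
(fs_model, task_model, fs_guard, scoped_with_fs_model()) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct fs_model {
	const char *root;
};

struct task_model {
	struct fs_model *fs;
};

struct fs_guard {
	struct task_model *task;
	struct fs_model *saved;
	int done;
};

/* Runs when the guard variable goes out of scope and restores the fs. */
static inline void fs_guard_exit(struct fs_guard *g)
{
	g->task->fs = g->saved;
}

/*
 * Run the following statement exactly once with t->fs pointing at newfs.
 * The previous value is restored when the loop scope is left.
 */
#define scoped_with_fs_model(t, newfs)					\
	for (struct fs_guard fs_scope_guard				\
			__attribute__((cleanup(fs_guard_exit))) =	\
			{ (t), (t)->fs, 0 };				\
	     !fs_scope_guard.done &&					\
			(fs_scope_guard.task->fs = (newfs), 1);		\
	     fs_scope_guard.done = 1)

/* Returns the root a lookup would see while the override is active. */
const char *model_root_seen_in_scope(struct task_model *t,
				     struct fs_model *init_fs)
{
	const char *seen = NULL;

	assert(t && init_fs);
	scoped_with_fs_model(t, init_fs)
		seen = t->fs->root;

	return seen;
}
```

A task that normally sits in nullfs sees init's root only inside the
scoped statement. Once the scope ends, t->fs points at nullfs again.
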
 fs/namespace.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index e23d2fa7e255..5d318e2e1e4a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6143,12 +6143,14 @@ static void __init init_mount_tree(void)
 	struct path root;
 
 	/*
-	 * We create two mounts:
+	 * We create three mounts:
 	 *
 	 * (1) nullfs with mount id 1
 	 * (2) mutable rootfs with mount id 2
+	 * (3) private nullfs for kthreads (SB_KERNMOUNT)
 	 *
-	 * with (2) mounted on top of (1).
+	 * with (2) mounted on top of (1). The init_task's root and pwd
+	 * are pointed at (3) so all kthreads start isolated in nullfs.
 	 */
 	nullfs_mnt = vfs_kern_mount(&nullfs_fs_type, 0, "nullfs", NULL);
 	if (IS_ERR(nullfs_mnt))
@@ -6188,12 +6190,14 @@ static void __init init_mount_tree(void)
 		init_mnt_ns.nr_mounts++;
 	}
 
+	nullfs_mnt = kern_mount(&nullfs_fs_type);
+	if (IS_ERR(nullfs_mnt))
+		panic("VFS: Failed to create private nullfs instance");
+	root.mnt	= nullfs_mnt;
+	root.dentry	= nullfs_mnt->mnt_root;
+
 	init_task.nsproxy->mnt_ns = &init_mnt_ns;
 	get_mnt_ns(&init_mnt_ns);
-
-	/* The root and pwd always point to the mutable rootfs. */
-	root.mnt	= mnt;
-	root.dentry	= mnt->mnt_root;
 	set_fs_pwd(current->fs, &root);
 	set_fs_root(current->fs, &root);
 

-- 
2.47.3



* [PATCH RFC v3 25/26] fs: stop rewriting kthread fs structs
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (23 preceding siblings ...)
  2026-03-11 21:44 ` [PATCH RFC v3 24/26] fs: start all kthreads in nullfs Christian Brauner
@ 2026-03-11 21:44 ` Christian Brauner
  2026-03-11 21:44 ` [PATCH RFC v3 26/26] fs: stop rewriting paths for PF_EXITING | PF_DUMPCORE Christian Brauner
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:44 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

Now that we have isolated kthreads' filesystem state completely from
userspace, stop rewriting their state.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/fs_struct.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index f44e43ce6d93..2a98cfbedd32 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -61,6 +61,10 @@ void chroot_fs_refs(const struct path *old_root, const struct path *new_root)
 
 	read_lock(&tasklist_lock);
 	for_each_process_thread(g, p) {
+		/* leave kthreads alone */
+		if (p->flags & PF_KTHREAD)
+			continue;
+
 		task_lock(p);
 		fs = p->real_fs;
 		if (fs) {

-- 
2.47.3



* [PATCH RFC v3 26/26] fs: stop rewriting paths for PF_EXITING | PF_DUMPCORE
  2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
                   ` (24 preceding siblings ...)
  2026-03-11 21:44 ` [PATCH RFC v3 25/26] fs: stop rewriting kthread fs structs Christian Brauner
@ 2026-03-11 21:44 ` Christian Brauner
  25 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2026-03-11 21:44 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-kernel, Alexander Viro, Jens Axboe,
	Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

If the task is exiting or dumping core, stop messing with its fs_struct.
There's no point in doing that. Worst case, it'll be stuck in a stale
path until it calls exit_fs().

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
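Not part of the commit: a userspace model of the task filter that
chroot_fs_refs() ends up with after this patch. Roots are reduced to
integers, pwd handling and locking are left out, and the MODEL_PF_*
bits are illustrative values rather than the kernel's. task_model and
model_chroot_fs_refs() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits, not the kernel's values. */
#define MODEL_PF_EXITING	0x1u
#define MODEL_PF_DUMPCORE	0x2u
#define MODEL_PF_KTHREAD	0x4u

struct task_model {
	unsigned int flags;
	int root;		/* stand-in for the task's root path */
};

/*
 * Switch every eligible task whose root is old_root over to new_root.
 * Kthreads, exiting tasks, and tasks dumping core are skipped.
 * Returns the number of tasks that were rewritten.
 */
int model_chroot_fs_refs(struct task_model *tasks, size_t n,
			 int old_root, int new_root)
{
	int count = 0;

	assert(tasks || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct task_model *p = &tasks[i];

		if (p->flags & (MODEL_PF_KTHREAD | MODEL_PF_EXITING |
				MODEL_PF_DUMPCORE))
			continue;

		if (p->root == old_root) {
			p->root = new_root;
			count++;
		}
	}
	return count;
}
```
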
 fs/fs_struct.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index 2a98cfbedd32..34699f3b6f88 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -61,8 +61,7 @@ void chroot_fs_refs(const struct path *old_root, const struct path *new_root)
 
 	read_lock(&tasklist_lock);
 	for_each_process_thread(g, p) {
-		/* leave kthreads alone */
-		if (p->flags & PF_KTHREAD)
+		if (p->flags & (PF_KTHREAD | PF_EXITING | PF_DUMPCORE))
 			continue;
 
 		task_lock(p);

-- 
2.47.3



* Re: [PATCH RFC v3 21/26] fs: add kthread_mntns()
  2026-03-11 21:44 ` [PATCH RFC v3 21/26] fs: add kthread_mntns() Christian Brauner
@ 2026-03-11 22:13   ` Thomas Weißschuh
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Weißschuh @ 2026-03-11 22:13 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Linus Torvalds, linux-kernel, Alexander Viro,
	Jens Axboe, Jan Kara, Tejun Heo, Jann Horn

On 2026-03-11 22:44:04+0100, Christian Brauner wrote:
> Allow kthreads to create a private mount namespace.
> 
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---
>  fs/namespace.c        | 30 ++++++++++++++++++++++++++++++
>  include/linux/mount.h |  1 +
>  2 files changed, 31 insertions(+)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 854f4fc66469..e23d2fa7e255 100644

(...)

> diff --git a/include/linux/mount.h b/include/linux/mount.h
> index acfe7ef86a1b..69d61f21b548 100644
> --- a/include/linux/mount.h
> +++ b/include/linux/mount.h
> @@ -106,6 +106,7 @@ int do_mount(const char *, const char __user *,
>  extern const struct path *collect_paths(const struct path *, struct path *, unsigned);
>  extern void drop_collected_paths(const struct path *, const struct path *);
>  extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
> +int __init kthread_mntns(void);

The usage of '__init' needs an '#include <linux/init.h>', otherwise
compilation fails in some cases. Or drop the '__init' altogether, as
it has no meaning in a declaration anyways.

>  extern int cifs_root_data(char **dev, char **opts);
>  
> 
> -- 
> 2.47.3
> 


Thread overview: 28+ messages
2026-03-11 21:43 [PATCH RFC v3 00/26] fs,kthread: start all kthreads in nullfs Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 01/26] fs: add switch_fs_struct() Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 02/26] fs: notice when init abandons fs sharing Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 03/26] fs: add scoped_with_init_fs() Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 04/26] fs: add real_fs to track task's actual fs_struct Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 05/26] fs: make userspace_init_fs a dynamically-initialized pointer Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 06/26] rnbd: use scoped_with_init_fs() for block device open Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 07/26] crypto: ccp: use scoped_with_init_fs() for SEV file access Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 08/26] scsi: target: use scoped_with_init_fs() for ALUA metadata Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 09/26] scsi: target: use scoped_with_init_fs() for APTPL metadata Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 10/26] btrfs: use scoped_with_init_fs() for update_dev_time() Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 11/26] coredump: use scoped_with_init_fs() for coredump path resolution Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 12/26] fs: use scoped_with_init_fs() for kernel_read_file_from_path_initns() Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 13/26] ksmbd: use scoped_with_init_fs() for share path resolution Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 14/26] ksmbd: use scoped_with_init_fs() for filesystem info path lookup Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 15/26] ksmbd: use scoped_with_init_fs() for VFS path operations Christian Brauner
2026-03-11 21:43 ` [PATCH RFC v3 16/26] pnfs/blocklayout: use scoped_with_init_fs() for SCSI device lookup Christian Brauner
2026-03-11 21:44 ` [PATCH RFC v3 17/26] initramfs: use scoped_with_init_fs() for rootfs unpacking Christian Brauner
2026-03-11 21:44 ` [PATCH RFC v3 18/26] af_unix: use scoped_with_init_fs() for coredump socket lookup Christian Brauner
2026-03-11 21:44 ` [PATCH RFC v3 19/26] fs: stop sharing fs_struct between init_task and pid 1 Christian Brauner
2026-03-11 21:44 ` [PATCH RFC v3 20/26] fs: add umh argument to struct kernel_clone_args Christian Brauner
2026-03-11 21:44 ` [PATCH RFC v3 21/26] fs: add kthread_mntns() Christian Brauner
2026-03-11 22:13   ` Thomas Weißschuh
2026-03-11 21:44 ` [PATCH RFC v3 22/26] devtmpfs: create private mount namespace Christian Brauner
2026-03-11 21:44 ` [PATCH RFC v3 23/26] nullfs: make nullfs multi-instance Christian Brauner
2026-03-11 21:44 ` [PATCH RFC v3 24/26] fs: start all kthreads in nullfs Christian Brauner
2026-03-11 21:44 ` [PATCH RFC v3 25/26] fs: stop rewriting kthread fs structs Christian Brauner
2026-03-11 21:44 ` [PATCH RFC v3 26/26] fs: stop rewriting paths for PF_EXITING | PF_DUMPCORE Christian Brauner
