linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/16] pidfs: persistent info & xattrs
@ 2025-06-18 20:53 Christian Brauner
  2025-06-18 20:53 ` [PATCH v2 01/16] pidfs: raise SB_I_NODEV and SB_I_NOEXEC Christian Brauner
                   ` (15 more replies)
  0 siblings, 16 replies; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Persist exit and coredump information independent of whether anyone
currently holds a pidfd for the struct pid.

The current scheme allocated pidfs dentries on-demand repeatedly.
This scheme is reaching it's limits as it makes it impossible to pin
information that needs to be available after the task has exited or
coredumped and that should not be lost simply because the pidfd got
closed temporarily. The next opener should still see the stashed
information.

This is also a prerequisite for supporting extended attributes on
pidfds to allow attaching meta information to them.

If someone opens a pidfd for a struct pid a pidfs dentry is allocated
and stashed in pid->stashed. Once the last pidfd for the struct pid is
closed the pidfs dentry is released and removed from pid->stashed.

So if 10 callers create a pidfs dentry for the same struct pid
sequentially, i.e., each closing the pidfd before the other creates a
new one then a new pidfs dentry is allocated every time.

Because multiple tasks acquiring and releasing a pidfd for the same
struct pid can race with each another a task may still find a valid
pidfs entry from the previous task in pid->stashed and reuse it. Or it
might find a dead dentry in there and fail to reuse it and so stashes a
new pidfs dentry. Multiple tasks may race to stash a new pidfs dentry
but only one will succeed, the other ones will put their dentry.

The current scheme aims to ensure that a pidfs dentry for a struct pid
can only be created if the task is still alive or if a pidfs dentry
already existed before the task was reaped and so exit information has
been was stashed in the pidfs inode.

That's great except that it's buggy. If a pidfs dentry is stashed in
pid->stashed after pidfs_exit() but before __unhash_process() is called
we will return a pidfd for a reaped task without exit information being
available.

The pidfds_pid_valid() check does not guard against this race as it
doens't sync at all with pidfs_exit(). The pid_has_task() check might be
successful simply because we're before __unhash_process() but after
pidfs_exit().

Introduce a new scheme where the lifetime of information associated with
a pidfs entry (coredump and exit information) isn't bound to the
lifetime of the pidfs inode but the struct pid itself.

The first time a pidfs dentry is allocated for a struct pid a struct
pidfs_attr will be allocated which will be used to store exit and
coredump information.

If all pidfs for the pidfs dentry are closed the dentry and inode can be
cleaned up but the struct pidfs_attr will stick until the struct pid
itself is freed. This will ensure minimal memory usage while persisting
relevant information.

The new scheme has various advantages. First, it allows to close the
race where we end up handing out a pidfd for a reaped task for which no
exit information is available. Second, it minimizes memory usage.
Third, it allows to remove complex lifetime tracking via dentries when
registering a struct pid with pidfs. There's no need to get or put a
reference. Instead, the lifetime of exit and coredump information
associated with a struct pid is bound to the lifetime of struct pid
itself.

Now that we have a way to persist information for pidfs dentries we can
start supporting extended attributes on pidfds. This will allow
userspace to attach meta information to tasks.

One natural extension would be to introduce a custom pidfs.* extended
attribute space and allow for the inheritance of extended attributes
across fork() and exec().

The first simple scheme will allow privileged userspace to set trusted
extended attributes on pidfs inodes.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Christian Brauner (16):
      pidfs: raise SB_I_NODEV and SB_I_NOEXEC
      libfs: massage path_from_stashed() to allow custom stashing behavior
      libfs: massage path_from_stashed()
      pidfs: move to anonymous struct
      pidfs: persist information
      pidfs: remove unused members from struct pidfs_inode
      pidfs: remove custom inode allocation
      pidfs: remove pidfs_{get,put}_pid()
      pidfs: remove pidfs_pid_valid()
      libfs: prepare to allow for non-immutable pidfd inodes
      pidfs: make inodes mutable
      pidfs: support xattrs on pidfds
      selftests/pidfd: test extended attribute support
      selftests/pidfd: test extended attribute support
      selftests/pidfd: test setattr support
      pidfs: add some CONFIG_DEBUG_VFS asserts

 fs/coredump.c                                      |   6 -
 fs/internal.h                                      |   3 +
 fs/libfs.c                                         |  34 +-
 fs/pidfs.c                                         | 422 ++++++++++++---------
 include/linux/pid.h                                |  14 +-
 include/linux/pidfs.h                              |   3 +-
 kernel/pid.c                                       |   2 +-
 net/unix/af_unix.c                                 |   5 -
 tools/testing/selftests/pidfd/.gitignore           |   2 +
 tools/testing/selftests/pidfd/Makefile             |   3 +-
 tools/testing/selftests/pidfd/pidfd_setattr_test.c |  69 ++++
 tools/testing/selftests/pidfd/pidfd_xattr_test.c   | 132 +++++++
 12 files changed, 480 insertions(+), 215 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250618-work-pidfs-persistent-251fe3cf5c3b


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 01/16] pidfs: raise SB_I_NODEV and SB_I_NOEXEC
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 20:37   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 02/16] libfs: massage path_from_stashed() to allow custom stashing behavior Christian Brauner
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Similar to commit 1ed95281c0c7 ("anon_inode: raise SB_I_NODEV and SB_I_NOEXEC"):
it shouldn't be possible to execute pidfds via
execveat(fd_anon_inode, "", NULL, NULL, AT_EMPTY_PATH)
so raise SB_I_NOEXEC so that no one gets any creative ideas.

Also raise SB_I_NODEV as we don't expect or support any devices on pidfs.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/pidfs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index c1f0a067be40..ff2560b34ed1 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -891,6 +891,8 @@ static int pidfs_init_fs_context(struct fs_context *fc)
 	if (!ctx)
 		return -ENOMEM;
 
+	fc->s_iflags |= SB_I_NOEXEC;
+	fc->s_iflags |= SB_I_NODEV;
 	ctx->ops = &pidfs_sops;
 	ctx->eops = &pidfs_export_operations;
 	ctx->dops = &pidfs_dentry_operations;

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 02/16] libfs: massage path_from_stashed() to allow custom stashing behavior
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
  2025-06-18 20:53 ` [PATCH v2 01/16] pidfs: raise SB_I_NODEV and SB_I_NOEXEC Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 20:40   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 03/16] libfs: massage path_from_stashed() Christian Brauner
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

* Add a callback to struct stashed_operations so it's possible to
  implement custom behavior for pidfs and allow for it to return errors.

* Teach stashed_dentry_get() to handle error pointers.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/internal.h |  3 +++
 fs/libfs.c    | 27 ++++++++++++++++++++-------
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 393f6c5c24f6..22ba066d1dba 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -322,12 +322,15 @@ struct mnt_idmap *alloc_mnt_idmap(struct user_namespace *mnt_userns);
 struct mnt_idmap *mnt_idmap_get(struct mnt_idmap *idmap);
 void mnt_idmap_put(struct mnt_idmap *idmap);
 struct stashed_operations {
+	struct dentry *(*stash_dentry)(struct dentry **stashed,
+				       struct dentry *dentry);
 	void (*put_data)(void *data);
 	int (*init_inode)(struct inode *inode, void *data);
 };
 int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
 		      struct path *path);
 void stashed_dentry_prune(struct dentry *dentry);
+struct dentry *stash_dentry(struct dentry **stashed, struct dentry *dentry);
 struct dentry *stashed_dentry_get(struct dentry **stashed);
 /**
  * path_mounted - check whether path is mounted
diff --git a/fs/libfs.c b/fs/libfs.c
index 9ea0ecc325a8..3541e22c87b5 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -2128,6 +2128,8 @@ struct dentry *stashed_dentry_get(struct dentry **stashed)
 	dentry = rcu_dereference(*stashed);
 	if (!dentry)
 		return NULL;
+	if (IS_ERR(dentry))
+		return dentry;
 	if (!lockref_get_not_dead(&dentry->d_lockref))
 		return NULL;
 	return dentry;
@@ -2176,8 +2178,7 @@ static struct dentry *prepare_anon_dentry(struct dentry **stashed,
 	return dentry;
 }
 
-static struct dentry *stash_dentry(struct dentry **stashed,
-				   struct dentry *dentry)
+struct dentry *stash_dentry(struct dentry **stashed, struct dentry *dentry)
 {
 	guard(rcu)();
 	for (;;) {
@@ -2218,12 +2219,15 @@ static struct dentry *stash_dentry(struct dentry **stashed,
 int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
 		      struct path *path)
 {
-	struct dentry *dentry;
+	struct dentry *dentry, *res;
 	const struct stashed_operations *sops = mnt->mnt_sb->s_fs_info;
 
 	/* See if dentry can be reused. */
-	path->dentry = stashed_dentry_get(stashed);
-	if (path->dentry) {
+	res = stashed_dentry_get(stashed);
+	if (IS_ERR(res))
+		return PTR_ERR(res);
+	if (res) {
+		path->dentry = res;
 		sops->put_data(data);
 		goto out_path;
 	}
@@ -2234,8 +2238,17 @@ int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
 		return PTR_ERR(dentry);
 
 	/* Added a new dentry. @data is now owned by the filesystem. */
-	path->dentry = stash_dentry(stashed, dentry);
-	if (path->dentry != dentry)
+	if (sops->stash_dentry)
+		res = sops->stash_dentry(stashed, dentry);
+	else
+		res = stash_dentry(stashed, dentry);
+	if (IS_ERR(res)) {
+		dput(dentry);
+		return PTR_ERR(res);
+	}
+	path->dentry = res;
+	/* A dentry was reused. */
+	if (res != dentry)
 		dput(dentry);
 
 out_path:

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 03/16] libfs: massage path_from_stashed()
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
  2025-06-18 20:53 ` [PATCH v2 01/16] pidfs: raise SB_I_NODEV and SB_I_NOEXEC Christian Brauner
  2025-06-18 20:53 ` [PATCH v2 02/16] libfs: massage path_from_stashed() to allow custom stashing behavior Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 20:42   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 04/16] pidfs: move to anonymous struct Christian Brauner
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Make it a littler easier to follow.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/libfs.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 3541e22c87b5..997d3a363ce6 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -2227,9 +2227,8 @@ int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 	if (res) {
-		path->dentry = res;
 		sops->put_data(data);
-		goto out_path;
+		goto make_path;
 	}
 
 	/* Allocate a new dentry. */
@@ -2246,15 +2245,14 @@ int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
 		dput(dentry);
 		return PTR_ERR(res);
 	}
-	path->dentry = res;
-	/* A dentry was reused. */
 	if (res != dentry)
 		dput(dentry);
 
-out_path:
-	WARN_ON_ONCE(path->dentry->d_fsdata != stashed);
-	WARN_ON_ONCE(d_inode(path->dentry)->i_private != data);
+make_path:
+	path->dentry = res;
 	path->mnt = mntget(mnt);
+	VFS_WARN_ON_ONCE(path->dentry->d_fsdata != stashed);
+	VFS_WARN_ON_ONCE(d_inode(path->dentry)->i_private != data);
 	return 0;
 }
 

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 04/16] pidfs: move to anonymous struct
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (2 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 03/16] libfs: massage path_from_stashed() Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 20:44   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 05/16] pidfs: persist information Christian Brauner
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Move the pidfs entries to an anonymous struct.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 include/linux/pid.h | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 453ae6d8a68d..00646a692dd4 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -52,14 +52,15 @@ struct upid {
 	struct pid_namespace *ns;
 };
 
-struct pid
-{
+struct pid {
 	refcount_t count;
 	unsigned int level;
 	spinlock_t lock;
-	struct dentry *stashed;
-	u64 ino;
-	struct rb_node pidfs_node;
+	struct {
+		u64 ino;
+		struct rb_node pidfs_node;
+		struct dentry *stashed;
+	};
 	/* lists of tasks that use this pid */
 	struct hlist_head tasks[PIDTYPE_MAX];
 	struct hlist_head inodes;

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 05/16] pidfs: persist information
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (3 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 04/16] pidfs: move to anonymous struct Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 21:09   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 06/16] pidfs: remove unused members from struct pidfs_inode Christian Brauner
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Persist exit and coredump information independent of whether anyone
currently holds a pidfd for the struct pid.

The current scheme allocated pidfs dentries on-demand repeatedly.
This scheme is reaching it's limits as it makes it impossible to pin
information that needs to be available after the task has exited or
coredumped and that should not be lost simply because the pidfd got
closed temporarily. The next opener should still see the stashed
information.

This is also a prerequisite for supporting extended attributes on
pidfds to allow attaching meta information to them.

If someone opens a pidfd for a struct pid a pidfs dentry is allocated
and stashed in pid->stashed. Once the last pidfd for the struct pid is
closed the pidfs dentry is released and removed from pid->stashed.

So if 10 callers create a pidfs dentry for the same struct pid
sequentially, i.e., each closing the pidfd before the other creates a
new one then a new pidfs dentry is allocated every time.

Because multiple tasks acquiring and releasing a pidfd for the same
struct pid can race with each another a task may still find a valid
pidfs entry from the previous task in pid->stashed and reuse it. Or it
might find a dead dentry in there and fail to reuse it and so stashes a
new pidfs dentry. Multiple tasks may race to stash a new pidfs dentry
but only one will succeed, the other ones will put their dentry.

The current scheme aims to ensure that a pidfs dentry for a struct pid
can only be created if the task is still alive or if a pidfs dentry
already existed before the task was reaped and so exit information has
been was stashed in the pidfs inode.

That's great except that it's buggy. If a pidfs dentry is stashed in
pid->stashed after pidfs_exit() but before __unhash_process() is called
we will return a pidfd for a reaped task without exit information being
available.

The pidfds_pid_valid() check does not guard against this race as it
doens't sync at all with pidfs_exit(). The pid_has_task() check might be
successful simply because we're before __unhash_process() but after
pidfs_exit().

Introduce a new scheme where the lifetime of information associated with
a pidfs entry (coredump and exit information) isn't bound to the
lifetime of the pidfs inode but the struct pid itself.

The first time a pidfs dentry is allocated for a struct pid a struct
pidfs_attr will be allocated which will be used to store exit and
coredump information.

If all pidfs for the pidfs dentry are closed the dentry and inode can be
cleaned up but the struct pidfs_attr will stick until the struct pid
itself is freed. This will ensure minimal memory usage while persisting
relevant information.

The new scheme has various advantages. First, it allows to close the
race where we end up handing out a pidfd for a reaped task for which no
exit information is available. Second, it minimizes memory usage.
Third, it allows to remove complex lifetime tracking via dentries when
registering a struct pid with pidfs. There's no need to get or put a
reference. Instead, the lifetime of exit and coredump information
associated with a struct pid is bound to the lifetime of struct pid
itself.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/pidfs.c            | 212 ++++++++++++++++++++++++++++++++++----------------
 include/linux/pid.h   |   3 +
 include/linux/pidfs.h |   1 +
 kernel/pid.c          |   2 +-
 4 files changed, 151 insertions(+), 67 deletions(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index ff2560b34ed1..6a907457b1fe 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -25,7 +25,10 @@
 #include "internal.h"
 #include "mount.h"
 
+#define PIDFS_PID_DEAD ERR_PTR(-ESRCH)
+
 static struct kmem_cache *pidfs_cachep __ro_after_init;
+static struct kmem_cache *pidfs_attr_cachep __ro_after_init;
 
 /*
  * Stashes information that userspace needs to access even after the
@@ -37,6 +40,11 @@ struct pidfs_exit_info {
 	__u32 coredump_mask;
 };
 
+struct pidfs_attr {
+	struct pidfs_exit_info __pei;
+	struct pidfs_exit_info *exit_info;
+};
+
 struct pidfs_inode {
 	struct pidfs_exit_info __pei;
 	struct pidfs_exit_info *exit_info;
@@ -125,6 +133,7 @@ void pidfs_add_pid(struct pid *pid)
 
 	pid->ino = pidfs_ino_nr;
 	pid->stashed = NULL;
+	pid->attr = NULL;
 	pidfs_ino_nr++;
 
 	write_seqcount_begin(&pidmap_lock_seq);
@@ -139,6 +148,18 @@ void pidfs_remove_pid(struct pid *pid)
 	write_seqcount_end(&pidmap_lock_seq);
 }
 
+void pidfs_free_pid(struct pid *pid)
+{
+	/*
+	 * Any dentry must've been wiped from the pid by now.
+	 * Otherwise there's a reference count bug.
+	 */
+	VFS_WARN_ON_ONCE(pid->stashed);
+
+	if (!IS_ERR(pid->attr))
+		kfree(pid->attr);
+}
+
 #ifdef CONFIG_PROC_FS
 /**
  * pidfd_show_fdinfo - print information about a pidfd
@@ -261,13 +282,13 @@ static __u32 pidfs_coredump_mask(unsigned long mm_flags)
 static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct pidfd_info __user *uinfo = (struct pidfd_info __user *)arg;
-	struct inode *inode = file_inode(file);
 	struct pid *pid = pidfd_pid(file);
 	size_t usize = _IOC_SIZE(cmd);
 	struct pidfd_info kinfo = {};
 	struct pidfs_exit_info *exit_info;
 	struct user_namespace *user_ns;
 	struct task_struct *task;
+	struct pidfs_attr *attr;
 	const struct cred *c;
 	__u64 mask;
 
@@ -286,8 +307,9 @@ static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
 	if (!pid_in_current_pidns(pid))
 		return -ESRCH;
 
+	attr = READ_ONCE(pid->attr);
 	if (mask & PIDFD_INFO_EXIT) {
-		exit_info = READ_ONCE(pidfs_i(inode)->exit_info);
+		exit_info = READ_ONCE(attr->exit_info);
 		if (exit_info) {
 			kinfo.mask |= PIDFD_INFO_EXIT;
 #ifdef CONFIG_CGROUPS
@@ -300,7 +322,7 @@ static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
 
 	if (mask & PIDFD_INFO_COREDUMP) {
 		kinfo.mask |= PIDFD_INFO_COREDUMP;
-		kinfo.coredump_mask = READ_ONCE(pidfs_i(inode)->__pei.coredump_mask);
+		kinfo.coredump_mask = READ_ONCE(attr->__pei.coredump_mask);
 	}
 
 	task = get_pid_task(pid, PIDTYPE_PID);
@@ -552,41 +574,61 @@ struct pid *pidfd_pid(const struct file *file)
  * task has been reaped which cannot happen until we're out of
  * release_task().
  *
- * If this struct pid is referred to by a pidfd then
- * stashed_dentry_get() will return the dentry and inode for that struct
- * pid. Since we've taken a reference on it there's now an additional
- * reference from the exit path on it. Which is fine. We're going to put
- * it again in a second and we know that the pid is kept alive anyway.
+ * If this struct pid has at least once been referred to by a pidfd then
+ * pid->attr will be allocated. If not we mark the struct pid as dead so
+ * anyone who is trying to register it with pidfs will fail to do so.
+ * Otherwise we would hand out pidfs for reaped tasks without having
+ * exit information available.
  *
- * Worst case is that we've filled in the info and immediately free the
- * dentry and inode afterwards since the pidfd has been closed. Since
+ * Worst case is that we've filled in the info and the pid gets freed
+ * right away in free_pid() when no one holds a pidfd anymore. Since
  * pidfs_exit() currently is placed after exit_task_work() we know that
- * it cannot be us aka the exiting task holding a pidfd to ourselves.
+ * it cannot be us aka the exiting task holding a pidfd to itself.
  */
 void pidfs_exit(struct task_struct *tsk)
 {
-	struct dentry *dentry;
+	struct pid *pid = task_pid(tsk);
+	struct pidfs_attr *attr;
+	struct pidfs_exit_info *exit_info;
+#ifdef CONFIG_CGROUPS
+	struct cgroup *cgrp;
+#endif
 
 	might_sleep();
 
-	dentry = stashed_dentry_get(&task_pid(tsk)->stashed);
-	if (dentry) {
-		struct inode *inode = d_inode(dentry);
-		struct pidfs_exit_info *exit_info = &pidfs_i(inode)->__pei;
-#ifdef CONFIG_CGROUPS
-		struct cgroup *cgrp;
+	guard(spinlock_irq)(&pid->wait_pidfd.lock);
+	attr = pid->attr;
+	if (!attr) {
+		/*
+		 * No one ever held a pidfd for this struct pid.
+		 * Mark it as dead so no one can add a pidfs
+		 * entry anymore. We're about to be reaped and
+		 * so no exit information would be available.
+		 */
+		pid->attr = PIDFS_PID_DEAD;
+		return;
+	}
 
-		rcu_read_lock();
-		cgrp = task_dfl_cgroup(tsk);
-		exit_info->cgroupid = cgroup_id(cgrp);
-		rcu_read_unlock();
+	/*
+	 * If @pid->attr is set someone might still legitimately hold a
+	 * pidfd to @pid or someone might concurrently still be getting
+	 * a reference to an already stashed dentry from @pid->stashed.
+	 * So defer cleaning @pid->attr until the last reference to @pid
+	 * is put
+	 */
+
+	exit_info = &attr->__pei;
+
+#ifdef CONFIG_CGROUPS
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(tsk);
+	exit_info->cgroupid = cgroup_id(cgrp);
+	rcu_read_unlock();
 #endif
-		exit_info->exit_code = tsk->exit_code;
+	exit_info->exit_code = tsk->exit_code;
 
-		/* Ensure that PIDFD_GET_INFO sees either all or nothing. */
-		smp_store_release(&pidfs_i(inode)->exit_info, &pidfs_i(inode)->__pei);
-		dput(dentry);
-	}
+	/* Ensure that PIDFD_GET_INFO sees either all or nothing. */
+	smp_store_release(&attr->exit_info, &attr->__pei);
 }
 
 #ifdef CONFIG_COREDUMP
@@ -594,16 +636,15 @@ void pidfs_coredump(const struct coredump_params *cprm)
 {
 	struct pid *pid = cprm->pid;
 	struct pidfs_exit_info *exit_info;
-	struct dentry *dentry;
-	struct inode *inode;
+	struct pidfs_attr *attr;
 	__u32 coredump_mask = 0;
 
-	dentry = pid->stashed;
-	if (WARN_ON_ONCE(!dentry))
-		return;
+	attr = READ_ONCE(pid->attr);
 
-	inode = d_inode(dentry);
-	exit_info = &pidfs_i(inode)->__pei;
+	VFS_WARN_ON_ONCE(!attr);
+	VFS_WARN_ON_ONCE(attr == PIDFS_PID_DEAD);
+
+	exit_info = &attr->__pei;
 	/* Note how we were coredumped. */
 	coredump_mask = pidfs_coredump_mask(cprm->mm_flags);
 	/* Note that we actually did coredump. */
@@ -663,7 +704,7 @@ static struct inode *pidfs_alloc_inode(struct super_block *sb)
 
 static void pidfs_free_inode(struct inode *inode)
 {
-	kmem_cache_free(pidfs_cachep, pidfs_i(inode));
+	kfree(pidfs_i(inode));
 }
 
 static const struct super_operations pidfs_sops = {
@@ -831,8 +872,13 @@ static inline bool pidfs_pid_valid(struct pid *pid, const struct path *path,
 	 * recorded and published can be handled correctly.
 	 */
 	if (unlikely(!pid_has_task(pid, type))) {
-		struct inode *inode = d_inode(path->dentry);
-		return !!READ_ONCE(pidfs_i(inode)->exit_info);
+		struct pidfs_attr *attr;
+
+		attr = READ_ONCE(pid->attr);
+		if (!attr)
+			return false;
+		if (!READ_ONCE(attr->exit_info))
+			return false;
 	}
 
 	return true;
@@ -878,9 +924,67 @@ static void pidfs_put_data(void *data)
 	put_pid(pid);
 }
 
+/**
+ * pidfs_register_pid - register a struct pid in pidfs
+ * @pid: pid to pin
+ *
+ * Register a struct pid in pidfs. Needs to be paired with
+ * pidfs_put_pid() to not risk leaking the pidfs dentry and inode.
+ *
+ * Return: On success zero, on error a negative error code is returned.
+ */
+int pidfs_register_pid(struct pid *pid)
+{
+	struct pidfs_attr *new_attr __free(kfree) = NULL;
+	struct pidfs_attr *attr;
+
+	might_sleep();
+
+	if (!pid)
+		return 0;
+
+	attr = READ_ONCE(pid->attr);
+	if (unlikely(attr == PIDFS_PID_DEAD))
+		return PTR_ERR(PIDFS_PID_DEAD);
+	if (attr)
+		return 0;
+
+	new_attr = kmem_cache_zalloc(pidfs_attr_cachep, GFP_KERNEL);
+	if (!new_attr)
+		return -ENOMEM;
+
+	/* Synchronize with pidfs_exit(). */
+	guard(spinlock_irq)(&pid->wait_pidfd.lock);
+
+	attr = pid->attr;
+	if (unlikely(attr == PIDFS_PID_DEAD))
+		return PTR_ERR(PIDFS_PID_DEAD);
+	if (unlikely(attr))
+		return 0;
+
+	pid->attr = no_free_ptr(new_attr);
+	return 0;
+}
+
+static struct dentry *pidfs_stash_dentry(struct dentry **stashed,
+					 struct dentry *dentry)
+{
+	int ret;
+	struct pid *pid = d_inode(dentry)->i_private;
+
+	VFS_WARN_ON_ONCE(stashed != &pid->stashed);
+
+	ret = pidfs_register_pid(pid);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return stash_dentry(stashed, dentry);
+}
+
 static const struct stashed_operations pidfs_stashed_ops = {
-	.init_inode = pidfs_init_inode,
-	.put_data = pidfs_put_data,
+	.stash_dentry	= pidfs_stash_dentry,
+	.init_inode	= pidfs_init_inode,
+	.put_data	= pidfs_put_data,
 };
 
 static int pidfs_init_fs_context(struct fs_context *fc)
@@ -936,33 +1040,6 @@ struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
 	return pidfd_file;
 }
 
-/**
- * pidfs_register_pid - register a struct pid in pidfs
- * @pid: pid to pin
- *
- * Register a struct pid in pidfs. Needs to be paired with
- * pidfs_put_pid() to not risk leaking the pidfs dentry and inode.
- *
- * Return: On success zero, on error a negative error code is returned.
- */
-int pidfs_register_pid(struct pid *pid)
-{
-	struct path path __free(path_put) = {};
-	int ret;
-
-	might_sleep();
-
-	if (!pid)
-		return 0;
-
-	ret = path_from_stashed(&pid->stashed, pidfs_mnt, get_pid(pid), &path);
-	if (unlikely(ret))
-		return ret;
-	/* Keep the dentry and only put the reference to the mount. */
-	path.dentry = NULL;
-	return 0;
-}
-
 /**
  * pidfs_get_pid - pin a struct pid through pidfs
  * @pid: pid to pin
@@ -1008,6 +1085,9 @@ void __init pidfs_init(void)
 					 (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT |
 					  SLAB_ACCOUNT | SLAB_PANIC),
 					 pidfs_inode_init_once);
+	pidfs_attr_cachep = kmem_cache_create("pidfs_attr_cache", sizeof(struct pidfs_attr), 0,
+					 (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT |
+					  SLAB_ACCOUNT | SLAB_PANIC), NULL);
 	pidfs_mnt = kern_mount(&pidfs_type);
 	if (IS_ERR(pidfs_mnt))
 		panic("Failed to mount pidfs pseudo filesystem");
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 00646a692dd4..003a1027d219 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -47,6 +47,8 @@
 
 #define RESERVED_PIDS 300
 
+struct pidfs_attr;
+
 struct upid {
 	int nr;
 	struct pid_namespace *ns;
@@ -60,6 +62,7 @@ struct pid {
 		u64 ino;
 		struct rb_node pidfs_node;
 		struct dentry *stashed;
+		struct pidfs_attr *attr;
 	};
 	/* lists of tasks that use this pid */
 	struct hlist_head tasks[PIDTYPE_MAX];
diff --git a/include/linux/pidfs.h b/include/linux/pidfs.h
index 77e7db194914..8f6ed59bb3fb 100644
--- a/include/linux/pidfs.h
+++ b/include/linux/pidfs.h
@@ -16,5 +16,6 @@ extern const struct dentry_operations pidfs_dentry_operations;
 int pidfs_register_pid(struct pid *pid);
 void pidfs_get_pid(struct pid *pid);
 void pidfs_put_pid(struct pid *pid);
+void pidfs_free_pid(struct pid *pid);
 
 #endif /* _LINUX_PID_FS_H */
diff --git a/kernel/pid.c b/kernel/pid.c
index 8317bcbc7cf7..07db7d8d066c 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -100,7 +100,7 @@ void put_pid(struct pid *pid)
 
 	ns = pid->numbers[pid->level].ns;
 	if (refcount_dec_and_test(&pid->count)) {
-		WARN_ON_ONCE(pid->stashed);
+		pidfs_free_pid(pid);
 		kmem_cache_free(ns->pid_cachep, pid);
 		put_pid_ns(ns);
 	}

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 06/16] pidfs: remove unused members from struct pidfs_inode
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (4 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 05/16] pidfs: persist information Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 21:10   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 07/16] pidfs: remove custom inode allocation Christian Brauner
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

We've moved persistent information to struct pid.
So there's no need for these anymore.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/pidfs.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index 6a907457b1fe..72aac4f7b7d5 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -46,8 +46,6 @@ struct pidfs_attr {
 };
 
 struct pidfs_inode {
-	struct pidfs_exit_info __pei;
-	struct pidfs_exit_info *exit_info;
 	struct inode vfs_inode;
 };
 
@@ -696,9 +694,6 @@ static struct inode *pidfs_alloc_inode(struct super_block *sb)
 	if (!pi)
 		return NULL;
 
-	memset(&pi->__pei, 0, sizeof(pi->__pei));
-	pi->exit_info = NULL;
-
 	return &pi->vfs_inode;
 }
 

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 07/16] pidfs: remove custom inode allocation
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (5 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 06/16] pidfs: remove unused members from struct pidfs_inode Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 21:10   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 08/16] pidfs: remove pidfs_{get,put}_pid() Christian Brauner
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

We don't need it anymore as persistent information is allocated lazily
and stashed in struct pid.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/pidfs.c | 39 ---------------------------------------
 1 file changed, 39 deletions(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index 72aac4f7b7d5..c49c53d6ae51 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -27,7 +27,6 @@
 
 #define PIDFS_PID_DEAD ERR_PTR(-ESRCH)
 
-static struct kmem_cache *pidfs_cachep __ro_after_init;
 static struct kmem_cache *pidfs_attr_cachep __ro_after_init;
 
 /*
@@ -45,15 +44,6 @@ struct pidfs_attr {
 	struct pidfs_exit_info *exit_info;
 };
 
-struct pidfs_inode {
-	struct inode vfs_inode;
-};
-
-static inline struct pidfs_inode *pidfs_i(struct inode *inode)
-{
-	return container_of(inode, struct pidfs_inode, vfs_inode);
-}
-
 static struct rb_root pidfs_ino_tree = RB_ROOT;
 
 #if BITS_PER_LONG == 32
@@ -686,27 +676,9 @@ static void pidfs_evict_inode(struct inode *inode)
 	put_pid(pid);
 }
 
-static struct inode *pidfs_alloc_inode(struct super_block *sb)
-{
-	struct pidfs_inode *pi;
-
-	pi = alloc_inode_sb(sb, pidfs_cachep, GFP_KERNEL);
-	if (!pi)
-		return NULL;
-
-	return &pi->vfs_inode;
-}
-
-static void pidfs_free_inode(struct inode *inode)
-{
-	kfree(pidfs_i(inode));
-}
-
 static const struct super_operations pidfs_sops = {
-	.alloc_inode	= pidfs_alloc_inode,
 	.drop_inode	= generic_delete_inode,
 	.evict_inode	= pidfs_evict_inode,
-	.free_inode	= pidfs_free_inode,
 	.statfs		= simple_statfs,
 };
 
@@ -1067,19 +1039,8 @@ void pidfs_put_pid(struct pid *pid)
 	dput(pid->stashed);
 }
 
-static void pidfs_inode_init_once(void *data)
-{
-	struct pidfs_inode *pi = data;
-
-	inode_init_once(&pi->vfs_inode);
-}
-
 void __init pidfs_init(void)
 {
-	pidfs_cachep = kmem_cache_create("pidfs_cache", sizeof(struct pidfs_inode), 0,
-					 (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT |
-					  SLAB_ACCOUNT | SLAB_PANIC),
-					 pidfs_inode_init_once);
 	pidfs_attr_cachep = kmem_cache_create("pidfs_attr_cache", sizeof(struct pidfs_attr), 0,
 					 (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT |
 					  SLAB_ACCOUNT | SLAB_PANIC), NULL);

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 08/16] pidfs: remove pidfs_{get,put}_pid()
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (6 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 07/16] pidfs: remove custom inode allocation Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 21:11   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 09/16] pidfs: remove pidfs_pid_valid() Christian Brauner
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Now that we stash persistent information in struct pid there's no need
to play volatile games with pinning struct pid via dentries in pidfs.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/coredump.c         |  6 ------
 fs/pidfs.c            | 35 +----------------------------------
 include/linux/pidfs.h |  2 --
 net/unix/af_unix.c    |  5 -----
 4 files changed, 1 insertion(+), 47 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index f217ebf2b3b6..55d6a713a0fb 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -898,12 +898,6 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		retval = kernel_connect(socket, (struct sockaddr *)(&addr),
 					addr_len, O_NONBLOCK | SOCK_COREDUMP);
 
-		/*
-		 * ... Make sure to only put our reference after connect() took
-		 * its own reference keeping the pidfs entry alive ...
-		 */
-		pidfs_put_pid(cprm.pid);
-
 		if (retval) {
 			if (retval == -EAGAIN)
 				coredump_report_failure("Coredump socket %s receive queue full", addr.sun_path);
diff --git a/fs/pidfs.c b/fs/pidfs.c
index c49c53d6ae51..bc2342cf4492 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -895,8 +895,7 @@ static void pidfs_put_data(void *data)
  * pidfs_register_pid - register a struct pid in pidfs
  * @pid: pid to pin
  *
- * Register a struct pid in pidfs. Needs to be paired with
- * pidfs_put_pid() to not risk leaking the pidfs dentry and inode.
+ * Register a struct pid in pidfs.
  *
  * Return: On success zero, on error a negative error code is returned.
  */
@@ -1007,38 +1006,6 @@ struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
 	return pidfd_file;
 }
 
-/**
- * pidfs_get_pid - pin a struct pid through pidfs
- * @pid: pid to pin
- *
- * Similar to pidfs_register_pid() but only valid if the caller knows
- * there's a reference to the @pid through a dentry already that can't
- * go away.
- */
-void pidfs_get_pid(struct pid *pid)
-{
-	if (!pid)
-		return;
-	WARN_ON_ONCE(!stashed_dentry_get(&pid->stashed));
-}
-
-/**
- * pidfs_put_pid - drop a pidfs reference
- * @pid: pid to drop
- *
- * Drop a reference to @pid via pidfs. This is only safe if the
- * reference has been taken via pidfs_get_pid().
- */
-void pidfs_put_pid(struct pid *pid)
-{
-	might_sleep();
-
-	if (!pid)
-		return;
-	VFS_WARN_ON_ONCE(!pid->stashed);
-	dput(pid->stashed);
-}
-
 void __init pidfs_init(void)
 {
 	pidfs_attr_cachep = kmem_cache_create("pidfs_attr_cache", sizeof(struct pidfs_attr), 0,
diff --git a/include/linux/pidfs.h b/include/linux/pidfs.h
index 8f6ed59bb3fb..3e08c33da2df 100644
--- a/include/linux/pidfs.h
+++ b/include/linux/pidfs.h
@@ -14,8 +14,6 @@ void pidfs_coredump(const struct coredump_params *cprm);
 #endif
 extern const struct dentry_operations pidfs_dentry_operations;
 int pidfs_register_pid(struct pid *pid);
-void pidfs_get_pid(struct pid *pid);
-void pidfs_put_pid(struct pid *pid);
 void pidfs_free_pid(struct pid *pid);
 
 #endif /* _LINUX_PID_FS_H */
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2e2e9997a68e..129388c309b0 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -646,9 +646,6 @@ static void unix_sock_destructor(struct sock *sk)
 		return;
 	}
 
-	if (sk->sk_peer_pid)
-		pidfs_put_pid(sk->sk_peer_pid);
-
 	if (u->addr)
 		unix_release_addr(u->addr);
 
@@ -769,7 +766,6 @@ static void drop_peercred(struct unix_peercred *peercred)
 	swap(peercred->peer_pid, pid);
 	swap(peercred->peer_cred, cred);
 
-	pidfs_put_pid(pid);
 	put_pid(pid);
 	put_cred(cred);
 }
@@ -802,7 +798,6 @@ static void copy_peercred(struct sock *sk, struct sock *peersk)
 
 	spin_lock(&sk->sk_peer_lock);
 	sk->sk_peer_pid = get_pid(peersk->sk_peer_pid);
-	pidfs_get_pid(sk->sk_peer_pid);
 	sk->sk_peer_cred = get_cred(peersk->sk_peer_cred);
 	spin_unlock(&sk->sk_peer_lock);
 }

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 09/16] pidfs: remove pidfs_pid_valid()
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (7 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 08/16] pidfs: remove pidfs_{get,put}_pid() Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 21:13   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 10/16] libfs: prepare to allow for non-immutable pidfd inodes Christian Brauner
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

The validation is now completely handled in path_from_stashed().

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/pidfs.c | 53 -----------------------------------------------------
 1 file changed, 53 deletions(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index bc2342cf4492..ec375692a710 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -804,58 +804,8 @@ static int pidfs_export_permission(struct handle_to_path_ctx *ctx,
 	return 0;
 }
 
-static inline bool pidfs_pid_valid(struct pid *pid, const struct path *path,
-				   unsigned int flags)
-{
-	enum pid_type type;
-
-	if (flags & PIDFD_STALE)
-		return true;
-
-	/*
-	 * Make sure that if a pidfd is created PIDFD_INFO_EXIT
-	 * information will be available. So after an inode for the
-	 * pidfd has been allocated perform another check that the pid
-	 * is still alive. If it is exit information is available even
-	 * if the task gets reaped before the pidfd is returned to
-	 * userspace. The only exception are indicated by PIDFD_STALE:
-	 *
-	 * (1) The kernel is in the middle of task creation and thus no
-	 *     task linkage has been established yet.
-	 * (2) The caller knows @pid has been registered in pidfs at a
-	 *     time when the task was still alive.
-	 *
-	 * In both cases exit information will have been reported.
-	 */
-	if (flags & PIDFD_THREAD)
-		type = PIDTYPE_PID;
-	else
-		type = PIDTYPE_TGID;
-
-	/*
-	 * Since pidfs_exit() is called before struct pid's task linkage
-	 * is removed the case where the task got reaped but a dentry
-	 * was already attached to struct pid and exit information was
-	 * recorded and published can be handled correctly.
-	 */
-	if (unlikely(!pid_has_task(pid, type))) {
-		struct pidfs_attr *attr;
-
-		attr = READ_ONCE(pid->attr);
-		if (!attr)
-			return false;
-		if (!READ_ONCE(attr->exit_info))
-			return false;
-	}
-
-	return true;
-}
-
 static struct file *pidfs_export_open(struct path *path, unsigned int oflags)
 {
-	if (!pidfs_pid_valid(d_inode(path->dentry)->i_private, path, oflags))
-		return ERR_PTR(-ESRCH);
-
 	/*
 	 * Clear O_LARGEFILE as open_by_handle_at() forces it and raise
 	 * O_RDWR as pidfds always are.
@@ -993,9 +943,6 @@ struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
 	if (ret < 0)
 		return ERR_PTR(ret);
 
-	if (!pidfs_pid_valid(pid, &path, flags))
-		return ERR_PTR(-ESRCH);
-
 	flags &= ~PIDFD_STALE;
 	flags |= O_RDWR;
 	pidfd_file = dentry_open(&path, flags, current_cred());

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 10/16] libfs: prepare to allow for non-immutable pidfd inodes
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (8 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 09/16] pidfs: remove pidfs_pid_valid() Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 21:14   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 11/16] pidfs: make inodes mutable Christian Brauner
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Allow for S_IMMUTABLE to be stripped so that we can support xattrs.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/libfs.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 997d3a363ce6..c5b520df9032 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -2162,7 +2162,6 @@ static struct dentry *prepare_anon_dentry(struct dentry **stashed,
 
 	/* Notice when this is changed. */
 	WARN_ON_ONCE(!S_ISREG(inode->i_mode));
-	WARN_ON_ONCE(!IS_IMMUTABLE(inode));
 
 	dentry = d_alloc_anon(sb);
 	if (!dentry) {

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 11/16] pidfs: make inodes mutable
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (9 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 10/16] libfs: prepare to allow for non-immutable pidfd inodes Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 21:14   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 12/16] pidfs: support xattrs on pidfds Christian Brauner
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Prepare for allowing extended attributes to be set on pidfd inodes by
allowing them to be mutable.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/pidfs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index ec375692a710..df5bc69ea1c0 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -827,6 +827,8 @@ static int pidfs_init_inode(struct inode *inode, void *data)
 
 	inode->i_private = data;
 	inode->i_flags |= S_PRIVATE | S_ANON_INODE;
+	/* We allow to set xattrs. */
+	inode->i_flags &= ~S_IMMUTABLE;
 	inode->i_mode |= S_IRWXU;
 	inode->i_op = &pidfs_inode_operations;
 	inode->i_fop = &pidfs_file_operations;

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 12/16] pidfs: support xattrs on pidfds
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (10 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 11/16] pidfs: make inodes mutable Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 21:19   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 13/16] selftests/pidfd: test extended attribute support Christian Brauner
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Now that we have a way to persist information for pidfs dentries we can
start supporting extended attributes on pidfds. This will allow
userspace to attach meta information to tasks.

One natural extension would be to introduce a custom pidfs.* extended
attribute space and allow for the inheritance of extended attributes
across fork() and exec().

The first simple scheme will allow privileged userspace to set trusted
extended attributes on pidfs inodes.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/pidfs.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 102 insertions(+), 4 deletions(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index df5bc69ea1c0..15d99854d243 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -21,6 +21,7 @@
 #include <linux/utsname.h>
 #include <net/net_namespace.h>
 #include <linux/coredump.h>
+#include <linux/xattr.h>
 
 #include "internal.h"
 #include "mount.h"
@@ -28,6 +29,7 @@
 #define PIDFS_PID_DEAD ERR_PTR(-ESRCH)
 
 static struct kmem_cache *pidfs_attr_cachep __ro_after_init;
+static struct kmem_cache *pidfs_xattr_cachep __ro_after_init;
 
 /*
  * Stashes information that userspace needs to access even after the
@@ -40,6 +42,7 @@ struct pidfs_exit_info {
 };
 
 struct pidfs_attr {
+	struct simple_xattrs *xattrs;
 	struct pidfs_exit_info __pei;
 	struct pidfs_exit_info *exit_info;
 };
@@ -138,14 +141,27 @@ void pidfs_remove_pid(struct pid *pid)
 
 void pidfs_free_pid(struct pid *pid)
 {
+	struct pidfs_attr *attr __free(kfree) = no_free_ptr(pid->attr);
+	struct simple_xattrs *xattrs __free(kfree) = NULL;
+
 	/*
 	 * Any dentry must've been wiped from the pid by now.
 	 * Otherwise there's a reference count bug.
 	 */
 	VFS_WARN_ON_ONCE(pid->stashed);
 
-	if (!IS_ERR(pid->attr))
-		kfree(pid->attr);
+	if (IS_ERR(attr))
+		return;
+
+	/*
+	 * Any dentry must've been wiped from the pid by now. Otherwise
+	 * there's a reference count bug.
+	 */
+	VFS_WARN_ON_ONCE(pid->stashed);
+
+	xattrs = attr->xattrs;
+	if (xattrs)
+		simple_xattrs_free(attr->xattrs, NULL);
 }
 
 #ifdef CONFIG_PROC_FS
@@ -663,9 +679,24 @@ static int pidfs_getattr(struct mnt_idmap *idmap, const struct path *path,
 	return anon_inode_getattr(idmap, path, stat, request_mask, query_flags);
 }
 
+static ssize_t pidfs_listxattr(struct dentry *dentry, char *buf, size_t size)
+{
+	struct inode *inode = d_inode(dentry);
+	struct pid *pid = inode->i_private;
+	struct pidfs_attr *attr = pid->attr;
+	struct simple_xattrs *xattrs;
+
+	xattrs = READ_ONCE(attr->xattrs);
+	if (!xattrs)
+		return 0;
+
+	return simple_xattr_list(inode, xattrs, buf, size);
+}
+
 static const struct inode_operations pidfs_inode_operations = {
-	.getattr = pidfs_getattr,
-	.setattr = pidfs_setattr,
+	.getattr	= pidfs_getattr,
+	.setattr	= pidfs_setattr,
+	.listxattr	= pidfs_listxattr,
 };
 
 static void pidfs_evict_inode(struct inode *inode)
@@ -905,6 +936,67 @@ static const struct stashed_operations pidfs_stashed_ops = {
 	.put_data	= pidfs_put_data,
 };
 
+static int pidfs_xattr_get(const struct xattr_handler *handler,
+			   struct dentry *unused, struct inode *inode,
+			   const char *suffix, void *value, size_t size)
+{
+	struct pid *pid = inode->i_private;
+	struct pidfs_attr *attr = pid->attr;
+	const char *name;
+	struct simple_xattrs *xattrs;
+
+	xattrs = READ_ONCE(attr->xattrs);
+	if (!xattrs)
+		return 0;
+
+	name = xattr_full_name(handler, suffix);
+	return simple_xattr_get(xattrs, name, value, size);
+}
+
+static int pidfs_xattr_set(const struct xattr_handler *handler,
+			   struct mnt_idmap *idmap, struct dentry *unused,
+			   struct inode *inode, const char *suffix,
+			   const void *value, size_t size, int flags)
+{
+	struct pid *pid = inode->i_private;
+	struct pidfs_attr *attr = pid->attr;
+	const char *name;
+	struct simple_xattrs *xattrs;
+	struct simple_xattr *old_xattr;
+
+	/* Ensure we're the only one to set @attr->xattrs. */
+	WARN_ON_ONCE(!inode_is_locked(inode));
+
+	xattrs = READ_ONCE(attr->xattrs);
+	if (!xattrs) {
+		xattrs = kmem_cache_zalloc(pidfs_xattr_cachep, GFP_KERNEL);
+		if (!xattrs)
+			return -ENOMEM;
+
+		simple_xattrs_init(xattrs);
+		smp_store_release(&pid->attr->xattrs, xattrs);
+	}
+
+	name = xattr_full_name(handler, suffix);
+	old_xattr = simple_xattr_set(xattrs, name, value, size, flags);
+	if (IS_ERR(old_xattr))
+		return PTR_ERR(old_xattr);
+
+	simple_xattr_free(old_xattr);
+	return 0;
+}
+
+static const struct xattr_handler pidfs_trusted_xattr_handler = {
+	.prefix = XATTR_TRUSTED_PREFIX,
+	.get	= pidfs_xattr_get,
+	.set	= pidfs_xattr_set,
+};
+
+static const struct xattr_handler *const pidfs_xattr_handlers[] = {
+	&pidfs_trusted_xattr_handler,
+	NULL
+};
+
 static int pidfs_init_fs_context(struct fs_context *fc)
 {
 	struct pseudo_fs_context *ctx;
@@ -918,6 +1010,7 @@ static int pidfs_init_fs_context(struct fs_context *fc)
 	ctx->ops = &pidfs_sops;
 	ctx->eops = &pidfs_export_operations;
 	ctx->dops = &pidfs_dentry_operations;
+	ctx->xattr = pidfs_xattr_handlers;
 	fc->s_fs_info = (void *)&pidfs_stashed_ops;
 	return 0;
 }
@@ -960,6 +1053,11 @@ void __init pidfs_init(void)
 	pidfs_attr_cachep = kmem_cache_create("pidfs_attr_cache", sizeof(struct pidfs_attr), 0,
 					 (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT |
 					  SLAB_ACCOUNT | SLAB_PANIC), NULL);
+
+	pidfs_xattr_cachep = kmem_cache_create("pidfs_xattr_cache",
+					       sizeof(struct simple_xattrs), 0,
+					       SLAB_PANIC, NULL);
+
 	pidfs_mnt = kern_mount(&pidfs_type);
 	if (IS_ERR(pidfs_mnt))
 		panic("Failed to mount pidfs pseudo filesystem");

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 13/16] selftests/pidfd: test extended attribute support
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (11 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 12/16] pidfs: support xattrs on pidfds Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 21:21   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 14/16] " Christian Brauner
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Add tests for extended attribute support on pidfds.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 tools/testing/selftests/pidfd/.gitignore         |  1 +
 tools/testing/selftests/pidfd/Makefile           |  3 +-
 tools/testing/selftests/pidfd/pidfd_xattr_test.c | 97 ++++++++++++++++++++++++
 3 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/pidfd/.gitignore b/tools/testing/selftests/pidfd/.gitignore
index 0406a065deb4..bc4130506eda 100644
--- a/tools/testing/selftests/pidfd/.gitignore
+++ b/tools/testing/selftests/pidfd/.gitignore
@@ -10,3 +10,4 @@ pidfd_file_handle_test
 pidfd_bind_mount
 pidfd_info_test
 pidfd_exec_helper
+pidfd_xattr_test
diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
index fcbefc0d77f6..c9fd5023ef15 100644
--- a/tools/testing/selftests/pidfd/Makefile
+++ b/tools/testing/selftests/pidfd/Makefile
@@ -3,7 +3,8 @@ CFLAGS += -g $(KHDR_INCLUDES) -pthread -Wall
 
 TEST_GEN_PROGS := pidfd_test pidfd_fdinfo_test pidfd_open_test \
 	pidfd_poll_test pidfd_wait pidfd_getfd_test pidfd_setns_test \
-	pidfd_file_handle_test pidfd_bind_mount pidfd_info_test
+	pidfd_file_handle_test pidfd_bind_mount pidfd_info_test \
+	pidfd_xattr_test
 
 TEST_GEN_PROGS_EXTENDED := pidfd_exec_helper
 
diff --git a/tools/testing/selftests/pidfd/pidfd_xattr_test.c b/tools/testing/selftests/pidfd/pidfd_xattr_test.c
new file mode 100644
index 000000000000..00d400ac515b
--- /dev/null
+++ b/tools/testing/selftests/pidfd/pidfd_xattr_test.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <linux/types.h>
+#include <poll.h>
+#include <pthread.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syscall.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <linux/kcmp.h>
+#include <sys/stat.h>
+#include <sys/xattr.h>
+
+#include "pidfd.h"
+#include "../kselftest_harness.h"
+
+FIXTURE(pidfs_xattr)
+{
+	pid_t child_pid;
+	int child_pidfd;
+};
+
+FIXTURE_SETUP(pidfs_xattr)
+{
+	self->child_pid = create_child(&self->child_pidfd, CLONE_NEWUSER | CLONE_NEWPID);
+	EXPECT_GE(self->child_pid, 0);
+
+	if (self->child_pid == 0)
+		_exit(EXIT_SUCCESS);
+}
+
+FIXTURE_TEARDOWN(pidfs_xattr)
+{
+	sys_waitid(P_PID, self->child_pid, NULL, WEXITED);
+}
+
+TEST_F(pidfs_xattr, set_get_list_xattr_multiple)
+{
+	int ret, i;
+	char xattr_name[32];
+	char xattr_value[32];
+	char buf[32];
+	const int num_xattrs = 10;
+	char list[PATH_MAX] = {};
+
+	for (i = 0; i < num_xattrs; i++) {
+		snprintf(xattr_name, sizeof(xattr_name), "trusted.testattr%d", i);
+		snprintf(xattr_value, sizeof(xattr_value), "testvalue%d", i);
+		ret = fsetxattr(self->child_pidfd, xattr_name, xattr_value, strlen(xattr_value), 0);
+		ASSERT_EQ(ret, 0);
+	}
+
+	for (i = 0; i < num_xattrs; i++) {
+		snprintf(xattr_name, sizeof(xattr_name), "trusted.testattr%d", i);
+		snprintf(xattr_value, sizeof(xattr_value), "testvalue%d", i);
+		memset(buf, 0, sizeof(buf));
+		ret = fgetxattr(self->child_pidfd, xattr_name, buf, sizeof(buf));
+		ASSERT_EQ(ret, strlen(xattr_value));
+		ASSERT_EQ(strcmp(buf, xattr_value), 0);
+	}
+
+	ret = flistxattr(self->child_pidfd, list, sizeof(list));
+	ASSERT_GT(ret, 0);
+	for (i = 0; i < num_xattrs; i++) {
+		snprintf(xattr_name, sizeof(xattr_name), "trusted.testattr%d", i);
+		bool found = false;
+		for (char *it = list; it < list + ret; it += strlen(it) + 1) {
+			if (strcmp(it, xattr_name))
+				continue;
+			found = true;
+			break;
+		}
+		ASSERT_TRUE(found);
+	}
+
+	for (i = 0; i < num_xattrs; i++) {
+		snprintf(xattr_name, sizeof(xattr_name), "trusted.testattr%d", i);
+		ret = fremovexattr(self->child_pidfd, xattr_name);
+		ASSERT_EQ(ret, 0);
+
+		ret = fgetxattr(self->child_pidfd, xattr_name, buf, sizeof(buf));
+		ASSERT_EQ(ret, -1);
+		ASSERT_EQ(errno, ENODATA);
+	}
+}
+
+TEST_HARNESS_MAIN

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 14/16] selftests/pidfd: test extended attribute support
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (12 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 13/16] selftests/pidfd: test extended attribute support Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 21:21   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 15/16] selftests/pidfd: test setattr support Christian Brauner
  2025-06-18 20:53 ` [PATCH v2 16/16] pidfs: add some CONFIG_DEBUG_VFS asserts Christian Brauner
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Test that extended attributes are permanent.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 tools/testing/selftests/pidfd/pidfd_xattr_test.c | 35 ++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/tools/testing/selftests/pidfd/pidfd_xattr_test.c b/tools/testing/selftests/pidfd/pidfd_xattr_test.c
index 00d400ac515b..5cf7bb0e4bf2 100644
--- a/tools/testing/selftests/pidfd/pidfd_xattr_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_xattr_test.c
@@ -94,4 +94,39 @@ TEST_F(pidfs_xattr, set_get_list_xattr_multiple)
 	}
 }
 
+TEST_F(pidfs_xattr, set_get_list_xattr_persistent)
+{
+	int ret;
+	char buf[32];
+	char list[PATH_MAX] = {};
+
+	ret = fsetxattr(self->child_pidfd, "trusted.persistent", "persistent value", strlen("persistent value"), 0);
+	ASSERT_EQ(ret, 0);
+
+	memset(buf, 0, sizeof(buf));
+	ret = fgetxattr(self->child_pidfd, "trusted.persistent", buf, sizeof(buf));
+	ASSERT_EQ(ret, strlen("persistent value"));
+	ASSERT_EQ(strcmp(buf, "persistent value"), 0);
+
+	ret = flistxattr(self->child_pidfd, list, sizeof(list));
+	ASSERT_GT(ret, 0);
+	ASSERT_EQ(strcmp(list, "trusted.persistent"), 0)
+
+	ASSERT_EQ(close(self->child_pidfd), 0);
+	self->child_pidfd = -EBADF;
+	sleep(2);
+
+	self->child_pidfd = sys_pidfd_open(self->child_pid, 0);
+	ASSERT_GE(self->child_pidfd, 0);
+
+	memset(buf, 0, sizeof(buf));
+	ret = fgetxattr(self->child_pidfd, "trusted.persistent", buf, sizeof(buf));
+	ASSERT_EQ(ret, strlen("persistent value"));
+	ASSERT_EQ(strcmp(buf, "persistent value"), 0);
+
+	ret = flistxattr(self->child_pidfd, list, sizeof(list));
+	ASSERT_GT(ret, 0);
+	ASSERT_EQ(strcmp(list, "trusted.persistent"), 0);
+}
+
 TEST_HARNESS_MAIN

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 15/16] selftests/pidfd: test setattr support
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (13 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 14/16] " Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 21:22   ` Alexander Mikhalitsyn
  2025-06-18 20:53 ` [PATCH v2 16/16] pidfs: add some CONFIG_DEBUG_VFS asserts Christian Brauner
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Verify that ->setattr() on a pidfd doens't work.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 tools/testing/selftests/pidfd/.gitignore           |  1 +
 tools/testing/selftests/pidfd/Makefile             |  2 +-
 tools/testing/selftests/pidfd/pidfd_setattr_test.c | 69 ++++++++++++++++++++++
 3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/pidfd/.gitignore b/tools/testing/selftests/pidfd/.gitignore
index bc4130506eda..144e7ff65d6a 100644
--- a/tools/testing/selftests/pidfd/.gitignore
+++ b/tools/testing/selftests/pidfd/.gitignore
@@ -11,3 +11,4 @@ pidfd_bind_mount
 pidfd_info_test
 pidfd_exec_helper
 pidfd_xattr_test
+pidfd_setattr_test
diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
index c9fd5023ef15..03a6eede9c9e 100644
--- a/tools/testing/selftests/pidfd/Makefile
+++ b/tools/testing/selftests/pidfd/Makefile
@@ -4,7 +4,7 @@ CFLAGS += -g $(KHDR_INCLUDES) -pthread -Wall
 TEST_GEN_PROGS := pidfd_test pidfd_fdinfo_test pidfd_open_test \
 	pidfd_poll_test pidfd_wait pidfd_getfd_test pidfd_setns_test \
 	pidfd_file_handle_test pidfd_bind_mount pidfd_info_test \
-	pidfd_xattr_test
+	pidfd_xattr_test pidfd_setattr_test
 
 TEST_GEN_PROGS_EXTENDED := pidfd_exec_helper
 
diff --git a/tools/testing/selftests/pidfd/pidfd_setattr_test.c b/tools/testing/selftests/pidfd/pidfd_setattr_test.c
new file mode 100644
index 000000000000..d7de05edc4b3
--- /dev/null
+++ b/tools/testing/selftests/pidfd/pidfd_setattr_test.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <linux/types.h>
+#include <poll.h>
+#include <pthread.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syscall.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <linux/kcmp.h>
+#include <sys/stat.h>
+#include <sys/xattr.h>
+
+#include "pidfd.h"
+#include "../kselftest_harness.h"
+
+FIXTURE(pidfs_setattr)
+{
+	pid_t child_pid;
+	int child_pidfd;
+};
+
+FIXTURE_SETUP(pidfs_setattr)
+{
+	self->child_pid = create_child(&self->child_pidfd, CLONE_NEWUSER | CLONE_NEWPID);
+	EXPECT_GE(self->child_pid, 0);
+
+	if (self->child_pid == 0)
+		_exit(EXIT_SUCCESS);
+}
+
+FIXTURE_TEARDOWN(pidfs_setattr)
+{
+	sys_waitid(P_PID, self->child_pid, NULL, WEXITED);
+	EXPECT_EQ(close(self->child_pidfd), 0);
+}
+
+TEST_F(pidfs_setattr, no_chown)
+{
+	ASSERT_LT(fchown(self->child_pidfd, 1234, 5678), 0);
+	ASSERT_EQ(errno, EOPNOTSUPP);
+}
+
+TEST_F(pidfs_setattr, no_chmod)
+{
+	ASSERT_LT(fchmod(self->child_pidfd, 0777), 0);
+	ASSERT_EQ(errno, EOPNOTSUPP);
+}
+
+TEST_F(pidfs_setattr, no_exec)
+{
+	char *const argv[] = { NULL };
+	char *const envp[] = { NULL };
+
+	ASSERT_LT(execveat(self->child_pidfd, "", argv, envp, AT_EMPTY_PATH), 0);
+	ASSERT_EQ(errno, EACCES);
+}
+
+TEST_HARNESS_MAIN

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 16/16] pidfs: add some CONFIG_DEBUG_VFS asserts
  2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
                   ` (14 preceding siblings ...)
  2025-06-18 20:53 ` [PATCH v2 15/16] selftests/pidfd: test setattr support Christian Brauner
@ 2025-06-18 20:53 ` Christian Brauner
  2025-06-22 21:22   ` Alexander Mikhalitsyn
  15 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Allow to catch some obvious bugs.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/pidfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index 15d99854d243..1cf66fd9961e 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -809,6 +809,8 @@ static struct dentry *pidfs_fh_to_dentry(struct super_block *sb,
 	if (ret < 0)
 		return ERR_PTR(ret);
 
+	VFS_WARN_ON_ONCE(!pid->attr);
+
 	mntput(path.mnt);
 	return path.dentry;
 }
@@ -1038,6 +1040,8 @@ struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
 	if (ret < 0)
 		return ERR_PTR(ret);
 
+	VFS_WARN_ON_ONCE(!pid->attr);
+
 	flags &= ~PIDFD_STALE;
 	flags |= O_RDWR;
 	pidfd_file = dentry_open(&path, flags, current_cred());

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 01/16] pidfs: raise SB_I_NODEV and SB_I_NOEXEC
  2025-06-18 20:53 ` [PATCH v2 01/16] pidfs: raise SB_I_NODEV and SB_I_NOEXEC Christian Brauner
@ 2025-06-22 20:37   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 20:37 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:53 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> Similar to commit 1ed95281c0c7 ("anon_inode: raise SB_I_NODEV and SB_I_NOEXEC"):
> it shouldn't be possible to execute pidfds via
> execveat(fd_anon_inode, "", NULL, NULL, AT_EMPTY_PATH)
> so raise SB_I_NOEXEC so that no one gets any creative ideas.
>
> Also raise SB_I_NODEV as we don't expect or support any devices on pidfs.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/pidfs.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/pidfs.c b/fs/pidfs.c
> index c1f0a067be40..ff2560b34ed1 100644
> --- a/fs/pidfs.c
> +++ b/fs/pidfs.c
> @@ -891,6 +891,8 @@ static int pidfs_init_fs_context(struct fs_context *fc)
>         if (!ctx)
>                 return -ENOMEM;
>
> +       fc->s_iflags |= SB_I_NOEXEC;
> +       fc->s_iflags |= SB_I_NODEV;
>         ctx->ops = &pidfs_sops;
>         ctx->eops = &pidfs_export_operations;
>         ctx->dops = &pidfs_dentry_operations;
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 02/16] libfs: massage path_from_stashed() to allow custom stashing behavior
  2025-06-18 20:53 ` [PATCH v2 02/16] libfs: massage path_from_stashed() to allow custom stashing behavior Christian Brauner
@ 2025-06-22 20:40   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 20:40 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:53 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> * Add a callback to struct stashed_operations so it's possible to
>   implement custom behavior for pidfs and allow for it to return errors.
>
> * Teach stashed_dentry_get() to handle error pointers.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/internal.h |  3 +++
>  fs/libfs.c    | 27 ++++++++++++++++++++-------
>  2 files changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/fs/internal.h b/fs/internal.h
> index 393f6c5c24f6..22ba066d1dba 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -322,12 +322,15 @@ struct mnt_idmap *alloc_mnt_idmap(struct user_namespace *mnt_userns);
>  struct mnt_idmap *mnt_idmap_get(struct mnt_idmap *idmap);
>  void mnt_idmap_put(struct mnt_idmap *idmap);
>  struct stashed_operations {
> +       struct dentry *(*stash_dentry)(struct dentry **stashed,
> +                                      struct dentry *dentry);
>         void (*put_data)(void *data);
>         int (*init_inode)(struct inode *inode, void *data);
>  };
>  int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
>                       struct path *path);
>  void stashed_dentry_prune(struct dentry *dentry);
> +struct dentry *stash_dentry(struct dentry **stashed, struct dentry *dentry);
>  struct dentry *stashed_dentry_get(struct dentry **stashed);
>  /**
>   * path_mounted - check whether path is mounted
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 9ea0ecc325a8..3541e22c87b5 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -2128,6 +2128,8 @@ struct dentry *stashed_dentry_get(struct dentry **stashed)
>         dentry = rcu_dereference(*stashed);
>         if (!dentry)
>                 return NULL;
> +       if (IS_ERR(dentry))
> +               return dentry;
>         if (!lockref_get_not_dead(&dentry->d_lockref))
>                 return NULL;
>         return dentry;
> @@ -2176,8 +2178,7 @@ static struct dentry *prepare_anon_dentry(struct dentry **stashed,
>         return dentry;
>  }
>
> -static struct dentry *stash_dentry(struct dentry **stashed,
> -                                  struct dentry *dentry)
> +struct dentry *stash_dentry(struct dentry **stashed, struct dentry *dentry)
>  {
>         guard(rcu)();
>         for (;;) {
> @@ -2218,12 +2219,15 @@ static struct dentry *stash_dentry(struct dentry **stashed,
>  int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
>                       struct path *path)
>  {
> -       struct dentry *dentry;
> +       struct dentry *dentry, *res;
>         const struct stashed_operations *sops = mnt->mnt_sb->s_fs_info;
>
>         /* See if dentry can be reused. */
> -       path->dentry = stashed_dentry_get(stashed);
> -       if (path->dentry) {
> +       res = stashed_dentry_get(stashed);
> +       if (IS_ERR(res))
> +               return PTR_ERR(res);
> +       if (res) {
> +               path->dentry = res;
>                 sops->put_data(data);
>                 goto out_path;
>         }
> @@ -2234,8 +2238,17 @@ int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
>                 return PTR_ERR(dentry);
>
>         /* Added a new dentry. @data is now owned by the filesystem. */
> -       path->dentry = stash_dentry(stashed, dentry);
> -       if (path->dentry != dentry)
> +       if (sops->stash_dentry)
> +               res = sops->stash_dentry(stashed, dentry);
> +       else
> +               res = stash_dentry(stashed, dentry);
> +       if (IS_ERR(res)) {
> +               dput(dentry);
> +               return PTR_ERR(res);
> +       }
> +       path->dentry = res;
> +       /* A dentry was reused. */
> +       if (res != dentry)
>                 dput(dentry);
>
>  out_path:
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 03/16] libfs: massage path_from_stashed()
  2025-06-18 20:53 ` [PATCH v2 03/16] libfs: massage path_from_stashed() Christian Brauner
@ 2025-06-22 20:42   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 20:42 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:53 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> Make it a littler easier to follow.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/libfs.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 3541e22c87b5..997d3a363ce6 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -2227,9 +2227,8 @@ int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
>         if (IS_ERR(res))
>                 return PTR_ERR(res);
>         if (res) {
> -               path->dentry = res;
>                 sops->put_data(data);
> -               goto out_path;
> +               goto make_path;
>         }
>
>         /* Allocate a new dentry. */
> @@ -2246,15 +2245,14 @@ int path_from_stashed(struct dentry **stashed, struct vfsmount *mnt, void *data,
>                 dput(dentry);
>                 return PTR_ERR(res);
>         }
> -       path->dentry = res;
> -       /* A dentry was reused. */
>         if (res != dentry)
>                 dput(dentry);
>
> -out_path:
> -       WARN_ON_ONCE(path->dentry->d_fsdata != stashed);
> -       WARN_ON_ONCE(d_inode(path->dentry)->i_private != data);
> +make_path:
> +       path->dentry = res;
>         path->mnt = mntget(mnt);
> +       VFS_WARN_ON_ONCE(path->dentry->d_fsdata != stashed);
> +       VFS_WARN_ON_ONCE(d_inode(path->dentry)->i_private != data);
>         return 0;
>  }
>
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 04/16] pidfs: move to anonymous struct
  2025-06-18 20:53 ` [PATCH v2 04/16] pidfs: move to anonymous struct Christian Brauner
@ 2025-06-22 20:44   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 20:44 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:53 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> Move the pidfs entries to an anonymous struct.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  include/linux/pid.h | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index 453ae6d8a68d..00646a692dd4 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -52,14 +52,15 @@ struct upid {
>         struct pid_namespace *ns;
>  };
>
> -struct pid
> -{
> +struct pid {
>         refcount_t count;
>         unsigned int level;
>         spinlock_t lock;
> -       struct dentry *stashed;
> -       u64 ino;
> -       struct rb_node pidfs_node;
> +       struct {
> +               u64 ino;
> +               struct rb_node pidfs_node;
> +               struct dentry *stashed;
> +       };
>         /* lists of tasks that use this pid */
>         struct hlist_head tasks[PIDTYPE_MAX];
>         struct hlist_head inodes;
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 05/16] pidfs: persist information
  2025-06-18 20:53 ` [PATCH v2 05/16] pidfs: persist information Christian Brauner
@ 2025-06-22 21:09   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 21:09 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:54 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> Persist exit and coredump information independent of whether anyone
> currently holds a pidfd for the struct pid.
>
> The current scheme allocated pidfs dentries on-demand repeatedly.
> This scheme is reaching it's limits as it makes it impossible to pin
> information that needs to be available after the task has exited or
> coredumped and that should not be lost simply because the pidfd got
> closed temporarily. The next opener should still see the stashed
> information.
>
> This is also a prerequisite for supporting extended attributes on
> pidfds to allow attaching meta information to them.
>
> If someone opens a pidfd for a struct pid a pidfs dentry is allocated
> and stashed in pid->stashed. Once the last pidfd for the struct pid is
> closed the pidfs dentry is released and removed from pid->stashed.
>
> So if 10 callers create a pidfs dentry for the same struct pid
> sequentially, i.e., each closing the pidfd before the other creates a
> new one then a new pidfs dentry is allocated every time.
>
> Because multiple tasks acquiring and releasing a pidfd for the same
> struct pid can race with each another a task may still find a valid
> pidfs entry from the previous task in pid->stashed and reuse it. Or it
> might find a dead dentry in there and fail to reuse it and so stashes a
> new pidfs dentry. Multiple tasks may race to stash a new pidfs dentry
> but only one will succeed, the other ones will put their dentry.
>
> The current scheme aims to ensure that a pidfs dentry for a struct pid
> can only be created if the task is still alive or if a pidfs dentry
> already existed before the task was reaped and so exit information has
> been was stashed in the pidfs inode.
>
> That's great except that it's buggy. If a pidfs dentry is stashed in
> pid->stashed after pidfs_exit() but before __unhash_process() is called
> we will return a pidfd for a reaped task without exit information being
> available.
>
> The pidfds_pid_valid() check does not guard against this race as it
> doens't sync at all with pidfs_exit(). The pid_has_task() check might be
> successful simply because we're before __unhash_process() but after
> pidfs_exit().
>
> Introduce a new scheme where the lifetime of information associated with
> a pidfs entry (coredump and exit information) isn't bound to the
> lifetime of the pidfs inode but the struct pid itself.
>
> The first time a pidfs dentry is allocated for a struct pid a struct
> pidfs_attr will be allocated which will be used to store exit and
> coredump information.
>
> If all pidfs for the pidfs dentry are closed the dentry and inode can be
> cleaned up but the struct pidfs_attr will stick until the struct pid
> itself is freed. This will ensure minimal memory usage while persisting
> relevant information.
>
> The new scheme has various advantages. First, it allows to close the
> race where we end up handing out a pidfd for a reaped task for which no
> exit information is available. Second, it minimizes memory usage.
> Third, it allows to remove complex lifetime tracking via dentries when
> registering a struct pid with pidfs. There's no need to get or put a
> reference. Instead, the lifetime of exit and coredump information
> associated with a struct pid is bound to the lifetime of struct pid
> itself.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/pidfs.c            | 212 ++++++++++++++++++++++++++++++++++----------------
>  include/linux/pid.h   |   3 +
>  include/linux/pidfs.h |   1 +
>  kernel/pid.c          |   2 +-
>  4 files changed, 151 insertions(+), 67 deletions(-)
>
> diff --git a/fs/pidfs.c b/fs/pidfs.c
> index ff2560b34ed1..6a907457b1fe 100644
> --- a/fs/pidfs.c
> +++ b/fs/pidfs.c
> @@ -25,7 +25,10 @@
>  #include "internal.h"
>  #include "mount.h"
>
> +#define PIDFS_PID_DEAD ERR_PTR(-ESRCH)
> +
>  static struct kmem_cache *pidfs_cachep __ro_after_init;
> +static struct kmem_cache *pidfs_attr_cachep __ro_after_init;
>
>  /*
>   * Stashes information that userspace needs to access even after the
> @@ -37,6 +40,11 @@ struct pidfs_exit_info {
>         __u32 coredump_mask;
>  };
>
> +struct pidfs_attr {
> +       struct pidfs_exit_info __pei;
> +       struct pidfs_exit_info *exit_info;
> +};
> +
>  struct pidfs_inode {
>         struct pidfs_exit_info __pei;
>         struct pidfs_exit_info *exit_info;
> @@ -125,6 +133,7 @@ void pidfs_add_pid(struct pid *pid)
>
>         pid->ino = pidfs_ino_nr;
>         pid->stashed = NULL;
> +       pid->attr = NULL;
>         pidfs_ino_nr++;
>
>         write_seqcount_begin(&pidmap_lock_seq);
> @@ -139,6 +148,18 @@ void pidfs_remove_pid(struct pid *pid)
>         write_seqcount_end(&pidmap_lock_seq);
>  }
>
> +void pidfs_free_pid(struct pid *pid)
> +{
> +       /*
> +        * Any dentry must've been wiped from the pid by now.
> +        * Otherwise there's a reference count bug.
> +        */
> +       VFS_WARN_ON_ONCE(pid->stashed);
> +
> +       if (!IS_ERR(pid->attr))
> +               kfree(pid->attr);
> +}
> +
>  #ifdef CONFIG_PROC_FS
>  /**
>   * pidfd_show_fdinfo - print information about a pidfd
> @@ -261,13 +282,13 @@ static __u32 pidfs_coredump_mask(unsigned long mm_flags)
>  static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
>  {
>         struct pidfd_info __user *uinfo = (struct pidfd_info __user *)arg;
> -       struct inode *inode = file_inode(file);
>         struct pid *pid = pidfd_pid(file);
>         size_t usize = _IOC_SIZE(cmd);
>         struct pidfd_info kinfo = {};
>         struct pidfs_exit_info *exit_info;
>         struct user_namespace *user_ns;
>         struct task_struct *task;
> +       struct pidfs_attr *attr;
>         const struct cred *c;
>         __u64 mask;
>
> @@ -286,8 +307,9 @@ static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
>         if (!pid_in_current_pidns(pid))
>                 return -ESRCH;
>
> +       attr = READ_ONCE(pid->attr);
>         if (mask & PIDFD_INFO_EXIT) {
> -               exit_info = READ_ONCE(pidfs_i(inode)->exit_info);
> +               exit_info = READ_ONCE(attr->exit_info);
>                 if (exit_info) {
>                         kinfo.mask |= PIDFD_INFO_EXIT;
>  #ifdef CONFIG_CGROUPS
> @@ -300,7 +322,7 @@ static long pidfd_info(struct file *file, unsigned int cmd, unsigned long arg)
>
>         if (mask & PIDFD_INFO_COREDUMP) {
>                 kinfo.mask |= PIDFD_INFO_COREDUMP;
> -               kinfo.coredump_mask = READ_ONCE(pidfs_i(inode)->__pei.coredump_mask);
> +               kinfo.coredump_mask = READ_ONCE(attr->__pei.coredump_mask);
>         }
>
>         task = get_pid_task(pid, PIDTYPE_PID);
> @@ -552,41 +574,61 @@ struct pid *pidfd_pid(const struct file *file)
>   * task has been reaped which cannot happen until we're out of
>   * release_task().
>   *
> - * If this struct pid is referred to by a pidfd then
> - * stashed_dentry_get() will return the dentry and inode for that struct
> - * pid. Since we've taken a reference on it there's now an additional
> - * reference from the exit path on it. Which is fine. We're going to put
> - * it again in a second and we know that the pid is kept alive anyway.
> + * If this struct pid has at least once been referred to by a pidfd then
> + * pid->attr will be allocated. If not we mark the struct pid as dead so
> + * anyone who is trying to register it with pidfs will fail to do so.
> + * Otherwise we would hand out pidfs for reaped tasks without having
> + * exit information available.
>   *
> - * Worst case is that we've filled in the info and immediately free the
> - * dentry and inode afterwards since the pidfd has been closed. Since
> + * Worst case is that we've filled in the info and the pid gets freed
> + * right away in free_pid() when no one holds a pidfd anymore. Since
>   * pidfs_exit() currently is placed after exit_task_work() we know that
> - * it cannot be us aka the exiting task holding a pidfd to ourselves.
> + * it cannot be us aka the exiting task holding a pidfd to itself.
>   */
>  void pidfs_exit(struct task_struct *tsk)
>  {
> -       struct dentry *dentry;
> +       struct pid *pid = task_pid(tsk);
> +       struct pidfs_attr *attr;
> +       struct pidfs_exit_info *exit_info;
> +#ifdef CONFIG_CGROUPS
> +       struct cgroup *cgrp;
> +#endif
>
>         might_sleep();
>
> -       dentry = stashed_dentry_get(&task_pid(tsk)->stashed);
> -       if (dentry) {
> -               struct inode *inode = d_inode(dentry);
> -               struct pidfs_exit_info *exit_info = &pidfs_i(inode)->__pei;
> -#ifdef CONFIG_CGROUPS
> -               struct cgroup *cgrp;
> +       guard(spinlock_irq)(&pid->wait_pidfd.lock);
> +       attr = pid->attr;
> +       if (!attr) {
> +               /*
> +                * No one ever held a pidfd for this struct pid.
> +                * Mark it as dead so no one can add a pidfs
> +                * entry anymore. We're about to be reaped and
> +                * so no exit information would be available.
> +                */
> +               pid->attr = PIDFS_PID_DEAD;
> +               return;
> +       }
>
> -               rcu_read_lock();
> -               cgrp = task_dfl_cgroup(tsk);
> -               exit_info->cgroupid = cgroup_id(cgrp);
> -               rcu_read_unlock();
> +       /*
> +        * If @pid->attr is set someone might still legitimately hold a
> +        * pidfd to @pid or someone might concurrently still be getting
> +        * a reference to an already stashed dentry from @pid->stashed.
> +        * So defer cleaning @pid->attr until the last reference to @pid
> +        * is put
> +        */
> +
> +       exit_info = &attr->__pei;
> +
> +#ifdef CONFIG_CGROUPS
> +       rcu_read_lock();
> +       cgrp = task_dfl_cgroup(tsk);
> +       exit_info->cgroupid = cgroup_id(cgrp);
> +       rcu_read_unlock();
>  #endif
> -               exit_info->exit_code = tsk->exit_code;
> +       exit_info->exit_code = tsk->exit_code;
>
> -               /* Ensure that PIDFD_GET_INFO sees either all or nothing. */
> -               smp_store_release(&pidfs_i(inode)->exit_info, &pidfs_i(inode)->__pei);
> -               dput(dentry);
> -       }
> +       /* Ensure that PIDFD_GET_INFO sees either all or nothing. */
> +       smp_store_release(&attr->exit_info, &attr->__pei);
>  }
>
>  #ifdef CONFIG_COREDUMP
> @@ -594,16 +636,15 @@ void pidfs_coredump(const struct coredump_params *cprm)
>  {
>         struct pid *pid = cprm->pid;
>         struct pidfs_exit_info *exit_info;
> -       struct dentry *dentry;
> -       struct inode *inode;
> +       struct pidfs_attr *attr;
>         __u32 coredump_mask = 0;
>
> -       dentry = pid->stashed;
> -       if (WARN_ON_ONCE(!dentry))
> -               return;
> +       attr = READ_ONCE(pid->attr);
>
> -       inode = d_inode(dentry);
> -       exit_info = &pidfs_i(inode)->__pei;
> +       VFS_WARN_ON_ONCE(!attr);
> +       VFS_WARN_ON_ONCE(attr == PIDFS_PID_DEAD);
> +
> +       exit_info = &attr->__pei;
>         /* Note how we were coredumped. */
>         coredump_mask = pidfs_coredump_mask(cprm->mm_flags);
>         /* Note that we actually did coredump. */
> @@ -663,7 +704,7 @@ static struct inode *pidfs_alloc_inode(struct super_block *sb)
>
>  static void pidfs_free_inode(struct inode *inode)
>  {
> -       kmem_cache_free(pidfs_cachep, pidfs_i(inode));
> +       kfree(pidfs_i(inode));
>  }
>
>  static const struct super_operations pidfs_sops = {
> @@ -831,8 +872,13 @@ static inline bool pidfs_pid_valid(struct pid *pid, const struct path *path,
>          * recorded and published can be handled correctly.
>          */
>         if (unlikely(!pid_has_task(pid, type))) {
> -               struct inode *inode = d_inode(path->dentry);
> -               return !!READ_ONCE(pidfs_i(inode)->exit_info);
> +               struct pidfs_attr *attr;
> +
> +               attr = READ_ONCE(pid->attr);
> +               if (!attr)
> +                       return false;
> +               if (!READ_ONCE(attr->exit_info))
> +                       return false;
>         }
>
>         return true;
> @@ -878,9 +924,67 @@ static void pidfs_put_data(void *data)
>         put_pid(pid);
>  }
>
> +/**
> + * pidfs_register_pid - register a struct pid in pidfs
> + * @pid: pid to pin
> + *
> + * Register a struct pid in pidfs. Needs to be paired with
> + * pidfs_put_pid() to not risk leaking the pidfs dentry and inode.
> + *
> + * Return: On success zero, on error a negative error code is returned.
> + */
> +int pidfs_register_pid(struct pid *pid)
> +{
> +       struct pidfs_attr *new_attr __free(kfree) = NULL;
> +       struct pidfs_attr *attr;
> +
> +       might_sleep();
> +
> +       if (!pid)
> +               return 0;
> +
> +       attr = READ_ONCE(pid->attr);
> +       if (unlikely(attr == PIDFS_PID_DEAD))
> +               return PTR_ERR(PIDFS_PID_DEAD);
> +       if (attr)
> +               return 0;
> +
> +       new_attr = kmem_cache_zalloc(pidfs_attr_cachep, GFP_KERNEL);
> +       if (!new_attr)
> +               return -ENOMEM;
> +
> +       /* Synchronize with pidfs_exit(). */
> +       guard(spinlock_irq)(&pid->wait_pidfd.lock);
> +
> +       attr = pid->attr;
> +       if (unlikely(attr == PIDFS_PID_DEAD))
> +               return PTR_ERR(PIDFS_PID_DEAD);
> +       if (unlikely(attr))
> +               return 0;
> +
> +       pid->attr = no_free_ptr(new_attr);
> +       return 0;
> +}
> +
> +static struct dentry *pidfs_stash_dentry(struct dentry **stashed,
> +                                        struct dentry *dentry)
> +{
> +       int ret;
> +       struct pid *pid = d_inode(dentry)->i_private;
> +
> +       VFS_WARN_ON_ONCE(stashed != &pid->stashed);
> +
> +       ret = pidfs_register_pid(pid);
> +       if (ret)
> +               return ERR_PTR(ret);
> +
> +       return stash_dentry(stashed, dentry);
> +}
> +
>  static const struct stashed_operations pidfs_stashed_ops = {
> -       .init_inode = pidfs_init_inode,
> -       .put_data = pidfs_put_data,
> +       .stash_dentry   = pidfs_stash_dentry,
> +       .init_inode     = pidfs_init_inode,
> +       .put_data       = pidfs_put_data,
>  };
>
>  static int pidfs_init_fs_context(struct fs_context *fc)
> @@ -936,33 +1040,6 @@ struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
>         return pidfd_file;
>  }
>
> -/**
> - * pidfs_register_pid - register a struct pid in pidfs
> - * @pid: pid to pin
> - *
> - * Register a struct pid in pidfs. Needs to be paired with
> - * pidfs_put_pid() to not risk leaking the pidfs dentry and inode.
> - *
> - * Return: On success zero, on error a negative error code is returned.
> - */
> -int pidfs_register_pid(struct pid *pid)
> -{
> -       struct path path __free(path_put) = {};
> -       int ret;
> -
> -       might_sleep();
> -
> -       if (!pid)
> -               return 0;
> -
> -       ret = path_from_stashed(&pid->stashed, pidfs_mnt, get_pid(pid), &path);
> -       if (unlikely(ret))
> -               return ret;
> -       /* Keep the dentry and only put the reference to the mount. */
> -       path.dentry = NULL;
> -       return 0;
> -}
> -
>  /**
>   * pidfs_get_pid - pin a struct pid through pidfs
>   * @pid: pid to pin
> @@ -1008,6 +1085,9 @@ void __init pidfs_init(void)
>                                          (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT |
>                                           SLAB_ACCOUNT | SLAB_PANIC),
>                                          pidfs_inode_init_once);
> +       pidfs_attr_cachep = kmem_cache_create("pidfs_attr_cache", sizeof(struct pidfs_attr), 0,
> +                                        (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT |
> +                                         SLAB_ACCOUNT | SLAB_PANIC), NULL);
>         pidfs_mnt = kern_mount(&pidfs_type);
>         if (IS_ERR(pidfs_mnt))
>                 panic("Failed to mount pidfs pseudo filesystem");
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index 00646a692dd4..003a1027d219 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -47,6 +47,8 @@
>
>  #define RESERVED_PIDS 300
>
> +struct pidfs_attr;
> +
>  struct upid {
>         int nr;
>         struct pid_namespace *ns;
> @@ -60,6 +62,7 @@ struct pid {
>                 u64 ino;
>                 struct rb_node pidfs_node;
>                 struct dentry *stashed;
> +               struct pidfs_attr *attr;
>         };
>         /* lists of tasks that use this pid */
>         struct hlist_head tasks[PIDTYPE_MAX];
> diff --git a/include/linux/pidfs.h b/include/linux/pidfs.h
> index 77e7db194914..8f6ed59bb3fb 100644
> --- a/include/linux/pidfs.h
> +++ b/include/linux/pidfs.h
> @@ -16,5 +16,6 @@ extern const struct dentry_operations pidfs_dentry_operations;
>  int pidfs_register_pid(struct pid *pid);
>  void pidfs_get_pid(struct pid *pid);
>  void pidfs_put_pid(struct pid *pid);
> +void pidfs_free_pid(struct pid *pid);
>
>  #endif /* _LINUX_PID_FS_H */
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 8317bcbc7cf7..07db7d8d066c 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -100,7 +100,7 @@ void put_pid(struct pid *pid)
>
>         ns = pid->numbers[pid->level].ns;
>         if (refcount_dec_and_test(&pid->count)) {
> -               WARN_ON_ONCE(pid->stashed);
> +               pidfs_free_pid(pid);
>                 kmem_cache_free(ns->pid_cachep, pid);
>                 put_pid_ns(ns);
>         }
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 07/16] pidfs: remove custom inode allocation
  2025-06-18 20:53 ` [PATCH v2 07/16] pidfs: remove custom inode allocation Christian Brauner
@ 2025-06-22 21:10   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 21:10 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:54 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> We don't need it anymore as persistent information is allocated lazily
> and stashed in struct pid.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/pidfs.c | 39 ---------------------------------------
>  1 file changed, 39 deletions(-)
>
> diff --git a/fs/pidfs.c b/fs/pidfs.c
> index 72aac4f7b7d5..c49c53d6ae51 100644
> --- a/fs/pidfs.c
> +++ b/fs/pidfs.c
> @@ -27,7 +27,6 @@
>
>  #define PIDFS_PID_DEAD ERR_PTR(-ESRCH)
>
> -static struct kmem_cache *pidfs_cachep __ro_after_init;
>  static struct kmem_cache *pidfs_attr_cachep __ro_after_init;
>
>  /*
> @@ -45,15 +44,6 @@ struct pidfs_attr {
>         struct pidfs_exit_info *exit_info;
>  };
>
> -struct pidfs_inode {
> -       struct inode vfs_inode;
> -};
> -
> -static inline struct pidfs_inode *pidfs_i(struct inode *inode)
> -{
> -       return container_of(inode, struct pidfs_inode, vfs_inode);
> -}
> -
>  static struct rb_root pidfs_ino_tree = RB_ROOT;
>
>  #if BITS_PER_LONG == 32
> @@ -686,27 +676,9 @@ static void pidfs_evict_inode(struct inode *inode)
>         put_pid(pid);
>  }
>
> -static struct inode *pidfs_alloc_inode(struct super_block *sb)
> -{
> -       struct pidfs_inode *pi;
> -
> -       pi = alloc_inode_sb(sb, pidfs_cachep, GFP_KERNEL);
> -       if (!pi)
> -               return NULL;
> -
> -       return &pi->vfs_inode;
> -}
> -
> -static void pidfs_free_inode(struct inode *inode)
> -{
> -       kfree(pidfs_i(inode));
> -}
> -
>  static const struct super_operations pidfs_sops = {
> -       .alloc_inode    = pidfs_alloc_inode,
>         .drop_inode     = generic_delete_inode,
>         .evict_inode    = pidfs_evict_inode,
> -       .free_inode     = pidfs_free_inode,
>         .statfs         = simple_statfs,
>  };
>
> @@ -1067,19 +1039,8 @@ void pidfs_put_pid(struct pid *pid)
>         dput(pid->stashed);
>  }
>
> -static void pidfs_inode_init_once(void *data)
> -{
> -       struct pidfs_inode *pi = data;
> -
> -       inode_init_once(&pi->vfs_inode);
> -}
> -
>  void __init pidfs_init(void)
>  {
> -       pidfs_cachep = kmem_cache_create("pidfs_cache", sizeof(struct pidfs_inode), 0,
> -                                        (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT |
> -                                         SLAB_ACCOUNT | SLAB_PANIC),
> -                                        pidfs_inode_init_once);
>         pidfs_attr_cachep = kmem_cache_create("pidfs_attr_cache", sizeof(struct pidfs_attr), 0,
>                                          (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT |
>                                           SLAB_ACCOUNT | SLAB_PANIC), NULL);
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 06/16] pidfs: remove unused members from struct pidfs_inode
  2025-06-18 20:53 ` [PATCH v2 06/16] pidfs: remove unused members from struct pidfs_inode Christian Brauner
@ 2025-06-22 21:10   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 21:10 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:54 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> We've moved persistent information to struct pid.
> So there's no need for these anymore.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/pidfs.c | 5 -----
>  1 file changed, 5 deletions(-)
>
> diff --git a/fs/pidfs.c b/fs/pidfs.c
> index 6a907457b1fe..72aac4f7b7d5 100644
> --- a/fs/pidfs.c
> +++ b/fs/pidfs.c
> @@ -46,8 +46,6 @@ struct pidfs_attr {
>  };
>
>  struct pidfs_inode {
> -       struct pidfs_exit_info __pei;
> -       struct pidfs_exit_info *exit_info;
>         struct inode vfs_inode;
>  };
>
> @@ -696,9 +694,6 @@ static struct inode *pidfs_alloc_inode(struct super_block *sb)
>         if (!pi)
>                 return NULL;
>
> -       memset(&pi->__pei, 0, sizeof(pi->__pei));
> -       pi->exit_info = NULL;
> -
>         return &pi->vfs_inode;
>  }
>
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 08/16] pidfs: remove pidfs_{get,put}_pid()
  2025-06-18 20:53 ` [PATCH v2 08/16] pidfs: remove pidfs_{get,put}_pid() Christian Brauner
@ 2025-06-22 21:11   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 21:11 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:54 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> Now that we stash persistent information in struct pid there's no need
> to play volatile games with pinning struct pid via dentries in pidfs.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/coredump.c         |  6 ------
>  fs/pidfs.c            | 35 +----------------------------------
>  include/linux/pidfs.h |  2 --
>  net/unix/af_unix.c    |  5 -----
>  4 files changed, 1 insertion(+), 47 deletions(-)
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index f217ebf2b3b6..55d6a713a0fb 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -898,12 +898,6 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>                 retval = kernel_connect(socket, (struct sockaddr *)(&addr),
>                                         addr_len, O_NONBLOCK | SOCK_COREDUMP);
>
> -               /*
> -                * ... Make sure to only put our reference after connect() took
> -                * its own reference keeping the pidfs entry alive ...
> -                */
> -               pidfs_put_pid(cprm.pid);
> -
>                 if (retval) {
>                         if (retval == -EAGAIN)
>                                 coredump_report_failure("Coredump socket %s receive queue full", addr.sun_path);
> diff --git a/fs/pidfs.c b/fs/pidfs.c
> index c49c53d6ae51..bc2342cf4492 100644
> --- a/fs/pidfs.c
> +++ b/fs/pidfs.c
> @@ -895,8 +895,7 @@ static void pidfs_put_data(void *data)
>   * pidfs_register_pid - register a struct pid in pidfs
>   * @pid: pid to pin
>   *
> - * Register a struct pid in pidfs. Needs to be paired with
> - * pidfs_put_pid() to not risk leaking the pidfs dentry and inode.
> + * Register a struct pid in pidfs.
>   *
>   * Return: On success zero, on error a negative error code is returned.
>   */
> @@ -1007,38 +1006,6 @@ struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
>         return pidfd_file;
>  }
>
> -/**
> - * pidfs_get_pid - pin a struct pid through pidfs
> - * @pid: pid to pin
> - *
> - * Similar to pidfs_register_pid() but only valid if the caller knows
> - * there's a reference to the @pid through a dentry already that can't
> - * go away.
> - */
> -void pidfs_get_pid(struct pid *pid)
> -{
> -       if (!pid)
> -               return;
> -       WARN_ON_ONCE(!stashed_dentry_get(&pid->stashed));
> -}
> -
> -/**
> - * pidfs_put_pid - drop a pidfs reference
> - * @pid: pid to drop
> - *
> - * Drop a reference to @pid via pidfs. This is only safe if the
> - * reference has been taken via pidfs_get_pid().
> - */
> -void pidfs_put_pid(struct pid *pid)
> -{
> -       might_sleep();
> -
> -       if (!pid)
> -               return;
> -       VFS_WARN_ON_ONCE(!pid->stashed);
> -       dput(pid->stashed);
> -}
> -
>  void __init pidfs_init(void)
>  {
>         pidfs_attr_cachep = kmem_cache_create("pidfs_attr_cache", sizeof(struct pidfs_attr), 0,
> diff --git a/include/linux/pidfs.h b/include/linux/pidfs.h
> index 8f6ed59bb3fb..3e08c33da2df 100644
> --- a/include/linux/pidfs.h
> +++ b/include/linux/pidfs.h
> @@ -14,8 +14,6 @@ void pidfs_coredump(const struct coredump_params *cprm);
>  #endif
>  extern const struct dentry_operations pidfs_dentry_operations;
>  int pidfs_register_pid(struct pid *pid);
> -void pidfs_get_pid(struct pid *pid);
> -void pidfs_put_pid(struct pid *pid);
>  void pidfs_free_pid(struct pid *pid);
>
>  #endif /* _LINUX_PID_FS_H */
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 2e2e9997a68e..129388c309b0 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -646,9 +646,6 @@ static void unix_sock_destructor(struct sock *sk)
>                 return;
>         }
>
> -       if (sk->sk_peer_pid)
> -               pidfs_put_pid(sk->sk_peer_pid);
> -
>         if (u->addr)
>                 unix_release_addr(u->addr);
>
> @@ -769,7 +766,6 @@ static void drop_peercred(struct unix_peercred *peercred)
>         swap(peercred->peer_pid, pid);
>         swap(peercred->peer_cred, cred);
>
> -       pidfs_put_pid(pid);
>         put_pid(pid);
>         put_cred(cred);
>  }
> @@ -802,7 +798,6 @@ static void copy_peercred(struct sock *sk, struct sock *peersk)
>
>         spin_lock(&sk->sk_peer_lock);
>         sk->sk_peer_pid = get_pid(peersk->sk_peer_pid);
> -       pidfs_get_pid(sk->sk_peer_pid);
>         sk->sk_peer_cred = get_cred(peersk->sk_peer_cred);
>         spin_unlock(&sk->sk_peer_lock);
>  }
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 09/16] pidfs: remove pidfs_pid_valid()
  2025-06-18 20:53 ` [PATCH v2 09/16] pidfs: remove pidfs_pid_valid() Christian Brauner
@ 2025-06-22 21:13   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 21:13 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:54 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> The validation is now completely handled in path_from_stashed().
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/pidfs.c | 53 -----------------------------------------------------
>  1 file changed, 53 deletions(-)
>
> diff --git a/fs/pidfs.c b/fs/pidfs.c
> index bc2342cf4492..ec375692a710 100644
> --- a/fs/pidfs.c
> +++ b/fs/pidfs.c
> @@ -804,58 +804,8 @@ static int pidfs_export_permission(struct handle_to_path_ctx *ctx,
>         return 0;
>  }
>
> -static inline bool pidfs_pid_valid(struct pid *pid, const struct path *path,
> -                                  unsigned int flags)
> -{
> -       enum pid_type type;
> -
> -       if (flags & PIDFD_STALE)
> -               return true;
> -
> -       /*
> -        * Make sure that if a pidfd is created PIDFD_INFO_EXIT
> -        * information will be available. So after an inode for the
> -        * pidfd has been allocated perform another check that the pid
> -        * is still alive. If it is exit information is available even
> -        * if the task gets reaped before the pidfd is returned to
> -        * userspace. The only exception are indicated by PIDFD_STALE:
> -        *
> -        * (1) The kernel is in the middle of task creation and thus no
> -        *     task linkage has been established yet.
> -        * (2) The caller knows @pid has been registered in pidfs at a
> -        *     time when the task was still alive.
> -        *
> -        * In both cases exit information will have been reported.
> -        */
> -       if (flags & PIDFD_THREAD)
> -               type = PIDTYPE_PID;
> -       else
> -               type = PIDTYPE_TGID;
> -
> -       /*
> -        * Since pidfs_exit() is called before struct pid's task linkage
> -        * is removed the case where the task got reaped but a dentry
> -        * was already attached to struct pid and exit information was
> -        * recorded and published can be handled correctly.
> -        */
> -       if (unlikely(!pid_has_task(pid, type))) {
> -               struct pidfs_attr *attr;
> -
> -               attr = READ_ONCE(pid->attr);
> -               if (!attr)
> -                       return false;
> -               if (!READ_ONCE(attr->exit_info))
> -                       return false;
> -       }
> -
> -       return true;
> -}
> -
>  static struct file *pidfs_export_open(struct path *path, unsigned int oflags)
>  {
> -       if (!pidfs_pid_valid(d_inode(path->dentry)->i_private, path, oflags))
> -               return ERR_PTR(-ESRCH);
> -
>         /*
>          * Clear O_LARGEFILE as open_by_handle_at() forces it and raise
>          * O_RDWR as pidfds always are.
> @@ -993,9 +943,6 @@ struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
>         if (ret < 0)
>                 return ERR_PTR(ret);
>
> -       if (!pidfs_pid_valid(pid, &path, flags))
> -               return ERR_PTR(-ESRCH);
> -
>         flags &= ~PIDFD_STALE;
>         flags |= O_RDWR;
>         pidfd_file = dentry_open(&path, flags, current_cred());
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 10/16] libfs: prepare to allow for non-immutable pidfd inodes
  2025-06-18 20:53 ` [PATCH v2 10/16] libfs: prepare to allow for non-immutable pidfd inodes Christian Brauner
@ 2025-06-22 21:14   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 21:14 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:54 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> Allow for S_IMMUTABLE to be stripped so that we can support xattrs.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/libfs.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 997d3a363ce6..c5b520df9032 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -2162,7 +2162,6 @@ static struct dentry *prepare_anon_dentry(struct dentry **stashed,
>
>         /* Notice when this is changed. */
>         WARN_ON_ONCE(!S_ISREG(inode->i_mode));
> -       WARN_ON_ONCE(!IS_IMMUTABLE(inode));
>
>         dentry = d_alloc_anon(sb);
>         if (!dentry) {
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 11/16] pidfs: make inodes mutable
  2025-06-18 20:53 ` [PATCH v2 11/16] pidfs: make inodes mutable Christian Brauner
@ 2025-06-22 21:14   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 21:14 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:54 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> Prepare for allowing extended attributes to be set on pidfd inodes by
> allowing them to be mutable.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/pidfs.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/pidfs.c b/fs/pidfs.c
> index ec375692a710..df5bc69ea1c0 100644
> --- a/fs/pidfs.c
> +++ b/fs/pidfs.c
> @@ -827,6 +827,8 @@ static int pidfs_init_inode(struct inode *inode, void *data)
>
>         inode->i_private = data;
>         inode->i_flags |= S_PRIVATE | S_ANON_INODE;
> +       /* We allow to set xattrs. */
> +       inode->i_flags &= ~S_IMMUTABLE;
>         inode->i_mode |= S_IRWXU;
>         inode->i_op = &pidfs_inode_operations;
>         inode->i_fop = &pidfs_file_operations;
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 12/16] pidfs: support xattrs on pidfds
  2025-06-18 20:53 ` [PATCH v2 12/16] pidfs: support xattrs on pidfds Christian Brauner
@ 2025-06-22 21:19   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 21:19 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:54 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> Now that we have a way to persist information for pidfs dentries we can
> start supporting extended attributes on pidfds. This will allow
> userspace to attach meta information to tasks.
>
> One natural extension would be to introduce a custom pidfs.* extended
> attribute space and allow for the inheritance of extended attributes
> across fork() and exec().
>
> The first simple scheme will allow privileged userspace to set trusted
> extended attributes on pidfs inodes.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---
>  fs/pidfs.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 102 insertions(+), 4 deletions(-)
>
> diff --git a/fs/pidfs.c b/fs/pidfs.c
> index df5bc69ea1c0..15d99854d243 100644
> --- a/fs/pidfs.c
> +++ b/fs/pidfs.c
> @@ -21,6 +21,7 @@
>  #include <linux/utsname.h>
>  #include <net/net_namespace.h>
>  #include <linux/coredump.h>
> +#include <linux/xattr.h>
>
>  #include "internal.h"
>  #include "mount.h"
> @@ -28,6 +29,7 @@
>  #define PIDFS_PID_DEAD ERR_PTR(-ESRCH)
>
>  static struct kmem_cache *pidfs_attr_cachep __ro_after_init;
> +static struct kmem_cache *pidfs_xattr_cachep __ro_after_init;
>
>  /*
>   * Stashes information that userspace needs to access even after the
> @@ -40,6 +42,7 @@ struct pidfs_exit_info {
>  };
>
>  struct pidfs_attr {
> +       struct simple_xattrs *xattrs;
>         struct pidfs_exit_info __pei;
>         struct pidfs_exit_info *exit_info;
>  };
> @@ -138,14 +141,27 @@ void pidfs_remove_pid(struct pid *pid)
>
>  void pidfs_free_pid(struct pid *pid)
>  {
> +       struct pidfs_attr *attr __free(kfree) = no_free_ptr(pid->attr);
> +       struct simple_xattrs *xattrs __free(kfree) = NULL;
> +
>         /*
>          * Any dentry must've been wiped from the pid by now.
>          * Otherwise there's a reference count bug.
>          */
>         VFS_WARN_ON_ONCE(pid->stashed);
>
> -       if (!IS_ERR(pid->attr))
> -               kfree(pid->attr);
> +       if (IS_ERR(attr))
> +               return;
> +
> +       /*
> +        * Any dentry must've been wiped from the pid by now. Otherwise
> +        * there's a reference count bug.
> +        */
> +       VFS_WARN_ON_ONCE(pid->stashed);

We have (almost) the same chunk 5 lines above. ;-)

> +
> +       xattrs = attr->xattrs;
> +       if (xattrs)
> +               simple_xattrs_free(attr->xattrs, NULL);
>  }
>
>  #ifdef CONFIG_PROC_FS
> @@ -663,9 +679,24 @@ static int pidfs_getattr(struct mnt_idmap *idmap, const struct path *path,
>         return anon_inode_getattr(idmap, path, stat, request_mask, query_flags);
>  }
>
> +static ssize_t pidfs_listxattr(struct dentry *dentry, char *buf, size_t size)
> +{
> +       struct inode *inode = d_inode(dentry);
> +       struct pid *pid = inode->i_private;
> +       struct pidfs_attr *attr = pid->attr;
> +       struct simple_xattrs *xattrs;
> +
> +       xattrs = READ_ONCE(attr->xattrs);
> +       if (!xattrs)
> +               return 0;
> +
> +       return simple_xattr_list(inode, xattrs, buf, size);
> +}
> +
>  static const struct inode_operations pidfs_inode_operations = {
> -       .getattr = pidfs_getattr,
> -       .setattr = pidfs_setattr,
> +       .getattr        = pidfs_getattr,
> +       .setattr        = pidfs_setattr,
> +       .listxattr      = pidfs_listxattr,
>  };
>
>  static void pidfs_evict_inode(struct inode *inode)
> @@ -905,6 +936,67 @@ static const struct stashed_operations pidfs_stashed_ops = {
>         .put_data       = pidfs_put_data,
>  };
>
> +static int pidfs_xattr_get(const struct xattr_handler *handler,
> +                          struct dentry *unused, struct inode *inode,
> +                          const char *suffix, void *value, size_t size)
> +{
> +       struct pid *pid = inode->i_private;
> +       struct pidfs_attr *attr = pid->attr;
> +       const char *name;
> +       struct simple_xattrs *xattrs;
> +
> +       xattrs = READ_ONCE(attr->xattrs);
> +       if (!xattrs)
> +               return 0;
> +
> +       name = xattr_full_name(handler, suffix);
> +       return simple_xattr_get(xattrs, name, value, size);
> +}
> +
> +static int pidfs_xattr_set(const struct xattr_handler *handler,
> +                          struct mnt_idmap *idmap, struct dentry *unused,
> +                          struct inode *inode, const char *suffix,
> +                          const void *value, size_t size, int flags)
> +{
> +       struct pid *pid = inode->i_private;
> +       struct pidfs_attr *attr = pid->attr;
> +       const char *name;
> +       struct simple_xattrs *xattrs;
> +       struct simple_xattr *old_xattr;
> +
> +       /* Ensure we're the only one to set @attr->xattrs. */
> +       WARN_ON_ONCE(!inode_is_locked(inode));
> +
> +       xattrs = READ_ONCE(attr->xattrs);
> +       if (!xattrs) {
> +               xattrs = kmem_cache_zalloc(pidfs_xattr_cachep, GFP_KERNEL);
> +               if (!xattrs)
> +                       return -ENOMEM;
> +
> +               simple_xattrs_init(xattrs);
> +               smp_store_release(&pid->attr->xattrs, xattrs);
> +       }
> +
> +       name = xattr_full_name(handler, suffix);
> +       old_xattr = simple_xattr_set(xattrs, name, value, size, flags);
> +       if (IS_ERR(old_xattr))
> +               return PTR_ERR(old_xattr);
> +
> +       simple_xattr_free(old_xattr);
> +       return 0;
> +}
> +
> +static const struct xattr_handler pidfs_trusted_xattr_handler = {
> +       .prefix = XATTR_TRUSTED_PREFIX,
> +       .get    = pidfs_xattr_get,
> +       .set    = pidfs_xattr_set,
> +};
> +
> +static const struct xattr_handler *const pidfs_xattr_handlers[] = {
> +       &pidfs_trusted_xattr_handler,
> +       NULL
> +};
> +
>  static int pidfs_init_fs_context(struct fs_context *fc)
>  {
>         struct pseudo_fs_context *ctx;
> @@ -918,6 +1010,7 @@ static int pidfs_init_fs_context(struct fs_context *fc)
>         ctx->ops = &pidfs_sops;
>         ctx->eops = &pidfs_export_operations;
>         ctx->dops = &pidfs_dentry_operations;
> +       ctx->xattr = pidfs_xattr_handlers;
>         fc->s_fs_info = (void *)&pidfs_stashed_ops;
>         return 0;
>  }
> @@ -960,6 +1053,11 @@ void __init pidfs_init(void)
>         pidfs_attr_cachep = kmem_cache_create("pidfs_attr_cache", sizeof(struct pidfs_attr), 0,
>                                          (SLAB_HWCACHE_ALIGN | SLAB_RECLAIM_ACCOUNT |
>                                           SLAB_ACCOUNT | SLAB_PANIC), NULL);
> +
> +       pidfs_xattr_cachep = kmem_cache_create("pidfs_xattr_cache",
> +                                              sizeof(struct simple_xattrs), 0,
> +                                              SLAB_PANIC, NULL);

WDYT about adding SLAB_ACCOUNT here?

> +
>         pidfs_mnt = kern_mount(&pidfs_type);
>         if (IS_ERR(pidfs_mnt))
>                 panic("Failed to mount pidfs pseudo filesystem");
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 13/16] selftests/pidfd: test extended attribute support
  2025-06-18 20:53 ` [PATCH v2 13/16] selftests/pidfd: test extended attribute support Christian Brauner
@ 2025-06-22 21:21   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 21:21 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:54 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> Add tests for extended attribute support on pidfds.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  tools/testing/selftests/pidfd/.gitignore         |  1 +
>  tools/testing/selftests/pidfd/Makefile           |  3 +-
>  tools/testing/selftests/pidfd/pidfd_xattr_test.c | 97 ++++++++++++++++++++++++
>  3 files changed, 100 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/pidfd/.gitignore b/tools/testing/selftests/pidfd/.gitignore
> index 0406a065deb4..bc4130506eda 100644
> --- a/tools/testing/selftests/pidfd/.gitignore
> +++ b/tools/testing/selftests/pidfd/.gitignore
> @@ -10,3 +10,4 @@ pidfd_file_handle_test
>  pidfd_bind_mount
>  pidfd_info_test
>  pidfd_exec_helper
> +pidfd_xattr_test
> diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
> index fcbefc0d77f6..c9fd5023ef15 100644
> --- a/tools/testing/selftests/pidfd/Makefile
> +++ b/tools/testing/selftests/pidfd/Makefile
> @@ -3,7 +3,8 @@ CFLAGS += -g $(KHDR_INCLUDES) -pthread -Wall
>
>  TEST_GEN_PROGS := pidfd_test pidfd_fdinfo_test pidfd_open_test \
>         pidfd_poll_test pidfd_wait pidfd_getfd_test pidfd_setns_test \
> -       pidfd_file_handle_test pidfd_bind_mount pidfd_info_test
> +       pidfd_file_handle_test pidfd_bind_mount pidfd_info_test \
> +       pidfd_xattr_test
>
>  TEST_GEN_PROGS_EXTENDED := pidfd_exec_helper
>
> diff --git a/tools/testing/selftests/pidfd/pidfd_xattr_test.c b/tools/testing/selftests/pidfd/pidfd_xattr_test.c
> new file mode 100644
> index 000000000000..00d400ac515b
> --- /dev/null
> +++ b/tools/testing/selftests/pidfd/pidfd_xattr_test.c
> @@ -0,0 +1,97 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define _GNU_SOURCE
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <limits.h>
> +#include <linux/types.h>
> +#include <poll.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <syscall.h>
> +#include <sys/prctl.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +#include <sys/socket.h>
> +#include <linux/kcmp.h>
> +#include <sys/stat.h>
> +#include <sys/xattr.h>
> +
> +#include "pidfd.h"
> +#include "../kselftest_harness.h"
> +
> +FIXTURE(pidfs_xattr)
> +{
> +       pid_t child_pid;
> +       int child_pidfd;
> +};
> +
> +FIXTURE_SETUP(pidfs_xattr)
> +{
> +       self->child_pid = create_child(&self->child_pidfd, CLONE_NEWUSER | CLONE_NEWPID);
> +       EXPECT_GE(self->child_pid, 0);
> +
> +       if (self->child_pid == 0)
> +               _exit(EXIT_SUCCESS);
> +}
> +
> +FIXTURE_TEARDOWN(pidfs_xattr)
> +{
> +       sys_waitid(P_PID, self->child_pid, NULL, WEXITED);
> +}
> +
> +TEST_F(pidfs_xattr, set_get_list_xattr_multiple)
> +{
> +       int ret, i;
> +       char xattr_name[32];
> +       char xattr_value[32];
> +       char buf[32];
> +       const int num_xattrs = 10;
> +       char list[PATH_MAX] = {};
> +
> +       for (i = 0; i < num_xattrs; i++) {
> +               snprintf(xattr_name, sizeof(xattr_name), "trusted.testattr%d", i);
> +               snprintf(xattr_value, sizeof(xattr_value), "testvalue%d", i);
> +               ret = fsetxattr(self->child_pidfd, xattr_name, xattr_value, strlen(xattr_value), 0);
> +               ASSERT_EQ(ret, 0);
> +       }
> +
> +       for (i = 0; i < num_xattrs; i++) {
> +               snprintf(xattr_name, sizeof(xattr_name), "trusted.testattr%d", i);
> +               snprintf(xattr_value, sizeof(xattr_value), "testvalue%d", i);
> +               memset(buf, 0, sizeof(buf));
> +               ret = fgetxattr(self->child_pidfd, xattr_name, buf, sizeof(buf));
> +               ASSERT_EQ(ret, strlen(xattr_value));
> +               ASSERT_EQ(strcmp(buf, xattr_value), 0);
> +       }
> +
> +       ret = flistxattr(self->child_pidfd, list, sizeof(list));
> +       ASSERT_GT(ret, 0);
> +       for (i = 0; i < num_xattrs; i++) {
> +               snprintf(xattr_name, sizeof(xattr_name), "trusted.testattr%d", i);
> +               bool found = false;
> +               for (char *it = list; it < list + ret; it += strlen(it) + 1) {
> +                       if (strcmp(it, xattr_name))
> +                               continue;
> +                       found = true;
> +                       break;
> +               }
> +               ASSERT_TRUE(found);
> +       }
> +
> +       for (i = 0; i < num_xattrs; i++) {
> +               snprintf(xattr_name, sizeof(xattr_name), "trusted.testattr%d", i);
> +               ret = fremovexattr(self->child_pidfd, xattr_name);
> +               ASSERT_EQ(ret, 0);
> +
> +               ret = fgetxattr(self->child_pidfd, xattr_name, buf, sizeof(buf));
> +               ASSERT_EQ(ret, -1);
> +               ASSERT_EQ(errno, ENODATA);
> +       }
> +}
> +
> +TEST_HARNESS_MAIN
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 14/16] selftests/pidfd: test extended attribute support
  2025-06-18 20:53 ` [PATCH v2 14/16] " Christian Brauner
@ 2025-06-22 21:21   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 21:21 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:54 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> Test that extended attributes are permanent.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  tools/testing/selftests/pidfd/pidfd_xattr_test.c | 35 ++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
>
> diff --git a/tools/testing/selftests/pidfd/pidfd_xattr_test.c b/tools/testing/selftests/pidfd/pidfd_xattr_test.c
> index 00d400ac515b..5cf7bb0e4bf2 100644
> --- a/tools/testing/selftests/pidfd/pidfd_xattr_test.c
> +++ b/tools/testing/selftests/pidfd/pidfd_xattr_test.c
> @@ -94,4 +94,39 @@ TEST_F(pidfs_xattr, set_get_list_xattr_multiple)
>         }
>  }
>
> +TEST_F(pidfs_xattr, set_get_list_xattr_persistent)
> +{
> +       int ret;
> +       char buf[32];
> +       char list[PATH_MAX] = {};
> +
> +       ret = fsetxattr(self->child_pidfd, "trusted.persistent", "persistent value", strlen("persistent value"), 0);
> +       ASSERT_EQ(ret, 0);
> +
> +       memset(buf, 0, sizeof(buf));
> +       ret = fgetxattr(self->child_pidfd, "trusted.persistent", buf, sizeof(buf));
> +       ASSERT_EQ(ret, strlen("persistent value"));
> +       ASSERT_EQ(strcmp(buf, "persistent value"), 0);
> +
> +       ret = flistxattr(self->child_pidfd, list, sizeof(list));
> +       ASSERT_GT(ret, 0);
> +       ASSERT_EQ(strcmp(list, "trusted.persistent"), 0)
> +
> +       ASSERT_EQ(close(self->child_pidfd), 0);
> +       self->child_pidfd = -EBADF;
> +       sleep(2);
> +
> +       self->child_pidfd = sys_pidfd_open(self->child_pid, 0);
> +       ASSERT_GE(self->child_pidfd, 0);
> +
> +       memset(buf, 0, sizeof(buf));
> +       ret = fgetxattr(self->child_pidfd, "trusted.persistent", buf, sizeof(buf));
> +       ASSERT_EQ(ret, strlen("persistent value"));
> +       ASSERT_EQ(strcmp(buf, "persistent value"), 0);
> +
> +       ret = flistxattr(self->child_pidfd, list, sizeof(list));
> +       ASSERT_GT(ret, 0);
> +       ASSERT_EQ(strcmp(list, "trusted.persistent"), 0);
> +}
> +
>  TEST_HARNESS_MAIN
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 15/16] selftests/pidfd: test setattr support
  2025-06-18 20:53 ` [PATCH v2 15/16] selftests/pidfd: test setattr support Christian Brauner
@ 2025-06-22 21:22   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 21:22 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:54 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> Verify that ->setattr() on a pidfd doens't work.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  tools/testing/selftests/pidfd/.gitignore           |  1 +
>  tools/testing/selftests/pidfd/Makefile             |  2 +-
>  tools/testing/selftests/pidfd/pidfd_setattr_test.c | 69 ++++++++++++++++++++++
>  3 files changed, 71 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/pidfd/.gitignore b/tools/testing/selftests/pidfd/.gitignore
> index bc4130506eda..144e7ff65d6a 100644
> --- a/tools/testing/selftests/pidfd/.gitignore
> +++ b/tools/testing/selftests/pidfd/.gitignore
> @@ -11,3 +11,4 @@ pidfd_bind_mount
>  pidfd_info_test
>  pidfd_exec_helper
>  pidfd_xattr_test
> +pidfd_setattr_test
> diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
> index c9fd5023ef15..03a6eede9c9e 100644
> --- a/tools/testing/selftests/pidfd/Makefile
> +++ b/tools/testing/selftests/pidfd/Makefile
> @@ -4,7 +4,7 @@ CFLAGS += -g $(KHDR_INCLUDES) -pthread -Wall
>  TEST_GEN_PROGS := pidfd_test pidfd_fdinfo_test pidfd_open_test \
>         pidfd_poll_test pidfd_wait pidfd_getfd_test pidfd_setns_test \
>         pidfd_file_handle_test pidfd_bind_mount pidfd_info_test \
> -       pidfd_xattr_test
> +       pidfd_xattr_test pidfd_setattr_test
>
>  TEST_GEN_PROGS_EXTENDED := pidfd_exec_helper
>
> diff --git a/tools/testing/selftests/pidfd/pidfd_setattr_test.c b/tools/testing/selftests/pidfd/pidfd_setattr_test.c
> new file mode 100644
> index 000000000000..d7de05edc4b3
> --- /dev/null
> +++ b/tools/testing/selftests/pidfd/pidfd_setattr_test.c
> @@ -0,0 +1,69 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define _GNU_SOURCE
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <limits.h>
> +#include <linux/types.h>
> +#include <poll.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <signal.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <syscall.h>
> +#include <sys/prctl.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +#include <sys/socket.h>
> +#include <linux/kcmp.h>
> +#include <sys/stat.h>
> +#include <sys/xattr.h>
> +
> +#include "pidfd.h"
> +#include "../kselftest_harness.h"
> +
> +FIXTURE(pidfs_setattr)
> +{
> +       pid_t child_pid;
> +       int child_pidfd;
> +};
> +
> +FIXTURE_SETUP(pidfs_setattr)
> +{
> +       self->child_pid = create_child(&self->child_pidfd, CLONE_NEWUSER | CLONE_NEWPID);
> +       EXPECT_GE(self->child_pid, 0);
> +
> +       if (self->child_pid == 0)
> +               _exit(EXIT_SUCCESS);
> +}
> +
> +FIXTURE_TEARDOWN(pidfs_setattr)
> +{
> +       sys_waitid(P_PID, self->child_pid, NULL, WEXITED);
> +       EXPECT_EQ(close(self->child_pidfd), 0);
> +}
> +
> +TEST_F(pidfs_setattr, no_chown)
> +{
> +       ASSERT_LT(fchown(self->child_pidfd, 1234, 5678), 0);
> +       ASSERT_EQ(errno, EOPNOTSUPP);
> +}
> +
> +TEST_F(pidfs_setattr, no_chmod)
> +{
> +       ASSERT_LT(fchmod(self->child_pidfd, 0777), 0);
> +       ASSERT_EQ(errno, EOPNOTSUPP);
> +}
> +
> +TEST_F(pidfs_setattr, no_exec)
> +{
> +       char *const argv[] = { NULL };
> +       char *const envp[] = { NULL };
> +
> +       ASSERT_LT(execveat(self->child_pidfd, "", argv, envp, AT_EMPTY_PATH), 0);
> +       ASSERT_EQ(errno, EACCES);
> +}
> +
> +TEST_HARNESS_MAIN
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 16/16] pidfs: add some CONFIG_DEBUG_VFS asserts
  2025-06-18 20:53 ` [PATCH v2 16/16] pidfs: add some CONFIG_DEBUG_VFS asserts Christian Brauner
@ 2025-06-22 21:22   ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-22 21:22 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek

Am Mi., 18. Juni 2025 um 22:54 Uhr schrieb Christian Brauner
<brauner@kernel.org>:
>
> Allow to catch some obvious bugs.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/pidfs.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/pidfs.c b/fs/pidfs.c
> index 15d99854d243..1cf66fd9961e 100644
> --- a/fs/pidfs.c
> +++ b/fs/pidfs.c
> @@ -809,6 +809,8 @@ static struct dentry *pidfs_fh_to_dentry(struct super_block *sb,
>         if (ret < 0)
>                 return ERR_PTR(ret);
>
> +       VFS_WARN_ON_ONCE(!pid->attr);
> +
>         mntput(path.mnt);
>         return path.dentry;
>  }
> @@ -1038,6 +1040,8 @@ struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
>         if (ret < 0)
>                 return ERR_PTR(ret);
>
> +       VFS_WARN_ON_ONCE(!pid->attr);
> +
>         flags &= ~PIDFD_STALE;
>         flags |= O_RDWR;
>         pidfd_file = dentry_open(&path, flags, current_cred());
>
> --
> 2.47.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2025-06-22 21:23 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
2025-06-18 20:53 ` [PATCH v2 01/16] pidfs: raise SB_I_NODEV and SB_I_NOEXEC Christian Brauner
2025-06-22 20:37   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 02/16] libfs: massage path_from_stashed() to allow custom stashing behavior Christian Brauner
2025-06-22 20:40   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 03/16] libfs: massage path_from_stashed() Christian Brauner
2025-06-22 20:42   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 04/16] pidfs: move to anonymous struct Christian Brauner
2025-06-22 20:44   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 05/16] pidfs: persist information Christian Brauner
2025-06-22 21:09   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 06/16] pidfs: remove unused members from struct pidfs_inode Christian Brauner
2025-06-22 21:10   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 07/16] pidfs: remove custom inode allocation Christian Brauner
2025-06-22 21:10   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 08/16] pidfs: remove pidfs_{get,put}_pid() Christian Brauner
2025-06-22 21:11   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 09/16] pidfs: remove pidfs_pid_valid() Christian Brauner
2025-06-22 21:13   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 10/16] libfs: prepare to allow for non-immutable pidfd inodes Christian Brauner
2025-06-22 21:14   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 11/16] pidfs: make inodes mutable Christian Brauner
2025-06-22 21:14   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 12/16] pidfs: support xattrs on pidfds Christian Brauner
2025-06-22 21:19   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 13/16] selftests/pidfd: test extended attribute support Christian Brauner
2025-06-22 21:21   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 14/16] " Christian Brauner
2025-06-22 21:21   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 15/16] selftests/pidfd: test setattr support Christian Brauner
2025-06-22 21:22   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 16/16] pidfs: add some CONFIG_DEBUG_VFS asserts Christian Brauner
2025-06-22 21:22   ` Alexander Mikhalitsyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).