linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v2 00/16] pidfs: persistent info & xattrs
@ 2025-06-18 20:53 Christian Brauner
  2025-06-18 20:53 ` [PATCH v2 01/16] pidfs: raise SB_I_NODEV and SB_I_NOEXEC Christian Brauner
                   ` (15 more replies)
  0 siblings, 16 replies; 33+ messages in thread
From: Christian Brauner @ 2025-06-18 20:53 UTC
  To: linux-fsdevel
  Cc: Jann Horn, Josef Bacik, Jeff Layton, Daan De Meyer,
	Lennart Poettering, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	Christian Brauner, Alexander Mikhalitsyn

Persist exit and coredump information independent of whether anyone
currently holds a pidfd for the struct pid.

The current scheme allocates pidfs dentries on-demand repeatedly.
This scheme is reaching its limits as it makes it impossible to pin
information that needs to be available after the task has exited or
coredumped and that should not be lost simply because the pidfd got
closed temporarily. The next opener should still see the stashed
information.

This is also a prerequisite for supporting extended attributes on
pidfds to allow attaching meta information to them.

In the current scheme, if someone opens a pidfd for a struct pid, a
pidfs dentry is allocated and stashed in pid->stashed. Once the last
pidfd for the struct pid is closed, the pidfs dentry is released and
removed from pid->stashed.

So if 10 callers create a pidfs dentry for the same struct pid
sequentially, i.e., each closing its pidfd before the next one creates
a new one, then a new pidfs dentry is allocated every time.

Because multiple tasks acquiring and releasing a pidfd for the same
struct pid can race with one another, a task may still find a valid
pidfs dentry from the previous task in pid->stashed and reuse it. Or it
might find a dead dentry in there, fail to reuse it, and so stash a
new pidfs dentry. Multiple tasks may race to stash a new pidfs dentry
but only one will succeed; the others will put their dentry.
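
To make the "only one will succeed" rule concrete, here is a minimal
userspace sketch of that race using C11 atomics. It is a model only and
not the kernel code: struct toy_dentry, struct toy_pid, toy_dentry_new(),
toy_stash() and toy_unstash() are all hypothetical names, and free()
merely stands in for putting a dentry reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's struct dentry or struct pid. */
struct toy_dentry {
	int id;
};

struct toy_pid {
	_Atomic(struct toy_dentry *) stashed;
};

/* Allocate a fresh toy dentry, as each racing opener would. */
static struct toy_dentry *toy_dentry_new(int id)
{
	struct toy_dentry *d = malloc(sizeof(*d));

	assert(d != NULL);
	d->id = id;
	return d;
}

/*
 * Try to publish @fresh in pid->stashed. If another task won the race,
 * drop ("put") our dentry and return the winner's dentry instead.
 */
static struct toy_dentry *toy_stash(struct toy_pid *pid,
				    struct toy_dentry *fresh)
{
	struct toy_dentry *expected = NULL;

	if (atomic_compare_exchange_strong(&pid->stashed, &expected, fresh))
		return fresh;	/* we won: our dentry is now stashed */

	free(fresh);		/* we lost: put our own dentry */
	return expected;	/* reuse what the winner stashed */
}

/* Last pidfd closed: remove the dentry from pid->stashed and release it. */
static void toy_unstash(struct toy_pid *pid)
{
	free(atomic_exchange(&pid->stashed, NULL));
}
```

Calling toy_stash() twice with two fresh dentries returns the first one
both times. After toy_unstash() a later caller has to allocate and stash
again, which is the repeated on-demand allocation described above.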

The current scheme aims to ensure that a pidfs dentry for a struct pid
can only be created if the task is still alive or if a pidfs dentry
already existed before the task was reaped and so exit information has
been stashed in the pidfs inode.

That's great except that it's buggy. If a pidfs dentry is stashed in
pid->stashed after pidfs_exit() but before __unhash_process() is called
we will return a pidfd for a reaped task without exit information being
available.

The pidfs_pid_valid() check does not guard against this race as it
doesn't sync at all with pidfs_exit(). The pid_has_task() check might be
successful simply because we're before __unhash_process() but after
pidfs_exit().

Introduce a new scheme where the lifetime of information associated with
a pidfs entry (coredump and exit information) isn't bound to the
lifetime of the pidfs inode but to the struct pid itself.

The first time a pidfs dentry is allocated for a struct pid, a struct
pidfs_attr will be allocated which will be used to store exit and
coredump information.

If all pidfds for the pidfs dentry are closed, the dentry and inode can
be cleaned up but the struct pidfs_attr will stick around until the
struct pid itself is freed. This will ensure minimal memory usage while
persisting relevant information.
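
The lifetime rule can be illustrated with a small userspace model. This
is only a sketch under stated assumptions: struct toy_attr, struct
toy_pid and the toy_*() helpers are hypothetical names, the fields are
invented for illustration, and none of this mirrors the actual layout of
struct pidfs_attr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model; field names are illustrative, not the kernel's. */
struct toy_attr {
	bool exited;
	int exit_code;
};

struct toy_pid {
	struct toy_attr *attr;	/* lives as long as the toy_pid itself */
	bool has_dentry;	/* stands in for the pidfs dentry and inode */
};

/* Opening a pidfd: the attr is allocated only the first time. */
static int toy_pidfd_open(struct toy_pid *pid)
{
	if (!pid->attr) {
		pid->attr = calloc(1, sizeof(*pid->attr));
		if (!pid->attr)
			return -1;
	}
	pid->has_dentry = true;
	return 0;
}

/* Closing the last pidfd: dentry and inode go away, the attr stays. */
static void toy_pidfd_close_last(struct toy_pid *pid)
{
	pid->has_dentry = false;
}

/* Task exit: record exit info if an attr was ever allocated (assumption). */
static void toy_exit(struct toy_pid *pid, int code)
{
	if (pid->attr) {
		pid->attr->exited = true;
		pid->attr->exit_code = code;
	}
}

/* Freeing the pid is the only place where the attr is freed. */
static void toy_pid_free(struct toy_pid *pid)
{
	free(pid->attr);
	pid->attr = NULL;
}
```

In this model a sequence of open, close, exit, open still sees the exit
code on the second open, because the attr outlives the dentry and is
only released by toy_pid_free().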

The new scheme has various advantages. First, it closes the race where
we end up handing out a pidfd for a reaped task for which no exit
information is available. Second, it minimizes memory usage. Third, it
removes the need for complex lifetime tracking via dentries when
registering a struct pid with pidfs. There's no need to get or put a
reference. Instead, the lifetime of exit and coredump information
associated with a struct pid is bound to the lifetime of struct pid
itself.

Now that we have a way to persist information for pidfs dentries we can
start supporting extended attributes on pidfds. This will allow
userspace to attach meta information to tasks.

One natural extension would be to introduce a custom pidfs.* extended
attribute space and allow for the inheritance of extended attributes
across fork() and exec().

The first simple scheme will allow privileged userspace to set trusted
extended attributes on pidfs inodes.
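
From userspace this could look roughly like the sketch below, which
opens a pidfd and round-trips one attribute through fsetxattr() and
fgetxattr(). It is an assumption-laden example rather than the series'
selftest: pidfd_xattr_roundtrip() and sys_pidfd_open() are hypothetical
helper names, any attribute name passed in is made up by the caller, and
the calls are expected to fail without sufficient privilege or on
kernels that lack this series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Thin wrapper, since older libcs do not provide pidfd_open(). */
static int sys_pidfd_open(pid_t pid, unsigned int flags)
{
	return (int)syscall(SYS_pidfd_open, pid, flags);
}

/*
 * Set an xattr on a pidfd for @pid and read it back.
 * Returns 0 on success and a negative errno value on failure.
 */
static int pidfd_xattr_roundtrip(pid_t pid, const char *name,
				 const char *value)
{
	char buf[64];
	ssize_t len;
	int pidfd, ret = 0;

	pidfd = sys_pidfd_open(pid, 0);
	if (pidfd < 0)
		return -errno;

	if (fsetxattr(pidfd, name, value, strlen(value), 0) < 0) {
		ret = -errno;
		goto out;
	}

	len = fgetxattr(pidfd, name, buf, sizeof(buf));
	if (len < 0)
		ret = -errno;
	else if ((size_t)len != strlen(value) ||
		 memcmp(buf, value, (size_t)len) != 0)
		ret = -EIO;	/* value read back does not match */
out:
	close(pidfd);
	return ret;
}
```

A privileged caller might invoke it as
pidfd_xattr_roundtrip(getpid(), "trusted.example", "hello") and treat
any negative return value as "not supported or not permitted here".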

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Christian Brauner (16):
      pidfs: raise SB_I_NODEV and SB_I_NOEXEC
      libfs: massage path_from_stashed() to allow custom stashing behavior
      libfs: massage path_from_stashed()
      pidfs: move to anonymous struct
      pidfs: persist information
      pidfs: remove unused members from struct pidfs_inode
      pidfs: remove custom inode allocation
      pidfs: remove pidfs_{get,put}_pid()
      pidfs: remove pidfs_pid_valid()
      libfs: prepare to allow for non-immutable pidfd inodes
      pidfs: make inodes mutable
      pidfs: support xattrs on pidfds
      selftests/pidfd: test extended attribute support
      selftests/pidfd: test extended attribute support
      selftests/pidfd: test setattr support
      pidfs: add some CONFIG_DEBUG_VFS asserts

 fs/coredump.c                                      |   6 -
 fs/internal.h                                      |   3 +
 fs/libfs.c                                         |  34 +-
 fs/pidfs.c                                         | 422 ++++++++++++---------
 include/linux/pid.h                                |  14 +-
 include/linux/pidfs.h                              |   3 +-
 kernel/pid.c                                       |   2 +-
 net/unix/af_unix.c                                 |   5 -
 tools/testing/selftests/pidfd/.gitignore           |   2 +
 tools/testing/selftests/pidfd/Makefile             |   3 +-
 tools/testing/selftests/pidfd/pidfd_setattr_test.c |  69 ++++
 tools/testing/selftests/pidfd/pidfd_xattr_test.c   | 132 +++++++
 12 files changed, 480 insertions(+), 215 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250618-work-pidfs-persistent-251fe3cf5c3b


Thread overview: 33+ messages
2025-06-18 20:53 [PATCH v2 00/16] pidfs: persistent info & xattrs Christian Brauner
2025-06-18 20:53 ` [PATCH v2 01/16] pidfs: raise SB_I_NODEV and SB_I_NOEXEC Christian Brauner
2025-06-22 20:37   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 02/16] libfs: massage path_from_stashed() to allow custom stashing behavior Christian Brauner
2025-06-22 20:40   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 03/16] libfs: massage path_from_stashed() Christian Brauner
2025-06-22 20:42   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 04/16] pidfs: move to anonymous struct Christian Brauner
2025-06-22 20:44   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 05/16] pidfs: persist information Christian Brauner
2025-06-22 21:09   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 06/16] pidfs: remove unused members from struct pidfs_inode Christian Brauner
2025-06-22 21:10   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 07/16] pidfs: remove custom inode allocation Christian Brauner
2025-06-22 21:10   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 08/16] pidfs: remove pidfs_{get,put}_pid() Christian Brauner
2025-06-22 21:11   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 09/16] pidfs: remove pidfs_pid_valid() Christian Brauner
2025-06-22 21:13   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 10/16] libfs: prepare to allow for non-immutable pidfd inodes Christian Brauner
2025-06-22 21:14   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 11/16] pidfs: make inodes mutable Christian Brauner
2025-06-22 21:14   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 12/16] pidfs: support xattrs on pidfds Christian Brauner
2025-06-22 21:19   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 13/16] selftests/pidfd: test extended attribute support Christian Brauner
2025-06-22 21:21   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 14/16] selftests/pidfd: test extended attribute support Christian Brauner
2025-06-22 21:21   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 15/16] selftests/pidfd: test setattr support Christian Brauner
2025-06-22 21:22   ` Alexander Mikhalitsyn
2025-06-18 20:53 ` [PATCH v2 16/16] pidfs: add some CONFIG_DEBUG_VFS asserts Christian Brauner
2025-06-22 21:22   ` Alexander Mikhalitsyn
