linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] coredump: hand a pidfd to the usermode coredump helper
@ 2025-04-14 13:55 Christian Brauner
  2025-04-14 13:55 ` [PATCH v2 1/3] pidfs: move O_RDWR into pidfs_alloc_file() Christian Brauner
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Christian Brauner @ 2025-04-14 13:55 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Oleg Nesterov, Luca Boccassi, Lennart Poettering, Daan De Meyer,
	Mike Yuan, Zbigniew Jędrzejewski-Szmek, linux-kernel,
	Christian Brauner

Give userspace a way to instruct the kernel to install a pidfd for the
crashing process into the process started as a usermode helper. There's
still tricky race-windows that cannot be easily or sometimes not closed
at all by userspace. There's various ways like looking at the start time
of a process to make sure that the usermode helper process is started
after the crashing process but it's all very very brittle and fraught
with peril.

The crashed-but-not-reaped process can be killed by userspace before
coredump processing programs like systemd-coredump have had time to
manually open a PIDFD from the PID the kernel provides them, which means
they can be tricked into reading from an arbitrary process, and they run
with full privileges as they are usermode helper processes.

Even if that specific race-window wouldn't exist it's still the safest
and cleanest way to let the kernel provide the pidfd directly instead of
requiring userspace to do it manually. In parallel with this commit we
already have systemd adding support for this in [1].

When the usermode helper process is forked we install a pidfd file
descriptor three into the usermode helper's file descriptor table so
it's available to the exec'd program.

Since usermode helpers are either children of the system_unbound_wq
workqueue or kthreadd we know that the file descriptor table is empty
and can thus always use three as the file descriptor number.

Note, that we'll install a pidfd for the thread-group leader even if a
subthread is calling do_coredump(). We know that task linkage hasn't
been removed yet and even if this @current isn't the actual thread-group
leader we know that the thread-group leader cannot be reaped until
@current has exited.

[1]: https://github.com/systemd/systemd/pull/37125

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Changes in v2:
- Store a pid in struct coredump_params instead of a file.
- Link to v1: https://lore.kernel.org/20250414-work-coredump-v1-0-6caebc807ff4@kernel.org

---
Christian Brauner (3):
      pidfs: move O_RDWR into pidfs_alloc_file()
      coredump: fix error handling for replace_fd()
      coredump: hand a pidfd to the usermode coredump helper

 fs/coredump.c            | 68 +++++++++++++++++++++++++++++++++++++++++++-----
 fs/pidfs.c               |  1 +
 include/linux/coredump.h |  1 +
 kernel/fork.c            |  2 +-
 4 files changed, 65 insertions(+), 7 deletions(-)
---
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
change-id: 20250413-work-coredump-0f7fa7e6414c


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/3] pidfs: move O_RDWR into pidfs_alloc_file()
  2025-04-14 13:55 [PATCH v2 0/3] coredump: hand a pidfd to the usermode coredump helper Christian Brauner
@ 2025-04-14 13:55 ` Christian Brauner
  2025-04-14 13:55 ` [PATCH v2 2/3] coredump: fix error handling for replace_fd() Christian Brauner
  2025-04-14 13:55 ` [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper Christian Brauner
  2 siblings, 0 replies; 13+ messages in thread
From: Christian Brauner @ 2025-04-14 13:55 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Oleg Nesterov, Luca Boccassi, Lennart Poettering, Daan De Meyer,
	Mike Yuan, Zbigniew Jędrzejewski-Szmek, linux-kernel,
	Christian Brauner

Since all pidfds must be O_RDWR currently enfore that directly in the
file allocation function itself instead of letting callers specify it.

Tested-by: Luca Boccassi <luca.boccassi@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/pidfs.c    | 1 +
 kernel/fork.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index d64a4cbeb0da..50e69a9e104a 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -888,6 +888,7 @@ struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
 		return ERR_PTR(-ESRCH);
 
 	flags &= ~PIDFD_CLONE;
+	flags |= O_RDWR;
 	pidfd_file = dentry_open(&path, flags, current_cred());
 	/* Raise PIDFD_THREAD explicitly as do_dentry_open() strips it. */
 	if (!IS_ERR(pidfd_file))
diff --git a/kernel/fork.c b/kernel/fork.c
index c4b26cd8998b..d184e51196a2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2071,7 +2071,7 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
 	if (pidfd < 0)
 		return pidfd;
 
-	pidfd_file = pidfs_alloc_file(pid, flags | O_RDWR);
+	pidfd_file = pidfs_alloc_file(pid, flags);
 	if (IS_ERR(pidfd_file))
 		return PTR_ERR(pidfd_file);
 

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/3] coredump: fix error handling for replace_fd()
  2025-04-14 13:55 [PATCH v2 0/3] coredump: hand a pidfd to the usermode coredump helper Christian Brauner
  2025-04-14 13:55 ` [PATCH v2 1/3] pidfs: move O_RDWR into pidfs_alloc_file() Christian Brauner
@ 2025-04-14 13:55 ` Christian Brauner
  2025-04-14 13:55 ` [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper Christian Brauner
  2 siblings, 0 replies; 13+ messages in thread
From: Christian Brauner @ 2025-04-14 13:55 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Oleg Nesterov, Luca Boccassi, Lennart Poettering, Daan De Meyer,
	Mike Yuan, Zbigniew Jędrzejewski-Szmek, linux-kernel,
	Christian Brauner

The replace_fd() helper returns the file descriptor number on success
and a negative error code on failure. The current error handling in
umh_pipe_setup() only works because the file descriptor that is replaced
is zero but that's pretty volatile. Explicitly check for a negative
error code.

Tested-by: Luca Boccassi <luca.boccassi@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/coredump.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index c33c177a701b..9da592aa8f16 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -507,7 +507,9 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 {
 	struct file *files[2];
 	struct coredump_params *cp = (struct coredump_params *)info->data;
-	int err = create_pipe_files(files, 0);
+	int err;
+
+	err = create_pipe_files(files, 0);
 	if (err)
 		return err;
 
@@ -515,10 +517,13 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 
 	err = replace_fd(0, files[0], 0);
 	fput(files[0]);
+	if (err < 0)
+		return err;
+
 	/* and disallow core files too */
 	current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1};
 
-	return err;
+	return 0;
 }
 
 void do_coredump(const kernel_siginfo_t *siginfo)

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper
  2025-04-14 13:55 [PATCH v2 0/3] coredump: hand a pidfd to the usermode coredump helper Christian Brauner
  2025-04-14 13:55 ` [PATCH v2 1/3] pidfs: move O_RDWR into pidfs_alloc_file() Christian Brauner
  2025-04-14 13:55 ` [PATCH v2 2/3] coredump: fix error handling for replace_fd() Christian Brauner
@ 2025-04-14 13:55 ` Christian Brauner
  2025-04-14 14:14   ` Oleg Nesterov
  2025-04-25 11:31   ` Benjamin Drung
  2 siblings, 2 replies; 13+ messages in thread
From: Christian Brauner @ 2025-04-14 13:55 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Oleg Nesterov, Luca Boccassi, Lennart Poettering, Daan De Meyer,
	Mike Yuan, Zbigniew Jędrzejewski-Szmek, linux-kernel,
	Christian Brauner

Give userspace a way to instruct the kernel to install a pidfd into the
usermode helper process. This makes coredump handling a lot more
reliable for userspace. In parallel with this commit we already have
systemd adding support for this in [1].

We create a pidfs file for the coredumping process when we process the
corename pattern. When the usermode helper process is forked we then
install the pidfs file as file descriptor three into the usermode
helpers file descriptor table so it's available to the exec'd program.

Since usermode helpers are either children of the system_unbound_wq
workqueue or kthreadd we know that the file descriptor table is empty
and can thus always use three as the file descriptor number.

Note, that we'll install a pidfd for the thread-group leader even if a
subthread is calling do_coredump(). We know that task linkage hasn't
been removed due to delay_group_leader() and even if this @current isn't
the actual thread-group leader we know that the thread-group leader
cannot be reaped until @current has exited.

Link: https://github.com/systemd/systemd/pull/37125 [1]
Tested-by: Luca Boccassi <luca.boccassi@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/coredump.c            | 59 ++++++++++++++++++++++++++++++++++++++++++++----
 include/linux/coredump.h |  1 +
 2 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 9da592aa8f16..403be0ff780e 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -43,6 +43,9 @@
 #include <linux/timekeeping.h>
 #include <linux/sysctl.h>
 #include <linux/elf.h>
+#include <linux/pidfs.h>
+#include <uapi/linux/pidfd.h>
+#include <linux/vfsdebug.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -60,6 +63,12 @@ static void free_vma_snapshot(struct coredump_params *cprm);
 #define CORE_FILE_NOTE_SIZE_DEFAULT (4*1024*1024)
 /* Define a reasonable max cap */
 #define CORE_FILE_NOTE_SIZE_MAX (16*1024*1024)
+/*
+ * File descriptor number for the pidfd for the thread-group leader of
+ * the coredumping task installed into the usermode helper's file
+ * descriptor table.
+ */
+#define COREDUMP_PIDFD_NUMBER 3
 
 static int core_uses_pid;
 static unsigned int core_pipe_limit;
@@ -339,6 +348,27 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
 			case 'C':
 				err = cn_printf(cn, "%d", cprm->cpu);
 				break;
+			/* pidfd number */
+			case 'F': {
+				/*
+				 * Installing a pidfd only makes sense if
+				 * we actually spawn a usermode helper.
+				 */
+				if (!ispipe)
+					break;
+
+				/*
+				 * Note that we'll install a pidfd for the
+				 * thread-group leader. We know that task
+				 * linkage hasn't been removed yet and even if
+				 * this @current isn't the actual thread-group
+				 * leader we know that the thread-group leader
+				 * cannot be reaped until @current has exited.
+				 */
+				cprm->pid = task_tgid(current);
+				err = cn_printf(cn, "%d", COREDUMP_PIDFD_NUMBER);
+				break;
+			}
 			default:
 				break;
 			}
@@ -493,7 +523,7 @@ static void wait_for_dump_helpers(struct file *file)
 }
 
 /*
- * umh_pipe_setup
+ * umh_coredump_setup
  * helper function to customize the process used
  * to collect the core in userspace.  Specifically
  * it sets up a pipe and installs it as fd 0 (stdin)
@@ -503,12 +533,33 @@ static void wait_for_dump_helpers(struct file *file)
  * is a special value that we use to trap recursive
  * core dumps
  */
-static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
+static int umh_coredump_setup(struct subprocess_info *info, struct cred *new)
 {
 	struct file *files[2];
 	struct coredump_params *cp = (struct coredump_params *)info->data;
 	int err;
 
+	if (cp->pid) {
+		struct file *pidfs_file __free(fput) = NULL;
+
+		pidfs_file = pidfs_alloc_file(cp->pid, 0);
+		if (IS_ERR(pidfs_file))
+			return PTR_ERR(pidfs_file);
+
+		/*
+		 * Usermode helpers are childen of either
+		 * system_unbound_wq or of kthreadd. So we know that
+		 * we're starting off with a clean file descriptor
+		 * table. So we should always be able to use
+		 * COREDUMP_PIDFD_NUMBER as our file descriptor value.
+		 */
+		VFS_WARN_ON_ONCE((pidfs_file = fget_raw(COREDUMP_PIDFD_NUMBER)) != NULL);
+
+		err = replace_fd(COREDUMP_PIDFD_NUMBER, pidfs_file, 0);
+		if (err < 0)
+			return err;
+	}
+
 	err = create_pipe_files(files, 0);
 	if (err)
 		return err;
@@ -598,7 +649,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		}
 
 		if (cprm.limit == 1) {
-			/* See umh_pipe_setup() which sets RLIMIT_CORE = 1.
+			/* See umh_coredump_setup() which sets RLIMIT_CORE = 1.
 			 *
 			 * Normally core limits are irrelevant to pipes, since
 			 * we're not writing to the file system, but we use
@@ -637,7 +688,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		retval = -ENOMEM;
 		sub_info = call_usermodehelper_setup(helper_argv[0],
 						helper_argv, NULL, GFP_KERNEL,
-						umh_pipe_setup, NULL, &cprm);
+						umh_coredump_setup, NULL, &cprm);
 		if (sub_info)
 			retval = call_usermodehelper_exec(sub_info,
 							  UMH_WAIT_EXEC);
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index 77e6e195d1d6..76e41805b92d 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -28,6 +28,7 @@ struct coredump_params {
 	int vma_count;
 	size_t vma_data_size;
 	struct core_vma_metadata *vma_meta;
+	struct pid *pid;
 };
 
 extern unsigned int core_file_note_size_limit;

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper
  2025-04-14 13:55 ` [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper Christian Brauner
@ 2025-04-14 14:14   ` Oleg Nesterov
  2025-04-14 14:26     ` Christian Brauner
  2025-04-14 14:28     ` Oleg Nesterov
  2025-04-25 11:31   ` Benjamin Drung
  1 sibling, 2 replies; 13+ messages in thread
From: Oleg Nesterov @ 2025-04-14 14:14 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Luca Boccassi, Lennart Poettering, Daan De Meyer,
	Mike Yuan, Zbigniew Jędrzejewski-Szmek, linux-kernel

On 04/14, Christian Brauner wrote:
>
> -static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> +static int umh_coredump_setup(struct subprocess_info *info, struct cred *new)
>  {
>  	struct file *files[2];
>  	struct coredump_params *cp = (struct coredump_params *)info->data;
>  	int err;
>
> +	if (cp->pid) {
> +		struct file *pidfs_file __free(fput) = NULL;
> +
> +		pidfs_file = pidfs_alloc_file(cp->pid, 0);
> +		if (IS_ERR(pidfs_file))
> +			return PTR_ERR(pidfs_file);
> +
> +		/*
> +		 * Usermode helpers are childen of either
> +		 * system_unbound_wq or of kthreadd. So we know that
> +		 * we're starting off with a clean file descriptor
> +		 * table. So we should always be able to use
> +		 * COREDUMP_PIDFD_NUMBER as our file descriptor value.
> +		 */
> +		VFS_WARN_ON_ONCE((pidfs_file = fget_raw(COREDUMP_PIDFD_NUMBER)) != NULL);
> +
> +		err = replace_fd(COREDUMP_PIDFD_NUMBER, pidfs_file, 0);
> +		if (err < 0)
> +			return err;

Yes, but if replace_fd() succeeds we need to nullify pidfs_file
to avoid fput from __free(fput) ?

And I think in this case __free(fput) doesn't buy too much, but
up to you.

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper
  2025-04-14 14:14   ` Oleg Nesterov
@ 2025-04-14 14:26     ` Christian Brauner
  2025-04-14 14:28     ` Oleg Nesterov
  1 sibling, 0 replies; 13+ messages in thread
From: Christian Brauner @ 2025-04-14 14:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-fsdevel, Luca Boccassi, Lennart Poettering, Daan De Meyer,
	Mike Yuan, Zbigniew Jędrzejewski-Szmek, linux-kernel

On Mon, Apr 14, 2025 at 04:14:50PM +0200, Oleg Nesterov wrote:
> On 04/14, Christian Brauner wrote:
> >
> > -static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> > +static int umh_coredump_setup(struct subprocess_info *info, struct cred *new)
> >  {
> >  	struct file *files[2];
> >  	struct coredump_params *cp = (struct coredump_params *)info->data;
> >  	int err;
> >
> > +	if (cp->pid) {
> > +		struct file *pidfs_file __free(fput) = NULL;
> > +
> > +		pidfs_file = pidfs_alloc_file(cp->pid, 0);
> > +		if (IS_ERR(pidfs_file))
> > +			return PTR_ERR(pidfs_file);
> > +
> > +		/*
> > +		 * Usermode helpers are childen of either
> > +		 * system_unbound_wq or of kthreadd. So we know that
> > +		 * we're starting off with a clean file descriptor
> > +		 * table. So we should always be able to use
> > +		 * COREDUMP_PIDFD_NUMBER as our file descriptor value.
> > +		 */
> > +		VFS_WARN_ON_ONCE((pidfs_file = fget_raw(COREDUMP_PIDFD_NUMBER)) != NULL);
> > +
> > +		err = replace_fd(COREDUMP_PIDFD_NUMBER, pidfs_file, 0);
> > +		if (err < 0)
> > +			return err;
> 
> Yes, but if replace_fd() succeeds we need to nullify pidfs_file
> to avoid fput from __free(fput) ?

No, since replace_fd() takes its own reference via do_dup2():

replace_fd()
-> do_dup2()
   {
	get_file(file)
	rcu_assign_pointer(fdt->fd[fd], file);
   }

so we always need to call it. I had a comment about this in the previous
patchset so people don't get confused. I can add it back.

Let me know if you're happy with this otherwise.

> 
> And I think in this case __free(fput) doesn't buy too much, but
> up to you.
> 
> Oleg.
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper
  2025-04-14 14:14   ` Oleg Nesterov
  2025-04-14 14:26     ` Christian Brauner
@ 2025-04-14 14:28     ` Oleg Nesterov
  2025-04-14 14:41       ` Christian Brauner
  1 sibling, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2025-04-14 14:28 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Luca Boccassi, Lennart Poettering, Daan De Meyer,
	Mike Yuan, Zbigniew Jędrzejewski-Szmek, linux-kernel

On 04/14, Oleg Nesterov wrote:
>
> On 04/14, Christian Brauner wrote:
> >
> > -static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> > +static int umh_coredump_setup(struct subprocess_info *info, struct cred *new)
> >  {
> >  	struct file *files[2];
> >  	struct coredump_params *cp = (struct coredump_params *)info->data;
> >  	int err;
> >
> > +	if (cp->pid) {
> > +		struct file *pidfs_file __free(fput) = NULL;
> > +
> > +		pidfs_file = pidfs_alloc_file(cp->pid, 0);
> > +		if (IS_ERR(pidfs_file))
> > +			return PTR_ERR(pidfs_file);
> > +
> > +		/*
> > +		 * Usermode helpers are childen of either
> > +		 * system_unbound_wq or of kthreadd. So we know that
> > +		 * we're starting off with a clean file descriptor
> > +		 * table. So we should always be able to use
> > +		 * COREDUMP_PIDFD_NUMBER as our file descriptor value.
> > +		 */
> > +		VFS_WARN_ON_ONCE((pidfs_file = fget_raw(COREDUMP_PIDFD_NUMBER)) != NULL);
> > +
> > +		err = replace_fd(COREDUMP_PIDFD_NUMBER, pidfs_file, 0);
> > +		if (err < 0)
> > +			return err;
>
> Yes, but if replace_fd() succeeds we need to nullify pidfs_file
> to avoid fput from __free(fput) ?

Aah, please ignore me ;) replace_fd/do_dup2 does get_file() .

For this series:

Reviewed-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper
  2025-04-14 14:28     ` Oleg Nesterov
@ 2025-04-14 14:41       ` Christian Brauner
  0 siblings, 0 replies; 13+ messages in thread
From: Christian Brauner @ 2025-04-14 14:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-fsdevel, Luca Boccassi, Lennart Poettering, Daan De Meyer,
	Mike Yuan, Zbigniew Jędrzejewski-Szmek, linux-kernel

On Mon, Apr 14, 2025 at 04:28:07PM +0200, Oleg Nesterov wrote:
> On 04/14, Oleg Nesterov wrote:
> >
> > On 04/14, Christian Brauner wrote:
> > >
> > > -static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> > > +static int umh_coredump_setup(struct subprocess_info *info, struct cred *new)
> > >  {
> > >  	struct file *files[2];
> > >  	struct coredump_params *cp = (struct coredump_params *)info->data;
> > >  	int err;
> > >
> > > +	if (cp->pid) {
> > > +		struct file *pidfs_file __free(fput) = NULL;
> > > +
> > > +		pidfs_file = pidfs_alloc_file(cp->pid, 0);
> > > +		if (IS_ERR(pidfs_file))
> > > +			return PTR_ERR(pidfs_file);
> > > +
> > > +		/*
> > > +		 * Usermode helpers are childen of either
> > > +		 * system_unbound_wq or of kthreadd. So we know that
> > > +		 * we're starting off with a clean file descriptor
> > > +		 * table. So we should always be able to use
> > > +		 * COREDUMP_PIDFD_NUMBER as our file descriptor value.
> > > +		 */
> > > +		VFS_WARN_ON_ONCE((pidfs_file = fget_raw(COREDUMP_PIDFD_NUMBER)) != NULL);
> > > +
> > > +		err = replace_fd(COREDUMP_PIDFD_NUMBER, pidfs_file, 0);
> > > +		if (err < 0)
> > > +			return err;
> >
> > Yes, but if replace_fd() succeeds we need to nullify pidfs_file
> > to avoid fput from __free(fput) ?
> 
> Aah, please ignore me ;) replace_fd/do_dup2 does get_file() .
> 
> For this series:

Thanks for the excellent review as usual!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper
  2025-04-14 13:55 ` [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper Christian Brauner
  2025-04-14 14:14   ` Oleg Nesterov
@ 2025-04-25 11:31   ` Benjamin Drung
  2025-04-25 11:57     ` Christian Brauner
  1 sibling, 1 reply; 13+ messages in thread
From: Benjamin Drung @ 2025-04-25 11:31 UTC (permalink / raw)
  To: Christian Brauner, linux-fsdevel
  Cc: Oleg Nesterov, Luca Boccassi, Lennart Poettering, Daan De Meyer,
	Mike Yuan, Zbigniew Jędrzejewski-Szmek, linux-kernel

Hi,

On Mon, 2025-04-14 at 15:55 +0200, Christian Brauner wrote:
> Give userspace a way to instruct the kernel to install a pidfd into the
> usermode helper process. This makes coredump handling a lot more
> reliable for userspace. In parallel with this commit we already have
> systemd adding support for this in [1].
> 
> We create a pidfs file for the coredumping process when we process the
> corename pattern. When the usermode helper process is forked we then
> install the pidfs file as file descriptor three into the usermode
> helpers file descriptor table so it's available to the exec'd program.
> 
> Since usermode helpers are either children of the system_unbound_wq
> workqueue or kthreadd we know that the file descriptor table is empty
> and can thus always use three as the file descriptor number.
> 
> Note, that we'll install a pidfd for the thread-group leader even if a
> subthread is calling do_coredump(). We know that task linkage hasn't
> been removed due to delay_group_leader() and even if this @current isn't
> the actual thread-group leader we know that the thread-group leader
> cannot be reaped until @current has exited.
> 
> Link: https://github.com/systemd/systemd/pull/37125 [1]
> Tested-by: Luca Boccassi <luca.boccassi@gmail.com>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---
>  fs/coredump.c            | 59 ++++++++++++++++++++++++++++++++++++++++++++----
>  include/linux/coredump.h |  1 +
>  2 files changed, 56 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 9da592aa8f16..403be0ff780e 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -43,6 +43,9 @@
>  #include <linux/timekeeping.h>
>  #include <linux/sysctl.h>
>  #include <linux/elf.h>
> +#include <linux/pidfs.h>
> +#include <uapi/linux/pidfd.h>
> +#include <linux/vfsdebug.h>
>  
>  #include <linux/uaccess.h>
>  #include <asm/mmu_context.h>
> @@ -60,6 +63,12 @@ static void free_vma_snapshot(struct coredump_params *cprm);
>  #define CORE_FILE_NOTE_SIZE_DEFAULT (4*1024*1024)
>  /* Define a reasonable max cap */
>  #define CORE_FILE_NOTE_SIZE_MAX (16*1024*1024)
> +/*
> + * File descriptor number for the pidfd for the thread-group leader of
> + * the coredumping task installed into the usermode helper's file
> + * descriptor table.
> + */
> +#define COREDUMP_PIDFD_NUMBER 3
>  
>  static int core_uses_pid;
>  static unsigned int core_pipe_limit;
> @@ -339,6 +348,27 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
>  			case 'C':
>  				err = cn_printf(cn, "%d", cprm->cpu);
>  				break;
> +			/* pidfd number */
> +			case 'F': {
> +				/*
> +				 * Installing a pidfd only makes sense if
> +				 * we actually spawn a usermode helper.
> +				 */
> +				if (!ispipe)
> +					break;
> +
> +				/*
> +				 * Note that we'll install a pidfd for the
> +				 * thread-group leader. We know that task
> +				 * linkage hasn't been removed yet and even if
> +				 * this @current isn't the actual thread-group
> +				 * leader we know that the thread-group leader
> +				 * cannot be reaped until @current has exited.
> +				 */
> +				cprm->pid = task_tgid(current);
> +				err = cn_printf(cn, "%d", COREDUMP_PIDFD_NUMBER);
> +				break;
> +			}
>  			default:
>  				break;
>  			}
> 

I tried this change with Apport: I took the Ubuntu mainline kernel build
https://kernel.ubuntu.com/mainline/daily/2025-04-24/ (that refers to
mainline commit e54f9b0410347c49b7ffdd495578811e70d7a407) and applied
these three patches on top. Then I modified Apport to take the
additional `-F%F` and tested that on Ubuntu 25.04 (plucky). The result
is the coredump failed as long as there was `-F%F` on
/proc/sys/kernel/core_pattern:

```
coredump: 7392(divide-by-zero): |/usr/share/apport/apport pipe failed
```

Did I do something wrong? Do I miss additional patches?

-- 
Benjamin Drung
Debian & Ubuntu Developer

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper
  2025-04-25 11:31   ` Benjamin Drung
@ 2025-04-25 11:57     ` Christian Brauner
  2025-04-25 12:03       ` Benjamin Drung
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2025-04-25 11:57 UTC (permalink / raw)
  To: Benjamin Drung
  Cc: linux-fsdevel, Oleg Nesterov, Luca Boccassi, Lennart Poettering,
	Daan De Meyer, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	linux-kernel

On Fri, Apr 25, 2025 at 01:31:56PM +0200, Benjamin Drung wrote:
> Hi,
> 
> On Mon, 2025-04-14 at 15:55 +0200, Christian Brauner wrote:
> > Give userspace a way to instruct the kernel to install a pidfd into the
> > usermode helper process. This makes coredump handling a lot more
> > reliable for userspace. In parallel with this commit we already have
> > systemd adding support for this in [1].
> > 
> > We create a pidfs file for the coredumping process when we process the
> > corename pattern. When the usermode helper process is forked we then
> > install the pidfs file as file descriptor three into the usermode
> > helpers file descriptor table so it's available to the exec'd program.
> > 
> > Since usermode helpers are either children of the system_unbound_wq
> > workqueue or kthreadd we know that the file descriptor table is empty
> > and can thus always use three as the file descriptor number.
> > 
> > Note, that we'll install a pidfd for the thread-group leader even if a
> > subthread is calling do_coredump(). We know that task linkage hasn't
> > been removed due to delay_group_leader() and even if this @current isn't
> > the actual thread-group leader we know that the thread-group leader
> > cannot be reaped until @current has exited.
> > 
> > Link: https://github.com/systemd/systemd/pull/37125 [1]
> > Tested-by: Luca Boccassi <luca.boccassi@gmail.com>
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > ---
> >  fs/coredump.c            | 59 ++++++++++++++++++++++++++++++++++++++++++++----
> >  include/linux/coredump.h |  1 +
> >  2 files changed, 56 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index 9da592aa8f16..403be0ff780e 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -43,6 +43,9 @@
> >  #include <linux/timekeeping.h>
> >  #include <linux/sysctl.h>
> >  #include <linux/elf.h>
> > +#include <linux/pidfs.h>
> > +#include <uapi/linux/pidfd.h>
> > +#include <linux/vfsdebug.h>
> >  
> >  #include <linux/uaccess.h>
> >  #include <asm/mmu_context.h>
> > @@ -60,6 +63,12 @@ static void free_vma_snapshot(struct coredump_params *cprm);
> >  #define CORE_FILE_NOTE_SIZE_DEFAULT (4*1024*1024)
> >  /* Define a reasonable max cap */
> >  #define CORE_FILE_NOTE_SIZE_MAX (16*1024*1024)
> > +/*
> > + * File descriptor number for the pidfd for the thread-group leader of
> > + * the coredumping task installed into the usermode helper's file
> > + * descriptor table.
> > + */
> > +#define COREDUMP_PIDFD_NUMBER 3
> >  
> >  static int core_uses_pid;
> >  static unsigned int core_pipe_limit;
> > @@ -339,6 +348,27 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
> >  			case 'C':
> >  				err = cn_printf(cn, "%d", cprm->cpu);
> >  				break;
> > +			/* pidfd number */
> > +			case 'F': {
> > +				/*
> > +				 * Installing a pidfd only makes sense if
> > +				 * we actually spawn a usermode helper.
> > +				 */
> > +				if (!ispipe)
> > +					break;
> > +
> > +				/*
> > +				 * Note that we'll install a pidfd for the
> > +				 * thread-group leader. We know that task
> > +				 * linkage hasn't been removed yet and even if
> > +				 * this @current isn't the actual thread-group
> > +				 * leader we know that the thread-group leader
> > +				 * cannot be reaped until @current has exited.
> > +				 */
> > +				cprm->pid = task_tgid(current);
> > +				err = cn_printf(cn, "%d", COREDUMP_PIDFD_NUMBER);
> > +				break;
> > +			}
> >  			default:
> >  				break;
> >  			}
> > 
> 
> I tried this change with Apport: I took the Ubuntu mainline kernel build
> https://kernel.ubuntu.com/mainline/daily/2025-04-24/ (that refers to
> mainline commit e54f9b0410347c49b7ffdd495578811e70d7a407) and applied
> these three patches on top. Then I modified Apport to take the
> additional `-F%F` and tested that on Ubuntu 25.04 (plucky). The result
> is the coredump failed as long as there was `-F%F` on

I have no clue what -F%F is and whether that leading -F is something
specific to Apport but the specifier is %F not -F%F. For example:

        > cat /proc/sys/kernel/core_pattern
        |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %F

And note that this requires the pipe logic to be used, aka "|" needs to
be specified. Without it this doesn't make sense.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper
  2025-04-25 11:57     ` Christian Brauner
@ 2025-04-25 12:03       ` Benjamin Drung
  2025-04-25 16:49         ` Christian Brauner
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Drung @ 2025-04-25 12:03 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Oleg Nesterov, Luca Boccassi, Lennart Poettering,
	Daan De Meyer, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	linux-kernel

On Fri, 2025-04-25 at 13:57 +0200, Christian Brauner wrote:
> On Fri, Apr 25, 2025 at 01:31:56PM +0200, Benjamin Drung wrote:
> > Hi,
> > 
> > On Mon, 2025-04-14 at 15:55 +0200, Christian Brauner wrote:
> > > Give userspace a way to instruct the kernel to install a pidfd into the
> > > usermode helper process. This makes coredump handling a lot more
> > > reliable for userspace. In parallel with this commit we already have
> > > systemd adding support for this in [1].
> > > 
> > > We create a pidfs file for the coredumping process when we process the
> > > corename pattern. When the usermode helper process is forked we then
> > > install the pidfs file as file descriptor three into the usermode
> > > helpers file descriptor table so it's available to the exec'd program.
> > > 
> > > Since usermode helpers are either children of the system_unbound_wq
> > > workqueue or kthreadd we know that the file descriptor table is empty
> > > and can thus always use three as the file descriptor number.
> > > 
> > > Note, that we'll install a pidfd for the thread-group leader even if a
> > > subthread is calling do_coredump(). We know that task linkage hasn't
> > > been removed due to delay_group_leader() and even if this @current isn't
> > > the actual thread-group leader we know that the thread-group leader
> > > cannot be reaped until @current has exited.
> > > 
> > > Link: https://github.com/systemd/systemd/pull/37125 [1]
> > > Tested-by: Luca Boccassi <luca.boccassi@gmail.com>
> > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > ---
> > >  fs/coredump.c            | 59 ++++++++++++++++++++++++++++++++++++++++++++----
> > >  include/linux/coredump.h |  1 +
> > >  2 files changed, 56 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/fs/coredump.c b/fs/coredump.c
> > > index 9da592aa8f16..403be0ff780e 100644
> > > --- a/fs/coredump.c
> > > +++ b/fs/coredump.c
> > > @@ -43,6 +43,9 @@
> > >  #include <linux/timekeeping.h>
> > >  #include <linux/sysctl.h>
> > >  #include <linux/elf.h>
> > > +#include <linux/pidfs.h>
> > > +#include <uapi/linux/pidfd.h>
> > > +#include <linux/vfsdebug.h>
> > >  
> > >  #include <linux/uaccess.h>
> > >  #include <asm/mmu_context.h>
> > > @@ -60,6 +63,12 @@ static void free_vma_snapshot(struct coredump_params *cprm);
> > >  #define CORE_FILE_NOTE_SIZE_DEFAULT (4*1024*1024)
> > >  /* Define a reasonable max cap */
> > >  #define CORE_FILE_NOTE_SIZE_MAX (16*1024*1024)
> > > +/*
> > > + * File descriptor number for the pidfd for the thread-group leader of
> > > + * the coredumping task installed into the usermode helper's file
> > > + * descriptor table.
> > > + */
> > > +#define COREDUMP_PIDFD_NUMBER 3
> > >  
> > >  static int core_uses_pid;
> > >  static unsigned int core_pipe_limit;
> > > @@ -339,6 +348,27 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
> > >  			case 'C':
> > >  				err = cn_printf(cn, "%d", cprm->cpu);
> > >  				break;
> > > +			/* pidfd number */
> > > +			case 'F': {
> > > +				/*
> > > +				 * Installing a pidfd only makes sense if
> > > +				 * we actually spawn a usermode helper.
> > > +				 */
> > > +				if (!ispipe)
> > > +					break;
> > > +
> > > +				/*
> > > +				 * Note that we'll install a pidfd for the
> > > +				 * thread-group leader. We know that task
> > > +				 * linkage hasn't been removed yet and even if
> > > +				 * this @current isn't the actual thread-group
> > > +				 * leader we know that the thread-group leader
> > > +				 * cannot be reaped until @current has exited.
> > > +				 */
> > > +				cprm->pid = task_tgid(current);
> > > +				err = cn_printf(cn, "%d", COREDUMP_PIDFD_NUMBER);
> > > +				break;
> > > +			}
> > >  			default:
> > >  				break;
> > >  			}
> > > 
> > 
> > I tried this change with Apport: I took the Ubuntu mainline kernel build
> > https://kernel.ubuntu.com/mainline/daily/2025-04-24/ (that refers to
> > mainline commit e54f9b0410347c49b7ffdd495578811e70d7a407) and applied
> > these three patches on top. Then I modified Apport to take the
> > additional `-F%F` and tested that on Ubuntu 25.04 (plucky). The result
> > is the coredump failed as long as there was `-F%F` on
> 
> I have no clue what -F%F is and whether that leading -F is something
> specific to Apport but the specifier is %F not -F%F. For example:
> 
>         > cat /proc/sys/kernel/core_pattern
>         |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %F
> 
> And note that this requires the pipe logic to be used, aka "|" needs to
> be specified. Without it this doesn't make sense.

Apport takes short option parameters. They match the kernel template
specifiers. The failing pattern:

$ cat /proc/sys/kernel/core_pattern 
|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -F%F -- %E

Once I drop %F Apport is called without issues:

$ cat /proc/sys/kernel/core_pattern 
|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E

-- 
Benjamin Drung
Debian & Ubuntu Developer

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper
  2025-04-25 12:03       ` Benjamin Drung
@ 2025-04-25 16:49         ` Christian Brauner
  2025-04-30 11:39           ` Benjamin Drung
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2025-04-25 16:49 UTC (permalink / raw)
  To: Benjamin Drung
  Cc: linux-fsdevel, Oleg Nesterov, Luca Boccassi, Lennart Poettering,
	Daan De Meyer, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	linux-kernel

On Fri, Apr 25, 2025 at 02:03:34PM +0200, Benjamin Drung wrote:
> On Fri, 2025-04-25 at 13:57 +0200, Christian Brauner wrote:
> > On Fri, Apr 25, 2025 at 01:31:56PM +0200, Benjamin Drung wrote:
> > > Hi,
> > > 
> > > On Mon, 2025-04-14 at 15:55 +0200, Christian Brauner wrote:
> > > > Give userspace a way to instruct the kernel to install a pidfd into the
> > > > usermode helper process. This makes coredump handling a lot more
> > > > reliable for userspace. In parallel with this commit we already have
> > > > systemd adding support for this in [1].
> > > > 
> > > > We create a pidfs file for the coredumping process when we process the
> > > > corename pattern. When the usermode helper process is forked we then
> > > > install the pidfs file as file descriptor three into the usermode
> > > > helpers file descriptor table so it's available to the exec'd program.
> > > > 
> > > > Since usermode helpers are either children of the system_unbound_wq
> > > > workqueue or kthreadd we know that the file descriptor table is empty
> > > > and can thus always use three as the file descriptor number.
> > > > 
> > > > Note, that we'll install a pidfd for the thread-group leader even if a
> > > > subthread is calling do_coredump(). We know that task linkage hasn't
> > > > been removed due to delay_group_leader() and even if this @current isn't
> > > > the actual thread-group leader we know that the thread-group leader
> > > > cannot be reaped until @current has exited.
> > > > 
> > > > Link: https://github.com/systemd/systemd/pull/37125 [1]
> > > > Tested-by: Luca Boccassi <luca.boccassi@gmail.com>
> > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > > ---
> > > >  fs/coredump.c            | 59 ++++++++++++++++++++++++++++++++++++++++++++----
> > > >  include/linux/coredump.h |  1 +
> > > >  2 files changed, 56 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/fs/coredump.c b/fs/coredump.c
> > > > index 9da592aa8f16..403be0ff780e 100644
> > > > --- a/fs/coredump.c
> > > > +++ b/fs/coredump.c
> > > > @@ -43,6 +43,9 @@
> > > >  #include <linux/timekeeping.h>
> > > >  #include <linux/sysctl.h>
> > > >  #include <linux/elf.h>
> > > > +#include <linux/pidfs.h>
> > > > +#include <uapi/linux/pidfd.h>
> > > > +#include <linux/vfsdebug.h>
> > > >  
> > > >  #include <linux/uaccess.h>
> > > >  #include <asm/mmu_context.h>
> > > > @@ -60,6 +63,12 @@ static void free_vma_snapshot(struct coredump_params *cprm);
> > > >  #define CORE_FILE_NOTE_SIZE_DEFAULT (4*1024*1024)
> > > >  /* Define a reasonable max cap */
> > > >  #define CORE_FILE_NOTE_SIZE_MAX (16*1024*1024)
> > > > +/*
> > > > + * File descriptor number for the pidfd for the thread-group leader of
> > > > + * the coredumping task installed into the usermode helper's file
> > > > + * descriptor table.
> > > > + */
> > > > +#define COREDUMP_PIDFD_NUMBER 3
> > > >  
> > > >  static int core_uses_pid;
> > > >  static unsigned int core_pipe_limit;
> > > > @@ -339,6 +348,27 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
> > > >  			case 'C':
> > > >  				err = cn_printf(cn, "%d", cprm->cpu);
> > > >  				break;
> > > > +			/* pidfd number */
> > > > +			case 'F': {
> > > > +				/*
> > > > +				 * Installing a pidfd only makes sense if
> > > > +				 * we actually spawn a usermode helper.
> > > > +				 */
> > > > +				if (!ispipe)
> > > > +					break;
> > > > +
> > > > +				/*
> > > > +				 * Note that we'll install a pidfd for the
> > > > +				 * thread-group leader. We know that task
> > > > +				 * linkage hasn't been removed yet and even if
> > > > +				 * this @current isn't the actual thread-group
> > > > +				 * leader we know that the thread-group leader
> > > > +				 * cannot be reaped until @current has exited.
> > > > +				 */
> > > > +				cprm->pid = task_tgid(current);
> > > > +				err = cn_printf(cn, "%d", COREDUMP_PIDFD_NUMBER);
> > > > +				break;
> > > > +			}
> > > >  			default:
> > > >  				break;
> > > >  			}
> > > > 
> > > 
> > > I tried this change with Apport: I took the Ubuntu mainline kernel build
> > > https://kernel.ubuntu.com/mainline/daily/2025-04-24/ (that refers to
> > > mainline commit e54f9b0410347c49b7ffdd495578811e70d7a407) and applied
> > > these three patches on top. Then I modified Apport to take the
> > > additional `-F%F` and tested that on Ubuntu 25.04 (plucky). The result
> > > is the coredump failed as long as there was `-F%F` on
> > 
> > I have no clue what -F%F is and whether that leading -F is something
> > specific to Apport but the specifier is %F not -F%F. For example:
> > 
> >         > cat /proc/sys/kernel/core_pattern
> >         |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %F
> > 
> > And note that this requires the pipe logic to be used, aka "|" needs to
> > be specified. Without it this doesn't make sense.
> 
> Apport takes short option parameters. They match the kernel template
> specifiers. The failing pattern:
> 
> $ cat /proc/sys/kernel/core_pattern 
> |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -F%F -- %E
> 
> Once I drop %F Apport is called without issues:
> 
> $ cat /proc/sys/kernel/core_pattern 
> |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E

Youm must have CONFIG_DEBUG_VFS=y enabled where we trample the pidfs
file we just allocated. It's a debug only assert. I've removed it now
and pushed it to vfs-6.16.coredump. Can you either try with that or
simply unset CONFIG_DEBUG_VFS and retest.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper
  2025-04-25 16:49         ` Christian Brauner
@ 2025-04-30 11:39           ` Benjamin Drung
  0 siblings, 0 replies; 13+ messages in thread
From: Benjamin Drung @ 2025-04-30 11:39 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Oleg Nesterov, Luca Boccassi, Lennart Poettering,
	Daan De Meyer, Mike Yuan, Zbigniew Jędrzejewski-Szmek,
	linux-kernel

On Fri, 2025-04-25 at 18:49 +0200, Christian Brauner wrote:
> On Fri, Apr 25, 2025 at 02:03:34PM +0200, Benjamin Drung wrote:
> > On Fri, 2025-04-25 at 13:57 +0200, Christian Brauner wrote:
> > > On Fri, Apr 25, 2025 at 01:31:56PM +0200, Benjamin Drung wrote:
> > > > Hi,
> > > > 
> > > > On Mon, 2025-04-14 at 15:55 +0200, Christian Brauner wrote:
> > > > > Give userspace a way to instruct the kernel to install a pidfd into the
> > > > > usermode helper process. This makes coredump handling a lot more
> > > > > reliable for userspace. In parallel with this commit we already have
> > > > > systemd adding support for this in [1].
> > > > > 
> > > > > We create a pidfs file for the coredumping process when we process the
> > > > > corename pattern. When the usermode helper process is forked we then
> > > > > install the pidfs file as file descriptor three into the usermode
> > > > > helpers file descriptor table so it's available to the exec'd program.
> > > > > 
> > > > > Since usermode helpers are either children of the system_unbound_wq
> > > > > workqueue or kthreadd we know that the file descriptor table is empty
> > > > > and can thus always use three as the file descriptor number.
> > > > > 
> > > > > Note, that we'll install a pidfd for the thread-group leader even if a
> > > > > subthread is calling do_coredump(). We know that task linkage hasn't
> > > > > been removed due to delay_group_leader() and even if this @current isn't
> > > > > the actual thread-group leader we know that the thread-group leader
> > > > > cannot be reaped until @current has exited.
> > > > > 
> > > > > Link: https://github.com/systemd/systemd/pull/37125 [1]
> > > > > Tested-by: Luca Boccassi <luca.boccassi@gmail.com>
> > > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > > > ---
> > > > >  fs/coredump.c            | 59 ++++++++++++++++++++++++++++++++++++++++++++----
> > > > >  include/linux/coredump.h |  1 +
> > > > >  2 files changed, 56 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/fs/coredump.c b/fs/coredump.c
> > > > > index 9da592aa8f16..403be0ff780e 100644
> > > > > --- a/fs/coredump.c
> > > > > +++ b/fs/coredump.c
> > > > > @@ -43,6 +43,9 @@
> > > > >  #include <linux/timekeeping.h>
> > > > >  #include <linux/sysctl.h>
> > > > >  #include <linux/elf.h>
> > > > > +#include <linux/pidfs.h>
> > > > > +#include <uapi/linux/pidfd.h>
> > > > > +#include <linux/vfsdebug.h>
> > > > >  
> > > > >  #include <linux/uaccess.h>
> > > > >  #include <asm/mmu_context.h>
> > > > > @@ -60,6 +63,12 @@ static void free_vma_snapshot(struct coredump_params *cprm);
> > > > >  #define CORE_FILE_NOTE_SIZE_DEFAULT (4*1024*1024)
> > > > >  /* Define a reasonable max cap */
> > > > >  #define CORE_FILE_NOTE_SIZE_MAX (16*1024*1024)
> > > > > +/*
> > > > > + * File descriptor number for the pidfd for the thread-group leader of
> > > > > + * the coredumping task installed into the usermode helper's file
> > > > > + * descriptor table.
> > > > > + */
> > > > > +#define COREDUMP_PIDFD_NUMBER 3
> > > > >  
> > > > >  static int core_uses_pid;
> > > > >  static unsigned int core_pipe_limit;
> > > > > @@ -339,6 +348,27 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
> > > > >  			case 'C':
> > > > >  				err = cn_printf(cn, "%d", cprm->cpu);
> > > > >  				break;
> > > > > +			/* pidfd number */
> > > > > +			case 'F': {
> > > > > +				/*
> > > > > +				 * Installing a pidfd only makes sense if
> > > > > +				 * we actually spawn a usermode helper.
> > > > > +				 */
> > > > > +				if (!ispipe)
> > > > > +					break;
> > > > > +
> > > > > +				/*
> > > > > +				 * Note that we'll install a pidfd for the
> > > > > +				 * thread-group leader. We know that task
> > > > > +				 * linkage hasn't been removed yet and even if
> > > > > +				 * this @current isn't the actual thread-group
> > > > > +				 * leader we know that the thread-group leader
> > > > > +				 * cannot be reaped until @current has exited.
> > > > > +				 */
> > > > > +				cprm->pid = task_tgid(current);
> > > > > +				err = cn_printf(cn, "%d", COREDUMP_PIDFD_NUMBER);
> > > > > +				break;
> > > > > +			}
> > > > >  			default:
> > > > >  				break;
> > > > >  			}
> > > > > 
> > > > 
> > > > I tried this change with Apport: I took the Ubuntu mainline kernel build
> > > > https://kernel.ubuntu.com/mainline/daily/2025-04-24/ (that refers to
> > > > mainline commit e54f9b0410347c49b7ffdd495578811e70d7a407) and applied
> > > > these three patches on top. Then I modified Apport to take the
> > > > additional `-F%F` and tested that on Ubuntu 25.04 (plucky). The result
> > > > is the coredump failed as long as there was `-F%F` on
> > > 
> > > I have no clue what -F%F is and whether that leading -F is something
> > > specific to Apport but the specifier is %F not -F%F. For example:
> > > 
> > >         > cat /proc/sys/kernel/core_pattern
> > >         |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %F
> > > 
> > > And note that this requires the pipe logic to be used, aka "|" needs to
> > > be specified. Without it this doesn't make sense.
> > 
> > Apport takes short option parameters. They match the kernel template
> > specifiers. The failing pattern:
> > 
> > $ cat /proc/sys/kernel/core_pattern 
> > > /usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -F%F -- %E
> > 
> > Once I drop %F Apport is called without issues:
> > 
> > $ cat /proc/sys/kernel/core_pattern 
> > > /usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
> 
> Youm must have CONFIG_DEBUG_VFS=y enabled where we trample the pidfs
> file we just allocated. It's a debug only assert. I've removed it now
> and pushed it to vfs-6.16.coredump. Can you either try with that or
> simply unset CONFIG_DEBUG_VFS and retest.

Yes, the Ubuntu mainline builds have CONFIG_DEBUG_VFS=y set. I tried
your vfs-6.16.coredump and it works. Thanks.

-- 
Benjamin Drung
Debian & Ubuntu Developer

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-04-30 11:40 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-14 13:55 [PATCH v2 0/3] coredump: hand a pidfd to the usermode coredump helper Christian Brauner
2025-04-14 13:55 ` [PATCH v2 1/3] pidfs: move O_RDWR into pidfs_alloc_file() Christian Brauner
2025-04-14 13:55 ` [PATCH v2 2/3] coredump: fix error handling for replace_fd() Christian Brauner
2025-04-14 13:55 ` [PATCH v2 3/3] coredump: hand a pidfd to the usermode coredump helper Christian Brauner
2025-04-14 14:14   ` Oleg Nesterov
2025-04-14 14:26     ` Christian Brauner
2025-04-14 14:28     ` Oleg Nesterov
2025-04-14 14:41       ` Christian Brauner
2025-04-25 11:31   ` Benjamin Drung
2025-04-25 11:57     ` Christian Brauner
2025-04-25 12:03       ` Benjamin Drung
2025-04-25 16:49         ` Christian Brauner
2025-04-30 11:39           ` Benjamin Drung

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).