linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/3] coredump: support AF_UNIX sockets
@ 2025-04-30 11:05 Christian Brauner
  2025-04-30 11:05 ` [PATCH RFC 1/3] coredump: massage format_corname() Christian Brauner
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Christian Brauner @ 2025-04-30 11:05 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: David S. Miller, Alexander Viro, Daan De Meyer, David Rheinsberg,
	Eric Dumazet, Jakub Kicinski, Jan Kara, Kuniyuki Iwashima,
	Lennart Poettering, Luca Boccassi, Mike Yuan, Oleg Nesterov,
	Paolo Abeni, Simon Horman, Zbigniew Jędrzejewski-Szmek,
	linux-kernel, netdev, Christian Brauner

Coredumping currently supports two modes:

(1) Dumping directly into a file somewhere on the filesystem.
(2) Dumping into a pipe connected to a usermode helper process
    spawned as a child of the system_unbound_wq or kthreadd.

For simplicity I'm mostly ignoring (1). There's probably still some
users of (1) out there but processing coredumps in this way can be
considered adventurous especially in the face of set*id binaries.

The most common option should be (2) by now. It works by allowing
userspace to put a string into /proc/sys/kernel/core_pattern like:

        |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

The "|" at the beginning indicates to the kernel that a pipe must be
used. The path following the pipe indicator is a path to a binary that
will be spawned as a usermode helper process. Any additional parameters
pass information about the task that is generating the coredump to the
binary that processes the coredump.

In this case systemd-coredump is spawned as a usermode helper. There's
various conceptual consequences of this (non-exhaustive list):

- systemd-coredump is spawned with file descriptor number 0 (stdin)
  to the read-end of the pipe. All other file descriptors are closed.
  That specifically includes 1 (stdout) and 2 (stderr). This has already
  caused bugs because userspace assumed that this cannot happen (Whether
  or not this is a sane assumption is irrelevant.).

- systemd-coredump will be spawned as a child of system_unbound_wq. So
  it is not a child of any userspace process and specifically not a
  child of PID 1 so it cannot be waited upon and is in general a weird
  hybrid upcall.

- systemd-coredump is spawned highly privileged as it is spawned with
  full kernel credentials requiring all kinds of weird privilege
  dropping excercises in userspaces.

This adds another mode:

(3) Dumping into a AF_UNIX socket.

Userspace can set /proc/sys/kernel/core_pattern to:

        :/run/coredump.socket

The ":" at the beginning indicates to the kernel that an AF_UNIX socket
is used to process coredumps. The task generating the coredump simply
connects to the socket and writes the coredump into the socket.

Userspace can get a stable handle on the task generating the coredump by
using the SO_PEERPIDFD socket option. SO_PEERPIDFD uses the thread-group
leader pid stashed during connect(). Even if the task generating the
coredump is a subthread in the thread-group the pidfd of the
thread-group leader is a reliable stable handle. Userspace that's
interested in the credentials of the specific thread that crashed can
use SCM_PIDFD to retrieve them.

The pidfd can be used to safely open and parse /proc/<pid> of the task
and it can also be used to retrieve additional meta information via the
PIDFD_GET_INFO ioctl().

This will allow userspace to not have to rely on usermode helpers for
processing coredumps and thus to stop having to handle super privileged
coredumping helpers.

This is easy to test:

(a) coredump processing (we're using socat):

    > cat coredump_socket.sh
    #!/bin/bash
    
    set -x
    
    sudo bash -c "echo ':/tmp/stream.sock' > /proc/sys/kernel/core_pattern"
    socat --statistics unix-listen:/tmp/stream.sock,fork FILE:core_file,create,append,truncate

(b) trigger a coredump:

    user1@localhost:~/data/scripts$ cat crash.c
    #include <stdio.h>
    #include <unistd.h>
    
    int main(int argc, char *argv[])
    {
            fprintf(stderr, "%u\n", (1 / 0));
            _exit(0);
    }

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Christian Brauner (3):
      coredump: massage format_corname()
      coredump: massage do_coredump()
      coredump: support AF_UNIX sockets

 fs/coredump.c | 241 ++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 168 insertions(+), 73 deletions(-)
---
base-commit: 80e14080a00bc429a4ee440d17746a49867df663
change-id: 20250429-work-coredump-socket-87cc0f17729c


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH RFC 1/3] coredump: massage format_corname()
  2025-04-30 11:05 [PATCH RFC 0/3] coredump: support AF_UNIX sockets Christian Brauner
@ 2025-04-30 11:05 ` Christian Brauner
  2025-04-30 11:05 ` [PATCH RFC 2/3] coredump: massage do_coredump() Christian Brauner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Christian Brauner @ 2025-04-30 11:05 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: David S. Miller, Alexander Viro, Daan De Meyer, David Rheinsberg,
	Eric Dumazet, Jakub Kicinski, Jan Kara, Kuniyuki Iwashima,
	Lennart Poettering, Luca Boccassi, Mike Yuan, Oleg Nesterov,
	Paolo Abeni, Simon Horman, Zbigniew Jędrzejewski-Szmek,
	linux-kernel, netdev, Christian Brauner

We're going to extend the coredump code in follow-up patches.
Clean it up so we can do this more easily.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/coredump.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index d740a0411266..281320ea351f 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -76,9 +76,15 @@ static char core_pattern[CORENAME_MAX_SIZE] = "core";
 static int core_name_size = CORENAME_MAX_SIZE;
 unsigned int core_file_note_size_limit = CORE_FILE_NOTE_SIZE_DEFAULT;
 
+enum coredump_type_t {
+	COREDUMP_FILE = 1,
+	COREDUMP_PIPE = 2,
+};
+
 struct core_name {
 	char *corename;
 	int used, size;
+	enum coredump_type_t core_type;
 };
 
 static int expand_corename(struct core_name *cn, int size)
@@ -218,18 +224,21 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
 {
 	const struct cred *cred = current_cred();
 	const char *pat_ptr = core_pattern;
-	int ispipe = (*pat_ptr == '|');
 	bool was_space = false;
 	int pid_in_pattern = 0;
 	int err = 0;
 
 	cn->used = 0;
 	cn->corename = NULL;
+	if (*pat_ptr == '|')
+		cn->core_type = COREDUMP_PIPE;
+	else
+		cn->core_type = COREDUMP_FILE;
 	if (expand_corename(cn, core_name_size))
 		return -ENOMEM;
 	cn->corename[0] = '\0';
 
-	if (ispipe) {
+	if (cn->core_type == COREDUMP_PIPE) {
 		int argvs = sizeof(core_pattern) / 2;
 		(*argv) = kmalloc_array(argvs, sizeof(**argv), GFP_KERNEL);
 		if (!(*argv))
@@ -247,7 +256,7 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
 		 * Split on spaces before doing template expansion so that
 		 * %e and %E don't get split if they have spaces in them
 		 */
-		if (ispipe) {
+		if (cn->core_type == COREDUMP_PIPE) {
 			if (isspace(*pat_ptr)) {
 				if (cn->used != 0)
 					was_space = true;
@@ -353,7 +362,7 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
 				 * Installing a pidfd only makes sense if
 				 * we actually spawn a usermode helper.
 				 */
-				if (!ispipe)
+				if (!(cn->core_type != COREDUMP_PIPE))
 					break;
 
 				/*
@@ -384,12 +393,12 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
 	 * If core_pattern does not include a %p (as is the default)
 	 * and core_uses_pid is set, then .%pid will be appended to
 	 * the filename. Do not do this for piped commands. */
-	if (!ispipe && !pid_in_pattern && core_uses_pid) {
+	if (!(cn->core_type == COREDUMP_PIPE) && !pid_in_pattern && core_uses_pid) {
 		err = cn_printf(cn, ".%d", task_tgid_vnr(current));
 		if (err)
 			return err;
 	}
-	return ispipe;
+	return 0;
 }
 
 static int zap_process(struct signal_struct *signal, int exit_code)
@@ -583,7 +592,6 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 	const struct cred *old_cred;
 	struct cred *cred;
 	int retval = 0;
-	int ispipe;
 	size_t *argv = NULL;
 	int argc = 0;
 	/* require nonrelative corefile path and be extra careful */
@@ -632,19 +640,18 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 
 	old_cred = override_creds(cred);
 
-	ispipe = format_corename(&cn, &cprm, &argv, &argc);
+	retval = format_corename(&cn, &cprm, &argv, &argc);
+	if (retval < 0) {
+		coredump_report_failure("format_corename failed, aborting core");
+		goto fail_unlock;
+	}
 
-	if (ispipe) {
+	if (cn.core_type == COREDUMP_PIPE) {
 		int argi;
 		int dump_count;
 		char **helper_argv;
 		struct subprocess_info *sub_info;
 
-		if (ispipe < 0) {
-			coredump_report_failure("format_corename failed, aborting core");
-			goto fail_unlock;
-		}
-
 		if (cprm.limit == 1) {
 			/* See umh_coredump_setup() which sets RLIMIT_CORE = 1.
 			 *
@@ -695,7 +702,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 			coredump_report_failure("|%s pipe failed", cn.corename);
 			goto close_fail;
 		}
-	} else {
+	} else if (cn.core_type == COREDUMP_FILE) {
 		struct mnt_idmap *idmap;
 		struct inode *inode;
 		int open_flags = O_CREAT | O_WRONLY | O_NOFOLLOW |
@@ -823,13 +830,13 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		file_end_write(cprm.file);
 		free_vma_snapshot(&cprm);
 	}
-	if (ispipe && core_pipe_limit)
+	if ((cn.core_type == COREDUMP_PIPE) && core_pipe_limit)
 		wait_for_dump_helpers(cprm.file);
 close_fail:
 	if (cprm.file)
 		filp_close(cprm.file, NULL);
 fail_dropcount:
-	if (ispipe)
+	if (cn.core_type == COREDUMP_PIPE)
 		atomic_dec(&core_dump_count);
 fail_unlock:
 	kfree(argv);

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RFC 2/3] coredump: massage do_coredump()
  2025-04-30 11:05 [PATCH RFC 0/3] coredump: support AF_UNIX sockets Christian Brauner
  2025-04-30 11:05 ` [PATCH RFC 1/3] coredump: massage format_corname() Christian Brauner
@ 2025-04-30 11:05 ` Christian Brauner
  2025-04-30 11:05 ` [PATCH RFC 3/3] coredump: support AF_UNIX sockets Christian Brauner
  2025-04-30 11:14 ` [PATCH RFC 0/3] " Christian Brauner
  3 siblings, 0 replies; 7+ messages in thread
From: Christian Brauner @ 2025-04-30 11:05 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: David S. Miller, Alexander Viro, Daan De Meyer, David Rheinsberg,
	Eric Dumazet, Jakub Kicinski, Jan Kara, Kuniyuki Iwashima,
	Lennart Poettering, Luca Boccassi, Mike Yuan, Oleg Nesterov,
	Paolo Abeni, Simon Horman, Zbigniew Jędrzejewski-Szmek,
	linux-kernel, netdev, Christian Brauner

We're going to extend the coredump code in follow-up patches.
Clean it up so we can do this more easily.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/coredump.c | 123 +++++++++++++++++++++++++++++++---------------------------
 1 file changed, 66 insertions(+), 57 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 281320ea351f..1779299b8c61 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -646,63 +646,8 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		goto fail_unlock;
 	}
 
-	if (cn.core_type == COREDUMP_PIPE) {
-		int argi;
-		int dump_count;
-		char **helper_argv;
-		struct subprocess_info *sub_info;
-
-		if (cprm.limit == 1) {
-			/* See umh_coredump_setup() which sets RLIMIT_CORE = 1.
-			 *
-			 * Normally core limits are irrelevant to pipes, since
-			 * we're not writing to the file system, but we use
-			 * cprm.limit of 1 here as a special value, this is a
-			 * consistent way to catch recursive crashes.
-			 * We can still crash if the core_pattern binary sets
-			 * RLIM_CORE = !1, but it runs as root, and can do
-			 * lots of stupid things.
-			 *
-			 * Note that we use task_tgid_vnr here to grab the pid
-			 * of the process group leader.  That way we get the
-			 * right pid if a thread in a multi-threaded
-			 * core_pattern process dies.
-			 */
-			coredump_report_failure("RLIMIT_CORE is set to 1, aborting core");
-			goto fail_unlock;
-		}
-		cprm.limit = RLIM_INFINITY;
-
-		dump_count = atomic_inc_return(&core_dump_count);
-		if (core_pipe_limit && (core_pipe_limit < dump_count)) {
-			coredump_report_failure("over core_pipe_limit, skipping core dump");
-			goto fail_dropcount;
-		}
-
-		helper_argv = kmalloc_array(argc + 1, sizeof(*helper_argv),
-					    GFP_KERNEL);
-		if (!helper_argv) {
-			coredump_report_failure("%s failed to allocate memory", __func__);
-			goto fail_dropcount;
-		}
-		for (argi = 0; argi < argc; argi++)
-			helper_argv[argi] = cn.corename + argv[argi];
-		helper_argv[argi] = NULL;
-
-		retval = -ENOMEM;
-		sub_info = call_usermodehelper_setup(helper_argv[0],
-						helper_argv, NULL, GFP_KERNEL,
-						umh_coredump_setup, NULL, &cprm);
-		if (sub_info)
-			retval = call_usermodehelper_exec(sub_info,
-							  UMH_WAIT_EXEC);
-
-		kfree(helper_argv);
-		if (retval) {
-			coredump_report_failure("|%s pipe failed", cn.corename);
-			goto close_fail;
-		}
-	} else if (cn.core_type == COREDUMP_FILE) {
+	switch (cn.core_type) {
+	case COREDUMP_FILE: {
 		struct mnt_idmap *idmap;
 		struct inode *inode;
 		int open_flags = O_CREAT | O_WRONLY | O_NOFOLLOW |
@@ -796,6 +741,70 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		if (do_truncate(idmap, cprm.file->f_path.dentry,
 				0, 0, cprm.file))
 			goto close_fail;
+		break;
+	}
+	case COREDUMP_PIPE: {
+		int argi;
+		int dump_count;
+		char **helper_argv;
+		struct subprocess_info *sub_info;
+
+		if (cprm.limit == 1) {
+			/* See umh_coredump_setup() which sets RLIMIT_CORE = 1.
+			 *
+			 * Normally core limits are irrelevant to pipes, since
+			 * we're not writing to the file system, but we use
+			 * cprm.limit of 1 here as a special value, this is a
+			 * consistent way to catch recursive crashes.
+			 * We can still crash if the core_pattern binary sets
+			 * RLIM_CORE = !1, but it runs as root, and can do
+			 * lots of stupid things.
+			 *
+			 * Note that we use task_tgid_vnr here to grab the pid
+			 * of the process group leader.  That way we get the
+			 * right pid if a thread in a multi-threaded
+			 * core_pattern process dies.
+			 */
+			coredump_report_failure("RLIMIT_CORE is set to 1, aborting core");
+			goto fail_unlock;
+		}
+		cprm.limit = RLIM_INFINITY;
+
+		dump_count = atomic_inc_return(&core_dump_count);
+		if (core_pipe_limit && (core_pipe_limit < dump_count)) {
+			coredump_report_failure("over core_pipe_limit, skipping core dump");
+			goto fail_dropcount;
+		}
+
+		helper_argv = kmalloc_array(argc + 1, sizeof(*helper_argv),
+					    GFP_KERNEL);
+		if (!helper_argv) {
+			coredump_report_failure("%s failed to allocate memory", __func__);
+			goto fail_dropcount;
+		}
+		for (argi = 0; argi < argc; argi++)
+			helper_argv[argi] = cn.corename + argv[argi];
+		helper_argv[argi] = NULL;
+
+		retval = -ENOMEM;
+		sub_info = call_usermodehelper_setup(helper_argv[0],
+						helper_argv, NULL, GFP_KERNEL,
+						umh_coredump_setup, NULL, &cprm);
+		if (sub_info)
+			retval = call_usermodehelper_exec(sub_info,
+							  UMH_WAIT_EXEC);
+
+		kfree(helper_argv);
+		if (retval) {
+			coredump_report_failure("|%s pipe failed", cn.corename);
+			goto close_fail;
+		}
+		break;
+	}
+	default:
+		WARN_ON_ONCE(true);
+		retval = -EINVAL;
+		goto close_fail;
 	}
 
 	/* get us an unshared descriptor table; almost always a no-op */

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH RFC 3/3] coredump: support AF_UNIX sockets
  2025-04-30 11:05 [PATCH RFC 0/3] coredump: support AF_UNIX sockets Christian Brauner
  2025-04-30 11:05 ` [PATCH RFC 1/3] coredump: massage format_corname() Christian Brauner
  2025-04-30 11:05 ` [PATCH RFC 2/3] coredump: massage do_coredump() Christian Brauner
@ 2025-04-30 11:05 ` Christian Brauner
  2025-04-30 11:14   ` Christian Brauner
  2025-05-01  0:25   ` Kuniyuki Iwashima
  2025-04-30 11:14 ` [PATCH RFC 0/3] " Christian Brauner
  3 siblings, 2 replies; 7+ messages in thread
From: Christian Brauner @ 2025-04-30 11:05 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: David S. Miller, Alexander Viro, Daan De Meyer, David Rheinsberg,
	Eric Dumazet, Jakub Kicinski, Jan Kara, Kuniyuki Iwashima,
	Lennart Poettering, Luca Boccassi, Mike Yuan, Oleg Nesterov,
	Paolo Abeni, Simon Horman, Zbigniew Jędrzejewski-Szmek,
	linux-kernel, netdev, Christian Brauner

Coredumping currently supports two modes:

(1) Dumping directly into a file somewhere on the filesystem.
(2) Dumping into a pipe connected to a usermode helper process
    spawned as a child of the system_unbound_wq or kthreadd.

For simplicity I'm mostly ignoring (1). There's probably still some
users of (1) out there but processing coredumps in this way can be
considered adventurous especially in the face of set*id binaries.

The most common option should be (2) by now. It works by allowing
userspace to put a string into /proc/sys/kernel/core_pattern like:

        |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

The "|" at the beginning indicates to the kernel that a pipe must be
used. The path following the pipe indicator is a path to a binary that
will be spawned as a usermode helper process. Any additional parameters
pass information about the task that is generating the coredump to the
binary that processes the coredump.

In this case systemd-coredump is spawned as a usermode helper. There's
various conceptual consequences of this (non-exhaustive list):

- systemd-coredump is spawned with file descriptor number 0 (stdin)
  to the read-end of the pipe. All other file descriptors are closed.
  That specifically includes 1 (stdout) and 2 (stderr). This has already
  caused bugs because userspace assumed that this cannot happen (Whether
  or not this is a sane assumption is irrelevant.).

- systemd-coredump will be spawned as a child of system_unbound_wq. So
  it is not a child of any userspace process and specifically not a
  child of PID 1 so it cannot be waited upon and is in general a weird
  hybrid upcall.

- systemd-coredump is spawned highly privileged as it is spawned with
  full kernel credentials requiring all kinds of weird privilege
  dropping excercises in userspaces.

This adds another mode:

(3) Dumping into a AF_UNIX socket.

Userspace can set /proc/sys/kernel/core_pattern to:

        :/run/coredump.socket

The ":" at the beginning indicates to the kernel that an AF_UNIX socket
is used to process coredumps. The task generating the coredump simply
connects to the socket and writes the coredump into the socket.

Userspace can get a stable handle on the task generating the coredump by
using the SO_PEERPIDFD socket option. SO_PEERPIDFD uses the thread-group
leader pid stashed during connect(). Even if the task generating the
coredump is a subthread in the thread-group the pidfd of the
thread-group leader is a reliable stable handle. Userspace that's
interested in the credentials of the specific thread that crashed can
use SCM_PIDFD to retrieve them.

The pidfd can be used to safely open and parse /proc/<pid> of the task
and it can also be used to retrieve additional meta information via the
PIDFD_GET_INFO ioctl().

This will allow userspace to not have to rely on usermode helpers for
processing coredumps and thus to stop having to handle super privileged
coredumping helpers.

This is easy to test:

(a) coredump processing (we're using socat):

    > cat coredump_socket.sh
    #!/bin/bash

    set -x

    sudo bash -c "echo ':/tmp/stream.sock' > /proc/sys/kernel/core_pattern"
    socat --statistics unix-listen:/tmp/stream.sock,fork FILE:core_file,create,append,truncate

(b) trigger a coredump:

    user1@localhost:~/data/scripts$ cat crash.c
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
            fprintf(stderr, "%u\n", (1 / 0));
            _exit(0);
    }

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/coredump.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 81 insertions(+), 2 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 1779299b8c61..db914ff20a5e 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -45,6 +45,9 @@
 #include <linux/elf.h>
 #include <linux/pidfs.h>
 #include <uapi/linux/pidfd.h>
+#include <linux/net.h>
+#include <uapi/linux/un.h>
+#include <linux/socket.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -79,6 +82,7 @@ unsigned int core_file_note_size_limit = CORE_FILE_NOTE_SIZE_DEFAULT;
 enum coredump_type_t {
 	COREDUMP_FILE = 1,
 	COREDUMP_PIPE = 2,
+	COREDUMP_SOCK = 3,
 };
 
 struct core_name {
@@ -232,13 +236,16 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
 	cn->corename = NULL;
 	if (*pat_ptr == '|')
 		cn->core_type = COREDUMP_PIPE;
+	else if (*pat_ptr == ':')
+		cn->core_type = COREDUMP_SOCK;
 	else
 		cn->core_type = COREDUMP_FILE;
 	if (expand_corename(cn, core_name_size))
 		return -ENOMEM;
 	cn->corename[0] = '\0';
 
-	if (cn->core_type == COREDUMP_PIPE) {
+	switch (cn->core_type) {
+	case COREDUMP_PIPE: {
 		int argvs = sizeof(core_pattern) / 2;
 		(*argv) = kmalloc_array(argvs, sizeof(**argv), GFP_KERNEL);
 		if (!(*argv))
@@ -247,6 +254,35 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
 		++pat_ptr;
 		if (!(*pat_ptr))
 			return -ENOMEM;
+		break;
+	}
+	case COREDUMP_SOCK: {
+		/* skip ':' */
+		++pat_ptr;
+		/* no spaces */
+		if (!(*pat_ptr))
+			return -EINVAL;
+		/* must be an absolute path */
+		if (!(*pat_ptr == '/'))
+			return -EINVAL;
+		err = cn_printf(cn, "%s", pat_ptr);
+		if (err)
+			return err;
+		/*
+		 * No need to parse any other options. Relevant
+		 * information can be retrieved from the peer pidfd
+		 * retrievable via SO_PEERPIDFD by the receiver or via
+		 * /proc/<pid>, using the SO_PEERPIDFD to guard against
+		 * pid recycling when opening /proc/<pid>.
+		 *
+		 * Hell, we could even add a PIDFD_COREDUMP struct
+		 * retrievable via an ioctl.
+		 */
+		return 0;
+	}
+	default:
+		WARN_ON_ONCE(cn->core_type != COREDUMP_FILE);
+		break;
 	}
 
 	/* Repeat as long as we have more pattern to process and more output
@@ -801,6 +837,49 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		}
 		break;
 	}
+	case COREDUMP_SOCK: {
+		struct file *file __free(fput) = NULL;
+		struct sockaddr_un unix_addr = {
+			.sun_family = AF_UNIX,
+		};
+		struct sockaddr_storage *addr;
+
+		retval = strscpy(unix_addr.sun_path, cn.corename, sizeof(unix_addr.sun_path));
+		if (retval < 0)
+			goto close_fail;
+
+		file = __sys_socket_file(AF_UNIX, SOCK_STREAM, 0);
+		if (IS_ERR(file))
+			goto close_fail;
+
+		/*
+		 * It is possible that the userspace process which is
+		 * supposed to handle the coredump and is listening on
+		 * the AF_UNIX socket coredumps. This should be fine
+		 * though. If this was the only process which was
+		 * listen()ing on the AF_UNIX socket for coredumps it
+		 * obviously won't be listen()ing anymore by the time it
+		 * gets here. So the __sys_connect_file() call will
+		 * often fail with ECONNREFUSED and the coredump.
+		 *
+		 * In general though, userspace should just mark itself
+		 * non dumpable and not do any of this nonsense. We
+		 * shouldn't work around this.
+		 */
+		addr = (struct sockaddr_storage *)(&unix_addr);
+		retval = __sys_connect_file(file, addr, sizeof(unix_addr), O_CLOEXEC);
+		if (retval)
+			goto close_fail;
+
+		/* The peer isn't supposed to write and we for sure won't read. */
+		retval =  __sys_shutdown_sock(sock_from_file(file), SHUT_RD);
+		if (retval)
+			goto close_fail;
+
+		cprm.file = no_free_ptr(file);
+		cprm.limit = RLIM_INFINITY;
+		break;
+	}
 	default:
 		WARN_ON_ONCE(true);
 		retval = -EINVAL;
@@ -1070,7 +1149,7 @@ EXPORT_SYMBOL(dump_align);
 void validate_coredump_safety(void)
 {
 	if (suid_dumpable == SUID_DUMP_ROOT &&
-	    core_pattern[0] != '/' && core_pattern[0] != '|') {
+	    core_pattern[0] != '/' && core_pattern[0] != '|' && core_pattern[0] != ':') {
 
 		coredump_report_failure("Unsafe core_pattern used with fs.suid_dumpable=2: "
 			"pipe handler or fully qualified core dump path required. "

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC 3/3] coredump: support AF_UNIX sockets
  2025-04-30 11:05 ` [PATCH RFC 3/3] coredump: support AF_UNIX sockets Christian Brauner
@ 2025-04-30 11:14   ` Christian Brauner
  2025-05-01  0:25   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 7+ messages in thread
From: Christian Brauner @ 2025-04-30 11:14 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: David S. Miller, Alexander Viro, Daan De Meyer, David Rheinsberg,
	Eric Dumazet, Jakub Kicinski, Jan Kara, Kuniyuki Iwashima,
	Lennart Poettering, Luca Boccassi, Mike Yuan, Oleg Nesterov,
	Paolo Abeni, Simon Horman, Zbigniew Jędrzejewski-Szmek,
	linux-kernel, netdev

> This is easy to test:
> 
> (a) coredump processing (we're using socat):
> 
>     > cat coredump_socket.sh
>     #!/bin/bash
> 
>     set -x
> 
>     sudo bash -c "echo ':/tmp/stream.sock' > /proc/sys/kernel/core_pattern"
>     socat --statistics unix-listen:/tmp/stream.sock,fork FILE:core_file,create,append,truncate

Don't use "truncate" please. It's not a socat option and it won't work.
The correct option is "trunc".

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC 0/3] coredump: support AF_UNIX sockets
  2025-04-30 11:05 [PATCH RFC 0/3] coredump: support AF_UNIX sockets Christian Brauner
                   ` (2 preceding siblings ...)
  2025-04-30 11:05 ` [PATCH RFC 3/3] coredump: support AF_UNIX sockets Christian Brauner
@ 2025-04-30 11:14 ` Christian Brauner
  3 siblings, 0 replies; 7+ messages in thread
From: Christian Brauner @ 2025-04-30 11:14 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: David S. Miller, Alexander Viro, Daan De Meyer, David Rheinsberg,
	Eric Dumazet, Jakub Kicinski, Jan Kara, Kuniyuki Iwashima,
	Lennart Poettering, Luca Boccassi, Mike Yuan, Oleg Nesterov,
	Paolo Abeni, Simon Horman, Zbigniew Jędrzejewski-Szmek,
	linux-kernel, netdev

> This is easy to test:
> 
> (a) coredump processing (we're using socat):
> 
>     > cat coredump_socket.sh
>     #!/bin/bash
>     
>     set -x
>     
>     sudo bash -c "echo ':/tmp/stream.sock' > /proc/sys/kernel/core_pattern"
>     socat --statistics unix-listen:/tmp/stream.sock,fork FILE:core_file,create,append,truncate

Don't use "truncate" please. It's not a socat option and it won't work.
The correct option is "trunc".

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RFC 3/3] coredump: support AF_UNIX sockets
  2025-04-30 11:05 ` [PATCH RFC 3/3] coredump: support AF_UNIX sockets Christian Brauner
  2025-04-30 11:14   ` Christian Brauner
@ 2025-05-01  0:25   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 7+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-01  0:25 UTC (permalink / raw)
  To: brauner
  Cc: bluca, daan.j.demeyer, davem, david, edumazet, horms, jack, kuba,
	kuniyu, lennart, linux-fsdevel, linux-kernel, me, netdev, oleg,
	pabeni, viro, zbyszek

From: Christian Brauner <brauner@kernel.org>
Date: Wed, 30 Apr 2025 13:05:03 +0200
> @@ -801,6 +837,49 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>  		}
>  		break;
>  	}
> +	case COREDUMP_SOCK: {
> +		struct file *file __free(fput) = NULL;
> +		struct sockaddr_un unix_addr = {
> +			.sun_family = AF_UNIX,
> +		};
> +		struct sockaddr_storage *addr;
> +
> +		retval = strscpy(unix_addr.sun_path, cn.corename, sizeof(unix_addr.sun_path));
> +		if (retval < 0)
> +			goto close_fail;
> +
> +		file = __sys_socket_file(AF_UNIX, SOCK_STREAM, 0);
> +		if (IS_ERR(file))
> +			goto close_fail;
> +
> +		/*
> +		 * It is possible that the userspace process which is
> +		 * supposed to handle the coredump and is listening on
> +		 * the AF_UNIX socket coredumps. This should be fine
> +		 * though. If this was the only process which was
> +		 * listen()ing on the AF_UNIX socket for coredumps it
> +		 * obviously won't be listen()ing anymore by the time it
> +		 * gets here. So the __sys_connect_file() call will
> +		 * often fail with ECONNREFUSED and the coredump.
> +		 *
> +		 * In general though, userspace should just mark itself
> +		 * non dumpable and not do any of this nonsense. We
> +		 * shouldn't work around this.
> +		 */
> +		addr = (struct sockaddr_storage *)(&unix_addr);
> +		retval = __sys_connect_file(file, addr, sizeof(unix_addr), O_CLOEXEC);

The 3rd argument should be offsetof(struct sockaddr_un, sun_path)
+ retval of strscpy() above ?

I guess you could see an unexpected error when
CONFIG_INIT_STACK_NONE=y and cn.corename has garbage at tail.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-05-01  0:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-30 11:05 [PATCH RFC 0/3] coredump: support AF_UNIX sockets Christian Brauner
2025-04-30 11:05 ` [PATCH RFC 1/3] coredump: massage format_corname() Christian Brauner
2025-04-30 11:05 ` [PATCH RFC 2/3] coredump: massage do_coredump() Christian Brauner
2025-04-30 11:05 ` [PATCH RFC 3/3] coredump: support AF_UNIX sockets Christian Brauner
2025-04-30 11:14   ` Christian Brauner
2025-05-01  0:25   ` Kuniyuki Iwashima
2025-04-30 11:14 ` [PATCH RFC 0/3] " Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).