public inbox for netdev@vger.kernel.org
From: Christian Brauner <brauner@kernel.org>
To: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: alexander@mihalicyn.com, bluca@debian.org,
	daan.j.demeyer@gmail.com,  daniel@iogearbox.net,
	davem@davemloft.net, david@readahead.eu, edumazet@google.com,
	 horms@kernel.org, jack@suse.cz, jannh@google.com,
	kuba@kernel.org,  lennart@poettering.net,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	 linux-security-module@vger.kernel.org, me@yhndnzj.com,
	netdev@vger.kernel.org, oleg@redhat.com,  pabeni@redhat.com,
	viro@zeniv.linux.org.uk, zbyszek@in.waw.pl
Subject: Re: [PATCH v7 4/9] coredump: add coredump socket
Date: Fri, 16 May 2025 12:14:05 +0200
Message-ID: <20250516-schund-wohlbefinden-945aceec2edc@brauner>
In-Reply-To: <20250515170057.50816-1-kuniyu@amazon.com>

On Thu, May 15, 2025 at 10:00:43AM -0700, Kuniyuki Iwashima wrote:
> From: Christian Brauner <brauner@kernel.org>
> Date: Thu, 15 May 2025 00:03:37 +0200
> > Coredumping currently supports two modes:
> > 
> > (1) Dumping directly into a file somewhere on the filesystem.
> > (2) Dumping into a pipe connected to a usermode helper process
> >     spawned as a child of the system_unbound_wq or kthreadd.
> > 
> > For simplicity I'm mostly ignoring (1). There are probably still some
> > users of (1) out there, but processing coredumps in this way can be
> > considered adventurous, especially in the face of set*id binaries.
> > 
> > The most common option should be (2) by now. It works by allowing
> > userspace to put a string into /proc/sys/kernel/core_pattern like:
> > 
> >         |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
> > 
> > The "|" at the beginning indicates to the kernel that a pipe must be
> > used. The path following the pipe indicator is a path to a binary that
> > will be spawned as a usermode helper process. Any additional parameters
> > pass information about the task that is generating the coredump to the
> > binary that processes the coredump.
> > 
> > In the example core_pattern shown above systemd-coredump is spawned as a
> > usermode helper. There are various conceptual consequences of this
> > (non-exhaustive list):
> > 
> > - systemd-coredump is spawned with file descriptor number 0 (stdin)
> >   connected to the read-end of the pipe. All other file descriptors are
> >   closed. That specifically includes 1 (stdout) and 2 (stderr). This has
> >   already caused bugs because userspace assumed that this cannot happen
> >   (Whether or not this is a sane assumption is irrelevant.).
> > 
> > - systemd-coredump will be spawned as a child of system_unbound_wq. So
> >   it is not a child of any userspace process and specifically not a
> >   child of PID 1. It cannot be waited upon and is a weird hybrid
> >   upcall, which is difficult for userspace to control correctly.
> > 
> > - systemd-coredump is spawned with full kernel privileges. This
> >   necessitates all kinds of weird privilege dropping exercises in
> >   userspace to make this safe.
> > 
> > - A new usermode helper has to be spawned for each crashing process.
> > 
> > This series adds a new mode:
> > 
> > (3) Dumping into an AF_UNIX socket.
> > 
> > Userspace can set /proc/sys/kernel/core_pattern to:
> > 
> >         @/path/to/coredump.socket
> > 
> > The "@" at the beginning indicates to the kernel that an AF_UNIX
> > coredump socket will be used to process coredumps.
> > 
> > The coredump socket must be located in the initial mount namespace.
> > When a task coredumps it opens a client socket in the initial network
> > namespace and connects to the coredump socket.
> > 
> > - The coredump server uses SO_PEERPIDFD to get a stable handle on the
> >   connected crashing task. The retrieved pidfd will provide a stable
> >   reference even if the crashing task gets SIGKILLed while generating
> >   the coredump.
> > 
> > - By setting core_pipe_limit non-zero userspace can guarantee that the
> >   crashing task cannot be reaped behind its back and thus process all
> >   necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to
> >   detect whether /proc/<pid> still refers to the same process.
> > 
> >   The core_pipe_limit isn't used to rate-limit connections to the
> >   socket. This can simply be done via AF_UNIX sockets directly.
> > 
> > - The pidfd for the crashing task will grow new information about how
> >   the task coredumps.
> > 
> > - The coredump server should mark itself as non-dumpable.
> > 
> > - A container coredump server in a separate network namespace can simply
> >   bind to another well-known address and systemd-coredump forwards
> >   coredumps to the container.
> > 
> > - Coredumps could in the future also be handled via per-user/session
> >   coredump servers that run only with that user's privileges.
> > 
> >   The coredump server listens on the coredump socket and accepts a
> >   new coredump connection. It then retrieves SO_PEERPIDFD for the
> >   client, inspects uid/gid and hands the accepted client to the user's
> >   own coredump handler which runs with the user's privileges only
> >   (It must of course pay close attention to not forward crashing suid
> >   binaries.).
> > 
> > The new coredump socket will allow userspace to not have to rely on
> > usermode helpers for processing coredumps and provides a safer way to
> > handle them instead of relying on super privileged coredumping helpers
> > that have caused and continue to cause significant CVEs.
> > 
> > This will also be significantly more lightweight since no fork()+exec()
> > for the usermodehelper is required for each crashing process. The
> > coredump server in userspace can e.g., just keep a worker pool.
> > 
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > ---
> >  fs/coredump.c       | 133 ++++++++++++++++++++++++++++++++++++++++++++++++----
> >  include/linux/net.h |   1 +
> >  net/unix/af_unix.c  |  53 ++++++++++++++++-----
> >  3 files changed, 166 insertions(+), 21 deletions(-)
> > 
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index a70929c3585b..e1256ebb89c1 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -44,7 +44,11 @@
> >  #include <linux/sysctl.h>
> >  #include <linux/elf.h>
> >  #include <linux/pidfs.h>
> > +#include <linux/net.h>
> > +#include <linux/socket.h>
> > +#include <net/net_namespace.h>
> >  #include <uapi/linux/pidfd.h>
> > +#include <uapi/linux/un.h>
> >  
> >  #include <linux/uaccess.h>
> >  #include <asm/mmu_context.h>
> > @@ -79,6 +83,7 @@ unsigned int core_file_note_size_limit = CORE_FILE_NOTE_SIZE_DEFAULT;
> >  enum coredump_type_t {
> >  	COREDUMP_FILE = 1,
> >  	COREDUMP_PIPE = 2,
> > +	COREDUMP_SOCK = 3,
> >  };
> >  
> >  struct core_name {
> > @@ -232,13 +237,16 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
> >  	cn->corename = NULL;
> >  	if (*pat_ptr == '|')
> >  		cn->core_type = COREDUMP_PIPE;
> > +	else if (*pat_ptr == '@')
> > +		cn->core_type = COREDUMP_SOCK;
> >  	else
> >  		cn->core_type = COREDUMP_FILE;
> >  	if (expand_corename(cn, core_name_size))
> >  		return -ENOMEM;
> >  	cn->corename[0] = '\0';
> >  
> > -	if (cn->core_type == COREDUMP_PIPE) {
> > +	switch (cn->core_type) {
> > +	case COREDUMP_PIPE: {
> >  		int argvs = sizeof(core_pattern) / 2;
> >  		(*argv) = kmalloc_array(argvs, sizeof(**argv), GFP_KERNEL);
> >  		if (!(*argv))
> > @@ -247,6 +255,33 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
> >  		++pat_ptr;
> >  		if (!(*pat_ptr))
> >  			return -ENOMEM;
> > +		break;
> > +	}
> > +	case COREDUMP_SOCK: {
> > +		/* skip the @ */
> > +		pat_ptr++;
> > +		err = cn_printf(cn, "%s", pat_ptr);
> > +		if (err)
> > +			return err;
> > +
> > +		/* Require absolute paths. */
> > +		if (cn->corename[0] != '/')
> > +			return -EINVAL;
> > +
> > +		/*
> > +		 * Currently no need to parse any other options.
> > +		 * Relevant information can be retrieved from the peer
> > +		 * pidfd retrievable via SO_PEERPIDFD by the receiver or
> > +		 * via /proc/<pid>, using the SO_PEERPIDFD to guard
> > +		 * against pid recycling when opening /proc/<pid>.
> > +		 */
> > +		return 0;
> > +	}
> > +	case COREDUMP_FILE:
> > +		break;
> > +	default:
> > +		WARN_ON_ONCE(true);
> > +		return -EINVAL;
> >  	}
> >  
> >  	/* Repeat as long as we have more pattern to process and more output
> > @@ -393,11 +428,20 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
> >  	 * If core_pattern does not include a %p (as is the default)
> >  	 * and core_uses_pid is set, then .%pid will be appended to
> >  	 * the filename. Do not do this for piped commands. */
> > -	if (!(cn->core_type == COREDUMP_PIPE) && !pid_in_pattern && core_uses_pid) {
> > -		err = cn_printf(cn, ".%d", task_tgid_vnr(current));
> > -		if (err)
> > -			return err;
> > +	if (!pid_in_pattern && core_uses_pid) {
> > +		switch (cn->core_type) {
> > +		case COREDUMP_FILE:
> > +			return cn_printf(cn, ".%d", task_tgid_vnr(current));
> > +		case COREDUMP_PIPE:
> > +			break;
> > +		case COREDUMP_SOCK:
> > +			break;
> > +		default:
> > +			WARN_ON_ONCE(true);
> > +			return -EINVAL;
> > +		}
> >  	}
> > +
> >  	return 0;
> >  }
> >  
> > @@ -801,6 +845,55 @@ void do_coredump(const kernel_siginfo_t *siginfo)
> >  		}
> >  		break;
> >  	}
> > +	case COREDUMP_SOCK: {
> > +#ifdef CONFIG_UNIX
> > +		struct file *file __free(fput) = NULL;
> > +		struct sockaddr_un addr = {
> > +			.sun_family = AF_UNIX,
> > +		};
> > +		ssize_t addr_len;
> > +		struct socket *socket;
> > +
> > +		retval = strscpy(addr.sun_path, cn.corename, sizeof(addr.sun_path));
> > +		if (retval < 0)
> > +			goto close_fail;
> > +		addr_len = offsetof(struct sockaddr_un, sun_path) + retval + 1;
> > +
> > +		/*
> > +		 * It is possible that the userspace process which is
> > +		 * supposed to handle the coredump and is listening on
> > +		 * the AF_UNIX socket coredumps. Userspace should just
> > +		 * mark itself non dumpable.
> > +		 */
> > +
> > +		retval = sock_create_kern(&init_net, AF_UNIX, SOCK_STREAM, 0, &socket);
> > +		if (retval < 0)
> > +			goto close_fail;
> > +
> > +		file = sock_alloc_file(socket, 0, NULL);
> > +		if (IS_ERR(file)) {
> > +			sock_release(socket);
> > +			goto close_fail;
> > +		}
> > +
> > +		retval = kernel_connect(socket, (struct sockaddr *)(&addr),
> > +					addr_len, O_NONBLOCK | SOCK_COREDUMP);
> > +		if (retval) {
> > +			if (retval == -EAGAIN)
> > +				coredump_report_failure("Coredump socket %s receive queue full", addr.sun_path);
> > +			else
> > +				coredump_report_failure("Coredump socket connection %s failed %d", addr.sun_path, retval);
> > +			goto close_fail;
> > +		}
> > +
> > +		cprm.limit = RLIM_INFINITY;
> > +		cprm.file = no_free_ptr(file);
> > +#else
> > +		coredump_report_failure("Core dump socket support %s disabled", cn.corename);
> > +		goto close_fail;
> > +#endif
> > +		break;
> > +	}
> >  	default:
> >  		WARN_ON_ONCE(true);
> >  		goto close_fail;
> > @@ -838,8 +931,32 @@ void do_coredump(const kernel_siginfo_t *siginfo)
> >  		file_end_write(cprm.file);
> >  		free_vma_snapshot(&cprm);
> >  	}
> > -	if ((cn.core_type == COREDUMP_PIPE) && core_pipe_limit)
> > -		wait_for_dump_helpers(cprm.file);
> > +
> > +	/*
> > +	 * When core_pipe_limit is set we wait for the coredump server
> > +	 * or usermodehelper to finish before exiting so it can e.g.,
> > +	 * inspect /proc/<pid>.
> > +	 */
> > +	if (core_pipe_limit) {
> > +		switch (cn.core_type) {
> > +		case COREDUMP_PIPE:
> > +			wait_for_dump_helpers(cprm.file);
> > +			break;
> > +		case COREDUMP_SOCK: {
> > +			/*
> > +			 * We use a simple read to wait for the coredump
> > +			 * processing to finish. Either the socket is
> > +			 * closed or we get sent unexpected data. In
> > +			 * both cases, we're done.
> > +			 */
> > +			__kernel_read(cprm.file, &(char){ 0 }, 1, NULL);
> > +			break;
> > +		}
> > +		default:
> > +			break;
> > +		}
> > +	}
> > +
> >  close_fail:
> >  	if (cprm.file)
> >  		filp_close(cprm.file, NULL);
> > @@ -1069,7 +1186,7 @@ EXPORT_SYMBOL(dump_align);
> >  void validate_coredump_safety(void)
> >  {
> >  	if (suid_dumpable == SUID_DUMP_ROOT &&
> > -	    core_pattern[0] != '/' && core_pattern[0] != '|') {
> > +	    core_pattern[0] != '/' && core_pattern[0] != '|' && core_pattern[0] != '@') {
> >  
> >  		coredump_report_failure("Unsafe core_pattern used with fs.suid_dumpable=2: "
> >  			"pipe handler or fully qualified core dump path required. "
> > diff --git a/include/linux/net.h b/include/linux/net.h
> > index 0ff950eecc6b..139c85d0f2ea 100644
> > --- a/include/linux/net.h
> > +++ b/include/linux/net.h
> > @@ -81,6 +81,7 @@ enum sock_type {
> >  #ifndef SOCK_NONBLOCK
> >  #define SOCK_NONBLOCK	O_NONBLOCK
> >  #endif
> > +#define SOCK_COREDUMP	O_NOCTTY
> >  
> >  #endif /* ARCH_HAS_SOCKET_TYPES */
> >  
> > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> > index 472f8aa9ea15..a9d1c9ba2961 100644
> > --- a/net/unix/af_unix.c
> > +++ b/net/unix/af_unix.c
> > @@ -85,10 +85,13 @@
> >  #include <linux/file.h>
> >  #include <linux/filter.h>
> >  #include <linux/fs.h>
> > +#include <linux/fs_struct.h>
> >  #include <linux/init.h>
> >  #include <linux/kernel.h>
> >  #include <linux/mount.h>
> >  #include <linux/namei.h>
> > +#include <linux/net.h>
> > +#include <linux/pidfs.h>
> >  #include <linux/poll.h>
> >  #include <linux/proc_fs.h>
> >  #include <linux/sched/signal.h>
> > @@ -100,7 +103,6 @@
> >  #include <linux/splice.h>
> >  #include <linux/string.h>
> >  #include <linux/uaccess.h>
> > -#include <linux/pidfs.h>
> >  #include <net/af_unix.h>
> >  #include <net/net_namespace.h>
> >  #include <net/scm.h>
> > @@ -1146,7 +1148,7 @@ static int unix_release(struct socket *sock)
> >  }
> >  
> >  static struct sock *unix_find_bsd(struct sockaddr_un *sunaddr, int addr_len,
> > -				  int type)
> > +				  int type, unsigned int flags)
>   				      	    ^^^
> nit: int flags

done

> 
> 
> >  {
> >  	struct inode *inode;
> >  	struct path path;
> > @@ -1154,13 +1156,38 @@ static struct sock *unix_find_bsd(struct sockaddr_un *sunaddr, int addr_len,
> >  	int err;
> >  
> >  	unix_mkname_bsd(sunaddr, addr_len);
> > -	err = kern_path(sunaddr->sun_path, LOOKUP_FOLLOW, &path);
> > -	if (err)
> > -		goto fail;
> >  
> > -	err = path_permission(&path, MAY_WRITE);
> > -	if (err)
> > -		goto path_put;
> > +	if (flags & SOCK_COREDUMP) {
> > +		struct path root;
> > +		struct cred *kcred;
> > +		const struct cred *cred;
> 
> nit: please keep these in the reverse xmas tree order.
> https://docs.kernel.org/process/maintainer-netdev.html#local-variable-ordering-reverse-xmas-tree-rcs

Done. I keep forgetting this. Another decade and maybe I'll remember.
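
For readers following the quoted format_corename() hunk, the prefix
handling can be approximated in userspace as below. This is only a
sketch: enum core_mode and classify_core_pattern() are hypothetical
names that do not exist in the patch, and treating an empty pattern as
invalid is a simplification made here, not something the patch states.

```c
#include <stddef.h>

/* Hypothetical userspace mirror of the kernel's enum coredump_type_t. */
enum core_mode {
	CORE_MODE_INVALID = 0,
	CORE_MODE_FILE = 1,
	CORE_MODE_PIPE = 2,
	CORE_MODE_SOCK = 3,
};

/*
 * Classify a core_pattern string by its first character, following the
 * quoted hunk: '|' selects the pipe helper, '@' selects the AF_UNIX
 * coredump socket, anything else is treated as a file path.
 */
enum core_mode classify_core_pattern(const char *pattern)
{
	/* Assumption of this sketch: NULL or empty patterns are invalid. */
	if (pattern == NULL || pattern[0] == '\0')
		return CORE_MODE_INVALID;

	switch (pattern[0]) {
	case '|':
		/* The kernel rejects a pipe pattern with no helper path. */
		return pattern[1] != '\0' ? CORE_MODE_PIPE : CORE_MODE_INVALID;
	case '@':
		/* The patch requires an absolute path after the '@'. */
		return pattern[1] == '/' ? CORE_MODE_SOCK : CORE_MODE_INVALID;
	default:
		return CORE_MODE_FILE;
	}
}
```

A tool that writes /proc/sys/kernel/core_pattern could run such a check
first, so that a relative socket path is rejected before it reaches the
kernel.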

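The server side described in the quoted commit message (listen on the
socket, fetch SO_PEERPIDFD for each connection, mark the server
non-dumpable) might look roughly like the helpers below. This is a
sketch under assumptions, not the series' selftest code: all function
names are hypothetical, the fallback value 77 for SO_PEERPIDFD is the
asm-generic one and differs on some architectures, and error handling
is kept minimal.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#ifndef SO_PEERPIDFD
#define SO_PEERPIDFD 77 /* asm-generic value; assumption for older headers */
#endif

/*
 * Fill @addr for an absolute socket path and return the address length,
 * computed like the patch does (offset of sun_path + strlen + 1).
 * Returns -1 if the path is relative or does not fit into sun_path.
 */
int coredump_sockaddr(struct sockaddr_un *addr, const char *path)
{
	size_t len = strlen(path);

	if (path[0] != '/' || len >= sizeof(addr->sun_path))
		return -1;

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	memcpy(addr->sun_path, path, len + 1);
	return (int)(offsetof(struct sockaddr_un, sun_path) + len + 1);
}

/* Create, bind and listen on the coredump socket; returns the fd or -1. */
int coredump_listen(const char *path, int backlog)
{
	struct sockaddr_un addr;
	int addr_len = coredump_sockaddr(&addr, path);
	int fd;

	if (addr_len < 0) {
		errno = EINVAL;
		return -1;
	}

	/* The server should not produce coredumps of its own. */
	if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) < 0)
		return -1;

	fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
	if (fd < 0)
		return -1;

	if (bind(fd, (struct sockaddr *)&addr, (socklen_t)addr_len) < 0 ||
	    listen(fd, backlog) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Return a pidfd for the crashing task behind @conn, or -1 on error. */
int coredump_peer_pidfd(int conn)
{
	int pidfd = -1;
	socklen_t len = sizeof(pidfd);

	if (getsockopt(conn, SOL_SOCKET, SO_PEERPIDFD, &pidfd, &len) < 0)
		return -1;
	return pidfd;
}
```

A server would accept() on the listening fd, call
coredump_peer_pidfd() on the accepted connection, read the dump until
EOF and then close the connection. Per the quoted do_coredump() hunk,
with core_pipe_limit set the crashing task blocks in a one-byte read
until the server closes the socket or sends data.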