public inbox for netdev@vger.kernel.org
From: Kuniyuki Iwashima <kuniyu@amazon.com>
To: <bluca@debian.org>
Cc: <alexander@mihalicyn.com>, <brauner@kernel.org>,
	<daan.j.demeyer@gmail.com>, <daniel@iogearbox.net>,
	<davem@davemloft.net>, <david@readahead.eu>,
	<edumazet@google.com>, <horms@kernel.org>, <jack@suse.cz>,
	<jannh@google.com>, <kuba@kernel.org>, <kuniyu@amazon.com>,
	<lennart@poettering.net>, <linux-fsdevel@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-security-module@vger.kernel.org>, <me@yhndnzj.com>,
	<netdev@vger.kernel.org>, <oleg@redhat.com>, <pabeni@redhat.com>,
	<viro@zeniv.linux.org.uk>, <zbyszek@in.waw.pl>
Subject: Re: [PATCH v6 4/9] coredump: add coredump socket
Date: Mon, 12 May 2025 19:14:48 -0700
Message-ID: <20250513021626.86287-1-kuniyu@amazon.com>
In-Reply-To: <CAMw=ZnRC7Okmew=rrEocFuFn8hhrcergHciPjxFPuG4c6qH_Bw@mail.gmail.com>

From: Luca Boccassi <bluca@debian.org>
Date: Tue, 13 May 2025 02:09:24 +0100
> On Tue, 13 May 2025 at 01:18, Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Luca Boccassi <bluca@debian.org>
> > Date: Mon, 12 May 2025 11:58:54 +0100
> > > On Mon, 12 May 2025 at 09:56, Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > Coredumping currently supports two modes:
> > > >
> > > > (1) Dumping directly into a file somewhere on the filesystem.
> > > > (2) Dumping into a pipe connected to a usermode helper process
> > > >     spawned as a child of the system_unbound_wq or kthreadd.
> > > >
> > > > For simplicity I'm mostly ignoring (1). There's probably still some
> > > > users of (1) out there but processing coredumps in this way can be
> > > > considered adventurous especially in the face of set*id binaries.
> > > >
> > > > The most common option should be (2) by now. It works by allowing
> > > > userspace to put a string into /proc/sys/kernel/core_pattern like:
> > > >
> > > >         |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
> > > >
> > > > The "|" at the beginning indicates to the kernel that a pipe must be
> > > > used. The path following the pipe indicator is a path to a binary that
> > > > will be spawned as a usermode helper process. Any additional parameters
> > > > pass information about the task that is generating the coredump to the
> > > > binary that processes the coredump.
> > > >
> > > > In the example core_pattern shown above systemd-coredump is spawned as a
> > > > usermode helper. There are various conceptual consequences of this
> > > > (non-exhaustive list):
> > > >
> > > > - systemd-coredump is spawned with file descriptor number 0 (stdin)
> > > >   connected to the read-end of the pipe. All other file descriptors are
> > > >   closed. That specifically includes 1 (stdout) and 2 (stderr). This has
> > > >   already caused bugs because userspace assumed that this cannot happen
> > > >   (Whether or not this is a sane assumption is irrelevant.).
> > > >
> > > > - systemd-coredump will be spawned as a child of system_unbound_wq. So
> > > >   it is not a child of any userspace process and specifically not a
> > > >   child of PID 1. It cannot be waited upon and runs as a weird hybrid
> > > >   upcall, which is difficult for userspace to control correctly.
> > > >
> > > > - systemd-coredump is spawned with full kernel privileges. This
> > > >   necessitates all kinds of weird privilege dropping exercises in
> > > >   userspace to make this safe.
> > > >
> > > > - A new usermode helper has to be spawned for each crashing process.
> > > >
> > > > This series adds a new mode:
> > > >
> > > > (3) Dumping into an abstract AF_UNIX socket.
> > > >
> > > > Userspace can set /proc/sys/kernel/core_pattern to:
> > > >
> > > >         @address SO_COOKIE
> > > >
> > > > The "@" at the beginning indicates to the kernel that the abstract
> > > > AF_UNIX coredump socket will be used to process coredumps. The address
> > > > is given by @address and must be followed by the socket cookie of the
> > > > coredump listening socket.
> > > >
> > > > The socket cookie is used to verify the socket connection. If the
> > > > coredump server restarts or crashes and someone recycles the socket
> > > > address the kernel will detect that the address has been recycled as the
> > > > socket cookie will have necessarily changed and refuse to connect.
> > >
> > > This dynamic/cookie prefix makes it impossible to use this with socket
> > > activation units. The way systemd-coredump works is that every
> > > instance is an independent templated unit, spawned when there's a
> > > connection to the private socket. If the path was fixed, we could just
> > > reuse the same mechanism, it would fit very nicely with minimal
> > > changes.
> >
> > Note this version does not use prefix.  Now it requires users to
> > just pass the socket cookie via core_pattern so that the kernel
> > can verify the peer.
> 
> Exactly - this means the pattern cannot be static in a sysctl.d early
> on boot anymore, and has to be set dynamically by <something>.

You missed that the socket also has to be created dynamically by <something>.
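
To make the ordering concrete, here is a rough userspace sketch, not code
from the series. It assumes the "@address SO_COOKIE" format from the quoted
commit message and that SO_COOKIE is 57, which holds on asm-generic
architectures but not on all of them. The helper names are made up for
illustration, and writing the result to /proc/sys/kernel/core_pattern is
left out because it needs privileges.

```python
import socket
import struct

# Assumption: 57 is the asm-generic value of SO_COOKIE; some architectures
# (e.g. sparc, parisc) use a different number.
SO_COOKIE = getattr(socket, "SO_COOKIE", 57)


def make_listener(name: str) -> socket.socket:
    """Create a listening socket in the abstract AF_UNIX namespace."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind("\0" + name)  # a leading NUL byte selects the abstract namespace
    sock.listen(16)
    return sock


def socket_cookie(sock: socket.socket) -> int:
    """Read the 64-bit socket cookie the kernel assigned to this socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, SO_COOKIE, 8)
    return struct.unpack("=Q", raw)[0]


def core_pattern_for(name: str, cookie: int) -> str:
    """Format the string as described in the quoted commit message."""
    return f"@{name} {cookie}"
```

The cookie only exists once the socket does, so whoever writes core_pattern
has to run after the socket is created, e.g.
core_pattern_for(name, socket_cookie(make_listener(name))).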


> This is
> a severe degradation over the status quo.
> 
> > > But because you need a "server" to be permanently running, this means
> > > socket-based activation can no longer work, and systemd-coredump must
> > > switch to a persistently-running mode.
> >
> > The only thing for systemd to do is assign a cookie after socket creation.
> >
> > As long as systemd holds the file descriptor of the socket, you don't need
> > a dedicated "server" running permanently, and the fd can be passed around
> > to a spawned/activated process.
> 
> There is no such facility, a socket is just a socket and there's no
> infrastructure to randomly extract random information from one and
> write it to some other random file in procfs,

Since only one socket can be registered in core_pattern, the socket
cannot be a random one.


> and I don't see why we
> should add some super-special-case just for this,

Because this is a new special use case.


> it sounds really
> messy.
> Also sockets can be and in fact are routinely restarted (eg: on
> package upgrades), which would invalidate this whole scheme, and
> result in a very racy setup. When packages are upgraded it's one of
> the most complex workflows in modern distros, and it's very likely
> that things start crashing exactly at that point, and with this
> workflow it would mean we'll lose core files due to the race between
> restarting the socket unit and <something> updating the pattern
> accordingly.

Looks like you misunderstood the series.

Since you need to specify the socket in core_pattern, only one socket
can receive core data, and that has been the case in every version of
the series.

kernel_connect() does not connect() to a random socket among those
that share a common prefix.

That's why BPF was mentioned in the previous cover letter:

- Since unix_stream_connect() runs bpf programs during connect it's
  possible to even redirect or multiplex coredumps to other sockets.


> Also we very much want to be able to spawn as many core handlers at
> the same time as needed, which I don't see how can work with a cookie
> that has to be unique per socket.

As said, you can just pass the fd of the coredump listener or an fd
accept()ed from the listener, depending on how you want to handle
this in userspace.
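
A minimal sketch of that hand-off, under the same assumption as before
(SO_COOKIE taken to be 57) and with hypothetical helper names. It passes the
listener fd over an AF_UNIX channel with SCM_RIGHTS and shows that the
receiver sees the same cookie, because the cookie belongs to the socket and
not to a particular fd or process. It needs Python 3.9+ for
socket.send_fds()/recv_fds().

```python
import socket
import struct

# Assumption: 57 is the asm-generic value of SO_COOKIE.
SO_COOKIE = getattr(socket, "SO_COOKIE", 57)


def socket_cookie(sock: socket.socket) -> int:
    """Read the 64-bit socket cookie the kernel assigned to this socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, SO_COOKIE, 8)
    return struct.unpack("=Q", raw)[0]


def hand_off(channel: socket.socket, listener: socket.socket) -> None:
    """Send the listener fd to whoever holds the other end of channel."""
    socket.send_fds(channel, [b"L"], [listener.fileno()])


def take_over(channel: socket.socket) -> socket.socket:
    """Receive one fd from channel and wrap it in a socket object."""
    _data, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    return socket.socket(fileno=fds[0])
```

The same pattern works for an fd returned by accept() if you prefer to hand
individual coredump connections to per-crash workers.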

Thread overview: 17+ messages
2025-05-12  8:55 [PATCH v6 0/9] coredump: add coredump socket Christian Brauner
2025-05-12  8:55 ` [PATCH v6 1/9] coredump: massage format_corname() Christian Brauner
2025-05-12  8:55 ` [PATCH v6 2/9] coredump: massage do_coredump() Christian Brauner
2025-05-12  8:55 ` [PATCH v6 3/9] coredump: reflow dump helpers a little Christian Brauner
2025-05-12  8:55 ` [PATCH v6 4/9] coredump: add coredump socket Christian Brauner
2025-05-12 10:58   ` Luca Boccassi
2025-05-13  0:17     ` Kuniyuki Iwashima
2025-05-13  1:09       ` Luca Boccassi
2025-05-13  2:14         ` Kuniyuki Iwashima [this message]
2025-05-13  8:56           ` Lennart Poettering
2025-05-13 12:08             ` Christian Brauner
2025-05-13  0:06   ` Kuniyuki Iwashima
2025-05-12  8:55 ` [PATCH v6 5/9] pidfs, coredump: add PIDFD_INFO_COREDUMP Christian Brauner
2025-05-12  8:55 ` [PATCH v6 6/9] coredump: show supported coredump modes Christian Brauner
2025-05-12  8:55 ` [PATCH v6 7/9] coredump: validate socket name as it is written Christian Brauner
2025-05-12  8:55 ` [PATCH v6 8/9] selftests/pidfd: add PIDFD_INFO_COREDUMP infrastructure Christian Brauner
2025-05-12  8:55 ` [PATCH v6 9/9] selftests/coredump: add tests for AF_UNIX coredumps Christian Brauner
