From: Kees Cook <kees@kernel.org>
To: Brian Mak <makb@juniper.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] binfmt_elf: Dump smaller VMAs first in ELF cores
Date: Mon, 5 Aug 2024 13:34:46 -0700
Message-ID: <202408051330.277A639@keescook>
In-Reply-To: <230E81B0-A0BD-44B5-B354-3902DB50D3D0@juniper.net>

On Mon, Aug 05, 2024 at 06:44:44PM +0000, Brian Mak wrote:
> On Aug 5, 2024, at 10:25 AM, Kees Cook <kees@kernel.org> wrote:
> 
> > On Thu, Aug 01, 2024 at 05:58:06PM +0000, Brian Mak wrote:
> >> On Jul 31, 2024, at 7:52 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >>> One practical concern with this approach is that I think the ELF
> >>> specification says that program headers should be written in memory
> >>> order.  So a comment on your testing to see if gdb or rr or any of
> >>> the other debuggers that read core dumps cares would be appreciated.
> >> 
> >> I've already tested readelf and gdb on core dumps (truncated and whole)
> >> with this patch, and both are able to read/use these core dumps in
> >> these scenarios and produce a proper backtrace.
> > 
> > Can you compare the "rr" selftest before/after the patch? They have been
> > the most sensitive to changes to ELF, ptrace, seccomp, etc, so I've
> > tried to double-check "user visible" changes with their tree. :)
> 
> Hi Kees,
> 
> Thanks for your reply!
> 
> Can you please give me some more information on these self tests?
> What/where are they? I'm not too familiar with rr.

I start from here whenever I go through their tests:

https://github.com/rr-debugger/rr/wiki/Building-And-Installing#tests
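
Tangentially, for anyone who wants to double-check Eric's ordering
concern by hand: below is a minimal sketch of a checker that reports
whether a core's PT_LOAD program headers stay in ascending
virtual-address order. It assumes a 64-bit, native-endian core and does
only minimal error handling; it's an illustration, not part of the
patch:

#include <elf.h>
#include <stdio.h>

int main(int argc, char **argv)
{
	FILE *f;
	Elf64_Ehdr eh;
	Elf64_Phdr ph;
	Elf64_Addr prev = 0;
	int i;

	if (argc < 2 || !(f = fopen(argv[1], "rb")))
		return 1;
	/* Read the ELF header and sanity-check the phdr entry size. */
	if (fread(&eh, sizeof(eh), 1, f) != 1 || eh.e_phentsize != sizeof(ph))
		return 1;
	if (fseek(f, eh.e_phoff, SEEK_SET))
		return 1;
	for (i = 0; i < eh.e_phnum; i++) {
		if (fread(&ph, sizeof(ph), 1, f) != 1)
			return 1;
		if (ph.p_type != PT_LOAD)
			continue;
		/* Flag any PT_LOAD whose vaddr goes backwards. */
		if (ph.p_vaddr < prev)
			printf("PT_LOAD out of order at phdr %d\n", i);
		prev = ph.p_vaddr;
	}
	return 0;
}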


> > And those VMAs weren't thread stacks?
> 
> Admittedly, I did do all of this exploration months ago, and only have
> my notes to go off of here, but no, they should not have been thread
> stacks since I had pulled all of them in during a "first pass".

Okay, cool. I suspect you'd already explored that, but I wanted to be
sure we didn't have an "easy to explain" solution. ;)

> > It does also feel like part of the overall problem is that systemd
> > doesn't have a way to know the process is crashing, and then creates the
> > truncation problem. (i.e. we're trying to use the kernel to work around
> > a visibility issue in userspace.)
> 
> Even if systemd had visibility into the fact that a crash is happening,
> there's not much systemd can do in some circumstances. In applications
> with strict time to recovery limits, the process needs to restart within
> a certain time limit. We run into the same issue I raised in my last
> reply on this thread: to keep the core dump intact and
> recover, we either need to start up a new process while the old one is
> core dumping, or wait until core dumping is complete to restart.
> 
> If we start up a new process while the old one is core dumping, we risk
> system stability in applications with a large memory footprint since we
> could run out of memory from the duplication of memory consumption. If
> we wait until core dumping is complete to restart, we're in the same
> scenario as before with the core being truncated or we miss recovery
> time objectives by waiting too long.
> 
> For this reason, I wouldn't say we're using the kernel to work around a
> visibility issue or that systemd is creating the truncation problem, but
> rather that the issue exists due to limitations in how we're truncating
> cores. That being said, there might be some use in this type of
> visibility for others with less strict recovery time objectives or
> applications with a lower memory footprint.

Yeah, this is interesting. This effectively makes the coredumping
activity rather "critical path": the replacement process can't start
until the dump has finished... hmm. It feels like there should be a way
to move the dumping process aside, but with all the VMAs still live, I
can see how this might go weird. I'll think some more about this...
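
For reference, the mechanism under discussion boils down to sorting the
snapshotted VMA metadata by dump size before the data segments are
written, so that truncation cuts off the largest mappings and the
small, backtrace-critical VMAs survive. A sketch of that idea, not the
patch as posted, assuming the cprm->vma_meta array that
dump_vma_snapshot() fills in fs/coredump.c:

#include <linux/coredump.h>
#include <linux/sort.h>

/* Compare two VMA snapshot entries by how many bytes they'd dump. */
static int cmp_vma_dump_size(const void *a, const void *b)
{
	const struct core_vma_metadata *x = a, *y = b;

	if (x->dump_size < y->dump_size)
		return -1;
	return x->dump_size > y->dump_size;
}

/* Reorder the snapshot so the smallest VMAs are written first. */
static void sort_vmas_smallest_first(struct coredump_params *cprm)
{
	sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta),
	     cmp_vma_dump_size, NULL);
}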

-- 
Kees Cook

Thread overview: 9 messages
2024-07-31 22:14 [RFC PATCH] binfmt_elf: Dump smaller VMAs first in ELF cores Brian Mak
2024-08-01  2:52 ` Eric W. Biederman
2024-08-01 17:58   ` Brian Mak
2024-08-02 16:16     ` Eric W. Biederman
2024-08-02 17:46       ` Brian Mak
2024-08-03  3:08         ` Eric W. Biederman
2024-08-05 17:25     ` Kees Cook
2024-08-05 18:44       ` Brian Mak
2024-08-05 20:34         ` Kees Cook [this message]
