linux-mm.kvack.org archive mirror
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Fiona Ebner <f.ebner@proxmox.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Oleg Nesterov <oleg@redhat.com>
Cc: akpm@linux-foundation.org,
	Wolfgang Bumiller <w.bumiller@proxmox.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: segfaults of processes while being killed after commit "mm: make the page fault mmap locking killable"
Date: Wed, 26 Jul 2023 08:51:24 +0200
Message-ID: <85876d36-ca1f-4ba4-9065-4e7fc58329c0@proxmox.com>
In-Reply-To: <CAHk-=whKBx_UUKagfyF72EJrpqNCupF4yeoPgapjEBe1bynGcw@mail.gmail.com>

On 25/07/2023 18:38, Linus Torvalds wrote:
> But before we revert it, would you mind trying out the attached
> trivial patch instead?

Not Fiona, but as I was still online yesterday I already got around to
trying that patch out, after adding the missing `tsk` task_struct param
to the fatal_signal_pending call.
With the patched kernel booted, the original case we found in the wild
went from logging a segfault roughly twice per hour to none at all,
and that over a bit more than 10h of uptime.
Fiona might have a more definitive confirmation, as IIRC she has a
better (= faster) reproducer that she used for bisecting.

> 
> I'd also still be interested if the symptoms were anything else than
> 'show_unhandled_signals' causing the show_signal_msg() dance, and
> resulting in a message something like
> 
>     a.out[1567]: segfault at xyz ip [..] likely on CPU X
> 
> in dmesg...

Exactly, it was just like that, with no actual fallout. The messages
were like:

> pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0)
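
For readers less used to these lines, here is a rough sketch of how the
fields can be decoded. It is written in Python purely for illustration
and assumes the x86 format, where the kernel prints the error code in
hexadecimal and its bits follow the X86_PF_* flags; the function name
and the returned dictionary layout are made up for this example.

```python
import re

# x86 page-fault error-code bits (descriptions of the kernel's X86_PF_* flags).
PF_BITS = {
    0: "protection violation",
    1: "write access",
    2: "user mode",
    3: "reserved bit set",
    4: "instruction fetch",
}

LINE = (
    "pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 "
    "sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000]"
)


def decode_segfault(line):
    """Split a 'segfault at ...' dmesg line into its numeric fields.

    All four values are parsed as hexadecimal, the error code included.
    """
    m = re.search(
        r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) error ([0-9a-f]+)",
        line,
    )
    if m is None:
        raise ValueError("not a segfault line")
    addr, ip, sp, error = (int(group, 16) for group in m.groups())
    flags = [name for bit, name in PF_BITS.items() if error & (1 << bit)]
    if not error & 1:
        # Bit 0 clear means the page was not present at all.
        flags.insert(0, "page not present")
    return {"addr": addr, "ip": ip, "sp": sp, "error": error, "flags": flags}


if __name__ == "__main__":
    info = decode_segfault(LINE)
    print(hex(info["error"]), info["flags"], info["addr"] == info["ip"])
```

Under those assumptions, "error 14" reads as 0x14: a user-mode
instruction fetch from a page that was not present, which fits the
faulting address being identical to the instruction pointer.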

And the slightly odd code triggering this was basically a fork, where
the child wrote a message to the parent via a unix socket pair and
then called exit. The parent read that message and then sent a SIGKILL
to the child process, i.e., the child's exit and the parent's kill of
the child would be pretty closely aligned, basically racing with
each other.
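
The pattern described above can be sketched roughly as follows. This is
not the actual Perl code from pverados, just a minimal Python
approximation of the same sequence; the function name and the message
contents are made up for illustration, and a single run is unlikely to
hit the race on its own.

```python
import os
import signal
import socket


def race_exit_against_sigkill():
    """Fork a child that reports to the parent and exits, then SIGKILL it.

    Returns the message the parent received and the child's wait status.
    """
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child: tell the parent we are done, then exit immediately.
        parent_sock.close()
        child_sock.sendall(b"done")
        os._exit(0)

    # Parent: wait for the message, then kill the child, which is most
    # likely already in the middle of exiting at this point.
    child_sock.close()
    msg = parent_sock.recv(16)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # defensive only; an unreaped child normally still has its pid
    _, status = os.waitpid(pid, 0)
    parent_sock.close()
    return msg, status


if __name__ == "__main__":
    msg, status = race_exit_against_sigkill()
    if os.WIFSIGNALED(status):
        print("child killed by signal", os.WTERMSIG(status))
    else:
        print("child exited with status", os.WEXITSTATUS(status))
```

Depending on who wins, the child is reported either as having exited
normally or as killed by SIGKILL; running this in a loop on an affected
kernel would be the way to look for the dmesg message.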

cheers,
 Thomas



Thread overview: 6+ messages
2023-07-25 11:16 segfaults of processes while being killed after commit "mm: make the page fault mmap locking killable" Fiona Ebner
2023-07-25 16:38 ` Linus Torvalds
2023-07-26  6:51   ` Thomas Lamprecht [this message]
2023-07-26  8:19   ` Fiona Ebner
2023-07-26 17:59     ` Linus Torvalds
2023-07-27  7:57       ` Fiona Ebner
