From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Fiona Ebner <f.ebner@proxmox.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Oleg Nesterov <oleg@redhat.com>
Cc: akpm@linux-foundation.org,
Wolfgang Bumiller <w.bumiller@proxmox.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: segfaults of processes while being killed after commit "mm: make the page fault mmap locking killable"
Date: Wed, 26 Jul 2023 08:51:24 +0200
Message-ID: <85876d36-ca1f-4ba4-9065-4e7fc58329c0@proxmox.com>
In-Reply-To: <CAHk-=whKBx_UUKagfyF72EJrpqNCupF4yeoPgapjEBe1bynGcw@mail.gmail.com>
On 25/07/2023 18:38, Linus Torvalds wrote:
> But before we revert it, would you mind trying out the attached
> trivial patch instead?
Not Fiona, but as I was still online yesterday I already got around to
trying that patch out, after adding the missing `tsk` task_struct param
to the fatal_signal_pending call.
With the patched kernel booted, the original case we found in the wild
went from logging a segfault roughly twice per hour to none at all,
and that over a bit more than 10h of uptime.
Fiona might have a more definitive confirmation, as IIRC she has a
better (= faster) reproducer that she used for bisecting.
>
> I'd also still be interested if the symptoms were anything else than
> 'show_unhandled_signals' causing the show_signal_msg() dance, and
> resulting in a message something like
>
> a.out[1567]: segfault at xyz ip [..] likely on CPU X
>
> in dmesg...
Exactly, it was just like that, with no actual fallout. The messages
were like:
> pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0)
And the slightly odd code triggering this was basically a fork, where
the child wrote a message to the parent via a unix socket pair and
then called exit. The parent read that message and then sent a SIGKILL
to the child process, i.e., the child's exit and the parent's kill of
the child process would be pretty closely aligned, basically racing
with each other.
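For illustration, that pattern could be sketched in C roughly as below.
This is not the original code (that lives in Perl, in pverados); the
names race_once and outcome_is_benign and the message contents are made
up for this sketch, and I have not confirmed that it triggers the dmesg
message on an affected kernel.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * One round of the exit-vs-SIGKILL race (hypothetical helper).
 * Returns the child's raw wait status, or -1 on any setup error.
 */
static int race_once(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: tell the parent we are done, then exit right away. */
        static const char msg[] = "done"; /* placeholder payload */
        close(sv[0]);
        if (write(sv[1], msg, sizeof(msg)) < 0)
            _exit(1);
        exit(0);
    }

    /* Parent: wait for the message, then immediately kill the child. */
    char buf[16];
    close(sv[1]);
    ssize_t n = read(sv[0], buf, sizeof(buf));
    close(sv[0]);

    /* The child may be mid-exit or already a zombie at this point. */
    kill(pid, SIGKILL);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return n > 0 ? status : -1;
}

/* Either side of the race may win: clean exit or death by SIGKILL. */
static int outcome_is_benign(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 0;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

Calling race_once() in a loop while watching dmesg would be the obvious
way to try this out; either wait status is expected, depending on which
side wins the race.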
cheers,
Thomas