From: "Darrick J. Wong" <djwong@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>, Zorro Lang <zlang@redhat.com>,
linux-xfs@vger.kernel.org,
"Eric W. Biederman" <ebiederm@xmission.com>,
Mike Christie <michael.christie@oracle.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: [6.5-rc5 regression] core dump hangs (was Re: [Bug report] fstests generic/051 (on xfs) hang on latest linux v6.5-rc5+)
Date: Mon, 12 Jun 2023 08:36:29 -0700
Message-ID: <20230612153629.GA11427@frogsfrogsfrogs>
In-Reply-To: <CAHk-=whJqZLKPR-cpX-V4wJTXVX-_tG5Vjuj2q9knvKGCPdfkg@mail.gmail.com>
On Sun, Jun 11, 2023 at 08:14:25PM -0700, Linus Torvalds wrote:
> On Sun, Jun 11, 2023 at 7:22 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > I guess the regression fix needs a regression fix....
>
> Yup.
>
> From the description of the problem, it sounds like this happens on
> real hardware, no vhost anywhere?
>
> Or maybe Darrick (who doesn't see the issue) is running on raw
> hardware, and you and Zorro are running in a virtual environment?

Ahah, it turns out that liburing-dev isn't installed on the test fleet,
so fstests didn't get built with io_uring support. That probably
explains why I don't see any of these hangs.

Oh. I can't *install* the Debian liburing-dev package because it has
a versioned dependency on linux-libc-dev >= 5.1, which isn't compatible
with me having a linux-libc-dev-djwong package that contains the uapi
headers for the latest upstream kernel and Replaces: linux-libc-dev.

So either I have to create a dummy linux-libc-dev with an adequate
version number that pulls in my own libc header package, or rename that
package.

<sigh> It's going to take me a while to research how best to split this
stupid knot.

--D

> It sounds like zap_other_threads() and coredump_task_exit() do not
> agree about the core_state->nr_threads counting, which is part of what
> changed there.
>
> [ Goes off to look ]
>
> Hmm. Both seem to be using the same test for
>
> (t->flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER
>
> which I don't love - I don't think io_uring threads should participate
> in core dumping either, so I think the test could just be
>
> (t->flags & PF_IO_WORKER)
>
> but that shouldn't be the issue here.
>
> But according to
>
> https://lore.kernel.org/all/20230611124836.whfktwaumnefm5z5@zlang-mailbox/
>
> it's clearly hanging in wait_for_completion_state() in
> coredump_wait(), so it really looks like some confusion about that
> core_waiters (aka core_state->nr_threads) count.
>
> Oh. Humm. Mike changed that initial rough patch of mine, and I had
> moved the "if you don't participate in core dumps" test up also past
> the "do_coredump()" logic.
>
> And I think it's horribly *wrong* for a thread that doesn't get
> counted for core-dumping to go into do_coredump(), because then it
> will set the "core_state" to possibly be the core-state of the vhost
> thread that isn't even counted.
>
> So *maybe* this attached patch might fix it? I haven't thought very
> deeply about this, but vhost workers most definitely shouldn't call
> do_coredump(), since they are then not counted.
>
> (And again, I think we should just check that PF_IO_WORKER bit, not
> use this more complex test, but that's a separate and bigger change).
>
> Linus
> kernel/signal.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 2547fa73bde5..a1e11ee8537c 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2847,6 +2847,10 @@ bool get_signal(struct ksignal *ksig)
> */
> current->flags |= PF_SIGNALED;
>
> + /* vhost workers don't participate in core dups */
> + if ((current->flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER)
> + goto out;
> +
> if (sig_kernel_coredump(signr)) {
> if (print_fatal_signals)
> print_fatal_signal(ksig->info.si_signo);
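
For anyone following along, here is a minimal sketch of how the two flag tests discussed in the quoted text evaluate for different kinds of threads. It is not kernel code: the function names quoted_test() and simple_test() are hypothetical, the flag values are assumed to match include/linux/sched.h at the time of this thread, and it assumes io_uring workers carry both PF_IO_WORKER and PF_USER_WORKER while vhost workers carry only PF_USER_WORKER.

```c
#include <stdbool.h>

/*
 * Assumed flag values (believed to match include/linux/sched.h around
 * this thread; treat them as illustrative, not authoritative).
 */
#define PF_IO_WORKER   0x00000010u
#define PF_USER_WORKER 0x00004000u

/*
 * The expression quoted from zap_other_threads() and
 * coredump_task_exit(). It is false only when PF_USER_WORKER is set
 * and PF_IO_WORKER is clear, and true for every other combination.
 * Hypothetical helper name, for illustration only.
 */
bool quoted_test(unsigned int flags)
{
	return (flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER;
}

/*
 * The simpler check suggested in the quoted text: true only when
 * PF_IO_WORKER is set. Hypothetical helper name.
 */
bool simple_test(unsigned int flags)
{
	return (flags & PF_IO_WORKER) != 0;
}
```

Under those assumptions, quoted_test() is true for an ordinary thread (neither flag set) and for an io_uring worker (both flags set), and false only for a vhost-style worker. simple_test() is true only for the io_uring worker. Which result means "skip" depends on how each call site uses the expression.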