From: Zhouyi Zhou <zhouzhouyi@gmail.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: rcu <rcu@vger.kernel.org>,
Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
Nicholas Piggin <npiggin@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: rcu_sched self-detected stall on CPU
Date: Fri, 8 Apr 2022 18:02:19 +0800
Message-ID: <CAABZP2wVAzybDTjUWxwGG4HmWK7V8rJVVFxpRx-3F9n5oST3+A@mail.gmail.com>
In-Reply-To: <87pmls6nt7.fsf@mpe.ellerman.id.au>
On Fri, Apr 8, 2022 at 3:23 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> "Paul E. McKenney" <paulmck@kernel.org> writes:
> > On Wed, Apr 06, 2022 at 05:31:10PM +0800, Zhouyi Zhou wrote:
> >> Hi
> >>
> >> I can reproduce it in a ppc virtual cloud server provided by Oregon
> >> State University. Following is what I do:
> >> 1) curl -l https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/snapshot/linux-5.18-rc1.tar.gz
> >> -o linux-5.18-rc1.tar.gz
> >> 2) tar zxf linux-5.18-rc1.tar.gz
> >> 3) cp config linux-5.18-rc1/.config
> >> 4) cd linux-5.18-rc1
> >> 5) make vmlinux -j 8
> >> 6) qemu-system-ppc64 -kernel vmlinux -nographic -vga none -no-reboot
> >> -smp 2 (QEMU 4.2.1)
> >> 7) after 12 rounds, the bug got reproduced:
> >> (http://154.223.142.244/logs/20220406/qemu.log.txt)
> >
> > Just to make sure, are you both seeing the same thing? Last I knew,
> > Zhouyi was chasing an RCU-tasks issue that appears only in kernels
> > built with CONFIG_PROVE_RCU=y, which Miguel does not have set. Or did
> > I miss something?
> >
> > Miguel is instead seeing an RCU CPU stall warning where RCU's grace-period
> > kthread slept for three milliseconds, but did not wake up for more than
> > 20 seconds. This kthread would normally have awakened on CPU 1, but
> > CPU 1 looks to me to be very unhealthy, as can be seen in your console
> > output below (but maybe my idea of what is healthy for powerpc systems
> > is outdated). Please see also the inline annotations.
> >
> > Thoughts from the PPC guys?
>
> I haven't seen it in my testing. But using Miguel's config I can
> reproduce it seemingly on every boot.
>
> For me it bisects to:
>
> 35de589cb879 ("powerpc/time: improve decrementer clockevent processing")
>
> Which seems plausible.
My bisection also lands on 35de589cb879 ("powerpc/time: improve
decrementer clockevent processing").
>
> Reverting that on mainline makes the bug go away.
I have also reverted that commit on mainline and am currently running a
stress test (repeatedly invoking qemu and checking the console log) on
the PPC VM at Oregon State University.
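Such a loop could look roughly like the sketch below. It is not the
script actually used: the 300-second timeout, the default round count,
the log file names and the helper names log_has_stall and run_rounds are
assumptions for illustration. The qemu command line is the one from step
6 of the quoted reproduction steps, with the console redirected to a
file, and the grep pattern is taken from the subject of this thread.

```shell
#!/bin/sh
# Sketch only: timeout, round count and log names are assumptions.

# Return 0 if the given console log contains an RCU CPU stall warning.
log_has_stall() {
    grep -q "self-detected stall" "$1"
}

# Boot the kernel up to $1 times; stop at the first log showing a stall.
run_rounds() {
    rounds=$1
    i=1
    while [ "$i" -le "$rounds" ]; do
        log="console.$i.log"
        # Same qemu invocation as in step 6; killed after 300 s (assumed).
        timeout 300 qemu-system-ppc64 -kernel vmlinux -nographic -vga none \
            -no-reboot -smp 2 > "$log" 2>&1 < /dev/null
        if log_has_stall "$log"; then
            echo "stall reproduced in round $i, see $log"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no stall in $rounds rounds"
    return 0
}

# Only start booting when asked to, e.g.: sh stress.sh run 100
if [ "${1:-}" = "run" ]; then
    run_rounds "${2:-100}"
fi
```

Keeping one log per round means a failing boot can be inspected
afterwards instead of being overwritten by the next one.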
>
> I don't see an obvious bug in the diff, but I could be wrong, or the old
> code was papering over an existing bug?
>
> I'll try and work out what it is about Miguel's config that exposes
> this vs our defconfig, that might give us a clue.
Great job!
>
> cheers
Thanks
Zhouyi