From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Sasha Levin <levinsasha928@gmail.com>
Cc: Michael Wang <wangyun@linux.vnet.ibm.com>,
Dave Jones <davej@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: RCU idle CPU detection is broken in linux-next
Date: Fri, 21 Sep 2012 08:12:03 -0700
Message-ID: <20120921151203.GA2454@linux.vnet.ibm.com>
In-Reply-To: <505C6B03.7020305@gmail.com>

On Fri, Sep 21, 2012 at 03:26:27PM +0200, Sasha Levin wrote:
> On 09/21/2012 02:13 PM, Paul E. McKenney wrote:
> >> This might be unrelated, but I got the following dump as well when trinity
> >> decided it's time to reboot my guest:
> >
> > OK, sounds like we should hold off until you reproduce, then.
>
> I'm not sure what you mean.
>
> There are basically two issues I'm seeing now, which reproduce pretty much every
> time:
>
> 1. The "using when idle" warning.
> 2. The rcu related hangs during shutdown.
>
> The first one appears early on when I start fuzzing, the other one happens when
> shutting down - so both of them are reproducible in the same session.

Ah, I misunderstood the "reboot my guest" -- I thought that you were
doing something like repeated modprobe/rmmod cycles on rcutorture while
running the guest for an extended time period.  That will teach me not
to reply to email so soon after waking up.  ;-)

That said, #2 is expected behavior given the RCU CPU stall warnings in
your Sept. 20 dmesg.  This is because rcutorture does rcu_barrier() on
the way out, which cannot complete if grace periods are not completing.
And the later soft lockup is also likely a consequence of the stall,
because CPU hotplug does a synchronize_sched() while holding the hotplug
lock, which will then cause get_online_cpus() to hang.

Looking further down, there are hung tasks that are waiting for a
timeout, but this is also a consequence of the hang because they
are waiting for MAX_SCHEDULE_TIMEOUT -- in other words, they are
waiting to be killed at shutdown time.  I could suppress this by using
schedule_timeout_interruptible() in a loop in order to reduce the noise
in this case.

The remaining traces in that email are also consequences of the stall.

So why the stall?

Using RCU from a CPU that RCU believes to be idle can cause arbitrary
bad behavior (possibly including stalls), but with very low probability.
The reason that things can go arbitrarily bad is that RCU is ignoring
the CPU, and thus not waiting for any RCU read-side critical sections
running on it.  This could of course result in arbitrary corruption of
memory.  The reason for the low probability is that grace periods tend
to be long and RCU read-side critical sections tend to be short.
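
To make the "RCU is ignoring the CPU" point concrete, here is a rough
userspace toy model.  It is only a sketch and not how the kernel's RCU
is implemented: struct toy_cpu, toy_gp_can_end(), and
toy_unprotected_readers() are hypothetical names made up for this
illustration.  It shows only that a grace period which skips CPUs
believed to be idle can end while one of them is still inside a
read-side critical section.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
struct toy_cpu {
	bool rcu_thinks_idle;	/* what the toy RCU believes about this CPU */
	bool in_read_section;	/* what the CPU is actually doing */
};

/*
 * A toy grace period may end once every CPU that RCU is watching has
 * left its read-side critical section.  CPUs believed to be idle are
 * skipped entirely.
 */
static bool toy_gp_can_end(const struct toy_cpu *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].rcu_thinks_idle)
			continue;	/* ignored: nobody waits for this CPU */
		if (cpus[i].in_read_section)
			return false;	/* must wait for this reader */
	}
	return true;
}

/* Count readers that a grace period ending now would not wait for. */
static size_t toy_unprotected_readers(const struct toy_cpu *cpus, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (cpus[i].rcu_thinks_idle && cpus[i].in_read_section)
			count++;
	return count;
}
```

In this model, a CPU with both flags set is exactly the "using when
idle" case: toy_gp_can_end() reports that the grace period may end,
yet toy_unprotected_readers() is nonzero, so memory could be freed out
from under that reader.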

It looks like you are running -next, which has RCU grace periods driven
by a kthread.  Is it possible that this kthread is not getting a chance
to run?  The "Stall ended before state dump start" message is consistent
with that possibility, but in that case I would expect to see a soft
lockup from it.  Furthermore, in that case, it would be expected to
start running again as soon as things started going idle during shutdown.
Or did the system somehow manage to stay busy despite being in shutdown?
Or, for that matter, are you overcommitting the physical CPUs on your
trinity test setup?

							Thanx, Paul