From: Ingo Molnar <mingo@kernel.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: mingo@elte.hu, linux-kernel@vger.kernel.org,
josh@joshtriplett.org, tglx@linutronix.de, sbw@mit.edu
Subject: Re: [GIT PULL rcu/urgent] Fix for RCU-related hang
Date: Wed, 27 Jun 2012 07:33:07 +0200
Message-ID: <20120627053307.GA14913@gmail.com>
In-Reply-To: <20120625223940.GA17159@linux.vnet.ibm.com>
* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> Hello, Ingo,
>
> This series has a single patch that fixes a system hang that can occur
> in perhaps unusual but very real circumstances. This hang occurs
> because of a very stupid bug of mine introduced in commit b1420f1c
> (Make rcu_barrier() less disruptive) that can cause CPUs to miscount
> RCU callbacks. The sequence of events leading to the hang is as follows:
>
> 1.  A CPU miscounts its callbacks.
> 2.  That CPU invokes all of its callbacks, so that its callback
>     list is empty, but the callback count is nonzero.
> 3.  That CPU goes offline. Because its callback list is empty,
>     RCU's CPU-hotplug CPU_DEAD notifiers leave both the list and
>     the count alone. (In contrast, had the list been non-empty,
>     RCU's CPU_DEAD notifiers would have emptied the list and
>     zeroed the count.)
> 4.  One of the remaining CPUs executes one of the rcu_barrier()
>     family of primitives. The rcu_barrier() primitive notes
>     that the offline CPU has a non-zero count of callbacks, and
>     therefore hangs waiting for this count to reach zero. The
>     theory behind the indefinite wait is that the only reason that
>     an offline CPU can have a non-zero number of RCU callbacks is
>     that the CPU's CPU_DEAD notifiers have not yet executed.
>     But they already have executed, so the offlined CPU's callback
>     count will remain non-zero until it is brought back online,
>     in other words, perhaps never.
>
> However, this bug is likely to pass a combined rcutorture/CPU-hotplug
> stress test because offlined CPUs tend to be brought back online
> reasonably quickly. For the rcutorture tests to fail, the system must be
> in the state indicated by step #3 above at the time the "rmmod rcutorture"
> executes.
>
> The fix is simply to prevent the miscounting.
>
> This change is available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> Paul E. McKenney (1):
> rcu: Stop rcu_do_batch() from multiplexing the "count" variable
>
> kernel/rcutree.c | 14 +++++++-------
> 1 files changed, 7 insertions(+), 7 deletions(-)
Pulled, thanks Paul!
Ingo
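
For readers following the four-step sequence in the quoted message, here is
a minimal userspace sketch of how a stale count can outlive an empty list.
It is a toy model only, not the code in kernel/rcutree.c: the struct, the
toy_* function names, and the "miscount" parameter are all hypothetical and
exist purely to mirror the steps as described.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; field names are illustrative, not the kernel's. */
struct toy_cpu {
	int list_len;	/* callbacks actually queued on the list */
	int count;	/* separately maintained callback count */
	bool online;
};

/*
 * Steps 1 and 2: invoke every queued callback. The (assumed) 'miscount'
 * argument models the bug by leaving the count that much too high.
 */
static void toy_invoke_all(struct toy_cpu *cpu, int miscount)
{
	assert(cpu->list_len >= 0);
	cpu->count -= cpu->list_len;
	cpu->count += miscount;
	cpu->list_len = 0;
}

/*
 * Step 3: offline handling as described in the message. The list and the
 * count are reset only when the list is non-empty.
 */
static void toy_cpu_dead(struct toy_cpu *cpu)
{
	cpu->online = false;
	if (cpu->list_len != 0) {
		cpu->list_len = 0;
		cpu->count = 0;
	}
}

/*
 * Step 4: a barrier that waits for an offline CPU's count to reach zero
 * can never finish if the list is already empty and the count is stale.
 */
static bool toy_barrier_would_hang(const struct toy_cpu *cpu)
{
	return !cpu->online && cpu->list_len == 0 && cpu->count != 0;
}

/* Run steps 1 through 4 for one CPU and report whether the wait would hang. */
static bool toy_scenario(int queued, int miscount)
{
	struct toy_cpu cpu = { .list_len = queued, .count = queued, .online = true };

	toy_invoke_all(&cpu, miscount);
	toy_cpu_dead(&cpu);
	return toy_barrier_would_hang(&cpu);
}
```

In this model, toy_scenario(3, 1) reports a hang while toy_scenario(3, 0)
does not, which matches the point that preventing the miscount is sufficient.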
Thread overview:
2012-06-25 22:39 [GIT PULL rcu/urgent] Fix for RCU-related hang Paul E. McKenney
2012-06-27 5:33 ` Ingo Molnar [this message]