From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Jiri Kosina <jkosina@suse.cz>
Cc: "Paul E. McKenney" <paul.mckenney@linaro.org>,
Josh Triplett <josh@joshtriplett.org>,
linux-kernel@vger.kernel.org,
"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Subject: Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
Date: Tue, 2 Oct 2012 16:31:38 -0700
Message-ID: <20121002233138.GD2465@linux.vnet.ibm.com>
In-Reply-To: <alpine.LNX.2.00.1210022356370.23544@pobox.suse.cz>

On Tue, Oct 02, 2012 at 11:58:36PM +0200, Jiri Kosina wrote:
> On Tue, 2 Oct 2012, Jiri Kosina wrote:
>
> > > > > 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
> > > > > commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
> > > > > Author: Paul E. McKenney <paul.mckenney@linaro.org>
> > > > > Date: Thu Aug 2 17:43:50 2012 -0700
> > > > >
> > > > > rcu: Remove _rcu_barrier() dependency on __stop_machine()
> > > > >
> > > > > Currently, _rcu_barrier() relies on preempt_disable() to prevent
> > > > > any CPU from going offline, which in turn depends on CPU hotplug's
> > > > > use of __stop_machine().
> > > > >
> > > > > This patch therefore makes _rcu_barrier() use get_online_cpus() to
> > > > > block CPU-hotplug operations. This has the added benefit of removing
> > > > > the need for _rcu_barrier() to adopt callbacks: Because CPU-hotplug
> > > > > operations are excluded, there can be no callbacks to adopt. This
> > > > > commit simplifies the code accordingly.
> > > > >
> > > > > Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
> > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > > Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> > > > > ==
> > > > >
> > > > > is causing lockdep to complain (see the full trace below). I haven't yet
> > > > > had time to analyze what exactly is happening, and probably will not have
> > > > > time to do so until tomorrow, so just sending this as a heads-up in case
> > > > > anyone sees the culprit immediately.
> > > >
> > > > Hmmm... Does the following patch help? It swaps the order in which
> > > > rcu_barrier() acquires the hotplug and rcu_barrier locks.
> > >
> > > It changed the report slightly (see for example the change in possible
> > > unsafe locking scenario, rcu_sched_state.barrier_mutex vanished and it's
> > > now directly about cpu_hotplug.lock). With the patch applied I get
> > >
> > >
> > >
> > > ======================================================
> > > [ INFO: possible circular locking dependency detected ]
> > > 3.6.0-03888-g3f99f3b #145 Not tainted
> >
> > And it really seems valid.

Yep, it sure is.  I wasn't getting the full picture earlier, so please
accept my apologies for the bogus patch.

> > kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which
> > introduces slab_mutex -> cpu_hotplug.lock dependency (through
> > rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).
> >
> > On the other hand, _cpu_up() acquires cpu_hotplug.lock through
> > cpu_hotplug_begin(), and with this lock held cpuup_callback() notifier
> > gets called, which acquires slab_mutex. This gives the reverse dependency,
> > i.e. the deadlock scenario is a valid one.
> >
> > 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because
> > before that, there was no slab_mutex -> cpu_hotplug.lock dependency.
> >
> > Simply put, the commit causes get_online_cpus() to be called with
> > slab_mutex held, which is invalid.
>
> Oh, and it seems to actually trigger in practice.
>
> With HEAD being 974a847e00c, machine suspends nicely. With 974a847e00c +
> your patch, changing the order in which rcu_barrier() acquires hotplug and
> rcu_barrier locks, the machine hangs 100% reliably during suspend, which
> very likely actually is the deadlock described above.

Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug
notifier, which doesn't sit so well with rcu_barrier() trying to exclude
CPU hotplug events.  I could go back to the old approach, but it is
significantly more complex.  I cannot say that I am all that happy
about anyone calling rcu_barrier() from a CPU hotplug notifier because
it doesn't help CPU hotplug latency, but that is a separate issue.
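
For illustration only, the inverted ordering from the quoted analysis
can be modeled with a toy dependency tracker.  This is a userspace
sketch and not lockdep itself: the enum values, reset_deps() and
record_acquire() are made-up names that merely stand in for slab_mutex
and cpu_hotplug.lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes standing in for the two kernel locks. */
enum lock_class { SLAB_MUTEX, CPU_HOTPLUG_LOCK, NR_LOCK_CLASSES };

/* dep[a][b] is true once some path has acquired b while holding a. */
static bool dep[NR_LOCK_CLASSES][NR_LOCK_CLASSES];

/* Forget all recorded dependencies. */
static void reset_deps(void)
{
	memset(dep, 0, sizeof(dep));
}

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCK_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "wanted acquired while held is held".  Returns false, and
 * records nothing, if the new edge would close a cycle.
 */
static bool record_acquire(enum lock_class held, enum lock_class wanted)
{
	bool seen[NR_LOCK_CLASSES] = { false };

	assert(held != wanted);
	if (reachable(wanted, held, seen))
		return false;
	dep[held][wanted] = true;
	return true;
}
```

Recording SLAB_MUTEX -> CPU_HOTPLUG_LOCK (the kmem_cache_destroy()
path) succeeds, after which CPU_HOTPLUG_LOCK -> SLAB_MUTEX (the
_cpu_up() path) is refused as circular.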

But the thing is that rcu_barrier()'s assumptions hold just fine if
either (1) it excludes hotplug operations or (2) it is called from a
hotplug notifier.  Either way, no CPU can go away while rcu_barrier()
is executing.  So the right way to resolve this seems to be to do the
get_online_cpus() only if rcu_barrier() is -not- executing in the context
of a hotplug notifier.  Should be fixable without too much hassle...
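
A rough single-threaded model of that idea might look as follows.
Everything here is an assumption made for illustration: the model_*()
functions and the in_hotplug_notifier flag are hypothetical stand-ins,
not kernel interfaces, and a real version would need a per-task way of
detecting notifier context.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the hotplug read-side reference count. */
static int hotplug_readers;

/* Hypothetical flag: true while a CPU hotplug notifier is running. */
static bool in_hotplug_notifier;

/* Model of get_online_cpus(): block hotplug by taking a reference. */
static void model_get_online_cpus(void)
{
	hotplug_readers++;
}

/* Model of put_online_cpus(): drop the reference taken above. */
static void model_put_online_cpus(void)
{
	assert(hotplug_readers > 0);
	hotplug_readers--;
}

/*
 * Model of the proposed rcu_barrier() behavior: exclude hotplug only
 * when not already running inside a hotplug notifier.  Returns true
 * if this call took the hotplug reference.
 */
static bool model_rcu_barrier(void)
{
	bool took_ref = !in_hotplug_notifier;

	if (took_ref)
		model_get_online_cpus();

	/* ... post callbacks on each CPU and wait for them here ... */

	if (took_ref)
		model_put_online_cpus();
	return took_ref;
}
```

Outside a notifier the model takes and drops the reference; with
in_hotplug_notifier set it skips both, relying on the notifier context
to keep the CPU around.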

							Thanx, Paul