From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Jiri Kosina <jkosina@suse.cz>
Cc: "Paul E. McKenney" <paul.mckenney@linaro.org>,
	Josh Triplett <josh@joshtriplett.org>,
	linux-kernel@vger.kernel.org,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Subject: Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
Date: Tue, 2 Oct 2012 16:31:38 -0700
Message-ID: <20121002233138.GD2465@linux.vnet.ibm.com>
In-Reply-To: <alpine.LNX.2.00.1210022356370.23544@pobox.suse.cz>

On Tue, Oct 02, 2012 at 11:58:36PM +0200, Jiri Kosina wrote:
> On Tue, 2 Oct 2012, Jiri Kosina wrote:
> 
> > > > > 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
> > > > > commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
> > > > > Author: Paul E. McKenney <paul.mckenney@linaro.org>
> > > > > Date:   Thu Aug 2 17:43:50 2012 -0700
> > > > > 
> > > > >     rcu: Remove _rcu_barrier() dependency on __stop_machine()
> > > > >     
> > > > >     Currently, _rcu_barrier() relies on preempt_disable() to prevent
> > > > >     any CPU from going offline, which in turn depends on CPU hotplug's
> > > > >     use of __stop_machine().
> > > > >     
> > > > >     This patch therefore makes _rcu_barrier() use get_online_cpus() to
> > > > >     block CPU-hotplug operations.  This has the added benefit of removing
> > > > >     the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
> > > > >     operations are excluded, there can be no callbacks to adopt.  This
> > > > >     commit simplifies the code accordingly.
> > > > >     
> > > > >     Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
> > > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > >     Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> > > > > ==
> > > > > 
> > > > > is causing lockdep to complain (see the full trace below). I haven't yet 
> > > > > had time to analyze what exactly is happening, and probably will not have 
> > > > > time to do so until tomorrow, so just sending this as a heads-up in case 
> > > > > anyone sees the culprit immediately.
> > > > 
> > > > Hmmm...  Does the following patch help?  It swaps the order in which
> > > > rcu_barrier() acquires the hotplug and rcu_barrier locks.
> > > 
> > > It changed the report slightly (see for example the change in possible 
> > > unsafe locking scenario, rcu_sched_state.barrier_mutex vanished and it's 
> > > now directly about cpu_hotplug.lock). With the patch applied I get
> > > 
> > > 
> > > 
> > > ======================================================
> > > [ INFO: possible circular locking dependency detected ]
> > > 3.6.0-03888-g3f99f3b #145 Not tainted
> > 
> > And it really seems valid. 

Yep, it sure is.  I wasn't getting the full picture earlier, so please
accept my apologies for the bogus patch.

> > kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which 
> > introduces slab_mutex -> cpu_hotplug.lock dependency (through 
> > rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).
> > 
> > On the other hand, _cpu_up() acquires cpu_hotplug.lock through 
> > cpu_hotplug_begin(), and with this lock held cpuup_callback() notifier 
> > gets called, which acquires slab_mutex. This gives the reverse dependency, 
> > i.e. the deadlock scenario is a valid one.
> > 
> > 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because 
> > before that, there was no slab_mutex -> cpu_hotplug.lock dependency.
> > 
> > Simply put, the commit causes get_online_cpus() to be called with 
> > slab_mutex held, which is invalid.
> 
> Oh, and it seems to actually be triggering in practice.
> 
> With HEAD being 974a847e00c, machine suspends nicely. With 974a847e00c + 
> your patch, changing the order in which rcu_barrier() acquires hotplug and 
> rcu_barrier locks, the machine hangs 100% reliably during suspend, which 
> very likely actually is the deadlock described above.

Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug
notifier, which doesn't sit so well with rcu_barrier() trying to exclude
CPU hotplug events.  I could go back to the old approach, but it is
significantly more complex.  I cannot say that I am all that happy
about anyone calling rcu_barrier() from a CPU hotplug notifier because
it doesn't help CPU hotplug latency, but that is a separate issue.
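
For illustration, the inversion in the quoted analysis can be modeled
in user-space C as a small dependency graph, loosely in the spirit of
what lockdep tracks.  This is only a sketch: struct lock_graph,
graph_init(), record_dependency() and the enum values are hypothetical
names invented here, not kernel or lockdep APIs.  Each call records
"acquired B while holding A" and reports whether the new edge closes
a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock IDs for this sketch; not kernel identifiers. */
enum lock_id { SLAB_MUTEX, CPU_HOTPLUG_LOCK, NR_LOCKS };

/* dep[a][b] is true once some path acquired b while holding a. */
struct lock_graph {
	bool dep[NR_LOCKS][NR_LOCKS];
};

void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can "to" be reached from "from" via recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record that "acquired" was taken while "held" was held.
 * Returns true if the new edge closes a cycle, i.e. if "held" was
 * already reachable from "acquired".
 */
bool record_dependency(struct lock_graph *g, enum lock_id held,
		       enum lock_id acquired)
{
	bool seen[NR_LOCKS] = { false };
	bool cycle;

	assert(held < NR_LOCKS && acquired < NR_LOCKS);
	cycle = reachable(g, acquired, held, seen);
	g->dep[held][acquired] = true;
	return cycle;
}
```

Recording SLAB_MUTEX -> CPU_HOTPLUG_LOCK (the kmem_cache_destroy()
path) on a fresh graph returns false.  Recording CPU_HOTPLUG_LOCK ->
SLAB_MUTEX afterwards (the _cpu_up() path) returns true, which
corresponds to the circular dependency in the report.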

But the thing is that rcu_barrier()'s assumptions work just fine if either
(1) it excludes hotplug operations or (2) it is called from a hotplug
notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
is executing.  So the right way to resolve this seems to be to do the
get_online_cpus() only if rcu_barrier() is -not- executing in the context
of a hotplug notifier.  Should be fixable without too much hassle...

							Thanx, Paul
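
As a rough illustration of the rule proposed above, the following
user-space C sketch takes the hotplug reference only when the caller
is not in notifier context.  Everything here is an assumption made for
the example: struct hotplug_refs and the model_*() functions are
hypothetical stand-ins, and the message does not say how the kernel
would detect notifier context, so the sketch simply takes a flag from
the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU-hotplug reader reference count. */
struct hotplug_refs {
	int readers;		/* outstanding reader references */
	int max_readers;	/* high-water mark of readers */
};

/* Models get_online_cpus(): take a reference that blocks hotplug. */
static void model_get_online_cpus(struct hotplug_refs *h)
{
	h->readers++;
	if (h->readers > h->max_readers)
		h->max_readers = h->readers;
}

/* Models put_online_cpus(): drop the reference taken above. */
static void model_put_online_cpus(struct hotplug_refs *h)
{
	assert(h->readers > 0);
	h->readers--;
}

/*
 * Sketch of the proposed rule: exclude hotplug only when not already
 * running inside a hotplug notifier.  In notifier context the hotplug
 * operation in progress already keeps CPUs from going away.
 */
void model_rcu_barrier(struct hotplug_refs *h, bool in_hotplug_notifier)
{
	bool exclude_hotplug = !in_hotplug_notifier;

	if (exclude_hotplug)
		model_get_online_cpus(h);

	/* ... post a callback on each online CPU and wait (elided) ... */

	if (exclude_hotplug)
		model_put_online_cpus(h);
}
```

Called with in_hotplug_notifier set, the model never touches the
reference count; called without it, the count rises to one and
returns to zero, so the get and put calls stay balanced on both paths.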

