From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: paulmck@linux.vnet.ibm.com
Cc: Jiri Kosina <jkosina@suse.cz>,
	"Paul E. McKenney" <paul.mckenney@linaro.org>,
	Josh Triplett <josh@joshtriplett.org>,
	linux-kernel@vger.kernel.org
Subject: Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
Date: Wed, 03 Oct 2012 09:45:00 +0530
Message-ID: <506BBBC4.9040606@linux.vnet.ibm.com>
In-Reply-To: <20121003040701.GD13192@linux.vnet.ibm.com>

On 10/03/2012 09:37 AM, Paul E. McKenney wrote:
> On Wed, Oct 03, 2012 at 09:29:01AM +0530, Srivatsa S. Bhat wrote:
>> On 10/03/2012 05:01 AM, Paul E. McKenney wrote:
>>> On Tue, Oct 02, 2012 at 11:58:36PM +0200, Jiri Kosina wrote:
>>>> On Tue, 2 Oct 2012, Jiri Kosina wrote:
>>>>
>>>>>>>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
>>>>>>>> commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
>>>>>>>> Author: Paul E. McKenney <paul.mckenney@linaro.org>
>>>>>>>> Date:   Thu Aug 2 17:43:50 2012 -0700
>>>>>>>>
>>>>>>>>     rcu: Remove _rcu_barrier() dependency on __stop_machine()
>>>>>>>>     
>>>>>>>>     Currently, _rcu_barrier() relies on preempt_disable() to prevent
>>>>>>>>     any CPU from going offline, which in turn depends on CPU hotplug's
>>>>>>>>     use of __stop_machine().
>>>>>>>>     
>>>>>>>>     This patch therefore makes _rcu_barrier() use get_online_cpus() to
>>>>>>>>     block CPU-hotplug operations.  This has the added benefit of removing
>>>>>>>>     the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
>>>>>>>>     operations are excluded, there can be no callbacks to adopt.  This
>>>>>>>>     commit simplifies the code accordingly.
>>>>>>>>     
>>>>>>>>     Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
>>>>>>>>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>>>>>>>     Reviewed-by: Josh Triplett <josh@joshtriplett.org>
>>>>>>>> ==
>>>>>>>>
>>>>>>>> is causing lockdep to complain (see the full trace below). I haven't yet 
>>>>>>>> had time to analyze what exactly is happening, and probably will not have 
>>>>>>>> time to do so until tomorrow, so just sending this as a heads-up in case 
>>>>>>>> anyone sees the culprit immediately.
>>>>>>>
>>>>>>> Hmmm...  Does the following patch help?  It swaps the order in which
>>>>>>> rcu_barrier() acquires the hotplug and rcu_barrier locks.
>>>>>>
>>>>>> It changed the report slightly (see for example the change in possible 
>>>>>> unsafe locking scenario, rcu_sched_state.barrier_mutex vanished and it's 
>>>>>> now directly about cpu_hotplug.lock). With the patch applied I get
>>>>>>
>>>>>>
>>>>>>
>>>>>> ======================================================
>>>>>> [ INFO: possible circular locking dependency detected ]
>>>>>> 3.6.0-03888-g3f99f3b #145 Not tainted
>>>>>
>>>>> And it really seems valid. 
>>>
>>> Yep, it sure is.  I wasn't getting the full picture earlier, so please
>>> accept my apologies for the bogus patch.
>>>
>>>>> kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which 
>>>>> introduces slab_mutex -> cpu_hotplug.lock dependency (through 
>>>>> rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).
>>>>>
>>>>> On the other hand, _cpu_up() acquires cpu_hotplug.lock through 
>>>>> cpu_hotplug_begin(), and with this lock held cpuup_callback() notifier 
>>>>> gets called, which acquires slab_mutex. This gives the reverse dependency, 
>>>>> i.e. the deadlock scenario is a valid one.
>>>>>
>>>>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because 
>>>>> before that, there was no slab_mutex -> cpu_hotplug.lock dependency.
>>>>>
>>>>> Simply put, the commit causes get_online_cpus() to be called with 
>>>>> slab_mutex held, which is invalid.
>>>>
>>>> Oh, and it actually seems to be triggering for real.
>>>>
>>>> With HEAD being 974a847e00c, machine suspends nicely. With 974a847e00c + 
>>>> your patch, changing the order in which rcu_barrier() acquires hotplug and 
>>>> rcu_barrier locks, the machine hangs 100% reliably during suspend, which 
>>>> very likely actually is the deadlock described above.
>>>
>>> Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug
>>> notifier, which doesn't sit so well with rcu_barrier() trying to exclude
>>> CPU hotplug events.
>>
>> Why not? IMHO it should have been perfectly fine! See below...
>>
>>>  I could go back to the old approach, but it is
>>> significantly more complex.  I cannot say that I am all that happy
>>> about anyone calling rcu_barrier() from a CPU hotplug notifier because
>>> it doesn't help CPU hotplug latency, but that is a separate issue.
>>>
>>> But the thing is that rcu_barrier()'s assumptions work just fine if either
>>> (1) it excludes hotplug operations or (2) if it is called from a hotplug
>>> notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
>>> is executing.  So the right way to resolve this seems to be to do the
>>> get_online_cpus() only if rcu_barrier() is -not- executing in the context
>>> of a hotplug notifier.  Should be fixable without too much hassle...
>>>
>>
>> The thing is, get_online_cpus() is smart: it *knows* when you are calling
>> it in a hotplug-writer, IOW, when you are in a hotplug notifier.
>>
>> The relevant code is:
>>
>> void get_online_cpus(void)
>> {
>>         might_sleep();
>>         if (cpu_hotplug.active_writer == current)
>>                 return;
>>         ....
>> }
>>
>> So calling rcu_barrier() (and hence get_online_cpus()) from within a hotplug
>> notifier should pose no problem at all!
> 
> Indeed, that was my confusion.  The deadlock can happen with
> the slab CPU-hotplug notifier (without calling rcu_barrier()!), which
> establishes hotplug->slab.  Then some other unrelated thread calls
> kmem_cache_destroy(), which acquires slab and then calls rcu_barrier(),
> which acquires hotplug.  So the deadlock can happen independently of
> rcu_barrier() being called from a CPU-hotplug notifier.
> 
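To make the active_writer short-circuit in the get_online_cpus() excerpt quoted above concrete, here is a minimal user-space sketch. Everything in it is hypothetical: struct hotplug_model, current_task (standing in for the kernel's current) and the model_* functions are illustrative names only. The real function also takes cpu_hotplug.lock and may sleep, and the matching check in model_put_online_cpus() is an assumption, since the excerpt only shows the get side.

```c
#include <assert.h>

/* Hypothetical user-space model of the active_writer check in
 * get_online_cpus(). Tasks are plain integers; current_task stands in
 * for the kernel's `current`. Locking and sleeping are not modelled. */
struct hotplug_model {
    int active_writer; /* task performing the hotplug operation, -1 if none */
    int refcount;      /* number of read references currently held */
};

static struct hotplug_model cpu_hotplug = { -1, 0 };
static int current_task;

static void model_get_online_cpus(void)
{
    /* The hotplug writer (e.g. a CPU notifier running on its behalf)
     * already excludes other hotplug operations, so it returns without
     * taking a read reference that would wait on its own operation. */
    if (cpu_hotplug.active_writer == current_task)
        return;
    cpu_hotplug.refcount++;
}

static void model_put_online_cpus(void)
{
    /* Assumed to mirror the get side: the writer took no reference. */
    if (cpu_hotplug.active_writer == current_task)
        return;
    cpu_hotplug.refcount--;
}
```

An ordinary task bumps the refcount, while the writer's call returns immediately. That is why calling rcu_barrier() (and hence get_online_cpus()) from inside a hotplug notifier is not, by itself, the problem.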

Right, this is exactly what I thought yesterday. I had drafted a mail explaining
why the circular locking dependency really has length 2, not 3, and why
rcu_barrier() (barrier_mutex) only aggravates a problem that exists even
without rcu_barrier(). But I stopped short of posting it when I noticed the
get/put_online_cpus() pair in kmem_cache_destroy(), which puzzled me. I (still)
can't convince myself that kmem_cache_destroy() could get past its
get_online_cpus() and call rcu_barrier() at all in the presence of a
concurrent CPU hotplug notifier!
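The length-2 versus length-3 distinction can be illustrated with a toy dependency graph. This is a sketch, not lockdep's algorithm: an edge a -> b means "b was acquired while a was held". The edges come from the analysis earlier in the thread (cpuup_callback() taking slab_mutex under cpu_hotplug.lock, and kmem_cache_destroy() calling rcu_barrier() with slab_mutex held), assuming, as the first lockdep report suggests, that _rcu_barrier() originally took barrier_mutex before get_online_cpus(). The labels SLAB, HOTPLUG and BARRIER and all helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Toy lock-dependency graph (hypothetical labels, NOT lockdep's code).
 * dep[a][b] == 1 means "b was acquired while a was held". */
enum { SLAB, HOTPLUG, BARRIER, NLOCKS };

static int dep[NLOCKS][NLOCKS];

static void reset_deps(void)
{
    memset(dep, 0, sizeof dep);
}

static void add_dep(int held, int acquired)
{
    dep[held][acquired] = 1;
}

/* Length of the shortest dependency cycle through `start`, or 0 if none.
 * Breadth-first search visits locks in order of distance from start, so
 * the first edge found back to start closes the shortest cycle. */
static int shortest_cycle(int start)
{
    int dist[NLOCKS], queue[NLOCKS], head = 0, tail = 0;

    for (int i = 0; i < NLOCKS; i++)
        dist[i] = -1;
    dist[start] = 0;
    queue[tail++] = start;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < NLOCKS; v++) {
            if (!dep[u][v])
                continue;
            if (v == start)
                return dist[u] + 1;
            if (dist[v] < 0) {
                dist[v] = dist[u] + 1;
                queue[tail++] = v;
            }
        }
    }
    return 0;
}
```

With only the path through barrier_mutex, the shortest cycle through slab_mutex has length 3. Once the direct slab_mutex -> cpu_hotplug.lock edge is present (slab_mutex still held when get_online_cpus() runs), it drops to 2, and that cycle does not involve barrier_mutex at all.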

Regards,
Srivatsa S. Bhat

> Making kmem_cache_destroy() release slab before calling rcu_barrier()
> seems to clear things up for Jiri, but we need Pekka's or Christoph
> Lameter's view on whether this is really safe.  (It looks safe to
> both Jiri and me, but...)
> 
> 							Thanx, Paul
> 
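The ordering effect of the proposed fix, releasing slab_mutex before calling rcu_barrier(), can be checked with a similar toy tracker. This sketch is hypothetical throughout: track_lock()/track_unlock() record an ordering edge from every lock currently held, get_online_cpus() is treated simply as an acquisition of cpu_hotplug.lock, and cpu_up_path()/destroy_path() are simplified stand-ins for _cpu_up() and kmem_cache_destroy() that omit barrier_mutex and the rest of the real code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical held-lock tracker; labels are shorthand for slab_mutex
 * and cpu_hotplug.lock. order[a][b] == 1 means b was taken under a. */
enum { SLAB_MUTEX, HOTPLUG_LOCK, NUM_LOCKS };

static int held[NUM_LOCKS];
static int order[NUM_LOCKS][NUM_LOCKS];

static void tracker_reset(void)
{
    memset(held, 0, sizeof held);
    memset(order, 0, sizeof order);
}

static void track_lock(int l)
{
    /* Record an edge from every lock held at acquisition time. */
    for (int h = 0; h < NUM_LOCKS; h++)
        if (held[h])
            order[h][l] = 1;
    held[l] = 1;
}

static void track_unlock(int l)
{
    held[l] = 0;
}

/* CPU-up path: hotplug lock first, then the slab notifier's slab_mutex. */
static void cpu_up_path(void)
{
    track_lock(HOTPLUG_LOCK);
    track_lock(SLAB_MUTEX);
    track_unlock(SLAB_MUTEX);
    track_unlock(HOTPLUG_LOCK);
}

/* kmem_cache_destroy(): rcu_barrier() runs either under slab_mutex
 * (before the fix) or after slab_mutex is released (the fix). */
static void destroy_path(int release_slab_first)
{
    track_lock(SLAB_MUTEX);
    if (release_slab_first)
        track_unlock(SLAB_MUTEX);
    track_lock(HOTPLUG_LOCK);   /* rcu_barrier() -> get_online_cpus() */
    track_unlock(HOTPLUG_LOCK);
    if (!release_slab_first)
        track_unlock(SLAB_MUTEX);
}

/* True if both orders (an ABBA inversion) have been observed. */
static int has_inversion(void)
{
    return order[SLAB_MUTEX][HOTPLUG_LOCK] && order[HOTPLUG_LOCK][SLAB_MUTEX];
}
```

With rcu_barrier() called under slab_mutex, both orders are recorded and has_inversion() reports the ABBA pattern; with slab_mutex released first, only hotplug -> slab remains. Whether dropping the lock early is actually safe for slab is the question left for Pekka and Christoph; the sketch only shows the lock-ordering effect.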

