linux-kernel.vger.kernel.org archive mirror
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: paulmck@linux.vnet.ibm.com
Cc: Jiri Kosina <jkosina@suse.cz>,
	"Paul E. McKenney" <paul.mckenney@linaro.org>,
	Josh Triplett <josh@joshtriplett.org>,
	linux-kernel@vger.kernel.org
Subject: Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
Date: Wed, 03 Oct 2012 09:29:01 +0530
Message-ID: <506BB805.3000707@linux.vnet.ibm.com>
In-Reply-To: <20121002233138.GD2465@linux.vnet.ibm.com>

On 10/03/2012 05:01 AM, Paul E. McKenney wrote:
> On Tue, Oct 02, 2012 at 11:58:36PM +0200, Jiri Kosina wrote:
>> On Tue, 2 Oct 2012, Jiri Kosina wrote:
>>
>>>>>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
>>>>>> commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
>>>>>> Author: Paul E. McKenney <paul.mckenney@linaro.org>
>>>>>> Date:   Thu Aug 2 17:43:50 2012 -0700
>>>>>>
>>>>>>     rcu: Remove _rcu_barrier() dependency on __stop_machine()
>>>>>>     
>>>>>>     Currently, _rcu_barrier() relies on preempt_disable() to prevent
>>>>>>     any CPU from going offline, which in turn depends on CPU hotplug's
>>>>>>     use of __stop_machine().
>>>>>>     
>>>>>>     This patch therefore makes _rcu_barrier() use get_online_cpus() to
>>>>>>     block CPU-hotplug operations.  This has the added benefit of removing
>>>>>>     the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
>>>>>>     operations are excluded, there can be no callbacks to adopt.  This
>>>>>>     commit simplifies the code accordingly.
>>>>>>     
>>>>>>     Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
>>>>>>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>>>>>     Reviewed-by: Josh Triplett <josh@joshtriplett.org>
>>>>>> ==
>>>>>>
>>>>>> is causing lockdep to complain (see the full trace below). I haven't yet 
>>>>>> had time to analyze what exactly is happening, and probably will not have 
>>>>>> time to do so until tomorrow, so just sending this as a heads-up in case 
>>>>>> anyone sees the culprit immediately.
>>>>>
>>>>> Hmmm...  Does the following patch help?  It swaps the order in which
>>>>> rcu_barrier() acquires the hotplug and rcu_barrier locks.
>>>>
>>>> It changed the report slightly (see for example the change in possible 
>>>> unsafe locking scenario, rcu_sched_state.barrier_mutex vanished and it's 
>>>> now directly about cpu_hotplug.lock). With the patch applied I get
>>>>
>>>>
>>>>
>>>> ======================================================
>>>> [ INFO: possible circular locking dependency detected ]
>>>> 3.6.0-03888-g3f99f3b #145 Not tainted
>>>
>>> And it really seems valid. 
> 
> Yep, it sure is.  I wasn't getting the full picture earlier, so please
> accept my apologies for the bogus patch.
> 
>>> kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which 
>>> introduces slab_mutex -> cpu_hotplug.lock dependency (through 
>>> rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).
>>>
>>> On the other hand, _cpu_up() acquires cpu_hotplug.lock through 
>>> cpu_hotplug_begin(), and with this lock held cpuup_callback() notifier 
>>> gets called, which acquires slab_mutex. This gives the reverse dependency, 
>>> i.e. the deadlock scenario is a valid one.
>>>
>>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because 
>>> before that, there was no slab_mutex -> cpu_hotplug.lock dependency.
>>>
>>> Simply put, the commit causes get_online_cpus() to be called with 
>>> slab_mutex held, which is invalid.
>>
>> Oh, and it seems to be actually triggering in practice.
>>
>> With HEAD being 974a847e00c, machine suspends nicely. With 974a847e00c + 
>> your patch, changing the order in which rcu_barrier() acquires hotplug and 
>> rcu_barrier locks, the machine hangs 100% reliably during suspend, which 
>> very likely actually is the deadlock described above.
> 
> Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug
> notifier, which doesn't sit so well with rcu_barrier() trying to exclude
> CPU hotplug events.

Why not? IMHO it should have been perfectly fine! See below...

>  I could go back to the old approach, but it is
> significantly more complex.  I cannot say that I am all that happy
> about anyone calling rcu_barrier() from a CPU hotplug notifier because
> it doesn't help CPU hotplug latency, but that is a separate issue.
> 
> But the thing is that rcu_barrier()'s assumptions work just fine if either
> (1) it excludes hotplug operations or (2) if it is called from a hotplug
> notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
> is executing.  So the right way to resolve this seems to be to do the
> get_online_cpus() only if rcu_barrier() is -not- executing in the context
> of a hotplug notifier.  Should be fixable without too much hassle...
> 

The thing is, get_online_cpus() is smart: it *knows* when it is being called
by the hotplug writer, IOW, from within a hotplug notifier.

The relevant code is:

void get_online_cpus(void)
{
        might_sleep();
        /* The hotplug writer already holds cpu_hotplug.lock: nothing to do. */
        if (cpu_hotplug.active_writer == current)
                return;
        ....
}

So calling rcu_barrier() (and hence get_online_cpus()) from within a hotplug
notifier should pose no problem at all!

Regards,
Srivatsa S. Bhat

