From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Jiri Kosina <jkosina@suse.cz>
Cc: paulmck@linux.vnet.ibm.com,
"Paul E. McKenney" <paul.mckenney@linaro.org>,
Josh Triplett <josh@joshtriplett.org>,
linux-kernel@vger.kernel.org
Subject: Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
Date: Wed, 03 Oct 2012 15:28:17 +0530
Message-ID: <506C0C39.1060302@linux.vnet.ibm.com>
In-Reply-To: <alpine.LNX.2.00.1210031107050.23544@pobox.suse.cz>
On 10/03/2012 02:54 PM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
>
>>>> static void cpu_hotplug_begin(void)
>>>> {
>>>> 	cpu_hotplug.active_writer = current;
>>>>
>>>> 	for (;;) {
>>>> 		mutex_lock(&cpu_hotplug.lock);
>>>> 		if (likely(!cpu_hotplug.refcount)) <================ This one!
>>>> 			break;
>>>> 		__set_current_state(TASK_UNINTERRUPTIBLE);
>>>> 		mutex_unlock(&cpu_hotplug.lock);
>>>> 		schedule();
>>>> 	}
>>>> }
>>>
>>> I actually just came to the same conclusion (7 hours of sleep later, the
>>> mind indeed seems to be brighter ... what a poet I am).
>>>
>>> Lockdep doesn't know about these semantics of cpu_hotplug_begin(), and
>>> therefore gets confused by the fact that mutual exclusion is actually
>>> achieved through the refcount instead of the mutex (and the same apparently
>>> happened to me).
>>
>> No, that's not the problem. Lockdep is fine. The calltrace clearly shows that
>> our refcounting has messed up somewhere. As a result, we really *are* running
>> a hotplug-reader and a hotplug-writer at the same time! We really need to fix
>> *that*! So please try the second debug patch I sent just now (with the BUG_ON()
>> in put_online_cpus()). We need to know who is calling put_online_cpus() twice
>> and fix that culprit!
>
> I don't think so.
>
> Lockdep is complaining, because
>
> (a) during system bootup, the smp_init() -> cpu_up() -> cpuup_callback()
> teaches him about hotplug.lock -> slab_mutex dependency
>
> (b) many many jiffies later, nf_conntrack_cleanup_net() calls
> kmem_cache_destroy(), which introduces slab_mutex -> hotplug.lock
> dependency
>
> Lockdep rightfully (from his POV) sees this as a potential ABBA deadlock and
> reports it; it's as simple as that.
> It has no way of knowing that the ABBA can never actually happen,
> because of the special semantics of cpu_hotplug.refcount and its handling in
> cpu_hotplug_begin().
>
Hmm, you are right.
> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
> until everyone who called get_online_cpus() has called put_online_cpus()"
> is totally invisible to lockdep.
I see your point.
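To make the "invisible to lockdep" point concrete, here is a userspace model of the reader/writer scheme in the quoted cpu_hotplug_begin(). It is a sketch under stated assumptions, not kernel code: the model_* names are hypothetical, a pthread condition variable stands in for schedule()/wake_up_process(), and the active_writer recursion check is omitted. The writer gets past model_cpu_hotplug_begin() only once refcount is zero, but the only thing a lock validator can observe is the mutex.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Userspace model of the cpu_hotplug scheme quoted above. Names mirror the
 * kernel's for readability only; the active_writer check is omitted.
 */
struct hotplug_model {
	pthread_mutex_t lock;		/* plays the role of cpu_hotplug.lock */
	pthread_cond_t no_readers;	/* replaces schedule()/wake_up_process() */
	int refcount;			/* readers between get and put */
};

static struct hotplug_model cpu_hotplug = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.no_readers = PTHREAD_COND_INITIALIZER,
	.refcount = 0,
};

/* Reader: the mutex is held only long enough to bump the count. */
static void model_get_online_cpus(void)
{
	pthread_mutex_lock(&cpu_hotplug.lock);
	cpu_hotplug.refcount++;
	pthread_mutex_unlock(&cpu_hotplug.lock);
}

/* Reader: drop the count and wake a waiting writer when it reaches zero. */
static void model_put_online_cpus(void)
{
	pthread_mutex_lock(&cpu_hotplug.lock);
	assert(cpu_hotplug.refcount > 0);	/* an extra put would underflow */
	if (--cpu_hotplug.refcount == 0)
		pthread_cond_broadcast(&cpu_hotplug.no_readers);
	pthread_mutex_unlock(&cpu_hotplug.lock);
}

/*
 * Writer: returns with the mutex held and refcount == 0. It cannot get past
 * this point while any reader holds a reference, and because it keeps the
 * mutex until model_cpu_hotplug_done(), new readers block in
 * model_get_online_cpus() meanwhile.
 */
static void model_cpu_hotplug_begin(void)
{
	pthread_mutex_lock(&cpu_hotplug.lock);
	while (cpu_hotplug.refcount != 0)
		pthread_cond_wait(&cpu_hotplug.no_readers, &cpu_hotplug.lock);
}

static void model_cpu_hotplug_done(void)
{
	pthread_mutex_unlock(&cpu_hotplug.lock);
}
```

In this model, a writer that arrives while another thread sits between model_get_online_cpus() and model_put_online_cpus() simply sleeps in pthread_cond_wait() until the put. The condition "refcount == 0" that gates it exists only as a plain integer test, which is why a validator that records mutex acquisitions cannot account for it.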
>
>>> So right, now I agree that the deadlock scenario I have come up with is
>>> indeed bogus (*), and we just have to annotate this fact to lockdep
>>> somehow.
>>
>> Yes, the deadlock scenario is bogus, but the refcounting leak is for real
>> and needs fixing.
>
> With your patch applied, the BUG_ON() in put_online_cpus() didn't trigger
> for me at all. Which is what I expected.
Oh, OK.
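For the refcount-imbalance question, one way to point at "who is calling put_online_cpus() twice" is to track references per thread, so an extra put is reported against the thread that issued it instead of surfacing later as a corrupted global count. The sketch below is hypothetical (the dbg_* helpers are not the kernel's instrumentation or the debug patch mentioned above) and assumes every put is issued by the same thread that did the matching get, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical debug helpers: each thread counts the hotplug references it
 * holds. Assumes get and put for one reference happen on the same thread.
 */
static _Thread_local int my_refs;

static void dbg_get_online_cpus(void)
{
	my_refs++;
	/* ... the real get_online_cpus() work would go here ... */
}

/*
 * Returns false, without dropping anything, when the calling thread holds
 * no reference. In the kernel, a BUG_ON()/WARN_ON() would fire here instead.
 */
static bool dbg_put_online_cpus(const char *caller)
{
	if (my_refs == 0) {
		fprintf(stderr, "unbalanced put_online_cpus() from %s\n", caller);
		return false;
	}
	my_refs--;
	assert(my_refs >= 0);	/* invariant: never negative */
	/* ... the real put_online_cpus() work would go here ... */
	return true;
}
```

Passing __func__ as the caller argument makes the report name the offending function directly.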
>
>> I'm fine with this, but the real problem is elsewhere, like I mentioned above.
>> This one is only a good-to-have, not a fix.
>>
>>> (*) I have seen the machine lock up hard, reproducibly, but that was only
>>> with Paul's additional patch applied, so I guess the lock order there
>>> actually was wrong.
>>
>> If refcounting was working fine, Paul's patch wouldn't have caused *any* issues.
>> With that patch in place, the two places where rcu_barrier() gets invoked (i.e.,
>> kmem_cache_destroy() and deactivate_locked_super()) both start waiting on
>> get_online_cpus() until the slab cpu hotplug notifier as well as the entire
>> cpu_up operation completes. Absolutely no problem in that! So the fact that
>> you are seeing lock-ups here is another indication that the problem is really
>> elsewhere!
>
> I don't agree. The reason why Paul's patch (1331e7a1bb) started to trigger
> this is that (b) above doesn't exist in pre-1331e7a1bb kernels.
>
So basically what you are saying is that the call traces in the lockdep splat
are from different points in time, right? Then I see why it's just a false
positive and not a real bug.
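The reason traces from different points in time still produce a report can be shown with a toy model of a lock-dependency graph: once an edge such as cpu_hotplug.lock -> slab_mutex is recorded (for instance during bootup), it stays in the graph, and any later acquisition in the opposite order closes a cycle, however far apart the two acquisitions were. This is a simplified userspace sketch with hypothetical names (enum lock_class, record_dependency()); real lockdep also tracks lock classes, IRQ state and stack traces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock classes used only for this illustration. */
enum lock_class { HOTPLUG_LOCK, SLAB_MUTEX, NR_CLASSES };

static const char *class_name[NR_CLASSES] = { "cpu_hotplug.lock", "slab_mutex" };

/* dep[a][b] is true once "b was acquired while a was held" has been seen. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired 'inner' while holding 'outer'". Returns false and prints
 * a report if the new edge closes a cycle with any edge recorded earlier,
 * no matter how long ago that earlier edge was recorded.
 */
static bool record_dependency(int outer, int inner)
{
	bool seen[NR_CLASSES] = { false };

	assert(outer >= 0 && outer < NR_CLASSES);
	assert(inner >= 0 && inner < NR_CLASSES);
	if (reachable(inner, outer, seen)) {
		printf("possible circular locking: %s -> %s closes a cycle\n",
		       class_name[outer], class_name[inner]);
		return false;
	}
	dep[outer][inner] = true;
	return true;
}
```

Calling record_dependency(HOTPLUG_LOCK, SLAB_MUTEX) models step (a) at boot; a much later record_dependency(SLAB_MUTEX, HOTPLUG_LOCK) models step (b) and is reported, because the graph keeps the earlier edge rather than the time at which it was observed.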
Regards,
Srivatsa S. Bhat