All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Jiri Kosina <jkosina@suse.cz>
Cc: paulmck@linux.vnet.ibm.com,
	"Paul E. McKenney" <paul.mckenney@linaro.org>,
	Josh Triplett <josh@joshtriplett.org>,
	linux-kernel@vger.kernel.org
Subject: Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
Date: Wed, 03 Oct 2012 15:28:17 +0530	[thread overview]
Message-ID: <506C0C39.1060302@linux.vnet.ibm.com> (raw)
In-Reply-To: <alpine.LNX.2.00.1210031107050.23544@pobox.suse.cz>

On 10/03/2012 02:54 PM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
> 
>>>> static void cpu_hotplug_begin(void)
>>>> {
>>>>         cpu_hotplug.active_writer = current;
>>>>
>>>>         for (;;) {
>>>>                 mutex_lock(&cpu_hotplug.lock);
>>>>                 if (likely(!cpu_hotplug.refcount))   <================ This one!
>>>>                         break;
>>>>                 __set_current_state(TASK_UNINTERRUPTIBLE);
>>>>                 mutex_unlock(&cpu_hotplug.lock);
>>>>                 schedule();
>>>>         }   
>>>> }
>>>
>>> I acutally just came to the same conclusion (7 hours of sleep later, the 
>>> mind indeed seems to be brighter ... what a poet I am).
>>>
>>> Lockdep doesn't know about this semantics of cpu_hotplug_begin(), and 
>>> therefore gets confused by the fact that mutual exclusion is actually 
>>> achieved through the refcount instead of mutex (and the same apparently 
>>> happened to me).
>>
>> No, that's not the problem. Lockdep is fine. The calltrace clearly shows that
>> our refcounting has messed up somewhere. As a result, we really *are* running
>> a hotplug-reader and a hotplug-writer at the same time! We really need to fix
>> *that*! So please try the second debug patch I sent just now (with the BUG_ON()
>> in put_online_cpus()). We need to know who is calling put_online_cpus() twice
>> and fix that culprit!
> 
> I don't think so.
> 
> Lockdep is complaining, because
> 
> (a) during system bootup, the smp_init() -> cpu_up() -> cpuup_callback() 
>     teaches him about hotplug.lock -> slab_mutex dependency
> 
> (b) many many jiffies later, nf_conntrack_cleanup_net() calls 
>     kmem_cache_destroy(), which introduces slab_mutex -> hotplug.lock 
>     dependency
> 
> Lockdep rightfully (from his POV) sees this as potential ABBA, and reports 
> it, it's as simple as that.
> It has no way of knowing the fact that the ABBA can actually never happen, 
> because of special semantics of cpu_hotplug.refcount and it's handling in 
> cpu_hotplug_begin().
>

Hmm, you are right.
 
> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin() 
> until everyone who called get_online_cpus() will call put_online_cpus()" 
> is totally invisible to lockdep.

I see your point..

> 
>>> So right, now I agree that the deadlock scenario I have come up with is 
>>> indeed bogus (*), and we just have to annotate this fact to lockdep 
>>> somehow.
>>
>> Yes, the deadlock scenario is bogus, but the refcounting leak is for real
>> and needs fixing.
> 
> With your patch applied, the BUG_ON() in put_online_cpus() didn't trigger 
> for me at all. Which is what I expected.

Oh, ok..

> 
>> I'm fine with this, but the real problem is elsewhere, like I mentioned above.
>> This one is only a good-to-have, not a fix.
>>  
>>> (*) I have seen machine locking hard reproducibly, but that was only with 
>>> additional Paul's patch, so I guess the lock order there actually was 
>>> wrong
>>
>> If refcounting was working fine, Paul's patch wouldn't have caused *any* issues.
>> With that patch in place, the 2 places where rcu_barrier() get invoked (ie.,
>> kmem_cache_destroy() and deactivate_locked_super()) both start waiting on
>> get_online_cpus() until the slab cpu hotplug notifier as well as the entire
>> cpu_up operation completes. Absolutely no problem in that! So the fact that
>> you are seeing lock-ups here is another indication that the problem is really
>> elsewhere!
> 
> I don't agree. The reason why Paul's patch (1331e7a1bb) started to trigger 
> this, is that (b) above doesn't exist in pre-1331e7a1bb kernels.
>

So basically what you are saying is, the calltraces in the lockdep splat are from
different points in time right? Then I see why its just a false positive and not
a real bug.
 
Regards,
Srivatsa S. Bhat


      reply	other threads:[~2012-10-03  9:59 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-02 16:14 Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") Jiri Kosina
2012-10-02 17:01 ` Paul E. McKenney
2012-10-02 21:27   ` Jiri Kosina
2012-10-02 21:49     ` Jiri Kosina
2012-10-02 21:58       ` Jiri Kosina
2012-10-02 23:31         ` Paul E. McKenney
2012-10-02 23:48           ` Jiri Kosina
2012-10-03  0:15             ` Paul E. McKenney
2012-10-03  0:45               ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
2012-10-03  0:45                 ` Jiri Kosina
2012-10-03  3:41                 ` Paul E. McKenney
2012-10-03  3:41                   ` Paul E. McKenney
2012-10-03  3:50                 ` Srivatsa S. Bhat
2012-10-03  3:50                   ` Srivatsa S. Bhat
2012-10-03  6:08                   ` Srivatsa S. Bhat
2012-10-03  6:08                     ` Srivatsa S. Bhat
2012-10-03  8:21                     ` Srivatsa S. Bhat
2012-10-03  8:21                       ` Srivatsa S. Bhat
2012-10-03  9:46                 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
2012-10-03  9:46                   ` Jiri Kosina
2012-10-03 12:22                   ` Srivatsa S. Bhat
2012-10-03 12:22                     ` Srivatsa S. Bhat
2012-10-03 12:53                     ` [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus() Srivatsa S. Bhat
2012-10-03 12:53                       ` Srivatsa S. Bhat
2012-10-03 21:13                       ` Andrew Morton
2012-10-03 21:13                         ` Andrew Morton
2012-10-04  6:16                         ` Srivatsa S. Bhat
2012-10-04  6:16                           ` Srivatsa S. Bhat
2012-10-05  3:24                           ` Yasuaki Ishimatsu
2012-10-05  3:24                             ` Yasuaki Ishimatsu
2012-10-05  5:35                             ` Srivatsa S. Bhat
2012-10-05  5:35                               ` Srivatsa S. Bhat
2012-10-03 14:50                     ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Paul E. McKenney
2012-10-03 14:50                       ` Paul E. McKenney
2012-10-03 14:55                       ` Srivatsa S. Bhat
2012-10-03 14:55                         ` Srivatsa S. Bhat
2012-10-03 16:00                         ` Paul E. McKenney
2012-10-03 16:00                           ` Paul E. McKenney
2012-10-03 14:17                   ` Christoph Lameter
2012-10-03 14:17                     ` Christoph Lameter
2012-10-03 14:15                 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Christoph Lameter
2012-10-03 14:15                   ` Christoph Lameter
2012-10-03 14:34                   ` [PATCH v3] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
2012-10-03 14:34                     ` Jiri Kosina
2012-10-03 15:00                     ` Srivatsa S. Bhat
2012-10-03 15:00                       ` Srivatsa S. Bhat
2012-10-03 15:05                       ` [PATCH v4] " Jiri Kosina
2012-10-03 15:05                         ` Jiri Kosina
2012-10-03 15:49                         ` Srivatsa S. Bhat
2012-10-03 15:49                           ` Srivatsa S. Bhat
2012-10-03 18:49                         ` David Rientjes
2012-10-03 18:49                           ` David Rientjes
2012-10-08  7:26                           ` [PATCH] [RESEND] " Jiri Kosina
2012-10-08  7:26                             ` Jiri Kosina
2012-10-10  6:27                             ` Pekka Enberg
2012-10-10  6:27                               ` Pekka Enberg
2012-10-03  3:59           ` Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") Srivatsa S. Bhat
2012-10-03  4:07             ` Paul E. McKenney
2012-10-03  4:15               ` Srivatsa S. Bhat
2012-10-02 20:39 ` Srivatsa S. Bhat
2012-10-02 22:17   ` Jiri Kosina
2012-10-03  3:35     ` Srivatsa S. Bhat
2012-10-03  3:44       ` Paul E. McKenney
2012-10-03  4:04         ` Srivatsa S. Bhat
2012-10-03  7:43           ` Jiri Kosina
2012-10-03  8:11             ` Srivatsa S. Bhat
2012-10-03  8:19               ` Jiri Kosina
2012-10-03  8:30                 ` Srivatsa S. Bhat
2012-10-03  9:24                   ` Jiri Kosina
2012-10-03  9:58                     ` Srivatsa S. Bhat [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=506C0C39.1060302@linux.vnet.ibm.com \
    --to=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=jkosina@suse.cz \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paul.mckenney@linaro.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.