* Re: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))
2012-10-03 0:45 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
@ 2012-10-03 3:41 ` Paul E. McKenney
2012-10-03 3:50 ` Srivatsa S. Bhat
` (2 subsequent siblings)
3 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2012-10-03 3:41 UTC (permalink / raw)
To: Jiri Kosina
Cc: Christoph Lameter, Pekka Enberg, Paul E. McKenney, Josh Triplett,
linux-kernel, Srivatsa S. Bhat, linux-mm
On Wed, Oct 03, 2012 at 02:45:30AM +0200, Jiri Kosina wrote:
> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>
> > On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
> > > On Tue, 2 Oct 2012, Paul E. McKenney wrote:
> > >
> > > > Indeed. Slab seems to be doing an rcu_barrier() in a CPU hotplug
> > > > notifier, which doesn't sit so well with rcu_barrier() trying to exclude
> > > > CPU hotplug events. I could go back to the old approach, but it is
> > > > significantly more complex. I cannot say that I am all that happy about
> > > > anyone calling rcu_barrier() from a CPU hotplug notifier because it
> > > > doesn't help CPU hotplug latency, but that is a separate issue.
> > > >
> > > > But the thing is that rcu_barrier()'s assumptions work just fine if either
> > > > (1) it excludes hotplug operations or (2) if it is called from a hotplug
> > > > notifier. You see, either way, the CPU cannot go away while rcu_barrier()
> > > > is executing. So the right way to resolve this seems to be to do the
> > > > get_online_cpus() only if rcu_barrier() is -not- executing in the context
> > > > of a hotplug notifier. Should be fixable without too much hassle...
> > >
> > > Sorry, I don't think I understand what you are proposing just yet.
> > >
> > > If I understand it correctly, you are proposing to introduce some magic
> > > into _rcu_barrier() such as (pseudocode of course):
> > >
> > > if (!being_called_from_hotplug_notifier_callback)
> > > get_online_cpus()
> > >
> > > How does that protect from the scenario I've outlined before though?
> > >
> > > CPU 0 CPU 1
> > > kmem_cache_destroy()
> > > mutex_lock(slab_mutex)
> > > _cpu_up()
> > > cpu_hotplug_begin()
> > > mutex_lock(cpu_hotplug.lock)
> > > rcu_barrier()
> > > _rcu_barrier()
> > > get_online_cpus()
> > > mutex_lock(cpu_hotplug.lock)
> > > (blocks, CPU 1 has the mutex)
> > > __cpu_notify()
> > > mutex_lock(slab_mutex)
> > >
> > > CPU 0 grabs both locks anyway (it's not running from notifier callback).
> > > CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called
> > > from notifier callback either.
> > >
> > > What did I miss?
> >
> > You didn't miss anything, I was suffering a failure to read carefully.
> >
> > So my next stupid question is "Why can't kmem_cache_destroy drop
> > slab_mutex early?" like the following:
> >
> > void kmem_cache_destroy(struct kmem_cache *cachep)
> > {
> > BUG_ON(!cachep || in_interrupt());
> >
> > /* Find the cache in the chain of caches. */
> > get_online_cpus();
> > mutex_lock(&slab_mutex);
> > /*
> > * the chain is never empty, cache_cache is never destroyed
> > */
> > list_del(&cachep->list);
> > if (__cache_shrink(cachep)) {
> > slab_error(cachep, "Can't free all objects");
> > list_add(&cachep->list, &slab_caches);
> > mutex_unlock(&slab_mutex);
> > put_online_cpus();
> > return;
> > }
> > mutex_unlock(&slab_mutex);
> >
> > if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> > rcu_barrier();
> >
> > __kmem_cache_destroy(cachep);
> > put_online_cpus();
> > }
> >
> > Or did I miss some reason why __kmem_cache_destroy() needs that lock?
> > Looks to me like it is just freeing now-disconnected memory.
>
> Good question. I believe it should be safe to drop slab_mutex earlier, as
> cachep has already been unlinked. I am adding slab people and linux-mm to
> CC (the whole thread on LKML can be found at
> https://lkml.org/lkml/2012/10/2/296 for reference).
>
> How about the patch below? Pekka, Christoph, please?
>
> It makes the lockdep happy again, and obviously removes the deadlock (I
> tested it).
You can certainly add my Reviewed-by, for whatever that is worth. ;-)
BTW, with this patch, are you able to dispense with my earlier patch,
or is it still needed?
Thanx, Paul
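[Editorial aside: the `being_called_from_hotplug_notifier_callback` pseudocode quoted above can be sketched as a small user-space model. All names here are illustrative, not the kernel's, and — as Jiri's scenario shows — the flag by itself would not prevent the slab_mutex deadlock; this only shows the mechanics of skipping get_online_cpus() in notifier context.]

```c
#include <stdbool.h>

/* Illustrative model only: a flag set around notifier callbacks lets
 * a modeled _rcu_barrier() skip get_online_cpus() when it is already
 * running in hotplug-notifier context. */
static bool in_hotplug_notifier;   /* stands in for a per-task flag  */
static int get_online_cpus_calls;  /* counts modeled get_online_cpus() */

static void model_rcu_barrier(void)
{
    if (!in_hotplug_notifier)
        get_online_cpus_calls++;   /* would be get_online_cpus() */
    /* ... queue per-CPU callbacks and wait for them ... */
}

static void model_cpu_notify(void (*cb)(void))
{
    in_hotplug_notifier = true;    /* set around __cpu_notify() */
    cb();                          /* notifier may call rcu_barrier() */
    in_hotplug_notifier = false;
}
```

Called from ordinary context the model takes the reference; called via the notifier wrapper it skips it, which is the special case Paul describes.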
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()
>
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
>
> This opens a possibility for deadlock:
>
> CPU 0 CPU 1
> kmem_cache_destroy()
> mutex_lock(slab_mutex)
> _cpu_up()
> cpu_hotplug_begin()
> mutex_lock(cpu_hotplug.lock)
> rcu_barrier()
> _rcu_barrier()
> get_online_cpus()
> mutex_lock(cpu_hotplug.lock)
> (blocks, CPU 1 has the mutex)
> __cpu_notify()
> mutex_lock(slab_mutex)
>
> It turns out that slab's kmem_cache_destroy() can release slab_mutex
> before calling out to rcu_barrier(), as cachep has already been
> unlinked.
>
> This patch removes the AB-BA dependency by calling rcu_barrier() with
> slab_mutex already unlocked.
>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
> mm/slab.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 1133911..693c7cb 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
> put_online_cpus();
> return;
> }
> + mutex_unlock(&slab_mutex);
>
> if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> rcu_barrier();
>
> __kmem_cache_destroy(cachep);
> - mutex_unlock(&slab_mutex);
> put_online_cpus();
> }
> EXPORT_SYMBOL(kmem_cache_destroy);
>
> --
> Jiri Kosina
> SUSE Labs
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* Re: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))
2012-10-03 0:45 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
2012-10-03 3:41 ` Paul E. McKenney
@ 2012-10-03 3:50 ` Srivatsa S. Bhat
2012-10-03 6:08 ` Srivatsa S. Bhat
2012-10-03 9:46 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
2012-10-03 14:15 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Christoph Lameter
3 siblings, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 3:50 UTC (permalink / raw)
To: Jiri Kosina
Cc: Paul E. McKenney, Christoph Lameter, Pekka Enberg,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm
On 10/03/2012 06:15 AM, Jiri Kosina wrote:
> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>
>> On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
>>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>>
>>>> Indeed. Slab seems to be doing an rcu_barrier() in a CPU hotplug
>>>> notifier, which doesn't sit so well with rcu_barrier() trying to exclude
>>>> CPU hotplug events. I could go back to the old approach, but it is
>>>> significantly more complex. I cannot say that I am all that happy about
>>>> anyone calling rcu_barrier() from a CPU hotplug notifier because it
>>>> doesn't help CPU hotplug latency, but that is a separate issue.
>>>>
>>>> But the thing is that rcu_barrier()'s assumptions work just fine if either
>>>> (1) it excludes hotplug operations or (2) if it is called from a hotplug
>>>> notifier. You see, either way, the CPU cannot go away while rcu_barrier()
>>>> is executing. So the right way to resolve this seems to be to do the
>>>> get_online_cpus() only if rcu_barrier() is -not- executing in the context
>>>> of a hotplug notifier. Should be fixable without too much hassle...
>>>
>>> Sorry, I don't think I understand what you are proposing just yet.
>>>
>>> If I understand it correctly, you are proposing to introduce some magic
>>> into _rcu_barrier() such as (pseudocode of course):
>>>
>>> if (!being_called_from_hotplug_notifier_callback)
>>> get_online_cpus()
>>>
>>> How does that protect from the scenario I've outlined before though?
>>>
>>> CPU 0 CPU 1
>>> kmem_cache_destroy()
>>> mutex_lock(slab_mutex)
>>> _cpu_up()
>>> cpu_hotplug_begin()
>>> mutex_lock(cpu_hotplug.lock)
>>> rcu_barrier()
>>> _rcu_barrier()
>>> get_online_cpus()
>>> mutex_lock(cpu_hotplug.lock)
>>> (blocks, CPU 1 has the mutex)
>>> __cpu_notify()
>>> mutex_lock(slab_mutex)
>>>
>>> CPU 0 grabs both locks anyway (it's not running from notifier callback).
>>> CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called
>>> from notifier callback either.
>>>
>>> What did I miss?
>>
>> You didn't miss anything, I was suffering a failure to read carefully.
>>
>> So my next stupid question is "Why can't kmem_cache_destroy drop
>> slab_mutex early?" like the following:
>>
>> void kmem_cache_destroy(struct kmem_cache *cachep)
>> {
>> BUG_ON(!cachep || in_interrupt());
>>
>> /* Find the cache in the chain of caches. */
>> get_online_cpus();
>> mutex_lock(&slab_mutex);
>> /*
>> * the chain is never empty, cache_cache is never destroyed
>> */
>> list_del(&cachep->list);
>> if (__cache_shrink(cachep)) {
>> slab_error(cachep, "Can't free all objects");
>> list_add(&cachep->list, &slab_caches);
>> mutex_unlock(&slab_mutex);
>> put_online_cpus();
>> return;
>> }
>> mutex_unlock(&slab_mutex);
>>
>> if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>> rcu_barrier();
>>
>> __kmem_cache_destroy(cachep);
>> put_online_cpus();
>> }
>>
>> Or did I miss some reason why __kmem_cache_destroy() needs that lock?
>> Looks to me like it is just freeing now-disconnected memory.
>
> Good question. I believe it should be safe to drop slab_mutex earlier, as
> cachep has already been unlinked. I am adding slab people and linux-mm to
> CC (the whole thread on LKML can be found at
> https://lkml.org/lkml/2012/10/2/296 for reference).
>
> How about the patch below? Pekka, Christoph, please?
>
> It makes the lockdep happy again, and obviously removes the deadlock (I
> tested it).
>
>
>
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()
>
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
>
> This opens a possibility for deadlock:
>
> CPU 0 CPU 1
> kmem_cache_destroy()
> mutex_lock(slab_mutex)
> _cpu_up()
> cpu_hotplug_begin()
> mutex_lock(cpu_hotplug.lock)
> rcu_barrier()
> _rcu_barrier()
> get_online_cpus()
> mutex_lock(cpu_hotplug.lock)
> (blocks, CPU 1 has the mutex)
> __cpu_notify()
> mutex_lock(slab_mutex)
Hmm.. no, this should *never* happen IMHO!
If I am seeing the code right, kmem_cache_destroy() wraps its entire content
inside get/put_online_cpus(), which means it cannot run concurrently with cpu_up()
or cpu_down(). Are we really hitting a corner case where the refcounting logic
in get/put_online_cpus() is failing and allowing a hotplug writer to run in
parallel with a hotplug reader? If yes, *that* is the problem we have to fix..
Regards,
Srivatsa S. Bhat
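[Editorial aside: Srivatsa's objection rests on the refcount protocol in get/put_online_cpus(). A minimal single-threaded sketch of that protocol follows — names are illustrative, and where the kernel blocks on cpu_hotplug.lock this model simply returns false.]

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of the hotplug reader/writer protocol:
 * readers (get_online_cpus) bump a refcount, and the writer
 * (cpu_hotplug_begin) may proceed only once the refcount is zero. */
struct hotplug { int refcount; bool writer_active; };

static bool try_get_online_cpus(struct hotplug *h)
{
    if (h->writer_active)
        return false;              /* a reader would block here */
    h->refcount++;
    return true;
}

static void put_online_cpus_model(struct hotplug *h)
{
    assert(h->refcount > 0);       /* underflow would mean a leak */
    h->refcount--;
}

static bool try_cpu_hotplug_begin(struct hotplug *h)
{
    if (h->refcount != 0)
        return false;              /* writer waits for all readers */
    h->writer_active = true;
    return true;
}
```

Under this protocol a cpu_up() can never start while kmem_cache_destroy() holds its reference, which is why Srivatsa argues the scenario should be impossible unless the refcounting itself is broken.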
>
> It turns out that slab's kmem_cache_destroy() can release slab_mutex
> before calling out to rcu_barrier(), as cachep has already been
> unlinked.
>
> This patch removes the AB-BA dependency by calling rcu_barrier() with
> slab_mutex already unlocked.
>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
> mm/slab.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 1133911..693c7cb 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
> put_online_cpus();
> return;
> }
> + mutex_unlock(&slab_mutex);
>
> if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> rcu_barrier();
>
> __kmem_cache_destroy(cachep);
> - mutex_unlock(&slab_mutex);
> put_online_cpus();
> }
> EXPORT_SYMBOL(kmem_cache_destroy);
>
--
Regards,
Srivatsa S. Bhat
* Re: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))
2012-10-03 3:50 ` Srivatsa S. Bhat
@ 2012-10-03 6:08 ` Srivatsa S. Bhat
2012-10-03 8:21 ` Srivatsa S. Bhat
0 siblings, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 6:08 UTC (permalink / raw)
To: Jiri Kosina
Cc: Paul E. McKenney, Christoph Lameter, Pekka Enberg,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm
On 10/03/2012 09:20 AM, Srivatsa S. Bhat wrote:
> On 10/03/2012 06:15 AM, Jiri Kosina wrote:
>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>
>>> On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
>>>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>>>
>>>>> Indeed. Slab seems to be doing an rcu_barrier() in a CPU hotplug
>>>>> notifier, which doesn't sit so well with rcu_barrier() trying to exclude
>>>>> CPU hotplug events. I could go back to the old approach, but it is
>>>>> significantly more complex. I cannot say that I am all that happy about
>>>>> anyone calling rcu_barrier() from a CPU hotplug notifier because it
>>>>> doesn't help CPU hotplug latency, but that is a separate issue.
>>>>>
>>>>> But the thing is that rcu_barrier()'s assumptions work just fine if either
>>>>> (1) it excludes hotplug operations or (2) if it is called from a hotplug
>>>>> notifier. You see, either way, the CPU cannot go away while rcu_barrier()
>>>>> is executing. So the right way to resolve this seems to be to do the
>>>>> get_online_cpus() only if rcu_barrier() is -not- executing in the context
>>>>> of a hotplug notifier. Should be fixable without too much hassle...
>>>>
>>>> Sorry, I don't think I understand what you are proposing just yet.
>>>>
>>>> If I understand it correctly, you are proposing to introduce some magic
>>>> into _rcu_barrier() such as (pseudocode of course):
>>>>
>>>> if (!being_called_from_hotplug_notifier_callback)
>>>> get_online_cpus()
>>>>
>>>> How does that protect from the scenario I've outlined before though?
>>>>
>>>> CPU 0 CPU 1
>>>> kmem_cache_destroy()
>>>> mutex_lock(slab_mutex)
>>>> _cpu_up()
>>>> cpu_hotplug_begin()
>>>> mutex_lock(cpu_hotplug.lock)
>>>> rcu_barrier()
>>>> _rcu_barrier()
>>>> get_online_cpus()
>>>> mutex_lock(cpu_hotplug.lock)
>>>> (blocks, CPU 1 has the mutex)
>>>> __cpu_notify()
>>>> mutex_lock(slab_mutex)
>>>>
>>>> CPU 0 grabs both locks anyway (it's not running from notifier callback).
>>>> CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called
>>>> from notifier callback either.
>>>>
>>>> What did I miss?
>>>
>>> You didn't miss anything, I was suffering a failure to read carefully.
>>>
>>> So my next stupid question is "Why can't kmem_cache_destroy drop
>>> slab_mutex early?" like the following:
>>>
>>> void kmem_cache_destroy(struct kmem_cache *cachep)
>>> {
>>> BUG_ON(!cachep || in_interrupt());
>>>
>>> /* Find the cache in the chain of caches. */
>>> get_online_cpus();
>>> mutex_lock(&slab_mutex);
>>> /*
>>> * the chain is never empty, cache_cache is never destroyed
>>> */
>>> list_del(&cachep->list);
>>> if (__cache_shrink(cachep)) {
>>> slab_error(cachep, "Can't free all objects");
>>> list_add(&cachep->list, &slab_caches);
>>> mutex_unlock(&slab_mutex);
>>> put_online_cpus();
>>> return;
>>> }
>>> mutex_unlock(&slab_mutex);
>>>
>>> if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>>> rcu_barrier();
>>>
>>> __kmem_cache_destroy(cachep);
>>> put_online_cpus();
>>> }
>>>
>>> Or did I miss some reason why __kmem_cache_destroy() needs that lock?
>>> Looks to me like it is just freeing now-disconnected memory.
>>
>> Good question. I believe it should be safe to drop slab_mutex earlier, as
>> cachep has already been unlinked. I am adding slab people and linux-mm to
>> CC (the whole thread on LKML can be found at
>> https://lkml.org/lkml/2012/10/2/296 for reference).
>>
>> How about the patch below? Pekka, Christoph, please?
>>
>> It makes the lockdep happy again, and obviously removes the deadlock (I
>> tested it).
>>
>>
>>
>> From: Jiri Kosina <jkosina@suse.cz>
>> Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()
>>
>> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
>> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
>> dependency through kmem_cache_destroy() -> rcu_barrier() ->
>> _rcu_barrier() -> get_online_cpus().
>>
>> This opens a possibility for deadlock:
>>
>> CPU 0 CPU 1
>> kmem_cache_destroy()
>> mutex_lock(slab_mutex)
>> _cpu_up()
>> cpu_hotplug_begin()
>> mutex_lock(cpu_hotplug.lock)
>> rcu_barrier()
>> _rcu_barrier()
>> get_online_cpus()
>> mutex_lock(cpu_hotplug.lock)
>> (blocks, CPU 1 has the mutex)
>> __cpu_notify()
>> mutex_lock(slab_mutex)
>
> Hmm.. no, this should *never* happen IMHO!
>
> If I am seeing the code right, kmem_cache_destroy() wraps its entire content
> inside get/put_online_cpus(), which means it cannot run concurrently with cpu_up()
> or cpu_down(). Are we really hitting a corner case where the refcounting logic
> in get/put_online_cpus() is failing and allowing a hotplug writer to run in
> parallel with a hotplug reader? If yes, *that* is the problem we have to fix..
>
One scenario where we can see this happen is if we had a put_online_cpus() "leak"
somewhere.. that is, perhaps the task that was about to invoke kmem_cache_destroy()
previously called put_online_cpus() once too many. In that case, the get_online_cpus()
in kmem_cache_destroy() might prove useless in excluding it from concurrent hotplug
operations.
Jiri, can you please try the debug patch below? It would be good to see if the
refcount ever went negative...
Regards,
Srivatsa S. Bhat
----------------------------------------------
kernel/cpu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..cf82da9 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -69,6 +69,7 @@ void get_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(&cpu_hotplug.lock);
+ BUG_ON(cpu_hotplug.refcount < 0);
cpu_hotplug.refcount++;
mutex_unlock(&cpu_hotplug.lock);
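[Editorial aside: the leak hypothesis can be demonstrated with a toy refcount, illustrative only. One put_online_cpus() too many drives the count negative, and the next reader's get brings it back to zero — exactly the state a waiting hotplug writer is looking for, so the reader's "protection" is gone.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a put_online_cpus() "leak": counts only, no locking.
 * A writer in cpu_hotplug_begin() waits until refcount == 0. */
static int refcount;

static void get_model(void) { refcount++; }
static void put_model(void) { refcount--; }   /* no underflow check */

static bool writer_may_proceed(void)
{
    return refcount == 0;          /* what the writer effectively tests */
}
```

After a balanced get/put plus one leaked put, a later get_model() lands the count on zero, letting the writer run concurrently with a task that believes it is holding hotplug off — the BUG_ON above would catch the leaked put at the moment it happens.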
* Re: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))
2012-10-03 6:08 ` Srivatsa S. Bhat
@ 2012-10-03 8:21 ` Srivatsa S. Bhat
0 siblings, 0 replies; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 8:21 UTC (permalink / raw)
To: Jiri Kosina
Cc: Paul E. McKenney, Christoph Lameter, Pekka Enberg,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm
On 10/03/2012 11:38 AM, Srivatsa S. Bhat wrote:
> On 10/03/2012 09:20 AM, Srivatsa S. Bhat wrote:
>> On 10/03/2012 06:15 AM, Jiri Kosina wrote:
>>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>>
>>>> On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
>>>>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>>>>
>>>>>> Indeed. Slab seems to be doing an rcu_barrier() in a CPU hotplug
>>>>>> notifier, which doesn't sit so well with rcu_barrier() trying to exclude
>>>>>> CPU hotplug events. I could go back to the old approach, but it is
>>>>>> significantly more complex. I cannot say that I am all that happy about
>>>>>> anyone calling rcu_barrier() from a CPU hotplug notifier because it
>>>>>> doesn't help CPU hotplug latency, but that is a separate issue.
>>>>>>
>>>>>> But the thing is that rcu_barrier()'s assumptions work just fine if either
>>>>>> (1) it excludes hotplug operations or (2) if it is called from a hotplug
>>>>>> notifier. You see, either way, the CPU cannot go away while rcu_barrier()
>>>>>> is executing. So the right way to resolve this seems to be to do the
>>>>>> get_online_cpus() only if rcu_barrier() is -not- executing in the context
>>>>>> of a hotplug notifier. Should be fixable without too much hassle...
>>>>>
>>>>> Sorry, I don't think I understand what you are proposing just yet.
>>>>>
>>>>> If I understand it correctly, you are proposing to introduce some magic
>>>>> into _rcu_barrier() such as (pseudocode of course):
>>>>>
>>>>> if (!being_called_from_hotplug_notifier_callback)
>>>>> get_online_cpus()
>>>>>
>>>>> How does that protect from the scenario I've outlined before though?
>>>>>
>>>>> CPU 0 CPU 1
>>>>> kmem_cache_destroy()
>>>>> mutex_lock(slab_mutex)
>>>>> _cpu_up()
>>>>> cpu_hotplug_begin()
>>>>> mutex_lock(cpu_hotplug.lock)
>>>>> rcu_barrier()
>>>>> _rcu_barrier()
>>>>> get_online_cpus()
>>>>> mutex_lock(cpu_hotplug.lock)
>>>>> (blocks, CPU 1 has the mutex)
>>>>> __cpu_notify()
>>>>> mutex_lock(slab_mutex)
>>>>>
>>>>> CPU 0 grabs both locks anyway (it's not running from notifier callback).
>>>>> CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called
>>>>> from notifier callback either.
>>>>>
>>>>> What did I miss?
>>>>
>>>> You didn't miss anything, I was suffering a failure to read carefully.
>>>>
>>>> So my next stupid question is "Why can't kmem_cache_destroy drop
>>>> slab_mutex early?" like the following:
>>>>
>>>> void kmem_cache_destroy(struct kmem_cache *cachep)
>>>> {
>>>> BUG_ON(!cachep || in_interrupt());
>>>>
>>>> /* Find the cache in the chain of caches. */
>>>> get_online_cpus();
>>>> mutex_lock(&slab_mutex);
>>>> /*
>>>> * the chain is never empty, cache_cache is never destroyed
>>>> */
>>>> list_del(&cachep->list);
>>>> if (__cache_shrink(cachep)) {
>>>> slab_error(cachep, "Can't free all objects");
>>>> list_add(&cachep->list, &slab_caches);
>>>> mutex_unlock(&slab_mutex);
>>>> put_online_cpus();
>>>> return;
>>>> }
>>>> mutex_unlock(&slab_mutex);
>>>>
>>>> if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>>>> rcu_barrier();
>>>>
>>>> __kmem_cache_destroy(cachep);
>>>> put_online_cpus();
>>>> }
>>>>
>>>> Or did I miss some reason why __kmem_cache_destroy() needs that lock?
>>>> Looks to me like it is just freeing now-disconnected memory.
>>>
>>> Good question. I believe it should be safe to drop slab_mutex earlier, as
>>> cachep has already been unlinked. I am adding slab people and linux-mm to
>>> CC (the whole thread on LKML can be found at
>>> https://lkml.org/lkml/2012/10/2/296 for reference).
>>>
>>> How about the patch below? Pekka, Christoph, please?
>>>
>>> It makes the lockdep happy again, and obviously removes the deadlock (I
>>> tested it).
>>>
>>>
>>>
>>> From: Jiri Kosina <jkosina@suse.cz>
>>> Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()
>>>
>>> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
>>> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
>>> dependency through kmem_cache_destroy() -> rcu_barrier() ->
>>> _rcu_barrier() -> get_online_cpus().
>>>
>>> This opens a possibility for deadlock:
>>>
>>> CPU 0 CPU 1
>>> kmem_cache_destroy()
>>> mutex_lock(slab_mutex)
>>> _cpu_up()
>>> cpu_hotplug_begin()
>>> mutex_lock(cpu_hotplug.lock)
>>> rcu_barrier()
>>> _rcu_barrier()
>>> get_online_cpus()
>>> mutex_lock(cpu_hotplug.lock)
>>> (blocks, CPU 1 has the mutex)
>>> __cpu_notify()
>>> mutex_lock(slab_mutex)
>>
>> Hmm.. no, this should *never* happen IMHO!
>>
>> If I am seeing the code right, kmem_cache_destroy() wraps its entire content
>> inside get/put_online_cpus(), which means it cannot run concurrently with cpu_up()
>> or cpu_down(). Are we really hitting a corner case where the refcounting logic
>> in get/put_online_cpus() is failing and allowing a hotplug writer to run in
>> parallel with a hotplug reader? If yes, *that* is the problem we have to fix..
>>
>
> One scenario where we can see this happen is if we had a put_online_cpus() "leak"
> somewhere.. that is, perhaps the task that was about to invoke kmem_cache_destroy()
> previously called put_online_cpus() once too many. In that case, the get_online_cpus()
> in kmem_cache_destroy() might prove useless in excluding it from concurrent hotplug
> operations.
>
> Jiri, can you please try the debug patch below? It would be good to see if the
> refcount ever went negative...
>
Better to catch the bug even earlier, at the right moment, in put_online_cpus()
itself. Could you try this one instead?
Regards,
Srivatsa S. Bhat
----------------------------------------------------
kernel/cpu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..00d29bc 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,7 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(&cpu_hotplug.lock);
+ BUG_ON(cpu_hotplug.refcount == 0);
if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
wake_up_process(cpu_hotplug.active_writer);
mutex_unlock(&cpu_hotplug.lock);
* [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-03 0:45 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
2012-10-03 3:41 ` Paul E. McKenney
2012-10-03 3:50 ` Srivatsa S. Bhat
@ 2012-10-03 9:46 ` Jiri Kosina
2012-10-03 12:22 ` Srivatsa S. Bhat
2012-10-03 14:17 ` Christoph Lameter
2012-10-03 14:15 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Christoph Lameter
3 siblings, 2 replies; 24+ messages in thread
From: Jiri Kosina @ 2012-10-03 9:46 UTC (permalink / raw)
To: Paul E. McKenney, Christoph Lameter, Pekka Enberg
Cc: Paul E. McKenney, Josh Triplett, linux-kernel, Srivatsa S. Bhat,
linux-mm
On Wed, 3 Oct 2012, Jiri Kosina wrote:
> Good question. I believe it should be safe to drop slab_mutex earlier, as
> cachep has already been unlinked. I am adding slab people and linux-mm to
> CC (the whole thread on LKML can be found at
> https://lkml.org/lkml/2012/10/2/296 for reference).
>
> How about the patch below? Pekka, Christoph, please?
It turns out that lockdep is actually getting this wrong, so the changelog
in the previous version wasn't accurate.
Please find patch with updated changelog below. Pekka, Christoph, could
you please check whether it makes sense to you as well? Thanks.
From: Jiri Kosina <jkosina@suse.cz>
Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
__stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
dependency through kmem_cache_destroy() -> rcu_barrier() ->
_rcu_barrier() -> get_online_cpus().
Lockdep thinks that this might actually result in ABBA deadlock,
and reports it as below:
=== [ cut here ] ===
======================================================
[ INFO: possible circular locking dependency detected ]
3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
-------------------------------------------------------
kworker/u:2/40 is trying to acquire lock:
(rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
but task is already holding lock:
(slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (slab_mutex){+.+.+.}:
[<ffffffff810ae1e2>] validate_chain+0x632/0x720
[<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
[<ffffffff810ae921>] lock_acquire+0x121/0x190
[<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
[<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
[<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
[<ffffffff81564b83>] notifier_call_chain+0x93/0x140
[<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8155719d>] _cpu_up+0xba/0x14e
[<ffffffff815572ed>] cpu_up+0xbc/0x117
[<ffffffff81ae05e3>] smp_init+0x6b/0x9f
[<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
[<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
-> #1 (cpu_hotplug.lock){+.+.+.}:
[<ffffffff810ae1e2>] validate_chain+0x632/0x720
[<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
[<ffffffff810ae921>] lock_acquire+0x121/0x190
[<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
[<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
[<ffffffff81049197>] get_online_cpus+0x37/0x50
[<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
[<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
[<ffffffff810f2309>] rcu_barrier+0x9/0x10
[<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
[<ffffffff8118cc01>] deactivate_super+0x61/0x70
[<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
[<ffffffff811ab49e>] sys_umount+0x6e/0xd0
[<ffffffff81569979>] system_call_fastpath+0x16/0x1b
-> #0 (rcu_sched_state.barrier_mutex){+.+...}:
[<ffffffff810adb4e>] check_prev_add+0x3de/0x440
[<ffffffff810ae1e2>] validate_chain+0x632/0x720
[<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
[<ffffffff810ae921>] lock_acquire+0x121/0x190
[<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
[<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
[<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
[<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
[<ffffffff810f2309>] rcu_barrier+0x9/0x10
[<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
[<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
[<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
[<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
[<ffffffff81454b79>] ops_exit_list+0x39/0x60
[<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
[<ffffffff8106917b>] process_one_work+0x26b/0x4c0
[<ffffffff81069f3e>] worker_thread+0x12e/0x320
[<ffffffff8106f73e>] kthread+0x9e/0xb0
[<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
other info that might help us debug this:
Chain exists of:
rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(slab_mutex);
lock(cpu_hotplug.lock);
lock(slab_mutex);
lock(rcu_sched_state.barrier_mutex);
*** DEADLOCK ***
=== [ cut here ] ===
This is actually a false positive. Lockdep has no way of knowing the fact
that the ABBA can actually never happen, because of special semantics of
cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
exclusion there is not achieved through mutex, but through
cpu_hotplug.refcount.
The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
until everyone who called get_online_cpus() will call put_online_cpus()"
semantics is totally invisible to lockdep.
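[Editorial aside: what lockdep does see can be sketched as a tiny lock-dependency graph — illustrative and single-threaded, with SLAB = slab_mutex, HOTPLUG = cpu_hotplug.lock, BARRIER = rcu_sched_state.barrier_mutex.]

```c
#include <stdbool.h>
#include <string.h>

/* Minimal sketch of lockdep's dependency tracking: dep[a][b] records
 * that lock b was acquired while lock a was held.  A new acquisition
 * is flagged if it would close a cycle in this graph. */
#define NLOCKS 3
enum { SLAB, HOTPLUG, BARRIER };

static bool dep[NLOCKS][NLOCKS];

static bool reach(int from, int to, bool *seen)
{
    if (seen[from])
        return false;
    seen[from] = true;
    for (int k = 0; k < NLOCKS; k++)
        if (dep[from][k] && (k == to || reach(k, to, seen)))
            return true;
    return false;
}

/* Would acquiring 'lock' while holding 'held' close a cycle? */
static bool would_warn(int held, int lock)
{
    bool seen[NLOCKS];
    memset(seen, 0, sizeof(seen));
    return held == lock || reach(lock, held, seen);
}

static void record(int held, int lock)
{
    dep[held][lock] = true;
}
```

Feeding in chain #2 (slab_mutex under cpu_hotplug.lock) and chain #1 (cpu_hotplug.lock under barrier_mutex), an attempt to take barrier_mutex under slab_mutex closes the cycle and triggers the report. The graph has no way to encode that the refcount already serializes the two sides, so the only way to quiet the report is to never create the slab_mutex -> barrier_mutex edge in the first place.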
This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
is called with it unlocked. This has two advantages:
- it slightly reduces the hold time of slab_mutex; as it is used to protect
the cachep list, it is no longer necessary to hold it over the
__kmem_cache_destroy() call
- it silences the lockdep false positive, as it avoids lockdep ever
learning about the slab_mutex -> cpu_hotplug.lock dependency
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
mm/slab.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/slab.c b/mm/slab.c
index 1133911..693c7cb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
put_online_cpus();
return;
}
+ mutex_unlock(&slab_mutex);
if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
rcu_barrier();
__kmem_cache_destroy(cachep);
- mutex_unlock(&slab_mutex);
put_online_cpus();
}
EXPORT_SYMBOL(kmem_cache_destroy);
--
Jiri Kosina
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* Re: [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-03 9:46 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
@ 2012-10-03 12:22 ` Srivatsa S. Bhat
2012-10-03 12:53 ` [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus() Srivatsa S. Bhat
2012-10-03 14:50 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Paul E. McKenney
2012-10-03 14:17 ` Christoph Lameter
1 sibling, 2 replies; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 12:22 UTC (permalink / raw)
To: Jiri Kosina
Cc: Paul E. McKenney, Christoph Lameter, Pekka Enberg,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm
On 10/03/2012 03:16 PM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Jiri Kosina wrote:
>
>> Good question. I believe it should be safe to drop slab_mutex earlier, as
>> cachep has already been unlinked. I am adding slab people and linux-mm to
>> CC (the whole thread on LKML can be found at
>> https://lkml.org/lkml/2012/10/2/296 for reference).
>>
>> How about the patch below? Pekka, Christoph, please?
>
> It turns out that lockdep is actually getting this wrong, so the changelog
> in the previous version wasn't accurate.
>
> Please find patch with updated changelog below. Pekka, Christoph, could
> you please check whether it makes sense to you as well? Thanks.
>
>
>
>
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
>
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
>
> Lockdep thinks that this might actually result in ABBA deadlock,
> and reports it as below:
>
> === [ cut here ] ===
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
> -------------------------------------------------------
> kworker/u:2/40 is trying to acquire lock:
> (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>
> but task is already holding lock:
> (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (slab_mutex){+.+.+.}:
> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> [<ffffffff810ae921>] lock_acquire+0x121/0x190
> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
> [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
> [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
> [<ffffffff8155719d>] _cpu_up+0xba/0x14e
> [<ffffffff815572ed>] cpu_up+0xbc/0x117
> [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
> [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
> [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>
> -> #1 (cpu_hotplug.lock){+.+.+.}:
> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> [<ffffffff810ae921>] lock_acquire+0x121/0x190
> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> [<ffffffff81049197>] get_online_cpus+0x37/0x50
> [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
> [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
> [<ffffffff8118cc01>] deactivate_super+0x61/0x70
> [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
> [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
> [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
>
> -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
> [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> [<ffffffff810ae921>] lock_acquire+0x121/0x190
> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
> [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
> [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
> [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
> [<ffffffff81454b79>] ops_exit_list+0x39/0x60
> [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
> [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
> [<ffffffff81069f3e>] worker_thread+0x12e/0x320
> [<ffffffff8106f73e>] kthread+0x9e/0xb0
> [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>
> other info that might help us debug this:
>
> Chain exists of:
> rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(slab_mutex);
> lock(cpu_hotplug.lock);
> lock(slab_mutex);
> lock(rcu_sched_state.barrier_mutex);
>
> *** DEADLOCK ***
> === [ cut here ] ===
>
> This is actually a false positive. Lockdep has no way of knowing that the
> ABBA deadlock can never actually happen, because of the special semantics
> of cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
> exclusion there is not achieved through the mutex, but through
> cpu_hotplug.refcount.
>
> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
> until everyone who called get_online_cpus() has called put_online_cpus()"
> semantics is completely invisible to lockdep.
>
> This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
> is being called with it unlocked. It has two advantages:
>
> - it slightly reduces hold time of slab_mutex; as it's used to protect
> the cachep list, it's not necessary to hold it over __kmem_cache_destroy()
> call any more
> - it silences the lockdep false positive warning, as it avoids lockdep ever
> learning about slab_mutex -> cpu_hotplug.lock dependency
>
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Earlier I was under the wrong impression that all the calltraces that lockdep
spewed were reflecting the same point in time. So, sorry about that noise! :-)
It is indeed a false positive; there is no real bug here, and the fix looks
good, provided __kmem_cache_destroy() doesn't expect slab_mutex to be held.
But, I'm also quite surprised that the put_online_cpus() code as it stands today
doesn't have any checks for the refcount going negative. I believe that such a
check would be valuable to help catch cases where we might end up inadvertently
causing an imbalance between get_online_cpus() and put_online_cpus(). I'll post
that as a separate patch.
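To see why an unnoticed imbalance is dangerous, consider a small user-space
model (a hypothetical sketch; the names are invented here): one
put_online_cpus() too many silently drives the count negative, and the next
legitimate reader brings it back to zero while still inside its critical
section, so a hotplug writer would wrongly be admitted:

```c
#include <assert.h>

static int refcount;  /* models cpu_hotplug.refcount, with no underflow check */

static void reader_enter(void) { refcount++; }  /* get_online_cpus() */
static void reader_exit(void)  { refcount--; }  /* put_online_cpus() */

/* The hotplug writer is admitted once the refcount reads zero. */
static int writer_admitted(void) { return refcount == 0; }
```

With balanced calls the writer only sees zero when no reader is active; with
one extra exit, the invariant is broken for every subsequent reader.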
Regards,
Srivatsa S. Bhat
> ---
> mm/slab.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 1133911..693c7cb 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
> put_online_cpus();
> return;
> }
> + mutex_unlock(&slab_mutex);
>
> if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> rcu_barrier();
>
> __kmem_cache_destroy(cachep);
> - mutex_unlock(&slab_mutex);
> put_online_cpus();
> }
> EXPORT_SYMBOL(kmem_cache_destroy);
>
* [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
2012-10-03 12:22 ` Srivatsa S. Bhat
@ 2012-10-03 12:53 ` Srivatsa S. Bhat
2012-10-03 21:13 ` Andrew Morton
2012-10-03 14:50 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Paul E. McKenney
1 sibling, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 12:53 UTC (permalink / raw)
To: Jiri Kosina, Thomas Gleixner, akpm@linux-foundation.org,
Ingo Molnar, Peter Zijlstra
Cc: Paul E. McKenney, Christoph Lameter, Pekka Enberg,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm
On 10/03/2012 05:52 PM, Srivatsa S. Bhat wrote:
> On 10/03/2012 03:16 PM, Jiri Kosina wrote:
>> On Wed, 3 Oct 2012, Jiri Kosina wrote:
>>
>>> Good question. I believe it should be safe to drop slab_mutex earlier, as
>>> cachep has already been unlinked. I am adding slab people and linux-mm to
>>> CC (the whole thread on LKML can be found at
>>> https://lkml.org/lkml/2012/10/2/296 for reference).
>>>
[...]
>
> But, I'm also quite surprised that the put_online_cpus() code as it stands today
> doesn't have any checks for the refcount going negative. I believe that such a
> check would be valuable to help catch cases where we might end up inadvertently
> causing an imbalance between get_online_cpus() and put_online_cpus(). I'll post
> that as a separate patch.
>
-----------------------------------
From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safeguarded by cpu_hotplug.lock.
get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we compromise the guarantees
of the hotplug synchronization; for example, an extra call to put_online_cpus()
can end up allowing a hotplug reader to execute concurrently with a hotplug
writer. So, add a BUG_ON() in put_online_cpus() to detect cases where the
refcount would go negative.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
kernel/cpu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..00d29bc 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,7 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(&cpu_hotplug.lock);
+ BUG_ON(cpu_hotplug.refcount == 0);
if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
wake_up_process(cpu_hotplug.active_writer);
mutex_unlock(&cpu_hotplug.lock);
* Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
2012-10-03 12:53 ` [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus() Srivatsa S. Bhat
@ 2012-10-03 21:13 ` Andrew Morton
2012-10-04 6:16 ` Srivatsa S. Bhat
0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2012-10-03 21:13 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: Jiri Kosina, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Paul E. McKenney, Christoph Lameter, Pekka Enberg,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm
On Wed, 03 Oct 2012 18:23:09 +0530
"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> The synchronization between CPU hotplug readers and writers is achieved by
> means of refcounting, safeguarded by the cpu_hotplug.lock.
>
> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
> it. If we ever hit an imbalance between the two, we end up compromising the
> guarantees of the hotplug synchronization i.e, for example, an extra call to
> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
> where the refcount can go negative.
>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
> kernel/cpu.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index f560598..00d29bc 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -80,6 +80,7 @@ void put_online_cpus(void)
> if (cpu_hotplug.active_writer == current)
> return;
> mutex_lock(&cpu_hotplug.lock);
> + BUG_ON(cpu_hotplug.refcount == 0);
> if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
> wake_up_process(cpu_hotplug.active_writer);
> mutex_unlock(&cpu_hotplug.lock);
I think calling BUG() here is a bit harsh. We should only do that if
there's a risk to proceeding: a risk of data loss, a reduced ability to
analyse the underlying bug, etc.
But a cpu-hotplug locking imbalance is a really really really minor
problem! So how about we emit a warning then try to fix things up?
This should increase the chance that the machine will keep running and
so will increase the chance that a user will be able to report the bug
to us.
--- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
+++ a/kernel/cpu.c
@@ -80,9 +80,12 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(&cpu_hotplug.lock);
- BUG_ON(cpu_hotplug.refcount == 0);
- if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
- wake_up_process(cpu_hotplug.active_writer);
+ if (!--cpu_hotplug.refcount) {
+ if (WARN_ON(cpu_hotplug.refcount == -1))
+ cpu_hotplug.refcount++; /* try to fix things up */
+ if (unlikely(cpu_hotplug.active_writer))
+ wake_up_process(cpu_hotplug.active_writer);
+ }
mutex_unlock(&cpu_hotplug.lock);
}
_
* Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
2012-10-03 21:13 ` Andrew Morton
@ 2012-10-04 6:16 ` Srivatsa S. Bhat
2012-10-05 3:24 ` Yasuaki Ishimatsu
0 siblings, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-04 6:16 UTC (permalink / raw)
To: Andrew Morton
Cc: Jiri Kosina, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Paul E. McKenney, Christoph Lameter, Pekka Enberg,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm
On 10/04/2012 02:43 AM, Andrew Morton wrote:
> On Wed, 03 Oct 2012 18:23:09 +0530
> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>
>> The synchronization between CPU hotplug readers and writers is achieved by
>> means of refcounting, safeguarded by the cpu_hotplug.lock.
>>
>> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
>> it. If we ever hit an imbalance between the two, we end up compromising the
>> guarantees of the hotplug synchronization i.e, for example, an extra call to
>> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
>> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
>> where the refcount can go negative.
>>
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> ---
>>
>> kernel/cpu.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index f560598..00d29bc 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>> if (cpu_hotplug.active_writer == current)
>> return;
>> mutex_lock(&cpu_hotplug.lock);
>> + BUG_ON(cpu_hotplug.refcount == 0);
>> if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>> wake_up_process(cpu_hotplug.active_writer);
>> mutex_unlock(&cpu_hotplug.lock);
>
> I think calling BUG() here is a bit harsh. We should only do that if
> there's a risk to proceeding: a risk of data loss, a reduced ability to
> analyse the underlying bug, etc.
>
> But a cpu-hotplug locking imbalance is a really really really minor
> problem! So how about we emit a warning then try to fix things up?
That would be better indeed, thanks!
> This should increase the chance that the machine will keep running and
> so will increase the chance that a user will be able to report the bug
> to us.
>
Yep, sounds good.
>
> --- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
> +++ a/kernel/cpu.c
> @@ -80,9 +80,12 @@ void put_online_cpus(void)
> if (cpu_hotplug.active_writer == current)
> return;
> mutex_lock(&cpu_hotplug.lock);
> - BUG_ON(cpu_hotplug.refcount == 0);
> - if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
> - wake_up_process(cpu_hotplug.active_writer);
> + if (!--cpu_hotplug.refcount) {
This won't catch it. We'll enter this 'if' branch only when cpu_hotplug.refcount
was decremented to zero, so we'll miss the case where it went negative (which is
exactly what we intended to detect).
> + if (WARN_ON(cpu_hotplug.refcount == -1))
> + cpu_hotplug.refcount++; /* try to fix things up */
> + if (unlikely(cpu_hotplug.active_writer))
> + wake_up_process(cpu_hotplug.active_writer);
> + }
> mutex_unlock(&cpu_hotplug.lock);
>
> }
So how about something like below:
------------------------------------------------------>
From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safeguarded by cpu_hotplug.lock.
get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we compromise the guarantees
of the hotplug synchronization; for example, an extra call to put_online_cpus()
can end up allowing a hotplug reader to execute concurrently with a hotplug
writer. So, add a WARN_ON() in put_online_cpus() to detect cases where the
refcount would go negative, and also attempt to fix it up, so that we can
continue to run.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
kernel/cpu.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..42bd331 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,10 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(&cpu_hotplug.lock);
+
+ if (WARN_ON(!cpu_hotplug.refcount))
+ cpu_hotplug.refcount++; /* try to fix things up */
+
if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
wake_up_process(cpu_hotplug.active_writer);
mutex_unlock(&cpu_hotplug.lock);
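The difference between the two debug checks can be demonstrated with a small
user-space model (hypothetical; the function names below are invented for the
sketch). A test for -1 placed inside the `!--refcount` branch can never fire,
because that branch is only taken when the decremented value is exactly zero,
whereas testing for zero before the decrement catches the underflow and can
repair it:

```c
#include <assert.h>

/* Earlier proposal: warn inside the "dropped to zero" branch.
 * Returns 1 if an imbalance was detected. */
static int put_check_after_decrement(int *refcount)
{
    int detected = 0;
    if (!--(*refcount)) {
        /* Only reached when *refcount == 0, so this test is dead code. */
        if (*refcount == -1) {
            detected = 1;
            (*refcount)++;
        }
    }
    return detected;
}

/* Final version: warn before decrementing, then fix things up. */
static int put_check_before_decrement(int *refcount)
{
    int detected = 0;
    if (!*refcount) {
        detected = 1;
        (*refcount)++;  /* try to fix things up */
    }
    --(*refcount);
    return detected;
}
```

Running the first variant with a refcount of zero lets it underflow to -1
without any warning; the second variant reports the imbalance and leaves the
refcount at zero.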
* Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
2012-10-04 6:16 ` Srivatsa S. Bhat
@ 2012-10-05 3:24 ` Yasuaki Ishimatsu
2012-10-05 5:35 ` Srivatsa S. Bhat
0 siblings, 1 reply; 24+ messages in thread
From: Yasuaki Ishimatsu @ 2012-10-05 3:24 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: Andrew Morton, Jiri Kosina, Thomas Gleixner, Ingo Molnar,
Peter Zijlstra, Paul E. McKenney, Christoph Lameter, Pekka Enberg,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm
2012/10/04 15:16, Srivatsa S. Bhat wrote:
> On 10/04/2012 02:43 AM, Andrew Morton wrote:
>> On Wed, 03 Oct 2012 18:23:09 +0530
>> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>
>>> The synchronization between CPU hotplug readers and writers is achieved by
>>> means of refcounting, safeguarded by the cpu_hotplug.lock.
>>>
>>> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
>>> it. If we ever hit an imbalance between the two, we end up compromising the
>>> guarantees of the hotplug synchronization i.e, for example, an extra call to
>>> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
>>> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
>>> where the refcount can go negative.
>>>
>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>> ---
>>>
>>> kernel/cpu.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>>> index f560598..00d29bc 100644
>>> --- a/kernel/cpu.c
>>> +++ b/kernel/cpu.c
>>> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>>> if (cpu_hotplug.active_writer == current)
>>> return;
>>> mutex_lock(&cpu_hotplug.lock);
>>> + BUG_ON(cpu_hotplug.refcount == 0);
>>> if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>> wake_up_process(cpu_hotplug.active_writer);
>>> mutex_unlock(&cpu_hotplug.lock);
>>
>> I think calling BUG() here is a bit harsh. We should only do that if
>> there's a risk to proceeding: a risk of data loss, a reduced ability to
>> analyse the underlying bug, etc.
>>
>> But a cpu-hotplug locking imbalance is a really really really minor
>> problem! So how about we emit a warning then try to fix things up?
>
> That would be better indeed, thanks!
>
>> This should increase the chance that the machine will keep running and
>> so will increase the chance that a user will be able to report the bug
>> to us.
>>
>
> Yep, sounds good.
>
>>
>> --- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
>> +++ a/kernel/cpu.c
>> @@ -80,9 +80,12 @@ void put_online_cpus(void)
>> if (cpu_hotplug.active_writer == current)
>> return;
>> mutex_lock(&cpu_hotplug.lock);
>> - BUG_ON(cpu_hotplug.refcount == 0);
>> - if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>> - wake_up_process(cpu_hotplug.active_writer);
>> + if (!--cpu_hotplug.refcount) {
>
> This won't catch it. We'll enter this 'if' condition only when cpu_hotplug.refcount was
> decremented to zero. We'll miss out the case when it went negative (which we intended to detect).
>
>> + if (WARN_ON(cpu_hotplug.refcount == -1))
>> + cpu_hotplug.refcount++; /* try to fix things up */
>> + if (unlikely(cpu_hotplug.active_writer))
>> + wake_up_process(cpu_hotplug.active_writer);
>> + }
>> mutex_unlock(&cpu_hotplug.lock);
>>
>> }
>
> So how about something like below:
>
> ------------------------------------------------------>
>
> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
>
> The synchronization between CPU hotplug readers and writers is achieved by
> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>
> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
> it. If we ever hit an imbalance between the two, we end up compromising the
> guarantees of the hotplug synchronization i.e, for example, an extra call to
> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
> a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect such cases
> where the refcount can go negative, and also attempt to fix it up, so that we can
> continue to run.
>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
Looks good to me.
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>
> kernel/cpu.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index f560598..42bd331 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -80,6 +80,10 @@ void put_online_cpus(void)
> if (cpu_hotplug.active_writer == current)
> return;
> mutex_lock(&cpu_hotplug.lock);
> +
> + if (WARN_ON(!cpu_hotplug.refcount))
> + cpu_hotplug.refcount++; /* try to fix things up */
> +
> if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
> wake_up_process(cpu_hotplug.active_writer);
> mutex_unlock(&cpu_hotplug.lock);
>
>
>
* Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
2012-10-05 3:24 ` Yasuaki Ishimatsu
@ 2012-10-05 5:35 ` Srivatsa S. Bhat
0 siblings, 0 replies; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-05 5:35 UTC (permalink / raw)
To: Yasuaki Ishimatsu
Cc: Andrew Morton, Jiri Kosina, Thomas Gleixner, Ingo Molnar,
Peter Zijlstra, Paul E. McKenney, Christoph Lameter, Pekka Enberg,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm
On 10/05/2012 08:54 AM, Yasuaki Ishimatsu wrote:
> 2012/10/04 15:16, Srivatsa S. Bhat wrote:
>> On 10/04/2012 02:43 AM, Andrew Morton wrote:
>>> On Wed, 03 Oct 2012 18:23:09 +0530
>>> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>
>>>> The synchronization between CPU hotplug readers and writers is
>>>> achieved by
>>>> means of refcounting, safeguarded by the cpu_hotplug.lock.
>>>>
>>>> get_online_cpus() increments the refcount, whereas put_online_cpus()
>>>> decrements
>>>> it. If we ever hit an imbalance between the two, we end up
>>>> compromising the
>>>> guarantees of the hotplug synchronization i.e, for example, an extra
>>>> call to
>>>> put_online_cpus() can end up allowing a hotplug reader to execute
>>>> concurrently with
>>>> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect
>>>> such cases
>>>> where the refcount can go negative.
>>>>
>>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>>> ---
>>>>
>>>> kernel/cpu.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>>>> index f560598..00d29bc 100644
>>>> --- a/kernel/cpu.c
>>>> +++ b/kernel/cpu.c
>>>> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>>>> if (cpu_hotplug.active_writer == current)
>>>> return;
>>>> mutex_lock(&cpu_hotplug.lock);
>>>> + BUG_ON(cpu_hotplug.refcount == 0);
>>>> if (!--cpu_hotplug.refcount &&
>>>> unlikely(cpu_hotplug.active_writer))
>>>> wake_up_process(cpu_hotplug.active_writer);
>>>> mutex_unlock(&cpu_hotplug.lock);
>>>
>>> I think calling BUG() here is a bit harsh. We should only do that if
>>> there's a risk to proceeding: a risk of data loss, a reduced ability to
>>> analyse the underlying bug, etc.
>>>
>>> But a cpu-hotplug locking imbalance is a really really really minor
>>> problem! So how about we emit a warning then try to fix things up?
>>
>> That would be better indeed, thanks!
>>
>>> This should increase the chance that the machine will keep running and
>>> so will increase the chance that a user will be able to report the bug
>>> to us.
>>>
>>
>> Yep, sounds good.
>>
>>>
>>> ---
>>> a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
>>>
>>> +++ a/kernel/cpu.c
>>> @@ -80,9 +80,12 @@ void put_online_cpus(void)
>>> if (cpu_hotplug.active_writer == current)
>>> return;
>>> mutex_lock(&cpu_hotplug.lock);
>>> - BUG_ON(cpu_hotplug.refcount == 0);
>>> - if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>> - wake_up_process(cpu_hotplug.active_writer);
>>> + if (!--cpu_hotplug.refcount) {
>>
>> This won't catch it. We'll enter this 'if' condition only when
>> cpu_hotplug.refcount was
>> decremented to zero. We'll miss out the case when it went negative
>> (which we intended to detect).
>>
>>> + if (WARN_ON(cpu_hotplug.refcount == -1))
>>> + cpu_hotplug.refcount++; /* try to fix things up */
>>> + if (unlikely(cpu_hotplug.active_writer))
>>> + wake_up_process(cpu_hotplug.active_writer);
>>> + }
>>> mutex_unlock(&cpu_hotplug.lock);
>>>
>>> }
>>
>> So how about something like below:
>>
>> ------------------------------------------------------>
>>
>> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> Subject: [PATCH] CPU hotplug, debug: Detect imbalance between
>> get_online_cpus() and put_online_cpus()
>>
>> The synchronization between CPU hotplug readers and writers is
>> achieved by
>> means of refcounting, safeguarded by the cpu_hotplug.lock.
>>
>> get_online_cpus() increments the refcount, whereas put_online_cpus()
>> decrements
>> it. If we ever hit an imbalance between the two, we end up
>> compromising the
>> guarantees of the hotplug synchronization i.e, for example, an extra
>> call to
>> put_online_cpus() can end up allowing a hotplug reader to execute
>> concurrently with
>> a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect
>> such cases
>> where the refcount can go negative, and also attempt to fix it up, so
>> that we can
>> continue to run.
>>
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> ---
>
> Looks good to me.
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>
Thanks for your review Yasuaki!
Regards,
Srivatsa S. Bhat
>>
>> kernel/cpu.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index f560598..42bd331 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -80,6 +80,10 @@ void put_online_cpus(void)
>> if (cpu_hotplug.active_writer == current)
>> return;
>> mutex_lock(&cpu_hotplug.lock);
>> +
>> + if (WARN_ON(!cpu_hotplug.refcount))
>> + cpu_hotplug.refcount++; /* try to fix things up */
>> +
>> if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>> wake_up_process(cpu_hotplug.active_writer);
>> mutex_unlock(&cpu_hotplug.lock);
>>
>>
* Re: [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-03 12:22 ` Srivatsa S. Bhat
2012-10-03 12:53 ` [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus() Srivatsa S. Bhat
@ 2012-10-03 14:50 ` Paul E. McKenney
2012-10-03 14:55 ` Srivatsa S. Bhat
1 sibling, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2012-10-03 14:50 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: Jiri Kosina, Christoph Lameter, Pekka Enberg, Paul E. McKenney,
Josh Triplett, linux-kernel, linux-mm
On Wed, Oct 03, 2012 at 05:52:26PM +0530, Srivatsa S. Bhat wrote:
> On 10/03/2012 03:16 PM, Jiri Kosina wrote:
> > On Wed, 3 Oct 2012, Jiri Kosina wrote:
> >
> >> Good question. I believe it should be safe to drop slab_mutex earlier, as
> >> cachep has already been unlinked. I am adding slab people and linux-mm to
> >> CC (the whole thread on LKML can be found at
> >> https://lkml.org/lkml/2012/10/2/296 for reference).
> >>
> >> How about the patch below? Pekka, Christoph, please?
> >
> > It turns out that lockdep is actually getting this wrong, so the changelog
> > in the previous version wasn't accurate.
> >
> > Please find patch with updated changelog below. Pekka, Christoph, could
> > you please check whether it makes sense to you as well? Thanks.
> >
> >
> >
> >
> > From: Jiri Kosina <jkosina@suse.cz>
> > Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
> >
> > Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> > __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> > dependency through kmem_cache_destroy() -> rcu_barrier() ->
> > _rcu_barrier() -> get_online_cpus().
> >
> > Lockdep thinks that this might actually result in ABBA deadlock,
> > and reports it as below:
> >
> > === [ cut here ] ===
> > ======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
> > -------------------------------------------------------
> > kworker/u:2/40 is trying to acquire lock:
> > (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> >
> > but task is already holding lock:
> > (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #2 (slab_mutex){+.+.+.}:
> > [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> > [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> > [<ffffffff810ae921>] lock_acquire+0x121/0x190
> > [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> > [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> > [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
> > [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
> > [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
> > [<ffffffff8155719d>] _cpu_up+0xba/0x14e
> > [<ffffffff815572ed>] cpu_up+0xbc/0x117
> > [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
> > [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
> > [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> >
> > -> #1 (cpu_hotplug.lock){+.+.+.}:
> > [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> > [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> > [<ffffffff810ae921>] lock_acquire+0x121/0x190
> > [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> > [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> > [<ffffffff81049197>] get_online_cpus+0x37/0x50
> > [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
> > [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> > [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> > [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
> > [<ffffffff8118cc01>] deactivate_super+0x61/0x70
> > [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
> > [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
> > [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
> >
> > -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
> > [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
> > [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> > [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> > [<ffffffff810ae921>] lock_acquire+0x121/0x190
> > [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> > [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> > [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> > [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> > [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> > [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
> > [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
> > [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
> > [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
> > [<ffffffff81454b79>] ops_exit_list+0x39/0x60
> > [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
> > [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
> > [<ffffffff81069f3e>] worker_thread+0x12e/0x320
> > [<ffffffff8106f73e>] kthread+0x9e/0xb0
> > [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> >
> > other info that might help us debug this:
> >
> > Chain exists of:
> > rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
> >
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(slab_mutex);
> > lock(cpu_hotplug.lock);
> > lock(slab_mutex);
> > lock(rcu_sched_state.barrier_mutex);
> >
> > *** DEADLOCK ***
> > === [ cut here ] ===
> >
> > This is actually a false positive. Lockdep has no way of knowing the fact
> > that the ABBA can actually never happen, because of special semantics of
> > cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
> > exclusion there is not achieved through mutex, but through
> > cpu_hotplug.refcount.
> >
> > The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
> > until everyone who called get_online_cpus() will call put_online_cpus()"
> > semantics is totally invisible to lockdep.
> >
> > This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
> > is being called with it unlocked. It has two advantages:
> >
> > - it slightly reduces hold time of slab_mutex; as it's used to protect
> > the cachep list, it's not necessary to hold it over __kmem_cache_destroy()
> > call any more
> > - it silences the lockdep false positive warning, as it avoids lockdep ever
> > learning about slab_mutex -> cpu_hotplug.lock dependency
> >
> > Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Signed-off-by: Jiri Kosina <jkosina@suse.cz>
>
> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>
> Earlier I was under the wrong impression that all the calltraces that lockdep
> spewed were reflecting the same point in time. So, sorry about that noise! :-)
> It is indeed a false-positive and there is no real bug here, and the fix looks
> good, provided __kmem_cache_destroy() doesn't expect slab_mutex to be held.
I am not so sure about it being a false positive. Consider the following
sequence of events:
o Thread A starts a CPU-hotplug operation, acquiring the
hotplug mutex.
o Thread B does a kmem_cache_destroy(), acquiring the slab mutex.
o Thread A reaches the slab CPU-hotplug notifier, but cannot acquire
the slab mutex because Thread B holds it.
o Thread B enters rcu_barrier(), but cannot acquire the hotplug
mutex because Thread A holds it.
So I would argue that lockdep's output was a bit confusing, but that
the deadlock it flagged is real. Or am I still missing something?
Thanx, Paul
> But, I'm also quite surprised that the put_online_cpus() code as it stands today
> doesn't have any checks for the refcount going negative. I believe that such a
> check would be valuable to help catch cases where we might end up inadvertently
> causing an imbalance between get_online_cpus() and put_online_cpus(). I'll post
> that as a separate patch.
>
> Regards,
> Srivatsa S. Bhat
>
> > ---
> > mm/slab.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 1133911..693c7cb 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
> > put_online_cpus();
> > return;
> > }
> > + mutex_unlock(&slab_mutex);
> >
> > if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> > rcu_barrier();
> >
> > __kmem_cache_destroy(cachep);
> > - mutex_unlock(&slab_mutex);
> > put_online_cpus();
> > }
> > EXPORT_SYMBOL(kmem_cache_destroy);
> >
>
* Re: [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-03 14:50 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Paul E. McKenney
@ 2012-10-03 14:55 ` Srivatsa S. Bhat
2012-10-03 16:00 ` Paul E. McKenney
0 siblings, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 14:55 UTC (permalink / raw)
To: paulmck
Cc: Jiri Kosina, Christoph Lameter, Pekka Enberg, Paul E. McKenney,
Josh Triplett, linux-kernel, linux-mm
On 10/03/2012 08:20 PM, Paul E. McKenney wrote:
> On Wed, Oct 03, 2012 at 05:52:26PM +0530, Srivatsa S. Bhat wrote:
>> On 10/03/2012 03:16 PM, Jiri Kosina wrote:
>>> On Wed, 3 Oct 2012, Jiri Kosina wrote:
>>>
>>>> Good question. I believe it should be safe to drop slab_mutex earlier, as
>>>> cachep has already been unlinked. I am adding slab people and linux-mm to
>>>> CC (the whole thread on LKML can be found at
>>>> https://lkml.org/lkml/2012/10/2/296 for reference).
>>>>
>>>> How about the patch below? Pekka, Christoph, please?
>>>
>>> It turns out that lockdep is actually getting this wrong, so the changelog
>>> in the previous version wasn't accurate.
>>>
>>> Please find patch with updated changelog below. Pekka, Christoph, could
>>> you please check whether it makes sense to you as well? Thanks.
>>>
>>>
>>>
>>>
>>> From: Jiri Kosina <jkosina@suse.cz>
>>> Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
>>>
>>> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
>>> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
>>> dependency through kmem_cache_destroy() -> rcu_barrier() ->
>>> _rcu_barrier() -> get_online_cpus().
>>>
>>> Lockdep thinks that this might actually result in ABBA deadlock,
>>> and reports it as below:
>>>
>>> === [ cut here ] ===
>>> ======================================================
>>> [ INFO: possible circular locking dependency detected ]
>>> 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
>>> -------------------------------------------------------
>>> kworker/u:2/40 is trying to acquire lock:
>>> (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>>>
>>> but task is already holding lock:
>>> (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
>>>
>>> which lock already depends on the new lock.
>>>
>>> the existing dependency chain (in reverse order) is:
>>>
>>> -> #2 (slab_mutex){+.+.+.}:
>>> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>>> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>>> [<ffffffff810ae921>] lock_acquire+0x121/0x190
>>> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>>> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>>> [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
>>> [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
>>> [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
>>> [<ffffffff8155719d>] _cpu_up+0xba/0x14e
>>> [<ffffffff815572ed>] cpu_up+0xbc/0x117
>>> [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
>>> [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
>>> [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>>>
>>> -> #1 (cpu_hotplug.lock){+.+.+.}:
>>> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>>> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>>> [<ffffffff810ae921>] lock_acquire+0x121/0x190
>>> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>>> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>>> [<ffffffff81049197>] get_online_cpus+0x37/0x50
>>> [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
>>> [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>>> [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>>> [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
>>> [<ffffffff8118cc01>] deactivate_super+0x61/0x70
>>> [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
>>> [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
>>> [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
>>>
>>> -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
>>> [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
>>> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>>> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>>> [<ffffffff810ae921>] lock_acquire+0x121/0x190
>>> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>>> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>>> [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>>> [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>>> [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>>> [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
>>> [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
>>> [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
>>> [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
>>> [<ffffffff81454b79>] ops_exit_list+0x39/0x60
>>> [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
>>> [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
>>> [<ffffffff81069f3e>] worker_thread+0x12e/0x320
>>> [<ffffffff8106f73e>] kthread+0x9e/0xb0
>>> [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>>>
>>> other info that might help us debug this:
>>>
>>> Chain exists of:
>>> rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
>>>
>>> Possible unsafe locking scenario:
>>>
>>> CPU0 CPU1
>>> ---- ----
>>> lock(slab_mutex);
>>> lock(cpu_hotplug.lock);
>>> lock(slab_mutex);
>>> lock(rcu_sched_state.barrier_mutex);
>>>
>>> *** DEADLOCK ***
>>> === [ cut here ] ===
>>>
>>> This is actually a false positive. Lockdep has no way of knowing the fact
>>> that the ABBA can actually never happen, because of special semantics of
>>> cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
>>> exclusion there is not achieved through mutex, but through
>>> cpu_hotplug.refcount.
>>>
>>> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
>>> until everyone who called get_online_cpus() will call put_online_cpus()"
>>> semantics is totally invisible to lockdep.
>>>
>>> This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
>>> is being called with it unlocked. It has two advantages:
>>>
>>> - it slightly reduces hold time of slab_mutex; as it's used to protect
>>> the cachep list, it's not necessary to hold it over __kmem_cache_destroy()
>>> call any more
>>> - it silences the lockdep false positive warning, as it avoids lockdep ever
>>> learning about slab_mutex -> cpu_hotplug.lock dependency
>>>
>>> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
>>
>> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>
>> Earlier I was under the wrong impression that all the calltraces that lockdep
>> spewed were reflecting the same point in time. So, sorry about that noise! :-)
>> It is indeed a false-positive and there is no real bug here, and the fix looks
>> good, provided __kmem_cache_destroy() doesn't expect slab_mutex to be held.
>
> I am not so sure about it being a false positive. Consider the following
> sequence of events:
>
> o Thread A starts a CPU-hotplug operation, acquiring the
> hotplug mutex.
>
> o Thread B does a kmem_cache_destroy(), acquiring the slab mutex.
This can't happen, because kmem_cache_destroy() calls get_online_cpus() before
trying to acquire the slab mutex. So it sleeps at get_online_cpus(), since the
hotplug lock has already been acquired by Thread A.
>
> o Thread A reaches the slab CPU-hotplug notifier, but cannot acquire
> the slab mutex because Thread B holds it.
>
> o Thread B enters rcu_barrier(), but cannot acquire the hotplug
> mutex because Thread A holds it.
>
> So I would argue that lockdep's output was a bit confusing, but that
> the deadlock it flagged is real. Or am I still missing something?
>
So the key point is that Thread A is a hotplug writer, while Thread B becomes a
hotplug reader the moment it calls get_online_cpus(). They cannot run concurrently;
they get serialized. That is, Thread A runs to completion and releases the hotplug
lock, and only then does Thread B get past get_online_cpus().
Regards,
Srivatsa S. Bhat
>> But, I'm also quite surprised that the put_online_cpus() code as it stands today
>> doesn't have any checks for the refcount going negative. I believe that such a
>> check would be valuable to help catch cases where we might end up inadvertently
>> causing an imbalance between get_online_cpus() and put_online_cpus(). I'll post
>> that as a separate patch.
>>
>> Regards,
>> Srivatsa S. Bhat
>>
>>> ---
>>> mm/slab.c | 2 +-
>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/mm/slab.c b/mm/slab.c
>>> index 1133911..693c7cb 100644
>>> --- a/mm/slab.c
>>> +++ b/mm/slab.c
>>> @@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
>>> put_online_cpus();
>>> return;
>>> }
>>> + mutex_unlock(&slab_mutex);
>>>
>>> if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>>> rcu_barrier();
>>>
>>> __kmem_cache_destroy(cachep);
>>> - mutex_unlock(&slab_mutex);
>>> put_online_cpus();
>>> }
>>> EXPORT_SYMBOL(kmem_cache_destroy);
>>>
>>
* Re: [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-03 14:55 ` Srivatsa S. Bhat
@ 2012-10-03 16:00 ` Paul E. McKenney
0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2012-10-03 16:00 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: Jiri Kosina, Christoph Lameter, Pekka Enberg, Paul E. McKenney,
Josh Triplett, linux-kernel, linux-mm
On Wed, Oct 03, 2012 at 08:25:28PM +0530, Srivatsa S. Bhat wrote:
> On 10/03/2012 08:20 PM, Paul E. McKenney wrote:
> > On Wed, Oct 03, 2012 at 05:52:26PM +0530, Srivatsa S. Bhat wrote:
> >> On 10/03/2012 03:16 PM, Jiri Kosina wrote:
> >>> On Wed, 3 Oct 2012, Jiri Kosina wrote:
> >>>
> >>>> Good question. I believe it should be safe to drop slab_mutex earlier, as
> >>>> cachep has already been unlinked. I am adding slab people and linux-mm to
> >>>> CC (the whole thread on LKML can be found at
> >>>> https://lkml.org/lkml/2012/10/2/296 for reference).
> >>>>
> >>>> How about the patch below? Pekka, Christoph, please?
> >>>
> >>> It turns out that lockdep is actually getting this wrong, so the changelog
> >>> in the previous version wasn't accurate.
> >>>
> >>> Please find patch with updated changelog below. Pekka, Christoph, could
> >>> you please check whether it makes sense to you as well? Thanks.
> >>>
> >>>
> >>>
> >>>
> >>> From: Jiri Kosina <jkosina@suse.cz>
> >>> Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
> >>>
> >>> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> >>> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> >>> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> >>> _rcu_barrier() -> get_online_cpus().
> >>>
> >>> Lockdep thinks that this might actually result in ABBA deadlock,
> >>> and reports it as below:
> >>>
> >>> === [ cut here ] ===
> >>> ======================================================
> >>> [ INFO: possible circular locking dependency detected ]
> >>> 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
> >>> -------------------------------------------------------
> >>> kworker/u:2/40 is trying to acquire lock:
> >>> (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> >>>
> >>> but task is already holding lock:
> >>> (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
> >>>
> >>> which lock already depends on the new lock.
> >>>
> >>> the existing dependency chain (in reverse order) is:
> >>>
> >>> -> #2 (slab_mutex){+.+.+.}:
> >>> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> >>> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> >>> [<ffffffff810ae921>] lock_acquire+0x121/0x190
> >>> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> >>> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> >>> [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
> >>> [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
> >>> [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
> >>> [<ffffffff8155719d>] _cpu_up+0xba/0x14e
> >>> [<ffffffff815572ed>] cpu_up+0xbc/0x117
> >>> [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
> >>> [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
> >>> [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> >>>
> >>> -> #1 (cpu_hotplug.lock){+.+.+.}:
> >>> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> >>> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> >>> [<ffffffff810ae921>] lock_acquire+0x121/0x190
> >>> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> >>> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> >>> [<ffffffff81049197>] get_online_cpus+0x37/0x50
> >>> [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
> >>> [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> >>> [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> >>> [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
> >>> [<ffffffff8118cc01>] deactivate_super+0x61/0x70
> >>> [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
> >>> [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
> >>> [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
> >>>
> >>> -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
> >>> [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
> >>> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> >>> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> >>> [<ffffffff810ae921>] lock_acquire+0x121/0x190
> >>> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> >>> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> >>> [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> >>> [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> >>> [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> >>> [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
> >>> [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
> >>> [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
> >>> [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
> >>> [<ffffffff81454b79>] ops_exit_list+0x39/0x60
> >>> [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
> >>> [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
> >>> [<ffffffff81069f3e>] worker_thread+0x12e/0x320
> >>> [<ffffffff8106f73e>] kthread+0x9e/0xb0
> >>> [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> >>>
> >>> other info that might help us debug this:
> >>>
> >>> Chain exists of:
> >>> rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
> >>>
> >>> Possible unsafe locking scenario:
> >>>
> >>> CPU0 CPU1
> >>> ---- ----
> >>> lock(slab_mutex);
> >>> lock(cpu_hotplug.lock);
> >>> lock(slab_mutex);
> >>> lock(rcu_sched_state.barrier_mutex);
> >>>
> >>> *** DEADLOCK ***
> >>> === [ cut here ] ===
> >>>
> >>> This is actually a false positive. Lockdep has no way of knowing the fact
> >>> that the ABBA can actually never happen, because of special semantics of
> >>> cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
> >>> exclusion there is not achieved through mutex, but through
> >>> cpu_hotplug.refcount.
> >>>
> >>> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
> >>> until everyone who called get_online_cpus() will call put_online_cpus()"
> >>> semantics is totally invisible to lockdep.
> >>>
> >>> This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
> >>> is being called with it unlocked. It has two advantages:
> >>>
> >>> - it slightly reduces hold time of slab_mutex; as it's used to protect
> >>> the cachep list, it's not necessary to hold it over __kmem_cache_destroy()
> >>> call any more
> >>> - it silences the lockdep false positive warning, as it avoids lockdep ever
> >>> learning about slab_mutex -> cpu_hotplug.lock dependency
> >>>
> >>> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >>> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> >>
> >> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> >>
> >> Earlier I was under the wrong impression that all the calltraces that lockdep
> >> spewed were reflecting the same point in time. So, sorry about that noise! :-)
> >> It is indeed a false-positive and there is no real bug here, and the fix looks
> >> good, provided __kmem_cache_destroy() doesn't expect slab_mutex to be held.
> >
> > I am not so sure about it being a false positive. Consider the following
> > sequence of events:
> >
> > o Thread A starts a CPU-hotplug operation, acquiring the
> > hotplug mutex.
> >
> > o Thread B does a kmem_cache_destroy(), acquiring the slab mutex.
>
> This can't happen, because kmem_cache_destroy() calls get_online_cpus() before
> trying to acquire the slab mutex. So it sleeps at get_online_cpus(), since the
> hotplug lock has already been acquired by Thread A.
Good point!!! False positive it is!
Thanx, Paul
> > o Thread A reaches the slab CPU-hotplug notifier, but cannot acquire
> > the slab mutex because Thread B holds it.
> >
> > o Thread B enters rcu_barrier(), but cannot acquire the hotplug
> > mutex because Thread A holds it.
> >
> > So I would argue that lockdep's output was a bit confusing, but that
> > the deadlock it flagged is real. Or am I still missing something?
> >
>
> So the key point is that Thread A is a hotplug writer, while Thread B becomes a
> hotplug reader the moment it calls get_online_cpus(). They cannot run concurrently;
> they get serialized. That is, Thread A runs to completion and releases the hotplug
> lock, and only then does Thread B get past get_online_cpus().
>
> Regards,
> Srivatsa S. Bhat
>
> >> But, I'm also quite surprised that the put_online_cpus() code as it stands today
> >> doesn't have any checks for the refcount going negative. I believe that such a
> >> check would be valuable to help catch cases where we might end up inadvertently
> >> causing an imbalance between get_online_cpus() and put_online_cpus(). I'll post
> >> that as a separate patch.
> >>
> >> Regards,
> >> Srivatsa S. Bhat
> >>
> >>> ---
> >>> mm/slab.c | 2 +-
> >>> 1 files changed, 1 insertions(+), 1 deletions(-)
> >>>
> >>> diff --git a/mm/slab.c b/mm/slab.c
> >>> index 1133911..693c7cb 100644
> >>> --- a/mm/slab.c
> >>> +++ b/mm/slab.c
> >>> @@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
> >>> put_online_cpus();
> >>> return;
> >>> }
> >>> + mutex_unlock(&slab_mutex);
> >>>
> >>> if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> >>> rcu_barrier();
> >>>
> >>> __kmem_cache_destroy(cachep);
> >>> - mutex_unlock(&slab_mutex);
> >>> put_online_cpus();
> >>> }
> >>> EXPORT_SYMBOL(kmem_cache_destroy);
> >>>
> >>
>
* Re: [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-03 9:46 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
2012-10-03 12:22 ` Srivatsa S. Bhat
@ 2012-10-03 14:17 ` Christoph Lameter
1 sibling, 0 replies; 24+ messages in thread
From: Christoph Lameter @ 2012-10-03 14:17 UTC (permalink / raw)
To: Jiri Kosina
Cc: Paul E. McKenney, Pekka Enberg, Paul E. McKenney, Josh Triplett,
linux-kernel, Srivatsa S. Bhat, linux-mm
Acked-by: Christoph Lameter <cl@linux.com>
* Re: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))
2012-10-03 0:45 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
` (2 preceding siblings ...)
2012-10-03 9:46 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
@ 2012-10-03 14:15 ` Christoph Lameter
2012-10-03 14:34 ` [PATCH v3] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
3 siblings, 1 reply; 24+ messages in thread
From: Christoph Lameter @ 2012-10-03 14:15 UTC (permalink / raw)
To: Jiri Kosina
Cc: Paul E. McKenney, Pekka Enberg, Paul E. McKenney, Josh Triplett,
linux-kernel, Srivatsa S. Bhat, linux-mm
On Wed, 3 Oct 2012, Jiri Kosina wrote:
> How about the patch below? Pekka, Christoph, please?
Looks fine for -stable. For upstream there is going to be a move to
slab_common coming in this merge period. We would need a fix against -next
or Pekka's tree too.
* [PATCH v3] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-03 14:15 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Christoph Lameter
@ 2012-10-03 14:34 ` Jiri Kosina
2012-10-03 15:00 ` Srivatsa S. Bhat
0 siblings, 1 reply; 24+ messages in thread
From: Jiri Kosina @ 2012-10-03 14:34 UTC (permalink / raw)
To: Christoph Lameter, Pekka Enberg
Cc: Paul E. McKenney, Paul E. McKenney, Josh Triplett, linux-kernel,
Srivatsa S. Bhat, linux-mm
On Wed, 3 Oct 2012, Christoph Lameter wrote:
> > How about the patch below? Pekka, Christoph, please?
>
> Looks fine for -stable. For upstream there is going to be a move to
> slab_common coming in this merge period. We would need a fix against -next
> or Pekka's tree too.
Thanks Christoph. Patch against Pekka's slab/for-linus branch below.
I have kept the Acked-by/Reviewed-by from the version of the patch against
the current Linus tree; if anyone objects, please shout loudly. Ideally this
should go in during this merge window to keep lockdep happy.
From: Jiri Kosina <jkosina@suse.cz>
Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
__stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
dependency through kmem_cache_destroy() -> rcu_barrier() ->
_rcu_barrier() -> get_online_cpus().
Lockdep thinks that this might actually result in ABBA deadlock,
and reports it as below:
=== [ cut here ] ===
======================================================
[ INFO: possible circular locking dependency detected ]
3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
-------------------------------------------------------
kworker/u:2/40 is trying to acquire lock:
(rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
but task is already holding lock:
(slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (slab_mutex){+.+.+.}:
[<ffffffff810ae1e2>] validate_chain+0x632/0x720
[<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
[<ffffffff810ae921>] lock_acquire+0x121/0x190
[<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
[<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
[<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
[<ffffffff81564b83>] notifier_call_chain+0x93/0x140
[<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8155719d>] _cpu_up+0xba/0x14e
[<ffffffff815572ed>] cpu_up+0xbc/0x117
[<ffffffff81ae05e3>] smp_init+0x6b/0x9f
[<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
[<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
-> #1 (cpu_hotplug.lock){+.+.+.}:
[<ffffffff810ae1e2>] validate_chain+0x632/0x720
[<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
[<ffffffff810ae921>] lock_acquire+0x121/0x190
[<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
[<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
[<ffffffff81049197>] get_online_cpus+0x37/0x50
[<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
[<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
[<ffffffff810f2309>] rcu_barrier+0x9/0x10
[<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
[<ffffffff8118cc01>] deactivate_super+0x61/0x70
[<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
[<ffffffff811ab49e>] sys_umount+0x6e/0xd0
[<ffffffff81569979>] system_call_fastpath+0x16/0x1b
-> #0 (rcu_sched_state.barrier_mutex){+.+...}:
[<ffffffff810adb4e>] check_prev_add+0x3de/0x440
[<ffffffff810ae1e2>] validate_chain+0x632/0x720
[<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
[<ffffffff810ae921>] lock_acquire+0x121/0x190
[<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
[<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
[<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
[<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
[<ffffffff810f2309>] rcu_barrier+0x9/0x10
[<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
[<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
[<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
[<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
[<ffffffff81454b79>] ops_exit_list+0x39/0x60
[<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
[<ffffffff8106917b>] process_one_work+0x26b/0x4c0
[<ffffffff81069f3e>] worker_thread+0x12e/0x320
[<ffffffff8106f73e>] kthread+0x9e/0xb0
[<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
other info that might help us debug this:
Chain exists of:
rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(slab_mutex);
lock(cpu_hotplug.lock);
lock(slab_mutex);
lock(rcu_sched_state.barrier_mutex);
*** DEADLOCK ***
=== [ cut here ] ===
This is actually a false positive. Lockdep has no way of knowing that
the ABBA deadlock can never happen, because of the special semantics of
cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
exclusion there is not achieved through a mutex, but through
cpu_hotplug.refcount.
The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
until everyone who called get_online_cpus() has called put_online_cpus()"
semantics are totally invisible to lockdep.
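That refcount-based exclusion can be sketched as a simplified userspace model (pthread-based, with hypothetical stand-ins for the kernel primitives; this is an illustration, not the kernel implementation): readers bump cpu_hotplug.refcount and immediately drop the mutex, so no lock is held across their critical section and lockdep sees nothing, while the hotplug writer in cpu_hotplug_begin() waits for the refcount to drain:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  refcount_zero = PTHREAD_COND_INITIALIZER;
static int refcount;

static void get_online_cpus(void)
{
	pthread_mutex_lock(&hotplug_lock);
	refcount++;				/* readers only bump a count... */
	pthread_mutex_unlock(&hotplug_lock);	/* ...and drop the mutex at once */
}

static void put_online_cpus(void)
{
	pthread_mutex_lock(&hotplug_lock);
	if (--refcount == 0)
		pthread_cond_signal(&refcount_zero);
	pthread_mutex_unlock(&hotplug_lock);
}

static void cpu_hotplug_begin(void)
{
	pthread_mutex_lock(&hotplug_lock);
	while (refcount)			/* writer waits until all readers are gone */
		pthread_cond_wait(&refcount_zero, &hotplug_lock);
	/* mutex stays held across the hotplug operation */
}

static void cpu_hotplug_done(void)
{
	pthread_mutex_unlock(&hotplug_lock);
}
```

Since readers never block while holding hotplug_lock, a reader inside a get/put_online_cpus() section cannot participate in a lock cycle with it, which is exactly the invariant lockdep cannot infer.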
This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
is called with it unlocked. This has two advantages:
- it slightly reduces the hold time of slab_mutex; as it is used to protect
the cachep list, it is not necessary to hold it over the kmem_cache_free()
call any more
- it silences the lockdep false-positive warning, as it avoids lockdep ever
learning about the slab_mutex -> cpu_hotplug.lock dependency
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
mm/slab_common.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9c21725..90c3053 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -166,6 +166,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
s->refcount--;
if (!s->refcount) {
list_del(&s->list);
+ mutex_unlock(&slab_mutex);
if (!__kmem_cache_shutdown(s)) {
if (s->flags & SLAB_DESTROY_BY_RCU)
@@ -179,8 +180,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
s->name);
dump_stack();
}
+ } else {
+ mutex_unlock(&slab_mutex);
}
- mutex_unlock(&slab_mutex);
put_online_cpus();
}
EXPORT_SYMBOL(kmem_cache_destroy);
--
Jiri Kosina
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v3] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-03 14:34 ` [PATCH v3] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
@ 2012-10-03 15:00 ` Srivatsa S. Bhat
2012-10-03 15:05 ` [PATCH v4] " Jiri Kosina
0 siblings, 1 reply; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 15:00 UTC (permalink / raw)
To: Jiri Kosina
Cc: Christoph Lameter, Pekka Enberg, Paul E. McKenney,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm
On 10/03/2012 08:04 PM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Christoph Lameter wrote:
>
>>> How about the patch below? Pekka, Christoph, please?
>>
>> Looks fine for -stable. For upstream there is going to be a move to
>> slab_common coming in this merge period. We would need a fix against -next
>> or Pekka's tree too.
>
> Thanks Christoph. Patch against Pekka's slab/for-linus branch below.
>
> I have kept the Acked-by/Reviewed-by tags from the version of the patch against
> current Linus' tree; if anyone objects, please shout loudly. Ideally this should
> go in during this merge window to keep lockdep happy.
>
>
>
>
>
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
>
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
>
> Lockdep thinks that this might actually result in ABBA deadlock,
> and reports it as below:
>
[...]
> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Acked-by: Christoph Lameter <cl@linux.com>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
> mm/slab_common.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 9c21725..90c3053 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -166,6 +166,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
> s->refcount--;
> if (!s->refcount) {
> list_del(&s->list);
> + mutex_unlock(&slab_mutex);
>
> if (!__kmem_cache_shutdown(s)) {
__kmem_cache_shutdown() calls __cache_shrink(). And __cache_shrink() has this
comment over it:
/* Called with slab_mutex held to protect against cpu hotplug */
So, I guess the question is whether to modify your patch to hold the slab_mutex
while calling this function, or to update the comment on top of this function
saying that we are OK to call this function (even without slab_mutex) when we
are inside a get/put_online_cpus() section.
> if (s->flags & SLAB_DESTROY_BY_RCU)
> @@ -179,8 +180,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
> s->name);
> dump_stack();
There is a list_add() before this dump_stack(). I assume we need to hold the
slab_mutex while calling it.
> }
> + } else {
> + mutex_unlock(&slab_mutex);
> }
> - mutex_unlock(&slab_mutex);
> put_online_cpus();
> }
> EXPORT_SYMBOL(kmem_cache_destroy);
>
Regards,
Srivatsa S. Bhat
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v4] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-03 15:00 ` Srivatsa S. Bhat
@ 2012-10-03 15:05 ` Jiri Kosina
2012-10-03 15:49 ` Srivatsa S. Bhat
2012-10-03 18:49 ` David Rientjes
0 siblings, 2 replies; 24+ messages in thread
From: Jiri Kosina @ 2012-10-03 15:05 UTC (permalink / raw)
To: Srivatsa S. Bhat, Christoph Lameter, Pekka Enberg
Cc: Paul E. McKenney, Paul E. McKenney, Josh Triplett, linux-kernel,
linux-mm
On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 9c21725..90c3053 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -166,6 +166,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
> > s->refcount--;
> > if (!s->refcount) {
> > list_del(&s->list);
> > + mutex_unlock(&slab_mutex);
> >
> > if (!__kmem_cache_shutdown(s)) {
>
> __kmem_cache_shutdown() calls __cache_shrink(). And __cache_shrink() has this
> comment over it:
> /* Called with slab_mutex held to protect against cpu hotplug */
>
> So, I guess the question is whether to modify your patch to hold the slab_mutex
> while calling this function, or to update the comment on top of this function
> saying that we are OK to call this function (even without slab_mutex) when we
> are inside a get/put_online_cpus() section.
>
> > if (s->flags & SLAB_DESTROY_BY_RCU)
> > @@ -179,8 +180,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
> > s->name);
> > dump_stack();
>
> There is a list_add() before this dump_stack(). I assume we need to hold the
> slab_mutex while calling it.
Gah, of course it is, thanks for spotting this.
From: Jiri Kosina <jkosina@suse.cz>
Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
__stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
dependency through kmem_cache_destroy() -> rcu_barrier() ->
_rcu_barrier() -> get_online_cpus().
Lockdep thinks that this might actually result in ABBA deadlock,
and reports it as below:
=== [ cut here ] ===
======================================================
[ INFO: possible circular locking dependency detected ]
3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
-------------------------------------------------------
kworker/u:2/40 is trying to acquire lock:
(rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
but task is already holding lock:
(slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (slab_mutex){+.+.+.}:
[<ffffffff810ae1e2>] validate_chain+0x632/0x720
[<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
[<ffffffff810ae921>] lock_acquire+0x121/0x190
[<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
[<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
[<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
[<ffffffff81564b83>] notifier_call_chain+0x93/0x140
[<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8155719d>] _cpu_up+0xba/0x14e
[<ffffffff815572ed>] cpu_up+0xbc/0x117
[<ffffffff81ae05e3>] smp_init+0x6b/0x9f
[<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
[<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
-> #1 (cpu_hotplug.lock){+.+.+.}:
[<ffffffff810ae1e2>] validate_chain+0x632/0x720
[<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
[<ffffffff810ae921>] lock_acquire+0x121/0x190
[<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
[<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
[<ffffffff81049197>] get_online_cpus+0x37/0x50
[<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
[<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
[<ffffffff810f2309>] rcu_barrier+0x9/0x10
[<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
[<ffffffff8118cc01>] deactivate_super+0x61/0x70
[<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
[<ffffffff811ab49e>] sys_umount+0x6e/0xd0
[<ffffffff81569979>] system_call_fastpath+0x16/0x1b
-> #0 (rcu_sched_state.barrier_mutex){+.+...}:
[<ffffffff810adb4e>] check_prev_add+0x3de/0x440
[<ffffffff810ae1e2>] validate_chain+0x632/0x720
[<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
[<ffffffff810ae921>] lock_acquire+0x121/0x190
[<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
[<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
[<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
[<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
[<ffffffff810f2309>] rcu_barrier+0x9/0x10
[<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
[<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
[<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
[<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
[<ffffffff81454b79>] ops_exit_list+0x39/0x60
[<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
[<ffffffff8106917b>] process_one_work+0x26b/0x4c0
[<ffffffff81069f3e>] worker_thread+0x12e/0x320
[<ffffffff8106f73e>] kthread+0x9e/0xb0
[<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
other info that might help us debug this:
Chain exists of:
rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(slab_mutex);
lock(cpu_hotplug.lock);
lock(slab_mutex);
lock(rcu_sched_state.barrier_mutex);
*** DEADLOCK ***
=== [ cut here ] ===
This is actually a false positive. Lockdep has no way of knowing that
the ABBA deadlock can never happen, because of the special semantics of
cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
exclusion there is not achieved through a mutex, but through
cpu_hotplug.refcount.
The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
until everyone who called get_online_cpus() has called put_online_cpus()"
semantics are totally invisible to lockdep.
This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
is called with it unlocked. This has two advantages:
- it slightly reduces the hold time of slab_mutex; as it is used to protect
the cachep list, it is not necessary to hold it over the kmem_cache_free()
call any more
- it silences the lockdep false-positive warning, as it avoids lockdep ever
learning about the slab_mutex -> cpu_hotplug.lock dependency
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
mm/slab_common.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9c21725..069a24e6 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -168,6 +168,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
list_del(&s->list);
if (!__kmem_cache_shutdown(s)) {
+ mutex_unlock(&slab_mutex);
if (s->flags & SLAB_DESTROY_BY_RCU)
rcu_barrier();
@@ -175,12 +176,14 @@ void kmem_cache_destroy(struct kmem_cache *s)
kmem_cache_free(kmem_cache, s);
} else {
list_add(&s->list, &slab_caches);
+ mutex_unlock(&slab_mutex);
printk(KERN_ERR "kmem_cache_destroy %s: Slab cache still has objects\n",
s->name);
dump_stack();
}
+ } else {
+ mutex_unlock(&slab_mutex);
}
- mutex_unlock(&slab_mutex);
put_online_cpus();
}
EXPORT_SYMBOL(kmem_cache_destroy);
--
Jiri Kosina
SUSE Labs
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v4] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-03 15:05 ` [PATCH v4] " Jiri Kosina
@ 2012-10-03 15:49 ` Srivatsa S. Bhat
2012-10-03 18:49 ` David Rientjes
1 sibling, 0 replies; 24+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 15:49 UTC (permalink / raw)
To: Jiri Kosina
Cc: Christoph Lameter, Pekka Enberg, Paul E. McKenney,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm
On 10/03/2012 08:35 PM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
>
>>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>>> index 9c21725..90c3053 100644
>>> --- a/mm/slab_common.c
>>> +++ b/mm/slab_common.c
>>> @@ -166,6 +166,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
>>> s->refcount--;
>>> if (!s->refcount) {
>>> list_del(&s->list);
>>> + mutex_unlock(&slab_mutex);
>>>
>>> if (!__kmem_cache_shutdown(s)) {
>>
>> __kmem_cache_shutdown() calls __cache_shrink(). And __cache_shrink() has this
>> comment over it:
>> /* Called with slab_mutex held to protect against cpu hotplug */
>>
>> So, I guess the question is whether to modify your patch to hold the slab_mutex
>> while calling this function, or to update the comment on top of this function
>> saying that we are OK to call this function (even without slab_mutex) when we
>> are inside a get/put_online_cpus() section.
>>
>>> if (s->flags & SLAB_DESTROY_BY_RCU)
>>> @@ -179,8 +180,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
>>> s->name);
>>> dump_stack();
>>
>> There is a list_add() before this dump_stack(). I assume we need to hold the
>> slab_mutex while calling it.
>
> Gah, of course it is, thanks for spotting this.
>
>
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
>
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
>
> Lockdep thinks that this might actually result in ABBA deadlock,
> and reports it as below:
>
[...]
> This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
> is being called with it unlocked. It has two advantages:
>
> - it slightly reduces hold time of slab_mutex; as it's used to protect
> the cachep list, it's not necessary to hold it over kmem_cache_free()
> call any more
> - it silences the lockdep false positive warning, as it avoids lockdep ever
> learning about slab_mutex -> cpu_hotplug.lock dependency
>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Hmm.. We can't do much about readability I guess... :(
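For what it is worth, one way the three unlock sites could be folded back into fewer would be an early unlock plus a shared exit label. The following is a compilable userspace sketch of the patched flow, with stubbed stand-ins for the kernel helpers (all names and stubs here are illustrative, not a proposed patch):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t slab_mutex = PTHREAD_MUTEX_INITIALIZER;

struct kmem_cache {
	int refcount;
	bool destroy_by_rcu;	/* models s->flags & SLAB_DESTROY_BY_RCU */
	bool shutdown_fails;	/* models __kmem_cache_shutdown() failing */
};

/* observable effects, so the control flow can be checked */
static bool did_rcu_barrier, did_free;

static int __kmem_cache_shutdown(struct kmem_cache *s)
{
	return s->shutdown_fails ? -1 : 0;
}
static void rcu_barrier(void) { did_rcu_barrier = true; }
static void kmem_cache_free(struct kmem_cache *s) { (void)s; did_free = true; }

static void kmem_cache_destroy(struct kmem_cache *s)
{
	pthread_mutex_lock(&slab_mutex);
	if (--s->refcount)
		goto out_unlock;

	/* list_del(&s->list) would go here, still under slab_mutex */
	if (__kmem_cache_shutdown(s)) {
		/* list_add() back onto slab_caches would go here,
		 * still under slab_mutex, as noted earlier in the thread */
		goto out_unlock;
	}

	pthread_mutex_unlock(&slab_mutex);	/* single early-unlock site */
	if (s->destroy_by_rcu)
		rcu_barrier();			/* runs without slab_mutex held */
	kmem_cache_free(s);
	return;

out_unlock:
	pthread_mutex_unlock(&slab_mutex);
}
```

The trade-off is a return in the middle of the function, which kernel style tends to frown on about as much as duplicated unlocks.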
Regards,
Srivatsa S. Bhat
> ---
> mm/slab_common.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 9c21725..069a24e6 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -168,6 +168,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
> list_del(&s->list);
>
> if (!__kmem_cache_shutdown(s)) {
> + mutex_unlock(&slab_mutex);
> if (s->flags & SLAB_DESTROY_BY_RCU)
> rcu_barrier();
>
> @@ -175,12 +176,14 @@ void kmem_cache_destroy(struct kmem_cache *s)
> kmem_cache_free(kmem_cache, s);
> } else {
> list_add(&s->list, &slab_caches);
> + mutex_unlock(&slab_mutex);
> printk(KERN_ERR "kmem_cache_destroy %s: Slab cache still has objects\n",
> s->name);
> dump_stack();
> }
> + } else {
> + mutex_unlock(&slab_mutex);
> }
> - mutex_unlock(&slab_mutex);
> put_online_cpus();
> }
> EXPORT_SYMBOL(kmem_cache_destroy);
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-03 15:05 ` [PATCH v4] " Jiri Kosina
2012-10-03 15:49 ` Srivatsa S. Bhat
@ 2012-10-03 18:49 ` David Rientjes
2012-10-08 7:26 ` [PATCH] [RESEND] " Jiri Kosina
1 sibling, 1 reply; 24+ messages in thread
From: David Rientjes @ 2012-10-03 18:49 UTC (permalink / raw)
To: Jiri Kosina
Cc: Srivatsa S. Bhat, Christoph Lameter, Pekka Enberg,
Paul E. McKenney, Paul E. McKenney, Josh Triplett, linux-kernel,
linux-mm
On Wed, 3 Oct 2012, Jiri Kosina wrote:
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
>
> Lockdep thinks that this might actually result in ABBA deadlock,
> and reports it as below:
>
> === [ cut here ] ===
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
> -------------------------------------------------------
> kworker/u:2/40 is trying to acquire lock:
> (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>
> but task is already holding lock:
> (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (slab_mutex){+.+.+.}:
> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> [<ffffffff810ae921>] lock_acquire+0x121/0x190
> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
> [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
> [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
> [<ffffffff8155719d>] _cpu_up+0xba/0x14e
> [<ffffffff815572ed>] cpu_up+0xbc/0x117
> [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
> [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
> [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>
> -> #1 (cpu_hotplug.lock){+.+.+.}:
> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> [<ffffffff810ae921>] lock_acquire+0x121/0x190
> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> [<ffffffff81049197>] get_online_cpus+0x37/0x50
> [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
> [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
> [<ffffffff8118cc01>] deactivate_super+0x61/0x70
> [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
> [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
> [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
>
> -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
> [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> [<ffffffff810ae921>] lock_acquire+0x121/0x190
> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
> [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
> [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
> [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
> [<ffffffff81454b79>] ops_exit_list+0x39/0x60
> [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
> [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
> [<ffffffff81069f3e>] worker_thread+0x12e/0x320
> [<ffffffff8106f73e>] kthread+0x9e/0xb0
> [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>
> other info that might help us debug this:
>
> Chain exists of:
> rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(slab_mutex);
> lock(cpu_hotplug.lock);
> lock(slab_mutex);
> lock(rcu_sched_state.barrier_mutex);
>
> *** DEADLOCK ***
> === [ cut here ] ===
>
> This is actually a false positive. Lockdep has no way of knowing the fact
> that the ABBA can actually never happen, because of special semantics of
> cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
> exclusion there is not achieved through mutex, but through
> cpu_hotplug.refcount.
>
> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
> until everyone who called get_online_cpus() will call put_online_cpus()"
> semantics is totally invisible to lockdep.
>
> This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
> is being called with it unlocked. It has two advantages:
>
> - it slightly reduces hold time of slab_mutex; as it's used to protect
> the cachep list, it's not necessary to hold it over kmem_cache_free()
> call any more
> - it silences the lockdep false positive warning, as it avoids lockdep ever
> learning about slab_mutex -> cpu_hotplug.lock dependency
>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH] [RESEND] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-03 18:49 ` David Rientjes
@ 2012-10-08 7:26 ` Jiri Kosina
2012-10-10 6:27 ` Pekka Enberg
0 siblings, 1 reply; 24+ messages in thread
From: Jiri Kosina @ 2012-10-08 7:26 UTC (permalink / raw)
To: Pekka Enberg
Cc: Srivatsa S. Bhat, Christoph Lameter, Paul E. McKenney,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm,
David Rientjes
Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
__stop_machine()") introduced slab_mutex -> cpu_hotplug.lock dependency
through kmem_cache_destroy() -> rcu_barrier() -> _rcu_barrier() ->
get_online_cpus().
Lockdep thinks that this might actually result in ABBA deadlock,
and reports it as below:
=== [ cut here ] ===
======================================================
[ INFO: possible circular locking dependency detected ]
3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
-------------------------------------------------------
kworker/u:2/40 is trying to acquire lock:
(rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
but task is already holding lock:
(slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (slab_mutex){+.+.+.}:
[<ffffffff810ae1e2>] validate_chain+0x632/0x720
[<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
[<ffffffff810ae921>] lock_acquire+0x121/0x190
[<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
[<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
[<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
[<ffffffff81564b83>] notifier_call_chain+0x93/0x140
[<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8155719d>] _cpu_up+0xba/0x14e
[<ffffffff815572ed>] cpu_up+0xbc/0x117
[<ffffffff81ae05e3>] smp_init+0x6b/0x9f
[<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
[<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
-> #1 (cpu_hotplug.lock){+.+.+.}:
[<ffffffff810ae1e2>] validate_chain+0x632/0x720
[<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
[<ffffffff810ae921>] lock_acquire+0x121/0x190
[<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
[<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
[<ffffffff81049197>] get_online_cpus+0x37/0x50
[<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
[<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
[<ffffffff810f2309>] rcu_barrier+0x9/0x10
[<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
[<ffffffff8118cc01>] deactivate_super+0x61/0x70
[<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
[<ffffffff811ab49e>] sys_umount+0x6e/0xd0
[<ffffffff81569979>] system_call_fastpath+0x16/0x1b
-> #0 (rcu_sched_state.barrier_mutex){+.+...}:
[<ffffffff810adb4e>] check_prev_add+0x3de/0x440
[<ffffffff810ae1e2>] validate_chain+0x632/0x720
[<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
[<ffffffff810ae921>] lock_acquire+0x121/0x190
[<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
[<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
[<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
[<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
[<ffffffff810f2309>] rcu_barrier+0x9/0x10
[<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
[<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
[<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
[<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
[<ffffffff81454b79>] ops_exit_list+0x39/0x60
[<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
[<ffffffff8106917b>] process_one_work+0x26b/0x4c0
[<ffffffff81069f3e>] worker_thread+0x12e/0x320
[<ffffffff8106f73e>] kthread+0x9e/0xb0
[<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
other info that might help us debug this:
Chain exists of:
rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(slab_mutex);
lock(cpu_hotplug.lock);
lock(slab_mutex);
lock(rcu_sched_state.barrier_mutex);
*** DEADLOCK ***
=== [ cut here ] ===
This is actually a false positive. Lockdep has no way of knowing that
the ABBA deadlock can never happen, because of the special semantics of
cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
exclusion there is not achieved through a mutex, but through
cpu_hotplug.refcount.
The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
until everyone who called get_online_cpus() has called put_online_cpus()"
semantics are totally invisible to lockdep.
This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
is called with it unlocked. This has two advantages:
- it slightly reduces the hold time of slab_mutex; as it is used to protect
the cachep list, it is not necessary to hold it over the kmem_cache_free()
call any more
- it silences the lockdep false-positive warning, as it avoids lockdep ever
learning about the slab_mutex -> cpu_hotplug.lock dependency
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
mm/slab_common.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9c21725..069a24e6 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -168,6 +168,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
list_del(&s->list);
if (!__kmem_cache_shutdown(s)) {
+ mutex_unlock(&slab_mutex);
if (s->flags & SLAB_DESTROY_BY_RCU)
rcu_barrier();
@@ -175,12 +176,14 @@ void kmem_cache_destroy(struct kmem_cache *s)
kmem_cache_free(kmem_cache, s);
} else {
list_add(&s->list, &slab_caches);
+ mutex_unlock(&slab_mutex);
printk(KERN_ERR "kmem_cache_destroy %s: Slab cache still has objects\n",
s->name);
dump_stack();
}
+ } else {
+ mutex_unlock(&slab_mutex);
}
- mutex_unlock(&slab_mutex);
put_online_cpus();
}
EXPORT_SYMBOL(kmem_cache_destroy);
--
Jiri Kosina
SUSE Labs
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH] [RESEND] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
2012-10-08 7:26 ` [PATCH] [RESEND] " Jiri Kosina
@ 2012-10-10 6:27 ` Pekka Enberg
0 siblings, 0 replies; 24+ messages in thread
From: Pekka Enberg @ 2012-10-10 6:27 UTC (permalink / raw)
To: Jiri Kosina
Cc: Srivatsa S. Bhat, Christoph Lameter, Paul E. McKenney,
Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm,
David Rientjes
On Mon, Oct 8, 2012 at 10:26 AM, Jiri Kosina <jkosina@suse.cz> wrote:
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock dependency
> through kmem_cache_destroy() -> rcu_barrier() -> _rcu_barrier() ->
> get_online_cpus().
>
> Lockdep thinks that this might actually result in ABBA deadlock,
> and reports it as below:
>
> === [ cut here ] ===
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
> -------------------------------------------------------
> kworker/u:2/40 is trying to acquire lock:
> (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>
> but task is already holding lock:
> (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (slab_mutex){+.+.+.}:
> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> [<ffffffff810ae921>] lock_acquire+0x121/0x190
> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
> [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
> [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
> [<ffffffff8155719d>] _cpu_up+0xba/0x14e
> [<ffffffff815572ed>] cpu_up+0xbc/0x117
> [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
> [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
> [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>
> -> #1 (cpu_hotplug.lock){+.+.+.}:
> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> [<ffffffff810ae921>] lock_acquire+0x121/0x190
> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> [<ffffffff81049197>] get_online_cpus+0x37/0x50
> [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
> [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
> [<ffffffff8118cc01>] deactivate_super+0x61/0x70
> [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
> [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
> [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
>
> -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
> [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
> [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> [<ffffffff810ae921>] lock_acquire+0x121/0x190
> [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
> [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
> [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
> [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
> [<ffffffff81454b79>] ops_exit_list+0x39/0x60
> [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
> [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
> [<ffffffff81069f3e>] worker_thread+0x12e/0x320
> [<ffffffff8106f73e>] kthread+0x9e/0xb0
> [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>
> other info that might help us debug this:
>
> Chain exists of:
> rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(slab_mutex);
> lock(cpu_hotplug.lock);
> lock(slab_mutex);
> lock(rcu_sched_state.barrier_mutex);
>
> *** DEADLOCK ***
> === [ cut here ] ===
>
> This is actually a false positive. Lockdep has no way of knowing that
> the ABBA deadlock can actually never happen, because of the special
> semantics of cpu_hotplug.refcount and its handling in
> cpu_hotplug_begin(): the mutual exclusion there is achieved not through
> a mutex, but through cpu_hotplug.refcount.
>
> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
> until everyone who called get_online_cpus() has called put_online_cpus()"
> semantics are entirely invisible to lockdep.
>
> This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
> is called with it unlocked. This has two advantages:
>
> - it slightly reduces the hold time of slab_mutex; since the mutex
>   protects only the cache list, it is no longer necessary to hold it
>   across the kmem_cache_free() call
> - it silences the lockdep false positive, as lockdep never learns about
>   the slab_mutex -> cpu_hotplug.lock dependency
>
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Acked-by: David Rientjes <rientjes@google.com>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
> mm/slab_common.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 9c21725..069a24e6 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -168,6 +168,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
> list_del(&s->list);
>
> if (!__kmem_cache_shutdown(s)) {
> + mutex_unlock(&slab_mutex);
> if (s->flags & SLAB_DESTROY_BY_RCU)
> rcu_barrier();
>
> @@ -175,12 +176,14 @@ void kmem_cache_destroy(struct kmem_cache *s)
> kmem_cache_free(kmem_cache, s);
> } else {
> list_add(&s->list, &slab_caches);
> + mutex_unlock(&slab_mutex);
> printk(KERN_ERR "kmem_cache_destroy %s: Slab cache still has objects\n",
> s->name);
> dump_stack();
> }
> + } else {
> + mutex_unlock(&slab_mutex);
> }
> - mutex_unlock(&slab_mutex);
> put_online_cpus();
> }
> EXPORT_SYMBOL(kmem_cache_destroy);
Applied, thanks!