kernel-testers.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* SLUB/debugobjects locking (Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26)
@ 2008-08-25 19:44 Vegard Nossum
       [not found] ` <19f34abd0808251244v439e78b1hbb24f77c637559c3-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Vegard Nossum @ 2008-08-25 19:44 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Linus Torvalds, Rafael J. Wysocki, Thomas Gleixner, Ingo Molnar,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

[-- Attachment #1: Type: text/plain, Size: 1500 bytes --]

On Mon, Aug 25, 2008 at 3:03 PM, Daniel J Blueman
<daniel.blueman-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Linus, Vegard,
>
> On Sun, Aug 24, 2008 at 7:58 PM, Linus Torvalds
> <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
>> On Sun, 24 Aug 2008, Vegard Nossum wrote:
> [snip]
>> Anyway, I think your patch is likely fine, I just thought it looked a bit
>> odd to have a loop to move a list from one head pointer to another.
>>
>> But regardless, it would need some testing. Daniel?
>
> This opens another lockdep report at boot-time [1] - promoting
> pool_lock may not be the best fix?
>
> We then see a new deadlock condition (on the pool_lock spinlock) [2],
> which seemingly was avoided by taking the debug-bucket lock first.
>
> We reproduce this by booting with debug_objects=1 and causing a lot of activity.

Thanks. I get the same thing here as well.

I tried your suggestion of promoting the lock to irq-safe, and it
fixed the warning for me (didn't get or look for deadlocks yet, but it
seems likely that it is caused by the same thing?), the patch is
attached for reference.

I also don't know if this is the best fix, but I also don't have any
other (better) suggestions.

Others are welcome to pick it up from here...


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

[-- Attachment #2: promoted.patch --]
[-- Type: application/octet-stream, Size: 1577 bytes --]

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 9d8161f..ac46405 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -116,9 +116,10 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
 static struct debug_obj *
 alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
 {
+	unsigned long flags;
 	struct debug_obj *obj = NULL;
 
-	spin_lock(&pool_lock);
+	spin_lock_irqsave(&pool_lock, flags);
 	if (obj_pool.first) {
 		obj	    = hlist_entry(obj_pool.first, typeof(*obj), node);
 
@@ -137,7 +138,7 @@ alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
 		if (obj_pool_free < obj_pool_min_free)
 			obj_pool_min_free = obj_pool_free;
 	}
-	spin_unlock(&pool_lock);
+	spin_unlock_irqrestore(&pool_lock, flags);
 
 	return obj;
 }
@@ -147,18 +148,19 @@ alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
  */
 static void free_object(struct debug_obj *obj)
 {
+	unsigned long flags;
 	unsigned long idx = (unsigned long)(obj - obj_static_pool);
 
 	if (obj_pool_free < ODEBUG_POOL_SIZE || idx < ODEBUG_POOL_SIZE) {
-		spin_lock(&pool_lock);
+		spin_lock_irqsave(&pool_lock, flags);
 		hlist_add_head(&obj->node, &obj_pool);
 		obj_pool_free++;
 		obj_pool_used--;
-		spin_unlock(&pool_lock);
+		spin_unlock_irqrestore(&pool_lock, flags);
 	} else {
-		spin_lock(&pool_lock);
+		spin_lock_irqsave(&pool_lock, flags);
 		obj_pool_used--;
-		spin_unlock(&pool_lock);
+		spin_unlock_irqrestore(&pool_lock, flags);
 		kmem_cache_free(obj_cache, obj);
 	}
 }

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: SLUB/debugobjects locking (Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26)
       [not found] ` <19f34abd0808251244v439e78b1hbb24f77c637559c3-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-08-26 22:14   ` Daniel J Blueman
       [not found]     ` <6278d2220808261514p2661251aw914215652c547125-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel J Blueman @ 2008-08-26 22:14 UTC (permalink / raw)
  To: Vegard Nossum, Thomas Gleixner, Christoph Lameter
  Cc: Linus Torvalds, Rafael J. Wysocki, Ingo Molnar,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

On Mon, Aug 25, 2008 at 8:44 PM, Vegard Nossum <vegard.nossum-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Mon, Aug 25, 2008 at 3:03 PM, Daniel J Blueman
> <daniel.blueman-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi Linus, Vegard,
>>
>> On Sun, Aug 24, 2008 at 7:58 PM, Linus Torvalds
>> <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
>>> On Sun, 24 Aug 2008, Vegard Nossum wrote:
>> [snip]
>>> Anyway, I think your patch is likely fine, I just thought it looked a bit
>>> odd to have a loop to move a list from one head pointer to another.
>>>
>>> But regardless, it would need some testing. Daniel?
>>
>> This opens another lockdep report at boot-time [1] - promoting
>> pool_lock may not be the best fix?
>>
>> We then see a new deadlock condition (on the pool_lock spinlock) [2],
>> which seemingly was avoided by taking the debug-bucket lock first.
>>
>> We reproduce this by booting with debug_objects=1 and causing a lot of activity.
>
> Thanks. I get the same thing here as well.
>
> I tried your suggestion of promoting the lock to irq-safe, and it
> fixed the warning for me (didn't get or look for deadlocks yet, but it
> seems likely that it is caused by the same thing?), the patch is
> attached for reference.
>
> I also don't know if this is the best fix, but I also don't have any
> other (better) suggestions.
>
> Others are welcome to pick it up from here...

The solution looks like is needs to get the lock ordering correct
w.r.t. SLUB, as we get this, alas:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.27-rc4-229c-debug #1
-------------------------------------------------------
make/9475 is trying to acquire lock:
 (&n->list_lock){++..}, at: [<ffffffff802dfb23>] __slab_free+0x3a3/0x3f0

but task is already holding lock:
 (&obj_hash[i].lock){++..}, at: [<ffffffff8046965e>]
__debug_check_no_obj_freed+0x6e/0x170

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&obj_hash[i].lock){++..}:
       [<ffffffff802703b1>] __lock_acquire+0xdc1/0x1160
       [<ffffffff802707e1>] lock_acquire+0x91/0xc0
       [<ffffffff806a55d3>] _spin_lock_irqsave+0x53/0x90
       [<ffffffff8046965e>] __debug_check_no_obj_freed+0x6e/0x170
       [<ffffffff80469775>] debug_check_no_obj_freed+0x15/0x20
       [<ffffffff802b4a30>] free_hot_cold_page+0x130/0x290
       [<ffffffff802b4c20>] free_hot_page+0x10/0x20
       [<ffffffff802b5695>] __free_pages+0x45/0x50
       [<ffffffff802df301>] __free_slab+0xa1/0x150
       [<ffffffff802df3e8>] discard_slab+0x38/0x60
       [<ffffffff802e235f>] kmem_cache_shrink+0x18f/0x2c0
       [<ffffffff8048f421>] acpi_os_purge_cache+0xe/0x12
       [<ffffffff804a6cbd>] acpi_purge_cached_objects+0x15/0x3d
       [<ffffffff804a6d4d>] acpi_initialize_objects+0x4e/0x59
       [<ffffffff8130559b>] acpi_init+0x91/0x226
       [<ffffffff80209265>] do_one_initcall+0x45/0x190
       [<ffffffff812e5c9b>] kernel_init+0x14d/0x1b2
       [<ffffffff8020d919>] child_rip+0xa/0x11
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&n->list_lock){++..}:
       [<ffffffff80270495>] __lock_acquire+0xea5/0x1160
       [<ffffffff802707e1>] lock_acquire+0x91/0xc0
       [<ffffffff806a5441>] _spin_lock+0x41/0x80
       [<ffffffff802dfb23>] __slab_free+0x3a3/0x3f0
       [<ffffffff802dfec4>] kmem_cache_free+0xa4/0x110
       [<ffffffff8046958b>] free_object+0x6b/0xd0
-> (&db->lock) taken
       [<ffffffff804696aa>] __debug_check_no_obj_freed+0xba/0x170
       [<ffffffff80469775>] debug_check_no_obj_freed+0x15/0x20
       [<ffffffff802dff0c>] kmem_cache_free+0xec/0x110
       [<ffffffff8024264b>] __cleanup_signal+0x1b/0x20
       [<ffffffff80248273>] release_task+0x233/0x3d0
       [<ffffffff80248960>] wait_consider_task+0x550/0x8b0
       [<ffffffff80248e16>] do_wait+0x156/0x350
       [<ffffffff802490a6>] sys_wait4+0x96/0xf0
       [<ffffffff8020c86b>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

2 locks held by make/9475:
 #0:  (tasklist_lock){..??}, at: [<ffffffff80248084>] release_task+0x44/0x3d0
 #1:  (&obj_hash[i].lock){++..}, at: [<ffffffff8046965e>]
__debug_check_no_obj_freed+0x6e/0x170

stack backtrace:
Pid: 9475, comm: make Not tainted 2.6.27-rc4-229c-debug #1

Call Trace:
 [<ffffffff8026e2b7>] print_circular_bug_tail+0xa7/0xf0
 [<ffffffff80270495>] __lock_acquire+0xea5/0x1160
 [<ffffffff802707e1>] lock_acquire+0x91/0xc0
 [<ffffffff802dfb23>] ? __slab_free+0x3a3/0x3f0
 [<ffffffff806a5441>] _spin_lock+0x41/0x80
 [<ffffffff802dfb23>] ? __slab_free+0x3a3/0x3f0
 [<ffffffff802dfb23>] __slab_free+0x3a3/0x3f0
 [<ffffffff8046958b>] ? free_object+0x6b/0xd0
 [<ffffffff802dfec4>] kmem_cache_free+0xa4/0x110
 [<ffffffff8046958b>] ? free_object+0x6b/0xd0
 [<ffffffff8046958b>] free_object+0x6b/0xd0
-> &db->lock taken
 [<ffffffff804696aa>] __debug_check_no_obj_freed+0xba/0x170
 [<ffffffff80469775>] debug_check_no_obj_freed+0x15/0x20
 [<ffffffff802dff0c>] kmem_cache_free+0xec/0x110
 [<ffffffff8024264b>] ? __cleanup_signal+0x1b/0x20
 [<ffffffff8024264b>] __cleanup_signal+0x1b/0x20
 [<ffffffff80248273>] release_task+0x233/0x3d0
 [<ffffffff80248960>] wait_consider_task+0x550/0x8b0
 [<ffffffff80248e16>] do_wait+0x156/0x350
 [<ffffffff8023b6f0>] ? default_wake_function+0x0/0x10
 [<ffffffff802490a6>] sys_wait4+0x96/0xf0
 [<ffffffff8020c86b>] system_call_fastpath+0x16/0x1b
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: SLUB/debugobjects locking (Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26)
       [not found]     ` <6278d2220808261514p2661251aw914215652c547125-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-08-28 13:42       ` Vegard Nossum
  2008-08-28 13:56         ` Ingo Molnar
  0 siblings, 1 reply; 4+ messages in thread
From: Vegard Nossum @ 2008-08-28 13:42 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Thomas Gleixner, Christoph Lameter, Linus Torvalds,
	Rafael J. Wysocki, Ingo Molnar, Linux Kernel Mailing List,
	Adrian Bunk, Andrew Morton, Natalie Protasevich,
	Kernel Testers List

On Wed, Aug 27, 2008 at 12:14 AM, Daniel J Blueman
<daniel.blueman-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Mon, Aug 25, 2008 at 8:44 PM, Vegard Nossum <vegard.nossum-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> I tried your suggestion of promoting the lock to irq-safe, and it
>> fixed the warning for me (didn't get or look for deadlocks yet, but it
>> seems likely that it is caused by the same thing?), the patch is
>> attached for reference.
>>
>> I also don't know if this is the best fix, but I also don't have any
>> other (better) suggestions.
>>
>> Others are welcome to pick it up from here...
>
> The solution looks like is needs to get the lock ordering correct
> w.r.t. SLUB, as we get this, alas:
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.27-rc4-229c-debug #1

Hm. Is this with my first patch + the one in the e-mail you replied
to? It was intended to be a delta patch on top of my first one.

That would be the one in

    http://marc.info/?l=linux-kernel&m=121946972307110&w=4

plus the one in

    http://marc.info/?l=linux-kernel&m=121969394110327&w=1

Thanks for testing and sorry for the confusion. (Maybe I just confused
myself. Please let me know either way.)


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: SLUB/debugobjects locking (Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26)
  2008-08-28 13:42       ` Vegard Nossum
@ 2008-08-28 13:56         ` Ingo Molnar
  0 siblings, 0 replies; 4+ messages in thread
From: Ingo Molnar @ 2008-08-28 13:56 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Daniel J Blueman, Thomas Gleixner, Christoph Lameter,
	Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Adrian Bunk, Andrew Morton, Natalie Protasevich,
	Kernel Testers List


* Vegard Nossum <vegard.nossum@gmail.com> wrote:

> On Wed, Aug 27, 2008 at 12:14 AM, Daniel J Blueman
> <daniel.blueman@gmail.com> wrote:
> > On Mon, Aug 25, 2008 at 8:44 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> >> I tried your suggestion of promoting the lock to irq-safe, and it
> >> fixed the warning for me (didn't get or look for deadlocks yet, but it
> >> seems likely that it is caused by the same thing?), the patch is
> >> attached for reference.
> >>
> >> I also don't know if this is the best fix, but I also don't have any
> >> other (better) suggestions.
> >>
> >> Others are welcome to pick it up from here...
> >
> > The solution looks like is needs to get the lock ordering correct
> > w.r.t. SLUB, as we get this, alas:
> >
> > =======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.27-rc4-229c-debug #1
> 
> Hm. Is this with my first patch + the one in the e-mail you replied
> to? It was intended to be a delta patch on top of my first one.
> 
> That would be the one in
> 
>     http://marc.info/?l=linux-kernel&m=121946972307110&w=4
> 
> plus the one in
> 
>     http://marc.info/?l=linux-kernel&m=121969394110327&w=1
> 
> Thanks for testing and sorry for the confusion. (Maybe I just confused
> myself. Please let me know either way.)

could you resend the final patch please? It's a candidate for .27, if it 
works out fine.

	Ingo

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-08-28 13:56 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-25 19:44 SLUB/debugobjects locking (Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26) Vegard Nossum
     [not found] ` <19f34abd0808251244v439e78b1hbb24f77c637559c3-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-08-26 22:14   ` Daniel J Blueman
     [not found]     ` <6278d2220808261514p2661251aw914215652c547125-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-08-28 13:42       ` Vegard Nossum
2008-08-28 13:56         ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).