* [2.6.27-rc4] SLUB list_lock vs obj_hash.lock...
@ 2008-08-22 21:48 Daniel J Blueman
2008-08-23 5:34 ` Vegard Nossum
0 siblings, 1 reply; 2+ messages in thread
From: Daniel J Blueman @ 2008-08-22 21:48 UTC (permalink / raw)
To: Linux Kernel
When booting 2.6.27-rc4 with SLUB and debug_objects=1, we see (after
some activity) lock ordering issues with obj_hash.lock and SLUB's
list_lock [1].
Thanks,
Daniel
--- [1]
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.27-rc4-224c #1
-------------------------------------------------------
hald/4680 is trying to acquire lock:
(&n->list_lock){++..}, at: [<ffffffff802bfa26>] add_partial+0x26/0x80
but task is already holding lock:
(&obj_hash[i].lock){++..}, at: [<ffffffff8041cfdc>]
debug_object_free+0x5c/0x120
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&obj_hash[i].lock){++..}:
[<ffffffff8026b011>] __lock_acquire+0xdb1/0x1150
[<ffffffff8026b441>] lock_acquire+0x91/0xc0
[<ffffffff80641b29>] _spin_lock_irqsave+0x49/0x90
[<ffffffff8041cd0e>] __debug_check_no_obj_freed+0x6e/0x170
[<ffffffff8041ce25>] debug_check_no_obj_freed+0x15/0x20
[<ffffffff8029accf>] free_hot_cold_page+0x11f/0x240
[<ffffffff8029ae80>] free_hot_page+0x10/0x20
[<ffffffff8029b36d>] __free_pages+0x3d/0x50
[<ffffffff802bfffe>] __free_slab+0x7e/0x90
[<ffffffff802c0028>] discard_slab+0x18/0x40
[<ffffffff802c1d2b>] kmem_cache_shrink+0x17b/0x220
[<ffffffff80442521>] acpi_os_purge_cache+0xe/0x12
[<ffffffff80459db9>] acpi_purge_cached_objects+0x15/0x3d
[<ffffffff80459e49>] acpi_initialize_objects+0x4e/0x59
[<ffffffff808a64ea>] acpi_init+0x91/0x226
[<ffffffff80209265>] do_one_initcall+0x45/0x190
[<ffffffff80887c53>] kernel_init+0x145/0x1a2
[<ffffffff8020d749>] child_rip+0xa/0x11
[<ffffffffffffffff>] 0xffffffffffffffff
-> #0 (&n->list_lock){++..}:
[<ffffffff8026b0f5>] __lock_acquire+0xe95/0x1150
[<ffffffff8026b441>] lock_acquire+0x91/0xc0
[<ffffffff806419c6>] _spin_lock+0x36/0x70
[<ffffffff802bfa26>] add_partial+0x26/0x80
[<ffffffff802c0156>] __slab_free+0x106/0x110
[<ffffffff802c0507>] kmem_cache_free+0xa7/0x110
[<ffffffff8041cc48>] free_object+0x68/0xc0
[<ffffffff8041d033>] debug_object_free+0xb3/0x120
[<ffffffff8063f35e>] schedule_timeout+0x7e/0xe0
[<ffffffff802d5439>] do_sys_poll+0x3b9/0x440
[<ffffffff802d57d8>] sys_poll+0x38/0xa0
[<ffffffff8020c74b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
other info that might help us debug this:
1 lock held by hald/4680:
#0: (&obj_hash[i].lock){++..}, at: [<ffffffff8041cfdc>]
debug_object_free+0x5c/0x120
stack backtrace:
Pid: 4680, comm: hald Not tainted 2.6.27-rc4-224c #1
Call Trace:
[<ffffffff80268d1f>] print_circular_bug_tail+0x9f/0xe0
[<ffffffff8026b0f5>] __lock_acquire+0xe95/0x1150
[<ffffffff8026b441>] lock_acquire+0x91/0xc0
[<ffffffff802bfa26>] ? add_partial+0x26/0x80
[<ffffffff806419c6>] _spin_lock+0x36/0x70
[<ffffffff802bfa26>] ? add_partial+0x26/0x80
[<ffffffff802bfa26>] add_partial+0x26/0x80
[<ffffffff802c0156>] __slab_free+0x106/0x110
[<ffffffff802c0507>] kmem_cache_free+0xa7/0x110
[<ffffffff8041cc48>] ? free_object+0x68/0xc0
[<ffffffff8041cc48>] free_object+0x68/0xc0
[<ffffffff8041d033>] debug_object_free+0xb3/0x120
[<ffffffff8063f35e>] schedule_timeout+0x7e/0xe0
[<ffffffff8024c200>] ? process_timeout+0x0/0x10
[<ffffffff8063f342>] ? schedule_timeout+0x62/0xe0
[<ffffffff802d5439>] do_sys_poll+0x3b9/0x440
[<ffffffff802d5f00>] ? __pollwait+0x0/0x120
[<ffffffff8023a720>] ? default_wake_function+0x0/0x10
[<ffffffff8023a720>] ? default_wake_function+0x0/0x10
[<ffffffff8023a720>] ? default_wake_function+0x0/0x10
[<ffffffff8023a720>] ? default_wake_function+0x0/0x10
[<ffffffff8023a720>] ? default_wake_function+0x0/0x10
[<ffffffff8023a720>] ? default_wake_function+0x0/0x10
[<ffffffff8023a720>] ? default_wake_function+0x0/0x10
[<ffffffff8023a720>] ? default_wake_function+0x0/0x10
[<ffffffff8023a720>] ? default_wake_function+0x0/0x10
[<ffffffff8023a720>] ? default_wake_function+0x0/0x10
[<ffffffff8064128e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff802d57d8>] sys_poll+0x38/0xa0
[<ffffffff8020c74b>] system_call_fastpath+0x16/0x1b
--
Daniel J Blueman
^ permalink raw reply [flat|nested] 2+ messages in thread* Re: [2.6.27-rc4] SLUB list_lock vs obj_hash.lock... 2008-08-22 21:48 [2.6.27-rc4] SLUB list_lock vs obj_hash.lock Daniel J Blueman @ 2008-08-23 5:34 ` Vegard Nossum 0 siblings, 0 replies; 2+ messages in thread From: Vegard Nossum @ 2008-08-23 5:34 UTC (permalink / raw) To: Daniel J Blueman, Thomas Gleixner; +Cc: Linux Kernel On Fri, Aug 22, 2008 at 11:48 PM, Daniel J Blueman <daniel.blueman@gmail.com> wrote: > When booting 2.6.27-rc4 with SLUB and debug_objects=1, we see (after > some activity) lock ordering issues with obj_hash.lock and SLUB's > list_lock [1]. That's quite funny :-) > ======================================================= > [ INFO: possible circular locking dependency detected ] > 2.6.27-rc4-224c #1 > ------------------------------------------------------- > hald/4680 is trying to acquire lock: > (&n->list_lock){++..}, at: [<ffffffff802bfa26>] add_partial+0x26/0x80 > > but task is already holding lock: > (&obj_hash[i].lock){++..}, at: [<ffffffff8041cfdc>] > debug_object_free+0x5c/0x120 It seems to be a similar problem as before, except that this is the free path, and not alloc. You might try the appended patch -- but it's completely untested on my end! What it does is to move the actual freeing until after we've released the hash-entry lock. In the cases of multiple objects freed, we enqueue all the objects on a local list. Also added Thomas to Cc. Vegard PS: Why are you the only one seeing these things? Does debug_objects=1 need to be passed to activate this code, or can a CONFIG also be used? diff --git a/lib/debugobjects.c b/lib/debugobjects.c index 45a6bde..9d8161f 100644 --- a/lib/debugobjects.c +++ b/lib/debugobjects.c @@ -171,19 +171,28 @@ static void debug_objects_oom(void) { struct debug_bucket *db = obj_hash; struct hlist_node *node, *tmp; + HLIST_HEAD(freelist); struct debug_obj *obj; unsigned long flags; int i; printk(KERN_WARNING "ODEBUG: Out of memory. ODEBUG disabled\n"); + /* XXX: Could probably be optimized by transplantation of more than + * one entry at a time. */ for (i = 0; i < ODEBUG_HASH_SIZE; i++, db++) { spin_lock_irqsave(&db->lock, flags); hlist_for_each_entry_safe(obj, node, tmp, &db->list, node) { hlist_del(&obj->node); - free_object(obj); + hlist_add_head(&obj->node, &freelist); } spin_unlock_irqrestore(&db->lock, flags); + + /* Now free them */ + hlist_for_each_entry_safe(obj, node, tmp, &freelist, node) { + hlist_del(&obj->node); + free_object(obj); + } } } @@ -498,8 +507,9 @@ void debug_object_free(void *addr, struct debug_obj_descr *descr) return; default: hlist_del(&obj->node); + spin_unlock_irqrestore(&db->lock, flags); free_object(obj); - break; + return; } out_unlock: spin_unlock_irqrestore(&db->lock, flags); @@ -510,6 +520,7 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size) { unsigned long flags, oaddr, saddr, eaddr, paddr, chunks; struct hlist_node *node, *tmp; + HLIST_HEAD(freelist); struct debug_obj_descr *descr; enum debug_obj_state state; struct debug_bucket *db; @@ -545,11 +556,18 @@ repeat: goto repeat; default: hlist_del(&obj->node); - free_object(obj); + hlist_add_head(&obj->node, &freelist); break; } } spin_unlock_irqrestore(&db->lock, flags); + + /* Now free them */ + hlist_for_each_entry_safe(obj, node, tmp, &freelist, node) { + hlist_del(&obj->node); + free_object(obj); + } + if (cnt > debug_objects_maxchain) debug_objects_maxchain = cnt; } ^ permalink raw reply related [flat|nested] 2+ messages in thread
end of thread, other threads:[~2008-08-23 5:34 UTC | newest] Thread overview: 2+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-08-22 21:48 [2.6.27-rc4] SLUB list_lock vs obj_hash.lock Daniel J Blueman 2008-08-23 5:34 ` Vegard Nossum
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox