public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [2.6.27-rc4] SLUB list_lock vs obj_hash.lock...
@ 2008-08-22 21:48 Daniel J Blueman
  2008-08-23  5:34 ` Vegard Nossum
  0 siblings, 1 reply; 2+ messages in thread
From: Daniel J Blueman @ 2008-08-22 21:48 UTC (permalink / raw)
  To: Linux Kernel

When booting 2.6.27-rc4 with SLUB and debug_objects=1, we see (after
some activity) lock ordering issues with obj_hash.lock and SLUB's
list_lock [1].

Thanks,
  Daniel

--- [1]

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.27-rc4-224c #1
-------------------------------------------------------
hald/4680 is trying to acquire lock:
 (&n->list_lock){++..}, at: [<ffffffff802bfa26>] add_partial+0x26/0x80

but task is already holding lock:
 (&obj_hash[i].lock){++..}, at: [<ffffffff8041cfdc>]
debug_object_free+0x5c/0x120

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&obj_hash[i].lock){++..}:
       [<ffffffff8026b011>] __lock_acquire+0xdb1/0x1150
       [<ffffffff8026b441>] lock_acquire+0x91/0xc0
       [<ffffffff80641b29>] _spin_lock_irqsave+0x49/0x90
       [<ffffffff8041cd0e>] __debug_check_no_obj_freed+0x6e/0x170
       [<ffffffff8041ce25>] debug_check_no_obj_freed+0x15/0x20
       [<ffffffff8029accf>] free_hot_cold_page+0x11f/0x240
       [<ffffffff8029ae80>] free_hot_page+0x10/0x20
       [<ffffffff8029b36d>] __free_pages+0x3d/0x50
       [<ffffffff802bfffe>] __free_slab+0x7e/0x90
       [<ffffffff802c0028>] discard_slab+0x18/0x40
       [<ffffffff802c1d2b>] kmem_cache_shrink+0x17b/0x220
       [<ffffffff80442521>] acpi_os_purge_cache+0xe/0x12
       [<ffffffff80459db9>] acpi_purge_cached_objects+0x15/0x3d
       [<ffffffff80459e49>] acpi_initialize_objects+0x4e/0x59
       [<ffffffff808a64ea>] acpi_init+0x91/0x226
       [<ffffffff80209265>] do_one_initcall+0x45/0x190
       [<ffffffff80887c53>] kernel_init+0x145/0x1a2
       [<ffffffff8020d749>] child_rip+0xa/0x11
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&n->list_lock){++..}:
       [<ffffffff8026b0f5>] __lock_acquire+0xe95/0x1150
       [<ffffffff8026b441>] lock_acquire+0x91/0xc0
       [<ffffffff806419c6>] _spin_lock+0x36/0x70
       [<ffffffff802bfa26>] add_partial+0x26/0x80
       [<ffffffff802c0156>] __slab_free+0x106/0x110
       [<ffffffff802c0507>] kmem_cache_free+0xa7/0x110
       [<ffffffff8041cc48>] free_object+0x68/0xc0
       [<ffffffff8041d033>] debug_object_free+0xb3/0x120
       [<ffffffff8063f35e>] schedule_timeout+0x7e/0xe0
       [<ffffffff802d5439>] do_sys_poll+0x3b9/0x440
       [<ffffffff802d57d8>] sys_poll+0x38/0xa0
       [<ffffffff8020c74b>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

1 lock held by hald/4680:
 #0:  (&obj_hash[i].lock){++..}, at: [<ffffffff8041cfdc>]
debug_object_free+0x5c/0x120

stack backtrace:
Pid: 4680, comm: hald Not tainted 2.6.27-rc4-224c #1

Call Trace:
 [<ffffffff80268d1f>] print_circular_bug_tail+0x9f/0xe0
 [<ffffffff8026b0f5>] __lock_acquire+0xe95/0x1150
 [<ffffffff8026b441>] lock_acquire+0x91/0xc0
 [<ffffffff802bfa26>] ? add_partial+0x26/0x80
 [<ffffffff806419c6>] _spin_lock+0x36/0x70
 [<ffffffff802bfa26>] ? add_partial+0x26/0x80
 [<ffffffff802bfa26>] add_partial+0x26/0x80
 [<ffffffff802c0156>] __slab_free+0x106/0x110
 [<ffffffff802c0507>] kmem_cache_free+0xa7/0x110
 [<ffffffff8041cc48>] ? free_object+0x68/0xc0
 [<ffffffff8041cc48>] free_object+0x68/0xc0
 [<ffffffff8041d033>] debug_object_free+0xb3/0x120
 [<ffffffff8063f35e>] schedule_timeout+0x7e/0xe0
 [<ffffffff8024c200>] ? process_timeout+0x0/0x10
 [<ffffffff8063f342>] ? schedule_timeout+0x62/0xe0
 [<ffffffff802d5439>] do_sys_poll+0x3b9/0x440
 [<ffffffff802d5f00>] ? __pollwait+0x0/0x120
 [<ffffffff8023a720>] ? default_wake_function+0x0/0x10
 [<ffffffff8023a720>] ? default_wake_function+0x0/0x10
 [<ffffffff8023a720>] ? default_wake_function+0x0/0x10
 [<ffffffff8023a720>] ? default_wake_function+0x0/0x10
 [<ffffffff8023a720>] ? default_wake_function+0x0/0x10
 [<ffffffff8023a720>] ? default_wake_function+0x0/0x10
 [<ffffffff8023a720>] ? default_wake_function+0x0/0x10
 [<ffffffff8023a720>] ? default_wake_function+0x0/0x10
 [<ffffffff8023a720>] ? default_wake_function+0x0/0x10
 [<ffffffff8023a720>] ? default_wake_function+0x0/0x10
 [<ffffffff8064128e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff802d57d8>] sys_poll+0x38/0xa0
 [<ffffffff8020c74b>] system_call_fastpath+0x16/0x1b
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [2.6.27-rc4] SLUB list_lock vs obj_hash.lock...
  2008-08-22 21:48 [2.6.27-rc4] SLUB list_lock vs obj_hash.lock Daniel J Blueman
@ 2008-08-23  5:34 ` Vegard Nossum
  0 siblings, 0 replies; 2+ messages in thread
From: Vegard Nossum @ 2008-08-23  5:34 UTC (permalink / raw)
  To: Daniel J Blueman, Thomas Gleixner; +Cc: Linux Kernel

On Fri, Aug 22, 2008 at 11:48 PM, Daniel J Blueman <daniel.blueman@gmail.com> wrote:
> When booting 2.6.27-rc4 with SLUB and debug_objects=1, we see (after
> some activity) lock ordering issues with obj_hash.lock and SLUB's
> list_lock [1].

That's quite funny :-)

> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.27-rc4-224c #1
> -------------------------------------------------------
> hald/4680 is trying to acquire lock:
>  (&n->list_lock){++..}, at: [<ffffffff802bfa26>] add_partial+0x26/0x80
>
> but task is already holding lock:
>  (&obj_hash[i].lock){++..}, at: [<ffffffff8041cfdc>]
> debug_object_free+0x5c/0x120

It seems to be a similar problem as before, except that this is the free path, and not alloc.

You might try the appended patch -- but it's completely untested on my end!

What it does is to move the actual freeing until after we've released the hash-entry lock. In the cases of multiple objects freed, we enqueue all the objects on a local list.

Also added Thomas to Cc.


Vegard

PS: Why are you the only one seeing these things? Does debug_objects=1 need to be passed to activate this code, or can a CONFIG also be used?


diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 45a6bde..9d8161f 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -171,19 +171,28 @@ static void debug_objects_oom(void)
 {
 	struct debug_bucket *db = obj_hash;
 	struct hlist_node *node, *tmp;
+	HLIST_HEAD(freelist);
 	struct debug_obj *obj;
 	unsigned long flags;
 	int i;
 
 	printk(KERN_WARNING "ODEBUG: Out of memory. ODEBUG disabled\n");
 
+	/* XXX: Could probably be optimized by transplantation of more than
+	 * one entry at a time. */
 	for (i = 0; i < ODEBUG_HASH_SIZE; i++, db++) {
 		spin_lock_irqsave(&db->lock, flags);
 		hlist_for_each_entry_safe(obj, node, tmp, &db->list, node) {
 			hlist_del(&obj->node);
-			free_object(obj);
+			hlist_add_head(&obj->node, &freelist);
 		}
 		spin_unlock_irqrestore(&db->lock, flags);
+
+		/* Now free them */
+		hlist_for_each_entry_safe(obj, node, tmp, &freelist, node) {
+			hlist_del(&obj->node);
+			free_object(obj);
+		}
 	}
 }
 
@@ -498,8 +507,9 @@ void debug_object_free(void *addr, struct debug_obj_descr *descr)
 		return;
 	default:
 		hlist_del(&obj->node);
+		spin_unlock_irqrestore(&db->lock, flags);
 		free_object(obj);
-		break;
+		return;
 	}
 out_unlock:
 	spin_unlock_irqrestore(&db->lock, flags);
@@ -510,6 +520,7 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 {
 	unsigned long flags, oaddr, saddr, eaddr, paddr, chunks;
 	struct hlist_node *node, *tmp;
+	HLIST_HEAD(freelist);
 	struct debug_obj_descr *descr;
 	enum debug_obj_state state;
 	struct debug_bucket *db;
@@ -545,11 +556,18 @@ repeat:
 				goto repeat;
 			default:
 				hlist_del(&obj->node);
-				free_object(obj);
+				hlist_add_head(&obj->node, &freelist);
 				break;
 			}
 		}
 		spin_unlock_irqrestore(&db->lock, flags);
+
+		/* Now free them */
+		hlist_for_each_entry_safe(obj, node, tmp, &freelist, node) {
+			hlist_del(&obj->node);
+			free_object(obj);
+		}
+
 		if (cnt > debug_objects_maxchain)
 			debug_objects_maxchain = cnt;
 	}

^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2008-08-23  5:34 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-22 21:48 [2.6.27-rc4] SLUB list_lock vs obj_hash.lock Daniel J Blueman
2008-08-23  5:34 ` Vegard Nossum

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox