Failure to release lock after CPU hot-unplug canceled


* Failure to release lock after CPU hot-unplug canceled
@ 2007-01-08 17:07 Benjamin Gilbert
  2007-01-09 12:17 ` Heiko Carstens
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Gilbert @ 2007-01-08 17:07 UTC (permalink / raw)
  To: linux-kernel

If a module returns NOTIFY_BAD from its CPU_DOWN_PREPARE callback, subsequent
attempts to take a CPU down cause the write to sysfs to hang.

This is reproducible in 2.6.20-rc4, but was originally found in 2.6.18.5.

Steps to reproduce:

1.  Load the test module included below
2.  Run the following shell commands as root:

echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu1/online

The second echo command hangs in uninterruptible sleep during the write()
call, and the following appears in dmesg:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.20-rc4-686 #1
-------------------------------------------------------
bash/1699 is trying to acquire lock:
 (cpu_add_remove_lock){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f

but task is already holding lock:
 (workqueue_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (workqueue_mutex){--..}:
       [<c01374b9>] __lock_acquire+0x912/0xa34
       [<c01378f6>] lock_acquire+0x67/0x8a
       [<c037900d>] __mutex_lock_slowpath+0xf6/0x2b8
       [<c03791eb>] mutex_lock+0x1c/0x1f
       [<c012dc27>] workqueue_cpu_callback+0x10b/0x20c
       [<c037c687>] notifier_call_chain+0x20/0x31
       [<c012a907>] raw_notifier_call_chain+0x8/0xa
       [<c013aa10>] _cpu_down+0x47/0x1f8
       [<c013abe7>] cpu_down+0x26/0x38
       [<c0296462>] store_online+0x27/0x5a
       [<c02935f4>] sysdev_store+0x20/0x25
       [<c0190da1>] sysfs_write_file+0xb3/0xdb
       [<c01602d9>] vfs_write+0xaf/0x163
       [<c0160925>] sys_write+0x3d/0x61
       [<c0102d88>] syscall_call+0x7/0xb
       [<ffffffff>] 0xffffffff

-> #1 (cache_chain_mutex){--..}:
       [<c01374b9>] __lock_acquire+0x912/0xa34
       [<c01378f6>] lock_acquire+0x67/0x8a
       [<c037900d>] __mutex_lock_slowpath+0xf6/0x2b8
       [<c03791eb>] mutex_lock+0x1c/0x1f
       [<c015dc0d>] cpuup_callback+0x29/0x2d3
       [<c037c687>] notifier_call_chain+0x20/0x31
       [<c012a907>] raw_notifier_call_chain+0x8/0xa
       [<c013a869>] _cpu_up+0x3d/0xbf
       [<c013a911>] cpu_up+0x26/0x38
       [<c010045e>] init+0x7d/0x2d9
       [<c0103a3f>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #0 (cpu_add_remove_lock){--..}:
       [<c01373ba>] __lock_acquire+0x813/0xa34
       [<c01378f6>] lock_acquire+0x67/0x8a
       [<c037900d>] __mutex_lock_slowpath+0xf6/0x2b8
       [<c03791eb>] mutex_lock+0x1c/0x1f
       [<c013abd2>] cpu_down+0x11/0x38
       [<c0296462>] store_online+0x27/0x5a
       [<c02935f4>] sysdev_store+0x20/0x25
       [<c0190da1>] sysfs_write_file+0xb3/0xdb
       [<c01602d9>] vfs_write+0xaf/0x163
       [<c0160925>] sys_write+0x3d/0x61
       [<c0102d88>] syscall_call+0x7/0xb
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

2 locks held by bash/1699:
 #0:  (cache_chain_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f
 #1:  (workqueue_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f

stack backtrace:
 [<c0103dcd>] show_trace_log_lvl+0x1a/0x2f
 [<c01043f4>] show_trace+0x12/0x14
 [<c01044a6>] dump_stack+0x16/0x18
 [<c0135c99>] print_circular_bug_tail+0x5f/0x68
 [<c01373ba>] __lock_acquire+0x813/0xa34
 [<c01378f6>] lock_acquire+0x67/0x8a
 [<c037900d>] __mutex_lock_slowpath+0xf6/0x2b8
 [<c03791eb>] mutex_lock+0x1c/0x1f
 [<c013abd2>] cpu_down+0x11/0x38
 [<c0296462>] store_online+0x27/0x5a
 [<c02935f4>] sysdev_store+0x20/0x25
 [<c0190da1>] sysfs_write_file+0xb3/0xdb
 [<c01602d9>] vfs_write+0xaf/0x163
 [<c0160925>] sys_write+0x3d/0x61
 [<c0102d88>] syscall_call+0x7/0xb
 =======================

Exiting the bash process after the first echo command instead results in
the following:

=====================================
[ BUG: lock held at task exit time! ]
-------------------------------------
bash/1547 is exiting with locks still held!
2 locks held by bash/1547:
 #0:  (cache_chain_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f
 #1:  (workqueue_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f

stack backtrace:
 [<c0103dcd>] show_trace_log_lvl+0x1a/0x2f
 [<c01043f4>] show_trace+0x12/0x14
 [<c01044a6>] dump_stack+0x16/0x18
 [<c01358ba>] debug_check_no_locks_held+0x80/0x86
 [<c01217ed>] do_exit+0x6bf/0x6f5
 [<c0121893>] sys_exit_group+0x0/0x11
 [<c01218a2>] sys_exit_group+0xf/0x11
 [<c0102d88>] syscall_call+0x7/0xb
 =======================

If I can provide any other information to help track this down, please let
me know.

--Benjamin Gilbert

8<---------------------------------------------------------->8

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/cpu.h>

/* CPU hotplug notifier that vetoes every attempt to take a CPU offline. */
static int cpu_callback(struct notifier_block *nb, unsigned long action,
			void *data)
{
	/* The hotplug core passes the CPU number as (void *)(long)cpu. */
	int cpu = (int)(long)data;

	switch (action) {
	case CPU_DOWN_PREPARE:
		printk(KERN_DEBUG "Refusing shutdown of CPU %d\n", cpu);
		return NOTIFY_BAD;
	case CPU_DEAD:
		printk(KERN_DEBUG "CPU %d down\n", cpu);
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block cpu_notifier = {
	.notifier_call = cpu_callback
};

static int __init mod_start(void)
{
	return register_cpu_notifier(&cpu_notifier);
}
module_init(mod_start);

static void __exit mod_shutdown(void)
{
	unregister_cpu_notifier(&cpu_notifier);
}
module_exit(mod_shutdown);

MODULE_LICENSE("GPL");


* Re: Failure to release lock after CPU hot-unplug canceled
  2007-01-08 17:07 Failure to release lock after CPU hot-unplug canceled Benjamin Gilbert
@ 2007-01-09 12:17 ` Heiko Carstens
  2007-01-09 12:27   ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 9+ messages in thread
From: Heiko Carstens @ 2007-01-09 12:17 UTC (permalink / raw)
  To: Benjamin Gilbert
  Cc: linux-kernel, vatsa, Ingo Molnar, Gautham shenoy, Andrew Morton

On Mon, Jan 08, 2007 at 12:07:19PM -0500, Benjamin Gilbert wrote:
> If a module returns NOTIFY_BAD to a CPU_DOWN_PREPARE callback, subsequent
> attempts to take a CPU down cause the write into sysfs to wedge.
> 
> This is reproducible in 2.6.20-rc4, but was originally found in 2.6.18.5.
> 
> Steps to reproduce:
> 
> 1.  Load the test module included below
> 2.  Run the following shell commands as root:
> 
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> The second echo command hangs in uninterruptible sleep during the write()
> call, and the following appears in dmesg:
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.20-rc4-686 #1
> -------------------------------------------------------
> bash/1699 is trying to acquire lock:
>  (cpu_add_remove_lock){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f
> 
> but task is already holding lock:
>  (workqueue_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f
> 
> which lock already depends on the new lock.

There is something like this

raw_notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED, (void *)(long)cpu);

missing in kernel/cpu.c in _cpu_down() for the case where CPU_DOWN_PREPARE
returned NOTIFY_BAD. However, this reveals a more fundamental problem.

The workqueue code grabs a lock on CPU_[UP|DOWN]_PREPARE and releases it
again on CPU_DOWN_FAILED/CPU_UP_CANCELED. If something in the callchain
returns NOTIFY_BAD the rest of the entries in the callchain won't be
called anymore. But DOWN_FAILED/UP_CANCELED will be called for every
entry.
So we might even end up with a mutex_unlock(&workqueue_mutex) even if
mutex_lock(&workqueue_mutex) hasn't been called...
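
The imbalance described above can be reproduced in a small user-space model.
This is only a sketch, not kernel code: the callback names, the stand-in
NOTIFY_* values, the lock-depth counter and the three rollback modes are all
assumptions made for illustration. It compares sending no DOWN_FAILED at
all, sending it to every entry, and sending it only to the entries that
actually saw DOWN_PREPARE.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values for this model only; they are not the kernel's definitions. */
enum { NOTIFY_OK = 1, NOTIFY_BAD = 2 };
enum action { DOWN_PREPARE, DOWN_FAILED };
enum rollback { ROLLBACK_NONE, ROLLBACK_ALL, ROLLBACK_BOUNDED };

typedef int (*notifier_fn)(enum action);

/* Models workqueue_mutex: 1 = held, 0 = free, -1 = unlocked without a lock. */
static int wq_lock_depth;

/* Hypothetical workqueue-like callback: lock on PREPARE, unlock on FAILED. */
static int workqueue_cb(enum action a)
{
	if (a == DOWN_PREPARE)
		wq_lock_depth++;
	else if (a == DOWN_FAILED)
		wq_lock_depth--;
	return NOTIFY_OK;
}

/* Hypothetical module callback that vetoes the CPU going down. */
static int refusing_cb(enum action a)
{
	return a == DOWN_PREPARE ? NOTIFY_BAD : NOTIFY_OK;
}

/*
 * Call at most nr_to_call entries (-1 means all), stopping at the first
 * NOTIFY_BAD. If nr_calls is non-NULL it counts every entry invoked,
 * including the one that refused.
 */
static int call_chain(notifier_fn *chain, size_t len, enum action a,
		      int nr_to_call, int *nr_calls)
{
	int ret = NOTIFY_OK;
	size_t i;

	assert(chain != NULL);
	for (i = 0; i < len && nr_to_call != 0; i++, nr_to_call--) {
		ret = chain[i](a);
		if (nr_calls)
			(*nr_calls)++;
		if (ret == NOTIFY_BAD)
			break;
	}
	return ret;
}

static notifier_fn wq_first[] = { workqueue_cb, refusing_cb };
static notifier_fn refuser_first[] = { refusing_cb, workqueue_cb };

/* Run one failed "CPU down" attempt and return the final lock depth. */
static int simulate_down(notifier_fn *chain, size_t len, enum rollback mode)
{
	int nr_calls = 0;

	wq_lock_depth = 0;
	if (call_chain(chain, len, DOWN_PREPARE, -1, &nr_calls) == NOTIFY_BAD) {
		if (mode == ROLLBACK_ALL)
			call_chain(chain, len, DOWN_FAILED, -1, NULL);
		else if (mode == ROLLBACK_BOUNDED)
			call_chain(chain, len, DOWN_FAILED, nr_calls, NULL);
	}
	return wq_lock_depth;
}
```

In this model, simulate_down(wq_first, 2, ROLLBACK_NONE) leaves the lock held,
which matches the reported hang, and simulate_down(refuser_first, 2,
ROLLBACK_ALL) unlocks a lock that was never taken. Only the bounded mode
leaves the depth at zero for both orderings.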

Maybe this will be addressed by somebody else since cpu hotplug locking
is being worked on (again).


* Re: Failure to release lock after CPU hot-unplug canceled
  2007-01-09 12:17 ` Heiko Carstens
@ 2007-01-09 12:27   ` Srivatsa Vaddagiri
  2007-01-09 15:03     ` Heiko Carstens
  0 siblings, 1 reply; 9+ messages in thread
From: Srivatsa Vaddagiri @ 2007-01-09 12:27 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Benjamin Gilbert, linux-kernel, Ingo Molnar, Gautham shenoy,
	Andrew Morton

On Tue, Jan 09, 2007 at 01:17:38PM +0100, Heiko Carstens wrote:
> missing in kernel cpu.c in _cpu_down() in case CPU_DOWN_PREPARE
> returned with NOTIFY_BAD. However... this reveals that there is just a
> more fundamental problem.
> 
> The workqueue code grabs a lock on CPU_[UP|DOWN]_PREPARE and releases it
> again on CPU_DOWN_FAILED/CPU_UP_CANCELED. If something in the callchain
> returns NOTIFY_BAD the rest of the entries in the callchain won't be
> called anymore. But DOWN_FAILED/UP_CANCELED will be called for every
> entry.
> So we might even end up with a mutex_unlock(&workqueue_mutex) even if
> mutex_lock(&workqueue_mutex) hasn't been called...

This is a known problem. Gautham sent out patches to address it:

http://lkml.org/lkml/2006/11/14/93

Looks like they are in latest mm tree. Perhaps the testcase should be
retried against latest mm.

-- 
Regards,
vatsa


* Re: Failure to release lock after CPU hot-unplug canceled
  2007-01-09 12:27   ` Srivatsa Vaddagiri
@ 2007-01-09 15:03     ` Heiko Carstens
  2007-01-09 15:05       ` [patch -mm] call cpu_chain with CPU_DOWN_FAILED if CPU_DOWN_PREPARE failed Heiko Carstens
                         ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Heiko Carstens @ 2007-01-09 15:03 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Benjamin Gilbert, linux-kernel, Ingo Molnar, Gautham shenoy,
	Andrew Morton

On Tue, Jan 09, 2007 at 05:57:40PM +0530, Srivatsa Vaddagiri wrote:
> On Tue, Jan 09, 2007 at 01:17:38PM +0100, Heiko Carstens wrote:
> > missing in kernel cpu.c in _cpu_down() in case CPU_DOWN_PREPARE
> > returned with NOTIFY_BAD. However... this reveals that there is just a
> > more fundamental problem.
> >
> > The workqueue code grabs a lock on CPU_[UP|DOWN]_PREPARE and releases it
> > again on CPU_DOWN_FAILED/CPU_UP_CANCELED. If something in the callchain
> > returns NOTIFY_BAD the rest of the entries in the callchain won't be
> > called anymore. But DOWN_FAILED/UP_CANCELED will be called for every
> > entry.
> > So we might even end up with a mutex_unlock(&workqueue_mutex) even if
> > mutex_lock(&workqueue_mutex) hasn't been called...
> 
> This is a known problem. Gautham had sent out patches to address them
> 
> http://lkml.org/lkml/2006/11/14/93
> 
> Looks like they are in latest mm tree. Perhaps the testcase should be
> retried against latest mm.

Ah, nice! Wasn't aware of that. But I still think we should have a
CPU_DOWN_FAILED in case CPU_DOWN_PREPARE failed.
Also the slab cache code hasn't been changed to make use of the
new CPU_LOCK_[ACQUIRE|RELEASE] stuff. I'm going to send patches in reply
to this mail.


* [patch -mm] call cpu_chain with CPU_DOWN_FAILED if CPU_DOWN_PREPARE failed
  2007-01-09 15:03     ` Heiko Carstens
@ 2007-01-09 15:05       ` Heiko Carstens
  2007-01-09 15:06       ` [patch -mm] slab: use CPU_LOCK_[ACQUIRE|RELEASE] Heiko Carstens
  2007-01-09 16:34       ` Failure to release lock after CPU hot-unplug canceled Benjamin Gilbert
  2 siblings, 0 replies; 9+ messages in thread
From: Heiko Carstens @ 2007-01-09 15:05 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Benjamin Gilbert, linux-kernel, Ingo Molnar, Gautham shenoy,
	Andrew Morton

From: Heiko Carstens <heiko.carstens@de.ibm.com>

This makes cpu hotplug symmetrical: if CPU_UP_PREPARE fails we get
CPU_UP_CANCELED, so we can undo whatever happened on PREPARE.
The same should happen for CPU_DOWN_PREPARE.

Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Gautham Shenoy <ego@in.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/cpu.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

Index: linux-2.6.20-rc3-mm1/kernel/cpu.c
===================================================================
--- linux-2.6.20-rc3-mm1.orig/kernel/cpu.c
+++ linux-2.6.20-rc3-mm1/kernel/cpu.c
@@ -122,9 +122,10 @@ static int take_cpu_down(void *unused)
 /* Requires cpu_add_remove_lock to be held */
 static int _cpu_down(unsigned int cpu)
 {
-	int err;
+	int err, nr_calls = 0;
 	struct task_struct *p;
 	cpumask_t old_allowed, tmp;
+	void *hcpu = (void *)(long)cpu;
 
 	if (num_online_cpus() == 1)
 		return -EBUSY;
@@ -132,11 +133,12 @@ static int _cpu_down(unsigned int cpu)
 	if (!cpu_online(cpu))
 		return -EINVAL;
 
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE,
-						(void *)(long)cpu);
-	err = raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE,
-						(void *)(long)cpu);
+	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
+	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE,
+					hcpu, -1, &nr_calls);
 	if (err == NOTIFY_BAD) {
+		__raw_notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED, hcpu,
+					  nr_calls, NULL);
 		printk("%s: attempt to take down CPU %u failed\n",
 				__FUNCTION__, cpu);
 		err = -EINVAL;
@@ -156,7 +158,7 @@ static int _cpu_down(unsigned int cpu)
 	if (IS_ERR(p) || cpu_online(cpu)) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		if (raw_notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED,
-				(void *)(long)cpu) == NOTIFY_BAD)
+					    hcpu) == NOTIFY_BAD)
 			BUG();
 
 		if (IS_ERR(p)) {
@@ -178,8 +180,7 @@ static int _cpu_down(unsigned int cpu)
 	put_cpu();
 
 	/* CPU is completely dead: tell everyone.  Too late to complain. */
-	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD,
-			(void *)(long)cpu) == NOTIFY_BAD)
+	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD, hcpu) == NOTIFY_BAD)
 		BUG();
 
 	check_for_tasks(cpu);


* [patch -mm] slab: use CPU_LOCK_[ACQUIRE|RELEASE]
  2007-01-09 15:03     ` Heiko Carstens
  2007-01-09 15:05       ` [patch -mm] call cpu_chain with CPU_DOWN_FAILED if CPU_DOWN_PREPARE failed Heiko Carstens
@ 2007-01-09 15:06       ` Heiko Carstens
  2007-01-10 18:20         ` Christoph Lameter
  2007-01-09 16:34       ` Failure to release lock after CPU hot-unplug canceled Benjamin Gilbert
  2 siblings, 1 reply; 9+ messages in thread
From: Heiko Carstens @ 2007-01-09 15:06 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Benjamin Gilbert, linux-kernel, Ingo Molnar, Gautham shenoy,
	Andrew Morton, Pekka Enberg

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Looks like this was forgotten when CPU_LOCK_[ACQUIRE|RELEASE] was
introduced.

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Gautham Shenoy <ego@in.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 mm/slab.c |   13 +++++--------
 1 files changed, 5 insertions(+), 8 deletions(-)

Index: linux-2.6.20-rc3-mm1/mm/slab.c
===================================================================
--- linux-2.6.20-rc3-mm1.orig/mm/slab.c
+++ linux-2.6.20-rc3-mm1/mm/slab.c
@@ -1177,8 +1177,10 @@ static int __cpuinit cpuup_callback(stru
 	int memsize = sizeof(struct kmem_list3);
 
 	switch (action) {
-	case CPU_UP_PREPARE:
+	case CPU_LOCK_ACQUIRE:
 		mutex_lock(&cache_chain_mutex);
+		break;
+	case CPU_UP_PREPARE:
 		/*
 		 * We need to do this right in the beginning since
 		 * alloc_arraycache's are going to use this list.
@@ -1264,16 +1266,9 @@ static int __cpuinit cpuup_callback(stru
 		}
 		break;
 	case CPU_ONLINE:
-		mutex_unlock(&cache_chain_mutex);
 		start_cpu_timer(cpu);
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
-	case CPU_DOWN_PREPARE:
-		mutex_lock(&cache_chain_mutex);
-		break;
-	case CPU_DOWN_FAILED:
-		mutex_unlock(&cache_chain_mutex);
-		break;
 	case CPU_DEAD:
 		/*
 		 * Even if all the cpus of a node are down, we don't free the
@@ -1344,6 +1339,8 @@ free_array_cache:
 				continue;
 			drain_freelist(cachep, l3, l3->free_objects);
 		}
+		break;
+	case CPU_LOCK_RELEASE:
 		mutex_unlock(&cache_chain_mutex);
 		break;
 	}


* Re: [patch -mm] slab: use CPU_LOCK_[ACQUIRE|RELEASE]
  2007-01-09 15:06       ` [patch -mm] slab: use CPU_LOCK_[ACQUIRE|RELEASE] Heiko Carstens
@ 2007-01-10 18:20         ` Christoph Lameter
  2007-01-11  2:30           ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Lameter @ 2007-01-10 18:20 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Srivatsa Vaddagiri, Benjamin Gilbert, linux-kernel, Ingo Molnar,
	Gautham shenoy, Andrew Morton, Pekka Enberg

On Tue, 9 Jan 2007, Heiko Carstens wrote:

> -	case CPU_UP_PREPARE:
> +	case CPU_LOCK_ACQUIRE:
>  		mutex_lock(&cache_chain_mutex);
> +		break;

I have got a bad feeling about upcoming deadlock problems when looking at
the mutex_lock / unlock code in cpuup_callback in slab.c. Branches
that just obtain a lock or release a lock? I hope there is some
control of what happens between lock acquisition and release?

You are aware that this lock is taken for cache shrinking/destroy, tuning 
of cpu cache sizes, proc output and cache creation? Any of those run on 
the same processor should cause a deadlock.


* Re: [patch -mm] slab: use CPU_LOCK_[ACQUIRE|RELEASE]
  2007-01-10 18:20         ` Christoph Lameter
@ 2007-01-11  2:30           ` Srivatsa Vaddagiri
  0 siblings, 0 replies; 9+ messages in thread
From: Srivatsa Vaddagiri @ 2007-01-11  2:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Heiko Carstens, Benjamin Gilbert, linux-kernel, Ingo Molnar,
	Gautham shenoy, Andrew Morton, Pekka Enberg

On Wed, Jan 10, 2007 at 10:20:28AM -0800, Christoph Lameter wrote:
> I have got a bad feeling about upcoming deadlock problems when looking at
> the mutex_lock / unlock code in cpuup_callback in slab.c. Branches
> that just obtain a lock or release a lock? I hope there is some
> control of  what happens between lock acquisition and release?

A cpu hotplug should happen between LOCK_ACQUIRE/RELEASE

> You are aware that this lock is taken for cache shrinking/destroy, tuning
> of cpu cache sizes, proc output and cache creation? Any of those run on
> the same processor should cause a deadlock.

Why? mutex_lock() taken in LOCK_ACQ will just block those functions
(cache create etc.) from proceeding concurrently with a hotplug event.
This per-subsystem mutex_lock() is supposed to be a replacement for the
global lock_cpu_hotplug() lock.
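
A single-threaded user-space model of that bracket, as a rough sketch: the
callback names, the event enum and the depth counter are made up for
illustration, and it assumes LOCK_RELEASE is sent to the whole chain on every
exit path, which this thread does not show directly. It only demonstrates
that the subsystem lock is held across the hotplug attempt and is balanced
whether or not DOWN_PREPARE is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in values for this model only; they are not the kernel's definitions. */
enum { NOTIFY_OK = 1, NOTIFY_BAD = 2 };
enum event { LOCK_ACQUIRE, DOWN_PREPARE, DEAD, LOCK_RELEASE };

typedef int (*notifier_fn)(enum event);

/* Models cache_chain_mutex: 1 = held, 0 = free. */
static int cache_chain_depth;
/* When true, the hypothetical module below vetoes DOWN_PREPARE. */
static bool refuse_down;

/* Hypothetical slab-like callback: lock and unlock only on the bracket events. */
static int slab_cb(enum event ev)
{
	if (ev == LOCK_ACQUIRE)
		cache_chain_depth++;
	else if (ev == LOCK_RELEASE)
		cache_chain_depth--;
	return NOTIFY_OK;
}

/* Hypothetical module callback that may refuse to let the CPU go down. */
static int module_cb(enum event ev)
{
	return (ev == DOWN_PREPARE && refuse_down) ? NOTIFY_BAD : NOTIFY_OK;
}

static notifier_fn chain[] = { module_cb, slab_cb };

/* Walk the chain in order, stopping at the first NOTIFY_BAD. */
static int notify(enum event ev)
{
	int ret = NOTIFY_OK;
	size_t i;

	for (i = 0; i < sizeof(chain) / sizeof(chain[0]); i++) {
		ret = chain[i](ev);
		if (ret == NOTIFY_BAD)
			break;
	}
	return ret;
}

/*
 * Model one "CPU down" attempt. Returns true if the CPU went down and
 * stores the lock depth observed while the hotplug operation was in flight.
 */
static bool cpu_down_model(bool refuse, int *depth_during)
{
	bool ok;

	assert(depth_during != NULL);
	refuse_down = refuse;
	cache_chain_depth = 0;

	notify(LOCK_ACQUIRE);
	ok = notify(DOWN_PREPARE) != NOTIFY_BAD;
	*depth_during = cache_chain_depth;
	if (ok)
		notify(DEAD);
	notify(LOCK_RELEASE);	/* assumed to run on success and failure alike */
	return ok;
}
```

In both the accepted and the refused case the depth is 1 during the operation
and 0 afterwards, so a path such as cache creation that takes the same mutex
would wait for the hotplug event to finish rather than find the lock leaked.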

But the whole thing is changing again. We will likely move towards a
process-freezer-based cpu hotplug locking, and all the lock_cpu_hotplug()
calls and the existing LOCK_ACQ/RELS can go away when we do that.

-- 
Regards,
vatsa


* Re: Failure to release lock after CPU hot-unplug canceled
  2007-01-09 15:03     ` Heiko Carstens
  2007-01-09 15:05       ` [patch -mm] call cpu_chain with CPU_DOWN_FAILED if CPU_DOWN_PREPARE failed Heiko Carstens
  2007-01-09 15:06       ` [patch -mm] slab: use CPU_LOCK_[ACQUIRE|RELEASE] Heiko Carstens
@ 2007-01-09 16:34       ` Benjamin Gilbert
  2 siblings, 0 replies; 9+ messages in thread
From: Benjamin Gilbert @ 2007-01-09 16:34 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Srivatsa Vaddagiri, linux-kernel, Ingo Molnar, Gautham shenoy,
	Andrew Morton

Heiko Carstens wrote:
> On Tue, Jan 09, 2007 at 05:57:40PM +0530, Srivatsa Vaddagiri wrote:
>> On Tue, Jan 09, 2007 at 01:17:38PM +0100, Heiko Carstens wrote:
>>> The workqueue code grabs a lock on CPU_[UP|DOWN]_PREPARE and releases it
>>> again on CPU_DOWN_FAILED/CPU_UP_CANCELED. If something in the callchain
>>> returns NOTIFY_BAD the rest of the entries in the callchain won't be
>>> called anymore. But DOWN_FAILED/UP_CANCELED will be called for every
>>> entry.
>>> So we might even end up with a mutex_unlock(&workqueue_mutex) even if
>>> mutex_lock(&workqueue_mutex) hasn't been called...
 >>
>> This is a known problem. Gautham had sent out patches to address them
>>
>> http://lkml.org/lkml/2006/11/14/93
>>
>> Looks like they are in latest mm tree. Perhaps the testcase should be
>> retried against latest mm.
 >
> Ah, nice! Wasn't aware of that. But I still think we should have a
> CPU_DOWN_FAILED in case CPU_DOWN_PREPARE failed.
> Also the slab cache code hasn't been changed to make use of the
> new CPU_LOCK_[ACQUIRE|RELEASE] stuff. I'm going to send patches in reply
> to this mail.

2.6.20-rc3-mm1 plus your patches fixes it for me.

Thanks
--Benjamin Gilbert

