* Failure to release lock after CPU hot-unplug canceled
From: Benjamin Gilbert @ 2007-01-08 17:07 UTC
  To: linux-kernel

If a module's CPU notifier returns NOTIFY_BAD in response to
CPU_DOWN_PREPARE, subsequent attempts to take a CPU down cause the write
into sysfs to wedge.

This is reproducible in 2.6.20-rc4, but was originally found in 2.6.18.5.

Steps to reproduce:

1.  Load the test module included below
2.  Run the following shell commands as root:

echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu1/online

The second echo command hangs in uninterruptible sleep during the write()
call, and the following appears in dmesg:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.20-rc4-686 #1
-------------------------------------------------------
bash/1699 is trying to acquire lock:
 (cpu_add_remove_lock){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f

but task is already holding lock:
 (workqueue_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (workqueue_mutex){--..}:
       [<c01374b9>] __lock_acquire+0x912/0xa34
       [<c01378f6>] lock_acquire+0x67/0x8a
       [<c037900d>] __mutex_lock_slowpath+0xf6/0x2b8
       [<c03791eb>] mutex_lock+0x1c/0x1f
       [<c012dc27>] workqueue_cpu_callback+0x10b/0x20c
       [<c037c687>] notifier_call_chain+0x20/0x31
       [<c012a907>] raw_notifier_call_chain+0x8/0xa
       [<c013aa10>] _cpu_down+0x47/0x1f8
       [<c013abe7>] cpu_down+0x26/0x38
       [<c0296462>] store_online+0x27/0x5a
       [<c02935f4>] sysdev_store+0x20/0x25
       [<c0190da1>] sysfs_write_file+0xb3/0xdb
       [<c01602d9>] vfs_write+0xaf/0x163
       [<c0160925>] sys_write+0x3d/0x61
       [<c0102d88>] syscall_call+0x7/0xb
       [<ffffffff>] 0xffffffff

-> #1 (cache_chain_mutex){--..}:
       [<c01374b9>] __lock_acquire+0x912/0xa34
       [<c01378f6>] lock_acquire+0x67/0x8a
       [<c037900d>] __mutex_lock_slowpath+0xf6/0x2b8
       [<c03791eb>] mutex_lock+0x1c/0x1f
       [<c015dc0d>] cpuup_callback+0x29/0x2d3
       [<c037c687>] notifier_call_chain+0x20/0x31
       [<c012a907>] raw_notifier_call_chain+0x8/0xa
       [<c013a869>] _cpu_up+0x3d/0xbf
       [<c013a911>] cpu_up+0x26/0x38
       [<c010045e>] init+0x7d/0x2d9
       [<c0103a3f>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #0 (cpu_add_remove_lock){--..}:
       [<c01373ba>] __lock_acquire+0x813/0xa34
       [<c01378f6>] lock_acquire+0x67/0x8a
       [<c037900d>] __mutex_lock_slowpath+0xf6/0x2b8
       [<c03791eb>] mutex_lock+0x1c/0x1f
       [<c013abd2>] cpu_down+0x11/0x38
       [<c0296462>] store_online+0x27/0x5a
       [<c02935f4>] sysdev_store+0x20/0x25
       [<c0190da1>] sysfs_write_file+0xb3/0xdb
       [<c01602d9>] vfs_write+0xaf/0x163
       [<c0160925>] sys_write+0x3d/0x61
       [<c0102d88>] syscall_call+0x7/0xb
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

2 locks held by bash/1699:
 #0:  (cache_chain_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f
 #1:  (workqueue_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f

stack backtrace:
 [<c0103dcd>] show_trace_log_lvl+0x1a/0x2f
 [<c01043f4>] show_trace+0x12/0x14
 [<c01044a6>] dump_stack+0x16/0x18
 [<c0135c99>] print_circular_bug_tail+0x5f/0x68
 [<c01373ba>] __lock_acquire+0x813/0xa34
 [<c01378f6>] lock_acquire+0x67/0x8a
 [<c037900d>] __mutex_lock_slowpath+0xf6/0x2b8
 [<c03791eb>] mutex_lock+0x1c/0x1f
 [<c013abd2>] cpu_down+0x11/0x38
 [<c0296462>] store_online+0x27/0x5a
 [<c02935f4>] sysdev_store+0x20/0x25
 [<c0190da1>] sysfs_write_file+0xb3/0xdb
 [<c01602d9>] vfs_write+0xaf/0x163
 [<c0160925>] sys_write+0x3d/0x61
 [<c0102d88>] syscall_call+0x7/0xb
 =======================

Exiting the bash process after the first echo command instead results in
the following:

=====================================
[ BUG: lock held at task exit time! ]
-------------------------------------
bash/1547 is exiting with locks still held!
2 locks held by bash/1547:
 #0:  (cache_chain_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f
 #1:  (workqueue_mutex){--..}, at: [<c03791eb>] mutex_lock+0x1c/0x1f

stack backtrace:
 [<c0103dcd>] show_trace_log_lvl+0x1a/0x2f
 [<c01043f4>] show_trace+0x12/0x14
 [<c01044a6>] dump_stack+0x16/0x18
 [<c01358ba>] debug_check_no_locks_held+0x80/0x86
 [<c01217ed>] do_exit+0x6bf/0x6f5
 [<c0121893>] sys_exit_group+0x0/0x11
 [<c01218a2>] sys_exit_group+0xf/0x11
 [<c0102d88>] syscall_call+0x7/0xb
 =======================
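
Both reports show cache_chain_mutex and workqueue_mutex still held after the first, refused, attempt. That is consistent with a notifier callback taking a lock on CPU_DOWN_PREPARE and never being told that the operation was canceled. The userspace sketch below models that sequence. It is not kernel code: every name in it (model_cpu_down, locking_cb, refusing_cb, and so on) is hypothetical, the order of the two callbacks is an assumption, and the notify_on_failure switch illustrates one possible remedy rather than a known fix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum event { EV_DOWN_PREPARE, EV_DOWN_FAILED, EV_DEAD };
enum verdict { VERDICT_OK, VERDICT_BAD };

/* Stands in for a lock such as workqueue_mutex. */
static bool model_lock_held;

/* Takes the lock on DOWN_PREPARE, drops it on DOWN_FAILED or DEAD. */
static enum verdict locking_cb(enum event ev)
{
	switch (ev) {
	case EV_DOWN_PREPARE:
		model_lock_held = true;
		break;
	case EV_DOWN_FAILED:
	case EV_DEAD:
		model_lock_held = false;
		break;
	}
	return VERDICT_OK;
}

/* Vetoes every DOWN_PREPARE, like the test module in this report. */
static enum verdict refusing_cb(enum event ev)
{
	return ev == EV_DOWN_PREPARE ? VERDICT_BAD : VERDICT_OK;
}

typedef enum verdict (*callback_fn)(enum event);

/* Assumption: the locking callback runs before the refusing one. */
static callback_fn chain[] = { locking_cb, refusing_cb };

/* Call each callback in order, stopping at the first veto. */
static enum verdict call_chain(enum event ev)
{
	size_t i;

	for (i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
		if (chain[i](ev) == VERDICT_BAD)
			return VERDICT_BAD;
	return VERDICT_OK;
}

/*
 * Model of one CPU-down attempt. Returns 0 on success, -1 if vetoed.
 * With notify_on_failure false, a veto returns immediately and the
 * callbacks that already ran are never told about the cancellation.
 */
static int model_cpu_down(bool notify_on_failure)
{
	if (call_chain(EV_DOWN_PREPARE) == VERDICT_BAD) {
		if (notify_on_failure)
			call_chain(EV_DOWN_FAILED);
		return -1;
	}
	call_chain(EV_DEAD);
	return 0;
}
```

In this model, model_cpu_down(false) returns -1 and leaves model_lock_held set, which corresponds to the state in which the second echo blocks. model_cpu_down(true) also returns -1 but lets locking_cb drop the lock first.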

If I can provide any other information to help track this down, please let
me know.

--Benjamin Gilbert

8<---------------------------------------------------------->8

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/cpu.h>

/* Veto every CPU_DOWN_PREPARE so that the hot-unplug is always canceled. */
static int cpu_callback(struct notifier_block *nb, unsigned long action,
			void *data)
{
	/* The CPU number is passed in the pointer value itself. */
	int cpu = (long)data;

	switch (action) {
	case CPU_DOWN_PREPARE:
		printk(KERN_DEBUG "Refusing shutdown of CPU %d\n", cpu);
		return NOTIFY_BAD;
	case CPU_DEAD:
		printk(KERN_DEBUG "CPU %d down\n", cpu);
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block cpu_notifier = {
	.notifier_call = cpu_callback
};

static int __init mod_start(void)
{
	return register_cpu_notifier(&cpu_notifier);
}
module_init(mod_start);

static void __exit mod_shutdown(void)
{
	unregister_cpu_notifier(&cpu_notifier);
}
module_exit(mod_shutdown);

MODULE_LICENSE("GPL");
