Message-Id: <47A82590.BA47.005A.0@novell.com>
Date: Tue, 05 Feb 2008 07:00:00 -0700
From: "Gregory Haskins"
To: , "Max Krasnyanskiy"
Cc: "Ingo Molnar" , "LKML" ,
Subject: Re: CPU hotplug and IRQ affinity with 2.6.24-rt1
References: <47A7A131.8040800@qualcomm.com> <20080205025144.GA31774@dwalker1.mvista.com>
In-Reply-To: <20080205025144.GA31774@dwalker1.mvista.com>
X-Mailing-List: linux-kernel@vger.kernel.org

>>> On Mon, Feb 4, 2008 at 9:51 PM, in message
<20080205025144.GA31774@dwalker1.mvista.com>, Daniel Walker wrote:
> On Mon, Feb 04, 2008 at 03:35:13PM -0800, Max Krasnyanskiy wrote:

[snip]

>> Also the first thing I tried was to bring CPU1 off-line. That's the
>> fastest way to get irqs, soft-irqs, timers, etc. off a CPU. But the box
>> hung completely.

After applying my earlier submitted patch, I was able to reproduce the hang
you mentioned.
I poked around in sysrq and it looked like a deadlock on a rt_mutex, so I
turned on lockdep and it found:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.24-rt1-rt #3
-------------------------------------------------------
bash/4604 is trying to acquire lock:
 (events){--..}, at: [] cleanup_workqueue_thread+0x16/0x80

but task is already holding lock:
 (workqueue_mutex){--..}, at: [] workqueue_cpu_callback+0xe5/0x140

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (workqueue_mutex){--..}:
       [] __lock_acquire+0xf82/0x1090
       [] lock_acquire+0x57/0x80
       [] workqueue_cpu_callback+0xe5/0x140
       [] _mutex_lock+0x28/0x40
       [] workqueue_cpu_callback+0xe5/0x140
       [] notifier_call_chain+0x45/0x90
       [] __raw_notifier_call_chain+0x9/0x10
       [] raw_notifier_call_chain+0x11/0x20
       [] _cpu_down+0x97/0x2d0
       [] cpu_down+0x25/0x60
       [] cpu_down+0x38/0x60
       [] store_online+0x49/0xa0
       [] sysdev_store+0x24/0x30
       [] sysfs_write_file+0xcf/0x140
       [] vfs_write+0xe5/0x1a0
       [] sys_write+0x53/0x90
       [] system_call+0x7e/0x83
       [] 0xffffffffffffffff

-> #4 (cache_chain_mutex){--..}:
       [] __lock_acquire+0xf82/0x1090
       [] lock_acquire+0x57/0x80
       [] kmem_cache_create+0x6a/0x480
       [] _mutex_lock+0x28/0x40
       [] kmem_cache_create+0x6a/0x480
       [] __rcu_read_unlock+0x96/0xb0
       [] fib_hash_init+0xa4/0xe0
       [] fib_new_table+0x35/0x70
       [] fib_magic+0x91/0x100
       [] fib_add_ifaddr+0x73/0x170
       [] fib_inetaddr_event+0x4b/0x260
       [] notifier_call_chain+0x45/0x90
       [] __blocking_notifier_call_chain+0x5e/0x90
       [] blocking_notifier_call_chain+0x11/0x20
       [] __inet_insert_ifa+0xd4/0x170
       [] inet_insert_ifa+0xd/0x10
       [] inetdev_event+0x45a/0x510
       [] fib_rules_event+0x6d/0x160
       [] notifier_call_chain+0x45/0x90
       [] __raw_notifier_call_chain+0x9/0x10
       [] raw_notifier_call_chain+0x11/0x20
       [] call_netdevice_notifiers+0x16/0x20
       [] dev_open+0x8d/0xa0
       [] dev_change_flags+0x99/0x1b0
       [] devinet_ioctl+0x5ad/0x760
       [] dev_ioctl+0x4ba/0x590
       [] trace_hardirqs_on+0xd/0x10
       [] inet_ioctl+0x5d/0x80
       [] sock_ioctl+0xd1/0x260
       [] do_ioctl+0x34/0xa0
       [] vfs_ioctl+0x79/0x2f0
       [] trace_hardirqs_on_thunk+0x3a/0x3f
       [] sys_ioctl+0x82/0xa0
       [] system_call+0x7e/0x83
       [] 0xffffffffffffffff

-> #3 ((inetaddr_chain).rwsem){..--}:
       [] __lock_acquire+0xf82/0x1090
       [] lock_acquire+0x57/0x80
       [] rt_down_read+0xb/0x10
       [] __rt_down_read+0x29/0x80
       [] rt_down_read+0xb/0x10
       [] __blocking_notifier_call_chain+0x48/0x90
       [] blocking_notifier_call_chain+0x11/0x20
       [] __inet_insert_ifa+0xd4/0x170
       [] inet_insert_ifa+0xd/0x10
       [] inetdev_event+0x45a/0x510
       [] fib_rules_event+0x6d/0x160
       [] notifier_call_chain+0x45/0x90
       [] __raw_notifier_call_chain+0x9/0x10
       [] raw_notifier_call_chain+0x11/0x20
       [] call_netdevice_notifiers+0x16/0x20
       [] dev_open+0x8d/0xa0
       [] dev_change_flags+0x99/0x1b0
       [] devinet_ioctl+0x5ad/0x760
       [] dev_ioctl+0x4ba/0x590
       [] trace_hardirqs_on+0xd/0x10
       [] inet_ioctl+0x5d/0x80
       [] sock_ioctl+0xd1/0x260
       [] do_ioctl+0x34/0xa0
       [] vfs_ioctl+0x79/0x2f0
       [] trace_hardirqs_on_thunk+0x3a/0x3f
       [] sys_ioctl+0x82/0xa0
       [] system_call+0x7e/0x83
       [] 0xffffffffffffffff

-> #2 (rtnl_mutex){--..}:
       [] __lock_acquire+0xf82/0x1090
       [] lock_acquire+0x57/0x80
       [] rtnl_lock+0x10/0x20
       [] _mutex_lock+0x28/0x40
       [] rtnl_lock+0x10/0x20
       [] linkwatch_event+0x9/0x40
       [] run_workqueue+0x221/0x2f0
       [] linkwatch_event+0x0/0x40
       [] worker_thread+0xd3/0x140
       [] autoremove_wake_function+0x0/0x40
       [] worker_thread+0x0/0x140
       [] kthread+0x4d/0x80
       [] child_rip+0xa/0x12
       [] restore_args+0x0/0x30
       [] kthread+0x0/0x80
       [] child_rip+0x0/0x12
       [] 0xffffffffffffffff

-> #1 ((linkwatch_work).work){--..}:
       [] __lock_acquire+0xf82/0x1090
       [] lock_acquire+0x57/0x80
       [] run_workqueue+0x1ca/0x2f0
       [] run_workqueue+0x21a/0x2f0
       [] linkwatch_event+0x0/0x40
       [] worker_thread+0xd3/0x140
       [] autoremove_wake_function+0x0/0x40
       [] worker_thread+0x0/0x140
       [] kthread+0x4d/0x80
       [] child_rip+0xa/0x12
       [] restore_args+0x0/0x30
       [] kthread+0x0/0x80
       [] child_rip+0x0/0x12
       [] 0xffffffffffffffff

-> #0 (events){--..}:
       [] print_circular_bug_entry+0x49/0x60
       [] __lock_acquire+0xd80/0x1090
       [] lock_acquire+0x57/0x80
       [] cleanup_workqueue_thread+0x16/0x80
       [] cleanup_workqueue_thread+0x39/0x80
       [] workqueue_cpu_callback+0x8d/0x140
       [] notifier_call_chain+0x45/0x90
       [] __raw_notifier_call_chain+0x9/0x10
       [] raw_notifier_call_chain+0x11/0x20
       [] _cpu_down+0x1eb/0x2d0
       [] cpu_down+0x25/0x60
       [] cpu_down+0x38/0x60
       [] store_online+0x49/0xa0
       [] sysdev_store+0x24/0x30
       [] sysfs_write_file+0xcf/0x140
       [] vfs_write+0xe5/0x1a0
       [] sys_write+0x53/0x90
       [] system_call+0x7e/0x83
       [] 0xffffffffffffffff

other info that might help us debug this:

5 locks held by bash/4604:
 #0:  (&buffer->mutex){--..}, at: [] sysfs_write_file+0x41/0x140
 #1:  (cpu_add_remove_lock){--..}, at: [] cpu_down+0x25/0x60
 #2:  (sched_hotcpu_mutex){--..}, at: [] migration_call+0x2b1/0x540
 #3:  (cache_chain_mutex){--..}, at: [] cpuup_callback+0x211/0x400
 #4:  (workqueue_mutex){--..}, at: [] workqueue_cpu_callback+0xe5/0x140

stack backtrace:
Pid: 4604, comm: bash Not tainted 2.6.24-rt1-rt #3

Call Trace:
 [] print_circular_bug_tail+0x84/0x90
 [] print_circular_bug_entry+0x49/0x60
 [] __lock_acquire+0xd80/0x1090
 [] lock_acquire+0x57/0x80
 [] cleanup_workqueue_thread+0x16/0x80
 [] cleanup_workqueue_thread+0x39/0x80
 [] workqueue_cpu_callback+0x8d/0x140
 [] notifier_call_chain+0x45/0x90
 [] __raw_notifier_call_chain+0x9/0x10
 [] raw_notifier_call_chain+0x11/0x20
 [] _cpu_down+0x1eb/0x2d0
 [] cpu_down+0x25/0x60
 [] cpu_down+0x38/0x60
 [] store_online+0x49/0xa0
 [] sysdev_store+0x24/0x30
 [] sysfs_write_file+0xcf/0x140
 [] vfs_write+0xe5/0x1a0
 [] sys_write+0x53/0x90
 [] system_call+0x7e/0x83

INFO: lockdep is turned off.
---------------------------
| preempt count: 00000000 ]
| 0-level deep critical section nesting:
----------------------------------------