From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755327Ab1GPTmn (ORCPT ); Sat, 16 Jul 2011 15:42:43 -0400 Received: from mail001.aei.ca ([206.123.6.130]:46460 "EHLO mail001.aei.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754286Ab1GPTml (ORCPT ); Sat, 16 Jul 2011 15:42:41 -0400 From: Ed Tomlinson To: paulmck@linux.vnet.ibm.com Subject: Re: INFO: possible circular locking dependency detected Date: Sat, 16 Jul 2011 15:42:30 -0400 User-Agent: KMail/1.13.7 (Linux/3.0.0-rc7-crc+; KDE/4.6.5; x86_64; ; ) Cc: Peter Zijlstra , Steven Rostedt , Sergey Senozhatsky , Ingo Molnar , Thomas Gleixner , Andrew Morton , Dipankar Sarma , linux-kernel@vger.kernel.org References: <1310665613.27864.50.camel@gandalf.stny.rr.com> <201107151748.07109.edt@aei.ca> <20110715220447.GJ2327@linux.vnet.ibm.com> In-Reply-To: <20110715220447.GJ2327@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201107161542.31755.edt@aei.ca> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Friday 15 July 2011 18:04:47 Paul E. McKenney wrote: > On Fri, Jul 15, 2011 at 05:48:06PM -0400, Ed Tomlinson wrote: > > On Friday 15 July 2011 12:56:13 Paul E. McKenney wrote: > > [ . . . ] > > > > OK. Ed, would you be willing to try the patch out? > > > > I am booted at the same git commit with a bluetooth and the disable local_bh around softirq() > > patch from this thread. So far so good. Not sure how 'easy' this one is to trigger a second time - > > I've been running with threadirq enabled since .39 came out. Last night was the first deadlock... > > If nothing happened post rc6 to make it more likely it could be a while before it triggers again. > > Thank you for trying it out, Ed! And I know that you will not be shy > should the problem recur. ;-) Found this in dmesg this afternoon. This time, though X was dead, I was able to cancel and restart it. This is with Peter's patch to call softirq() with local_bh disabled. [34962.637491] chrome_sandbox (4031): /proc/4029/oom_adj is deprecated, please use /proc/4029/oom_score_adj instead. [39066.186558] [39066.186561] ======================================================= [39066.193875] [ INFO: possible circular locking dependency detected ] [39066.195274] 3.0.0-rc7-crc+ #336 [39066.195274] ------------------------------------------------------- [39066.195274] chrome/3961 is trying to acquire lock: [39066.195274] (&p->pi_lock){-.-.-.}, at: [] try_to_wake_up+0x2e/0x360 [39066.195274] [39066.195274] but task is already holding lock: [39066.195274] (sync_rcu_preempt_exp_wq.lock){......}, at: [] __wake_up+0x2a/0x60 [39066.195274] [39066.195274] which lock already depends on the new lock. [39066.195274] [39066.195274] [39066.195274] the existing dependency chain (in reverse order) is: [39066.195274] [39066.195274] -> #3 (sync_rcu_preempt_exp_wq.lock){......}: [39066.195274] [] lock_acquire+0x95/0x140 [39066.195274] [] _raw_spin_lock_irqsave+0x46/0x60 [39066.195274] [] __wake_up+0x2a/0x60 [39066.195274] [] rcu_report_exp_rnp+0xa1/0xb0 [39066.195274] [] synchronize_rcu_expedited+0x157/0x210 [39066.195274] [] synchronize_net+0x45/0x50 [39066.195274] [] ipip6_tunnel_ioctl+0x5f3/0x800 [sit] [39066.195274] [] dev_ifsioc+0x11a/0x2c0 [39066.195274] [] dev_ioctl+0x35a/0x810 [39066.195274] [] sock_ioctl+0xea/0x2b0 [39066.195274] [] do_vfs_ioctl+0xa4/0x5a0 [39066.195274] [] sys_ioctl+0x99/0xa0 [39066.195274] [] system_call_fastpath+0x16/0x1b [39066.195274] [39066.195274] -> #2 (rcu_node_level_0){..-.-.}: [39066.195274] [] lock_acquire+0x95/0x140 [39066.195274] [] _raw_spin_lock+0x3b/0x50 [39066.195274] [] __rcu_read_unlock+0x19f/0x2d0 [39066.195274] [] cpuacct_charge+0xc8/0xe0 [39066.195274] [] update_curr+0x1a5/0x210 [39066.195274] [] enqueue_task_fair+0x7a/0x650 [39066.195274] [] enqueue_task+0x79/0x90 [39066.195274] [] activate_task+0x2d/0x40 [39066.195274] [] ttwu_activate+0x21/0x50 [39066.195274] [] T.2447+0x3c/0x60 [39066.195274] [] sched_ttwu_pending+0x44/0x60 [39066.195274] [] scheduler_ipi+0xe/0x10 [39066.195274] [] smp_reschedule_interrupt+0x2a/0x30 [39066.195274] [] reschedule_interrupt+0x13/0x20 [39066.195274] [] sock_def_readable+0x94/0xc0 [39066.195274] [] unix_stream_sendmsg+0x2a8/0x410 [unix] [39066.195274] [] sock_sendmsg+0xe9/0x120 [39066.195274] [] sys_sendto+0x139/0x190 [39066.195274] [] system_call_fastpath+0x16/0x1b [39066.195274] [39066.195274] -> #1 (&rq->lock){-.-.-.}: [39066.195274] [] lock_acquire+0x95/0x140 [39066.195274] [] _raw_spin_lock+0x3b/0x50 [39066.195274] [] wake_up_new_task+0xc8/0x280 [39066.195274] [] do_fork+0x23a/0x450 [39066.195274] [] kernel_thread+0x71/0x80 [39066.195274] [] rest_init+0x26/0xe0 [39066.195274] [] start_kernel+0x32f/0x33a [39066.195274] [] x86_64_start_reservations+0xfb/0xff [39066.195274] [] x86_64_start_kernel+0xed/0xf4 [39066.195274] [39066.195274] -> #0 (&p->pi_lock){-.-.-.}: [39066.195274] [] __lock_acquire+0x1588/0x16a0 [39066.195274] [] lock_acquire+0x95/0x140 [39066.195274] [] _raw_spin_lock_irqsave+0x46/0x60 [39066.195274] [] try_to_wake_up+0x2e/0x360 [39066.195274] [] default_wake_function+0x12/0x20 [39066.195274] [] autoremove_wake_function+0x16/0x40 [39066.195274] [] __wake_up_common+0x59/0x90 [39066.195274] [] __wake_up+0x40/0x60 [39066.195274] [] rcu_report_exp_rnp+0xa1/0xb0 [39066.195274] [] __rcu_read_unlock+0x258/0x2d0 [39066.195274] [] sock_def_wakeup+0x80/0x90 [39066.195274] [] unix_release_sock+0x219/0x260 [unix] [39066.195274] [] unix_release+0x26/0x30 [unix] [39066.195274] [] sock_release+0x29/0x90 [39066.195274] [] sock_close+0x17/0x30 [39066.195274] [] fput+0x12a/0x2a0 [39066.195274] [] filp_close+0x7b/0xd0 [39066.195274] [] put_files_struct+0x178/0x1b0 [39066.195274] [] exit_files+0x52/0x60 [39066.195274] [] do_exit+0x391/0xa60 [39066.195274] [] do_group_exit+0x6a/0xd0 [39066.195274] [] get_signal_to_deliver+0x4bb/0x5e0 [39066.195274] [] do_signal+0x75/0x8b0 [39066.195274] [] do_notify_resume+0x65/0x80 [39066.195274] [] int_signal+0x12/0x17 [39066.195274] [39066.195274] other info that might help us debug this: [39066.195274] [39066.195274] Chain exists of: [39066.195274] &p->pi_lock --> rcu_node_level_0 --> sync_rcu_preempt_exp_wq.lock [39066.195274] [39066.195274] Possible unsafe locking scenario: [39066.195274] [39066.195274] CPU0 CPU1 [39066.195274] ---- ---- [39066.195274] lock(sync_rcu_preempt_exp_wq.lock); [39066.195274] lock(rcu_node_level_0); [39066.195274] lock(sync_rcu_preempt_exp_wq.lock); [39066.195274] lock(&p->pi_lock); [39066.195274] [39066.195274] *** DEADLOCK *** [39066.195274] [39066.195274] 2 locks held by chrome/3961: [39066.195274] #0: (rcu_node_level_0){..-.-.}, at: [] rcu_report_exp_rnp+0x1d/0xb0 [39066.195274] #1: (sync_rcu_preempt_exp_wq.lock){......}, at: [] __wake_up+0x2a/0x60 [39066.195274] [39066.195274] stack backtrace: [39066.195274] Pid: 3961, comm: chrome Not tainted 3.0.0-rc7-crc+ #336 [39066.195274] Call Trace: [39066.195274] [] print_circular_bug+0x20e/0x2f0 [39066.195274] [] __lock_acquire+0x1588/0x16a0 [39066.195274] [] ? try_to_wake_up+0x2e/0x360 [39066.195274] [] lock_acquire+0x95/0x140 [39066.195274] [] ? try_to_wake_up+0x2e/0x360 [39066.195274] [] ? _raw_spin_lock_irqsave+0x1c/0x60 [39066.195274] [] _raw_spin_lock_irqsave+0x46/0x60 [39066.195274] [] ? try_to_wake_up+0x2e/0x360 [39066.195274] [] try_to_wake_up+0x2e/0x360 [39066.195274] [] ? __wake_up+0x2a/0x60 [39066.195274] [] default_wake_function+0x12/0x20 [39066.195274] [] autoremove_wake_function+0x16/0x40 [39066.195274] [] __wake_up_common+0x59/0x90 [39066.195274] [] __wake_up+0x40/0x60 [39066.195274] [] ? rcu_report_exp_rnp+0x1d/0xb0 [39066.195274] [] rcu_report_exp_rnp+0xa1/0xb0 [39066.195274] [] __rcu_read_unlock+0x258/0x2d0 [39066.195274] [] sock_def_wakeup+0x80/0x90 [39066.195274] [] ? sock_def_error_report+0xc0/0xc0 [39066.195274] [] ? _raw_spin_unlock+0x5c/0x70 [39066.195274] [] unix_release_sock+0x219/0x260 [unix] [39066.195274] [] unix_release+0x26/0x30 [unix] [39066.195274] [] sock_release+0x29/0x90 [39066.195274] [] sock_close+0x17/0x30 [39066.195274] [] fput+0x12a/0x2a0 [39066.195274] [] filp_close+0x7b/0xd0 [39066.195274] [] put_files_struct+0x178/0x1b0 [39066.195274] [] ? put_files_struct+0x38/0x1b0 [39066.195274] [] exit_files+0x52/0x60 [39066.195274] [] do_exit+0x391/0xa60 [39066.195274] [] ? get_signal_to_deliver+0x9f/0x5e0 [39066.195274] [] do_group_exit+0x6a/0xd0 [39066.195274] [] get_signal_to_deliver+0x4bb/0x5e0 [39066.195274] [] do_signal+0x75/0x8b0 [39066.195274] [] ? security_file_permission+0x8b/0x90 [39066.195274] [] ? rw_verify_area+0xec/0x190 [39066.195274] [] do_notify_resume+0x65/0x80 [39066.195274] [] ? trace_hardirqs_on_thunk+0x3a/0x3f [39066.195274] [] int_signal+0x12/0x17 System is booted with threadirqs. Hope this helps, Ed