From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750912Ab1LZQhn (ORCPT ); Mon, 26 Dec 2011 11:37:43 -0500 Received: from mail-qy0-f174.google.com ([209.85.216.174]:55480 "EHLO mail-qy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750706Ab1LZQhl (ORCPT ); Mon, 26 Dec 2011 11:37:41 -0500 Date: Mon, 26 Dec 2011 17:37:36 +0100 From: Frederic Weisbecker To: "Paul E. McKenney" Cc: Sasha Levin , linux-kernel Subject: Re: INFO: task rcuc/0:7 blocked for more than 120 seconds. Message-ID: <20111226163734.GF28309@somewhere.redhat.com> References: <1324901803.31721.4.camel@lappy> <20111226163148.GC2435@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20111226163148.GC2435@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Dec 26, 2011 at 08:31:48AM -0800, Paul E. McKenney wrote: > On Mon, Dec 26, 2011 at 02:16:43PM +0200, Sasha Levin wrote: > > Hi Paul, > > > > I've recently got the following panic which was caused by khungtask: > > > > [ 1921.589512] INFO: task rcuc/0:7 blocked for more than 120 seconds. > > [ 1921.590370] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [ 1921.597103] rcuc/0 D ffff880012f61630 4400 7 2 0x00000000 > > [ 1921.598646] ffff880012f6b980 0000000000000086 ffff880012f6bfd8 00000000001d4740 > > [ 1921.600289] ffff880012f6bfd8 ffff880012f61630 ffff880012f6bfd8 ffff880012f6a000 > > [ 1921.601707] 00000000001d4800 ffff880012f6a000 ffff880012f6bfd8 00000000001d4800 > > [ 1921.603258] Call Trace: > > [ 1921.603703] [] schedule+0x3a/0x50 > > [ 1921.605462] [] schedule_timeout+0x255/0x4d0 > > [ 1921.606540] [] ? mark_held_locks+0x6e/0x130 > > [ 1921.607633] [] ? lock_release_holdtime+0xb2/0x160 > > [ 1921.608798] [] ? _raw_spin_unlock_irq+0x2b/0x70 > > [ 1921.610154] [] wait_for_common+0x120/0x170 > > [ 1921.617878] [] ? try_to_wake_up+0x2f0/0x2f0 > > [ 1921.618949] [] ? __call_rcu+0x3c0/0x3c0 > > [ 1921.621405] [] wait_for_completion+0x18/0x20 > > [ 1921.623622] [] wait_rcu_gp+0x59/0x80 > > [ 1921.626789] [] ? perf_trace_rcu_batch_end+0x120/0x120 > > [ 1921.629440] [] ? wait_for_common+0x44/0x170 > > [ 1921.632445] [] synchronize_rcu+0x1c/0x20 > > [ 1921.635455] [] atomic_notifier_chain_unregister+0x60/0x80 > > This called synchronize_rcu(). > > > [ 1921.638550] [] task_handoff_unregister+0x13/0x20 > > [ 1921.641271] [] task_notify_func+0x2f/0x40 > > [ 1921.643894] [] notifier_call_chain+0x67/0x110 > > [ 1921.646580] [] __atomic_notifier_call_chain+0x74/0x110 > > This called rcu_read_lock(). > > Now, calling synchronize_rcu() from within an RCU read-side critical > section is grossly illegal. This will result in either deadlock (for > preemptible RCU) or premature grace-period end and memory corruption > (for non-preemptible RCU). Don't we have debugging checks for that? I can't seem to find any. May be worth having a WARN_ON_ONCE(rcu_read_lock_held()) in synchronize_rcu().