From mboxrd@z Thu Jan 1 00:00:00 1970
From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
Date: Sat, 22 Sep 2012 12:52:53 -0700
Subject: rcu self-detected stall messages on OMAP3, 4 boards
In-Reply-To:
References: <20120920000351.GI2455@linux.vnet.ibm.com>
 <20120920220130.GN2449@linux.vnet.ibm.com>
 <20120921212054.GE2454@linux.vnet.ibm.com>
 <20120922000537.GH2454@linux.vnet.ibm.com>
Message-ID: <20120922195253.GD2934@linux.vnet.ibm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Sat, Sep 22, 2012 at 06:16:15PM +0000, Paul Walmsley wrote:
> Hi Paul
>
> On Fri, 21 Sep 2012, Paul E. McKenney wrote:
>
> > I am wondering if your system somehow figured out how to start a grace
> > period that had no RCU callbacks waiting for it. If that happened,
> > then a CONFIG_NO_HZ=y system could in theory get into a state where all
> > CPUs are in dyntick-idle mode, so that none of them is doing anything
> > to force the grace period to complete.
> >
> > That should be easy to diagnose, anyway. Please see below, which
> > includes the earlier diagnostic patch.
>
> Here you go.
>
> - Paul
>
> [ 248.902618] INFO: rcu_sched self-detected stall on CPU
> [ 248.905456] 0: (1 ticks this GP) idle=933/1/0
> [ 248.907897] (t=26570 jiffies g=11 c=10 q=0)

Bingo!!!

(q=0, in case you were wondering. And thank you for testing this!)

Strangely enough, I believe that I have inadvertently fixed this in
my -rcu tree:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/next

Nevertheless, if you get a chance to try it, I would be interested to
hear if my guess is correct. The trick is that a kthread drives the
grace period in -rcu, regardless of whether or not there are callbacks.

However, the backport would not be something that -stable would be
happy with, so I will be putting together a fix for mainline.

This thing has been in the kernel since about 2004, not sure why you
didn't hit it earlier.

Thanx, Paul

> [ 248.910339] [] (unwind_backtrace+0x0/0xf0) from [] (rcu_check_callbacks+0x220/0x714)
> [ 248.915527] [] (rcu_check_callbacks+0x220/0x714) from [] (update_process_times+0x38/0x68)
> [ 248.920928] [] (update_process_times+0x38/0x68) from [] (tick_sched_timer+0x80/0xec)
> [ 248.926116] [] (tick_sched_timer+0x80/0xec) from [] (__run_hrtimer+0x7c/0x1e0)
> [ 248.930999] [] (__run_hrtimer+0x7c/0x1e0) from [] (hrtimer_interrupt+0x11c/0x2d0)
> [ 248.936035] [] (hrtimer_interrupt+0x11c/0x2d0) from [] (twd_handler+0x30/0x44)
> [ 248.940948] [] (twd_handler+0x30/0x44) from [] (handle_percpu_devid_irq+0x90/0x13c)
> [ 248.946075] [] (handle_percpu_devid_irq+0x90/0x13c) from [] (generic_handle_irq+0x30/0x48)
> [ 248.951538] [] (generic_handle_irq+0x30/0x48) from [] (handle_IRQ+0x4c/0xac)
> [ 248.956329] [] (handle_IRQ+0x4c/0xac) from [] (gic_handle_irq+0x28/0x5c)
> [ 248.960937] [] (gic_handle_irq+0x28/0x5c) from [] (__irq_svc+0x44/0x5c)
> [ 248.965484] Exception stack(0xc0729f58 to 0xc0729fa0)
> [ 248.968231] 9f40: 0003b832 00000001
> [ 248.972686] 9f60: 00000000 c074a8e8 c0728000 c07c42c8 c05065a0 c074bdc8 00000000 411fc092
> [ 248.977142] 9f80: c074bfe8 00000000 00000001 c0729fa0 0003b833 c0015130 20000113 ffffffff
> [ 248.981597] [] (__irq_svc+0x44/0x5c) from [] (default_idle+0x20/0x44)
> [ 248.986083] [] (default_idle+0x20/0x44) from [] (cpu_idle+0x9c/0x114)
> [ 248.990539] [] (cpu_idle+0x9c/0x114) from [] (start_kernel+0x2b4/0x304)
>
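
For anyone who wants to scan other boot logs for the same signature, here
is a minimal sketch in Python. It is not part of the kernel or of the
diagnostic patch. It assumes that in the "(t=... jiffies g=... c=... q=...)"
line g is the current grace-period number, c is the most recently
completed one, and q is the number of queued RCU callbacks. The helper
names parse_stall_line and grace_period_without_callbacks are made up for
illustration.

```python
import re

# Matches the trailing summary of the stall report quoted in this thread,
# e.g. "(t=26570 jiffies g=11 c=10 q=0)". The field meanings below are
# assumptions, not taken from the kernel source.
STALL_RE = re.compile(r"\(t=(\d+) jiffies g=(\d+) c=(\d+) q=(\d+)\)")


def parse_stall_line(line):
    """Return the t/g/c/q fields as a dict of ints, or None if absent."""
    match = STALL_RE.search(line)
    if match is None:
        return None
    t, g, c, q = (int(value) for value in match.groups())
    return {"t": t, "g": g, "c": c, "q": q}


def grace_period_without_callbacks(fields):
    """True if a grace period appears to be in flight with nothing queued.

    Assumed reading: g != c means grace period g has started but has not
    completed, and q == 0 means no callbacks are waiting on it.
    """
    return fields["g"] != fields["c"] and fields["q"] == 0


if __name__ == "__main__":
    sample = "[ 248.907897] (t=26570 jiffies g=11 c=10 q=0)"
    fields = parse_stall_line(sample)
    print(fields, grace_period_without_callbacks(fields))
```

Under those assumptions, a line that trips this check matches the
hypothesis in the quoted text: a grace period was started with no
callbacks waiting for it.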