linux-arm-kernel.lists.infradead.org archive mirror
From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
To: linux-arm-kernel@lists.infradead.org
Subject: rcu self-detected stall messages on OMAP3, 4 boards
Date: Wed, 12 Sep 2012 18:12:08 -0700
Message-ID: <20120913011208.GT4257@linux.vnet.ibm.com>
In-Reply-To: <alpine.DEB.2.00.1209122237080.7275@utopia.booyaka.com>

On Wed, Sep 12, 2012 at 10:51:30PM +0000, Paul Walmsley wrote:
> 
> Hi Paul
> 
> Recently several of us have been seeing "INFO: rcu_sched self-detected
> stall on CPU { 0} (t=20611 jiffies)" stack tracebacks on various OMAP3
> and 4 boards.
> 
> I only noticed it during v3.6-rc3, but I suspect it's been happening 
> for users at least since May:
> 
> http://www.mail-archive.com/linux-omap@vger.kernel.org/msg68942.html
> 
> 
> The only quasi-reproducible test case that I've found so far 
> is to boot the board with serial console enabled to a login prompt, then 
> wait for a few minutes, then send a keypress to the board via serial.
> The tracebacks I get look like this:

Interesting.  I am assuming that the interrupt in the stack trace below
arrived while the CPU was in the idle loop.  If not, please let me know
what it interrupted.

Could you please reproduce with CONFIG_RCU_CPU_STALL_INFO=y?  That would
give me a bit more information about why RCU thought that there was
a stall.  (CCing Becky Bruce, who saw something similar recently.)
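
When comparing runs, it can help to pull the stall lines out of a saved
console log.  The following is a minimal sketch, assuming the one-line
message format quoted in this thread; parse_stall() and its hz parameter
are hypothetical names, hz must be supplied by the caller to match the
kernel's CONFIG_HZ, and the output produced with
CONFIG_RCU_CPU_STALL_INFO=y may be formatted differently, which this
sketch does not handle.

```python
import re

# Matches lines such as:
#   INFO: rcu_sched self-detected stall on CPU { 0}  (t=20611 jiffies)
STALL_RE = re.compile(
    r"INFO: (?P<flavor>rcu_\w+) self-detected stall on CPU "
    r"\{(?P<cpus>[^}]*)\}\s*\(t=(?P<jiffies>\d+) jiffies\)"
)


def parse_stall(line, hz=None):
    """Return a dict describing a self-detected stall line, or None."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    info = {
        "flavor": m.group("flavor"),
        "cpus": [int(c) for c in m.group("cpus").split()],
        "jiffies": int(m.group("jiffies")),
    }
    if hz:
        # hz is an assumption supplied by the caller (the kernel's CONFIG_HZ).
        info["seconds"] = info["jiffies"] / hz
    return info


if __name__ == "__main__":
    sample = ("[  467.480712] INFO: rcu_sched self-detected stall "
              "on CPU { 0}  (t=20611 jiffies)")
    print(parse_stall(sample))
```

Passing hz converts the reported jiffies count into seconds, which makes
it easier to compare the stall duration against the timestamps in the log.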

Subodh Nijsure (also CCed) reported something that might be similar on
ARM, and also reported that setting the following got rid of the stalls:

	CONFIG_CPU_IDLE=y
	CONFIG_CPU_IDLE_GOV_LADDER=y
	CONFIG_CPU_IDLE_GOV_MENU=y

That made him happy, which was good, but it also left the underlying
problem unsolved.  Do these options affect your system?  If so, do they
cause a different ARM idle loop to be executed?
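
To see quickly which of the options named above a given build has, one
could scan its .config.  This is a rough sketch rather than an existing
tool: read_options() is a hypothetical helper, and it assumes the usual
.config conventions of "CONFIG_FOO=y" for a set option and
"# CONFIG_FOO is not set" for an unset one.

```python
import re

# Options named in this thread.
OPTIONS = (
    "CONFIG_RCU_CPU_STALL_INFO",
    "CONFIG_CPU_IDLE",
    "CONFIG_CPU_IDLE_GOV_LADDER",
    "CONFIG_CPU_IDLE_GOV_MENU",
)

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def read_options(config_text, names=OPTIONS):
    """Map each name to its value, 'n' if explicitly unset, None if absent."""
    found = dict.fromkeys(names)
    for line in config_text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m and m.group(1) in found:
            found[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m and m.group(1) in found:
            found[m.group(1)] = "n"
    return found
```

Calling read_options() on the contents of the .config from a stalling
build and from a non-stalling one, then comparing the two results, shows
at a glance whether the cpuidle options differ between them.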

							Thanx, Paul

> [  467.480712] INFO: rcu_sched self-detected stall on CPU { 0}  (t=20611 jiffies)
> [  467.484741] [<c001b7cc>] (unwind_backtrace+0x0/0xf0) from [<c00acc94>] (rcu_check_callbacks+0x180/0x630)
> [  467.489929] [<c00acc94>] (rcu_check_callbacks+0x180/0x630) from [<c0052b18>] (update_process_times+0x38/0x68)
> [  467.495361] [<c0052b18>] (update_process_times+0x38/0x68) from [<c008c04c>] (tick_sched_timer+0x80/0xec)
> [  467.500518] [<c008c04c>] (tick_sched_timer+0x80/0xec) from [<c0068544>] (__run_hrtimer+0x7c/0x1e0)
> [  467.505401] [<c0068544>] (__run_hrtimer+0x7c/0x1e0) from [<c0069328>] (hrtimer_interrupt+0x11c/0x2d0)
> [  467.510437] [<c0069328>] (hrtimer_interrupt+0x11c/0x2d0) from [<c001a04c>] (twd_handler+0x30/0x44)
> [  467.515350] [<c001a04c>] (twd_handler+0x30/0x44) from [<c00a71a0>] (handle_percpu_devid_irq+0x90/0x13c)
> [  467.520477] [<c00a71a0>] (handle_percpu_devid_irq+0x90/0x13c) from [<c00a3914>] (generic_handle_irq+0x30/0x48)
> [  467.525939] [<c00a3914>] (generic_handle_irq+0x30/0x48) from [<c0014c58>] (handle_IRQ+0x4c/0xac)
> [  467.530731] [<c0014c58>] (handle_IRQ+0x4c/0xac) from [<c0008478>] (gic_handle_irq+0x28/0x5c)
> [  467.535339] [<c0008478>] (gic_handle_irq+0x28/0x5c) from [<c04f8ce4>] (__irq_svc+0x44/0x5c)
> [  467.539886] Exception stack(0xc0729f58 to 0xc0729fa0)
> [  467.542663] 9f40:                                                       00047f2a 00000001
> [  467.547119] 9f60: 00000000 c074a940 c0728000 c07c4b08 c05045a0 c074be20 00000000 411fc092
> [  467.551574] 9f80: c074c040 00000000 00000001 c0729fa0 00047f2b c0014f50 20000113 ffffffff
> [  467.556030] [<c04f8ce4>] (__irq_svc+0x44/0x5c) from [<c0014f50>] (default_idle+0x20/0x44)
> [  467.560485] [<c0014f50>] (default_idle+0x20/0x44) from [<c001517c>] (cpu_idle+0x9c/0x114)
> [  467.564971] [<c001517c>] (cpu_idle+0x9c/0x114) from [<c06d77b0>] (start_kernel+0x2b4/0x304)
> 
> Looks like this message was added as of commit
> a858af2875fb291d0f4b0a4419fefbf03c2379c0 ("rcu: Print scheduling-clock
> information on RCU CPU stall-warning messages").
> 
> Do you have any suggestions for how we can determine what is causing
> this?
> 
> Here's an example of a kernel config that we use:
> 
> http://www.pwsan.com/omap/testlogs/am33xx_hwmod_clock_devel_3.7/20120912092510/build/omap2plus_defconfig/Kconfig
> 
> A few observations that may or may not be relevant: we use NO_HZ, and 
> we also have a clockevents timer that is relatively slow to program.
> 
> 
> regards,
> 
> - Paul
> 
