From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Paul Walmsley <paul@pwsan.com>
Cc: "Bruce, Becky" <bbruce@ti.com>,
	"Paul E. McKenney" <paul.mckenney@linaro.org>,
	"<linux-kernel@vger.kernel.org>" <linux-kernel@vger.kernel.org>,
	"<linux-omap@vger.kernel.org>" <linux-omap@vger.kernel.org>,
	"<linux-arm-kernel@lists.infradead.org>"
	<linux-arm-kernel@lists.infradead.org>,
	"Hilman, Kevin" <khilman@ti.com>,
	"Shilimkar, Santosh" <santosh.shilimkar@ti.com>,
	"Hunter, Jon" <jon-hunter@ti.com>,
	"<snijsure@grid-net.com>" <snijsure@grid-net.com>,
	fweisbec@gmail.com
Subject: Re: rcu self-detected stall messages on OMAP3, 4 boards
Date: Fri, 21 Sep 2012 12:57:17 -0700
Message-ID: <20120921195717.GD2454@linux.vnet.ibm.com>
In-Reply-To: <alpine.DEB.2.00.1209211900520.8839@utopia.booyaka.com>

On Fri, Sep 21, 2012 at 07:11:14PM +0000, Paul Walmsley wrote:
> On Fri, 21 Sep 2012, Paul E. McKenney wrote:
> 
> > On Fri, Sep 21, 2012 at 06:08:59PM +0000, Paul Walmsley wrote:
> > 
> > > As far as I know, our only idle entry point is in 
> > > arch/arm/common/process.c:cpu_idle().
> > 
> > In mainline, this is arch/arm/kernel/process.c, correct?
> 
> Indeed; sorry about that, mistyped.

No problem!

> > > Looking at the x86 idle entry, they call rcu_idle_{enter,exit}() inside 
> > > {stop,start}_critical_timings().  Making that change here didn't help.
> > 
> > The reason x86 does this is that they have idle notifiers deeper in the
> > idle loop that use RCU read-side critical sections.  So this was an
> > expected result.
> 
> OK
> 
> > > Also tried commenting out the code from the stop_critical_timings() call 
> > > to the WARN_ON(irqs_disabled()), and adding a local_irq_enable().  That 
> > > also didn't help, which suggests that the problem is not caused by the 
> > > OMAP-specific PM idle code.
> > 
> > I must admit that you make a convincing case here.  Though it does leave
> > me wondering what is different about Panda (and MX28, IIRC).
> 
> Given the dependency on CONFIG_NO_HZ, the stalls are probably dependent on 
> the userspace in use.  The userspaces here are quite minimal and so allow 
> the system to stay idle for relatively long periods of time.

Could you please point me to a recipe for creating a minimal userspace?
Just in case it is the userspace rather than the architecture/hardware
that makes the difference.

> > I may take your advice of remote access to a Panda board, though that
> > is likely to take a bit of time due to timezones.  Regardless of the
> > underlying issue here, I clearly need to make the stall-warning messages
> > do a better job of printing out needed information.
> 
> If you've got a patch in mind for that, I'll boot it here.

Hammering it out, will send it along when it is a bit less destructive.  ;-)

> One other observation.  omap2plus_defconfig sets CONFIG_NO_HZ=y but 
> doesn't set CONFIG_RCU_FAST_NO_HZ.  The stall warning messages still 
> appear when CONFIG_RCU_FAST_NO_HZ=y.  One of them is attached below (with 
> CONFIG_RCU_CPU_STALL_INFO set as well, obviously).

Just to make sure I understand the combinations:

o	All stalls have happened when running a minimal userspace.
o	CONFIG_NO_HZ=n suppresses the stalls.
o	CONFIG_RCU_FAST_NO_HZ (which depends on CONFIG_NO_HZ=y) has
	no observable effect on the stalls.

Did I get that right, or am I missing a combination?
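
As a sketch only, the combinations listed above could be written as
.config fragments along the following lines.  This merely restates the
list using the option names already mentioned in this thread; it is an
assumption about how the builds were configured, not a tested recipe.

```
# Stalls reported with a minimal userspace (omap2plus_defconfig case)
CONFIG_NO_HZ=y
# CONFIG_RCU_FAST_NO_HZ is not set

# Stalls still reported
CONFIG_NO_HZ=y
CONFIG_RCU_FAST_NO_HZ=y
CONFIG_RCU_CPU_STALL_INFO=y

# Stalls believed to be suppressed
# CONFIG_NO_HZ is not set
```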

> As an aside, in the CONFIG_RCU_FAST_NO_HZ=y build, I dropped a printk() 
> into rcu_idle_gp_timer_func() and it doesn't look like it ever executed.

Indeed, rcu_idle_gp_timer_func() is a bit strange in that it is cancelled
upon exit from idle, and therefore should (almost) never actually execute.
Its sole purpose is to wake up the CPU.  ;-)

							Thanx, Paul

> - Paul
> 
> [  305.832000] INFO: rcu_sched self-detected stall on CPU
> [  305.834838]  1: (2 GPs behind) idle=5b1/1/0 drain=0 . timer=4294967295
> [  305.838378]   (t=17463 jiffies)
> [  305.840118] [<c001be10>] (unwind_backtrace+0x0/0xf0) from [<c00ad65c>] (rcu_pending+0xd0/0x540)
> [  305.844848] [<c00ad65c>] (rcu_pending+0xd0/0x540) from [<c00ae5cc>] (rcu_check_callbacks+0x110/0x198)
> [  305.849884] [<c00ae5cc>] (rcu_check_callbacks+0x110/0x198) from [<c0053800>] (update_process_times+0x38/0x68)
> [  305.855285] [<c0053800>] (update_process_times+0x38/0x68) from [<c008cf40>] (tick_sched_timer+0x80/0xec)
> [  305.860473] [<c008cf40>] (tick_sched_timer+0x80/0xec) from [<c006942c>] (__run_hrtimer+0x7c/0x1e0)
> [  305.865356] [<c006942c>] (__run_hrtimer+0x7c/0x1e0) from [<c006a210>] (hrtimer_interrupt+0x11c/0x2d0)
> [  305.870361] [<c006a210>] (hrtimer_interrupt+0x11c/0x2d0) from [<c001a54c>] (twd_handler+0x30/0x44)
> [  305.875274] [<c001a54c>] (twd_handler+0x30/0x44) from [<c00a8128>] (handle_percpu_devid_irq+0x90/0x13c)
> [  305.880371] [<c00a8128>] (handle_percpu_devid_irq+0x90/0x13c) from [<c00a489c>] (generic_handle_irq+0x30/0x48)
> [  305.885833] [<c00a489c>] (generic_handle_irq+0x30/0x48) from [<c0014fb8>] (handle_IRQ+0x4c/0xac)
> [  305.890624] [<c0014fb8>] (handle_IRQ+0x4c/0xac) from [<c000864c>] (gic_handle_irq+0x28/0x5c)
> [  305.895233] [<c000864c>] (gic_handle_irq+0x28/0x5c) from [<c04fbc64>] (__irq_svc+0x44/0x5c)
> [  305.899780] Exception stack(0xde86ff88 to 0xde86ffd0)
> [  305.902526] ff80:                   0004c062 00000001 00000000 de8660c0 de86e000 c07c42c8
> [  305.906982] ffa0: c05075a0 c074bdd0 00000000 411fc092 c074bff0 00000000 00000001 de86ffd0
> [  305.911437] ffc0: 0004c063 c00152b0 20000113 ffffffff
> [  305.914184] [<c04fbc64>] (__irq_svc+0x44/0x5c) from [<c00152b0>] (default_idle+0x20/0x44)
> [  305.918640] [<c00152b0>] (default_idle+0x20/0x44) from [<c00154dc>] (cpu_idle+0x9c/0x114)
> [  305.923126] [<c00154dc>] (cpu_idle+0x9c/0x114) from [<804f4a34>] (0x804f4a34)
> 
