Re: Regression in RCU subsystem in latest mainline kernel

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Michael Ellerman <michael@ellerman.id.au>
Cc: Rojhalat Ibrahim <imr@rtschenk.de>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	linux-kernel@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: Regression in RCU subsystem in latest mainline kernel
Date: Wed, 26 Jun 2013 07:16:17 -0700
Message-ID: <20130626141617.GJ3828@linux.vnet.ibm.com>
In-Reply-To: <20130626081057.GB10796@concordia>

On Wed, Jun 26, 2013 at 06:10:58PM +1000, Michael Ellerman wrote:
> On Tue, Jun 25, 2013 at 09:03:32AM -0700, Paul E. McKenney wrote:
> > On Tue, Jun 25, 2013 at 05:44:23PM +1000, Michael Ellerman wrote:
> > > On Tue, Jun 25, 2013 at 05:19:14PM +1000, Michael Ellerman wrote:
> > > > 
> > > > Here's another trace from 3.10-rc7 plus a few local patches.
> > > 
> > > And here's another with CONFIG_RCU_CPU_STALL_INFO=y in case that's useful:
> > > 
> > > PASS running test_pmc5_6_overuse()
> > > INFO: rcu_sched self-detected stall on CPU
> > > 	8: (1 GPs behind) idle=8eb/140000000000002/0 softirq=215/220 
> > 
> > So this CPU has been out of action since before the beginning of the
> > current grace period ("1 GPs behind").  It is not idle, having taken
> > a pair of nested interrupts from process context (matching the stack
> > below).  This CPU has taken five softirqs since the last grace period
> > that it noticed, which makes it likely that the loop is within the
> > softirq handler.
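
The reading of the per-CPU line above can be reproduced mechanically. The following is a minimal sketch, not anything taken from the kernel: the function name and the returned keys are made up for illustration, and it assumes the field meanings given in the explanation above (an odd dynticks value means the CPU is not idle, and the two softirq numbers are the count at the last grace period this CPU noticed and the count now).

```python
import re

# Per-CPU line printed with CONFIG_RCU_CPU_STALL_INFO=y, as quoted in this thread.
CPU_LINE = re.compile(
    r"(?P<cpu>\d+): \((?P<behind>\d+) GPs behind\) "
    r"idle=(?P<dynticks>[0-9a-f]+)/(?P<nesting>[0-9a-f]+)/(?P<nmi>[0-9a-f]+) "
    r"softirq=(?P<sirq_gp>\d+)/(?P<sirq_now>\d+)"
)


def decode_cpu_line(line):
    """Hypothetical helper: pull the fields out of one stall-info CPU line."""
    m = CPU_LINE.search(line)
    if m is None:
        raise ValueError("not a recognised stall-info CPU line")
    dynticks = int(m["dynticks"], 16)  # printed in hex
    return {
        "cpu": int(m["cpu"]),
        "gps_behind": int(m["behind"]),
        # Assumption: an even dynticks value means dyntick-idle, odd means not idle.
        "idle": dynticks % 2 == 0,
        "nesting": int(m["nesting"], 16),
        # Softirqs taken since the last grace period this CPU noticed.
        "softirqs_since_gp": int(m["sirq_now"]) - int(m["sirq_gp"]),
    }


info = decode_cpu_line(
    "\t8: (1 GPs behind) idle=8eb/140000000000002/0 softirq=215/220 "
)
print(info["cpu"], info["gps_behind"], info["idle"], info["softirqs_since_gp"])
```

For the line in this report, the sketch gives CPU 8, one grace period behind, not idle, and five softirqs, which agrees with the explanation above.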
> > 
> > > 	 (t=2100 jiffies g=18446744073709551583 c=18446744073709551582 q=13)
> > 
> > Assuming HZ=100, this stall has been going on for 21 seconds.  There
> > is a grace period in progress according to RCU's global state (which
> > this CPU is not yet aware of).  There are a total of 13 RCU callbacks
> > queued across the entire system.
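
The arithmetic behind that reading is short enough to sketch. This is an illustrative helper only (the name `decode_summary_line` and the dictionary keys are hypothetical), and it assumes what the text above states: HZ=100, and a grace period is in progress when the `g` and `c` counters differ. The large `g` and `c` values are unsigned 64-bit numbers just below 2^64, so the sketch also shows them as signed values for readability.

```python
import re

SUMMARY = re.compile(r"t=(\d+) jiffies g=(\d+) c=(\d+) q=(\d+)")


def to_signed64(value):
    """Reinterpret an unsigned 64-bit counter as a signed value."""
    return value - 2**64 if value >= 2**63 else value


def decode_summary_line(line, hz=100):
    """Hypothetical helper: decode the '(t=... g=... c=... q=...)' line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("not a recognised stall summary line")
    t, g, c, q = (int(x) for x in m.groups())
    return {
        "stall_seconds": t / hz,      # jiffies divided by HZ (HZ is an assumption)
        "gp_in_progress": g != c,     # started but not yet completed
        "g_signed": to_signed64(g),
        "c_signed": to_signed64(c),
        "callbacks_queued": q,
    }


summary = decode_summary_line(
    "(t=2100 jiffies g=18446744073709551583 c=18446744073709551582 q=13)"
)
print(summary)
```

With the values from this report, the stall duration comes out at 21 seconds, `g` and `c` differ by one (-33 and -34 when read as signed), and 13 callbacks are queued.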
> > 
> > If the system is at all responsive, I suggest using ftrace (either from
> > the boot command line or at runtime) to trace __do_softirq() and
> > hrtimer_interrupt().
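
One way to set up the runtime tracing suggested above is to write to the ftrace control files. The sketch below is an assumption-laden illustration rather than a recommended procedure: the helper name is hypothetical, it assumes the tracing directory is mounted at /sys/kernel/debug/tracing, that the kernel was built with the function tracer, and that it runs as root.

```python
import os


def configure_function_trace(functions, tracing_dir="/sys/kernel/debug/tracing"):
    """Hypothetical helper: restrict the function tracer to the given functions."""

    def write(name, value):
        # Each ftrace control file takes a plain-text write.
        with open(os.path.join(tracing_dir, name), "w") as f:
            f.write(value)

    write("tracing_on", "0")                         # pause while reconfiguring
    write("set_ftrace_filter", " ".join(functions))  # trace only these functions
    write("current_tracer", "function")              # select the function tracer
    write("tracing_on", "1")                         # resume tracing


# Example call (needs root and a mounted tracing directory):
# configure_function_trace(["__do_softirq", "hrtimer_interrupt"])
# Boot-time equivalent on the kernel command line:
#   ftrace=function ftrace_filter=__do_softirq,hrtimer_interrupt
```

After the stall reproduces, the collected events can be read from the `trace` file in the same directory.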
> 
> Thanks for decoding it Paul.
> 
> I've narrowed down the test case and I think this is probably just a
> case of too many perf interrupts. If I reduce the sampling period by
> half, the test runs fine.
> 
> There is logic in perf to detect an interrupt storm, but for some reason
> it's not saving us. I'll dig in there, but I don't think it's an RCU
> problem.

Whew!  ;-)

							Thanx, Paul
