linuxppc-dev.lists.ozlabs.org archive mirror
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	linux-arm-kernel@lists.infradead.org, linuxarm@huawei.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Abdul Haleem <abdhalee@linux.vnet.ibm.com>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Don Zickus <dzickus@redhat.com>,
	David Miller <davem@davemloft.net>,
	sparclinux@vger.kernel.org,
	Stephen Rothwell <sfr@canb.auug.org.au>
Subject: Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n  - any one else seeing this?
Date: Tue, 25 Jul 2017 06:46:26 -0700
Message-ID: <20170725134626.GL3730@linux.vnet.ibm.com>
In-Reply-To: <20170725222654.5a968588@roar.ozlabs.ibm.com>

On Tue, Jul 25, 2017 at 10:26:54PM +1000, Nicholas Piggin wrote:
> On Tue, 25 Jul 2017 19:32:10 +0800
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > Hi All,
> > 
> > We observed a regression on our d05 boards (but curiously not
> > the fairly similar but single socket / smaller core count
> > d03), initially seen with linux-next prior to the merge window
> > and still present in v4.13-rc2.
> > 
> > The symptom is:

Adding Dave Miller and the sparclinux@vger.kernel.org email on CC, as
they have been seeing something similar, and you might well have saved
them the trouble of bisecting.

[ . . . ]

> > [ 1984.628602] rcu_preempt kthread starved for 5663 jiffies! g1566 c1565 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1

This is the cause from an RCU perspective.  You had a lot of idle CPUs,
and RCU is not permitted to disturb them -- the battery-powered embedded
guys get very annoyed by that sort of thing.  What happens instead is
that each CPU updates a per-CPU state variable when entering or exiting
idle, and the grace-period kthread ("rcu_preempt kthread" in the above
message) checks these state variables, and when it sees an idle CPU,
it reports a quiescent state on that CPU's behalf.
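
As a rough illustration of that division of labor, here is a toy model in
Python. It is a sketch only, not the kernel's implementation: the names
ToyCpu and scan_for_idle are hypothetical, and the per-CPU state is
reduced to a plain boolean flag.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyCpu:
    """Hypothetical stand-in for one CPU's per-CPU idle state (a plain flag)."""
    idle: bool = False

    def enter_idle(self) -> None:
        # The CPU itself records the transition; nobody else is notified.
        self.idle = True

    def exit_idle(self) -> None:
        self.idle = False


def scan_for_idle(cpus: list[ToyCpu], pending: set[int]) -> set[int]:
    """One scan by the (toy) grace-period kthread.

    Every CPU in `pending` still owes a quiescent state. Idle CPUs are not
    woken; a quiescent state is reported on their behalf by dropping them
    from the returned set.
    """
    return {cpu for cpu in pending if not cpus[cpu].idle}


cpus = [ToyCpu() for _ in range(4)]
for cpu in cpus[:3]:
    cpu.enter_idle()                      # CPUs 0-2 idle, CPU 3 busy

pending = scan_for_idle(cpus, set(range(4)))
remaining_after_first_scan = set(pending)

cpus[3].enter_idle()
pending = scan_for_idle(cpus, pending)
grace_period_done = not pending           # empty set: nothing left to wait for
```

Note that nothing in this model makes progress unless scan_for_idle()
is actually called, which is the situation discussed next.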

But the grace-period kthread can only do this work if it gets a chance
to run.  And the message above says that this kthread hasn't had a chance
to run for a full 5,663 jiffies.  For completeness, the "g1566 c1565"
says that grace period #1566 is in progress, and the "f0x0" says that no
one needs another grace period, #1567.  The "RCU_GP_WAIT_FQS(3)" says
that the grace-period kthread has fully initialized the current grace
period and is sleeping for a few jiffies waiting to scan for idle tasks.
Finally, the "->state=0x1" says that the grace-period kthread is in
TASK_INTERRUPTIBLE state, in other words, still sleeping.
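
The field-by-field reading above can be sketched as a small Python
decoder. This is an illustration under stated assumptions, not a tool
from the kernel tree: the regular expression matches only the message
format quoted in this thread, decode_stall is a hypothetical name, and
the HZ value used to convert jiffies to seconds is assumed (HZ is a
build-time constant, so check your own config).

```python
import re

# Matches the starvation message format quoted above (assumed layout).
STALL_RE = re.compile(
    r"(?P<kthread>\S+) kthread starved for (?P<jiffies>\d+) jiffies! "
    r"g(?P<gp>\d+) c(?P<completed>\d+) f(?P<flags>0x[0-9a-fA-F]+) "
    r"(?P<gp_state>\w+)\((?P<gp_state_num>\d+)\) "
    r"->state=(?P<task_state>0x[0-9a-fA-F]+)"
)

HZ = 250                     # assumption: depends on the kernel build
TASK_INTERRUPTIBLE = 0x1     # the state value discussed above


def decode_stall(line, hz=HZ):
    """Return the message fields as a dict, or None if the line does not match."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    gp, completed = int(m["gp"]), int(m["completed"])
    flags = int(m["flags"], 16)
    task_state = int(m["task_state"], 16)
    return {
        "kthread": m["kthread"],
        "starved_jiffies": int(m["jiffies"]),
        "starved_seconds": int(m["jiffies"]) / hz,
        # gN ahead of cN means grace period N has started but not completed.
        "gp_in_progress": gp if gp != completed else None,
        # Assumption: any nonzero flag means another grace period is wanted.
        "another_gp_needed": flags != 0,
        "gp_state": m["gp_state"],
        "sleeping_interruptibly": task_state == TASK_INTERRUPTIBLE,
    }


line = ("[ 1984.628602] rcu_preempt kthread starved for 5663 jiffies! "
        "g1566 c1565 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1")
info = decode_stall(line)
```

With the assumed HZ of 250, 5663 jiffies works out to roughly 22.7
seconds; a different HZ changes that figure proportionally.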

So my first question is "What did commit 05a4a9527 (kernel/watchdog:
split up config options) do to prevent the grace-period kthread from
getting a chance to run?"  I must confess that I don't see anything
obvious in that commit, so my second question is "Are we sure that
reverting this commit makes the problem go away?" and my third is "Is
this an intermittent problem that led to a false bisection?"

[ . . . ]

> > Reducing the RCU CPU stall timeout makes it happen more often,
> > but we are seeing it even with the default value of 24 seconds.
> > 
> > Tends to occur after a period of relatively low usage, but has
> > also been seen mid way through performance tests.
> > 
> > This was not seen with v4.12 so a bisection run later led to
> > commit 05a4a9527 (kernel/watchdog: split up config options).
> > 
> > Which was odd until we discovered that a side effect of this patch
> > was to change whether the softlockup detector was enabled or not in
> > the arm64 defconfig.
> > 
> > On 4.13-rc2 enabling the softlockup detector indeed stopped us
> > seeing the rcu issue. Disabling the equivalent on 4.12 made the
> > issue occur there as well.
> > 
> > Clearly the softlockup detector results in a thread on every cpu,
> > which might be related but beyond that we are still looking into
> > the issue.
> > 
> > So the obvious question is whether anyone else is seeing this as
> > it might help us to focus in on where to look!
> 
> Huh. Something similar has been seen very intermittently on powerpc
> as well. We couldn't reproduce it reliably enough to bisect it, so
> this is a good help.
> 
> http://marc.info/?l=linuxppc-embedded&m=149872815523646&w=2
> 
> It looks like the watchdog patch has a similar effect on powerpc in
> that it stops enabling the softlockup detector by default. Haven't
> confirmed, but it looks like the same thing.
> 
> A bug in RCU stall detection?

Well, if I am expected to make grace periods complete when my grace-period
kthreads aren't getting any CPU time, I will have to make some substantial
changes.  ;-)

One possibility is that the timer isn't firing and another is that the
timer's wakeup is being lost somehow.

So another thing to try is to boot with rcutree.rcu_kick_kthreads=1.
This will cause RCU to do redundant wakeups on the grace-period kthread
if the grace period is moving slowly.  This is of course a crude hack,
which is why this boot parameter will also cause a splat if it ever has
to do anything.

Does this help at all?

							Thanx, Paul

> > In the meantime we'll carry on digging.
> > 
> > Thanks,
> > 
> > Jonathan
> > 
> > p.s. As a more general question, do we want to have the
> > soft lockup detector enabled on arm64 by default?
> 
> I've cc'ed Don. My patch should not have changed defconfigs, I
> should have been more careful with that.
> 
> Thanks,
> Nick
> 
