From: David Miller <davem@davemloft.net>
To: Jonathan.Cameron@huawei.com
Cc: paulmck@linux.vnet.ibm.com, npiggin@gmail.com,
linux-arm-kernel@lists.infradead.org, linuxarm@huawei.com,
akpm@linux-foundation.org, abdhalee@linux.vnet.ibm.com,
linuxppc-dev@lists.ozlabs.org, dzickus@redhat.com,
sparclinux@vger.kernel.org, sfr@canb.auug.org.au
Subject: Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?
Date: Tue, 25 Jul 2017 14:10:29 -0700 (PDT)
Message-ID: <20170725.141029.676882447882600000.davem@davemloft.net>
In-Reply-To: <20170725175207.000001cb@huawei.com>
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Date: Wed, 26 Jul 2017 00:52:07 +0800
> On Tue, 25 Jul 2017 08:12:45 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
>> On Tue, Jul 25, 2017 at 10:42:45PM +0800, Jonathan Cameron wrote:
>> > On Tue, 25 Jul 2017 06:46:26 -0700
>> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>> >
>> > > On Tue, Jul 25, 2017 at 10:26:54PM +1000, Nicholas Piggin wrote:
>> > > > On Tue, 25 Jul 2017 19:32:10 +0800
>> > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>> > > >
>> > > > > Hi All,
>> > > > >
>> > > > > We observed a regression on our d05 boards (but curiously not
>> > > > > the fairly similar but single socket / smaller core count
>> > > > > d03), initially seen with linux-next prior to the merge window
>> > > > > and still present in v4.13-rc2.
>> > > > >
>> > > > > The symptom is:
>> > >
>> > > Adding Dave Miller and the sparclinux@vger.kernel.org email on CC, as
>> > > they have been seeing something similar, and you might well have saved
>> > > them the trouble of bisecting.
>> > >
>> > > [ . . . ]
>> > >
>> > > > > [ 1984.628602] rcu_preempt kthread starved for 5663 jiffies! g1566 c1565 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
>> > >
>> > > This is the cause from an RCU perspective. You had a lot of idle CPUs,
>> > > and RCU is not permitted to disturb them -- the battery-powered embedded
>> > > guys get very annoyed by that sort of thing. What happens instead is
>> > > that each CPU updates a per-CPU state variable when entering or exiting
>> > > idle, and the grace-period kthread ("rcu_preempt kthread" in the above
>> > > message) checks these state variables, and when it sees an idle CPU,
>> > > it reports a quiescent state on that CPU's behalf.
>> > >
>> > > But the grace-period kthread can only do this work if it gets a chance
>> > > to run. And the message above says that this kthread hasn't had a chance
>> > > to run for a full 5,663 jiffies. For completeness, the "g1566 c1565"
>> > > says that grace period #1566 is in progress, and the "f0x0" says that
>> > > no one yet needs grace period #1567. The "RCU_GP_WAIT_FQS(3)" says
>> > > that the grace-period kthread has fully initialized the current grace
>> > > period and is sleeping for a few jiffies waiting to scan for idle tasks.
>> > > Finally, the "->state=0x1" says that the grace-period kthread is in
>> > > TASK_INTERRUPTIBLE state, in other words, still sleeping.
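The field-by-field reading above can be captured in a small decoder. What follows is only a sketch in Python, assuming the exact layout of the one stall message quoted in this thread; the regular expression, the function name decode_stall, and the dictionary keys are hypothetical and are not part of any kernel tooling. Other kernel versions print this line differently.

```python
import re

# Layout assumed from the single message quoted in this thread:
#   <name> kthread starved for <N> jiffies! g<gp> c<completed> f<flags>
#   <GP_STATE>(<n>) ->state=<task state>
STALL_RE = re.compile(
    r"(?P<kthread>\S+) kthread starved for (?P<jiffies>\d+) jiffies! "
    r"g(?P<gpnum>\d+) c(?P<completed>\d+) f(?P<flags>0x[0-9a-fA-F]+) "
    r"(?P<gp_state>\w+)\((?P<gp_state_num>\d+)\) "
    r"->state=(?P<task_state>0x[0-9a-fA-F]+)"
)

TASK_INTERRUPTIBLE = 0x1  # value of the kernel's TASK_INTERRUPTIBLE bit


def decode_stall(line):
    """Return a dict describing the stall message, or None if no match."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    gpnum = int(m.group("gpnum"))
    completed = int(m.group("completed"))
    task_state = int(m.group("task_state"), 16)
    return {
        "kthread": m.group("kthread"),
        "starved_jiffies": int(m.group("jiffies")),
        # g != c: the grace period numbered g has started but not completed.
        "current_gp": gpnum,
        "last_completed_gp": completed,
        "gp_in_progress": gpnum != completed,
        # Nonzero flags would mean a further grace period has been requested.
        "next_gp_requested": int(m.group("flags"), 16) != 0,
        "gp_state": m.group("gp_state"),
        "kthread_sleeping_interruptible": bool(task_state & TASK_INTERRUPTIBLE),
    }


line = ("[ 1984.628602] rcu_preempt kthread starved for 5663 jiffies! "
        "g1566 c1565 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1")
info = decode_stall(line)
```

For the quoted line, info reports grace period 1566 in progress, no further grace period requested, and the kthread sleeping in TASK_INTERRUPTIBLE, matching the explanation above.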
>> >
>> > Thanks for the explanation!
>> > >
>> > > So my first question is "What did commit 05a4a9527 (kernel/watchdog:
>> > > split up config options) do to prevent the grace-period kthread from
>> > > getting a chance to run?"
>> >
>> > As far as we can tell it was a side effect of that patch.
>> >
>> > The real cause is that the patch changed the result of defconfigs so that
>> > the softlockup detector (now CONFIG_SOFTLOCKUP_DETECTOR) no longer runs.
>> >
>> > Enabling that on 4.13-rc2 (and presumably everything in between)
>> > means we don't see the problem any more.
>> >
>> > > I must confess that I don't see anything
>> > > obvious in that commit, so my second question is "Are we sure that
>> > > reverting this commit makes the problem go away?"
>> >
>> > Simply enabling CONFIG_SOFTLOCKUP_DETECTOR seems to make it go away.
>> > That detector fires up a thread on every cpu, which may be relevant.
>>
>> Interesting... Why should it be necessary to fire up a thread on every
>> CPU in order to make sure that RCU's grace-period kthreads get some
>> CPU time? Especially given how many idle CPUs you had on your system.
>>
>> So I have to ask if there is some other bug that the softlockup detector
>> is masking.
> I am thinking the same. We can try going back further than 4.12 tomorrow
> (we think we can realistically go back to 4.8 and possibly 4.6
> with this board)
Just to report, turning softlockup back on fixes things for me on
sparc64 too.
The thing about softlockup is it runs an hrtimer, which seems to run
about every 4 seconds.
So I wonder if this is a NO_HZ problem.
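The "about every 4 seconds" figure is consistent with how the softlockup sample period was derived in kernel/watchdog.c around that time, as best recalled here: the softlockup threshold is twice watchdog_thresh (default 10 seconds), and the hrtimer period is one fifth of that threshold. The arithmetic, as a sketch in Python; the function name is hypothetical and the formula is an assumption that should be checked against the tree being tested.

```python
NSEC_PER_SEC = 1_000_000_000


def softlockup_sample_period_ns(watchdog_thresh_s=10):
    """Assumed hrtimer period of the softlockup detector, in nanoseconds.

    Assumption: softlockup threshold = 2 * watchdog_thresh, and the timer
    fires five times per threshold so a lockup is noticed promptly.
    """
    softlockup_thresh_s = 2 * watchdog_thresh_s
    return softlockup_thresh_s * (NSEC_PER_SEC // 5)


period_s = softlockup_sample_period_ns() / NSEC_PER_SEC  # 4.0 with defaults
```

With the detector enabled, every CPU therefore takes a timer interrupt every few seconds even when otherwise idle, which is why its absence points toward a NO_HZ interaction.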