From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: David Miller <davem@davemloft.net>
Cc: Jonathan.Cameron@huawei.com, dzickus@redhat.com,
sfr@canb.auug.org.au, linuxarm@huawei.com, npiggin@gmail.com,
abdhalee@linux.vnet.ibm.com, sparclinux@vger.kernel.org,
akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?
Date: Wed, 26 Jul 2017 18:42:14 -0700
Message-ID: <20170727014214.GH3730@linux.vnet.ibm.com>
In-Reply-To: <20170726.162200.1904949371593276937.davem@davemloft.net>

On Wed, Jul 26, 2017 at 04:22:00PM -0700, David Miller wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Date: Wed, 26 Jul 2017 16:15:05 -0700
>
> > On Wed, Jul 26, 2017 at 03:45:40PM -0700, David Miller wrote:
> >> Just out of curiosity, what x86 idle method is your machine using?
> >> The mwait one or the one which simply uses 'halt'? The mwait variant
> >> might mask this bug, and halt would be a lot closer to how sparc64 and
> >> Jonathan's system operates.
> >
> > My kernel builds with CONFIG_INTEL_IDLE=n, which I believe means that
> > I am not using the mwait one. Here is a grep for IDLE in my .config:
> >
> > CONFIG_NO_HZ_IDLE=y
> > CONFIG_GENERIC_SMP_IDLE_THREAD=y
> > # CONFIG_IDLE_PAGE_TRACKING is not set
> > CONFIG_ACPI_PROCESSOR_IDLE=y
> > CONFIG_CPU_IDLE=y
> > # CONFIG_CPU_IDLE_GOV_LADDER is not set
> > CONFIG_CPU_IDLE_GOV_MENU=y
> > # CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
> > # CONFIG_INTEL_IDLE is not set
>
> No, that doesn't influence it. It is determined by cpu features at
> run time.
>
> If you are using mwait, it'll say so in your kernel log like this:
>
> using mwait in idle threads

Thank you for the hint!

And vim says:

"E486: Pattern not found: using mwait in idle threads"

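The same check can be run without opening the log in an editor. This is a minimal sketch, assuming the kernel log text has already been captured (for example, saved from dmesg). The function name uses_mwait_idle and the sample strings are illustrative only; the search string is the one David quotes above.

```python
# Search captured kernel log text for the message David quotes above.
MWAIT_BANNER = "using mwait in idle threads"

def uses_mwait_idle(log_text: str) -> bool:
    """Return True if any line of the log contains the mwait message."""
    return any(MWAIT_BANNER in line for line in log_text.splitlines())

# Placeholder inputs for illustration; a real kernel log will look different.
sample_with = "line one\nusing mwait in idle threads\nline three"
sample_without = "line one\nline three"

print(uses_mwait_idle(sample_with))     # True
print(uses_mwait_idle(sample_without))  # False
```

A False result corresponds to the E486 "Pattern not found" report above. It only shows that the message is absent from the text that was searched, so a log that has lost its early boot lines could give the same answer.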
> >> On sparc64 the cpu yield we do in the idle loop sleeps the cpu. Its
> >> local TICK register keeps advancing, and the local timer therefore
> >> will still trigger. Also, any externally generated interrupts
> >> (including cross calls) will wake up the cpu as well.
> >>
> >> The tick-sched code is really tricky wrt. NO_HZ even in the NO_HZ_IDLE
> >> case. One of my running theories is that we miss scheduling a tick
> >> due to a race. That would be consistent with the behavior we see
> >> in the RCU dumps, I think.
> >
> > But wouldn't you have to miss a -lot- of ticks to get an RCU CPU stall
> > warning? By default, your grace period needs to extend for more than
> > 21 seconds (more than one-third of a -minute-) to get one. Or do
> > you mean that the ticks get shut off now and forever, as opposed to
> > just losing one of them?
>
> Hmmm, good point. And I was talking about simply missing one tick.
>
> Indeed, that really wouldn't explain how we end up with an RCU stall
> dump listing almost all of the cpus as having missed a grace period.

I have seen stranger things, but admittedly not often.

							Thanx, Paul
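
To put a rough number on "a -lot- of ticks": the sketch below assumes a CPU that would otherwise take one scheduling-clock tick every 1/HZ seconds for the whole 21-second window quoted above. The HZ values are illustrative assumptions and are not taken from any .config in this thread; the function name ticks_in_window is likewise made up for the example.

```python
STALL_WINDOW_S = 21  # seconds, the default figure quoted in the message above

def ticks_in_window(hz: int, seconds: int = STALL_WINDOW_S) -> int:
    """Ticks a CPU would take in the window if it ticked hz times per second."""
    return hz * seconds

# Assumed tick rates, for illustration only.
for hz in (100, 250, 1000):
    print(hz, ticks_in_window(hz))
```

Under these assumptions, a single lost tick is a very small fraction of the window, which is the point of the objection above: one missed tick would not by itself stretch a grace period past the stall-warning threshold.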