From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: David Miller <davem@davemloft.net>
Cc: linux-kernel@vger.kernel.org, sparclinux@vger.kernel.org
Subject: Re: RCU stall warnings...
Date: Mon, 24 Jul 2017 23:49:27 +0000
Message-ID: <20170724234927.GK3730@linux.vnet.ibm.com>
In-Reply-To: <20170724.163458.1860624258080140141.davem@davemloft.net>
On Mon, Jul 24, 2017 at 04:34:58PM -0700, David Miller wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Date: Mon, 24 Jul 2017 16:20:33 -0700
>
> > It looks like the system isn't letting the rcu_sched grace-period kthread
> > run:
> >
> > [402138.240512] rcu_sched kthread starved for 2757 jiffies! g53669 c53668 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
> >
> > This kthread tried to wait for a few jiffies (the exact number depends
> > on HZ and the number of CPUs), but 2,757 jiffies have elapsed and it is
> > still waiting. This kthread is responsible for detecting idle CPUs and
> > reporting quiescent states on their behalf, so if this kthread doesn't
> > get a chance to run, then the stall warnings you are seeing are expected
> > behavior.
> >
> > I am seeing something sort of like this in my rcutorture runs,
> > but only when I boot with nr_cpus quite a bit bigger than maxcpus, as in
> > something like nr_cpus=43 and maxcpus=8. This causes 8 CPUs to be brought
> > online at the usual time, and the other 35 come online some time later.
> > One difference from your situation is that I see the grace-period
> > kthread in ->state=0x401 (TASK_WAKING) instead of your ->state=0x1.
> > If I send extra wakeups to the grace-period kthread (which shouldn't be
> > needed), it does make progress, but then other kthreads fall into that
> > same half-woken state.
> >
> > So now that I have shared the full extent of my ignorance on this topic,
> > any ideas? ;-)
>
> Showing my ignorance as well: after reading this, for some reason the
> commit below sticks out to me. Maybe I should do a bisect and see if
> it lands on this commit.

I would be very surprised if this commit was the culprit, but then
again, I have been very surprised before.

> That would take a while as it's hard to forcibly set this thing off.

And my similar error can take a while as well. But maybe I should try
forcing nr_cpus=43 and maxcpus=8 on older versions to see what happens.
A bisection would of course be quite helpful, depending on the value
of "a while". ;-)

Thanx, Paul

> ==========
> commit f92c734f02cbf10e40569facff82059ae9b61920
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date: Mon Apr 10 15:40:35 2017 -0700
>
> rcu: Prevent rcu_barrier() from starting needless grace periods
>
> Currently rcu_barrier() uses call_rcu() to enqueue new callbacks
> on each CPU with a non-empty callback list. This works, but means
> that rcu_barrier() forces grace periods that are not otherwise needed.
> The key point is that rcu_barrier() never needs to wait for a grace
> period, but instead only for all pre-existing callbacks to be invoked.
> This means that rcu_barrier()'s new callbacks should be placed in
> the callback-list segment containing the last pre-existing callback.
>
> This commit makes this change using the new rcu_segcblist_entrain()
> function.
>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
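The key idea in the quoted commit message, that rcu_barrier() can append its callback to the segment holding the last pre-existing callback instead of queuing it behind a future grace period the way call_rcu() does, can be illustrated with a small user-space model. This is a hedged sketch, not the kernel's rcu_segcblist code: the toy_* types and functions are hypothetical, the four segments only loosely mirror a done/wait/next-ready/next layout, and locking, memory barriers, lazy-callback counts, and per-CPU handling are all omitted.

```c
/*
 * Hypothetical user-space model of a segmented RCU callback list.
 * NOT the kernel's rcu_segcblist: no locking, no memory barriers,
 * no lazy-callback accounting, and only a single "CPU".
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_DONE, TOY_WAIT, TOY_NEXT_READY, TOY_NEXT, TOY_NSEGS };

struct toy_cb {
	struct toy_cb *next;
	int id;
};

struct toy_seglist {
	struct toy_cb *head;
	/* tails[i] points at the ->next pointer that ends segment i. */
	struct toy_cb **tails[TOY_NSEGS];
	long len;
};

static void toy_init(struct toy_seglist *l)
{
	int i;

	l->head = NULL;
	l->len = 0;
	for (i = 0; i < TOY_NSEGS; i++)
		l->tails[i] = &l->head;
}

/* call_rcu()-style: append to TOY_NEXT, behind a future grace period. */
static void toy_enqueue(struct toy_seglist *l, struct toy_cb *cb)
{
	cb->next = NULL;
	*l->tails[TOY_NEXT] = cb;
	l->tails[TOY_NEXT] = &cb->next;
	l->len++;
}

/* Pretend a grace period started: all pending callbacks now wait on it. */
static void toy_accelerate(struct toy_seglist *l)
{
	l->tails[TOY_WAIT] = l->tails[TOY_NEXT];
	l->tails[TOY_NEXT_READY] = l->tails[TOY_NEXT];
}

/*
 * Barrier-style: append cb to the last non-empty segment, so it is
 * invoked right after the last pre-existing callback without needing
 * any additional grace period. Returns false if the list is empty.
 */
static bool toy_entrain(struct toy_seglist *l, struct toy_cb *cb)
{
	int i;

	if (l->len == 0)
		return false; /* nothing pre-existing to wait for */
	cb->next = NULL;
	/* Segment i is non-empty iff its tail differs from the previous one. */
	for (i = TOY_NEXT; i > TOY_DONE; i--)
		if (l->tails[i] != l->tails[i - 1])
			break;
	assert(l->tails[i] != (i > TOY_DONE ? l->tails[i - 1] : &l->head));
	/* Later segments are empty, so *tails[i] is the list's NULL end. */
	*l->tails[i] = cb;
	for (; i < TOY_NSEGS; i++)
		l->tails[i] = &cb->next;
	l->len++;
	return true;
}

/* Return the segment holding target, or -1 if it is not on the list. */
static int toy_segment_of(struct toy_seglist *l, const struct toy_cb *target)
{
	struct toy_cb **pp = &l->head;
	int seg = TOY_DONE;

	while (*pp) {
		/* Skip past any segments that end at this position. */
		while (seg < TOY_NEXT && pp == l->tails[seg])
			seg++;
		if (*pp == target)
			return seg;
		pp = &(*pp)->next;
	}
	return -1;
}
```

For example, after toy_init(), one toy_enqueue(), and toy_accelerate(), a callback passed to toy_entrain() lands in TOY_WAIT right behind the pre-existing callback, whereas a further toy_enqueue() lands in TOY_NEXT and would need another grace period. toy_entrain() refuses an empty list, in the same spirit as rcu_barrier() only posting callbacks on CPUs whose callback lists are non-empty.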
Thread overview:
2017-07-24 22:32 RCU stall warnings David Miller
2017-07-24 23:20 ` Paul E. McKenney
2017-07-24 23:34 ` David Miller
2017-07-24 23:49 ` Paul E. McKenney [this message]
2017-07-25 2:44 ` Paul E. McKenney
2017-07-25 3:45 ` Stephen Rothwell