From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: "Shilimkar, Santosh" <santosh.shilimkar@ti.com>
Cc: Paul Walmsley <paul@pwsan.com>, "Bruce, Becky" <bbruce@ti.com>,
"Paul E. McKenney" <paul.mckenney@linaro.org>,
"<linux-kernel@vger.kernel.org>" <linux-kernel@vger.kernel.org>,
"<linux-omap@vger.kernel.org>" <linux-omap@vger.kernel.org>,
"<linux-arm-kernel@lists.infradead.org>"
<linux-arm-kernel@lists.infradead.org>,
"Hilman, Kevin" <khilman@ti.com>,
"Hunter, Jon" <jon-hunter@ti.com>,
"<snijsure@grid-net.com>" <snijsure@grid-net.com>,
fweisbec@gmail.com
Subject: Re: rcu self-detected stall messages on OMAP3, 4 boards
Date: Mon, 24 Sep 2012 06:18:12 -0700
Message-ID: <20120924131806.GA2477@linux.vnet.ibm.com>
In-Reply-To: <CAMQu2gwZE+oXg6YfCj_Ua0PQxAk8ADjCga193UKMDTBTgoo4fw@mail.gmail.com>

On Mon, Sep 24, 2012 at 03:11:34PM +0530, Shilimkar, Santosh wrote:
> On Sun, Sep 23, 2012 at 3:29 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Sat, Sep 22, 2012 at 01:10:43PM -0700, Paul E. McKenney wrote:
> >> On Sat, Sep 22, 2012 at 06:42:08PM +0000, Paul Walmsley wrote:
> >> > On Fri, 21 Sep 2012, Paul E. McKenney wrote:
>
> [...]
>
> >
> > And here is a patch. I am still having trouble reproducing the problem,
> > but figured that I should avoid serializing things.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > b/kernel/rcutree.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > rcu: Fix day-one dyntick-idle stall-warning bug
> >
> > Each grace period is supposed to have at least one callback waiting
> > for that grace period to complete. However, if CONFIG_NO_HZ=n, an
> > extra callback-free grace period is no big problem -- it will chew up
> > a tiny bit of CPU time, but it will complete normally. In contrast,
> > CONFIG_NO_HZ=y kernels have the potential for all the CPUs to go to
> > sleep indefinitely, in turn indefinitely delaying completion of the
> > callback-free grace period. Given that nothing is waiting on this grace
> > period, this is also not a problem.
> >
> > Unless RCU CPU stall warnings are also enabled, as they are in recent
> > kernels. In this case, if a CPU wakes up after at least one minute
> > of inactivity, an RCU CPU stall warning will result. The reason that
> > no one noticed until quite recently is that most systems have enough
> > OS noise that they will never remain absolutely idle for a full minute.
> > But there are some embedded systems with cut-down userspace configurations
> > that get into this mode quite easily.
> >
> > All this raises the question of exactly how a callback-free grace period
> > gets started in the first place. This can happen because CPUs do not
> > necessarily agree on which grace period is in progress.
> > If a CPU still believes that the grace period that just completed is
> > still ongoing, it will believe that it has callbacks that need to wait
> > for another grace period, never mind the fact that the grace period
> > that they were waiting for just completed. This CPU can therefore
> > erroneously decide to start a new grace period.
> >
> > Once this CPU notices that the earlier grace period completed, it will
> > invoke its callbacks. It then won't have any callbacks left. If no
> > other CPU has any callbacks, we now have a callback-free grace period.
> >
> > This commit therefore makes CPUs check more carefully before starting a
> > new grace period. This new check relies on an array of tail pointers
> > into each CPU's list of callbacks. If the CPU is up to date on which
> > grace periods have completed, it checks to see if any callbacks follow
> > the RCU_DONE_TAIL segment, otherwise it checks to see if any callbacks
> > follow the RCU_WAIT_TAIL segment. The reason that this works is that
> > the RCU_WAIT_TAIL segment will be promoted to the RCU_DONE_TAIL segment
> > as soon as the CPU figures out that the old grace period has ended.
> >
> > This change is to cpu_needs_another_gp(), which is called in a number
> > of places. The only one that really matters is in rcu_start_gp(), where
> > the root rcu_node structure's ->lock is held, which prevents any
> > other CPU from starting or completing a grace period, so that the
> > comparison that determines whether the CPU is missing the completion
> > of a grace period is stable.
> >
> > Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >
> As already confirmed by Paul W and others, I too no longer see the RCU
> dumps with the above patch. Thanks a lot for the fix.

Glad it finally works!

Thanx, Paul
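
The tail-pointer check described in the commit message above can be modeled outside the kernel. The following is a minimal user-space sketch, not the actual patch to kernel/rcutree.c: the segment names follow the commit message, but struct cb, struct cpu_cbs, and every function name here are hypothetical, and the real cpu_needs_another_gp() also has to consider whether a grace period is already in progress. The point it illustrates is that a CPU that has not yet noticed the end of the old grace period should look past RCU_WAIT_TAIL rather than RCU_DONE_TAIL, because the callbacks in the wait segment are about to become "done".

```c
#include <stdbool.h>
#include <stddef.h>

/* Segment indices, named after those in the commit message. */
enum {
	RCU_DONE_TAIL,		/* callbacks whose grace period has ended */
	RCU_WAIT_TAIL,		/* callbacks waiting for the current grace period */
	RCU_NEXT_READY_TAIL,	/* callbacks waiting for the next grace period */
	RCU_NEXT_TAIL,		/* callbacks not yet assigned a grace period */
	RCU_NEXT_SIZE
};

/* Hypothetical stand-in for a queued callback. */
struct cb {
	struct cb *next;
};

/* Hypothetical per-CPU state: one list, segmented by tail pointers. */
struct cpu_cbs {
	struct cb *head;
	struct cb **tail[RCU_NEXT_SIZE]; /* tail[i]: ->next slot ending segment i */
	unsigned long completed;	 /* last grace period this CPU saw complete */
};

static void cbs_init(struct cpu_cbs *c)
{
	c->head = NULL;
	for (int i = 0; i < RCU_NEXT_SIZE; i++)
		c->tail[i] = &c->head;
	c->completed = 0;
}

/* Append a callback to the end of the list (the RCU_NEXT_TAIL segment). */
static void cbs_enqueue(struct cpu_cbs *c, struct cb *cb)
{
	cb->next = NULL;
	*c->tail[RCU_NEXT_TAIL] = cb;
	c->tail[RCU_NEXT_TAIL] = &cb->next;
}

/* Mark everything queued so far as waiting for the current grace period. */
static void cbs_wait_for_current_gp(struct cpu_cbs *c)
{
	c->tail[RCU_WAIT_TAIL] = c->tail[RCU_NEXT_TAIL];
	c->tail[RCU_NEXT_READY_TAIL] = c->tail[RCU_NEXT_TAIL];
}

/* Old-style check: anything after the done segment counts. */
static bool needs_another_gp_old(const struct cpu_cbs *c)
{
	return *c->tail[RCU_DONE_TAIL] != NULL;
}

/*
 * New-style check: if this CPU has not yet noticed that a grace period
 * completed, skip the wait segment too, since those callbacks are
 * already satisfied by the grace period that just ended.
 */
static bool needs_another_gp_new(const struct cpu_cbs *c,
				 unsigned long global_completed)
{
	int seg = RCU_DONE_TAIL + (global_completed != c->completed);

	return *c->tail[seg] != NULL;
}
```

With one callback in the wait segment and a global completed count that has moved ahead of the CPU's own, needs_another_gp_old() reports that a new grace period is needed while needs_another_gp_new() does not. Once a further callback is enqueued behind the wait segment, the new check reports true as well.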