From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Paul Walmsley <paul@pwsan.com>
Cc: "Hilman, Kevin" <khilman@ti.com>,
"<snijsure@grid-net.com>" <snijsure@grid-net.com>,
fweisbec@gmail.com, "Bruce, Becky" <bbruce@ti.com>,
"<linux-kernel@vger.kernel.org>" <linux-kernel@vger.kernel.org>,
"Paul E. McKenney" <paul.mckenney@linaro.org>,
"Shilimkar, Santosh" <santosh.shilimkar@ti.com>,
"Hunter, Jon" <jon-hunter@ti.com>,
"<linux-omap@vger.kernel.org>" <linux-omap@vger.kernel.org>,
"<linux-arm-kernel@lists.infradead.org>"
<linux-arm-kernel@lists.infradead.org>
Subject: Re: rcu self-detected stall messages on OMAP3, 4 boards
Date: Sat, 22 Sep 2012 18:56:45 -0700
Message-ID: <20120923015645.GJ2934@linux.vnet.ibm.com>
In-Reply-To: <alpine.DEB.2.00.1209230136320.28806@utopia.booyaka.com>
On Sun, Sep 23, 2012 at 01:42:10AM +0000, Paul Walmsley wrote:
> Hi Paul
>
> On Sat, 22 Sep 2012, Paul Walmsley wrote:
>
> > On Sat, 22 Sep 2012, Paul E. McKenney wrote:
> >
> > > And here is a patch. I am still having trouble reproducing the problem,
> > > but figured that I should avoid serializing things.
> >
> > Thanks, testing this now on v3.6-rc6.
>
> Looks like you solved it!
>
> Tested v3.6-rc6 + your stall diagnostic patch:
>
> http://marc.info/?l=linux-arm-kernel&m=134827237215882&w=2
>
> on OMAP4430ES2 Pandaboard using omap2plus_defconfig and
> CONFIG_RCU_CPU_STALL_INFO=y; got the stall warnings.
>
> Then added "rcu: Fix day-one dyntick-idle stall-warning bug" from:
>
> http://marc.info/?l=linux-arm-kernel&m=134835120600590&w=2
>
> Booted that, and the stall warnings did not appear within 30 minutes.

Very cool, thank you for your testing efforts!!!

May I apply your Tested-by to this patch?

And good show on the debugging patch -- it is quite good to have such
solid evidence that the bug that the fix was intended for was actually
occurring.

							Thanx, Paul

> To confirm that the problem being solved matched your hypothesis, the
> debugging patch below[1] was added to the RCU idle entry/exit code.
>
> Without the bugfix patch, a boot log transcript was obtained
> indicating that the idle loop was entered with tick_nohz_enabled=1
> during a grace period with no callbacks present:
>
> http://www.pwsan.com/omap/transcripts/20120922-rcu-stall-debug-pre-fix.txt
>
> The debugging events started to appear at 1.867370 seconds into the
> boot. ENTER was pressed about 464 seconds in; this triggered the
> rcu_sched stall traceback.
>
> With the bugfix patch, a boot log transcript was obtained that
> indicated that the condition under test never occurred after waiting
> about 20 minutes:
>
> http://www.pwsan.com/omap/transcripts/20120922-rcu-stall-debug-post-fix.txt
>
> Thanks for being so willing to root-cause the issue, Paul; it's
> appreciated, and it's been quite instructive as well. Will address some
> remaining loose ends in follow-up E-mails.
>
>
> - Paul
>
>
> [1] Debugging patch to printk() if the previous idle loop entry occurred
> with tick_nohz_enabled=1 during a grace period with no RCU callbacks
> present:
>
>
> ---
> kernel/rcutree.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index f1eb7ad..f42941b 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -60,6 +60,9 @@
>  
>  /* Data structures. */
>  
> +extern int tick_nohz_enabled;
> +static int no_cbs_idle_entry_count;
> +
>  static struct lock_class_key rcu_node_class[RCU_NUM_LVLS];
>  
>  #define RCU_STATE_INITIALIZER(sname, cr) { \
> @@ -400,8 +403,12 @@ void rcu_idle_enter(void)
>  	unsigned long flags;
>  	long long oldval;
>  	struct rcu_dynticks *rdtp;
> +	int cpu;
> +	long totqlen = 0;
> +	struct rcu_data *rdp;
>  
>  	local_irq_save(flags);
> +	rdp = &__get_cpu_var(rcu_sched_data);
>  	rdtp = &__get_cpu_var(rcu_dynticks);
>  	oldval = rdtp->dynticks_nesting;
>  	WARN_ON_ONCE((oldval & DYNTICK_TASK_NEST_MASK) == 0);
> @@ -410,6 +417,12 @@ void rcu_idle_enter(void)
>  	else
>  		rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
>  	rcu_idle_enter_common(rdtp, oldval);
> +	if (tick_nohz_enabled && rcu_gp_in_progress(rdp->rsp)) {
> +		for_each_possible_cpu(cpu)
> +			totqlen += per_cpu_ptr(rdp->rsp->rda, cpu)->qlen;
> +		if (totqlen == 0)
> +			no_cbs_idle_entry_count = 1;
> +	}
>  	local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL_GPL(rcu_idle_enter);
> @@ -503,6 +516,10 @@ void rcu_idle_exit(void)
>  	rdtp->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
>  	rcu_idle_exit_common(rdtp, oldval);
>  	local_irq_restore(flags);
> +	if (no_cbs_idle_entry_count) {
> +		no_cbs_idle_entry_count = 0;
> +		pr_err("* Tickless idle was entered with zero RCU callbacks\n");
> +	}
>  }
>  EXPORT_SYMBOL_GPL(rcu_idle_exit);
>
> --
> 1.7.10.4
>
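The debugging patch in [1] follows a record-then-report structure. rcu_idle_enter() only sets a flag when three things hold: tickless idle is enabled, a grace period is in progress, and the callback queues of all CPUs sum to zero. The pr_err() call sits in rcu_idle_exit() after local_irq_restore(), presumably to keep printk out of the idle-entry path.

What follows is a minimal user-space sketch of that logic, not kernel code. NR_CPUS_SKETCH, the qlen[] array, the boolean stand-ins for tick_nohz_enabled and rcu_gp_in_progress(), and the *_sketch functions are all hypothetical, chosen only for illustration. Like [1], it uses a single global flag shared by all CPUs rather than a per-CPU one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; names and sizes are illustrative only. */
#define NR_CPUS_SKETCH 4

static long qlen[NR_CPUS_SKETCH];   /* simulated per-CPU callback queue lengths */
static bool tick_nohz_on = true;    /* models tick_nohz_enabled */
static bool gp_in_progress;         /* models rcu_gp_in_progress() */
static bool no_cbs_idle_entry;      /* set at idle entry, consumed at idle exit */

/* Set the simulated callback count for one CPU. */
static void set_qlen(int cpu, long n)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	qlen[cpu] = n;
}

/* Sum callbacks over all CPUs, like the for_each_possible_cpu() loop in [1]. */
static long total_callbacks(void)
{
	long total = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		total += qlen[cpu];
	return total;
}

/* Idle entry: only record the suspicious condition, never print here. */
static void idle_enter_sketch(void)
{
	if (tick_nohz_on && gp_in_progress && total_callbacks() == 0)
		no_cbs_idle_entry = true;
}

/* Idle exit: report once and clear the flag; returns true if it reported. */
static bool idle_exit_sketch(void)
{
	if (!no_cbs_idle_entry)
		return false;
	no_cbs_idle_entry = false;
	fprintf(stderr, "* Tickless idle was entered with zero RCU callbacks\n");
	return true;
}
```

Suppose gp_in_progress is set, every qlen[] entry is zero, and idle_enter_sketch() runs. The next idle_exit_sketch() call then prints the message and returns true. A second call returns false because the flag has been cleared. If any CPU has a nonzero queue length, no report is recorded.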