From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
To: linux-arm-kernel@lists.infradead.org
Subject: rcu self-detected stall messages on OMAP3, 4 boards
Date: Sat, 22 Sep 2012 18:56:45 -0700 [thread overview]
Message-ID: <20120923015645.GJ2934@linux.vnet.ibm.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1209230136320.28806@utopia.booyaka.com>
On Sun, Sep 23, 2012 at 01:42:10AM +0000, Paul Walmsley wrote:
> Hi Paul
>
> On Sat, 22 Sep 2012, Paul Walmsley wrote:
>
> > On Sat, 22 Sep 2012, Paul E. McKenney wrote:
> >
> > > And here is a patch. I am still having trouble reproducing the problem,
> > > but figured that I should avoid serializing things.
> >
> > Thanks, testing this now on v3.6-rc6.
>
> Looks like you solved it!
>
> Tested v3.6-rc6 + your stall diagnostic patch:
>
> http://marc.info/?l=linux-arm-kernel&m=134827237215882&w=2
>
> on OMAP4430ES2 Pandaboard using omap2plus_defconfig and
> CONFIG_RCU_CPU_STALL_INFO=y; got the stall warnings.
>
> Then added "rcu: Fix day-one dyntick-idle stall-warning bug" from:
>
> http://marc.info/?l=linux-arm-kernel&m=134835120600590&w=2
>
> Booted that, and the stall warnings did not appear within 30 minutes.
Very cool, thank you for your testing efforts!!!
May I apply your Tested-by to this patch?
And good show on the debugging patch -- it is quite good to have such
solid evidence that the bug that the fix was intended for was actually
occurring.
Thanx, Paul
> To confirm that the problem being solved matched your hypothesis, the
> debugging patch below[1] was added to the RCU idle entry/exit code.
>
> Without the bugfix patch, a boot log transcript was obtained
> indicating that the idle loop was entered with tick_nohz_enabled=1
> during a grace period with no callbacks present:
>
> http://www.pwsan.com/omap/transcripts/20120922-rcu-stall-debug-pre-fix.txt
>
> The debugging events started to appear at 1.867370 seconds into the
> boot. ENTER was pressed about 464 seconds in; this triggered the
> rcu_sched stall traceback.
>
> With the bugfix patch, a boot log transcript was obtained that
> indicated that the condition under test never occurred after waiting
> about 20 minutes:
>
> http://www.pwsan.com/omap/transcripts/20120922-rcu-stall-debug-post-fix.txt
>
> Thanks for being so willing to root-cause the issue, Paul; it's
> appreciated, and it's been quite instructive as well. Will address some
> remaining loose ends in follow-up E-mails.
>
>
> - Paul
>
>
> [1] Debugging patch to printk() if the previous idle loop entry occurred
> with tick_nohz_enabled=1 during a grace period with no RCU callbacks
> present:
>
>
> ---
> kernel/rcutree.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index f1eb7ad..f42941b 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -60,6 +60,9 @@
>
> /* Data structures. */
>
> +extern int tick_nohz_enabled;
> +static int no_cbs_idle_entry_count;
> +
> static struct lock_class_key rcu_node_class[RCU_NUM_LVLS];
>
> #define RCU_STATE_INITIALIZER(sname, cr) { \
> @@ -400,8 +403,12 @@ void rcu_idle_enter(void)
> unsigned long flags;
> long long oldval;
> struct rcu_dynticks *rdtp;
> + int cpu;
> + long totqlen = 0;
> + struct rcu_data *rdp;
>
> local_irq_save(flags);
> + rdp = &__get_cpu_var(rcu_sched_data);
> rdtp = &__get_cpu_var(rcu_dynticks);
> oldval = rdtp->dynticks_nesting;
> WARN_ON_ONCE((oldval & DYNTICK_TASK_NEST_MASK) == 0);
> @@ -410,6 +417,12 @@ void rcu_idle_enter(void)
> else
> rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
> rcu_idle_enter_common(rdtp, oldval);
> + if (tick_nohz_enabled && rcu_gp_in_progress(rdp->rsp)) {
> + for_each_possible_cpu(cpu)
> + totqlen += per_cpu_ptr(rdp->rsp->rda, cpu)->qlen;
> + if (totqlen == 0)
> + no_cbs_idle_entry_count = 1;
> + }
> local_irq_restore(flags);
> }
> EXPORT_SYMBOL_GPL(rcu_idle_enter);
> @@ -503,6 +516,10 @@ void rcu_idle_exit(void)
> rdtp->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
> rcu_idle_exit_common(rdtp, oldval);
> local_irq_restore(flags);
> + if (no_cbs_idle_entry_count) {
> + no_cbs_idle_entry_count = 0;
> + pr_err("* Tickless idle was entered with zero RCU callbacks\n");
> + }
> }
> EXPORT_SYMBOL_GPL(rcu_idle_exit);
>
> --
> 1.7.10.4
>
next prev parent reply other threads:[~2012-09-23 1:56 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-12 22:51 rcu self-detected stall messages on OMAP3, 4 boards Paul Walmsley
2012-09-13 1:12 ` Paul E. McKenney
2012-09-13 18:52 ` Paul Walmsley
2012-09-20 0:03 ` Paul E. McKenney
2012-09-20 7:56 ` Paul Walmsley
2012-09-20 15:03 ` Bruce, Becky
2012-09-20 21:49 ` Bruce, Becky
2012-09-20 22:01 ` Paul E. McKenney
2012-09-20 22:47 ` Paul Walmsley
2012-09-20 23:21 ` Paul E. McKenney
2012-09-21 18:08 ` Paul Walmsley
2012-09-21 18:58 ` Paul E. McKenney
2012-09-21 19:11 ` Paul Walmsley
2012-09-21 19:57 ` Paul E. McKenney
2012-09-21 20:31 ` Tony Lindgren
2012-09-21 22:03 ` Paul E. McKenney
2012-09-22 15:45 ` Frederic Weisbecker
2012-09-22 16:00 ` Paul E. McKenney
2012-09-21 22:12 ` Paul E. McKenney
2012-09-22 18:42 ` Paul Walmsley
2012-09-22 20:10 ` Paul E. McKenney
2012-09-22 21:59 ` Paul E. McKenney
2012-09-22 22:25 ` Paul Walmsley
2012-09-22 23:11 ` Paul E. McKenney
2012-09-23 7:55 ` Paul Walmsley
2012-09-23 12:11 ` Paul E. McKenney
2012-09-23 1:42 ` Paul Walmsley
2012-09-23 1:56 ` Paul E. McKenney [this message]
2012-09-23 2:01 ` Paul Walmsley
2012-09-24 9:41 ` Shilimkar, Santosh
2012-09-24 13:18 ` Paul E. McKenney
2012-10-01 8:55 ` Linus Walleij
2012-10-01 13:28 ` Paul E. McKenney
2012-09-21 18:59 ` Paul Walmsley
2012-09-21 17:47 ` Paul Walmsley
2012-09-21 17:51 ` Paul Walmsley
2012-09-21 21:20 ` Paul E. McKenney
2012-09-21 22:41 ` Paul Walmsley
2012-09-22 0:05 ` Paul E. McKenney
2012-09-22 18:16 ` Paul Walmsley
2012-09-22 19:52 ` Paul E. McKenney
2012-09-22 22:20 ` Paul Walmsley
2012-09-22 23:17 ` Paul E. McKenney
2012-09-24 21:54 ` Paul Walmsley
2012-09-24 22:00 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120923015645.GJ2934@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).