public inbox for linux-kernel@vger.kernel.org
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: kordex - <kordex@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Badness at kernel/rcutree.c:1228
Date: Wed, 9 Dec 2009 11:18:07 -0800
Message-ID: <20091209191807.GA16422@linux.vnet.ibm.com>
In-Reply-To: <8b8dd87a0912091036t3a3499c4p67be783252d80dc@mail.gmail.com>

On Wed, Dec 09, 2009 at 08:36:33PM +0200, kordex - wrote:
> I will turn debugging options on after
> http://lkml.org/lkml/2009/12/9/44 gets traced down so I can do that.

Hmmm...  You have recently run a memory test on your system, right?

							Thanx, Paul

> --Mikko Kortelainen
> 
> 2009/12/9 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > In that case, the best thing would be to drop the warning into the
> > beginnings and ends of processing for system calls used by your workload.
> > The hope would be to find it triggering at the end of a given syscall,
> > permitting you to binary search the intervening code.
> >
> > Given that I cannot reproduce this, I cannot do much more than to offer
> > random hints.
> >
> >                                                        Thanx, Paul
> >
> > On Wed, Dec 09, 2009 at 07:35:44PM +0200, kordex - wrote:
> >> It did actually show the Badness after the system had been running for
> >> a long time. This excerpt shows that the kernel init routines had
> >> fully completed, since ntpd was already running:
> >>
> >> warning: `ntpd' uses 32-bit capabilities (legacy support in use)
> >> ------------[ cut here ]------------
> >> Badness at kernel/rcutree.c:1228
> >> NIP: c004ecbc LR: c004f14c CTR: c007bd70
> >> REGS: df34dde0 TRAP: 0700   Not tainted  (2.6.32)
> >>
> >> --Mikko Kortelainen
> >>
> >> 2009/12/9 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> >> > Hmmm...  Didn't the first console output you sent me show the beenonline
> >> > WARN_ON_ONCE() triggering during late boot?  Yes, you had other failures
> >> > later, but it might be that whatever is triggering this warning is
> >> > related to those failures, right?
> >> >
> >> >                                                        Thanx, Paul
> >> >
> >> > On Wed, Dec 09, 2009 at 04:57:54PM +0200, kordex - wrote:
> >> >> Sorry, but where exactly would this "down nearer to the point in the
> >> >> boot-up sequence" be, given that I encountered the errors while the
> >> >> system was running (it had been up for days)?
> >> >>
> >> >> --Mikko Kortelainen
> >> >>
> >> >> 2009/12/9 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> >> >> > On Wed, Dec 09, 2009 at 11:15:17AM +0200, kordex - wrote:
> >> >> >> Hey,
> >> >> >>
> >> >> >> I hope it's in the right place.
> >> >> >
> >> >> > Looks fine to me.
> >> >> >
> >> >> > And the fact that you did -not- see anything in your dmesg indicates
> >> >> > that the beenonline fields are set correctly at that point, as expected.
> >> >> > You will only see a complaint if the beenonline fields have been
> >> >> > corrupted.
> >> >> >
> >> >> > Please move them down nearer to the point in the boot-up sequence where
> >> >> > you were seeing the failure.  Please note that interrupts had been on
> >> >> > for one good long time when your original kernel complained, so there
> >> >> > had been a very large number of executions with beenonline set
> >> >> > correctly.
> >> >> >
> >> >> > So it will probably be faster to start at the original failure
> >> >> > and move towards boot rather than vice versa.
> >> >> >
> >> >> >                                                        Thanx, Paul
> >> >> >
> >> >> >> --Mikko Kortelainen
> >> >> >>
> >> >> >> navi:/usr/src# diff -Naur a/init/main.c b/init/main.c
> >> >> >> --- a/init/main.c       2009-12-03 05:51:21.000000000 +0200
> >> >> >> +++ b/init/main.c       2009-12-09 03:22:15.000000000 +0200
> >> >> >> @@ -81,6 +81,9 @@
> >> >> >>  #include <asm/smp.h>
> >> >> >>  #endif
> >> >> >>
> >> >> >> +/* DEBUG STATEMENT 2009/12/08 */
> >> >> >> +#include <linux/rcutree.h>
> >> >> >> +
> >> >> >>  static int kernel_init(void *);
> >> >> >>
> >> >> >>  extern void init_IRQ(void);
> >> >> >> @@ -589,6 +592,10 @@
> >> >> >>                 local_irq_disable();
> >> >> >>         }
> >> >> >>         rcu_init();
> >> >> >> +
> >> >> >> +       /* DEBUG STATEMENT 2009/12/08 */
> >> >> >> +       WARN_ON_ONCE(rcu_check_beenonline());
> >> >> >> +
> >> >> >>         /* init some links before init_ISA_irqs() */
> >> >> >>         early_irq_init();
> >> >> >>         init_IRQ();
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> 2009/12/9 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> >> >> >> > On Wed, Dec 09, 2009 at 03:41:00AM +0200, kordex - wrote:
> >> >> >> >> Hey,
> >> >> >> >>
> >> >> >> >> I put the debug call in init/main.c after rcu_init(), but there
> >> >> >> >> is no output in dmesg, which means that it returns zero.
> >> >> >> >>
> >> >> >> >> Full dmesg: http://xnet.fi/opt/apps/lkml-2.6.32-vanilla.dmesg.rcu-init.txt
> >> >> >> >
> >> >> >> > Could you please send the patch you applied to, as you said, put the
> >> >> >> > debug function under init/main.c after rcu_init()?
> >> >> >> >
> >> >> >> >                                                        Thanx, Paul
> >> >> >> >
> >> >> >> >> --Mikko Kortelainen
> >> >> >> >>
> >> >> >> >> 2009/12/8 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> >> >> >> >> > On Tue, Dec 08, 2009 at 11:22:07AM -0800, Paul E. McKenney wrote:
> >> >> >> >> >> At this point, I must defer to those more skilled than I at diagnosing
> >> >> >> >> >> early-boot problems.
> >> >> >> >> >
> >> >> >> >> > Well, that is silly on my part -- the problem seems to appear late in
> >> >> >> >> > boot, and you had no problem capturing that portion of the boot log.
> >> >> >> >> >
> >> >> >> >> > So please see below for a patch providing a rcu_check_beenonline()
> >> >> >> >> > function that, when called after rcu_init(), returns non-zero if the
> >> >> >> >> > beenonline fields have become corrupted.  So put calls of the form:
> >> >> >> >> >
> >> >> >> >> >        WARN_ON_ONCE(rcu_check_beenonline());
> >> >> >> >> >
> >> >> >> >> > in the initialization code path preceding the problem.  Either #include
> >> >> >> >> > rcupdate.h or explicitly declare the function as appropriate.
> >> >> >> >> >
> >> >> >> >> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >> >> >> >> > ---
> >> >> >> >> >
> >> >> >> >> > diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> >> >> >> >> > index 9642c6b..190a687 100644
> >> >> >> >> > --- a/include/linux/rcutree.h
> >> >> >> >> > +++ b/include/linux/rcutree.h
> >> >> >> >> > @@ -39,6 +39,8 @@ extern int rcu_cpu_notify(struct notifier_block *self,
> >> >> >> >> >  extern int rcu_needs_cpu(int cpu);
> >> >> >> >> >  extern int rcu_expedited_torture_stats(char *page);
> >> >> >> >> >
> >> >> >> >> > +extern int rcu_check_beenonline(void);
> >> >> >> >> > +
> >> >> >> >> >  #ifdef CONFIG_TREE_PREEMPT_RCU
> >> >> >> >> >
> >> >> >> >> >  extern void __rcu_read_lock(void);
> >> >> >> >> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> >> >> >> >> > index 207125b..27d3722 100644
> >> >> >> >> > --- a/kernel/rcutree.c
> >> >> >> >> > +++ b/kernel/rcutree.c
> >> >> >> >> > @@ -77,6 +77,17 @@ DEFINE_PER_CPU(struct rcu_data, rcu_sched_data);
> >> >> >> >> >  struct rcu_state rcu_bh_state = RCU_STATE_INITIALIZER(rcu_bh_state);
> >> >> >> >> >  DEFINE_PER_CPU(struct rcu_data, rcu_bh_data);
> >> >> >> >> >
> >> >> >> >> > +/*
> >> >> >> >> > + * Ad-hoc diagnostic function, for use only after rcu_init() has
> >> >> >> >> > + * returned.  Assumes that the boot CPU is CPU 0.  Assumes that
> >> >> >> >> > + * the kernel has been built with CONFIG_TREE_RCU.  Not for inclusion.
> >> >> >> >> > + * Usage: "WARN_ON_ONCE(rcu_check_beenonline());"
> >> >> >> >> > + */
> >> >> >> >> > +int rcu_check_beenonline(void)
> >> >> >> >> > +{
> >> >> >> >> > +       return !per_cpu(rcu_sched_data, 0).beenonline ||
> >> >> >> >> > +              !per_cpu(rcu_bh_data, 0).beenonline;
> >> >> >> >> > +}
> >> >> >> >> >
> >> >> >> >> >  /*
> >> >> >> >> >  * Return true if an RCU grace period is in progress.  The ACCESS_ONCE()s
> >> >> >> >> >
> >> >> >> >
> >> >> >
> >> >
> >
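
For readers following the thread, here is a minimal userspace sketch of what the quoted diagnostic does. It is not kernel code: struct rcu_data_model, the two *_cpu0 variables, check_beenonline_model(), WARN_ON_ONCE_MODEL() and checkpoint() are hypothetical stand-ins invented for illustration. Only the boolean logic is taken from rcu_check_beenonline() in the quoted patch, and the macro is a simplified approximation of warn-once behavior (unlike the kernel's WARN_ON_ONCE(), it yields no value and prints no backtrace).

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the kernel's per-CPU struct rcu_data.
 * Only the one field that the diagnostic reads is modeled here.
 */
struct rcu_data_model {
	int beenonline;	/* nonzero once the CPU has been online */
};

/* Stand-ins for per_cpu(rcu_sched_data, 0) and per_cpu(rcu_bh_data, 0). */
static struct rcu_data_model sched_data_cpu0 = { 1 };
static struct rcu_data_model bh_data_cpu0 = { 1 };

/*
 * Same boolean logic as rcu_check_beenonline() in the quoted patch:
 * returns nonzero if either flag has been cleared.
 */
static int check_beenonline_model(void)
{
	return !sched_data_cpu0.beenonline || !bh_data_cpu0.beenonline;
}

/* Count of warnings printed, so that callers can observe the effect. */
static int warnings_printed;

/*
 * Simplified warn-once macro.  Each expansion gets its own static flag,
 * so a given call site reports at most once, however often it runs.
 * The condition is still evaluated on every pass.
 */
#define WARN_ON_ONCE_MODEL(cond)					\
	do {								\
		static int warned;					\
		if ((cond) && !warned) {				\
			warned = 1;					\
			warnings_printed++;				\
			fprintf(stderr, "warning at %s:%d\n",		\
				__FILE__, __LINE__);			\
		}							\
	} while (0)

/* One call site that may be executed many times. */
static void checkpoint(void)
{
	WARN_ON_ONCE_MODEL(check_beenonline_model());
}
```

Calling checkpoint() while both flags are set prints nothing, which matches the empty dmesg reported for the call placed right after rcu_init(). Clearing either flag and calling checkpoint() repeatedly produces exactly one warning from that call site.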

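The suggestion in the thread to binary search the intervening code can be sketched as an ordinary bisection over an ordered list of checkpoints. This is an illustration under stated assumptions, not something from the thread: first_failing_checkpoint(), fails_at_fn and fails_from() are hypothetical names, and the sketch assumes that once the beenonline fields are corrupted they stay corrupted, so every checkpoint after the first failing one also fails. On a real kernel each probe would mean moving the WARN_ON_ONCE() call, rebuilding, and rerunning the workload.

```c
#include <stddef.h>

/* Hypothetical probe: nonzero if the check already fails at this checkpoint. */
typedef int (*fails_at_fn)(size_t checkpoint, void *ctx);

/*
 * Return the index of the first failing checkpoint in [0, n), or n if
 * no checkpoint fails.  Assumes failures are monotonic: if checkpoint i
 * fails, every checkpoint after i fails as well.
 */
static size_t first_failing_checkpoint(size_t n, fails_at_fn fails_at,
				       void *ctx)
{
	size_t lo = 0;	/* every checkpoint before lo is known to pass */
	size_t hi = n;	/* every checkpoint from hi onward is known to fail */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (fails_at(mid, ctx))
			hi = mid;	/* corruption happened at or before mid */
		else
			lo = mid + 1;	/* corruption happened after mid */
	}
	return lo;
}

/* Demonstration probe: corruption first appears at index *(size_t *)ctx. */
static int fails_from(size_t checkpoint, void *ctx)
{
	return checkpoint >= *(size_t *)ctx;
}
```

With n checkpoints this needs about log2(n) probes, which is why bisecting between a known-good point (just after rcu_init()) and the known-bad point (the original warning) is cheaper than moving the check one step at a time.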