From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steffen Persvold <sp@numascale.com>
Cc: Daniel J Blueman <daniel@numascale-asia.com>,
Dipankar Sarma <dipankar@in.ibm.com>,
linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: RCU qsmask !=0 warnings on large-SMP...
Date: Mon, 30 Jan 2012 08:15:29 -0800
Message-ID: <20120130161529.GA5118@linux.vnet.ibm.com>
In-Reply-To: <20120129060921.GC17696@linux.vnet.ibm.com>

On Sat, Jan 28, 2012 at 10:09:21PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 27, 2012 at 12:09:25PM +0100, Steffen Persvold wrote:
> > On 1/26/2012 20:26, Paul E. McKenney wrote:
> > >On Thu, Jan 26, 2012 at 04:04:37PM +0100, Steffen Persvold wrote:
> > >>On 1/26/2012 02:58, Paul E. McKenney wrote:
> > >>>On Wed, Jan 25, 2012 at 11:48:58PM +0100, Steffen Persvold wrote:
> > >>[]
> > >>>
> > >>>This looks like it will produce useful information, but I am not seeing
> > >>>output from it below.
> > >>>
> > >>> Thanx, Paul
> > >>>
> > >>>>This run it was CPU24 that triggered the issue :
> > >>>>
> > >>
> > >>This line is the printout for the root level :
> > >>
> > >>>>[ 231.572688] CPU 24, treason uncloaked, rsp @ ffffffff81a1cd80 (rcu_sched), rnp @ ffffffff81a1cd80(r) qsmask=0x1f, c=5132 g=5132 nc=5132 ng=5133 sc=5132 sg=5133 mc=5132 mg=5133
> > >
> > >OK, so the rcu_state structure (sc and sg) believes that grace period
> > >5133 has started but not completed, as expected. Strangely enough, so
> > >does the root rcu_node structure (nc and ng) and the CPU's leaf rcu_node
> > >structure (mc and mg).
> > >
> > >The per-CPU rcu_data structure (c and g) does not yet know about the
> > >new 5133 grace period, as expected.
> > >
> > >So this is the code in kernel/rcutree.c:rcu_start_gp() that does the
> > >initialization:
> > >
> > >	rcu_for_each_node_breadth_first(rsp, rnp) {
> > >		raw_spin_lock(&rnp->lock);	/* irqs already disabled. */
> > >		rcu_preempt_check_blocked_tasks(rnp);
> > >		rnp->qsmask = rnp->qsmaskinit;
> > >		rnp->gpnum = rsp->gpnum;
> > >		rnp->completed = rsp->completed;
> > >		if (rnp == rdp->mynode)
> > >			rcu_start_gp_per_cpu(rsp, rnp, rdp);
> > >		rcu_preempt_boost_start_gp(rnp);
> > >		trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
> > >					    rnp->level, rnp->grplo,
> > >					    rnp->grphi, rnp->qsmask);
> > >		raw_spin_unlock(&rnp->lock);	/* irqs remain disabled. */
> > >	}
> > >
> > >I am assuming that your debug prints are still invoked right after
> > >the raw_spin_lock() above. If so, I would expect nc==ng and mc==mg.
> > >Even if your debug prints followed the assignments to rnp->gpnum and
> > >rnp->completed, I would expect mc==mg for the root and internal rcu_node
> > >structures. But you say below that you get the same values throughout,
> > >and in that case, I would expect the leaf rcu_node structure to show
> > >something different than the root and internal structures.
> > >
> > >The code really does hold the root rcu_node lock at all calls to
> > >rcu_start_gp(), so I don't see how we could be getting two CPUs in that
> > >code at the same time, which would be one way that the rcu_node and
> > >rcu_data structures might get advance notice of the new grace period,
> > >but in that case, you would have more than one bit set in ->qsmask.
> > >
> > >So, any luck with the trace events for rcu_grace_period and
> > >rcu_grace_period_init?
> > >
> >
> > I've successfully enabled them and it seems to work, however once
> > the issue is triggered any attempt to access
> > /sys/kernel/debug/tracing/trace just hangs :/
>
> Hmmm... I wonder if it waits for a grace period?
>
> If it cannot be made to work, I can probably put together some
> alternative diagnostics, but it will take me a day or three.

Actually, another thing to try is "torture_type=rcu_bh" on the modprobe
line for rcutorture. Also, it would be good to get a stack dump of the
hung process -- it might be hung for some other reason.
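As a rough sketch of both suggestions, something like the following might do. The function names are made up for illustration and are not from this thread; the commands inside them are the standard ones. This assumes rcutorture is built as a module, that you are root, and (for the per-process stack file) that the kernel has CONFIG_STACKTRACE enabled.

```shell
#!/bin/sh
# Hypothetical helper names; only the commands inside them are standard.

# Reload rcutorture so that it exercises the rcu_bh flavor.
torture_rcu_bh() {
    modprobe -r rcutorture 2>/dev/null   # ignore "module not loaded"
    modprobe rcutorture torture_type=rcu_bh
}

# Print the kernel stack of one process; $1 is the PID of the hung reader.
hung_stack() {
    cat "/proc/$1/stack"
}

# Ask the kernel to log every blocked (uninterruptible) task,
# then show the tail of the kernel log.
blocked_tasks() {
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 100
}
```

After sourcing this, "hung_stack <pid>" with the PID of the process stuck reading /sys/kernel/debug/tracing/trace should show where it is sleeping. If that file is not available, blocked_tasks gives a coarser view of everything sitting in uninterruptible sleep.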
Thanx, Paul