From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steffen Persvold <sp@numascale.com>
Cc: Daniel J Blueman <daniel@numascale-asia.com>,
Dipankar Sarma <dipankar@in.ibm.com>,
linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: RCU qsmask !=0 warnings on large-SMP...
Date: Thu, 26 Jan 2012 11:26:53 -0800 [thread overview]
Message-ID: <20120126192653.GC2437@linux.vnet.ibm.com> (raw)
In-Reply-To: <4F216B85.7010208@numascale.com>
On Thu, Jan 26, 2012 at 04:04:37PM +0100, Steffen Persvold wrote:
> On 1/26/2012 02:58, Paul E. McKenney wrote:
> >On Wed, Jan 25, 2012 at 11:48:58PM +0100, Steffen Persvold wrote:
> []
> >
> >This looks like it will produce useful information, but I am not seeing
> >output from it below.
> >
> > Thanx, Paul
> >
> >>This run it was CPU24 that triggered the issue :
> >>
>
> This line is the printout for the root level :
>
> >>[ 231.572688] CPU 24, treason uncloaked, rsp @ ffffffff81a1cd80 (rcu_sched), rnp @ ffffffff81a1cd80(r) qsmask=0x1f, c=5132 g=5132 nc=5132 ng=5133 sc=5132 sg=5133 mc=5132 mg=5133
OK, so the rcu_state structure (sc and sg) believes that grace period
5133 has started but not completed, as expected. Strangely enough, so
does the root rcu_node structure (nc and ng) and the CPU's leaf rcu_node
structure (mc and mg).
The per-CPU rcu_data structure (c and g) does not yet know about the
new 5133 grace period, as expected.
So this is the code in kernel/rcutree.c:rcu_start_gp() that does the
initialization:
rcu_for_each_node_breadth_first(rsp, rnp) {
raw_spin_lock(&rnp->lock); /* irqs already disabled. */
rcu_preempt_check_blocked_tasks(rnp);
rnp->qsmask = rnp->qsmaskinit;
rnp->gpnum = rsp->gpnum;
rnp->completed = rsp->completed;
if (rnp == rdp->mynode)
rcu_start_gp_per_cpu(rsp, rnp, rdp);
rcu_preempt_boost_start_gp(rnp);
trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
rnp->level, rnp->grplo,
rnp->grphi, rnp->qsmask);
raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
}
I am assuming that your debug prints are still invoked right after
the raw_spin_lock() above. If so, I would expect nc==ng and mc==mg.
Even if your debug prints followed the assignments to rnp->gpnum and
rnp->completed, I would expect mc==mg for the root and internal rcu_node
structures. But you say below that you get the same values throughout,
and in that case, I would expect the leaf rcu_node structure to show
something different than the root and internal structures.
The code really does hold the root rcu_node lock at all calls to
rcu_gp_start(), so I don't see how we could be getting two CPUs in that
code at the same time, which would be one way that the rcu_node and
rcu_data structures might get advance notice of the new grace period,
but in that case, you would have more than one bit set in ->qsmask.
So, any luck with the trace events for rcu_grace_period and
rcu_grace_period_init?
Thanx, Paul
> (this is the WARN_ON printout) :
> >>[ 231.576167] ------------[ cut here ]------------
> >>[ 231.576167] WARNING: at kernel/rcutree_plugin.h:1011 rcu_preempt_check_blocked_tasks+0x27/0x30()
> >>[ 231.576167] Hardware name: H8QI6
> >>[ 231.576167] Modules linked in: rcutorture
> >>[ 231.576167] Pid: 4603, comm: rcu_torture_rea Not tainted 3.2.1-numaconnect10+ #77
> >>[ 231.576167] Call Trace:
> >>[ 231.576167]<IRQ> [<ffffffff810bb217>] ? rcu_preempt_check_blocked_tasks+0x27/0x30
> >>[ 231.576167] [<ffffffff8106f47b>] warn_slowpath_common+0x8b/0xc0
> >>[ 231.576167] [<ffffffff8106f4c5>] warn_slowpath_null+0x15/0x20
> >>[ 231.576167] [<ffffffff810bb217>] rcu_preempt_check_blocked_tasks+0x27/0x30
> >>[ 231.576167] [<ffffffff810bb330>] rcu_start_gp+0x110/0x1b0
> >>[ 231.576167] [<ffffffff810bbf3b>] __rcu_process_callbacks+0x8b/0xd0
> >>[ 231.576167] [<ffffffff810bc7a0>] rcu_process_callbacks+0x20/0x40
> >>[ 231.576167] [<ffffffff8107580d>] __do_softirq+0x9d/0x140
> >>[ 231.576167] [<ffffffff815d982c>] call_softirq+0x1c/0x30
> >>[ 231.576167] [<ffffffff8103451a>] do_softirq+0x4a/0x80
> >>[ 231.576167] [<ffffffff81075b83>] irq_exit+0x43/0x60
> >>[ 231.576167] [<ffffffff8104aed5>] smp_apic_timer_interrupt+0x45/0x60
> >>[ 231.576167] [<ffffffff815d834b>] apic_timer_interrupt+0x6b/0x70
> >>[ 231.576167]<EOI> [<ffffffff81067aa9>] ? finish_task_switch+0x59/0xc0
> >>[ 231.576167] [<ffffffff815d4d37>] __schedule+0x337/0x710
> >>[ 231.576167] [<ffffffff81090425>] ? sched_clock_local+0x15/0x80
> >>[ 231.576167] [<ffffffff8107b826>] ? lock_timer_base+0x36/0x70
> >>[ 231.576167] [<ffffffff8107baa2>] ? mod_timer+0xf2/0x1d0
> >>[ 231.576167] [<ffffffffa0001510>] ? rcu_torture_shuffle+0x80/0x80 [rcutorture]
> >>[ 231.576167] [<ffffffff815d53ea>] schedule+0x3a/0x60
> >>[ 231.576167] [<ffffffffa0001640>] rcu_torture_reader+0x130/0x230 [rcutorture]
> >>[ 231.576167] [<ffffffffa0001dc0>] ? rcu_torture_writer+0x160/0x160 [rcutorture]
> >>[ 231.576167] [<ffffffffa0001510>] ? rcu_torture_shuffle+0x80/0x80 [rcutorture]
> >>[ 231.576167] [<ffffffff8108a726>] kthread+0x96/0xa0
> >>[ 231.576167] [<ffffffff815d9734>] kernel_thread_helper+0x4/0x10
> >>[ 231.576167] [<ffffffff8108a690>] ? kthread_stop+0x70/0x70
> >>[ 231.576167] [<ffffffff815d9730>] ? gs_change+0xb/0xb
> >>[ 231.576167] ---[ end trace 828c8d7afbd02d1b ]---
> >>
>
> I didn't include the leaf node printout, but the counters were
> indentical to the root printout (with the exception of the rnp
> address and qsmask of course).
>
> Cheers,
> --
> Steffen Persvold, Chief Architect NumaChip
> Numascale AS - www.numascale.com
> Tel: +47 92 49 25 54 Skype: spersvold
>
next prev parent reply other threads:[~2012-01-26 19:28 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-25 9:44 RCU qsmask !=0 warnings on large-SMP Daniel J Blueman
2012-01-25 14:00 ` Paul E. McKenney
2012-01-25 14:18 ` Steffen Persvold
2012-01-25 18:14 ` Paul E. McKenney
2012-01-25 20:35 ` Steffen Persvold
2012-01-25 21:51 ` Paul E. McKenney
2012-01-25 22:51 ` Steffen Persvold
2012-01-26 1:57 ` Paul E. McKenney
2012-01-25 21:14 ` Steffen Persvold
2012-01-25 21:34 ` Paul E. McKenney
2012-01-25 22:48 ` Steffen Persvold
2012-01-26 1:58 ` Paul E. McKenney
2012-01-26 15:04 ` Steffen Persvold
2012-01-26 19:26 ` Paul E. McKenney [this message]
2012-01-27 11:09 ` Steffen Persvold
2012-01-29 6:09 ` Paul E. McKenney
2012-01-30 16:15 ` Paul E. McKenney
2012-01-31 17:33 ` Steffen Persvold
2012-01-31 17:38 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120126192653.GC2437@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=daniel@numascale-asia.com \
--cc=dipankar@in.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sp@numascale.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.