From: Steffen Persvold <sp@numascale.com>
To: paulmck@linux.vnet.ibm.com
Cc: Daniel J Blueman <daniel@numascale-asia.com>,
Dipankar Sarma <dipankar@in.ibm.com>,
linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: RCU qsmask !=0 warnings on large-SMP...
Date: Wed, 25 Jan 2012 15:18:53 +0100
Message-ID: <4F200F4D.5000201@numascale.com>
In-Reply-To: <20120125140029.GA2534@linux.vnet.ibm.com>
On 1/25/2012 15:00, Paul E. McKenney wrote:
> On Wed, Jan 25, 2012 at 05:44:34PM +0800, Daniel J Blueman wrote:
[]
>
> I do test three-level trees regularly, but on relatively small systems
> by limiting CONFIG_RCU_FANOUT. If you are running 64-bit systems
> with default values of CONFIG_RCU_FANOUT, a 1024-CPU system fits into a
> two-level tree, so I would guess that you either have more than 1024 CPUs
> (at least as far as NR_CPUS is concerned) or gaps in your CPU-numbering
> scheme.
NR_CPUS is set to 4096 (i.e. CONFIG_MAXSMP=y) and CONFIG_RCU_FANOUT is 64. We see 3 levels of RCU nodes with this configuration.
>
>> Has this been encountered previously?
>
> I haven't seen that warning since I was originally developing
> TREE_PREEMPT_RCU some years back.
>
> Numascale NumaConnect is cache-coherent, right?
Correct.
[]
>>
>> WARNING: at kernel/rcutree_plugin.h:990
>
> OK, if I have the right version, RCU-sched is trying to do a new grace
> period, but finds that one of the rcu_node structure's ->qsmask fields
> indicates that a CPU (or group of CPUs) failed to check into the previous
> grace period.
>
[]
>>
>> CPU 48, treason uncloaked, rsp @ ffffffff81a1d000 (rcu_sched),
>> rsp->gpnum is 4058, rsp->completed is 4057, rnp @ ffffffff81a1d000,
>> qsmask is 0x1
>> 0 ffff8803f840d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=0 ri=279
>> ql=1 qs=..W. b=10 ci=145949 co=0 ca=0
>> 12 ffff880bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=7 ri=10
>> ql=2 qs=.R.. b=10 ci=561 co=0 ca=0
>> 24 ffff8813d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=14
>> ql=0 qs=.... b=10 ci=406 co=0 ca=0
>> 36 ffff881bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=5 ri=22
>> ql=0 qs=.... b=10 ci=215 co=0 ca=0
>> 48 ffff8823d040d4e0 c=4057 g=4057 pq=1 pgp=4057 qp=0 of=6 ri=14
>> ql=26 qs=.RWD b=10 ci=7345 co=0 ca=0
>
> And it looks like CPU 48 is the one that did not check in. What is
> CONFIG_NO_HZ set to? If CONFIG_NO_HZ=y, this can legitimately
> happen, and printing the values of rdp->dynticks->dynticks,
> rdp->dynticks->dynticks_nesting, rdp->dynticks->dynticks_nmi_nesting,
> and rdp->dynticks_fqs will determine whether the situation is
> legitimate.
>
> If CONFIG_NO_HZ=n, then this should not happen. In this case, I should
> be able to provide you with a more focused debug print.
CONFIG_NO_HZ is not set, so it should not happen. We see the same behavior with CONFIG_NO_HZ=y, though it usually takes longer to reproduce.
>
>> 60 ffff882bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=16
>> ql=0 qs=.... b=10 ci=50 co=0 ca=0
>> 72 ffff8833d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=26
>> ql=0 qs=.... b=10 ci=22 co=0 ca=0
[]
>> CPU 48, treason uncloaked, rsp @ ffffffff81a1d000 (rcu_sched),
>> rsp->gpnum is 4058, rsp->completed is 4057, rnp @ ffffffff81a1d100,
>> qsmask is 0x8
>> 0 ffff8803f840d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=0 ri=279
>> ql=1 qs=..W. b=10 ci=145949 co=0 ca=0
>> 12 ffff880bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=7 ri=10
>> ql=3 qs=NR.. b=10 ci=561 co=0 ca=0
>> 24 ffff8813d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=14
>> ql=0 qs=.... b=10 ci=406 co=0 ca=0
>> 36 ffff881bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=5 ri=22
>> ql=0 qs=.... b=10 ci=215 co=0 ca=0
>> 48 ffff8823d040d4e0 c=4057 g=4057 pq=1 pgp=4057 qp=0 of=6 ri=19
>> ql=26 qs=.RWD b=10 ci=7345 co=0 ca=0
>
> Hmmmm... Same CPU and grace-period number. This is a repeat of the
> above printout, correct?
Because the RCU tree has 3 levels, the printout function we added in the patch is called 3 times in rcu_start_gp(), each time with the same RDP but with a different RNP.
>
>> 60 ffff882bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=16
>> ql=0 qs=.... b=10 ci=50 co=0 ca=0
>> 72 ffff8833d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=26
>> ql=0 qs=.... b=10 ci=22 co=0 ca=0
>> CPU 48, treason uncloaked, rsp @ ffffffff81a1d000 (rcu_sched),
>> rsp->gpnum is 4058, rsp->completed is 4057, rnp @ ffffffff81a1d800,
>> qsmask is 0x1
>> 0 ffff8803f840d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=0 ri=280
>> ql=1 qs=..W. b=10 ci=145949 co=0 ca=0
>> 12 ffff880bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=7 ri=11
>> ql=3 qs=NR.. b=10 ci=561 co=0 ca=0
>> 24 ffff8813d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=15
>> ql=0 qs=.... b=10 ci=406 co=0 ca=0
>> 36 ffff881bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=5 ri=23
>> ql=0 qs=.... b=10 ci=215 co=0 ca=0
>> 48 ffff8823d040d4e0 c=4057 g=4057 pq=1 pgp=4057 qp=0 of=6 ri=21
>> ql=26 qs=.RWD b=10 ci=7345 co=0 ca=0
>
> Same here, but most of the ql= values are larger. Later printout?
The loop in rcu_start_gp() releases each rcu_node's lock before it moves on to the next node in the RCU tree (it has to):

	rcu_for_each_node_breadth_first(rsp, rnp) {
		raw_spin_lock(&rnp->lock);	/* irqs already disabled. */
		rcu_debug_print(rsp, rnp);
		/* ... */
		raw_spin_unlock(&rnp->lock);
	}

so I guess this could allow the ql= values to increase between printouts, no?
>
> Thanx, Paul
>
>> 60 ffff882bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=16
>> ql=0 qs=.... b=10 ci=50 co=0 ca=0
>> 72 ffff8833d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=26
>> ql=0 qs=.... b=10 ci=22 co=0 ca=0
>>
Thanks for looking into this, Paul. We'd be more than happy to test out theories and patches.
Kind regards,
--
Steffen Persvold, Chief Architect NumaChip
Numascale AS - www.numascale.com
Tel: +47 92 49 25 54 Skype: spersvold