From: Steffen Persvold
Date: Wed, 25 Jan 2012 15:18:53 +0100
To: paulmck@linux.vnet.ibm.com
Cc: Daniel J Blueman, Dipankar Sarma, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: RCU qsmask !=0 warnings on large-SMP...
Message-ID: <4F200F4D.5000201@numascale.com>
References: <4F1FCF02.9060209@numascale-asia.com> <20120125140029.GA2534@linux.vnet.ibm.com>
In-Reply-To: <20120125140029.GA2534@linux.vnet.ibm.com>

On 1/25/2012 15:00, Paul E. McKenney wrote:
> On Wed, Jan 25, 2012 at 05:44:34PM +0800, Daniel J Blueman wrote:

[]

> I do test three-level trees regularly, but on relatively small systems
> by limiting CONFIG_RCU_FANOUT.  If you are running 64-bit systems
> with default values of CONFIG_RCU_FANOUT, a 1024-CPU system fits into a
> two-level tree, so I would guess that you either have more than 1024 CPUs
> (at least as far as NR_CPUS is concerned) or gaps in your CPU-numbering
> scheme.

NR_CPUS is set to 4096 (i.e. CONFIG_MAXSMP=y) and CONFIG_RCU_FANOUT is 64.
We see 3 levels of RCU nodes with this configuration.

>> Has this been encountered previously?
>
> I haven't seen that warning since I was originally developing
> TREE_PREEMPT_RCU some years back.
>
> Numascale NumaConnect is cache-coherent, right?

Correct.

[]

>> WARNING: at kernel/rcutree_plugin.h:990
>
> OK, if I have the right version, RCU-sched is trying to do a new grace
> period, but finds that one of the rcu_node structure's ->qsmask fields
> indicates that a CPU (or group of CPUs) failed to check into the previous
> grace period.

[]

>> CPU 48, treason uncloaked, rsp @ ffffffff81a1d000 (rcu_sched),
>> rsp->gpnum is 4058, rsp->completed is 4057, rnp @ ffffffff81a1d000,
>> qsmask is 0x1
>> 0 ffff8803f840d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=0 ri=279
>> ql=1 qs=..W. b=10 ci=145949 co=0 ca=0
>> 12 ffff880bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=7 ri=10
>> ql=2 qs=.R.. b=10 ci=561 co=0 ca=0
>> 24 ffff8813d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=14
>> ql=0 qs=.... b=10 ci=406 co=0 ca=0
>> 36 ffff881bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=5 ri=22
>> ql=0 qs=.... b=10 ci=215 co=0 ca=0
>> 48 ffff8823d040d4e0 c=4057 g=4057 pq=1 pgp=4057 qp=0 of=6 ri=14
>> ql=26 qs=.RWD b=10 ci=7345 co=0 ca=0
>
> And it looks like CPU 48 is the one that did not check in.  What is
> CONFIG_NO_HZ set to?  If CONFIG_NO_HZ=y, this can legitimately
> happen, and printing the values of rdp->dynticks->dynticks,
> rdp->dynticks->dynticks_nesting, rdp->dynticks->dynticks_nmi_nesting,
> and rdp->dynticks_fqs will determine whether or not the situation is
> legitimate.
>
> If CONFIG_NO_HZ=n, then this should not happen.  In this case, I should
> be able to provide you with a more focused debug print.

CONFIG_NO_HZ is not set, so it should not happen.  We do see the same
behavior with CONFIG_NO_HZ=y, though it usually takes longer to reproduce.
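For the CONFIG_NO_HZ=y runs, something like the following is what I
understand you are suggesting (a sketch only; the helper name is made up,
and the field names/types are as I read them in the 3.2-era struct
rcu_dynticks and struct rcu_data):

	/*
	 * Sketch: dump the dyntick state of a CPU that failed to check
	 * in.  Helper name is made up; fields per 3.2-era rcutree.h.
	 */
	static void rcu_debug_print_dynticks(struct rcu_data *rdp)
	{
		printk(KERN_ERR "cpu %d: dynticks=%d dynticks_nesting=%lld "
		       "dynticks_nmi_nesting=%d dynticks_fqs=%lu\n",
		       rdp->cpu,
		       atomic_read(&rdp->dynticks->dynticks),
		       rdp->dynticks->dynticks_nesting,
		       rdp->dynticks->dynticks_nmi_nesting,
		       rdp->dynticks_fqs);
	}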
>> 60 ffff882bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=16
>> ql=0 qs=.... b=10 ci=50 co=0 ca=0
>> 72 ffff8833d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=26
>> ql=0 qs=.... b=10 ci=22 co=0 ca=0

[]

>> CPU 48, treason uncloaked, rsp @ ffffffff81a1d000 (rcu_sched),
>> rsp->gpnum is 4058, rsp->completed is 4057, rnp @ ffffffff81a1d100,
>> qsmask is 0x8
>> 0 ffff8803f840d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=0 ri=279
>> ql=1 qs=..W. b=10 ci=145949 co=0 ca=0
>> 12 ffff880bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=7 ri=10
>> ql=3 qs=NR.. b=10 ci=561 co=0 ca=0
>> 24 ffff8813d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=14
>> ql=0 qs=.... b=10 ci=406 co=0 ca=0
>> 36 ffff881bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=5 ri=22
>> ql=0 qs=.... b=10 ci=215 co=0 ca=0
>> 48 ffff8823d040d4e0 c=4057 g=4057 pq=1 pgp=4057 qp=0 of=6 ri=19
>> ql=26 qs=.RWD b=10 ci=7345 co=0 ca=0
>
> Hmmmm...  Same CPU and grace-period number.  This is a repeat of the
> above printout, correct?

Because the RCU tree is 3 levels deep, the printout function we added in
the patch (sketched below my signature) gets called 3 times, each time
with the same RDP but with a different RNP (in rcu_start_gp()).

>> 60 ffff882bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=16
>> ql=0 qs=.... b=10 ci=50 co=0 ca=0
>> 72 ffff8833d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=26
>> ql=0 qs=.... b=10 ci=22 co=0 ca=0
>> CPU 48, treason uncloaked, rsp @ ffffffff81a1d000 (rcu_sched),
>> rsp->gpnum is 4058, rsp->completed is 4057, rnp @ ffffffff81a1d800,
>> qsmask is 0x1
>> 0 ffff8803f840d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=0 ri=280
>> ql=1 qs=..W. b=10 ci=145949 co=0 ca=0
>> 12 ffff880bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=7 ri=11
>> ql=3 qs=NR.. b=10 ci=561 co=0 ca=0
>> 24 ffff8813d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=15
>> ql=0 qs=.... b=10 ci=406 co=0 ca=0
>> 36 ffff881bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=5 ri=23
>> ql=0 qs=.... b=10 ci=215 co=0 ca=0
>> 48 ffff8823d040d4e0 c=4057 g=4057 pq=1 pgp=4057 qp=0 of=6 ri=21
>> ql=26 qs=.RWD b=10 ci=7345 co=0 ca=0
>
> Same here, but most of the ql= values are larger.  Later printout?

The loop in rcu_start_gp() releases the node lock each time it moves on to
the next node in the RCU tree (it has to):

	rcu_for_each_node_breadth_first(rsp, rnp) {
		raw_spin_lock(&rnp->lock);	/* irqs already disabled. */
		rcu_debug_print(rsp, rnp);
		...
		raw_spin_unlock(&rnp->lock);	/* irqs remain disabled. */
	}

so I guess this window is what allows the ql= values to increase between
printouts, no?

>
> 							Thanx, Paul
>
>> 60 ffff882bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=16
>> ql=0 qs=.... b=10 ci=50 co=0 ca=0
>> 72 ffff8833d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=26
>> ql=0 qs=.... b=10 ci=22 co=0 ca=0

Thanks for looking into this Paul, we'd be more than happy to test out
theories and patches.

Kind regards,

-- 
Steffen Persvold, Chief Architect NumaChip
Numascale AS - www.numascale.com
Tel: +47 92 49 25 54 Skype: spersvold
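P.S. For reference, here is a simplified, from-memory sketch of the
rcu_debug_print() hook mentioned above (the actual patch prints the full
per-CPU rcu_data state, in the rcutree_trace.c style you see in the dumps):

	/*
	 * Simplified sketch of our debug hook, called from rcu_start_gp()
	 * with rnp->lock held and irqs disabled.  The real patch dumps
	 * more of the per-CPU rcu_data fields than shown here.
	 */
	static void rcu_debug_print(struct rcu_state *rsp, struct rcu_node *rnp)
	{
		int cpu;

		if (!rnp->qsmask)
			return;
		printk(KERN_ERR "CPU %d, treason uncloaked, rsp @ %p (%s), "
		       "rsp->gpnum is %lu, rsp->completed is %lu, "
		       "rnp @ %p, qsmask is 0x%lx\n",
		       smp_processor_id(), rsp, rsp->name, rsp->gpnum,
		       rsp->completed, rnp, rnp->qsmask);
		for_each_online_cpu(cpu) {
			struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);

			printk(KERN_ERR "%d %p c=%lu g=%lu ql=%ld\n",
			       cpu, rdp, rdp->completed, rdp->gpnum,
			       rdp->qlen);
		}
	}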