Date: Thu, 26 Jan 2012 11:26:53 -0800
From: "Paul E. McKenney"
To: Steffen Persvold
Cc: Daniel J Blueman, Dipankar Sarma, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: RCU qsmask !=0 warnings on large-SMP...
Message-ID: <20120126192653.GC2437@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
In-Reply-To: <4F216B85.7010208@numascale.com>
References: <4F1FCF02.9060209@numascale-asia.com>
 <20120125140029.GA2534@linux.vnet.ibm.com>
 <4F200F4D.5000201@numascale.com>
 <20120125181441.GD2849@linux.vnet.ibm.com>
 <4F2070B9.2000104@numascale.com>
 <20120125213450.GJ2849@linux.vnet.ibm.com>
 <4F2086DA.4040203@numascale.com>
 <20120126015843.GN2849@linux.vnet.ibm.com>
 <4F216B85.7010208@numascale.com>

On Thu, Jan 26, 2012 at 04:04:37PM +0100, Steffen Persvold wrote:
> On 1/26/2012 02:58, Paul E. McKenney wrote:
> >On Wed, Jan 25, 2012 at 11:48:58PM +0100, Steffen Persvold wrote:
> []
> >
> >This looks like it will produce useful information, but I am not seeing
> >output from it below.
> >
> >							Thanx, Paul
> >
> >>This run it was CPU24 that triggered the issue :
> >>
>
> This line is the printout for the root level :
>
> >>[ 231.572688] CPU 24, treason uncloaked, rsp @ ffffffff81a1cd80 (rcu_sched), rnp @ ffffffff81a1cd80(r) qsmask=0x1f, c=5132 g=5132 nc=5132 ng=5133 sc=5132 sg=5133 mc=5132 mg=5133

OK, so the rcu_state structure (sc and sg) believes that grace period
5133 has started but not completed, as expected.  Strangely enough, so
do the root rcu_node structure (nc and ng) and the CPU's leaf rcu_node
structure (mc and mg).  The per-CPU rcu_data structure (c and g) does
not yet know about the new 5133 grace period, as expected.

So this is the code in kernel/rcutree.c:rcu_start_gp() that does the
initialization:

	rcu_for_each_node_breadth_first(rsp, rnp) {
		raw_spin_lock(&rnp->lock);	/* irqs already disabled. */
		rcu_preempt_check_blocked_tasks(rnp);
		rnp->qsmask = rnp->qsmaskinit;
		rnp->gpnum = rsp->gpnum;
		rnp->completed = rsp->completed;
		if (rnp == rdp->mynode)
			rcu_start_gp_per_cpu(rsp, rnp, rdp);
		rcu_preempt_boost_start_gp(rnp);
		trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
					    rnp->level, rnp->grplo,
					    rnp->grphi, rnp->qsmask);
		raw_spin_unlock(&rnp->lock);	/* irqs remain disabled. */
	}

I am assuming that your debug prints are still invoked right after the
raw_spin_lock() above.  If so, I would expect nc==ng and mc==mg.  Even
if your debug prints followed the assignments to rnp->gpnum and
rnp->completed, I would expect mc==mg for the root and internal rcu_node
structures.  But you say below that you get the same values throughout,
and in that case, I would expect the leaf rcu_node structure to show
something different than the root and internal structures.
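For reference, here is a minimal sketch of the kind of debug helper that
would produce the c/g, nc/ng, sc/sg, mc/mg values quoted above.  The
actual debug patch is not shown in this thread, so the helper name and
printk format below are guesses; only the rcu_state/rcu_node/rcu_data
field names are taken from the 3.2-era kernel/rcutree.h:

	/*
	 * Hypothetical debug helper (not the actual patch), called from
	 * inside the rcu_for_each_node_breadth_first() loop above:
	 *   c/g   - ->completed/->gpnum of this CPU's rcu_data
	 *   nc/ng - of the rcu_node currently being initialized
	 *   sc/sg - of the rcu_state itself
	 *   mc/mg - of this CPU's leaf rcu_node (rdp->mynode)
	 */
	static void debug_print_gp_counters(struct rcu_state *rsp,
					    struct rcu_node *rnp,
					    struct rcu_data *rdp)
	{
		struct rcu_node *rnp_leaf = rdp->mynode;

		printk(KERN_INFO "rnp @ %p qsmask=0x%lx, c=%lu g=%lu "
		       "nc=%lu ng=%lu sc=%lu sg=%lu mc=%lu mg=%lu\n",
		       rnp, rnp->qsmask,
		       rdp->completed, rdp->gpnum,
		       rnp->completed, rnp->gpnum,
		       rsp->completed, rsp->gpnum,
		       rnp_leaf->completed, rnp_leaf->gpnum);
	}

Whether such a print runs right after the raw_spin_lock() or only after
the rnp->gpnum/rnp->completed assignments is exactly the distinction
drawn above, since it determines which of nc/ng and mc/mg can already
have been advanced to the new grace-period number.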
The code really does hold the root rcu_node lock across all calls to
rcu_start_gp(), so I don't see how we could be getting two CPUs into
that code at the same time.  That would be one way for the rcu_node and
rcu_data structures to get advance notice of the new grace period, but
in that case, you would have more than one bit set in ->qsmask.  (A
rough sketch of the call site is appended at the end of this message.)

So, any luck with the trace events for rcu_grace_period and
rcu_grace_period_init?

							Thanx, Paul

> (this is the WARN_ON printout) :
>
> >>[ 231.576167] ------------[ cut here ]------------
> >>[ 231.576167] WARNING: at kernel/rcutree_plugin.h:1011 rcu_preempt_check_blocked_tasks+0x27/0x30()
> >>[ 231.576167] Hardware name: H8QI6
> >>[ 231.576167] Modules linked in: rcutorture
> >>[ 231.576167] Pid: 4603, comm: rcu_torture_rea Not tainted 3.2.1-numaconnect10+ #77
> >>[ 231.576167] Call Trace:
> >>[ 231.576167]  [] ? rcu_preempt_check_blocked_tasks+0x27/0x30
> >>[ 231.576167]  [] warn_slowpath_common+0x8b/0xc0
> >>[ 231.576167]  [] warn_slowpath_null+0x15/0x20
> >>[ 231.576167]  [] rcu_preempt_check_blocked_tasks+0x27/0x30
> >>[ 231.576167]  [] rcu_start_gp+0x110/0x1b0
> >>[ 231.576167]  [] __rcu_process_callbacks+0x8b/0xd0
> >>[ 231.576167]  [] rcu_process_callbacks+0x20/0x40
> >>[ 231.576167]  [] __do_softirq+0x9d/0x140
> >>[ 231.576167]  [] call_softirq+0x1c/0x30
> >>[ 231.576167]  [] do_softirq+0x4a/0x80
> >>[ 231.576167]  [] irq_exit+0x43/0x60
> >>[ 231.576167]  [] smp_apic_timer_interrupt+0x45/0x60
> >>[ 231.576167]  [] apic_timer_interrupt+0x6b/0x70
> >>[ 231.576167]  [] ? finish_task_switch+0x59/0xc0
> >>[ 231.576167]  [] __schedule+0x337/0x710
> >>[ 231.576167]  [] ? sched_clock_local+0x15/0x80
> >>[ 231.576167]  [] ? lock_timer_base+0x36/0x70
> >>[ 231.576167]  [] ? mod_timer+0xf2/0x1d0
> >>[ 231.576167]  [] ? rcu_torture_shuffle+0x80/0x80 [rcutorture]
> >>[ 231.576167]  [] schedule+0x3a/0x60
> >>[ 231.576167]  [] rcu_torture_reader+0x130/0x230 [rcutorture]
> >>[ 231.576167]  [] ? rcu_torture_writer+0x160/0x160 [rcutorture]
> >>[ 231.576167]  [] ? rcu_torture_shuffle+0x80/0x80 [rcutorture]
> >>[ 231.576167]  [] kthread+0x96/0xa0
> >>[ 231.576167]  [] kernel_thread_helper+0x4/0x10
> >>[ 231.576167]  [] ? kthread_stop+0x70/0x70
> >>[ 231.576167]  [] ? gs_change+0xb/0xb
> >>[ 231.576167] ---[ end trace 828c8d7afbd02d1b ]---
> >>
>
> I didn't include the leaf node printout, but the counters were
> identical to the root printout (with the exception of the rnp
> address and qsmask of course).
>
> Cheers,
> --
> Steffen Persvold, Chief Architect NumaChip
> Numascale AS - www.numascale.com
> Tel: +47 92 49 25 54 Skype: spersvold
>
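For context on the locking point above, this is roughly the 3.2-era call
site in kernel/rcutree.c (reconstructed from memory, so details may
differ slightly): __rcu_process_callbacks() acquires the root rcu_node
lock before calling rcu_start_gp(), and rcu_start_gp() releases that
lock before returning.

	/* In __rcu_process_callbacks(), approximately: */
	if (cpu_needs_another_gp(rsp, rdp)) {
		raw_spin_lock_irqsave(&rcu_get_root(rsp)->lock, flags);
		rcu_start_gp(rsp, flags);  /* releases the root rcu_node lock */
	}

Since every call site funnels through the root rcu_node lock this way,
two CPUs should never run the grace-period initialization loop
concurrently, which is what makes the counters in the printout above so
puzzling.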