From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757537Ab2GFSjM (ORCPT ); Fri, 6 Jul 2012 14:39:12 -0400 Received: from wolverine02.qualcomm.com ([199.106.114.251]:64359 "EHLO wolverine02.qualcomm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754946Ab2GFSjJ (ORCPT ); Fri, 6 Jul 2012 14:39:09 -0400 X-IronPort-AV: E=McAfee;i="5400,1158,6764"; a="205765313" Message-ID: <4FF730CD.7050907@codeaurora.org> Date: Fri, 06 Jul 2012 11:39:09 -0700 From: Stephen Boyd User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: paulmck@linux.vnet.ibm.com CC: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] ARM: smp: Fix suspicious RCU originating from cpu_die() References: <1341531958-31721-1-git-send-email-sboyd@codeaurora.org> <20120706002404.GJ2522@linux.vnet.ibm.com> In-Reply-To: <20120706002404.GJ2522@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 07/05/12 17:24, Paul E. McKenney wrote: > On Thu, Jul 05, 2012 at 04:45:58PM -0700, Stephen Boyd wrote: >> @@ -179,7 +184,7 @@ void __ref cpu_die(void) >> mb(); >> >> /* Tell __cpu_die() that this CPU is now safe to dispose of */ >> - complete(&cpu_died); >> + __this_cpu_write(cpu_state, CPU_DEAD); > Or you could do something like: > > RCU_NONIDLE(complete(&cpu_died)); > > This would tell RCU that it needed to pay attention to this CPU for > the duration of the "complete()" function call despite the CPU's being > idle. And might allow you to dispense with the rest of the patch. Great! I like that more since we get to keep the completion mechanism instead of a busy wait. Russell, which one would you prefer? Here's the other version ----->8-----8<----- Subject: [PATCH] ARM: smp: Fix suspicious RCU originating from cpu_die() While running hotplug tests I ran into this RCU splat =============================== [ INFO: suspicious RCU usage. ] 3.4.0 #3275 Tainted: G W ------------------------------- include/linux/rcupdate.h:729 rcu_read_lock() used illegally while idle! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! 4 locks held by swapper/2/0: #0: ((cpu_died).wait.lock){......}, at: [] complete+0x1c/0x5c #1: (&p->pi_lock){-.-.-.}, at: [] try_to_wake_up+0x2c/0x388 #2: (&rq->lock){-.-.-.}, at: [] try_to_wake_up+0x130/0x388 #3: (rcu_read_lock){.+.+..}, at: [] cpuacct_charge+0x28/0x1f4 stack backtrace: [] (unwind_backtrace+0x0/0x12c) from [] (cpuacct_charge+0x94/0x1f4) [] (cpuacct_charge+0x94/0x1f4) from [] (update_curr+0x24c/0x2c8) [] (update_curr+0x24c/0x2c8) from [] (enqueue_task_fair+0x50/0x194) [] (enqueue_task_fair+0x50/0x194) from [] (enqueue_task+0x30/0x34) [] (enqueue_task+0x30/0x34) from [] (ttwu_activate+0x14/0x38) [] (ttwu_activate+0x14/0x38) from [] (try_to_wake_up+0x178/0x388) [] (try_to_wake_up+0x178/0x388) from [] (__wake_up_common+0x34/0x78) [] (__wake_up_common+0x34/0x78) from [] (complete+0x48/0x5c) [] (complete+0x48/0x5c) from [] (cpu_die+0x2c/0x58) [] (cpu_die+0x2c/0x58) from [] (cpu_idle+0x64/0xfc) [] (cpu_idle+0x64/0xfc) from [<80208160>] (0x80208160) When a cpu is marked offline during its idle thread it calls cpu_die() during an RCU idle period. cpu_die() calls complete() to notify the killing process that the cpu has died. complete() calls into the scheduler code and eventually grabs an RCU read lock in cpuacct_charge(). Mark complete() as RCU_NONIDLE so that RCU pays attention to this CPU for the duration of the complete() function even though it's in idle. Suggested-by: "Paul E. McKenney" Signed-off-by: Stephen Boyd --- arch/arm/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 2c7217d..aea74f5 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -179,7 +179,7 @@ void __ref cpu_die(void) mb(); /* Tell __cpu_die() that this CPU is now safe to dispose of */ - complete(&cpu_died); + RCU_NONIDLE(complete(&cpu_died)); /* * actual CPU shutdown procedure is at least platform (if not -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.