From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1030328Ab2CTPCy (ORCPT ); Tue, 20 Mar 2012 11:02:54 -0400
Received: from e23smtp04.au.ibm.com ([202.81.31.146]:42595 "EHLO
	e23smtp04.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1758036Ab2CTPCx (ORCPT );
	Tue, 20 Mar 2012 11:02:53 -0400
Message-ID: <4F689C14.9000708@linux.vnet.ibm.com>
Date: Tue, 20 Mar 2012 20:32:44 +0530
From: "Srivatsa S. Bhat"
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:10.0.1) Gecko/20120209 Thunderbird/10.0.1
MIME-Version: 1.0
To: Peter Zijlstra
CC: "Liu, Chuansheng" ,
	"linux-kernel@vger.kernel.org" ,
	Yanmin Zhang ,
	"tglx@linutronix.de" ,
	"Rafael J. Wysocki" ,
	Linux PM mailing list ,
	"Srivatsa S. Bhat"
Subject: Re: [PATCH] Fix the race between smp_call_function and CPU booting
References: <27240C0AC20F114CBF8149A2696CBE4A053BE8@SHSMSX101.ccr.corp.intel.com>
	<1331546307.18960.26.camel@twins>
	<27240C0AC20F114CBF8149A2696CBE4A054D70@SHSMSX101.ccr.corp.intel.com>
	<1331654251.18960.78.camel@twins>
	<27240C0AC20F114CBF8149A2696CBE4A0556FA@SHSMSX101.ccr.corp.intel.com>
	<1331718197.18960.106.camel@twins>
	<27240C0AC20F114CBF8149A2696CBE4A056F47@SHSMSX101.ccr.corp.intel.com>
	<1331808391.18960.160.camel@twins>
	<27240C0AC20F114CBF8149A2696CBE4A05806B@SHSMSX101.ccr.corp.intel.com>
	<1331891364.18960.221.camel@twins>
	<27240C0AC20F114CBF8149A2696CBE4A058AAE@SHSMSX101.ccr.corp.intel.com>
	<1332151397.18960.252.camel@twins>
	<27240C0AC20F114CBF8149A2696CBE4A05A6BF@SHSMSX101.ccr.corp.intel.com>
	<1332245842.18960.413.camel@twins>
	<4F6882BA.2040000@linux.vnet.ibm.com>
	<1332250867.18960.423.camel@twins>
In-Reply-To: <1332250867.18960.423.camel@twins>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
x-cbid: 12032004-9264-0000-0000-000001172546
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/20/2012 07:11 PM, Peter Zijlstra wrote:

> On Tue, 2012-03-20 at 18:44 +0530, Srivatsa S. Bhat wrote:
>>
>>
>> I don't think this patch would change anything, at least it wouldn't get
>> rid of the warning that Liu reported. Because, he is running his stress
>> tests on a machine which has only 2 CPUs. So effectively, we are hotplugging
>> only CPU1 (since CPU0 can't be taken offline, on Intel boxes).
>>
>> Also, CPU1 is removed from cpu_active_mask during CPU_DOWN_PREPARE time itself,
>> and migrate_tasks() comes much later (during CPU_DYING). And in any case,
>> dest_cpu will never be CPU1, because it is the CPU that is going down. So it
>> *has* to be CPU0 anyway.
>>
>> So, I don't think changes to select_fallback_rq() to make it more careful are
>> going to make any difference in the particular scenario that Liu is testing.
>>
>> That said, even I don't know what the root cause of the warning is.. :-(
>
> It's a race in cpu-up, we set active before online, when we do a wakeup
> select_task_rq() will see !cpu_online(), we then call
> select_fallback_rq() to compute a new cpu, select_fallback_rq() computes
> a new cpu against cpu_active (which is set) and can thus return cpu 1,
> even though it is still offline.
>
> So we queue the task on cpu 1 and send a reschedule ipi, at which point
> we'll get the reported warning.
>
> My change modifies select_fallback_rq() to require online && active.
>

Ok, that makes sense.. Thanks a lot for the explanation!

Regards,
Srivatsa S. Bhat
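
For illustration only, the cpu-up window described above can be modelled in
user space. This is a rough sketch and not the kernel's select_fallback_rq():
the struct, the helper names, the plain bitmasks standing in for
cpu_online_mask and cpu_active_mask, and the assumption of a task pinned to
CPU1 are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2	/* assumption: the 2-CPU box from the report */

/* Toy stand-ins for cpu_online_mask and cpu_active_mask (hypothetical). */
struct cpu_masks {
	unsigned int online;	/* bit n set: CPU n is online */
	unsigned int active;	/* bit n set: CPU n is active */
};

/* Return true if bit 'cpu' is set in 'mask'. */
static bool test_cpu(unsigned int mask, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (mask >> cpu) & 1u;
}

/* Behaviour as described in the thread: only cpu_active is consulted. */
static int fallback_active_only(const struct cpu_masks *m, unsigned int allowed)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (test_cpu(allowed, cpu) && test_cpu(m->active, cpu))
			return cpu;
	return -1;	/* no candidate found */
}

/* Behaviour after the proposed change: require online && active. */
static int fallback_online_and_active(const struct cpu_masks *m,
				      unsigned int allowed)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (test_cpu(allowed, cpu) &&
		    test_cpu(m->online, cpu) &&
		    test_cpu(m->active, cpu))
			return cpu;
	return -1;	/* no candidate found; a caller would have to widen 'allowed' */
}
```

With online = 0x1 and active = 0x3 (CPU1 already marked active but not yet
online) and allowed = 0x2, fallback_active_only() hands back CPU1, which is
the case that leads to the warning, while fallback_online_and_active() finds
no candidate and never picks the offline CPU.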