From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934462AbbJHQMJ (ORCPT ); Thu, 8 Oct 2015 12:12:09 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35436 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933099AbbJHQME (ORCPT ); Thu, 8 Oct 2015 12:12:04 -0400 Date: Thu, 8 Oct 2015 18:08:45 +0200 From: Oleg Nesterov To: Peter Zijlstra Cc: heiko.carstens@de.ibm.com, Tejun Heo , Ingo Molnar , Rik van Riel , Thomas Gleixner , linux-kernel@vger.kernel.org Subject: Re: [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled Message-ID: <20151008160845.GA21176@redhat.com> References: <20151008145059.GA17916@redhat.com> <20151008145136.GA18149@redhat.com> <20151008150431.GO3816@twins.programming.kicks-ass.net> <20151008155953.GA20508@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151008155953.GA20508@redhat.com> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/08, Oleg Nesterov wrote: > > On 10/08, Peter Zijlstra wrote: > > > > On Thu, Oct 08, 2015 at 04:51:36PM +0200, Oleg Nesterov wrote: > > > @@ -261,12 +276,8 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void * > > > set_state(&msdata, MULTI_STOP_PREPARE); > > > > > > /* > > > + * We do not want to migrate to inactive CPU. FIXME: move this > > > + * into the caller. > > > */ > > > if (!cpu_active(cpu1) || !cpu_active(cpu2)) { > > > preempt_enable(); > > > > So we cannot move that into the caller.. > > Why? > > > because this function sleeps > > with wait_for_completion(). > > > > Or rather, it would force the caller to use get_online_cpus(), which we > > worked really hard to avoid. > > Aaah wait. Sorry for confusion! > > I meant "move this into the callback, migrate_swap_stop()". Forgot to mention... And note that both these cpu_active() are obviously racy, CPU_DOWN_PREPARE can make it inactive right after the check. > > Also, I think we still want the patch I proposed which ensures the > > stopper thread is active 'early', because the load balancer pretty much > > Perhaps. Although I do not really understand why it is important. > I mean, either way we unpark it at CPU_ONLINE stage, just > sched_cpu_active() has a higher priority. > > But this is off-topic in a sense that the main point of this patch > is that stop_two_cpus() no longer needs to abuse cpu_active() checks > to avoid the race with cpu_up/down, we can simply rely on ->enabled. > > And again, we need to take both locks to remove "lglock stop_cpus_lock". > > So I think your change can be applied after this series too. Or I missed > something? So I think this series makes sense anyway and hopefully should fix the problem with or without your change. Oleg.