From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S934462AbbJHQMJ (ORCPT <rfc822;w@1wt.eu>);
	Thu, 8 Oct 2015 12:12:09 -0400
Received: from mx1.redhat.com ([209.132.183.28]:35436 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933099AbbJHQME (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Thu, 8 Oct 2015 12:12:04 -0400
Date: Thu, 8 Oct 2015 18:08:45 +0200
From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: heiko.carstens@de.ibm.com, Tejun Heo <tj@kernel.org>,
        Ingo Molnar <mingo@kernel.org>, Rik van Riel <riel@redhat.com>,
        Thomas Gleixner <tglx@linutronix.de>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to
	rely on stopper->enabled
Message-ID: <20151008160845.GA21176@redhat.com>
References: <20151008145059.GA17916@redhat.com> <20151008145136.GA18149@redhat.com> <20151008150431.GO3816@twins.programming.kicks-ass.net> <20151008155953.GA20508@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20151008155953.GA20508@redhat.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/08, Oleg Nesterov wrote:
>
> On 10/08, Peter Zijlstra wrote:
> >
> > On Thu, Oct 08, 2015 at 04:51:36PM +0200, Oleg Nesterov wrote:
> > > @@ -261,12 +276,8 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
> > >  	set_state(&msdata, MULTI_STOP_PREPARE);
> > >
> > >  	/*
> > > +	 * We do not want to migrate to inactive CPU. FIXME: move this
> > > +	 * into the caller.
> > >  	 */
> > >  	if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
> > >  		preempt_enable();
> >
> > So we cannot move that into the caller..
>
> Why?
>
> > because this function sleeps
> > with wait_for_completion().
> >
> > Or rather, it would force the caller to use get_online_cpus(), which we
> > worked really hard to avoid.
>
> Aaah wait. Sorry for confusion!
>
> I meant "move this into the callback, migrate_swap_stop()".

Forgot to mention... And note that both these cpu_active() are obviously
racy, CPU_DOWN_PREPARE can make it inactive right after the check.

> > Also, I think we still want the patch I proposed which ensures the
> > stopper thread is active 'early', because the load balancer pretty much
>
> Perhaps. Although I do not really understand why it is important.
> I mean, either way we unpark it at CPU_ONLINE stage, just
> sched_cpu_active() has a higher priority.
>
> But this is off-topic in a sense that the main point of this patch
> is that stop_two_cpus() no longer needs to abuse cpu_active() checks
> to avoid the race with cpu_up/down, we can simply rely on ->enabled.
>
> And again, we need to take both locks to remove "lglock stop_cpus_lock".
>
> So I think your change can be applied after this series too. Or I missed
> something?

So I think this series makes sense anyway and hopefully should fix the
problem with or without your change.

Oleg.