From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752831AbbJGMi6 (ORCPT <rfc822;w@1wt.eu>);
	Wed, 7 Oct 2015 08:38:58 -0400
Received: from bombadil.infradead.org ([198.137.202.9]:46795 "EHLO
	bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751302AbbJGMi5 (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 7 Oct 2015 08:38:57 -0400
Date: Wed, 7 Oct 2015 14:38:52 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: heiko.carstens@de.ibm.com, linux-kernel@vger.kernel.org,
        Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@kernel.org>,
        Rik van Riel <riel@redhat.com>
Subject: Re: [RFC][PATCH] sched: Start stopper early
Message-ID: <20151007123852.GH17308@twins.programming.kicks-ass.net>
References: <20151007084110.GX2881@worktop.programming.kicks-ass.net>
 <20151007123046.GA21460@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20151007123046.GA21460@redhat.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 07, 2015 at 02:30:46PM +0200, Oleg Nesterov wrote:
> On 10/07, Peter Zijlstra wrote:
> >
> > So Heiko reported some 'interesting' fail where stop_two_cpus() got
> > stuck in multi_cpu_stop() with one cpu waiting for another that never
> > happens.
> >
> > It _looks_ like the 'other' cpu isn't running and the current best
> > theory is that we race on cpu-up and get the stop_two_cpus() call in
> > before the stopper task is running.
> >
> > This _is_ possible because we set 'online && active'
> 
> Argh. Can't really comment on this change right now, but this reminds me
> that the stop_two_cpus() path should not rely on cpu_active() at all. I mean
> we should not use this check to avoid the deadlock, migrate_swap_stop()
> can check it itself. And cpu_stop_park()->cpu_stop_signal_done() should
> be replaced by BUG_ON().
> 
> Probably slightly off-topic, but what do you finally think about the old
> "[PATCH v2 6/6] stop_machine: kill stop_cpus_lock and lg_double_lock/unlock()"
> we discussed in http://marc.info/?t=143750670300014 ?
> 
> I won't really insist if you still dislike it, but it seems we both
> agree that "lg_lock stop_cpus_lock" must die in any case, and after that
> we can do the cleanups mentioned above.

Yes, I was looking at that; this bug reminded me that we still had that
issue open.

> And, Peter, I see a lot of interesting emails from you, but currently
> can't even read them. I hope very much I will read them later and perhaps
> even reply ;)

Sure, take your time.