From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754008AbbJGMeH (ORCPT );
	Wed, 7 Oct 2015 08:34:07 -0400
Received: from mx1.redhat.com ([209.132.183.28]:52698 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753817AbbJGMeE (ORCPT );
	Wed, 7 Oct 2015 08:34:04 -0400
Date: Wed, 7 Oct 2015 14:30:46 +0200
From: Oleg Nesterov
To: Peter Zijlstra
Cc: heiko.carstens@de.ibm.com, linux-kernel@vger.kernel.org,
	Tejun Heo, Ingo Molnar, Rik van Riel
Subject: Re: [RFC][PATCH] sched: Start stopper early
Message-ID: <20151007123046.GA21460@redhat.com>
References: <20151007084110.GX2881@worktop.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20151007084110.GX2881@worktop.programming.kicks-ass.net>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/07, Peter Zijlstra wrote:
>
> So Heiko reported some 'interesting' fail where stop_two_cpus() got
> stuck in multi_cpu_stop() with one cpu waiting for another that never
> happens.
>
> It _looks_ like the 'other' cpu isn't running and the current best
> theory is that we race on cpu-up and get the stop_two_cpus() call in
> before the stopper task is running.
>
> This _is_ possible because we set 'online && active'

Argh. Can't really comment on this change right now, but this reminds me
that the stop_two_cpus() path should not rely on cpu_active() at all. I
mean we should not use this check to avoid the deadlock;
migrate_swap_stop() can check it itself. And
cpu_stop_park()->cpu_stop_signal_done() should be replaced by BUG_ON().

Probably slightly off-topic, but what do you finally think about the old
"[PATCH v2 6/6] stop_machine: kill stop_cpus_lock and lg_double_lock/unlock()"
we discussed in http://marc.info/?t=143750670300014 ?
I won't really insist if you still dislike it, but it seems we both agree
that "lg_lock stop_cpus_lock" must die in any case, and after that we can
do the cleanups mentioned above.

And, Peter, I see a lot of interesting emails from you, but currently
can't even read them. I hope very much I will read them later and perhaps
even reply ;)

Oleg.