From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752425AbbJKSHa (ORCPT ); Sun, 11 Oct 2015 14:07:30 -0400 Received: from mx1.redhat.com ([209.132.183.28]:52516 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752135AbbJKSH3 (ORCPT ); Sun, 11 Oct 2015 14:07:29 -0400 Date: Sun, 11 Oct 2015 20:04:06 +0200 From: Oleg Nesterov To: Peter Zijlstra Cc: heiko.carstens@de.ibm.com, Tejun Heo , Ingo Molnar , Rik van Riel , Thomas Gleixner , Vitaly Kuznetsov , linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq() Message-ID: <20151011180406.GA8899@redhat.com> References: <20151010185255.GA24075@redhat.com> <20151010185309.GA24089@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151010185309.GA24089@redhat.com> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/10, Oleg Nesterov wrote: > > I do not understand the cpu_active() check in select_fallback_rq(). > x86 doesn't need it, and the recent commit dd9d3843755d "sched: Fix > cpu_active_mask/cpu_online_mask race" documents the fact that on any > architecture we can ignore !active starting from CPU_ONLINE stage. > > But any possible reason why do we need this check in "fallback" must > equally apply to select_task_rq(). And I still think this is true, select_task_rq() and select_fallback_rq() should use the same check in any case... > +static inline bool cpu_allowed(int cpu) > +{ > + return cpu_online(cpu) && cpu_active(cpu); > +} ... > @@ -1390,7 +1391,7 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags) > * not worry about this generic constraint ] > */ > if (unlikely(!cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) || > - !cpu_online(cpu))) > + !cpu_allowed(cpu))) > cpu = select_fallback_rq(task_cpu(p), p); But as Fengguang reports (thanks a lot!) this change is wrong. It leads to another BUG_ON(td->cpu != smp_processor_id()) before ht->park(td->cpu) in smpboot_thread_fn(). I should have realized this. smpboot_park_threads() is called after CPU_DOWN_PREPARE. And this can break other PF_NO_SETAFFINITY threads. Perhaps I am totally confused, but to me this looks like another indication that select_fallback_rq() should not check cpu_active(), or at least this needs some changes... Oleg.