Date: Thu, 3 Sep 2015 02:03:51 +0200
From: Frederic Weisbecker
To: Peter Zijlstra
Cc: Tejun Heo, "Paul E. McKenney", linux-kernel@vger.kernel.org
Subject: Re: Warning in irq_work_queue_on()
Message-ID: <20150903000350.GA28870@lerouge>
References: <20150825001611.GA1751@linux.vnet.ibm.com> <20150902194405.GM22326@mtj.duckdns.org> <20150902215020.GA21505@lerouge> <20150902222427.GW19282@twins.programming.kicks-ass.net>
In-Reply-To: <20150902222427.GW19282@twins.programming.kicks-ass.net>

On Thu, Sep 03, 2015 at 12:24:27AM +0200, Peter Zijlstra wrote:
> On Wed, Sep 02, 2015 at 11:50:22PM +0200, Frederic Weisbecker wrote:
> > > > [ 875.703227] [] tick_nohz_full_kick_cpu+0x44/0x50
> >
> > It happens in nohz full, but I'm not sure the culprit is nohz full.
> >
> > The problem here is that wake_up_nohz_cpu() selects a CPU that is offline.
>
> wake_up_nohz_cpu() doesn't do any such thing. Where does the selection
> logic live?

Err, got confused with get_nohz_timer_target(). But yeah, wake_up_nohz_cpu()
is called with a CPU that is chosen by mod_timer() -> get_nohz_timer_target().

> > But this shouldn't happen. Either it selects a CPU that is in the domain tree,
> > and I suspect offline CPUs aren't supposed to be there, or it selects the current
> > CPU. And if the CPU is offlined, it shouldn't be running some kthread...
>
> Do not assume things like that.. always check with the active mask.

Hmm, so perhaps we need something like this (which makes me realize that the
is_housekeeping_cpu() call passes the wrong argument, cpu rather than i. No
issue in practice since nohz full CPUs aren't in the domain tree, but I still
need to fix that as well).

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0902e4d..2c10a69 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -628,7 +628,7 @@ int get_nohz_timer_target(void)
 	rcu_read_lock();
 	for_each_domain(cpu, sd) {
-		for_each_cpu(i, sched_domain_span(sd)) {
+		for_each_cpu_and(i, sched_domain_span(sd), cpu_online_mask) {
 			if (!idle_cpu(i) && is_housekeeping_cpu(cpu)) {
 				cpu = i;
 				goto unlock;
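
To make the intent of the for_each_cpu_and() change concrete, here is a
rough userspace sketch of the selection loop. It is not kernel code: mask_t,
NR_CPUS_SKETCH, cpu_is_idle() and pick_timer_target() are hypothetical
stand-ins, cpumasks are modelled as plain bitmasks, and the "busy" mask is
assumed to be possibly stale for an offline CPU. The housekeeping check is
left out to keep it short.

```c
#include <assert.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in the mask. */
typedef unsigned int mask_t;

#define NR_CPUS_SKETCH 8

/* Mock of idle_cpu(): a CPU is idle when its bit is clear in 'busy'. */
static int cpu_is_idle(mask_t busy, int cpu)
{
	return !(busy & (1u << cpu));
}

/*
 * Return the first non-idle CPU in 'span'. When 'check_online' is set the
 * walk is restricted to span & online, which is what for_each_cpu_and()
 * with cpu_online_mask does in the patch. Falls back to 'self' if no
 * candidate is found.
 */
static int pick_timer_target(int self, mask_t span, mask_t online,
			     mask_t busy, int check_online)
{
	mask_t candidates = check_online ? (span & online) : span;
	int i;

	assert(self >= 0 && self < NR_CPUS_SKETCH);

	for (i = 0; i < NR_CPUS_SKETCH; i++) {
		if (!(candidates & (1u << i)))
			continue;	/* not in the (filtered) span */
		if (!cpu_is_idle(busy, i))
			return i;	/* first busy candidate wins */
	}
	return self;
}
```

With span = 0x0F (CPUs 0-3), online = 0x0B (CPU 2 offline) and busy = 0x0C
(CPUs 2 and 3 marked busy), the unfiltered walk picks the offline CPU 2,
while the filtered walk skips it and picks CPU 3.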