From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.3 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B456C433E0 for ; Thu, 21 May 2020 12:43:43 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3237B2070A for ; Thu, 21 May 2020 12:43:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="dTw9ChoU" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3237B2070A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from bilbo.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 49STnR0f1qzDqlJ for ; Thu, 21 May 2020 22:43:39 +1000 (AEST) Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=kernel.org (client-ip=198.145.29.99; helo=mail.kernel.org; envelope-from=frederic@kernel.org; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=kernel.org Authentication-Results: lists.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=kernel.org header.i=@kernel.org header.a=rsa-sha256 header.s=default header.b=dTw9ChoU; dkim-atps=neutral Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 49STkl26jGzDqXY for ; Thu, 21 May 2020 22:41:18 +1000 (AEST) Received: from localhost (lfbn-ncy-1-985-231.w90-101.abo.wanadoo.fr [90.101.63.231]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 12EAA2070A; Thu, 21 May 2020 12:41:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1590064876; bh=igxjKFnhcuPmlFKYMli0yfykVd0eALHpcuoJKGGL0xg=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=dTw9ChoUj1HRmO17CmilmiPewiK4u2GM1zvZ5VH2Ttspv8+c75/gOAQ+BwEs+zH0v KGAwtvW5j3uPGp+f3zObwtDktr9w3RBkW8K/HeVpEyiW6dTi7MEq6u/M+uDWuj4yQ3 aKUdBlgX2q61gCSMhHLXgRjQlawM73EgnAMR0gQ8= Date: Thu, 21 May 2020 14:41:14 +0200 From: Frederic Weisbecker To: Peter Zijlstra Subject: Re: Endless soft-lockups for compiling workload since next-20200519 Message-ID: <20200521124113.GC15455@lenoir> References: <20200520125056.GC325280@hirez.programming.kicks-ass.net> <20200521004035.GA15455@lenoir> <20200521093938.GG325280@hirez.programming.kicks-ass.net> <20200521104937.GB325303@hirez.programming.kicks-ass.net> <20200521110027.GC325303@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200521110027.GC325303@hirez.programming.kicks-ass.net> User-Agent: Mutt/1.9.4 (2018-02-28) X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Paul E. McKenney" , Linux Kernel Mailing List , Qian Cai , Borislav Petkov , Thomas Gleixner , linuxppc-dev Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Thu, May 21, 2020 at 01:00:27PM +0200, Peter Zijlstra wrote: > On Thu, May 21, 2020 at 12:49:37PM +0200, Peter Zijlstra wrote: > > On Thu, May 21, 2020 at 11:39:39AM +0200, Peter Zijlstra wrote: > > > On Thu, May 21, 2020 at 02:40:36AM +0200, Frederic Weisbecker wrote: > > > > > This: > > > > > > > smp_call_function_single_async() { smp_call_function_single_async() { > > > > // verified csd->flags != CSD_LOCK // verified csd->flags != CSD_LOCK > > > > csd->flags = CSD_LOCK csd->flags = CSD_LOCK > > > > > > concurrent smp_call_function_single_async() using the same csd is what > > > I'm looking at as well. > > > > So something like this ought to cure the fundamental problem and make > > smp_call_function_single_async() more user friendly, but also more > > expensive. > > > > The problem is that while the ILB case is easy to fix, I can't seem to > > find an equally nice solution for the ttwu_remote_queue() case; that > > would basically require sticking the wake_csd in task_struct, I'll also > > post that. > > > > So it's either this: > > Or this: > > --- > include/linux/sched.h | 4 ++++ > kernel/sched/core.c | 7 ++++--- > kernel/sched/fair.c | 2 +- > kernel/sched/sched.h | 1 - > 4 files changed, 9 insertions(+), 5 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index f38d62c4632c..136ee400b568 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -696,6 +696,10 @@ struct task_struct { > struct uclamp_se uclamp[UCLAMP_CNT]; > #endif > > +#ifdef CONFIG_SMP > + call_single_data_t wake_csd; > +#endif > + > #ifdef CONFIG_PREEMPT_NOTIFIERS > /* List of struct preempt_notifier: */ > struct hlist_head preempt_notifiers; > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 5b286469e26e..a7129652e89b 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -2320,7 +2320,7 @@ static void ttwu_queue_remote(struct task_struct *p, int cpu, int wake_flags) > > if (llist_add(&p->wake_entry, &rq->wake_list)) { > if (!set_nr_if_polling(rq->idle)) > - smp_call_function_single_async(cpu, &rq->wake_csd); > + smp_call_function_single_async(cpu, &p->wake_csd); > else > trace_sched_wake_idle_without_ipi(cpu); > } > @@ -2921,6 +2921,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) > #endif > #if defined(CONFIG_SMP) > p->on_cpu = 0; > + p->wake_csd = (struct __call_single_data) { > + .func = wake_csd_func, > + }; > #endif > init_task_preempt_count(p); > #ifdef CONFIG_SMP > @@ -6723,8 +6726,6 @@ void __init sched_init(void) > rq->avg_idle = 2*sysctl_sched_migration_cost; > rq->max_idle_balance_cost = sysctl_sched_migration_cost; > > - rq_csd_init(rq, &rq->wake_csd, wake_csd_func); > - > INIT_LIST_HEAD(&rq->cfs_tasks); > > rq_attach_root(rq, &def_root_domain); > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 01f94cf52783..b6d8a7b991f0 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -10033,7 +10033,7 @@ static void kick_ilb(unsigned int flags) > * is idle. And the softirq performing nohz idle load balance > * will be run before returning from the IPI. > */ > - smp_call_function_single_async(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd); > + smp_call_function_single_async(ilb_cpu, &this_rq()->nohz_csd); My fear here is that if a previous call from the the same CPU but to another target is still pending, the new one will be spuriously ignored. Namely this could happen: CPU 0 CPU 1 ----- ----- local_irq_disable() or VMEXIT kick_ilb() { smp_call_function_single_async(CPU 1, &this_rq()->nohz_csd); } kick_ilb() { smp_call_function_single_async(CPU 2, &this_rq()->nohz_csd) { // IPI to CPU 2 ignored if (csd->flags == CSD_LOCK) return -EBUSY; } } local_irq_enable(); But I believe we can still keep the remote csd if nohz_flags() are strictly only set before the IPI and strictly only cleared from it. And I still don't understand why trigger_load_balance() raise the softirq without setting the current CPU as ilb. run_rebalance_domains() thus ignores it most of the time in the end or it spuriously clear the nohz_flags set by an IPI sender. Or there is something I misunderstood there. (Haven't checked the wake up case yet).