From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1756164AbbI2Oza (ORCPT ); Tue, 29 Sep 2015 10:55:30 -0400
Received: from mail-wi0-f173.google.com ([209.85.212.173]:36736 "EHLO
 mail-wi0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1753943AbbI2Oz3 (ORCPT ); Tue, 29 Sep 2015 10:55:29 -0400
Message-ID: <1443538525.27815.47.camel@gmail.com>
Subject: Re: [PATCH] sched/fair: Skip wake_affine() for core siblings
From: Mike Galbraith
To: Kirill Tkhai
Cc: linux-kernel@vger.kernel.org, Peter Zijlstra , Ingo Molnar
Date: Tue, 29 Sep 2015 16:55:25 +0200
In-Reply-To: <56095E7C.7080300@odin.com>
References: <56058A3F.5060408@odin.com>
 <1443281111.3521.30.camel@gmail.com>
 <56091651.6070607@odin.com>
 <1443445947.3529.48.camel@gmail.com>
 <56095E7C.7080300@odin.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.12.11
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2015-09-28 at 18:36 +0300, Kirill Tkhai wrote:

> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4df37a4..dfbe06b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4930,8 +4930,13 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>  	int want_affine = 0;
>  	int sync = wake_flags & WF_SYNC;
>  
> -	if (sd_flag & SD_BALANCE_WAKE)
> -		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
> +	if (sd_flag & SD_BALANCE_WAKE) {
> +		want_affine = 1;
> +		if (cpu == prev_cpu || !cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
> +			goto want_affine;
> +		if (wake_wide(p))
> +			goto want_affine;
> +	}

That blew wake_wide() right out of the water. It's not only about things like pgbench.
Drive multiple tasks in a Xen guest (single event channel dom0 -> domu, and no select_idle_sibling() to save the day) via network, and watch workers fail to be all they can be because they keep being stacked up on the irq source. Load balancing yanks them apart, next irq stacks them right back up. I met that in enterprise land, thought wake_wide() should cure it, and indeed it did.

	-Mike
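
For readers following the thread, the role wake_wide() plays in the pre-patch condition can be sketched in userspace. This is a rough model, assuming the flip-count heuristic as it stood in fair.c around that time; struct task_model, wake_wide_model(), want_affine_model() and the llc_size parameter are hypothetical names for illustration, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task wakeup bookkeeping. */
struct task_model {
	unsigned int wakee_flips;	/* how often the task switched wakees */
};

/*
 * Model of the wake_wide() heuristic (an assumption, not the kernel code).
 * llc_size stands in for the number of CPUs sharing a last-level cache.
 * Returns true when the wakee should NOT be pulled to the waker's CPU,
 * i.e. one side fans out to many partners (one irq source, many workers).
 */
static bool wake_wide_model(const struct task_model *waker,
			    const struct task_model *wakee,
			    unsigned int llc_size)
{
	unsigned int master = waker->wakee_flips;
	unsigned int slave = wakee->wakee_flips;

	assert(llc_size > 0);

	/* Treat the side with more flips as the "master". */
	if (master < slave) {
		unsigned int tmp = master;

		master = slave;
		slave = tmp;
	}

	/* Few flips on either side: a 1:1 relationship, keep it affine. */
	if (slave < llc_size || master < slave * llc_size)
		return false;

	return true;
}

/*
 * The pre-patch condition from the quoted diff: affine wakeups are
 * considered only when wake_wide() says no and the waking CPU is allowed.
 */
static bool want_affine_model(const struct task_model *waker,
			      const struct task_model *wakee,
			      unsigned int llc_size, bool cpu_allowed)
{
	return !wake_wide_model(waker, wakee, llc_size) && cpu_allowed;
}
```

With llc_size = 4, a waker at 100 flips waking a worker at 5 flips counts as wide, so the worker is not stacked on the waker's CPU; a waker at 2 flips and a wakee at 1 stays affine.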