From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934295AbbI2QDh (ORCPT ); Tue, 29 Sep 2015 12:03:37 -0400
Received: from mx2.parallels.com ([199.115.105.18]:57739 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932168AbbI2QD2 (ORCPT ); Tue, 29 Sep 2015 12:03:28 -0400
Subject: Re: [PATCH] sched/fair: Skip wake_affine() for core siblings
To: Mike Galbraith
References: <56058A3F.5060408@odin.com> <1443281111.3521.30.camel@gmail.com> <56091651.6070607@odin.com> <1443445947.3529.48.camel@gmail.com> <56095E7C.7080300@odin.com> <1443538525.27815.47.camel@gmail.com> <560AB591.4070407@odin.com>
CC: , Peter Zijlstra , Ingo Molnar
From: Kirill Tkhai
Message-ID: <560AB648.8090009@odin.com>
Date: Tue, 29 Sep 2015 19:03:20 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.2.0
MIME-Version: 1.0
In-Reply-To: <560AB591.4070407@odin.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
X-ClientProxiedBy: US-EXCH2.sw.swsoft.com (10.255.249.46) To US-EXCH2.sw.swsoft.com (10.255.249.46)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 29.09.2015 19:00, Kirill Tkhai wrote:
>
>
> On 29.09.2015 17:55, Mike Galbraith wrote:
>> On Mon, 2015-09-28 at 18:36 +0300, Kirill Tkhai wrote:
>>
>>> ---
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 4df37a4..dfbe06b 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4930,8 +4930,13 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>>>  	int want_affine = 0;
>>>  	int sync = wake_flags & WF_SYNC;
>>>
>>> -	if (sd_flag & SD_BALANCE_WAKE)
>>> -		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
>>> +	if (sd_flag & SD_BALANCE_WAKE) {
>>> +		want_affine = 1;
>>> +		if (cpu == prev_cpu || !cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
>>> +			goto want_affine;
>>> +		if (wake_wide(p))
>>> +			goto want_affine;
>>> +	}
>>
>> That blew wake_wide() right out of the water.
>>
>> It's not only about things like pgbench. Drive multiple tasks in a Xen
>> guest (single event channel dom0 -> domu, and no select_idle_sibling()
>> to save the day) via network, and watch workers fail to be all they can
>> be because they keep being stacked up on the irq source. Load balancing
>> yanks them apart, next irq stacks them right back up. I met that in
>> enterprise land, thought wake_wide() should cure it, and indeed it did.
>
> 1) Hm.. The patch makes select_task_rq_fair() prefer the old cpu instead of
> the current one, doesn't it? We more often don't set affine_sd. So, the skipped
> part of the patch (skipped in the quote) selects prev_cpu.
>
> 2) I thought about wakeups from an irq handler and was even going to ask why
> we use affine logic for such wakeups. Device handlers usually aren't
> bound, and timers may migrate since NO_HZ logic is present. The only explanation
> I found is that unbound timers are a very unlikely case (I added a statistics printk
> to my local sched_debug to check that). But if we have situations like the one
> you described above, don't we have to disable affine logic for in_interrupt()
> cases?
>
> 3) I ask just because (being outside of scheduler history) it's a little
> bit strange that we prefer smp_processor_id()'s sd_llc so much. The profit of a
> sync wakeup is more or less clear: smp_processor_id()'s sd_llc may contain some
> data which is interesting for a wakee, and this minimizes cache misses.
> But we do the same in other cases too, and at every migration we lose
> itlb, dtlb... Of course, it requires more accurate patches than the posted

***typo: instruction and data caches

> (not so rude patches).
>
> Thanks,
> Kirill
>