From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S934295AbbI2QDh (ORCPT <rfc822;w@1wt.eu>);
	Tue, 29 Sep 2015 12:03:37 -0400
Received: from mx2.parallels.com ([199.115.105.18]:57739 "EHLO
	mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932168AbbI2QD2 (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 29 Sep 2015 12:03:28 -0400
Subject: Re: [PATCH] sched/fair: Skip wake_affine() for core siblings
To: Mike Galbraith <umgwanakikbuti@gmail.com>
References: <56058A3F.5060408@odin.com> <1443281111.3521.30.camel@gmail.com>
 <56091651.6070607@odin.com> <1443445947.3529.48.camel@gmail.com>
 <56095E7C.7080300@odin.com> <1443538525.27815.47.camel@gmail.com>
 <560AB591.4070407@odin.com>
CC: <linux-kernel@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org>,
        Ingo Molnar <mingo@redhat.com>
From: Kirill Tkhai <ktkhai@odin.com>
Message-ID: <560AB648.8090009@odin.com>
Date: Tue, 29 Sep 2015 19:03:20 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Icedove/38.2.0
MIME-Version: 1.0
In-Reply-To: <560AB591.4070407@odin.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
X-ClientProxiedBy: US-EXCH2.sw.swsoft.com (10.255.249.46) To
 US-EXCH2.sw.swsoft.com (10.255.249.46)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


On 29.09.2015 19:00, Kirill Tkhai wrote:
> 
> 
> On 29.09.2015 17:55, Mike Galbraith wrote:
>> On Mon, 2015-09-28 at 18:36 +0300, Kirill Tkhai wrote:
>>
>>> ---
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 4df37a4..dfbe06b 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4930,8 +4930,13 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>>>  	int want_affine = 0;
>>>  	int sync = wake_flags & WF_SYNC;
>>>  
>>> -	if (sd_flag & SD_BALANCE_WAKE)
>>> -		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
>>> +	if (sd_flag & SD_BALANCE_WAKE) {
>>> +		want_affine = 1;
>>> +		if (cpu == prev_cpu || !cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
>>> +			goto want_affine;
>>> +		if (wake_wide(p))
>>> +			goto want_affine;
>>> +	}
>>
>> That blew wake_wide() right out of the water.
>>
>> It's not only about things like pgbench.  Drive multiple tasks in a Xen
>> guest (single event channel dom0 -> domu, and no select_idle_sibling()
>> to save the day) via network, and watch workers fail to be all they can
>> be because they keep being stacked up on the irq source.  Load balancing
>> yanks them apart, next irq stacks them right back up.  I met that in
>> enterprise land, thought wake_wide() should cure it, and indeed it did.
> 
> 1) Hm.. The patch makes select_task_rq_fair() prefer the old cpu over the
> current one, doesn't it? We leave affine_sd unset more often, so the part
> of the patch that was skipped in the quote selects prev_cpu.
> 
> 2) I thought about wakeups done by irq handlers, and was even going to ask
> why we use affine logic for such wakeups. Device handlers usually aren't
> bound, and timers may migrate because of the NO_HZ logic. The only explanation
> I found is that unbound timers are a very unlikely case (I added a statistics
> printk to my local sched_debug to check that). But if we have situations like
> the one you described above, shouldn't we disable affine logic for
> in_interrupt() cases?
> 
> 3) I am asking only because (being outside of scheduler history) it seems a
> little strange that we prefer smp_processor_id()'s sd_llc so much. The benefit
> of a sync wakeup is more or less clear: smp_processor_id()'s sd_llc may hold
> data that is interesting for the wakee, and this minimizes cache misses.
> But we do the same in other cases too, and at every migration we lose
> itlb, dtlb... Of course, it requires more careful patches than the ones posted

*** typo: I meant the instruction and data caches, not itlb and dtlb

> (not such crude patches).
> 
> Thanks,
> Kirill
>
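
Regarding point 2 above, here is a minimal userspace sketch of what
disabling the affine logic for interrupt-context wakeups could look like.
It is not kernel code and not part of the posted patch: the function
want_affine_wakeup() and its boolean parameters are hypothetical stand-ins
for in_interrupt(), wake_wide(p) and
cpumask_test_cpu(cpu, tsk_cpus_allowed(p)).

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the want_affine decision.
 *
 * in_irq:      stand-in for in_interrupt() at wakeup time (assumption)
 * wide:        stand-in for wake_wide(p)
 * cpu_allowed: stand-in for cpumask_test_cpu(cpu, tsk_cpus_allowed(p))
 *
 * Returns true when the waking cpu should be considered as an affine
 * target for the wakee.
 */
static bool want_affine_wakeup(bool in_irq, bool wide, bool cpu_allowed)
{
	/* Assumed policy: wakeups from interrupt context are never affine. */
	if (in_irq)
		return false;

	/* Otherwise keep the pre-patch condition from select_task_rq_fair(). */
	return !wide && cpu_allowed;
}
```

With this model, a wakeup from an irq handler never pulls the wakee to the
irq source, while task-context wakeups behave as in the original condition.
Whether that is the right policy would need measurements.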