From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753575AbbCXSr5 (ORCPT <rfc822;w@1wt.eu>);
	Tue, 24 Mar 2015 14:47:57 -0400
Received: from service87.mimecast.com ([91.220.42.44]:34688 "EHLO
	service87.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752096AbbCXSr4 convert rfc822-to-8bit (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 24 Mar 2015 14:47:56 -0400
Message-ID: <5511B157.6030200@arm.com>
Date: Tue, 24 Mar 2015 18:47:51 +0000
From: Dietmar Eggemann <dietmar.eggemann@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
MIME-Version: 1.0
To: Peter Zijlstra <peterz@infradead.org>,
        Morten Rasmussen <Morten.Rasmussen@arm.com>
CC: "mingo@redhat.com" <mingo@redhat.com>,
        "vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
        "yuyang.du@intel.com" <yuyang.du@intel.com>,
        "preeti@linux.vnet.ibm.com" <preeti@linux.vnet.ibm.com>,
        "mturquette@linaro.org" <mturquette@linaro.org>,
        "nico@linaro.org" <nico@linaro.org>,
        "rjw@rjwysocki.net" <rjw@rjwysocki.net>,
        Juri Lelli <Juri.Lelli@arm.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFCv3 PATCH 44/48] sched: Tipping point from energy-aware to
 conventional load balancing
References: <1423074685-6336-1-git-send-email-morten.rasmussen@arm.com> <1423074685-6336-45-git-send-email-morten.rasmussen@arm.com> <20150324152655.GT23123@twins.programming.kicks-ass.net>
In-Reply-To: <20150324152655.GT23123@twins.programming.kicks-ass.net>
X-OriginalArrivalTime: 24 Mar 2015 18:47:51.0949 (UTC) FILETIME=[0838F3D0:01D06663]
X-MC-Unique: 115032418475301501
Content-Type: text/plain; charset=WINDOWS-1252; format=flowed
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 24/03/15 15:26, Peter Zijlstra wrote:
> On Wed, Feb 04, 2015 at 06:31:21PM +0000, Morten Rasmussen wrote:
>> From: Dietmar Eggemann <dietmar.eggemann@arm.com>
>>
>> Energy-aware load balancing bases on cpu usage so the upper bound of its
>> operational range is a fully utilized cpu. Above this tipping point it
>> makes more sense to use weighted_cpuload to preserve smp_nice.
>> This patch implements the tipping point detection in update_sg_lb_stats
>> as if one cpu is over-utilized the current energy-aware load balance
>> operation will fall back into the conventional weighted load based one.
>>
>> cc: Ingo Molnar <mingo@redhat.com>
>> cc: Peter Zijlstra <peterz@infradead.org>
>>
>> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
>> ---
>>   kernel/sched/fair.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 6b79603..4849bad 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6723,6 +6723,10 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>>   		sgs->sum_weighted_load += weighted_cpuload(i);
>>   		if (idle_cpu(i))
>>   			sgs->idle_cpus++;
>> +
>> +		/* If cpu is over-utilized, bail out of ea */
>> +		if (env->use_ea && cpu_overutilized(i, env->sd))
>> +			env->use_ea = false;
>>   	}
>
> I don't immediately see why this is desired. Why would a single
> overloaded CPU be reason to quit? It could be the cpus simply aren't
> 'balanced' right and the group as a whole is still under utilized.

We want to play it safe here.

E.g. in a system with more than two clusters, this over-utilized cpu
could be running more than one high-priority task on a cluster of
energy-efficient cpus, and that cluster could still fail to become the
lb src at DIE level because a cluster of less energy-efficient cpus
(burning more energy) that is not over-utilized could be chosen
instead. We could also construct cases where the other cpus in the
energy-efficient cluster can't help the over-utilized cpu during lb at
MC level.

I can see, though, that using per-cpu data in code which deals with
sg's goes against the sd scalability design, where we should rely on
per-sg rather than per-cpu data.

By bailing out in such a scenario we at least guarantee the smpnice
behaviour provided by conventional CFS.
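
To make the difference between the two views concrete, here is a
minimal userspace sketch. It is not kernel code: struct cpu_model, the
model_* helpers and the "usage >= capacity" threshold are all
assumptions made for illustration, and the real cpu_overutilized() is
not shown in this mail. It only models the per-cpu bail-out from the
hunk above next to a group-wide check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, not the kernel's data structures. */
struct cpu_model {
	unsigned long usage;	/* current cpu usage */
	unsigned long capacity;	/* capacity available to tasks */
};

/* Assumption: a cpu counts as over-utilized once usage reaches capacity. */
static bool model_cpu_overutilized(const struct cpu_model *cpu)
{
	return cpu->usage >= cpu->capacity;
}

/*
 * Per-cpu rule as in the patch: start with energy-aware balancing
 * enabled and clear the flag as soon as one cpu is over-utilized.
 */
static bool model_use_ea_per_cpu(const struct cpu_model *cpus, size_t nr)
{
	bool use_ea = true;

	for (size_t i = 0; i < nr; i++) {
		if (use_ea && model_cpu_overutilized(&cpus[i]))
			use_ea = false;
	}
	return use_ea;
}

/* Group-wide view: compare summed usage against summed capacity. */
static bool model_group_overutilized(const struct cpu_model *cpus, size_t nr)
{
	unsigned long usage = 0, capacity = 0;

	for (size_t i = 0; i < nr; i++) {
		usage += cpus[i].usage;
		capacity += cpus[i].capacity;
	}
	return usage >= capacity;
}
```

With one cpu at full capacity and three nearly idle ones, the per-cpu
rule gives up on energy-aware balancing while the group-wide check
still reports spare capacity, which is exactly the case you describe.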

We could also favor an sg with an over-utilized cpu to become the src,
but which one do we pick if there are multiple potential src sg's with
an over-utilized cpu?
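
The ambiguity can be shown with a small userspace sketch. Everything
in it is hypothetical (struct group_model, MODEL_MAX_CPUS, the model_*
helpers and the "usage >= capacity" threshold); it only counts how many
sg's would qualify as src under such a rule and proposes no
tie-breaking policy.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CPUS 8	/* arbitrary bound for this sketch */

/* Hypothetical model of a sched group, not the kernel's struct. */
struct group_model {
	size_t nr_cpus;
	unsigned long usage[MODEL_MAX_CPUS];
	unsigned long capacity[MODEL_MAX_CPUS];
};

/* Assumption: a cpu is over-utilized once usage reaches capacity. */
static bool model_group_has_overutilized_cpu(const struct group_model *g)
{
	for (size_t i = 0; i < g->nr_cpus && i < MODEL_MAX_CPUS; i++) {
		if (g->usage[i] >= g->capacity[i])
			return true;
	}
	return false;
}

/*
 * Count the groups that contain an over-utilized cpu. The index of the
 * first such group is stored in *first; *first is left untouched when
 * there is none. A result above 1 means the rule alone cannot pick a
 * unique src.
 */
static size_t model_count_src_candidates(const struct group_model *groups,
					 size_t nr_groups, size_t *first)
{
	size_t count = 0;

	for (size_t i = 0; i < nr_groups; i++) {
		if (!model_group_has_overutilized_cpu(&groups[i]))
			continue;
		if (count == 0)
			*first = i;
		count++;
	}
	return count;
}
```

As soon as this returns more than 1 we would need an additional
criterion to choose between the candidates.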

>
> In that case we want to continue the balance pass to reach this
> equilibrium.
>