From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751572AbdITRVX (ORCPT ); Wed, 20 Sep 2017 13:21:23 -0400
Received: from foss.arm.com ([217.140.101.70]:36324 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750825AbdITRVV (ORCPT ); Wed, 20 Sep 2017 13:21:21 -0400
References: <20170807163900.25180-1-brendan.jackman@arm.com>
User-agent: mu4e 0.9.17; emacs 25.1.1
From: Brendan Jackman
To: linux-kernel@vger.kernel.org
To: Josef Bacik, Peter Zijlstra
Cc: Ingo Molnar, Morten Rasmussen, Mike Galbraith, Paul Turner
Subject: Re: [PATCH] sched/fair: Force balancing on nohz balance if local group has capacity
In-reply-to: <20170807163900.25180-1-brendan.jackman@arm.com>
Date: Wed, 20 Sep 2017 18:21:20 +0100
Message-ID: <87h8vxi8vj.fsf@arm.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Peter, Josef,

Do you have any thoughts on this one?

On Mon, Aug 07 2017 at 16:39, Brendan Jackman wrote:
> The "goto force_balance" here is intended to mitigate the fact that
> avg_load calculations can result in bad placement decisions when
> priority is asymmetrical. From the original commit (fab476228ba3
> "sched: Force balancing on newidle balance if local group has
> capacity") that adds it:
>
>     Under certain situations, such as a niced down task (i.e. nice =
>     -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
>     on a sched group and kicks away other tasks because of its large
>     weight. This leads to sub-optimal utilization of the
>     machine. Even though the sched group has capacity, it does not
>     pull tasks because sds.this_load >> sds.max_load, and f_b_g()
>     returns NULL.
>
> A similar but inverted issue also affects ARM
> big.LITTLE (asymmetrical CPU capacity) systems - consider 8
> always-running, same-priority tasks on a system with 4 "big" and 4
> "little" CPUs. Suppose that 5 of them end up on the "big" CPUs (which
> will be represented by one sched_group in the DIE sched_domain) and 3
> on the "little" (the other sched_group in DIE), leaving one CPU
> unused. Because the "big" group has a higher group_capacity its
> avg_load may not present an imbalance that would cause migrating a
> task to the idle "little".
>
> The force_balance case here solves the problem but currently only for
> CPU_NEWLY_IDLE balances, which in theory might never happen on the
> unused CPU. Including CPU_IDLE in the force_balance case means
> there's an upper bound on the time before we can attempt to solve the
> underutilization: after DIE's sd->balance_interval has passed the
> next nohz balance kick will help us out.
>
> Signed-off-by: Brendan Jackman
> Cc: Ingo Molnar
> Cc: Morten Rasmussen
> Cc: Peter Zijlstra
> Cc: Mike Galbraith
> Cc: Paul Turner
> ---
>  kernel/sched/fair.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c95880e216f6..63eff3e881a0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7801,8 +7801,11 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>  	if (busiest->group_type == group_imbalanced)
>  		goto force_balance;
>
> -	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
> -	if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
> +	/*
> +	 * When dst_cpu is idle, prevent SMP nice and/or asymmetric group
> +	 * capacities from resulting in underutilization due to avg_load.
> +	 */
> +	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
>  	    busiest->group_no_capacity)
>  		goto force_balance;
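
To make the big.LITTLE case from the commit message concrete, here is a
rough user-space sketch of the arithmetic. It is not kernel code. It
assumes avg_load is group_load * SCHED_CAPACITY_SCALE / group_capacity,
that each always-running task contributes a load of 1024, that a "big"
CPU has capacity 1024, and that a "little" CPU has capacity 446 (an
illustrative figure only). The struct, the enum and every function name
other than SCHED_CAPACITY_SCALE are hypothetical, and should_pull() is
a heavily simplified stand-in for the decision in find_busiest_group().

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Hypothetical, simplified stand-in for per-group load balance stats. */
struct group_stats {
	unsigned long group_load;     /* sum of task loads in the group */
	unsigned long group_capacity; /* sum of CPU capacities in the group */
	unsigned int nr_running;      /* runnable tasks in the group */
	unsigned int nr_cpus;         /* CPUs in the group */
};

/* Hypothetical mirror of the idle state of the destination CPU. */
enum idle_state { STATE_NOT_IDLE, STATE_IDLE, STATE_NEWLY_IDLE };

/* Group load scaled by group capacity. */
unsigned long avg_load(const struct group_stats *g)
{
	assert(g->group_capacity > 0);
	return g->group_load * SCHED_CAPACITY_SCALE / g->group_capacity;
}

/* Assumption: "has capacity" is approximated as fewer tasks than CPUs. */
bool has_spare_cpu(const struct group_stats *g)
{
	return g->nr_running < g->nr_cpus;
}

/* Assumption: "no capacity" is approximated as more tasks than CPUs. */
bool has_no_capacity(const struct group_stats *g)
{
	return g->nr_running > g->nr_cpus;
}

/*
 * Simplified decision: take the force_balance shortcut if it applies,
 * otherwise only pull when the busiest group's avg_load is higher.
 * 'patched' selects the condition proposed in the patch.
 */
bool should_pull(const struct group_stats *local,
		 const struct group_stats *busiest,
		 enum idle_state idle, bool patched)
{
	bool idle_ok = patched ? idle != STATE_NOT_IDLE
			       : idle == STATE_NEWLY_IDLE;

	if (idle_ok && has_spare_cpu(local) && has_no_capacity(busiest))
		return true; /* force_balance */

	return avg_load(local) < avg_load(busiest);
}
```

With 5 tasks on the "big" group and 3 on the "little" group, these
assumptions give avg_load = 1280 for "big" and 1763 for "little". The
"little" group therefore looks more loaded even though it has an unused
CPU, and in this sketch a CPU_IDLE balance pulls a task only with the
patched condition.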