Subject: Re: [PATCH 3/4] sched: drop group_capacity to 1 only if local group has extra capacity
From: Peter Zijlstra
To: Nikhil Rao
Cc: Ingo Molnar, Mike Galbraith, Suresh Siddha, Venkatesh Pallipadi,
	linux-kernel@vger.kernel.org
In-Reply-To: <1287035281-25579-1-git-send-email-ncrao@google.com>
References: <1286996978-7007-4-git-send-email-ncrao@google.com>
	<1287035281-25579-1-git-send-email-ncrao@google.com>
Date: Fri, 15 Oct 2010 13:50:23 +0200
Message-ID: <1287143423.29097.1460.camel@twins>

On Wed, 2010-10-13 at 22:48 -0700, Nikhil Rao wrote:
> Resending this patch since the original patch was munged. Thanks to Mike
> Galbraith for pointing this out.
> 
> When SD_PREFER_SIBLING is set on a sched domain, drop group_capacity to 1
> only if the local group has extra capacity. For niced task balancing, we pull
> low weight tasks away from a sched group as long as there is capacity in other
> groups. When all other groups are saturated, we do not drop the capacity of
> the niced group down to 1. This prevents active balance from kicking out the
> low weight threads, which would hurt system utilization.
> 
> Signed-off-by: Nikhil Rao
> ---
>  kernel/sched_fair.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 0dd1021..da0c688 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2030,6 +2030,7 @@ struct sd_lb_stats {
>  	unsigned long this_load;
>  	unsigned long this_load_per_task;
>  	unsigned long this_nr_running;
> +	unsigned long this_group_capacity;
>  
>  	/* Statistics of the busiest group */
>  	unsigned long max_load;
> @@ -2546,15 +2547,18 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
>  		/*
>  		 * In case the child domain prefers tasks go to siblings
>  		 * first, lower the sg capacity to one so that we'll try
> -		 * and move all the excess tasks away.
> +		 * and move all the excess tasks away. We lower capacity only
> +		 * if the local group can handle the extra capacity.
>  		 */
> -		if (prefer_sibling)
> +		if (prefer_sibling && !local_group &&
> +		    sds->this_nr_running < sds->this_group_capacity)
>  			sgs.group_capacity = min(sgs.group_capacity, 1UL);
>  
>  		if (local_group) {
>  			sds->this_load = sgs.avg_load;
>  			sds->this = sg;
>  			sds->this_nr_running = sgs.sum_nr_running;
> +			sds->this_group_capacity = sgs.group_capacity;
>  			sds->this_load_per_task = sgs.sum_weighted_load;
>  		} else if (update_sd_pick_busiest(sd, sds, sg, &sgs, this_cpu)) {
>  			sds->max_load = sgs.avg_load;

OK, this thing confuses me; neither the changelog nor the comment actually
helps me understand it..
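
For readers following the thread, the decision the patch changes is small
enough to model in isolation. Below is a minimal, self-contained userspace
sketch, not the kernel code: clamp_group_capacity and struct local_stats are
simplified, hypothetical stand-ins for the sd_lb_stats/sg_lb_stats bookkeeping
in the diff above. It shows the intended rule: with SD_PREFER_SIBLING, a
remote group's capacity is clamped to 1 only while the local group still has
room for the tasks that would be pulled over.

	#include <stdio.h>

	struct local_stats {
		unsigned long nr_running;     /* tasks currently in the local group */
		unsigned long group_capacity; /* tasks the local group has room for */
	};

	/*
	 * Old behaviour: with prefer_sibling set, every non-local group's
	 * capacity was clamped to 1, so excess tasks were always pulled away.
	 * New behaviour: clamp only while the local group has spare capacity,
	 * so a saturated system stops active-balancing low-weight tasks around.
	 */
	static unsigned long clamp_group_capacity(int prefer_sibling, int local_group,
						  unsigned long group_capacity,
						  const struct local_stats *local)
	{
		if (prefer_sibling && !local_group &&
		    local->nr_running < local->group_capacity)
			return group_capacity < 1UL ? group_capacity : 1UL; /* min(cap, 1UL) */
		return group_capacity;
	}

	int main(void)
	{
		struct local_stats local = { .nr_running = 2, .group_capacity = 2 };

		/* Local group is full: the remote group keeps its capacity of 4. */
		printf("%lu\n", clamp_group_capacity(1, 0, 4UL, &local));

		/* Local group frees a slot: the remote group is clamped to 1. */
		local.nr_running = 1;
		printf("%lu\n", clamp_group_capacity(1, 0, 4UL, &local));
		return 0;
	}

Built with a plain cc, this prints 4 and then 1: while the local group is
saturated the remote (e.g. niced) group keeps its capacity and its tasks stay
put; as soon as the local group has a spare slot, the clamp kicks in and the
remote group's excess tasks become eligible to move.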