Date: Thu, 10 Jan 2013 17:17:29 +0000
From: Morten Rasmussen
To: Alex Shi
Cc: mingo@redhat.com, peterz@infradead.org, tglx@linutronix.de,
    akpm@linux-foundation.org, arjan@linux.intel.com, bp@alien8.de,
    pjt@google.com, namhyung@kernel.org, efault@gmx.de,
    vincent.guittot@linaro.org, gregkh@linuxfoundation.org,
    preeti@linux.vnet.ibm.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing
Message-ID: <20130110171728.GG2046@e103034-lin>
In-Reply-To: <1357375071-11793-18-git-send-email-alex.shi@intel.com>

On Sat, Jan 05, 2013 at 08:37:46AM +0000, Alex Shi wrote:
> If the wake/exec task is small enough, utils < 12.5%, it will
> has the chance to be packed into a cpu which is busy but still has space to
> handle it.
>
> Signed-off-by: Alex Shi
> ---
>  kernel/sched/fair.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 45 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8d0d3af..0596e81 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3471,19 +3471,57 @@ static inline int get_sd_sched_policy(struct sched_domain *sd,
>  }
>
>  /*
> + * find_leader_cpu - find the busiest but still has enough leisure time cpu
> + * among the cpus in group.
> + */
> +static int
> +find_leader_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
> +{
> +	unsigned vacancy, min_vacancy = UINT_MAX;

unsigned int?

> +	int idlest = -1;
> +	int i;
> +	/* percentage the task's util */
> +	unsigned putil = p->se.avg.runnable_avg_sum * 100
> +				/ (p->se.avg.runnable_avg_period + 1);

Alternatively, you could use se.avg.load_avg_contrib, which is the same
ratio scaled by the task priority (se->load.weight). The expression above
doesn't take priority into account.

> +
> +	/* Traverse only the allowed CPUs */
> +	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> +		struct rq *rq = cpu_rq(i);
> +		int nr_running = rq->nr_running > 0 ? rq->nr_running : 1;
> +
> +		/* only pack task which putil < 12.5% */
> +		vacancy = FULL_UTIL - (rq->util * nr_running + putil * 8);

I can't follow this expression. The variables can have the following
values:

	FULL_UTIL   = 99
	rq->util    = [0..99]
	nr_running  = [1..inf]
	putil       = [0..99]

Why multiply rq->util by nr_running? Take an example where rq->util = 50,
nr_running = 2, and putil = 10. Here the value of putil doesn't really
matter, since vacancy is negative regardless: FULL_UTIL - rq->util *
nr_running is already -1. Yet with rq->util = 50 there should be plenty of
spare cpu time to take another task.

Also, why multiply putil by 8? rq->util must be very close to 0 for
vacancy to be positive when putil is close to 12 (12.5%).
The vacancy variable is declared unsigned, so it will underflow instead of
going negative. Is that intentional?

I may be missing something, but could the expression be simplified? Do the
putil < 12.5% check once before the loop (there is no reason to recheck it
on every iteration), and then

	vacancy = FULL_UTIL - (rq->util + putil)

should be enough?

> +
> +		/* bias toward local cpu */
> +		if (vacancy > 0 && (i == this_cpu))
> +			return i;
> +
> +		if (vacancy > 0 && vacancy < min_vacancy) {
> +			min_vacancy = vacancy;
> +			idlest = i;

"idlest" may be a bit misleading here, as you actually select the busiest
cpu that has enough spare capacity to take the task.

Morten

> +		}
> +	}
> +	return idlest;
> +}
> +
> +/*
>   * If power policy is eligible for this domain, and it has task allowed cpu.
>   * we will select CPU from this domain.
>   */
>  static int get_cpu_for_power_policy(struct sched_domain *sd, int cpu,
> -		struct task_struct *p, struct sd_lb_stats *sds)
> +		struct task_struct *p, struct sd_lb_stats *sds, int fork)
>  {
>  	int policy;
>  	int new_cpu = -1;
>
>  	policy = get_sd_sched_policy(sd, cpu, p, sds);
> -	if (policy != SCHED_POLICY_PERFORMANCE && sds->group_leader)
> -		new_cpu = find_idlest_cpu(sds->group_leader, p, cpu);
> -
> +	if (policy != SCHED_POLICY_PERFORMANCE && sds->group_leader) {
> +		if (!fork)
> +			new_cpu = find_leader_cpu(sds->group_leader, p, cpu);
> +		/* for fork balancing and a little busy task */
> +		if (new_cpu == -1)
> +			new_cpu = find_idlest_cpu(sds->group_leader, p, cpu);
> +	}
>  	return new_cpu;
>  }
>
> @@ -3534,14 +3572,15 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int flags)
>  		if (tmp->flags & sd_flag) {
>  			sd = tmp;
>
> -			new_cpu = get_cpu_for_power_policy(sd, cpu, p, &sds);
> +			new_cpu = get_cpu_for_power_policy(sd, cpu, p, &sds,
> +					flags & SD_BALANCE_FORK);
>  			if (new_cpu != -1)
>  				goto unlock;
>  		}
>  	}
>
>  	if (affine_sd) {
> -		new_cpu = get_cpu_for_power_policy(affine_sd, cpu, p, &sds);
> +		new_cpu = get_cpu_for_power_policy(affine_sd, cpu, p, &sds, 0);
>  		if (new_cpu != -1)
>  			goto unlock;
>
> --
> 1.7.12