From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 9 Sep 2015 11:43:05 +0200
From: Peter Zijlstra
To: Morten Rasmussen
Cc: Vincent Guittot, Dietmar Eggemann, Steve Muckle,
	"mingo@redhat.com", "daniel.lezcano@linaro.org", "yuyang.du@intel.com",
	"mturquette@baylibre.com", "rjw@rjwysocki.net", Juri Lelli,
	"sgurrappadi@nvidia.com", "pang.xunlei@zte.com.cn",
	"linux-kernel@vger.kernel.org"
Subject: Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig
Message-ID: <20150909094305.GO3644@twins.programming.kicks-ass.net>
References: <1439569394-11974-6-git-send-email-morten.rasmussen@arm.com>
 <55E8DD00.2030706@linaro.org>
 <55EDAF43.30500@arm.com>
 <55EDDD5A.70904@arm.com>
 <20150908122606.GH3644@twins.programming.kicks-ass.net>
 <20150908125205.GW18673@twins.programming.kicks-ass.net>
 <20150908143157.GA27098@e105550-lin.cambridge.arm.com>
 <20150908165331.GC27098@e105550-lin.cambridge.arm.com>
In-Reply-To: <20150908165331.GC27098@e105550-lin.cambridge.arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2012-12-30)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Sep 08, 2015 at 05:53:31PM +0100, Morten Rasmussen wrote:
> On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> > On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> > But if we apply the scaling to the weight instead of time, we would only
> > have to apply it once and not three times like it is now? So maybe we
> > can end up with almost the same number of multiplications.
> >
> > We might be loosing bits for low priority task running on cpus at a low
> > frequency though.
>
> Something like the below. We should be saving one multiplication.
>
> @@ -2577,8 +2575,13 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> 		return 0;
> 	sa->last_update_time = now;
>
> -	scale_freq = arch_scale_freq_capacity(NULL, cpu);
> -	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> +	if (weight || running)
> +		scale_freq = arch_scale_freq_capacity(NULL, cpu);
> +	if (weight)
> +		scaled_weight = weight * scale_freq >> SCHED_CAPACITY_SHIFT;
> +	if (running)
> +		scale_freq_cpu = scale_freq * arch_scale_cpu_capacity(NULL, cpu)
> +			>> SCHED_CAPACITY_SHIFT;
>
> 	/* delta_w is the amount already accumulated against our next period */
> 	delta_w = sa->period_contrib;
> @@ -2594,16 +2597,15 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> 		 * period and accrue it.
> 		 */
> 		delta_w = 1024 - delta_w;
> -		scaled_delta_w = cap_scale(delta_w, scale_freq);
>
> 		if (weight) {
> -			sa->load_sum += weight * scaled_delta_w;
> +			sa->load_sum += scaled_weight * delta_w;
> 			if (cfs_rq) {
> 				cfs_rq->runnable_load_sum +=
> -					weight * scaled_delta_w;
> +					scaled_weight * delta_w;
> 			}
> 		}
> 		if (running)
> -			sa->util_sum += scaled_delta_w * scale_cpu;
> +			sa->util_sum += delta_w * scale_freq_cpu;
>
> 		delta -= delta_w;
>

Sadly that makes the code worse; I get 14 mul instructions where
previously I had 11.

What happens is that GCC gets confused and cannot constant propagate the
new variables, so what used to be shifts now end up being actual
multiplications.

With this, I get back to 11. Can you see what happens on ARM where you
have both functions defined to non constants?
---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2551,10 +2551,10 @@ static __always_inline int
 __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
 {
+	unsigned long scaled_weight, scale_freq, scale_freq_cpu;
+	unsigned int delta_w, decayed = 0;
 	u64 delta, periods;
 	u32 contrib;
-	unsigned int delta_w, decayed = 0;
-	unsigned long scaled_weight = 0, scale_freq, scale_freq_cpu = 0;

 	delta = now - sa->last_update_time;

 	/*
@@ -2575,13 +2575,10 @@ __update_load_avg(u64 now, int cpu, stru
 		return 0;
 	sa->last_update_time = now;

-	if (weight || running)
-		scale_freq = arch_scale_freq_capacity(NULL, cpu);
-	if (weight)
-		scaled_weight = weight * scale_freq >> SCHED_CAPACITY_SHIFT;
-	if (running)
-		scale_freq_cpu = scale_freq * arch_scale_cpu_capacity(NULL, cpu)
-			>> SCHED_CAPACITY_SHIFT;
+	scale_freq = arch_scale_freq_capacity(NULL, cpu);
+
+	scaled_weight = weight * scale_freq >> SCHED_CAPACITY_SHIFT;
+	scale_freq_cpu = scale_freq * arch_scale_cpu_capacity(NULL, cpu) >> SCHED_CAPACITY_SHIFT;

 	/* delta_w is the amount already accumulated against our next period */
 	delta_w = sa->period_contrib;