Subject: Re: [PATCH] sched: Reduce contention in update_cfs_rq_blocked_load
From: Jason Low
To: bsegall@google.com
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel@vger.kernel.org, Waiman Long,
    Mel Gorman, Mike Galbraith, Rik van Riel, Aswin Chandramouleeswaran,
    Chegu Vinod, Scott J Norton, pjt@google.com, jason.low2@hp.com
Date: Mon, 11 Aug 2014 10:31:47 -0700
Message-ID: <1407778307.14059.12.camel@j-VirtualBox>
References: <1407184118.11407.11.camel@j-VirtualBox>

On Mon, 2014-08-04 at 13:52 -0700, bsegall@google.com wrote:
>
> That said, it might be better to remove force_update for this function,
> or make it just reduce the minimum to /64 or something. If the test is
> easy to run it would be good to see what it's like just removing the
> force_update param for this function to see if it's worth worrying
> about or if the zero case catches ~all the perf gain.

Hi Ben,

I removed the force update in __update_cfs_rq_tg_load_contrib and it
helped reduce overhead a lot more. I saw up to a 20x reduction in system
overhead from update_cfs_rq_blocked_load when running some of the AIM7
workloads with this change.

-----

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fea7d33..7a6e18b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2352,8 +2352,7 @@ static inline u64 __synchronize_entity_decay(struct sched_entity *se)
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
-						 int force_update)
+static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq)
 {
 	struct task_group *tg = cfs_rq->tg;
 	long tg_contrib;
@@ -2361,7 +2360,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 	tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
 	tg_contrib -= cfs_rq->tg_load_contrib;
 
-	if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
+	if (abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
 		atomic_long_add(tg_contrib, &tg->load_avg);
 		cfs_rq->tg_load_contrib += tg_contrib;
 	}
@@ -2436,8 +2435,7 @@ static inline void update_rq_runnable_avg(struct rq *rq, int runnable)
 	__update_tg_runnable_avg(&rq->avg, &rq->cfs);
 }
 #else /* CONFIG_FAIR_GROUP_SCHED */
-static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
-						 int force_update) {}
+static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq) {}
 static inline void __update_tg_runnable_avg(struct sched_avg *sa,
 						  struct cfs_rq *cfs_rq) {}
 static inline void __update_group_entity_contrib(struct sched_entity *se) {}
@@ -2537,7 +2535,7 @@ static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)
 		cfs_rq->last_decay = now;
 	}
 
-	__update_cfs_rq_tg_load_contrib(cfs_rq, force_update);
+	__update_cfs_rq_tg_load_contrib(cfs_rq);
 }
 
 /* Add the load generated by se into cfs_rq's child load-average */
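
As an aside, here is a rough, untested sketch of the other variant you
mentioned (keeping force_update but only lowering the threshold to /64
rather than bypassing the filter entirely). The divisor handling is just
my reading of that suggestion, not something I've benchmarked:

static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
						 int force_update)
{
	struct task_group *tg = cfs_rq->tg;
	long tg_contrib;
	/*
	 * Untested sketch: a forced update only drops the filter from
	 * 1/8th to 1/64th of tg_load_contrib instead of skipping it,
	 * so very small deltas still avoid touching tg->load_avg.
	 */
	long divisor = force_update ? 64 : 8;

	tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
	tg_contrib -= cfs_rq->tg_load_contrib;

	if (abs(tg_contrib) > cfs_rq->tg_load_contrib / divisor) {
		atomic_long_add(tg_contrib, &tg->load_avg);
		cfs_rq->tg_load_contrib += tg_contrib;
	}
}

The idea would be that forced updates still batch up the smallest deltas
instead of always hitting the shared tg->load_avg cacheline.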